Worker lost the data when write dataframe to csv file #2774

Open
jmsking opened this issue Jun 14, 2019 · 2 comments

Comments

@jmsking

jmsking commented Jun 14, 2019

The code is as follows (I removed some unimportant parts):

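import dask.dataframe as dd
import pandas as pd
from dask import delayed
from dask.distributed import Client

def load_data(self):
    # Split the input CSV into ~5 MiB blocks, one Delayed object per partition
    data = dd.read_csv(self.input_path, blocksize="5MiB")
    return data.to_delayed()

def batch_predict(self, batch_data):
    app_model = self.load_model()
    res = app_model.predict(batch_data)   # type(res) is ndarray
    return res

def save_result(self, data):
    df = pd.DataFrame(data)
    # Every task appends to the same output file
    res = df.to_csv(self.output_path, mode='a', header=['predict'])
    # to_csv returns None when it is given a path, so None means the call completed
    if res is None:
        return 1
    else:
        return 0

def static_succ_num(self, *flag):
    # True if every save_result task reported success
    partitions = len(*flag)
    actual = sum(*flag)
    return actual == partitions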

client = Client('scheduler:32666')
client.restart()
batches = delayed(self.load_data)()
batches = batches.compute()
n_partitions = len(batches)
print('partitions: ', n_partitions)   # prints 1730
batches = client.persist(batches)
print(type(batches[0]))     # prints Delayed
res = [delayed(self.batch_predict)(batch) for batch in batches]
print(type(res[0]))     # prints Delayed
res = [delayed(self.save_result)(batch_res) for batch_res in res]
print(type(res[0]))
res = delayed(self.static_succ_num)(res)
res = client.compute(res)
res = client.gather(res)
print(res)   # prints True

When I run the code above, the output file contains the results of only some of the 1730 partitions, so I think some tasks did not write their data to the CSV file even though every task reported success. I would like to know the reason. Thanks in advance!

@TomAugspurger
Member

See http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports for writing bug reports. It looks like there are a few things preventing your example from running, and I'm guessing it could be simplified further.
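
As a rough illustration of how far the example could be stripped down, the sketch below keeps only the pattern under suspicion: many tasks appending to one shared CSV file, followed by a row count. It is not the reporter's code. The names `N_PARTITIONS`, `ROWS_PER_PARTITION`, and `run` are hypothetical, and a standard-library thread pool stands in for the Dask cluster, so it only shows the shape of a self-contained check and may not reproduce the behavior seen on a distributed cluster.

```python
import csv
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

N_PARTITIONS = 20        # hypothetical; the report mentions 1730 partitions
ROWS_PER_PARTITION = 50  # hypothetical number of predictions per partition


def save_result(output_path, partition_id):
    """Append one partition's rows to the shared CSV file; return 1 on completion."""
    with open(output_path, "a", newline="") as f:
        writer = csv.writer(f)
        for row in range(ROWS_PER_PARTITION):
            writer.writerow([partition_id, row])
    return 1


def run(output_path):
    """Run all writers concurrently; return (tasks that reported success, rows on disk)."""
    # Assumption: a thread pool stands in for the Dask workers.
    with ThreadPoolExecutor(max_workers=4) as pool:
        flags = list(pool.map(lambda i: save_result(output_path, i), range(N_PARTITIONS)))
    # Count what actually reached the file instead of trusting the success flags.
    with open(output_path, newline="") as f:
        rows_on_disk = sum(1 for _ in csv.reader(f))
    return sum(flags), rows_on_disk


with tempfile.TemporaryDirectory() as tmp:
    succeeded, rows_on_disk = run(os.path.join(tmp, "out.csv"))

expected_rows = N_PARTITIONS * ROWS_PER_PARTITION
print("tasks reporting success:", succeeded)
print("rows expected:", expected_rows, "rows on disk:", rows_on_disk)
```

Comparing the expected and actual row counts, rather than the per-task success flags, is the point of the check: a task can return 1 without its rows ending up in the file that is later inspected.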

@martindurant
Member

@jmsking, can you provide the minimal example, as requested? If not, we should close this issue.
