read_csv: Implement reading of number of rows #1656

tomspur · 2020-07-15T22:20:44Z

Implement reading of a limited number of rows (nrows) in read_csv by using Spark's limit.
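The idea can be sketched without Spark. The snippet below is a hypothetical stdlib analogue, not the Koalas implementation. The helper read_csv_rows and its validation rule are assumptions made for illustration, and itertools.islice stands in for Spark's lazy limit. As with nrows in pandas, the count refers to data rows after the header.

```python
import csv
from itertools import islice
from typing import Iterable, List, Optional


def read_csv_rows(lines: Iterable[str], nrows: Optional[int] = None) -> List[List[str]]:
    """Hypothetical helper: return the header plus at most `nrows` data rows.

    islice plays the role of Spark's limit(nrows): rows are pulled lazily,
    so parsing stops once nrows data rows have been produced.
    """
    # Assumed validation rule for this sketch; bool is excluded because it subclasses int.
    if nrows is not None and (isinstance(nrows, bool) or not isinstance(nrows, int) or nrows < 0):
        raise ValueError("nrows must be a non-negative integer or None")
    reader = csv.reader(lines)
    header = next(reader)
    body = reader if nrows is None else islice(reader, nrows)
    return [header] + list(body)


data = ["a,b", "1,x", "2,y", "3,z"]
print(read_csv_rows(data, nrows=2))  # [['a', 'b'], ['1', 'x'], ['2', 'y']]
print(len(read_csv_rows(data)) - 1)  # 3 data rows when nrows is None
```

Leaving nrows as None keeps the old behaviour of reading every row, so the parameter is purely opt-in.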

On the first read, this does not seem to help much with reading speed, because inferSchema is True and Spark seems to scan over the full data anyway (see e.g. here). It does, however, give read_csv the same nrows parameter as pandas, which makes the API more compatible.
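Why a limit alone may not speed up the read can be illustrated with the same kind of stdlib analogue. In this hypothetical sketch (not Spark code), infer_column_types stands in for inferSchema=True: it makes a full pass over the input before the limited read happens. CountingLines is an illustrative wrapper that records how many lines were handed out.

```python
import csv
from itertools import islice
from typing import Iterator, List, Optional, Tuple


class CountingLines:
    """Re-iterable source of CSV lines that counts every line handed out."""

    def __init__(self, lines: List[str]) -> None:
        self.lines = lines
        self.consumed = 0

    def __iter__(self) -> Iterator[str]:
        for line in self.lines:
            self.consumed += 1
            yield line


def infer_column_types(source: CountingLines) -> List[str]:
    """Hypothetical stand-in for inferSchema=True: one full pass over the data."""
    reader = csv.reader(source)
    header = next(reader)
    types = ["int"] * len(header)
    for row in reader:
        for i, value in enumerate(row):
            if types[i] == "int":
                try:
                    int(value)
                except ValueError:
                    types[i] = "string"  # widen once a non-integer value appears
    return types


def read_with_inference(source: CountingLines, nrows: Optional[int]) -> Tuple[List[str], List[List[str]]]:
    types = infer_column_types(source)  # pass 1 reads every line, ignoring nrows
    reader = csv.reader(source)  # pass 2 is the actual, limited read
    next(reader)  # skip the header
    rows = list(reader if nrows is None else islice(reader, nrows))
    return types, rows


src = CountingLines(["a,b"] + [f"{i},v{i}" for i in range(1000)])
types, rows = read_with_inference(src, nrows=5)
print(types, len(rows), src.consumed)  # ['int', 'string'] 5 1007
```

The limited read itself touches only 6 lines (header plus 5 rows), but the inference pass accounts for the other 1001, so most of the I/O cost remains. This mirrors the observation about inferSchema above.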


ueshin

LGTM.

ueshin · 2020-07-16T01:39:25Z

Thanks! Merging.

itholic · 2020-07-16T04:11:25Z

Nice work. Thanks! :D

tomspur · 2020-07-16T06:41:53Z

Thank you for the quick response and merge! :)

read_csv: Implement reading of number of rows

6aa3ca7

Implement reading of number of rows (nrows) in read_csv.

ueshin approved these changes Jul 16, 2020


ueshin merged commit 7bdf141 into databricks:master Jul 16, 2020
