Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change deprecation warning for CSV.read #687

Merged
merged 1 commit into from
Jul 28, 2020
Merged

Change deprecation warning for CSV.read #687

merged 1 commit into from
Jul 28, 2020

Conversation

quinnj
Copy link
Member

@quinnj quinnj commented Jul 10, 2020

Fixes JuliaData/DataFrames.jl#2309. After much
discussion, people really like the CSV.read function call naming, so
it was decided to keep it around, while still allowing a break from
DataFrames.jl as a dependency.

I also realized that CSV.read does indeed provide one bit of
value/functionality: we can wrap CSV.File in Tables.CopiedColumns.
What this means is that columnar sinks can safely use the columns passed
without needing to make copies; i.e. they can assume ownership of the
columns. In CSV.read, the user is essentially saying, "I want to make
a CSV.File and pass it directly to sink" which also implies that
CSV.File doesn't need to "own" its own columns.

The only question left in my mind is, with the 1.0 release, what to do
with CSV.read(file) when no sink argument is passed. Suggestions
have included just returning a CSV.File, since it's a valid table
anyway. Or allowing DataFrames.jl to define the no-sink-arg version and
return a DataFrame; that one's a bit awkward as a form of "blessed"
type piracy, but could also be useful for users as convenience (they
just have to remember to do using DataFrames before trying to use it).
The other awkward part is that we're currently warning users of the
deprecation and that they should explicitly spell out DataFrame
whereas we might not actually require that.

Fixes JuliaData/DataFrames.jl#2309. After much
discussion, people really like the `CSV.read` function call naming, so
it was decided to keep it around, while still allowing a break from
DataFrames.jl as a dependency.

I also realized that `CSV.read` does indeed provide one bit of
value/functionality: we can wrap `CSV.File` in `Tables.CopiedColumns`.
What this means is that columnar sinks can safely use the columns passed
without needing to make copies; i.e. they can assume ownership of the
columns. In `CSV.read`, the user is essentially saying, "I want to make
a `CSV.File` and pass it directly to `sink`" which also implies that
`CSV.File` doesn't need to "own" its own columns.

The only question left in my mind is, with the 1.0 release, what to do
with `CSV.read(file)` when no `sink` argument is passed. Suggestions
have included just returning a `CSV.File`, since it's a valid table
anyway. Or allowing DataFrames.jl to define the no-sink-arg version and
return a `DataFrame`; that one's a bit awkward as a form of "blessed"
type piracy, but could also be useful for users as convenience (they
just have to remember to do `using DataFrames` before trying to use it).
The other awkward part is that we're currently warning users of the
deprecation and that they should explicitly spell out `DataFrame`
whereas we might not actually require that.
src/CSV.jl Show resolved Hide resolved
@bkamins
Copy link
Member

bkamins commented Jul 10, 2020

what to do with CSV.read(file) when no sink argument is passed

I would disallow it

one's a bit awkward as a form of "blessed" type piracy, but could also be useful for users as convenience

I think it is better to require specifying sink.

@piever
Copy link

piever commented Jul 10, 2020

The only question left in my mind is, with the 1.0 release, what to do with CSV.read(file) when no sink argument is passed.

A part from the possibility of returning CSV.File, from the point of view of the casual user who has never heard of DataFrames and just wants to read a .csv file into something they can understand, I imagine it could be useful to default to a simple sink made of simple Base julia types, like Tables.columntable.

Unfortunately, this could be problematic for wide data (hopefully future julia versions will be able to deal with big named tuples). Maybe it's safer to disallow for now, and only decide on this later (it can be a post 1.0 decision if on 1.0 it errors).

@bkamins
Copy link
Member

bkamins commented Jul 10, 2020

it can be a post 1.0 decision if on 1.0 it errors

This is one of the reasons I prefer throwing an error for now (@nalimilan ™️)

@codecov
Copy link

codecov bot commented Jul 10, 2020

Codecov Report

Merging #687 into master will decrease coverage by 0.04%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #687      +/-   ##
==========================================
- Coverage   84.28%   84.23%   -0.05%     
==========================================
  Files          10       10              
  Lines        1801     1802       +1     
==========================================
  Hits         1518     1518              
- Misses        283      284       +1     
Impacted Files Coverage Δ
src/CSV.jl 28.57% <0.00%> (-4.77%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4136307...ac3761d. Read the comment docs.

@JeffBezanson
Copy link

👍 Very in favor of this. I also think it would be fine to make DataFrame the default sink. The possibility of different types of tables is not SO important that every user must be forced to think about it from the first time they read a csv file in Julia.

@JeffBezanson
Copy link

Ah, I understand you want to remove the DataFrames dependency. I think JuliaData/DataFrames.jl#1764 is the way to go --- if that happens then maybe we can add back a default sink.

@quinnj quinnj merged commit c5ebf92 into master Jul 28, 2020
@quinnj quinnj deleted the jq/csvread branch July 28, 2020 04:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Do we need CSV.read?
4 participants