Add links to alternatives to the readme #1006

Merged
merged 6 commits into from Aug 27, 2022

Contributor

@mcabbott mcabbott commented Jun 2, 2022

As discussed here https://discourse.julialang.org/t/how-do-i-know-if-a-package-is-good/82133, it might be nice if this package linked to alternative ways to read CSV files. Really all packages should do this, but this one is what you find if you google "csv julia"... and I bet that many people googling that just want DelimitedFiles.

I'm not really qualified to write the list of fancier alternatives, since DelimitedFiles does what I need right now. But @juliohm @sl-solution @chriselrod from the Discourse thread may have better ideas, in particular about which differences are most important to mention.

I'm especially unqualified to list Python or R alternatives. That's less obviously necessary, but their names are terms people might search for.

Xref JuliaPy/Pandas.jl#87 about Pandas.jl -> DataFrames.jl.

README.md Outdated
Comment on lines 48 to 49
* The standard library contains [DelimitedFiles.jl](https://docs.julialang.org/en/v1/stdlib/DelimitedFiles/),
which is perfect for quickly reading small files.
Contributor

Note that it returns Arrays, so for this to be perfect, your delimited files should probably be homogeneous with respect to types; otherwise you'll get a `Matrix{Any}`, which I wouldn't call perfect, even for small files.
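
A minimal sketch of the element-type behaviour described in this comment, assuming the `DelimitedFiles` standard library is available. The inline sample data and the variable names `numeric` and `mixed` are made up for illustration and do not come from the PR.

```julia
using DelimitedFiles

# Hypothetical sample data, read from in-memory buffers instead of files.
# All-numeric input parses into a matrix with a concrete numeric element type.
numeric = readdlm(IOBuffer("1,2,3\n4,5,6\n"), ',')

# Input mixing text and numbers cannot share a numeric element type,
# so readdlm falls back to a heterogeneous matrix.
mixed = readdlm(IOBuffer("a,1\nb,2\n"), ',')

println(eltype(numeric))
println(eltype(mixed))
```

Comparing the two printed element types shows why a file with mixed column types is a poor fit for a plain matrix.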

Contributor Author

@mcabbott mcabbott Jun 2, 2022

Good point. Maybe something like this:

Suggested change
- * The standard library contains [DelimitedFiles.jl](https://docs.julialang.org/en/v1/stdlib/DelimitedFiles/),
-   which is perfect for quickly reading small files.
+ * The standard library contains [DelimitedFiles.jl](https://docs.julialang.org/en/v1/stdlib/DelimitedFiles/).
+   This returns a `Matrix` rather than a [Tables.jl](https://github.com/JuliaData/Tables.jl)-style container, thus works best for files of homogenous element type.
+   On large files, CSV.jl will be much faster.

Collaborator

A couple more notes:

* CSV.jl doesn't return a DataFrame by default; it returns a `CSV.File`, so we should be careful not to imply that it does.
* DelimitedFiles won't be a stdlib in upcoming Julia versions, I think.

Contributor

Looks good.

Contributor Author

Clearly I shouldn't be writing this... see what you think, DataFrames -> Tables.jl now.

And I hadn't seen it before, but JuliaLang/julia#44663 is the proposal to remove it from the standard library. It will surely be a while, though, and the package will probably remain the right choice for reading 10 lines.


> DelimitedFiles won't be a stdlib in upcoming Julia versions

Well that's slightly horrifying

README.md Outdated
mcabbott and others added 2 commits June 2, 2022 12:47
Co-authored-by: Chris Elrod <elrodc@gmail.com>
@quinnj
Member

quinnj commented Jun 6, 2022

Yeah..........this is mostly fine. I'm hopeful that by the CSV.jl 1.0 release, we can get the time-to-first-read to be really competitive with DelimitedFiles.jl and then I don't think there's really any reason to use it.

DLMReader.jl is a great reference because it supports some more exotic parsing configurations if you need to get really custom.

I'd prefer we remove the reference to CSVFiles.jl/TextParse.jl; they haven't been updated or had any real work done for a long time, and they don't provide any kind of functionality not supported in CSV.jl.

@mcabbott
Contributor Author

mcabbott commented Jun 7, 2022

> really competitive with DelimitedFiles.jl and then I don't think there's really any reason to use it

Unless you want a matrix not a dataframe, right? There is a place for loading/saving 10 numbers without figuring out any complicated types.

> prefer we remove the reference to CSVFiles.jl/TextParse.jl; they haven't been updated

I guess my vote is to then say roughly that. It's useful information if you are trying to figure out how all these packages relate to each other. Otherwise you have to try to infer from the dates... is X not mentioned by Y because it's the newer nice thing which didn't exist when Y's summary was written, or because it's the older attempt at the same which is no longer needed, or just because the authors of X and Y don't get along?

@quinnj
Member

quinnj commented Jun 10, 2022

Yeah, that's fair. I'm fine adding that text then.

README.md Outdated
@davidanthoff

> I'd prefer we remove the reference to CSVFiles.jl/TextParse.jl; they haven't been updated or had any real work done for a long time, and they don't provide any kind of functionality not supported in CSV.jl.

I'm fine either way, but it is probably worth pointing out that this set of packages works well, and for some applications the lack of frequent updates is a huge plus. We use these packages extensively in my research group because they work just fine, and not having to deal with updates is a plus for many of our use cases. And neither package is deprecated.

Comment on lines +56 to +57
* [DLMReader.jl](https://github.com/sl-solution/DLMReader.jl) also aims to be fast for large files,
closely associated with [InMemoryDatasets.jl](https://github.com/sl-solution/InMemoryDatasets.jl).
Contributor Author

> DLMReader.jl is a great reference because it supports some more exotic parsing configurations if you need to get really custom.

Should this link say something like "exotic custom parsing"?

@quinnj quinnj merged commit cdb0cbb into JuliaData:main Aug 27, 2022
@mcabbott mcabbott deleted the patch-1 branch August 28, 2022 00:36
7 participants