
Add links to alternatives to the readme #1006

Merged: 6 commits, merged on Aug 27, 2022
README.md: 16 changes (16 additions, 0 deletions)
@@ -40,3 +40,19 @@ Contributions are very welcome, as are feature requests and suggestions. Please
[codecov-url]: https://codecov.io/gh/JuliaData/CSV.jl

[issues-url]: https://github.com/JuliaData/CSV.jl/issues

## Alternatives

There are several other packages for reading CSV files in Julia, which may suit your needs better:

* The standard library contains [DelimitedFiles.jl](https://docs.julialang.org/en/v1/stdlib/DelimitedFiles/),
which is perfect for quickly reading small files.
Contributor

Note that it returns Arrays, so for this to be perfect, your delimited files should probably be homogeneous with respect to types; otherwise you'll get a `Matrix{Any}`, which I wouldn't call perfect, even for small files.
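A quick way to see this behavior is sketched below. It assumes Julia 1.6 or later; on Julia 1.9 and later DelimitedFiles may first need to be installed with `] add DelimitedFiles`. The inline sample data is hypothetical, made up only for illustration.

```julia
using DelimitedFiles

# Hypothetical sample data: a text column next to a numeric column.
mixed = readdlm(IOBuffer("a,1\nb,2\n"), ',')
println(eltype(mixed))    # Any, because not every field parses as a number

# All-numeric input: readdlm parses into a concrete numeric element type.
numeric = readdlm(IOBuffer("1,2\n3,4\n"), ',')
println(eltype(numeric))  # Float64, the default when no type is passed
```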

Contributor Author

mcabbott, Jun 2, 2022

Good point. Maybe something like this:

Suggested change
* The standard library contains [DelimitedFiles.jl](https://docs.julialang.org/en/v1/stdlib/DelimitedFiles/),
which is perfect for quickly reading small files.
* The standard library contains [DelimitedFiles.jl](https://docs.julialang.org/en/v1/stdlib/DelimitedFiles/).
This returns a `Matrix` rather than a [Tables.jl](https://github.com/JuliaData/Tables.jl)-style container, and thus works best for files with a homogeneous element type.
On large files, CSV.jl will be much faster.

Collaborator

A couple more notes:

* CSV doesn't return a DataFrame by default; it returns a `CSV.File`, so we should be careful not to imply that it does.
* DelimitedFiles won't be a stdlib in upcoming Julia versions, I think.
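To illustrate the first point, here is a small sketch. It assumes CSV.jl and DataFrames.jl are both installed; the column names and values are hypothetical sample data.

```julia
using CSV, DataFrames

# Hypothetical sample data, passed in-memory instead of from a file.
data = "name,score\na,1\nb,2\n"

# CSV.File is CSV.jl's own Tables.jl-compatible table type, not a DataFrame.
f = CSV.File(IOBuffer(data))
println(f isa CSV.File)    # true
println(f isa DataFrame)   # false

# A DataFrame (or any other Tables.jl sink) has to be requested explicitly.
df = CSV.read(IOBuffer(data), DataFrame)
println(df isa DataFrame)  # true
```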

Contributor

Looks good.

Contributor Author

Clearly I shouldn't be writing this... See what you think; it now says Tables.jl instead of DataFrames.

Also, I hadn't seen it, but JuliaLang/julia#44663 is the proposal to remove it. It will surely be a while, though, and the package will probably remain the right choice for reading 10 lines.


> DelimitedFiles won't be a stdlib in upcoming Julia versions

Well, that's slightly horrifying.


* [CSVFiles.jl](https://github.com/queryverse/CSVFiles.jl) uses the [FileIO.jl](https://github.com/JuliaIO/FileIO.jl) API
to read files into any [IterableTables.jl](https://github.com/queryverse/IterableTables.jl) sink.
The package uses [TextParse.jl](https://github.com/queryverse/TextParse.jl) for parsing.
mcabbott marked this conversation as resolved.

* [DLMReader.jl](https://github.com/sl-solution/DLMReader.jl) also aims to be fast for large files.
It is closely associated with [InMemoryDatasets.jl](https://github.com/sl-solution/InMemoryDatasets.jl) rather than [DataFrames.jl](https://github.com/JuliaData/DataFrames.jl).

* [Pandas.jl](https://github.com/JuliaPy/Pandas.jl) wraps Python's pandas library via PyCall.jl.