-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ignoreemptylines should default to true #638
Comments
Wow, I'm trying to fix this and after running the tests I realized that CSV assumes that |
(btw, got an error when testing on master:) (@v1.5) pkg> test CSV
Testing CSV
Status `/tmp/jl_gtSfjc/Project.toml`
[336ed68f] CSV v0.6.2 `~/.julia/dev/CSV`
[324d7699] CategoricalArrays v0.8.1
[944b1d66] CodecZlib v0.7.0
[a93c6f00] DataFrames v0.21.2
[48062228] FilePathsBase v0.8.0
[69de0a69] Parsers v1.0.5
[2dfb63ee] PooledArrays v0.5.3
[bd369af6] Tables v1.0.4
[ea10d353] WeakRefStrings v0.6.2
[ade2ca70] Dates
[a63ad114] Mmap
[9a3f8284] Random
[8dfed614] Test
[4ec0a83e] Unicode
Status `/tmp/jl_gtSfjc/Manifest.toml`
[336ed68f] CSV v0.6.2 `~/.julia/dev/CSV`
[324d7699] CategoricalArrays v0.8.1
[944b1d66] CodecZlib v0.7.0
[34da2185] Compat v3.11.0
[9a962f9c] DataAPI v1.3.0
[a93c6f00] DataFrames v0.21.2
[864edb3b] DataStructures v0.17.17
[e2d170a0] DataValueInterfaces v1.0.0
[48062228] FilePathsBase v0.8.0
[41ab1584] InvertedIndices v1.0.0
[82899510] IteratorInterfaceExtensions v1.0.0
[682c06a0] JSON v0.21.0
[e1d29d7a] Missings v0.4.3
[bac558e1] OrderedCollections v1.2.0
[69de0a69] Parsers v1.0.5
[2dfb63ee] PooledArrays v0.5.3
[189a3867] Reexport v0.2.0
[a2af1166] SortingAlgorithms v0.3.1
[3783bdb8] TableTraits v1.0.0
[bd369af6] Tables v1.0.4
[3bb67fe8] TranscodingStreams v0.9.5
[ea10d353] WeakRefStrings v0.6.2
[83775a58] Zlib_jll v1.2.11+10
[2a0f44e3] Base64
[ade2ca70] Dates
[8bb1440f] DelimitedFiles
[8ba89e20] Distributed
[9fa8497b] Future
[b77e0a4c] InteractiveUtils
[76f85450] LibGit2
[8f399da3] Libdl
[37e2e46d] LinearAlgebra
[56ddb016] Logging
[d6f4376e] Markdown
[a63ad114] Mmap
[44cfe95a] Pkg
[de0858da] Printf
[3fa0cd96] REPL
[9a3f8284] Random
[ea8e919c] SHA
[9e88b42a] Serialization
[1a1011a3] SharedArrays
[6462fe0b] Sockets
[2f01184e] SparseArrays
[10745b16] Statistics
[8dfed614] Test
[cf7118a7] UUIDs
[4ec0a83e] Unicode
thread = 1 warning: only found 7 / 8 columns on data row: 1. Filling remaining columns with `missing`
thread = 1 warning: error parsing Int64 on row = 1, col = 1: "abc
", error=INVALID: SENTINEL | NEWLINE | EOF | INVALID_DELIMITER
CSV.File basics: Test Failed at /home/yakir/.julia/dev/CSV/test/basics.jl:367
Expression: df.id.sentinel == 0x6690d6bed2a7da46
Evaluated: 0x734baccd23d8b4d6 == 0x6690d6bed2a7da46
Stacktrace:
[1] top-level scope at /home/yakir/.julia/dev/CSV/test/basics.jl:367
[2] top-level scope at /home/yakir/julia/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1114
[3] top-level scope at /home/yakir/.julia/dev/CSV/test/basics.jl:7
.
.
.
Test Summary: | Pass Fail Total
CSV | 1748 1 1749
CSV.File | 1555 1 1556
CSV.File basics | 182 1 183
CSV.File iteration | 366 366
CSV.write | 83 83
CategoricalArray levels (including ordering) | 7 7
PooledArrays | 14 14
CSV.Rows | 37 37
CSV.detect | 8 8
CSV.findrowstarts! | 4 4
CSV.promote_typecode | 2 2
CSV.File with select/drop | 38 38
ERROR: LoadError: Some tests did not pass: 1748 passed, 1 failed, 0 errored, 0 broken.
in expression starting at /home/yakir/.julia/dev/CSV/test/runtests.jl:13
ERROR: Package CSV errored during testing |
Sorry for the slow response here; I changed the default and ran the tests to see what you saw. I think I'm ok w/ changing the default, but as you noticed there's a bunch of tests that rely on the current default, so it'd just be a matter of going through and making sure This would be a breaking change, but fortunately, we're getting ready for a 1.0 release that could include a breaking change like this. |
FWIW I got a CSV file produced by LimeSurvey which included a final line break, and it's annoying to get a row full of missings because of that. So I support this change (or special-casing the last row, which is a bit weird). |
@quinnj I've made a PR with one test failing no matter Line 336 in f93473f
everything else passes, since I didn't find a I personally think this is a better default so I took the liberty to make a PR hope you won't mind @yakir12 |
God no, I just got back fro a lengthy vacation and welcome your initiative, thanks! |
Implemented; thanks @Moelf ! |
This is related to #621 , right now
ignoreemptylines
's default isfalse
. I think it should default totrue
.I cannot imagine that the majority of people that want to parse a csv file intend to have missing in all the columns by leaving a line empty. Like, how would that even come to pass (as in, what program encodes
missing
in all columns by leaving an empty line)?Empty lines should be flagged as they are now (there is a warning about it), but they should also be ignored since the majority of users will want to filter those away since they stem from some error (such as shown in #621 or some other source) and not an intent.
The text was updated successfully, but these errors were encountered: