Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

importccl: IMPORT silently converts \r\n to \n #25344

Closed
dt opened this issue May 7, 2018 · 10 comments
Closed

importccl: IMPORT silently converts \r\n to \n #25344

dt opened this issue May 7, 2018 · 10 comments
Labels
A-disaster-recovery C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-1-blocking-adoption

Comments

@dt
Copy link
Member

dt commented May 7, 2018

Looks like Go's CSV reader is is doing this for us:

The Reader converts all \r\n sequences in its input to plain \n, including in multiline field values, so that the returned data does not depend on which line-ending convention an input file uses.
-- https://golang.org/pkg/encoding/csv/#Reader

@dt dt added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-disaster-recovery labels May 7, 2018
@dt dt added this to Triage in Bulk IO via automation May 7, 2018
@vivekmenezes vivekmenezes moved this from Triage to Backlog high priority in Bulk IO Jul 10, 2018
@vivekmenezes vivekmenezes moved this from Backlog high priority to Backlog in Bulk IO Jul 10, 2018
@dt
Copy link
Member Author

dt commented Jul 30, 2018

More details: It appears this is being caused by the behaviour of Golang's encoding/csv library, which folds \r\n into \n when reading. This was fixed in golang/go#21201 but then reverted golang/go#22746. It appears based on that second issue that Go is unlikely to change that behavior, so we may need to look into using an alternative CSV reader lib.

@neeral
Copy link
Contributor

neeral commented Aug 1, 2018

How does forking encoding/csv and applying golang/go#21201 sound? The library itself is very small and consists of a handful of files.
The other approach would be to select another library, which would be a riskier change.

@dt
Copy link
Member Author

dt commented Aug 1, 2018

Yeah, forking the stdlib pkg, with that patch, sounds like a good plan. I guess we could technically just vendor the stdlib package and modify it, but that seems likely to be confusing, so a proper fork, with a distinct import path, seems easier to me.

@neeral
Copy link
Contributor

neeral commented Aug 1, 2018

@dt if you can create the fork in the cockroachdb org (I don't have permissions), I can apply the patch to it.

@dt
Copy link
Member Author

dt commented Aug 1, 2018

@neeral Thanks! Actually, looking at this, since it's really just the reader.go file we need, what we've usually done in similar cases is just check the modified stdlib file (and its tests) directly in to the repo, usually somewhere under pkg/util. Some examples of this in syncutil I believe. I think we could go that route here too?

@neeral
Copy link
Contributor

neeral commented Aug 1, 2018

Yes, that works. I'll be sending a PR for this.

neeral added a commit to neeral/cockroach that referenced this issue Aug 2, 2018
See cockroachdb#25344.

It appears this is being caused by the behaviour of Golang's
encoding/csv library, which folds \r\n into \n when reading. This was
fixed in golang/go#21201 but then reverted golang/go#22746. It appears
based on that second issue that Go is unlikely to change that behavior.

Check in the stdlib `encoding/csv` into `pkg/util` with
golang/go#22746 reverted.

Release note:
`\r\n` characters in CSV files were silently converted into `\n`. This
causes imported data to be different. This is now fixed.
neeral added a commit to neeral/cockroach that referenced this issue Aug 2, 2018
See cockroachdb#25344.

It appears this is being caused by the behaviour of Golang's
encoding/csv library, which folds \r\n into \n when reading. This was
fixed in golang/go#21201 but then reverted golang/go#22746. It appears
based on that second issue that Go is unlikely to change that behavior.

Check in the stdlib `encoding/csv` into `pkg/util` with
golang/go#22746 reverted.

Release note:
`\r\n` characters in CSV files were silently converted into `\n`. This
causes imported data to be different. This is now fixed.
craig bot pushed a commit that referenced this issue Aug 2, 2018
28181: importccl: Preserve '\r\n' during CSV import r=benesch a=neeral

See #25344.

Co-authored-by: neeral <neeral@users.noreply.github.com>
@neeral
Copy link
Contributor

neeral commented Aug 2, 2018

Thanks for reviewing, @benesch @dt.

@dt Could we backport #28181 to release-2.0 please?

@dt
Copy link
Member Author

dt commented Aug 3, 2018

@neeral yep, #28225, Thanks!

dt pushed a commit to dt/cockroach that referenced this issue Aug 6, 2018
See cockroachdb#25344.

It appears this is being caused by the behaviour of Golang's
encoding/csv library, which folds \r\n into \n when reading. This was
fixed in golang/go#21201 but then reverted golang/go#22746. It appears
based on that second issue that Go is unlikely to change that behavior.

Check in the stdlib `encoding/csv` into `pkg/util` with
golang/go#22746 reverted.

Release note:
`\r\n` characters in CSV files were silently converted into `\n`. This
causes imported data to be different. This is now fixed.
craig bot pushed a commit that referenced this issue Aug 6, 2018
27868: backport-2.0: storage: prevent unbounded raft log growth without quorum r=nvanbenschoten a=nvanbenschoten

Backport 2/2 commits from #27774.

/cc @cockroachdb/release

---

Fixes #27772.

This change adds safeguards to prevent cases where a raft log
would grow without bound during loss of quorum scenarios. It
also adds a new test that demonstrates that the raft log does
not grow without bound in these cases.

There are two cases that need to be handled to prevent the
unbounded raft log growth observed in #27772.
1. When the leader proposes a command and cannot establish a
   quorum. In this case, we know the leader has the entry in
   its log, so there's no need to refresh it with `reasonTicks`.
   To avoid this, we no longer use `refreshTicks` as a leader.
2. When a follower proposes a command that is forwarded to the
   leader who cannot establish a quorum. In this case, the
   follower can't be sure (currently) that the leader got the
   proposal, so it needs to refresh using `reasonTicks`. However,
   the leader now detects duplicate forwarded proposals and
   avoids appending redundant entries to its log. It does so
   by maintaining a set of in-flight forwarded proposals that
   it has received during its term as leader. This set is reset
   after every leadership change.

Both of these cases are tested against in the new
TestLogGrowthWhenRefreshingPendingCommands. Without both of
the safeguards introduced in this commit, the test fails.

Release note (bug fix): Prevent loss of quorum situations from
allowing unbounded growth of a Range's Raft log.


28225: release-2.0: importccl: Preserve '\r\n' during CSV import r=dt a=dt

Backport 1/1 commits from #28181.

/cc @cockroachdb/release

---

See #25344.


Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: neeral <neeral@users.noreply.github.com>
Co-authored-by: David Taylor <tinystatemachine@gmail.com>
@neeral
Copy link
Contributor

neeral commented Aug 8, 2018

I think we can close this? Thanks @dt for helping.

@dt
Copy link
Member Author

dt commented Aug 8, 2018

Yep, #28181. Thanks for the fix!

@dt dt closed this as completed Aug 8, 2018
@dt dt moved this from Backlog to Aug 20th Done in Bulk IO Aug 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-disaster-recovery C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-1-blocking-adoption
Projects
None yet
Development

No branches or pull requests

3 participants