Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rename _convert to maybe_encode #64

Merged
merged 3 commits into from
Jul 7, 2020
Merged

rename _convert to maybe_encode #64

merged 3 commits into from
Jul 7, 2020

Conversation

johnnychen94
Copy link
Member

@johnnychen94 johnnychen94 commented Jul 6, 2020

This is a sub PR of #61

Changes:

  • rename _convert to maybe_encode
  • add tests for maybe_encode
  • more patches to ignore CRLF/LF differences

I'm unclear what should maybe_encode(format"TXT", df::DataFrame) and maybe_encode(format"SHA256", df::DataFrame) behave. The current behavior is maybe_encode(fmt, string(df))

Converting it using string can be prune to show changes. Hence I suggest we do something like

function maybe_encode(::Type{DataFormat{:TXT}}, df::DataFrame)
    save("tmp.csv", df)
    _ignore_crlf(read(String, "tmp.csv"))
end

function maybe_encode(::Type{DataFormat{:SHA256}}, df::DataFrame)
    save("tmp.csv", df)
    bytes2hex(sha256(read("tmp.csv")))
end

@oxinabox if you like this idea, I can make a PR for this change.

* add tests for maybe_encode
* ignore CRLF/LF differences
test/runtests.jl Outdated Show resolved Hide resolved
It has no practical usage except that it's conceptually right.
@@ -24,7 +24,7 @@ function savefile(file::TextFile, content)
write(file.filename, string(content))
end

function query_extended(filename)
function query_extended(filename::AbstractString)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function expects only string types splitext, I added this to make sure nothing strange is passed here as a safety net.

src/fileio.jl Outdated Show resolved Hide resolved
src/fileio.jl Outdated
# Helpers
_join(x::AbstractArray{<:AbstractString}) = mapreduce(_ignore_crlf, (x,y)->x*"\n"*y, x)
_sha256(x) = bytes2hex(sha256(x))
_ignore_crlf(x::AbstractString) = replace(x, "\r"=>"")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just say we are ignoring \r ?

Suggested change
_ignore_crlf(x::AbstractString) = replace(x, "\r"=>"")
_ignore_linefeed(x::AbstractString) = replace(x, "\r"=>"")

Copy link
Member Author

@johnnychen94 johnnychen94 Jul 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you're suggesting Carriage Return (\r) instead of linefeed (\n)?

I renamed to _ignore_CR with additional docstring.

Co-authored-by: Lyndon White <oxinabox@ucc.asn.au>
@johnnychen94
Copy link
Member Author

johnnychen94 commented Jul 7, 2020

I'm merging this to avoid possible conflict and save myself some efforts from rebasing.

If there're any further comments, I'll address them in future PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants