Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle file names with special characters #25

Merged
merged 2 commits into from
Feb 12, 2024
Merged

Conversation

tomer-amir
Copy link
Contributor

What

Use the Golang built-in CSV library to escape special characters in file names automatically

Why

Some customers (like EA, Shell, and Included Health) have files with special characters like \n or \r in their names.
Currently, since we're not really writing a CSV this breaks the file format.

Example:

Say the customer has a file called weird\nname.txt.
Right now, the output file will be:

weird
name.txt <blob_sha> true

using the CSV library this becomes:

"weird
name.txt" <blob_sha> true

if we read this using a CSV reader, it should be able to handle this correctly (tested in C#)

Same for the input file list.

How

Use CSV readers and writers on both ends instead of self-parsing the CSV

@tomer-amir tomer-amir requested review from ariel-levy and a team February 8, 2024 22:11
Comment on lines +339 to +342
err = csvWriter.Write([]string{"Path", "BlobId", "IsFile"})
if err != nil {
return 0, fmt.Errorf("failed to write file headers '%v': %v", optionalIndexFilePath, err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we write headers previously also?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, but this helps in C# to deserialize the data

git/git_test.go Outdated
@@ -510,7 +510,7 @@ func (gitSuite *gitTestSuite) TestSnapshotWithIndexPath() {
})

gitSuite.Nil(err)
gitSuite.verifyIndexFile(40, indexFilePath)
gitSuite.verifyIndexFile(41, indexFilePath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it 41 cause you added a file with special characters in the test repo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope. because of the headers line...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a good idea to add a test for it then

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant a test for the special characters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this requires adding these files to dc-heacth github.com/apiirolab (the repo that the test is running on).
I don't have permissions to, and I am not sure what this could break...
we can also get back to this later.

return fmt.Errorf("failed to read paths file from location: '%v', error: '%v'", opts.PathsFileLocation, err)
}
for i := range lines {
provider.fileListToSnap[lines[i][0]] = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we need some validation that lines[i] isn't empty? or do we trust the flow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I am writing this as a CSV from C#, and I didn't see any way that this broke when I tested it...
Do you have a specific case in mind you want me to test?

@TalfinApiiro
Copy link
Contributor

LGTM, but do you want to wait for Ariel?

@tomer-amir tomer-amir merged commit ae22951 into main Feb 12, 2024
@tomer-amir tomer-amir deleted the tomer/escape-file-names branch February 12, 2024 07:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants