
extend export command to show tombstone + change output format to CSV #610

Closed
wants to merge 1 commit

Conversation

brstgt (Contributor) commented Dec 22, 2017

No description provided.

if version == storage.Version1 {
	size = n.Size
}
// manually quoted CSV output, one line per needle record
fmt.Printf("\"%s\",\"%s\",%d,%t,%s,%s,%s,%t\n",
chrislusf (Collaborator) commented:

On second thought, tab-separated values will be easier to parse than dealing with escapes.

brstgt (Contributor, Author) commented Dec 22, 2017 via email

brstgt (Contributor, Author) commented Dec 22, 2017 via email

chrislusf (Collaborator) commented:
Maybe we shouldn't over-engineer this. Tab should be fine; I don't think people will try to access files via HTTP with a tab in the URL.

brstgt (Contributor, Author) commented Dec 23, 2017

I thought about that for a while. To be honest, my very first approach was a bit naive.
I don't consider this over-engineering. Especially when this tool is used for repair purposes, it has to be robust. Not quoting properly can even create security holes, as known from SQL injection. It's about what people can do, not what they normally do.
I see this tool as an intermediate step towards a proper repair mechanism. Looking a bit into the future, I'd also offer this export functionality as a REST endpoint that streams volume records, which can then be aggregated and compared by a repair coordinator. From my point of view, it would be consistent to have the same output format for both the API endpoint and the CLI tool.

With that in mind, the question is which format is best suited for that.

Regular JSON:

  • It's not well suited for streaming, so I wouldn't consider it an option

Line separated JSON per record:

  • Can be streamed well; Elasticsearch uses it for batch processing
  • JSON is the heart of the web, so it is well suited for APIs
  • Seaweed speaks JSON in all its other endpoints
  • Downside: field names are transferred with every record

TSV/CSV

  • Not typically used in APIs
  • Smaller in size than JSON

With respect to the repair process, do you have any thoughts or opinions on all that?
Of course we can discuss nits like tabs vs. commas, but in the end we have to be on the same page about the long-term goal. Once we have that, it's easier to talk about the small steps.
If you prefer, we can also continue this discussion in a different place (an issue or the forum).

chrislusf (Collaborator) commented Dec 23, 2017 via email

brstgt (Contributor, Author) commented Dec 23, 2017 via email

chrislusf (Collaborator) commented Dec 23, 2017 via email

ingardm (Contributor) commented Feb 8, 2018

Any update on this?

brstgt (Contributor, Author) commented Feb 8, 2018

To be honest: we are thinking about moving from Seaweed to Ceph, as I don't see a strong community here. So, probably not from our side.

ingardm (Contributor) commented Feb 8, 2018

We're just getting started with Seaweed. We already have both Gluster and Ceph clusters running, but for small files this is the best we've found for our use case.

chrislusf added a commit that referenced this pull request Jul 15, 2018
chrislusf (Collaborator) commented:

merged via 3edfe1d

chrislusf closed this Jul 15, 2018