Bundle Formats (JSON vs BSON vs Avro vs Parquet) #3

Closed
therne opened this issue Oct 31, 2018 · 2 comments
Labels
discussion (items that need discussion about improvements)

Comments

@therne
Contributor

therne commented Oct 31, 2018

The requirements for a Bundle are as follows.

  • A header is located at the front
  • It must be possible to retrieve the i-th record within the bundle
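
The two requirements above (a header at the front, and retrieval of the i-th record) could be met by a header that stores an offset index. The following is only a minimal sketch under that assumption: the layout, `encodeBundle`, and `recordAt` are hypothetical names for illustration, not the project's actual bundle format.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// Hypothetical layout (an assumption, not the real bundle format):
//   [4 bytes: record count N]
//   [N x 4 bytes: cumulative end offset of each record]
//   [record bytes, concatenated]

// encodeBundle packs the records behind a header that indexes them.
func encodeBundle(records [][]byte) []byte {
	headerLen := 4 + 4*len(records)
	out := make([]byte, headerLen)
	binary.BigEndian.PutUint32(out[0:4], uint32(len(records)))
	end := 0
	for i, r := range records {
		end += len(r)
		// Slot i of the index holds the end offset of record i.
		binary.BigEndian.PutUint32(out[4+4*i:], uint32(end))
		out = append(out, r...)
	}
	return out
}

// recordAt returns the i-th record by consulting only the header index,
// without decoding any of the other records.
func recordAt(bundle []byte, i int) ([]byte, error) {
	if len(bundle) < 4 {
		return nil, errors.New("bundle too short")
	}
	n := int(binary.BigEndian.Uint32(bundle[0:4]))
	headerLen := 4 + 4*n
	if len(bundle) < headerLen {
		return nil, errors.New("truncated header")
	}
	if i < 0 || i >= n {
		return nil, errors.New("index out of range")
	}
	start := 0
	if i > 0 {
		// The previous record's end offset is this record's start.
		start = int(binary.BigEndian.Uint32(bundle[4+4*(i-1):]))
	}
	end := int(binary.BigEndian.Uint32(bundle[4+4*i:]))
	if start > end || headerLen+end > len(bundle) {
		return nil, errors.New("corrupt offsets")
	}
	return bundle[headerLen+start : headerLen+end], nil
}
```

With this kind of index, `recordAt(bundle, i)` costs two header reads regardless of how many records the bundle holds, which is the property the second requirement asks for.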

Row-based JSON (Current)

Pros:

  • Easy to code
  • Easy to debug
  • Supported on every platform

Cons:

  • Heavy (large encoded size)
  • Slow to parse
  • Doesn't support raw byte encoding (binary data has to be encoded as text)
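
To make the raw-byte point concrete: Go's standard `encoding/json` marshals a `[]byte` value as a Base64 string. The helper below is a small illustration only; `jsonSizeOf` is a hypothetical name, not part of the project.

```go
package main

import "encoding/json"

// jsonSizeOf reports how many bytes encoding/json produces for a raw
// binary payload. A []byte is marshaled as a quoted Base64 string.
func jsonSizeOf(payload []byte) (int, error) {
	out, err := json.Marshal(payload)
	if err != nil {
		return 0, err
	}
	return len(out), nil
}
```

For a 30-byte payload this returns 42: 40 Base64 characters plus the two surrounding quotes, so roughly a one-third size overhead before any other JSON structure is counted.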

BSON

Pros:

  • Compact
  • Raw-byte encoding (Binary Protocol!)

Cons:

  • Doesn't support row-based retrieval
  • Hard to debug

Apache Avro

Pros:

  • Row-based Retrieval
  • Fast
  • Compact
  • Enterprise Friendly (Especially Hadoop Infrastructure)

Cons:

  • Isn't it over-engineering?
  • Worse at Column-based Retrieval
    • e.g. aggregating only payload / ANID

Apache Parquet

Pros:

  • Column-based Retrieval
  • Fast
  • Compact
  • Enterprise Friendly (Especially Hadoop Infrastructure)

Cons:

  • Isn't it over-engineering?
  • Worse at Row-based Retrieval
    • e.g. retrieving the record at the i-th index
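
The row-based versus column-based trade-off in the Avro and Parquet lists can be sketched in memory, independent of either library. Everything here is an assumption for illustration: the `Row` type and its `ANID` / `Payload` fields are hypothetical, modeled on the payload / ANID example, and this is not how Avro or Parquet are implemented internally.

```go
package main

// Row is a hypothetical record (row-oriented layout keeps these together).
type Row struct {
	ANID    string
	Payload string
}

// Columns stores the same data column by column.
type Columns struct {
	ANIDs    []string
	Payloads []string
}

// toColumns converts a row-oriented slice into the column-oriented form.
func toColumns(rows []Row) Columns {
	c := Columns{
		ANIDs:    make([]string, 0, len(rows)),
		Payloads: make([]string, 0, len(rows)),
	}
	for _, r := range rows {
		c.ANIDs = append(c.ANIDs, r.ANID)
		c.Payloads = append(c.Payloads, r.Payload)
	}
	return c
}

// rowAt rebuilds the i-th record from a column layout. Every column has
// to be touched, which is why row retrieval is the weaker side here.
func (c Columns) rowAt(i int) Row {
	return Row{ANID: c.ANIDs[i], Payload: c.Payloads[i]}
}

// distinctANIDs aggregates over a single column. In a column layout the
// payloads are never read; a row layout would have to scan whole records.
func distinctANIDs(anids []string) int {
	seen := make(map[string]struct{}, len(anids))
	for _, id := range anids {
		seen[id] = struct{}{}
	}
	return len(seen)
}
```

With a row layout, `rows[i]` is a single lookup, while an aggregate over ANID walks every full record. With the column layout the costs flip, which matches the pros and cons listed for the two formats.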
@therne
Contributor Author

therne commented Oct 31, 2018

Using Avro in Go: https://github.com/actgardner/gogen-avro

@byeongsu-hong
Contributor

I think we can adopt this later, after we first build a working framework with what we already have.

@therne therne added the discussion (items that need discussion about improvements) label Oct 31, 2018
@therne therne added this to the Airbloc Public Alpha milestone Oct 31, 2018
@therne therne removed this from the Airbloc Public Alpha milestone Nov 15, 2018
@therne therne closed this as completed Jan 3, 2019