This repository has been archived by the owner on Sep 28, 2022. It is now read-only.

Develop CSV importer #4

Open
alanbernstein opened this issue Mar 9, 2017 · 4 comments

Comments

@alanbernstein
Contributor

For the use case work, we put together a CSV import system that is specific to the two use cases, but lays some groundwork for working with more general data sources. The scope is limited to well-formatted, well-defined tabular data, so users will be responsible for providing clean data.

@bruth

bruth commented Aug 12, 2018

Mind sharing those use cases and how a CSV file would map to the structure of an index?

@alanbernstein
Contributor Author

The mapping for relational data is outlined in our docs at https://www.pilosa.com/docs/latest/data-model/#relational-analogy, and we have a few use case writeups at https://www.pilosa.com/use-cases/. I believe the two referenced in this ticket are transportation and network traffic. Note that these pages are overdue for some updates; you can see up-to-date PDK use case code in the repo: https://github.com/pilosa/pdk/tree/master/usecase.
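
To make the relational analogy concrete, here is a minimal sketch of how a CSV file could map onto an index: each record becomes a column, and each distinct value of a field becomes a row in that field. The sample data, the field names, and the helpers `csv_to_bits` and `group_by_row` are hypothetical and are not taken from the PDK code; the sketch also assumes the CSV already carries an integer record ID.

```python
import csv
import io
from collections import defaultdict


def csv_to_bits(text, id_column, fields):
    """Turn CSV records into (field, row, column) triples.

    Assumption: one record maps to one column, and each distinct
    value of a field maps to one row within that field.
    """
    bits = []
    for record in csv.DictReader(io.StringIO(text)):
        column_id = int(record[id_column])
        for field in fields:
            bits.append((field, record[field], column_id))
    return bits


def group_by_row(bits):
    """Collect, for every (field, row) pair, the set of columns that are set."""
    rows = defaultdict(set)
    for field, row, column in bits:
        rows[(field, row)].add(column)
    return dict(rows)


# Hypothetical sample data, loosely modeled on a transportation use case.
SAMPLE = """ride_id,cab_type,passenger_count
0,yellow,1
1,green,2
2,yellow,2
"""

bits = csv_to_bits(SAMPLE, "ride_id", ["cab_type", "passenger_count"])
rows = group_by_row(bits)
print(sorted(rows[("cab_type", "yellow")]))  # [0, 2]
```

Under this layout, a question like "which rides used a yellow cab with two passengers" becomes an intersection of two rows' column sets.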

@bruth

bruth commented Aug 13, 2018

Thanks. I found the table in the Python notebook you put together helpful, as well as the suggestions for binning strategies. Is the general recommendation that row IDs be contiguous, to optimize the bitmap compression (via Roaring)? And is this handled automatically if a field is created with key support?

@jaffee
Member

jaffee commented Aug 24, 2018

@bruth it isn't as crucial that row IDs be contiguous, but column IDs should be as close to contiguous as possible. This is handled for you if you use keys.
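
As a rough illustration of why contiguous column IDs help, the sketch below assigns dense integer IDs to string keys in order of first appearance, then counts how many Roaring-style containers a set of IDs would touch. It assumes the standard Roaring layout, in which each container covers 2^16 consecutive values. The `KeyTranslator` class and `containers_touched` function are hypothetical names used only for illustration; this is not Pilosa's actual key translation code.

```python
class KeyTranslator:
    """Assign contiguous integer IDs to keys in order of first appearance."""

    def __init__(self):
        self._ids = {}

    def translate(self, key):
        # A key seen before keeps its ID; a new key gets the next free one.
        if key not in self._ids:
            self._ids[key] = len(self._ids)
        return self._ids[key]


def containers_touched(ids, container_bits=16):
    """Count distinct containers, assuming each spans 2**container_bits values."""
    return len({i >> container_bits for i in ids})


# Hypothetical record keys, with one repeat.
keys = ["user-9f3a", "user-02bc", "user-77d1", "user-9f3a"]
translator = KeyTranslator()
dense = [translator.translate(k) for k in keys]

# Hypothetical externally assigned IDs that are far apart.
sparse = [1_000_000 * n for n in range(3)]

print(dense)                       # [0, 1, 2, 0]
print(containers_touched(dense))   # 1
print(containers_touched(sparse))  # 3
```

The dense IDs all land in a single container, while the widely spaced IDs each need their own, which is the kind of overhead that contiguous column IDs avoid.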
