This repository has been archived by the owner on Jan 18, 2024. It is now read-only.

Optimize Data Store for performance #31

Open
schnuerle opened this issue Apr 23, 2018 · 3 comments
Labels
Help Wanted: Good items to start with if you are looking to help with the project
In Progress: Currently being worked on
Phase 1 RDS: End to end data processor with hooks and alarms
v3.0

Comments

@schnuerle
Contributor

Once the RDS is complete in Phase 1, we'd like to optimize it for performance. If you have ideas on how to do this with indexes or other methods, please make a pull request adding support for it, with examples, to this repo so everyone can benefit. Use this issue to discuss and collaborate.

@schnuerle schnuerle created this issue from a note in Phase 1 - Raw Waze to RDS (Data Store) Apr 23, 2018
@schnuerle schnuerle added Help Wanted and Phase 1 RDS labels Apr 23, 2018
@jrstill
Contributor

jrstill commented Apr 24, 2018

As discussed in #25, there are some pretty big performance gains to be had on insert-heavy workloads (such as reprocessing lots of old files) if the foreign keys (FKs) are removed. The more files in the queue, the greater the impact, because lock contention gets worse as the volume grows. Just noting all of that here so we can review it later.

@schnuerle
Contributor Author

I've got some scripts to remove all the FKs and others to add them all back in. When backloading data, I remove the FKs first, which makes loading more than twice as fast, and then add them back in at the end.
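The scripts themselves aren't posted in this thread. As a hypothetical sketch, assuming a PostgreSQL engine on RDS and made-up table and constraint names (`waze.jams`, `waze.alerts`, `waze.data_files`, `jams_datafile_fk`, `alerts_datafile_fk`), the remove and add-back scripts could be generated from one list of constraint definitions so the two never drift apart:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignKey:
    """One FK constraint definition. All names used below are hypothetical."""
    table: str       # referencing (child) table
    name: str        # constraint name
    column: str      # referencing column
    ref_table: str   # referenced (parent) table
    ref_column: str  # referenced column


def drop_fk_sql(fks):
    """Statements to run before a bulk load (PostgreSQL syntax assumed)."""
    return [f"ALTER TABLE {fk.table} DROP CONSTRAINT IF EXISTS {fk.name};" for fk in fks]


def add_fk_sql(fks):
    """Statements to run after the load. PostgreSQL checks every existing row
    when a FK is added this way, so orphaned rows make the statement fail."""
    return [
        f"ALTER TABLE {fk.table} ADD CONSTRAINT {fk.name} "
        f"FOREIGN KEY ({fk.column}) REFERENCES {fk.ref_table} ({fk.ref_column});"
        for fk in fks
    ]


# Hypothetical schema for illustration only; not the repository's actual tables.
FKS = [
    ForeignKey("waze.jams", "jams_datafile_fk", "datafile_id", "waze.data_files", "id"),
    ForeignKey("waze.alerts", "alerts_datafile_fk", "datafile_id", "waze.data_files", "id"),
]

if __name__ == "__main__":
    for stmt in drop_fk_sql(FKS):
        print(stmt)
    print("-- ... bulk load goes here ...")
    for stmt in add_fk_sql(FKS):
        print(stmt)
```

Because adding a FK back validates the rows already in the table, any orphaned rows that slipped in while the constraints were off have to be cleaned up before the add-back step will succeed.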

Once all the data is backloaded (I'm doing it in one-month chunks now), I'd like to post scripts for creating indexes, which really speed up queries used for analysis.

I'm writing and documenting these now, and will make a PR to add the docs and scripts to the repo when done.
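The index scripts aren't included in this thread, so the sketch below is hypothetical. It assumes PostgreSQL 9.5+ (for `CREATE INDEX IF NOT EXISTS`) and made-up table and column names (`waze.jams`, `pub_utc_date`), and shows the plan described above: split the backload range into calendar-month chunks, then emit index DDL once loading is finished.

```python
from datetime import date


def month_chunks(start, end):
    """Yield half-open (chunk_start, chunk_end) ranges, one per calendar month,
    covering [start, end). The first and last chunks may be partial months."""
    current = start
    while current < end:
        # First day of the following month, rolling the year over after December.
        if current.month == 12:
            nxt = date(current.year + 1, 1, 1)
        else:
            nxt = date(current.year, current.month + 1, 1)
        yield current, min(nxt, end)
        current = nxt


def create_index_sql(table, columns):
    """Re-runnable CREATE INDEX statement (PostgreSQL 9.5+ syntax assumed).
    The index name is derived from the table and column names."""
    name = "idx_" + table.replace(".", "_") + "_" + "_".join(columns)
    return f"CREATE INDEX IF NOT EXISTS {name} ON {table} ({', '.join(columns)});"


if __name__ == "__main__":
    # Hypothetical backload window and table/column names.
    for lo, hi in month_chunks(date(2017, 11, 1), date(2018, 2, 1)):
        print(f"-- load files published in [{lo}, {hi})")
    print(create_index_sql("waze.jams", ["pub_utc_date"]))
```

Building indexes after the backload, rather than before it, avoids updating them row by row during the bulk insert; this is common bulk-loading practice for PostgreSQL, not something confirmed in this thread.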

@schnuerle
Contributor Author

Working on this here: #41

@schnuerle schnuerle self-assigned this Jul 13, 2018
@schnuerle schnuerle added the v3.0 label Jul 13, 2018
@schnuerle schnuerle added Multi Cloud and In Progress and removed Multi Cloud labels Dec 28, 2018