Improve performance of daily mongodb loads #6

Open
Ameya05 opened this issue Dec 9, 2017 · 1 comment

Ameya05 commented Dec 9, 2017

The current mongodb implementation does the following every day:

  1. Uses the existing SQLite intermediate databases for all stations.
  2. Drops existing mongodb database.
  3. Computes data states for all stations and all dates (from 01/01/1994 to today).
  4. Inserts these values into a new mongodb database.

The current process takes upwards of 24 hours to compute data states for approximately 10,000 GPS stations. Recomputing historical data states every day is redundant and can be avoided.

A simple performance improvement would be to compute only the end_date's data state and append it to the mongodb collection every day, instead of dropping the database and recreating it from scratch.
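
A rough sketch of what such a delta load could look like, not the actual script. It assumes `collection` is a pymongo `Collection` (only its `update_one` method is used), and the field names `station_id`, `date` and `state`, as well as the helper `compute_data_state`, are hypothetical placeholders for whatever the real schema and SQLite lookup are.

```python
from datetime import date


def compute_data_state(station_id, day):
    """Hypothetical stand-in for the real per-day computation that reads
    the station's SQLite intermediate database."""
    return {"station_id": station_id, "date": day.isoformat(), "state": "unknown"}


def load_delta(collection, station_ids, end_date, compute=compute_data_state):
    """Write one data-state document per station for end_date only.

    Returns the number of stations processed.
    """
    processed = 0
    for station_id in station_ids:
        doc = compute(station_id, end_date)
        # Upsert keyed on (station_id, date): re-running the load for the
        # same day overwrites the document instead of duplicating it.
        collection.update_one(
            {"station_id": doc["station_id"], "date": doc["date"]},
            {"$set": doc},
            upsert=True,
        )
        processed += 1
    return processed
```

Using an upsert rather than a plain insert is a design choice here: it keeps the daily job safe to re-run if it fails partway through.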

@Ameya05 Ameya05 self-assigned this Dec 9, 2017
Ameya05 added a commit that referenced this issue Dec 9, 2017
…emoved historical data state calculation. Commented code which wouldn't be required to be run daily e.g. Meta data inserts, db drop, index computation

Ameya05 commented Dec 9, 2017

To-Do :

  1. Check whether the database exists. If it does not, run the old create_mongodb.py script, which processes all stations and computes historical data states.
  2. For each station, check whether station data is present. If it is not, the station is new, so insert the station's metadata into the other collections and compute this station's historical data states.
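
The two checks above could be sketched roughly as follows. This assumes `client` is a pymongo `MongoClient` (using `list_database_names` and `find_one`); the collection name `data_states` and the `station_id` field are hypothetical, not taken from the actual scripts.

```python
def plan_load(client, db_name, station_ids, collection_name="data_states"):
    """Decide between a full rebuild and a delta load.

    Returns a dict with the chosen mode and, for a delta load, which
    stations are new (need metadata plus historical data states) and
    which already exist (need only the latest day's data state).
    """
    # Check 1: no database yet, so fall back to the full historical load.
    if db_name not in client.list_database_names():
        return {"mode": "full", "new_stations": [], "existing_stations": []}

    collection = client[db_name][collection_name]
    new_stations, existing_stations = [], []
    for station_id in station_ids:
        # Check 2: a station with no documents is seen for the first time.
        found = collection.find_one({"station_id": station_id}, {"_id": 1})
        if found is None:
            new_stations.append(station_id)
        else:
            existing_stations.append(station_id)

    return {
        "mode": "delta",
        "new_stations": new_stations,
        "existing_stations": existing_stations,
    }
```

With roughly 10,000 stations, one `find_one` per station is probably acceptable if `station_id` is indexed; a single `distinct("station_id")` call would be an alternative worth measuring.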

Ameya05 added a commit that referenced this issue Dec 12, 2017
Reverted create_mongodb.py to its previous state to perform entire historical date computation and inserts from scratch.
The mongodb_load_delta.py script now does the following additional checks:
1. Checks if a station is encountered for the first time.
2. Refreshes the meta_network collection to reflect the latest load metadata.