
FVSAggregate and ConditionVariantLookup THE SINGLE CSV WAY

Matthew Perry edited this page Jun 27, 2013 · 1 revision
FVSAggregate and ConditionVariantLookup (THE SINGLE-CSV WAY for dev machines)
  • Run fvsbatch

IF schema / parsing has changed

1. Manually run extract.py on a single untarred .bz directory.
2. Copy the model definition.
3. Run schemamigration.
4. Run migrate.
5. Alter the postgres COPY statement in import_data to match the schema.

ELSE

  • merge the csvs with sed (see below)
cd /usr/local/data/out
# copy the header from the first file
sed -n 1p "$(ls var*.csv | head -n 1)" > merged.csv
# append everything but the header line from every file
for i in var*.csv; do sed 1d "$i" >> merged.csv; done
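The sed merge above can also be sketched in Python, which may be handier when the step is scripted alongside the Django commands. This is a minimal sketch, not part of the existing tooling: the function name `merge_csvs` is hypothetical, it assumes Python 3, and it assumes every input file carries the same single header line.

```python
import glob
import os


def merge_csvs(pattern, out_path):
    """Concatenate the CSVs matching `pattern` into `out_path`.

    Only the first file's header line is kept. Returns the number of
    input files merged. (Hypothetical helper, assumes identical headers.)
    """
    out_abs = os.path.abspath(out_path)
    # Sort for a stable order and never read the output file as an input.
    paths = sorted(p for p in glob.glob(pattern)
                   if os.path.abspath(p) != out_abs)
    if not paths:
        raise ValueError("no input files match %r" % pattern)

    def with_newline(line):
        # Guard against a last line that lacks a trailing newline.
        return line if line.endswith("\n") else line + "\n"

    with open(out_path, "w") as out:
        for index, path in enumerate(paths):
            with open(path) as src:
                header = src.readline()
                if index == 0:
                    out.write(with_newline(header))
                for line in src:
                    out.write(with_newline(line))
    return len(paths)
```

Called as `merge_csvs("/usr/local/data/out/var*.csv", "/usr/local/data/out/merged.csv")`, the return value should equal the file count reported by `ls -1 var*.csv | wc -l`.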
  • make sure number of rows matches expected
ls -1 var*.csv | wc -l  # 402 files
wc -l < "$(ls var*.csv | head -n 1)"  # expect 127; minus header => 126 data rows
wc -l < merged.csv  # 402 * 126 + 1 = 50653
  • make sure number of columns matches expected
# count the commas in the merged file
fgrep -o , merged.csv | wc -l  # 3241792
# should be exactly divisible by the number of rows (e.g. 3241792 % 50653 == 0)
# 3241792 / 50653.0 == 64.0 == (number of expected columns - 1)
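The row and column checks above can be combined into one script. The sketch below is an assumption-laden illustration, not existing project code: `check_merged` is a hypothetical name, it assumes Python 3, and it assumes every input file contributes the same number of data rows. Unlike the comma count, it parses the file with the `csv` module, so it checks every row's width individually and is not thrown off by commas inside quoted fields.

```python
import csv


def check_merged(path, n_files, data_rows_per_file):
    """Validate a merged CSV and return (row_count, column_count).

    Raises ValueError if the row count is not
    n_files * data_rows_per_file + 1 (the +1 is the header), or if
    the rows do not all have the same number of columns.
    """
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle))

    expected_rows = n_files * data_rows_per_file + 1
    if len(rows) != expected_rows:
        raise ValueError("expected %d rows, found %d"
                         % (expected_rows, len(rows)))

    # Every row, header included, should have the same width.
    widths = set(len(row) for row in rows)
    if len(widths) != 1:
        raise ValueError("inconsistent column counts: %s" % sorted(widths))

    return len(rows), widths.pop()
```

With the figures quoted above, `check_merged("merged.csv", 402, 126)` should return `(50653, 65)` when the file is intact, since 64 commas per row implies 65 columns.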
  • TODO inspect & QC pre DB

  • copy csv to lot/fixtures/downloads/fvsaggregate.csv

  • delete old records

sudo su postgres
psql -d forestplanner
    
DELETE FROM trees_fvsaggregate;
DELETE FROM trees_conditionvariantlookup;
  • run manage.py import_data

  • after data is imported, datamigration is likely necessary:

whenever fake_scenariostands changes:

# Step 1: python manage.py clear_cache

import random

from trees.models import Scenario, Stand, ScenarioNotRunnable
from trees.utils import fake_scenariostands

# Step 2: rerun impute_nearest_neighbor task on all existing stands
for stand in Stand.objects.all():
    stand.cond_id = None
    stand.save()

# THIS SHOULD NOT BE NECESSARY, but it is a quick way to assign cond_ids
# to stands for which nearest neighbor failed. Requires `import random`.
for stand in Stand.objects.all():
    if not stand.cond_id:
        stand.cond_id = random.choice([27707, 29413, 7224])
        stand.save()

# Step 3: rebuild the scenario stands and rerun every scenario (this is slow)
for scenario in Scenario.objects.all():
    print("-" * 80)
    print("%s %s" % (scenario, scenario.name))
    try:
        fake_scenariostands(scenario)
        scenario.run()
    except ScenarioNotRunnable:
        pass

  • TODO inspect & QC after loading in database

TODO create/backup/distribute fixtures & move fvsaggregate.csv up to ninkasi (zipped?)

profile queries and optimize indices