FVSAggregate and ConditionVariantLookup THE SINGLE CSV WAY
Matthew Perry edited this page Jun 27, 2013 · 1 revision
- Run fvsbatch
IF schema / parsing has changed
1. manually run extract.py on a single untarred .bz directory
2. copy the model definition
3. schemamigration
4. migrate
5. alter the postgres COPY in import_data to match the schema
ELSE
- merge the csvs with sed (see below)
cd /usr/local/data/out
# copy the header from the first file
sed -n 1p `ls var*csv | head -n 1` > merged.csv
# copy all but the first line from every file
for i in var*.csv; do sed 1d "$i" >> merged.csv ; done
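For reference, the same merge can be sketched in Python. This is only an illustrative equivalent of the sed commands above, not part of the existing tooling; `merge_csvs` is a hypothetical helper name, and it assumes every input file starts with the same one-line header.

```python
import glob
import os


def merge_csvs(pattern, out_path):
    """Concatenate the CSVs matching `pattern`, keeping only the first header.

    Hypothetical helper: mirrors `sed -n 1p` on the first file followed by
    `sed 1d` on every file. Returns the number of input files merged.
    """
    out_abs = os.path.abspath(out_path)
    # Sort for a deterministic order and never read the output file as input.
    paths = [p for p in sorted(glob.glob(pattern))
             if os.path.abspath(p) != out_abs]
    if not paths:
        raise ValueError("no input files match %r" % pattern)
    with open(out_path, "w") as out:
        for index, path in enumerate(paths):
            with open(path) as src:
                header = src.readline()
                if index == 0:
                    out.write(header)  # header is written once only
                for line in src:
                    # Guard against a final line with no trailing newline,
                    # which would otherwise fuse with the next file's rows.
                    out.write(line if line.endswith("\n") else line + "\n")
    return len(paths)
```

Usage would be something like `merge_csvs("/usr/local/data/out/var*.csv", "/usr/local/data/out/merged.csv")`.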
- make sure number of rows matches expected
ls -1 var*csv | wc -l # 402
cat `ls var*csv | head -n 1` | wc -l # expect 127; minus header => 126 data rows
cat merged.csv | wc -l # 402 * 126 + 1 = 50653
- make sure number of columns matches expected
# find number of commas
fgrep -o , merged.csv | wc -l # 3241792
# should be exactly divisible by number of rows (e.g. 3241792 % 50653 == 0)
# 3241792 / 50653.0 == 64.0 == (number of expected columns - 1)
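The total comma count can still divide evenly if one short row is offset by a long one, and it assumes no field contains a quoted comma. A per-row check is a bit stricter. The sketch below is a standalone Python 3 script, not part of the project; `check_merged_shape` is a hypothetical name, and the expected values (402 files, 126 data rows each, 65 columns) are the ones from the notes above.

```python
import csv


def check_merged_shape(path, n_files, rows_per_file, n_columns):
    """Return (row_count, problems) for a merged CSV with one header line.

    Hypothetical helper: counts CSV records (header included) and checks
    that every record has exactly `n_columns` fields.
    """
    expected_rows = n_files * rows_per_file + 1  # data rows plus the header
    problems = []
    row_count = 0
    with open(path, newline="") as handle:
        for row_count, row in enumerate(csv.reader(handle), start=1):
            if len(row) != n_columns:
                problems.append("record %d has %d fields, expected %d"
                                % (row_count, len(row), n_columns))
    if row_count != expected_rows:
        problems.append("found %d records, expected %d"
                        % (row_count, expected_rows))
    return row_count, problems
```

With the numbers above, the call would be `check_merged_shape("merged.csv", 402, 126, 65)`; an empty `problems` list means both the row and column counts match.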
- TODO inspect & QC pre DB
- copy csv to
lot/fixtures/downloads/fvsaggregate.csv
- delete old records
sudo su postgres
psql -d forestplanner
DELETE FROM trees_fvsaggregate;
DELETE FROM trees_conditionvariantlookup;
- run
manage.py import_data
- after data is imported, datamigration is likely necessary
whenever fake_scenariostands changes:
# Step 1: python manage.py clear_cache
import random

from trees.models import Scenario, Stand, ScenarioNotRunnable
from trees.utils import fake_scenariostands

# Step 2: rerun the impute_nearest_neighbor task on all existing stands
for stand in Stand.objects.all():
    stand.cond_id = None
    stand.save()

# THIS SHOULD NOT BE NECESSARY, but it is a quick way to assign cond_ids
# to those stands for which nearest neighbor failed.
for stand in Stand.objects.all():
    if not stand.cond_id:
        stand.cond_id = random.choice([27707, 29413, 7224])
        stand.save()

# Step 3... wait for it....
for scenario in Scenario.objects.all():
    print "-" * 80
    print scenario, scenario.name
    try:
        fake_scenariostands(scenario)
        scenario.run()
    except ScenarioNotRunnable:
        pass
- TODO inspect & QC after loading in database
- TODO create/backup/distribute fixtures & move fvsaggregate.csv up to ninkasi (zipped?)
- profile queries and optimize indices