add_data.sh related Python scripts - flexible data loading #53

drotheram · 2021-03-01T19:36:51Z

jvanulde · 2021-04-27T16:14:12Z

@anthonyfok is this done? If not please move to Sprint 33 Milestone.

anthonyfok · 2021-04-27T20:12:21Z

@jvanulde Thanks for the reminder! Done, and I am in the process of moving other outstanding tasks to Sprint 33 too.

anthonyfok · 2021-05-10T22:00:51Z

Notes

Wed 2021-02-24 Skype meetings

IIRC, this issue was first discussed on Wednesday, 24 February 2021, over Skype meetings with Will and Drew.

add_data.sh is the main script that orchestrates the whole pipeline. It has improved a lot over time, but it used to be extremely brittle: any time anyone changed any little thing, the whole thing would break. So we have been spending time and effort trying to make it more flexible...

opendrr-api/python/add_data.sh

For example: Will wrote the following SQL script to pull in the social vulnerability data:

model-factory/scripts/Create_table_sovi_census_canada.sql

If the upstream CSV file (created by, e.g., Murray or Tiegan) changed, for example if the headers Lon and Lat were renamed to lowercase lon and lat, this whole SQL script would break.
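To illustrate the fragility, here is a minimal Python sketch (not Will's SQL script) of looking columns up by name without regard to case or surrounding whitespace, so a Lon → lon rename no longer breaks the load. The helpers column_index and read_lon_lat are hypothetical names chosen for this example.

```python
import csv
import io


def column_index(header, wanted):
    """Return the position of `wanted` in `header`, ignoring case and
    surrounding whitespace (so "Lon", "lon" and " LON " all match).

    Raises KeyError when the column is absent, so a renamed or dropped
    column fails loudly instead of loading the wrong data.
    """
    lookup = {name.strip().lower(): i for i, name in enumerate(header)}
    try:
        return lookup[wanted.strip().lower()]
    except KeyError:
        raise KeyError(f"column {wanted!r} not found in header {header}") from None


def read_lon_lat(csv_text):
    """Yield (lon, lat) pairs whether the file uses Lon/Lat or lon/lat,
    and regardless of the order of the two columns."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    lon_i = column_index(header, "lon")
    lat_i = column_index(header, "lat")
    for row in reader:
        yield float(row[lon_i]), float(row[lat_i])
```

For example, `list(read_lon_lat("Lon,Lat\n-123.1,49.2\n"))` and `list(read_lon_lat("lat,lon\n49.2,-123.1\n"))` both return `[(-123.1, 49.2)]`.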

model-factory/scripts/Create_risk_dynamics_indicators.sql (where a lot of headers are explicitly defined)
model-factory/scripts/Create_table_shakemap.sql (a version of which had a COPY statement where the header fields are dynamic)
model-factory/scripts/DSRA_runCreateTableShakemap.py (SQL statements in lines 62–67)

So, instead of explicitly defining those header fields (headerFields),
we actually read them in (in some cases) from the CSV itself.

That way, when the CSV files change, we simply load the whole CSV as is, with headerFields generated dynamically.
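A minimal sketch of that idea, assuming a PostgreSQL target: read only the header row of the CSV, then generate the CREATE TABLE and COPY statements from it instead of hard-coding the column list. The function names and the example table name are hypothetical, and typing every column as varchar is an assumption of this sketch, not necessarily what the project's scripts do.

```python
import csv


def quote_ident(name):
    """Quote a column name as a PostgreSQL identifier, doubling any
    embedded double quotes. Quoting also preserves the header's case."""
    return '"' + name.replace('"', '""') + '"'


def read_header_fields(path):
    """Read only the first row of a CSV file and return its column names.

    "utf-8-sig" drops a leading byte-order mark, which spreadsheet
    exports sometimes prepend to the first header name.
    """
    with open(path, newline="", encoding="utf-8-sig") as f:
        return next(csv.reader(f))


def build_statements(table, header_fields):
    """Build CREATE TABLE and COPY statements from a header read at run time.

    Assumption for this sketch: every column is created as varchar and any
    type casting happens in later SQL steps. `table` is trusted input and
    is not quoted here.
    """
    cols = [quote_ident(h) for h in header_fields]
    create = (
        f"CREATE TABLE {table} (\n    "
        + ",\n    ".join(f"{c} varchar" for c in cols)
        + "\n);"
    )
    copy = (
        f"COPY {table} ({', '.join(cols)}) "
        "FROM STDIN WITH (FORMAT csv, HEADER true);"
    )
    return create, copy
```

With psycopg2, the COPY string could be passed to `cursor.copy_expert(copy, csv_file)`; in production code, composing identifiers with `psycopg2.sql.Identifier` is safer than hand-rolled quoting.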

... Some fields are critical...
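Loading headers dynamically still leaves the question of fields the later SQL steps depend on. One way to keep the flexibility while failing fast is to check for required columns before loading. A minimal sketch: check_required_columns is a hypothetical helper, the caller supplies the list of critical fields (the notes do not name them), and the error text reuses the wording of issue #126.

```python
def check_required_columns(header_fields, required):
    """Raise ValueError listing every required column absent from the header.

    Extra columns are allowed (they are loaded as-is); only missing
    critical columns stop the load. Missing names are reported in the
    order they appear in `required`.
    """
    present = set(header_fields)
    missing = [col for col in required if col not in present]
    if missing:
        raise ValueError(f"columns expected but not found: {missing}")
```

For example, `check_required_columns(["asset_id"], ["structural", "asset_id", "contents"])` raises `ValueError: columns expected but not found: ['structural', 'contents']`.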

Mon 2021-05-10 Zoom meeting

About model-factory/scripts/PSRA_copyTables.py

The tables are defined in model-factory/scripts/psra_1.Create_tables.sql.

These Python and SQL scripts are called from opendrr-api/python/add_data.sh like so:

# PSRA_1-8
# PT_LIST holds the province/territory codes to process (defined before this loop runs)
for PT in "${PT_LIST[@]}"
do
  python3 PSRA_runCreate_tables.py --province="${PT}" --sqlScript="psra_1.Create_tables.sql"
  python3 PSRA_copyTables.py --province="${PT}"
  python3 PSRA_sqlWrapper.py --province="${PT}" --sqlScript="psra_2.Create_table_updates.sql"
  python3 PSRA_sqlWrapper.py --province="${PT}" --sqlScript="psra_3.Create_psra_building_all_indicators.sql"
  python3 PSRA_sqlWrapper.py --province="${PT}" --sqlScript="psra_4.Create_psra_sauid_all_indicators.sql"
  python3 PSRA_sqlWrapper.py --province="${PT}" --sqlScript="psra_5.Create_psra_sauid_references_indicators.sql"
done

WIP, have yet to add code to read the header from CSV files. [Eventually] Fixes OpenDRR#53

drotheram assigned anthonyfok and drotheram Mar 1, 2021

drotheram added the Enhancement, Priority: Should Have, Severity: Normal and Task labels Mar 1, 2021

drotheram added this to the Sprint 29 milestone Mar 1, 2021

anthonyfok mentioned this issue Mar 3, 2021

add_data.sh: Preliminary reorganization OpenDRR/opendrr-api#68 (Merged, 8 tasks)

anthonyfok mentioned this issue Mar 19, 2021

[Meta-issue] Optimize pipeline (python/add_data.sh etc.) OpenDRR/opendrr-api#76 (Open, 23 tasks)

jvanulde modified the milestones: Sprint 29, Sprint 31 Apr 8, 2021

anthonyfok modified the milestones: Sprint 31, Sprint 33 Apr 27, 2021

anthonyfok modified the milestones: Sprint 33, Sprint 34 May 7, 2021

anthonyfok changed the title ~~add_data.sh - flexible data loading~~ add_data.sh related Python scripts - flexible data loading May 11, 2021

anthonyfok added a commit to anthonyfok/model-factory that referenced this issue May 14, 2021

Refactor PSRA_copyTables.py to read CSV headers dynamically (d6fe869)


This was referenced May 25, 2021

Pipeline optimization (Sprint 34–36) OpenDRR/opendrr-api#105 (Merged)

columns expected but not found: ['structural', 'contents', 'nonstructural', 'asset_id'] #126 (Open)

jvanulde modified the milestones: Sprint 34, Sprint 35 May 31, 2021

anthonyfok modified the milestones: Sprint 35, Sprint 36 Jun 7, 2021

anthonyfok mentioned this issue Jun 7, 2021

GitHub Actions for CI tests OpenDRR/opendrr-api#113 (Open, 11 tasks)

anthonyfok modified the milestones: Sprint 36, Sprint 37 Jun 17, 2021

drotheram removed this from the Sprint 37 milestone Jul 5, 2021

drotheram added this to the Sprint 38 milestone Jul 5, 2021

jvanulde pinned this issue Jul 7, 2021

anthonyfok modified the milestones: Sprint 38, Sprint 39 Jul 15, 2021

drotheram removed this from the Sprint 39 milestone Sep 13, 2021

jvanulde added this to the Sprint 44 milestone Oct 21, 2021

anthonyfok modified the milestones: Sprint 44, Sprint 45 Oct 21, 2021

drotheram removed the Priority: Should Have label Dec 20, 2021

drotheram removed this from the Sprint 45 milestone Jan 17, 2022
