[PULL REQUEST] add in self employment to employment module#217
Conversation
There was a problem hiding this comment.
Pull request overview
Adds self-employment as a new jobs category in the employment estimates workflow by sourcing ACS B24080 self-employment data at the block-group level, allocating it to MGRAs via a new blockgroup→MGRA crosswalk, and incorporating the results into the controlled employment outputs.
Changes:
- Adds new SQL extracts for ACS self-employment (regional control + blockgroup detail) and a blockgroup→MGRA allocation crosswalk.
- Updates the employment pipeline to ingest, aggregate, and append self-employment jobs into the MGRA×NAICS jobs matrix.
- Updates fallback behavior and test constants to account for the new self-employment NAICS category.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| sql/employment/xref_blockgroup_to_mgra.sql | New SQL to build blockgroup→MGRA allocation weights for self-employment aggregation. |
| sql/employment/get_self_emp_bg_data.sql | New SQL to pull ACS 5-year B24080 self-employment values by blockgroup. |
| sql/employment/get_regional_self_emp.sql | New SQL to pull a regional self-employment control total (ACS 1-year, with 2020 exception). |
| python/utils.py | Extends year-lookback trigger messages to include ACS 1-year missing-table message. |
| python/tests.py | Updates expected distinct NAICS code count to include the new self-employment category. |
| python/employment.py | Adds self-employment aggregation to MGRA and appends it into the employment pipeline. |
Comments suppressed due to low confidence (1)
python/utils.py:722
read_sql_query_fallback()now treats 'ACS 1-Year Table does not exist' as a lookback trigger, but the function docstring still describes only the older set of messages. Update the docstring to include the new 1-year ACS message so callers know what SQL scripts must return (and keep the message text in sync with the SQL).
# Messages that trigger year lookback
lookback_messages = [
"ACS 5-Year Table does not exist",
"ACS 1-Year Table does not exist",
"LODES data does not exist",
"EDD point-level data does not exist",
"QCEW data does not exist",
]
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@bryce-sandag please do a test run in |
Eric-Liu-SANDAG
left a comment
There was a problem hiding this comment.
Comments should follow standard grammar rules, notably capitalization of the first letter in a sentence
|
|
Eric-Liu-SANDAG
left a comment
There was a problem hiding this comment.
Okay like this is a really wild bit of feedback, but I think the [naics_code] column should be updated. With the inclusion of SE, it's not a NAICS code anymore. Additionally, there really really should be some metadata attached to the NAICS codes, most notably the NAICS year. So basically, I think there should be some kind of dimension table we refer to (maybe this should be added in [demographic_warehouse].[dim].[employment_type]) instead of the raw "NAICS" code
@GregorSchroeder what do you think about this?
I would be open to renaming everything to |
Made change to switch to industry_code. @GregorSchroeder can you update these two tables, noted below, in production database after changing |
…s to _create_jobs_output
GregorSchroeder
left a comment
There was a problem hiding this comment.
-
Are we using the
xref_block_to_mgra.sqlfile anymore? I think we can delete it and replace with theedd_land_use_split.sql -
Suggest we align our SQL file naming conventions with the rest of the project and pass these names through to the Python as well for consistency.
edd_land_use_split.sql -> get_xref_block_to_mgra.sql
get_self_emp_bg_data.sql -> get_B24080.sql
QCEW_control.sql -> get_region_qcew.sql
- Unresolved a comment from the last PR and added feedback.
GregorSchroeder
left a comment
There was a problem hiding this comment.
Looks good. I have passed field renaming of [naics_code] to [industry_code] through to the production database.
The following suggestions are very nit-picky style things that are optional. I missed some of these last reviews and/or gave suggestions that weren't in perfect alignment.
- Would swap use of word "regional" for "region" in the SQL file names of
get_regional_qcew.sqlandget_regional_self_emp.sql. This aligns with the rest of the SQL files in the project and the geography naming convention we use. - I mistakenly suggested the name
get_xref_block_to_mgra.sqlwhen it really should just bexref_block_to_mgra.sqlto align with the rest of the project. - I would alter the following key names for the
jobs_inputsdictionary in_get_jobs_inputs()to match the SQL file names for traceability.jobs_inputs["self_emp_bg"]->jobs_inputs["B24080"]- I'd be ok with renaming the SQL file
xref_blockgroup_to_mgra.sqltoxref_bg_to_mgra.sqlto match thejobs_inputs["xref_bg_to_mgra"]
tried resolving but these changes did not seem to make it
Describe this pull request. What changes are being made?
This pull request add in the the category of self employment to the employment estimates module. This is done using publicly available data from the ACS at the blockgroup and aggregating self employment jobs down to the MGRA level.
What issues does this pull request address?
close #200
Additional context
for further detail about methodology check issue #200
There is a test run with results in
wsdatabase underrun_id = 4