Skip to content

[PULL REQUEST] add in self employment to employment module#217

Merged
bryce-sandag merged 11 commits into
mainfrom
add-estimates-of-self-employed-individuals
May 5, 2026
Merged

[PULL REQUEST] add in self employment to employment module#217
bryce-sandag merged 11 commits into
mainfrom
add-estimates-of-self-employed-individuals

Conversation

@bryce-sandag
Copy link
Copy Markdown
Contributor

@bryce-sandag bryce-sandag commented Apr 14, 2026

Describe this pull request. What changes are being made?

This pull request add in the the category of self employment to the employment estimates module. This is done using publicly available data from the ACS at the blockgroup and aggregating self employment jobs down to the MGRA level.

What issues does this pull request address?

close #200

Additional context

for further detail about methodology check issue #200
There is a test run with results in ws database under run_id = 4

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds self-employment as a new jobs category in the employment estimates workflow by sourcing ACS B24080 self-employment data at the block-group level, allocating it to MGRAs via a new blockgroup→MGRA crosswalk, and incorporating the results into the controlled employment outputs.

Changes:

  • Adds new SQL extracts for ACS self-employment (regional control + blockgroup detail) and a blockgroup→MGRA allocation crosswalk.
  • Updates the employment pipeline to ingest, aggregate, and append self-employment jobs into the MGRA×NAICS jobs matrix.
  • Updates fallback behavior and test constants to account for the new self-employment NAICS category.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
sql/employment/xref_blockgroup_to_mgra.sql New SQL to build blockgroup→MGRA allocation weights for self-employment aggregation.
sql/employment/get_self_emp_bg_data.sql New SQL to pull ACS 5-year B24080 self-employment values by blockgroup.
sql/employment/get_regional_self_emp.sql New SQL to pull a regional self-employment control total (ACS 1-year, with 2020 exception).
python/utils.py Extends year-lookback trigger messages to include ACS 1-year missing-table message.
python/tests.py Updates expected distinct NAICS code count to include the new self-employment category.
python/employment.py Adds self-employment aggregation to MGRA and appends it into the employment pipeline.
Comments suppressed due to low confidence (1)

python/utils.py:722

  • read_sql_query_fallback() now treats 'ACS 1-Year Table does not exist' as a lookback trigger, but the function docstring still describes only the older set of messages. Update the docstring to include the new 1-year ACS message so callers know what SQL scripts must return (and keep the message text in sync with the SQL).
    # Messages that trigger year lookback
    lookback_messages = [
        "ACS 5-Year Table does not exist",
        "ACS 1-Year Table does not exist",
        "LODES data does not exist",
        "EDD point-level data does not exist",
        "QCEW data does not exist",
    ]

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread sql/employment/get_regional_self_emp.sql Outdated
Comment thread sql/employment/get_self_emp_bg_data.sql Outdated
Comment thread sql/employment/xref_blockgroup_to_mgra.sql Outdated
Comment thread python/employment.py Outdated
Comment thread sql/employment/get_regional_self_emp.sql Outdated
Comment thread sql/employment/get_region_self_emp.sql
Comment thread sql/employment/get_self_emp_bg_data.sql Outdated
Comment thread sql/employment/xref_blockgroup_to_mgra.sql Outdated
Comment thread python/employment.py Outdated
Comment thread python/employment.py Outdated
@Eric-Liu-SANDAG
Copy link
Copy Markdown
Contributor

@bryce-sandag please do a test run in [EstimatesProgram] so that data associated with this PR won't be deleted when [ws] is wiped

Copy link
Copy Markdown
Contributor

@Eric-Liu-SANDAG Eric-Liu-SANDAG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments should follow standard grammar rules, notably capitalization of the first letter in a sentence

Comment thread python/employment.py Outdated
Comment thread python/employment.py Outdated
Comment thread python/employment.py Outdated
Comment thread python/employment.py Outdated
Comment thread python/tests.py
Comment thread sql/employment/get_region_self_emp.sql
Comment thread sql/employment/get_B24080.sql
Comment thread sql/employment/xref_bg_to_mgra.sql
@bryce-sandag
Copy link
Copy Markdown
Contributor Author

run_id = 192 in production database has data

Copy link
Copy Markdown
Contributor

@Eric-Liu-SANDAG Eric-Liu-SANDAG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay like this is a really wild bit of feedback, but I think the [naics_code] column should be updated. With the inclusion of SE, it's not a NAICS code anymore. Additionally, there really really should be some metadata attached to the NAICS codes, most notably the NAICS year. So basically, I think there should be some kind of dimension table we refer to (maybe this should be added in [demographic_warehouse].[dim].[employment_type]) instead of the raw "NAICS" code

@GregorSchroeder what do you think about this?

Comment thread python/tests.py Outdated
Comment thread python/employment.py Outdated
@GregorSchroeder
Copy link
Copy Markdown
Contributor

Okay like this is a really wild bit of feedback, but I think the [naics_code] column should be updated. With the inclusion of SE, it's not a NAICS code anymore. Additionally, there really really should be some metadata attached to the NAICS codes, most notably the NAICS year. So basically, I think there should be some kind of dimension table we refer to (maybe this should be added in [demographic_warehouse].[dim].[employment_type]) instead of the raw "NAICS" code

@GregorSchroeder what do you think about this?

I would be open to renaming everything to [industry_code] like the BLS does since they add in their own flavors to NAICS like we are doing here. Outside of that relatively simple nomenclature change I would not want to further complicate things over this.

Comment thread python/employment.py Fixed
@bryce-sandag
Copy link
Copy Markdown
Contributor Author

Okay like this is a really wild bit of feedback, but I think the [naics_code] column should be updated. With the inclusion of SE, it's not a NAICS code anymore. Additionally, there really really should be some metadata attached to the NAICS codes, most notably the NAICS year. So basically, I think there should be some kind of dimension table we refer to (maybe this should be added in [demographic_warehouse].[dim].[employment_type]) instead of the raw "NAICS" code
@GregorSchroeder what do you think about this?

I would be open to renaming everything to [industry_code] like the BLS does since they add in their own flavors to NAICS like we are doing here. Outside of that relatively simple nomenclature change I would not want to further complicate things over this.

Made change to switch to industry_code. @GregorSchroeder can you update these two tables, noted below, in production database after changing naics_code to industry_code for variable name and in unique constraint so I can do a test run to make sure all changes were made correctly.
[inputs].[controls_jobs]
[outputs].[jobs]

Comment thread sql/employment/get_B24080.sql
Comment thread sql/employment/xref_blockgroup_to_mgra.sql Outdated
Comment thread python/utils.py
Comment thread python/employment.py
Copy link
Copy Markdown
Contributor

@GregorSchroeder GregorSchroeder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Are we using the xref_block_to_mgra.sql file anymore? I think we can delete it and replace with the edd_land_use_split.sql

  • Suggest we align our SQL file naming conventions with the rest of the project and pass these names through to the Python as well for consistency.

edd_land_use_split.sql -> get_xref_block_to_mgra.sql
get_self_emp_bg_data.sql -> get_B24080.sql
QCEW_control.sql -> get_region_qcew.sql

  • Unresolved a comment from the last PR and added feedback.
Image

Comment thread python/employment.py
Comment thread python/employment.py Outdated
Comment thread python/employment.py Outdated
Comment thread python/employment.py Outdated
Comment thread python/employment.py Outdated
Copy link
Copy Markdown
Contributor

@GregorSchroeder GregorSchroeder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I have passed field renaming of [naics_code] to [industry_code] through to the production database.

The following suggestions are very nit-picky style things that are optional. I missed some of these last reviews and/or gave suggestions that weren't in perfect alignment.

  • Would swap use of word "regional" for "region" in the SQL file names of get_regional_qcew.sql and get_regional_self_emp.sql. This aligns with the rest of the SQL files in the project and the geography naming convention we use.
  • I mistakenly suggested the name get_xref_block_to_mgra.sql when it really should just be xref_block_to_mgra.sql to align with the rest of the project.
  • I would alter the following key names for the jobs_inputs dictionary in _get_jobs_inputs() to match the SQL file names for traceability.
    • jobs_inputs["self_emp_bg"] -> jobs_inputs["B24080"]
    • I'd be ok with renaming the SQL file xref_blockgroup_to_mgra.sql to xref_bg_to_mgra.sql to match the jobs_inputs["xref_bg_to_mgra"]

Comment thread sql/employment/xref_block_to_mgra.sql
@bryce-sandag bryce-sandag merged commit 3bf02ff into main May 5, 2026
1 check passed
@bryce-sandag bryce-sandag deleted the add-estimates-of-self-employed-individuals branch May 5, 2026 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Estimates of Self Employed Individuals

4 participants