Skip to content

Conversation

@rhunwicks
Copy link
Contributor

No description provided.

@rhunwicks rhunwicks self-assigned this Oct 15, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for generating a label recognition workbook that documents how BSS labels are recognized (either via regex patterns or database entries), updates Python version from 3.10 to 3.12, and refactors instance validation to use iterative processing instead of DataFrame operations. The changes also include improved error handling with a new strict configuration option and several code simplifications.

  • Adds new configuration for BSS label recognition workbook output
  • Refactors validation logic to iterate over instances rather than using DataFrames
  • Introduces conditional error handling based on config.strict setting
  • Updates target Python version to 3.12

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
requirements/base.txt Updates gdrivefs dependency commit hash
pyproject.toml Adds project metadata and updates Python version from 3.10 to 3.12
pipelines_tests/test_assets/test_livelihood_activity_regexes.json Adds test cases for new wild food product recognition patterns
pipelines/configs.py Adds configuration fields for BSS label recognition workbook
pipelines/assets/wild_foods.py Simplifies return statements and adds config parameter to function calls
pipelines/assets/wealth_characteristic.py Updates validate_instances call to include config parameter
pipelines/assets/other_cash_income.py Simplifies return statements and adds config parameter to function calls
pipelines/assets/livelihood_activity_regexes.json Adds regex patterns for wild foods and fish product recognition
pipelines/assets/livelihood_activity.py Adds label recognition workbook generation, refactors label matching logic, and improves error handling
pipelines/assets/fixtures.py Refactors validation from DataFrame operations to iterative instance processing and standardizes return values
pipelines/init.py Adds new livelihood_activity_label_recognition_dataframe asset to exports
env.example Documents new environment variables for label recognition workbook

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

df.iloc[num_header_rows:][
(df["A"].iloc[num_header_rows:] != "") & (all_label_attributes.iloc[num_header_rows:, 0].isna())
(df["A"].iloc[num_header_rows:] != "")
& (all_label_attributes.iloc[num_header_rows:]["activity_label"] == "")
Copy link

Copilot AI Oct 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comparison all_label_attributes.iloc[num_header_rows:]["activity_label"] == "" may not correctly identify null/NA values. If activity_label contains None or pd.NA values instead of empty strings, this condition will miss them. Consider using isna() or checking for both empty strings and null values.

Suggested change
& (all_label_attributes.iloc[num_header_rows:]["activity_label"] == "")
& (
all_label_attributes.iloc[num_header_rows:]["activity_label"].isna()
| (all_label_attributes.iloc[num_header_rows:]["activity_label"] == "")
)

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We know that a missing attribute will be " because of the fillna("") in get_all_label_attributes

record_reference += f"{instance['bss_column']}:{instance['bss_column']}: "
elif "bss_row" in instance:
record_reference += f"{instance['bss_row']}:{instance['bss_row']}: "
record_reference += f"{str({k: v for k,v in instance.items()})}"
Copy link

Copilot AI Oct 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The dict comprehension creates a full copy of the instance which could be large. Consider limiting this to relevant fields or using a more concise representation for error messages to improve readability and performance.

Suggested change
record_reference += f"{str({k: v for k,v in instance.items()})}"
# Only include relevant fields for error messages to improve readability and performance
relevant_fields = ["natural_key", "id", "pk"]
concise_instance = {k: instance[k] for k in relevant_fields if k in instance}
record_reference += f"{concise_instance}"

Copilot uses AI. Check for mistakes.
rhunwicks and others added 3 commits October 14, 2025 23:20
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
girum-air
girum-air previously approved these changes Oct 15, 2025
@rhunwicks rhunwicks dismissed girum-air’s stale review October 15, 2025 16:51

The merge-base changed after approval.

@rhunwicks rhunwicks merged commit b8046de into main Oct 15, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants