Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix/attendance unique key #91

Merged
merged 7 commits into from
Jan 24, 2024
Merged

Conversation

rlittle08
Copy link
Collaborator

@rlittle08 rlittle08 commented Nov 29, 2023

Description & motivation

This aligns with edu_edfi_source PR here to fix a bug in our processing of stu-school/section-attendance. We previously omitted session & attendance_event_category from the unique key, but both of those are needed to align with Ed-Fi. With the fix to edu_edfi_source, we will now keep more records in stg, so this edu_wh PR handles those extra duplicates by:

a) updating unique key of fct_student_school_attendance_event & fct_student_section_attendance_event
b) adjusting fct_student_daily_attendance to order on attendance_event_category & k_session. This is alphabetical, meaning it will be arbitrary, but consistent. This is an improvement over the previous rule, where consistency was not enforced. But we could consider a future improvement where the rule is configurable (e.g. prefer certain categories over others)

More detail on b): attendance_event_category_certainty_order is configurable in xwalk, and the idea is to allow implementations to prefer certain events that seem likely to be corrections in the data. For example, if we have records for both "Absent" and "Tardy", it's likely that "Tardy" was a correction, and should be preferred. However, if we have "Absent" and "In Attendance", it's likely that "Absent" was a correction. This configuration allows the code to work for both positive and negative attendance.

Dependencies

PR Merge Priority:

  • Low
  • Medium
  • High

Changes to existing files:

  • fct_student_school_attendance_event: update unique key post hook, and add new column from xwalk
  • fct_student_section_attendance_event: update unique key post hook
  • fct_student_daily_attendance:
    • add a join to dim_session, so we can fill in total_instructional_days for positive attendance records. Previously total_instructional_days was only populated for filled positive attendance records, which meant those were preferred in the dedupe order. But in cases where have a "real" absence record and "filled" attendance record, we want to prefer the "real" one.
    • add attendance_event_category_certainty_order and k_session to the order by, so we prefer "more certain" records, and if exact tie, take first k_session (arbitrary but consistent)
    • add calendar_date as a column in table, and replace k_calendar_date with calendar_date in the primary key definition (to match the actual dedupe rule)

New files created:

  • new audit table attendance_event_duplicates -- helpful view of the duplicates that are now let through on k_student + k_school + calendar_date, and some diagnostic columns that show how these are handled downstream in fct_student_daily_attendance. This could be used by implementations so they are aware of the rule change, and can expose potential data quality issues

Tests and QC done:

Tested on Boston, and the impact on stu_daily_attendance was minimal. ~700 stu-daily records changed from absent -> not absent, because Tardy is sorted above Unexcused Absence in drafted xwalk for boston here

Future ToDos & Questions:

Please review the method for ordering on attendance_event_category_certainty_order. Is there a better name for this column? Are there edge cases that need to be addressed? Is the order of columns in the order by correct?

@@ -136,7 +151,7 @@ positive_attendance_deduped as (
dbt_utils.deduplicate(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo add comment on reasoning behind dedupe key & order by

@rlittle08 rlittle08 marked this pull request as draft December 14, 2023 22:48
@rlittle08 rlittle08 marked this pull request as ready for review January 19, 2024 19:56
@rlittle08 rlittle08 merged commit 7b6c26f into main Jan 24, 2024
@rlittle08 rlittle08 deleted the bugfix/attendance_unique_key branch January 24, 2024 19:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants