Description & motivation
This aligns with the edu_edfi_source PR here to fix a bug in our processing of stu-school/section-attendance. We previously omitted session & attendance_event_category from the unique key, but both are needed to align with Ed-Fi. With the fix to edu_edfi_source, we will now keep more records in stg, so this edu_wh PR handles those extra duplicates by:
a) updating the unique key of fct_student_school_attendance_event & fct_student_section_attendance_event
b) adjusting fct_student_daily_attendance to order on attendance_event_category & k_session. This is alphabetical, meaning it will be arbitrary, but consistent. This is an improvement over the previous rule, where consistency was not enforced. But we could consider a future improvement where the rule is configurable (e.g. prefer certain categories over others).
More detail on b):
attendance_event_category_certainty_order is configurable in xwalk, and the idea is to allow implementations to prefer certain events that seem likely to be corrections in the data. For example, if we have records for both "Absent" and "Tardy", it's likely that "Tardy" was a correction, and should be preferred. However, if we have "Absent" and "In Attendance", it's likely that "Absent" was a correction. This configuration allows the code to work for both positive and negative attendance.
Dependencies
PR Merge Priority:
Changes to existing files:
fct_student_school_attendance_event: update unique key post hook, and add new column from xwalk
fct_student_section_attendance_event: update unique key post hook
fct_student_daily_attendance:
a) join dim_session, so we can fill in total_instructional_days for positive attendance records. Previously total_instructional_days was only populated for filled positive attendance records, which meant those were preferred in the dedupe order. But in cases where we have a "real" absence record and a "filled" attendance record, we want to prefer the "real" one.
b) add attendance_event_category_certainty_order and k_session to the order by, so we prefer "more certain" records, and on an exact tie, take the first k_session (arbitrary but consistent)
c) add calendar_date as a column in the table, and replace k_calendar_date with calendar_date in the primary key definition (to match the actual dedupe rule)
New files created:
attendance_event_duplicates -- helpful view of the duplicates that are now let through on k_student + k_school + calendar_date, with some diagnostic columns that show how these are handled downstream in fct_student_daily_attendance. This could be used by implementations so they are aware of the rule change, and can expose potential data quality issues.
Tests and QC done:
Tested on Boston, and the impact on stu_daily_attendance was minimal. ~700 stu-daily records changed from absent -> not absent, because Tardy is sorted above Unexcused Absence in the drafted xwalk for Boston here
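To make the absent -> not absent change concrete, here is a minimal Python sketch of the dedupe rule as described in this PR: one record is kept per k_student + k_school + calendar_date, preferring the lowest attendance_event_category_certainty_order and then the first k_session. The crosswalk values, the sample rows, and the assumption that a lower number means "more certain" are illustrative only; the real logic lives in the dbt model, whose order by may include other columns not shown here.

```python
# Hypothetical xwalk values: lower number = preferred. These are assumptions
# for illustration, not the actual Boston crosswalk.
CERTAINTY_ORDER = {"Tardy": 1, "Unexcused Absence": 2}


def pick_daily_record(events):
    """Keep one event per (k_student, k_school, calendar_date).

    Ties on certainty order are broken by the smallest k_session, which is
    arbitrary but consistent.
    """
    best = {}
    for ev in events:
        key = (ev["k_student"], ev["k_school"], ev["calendar_date"])
        rank = (CERTAINTY_ORDER[ev["attendance_event_category"]], ev["k_session"])
        if key not in best or rank < best[key][0]:
            best[key] = (rank, ev)
    return {key: ev for key, (_, ev) in best.items()}


sample_events = [
    {"k_student": "s1", "k_school": "sch1", "calendar_date": "2023-10-02",
     "k_session": "sess_b", "attendance_event_category": "Unexcused Absence"},
    {"k_student": "s1", "k_school": "sch1", "calendar_date": "2023-10-02",
     "k_session": "sess_b", "attendance_event_category": "Tardy"},
    {"k_student": "s1", "k_school": "sch1", "calendar_date": "2023-10-02",
     "k_session": "sess_a", "attendance_event_category": "Tardy"},
]

chosen = pick_daily_record(sample_events)
```

With these sample rows, the Tardy record wins over the Unexcused Absence, and between the two Tardy records the one with the smaller k_session is kept.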
Future ToDos & Questions:
Please review the method for ordering on attendance_event_category_certainty_order. Is there a better name for this column? Are there edge cases that need to be addressed? Is the order of columns in the order by correct?
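For reviewers hunting for edge cases, the following Python sketch shows the kind of grouping a duplicates view such as attendance_event_duplicates would need: collect events by k_student + k_school + calendar_date and keep only the groups with more than one record. The function name and sample rows are hypothetical, and the actual view is SQL with extra diagnostic columns that are not reproduced here.

```python
from collections import defaultdict


def find_attendance_duplicates(events):
    """Return groups with more than one event on the same student, school, and date."""
    groups = defaultdict(list)
    for ev in events:
        key = (ev["k_student"], ev["k_school"], ev["calendar_date"])
        groups[key].append(ev)
    return {key: evs for key, evs in groups.items() if len(evs) > 1}


# Illustrative rows only; keys and categories are made up for this sketch.
rows = [
    {"k_student": "s1", "k_school": "sch1", "calendar_date": "2023-10-02",
     "k_session": "sess_a", "attendance_event_category": "Absent"},
    {"k_student": "s1", "k_school": "sch1", "calendar_date": "2023-10-02",
     "k_session": "sess_a", "attendance_event_category": "Tardy"},
    {"k_student": "s2", "k_school": "sch1", "calendar_date": "2023-10-02",
     "k_session": "sess_a", "attendance_event_category": "Absent"},
]

duplicate_groups = find_attendance_duplicates(rows)
```

Listing the categories inside each duplicate group is one way to spot combinations that the certainty order does not yet rank.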