fix dedupe in combined behavior stg model - should include tenant & year #42
The dedupe here should include tenant & year, to match the unique key of k_discipline_incident.
Tested on Boston, where this is needed because of their multiyear ODS: for years 2017-2022, all of their student_discipline_incident_associations records are duplicated for each api_year, because we run repeated pulls on the multiyear ODS with different values for api_year.
Without tenant & api_year in the dedupe key, the dedupe keeps just one record, assigned to whichever api_year we pulled last. We instead want all records to flow through stg and to handle the duplicates downstream (either with Boston-specific queries, or with a feature that removes incident dates that fall outside the school year).
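
To illustrate the difference, here is a minimal Python sketch of the dedupe behavior, not the actual stg model. The field names (`tenant_code`, `incident_id`, `student_id`, `pull_timestamp`) and the sample rows are hypothetical; only `api_year` and the idea of adding tenant and year to the dedupe key come from this PR.

```python
def dedupe(records, key_fields, order_field="pull_timestamp"):
    """Keep the most recently pulled record for each distinct key."""
    latest = {}
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        # Replace the stored record only if this one was pulled later.
        if key not in latest or rec[order_field] > latest[key][order_field]:
            latest[key] = rec
    return list(latest.values())


# Hypothetical rows: the same incident association pulled twice from a
# multiyear ODS, once per api_year.
records = [
    {"tenant_code": "boston", "api_year": 2021, "incident_id": "I-1",
     "student_id": "S-1", "pull_timestamp": 1},
    {"tenant_code": "boston", "api_year": 2022, "incident_id": "I-1",
     "student_id": "S-1", "pull_timestamp": 2},
]

natural_key = ["incident_id", "student_id"]  # assumed key, for illustration

# Old behavior: tenant and api_year are not part of the dedupe key, so only
# the record from the last pull survives.
narrow = dedupe(records, natural_key)

# New behavior: tenant and api_year are part of the key, so both records
# flow through and can be handled downstream.
wide = dedupe(records, ["tenant_code", "api_year"] + natural_key)

print(len(narrow), len(wide))
```

With the narrow key, the surviving row carries whichever api_year was pulled last; with the wider key, one row per api_year is kept.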