Reports vendor info #2374

Merged
merged 6 commits into from
Apr 25, 2023

Conversation

edasmalchi
Member

@edasmalchi edasmalchi commented Mar 10, 2023

Description

A new table (or intermediate table to be joined to idx_monthly_reports_site) providing a high-level summary of vendors each organization uses for GTFS-schedule related tasks and GTFS-RT related tasks.

Supports

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation
  • agencies.yml

How has this been tested?

Successful build of fct_monthly_reports_site_organization_gtfs_vendors on Eric's staging.

Screenshots (optional)

Screenshot 2023-03-10 at 13 50 37

Contributor

@lauriemerrell lauriemerrell left a comment

Just a few comments, mostly style / kind of hypothetical (the least helpful kinds of comments? 😅 )

One specific thing to do once ready is to add YAML documenting the existence of this table. If you haven't already tested joining with idx_monthly_reports_site I would maybe just double check that to make sure your grain is ok.

I think it's fine to have this in a separate table, I saw you asked in Slack about having to turn this into JSON if it were joined into idx_monthly_reports_site. If you want to be able to use this to filter on the site, I suspect you may need to go that route (join and turn it into JSON on that table), but I could be wrong -- would defer to @ryon and @charlie-costanzo about that. (And just to be clear, that is a specific limitation of a specific adapter that we use in the reports site generation that couldn't handle arrays of structs; I still think that was the elegant way to go on that in the warehouse.)

Even if you do want to join it in, given how much logic there is in this, I think it could make sense to keep this as a separate table (intermediate in that case, as you note in the PR description) and then join it into the index table.
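To illustrate the JSON route mentioned above, here is a minimal sketch of flattening an array of structs into a single JSON string column. It is only an assumption about what the conversion could look like: the field names `vendor_name` and `component` are hypothetical, and it uses Python's standard `json` module rather than anything from the warehouse or the reports site adapter.

```python
import json

# Hypothetical array of structs: one entry per vendor an organization uses.
vendors = [
    {"vendor_name": "Vendor X", "component": "Scheduling"},
    {"vendor_name": "Vendor Y", "component": "Real-time info"},
]

# Serialize to one string so a consumer that cannot read arrays of structs
# can still carry the value in a plain text column.
vendors_json = json.dumps(vendors, sort_keys=True)

# The consumer parses the string back into structured data when needed.
restored = json.loads(vendors_json)
```

The trade-off is that the column becomes opaque to SQL unless it is parsed again, which is one reason to keep the structured version in a separate table.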

SELECT
source_record_id, name, organization_type
FROM {{ ref('dim_organizations') }}
WHERE _is_current

Contributor

Just noting that by adding this filter, you might risk a failed join if an organization record for a vendor has been deleted but was active in the past. I don't think it's a huge risk because IIRC the service components tables aren't fully historical yet (which we may want to double check). If there is anyone we know has changed vendors, I think we may need to look into making those tables fully historical so we aren't assessing a current vendor on a prior vendor's data.

Contributor

Put another way, I think we might eventually need to make a vendor/components bridge table that uses the versioned keys that you could use here 🤔

Contributor

So, I think this is probably ok for now, but maybe we need a ticket to make it more robust in future?
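A rough sketch of the concern in this thread, run against an in-memory SQLite database rather than the warehouse. Only `source_record_id`, `name`, and `_is_current` come from the snippet above; the `_valid_from` / `_valid_to` columns, the `components` table, and all row values are hypothetical and stand in for whatever versioning the real tables use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_organizations (
        source_record_id TEXT, name TEXT,
        _valid_from TEXT, _valid_to TEXT,  -- hypothetical validity window
        _is_current INTEGER
    );
    -- rec1 was renamed in February; rec2 was deleted, so it has no current row.
    INSERT INTO dim_organizations VALUES
        ('rec1', 'Old Vendor Name', '2022-01-01', '2023-02-01', 0),
        ('rec1', 'New Vendor Name', '2023-02-01', '9999-12-31', 1),
        ('rec2', 'Deleted Vendor',  '2022-01-01', '2023-02-15', 0);

    -- Hypothetical stand-in for vendor relationships as of a report date.
    CREATE TABLE components (vendor_source_record_id TEXT, report_date TEXT);
    INSERT INTO components VALUES ('rec1', '2023-01-01'), ('rec2', '2023-01-01');
""")

# Join against current records only, as the filter in this PR does.
current_only = conn.execute("""
    SELECT c.vendor_source_record_id, o.name
    FROM components AS c
    LEFT JOIN (
        SELECT source_record_id, name FROM dim_organizations WHERE _is_current
    ) AS o ON o.source_record_id = c.vendor_source_record_id
    ORDER BY c.vendor_source_record_id
""").fetchall()

# Join against the version that was valid on the report date instead.
versioned = conn.execute("""
    SELECT c.vendor_source_record_id, o.name
    FROM components AS c
    LEFT JOIN dim_organizations AS o
        ON o.source_record_id = c.vendor_source_record_id
        AND c.report_date >= o._valid_from
        AND c.report_date < o._valid_to
    ORDER BY c.vendor_source_record_id
""").fetchall()

print(current_only)
print(versioned)
```

With the current-only filter the deleted vendor comes back with a null name and the renamed vendor shows its new name on an old report; the versioned join returns the names that were valid at the time. Which behavior is wanted is the open question for the follow-up ticket.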

@github-actions

Warehouse report 📦

New models 🌱

mart.gtfs_quality.fct_monthly_reports_site_organization_gtfs_vendors

Changed models 🔀

DAG

@edasmalchi
Member Author

@lauriemerrell is there a preferred ordering for entries in _mart_gtfs_quality.yml?

@lauriemerrell
Contributor

is there a preferred ordering for entries in _mart_gtfs_quality.yml?

It's pretty YOLO, I'd just add it in the end 😅

@edasmalchi
Member Author

tested join with idx_monthly_reports_site and lookin' OK (some nulls where no vendor data but I think that's alright)

SELECT * FROM `cal-itp-data-infra-staging.eric_mart_gtfs_quality.idx_monthly_reports_site` as t1
INNER JOIN `cal-itp-data-infra-staging.eric_mart_gtfs_quality.fct_monthly_reports_site_organization_gtfs_vendors` as t2
USING(organization_name, organization_source_record_id,
organization_itp_id, date_start)
WHERE publish_date = '2023-03-01'
LIMIT 1000
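For the grain question raised in review, one possible check (a sketch, not what was actually run) is to confirm the vendor table has at most one row per join key and that a left join leaves the index row count unchanged. The example uses in-memory SQLite with shortened, hypothetical table names (`idx`, `vendors`), only two of the four join keys, a made-up `schedule_vendors` column, and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE idx (organization_source_record_id TEXT, date_start TEXT);
    CREATE TABLE vendors (
        organization_source_record_id TEXT, date_start TEXT,
        schedule_vendors TEXT  -- hypothetical column
    );
    INSERT INTO idx VALUES
        ('recA', '2023-02-01'), ('recB', '2023-02-01'), ('recC', '2023-02-01');
    -- recB has a row with no vendor data; recC has no row at all.
    INSERT INTO vendors VALUES
        ('recA', '2023-02-01', 'Vendor X'), ('recB', '2023-02-01', NULL);
""")

# Grain check: any key appearing more than once would fan out the join.
dupes = conn.execute("""
    SELECT organization_source_record_id, date_start, COUNT(*) AS n
    FROM vendors
    GROUP BY organization_source_record_id, date_start
    HAVING COUNT(*) > 1
""").fetchall()

def count_rows(join_type: str) -> int:
    """Count rows produced by joining idx to vendors on the shared keys."""
    sql = f"""
        SELECT COUNT(*) FROM idx
        {join_type} JOIN vendors USING (organization_source_record_id, date_start)
    """
    return conn.execute(sql).fetchone()[0]

idx_rows = conn.execute("SELECT COUNT(*) FROM idx").fetchone()[0]
left_rows = count_rows("LEFT")
inner_rows = count_rows("INNER")

print(dupes, idx_rows, left_rows, inner_rows)
```

If `left_rows` equals `idx_rows` and `dupes` is empty, the join keeps one row per report. A smaller `inner_rows` shows how many index rows an inner join would drop because they have no vendor row at all, which is separate from rows that match but carry null vendor columns.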

@edasmalchi
Member Author

I think this could be ready! Would appreciate a docs build/review since I can't do that using the Hub yet

Contributor

@lauriemerrell lauriemerrell left a comment

LGTM 🚀. I do think we might want a follow-up ticket to figure out the organization versioning here. It's probably low priority since I don't think organization records get deleted often, but if a vendor changes names I think this logic would show the updated name. That could maybe be desirable? Basically it just might be worth exploring further to confirm.

@edasmalchi edasmalchi merged commit eae4b1e into main Apr 25, 2023
@edasmalchi edasmalchi deleted the reports-vendor-info branch April 25, 2023 21:21