# Dive into operator - organization many:many relationships

* Case 1: We expect that one operator (feed) can be linked to multiple organizations. There are a handful of examples; for instance, the Foothill Transit feed is linked to both Foothill Transit and City of Duarte, and this is indeed valid.
* Case 2: We also expect that the same organization can be linked to multiple feeds. Bay Area MTC511 is one organization with multiple feeds, as is VCTC GMV, and for LA Metro we want to keep both the rail and bus feeds under the same organization.

The portfolio shows results by organization, and so does the open data portal. This is desired because it rolls everything up to the same "agency". Case 2 is therefore handled automatically, and it is the case we have focused on so far.
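A minimal sketch of why grouping by organization takes care of Case 2, using a hypothetical toy crosswalk (the feed names below are illustrative, not the real `name` values in the dataset): every feed belonging to one organization lands in a single group, so that organization still gets one page.

```python
import pandas as pd

# Hypothetical toy crosswalk: LA Metro has two feeds (names are illustrative).
crosswalk = pd.DataFrame({
    "name": ["LA Metro Rail Schedule", "LA Metro Bus Schedule", "Foothill Schedule"],
    "organization_name": ["LA Metro", "LA Metro", "Foothill Transit"],
})

# Group by organization and collect that organization's feeds into one list,
# so an organization with several feeds is still a single row ("agency").
feeds_by_org = (
    crosswalk.sort_values("name")
    .groupby("organization_name")["name"]
    .agg(list)
    .reset_index()
)
print(feeds_by_org)
```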

In Case 1, we'd like to select just one organization name per feed. Right now our sorting / deduping keeps whichever organization name comes first alphabetically, which isn't always the one we want. For Foothill Transit, we've been showing City of Duarte, when we actually want to show Foothill Transit... unless we're OK with showing the same info for Foothill Transit AND City of Duarte on 2 separate pages.
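A minimal sketch of the problem and one possible fix. It assumes the current deduping is roughly "sort alphabetically, then keep the first row per feed"; the actual dedupe code isn't shown here. `PREFERRED_ORG` and `pick_one_org` are hypothetical names: an explicit override table that is consulted before falling back to alphabetical order.

```python
import pandas as pd

# The two crosswalk rows for the Foothill feed (feed name and organizations
# taken from the Foothill Schedule output in this notebook).
rows = pd.DataFrame({
    "name": ["Foothill Schedule", "Foothill Schedule"],
    "organization_name": ["Foothill Transit", "City of Duarte"],
})

# Assumed current behavior: alphabetical sort, keep the first row per feed.
# This keeps "City of Duarte" because "C" sorts before "F".
alphabetical = (
    rows.sort_values(["name", "organization_name"])
    .drop_duplicates(subset="name", keep="first")
)

# Hypothetical override table: feed name -> organization we want to display.
PREFERRED_ORG = {"Foothill Schedule": "Foothill Transit"}


def pick_one_org(df: pd.DataFrame, preferred: dict) -> pd.DataFrame:
    """Keep one organization per feed: the preferred one if listed,
    otherwise the alphabetically first one (hypothetical helper)."""
    df = df.copy()
    # Rank 0 for the preferred organization, 1 for everything else
    # (feeds missing from `preferred` map to NaN, so all their rows get 1).
    df["_rank"] = (df["organization_name"] != df["name"].map(preferred)).astype(int)
    return (
        df.sort_values(["name", "_rank", "organization_name"])
        .drop_duplicates(subset="name", keep="first")
        .drop(columns="_rank")
    )


chosen = pick_one_org(rows, PREFERRED_ORG)
print(chosen)
```

An override table keeps the choice explicit and reviewable, at the cost of maintaining it by hand as new many-org feeds show up.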

In [1]:
import pandas as pd
from update_vars import analysis_date_list, SCHED_GCS, GTFS_DATA_DICT

In [2]:
CROSSWALK_FILE = GTFS_DATA_DICT.schedule_tables.gtfs_key_crosswalk
analysis_date = analysis_date_list[0]

In [3]:
df = pd.read_parquet(
    f"{SCHED_GCS}{CROSSWALK_FILE}_{analysis_date}.parquet"
)

In [4]:
df[df.name=="Foothill Schedule"]

| | schedule_gtfs_dataset_key | name | schedule_source_record_id | base64_url | organization_source_record_id | organization_name | itp_id | caltrans_district | counties_served | hq_city | ... | service_area_sq_miles | population | service_area_pop | subrecipient_type | primary_uza | reporter_type | organization_type | voms_pt | voms_do | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 111 | f74424acf8c41e4c1e9fd42838c4875c | Foothill Schedule | recbmQcgs8FDwTzUx | aHR0cHM6Ly9mb290aGlsbHRyYW5zaXQucmlkZXJhbGVydH... | recZm8PD8WIdcDL0M | City of Duarte | 97.0 | 07 - Los Angeles | | | ... | | | | | | | | | | |
| 112 | f74424acf8c41e4c1e9fd42838c4875c | Foothill Schedule | recbmQcgs8FDwTzUx | aHR0cHM6Ly9mb290aGlsbHRyYW5zaXQucmlkZXJhbGVydH... | recSqgaa8QiQ8CRjl | Foothill Transit | 112.0 | 07 - Los Angeles | Los Angeles | West Covina | ... | 327.0 | 12237376.0 | 1515836.0 | | | Full Reporter | Public Agency or Authority of Transit Service | 303.0 | | 2022.0 |


In [None]:
# Case 1: feeds (by name) linked to more than one organization
multiple_orgs = (df.groupby("name")
                 .agg({"organization_name": "nunique"})
                 .reset_index()
                ).query('organization_name > 1')

In [None]:
# Case 2: organizations linked to more than one schedule feed
multiple_feeds = (df.groupby("organization_name")
                 .agg({"schedule_gtfs_dataset_key": "nunique"})
                 .reset_index()
                ).query('schedule_gtfs_dataset_key > 1')

In [None]:
# Feed names for every organization that has multiple feeds
df[df.organization_name.isin(multiple_feeds.organization_name)][
    ["name", "organization_name"]].sort_values("name")

In [None]:
multiple_orgs
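One alternative to hand-maintained overrides, sketched here as a hypothetical heuristic (not something this notebook currently does): for each feed, keep the organization row with the most populated attribute columns. In the Foothill output, the Foothill Transit row has NTD-style fields such as `reporter_type` and `service_area_sq_miles` filled in, while the City of Duarte row is empty there. `pick_most_complete_org` is an illustrative name, and the toy frame uses only a subset of the real columns.

```python
import numpy as np
import pandas as pd


def pick_most_complete_org(df: pd.DataFrame, attr_cols: list) -> pd.DataFrame:
    """Hypothetical heuristic: per feed, keep the organization row with the
    most non-null values in `attr_cols`; ties fall back to alphabetical order."""
    df = df.copy()
    # Count how many of the attribute columns are populated in each row
    df["_n_filled"] = df[attr_cols].notna().sum(axis=1)
    return (
        df.sort_values(
            ["name", "_n_filled", "organization_name"],
            ascending=[True, False, True],
        )
        .drop_duplicates(subset="name", keep="first")
        .drop(columns="_n_filled")
    )


# Toy rows mirroring the Foothill output (subset of columns only).
toy = pd.DataFrame({
    "name": ["Foothill Schedule", "Foothill Schedule"],
    "organization_name": ["City of Duarte", "Foothill Transit"],
    "reporter_type": [np.nan, "Full Reporter"],
    "service_area_sq_miles": [np.nan, 327.0],
})

best = pick_most_complete_org(toy, ["reporter_type", "service_area_sq_miles"])
print(best)
```

Against the real crosswalk, this would be applied to the Case 1 rows only, e.g. `df[df.name.isin(multiple_orgs.name)]`, and the picks reviewed by eye, since "most complete" is not guaranteed to match the organization we want to display.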