# Check July HQTA data

Are `hqta_points` or `hqta_areas` falling in Stanislaus National Forest when they're not supposed to?

## Debug steps
* Open the July and June geoparquets for side-by-side comparison, to tell whether this is a June export issue or something we haven't caught until now.
* Do local checks for LA Metro and CulverCityBus, since some of their `major_stop_bus` points were in the forest. Check the other `hqta_type` values too.
* Do a hyper-local check: select the attributes seen in the forest, then plot each point individually to see whether most are on the street network and one is off in the forest.
* If the error can't be reproduced from the geoparquets, check the shapefile and the file geodatabase.
* Result: the error can't be reproduced in this notebook from either the June or the July geoparquet, so go through the conversion steps next.
* geoparquet -> shapefile: check ArcMap to see if forest points appear.
* shapefile -> file gdb feature class: check ArcMap to see if forest points appear.
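The conversion checks above rely on eyeballing ArcMap. As a complementary sketch, the coordinates before and after a conversion could be compared numerically. This assumes both files keep the same row order and contain point geometries; the function name and the sample coordinates are made up for illustration and are not part of the export pipeline.

```python
from math import hypot


def max_coordinate_drift(before, after):
    """Return (largest displacement, row index) for two same-ordered (x, y) lists.

    Assumes row i in `before` and row i in `after` describe the same feature.
    """
    if len(before) != len(after):
        raise ValueError("row counts differ; the conversion added or dropped features")

    worst_distance, worst_row = 0.0, None
    for row, ((x0, y0), (x1, y1)) in enumerate(zip(before, after)):
        distance = hypot(x1 - x0, y1 - y0)
        if distance > worst_distance:
            worst_distance, worst_row = distance, row
    return worst_distance, worst_row


# Made-up (lon, lat) pairs: row 1 "moves" from LA to the Sierra after conversion
parquet_coords = [(-118.25, 34.05), (-118.40, 34.02), (-118.30, 34.10)]
shapefile_coords = [(-118.25, 34.05), (-120.00, 38.20), (-118.30, 34.10)]

drift, row = max_coordinate_drift(parquet_coords, shapefile_coords)
print(f"largest drift: {drift:.2f} degrees at row {row}")
```

A drift of zero (with `None` as the row) would mean the conversion left every point where it was; anything larger points at the row to inspect on a map.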

## Findings
* Cannot reproduce error going from our June/July geoparquets
* Cannot reproduce error converting June geoparquet to shapefile
* Cannot reproduce error converting June shapefile to file gdb feature class

## Solution
* Rerun the July export and double-check whether any points or polygons are in the forest. If not, send that new update out to replace the open data portal datasets.
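The double check on the rerun could be scripted with a coarse bounding-box filter, sketched below. The box coordinates are a rough, hypothetical approximation of where Stanislaus National Forest sits, not an official boundary, and the helper name `rows_in_bbox` is invented here. Coordinates are assumed to be longitude/latitude in degrees.

```python
# Rough, hypothetical box around Stanislaus National Forest (lon/lat degrees).
# Illustration only: replace with the real forest boundary before relying on it.
FOREST_BBOX = (-120.5, 37.7, -119.5, 38.6)  # (min_lon, min_lat, max_lon, max_lat)


def rows_in_bbox(coords, bbox=FOREST_BBOX):
    """Return the row positions of (lon, lat) pairs that fall inside `bbox`."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return [
        i
        for i, (lon, lat) in enumerate(coords)
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
    ]


# Made-up sample: downtown LA, a spot inside the box, and Oakland
sample = [(-118.25, 34.05), (-120.0, 38.2), (-122.27, 37.80)]
suspects = rows_in_bbox(sample)
print(f"{len(suspects)} suspect row(s): {suspects}")
```

For the real data, the pairs could come from `zip(july_points.geometry.x, july_points.geometry.y)`, assuming the points are in a lon/lat CRS such as EPSG:4326. An empty result is what we'd want to see before sending the update out.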

In [None]:
import geopandas as gpd
import pandas as pd

from utilities import GCS_FILE_PATH

In [None]:
july_points = gpd.read_parquet(f"{GCS_FILE_PATH}hqta_points.parquet")
july_areas = gpd.read_parquet(f"{GCS_FILE_PATH}hqta_areas.parquet")

june_points = gpd.read_parquet("./june_ca_hq_transit_stops.parquet")
june_areas = gpd.read_parquet("./june_ca_hq_transit_areas.parquet")

hqta_types = july_points.hqta_type.unique().tolist()
hqta_details = july_points.hqta_details.unique().tolist()

## Pick combinations that show up in Stanislaus National Forest, plot individually

None of these points fall in the forest; all are on the LA street network, as expected.

Can't reproduce error from geoparquets.

In [None]:
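# Operator pair seen in the forest; the trailing comments hold the alternative pair
primary = 300  # alternative: 87
secondary = 87  # alternative: 182

details = "intersection_2_bus_routes_different_operators"

# Keep only rows matching this operator pair and hqta_details value
explore_me = june_areas[
    (june_areas.calitp_itp_id_primary == primary)
    & (june_areas.calitp_itp_id_secondary == secondary)
    & (june_areas.hqta_details == details)
].reset_index(drop=True)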

print(len(explore_me))
explore_me

In [None]:
# Map each row on its own so a single stray geometry is easy to spot
for i in explore_me.index:
    one_point = explore_me.loc[[i]]
    display(one_point.explore("hqta_details"))

## Side-by-side maps for July 2022 and June 2022

No points fall in Stanislaus National Forest. 

Maps look fairly similar for the 2 months.

**But LA Metro's BRT is missing. Go back to `A1_download_rail_ferry_brt_stops` to figure out why the custom filtering drops it.**
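A gap like the missing BRT is easy to overlook on a map. As a rough sketch, the category counts for the two months could be diffed directly. The helper name `count_changes` and the sample values are illustrative assumptions (only `major_stop_bus` appears in this notebook's text).

```python
from collections import Counter


def count_changes(before_values, after_values):
    """Return {category: (count_before, count_after)} for categories whose counts differ."""
    before, after = Counter(before_values), Counter(after_values)
    changes = {}
    for key in sorted(set(before) | set(after)):
        if before[key] != after[key]:
            changes[key] = (before[key], after[key])
    return changes


# Made-up category values standing in for the hqta_type column of each month
june = ["major_stop_bus", "major_stop_bus", "major_stop_brt", "major_stop_rail"]
july = ["major_stop_bus", "major_stop_bus", "major_stop_rail"]

changes = count_changes(june, july)
for category, (n_june, n_july) in changes.items():
    print(f"{category}: june={n_june}, july={n_july}")
```

Since `Counter` accepts any iterable, `count_changes(june_points.hqta_type, july_points.hqta_type)` should work on the real columns; filtering both frames to `calitp_itp_id_primary == 182` first would narrow the diff to LA Metro.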

In [None]:
def plot_by_group(gdf_after, gdf_before, category_list: list, col: str, figsize=(5, 5)):
    """Plot each category in `col` for July (gdf_after) and June (gdf_before)."""
    for category in category_list:
        subset_after = gdf_after[gdf_after[col] == category]
        subset_before = gdf_before[gdf_before[col] == category]

        # Only plot a month when the category has rows in it
        if len(subset_after) > 0:
            ax_after = subset_after.plot(
                column="calitp_itp_id_primary", figsize=figsize, cmap="cividis"
            )
            ax_after.set_title(f"july: {category}")
        if len(subset_before) > 0:
            ax_before = subset_before.plot(
                column="calitp_itp_id_primary", figsize=figsize, cmap="viridis"
            )
            ax_before.set_title(f"june: {category}")

In [None]:
plot_by_group(july_points, june_points, hqta_types, "hqta_type", figsize=(5, 5))

In [None]:
plot_by_group(july_areas, june_areas, hqta_types, "hqta_type", figsize=(10, 10))

## LA Metro comparison

In [None]:
plot_by_group(july_areas[july_areas.calitp_itp_id_primary==182], 
              june_areas[june_areas.calitp_itp_id_primary==182], 
              hqta_types, "hqta_type", figsize=(5, 5))

In [None]:
metro_july = july_areas[july_areas.calitp_itp_id_primary==182]
metro_june = june_areas[june_areas.calitp_itp_id_primary==182]

In [None]:
#metro_july.explore("hqta_type")
#metro_june.explore("hqta_type")