## SJSU Capstone Data Analysis
*SJSU-MSTM*

### README
This notebook contains data cleaning, analysis and visualization for the SJSU capstone data; it validates data based on the Caltrans Data Quality Management Plan (DQMP) data quality dimensions listed below.

> Note: Similar to software [unit testing][01.00], it is intended to serve as a test suite for the specified dataset.

*Change Log*
* 10-21-2020: Submit Version 1.0
* 10-21-2020: Baseline Version v0.1
* 11-26-2024: SJSU Capstone Update
* 06-26-2025: Streamlined for simplicity - narrowing down to only requested visuals
    * Total Work Activity by Hotspot Corridors
    * Labor Totals by Hotspot Corridors
    * Work Activity Cost by Hotspot Corridors

*Deliverables*
1. Test cases are organized into separate modules and print test results
2. Each test will output non-compliant records into CSV files for action
3. Data processing module produces transformed data (e.g. table joins)
4. Notebook generates data dictionary after running all test cases
5. All test cases are repeatable and documented
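
Deliverables 1 and 2 can be illustrated with a minimal sketch. The helper name `check_required_values`, the sample rows, and the output file name are hypothetical and are not part of this notebook; the sketch only shows the pattern of printing a test result and returning the non-compliant rows so they can be written to CSV.

```python
import pandas as pd


def check_required_values(df, required_cols, test_name):
    """Return rows that have a null in any required column (hypothetical helper)."""
    non_compliant = df[df[required_cols].isnull().any(axis=1)]
    status = "PASS" if non_compliant.empty else "FAIL"
    print(f"{test_name}: {status} ({len(non_compliant)} non-compliant of {len(df)} rows)")
    return non_compliant


# illustrative data only; not taken from the study datasets
sample = pd.DataFrame(
    {
        "Activity": ["D40050", "D40051", "D30050"],
        "Production Quantity": [12.0, None, 3.5],
    }
)
flagged = check_required_values(sample, ["Production Quantity"], "completeness_production")

# the output path is an assumption; this notebook writes its outputs to GCS
# flagged.to_csv("non_compliant_completeness_production.csv", index=False)
```

Returning the flagged rows rather than dropping them keeps the test repeatable and leaves the correction decision to the team.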

*Data Quality Dimensions (DQMP)*
1. Accuracy and Precision: Data is close to the true value and recorded with appropriate exactness
2. Validity: Conforms to established formats, data types and ranges
3. Completeness: Absence of gaps in data, especially missing values
4. Consistency: Data is collected at similar datetime and location
5. Timeliness: Data is updated on a regular basis
6. Granularity: Data is collected at appropriate level of detail for use
7. Uniqueness: Best effort to collect data from authoritative sources
8. Accessibility: Data is collected or processed into useable formats
9. Reputation: Data is trusted as reliable source

### Results (Summary)
This section reports data validation findings and process steps.

> Note: Data validation is intended to flag non-compliant values for discussion; values are not corrected until confirmed by the team. As a result, the following issues were observed during data validation in addition to the non-compliance report.

*Datasets Cleaned, Analyzed & Visualized*
1. Caltrans and Clean CA Litter Collection Totals
2. Clean CA Level of Service (LOS) Scores
3. Caltrans Customer Service Requests (CSRs)

*Data Processing Steps*
1. Import raw data/template files and save as table variables
2. Crosswalk raw data/template fields; populate template
3. Flag missing columns as "Column not provided"
4. Convert column data types as needed
5. Join project with corresponding table
6. Save merged data for validation
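
Steps 2 through 4 above could look roughly like the following. This is a sketch under stated assumptions: the crosswalk mapping, the template columns, and the raw rows are invented for illustration and do not come from the study templates.

```python
import pandas as pd

# hypothetical crosswalk: raw column name -> template column name
crosswalk = {"Dist": "District", "Prod Qty": "Production Quantity"}
template_cols = ["District", "Production Quantity", "Route"]

# illustrative raw data; "Route" is deliberately missing
raw = pd.DataFrame({"Dist": [4, 7], "Prod Qty": ["10", "25"]})

# step 2: rename raw fields to the template names
populated = raw.rename(columns=crosswalk)

# step 3: flag template columns that the raw data did not provide
for col in template_cols:
    if col not in populated.columns:
        populated[col] = "Column not provided"

# step 4: convert column data types as needed
populated["Production Quantity"] = pd.to_numeric(populated["Production Quantity"])

# order the columns to match the template
populated = populated[template_cols]
print(populated.dtypes)
```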

### Jupyter Introduction
This notebook will require some basic understanding of the Python programming language, Jupyter platform and data analysis concepts. It is based on this [tutorial][01.02] and [Github Repo][01.03].

Jupyter is a powerful, open-source and lightweight collaborative tool. It provides all the tools necessary to run data analysis, visualization, statistics and data science [out of the box][01.04]. In addition, it has gained acceptance from industry and academia for collaborating on projects and publishing work.

Jupyter is a combination of text and code with the programming run-time built into the platform so there is no need to install additional software. The text is in the markdown file format (similar to HTML), and code in several languages. It is organized by cells which can consist of either text or code; placed together, they can be sent as a single document to share/publish work.

### Jupyter Notebook
Notebooks are organized by cells, which mainly consist of text (in markdown) and code (Python). A notebook operates like a hybrid between an MS Word and an Excel file: the entire file reads like a document, while the cells operate like a spreadsheet. To get started, feel free to scroll down and navigate around the cells for a quick tour. Here is a breakdown of how to view/edit cells:

*Navigation*
1. Each cell may be edited by hitting ENTER; toggle between cells using the arrow keys or mouse/scroller
2. When editing a cell, be sure to select "markdown" for text or "code" before writing into it
3. Each cell can be run by hitting CTRL + ENTER or the "run" button from the menu bar
4. Output from each cell will appear below; if an error occurs, please read and try to debug it(!)
5. File can be saved by hitting CTRL + "s" or file/save from the pulldown menu above

### Quick Start

*Notes*
1. This notebook will require some Python programming
2. It is widely used and taught in [high school][01.05] and AP Computer Science [courses][01.06]
3. [Jupyter][01.07] supports many other languages, including R, Scala and Julia
4. Python is the most popular of them and can be used for other tasks, primarily data science and web applications

### Data
#### IMMS OBI Reports

*IMMS Activity Codes - Caltrans Maintenance Litter Abatement*  
* D30050 - Caltrans Maintenance Sweeping
* D40050 - Caltrans Maintenance Litter Control
* D40150 - Caltrans Maintenance Road Patrol/Debris Pickup
* D41050 - Caltrans Maintenance Adopt-A-Highway Litter Control
* D42050 - Caltrans Maintenance Encampment Litter-Debris Removal
* D44050 - Caltrans Maintenance Special Programs People (SPP) Litter Control  
* D45050 - Illegal Dumping Debris Removal (added June 26, 2025)
* D60050 - Caltrans Maintenance Graffiti Removal

*IMMS Activity Codes - Clean California Litter Abatement*  
* D30051 - Clean California Sweeping
* D40051 - Clean California Litter Control
* D40151 - Clean California Road Patrol/Debris Pickup
* D41051 - Clean California Adopt-A-Highway Litter Control
* D42051 - Clean California Encampment Litter/Debris Removal
* D43051 - Clean California Dump Days
* D44051 - Clean California Special Programs People (SPP) Litter Control
* D60051 - Clean California Graffiti Removal  
  
*The data query consists of the parameters listed below, which were input into the query interface:*
* Maintenance Activity Family: D (Litter Abatement)
* Activity Codes: "D30050;D30051;D40050;D40051;D40150;D40151;D41050;D41051;D42050;D42051;D44050;D44051"
* Timeframe: 6-month increments (e.g., 07/01/2021 to 12/31/2021)
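
As a small illustration of how the activity-code parameter relates to the two code lists above, the sketch below splits the query string and labels each code by program. The labeling rule (codes ending in 0 are Caltrans Maintenance, codes ending in 1 are Clean California) is read off the lists above; the variable names are hypothetical.

```python
# query string copied from the parameter list above
activity_codes = (
    "D30050;D30051;D40050;D40051;D40150;D40151;"
    "D41050;D41051;D42050;D42051;D44050;D44051"
)

codes = activity_codes.split(";")

# in the code lists above, codes ending in 0 are Caltrans Maintenance
# and codes ending in 1 are Clean California
program_by_code = {
    code: "Clean California" if code.endswith("1") else "Caltrans Maintenance"
    for code in codes
}

for code, program in program_by_code.items():
    print(code, "-", program)
```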

#### Data Updates
* I (Noah) pulled data from IMMS on 6/17/2025 for 2024b (7/1/2024 to 12/31/2024) and 2025a (1/1/2025 to 6/16/2025)

### Exercises

*Jupyter*
1. [Intro Guide (DataQuest)][01.08]
2. [Intro Guide (DataCamp)][01.09]
3. [Notebook Intro (Medium)][01.10]
4. [Data Science Tutorial (Jupyter)][01.11]

*Python*
1. [Quick Start][01.12]
2. [Intro Tutorials][01.13]
3. [Quick Start (FCC)][01.14]

*Markdown*
1. [Quick Start (Github)][01.15]
2. [Quick Start Guide (Markdown)][01.16]
3. [Quick Start Tutorial (Markdown)][01.17]

[01.00]: https://en.wikipedia.org/wiki/Unit_testing
[01.01]: https://www.anaconda.com/distribution/
[01.02]: https://medium.com/python-pandemonium/introduction-to-exploratory-data-analysis-in-python-8b6bcb55c190
[01.03]: https://github.com/kadnan/EDA_Python/
[01.04]: https://jupyter.org/jupyter-book/01/what-is-data-science.html
[01.05]: https://codehs.com/info/curriculum/intropython
[01.06]: https://code.org/educate/curriculum/high-school
[01.07]: https://jupyter.org/
[01.08]: https://www.dataquest.io/blog/jupyter-notebook-tutorial/
[01.09]: https://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook
[01.10]: https://towardsdatascience.com/a-beginners-tutorial-to-jupyter-notebooks-1b2f8705888a
[01.11]: https://jupyter.org/jupyter-book/01/what-is-data-science.html
[01.12]: https://www.python.org/about/gettingstarted/
[01.13]: https://realpython.com/learning-paths/python3-introduction/
[01.14]: https://guide.freecodecamp.org/python/
[01.15]: https://guides.github.com/features/mastering-markdown/
[01.16]: https://www.markdownguide.org/getting-started/
[01.17]: https://www.markdowntutorial.com/

In [1]:
# 01.01 - load python modules into notebook

# install pip package in current kernel; run only for initial install:
# https://medium.com/@rohanguptha.bompally/python-data-visualization-using-folium-and-geopandas-981857948f02
# !pip install descartes

import matplotlib.pyplot as plt
import numpy as np

# data analysis modules
import pandas as pd
import scipy

# data visualization modules
import seaborn as sns
from matplotlib.backends.backend_pdf import PdfPages

# for the PDF export
import json
import nbformat
from textwrap import wrap

# Added by Noah to help with importing
import os
from pathlib import Path

# to help import the data; gcsfs is what lets pandas read/write gs:// paths
!pip install gcsfs pandas

# set numeric output; turn off scientific notation
pd.set_option("display.float_format", lambda x: "%.2f" % x)

# adjust print settings
pd.options.display.max_columns = 60
pd.options.display.max_rows = 35

# suppress warning
import warnings

warnings.filterwarnings("ignore")





In [2]:
# Added by NS 4/17/2025
# Identify the path to the Data
gcs_path = (
    "gs://calitp-analytics-data/data-analyses/big_data/clean_california_litter_study/01_source_data/"
)
# Identify the path to the output data
gcs_output_folder = "gs://calitp-analytics-data/data-analyses/big_data/clean_california_litter_study/00_study_created_data"

In [3]:
# 02.01 - data import functions

# A function to pull in a CSV file from DDS's Google Cloud Storage bucket
def load_csv_from_gcs_folder(file_name):
    """
    Load a CSV file from GCS using default credentials, with encoding fallback.

    The base folder is taken from the module-level `gcs_path` variable
    (e.g., 'gs://bucket/folder'); it is not passed as an argument.

    Parameters:
        file_name (str): Name of the CSV file

    Returns:
        pd.DataFrame: Loaded DataFrame or None if load fails
    """
    gcs_uri = f"{gcs_path.rstrip('/')}/{file_name}"
    print(f"Attempting to load: {gcs_uri}")

    try:
        # Try standard UTF-8 first
        df = pd.read_csv(gcs_uri)
        print(f"Loaded CSV with UTF-8 encoding: {gcs_uri}")
        return df
    except UnicodeDecodeError:
        try:
            # Try with Latin-1 fallback if UTF-8 fails
            df = pd.read_csv(gcs_uri, encoding='ISO-8859-1')
            print(f"Loaded CSV with ISO-8859-1 encoding: {gcs_uri}")
            return df
        except Exception as e:
            print(f"Failed to load with ISO-8859-1: {e}")
            return None
    except Exception as e:
        print(f"Error loading CSV from {gcs_uri}: {e}")
        return None


# Added by Noah Sanchez June 2025
# function to write csv file to GCS
# study created data (scd)
def write_scd_data_csv(df, gcs_output_path):
    df.to_csv(gcs_output_path, index=False)


# function to show table info
def data_profile(df, msg):
    # pass in variable into string
    # https://stackoverflow.com/questions/2960772/how-do-i-put-a-variable-inside-a-string
    print("*** Table Info: %s ***" % msg, "\n")
    print(df.info(), "\n")
    print("*** Table Info: Table Dimensions ***", "\n")
    print(df.shape, "\n")
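
The encoding fallback used by `load_csv_from_gcs_folder` can be tried without GCS access. The sketch below is an assumption-labeled stand-in: `read_csv_with_fallback` and the sample bytes are hypothetical, and the data is read from memory rather than from a bucket.

```python
import io

import pandas as pd


def read_csv_with_fallback(raw_bytes):
    """Try UTF-8 first, then ISO-8859-1 (hypothetical helper for illustration)."""
    try:
        return pd.read_csv(io.BytesIO(raw_bytes)), "utf-8"
    except UnicodeDecodeError:
        df = pd.read_csv(io.BytesIO(raw_bytes), encoding="ISO-8859-1")
        return df, "ISO-8859-1"


# n-tilde (U+00F1) encoded as Latin-1 is the single byte 0xF1,
# which is not valid UTF-8 when followed by an ASCII letter
latin1_bytes = "County,Total\nCa\u00f1ada,5\n".encode("ISO-8859-1")

df_demo, used_encoding = read_csv_with_fallback(latin1_bytes)
print(used_encoding, df_demo.loc[0, "County"])
```

ISO-8859-1 maps every byte to a character, so the fallback never fails on decoding; the trade-off is that a file in some other encoding would load with wrong characters rather than raise an error.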

In [4]:
# 02.02 - data processing functions


# function convert col to string type
# https://www.geeksforgeeks.org/python-pandas-series-astype-to-convert-data-type-of-series/
def convert_str(df, col):
    df[col] = df[col].astype(str)
    return df

In [5]:
# 03.00 - data subset and table join functions

# subset dataset by row values; for example, project list by funding source
# https://stackoverflow.com/questions/17071871/how-to-select-rows-from-a-dataframe-based-on-column-values
# df_projects_atp = df_projects[df_projects['SOURCE'].str.contains('ATP')]


# define table join function; merge new and old tables on given column
def join_table(df_left, df_right, col, method, msg):
    df_join = pd.merge(df_left, df_right, on=col, how=method)
    print(msg, "\n")
    print("Before Table Join: ")
    print("Left Table: ", df_left.shape)
    print("Right Table: ", df_right.shape, "\n")
    print("After Table Join: ")
    print("Left + Right Table: ", df_join.shape, "\n")
    return df_join


# define table join function; merge new and old tables with different keys
# https://stackoverflow.com/questions/25888207/pandas-join-dataframes-on-field-with-different-names
def join_table_key(df_left, df_right, id_key, fk_key, msg):
    df_join = pd.merge(
        df_left, df_right, how="left", left_on=[id_key], right_on=[fk_key]
    )
    print(msg, "\n")
    print("Before Table Join: ")
    print("Left Table: ", df_left.shape)
    print("Right Table: ", df_right.shape, "\n")
    print("After Table Join: ")
    print("Left + Right Table: ", df_join.shape, "\n")
    return df_join
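
The join helpers above report table shapes before and after the merge. A complementary check, sketched here with invented tables and column names, uses the `validate` and `indicator` options of `pd.merge` to catch duplicated keys and to list left rows that found no match.

```python
import pandas as pd

# illustrative tables; the column names are assumptions
df_work = pd.DataFrame({"IMMS Unit ID": ["A1", "A2", "A3"], "Hours": [8, 4, 6]})
df_units = pd.DataFrame({"IMMS Unit ID": ["A1", "A2"], "District": [4, 7]})

df_join = pd.merge(
    df_work,
    df_units,
    on="IMMS Unit ID",
    how="left",
    validate="many_to_one",  # raises MergeError if the right-hand key is duplicated
    indicator=True,  # adds a _merge column: both / left_only / right_only
)

# left rows with no matching key in the right table
unmatched = df_join[df_join["_merge"] == "left_only"]
print("Rows before:", len(df_work), "after:", len(df_join))
print("Unmatched left rows:", len(unmatched))
```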

In [6]:
# 05.01.01 - data import/cleaning (imms)


# import imms/litter collection totals as csv datasets from the GCS source folder
df_imms_2021b = load_csv_from_gcs_folder("imms_2021b.csv")
df_imms_2022a = load_csv_from_gcs_folder("imms_2022a.csv")
df_imms_2022b = load_csv_from_gcs_folder("imms_2022b.csv")
df_imms_2023a = load_csv_from_gcs_folder("imms_2023a.csv")
df_imms_2023b = load_csv_from_gcs_folder("imms_2023b.csv")
df_imms_2024a = load_csv_from_gcs_folder("imms_2024a.csv")
df_imms_2024b = load_csv_from_gcs_folder("imms_2024b.csv") # Added by NS 6/17/2025
df_imms_2025a = load_csv_from_gcs_folder("imms_2025a.csv") # Added by NS 6/17/2025


# clean data - IMMS 2021B
# check table dim
print("*** Table Dimensions: Original (IMMS 2021B) ***", "\n")
print(df_imms_2021b.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
# df_imms_2021b = df_imms_2021b[~df_imms_2021b['From PM'].isnull()]
# df_imms_2021b = df_imms_2021b[~df_imms_2021b['To PM'].isnull()]
# check table dim
# print('*** Table Dimensions: Remove null PM (IMMS 2021B) ***', '\n')
# print(df_imms_2021b.shape , '\n')
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_imms_2021b = df_imms_2021b[~df_imms_2021b["Production Quantity"].isnull()]
df_imms_2021b = df_imms_2021b[~df_imms_2021b["Secondary Prod"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null litter prod (IMMS 2021B) ***", "\n")
print(df_imms_2021b.shape, "\n")


# ns_edit_1: This section replaces the previous section.
# Remove rows where 'Production Quantity' or 'Secondary Prod' is less than or equal to 1000
df_imms_2021b = df_imms_2021b[df_imms_2021b["Production Quantity"].astype(int) > 1000]
df_imms_2021b = df_imms_2021b[df_imms_2021b["Secondary Prod"].astype(int) > 1000]

# Check table dimensions
print(
    "*** Table Dimensions: Removed entries with production ≤ 1000 CY (IMMS 2021B) ***\n"
)
print(df_imms_2021b.shape, "\n")


# clean data - IMMS 2022A
# check table dim
print("*** Table Dimensions: Original (IMMS 2022A) ***", "\n")
print(df_imms_2022a.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
# df_imms_2022a = df_imms_2022a[~df_imms_2022a['From PM'].isnull()]
# df_imms_2022a = df_imms_2022a[~df_imms_2022a['To PM'].isnull()]
# check table dim
# print("*** Table Dimensions: Remove null PM (IMMS 2022A) ***", "\n")
# print(df_imms_2022a.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_imms_2022a = df_imms_2022a[~df_imms_2022a["Production Quantity"].isnull()]
df_imms_2022a = df_imms_2022a[~df_imms_2022a["Secondary Prod"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null litter prod (IMMS 2022A) ***", "\n")
print(df_imms_2022a.shape, "\n")


# ns_edit_1: This section replaces the previous section.
# Remove rows where 'Production Quantity' or 'Secondary Prod' is less than or equal to 1000
df_imms_2022a = df_imms_2022a[df_imms_2022a["Production Quantity"].astype(int) > 1000]
df_imms_2022a = df_imms_2022a[df_imms_2022a["Secondary Prod"].astype(int) > 1000]

# Check table dimensions
print(
    "*** Table Dimensions: Removed entries with production ≤ 1000 CY (IMMS 2022A) ***\n"
)
print(df_imms_2022a.shape, "\n")


# clean data - IMMS 2022B
# check table dim
print("*** Table Dimensions: Original (IMMS 2022B) ***", "\n")
print(df_imms_2022b.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
# df_imms_2022b = df_imms_2022b[~df_imms_2022b['From PM'].isnull()]
# df_imms_2022b = df_imms_2022b[~df_imms_2022b['To PM'].isnull()]
# check table dim
# print("*** Table Dimensions: Remove null PM (IMMS 2022B) ***", "\n")
# print(df_imms_2022b.shape, "\n")


# ns_edit_1: This section replaces the previous section.
# Remove rows with null values in 'Production Quantity' and 'Secondary Prod' columns
df_imms_2022b = df_imms_2022b.dropna(subset=["Production Quantity", "Secondary Prod"])

# Check table dimensions
print(
    "*** Table Dimensions: Removed entries with null values in production columns (IMMS 2022B) ***\n"
)
print(df_imms_2022b.shape, "\n")


# ns_edit_1: This section replaces the previous section.
# Remove rows where 'Production Quantity' or 'Secondary Prod' is less than or equal to 1000
df_imms_2022b = df_imms_2022b[df_imms_2022b["Production Quantity"].astype(int) > 1000]
df_imms_2022b = df_imms_2022b[df_imms_2022b["Secondary Prod"].astype(int) > 1000]

# Check table dimensions
print(
    "*** Table Dimensions: Removed entries with production ≤ 1000 CY (IMMS 2022B) ***\n"
)
print(df_imms_2022b.shape, "\n")


# clean data - IMMS 2023A
# check table dim
print("*** Table Dimensions: Original (IMMS 2023A) ***", "\n")
print(df_imms_2023a.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
# df_imms_2023a = df_imms_2023a[~df_imms_2023a['From PM'].isnull()]
# df_imms_2023a = df_imms_2023a[~df_imms_2023a['To PM'].isnull()]
# check table dim
# print("*** Table Dimensions: Remove null PM (IMMS 2023A) ***", "\n")
# print(df_imms_2023a.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_imms_2023a = df_imms_2023a[~df_imms_2023a["Production Quantity"].isnull()]
df_imms_2023a = df_imms_2023a[~df_imms_2023a["Secondary Prod"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null litter prod (IMMS 2023A) ***", "\n")
print(df_imms_2023a.shape, "\n")


# ns_edit_1: This section replaces the previous section.
# Remove rows where 'Production Quantity' or 'Secondary Prod' is less than or equal to 1000
df_imms_2023a = df_imms_2023a[df_imms_2023a["Production Quantity"].astype(int) > 1000]
df_imms_2023a = df_imms_2023a[df_imms_2023a["Secondary Prod"].astype(int) > 1000]

# Check table dimensions
print(
    "*** Table Dimensions: Removed entries with production ≤ 1000 CY (IMMS 2023A) ***\n"
)
print(df_imms_2023a.shape, "\n")


# clean data - IMMS 2023B
# check table dim
print("*** Table Dimensions: Original (IMMS 2023B) ***", "\n")
print(df_imms_2023b.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
# df_imms_2023b = df_imms_2023b[~df_imms_2023b['From PM'].isnull()]
# df_imms_2023b = df_imms_2023b[~df_imms_2023b['To PM'].isnull()]
# check table dim
# print("*** Table Dimensions: Remove null PM (IMMS 2023B) ***", "\n")
# print(df_imms_2023b.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_imms_2023b = df_imms_2023b[~df_imms_2023b["Production Quantity"].isnull()]
df_imms_2023b = df_imms_2023b[~df_imms_2023b["Secondary Prod"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null litter prod (IMMS 2023B) ***", "\n")
print(df_imms_2023b.shape, "\n")


# ns_edit_1: This section replaces the previous section.
# Remove rows where 'Production Quantity' or 'Secondary Prod' is less than or equal to 1000
df_imms_2023b = df_imms_2023b[df_imms_2023b["Production Quantity"].astype(int) > 1000]
df_imms_2023b = df_imms_2023b[df_imms_2023b["Secondary Prod"].astype(int) > 1000]

# Check table dimensions
print(
    "*** Table Dimensions: Removed entries with production ≤ 1000 CY (IMMS 2023B) ***\n"
)
print(df_imms_2023b.shape, "\n")






# Added by NS to clean the newly added dataframes
def drop_last_three_columns(df):
    """
    Drops columns at zero-based positions 32, 33, and 34 from the DataFrame, if they exist.
    These are the last three columns only when the table has exactly 35 columns.

    Args:
        df (pd.DataFrame): Input DataFrame.

    Returns:
        pd.DataFrame: A copy of the DataFrame with specified columns removed.
    """
    cols_to_drop = df.columns[32:35]  # positions 32, 33, 34
    return df.drop(columns=cols_to_drop)


# remove the last three columns in the dataframe
df_imms_2024b = drop_last_three_columns(df_imms_2024b)
df_imms_2025a = drop_last_three_columns(df_imms_2025a)





# Added by NS April 2025
# export final csv file
write_scd_data_csv(df_imms_2021b, f"{gcs_output_folder}/df_imms_2021b.csv")
write_scd_data_csv(df_imms_2022a, f"{gcs_output_folder}/df_imms_2022a.csv")
write_scd_data_csv(df_imms_2022b, f"{gcs_output_folder}/df_imms_2022b.csv")
write_scd_data_csv(df_imms_2023a, f"{gcs_output_folder}/df_imms_2023a.csv")
write_scd_data_csv(df_imms_2023b, f"{gcs_output_folder}/df_imms_2023b.csv")
write_scd_data_csv(df_imms_2024a, f"{gcs_output_folder}/df_imms_2024a.csv")
write_scd_data_csv(df_imms_2024b, f"{gcs_output_folder}/df_imms_2024b.csv") # Added by NS 6/17/2025
write_scd_data_csv(df_imms_2025a, f"{gcs_output_folder}/df_imms_2025a.csv") # Added by NS 6/17/2025

Attempting to load: gs://calitp-analytics-data/data-analyses/big_data/clean_california_litter_study/01_source_data/imms_2021b.csv
Loaded CSV with UTF-8 encoding: gs://calitp-analytics-data/data-analyses/big_data/clean_california_litter_study/01_source_data/imms_2021b.csv
Attempting to load: gs://calitp-analytics-data/data-analyses/big_data/clean_california_litter_study/01_source_data/imms_2022a.csv
Loaded CSV with UTF-8 encoding: gs://calitp-analytics-data/data-analyses/big_data/clean_california_litter_study/01_source_data/imms_2022a.csv
Attempting to load: gs://calitp-analytics-data/data-analyses/big_data/clean_california_litter_study/01_source_data/imms_2022b.csv
Loaded CSV with UTF-8 encoding: gs://calitp-analytics-data/data-analyses/big_data/clean_california_litter_study/01_source_data/imms_2022b.csv
Attempting to load: gs://calitp-analytics-data/data-analyses/big_data/clean_california_litter_study/01_source_data/imms_2023a.csv
Loaded CSV with UTF-8 encoding: gs://calitp-analytics-
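
The production filter in the IMMS cell converts values with `astype(int)`, which raises an error on nulls or on strings such as "1500.0". One possible alternative, shown here as a sketch rather than as the notebook's method, coerces with `pd.to_numeric` and applies both conditions in one mask. The helper name and the sample rows are hypothetical; the 1000 threshold and the column names are taken from the cell above.

```python
import pandas as pd


def filter_production(df, threshold=1000):
    """Keep rows where both production columns exceed the threshold (sketch)."""
    # to_numeric with errors="coerce" is an assumption; unparseable values become NaN
    qty = pd.to_numeric(df["Production Quantity"], errors="coerce")
    sec = pd.to_numeric(df["Secondary Prod"], errors="coerce")
    # comparisons with NaN are False, so null rows are dropped as well
    return df[(qty > threshold) & (sec > threshold)]


# illustrative rows standing in for one IMMS period
demo = pd.DataFrame(
    {
        "Production Quantity": ["1500.0", "900", "2000", None],
        "Secondary Prod": [1200, 5000, 1001, 3000],
    }
)
demo_filtered = filter_production(demo)
print(demo.shape, "->", demo_filtered.shape)
```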

# CSR Data? Do I need to update?

In [None]:
# 05.01.02 - data import/cleaning (csr)

# Added by NS 4/17/2025
# Import CSR data as CSV datasets
df_csr_2021b = load_csv_from_gcs_folder("csr_litter_aah_2021b.csv")
df_csr_2022a = load_csv_from_gcs_folder("csr_litter_aah_2022a.csv")
df_csr_2022b = load_csv_from_gcs_folder("csr_litter_aah_2022b.csv")
df_csr_2023a = load_csv_from_gcs_folder("csr_litter_aah_2023a.csv")
df_csr_2023b = load_csv_from_gcs_folder("csr_litter_aah_2023b.csv")
df_csr_2024a = load_csv_from_gcs_folder("csr_litter_aah_2024a.csv")





# clean data - CSR 2021B
# check table dim
print("*** Table Dimensions: Original (CSR 2021B) ***", "\n")
print(df_csr_2021b.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_csr_2021b = df_csr_2021b[~df_csr_2021b["Date Opened"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null date opened (CSR 2021B) ***", "\n")
print(df_csr_2021b.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_csr_2021b = df_csr_2021b[~df_csr_2021b["Latitude"].isnull()]
df_csr_2021b = df_csr_2021b[~df_csr_2021b["Longitude"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null lat/long (CSR 2021B) ***", "\n")
print(df_csr_2021b.shape, "\n")

# clean data - CSR 2022A
# check table dim
print("*** Table Dimensions: Original (CSR 2022A) ***", "\n")
print(df_csr_2022a.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_csr_2022a = df_csr_2022a[~df_csr_2022a["Date Opened"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null date opened (CSR 2022A) ***", "\n")
print(df_csr_2022a.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_csr_2022a = df_csr_2022a[~df_csr_2022a["Latitude"].isnull()]
df_csr_2022a = df_csr_2022a[~df_csr_2022a["Longitude"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null lat/long (CSR 2022A) ***", "\n")
print(df_csr_2022a.shape, "\n")

# clean data - CSR 2022B
# check table dim
print("*** Table Dimensions: Original (CSR 2022B) ***", "\n")
print(df_csr_2022b.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_csr_2022b = df_csr_2022b[~df_csr_2022b["Date Opened"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null date opened (CSR 2022B) ***", "\n")
print(df_csr_2022b.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_csr_2022b = df_csr_2022b[~df_csr_2022b["Latitude"].isnull()]
df_csr_2022b = df_csr_2022b[~df_csr_2022b["Longitude"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null lat/long (CSR 2022B) ***", "\n")
print(df_csr_2022b.shape, "\n")

# clean data - CSR 2023A
# check table dim
print("*** Table Dimensions: Original (CSR 2023A) ***", "\n")
print(df_csr_2023a.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_csr_2023a = df_csr_2023a[~df_csr_2023a["Date Opened"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null date opened (CSR 2023A) ***", "\n")
print(df_csr_2023a.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_csr_2023a = df_csr_2023a[~df_csr_2023a["Latitude"].isnull()]
df_csr_2023a = df_csr_2023a[~df_csr_2023a["Longitude"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null lat/long (CSR 2023A) ***", "\n")
print(df_csr_2023a.shape, "\n")

# clean data - CSR 2023B
# check table dim
print("*** Table Dimensions: Original (CSR 2023B) ***", "\n")
print(df_csr_2023b.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_csr_2023b = df_csr_2023b[~df_csr_2023b["Date Opened"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null date opened (CSR 2023B) ***", "\n")
print(df_csr_2023b.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_csr_2023b = df_csr_2023b[~df_csr_2023b["Latitude"].isnull()]
df_csr_2023b = df_csr_2023b[~df_csr_2023b["Longitude"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null lat/long (CSR 2023B) ***", "\n")
print(df_csr_2023b.shape, "\n")

# clean data - CSR 2024A
# check table dim
print("*** Table Dimensions: Original (CSR 2024A) ***", "\n")
print(df_csr_2024a.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_csr_2024a = df_csr_2024a[~df_csr_2024a["Date Opened"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null date opened (CSR 2024A) ***", "\n")
print(df_csr_2024a.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_csr_2024a = df_csr_2024a[~df_csr_2024a["Latitude"].isnull()]
df_csr_2024a = df_csr_2024a[~df_csr_2024a["Longitude"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null lat/long (CSR 2024A) ***", "\n")
print(df_csr_2024a.shape, "\n")



# Added by NS
# export final csv file
write_scd_data_csv(df_csr_2021b, f"{gcs_output_folder}/05.01.02_data_clean_csr_2021b.csv")
write_scd_data_csv(df_csr_2022a, f"{gcs_output_folder}/05.01.02_data_clean_csr_2022a.csv")
write_scd_data_csv(df_csr_2022b, f"{gcs_output_folder}/05.01.02_data_clean_csr_2022b.csv")
write_scd_data_csv(df_csr_2023a, f"{gcs_output_folder}/05.01.02_data_clean_csr_2023a.csv")
write_scd_data_csv(df_csr_2023b, f"{gcs_output_folder}/05.01.02_data_clean_csr_2023b.csv")
write_scd_data_csv(df_csr_2024a, f"{gcs_output_folder}/05.01.02_data_clean_csr_2024a.csv")

Attempting to load: gs://calitp-analytics-data/data-analyses/big_data/clean_california_litter_study/01_source_data/csr_litter_aah_2021b.csv
Loaded CSV with UTF-8 encoding: gs://calitp-analytics-data/data-analyses/big_data/clean_california_litter_study/01_source_data/csr_litter_aah_2021b.csv
Attempting to load: gs://calitp-analytics-data/data-analyses/big_data/clean_california_litter_study/01_source_data/csr_litter_aah_2022a.csv
Loaded CSV with UTF-8 encoding: gs://calitp-analytics-data/data-analyses/big_data/clean_california_litter_study/01_source_data/csr_litter_aah_2022a.csv
Attempting to load: gs://calitp-analytics-data/data-analyses/big_data/clean_california_litter_study/01_source_data/csr_litter_aah_2022b.csv
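
The six CSR blocks above repeat the same two null-removal steps. A minimal sketch of how they could be expressed once is shown below; `drop_nulls_stepwise`, `CSR_STEPS`, and the demo rows are hypothetical names and data, while the column names match the CSR cell.

```python
import pandas as pd


def drop_nulls_stepwise(df, steps, label):
    """Apply null-removal steps in order and print the shape after each (sketch)."""
    print(f"*** Table Dimensions: Original ({label}) ***", df.shape)
    for description, cols in steps:
        df = df.dropna(subset=cols)
        print(f"*** Table Dimensions: Remove null {description} ({label}) ***", df.shape)
    return df


# same steps, in the same order, as the CSR blocks above
CSR_STEPS = [
    ("date opened", ["Date Opened"]),
    ("lat/long", ["Latitude", "Longitude"]),
]

# illustrative rows only
demo_csr = pd.DataFrame(
    {
        "Date Opened": ["07/01/2021", None, "07/03/2021"],
        "Latitude": [37.3, 38.5, None],
        "Longitude": [-121.9, -121.5, -118.2],
    }
)
demo_clean = drop_nulls_stepwise(demo_csr, CSR_STEPS, "CSR DEMO")
```

In the notebook, the same call could be applied to each period's table in a loop, which keeps the printed messages consistent across periods.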


In [None]:
# 05.01.03 - data import/cleaning (los)

# Added by NS 4/17/2025
# Import LOS scores data as CSV datasets
df_los_2023a_d1 = load_csv_from_gcs_folder("los_scores_2023a_d1.csv")
df_los_2023a_d2 = load_csv_from_gcs_folder("los_scores_2023a_d2.csv")
df_los_2023a_d3 = load_csv_from_gcs_folder("los_scores_2023a_d3.csv")
df_los_2023a_d4 = load_csv_from_gcs_folder("los_scores_2023a_d4.csv")
df_los_2023a_d5 = load_csv_from_gcs_folder("los_scores_2023a_d5.csv")
df_los_2023a_d6 = load_csv_from_gcs_folder("los_scores_2023a_d6.csv")
df_los_2023a_d7 = load_csv_from_gcs_folder("los_scores_2023a_d7.csv")
df_los_2023a_d8 = load_csv_from_gcs_folder("los_scores_2023a_d8.csv")
df_los_2023a_d9 = load_csv_from_gcs_folder("los_scores_2023a_d9.csv")
df_los_2023a_d10 = load_csv_from_gcs_folder("los_scores_2023a_d10.csv")
df_los_2023a_d11 = load_csv_from_gcs_folder("los_scores_2023a_d11.csv")
df_los_2023a_d12 = load_csv_from_gcs_folder("los_scores_2023a_d12.csv")





# Import LOS scores (raw) as CSV datasets
df_los_all_d1 = load_csv_from_gcs_folder("los_scores_raw_d1.csv")
df_los_all_d2 = load_csv_from_gcs_folder("los_scores_raw_d2.csv")
df_los_all_d3 = load_csv_from_gcs_folder("los_scores_raw_d3.csv")
df_los_all_d4 = load_csv_from_gcs_folder("los_scores_raw_d4.csv")
df_los_all_d5 = load_csv_from_gcs_folder("los_scores_raw_d5.csv")
df_los_all_d6 = load_csv_from_gcs_folder("los_scores_raw_d6.csv")
df_los_all_d7 = load_csv_from_gcs_folder("los_scores_raw_d7.csv")
df_los_all_d8 = load_csv_from_gcs_folder("los_scores_raw_d8.csv")
df_los_all_d9 = load_csv_from_gcs_folder("los_scores_raw_d9.csv")
df_los_all_d10 = load_csv_from_gcs_folder("los_scores_raw_d10.csv")
df_los_all_d11 = load_csv_from_gcs_folder("los_scores_raw_d11.csv")
df_los_all_d12 = load_csv_from_gcs_folder("los_scores_raw_d12.csv")


# profile the raw LOS score tables (column info and dimensions) for each district
data_profile(df_los_all_d1, "Data Profile: LOS - d1")
data_profile(df_los_all_d2, "Data Profile: LOS - d2")
data_profile(df_los_all_d3, "Data Profile: LOS - d3")
data_profile(df_los_all_d4, "Data Profile: LOS - d4")
data_profile(df_los_all_d5, "Data Profile: LOS - d5")
data_profile(df_los_all_d6, "Data Profile: LOS - d6")
data_profile(df_los_all_d7, "Data Profile: LOS - d7")
data_profile(df_los_all_d8, "Data Profile: LOS - d8")
data_profile(df_los_all_d9, "Data Profile: LOS - d9")
data_profile(df_los_all_d10, "Data Profile: LOS - d10")
data_profile(df_los_all_d11, "Data Profile: LOS - d11")
data_profile(df_los_all_d12, "Data Profile: LOS - d12")
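
The twelve per-district loads follow one file-name pattern, so they could also be collected into a dictionary keyed by district number. The sketch below assumes a hypothetical helper `load_by_district` and uses a stand-in loader so it runs without GCS access; in the notebook, `load_csv_from_gcs_folder` would be passed in its place.

```python
import pandas as pd


def load_by_district(loader, pattern, districts=range(1, 13)):
    """Load one table per district into a dict keyed by district number (sketch)."""
    return {d: loader(pattern.format(d=d)) for d in districts}


# stand-in loader so the sketch runs without GCS access (hypothetical)
def fake_loader(file_name):
    return pd.DataFrame({"source_file": [file_name]})


los_raw = load_by_district(fake_loader, "los_scores_raw_d{d}.csv")
print(len(los_raw), los_raw[12].loc[0, "source_file"])
```

A dictionary also makes it straightforward to profile or clean every district in a loop instead of repeating one line per district.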

In [None]:


# clean data - los_2023a_d1
# check table dim
print("*** Table Dimensions: Original (los_2023a_d1) ***", "\n")
print(df_los_2023a_d1.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_los_2023a_d1 = df_los_2023a_d1[~df_los_2023a_d1["CO"].isnull()]
df_los_2023a_d1 = df_los_2023a_d1[~df_los_2023a_d1["RTE"].isnull()]
# check table dim
# print('*** Table Dimensions: Remove null dist/co/rte (los_2023a_d1) ***', '\n')
# print(df_los_2023a_d1.shape , '\n')

# clean data - los_2023a_d2
# check table dim
print("*** Table Dimensions: Original (los_2023a_d2) ***", "\n")
print(df_los_2023a_d2.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_los_2023a_d2 = df_los_2023a_d2[~df_los_2023a_d2["CO"].isnull()]
df_los_2023a_d2 = df_los_2023a_d2[~df_los_2023a_d2["RTE"].isnull()]
# check table dim
# print('*** Table Dimensions: Remove null dist/co/rte (los_2023a_d2) ***', '\n')
# print(df_los_2023a_d2.shape , '\n')

# clean data - los_2023a_d3
# check table dim
print("*** Table Dimensions: Original (los_2023a_d3) ***", "\n")
print(df_los_2023a_d3.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_los_2023a_d3 = df_los_2023a_d3[~df_los_2023a_d3["CO"].isnull()]
df_los_2023a_d3 = df_los_2023a_d3[~df_los_2023a_d3["RTE"].isnull()]
# check table dim
# print('*** Table Dimensions: Remove null dist/co/rte (los_2023a_d3) ***', '\n')
# print(df_los_2023a_d3.shape , '\n')

# clean data - los_2023a_d4
# check table dim
print("*** Table Dimensions: Original (los_2023a_d4) ***", "\n")
print(df_los_2023a_d4.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_los_2023a_d4 = df_los_2023a_d4[~df_los_2023a_d4["CO"].isnull()]
df_los_2023a_d4 = df_los_2023a_d4[~df_los_2023a_d4["RTE"].isnull()]
# check table dim
# print('*** Table Dimensions: Remove null dist/co/rte (los_2023a_d4) ***', '\n')
# print(df_los_2023a_d4.shape , '\n')

# clean data - los_2023a_d5
# check table dim
print("*** Table Dimensions: Original (los_2023a_d5) ***", "\n")
print(df_los_2023a_d5.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_los_2023a_d5 = df_los_2023a_d5[~df_los_2023a_d5["CO"].isnull()]
df_los_2023a_d5 = df_los_2023a_d5[~df_los_2023a_d5["RTE"].isnull()]
# check table dim
# print('*** Table Dimensions: Remove null dist/co/rte (los_2023a_d5) ***', '\n')
# print(df_los_2023a_d5.shape , '\n')

# clean data - los_2023a_d6
# check table dim
print("*** Table Dimensions: Original (los_2023a_d6) ***", "\n")
print(df_los_2023a_d6.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_los_2023a_d6 = df_los_2023a_d6[~df_los_2023a_d6["CO"].isnull()]
df_los_2023a_d6 = df_los_2023a_d6[~df_los_2023a_d6["RTE"].isnull()]
# check table dim
# print('*** Table Dimensions: Remove null dist/co/rte (los_2023a_d6) ***', '\n')
# print(df_los_2023a_d6.shape , '\n')

# clean data - los_2023a_d7
# check table dim
print("*** Table Dimensions: Original (los_2023a_d7) ***", "\n")
print(df_los_2023a_d7.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_los_2023a_d7 = df_los_2023a_d7[~df_los_2023a_d7["CO"].isnull()]
df_los_2023a_d7 = df_los_2023a_d7[~df_los_2023a_d7["RTE"].isnull()]
# check table dim
# print('*** Table Dimensions: Remove null dist/co/rte (los_2023a_d7) ***', '\n')
# print(df_los_2023a_d7.shape , '\n')

# clean data - los_2023a_d8
# check table dim
print("*** Table Dimensions: Original (los_2023a_d8) ***", "\n")
print(df_los_2023a_d8.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_los_2023a_d8 = df_los_2023a_d8[~df_los_2023a_d8["CO"].isnull()]
df_los_2023a_d8 = df_los_2023a_d8[~df_los_2023a_d8["RTE"].isnull()]
# check table dim
# print('*** Table Dimensions: Remove null dist/co/rte (los_2023a_d8) ***', '\n')
# print(df_los_2023a_d8.shape , '\n')

# clean data - los_2023a_d9
# check table dim
print("*** Table Dimensions: Original (los_2023a_d9) ***", "\n")
print(df_los_2023a_d9.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_los_2023a_d9 = df_los_2023a_d9[~df_los_2023a_d9["CO"].isnull()]
df_los_2023a_d9 = df_los_2023a_d9[~df_los_2023a_d9["RTE"].isnull()]
# check table dim
# print('*** Table Dimensions: Remove null dist/co/rte (los_2023a_d9) ***', '\n')
# print(df_los_2023a_d9.shape , '\n')

# clean data - los_2023a_d10
# check table dim
print("*** Table Dimensions: Original (los_2023a_d10) ***", "\n")
print(df_los_2023a_d10.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_los_2023a_d10 = df_los_2023a_d10[~df_los_2023a_d10["CO"].isnull()]
df_los_2023a_d10 = df_los_2023a_d10[~df_los_2023a_d10["RTE"].isnull()]
# check table dim
# print('*** Table Dimensions: Remove null dist/co/rte (los_2023a_d10) ***', '\n')
# print(df_los_2023a_d10.shape , '\n')

# clean data - los_2023a_d11
# check table dim
print("*** Table Dimensions: Original (los_2023a_d11) ***", "\n")
print(df_los_2023a_d11.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_los_2023a_d11 = df_los_2023a_d11[~df_los_2023a_d11["CO"].isnull()]
df_los_2023a_d11 = df_los_2023a_d11[~df_los_2023a_d11["RTE"].isnull()]
# check table dim
# print('*** Table Dimensions: Remove null dist/co/rte (los_2023a_d11) ***', '\n')
# print(df_los_2023a_d11.shape , '\n')

# clean data - los_2023a_d12
# check table dim
print("*** Table Dimensions: Original (los_2023a_d12) ***", "\n")
print(df_los_2023a_d12.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_los_2023a_d12 = df_los_2023a_d12[~df_los_2023a_d12["CO"].isnull()]
df_los_2023a_d12 = df_los_2023a_d12[~df_los_2023a_d12["RTE"].isnull()]
# check table dim
# print('*** Table Dimensions: Remove null dist/co/rte (los_2023a_d12) ***', '\n')
# print(df_los_2023a_d12.shape , '\n')
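
Each block above removes rows with a null `CO` or `RTE` and prints the table dimensions. A minimal sketch of the same step as one function, assuming pandas is available. `drop_null_keys` and the sample rows are hypothetical; `DataFrame.dropna(subset=...)` is used here in place of the two boolean masks and should keep the same rows.

```python
import pandas as pd


def drop_null_keys(df, label, key_columns=("CO", "RTE")):
    """Drop rows with a null in any key column and report the shapes."""
    # Fail early with a clear message if the loader did not return a table.
    if not isinstance(df, pd.DataFrame):
        raise TypeError(f"{label}: expected a DataFrame, got {type(df).__name__}")
    print(f"*** Table Dimensions: Original ({label}) ***", df.shape)
    cleaned = df.dropna(subset=list(key_columns))
    print(f"*** Table Dimensions: Remove null co/rte ({label}) ***", cleaned.shape)
    return cleaned


# Made-up sample rows for illustration only.
sample = pd.DataFrame(
    {
        "CO": ["SAC", None, "ALA", "CC"],
        "RTE": [5.0, 50.0, None, 4.0],
        "SCORE": [80, 75, 90, 60],
    }
)
cleaned = drop_null_keys(sample, "sample")
```

The type check is one possible way to surface a failed import as a readable error before any column is accessed.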

In [None]:


# export final csv file
write_scd_data_csv(df_los_2023a_d1, f"{gcs_output_folder}/05.01.03_data_clean_los_2023a_d1.csv")
write_scd_data_csv(df_los_2023a_d2, f"{gcs_output_folder}/05.01.03_data_clean_los_2023a_d2.csv")
write_scd_data_csv(df_los_2023a_d3, f"{gcs_output_folder}/05.01.03_data_clean_los_2023a_d3.csv")
write_scd_data_csv(df_los_2023a_d4, f"{gcs_output_folder}/05.01.03_data_clean_los_2023a_d4.csv")
write_scd_data_csv(df_los_2023a_d5, f"{gcs_output_folder}/05.01.03_data_clean_los_2023a_d5.csv")
write_scd_data_csv(df_los_2023a_d6, f"{gcs_output_folder}/05.01.03_data_clean_los_2023a_d6.csv")
write_scd_data_csv(df_los_2023a_d7, f"{gcs_output_folder}/05.01.03_data_clean_los_2023a_d7.csv")
write_scd_data_csv(df_los_2023a_d8, f"{gcs_output_folder}/05.01.03_data_clean_los_2023a_d8.csv")
write_scd_data_csv(df_los_2023a_d9, f"{gcs_output_folder}/05.01.03_data_clean_los_2023a_d9.csv")
write_scd_data_csv(df_los_2023a_d10, f"{gcs_output_folder}/05.01.03_data_clean_los_2023a_d10.csv")
write_scd_data_csv(df_los_2023a_d11, f"{gcs_output_folder}/05.01.03_data_clean_los_2023a_d11.csv")
write_scd_data_csv(df_los_2023a_d12, f"{gcs_output_folder}/05.01.03_data_clean_los_2023a_d12.csv")
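
The export calls above differ only in the district number. A sketch of building the output paths in a loop, assuming the file naming shown above. `export_paths`, the placeholder folder, and the `frames` dictionary mentioned in the comment are hypothetical.

```python
def export_paths(output_folder, dataset, districts=range(1, 13)):
    """Return {district_number: output_path} following the notebook's naming."""
    return {
        d: f"{output_folder}/05.01.03_data_clean_{dataset}_d{d}.csv"
        for d in districts
    }


# "gs://example-bucket/output" is a placeholder, not the project's real folder.
paths = export_paths("gs://example-bucket/output", "los_2023a")
for district, path in paths.items():
    # In the notebook this could be: write_scd_data_csv(frames[district], path)
    print(district, path)
```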

In [None]:
# clean data - los_all_d1
# check table dim
print("*** Table Dimensions: Original (los_all_d1) ***", "\n")
print(df_los_all_d1.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_los_all_d1 = df_los_all_d1[~df_los_all_d1["CO"].isnull()]
df_los_all_d1 = df_los_all_d1[~df_los_all_d1["RTE"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null dist/co/rte (los_all_d1) ***", "\n")
print(df_los_all_d1.shape, "\n")

# clean data - los_all_d2
# check table dim
print("*** Table Dimensions: Original (los_all_d2) ***", "\n")
print(df_los_all_d2.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_los_all_d2 = df_los_all_d2[~df_los_all_d2["CO"].isnull()]
df_los_all_d2 = df_los_all_d2[~df_los_all_d2["RTE"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null dist/co/rte (los_all_d2) ***", "\n")
print(df_los_all_d2.shape, "\n")

# clean data - los_all_d3
# check table dim
print("*** Table Dimensions: Original (los_all_d3) ***", "\n")
print(df_los_all_d3.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_los_all_d3 = df_los_all_d3[~df_los_all_d3["CO"].isnull()]
df_los_all_d3 = df_los_all_d3[~df_los_all_d3["RTE"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null dist/co/rte (los_all_d3) ***", "\n")
print(df_los_all_d3.shape, "\n")

# clean data - los_all_d4
# check table dim
print("*** Table Dimensions: Original (los_all_d4) ***", "\n")
print(df_los_all_d4.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_los_all_d4 = df_los_all_d4[~df_los_all_d4["CO"].isnull()]
df_los_all_d4 = df_los_all_d4[~df_los_all_d4["RTE"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null dist/co/rte (los_all_d4) ***", "\n")
print(df_los_all_d4.shape, "\n")

# clean data - los_all_d5
# check table dim
print("*** Table Dimensions: Original (los_all_d5) ***", "\n")
print(df_los_all_d5.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_los_all_d5 = df_los_all_d5[~df_los_all_d5["CO"].isnull()]
df_los_all_d5 = df_los_all_d5[~df_los_all_d5["RTE"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null dist/co/rte (los_all_d5) ***", "\n")
print(df_los_all_d5.shape, "\n")

# clean data - los_all_d6
# check table dim
print("*** Table Dimensions: Original (los_all_d6) ***", "\n")
print(df_los_all_d6.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_los_all_d6 = df_los_all_d6[~df_los_all_d6["CO"].isnull()]
df_los_all_d6 = df_los_all_d6[~df_los_all_d6["RTE"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null dist/co/rte (los_all_d6) ***", "\n")
print(df_los_all_d6.shape, "\n")







# clean data - los_all_d7
# NOTE: removing null CO/RTE rows raised an error for this district, so the
# null filter is disabled below as a workaround. The error indicated that
# df_los_all_d7 was not a DataFrame, so the root cause may be elsewhere in
# the script and still needs to be confirmed.
# check table dim
print("*** Table Dimensions: Original (los_all_d7) ***", "\n")
print(df_los_all_d7.shape, "\n")
# remove null values from given column (disabled, see note above)
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
# df_los_all_d7 = df_los_all_d7[~df_los_all_d7["CO"].isnull()]
# df_los_all_d7 = df_los_all_d7[~df_los_all_d7["RTE"].isnull()]
# check table dim
print("*** Table Dimensions: Null removal skipped (los_all_d7) ***", "\n")
print(df_los_all_d7.shape, "\n")







# clean data - los_all_d8
# check table dim
print("*** Table Dimensions: Original (los_all_d8) ***", "\n")
print(df_los_all_d8.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_los_all_d8 = df_los_all_d8[~df_los_all_d8["CO"].isnull()]
df_los_all_d8 = df_los_all_d8[~df_los_all_d8["RTE"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null dist/co/rte (los_all_d8) ***", "\n")
print(df_los_all_d8.shape, "\n")

# clean data - los_all_d9
# check table dim
print("*** Table Dimensions: Original (los_all_d9) ***", "\n")
print(df_los_all_d9.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_los_all_d9 = df_los_all_d9[~df_los_all_d9["CO"].isnull()]
df_los_all_d9 = df_los_all_d9[~df_los_all_d9["RTE"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null dist/co/rte (los_all_d9) ***", "\n")
print(df_los_all_d9.shape, "\n")

# clean data - los_all_d10
# check table dim
print("*** Table Dimensions: Original (los_all_d10) ***", "\n")
print(df_los_all_d10.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_los_all_d10 = df_los_all_d10[~df_los_all_d10["CO"].isnull()]
df_los_all_d10 = df_los_all_d10[~df_los_all_d10["RTE"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null dist/co/rte (los_all_d10) ***", "\n")
print(df_los_all_d10.shape, "\n")

# clean data - los_all_d11
# check table dim
print("*** Table Dimensions: Original (los_all_d11) ***", "\n")
print(df_los_all_d11.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_los_all_d11 = df_los_all_d11[~df_los_all_d11["CO"].isnull()]
df_los_all_d11 = df_los_all_d11[~df_los_all_d11["RTE"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null dist/co/rte (los_all_d11) ***", "\n")
print(df_los_all_d11.shape, "\n")

# clean data - los_all_d12
# check table dim
print("*** Table Dimensions: Original (los_all_d12) ***", "\n")
print(df_los_all_d12.shape, "\n")
# remove null values from given column
# https://stackoverflow.com/questions/44548721/remove-row-with-null-value-from-pandas-data-frame
df_los_all_d12 = df_los_all_d12[~df_los_all_d12["CO"].isnull()]
df_los_all_d12 = df_los_all_d12[~df_los_all_d12["RTE"].isnull()]
# check table dim
print("*** Table Dimensions: Remove null dist/co/rte (los_all_d12) ***", "\n")
print(df_los_all_d12.shape, "\n")


# export final csv file
write_scd_data_csv(df_los_all_d1, f"{gcs_output_folder}/05.01.03_data_clean_los_all_d1.csv")
write_scd_data_csv(df_los_all_d2, f"{gcs_output_folder}/05.01.03_data_clean_los_all_d2.csv")
write_scd_data_csv(df_los_all_d3, f"{gcs_output_folder}/05.01.03_data_clean_los_all_d3.csv")
write_scd_data_csv(df_los_all_d4, f"{gcs_output_folder}/05.01.03_data_clean_los_all_d4.csv")
write_scd_data_csv(df_los_all_d5, f"{gcs_output_folder}/05.01.03_data_clean_los_all_d5.csv")
write_scd_data_csv(df_los_all_d6, f"{gcs_output_folder}/05.01.03_data_clean_los_all_d6.csv")
write_scd_data_csv(df_los_all_d7, f"{gcs_output_folder}/05.01.03_data_clean_los_all_d7.csv")
write_scd_data_csv(df_los_all_d8, f"{gcs_output_folder}/05.01.03_data_clean_los_all_d8.csv")
write_scd_data_csv(df_los_all_d9, f"{gcs_output_folder}/05.01.03_data_clean_los_all_d9.csv")
write_scd_data_csv(df_los_all_d10, f"{gcs_output_folder}/05.01.03_data_clean_los_all_d10.csv")
write_scd_data_csv(df_los_all_d11, f"{gcs_output_folder}/05.01.03_data_clean_los_all_d11.csv")
write_scd_data_csv(df_los_all_d12, f"{gcs_output_folder}/05.01.03_data_clean_los_all_d12.csv")
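
The IMMS analysis cell that follows filters each reporting period by `IMMS Unit ID` and concatenates the results, one corridor at a time. A minimal sketch of that filter-then-concatenate pattern for several corridors at once, assuming pandas is available. `hotspot_frames`, the `frames_by_period` dictionary, and the sample rows are hypothetical.

```python
import pandas as pd


def hotspot_frames(frames_by_period, unit_ids, id_column="IMMS Unit ID"):
    """Return {unit_id: rows for that unit from every period, stacked vertically}."""
    result = {}
    for unit_id in unit_ids:
        # One filtered frame per period, in the dictionary's insertion order.
        parts = [df[df[id_column] == unit_id] for df in frames_by_period.values()]
        result[unit_id] = pd.concat(parts, axis=0)
    return result


# Made-up rows for illustration only.
frames_by_period = {
    "2024b": pd.DataFrame(
        {"IMMS Unit ID": ["03-SAC-005", "04-ALA-880"], "Cost": [10.0, 20.0]}
    ),
    "2025a": pd.DataFrame(
        {"IMMS Unit ID": ["03-SAC-005", "03-SAC-050"], "Cost": [5.0, 7.5]}
    ),
}
hotspots = hotspot_frames(frames_by_period, ["03-SAC-005", "03-SAC-050"])
print(hotspots["03-SAC-005"])
```

Keeping the periods in one dictionary means a newly added period only needs one new entry, instead of a new filter block per corridor.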

In [None]:
# 05.02.01 - data analysis (imms)




# Helper: filter IMMS records by unit of measure (UOM) and responsible district
def filter_by_uom_and_district(df, district_number):
    """
    Filters the DataFrame for rows where 'UOM' contains 'CUYD' and 'Resp. District' equals the specified district_number.

    Note: 'Resp. District' is converted to a numeric dtype in place on the input DataFrame.

    Parameters:
    df (pd.DataFrame): The input DataFrame.
    district_number (int): The district number to filter by.

    Returns:
    pd.DataFrame: The filtered DataFrame.
    """
    # Convert 'Resp. District' to numeric for comparison; values that cannot be parsed become NaN
    df["Resp. District"] = pd.to_numeric(df["Resp. District"], errors="coerce")

    # Filter rows where 'UOM' contains 'CUYD' and 'Resp. District' matches the specified number
    filtered_df = df[
        df["UOM"].str.contains("CUYD", na=False)
        & (df["Resp. District"] == district_number)
    ]

    return filtered_df


# Apply the function to create various dataframes
df_imms_2023a_bar_d1 = filter_by_uom_and_district(df_imms_2023a, 1)
df_imms_2023a_bar_d2 = filter_by_uom_and_district(df_imms_2023a, 2)
df_imms_2023a_bar_d3 = filter_by_uom_and_district(df_imms_2023a, 3)
df_imms_2023a_bar_d4 = filter_by_uom_and_district(df_imms_2023a, 4)
df_imms_2023a_bar_d5 = filter_by_uom_and_district(df_imms_2023a, 5)
df_imms_2023a_bar_d6 = filter_by_uom_and_district(df_imms_2023a, 6)
df_imms_2023a_bar_d7 = filter_by_uom_and_district(df_imms_2023a, 7)
df_imms_2023a_bar_d8 = filter_by_uom_and_district(df_imms_2023a, 8)
df_imms_2023a_bar_d9 = filter_by_uom_and_district(df_imms_2023a, 9)
df_imms_2023a_bar_d10 = filter_by_uom_and_district(df_imms_2023a, 10)
df_imms_2023a_bar_d11 = filter_by_uom_and_district(df_imms_2023a, 11)
df_imms_2023a_bar_d12 = filter_by_uom_and_district(df_imms_2023a, 12)


# check table dim
# print('*** Table Dimensions: Remove sweeping CY values (D1) ***', '\n')
# print(df_imms_2023a_bar_d1.shape , '\n')
# print('*** Table Dimensions: Remove sweeping CY values (D2) ***', '\n')
# print(df_imms_2023a_bar_d2.shape , '\n')
# print('*** Table Dimensions: Remove sweeping CY values (D3) ***', '\n')
# print(df_imms_2023a_bar_d3.shape , '\n')
# print('*** Table Dimensions: Remove sweeping CY values (D4) ***', '\n')
# print(df_imms_2023a_bar_d4.shape , '\n')
# print('*** Table Dimensions: Remove sweeping CY values (D5) ***', '\n')
# print(df_imms_2023a_bar_d5.shape , '\n')
# print('*** Table Dimensions: Remove sweeping CY values (D6) ***', '\n')
# print(df_imms_2023a_bar_d6.shape , '\n')
# print('*** Table Dimensions: Remove sweeping CY values (D7) ***', '\n')
# print(df_imms_2023a_bar_d7.shape , '\n')
# print('*** Table Dimensions: Remove sweeping CY values (D8) ***', '\n')
# print(df_imms_2023a_bar_d8.shape , '\n')
# print('*** Table Dimensions: Remove sweeping CY values (D9) ***', '\n')
# print(df_imms_2023a_bar_d9.shape , '\n')
# print('*** Table Dimensions: Remove sweeping CY values (D10) ***', '\n')
# print(df_imms_2023a_bar_d10.shape , '\n')
# print('*** Table Dimensions: Remove sweeping CY values (D11) ***', '\n')
# print(df_imms_2023a_bar_d11.shape , '\n')
# print('*** Table Dimensions: Remove sweeping CY values (D12) ***', '\n')
# print(df_imms_2023a_bar_d12.shape , '\n')

# filter litter hotspot corridors based on litter/los/csr data
# https://www.geeksforgeeks.org/filter-pandas-dataframe-with-multiple-conditions/
df_imms_2021b_hotspot_03sac005 = df_imms_2021b[
    (df_imms_2021b["IMMS Unit ID"] == "03-SAC-005")
]
df_imms_2022a_hotspot_03sac005 = df_imms_2022a[
    (df_imms_2022a["IMMS Unit ID"] == "03-SAC-005")
]
df_imms_2022b_hotspot_03sac005 = df_imms_2022b[
    (df_imms_2022b["IMMS Unit ID"] == "03-SAC-005")
]
df_imms_2023a_hotspot_03sac005 = df_imms_2023a[
    (df_imms_2023a["IMMS Unit ID"] == "03-SAC-005")
]
df_imms_2023b_hotspot_03sac005 = df_imms_2023b[
    (df_imms_2023b["IMMS Unit ID"] == "03-SAC-005")
]
df_imms_2024a_hotspot_03sac005 = df_imms_2024a[
    (df_imms_2024a["IMMS Unit ID"] == "03-SAC-005")
]
df_imms_2024b_hotspot_03sac005 = df_imms_2024b[
    (df_imms_2024b["IMMS Unit ID"] == "03-SAC-005") # Added by NS 6/17/2025
]
df_imms_2025a_hotspot_03sac005 = df_imms_2025a[
    (df_imms_2025a["IMMS Unit ID"] == "03-SAC-005") # Added by NS 6/17/2025
]




# check filter results
# data_profile(df_imms_2021b_hotspot_03sac005, 'Data Profile: IMMS 2021b - 03sac005')
# data_profile(df_imms_2022a_hotspot_03sac005, 'Data Profile: IMMS 2022a - 03sac005')
# data_profile(df_imms_2022b_hotspot_03sac005, 'Data Profile: IMMS 2022b - 03sac005')
# data_profile(df_imms_2023a_hotspot_03sac005, 'Data Profile: IMMS 2023a - 03sac005')
# data_profile(df_imms_2023b_hotspot_03sac005, 'Data Profile: IMMS 2023b - 03sac005')
# data_profile(df_imms_2024a_hotspot_03sac005, 'Data Profile: IMMS 2024a - 03sac005')
# join dataframes together with vertical concat
# https://www.geeksforgeeks.org/how-to-concatenate-two-or-more-pandas-dataframes/
df_imms_hotspot_03sac005 = pd.concat(
    [
        df_imms_2021b_hotspot_03sac005,
        df_imms_2022a_hotspot_03sac005,
        df_imms_2022b_hotspot_03sac005,
        df_imms_2023a_hotspot_03sac005,
        df_imms_2023b_hotspot_03sac005,
        df_imms_2024a_hotspot_03sac005,
        df_imms_2024b_hotspot_03sac005, # Added by NS 6/17/2025
        df_imms_2025a_hotspot_03sac005, # Added by NS 6/17/2025
    ],
    axis=0,
)
# data_profile(df_imms_hotspot_03sac005, 'Data Profile: IMMS Litter Hotspot - 03sac005')

# filter litter hotspot corridors based on litter/los/csr data
# https://www.geeksforgeeks.org/filter-pandas-dataframe-with-multiple-conditions/
df_imms_2021b_hotspot_03sac050 = df_imms_2021b[
    (df_imms_2021b["IMMS Unit ID"] == "03-SAC-050")
]
df_imms_2022a_hotspot_03sac050 = df_imms_2022a[
    (df_imms_2022a["IMMS Unit ID"] == "03-SAC-050")
]
df_imms_2022b_hotspot_03sac050 = df_imms_2022b[
    (df_imms_2022b["IMMS Unit ID"] == "03-SAC-050")
]
df_imms_2023a_hotspot_03sac050 = df_imms_2023a[
    (df_imms_2023a["IMMS Unit ID"] == "03-SAC-050")
]
df_imms_2023b_hotspot_03sac050 = df_imms_2023b[
    (df_imms_2023b["IMMS Unit ID"] == "03-SAC-050")
]
df_imms_2024a_hotspot_03sac050 = df_imms_2024a[
    (df_imms_2024a["IMMS Unit ID"] == "03-SAC-050")
]
df_imms_2024b_hotspot_03sac050 = df_imms_2024b[
    (df_imms_2024b["IMMS Unit ID"] == "03-SAC-050") # Added by NS 6/17/2025
]
df_imms_2025a_hotspot_03sac050 = df_imms_2025a[
    (df_imms_2025a["IMMS Unit ID"] == "03-SAC-050") # Added by NS 6/17/2025
]

# check filter results
# data_profile(df_imms_2021b_hotspot_03sac050, 'Data Profile: IMMS 2021b - 03sac050')
# data_profile(df_imms_2022a_hotspot_03sac050, 'Data Profile: IMMS 2022a - 03sac050')
# data_profile(df_imms_2022b_hotspot_03sac050, 'Data Profile: IMMS 2022b - 03sac050')
# data_profile(df_imms_2023a_hotspot_03sac050, 'Data Profile: IMMS 2023a - 03sac050')
# data_profile(df_imms_2023b_hotspot_03sac050, 'Data Profile: IMMS 2023b - 03sac050')
# data_profile(df_imms_2024a_hotspot_03sac050, 'Data Profile: IMMS 2024a - 03sac050')
# join dataframes together with vertical concat
# https://www.geeksforgeeks.org/how-to-concatenate-two-or-more-pandas-dataframes/
df_imms_hotspot_03sac050 = pd.concat(
    [
        df_imms_2021b_hotspot_03sac050,
        df_imms_2022a_hotspot_03sac050,
        df_imms_2022b_hotspot_03sac050,
        df_imms_2023a_hotspot_03sac050,
        df_imms_2023b_hotspot_03sac050,
        df_imms_2024a_hotspot_03sac050,
        df_imms_2024b_hotspot_03sac050, # Added by NS 6/17/2025
        df_imms_2025a_hotspot_03sac050, # Added by NS 6/17/2025
    ],
    axis=0,
)
# data_profile(df_imms_hotspot_03sac050, 'Data Profile: IMMS Litter Hotspot - 03sac050')

# filter litter hotspot corridors based on litter/los/csr data
# https://www.geeksforgeeks.org/filter-pandas-dataframe-with-multiple-conditions/
df_imms_2021b_hotspot_03sac080 = df_imms_2021b[
    (df_imms_2021b["IMMS Unit ID"] == "03-SAC-080")
]
df_imms_2022a_hotspot_03sac080 = df_imms_2022a[
    (df_imms_2022a["IMMS Unit ID"] == "03-SAC-080")
]
df_imms_2022b_hotspot_03sac080 = df_imms_2022b[
    (df_imms_2022b["IMMS Unit ID"] == "03-SAC-080")
]
df_imms_2023a_hotspot_03sac080 = df_imms_2023a[
    (df_imms_2023a["IMMS Unit ID"] == "03-SAC-080")
]
df_imms_2023b_hotspot_03sac080 = df_imms_2023b[
    (df_imms_2023b["IMMS Unit ID"] == "03-SAC-080")
]
df_imms_2024a_hotspot_03sac080 = df_imms_2024a[
    (df_imms_2024a["IMMS Unit ID"] == "03-SAC-080")
]
df_imms_2024b_hotspot_03sac080 = df_imms_2024b[
    (df_imms_2024b["IMMS Unit ID"] == "03-SAC-080") # Added by NS 6/17/2025
]
df_imms_2025a_hotspot_03sac080 = df_imms_2025a[
    (df_imms_2025a["IMMS Unit ID"] == "03-SAC-080") # Added by NS 6/17/2025
]
# check filter results
# data_profile(df_imms_2021b_hotspot_03sac080, 'Data Profile: IMMS 2021b - 03sac080')
# data_profile(df_imms_2022a_hotspot_03sac080, 'Data Profile: IMMS 2022a - 03sac080')
# data_profile(df_imms_2022b_hotspot_03sac080, 'Data Profile: IMMS 2022b - 03sac080')
# data_profile(df_imms_2023a_hotspot_03sac080, 'Data Profile: IMMS 2023a - 03sac080')
# data_profile(df_imms_2023b_hotspot_03sac080, 'Data Profile: IMMS 2023b - 03sac080')
# data_profile(df_imms_2024a_hotspot_03sac080, 'Data Profile: IMMS 2024a - 03sac080')
# join dataframes together with vertical concat
# https://www.geeksforgeeks.org/how-to-concatenate-two-or-more-pandas-dataframes/
df_imms_hotspot_03sac080 = pd.concat(
    [
        df_imms_2021b_hotspot_03sac080,
        df_imms_2022a_hotspot_03sac080,
        df_imms_2022b_hotspot_03sac080,
        df_imms_2023a_hotspot_03sac080,
        df_imms_2023b_hotspot_03sac080,
        df_imms_2024a_hotspot_03sac080,
        df_imms_2024b_hotspot_03sac080, # Added by NS 6/17/2025
        df_imms_2025a_hotspot_03sac080, # Added by NS 6/17/2025
    ],
    axis=0,
)
# data_profile(df_imms_hotspot_03sac080, 'Data Profile: IMMS Litter Hotspot - 03sac080')

# filter litter hotspot corridors based on litter/los/csr data
# https://www.geeksforgeeks.org/filter-pandas-dataframe-with-multiple-conditions/
df_imms_2021b_hotspot_04ala580b = df_imms_2021b[
    (df_imms_2021b["IMMS Unit ID"] == "04-ALA-580B")
]
df_imms_2022a_hotspot_04ala580b = df_imms_2022a[
    (df_imms_2022a["IMMS Unit ID"] == "04-ALA-580B")
]
df_imms_2022b_hotspot_04ala580b = df_imms_2022b[
    (df_imms_2022b["IMMS Unit ID"] == "04-ALA-580B")
]
df_imms_2023a_hotspot_04ala580b = df_imms_2023a[
    (df_imms_2023a["IMMS Unit ID"] == "04-ALA-580B")
]
df_imms_2023b_hotspot_04ala580b = df_imms_2023b[
    (df_imms_2023b["IMMS Unit ID"] == "04-ALA-580B")
]
df_imms_2024a_hotspot_04ala580b = df_imms_2024a[
    (df_imms_2024a["IMMS Unit ID"] == "04-ALA-580B")
]
df_imms_2024b_hotspot_04ala580b = df_imms_2024b[
    (df_imms_2024b["IMMS Unit ID"] == "04-ALA-580B") # Added by NS 6/17/2025
]
df_imms_2025a_hotspot_04ala580b = df_imms_2025a[
    (df_imms_2025a["IMMS Unit ID"] == "04-ALA-580B") # Added by NS 6/17/2025
]
# check filter results
# data_profile(df_imms_2021b_hotspot_04ala580b, 'Data Profile: IMMS 2021b - 04ala580b')
# data_profile(df_imms_2022a_hotspot_04ala580b, 'Data Profile: IMMS 2022a - 04ala580b')
# data_profile(df_imms_2022b_hotspot_04ala580b, 'Data Profile: IMMS 2022b - 04ala580b')
# data_profile(df_imms_2023a_hotspot_04ala580b, 'Data Profile: IMMS 2023a - 04ala580b')
# data_profile(df_imms_2023b_hotspot_04ala580b, 'Data Profile: IMMS 2023b - 04ala580b')
# data_profile(df_imms_2024a_hotspot_04ala580b, 'Data Profile: IMMS 2024a - 04ala580b')
# join dataframes together with vertical concat
# https://www.geeksforgeeks.org/how-to-concatenate-two-or-more-pandas-dataframes/
df_imms_hotspot_04ala580b = pd.concat(
    [
        df_imms_2021b_hotspot_04ala580b,
        df_imms_2022a_hotspot_04ala580b,
        df_imms_2022b_hotspot_04ala580b,
        df_imms_2023a_hotspot_04ala580b,
        df_imms_2023b_hotspot_04ala580b,
        df_imms_2024a_hotspot_04ala580b,
        df_imms_2024b_hotspot_04ala580b, # Added by NS 6/17/2025
        df_imms_2025a_hotspot_04ala580b, # Added by NS 6/17/2025
    ],
    axis=0,
)
# data_profile(df_imms_hotspot_04ala580b, 'Data Profile: IMMS Litter Hotspot - 04ala580b')

# filter litter hotspot corridors based on litter/los/csr data
# https://www.geeksforgeeks.org/filter-pandas-dataframe-with-multiple-conditions/
df_imms_2021b_hotspot_04ala680 = df_imms_2021b[
    (df_imms_2021b["IMMS Unit ID"] == "04-ALA-680")
]
df_imms_2022a_hotspot_04ala680 = df_imms_2022a[
    (df_imms_2022a["IMMS Unit ID"] == "04-ALA-680")
]
df_imms_2022b_hotspot_04ala680 = df_imms_2022b[
    (df_imms_2022b["IMMS Unit ID"] == "04-ALA-680")
]
df_imms_2023a_hotspot_04ala680 = df_imms_2023a[
    (df_imms_2023a["IMMS Unit ID"] == "04-ALA-680")
]
df_imms_2023b_hotspot_04ala680 = df_imms_2023b[
    (df_imms_2023b["IMMS Unit ID"] == "04-ALA-680")
]
df_imms_2024a_hotspot_04ala680 = df_imms_2024a[
    (df_imms_2024a["IMMS Unit ID"] == "04-ALA-680")
]
df_imms_2024b_hotspot_04ala680 = df_imms_2024b[
    (df_imms_2024b["IMMS Unit ID"] == "04-ALA-680") # Added by NS 6/17/2025
]
df_imms_2025a_hotspot_04ala680 = df_imms_2025a[
    (df_imms_2025a["IMMS Unit ID"] == "04-ALA-680") # Added by NS 6/17/2025
]

# check filter results
# data_profile(df_imms_2021b_hotspot_04ala680, 'Data Profile: IMMS 2021b - 04ala680')
# data_profile(df_imms_2022a_hotspot_04ala680, 'Data Profile: IMMS 2022a - 04ala680')
# data_profile(df_imms_2022b_hotspot_04ala680, 'Data Profile: IMMS 2022b - 04ala680')
# data_profile(df_imms_2023a_hotspot_04ala680, 'Data Profile: IMMS 2023a - 04ala680')
# data_profile(df_imms_2023b_hotspot_04ala680, 'Data Profile: IMMS 2023b - 04ala680')
# data_profile(df_imms_2024a_hotspot_04ala680, 'Data Profile: IMMS 2024a - 04ala680')
# join dataframes together with vertical concat
# https://www.geeksforgeeks.org/how-to-concatenate-two-or-more-pandas-dataframes/
df_imms_hotspot_04ala680 = pd.concat(
    [
        df_imms_2021b_hotspot_04ala680,
        df_imms_2022a_hotspot_04ala680,
        df_imms_2022b_hotspot_04ala680,
        df_imms_2023a_hotspot_04ala680,
        df_imms_2023b_hotspot_04ala680,
        df_imms_2024a_hotspot_04ala680,
        df_imms_2024b_hotspot_04ala680, # Added by NS 6/17/2025
        df_imms_2025a_hotspot_04ala680, # Added by NS 6/17/2025
    ],
    axis=0,
)
# data_profile(df_imms_hotspot_04ala680, 'Data Profile: IMMS Litter Hotspot - 04ala680')

# filter litter hotspot corridors based on litter/los/csr data
# https://www.geeksforgeeks.org/filter-pandas-dataframe-with-multiple-conditions/
df_imms_2021b_hotspot_04ala880 = df_imms_2021b[
    (df_imms_2021b["IMMS Unit ID"] == "04-ALA-880")
]
df_imms_2022a_hotspot_04ala880 = df_imms_2022a[
    (df_imms_2022a["IMMS Unit ID"] == "04-ALA-880")
]
df_imms_2022b_hotspot_04ala880 = df_imms_2022b[
    (df_imms_2022b["IMMS Unit ID"] == "04-ALA-880")
]
df_imms_2023a_hotspot_04ala880 = df_imms_2023a[
    (df_imms_2023a["IMMS Unit ID"] == "04-ALA-880")
]
df_imms_2023b_hotspot_04ala880 = df_imms_2023b[
    (df_imms_2023b["IMMS Unit ID"] == "04-ALA-880")
]
df_imms_2024a_hotspot_04ala880 = df_imms_2024a[
    (df_imms_2024a["IMMS Unit ID"] == "04-ALA-880")
]
df_imms_2024b_hotspot_04ala880 = df_imms_2024b[
    (df_imms_2024b["IMMS Unit ID"] == "04-ALA-880") # Added by NS 6/17/2025
]
df_imms_2025a_hotspot_04ala880 = df_imms_2025a[
    (df_imms_2025a["IMMS Unit ID"] == "04-ALA-880") # Added by NS 6/17/2025
]

# check filter results
# data_profile(df_imms_2021b_hotspot_04ala880, 'Data Profile: IMMS 2021b - 04ala880')
# data_profile(df_imms_2022a_hotspot_04ala880, 'Data Profile: IMMS 2022a - 04ala880')
# data_profile(df_imms_2022b_hotspot_04ala880, 'Data Profile: IMMS 2022b - 04ala880')
# data_profile(df_imms_2023a_hotspot_04ala880, 'Data Profile: IMMS 2023a - 04ala880')
# data_profile(df_imms_2023b_hotspot_04ala880, 'Data Profile: IMMS 2023b - 04ala880')
# data_profile(df_imms_2024a_hotspot_04ala880, 'Data Profile: IMMS 2024a - 04ala880')
# join dataframes together with vertical concat
# https://www.geeksforgeeks.org/how-to-concatenate-two-or-more-pandas-dataframes/
df_imms_hotspot_04ala880 = pd.concat(
    [
        df_imms_2021b_hotspot_04ala880,
        df_imms_2022a_hotspot_04ala880,
        df_imms_2022b_hotspot_04ala880,
        df_imms_2023a_hotspot_04ala880,
        df_imms_2023b_hotspot_04ala880,
        df_imms_2024a_hotspot_04ala880,
        df_imms_2024b_hotspot_04ala880, # Added by NS 6/17/2025
        df_imms_2025a_hotspot_04ala880, # Added by NS 6/17/2025
    ],
    axis=0,
)
# data_profile(df_imms_hotspot_04ala880, 'Data Profile: IMMS Litter Hotspot - 04ala880')
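# -----------------------------------------------------------------------------
# Optional sketch, not part of the original pipeline: the filter-then-concat
# pattern repeated for each corridor could be expressed as one helper.
# Assumptions: the name `filter_hotspot` and its parameters are hypothetical,
# and every reporting-period frame is assumed to carry an "IMMS Unit ID" column.
# The per-period variables defined in this notebook remain authoritative.
# -----------------------------------------------------------------------------
import pandas as pd


def filter_hotspot(period_frames, unit_id, unit_col="IMMS Unit ID"):
    """Return the rows matching unit_id from every period frame, stacked vertically."""
    # same boolean-mask filter as the per-period statements in this notebook
    matches = [frame[frame[unit_col] == unit_id] for frame in period_frames]
    # same vertical concat as the per-corridor pd.concat(..., axis=0) calls
    return pd.concat(matches, axis=0)


# Example usage (left commented out so no existing variable is overwritten):
# df_check_04ala880 = filter_hotspot(
#     [
#         df_imms_2021b,
#         df_imms_2022a,
#         df_imms_2022b,
#         df_imms_2023a,
#         df_imms_2023b,
#         df_imms_2024a,
#         df_imms_2024b,
#         df_imms_2025a,
#     ],
#     "04-ALA-880",
# )
# assert len(df_check_04ala880) == len(df_imms_hotspot_04ala880)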

# filter litter hotspot corridors based on litter/los/csr data
# https://www.geeksforgeeks.org/filter-pandas-dataframe-with-multiple-conditions/
df_imms_2021b_hotspot_04cc004a = df_imms_2021b[
    (df_imms_2021b["IMMS Unit ID"] == "04-CC-004A")
]
df_imms_2022a_hotspot_04cc004a = df_imms_2022a[
    (df_imms_2022a["IMMS Unit ID"] == "04-CC-004A")
]
df_imms_2022b_hotspot_04cc004a = df_imms_2022b[
    (df_imms_2022b["IMMS Unit ID"] == "04-CC-004A")
]
df_imms_2023a_hotspot_04cc004a = df_imms_2023a[
    (df_imms_2023a["IMMS Unit ID"] == "04-CC-004A")
]
df_imms_2023b_hotspot_04cc004a = df_imms_2023b[
    (df_imms_2023b["IMMS Unit ID"] == "04-CC-004A")
]
df_imms_2024a_hotspot_04cc004a = df_imms_2024a[
    (df_imms_2024a["IMMS Unit ID"] == "04-CC-004A")
]
df_imms_2024b_hotspot_04cc004a = df_imms_2024b[
    (df_imms_2024b["IMMS Unit ID"] == "04-CC-004A") # Added by NS 6/17/2025
]
df_imms_2025a_hotspot_04cc004a = df_imms_2025a[
    (df_imms_2025a["IMMS Unit ID"] == "04-CC-004A") # Added by NS 6/17/2025
]

# check filter results
# data_profile(df_imms_2021b_hotspot_04cc004a, 'Data Profile: IMMS 2021b - 04cc004a')
# data_profile(df_imms_2022a_hotspot_04cc004a, 'Data Profile: IMMS 2022a - 04cc004a')
# data_profile(df_imms_2022b_hotspot_04cc004a, 'Data Profile: IMMS 2022b - 04cc004a')
# data_profile(df_imms_2023a_hotspot_04cc004a, 'Data Profile: IMMS 2023a - 04cc004a')
# data_profile(df_imms_2023b_hotspot_04cc004a, 'Data Profile: IMMS 2023b - 04cc004a')
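# data_profile(df_imms_2024a_hotspot_04cc004a, 'Data Profile: IMMS 2024a - 04cc004a')
# data_profile(df_imms_2024b_hotspot_04cc004a, 'Data Profile: IMMS 2024b - 04cc004a')
# data_profile(df_imms_2025a_hotspot_04cc004a, 'Data Profile: IMMS 2025a - 04cc004a')

# join dataframes together with vertical concat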
# https://www.geeksforgeeks.org/how-to-concatenate-two-or-more-pandas-dataframes/
df_imms_hotspot_04cc004a = pd.concat(
    [
        df_imms_2021b_hotspot_04cc004a,
        df_imms_2022a_hotspot_04cc004a,
        df_imms_2022b_hotspot_04cc004a,
        df_imms_2023a_hotspot_04cc004a,
        df_imms_2023b_hotspot_04cc004a,
        df_imms_2024a_hotspot_04cc004a,
        df_imms_2024b_hotspot_04cc004a, # Added by NS 6/17/2025
        df_imms_2025a_hotspot_04cc004a, # Added by NS 6/17/2025
    ],
    axis=0,
)
# data_profile(df_imms_hotspot_04cc004a, 'Data Profile: IMMS Litter Hotspot - 04cc004a')

# filter litter hotspot corridors based on litter/los/csr data
# https://www.geeksforgeeks.org/filter-pandas-dataframe-with-multiple-conditions/
df_imms_2021b_hotspot_04cc680 = df_imms_2021b[
    (df_imms_2021b["IMMS Unit ID"] == "04-CC-680")
]
df_imms_2022a_hotspot_04cc680 = df_imms_2022a[
    (df_imms_2022a["IMMS Unit ID"] == "04-CC-680")
]
df_imms_2022b_hotspot_04cc680 = df_imms_2022b[
    (df_imms_2022b["IMMS Unit ID"] == "04-CC-680")
]
df_imms_2023a_hotspot_04cc680 = df_imms_2023a[
    (df_imms_2023a["IMMS Unit ID"] == "04-CC-680")
]
df_imms_2023b_hotspot_04cc680 = df_imms_2023b[
    (df_imms_2023b["IMMS Unit ID"] == "04-CC-680")
]
df_imms_2024a_hotspot_04cc680 = df_imms_2024a[
    (df_imms_2024a["IMMS Unit ID"] == "04-CC-680")
]
df_imms_2024b_hotspot_04cc680 = df_imms_2024b[
    (df_imms_2024b["IMMS Unit ID"] == "04-CC-680") # Added by NS 6/17/2025
]
df_imms_2025a_hotspot_04cc680 = df_imms_2025a[
    (df_imms_2025a["IMMS Unit ID"] == "04-CC-680") # Added by NS 6/17/2025
]

# check filter results
# data_profile(df_imms_2021b_hotspot_04cc680, 'Data Profile: IMMS 2021b - 04cc680')
# data_profile(df_imms_2022a_hotspot_04cc680, 'Data Profile: IMMS 2022a - 04cc680')
# data_profile(df_imms_2022b_hotspot_04cc680, 'Data Profile: IMMS 2022b - 04cc680')
# data_profile(df_imms_2023a_hotspot_04cc680, 'Data Profile: IMMS 2023a - 04cc680')
# data_profile(df_imms_2023b_hotspot_04cc680, 'Data Profile: IMMS 2023b - 04cc680')
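# data_profile(df_imms_2024a_hotspot_04cc680, 'Data Profile: IMMS 2024a - 04cc680')
# data_profile(df_imms_2024b_hotspot_04cc680, 'Data Profile: IMMS 2024b - 04cc680')
# data_profile(df_imms_2025a_hotspot_04cc680, 'Data Profile: IMMS 2025a - 04cc680')

# join dataframes together with vertical concat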
# https://www.geeksforgeeks.org/how-to-concatenate-two-or-more-pandas-dataframes/
df_imms_hotspot_04cc680 = pd.concat(
    [
        df_imms_2021b_hotspot_04cc680,
        df_imms_2022a_hotspot_04cc680,
        df_imms_2022b_hotspot_04cc680,
        df_imms_2023a_hotspot_04cc680,
        df_imms_2023b_hotspot_04cc680,
        df_imms_2024a_hotspot_04cc680,
        df_imms_2024b_hotspot_04cc680, # Added by NS 6/17/2025
        df_imms_2025a_hotspot_04cc680, # Added by NS 6/17/2025
    ],
    axis=0,
)
# data_profile(df_imms_hotspot_04cc680, 'Data Profile: IMMS Litter Hotspot - 04cc680')

# filter litter hotspot corridors based on litter/los/csr data
# https://www.geeksforgeeks.org/filter-pandas-dataframe-with-multiple-conditions/
df_imms_2021b_hotspot_06fre099 = df_imms_2021b[
    (df_imms_2021b["IMMS Unit ID"] == "06-FRE-099")
]
df_imms_2022a_hotspot_06fre099 = df_imms_2022a[
    (df_imms_2022a["IMMS Unit ID"] == "06-FRE-099")
]
df_imms_2022b_hotspot_06fre099 = df_imms_2022b[
    (df_imms_2022b["IMMS Unit ID"] == "06-FRE-099")
]
df_imms_2023a_hotspot_06fre099 = df_imms_2023a[
    (df_imms_2023a["IMMS Unit ID"] == "06-FRE-099")
]
df_imms_2023b_hotspot_06fre099 = df_imms_2023b[
    (df_imms_2023b["IMMS Unit ID"] == "06-FRE-099")
]
df_imms_2024a_hotspot_06fre099 = df_imms_2024a[
    (df_imms_2024a["IMMS Unit ID"] == "06-FRE-099")
]
df_imms_2024b_hotspot_06fre099 = df_imms_2024b[
    (df_imms_2024b["IMMS Unit ID"] == "06-FRE-099") # Added by NS 6/17/2025
]
df_imms_2025a_hotspot_06fre099 = df_imms_2025a[
    (df_imms_2025a["IMMS Unit ID"] == "06-FRE-099") # Added by NS 6/17/2025
]

# check filter results
# data_profile(df_imms_2021b_hotspot_06fre099, 'Data Profile: IMMS 2021b - 06fre099')
# data_profile(df_imms_2022a_hotspot_06fre099, 'Data Profile: IMMS 2022a - 06fre099')
# data_profile(df_imms_2022b_hotspot_06fre099, 'Data Profile: IMMS 2022b - 06fre099')
# data_profile(df_imms_2023a_hotspot_06fre099, 'Data Profile: IMMS 2023a - 06fre099')
# data_profile(df_imms_2023b_hotspot_06fre099, 'Data Profile: IMMS 2023b - 06fre099')
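# data_profile(df_imms_2024a_hotspot_06fre099, 'Data Profile: IMMS 2024a - 06fre099')
# data_profile(df_imms_2024b_hotspot_06fre099, 'Data Profile: IMMS 2024b - 06fre099')
# data_profile(df_imms_2025a_hotspot_06fre099, 'Data Profile: IMMS 2025a - 06fre099')

# join dataframes together with vertical concat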
# https://www.geeksforgeeks.org/how-to-concatenate-two-or-more-pandas-dataframes/
df_imms_hotspot_06fre099 = pd.concat(
    [
        df_imms_2021b_hotspot_06fre099,
        df_imms_2022a_hotspot_06fre099,
        df_imms_2022b_hotspot_06fre099,
        df_imms_2023a_hotspot_06fre099,
        df_imms_2023b_hotspot_06fre099,
        df_imms_2024a_hotspot_06fre099,
        df_imms_2024b_hotspot_06fre099, # Added by NS 6/17/2025
        df_imms_2025a_hotspot_06fre099, # Added by NS 6/17/2025
    ],
    axis=0,
)
# data_profile(df_imms_hotspot_06fre099, 'Data Profile: IMMS Litter Hotspot - 06fre099')

# filter litter hotspot corridors based on litter/los/csr data
# https://www.geeksforgeeks.org/filter-pandas-dataframe-with-multiple-conditions/
df_imms_2021b_hotspot_06ker099 = df_imms_2021b[
    (df_imms_2021b["IMMS Unit ID"] == "06-KER-099")
]
df_imms_2022a_hotspot_06ker099 = df_imms_2022a[
    (df_imms_2022a["IMMS Unit ID"] == "06-KER-099")
]
df_imms_2022b_hotspot_06ker099 = df_imms_2022b[
    (df_imms_2022b["IMMS Unit ID"] == "06-KER-099")
]
df_imms_2023a_hotspot_06ker099 = df_imms_2023a[
    (df_imms_2023a["IMMS Unit ID"] == "06-KER-099")
]
df_imms_2023b_hotspot_06ker099 = df_imms_2023b[
    (df_imms_2023b["IMMS Unit ID"] == "06-KER-099")
]
df_imms_2024a_hotspot_06ker099 = df_imms_2024a[
    (df_imms_2024a["IMMS Unit ID"] == "06-KER-099")
]
df_imms_2024b_hotspot_06ker099 = df_imms_2024b[
    (df_imms_2024b["IMMS Unit ID"] == "06-KER-099") # Added by NS 6/17/2025
]
df_imms_2025a_hotspot_06ker099 = df_imms_2025a[
    (df_imms_2025a["IMMS Unit ID"] == "06-KER-099") # Added by NS 6/17/2025
]

# check filter results
# data_profile(df_imms_2021b_hotspot_06ker099, 'Data Profile: IMMS 2021b - 06ker099')
# data_profile(df_imms_2022a_hotspot_06ker099, 'Data Profile: IMMS 2022a - 06ker099')
# data_profile(df_imms_2022b_hotspot_06ker099, 'Data Profile: IMMS 2022b - 06ker099')
# data_profile(df_imms_2023a_hotspot_06ker099, 'Data Profile: IMMS 2023a - 06ker099')
# data_profile(df_imms_2023b_hotspot_06ker099, 'Data Profile: IMMS 2023b - 06ker099')
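# data_profile(df_imms_2024a_hotspot_06ker099, 'Data Profile: IMMS 2024a - 06ker099')
# data_profile(df_imms_2024b_hotspot_06ker099, 'Data Profile: IMMS 2024b - 06ker099')
# data_profile(df_imms_2025a_hotspot_06ker099, 'Data Profile: IMMS 2025a - 06ker099')

# join dataframes together with vertical concat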
# https://www.geeksforgeeks.org/how-to-concatenate-two-or-more-pandas-dataframes/
df_imms_hotspot_06ker099 = pd.concat(
    [
        df_imms_2021b_hotspot_06ker099,
        df_imms_2022a_hotspot_06ker099,
        df_imms_2022b_hotspot_06ker099,
        df_imms_2023a_hotspot_06ker099,
        df_imms_2023b_hotspot_06ker099,
        df_imms_2024a_hotspot_06ker099,
        df_imms_2024b_hotspot_06ker099, # Added by NS 6/17/2025
        df_imms_2025a_hotspot_06ker099, # Added by NS 6/17/2025
    ],
    axis=0,
)
# data_profile(df_imms_hotspot_06ker099, 'Data Profile: IMMS Litter Hotspot - 06ker099')

# filter litter hotspot corridors based on litter/los/csr data
# https://www.geeksforgeeks.org/filter-pandas-dataframe-with-multiple-conditions/
df_imms_2021b_hotspot_07la005a = df_imms_2021b[
    (df_imms_2021b["IMMS Unit ID"] == "07-LA-005A")
]
df_imms_2022a_hotspot_07la005a = df_imms_2022a[
    (df_imms_2022a["IMMS Unit ID"] == "07-LA-005A")
]
df_imms_2022b_hotspot_07la005a = df_imms_2022b[
    (df_imms_2022b["IMMS Unit ID"] == "07-LA-005A")
]
df_imms_2023a_hotspot_07la005a = df_imms_2023a[
    (df_imms_2023a["IMMS Unit ID"] == "07-LA-005A")
]
df_imms_2023b_hotspot_07la005a = df_imms_2023b[
    (df_imms_2023b["IMMS Unit ID"] == "07-LA-005A")
]
df_imms_2024a_hotspot_07la005a = df_imms_2024a[
    (df_imms_2024a["IMMS Unit ID"] == "07-LA-005A")
]
df_imms_2024b_hotspot_07la005a = df_imms_2024b[
    (df_imms_2024b["IMMS Unit ID"] == "07-LA-005A") # Added by NS 6/17/2025
]
df_imms_2025a_hotspot_07la005a = df_imms_2025a[
    (df_imms_2025a["IMMS Unit ID"] == "07-LA-005A") # Added by NS 6/17/2025
]

# check filter results
# data_profile(df_imms_2021b_hotspot_07la005a, 'Data Profile: IMMS 2021b - 07la005a')
# data_profile(df_imms_2022a_hotspot_07la005a, 'Data Profile: IMMS 2022a - 07la005a')
# data_profile(df_imms_2022b_hotspot_07la005a, 'Data Profile: IMMS 2022b - 07la005a')
# data_profile(df_imms_2023a_hotspot_07la005a, 'Data Profile: IMMS 2023a - 07la005a')
# data_profile(df_imms_2023b_hotspot_07la005a, 'Data Profile: IMMS 2023b - 07la005a')
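# data_profile(df_imms_2024a_hotspot_07la005a, 'Data Profile: IMMS 2024a - 07la005a')
# data_profile(df_imms_2024b_hotspot_07la005a, 'Data Profile: IMMS 2024b - 07la005a')
# data_profile(df_imms_2025a_hotspot_07la005a, 'Data Profile: IMMS 2025a - 07la005a')

# join dataframes together with vertical concat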
# https://www.geeksforgeeks.org/how-to-concatenate-two-or-more-pandas-dataframes/
df_imms_hotspot_07la005a = pd.concat(
    [
        df_imms_2021b_hotspot_07la005a,
        df_imms_2022a_hotspot_07la005a,
        df_imms_2022b_hotspot_07la005a,
        df_imms_2023a_hotspot_07la005a,
        df_imms_2023b_hotspot_07la005a,
        df_imms_2024a_hotspot_07la005a,
        df_imms_2024b_hotspot_07la005a, # Added by NS 6/17/2025
        df_imms_2025a_hotspot_07la005a, # Added by NS 6/17/2025
    ],
    axis=0,
)
# data_profile(df_imms_hotspot_07la005a, 'Data Profile: IMMS Litter Hotspot - 07la005a')

# filter litter hotspot corridors based on litter/los/csr data
# https://www.geeksforgeeks.org/filter-pandas-dataframe-with-multiple-conditions/
df_imms_2021b_hotspot_07la010 = df_imms_2021b[
    (df_imms_2021b["IMMS Unit ID"] == "07-LA-010")
]
df_imms_2022a_hotspot_07la010 = df_imms_2022a[
    (df_imms_2022a["IMMS Unit ID"] == "07-LA-010")
]
df_imms_2022b_hotspot_07la010 = df_imms_2022b[
    (df_imms_2022b["IMMS Unit ID"] == "07-LA-010")
]
df_imms_2023a_hotspot_07la010 = df_imms_2023a[
    (df_imms_2023a["IMMS Unit ID"] == "07-LA-010")
]
df_imms_2023b_hotspot_07la010 = df_imms_2023b[
    (df_imms_2023b["IMMS Unit ID"] == "07-LA-010")
]
df_imms_2024a_hotspot_07la010 = df_imms_2024a[
    (df_imms_2024a["IMMS Unit ID"] == "07-LA-010")
]
df_imms_2024b_hotspot_07la010 = df_imms_2024b[
    (df_imms_2024b["IMMS Unit ID"] == "07-LA-010") # Added by NS 6/17/2025
]
df_imms_2025a_hotspot_07la010 = df_imms_2025a[
    (df_imms_2025a["IMMS Unit ID"] == "07-LA-010") # Added by NS 6/17/2025
]

# check filter results
# data_profile(df_imms_2021b_hotspot_07la010, 'Data Profile: IMMS 2021b - 07la010')
# data_profile(df_imms_2022a_hotspot_07la010, 'Data Profile: IMMS 2022a - 07la010')
# data_profile(df_imms_2022b_hotspot_07la010, 'Data Profile: IMMS 2022b - 07la010')
# data_profile(df_imms_2023a_hotspot_07la010, 'Data Profile: IMMS 2023a - 07la010')
# data_profile(df_imms_2023b_hotspot_07la010, 'Data Profile: IMMS 2023b - 07la010')
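# data_profile(df_imms_2024a_hotspot_07la010, 'Data Profile: IMMS 2024a - 07la010')
# data_profile(df_imms_2024b_hotspot_07la010, 'Data Profile: IMMS 2024b - 07la010')
# data_profile(df_imms_2025a_hotspot_07la010, 'Data Profile: IMMS 2025a - 07la010')

# join dataframes together with vertical concat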
# https://www.geeksforgeeks.org/how-to-concatenate-two-or-more-pandas-dataframes/
df_imms_hotspot_07la010 = pd.concat(
    [
        df_imms_2021b_hotspot_07la010,
        df_imms_2022a_hotspot_07la010,
        df_imms_2022b_hotspot_07la010,
        df_imms_2023a_hotspot_07la010,
        df_imms_2023b_hotspot_07la010,
        df_imms_2024a_hotspot_07la010,
        df_imms_2024b_hotspot_07la010, # Added by NS 6/17/2025
        df_imms_2025a_hotspot_07la010, # Added by NS 6/17/2025
    ],
    axis=0,
)
# data_profile(df_imms_hotspot_07la010, 'Data Profile: IMMS Litter Hotspot - 07la010')

# filter litter hotspot corridors based on litter/los/csr data
# https://www.geeksforgeeks.org/filter-pandas-dataframe-with-multiple-conditions/
df_imms_2021b_hotspot_07la101 = df_imms_2021b[
    (df_imms_2021b["IMMS Unit ID"] == "07-LA-101")
]
df_imms_2022a_hotspot_07la101 = df_imms_2022a[
    (df_imms_2022a["IMMS Unit ID"] == "07-LA-101")
]
df_imms_2022b_hotspot_07la101 = df_imms_2022b[
    (df_imms_2022b["IMMS Unit ID"] == "07-LA-101")
]
df_imms_2023a_hotspot_07la101 = df_imms_2023a[
    (df_imms_2023a["IMMS Unit ID"] == "07-LA-101")
]
df_imms_2023b_hotspot_07la101 = df_imms_2023b[
    (df_imms_2023b["IMMS Unit ID"] == "07-LA-101")
]
df_imms_2024a_hotspot_07la101 = df_imms_2024a[
    (df_imms_2024a["IMMS Unit ID"] == "07-LA-101")
]
df_imms_2024b_hotspot_07la101 = df_imms_2024b[
    (df_imms_2024b["IMMS Unit ID"] == "07-LA-101") # Added by NS 6/17/2025
]
df_imms_2025a_hotspot_07la101 = df_imms_2025a[
    (df_imms_2025a["IMMS Unit ID"] == "07-LA-101") # Added by NS 6/17/2025
]

# check filter results
# data_profile(df_imms_2021b_hotspot_07la101, 'Data Profile: IMMS 2021b - 07la101')
# data_profile(df_imms_2022a_hotspot_07la101, 'Data Profile: IMMS 2022a - 07la101')
# data_profile(df_imms_2022b_hotspot_07la101, 'Data Profile: IMMS 2022b - 07la101')
# data_profile(df_imms_2023a_hotspot_07la101, 'Data Profile: IMMS 2023a - 07la101')
# data_profile(df_imms_2023b_hotspot_07la101, 'Data Profile: IMMS 2023b - 07la101')
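# data_profile(df_imms_2024a_hotspot_07la101, 'Data Profile: IMMS 2024a - 07la101')
# data_profile(df_imms_2024b_hotspot_07la101, 'Data Profile: IMMS 2024b - 07la101')
# data_profile(df_imms_2025a_hotspot_07la101, 'Data Profile: IMMS 2025a - 07la101')

# join dataframes together with vertical concat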
# https://www.geeksforgeeks.org/how-to-concatenate-two-or-more-pandas-dataframes/
df_imms_hotspot_07la101 = pd.concat(
    [
        df_imms_2021b_hotspot_07la101,
        df_imms_2022a_hotspot_07la101,
        df_imms_2022b_hotspot_07la101,
        df_imms_2023a_hotspot_07la101,
        df_imms_2023b_hotspot_07la101,
        df_imms_2024a_hotspot_07la101,
        df_imms_2024b_hotspot_07la101, # Added by NS 6/17/2025
        df_imms_2025a_hotspot_07la101, # Added by NS 6/17/2025
    ],
    axis=0,
)
# data_profile(df_imms_hotspot_07la101, 'Data Profile: IMMS Litter Hotspot - 07la101')

# filter litter hotspot corridors based on litter/los/csr data
# https://www.geeksforgeeks.org/filter-pandas-dataframe-with-multiple-conditions/
df_imms_2021b_hotspot_07la110 = df_imms_2021b[
    (df_imms_2021b["IMMS Unit ID"] == "07-LA-110")
]
df_imms_2022a_hotspot_07la110 = df_imms_2022a[
    (df_imms_2022a["IMMS Unit ID"] == "07-LA-110")
]
df_imms_2022b_hotspot_07la110 = df_imms_2022b[
    (df_imms_2022b["IMMS Unit ID"] == "07-LA-110")
]
df_imms_2023a_hotspot_07la110 = df_imms_2023a[
    (df_imms_2023a["IMMS Unit ID"] == "07-LA-110")
]
df_imms_2023b_hotspot_07la110 = df_imms_2023b[
    (df_imms_2023b["IMMS Unit ID"] == "07-LA-110")
]
df_imms_2024a_hotspot_07la110 = df_imms_2024a[
    (df_imms_2024a["IMMS Unit ID"] == "07-LA-110")
]
df_imms_2024b_hotspot_07la110 = df_imms_2024b[
    (df_imms_2024b["IMMS Unit ID"] == "07-LA-110") # Added by NS 6/17/2025
]
df_imms_2025a_hotspot_07la110 = df_imms_2025a[
    (df_imms_2025a["IMMS Unit ID"] == "07-LA-110") # Added by NS 6/17/2025
]

# check filter results
# data_profile(df_imms_2021b_hotspot_07la110, 'Data Profile: IMMS 2021b - 07la110')
# data_profile(df_imms_2022a_hotspot_07la110, 'Data Profile: IMMS 2022a - 07la110')
# data_profile(df_imms_2022b_hotspot_07la110, 'Data Profile: IMMS 2022b - 07la110')
# data_profile(df_imms_2023a_hotspot_07la110, 'Data Profile: IMMS 2023a - 07la110')
# data_profile(df_imms_2023b_hotspot_07la110, 'Data Profile: IMMS 2023b - 07la110')
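# data_profile(df_imms_2024a_hotspot_07la110, 'Data Profile: IMMS 2024a - 07la110')
# data_profile(df_imms_2024b_hotspot_07la110, 'Data Profile: IMMS 2024b - 07la110')
# data_profile(df_imms_2025a_hotspot_07la110, 'Data Profile: IMMS 2025a - 07la110')

# join dataframes together with vertical concat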
# https://www.geeksforgeeks.org/how-to-concatenate-two-or-more-pandas-dataframes/
df_imms_hotspot_07la110 = pd.concat(
    [
        df_imms_2021b_hotspot_07la110,
        df_imms_2022a_hotspot_07la110,
        df_imms_2022b_hotspot_07la110,
        df_imms_2023a_hotspot_07la110,
        df_imms_2023b_hotspot_07la110,
        df_imms_2024a_hotspot_07la110,
        df_imms_2024b_hotspot_07la110, # Added by NS 6/17/2025
        df_imms_2025a_hotspot_07la110, # Added by NS 6/17/2025
    ],
    axis=0,
)
# data_profile(df_imms_hotspot_07la110, 'Data Profile: IMMS Litter Hotspot - 07la110')

# filter litter hotspot corridors based on litter/los/csr data
# https://www.geeksforgeeks.org/filter-pandas-dataframe-with-multiple-conditions/
df_imms_2021b_hotspot_07la405 = df_imms_2021b[
    (df_imms_2021b["IMMS Unit ID"] == "07-LA-405")
]
df_imms_2022a_hotspot_07la405 = df_imms_2022a[
    (df_imms_2022a["IMMS Unit ID"] == "07-LA-405")
]
df_imms_2022b_hotspot_07la405 = df_imms_2022b[
    (df_imms_2022b["IMMS Unit ID"] == "07-LA-405")
]
df_imms_2023a_hotspot_07la405 = df_imms_2023a[
    (df_imms_2023a["IMMS Unit ID"] == "07-LA-405")
]
df_imms_2023b_hotspot_07la405 = df_imms_2023b[
    (df_imms_2023b["IMMS Unit ID"] == "07-LA-405")
]
df_imms_2024a_hotspot_07la405 = df_imms_2024a[
    (df_imms_2024a["IMMS Unit ID"] == "07-LA-405")
]
df_imms_2024b_hotspot_07la405 = df_imms_2024b[
    (df_imms_2024b["IMMS Unit ID"] == "07-LA-405") # Added by NS 6/17/2025
]
df_imms_2025a_hotspot_07la405 = df_imms_2025a[
    (df_imms_2025a["IMMS Unit ID"] == "07-LA-405") # Added by NS 6/17/2025
]

# check filter results
# data_profile(df_imms_2021b_hotspot_07la405, 'Data Profile: IMMS 2021b - 07la405')
# data_profile(df_imms_2022a_hotspot_07la405, 'Data Profile: IMMS 2022a - 07la405')
# data_profile(df_imms_2022b_hotspot_07la405, 'Data Profile: IMMS 2022b - 07la405')
# data_profile(df_imms_2023a_hotspot_07la405, 'Data Profile: IMMS 2023a - 07la405')
# data_profile(df_imms_2023b_hotspot_07la405, 'Data Profile: IMMS 2023b - 07la405')
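# data_profile(df_imms_2024a_hotspot_07la405, 'Data Profile: IMMS 2024a - 07la405')
# data_profile(df_imms_2024b_hotspot_07la405, 'Data Profile: IMMS 2024b - 07la405')
# data_profile(df_imms_2025a_hotspot_07la405, 'Data Profile: IMMS 2025a - 07la405')

# join dataframes together with vertical concat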
# https://www.geeksforgeeks.org/how-to-concatenate-two-or-more-pandas-dataframes/
df_imms_hotspot_07la405 = pd.concat(
    [
        df_imms_2021b_hotspot_07la405,
        df_imms_2022a_hotspot_07la405,
        df_imms_2022b_hotspot_07la405,
        df_imms_2023a_hotspot_07la405,
        df_imms_2023b_hotspot_07la405,
        df_imms_2024a_hotspot_07la405,
        df_imms_2024b_hotspot_07la405, # Added by NS 6/17/2025
        df_imms_2025a_hotspot_07la405, # Added by NS 6/17/2025
    ],
    axis=0,
)
# data_profile(df_imms_hotspot_07la405, 'Data Profile: IMMS Litter Hotspot - 07la405')

# filter litter hotspot corridors based on litter/los/csr data
# https://www.geeksforgeeks.org/filter-pandas-dataframe-with-multiple-conditions/
df_imms_2021b_hotspot_08riv010 = df_imms_2021b[
    (df_imms_2021b["IMMS Unit ID"] == "08-RIV-010")
]
df_imms_2022a_hotspot_08riv010 = df_imms_2022a[
    (df_imms_2022a["IMMS Unit ID"] == "08-RIV-010")
]
df_imms_2022b_hotspot_08riv010 = df_imms_2022b[
    (df_imms_2022b["IMMS Unit ID"] == "08-RIV-010")
]
df_imms_2023a_hotspot_08riv010 = df_imms_2023a[
    (df_imms_2023a["IMMS Unit ID"] == "08-RIV-010")
]
df_imms_2023b_hotspot_08riv010 = df_imms_2023b[
    (df_imms_2023b["IMMS Unit ID"] == "08-RIV-010")
]
df_imms_2024a_hotspot_08riv010 = df_imms_2024a[
    (df_imms_2024a["IMMS Unit ID"] == "08-RIV-010")
]
df_imms_2024b_hotspot_08riv010 = df_imms_2024b[
    (df_imms_2024b["IMMS Unit ID"] == "08-RIV-010") # Added by NS 6/17/2025
]
df_imms_2025a_hotspot_08riv010 = df_imms_2025a[
    (df_imms_2025a["IMMS Unit ID"] == "08-RIV-010") # Added by NS 6/17/2025
]

# check filter results
# data_profile(df_imms_2021b_hotspot_08riv010, 'Data Profile: IMMS 2021b - 08riv010')
# data_profile(df_imms_2022a_hotspot_08riv010, 'Data Profile: IMMS 2022a - 08riv010')
# data_profile(df_imms_2022b_hotspot_08riv010, 'Data Profile: IMMS 2022b - 08riv010')
# data_profile(df_imms_2023a_hotspot_08riv010, 'Data Profile: IMMS 2023a - 08riv010')
# data_profile(df_imms_2023b_hotspot_08riv010, 'Data Profile: IMMS 2023b - 08riv010')
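# data_profile(df_imms_2024a_hotspot_08riv010, 'Data Profile: IMMS 2024a - 08riv010')
# data_profile(df_imms_2024b_hotspot_08riv010, 'Data Profile: IMMS 2024b - 08riv010')
# data_profile(df_imms_2025a_hotspot_08riv010, 'Data Profile: IMMS 2025a - 08riv010')

# join dataframes together with vertical concat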
# https://www.geeksforgeeks.org/how-to-concatenate-two-or-more-pandas-dataframes/
df_imms_hotspot_08riv010 = pd.concat(
    [
        df_imms_2021b_hotspot_08riv010,
        df_imms_2022a_hotspot_08riv010,
        df_imms_2022b_hotspot_08riv010,
        df_imms_2023a_hotspot_08riv010,
        df_imms_2023b_hotspot_08riv010,
        df_imms_2024a_hotspot_08riv010,
        df_imms_2024b_hotspot_08riv010, # Added by NS 6/17/2025
        df_imms_2025a_hotspot_08riv010, # Added by NS 6/17/2025
    ],
    axis=0,
)
# data_profile(df_imms_hotspot_08riv010, 'Data Profile: IMMS Litter Hotspot - 08riv010')

# filter litter hotspot corridors based on litter/los/csr data
# https://www.geeksforgeeks.org/filter-pandas-dataframe-with-multiple-conditions/
df_imms_2021b_hotspot_08riv060 = df_imms_2021b[
    (df_imms_2021b["IMMS Unit ID"] == "08-RIV-060")
]
df_imms_2022a_hotspot_08riv060 = df_imms_2022a[
    (df_imms_2022a["IMMS Unit ID"] == "08-RIV-060")
]
df_imms_2022b_hotspot_08riv060 = df_imms_2022b[
    (df_imms_2022b["IMMS Unit ID"] == "08-RIV-060")
]
df_imms_2023a_hotspot_08riv060 = df_imms_2023a[
    (df_imms_2023a["IMMS Unit ID"] == "08-RIV-060")
]
df_imms_2023b_hotspot_08riv060 = df_imms_2023b[
    (df_imms_2023b["IMMS Unit ID"] == "08-RIV-060")
]
df_imms_2024a_hotspot_08riv060 = df_imms_2024a[
    (df_imms_2024a["IMMS Unit ID"] == "08-RIV-060")
]
df_imms_2024b_hotspot_08riv060 = df_imms_2024b[
    (df_imms_2024b["IMMS Unit ID"] == "08-RIV-060") # Added by NS 6/17/2025
]
df_imms_2025a_hotspot_08riv060 = df_imms_2025a[
    (df_imms_2025a["IMMS Unit ID"] == "08-RIV-060") # Added by NS 6/17/2025
]

# check filter results
# data_profile(df_imms_2021b_hotspot_08riv060, 'Data Profile: IMMS 2021b - 08riv060')
# data_profile(df_imms_2022a_hotspot_08riv060, 'Data Profile: IMMS 2022a - 08riv060')
# data_profile(df_imms_2022b_hotspot_08riv060, 'Data Profile: IMMS 2022b - 08riv060')
# data_profile(df_imms_2023a_hotspot_08riv060, 'Data Profile: IMMS 2023a - 08riv060')
# data_profile(df_imms_2023b_hotspot_08riv060, 'Data Profile: IMMS 2023b - 08riv060')
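# data_profile(df_imms_2024a_hotspot_08riv060, 'Data Profile: IMMS 2024a - 08riv060')
# data_profile(df_imms_2024b_hotspot_08riv060, 'Data Profile: IMMS 2024b - 08riv060')
# data_profile(df_imms_2025a_hotspot_08riv060, 'Data Profile: IMMS 2025a - 08riv060')

# join dataframes together with vertical concat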
# https://www.geeksforgeeks.org/how-to-concatenate-two-or-more-pandas-dataframes/
df_imms_hotspot_08riv060 = pd.concat(
    [
        df_imms_2021b_hotspot_08riv060,
        df_imms_2022a_hotspot_08riv060,
        df_imms_2022b_hotspot_08riv060,
        df_imms_2023a_hotspot_08riv060,
        df_imms_2023b_hotspot_08riv060,
        df_imms_2024a_hotspot_08riv060,
        df_imms_2024b_hotspot_08riv060, # Added by NS 6/17/2025
        df_imms_2025a_hotspot_08riv060, # Added by NS 6/17/2025
    ],
    axis=0,
)
# data_profile(df_imms_hotspot_08riv060, 'Data Profile: IMMS Litter Hotspot - 08riv060')

# filter litter hotspot corridors based on litter/los/csr data
# https://www.geeksforgeeks.org/filter-pandas-dataframe-with-multiple-conditions/
df_imms_2021b_hotspot_10sj005 = df_imms_2021b[
    (df_imms_2021b["IMMS Unit ID"] == "10-SJ-005")
]
df_imms_2022a_hotspot_10sj005 = df_imms_2022a[
    (df_imms_2022a["IMMS Unit ID"] == "10-SJ-005")
]
df_imms_2022b_hotspot_10sj005 = df_imms_2022b[
    (df_imms_2022b["IMMS Unit ID"] == "10-SJ-005")
]
df_imms_2023a_hotspot_10sj005 = df_imms_2023a[
    (df_imms_2023a["IMMS Unit ID"] == "10-SJ-005")
]
df_imms_2023b_hotspot_10sj005 = df_imms_2023b[
    (df_imms_2023b["IMMS Unit ID"] == "10-SJ-005")
]
df_imms_2024a_hotspot_10sj005 = df_imms_2024a[
    (df_imms_2024a["IMMS Unit ID"] == "10-SJ-005")
]
df_imms_2024b_hotspot_10sj005 = df_imms_2024b[
    (df_imms_2024b["IMMS Unit ID"] == "10-SJ-005") # Added by NS 6/17/2025
]
df_imms_2025a_hotspot_10sj005 = df_imms_2025a[
    (df_imms_2025a["IMMS Unit ID"] == "10-SJ-005") # Added by NS 6/17/2025
]

# check filter results
# data_profile(df_imms_2021b_hotspot_10sj005, 'Data Profile: IMMS 2021b - 10sj005')
# data_profile(df_imms_2022a_hotspot_10sj005, 'Data Profile: IMMS 2022a - 10sj005')
# data_profile(df_imms_2022b_hotspot_10sj005, 'Data Profile: IMMS 2022b - 10sj005')
# data_profile(df_imms_2023a_hotspot_10sj005, 'Data Profile: IMMS 2023a - 10sj005')
# data_profile(df_imms_2023b_hotspot_10sj005, 'Data Profile: IMMS 2023b - 10sj005')
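# data_profile(df_imms_2024a_hotspot_10sj005, 'Data Profile: IMMS 2024a - 10sj005')
# data_profile(df_imms_2024b_hotspot_10sj005, 'Data Profile: IMMS 2024b - 10sj005')
# data_profile(df_imms_2025a_hotspot_10sj005, 'Data Profile: IMMS 2025a - 10sj005')

# join dataframes together with vertical concat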
# https://www.geeksforgeeks.org/how-to-concatenate-two-or-more-pandas-dataframes/
df_imms_hotspot_10sj005 = pd.concat(
    [
        df_imms_2021b_hotspot_10sj005,
        df_imms_2022a_hotspot_10sj005,
        df_imms_2022b_hotspot_10sj005,
        df_imms_2023a_hotspot_10sj005,
        df_imms_2023b_hotspot_10sj005,
        df_imms_2024a_hotspot_10sj005,
        df_imms_2024b_hotspot_10sj005, # Added by NS 6/17/2025
        df_imms_2025a_hotspot_10sj005, # Added by NS 6/17/2025
    ],
    axis=0,
)
# data_profile(df_imms_hotspot_10sj005, 'Data Profile: IMMS Litter Hotspot - 10sj005')

# filter litter hotspot corridors based on litter/los/csr data
# https://www.geeksforgeeks.org/filter-pandas-dataframe-with-multiple-conditions/
df_imms_2021b_hotspot_10sj099 = df_imms_2021b[
    (df_imms_2021b["IMMS Unit ID"] == "10-SJ-099")
]
df_imms_2022a_hotspot_10sj099 = df_imms_2022a[
    (df_imms_2022a["IMMS Unit ID"] == "10-SJ-099")
]
df_imms_2022b_hotspot_10sj099 = df_imms_2022b[
    (df_imms_2022b["IMMS Unit ID"] == "10-SJ-099")
]
df_imms_2023a_hotspot_10sj099 = df_imms_2023a[
    (df_imms_2023a["IMMS Unit ID"] == "10-SJ-099")
]
df_imms_2023b_hotspot_10sj099 = df_imms_2023b[
    (df_imms_2023b["IMMS Unit ID"] == "10-SJ-099")
]
df_imms_2024a_hotspot_10sj099 = df_imms_2024a[
    (df_imms_2024a["IMMS Unit ID"] == "10-SJ-099")
]
df_imms_2024b_hotspot_10sj099 = df_imms_2024b[
    (df_imms_2024b["IMMS Unit ID"] == "10-SJ-099") # Added by NS 6/17/2025
]
df_imms_2025a_hotspot_10sj099 = df_imms_2025a[
    (df_imms_2025a["IMMS Unit ID"] == "10-SJ-099") # Added by NS 6/17/2025
]

# check filter results
# data_profile(df_imms_2021b_hotspot_10sj099, 'Data Profile: IMMS 2021b - 10sj099')
# data_profile(df_imms_2022a_hotspot_10sj099, 'Data Profile: IMMS 2022a - 10sj099')
# data_profile(df_imms_2022b_hotspot_10sj099, 'Data Profile: IMMS 2022b - 10sj099')
# data_profile(df_imms_2023a_hotspot_10sj099, 'Data Profile: IMMS 2023a - 10sj099')
# data_profile(df_imms_2023b_hotspot_10sj099, 'Data Profile: IMMS 2023b - 10sj099')
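# data_profile(df_imms_2024a_hotspot_10sj099, 'Data Profile: IMMS 2024a - 10sj099')
# data_profile(df_imms_2024b_hotspot_10sj099, 'Data Profile: IMMS 2024b - 10sj099')
# data_profile(df_imms_2025a_hotspot_10sj099, 'Data Profile: IMMS 2025a - 10sj099')

# join dataframes together with vertical concat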
# https://www.geeksforgeeks.org/how-to-concatenate-two-or-more-pandas-dataframes/
df_imms_hotspot_10sj099 = pd.concat(
    [
        df_imms_2021b_hotspot_10sj099,
        df_imms_2022a_hotspot_10sj099,
        df_imms_2022b_hotspot_10sj099,
        df_imms_2023a_hotspot_10sj099,
        df_imms_2023b_hotspot_10sj099,
        df_imms_2024a_hotspot_10sj099,
        df_imms_2024b_hotspot_10sj099, # Added by NS 6/17/2025
        df_imms_2025a_hotspot_10sj099, # Added by NS 6/17/2025
    ],
    axis=0,
)
# data_profile(df_imms_hotspot_10sj099, 'Data Profile: IMMS Litter Hotspot - 10sj099')

# filter litter hotspot corridors based on litter/los/csr data
# https://www.geeksforgeeks.org/filter-pandas-dataframe-with-multiple-conditions/
df_imms_2021b_hotspot_11sd005 = df_imms_2021b[
    (df_imms_2021b["IMMS Unit ID"] == "11-SD-005")
]
df_imms_2022a_hotspot_11sd005 = df_imms_2022a[
    (df_imms_2022a["IMMS Unit ID"] == "11-SD-005")
]
df_imms_2022b_hotspot_11sd005 = df_imms_2022b[
    (df_imms_2022b["IMMS Unit ID"] == "11-SD-005")
]
df_imms_2023a_hotspot_11sd005 = df_imms_2023a[
    (df_imms_2023a["IMMS Unit ID"] == "11-SD-005")
]
df_imms_2023b_hotspot_11sd005 = df_imms_2023b[
    (df_imms_2023b["IMMS Unit ID"] == "11-SD-005")
]
df_imms_2024a_hotspot_11sd005 = df_imms_2024a[
    (df_imms_2024a["IMMS Unit ID"] == "11-SD-005")
]
df_imms_2024b_hotspot_11sd005 = df_imms_2024b[
    (df_imms_2024b["IMMS Unit ID"] == "11-SD-005") # Added by NS 6/17/2025
]
df_imms_2025a_hotspot_11sd005 = df_imms_2025a[
    (df_imms_2025a["IMMS Unit ID"] == "11-SD-005") # Added by NS 6/17/2025
]

# check filter results
# data_profile(df_imms_2021b_hotspot_11sd005, 'Data Profile: IMMS 2021b - 11sd005')
# data_profile(df_imms_2022a_hotspot_11sd005, 'Data Profile: IMMS 2022a - 11sd005')
# data_profile(df_imms_2022b_hotspot_11sd005, 'Data Profile: IMMS 2022b - 11sd005')
# data_profile(df_imms_2023a_hotspot_11sd005, 'Data Profile: IMMS 2023a - 11sd005')
# data_profile(df_imms_2023b_hotspot_11sd005, 'Data Profile: IMMS 2023b - 11sd005')
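# data_profile(df_imms_2024a_hotspot_11sd005, 'Data Profile: IMMS 2024a - 11sd005')
# data_profile(df_imms_2024b_hotspot_11sd005, 'Data Profile: IMMS 2024b - 11sd005')
# data_profile(df_imms_2025a_hotspot_11sd005, 'Data Profile: IMMS 2025a - 11sd005')

# join dataframes together with vertical concat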
# https://www.geeksforgeeks.org/how-to-concatenate-two-or-more-pandas-dataframes/
df_imms_hotspot_11sd005 = pd.concat(
    [
        df_imms_2021b_hotspot_11sd005,
        df_imms_2022a_hotspot_11sd005,
        df_imms_2022b_hotspot_11sd005,
        df_imms_2023a_hotspot_11sd005,
        df_imms_2023b_hotspot_11sd005,
        df_imms_2024a_hotspot_11sd005,
        df_imms_2024b_hotspot_11sd005, # Added by NS 6/17/2025
        df_imms_2025a_hotspot_11sd005, # Added by NS 6/17/2025
    ],
    axis=0,
)
# data_profile(df_imms_hotspot_11sd005, 'Data Profile: IMMS Litter Hotspot - 11sd005')

# filter litter hotspot corridors based on litter/los/csr data
# https://www.geeksforgeeks.org/filter-pandas-dataframe-with-multiple-conditions/
df_imms_2021b_hotspot_11sd805 = df_imms_2021b[
    (df_imms_2021b["IMMS Unit ID"] == "11-SD-805")
]
df_imms_2022a_hotspot_11sd805 = df_imms_2022a[
    (df_imms_2022a["IMMS Unit ID"] == "11-SD-805")
]
df_imms_2022b_hotspot_11sd805 = df_imms_2022b[
    (df_imms_2022b["IMMS Unit ID"] == "11-SD-805")
]
df_imms_2023a_hotspot_11sd805 = df_imms_2023a[
    (df_imms_2023a["IMMS Unit ID"] == "11-SD-805")
]
df_imms_2023b_hotspot_11sd805 = df_imms_2023b[
    (df_imms_2023b["IMMS Unit ID"] == "11-SD-805")
]
df_imms_2024a_hotspot_11sd805 = df_imms_2024a[
    (df_imms_2024a["IMMS Unit ID"] == "11-SD-805")
]
df_imms_2024b_hotspot_11sd805 = df_imms_2024b[
    (df_imms_2024b["IMMS Unit ID"] == "11-SD-805") # Added by NS 6/17/2025
]
df_imms_2025a_hotspot_11sd805 = df_imms_2025a[
    (df_imms_2025a["IMMS Unit ID"] == "11-SD-805") # Added by NS 6/17/2025
]

# check filter results
# data_profile(df_imms_2021b_hotspot_11sd805, 'Data Profile: IMMS 2021b - 11sd805')
# data_profile(df_imms_2022a_hotspot_11sd805, 'Data Profile: IMMS 2022a - 11sd805')
# data_profile(df_imms_2022b_hotspot_11sd805, 'Data Profile: IMMS 2022b - 11sd805')
# data_profile(df_imms_2023a_hotspot_11sd805, 'Data Profile: IMMS 2023a - 11sd805')
# data_profile(df_imms_2023b_hotspot_11sd805, 'Data Profile: IMMS 2023b - 11sd805')
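# data_profile(df_imms_2024a_hotspot_11sd805, 'Data Profile: IMMS 2024a - 11sd805')
# data_profile(df_imms_2024b_hotspot_11sd805, 'Data Profile: IMMS 2024b - 11sd805')
# data_profile(df_imms_2025a_hotspot_11sd805, 'Data Profile: IMMS 2025a - 11sd805')

# join dataframes together with vertical concat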
# https://www.geeksforgeeks.org/how-to-concatenate-two-or-more-pandas-dataframes/
df_imms_hotspot_11sd805 = pd.concat(
    [
        df_imms_2021b_hotspot_11sd805,
        df_imms_2022a_hotspot_11sd805,
        df_imms_2022b_hotspot_11sd805,
        df_imms_2023a_hotspot_11sd805,
        df_imms_2023b_hotspot_11sd805,
        df_imms_2024a_hotspot_11sd805,
        df_imms_2024b_hotspot_11sd805, # Added by NS 6/17/2025
        df_imms_2025a_hotspot_11sd805, # Added by NS 6/17/2025
    ],
    axis=0,
)
# data_profile(df_imms_hotspot_11sd805, 'Data Profile: IMMS Litter Hotspot - 11sd805')

# create dataframe with all hotspots for analysis
# join dataframes together with vertical concat
# https://www.geeksforgeeks.org/how-to-concatenate-two-or-more-pandas-dataframes/
df_imms_hotspot_all = pd.concat(
    [
        df_imms_hotspot_03sac005,
        df_imms_hotspot_03sac050,
        df_imms_hotspot_03sac080,
        df_imms_hotspot_04ala580b,
        df_imms_hotspot_04ala680,
        df_imms_hotspot_04ala880,
        df_imms_hotspot_04cc004a,
        df_imms_hotspot_04cc680,
        df_imms_hotspot_06fre099,
        df_imms_hotspot_06ker099,
        df_imms_hotspot_07la005a,
        df_imms_hotspot_07la010,
        df_imms_hotspot_07la101,
        df_imms_hotspot_07la110,
        df_imms_hotspot_07la405,
        df_imms_hotspot_08riv010,
        df_imms_hotspot_08riv060,
        df_imms_hotspot_11sd005,
        df_imms_hotspot_11sd805,
    ],
    axis=0,
)
# data_profile(df_imms_hotspot_all, 'Data Profile: IMMS Litter Hotspot - Total')

# aggregate hotspots based on dist/county/route and work activity (frequency)
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_hotspot_all_activity_count = (
    df_imms_hotspot_all.groupby(["IMMS Unit ID", "Activity Description"])
    .size()
    .unstack("Activity Description")
)
# preview results
# data_profile(
#     df_imms_hotspot_all_activity_count,
#     'Data Profile: IMMS Litter Hotspot - Activity Count'
# )

# aggregate hotspots based on dist/county/route and work activity (litter totals)
# subset dataframe before aggregation/sum
df_imms_hotspot_all_activity_sum = df_imms_hotspot_all[
    ["IMMS Unit ID", "Activity Description", "Production Quantity"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_hotspot_all_activity_sum = (
    df_imms_hotspot_all_activity_sum.groupby(["IMMS Unit ID", "Activity Description"])
    .sum()
    .unstack("Activity Description")
)
# preview results
# data_profile(
#     df_imms_hotspot_all_activity_sum,
#     'Data Profile: IMMS Litter Hotspot - Activity Sum'
# )

# aggregate hotspots based on dist/county/route and work activity (total cost)
# subset dataframe before aggregation/sum
df_imms_hotspot_all_activity_cost = df_imms_hotspot_all[
    ["IMMS Unit ID", "Activity Description", "Total Cost"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_hotspot_all_activity_cost = (
    df_imms_hotspot_all_activity_cost.groupby(["IMMS Unit ID", "Activity Description"])
    .sum()
    .unstack("Activity Description")
)
# preview results
# data_profile(
#     df_imms_hotspot_all_activity_cost,
#     'Data Profile: IMMS Litter Hotspot - Total Cost'
# )

# aggregate hotspots based on dist/county/route and work activity (total labor)
# subset dataframe before aggregation/sum
df_imms_hotspot_all_activity_labor = df_imms_hotspot_all[
    ["IMMS Unit ID", "Activity Description", "P.Y.s"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_hotspot_all_activity_labor = (
    df_imms_hotspot_all_activity_labor.groupby(["IMMS Unit ID", "Activity Description"])
    .sum()
    .unstack("Activity Description")
)
# preview results
# data_profile(
#     df_imms_hotspot_all_activity_labor,
#     'Data Profile: IMMS Litter Hotspot - Total Labor'
# )
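
The filter, concatenate and aggregate steps in the cell above follow one repeated pattern. A minimal sketch of how that pattern could be condensed is shown below. It assumes the column names used in this notebook ("IMMS Unit ID", "Activity Description", "Production Quantity"). The helper names `filter_hotspots` and `aggregate_by_activity`, the activity labels, the "03-SAC-050" unit ID and the toy values are hypothetical and only serve as illustration.

```python
import pandas as pd

# hypothetical toy data standing in for the concatenated IMMS periods
df_demo = pd.DataFrame(
    {
        "IMMS Unit ID": [
            "11-SD-005",
            "11-SD-005",
            "11-SD-805",
            "11-SD-805",
            "03-SAC-050",
        ],
        "Activity Description": [
            "Litter Control",
            "Sweeping",
            "Litter Control",
            "Litter Control",
            "Sweeping",
        ],
        "Production Quantity": [2.0, 1.5, 3.0, 4.0, 9.0],
    }
)


def filter_hotspots(df, unit_ids):
    """Keep only rows whose IMMS Unit ID is one of unit_ids (hypothetical helper)."""
    return df[df["IMMS Unit ID"].isin(unit_ids)]


def aggregate_by_activity(df, value_col=None):
    """Count rows (value_col=None) or sum value_col per unit and activity."""
    grouped = df.groupby(["IMMS Unit ID", "Activity Description"])
    if value_col is None:
        result = grouped.size()
    else:
        result = grouped[value_col].sum()
    # one row per unit, one column per activity; missing pairs become NaN
    return result.unstack("Activity Description")


df_hot = filter_hotspots(df_demo, ["11-SD-005", "11-SD-805"])
activity_count = aggregate_by_activity(df_hot)
activity_sum = aggregate_by_activity(df_hot, "Production Quantity")
```

One difference from the cell above: selecting a single column before `.sum()` yields flat activity columns, whereas summing a subset dataframe and unstacking yields two column levels (value column, activity). Plot legends would therefore be labelled slightly differently.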

In [None]:
# 05.02.02 - data analysis for stacked charts (imms)

# concatenate all time periods to analyze frequency/CY by district
df_imms_all_periods = pd.concat(
    [
        df_imms_2021b,
        df_imms_2022a,
        df_imms_2022b,
        df_imms_2023a,
        df_imms_2023b,
        df_imms_2024a,
        df_imms_2024b, # Added by NS 6/17/2025
        df_imms_2025a, # Added by NS 6/17/2025
    ],
    axis=0,
)
# data_profile(
#     df_imms_all_periods,
#     'Data Profile: IMMS Litter Collection - All Periods'
# )
# subset dataframe by district
df_imms_all_periods_d1 = df_imms_all_periods[
    (df_imms_all_periods["Resp. District"] == 1)
]
# data_profile(
#     df_imms_all_periods_d1,
#     'Data Profile: IMMS Litter Collection - All Periods (D1)'
# )
df_imms_all_periods_d2 = df_imms_all_periods[
    (df_imms_all_periods["Resp. District"] == 2)
]
df_imms_all_periods_d3 = df_imms_all_periods[
    (df_imms_all_periods["Resp. District"] == 3)
]
df_imms_all_periods_d4 = df_imms_all_periods[
    (df_imms_all_periods["Resp. District"] == 4)
]
df_imms_all_periods_d5 = df_imms_all_periods[
    (df_imms_all_periods["Resp. District"] == 5)
]
df_imms_all_periods_d6 = df_imms_all_periods[
    (df_imms_all_periods["Resp. District"] == 6)
]
df_imms_all_periods_d7 = df_imms_all_periods[
    (df_imms_all_periods["Resp. District"] == 7)
]
df_imms_all_periods_d8 = df_imms_all_periods[
    (df_imms_all_periods["Resp. District"] == 8)
]
df_imms_all_periods_d9 = df_imms_all_periods[
    (df_imms_all_periods["Resp. District"] == 9)
]
df_imms_all_periods_d10 = df_imms_all_periods[
    (df_imms_all_periods["Resp. District"] == 10)
]
df_imms_all_periods_d11 = df_imms_all_periods[
    (df_imms_all_periods["Resp. District"] == 11)
]
df_imms_all_periods_d12 = df_imms_all_periods[
    (df_imms_all_periods["Resp. District"] == 12)
]

# aggregate districts based on dist/county/route and work activity (frequency)
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_count_d1 = (
    df_imms_all_periods_d1.groupby(["IMMS Unit ID", "Activity Description"])
    .size()
    .unstack("Activity Description")
)

# aggregate districts based on dist/county/route and work activity (frequency)
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_count_d2 = (
    df_imms_all_periods_d2.groupby(["IMMS Unit ID", "Activity Description"])
    .size()
    .unstack("Activity Description")
)

# aggregate districts based on dist/county/route and work activity (frequency)
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_count_d3 = (
    df_imms_all_periods_d3.groupby(["IMMS Unit ID", "Activity Description"])
    .size()
    .unstack("Activity Description")
)

# aggregate districts based on dist/county/route and work activity (frequency)
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_count_d4 = (
    df_imms_all_periods_d4.groupby(["IMMS Unit ID", "Activity Description"])
    .size()
    .unstack("Activity Description")
)

# aggregate districts based on dist/county/route and work activity (frequency)
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_count_d5 = (
    df_imms_all_periods_d5.groupby(["IMMS Unit ID", "Activity Description"])
    .size()
    .unstack("Activity Description")
)

# aggregate districts based on dist/county/route and work activity (frequency)
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_count_d6 = (
    df_imms_all_periods_d6.groupby(["IMMS Unit ID", "Activity Description"])
    .size()
    .unstack("Activity Description")
)

# aggregate districts based on dist/county/route and work activity (frequency)
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_count_d7 = (
    df_imms_all_periods_d7.groupby(["IMMS Unit ID", "Activity Description"])
    .size()
    .unstack("Activity Description")
)

# aggregate districts based on dist/county/route and work activity (frequency)
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_count_d8 = (
    df_imms_all_periods_d8.groupby(["IMMS Unit ID", "Activity Description"])
    .size()
    .unstack("Activity Description")
)

# aggregate districts based on dist/county/route and work activity (frequency)
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_count_d9 = (
    df_imms_all_periods_d9.groupby(["IMMS Unit ID", "Activity Description"])
    .size()
    .unstack("Activity Description")
)

# aggregate districts based on dist/county/route and work activity (frequency)
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_count_d10 = (
    df_imms_all_periods_d10.groupby(["IMMS Unit ID", "Activity Description"])
    .size()
    .unstack("Activity Description")
)

# aggregate districts based on dist/county/route and work activity (frequency)
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_count_d11 = (
    df_imms_all_periods_d11.groupby(["IMMS Unit ID", "Activity Description"])
    .size()
    .unstack("Activity Description")
)

# aggregate districts based on dist/county/route and work activity (frequency)
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_count_d12 = (
    df_imms_all_periods_d12.groupby(["IMMS Unit ID", "Activity Description"])
    .size()
    .unstack("Activity Description")
)





# aggregate district based on dist/county/route and work activity (litter totals)
# subset dataframe before aggregation/sum
df_imms_all_periods_activity_sum_d1 = df_imms_all_periods_d1[
    ["IMMS Unit ID", "Activity Description", "Production Quantity"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_sum_d1 = (
    df_imms_all_periods_activity_sum_d1.groupby(
        ["IMMS Unit ID", "Activity Description"]
    )
    .sum()
    .unstack("Activity Description")
)

# aggregate district based on dist/county/route and work activity (litter totals)
# subset dataframe before aggregation/sum
df_imms_all_periods_activity_sum_d2 = df_imms_all_periods_d2[
    ["IMMS Unit ID", "Activity Description", "Production Quantity"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_sum_d2 = (
    df_imms_all_periods_activity_sum_d2.groupby(
        ["IMMS Unit ID", "Activity Description"]
    )
    .sum()
    .unstack("Activity Description")
)

# aggregate district based on dist/county/route and work activity (litter totals)
# subset dataframe before aggregation/sum
df_imms_all_periods_activity_sum_d3 = df_imms_all_periods_d3[
    ["IMMS Unit ID", "Activity Description", "Production Quantity"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_sum_d3 = (
    df_imms_all_periods_activity_sum_d3.groupby(
        ["IMMS Unit ID", "Activity Description"]
    )
    .sum()
    .unstack("Activity Description")
)

# aggregate district based on dist/county/route and work activity (litter totals)
# subset dataframe before aggregation/sum
df_imms_all_periods_activity_sum_d4 = df_imms_all_periods_d4[
    ["IMMS Unit ID", "Activity Description", "Production Quantity"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_sum_d4 = (
    df_imms_all_periods_activity_sum_d4.groupby(
        ["IMMS Unit ID", "Activity Description"]
    )
    .sum()
    .unstack("Activity Description")
)

# aggregate district based on dist/county/route and work activity (litter totals)
# subset dataframe before aggregation/sum
df_imms_all_periods_activity_sum_d5 = df_imms_all_periods_d5[
    ["IMMS Unit ID", "Activity Description", "Production Quantity"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_sum_d5 = (
    df_imms_all_periods_activity_sum_d5.groupby(
        ["IMMS Unit ID", "Activity Description"]
    )
    .sum()
    .unstack("Activity Description")
)

# aggregate district based on dist/county/route and work activity (litter totals)
# subset dataframe before aggregation/sum
df_imms_all_periods_activity_sum_d6 = df_imms_all_periods_d6[
    ["IMMS Unit ID", "Activity Description", "Production Quantity"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_sum_d6 = (
    df_imms_all_periods_activity_sum_d6.groupby(
        ["IMMS Unit ID", "Activity Description"]
    )
    .sum()
    .unstack("Activity Description")
)

# aggregate district based on dist/county/route and work activity (litter totals)
# subset dataframe before aggregation/sum
df_imms_all_periods_activity_sum_d7 = df_imms_all_periods_d7[
    ["IMMS Unit ID", "Activity Description", "Production Quantity"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_sum_d7 = (
    df_imms_all_periods_activity_sum_d7.groupby(
        ["IMMS Unit ID", "Activity Description"]
    )
    .sum()
    .unstack("Activity Description")
)

# aggregate district based on dist/county/route and work activity (litter totals)
# subset dataframe before aggregation/sum
df_imms_all_periods_activity_sum_d8 = df_imms_all_periods_d8[
    ["IMMS Unit ID", "Activity Description", "Production Quantity"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_sum_d8 = (
    df_imms_all_periods_activity_sum_d8.groupby(
        ["IMMS Unit ID", "Activity Description"]
    )
    .sum()
    .unstack("Activity Description")
)

# aggregate district based on dist/county/route and work activity (litter totals)
# subset dataframe before aggregation/sum
df_imms_all_periods_activity_sum_d9 = df_imms_all_periods_d9[
    ["IMMS Unit ID", "Activity Description", "Production Quantity"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_sum_d9 = (
    df_imms_all_periods_activity_sum_d9.groupby(
        ["IMMS Unit ID", "Activity Description"]
    )
    .sum()
    .unstack("Activity Description")
)

# aggregate district based on dist/county/route and work activity (litter totals)
# subset dataframe before aggregation/sum
df_imms_all_periods_activity_sum_d10 = df_imms_all_periods_d10[
    ["IMMS Unit ID", "Activity Description", "Production Quantity"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_sum_d10 = (
    df_imms_all_periods_activity_sum_d10.groupby(
        ["IMMS Unit ID", "Activity Description"]
    )
    .sum()
    .unstack("Activity Description")
)

# aggregate district based on dist/county/route and work activity (litter totals)
# subset dataframe before aggregation/sum
df_imms_all_periods_activity_sum_d11 = df_imms_all_periods_d11[
    ["IMMS Unit ID", "Activity Description", "Production Quantity"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_sum_d11 = (
    df_imms_all_periods_activity_sum_d11.groupby(
        ["IMMS Unit ID", "Activity Description"]
    )
    .sum()
    .unstack("Activity Description")
)

# aggregate district based on dist/county/route and work activity (litter totals)
# subset dataframe before aggregation/sum
df_imms_all_periods_activity_sum_d12 = df_imms_all_periods_d12[
    ["IMMS Unit ID", "Activity Description", "Production Quantity"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_sum_d12 = (
    df_imms_all_periods_activity_sum_d12.groupby(
        ["IMMS Unit ID", "Activity Description"]
    )
    .sum()
    .unstack("Activity Description")
)






# aggregate district based on dist/county/route and work activity (total cost)
# subset dataframe before aggregation/sum
df_imms_all_periods_activity_cost_d1 = df_imms_all_periods_d1[
    ["IMMS Unit ID", "Activity Description", "Total Cost"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_cost_d1 = (
    df_imms_all_periods_activity_cost_d1.groupby(
        ["IMMS Unit ID", "Activity Description"]
    )
    .sum()
    .unstack("Activity Description")
)

# aggregate district based on dist/county/route and work activity (total cost)
# subset dataframe before aggregation/sum
df_imms_all_periods_activity_cost_d2 = df_imms_all_periods_d2[
    ["IMMS Unit ID", "Activity Description", "Total Cost"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_cost_d2 = (
    df_imms_all_periods_activity_cost_d2.groupby(
        ["IMMS Unit ID", "Activity Description"]
    )
    .sum()
    .unstack("Activity Description")
)

# aggregate district based on dist/county/route and work activity (total cost)
# subset dataframe before aggregation/sum
df_imms_all_periods_activity_cost_d3 = df_imms_all_periods_d3[
    ["IMMS Unit ID", "Activity Description", "Total Cost"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_cost_d3 = (
    df_imms_all_periods_activity_cost_d3.groupby(
        ["IMMS Unit ID", "Activity Description"]
    )
    .sum()
    .unstack("Activity Description")
)

# aggregate district based on dist/county/route and work activity (total cost)
# subset dataframe before aggregation/sum
df_imms_all_periods_activity_cost_d4 = df_imms_all_periods_d4[
    ["IMMS Unit ID", "Activity Description", "Total Cost"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_cost_d4 = (
    df_imms_all_periods_activity_cost_d4.groupby(
        ["IMMS Unit ID", "Activity Description"]
    )
    .sum()
    .unstack("Activity Description")
)

# aggregate district based on dist/county/route and work activity (total cost)
# subset dataframe before aggregation/sum
df_imms_all_periods_activity_cost_d5 = df_imms_all_periods_d5[
    ["IMMS Unit ID", "Activity Description", "Total Cost"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_cost_d5 = (
    df_imms_all_periods_activity_cost_d5.groupby(
        ["IMMS Unit ID", "Activity Description"]
    )
    .sum()
    .unstack("Activity Description")
)

# aggregate district based on dist/county/route and work activity (total cost)
# subset dataframe before aggregation/sum
df_imms_all_periods_activity_cost_d6 = df_imms_all_periods_d6[
    ["IMMS Unit ID", "Activity Description", "Total Cost"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_cost_d6 = (
    df_imms_all_periods_activity_cost_d6.groupby(
        ["IMMS Unit ID", "Activity Description"]
    )
    .sum()
    .unstack("Activity Description")
)

# aggregate district based on dist/county/route and work activity (total cost)
# subset dataframe before aggregation/sum
df_imms_all_periods_activity_cost_d7 = df_imms_all_periods_d7[
    ["IMMS Unit ID", "Activity Description", "Total Cost"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_cost_d7 = (
    df_imms_all_periods_activity_cost_d7.groupby(
        ["IMMS Unit ID", "Activity Description"]
    )
    .sum()
    .unstack("Activity Description")
)

# aggregate district based on dist/county/route and work activity (total cost)
# subset dataframe before aggregation/sum
df_imms_all_periods_activity_cost_d8 = df_imms_all_periods_d8[
    ["IMMS Unit ID", "Activity Description", "Total Cost"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_cost_d8 = (
    df_imms_all_periods_activity_cost_d8.groupby(
        ["IMMS Unit ID", "Activity Description"]
    )
    .sum()
    .unstack("Activity Description")
)

# aggregate district based on dist/county/route and work activity (total cost)
# subset dataframe before aggregation/sum
df_imms_all_periods_activity_cost_d9 = df_imms_all_periods_d9[
    ["IMMS Unit ID", "Activity Description", "Total Cost"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_cost_d9 = (
    df_imms_all_periods_activity_cost_d9.groupby(
        ["IMMS Unit ID", "Activity Description"]
    )
    .sum()
    .unstack("Activity Description")
)

# aggregate district based on dist/county/route and work activity (total cost)
# subset dataframe before aggregation/sum
df_imms_all_periods_activity_cost_d10 = df_imms_all_periods_d10[
    ["IMMS Unit ID", "Activity Description", "Total Cost"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_cost_d10 = (
    df_imms_all_periods_activity_cost_d10.groupby(
        ["IMMS Unit ID", "Activity Description"]
    )
    .sum()
    .unstack("Activity Description")
)

# aggregate district based on dist/county/route and work activity (total cost)
# subset dataframe before aggregation/sum
df_imms_all_periods_activity_cost_d11 = df_imms_all_periods_d11[
    ["IMMS Unit ID", "Activity Description", "Total Cost"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_cost_d11 = (
    df_imms_all_periods_activity_cost_d11.groupby(
        ["IMMS Unit ID", "Activity Description"]
    )
    .sum()
    .unstack("Activity Description")
)

# aggregate district based on dist/county/route and work activity (total cost)
# subset dataframe before aggregation/sum
df_imms_all_periods_activity_cost_d12 = df_imms_all_periods_d12[
    ["IMMS Unit ID", "Activity Description", "Total Cost"]
]
# https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
df_imms_all_periods_activity_cost_d12 = (
    df_imms_all_periods_activity_cost_d12.groupby(
        ["IMMS Unit ID", "Activity Description"]
    )
    .sum()
    .unstack("Activity Description")
)

# preview results
data_profile(
    df_imms_all_periods_activity_cost_d1,
    "Data Profile: IMMS Litter Collection - Total Cost (D1)",
)
data_profile(
    df_imms_all_periods_activity_cost_d2,
    "Data Profile: IMMS Litter Collection - Total Cost (D2)",
)
data_profile(
    df_imms_all_periods_activity_cost_d3,
    "Data Profile: IMMS Litter Collection - Total Cost (D3)",
)
data_profile(
    df_imms_all_periods_activity_cost_d4,
    "Data Profile: IMMS Litter Collection - Total Cost (D4)",
)
data_profile(
    df_imms_all_periods_activity_cost_d5,
    "Data Profile: IMMS Litter Collection - Total Cost (D5)",
)
data_profile(
    df_imms_all_periods_activity_cost_d6,
    "Data Profile: IMMS Litter Collection - Total Cost (D6)",
)
data_profile(
    df_imms_all_periods_activity_cost_d7,
    "Data Profile: IMMS Litter Collection - Total Cost (D7)",
)
data_profile(
    df_imms_all_periods_activity_cost_d8,
    "Data Profile: IMMS Litter Collection - Total Cost (D8)",
)
data_profile(
    df_imms_all_periods_activity_cost_d9,
    "Data Profile: IMMS Litter Collection - Total Cost (D9)",
)
data_profile(
    df_imms_all_periods_activity_cost_d10,
    "Data Profile: IMMS Litter Collection - Total Cost (D10)",
)
data_profile(
    df_imms_all_periods_activity_cost_d11,
    "Data Profile: IMMS Litter Collection - Total Cost (D11)",
)
data_profile(
    df_imms_all_periods_activity_cost_d12,
    "Data Profile: IMMS Litter Collection - Total Cost (D12)",
)
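
The twelve per-district blocks in the cell above differ only in the district number. As a rough sketch, and not the notebook's actual method, the same tables could be built in one pass by grouping on "Resp. District" and storing each result in a dictionary. The toy data, the unit IDs, the activity labels and the name `cost_by_district` are hypothetical; the column names are taken from this notebook.

```python
import pandas as pd

# hypothetical toy data standing in for df_imms_all_periods
df_demo = pd.DataFrame(
    {
        "Resp. District": [1, 1, 2, 2, 2],
        "IMMS Unit ID": ["01-A", "01-A", "02-B", "02-B", "02-C"],
        "Activity Description": [
            "Litter Control",
            "Sweeping",
            "Litter Control",
            "Litter Control",
            "Sweeping",
        ],
        "Total Cost": [100.0, 50.0, 20.0, 30.0, 70.0],
    }
)

group_cols = ["IMMS Unit ID", "Activity Description"]

# one aggregated cost table per district, keyed by district number
cost_by_district = {
    district: (
        df_district.groupby(group_cols)["Total Cost"]
        .sum()
        .unstack("Activity Description")
    )
    for district, df_district in df_demo.groupby("Resp. District")
}
```

Here `cost_by_district[1]` plays the role of `df_imms_all_periods_activity_cost_d1`. Note that a district with no rows simply has no key in the dictionary, whereas the explicit filters above would produce an empty dataframe for it.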

# Do I need to update this section? (CSRs)

In [None]:
# # 05.02.03 - data analysis (csr)

# # format data for plotting



# # NS, this next section was used to replace the previous section to get past an error
# df_csr_2023a_bar_d1 = df_csr_2023a[
#     df_csr_2023a["Responsible District"].astype(int) == 1
# ]
# df_csr_2023a_bar_d2 = df_csr_2023a[
#     df_csr_2023a["Responsible District"].astype(int) == 2
# ]
# df_csr_2023a_bar_d3 = df_csr_2023a[
#     df_csr_2023a["Responsible District"].astype(int) == 3
# ]
# df_csr_2023a_bar_d4 = df_csr_2023a[
#     df_csr_2023a["Responsible District"].astype(int) == 4
# ]
# df_csr_2023a_bar_d5 = df_csr_2023a[
#     df_csr_2023a["Responsible District"].astype(int) == 5
# ]
# df_csr_2023a_bar_d6 = df_csr_2023a[
#     df_csr_2023a["Responsible District"].astype(int) == 6
# ]
# df_csr_2023a_bar_d7 = df_csr_2023a[
#     df_csr_2023a["Responsible District"].astype(int) == 7
# ]
# df_csr_2023a_bar_d8 = df_csr_2023a[
#     df_csr_2023a["Responsible District"].astype(int) == 8
# ]
# df_csr_2023a_bar_d9 = df_csr_2023a[
#     df_csr_2023a["Responsible District"].astype(int) == 9
# ]
# df_csr_2023a_bar_d10 = df_csr_2023a[
#     df_csr_2023a["Responsible District"].astype(int) == 10
# ]
# df_csr_2023a_bar_d11 = df_csr_2023a[
#     df_csr_2023a["Responsible District"].astype(int) == 11
# ]
# df_csr_2023a_bar_d12 = df_csr_2023a[
#     df_csr_2023a["Responsible District"].astype(int) == 12
# ]







# # concatenate all time periods for analysis
# df_csr_all_periods = pd.concat(
#     [
#         df_csr_2021b,
#         df_csr_2022a,
#         df_csr_2022b,
#         df_csr_2023a,
#         df_csr_2023b,
#         df_csr_2024a,
#     ],
#     axis=0,
# )
# data_profile(df_csr_all_periods, "Data Profile: CSR - All Periods")

# # subset all periods by district
# df_csr_all_periods_d1 = df_csr_all_periods[
#     (df_csr_all_periods["Responsible District"] == 1)
# ]
# df_csr_all_periods_d2 = df_csr_all_periods[
#     (df_csr_all_periods["Responsible District"] == 2)
# ]
# df_csr_all_periods_d3 = df_csr_all_periods[
#     (df_csr_all_periods["Responsible District"] == 3)
# ]
# df_csr_all_periods_d4 = df_csr_all_periods[
#     (df_csr_all_periods["Responsible District"] == 4)
# ]
# df_csr_all_periods_d5 = df_csr_all_periods[
#     (df_csr_all_periods["Responsible District"] == 5)
# ]
# df_csr_all_periods_d6 = df_csr_all_periods[
#     (df_csr_all_periods["Responsible District"] == 6)
# ]
# df_csr_all_periods_d7 = df_csr_all_periods[
#     (df_csr_all_periods["Responsible District"] == 7)
# ]
# df_csr_all_periods_d8 = df_csr_all_periods[
#     (df_csr_all_periods["Responsible District"] == 8)
# ]
# df_csr_all_periods_d9 = df_csr_all_periods[
#     (df_csr_all_periods["Responsible District"] == 9)
# ]
# df_csr_all_periods_d10 = df_csr_all_periods[
#     (df_csr_all_periods["Responsible District"] == 10)
# ]
# df_csr_all_periods_d11 = df_csr_all_periods[
#     (df_csr_all_periods["Responsible District"] == 11)
# ]
# df_csr_all_periods_d12 = df_csr_all_periods[
#     (df_csr_all_periods["Responsible District"] == 12)
# ]

# # aggregate total count based on dist/county/route
# # https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby
# df_csr_all_periods_route_count_d1 = (
#     df_csr_all_periods_d1.groupby(["County", "Route"]).size().unstack("Route")
# )
# df_csr_all_periods_route_count_d2 = (
#     df_csr_all_periods_d2.groupby(["County", "Route"]).size().unstack("Route")
# )
# df_csr_all_periods_route_count_d3 = (
#     df_csr_all_periods_d3.groupby(["County", "Route"]).size().unstack("Route")
# )
# df_csr_all_periods_route_count_d4 = (
#     df_csr_all_periods_d4.groupby(["County", "Route"]).size().unstack("Route")
# )
# df_csr_all_periods_route_count_d5 = (
#     df_csr_all_periods_d5.groupby(["County", "Route"]).size().unstack("Route")
# )
# df_csr_all_periods_route_count_d6 = (
#     df_csr_all_periods_d6.groupby(["County", "Route"]).size().unstack("Route")
# )
# df_csr_all_periods_route_count_d7 = (
#     df_csr_all_periods_d7.groupby(["County", "Route"]).size().unstack("Route")
# )
# df_csr_all_periods_route_count_d8 = (
#     df_csr_all_periods_d8.groupby(["County", "Route"]).size().unstack("Route")
# )
# df_csr_all_periods_route_count_d9 = (
#     df_csr_all_periods_d9.groupby(["County", "Route"]).size().unstack("Route")
# )
# df_csr_all_periods_route_count_d10 = (
#     df_csr_all_periods_d10.groupby(["County", "Route"]).size().unstack("Route")
# )
# df_csr_all_periods_route_count_d11 = (
#     df_csr_all_periods_d11.groupby(["County", "Route"]).size().unstack("Route")
# )
# df_csr_all_periods_route_count_d12 = (
#     df_csr_all_periods_d12.groupby(["County", "Route"]).size().unstack("Route")
# )

# # preview results
# data_profile(
#     df_csr_all_periods_route_count_d1, "Data Profile: CSR - Count by Route (D1)"
# )
# data_profile(
#     df_csr_all_periods_route_count_d2, "Data Profile: CSR - Count by Route (D2)"
# )
# data_profile(
#     df_csr_all_periods_route_count_d3, "Data Profile: CSR - Count by Route (D3)"
# )
# data_profile(
#     df_csr_all_periods_route_count_d4, "Data Profile: CSR - Count by Route (D4)"
# )
# data_profile(
#     df_csr_all_periods_route_count_d5, "Data Profile: CSR - Count by Route (D5)"
# )
# data_profile(
#     df_csr_all_periods_route_count_d6, "Data Profile: CSR - Count by Route (D6)"
# )
# data_profile(
#     df_csr_all_periods_route_count_d7, "Data Profile: CSR - Count by Route (D7)"
# )
# data_profile(
#     df_csr_all_periods_route_count_d8, "Data Profile: CSR - Count by Route (D8)"
# )
# data_profile(
#     df_csr_all_periods_route_count_d9, "Data Profile: CSR - Count by Route (D9)"
# )
# data_profile(
#     df_csr_all_periods_route_count_d10, "Data Profile: CSR - Count by Route (D10)"
# )
# data_profile(
#     df_csr_all_periods_route_count_d11, "Data Profile: CSR - Count by Route (D11)"
# )
# data_profile(
#     df_csr_all_periods_route_count_d12, "Data Profile: CSR - Count by Route (D12)"
# )

In [None]:
# # 05.07.01 - plot work activity counts by hotspot corridors

# # note: dim convention = x, y
# sns.set_theme(rc={"figure.figsize": (20, 10)})
# # create stacked bar chart based on work activity count
# # https://stackoverflow.com/questions/67805611/stacked-bar-plot-in-seaborn-with-groups
# df_imms_hotspot_all_activity_count.plot.bar(stacked=True)

In [None]:
# # 05.07.13 - plot work activity counts by litter hotspot

# # note: dim convention = x, y
# sns.set_theme(rc={"figure.figsize": (20, 10)})
# # create stacked bar chart based on work activity count
# # https://stackoverflow.com/questions/67805611/stacked-bar-plot-in-seaborn-with-groups
# df_imms_hotspot_all_activity_count.plot.bar(stacked=True)

In [None]:
# 05.08.13 - plot work activity totals by hotspot corridors

# note: dim convention = x, y
sns.set_theme(rc={"figure.figsize": (20, 10)})
# create stacked bar chart based on work activity sum
# https://stackoverflow.com/questions/67805611/stacked-bar-plot-in-seaborn-with-groups
df_imms_hotspot_all_activity_sum.plot.bar(stacked=True)

In [None]:
# 05.09.13 - plot work activity cost by hotspot corridors

# note: dim convention = x, y
sns.set_theme(rc={"figure.figsize": (20, 10)})
# create stacked bar chart based on work activity cost
# https://stackoverflow.com/questions/67805611/stacked-bar-plot-in-seaborn-with-groups
df_imms_hotspot_all_activity_cost.plot.bar(stacked=True)

In [None]:
# 05.09.14 - plot labor totals by hotspot corridors

# note: dim convention = x, y
sns.set_theme(rc={"figure.figsize": (20, 10)})
# create stacked bar chart based on work activity labor
# https://stackoverflow.com/questions/67805611/stacked-bar-plot-in-seaborn-with-groups
df_imms_hotspot_all_activity_labor.plot.bar(stacked=True)

In [None]:
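# imports repeated here so this cell runs on its own (harmless if already imported above)
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.backends.backend_pdf import PdfPages


def export_visualizations_to_pdf(df1, df2, df3, filename="work_activity_visuals.pdf"):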
    """
    Create three stacked bar plots from input DataFrames and save them into a single PDF.

    Parameters:
        df1 (pd.DataFrame): Data for total work activity
        df2 (pd.DataFrame): Data for labor totals
        df3 (pd.DataFrame): Data for cost breakdown
        filename (str): Output PDF filename
    """
    sns.set_theme(rc={"figure.figsize": (20, 10)})

    with PdfPages(filename) as pdf:
        # Visualization 1
        fig1, ax1 = plt.subplots()
        df1.plot.bar(stacked=True, ax=ax1)
        ax1.set_title("Total Work Activity by Hotspot Corridors")
        ax1.set_ylabel("Work Activity Total")
        ax1.set_xlabel("Hotspot Corridors")
        pdf.savefig(fig1)
        plt.close(fig1)

        # Visualization 2
        fig2, ax2 = plt.subplots()
        df2.plot.bar(stacked=True, ax=ax2)
        ax2.set_title("Labor Totals by Hotspot Corridors")
        ax2.set_ylabel("Total Labor Cost")
        ax2.set_xlabel("Hotspot Corridors")
        pdf.savefig(fig2)
        plt.close(fig2)

        # Visualization 3
        fig3, ax3 = plt.subplots()
        df3.plot.bar(stacked=True, ax=ax3)
        ax3.set_title("Work Activity Cost by Hotspot Corridors")
        ax3.set_ylabel("Activity Cost")
        ax3.set_xlabel("Hotspot Corridors")
        pdf.savefig(fig3)
        plt.close(fig3)

    print(f"PDF saved successfully as: {filename}")

In [None]:
export_visualizations_to_pdf(
    df_imms_hotspot_all_activity_sum,
    df_imms_hotspot_all_activity_labor,
    df_imms_hotspot_all_activity_cost,
    filename="hotspot_work_summary.pdf",
)
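
The export above repeats the same plotting steps three times. As a minimal sketch of that pattern on made-up data, the loop below writes one stacked bar chart per page and reads back the page count as a quick sanity check. The helper name `export_stacked_bars`, the toy DataFrame, and the output path are hypothetical and are not part of the capstone dataset or this notebook's pipeline.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages


def export_stacked_bars(charts, filename):
    """Write one stacked bar chart per (df, title, ylabel) tuple; return the page count."""
    with PdfPages(filename) as pdf:
        for df, title, ylabel in charts:
            fig, ax = plt.subplots(figsize=(20, 10))
            df.plot.bar(stacked=True, ax=ax)
            ax.set_title(title)
            ax.set_ylabel(ylabel)
            ax.set_xlabel("Hotspot Corridors")
            pdf.savefig(fig)
            plt.close(fig)  # release the figure once its page is written
        # read the count while the PDF is still open
        return pdf.get_pagecount()


# hypothetical toy data: rows are corridors, columns are work activities
toy = pd.DataFrame(
    {"Activity A": [3, 1], "Activity B": [2, 4]},
    index=["Corridor 1", "Corridor 2"],
)

out_path = os.path.join(tempfile.mkdtemp(), "toy_hotspot_summary.pdf")
page_count = export_stacked_bars(
    [
        (toy, "Total Work Activity by Hotspot Corridors", "Work Activity Total"),
        (toy, "Labor Totals by Hotspot Corridors", "Total Labor Cost"),
        (toy, "Work Activity Cost by Hotspot Corridors", "Activity Cost"),
    ],
    out_path,
)
print(f"pages written: {page_count}")
```

Passing the charts as a list keeps titles and axis labels next to the data they describe, so adding or dropping a visual is a one-line change.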