# Environmental Pivot
* A notebook for the Division of Maintenance/Environmental Analysis to monitor litter volumes and costs.

* This notebook concatenates IMMS LEMO data (multiple datasets, one per fiscal year) with manually entered data (one dataset covering multiple years).

## Data

### IMMS LEMO
IMMS Detailed LEMO Download Instructions
* Download updated IMMS raw data by navigating to MOMS through Enterprise Web Applications:  
    * Welcome to MOMS | Maintenance & Operations Management Solution (MOMS)  
* Click on IMMS -> Oracle Business Intelligence Reports-> IMMS Reports  
* Navigate to the work management tab -> LEMO tab  
* Select your district  
* Copy the trash related activities listed below and insert into the Activity field box.  
    * D40051;D40151;D40050;D40150;D30051;D30050;D41050;D41051;D42051;D42050;D44050;D44051;D43051;D45050;C50010;C50150;C60010;C60050;C60220;F20020;F20050;F70020;F70050;F70110  
* Enter the work order date range for the relevant FY, covering only dates not already entered into the workbook  
* Under select view -> Click on District HR-PU-LEMO Detailed  
* Export/Download to .CSV file  
* Open the newly exported .CSV file  

#### Activity Descriptions
* Family C  
    * C50010 - Repair/Replace Ditch/Channel  
    * C50150 - Clean Ditch/Channel  
    * C60010 - Repair/Replace Drainage  
    * C60220 - Drainage Inspection  
* Family D  
    * D30050 - Sweep Highway/Shoulder  
    * D30051 - Clean California Sweeping  
    * D40050 - Litter Control  
    * D40051 - Clean California Litter Control  
    * D40150 - Road Patrol/Debris Pickup  
    * D40151 - Clean California Road Patrol/Debris Pickup  
    * D41050 - Adopt-A-Highway Litter Control  
    * D41051 - Clean California Adopt-A-Highway Litter Control  
    * D42050 - Unsheltered Encampment - Cleaning and Removal  
    * D42051 - Clean California Encampment Litter/Debris Removal  
    * D43051 - Clean California Dump Days  
    * D44050 - Special Programs People (SPP) Litter Control  
    * D44051 - Clean California Special Programs People (SPP) Litter Control  
    * D45050 - Illegal Dumping Debris Removal  
* Family F  
    * F20020 - Drain Inlet Inspection  
    * F20050 - Drain Cleaning  
    * F70020 - Treatment BMP Inspection  
    * F70050 - Clean/Mow Treatment BMP  
    * F70110 - Repair of Treatment BMP  
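
As a quick consistency check, the codes pasted into the Activity field can be compared against the codes described above. The sketch below is not part of the notebook's pipeline: `ACTIVITY_FILTER` is copied from the download instructions, `DESCRIBED_CODES` is typed by hand from the description list, and the family is assumed to be the first letter of each code.

```python
from collections import Counter

# Activity filter string copied from the download instructions.
ACTIVITY_FILTER = (
    "D40051;D40151;D40050;D40150;D30051;D30050;D41050;D41051;D42051;D42050;"
    "D44050;D44051;D43051;D45050;C50010;C50150;C60010;C60050;C60220;"
    "F20020;F20050;F70020;F70050;F70110"
)

# Codes that have an entry in the description list (typed by hand for this sketch).
DESCRIBED_CODES = {
    "C50010", "C50150", "C60010", "C60220",
    "D30050", "D30051", "D40050", "D40051", "D40150", "D40151",
    "D41050", "D41051", "D42050", "D42051", "D43051",
    "D44050", "D44051", "D45050",
    "F20020", "F20050", "F70020", "F70050", "F70110",
}

# Split the filter string into individual codes, ignoring empty pieces.
requested = [code.strip() for code in ACTIVITY_FILTER.split(";") if code.strip()]

# Assumption: the activity family is the first letter of the code.
family_counts = Counter(code[0] for code in requested)

# Codes requested from IMMS but not described, and the reverse.
missing_descriptions = sorted(set(requested) - DESCRIBED_CODES)
described_not_requested = sorted(DESCRIBED_CODES - set(requested))

print(family_counts)
print("Requested but not described:", missing_descriptions)
print("Described but not requested:", described_not_requested)
```

With the lists as written here, `C60050` appears in the Activity filter but has no entry under Family C.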
  
  
### Manually Entered Data  
Data contacts: SPP (Olivia Liggins), Encampments (contact name not recorded)  
  
#### Activity Descriptions  
* Special Programs  
    * Haz Mat Encampment Contract  
    * SPP (SHA & CC) Contract  



In [1]:
# 01.00 import modules
import pandas as pd
import os
import gcsfs
import re
import warnings

In [2]:
# 02.00 Identify the path to the Data
gcs_path_lemo = (
    "gs://calitp-analytics-data/data-analyses/big_data/environmental_pivot/1_imms_lemo/"
)
gcs_path_supplemental_data = (
    "gs://calitp-analytics-data/data-analyses/big_data/environmental_pivot/2_supplemental_data/"
)


# 02.01 Identify the file names
imms_lemo_file_names = [
    "lemo_fy18_final.csv",
    "lemo_fy19_final.csv",
    "lemo_fy20_final.csv",
    "lemo_fy21_final.csv",
    "lemo_fy22_final.csv",
    "lemo_fy23_final.csv",
    "lemo_fy24_final.csv",
    "lemo_fy25_final.csv",
    # Update this file name whenever new data is downloaded and should be included in the analysis
    "lemo_fy26_20250826.csv",
]

supplemental_data_file_names = ["encampments.csv",
                                "spp_crew_expenditures.csv"]



In [3]:
def clean_column_names(columns):
    """
    Clean column names by lowercasing, removing spaces and punctuation (e.g., '.').
    """
    cleaned = []
    for col in columns:
        col = col.lower()                # make lowercase
        col = re.sub(r'[^\w]', '', col)  # remove punctuation and spaces
        cleaned.append(col)
    return cleaned

def load_and_concat_csvs_from_gcs(gcs_path, file_names):
    """
    Load multiple CSV files from a GCS path, clean column names, and concatenate into a single DataFrame.

    Args:
        gcs_path (str): The base GCS path (e.g., "gs://bucket/folder")
        file_names (list of str): List of CSV file names to load

    Returns:
        pd.DataFrame: A single concatenated DataFrame with cleaned columns
    """
    fs = gcsfs.GCSFileSystem()
    dataframes = []

    for file_name in file_names:
        full_path = f"{gcs_path.rstrip('/')}/{file_name}"
        try:
            # Open the file in a context manager so the handle is closed after reading
            with fs.open(full_path, 'rb') as f:
                df = pd.read_csv(f)

            # Clean column names
            df.columns = clean_column_names(df.columns)

            # # Optional: track the source file
            # df['sourcefile'] = file_name

            dataframes.append(df)
            print(f"Loaded and cleaned: {file_name}")
        except Exception as e:
            print(f"Error loading {file_name}: {e}")

    if dataframes:
        combined_df = pd.concat(dataframes, ignore_index=True)
        return combined_df
    else:
        print("No files were loaded.")
        return pd.DataFrame()
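
To make the effect of `clean_column_names` concrete, the sketch below runs the same logic on a few example headers. The header strings are hypothetical, chosen only for illustration; the real IMMS and supplemental headers may differ.

```python
import re


def clean_column_names(columns):
    """Same logic as the notebook helper: lowercase, then drop non-word characters."""
    return [re.sub(r"[^\w]", "", col.lower()) for col in columns]


# Hypothetical raw headers, for illustration only.
raw_headers = ["Resp. District", "fiscal_year", "Total Cost"]
cleaned = clean_column_names(raw_headers)
print(cleaned)  # ['respdistrict', 'fiscal_year', 'totalcost']
```

Underscores count as word characters in `\w`, so they survive cleaning. That is likely why the supplemental dataset's `fiscal_year` column still needs an explicit rename to `fiscalyear` before the two datasets are combined.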

In [4]:
# This line will load all of the CSV files that are identified in the "imms_lemo_file_names" list
df_lemo = load_and_concat_csvs_from_gcs(gcs_path_lemo, imms_lemo_file_names)



Loaded and cleaned: lemo_fy18_final.csv
Loaded and cleaned: lemo_fy19_final.csv
Loaded and cleaned: lemo_fy20_final.csv
Loaded and cleaned: lemo_fy21_final.csv
Loaded and cleaned: lemo_fy22_final.csv
Loaded and cleaned: lemo_fy23_final.csv
Loaded and cleaned: lemo_fy24_final.csv
Loaded and cleaned: lemo_fy25_final.csv
Error loading lemo_fy26_20250826.csv: b/calitp-analytics-data/o/data-analyses%2Fbig_data%2Fenvironmental_pivot%2F1_imms_lemo%2Flemo_fy26_20250826.csv
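
The error for `lemo_fy26_20250826.csv` prints only an object path, which suggests the file may not exist at that location. One way to catch this before loading is to check each path first. The sketch below assumes a hypothetical helper, `find_missing_files`, that accepts any filesystem object with an `exists(path)` method (`gcsfs.GCSFileSystem` provides one). `FakeFileSystem` and the `example-bucket` path are stand-ins so the example runs offline.

```python
def find_missing_files(fs, base_path, file_names):
    """Return the file names that do not exist under base_path.

    `fs` is any object with an `exists(path)` method,
    for example gcsfs.GCSFileSystem().
    """
    base = base_path.rstrip("/")
    return [name for name in file_names if not fs.exists(f"{base}/{name}")]


class FakeFileSystem:
    """Offline stand-in for a real filesystem, for illustration only."""

    def __init__(self, existing_paths):
        self._existing = set(existing_paths)

    def exists(self, path):
        return path in self._existing


# Hypothetical bucket contents: only the FY25 file is present.
fs = FakeFileSystem({"gs://example-bucket/lemo/lemo_fy25_final.csv"})

missing = find_missing_files(
    fs,
    "gs://example-bucket/lemo/",
    ["lemo_fy25_final.csv", "lemo_fy26_20250826.csv"],
)
print("Missing files:", missing)
```

In the notebook, the same call could take the real `gcsfs.GCSFileSystem()` instance together with `gcs_path_lemo` and `imms_lemo_file_names`, so a stale file name is flagged before the load step.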


In [5]:
# This line will load all of the CSV files that are identified in the "supplemental_data_file_names" list
df_sup = load_and_concat_csvs_from_gcs(gcs_path_supplemental_data, supplemental_data_file_names)

Loaded and cleaned: encampments.csv
Loaded and cleaned: spp_crew_expenditures.csv


In [6]:
# update the name of the 'resp. district' column to 'district' to match the Supplemental dataset
df_lemo = df_lemo.rename(columns={'respdistrict': 'district'})

In [7]:
# Keep only the columns needed from the Supplemental dataset
df_sup = df_sup[['district', 'fiscal_year', 'activity', 'activitydescription', 'totalcost']]

In [8]:
# update the name of the 'fiscal_year' column to 'fiscalyear' to match the LEMO dataset
df_sup = df_sup.rename(columns={'fiscal_year': 'fiscalyear'})

In [9]:
# Stack the two datasets; LEMO has all columns, so columns missing from the Supplemental dataset are filled with NaN
df_combined = pd.concat([df_lemo, df_sup], ignore_index=True)
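
The sketch below shows how `pd.concat` behaves when one frame has fewer columns than the other. The two miniature frames are made up, and `workorder` is a hypothetical LEMO-only column used purely for illustration.

```python
import pandas as pd

# Made-up miniature frames; 'workorder' is a hypothetical LEMO-only column.
lemo = pd.DataFrame(
    {
        "district": [1],
        "fiscalyear": [2024],
        "totalcost": [100.0],
        "workorder": ["A1"],
    }
)
sup = pd.DataFrame({"district": [2], "fiscalyear": [2024], "totalcost": [50.0]})

# Rows are stacked; columns are the union of both frames.
combined = pd.concat([lemo, sup], ignore_index=True)
print(combined)

# The supplemental row has no 'workorder', so that cell is NaN.
print(combined["workorder"].isna().sum())

# The combined row count should equal the sum of the inputs.
assert len(combined) == len(lemo) + len(sup)
```

The same row-count check can be applied to `df_combined`, `df_lemo`, and `df_sup` to confirm that no rows were dropped or duplicated during concatenation.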

In [10]:
# file_path is the name of the output .csv file (include the .csv extension)
file_path = "environmental_litter_pivot.csv"

# # Save to CSV (create folder 'output' if needed)
# df_combined.to_csv(file_path, index=False)

In [11]:
# 02.02 Identify the path to the output data
gcs_output_folder = "gs://calitp-analytics-data/data-analyses/big_data/environmental_pivot/3_python_output"

In [12]:
df_combined.to_csv(f"{gcs_output_folder}/{file_path}", index=False)

In [13]:
# Shape of the combined DataFrame as (rows, columns)
df_combined.shape

(1111785, 35)