# **Predicting Seattle Resident's Customer Requests**

The City of Seattle has collected extensive data on resident service requests through its customer service portals over many years. These requests cover a wide range of issues, from pothole repairs to unauthorized encampments, providing a valuable opportunity to understand public service demand patterns over time.

In this project, we focus on forecasting the number of service requests expected over the next three months (May, June, and July 2025). Our goal is to build predictive models that can accurately estimate future service volumes for each Service Request Type, as well as extend to predictions at the Department level and for specific ZIP Code and Service Request Type combinations.

To achieve this, we preprocess the data by aggregating historical service requests at a monthly level. We engineer time series features such as Lag (previous month's request counts) and Rolling Statistics (rolling mean and standard deviation) to give our models "memory" of past trends and fluctuations. These features help capture seasonality, stability, and short-term changes in service request volumes.

We evaluate four models: Linear Regression, Random Forest, LightGBM, and XGBoost. Each model is globally tuned before applying 5-fold time-based cross-validation to fairly compare their performance across service types. Final model selection for each service is based on minimizing the Mean Absolute Percentage Error (MAPE) to ensure robust forecasting even across diverse and unpredictable service categories.

Our final predictions aim to equip city agencies with actionable insights, allowing them to allocate resources more effectively, anticipate resident needs, and plan operational efforts during the upcoming months.

### Install All Relevant Dependencies

In [332]:
!pip install scikit-learn
!pip install lightgbm xgboost




[notice] A new release of pip is available: 25.0.1 -> 25.1
[notice] To update, run: python.exe -m pip install --upgrade pip





[notice] A new release of pip is available: 25.0.1 -> 25.1
[notice] To update, run: python.exe -m pip install --upgrade pip


### Import All Relevant Libraries

In [333]:
# General
import numpy as np
import pandas as pd

# Models
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from lightgbm import LGBMRegressor
from xgboost import XGBRegressor

# Cross-Validation
from sklearn.model_selection import KFold
import time

# Model Tuning
from sklearn.model_selection import RandomizedSearchCV, KFold
from scipy.stats import randint, uniform


In [334]:
df = pd.read_csv("Customer_Service_Requests_20250426.csv", low_memory=False)
df.sample(5)

Unnamed: 0,Service Request Number,Service Request Type,City Department,Created Date,Method Received,Status,Location,X_Value,Y_Value,Latitude,Longitude,Latitude/Longitude,ZIP Code,Council District,Police Precinct,Neighborhood
792864,24-00153441,Illegal Dumping / Needles,SPU-Seattle Public Utilities,06/02/2024 08:23:57 PM,Find It Fix It Apps,Closed,"1000 E DENNY WAY, SEATTLE, WA 98122",1273956.0,229293.304329,47.618898,-122.31932,POINT (-122.31932017 47.61889806),98122,3.0,EAST,BROADWAY
508623,23-00114836,Abandoned Vehicle/72hr Parking Ordinance,SPD-Seattle Police Department,05/15/2023 05:23:19 PM,Find It Fix It Apps,Closed,"4513 EVANSTON AVE N, SEATTLE, WA 98103",1266280.0,245099.249362,47.661811,-122.351706,POINT (-122.35170565 47.66181082),98103,6.0,NORTH,FREMONT
146440,21-00254411,Parking Enforcement,SPD-Seattle Police Department,11/23/2021 09:12:25 AM,Find It Fix It Apps,Closed,"326 N 82ND ST, SEATTLE, WA 98103",1265905.0,254873.266213,47.688581,-122.354009,POINT (-122.35400868 47.68858064),98103,6.0,NORTH,GREENWOOD
553345,23-00169131,Illegal Dumping / Needles,SPU-Seattle Public Utilities,07/13/2023 07:11:30 AM,Find It Fix It Apps,Closed,"2636 1ST AVE N, SEATTLE, WA 98109",1265257.0,238749.686654,47.644352,-122.355344,POINT (-122.35534413 47.64435161),98109,7.0,WEST,NORTH QUEEN ANNE
203597,22-00056196,Pothole,SDOT-Seattle Department of Transportation,03/06/2022 04:25:21 PM,Citizen Web Intake App,Reported,"1220 10TH AVE E, SEATTLE, WA 98102",1273916.0,233758.370787,47.631135,-122.319832,POINT (-122.31983209 47.63113459),98102,3.0,EAST,BROADWAY


### Pinpoint Null Values and Remove Them

In [335]:
max_na = df.isna().sum().max()
total_count = df.shape[0]
percent_missing = (max_na / total_count) * 100

print(f"There are a total of {total_count} observations with {max_na} observations that has at least one feature with missing data.", end="\n")
print(f"{percent_missing:.2f}% of the data would be removed if we were to account for all features with missing data.")

There are a total of 1077316 observations with 43129 observations that has at least one feature with missing data.
4.00% of the data would be removed if we were to account for all features with missing data.


In [336]:
df.isna().sum()

Service Request Number        0
Service Request Type          0
City Department               0
Created Date                  0
Method Received               0
Status                        0
Location                  17216
X_Value                       0
Y_Value                       0
Latitude                  24161
Longitude                 24161
Latitude/Longitude        24161
ZIP Code                  43129
Council District          34316
Police Precinct           32785
Neighborhood              34373
dtype: int64

In [337]:
df = df.dropna()

### Source and Create Relevant Variables

In [338]:
df['Created Date'] = pd.to_datetime(df['Created Date'])

df['Year'] = df['Created Date'].dt.year
df['Month'] = df['Created Date'].dt.month

df.sample(5)

Unnamed: 0,Service Request Number,Service Request Type,City Department,Created Date,Method Received,Status,Location,X_Value,Y_Value,Latitude,Longitude,Latitude/Longitude,ZIP Code,Council District,Police Precinct,Neighborhood,Year,Month
514361,23-00122150,Unauthorized Encampment,SEA-City of Seattle,2023-05-23 09:44:47,Find It Fix It Apps,Closed,"2011 RAINIER AVE S, SEATTLE, WA 98144",1277575.0,216745.001169,47.584694,-122.303679,POINT (-122.30367936 47.58469388),98144,2.0,SOUTH,NORTH BEACON HILL,2023,5
725315,24-00056920,Unauthorized Encampment,SEA-City of Seattle,2024-03-07 17:46:28,Find It Fix It Apps,Closed,"920 STURGUS AVE S, SEATTLE, WA 98144",1274902.0,220397.029935,47.594564,-122.314792,POINT (-122.31479215 47.59456353),98144,2.0,SOUTH,NORTH BEACON HILL,2024,3
509762,23-00116252,Overgrown Vegetation,FAS-Finance and Administrative Services,2023-05-17 04:54:35,Find It Fix It Apps,Closed,"1601 SW AUSTIN ST, SEATTLE, WA 98106",1264430.0,198590.790226,47.534233,-122.355479,POINT (-122.35547878 47.53423305),98106,1.0,SOUTHWEST,SOUTH DELRIDGE,2023,5
1030962,25-00064223,Unauthorized Encampment,SEA-City of Seattle,2025-03-06 16:18:07,Find It Fix It Apps,Closed,"1820 SW HENDERSON ST, SEATTLE, WA 98106",1263578.0,194518.44038,47.523025,-122.358598,POINT (-122.35859781 47.52302472),98106,1.0,SOUTHWEST,SOUTH DELRIDGE,2025,3
774939,24-00117163,Street Sign Maintenance,SDOT-Seattle Department of Transportation,2024-05-11 10:34:05,Find It Fix It Apps,Reported,"1213 2ND AVE, SEATTLE, WA 98101",1269511.0,224924.238743,47.606686,-122.336996,POINT (-122.33699634 47.60668585),98101,7.0,WEST,CENTRAL BUSINESS DISTRICT,2024,5


## Predicting Total Service Requests in the next 3 Months: Service Request Type

### Preprocessing

In [339]:
Service_Type = df.groupby(['Service Request Type', 'Year', 'Month'])['Service Request Number'].count().reset_index()
Service_Type.rename(columns={'Service Request Number': 'Request Count'}, inplace=True)

Service_Type.sample(10)

Unnamed: 0,Service Request Type,Year,Month,Request Count
338,Dead Animal,2022,5,124
1082,Parks and Recreation Maintenance,2021,8,349
1495,Streetlight Maintenance,2022,3,326
1635,Traffic Calming,2021,3,5
879,Internet/Cable Issue,2022,2,35
76,Abandoned Vehicle/72hr Parking Ordinance,2023,3,4715
1488,Streetlight Maintenance,2021,8,257
1195,Pothole,2022,7,460
680,General Inquiry - Public Utilities,2022,11,131
230,Clogged Storm Drain,2022,1,226


### Lag and Rolling

In time series forecasting, what happens in the past will often impact what happens in the future.

**LAG**: This serves as a value from a previous time step that helps the model remember recent trends.

**ROLLING**: This serves to help smooth out the statistics by suppressing sudden ruptures of noises in order reveal the general trend.

In [340]:
Service_Type = Service_Type.sort_values(['Year', 'Month', 'Service Request Type']).reset_index(drop=True)

Service_Type['lag'] = Service_Type['Request Count'].shift(1)

Service_Type['Rolling_Mean'] = Service_Type['Request Count'].rolling(window=3, min_periods=1).mean().reset_index(0, drop=True)
Service_Type['Rolling_Std'] = Service_Type['Request Count'].rolling(window=3, min_periods=1).std().reset_index(0, drop=True)

Service_Type = Service_Type.fillna(0)
Service_Type['lag'] = Service_Type['lag'].astype(int)

Service_Type.sample(10)

Unnamed: 0,Service Request Type,Year,Month,Request Count,lag,Rolling_Mean,Rolling_Std
1556,Public Garage or Parking Lot Complaint,2024,11,5,762,258.666667,435.904041
461,General Inquiry - City Light,2022,4,309,101,165.666667,124.327524
182,Animal Noise,2021,7,63,1,22.666667,34.961884
1228,Towing Complaint - Public Impound,2024,2,6,329,247.0,212.233362
1317,General Inquiry - Animal Shelter,2024,5,62,29,32.0,28.618176
1334,Public Garage or Parking Lot Complaint,2024,5,5,848,285.333333,487.284653
926,Graffiti,2023,6,1855,660,895.0,866.732369
867,Streetlight Maintenance,2023,4,195,785,467.666667,297.525349
946,ADA Request (Transportation),2023,7,5,3599,1311.0,1988.077463
966,Overgrown Vegetation,2023,7,870,49,317.666667,478.393492


### Global Tuning

We will tune hyperparameters on the following candidate models:
- Random Forest
- LightGBM
- XGBoost

Tuning will prepare us to run these models at the best setting during cross-validation.



In [341]:
def tune_model(model, param_dist, X_sample, y_sample, n_iter=10):
    """Tunes a model using RandomizedSearchCV and returns the best estimator."""
    random_search = RandomizedSearchCV(
        model,
        param_distributions=param_dist,
        n_iter=n_iter,
        cv=3,
        scoring='neg_mean_absolute_error',
        random_state=42,
        n_jobs=-1
    )
    random_search.fit(X_sample, y_sample)
    return random_search.best_params_

In [342]:
rf_params = {
    'n_estimators': [100, 200, 300],
    'max_depth': [5, 10, 20, None],
    'min_samples_split': [2, 5, 10]
}

lgbm_params = {
    'num_leaves': [20, 31, 40],
    'learning_rate': [0.01, 0.05, 0.1],
    'n_estimators': [100, 200, 300]
}

xgb_params = {
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 0.2],
    'n_estimators': [100, 200, 300]
}

In [343]:
service_data = Service_Type.drop(columns=['Service Request Type'])
X = service_data.drop(columns=['Request Count'])
y = service_data['Request Count']
X_sample = X.sample(frac=0.1, random_state=42)
y_sample = y.loc[X_sample.index]

In [344]:
best_rf_params = tune_model(RandomForestRegressor(), rf_params, X_sample, y_sample)

best_lgbm_params = tune_model(LGBMRegressor(verbosity=-1), lgbm_params, X_sample, y_sample)

best_xgb_params = tune_model(XGBRegressor(verbosity=0), xgb_params, X_sample, y_sample)

In [345]:
best_params = {
    'Linear Regression': {},
    'Random Forest': best_rf_params,
    'LightGBM': best_lgbm_params,
    'XGBoost': best_xgb_params
}

print("Best Random Forest Parameters:", best_rf_params)
print("Best LightGBM Parameters:", best_lgbm_params)
print("Best XGBoost Parameters:", best_xgb_params)

Best Random Forest Parameters: {'n_estimators': 100, 'min_samples_split': 5, 'max_depth': None}
Best LightGBM Parameters: {'num_leaves': 20, 'n_estimators': 300, 'learning_rate': 0.1}
Best XGBoost Parameters: {'n_estimators': 100, 'max_depth': 7, 'learning_rate': 0.2}


### Model Candidate Competition through Cross-Validation
The four candidate models:
- Linear Regression
- Random Forest
- LightGBM
- XGBoost

We will be evaluating their performances for each service request types based on the overall
average of their magnitude of error (Mean Absolute Error), the average of their magnitude of error
in percentage relative to their proportion of errors from the total (Mean Absolute Percentage Error),
and the total requests reported from that service type (Total Requests). Latency will help us determine
the speed to which the model is training and predicting (not as important but still a useful insight).
We will be assigning the best model with the least MAPE (Mean Absolute Percentage Error) to each service request
type.

In [346]:
service_types = Service_Type['Service Request Type'].unique()

model_df = pd.DataFrame(service_types, columns=['Service Request Type'])
model_df['Model'] = None
model_df['Mean Absolute Error'] = None
model_df['Mean Absolute Percentage Error'] = None
model_df['Total Requests'] = None
model_df['Latency'] = None

model_df.sample(10)

Unnamed: 0,Service Request Type,Model,Mean Absolute Error,Mean Absolute Percentage Error,Total Requests,Latency
31,Towing Complaint - Public Impound,,,,,
21,Pothole,,,,,
16,Internet/Cable Issue,,,,,
12,General Inquiry - Public Utilities,,,,,
19,Parks and Recreation Maintenance,,,,,
15,Illegal Dumping / Needles,,,,,
24,Safe Routes to School,,,,,
41,Traffic Signal Maintenance,,,,,
23,Public Litter and Recycling Cans,,,,,
36,Feedback about the Customer Service Requests P...,,,,,


In [347]:
cv_split = KFold(n_splits=5, shuffle=False)

models = {
    'Linear Regression': LinearRegression(),
    'Random Forest': RandomForestRegressor(**best_params['Random Forest']),
    'LightGBM': LGBMRegressor(**best_params['LightGBM']),
    'XGBoost': XGBRegressor(**best_params['XGBoost'])
}

for service in service_types:
    eval = {model: {'mae': None, 'latency': None} for model in models.keys()}

    service_data = Service_Type[Service_Type['Service Request Type'] == service].drop(columns=['Service Request Type'])
    X = service_data.drop(columns=['Request Count'])
    y = service_data['Request Count']

    if len(X) < 5:
        print(f"Skipping {service}: only {len(X)} samples.")
        continue

    for model_name, model in models.items():

        mean_absolute_errors, mean_absolute_percentage_errors, latencies = [], [], []

        for idx_train, idx_val in cv_split.split(X, y):
            X_train = X.iloc[idx_train]
            X_val = X.iloc[idx_val]
            y_train = y.iloc[idx_train]
            y_val = y.iloc[idx_val]

            start_time = time.time()
            model.fit(X_train, y_train)

            y_pred = model.predict(X_val)
            end_time = time.time()

            latency = end_time - start_time
            mae = np.mean(np.abs(y_val - y_pred))
            mape = np.mean(np.abs(100.0 * (y_val - y_pred) / y_val))
            mean_absolute_errors.append(mae)
            mean_absolute_percentage_errors.append(mape)
            latencies.append(latency)

        eval[model_name]['mae'] = np.mean(mean_absolute_errors)
        eval[model_name]['mape'] = np.mean(mean_absolute_percentage_errors)
        eval[model_name]['latency'] = np.mean(latencies)

    best_model = min(eval, key=lambda x: eval[x]['mape'])

    model_df.loc[model_df['Service Request Type'] == service, 'Model'] = best_model
    model_df.loc[model_df['Service Request Type'] == service, 'Mean Absolute Error'] = eval[best_model]['mae']
    model_df.loc[model_df['Service Request Type'] == service, 'Mean Absolute Percentage Error'] = eval[best_model]['mape']
    model_df.loc[model_df['Service Request Type'] == service, 'Total Requests'] = service_data['Request Count'].sum()
    model_df.loc[model_df['Service Request Type'] == service, 'Latency'] = eval[best_model]['latency']

Skipping Taxi, TNC, or Limousine Complaint or Compliment: only 4 samples.


Notice that some service types like "Snow and Ice" has a huge MAPE! That is usually due to the tiny amount of total requests used for training and validating this model.
You will notice the trend that popular service types tend to have smaller MAPE. But there are also some anomalies or outliers that does not follow this trend. That is because
of the amount of noise in the data which often times, in time series, are unexplainable and has no causality. But generally, it is safe to say that even if certain causes do not contribute to the presence of a phenomenon, larger data always makes our models perform better! 

In [348]:
model_df = model_df.loc[model_df['Model'].notna(), :]
model_df.sample(10)

Unnamed: 0,Service Request Type,Model,Mean Absolute Error,Mean Absolute Percentage Error,Total Requests,Latency
18,Parking Enforcement,Linear Regression,13.484949,0.547262,130806,0.001508
20,Pollution Report Form,LightGBM,1.985,57.307031,281,0.017846
43,Nuisance dogs in a park,Linear Regression,7.32728,5.128536,2171,0.001479
25,Scooter or Bike Share Issue,Random Forest,43.35068,22.126604,19470,0.080279
3,Business Related Complaint,Linear Regression,5.252364,34.664436,861,0.001617
29,Traffic Calming,Linear Regression,2.443948,46.288031,191,0.001556
19,Parks and Recreation Maintenance,Linear Regression,117.301133,120.788051,35326,0.001512
41,Traffic Signal Maintenance,Linear Regression,10.256267,3.460451,6532,0.001645
15,Illegal Dumping / Needles,Linear Regression,53.949032,2.350935,115780,0.001476
38,Found a Pet,Linear Regression,10.140827,67.125973,720,0.00151


### Predicting the Future: 3 Month from Today (4/27/2025)

In [349]:
test_rows = []

for service in model_df['Service Request Type'].unique():
    for month in [5, 6, 7]: 
        row = {
            'Service Request Type': service,
            'Year': 2025,
            'Month': month,
            'Predicted Request Count': None
        }
        test_rows.append(row)

test = pd.DataFrame(test_rows)
test.sample(10)

Unnamed: 0,Service Request Type,Year,Month,Predicted Request Count
17,Clogged Storm Drain,2025,7,
70,Public Litter and Recycling Cans,2025,6,
3,Abandoned Vehicle,2025,5,
82,Streetlight Maintenance,2025,6,
116,Lost a Pet,2025,7,
10,Business Related Complaint,2025,6,
77,Scooter or Bike Share Issue,2025,7,
54,Parking Enforcement,2025,5,
103,Unauthorized Encampment,2025,6,
122,Traffic Signal Maintenance,2025,7,


In [350]:
test_set = test.merge(model_df[['Service Request Type', 'Model']], on='Service Request Type', how='left')

trained_models = {}
trained_features = {}

for service in test_set['Service Request Type'].unique():
    model_name = test_set.loc[test_set['Service Request Type'] == service, 'Model'].values[0]

    service_data = Service_Type[Service_Type['Service Request Type'] == service].drop(columns=['Service Request Type', 'lag', 'Rolling_Mean', 'Rolling_Std'])
    X = service_data.drop(columns=['Request Count'])
    y = service_data['Request Count']

    model = models[model_name]
    model.fit(X, y)

    trained_models[service] = model
    trained_features[service] = X.columns.tolist()

predicted_counts = []

for service in test_set['Service Request Type'].unique():
    model = trained_models[service]
    features = trained_features[service]

    service_rows = test_set[test_set['Service Request Type'] == service]
    X_test = service_rows[features].astype(float)
    
    preds = model.predict(X_test)
    test_set.loc[test_set['Service Request Type'] == service, 'Predicted Request Count'] = preds

test_set.sample(10)


Unnamed: 0,Service Request Type,Year,Month,Predicted Request Count,Model
17,Clogged Storm Drain,2025,7,140.659091,Linear Regression
118,Street Sign Maintenance,2025,6,683.009278,Random Forest
126,Nuisance dogs in a park,2025,5,114.477273,Linear Regression
122,Traffic Signal Maintenance,2025,7,140.659091,Linear Regression
12,Business Violation of Public Health Requirements,2025,5,4.951691,LightGBM
111,Found a Pet,2025,5,114.477273,Linear Regression
51,Overgrown Vegetation,2025,5,43.079842,XGBoost
96,Abandoned Vehicle/72hr Parking Ordinance,2025,5,669.863175,Random Forest
3,Abandoned Vehicle,2025,5,669.863175,Random Forest
15,Clogged Storm Drain,2025,5,114.477273,Linear Regression


In [351]:
final = test_set.groupby(['Service Request Type'])['Predicted Request Count'].sum().reset_index()
final.sample(10)

Unnamed: 0,Service Request Type,Predicted Request Count
35,Street Sign Maintenance,1992.163258
24,Nuisance dogs in a park,382.704545
31,Public Litter and Recycling Cans,382.704545
14,General Inquiry - City Light,382.704545
30,Public Garage or Parking Lot Complaint,382.704545
34,Snow and Ice,129.239525
21,Internet/Cable Issue,1992.163258
6,Business Violation of Public Health Requirements,15.787439
17,General Inquiry - Public Utilities,382.704545
5,Business Related Complaint,382.704545


### Exports for Data Analysis & Visualization

In [None]:
# Service_Type.to_csv('Service_Type.csv', index=False)
# model_df.to_csv('model_summary.csv', index=False)
# test_set.to_csv('predicted_monthly.csv', index=False)
# final.to_csv('predicted_quarterly.csv', index=False)

### Conclusion

Through time series modeling and predictive analysis of Seattle’s historical service request data, we successfully forecasted the expected number of service requests for each Service Request Type over the next three months. By engineering critical features such as lag values and rolling statistics, and by tuning and evaluating multiple machine learning models — including Linear Regression, Random Forest, LightGBM, and XGBoost — we identified the best-performing model for each service category.

Our results showed strong predictive performance on high-volume and stable service types, while naturally encountering greater variability in smaller, more irregular service types. This outcome reflects common patterns in real-world forecasting, where rare or seasonal services present higher prediction challenges.

Overall, the models and forecasts developed in this project provide a data-driven foundation for Seattle’s departments to anticipate resident needs, allocate resources more efficiently, and plan operational strategies for the coming months. Future work could extend this approach by incorporating external factors, such as weather or event calendars, to further refine forecasts for highly seasonal or irregular service types.