In [3]:
## Missed Trash Pickups

##In this data question you will be working with data on service requests related to missed trash pickups from hubNashville, Metro Nashville government's comprehensive customer service system (https://hub.nashville.gov).

##As part of Metro's contract with Red River Waste Solutions, failure to remedy an action or inaction will result in liquidated damages. 
    ##One category of liquidated damages is related to chronic problems in any category of service at the same premises. 
    ##A chronic problem is defined as more than one missed pickup for any address. 
    ##The first missed pickup will not result in a fine; however, every subsequent missed pickup will result in a $200 fine.

##Your job is to determine the total amount of damages due to missed pickups. 
##Note that not all rows that you have been provided correspond to missed pickups and that you will need to ensure that you are only counting missed pickups.

##After determining the total amount of damages, you can look at other questions:


##* What other types of complaints are there?

##* Make a heat map that shows the total missed pickups and another that shows the total fines, each by zip code.

##* How do metro crews compare to the contractor's performance?

##* How much does each trash hauler owe?

##* What was the total number of missed pickups by route?



##Before you begin, explore the data.  Cleaning and preparing the data for analysis is an important and necessary step.  
##Planning and communication are vital to success. This data and analysis are based on a real-world project.
import pandas as pd
import geopandas as gpd
import re
from shapely.geometry import Point
import matplotlib.pyplot as plt
import seaborn as sns
import folium

In [4]:
from folium.plugins import HeatMap

In [5]:
pickup_report = pd.read_csv('data/trash_hauler_report.csv')

In [6]:
# Fill missing hauler and route values with an 'N/A' placeholder
pickup_report['Trash Hauler'] = pickup_report['Trash Hauler'].fillna('N/A')
pickup_report['Trash Route'] = pickup_report['Trash Route'].fillna('N/A')

In [7]:
# Code from Logan to handle zip code disparities: fill missing zips with 0 and store the column as a category
pickup_report['Zip Code'] = pickup_report['Zip Code'].fillna(0).astype('category')
assert pickup_report['Zip Code'].dtype == 'category'

print(pickup_report['Zip Code'].describe())

display(pickup_report['Zip Code'])

count     20226.0
unique       29.0
top       37013.0
freq       2278.0
Name: Zip Code, dtype: float64


0        37207.0
1        37218.0
2        37209.0
3        37207.0
4        37209.0
          ...   
20221    37013.0
20222    37206.0
20223    37214.0
20224    37013.0
20225    37217.0
Name: Zip Code, Length: 20226, dtype: category
Categories (29, float64): [0.0, 37013.0, 37027.0, 37076.0, ..., 37219.0, 37220.0, 37221.0, 37228.0]
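
The categories above print as floats (37207.0) because the missing values force the column to a float dtype before it is categorized. A minimal sketch of one way to get integer zip codes instead, using a small made-up Series in place of pickup_report['Zip Code'] (the values and the name `zips` are illustrative only):

```python
import pandas as pd

# Toy stand-in for pickup_report['Zip Code']; values are illustrative only
zips = pd.Series([37207.0, 37218.0, None, 37209.0, 37207.0], name='Zip Code')

# Fill gaps with 0, cast to int to drop the trailing ".0", then categorize
zips_clean = zips.fillna(0).astype(int).astype('category')

print(zips_clean.cat.categories.tolist())
```

The 0 placeholder is kept as its own category here, matching the cell above. Whether 0 or a label such as 'N/A' is the better placeholder depends on how the zip codes are used later.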

In [8]:
# Clean up column names and normalize hauler names to upper case
pickup_report = pickup_report.rename(columns={'Request ': 'Request'})
pickup_report['Trash Hauler'] = pickup_report['Trash Hauler'].str.upper()

In [9]:
# Keep only the rows whose 'Request' column contains the word 'missed' (case-insensitive)
missedpu = pickup_report[pickup_report['Request'].str.contains(r'\bmissed\b', case=False, na=False, regex=True)]
print(missedpu.head(25))

    Request Number Date Opened                               Request  \
1            25274    11/01/17  Trash - Curbside/Alley Missed Pickup   
2            25276    11/01/17  Trash - Curbside/Alley Missed Pickup   
3            25307    11/01/17  Trash - Curbside/Alley Missed Pickup   
4            25312    11/01/17  Trash - Curbside/Alley Missed Pickup   
8            25330    11/01/17  Trash - Curbside/Alley Missed Pickup   
9            25331    11/01/17  Trash - Curbside/Alley Missed Pickup   
10           25341    11/01/17  Trash - Curbside/Alley Missed Pickup   
12           25359    11/01/17  Trash - Curbside/Alley Missed Pickup   
14           25370    11/01/17  Trash - Curbside/Alley Missed Pickup   
15           25371    11/01/17  Trash - Curbside/Alley Missed Pickup   
18           25449    11/01/17  Trash - Curbside/Alley Missed Pickup   
19           25454    11/01/17  Trash - Curbside/Alley Missed Pickup   
21           25469    11/01/17  Trash - Curbside/Alley Missed Pi
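
Before relying on the `\bmissed\b` filter, it may help to see which request types it keeps and which it leaves out. The sketch below uses a made-up frame in place of pickup_report. Only the 'Trash - Curbside/Alley Missed Pickup' label comes from the output above; the other labels are hypothetical.

```python
import pandas as pd

# Toy stand-in for pickup_report; labels other than the missed-pickup one are hypothetical
toy = pd.DataFrame({'Request': [
    'Trash - Curbside/Alley Missed Pickup',
    'Trash - Backdoor',
    'Trash - Curbside/Alley Missed Pickup',
    'Trash Collection Complaint',
    None,
]})

# Same filter as the cell above: whole word 'missed', case-insensitive, NaN treated as no match
is_missed = toy['Request'].str.contains(r'\bmissed\b', case=False, na=False, regex=True)

# Request types kept by the filter, and those left out (including missing values)
kept = toy.loc[is_missed, 'Request'].value_counts()
dropped = toy.loc[~is_missed, 'Request'].value_counts(dropna=False)

print(kept)
print(dropped)
```

Run against the real data, the `dropped` table would also give a first answer to the "other types of complaints" question.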

In [10]:
# Summary statistics for the missed-pickup rows (missedpu was built in the previous cell)
print(missedpu.describe())

       Request Number  Council District  State Plan X  State Plan Y
count    15028.000000      14991.000000  1.500700e+04  1.500700e+04
mean    151361.267234         18.318524  1.766425e+06  6.592279e+05
std      71901.537542         10.107257  8.137032e+05  4.550526e+04
min      25274.000000          1.000000  1.663490e+06 -2.719106e+05
25%      87142.250000          8.000000  1.728654e+06  6.395148e+05
50%     148228.500000         20.000000  1.747701e+06  6.570097e+05
75%     220744.500000         27.000000  1.760562e+06  6.757087e+05
max     267137.000000         35.000000  3.496892e+07  2.204382e+06


In [11]:
# Count missed pickups per address (case-insensitive)
add_counts = (
    missedpu.groupby(missedpu['Incident Address'].str.lower())
    .size()
    .reset_index(name='Missed Count')
)

# Keep only the requests at addresses with more than one missed pickup
miss_count = missedpu.merge(
    add_counts[add_counts['Missed Count'] > 1],
    left_on=missedpu['Incident Address'].str.lower(),
    right_on='Incident Address',
    how='inner',
)[['Missed Count'] + list(missedpu.columns)].rename(columns={'Incident Address_x': 'Incident Address'})
print(miss_count.head(20))

    Missed Count  Request Number Date Opened  \
0              2           25274    11/01/17   
1              2           25359    11/01/17   
2              4           25371    11/01/17   
3              3           25454    11/01/17   
4              5           25496    11/01/17   
5              2           25511    11/01/17   
6              2           25512    11/01/17   
7              4           25515    11/01/17   
8              3           25517    11/01/17   
9              2           25536    11/02/17   
10             4           25539    11/02/17   
11             2           25540    11/02/17   
12             2           25586    11/02/17   
13             5           25587    11/02/17   
14             5           25592    11/02/17   
15             2           25653    11/02/17   
16             3           25733    11/02/17   
17             5           25789    11/02/17   
18             2           25791    11/02/17   
19             2           25825    11/0

In [12]:
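# Add fees: $200 for every missed pickup after the first

# NOTE: miss_count has one row per request, so an address with n missed pickups
# appears n times. Summing over rows therefore counts each address's fine n times;
# deduplicate by address first for a per-address total.
tot_fees = ((miss_count['Missed Count'] - 1).sum()) * 200

# add column to dataframe
miss_count['fees'] = (miss_count['Missed Count'] - 1) * 200

# total fees
print(tot_fees)

# fees by address in table:
print(miss_count[['Missed Count', 'fees', 'Incident Address']].head(25))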

3016400
    Missed Count  fees        Incident Address
0              2   200   4028 clarksville pike
1              2   200         830 meridian st
2              4   600   2218 buena vista pike
3              3   400         449 westboro dr
4              5   800      1815 woodmont blvd
5              2   200         259 sunrise ave
6              2   200         4311 dakota ave
7              4   600         4029 graybar ct
8              3   400          524 harding pl
9              2   200  6011 jocelyn hollow rd
10             4   600         2524 batavia st
11             2   200          934 battery ln
12             2   200         4311 dakota ave
13             5   800    101 westover park ct
14             5   800      205 channelkirk ln
15             2   200    118 westover park ct
16             3   400       905 woodmont blvd
17             5   800             116 bart dr
18             2   200       5115 greentree dr
19             2   200        531 cathy jo cir
20   
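
In the table above, 4311 dakota ave appears twice, each time with a $200 fee, because miss_count keeps one row per request. Under the rule in the problem statement (no fine for the first miss, $200 for each later one), an address with two misses should owe $200 in total. A minimal sketch of a per-address total, using a made-up frame in place of missedpu (the addresses and the names `toy`, `per_address` and `FINE` are illustrative):

```python
import pandas as pd

FINE = 200  # dollars per missed pickup after the first, per the problem statement

# Toy stand-in for missedpu: one row per missed-pickup request (addresses are made up)
toy = pd.DataFrame({'Incident Address': [
    '1 Main St', '1 MAIN ST', '1 main st',   # 3 misses -> 2 fines
    '2 Oak Ave', '2 oak ave',                # 2 misses -> 1 fine
    '3 Elm Ct',                              # 1 miss   -> no fine
]})

# One row per address, so each address is charged exactly once
per_address = (
    toy.groupby(toy['Incident Address'].str.lower())
    .size()
    .reset_index(name='Missed Count')
)
per_address['fees'] = (per_address['Missed Count'] - 1) * FINE

total = per_address['fees'].sum()
print(total)  # 600
```

Summing the same fee column over request rows instead would give 3 * 400 + 2 * 200 = 1600 for this toy data, which is the kind of inflation to watch for in the total printed above.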

In [13]:
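# Missed counts and fees by route.
# Both columns are sums over the request rows of miss_count, so an address
# with n missed pickups contributes its count and its fee n times.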
route_missed = miss_count.groupby('Trash Route').agg({'Missed Count': 'sum', 'fees': 'sum'})
print(route_missed)

             Missed Count    fees
Trash Route                      
1201                  143   18000
1202                   58    6600
1203                   18    1800
1204                   33    3600
1205                   51    8400
...                   ...     ...
9505                    4     400
9506                   50    5600
9507                   24    2400
9508                  108   13200
N/A                  1303  196600

[162 rows x 2 columns]
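
One way to attribute fines to routes and haulers without counting an address more than once is to fine individual requests: the first miss at an address carries no fine and every later miss carries $200. The sketch below assumes that "first" means the earliest 'Date Opened', and it uses made-up rows; the hauler labels, route numbers and addresses are hypothetical stand-ins, not values confirmed from the data.

```python
import pandas as pd

FINE = 200  # dollars per missed pickup after the first

# Toy stand-in for missedpu; all values below are illustrative
toy = pd.DataFrame({
    'Date Opened': ['11/01/17', '11/08/17', '11/15/17', '11/02/17', '11/09/17', '11/03/17'],
    'Incident Address': ['1 Main St', '1 main st', '1 MAIN ST', '2 Oak Ave', '2 oak ave', '3 Elm Ct'],
    'Trash Hauler': ['RED RIVER', 'RED RIVER', 'RED RIVER', 'METRO', 'METRO', 'RED RIVER'],
    'Trash Route': ['1201', '1201', '1201', '9508', '9508', '1202'],
})

# Order requests chronologically so "first miss" is well defined (assumption)
toy['Date Opened'] = pd.to_datetime(toy['Date Opened'], format='%m/%d/%y')
toy = toy.sort_values('Date Opened')

# 0 for the first miss at an address, 1, 2, ... for later ones
nth_miss = toy.groupby(toy['Incident Address'].str.lower()).cumcount()
toy['fine'] = (nth_miss > 0) * FINE

# Number of missed pickups and fines per route, and fines per hauler
by_route = toy.groupby('Trash Route').agg(missed=('fine', 'size'), fees=('fine', 'sum'))
by_hauler = toy.groupby('Trash Hauler')['fine'].sum()

print(by_route)
print(by_hauler)
```

Because each request row carries its own fine, the route and hauler totals add up to the same grand total as a per-address calculation.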


In [14]:
# The 'N/A' Trash Route total is concerning. Exploring possible reasons.
na_explore = miss_count[miss_count['Trash Route'] == 'N/A'][['Request', 'Description', 'Trash Hauler', 'Incident Address']]
print(na_explore.head(50))

# Missed pickups seem valid even though there is no Trash Route recorded. Might warrant more investigation later.

                                   Request  \
34    Trash - Curbside/Alley Missed Pickup   
35    Trash - Curbside/Alley Missed Pickup   
65    Trash - Curbside/Alley Missed Pickup   
66    Trash - Curbside/Alley Missed Pickup   
67    Trash - Curbside/Alley Missed Pickup   
68    Trash - Curbside/Alley Missed Pickup   
173   Trash - Curbside/Alley Missed Pickup   
179   Trash - Curbside/Alley Missed Pickup   
180   Trash - Curbside/Alley Missed Pickup   
181   Trash - Curbside/Alley Missed Pickup   
189   Trash - Curbside/Alley Missed Pickup   
245   Trash - Curbside/Alley Missed Pickup   
261   Trash - Curbside/Alley Missed Pickup   
266   Trash - Curbside/Alley Missed Pickup   
288   Trash - Curbside/Alley Missed Pickup   
297   Trash - Curbside/Alley Missed Pickup   
303   Trash - Curbside/Alley Missed Pickup   
319   Trash - Curbside/Alley Missed Pickup   
329   Trash - Curbside/Alley Missed Pickup   
348   Trash - Curbside/Alley Missed Pickup   
390   Trash - Curbside/Alley Misse

In [23]:
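zip_shapes = gpd.read_file('data/zipcodes.geojson')

polygon37207 = zip_shapes[zip_shapes['zip'] == '37207']

# Filter data for the same zip code and drop rows without coordinates
zip_code = 37207
zip_df = miss_count[miss_count['Zip Code'] == zip_code].dropna(subset=['State Plan X', 'State Plan Y'])

# 'State Plan X' / 'State Plan Y' are projected coordinates, not lat/lon.
# ASSUMPTION: they are Tennessee State Plane feet (EPSG:2274); convert to WGS84 for folium.
points = gpd.GeoSeries(
    gpd.points_from_xy(zip_df['State Plan X'], zip_df['State Plan Y'], crs='EPSG:2274'),
    index=zip_df.index,
).to_crs(epsg=4326)

# Get centroid of the polygon to center the map
# (computed in a projected CRS, under the same EPSG:2274 assumption, then converted back to lat/lon)
center = polygon37207.to_crs(epsg=2274).geometry.centroid.to_crs(epsg=4326).iloc[0]
area_center = [center.y, center.x]

# Create folium map
map_37207 = folium.Map(location=area_center, zoom_start=12)

# Add polygon boundary
folium.GeoJson(polygon37207).add_to(map_37207)

# Add markers for each missed pickup location
for idx, row in zip_df.iterrows():
    point = points.loc[idx]
    folium.CircleMarker(
        location=[point.y, point.x],  # lat, lon
        radius=row['Missed Count'] * 0.5,  # bubble size scaled by missed pickups
        color='blue',
        fill=True,
        fill_opacity=0.6,
        popup=f"""
        <b>Zip Code:</b> {row['Zip Code']}<br>
        <b>Address:</b> {row['Incident Address']}<br>
        <b>Total Missed Pickups:</b> {row['Missed Count']}
        """
    ).add_to(map_37207)

# Display the map
map_37207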


