
#    311 Service Requests Data Analysis and Visualization
##   DATA 601 Project Report - Team 4 
###  Team Members : Anitha Joseph, Jincy Thomas, Megha Radhakrishnan Sanitha 


#### INTRODUCTION

    311 is a non-emergency government service that helps citizens to report issues and access city information. It can be accessed by phone, online, or through mobile apps and often includes multilingual support and services for vulnerable populations like the senior residents. The 311 system collects all requests, which are then routed to the responsible city department for resolution. It operates 24/7 and allows users to track the progress of their requests, ensuring transparency and accountability. Thus the 311 services make the lives of the residents easier and saves their time by eliminating the need to contact multiple departments for resolving their concerns. The user-friendly interface of 311 also makes the communication with the local government effortless, convenient and accessible for everyone.

    Analysis of 311 data over a long period of time helps us in determining the most frequent service requests which can be addressed more efficiently by allocating adequate resources. This can also be used to identify the efficiencies of different departments and the gaps where improvement is required. Seasonal patterns and long term trends can also be identified which can be used to predict the future demands and optimize the public services. By tracking types of requests, response times, and areas with frequent complaints, resources can be allocated more effectively, and operational inefficiencies can be identified and resolved. The residents will get a clear picture regarding the handling of their concerns which promotes transparency. Proactive planning is made possible, allowing for better resource distribution and improved service delivery. Ultimately, systemic issues can be identified, and solutions can be developed,leading to improved city planning and enhanced community satisfaction.


#### GUIDING QUESTIONS

    The major guiding questions for the analysis of 311 service requests and the the ways in which the insights from answering these questions can be utilized are as follows:
    1. Geographic Analysis
    ● Which community or location has the largest number of service requests?
    ● Are there any specific needs for certain areas?
    High need areas can be prioritized and resources can be allocated where they are most required.
    
    2. Seasonal Trends
    ● During which seasons do service requests occur most often?
    ● How do service requests change over seasons? Are there any identifiable patterns?
    Seasonal patterns are identified thus helping to focus resources effectively.
    
    3. Request Sources
    ● What is the primary source of service requests: phone calls or online submissions (web)?
    Interfaces can be upgraded based on user preferences.
    
    4. Types of Service Requests
    ● What is the service requested most frequently?
    Helps to identify and allocate adequate resources for the most needed service.
    
    5. Response Efficiency
    ● Which agency handles the most and least number of service requests?
    ● What is the average response rate and time for resolving for service requests?
    ● Who are the most efficient agents in terms of response and resolution times?
    ● How does the response efficiency vary across different years?
    Track the department efficiencies and identify areas where improvement is required.
    
    6. Trends Over Time
    ● How has the volume and type of service requests changed over the past five years?
    ● Are there noticeable trends in requests that could be used for future planning?
    Future demands can be anticipated, allowing for better preparedness.


#### DATASET

##### Data description

    The data for this analysis is sourced from The City of Calgary’s open data portal, specifically the "311 Service Requests -Services and Amenities" dataset (The City of Calgary,2025) . We have included public service requests submitted via 311 for 2 years- 2023 and 2024, consisting of 1062842 rows and 15 columns, structured in a tabular format. Each row represents an individual service request.
    
##### Format and Structure
    The data consists of several datasets depending on the city or organization's open data portal. Each dataset typically contains the following columns: 

    Column                Datatype   Description
    service_request_id     object    The unique identifier for an individual request.
    requested_date         object    The date the request was submitted.
    updated_date           object    The most recent date the request was updated.
    closed_date            object    The date the request was closed.
    status_description     object    The current status of the request (e.g. open, closed).
    source                 object    The channel used to submit the request
    service_name           object    The type of service requested.
    agency_responsible     object    The department responsible for this request.
    address               float64    The location of the service request (if applicable).
    comm_code              object    The community code associated with the service request location.
    comm_name              object    The community name associated with the service request location.
    location_type          object    The type of location information provided for this service request.
    longitude             float64    The longitude of the service request.
    latitude              float64    The latitude of the service request.
    point                  object    The spatial coordinates based on latitude and longitude.

    The data is sourced from The City of Calgary’s open data portal, 311 Service Requests - Services and Amenities.The data is publicly available and used with permission as per the open data policy. This data is provided by the City of Calgary at https://data.calgary.ca/Services-and-Amenities/311-Service-Requests/iahh-g8bj/about_data and all usagecomplies with their data usage and attribution requirements listed in the license URL https://data.calgary.ca/d/Open-Data-Terms/u45n-7awa .

In [3]:
# Installing and Importing libraries 

#!pip install geopandas

import numpy as np
import re
import pandas as pd
from tabulate import tabulate
import geopandas as gpd

import seaborn as sns

import matplotlib as mpl
import matplotlib.pyplot as plt

import plotly.express as px

from datetime import datetime
import pytz

import geopandas as gpd
import matplotlib.patches as mpatches
from shapely.geometry import Point, Polygon, MultiPolygon

In [16]:
# Loading datasets

# 311_Service_Requests
df = pd.read_csv('/Users/jincythomas/Downloads/311_Service_Requests_2yrs.csv')

# Community Sectors
community_data=pd.read_csv("CSV_SECTORS.csv")

# Geographic data for geo-visualization
multi_polygon_df = pd.read_csv("Community_District_Boundaries_GeoJson.csv")
#display(df)
display(community_data)

Unnamed: 0,CLASS,CLASS_CODE,COMM_CODE,NAME,SECTOR,SRG,COMM_STRUCTURE,longitude,latitude,POINT
0,Residential,1,BED,BEDDINGTON HEIGHTS,NORTH,ESTABLISHED,1960s/1970s,-114.09,51.13,POINT (-114.08502139544244 51.13163280873361)
1,Residential,1,EVN,EVANSTON,NORTH,COMPLETE,2010s,-114.11,51.17,POINT (-114.1124526074949 51.17109493109596)
2,Residential,1,KIL,KILLARNEY/GLENGARRY,CENTRE,ESTABLISHED,1950s,-114.13,51.03,POINT (-114.13172726984385 51.031548429038665)
3,Residential,1,BRA,BRAESIDE,SOUTH,ESTABLISHED,1960s/1970s,-114.11,50.96,POINT (-114.10636591786145 50.955992888964275)
4,Residential,1,BLM,BELMONT,SOUTH,DEVELOPING,BUILDING OUT,-114.06,50.87,POINT (-114.055251748252 50.86868365691495)
...,...,...,...,...,...,...,...,...,...,...
313,Residual Sub Area,4,ABT,AMBLETON,NORTHWEST,,UNDEVELOPED,-114.11,51.19,POINT (-114.22275942835957 51.168724389792594)
314,Residual Sub Area,4,12L,12L,SOUTHEAST,FUTURE,UNDEVELOPED,-113.87,50.91,POINT (-113.8715190162749 50.91435050102264)
315,Residual Sub Area,4,12I,12I,SOUTH,,UNDEVELOPED,-114.16,50.99,POINT (-114.16441971345019 50.986651529493514)
316,Residual Sub Area,4,01I,01I,SOUTH,,UNDEVELOPED,-114.16,50.99,POINT (-114.16441971345019 50.986651529493514)


## Data cleaning nad preprocessing

Handling Missing and Unwanted Data

• Handling Missing Data: Drop columns with more than 10% missing values.

• Handling Unwanted Data: Drop requestes created before Jan-1-2023 and after Dec-31-2024.

• Handling Missing Community Code: Fill Community Code with Community name.

• Handling Missing Longitude: Fill Longitude with Median value.

• Handling Missing Latitude: Fill Latitude with median value.

• Handling Missing Point with it's mode value

• Replace values in 'source' column with corresponding service source

In [7]:
print("----------------------------------------------------------------------------")
print("\033[1m"+"Data Analysis and Visualization of Building Emergency Benchmarking"+"\033[0m")
print("----------------------------------------------------------------------------")

#display shape, columns, and data types
print("1.\tShape of the Dataset:", df.shape)
print("2.\tNumber of records or rows of the DataFrame:", df.shape[0])
print("3.\tColumns and Data types of each column:\n", df.dtypes)

----------------------------------------------------------------------------
[1mData Analysis and Visualization of Building Emergency Benchmarking[0m
----------------------------------------------------------------------------
1.	Shape of the Dataset: (1093918, 15)
2.	Number of records or rows of the DataFrame: 1093918
3.	Columns and Data types of each column:
 service_request_id     object
requested_date         object
updated_date           object
closed_date            object
status_description     object
source                 object
service_name           object
agency_responsible     object
address               float64
comm_code              object
comm_name              object
location_type          object
longitude             float64
latitude              float64
point                  object
dtype: object


In [8]:
# Inspecting  data

missingDataSum = df.isna().sum()
missingDataPercentage = (df.isnull().mean() * 100).round(2)
missingData = pd.DataFrame({
    "Missing Count": missingDataSum,
    "Missing Percentage": missingDataPercentage
})

pd.options.display.float_format = '{:.2f}'.format
print("\n\033[1m"+"Missing Count per column:"+"\033[0m")
print(tabulate(missingData, headers='keys', tablefmt='fancy_grid'))

#The dataframe(DF) is copied to another DF variable if in case there is a need for original DF
originalDF = df
display(df.head(5))


[1mMissing Count per column:[0m
╒════════════════════╤═════════════════╤══════════════════════╕
│                    │   Missing Count │   Missing Percentage │
╞════════════════════╪═════════════════╪══════════════════════╡
│ service_request_id │     0           │                 0    │
├────────────────────┼─────────────────┼──────────────────────┤
│ requested_date     │     0           │                 0    │
├────────────────────┼─────────────────┼──────────────────────┤
│ updated_date       │     0           │                 0    │
├────────────────────┼─────────────────┼──────────────────────┤
│ closed_date        │ 39714           │                 3.63 │
├────────────────────┼─────────────────┼──────────────────────┤
│ status_description │     0           │                 0    │
├────────────────────┼─────────────────┼──────────────────────┤
│ source             │     0           │                 0    │
├────────────────────┼─────────────────┼──────────────────────┤
│ ser

Unnamed: 0,service_request_id,requested_date,updated_date,closed_date,status_description,source,service_name,agency_responsible,address,comm_code,comm_name,location_type,longitude,latitude,point
0,23-00000797,2023/01/02 12:00:00 AM,2023/01/10 12:00:00 AM,2023/01/10 12:00:00 AM,Closed,Other,Finance - ONLINE TIPP Agreement Request,CFOD - Finance,,,,,,,
1,23-00001045,2023/01/02 12:00:00 AM,2024/01/11 12:00:00 AM,2024/01/11 12:00:00 AM,Closed,Other,Active Living Program Application,CS - Recreation and Social Programs,,,,,,,
2,23-00001163,2023/01/02 12:00:00 AM,2023/01/06 12:00:00 AM,2023/01/06 12:00:00 AM,Closed,Phone,CN - Registered Social Worker Letter,CS - Calgary Neighbourhoods,,,,,,,
3,23-00001191,2023/01/02 12:00:00 AM,2024/05/19 12:00:00 AM,2023/01/10 12:00:00 AM,Closed,Other,CT - Lost Property,OS - Calgary Transit,,,,,,,
4,23-00001584,2023/01/02 12:00:00 AM,2023/01/04 12:00:00 AM,2023/01/04 12:00:00 AM,Closed,Other,Recreation - Arena Booking Application,CS - Calgary Recreation,,,,,,,


In [19]:
# Handling Missing Data
columnNameDropped = missingDataPercentage[missingDataPercentage >= 40].index.tolist()
print("\nColumns with missing percentage more than 40% missing values are:", columnNameDropped)
df = df.drop(columns = missingDataPercentage[missingDataPercentage > 40].index)

# Handling Unwanted Data
beforeCount = df.shape[0]
df = df[(df['requested_date'] < '2025-01-01') & (df['requested_date'] > '2023-01-01')]
afterCount =df.shape[0]
deletedCount = beforeCount - afterCount
print(f"\nCount of deleted request which are recieved on or after 2025-01-01 and before 2023-01-01: {deletedCount}")

#Handling Missing Community Code
communityNames = df[df['comm_code'].isnull() & df['comm_name'].notnull()]['comm_name'].to_list()
print(f"\nCommunity name with community code null and community name exists: {communityNames}")

df['comm_code'].fillna(df['comm_name'])
print(f"\nCommunity Code is filled with Community name for {communityNames} community")

#Handling Missing Longitude and Latitude with their median 
df['longitude'] = df['longitude'].fillna(df['longitude'].median())
df['latitude'] = df['latitude'].fillna(df['latitude'].median())
print("\nLongitude and latitude missing values are replaced with its corresponding median")

#Handling Missing Point with the mode
df['point'] = df['point'].fillna(df['point'].mode()[0])
print("\nPoint missing values are replaced with its mode")

# Replace values in 'source' column with corresponding service source
df['source'] = df['source'].replace('Web', 'Web (Online Form)')
df['source'] = df['source'].replace('App', 'Mobile App')
df['source'] = df['source'].replace('Other','Email & Social Media')


Columns with missing percentage more than 40% missing values are: ['address']

Count of deleted request which are recieved on or after 2025-01-01 and before 2023-01-01: 31076

Community name with community code null and community name exists: []

Community Code is filled with Community name for [] community

Longitude and latitude missing values are replaced with its corresponding median

Point missing values are replaced with its mode


In [20]:
# Date and Time Handling:
#------------------------------------------------------------------------------------------

#Convert the Date column to a datetime object
df['requested_date'] = pd.to_datetime(df['requested_date'], format = '%Y/%m/%d %I:%M:%S %p')
df['updated_date'] = pd.to_datetime(df['updated_date'], format = '%Y/%m/%d %I:%M:%S %p')
df['closed_date'] = pd.to_datetime(df['closed_date'], format = '%Y/%m/%d %I:%M:%S %p')

print("\n\033[1m"+"Date and Time Handling: Modified data type:"+"\033[0m")
print(f"Data type of 'requested_date': {df['requested_date'].dtype}")
print(f"Data type of 'updated_date': {df['updated_date'].dtype}")
print(f"Data type of 'closed_date': {df['closed_date'].dtype}")

# Converting null values to NaT
df['closed_date'] = df['closed_date'].fillna(pd.NaT)

#Create new columns for the year, month, and day of the week for requested, updated and closed date columns
df['request_year'] = df['requested_date'].dt.year
df['request_month'] = df['requested_date'].dt.month
df['request_day'] = df['requested_date'].dt.day
df['update_year'] = df['updated_date'].dt.year
df['update_month'] = df['updated_date'].dt.month
df['update_day'] = df['updated_date'].dt.day
df['closed_year'] = df['closed_date'].dt.year
df['closed_month'] = df['closed_date'].dt.month
df['closed_day'] = df['closed_date'].dt.day

# Map the numerical days to their names for better readability
day_name_map = {0: 'Monday', 1: 'Tuesday', 2: 'Wednesday', 3: 'Thursday', 4: 'Friday', 5: 'Saturday', 6: 'Sunday'}
df['day_name'] = df['request_day'].map(day_name_map)

# Replacing null values in derived date related columns with 0 and converting the column values to int type
df.loc[df['closed_date'].isna(), ['closed_year', 'closed_month', 'closed_day']] = 0
df[['request_year', 'request_month', 'request_day']] = df[['request_year', 'request_month', 'request_day']].astype('Int32')
df[['update_year', 'update_month', 'update_day']] = df[['update_year', 'update_month', 'update_day']].astype('Int32')
df[['closed_year', 'closed_month', 'closed_day']] = df[['closed_year', 'closed_month', 'closed_day']].astype('Int32')


print("\n\033[1m"+"Date and Time Handling: newly created columns are:"+"\033[0m")
print("For requested_date: request_year, request_month, request_day")
print("For updated_date: update_year, update_month, update_day")
print("For closed_date: closed_year, closed_month, closed_day")


[1mDate and Time Handling: Modified data type:[0m
Data type of 'requested_date': datetime64[ns]
Data type of 'updated_date': datetime64[ns]
Data type of 'closed_date': datetime64[ns]

[1mDate and Time Handling: newly created columns are:[0m
For requested_date: request_year, request_month, request_day
For updated_date: update_year, update_month, update_day
For closed_date: closed_year, closed_month, closed_day


In [21]:
# Create additional columns:
#-------------------------------------------------------------------------

#Add a column indicating whether each request date falls on a weekend
df['is_weekend_request'] = df['request_day']>= 5

#Add a column for time duration to calculate the time took to close the request
df['response_time'] = df['closed_date'] - df['requested_date']
df['response_time'] = df['response_time'].dt.days

#Add a column to see if the request is duplicate or not(Yes means duplicate and No means not a duplicate request)
df['duplicate_request'] = df['status_description'].str.contains(r'Duplicate \(Closed\)', regex=True)
df['duplicate_request'] = df['duplicate_request'].replace({True: 'Yes', False: 'No'})

print("\n\033[1m"+"Additional Columns created are:"+"\033[0m")
print("\tis_weekend_request")
print("\tresponse_time")
print("\tduplicate_request")


[1mAdditional Columns created are:[0m
	is_weekend_request
	response_time
	duplicate_request


In [22]:
# Season Categorisation of "Requests"
# Defining Calgary's timezone
calgary_tz = pytz.timezone('America/Edmonton')  
# Exact UTC times for solstices and equinoxes (taken from Govt of Canada Website)
seasons_utc = {
    'Spring_2023': '2023-03-20 21:24:00',
    'Summer_2023': '2023-06-21 14:57:00',
    'Autumn_2023': '2023-09-23 06:50:00',
    'Winter_2023': '2023-12-22 03:27:00',
    'Spring_2024': '2024-03-20 03:06:00',
    'Summer_2024': '2024-06-20 20:50:00',
    'Autumn_2024': '2024-09-22 12:43:00',
    'Winter_2024': '2024-12-21 09:20:00'
}

# Converting the UTC times to Calgary local time
seasons = {}
for season, utc_time_str in seasons_utc.items():
    # Converting the UTC string into a datetime object   
    utc_time = datetime.strptime(utc_time_str, '%Y-%m-%d %H:%M:%S')
    utc_time = pytz.utc.localize(utc_time) 
    # Converting to Calgary local time
    local_time = utc_time.astimezone(calgary_tz)
    # Saving the result in the dictionary
    seasons[season] = local_time
#for key, value in seasons.items():
#print(f"{key}: {value.strftime('%Y-%m-%d %H:%M:%S')}")
#    print(f"{key}: {value}")

# Keeping the local time but making it aware for requested_date columns
if df['requested_date'].dt.tz is None:
    df['new_requested_date'] = df['requested_date'].dt.tz_localize('America/Edmonton')
#print(df['new_requested_date'].head())

# Categorizing into seasons and creating a new 'season' column
# Assigning seasons based on request date
def get_season(request_date):
    for season, season_date in seasons.items():
        if request_date < season_date:
            return season
    return 'Winter_2024'  # Default to the latest season

# Creating new season column 
df['Season'] = df['new_requested_date'].apply(get_season)

# Define the date range
start_date = '2023-12-22'
end_date = '2023-12-31'
# Update the Season column for the specified date range
df.loc[(df['requested_date'] >= start_date) & (df['requested_date'] <= end_date), 'Season'] = 'Winter_2023'
#display(df)

print("\n\033[1m"+"Additional Columns created are:"+"\033[0m")
print("\tnew_requested_date")


[1mAdditional Columns created are:[0m
	new_requested_date


In [24]:
#Add column for Community Sector using the community sector csv file
def merge_community_sector(main_data, community_data):
    # Rename the relevant columns in the community_data for clarity and consistency
    community_data.rename(columns={'COMM_CODE': 'comm_code', 'SECTOR': 'community_sector'}, inplace=True)

    # Merge the datasets based on the 'comm_code'
    merged_data = main_data.merge(community_data[['comm_code', 'community_sector']], on='comm_code', how='left')

    return merged_data

df = merge_community_sector(df, community_data)
print("\n\033[1m"+"Additional Columns created are:"+"\033[0m")
print("\tcommunity_sector")

#Handling Missing for Community related columns
df.loc[df['comm_code'].isnull(), 'comm_code'] = "Community Centrepoint"
df.loc[df['comm_name'].isnull(), 'comm_name'] = "Community Centrepoint"
df.loc[df['community_sector'].isnull(), 'community_sector'] = "Community Centrepoint"

# Extract year and month from the requested_date column
df['year_month'] = df['requested_date'].dt.to_period('M')


[1mAdditional Columns created are:[0m
	community_sector


In [25]:
# Add a column for Divisions of Agency assigned for the requests

#Unassigned agencies are assigned to corresponding divisions
df.loc[df['agency_responsible'].isnull() & df['service_name'].str.contains('WATR -'), 'agency_responsible'] = 'UEP - Utilities & Environmental Protection'
df.loc[df['agency_responsible'].isnull() & df['service_name'].str.contains('PSD -'), 'agency_responsible'] = 'PDS - Planning & Development Services'
df.loc[df['agency_responsible'].isnull() & df['service_name'].str.contains('CPI -'), 'agency_responsible'] = 'OSC - Operational Services and Compliance'

# agency abbreviations are extracted
def extract_division(value):
    if pd.isna(value):
        return np.nan
    parts = value.split('-')
    resultStr = parts[0].strip() if '-' in value else value.strip()
    return resultStr


df['agency_division'] = df['agency_responsible'].apply(extract_division)

#Actual agencies or divisions under Calgary Government
agency_division = {
    'agency_name': ['Affiliated Organizations', 'Chief Financial Officer Department', 'Corporate Wide Service Requests',
                    'Calgary Police & Fire Services', 'Community Services', "Deputy City Manager's Office",
                   'Elected Officials', 'Fleet and Inventory', 'Information Services','Legal or Legislative Services',
                   'Office of the City Auditor','Operational Services and Compliance', 'Partnerships',
                   'Planning & Development Services','Project Information and Control Systems', 'Recreation and Social Programs',
                    'Transportation', 'Utilities & Environmental Protection'],
    'abbreviations': [['AO', 'Affiliated Organizations'], ['CFOD'], ['Corporate Wide Service Requests'], 
                      ['CPFS'],['CS'], ['DCMO'], 
                      ['Elected Officials'], ['Fleet and Inventory'], ['IS'], ['LL','LLSS'],
                      ['Office of the City Auditor'],['OS','OSC'],['Partnerships'],
                      ['PD','PDS'],['PICS'],['Recreation and Social Programs'],
                      ['TRAN','Tranc'], ['UEP','Uepc']]
}


# Create a mapping dictionary
mapping = {abbreviation: agency_name 
           for agency_name, abbreviations in zip(agency_division['agency_name'], agency_division['abbreviations']) 
           for abbreviation in abbreviations}


# Replace the agency_division values with actual agency_name or divisions
df['agency_division'] = df['agency_division'].map(mapping)
#noDivisionDF = df[df['agency_division'].isnull()]
#display(noDivisionDF)
agencies= df['agency_division'].unique()
    
# Iterate through each agency division in the list
for division in agencies:
    subset_df = df[df['agency_division'] == division]
    
    # Split the 'agency_responsible' column at the first hyphen and create 'agency_subdivision'
    df.loc[df['agency_division'] == division, 'agency_subdivision'] = subset_df['agency_responsible'].apply(
        lambda x: x.split('-', 1)[1] if '-' in x else division
    )

    # Split the 'service_name' column at the first hyphen and create 'service_category'
    df.loc[df['agency_division'] == division, 'service_category'] = subset_df['service_name'].apply(
        lambda x: x.split('-', 1)[0] if '-' in x else x
    )

    # Split the 'service_name' column at the first hyphen and create 'service_request'
    df.loc[df['agency_division'] == division, 'service_request'] = subset_df['service_name'].apply(
        lambda x: x.split('-', 1)[1] if '-' in x else x
    )
    
#Update Service Category names
df.loc[df['service_category'] == 'CT', 'service_category'] = 'Calgary Transit'
df.loc[df['service_category'] == 'DBBS Inspection', 'service_category'] = 'Development, Business and Building Services'
df.loc[df['service_category'] == 'WRS', 'service_category'] = 'Waste and Recycling Services'
df.loc[df['service_category'] == 'WATS', 'service_category'] = 'Water Services'
df.loc[df['service_category'] == 'Corporate', 'service_category'] = 'Corporate Wide Service Requests'


print("\n\033[1m"+"Additional Columns created are:"+"\033[0m")
print("\tagency_division")
print("\tagency_subdivision")
print("\tservice_category")
print("\tservice_request")



[1mAdditional Columns created are:[0m
	agency_division
	agency_subdivision
	service_category
	service_request


In [26]:
#Filter the records from your dataframe df where closed_date is greater than or equal to requested_date, 
#closed_date is not null, and duplicate_request is 'No'
print("\n\033[1m"+"Agency division and the count of requests handles by each division:"+"\033[0m")
efficiencyDF = df[(df['closed_date'] >= df['requested_date']) & 
                 (df['closed_date'].notna()) & 
                 (df['duplicate_request'] == 'No')]
print(f"For answering the response efficiency, we have considered {efficiencyDF.shape[0]} requests")

average_response_time = round(efficiencyDF['response_time'].mean(), 2)
print(f"Average response time: {average_response_time} days")



[1mAgency division and the count of requests handles by each division:[0m
For answering the response efficiency, we have considered 1022239 requests
Average response time: 12.6 days


## Information of New Processed Data Set for 311 requests

• Shape of the new Processed Dataset.

• Count of 311 service requests.

• Columns and Data types of each column.

• 311 Request 'Status' available.

• Agencies resposible for handling service requests.

• Count of all distinct service requests.

• Count of all distinct service requests for all agencies.

In [27]:
print("1.\tShape of the new Processed Dataset:", df.shape)
print("2.\tCount of 311 service requests:", df.shape[0])
print("3.\tColumns and Data types of each column:")

# Get the columns that are in df but not in originalDF
newColumns = list(set(df.columns) - set(originalDF.columns))
dtypes = df[newColumns].dtypes
width = max(len(column) for column in df.columns) + 2
for column, dtype in dtypes.items():
    print(f"\t\t{column.ljust(width)}{dtype}")
print(f"4.\t311 Request Status available: {df['status_description'].unique()}")
#display(df['status_description'].unique())

status_counts = df['status_description'].value_counts()
dupClosedReqCnt = status_counts['Duplicate (Closed)']
dupOpenReqCnt = status_counts['Duplicate (Open)']
closedReqCnt = status_counts['Closed']
openReqCnt = status_counts['Open']
print(f"\ti.\tCount of Open requests: {openReqCnt}")

# Filter requests where status_description is "open" but has closed date
openButClosedDF = df[(df['status_description'] == 'Open') & (df['closed_date'].notnull())]
openReqWithClosedDateCnt = openButClosedDF['status_description'].value_counts()
print(f"\t\ta.\tCount of Open requests with closed date: {openReqWithClosedDateCnt['Open']}")
openDF = df[(df['status_description'] == 'Open') & (df['closed_date'].isnull())]
openDF = openDF['status_description'].value_counts()
print(f"\t\tb.\tCount of Open requests with no closed date: {openDF['Open']}")

print(f"\tii.\tCount of Closed requests: {closedReqCnt}")
print(f"\tiii.\tCount of Duplicate (Open) requests: {dupOpenReqCnt}")
print(f"\tiv.\tCount of Duplicate (Closed) requests: {dupClosedReqCnt}")

# Inspect data
missingDataSum = df.isna().sum()
missingDataPercentage = (df.isnull().mean() * 100).round(2)
missingData = pd.DataFrame({
    "Missing Count": missingDataSum,
    "Missing Percentage": missingDataPercentage
})

pd.options.display.float_format = '{:.2f}'.format
print("\n\033[1m"+"Missing Count per column:"+"\033[0m")
print(tabulate(missingData, headers='keys', tablefmt='fancy_grid'))


1.	Shape of the new Processed Dataset: (1063381, 37)
2.	Count of 311 service requests: 1063381
3.	Columns and Data types of each column:
		agency_subdivision  object
		closed_day          Int32
		duplicate_request   object
		service_request     object
		agency_division     object
		request_day         Int32
		closed_month        Int32
		request_year        Int32
		closed_year         Int32
		day_name            object
		update_year         Int32
		community_sector    object
		update_month        Int32
		response_time       float64
		request_month       Int32
		year_month          period[M]
		update_day          Int32
		Season              object
		is_weekend_request  boolean
		service_category    object
		community_sector_x  object
		community_sector_y  object
		new_requested_date  datetime64[ns, America/Edmonton]
4.	311 Request Status available: ['Closed' 'Open' 'Duplicate (Closed)' 'Duplicate (Open)']
	i.	Count of Open requests: 32788
		a.	Count of Open requests with closed date: 203

In [28]:
# Entire Unique service names
unique_service_name_df = df['service_name'].unique()
print("\n5.\tCount of all distinct service requests:",len(unique_service_name_df))

agency_vise_distinct_req = df[['agency_division','agency_subdivision', 'agency_responsible','service_name']].drop_duplicates()
print("\n6.\tCount of all distinct service requests for all agencies:",len(agency_vise_distinct_req))

#agencies= df['agency_division'].unique()
print("\n7.\tAgencies resposible for handling service requests are:")
for agency in agencies:
    print(f"\t\t- {agency}")


5.	Count of all distinct service requests: 638

6.	Count of all distinct service requests for all agencies: 1037

7.	Agencies resposible for handling service requests are:
		- Chief Financial Officer Department
		- Community Services
		- Operational Services and Compliance
		- Transportation
		- Utilities & Environmental Protection
		- Planning & Development Services
		- Calgary Police & Fire Services
		- Corporate Wide Service Requests
		- Project Information and Control Systems
		- Partnerships
		- Deputy City Manager's Office
		- Legal or Legislative Services
		- Recreation and Social Programs
		- Elected Officials
		- Information Services
		- Affiliated Organizations
		- Fleet and Inventory
		- Office of the City Auditor
