<a href="https://colab.research.google.com/github/MohammadWaleed339/internship_collab_file/blob/main/Labmentix_Project_4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name** - **FBI Time Series Forecasting**
## **Project Type** - EDA
### **Contribution - Team**

*   **Team Member 1** - Mohd. Tabrej Khan
*   **Team Member 2** - Mohammad Waleed
*   **Team Member 3** - Owais Khan



# **Project Summary**


## ***Methodology***
1. **Data Cleaning**:
   - Handle missing values and inconsistent formatting.
   - Convert date/time columns to proper formats.
2. **Exploratory Data Analysis (EDA)**:
   - Visualizing distribution of queries and response times.
   - Identifying correlations between response time and CSAT scores.
3. **Agent Performance Analysis**:
   - Comparing performance across tenure and shifts.
   - Evaluating agent effectiveness based on CSAT scores.
4. **Insights & Recommendations**:
   - Factors impacting CSAT scores.
   - Potential improvements for support response time.
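
The date conversion step in the data cleaning stage can be sketched roughly as below. This is a minimal illustration, assuming component columns named `YEAR`, `MONTH`, `DAY`, `HOUR` and `MINUTE` like those in the dataset loaded later in this notebook. The sample rows are made up, and treating a missing time of day as midnight is an assumption, not a step taken in the notebook.

```python
import pandas as pd

# Toy rows standing in for the real data (values are illustrative only)
sample = pd.DataFrame({
    "YEAR": [1999, 1999],
    "MONTH": [5, 4],
    "DAY": [12, 23],
    "HOUR": [16.0, None],
    "MINUTE": [15.0, None],
})

# pd.to_datetime can assemble timestamps from columns named year/month/day/hour/minute
parts = sample.rename(columns=str.lower)

# Assumption: a missing time of day is treated as midnight so the date is kept
parts[["hour", "minute"]] = parts[["hour", "minute"]].fillna(0)

sample["timestamp"] = pd.to_datetime(parts[["year", "month", "day", "hour", "minute"]])
print(sample["timestamp"].tolist())
```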






# **GitHub Links**

*   Mohd. Tabrej Khan - https://github.com/Mohd-Tabrej-Khan
*   Mohammad Waleed - https://github.com/MohammadWaleed339
*   Owais Khan - https://github.com/Owaiskhan3320






# **General Guidelines**

## ***1. Code Structure and Documentation***
- The code is well-structured, formatted, and documented with clear comments explaining logic and implementation.
- Followed PEP8 guidelines for Python code formatting.
- Used meaningful variable and function names for better readability.

## ***2. Exception Handling and Production-Grade Code***
- Implemented try-except blocks to prevent unexpected errors.
- The code is modular, making it efficient and maintainable.
- The Jupyter Notebook is deployment-ready, ensuring it runs smoothly from start to finish.
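
As one possible illustration of the try-except guideline, the sketch below wraps file loading so that a missing or unreadable file produces a message instead of a traceback. The helper name `load_table` is hypothetical and is not part of the notebook.

```python
import pandas as pd

def load_table(path):
    '''Load a CSV or Excel file into a DataFrame, or return None on failure.'''
    try:
        if str(path).lower().endswith((".xlsx", ".xls")):
            return pd.read_excel(path)
        return pd.read_csv(path)
    except FileNotFoundError:
        print(f"File not found: {path}")
    except ValueError as err:
        print(f"Could not parse {path}: {err}")
    return None

# A path that does not exist returns None instead of raising
missing = load_table("no_such_file.csv")
```

Catching specific exceptions, as opposed to a bare `except`, keeps unrelated bugs visible while still handling the expected failure modes.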

## ***3. Proper Commenting and Documentation***
- Each function and logic block has detailed comments explaining its purpose and implementation.
- Docstrings (`''' '''`) are used for function definitions to improve clarity.
- A README file is included to guide users on executing the project.

## ***4. Data Visualization and Chart Guidelines***
As part of the analysis, multiple charts have been generated to provide insights into content distribution, ratings, and popularity trends. Each visualization includes:
- **Chart Title**: Clearly describes the visualization.
- **Purpose of the Chart**: Explains why the chart is included and its relevance to the analysis.
- **Insights Derived from the Chart**: Key takeaways, observed patterns, and significant trends.


# **Know Your Data**

## ***Importing the necessary libraries***
First, we import the libraries used throughout the project.

In [None]:
import pandas as pd #for data handling
import numpy as np #for numerical operations
import matplotlib.pyplot as plt # for basic viz
import seaborn as sns # for advanced viz
from sklearn.impute import SimpleImputer # for handling missing values
import warnings
warnings.filterwarnings('ignore')

## ***Loading the dataset.***
Here we have been provided with the ***FBI Time Series data*** as an Excel (**.xlsx**) file.
We uploaded the data to a __GitHub__ repository for open access. We now download it from there and load it with pandas.

In [None]:
!wget -O Train.xlsx "https://github.com/Mohd-Tabrej-Khan/internship_collab_file/releases/download/Datasets/Train.xlsx"


In [None]:
fbi = pd.read_excel("Train.xlsx")

## ***Take a first look at the data.***

In [None]:
fbi.head()

| | TYPE | HUNDRED_BLOCK | NEIGHBOURHOOD | X | Y | Latitude | Longitude | HOUR | MINUTE | YEAR | MONTH | DAY | Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Other Theft | 9XX TERMINAL AVE | Strathcona | 493906.5 | 5457452.47 | 49.269802 | -123.083763 | 16.0 | 15.0 | 1999 | 5 | 12 | 1999-05-12 |
| 1 | Other Theft | 9XX TERMINAL AVE | Strathcona | 493906.5 | 5457452.47 | 49.269802 | -123.083763 | 15.0 | 20.0 | 1999 | 5 | 7 | 1999-05-07 |
| 2 | Other Theft | 9XX TERMINAL AVE | Strathcona | 493906.5 | 5457452.47 | 49.269802 | -123.083763 | 16.0 | 40.0 | 1999 | 4 | 23 | 1999-04-23 |
| 3 | Other Theft | 9XX TERMINAL AVE | Strathcona | 493906.5 | 5457452.47 | 49.269802 | -123.083763 | 11.0 | 15.0 | 1999 | 4 | 20 | 1999-04-20 |
| 4 | Other Theft | 9XX TERMINAL AVE | Strathcona | 493906.5 | 5457452.47 | 49.269802 | -123.083763 | 17.0 | 45.0 | 1999 | 4 | 12 | 1999-04-12 |


Next, we check the number of rows and columns, then review each feature and its data type.

In [None]:
fbi.shape

(474565, 13)

In [None]:
fbi.columns

Index(['TYPE', 'HUNDRED_BLOCK', 'NEIGHBOURHOOD', 'X', 'Y', 'Latitude',
       'Longitude', 'HOUR', 'MINUTE', 'YEAR', 'MONTH', 'DAY', 'Date'],
      dtype='object')

In [None]:
fbi.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 474565 entries, 0 to 474564
Data columns (total 13 columns):
 #   Column         Non-Null Count   Dtype         
---  ------         --------------   -----         
 0   TYPE           474565 non-null  object        
 1   HUNDRED_BLOCK  474552 non-null  object        
 2   NEIGHBOURHOOD  423074 non-null  object        
 3   X              474565 non-null  float64       
 4   Y              474565 non-null  float64       
 5   Latitude       474565 non-null  float64       
 6   Longitude      474565 non-null  float64       
 7   HOUR           425200 non-null  float64       
 8   MINUTE         425200 non-null  float64       
 9   YEAR           474565 non-null  int64         
 10  MONTH          474565 non-null  int64         
 11  DAY            474565 non-null  int64         
 12  Date           474565 non-null  datetime64[ns]
dtypes: datetime64[ns](1), float64(6), int64(3), object(3)
memory usage: 47.1+ MB


## ***Check for Unique Values across each feature.***

In [None]:
# check for the unique values across each column
fbi.nunique()

| Column | Unique values |
|---|---|
| TYPE | 9 |
| HUNDRED_BLOCK | 20566 |
| NEIGHBOURHOOD | 24 |
| X | 84225 |
| Y | 82768 |
| Latitude | 89488 |
| Longitude | 87190 |
| HOUR | 24 |
| MINUTE | 60 |
| YEAR | 13 |


## ***Check for the Duplicate Values.***

In [None]:
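# check for duplicate values in the entire dataset
# NOTE: reset_index() adds a unique index column, so no two rows can match and this count is always 0;
# fbi.duplicated().sum() compares the data columns only and gives the true duplicate count
print("Total duplicate values:", fbi.reset_index().duplicated().sum())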

Total duplicate values: 0
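
A caveat on the check above: `reset_index()` turns the row index into a regular column, and since index values are unique, every row then differs from every other. The toy example below (made-up rows, not the real data) shows the difference between the two calls.

```python
import pandas as pd

# Two identical records plus one distinct record (illustrative values)
toy = pd.DataFrame({
    "TYPE": ["Other Theft", "Other Theft", "Mischief"],
    "YEAR": [1999, 1999, 2001],
})

# The added index column makes every row unique, so nothing is flagged
with_index = toy.reset_index().duplicated().sum()

# Comparing the data columns only flags the repeated record
without_index = toy.duplicated().sum()

print(with_index, without_index)  # 0 1
```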


## ***Check for the Missing values.***

In [None]:
# check for null or missing values in our data across each feature
fbi.isnull().sum()

| Column | Missing values |
|---|---|
| TYPE | 0 |
| HUNDRED_BLOCK | 13 |
| NEIGHBOURHOOD | 51491 |
| X | 0 |
| Y | 0 |
| Latitude | 0 |
| Longitude | 0 |
| HOUR | 49365 |
| MINUTE | 49365 |
| YEAR | 0 |
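
Raw counts are easier to judge as a share of all rows. With 474565 rows in total, the 51491 missing `NEIGHBOURHOOD` values are about 10.85% and the 49365 missing `HOUR` and `MINUTE` values about 10.40%. A minimal sketch of that calculation on a made-up frame:

```python
import numpy as np
import pandas as pd

# Toy frame with a few gaps (illustrative values only)
toy = pd.DataFrame({
    "NEIGHBOURHOOD": ["Strathcona", None, "Mount Pleasant", None],
    "HOUR": [16.0, np.nan, 6.0, 11.0],
    "YEAR": [1999, 2003, 2001, 2005],
})

# The mean of a boolean mask is the fraction of True values per column
missing_share = (toy.isnull().mean() * 100).round(2)
print(missing_share.sort_values(ascending=False))
```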


# **Data Cleaning**

In [None]:
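# Exploratory lookup of NEIGHBOURHOOD by coordinates.
# A SeriesGroupBy cannot be indexed with [84049]: pandas reads it as a second
# column selection and raises the IndexError shown below.
fbi.groupby(['Latitude', 'Longitude'])['NEIGHBOURHOOD'][84049]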

IndexError: Column(s) NEIGHBOURHOOD already selected
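
The error arises because a grouped column must be aggregated before it can be looked up. The sketch below shows two lookups that do work, on made-up rows. Which of them matches the original intent is an assumption.

```python
import pandas as pd

# Toy rows with repeated coordinates (illustrative values only)
toy = pd.DataFrame({
    "Latitude": [49.269802, 49.269802, 49.262072],
    "Longitude": [-123.083763, -123.083763, -123.104923],
    "NEIGHBOURHOOD": ["Strathcona", "Strathcona", "Mount Pleasant"],
})

# One neighbourhood label per coordinate pair (first non-null value in each group)
by_coords = toy.groupby(["Latitude", "Longitude"])["NEIGHBOURHOOD"].first()
label_for_coords = by_coords.loc[(49.262072, -123.104923)]

# Looking up a single record by its index label uses .loc on the frame itself
label_for_row = toy.loc[2, "NEIGHBOURHOOD"]

print(label_for_coords, "|", label_for_row)
```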

In [None]:
# Drop the records for which no usable location or time information is available.
# A row is removed only when all of the conditions below hold together.
# .copy() returns an independent DataFrame, so later column assignments do not act on a view of `fbi`.
update_fbi = fbi.loc[~((fbi['HUNDRED_BLOCK'] == "OFFSET TO PROTECT PRIVACY") &
                     (fbi['NEIGHBOURHOOD'].isnull()) &
                     (fbi['Latitude'] == 0) &
                     (fbi['Longitude'] == 0) &
                     (fbi['HOUR'].isnull()) &
                     (fbi['MINUTE'].isnull()))].copy()

In [None]:
update_fbi.shape

(425200, 13)
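
The filter removed 474565 - 425200 = 49365 rows, which equals the number of missing `HOUR` values reported earlier. The same pattern of combining conditions into one boolean mask and counting what it removes can be sketched on made-up rows:

```python
import numpy as np
import pandas as pd

# Toy rows: two privacy-offset records and one regular record (illustrative values)
toy = pd.DataFrame({
    "HUNDRED_BLOCK": ["OFFSET TO PROTECT PRIVACY", "9XX TERMINAL AVE", "OFFSET TO PROTECT PRIVACY"],
    "Latitude": [0.0, 49.269802, 0.0],
    "HOUR": [np.nan, 16.0, np.nan],
})

# A row is dropped only if every condition is True for it
to_drop = (
    (toy["HUNDRED_BLOCK"] == "OFFSET TO PROTECT PRIVACY")
    & (toy["Latitude"] == 0)
    & toy["HOUR"].isnull()
)
kept = toy.loc[~to_drop].copy()

print(f"dropped {to_drop.sum()} of {len(toy)} rows, kept {len(kept)}")
```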

In [None]:
update_fbi.isnull().sum()

| Column | Missing values |
|---|---|
| TYPE | 0 |
| HUNDRED_BLOCK | 13 |
| NEIGHBOURHOOD | 0 |
| X | 0 |
| Y | 0 |
| Latitude | 0 |
| Longitude | 0 |
| HOUR | 0 |
| MINUTE | 0 |
| YEAR | 0 |


Next, we recover missing blocks and neighbourhoods from the latitude and longitude by reverse geocoding.

In [None]:
# Function to get the neighbourhood from latitude and longitude.

!pip install geopy
from geopy.geocoders import Nominatim
from geopy.exc import GeocoderTimedOut

# Initialize Nominatim geocoder with a custom user agent
geolocator = Nominatim(user_agent="my-geocoding-app/1.0")  # Replace with your app name and version

def get_neighborhood(latitude, longitude):
    '''Return the neighbourhood name for a coordinate pair, or None if it cannot be resolved.'''
    try:
        location = geolocator.reverse((latitude, longitude), exactly_one=True, addressdetails=True)
    except GeocoderTimedOut:
        return None  # Handle timeout errors
    if location is None:
        return None  # No address found for these coordinates
    # Extract neighbourhood information from the address details
    address = location.raw.get('address', {})
    return address.get('neighbourhood') or address.get('suburb') or address.get('city_district')




In [None]:
# Example usage
latitude = 49.281843     # Example latitude
longitude = -123.099582  # Example longitude
neighborhood = get_neighborhood(latitude, longitude)

if neighborhood:
    print(f"Neighborhood: {neighborhood}")
else:
    print("Neighborhood not found.")

Neighborhood: Downtown Eastside
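
Each reverse-geocoding call goes to a remote service, and many records in this dataset share the same coordinates, so caching repeated lookups may save time. The sketch below wraps any `(lat, lon) -> label` function in a cache. `make_cached_lookup` and `fake_lookup` are hypothetical names used for illustration, and the fake function stands in for the real geocoder so the example runs offline.

```python
from functools import lru_cache

def make_cached_lookup(lookup, precision=5):
    '''Wrap a (lat, lon) -> label function so repeated coordinates are resolved once.'''
    @lru_cache(maxsize=None)
    def _cached(lat, lon):
        return lookup(lat, lon)

    def cached_lookup(lat, lon):
        # Round so that near-identical floats share one cache entry
        return _cached(round(lat, precision), round(lon, precision))

    return cached_lookup

# Stand-in for a real geocoder; it records every call it receives
calls = []
def fake_lookup(lat, lon):
    calls.append((lat, lon))
    return "Downtown Eastside"

cached = make_cached_lookup(fake_lookup)
first = cached(49.281843, -123.099582)
second = cached(49.281843, -123.099582)
print(first, len(calls))  # the second call is served from the cache
```

In the notebook, `get_neighborhood` could be passed in place of `fake_lookup`.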


In [None]:
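# NEIGHBOURHOOD has no missing values left after the filter above, so this fill acts only as a safeguard
update_fbi['NEIGHBOURHOOD'] = update_fbi['NEIGHBOURHOOD'].fillna('Downtown Eastside')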

In [None]:
fbi_fill_block = update_fbi[update_fbi['HUNDRED_BLOCK'].isnull()]

In [None]:
# geopy and the `geolocator` object were already set up in the neighbourhood cell above

def get_block_data_from_lat_long(latitude, longitude):
    '''Return the road (or house number) for a coordinate pair, or None if it cannot be resolved.'''
    try:
        # Reverse geocode using latitude and longitude to get address details
        address_details = geolocator.reverse((latitude, longitude), exactly_one=True, addressdetails=True)
    except GeocoderTimedOut:
        return None  # Handle timeout errors
    if address_details is None:
        return None  # No address found for these coordinates
    # Extract 'road' or 'house_number' for block information
    address = address_details.raw.get('address', {})
    return address.get('road') or address.get('house_number')




In [None]:
update_fbi['HUNDRED_BLOCK'] = update_fbi.apply(lambda row: get_block_data_from_lat_long(row['Latitude'], row['Longitude'])
                                               if pd.isnull(row['HUNDRED_BLOCK']) else row['HUNDRED_BLOCK'], axis=1)
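
Only 13 rows lack a `HUNDRED_BLOCK`, so an alternative is to select those rows with a boolean mask and call the lookup for them alone. A minimal sketch on made-up rows follows. `stub_block_lookup` is a hypothetical stand-in for the geocoding function so the example runs offline.

```python
import pandas as pd

# Toy rows: one known block and one missing block (illustrative values)
toy = pd.DataFrame({
    "HUNDRED_BLOCK": ["9XX TERMINAL AVE", None],
    "Latitude": [49.269802, 49.262072],
    "Longitude": [-123.083763, -123.104923],
})

def stub_block_lookup(lat, lon):
    # Stand-in for a real reverse-geocoding call
    return "UNKNOWN BLOCK"

# Resolve only the rows whose block is missing
needs_fill = toy["HUNDRED_BLOCK"].isnull()
filled = [
    stub_block_lookup(lat, lon)
    for lat, lon in zip(toy.loc[needs_fill, "Latitude"], toy.loc[needs_fill, "Longitude"])
]
toy.loc[needs_fill, "HUNDRED_BLOCK"] = filled

print(toy["HUNDRED_BLOCK"].tolist())
```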

In [None]:
update_fbi[update_fbi['HUNDRED_BLOCK'].isnull()]

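| | TYPE | HUNDRED_BLOCK | NEIGHBOURHOOD | X | Y | Latitude | Longitude | HOUR | MINUTE | YEAR | MONTH | DAY | Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 115213 | Theft from Vehicle | | Mount Pleasant | 492366.0 | 5456595.0 | 49.262072 | -123.104923 | 6.0 | 3.0 | 2001 | 8 | 15 | 2001-08-15 |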


In [None]:
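# Row 115213 is the only record the geocoder could not resolve; fill it with a fixed placeholder value
update_fbi['HUNDRED_BLOCK'] = update_fbi['HUNDRED_BLOCK'].fillna('7V6W+R2P')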

In [None]:
update_fbi['HUNDRED_BLOCK'].isnull().sum()

np.int64(0)

In [None]:
update_fbi.isnull().sum()

| Column | Missing values |
|---|---|
| TYPE | 0 |
| HUNDRED_BLOCK | 0 |
| NEIGHBOURHOOD | 0 |
| X | 0 |
| Y | 0 |
| Latitude | 0 |
| Longitude | 0 |
| HOUR | 0 |
| MINUTE | 0 |
| YEAR | 0 |
