# Fayette County E-Scooters & Bicycles Data Exploration

This project explores collisions, injuries, and deaths involving bicycles and scooters since the beginning of the Fayette County E-Scooter Project in the spring of 2018. Data analysis, including charts, is used to explore the datasets provided by the Lexington Fayette County Data and Analysis Department. Note that two separate sets of data were compiled and sent by the Lexington-Fayette Police Department (LFPD). The first contains entries returned by narrative searches, with one file for 'e-scooter' searches and one for 'bicycle' searches. The second is the comprehensive list of bicycle- and e-scooter-related police reports concerning collisions and injuries. 

The LFPD attached two caveats to this data. First, it was not possible to distinguish "motor-scooters" from "e-scooters". Injuries related to e-scooters are therefore grouped with general motor scooters, so it is difficult to tell whether a given record involves a commercial brand like Lime/Bird or a private motor scooter. Second, the lat/lng points of collisions have been censored and moved to the nearest intersection, which protects the privacy and identities of the involved parties. 

In [1]:
import re
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
# set pandas display options so all rows, columns, and cell contents are shown
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

In [21]:
# load data from csv files provided by Lexington-Fayette Police Department 

# first, the narrative-search collision data for 'bicycle' and 'scooter'
# NOTE: the bicycle filename is assumed to mirror the scooter one; confirm it against the files on disk
bicycle_collisions_narrative = pd.read_csv('../data/csv/LFPD_data/bicycle_narrative_search.csv')
scooter_collisions_narrative = pd.read_csv('../data/csv/LFPD_data/scooter_narrative_search.csv')



In [23]:
# next, the full bicycle and scooter collision data
# both files are read from the LFPD data directory; the bare filename alone raised FileNotFoundError
# (the 'bicyles' spelling is kept as-is; confirm it matches the file on disk)
bicycles_all_collisions = pd.read_csv('../data/csv/LFPD_data/all_collision_bicyles.csv')
scooters_all_collisions = pd.read_csv('../data/csv/LFPD_data/all_collision_scooters.csv')

In [None]:
# Create working copies (pd.read_csv already returns DataFrames)
# for scooters
scooter_search = scooter_collisions_narrative.copy()
all_scooter = scooters_all_collisions.copy()

# for bicycles
bicycle_search = bicycle_collisions_narrative.copy()
all_bicycle = bicycles_all_collisions.copy()

## Data Cleanup Pt. 1
### Initial Exploration and DataFrame Manipulation Phase

### All Scooters Data

Let's begin by examining the entire scooters database.

In [None]:
# print the df
all_scooter

In [None]:
# now let's look at the scooter info, including dtypes and non-null counts
all_scooter.info()

In [None]:
# rename columns to be more friendly (the raw string keeps the backslash in 'RD\COND' literal)
all_scooter.rename(columns={'#Units': 'Units', r'RD\COND': 'Road_cond', 'H&R': 'HitAndRun', 'DOW': 'DoW',
                            '#KILL': 'Killed', '#INJURED': 'Injured', '#VEH#': 'Number of Vehicles Involved',
                            'DIRECTIONAL ANALYSIS': 'Direction'}, inplace=True)

In [None]:
# what are the counts of different fields?
all_scooter.HitAndRun.value_counts()

#### Share of hit-and-runs among motorized scooter collisions in Fayette County

**True (25) / total (25 + 69 = 94) = 0.266**

Wow! ~27% of scooter collisions are hit-and-runs. 
That seems a little high, so we should compare this number to national averages later in the project. 
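
The share of hit-and-runs among all collisions can also be read off directly with `value_counts(normalize=True)`. The sketch below is only an illustration: `hit_and_run` is a hypothetical stand-in Series rebuilt from the counts reported above (25 True, 69 False), not the real `all_scooter.HitAndRun` column.

```python
import pandas as pd

# Hypothetical stand-in for all_scooter.HitAndRun, built from the reported counts
hit_and_run = pd.Series([True] * 25 + [False] * 69, name="HitAndRun")

# normalize=True divides each count by the total number of rows (94 here)
shares = hit_and_run.value_counts(normalize=True)
print(shares.round(3))

# The mean of a boolean column is the fraction of True values
hit_and_run_share = hit_and_run.mean()
print(round(hit_and_run_share, 3))
```

Dividing by the total rather than by the `False` count keeps the result interpretable as "the fraction of collisions that were hit-and-runs".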

Next, let's take a look at the weather conditions involved in scooter collisions here in Lexington. Do wet conditions have any effect on the number of collisions?




In [None]:
# what are the different values/counts for road conditions with scooters?
all_scooter.Road_cond.value_counts()


Interesting. Most scooter collisions take place during dry road conditions. One reading is that people rarely ride scooters in Lexington during rainstorms. :D Another is that wet road conditions have little effect on the number of incidents. The counts alone cannot separate the two, since we do not know how many rides happen under each condition. And why is there one 'other' condition? It's likely debris in the road or some other outlier that would affect road conditions. 

Next, I'm going to check against the weather field value counts and see how they correspond with the road conditions.

In [None]:
# Let's go over the weather data field next and see whether weather lines up with wet road conditions
all_scooter.WEATHER.value_counts()

Looks like the weather entries correspond roughly to the road conditions. There are 8 wet-road collisions, which corresponds to the roughly 6 wet-weather entries described in the police reports. From these counts, weather and road conditions do not appear to be a leading factor in the frequency of Lexington's motorized scooter incidents, though counts alone cannot establish causation.
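
One way to compare the two fields row by row, rather than comparing two separate `value_counts()` outputs, is a cross-tabulation. This is a minimal sketch on made-up rows: the column names `WEATHER` and `Road_cond` follow the DataFrame above, but the category labels (`CLEAR`, `RAINING`, `CLOUDY`, `DRY`, `WET`) and the `sample` data are assumptions for illustration, not values taken from the LFPD file.

```python
import pandas as pd

# Hypothetical rows; the real values would come from all_scooter.WEATHER and all_scooter.Road_cond
sample = pd.DataFrame({
    "WEATHER": ["CLEAR", "RAINING", "CLEAR", "CLOUDY", "RAINING", "CLEAR"],
    "Road_cond": ["DRY", "WET", "DRY", "WET", "WET", "DRY"],
})

# Rows are weather categories, columns are road conditions, cells are collision counts
weather_vs_road = pd.crosstab(sample["WEATHER"], sample["Road_cond"])
print(weather_vs_road)
```

On the real data, the same call would show how many wet-road collisions were recorded under each weather category.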

But right now, I am more concerned with the injury/death data and would like to move forward with that. Before we finish the analysis, let's remember to convert the csv to a GeoDataFrame with lat/long values as the geometries. 

Next, let's take a look at injury counts.

In [None]:
# Check the values for number of injuries  
all_scooter.Injured.values

In [None]:
# What are the counts of these numbers?
all_scooter.Injured.value_counts()

Looks like there are about ten more single-injury collisions than non-injury collisions. About 7% of all collisions with motor scooters result in 2 injuries. 

In [None]:
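plt.style.use('_mpl-gallery')

# sum injuries per day so each bar is a total rather than overlapping per-row bars
injuries_by_day = all_scooter.groupby('DoW')['Injured'].sum()

fig, ax = plt.subplots(figsize=(8, 2), layout='constrained')
ax.bar(injuries_by_day.index.astype(str), injuries_by_day.values, edgecolor='white', linewidth=0.7)
ax.set_xlabel('Day of the Week')
ax.set_ylabel('Injured')
ax.set_title('# of Injuries by day of the week')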

In [None]:
all_scooter.Killed.value_counts()

In [None]:
# Looks like only three scooter collisions have resulted in death since 2018.

# Only ~3% of collisions result in death here in Lexington

3 / 94

In [None]:
# Next, let's check the roadways and see which ones are the most common 
all_scooter.ROADWAY.values

In [None]:
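# map each row's roadway to its collision count (assigning value_counts() directly would misalign on the index and give NaN)
all_scooter['road_counts'] = all_scooter.ROADWAY.map(all_scooter.ROADWAY.value_counts())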

In [None]:
fig, ax = plt.subplots(figsize=(15, 5))
# draw on the axes created above so the labels below apply to this chart
all_scooter.ROADWAY.value_counts().plot.bar(ax=ax)
ax.set_xlabel('Street')
ax.set_ylabel('Collisions')
ax.set_title('# of Motorized Scooter Collisions by Roadway in Lexington, KY')

The highest numbers of collisions take place on Tates Creek, Broadway, Versailles Road, New Circle, and Winchester Road, in that order. We can come back to this later with a map to interrogate the locations where collisions take place. Let's move on and look at the months of the year in which scooter injuries take place. Our hypothesis is that scooter injuries take place more often during the school season, and that UK likely has an effect on the frequency of injuries. Of course, this is just a hypothesis for now, but we should keep it in mind going forward when studying the age range for collisions involving students and other young victims. 
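
The month-of-year hypothesis above could be checked by parsing the date strings and counting collisions per month. The sketch below runs on made-up dates: `sample` is hypothetical, and it assumes `DATE` holds month/day/year strings with four-digit years, which has not been confirmed against the LFPD file.

```python
import pandas as pd

# Hypothetical DATE strings; the real column would be all_scooter.DATE
sample = pd.DataFrame(
    {"DATE": ["9/14/2019", "10/2/2019", "9/30/2020", "1/5/2021", "not a date"]}
)

# errors="coerce" turns unparseable entries into NaT instead of raising
parsed = pd.to_datetime(sample["DATE"], format="%m/%d/%Y", errors="coerce")

# Drop unparseable rows, then count collisions per calendar month (1 = January ... 12 = December)
collisions_by_month = parsed.dropna().dt.month.value_counts().sort_index()
print(collisions_by_month)
```

A peak in the fall and spring semesters would be consistent with the hypothesis, though it would not by itself show that students are the ones involved.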

In [None]:
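# let's take a look at the DATE field
# pattern for dates such as 9/14/2019 or 09/14/19
date_pattern = r"\d{1,2}/\d{1,2}/\d{2,4}"

# re.findall expects a single string, so apply the pattern to each entry with the .str accessor
matches = all_scooter.DATE.astype(str).str.findall(date_pattern)
print(matches)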

    

## Scooter Narrative Search Database

In [None]:
# use the first data row (index 0) as the column header
scooter_collisions_narrative.columns = scooter_collisions_narrative.iloc[0]

# remove the first row that has the old column names 
scooter_collisions_narrative = scooter_collisions_narrative[1:]

# print the df
scooter_collisions_narrative

In [None]:
# now let's look at the narrative-search info, including dtypes and non-null counts
scooter_collisions_narrative.info()

# check the dtypes afterwards
scooter_collisions_narrative.dtypes
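
If the real column names sit on the second line of the CSV, as the header promotion above suggests, an alternative is to let `pd.read_csv` pick that line up directly with `header=1`. This is a sketch under that assumption: the in-memory file contents and the `narrative` name are hypothetical, not the LFPD export.

```python
import io
import pandas as pd

# Hypothetical file contents: a title line, then the real header, then data
raw_csv = io.StringIO(
    "Scooter narrative search export,,\n"
    "DATE,ROADWAY,INJURED\n"
    "9/14/2019,MAIN ST,1\n"
    "10/2/2019,BROADWAY,0\n"
)

# header=1 uses the second line (0-indexed row 1) as column names and skips the line above it
narrative = pd.read_csv(raw_csv, header=1)
print(narrative.dtypes)
```

Reading the header this way also lets pandas infer numeric dtypes, whereas promoting a data row to the header afterwards typically leaves the columns as `object`.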

### All Bicycles Database


In [None]:
# use the first data row (index 0) as the column header
all_bicycle.columns = all_bicycle.iloc[0]

# remove the first row that has the old column names 
all_bicycle = all_bicycle[1:]
# print the df
all_bicycle

## Bicycle Narrative Search Database 

In [None]:
bicycle_collisions_narrative