## 4. What is the highest count of scooters being used at the same time? When did it occur? Does this vary by zip code or other geographic region?

#### Note:
#### Finding the exact moment when the most scooters were in use was very computationally expensive (150,000,000,000 loops if checking every 5 minutes of the three months), though it can be done given enough time. Because of this, I decided to look instead at which days had the most scooters started and which days had the highest sum of tripduration.
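
A cheaper way to get the exact peak, offered only as a sketch and not used anywhere in this notebook, is a sweep over sorted start and end events instead of polling every 5 minutes. The function name `peak_concurrent_trips` and the sample trips are hypothetical. It counts overlapping trips, which equals scooters in use only if no scooter appears in two trips at once.

```python
from datetime import datetime


def peak_concurrent_trips(intervals):
    """Return (peak_count, moment) for an iterable of (start, end) pairs."""
    events = []
    for start, end in intervals:
        events.append((start, 1))    # a trip begins: one more scooter in use
        events.append((end, -1))     # a trip ends: one fewer scooter in use
    # Sort by time. At equal times the -1 events sort first, so a trip that
    # ends exactly when another starts is not counted as overlapping.
    events.sort()

    in_use = 0
    peak, peak_time = 0, None
    for moment, delta in events:
        in_use += delta
        if in_use > peak:
            peak, peak_time = in_use, moment
    return peak, peak_time


# Made-up trips, for illustration only
sample = [
    (datetime(2019, 5, 16, 23, 0), datetime(2019, 5, 16, 23, 20)),
    (datetime(2019, 5, 16, 23, 5), datetime(2019, 5, 16, 23, 8)),
    (datetime(2019, 5, 16, 23, 10), datetime(2019, 5, 16, 23, 30)),
]
peak, when = peak_concurrent_trips(sample)
```

Sorting dominates the cost, so the work grows as n log n in the number of trips rather than as trips times time slots. With the `start_datetime` and `end_datetime` columns this notebook builds, the call might look like `peak_concurrent_trips(zip(trips['start_datetime'], trips['end_datetime']))`, assuming neither column contains missing values.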

## How the original question was answered:
### What is the highest count of scooters started on the same month and day within the same hour? When was this? Does this vary by zip code or other geographic region?

In [None]:
from sqlalchemy import create_engine, text
import pandas as pd
import matplotlib.pyplot as plt

#### Setting up engine to use SQL

In [None]:
database_name = 'scooters'

connection_string = f"postgresql://postgres:postgres@localhost:5432/{database_name}"

In [None]:
engine = create_engine(connection_string)

#### Bringing in the essential columns and filtering out non-compliant entries

In [None]:
# SQL query filters on compliance rules 2 and 3 (see question 2).

trips_query = '''
SELECT sumdid, startdate, starttime, enddate, endtime, tripduration, tripdistance, startlatitude,
        startlongitude, endlatitude, endlongitude
FROM trips
WHERE tripduration >= 1.0
	AND tripduration <= (24 * 60);
'''

with engine.connect() as connection:    
    trips = pd.read_sql(text(trips_query), con = connection)
    
drop_entries = trips[(trips['tripdistance'] <= 0) & (trips['tripduration'] >= 5)].index
trips.drop(drop_entries, inplace = True)
trips.head(10)

In [None]:
trips.info()

In [None]:
trips['startdate'] = pd.to_datetime(trips['startdate'])
trips['enddate'] = pd.to_datetime(trips['enddate'])
trips.info()

### Creating start_datetime and end_datetime columns

In [None]:
def fulltimejunc(date, time):
    # given a startdate timestamp with year, month and day and a starttime with hour, 
    # minute, and second: combine the two into one datetime value.
    return pd.Timestamp(year = date.year,
                        month = date.month,
                        day = date.day,
                        hour = time.hour,
                        minute = time.minute,
                        second = time.second,
                        microsecond = time.microsecond)

In [None]:
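# Build each column in a single pass; this avoids the very slow row-by-row
# iterrows()/.loc assignment and yields proper datetime64 columns.
trips['start_datetime'] = [fulltimejunc(d, t) for d, t in zip(trips['startdate'], trips['starttime'])]
trips['end_datetime'] = [fulltimejunc(d, t) for d, t in zip(trips['enddate'], trips['endtime'])]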

trips

In [None]:
trips.sort_values(by=['start_datetime'])

In [None]:
trips['month'] = trips['start_datetime'].dt.month
trips['day'] = trips['start_datetime'].dt.day
trips['hour'] = trips['start_datetime'].dt.hour
trips

### Grouping by month, day, and hour, then counting the scooters started in each group. (All years are 2019)

In [None]:
# can change the grouping to month (M), day (D), or hour (h) via "freq=<dateformat>"
scooter_usage = pd.DataFrame(trips.groupby(pd.Grouper(key='start_datetime', freq='h'))['sumdid'].count())
scooter_usage.columns = ['scooters_started']
scooter_usage.sort_values('scooters_started', ascending = False)

### The datetime with the most scooters started was month: 5, day: 16, hour: 23 (2755 scooters started)
#### - This is almost twice as many scooters as were started in the second-busiest hour
##### - Nashville hosted 'The Who' at the Bridgestone Arena on this day
### The resulting dataframe will be good to export for dashboarding

In [None]:
# Top 10 datetimes with the most scooters started
scooter_usage.sort_values('scooters_started', ascending = False).head(10)
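
The "almost twice as many" comparison can be checked directly instead of by reading the sorted table. The sketch below uses made-up counts in a hypothetical `counts` Series; in this notebook the same two lines would be applied to `scooter_usage['scooters_started']`.

```python
import pandas as pd

# Made-up hourly counts, for illustration only
counts = pd.Series(
    [100, 55, 40],
    index=pd.to_datetime(['2019-05-16 23:00', '2019-06-01 22:00', '2019-07-04 21:00']),
)

top_two = counts.nlargest(2)                  # busiest hour, then the runner-up
ratio = top_two.iloc[0] / top_two.iloc[1]     # how many times busier the peak is
```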

# Starting analysis by zipcode

In [None]:
trips

In [None]:
from shapely.geometry import Point
import geopandas as gpd
import folium
from folium.plugins import MarkerCluster
from folium.plugins import FastMarkerCluster

#### creating point geometry for start and end positions

In [None]:
trips['start_geometry'] = trips.apply(lambda x: Point((float(x.startlongitude), 
                                                         float(x.startlatitude))), 
                                        axis=1)

In [None]:
trips['end_geometry'] = trips.apply(lambda x: Point((float(x.endlongitude), 
                                                         float(x.endlatitude))), 
                                        axis=1)

#### pulling in zipcodes data

In [None]:
zipcodes = gpd.read_file('../../data/zipcodes.geojson')
print(zipcodes.crs)
zipcodes.head()

#### making sure 'crs' is the same for zipcodes and trips

In [None]:
trips_gdf = gpd.GeoDataFrame(trips, 
                           crs = zipcodes.crs, 
                           geometry = trips['start_geometry'])

#### joining zipcodes and trips

In [None]:
# 'predicate' replaces the deprecated 'op' keyword (geopandas 0.10 and later)
trips_by_zip = gpd.sjoin(trips_gdf, zipcodes, predicate = 'within')
trips_by_zip

In [None]:
df_usage = pd.DataFrame(trips.groupby(pd.Grouper(key='start_datetime', freq='h'))['sumdid'].count())
scooter_usage_days = df_usage.reset_index()
scooter_usage_days.columns = ['start_datetime', 'scooters_started']
scooter_usage_days

In [None]:
scooter_usage_zipcodes = (pd.DataFrame(trips_by_zip.groupby(by = ['zip', pd.Grouper(key='start_datetime', freq='h')])['sumdid'].count())).reset_index()
scooter_usage_zipcodes.columns = ['zip', 'start_datetime', 'scooters_started']
scooter_usage_zipcodes

In [None]:
maxstarted_by_zip = pd.DataFrame(scooter_usage_zipcodes.groupby(by = ['zip'])['scooters_started'].max())

In [None]:
maxstarted_by_zip

In [None]:
max_scootersstarted_zipcodes = pd.merge(maxstarted_by_zip, scooter_usage_zipcodes, 
                               left_on = ['zip', 'scooters_started'], right_on = ['zip','scooters_started'], 
                               how = 'inner')

max_scootersstarted_zipcodes.sort_values('scooters_started', ascending = False) 

#### some zipcodes show up twice due to having the same maximum on different dates  

In [None]:
max_scootersstarted_zipcodes = max_scootersstarted_zipcodes.drop_duplicates(subset = 'zip')
max_scootersstarted_zipcodes
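
The merge followed by `drop_duplicates` can also be written as a single step with `groupby(...).idxmax()`, which returns the row label of the first maximum in each group. This is a sketch on a made-up frame named `usage`, standing in for `scooter_usage_zipcodes`; the zip codes and counts are hypothetical.

```python
import pandas as pd

# Made-up stand-in for scooter_usage_zipcodes
usage = pd.DataFrame({
    'zip': ['37203', '37203', '37201', '37201'],
    'start_datetime': pd.to_datetime(['2019-05-16 23:00', '2019-06-01 22:00',
                                      '2019-05-17 00:00', '2019-07-04 21:00']),
    'scooters_started': [40, 40, 25, 30],
})

# One row per zip: the first hour in which that zip reached its maximum
peak_rows = usage.loc[usage.groupby('zip')['scooters_started'].idxmax()]
```

As with `drop_duplicates(subset = 'zip')`, a zip code that reaches the same maximum on several dates keeps only its first occurrence, so later ties are silently dropped either way.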

In [None]:
max_scootersstarted_zipcodes['hour'] = max_scootersstarted_zipcodes['start_datetime'].dt.hour

#### filtering to at least 20 scooters used within the hour

In [None]:
filtered_max_use = max_scootersstarted_zipcodes[max_scootersstarted_zipcodes['scooters_started'] >= 20]

In [None]:
# I found this data a bit difficult to plot because .dt.hour runs from 0 to 23:
# midnight (0) lands at the very bottom of the graph and 11 pm (23) at the very top,
# even though the two hours are adjacent in time
filtered_max_use[['zip','hour']].plot.scatter(x = 'zip', y = 'hour')
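
One way to keep the late-night hours together on the y axis is to measure each hour from noon instead of from midnight. The helper `hours_since_noon` below is a hypothetical name, and the sample hours are made up; this is a sketch of the idea, not something the notebook does.

```python
import pandas as pd


def hours_since_noon(hour):
    """Shift clock hours (0-23) so noon maps to 0 and 11 am maps to 23."""
    return (hour - 12) % 24


# Made-up peak hours, for illustration only
hours = pd.Series([22, 23, 0, 1, 3, 17])
shifted = hours_since_noon(hours)   # 10 pm..3 am become the contiguous range 10..15
```

Plotting the shifted values keeps 10 pm through 3 am in one band, with 5 pm just below it. The axis ticks can then be relabelled with the original clock hour, `(tick + 12) % 24`, so the chart still reads in familiar times.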

## On average, the most scooters are in use from 10 pm to 3 am.
## A few zipcodes have their maximum usage occurring around 5 pm.