In [1]:
from sqlalchemy import create_engine
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

In [2]:
database_name = 'scooters'    # Fill this in with your scooter database name

connection_string = f"postgresql://postgres:postgres@localhost:5432/{database_name}"


In [3]:
engine = create_engine(connection_string)


As you know, it's important to gain an understanding of new datasets before diving headlong into analysis. Here are some suggestions for guiding the process of getting to know the data contained in these tables:
- Are there any null values in any columns in either table?
- What date range is represented in each of the date columns? Investigate any values that seem odd.
- Is time represented with AM/PM or with 24-hour values in each of the columns that include time?
- What values are there in the sumdgroup column? Are there any that are not of interest for this project?
- What are the minimum and maximum values for all the latitude and longitude columns? Do these ranges make sense, or is there anything surprising?
- What is the range of values for trip duration and trip distance? Do these values make sense? Explore values that might seem questionable.
- Check out how the values for the company name column in the scooters table compare to those of the trips table. What do you notice?
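
The null-value check in the first bullet can be run for every column of a table in one query. Below is a minimal sketch that uses an in-memory SQLite table as a stand-in for the PostgreSQL `scooters` table. The sample rows and the helper name `null_counts` are hypothetical. It relies on `COUNT(column)` skipping NULLs while `COUNT(*)` counts every row.

```python
import sqlite3

def null_counts(connection, table, columns):
    """Return {column: number of NULL values} for the given table."""
    # COUNT(*) counts all rows; COUNT(col) skips NULLs,
    # so the difference is the number of NULLs in that column.
    # Identifiers are interpolated directly, so pass only trusted names.
    select_list = ", ".join(f"COUNT(*) - COUNT({col})" for col in columns)
    row = connection.execute(f"SELECT {select_list} FROM {table}").fetchone()
    return dict(zip(columns, row))

# Hypothetical sample data standing in for the real table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scooters (sumdid TEXT, chargelevel REAL, companyname TEXT)")
conn.executemany(
    "INSERT INTO scooters VALUES (?, ?, ?)",
    [("A1", 90.0, "Bird"), ("A2", None, "Bird"), ("B1", None, "Lime")],
)

counts = null_counts(conn, "scooters", ["sumdid", "chargelevel", "companyname"])
print(counts)
```

The same `COUNT(*) - COUNT(column)` expression is standard SQL, so it should also work when sent to PostgreSQL through `pd.read_sql`.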


In [5]:
query = '''
SELECT COUNT(*)
FROM scooters
WHERE chargelevel IS NULL
'''

result = engine.execute(query)

In [5]:
result.fetchall()

[(770,)]

scooters: chargelevel has 770 nulls.

In [10]:
query = '''
SELECT MAX(tripduration),MIN(tripduration ) 
FROM trips
'''

result = engine.execute(query)

In [11]:
result.fetchall()

[(Decimal('512619.0'), Decimal('-19.3582666667'))]

In [12]:
query = '''
SELECT MAX(tripdistance),MIN(tripdistance) 
FROM trips
'''

result = engine.execute(query)

In [13]:
result.fetchall()

[(Decimal('31884482.6476'), Decimal('-20324803.8'))]

In [32]:
query = '''
SELECT pubdatetime
FROM scooters
LIMIT 2
'''

result = engine.execute(query)

In [33]:
# Only the last expression in a cell is displayed, so read the rows straight into a DataFrame.
pd.read_sql(query, con = engine)

Unnamed: 0,pubdatetime
0,2019-05-01 00:01:41.247
1,2019-05-01 00:01:41.247


1. During this period, seven companies offered scooters. How many scooters did each company have in this time frame? Did the number for each company change over time? Did scooter usage vary by company?


In [4]:
scooter_count = '''
SELECT companyname, COUNT(DISTINCT sumdid)
FROM scooters
GROUP BY companyname;
'''

In [5]:
pd.read_sql(scooter_count, con = engine)

Unnamed: 0,companyname,count
0,Bird,3860
1,Bolt,360
2,Gotcha,224
3,Jump,1210
4,Lime,1824
5,Lyft,1735
6,Spin,805


In [10]:
scooter_count = '''
SELECT companyname, COUNT(DISTINCT sumdid), EXTRACT(MONTH FROM pubdatetime) AS month
FROM scooters
GROUP BY companyname, month;
'''


In [11]:
pd.read_sql(scooter_count, con = engine)


Unnamed: 0,companyname,count,month
0,Bird,2583,7.0
1,Bird,2910,6.0
2,Bird,3064,5.0
3,Bolt,276,7.0
4,Bolt,333,6.0
5,Bolt,346,5.0
6,Gotcha,223,5.0
7,Gotcha,223,6.0
8,Gotcha,224,7.0
9,Jump,976,5.0


In [12]:
# Total trip duration per company, as a rough measure of usage
company_trip_duration = '''
SELECT companyname, SUM(tripduration)
FROM trips
GROUP BY companyname;
'''

In [13]:
pd.read_sql(company_trip_duration, con = engine)

Unnamed: 0,companyname,sum
0,Bird,2046202.0
1,Bolt Mobility,30821500.0
2,Gotcha,33802.78
3,JUMP,211001.3
4,Lime,3507335.0
5,Lyft,1936370.0
6,SPIN,900575.0
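
The company names returned from `trips` do not all match those returned from `scooters` (`Bolt Mobility` vs. `Bolt`, `JUMP` vs. `Jump`, `SPIN` vs. `Spin`), so any per-company comparison across the two tables needs a common spelling first. A minimal sketch of one way to do that, using only the names seen in the outputs above; the mapping and the helper name `normalize_company` are illustrative choices, not part of the dataset.

```python
# Map the spellings seen in the trips output to those seen in the scooters output.
TRIPS_TO_SCOOTERS_NAME = {
    "Bolt Mobility": "Bolt",
    "JUMP": "Jump",
    "SPIN": "Spin",
}

def normalize_company(name):
    """Return the scooters-table spelling for a trips-table company name."""
    # Names that already match are returned unchanged.
    return TRIPS_TO_SCOOTERS_NAME.get(name, name)

trips_names = ["Bird", "Bolt Mobility", "Gotcha", "JUMP", "Lime", "Lyft", "SPIN"]
scooters_names = ["Bird", "Bolt", "Gotcha", "Jump", "Lime", "Lyft", "Spin"]

normalized = [normalize_company(name) for name in trips_names]
print(normalized == scooters_names)
```

With a DataFrame, the same dictionary could be passed to `Series.replace` on the `companyname` column.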


2. According to Second Substitute Bill BL2018-1202 (as amended) (https://web.archive.org/web/20181019234657/https://www.nashville.gov/Metro-Clerk/Legislative/Ordinances/Details/7d2cf076-b12c-4645-a118-b530577c5ee8/2015-2019/BL2018-1202.aspx), all permitted operators will first clean data before providing or reporting data to Metro. Data processing and cleaning shall include:  
* Removal of staff servicing and test trips  
* Removal of trips below one minute  
* Trip lengths are capped at 24 hours  
Are the scooter companies in compliance with the second and third parts of this rule?
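
One way to check the second and third rules is to count, per company, the trips shorter than one minute and the trips longer than 24 hours. A minimal sketch on hypothetical sample rows, assuming `tripduration` is recorded in minutes (the unit is an assumption and should be confirmed against the data); the function name `count_violations` is illustrative.

```python
from collections import Counter

MIN_MINUTES = 1        # trips below one minute should have been removed
MAX_MINUTES = 24 * 60  # trip lengths are capped at 24 hours

def count_violations(trips):
    """trips: iterable of (companyname, tripduration_in_minutes) pairs.

    Returns two Counters keyed by company: trips that are too short
    and trips that are too long.
    """
    too_short, too_long = Counter(), Counter()
    for company, minutes in trips:
        if minutes < MIN_MINUTES:
            # Negative durations also land here.
            too_short[company] += 1
        elif minutes > MAX_MINUTES:
            too_long[company] += 1
    return too_short, too_long

# Hypothetical sample rows
sample_trips = [
    ("Bird", 0.5),
    ("Bird", 12.0),
    ("Lime", 1500.0),
    ("Lime", -19.36),
    ("Lyft", 1440.0),  # exactly 24 hours, within the cap
]

too_short, too_long = count_violations(sample_trips)
print(dict(too_short), dict(too_long))
```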


3. The goal of Metro Nashville is to have each scooter used a minimum of 3 times per day. Based on the data, what is the average number of trips per scooter per day? Make sure to consider the days that a scooter was available. How does this vary by company?
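
One possible definition of "trips per scooter per day" is total trips divided by total scooter-days, where a scooter-day is a distinct (scooter, date) pair on which the scooter was available. The sketch below follows that definition on hypothetical rows; the function name and the row layout are assumptions, and other definitions (for example, averaging per-scooter rates) would give different numbers.

```python
from collections import defaultdict
from datetime import date

def trips_per_scooter_day(availability, trips):
    """
    availability: iterable of (companyname, sumdid, day), possibly many rows per scooter per day.
    trips: iterable of (companyname, sumdid, day), one row per trip.
    Returns {companyname: total trips / distinct scooter-days available}.
    """
    scooter_days = defaultdict(set)
    for company, sumdid, day in availability:
        # A set removes repeated sightings of the same scooter on the same day.
        scooter_days[company].add((sumdid, day))

    trip_counts = defaultdict(int)
    for company, _sumdid, _day in trips:
        trip_counts[company] += 1

    return {
        company: trip_counts[company] / len(days)
        for company, days in scooter_days.items()
    }

d1, d2 = date(2019, 5, 1), date(2019, 5, 2)
# Hypothetical sample rows
availability = [
    ("Bird", "A", d1), ("Bird", "A", d1), ("Bird", "A", d2), ("Bird", "B", d1),
    ("Lime", "C", d1), ("Lime", "C", d2),
]
trips = [("Bird", "A", d1)] * 4 + [("Bird", "B", d1)] * 2 + [("Lime", "C", d2)]

rates = trips_per_scooter_day(availability, trips)
print(rates)
```

Company names must be spelled the same way in both inputs for the division to line up.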

4. What is the highest count of scooters being used at the same time? When did it occur? Does this vary by zip code or other geographic region?
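
Peak simultaneous usage can be found with a sweep over trip start and end events: add 1 at each start, subtract 1 at each end, and track the running maximum. A minimal sketch, assuming each trip has a start and an end timestamp; the function name `peak_concurrent` and the sample trips are hypothetical.

```python
from datetime import datetime

def peak_concurrent(trips):
    """trips: iterable of (start, end) datetimes.

    Returns (peak_count, time_at_which_the_peak_was_first_reached).
    """
    events = []
    for start, end in trips:
        events.append((start, 1))   # a scooter goes into use
        events.append((end, -1))    # a scooter is released
    # Sort by time; at equal times, ends (-1) sort before starts (+1),
    # so a trip ending exactly when another begins is not counted as overlapping.
    events.sort(key=lambda event: (event[0], event[1]))

    current = peak = 0
    peak_time = None
    for when, delta in events:
        current += delta
        if current > peak:
            peak, peak_time = current, when
    return peak, peak_time

def t(hour, minute):
    return datetime(2019, 5, 1, hour, minute)

# Hypothetical sample trips
sample = [(t(9, 0), t(9, 30)), (t(9, 10), t(9, 40)), (t(9, 20), t(9, 25)), (t(9, 30), t(10, 0))]
peak, peak_time = peak_concurrent(sample)
print(peak, peak_time)
```

To compare regions, the same function could be applied separately to the trips of each zip code or other grouping.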

5. SUMDs can provide alternative transportation and provide "last mile" access to public transit. How often are trips starting near public transit hubs? You can download a dataset of bus stop locations from https://data.nashville.gov/Transportation/Regional-Transportation-Authority-Bus-Stops/p886-fnbd.
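
"Near" needs a distance threshold. The sketch below uses the haversine great-circle distance and counts the share of trip start points within a given radius of any bus stop. The 100 m radius, the function names, and the coordinates are hypothetical choices for illustration, not values from the dataset.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def share_near_stop(trip_starts, stops, radius_m=100):
    """Fraction of trip start points within radius_m of at least one stop."""
    if not trip_starts:
        return 0.0
    near = sum(
        1
        for lat, lon in trip_starts
        if any(haversine_m(lat, lon, s_lat, s_lon) <= radius_m for s_lat, s_lon in stops)
    )
    return near / len(trip_starts)

# Hypothetical coordinates
stops = [(36.1627, -86.7816)]
trip_starts = [
    (36.1627, -86.7816),  # at the stop
    (36.1630, -86.7816),  # roughly 33 m north
    (36.1727, -86.7816),  # roughly 1.1 km north
    (36.2000, -86.8000),  # several km away
]

share = share_near_stop(trip_starts, stops)
print(share)
```

This compares every trip with every stop, which is fine for a sample but slow for the full tables; a spatial index or a bounding-box prefilter would be a reasonable next step.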