In this notebook, you'll see how to connect to a PostgreSQL database using the SQLAlchemy library.

For this notebook, you'll need both the `sqlalchemy` and `psycopg2` libraries installed.

In [56]:
from sqlalchemy import create_engine
import pandas as pd
import numpy as np
import matplotlib
import datetime
import seaborn as sns

First, we need to create a connection string. The format is

 ```<dialect(+driver)>://<username>:<password>@<hostname>:<port>/<database>```


In [2]:
database_name = 'scooters'    # Fill this in with your scooter database name

connection_string = f"postgresql://postgres:postgres@localhost:5432/{database_name}"
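If the password (or username) contains characters such as `@` or `/`, they would collide with the delimiters in the format above. A minimal sketch of one way to guard against that, using the standard library's `quote_plus`; the helper name `build_connection_string` and the sample password are made up for illustration:

```python
from urllib.parse import quote_plus

def build_connection_string(user, password, host, port, database, dialect="postgresql"):
    """Assemble <dialect>://<username>:<password>@<hostname>:<port>/<database>."""
    # Percent-encode the credentials so '@', '/' and similar characters
    # cannot be mistaken for URL delimiters.
    return f"{dialect}://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"

# Hypothetical password, chosen only to show the escaping
example = build_connection_string("postgres", "p@ss/word", "localhost", 5432, "scooters")
print(example)
```

With plain alphanumeric credentials, the result is identical to the hand-written f-string.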

Now, we need to create an engine and use it to connect.

In [3]:
engine = create_engine(connection_string)

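Now we can write a query and pass it to the engine's `.execute()` method. (`engine.execute()` works in SQLAlchemy 1.x; it was removed in SQLAlchemy 2.0, where queries are run on a connection obtained from `engine.connect()`.)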

In [4]:
# Look at difference in run time for this:
query = '''
SELECT latitude, longitude
FROM scooters
LIMIT 5;
'''

result = engine.execute(query)

In [5]:
# Vs this:
query = '''
SELECT COUNT(latitude)
FROM scooters;
'''

result = engine.execute(query)

You can then fetch the results as tuples using either `fetchone` or `fetchall`:

In [6]:
result.fetchone()

(73414043,)

In [7]:
result.fetchall()

[]
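The empty list is expected: a result object is consumed as you read it, so once `fetchone` has returned the only row, `fetchall` has nothing left to return. The sketch below reproduces the same behavior with the standard library's `sqlite3` module as a stand-in for the Postgres connection; the in-memory table and its three rows are toy data.

```python
import sqlite3

# In-memory SQLite database standing in for the scooters database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scooters (latitude REAL)")
conn.executemany("INSERT INTO scooters VALUES (?)", [(36.1,), (36.2,), (36.3,)])

result = conn.execute("SELECT COUNT(latitude) FROM scooters")
first = result.fetchone()   # reads the single row of the result
rest = result.fetchall()    # the result is already exhausted

print(first)
print(rest)
```

To read the rows again, re-run the query or store the fetched rows in a variable.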

pandas can also run the query directly through the SQLAlchemy engine and return the result as a DataFrame.

In [8]:
import pandas as pd

In [9]:
lat = pd.read_sql(query, con = engine)
lat.head()

Unnamed: 0,count
0,73414043


For much more information about SQLAlchemy and to see a more “Pythonic” way to execute queries, see Introduction to Databases in Python: https://www.datacamp.com/courses/introduction-to-relational-databases-in-python

In [10]:
query = '''
SELECT COUNT(pubdatetime)
FROM scooters
WHERE pubdatetime IS NULL
'''

In [11]:
nulls = pd.read_sql(query, con = engine)
nulls.head()

Unnamed: 0,count
0,0


In [12]:
query= '''
SELECT SUM(CASE WHEN chargelevel is null THEN 1 ELSE 0 END) 
AS charge_nulls 
FROM scooters
'''

In [13]:
query= '''
SELECT SUM(CASE WHEN latitude is null THEN 1 ELSE 0 END) 
AS latitude_nulls 
FROM scooters
'''



In [14]:
latitude_nulls = pd.read_sql(query, con = engine)
latitude_nulls.head()

Unnamed: 0,latitude_nulls
0,0


In [15]:
query= '''
SELECT SUM(CASE WHEN longitude IS NULL THEN 1 ELSE 0 END) 
AS longitude_nulls 
FROM scooters
'''

In [16]:
longitude_nulls = pd.read_sql(query, con = engine)
longitude_nulls.head()

Unnamed: 0,longitude_nulls
0,0


In [17]:
query= '''
-- Assumes tripduration is a column of the trips table rather than scooters
SELECT SUM(CASE WHEN tripduration IS NULL THEN 1 ELSE 0 END)
AS tripduration_nulls
FROM trips
'''

In [18]:
tripduration_nulls = pd.read_sql(query, con = engine)
tripduration_nulls.head()

Unnamed: 0,tripduration_nulls
0,0


- What date range is represented in each of the date columns? Investigate any values that seem odd.


- Is time represented with am/pm or using 24 hour values in each of the columns that include time?

- What values are there in the sumdgroup column? Are there any that are not of interest for this project?

In [19]:
result = engine.execute('SELECT DISTINCT sumdgroup FROM scooters').fetchall()
print('Distinct values in sumdgroup column:')
for row in result:
    print(row[0])

Distinct values in sumdgroup column:
bicycle
scooter
Scooter
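Because `scooter` and `Scooter` differ only in capitalization, grouping on the raw column would split one category in two. A small sketch of folding the case before counting, run against an in-memory SQLite table with toy rows (the `grp` and `n` aliases are illustrative); `LOWER` should behave the same way in Postgres:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scooters (sumdid TEXT, sumdgroup TEXT)")
conn.executemany(
    "INSERT INTO scooters VALUES (?, ?)",
    [("s1", "scooter"), ("s2", "Scooter"), ("s3", "Scooter"), ("b1", "bicycle")],
)

# Fold the case so 'scooter' and 'Scooter' land in the same group
rows = conn.execute("""
    SELECT LOWER(sumdgroup) AS grp, COUNT(*) AS n
    FROM scooters
    GROUP BY LOWER(sumdgroup)
    ORDER BY grp
""").fetchall()

for grp, n in rows:
    print(grp, n)
```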


- What are the minimum and maximum values for all the latitude and longitude columns? Do these ranges make sense, or is there anything surprising?
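One way to approach this is a single aggregate query per table. The sketch below runs it on an in-memory SQLite table with three made-up rows, one of which has zero coordinates to show how an implausible value would surface in the range:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scooters (latitude REAL, longitude REAL)")
conn.executemany(
    "INSERT INTO scooters VALUES (?, ?)",
    [(36.16, -86.78), (36.12, -86.80), (0.0, 0.0)],  # toy rows; the last is deliberately suspicious
)

query = """
SELECT MIN(latitude), MAX(latitude), MIN(longitude), MAX(longitude)
FROM scooters
"""
min_lat, max_lat, min_lon, max_lon = conn.execute(query).fetchone()
print(min_lat, max_lat, min_lon, max_lon)
```

The same `SELECT` could be passed to `pd.read_sql` with the engine created earlier.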

In [25]:
query= '''
-- MAX is a function and needs parentheses; assumes tripduration lives in the trips table
SELECT MAX(tripduration)
FROM trips;
'''


- What is the range of values for trip duration and trip distance? Do these values make sense? Explore values that might seem questionable.

- Check out how the values for the company name column in the scooters table compare to those of the trips table. What do you notice?

In [21]:
query= '''
(SELECT companyname, COUNT(*), 'scooters' AS type
FROM scooters
GROUP BY companyname)
UNION
(SELECT companyname, COUNT(*), 'trips' AS type
FROM trips
GROUP BY companyname)
ORDER BY companyname, type
'''

companies = pd.read_sql(query, con = engine)
print(companies)

      companyname     count      type
0            Bird  12251590  scooters
1            Bird    152745     trips
2            Bolt   3477198  scooters
3   Bolt Mobility     21890     trips
4          Gotcha   4679280  scooters
5          Gotcha      3315     trips
6            Jump  21835098  scooters
7            JUMP      6437     trips
8            Lime  16524261  scooters
9            Lime    225694     trips
10           Lyft   9087043  scooters
11           Lyft    120991     trips
12           Spin   5559573  scooters
13           SPIN     34450     trips
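The output shows that three companies are spelled differently in the two tables (`Bolt` vs `Bolt Mobility`, `Jump` vs `JUMP`, `Spin` vs `SPIN`), so a join or comparison on `companyname` would miss them. A minimal sketch of one possible cleanup; the mapping and the helper name `normalize_company` are illustrative choices, not part of the dataset:

```python
# Lower-cased variants mapped to the spelling used in the scooters table
NAME_FIXES = {"bolt mobility": "Bolt", "jump": "Jump", "spin": "Spin"}

def normalize_company(name):
    """Return a single canonical spelling for a company name."""
    cleaned = name.strip()
    return NAME_FIXES.get(cleaned.lower(), cleaned)

scooters_names = ["Bird", "Bolt", "Gotcha", "Jump", "Lime", "Lyft", "Spin"]
trips_names = ["Bird", "Bolt Mobility", "Gotcha", "JUMP", "Lime", "Lyft", "SPIN"]

canonical = sorted({normalize_company(n) for n in scooters_names + trips_names})
print(canonical)
```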


1. During this period, seven companies offered scooters. How many scooters did each company have in this time frame? Did the number for each company change over time? Did scooter usage vary by company?

In [None]:
query = '''
SELECT companyname, COUNT(DISTINCT sumdid) AS total_scooters
FROM scooters
GROUP BY companyname
'''

unique_scooters = pd.read_sql(query, con = engine)
print(unique_scooters)


In [26]:
query = '''
-- DISTINCT applies to the whole (sumdid, companyname) row, not to sumdid alone
SELECT DISTINCT sumdid,
companyname
FROM scooters;
'''
companies = pd.read_sql(query, con = engine)
print(companies)

                                             sumdid companyname
0      Powered-017d3133-f14a-2b83-ee4f-d777e7c5b619        Bolt
1      Powered-01a24436-0315-e1bb-7ce0-d081d05dff7d        Bolt
2      Powered-03be23ca-d43b-222f-be54-e44b5b4690df        Bolt
3      Powered-046201fb-6532-1f37-6334-3612fb1e61f7        Bolt
4      Powered-0479bb84-afbd-0426-f1c4-df628542a88c        Bolt
...                                             ...         ...
10013                         Standard5JXOV277MCWID        Lime
10014                         StandardNPOOZNUSGAXZN        Lime
10015                         StandardNUTLLXP4G37OI        Lime
10016                         StandardNW5HJFO4R32LY        Lime
10017                         StandardZPUQESHVPP74J        Lime

[10018 rows x 2 columns]


In [27]:
query = '''
SELECT sumdid,
companyname, MIN(pubdatetime)
FROM scooters
GROUP BY sumdid, companyname;
'''

company_scooters_time = pd.read_sql(query, con = engine)
company_scooters_time.head()

Unnamed: 0,sumdid,companyname,min
0,Powered-017d3133-f14a-2b83-ee4f-d777e7c5b619,Bolt,2019-05-24 00:04:42
1,Powered-01a24436-0315-e1bb-7ce0-d081d05dff7d,Bolt,2019-05-24 00:04:42
2,Powered-03be23ca-d43b-222f-be54-e44b5b4690df,Bolt,2019-05-24 00:04:42
3,Powered-046201fb-6532-1f37-6334-3612fb1e61f7,Bolt,2019-05-28 20:53:56
4,Powered-0479bb84-afbd-0426-f1c4-df628542a88c,Bolt,2019-05-24 00:04:42



In [39]:
company_scooters_time.groupby([company_scooters_time['companyname'], company_scooters_time['min'].dt.date]).count()

companyname,min,sumdid,min
Bird,2019-05-01,1545,1545
Bird,2019-05-02,164,164
Bird,2019-05-03,55,55
Bird,2019-05-04,29,29
Bird,2019-05-05,6,6
...,...,...,...
Spin,2019-07-14,1,1
Spin,2019-07-17,1,1
Spin,2019-07-19,1,1
Spin,2019-07-22,1,1


In [54]:
company_scooters_time['min'] = pd.to_datetime(company_scooters_time['min'])

KeyError: 'min'

In [55]:
sns.lineplot(data = scoot_df, x='date', y='cumulative_count', hue='companyname')

NameError: name 'sns' is not defined

TypeError: no numeric data to plot
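The plotting call above expects a `scoot_df` with `date` and `cumulative_count` columns, which no earlier cell builds. One way it might be derived from `company_scooters_time` is sketched below. The five rows are toy data, the `new_scooters` column name is made up, and "cumulative count" is assumed to mean the number of scooters first seen on or before each date.

```python
import pandas as pd

# Toy stand-in for company_scooters_time: one row per scooter with its first-seen timestamp
company_scooters_time = pd.DataFrame({
    "sumdid": ["a1", "a2", "a3", "b1", "b2"],
    "companyname": ["Bird", "Bird", "Bird", "Lime", "Lime"],
    "min": pd.to_datetime([
        "2019-05-01 00:04:42", "2019-05-01 09:30:00", "2019-05-03 12:00:00",
        "2019-05-02 08:00:00", "2019-05-02 18:15:00",
    ]),
})

# Count scooters first seen per company per day ...
scoot_df = (
    company_scooters_time
    .assign(date=company_scooters_time["min"].dt.normalize())
    .groupby(["companyname", "date"])["sumdid"]
    .nunique()
    .rename("new_scooters")
    .reset_index()
)
# ... then accumulate within each company
scoot_df["cumulative_count"] = scoot_df.groupby("companyname")["new_scooters"].cumsum()
print(scoot_df)

# With seaborn imported as sns, the plot would then be:
# sns.lineplot(data=scoot_df, x="date", y="cumulative_count", hue="companyname")
```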

2. According to Second Substitute Bill BL2018-1202 (as amended) (https://web.archive.org/web/20181019234657/https://www.nashville.gov/Metro-Clerk/Legislative/Ordinances/Details/7d2cf076-b12c-4645-a118-b530577c5ee8/2015-2019/BL2018-1202.aspx), all permitted operators will first clean data before providing or reporting data to Metro. Data processing and cleaning shall include:  
* Removal of staff servicing and test trips  
* Removal of trips below one minute  
* Trip lengths are capped at 24 hours  
Are the scooter companies in compliance with the second and third parts of this rule? 
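A possible starting point is to count, per company, the trips that fall outside the two limits. The sketch below assumes `tripduration` is stored in minutes in the `trips` table (check this against the data before relying on it) and uses an in-memory SQLite table with toy rows; the aliases `under_one_minute` and `over_24_hours` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (companyname TEXT, tripduration REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("Bird", 0.5), ("Bird", 12.0), ("Lime", 1500.0), ("Lime", 30.0), ("Lyft", 8.0)],
)

# Assumption: tripduration is in minutes, so 24 hours = 24 * 60
query = """
SELECT companyname,
       SUM(CASE WHEN tripduration < 1 THEN 1 ELSE 0 END) AS under_one_minute,
       SUM(CASE WHEN tripduration > 24 * 60 THEN 1 ELSE 0 END) AS over_24_hours
FROM trips
GROUP BY companyname
ORDER BY companyname
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Any nonzero count would point to a company whose data was not cleaned as the bill requires.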

3. The goal of Metro Nashville is to have each scooter used a minimum of 3 times per day. Based on the data, what is the average number of trips per scooter per day? Make sure to consider the days that a scooter was available. How does this vary by company?
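One way to frame this is trips divided by "scooter-days", where a scooter-day is a distinct (scooter, date) pair that appears in the `scooters` table. The sketch below is only an outline under several assumptions: the `trips` table is assumed to have `sumdid` and `startdate` columns, company names are assumed to already match between the two tables, and the rows are toy data in an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scooters (sumdid TEXT, companyname TEXT, pubdatetime TEXT)")
conn.execute("CREATE TABLE trips (sumdid TEXT, companyname TEXT, startdate TEXT)")
conn.executemany("INSERT INTO scooters VALUES (?, ?, ?)", [
    ("a1", "Bird", "2019-05-01 08:00:00"), ("a1", "Bird", "2019-05-01 12:00:00"),
    ("a1", "Bird", "2019-05-02 08:00:00"), ("a2", "Bird", "2019-05-01 08:00:00"),
    ("b1", "Lime", "2019-05-01 08:00:00"),
])
conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", [
    ("a1", "Bird", "2019-05-01"), ("a1", "Bird", "2019-05-01"), ("a1", "Bird", "2019-05-01"),
    ("a1", "Bird", "2019-05-02"), ("a2", "Bird", "2019-05-01"), ("a2", "Bird", "2019-05-01"),
    ("b1", "Lime", "2019-05-01"),
])

query = """
WITH available AS (
    -- one row per scooter per day it reported a location
    SELECT companyname, COUNT(*) AS scooter_days
    FROM (SELECT DISTINCT companyname, sumdid, DATE(pubdatetime) AS day FROM scooters) AS d
    GROUP BY companyname
),
taken AS (
    SELECT companyname, COUNT(*) AS trips
    FROM trips
    GROUP BY companyname
)
SELECT a.companyname,
       1.0 * COALESCE(t.trips, 0) / a.scooter_days AS trips_per_scooter_day
FROM available AS a
LEFT JOIN taken AS t ON t.companyname = a.companyname
ORDER BY a.companyname
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the real tables the `DISTINCT` subquery scans every location ping, so it may be slow.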

4. What is the highest count of scooters being used at the same time? When did it occur? Does this vary by zip code or other geographic region?
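A common technique for "how many at the same time" questions is a sweep over start and end events. The sketch below is plain Python with made-up trip times; the function name `peak_concurrency` is illustrative, and a trip ending at the exact moment another starts is treated as not overlapping.

```python
from datetime import datetime

def peak_concurrency(trips):
    """trips: iterable of (start, end) datetimes. Returns (max_in_use, time_of_peak)."""
    events = []
    for start, end in trips:
        events.append((start, 1))    # a scooter goes into use
        events.append((end, -1))     # a scooter is released
    # Sort by time; at equal times the -1 events sort before the +1 events
    events.sort(key=lambda e: (e[0], e[1]))

    in_use = peak = 0
    peak_time = None
    for when, delta in events:
        in_use += delta
        if in_use > peak:
            peak, peak_time = in_use, when
    return peak, peak_time

# Toy trips on a single morning
day = lambda h, m: datetime(2019, 5, 1, h, m)
toy_trips = [
    (day(10, 0), day(10, 30)),
    (day(10, 10), day(10, 20)),
    (day(10, 15), day(10, 45)),
    (day(10, 30), day(11, 0)),
]
peak, peak_time = peak_concurrency(toy_trips)
print(peak, peak_time)
```

Running the same function on the trips of each zip code separately would give a per-region comparison.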

5. SUMDs can provide alternative transportation and provide "last mile" access to public transit. How often are trips starting near public transit hubs? You can download a dataset of bus stop locations from https://data.nashville.gov/Transportation/Regional-Transportation-Authority-Bus-Stops/p886-fnbd.
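"Near" needs a distance measure between a trip's start point and each bus stop. A minimal sketch using the haversine great-circle formula in plain Python follows; the 0.1 km radius, the function names, and the coordinates are assumptions for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def starts_near_stop(trip_start, stops, radius_km=0.1):
    """True if trip_start (lat, lon) lies within radius_km of any stop."""
    return any(haversine_km(*trip_start, *stop) <= radius_km for stop in stops)

# Hypothetical coordinates, not taken from the bus stop dataset
stops = [(36.1660, -86.7810)]
near = starts_near_stop((36.1663, -86.7812), stops)
far = starts_near_stop((36.2000, -86.7000), stops)
print(near, far)
```

Checking every trip against every stop this way is quadratic, so a spatial join (for example with geopandas) may be a better fit for the full dataset.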