In [1]:
from sqlalchemy import create_engine, text

First, we need to create a connection string. The format is

 ```<dialect(+driver)>://<username>:<password>@<hostname>:<port>/<database>```

To connect to the scooters database, you can use the following connection string.

In [2]:
database_name = 'scooters'  # Fill this in with your database name

connection_string = f"postgresql://postgres:postgres@localhost:5433/{database_name}"

In [3]:
connection_string 

'postgresql://postgres:postgres@localhost:5433/scooters'
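The f-string above works because the username and password contain only letters. As a minimal sketch, assuming a password with characters such as `@`, `:` or `/`, each credential can be percent-encoded before it is placed in the URL. The helper name `build_connection_string` and the sample password are hypothetical and are not part of this project's setup.

```python
from urllib.parse import quote_plus


def build_connection_string(user, password, host, port, database, dialect="postgresql"):
    """Assemble <dialect>://<username>:<password>@<hostname>:<port>/<database>."""
    # quote_plus percent-encodes characters such as '@', ':' and '/',
    # which would otherwise be read as URL delimiters.
    return f"{dialect}://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


# Hypothetical password, chosen only to show the escaping.
example = build_connection_string("postgres", "p@ss:word/1", "localhost", 5433, "scooters")
print(example)
```

With the plain `postgres` / `postgres` credentials used in this notebook, the helper returns the same string as the f-string above.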

Now, we need to create an engine and use it to connect.

In [4]:
engine = create_engine(connection_string)

In [5]:
engine

Engine(postgresql://postgres:***@localhost:5433/scooters)

Now, we can create our query and pass it into the connection's `.execute()` method.

In [6]:
query = '''
SELECT *
FROM scooters 
LIMIT 10;
'''

with engine.connect() as connection:
    result = connection.execute(text(query))

In [7]:
result.fetchone()

(datetime.datetime(2019, 6, 4, 4, 15, 55, 627000), Decimal('36.163510'), Decimal('-86.778920'), 'Powered4557302', 'Powered', Decimal('35.00'), 'Scooter', Decimal('0.23'), 'Spin')

In [8]:
import pandas as pd

In [9]:
with engine.connect() as connection:
    scooters_query = pd.read_sql_query(text(query), con = connection)

scooters_query.head()

Unnamed: 0,pubdatetime,latitude,longitude,sumdid,sumdtype,chargelevel,sumdgroup,costpermin,companyname
0,2019-06-04 04:15:55.627,36.16351,-86.77892,Powered4557302,Powered,35.0,Scooter,0.23,Spin
1,2019-06-04 04:15:55.627,36.17193,-86.78275,Powered7950702,Powered,84.0,Scooter,0.23,Spin
2,2019-06-04 04:15:55.627,36.17869,-86.7772,Powered8985953,Powered,49.0,Scooter,0.23,Spin
3,2019-06-04 04:15:55.627,36.155821,-86.780287,Powered1080426,Powered,53.0,Scooter,0.23,Spin
4,2019-06-04 04:15:55.627,36.17119,-86.78897,Powered9958429,Powered,78.0,Scooter,0.23,Spin


In [10]:
query = '''
SELECT * 
FROM scooters
LIMIT 100;
'''

with engine.connect() as connection:
    scooters_query = pd.read_sql(text(query), con = connection)

scooters_query.head()

Unnamed: 0,pubdatetime,latitude,longitude,sumdid,sumdtype,chargelevel,sumdgroup,costpermin,companyname
0,2019-06-04 04:15:55.627,36.16351,-86.77892,Powered4557302,Powered,35.0,Scooter,0.23,Spin
1,2019-06-04 04:15:55.627,36.17193,-86.78275,Powered7950702,Powered,84.0,Scooter,0.23,Spin
2,2019-06-04 04:15:55.627,36.17869,-86.7772,Powered8985953,Powered,49.0,Scooter,0.23,Spin
3,2019-06-04 04:15:55.627,36.155821,-86.780287,Powered1080426,Powered,53.0,Scooter,0.23,Spin
4,2019-06-04 04:15:55.627,36.17119,-86.78897,Powered9958429,Powered,78.0,Scooter,0.23,Spin


## Metro Scooters Analysis
In May of 2018, Bird dropped hundreds of scooters on the streets of Nashville without permission. In response, Metro sued, which led Bird to remove its scooters and wait for permits. Metro began developing regulations for scooters and other shared urban mobility devices (SUMDs). In 2019, the Metro Council passed legislation enacting a one-year pilot program for scooters. For this project, you have been provided with the data for 3 months of this pilot program with the goal of reporting on usage trends and generating recommendations for quantity and distribution of scooters in Nashville.

Metro would like to know what the ideal density of available scooters is, which balances the objectives of
* enabling scooters to serve transportation goals,
* discouraging scooters from piling up on sidewalks, and
* keeping it economically viable for companies to operate equitably in the city.

The data for this project can be downloaded as a Postgres backup from https://drive.google.com/file/d/1BXAfByFvHCwX0G1BvTCQ373qKm7wE4Y-/view?usp=share_link.

Some notes about the data:
* When not in use, each scooter will report its location every five minutes. This data is contained in the scooters table.
* WARNING: Both tables contain a large number of records, so think carefully about what data you need to pull in a given query. If you try to pull in all rows from the scooters table, there is a very good chance that you will crash your notebook!
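One way to respect the warning above is to aggregate inside the database, so that only a handful of summary rows reach the notebook. The sketch below uses an in-memory SQLite table with three made-up rows as a stand-in for the real Postgres `scooters` table, so it runs without the backup restored. The toy rows and the name `summary_query` are illustrative assumptions.

```python
import sqlite3

# In-memory SQLite stand-in for the Postgres database (toy rows, not real data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scooters (pubdatetime TEXT, sumdid TEXT, companyname TEXT)")
conn.executemany(
    "INSERT INTO scooters VALUES (?, ?, ?)",
    [
        ("2019-06-04 04:15:55", "A1", "Spin"),
        ("2019-06-04 04:20:55", "A1", "Spin"),
        ("2019-06-04 04:15:55", "B7", "Bird"),
    ],
)

# Aggregate in SQL so that only one row per company leaves the database.
summary_query = """
SELECT companyname,
       COUNT(*) AS n_rows,
       COUNT(DISTINCT sumdid) AS n_scooters
FROM scooters
GROUP BY companyname
ORDER BY companyname;
"""
summary = conn.execute(summary_query).fetchall()
print(summary)
conn.close()
```

The query uses only standard SQL, so the same text should also work when passed through `text(...)` to `pd.read_sql` with the Postgres connection used elsewhere in this notebook.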

As you know, it's important to gain an understanding of new datasets before diving headlong into analysis. Here are some suggestions for guiding the process of getting to know the data contained in these tables:
- Are there any null values in any columns in either table?
- What date range is represented in each of the date columns? Investigate any values that seem odd.
- Is time represented with am/pm or using 24-hour values in each of the columns that include time?
- What values are there in the sumdgroup column? Are there any that are not of interest for this project?
- What are the minimum and maximum values for all the latitude and longitude columns? Do these ranges make sense, or is there anything surprising?
- What is the range of values for trip duration and trip distance? Do these values make sense? Explore values that might seem questionable.
- Check out how the values for the company name column in the scooters table compare to those of the trips table. What do you notice?
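Several of the checks suggested above (null counts, minimum and maximum values) can be answered with a single aggregate query per table. The sketch below runs against an in-memory SQLite stand-in with three invented rows; the column names follow the `trips` table, while the values and the name `profile_query` are assumptions made for illustration.

```python
import sqlite3

# Toy stand-in for the trips table; the values are invented for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (sumdid TEXT, tripduration REAL, startlatitude REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        ("A1", 3.0, 36.1571),
        ("B7", None, 36.1580),   # missing duration
        ("C3", 1500.0, 0.0),     # suspicious duration and latitude
    ],
)

# COUNT(column) skips NULLs, so COUNT(*) - COUNT(column) is the null count.
profile_query = """
SELECT COUNT(*)                       AS n_rows,
       COUNT(*) - COUNT(tripduration) AS null_tripduration,
       MIN(tripduration)              AS min_duration,
       MAX(tripduration)              AS max_duration,
       MIN(startlatitude)             AS min_lat,
       MAX(startlatitude)             AS max_lat
FROM trips;
"""
profile = conn.execute(profile_query).fetchone()
print(profile)
conn.close()
```

A minimum latitude of 0.0 or a duration far above a day in the real tables would be the kind of value worth investigating further.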

Once you've gotten an understanding of what is contained in the available tables, start with addressing these questions:
1. During this period, seven companies offered scooters. How many scooters did each company have in this time frame? Did the number for each company change over time? Did scooter usage vary by company?
2. According to Second Substitute Bill BL2018-1202 (as amended) (https://web.archive.org/web/20181019234657/https://www.nashville.gov/Metro-Clerk/Legislative/Ordinances/Details/7d2cf076-b12c-4645-a118-b530577c5ee8/2015-2019/BL2018-1202.aspx), all permitted operators will first clean data before providing or reporting data to Metro. Data processing and cleaning shall include:  
* Removal of staff servicing and test trips  
* Removal of trips below one minute  
* Trip lengths are capped at 24 hours  
Are the scooter companies in compliance with the second and third part of this rule? 
3. The goal of Metro Nashville is to have each scooter used a minimum of 3 times per day. Based on the data, what is the average number of trips per scooter per day? Make sure to consider the days that a scooter was available. How does this vary by company?
4. Metro would like to know how many scooters are needed, and something that could help with this is knowing peak demand. Estimate the highest count of scooters being used at the same time. When were the highest volume times? Does this vary by zip code or other geographic region?
5. **Stretch Goal:** SUMDs can provide alternative transportation and provide "last mile" access to public transit. How often are trips starting near public transit hubs? You can download a dataset of bus stop locations from https://data.nashville.gov/Transportation/WeGo-Transit-Bus-Stops/vfe9-k7vc/about_data.
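For question 4, one possible way to estimate the highest count of scooters in use at the same time is a sweep over trip start and end events. This is only a sketch of that idea, not a prescribed method. The function name `peak_concurrent_trips` and the four toy trips are hypothetical, and real input would be built from the start and end date/time columns of the trips table.

```python
from datetime import datetime


def peak_concurrent_trips(trips):
    """Return (max simultaneous trips, time that maximum was first reached).

    `trips` is an iterable of (start, end) datetime pairs.
    """
    events = []
    for start, end in trips:
        events.append((start, 1))    # a trip begins
        events.append((end, -1))     # a trip ends
    # Sort by time; at equal times an end (-1) sorts before a start (+1),
    # so a trip ending at 08:10 does not overlap one starting at 08:10.
    events.sort()
    active = 0
    peak = 0
    peak_time = None
    for moment, change in events:
        active += change
        if active > peak:
            peak, peak_time = active, moment
    return peak, peak_time


# Toy trips, invented for the example.
toy_trips = [
    (datetime(2019, 5, 1, 8, 0), datetime(2019, 5, 1, 8, 10)),
    (datetime(2019, 5, 1, 8, 5), datetime(2019, 5, 1, 8, 20)),
    (datetime(2019, 5, 1, 8, 7), datetime(2019, 5, 1, 8, 9)),
    (datetime(2019, 5, 1, 8, 10), datetime(2019, 5, 1, 8, 30)),
]
print(peak_concurrent_trips(toy_trips))
```

Sorting dominates the cost, so the sweep scales as O(n log n) in the number of trips. Grouping the trips by zip code before calling the function would give a peak per region.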

Deliverables:
At the conclusion of this project, your group should deliver a presentation which addresses the following points:
* Are scooter companies in compliance with the required data cleaning?
* What are typical usage patterns for scooters in terms of time, location, and trip duration?
* What are your recommendations for total number of scooters for the city overall and density of scooters by zip code?
* **Stretch Goal:** Does it appear that scooters are used as "last mile" transportation from public transit hubs to work or school?
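For the compliance question, the two duration rules can be counted per company in one query. The sketch below assumes that `tripduration` is recorded in minutes, which the sample output suggests but which should be verified, so 24 hours corresponds to 1440. It runs on an in-memory SQLite stand-in with invented rows; `compliance_query` is a hypothetical name.

```python
import sqlite3

# Toy stand-in for the trips table; durations are assumed to be in minutes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (companyname TEXT, tripduration REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [
        ("Bird", 3.0),
        ("Bird", 0.5),      # below one minute
        ("Lyft", 1.7156),
        ("Lyft", 2000.0),   # longer than 24 hours (1440 minutes)
        ("Lime", 12.0),
    ],
)

# Count rule violations per company; a compliant company shows 0 in both columns.
compliance_query = """
SELECT companyname,
       SUM(CASE WHEN tripduration < 1 THEN 1 ELSE 0 END)    AS under_one_minute,
       SUM(CASE WHEN tripduration > 1440 THEN 1 ELSE 0 END) AS over_24_hours
FROM trips
GROUP BY companyname
ORDER BY companyname;
"""
violations = conn.execute(compliance_query).fetchall()
print(violations)
conn.close()
```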



scooters data: 
scooters table - 73,414,043 rows
trips table - 565,522 rows
10,012 distinct scooter IDs in scooters table, 9,005 distinct scooter IDs in trips table

distinct company names in both tables: 

"Bird"

"Bolt"
"Bolt Mobility"

"Gotcha"

"Jump"
"JUMP"

"Lime"

"Lyft"

"Spin"
"SPIN"

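The name variants listed above differ only in capitalization or in a trailing word, so they can be folded into one label per company before grouping. The sketch below assumes that "Bolt" and "Bolt Mobility" refer to the same operator, which should be confirmed against the data. The names `CANONICAL_COMPANY` and `normalize_company` are hypothetical.

```python
# Hypothetical mapping for variants that a simple case fix cannot resolve.
# Assumption: "Bolt Mobility" and "Bolt" are the same company.
CANONICAL_COMPANY = {
    "bolt mobility": "Bolt",
}


def normalize_company(name):
    """Map a raw companyname value to a single canonical label."""
    key = name.strip().lower()
    # Fall back to title case, which turns "JUMP" into "Jump" and "SPIN" into "Spin".
    return CANONICAL_COMPANY.get(key, key.title())


raw_names = ["Bird", "Bolt", "Bolt Mobility", "Gotcha", "Jump", "JUMP",
             "Lime", "Lyft", "Spin", "SPIN"]
companies = sorted({normalize_company(name) for name in raw_names})
print(companies)
```

After normalization the ten raw values collapse to seven labels, which matches the seven companies mentioned in the project questions.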
In [11]:
query = '''
SELECT *
FROM trips;
'''
with engine.connect() as connection:
    trips = pd.read_sql(text(query), con = connection)
trips.head()

Unnamed: 0,pubtimestamp,companyname,triprecordnum,sumdid,tripduration,tripdistance,startdate,starttime,enddate,endtime,startlatitude,startlongitude,endlatitude,endlongitude,triproute,create_dt
0,2019-05-01 00:00:55.423,Bird,BRD2134,Powered9EAJL,3.0,958.00528,2019-05-01,00:00:20.460000,2019-05-01,00:02:52.346666,36.1571,-86.8036,36.1566,-86.8067,"[(36.157235, -86.803612), (36.157235, -86.8036...",2019-05-02 05:30:23.780
1,2019-05-01 00:03:33.147,Lyft,LFT5,Powered296631,1.7156,1371.39112,2019-05-01,00:01:50.090000,2019-05-01,00:03:33.026666,36.15797,-86.77896,36.16054,-86.77689,"[(36.15797, -86.77896), (36.15795, -86.77873),...",2019-05-02 07:20:32.757
2,2019-05-01 00:05:55.570,Bird,BRD2168,Powered7S2UU,3.0,2296.588,2019-05-01,00:03:47.363333,2019-05-01,00:07:13.596666,36.1547,-86.7818,36.1565,-86.7868,"[(36.155068, -86.782124), (36.156597, -86.78675)]",2019-05-02 05:30:24.530
3,2019-05-01 00:05:55.570,Bird,BRD2166,PoweredZIIVX,3.0,1200.78744,2019-05-01,00:04:21.386666,2019-05-01,00:06:59.176666,36.1494,-86.7795,36.1531,-86.7796,"[(36.149741, -86.779344), (36.149741, -86.7793...",2019-05-02 05:30:24.237
4,2019-05-01 00:05:55.570,Bird,BRD2165,PoweredJ7MB3,2.0,351.04988,2019-05-01,00:04:27.796666,2019-05-01,00:06:23.150000,36.1778,-86.7866,36.1774,-86.7876,"[(36.177699, -86.786477), (36.177711, -86.7864...",2019-05-02 05:30:24.207


In [None]:
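# Caution: this pulls every Bird row from the scooters table (73,414,043 rows in total).
# Add a LIMIT or aggregate in SQL first if memory is a concern.
query = '''
SELECT *
FROM scooters
WHERE companyname = 'Bird';
'''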
with engine.connect() as connection:
    bird = pd.read_sql(text(query), con = connection)
bird.head()