# Scooter Exploratory Data

**Questions to Answer:**

* Are there any null values in any columns in either table?
    * Yes, the chargelevel column in the scooters table has 770 nulls.

*  What date range is represented in each of the date columns? Investigate any values that seem odd.
    * May 1, 2019 to July 31, 2019 (based on startdate in the trips table)

* Is time represented with am/pm or using 24 hour values in each of the columns that include time?
    * 24-hour values

* What values are there in the sumdgroup column? Are there any that are not of interest for this project?
    * 'Scooter', 'scooter', and 'bicycle'. The two scooter values differ only in capitalization. Bikes are not needed for this project, I assume.

* What are the minimum and maximum values for all the latitude and longitude columns? Do these ranges make sense, or is there anything surprising? 
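    * In the scooters table, the minimum latitude and the maximum longitude are both 0.0, which would put those points on the equator and the prime meridian, respectively. The maximum latitude is 3609874.116666, far outside the valid range of -90 to 90. This makes me wonder whether some coordinates are measured from a chosen reference point rather than being true geographic coordinates. In the trips table, the start coordinates fall in a narrow band (latitude 35.85 to 36.30, longitude -86.92 to -86.37), but the end coordinates range much more widely (latitude -36.85 to 51.05, longitude -122.67 to 174.76).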

* What is the range of values for trip duration and trip distance? Do these values make sense? Explore values that might seem questionable.
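    * There are negative values for trip duration and trip distance that don't make sense (minimums of -19.36 and -20324803.8). The maximums (512619.0 for duration and about 31.9 million for distance) also look implausibly large. Unsure why at this time.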

* Check out how the values for the company name column in the scooters table compare to those of the trips table. What do you notice?
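    * Three companies are named differently in the two tables: the scooters table has Bolt, Jump, and Spin, while the trips table has Bolt Mobility, JUMP, and SPIN. The other four names (Bird, Gotcha, Lime, Lyft) match.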

In [1]:
from sqlalchemy import create_engine

In [2]:
import pandas as pd

In [3]:
database_name = 'scooters'    # Fill this in with your scooter database name

connection_string = f"postgresql://postgres:postgres@localhost:5432/{database_name}"

In [4]:
engine = create_engine(connection_string)

In [5]:
query = '''
SELECT DISTINCT(companyname)
FROM scooters;
'''
result = engine.execute(query)

In [6]:
result.fetchall()

[('Bird',), ('Bolt',), ('Gotcha',), ('Jump',), ('Lime',), ('Lyft',), ('Spin',)]

## Companies Involved Are:
* Bird
* Bolt
* Gotcha
* Jump
* Lime
* Lyft
* Spin

In [7]:
query = '''
SELECT MAX(startdate), MIN(startdate)
FROM trips
'''
result = engine.execute(query)

In [8]:
result.fetchall()

[(datetime.date(2019, 7, 31), datetime.date(2019, 5, 1))]

## Dates pulled are from May 1, 2019 to July 31, 2019

In [9]:
query = '''
SELECT starttime
FROM trips
ORDER BY starttime DESC
LIMIT 5;
'''
result = engine.execute(query)

In [10]:
result.fetchall()

[(datetime.time(23, 59, 59, 506666),),
 (datetime.time(23, 59, 59, 286666),),
 (datetime.time(23, 59, 59, 56666),),
 (datetime.time(23, 59, 59),),
 (datetime.time(23, 59, 59),)]

## Nulls Found In:
* Scooters
    * chargelevel - 770 nulls
* trips
    * pubdatetime - not checked (query didn't run)

In [11]:
query = '''
SELECT COUNT(*)
FROM scooters
WHERE chargelevel IS NULL;
'''

result = engine.execute(query)

result.fetchall()

[(770,)]
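The null check above covers one column at a time. A minimal sketch of checking several columns in one pass is shown below. It relies on `COUNT(*)` counting every row while `COUNT(column)` skips nulls. The helper name `count_nulls` is hypothetical, and the in-memory SQLite table with three made-up rows is only a stand-in so the example runs without the PostgreSQL database.

```python
import sqlite3


def count_nulls(conn, table, columns):
    """Return {column: number of NULLs} for the given table.

    Table and column names are inserted directly into the SQL text,
    so they must come from trusted code, not from user input.
    """
    # COUNT(*) counts all rows; COUNT(col) ignores NULLs,
    # so the difference is the number of NULLs in that column.
    select_list = ", ".join(
        f"COUNT(*) - COUNT({col}) AS {col}" for col in columns
    )
    row = conn.execute(f"SELECT {select_list} FROM {table}").fetchone()
    return dict(zip(columns, row))


# Toy stand-in for the scooters table (illustrative rows, not real data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scooters (companyname TEXT, chargelevel REAL)")
conn.executemany(
    "INSERT INTO scooters VALUES (?, ?)",
    [("Bird", 80.0), ("Lime", None), ("Spin", None)],
)

null_counts = count_nulls(conn, "scooters", ["companyname", "chargelevel"])
print(null_counts)
```

The same `COUNT(*) - COUNT(column)` expression is valid in PostgreSQL, so the generated query could in principle be run against the real scooters and trips tables.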

## Types of Vehicles 

In [12]:
query = '''
SELECT DISTINCT(sumdgroup)
FROM scooters;
'''

result = engine.execute(query)

result.fetchall()

[('bicycle',), ('scooter',), ('Scooter',)]
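The output above shows 'scooter' and 'Scooter' as separate values, which differ only in capitalization. One possible way to handle this, sketched below, is to lowercase the values before filtering. The helper name `normalize_group` is hypothetical; the list of raw values is copied from the query result.

```python
# Distinct sumdgroup values returned by the query above.
raw_groups = ["bicycle", "scooter", "Scooter"]


def normalize_group(value):
    """Lowercase and trim a sumdgroup value so case variants collapse."""
    return value.strip().lower()


# Collapse case variants, then keep only the group of interest.
normalized = sorted({normalize_group(g) for g in raw_groups})
scooter_only = [g for g in normalized if g == "scooter"]

print(normalized)
print(scooter_only)
```

In SQL, a comparable filter would be `WHERE LOWER(sumdgroup) = 'scooter'`.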

## Min & Max Lng + Lat

In [13]:
query = '''
SELECT MAX(latitude), MIN(latitude)
FROM scooters;
'''

result = engine.execute(query)

result.fetchall()

[(Decimal('3609874.116666'), Decimal('0.000000'))]

In [14]:
query = '''
SELECT MAX(longitude), MIN(longitude)
FROM scooters;
'''

result = engine.execute(query)

result.fetchall()

[(Decimal('0.000000'), Decimal('-97.443879'))]
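Latitude must lie between -90 and 90 and longitude between -180 and 180, so the maximum latitude above cannot be a real position. The sketch below shows one way such rows might be flagged. The function name `is_plausible_coordinate` is hypothetical, and treating an exact 0.0 on either axis as a placeholder is an assumption that suits this data set, not a general rule. The coordinate pairs in the example calls combine values from the outputs in this notebook for illustration; they are not known to be actual rows.

```python
def is_plausible_coordinate(lat, lon):
    """Return True if (lat, lon) is in range and not a likely placeholder."""
    in_range = -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0
    # Assumption: an exact 0.0 on either axis is treated as a missing value.
    looks_like_placeholder = lat == 0.0 or lon == 0.0
    return in_range and not looks_like_placeholder


# Illustrative pairings of values seen in the query outputs.
checks = {
    "out_of_range_lat": is_plausible_coordinate(3609874.116666, -86.918008),
    "zero_lat": is_plausible_coordinate(0.0, -86.918008),
    "zero_lon": is_plausible_coordinate(35.8532, 0.0),
    "typical_start": is_plausible_coordinate(35.8532, -86.918008),
}
print(checks)
```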

In [15]:
query5a_eda=  '''
SELECT min(startlatitude) as min_st_lat, max(startlatitude) as max_st_lat,
    min(startlongitude) as min_st_lon, max(startlongitude) as max_st_lon,
    min(endlatitude) as min_ed_lat, max(endlatitude) as max_ed_lat,
    min(endlongitude) as min_ed_lon, max(endlongitude) as max_ed_lon,
    min(tripduration) as min_trp_dur, max(tripduration) as max_trp_dur,
    min(tripdistance) as min_trp_dis, max(tripdistance) as max_trp_dis
FROM trips;
'''

result5a = engine.execute(query5a_eda)

In [16]:
loc_rvw2 = pd.read_sql(query5a_eda, con = engine)
print(loc_rvw2)

   min_st_lat  max_st_lat  min_st_lon  max_st_lon  min_ed_lat  max_ed_lat   
0     35.8532   36.300029  -86.918008    -86.3662  -36.850405   51.045409  \

   min_ed_lon  max_ed_lon  min_trp_dur  max_trp_dur  min_trp_dis   max_trp_dis  
0 -122.673729  174.764886   -19.358267     512619.0  -20324803.8  3.188448e+07  


In [17]:
query5b_eda=  '''
SELECT tripduration as top_10_dur
FROM trips
ORDER BY tripduration DESC
LIMIT 10;
'''

result5b = engine.execute(query5b_eda)

query5c_eda=  '''
SELECT tripduration as btm_10_dur
FROM trips
ORDER BY tripduration 
LIMIT 10;
'''

result5c = engine.execute(query5c_eda)

In [18]:
dur_rvw1 = pd.read_sql(query5b_eda, con = engine)
print(round(dur_rvw1, 2))

dur_rvw2 = pd.read_sql(query5c_eda, con = engine)
print(round(dur_rvw2, 2))

   top_10_dur
0    512619.0
1    257790.0
2     93837.0
3     92977.0
4     78802.0
5     64311.0
6     62717.0
7     62066.0
8     59482.0
9     56793.0
   btm_10_dur
0      -19.36
1      -10.98
2      -10.24
3       -8.00
4       -4.62
5       -1.36
6       -0.72
7       -0.50
8        0.00
9        0.00


In [19]:
query5d_eda=  '''
SELECT tripdistance as top_10_dis
FROM trips
ORDER BY tripdistance DESC
LIMIT 10;
'''

result5d = engine.execute(query5d_eda)

query5e_eda=  '''
SELECT tripdistance as btm_10_dis
FROM trips
ORDER BY tripdistance 
LIMIT 10;
'''

result5e = engine.execute(query5e_eda)

In [20]:
dis_rvw1 = pd.read_sql(query5d_eda, con = engine)
print(round(dis_rvw1,2))

dis_rvw2 = pd.read_sql(query5e_eda, con = engine)
print(round(dis_rvw2,2))

    top_10_dis
0  31884482.65
1  18489501.90
2  18489501.90
3  18489501.90
4   7580025.94
5   6485564.51
6   4607692.28
7   4340344.20
8   4308714.01
9   4275554.96
    btm_10_dis
0 -20324803.80
1 -19900919.27
2  -9337270.64
3  -2758530.27
4  -2253937.08
5  -1685315.01
6  -1684970.53
7  -1684806.48
8  -1684701.50
9  -1119963.95
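The duration and distance outputs above contain negative values and very large values. A minimal sketch of labeling such trips is given below. The function name `trip_issues` and the optional upper limits are hypothetical. The source does not state the units of tripduration or tripdistance, so the cutoff of 1440 used in the example is only a placeholder to be replaced once the units are known. The value pairings are illustrative and are not known to be actual rows.

```python
def trip_issues(duration, distance, max_duration=None, max_distance=None):
    """Return a list of reasons a trip looks questionable (empty if none)."""
    issues = []
    if duration < 0:
        issues.append("negative duration")
    if distance < 0:
        issues.append("negative distance")
    # Upper limits are optional because the units are not confirmed.
    if max_duration is not None and duration > max_duration:
        issues.append("duration above limit")
    if max_distance is not None and distance > max_distance:
        issues.append("distance above limit")
    return issues


# Illustrative combinations of values from the outputs above.
example_a = trip_issues(-19.36, 100.0)
example_b = trip_issues(512619.0, -20324803.8, max_duration=1440)
example_c = trip_issues(10.0, 100.0, max_duration=1440)

print(example_a)
print(example_b)
print(example_c)
```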


In [21]:
query6_eda=  '''
SELECT DISTINCT s.companyname as s_nm 
FROM scooters as s;
'''

result6 = engine.execute(query6_eda)

query6a_eda=  '''
SELECT DISTINCT t.companyname as t_nm 
FROM trips as t;
'''

result6a = engine.execute(query6a_eda)

In [22]:
nm_rvw_sctr = pd.read_sql(query6_eda, con = engine)
print(nm_rvw_sctr)

nm_rvw_trp = pd.read_sql(query6a_eda, con = engine)
print(nm_rvw_trp)

     s_nm
0    Bird
1    Bolt
2  Gotcha
3    Jump
4    Lime
5    Lyft
6    Spin
            t_nm
0           Bird
1  Bolt Mobility
2         Gotcha
3           JUMP
4           Lime
5           Lyft
6           SPIN
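
The two lists above differ for three companies: Bolt vs. Bolt Mobility, Jump vs. JUMP, and Spin vs. SPIN. Joining the tables on companyname would miss those rows unless the names are reconciled first. The sketch below shows one possible normalization, keeping the first word and title-casing it. The helper name `normalize_company` is hypothetical, and the rule is only known to work for the seven names listed here.

```python
# Distinct company names from each table, copied from the output above.
scooter_names = ["Bird", "Bolt", "Gotcha", "Jump", "Lime", "Lyft", "Spin"]
trip_names = ["Bird", "Bolt Mobility", "Gotcha", "JUMP", "Lime", "Lyft", "SPIN"]


def normalize_company(name):
    """Keep the first word and title-case it, e.g. 'Bolt Mobility' -> 'Bolt'."""
    return name.split()[0].title()


# Names in trips that have no exact match in scooters.
mismatched = sorted(set(trip_names) - set(scooter_names))

# After normalization, the trips names should line up with the scooters names.
normalized_trip_names = sorted({normalize_company(n) for n in trip_names})

print(mismatched)
print(normalized_trip_names == sorted(scooter_names))
```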
