## Designing the Database

Each Citi Bike file records every trip taken during a single month, with one file per month starting from June 2013. All files share the same format. The columns, in order, are:
- Trip Duration (seconds): The length of the trip in seconds
- Start Date & Time: The start time of the trip MM-DD-YYYY HH:MM:SS
- End Date & Time: The end time of the trip MM-DD-YYYY HH:MM:SS
- Start Station ID: The ID for the station where the trip started
- Start Station Name: The name of the station where the trip started
- Start Station Latitude: The latitude of the station where the trip started
- Start Station Longitude: The longitude of the station where the trip started
- End Station ID: The ID for the station where the trip ended
- End Station Name: The name of the station where the trip ended
- End Station Latitude: The latitude of the station where the trip ended
- End Station Longitude: The longitude of the station where the trip ended
- Bike ID: The ID for the bike that was used in the trip
- User Type: What type of user took the trip (Subscriber or Customer)
- Year of Birth: The year that the user was born
- Gender: The gender of the user (Male - 1, Female - 2, None - 0)
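As a rough illustration of this layout, the sketch below reads a two-row sample with pandas. The short column names and the sample rows are made up for the example (hypothetical), and the birth year is placed before gender to match the column order of the staging table created later in this notebook.

```python
from io import StringIO

import pandas as pd

# Hypothetical short names for the 15 columns, in file order.
TRIP_COLUMNS = [
    "tripduration", "starttime", "endtime",
    "start_id", "start_name", "start_lat", "start_long",
    "end_id", "end_name", "end_lat", "end_long",
    "bike_id", "usertype", "birthyear", "gender",
]

# Two made-up rows; the second has no birth year.
sample = StringIO(
    "634,06-01-2013 00:00:01,06-01-2013 00:10:35,444,Broadway & W 24 St,40.7423543,-73.98915076,"
    "434,9 Ave & W 18 St,40.74317449,-74.00366443,19678,Subscriber,1983,1\n"
    "1547,06-01-2013 00:00:08,06-01-2013 00:25:55,444,Broadway & W 24 St,40.7423543,-73.98915076,"
    "406,Hicks St & Montague St,40.69512845,-73.99595065,18340,Customer,,0\n"
)

trips = pd.read_csv(sample, header=None, names=TRIP_COLUMNS)

# Parse the timestamps using the MM-DD-YYYY HH:MM:SS format described above.
for col in ("starttime", "endtime"):
    trips[col] = pd.to_datetime(trips[col], format="%m-%d-%Y %H:%M:%S")

print(trips.dtypes)
```

The missing birth year is read as NaN, which is why the staging step later fills empty values before uploading.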

<img src="./Data/Images/DatabaseDiagramW.png" width="600" height="800" align="center"/>

*Note: If you cannot see the label names, edit the markdown code (double-click the diagram) and change the src from DatabaseDiagramW.png to DatabaseDiagramB.png.*

## Connecting to the Database

In [1]:
pip install psycopg2-binary;

Collecting psycopg2-binary
  Using cached psycopg2_binary-2.8.6-cp37-cp37m-manylinux1_x86_64.whl (3.0 MB)
Installing collected packages: psycopg2-binary
Successfully installed psycopg2-binary-2.8.6
Note: you may need to restart the kernel to use updated packages.


In [2]:
import psycopg2

In [3]:
# Read the password from the environment instead of hard-coding it in the notebook
import os

PGHOST = 'tripdatabase2.cmaaautpgbsf.us-east-2.rds.amazonaws.com'
PGDATABASE = ''
PGUSER = 'postgres'
PGPASSWORD = os.environ['PGPASSWORD']

In [4]:
# Database Context Manager
try:   
    # Set up a connection to the postgres server.    
    conn = psycopg2.connect(user = PGUSER,
                            port = "5432",
                            password = PGPASSWORD,
                            host = PGHOST,
                            database = PGDATABASE)
    # Create a cursor object
    cursor = conn.cursor()   
    cursor.execute("SELECT version();")
    record = cursor.fetchone()
    print("Connection Success:", record,"\n")

except (Exception, psycopg2.Error) as error:
    print("Error while connecting to PostgreSQL", error)

Connection Success: ('PostgreSQL 12.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit',) 
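Many cells in this notebook repeat the same rollback, execute, commit sequence. One possible way to centralize that pattern is a small context manager, sketched below. The name `transaction` is hypothetical and not part of psycopg2; the sketch only assumes a connection object that offers `cursor()`, `commit()` and `rollback()`, as psycopg2 connections do.

```python
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Yield a cursor; commit if the block succeeds, roll back if it raises."""
    cursor = conn.cursor()
    try:
        yield cursor
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cursor.close()


# Possible usage with the connection opened above:
# with transaction(conn) as cur:
#     cur.execute("SELECT version();")
#     print(cur.fetchone())
```

With a helper like this, a failed statement no longer leaves the connection in an aborted transaction, so the explicit `rollback;` before each query would not be needed.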



## Database Construction I - Creating the Staging Table

In [5]:
pip install s3fs;

Collecting botocore<1.17.45,>=1.17.44
  Using cached botocore-1.17.44-py2.py3-none-any.whl (6.5 MB)
ERROR: boto3 1.16.36 has requirement botocore<1.20.0,>=1.19.36, but you'll have botocore 1.17.44 which is incompatible.
ERROR: awscli 1.18.196 has requirement botocore==1.19.36, but you'll have botocore 1.17.44 which is incompatible.
Installing collected packages: botocore
  Attempting uninstall: botocore
    Found existing installation: botocore 1.19.36
    Uninstalling botocore-1.19.36:
      Successfully uninstalled botocore-1.19.36
Successfully installed botocore-1.17.44
Note: you may need to restart the kernel to use updated packages.


In [5]:
import pandas as pd
import s3fs
import os
from io import StringIO
import Queries

In [14]:
# The S3 Bucket that will be used to store the data should be created beforehand
# Read the AWS credentials from the environment instead of hard-coding them in the notebook
ACCESS_KEY_ID = os.environ['AWS_ACCESS_KEY_ID']
ACCESS_SECRET_KEY = os.environ['AWS_SECRET_ACCESS_KEY']

fs = s3fs.S3FileSystem(anon=False, key = ACCESS_KEY_ID, secret= ACCESS_SECRET_KEY)
trip_filenames = fs.ls("s3://williams-citibike/TripData/")

In [15]:
# Tables module. One function for all the tables.
staging_table_query = """
               CREATE TABLE IF NOT EXISTS staging (
                   tripduration NUMERIC, 
                   starttime TIMESTAMP,
                   endtime TIMESTAMP,
                   startID NUMERIC,
                   startname VARCHAR(64),
                   start_lat REAL,
                   start_long REAL,
                   endID NUMERIC,
                   endname VARCHAR(64),
                   end_lat REAL,
                   end_long REAL,
                   bikeID INTEGER,
                   usertype VARCHAR(16),
                   birthyear REAL,
                   gender SMALLINT                
              );
              """
cursor.execute("rollback;")
cursor.execute(staging_table_query)
conn.commit()

In [4]:
def upload_data(conn, data: pd.DataFrame, table: str) -> None:
    datastream = StringIO()
    cursor = conn.cursor()
    
    data.to_csv(datastream, index=False, header=False)
    datastream.seek(0)
    
    cursor.copy_from(datastream,table,sep=',')
    conn.commit()
    
    return None    

In [19]:
def populate_staging(datafile: str) -> None:
    """Grabs the data from the s3 bucket and edits it so that it can be uploaded to the staging table
    
    Parameters
    ----------
    datafile : str
        The name of a file in the s3 bucket without the s3:// prefix

    Returns
    -------
    None:
        If executed properly the database should now have rows corresponding to the rows in the data
    """
       
    with fs.open("s3://"+datafile, 'r') as file:
        data = pd.read_csv(file, na_values ="")   # Can't use the C engine to speed this up
        data.fillna(-1, inplace=True)   # Empty spaces need to be integers for birthyear REAL type in database
        
        #Some stations have commas in their name causing the copy_from to register extra data fields
        data.iloc[:, 4] = data.iloc[:, 4].str.replace(',','_')
        data.iloc[:, 8] = data.iloc[:, 8].str.replace(',','_')
        
        # data.iloc[:, 3] = data.iloc[:, 3].astype('int32')
        # data.iloc[:, 7] = data.iloc[:, 7].astype('int32')
        
        upload_data(conn, data, 'staging')   # upload_data creates and discards its own buffer
        
    print(f"Finished Uploading to Staging Table: {datafile}")
    return None
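The comma problem mentioned in `populate_staging` can be reproduced without a database. `to_csv` wraps a field that contains the separator in quotes, but `copy_from` uses PostgreSQL's text format, which splits on every separator and does not interpret those quotes. The sketch below uses a made-up station row to show the comma-separated output next to a tab-separated one, which is the separator the later cells of this notebook use.

```python
from io import StringIO

import pandas as pd

# Made-up row with a comma inside the station name.
df = pd.DataFrame({"station_id": [3263], "name": ["Cooper Square, E 7 St"]})

# Comma separator: pandas quotes the name because it contains a comma.
comma_buf = StringIO()
df.to_csv(comma_buf, index=False, header=False)
print(repr(comma_buf.getvalue()))

# Tab separator: the name is written as-is and still forms a single field.
tab_buf = StringIO()
df.to_csv(tab_buf, sep="\t", index=False, header=False)
print(repr(tab_buf.getvalue()))
```

Replacing commas with underscores, as done above, and switching to a tab separator are two ways around the same issue; the second keeps the station names unchanged.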

In [21]:
"""
cursor.execute("rollback;")
for file in trip_filenames:
    populate_staging(file)
"""

Finished Uploading to Staging Table: williams-citibike/TripData/202011-citibike-tripdata.csv
Finished Uploading to Staging Table: williams-citibike/TripData/202012-citibike-tripdata.csv


## Database Construction II - Creating the Trip Table

In [6]:
# Tables module
trip_table_query = """
            CREATE TABLE IF NOT EXISTS trip (
                starttime TIMESTAMP,
                endtime TIMESTAMP,
                tripduration NUMERIC,
                startID NUMERIC,
                endID NUMERIC,
                usertype VARCHAR(16),
                age REAL,
                gender SMALLINT
            ) PARTITION BY RANGE (starttime);
            """
cursor.execute("rollback;")
cursor.execute(trip_table_query)
conn.commit()

In [1]:
def create_partition(year: int, month: int) -> None: #Tables
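    """Creates the monthly partition of the trip table for the given year and month
    
    Parameters
    ----------
    year : int
        The four digit year of the partition
    month : int
        The month of the partition (1-12)

    Returns
    -------
    None:
        If executed properly the database should now have a partition named trip_y<year>m<month>
    """
    nxt_month = month+1
    nxt_year = year   # Always the same as current year unless the month is December
    
    if month == 12:   # If December, sets the year-month to January of the next year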
        nxt_month = 1
        nxt_year = year+1
    
    month = str(month).zfill(2)
    nxt_month = str(nxt_month).zfill(2)
    
    # Move this to the Tables module
    # ----- This can use Queries.execute_query(conn, partition_query)
    partition_query = f"""
            CREATE TABLE trip_y{year}m{month} PARTITION OF trip
            FOR VALUES FROM ('{year}-{month}-01') TO ('{nxt_year}-{nxt_month}-01');
            """
    
    cursor.execute("rollback;")
    cursor.execute(partition_query)
    conn.commit()
    # --------------------------
    return None
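The bound calculation in `create_partition` can be checked on its own, without touching the database. The sketch below is a hypothetical helper (`partition_bounds` is not part of the notebook) that returns the partition name and its two bounds. In PostgreSQL range partitioning the lower bound is inclusive and the upper bound is exclusive, so consecutive months do not overlap.

```python
from datetime import date


def partition_bounds(year: int, month: int):
    """Return (table_name, lower_bound, upper_bound) for one monthly partition."""
    start = date(year, month, 1)
    # The upper bound is exclusive: the first day of the following month.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    name = f"trip_y{year}m{month:02d}"
    return name, start.isoformat(), end.isoformat()


print(partition_bounds(2013, 12))
# ('trip_y2013m12', '2013-12-01', '2014-01-01')
```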
    

In [8]:
yearlist13 = [2013]
monthlist13 = [6, 7, 8, 9, 10, 11, 12]

for year in yearlist13:
    for month in monthlist13:
        create_partition(year, month)

In [9]:
yearlist14_20 = [2014, 2015, 2016, 2017, 2018, 2019,2020]
monthlist14_20 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

for year in yearlist14_20:
    for month in monthlist14_20:
        create_partition(year, month)

**The following code converts tripduration from seconds to minutes and birthyear to age (relative to 2020). On a db.t3.micro RDS instance it takes about 3.3 hours to execute.**


In [None]:
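# The cell is wrapped in ''' so that it is not run by accident (see the execution time above).
# Triple double quotes cannot be used for the wrapper because the query string already uses them.
'''
# Tables module
insert_query2 = """
        INSERT INTO trip
        SELECT DISTINCT starttime, endtime, ROUND(tripduration/60,2), startid, endid, usertype, 
               CASE WHEN birthyear > 0 THEN 2020 - birthyear
                    ELSE birthyear
                    END AS age,
               gender
          FROM staging
         ORDER BY starttime, endtime;
        """

cursor.execute("rollback;")
cursor.execute(insert_query2)
conn.commit()
'''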
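The two conversions in this query can be tried on a small sample before committing to the long-running insert. The sketch below mirrors them in pandas with made-up rows; it is only an illustration of the arithmetic, not a replacement for the SQL.

```python
import pandas as pd

# Made-up staging rows; -1 marks a missing birth year, as in populate_staging.
staging = pd.DataFrame({
    "tripduration": [634, 1547],
    "birthyear": [1983.0, -1.0],
})

trip = pd.DataFrame({
    # Seconds to minutes, rounded to two decimals.
    "tripduration": (staging["tripduration"] / 60).round(2),
    # Age relative to 2020; the -1 sentinel is passed through unchanged.
    "age": staging["birthyear"].where(staging["birthyear"] <= 0, 2020 - staging["birthyear"]),
})
print(trip)
```

Because the sentinel is kept, an age of -1 in the trip table means that the birth year was missing, not that the rider's age is negative.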

Using the DISTINCT clause filters out trips that are exact duplicates. In our data, rows that match on every value are assumed to be trips that were accidentally recorded twice. If even a single value differs, the rows represent different trips. For example, two friends may ride between the same stations at exactly the same time, but one may be male and the other female.

*Note: It is possible that two separate real trips have exactly the same data. However, that would require two people of the same age and gender starting and stopping at the same stations at exactly the same times (down to the second). Additionally, removing duplicates dropped only 0.004% of trips. So even if none of the 4,797 counted duplicates were true duplicates, we removed only a minuscule amount of data from our dataset.*

*Note 2: Our trip table doesn't include the bikeid, so there is a chance that those 4,797 duplicates aren't errors. Two people of the same age and gender, starting and stopping at the same stations at exactly the same times (down to the second), might have been on different bikes.*
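The effect of DISTINCT can be illustrated with pandas `drop_duplicates`, used here as a stand-in for the SQL. The rows are made up: the second is an exact copy of the first, while the third differs only in gender and is therefore kept.

```python
import pandas as pd

cols = ["starttime", "endtime", "startid", "endid", "age", "gender"]
rows = [
    ("2020-01-01 08:00:00", "2020-01-01 08:10:00", 72, 116, 30, 1),
    ("2020-01-01 08:00:00", "2020-01-01 08:10:00", 72, 116, 30, 1),  # exact duplicate
    ("2020-01-01 08:00:00", "2020-01-01 08:10:00", 72, 116, 30, 2),  # differs in gender
]
trips = pd.DataFrame(rows, columns=cols)

# Rows are dropped only when every column matches an earlier row.
deduped = trips.drop_duplicates()
print(len(trips), "->", len(deduped))
```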

## Preparing the Neighborhood Table I - Without the Spatial Data

In [13]:
from bs4 import BeautifulSoup
import requests

In [14]:
# Attempt connection to the URL
HoodURL = "https://furmancenter.org/neighborhoods"
try:
    r2 = requests.get(HoodURL)
    r2.raise_for_status()
except requests.exceptions.HTTPError as errh:
    print(errh)

In [15]:
soup = BeautifulSoup(r2.content, "html.parser")

# The website has a dropdown with all the neighborhood codes and names
hood_code_names = []

#Instead of creating a dictionary like before, we create a list of tuples so that we can make a df
for code in soup.find_all('option')[1:]:
    hood_code_names.append((code.text[:4], code.text[6:].replace("/","-").replace(" ","_")))
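The slicing in the loop above relies on each dropdown entry starting with a four-character code followed by two separator characters. The sketch below isolates that step in a hypothetical helper, `parse_option`. The `': '` separator in the sample strings is an assumption inferred from the slice offsets, not something confirmed from the website.

```python
def parse_option(text: str):
    """Split dropdown text such as 'BK01: Greenpoint/Williamsburg' into (code, name)."""
    code = text[:4]
    # Skip the two separator characters after the code (assumed to be ': ').
    name = text[6:].replace("/", "-").replace(" ", "_")
    return code, name


print(parse_option("BK01: Greenpoint/Williamsburg"))
# ('BK01', 'Greenpoint-Williamsburg')
```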

In [16]:
hood_df = pd.DataFrame(hood_code_names, columns=["code", "hoodname"])

In [17]:
borough = {
        "BK": "Brooklyn", 
        "BX": "Bronx",
        "MN": "Manhattan",
        "QN": "Queens",
        "SI": "Staten"
        }

hood_df["borough"] = hood_df["code"].str[0:2].map(borough)

In [18]:
hood_df.head()

| | code | hoodname | borough |
|---|---|---|---|
| 0 | BK01 | Greenpoint-Williamsburg | Brooklyn |
| 1 | BK02 | Fort_Greene-Brooklyn_Heights | Brooklyn |
| 2 | BK03 | Bedford_Stuyvesant | Brooklyn |
| 3 | BK04 | Bushwick | Brooklyn |
| 4 | BK05 | East_New_York-Starrett_City | Brooklyn |


## Preparing the Neighborhood Table II - Adding the Spatial Data

In [19]:
pip install geopandas

Collecting geopandas
  Using cached geopandas-0.8.1-py2.py3-none-any.whl (962 kB)
Collecting fiona
  Using cached Fiona-1.8.18-cp37-cp37m-manylinux1_x86_64.whl (14.8 MB)
Collecting pyproj>=2.2.0
  Using cached pyproj-3.0.0.post1-cp37-cp37m-manylinux2010_x86_64.whl (6.4 MB)
Collecting shapely
  Using cached Shapely-1.7.1-cp37-cp37m-manylinux1_x86_64.whl (1.0 MB)
Collecting munch
  Using cached munch-2.5.0-py2.py3-none-any.whl (10 kB)
Collecting click-plugins>=1.0
  Using cached click_plugins-1.1.1-py2.py3-none-any.whl (7.5 kB)
Collecting cligj>=0.5
  Using cached cligj-0.7.1-py3-none-any.whl (7.1 kB)
Installing collected packages: munch, click-plugins, cligj, fiona, pyproj, shapely, geopandas
Successfully installed click-plugins-1.1.1 cligj-0.7.1 fiona-1.8.18 geopandas-0.8.1 munch-2.5.0 pyproj-3.0.0.post1 shapely-1.7.1
Note: you may need to restart the kernel to use updated packages.


In [20]:
import geopandas as gpd
import shapely

In [21]:
geofile = "s3://williams-citibike/Community_Districts.geojson"

with fs.open(geofile, 'rb') as file:
    districts = gpd.read_file(file)

In [22]:
districts.head()

| | boro_cd | shape_area | shape_leng | geometry |
|---|---|---|---|---|
| 0 | 311 | 103177785.365 | 51549.5578567 | MULTIPOLYGON (((-73.97299 40.60881, -73.97259 ... |
| 1 | 313 | 88195686.2748 | 65821.875577 | MULTIPOLYGON (((-73.98372 40.59582, -73.98305 ... |
| 2 | 312 | 99525500.0655 | 52245.8304843 | MULTIPOLYGON (((-73.97140 40.64826, -73.97121 ... |
| 3 | 206 | 42664311.3238 | 35875.7111725 | MULTIPOLYGON (((-73.87185 40.84376, -73.87192 ... |
| 4 | 226 | 50566410.6415 | 32820.3983295 | MULTIPOLYGON (((-73.86790 40.90294, -73.86796 ... |


The codes from the Furman Center identify the same districts as the codes in the boro_cd column. However, boro_cd uses a leading digit to represent the borough, whereas the Furman codes seen in hood_df use a two-letter abbreviation. The Furman codes therefore have to be reverse engineered from boro_cd using a mapping. Once the mapping is complete, the two dataframes can be merged together.

In [23]:
borough_num_to_abr = {
        "3": "BK", 
        "2": "BX",
        "1": "MN",
        "4": "QN",
        "5": "SI"
        }

districts["boro_cd"] = districts["boro_cd"].str[0].map(borough_num_to_abr) + districts['boro_cd'].str[1:]

In [24]:
districts = districts[['boro_cd','geometry']]

In [25]:
hood_spatial = hood_df.merge(districts, left_on='code', right_on='boro_cd', how='left').loc[:,['code', 'hoodname', 'borough', 'geometry']]
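Because this is a left merge, a neighborhood code with no matching district would end up with a missing geometry, which the NOT NULL constraint on the neighborhood table would later reject. A quick way to check for that, sketched here with made-up frames, is the `indicator` option of `merge`.

```python
import pandas as pd

# Made-up frames: one code has no matching district.
hoods = pd.DataFrame({"code": ["BK01", "BK02", "MN99"]})
shapes = pd.DataFrame({"boro_cd": ["BK01", "BK02"], "geometry": ["poly-a", "poly-b"]})

# indicator=True adds a _merge column that records where each row came from.
merged = hoods.merge(shapes, left_on="code", right_on="boro_cd", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "code"].tolist()
print(unmatched)
```

An empty list would mean that every neighborhood code found a district.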

In [26]:
hood_spatial.sort_values(by='code', inplace=True)

In [27]:
hood_spatial = gpd.GeoDataFrame(hood_spatial)

In [28]:
hood_spatial.head()

| | code | hoodname | borough | geometry |
|---|---|---|---|---|
| 0 | BK01 | Greenpoint-Williamsburg | Brooklyn | MULTIPOLYGON (((-73.92406 40.71411, -73.92404 ... |
| 1 | BK02 | Fort_Greene-Brooklyn_Heights | Brooklyn | MULTIPOLYGON (((-73.96929 40.70709, -73.96839 ... |
| 2 | BK03 | Bedford_Stuyvesant | Brooklyn | MULTIPOLYGON (((-73.91805 40.68721, -73.91800 ... |
| 3 | BK04 | Bushwick | Brooklyn | MULTIPOLYGON (((-73.89647 40.68234, -73.89653 ... |
| 4 | BK05 | East_New_York-Starrett_City | Brooklyn | MULTIPOLYGON (((-73.86841 40.69473, -73.86868 ... |


## Database Construction III - Creating the Neighborhood Table

In [114]:
# Tables module
neighborhood_table_query = """
        CREATE TABLE IF NOT EXISTS neighborhood (
            code CHAR(4) PRIMARY KEY,
            hoodname VARCHAR NOT NULL,
            borough VARCHAR(16) NOT NULL,
            geometry GEOGRAPHY(MULTIPOLYGON,4326) NOT NULL
        );
        """
cursor.execute("rollback;")
cursor.execute(neighborhood_table_query)
conn.commit()

In [115]:
# Replace with the new function
hoodstream = StringIO()

hood_spatial.to_csv(hoodstream,sep='\t', index=False, header=False)
hoodstream.seek(0)

cursor.copy_from(hoodstream,'neighborhood',sep='\t')
conn.commit()

## Preparing the Station Table I - Querying from the Database

In [37]:
# Endid has more distinct values than startid
# Tables module
stations_query = """
        SELECT DISTINCT ON(endid) endid, endname, end_lat, end_long 
          FROM staging 
         ORDER BY endid;
        """

In [38]:
# stations = pd.read_sql(stations_query, conn) # Expect long execution times

In [39]:
stations_spatial = gpd.GeoDataFrame(stations, geometry=gpd.points_from_xy(stations.end_long, stations.end_lat), crs="EPSG:4326")

## Preparing the Station Table II - SJoining the Neighborhood Spatial Data

In [40]:
# The inner join will remove stations that aren't in NYC (some stations are in NJ).
# Additionally it will remove the handful of stations that didn't have information other than the ID

stations_spatial = gpd.sjoin(stations_spatial, hood_spatial, how='inner', op='within')

In [46]:
stations_spatial = stations_spatial[['endid','endname','code','geometry']].rename(columns={'endid':'stationID','endname':'name'})

In [47]:
stations_spatial['name'] = stations_spatial['name'].str.replace("'","")   # Bracket access avoids relying on attribute-style column assignment

In [48]:
stations_spatial.head()

| | stationID | name | code | geometry |
|---|---|---|---|---|
| 1 | 72.0 | W 52 St & 11 Ave | MN04 | POINT (-73.99393 40.76727) |
| 5 | 116.0 | W 17 St & 8 Ave | MN04 | POINT (-74.00150 40.74178) |
| 28 | 212.0 | W 16 St & The High Line | MN04 | POINT (-74.00682 40.74335) |
| 123 | 334.0 | W 20 St & 7 Ave | MN04 | POINT (-73.99726 40.74239) |
| 171 | 388.0 | W 26 St & 10 Ave | MN04 | POINT (-74.00295 40.74972) |


## Database Construction IV - Creating the Station Table

In [35]:
# Tables module
station_table_query = """
               CREATE TABLE IF NOT EXISTS station (
                   stationID NUMERIC PRIMARY KEY,
                   name VARCHAR(64) NOT NULL,
                   code CHAR(4) NOT NULL,
                   geometry GEOGRAPHY(POINT,4326) NOT NULL
                );
                
                """
cursor.execute("rollback;")
cursor.execute(station_table_query)
conn.commit()

In [49]:
# Replace with function
stationstream = StringIO()
stations_spatial.to_csv(stationstream,sep='\t', index=False, header=False)
stationstream.seek(0)

cursor.copy_from(stationstream,'station',sep='\t')
conn.commit()

## Database Construction V - Creating the Lookup Table

In [53]:
hood_filenames = fs.ls("s3://williams-citibike/HoodData/")[1:]

In [None]:
# Tables module
lookup_table_query = """
                CREATE TABLE IF NOT EXISTS lookup(
                    alias VARCHAR(5) PRIMARY KEY,
                    indicator VARCHAR,
                    description VARCHAR
                );
                """

cursor.execute("rollback;")
cursor.execute(lookup_table_query)
conn.commit()

In [70]:
cols_lst = [2,3,4]
names_lst = ["indicator_category", "indicator", "description"]
lookup = pd.read_excel("s3://" + hood_filenames[0], sheet_name=1, usecols = cols_lst, names = names_lst)

In [71]:
lookup = lookup.sort_values(by=["indicator_category",'indicator'])

In [72]:
alias = {
    'Demographics': 'DEM',
    'Housing Market and Conditions': 'HSC',
    'Land Use and Development': 'LUD',
    'Neighborhood Services and Conditions': 'NSC',
    'Renters': 'RNT'
}

In [73]:
lookup['indicator_category'] = lookup["indicator_category"].map(alias)

In [75]:
lookup = lookup.rename(columns={'indicator_category':'alias'})

In [76]:
indicator_group_order = lookup.groupby("alias").cumcount()+1

In [77]:
lookup['alias'] = lookup['alias'] + indicator_group_order.astype(str)
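The alias numbering built over the last few cells can be seen more easily on a small sample. The sketch below uses made-up indicators that are already sorted by category; `cumcount` numbers the rows within each category starting at 0, so 1 is added before the number is appended to the category abbreviation.

```python
import pandas as pd

# Made-up indicators, already sorted by category and name.
lookup_demo = pd.DataFrame({
    "alias": ["DEM", "DEM", "HSC", "RNT", "RNT", "RNT"],
    "indicator": ["a", "b", "c", "d", "e", "f"],
})

order = lookup_demo.groupby("alias").cumcount() + 1
lookup_demo["alias"] = lookup_demo["alias"] + order.astype(str)
print(lookup_demo["alias"].tolist())
# ['DEM1', 'DEM2', 'HSC1', 'RNT1', 'RNT2', 'RNT3']
```

Since the lookup table declares alias as VARCHAR(5), this scheme leaves room for at most 99 indicators per category.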

In [100]:
# replace with function
lookupstream = StringIO()

lookup.to_csv(lookupstream,sep='\t', index=False, header=False)
lookupstream.seek(0)

cursor.copy_from(lookupstream,'lookup',sep='\t')
conn.commit()

## Creating the Neighborhood Profile Table

In [85]:
def flatten_hooddata(datafile: str) -> pd.DataFrame:
    """Grabs the data from the s3 bucket and flattens it to a single row consisting of the neighborhood attributes
    
    Parameters
    ----------
    datafile : str
        The name of a file in the s3 bucket without the s3:// prefix

    Returns
    -------
    pd.DataFrame:
        A single row DataFrame that contains the attributes of the neighborhood
    """
    cols_lst = [0,2,3,8]
    names_lst = ["code", "indicator category", "indicator", "2018"]

    # This function is a mess
    
    with fs.open("s3://"+datafile, 'rb') as file:
        data = pd.read_excel(file, sheet_name=1, usecols = cols_lst, names = names_lst)
       
        #In the previous section we did all the alias work, now we can simply input it into the df from lookup['alias']
        data = data.sort_values(by=['indicator category','indicator'])
        data.insert(1, 'alias', lookup['alias'])
        data = data.drop(columns = ['indicator category', 'indicator'])

        # Prep the '2018' column so that it can be used as the value argument in the pivot_table 
        data['2018'] = data['2018'].str.replace('$', "", regex=False)   # '$' is a literal here, not a regex anchor
        data['2018'] = data['2018'].str.replace(',', "", regex=False)

        # Values that are percents get turned into decimals
        for index, value in data['2018'].items():
            if isinstance(value, str) and value.endswith('%'):
                data.loc[index, '2018'] = float(value.strip('%')) / 100   # .loc avoids chained assignment

        data['2018'] = pd.to_numeric(data['2018'])

        # The pivot_table alphabetizes the columns, but we want to maintain the original order
        column_order = ['code'] + list(data['alias'])

        data = data.pivot_table(index=['code'],values='2018', columns='alias', dropna=False)
        data = data.rename_axis(None, axis=1).reset_index()   # The pivot creates an unnecessary column axis
        data['code'] = data['code'][0].replace(" ","")
        data = data.reindex(column_order, axis=1)

    return data
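The value cleaning inside `flatten_hooddata` can also be written as a small standalone function, which makes it easier to test. The sketch below is a hypothetical helper (`clean_value` is not part of the notebook) and assumes that every string is numeric once the dollar sign, commas and percent sign are removed.

```python
def clean_value(value):
    """Convert strings like '$1,234' or '12.5%' to floats; pass other values through."""
    if not isinstance(value, str):
        return value
    text = value.replace("$", "").replace(",", "").strip()
    if text.endswith("%"):
        # Percentages are stored as decimals.
        return float(text[:-1]) / 100
    return float(text)


print([clean_value(v) for v in ["$1,234", "12.5%", 0.4]])
# [1234.0, 0.125, 0.4]
```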

In [86]:
# This only works successfully if there are those specific neighborhood excel files in the HoodData folder
# pd.concat is used because DataFrame.append was removed in pandas 2.0
hood_profile = pd.concat([flatten_hooddata(hood) for hood in hood_filenames])

In [87]:
hood_profile = hood_profile.dropna(axis=1, how='all')

In [100]:
hood_profile = hood_profile.fillna(-1)   # We need to fill NaN with -1 so they can be put into the database

## Database Construction VI - Importing the Neighborhood Profiles into Database

In [101]:
# Tables Module
profile_table_query = """
                CREATE TABLE IF NOT EXISTS profile(
                );
                """
cursor.execute("rollback;")
cursor.execute(profile_table_query)
conn.commit()

In [None]:
for name in hood_profile.columns:
    if name == 'code':
        import_column_query = f"""
                    ALTER TABLE profile
                    ADD COLUMN {name} CHAR(4) PRIMARY KEY;
                    """
    else:
        import_column_query = f"""
                    ALTER TABLE profile
                    ADD COLUMN {name} REAL;
                    """
        
    cursor.execute("rollback;")
    cursor.execute(import_column_query)
    conn.commit()   # Without the parentheses nothing is committed and the next rollback undoes the ALTER
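An alternative to adding the columns one at a time would be to build a single CREATE TABLE statement from the column names. The sketch below shows only the string construction; `build_profile_ddl` is a hypothetical helper, and the identifier check is a simple guard added for the example because column names are interpolated directly into the SQL.

```python
def build_profile_ddl(columns):
    """Build one CREATE TABLE statement: 'code' is the key, the rest are REAL."""
    parts = []
    for name in columns:
        # Column names go straight into the SQL text, so reject anything unusual.
        if not name.isidentifier():
            raise ValueError(f"unexpected column name: {name!r}")
        col_type = "CHAR(4) PRIMARY KEY" if name == "code" else "REAL"
        parts.append(f"{name} {col_type}")
    return "CREATE TABLE IF NOT EXISTS profile (" + ", ".join(parts) + ");"


print(build_profile_ddl(["code", "DEM1", "HSC1"]))
# CREATE TABLE IF NOT EXISTS profile (code CHAR(4) PRIMARY KEY, DEM1 REAL, HSC1 REAL);
```

This would be called with `hood_profile.columns` and would replace both the empty CREATE TABLE and the ALTER TABLE loop, at the cost of needing the DataFrame before the table is created.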

In [103]:
# Can use the function
profilestream = StringIO()

hood_profile.to_csv(profilestream,sep='\t', index=False, header=False)
profilestream.seek(0)

cursor.copy_from(profilestream,'profile',sep='\t')
conn.commit()

## Database Construction VII - Purging the Database: Removing Trips that aren't Contained in NYC

When the neighborhood data was inner joined to the station data, the stations that were not in NYC were dropped. Although they were removed from the station table, the trip table still contains trips that involve the dropped stations. The goal of this section is to remove the trips that are not fully contained within NYC.

*Note: Not in NYC is defined as a trip either starting or ending at a station that is not in NYC.*
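The definition in the note can be expressed as a simple filter. The sketch below uses pandas and made-up station IDs purely to illustrate the rule; the actual deletion in this notebook is performed by a function in the Queries module, whose SQL is not shown here.

```python
import pandas as pd

# Made-up data: station 3183 stands in for a station that was dropped.
nyc_station_ids = {72, 116, 212}
trips = pd.DataFrame({
    "startid": [72, 3183, 116, 212],
    "endid": [116, 72, 3183, 212],
})

# Keep a trip only when both of its endpoints are known NYC stations.
inside_nyc = trips["startid"].isin(nyc_station_ids) & trips["endid"].isin(nyc_station_ids)
nyc_trips = trips[inside_nyc]
print(len(trips) - len(nyc_trips), "trips removed")
```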

**Before we drop the trips that involve New Jersey (NJ), let's see how much of the market share NJ is gathering over time.**

*Note: There are other important questions that could be asked about the NJ data, however, this project is focused on NYC data. For now, more complex NJ based questions are out of scope.*

In [8]:
import Queries # This is actually going to be the Analyze module in the Queries package

In [9]:
# Counts the number of trips per year
all_trips_df = Queries.countYearlyTrips(conn)    # Query-0001 in file # How to use the context manager in the function

In [10]:
NJ_trips_df = Queries.countYearlyNJTrips(conn)   # Query-0002 in file

In [11]:
market_share = NJ_trips_df.merge(all_trips_df, on='year',suffixes=['_nj','_all'])

In [12]:
market_share['nj_percent'] = round(market_share['trips_nj'] / market_share['trips_all'], 4)* 100

In [13]:
market_share # Diagram

| | year | trips_nj | trips_all | nj_percent |
|---|---|---|---|---|
| 0 | 2013.0 | 67094 | 5614874 | 1.19 |
| 1 | 2014.0 | 55277 | 8081195 | 0.68 |
| 2 | 2015.0 | 182610 | 9921596 | 1.84 |
| 3 | 2016.0 | 618062 | 13842693 | 4.46 |
| 4 | 2017.0 | 948753 | 16362322 | 5.8 |
| 5 | 2018.0 | 1010194 | 17548339 | 5.76 |
| 6 | 2019.0 | 1128002 | 20551697 | 5.49 |
| 7 | 2020.0 | 1264092 | 19506857 | 6.48 |
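The percentages can be re-derived from the counts as a sanity check. The sketch below copies two rows from the table above and scales to percent before rounding, a small variation on the cell above that can avoid floating-point artifacts in the displayed value.

```python
import pandas as pd

# Two rows copied from the table above.
nj = pd.DataFrame({"year": [2013, 2020], "trips": [67094, 1264092]})
all_trips = pd.DataFrame({"year": [2013, 2020], "trips": [5614874, 19506857]})

share = nj.merge(all_trips, on="year", suffixes=("_nj", "_all"))
share["nj_percent"] = (share["trips_nj"] / share["trips_all"] * 100).round(2)
print(share)
```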


In [14]:
# Deleting the NJ data
Queries.deleteNJTrips(conn)