# Preliminary Analysis

### Data Cleaning Code
Code for cleaning and processing your data. Include a data dictionary for your transformed dataset.

- Data Dictionary for Air Quality
    - **indicator id:** numeric id for each indicator (each name and measure combination)
    - **name:** name of the indicator measured, such as a pollutant, emission source, or health outcome
    - **measure:** how the indicator is measured
    - **measure info:** unit of the measure, such as ppb
    - **geo type name:** geography type, UHF stands for United Hospital Fund neighborhoods
    - **geo place name:** neighborhood name
    - **time period:** label for the time frame the value covers, such as Summer 2013
    - **start_date:** date the time period starts
    <br><br>
- Data Dictionary for Traffic Volume
    - **requestId:** unique id generated for each counts request
    - **boro:** which of the five divisions (boroughs) of New York City the location is within
    - **vol:** total count collected within each 15-minute increment
    - **segmentId:** the id that identifies each segment of a street
    - **wktgeom:** geometry point of the location
    - **street:** name of the street where the count took place
    - **fromst:** start street of traffic
    - **tost:** end street where traffic volume was located
    - **direction:** text-based direction of traffic where the count took place
    - **date_time:** date and time at which the count took place
    <br><br>
- Data Dictionary for 2020 mobility Dataset
    - **sub_region_2:** county name
    - **date:** date of the record
    - **retail_and_recreation_percent_change_from_baseline:** mobility trends for places like restaurants, cafes, shopping centers, theme parks, museums, libraries, and movie theaters
    - **grocery_and_pharmacy_percent_change_from_baseline:** mobility trends for places like grocery markets, food warehouses, farmers markets, specialty food shops, drug stores, and pharmacies
    - **parks_percent_change_from_baseline:** mobility trends for places like national parks, public beaches, marinas, dog parks, plazas, and public gardens
    - **transit_stations_percent_change_from_baseline:** mobility trends for places like public transport hubs such as subway, bus, and train stations
    - **workplaces_percent_change_from_baseline:** mobility trends for places of work
    - **residential_percent_change_from_baseline:** mobility trends for places of residence
    
### Exploratory Analysis
Describe what work you have done so far and include the code. This may include descriptive statistics, graphs and charts, and preliminary models.

- We removed columns that were irrelevant to what we want to predict and combined columns that belong together, such as the date and time.
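
As an illustration of combining date and time columns, here is a minimal sketch on a toy frame. The column names `Yr`, `M`, `D`, `HH`, `MM`, and `Vol` and all values are assumptions made for the example; the actual traffic volume columns may differ.

```python
import pandas as pd

# Hypothetical sample rows; column names and values are assumed for illustration.
traffic_sample = pd.DataFrame({
    'Yr': [2020, 2020],
    'M': [3, 3],
    'D': [15, 15],
    'HH': [8, 8],
    'MM': [0, 15],
    'Vol': [42, 57],
})

# pd.to_datetime can assemble timestamps from a frame whose columns are
# named year, month, day, hour, and minute.
parts = traffic_sample[['Yr', 'M', 'D', 'HH', 'MM']].rename(
    columns={'Yr': 'year', 'M': 'month', 'D': 'day', 'HH': 'hour', 'MM': 'minute'}
)
traffic_sample['date_time'] = pd.to_datetime(parts)

# Drop the separate components once the combined column exists.
traffic_sample = traffic_sample.drop(columns=['Yr', 'M', 'D', 'HH', 'MM'])
print(traffic_sample)
```

Keeping a single timestamp column makes it easier to sort, resample, or join on time later.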


### Challenges
Describe any challenges you've encountered so far. Let me know if there's anything you need help with!

- Deciding which data to include was challenging because our problem is specific to New York City.
- Choosing the transformations for each dataset was also a challenge: each dataset has many columns, and we had to identify the ones that aren't relevant to our problem.
- We are still unsure whether to keep or remove some columns, such as segmentId in the Traffic Volume dataset.
- Dealing with large datasets efficiently.
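
One possible way to ease the large-dataset problem is to read only the needed columns, in chunks. The sketch below uses a small in-memory CSV as a stand-in for a large file; its rows are made up, and the chunk size is arbitrary.

```python
import io
import pandas as pd

# Small in-memory CSV standing in for a large file; the rows are hypothetical.
csv_text = "boro,vol,street,segmentId\n" + "\n".join(
    f"Queens,{i},Main St,{1000 + i}" for i in range(10)
)

# Load only the needed columns, a few rows at a time, and aggregate per chunk
# so the full file never has to sit in memory at once.
total_vol = 0
for chunk in pd.read_csv(io.StringIO(csv_text), usecols=['boro', 'vol'], chunksize=4):
    total_vol += chunk['vol'].sum()

print(total_vol)
```

With a real file, the path would replace `io.StringIO(csv_text)`, and a much larger `chunksize` would usually be appropriate.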

### Future Work
Describe what work you are planning to complete for the final analysis.

- Use the cleaned data as inputs to models such as Logistic Regression (for classification) and Linear Regression (for continuous targets).
- Make predictions with the trained models and compute accuracy scores to answer our questions.
- Find the most accurate model, and graph/chart the data to understand it better for future predictions.

### Contributions
Describe the contributions that each group member made.
- **Daniel Aguilar-Rodriguez**
    - Researched and acquired datasets
    - Helped present ideas during brainstorming session
    - Created Jupyter notebook and helped clean datasets
    - Helped transform datasets and removed columns irrelevant to our work
    <br><br>
- **Jia Cong Lin**
    - Helped present ideas during brainstorming session
    - Helped define necessary columns for the mobility dataset
    - Assisted in determining columns to clean and define 
    <br><br>
- **Anvinh Truong**
    - Helped clean and define some columns for the datasets and dictionary
    - Helped present ideas during brainstorming session
    - Assisted in devising the procedure to clean data columns

In [1]:
import pandas as pd
import numpy as np
import os
import requests
import datetime

In [2]:
def download_data(csv_name):
    """Download the named dataset and save it as datasets/<csv_name>.csv."""
    url_dict = {'air_quality': 'https://data.cityofnewyork.us/api/views/c3uy-2p5r/rows.csv', 
                'mobility_global': 'https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv', 
                'traffic_volume': 'https://data.cityofnewyork.us/api/views/7ym2-wayt/rows.csv'}
    
    response = requests.get(url_dict[csv_name], timeout=60)
    # Fail loudly on an HTTP error instead of saving an error page as a CSV
    response.raise_for_status()
    path = f'datasets/{csv_name}.csv'
    with open(path, 'wb') as f:
        f.write(response.content)

In [3]:
def csv_exists(csv_name):
    path = f'datasets/{csv_name}.csv'
    file_exists = os.path.exists(path)
    return file_exists

In [4]:
def create_df(csv_name):
    if not csv_exists(csv_name):
        download_data(csv_name)
    path = f'datasets/{csv_name}.csv'
    df = pd.read_csv(path)
    return df

In [5]:
def mkdir_if_not_exist():
    directory = 'datasets'
    # exist_ok=True makes this a no-op when the directory already exists
    os.makedirs(directory, exist_ok=True)

In [6]:
def create_all_df(csv_names):
    mkdir_if_not_exist()
    df_list = []
    
    for csv_name in csv_names:
        print(f'Creating {csv_name} df')
        df = create_df(csv_name)
        df_list.append(df)
        
    return df_list

In [7]:
csv_names = ['air_quality', 'traffic_volume']

air_quality, traffic_volume = create_all_df(csv_names)

Creating air_quality df
Creating traffic_volume df


## Air Quality Dataset Cleaning

In [8]:
print(air_quality.isnull().sum() / len(air_quality))

Unique ID         0.0
Indicator ID      0.0
Name              0.0
Measure           0.0
Measure Info      0.0
Geo Type Name     0.0
Geo Join ID       0.0
Geo Place Name    0.0
Time Period       0.0
Start_Date        0.0
Data Value        0.0
Message           1.0
dtype: float64


In [9]:
air_quality = air_quality.drop(['Message'], axis=1)
print(air_quality.isnull().sum() / len(air_quality))

Unique ID         0.0
Indicator ID      0.0
Name              0.0
Measure           0.0
Measure Info      0.0
Geo Type Name     0.0
Geo Join ID       0.0
Geo Place Name    0.0
Time Period       0.0
Start_Date        0.0
Data Value        0.0
dtype: float64
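
The `Message` column is entirely null, which is why it is dropped by name above. As a sketch of a more generic alternative, `dropna(axis=1, how='all')` removes every column that contains no values at all. The toy frame below is a stand-in for the real data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the air quality frame.
toy = pd.DataFrame({
    'Name': ['Ozone (O3)', 'Sulfur Dioxide (SO2)'],
    'Data Value': [34.64, 5.89],
    'Message': [np.nan, np.nan],
})

# Fraction of missing values per column (equivalent to isnull().sum() / len(df)).
null_fraction = toy.isna().mean()

# Drop only the columns in which every value is missing.
cleaned = toy.dropna(axis=1, how='all')

print(null_fraction)
print(list(cleaned.columns))
```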


In [10]:
print(air_quality.nunique() / len(air_quality))

Unique ID         1.000000
Indicator ID      0.001365
Name              0.001179
Measure           0.000496
Measure Info      0.000496
Geo Type Name     0.000310
Geo Join ID       0.004466
Geo Place Name    0.007071
Time Period       0.002791
Start_Date        0.002233
Data Value        0.253443
dtype: float64


In [11]:
air_quality = air_quality.drop(['Unique ID'], axis=1)
print(air_quality.shape)
print(air_quality.nunique() / len(air_quality))

(16122, 10)
Indicator ID      0.001365
Name              0.001179
Measure           0.000496
Measure Info      0.000496
Geo Type Name     0.000310
Geo Join ID       0.004466
Geo Place Name    0.007071
Time Period       0.002791
Start_Date        0.002233
Data Value        0.253443
dtype: float64


In [12]:
air_quality = air_quality.drop(['Geo Join ID'], axis=1)
print(air_quality.shape)
print(air_quality.nunique() / len(air_quality))

(16122, 9)
Indicator ID      0.001365
Name              0.001179
Measure           0.000496
Measure Info      0.000496
Geo Type Name     0.000310
Geo Place Name    0.007071
Time Period       0.002791
Start_Date        0.002233
Data Value        0.253443
dtype: float64


In [13]:
air_quality.dtypes

Indicator ID        int64
Name               object
Measure            object
Measure Info       object
Geo Type Name      object
Geo Place Name     object
Time Period        object
Start_Date         object
Data Value        float64
dtype: object

In [14]:
air_quality.nunique()

Indicator ID        22
Name                19
Measure              8
Measure Info         8
Geo Type Name        5
Geo Place Name     114
Time Period         45
Start_Date          36
Data Value        4086
dtype: int64

In [15]:
air_quality['Time Period'].unique()

array(['Summer 2013', 'Summer 2014', 'Winter 2008-09', 'Summer 2009',
       'Summer 2010', 'Summer 2011', 'Summer 2012', 'Winter 2009-10',
       '2005-2007', '2013', '2005', '2009-2011', 'Winter 2010-11',
       'Winter 2011-12', 'Winter 2012-13', 'Annual Average 2009',
       'Annual Average 2010', 'Annual Average 2011',
       'Annual Average 2012', 'Annual Average 2013', '2015',
       'Winter 2013-14', 'Annual Average 2014', '2011', 'Winter 2014-15',
       '2016', 'Annual Average 2015', 'Summer 2015', 'Winter 2015-16',
       'Summer 2016', 'Annual Average 2016', 'Summer 2017', '2012-2014',
       'Summer 2018', 'Annual Average 2017', 'Summer 2019',
       'Winter 2016-17', 'Annual Average 2018', 'Winter 2017-18',
       '2015-2017', 'Summer 2020', 'Annual Average 2019',
       'Winter 2018-19', 'Annual Average 2020', 'Winter 2019-20'],
      dtype=object)

In [16]:
np.sort(air_quality['Time Period'].unique())

array(['2005', '2005-2007', '2009-2011', '2011', '2012-2014', '2013',
       '2015', '2015-2017', '2016', 'Annual Average 2009',
       'Annual Average 2010', 'Annual Average 2011',
       'Annual Average 2012', 'Annual Average 2013',
       'Annual Average 2014', 'Annual Average 2015',
       'Annual Average 2016', 'Annual Average 2017',
       'Annual Average 2018', 'Annual Average 2019',
       'Annual Average 2020', 'Summer 2009', 'Summer 2010', 'Summer 2011',
       'Summer 2012', 'Summer 2013', 'Summer 2014', 'Summer 2015',
       'Summer 2016', 'Summer 2017', 'Summer 2018', 'Summer 2019',
       'Summer 2020', 'Winter 2008-09', 'Winter 2009-10',
       'Winter 2010-11', 'Winter 2011-12', 'Winter 2012-13',
       'Winter 2013-14', 'Winter 2014-15', 'Winter 2015-16',
       'Winter 2016-17', 'Winter 2017-18', 'Winter 2018-19',
       'Winter 2019-20'], dtype=object)

In [17]:
air_quality['Time Period'].value_counts() / len(air_quality)

2012-2014              0.029773
2005-2007              0.029773
2009-2011              0.029773
2015-2017              0.029773
Summer 2018            0.026237
Summer 2017            0.026237
Summer 2016            0.026237
Winter 2015-16         0.026237
Summer 2015            0.026237
Summer 2019            0.026237
Winter 2014-15         0.026237
Summer 2014            0.026237
Winter 2013-14         0.026237
Summer 2020            0.026237
Summer 2013            0.026237
Winter 2009-10         0.026237
Summer 2011            0.026237
Winter 2011-12         0.026237
Winter 2010-11         0.026237
Winter 2008-09         0.026237
Summer 2009            0.026237
Summer 2010            0.026237
Winter 2012-13         0.026237
Summer 2012            0.026237
2005                   0.025245
2016                   0.019911
Annual Average 2017    0.017492
Winter 2018-19         0.017492
Winter 2016-17         0.017492
Annual Average 2020    0.017492
Annual Average 2018    0.017492
Winter 2

In [18]:
air_quality['Start_Date'].unique()

array(['06/01/2013', '06/01/2014', '12/01/2008', '06/01/2009',
       '06/01/2010', '06/01/2011', '06/01/2012', '12/01/2009',
       '01/01/2005', '01/01/2013', '01/01/2009', '12/01/2010',
       '12/01/2011', '12/01/2012', '01/01/2015', '12/01/2013',
       '01/01/2011', '12/01/2014', '01/01/2016', '06/01/2015',
       '12/01/2015', '05/31/2016', '12/31/2015', '06/01/2017',
       '01/02/2012', '06/01/2018', '01/01/2017', '06/01/2019',
       '12/01/2016', '01/01/2018', '12/01/2017', '06/01/2020',
       '01/01/2019', '12/01/2018', '01/01/2020', '12/01/2019'],
      dtype=object)

In [19]:
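# The dates are all MM/DD/YYYY, so an explicit format is used;
# infer_datetime_format is deprecated in pandas 2.x
air_quality['Start_Date'] = pd.to_datetime(air_quality['Start_Date'], format='%m/%d/%Y')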

In [20]:
air_quality['Start_Date'].min()

Timestamp('2005-01-01 00:00:00')

In [21]:
air_quality['Start_Date'].value_counts().sort_index() / len(air_quality)

2005-01-01    0.055018
2008-12-01    0.043729
2009-01-01    0.029773
2009-06-01    0.026237
2009-12-01    0.043729
2010-06-01    0.026237
2010-12-01    0.043729
2011-01-01    0.013274
2011-06-01    0.026237
2011-12-01    0.043729
2012-01-02    0.029773
2012-06-01    0.026237
2012-12-01    0.043729
2013-01-01    0.008932
2013-06-01    0.026237
2013-12-01    0.043729
2014-06-01    0.026237
2014-12-01    0.026237
2015-01-01    0.056197
2015-06-01    0.026237
2015-12-01    0.026237
2015-12-31    0.017492
2016-01-01    0.019911
2016-05-31    0.026237
2016-12-01    0.017492
2017-01-01    0.017492
2017-06-01    0.026237
2017-12-01    0.017492
2018-01-01    0.017492
2018-06-01    0.026237
2018-12-01    0.017492
2019-01-01    0.017492
2019-06-01    0.026237
2019-12-01    0.017492
2020-01-01    0.017492
2020-06-01    0.026237
Name: Start_Date, dtype: float64

In [22]:
air_quality.groupby('Start_Date')['Time Period'].value_counts()

Start_Date  Time Period        
2005-01-01  2005-2007              480
            2005                   407
2008-12-01  Winter 2008-09         423
            Annual Average 2009    282
2009-01-01  2009-2011              480
2009-06-01  Summer 2009            423
2009-12-01  Winter 2009-10         423
            Annual Average 2010    282
2010-06-01  Summer 2010            423
2010-12-01  Winter 2010-11         423
            Annual Average 2011    282
2011-01-01  2011                   214
2011-06-01  Summer 2011            423
2011-12-01  Winter 2011-12         423
            Annual Average 2012    282
2012-01-02  2012-2014              480
2012-06-01  Summer 2012            423
2012-12-01  Winter 2012-13         423
            Annual Average 2013    282
2013-01-01  2013                   144
2013-06-01  Summer 2013            423
2013-12-01  Winter 2013-14         423
            Annual Average 2014    282
2014-06-01  Summer 2014            423
2014-12-01  Winter 2014-15      

In [23]:
list(air_quality[air_quality['Time Period'].str.contains('Winter')]['Time Period'].unique())

['Winter 2008-09',
 'Winter 2009-10',
 'Winter 2010-11',
 'Winter 2011-12',
 'Winter 2012-13',
 'Winter 2013-14',
 'Winter 2014-15',
 'Winter 2015-16',
 'Winter 2016-17',
 'Winter 2017-18',
 'Winter 2018-19',
 'Winter 2019-20']

In [24]:
air_quality[air_quality['Time Period'].str.contains('Winter|Summer')].groupby('Time Period')['Start_Date'].value_counts()

Time Period     Start_Date
Summer 2009     2009-06-01    423
Summer 2010     2010-06-01    423
Summer 2011     2011-06-01    423
Summer 2012     2012-06-01    423
Summer 2013     2013-06-01    423
Summer 2014     2014-06-01    423
Summer 2015     2015-06-01    423
Summer 2016     2016-05-31    423
Summer 2017     2017-06-01    423
Summer 2018     2018-06-01    423
Summer 2019     2019-06-01    423
Summer 2020     2020-06-01    423
Winter 2008-09  2008-12-01    423
Winter 2009-10  2009-12-01    423
Winter 2010-11  2010-12-01    423
Winter 2011-12  2011-12-01    423
Winter 2012-13  2012-12-01    423
Winter 2013-14  2013-12-01    423
Winter 2014-15  2014-12-01    423
Winter 2015-16  2015-12-01    423
Winter 2016-17  2016-12-01    282
Winter 2017-18  2017-12-01    282
Winter 2018-19  2018-12-01    282
Winter 2019-20  2019-12-01    282
Name: Start_Date, dtype: int64

In [25]:
air_quality[air_quality['Time Period'].str.contains('Winter')]

Unnamed: 0,Indicator ID,Name,Measure,Measure Info,Geo Type Name,Geo Place Name,Time Period,Start_Date,Data Value
4,383,Sulfur Dioxide (SO2),Mean,ppb,CD,Morris Park and Bronxdale (CD11),Winter 2008-09,2008-12-01,5.89
5,383,Sulfur Dioxide (SO2),Mean,ppb,CD,Williamsbridge and Baychester (CD12),Winter 2008-09,2008-12-01,5.75
8,383,Sulfur Dioxide (SO2),Mean,ppb,CD,Greenpoint and Williamsburg (CD1),Winter 2008-09,2008-12-01,4.33
9,383,Sulfur Dioxide (SO2),Mean,ppb,CD,Fort Greene and Brooklyn Heights (CD2),Winter 2008-09,2008-12-01,4.41
10,383,Sulfur Dioxide (SO2),Mean,ppb,CD,Bedford Stuyvesant (CD3),Winter 2008-09,2008-12-01,4.73
...,...,...,...,...,...,...,...,...,...
15969,365,Fine Particulate Matter (PM2.5),Mean,mcg per cubic meter,CD,Queens Village (CD13),Winter 2019-20,2019-12-01,7.15
15972,365,Fine Particulate Matter (PM2.5),Mean,mcg per cubic meter,CD,Rockaway and Broad Channel (CD14),Winter 2019-20,2019-12-01,6.12
15975,365,Fine Particulate Matter (PM2.5),Mean,mcg per cubic meter,CD,St. George and Stapleton (CD1),Winter 2019-20,2019-12-01,7.43
15978,365,Fine Particulate Matter (PM2.5),Mean,mcg per cubic meter,CD,South Beach and Willowbrook (CD2),Winter 2019-20,2019-12-01,6.89


In [26]:
air_quality.sample(10)

Unnamed: 0,Indicator ID,Name,Measure,Measure Info,Geo Type Name,Geo Place Name,Time Period,Start_Date,Data Value
11221,653,O3-Attributable Asthma Emergency Department Vi...,Estimated Annual Rate- Children 0 to 17 Yrs Old,"per 100,000 children",UHF42,Kingsbridge - Riverdale,2012-2014,2012-01-02,102.7
9103,365,Fine Particulate Matter (PM2.5),Mean,mcg per cubic meter,UHF34,Kingsbridge - Riverdale,Winter 2014-15,2014-12-01,10.16
15685,365,Fine Particulate Matter (PM2.5),Mean,mcg per cubic meter,UHF34,Sunset Park,Winter 2019-20,2019-12-01,8.17
9117,365,Fine Particulate Matter (PM2.5),Mean,mcg per cubic meter,UHF34,Downtown - Heights - Slope,Summer 2015,2015-06-01,10.29
12890,375,Nitrogen Dioxide (NO2),Mean,ppb,UHF34,West Queens,Winter 2017-18,2017-12-01,26.84
13168,386,Ozone (O3),Mean,ppb,UHF34,Canarsie - Flatlands,Summer 2018,2018-06-01,32.96
10643,365,Fine Particulate Matter (PM2.5),Mean,mcg per cubic meter,CD,St. George and Stapleton (CD1),Summer 2016,2016-05-31,8.04
3622,383,Sulfur Dioxide (SO2),Mean,ppb,UHF34,Ridgewood - Forest Hills,Winter 2008-09,2008-12-01,4.5
15836,365,Fine Particulate Matter (PM2.5),Mean,mcg per cubic meter,UHF42,Kingsbridge - Riverdale,Winter 2019-20,2019-12-01,6.37
1072,653,O3-Attributable Asthma Emergency Department Vi...,Estimated Annual Rate- Children 0 to 17 Yrs Old,"per 100,000 children",UHF42,Flushing - Clearview,2009-2011,2009-01-01,40.0


In [27]:
air_quality[air_quality['Geo Type Name'].str.contains('Borough')]['Time Period'].unique()

array(['Summer 2013', 'Summer 2014', 'Summer 2009', 'Summer 2010',
       'Summer 2011', 'Summer 2012', '2013', '2005', '2005-2007',
       '2009-2011', 'Winter 2008-09', 'Winter 2009-10', 'Winter 2010-11',
       'Winter 2011-12', 'Winter 2012-13', 'Annual Average 2009',
       'Annual Average 2010', 'Annual Average 2011',
       'Annual Average 2012', 'Annual Average 2013', '2015',
       'Winter 2013-14', 'Annual Average 2014', '2011', '2016',
       'Annual Average 2015', 'Summer 2015', 'Winter 2014-15',
       'Winter 2015-16', 'Summer 2016', 'Annual Average 2016',
       '2012-2014', 'Summer 2017', 'Annual Average 2017',
       'Winter 2016-17', 'Annual Average 2018', 'Summer 2018',
       'Winter 2017-18', '2015-2017', 'Annual Average 2019',
       'Summer 2019', 'Winter 2018-19', 'Annual Average 2020',
       'Winter 2019-20', 'Summer 2020'], dtype=object)

In [28]:
air_quality[air_quality['Geo Type Name'].str.contains('Borough')].groupby('Name')['Geo Place Name'].value_counts()

Name                                                       Geo Place Name
Air Toxics Concentrations- Average Benzene Concentrations  Bronx             2
                                                           Brooklyn          2
                                                           Manhattan         2
                                                           Queens            2
                                                           Staten Island     2
                                                                            ..
Traffic Density- Annual Vehicle Miles Traveled for Trucks  Bronx             1
                                                           Brooklyn          1
                                                           Manhattan         1
                                                           Queens            1
                                                           Staten Island     1
Name: Geo Place Name, Length: 95, dtype: int64

In [29]:
list(air_quality['Name'].unique())

['Ozone (O3)',
 'Sulfur Dioxide (SO2)',
 'PM2.5-Attributable Deaths',
 'Boiler Emissions- Total SO2 Emissions',
 'Boiler Emissions- Total PM2.5 Emissions',
 'Boiler Emissions- Total NOx Emissions',
 'Air Toxics Concentrations- Average Benzene Concentrations',
 'Air Toxics Concentrations- Average Formaldehyde Concentrations',
 'PM2.5-Attributable Asthma Emergency Department Visits',
 'PM2.5-Attributable Respiratory Hospitalizations (Adults 20 Yrs and Older)',
 'PM2.5-Attributable Cardiovascular Hospitalizations (Adults 40 Yrs and Older)',
 'Traffic Density- Annual Vehicle Miles Traveled',
 'O3-Attributable Cardiac and Respiratory Deaths',
 'O3-Attributable Asthma Emergency Department Visits',
 'O3-Attributable Asthma Hospitalizations',
 'Traffic Density- Annual Vehicle Miles Traveled for Cars',
 'Traffic Density- Annual Vehicle Miles Traveled for Trucks',
 'Nitrogen Dioxide (NO2)',
 'Fine Particulate Matter (PM2.5)']

In [30]:
air_quality.groupby(['Name', 'Start_Date'])['Time Period'].value_counts()

Name                                                            Start_Date  Time Period
Air Toxics Concentrations- Average Benzene Concentrations       2005-01-01  2005            48
                                                                2011-01-01  2011           107
Air Toxics Concentrations- Average Formaldehyde Concentrations  2005-01-01  2005            48
                                                                2011-01-01  2011           107
Boiler Emissions- Total NOx Emissions                           2013-01-01  2013            48
                                                                                          ... 
Traffic Density- Annual Vehicle Miles Traveled                  2016-01-01  2016           107
Traffic Density- Annual Vehicle Miles Traveled for Cars         2005-01-01  2005           107
                                                                2016-01-01  2016           107
Traffic Density- Annual Vehicle Miles Traveled for Trucks

In [31]:
air_quality['Geo Type Name'].unique()

array(['CD', 'Borough', 'UHF42', 'Citywide', 'UHF34'], dtype=object)

In [32]:
air_quality[air_quality['Geo Type Name'].str.contains('CD')]

Unnamed: 0,Indicator ID,Name,Measure,Measure Info,Geo Type Name,Geo Place Name,Time Period,Start_Date,Data Value
0,386,Ozone (O3),Mean,ppb,CD,Coney Island (CD13),Summer 2013,2013-06-01,34.64
1,386,Ozone (O3),Mean,ppb,CD,Coney Island (CD13),Summer 2014,2014-06-01,33.22
4,383,Sulfur Dioxide (SO2),Mean,ppb,CD,Morris Park and Bronxdale (CD11),Winter 2008-09,2008-12-01,5.89
5,383,Sulfur Dioxide (SO2),Mean,ppb,CD,Williamsbridge and Baychester (CD12),Winter 2008-09,2008-12-01,5.75
8,383,Sulfur Dioxide (SO2),Mean,ppb,CD,Greenpoint and Williamsburg (CD1),Winter 2008-09,2008-12-01,4.33
...,...,...,...,...,...,...,...,...,...
16117,386,Ozone (O3),Mean,ppb,CD,Park Slope and Carroll Gardens (CD6),Summer 2020,2020-06-01,28.70
16118,386,Ozone (O3),Mean,ppb,CD,East New York and Starrett City (CD5),Summer 2020,2020-06-01,29.56
16119,386,Ozone (O3),Mean,ppb,CD,Bushwick (CD4),Summer 2020,2020-06-01,29.65
16120,386,Ozone (O3),Mean,ppb,CD,Bedford Stuyvesant (CD3),Summer 2020,2020-06-01,29.28


In [33]:
air_quality[air_quality['Geo Type Name'].str.contains('UHF42')]

Unnamed: 0,Indicator ID,Name,Measure,Measure Info,Geo Type Name,Geo Place Name,Time Period,Start_Date,Data Value
22,639,PM2.5-Attributable Deaths,Estimated Annual Rate - Adults 30 Yrs and Older,"per 100,000 adults",UHF42,Kingsbridge - Riverdale,2005-2007,2005-01-01,117.70
23,639,PM2.5-Attributable Deaths,Estimated Annual Rate - Adults 30 Yrs and Older,"per 100,000 adults",UHF42,Northeast Bronx,2005-2007,2005-01-01,77.30
24,639,PM2.5-Attributable Deaths,Estimated Annual Rate - Adults 30 Yrs and Older,"per 100,000 adults",UHF42,Fordham - Bronx Pk,2005-2007,2005-01-01,67.30
25,639,PM2.5-Attributable Deaths,Estimated Annual Rate - Adults 30 Yrs and Older,"per 100,000 adults",UHF42,Pelham - Throgs Neck,2005-2007,2005-01-01,73.60
26,639,PM2.5-Attributable Deaths,Estimated Annual Rate - Adults 30 Yrs and Older,"per 100,000 adults",UHF42,Crotona -Tremont,2005-2007,2005-01-01,65.80
...,...,...,...,...,...,...,...,...,...
16078,386,Ozone (O3),Mean,ppb,UHF42,Crotona -Tremont,Summer 2020,2020-06-01,30.15
16079,386,Ozone (O3),Mean,ppb,UHF42,Pelham - Throgs Neck,Summer 2020,2020-06-01,32.05
16080,386,Ozone (O3),Mean,ppb,UHF42,Fordham - Bronx Pk,Summer 2020,2020-06-01,30.17
16081,386,Ozone (O3),Mean,ppb,UHF42,Northeast Bronx,Summer 2020,2020-06-01,30.85


In [34]:
air_quality[air_quality['Geo Type Name'].str.contains('UHF34')]

Unnamed: 0,Indicator ID,Name,Measure,Measure Info,Geo Type Name,Geo Place Name,Time Period,Start_Date,Data Value
3600,383,Sulfur Dioxide (SO2),Mean,ppb,UHF34,Kingsbridge - Riverdale,Winter 2008-09,2008-12-01,6.62
3601,383,Sulfur Dioxide (SO2),Mean,ppb,UHF34,Northeast Bronx,Winter 2008-09,2008-12-01,5.38
3602,383,Sulfur Dioxide (SO2),Mean,ppb,UHF34,Fordham - Bronx Pk,Winter 2008-09,2008-12-01,9.48
3603,383,Sulfur Dioxide (SO2),Mean,ppb,UHF34,Pelham - Throgs Neck,Winter 2008-09,2008-12-01,5.15
3604,383,Sulfur Dioxide (SO2),Mean,ppb,UHF34,Greenpoint,Winter 2008-09,2008-12-01,4.25
...,...,...,...,...,...,...,...,...,...
16036,386,Ozone (O3),Mean,ppb,UHF34,Bedford Stuyvesant - Crown Heights,Summer 2020,2020-06-01,29.16
16037,386,Ozone (O3),Mean,ppb,UHF34,Downtown - Heights - Slope,Summer 2020,2020-06-01,28.79
16038,386,Ozone (O3),Mean,ppb,UHF34,Greenpoint,Summer 2020,2020-06-01,29.71
16039,386,Ozone (O3),Mean,ppb,UHF34,Pelham - Throgs Neck,Summer 2020,2020-06-01,32.05


In [35]:
air_quality_boros = air_quality[air_quality['Geo Type Name'].str.contains('Borough')]
air_quality_boros = air_quality_boros.drop(['Geo Type Name'], axis=1)
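
A note on the filter above: `str.contains` does substring (regex) matching, which works here because no other geography type contains the text `Borough`. A sketch of the difference from an exact comparison, on a toy frame whose data values are hypothetical:

```python
import pandas as pd

# The five geography types from the dataset; the data values are made up.
geo = pd.DataFrame({
    'Geo Type Name': ['CD', 'Borough', 'UHF42', 'Citywide', 'UHF34'],
    'Data Value': [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Substring match: 'UHF' selects both UHF42 and UHF34.
substring_match = geo[geo['Geo Type Name'].str.contains('UHF')]

# Exact match: only rows whose value is exactly 'Borough'.
exact_match = geo[geo['Geo Type Name'] == 'Borough'].drop(columns=['Geo Type Name'])

print(len(substring_match), len(exact_match))
```

An exact comparison may be the safer choice if new geography types with overlapping names are ever added.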

In [36]:
air_quality_boros.sample(10)

Unnamed: 0,Indicator ID,Name,Measure,Measure Info,Geo Place Name,Time Period,Start_Date,Data Value
6,386,Ozone (O3),Mean,ppb,Brooklyn,Summer 2009,2009-06-01,26.27
13260,365,Fine Particulate Matter (PM2.5),Mean,mcg per cubic meter,Queens,Summer 2018,2018-06-01,8.3
1234,661,O3-Attributable Asthma Hospitalizations,Estimated Annual Rate- 18 Yrs and Older,"per 100,000 adults",Queens,2009-2011,2009-01-01,4.2
15842,365,Fine Particulate Matter (PM2.5),Mean,mcg per cubic meter,Queens,Winter 2019-20,2019-12-01,7.55
10229,365,Fine Particulate Matter (PM2.5),Mean,mcg per cubic meter,Bronx,Annual Average 2016,2015-12-31,7.75
13254,365,Fine Particulate Matter (PM2.5),Mean,mcg per cubic meter,Brooklyn,Summer 2018,2018-06-01,8.41
554,653,O3-Attributable Asthma Emergency Department Vi...,Estimated Annual Rate- Children 0 to 17 Yrs Old,"per 100,000 children",Queens,2005-2007,2005-01-01,65.7
747,661,O3-Attributable Asthma Hospitalizations,Estimated Annual Rate- 18 Yrs and Older,"per 100,000 adults",Brooklyn,2005-2007,2005-01-01,8.0
10654,386,Ozone (O3),Mean,ppb,Queens,Summer 2016,2016-05-31,33.98
15146,375,Nitrogen Dioxide (NO2),Mean,ppb,Queens,Annual Average 2020,2020-01-01,14.82


In [37]:
for time_period in air_quality_boros['Time Period'].sort_values().unique():
    print(time_period, 'length:', len(time_period))

2005 length: 4
2005-2007 length: 9
2009-2011 length: 9
2011 length: 4
2012-2014 length: 9
2013 length: 4
2015 length: 4
2015-2017 length: 9
2016 length: 4
Annual Average 2009 length: 19
Annual Average 2010 length: 19
Annual Average 2011 length: 19
Annual Average 2012 length: 19
Annual Average 2013 length: 19
Annual Average 2014 length: 19
Annual Average 2015 length: 19
Annual Average 2016 length: 19
Annual Average 2017 length: 19
Annual Average 2018 length: 19
Annual Average 2019 length: 19
Annual Average 2020 length: 19
Summer 2009 length: 11
Summer 2010 length: 11
Summer 2011 length: 11
Summer 2012 length: 11
Summer 2013 length: 11
Summer 2014 length: 11
Summer 2015 length: 11
Summer 2016 length: 11
Summer 2017 length: 11
Summer 2018 length: 11
Summer 2019 length: 11
Summer 2020 length: 11
Winter 2008-09 length: 14
Winter 2009-10 length: 14
Winter 2010-11 length: 14
Winter 2011-12 length: 14
Winter 2012-13 length: 14
Winter 2013-14 length: 14
Winter 2014-15 length: 14
Winter 2015-16 

In [38]:
air_quality.groupby('Time Period')['Start_Date'].value_counts()

Time Period          Start_Date
2005                 2005-01-01    407
2005-2007            2005-01-01    480
2009-2011            2009-01-01    480
2011                 2011-01-01    214
2012-2014            2012-01-02    480
2013                 2013-01-01    144
2015                 2015-01-01    144
2015-2017            2015-01-01    480
2016                 2016-01-01    321
Annual Average 2009  2008-12-01    282
Annual Average 2010  2009-12-01    282
Annual Average 2011  2010-12-01    282
Annual Average 2012  2011-12-01    282
Annual Average 2013  2012-12-01    282
Annual Average 2014  2013-12-01    282
Annual Average 2015  2015-01-01    282
Annual Average 2016  2015-12-31    282
Annual Average 2017  2017-01-01    282
Annual Average 2018  2018-01-01    282
Annual Average 2019  2019-01-01    282
Annual Average 2020  2020-01-01    282
Summer 2009          2009-06-01    423
Summer 2010          2010-06-01    423
Summer 2011          2011-06-01    423
Summer 2012          2012-06-01 

In [39]:
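def create_end_dates(air_quality_boros_dict):
    """Derive an end date for each record from its Time Period label.

    The label format is identified by its string length.
    """
    end_dates = []
    for row in air_quality_boros_dict:
        time_period = row['Time Period']
        time_period_str_len = len(time_period)
        year = row['Start_Date'].year
        if time_period_str_len == 4:       # e.g. '2005'
            date = datetime.date(year, 12, 31)
        elif time_period_str_len == 9:     # e.g. '2005-2007'
            date = datetime.date(year + 2, 12, 31)
        elif time_period_str_len == 11:    # e.g. 'Summer 2013'
            date = datetime.date(year, 8, 31)
        elif time_period_str_len == 14:    # e.g. 'Winter 2008-09'
            # Feb 28 is used even when the following year is a leap year
            date = datetime.date(year + 1, 2, 28)
        elif time_period_str_len == 19:    # e.g. 'Annual Average 2009'
            # Start_Date can fall in the previous year, so read the year from the label
            year = int(time_period[-4:])
            date = datetime.date(year, 12, 31)
        else:
            # Fail on an unexpected label instead of reusing a stale date
            raise ValueError(f'Unrecognized Time Period: {time_period!r}')
        end_dates.append(date)
    return end_dates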

In [40]:
air_quality_boros['end_date'] = create_end_dates(air_quality_boros.to_dict('records'))
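
`create_end_dates` identifies the label format by string length and always ends winter on February 28. As a hedged alternative, the sketch below matches each label with a regular expression and uses `calendar.monthrange` to find the last day of February, so leap years end on the 29th. The function name `end_date_for` is hypothetical, and the season end dates (August 31, end of February) are the same assumptions the notebook already makes.

```python
import calendar
import datetime
import re


def end_date_for(time_period):
    """Return an assumed end date for a Time Period label (hypothetical helper)."""
    # '2015' or 'Annual Average 2015' -> Dec 31 of that year
    m = re.fullmatch(r'(?:Annual Average )?(\d{4})', time_period)
    if m:
        return datetime.date(int(m.group(1)), 12, 31)
    # '2005-2007' -> Dec 31 of the last year in the range
    m = re.fullmatch(r'\d{4}-(\d{4})', time_period)
    if m:
        return datetime.date(int(m.group(1)), 12, 31)
    # 'Summer 2016' -> Aug 31 of that year
    m = re.fullmatch(r'Summer (\d{4})', time_period)
    if m:
        return datetime.date(int(m.group(1)), 8, 31)
    # 'Winter 2011-12' -> last day of February in the following year
    m = re.fullmatch(r'Winter (\d{4})-\d{2}', time_period)
    if m:
        year = int(m.group(1)) + 1
        last_day = calendar.monthrange(year, 2)[1]
        return datetime.date(year, 2, last_day)
    raise ValueError(f'Unrecognized Time Period: {time_period!r}')


print(end_date_for('Winter 2011-12'))
print(end_date_for('2005-2007'))
```

It could be applied with `air_quality_boros['Time Period'].map(end_date_for)`. Unlike the length-based version, it reads every year from the label rather than from `Start_Date`.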

In [41]:
air_quality_boros.sample(10)

Unnamed: 0,Indicator ID,Name,Measure,Measure Info,Geo Place Name,Time Period,Start_Date,Data Value,end_date
6319,383,Sulfur Dioxide (SO2),Mean,ppb,Queens,Winter 2012-13,2012-12-01,1.01,2013-02-28
159,641,Boiler Emissions- Total PM2.5 Emissions,Number per km2,number,Queens,2013,2013-01-01,0.3,2013-12-31
544,386,Ozone (O3),Mean,ppb,Manhattan,Summer 2011,2011-06-01,27.27,2011-08-31
1041,652,O3-Attributable Cardiac and Respiratory Deaths,Estimated Annual Rate,"per 100,000 residents",Staten Island,2009-2011,2009-01-01,7.8,2011-12-31
17,386,Ozone (O3),Mean,ppb,Brooklyn,Summer 2012,2012-06-01,33.89,2012-08-31
405,650,PM2.5-Attributable Respiratory Hospitalization...,Estimated Annual Rate,"per 100,000 adults",Bronx,2005-2007,2005-01-01,30.8,2007-12-31
6313,383,Sulfur Dioxide (SO2),Mean,ppb,Manhattan,Winter 2011-12,2011-12-01,4.81,2012-02-28
14582,365,Fine Particulate Matter (PM2.5),Mean,mcg per cubic meter,Bronx,Winter 2018-19,2018-12-01,7.46,2019-02-28
9383,375,Nitrogen Dioxide (NO2),Mean,ppb,Bronx,Annual Average 2015,2015-01-01,19.97,2015-12-31
13252,365,Fine Particulate Matter (PM2.5),Mean,mcg per cubic meter,Bronx,Winter 2017-18,2017-12-01,8.28,2018-02-28


In [42]:
air_quality_boros['Indicator ID'].nunique()

22

In [43]:
print(air_quality.groupby(['Name', 'Measure'])['Indicator ID'].value_counts())
print(len(air_quality.groupby(['Name', 'Measure'])['Indicator ID'].value_counts()))

Name                                                                          Measure                                          Indicator ID
Air Toxics Concentrations- Average Benzene Concentrations                     Annual Average Concentration                     646              155
Air Toxics Concentrations- Average Formaldehyde Concentrations                Annual Average Concentration                     647              155
Boiler Emissions- Total NOx Emissions                                         Number per km2                                   642               96
Boiler Emissions- Total PM2.5 Emissions                                       Number per km2                                   641               96
Boiler Emissions- Total SO2 Emissions                                         Number per km2                                   640               96
Fine Particulate Matter (PM2.5)                                               Mean                                      

In [44]:
air_quality_boros = air_quality_boros.drop(['Indicator ID'], axis=1)

In [45]:
air_quality_boros.sample(10)

Unnamed: 0,Name,Measure,Measure Info,Geo Place Name,Time Period,Start_Date,Data Value,end_date
555,O3-Attributable Asthma Emergency Department Vi...,Estimated Annual Rate- Children 0 to 17 Yrs Old,"per 100,000 children",Staten Island,2005-2007,2005-01-01,49.9,2007-12-31
8973,Fine Particulate Matter (PM2.5),Mean,mcg per cubic meter,Staten Island,Summer 2015,2015-06-01,8.68,2015-08-31
15845,Fine Particulate Matter (PM2.5),Mean,mcg per cubic meter,Manhattan,Winter 2019-20,2019-12-01,9.13,2020-02-28
13256,Fine Particulate Matter (PM2.5),Mean,mcg per cubic meter,Manhattan,Annual Average 2018,2018-01-01,8.46,2018-12-31
13110,Ozone (O3),Mean,ppb,Bronx,Summer 2018,2018-06-01,30.78,2018-08-31
14116,O3-Attributable Cardiac and Respiratory Deaths,Estimated Annual Rate,"per 100,000 residents",Brooklyn,2015-2017,2015-01-01,5.0,2017-12-31
301,Air Toxics Concentrations- Average Formaldehyd...,Annual Average Concentration,µg/m3,Bronx,2005,2005-01-01,3.3,2005-12-31
9395,Nitrogen Dioxide (NO2),Mean,ppb,Staten Island,Annual Average 2015,2015-01-01,13.72,2015-12-31
11562,PM2.5-Attributable Cardiovascular Hospitalizat...,Estimated Annual Rate,"per 100,000 adults",Queens,2012-2014,2012-01-02,11.060049,2014-12-31
8591,Air Toxics Concentrations- Average Formaldehyd...,Annual Average Concentration,µg/m3,Brooklyn,2011,2011-01-01,2.2,2011-12-31


In [46]:
air_quality_boros.groupby(['Name'])['Measure'].value_counts()

Name                                                                          Measure                                        
Air Toxics Concentrations- Average Benzene Concentrations                     Annual Average Concentration                        10
Air Toxics Concentrations- Average Formaldehyde Concentrations                Annual Average Concentration                        10
Boiler Emissions- Total NOx Emissions                                         Number per km2                                      10
Boiler Emissions- Total PM2.5 Emissions                                       Number per km2                                      10
Boiler Emissions- Total SO2 Emissions                                         Number per km2                                      10
Fine Particulate Matter (PM2.5)                                               Mean                                               180
Nitrogen Dioxide (NO2)                                                      

In [47]:
air_quality_boros[air_quality_boros['Time Period'].str.contains("Annual")]['Measure'].unique()

array(['Mean'], dtype=object)

In [48]:
air_quality_boros[air_quality_boros['Time Period'].str.contains("Annual")]['Name'].unique()

array(['Nitrogen Dioxide (NO2)', 'Fine Particulate Matter (PM2.5)'],
      dtype=object)

In [49]:
air_quality_boros[air_quality_boros['Measure'].str.contains("Mean")]['Time Period'].unique()

array(['Summer 2013', 'Summer 2014', 'Summer 2009', 'Summer 2010',
       'Summer 2011', 'Summer 2012', 'Winter 2008-09', 'Winter 2009-10',
       'Winter 2010-11', 'Winter 2011-12', 'Winter 2012-13',
       'Annual Average 2009', 'Annual Average 2010',
       'Annual Average 2011', 'Annual Average 2012',
       'Annual Average 2013', 'Winter 2013-14', 'Annual Average 2014',
       'Annual Average 2015', 'Summer 2015', 'Winter 2014-15',
       'Winter 2015-16', 'Summer 2016', 'Annual Average 2016',
       'Summer 2017', 'Annual Average 2017', 'Winter 2016-17',
       'Annual Average 2018', 'Summer 2018', 'Winter 2017-18',
       'Annual Average 2019', 'Summer 2019', 'Winter 2018-19',
       'Annual Average 2020', 'Winter 2019-20', 'Summer 2020'],
      dtype=object)

## Traffic Volume Dataset Cleaning

In [50]:
traffic_volume.sample(10)

Unnamed: 0,RequestID,Boro,Yr,M,D,HH,MM,Vol,SegmentID,WktGeom,street,fromSt,toSt,Direction
5419833,21918,Bronx,2016,2,22,3,0,3,77339,POINT (1011353.3287200108 238854.19494052036),UNION AVENUE,East 161 Street,East 163 Street,NB
25919884,2894,Manhattan,2010,10,8,2,30,28,31903,POINT (982595.9 203309.6),S/B VARICK ST(2 LANE W/S) S OF BROOME ST,BROOME ST/HOLLAND TUN APPR,DOMINICK ST,SB
8934799,12181,Brooklyn,2011,11,18,2,0,34,22388,POINT (985058.7 186985.5),COURT ST APPROACH TO 2ND PL,2 PL,1 PL,SB
21303386,10533,Brooklyn,2012,10,1,10,45,22,22728,POINT (988336.8 181356.4),14 ST,6 AV,7 AV,WB
303913,24003,Queens,2016,7,19,4,45,119,111705,POINT (1007159.8642303782 220108.66816084864),HOYT AVENUE NORTH,31 Street,29 Street,NB
16004957,28809,Bronx,2019,1,5,21,30,110,69639,POINT (1003784.5999818614 235373.2570641422),EAST 138 STREET,Metro North Harlem Line,Park Avenue,EB
10134934,12351,Manhattan,2010,3,16,5,30,19,37869,POINT (998583.5 222145.5),E 86TH ST BET 1ST AVE & YORK AVE,YORK AV,1 AV,WB
13465466,23544,Queens,2016,6,8,18,45,63,190825,POINT (1021690.5981275195 205616.81889445538),QUEENS BOULEVARD,Dead End,63 Avenue,WB
3860926,21148,Queens,2016,1,16,21,15,28,155832,POINT (1035223.8 196287.7),QUEENS BLVD,87 AV,HILLSIDE AV,WB
22500940,16632,Queens,2014,4,19,22,0,350,147123,POINT (1029496.1 217223.1),FLUSHING BR,NORTHERN BLVD,NORTHERN BLVD,WB


In [51]:
traffic_volume.shape

(27190511, 14)

In [52]:
print(traffic_volume.isnull().sum() / len(traffic_volume))

RequestID    0.000000
Boro         0.000000
Yr           0.000000
M            0.000000
D            0.000000
HH           0.000000
MM           0.000000
Vol          0.000000
SegmentID    0.000000
WktGeom      0.000000
street       0.000000
fromSt       0.000000
toSt         0.000074
Direction    0.000000
dtype: float64


In [53]:
print(traffic_volume.nunique() / len(traffic_volume))

RequestID    2.607527e-04
Boro         1.838877e-07
Yr           5.884406e-07
M            4.413304e-07
D            1.140104e-06
HH           8.826609e-07
MM           1.471101e-07
Vol          1.476986e-04
SegmentID    5.499345e-04
WktGeom      7.525787e-04
street       2.482484e-04
fromSt       2.361486e-04
toSt         2.175391e-04
Direction    2.206652e-07
dtype: float64


In [54]:
traffic_volume.nunique()

RequestID     7090
Boro             5
Yr              16
M               12
D               31
HH              24
MM               4
Vol           4016
SegmentID    14953
WktGeom      20463
street        6750
fromSt        6421
toSt          5915
Direction        6
dtype: int64

In [55]:
traffic_volume.dtypes

RequestID     int64
Boro         object
Yr            int64
M             int64
D             int64
HH            int64
MM            int64
Vol           int64
SegmentID     int64
WktGeom      object
street       object
fromSt       object
toSt         object
Direction    object
dtype: object

In [56]:
traffic_volume.Yr.min()

2000

In [57]:
traffic_volume = traffic_volume[traffic_volume['Yr'] >= 2005]

In [58]:
traffic_volume.shape

(27188607, 14)

In [59]:
traffic_volume['Yr'].value_counts().sort_index()

2006        664
2007      11780
2008      68591
2009    1012766
2010    1421397
2011    1238391
2012    2434583
2013    2829656
2014    3708367
2015    3232005
2016    3362243
2017    3013530
2018    2046443
2019    2365633
2020     442558
Name: Yr, dtype: int64

In [60]:
traffic_volume = traffic_volume[traffic_volume['Yr'] > 2008]

In [61]:
traffic_volume.shape

(27107572, 14)

In [62]:
traffic_volume['date_time'] = pd.to_datetime(dict(year=traffic_volume.Yr,
                                                  month=traffic_volume.M,
                                                  day=traffic_volume.D,
                                                  hour=traffic_volume.HH,
                                                  minute=traffic_volume.MM))

In [63]:
traffic_volume = traffic_volume.drop(['Yr', 'M', 'D', 'HH', 'MM'], axis=1)

In [64]:
traffic_volume.sample(10)

Unnamed: 0,RequestID,Boro,Vol,SegmentID,WktGeom,street,fromSt,toSt,Direction,date_time
18510154,27437,Queens,77,67573,POINT (1003923.5058462786 217529.72337725165),BROADWAY,23 Street,Crescent Street,EB,2018-02-04 18:00:00
464779,26799,Queens,98,62211,POINT (1055108.9938887337 161625.57818784998),CENTRAL AVENUE,Dead End,Virginia Street,EB,2017-05-02 17:15:00
14018724,12290,Brooklyn,203,41160,POINT (1004301.9 167786.5),UTICA AVE SB APPROACH TO FLATLANDS AVE,FLATLANDS AV,AV J,SB,2010-07-17 15:15:00
8602635,18050,Queens,267,153288,POINT (1030902.6906675637 210016.6205223442),LONG ISLAND EXPRESSWAY,Dead End,Dead end,EB,2014-11-22 03:45:00
1789701,32384,Manhattan,351,36375,POINT (994320.9493131964 216734.2445431342),2 AVENUE,Astoria Line,East 61 Street,SB,2020-10-21 06:30:00
8979246,23557,Brooklyn,45,35487,POINT (998258.9271810555 203130.1726799544),NASSAU AVENUE,Leonard Street,Eckford Street,SB,2016-06-14 14:45:00
18039145,1360,Queens,61,54914,POINT (1033207 187752.3),W/B 109 AVE @ 117 ST,117 ST,118 ST,WB,2010-01-19 18:30:00
11382805,30153,Queens,147,102994,POINT (1067176.7350721152 208645.26446477076),HILLSIDE AVENUE,268 Street,Langdale Street,EB,2019-05-09 09:45:00
13968867,18590,Brooklyn,57,152674,POINT (1004377.8179838859 199434.80382497213),METROPOLITAN AVENUE,Stewart Avenue,Gardner Avenue,EB,2014-10-15 00:45:00
15303165,27019,Brooklyn,5,191563,POINT (991745.3874004212 176266.48862114173),OCEAN PARKWAY,Bike Path,Dead end,EB,2017-11-01 03:45:00


In [65]:
traffic_volume['date_time'].dt.year.value_counts().sort_index()

2009    1012766
2010    1421397
2011    1238391
2012    2434583
2013    2829656
2014    3708367
2015    3232005
2016    3362243
2017    3013530
2018    2046443
2019    2365633
2020     442558
Name: date_time, dtype: int64

In [66]:
traffic_volume['date_time'].dt.year.value_counts().sort_index() / len(traffic_volume)

2009    0.037361
2010    0.052435
2011    0.045684
2012    0.089812
2013    0.104386
2014    0.136802
2015    0.119229
2016    0.124033
2017    0.111169
2018    0.075493
2019    0.087268
2020    0.016326
Name: date_time, dtype: float64

In [67]:
traffic_volume.head()

Unnamed: 0,RequestID,Boro,Vol,SegmentID,WktGeom,street,fromSt,toSt,Direction,date_time
0,20856,Queens,9,171896,POINT (1052296.600156678 199785.26932711253),94 AVENUE,207 Street,Francis Lewis Boulevard,WB,2015-06-23 23:30:00
1,21231,Staten Island,6,9896,POINT (942668.0589509147 171441.21296926),RICHMOND TERRACE,Wright Avenue,Emeric Court,WB,2015-09-14 04:15:00
2,29279,Bronx,85,77817,POINT (1016508.0034050211 235221.59092266942),HUNTS POINT AVENUE,Whittier Street,Randall Avenue,NB,2017-10-19 04:30:00
3,27019,Brooklyn,168,188023,POINT (992925.4316054962 184116.82855457635),FLATBUSH AVENUE,Brighton Line,Brighton Line,NB,2017-11-07 18:30:00
4,26734,Manhattan,355,137516,POINT (1004175.9505178436 247779.63624949602),WASHINGTON BRIDGE,Harlem River Shoreline,Harlem River Shoreline,EB,2017-11-03 22:00:00


In [68]:
# Sort newest observations first
traffic_volume = traffic_volume.sort_values('date_time', ascending=False)
traffic_volume.head(15)

Unnamed: 0,RequestID,Boro,Vol,SegmentID,WktGeom,street,fromSt,toSt,Direction,date_time
25543663,32417,Queens,20,67665,POINT (1004228.4823799994 215767.68782613552),31 STREET,34 Avenue,35 Avenue,SB,2020-11-22 23:45:00
20968423,32417,Queens,3,101621,POINT (1055667.922729934 216597.78334720692),DOUGLASTON PARKWAY,Maryland Road,Van Zandt Avenue,SB,2020-11-22 23:45:00
6368254,32417,Queens,27,148877,POINT (1015917.5752772316 218664.23589469263),23 AVENUE,Dead End,85 Street,WB,2020-11-22 23:45:00
7417214,32417,Queens,22,155846,POINT (1050166.9282559236 199291.28045636677),JAMAICA AVENUE,197 Street,198 Street,WB,2020-11-22 23:45:00
18926986,32417,Queens,5,76510,POINT (1017222.1345317845 216101.64900761063),31 AVENUE,87 Street,88 Street,WB,2020-11-22 23:45:00
13407547,32417,Queens,60,155773,POINT (1027092.2981505651 190399.9018293268),ATLANTIC AVENUE,97 Street,98 Street,WB,2020-11-22 23:45:00
7423333,32417,Queens,8,75313,POINT (1019421.0343045937 208800.348152701),51 AVENUE,90 Street,92 Street,EB,2020-11-22 23:45:00
18674125,32417,Queens,33,45497,POINT (1009925.0151697205 199046.51050050836),METROPOLITAN AVENUE,55 Street,56 Street,WB,2020-11-22 23:45:00
18956053,32417,Queens,6,145416,POINT (1046510.2750956557 205583.03286028266),UNION TURNPIKE,Dead End,Dead end,EB,2020-11-22 23:45:00
9171262,32417,Queens,5,101855,POINT (1057510.6819890984 218109.6099654521),MARATHON PARKWAY,Rushmore Avenue,Morenci Lane,SB,2020-11-22 23:45:00


In [69]:
traffic_vol_daily = traffic_volume.groupby(['Boro', traffic_volume['date_time'].dt.date])['Vol'].mean().reset_index()
traffic_vol_daily.columns = traffic_vol_daily.columns.str.lower()
traffic_vol_daily.rename(columns={'date_time':'date'}, inplace=True)

In [70]:
traffic_vol_daily.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12737 entries, 0 to 12736
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   boro    12737 non-null  object 
 1   date    12737 non-null  object 
 2   vol     12737 non-null  float64
dtypes: float64(1), object(2)
memory usage: 298.6+ KB
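
The `info()` output lists `date` as `object` because `.dt.date` produces Python `datetime.date` values. As a sketch on toy data (the frame `toy` and its values are made up for illustration), grouping on `dt.normalize()` gives the same daily mean per borough while keeping a `datetime64` column. The notebook's later comparisons against `date_local` rely on `datetime.date` objects, so this is an alternative to consider rather than a drop-in change.

```python
import pandas as pd

# Toy 15-minute counts; values are illustrative only.
toy = pd.DataFrame({
    'Boro': ['Bronx', 'Bronx', 'Queens'],
    'Vol': [10, 30, 8],
    'date_time': pd.to_datetime(['2020-01-01 00:15',
                                 '2020-01-01 23:45',
                                 '2020-01-01 12:00']),
})

# normalize() sets the time to midnight but keeps the datetime64 dtype.
daily = (toy.groupby(['Boro', toy['date_time'].dt.normalize()])['Vol']
            .mean()
            .reset_index()
            .rename(columns={'Boro': 'boro', 'Vol': 'vol', 'date_time': 'date'}))

print(daily.dtypes)
print(daily)
```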


In [71]:
print(len(traffic_vol_daily[traffic_vol_daily['boro'] == 'Bronx']))
traffic_vol_daily[traffic_vol_daily['boro'] == 'Bronx'].head(10)

2306


Unnamed: 0,boro,date,vol
0,Bronx,2009-01-20,55.559748
1,Bronx,2009-01-21,42.722222
2,Bronx,2009-01-22,45.707224
3,Bronx,2009-01-23,49.243056
4,Bronx,2009-01-24,33.958333
5,Bronx,2009-01-25,25.475694
6,Bronx,2009-01-26,2.755556
7,Bronx,2009-02-21,57.924731
8,Bronx,2009-02-22,32.234375
9,Bronx,2009-02-23,42.192708


In [72]:
print(len(traffic_vol_daily[traffic_vol_daily['boro'] == 'Queens']))
traffic_vol_daily[traffic_vol_daily['boro'] == 'Queens'].head(10)

3098


Unnamed: 0,boro,date,vol
7970,Queens,2009-01-12,13.409091
7971,Queens,2009-01-13,89.291498
7972,Queens,2009-01-14,66.995444
7973,Queens,2009-01-15,64.7125
7974,Queens,2009-01-16,87.118483
7975,Queens,2009-01-17,8.885057
7976,Queens,2009-02-09,34.076517
7977,Queens,2009-02-10,24.81994
7978,Queens,2009-02-11,25.796131
7979,Queens,2009-02-12,26.970238


In [73]:
print(traffic_vol_daily[traffic_vol_daily['boro'] == 'Brooklyn'].shape[0])
traffic_vol_daily[traffic_vol_daily['boro'] == 'Brooklyn'].head(10)

3143


Unnamed: 0,boro,date,vol
2306,Brooklyn,2009-01-12,40.478992
2307,Brooklyn,2009-01-13,33.807292
2308,Brooklyn,2009-01-14,18.541667
2309,Brooklyn,2009-01-15,12.307292
2310,Brooklyn,2009-01-16,34.046875
2311,Brooklyn,2009-01-17,29.776042
2312,Brooklyn,2009-01-18,22.140625
2313,Brooklyn,2009-01-19,17.057292
2314,Brooklyn,2009-01-20,34.370052
2315,Brooklyn,2009-01-21,55.906667


In [74]:
print(traffic_vol_daily[traffic_vol_daily['boro'] == 'Manhattan'].shape[0])
traffic_vol_daily[traffic_vol_daily['boro'] == 'Manhattan'].head(10)

2521


Unnamed: 0,boro,date,vol
5449,Manhattan,2009-01-08,54.047619
5450,Manhattan,2009-01-09,88.436326
5451,Manhattan,2009-01-10,70.91875
5452,Manhattan,2009-01-11,59.25625
5453,Manhattan,2009-01-12,71.082038
5454,Manhattan,2009-01-13,65.396577
5455,Manhattan,2009-01-14,69.708333
5456,Manhattan,2009-01-15,69.952381
5457,Manhattan,2009-01-16,79.55878
5458,Manhattan,2009-01-17,71.813244


In [75]:
print(traffic_vol_daily[traffic_vol_daily['boro'] == 'Staten Island'].shape[0])
traffic_vol_daily[traffic_vol_daily['boro'] == 'Staten Island'].head(10)

1669


Unnamed: 0,boro,date,vol
11068,Staten Island,2009-02-06,73.077844
11069,Staten Island,2009-02-07,46.930556
11070,Staten Island,2009-02-08,38.333333
11071,Staten Island,2009-02-09,42.625
11072,Staten Island,2009-02-10,30.1875
11073,Staten Island,2009-02-11,40.204861
11074,Staten Island,2009-02-12,24.638889
11075,Staten Island,2009-02-13,5.555556
11076,Staten Island,2009-02-14,3.614583
11077,Staten Island,2009-02-15,2.788194


In [89]:
print(str(traffic_vol_daily['date'].min()).replace('-', ''))

20090108


In [85]:
traffic_vol_daily['date'].max()

datetime.date(2020, 11, 22)

## EPA data

In [86]:
email = 'daguila000@citymail.cuny.edu'
key = 'cobaltcrane81'
param_code = ''
start_date = ''
end_date = ''
state_code = '36'
county_codes = ['']


In [77]:
import json

In [78]:
url = 'https://aqs.epa.gov/data/api/list/states?email=test@aqs.api&key=test'

response = requests.get(url)
j = json.loads(response.text)
print(json.dumps(j, indent=2))

{
  "Header": [
    {
      "status": "Success",
      "request_time": "2022-12-14T21:37:08-05:00",
      "url": "https://aqs.epa.gov/data/api/list/states?email=test@aqs.api&key=test",
      "rows": 56
    }
  ],
  "Data": [
    {
      "code": "01",
      "value_represented": "Alabama"
    },
    {
      "code": "02",
      "value_represented": "Alaska"
    },
    {
      "code": "04",
      "value_represented": "Arizona"
    },
    {
      "code": "05",
      "value_represented": "Arkansas"
    },
    {
      "code": "06",
      "value_represented": "California"
    },
    {
      "code": "08",
      "value_represented": "Colorado"
    },
    {
      "code": "09",
      "value_represented": "Connecticut"
    },
    {
      "code": "10",
      "value_represented": "Delaware"
    },
    {
      "code": "11",
      "value_represented": "District Of Columbia"
    },
    {
      "code": "12",
      "value_represented": "Florida"
    },
    {
      "code": "13",
      "value_represented": 

In [79]:
#print(j['Data']['value_represented'])

state_code = ''

for state in j['Data']:
    if state['value_represented'] == 'New York':
        state_code = state['code']
        
print(state_code)

36
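
The same lookup pattern is used again for the county list. A small helper that matches names exactly (rather than by substring or regular expression) could cover both cases. `find_codes` is a hypothetical name, and `sample_data` below is a hand-copied excerpt shaped like the `Data` list in the responses above, not a live API call.

```python
def find_codes(data, names):
    """Map each requested name to its code, using exact name matches.

    `data` is a list of dicts with 'code' and 'value_represented' keys,
    as in the 'Data' field of the AQS list responses shown above.
    """
    wanted = set(names)
    return {row['value_represented']: row['code']
            for row in data
            if row['value_represented'] in wanted}


# Excerpt in the shape of the countiesByState response.
sample_data = [
    {'code': '001', 'value_represented': 'Albany'},
    {'code': '005', 'value_represented': 'Bronx'},
    {'code': '007', 'value_represented': 'Broome'},
]

print(find_codes(sample_data, ['Bronx', 'Kings']))
```

Names that are absent from the response are simply missing from the result, which makes it easy to check that every expected county was found.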


In [106]:
url = f'https://aqs.epa.gov/data/api/list/countiesByState?email=test@aqs.api&key=test&state={state_code}'

response = requests.get(url)
j = json.loads(response.text)
print(json.dumps(j, indent=2))

{
  "Header": [
    {
      "status": "Success",
      "request_time": "2022-12-14T22:20:57-05:00",
      "url": "https://aqs.epa.gov/data/api/list/countiesByState?email=test@aqs.api&key=test&state=36",
      "rows": 62
    }
  ],
  "Data": [
    {
      "code": "001",
      "value_represented": "Albany"
    },
    {
      "code": "003",
      "value_represented": "Allegany"
    },
    {
      "code": "005",
      "value_represented": "Bronx"
    },
    {
      "code": "007",
      "value_represented": "Broome"
    },
    {
      "code": "009",
      "value_represented": "Cattaraugus"
    },
    {
      "code": "011",
      "value_represented": "Cayuga"
    },
    {
      "code": "013",
      "value_represented": "Chautauqua"
    },
    {
      "code": "015",
      "value_represented": "Chemung"
    },
    {
      "code": "017",
      "value_represented": "Chenango"
    },
    {
      "code": "019",
      "value_represented": "Clinton"
    },
    {
      "code": "021",
      "value_rep

In [107]:
import re

county_codes = []

for county in j['Data']:
    if re.search('Bronx|Kings|New York|Queens|Richmond', county['value_represented']):
        county_codes.append(county['code'])
        
print(county_codes)

{'Header': [{'status': 'Success', 'request_time': '2022-12-14T22:20:57-05:00', 'url': 'https://aqs.epa.gov/data/api/list/countiesByState?email=test@aqs.api&key=test&state=36', 'rows': 62}], 'Data': [{'code': '001', 'value_represented': 'Albany'}, {'code': '003', 'value_represented': 'Allegany'}, {'code': '005', 'value_represented': 'Bronx'}, {'code': '007', 'value_represented': 'Broome'}, {'code': '009', 'value_represented': 'Cattaraugus'}, {'code': '011', 'value_represented': 'Cayuga'}, {'code': '013', 'value_represented': 'Chautauqua'}, {'code': '015', 'value_represented': 'Chemung'}, {'code': '017', 'value_represented': 'Chenango'}, {'code': '019', 'value_represented': 'Clinton'}, {'code': '021', 'value_represented': 'Columbia'}, {'code': '023', 'value_represented': 'Cortland'}, {'code': '025', 'value_represented': 'Delaware'}, {'code': '027', 'value_represented': 'Dutchess'}, {'code': '029', 'value_represented': 'Erie'}, {'code': '031', 'value_represented': 'Essex'}, {'code': '033'

In [129]:
start_year = traffic_vol_daily['date'].min().year
end_year = traffic_vol_daily['date'].max().year

for year in range(start_year, end_year + 1):
    print(f'{year}0101')

20090101
20100101
20110101
20120101
20130101
20140101
20150101
20160101
20170101
20180101
20190101
20200101


In [187]:
import time

# AQS parameter code 88101 is "PM2.5 - Local Conditions"
param_code = '88101'
daily_air_quality_list = []

start_year = traffic_vol_daily['date'].min().year
end_year = traffic_vol_daily['date'].max().year

# Request one calendar year per county at a time
for year in range(start_year, end_year + 1):
    start_date = f'{year}0101'
    end_date = f'{year}1231'
    for county_code in county_codes:
        url = f'https://aqs.epa.gov/data/api/dailyData/byCounty?email={email}&key={key}&param={param_code}&bdate={start_date}&edate={end_date}&state={state_code}&county={county_code}'
        response = requests.get(url)
        j = json.loads(response.text)
        print(j['Header'])
        daily_air_quality_list.extend(j['Data'])
        time.sleep(6)  # pause between requests


[{'status': 'Success', 'request_time': '2022-12-15T01:48:06-05:00', 'url': 'https://aqs.epa.gov/data/api/dailyData/byCounty?email=daguila000@citymail.cuny.edu&key=cobaltcrane81&param=88101&bdate=20090101&edate=20091231&state=36&county=005', 'rows': 4080}]
[{'status': 'Success', 'request_time': '2022-12-15T01:48:14-05:00', 'url': 'https://aqs.epa.gov/data/api/dailyData/byCounty?email=daguila000@citymail.cuny.edu&key=cobaltcrane81&param=88101&bdate=20090101&edate=20091231&state=36&county=047', 'rows': 672}]
[{'status': 'Success', 'request_time': '2022-12-15T01:48:21-05:00', 'url': 'https://aqs.epa.gov/data/api/dailyData/byCounty?email=daguila000@citymail.cuny.edu&key=cobaltcrane81&param=88101&bdate=20090101&edate=20091231&state=36&county=061', 'rows': 2694}]
[{'status': 'Success', 'request_time': '2022-12-15T01:48:28-05:00', 'url': 'https://aqs.epa.gov/data/api/dailyData/byCounty?email=daguila000@citymail.cuny.edu&key=cobaltcrane81&param=88101&bdate=20090101&edate=20091231&state=36&count
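
The loop above requests full calendar years and the result is trimmed to the traffic date range afterwards. As a sketch, the per-year windows could instead be clipped to that range up front, and the query string could be built from a dictionary so the credentials are not interpolated into the URL by hand. `yearly_windows` and `build_params` are hypothetical helpers; the email and key values below are placeholders.

```python
import datetime


def yearly_windows(first, last):
    """Yield (bdate, edate) strings in YYYYMMDD form, one pair per calendar year,
    clipped so no window starts before `first` or ends after `last`."""
    for year in range(first.year, last.year + 1):
        begin = max(first, datetime.date(year, 1, 1))
        end = min(last, datetime.date(year, 12, 31))
        yield begin.strftime('%Y%m%d'), end.strftime('%Y%m%d')


def build_params(email, key, param, bdate, edate, state, county):
    """Collect the query parameters used by the dailyData/byCounty URL above."""
    return {'email': email, 'key': key, 'param': param,
            'bdate': bdate, 'edate': edate,
            'state': state, 'county': county}


# First and last dates taken from traffic_vol_daily above.
windows = list(yearly_windows(datetime.date(2009, 1, 8),
                              datetime.date(2020, 11, 22)))
print(windows[0], windows[-1])

params = build_params('user@example.com', 'PLACEHOLDER_KEY', '88101',
                      windows[0][0], windows[0][1], '36', '005')
print(params)
```

The dictionary can be passed as `requests.get('https://aqs.epa.gov/data/api/dailyData/byCounty', params=params)`, which lets `requests` handle the URL encoding.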

In [342]:
daily_air_quality_df = pd.DataFrame(daily_air_quality_list)

In [411]:
daily_air_quality_df.to_csv('datasets/daily_air_quality.csv')

In [343]:
print(daily_air_quality_df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 127061 entries, 0 to 127060
Data columns (total 32 columns):
 #   Column                Non-Null Count   Dtype  
---  ------                --------------   -----  
 0   state_code            127061 non-null  object 
 1   county_code           127061 non-null  object 
 2   site_number           127061 non-null  object 
 3   parameter_code        127061 non-null  object 
 4   poc                   127061 non-null  int64  
 5   latitude              127061 non-null  float64
 6   longitude             127061 non-null  float64
 7   datum                 127061 non-null  object 
 8   parameter             127061 non-null  object 
 9   sample_duration_code  127061 non-null  object 
 10  sample_duration       127061 non-null  object 
 11  pollutant_standard    122376 non-null  object 
 12  date_local            127061 non-null  object 
 13  units_of_measure      127061 non-null  object 
 14  event_type            127061 non-null  object 
 15  

In [344]:
daily_air_quality_df.sample(5).transpose()

Unnamed: 0,100236,15889,90060,50460,44525
state_code,36,36,36,36,36
county_code,081,061,081,061,005
site_number,0124,0079,0124,0079,0110
parameter_code,88101,88101,88101,88101,88101
poc,4,1,4,2,4
latitude,40.73614,40.7997,40.73614,40.7997,40.816
longitude,-73.82153,-73.93432,-73.82153,-73.93432,-73.902
datum,WGS84,WGS84,WGS84,WGS84,WGS84
parameter,PM2.5 - Local Conditions,PM2.5 - Local Conditions,PM2.5 - Local Conditions,PM2.5 - Local Conditions,PM2.5 - Local Conditions
sample_duration_code,1,7,X,7,X


In [345]:
# infer_datetime_format is deprecated in pandas 2.x; the default parser handles these date strings
daily_air_quality_df['date_local'] = pd.to_datetime(daily_air_quality_df['date_local']).dt.date

In [346]:
for county in daily_air_quality_df['county'].unique():
    mean = round(daily_air_quality_df[daily_air_quality_df['county'] == county]['aqi'].mean())
    filled_na_county = daily_air_quality_df.loc[(daily_air_quality_df['county'] == county) & (daily_air_quality_df['aqi'].isnull()), ['aqi']].fillna(mean)
    daily_air_quality_df.loc[(daily_air_quality_df['county'] == county) & (daily_air_quality_df['aqi'].isnull()), ['aqi']] = filled_na_county
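
The per-county fill above can also be expressed without an explicit loop. The sketch below uses a toy frame (made-up values) and `groupby(...).transform('mean')` to broadcast each county's rounded mean AQI back onto its rows, then fills only the missing entries.

```python
import numpy as np
import pandas as pd

# Toy data; values are illustrative only.
toy = pd.DataFrame({
    'county': ['Bronx', 'Bronx', 'Bronx', 'Queens', 'Queens'],
    'aqi': [40.0, 50.0, np.nan, 20.0, np.nan],
})

# transform('mean') returns one value per row: the mean of that row's county.
county_mean = toy.groupby('county')['aqi'].transform('mean').round()

# fillna with a Series aligns on the index, so only NaN rows are replaced.
toy['aqi_filled'] = toy['aqi'].fillna(county_mean)
print(toy)
```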

In [402]:
for county in daily_air_quality_df['county'].unique():
    mean = daily_air_quality_df[(daily_air_quality_df['county'] == county) & 
                                (daily_air_quality_df['sample_duration'] != '1 HOUR')]['arithmetic_mean'].mean()
    
    daily_air_quality_df.loc[(daily_air_quality_df['county'] == county) & 
                             (daily_air_quality_df['sample_duration'] == '1 HOUR'), ['arithmetic_mean']] = mean

In [347]:
import math 
for aqi in daily_air_quality_df['aqi']:
    if not aqi.is_integer() and not math.isnan(aqi):
        print('is not integer', aqi)
        break

In [348]:
daily_air_quality_df['aqi'] = daily_air_quality_df['aqi'].astype(int)

In [352]:
daily_air_quality_df = daily_air_quality_df[(daily_air_quality_df['date_local'] >= traffic_vol_daily['date'].min()) & 
                                            (daily_air_quality_df['date_local'] <= traffic_vol_daily['date'].max())]

In [407]:
daily_air_quality_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 125643 entries, 0 to 127060
Data columns (total 32 columns):
 #   Column                Non-Null Count   Dtype  
---  ------                --------------   -----  
 0   state_code            125643 non-null  object 
 1   county_code           125643 non-null  object 
 2   site_number           125643 non-null  object 
 3   parameter_code        125643 non-null  object 
 4   poc                   125643 non-null  int64  
 5   latitude              125643 non-null  float64
 6   longitude             125643 non-null  float64
 7   datum                 125643 non-null  object 
 8   parameter             125643 non-null  object 
 9   sample_duration_code  125643 non-null  object 
 10  sample_duration       125643 non-null  object 
 11  pollutant_standard    121110 non-null  object 
 12  date_local            125643 non-null  object 
 13  units_of_measure      125643 non-null  object 
 14  event_type            125643 non-null  object 
 15  

In [233]:
print(len(daily_air_quality_df))
print(len(traffic_vol_daily))

125717
12737


In [354]:
daily_air_quality_df['validity_indicator'].value_counts()

Y    125643
N        74
Name: validity_indicator, dtype: int64

In [370]:
daily_air_quality_df = daily_air_quality_df[daily_air_quality_df['validity_indicator'] == 'Y']

In [371]:
daily_air_quality_df['sample_duration'].value_counts()

24 HOUR          93912
24-HR BLK AVG    27198
1 HOUR            4533
Name: sample_duration, dtype: int64

In [372]:
daily_air_quality_df['sample_duration'].value_counts() / len(daily_air_quality_df['sample_duration'])

24 HOUR          0.747451
24-HR BLK AVG    0.216470
1 HOUR           0.036078
Name: sample_duration, dtype: float64

In [373]:
daily_air_quality_df['event_type'].unique()

array(['No Events', 'Concurred Events Excluded', 'Events Included'],
      dtype=object)

In [374]:
daily_air_quality_df['event_type'].value_counts()

No Events                    125631
Concurred Events Excluded         6
Events Included                   6
Name: event_type, dtype: int64

In [375]:
daily_air_quality_df['parameter'].unique()

array(['PM2.5 - Local Conditions'], dtype=object)

In [198]:
daily_air_quality_df['method'].unique()

array(['R & P Model 2025 PM2.5 Sequential w/WINS - GRAVIMETRIC',
       'Thermo Scientific TEOM 1405-DF Dichotomous FDMS - FDMS Gravimetric',
       'R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC - Gravimetric',
       'Teledyne T640 at 5.0 LPM - Broadband spectroscopy'], dtype=object)

In [202]:
daily_air_quality_df['units_of_measure'].unique()

array(['Micrograms/cubic meter (LC)'], dtype=object)

In [408]:
daily_air_quality_df_cleaned = daily_air_quality_df[['date_local', 
                                                     'parameter',
                                                     'units_of_measure',  
                                                     'arithmetic_mean', 
                                                     'first_max_value', 
                                                     'aqi', 
                                                     'county']]

In [409]:
daily_air_quality_df_cleaned = daily_air_quality_df_cleaned.reset_index().drop(['index'], axis=1)

In [410]:
daily_air_quality_df_cleaned.sample(15)

Unnamed: 0,date_local,parameter,units_of_measure,arithmetic_mean,first_max_value,aqi,county
116828,2019-03-05,PM2.5 - Local Conditions,Micrograms/cubic meter (LC),9.7,9.7,40,Queens
58283,2015-11-05,PM2.5 - Local Conditions,Micrograms/cubic meter (LC),24.9,24.9,78,Bronx
58167,2015-01-01,PM2.5 - Local Conditions,Micrograms/cubic meter (LC),9.2,9.2,38,Bronx
91087,2017-11-15,PM2.5 - Local Conditions,Micrograms/cubic meter (LC),4.8,4.8,20,Queens
75458,2016-01-04,PM2.5 - Local Conditions,Micrograms/cubic meter (LC),5.1,5.1,21,New York
112596,2019-12-21,PM2.5 - Local Conditions,Micrograms/cubic meter (LC),8.3,8.3,35,Queens
98700,2018-06-25,PM2.5 - Local Conditions,Micrograms/cubic meter (LC),5.3,5.3,22,New York
106760,2019-04-12,PM2.5 - Local Conditions,Micrograms/cubic meter (LC),8.0,8.0,33,Bronx
4739,2009-11-15,PM2.5 - Local Conditions,Micrograms/cubic meter (LC),8.6,8.6,36,New York
11224,2010-03-13,PM2.5 - Local Conditions,Micrograms/cubic meter (LC),3.3,3.3,14,Bronx


In [423]:
# Select multiple columns with a list; indexing a groupby with a bare tuple of names is deprecated
daily_air_quality_df_cleaned = daily_air_quality_df_cleaned.groupby(['county', 'date_local'])[['arithmetic_mean', 'aqi']].mean().reset_index()
daily_air_quality_df_cleaned



Unnamed: 0,county,date_local,arithmetic_mean,aqi
0,Bronx,2009-01-08,10.7,45.00
1,Bronx,2009-01-09,5.1,21.00
2,Bronx,2009-01-10,11.2,46.75
3,Bronx,2009-01-11,12.2,51.00
4,Bronx,2009-01-12,13.0,53.00
...,...,...,...,...
11486,Richmond,2020-11-05,8.5,35.00
11487,Richmond,2020-11-08,24.2,76.00
11488,Richmond,2020-11-11,0.0,0.00
11489,Richmond,2020-11-17,5.0,21.00
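
The EPA data is keyed by county name, while the traffic data is keyed by borough. If the two daily tables are to be joined, the county names need to be mapped to boroughs first (Kings County is Brooklyn, New York County is Manhattan, Richmond County is Staten Island). The sketch below shows one way this could look on toy frames; `COUNTY_TO_BORO`, `air`, and `traffic` are illustrative names and values, and the inner join is an assumption about the desired result.

```python
import datetime

import pandas as pd

# NYC county names and the boroughs they correspond to.
COUNTY_TO_BORO = {
    'Bronx': 'Bronx',
    'Kings': 'Brooklyn',
    'New York': 'Manhattan',
    'Queens': 'Queens',
    'Richmond': 'Staten Island',
}

day = datetime.date(2020, 1, 1)

# Toy stand-ins for the aggregated air-quality and traffic tables.
air = pd.DataFrame({'county': ['Kings', 'Richmond'],
                    'date_local': [day, day],
                    'arithmetic_mean': [8.0, 6.5],
                    'aqi': [33.0, 27.0]})
traffic = pd.DataFrame({'boro': ['Brooklyn', 'Staten Island', 'Queens'],
                        'date': [day, day, day],
                        'vol': [40.0, 12.0, 55.0]})

# Align the key names, then keep only borough-days present in both tables.
air = air.rename(columns={'date_local': 'date'})
air['boro'] = air['county'].map(COUNTY_TO_BORO)
merged = traffic.merge(air.drop(columns='county'), on=['boro', 'date'], how='inner')
print(merged)
```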


## Transformed Datasets

In [82]:
air_quality.sample(10)

Unnamed: 0,Indicator ID,Name,Measure,Measure Info,Geo Type Name,Geo Place Name,Time Period,Start_Date,Data Value
10661,386,Ozone (O3),Mean,ppb,UHF42,High Bridge - Morrisania,Summer 2016,2016-05-31,32.34
9544,375,Nitrogen Dioxide (NO2),Mean,ppb,UHF34,East New York,Summer 2015,2015-06-01,13.88
1815,383,Sulfur Dioxide (SO2),Mean,ppb,CD,Bayside and Little Neck (CD11),Winter 2012-13,2012-12-01,0.78
4929,383,Sulfur Dioxide (SO2),Mean,ppb,UHF42,Borough Park,Winter 2011-12,2011-12-01,1.61
10110,383,Sulfur Dioxide (SO2),Mean,ppb,UHF42,Central Harlem - Morningside Heights,Winter 2015-16,2015-12-01,0.35
11997,365,Fine Particulate Matter (PM2.5),Mean,mcg per cubic meter,CD,Highbridge and Concourse (CD4),Winter 2016-17,2016-12-01,8.66
15178,375,Nitrogen Dioxide (NO2),Mean,ppb,UHF42,Downtown - Heights - Slope,Summer 2020,2020-06-01,12.96
6410,365,Fine Particulate Matter (PM2.5),Mean,mcg per cubic meter,Borough,Staten Island,Winter 2010-11,2010-12-01,11.68
7204,383,Sulfur Dioxide (SO2),Mean,ppb,CD,Hunts Point and Longwood (CD2),Winter 2013-14,2013-12-01,1.84
8917,644,Traffic Density- Annual Vehicle Miles Traveled...,million miles,per km2,CD,Fordham and University Heights (CD5),2016,2016-01-01,34.6


In [83]:
traffic_volume.head(15)

Unnamed: 0,RequestID,Boro,Vol,SegmentID,WktGeom,street,fromSt,toSt,Direction,date_time
25543663,32417,Queens,20,67665,POINT (1004228.4823799994 215767.68782613552),31 STREET,34 Avenue,35 Avenue,SB,2020-11-22 23:45:00
20968423,32417,Queens,3,101621,POINT (1055667.922729934 216597.78334720692),DOUGLASTON PARKWAY,Maryland Road,Van Zandt Avenue,SB,2020-11-22 23:45:00
6368254,32417,Queens,27,148877,POINT (1015917.5752772316 218664.23589469263),23 AVENUE,Dead End,85 Street,WB,2020-11-22 23:45:00
7417214,32417,Queens,22,155846,POINT (1050166.9282559236 199291.28045636677),JAMAICA AVENUE,197 Street,198 Street,WB,2020-11-22 23:45:00
18926986,32417,Queens,5,76510,POINT (1017222.1345317845 216101.64900761063),31 AVENUE,87 Street,88 Street,WB,2020-11-22 23:45:00
13407547,32417,Queens,60,155773,POINT (1027092.2981505651 190399.9018293268),ATLANTIC AVENUE,97 Street,98 Street,WB,2020-11-22 23:45:00
7423333,32417,Queens,8,75313,POINT (1019421.0343045937 208800.348152701),51 AVENUE,90 Street,92 Street,EB,2020-11-22 23:45:00
18674125,32417,Queens,33,45497,POINT (1009925.0151697205 199046.51050050836),METROPOLITAN AVENUE,55 Street,56 Street,WB,2020-11-22 23:45:00
18956053,32417,Queens,6,145416,POINT (1046510.2750956557 205583.03286028266),UNION TURNPIKE,Dead End,Dead end,EB,2020-11-22 23:45:00
9171262,32417,Queens,5,101855,POINT (1057510.6819890984 218109.6099654521),MARATHON PARKWAY,Rushmore Avenue,Morenci Lane,SB,2020-11-22 23:45:00
