# JS_EDA__Combine_2018Flow_with_SeattleStreets

## Contents
- [Notebook Focus](#Notebook-Focus)
- [Plan for Join](#Plan-for-Join-of-STREETS-and-FLOW-only)
- [How to Use Exported File](#How-to-Use-Exported-File)

<hr>

### Notebook Focus

In addition to the collision data, the traffic flow and Seattle street data sets hold related information. This notebook combines

- Seattle_Streets.csv
- 2018_Traffic_Flow_Counts_singlekeys.csv (formerly 2018_Traffic_Flow_Counts.csv)

exports the result to
- combined_streets_2018_flow.csv

and then demonstrates how to pair that file with
- collisions_clean.csv

### streets
- **COMPKEY** - primary key of the street asset table
- UNITDESC - structured description of the street location
- STNAME_ORD - street segment name

### flow 
- **COMPKEY** - street segment number; may include more than one segment
- STNAME_ORD - street segment name
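
The note that a flow COMPKEY "may include more than one segment" suggests why a single-key version of the flow file is used for the join. The layout of the original file is not shown in this notebook, so the following is only a sketch under the assumption that multi-segment keys were stored as comma-separated strings; `raw_flow` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the original flow file; the comma-separated
# compkey format is an assumption, not something shown in this notebook.
raw_flow = pd.DataFrame({
    "stname_ord": ["PINE ST", "NE 65TH ST"],
    "compkey": ["12221", "17275,17276"],
    "awdt": [8000.0, 20487.0],
})

# Split each key string into a list, then give every key its own row.
single = (
    raw_flow
    .assign(compkey=raw_flow["compkey"].str.split(","))
    .explode("compkey")
    .dropna(subset=["compkey"])
)

# Clean up whitespace and convert to integers so the key matches df_streets.
single["compkey"] = single["compkey"].str.strip().astype("int64")
single = single.reset_index(drop=True)

print(single)
```

Each flow measurement is repeated once per segment, so aggregate values such as `awdt` should be read as belonging to the whole flow segment rather than summed across rows.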





# Import and notebook setup

In [1]:
#import numpy as np
import pandas as pd

pd.options.display.max_rows = 500
pd.options.display.max_columns = 100

In [2]:
df_flow = pd.read_csv('../data/2018_Traffic_Flow_Counts_singlekeys.csv')
df_streets = pd.read_csv('../z_misc_NO_CHECKIN/Seattle_Streets.csv')
df_collisions = pd.read_csv('../data/collisions_clean.csv')

In [3]:
# FIX LATER - also try intersections and maybe SND
df_snd = pd.read_csv('../z_misc_NO_CHECKIN/Street_Network_Database__SND_.csv')
df_intersections = pd.read_csv('../data/intersections.csv')

## Review the columns

In [4]:
# Convert all column names to lower case

df_flow.columns = df_flow.columns.str.lower()
df_streets.columns = df_streets.columns.str.lower()
df_intersections.columns = df_intersections.columns.str.lower()

df_snd.columns = df_snd.columns.str.lower()

In [5]:
df_flow.columns

Index(['objectid', 'stname_ord', 'flowsegid', 'downtown', 'start_date', 'ampk',
       'pmpk', 'awdt', 'adt', 'awdt_rounded', 'dataquality', 'flags',
       'shape_length', 'compkey'],
      dtype='object')

In [6]:
df_streets.columns

Index(['objectid', 'artclass', 'compkey', 'unitid', 'unitid2', 'unitidsort',
       'unitdesc', 'stname_ord', 'xstrlo', 'xstrhi', 'artdescript', 'owner',
       'status', 'blocknbr', 'speedlimit', 'segdir', 'oneway', 'onewaydir',
       'flow', 'seglength', 'surfacewidth', 'surfacetype_1', 'surfacetype_2',
       'intrlo', 'dirlo', 'intkeylo', 'intrhi', 'dirhi', 'nationhwysys',
       'streettype', 'pvmtcondindx1', 'pvmtcondindx2', 'tranclass',
       'trandescript', 'slope_pct', 'pvmtcategory', 'parkboulevard',
       'shape_length'],
      dtype='object')

In [7]:
df_intersections.columns

Index(['x', 'y', 'objectid', 'intr_id', 'gis_xcoord', 'gis_ycoord', 'compkey',
       'comptype', 'unitid', 'subarea', 'unitdesc', 'arterialclasscd',
       'signal_maint_dist', 'signal_type', 'shape_lng', 'shape_lat'],
      dtype='object')

In [8]:
df_snd.columns

Index(['objectid', 'f_intr_id', 't_intr_id', 'snd_id', 'snd_feacode',
       'citycode', 'stname_id', 'st_code', 'arterial_code', 'segment_type',
       'agency_code', 'access_code', 'divided_code', 'structure_type',
       'legalloc_code', 'vehicle_use_code', 'gis_seg_length', 'l_adrs_from',
       'l_adrs_to', 'r_adrs_from', 'r_adrs_to', 'ord_pre_dir',
       'ord_street_name', 'ord_street_type', 'ord_suf_dir',
       'ord_stname_concat', 'l_city', 'l_state', 'l_zip', 'r_city', 'r_state',
       'r_zip', 'sndseg_update', 'compkey', 'comptype', 'unitid', 'unitid2',
       'shape_length'],
      dtype='object')

In [9]:
df_collisions.columns

Index(['Unnamed: 0', 'x', 'y', 'objectid', 'addrtype', 'intkey', 'location',
       'severitycode', 'severitydesc', 'collisiontype', 'personcount',
       'pedcount', 'pedcylcount', 'vehcount', 'injuries', 'seriousinjuries',
       'fatalities', 'incdate', 'incdttm', 'junctiontype', 'sdot_colcode',
       'sdot_coldesc', 'inattentionind', 'underinfl', 'weather', 'roadcond',
       'lightcond', 'pedrownotgrnt', 'sdotcolnum', 'speeding', 'st_colcode',
       'st_coldesc', 'hitparkedcar', 'fe_exists', 'time', 'total_injuries',
       'total_person_count', 'fe_emd', 'cluster', 'census_area',
       'neighborhood', 'city'],
      dtype='object')

## Explore NULLs

In [10]:
df_streets.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23806 entries, 0 to 23805
Data columns (total 38 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   objectid       23806 non-null  int64  
 1   artclass       23800 non-null  float64
 2   compkey        23806 non-null  int64  
 3   unitid         23800 non-null  float64
 4   unitid2        23800 non-null  float64
 5   unitidsort     23800 non-null  float64
 6   unitdesc       23800 non-null  object 
 7   stname_ord     23806 non-null  object 
 8   xstrlo         23800 non-null  object 
 9   xstrhi         23800 non-null  object 
 10  artdescript    23800 non-null  object 
 11  owner          23800 non-null  object 
 12  status         23800 non-null  object 
 13  blocknbr       23800 non-null  float64
 14  speedlimit     23799 non-null  float64
 15  segdir         23800 non-null  object 
 16  oneway         23795 non-null  object 
 17  onewaydir      23800 non-null  object 
 18  flow  

## Explore how to join STREETS and FLOW data

- Determine if df_snd should be used
- Determine if df_intersections should also be joined
- Determine how to join *STREETS* and *FLOW*
    - Check that the 'compkey' field makes sense for the join.
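
One way to check that 'compkey' makes sense as a join key is to confirm that it has no nulls and is unique on each side. The sketch below uses small toy frames in place of df_streets and df_flow; `key_report` is a hypothetical helper, not part of this notebook.

```python
import pandas as pd

# Toy stand-ins for df_streets and df_flow (values are illustrative only).
streets = pd.DataFrame({"compkey": [1000, 1001, 1002, 1003],
                        "stname_ord": ["1ST AVE"] * 4})
flow = pd.DataFrame({"compkey": [1000, 1001, 9999],
                     "awdt": [12475.0, 13054.0, 8000.0]})

def key_report(df, key="compkey"):
    """Summarise whether `key` looks usable as a join key for `df`."""
    return {
        "rows": len(df),
        "nulls": int(df[key].isna().sum()),
        "unique": bool(df[key].is_unique),
    }

streets_report = key_report(streets)
flow_report = key_report(flow)
print("streets:", streets_report)
print("flow:   ", flow_report)
```

If both sides report `unique: True`, a left join on the key cannot add rows to the left table.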

In [11]:
# Overview of relative sizes
print(f"Shape of df_snd: {df_snd.shape}")
print(f"Shape of df_streets: {df_streets.shape}")
print(f"Shape of df_intersections: {df_intersections.shape}")
print(f"Shape of df_flow: {df_flow.shape}")
print(f"Shape of df_collisions: {df_collisions.shape}")

Shape of df_snd: (34179, 38)
Shape of df_streets: (23806, 38)
Shape of df_intersections: (15441, 16)
Shape of df_flow: (6226, 14)
Shape of df_collisions: (220436, 42)


In [12]:
# Prepare to explore compkey overlaps
compkey_df_streets = list(df_streets['compkey'].unique())
compkey_df_flow    = list(df_flow['compkey'].unique())
compkey_df_intersections     = list(df_intersections['compkey'].unique())
compkey_df_snd     = list(df_snd['compkey'].unique())

dict_compkeys = {
    'df_streets':compkey_df_streets,
    'df_flow':compkey_df_flow,
    'df_intersections':compkey_df_intersections,
    'df_snd'    :compkey_df_snd
}


for k,v in dict_compkeys.items():
    print(f"{k} \t count {len(v)}\trange is {min(v)}, {max(v)}")

df_streets 	 count 23806	range is 1000, 765446
df_flow 	 count 6226	range is 1000, 729817
df_intersections 	 count 15441	range is 23806, 765443
df_snd 	 count 23807	range is 0, 765446


In [13]:
# Any overlap between streets and intersections? (No)
streets_and_intersections = [x for x in compkey_df_streets if x in compkey_df_intersections]
len(streets_and_intersections)

0

In [14]:
# Overlap between streets and snd?
len([x for x in compkey_df_streets if x in compkey_df_snd])
# this is exactly the count of unique streets, so it looks like all of the streets are in the df_snd data

23806

In [15]:
# Overlap between intersections and snd?
len([x for x in compkey_df_intersections if x in compkey_df_snd])
# it looks like NONE of the intersections are included in the df_snd data

0
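
The overlap checks above test membership against Python lists, which rescans the list for every key. A set-based version gives the same counts with far less work. This is a sketch on toy frames, not the notebook's data; the frame names and values are illustrative.

```python
import pandas as pd

# Toy stand-ins for df_streets, df_snd and df_intersections.
streets = pd.DataFrame({"compkey": [1000, 1001, 1002]})
snd = pd.DataFrame({"compkey": [0, 0, 1000, 1001, 1002]})
intersections = pd.DataFrame({"compkey": [23806, 23807]})

# Build one set of keys per table; .tolist() yields plain Python ints.
frames = {"streets": streets, "snd": snd, "intersections": intersections}
keys = {name: set(df["compkey"].tolist()) for name, df in frames.items()}

# Set intersection counts shared keys; set difference finds keys on one side only.
streets_in_snd = len(keys["streets"] & keys["snd"])
streets_in_intersections = len(keys["streets"] & keys["intersections"])
snd_only = keys["snd"] - keys["streets"]

print(streets_in_snd, streets_in_intersections, snd_only)
```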

In [16]:
# One compkey is in df_snd and not in df_streets
one_key = [x for x in compkey_df_snd if x not in compkey_df_streets][0]   # it's actually a 0 ...basically a NULL
df_snd[df_snd['compkey']==one_key].shape                                  # there are 9698 of them!

(9698, 38)

In [17]:
# Which locations are in the SND dataset where there are no compkeys?
print(df_snd.loc[df_snd['compkey']==one_key,'ord_stname_concat'].nunique())
print(df_snd.loc[df_snd['compkey']==one_key,'ord_stname_concat'].value_counts().head(4))
print(df_snd.loc[df_snd['compkey']==one_key,'ord_stname_concat'].value_counts().tail(4))

# This exploration is inconclusive; it's difficult to tell if using the SND dataset instead of streets would
# provide value. 

2026
FIRST HILL STREETCAR    86
NP RR                   75
8TH AVE SW              58
4TH AVE SW              55
Name: ord_stname_concat, dtype: int64
S BRADFORD ST WKWY    1
SW 107TH PL           1
5TH CT NW             1
NW 179TH PL           1
Name: ord_stname_concat, dtype: int64


**Decision**: For now, ignore the df_snd data.

In [18]:
# Would including INTERSECTIONS provide data for the collisions?
list_intersection_loc = list(df_intersections['unitdesc'].unique().astype(str))
list_collision_loc    = list(df_collisions['location'].unique().astype(str))

In [19]:
intersect_collisions = [x for x in list_intersection_loc if x in list_collision_loc]   
print(len(intersect_collisions)) # answer is yes, for 7837 intersections
intersect_collisions[:4]

7837


['WILSON AVE S AND S UPLAND RD',
 '3RD AVE NW AND NW 77TH ST',
 'RAVENNA AVE NE AND NE 92ND ST',
 '42ND AVE S AND S OTHELLO ST']

In [20]:
# INTERSECTIONS has used the X,Y data already. What else could be valuable?
df_intersections.head(3)
# It doesn't look like there are other valuable pieces of information in this dataset

Unnamed: 0,x,y,objectid,intr_id,gis_xcoord,gis_ycoord,compkey,comptype,unitid,subarea,unitdesc,arterialclasscd,signal_maint_dist,signal_type,shape_lng,shape_lat
0,1270709.0,194387.955195,1,18213,1270709.0,194387.95532,340313,13,78852,GRDWM,4TH AVE S AND S HENDERSON N ST,0.0,,NONE,-122.329732,47.523051
1,1282582.0,234414.695012,2,10302,1282582.0,234414.69508,157936,13,32854,E,WOODROW PL E AND E GARFIELD ST,0.0,,NONE,-122.284745,47.633387
2,1261648.0,256226.49721,3,4716,1261648.0,256226.49721,37264,13,231740,BLRD,12TH AVE NW AND NW 87TH ST,0.0,,NONE,-122.371401,47.692058


**Decision**: Ignore the df_intersections data (no join).

## Plan for Join of STREETS and FLOW only

- Try using streets key to access data from flow; check that STNAME_ORD matches
- Try using STNAME_ORD if first plan fails


### Quick view of key info

In [21]:
df_flow.head(3)

Unnamed: 0,objectid,stname_ord,flowsegid,downtown,start_date,ampk,pmpk,awdt,adt,awdt_rounded,dataquality,flags,shape_length,compkey
0,1,PINE ST,894,Y,1970/01/01 00:00:00+00,,,8000.0,,8000,Estimate,,322.037238,12221
1,2,15TH AVE W ON RP,1345,N,2015/03/13 00:00:00+00,,,11129.0,10139.0,11100,Study - 13-15,,173.612269,2203
2,3,NE 65TH ST,1622,N,2015/02/27 00:00:00+00,,,20487.0,19740.0,20500,Study - 13-15,,1445.416389,17275


In [22]:
df_streets[['compkey','unitdesc','stname_ord','slope_pct','trandescript']].head()

Unnamed: 0,compkey,unitdesc,stname_ord,slope_pct,trandescript
0,1006,1ST AVE BETWEEN SENECA ST AND UNIVERSITY ST,1ST AVE,4.0,PRINCIPAL TRANSIT ROUTE
1,1009,1ST AVE BETWEEN PIKE ST AND PINE ST,1ST AVE,5.0,PRINCIPAL TRANSIT ROUTE
2,1032,1ST AVE N BETWEEN VALLEY UPPER ST AND ALOHA ST,1ST AVE N,17.0,NOT DESIGNATED
3,1051,1ST AVE N BETWEEN LYNN ST AND MCGRAW S ST,1ST AVE N,3.0,NOT DESIGNATED
4,1060,1ST AVE N BETWEEN FULTON S ST AND FULTON N ST,1ST AVE N,5.0,NOT DESIGNATED


In [23]:
df_streets['compkey']

0         1006
1         1009
2         1032
3         1051
4         1060
         ...  
23801    20361
23802    18211
23803    10946
23804    17470
23805    18073
Name: compkey, Length: 23806, dtype: int64

In [24]:
# Manually find streets compkeys to figure out the pattern
df_flow.loc[df_flow['compkey']==1009,:]

# Issues to handle:
#   null compkey
#   string matches on longer numbers

Unnamed: 0,objectid,stname_ord,flowsegid,downtown,start_date,ampk,pmpk,awdt,adt,awdt_rounded,dataquality,flags,shape_length,compkey
5203,1583,1ST AVE,796,Y,2017/08/30 09:15:00+00,1238.0,1431.0,17839.0,17413.0,17800,Study,,426.031498,1009


## Make the JOIN

In [25]:
df_street_flow = pd.merge(df_streets, df_flow, how='left', on='compkey',
                          sort=True, suffixes=('_str', '_flo'))

In [26]:
df_street_flow.shape
# how = 'inner' --> (6225, 51)
# how = 'left'  --> (23806, 51)

(23806, 51)
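
Comparing the 'inner' and 'left' shapes shows how many streets found a flow record, but not which keys fell out on either side. A possible audit, sketched here on toy frames, is an outer merge with `indicator=True`. The `validate='one_to_one'` argument is an added assumption that each compkey appears once per table; pandas raises an error if it does not.

```python
import pandas as pd

# Toy stand-ins for df_streets and df_flow (values are illustrative only).
streets = pd.DataFrame({"compkey": [1000, 1001, 1031],
                        "stname_ord": ["1ST AVE", "1ST AVE", "1ST AVE N"]})
flow = pd.DataFrame({"compkey": [1000, 1001, 9999],
                     "stname_ord": ["1ST AVE", "1ST AVE", "PINE ST"]})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'.
audit = pd.merge(streets, flow, how="outer", on="compkey",
                 suffixes=("_str", "_flo"),
                 indicator=True, validate="one_to_one")

counts = audit["_merge"].value_counts()
print(counts)

# Flow rows whose compkey has no matching street.
flow_without_street = audit.loc[audit["_merge"] == "right_only", "compkey"].tolist()
print(flow_without_street)
```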

In [27]:
df_street_flow[['compkey','unitdesc','stname_ord_str','stname_ord_flo','slope_pct']].head(4)

Unnamed: 0,compkey,unitdesc,stname_ord_str,stname_ord_flo,slope_pct
0,1000,1ST AVE BETWEEN YESLER WAY AND CHERRY ST,1ST AVE,1ST AVE,2.0
1,1001,1ST AVE BETWEEN CHERRY ST AND COLUMBIA ST,1ST AVE,1ST AVE,0.0
2,1002,1ST AVE BETWEEN COLUMBIA ST AND MARION ST,1ST AVE,1ST AVE,1.0
3,1003,1ST AVE BETWEEN MARION ST AND MADISON ST,1ST AVE,1ST AVE,1.0


In [28]:
# check for mismatched stname_ord
# Note: NaN != 'text' evaluates to True, so streets with no flow record are flagged here too
df_street_flow.loc[(df_street_flow['stname_ord_str'] != df_street_flow['stname_ord_flo']),
                   ['compkey','unitdesc','stname_ord_str','stname_ord_flo','slope_pct']]

# slope_pct is from STREETS; there are some null values

Unnamed: 0,compkey,unitdesc,stname_ord_str,stname_ord_flo,slope_pct
31,1031,1ST AVE N BETWEEN ROY ST AND VALLEY ST,1ST AVE N,,10.0
32,1032,1ST AVE N BETWEEN VALLEY UPPER ST AND ALOHA ST,1ST AVE N,,17.0
33,1033,1ST AVE N BETWEEN ALOHA ST AND WARD ST,1ST AVE N,,10.0
34,1034,1ST AVE N BETWEEN WARD ST AND PROSPECT S ST,1ST AVE N,,18.0
35,1035,1ST AVE N BETWEEN PROSPECT S ST AND PROSPECT N ST,1ST AVE N,,12.0
...,...,...,...,...,...
23801,761790,23RD AVE SW BETWEEN 22ND N AVE SW AND 22ND S A...,23RD AVE SW,,
23802,764415,WESTLAKE AVE N BETWEEN 9TH AVE N AND ALOHA ST,WESTLAKE AVE N,,
23803,765444,25TH AVE S BETWEEN DEAD END 2 AND S LANDER ST,25TH AVE S,,
23804,765445,S LANDER ST BETWEEN DEAD END 2 AND 25TH AVE S,S LANDER ST,,
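
Every row shown above has a missing stname_ord_flo, which means the comparison is picking up streets without a flow record rather than real name conflicts. A sketch of how the two cases could be separated, using a toy frame with the same column names (the values are illustrative):

```python
import pandas as pd

# Toy stand-in for df_street_flow after the left merge.
merged = pd.DataFrame({
    "compkey": [1000, 1031, 2000],
    "stname_ord_str": ["1ST AVE", "1ST AVE N", "PINE ST"],
    "stname_ord_flo": ["1ST AVE", None, "PIKE ST"],
})

has_flow = merged["stname_ord_flo"].notna()
differs = merged["stname_ord_str"] != merged["stname_ord_flo"]

# Streets that simply have no flow record.
unmatched = merged[~has_flow]

# Streets that matched on compkey but disagree on the street name.
true_mismatches = merged[has_flow & differs]

print(len(unmatched), "streets without flow data")
print(true_mismatches)
```

An empty `true_mismatches` frame on the real data would support using compkey alone for the join.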


## Export the STREET + FLOW file

In [29]:
df_street_flow.to_csv('../data/combined_streets_2018_flow.csv', index=False)

## How to Use Exported File

In [30]:
# Read in both files
df_streetflow = pd.read_csv('../data/combined_streets_2018_flow.csv')
df_collisions = pd.read_csv('../data/collisions_clean.csv')

In [31]:
df_streetflow.head(3)

Unnamed: 0,objectid_str,artclass,compkey,unitid,unitid2,unitidsort,unitdesc,stname_ord_str,xstrlo,xstrhi,artdescript,owner,status,blocknbr,speedlimit,segdir,oneway,onewaydir,flow,seglength,surfacewidth,surfacetype_1,surfacetype_2,intrlo,dirlo,intkeylo,intrhi,dirhi,nationhwysys,streettype,pvmtcondindx1,pvmtcondindx2,tranclass,trandescript,slope_pct,pvmtcategory,parkboulevard,shape_length_str,objectid_flo,stname_ord_flo,flowsegid,downtown,start_date,ampk,pmpk,awdt,adt,awdt_rounded,dataquality,flags,shape_length_flo
0,5010,2.0,1000,10.0,60.0,100060.0,1ST AVE BETWEEN YESLER WAY AND CHERRY ST,1ST AVE,YESLER WAY,CHERRY ST,Minor Arterial,,INSVC,600.0,25.0,N,N,,,311.0,44.0,AC/PCC,PCC,1ST AVE AND YESLER WAY,N,30357.0,1ST AVE AND CHERRY ST,S,N,Downtown Neighborhood,26.0,49.0,1,PRINCIPAL TRANSIT ROUTE,2.0,ART,N,311.239073,1627.0,1ST AVE,623.0,Y,2017/10/02 11:45:00+00,1165.0,927.0,12475.0,11333.0,12500.0,Study,,311.238965
1,15317,2.0,1001,10.0,70.0,100070.0,1ST AVE BETWEEN CHERRY ST AND COLUMBIA ST,1ST AVE,CHERRY ST,COLUMBIA ST,Minor Arterial,,INSVC,700.0,25.0,NW,N,,,306.0,52.0,AC/PCC,,1ST AVE AND CHERRY ST,NW,30354.0,1ST AVE AND COLUMBIA ST,SE,N,Downtown Neighborhood,16.0,0.0,1,PRINCIPAL TRANSIT ROUTE,0.0,ART,N,306.062269,1335.0,1ST AVE,637.0,Y,2017/10/02 11:30:00+00,1313.0,886.0,13054.0,13050.0,13100.0,Study,,306.062164
2,5011,2.0,1002,10.0,80.0,100080.0,1ST AVE BETWEEN COLUMBIA ST AND MARION ST,1ST AVE,COLUMBIA ST,MARION ST,Minor Arterial,,INSVC,800.0,25.0,NW,N,,,306.0,52.0,AC/PCC,,1ST AVE AND COLUMBIA ST,NW,30348.0,1ST AVE AND MARION ST,SE,N,Downtown Neighborhood,19.0,0.0,1,PRINCIPAL TRANSIT ROUTE,1.0,ART,N,305.996038,1514.0,1ST AVE,651.0,Y,2017/08/30 09:30:00+00,1065.0,849.0,12409.0,11444.0,12400.0,Study,,305.996222


**JOIN** the collisions data with the streetflow data based on the location string

In [32]:
# Do a left merge so that no collisions are dropped
df_collisions_extended = pd.merge(df_collisions, df_streetflow, how='left',
                                  left_on='location',     # collisions data uses 'location'
                                  right_on='unitdesc',    # streets+flow data uses 'unitdesc'
                                  sort=True, suffixes=('_col', '_flo'))

In [33]:
df_collisions_extended.head(2)

Unnamed: 0.1,Unnamed: 0,x,y,objectid,addrtype,intkey,location,severitycode,severitydesc,collisiontype,personcount,pedcount,pedcylcount,vehcount,injuries,seriousinjuries,fatalities,incdate,incdttm,junctiontype,sdot_colcode,sdot_coldesc,inattentionind,underinfl,weather,roadcond,lightcond,pedrownotgrnt,sdotcolnum,speeding,st_colcode,st_coldesc,hitparkedcar,fe_exists,time,total_injuries,total_person_count,fe_emd,cluster,census_area,neighborhood,city,objectid_str,artclass,compkey,unitid,unitid2,unitidsort,unitdesc,stname_ord_str,xstrlo,xstrhi,artdescript,owner,status,blocknbr,speedlimit,segdir,oneway,onewaydir,flow,seglength,surfacewidth,surfacetype_1,surfacetype_2,intrlo,dirlo,intkeylo,intrhi,dirhi,nationhwysys,streettype,pvmtcondindx1,pvmtcondindx2,tranclass,trandescript,slope_pct,pvmtcategory,parkboulevard,shape_length_str,objectid_flo,stname_ord_flo,flowsegid,downtown,start_date,ampk,pmpk,awdt,adt,awdt_rounded,dataquality,flags,shape_length_flo
0,153046,-122.319411,47.60436,153047,Intersection,29828.0,10TH AVE AND E ALDER ST,1,Property Damage Only Collision,Parked Car,2,0,0,2,0,0,0,2014-11-02 00:00:00,2014-11-02 17:00:00,At Intersection (intersection related),0.0,NOT ENOUGH INFORMATION / NOT APPLICABLE,,N,Clear,Clear,Unknown,N,,Unknown,32,One parked--one moving,N,1,17:00,0,2,False,2,Census Tract 86,Central District,seattle,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,36839,-122.319416,47.606209,36840,Intersection,29811.0,10TH AVE AND E JEFFERSON ST,1,Property Damage Only Collision,Unknown,1,0,0,1,0,0,0,2006-02-14 00:00:00,2006-02-14 02:13:00,At Intersection (but not related to intersection),28.0,MOTOR VEHICLE RAN OFF ROAD - HIT FIXED OBJECT,,Y,Clear,Dry,Dark - Street Lights On,N,6045010.0,Unknown,50,Fixed object,N,1,02:13,0,1,False,3,Census Tract 86,Central District,seattle,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
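
Both rows above are intersection collisions, and their street columns are empty because 'unitdesc' describes blocks ("... BETWEEN ... AND ..."). The sketch below shows one way to measure how often the location string actually matches, broken down by addrtype. The toy frames and the 'Block' addrtype value are assumptions for illustration, and `validate='many_to_one'` assumes 'unitdesc' is unique in the street file; pandas raises an error otherwise.

```python
import pandas as pd

# Toy stand-ins for df_collisions and df_streetflow.
collisions = pd.DataFrame({
    "addrtype": ["Intersection", "Block", "Block"],
    "location": ["10TH AVE AND E ALDER ST",
                 "1ST AVE BETWEEN YESLER WAY AND CHERRY ST",
                 "SOME ST BETWEEN A ST AND B ST"],
})
streetflow = pd.DataFrame({
    "unitdesc": ["1ST AVE BETWEEN YESLER WAY AND CHERRY ST"],
    "speedlimit": [25.0],
})

extended = pd.merge(collisions, streetflow, how="left",
                    left_on="location", right_on="unitdesc",
                    indicator=True, validate="many_to_one")

# A many-to-one left merge must keep exactly one row per collision.
assert len(extended) == len(collisions)

# Share of collisions that picked up street data, per address type.
extended["matched"] = extended["_merge"] == "both"
match_rate = extended.groupby("addrtype")["matched"].mean()
print(match_rate)
```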


In [34]:
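# END OF EXAMPLE
# The next two lines are for debugging the output: drop the leftover index column
# ('Unnamed: 0', carried over from collisions_clean.csv) and look at the result
df_collisions_extended.drop(columns='Unnamed: 0', inplace=True)
df_collisions_extended.head(2)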

Unnamed: 0,x,y,objectid,addrtype,intkey,location,severitycode,severitydesc,collisiontype,personcount,pedcount,pedcylcount,vehcount,injuries,seriousinjuries,fatalities,incdate,incdttm,junctiontype,sdot_colcode,sdot_coldesc,inattentionind,underinfl,weather,roadcond,lightcond,pedrownotgrnt,sdotcolnum,speeding,st_colcode,st_coldesc,hitparkedcar,fe_exists,time,total_injuries,total_person_count,fe_emd,cluster,census_area,neighborhood,city,objectid_str,artclass,compkey,unitid,unitid2,unitidsort,unitdesc,stname_ord_str,xstrlo,xstrhi,artdescript,owner,status,blocknbr,speedlimit,segdir,oneway,onewaydir,flow,seglength,surfacewidth,surfacetype_1,surfacetype_2,intrlo,dirlo,intkeylo,intrhi,dirhi,nationhwysys,streettype,pvmtcondindx1,pvmtcondindx2,tranclass,trandescript,slope_pct,pvmtcategory,parkboulevard,shape_length_str,objectid_flo,stname_ord_flo,flowsegid,downtown,start_date,ampk,pmpk,awdt,adt,awdt_rounded,dataquality,flags,shape_length_flo
0,-122.319411,47.60436,153047,Intersection,29828.0,10TH AVE AND E ALDER ST,1,Property Damage Only Collision,Parked Car,2,0,0,2,0,0,0,2014-11-02 00:00:00,2014-11-02 17:00:00,At Intersection (intersection related),0.0,NOT ENOUGH INFORMATION / NOT APPLICABLE,,N,Clear,Clear,Unknown,N,,Unknown,32,One parked--one moving,N,1,17:00,0,2,False,2,Census Tract 86,Central District,seattle,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,-122.319416,47.606209,36840,Intersection,29811.0,10TH AVE AND E JEFFERSON ST,1,Property Damage Only Collision,Unknown,1,0,0,1,0,0,0,2006-02-14 00:00:00,2006-02-14 02:13:00,At Intersection (but not related to intersection),28.0,MOTOR VEHICLE RAN OFF ROAD - HIT FIXED OBJECT,,Y,Clear,Dry,Dark - Street Lights On,N,6045010.0,Unknown,50,Fixed object,N,1,02:13,0,1,False,3,Census Tract 86,Central District,seattle,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [35]:
df_collisions_extended.columns

Index(['x', 'y', 'objectid', 'addrtype', 'intkey', 'location', 'severitycode',
       'severitydesc', 'collisiontype', 'personcount', 'pedcount',
       'pedcylcount', 'vehcount', 'injuries', 'seriousinjuries', 'fatalities',
       'incdate', 'incdttm', 'junctiontype', 'sdot_colcode', 'sdot_coldesc',
       'inattentionind', 'underinfl', 'weather', 'roadcond', 'lightcond',
       'pedrownotgrnt', 'sdotcolnum', 'speeding', 'st_colcode', 'st_coldesc',
       'hitparkedcar', 'fe_exists', 'time', 'total_injuries',
       'total_person_count', 'fe_emd', 'cluster', 'census_area',
       'neighborhood', 'city', 'objectid_str', 'artclass', 'compkey', 'unitid',
       'unitid2', 'unitidsort', 'unitdesc', 'stname_ord_str', 'xstrlo',
       'xstrhi', 'artdescript', 'owner', 'status', 'blocknbr', 'speedlimit',
       'segdir', 'oneway', 'onewaydir', 'flow', 'seglength', 'surfacewidth',
       'surfacetype_1', 'surfacetype_2', 'intrlo', 'dirlo', 'intkeylo',
       'intrhi', 'dirhi', 'nationhwysys',

In [36]:
df_collisions_extended.to_csv('../data/collisions_extended.csv',index=False)