## Data stations ordered

### Purpose & Motivation

The purpose of this notebook is to fetch elevation data and add it to the metadata file. Once the elevation is obtained, calculate the slope for each station.

### Direction from Advisor

Do not continue investigating potentially misclassified stations or the reason some stations are abnormally close together on the ML.

### Tasks/Questions to Answer
#### Questions to Answer

* What is the elevation for each station?
* What is the slope for each station?

#### Tasks
* Use Google API to get the elevation for each station.
* Use Google API to get a path of the elevation for each station to calculate the slope.
* Generate a GeoJSON file for use in the visualization work (241), which is not included in this repo but serves as an example for future work.

### Results
Elevation data was obtained, but slope was abandoned after the first pass didn't work.

### Conclusions

Using Google's API was pretty easy.



In [1]:
from os.path import expanduser
import pandas as pd
import simplejson
import numpy as np
import urllib
import json

In [2]:
# Retrieve the Google API key. A new user will have to follow Google's instructions to obtain a key.
home = expanduser("~")
with open(home + '/.googleapi') as infile:
    apikey = next(infile).strip()  # next() works on Python 2.6+ and 3

In [3]:
!ls ../data/meta_2015.hdf

../data/meta_2015.hdf


# Load Metadata

In [4]:
# read in the metadata file
meta_hdf_path = '../data/meta_2015.hdf'
meta_frame = pd.read_hdf(meta_hdf_path, 'meta_2015')
meta_frame.head()

Unnamed: 0,ID,Fwy,Dir,District,County,City,State_PM,Abs_PM,Latitude,Longitude,Length,Type,Lanes,Name,User_ID_1,User_ID_2,User_ID_3,User_ID_4,file_date
584,1113072,56,W,11,73,66000.0,7.383,7.885,32.955202,-117.124689,0.452,ML,2,Black Mountain Rd,314,,,,2015_01_01
585,1113073,56,W,11,73,66000.0,7.383,7.885,32.955202,-117.124689,,OR,3,BLACK MOUNTAIN RD,314,,,,2015_01_01
665,1113680,56,E,11,73,66000.0,6.862,7.364,32.953394,-117.133404,0.999,ML,3,BLACK MOUNTAIN RD,434,,,,2015_01_01
666,1113683,56,W,11,73,66000.0,7.383,7.885,32.955202,-117.124689,,FR,2,BLK MOUNTAIN - WB 56,314,,,,2015_01_01
1034,1119041,125,N,11,73,,1.433,3.282,32.608637,-116.967247,0.792,ML,3,1 MI S/O BIRCH RD,19106,,,,2015_01_01


## Missing Data
Not all of the metadata records have a valid lat/long. First we'll get the elevation for the records with a known lat/long, and then we'll manually fix the missing records.

In [5]:
meta_frame.count()

ID           1541
Fwy          1541
Dir          1541
District     1541
County       1541
City         1318
State_PM     1541
Abs_PM       1541
Latitude     1539
Longitude    1539
Length        932
Type         1541
Lanes        1541
Name         1541
User_ID_1    1541
User_ID_2       0
User_ID_3       0
User_ID_4       0
file_date    1541
dtype: int64

In [6]:
meta_frame.Type.unique()

array(['ML', 'OR', 'FR', 'FF', 'HV', 'CH', 'CD'], dtype=object)

In [7]:
meta_frame[meta_frame.Latitude.isnull()]

Unnamed: 0,ID,Fwy,Dir,District,County,City,State_PM,Abs_PM,Latitude,Longitude,Length,Type,Lanes,Name,User_ID_1,User_ID_2,User_ID_3,User_ID_4,file_date
767,1114649,805,S,11,73,66000.0,28.811,28.662,,,0.651,ML,4,S/B AT JCT I-5,4045,,,,2015_12_17
1324,1125383,52,W,11,73,70224.0,14.756,14.756,,,0.695,ML,2,52 WB from 125 Conn,43812,,,,2015_12_17


In [8]:
# .copy() so that adding the lat_lon column below modifies an independent frame, not a slice of meta_frame
meta_frame_no_nan = meta_frame.dropna(axis='index', subset=['Latitude', 'Longitude']).copy()
meta_frame_no_nan.index = np.arange(0, len(meta_frame_no_nan))
meta_frame_no_nan['lat_lon'] = meta_frame_no_nan['Latitude'].map(str) + ',' + meta_frame_no_nan['Longitude'].map(str)


In [9]:
# spot check
meta_frame_no_nan[meta_frame_no_nan.ID == 1114649]

Unnamed: 0,ID,Fwy,Dir,District,County,City,State_PM,Abs_PM,Latitude,Longitude,Length,Type,Lanes,Name,User_ID_1,User_ID_2,User_ID_3,User_ID_4,file_date,lat_lon


In [10]:
meta_frame_no_nan.count()

ID           1539
Fwy          1539
Dir          1539
District     1539
County       1539
City         1316
State_PM     1539
Abs_PM       1539
Latitude     1539
Longitude    1539
Length        930
Type         1539
Lanes        1539
Name         1539
User_ID_1    1539
User_ID_2       0
User_ID_3       0
User_ID_4       0
file_date    1539
lat_lon      1539
dtype: int64

## Get Elevation from Google API

In [11]:
ELEVATION_BASE_URL = 'https://maps.googleapis.com/maps/api/elevation/json'
def getElevation(lat_lon_list=None, **elvtn_args):
    """
    Return the elevation (in meters) for each 'lat,lon' string in lat_lon_list
    using the Google Elevation API. Returns None if the response status is not OK.
    """
    result = None
    elvtn_args.update({
        'locations': "|".join(lat_lon_list),
        'key': apikey
    })

    url = ELEVATION_BASE_URL + '?' + urllib.urlencode(elvtn_args)
    response = simplejson.load(urllib.urlopen(url))

    if 'status' not in response or response['status'] != 'OK':
        print "bad response"
    else:
        result = [item['elevation'] for item in response['results']]
    return result
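
The request/response handling above can be exercised without calling Google. The following is a minimal sketch, assuming Python 3 (`urllib.parse` instead of the Python 2 `urllib.urlencode` used in this notebook). The helper names `build_elevation_url` and `parse_elevations`, the `DUMMY_KEY` value, and the elevations in the sample response are all made up for illustration. Unlike `getElevation`, the parser raises on a bad status instead of returning `None`, so a failed request cannot be silently appended to a results list.

```python
import json
from urllib.parse import urlencode, parse_qs, urlsplit

ELEVATION_BASE_URL = 'https://maps.googleapis.com/maps/api/elevation/json'


def build_elevation_url(lat_lon_list, api_key):
    """Build the request URL for a list of 'lat,lon' strings (hypothetical helper)."""
    params = {'locations': '|'.join(lat_lon_list), 'key': api_key}
    return ELEVATION_BASE_URL + '?' + urlencode(params)


def parse_elevations(response):
    """Return the elevations from a decoded response, raising if the status is not OK."""
    if response.get('status') != 'OK':
        raise ValueError('bad response: %s' % response.get('status'))
    return [item['elevation'] for item in response['results']]


# Two station coordinates taken from the metadata shown earlier; the key is a placeholder.
url = build_elevation_url(['32.955202,-117.124689', '32.953394,-117.133404'], 'DUMMY_KEY')

# A hand-written response with made-up elevations, shaped like the fields getElevation reads.
sample_response = json.loads(
    '{"status": "OK", "results": [{"elevation": 10.0}, {"elevation": 20.0}]}')
elevations = parse_elevations(sample_response)
```

Raising instead of printing is a design choice for this sketch only; the notebook's own function keeps returning `None` on failure.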

In [12]:
# chunk the data into parts of 50 (1539 stations -> 31 requests) and fetch data from Google API
id_master = []
elevation_master = []

chunk = 50
for item in range(0, 31):
    start = item*chunk
    end = (item+1)*chunk
    
    id_master += list(meta_frame_no_nan['ID'][start:end].values)
    elevation_master += getElevation(list(meta_frame_no_nan['lat_lon'][start:end]))
    
print len(id_master)
print len(elevation_master)

1539
1539
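
The loop above hardcodes 31 chunks, which only fits 1539 stations at 50 per request. As a minimal sketch, the chunk boundaries can be derived from the list length instead. The helper name `chunked` and the stand-in ID list are assumptions for illustration, not part of the notebook.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Stand-in for the 1539 station IDs with a valid lat/long.
station_ids = list(range(1539))

chunks = list(chunked(station_ids, 50))
n_requests = len(chunks)            # 31
last_chunk_size = len(chunks[-1])   # 39
```

Each chunk would be passed to `getElevation` in turn, and the loop keeps working if the number of stations changes.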


In [13]:
elev_df = pd.DataFrame({'ID': id_master, 'elevation': elevation_master})

In [14]:
meta_frame.columns

Index([u'ID', u'Fwy', u'Dir', u'District', u'County', u'City', u'State_PM',
       u'Abs_PM', u'Latitude', u'Longitude', u'Length', u'Type', u'Lanes',
       u'Name', u'User_ID_1', u'User_ID_2', u'User_ID_3', u'User_ID_4',
       u'file_date'],
      dtype='object')

In [15]:
elev_df.columns

Index([u'ID', u'elevation'], dtype='object')

In [16]:
new_meta_frame = pd.merge(meta_frame, elev_df, how='left')

In [17]:
# find bad data...
new_meta_frame[new_meta_frame.elevation.isnull()]

Unnamed: 0,ID,Fwy,Dir,District,County,City,State_PM,Abs_PM,Latitude,Longitude,Length,Type,Lanes,Name,User_ID_1,User_ID_2,User_ID_3,User_ID_4,file_date,elevation
884,1114649,805,S,11,73,66000.0,28.811,28.662,,,0.651,ML,4,S/B AT JCT I-5,4045,,,,2015_12_17,
1441,1125383,52,W,11,73,70224.0,14.756,14.756,,,0.695,ML,2,52 WB from 125 Conn,43812,,,,2015_12_17,


## Fix the records with Missing Lat/Long

In [18]:
# Fix bad data via manual lookup
new_meta_frame.loc[884, 'Latitude'] = 32.966531
new_meta_frame.loc[884, 'Longitude'] = -117.2255
new_meta_frame.iloc[884]

ID                  1114649
Fwy                     805
Dir                       S
District                 11
County                   73
City                  66000
State_PM             28.811
Abs_PM               28.662
Latitude            32.9665
Longitude          -117.225
Length                0.651
Type                     ML
Lanes                     4
Name         S/B AT JCT I-5
User_ID_1              4045
User_ID_2               NaN
User_ID_3               NaN
User_ID_4               NaN
file_date        2015_12_17
elevation               NaN
Name: 884, dtype: object

In [19]:
new_meta_frame.loc[1441, 'Latitude'] = 32.836534
new_meta_frame.loc[1441, 'Longitude'] = -117.00755
new_meta_frame.iloc[1441]

ID                       1125383
Fwy                           52
Dir                            W
District                      11
County                        73
City                       70224
State_PM                  14.756
Abs_PM                    14.756
Latitude                 32.8365
Longitude               -117.008
Length                     0.695
Type                          ML
Lanes                          2
Name         52 WB from 125 Conn
User_ID_1                  43812
User_ID_2                    NaN
User_ID_3                    NaN
User_ID_4                    NaN
file_date             2015_12_17
elevation                    NaN
Name: 1441, dtype: object

In [20]:
result_884 = getElevation(['32.9665,-117.2255'])
result_1441 = getElevation(['32.8365,-117.00755'])

In [21]:
new_meta_frame.loc[884, 'elevation'] = result_884[0]
new_meta_frame.loc[1441, 'elevation'] = result_1441[0]

In [16]:
# Note: all elevations are in meters.
new_meta_frame.to_csv('../data/meta_2015_with_elev.csv')

# Calculate slope between stations

In [22]:
new_meta_frame = pd.read_csv('../data/meta_2015_with_elev.csv')

In [23]:
new_meta_frame.columns

Index([u'Unnamed: 0', u'Unnamed: 0.1', u'Unnamed: 0.1.1', u'ID', u'Fwy',
       u'Dir', u'District', u'County', u'City', u'State_PM', u'Abs_PM',
       u'Latitude', u'Longitude', u'Length', u'Type', u'Lanes', u'Name',
       u'User_ID_1', u'User_ID_2', u'User_ID_3', u'User_ID_4', u'file_date',
       u'elevation'],
      dtype='object')

In [24]:
new_meta_frame.Type.unique()

array(['ML', 'OR', 'FR', 'FF', 'HV', 'CH', 'CD'], dtype=object)

In [25]:
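def create_freeway_vectors(frame_to_use, columns_to_select=['ID', 'Latitude', 'Longitude', 'Abs_PM', 'Lanes']):
    """
    Create a vector (stations sorted by Abs_PM) for the ML stations of each freeway and direction
    """
    # NOTE: the (Fwy, Dir) pairs come from the global new_meta_frame rather than frame_to_use,
    # so a freeway with no ML stations in frame_to_use still gets an (empty) entry
    to_loop = new_meta_frame.groupby(['Fwy', 'Dir'])['ID'].count().reset_index()[['Fwy', 'Dir']].values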
    
    ret = {}
    for Fwy, Dir in to_loop:
        sort_order = ('Abs_PM', True)                    
        tmp = frame_to_use[(frame_to_use.Fwy == Fwy) & (frame_to_use.Dir == Dir)
                                                  & (frame_to_use.Type == 'ML')]\
            .sort_values(by=sort_order[0], ascending=sort_order[1])[columns_to_select].drop_duplicates()
        tmp['order'] = pd.Series(index=tmp.index, data=sorted(range(0, len(tmp.ID)), reverse=(not sort_order[1])))
        ret["%s_%s" % (Fwy, Dir)] = tmp
    return ret 

In [26]:
freeway_vectors = create_freeway_vectors(new_meta_frame)

In [27]:
len(freeway_vectors)

26

In [28]:
freeway_vectors

{'125_N':           ID   Latitude   Longitude  Abs_PM  Lanes  order
 18   1119021  32.595673 -116.964628   2.344      2      0
 4    1119041  32.608637 -116.967247   3.282      3      1
 19   1119050  32.617282 -116.971114   3.928      3      2
 6    1119059  32.619301 -116.971171   4.067      2      3
 23   1119075  32.623484 -116.971141   4.356      2      4
 25   1119085  32.625784 -116.971135   4.515      3      5
 9    1119094  32.629142 -116.971145   4.747      2      6
 28   1119102  32.634145 -116.971152   5.092      2      7
 11   1119110  32.641088 -116.970426   5.575      2      8
 32   1119126  32.649208 -116.970687   6.136      2      9
 34   1119135  32.656460 -116.972951   6.657      2     10
 36   1119144  32.658804 -116.975168   6.865      2     11
 38   1119162  32.666777 -116.984009   7.634      2     12
 40   1119171  32.671414 -116.984862   7.958      2     13
 30   1119118  32.675550 -116.985470   8.246      2     14
 45   1119205  32.684722 -117.009803  10.142   

In [29]:
print freeway_vectors['15_N'].head()

           ID   Latitude   Longitude  Abs_PM  Lanes  order
1138  1119689  32.701162 -117.120639   0.734      4      0
1139  1119694  32.705882 -117.120433   1.066      3      1
84    1122948  32.715149 -117.117717   1.713      2      2
82    1122942  32.716837 -117.117751   1.829      3      3
1105  1118886  32.723255 -117.115308   2.302      3      4


In [30]:
ELEVATION_BASE_URL = 'https://maps.googleapis.com/maps/api/elevation/json'
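def getElevationPath(lat_lon_list=None, samples=10):
    """
    Return `samples` elevations (in meters) sampled along the path defined by
    the 'lat,lon' strings in lat_lon_list, using the Google Elevation API.
    Returns None if the response status is not OK.
    """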
    result = None
    elvtn_args = {
        'path': "|".join(lat_lon_list),
        'samples': samples,
        'key': apikey
    }

    url = ELEVATION_BASE_URL + '?' + urllib.urlencode(elvtn_args)
    response = simplejson.load(urllib.urlopen(url))

    if 'status' not in response or response['status'] != 'OK':
        print "bad response"
        print "response: %s" % response
    else:
        result = [item['elevation'] for item in response['results']]
    return result

In [31]:
# Test the function
result = getElevationPath(['32.834748,-117.003404', '32.809436,-117.005295'], samples=18)
result

[110.7438354492188,
 113.268913269043,
 125.765510559082,
 132.7821807861328,
 141.4222564697266,
 151.6808624267578,
 158.3583221435547,
 170.2845611572266,
 167.3048400878906,
 170.2570190429688,
 226.43896484375,
 216.9059600830078,
 224.97607421875,
 223.4471282958984,
 214.2090148925781,
 217.1637115478516,
 216.4353179931641,
 217.5686340332031]

In [32]:
# loop through all freeways and get the results
# calculations will be done afterwards to debug while not pulling from google's api
raw_results = {}
counter = 0
for key in freeway_vectors:
    print "on freeway: %s" % key
    if counter % 3:
        print "counter: %s" % counter
    frame = freeway_vectors[key]
    count = frame.ID.count()
    for index in range(1, count):
        distance = abs(frame.iloc[index-1].Abs_PM - frame.iloc[index].Abs_PM)

        # get enough samples for .1 resolution
        samples = int(round(distance, 1)*10)
        lat1, lon1 = frame.iloc[index-1][['Latitude', 'Longitude']]
        lat_lon1 = '%s,%s' % (lat1, lon1)
        lat2, lon2 = frame.iloc[index][['Latitude', 'Longitude']]
        lat_lon2 = '%s,%s' % (lat2, lon2)
        if samples < 2:
            samples = 2
        result = getElevationPath([lat_lon1, lat_lon2], samples=samples)
        
        start_id = frame.iloc[index-1].ID
        end_id = frame.iloc[index].ID
        if key not in raw_results:
            raw_results[key] = {}
        raw_results[key]['%s_%s' % (start_id, end_id)] = result
    counter += 1

on freeway: 125_S
on freeway: 905_E
counter: 1
on freeway: 54_W
counter: 2
on freeway: 125_N
on freeway: 52_W
counter: 4
on freeway: 67_N
counter: 5
on freeway: 56_E
on freeway: 67_S
counter: 7
on freeway: 56_W
counter: 8
on freeway: 905_W
on freeway: 94_E
counter: 10
on freeway: 52_E
counter: 11
on freeway: 78_W
on freeway: 5_N
counter: 13
on freeway: 15_N
counter: 14
on freeway: 15_S
on freeway: 94_W
counter: 16
on freeway: 78_E
counter: 17
on freeway: 54_E
on freeway: 5_S
counter: 19
on freeway: 8_E
counter: 20
on freeway: 163_N
on freeway: 805_S
counter: 22
on freeway: 8_W
counter: 23
on freeway: 805_N
on freeway: 163_S
counter: 25


In [33]:
# calculate the grade for each station
id_list = []
grade_list = []
for key in freeway_vectors:
    frame = freeway_vectors[key]
    count = frame.ID.count()
    for index in range(1, count+1):
        # absolute value of run in case we walk the freeway backwards
        if index < count:
            run = abs(frame.iloc[index-1].Abs_PM - frame.iloc[index].Abs_PM)
        else:
            run = abs(frame.iloc[index-2].Abs_PM - frame.iloc[index-1].Abs_PM)

        # sometimes there are multiple stations on the same mile marker...
        if run == 0:
            if index < count:
                pad = 0
            else:
                pad = 1
            for i in range(1, 5):
                run = abs(frame.iloc[index-i].Abs_PM - frame.iloc[index-pad].Abs_PM)
                if run != 0:
                    break
#         print "run: %s" % run

        if index < count:
            start_id = frame.iloc[index-1].ID
            end_id = frame.iloc[index].ID
        else:
            start_id = frame.iloc[index-2].ID
            end_id = frame.iloc[index-1].ID
            
        raw_elevations = raw_results[key]['%s_%s' % (start_id, end_id)]
#         print "raw_elevations: %s" % raw_elevations
        
        run_between_points = run/len(raw_elevations)
#         print "run_between_points: %s" % run_between_points

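        # use the closest point to calculate the grade
        # NOTE: raw_elevations are in meters while run_between_points is in miles (Abs_PM),
        # so the ratio mixes units and the resulting angles cluster near +/- pi/2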
        if index == count:
            # if at the end of the vector, then use the last raw_elevations
            grade = np.arctan((raw_elevations[-1] - raw_elevations[-2])/float(run_between_points))
        else:
            grade = np.arctan((raw_elevations[1] - raw_elevations[0])/float(run_between_points))
        
        id_list.append(start_id)
        grade_list.append(grade)

In [34]:
grade_df = pd.DataFrame({'ID': id_list, 'grade': grade_list})

In [35]:
grade_df[grade_df.grade.isnull()]

Unnamed: 0,ID,grade


In [36]:
new_meta_frame_with_grade = pd.merge(new_meta_frame, grade_df, how='inner', on='ID')

In [38]:
new_meta_frame_with_grade[[u'ID', u'Fwy', u'Dir', u'Abs_PM', u'Latitude', u'Longitude', u'Lanes', u'Name',
     u'elevation', u'grade']].to_csv('../data/meta_2015_with_elev_and_grade.csv')
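
One possible reason the first pass at slope did not work: the rise comes from elevations in meters, while the run comes from `Abs_PM` differences in miles, so `arctan(rise/run)` lands close to ±π/2 for almost any real segment. The following is a minimal sketch of a unit-consistent calculation, not what this notebook ran. The function names and the example numbers are assumptions for illustration, and the spacing helper assumes the path samples include both endpoints.

```python
import math

METERS_PER_MILE = 1609.344


def grade_between(elev_start_m, elev_end_m, run_miles):
    """Return (grade, angle in radians) for a rise in meters over a run in miles."""
    run_m = run_miles * METERS_PER_MILE
    if run_m == 0:
        raise ValueError('run must be non-zero')
    grade = (elev_end_m - elev_start_m) / run_m
    return grade, math.atan(grade)


def sample_spacing_miles(run_miles, n_samples):
    """Spacing between n evenly spaced samples, assuming both endpoints are sampled."""
    return run_miles / (n_samples - 1)


# Made-up example: a 10 m rise over 0.1 mile.
grade, angle = grade_between(100.0, 110.0, 0.1)

# The same rise with meters divided by miles, as in the cell above: close to pi/2.
mixed_units_angle = math.atan((110.0 - 100.0) / 0.1)
```

With consistent units the example is roughly a 6% grade, while the mixed-unit angle is about 1.56 radians, which is in the range of the `grade` values saved above.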

# Generate JSON files to convert to GeoJSON

In [40]:
new_meta_frame_with_grade_ML = pd.read_csv('../data/meta_2015_with_elev_and_grade.csv')
new_meta_frame_with_grade_ML['Type'] = 'ML'
new_meta_frame_with_grade_ML.head()

Unnamed: 0.1,Unnamed: 0,ID,Fwy,Dir,Abs_PM,Latitude,Longitude,Lanes,Name,elevation,grade,Type
0,0,1113072,56,W,7.885,32.955202,-117.124689,2,Black Mountain Rd,178.243729,1.536084,ML
1,1,1113680,56,E,7.364,32.953394,-117.133404,3,BLACK MOUNTAIN RD,150.936234,1.545684,ML
2,2,1119041,125,N,3.282,32.608637,-116.967247,3,1 MI S/O BIRCH RD,163.3069,1.563082,ML
3,3,1119042,125,S,5.428,32.637029,-116.971446,2,1 MI S/O BIRCH RD,169.974411,1.558263,ML
4,4,1119059,125,N,4.067,32.619301,-116.971171,2,SOUTH SIDE BIRCH RD,180.948593,1.520973,ML


In [41]:
new_meta_frame_with_grade_ML.count()

Unnamed: 0    839
ID            839
Fwy           839
Dir           839
Abs_PM        839
Latitude      839
Longitude     839
Lanes         839
Name          839
elevation     839
grade         839
Type          839
dtype: int64

In [42]:
freeway_vectors_update = create_freeway_vectors(
    new_meta_frame_with_grade_ML, [u'ID', u'Fwy', u'Dir', u'Abs_PM', u'Latitude', u'Longitude', u'Lanes', u'Name',
     u'elevation'])

In [43]:
def sorted_func(x):
    values = x.split('_')
    Fwy = int(values[0])
    Dir = values[1]
    if Dir == 'N' or Dir == 'E':
        dir_weight = 0
    else:
        dir_weight = 1
    return Fwy + dir_weight

In [44]:
sorted(freeway_vectors_update.keys(), key=sorted_func)

['5_N',
 '5_S',
 '8_E',
 '8_W',
 '15_N',
 '15_S',
 '52_E',
 '52_W',
 '54_E',
 '54_W',
 '56_E',
 '56_W',
 '67_N',
 '67_S',
 '78_E',
 '78_W',
 '94_E',
 '94_W',
 '125_N',
 '125_S',
 '163_N',
 '163_S',
 '805_N',
 '805_S',
 '905_E',
 '905_W']
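
`sorted_func` adds the direction weight to the freeway number, which gives the right order for the freeways in this data set but would tie, for example, `5_S` with a `6_N`. A minimal sketch of a tuple key that avoids the tie follows. The names `additive_key` and `freeway_sort_key` are hypothetical, and `6_N` is an invented key used only to show the collision.

```python
def additive_key(key):
    """Mirrors sorted_func: freeway number plus 0 for N/E or 1 for S/W."""
    fwy, direction = key.split('_')
    return int(fwy) + (0 if direction in ('N', 'E') else 1)


def freeway_sort_key(key):
    """Sort by freeway number first, then N/E before S/W."""
    fwy, direction = key.split('_')
    dir_weight = 0 if direction in ('N', 'E') else 1
    return (int(fwy), dir_weight)


keys = ['805_S', '6_N', '5_S', '15_N', '8_E', '5_N', '8_W']

by_sum = sorted(keys, key=additive_key)
# ['5_N', '6_N', '5_S', '8_E', '8_W', '15_N', '805_S']

by_tuple = sorted(keys, key=freeway_sort_key)
# ['5_N', '5_S', '6_N', '8_E', '8_W', '15_N', '805_S']
```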

In [45]:
for freeway_key in freeway_vectors_update:
    data = freeway_vectors_update[freeway_key][[u'ID', u'Fwy', u'Dir', u'Abs_PM', u'Latitude', u'Longitude', 
                                                u'Lanes', u'Name', u'elevation', 'order']]
#     data['order'] = pd.Series(index=data.index, data=range(1, len(data)+1))
    data[[u'ID', u'Fwy', u'Dir', u'Abs_PM', u'Latitude', u'Longitude', u'Lanes', u'Name',
          u'elevation', 'order']].to_json('../data/%s_ML_2015.json' % freeway_key, orient='records')

In [46]:
files=!ls ../data/*_ML_2015.json

In [47]:
files

['../data/125_N_ML_2015.json',
 '../data/125_S_ML_2015.json',
 '../data/15_N_ML_2015.json',
 '../data/15_S_ML_2015.json',
 '../data/163_N_ML_2015.json',
 '../data/163_S_ML_2015.json',
 '../data/52_E_ML_2015.json',
 '../data/52_W_ML_2015.json',
 '../data/54_E_ML_2015.json',
 '../data/54_W_ML_2015.json',
 '../data/56_E_ML_2015.json',
 '../data/56_W_ML_2015.json',
 '../data/5_N_ML_2015.json',
 '../data/5_S_ML_2015.json',
 '../data/67_N_ML_2015.json',
 '../data/67_S_ML_2015.json',
 '../data/78_E_ML_2015.json',
 '../data/78_W_ML_2015.json',
 '../data/805_N_ML_2015.json',
 '../data/805_S_ML_2015.json',
 '../data/8_E_ML_2015.json',
 '../data/8_W_ML_2015.json',
 '../data/905_E_ML_2015.json',
 '../data/905_W_ML_2015.json',
 '../data/94_E_ML_2015.json',
 '../data/94_W_ML_2015.json']

In [48]:
test_final = json.load(open('../data/2015_ML_d11_geojson_points.json'))

In [49]:
len(test_final['125_S']['data']['features'])

37

In [51]:
final = {}

for filename in files:
    new_geojson = {'type': 'FeatureCollection', 'features': []}
    data = json.load(open(filename))
    key = '_'.join(filename.split('/')[2].split('_')[0:2])
    print key
    if key == '15_N':
        visible = False
    else:
        visible = False

    print len(data)
    for row in data:
        properties = {'key': key,
                      'ID': row['ID'],
                      'Lanes': row['Lanes'],
                      'Name': row['Name'],
                      'Abs_PM': row['Abs_PM'],
                      'Elevation': row['elevation'],
                      'Order': row['order']}
        geometry = {'type': "Point", "coordinates": [row['Longitude'], row['Latitude']]}
        temp = {'type': 'Feature', 'properties': properties, "geometry": geometry}
        new_geojson['features'].append(temp)

    print "geojson len: %s" % len(new_geojson['features'])
    final[key] = {'visible': visible, 'data': new_geojson}
json.dump(final, open('../data/2015_ML_d11_geojson_points.json', 'w'))

125_N
34
geojson len: 34
125_S
36
geojson len: 36
15_N
71
geojson len: 71
15_S
68
geojson len: 68
163_N
13
geojson len: 13
163_S
15
geojson len: 15
52_E
19
geojson len: 19
52_W
22
geojson len: 22
54_E
2
geojson len: 2
54_W
2
geojson len: 2
56_E
12
geojson len: 12
56_W
11
geojson len: 11
5_N
108
geojson len: 108
5_S
102
geojson len: 102
67_N
0
geojson len: 0
67_S
0
geojson len: 0
78_E
17
geojson len: 17
78_W
24
geojson len: 24
805_N
55
geojson len: 55
805_S
60
geojson len: 60
8_E
43
geojson len: 43
8_W
46
geojson len: 46
905_E
10
geojson len: 10
905_W
10
geojson len: 10
94_E
15
geojson len: 15
94_W
20
geojson len: 20
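
The per-row conversion in the loop above can be pulled into a small function, which makes it easier to check that GeoJSON coordinates are written as `[longitude, latitude]`. This is a minimal sketch using only the standard library. The name `station_to_feature` is hypothetical, the row values are taken from station 1113072 shown earlier, and its `order` value is a placeholder.

```python
import json


def station_to_feature(row, key):
    """Turn one station record (a dict) into a GeoJSON Point feature."""
    properties = {'key': key,
                  'ID': row['ID'],
                  'Lanes': row['Lanes'],
                  'Name': row['Name'],
                  'Abs_PM': row['Abs_PM'],
                  'Elevation': row['elevation'],
                  'Order': row['order']}
    # GeoJSON positions are [longitude, latitude], the reverse of the usual lat/long order.
    geometry = {'type': 'Point', 'coordinates': [row['Longitude'], row['Latitude']]}
    return {'type': 'Feature', 'properties': properties, 'geometry': geometry}


row = {'ID': 1113072, 'Lanes': 2, 'Name': 'Black Mountain Rd', 'Abs_PM': 7.885,
       'Latitude': 32.955202, 'Longitude': -117.124689,
       'elevation': 178.243729, 'order': 0}  # 'order' is a placeholder value

collection = {'type': 'FeatureCollection',
              'features': [station_to_feature(row, '56_W')]}
text = json.dumps(collection)
```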


In [52]:
final = {}
for filename in files:
    data = json.load(open(filename))
    key = '_'.join(filename.split('/')[2].split('_')[0:2])
    print key
    if key == '15_N':
        visible = True
    else:
        visible = False
    final[key] = {'visible': visible, 'data': data}
json.dump(final, open('../data/2015_ML_d11.json', 'w'))

125_N
125_S
15_N
15_S
163_N
163_S
52_E
52_W
54_E
54_W
56_E
56_W
5_N
5_S
67_N
67_S
78_E
78_W
805_N
805_S
8_E
8_W
905_E
905_W
94_E
94_W


In [53]:
new_meta_frame_with_grade_ML[[u'ID', u'Fwy', u'Dir', u'Abs_PM', u'Latitude', u'Longitude', u'Lanes', u'Name',
     u'elevation', u'grade']].elevation.describe()

count    839.000000
mean      89.920762
std       68.061828
min        3.167338
25%       25.592031
50%       77.214447
75%      141.236954
max      309.354401
Name: elevation, dtype: float64

In [56]:
# loop through all freeways and get the results
# calculations will be done afterwards to debug while not pulling from google's api
raw_results = {}
counter = 0
to_review = {}
for key in freeway_vectors_update:
    print "on freeway: %s" % key
    frame = freeway_vectors_update[key]
    count = frame.ID.count()
    for index in range(1, count):
        distance = abs(frame.iloc[index-1].Abs_PM - frame.iloc[index].Abs_PM)

        if distance < .2:
            if key not in to_review:
                to_review[key] = []
            to_review[key].append(frame.iloc[index-1])
    counter += 1

on freeway: 125_S
on freeway: 905_E
on freeway: 54_W
on freeway: 125_N
on freeway: 52_W
on freeway: 67_N
on freeway: 56_E
on freeway: 67_S
on freeway: 56_W
on freeway: 905_W
on freeway: 94_E
on freeway: 52_E
on freeway: 78_W
on freeway: 5_N
on freeway: 15_N
on freeway: 15_S
on freeway: 94_W
on freeway: 78_E
on freeway: 54_E
on freeway: 5_S
on freeway: 8_E
on freeway: 163_N
on freeway: 805_S
on freeway: 8_W
on freeway: 805_N
on freeway: 163_S


`to_review` contains the stations that need further review to determine whether there are data errors. The analysis ends here, per Advisor direction not to continue any further.
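
The close-station check above can be sketched without pandas. The function name `close_neighbors` is hypothetical; the station IDs and `Abs_PM` values are the first five rows of `15_N` printed earlier. Unlike the loop above, which stores only the first station of each close pair, this sketch records both IDs and the gap.

```python
def close_neighbors(stations, threshold_miles=0.2):
    """Return (id_a, id_b, gap) for consecutive stations closer than the threshold.

    `stations` is a list of (station_id, abs_pm) tuples for one freeway and direction.
    """
    ordered = sorted(stations, key=lambda s: s[1])
    flagged = []
    for (id_a, pm_a), (id_b, pm_b) in zip(ordered, ordered[1:]):
        gap = abs(pm_b - pm_a)
        if gap < threshold_miles:
            flagged.append((id_a, id_b, gap))
    return flagged


stations_15_n = [(1119689, 0.734), (1119694, 1.066), (1122948, 1.713),
                 (1122942, 1.829), (1118886, 2.302)]

flagged = close_neighbors(stations_15_n)
```

For these five stations only the pair 1122948 and 1122942 is flagged, about 0.116 miles apart.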

## Highlight stations that seem too close to each other
It's likely that either stations have changed over time, so some of these stations might not be active, or some of the stations are misclassified. Either way, additional research would be needed to determine how accurate each station was over time and which stations should be filtered out. Advisor direction was not to go down this path of exploration/research.

<img src='screenshots/1123134.png'>
Station 1123134 is really close to its neighbor 1123135 (0.001 miles apart).

In [61]:
new_meta_frame_with_grade_ML[(new_meta_frame_with_grade_ML.Dir == 'S') & (new_meta_frame_with_grade_ML.Fwy == 5) &
                             (new_meta_frame_with_grade_ML.Lanes < 4)]

Unnamed: 0.1,Unnamed: 0,ID,Fwy,Dir,Abs_PM,Latitude,Longitude,Lanes,Name,elevation,grade,Type
410,410,1114660,5,S,30.812,32.907588,-117.226265,3,S/B AT JCT I-805,21.196461,-1.556265,ML
673,673,1120356,5,S,6.627,32.613849,-117.091173,3,SB 5 S/O L St,9.15379,1.436106,ML
769,769,1123072,5,S,30.532,32.903777,-117.224665,2,5 S Bypass S/O 5/805,17.328718,-1.562271,ML
779,779,1123134,5,S,5.324,32.59507,-117.088901,1,Main St to 5 SB,7.251188,1.268062,ML


<img src='screenshots/1116433.png'>
Station 1116433 is really close to a neighboring station and has only 1 lane. It seems like an incorrect station.

In [75]:
new_meta_frame_with_grade_ML[(new_meta_frame_with_grade_ML.Dir == 'W') & (new_meta_frame_with_grade_ML.Fwy == 78) &
                             (new_meta_frame_with_grade_ML.Lanes < 2)]

Unnamed: 0.1,Unnamed: 0,ID,Fwy,Dir,Abs_PM,Latitude,Longitude,Lanes,Name,elevation,grade,Type
522,522,1116433,78,W,11.094,33.145066,-117.19219,1,Las Posas Rd,168.368027,1.507798,ML


In [None]:
# 8, 52, 54, 56, 78E, 94, 905 have no issues
# 125, 163, 805 have no issues

In [91]:
new_meta_frame_with_grade_ML[(new_meta_frame_with_grade_ML.Dir == 'S') & (new_meta_frame_with_grade_ML.Fwy == 5) &
                             (new_meta_frame_with_grade_ML.Lanes < 3)]

Unnamed: 0.1,Unnamed: 0,ID,Fwy,Dir,Abs_PM,Latitude,Longitude,Lanes,Name,elevation,grade,Type
769,769,1123072,5,S,30.532,32.903777,-117.224665,2,5 S Bypass S/O 5/805,17.328718,-1.562271,ML
779,779,1123134,5,S,5.324,32.59507,-117.088901,1,Main St to 5 SB,7.251188,1.268062,ML


In [39]:
new_meta_frame_with_grade_ML[(new_meta_frame_with_grade_ML.Dir == 'S') & (new_meta_frame_with_grade_ML.Fwy == 15) &
                             (new_meta_frame_with_grade_ML.Lanes < 3)]

Unnamed: 0.1,Unnamed: 0,ID,Fwy,Dir,Abs_PM,Latitude,Longitude,Lanes,Name,elevation,grade,Type
63,63,1117920,15,S,1.703,32.715149,-117.117869,2,SB 15 @ 94,19.198471,1.547851,ML
67,67,1122945,15,S,1.703,32.715149,-117.117869,2,15 SB From 94,19.198471,0.0,ML


In [40]:
new_meta_frame_with_grade_ML[(new_meta_frame_with_grade_ML.Dir == 'N') & (new_meta_frame_with_grade_ML.Fwy == 15) &
                             (new_meta_frame_with_grade_ML.Lanes < 3)]

Unnamed: 0.1,Unnamed: 0,ID,Fwy,Dir,Abs_PM,Latitude,Longitude,Lanes,Name,elevation,grade,Type
68,68,1122948,15,N,1.713,32.715149,-117.117717,2,94 EB From 15 NB,19.331848,1.522833,ML
735,735,1122502,15,N,30.47,33.109785,-117.095526,2,15 NB HOV N/O,218.190216,-1.550586,ML
783,783,1123179,15,N,3.059,32.733487,-117.111847,2,15NB From 805NB Conn,61.661972,0.0,ML


<img src='screenshots/1108771.png'>
<img src='screenshots/1122502.png'>
Station 1108771 is super close to 1122502, and 1122502 has HOV in its name and only two lanes as opposed to 5.