## Data stations ordered

### Purpose & Motivation

The purpose of this notebook is to create GeoJSON files for the Wiggle Visualization from data that was processed in Databricks for the years 2008 to 2015 at the 5-minute time frame. Metadata is consolidated within this notebook because the metadata files are small and can be processed quickly on a local machine.

Both the points GeoJSON (for the markers) and the lines GeoJSON (for the segments) are generated from this notebook.

### Direction from Advisor

N/A

### Tasks/Questions to Answer
#### Questions to Answer

N/A

#### Tasks

Create a geojson file that is consumed by the visualization.

### Results

See below

### Conclusions

N/A

In [2]:
from os.path import expanduser
import pandas as pd
import simplejson
import numpy as np
import urllib
import json
import glob

# Generate Metadata File for 2008 to 2015

In [3]:
!ls ../data/meta

2008 2009 2010 2011 2012 2013 2014 2015 2016


In [4]:
meta_dir = '../data/meta/*/d11/*text_meta_*.txt'
meta_files = glob.glob(meta_dir)
meta_file_list = []
for meta_file in meta_files:
    date = str('_'.join(meta_file.split('_')[4:7])).split('.')[0]
    df = pd.read_table(meta_file, index_col=None, header=0)
    date_col = pd.Series([date] * len(df))
    df['file_date'] = date_col
    meta_file_list.append(df)
    
print meta_files[0:5]
meta_frame = pd.concat(meta_file_list)

['../data/meta/2008/d11/d11_text_meta_2007_09_21.txt', '../data/meta/2008/d11/d11_text_meta_2008_03_06.txt', '../data/meta/2008/d11/d11_text_meta_2008_04_15.txt', '../data/meta/2008/d11/d11_text_meta_2008_04_16.txt', '../data/meta/2008/d11/d11_text_meta_2008_04_18.txt']
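
One detail worth checking in the cell above: splitting the whole path on `_` and taking items `[4:7]` keeps only the month and day for these paths (for example `09_21`), because the year sits at index 3. The sketch below shows the difference next to a regex-based alternative that keeps the year. `parse_meta_date` is a hypothetical helper, not part of the notebook, and it assumes every file name ends in `_YYYY_MM_DD.txt`.

```python
import re

def slice_date(path):
    """Reproduces the slicing used in the cell above."""
    return '_'.join(path.split('_')[4:7]).split('.')[0]

def parse_meta_date(path):
    """Hypothetical helper: pull YYYY_MM_DD from the end of the file name."""
    match = re.search(r'(\d{4})_(\d{2})_(\d{2})\.txt$', path)
    if match is None:
        raise ValueError('no date found in %r' % path)
    return '_'.join(match.groups())

path = '../data/meta/2008/d11/d11_text_meta_2007_09_21.txt'
print(slice_date(path))       # 09_21
print(parse_meta_date(path))  # 2007_09_21
```

A full `YYYY_MM_DD` string also sorts chronologically as text, so `file_date` could be used to define which record is the "last" one when duplicates are dropped, rather than relying on the order in which the files were read.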


In [5]:
meta_frame.columns

Index([u'ID', u'Fwy', u'Dir', u'District', u'County', u'City', u'State_PM',
       u'Abs_PM', u'Latitude', u'Longitude', u'Length', u'Type', u'Lanes',
       u'Name', u'User_ID_1', u'User_ID_2', u'User_ID_3', u'User_ID_4',
       u'file_date'],
      dtype='object')

In [6]:
print "total stations: %s" % len(meta_frame.ID.unique())
print "distribution of stations per type: "
print meta_frame.drop_duplicates(subset='ID', keep='last').Type.value_counts()
all_stations = meta_frame.ID.unique()

total stations: 1783
distribution of stations per type: 
ML    979
OR    344
FR    263
HV    106
FF     81
CH      7
CD      3
Name: Type, dtype: int64


In [7]:
# The Type column in the metadata is just the detector type. Need to analyze the "change" that cohort 1 referred to.
drop_na = meta_frame.dropna(axis=0, how='any', subset=['Latitude', 'Longitude'])
no_dup_keep_last = drop_na.drop_duplicates(subset='ID', keep='last') # TODO: assuming meta and 5min agree on freeway type...check?
print "unique count of stations: %s" % no_dup_keep_last.shape[0]

print "\ndistribution of Types of stations"
no_dup_keep_last.Type.value_counts()

unique count of stations: 1779

distribution of Types of stations


ML    975
OR    344
FR    263
HV    106
FF     81
CH      7
CD      3
Name: Type, dtype: int64
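
The two steps above (drop rows without coordinates, then keep the last record per station) can be sketched without pandas as follows. The records and the helper name `latest_with_coords` are made up for illustration; "last" here means last in input order, which is also how `drop_duplicates(keep='last')` decides.

```python
def latest_with_coords(records):
    """Keep the last record per ID among those that have both coordinates."""
    latest = {}
    for rec in records:
        if rec.get('Latitude') is None or rec.get('Longitude') is None:
            continue  # mirrors dropna(subset=['Latitude', 'Longitude'])
        latest[rec['ID']] = rec  # later records overwrite earlier ones
    return latest

# Illustrative records only, not real station metadata.
records = [
    {'ID': 1, 'Latitude': 32.50, 'Longitude': -117.00, 'Lanes': 3},
    {'ID': 1, 'Latitude': 32.50, 'Longitude': -117.00, 'Lanes': 4},
    {'ID': 2, 'Latitude': None, 'Longitude': None, 'Lanes': 2},
]
kept = latest_with_coords(records)
print(sorted(kept))      # [1]
print(kept[1]['Lanes'])  # 4
```

Because rows without coordinates are removed before deduplication, a station whose most recent record lacks coordinates falls back to an older record that has them; only stations that never had coordinates disappear.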

In [8]:
filter_ids = no_dup_keep_last.ID.unique()
missing_ids = set(all_stations) - set(filter_ids)

meta_frame[meta_frame.ID.isin(missing_ids)].drop_duplicates(subset='ID', keep='last')

Unnamed: 0,ID,Fwy,Dir,District,County,City,State_PM,Abs_PM,Latitude,Longitude,Length,Type,Lanes,Name,User_ID_1,User_ID_2,User_ID_3,User_ID_4,file_date
650,1113328,125,N,11,73,,29.96,28.863,,,5.0,ML,2,CONNECTOR TO WB 52,353.0,,,,08_13
652,1113336,125,S,11,73,,29.96,28.863,,,5.0,ML,2,52 EB CON TO 125 SB,354.0,,,,08_13
767,1114649,805,S,11,73,66000.0,28.811,28.662,,,0.651,ML,4,S/B AT JCT I-5,4045.0,,,,12_17
1324,1125383,52,W,11,73,70224.0,14.756,14.756,,,0.695,ML,2,52 WB from 125 Conn,43812.0,,,,12_17


## Review of the stations above
I reviewed the stations above, which are the ones without coordinates, and decided that they are OK to drop.
The two stations on the 125 N / S have an Abs_PM well beyond the end of the freeway, so even if they are valid they would distort the analysis, because the next adjacent station is much farther away than normal.

For the 805 S station, there appears to be another station at essentially the same location, so this one will be ignored.
28.811 	28.66 	0.197 	1114649 	S/B AT JCT I-5 	4 	Mainline 	Other 	No 	1114643 	4045

For the 52 W station, it also looks like the station has been replaced.
Santee 	14.756 	14.76 	0.695 	1125383 	52 WB from 125 Conn 	2 	Mainline 		No 	1111463 	43812

In [9]:
# no_dup_keep_last.to_csv('../data/meta_2008_2015.csv')

In [10]:
no_dup_keep_last.Type.unique()

array(['ML', 'OR', 'FR', 'FF', 'HV', 'CD', 'CH'], dtype=object)

In [11]:
def create_freeway_vectors(frame_to_use, columns_to_select=['ID', 'Latitude', 'Longitude', 'Abs_PM', 'Lanes'],
                           ML_only=True):
    """
    This function will create a dictionary of freeway /direction specific lists sorted in order of Abs_PM
    """
    ml_frame = frame_to_use[frame_to_use.Type == 'ML']
    to_loop = ml_frame.groupby(['Fwy', 'Dir'])['ID'].count().reset_index()[['Fwy', 'Dir']].values

    ret = {}
    for Fwy, Dir in to_loop:
        # Every direction (N, S, E, W) is currently sorted by ascending Abs_PM.
        sort_order = ('Abs_PM', True)
        
        tmp = ml_frame[(ml_frame.Fwy == Fwy) & (ml_frame.Dir == Dir)]\
            .sort_values(by=sort_order[0], ascending=sort_order[1])[columns_to_select] # .drop_duplicates()
        tmp['order'] = pd.Series(index=tmp.index, data=sorted(range(0, len(tmp.ID)), reverse=(not sort_order[1])))
    
        if not ML_only:
            full_tmp = frame_to_use[(frame_to_use.Fwy == Fwy) & (frame_to_use.Dir == Dir)]
            full_tmp['sec_sort'] = full_tmp.Type.apply(lambda x: 10 if x == 'ML' else 1)
            tmp = tmp[['ID', 'order']].merge(full_tmp, how='right', on='ID').sort_values(
                by=[sort_order[0], 'sec_sort'], ascending=sort_order[1])[columns_to_select]
            
        ret["%s_%s" % (Fwy, Dir)] = tmp
    return ret 
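
The core of the function above is: group the mainline stations by freeway and direction, sort each group by `Abs_PM`, and number the stations in that order. A minimal pure-Python sketch of that idea is below. `order_stations` and the sample stations are illustrative and not part of the notebook.

```python
from collections import defaultdict

def order_stations(stations):
    """Group by (Fwy, Dir), sort each group by Abs_PM, and number the stations."""
    groups = defaultdict(list)
    for st in stations:
        groups[(st['Fwy'], st['Dir'])].append(st)
    ordered = {}
    for (fwy, direction), members in groups.items():
        members = sorted(members, key=lambda st: st['Abs_PM'])
        ordered['%s_%s' % (fwy, direction)] = [
            dict(st, order=i) for i, st in enumerate(members)
        ]
    return ordered

# Made-up stations for illustration.
stations = [
    {'ID': 'b', 'Fwy': 5, 'Dir': 'N', 'Abs_PM': 1.2},
    {'ID': 'a', 'Fwy': 5, 'Dir': 'N', 'Abs_PM': 0.1},
    {'ID': 'c', 'Fwy': 8, 'Dir': 'E', 'Abs_PM': 0.4},
]
vectors = order_stations(stations)
print([(st['ID'], st['order']) for st in vectors['5_N']])  # [('a', 0), ('b', 1)]
```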

In [12]:
def sorted_func(x):
    values = x.split('_')
    Fwy = int(values[0])
    Dir = values[1]
    if Dir == 'N' or Dir == 'E':
        dir_weight = 0
    else:
        dir_weight = 1
    return Fwy + dir_weight

In [13]:
freeway_vectors_update = create_freeway_vectors(
    no_dup_keep_last, [u'ID', u'Fwy', u'Dir', u'Abs_PM', u'Latitude', u'Longitude', u'Lanes', u'Name', 'Type'])

In [14]:
freeway_keys = sorted(freeway_vectors_update.keys(), key=sorted_func)
freeway_keys

['5_N',
 '5_S',
 '8_E',
 '8_W',
 '15_N',
 '15_S',
 '52_E',
 '52_W',
 '54_E',
 '54_W',
 '56_E',
 '56_W',
 '78_E',
 '78_W',
 '94_E',
 '94_W',
 '125_N',
 '125_S',
 '163_N',
 '163_S',
 '805_N',
 '805_S',
 '905_E',
 '905_W']
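
`sorted_func` adds 0 or 1 to the freeway number. That orders the keys above correctly, but two keys can share a value: a hypothetical `55_N` would tie with `54_W`. A tuple key avoids the tie. The sketch below is one way to write it; `freeway_sort_key` is a name introduced here.

```python
def freeway_sort_key(key):
    """Sort by freeway number first, then N/E before S/W."""
    fwy, direction = key.split('_')
    return (int(fwy), 0 if direction in ('N', 'E') else 1)

keys = ['8_W', '54_W', '5_S', '55_N', '5_N', '8_E']
print(sorted(keys, key=freeway_sort_key))
# ['5_N', '5_S', '8_E', '8_W', '54_W', '55_N']
```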

In [15]:
for ind, i in freeway_vectors_update['5_N'].iterrows():
    print i
    print i['ID']
    break

ID                        1114091
Fwy                             5
Dir                             N
Abs_PM                      0.057
Latitude                  32.5428
Longitude                 -117.03
Lanes                           6
Name         N/O CMNO DE LA PLAZA
Type                           ML
order                           0
Name: 730, dtype: object
1114091


# Create geojson using 2008 through 2015 data

In [16]:
# data for all districts if decide to upscale
# from pyspark.sql.functions import hour, mean,minute, stddev, count,max as psmax,min as psmin, date_format, \
#     split, explode

# from pyspark.sql import SQLContext
# from pyspark.sql import Row
# from pyspark.sql.types import *
# from pyspark.sql import DataFrameReader

In [17]:
# spark_df = spark.read.format("com.databricks.spark.csv").option("header", "true") \
#     .option("mode", "DROPMALFORMED") \
#     .load('../data/stats_2008_2015_d11.csv');

# spark_df.show()

In [18]:
df_new = pd.read_csv('../data/weekday_stats_2008_2015_d11.csv', usecols=range(1,5))
df_new.columns

Index([u'station', u'hour', u'minute', u'flow_mean'], dtype='object')

In [19]:
no_dup_keep_last.columns

Index([u'ID', u'Fwy', u'Dir', u'District', u'County', u'City', u'State_PM',
       u'Abs_PM', u'Latitude', u'Longitude', u'Length', u'Type', u'Lanes',
       u'Name', u'User_ID_1', u'User_ID_2', u'User_ID_3', u'User_ID_4',
       u'file_date'],
      dtype='object')

In [20]:
print no_dup_keep_last.Fwy.unique()

[ 94  78   5 805   8 163  15  52 125 905  56  54  67]


In [21]:
df_new.columns

Index([u'station', u'hour', u'minute', u'flow_mean'], dtype='object')

In [22]:
df_new['Time'] = pd.to_datetime(df_new['hour'].astype('str') + ':' + df_new['minute'].astype('str'),
                                format='%H:%M').dt.time
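
The `Time` column is used later to sort each station's flow values. A minimal sketch of the same idea without pandas is below. It also maps each timestamp to a slot number, assuming the data really sits on a regular 5-minute grid (288 slots per day). `time_of_day` and `slot_index` are illustrative names.

```python
import datetime

def time_of_day(hour, minute):
    """Combine the hour and minute columns into a time object."""
    return datetime.time(int(hour), int(minute))

def slot_index(hour, minute, step=5):
    """Position of a timestamp on a regular grid of `step`-minute slots."""
    return (int(hour) * 60 + int(minute)) // step

print(time_of_day(6, 35))   # 06:35:00
print(slot_index(0, 0))     # 0
print(slot_index(23, 55))   # 287
```

An explicit slot number makes it possible to place each value at a fixed position, so a station with missing intervals does not end up with a shorter list than its neighbors.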

In [23]:
test = df_new[['station', 'Time', 'flow_mean']].as_matrix()
test

array([[1100745, datetime.time(0, 0), 7.512630014860001],
       [1108341, datetime.time(0, 0), 51.48754789270001],
       [1118333, datetime.time(0, 0), 33.414307004499996],
       ..., 
       [1122612, datetime.time(6, 35), nan],
       [1118394, datetime.time(4, 35), 72.0],
       [1118401, datetime.time(23, 0), 112.0]], dtype=object)

In [24]:
complete_with_meta = pd.merge(df_new, no_dup_keep_last[['ID', 'District', 'County', 'City', 'State_PM', 'Abs_PM',
                                                      'Latitude', 'Longitude', 'Name', 'Lanes', 'Type', 'Fwy',
                                                      'Dir']], how='left', left_on='station',
                              right_on='ID')
complete_with_meta['Time'] = pd.to_datetime(complete_with_meta['hour'].astype('str') + ':' + \
                                            complete_with_meta['minute'].astype('str'),
                                format='%H:%M').dt.time

In [25]:
average_day = []
for key in freeway_keys:
    Fwy, Dir = key.split('_')
    tmp = complete_with_meta[(complete_with_meta.Fwy == int(Fwy)) & (complete_with_meta.Dir == Dir)]
    average_day.append(tmp.groupby('ID')['flow_mean'].mean())
df_avg = pd.concat(average_day)

In [26]:
complete_with_meta_avg = pd.merge(pd.DataFrame(df_avg).reset_index(),
                                  no_dup_keep_last[['ID', 'District', 'County', 'City', 'State_PM', 'Abs_PM',
                                                    'Latitude', 'Longitude', 'Name', 'Lanes', 'Type', 'Fwy',
                                                    'Dir']], how='left', left_on='ID',
                              right_on='ID')

In [27]:
complete_with_meta_avg.columns

Index([u'ID', u'flow_mean', u'District', u'County', u'City', u'State_PM',
       u'Abs_PM', u'Latitude', u'Longitude', u'Name', u'Lanes', u'Type',
       u'Fwy', u'Dir'],
      dtype='object')

In [28]:
complete_with_meta.Fwy.unique()

array([  94.,    8.,    5.,  805.,   15.,   78.,   52.,  163.,  125.,
         56.,  905.,   54.,   nan,   67.])
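
The `nan` in the output above comes from the left merge: a station that appears in the flow data but not in `no_dup_keep_last` gets empty metadata columns, and the missing values turn `Fwy` into a float column. A small pure-Python sketch of how such unmatched stations could be listed follows. The IDs are made up and `unmatched_stations` is a hypothetical helper.

```python
def unmatched_stations(flow_station_ids, meta_ids):
    """Stations present in the flow data with no metadata record."""
    return sorted(set(flow_station_ids) - set(meta_ids))

flow_station_ids = [101, 102, 102, 103]  # illustrative IDs
meta_ids = [101, 103, 104]
print(unmatched_stations(flow_station_ids, meta_ids))  # [102]
```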

In [29]:
df_new.columns

Index([u'station', u'hour', u'minute', u'flow_mean', u'Time'], dtype='object')

In [30]:
freeway_vectors_update[key].columns

Index([u'ID', u'Fwy', u'Dir', u'Abs_PM', u'Latitude', u'Longitude', u'Lanes',
       u'Name', u'Type', u'order'],
      dtype='object')

In [61]:
# update ML file for 2008 to 2015
final = {}

for key in freeway_keys:
    print key
    new_geojson = {'type': 'FeatureCollection', 'features': []}

    # freeway_vectors_update has all of the metadata info
    df = freeway_vectors_update[key]
    for idx, row in df.iterrows():
        properties = {'key': key,
                      'ID': row['ID'],
                      'Lanes': row['Lanes'],
                      'Name': row['Name'],
                      'Abs_PM': np.round(row['Abs_PM'], decimals=1),
                      'Order': row['order'],
                      'Type': row['Type'],
                     }
        flow_data = df_new[df_new.station == row['ID']][['Time', 'flow_mean']].sort_values(by='Time').set_index('Time')
        properties['Flow'] = flow_data.flow_mean.tolist()
        geometry = {'type': "Point", "coordinates": [row['Longitude'], row['Latitude']]}
        temp = {'type': 'Feature', 'properties': properties, "geometry": geometry}
        new_geojson['features'].append(temp)
#         break
    print "geojson len: %s" % len(new_geojson['features'])
    final[key] = {'visible': False, 'data': new_geojson}
#     print final
#     break

json_string = json.dumps(final)
final_string = 'var meta_data_points = ' + json_string
with open('../vis/WiggleVis/data/2015_to_2008_ML_d11_geojson_points2.js', 'w') as outfile:
    outfile.write(final_string)

5_N
geojson len: 135
5_S
geojson len: 119
8_E
geojson len: 49
8_W
geojson len: 49
15_N
geojson len: 87
15_S
geojson len: 84
52_E
geojson len: 27
52_W
geojson len: 28
54_E
geojson len: 3
54_W
geojson len: 3
56_E
geojson len: 17
56_W
geojson len: 14
78_E
geojson len: 19
78_W
geojson len: 26
94_E
geojson len: 16
94_W
geojson len: 23
125_N
geojson len: 35
125_S
geojson len: 37
163_N
geojson len: 15
163_S
geojson len: 17
805_N
geojson len: 73
805_S
geojson len: 75
905_E
geojson len: 11
905_W
geojson len: 13
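
`flow_mean` contains `nan` values (see the array printed earlier), and `json.dumps` writes those as `NaN` by default. JavaScript accepts that in a `.js` file like the one written here, but it is not valid JSON. If strict JSON is ever needed, the values can be replaced with `None` first. The sketch below uses made-up property values and a hypothetical `point_feature` helper.

```python
import json
import math

def point_feature(station_id, lon, lat, flow):
    """Build a GeoJSON Point feature; NaN flow values become None (JSON null)."""
    clean = [None if isinstance(v, float) and math.isnan(v) else v for v in flow]
    return {
        'type': 'Feature',
        'properties': {'ID': station_id, 'Flow': clean},
        # GeoJSON coordinate order is longitude first, then latitude.
        'geometry': {'type': 'Point', 'coordinates': [lon, lat]},
    }

feature = point_feature(1, -117.03, 32.54, [7.5, float('nan'), 51.5])
text = json.dumps(feature, allow_nan=False)  # raises ValueError if a NaN slipped through
print(json.loads(text)['properties']['Flow'])  # [7.5, None, 51.5]
```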


## Calculate midpoint

source: http://www.movable-type.co.uk/scripts/latlong.html

var Bx = Math.cos(φ2) * Math.cos(λ2-λ1);

var By = Math.cos(φ2) * Math.sin(λ2-λ1);

var φ3 = Math.atan2(Math.sin(φ1) + Math.sin(φ2),
                    Math.sqrt( (Math.cos(φ1)+Bx)*(Math.cos(φ1)+Bx) + By*By ) );

var λ3 = λ1 + Math.atan2(By, Math.cos(φ1) + Bx);


In [31]:
# source:
# http://stackoverflow.com/questions/5895832/python-lat-long-midpoint-calculation-gives-wrong-result-when-longitude-90
import math

def midpoint(lat1, lon1, lat2, lon2, debug=False):
    if debug:
        print lat1, lon1
        print lat2, lon2
    lonA = math.radians(lon1)
    lonB = math.radians(lon2)
    latA = math.radians(lat1)
    latB = math.radians(lat2)

    dLon = lonB - lonA

    Bx = math.cos(latB) * math.cos(dLon)
    By = math.cos(latB) * math.sin(dLon)

    latC = math.atan2(math.sin(latA) + math.sin(latB),
                  math.sqrt((math.cos(latA) + Bx) * (math.cos(latA) + Bx) + By * By))
    lonC = lonA + math.atan2(By, math.cos(latA) + Bx)
    lonC = (lonC + 3 * math.pi) % (2 * math.pi) - math.pi

    return math.degrees(latC), math.degrees(lonC)

In [32]:
# test
midpoint(32.542842, -117.030331, 32.551690, -117.045725)

(32.54726623446039, -117.03802762069233)
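
A quick sanity check for a midpoint is that it lies at the same great-circle distance from both endpoints. The sketch below repeats the midpoint formula so that it runs on its own and adds a haversine distance. `haversine_km` is not part of the notebook, and the Earth radius of 6371 km is the usual mean value.

```python
import math

def midpoint(lat1, lon1, lat2, lon2):
    """Same formula as the midpoint function above, without the debug option."""
    lat_a, lon_a = math.radians(lat1), math.radians(lon1)
    lat_b, lon_b = math.radians(lat2), math.radians(lon2)
    d_lon = lon_b - lon_a
    bx = math.cos(lat_b) * math.cos(d_lon)
    by = math.cos(lat_b) * math.sin(d_lon)
    lat_c = math.atan2(math.sin(lat_a) + math.sin(lat_b),
                       math.sqrt((math.cos(lat_a) + bx) ** 2 + by ** 2))
    lon_c = lon_a + math.atan2(by, math.cos(lat_a) + bx)
    lon_c = (lon_c + 3 * math.pi) % (2 * math.pi) - math.pi  # wrap to [-180, 180)
    return math.degrees(lat_c), math.degrees(lon_c)

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    d_lat = p2 - p1
    d_lon = math.radians(lon2 - lon1)
    a = math.sin(d_lat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(d_lon / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

a = (32.542842, -117.030331)
b = (32.551690, -117.045725)
m = midpoint(a[0], a[1], b[0], b[1])
d_am = haversine_km(a[0], a[1], m[0], m[1])
d_mb = haversine_km(m[0], m[1], b[0], b[1])
print(abs(d_am - d_mb) < 1e-6)  # the two halves should have equal length
```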

In [33]:
# prototype
# shifted = freeway_vectors_update['5_N'].shift(-1)
# result = []
# final = []
# total = len(freeway_vectors_update['5_N'])
# print "total: %s" % total
# index = 0
# for idx, item in freeway_vectors_update['5_N'].iterrows():
# #     print item['order']
# #     print index
#     if item['order'] != (total - 1):
#         result.append(midpoint(item['Latitude'], item['Longitude'], shifted.iloc[index]['Latitude'],
#                                shifted.iloc[index]['Longitude']))
#         final.append([item['Latitude'], item['Longitude'], result[index][0], result[index][1]])
#     else:
#         final.append([result[index-1][0], result[index-1][1], item['Latitude'], item['Longitude']])
#     index += 1


In [34]:
# example format for geojson for the line
{ 
    "type": "Feature",
    "properties":
    {
        "id": 2,
        "elevation": 50
    },
    "geometry":
    {
        "type": "LineString",
        "coordinates": 
        [
            [ 11.836395263671875, 47.75317468890147 ],
            [ 11.865234375, 47.73193447949174 ]
        ]
    }
}

{'geometry': {'coordinates': [[11.836395263671875, 47.75317468890147],
   [11.865234375, 47.73193447949174]],
  'type': 'LineString'},
 'properties': {'elevation': 50, 'id': 2},
 'type': 'Feature'}

In [35]:
# example of format
# [ 
# { "type": "Feature", "properties": { "id": 2, "elevation": 50 }, "geometry": { "type": "LineString", "coordinates": [ [ 11.836395263671875, 47.75317468890147 ], [ 11.865234375, 47.73193447949174 ] ] } },
# { "type": "Feature", "properties": { "id": 1, "elevation": 750 }, "geometry": { "type": "LineString", "coordinates": [ [ 11.865234375,47.73193447949174 ], [ 11.881027221679688, 47.700520033704954 ] ] } },
# { "type": "Feature", "properties": { "id": 0, "elevation": 1700 }, "geometry": { "type": "LineString", "coordinates": [ [ 11.881027221679688, 47.700520033704954 ], [ 11.923599243164062, 47.706527200903395 ] ] } },
# { "type": "Feature", "properties": { "id": 0, "elevation": 3000 }, "geometry": { "type": "LineString", "coordinates": [ [ 11.923599243164062, 47.706527200903395 ], [ 11.881027221679688, 47.700520033704954 ], ] } }
# ]

In [94]:
example.head()

Unnamed: 0,ID,0,1,2,3,4,5,6,7,8,...,1430,1431,1432,1433,1434,1435,1436,1437,1438,1439
0,1114091,-3.490106,3.42419,4.529551,5.012564,5.301757,5.48724,5.602815,5.664003,5.678146,...,5.35952,5.453801,5.50047,5.502015,5.457048,5.359772,5.19743,4.942799,4.52831,3.698384
1,1118333,-2.35646,2.751024,3.715568,4.166535,4.440042,4.615214,4.722751,4.776899,4.784312,...,4.584922,4.671999,4.712405,4.708016,4.656904,4.55259,4.381129,4.112542,3.670161,2.73631
2,1114709,-2.035623,3.518397,4.294961,4.693409,4.939621,5.096301,5.189139,5.230228,5.224993,...,5.290863,5.375363,5.412806,5.405157,5.350486,5.242211,5.0661,4.7914,4.338684,3.369389
3,1118348,-2.525474,2.490085,3.583167,4.062706,4.349642,4.533259,4.647139,4.70672,4.719312,...,4.582817,4.668961,4.708619,4.703665,4.652189,4.54777,4.376604,4.10908,3.669823,2.750651
4,1114720,-2.191517,3.750611,4.51945,4.915459,5.160343,5.316088,5.408152,5.448524,5.442565,...,5.492724,5.57749,5.615277,5.60807,5.55397,5.446464,5.271465,4.998589,4.549712,3.59622


In [114]:
def calculate_segments(freeway_df, wig_dat):
    """
    Build the line segments for one freeway/direction from its ordered stations.

    Returns three parallel lists with one entry per station:
      final    - tuples (n, segment, ...) holding n LineString coordinate lists
                 ([lon, lat] pairs) that run between the station and the
                 midpoints to its neighbors
      data     - the station's wiggle values from wig_dat
      stations - the station IDs
    """
    shifted = freeway_df.shift(-1)
    result = []  # midpoint (lat, lon) between station i and station i + 1
    final = []
    data = []
    stations = []
    total = len(freeway_df)
    print "total: %s" % total
    for index, (idx, item) in enumerate(freeway_df.iterrows()):
        station = item['ID']
        stations.append(station)
        # After reset_index() the columns are: index, ID, then the 1440 wiggle values.
        points = wig_dat[wig_dat.ID == station].reset_index().T.iloc[2:1442].T.iloc[0].tolist()
        data.append(points)
        if item['order'] == 0:
            # First station: one segment from the station to the next midpoint.
            result.append(midpoint(item['Latitude'], item['Longitude'], shifted.iloc[index]['Latitude'],
                                   shifted.iloc[index]['Longitude']))
            final.append((1, [[item['Longitude'], item['Latitude']], [result[index][1], result[index][0]]]))
        elif item['order'] != (total - 1):
            # Interior station: previous midpoint -> station, then station -> next midpoint.
            result.append(midpoint(item['Latitude'], item['Longitude'], shifted.iloc[index]['Latitude'],
                                   shifted.iloc[index]['Longitude']))
            final.append((2, [[result[index-1][1], result[index-1][0]], [item['Longitude'], item['Latitude']]],
                         [[item['Longitude'], item['Latitude']], [result[index][1], result[index][0]]]))
        else:
            # Last station: one segment from the previous midpoint to the station.
            final.append((1, [[result[index-1][1], result[index-1][0]], [item['Longitude'], item['Latitude']]]))
    return final, data, stations
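
The segment logic in the function above can be reduced to a small sketch: each station owns the stretch of road from the midpoint with its previous neighbor to the midpoint with its next neighbor, split into one or two LineStrings at the station itself. The helper names are hypothetical, and `planar_midpoint` is a simple coordinate average standing in for the great-circle `midpoint`.

```python
def planar_midpoint(a, b):
    """Stand-in for the great-circle midpoint; for illustration only."""
    return [(a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0]

def station_segments(coords, mid=planar_midpoint):
    """coords: [lon, lat] pairs in station order. Returns one list of LineStrings per station."""
    mids = [mid(a, b) for a, b in zip(coords, coords[1:])]
    segments = []
    for i, point in enumerate(coords):
        lines = []
        if i > 0:
            lines.append([mids[i - 1], point])  # previous midpoint -> station
        if i < len(coords) - 1:
            lines.append([point, mids[i]])      # station -> next midpoint
        segments.append(lines)
    return segments

coords = [[0, 0], [2, 0], [4, 2]]  # made-up coordinates
segs = station_segments(coords)
print(len(segs[0]), len(segs[1]), len(segs[2]))  # 1 2 1
```

In this sketch a freeway with a single station simply gets an empty list of lines, so no midpoint is ever computed against a missing neighbor.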

In [115]:
example = pd.read_csv('../vis/WiggleVis/data/heatmaps/wiggle_analysis_%s_%s_for_segment.csv' % (5, 'N'))
# print example.head()
segments, wiggles, stations = calculate_segments(freeway_vectors_update['5_N'], example)

total: 135


In [116]:
max_result = max([max(item) for item in wiggles])
min_result = min([min(item) for item in wiggles])
print max_result, min_result

7.91882517129 -7.25527337518
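
The built-in `max` and `min` compare values pairwise, and every comparison with `nan` is false, so a `nan` in any wiggle list can silently give a wrong range. Whether the wiggle files can contain missing values is an assumption that is not checked here; if they can, a filter like the following sketch keeps the range well defined. `finite_range` is an illustrative name.

```python
import math

def finite_range(rows):
    """Min and max over all finite numbers in a list of lists."""
    values = [v for row in rows for v in row
              if isinstance(v, (int, float)) and not (math.isnan(v) or math.isinf(v))]
    if not values:
        raise ValueError('no finite values')
    return min(values), max(values)

rows = [[1.0, float('nan'), 3.0], [-2.0, 0.5]]
print(finite_range(rows))  # (-2.0, 3.0)
```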


In [117]:
from pprint import pprint

In [119]:
# update ML file for 2008 to 2015
final = {}

for key in freeway_keys:
# for key in ['54_W']:
    print key
    Fwy, Dir = key.split('_')
    wiggle_data = pd.read_csv('../vis/WiggleVis/data/heatmaps/wiggle_analysis_%s_%s_for_segment.csv' % (Fwy, Dir))
    segments, wiggles, stations = calculate_segments(freeway_vectors_update[key], wiggle_data)

    max_result = max([max(item) for item in wiggles])
    min_result = min([min(item) for item in wiggles])
    
    data_to_store = {'type': "FeatureCollection", 'features': []}
    for segs, wig, stat in zip(segments, wiggles, stations):
#         print "segs: %s" % str(segs)
        for idx in range(1, segs[0]+1):
#             print idx
#             print "segs[idx]: %s" % segs[idx]
            new_geojson = {'type': 'Feature'}
            properties = {'wiggles': wig, 'ID': stat, 'min': min_result, 'max': max_result}
            geometry = {'type': "LineString", "coordinates": segs[idx]}
            new_geojson['geometry'] = geometry
            new_geojson['properties'] = properties
    #         print new_geojson
            data_to_store['features'].append(new_geojson)
    final[key] = data_to_store
#     pprint(final)
# print final
json_string = json.dumps(final)
final_string = 'var segment_data = ' + json_string
with open('../vis/WiggleVis/data/2015_to_2008_ML_d11_geojson_lines.js', 'w') as outfile:
    outfile.write(final_string)

5_N
total: 135
5_S
total: 119
8_E
total: 49
8_W
total: 49
15_N
total: 87
15_S
total: 84
52_E
total: 27
52_W
total: 28
54_E
total: 3
54_W
total: 3
56_E
total: 17
56_W
total: 14
78_E
total: 19
78_W
total: 26
94_E
total: 16
94_W
total: 23
125_N
total: 35
125_S
total: 37
163_N
total: 15
163_S
total: 17
805_N
total: 73
805_S
total: 75
905_E
total: 11
905_W
total: 13


## Generate Station indicators

In [120]:
def create_freeway_vectors(frame_to_use, columns_to_select=['ID', 'Latitude', 'Longitude', 'Abs_PM', 'Lanes'],
                           ML_only=True):
    """
    This function will create a dictionary of freeway /direction specific lists sorted in order of Abs_PM
    """
    ml_frame = frame_to_use[frame_to_use.Type == 'ML']
    to_loop = ml_frame.groupby(['Fwy', 'Dir'])['ID'].count().reset_index()[['Fwy', 'Dir']].values

    ret = {}
    for Fwy, Dir in to_loop:
        # Every direction (N, S, E, W) is currently sorted by ascending Abs_PM.
        sort_order = ('Abs_PM', True)
        
        tmp = ml_frame[(ml_frame.Fwy == Fwy) & (ml_frame.Dir == Dir)]\
            .sort_values(by=sort_order[0], ascending=sort_order[1])[columns_to_select] # .drop_duplicates()
        tmp['order'] = pd.Series(index=tmp.index, data=sorted(range(0, len(tmp.ID)), reverse=(not sort_order[1])))
    
        if not ML_only:
            full_tmp = frame_to_use[(frame_to_use.Fwy == Fwy) & (frame_to_use.Dir == Dir)]
            full_tmp['sec_sort'] = full_tmp.Type.apply(lambda x: 10 if x == 'ML' else 1)
            tmp = tmp[['ID', 'order']].merge(full_tmp, how='right', on='ID').sort_values(
                by=[sort_order[0], 'sec_sort'], ascending=sort_order[1])
            tmp = tmp[columns_to_select + ['order', 'sec_sort']]
            
        ret["%s_%s" % (Fwy, Dir)] = tmp
    return ret 

In [121]:
fwy_keys = create_freeway_vectors(no_dup_keep_last, columns_to_select=['ID', 'Latitude', 'Longitude', 'Abs_PM',
                                                                       'Lanes', 'Type'], ML_only=False)
# test['125_N']

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


In [124]:
for test_key in fwy_keys:
    fwy_keys[test_key].index = range(0, len(fwy_keys[test_key]))
    extra_data = []
    last_ML_abs_PM = None
    for idx, row in fwy_keys[test_key].iterrows():
    #     print idx, row
        if row.Type == 'ML' and not extra_data:
            fwy_keys[test_key].set_value(idx, 'extra', '')
            last_ML_abs_PM = row.Abs_PM
        elif row.Type == 'ML' and extra_data:
            fwy_keys[test_key].set_value(idx, 'extra', ', '.join(extra_data))
            extra_data = []
            last_ML_abs_PM = row.Abs_PM
        elif row.Type != 'ML':
            extra_data.append(row.Type)
            fwy_keys[test_key].set_value(idx, 'extra', '')        
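
The loop above attaches the types of non-mainline stations (ramps and so on) to the next mainline station in `Abs_PM` order. The same rule on a plain list of types looks like the sketch below; `annotate_extras` is a hypothetical name and the input list is made up.

```python
def annotate_extras(types):
    """For each station type in order, return the 'extra' label it would receive."""
    extras = []
    pending = []  # non-ML types seen since the last ML station
    for station_type in types:
        if station_type == 'ML':
            extras.append(', '.join(pending))  # empty string when nothing is pending
            pending = []
        else:
            pending.append(station_type)
            extras.append('')
    return extras

types = ['ML', 'ML', 'FR', 'ML', 'OR', 'FR', 'ML']
print(annotate_extras(types))
# ['', '', '', 'FR', '', '', 'OR, FR']
```

Non-mainline stations that come after the last mainline station are not attached to anything, in this sketch as in the loop above.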

In [125]:
fwy_keys['125_N']

Unnamed: 0,ID,Latitude,Longitude,Abs_PM,Lanes,Type,order,sec_sort,extra
0,1119021,32.595673,-116.964628,2.344,2,ML,0.0,10,
1,1119041,32.608637,-116.967247,3.282,3,ML,1.0,10,
2,1119050,32.617282,-116.971114,3.928,3,ML,2.0,10,
3,1119059,32.619301,-116.971171,4.067,2,ML,3.0,10,
4,1119075,32.623484,-116.971141,4.356,2,ML,4.0,10,
5,1119085,32.625784,-116.971135,4.515,3,ML,5.0,10,
6,1119094,32.629142,-116.971145,4.747,2,ML,6.0,10,
7,1119102,32.634145,-116.971152,5.092,2,ML,7.0,10,
8,1100760,32.771524,-117.001822,5.377,1,FR,,1,
9,1119110,32.641088,-116.970426,5.575,2,ML,8.0,10,FR


In [126]:
fwy_keys['125_N'][fwy_keys['125_N'].Type == 'ML']

Unnamed: 0,ID,Latitude,Longitude,Abs_PM,Lanes,Type,order,sec_sort,extra
0,1119021,32.595673,-116.964628,2.344,2,ML,0.0,10,
1,1119041,32.608637,-116.967247,3.282,3,ML,1.0,10,
2,1119050,32.617282,-116.971114,3.928,3,ML,2.0,10,
3,1119059,32.619301,-116.971171,4.067,2,ML,3.0,10,
4,1119075,32.623484,-116.971141,4.356,2,ML,4.0,10,
5,1119085,32.625784,-116.971135,4.515,3,ML,5.0,10,
6,1119094,32.629142,-116.971145,4.747,2,ML,6.0,10,
7,1119102,32.634145,-116.971152,5.092,2,ML,7.0,10,
9,1119110,32.641088,-116.970426,5.575,2,ML,8.0,10,FR
10,1119126,32.649208,-116.970687,6.136,2,ML,9.0,10,
