# Project: Traffic Prediction

### Introduction
The project investigates the road traffic problem of detecting anomalous behaviour using data recorded by sensors. The model uses the density feature and its density-difference feature to learn a pattern from the data; if the density spikes beyond a certain threshold, an anomaly is detected.

The model is intended to be embedded in a solution such as a mobile application, so that drivers and motorists know the road status in advance and can make better travel decisions.

### Methods and Algorithms

The solution is implemented with an unsupervised learning algorithm, an autoencoder neural network, to develop models that are then used to detect anomalous observations in traffic flow data. Anomalous observations are irregularities that change the normal flow of traffic. These include natural disasters, accidents, public activities such as marathons, and road construction.

The autoencoder is used together with Long Short-Term Memory (LSTM). This design was chosen because the problem is a sequence classification problem: data arrives from the sensors as a sequence of readings, and based on those readings the model classifies whether behaviour at a given time is usual or unusual. Recurrent Neural Networks (RNNs) are known for solving such problems. An RNN contains loops of a basic neural network in which the activations or output from the previous time step act as input to the next time step, together with a new data sample, and both are used to make new predictions.

An autoencoder is a neural network that learns to reconstruct its input. In many cases, neural networks are given inputs and expected to predict a target. In this project, an unlabelled dataset with no traffic problems is used to train the network, which then reconstructs the input data as its prediction. The network is then tested with unfiltered traffic data that contains both normal and abnormal trends. The model is expected to fail to reconstruct the abnormal data, showing a spike in prediction error, and that spike is how a traffic anomaly is detected. LSTM is used to analyse the time-series sensor data.
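The detection rule described above (a spike in reconstruction error beyond a threshold) can be sketched as follows. This is a minimal illustration, not the notebook's implementation: the function names, the sample error values, and the choice of "mean plus three standard deviations" as the threshold are all assumptions made for the example.

```python
import statistics

def fit_threshold(train_errors, k=3.0):
    """Assumed rule: threshold = mean + k * std of errors on normal data."""
    mean = statistics.fmean(train_errors)
    std = statistics.pstdev(train_errors)
    return mean + k * std

def flag_anomalies(errors, threshold):
    """Mark every observation whose error exceeds the threshold."""
    return [e > threshold for e in errors]

# Hypothetical reconstruction errors measured on normal (training) traffic
train_errors = [0.10, 0.12, 0.09, 0.11, 0.10, 0.08]
threshold = fit_threshold(train_errors)

# Hypothetical errors on new data; the third value is a spike
new_errors = [0.10, 0.11, 0.95, 0.09]
flags = flag_anomalies(new_errors, threshold)
print(flags)
```

Only the third observation is flagged, because the threshold is learned from normal traffic alone.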

### Dataset
A public dataset on traffic flow from the Hops platform was used: TrafficFlow_Sample (hdfs:///Projects/TrafficFlow/TrafficFlow_Sample/). The dataset is a sample of traffic flow data collected from sensors deployed on Stockholm highways. The dataset is labelled to indicate good and bad traffic conditions. The labels are used to keep only data with good traffic conditions, which is used to train the model on a density feature engineered from existing features of the dataset, i.e. Flow-In, Timestamp and Average-Speed.

### Tools and platform used
1. Keras, to implement the autoencoder and LSTM deep learning algorithms
2. Apache Spark cluster running on the Cognitive AI platform (https://labs.cognitiveclass.ai/), Jupyter notebook, PySpark


# Steps: 

## 1. Import and Explore Data

In [5]:
from pyspark.sql.types import *
from pyspark.sql.functions import udf , col, lag, datediff, unix_timestamp,lit,coalesce,concat,split, explode
from pyspark.sql.window import Window

In [6]:
schema_flow = StructType().add('Timestamp', TimestampType(), False) \
        .add('Ds_Reference', StringType(), False) \
        .add('Detector_Number', ShortType(), False) \
        .add('Traffic_Direction', ShortType(), False) \
        .add('Flow_In', ShortType(), False) \
        .add('Average_Speed', ShortType(), False) \
        .add('Sign_Aid_Det_Comms', ShortType(), False) \
        .add('Status', ShortType(), False) \
        .add('Legend_Group', ShortType(), False) \
        .add('Legend_Sign', ShortType(), False) \
        .add('Legend_SubSign', ShortType(), False) \
        .add('Protocol_Version', StringType(), False) 

In [7]:
df_raw = spark.read.csv('data/mcs_201606.csv', sep=';', schema=schema_flow, ignoreLeadingWhiteSpace=True, \
                    ignoreTrailingWhiteSpace=True, timestampFormat='yyyy-MM-dd HH:mm:ss.SSS')
df_raw.printSchema()

root
 |-- Timestamp: timestamp (nullable = true)
 |-- Ds_Reference: string (nullable = true)
 |-- Detector_Number: short (nullable = true)
 |-- Traffic_Direction: short (nullable = true)
 |-- Flow_In: short (nullable = true)
 |-- Average_Speed: short (nullable = true)
 |-- Sign_Aid_Det_Comms: short (nullable = true)
 |-- Status: short (nullable = true)
 |-- Legend_Group: short (nullable = true)
 |-- Legend_Sign: short (nullable = true)
 |-- Legend_SubSign: short (nullable = true)
 |-- Protocol_Version: string (nullable = true)



In [8]:
%%time
df_raw.count()

CPU times: user 0 ns, sys: 8 ms, total: 8 ms
Wall time: 56.1 s


85255773

In [4]:
df_raw.show(5)

+-------------------+------------+---------------+-----------------+-------+-------------+------------------+------+------------+-----------+--------------+----------------+
|          Timestamp|Ds_Reference|Detector_Number|Traffic_Direction|Flow_In|Average_Speed|Sign_Aid_Det_Comms|Status|Legend_Group|Legend_Sign|Legend_SubSign|Protocol_Version|
+-------------------+------------+---------------+-----------------+-------+-------------+------------------+------+------------+-----------+--------------+----------------+
|2016-06-01 00:00:00| E182N 2,015|             49|               78|      0|          252|                 0|     1|         255|          1|             1|               4|
|2016-06-01 00:00:00| E182N 2,015|             50|               78|      0|          252|                 0|     1|         255|          1|             1|               4|
|2016-06-01 00:00:00| E182N 2,015|             51|               78|      0|          252|                 0|     1|         255| 

In [5]:
df_raw.select('Detector_Number','Timestamp','Ds_Reference','Average_Speed','Flow_In','Status').show(10)

+---------------+-------------------+------------+-------------+-------+------+
|Detector_Number|          Timestamp|Ds_Reference|Average_Speed|Flow_In|Status|
+---------------+-------------------+------------+-------------+-------+------+
|             49|2016-06-01 00:00:00| E182N 2,015|          252|      0|     1|
|             50|2016-06-01 00:00:00| E182N 2,015|          252|      0|     1|
|             51|2016-06-01 00:00:00| E182N 2,015|          252|      0|     1|
|             52|2016-06-01 00:00:00| E182N 2,015|          252|      0|     1|
|             49|2016-06-01 00:00:00| E182N 2,325|          252|      0|     1|
|             50|2016-06-01 00:00:00| E182N 2,325|          252|      0|     1|
|             51|2016-06-01 00:00:00| E182N 2,325|          252|      0|     1|
|             49|2016-06-01 00:00:00| E182N 2,690|          252|      0|     1|
|             50|2016-06-01 00:00:00| E182N 2,690|          252|      0|     1|
|             51|2016-06-01 00:00:00| E1

## 2. Clean the Data

Three cleaning functions generate a sensor ID that uniquely identifies a sensor. The ID consists of two fields from the dataset: Ds_Reference and Detector_Number.
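As a plain-Python sketch of what the Spark UDFs in this section do together, the example below builds one sensor ID from a single row. The helper name `make_sensor_id` is hypothetical; the parsing rule (road name, then kilometre and metre parts separated by a comma) and the subtraction of 48 from the detector code follow the cells in this section.

```python
def make_sensor_id(ds_reference, detector_code):
    """Build an ID such as 'E182N_2015_1' from 'E182N 2,015' and code 49."""
    road, km = ds_reference.strip().split(' ')
    k, m = km.split(',')
    meter = int(k) * 1000 + int(m)   # position along the road in metres
    detector = detector_code - 48    # 49 is the ASCII code of '1'
    return f"{road}_{meter}_{detector}"

sensor_a = make_sensor_id('E182N 2,015', 49)
sensor_b = make_sensor_id('E4N 47,465', 50)
print(sensor_a, sensor_b)
```

The two results correspond to IDs that appear in the `Ds_Ref` column of the outputs shown in this notebook.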

In [6]:
## See Reference [1]
split_schema = StructType([
  StructField('Road', StringType(), False),
  StructField('Km_Ref', IntegerType(), False)
])

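@udf(split_schema)
def split_ds_ref(s):
    # 'E182N 2,015' -> ('E182N', 2015): road name and position in metres
    try:
        r, km = s.split(' ')
        k, m = km.split(',')
        meter = int(k)*1000 + int(m)
        return r, meter
    except (AttributeError, ValueError):
        # null or malformed reference
        return None

@udf(StringType())
def split_ds_ref2(s):
    # road name only, e.g. 'E182N'
    try:
        r, km = s.split(' ')
        return r
    except (AttributeError, ValueError):
        return None

@udf(StringType())
def split_ds_ref3(s):
    # position in metres, returned as a string to match StringType
    try:
        r, km = s.split(' ')
        k, m = km.split(',')
        meter = int(k)*1000 + int(m)
        return str(meter)
    except (AttributeError, ValueError):
        return None

#def generate_sensor_ids1(*cols):
#    return concat(*[coalesce(c, lit("*")) for c in cols]) 
def generate_sensor_ids(s, d):
    # d (the detector number) is not used yet
    r, km = s.split(' ')
    k, m = km.split(',')
    meter = int(k)*1000 + int(m)
    return r, meter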

funcConcatCols = udf(lambda x,y,z: x+'_'+y+'_'+z,StringType())
 

##### Clean up 1: Convert detector number to type int for easy manipulation

In [10]:
ascii_to_int = udf(lambda x : x - 48, ShortType())
df_cleanup1 = df_raw.withColumn('Detector_Number', ascii_to_int('Detector_Number'))


##### Clean up 2 and 3: Split detector reference ID

The field is first split up for easy concatenation when generating unique IDs.

In [8]:
df_cleanup2 = df_cleanup1.withColumn('Ds_Ref_temp1',   split_ds_ref2('Ds_Reference')).withColumn('Ds_Ref_temp2',split_ds_ref3('Ds_Reference'))

df_cleanup3 = df_cleanup2.withColumn('Ds_Ref', funcConcatCols(col('Ds_Ref_temp1'), col('Ds_Ref_temp2'),col('Detector_Number').cast(StringType())))
df_cleanup3.show(2)

df_cleanup4 = df_cleanup3.withColumn('Ds_Reference',split_ds_ref('Ds_Reference'))
df_cleanup4.show(2)

+-------------------+------------+---------------+-----------------+-------+-------------+------------------+------+------------+-----------+--------------+----------------+------------+------------+------------+
|          Timestamp|Ds_Reference|Detector_Number|Traffic_Direction|Flow_In|Average_Speed|Sign_Aid_Det_Comms|Status|Legend_Group|Legend_Sign|Legend_SubSign|Protocol_Version|Ds_Ref_temp1|Ds_Ref_temp2|      Ds_Ref|
+-------------------+------------+---------------+-----------------+-------+-------------+------------------+------+------------+-----------+--------------+----------------+------------+------------+------------+
|2016-06-01 00:00:00| E182N 2,015|              1|               78|      0|          252|                 0|     1|         255|          1|             1|               4|       E182N|        2015|E182N_2015_1|
|2016-06-01 00:00:00| E182N 2,015|              2|               78|      0|          252|                 0|     1|         255|          1|       

##### Clean up 4: Convert data to parquet format 

For faster execution of operations

In [None]:
df_cleanup4.write.save('data/trafficData_E4N.parquet', format='parquet')

Select only normal data from the dataset, that is, rows where the Status field equals 3.

In [15]:
%%time
df_trafficData_E4N = spark.read.parquet('data/trafficData_E4N.parquet').select('Timestamp', 'Ds_Reference', 'Ds_Ref', 'Detector_Number', 'Flow_In', 'Average_Speed').where('Status == 3 AND Ds_Reference.Road == "E4N"')

CPU times: user 4 ms, sys: 4 ms, total: 8 ms
Wall time: 951 ms


In [16]:
df_trafficData_E4N.createOrReplaceTempView("NormalTrafficFlow")

In [11]:
df_trafficData_E4N.show(10)

+-------------------+------------+-----------+---------------+-------+-------------+
|          Timestamp|Ds_Reference|     Ds_Ref|Detector_Number|Flow_In|Average_Speed|
+-------------------+------------+-----------+---------------+-------+-------------+
|2016-06-09 01:45:00|[E4N, 47465]|E4N_47465_2|              2|      4|          101|
|2016-06-09 01:45:00|[E4N, 47465]|E4N_47465_3|              3|      7|           84|
|2016-06-09 01:45:00|[E4N, 47800]|E4N_47800_2|              2|      4|           98|
|2016-06-09 01:45:00|[E4N, 47800]|E4N_47800_3|              3|      7|           91|
|2016-06-09 01:45:00|[E4N, 48290]|E4N_48290_2|              2|      2|          104|
|2016-06-09 01:45:00|[E4N, 48290]|E4N_48290_3|              3|      4|           97|
|2016-06-09 01:45:00|[E4N, 48620]|E4N_48620_2|              2|      2|          119|
|2016-06-09 01:45:00|[E4N, 48620]|E4N_48620_3|              3|      3|          105|
|2016-06-09 01:45:00|[E4N, 48935]|E4N_48935_2|              2|   

## 3. Perform Feature Selection

According to general knowledge of the fundamental characteristics of traffic flow, three features are important:
1. Flow - the number of moving vehicles.
2. Speed - higher vehicle speed indicates good traffic flow.
3. Density - the concentration of traffic. Density is high when approaching a traffic jam and low in free flow, when there is no congestion.

The three interrelated characteristics can be used to measure the traffic on a roadway. In this project, a Density feature is added and its difference is calculated to analyse the time-series data: the sequence of density differences from previous time steps is used together with the current difference to predict whether there is congestion or not.
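The density calculation used in the next cell can be checked by hand. The sketch below assumes that Flow_In counts vehicles per minute and that Average_Speed is in km/h, so that multiplying the flow by 60 gives vehicles per hour and the result is in vehicles per kilometre; these units are an assumption, not something the dataset description states. The zero-speed guard is also an addition for the plain-Python version.

```python
def density(flow_per_minute, speed_kmh):
    """Density = hourly flow / speed (assumed units: vehicles per km)."""
    if speed_kmh == 0:
        return None  # undefined when no speed is recorded
    return flow_per_minute * 60 / speed_kmh

d1 = density(18, 102)
d2 = density(24, 96)
print(d1, d2)
```

The two inputs are taken from the first rows of the Time-Lag output in this section, and the results agree with the Density column there.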

##### Add Density Column

In [17]:
df2_trafficData_E4N = df_trafficData_E4N.withColumn('Density', col('Flow_In')*60/col('Average_Speed'))
w = Window.partitionBy('Ds_Reference', 'Detector_Number').orderBy('Timestamp')

##### Add Time-Lag Column
This is used to keep only rows whose time difference from the previous reading of the same sensor is 1 minute (60 seconds), since the sensors record readings every minute.
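The effect of the lag-based filter can be illustrated in plain Python for a single sensor. This is only a sketch with made-up timestamps and a hypothetical helper name; the Spark cell below does the same comparison per sensor with a window function.

```python
from datetime import datetime

def keep_consecutive(rows, step_seconds=60):
    """Keep rows that follow the previous row by exactly step_seconds.

    rows: list of (timestamp, value) for ONE sensor, sorted by timestamp.
    The first row has no predecessor, so it is dropped, as with lag().
    """
    kept = []
    for prev, cur in zip(rows, rows[1:]):
        if (cur[0] - prev[0]).total_seconds() == step_seconds:
            kept.append(cur)
    return kept

# Hypothetical readings with a gap between 03:04 and 03:07
rows = [
    (datetime(2016, 6, 1, 3, 2), 12.0),
    (datetime(2016, 6, 1, 3, 3), 10.6),
    (datetime(2016, 6, 1, 3, 4), 15.0),
    (datetime(2016, 6, 1, 3, 7), 6.0),
    (datetime(2016, 6, 1, 3, 8), 9.3),
]
kept = keep_consecutive(rows)
print([ts.strftime('%H:%M') for ts, _ in kept])
```

The 03:07 reading is dropped because it follows a three-minute gap, and the 03:02 reading is dropped because it has no predecessor.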

In [12]:
time_difference = unix_timestamp('Timestamp', format='yyyy-MM-dd HH:mm:ss.SSS') - lag(unix_timestamp('Timestamp', format='yyyy-MM-dd HH:mm:ss.SSS')).over(w)                            
df3_trafficData_E4N = df2_trafficData_E4N.withColumn('Time_Lag_Length', time_difference).filter(col('Time_Lag_Length') == 60)
df3_trafficData_E4N.show(20)

+-------------------+------------+-----------+---------------+-------+-------------+------------------+---------------+
|          Timestamp|Ds_Reference|     Ds_Ref|Detector_Number|Flow_In|Average_Speed|           Density|Time_Lag_Length|
+-------------------+------------+-----------+---------------+-------+-------------+------------------+---------------+
|2016-06-01 03:03:00|[E4N, 30710]|E4N_30710_2|              2|     18|          102|10.588235294117647|             60|
|2016-06-01 03:04:00|[E4N, 30710]|E4N_30710_2|              2|     24|           96|              15.0|             60|
|2016-06-01 03:05:00|[E4N, 30710]|E4N_30710_2|              2|     15|           97| 9.278350515463918|             60|
|2016-06-01 03:06:00|[E4N, 30710]|E4N_30710_2|              2|      8|          105| 4.571428571428571|             60|
|2016-06-01 03:07:00|[E4N, 30710]|E4N_30710_2|              2|     10|          100|               6.0|             60|
|2016-06-01 03:08:00|[E4N, 30710]|E4N_30

Select only the columns with features that will be used to train the model and save them to a CSV file, to be read later for model training.

In [14]:
#df4_trafficData_E4N = df3_trafficData_E4N.drop("Ds_Reference")
df4_trafficData_E4N = df3_trafficData_E4N.select('Timestamp', 'Density', 'Ds_Ref', 'Detector_Number', 'Flow_In', 'Average_Speed').where('Status == 3 AND Ds_Reference.Road == "E4N"')
df4_trafficData_E4N.show(2)

+-------------------+------------------+-----------+---------------+-------+-------------+
|          Timestamp|           Density|     Ds_Ref|Detector_Number|Flow_In|Average_Speed|
+-------------------+------------------+-----------+---------------+-------+-------------+
|2016-06-01 03:03:00|10.588235294117647|E4N_30710_2|              2|     18|          102|
|2016-06-01 03:04:00|              15.0|E4N_30710_2|              2|     24|           96|
+-------------------+------------------+-----------+---------------+-------+-------------+
only showing top 2 rows



In [None]:
import pandas as pd
df4_trafficData_E4N.write.save('data/df4_trafficData_E4N.parquet', format='parquet')
df = pd.read_parquet('data/df4_trafficData_E4N.parquet')

In [18]:
df.to_csv('df4_trafficData_E4N.csv')

## 4. Implement Neural Network

### Transform the data
Required to get the right feature to train the model.

In [19]:
from pandas import DataFrame
from pandas import Series
from pandas import concat
from pandas import read_csv
from datetime import datetime  # pandas.datetime is deprecated in newer pandas releases
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.regularizers import L1L2
from math import sqrt
import matplotlib
import numpy as np
from keras import regularizers

Using TensorFlow backend.


##### (a). Load the data

In [27]:
def time_stamp_parser(time_stamp):
    return datetime.strptime(time_stamp, '%Y-%m-%d %H:%M:%S')

traffic_series = read_csv('data/df4_trafficData_E4N.csv', header=0, parse_dates=[1],
                  squeeze=True, decimal=',', date_parser=time_stamp_parser)

traffic_series.head()
traffic_series.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15156387 entries, 0 to 15156386
Data columns (total 7 columns):
Unnamed: 0         int64
Timestamp          datetime64[ns]
Density            object
Ds_Ref             object
Detector_Number    int64
Flow_In            int64
Average_Speed      int64
dtypes: datetime64[ns](1), int64(4), object(2)
memory usage: 809.4+ MB


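Convert the Density column from object type to float type, to allow mathematical operations such as computing the density difference. The column was most likely read as object because read_csv was called with decimal=',' while the file uses '.' as the decimal separator.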

In [36]:
traffic_series1 = traffic_series.Density.astype(float)
traffic_series1.head()

0    10.588235
1    15.000000
2     9.278351
3     4.571429
4     6.000000
Name: Density, dtype: float64

##### (b). Frame the time series as a supervised learning problem
Frame the data so that the value from the previous time step is used as the input for the current time step.

In [20]:
def timeseries_to_supervised(data, lag=1):
    df = DataFrame(data)
    columns = [df.shift(i) for i in range(1, lag+1)]
    columns.append(df)
    df = concat(columns, axis=1)
    return df
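To make the framing concrete, here is a plain-Python equivalent of the shift-based function above, run on three made-up values. The helper name `to_supervised` is hypothetical, and `None` stands in for the `NaN` that pandas produces where no earlier value exists.

```python
def to_supervised(values, lag=1):
    """Each row holds the previous `lag` values followed by the current one."""
    rows = []
    for t in range(len(values)):
        lagged = [values[t - i] if t - i >= 0 else None
                  for i in range(1, lag + 1)]
        rows.append(lagged + [values[t]])
    return rows

framed = to_supervised([1.0, 2.0, 3.0], lag=1)
for row in framed:
    print(row)
```

The last column is the target and the columns before it are the inputs, which is how `fit_lstm` later splits the array into `X` and `y`.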

##### (c). Calculate density difference
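Using the density feature, calculate the density difference to create the sequence. Differencing removes trend from the series, which brings it closer to stationary. The function defaults to an interval of 2; the experiment below calls it with an interval of 1.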

In [29]:
def getDifference(dataset, interval=2):
    density_diff = list() 
    for i in range(interval, len(dataset)):
        density_value = dataset[i] - dataset[i - interval]
        density_diff.append(density_value)
    return Series(density_diff)
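A small worked example of differencing and of reversing it, using made-up density values. The list-based helpers below are illustrative stand-ins for `getDifference` and for the reverse function defined later in this notebook.

```python
def difference(values, interval=1):
    """Difference between each value and the one `interval` steps earlier."""
    return [values[i] - values[i - interval]
            for i in range(interval, len(values))]

def invert_difference(history, diff_value, interval=1):
    """Add the differenced value back onto the matching earlier observation."""
    return diff_value + history[-interval]

densities = [10.0, 15.0, 9.0, 4.5, 6.0]
diffs = difference(densities, interval=1)
print(diffs)

# Recover the last density from the values before it and the last difference
restored = invert_difference(densities[:-1], diffs[-1])
print(restored)
```

Differencing shortens the series by `interval` values, which is why the later steps have to keep track of alignment between the raw and differenced data.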

### Normalize the Data
Normalisation re-scales the data into a fixed range defined by a minimum and a maximum value.
The rescaling technique used here is MinMaxScaler from the sklearn library, with a range of -1 to 1. The scaler is fitted on the training data only and then applied to both the training and test data.

In [23]:
# scale data to range from -1 to 1
def scaleData(train, test):
    # fit scaler
    min_max_scaler = MinMaxScaler(feature_range=(-1, 1))
    min_max_scaler = min_max_scaler.fit(train)
    # transform train
    # train[train_indices]
    train = train.values.reshape(train.shape[0],train.shape[1])
    
    #train = train.reshape(train.shape[0], train.shape[1])
    train_min_max_scaled_data = min_max_scaler.transform(train)
    # transform test
    test = test.values.reshape(test.shape[0],test.shape[1])
    # test = test.reshape(test.shape[0], test.shape[1])
    test_min_max_scaled_data = min_max_scaler.transform(test)
    return min_max_scaler, train_min_max_scaled_data, test_min_max_scaled_data
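The arithmetic behind min-max scaling to the range -1 to 1 can be written out in plain Python. This is a sketch of the standard min-max formula on a single feature with made-up numbers, not a replacement for `MinMaxScaler`; the helper names are hypothetical.

```python
def fit_min_max(train, feature_range=(-1.0, 1.0)):
    """Learn the scaling parameters from the training values only."""
    lo, hi = feature_range
    data_min, data_max = min(train), max(train)
    scale = (hi - lo) / (data_max - data_min)
    return {"min": data_min, "scale": scale, "lo": lo}

def transform(params, values):
    return [(v - params["min"]) * params["scale"] + params["lo"]
            for v in values]

def inverse_transform(params, scaled):
    return [(s - params["lo"]) / params["scale"] + params["min"]
            for s in scaled]

train = [-4.0, 0.0, 4.0]
test = [2.0, 6.0]
params = fit_min_max(train)
train_scaled = transform(params, train)
test_scaled = transform(params, test)
print(train_scaled, test_scaled)
print(inverse_transform(params, test_scaled))
```

Because the parameters come from the training data alone, a test value larger than anything seen in training maps outside the target range (here 6.0 becomes 1.5), which is expected behaviour rather than an error.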

### Create LSTM Network
Fit an LSTM network to the training data. The network takes input in 3D format (samples, time steps, and number of features); the reshape method puts the data into this shape.

Layer 1: 8 neurons. The batch dimension equals the batch size, X.shape[1] is the number of time steps (currently 1; intended to be based on the number of detectors), and X.shape[2] is the number of density features.

Layer 2: 4 neurons. The layer uses hard-sigmoid as its activation function as a way of mitigating the vanishing/exploding gradient problem.

An elastic-net (L1L2) regulariser is configured to prevent the model from over-fitting the training data; the layer definition below currently applies L2 kernel and activity regularisers.

The network uses the Adam optimisation algorithm and mean squared error to measure error scores.
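The size of the network described above can be verified with the usual parameter-count formulas for LSTM and dense layers. The sketch assumes one input feature per time step (a time lag of 1); the helper names are hypothetical.

```python
def lstm_params(units, input_dim):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    """One weight per input plus one bias, per output unit."""
    return units * input_dim + units

layer1 = lstm_params(units=8, input_dim=1)   # first LSTM layer
layer2 = lstm_params(units=4, input_dim=8)   # second LSTM layer, fed by layer 1
output = dense_params(units=1, input_dim=4)  # final Dense(1)
total = layer1 + layer2 + output
print(layer1, layer2, output, total)
```

These counts agree with the model summary printed when the experiment is run later in this notebook.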

##### Challenge: 
The network is experiencing the vanishing and exploding gradient problem. Several techniques have been tested to solve it:
1. Use of the hard-sigmoid activation function, a piecewise-linear approximation of the sigmoid.
2. Gradient clipping - adam = Adam(lr=0.01, clipvalue=0.5) clips each gradient to between a minimum of -0.5 and a maximum of 0.5.
3. Weight initialisation using Xavier (Glorot) initialisation, i.e. the 'glorot_normal' initialiser in Keras. A literature review suggested that batch normalisation is not well suited to LSTM networks.
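Clipping by value, as in technique 2 above, limits each gradient component independently. The following plain-Python sketch shows the operation on made-up gradient values; it illustrates the idea only and is not how Keras applies it internally.

```python
def clip_by_value(gradients, clip_value=0.5):
    """Limit every gradient component to the range [-clip_value, clip_value]."""
    return [max(-clip_value, min(clip_value, g)) for g in gradients]

grads = [-2.0, -0.3, 0.0, 0.4, 3.5]
clipped = clip_by_value(grads, clip_value=0.5)
print(clipped)
```

Components already inside the range are unchanged, so clipping only intervenes when a gradient explodes; it does nothing for gradients that vanish.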

Due to the vanishing gradient problem, the error scores are to be plotted in a graph to see the performance of the model and the distribution of error where an anomaly is expected to 


In [31]:
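from keras.optimizers import Adam

def fit_lstm(train, batch_size, epochs, neurons,elasticnet_regularizer):
    # NOTE: neurons and elasticnet_regularizer are accepted but not used yet;
    # the layer sizes and regularisers are hard-coded below.
    # The last column is the target, the remaining columns are the lagged inputs.
    X, y = train[:, 0:-1], train[:, -1]
    # reshape to (samples, time steps, features)
    X = X.reshape(X.shape[0], 1, X.shape[1])
    model = Sequential()
    # model.add(LSTM(neurons, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), stateful=True,recurrent_regularizer=elasticnet_regularizer))
    # TODO X.shape[1]
    adam = Adam(lr=0.01, clipvalue=0.5)
    # Keras 2 argument names: kernel_initializer (was init) and recurrent_activation (was inner_activation)
    model.add(LSTM(8, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), kernel_initializer='glorot_normal', return_sequences=True,
                   kernel_regularizer=regularizers.l2(0.01), activity_regularizer=regularizers.l2(0.01)))
    model.add(LSTM(4, activation = 'hard_sigmoid', recurrent_activation = 'hard_sigmoid')) #return_sequences = True
    model.add(Dense(1))
    # pass the configured optimizer object so that lr and clipvalue take effect
    model.compile(loss='mean_squared_error', optimizer=adam)
    print(model.summary())
    # train one epoch at a time
    for i in range(epochs):
        model.fit(X, y, epochs=1, batch_size=batch_size, verbose=1, shuffle=False)
        model.reset_states()
    return model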

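##### Reverse the Transformations
Reverse the differencing and scaling so that predictions are back in the original density scale. This is needed for plotting and for computing the error in original units.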

In [32]:
# Reverse values
def reverse_density_difference(history, yhat, interval=1):
    return yhat + history[-interval]

# Reverse scaling for predicted values
def reverse_scale(scaler, X, yhat):
    create_row = [x for x in X] + [yhat]
    array = np.array(create_row)  # numpy is imported as np
    array = array.reshape(1, len(array))
    inverted = scaler.inverse_transform(array)
    return inverted[0, -1]

### Network Training Loop

In [34]:

from sklearn.model_selection import train_test_split
# run recurrent experiment
def experiment(traffic_series, time_lag, recurrence, epochs, batch_size, neurons,elasticnet_regularizer):
    # Convert time series to stationary data
    
    raw_density_values = traffic_series.values
    differenced_density_values = getDifference(raw_density_values, 1)
    
    # Convert time series to supervised learning, prediction at previous timestep to be used current time step
    supervised_data = timeseries_to_supervised(differenced_density_values, time_lag)
   
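    # split data into train and test-sets, keeping chronological order:
    # shuffling would break the alignment used when reversing the differencing
    train, test = train_test_split(supervised_data, train_size=0.8, shuffle=False)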
    
    # transform the scale of the data
    scaler, train_scaled, test_scaled = scaleData(train, test)
    
    # run experiment
    error_scores = list()
    for r in range(recurrence):
        # fit the model
        train_trimmed = train_scaled[2:, :]
        model = fit_lstm(train_trimmed, batch_size, epochs, neurons,elasticnet_regularizer)
        
        # make prediction on test dataset
        test_reshaped = test_scaled[:,0:-1]
        test_reshaped = test_reshaped.reshape(len(test_reshaped), 1, 1)
        output = model.predict(test_reshaped, batch_size=batch_size)
        predictions = list()
        # Reverse to original scale before calculating prediction error
        for i in range(len(output)):
            yhat = output[i,0]
            X_input = test_scaled[i, 0:-1]
            # Reverse scaling
            yhat = reverse_scale(scaler, X_input, yhat)
            # Reverse differencing density
            yhat = reverse_density_difference(raw_density_values, yhat, len(test_scaled)+1-i)
            # Save predictions
            predictions.append(yhat)
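        # report performance: compare against the last len(predictions) raw values,
        # which are the targets of the test rows when the split is chronological
        rmse = sqrt(mean_squared_error(raw_density_values[-len(predictions):], predictions))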
        print('%d) Test RMSE: %.3f' % (r+1, rmse))
        error_scores.append(rmse)
    return error_scores
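The error score reported by the training loop is a root mean squared error between the observed and predicted densities. A minimal plain-Python sketch with made-up numbers follows; it assumes the test rows are the most recent part of the series, so the predictions are compared with the tail of the raw values.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error of two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))

# Hypothetical raw densities; the last three form the test period
raw = [8.0, 9.0, 10.0, 12.0, 14.0]
predictions = [11.0, 12.0, 12.0]

score = rmse(raw[-len(predictions):], predictions)
print(round(score, 3))
```

Slicing the raw series to the length of the predictions keeps the two sequences the same length, which an RMSE calculation requires.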


### Configure the network and run experiments

In [None]:
import pandas as pd
from pandas import read_csv
#from pandas import datetime
from datetime import datetime
# configure the experiment
def run():
    # load dataset
    #traffic_series = traffic_series1
    #units=128
    # configure the experiment
    time_lag = 1
    runExperiments = 30  # TODO: Run Experiment according to number of sensors  
    epochs = 2 #1000
    batch_size = 6
    neurons = 50
    elasticnet_regularizer = L1L2(l1=0.01, l2=0.01)
    # run the experiment
    results = DataFrame()
    results['results'] = experiment(traffic_series1, time_lag, runExperiments, epochs, batch_size, neurons,elasticnet_regularizer)
    # summarize results
    print(results.describe())
    
run() 



_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (6, 1, 8)                 320       
_________________________________________________________________
lstm_2 (LSTM)                (6, 4)                    208       
_________________________________________________________________
dense_1 (Dense)              (6, 1)                    5         
Total params: 533
Trainable params: 533
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/1
  118050/12125106 [..............................] - ETA: 3:26:37 - loss: 1.5749e-04

The above experiment was run with different configurations.

In [None]:
##



_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (6, 1, 8)                 320       
_________________________________________________________________
lstm_2 (LSTM)                (6, 4)                    208       
_________________________________________________________________
dense_1 (Dense)              (6, 1)                    5         
Total params: 533
Trainable params: 533
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/1

## Appendix

##### Output of density difference



                  0          0
13916465  -2.078049  -0.506567
9753022    3.170732  -8.000000
5887555    4.198758  -3.630847
10175463  -5.316456   2.488341
7027018    2.938053  -1.721519
5193790   -2.666667   4.802260
12840706  -3.166667   7.500000
6885462   -1.522118  -0.083857
5038497   -0.097371  -4.518014
8144673    1.212121  -0.599251
12712583  -4.567757   5.339034
7397697   -0.666667   1.926740
11175574   0.865385   0.157343
7516057   -5.597561  -4.147059
9317225    6.610797  -4.542569
15101068  -4.550000   2.234848
6278627   -0.025497   0.030888
13660861 -15.353535  22.091503
11600672   0.000000   0.825200
4385440    5.521669  -6.103896
2404482   -2.684932   3.052632
5933896   -6.095662   5.806452
2462454   -2.103387   5.426945
15091987   9.647059 -20.804954
2749573    0.260445  -1.344450
121336    -0.385488   0.539503
11087262  -6.251217  -5.226516
5982344   -0.453782  -1.100213
1082379    0.090909   6.309091
3432260    0.838235   5.969178
...             ...        ...
4619816 

AttributeError: 'DataFrame' object has no attribute 'reshape'