# SPBD Assignment 1

This notebook contains the code developed to implement the proposed solutions to this course assignment.

Developed by:
    * Lucas Fischer, nº54659
    * Joana Martins, nº54707

## HDFS directories setup

The first step is to create directories in the HDFS cluster.
1. Create a directory for the group
2. Create a directory for the results
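
Re-running the setup cell reports `File exists` once the directories are already there. A minimal sketch, assuming the same base path used in this notebook, builds the two commands with the `-p` flag, so that `mkdir` creates missing parents and does not complain about existing directories. The helper name `hdfs_setup_commands` is hypothetical and only illustrates the idea.

```python
import shlex

BASE_DIR = "/user/jovyan/SPBD-1819"  # base path used throughout this notebook


def hdfs_setup_commands(group, base_dir=BASE_DIR):
    """Return the shell commands that create and list the results directory."""
    results_dir = "{}/{}/results".format(base_dir, group)
    return [
        # -p creates the group directory too and succeeds if it already exists
        "hdfs dfs -mkdir -p {}".format(shlex.quote(results_dir)),
        "hdfs dfs -ls {}".format(shlex.quote(results_dir)),
    ]


for command in hdfs_setup_commands("Lucas_Joana"):
    print(command)
```

Each printed command can be pasted into a notebook cell after a `!` prefix.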

In [10]:
!hdfs dfs -mkdir /user/jovyan/SPBD-1819/Lucas_Joana
!hdfs dfs -mkdir /user/jovyan/SPBD-1819/Lucas_Joana/results
!hdfs dfs -ls /user/jovyan/SPBD-1819/Lucas_Joana/results

mkdir: `/user/jovyan/SPBD-1819/Lucas_Joana': File exists
mkdir: `/user/jovyan/SPBD-1819/Lucas_Joana/results': File exists
Found 3 items
drwxr-xr-x   - jovyan supergroup          0 2018-11-14 19:31 /user/jovyan/SPBD-1819/Lucas_Joana/results/18-11-14-19-30-54
drwxr-xr-x   - jovyan supergroup          0 2018-11-14 19:35 /user/jovyan/SPBD-1819/Lucas_Joana/results/18-11-14-19-35-10
drwxr-xr-x   - jovyan supergroup          0 2018-11-14 19:53 /user/jovyan/SPBD-1819/Lucas_Joana/results/18-11-14-19-52-11
Found 6 items
drwxr-xr-x   - jovyan supergroup          0 2018-11-14 18:51 /user/jovyan/SPBD-1819/44987
drwxr-xr-x   - jovyan supergroup          0 2018-11-14 18:34 /user/jovyan/SPBD-1819/Lucas_Joana
drwxr-xr-x   - jovyan supergroup          0 2018-11-14 14:36 /user/jovyan/SPBD-1819/example
drwxr-xr-x   - jovyan supergroup          0 2018-11-14 15:03 /user/jovyan/SPBD-1819/results
-rw-r--r--   1 jovyan supergroup      12322 2018-11-14 20:03 /user/jovyan/SPBD-1819/taxi_zone_lookup.csv
-rw-r--r-

In [15]:
!rm -rf spark_rdd_results && mkdir spark_rdd_results

# Spark RDD solution
The first solution uses Spark RDDs (Spark's core data abstraction).
It builds an inverted index whose key is the tuple (weekday, hour, pick-up zone ID, drop-off zone ID) and whose value is a tuple with the average trip duration and the average trip amount.
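
To make the key/value structure concrete, here is a minimal plain-Python sketch of the same aggregation. It is not the Spark code itself: the dictionary loop stands in for `reduceByKey`, the sample rows are made up, and the helper names (`to_key_value`, `merge`, `build_index`, `make_row`) are hypothetical. Unlike the implementation below, it keeps a running (duration sum, amount sum, count) per key instead of lists, which is one possible way to keep the per-key state small.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def to_key_value(line):
    """Map a CSV line to ((weekday, hour, pu_id, do_id), (duration, amount, 1))."""
    cols = line.split(",")
    pick_up = datetime.strptime(cols[1], DATE_FORMAT)
    drop_off = datetime.strptime(cols[2], DATE_FORMAT)
    key = (WEEKDAYS[pick_up.weekday()], cols[1][11:13], cols[7], cols[8])
    duration = int((drop_off - pick_up).total_seconds() / 60)  # whole minutes
    return key, (duration, float(cols[16]), 1)


def merge(a, b):
    """Combine two partial (duration_sum, amount_sum, count) triples."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def build_index(lines):
    """Plain-Python stand-in for map + reduceByKey + mapValues."""
    partial = {}
    for line in lines:
        key, value = to_key_value(line)
        partial[key] = merge(partial[key], value) if key in partial else value
    return {key: (d / n, a / n) for key, (d, a, n) in partial.items()}


def make_row(pick_up, drop_off, pu_id, do_id, total):
    """Build a made-up 17-column CSV line; unused columns are filled with '0'."""
    cols = ["0"] * 17
    cols[1], cols[2], cols[7], cols[8], cols[16] = pick_up, drop_off, pu_id, do_id, total
    return ",".join(cols)


sample_lines = [
    make_row("2018-01-01 02:10:00", "2018-01-01 02:18:00", "246", "239", "12.0"),
    make_row("2018-01-01 02:30:00", "2018-01-01 02:39:30", "246", "239", "13.0"),
]
index = build_index(sample_lines)
print(index)
```

In Spark, a function like `merge` would be the one passed to `reduceByKey`, and the final division would go into `mapValues`.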


In [16]:
import pyspark
from pyspark.sql import SparkSession
import traceback
import datetime
from datetime import datetime as dt
import calendar
import time
import numpy as np

spark = SparkSession.builder.master('local[*]').appName('taxify').getOrCreate()
sc = spark.sparkContext

#PU/DO zone ids range from 1 to 265, see taxi_zone_lookup.csv
#Columns used: pick-up datetime (position 1, split into weekday and hour), drop-off datetime (position 2), PU_ID (position 7), DO_ID (position 8), total_amount (position 16)

#Main implementation

def create_key_value(line):
    """
        Function that creates the key value structure for every line of interest

        Params:
            line - A raw (unsplit) data line of the CSV file

        Returns:
            ((weekday, hour, PU_ID, DO_ID), ([duration], [total_amount]))
    """
    splitted = line.split(",")
    pick_up_datetime = splitted[1]

    week_day = (calendar.day_name[dt.strptime(pick_up_datetime, '%Y-%m-%d %H:%M:%S').weekday()]).lower()
    hour = pick_up_datetime[11:13] #named hour so it does not shadow the time module

    pick_up_id = splitted[7]
    dropoff_up_id = splitted[8]

    key = (week_day, hour, pick_up_id, dropoff_up_id)

    duration = get_duration(pick_up_datetime,splitted[2])
    total_amount = float(splitted[16])
    
    value = ([duration], [total_amount])

    return (key, value)



def get_duration(pick_up_datetime, drop_off_datetime):
    """
        Get duration of trip in minutes from pick up and drop off times
    """

    d1 = time.mktime(dt.strptime(drop_off_datetime, '%Y-%m-%d %H:%M:%S').timetuple())
    d2 = time.mktime(dt.strptime(pick_up_datetime, '%Y-%m-%d %H:%M:%S').timetuple())
    return int((d1 - d2) / 60)


def create_inverted_index(filename = 'yellow_tripdata_2018-01_sample.csv'):
    """
        Function that creates the inverted index. This function holds the main Spark code that builds the inverted index

        Params:
            filename - Name of the file to read the information from
    """

    try :
        time_before = dt.now()
        lines = sc.textFile(filename) #read csv file (change this to the full dataset instead of just the sample)
        first_line = lines.first()

        #Filtering out the first line, empty lines
        non_empty_lines = lines.filter(lambda line: len(line) > 0 and line != first_line)
        
        # ((weekday, hour, PU_ID, DO_ID), ([duration], [total_amount]))
        organized_lines = non_empty_lines.map(lambda line: create_key_value(line))
        
        #Reduce everything by key, concatenating the lists held in each value
        #((weekday, hour, PU_ID, DO_ID), (list of durations, list of amounts))
        grouped = organized_lines.reduceByKey(lambda accum, elem: (accum[0] + elem[0], accum[1] + elem[1]))

        #Obtain the average duration and the average amount for every key
        grouped_with_averages = grouped.mapValues(lambda tup: (np.mean(tup[0]), np.mean(tup[1])))
        
        grouped_with_averages.saveAsTextFile("spark_rdd_results/inverted_index.txt")

        time_after = dt.now()
        seconds = (time_after - time_before).total_seconds()
        print("Execution time {} seconds".format(seconds))

        sc.stop()
    except:
        traceback.print_exc()
        sc.stop()


create_inverted_index()

Execution time 35.862637 seconds


# Spark RDD client

This is an example application showing how the previously created inverted index can be queried.
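
As an illustration of the lookup, the following minimal sketch parses index lines with `ast.literal_eval` rather than by stripping characters. It assumes each saved line is a plain Python tuple literal of the form `((weekday, hour, pu_id, do_id), (avg_duration, avg_amount))`; if the averages are written in another representation, the parser needs adapting. The names `parse_index_line` and `lookup` and the sample lines are hypothetical.

```python
import ast

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def parse_index_line(line):
    """Turn one text line of the index into (key tuple, averages tuple)."""
    key, averages = ast.literal_eval(line.strip())
    return key, averages


def lookup(lines, weekday_number, hour, pickup_id, dropoff_id):
    """Return (avg_duration, avg_amount) for the requested key, or None."""
    wanted = (WEEKDAYS[weekday_number - 1], "{:02d}".format(hour),
              str(pickup_id), str(dropoff_id))
    for line in lines:
        key, averages = parse_index_line(line)
        if key == wanted:
            return averages
    return None


# Made-up index lines in the assumed format
sample_index = [
    "(('monday', '02', '246', '239'), (8.375, 12.9275))",
    "(('tuesday', '02', '246', '239'), (9.0, 13.3))",
]
print(lookup(sample_index, 1, 2, 246, 239))
```

Comparing whole key tuples avoids relying on the position of each field after character stripping.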

In [11]:
import pyspark
from pyspark.sql import SparkSession
import traceback
import calendar

spark = SparkSession.builder.master('local[*]').appName('taxify').getOrCreate()
sc = spark.sparkContext

def get_user_options():
    """
        Function that gets all the user's input for searching the inverted index.
        This function gets the desired weekday, hour, pick-up and drop-off zone
    """

    pickup_correct = False
    dropoff_correct = False
    weekday_correct = False
    time_correct = False
    pickup_id = ""
    dropoff_id = ""
    weekday = ""
    hour = ""                                            

    #Continue asking the user until he/she gives us a weekday
    while(not weekday_correct):
        weekday = input("\nPlease insert you weekday (1- Monday, 2- Tuesday, ..., 7- Sunday): ")
        try:
            if(int(weekday) >= 1 and int(weekday) <= 7):
                weekday_correct = True
            else:
                print("\nPlease insert a number between 1 - 7\n")
        except ValueError:
            #User didn't send us a number
            print("\nPlease insert a number between 1 - 7\n")

    #Continue asking the user until he/she gives us an hour
    while(not time_correct):
        time_input = input("\nPlease insert the desired hour: ")
        try:
            if(int(time_input) >= 0 and int(time_input) <=23):
                time_correct = True
                hour = time_input
            else:
                print("\nHour should be an integer between 0 and 23\n")
        except ValueError:
            #User didn't send us a number
            print("\nHour should be an integer between 0 and 23\n")



    #Continue asking the user until he/she gives us a number between 1 and 265
    while(not pickup_correct):
        pickup_id = input("\nPlease insert you Pick-Up location ID (1 - 265): ")
        try:
            if(int(pickup_id) >= 1 and int(pickup_id) <= 265):
                pickup_correct = True
            else:
                print("\nPlease insert a number between 1 - 265\n")
        except ValueError:
            #User didn't send us a number
            print("\nPlease insert a number between 1 - 265\n")

        

    #Continue asking the user until he/she gives us a number between 1 and 265
    while(not dropoff_correct):
        dropoff_id = input("\nPlease insert you Drop-Off location ID (1 - 265): ")
        try:
            if(int(dropoff_id) >= 1 and int(dropoff_id) <= 265):
                dropoff_correct = True
            else:
                print("\nPlease insert a number between 1 - 265\n")
        except ValueError:
            #User didn't send us a number
            print("\nPlease insert a number between 1 - 265\n")


    return(weekday, pickup_id, dropoff_id, hour)



def transform_line(line):
    """
        Function that transforms every String line in the inverted index into a searchable array
    """
    stripped_line = line.replace("(", "").replace(")", "").replace(" ", "").replace("\'", "")
    return stripped_line.split(",")


    
def search_index(user_weekday = 1, user_puid = 41, user_doid = 24, user_hour = 0, filename = "spark_rdd_results/inverted_index.txt"):
    try:
        lines = sc.textFile(filename) #read the inverted index created previously
        
        #Transform each line into an array
        transform_lines = lines.map(lambda line: transform_line(line)) 
        
        #Filter out the lines that don't match the user's desired pick up zone and drop off zone
        lines_with_puid_doid = transform_lines.filter(lambda arr: arr[2] == str(user_puid) and arr[3] == str(user_doid))
        
        #Filter out the lines that don't match the user's desired weekday
        lines_with_weekday = lines_with_puid_doid.filter(lambda arr: arr[0] == (calendar.day_name[user_weekday - 1]).lower())
        
        #Filter out the lines that don't match user's desired hour
        lines_with_hour = lines_with_weekday.filter(lambda arr: int(arr[1]) == user_hour)
        
        for result in lines_with_hour.collect():
            print(result)

    except:
        traceback.print_exc()
        sc.stop()

user_weekday, user_puid, user_doid, user_hour = get_user_options()

search_index(int(user_weekday), user_puid, user_doid, int(user_hour))


Please insert you weekday (1- Monday, 2- Tuesday, ..., 7- Sunday): 1

Please insert the desired hour: 2

Please insert you Pick-Up location ID (1 - 265): 246

Please insert you Drop-Off location ID (1 - 265): 239
['monday', '02', '246', '239', '8.375', '12.9275']


# Spark SQL solution

The second solution uses Spark's SQL abstraction, so that the inverted index can be built with a more familiar DSL.
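
To show what the grouping query computes without needing a Spark session, here is a small stand-in that uses `sqlite3` from the Python standard library. It is not Spark SQL: SQLite's `strftime` replaces the user-defined functions, the weekday comes out as a digit (`'1'` for Monday) rather than a name, and the three rows are made up. Only the table name `fields_table` and the column names follow the notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fields_table ("
    "pickup_datetime TEXT, dropoff_datetime TEXT, "
    "pickup_id TEXT, dropoff_id TEXT, amount REAL)"
)

# Made-up trips: two on the same key, one on a different key
rows = [
    ("2018-01-01 02:10:00", "2018-01-01 02:18:00", "246", "239", 12.0),
    ("2018-01-01 02:30:00", "2018-01-01 02:39:00", "246", "239", 13.0),
    ("2018-01-02 09:00:00", "2018-01-02 09:20:00", "41", "24", 20.0),
]
conn.executemany("INSERT INTO fields_table VALUES (?, ?, ?, ?, ?)", rows)

QUERY = """
SELECT
    strftime('%w', pickup_datetime) AS weekday,
    strftime('%H', pickup_datetime) AS hour,
    pickup_id,
    dropoff_id,
    AVG((strftime('%s', dropoff_datetime) - strftime('%s', pickup_datetime)) / 60)
        AS average_duration,
    AVG(amount) AS average_amount
FROM fields_table
GROUP BY weekday, hour, pickup_id, dropoff_id
ORDER BY weekday, hour
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
conn.close()
```

Each output row has the same six columns as the Spark query: weekday, hour, pick-up ID, drop-off ID, average duration in minutes and average amount.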

In [60]:
from pyspark.sql import *
from pyspark.sql.types import *
from pyspark.sql.functions import *
from pyspark import SparkContext
import traceback
import datetime
from datetime import datetime as dt
import calendar
import time

spark = SparkSession.builder.master('local[*]').appName('taxify').getOrCreate()
sc = spark.sparkContext

#PU/DO zone ids range from 1 to 265, see taxi_zone_lookup.csv
#Columns used: pick-up datetime (position 1, split into weekday and hour), drop-off datetime (position 2), PU_ID (position 7), DO_ID (position 8), total_amount (position 16)

#Main implementation

def get_duration(pick_up_datetime, drop_off_datetime):
    """
        Get duration of trip in minutes from pick up and drop off times
    """

    d1 = time.mktime(dt.strptime(drop_off_datetime, '%Y-%m-%d %H:%M:%S').timetuple())
    d2 = time.mktime(dt.strptime(pick_up_datetime, '%Y-%m-%d %H:%M:%S').timetuple())
    return int((d1 - d2) / 60)



def convert_to_weekday(date):
    """
        Function that converts a date to weekday
    """
    date_obj = dt.strptime(date, '%Y-%m-%d %H:%M:%S')
    return (calendar.day_name[date_obj.weekday()]).lower()



def convert_to_hour(date):
    """
        Function that gets the hour from a date
    """
    return date[11:13]



def create_inverted_index(user_weekday = 1, user_puid = 41, user_doid = 24, user_hour = 0, user_minutes = 21, filename = 'yellow_tripdata_2018-01_sample.csv'):
    try :
        time_before = dt.now()

        lines = sc.textFile(filename) #read csv file (change this to the full dataset instead of just the sample) (this is local to my machine)
        first_line = lines.first()

        #USER DEFINED FUNCTION CREATION

        # convert_to_weekday_udf = udf(lambda pickup_date: convert_to_weekday(pickup_date), StringType())
        spark.udf.register("convert_to_weekday_udf", lambda pickup_date: convert_to_weekday(pickup_date), StringType())
        spark.udf.register("convert_to_hour_udf", lambda pickup_date: pickup_date[11:13], StringType())
        spark.udf.register("convert_to_duration", lambda pickup_date, dropoff_date: get_duration(pickup_date, dropoff_date), IntegerType())

        #Filtering out the first line, empty lines
        non_empty_lines = lines.filter(lambda line: len(line) > 0 and line != first_line)

        # Create a Row object with pickup_datetime, dropoff_datetime, pickup_id, dropoff_id and amount
        def to_row(line):
            cols = line.split(',') #split each line only once
            return Row(pickup_datetime = cols[1], dropoff_datetime = cols[2], pickup_id = cols[7], dropoff_id = cols[8], amount = float(cols[16]))

        fields = non_empty_lines.map(to_row)
        
        # Transform fields to dataframe
        fields_df = spark.createDataFrame(fields)

        #Create a temporary table called fields_table
        fields_df.createOrReplaceTempView("fields_table")

        inverted_index = spark.sql(
            """
            SELECT 
                convert_to_weekday_udf(pickup_datetime) AS weekday,
                convert_to_hour_udf(pickup_datetime) AS hour,
                pickup_id,
                dropoff_id,
                AVG(convert_to_duration(pickup_datetime, dropoff_datetime)) AS average_duration,
                AVG(amount) AS average_amount 
            FROM 
                fields_table 
            GROUP BY 
                weekday,
                hour,
                pickup_id,
                dropoff_id
            """
        )

        inverted_index.collect()

        time_after = dt.now()
        seconds = (time_after - time_before).total_seconds()
        print("Execution time {} seconds".format(seconds))

        sc.stop()
    except:
        traceback.print_exc()
        sc.stop()



# user_weekday, user_puid, user_doid, user_hour, user_minutes = get_user_options()

# create_inverted_index(int(user_weekday), user_puid, user_doid, int(user_hour), int(user_minutes))

create_inverted_index()

COUNT NOW
Execution time 28.313849


# Hadoop (Map-Reduce)
The third solution builds the same inverted index with Hadoop Streaming, using a Python mapper and a Python reducer.
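
Hadoop Streaming hands the reducer its input sorted by key, so all lines for one key arrive together. The sketch below relies on that to average one key at a time with `itertools.groupby`, as an alternative to collecting every key in a dictionary. It assumes mapper output lines of the form `key<TAB>duration<TAB>amount`; the function names and the sample lines are hypothetical.

```python
from itertools import groupby


def parse(line):
    """Split one mapper output line into (key, duration, amount)."""
    key, duration, amount = line.rstrip("\n").split("\t")
    return key, int(duration), float(amount)


def reduce_sorted(lines):
    """Yield (key, average_duration, average_amount) for key-sorted input."""
    records = (parse(line) for line in lines if line.strip())
    for key, group in groupby(records, key=lambda record: record[0]):
        count = 0
        duration_sum = 0
        amount_sum = 0.0
        for _, duration, amount in group:
            count += 1
            duration_sum += duration
            amount_sum += amount
        yield key, duration_sum / count, amount_sum / count


# Made-up, already sorted mapper output
sample = [
    "('monday', '02', '246', '239')\t8\t12.0\n",
    "('monday', '02', '246', '239')\t9\t13.0\n",
    "('tuesday', '09', '41', '24')\t20\t20.0\n",
]
for key, avg_duration, avg_amount in reduce_sorted(sample):
    print(key, (avg_duration, avg_amount))
```

In a real reducer script, `sys.stdin` would take the place of `sample`. Only the running sums of the current key are held in memory at any time.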


In [32]:
%%file mapper.py
#!/usr/bin/env python

import sys
import traceback
import datetime
from datetime import datetime as dt
import calendar
import time
    
def get_duration(pick_up_datetime, drop_off_datetime):
    """
        Get duration of trip in minutes from pick up and drop off times
    """

    d1 = time.mktime(dt.strptime(drop_off_datetime, '%Y-%m-%d %H:%M:%S').timetuple())
    d2 = time.mktime(dt.strptime(pick_up_datetime, '%Y-%m-%d %H:%M:%S').timetuple())
    return int((d1 - d2) / 60)


def create_key_value(splitted_line):
    """
        Function that creates the key value structure for every line of interest

        Params:
            splitted_line - The list of columns of one data line of the CSV file
    """
    pick_up_datetime = splitted_line[1]

    week_day = (calendar.day_name[dt.strptime(splitted_line[1], '%Y-%m-%d %H:%M:%S').weekday()]).lower()
    hour =  pick_up_datetime[11:13]

    pick_up_id = splitted_line[7]
    dropoff_up_id = splitted_line[8]

    key = (week_day, hour, pick_up_id, dropoff_up_id)

    duration = get_duration(pick_up_datetime, splitted_line[2])
    total_amount = float(splitted_line[16])
    
    value = (duration, total_amount)

    return (key, value)


#Simulating user options
user_weekday, user_puid, user_doid, user_hour, user_minutes = (1, 246, 239, 2, 8)

#Iterating every line from the input file
is_first_line = True

for line in sys.stdin:
    
    if(not is_first_line): #Filtering out the first line
        if(len(line.strip()) > 0): #Filtering out empty lines

            splitted_line = line.split(",")
            if(len(splitted_line) == 17):
                
                #((weekday, hour, pick-up ID, drop-off ID), (duration, total_amount))
                key,value = create_key_value(splitted_line)

                print("{}\t{}\t{}".format(key, value[0], value[1]))

    else:
        is_first_line = False

Overwriting mapper.py


In [50]:
%%file reducer.py
#!/usr/bin/env python

import sys
import numpy as np
import datetime
from datetime import datetime as dt

inverted_index = {}

for line in sys.stdin:
    key, duration, amount = line.split("\t")
    
    if key in inverted_index:
        index_entry = inverted_index[key]
        index_entry[0].append(int(duration))
        index_entry[1].append(float(amount))
    else:
        inverted_index[key] = ([int(duration)], [float(amount)])
    
    
for key, value in inverted_index.items():
    print(key, (np.mean(value[0]), np.mean(value[1])))

Overwriting reducer.py


In [51]:
!chmod a+x mapper.py && chmod a+x reducer.py

In [54]:
!rm -rf results

In [56]:
%time !hadoop jar /opt/hadoop-3.1.1/share/hadoop/tools/lib/hadoop-*streaming*.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input yellow_tripdata_2018-01_sample.csv -output results

2018-11-17 11:32:58,820 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2018-11-17 11:32:59,246 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2018-11-17 11:32:59,247 INFO impl.MetricsSystemImpl: JobTracker metrics system started
2018-11-17 11:32:59,344 WARN impl.MetricsSystemImpl: JobTracker metrics system already initialized!
2018-11-17 11:33:00,323 INFO mapred.FileInputFormat: Total input files to process : 1
2018-11-17 11:33:00,420 INFO mapreduce.JobSubmitter: number of splits:3
2018-11-17 11:33:01,271 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1528920513_0001
2018-11-17 11:33:01,278 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-11-17 11:33:02,879 INFO mapred.LocalDistributedCacheManager: Localized file:/home/jovyan/work/mapper.py as file:/tmp/hadoop-jovyan/mapred/local/1542454381925/mapper.py
2018-11-17 11:33:02,961 INFO mapred.LocalDistributedCacheManager: Localized file:/home/jovyan/wo

2018-11-17 11:34:00,890 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2018-11-17 11:34:00,890 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
2018-11-17 11:34:00,890 INFO mapred.MapTask: soft limit at 83886080
2018-11-17 11:34:00,890 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
2018-11-17 11:34:00,890 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
2018-11-17 11:34:00,890 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2018-11-17 11:34:00,892 INFO streaming.PipeMapRed: PipeMapRed exec [/home/jovyan/work/./mapper.py]
2018-11-17 11:34:00,898 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2018-11-17 11:34:00,898 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
2018-11-17 11:34:00,898 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
2018-11-17 11:34:00,900 INFO streaming.PipeMapRed: R/W/S=1000/0/0 in:NA [rec/s] out:NA [rec/s]
2018-11-17 11:34:0

2018-11-17 11:34:36,056 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=1260650496, maxSingleShuffleLimit=315162624, mergeThreshold=832029376, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2018-11-17 11:34:36,057 INFO reduce.EventFetcher: attempt_local1528920513_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2018-11-17 11:34:36,080 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1528920513_0001_m_000000_0 decomp: 15431730 len: 15431734 to MEMORY
2018-11-17 11:34:36,092 INFO reduce.InMemoryMapOutput: Read 15431730 bytes from map-output for attempt_local1528920513_0001_m_000000_0
2018-11-17 11:34:36,093 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 15431730, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->15431730
2018-11-17 11:34:36,095 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1528920513_0001_m_000002_0 decomp: 9691234 le

## Check results

In [22]:
!hdfs dfs -ls /user/jovyan/SPBD-1819/Lucas_Joana/results

Found 2 items
drwxr-xr-x   - jovyan supergroup          0 2018-11-14 19:31 /user/jovyan/SPBD-1819/Lucas_Joana/results/18-11-14-19-30-54
drwxr-xr-x   - jovyan supergroup          0 2018-11-14 19:35 /user/jovyan/SPBD-1819/Lucas_Joana/results/18-11-14-19-35-10


In [23]:
!hdfs dfs -cat /user/jovyan/SPBD-1819/Lucas_Joana/results/18-11-14-19-30-54/*


(('monday', '246', '238'), ([11, 11], [15.96, 16.56]))
(('tuesday', '246', '238'), ([9], [13.3]))


In [57]:
!cat results/part-*

('friday', '00', '100', '1') (30.0, 75.299999999999997)	
('friday', '00', '100', '100') (2.0, 4.7999999999999998)	
('friday', '00', '100', '107') (6.5, 9.5800000000000001)	
('friday', '00', '100', '112') (20.0, 23.16)	
('friday', '00', '100', '113') (9.0, 14.039999999999999)	
('friday', '00', '100', '116') (15.0, 30.940000000000001)	
('friday', '00', '100', '137') (10.0, 9.8000000000000007)	
('friday', '00', '100', '14') (37.0, 40.799999999999997)	
('friday', '00', '100', '142') (8.0, 10.56)	
('friday', '00', '100', '143') (11.0, 11.800000000000001)	
('friday', '00', '100', '145') (18.0, 23.100000000000001)	
('friday', '00', '100', '148') (13.0, 14.800000000000001)	
('friday', '00', '100', '151') (13.5, 16.329999999999998)	
('friday', '00', '100', '158') (11.0, 13.56)	
('friday', '00', '100', '159') (22.0, 32.68)	
('friday', '00', '100', '161') (5.5, 7.1749999999999998)	
('friday', '00', '100', '162') (6.0, 7.5499999999999998)	
('friday', '00', '100', '163') (6.1666666

('friday', '07', '137', '233') (3.8571428571428572, 6.7628571428571433)	
('friday', '07', '137', '234') (8.4285714285714288, 9.6571428571428584)	
('friday', '07', '137', '236') (17.0, 14.533333333333333)	
('friday', '07', '137', '237') (14.333333333333334, 13.993333333333334)	
('friday', '07', '137', '246') (12.0, 13.23)	
('friday', '07', '137', '249') (10.0, 11.16)	
('friday', '07', '137', '262') (12.5, 14.155000000000001)	
('friday', '07', '137', '263') (15.0, 16.559999999999999)	
('friday', '07', '137', '34') (19.0, 27.879999999999999)	
('friday', '07', '137', '4') (5.0, 9.8000000000000007)	
('friday', '07', '137', '45') (13.0, 17.460000000000001)	
('friday', '07', '137', '48') (22.0, 18.300000000000001)	
('friday', '07', '137', '68') (10.0, 9.9299999999999997)	
('friday', '07', '137', '74') (16.0, 16.800000000000001)	
('friday', '07', '137', '75') (16.0, 21.055)	
('friday', '07', '137', '79') (5.5, 7.8487500000000008)	
('friday', '07', '137', '87') (12.0, 17.3999999

('friday', '10', '113', '231') (10.571428571428571, 11.082857142857142)	
('friday', '10', '113', '232') (9.0, 10.56)	
('friday', '10', '113', '233') (21.0, 18.359999999999999)	
('friday', '10', '113', '234') (8.5714285714285712, 9.4900000000000002)	
('friday', '10', '113', '237') (32.0, 24.359999999999999)	
('friday', '10', '113', '239') (32.0, 29.760000000000002)	
('friday', '10', '113', '246') (14.0, 12.75)	
('friday', '10', '113', '249') (6.4615384615384617, 8.0538461538461537)	
('friday', '10', '113', '255') (19.0, 19.559999999999999)	
('friday', '10', '113', '261') (13.5, 12.23)	
('friday', '10', '113', '262') (32.0, 29.75)	
('friday', '10', '113', '263') (37.0, 25.800000000000001)	
('friday', '10', '113', '33') (22.0, 17.300000000000001)	
('friday', '10', '113', '4') (7.0, 7.5500000000000007)	
('friday', '10', '113', '40') (26.0, 20.300000000000001)	
('friday', '10', '113', '45') (14.0, 13.82)	
('friday', '10', '113', '48') (16.0, 14.300000000000001)	
('friday', 

('friday', '13', '170', '43') (18.0, 15.365555555555554)	
('friday', '13', '170', '45') (22.0, 18.359999999999999)	
('friday', '13', '170', '48') (17.5, 13.43)	
('friday', '13', '170', '50') (25.0, 18.300000000000001)	
('friday', '13', '170', '68') (16.384615384615383, 14.475384615384616)	
('friday', '13', '170', '7') (26.0, 22.300000000000001)	
('friday', '13', '170', '75') (25.5, 20.18)	
('friday', '13', '170', '79') (12.199999999999999, 12.434999999999999)	
('friday', '13', '170', '87') (16.5, 18.550000000000001)	
('friday', '13', '170', '88') (15.333333333333334, 22.166666666666668)	
('friday', '13', '170', '90') (11.9, 11.01)	
('friday', '13', '170', '97') (27.0, 32.770000000000003)	
('friday', '13', '170', '98') (49.0, 46.299999999999997)	
('friday', '13', '177', '49') (20.0, 14.800000000000001)	
('friday', '13', '179', '137') (23.5, 22.399999999999999)	
('friday', '13', '179', '141') (20.0, 16.300000000000001)	
('friday', '13', '179', '142') (27.0, 26.76000000000

('monday', '00', '161', '167') (21.0, 25.800000000000001)	
('monday', '00', '161', '168') (20.0, 22.300000000000001)	
('monday', '00', '161', '17') (42.0, 37.299999999999997)	
('monday', '00', '161', '170') (13.452380952380953, 11.440000000000001)	
('monday', '00', '161', '179') (24.0, 20.800000000000001)	
('monday', '00', '161', '186') (33.916666666666664, 21.413333333333338)	
('monday', '00', '161', '190') (31.0, 48.450000000000003)	
('monday', '00', '161', '192') (25.0, 32.799999999999997)	
('monday', '00', '161', '193') (32.5, 25.049999999999997)	
('monday', '00', '161', '202') (22.0, 19.800000000000001)	
('monday', '00', '161', '208') (25.0, 32.799999999999997)	
('monday', '00', '161', '209') (22.0, 21.800000000000001)	
('monday', '00', '161', '211') (30.5, 23.77)	
('monday', '00', '161', '212') (22.0, 27.300000000000001)	
('monday', '00', '161', '223') (17.5, 23.175000000000001)	
('monday', '00', '161', '224') (31.0, 25.147500000000001)	
('monday', '00', '161', '2


