# Using the k-Nearest Neighbours Algorithm to Predict Bike Availability

The machine learning algorithm we decided to use in our project to predict bike availability was a tweaked version of the k-nearest neighbours algorithm. We have chosen this algorithm both for it's simplicity and, we would argue, it's suitability with some alteration for the task of predicting bike availability based on weather data. Our primary source of information for learning about this algorithm was the chapter 8.2 in Tom Mitchel's book "Machine Learning".

In the k-nearest neighbours algorithm, the instance we are trying to predict what a particular instance is based on the k (some number) "nearest neighbours" from out data set, that is to say the instances from our dataset that our closest to the instance being predicted in it's known qualities. This algorithm is usually used to allocate instances into discrete categories, in which cases the instance will be given the same classification as the majority of it's k-nearest neighbours. As the predictions we are assigning are continuous, we will use the average of the k-nearest neighbours, i.e. the average available bikes of the nearest neighbours selected based on similarity of weather conditions. To maximise accuracy, we will be implementing a version of k nearest neighbours outlined by Mitchell on pp.233-4, distance weighted nearest neighbours where k is all training examples. In terms of our project, this means our prediction will be based on the weighted average of all of the previous bike availability data for a station, with examples having a higher weight in the average the closer they are in weather to weather for our prediction.  

In [1]:
import mysql.connector
import math
import datetime
import requests
from datetime import datetime, timedelta

### Prediction Model

In [37]:
def nearest_neighbours(station_no):
    data = requests.get("http://api.openweathermap.org/data/2.5/forecast?q=Dublin&appid=6fb76ecce41a85161d4c6ea5e2758f2b").json()
    mydb = mysql.connector.connect(
        host="newdublinbikesinstance.cevl8km57x9m.us-east-1.rds.amazonaws.com",
        user="root",
        passwd="secretpass"
    )

    cursor = mydb.cursor(buffered=True)

    #forecasts = {}
    #dates = []
    #dates_and_times = {}
    
    counter = 0
    
    predictions = []

    for forecast in data['list']: #retrieving the data and time information from the api call to display
        dt = forecast['dt']
        dt = int(dt)
        dt = datetime.utcfromtimestamp(dt).strftime('%Y-%m-%d %H:%M:%S')
        date, time = dt.split(" ")
        #print(date)
        #print(time)
        #forecasts[dt] = forecast
        #if date not in dates:
        #    dates.append(date)
        #    dates_and_times[date] = []
        #dates_and_times[date].append(time)

        #print()

        #for key in dates_and_times:
        #    print(key)

        #print() 

        #date = input("Please enter which of the above dates you would like to predict bike availability on: ")

        #print() 

        #for time in dates_and_times[date]:
            #print(time)

        #print() 

        #time = input("Please enter one of the above times to predict bike availability on the selected date for: ")

        #for time in dates_and_times[date]:

        forecast_key =  date + " " + time

        prediction_date = datetime.strptime(date, '%Y-%m-%d')

        prediction_day = prediction_date.weekday()

        dt_time = datetime.strptime(time, '%H:%M:%S')

        prediction_time = timedelta(hours=dt_time.hour, minutes=dt_time.minute, seconds=dt_time.second)

        prediction_temp = forecast['main']['temp']

        prediction_weather = forecast['weather'][0]['main']
        
        weight_total = 0 
        weighted_predictors_total = 0 

        #now that we have a prediction for weather for that particular time and date, we can compare it to our previous records

        if (prediction_day < 5):
            cursor.execute("SELECT DISTINCT * FROM innodb.station_var JOIN innodb.weather on (station_var.last_update_date = weather.date AND minute(timediff(station_var.lat_update_time, weather.time)) < 11 AND hour(timediff(station_var.lat_update_time, weather.time)) = 0) WHERE station_var.station_no = %s AND station_var.status = 'OPEN' AND weather.description = '%s' AND weekday(weather.date) < 5" % (station_no, prediction_weather))
            #for the station number in question we are retrieving all of our records where the general weather description is the same (raining, clouds, etc.) and the time of day is roughly the same, that is to say less than 11 minutes off. We will not be lookng at records where the station was not open.

            rows = cursor.fetchall()
            if rows == []: #if a currently unknown weather is encountered (one there is not previous data on), we will do the same as above but for all weather description types, i.e. if snow is encountered for the first time we will take records with rainy, clear, clouds, mist, drizzle and any other weather types
                cursor.execute("SELECT DISTINCT * FROM innodb.station_var JOIN innodb.weather on (station_var.last_update_date = weather.date AND minute(timediff(station_var.lat_update_time, weather.time)) < 11 AND hour(timediff(station_var.lat_update_time, weather.time)) = 0) WHERE station_var.station_no = %s AND station_var.status = 'OPEN' weekday(weather.date) < 5" % station_no)
                rows = cursor.fetchall()

        #we will now get the weighted average of all the records retrieved above 
             
            for row in rows:
                row_temp = row[6]
                row_bikes = row[2]
                row_time = row[8]
                temp_weight = 1/(math.sqrt((row_temp - prediction_temp)**2) + 0.5) #the difference between the temperature in a record and the predicted temperature
                time_weight = 1/(math.sqrt((round((row_time - prediction_time).total_seconds()/60)**2)) + 0.5) #the difference between the time in a record and the time of the prediction
                weight = temp_weight + time_weight #weight is determined based on the difference in both time and temperature
                #adding the weight and weighted predictions from this record to the totals
                weight_total += weight   
                weighted_predictor = row_bikes * weight
                weighted_predictors_total += weighted_predictor


        #if weekend, use only records from that day:
        else:
            cursor.execute("SELECT DISTINCT * FROM innodb.station_var JOIN innodb.weather on (station_var.last_update_date = weather.date AND minute(timediff(station_var.lat_update_time, weather.time)) < 11 AND hour(timediff(station_var.lat_update_time, weather.time)) = 0) WHERE station_var.station_no = %s AND station_var.status = 'OPEN' AND weather.description = '%s' AND weekday(weather.date) = %s" % (station_no, prediction_weather, prediction_day))
            rows = cursor.fetchall()
            if rows == []: #if a new weather is encountered, use records for all weather
                cursor.execute("SELECT DISTINCT * FROM innodb.station_var JOIN innodb.weather on (station_var.last_update_date = weather.date AND minute(timediff(station_var.lat_update_time, weather.time)) < 11 AND hour(timediff(station_var.lat_update_time, weather.time)) = 0) WHERE station_var.station_no = %s AND station_var.status = 'OPEN' AND weekday(weather.date) = %s" % (station_no, prediction_day))
                rows = cursor.fetchall()
            
            for row in rows:
                row_temp = row[6]
                row_bikes = row[2]
                row_time = row[8]
                temp_weight = 1/(math.sqrt((row_temp - prediction_temp)**2) + 0.5)
                time_weight = 1/(math.sqrt((round((row_time - prediction_time).total_seconds()/60)**2)) + 0.5)
                weight = temp_weight + time_weight 
                weight_total += weight
                weighted_predictor = row_bikes * weight
                weighted_predictors_total += weighted_predictor

        #finally, our prediction is the waited average available bikes from the records we retrieved
        #print(weighted_predictors_total)
        #print(weight_total)
        #print()
        prediction = round(weighted_predictors_total/weight_total)
        #print(prediction)
        predictions.append(prediction)
        counter += 1
        if counter == 9:
            break

    return predictions
        

In [38]:
prediction = nearest_neighbours(6)
print(prediction)

[5, 5, 5, 5, 5, 5, 5, 5, 3]


In [39]:
print("Available bikes prediction for station 2 is:", prediction)

Available bikes prediction for station 2 is: [5, 5, 5, 5, 5, 5, 5, 5, 3]


### Model for making a prediction of the current weather to evaluate accuracy

In [4]:
def check_accuracy():
    import mysql.connector
    import math
    import datetime

    mydb = mysql.connector.connect(
        host="newdublinbikesinstance.cevl8km57x9m.us-east-1.rds.amazonaws.com",
        user="root",
        passwd="secretpass"
    )

    cursor = mydb.cursor(buffered=True)

    cursor.execute("SELECT DISTINCT station_no FROM innodb.station_fixed ORDER BY station_no")

    station_no_rows = cursor.fetchall()

    station_nos = []



    for i in station_no_rows:
        station_nos.append(i[0])

    cursor.execute("SELECT * FROM innodb.weather ORDER BY date DESC, time DESC LIMIT 1")

    weather_row = cursor.fetchone()

    current_temp = weather_row[0]

    current_weather = weather_row[1]

    current_time = weather_row[2]

    current_day = weather_row[3].weekday()

    total_prediction_error = 0

    no_stations = len(station_nos)

    #if the prediction is one a weekday, we only use weekday records:
    if (current_day < 5):
        for station_no in station_nos:
            cursor.execute("SELECT DISTINCT * FROM innodb.station_var JOIN innodb.weather on (station_var.last_update_date = weather.date AND minute(timediff(station_var.lat_update_time, weather.time)) < 11 AND hour(timediff(station_var.lat_update_time, weather.time)) = 0) WHERE station_var.station_no = %s AND station_var.status = 'OPEN' AND weather.description = '%s' AND weekday(weather.date) < 5" % (station_no, current_weather))
            rows = cursor.fetchall()
            #print(rows)
            if rows == []:  
                cursor.execute("SELECT DISTINCT * FROM innodb.station_var JOIN innodb.weather on (station_var.last_update_date = weather.date AND minute(timediff(station_var.lat_update_time, weather.time)) < 11 AND hour(timediff(station_var.lat_update_time, weather.time)) = 0) WHERE station_var.station_no = %s AND station_var.status = 'OPEN' AND weather.description = '%s' AND weekday(weather.date) < 5" % station_no)
                rows = cursor.fetchall()
            #finding currently available bikes at station for comparison
            cursor.execute("SELECT * FROM innodb.station_var ORDER BY last_update_date DESC, lat_update_time DESC LIMIT 1")
            current_station_data = cursor.fetchone()
            current_bikes_available = current_station_data[2]
            weight_total = 0 
            weighted_predictors_total = 0  
            for row in rows:
                row_temp = row[6]
                row_bikes = row[2]
                row_time = row[8]
                temp_weight = 1/(math.sqrt((row_temp - current_temp)**2) + 0.5)
                time_weight = 1/(math.sqrt((round((row_time - current_time).total_seconds()/60)**2)) + 0.5)
                weight = temp_weight + time_weight 
                weight_total += weight
                weighted_predictor = row_bikes * weight
                weighted_predictors_total += weighted_predictor
            prediction = round(weighted_predictors_total/weight_total)
            print()
            print("---------------------------------------------------------------------------------------------------------")
            print("Available bikes at station", station_no)
            print("Prediction:", prediction)
            print("Reality:", current_bikes_available)
            total_prediction_error += abs(prediction - current_bikes_available)


    #if weekend, use only records from that day, i.e. Sunday for Sundays, Saturday for Saturdays:
    else:
        for station_no in station_nos:
            cursor.execute("SELECT DISTINCT * FROM innodb.station_var JOIN innodb.weather on (station_var.last_update_date = weather.date AND minute(timediff(station_var.lat_update_time, weather.time)) < 11 AND hour(timediff(station_var.lat_update_time, weather.time)) = 0) WHERE station_var.station_no = %s AND station_var.status = 'OPEN' AND weather.description = '%s' AND weekday(weather.date) = %s" % (station_no, current_weather, current_day))
            rows = cursor.fetchall()
            if rows == []: #if a new weather is encountered, use records for all weather
                cursor.execute("SELECT DISTINCT * FROM innodb.station_var JOIN innodb.weather on (station_var.last_update_date = weather.date AND minute(timediff(station_var.lat_update_time, weather.time)) < 11 AND hour(timediff(station_var.lat_update_time, weather.time)) = 0) WHERE station_var.station_no = %s AND station_var.status = 'OPEN' AND weekday(weather.date) = %s" % (station_no, current_day))
                rows = cursor.fetchall()
            #finding currently available bikes at station for comparison
            cursor.execute("SELECT * FROM innodb.station_var ORDER BY last_update_date DESC, lat_update_time DESC LIMIT 1")
            current_station_data = cursor.fetchone()
            current_bikes_available = current_station_data[2]
            weight_total = 0 
            weighted_predictors_total = 0  
            for row in rows:
                row_temp = row[6]
                row_bikes = row[2]
                row_time = row[8]
                temp_weight = 1/(math.sqrt((row_temp - current_temp)**2) + 0.5)
                time_weight = 1/(math.sqrt((round((row_time - current_time).total_seconds()/60)**2)) + 0.5)
                weight = temp_weight + time_weight 
                weight_total += weight
                weighted_predictor = row_bikes * weight
                weighted_predictors_total += weighted_predictor
            prediction = round(weighted_predictors_total/weight_total)
            print()
            print("---------------------------------------------------------------------------------------------------------")
            print("Available bikes at station", station_no)
            print("Prediction:", prediction)
            print("Reality:", current_bikes_available)
            total_prediction_error += abs(prediction - current_bikes_available)

    print()
    print("---------------------------------------------------------------------------------------------------------")        
    print("Average difference between prediction and reality:", round(total_prediction_error/no_stations))
    return round(total_prediction_error/no_stations)

In [15]:
test1 = check_accuracy() #13:26 06/04/2019


---------------------------------------------------------------------------------------------------------
Available bikes at station 2
Prediction: 3
Reality: 17

---------------------------------------------------------------------------------------------------------
Available bikes at station 3
Prediction: 3
Reality: 17

---------------------------------------------------------------------------------------------------------
Available bikes at station 4
Prediction: 4
Reality: 17

---------------------------------------------------------------------------------------------------------
Available bikes at station 5
Prediction: 17
Reality: 17

---------------------------------------------------------------------------------------------------------
Available bikes at station 6
Prediction: 16
Reality: 17

---------------------------------------------------------------------------------------------------------
Available bikes at station 7
Prediction: 12
Reality: 17

------------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 54
Prediction: 6
Reality: 17

---------------------------------------------------------------------------------------------------------
Available bikes at station 55
Prediction: 1
Reality: 17

---------------------------------------------------------------------------------------------------------
Available bikes at station 56
Prediction: 11
Reality: 17

---------------------------------------------------------------------------------------------------------
Available bikes at station 57
Prediction: 7
Reality: 17

---------------------------------------------------------------------------------------------------------
Available bikes at station 58
Prediction: 5
Reality: 17

---------------------------------------------------------------------------------------------------------
Available bikes at station 59
Prediction: 4
Reality: 17

--------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 105
Prediction: 2
Reality: 17

---------------------------------------------------------------------------------------------------------
Available bikes at station 106
Prediction: 7
Reality: 17

---------------------------------------------------------------------------------------------------------
Available bikes at station 107
Prediction: 13
Reality: 17

---------------------------------------------------------------------------------------------------------
Available bikes at station 108
Prediction: 10
Reality: 17

---------------------------------------------------------------------------------------------------------
Available bikes at station 109
Prediction: 12
Reality: 17

---------------------------------------------------------------------------------------------------------
Available bikes at station 110
Prediction: 11
Reality: 17

-----------

In [16]:
test2 = check_accuracy() #15:15 06/04/2019


---------------------------------------------------------------------------------------------------------
Available bikes at station 2
Prediction: 3
Reality: 6

---------------------------------------------------------------------------------------------------------
Available bikes at station 3
Prediction: 4
Reality: 6

---------------------------------------------------------------------------------------------------------
Available bikes at station 4
Prediction: 5
Reality: 6

---------------------------------------------------------------------------------------------------------
Available bikes at station 5
Prediction: 14
Reality: 6

---------------------------------------------------------------------------------------------------------
Available bikes at station 6
Prediction: 16
Reality: 6

---------------------------------------------------------------------------------------------------------
Available bikes at station 7
Prediction: 10
Reality: 6

------------------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 54
Prediction: 6
Reality: 6

---------------------------------------------------------------------------------------------------------
Available bikes at station 55
Prediction: 1
Reality: 6

---------------------------------------------------------------------------------------------------------
Available bikes at station 56
Prediction: 10
Reality: 6

---------------------------------------------------------------------------------------------------------
Available bikes at station 57
Prediction: 6
Reality: 6

---------------------------------------------------------------------------------------------------------
Available bikes at station 58
Prediction: 5
Reality: 6

---------------------------------------------------------------------------------------------------------
Available bikes at station 59
Prediction: 3
Reality: 6

--------------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 105
Prediction: 2
Reality: 6

---------------------------------------------------------------------------------------------------------
Available bikes at station 106
Prediction: 8
Reality: 6

---------------------------------------------------------------------------------------------------------
Available bikes at station 107
Prediction: 12
Reality: 6

---------------------------------------------------------------------------------------------------------
Available bikes at station 108
Prediction: 8
Reality: 6

---------------------------------------------------------------------------------------------------------
Available bikes at station 109
Prediction: 12
Reality: 6

---------------------------------------------------------------------------------------------------------
Available bikes at station 110
Prediction: 9
Reality: 6

-------------------

In [4]:
test3 = check_accuracy() #15:55 06/04/2019
tests.append(test3)


---------------------------------------------------------------------------------------------------------
Available bikes at station 2
Prediction: 3
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 3
Prediction: 4
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 4
Prediction: 5
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 5
Prediction: 15
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 6
Prediction: 16
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 7
Prediction: 10
Reality: 13

------------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 54
Prediction: 6
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 55
Prediction: 1
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 56
Prediction: 10
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 57
Prediction: 6
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 58
Prediction: 5
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 59
Prediction: 3
Reality: 13

--------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 105
Prediction: 1
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 106
Prediction: 8
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 107
Prediction: 11
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 108
Prediction: 9
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 109
Prediction: 12
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 110
Prediction: 9
Reality: 13

-------------

In [44]:
test4 = check_accuracy() #19:35 06/04/2019
tests.append(test4)


---------------------------------------------------------------------------------------------------------
Available bikes at station 2
Prediction: 2
Reality: 9

---------------------------------------------------------------------------------------------------------
Available bikes at station 3
Prediction: 6
Reality: 9

---------------------------------------------------------------------------------------------------------
Available bikes at station 4
Prediction: 14
Reality: 9

---------------------------------------------------------------------------------------------------------
Available bikes at station 5
Prediction: 16
Reality: 9

---------------------------------------------------------------------------------------------------------
Available bikes at station 6
Prediction: 14
Reality: 9

---------------------------------------------------------------------------------------------------------
Available bikes at station 7
Prediction: 3
Reality: 9

------------------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 54
Prediction: 8
Reality: 9

---------------------------------------------------------------------------------------------------------
Available bikes at station 55
Prediction: 1
Reality: 9

---------------------------------------------------------------------------------------------------------
Available bikes at station 56
Prediction: 3
Reality: 9

---------------------------------------------------------------------------------------------------------
Available bikes at station 57
Prediction: 3
Reality: 9

---------------------------------------------------------------------------------------------------------
Available bikes at station 58
Prediction: 10
Reality: 9

---------------------------------------------------------------------------------------------------------
Available bikes at station 59
Prediction: 7
Reality: 9

--------------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 105
Prediction: 6
Reality: 9

---------------------------------------------------------------------------------------------------------
Available bikes at station 106
Prediction: 9
Reality: 9

---------------------------------------------------------------------------------------------------------
Available bikes at station 107
Prediction: 8
Reality: 9

---------------------------------------------------------------------------------------------------------
Available bikes at station 108
Prediction: 3
Reality: 9

---------------------------------------------------------------------------------------------------------
Available bikes at station 109
Prediction: 16
Reality: 9

---------------------------------------------------------------------------------------------------------
Available bikes at station 110
Prediction: 4
Reality: 9

--------------------

In [60]:
test5 = check_accuracy() #20:05 06/04/2019
tests.append(test5)


---------------------------------------------------------------------------------------------------------
Available bikes at station 2
Prediction: 3
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 3
Prediction: 7
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 4
Prediction: 13
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 5
Prediction: 17
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 6
Prediction: 14
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 7
Prediction: 3
Reality: 1

------------------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 54
Prediction: 8
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 55
Prediction: 2
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 56
Prediction: 3
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 57
Prediction: 2
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 58
Prediction: 10
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 59
Prediction: 7
Reality: 1

--------------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 105
Prediction: 6
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 106
Prediction: 8
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 107
Prediction: 4
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 108
Prediction: 3
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 109
Prediction: 17
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 110
Prediction: 3
Reality: 1

--------------------

In [61]:
test5 = check_accuracy() #20:15 06/04/2019
tests.append(test5)


---------------------------------------------------------------------------------------------------------
Available bikes at station 2
Prediction: 3
Reality: 24

---------------------------------------------------------------------------------------------------------
Available bikes at station 3
Prediction: 9
Reality: 24

---------------------------------------------------------------------------------------------------------
Available bikes at station 4
Prediction: 14
Reality: 24

---------------------------------------------------------------------------------------------------------
Available bikes at station 5
Prediction: 16
Reality: 24

---------------------------------------------------------------------------------------------------------
Available bikes at station 6
Prediction: 14
Reality: 24

---------------------------------------------------------------------------------------------------------
Available bikes at station 7
Prediction: 3
Reality: 24

------------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 54
Prediction: 8
Reality: 24

---------------------------------------------------------------------------------------------------------
Available bikes at station 55
Prediction: 2
Reality: 24

---------------------------------------------------------------------------------------------------------
Available bikes at station 56
Prediction: 3
Reality: 24

---------------------------------------------------------------------------------------------------------
Available bikes at station 57
Prediction: 2
Reality: 24

---------------------------------------------------------------------------------------------------------
Available bikes at station 58
Prediction: 11
Reality: 24

---------------------------------------------------------------------------------------------------------
Available bikes at station 59
Prediction: 7
Reality: 24

--------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 105
Prediction: 5
Reality: 24

---------------------------------------------------------------------------------------------------------
Available bikes at station 106
Prediction: 6
Reality: 24

---------------------------------------------------------------------------------------------------------
Available bikes at station 107
Prediction: 4
Reality: 24

---------------------------------------------------------------------------------------------------------
Available bikes at station 108
Prediction: 2
Reality: 24

---------------------------------------------------------------------------------------------------------
Available bikes at station 109
Prediction: 17
Reality: 24

---------------------------------------------------------------------------------------------------------
Available bikes at station 110
Prediction: 2
Reality: 24

--------------

In [62]:
test6 = check_accuracy() #20:55 06/04/2019
tests.append(test6)


---------------------------------------------------------------------------------------------------------
Available bikes at station 2
Prediction: 4
Reality: 8

---------------------------------------------------------------------------------------------------------
Available bikes at station 3
Prediction: 11
Reality: 8

---------------------------------------------------------------------------------------------------------
Available bikes at station 4
Prediction: 13
Reality: 8

---------------------------------------------------------------------------------------------------------
Available bikes at station 5
Prediction: 18
Reality: 8

---------------------------------------------------------------------------------------------------------
Available bikes at station 6
Prediction: 12
Reality: 8

---------------------------------------------------------------------------------------------------------
Available bikes at station 7
Prediction: 2
Reality: 8

-----------------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 54
Prediction: 7
Reality: 8

---------------------------------------------------------------------------------------------------------
Available bikes at station 55
Prediction: 2
Reality: 8

---------------------------------------------------------------------------------------------------------
Available bikes at station 56
Prediction: 3
Reality: 8

---------------------------------------------------------------------------------------------------------
Available bikes at station 57
Prediction: 1
Reality: 8

---------------------------------------------------------------------------------------------------------
Available bikes at station 58
Prediction: 9
Reality: 8

---------------------------------------------------------------------------------------------------------
Available bikes at station 59
Prediction: 7
Reality: 8

---------------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 105
Prediction: 5
Reality: 8

---------------------------------------------------------------------------------------------------------
Available bikes at station 106
Prediction: 7
Reality: 8

---------------------------------------------------------------------------------------------------------
Available bikes at station 107
Prediction: 4
Reality: 8

---------------------------------------------------------------------------------------------------------
Available bikes at station 108
Prediction: 2
Reality: 8

---------------------------------------------------------------------------------------------------------
Available bikes at station 109
Prediction: 17
Reality: 8

---------------------------------------------------------------------------------------------------------
Available bikes at station 110
Prediction: 4
Reality: 8

--------------------

In [63]:
test7 = check_accuracy() #21:35
tests.append(test7)


---------------------------------------------------------------------------------------------------------
Available bikes at station 2
Prediction: 7
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 3
Prediction: 12
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 4
Prediction: 13
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 5
Prediction: 18
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 6
Prediction: 12
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 7
Prediction: 2
Reality: 0

-----------------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 54
Prediction: 8
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 55
Prediction: 2
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 56
Prediction: 3
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 57
Prediction: 1
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 58
Prediction: 7
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 59
Prediction: 7
Reality: 0

---------------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 105
Prediction: 5
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 106
Prediction: 8
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 107
Prediction: 6
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 108
Prediction: 3
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 109
Prediction: 19
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 110
Prediction: 5
Reality: 0

--------------------

In [4]:
test8 = check_accuracy() # 17:35 08/04/2019
tests.append(test8)


---------------------------------------------------------------------------------------------------------
Available bikes at station 2
Prediction: 1
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 3
Prediction: 1
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 4
Prediction: 2
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 5
Prediction: 8
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 6
Prediction: 1
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 7
Prediction: 2
Reality: 1

---------------------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 54
Prediction: 23
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 55
Prediction: 13
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 56
Prediction: 29
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 57
Prediction: 19
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 58
Prediction: 32
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 59
Prediction: 1
Reality: 1

----------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 105
Prediction: 0
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 106
Prediction: 1
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 107
Prediction: 3
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 108
Prediction: 3
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 109
Prediction: 3
Reality: 1

---------------------------------------------------------------------------------------------------------
Available bikes at station 110
Prediction: 2
Reality: 1

---------------------

In [5]:
test9 = check_accuracy() # 17:55 08/04/2019


---------------------------------------------------------------------------------------------------------
Available bikes at station 2
Prediction: 1
Reality: 5

---------------------------------------------------------------------------------------------------------
Available bikes at station 3
Prediction: 1
Reality: 5

---------------------------------------------------------------------------------------------------------
Available bikes at station 4
Prediction: 2
Reality: 5

---------------------------------------------------------------------------------------------------------
Available bikes at station 5
Prediction: 8
Reality: 5

---------------------------------------------------------------------------------------------------------
Available bikes at station 6
Prediction: 1
Reality: 5

---------------------------------------------------------------------------------------------------------
Available bikes at station 7
Prediction: 2
Reality: 5

---------------------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 54
Prediction: 22
Reality: 5

---------------------------------------------------------------------------------------------------------
Available bikes at station 55
Prediction: 13
Reality: 5

---------------------------------------------------------------------------------------------------------
Available bikes at station 56
Prediction: 28
Reality: 5

---------------------------------------------------------------------------------------------------------
Available bikes at station 57
Prediction: 18
Reality: 5

---------------------------------------------------------------------------------------------------------
Available bikes at station 58
Prediction: 31
Reality: 5

---------------------------------------------------------------------------------------------------------
Available bikes at station 59
Prediction: 1
Reality: 5

----------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 105
Prediction: 0
Reality: 5

---------------------------------------------------------------------------------------------------------
Available bikes at station 106
Prediction: 2
Reality: 5

---------------------------------------------------------------------------------------------------------
Available bikes at station 107
Prediction: 3
Reality: 5

---------------------------------------------------------------------------------------------------------
Available bikes at station 108
Prediction: 3
Reality: 5

---------------------------------------------------------------------------------------------------------
Available bikes at station 109
Prediction: 3
Reality: 5

---------------------------------------------------------------------------------------------------------
Available bikes at station 110
Prediction: 3
Reality: 5

---------------------

In [6]:
test10 = check_accuracy() # 18:05 08/04/2019
tests.append(test10)


---------------------------------------------------------------------------------------------------------
Available bikes at station 2
Prediction: 1
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 3
Prediction: 1
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 4
Prediction: 2
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 5
Prediction: 8
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 6
Prediction: 1
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 7
Prediction: 2
Reality: 13

---------------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 54
Prediction: 21
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 55
Prediction: 13
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 56
Prediction: 27
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 57
Prediction: 18
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 58
Prediction: 30
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 59
Prediction: 1
Reality: 13

----------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 105
Prediction: 0
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 106
Prediction: 2
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 107
Prediction: 3
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 108
Prediction: 3
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 109
Prediction: 3
Reality: 13

---------------------------------------------------------------------------------------------------------
Available bikes at station 110
Prediction: 3
Reality: 13

---------------

In [7]:
test11 = check_accuracy() # 18:45 08/04/2019
tests.append(test11)


---------------------------------------------------------------------------------------------------------
Available bikes at station 2
Prediction: 2
Reality: 27

---------------------------------------------------------------------------------------------------------
Available bikes at station 3
Prediction: 1
Reality: 27

---------------------------------------------------------------------------------------------------------
Available bikes at station 4
Prediction: 2
Reality: 27

---------------------------------------------------------------------------------------------------------
Available bikes at station 5
Prediction: 10
Reality: 27

---------------------------------------------------------------------------------------------------------
Available bikes at station 6
Prediction: 1
Reality: 27

---------------------------------------------------------------------------------------------------------
Available bikes at station 7
Prediction: 2
Reality: 27

--------------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 54
Prediction: 19
Reality: 27

---------------------------------------------------------------------------------------------------------
Available bikes at station 55
Prediction: 11
Reality: 27

---------------------------------------------------------------------------------------------------------
Available bikes at station 56
Prediction: 26
Reality: 27

---------------------------------------------------------------------------------------------------------
Available bikes at station 57
Prediction: 16
Reality: 27

---------------------------------------------------------------------------------------------------------
Available bikes at station 58
Prediction: 27
Reality: 27

---------------------------------------------------------------------------------------------------------
Available bikes at station 59
Prediction: 1
Reality: 27

----------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 105
Prediction: 0
Reality: 27

---------------------------------------------------------------------------------------------------------
Available bikes at station 106
Prediction: 2
Reality: 27

---------------------------------------------------------------------------------------------------------
Available bikes at station 107
Prediction: 4
Reality: 27

---------------------------------------------------------------------------------------------------------
Available bikes at station 108
Prediction: 5
Reality: 27

---------------------------------------------------------------------------------------------------------
Available bikes at station 109
Prediction: 4
Reality: 27

---------------------------------------------------------------------------------------------------------
Available bikes at station 110
Prediction: 4
Reality: 27

---------------

In [9]:
test12 = check_accuracy() # 19:55 08/04/2019
tests.append(test12)


---------------------------------------------------------------------------------------------------------
Available bikes at station 2
Prediction: 3
Reality: 11

---------------------------------------------------------------------------------------------------------
Available bikes at station 3
Prediction: 2
Reality: 11

---------------------------------------------------------------------------------------------------------
Available bikes at station 4
Prediction: 2
Reality: 11

---------------------------------------------------------------------------------------------------------
Available bikes at station 5
Prediction: 11
Reality: 11

---------------------------------------------------------------------------------------------------------
Available bikes at station 6
Prediction: 1
Reality: 11

---------------------------------------------------------------------------------------------------------
Available bikes at station 7
Prediction: 2
Reality: 11

--------------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 54
Prediction: 17
Reality: 11

---------------------------------------------------------------------------------------------------------
Available bikes at station 55
Prediction: 9
Reality: 11

---------------------------------------------------------------------------------------------------------
Available bikes at station 56
Prediction: 23
Reality: 11

---------------------------------------------------------------------------------------------------------
Available bikes at station 57
Prediction: 14
Reality: 11

---------------------------------------------------------------------------------------------------------
Available bikes at station 58
Prediction: 24
Reality: 11

---------------------------------------------------------------------------------------------------------
Available bikes at station 59
Prediction: 1
Reality: 11

-----------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 105
Prediction: 1
Reality: 11

---------------------------------------------------------------------------------------------------------
Available bikes at station 106
Prediction: 3
Reality: 11

---------------------------------------------------------------------------------------------------------
Available bikes at station 107
Prediction: 6
Reality: 11

---------------------------------------------------------------------------------------------------------
Available bikes at station 108
Prediction: 5
Reality: 11

---------------------------------------------------------------------------------------------------------
Available bikes at station 109
Prediction: 4
Reality: 11

---------------------------------------------------------------------------------------------------------
Available bikes at station 110
Prediction: 6
Reality: 11

---------------

In [5]:
test13 = check_accuracy() # 09/04/19 08:15
tests.append(test13)


---------------------------------------------------------------------------------------------------------
Available bikes at station 2
Prediction: 10
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 3
Prediction: 4
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 4
Prediction: 2
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 5
Prediction: 16
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 6
Prediction: 1
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 7
Prediction: 1
Reality: 0

-------------------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 54
Prediction: 6
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 55
Prediction: 4
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 56
Prediction: 9
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 57
Prediction: 5
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 58
Prediction: 11
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 59
Prediction: 2
Reality: 0

--------------------------


---------------------------------------------------------------------------------------------------------
Available bikes at station 105
Prediction: 11
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 106
Prediction: 22
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 107
Prediction: 20
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 108
Prediction: 16
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 109
Prediction: 10
Reality: 0

---------------------------------------------------------------------------------------------------------
Available bikes at station 110
Prediction: 13
Reality: 0

---------------

In [15]:
total = 8 + 7 + 6 + 6 + 10 + 14 + 7 + 11 + 9 + 7 + 10 + 17 + 7 + 11

In [16]:
mean_inaccuracy = total/13 
print(mean_inaccuracy)

10.0


In [17]:
total_as_list = [8,7,6,6,10,14,7,11,9,7,10,17,7,11]
median_inaccuracy = total_as_list[round(len(total_as_list)/2) - 1]
print(median_inaccuracy)

7


Over 13 tests finding the average inaccuracy of each bike station, we found that the mean inaccuracy of our model is 10 bikes. However, this number is brought up by a few high outliers, with the number usually actually lying between 6-8, with our median inaccuracy being 7 bikes.

### Measuring and Improving Performance

Our function already seems quite fast, but we will now see exactly how fast it is and see if it's performance can be optimised.

In [52]:
import time as stopwatch

In [53]:
def timer(station_no):
    data = requests.get("http://api.openweathermap.org/data/2.5/forecast?q=Dublin&appid=6fb76ecce41a85161d4c6ea5e2758f2b").json()

    mydb = mysql.connector.connect(
        host="newdublinbikesinstance.cevl8km57x9m.us-east-1.rds.amazonaws.com",
        user="root",
        passwd="secretpass"
    )

    cursor = mydb.cursor(buffered=True)

    forecasts = {}
    dates = []
    dates_and_times = {}

    for forecast in data['list']:
        dt = forecast['dt']
        dt = int(dt)
        dt = datetime.utcfromtimestamp(dt).strftime('%Y-%m-%d %H:%M:%S')
        date, time = dt.split(" ")
        forecasts[dt] = forecast
        if date not in dates:
            dates.append(date)
            dates_and_times[date] = []
        dates_and_times[date].append(time)

    print()

    for key in dates_and_times:
        print(key)

    print() 

    date = input("Please enter which of the above dates you would like to predict bike availability on: ")

    print() 

    for time in dates_and_times[date]:
        print(time)

    print() 

    time = input("Please enter one of the above times to predict bike availability on the selected date for: ")

    start = stopwatch.time() #starting the 'stopwatch' after last user input has been received
    
    forecast_key =  date + " " + time

    prediction_date = datetime.strptime(date, '%Y-%m-%d')

    prediction_day = prediction_date.weekday()

    dt_time = datetime.strptime(time, '%H:%M:%S')

    prediction_time = timedelta(hours=dt_time.hour, minutes=dt_time.minute, seconds=dt_time.second)

    prediction_temp = forecasts[forecast_key]['main']['temp']

    prediction_weather = forecasts[forecast_key]['weather'][0]['main']

    if (prediction_day < 5):
        cursor.execute("SELECT DISTINCT * FROM innodb.station_var JOIN innodb.weather on (station_var.last_update_date = weather.date AND minute(timediff(station_var.lat_update_time, weather.time)) < 11 AND hour(timediff(station_var.lat_update_time, weather.time)) = 0) WHERE station_var.station_no = %s AND station_var.status = 'OPEN' AND weather.description = '%s' AND weekday(weather.date) < 5" % (station_no, prediction_weather))
        rows = cursor.fetchall()
        if rows == []: #if a new weather is encountered, use records for all weather
            cursor.execute("SELECT DISTINCT * FROM innodb.station_var JOIN innodb.weather on (station_var.last_update_date = weather.date AND minute(timediff(station_var.lat_update_time, weather.time)) < 11 AND hour(timediff(station_var.lat_update_time, weather.time)) = 0) WHERE station_var.station_no = %s AND station_var.status = 'OPEN' weekday(weather.date) < 5" % station_no)
            rows = cursor.fetchall()
        weight_total = 0 
        weighted_predictors_total = 0  
        for row in rows:
            row_temp = row[6]
            row_bikes = row[2]
            row_time = row[8]
            temp_weight = 1/(math.sqrt((row_temp - prediction_temp)**2) + 0.5)
            time_weight = 1/(math.sqrt((round((row_time - prediction_time).total_seconds()/60)**2)) + 0.5)
            weight = temp_weight + time_weight 
            weight_total += weight
            weighted_predictor = row_bikes * weight
            weighted_predictors_total += weighted_predictor
       

    #if weekend, use only records from that day:
    else:
        cursor.execute("SELECT DISTINCT * FROM innodb.station_var JOIN innodb.weather on (station_var.last_update_date = weather.date AND minute(timediff(station_var.lat_update_time, weather.time)) < 11 AND hour(timediff(station_var.lat_update_time, weather.time)) = 0) WHERE station_var.station_no = %s AND station_var.status = 'OPEN' AND weather.description = '%s' AND weekday(weather.date) = %s" % (station_no, prediction_weather, prediction_day))
        rows = cursor.fetchall()
        if rows == []: #if a new weather is encountered, use records for all weather
            #print("New weather encountered:", prediction_weather)
            cursor.execute("SELECT DISTINCT * FROM innodb.station_var JOIN innodb.weather on (station_var.last_update_date = weather.date AND minute(timediff(station_var.lat_update_time, weather.time)) < 11 AND hour(timediff(station_var.lat_update_time, weather.time)) = 0) WHERE station_var.station_no = %s AND station_var.status = 'OPEN' AND weekday(weather.date) = %s" % (station_no, prediction_day))
            rows = cursor.fetchall()
        weight_total = 0 
        weighted_predictors_total = 0  
        for row in rows:
            row_temp = row[6]
            row_bikes = row[2]
            row_time = row[8]
            temp_weight = 1/(math.sqrt((row_temp - prediction_temp)**2) + 0.5)
            time_weight = 1/(math.sqrt((round((row_time - prediction_time).total_seconds()/60)**2)) + 0.5)
            weight = temp_weight + time_weight 
            weight_total += weight
            weighted_predictor = row_bikes * weight
            weighted_predictors_total += weighted_predictor
            
    prediction = round(weighted_predictors_total/weight_total)
    
    stop = stopwatch.time() 
    
    return stop - start #returning how long after the last user input the function took to run
        

In [54]:
timer(2)


2019-04-06
2019-04-07
2019-04-08
2019-04-09
2019-04-10
2019-04-11

Please enter which of the above dates you would like to predict bike availability on: 2019-04-07

00:00:00
03:00:00
06:00:00
09:00:00
12:00:00
15:00:00
18:00:00
21:00:00

Please enter one of the above times to predict bike availability on the selected date for: 15:00:00


0.5116457939147949

In [55]:
timer(105)


2019-04-06
2019-04-07
2019-04-08
2019-04-09
2019-04-10
2019-04-11

Please enter which of the above dates you would like to predict bike availability on: 2019-04-10

00:00:00
03:00:00
06:00:00
09:00:00
12:00:00
15:00:00
18:00:00
21:00:00

Please enter one of the above times to predict bike availability on the selected date for: 03:00:00


0.4765136241912842

In [21]:
timer(51)


2019-04-06
2019-04-07
2019-04-08
2019-04-09
2019-04-10
2019-04-11

Please enter which of the above dates you would like to predict bike availability on: 2019-04-09

00:00:00
03:00:00
06:00:00
09:00:00
12:00:00
15:00:00
18:00:00
21:00:00

Please enter one of the above times to predict bike availability on the selected date for: 12:00:00
Available bikes at station 51 prediction: 6

Done.


1.0977745056152344

In [22]:
timer(68)


2019-04-06
2019-04-07
2019-04-08
2019-04-09
2019-04-10
2019-04-11

Please enter which of the above dates you would like to predict bike availability on: 2019-04-08

00:00:00
03:00:00
06:00:00
09:00:00
12:00:00
15:00:00
18:00:00
21:00:00

Please enter one of the above times to predict bike availability on the selected date for: 06:00:00
Available bikes at station 68 prediction: 20

Done.


1.2782726287841797

In [23]:
timer(97)


2019-04-06
2019-04-07
2019-04-08
2019-04-09
2019-04-10
2019-04-11

Please enter which of the above dates you would like to predict bike availability on: 2019-04-07

00:00:00
03:00:00
06:00:00
09:00:00
12:00:00
15:00:00
18:00:00
21:00:00

Please enter one of the above times to predict bike availability on the selected date for: 18:00:00

Done.


0.6453049182891846

A prediction usually seems to be received in around a second or less, going slightly over in one case. This seems like acceptable performance, but perhaps we could improve it through the use of generators:

In [41]:
#source for code in this cell: http://code.activestate.com/recipes/137270-use-generators-for-fetching-large-db-record-sets/

from __future__ import generators

def ResultIter(cursor, arraysize=1000):
    'An iterator that uses fetchmany to keep memory usage down'
    while True:
        results = cursor.fetchmany(arraysize)
        if not results:
            break
        for result in results:
            yield result

In [57]:
def function_with_generators(station_no):
    data = requests.get("http://api.openweathermap.org/data/2.5/forecast?q=Dublin&appid=6fb76ecce41a85161d4c6ea5e2758f2b").json()

    mydb = mysql.connector.connect(
        host="newdublinbikesinstance.cevl8km57x9m.us-east-1.rds.amazonaws.com",
        user="root",
        passwd="secretpass"
    )

    cursor = mydb.cursor(buffered=True)

    forecasts = {}
    dates = []
    dates_and_times = {}

    for forecast in data['list']:
        dt = forecast['dt']
        dt = int(dt)
        dt = datetime.utcfromtimestamp(dt).strftime('%Y-%m-%d %H:%M:%S')
        date, time = dt.split(" ")
        forecasts[dt] = forecast
        if date not in dates:
            dates.append(date)
            dates_and_times[date] = []
        dates_and_times[date].append(time)

    print()

    for key in dates_and_times:
        print(key)

    print() 

    date = input("Please enter which of the above dates you would like to predict bike availability on: ")

    print() 

    for time in dates_and_times[date]:
        print(time)

    print() 

    time = input("Please enter one of the above times to predict bike availability on the selected date for: ")

    start = stopwatch.time() #starting the 'stopwatch' after last user input has been received
    
    forecast_key =  date + " " + time

    prediction_date = datetime.strptime(date, '%Y-%m-%d')

    prediction_day = prediction_date.weekday()

    dt_time = datetime.strptime(time, '%H:%M:%S')

    prediction_time = timedelta(hours=dt_time.hour, minutes=dt_time.minute, seconds=dt_time.second)

    prediction_temp = forecasts[forecast_key]['main']['temp']

    prediction_weather = forecasts[forecast_key]['weather'][0]['main']
    
    class global_vars():
        weighted_predictors_total = 0
        weight_total = 0
    
    def add_predictor(row):
        #row_temp = row[6] #skip the assignments for the sake of speed
        #row_bikes = row[2] 
        #row_time = row[8]
        temp_weight = 1/(math.sqrt((row[6] - prediction_temp)**2) + 0.5)
        time_weight = 1/(math.sqrt((round((row[8] - prediction_time).total_seconds()/60)**2)) + 0.5)
        weight = temp_weight + time_weight 
        global_vars.weight_total += weight
        weighted_predictor = row[2] * weight
        global_vars.weighted_predictors_total += weighted_predictor
               

    if (prediction_day < 5):
        cursor.execute("SELECT DISTINCT * FROM innodb.station_var JOIN innodb.weather on (station_var.last_update_date = weather.date AND minute(timediff(station_var.lat_update_time, weather.time)) < 11 AND hour(timediff(station_var.lat_update_time, weather.time)) = 0) WHERE station_var.station_no = %s AND station_var.status = 'OPEN' AND weather.description = '%s' AND weekday(weather.date) < 5" % (station_no, prediction_weather))
        #rows = cursor.fetchall()
        #if rows == []: #if a new weather is encountered, use records for all weather
            #cursor.execute("SELECT DISTINCT * FROM innodb.station_var JOIN innodb.weather on (station_var.last_update_date = weather.date AND minute(timediff(station_var.lat_update_time, weather.time)) < 11 AND hour(timediff(station_var.lat_update_time, weather.time)) = 0) WHERE station_var.station_no = %s AND station_var.status = 'OPEN' weekday(weather.date) < 5" % station_no)
            #rows = cursor.fetchall()
        for result in ResultIter(cursor):
            add_predictor(result)
       
        
       

    #if weekend, use only records from that day:
    else:
        cursor.execute("SELECT DISTINCT * FROM innodb.station_var JOIN innodb.weather on (station_var.last_update_date = weather.date AND minute(timediff(station_var.lat_update_time, weather.time)) < 11 AND hour(timediff(station_var.lat_update_time, weather.time)) = 0) WHERE station_var.station_no = %s AND station_var.status = 'OPEN' AND weather.description = '%s' AND weekday(weather.date) = %s" % (station_no, prediction_weather, prediction_day))
        #rows = cursor.fetchall()
        #if rows == []: #if a new weather is encountered, use records for all weather
            #cursor.execute("SELECT DISTINCT * FROM innodb.station_var JOIN innodb.weather on (station_var.last_update_date = weather.date AND minute(timediff(station_var.lat_update_time, weather.time)) < 11 AND hour(timediff(station_var.lat_update_time, weather.time)) = 0) WHERE station_var.station_no = %s AND station_var.status = 'OPEN' AND weekday(weather.date) = %s" % (station_no, prediction_day))
            #rows = cursor.fetchall()
        for result in ResultIter(cursor):
            add_predictor(result)

    print(global_vars.weighted_predictors_total)
    
    print(global_vars.weight_total)
    
    prediction = round(global_vars.weighted_predictors_total/global_vars.weight_total)
    
    print("Available bikes prediction for station 2 is:", prediction)
    
    stop = stopwatch.time() 
    
    return stop - start #returning how long after the last user input the function took to run
        

In [58]:
function_with_generators(2)


2019-04-06
2019-04-07
2019-04-08
2019-04-09
2019-04-10
2019-04-11

Please enter which of the above dates you would like to predict bike availability on: 2019-04-07

00:00:00
03:00:00
06:00:00
09:00:00
12:00:00
15:00:00
18:00:00
21:00:00

Please enter one of the above times to predict bike availability on the selected date for: 15:00:00
48.54034592764413
18.423190002522933
Available bikes prediction for station 2 is: 3


0.5123507976531982

In [40]:
function_with_generators(105)


2019-04-06
2019-04-07
2019-04-08
2019-04-09
2019-04-10
2019-04-11

Please enter which of the above dates you would like to predict bike availability on: 2019-04-10

00:00:00
03:00:00
06:00:00
09:00:00
12:00:00
15:00:00
18:00:00
21:00:00

Please enter one of the above times to predict bike availability on the selected date for: 03:00:00
2.5975309057212597
0.6493827264303149
Available bikes prediction for station 2 is: 4


0.48226332664489746

It seems that adding generators has actually made the function slightly slower, unfortunately. 

We will now test the our predictive function without generators but with the other optimisations included in function_with_generators (less assignments).

In [49]:
def optimised_function(station_no):
    data = requests.get("http://api.openweathermap.org/data/2.5/forecast?q=Dublin&appid=6fb76ecce41a85161d4c6ea5e2758f2b").json()

    mydb = mysql.connector.connect(
        host="newdublinbikesinstance.cevl8km57x9m.us-east-1.rds.amazonaws.com",
        user="root",
        passwd="secretpass"
    )

    cursor = mydb.cursor(buffered=True)

    forecasts = {}
    dates = []
    dates_and_times = {}

    for forecast in data['list']:
        dt = forecast['dt']
        dt = int(dt)
        dt = datetime.utcfromtimestamp(dt).strftime('%Y-%m-%d %H:%M:%S')
        date, time = dt.split(" ")
        forecasts[dt] = forecast
        if date not in dates:
            dates.append(date)
            dates_and_times[date] = []
        dates_and_times[date].append(time)

    print()

    for key in dates_and_times:
        print(key)

    print() 

    date = input("Please enter which of the above dates you would like to predict bike availability on: ")

    print() 

    for time in dates_and_times[date]:
        print(time)

    print() 

    time = input("Please enter one of the above times to predict bike availability on the selected date for: ")

    start = stopwatch.time() #starting the 'stopwatch' after last user input has been received
    
    forecast_key =  date + " " + time

    prediction_date = datetime.strptime(date, '%Y-%m-%d')

    prediction_day = prediction_date.weekday()

    dt_time = datetime.strptime(time, '%H:%M:%S')

    prediction_time = timedelta(hours=dt_time.hour, minutes=dt_time.minute, seconds=dt_time.second)

    prediction_temp = forecasts[forecast_key]['main']['temp']

    prediction_weather = forecasts[forecast_key]['weather'][0]['main']

    if (prediction_day < 5):
        cursor.execute("SELECT DISTINCT * FROM innodb.station_var JOIN innodb.weather on (station_var.last_update_date = weather.date AND minute(timediff(station_var.lat_update_time, weather.time)) < 11 AND hour(timediff(station_var.lat_update_time, weather.time)) = 0) WHERE station_var.station_no = %s AND station_var.status = 'OPEN' AND weather.description = '%s' AND weekday(weather.date) < 5" % (station_no, prediction_weather))
        rows = cursor.fetchall()
        if rows == []: #if a new weather is encountered, use records for all weather
            cursor.execute("SELECT DISTINCT * FROM innodb.station_var JOIN innodb.weather on (station_var.last_update_date = weather.date AND minute(timediff(station_var.lat_update_time, weather.time)) < 11 AND hour(timediff(station_var.lat_update_time, weather.time)) = 0) WHERE station_var.station_no = %s AND station_var.status = 'OPEN' weekday(weather.date) < 5" % station_no)
            rows = cursor.fetchall()
        weight_total = 0 
        weighted_predictors_total = 0  
        for row in rows:
            #row_temp = row[6] #skip the assignments for the sake of speed
            #row_bikes = row[2] 
            #row_time = row[8]
            temp_weight = 1/(math.sqrt((row[6] - prediction_temp)**2) + 0.5)
            time_weight = 1/(math.sqrt((round((row[8] - prediction_time).total_seconds()/60)**2)) + 0.5)
            weight = temp_weight + time_weight 
            weight_total += weight
            weighted_predictor = row[2] * weight
            weighted_predictors_total += weighted_predictor
       

    #if weekend, use only records from that day:
    else:
        cursor.execute("SELECT DISTINCT * FROM innodb.station_var JOIN innodb.weather on (station_var.last_update_date = weather.date AND minute(timediff(station_var.lat_update_time, weather.time)) < 11 AND hour(timediff(station_var.lat_update_time, weather.time)) = 0) WHERE station_var.station_no = %s AND station_var.status = 'OPEN' AND weather.description = '%s' AND weekday(weather.date) = %s" % (station_no, prediction_weather, prediction_day))
        rows = cursor.fetchall()
        if rows == []: #if a new weather is encountered, use records for all weather
            #print("New weather encountered:", prediction_weather)
            cursor.execute("SELECT DISTINCT * FROM innodb.station_var JOIN innodb.weather on (station_var.last_update_date = weather.date AND minute(timediff(station_var.lat_update_time, weather.time)) < 11 AND hour(timediff(station_var.lat_update_time, weather.time)) = 0) WHERE station_var.station_no = %s AND station_var.status = 'OPEN' AND weekday(weather.date) = %s" % (station_no, prediction_day))
            rows = cursor.fetchall()
        weight_total = 0 
        weighted_predictors_total = 0  
        for row in rows:
            #row_temp = row[6] #skip the assignments for the sake of speed
            #row_bikes = row[2] 
            #row_time = row[8]
            temp_weight = 1/(math.sqrt((row[6] - prediction_temp)**2) + 0.5)
            time_weight = 1/(math.sqrt((round((row[8] - prediction_time).total_seconds()/60)**2)) + 0.5)
            weight = temp_weight + time_weight 
            weight_total += weight
            weighted_predictor = row[2] * weight
            weighted_predictors_total += weighted_predictor
            
    prediction = round(weighted_predictors_total/weight_total)
    
    stop = stopwatch.time() 
    
    return stop - start #returning how long after the last user input the function took to run
        

In [50]:
optimised_function(2)


2019-04-06
2019-04-07
2019-04-08
2019-04-09
2019-04-10
2019-04-11

Please enter which of the above dates you would like to predict bike availability on: 2019-04-07

00:00:00
03:00:00
06:00:00
09:00:00
12:00:00
15:00:00
18:00:00
21:00:00

Please enter one of the above times to predict bike availability on the selected date for: 15:00:00


0.5102629661560059

In [59]:
optimised_function(105)


2019-04-06
2019-04-07
2019-04-08
2019-04-09
2019-04-10
2019-04-11

Please enter which of the above dates you would like to predict bike availability on: 2019-04-10

00:00:00
03:00:00
06:00:00
09:00:00
12:00:00
15:00:00
18:00:00
21:00:00

Please enter one of the above times to predict bike availability on the selected date for: 03:00:00


0.487457275390625

Reducing the number of assignments doesn't seem to have made much of a difference to the runtime, but we will use this for the final version as it should technically be slightly faster due to containing less operations. The function was sufficiently fast from the beginning anyway, so while further optimisation would have been ideal, it is not really needed. 