### Use case: Generate playlist

1. Use a base model trained with listening history and geolocation
2. Use a transfer learning model to fine-tune on recently played tracks and location tracking?
3. Anything else?

#### Pre-Action in Android App:
- The Android Controller calls the Python API `/predictTrackAttributes` with userId, location and timeStamp
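
A minimal sketch of how the Python API might validate the body of this request. The JSON shape (a `location` object with `lat` and `lon` in degrees, and `timeStamp` as epoch seconds) and the helper name `parse_predict_request` are assumptions for illustration; only the field names userId, location and timeStamp come from the note above.

```python
from typing import Any, Dict, Tuple


def parse_predict_request(payload: Dict[str, Any]) -> Tuple[str, float, float, int]:
    """Validate a hypothetical /predictTrackAttributes request body.

    Assumed shape:
    {"userId": str, "location": {"lat": float, "lon": float}, "timeStamp": int}
    """
    required = ("userId", "location", "timeStamp")
    missing = [key for key in required if key not in payload]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")

    location = payload["location"]
    lat, lon = float(location["lat"]), float(location["lon"])
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("location out of range")

    return str(payload["userId"]), lat, lon, int(payload["timeStamp"])


# Example body (values are illustrative)
example = {
    "userId": "user-1",
    "location": {"lat": 1.3129, "lon": 103.8495},
    "timeStamp": 1614831788,
}
print(parse_predict_request(example))
```

Rejecting a malformed body early keeps the later steps (history lookup, training, prediction) from running on incomplete input.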

#### In Python API:
- Get listening history by userId from Backend API
- Get geolocation history by userId from Backend API
- Filter for unique songs from listening history
- Get track attributes for unique songs from Spotify API
- Combine into single dataframe
- Perform cleaning and feature engineering
- Train model on user data
- Predict track attributes based on geolocation (X)
- Return list of track attributes to AndroidApp
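
The "filter for unique songs" and "combine into single dataframe" steps could look roughly like this in pandas. This is a sketch on toy data: the track URIs, the attribute values and the choice of `energy` and `valence` as attribute columns are all made up for illustration, and the `track_attributes` frame stands in for whatever the Spotify API call returns.

```python
import pandas as pd

# Toy listening history; the same track is played twice
listening = pd.DataFrame({
    "spotify_track_uri": ["spotify:track:A", "spotify:track:B", "spotify:track:A"],
    "epoch_time": [100, 200, 300],
})

# Filter for unique songs so each track is looked up only once
unique_uris = listening["spotify_track_uri"].drop_duplicates().tolist()

# Stand-in for the Spotify API response (values are made up)
track_attributes = pd.DataFrame({
    "spotify_track_uri": unique_uris,
    "energy": [0.8, 0.3],
    "valence": [0.6, 0.2],
})

# Combine into a single dataframe: every play gets its track's attributes
combined = listening.merge(track_attributes, on="spotify_track_uri", how="left")
print(combined)
```

A left join keeps every play in the history even if a track has no attributes, so missing lookups show up as NaN and can be handled in the cleaning step.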

[Alternate flow: If user has no listening history]
- Use base model to predict track attributes
- Return list of track attributes to AndroidApp
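
One way to express this fallback, as a sketch only. `select_model` and its parameters are hypothetical names; the models here are plain placeholder objects, not real TensorFlow models.

```python
from typing import Callable, Optional, Sequence


def select_model(listening_history: Optional[Sequence[dict]],
                 train_user_model: Callable[[Sequence[dict]], object],
                 base_model: object) -> object:
    """Return the base model when there is no history, else a user-specific model."""
    if not listening_history:  # covers both None and an empty history
        return base_model
    return train_user_model(listening_history)


# Usage with stand-in objects
base = "base-model"
print(select_model([], lambda history: "user-model", base))               # base-model
print(select_model([{"uri": "x"}], lambda history: "user-model", base))   # user-model
```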

#### Post-Action in Android App:
- Get recommended songs from Spotify API based on track attributes
- Create a playlist object
- Store the playlist object into DB with userId and timestamp
- Show playlist to user

[Alternate flow: If list of track attributes is null or playlist creation throws error]
- Show error message
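
The post-action logic, including the error path, might be sketched as follows. It is written in Python to match the rest of these notes even though it would live in the Android app. `Playlist`, `build_playlist` and the `recommend` callable are hypothetical; `recommend` stands in for the Spotify recommendations call, and storing to the DB is left out.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Playlist:
    user_id: str
    track_uris: List[str]
    # Creation time in epoch seconds, stored alongside the userId
    created_at: int = field(default_factory=lambda: int(time.time()))


def build_playlist(user_id: str,
                   track_attributes: Optional[dict],
                   recommend: Callable[[dict], List[str]]) -> Playlist:
    """Create a playlist object, or raise if no attributes were returned."""
    if not track_attributes:
        raise ValueError("no track attributes returned")
    return Playlist(user_id=user_id, track_uris=recommend(track_attributes))


# Usage: any failure ends in an error message instead of a playlist
try:
    playlist = build_playlist("user-1", None, lambda attrs: [])
except ValueError as err:
    print(f"Could not create playlist: {err}")
```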


In [204]:
# import relevant libraries
import numpy as np
import pandas as pd
import tensorflow as tf
import plotly as plt
import seaborn as sns
import re
import json 
import csv
import os
import datetime 
import time

In [205]:
pd.options.display.max_columns = 500
pd.options.display.max_rows = 500

In [206]:
def process_location_history(file_path, data_writer):
    '''
    Convert each Google location history record's 'timestamp'
    to epoch seconds and write it with the location (lat, lon)
    as a row through data_writer (a csv.writer)
    '''
    with open(file_path, encoding='utf-8-sig') as file:
        data = json.load(file)
    for obj in data.get('locations', []):
        if 'timestamp' not in obj:
            continue
        timestamp = obj['timestamp']
        # Timestamps appear with or without fractional seconds
        try:
            datetime_obj = datetime.datetime.strptime(timestamp, '%Y-%m-%dT%H:%M:%SZ')
        except ValueError:
            datetime_obj = datetime.datetime.strptime(timestamp, '%Y-%m-%dT%H:%M:%S.%fZ')
        # The trailing 'Z' means UTC; mark the datetime as UTC so the epoch
        # value does not depend on the local timezone of the machine
        datetime_obj = datetime_obj.replace(tzinfo=datetime.timezone.utc)
        epoch_time = int(datetime_obj.timestamp())
        try:
            lat = obj['latitudeE7']
            lon = obj['longitudeE7']
        except KeyError:
            # Skip records without coordinates instead of aborting the whole file
            continue
        data_writer.writerow([epoch_time, lat, lon])

In [208]:
# Read location history into csv and dataframe
JSON_data = os.path.join(os.getcwd(), "Training_Data", "Records.json")
CSV_file = os.path.join(os.getcwd(), "Training_Data", "ferozLocationHistory.csv")

with open(CSV_file, 'w', newline='') as outfile:
    data_writer = csv.writer(outfile)
    data_writer.writerow(["epoch_time", "lat", "lon"])
    process_location_history(JSON_data, data_writer)

loc_history_df = pd.read_csv(CSV_file)
loc_history_df.head()

| | epoch_time | lat | lon |
|---|---|---|---|
| 0 | 1430508952 | 13129029 | 1038495443 |
| 1 | 1430508961 | 13129002 | 1038495793 |
| 2 | 1430509027 | 13129346 | 1038496821 |
| 3 | 1430509088 | 13129156 | 1038495669 |
| 4 | 1430509149 | 13129102 | 1038495432 |
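
The `lat` and `lon` values come from the `latitudeE7` and `longitudeE7` fields, which store degrees multiplied by 10^7. If plain degrees are needed later (for example as model input), a conversion could look like this sketch; the column names `lat_deg` and `lon_deg` are assumptions.

```python
import pandas as pd

# First two rows of the location history output
loc = pd.DataFrame({
    "lat": [13129029, 13129002],
    "lon": [1038495443, 1038495793],
})

# E7 integers are degrees multiplied by 10**7
loc["lat_deg"] = loc["lat"] / 1e7
loc["lon_deg"] = loc["lon"] / 1e7
print(loc[["lat_deg", "lon_deg"]])
```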


In [211]:
# Read listening history into csv and dataframe
root_dir = os.path.join(os.getcwd(), "Training_Data", "MySpotifyExtendedData")

list_hist_files = []
for subdir, dirs, files in os.walk(root_dir):
    for file in files:
        if file.endswith(".json"):
            file_path = os.path.join(subdir, file)
            list_hist_files.append(file_path)


frames = [pd.read_json(f) for f in list_hist_files]
list_hist_df = pd.concat(frames, ignore_index=True, sort=True)
list_hist_df.head()

| | conn_country | episode_name | episode_show_name | incognito_mode | ip_addr_decrypted | master_metadata_album_album_name | master_metadata_album_artist_name | master_metadata_track_name | ms_played | offline | offline_timestamp | platform | reason_end | reason_start | shuffle | skipped | spotify_episode_uri | spotify_track_uri | ts | user_agent_decrypted | username |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SG | | | False | 111.65.34.186 | CLOUDS | NF | CLOUDS | 900 | False | 1614831786681 | iOS 13.0 (iPhone11,2) | fwdbtn | fwdbtn | False | | | spotify:track:5UMMPHPp6vRP6ghPpSUOzp | 2021-03-04T04:23:08Z | unknown | \_feroz\_ |
| 1 | SG | | | False | 111.65.34.186 | Parachute | Petit Biscuit | Parachute - Big Gigantic Remix | 1840 | False | 1614831787587 | iOS 13.0 (iPhone11,2) | fwdbtn | fwdbtn | False | | | spotify:track:1ZkK9XVWBM3OlbJelK0mK2 | 2021-03-04T04:23:10Z | unknown | \_feroz\_ |
| 2 | SG | | | False | 111.65.34.186 | Vera Level Sago | A.R. Rahman | Vera Level Sago (From "Ayalaan") | 8040 | False | 1614831789444 | iOS 13.0 (iPhone11,2) | fwdbtn | fwdbtn | False | | | spotify:track:0xcBgFgDYiV0oqJk5Bl7nf | 2021-03-04T04:23:18Z | unknown | \_feroz\_ |
| 3 | SG | | | False | 111.65.34.186 | Neptune Interlude | Dennis Kuo | Neptune Interlude | 920 | False | 1614831797489 | iOS 13.0 (iPhone11,2) | fwdbtn | fwdbtn | False | | | spotify:track:3tF0xLqHUZKZ9rDFBFHExH | 2021-03-04T04:23:19Z | unknown | \_feroz\_ |
| 4 | SG | | | False | 111.65.34.186 | Time is a River | Christoffer Franzen | Time Is A River | 920 | False | 1614831798420 | iOS 13.0 (iPhone11,2) | fwdbtn | fwdbtn | False | | | spotify:track:7mcyhVQxDJmY6EPxsmA3pU | 2021-03-04T04:23:20Z | unknown | \_feroz\_ |


In [252]:
# Make a copy for manipulation
location_history_df = loc_history_df.copy(deep=True)

In [253]:
# Remove columns that won't be obtainable through the 'Get recently played tracks' API call
listening_history_df = list_hist_df.copy(deep=True)
column_to_remove = ['username', 'platform', 'ip_addr_decrypted', 'user_agent_decrypted', 'master_metadata_track_name',
                     'master_metadata_album_artist_name','master_metadata_album_album_name', 'episode_name', 
                     'episode_show_name', 'spotify_episode_uri', 'shuffle',  'offline', 'offline_timestamp',
                     'incognito_mode', 'skipped', 'conn_country', 'reason_start', 'reason_end']
listening_history_df.drop(labels=column_to_remove, axis=1, inplace=True)

# Convert timestamp to datetime obj and create columns matching location_history_df
listening_history_df['ts'] = pd.to_datetime(listening_history_df['ts'], format='%Y-%m-%dT%H:%M:%SZ')
listening_history_df['epoch_time'] = listening_history_df['ts'].astype('int64') // 10**9

# Drop null rows
listening_history_df = listening_history_df.dropna(axis=0)

# Drop timestamp
listening_history_df.drop(labels=['ts'], axis=1, inplace=True)

listening_history_df.head()

| | ms_played | spotify_track_uri | epoch_time |
|---|---|---|---|
| 0 | 900 | spotify:track:5UMMPHPp6vRP6ghPpSUOzp | 1614831788 |
| 1 | 1840 | spotify:track:1ZkK9XVWBM3OlbJelK0mK2 | 1614831790 |
| 2 | 8040 | spotify:track:0xcBgFgDYiV0oqJk5Bl7nf | 1614831798 |
| 3 | 920 | spotify:track:3tF0xLqHUZKZ9rDFBFHExH | 1614831799 |
| 4 | 920 | spotify:track:7mcyhVQxDJmY6EPxsmA3pU | 1614831800 |


In [254]:
# convert epoch timestamp to datetime 
listening_history_df['ts'] = pd.to_datetime(listening_history_df['epoch_time'], unit='s')
location_history_df['ts'] = pd.to_datetime(location_history_df['epoch_time'], unit='s')

In [255]:
def convertEpochToSplitDateTime(df):
    '''
    Split the 'ts' datetime column into year, month, day, weekday,
    24hr_time (HHMM as an integer), hour and min columns
    '''
    ts = pd.DatetimeIndex(df["ts"])
    df['year'] = ts.year
    df['month'] = ts.month
    df['day'] = ts.day
    df['weekday'] = ts.weekday
    df['24hr_time'] = ts.strftime('%H%M').astype(int)
    df['hour'] = ts.hour
    df['min'] = ts.minute
    return df

In [256]:
listening_history_df = convertEpochToSplitDateTime(listening_history_df)
location_history_df = convertEpochToSplitDateTime(location_history_df)

In [257]:
listening_history_df = listening_history_df.sort_values(by='ts')

In [258]:
print(listening_history_df.shape)
print(location_history_df.shape)

(105467, 11)
(231930, 11)


In [259]:
listening_history_df.head()

| | ms_played | spotify_track_uri | epoch_time | ts | year | month | day | weekday | 24hr_time | hour | min |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 33191 | 181880 | spotify:track:2zDt2TfQbxiSPjTVJTgbwz | 1485061477 | 2017-01-22 05:04:37 | 2017 | 1 | 22 | 6 | 504 | 5 | 4 |
| 33192 | 261153 | spotify:track:66qlqxhEMpSHOzjRK4il0b | 1485061739 | 2017-01-22 05:08:59 | 2017 | 1 | 22 | 6 | 508 | 5 | 8 |
| 33193 | 7687 | spotify:track:1CnPYaKxTVb4LWOtiGOm0m | 1485061747 | 2017-01-22 05:09:07 | 2017 | 1 | 22 | 6 | 509 | 5 | 9 |
| 33194 | 1625 | spotify:track:7BKLCZ1jbUBVqRi2FVlTVw | 1485061749 | 2017-01-22 05:09:09 | 2017 | 1 | 22 | 6 | 509 | 5 | 9 |
| 33195 | 15836 | spotify:track:0FE9t6xYkqWXU2ahLh6D8X | 1485061765 | 2017-01-22 05:09:25 | 2017 | 1 | 22 | 6 | 509 | 5 | 9 |


In [260]:
location_history_df.head()

| | epoch_time | lat | lon | ts | year | month | day | weekday | 24hr_time | hour | min |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1430508952 | 13129029 | 1038495443 | 2015-05-01 19:35:52 | 2015 | 5 | 1 | 4 | 1935 | 19 | 35 |
| 1 | 1430508961 | 13129002 | 1038495793 | 2015-05-01 19:36:01 | 2015 | 5 | 1 | 4 | 1936 | 19 | 36 |
| 2 | 1430509027 | 13129346 | 1038496821 | 2015-05-01 19:37:07 | 2015 | 5 | 1 | 4 | 1937 | 19 | 37 |
| 3 | 1430509088 | 13129156 | 1038495669 | 2015-05-01 19:38:08 | 2015 | 5 | 1 | 4 | 1938 | 19 | 38 |
| 4 | 1430509149 | 13129102 | 1038495432 | 2015-05-01 19:39:09 | 2015 | 5 | 1 | 4 | 1939 | 19 | 39 |


In [261]:
train_df = pd.merge(left=location_history_df, 
                    right=listening_history_df, 
                    on=['year','month','day','weekday','24hr_time','hour','min'],
                    suffixes=("_x", "_y"))

In [267]:
drop_columns = ['epoch_time_x', 'ts_x', 'epoch_time_y', 'ts_y']
train_df.drop(labels=drop_columns, axis=1, inplace=True)

In [269]:
train_df.shape

(2966, 11)
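
The merge is an inner join on the date and minute-of-day columns, so only rows where a location fix and a played track fall in the same minute survive. That is why `train_df` has 2,966 rows while the inputs have 231,930 and 105,467. A toy illustration of the same join, with made-up track URIs and a reduced key list:

```python
import pandas as pd

keys = ["year", "month", "day", "hour", "min"]

location = pd.DataFrame({
    "year": [2017, 2017], "month": [11, 11], "day": [7, 7],
    "hour": [5, 23], "min": [32, 44],
    "lat": [13131749, 13020899],
})
listening = pd.DataFrame({
    "year": [2017, 2017, 2017], "month": [11, 11, 11], "day": [7, 7, 7],
    "hour": [23, 23, 12], "min": [44, 44, 0],
    "spotify_track_uri": ["spotify:track:A", "spotify:track:B", "spotify:track:C"],
})

# pd.merge defaults to how="inner": unmatched rows on either side are dropped,
# and one location row can pair with several tracks played in that minute
merged = pd.merge(left=location, right=listening, on=keys)
print(merged)
```

Here the 05:32 location fix and the 12:00 track have no partner and disappear, while the 23:44 fix is duplicated once per track played in that minute.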

In [268]:
train_df.head()

| | lat | lon | year | month | day | weekday | 24hr_time | hour | min | ms_played | spotify_track_uri |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13131749 | 1038494532 | 2017 | 11 | 7 | 1 | 532 | 5 | 32 | 71970 | spotify:track:0rpzxOXN5bmiEZqFcENNsj |
| 1 | 13020899 | 1038750091 | 2017 | 11 | 7 | 1 | 2344 | 23 | 44 | 1045 | spotify:track:5sNESr6pQfIhL3krM8CtZn |
| 2 | 13020899 | 1038750091 | 2017 | 11 | 7 | 1 | 2344 | 23 | 44 | 597 | spotify:track:58HpsDKeYoLtNhXFQyQmz5 |
| 3 | 12992327 | 1038454262 | 2017 | 11 | 7 | 1 | 2357 | 23 | 57 | 287049 | spotify:track:1f3yAtsJtY87CTmM8RLnxf |
| 4 | 13129424 | 1038496407 | 2017 | 11 | 8 | 2 | 757 | 7 | 57 | 103360 | spotify:track:1m69ELEgE6k5ZWsap40ozt |
