### Use case: Generate playlist

1. Use a base model trained on listening history and geolocation
2. Fine-tune the base model via transfer learning on recently played tracks and location tracking (to be confirmed)
3. Anything else? (open question)

#### Pre-Action in Android App:
- Android Controller will call Python API /predictTrackAttributes with userId, location and timeStamp 

#### In Python API:
- Get listening history by userId from Backend API
- Get geolocation history by userId from Backend API
- Filter for unique songs from listening history
- Get track attributes for unique songs from Spotify API
- Combine into single dataframe
- Perform cleaning and feature engineering
- Train model on user data
- Predict track attributes based on geolocation(X)
- Return list of track attributes to Android App

[Alternate flow: If user has no listening history]
- Use base model to predict track attributes
- Return list of track attributes to Android App
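
The main and alternate flows above can be sketched as a single function. This is a minimal sketch, not the project's implementation: `predict_track_attributes`, `train_user_model` and `base_model` are hypothetical names, and a model is represented here as a plain callable that maps a location to track attributes.

```python
from typing import Callable, Sequence


def predict_track_attributes(
    listening_history: Sequence[dict],
    location: Sequence[float],
    train_user_model: Callable[[Sequence[dict]], Callable],
    base_model: Callable,
):
    """Use a per-user model when history exists, else fall back to the base model."""
    if not listening_history:
        # Alternate flow: user has no listening history
        model = base_model
    else:
        # Main flow: train (or fine-tune) on the user's own data
        model = train_user_model(listening_history)
    return model(location)
```

Passing the training step and the base model in as arguments keeps the branching logic testable without a real model or API behind it.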

#### Post-Action in Android App:
- Get recommended songs from Spotify API based on track attributes
- Create a playlist object
- Store the playlist object into DB with userId and timestamp
- Show playlist to user

[Alternate flow: If list of track attributes is null or playlist creation throws error]
- Show error message


In [69]:
# import relevant libraries
import numpy as np
import pandas as pd
import tensorflow as tf
import plotly as plt
import seaborn as sns
import re
import json 
import csv
import os
import datetime 
import time

In [21]:
pd.options.display.max_columns = 500
pd.options.display.max_rows = 500

In [157]:
def process_location_history(file_path, data_writer):
    '''
    Convert each Google location history 'timestamp' to epoch seconds
    and write it with the location (latitudeE7, longitudeE7) to a csv file
    '''
    with open(file_path, encoding='utf-8-sig') as file:
        data = json.load(file)
    try:
        for obj in data['locations']:
            if 'timestamp' not in obj:
                continue
            timestamp = obj['timestamp']
            # Timestamps come with or without fractional seconds
            try:
                datetime_obj = datetime.datetime.strptime(timestamp, '%Y-%m-%dT%H:%M:%SZ')
            except ValueError:
                datetime_obj = datetime.datetime.strptime(timestamp, '%Y-%m-%dT%H:%M:%S.%fZ')
            # The trailing 'Z' means UTC; a naive datetime would be read as local time
            datetime_obj = datetime_obj.replace(tzinfo=datetime.timezone.utc)
            epoch_time = int(datetime_obj.timestamp())
            # Coordinates stay in E7 form (degrees * 10**7)
            lat = obj['latitudeE7']
            lon = obj['longitudeE7']
            data_writer.writerow([epoch_time, lat, lon])
    except KeyError as err:
        # Stop at the first record missing an expected key and report it
        print(f'Missing key in location history: {err}')

In [158]:
# Convert location history JSON to csv, then read the csv into a dataframe
JSON_data = os.path.join(os.getcwd(), "Training_Data", "Records.json")
CSV_file = os.path.join(os.getcwd(), "Training_Data", "ferozLocationHistory.csv")

with open(CSV_file, 'w', newline='') as outfile:
    data_writer = csv.writer(outfile)
    data_writer.writerow(["epoch_time", "lat", "lon"])
    process_location_history(JSON_data, data_writer)

location_history_df = pd.read_csv(CSV_file)
location_history_df.head()

   epoch_time       lat         lon
0  1430508952  13129029  1038495443
1  1430508961  13129002  1038495793
2  1430509027  13129346  1038496821
3  1430509088  13129156  1038495669
4  1430509149  13129102  1038495432


In [159]:
location_history_df.shape

(231930, 3)

In [160]:
location_history_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 231930 entries, 0 to 231929
Data columns (total 3 columns):
 #   Column      Non-Null Count   Dtype
---  ------      --------------   -----
 0   epoch_time  231930 non-null  int64
 1   lat         231930 non-null  int64
 2   lon         231930 non-null  int64
dtypes: int64(3)
memory usage: 5.3 MB
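
The `lat` and `lon` columns hold the `latitudeE7` and `longitudeE7` values, that is, degrees multiplied by 10^7 and stored as integers. A minimal sketch of converting them to decimal degrees is shown here; `e7_to_degrees` is a hypothetical helper, not part of the notebook, and the sample row reuses the first row of the output above.

```python
import pandas as pd

E7 = 10**7  # scale factor used by the latitudeE7 / longitudeE7 fields


def e7_to_degrees(df, columns=("lat", "lon")):
    """Return a copy of df with E7 integer coordinates converted to decimal degrees."""
    out = df.copy()
    for col in columns:
        out[col] = out[col] / E7
    return out


sample = pd.DataFrame(
    {"epoch_time": [1430508952], "lat": [13129029], "lon": [1038495443]}
)
degrees = e7_to_degrees(sample)
```

Working on a copy leaves the raw integer columns available in case the E7 values are needed later.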


In [161]:
# Read listening history JSON files into a dataframe
root_dir = os.path.join(os.getcwd(), "Training_Data", "MySpotifyExtendedData")

list_hist_files = []
for subdir, dirs, files in os.walk(root_dir):
    for file in files:
        if file.endswith(".json"):
            file_path = os.path.join(subdir, file)
            list_hist_files.append(file_path)


frames = [pd.read_json(f) for f in list_hist_files]
list_hist_df = pd.concat(frames, ignore_index=True, sort=True)
list_hist_df.head()

Unnamed: 0,conn_country,episode_name,episode_show_name,incognito_mode,ip_addr_decrypted,master_metadata_album_album_name,master_metadata_album_artist_name,master_metadata_track_name,ms_played,offline,offline_timestamp,platform,reason_end,reason_start,shuffle,skipped,spotify_episode_uri,spotify_track_uri,ts,user_agent_decrypted,username
0,SG,,,False,111.65.34.186,CLOUDS,NF,CLOUDS,900,False,1614831786681,"iOS 13.0 (iPhone11,2)",fwdbtn,fwdbtn,False,,,spotify:track:5UMMPHPp6vRP6ghPpSUOzp,2021-03-04T04:23:08Z,unknown,_feroz_
1,SG,,,False,111.65.34.186,Parachute,Petit Biscuit,Parachute - Big Gigantic Remix,1840,False,1614831787587,"iOS 13.0 (iPhone11,2)",fwdbtn,fwdbtn,False,,,spotify:track:1ZkK9XVWBM3OlbJelK0mK2,2021-03-04T04:23:10Z,unknown,_feroz_
2,SG,,,False,111.65.34.186,Vera Level Sago,A.R. Rahman,"Vera Level Sago (From ""Ayalaan"")",8040,False,1614831789444,"iOS 13.0 (iPhone11,2)",fwdbtn,fwdbtn,False,,,spotify:track:0xcBgFgDYiV0oqJk5Bl7nf,2021-03-04T04:23:18Z,unknown,_feroz_
3,SG,,,False,111.65.34.186,Neptune Interlude,Dennis Kuo,Neptune Interlude,920,False,1614831797489,"iOS 13.0 (iPhone11,2)",fwdbtn,fwdbtn,False,,,spotify:track:3tF0xLqHUZKZ9rDFBFHExH,2021-03-04T04:23:19Z,unknown,_feroz_
4,SG,,,False,111.65.34.186,Time is a River,Christoffer Franzen,Time Is A River,920,False,1614831798420,"iOS 13.0 (iPhone11,2)",fwdbtn,fwdbtn,False,,,spotify:track:7mcyhVQxDJmY6EPxsmA3pU,2021-03-04T04:23:20Z,unknown,_feroz_


In [162]:
list_hist_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 106622 entries, 0 to 106621
Data columns (total 21 columns):
 #   Column                             Non-Null Count   Dtype 
---  ------                             --------------   ----- 
 0   conn_country                       106622 non-null  object
 1   episode_name                       881 non-null     object
 2   episode_show_name                  881 non-null     object
 3   incognito_mode                     106622 non-null  bool  
 4   ip_addr_decrypted                  106622 non-null  object
 5   master_metadata_album_album_name   105467 non-null  object
 6   master_metadata_album_artist_name  105467 non-null  object
 7   master_metadata_track_name         105467 non-null  object
 8   ms_played                          106622 non-null  int64 
 9   offline                            106622 non-null  bool  
 10  offline_timestamp                  106622 non-null  int64 
 11  platform                           106622 non-null  

In [165]:
list_hist_df.isnull().sum()

conn_country                              0
episode_name                         105741
episode_show_name                    105741
incognito_mode                            0
ip_addr_decrypted                         0
master_metadata_album_album_name       1155
master_metadata_album_artist_name      1155
master_metadata_track_name             1155
ms_played                                 0
offline                                   0
offline_timestamp                         0
platform                                  0
reason_end                                0
reason_start                              0
shuffle                                   0
skipped                               90068
spotify_episode_uri                  105741
spotify_track_uri                      1155
ts                                        0
user_agent_decrypted                     16
username                                  0
dtype: int64

In [186]:
# Remove columns that won't be obtainable through the 'Get recently played tracks' API call
listening_history_df = list_hist_df.copy(deep=True)
column_to_remove = ['username', 'platform', 'ip_addr_decrypted', 'user_agent_decrypted', 'master_metadata_track_name',
                     'master_metadata_album_artist_name','master_metadata_album_album_name', 'episode_name', 
                     'episode_show_name', 'spotify_episode_uri', 'shuffle',  'offline', 'offline_timestamp',
                     'incognito_mode', 'skipped', 'conn_country', 'reason_start', 'reason_end']
listening_history_df.drop(labels=column_to_remove, axis=1, inplace=True)

# Convert timestamp to datetime obj and create columns matching location_history_df
listening_history_df['ts'] = pd.to_datetime(listening_history_df['ts'], format= '%Y-%m-%dT%H:%M:%SZ') 
listening_history_df['epoch_time'] = listening_history_df['ts'].astype('int64') // 10**9

# Drop null rows
listening_history_df = listening_history_df.dropna(axis=0)

# Drop timestamp
listening_history_df.drop(labels=['ts'], axis=1, inplace=True)

listening_history_df.head()

   ms_played                     spotify_track_uri  epoch_time
0        900  spotify:track:5UMMPHPp6vRP6ghPpSUOzp  1614831788
1       1840  spotify:track:1ZkK9XVWBM3OlbJelK0mK2  1614831790
2       8040  spotify:track:0xcBgFgDYiV0oqJk5Bl7nf  1614831798
3        920  spotify:track:3tF0xLqHUZKZ9rDFBFHExH  1614831799
4        920  spotify:track:7mcyhVQxDJmY6EPxsmA3pU  1614831800
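
The planned steps "Filter for unique songs" and "Get track attributes for unique songs from Spotify API" need bare track IDs rather than full URIs. The sketch below is one way to prepare them, under stated assumptions: `unique_track_ids` and `batched` are hypothetical helpers, and the batch size of 100 is an assumption to check against the Spotify Web API documentation before use.

```python
def unique_track_ids(uris):
    """Return track IDs from 'spotify:track:<id>' URIs, de-duplicated in first-seen order."""
    seen = {}
    for uri in uris:
        # Skip nulls (NaN/None) and non-track URIs such as podcast episodes
        if isinstance(uri, str) and uri.startswith("spotify:track:"):
            seen.setdefault(uri.split(":")[2], None)
    return list(seen)


def batched(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Example with URIs taken from the rows above
example_uris = [
    "spotify:track:5UMMPHPp6vRP6ghPpSUOzp",
    "spotify:track:1ZkK9XVWBM3OlbJelK0mK2",
    "spotify:track:5UMMPHPp6vRP6ghPpSUOzp",
]
id_batches = batched(unique_track_ids(example_uris), 100)  # batch size is an assumption
```

With the notebook's dataframe, the input would be `listening_history_df['spotify_track_uri']`.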


In [187]:
# convert epoch timestamp to datetime 
listening_history_df['epoch_time'] = pd.to_datetime(listening_history_df['epoch_time'], unit='s')
location_history_df['epoch_time'] = pd.to_datetime(location_history_df['epoch_time'], unit='s')

In [188]:
listening_history_df.head()

   ms_played                     spotify_track_uri           epoch_time
0        900  spotify:track:5UMMPHPp6vRP6ghPpSUOzp  2021-03-04 04:23:08
1       1840  spotify:track:1ZkK9XVWBM3OlbJelK0mK2  2021-03-04 04:23:10
2       8040  spotify:track:0xcBgFgDYiV0oqJk5Bl7nf  2021-03-04 04:23:18
3        920  spotify:track:3tF0xLqHUZKZ9rDFBFHExH  2021-03-04 04:23:19
4        920  spotify:track:7mcyhVQxDJmY6EPxsmA3pU  2021-03-04 04:23:20


In [189]:
location_history_df.head()

            epoch_time       lat         lon
0  2015-05-01 19:35:52  13129029  1038495443
1  2015-05-01 19:36:01  13129002  1038495793
2  2015-05-01 19:37:07  13129346  1038496821
3  2015-05-01 19:38:08  13129156  1038495669
4  2015-05-01 19:39:09  13129102  1038495432


In [190]:
def convertEpochToSplitDateTime(df):
    # Split the 'epoch_time' datetime column into calendar and clock features
    timestamps = df["epoch_time"].dt
    df['year'] = timestamps.year
    df['month'] = timestamps.month
    df['day'] = timestamps.day
    df['weekday'] = timestamps.weekday
    df['time'] = timestamps.time
    df['hours'] = timestamps.hour
    df['day-name'] = timestamps.day_name()
    return df

In [191]:
listening_history_df = convertEpochToSplitDateTime(listening_history_df)

KeyError: 'epoch_time'

In [184]:
listening_history_df.tail()

        ms_played                     spotify_track_uri           epoch_time  year  month  day  weekday      time  hours  day-name
106617        920  spotify:track:1SFA5zEVOsLhEg7ynbvQFT  2023-03-28 09:12:06  2023      3   28        1  09:12:06      9   Tuesday
106618       4100  spotify:track:3g0mEQx3NTanacLseoP0Gw  2023-03-28 09:12:10  2023      3   28        1  09:12:10      9   Tuesday
106619     224615  spotify:track:1SFA5zEVOsLhEg7ynbvQFT  2023-03-28 09:15:53  2023      3   28        1  09:15:53      9   Tuesday
106620     216668  spotify:track:3LxG9HkMMFP0MZuiw3O2rF  2023-03-28 09:19:31  2023      3   28        1  09:19:31      9   Tuesday
106621      90496  spotify:track:01Cbf3nijsIYjKcqdDGvqa  2023-03-28 09:21:03  2023      3   28        1  09:21:03      9   Tuesday


In [None]:
train_df = pd.merge(left=location_history_df, 
                    right=listening_history_df, 
                    on=['epoch_time'],
                    suffixes=("_x", "_y"),
                    sort=True,
                    how='inner')
train_df.head(10)
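
An inner join on `epoch_time` keeps a listen only when a location sample exists for exactly the same second. Because the location samples shown earlier are roughly a minute apart, a nearest-timestamp join may be closer to what is wanted. The sketch below uses `pd.merge_asof` on made-up sample rows; the 5-minute tolerance and the placeholder URIs are assumptions for illustration, not values from the notebook.

```python
import pandas as pd

# Made-up sample rows shaped like location_history_df and listening_history_df
locations = pd.DataFrame({
    "epoch_time": pd.to_datetime(
        ["2021-03-04 04:22:30", "2021-03-04 04:23:15", "2021-03-04 05:30:00"]
    ),
    "lat": [13129029, 13129002, 13129346],
    "lon": [1038495443, 1038495793, 1038496821],
})
listens = pd.DataFrame({
    "epoch_time": pd.to_datetime(
        ["2021-03-04 04:23:08", "2021-03-04 04:23:18", "2021-03-04 04:50:00"]
    ),
    "spotify_track_uri": ["spotify:track:A", "spotify:track:B", "spotify:track:C"],
})

# Attach the nearest location sample to each listen; both frames must be sorted by the key
nearest = pd.merge_asof(
    listens.sort_values("epoch_time"),
    locations.sort_values("epoch_time"),
    on="epoch_time",
    direction="nearest",
    tolerance=pd.Timedelta("5min"),  # assumed cutoff; listens with no nearby sample get NaN
)

# Keep only listens that found a location within the tolerance
matched = nearest.dropna(subset=["lat", "lon"])
```

The tolerance trades coverage for accuracy: a wider window matches more listens but may attach a location the user had already left.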