## Music mapping flow 1
1. Get the user's Spotify ID
    - with the Spotify ID, retrieve the user's Spotify content:
        - tracklist
        - audio features for each song
        - lyrics for each song
    - Backend done. Frontend needed for external use.
    - [Next feature](https://towardsdatascience.com/get-your-spotify-streaming-history-with-python-d5a208bbcbd3): analyze the user's streaming history and compare it with playlist patterns
2. Run the existing Big5 text analysis to prepare the collected data for review:
    - genres over time
    - Myers-Briggs v1 ft. Tufts buckets
3. Return two interactive visualizations:
    - user music tastes over time
    - user music moods over time

### Step 1: Get user info

In [8]:
# testing on another user...

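# variable setup
# explicit imports for names used below (the star imports may already provide them)
import pandas as pd
from tqdm import tqdm

from spotify import *
from text_miner import *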
songs = []
username = "12125327174"  # --> JowMao (Spotify user IDs are strings)
helper = spotifyApi()
txtMiner = textMiner()
sp = helper.sp
audio_df = pd.DataFrame()

# retrieve user playlists
playlists = helper.get_user_playlists(username=username, sp=sp)
for playlist in tqdm(playlists):
    
    # retrieve playlist songs
    tracklist = helper.get_playlist_content(username=username, playlist=playlist, sp=sp, csv=False)
    
    # retrieve playlist audio features
    tmp_df = helper.get_playlist_audio_features(username=username, playlist=playlist, sp=sp, csv=False)
    audio_df = pd.concat([audio_df, tmp_df], axis=0)
    
    # retrieve song lyrics (searching genius api)
    for track in tracklist:
        track['genius_url'] = None
        track_name = track["track"]["name"]
        track_artist = track["track"]["artists"][0]["name"]
        genius_song_info = txtMiner.search_genius_song(track_name=track_name, track_artist=track_artist, debug=False)
        if genius_song_info:
            track["track"]["genius_info"] = genius_song_info
        else:
            # no search hit: guess the page URL (Genius URLs put the artist before the song title)
            track["track"]["genius_info"] = {
                "result": {
                    "url": "https://genius.com/" + track_artist.replace(" ", "-") + "-" + track_name.replace(" ", "-") + "-lyrics"
                }
            }
        track["track"]["lyrics"] = txtMiner.scrape_genius_lyrics(artistname=track_artist, songname=track_name, url=track["track"]["genius_info"]["result"]["url"])
    
    # save collected data for tracklist
    songs.extend(tracklist)
    
print(
    f"\nExtracted {len(songs)} songs from {len(playlists)} playlists from your Spotify Account.\nScraped lyrics for {len([s for s in songs if s['track']['lyrics'] != ''])} of the {len(songs)} songs from genius.com."
)

# saving objects here just in case...
txtMiner.save_df(filename=f"User_{username}_audioFeatures_", df=audio_df)
txtMiner.save_df(filename=f"User_{username}_songsNlyrics_", df=pd.DataFrame([songs]))

# success! 
# result of running this cell: 
# { 
#  'songs': list of the songs in your playlists, with lyrics from genius.com where available, 
#  'audio_df': pandas dataframe containing the Spotify-provided audio features for each song in songs
# }

# NOTE: Connecting this to a user interface will take a little tuning. It could be hosted on AWS or on a local server.

Name: b'new muscles unlocked ', Number of songs: 5, Playlist ID: 2OuWrX2IsFgZnLw0XEzPeF 
Name: b'la di da - acoustic', Number of songs: 13, Playlist ID: 64sJirLaDYFs8bE9IM8nIy 
Name: b'mad shower ', Number of songs: 26, Playlist ID: 6uImMNgmnO5Ls6wRZlrPt7 
Name: b'late shower ', Number of songs: 13, Playlist ID: 1mZNXPGJHFZujIMt6HSb8g 
Name: b'stoner works out ', Number of songs: 11, Playlist ID: 0v69rj4zRWesNXlxwBxiGQ 
Name: b'Your Heart', Number of songs: 27, Playlist ID: 5Gz7Kizr995O9XX4QLEEu8 
Name: b'god help us all', Number of songs: 32, Playlist ID: 4ioyY08fTYEbyCM21677Fl 
Name: b'Miss You', Number of songs: 1, Playlist ID: 3WVyQI7fXOr39PGn6WVGks 
Name: b"I Wasn't Made to Fall in Love", Number of songs: 30, Playlist ID: 4aHtqEEuoaEMz6q087BrD1 
Name: b'HIMBO', Number of songs: 24, Playlist ID: 4UO54EWjw4MPFmK6EfzjkI 
Name: b'Long Day', Number of songs: 9, Playlist ID: 0UXfA16WQYqKykudY7xc5S 
Name: b'adolescence 2', Number of songs: 6, Playlist ID: 4v0Zs01XXPkjisIvDNcmRC 
Name: b'

100%|█████████████████████████████████████████████████████████████████████████| 50/50 [22:17<00:00, 26.75s/it]



Extracted 985 songs from 50 playlists from your Spotify Account.
Scraped lyrics for 848 of the 985 songs from genius.com.
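
The fallback URL in the cell above only swaps spaces for hyphens, so titles containing punctuation (apostrophes, parentheses, periods) likely produce URLs that do not resolve, which may account for some of the songs without lyrics. Below is a minimal sketch of a stricter slug builder. It assumes Genius lyric pages follow the pattern `https://genius.com/<Artist>-<song>-lyrics` with punctuation dropped and only the first letter capitalized; `genius_fallback_url` is a hypothetical helper, not part of `text_miner`.

```python
import re


def genius_fallback_url(track_artist: str, track_name: str) -> str:
    """Best-guess Genius URL (hypothetical helper, not part of text_miner)."""
    raw = f"{track_artist} {track_name}"
    # drop straight and curly apostrophes so "Wasn't" becomes "Wasnt"
    raw = raw.replace("'", "").replace("\u2019", "")
    # collapse every other run of non-alphanumeric characters into one hyphen
    slug = re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-")
    # assumption: Genius capitalizes only the first letter of the slug
    return f"https://genius.com/{slug.capitalize()}-lyrics"


example_url = genius_fallback_url("Kendrick Lamar", "HUMBLE.")
```

Accented or non-Latin characters are simply replaced by hyphens here, so this remains a guess; the Genius search result should still be preferred whenever it exists.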


### Step 2: Data preparation

##### Step 2a: Audio preparation

In [6]:
# get latest set of audioFeatures (moods over time)
# data_path=txtMiner.data_path
# allfiles = [f for f in listdir(data_path) if isfile(join(data_path, f))]
# audio_dfs = [filename for filename in allfiles if "audioFeatures" in filename]
# audio_dfs.sort()
# audio_df = pd.read_csv(data_path + audio_dfs[-1])
audio_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1103 entries, 0 to 56
Data columns (total 16 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   energy            1103 non-null   float64
 1   liveness          1103 non-null   float64
 2   tempo             1103 non-null   float64
 3   speechiness       1103 non-null   float64
 4   acousticness      1103 non-null   float64
 5   instrumentalness  1103 non-null   float64
 6   time_signature    1103 non-null   object 
 7   danceability      1103 non-null   float64
 8   key               1103 non-null   object 
 9   duration_ms       1103 non-null   object 
 10  loudness          1103 non-null   float64
 11  valence           1103 non-null   float64
 12  mode              1103 non-null   object 
 13  type              1103 non-null   object 
 14  uri               1103 non-null   object 
 15  playlist_id       1103 non-null   object 
dtypes: float64(9), object(7)
memory usage: 146.5

In [7]:
# TODO: Is cleaning necessary for these new datasets?
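
Judging by the `info()` output above, some cleaning is probably worthwhile: `time_signature`, `key`, `duration_ms` and `mode` are stored as `object`, and the index runs from 0 to 56 across 1103 rows, so index labels repeat after the `pd.concat` calls. The sketch below is one possible cleaning pass, not the project's actual code; `clean_audio_features` is a hypothetical name, and dropping duplicates on `(uri, playlist_id)` is an assumption about what counts as a repeated row.

```python
import pandas as pd

NUMERIC_OBJECT_COLS = ["time_signature", "key", "duration_ms", "mode"]


def clean_audio_features(audio_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning pass for the concatenated audio features."""
    # concat kept each playlist's own 0..n index, so rebuild a unique one
    cleaned = audio_df.reset_index(drop=True)
    # cast the numeric-looking object columns; unparseable values become NaN
    for col in NUMERIC_OBJECT_COLS:
        cleaned[col] = pd.to_numeric(cleaned[col], errors="coerce")
    # assumption: the same track listed twice in one playlist is a duplicate
    cleaned = cleaned.drop_duplicates(subset=["uri", "playlist_id"])
    return cleaned.reset_index(drop=True)


# toy frame mimicking the shape of audio_df (values are made up)
demo = pd.DataFrame(
    {
        "energy": [0.8, 0.4, 0.8],
        "time_signature": ["4", "3", "4"],
        "key": ["5", "0", "5"],
        "duration_ms": ["210000", "185000", "210000"],
        "mode": ["1", "0", "1"],
        "uri": ["spotify:track:a", "spotify:track:b", "spotify:track:a"],
        "playlist_id": ["p1", "p1", "p1"],
    },
    index=[0, 1, 0],
)
cleaned_demo = clean_audio_features(demo)
```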

NOTES: 
    
    This is all bad. Step 1 currently produces two separate objects (`songs` and `audio_df`); its output should be a single denormalized dataset. Any alternative will result in disaster.
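
One way to get a single denormalized table is to flatten the nested track dicts and join the audio features onto them. The sketch below uses toy stand-ins for `songs` and `audio_df`. It assumes each track dict carries the Spotify `uri` field and that `audio_df` has one row per `uri`; the real frame would need de-duplicating on `uri` first, or the `validate` check will raise.

```python
import pandas as pd

# toy stand-ins for `songs` and `audio_df`; values are made up
songs = [
    {"track": {"name": "Song A", "uri": "spotify:track:a", "lyrics": "la la"}},
    {"track": {"name": "Song B", "uri": "spotify:track:b", "lyrics": ""}},
]
audio_df = pd.DataFrame(
    {
        "uri": ["spotify:track:a", "spotify:track:b"],
        "energy": [0.8, 0.4],
        "valence": [0.6, 0.2],
    }
)

# flatten the nested dicts into columns such as "track.name" and "track.lyrics"
songs_df = pd.json_normalize(songs)

# one row per playlist entry, with that track's audio features alongside it
tracks_df = songs_df.merge(
    audio_df,
    left_on="track.uri",
    right_on="uri",
    how="left",
    validate="many_to_one",
)
```

A left join keeps songs that have no audio features, so missing data shows up as NaN instead of silently dropping rows.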
    

##### Step 2b: Song / Lyrics preparation (genres over time)