# Exploring the JSONs

1. Inferences.json - Inferences about me
2. Playlist1.json - List of playlists
3. SearchQueries.json - Search queries
4. StreamingHistory*.json - Streaming history
5. YourLibrary.json - Tracks saved in my library

In [1]:
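from rich import print
import json
import os
from glob import glob

import pandas as pd

# Make sure the output folder for the CSV exports exists before writing to it
os.makedirs("csvs", exist_ok=True)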

## Inferences

The file has a single `inferences` key holding one long, flat list of labels, so it maps directly onto a one-column table.

In [2]:
with open("MyData/Inferences.json", "r", encoding="utf-8") as file:
    inferences_json = json.load(file)

In [3]:
inferences_df = pd.DataFrame(inferences_json)
inferences_df.to_csv("csvs/inferences.csv", encoding = 'utf-8', index=False)
inferences_df

Unnamed: 0,inferences
0,1P_Custom_Discovery_Streamers
1,1P_Custom_High_Intent_Streamers
2,1P_Custom_Social_Gamers
3,2P_AT&T_DirectTV_HHEC_0P1C_HAS_DTV > Y_28Feb20...
4,2P_AT&T_HHEC_0P1B_HAS_TV > Y_10Apr2020_US [Do ...
...,...
1633,podcast-audience-segmentation-rules-engagement...
1634,podcast-audience-segmentation-rules-engagement...
1635,podcast-audience-segmentation-rules-engagement...
1636,podcast-audience-segmentation-rules-format-len...
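
With more than 1,600 labels, a quick way to get an overview is to count them by their leading token. The sketch below assumes that the text before the first `_` or `-` (for example `1P`, `2P`, `podcast`) is a meaningful grouping, which the export itself does not document. The sample labels are copied from the output above, including the ones that are truncated there; `prefix` and `prefix_counts` are names made up for this example.

```python
import pandas as pd

# Labels copied from the output above; those ending in "..." are truncated there
sample = pd.DataFrame({"inferences": [
    "1P_Custom_Discovery_Streamers",
    "1P_Custom_High_Intent_Streamers",
    "1P_Custom_Social_Gamers",
    "2P_AT&T_DirectTV_HHEC_0P1C_HAS_DTV > Y_28Feb20...",
    "podcast-audience-segmentation-rules-engagement...",
]})

# Assumption: the token before the first "_" or "-" identifies the kind of inference
sample["prefix"] = sample["inferences"].str.extract(r"^([^_-]+)", expand=False)

# Number of labels per prefix, most frequent first
prefix_counts = sample["prefix"].value_counts()
print(prefix_counts)
```

Running the same two lines on `inferences_df` should give the breakdown for the full list.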


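## Playlists

`Playlist1.json` has a top-level `playlists` list, and each playlist carries its own metadata plus an `items` list. I split it into two tables: one row per playlist and one row per track in a playlist.

In [4]: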
with open("MyData/Playlist1.json", "r", encoding="utf-8") as file:
    playlists_json = json.load(file)

In [5]:
playlists_list = []
tracks_in_playlists = []
for playlist in playlists_json['playlists']:
    # Playlist
    _playlist = {}
    _playlist['name'] = playlist['name']
    _playlist['last_modified_date'] = playlist['lastModifiedDate']
    _playlist['n_items'] = len(playlist['items'])
    _playlist['description'] = playlist['description']
    _playlist['n_followers'] = playlist['numberOfFollowers']
    
    # Tracks in playlist
    for position,track in enumerate(playlist['items'], start = 1):
        try:
            _track = {}
            _track['track_name'] = track['track']['trackName']
            _track['artist_name'] = track['track']['artistName']
            _track['album_name'] = track['track']['albumName']
            _track['track_uri'] = track['track']['trackUri']
            _track['playlist_name'] = playlist['name']
            _track['position_in_list'] = position
            
            # Append each track as we go; the playlist name acts as the foreign key
            tracks_in_playlists.append(_track)
        except TypeError:
            # Items without track metadata (e.g. local files) fail the lookups above
            print("local item.. skipping.")

    # Once every item in this playlist has been processed, store the playlist record
    print(f"Appending playlist:\n{_playlist}")
    playlists_list.append(_playlist)
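
The same flattening can also be written as a small function with an explicit check instead of `try`/`except`. This is only a sketch: `flatten_playlists` and the sample data are made up for illustration, and it assumes that items without track metadata have `track` set to `None`, which is what the `TypeError` above suggests but the export does not document.

```python
def flatten_playlists(playlists):
    """Return (playlist_rows, track_rows) built from a list of playlist dicts."""
    playlist_rows, track_rows = [], []
    for playlist in playlists:
        items = playlist.get("items", [])
        playlist_rows.append({
            "name": playlist["name"],
            "last_modified_date": playlist.get("lastModifiedDate"),
            "n_items": len(items),
            "description": playlist.get("description"),
            "n_followers": playlist.get("numberOfFollowers"),
        })
        for position, item in enumerate(items, start=1):
            track = item.get("track")
            if track is None:
                # Assumption: no track metadata means a local file or similar; skip it
                continue
            track_rows.append({
                "track_name": track["trackName"],
                "artist_name": track["artistName"],
                "album_name": track["albumName"],
                "track_uri": track["trackUri"],
                "playlist_name": playlist["name"],
                "position_in_list": position,
            })
    return playlist_rows, track_rows


# Hypothetical two-item playlist: one item without track metadata, one regular track
sample_playlists = [{
    "name": "Example playlist",
    "lastModifiedDate": "2021-10-07",
    "description": None,
    "numberOfFollowers": 0,
    "items": [
        {"track": None},
        {"track": {
            "trackName": "niño,",
            "artistName": "Ed Maverick",
            "albumName": "eduardo",
            "trackUri": "spotify:track:3Yle1MUxWyj8NGS4ej8vnX",
        }},
    ],
}]

playlist_rows, track_rows = flatten_playlists(sample_playlists)
print(playlist_rows)
print(track_rows)
```

As in the loop above, skipped items still count toward `n_items` and toward the position of the tracks that follow them.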

In [6]:
print(f"Number of playlists: {len(playlists_list)}")
playlists_df = pd.DataFrame(playlists_list)
playlists_df.to_csv("csvs/playlists.csv", encoding = 'utf-8', index = False)
playlists_df.head()

Unnamed: 0,name,last_modified_date,n_items,description,n_followers
0,Heavy Rotation: Oct 21,2021-10-07,17,,0
1,descubriendo,2021-10-06,17,,0
2,World We Created,2021-09-27,2,,0
3,30-min workout,2021-09-28,14,,0
4,Beibi Songs,2021-09-04,2,,0


In [7]:
print(f"Number of tracks in playlists: {len(tracks_in_playlists)}")
tracks_in_playlists_df = pd.DataFrame(tracks_in_playlists)
tracks_in_playlists_df.to_csv("csvs/tracks_in_playlists.csv", encoding = 'utf-8', index = False)
tracks_in_playlists_df.head()

Unnamed: 0,track_name,artist_name,album_name,track_uri,playlist_name,position_in_list
0,"niño,",Ed Maverick,eduardo,spotify:track:3Yle1MUxWyj8NGS4ej8vnX,Heavy Rotation: Oct 21,1
1,Duele,Simpson Ahuevo,Duele,spotify:track:0vp1LRAhFT6PK9ixOmFT6E,Heavy Rotation: Oct 21,2
2,Fiebre,Simpson Ahuevo,Jorge,spotify:track:7LNlghRol6aZwCjFGTujKV,Heavy Rotation: Oct 21,3
3,Socios,Santa Fe Klan,Socios,spotify:track:7d8TPVWQLeW5PKrAb76V3N,Heavy Rotation: Oct 21,4
4,El diablo anda suelto,Código KM,El Fin Del Mundo,spotify:track:3itof28qmCDvqiyJSLLlzy,Heavy Rotation: Oct 21,5


## Search queries
The file is a list of search queries, and each query can have zero or more interactions.

I keep the interaction URIs as a nested list in a single column, since they aren't important enough to warrant their own table.

In [8]:
with open("MyData/SearchQueries.json", "r", encoding="utf-8") as file:
    search_queries_json = json.load(file)

In [10]:
search_queries_df = pd.DataFrame(search_queries_json)
search_queries_df.columns = ['platform', 'search_time', 'search_query', 'search_interaction_uris']
search_queries_df.to_csv("csvs/search_queries.csv", encoding = 'utf-8', index = False)
search_queries_df.head()

Unnamed: 0,platform,search_time,search_query,search_interaction_uris
0,IPHONE,2021-07-11T00:16:29.401Z[UTC],vice city,[spotify:track:2SIdGJTWirTxRNEyCp9ocI]
1,IPHONE,2021-07-11T00:24:05.691Z[UTC],ab soul,[spotify:artist:0g9vAlRPK9Gt3FKCekk4TW]
2,IPHONE,2021-07-11T03:27:52.664Z[UTC],de do,"[spotify:track:0ZMd3vx4pIcQyKAMWWvE8i, spotify..."
3,IPHONE,2021-07-12T18:20:10.919Z[UTC],mill,[spotify:artist:31l8FA2bO5qxpqf8uhV5eZ]
4,IPHONE,2021-07-13T00:22:26.503Z[UTC],818,"[spotify:track:3ZwrKxXpwOXPezZ9ey9QT1, spotify..."
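
If the interactions ever do matter, the nested column can be expanded to one row per URI. Note that `to_csv` writes a list column as its string representation, so this is easier to do before saving. The sketch below uses a small sample: the first two rows come from the output above, the other two are made up to show a query with several interactions and one with none. It also assumes that stripping the trailing `[UTC]` is enough for `pd.to_datetime` to parse `search_time`.

```python
import pandas as pd

queries = pd.DataFrame({
    "platform": ["IPHONE", "IPHONE", "IPHONE", "IPHONE"],
    "search_time": [
        "2021-07-11T00:16:29.401Z[UTC]",
        "2021-07-11T00:24:05.691Z[UTC]",
        "2021-07-12T18:20:10.919Z[UTC]",
        "2021-07-13T00:22:26.503Z[UTC]",
    ],
    # The last two queries are hypothetical
    "search_query": ["vice city", "ab soul", "two clicks", "no clicks"],
    "search_interaction_uris": [
        ["spotify:track:2SIdGJTWirTxRNEyCp9ocI"],
        ["spotify:artist:0g9vAlRPK9Gt3FKCekk4TW"],
        ["spotify:track:2SIdGJTWirTxRNEyCp9ocI",
         "spotify:artist:0g9vAlRPK9Gt3FKCekk4TW"],
        [],
    ],
})

# Drop the "[UTC]" suffix so the timestamp can be parsed as a timezone-aware datetime
queries["search_time"] = pd.to_datetime(
    queries["search_time"].str.replace("[UTC]", "", regex=False), utc=True
)

# One row per (query, URI); queries with an empty list become a single NaN row
exploded = queries.explode("search_interaction_uris").rename(
    columns={"search_interaction_uris": "interaction_uri"}
)

# Keep only the rows that actually have an interaction
interactions = exploded.dropna(subset=["interaction_uri"])
print(interactions)
```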


## Streaming history

The code below is borrowed from [@io_exception](https://twitter.com/io_exception)'s [tacosdedatos tutorial](https://old.tacosdedatos.com/mas-data-viz-con-spotify-python).

In [11]:
history = []
for file in sorted(glob("MyData/StreamingHistory*.json")):
    with open(file, encoding="utf-8") as readable:
        history.extend(json.load(readable))
history = pd.DataFrame(history)
history["endTime"] = pd.to_datetime(history["endTime"])
history.columns = ['end_time', 'artist_name', 'track_name', 'ms_played']

history.to_csv("csvs/streaming_history.csv", encoding = 'utf-8', index = False)
history.head()

Unnamed: 0,end_time,artist_name,track_name,ms_played
0,2020-10-06 20:12:00,Gera MX,Rumores,2693
1,2020-10-07 02:33:00,Gera MX,A Lo Mexicano,118769
2,2020-10-07 02:33:00,Gera MX,Gran Pez,1951
3,2020-10-07 02:36:00,Gera MX,Gran Pez,190375
4,2020-10-07 02:38:00,Gera MX,Gran Pez,147921
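
Several of these rows are plays of only a couple of seconds, so a first pass over the history might filter those out before adding up listening time. The sketch below rebuilds the five rows shown above. The 30-second cutoff (`MIN_MS`) is an arbitrary choice for illustration, not something the export defines.

```python
import pandas as pd

# The five rows shown in the output above
history_sample = pd.DataFrame({
    "end_time": pd.to_datetime([
        "2020-10-06 20:12:00",
        "2020-10-07 02:33:00",
        "2020-10-07 02:33:00",
        "2020-10-07 02:36:00",
        "2020-10-07 02:38:00",
    ]),
    "artist_name": ["Gera MX"] * 5,
    "track_name": ["Rumores", "A Lo Mexicano", "Gran Pez", "Gran Pez", "Gran Pez"],
    "ms_played": [2693, 118769, 1951, 190375, 147921],
})

MIN_MS = 30_000  # hypothetical threshold: ignore plays shorter than 30 seconds

# Keep only plays that lasted at least MIN_MS milliseconds
listened = history_sample[history_sample["ms_played"] >= MIN_MS]

# Total minutes per (artist, track), longest first
minutes_per_track = (
    listened.groupby(["artist_name", "track_name"])["ms_played"]
    .sum()
    .div(60_000)
    .sort_values(ascending=False)
)
print(minutes_per_track)
```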


## Library
`YourLibrary.json` is actually a collection of all my saved items, so a single file contains `tracks`, `albums`, `shows`, `episodes`, and `artists`.

Each of these becomes its own `library_` table/dataframe.

In [12]:
with open("MyData/YourLibrary.json", "r", encoding="utf-8") as file:
    library_json = json.load(file)

In [13]:
for label in ['tracks', 'albums', 'shows', 'episodes', 'artists']:
    file_name = f"csvs/library_{label}.csv"
    print(f"saving: {file_name}")
    pd.DataFrame(library_json[label]).to_csv(file_name, encoding = 'utf-8', index = False)
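
The loop above raises a `KeyError` if one of the five keys is missing from the file. A more defensive variant could skip missing or empty sections. This is a sketch under that assumption: `library_tables` is a made-up helper, and the field names in `sample_library` are invented for the example rather than taken from the export.

```python
import pandas as pd

LABELS = ["tracks", "albums", "shows", "episodes", "artists"]


def library_tables(library, labels=LABELS):
    """Build one DataFrame per library section, skipping missing or empty ones."""
    tables = {}
    for label in labels:
        records = library.get(label)
        if not records:
            # Section is absent or has no saved items: nothing to export
            continue
        tables[label] = pd.DataFrame(records)
    return tables


# Hypothetical library: one saved track, no albums, other sections missing
sample_library = {
    "tracks": [{"artist": "Ed Maverick", "track": "niño,"}],
    "albums": [],
}

tables = library_tables(sample_library)
for label, table in tables.items():
    print(f"library_{label}: {len(table)} rows")
```

Each resulting dataframe can then be saved with the same `to_csv` call used in the loop.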