link for outline
<a id='acquire_random_data'></a>

# What are we trying to do

Acquire data for random songs from the Spotify API, to compare against songs from a specific playlist.

### Issues

- According to [this article](https://perryjanssen.medium.com/getting-random-tracks-using-the-spotify-api-61889b0c0c27), the API returns more "famous" artists/tracks at the top of search results. Luckily, an offset can be used to counteract this.

- Also according to that article, API results differ by the "market" in which the IP address sending the request is located, which roughly corresponds to, but is not the same as, "country". So, we should also randomize the market.
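
A minimal sketch of the market randomization, assuming a small hand-picked list of two-letter market codes. `MARKETS` and `random_market` are hypothetical names used for illustration, and the list is only a sample, not the full set of markets.

```python
import random

# Illustrative sample of two-letter market codes; not an exhaustive list.
MARKETS = ["AR", "AU", "BR", "CA", "DE", "ES", "FR", "GB"]


def random_market(rng: random.Random) -> str:
    """Pick one market code uniformly at random."""
    return rng.choice(MARKETS)


rng = random.Random(0)  # seeded only so the sketch is repeatable
market = random_market(rng)
print(market)
```

The chosen code could then be passed as the `market` argument of spotipy's `search` method.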

### External resources

Here's something someone made with the spotipy package that:

- randomly chooses a letter

- randomly chooses whether it's at the start, middle, or end of a word

- randomly chooses whether it's in a song title, album, etc (set of choices here is selectable)

https://github.com/michimalek/spotipy-random

To get around the fact that the Spotify API returns the "most popular" songs first, an offset is available to select tracks "further down the popularity list".

So, we have to randomize that offset.
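
A rough sketch of the idea (not spotipy-random's actual code): pick a random letter, a random position for it, and a random offset. It assumes the `%` wildcard convention described in the article linked above; `random_search_params` is a hypothetical helper.

```python
import random
import string


def random_search_params(rng: random.Random, offset_max: int = 1000) -> dict:
    """Build a random wildcard query and a random offset for a search call."""
    letter = rng.choice(string.ascii_lowercase)
    # '%' is assumed to act as a wildcard: start, middle, or end of a word
    pattern = rng.choice(["{}%", "%{}%", "%{}"])
    return {
        "q": pattern.format(letter),
        "offset": rng.randint(0, offset_max),  # inclusive on both ends
    }


rng = random.Random(42)  # seeded only so the sketch is repeatable
params = random_search_params(rng)
print(params)
```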

In [104]:
from spotipy import Spotify
from spotipy_random import get_random

from spotipy.oauth2 import SpotifyClientCredentials

from time import sleep

import random as rnd
import pandas as pd

import requests

import os
import sys

module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)
    
from src import api


%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [105]:
# Read credentials from the environment rather than hard-coding them.
# SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET must be set before running.
auth_manager = SpotifyClientCredentials(
    client_id=os.environ["SPOTIPY_CLIENT_ID"],
    client_secret=os.environ["SPOTIPY_CLIENT_SECRET"]
)

sp = Spotify(auth_manager=auth_manager)

# get_random returns the track as a dict (it is indexed by key below)
random_track_json: dict = get_random(spotify=sp, type="track", year=2020, offset_max=1000)

In [106]:
random_track_json['id']

'2tCVQ8Ez3MSHMnLjmGlrgQ'

## Have it working, now what?

- set random number for offset max
- extract track id
- get track raw data
- store
- pause
- get track spotify features
- get track metadata
- store

In [33]:
rnd.randint(0, 1000)


273

In [57]:
raw_frames = []
spotify_frames = []
metadata_frames = []

for counter in range(1000):
    # Print progress every 50 iterations
    if counter % 50 == 0:
        print(counter)

    random_max_offset = rnd.randint(0, 1000)

    try:
        random_track_json: dict = get_random(
            spotify=sp,
            type="track",
            year=2020,
            offset_max=random_max_offset
        )
    except IndexError:
        # No usable search result for this offset: log it and move on,
        # rather than reusing the previous iteration's track
        print(random_max_offset)
        continue

    random_track_id = random_track_json['id']

    random_track_data = api.get_raw_data_track(
        random_track_id
    )

    try:
        random_track_raw_frame = api.unpack_json(
            random_track_data,
            random_track_id
        )
    except IndexError:
        # Unpacking failed: log the track id and skip this track entirely
        print(random_track_id)
        continue

    raw_frames.append(random_track_raw_frame)

    # Flatten the metadata of interest into a one-row frame
    base = random_track_json
    random_track_meta_dict = {
        'id': base['id'],
        'popularity': base['popularity'],
        'artist': base['artists'][0]['name'],
        'artist_type': base['artists'][0]['type'],
        'album_type': base['album']['album_type'],
        'album_release_date': base['album']['release_date'],
        'release_precision': base['album']['release_date_precision'],
        'track_name': base['name'],
        'track_type': base['type'],
    }
    random_track_meta_frame = pd.DataFrame(
        random_track_meta_dict,
        index=[0]
    )
    metadata_frames.append(random_track_meta_frame)

    random_track_sp_features = api.get_spotify_features_from_trackid(
        [random_track_id]
    )
    spotify_frames.append(random_track_sp_features)

    # Short pause between tracks to go easy on the API
    sleep(.1)

0
50
48
100
503
947
886
79T9Dm63dHw8I5JMc9LVxz
150
272
5pGBDKBaR63vuJ4g8ialcU
844
79
612
200
762
652
250
300


KeyboardInterrupt: 

## Two problems

- The offset is too high for the search results. Spotipy works by applying a `limit` to the list being brought back, plus an `offset` giving the index of the first element to return. So, if it's a weird search term, e.g. "z in the middle of the word", and the returned list is shorter than the `offset`, there's gonna be trouble. **Solution** is to get the returned list w/o an offset and "if offset > len of list, return last element".

- In a test run, a couple of tracks were selected but failed to go through the unpacking process.
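
The proposed fix for the first problem could be sketched as follows. `pick_with_clamped_offset` is a hypothetical helper operating on a plain list of results, not part of spotipy or spotipy-random.

```python
def pick_with_clamped_offset(items: list, offset: int):
    """Return items[offset], falling back to the last element if offset is too large."""
    if not items:
        raise ValueError("search returned no items")
    return items[min(offset, len(items) - 1)]


results = ["track_a", "track_b", "track_c"]
print(pick_with_clamped_offset(results, 1))    # track_b
print(pick_with_clamped_offset(results, 500))  # track_c
```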

In [59]:
test = api.get_raw_data_track('79T9Dm63dHw8I5JMc9LVxz')

In [69]:
try:
    for key in api.default_track_dictionary.keys():
        api.unpack_json(
            test, 
            '79T9Dm63dHw8I5JMc9LVxz', 
            columns_dictionary={
                key: api.default_track_dictionary[key]
            }
        )
except IndexError:
    print(key)

bars


In [77]:
test['bars']
test['beats']

[]

hm, that's weird . . . no bars or beats data

In [78]:
test.keys()

dict_keys(['meta', 'track', 'bars', 'beats', 'sections', 'segments', 'tatums'])

In [79]:
test['meta']

{'analyzer_version': '4.0.0',
 'platform': 'Linux',
 'detailed_status': 'OK',
 'status_code': 0,
 'timestamp': 1591585652,
 'analysis_time': 11.25034,
 'input_process': 'libvorbisfile L+R 44100->22050'}

In [4]:
api.get_spotify_features_from_trackid(['79T9Dm63dHw8I5JMc9LVxz'])

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0,0.000458,10,-22.879,0,0,0.743,0.939,0.111,0,0,audio_features,79T9Dm63dHw8I5JMc9LVxz,spotify:track:79T9Dm63dHw8I5JMc9LVxz,https://api.spotify.com/v1/tracks/79T9Dm63dHw8...,https://api.spotify.com/v1/audio-analysis/79T9...,127500,0


In [11]:
test = Out[4]['track_href'][0]

In [13]:
test1 = requests.get(
        test,
        headers=api.headers
).json()

In [16]:
test1['name']

'ZzzzZ 1.4 kHz'

In [19]:
test2 = api.get_spotify_features_from_trackid(['5pGBDKBaR63vuJ4g8ialcU'])['track_href'][0]

In [20]:
test3 = requests.get(
        test2,
        headers=api.headers
).json()

In [21]:
test3

{'album': {'album_type': 'album',
  'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/2v4Lbdw4AEnnNVVUHi9esf'},
    'href': 'https://api.spotify.com/v1/artists/2v4Lbdw4AEnnNVVUHi9esf',
    'id': '2v4Lbdw4AEnnNVVUHi9esf',
    'name': 'White Noise Spa',
    'type': 'artist',
    'uri': 'spotify:artist:2v4Lbdw4AEnnNVVUHi9esf'}],
  'available_markets': ['AD',
   'AE',
   'AG',
   'AL',
   'AM',
   'AO',
   'AR',
   'AT',
   'AU',
   'AZ',
   'BA',
   'BB',
   'BD',
   'BE',
   'BF',
   'BG',
   'BH',
   'BI',
   'BJ',
   'BN',
   'BO',
   'BR',
   'BS',
   'BT',
   'BW',
   'BY',
   'BZ',
   'CA',
   'CD',
   'CG',
   'CH',
   'CI',
   'CL',
   'CM',
   'CO',
   'CR',
   'CV',
   'CW',
   'CY',
   'CZ',
   'DE',
   'DJ',
   'DK',
   'DM',
   'DO',
   'DZ',
   'EC',
   'EE',
   'EG',
   'ES',
   'FI',
   'FJ',
   'FM',
   'FR',
   'GA',
   'GB',
   'GD',
   'GE',
   'GH',
   'GM',
   'GN',
   'GQ',
   'GR',
   'GT',
   'GW',
   'GY',
   'HK',
   'HN',
   'HR',
   'H

Aaaaand turns out they're both white noise tracks lol

So, next on to-do
- re-write the randomized search code to "select last element of list" if offset > list length

- re-write the unpacking code to skip a track if it turns out to be white noise
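
One possible guard for the second to-do, assuming that "white noise" shows up as empty `bars` or `beats` lists in the raw analysis, as it did for the two tracks above. `has_rhythm_data` is a hypothetical helper, and the sample dicts are made up for illustration.

```python
def has_rhythm_data(analysis: dict) -> bool:
    """True if the audio analysis contains at least one bar and one beat."""
    return bool(analysis.get("bars")) and bool(analysis.get("beats"))


# Made-up stand-ins for raw analysis JSON
empty_analysis = {"meta": {}, "track": {}, "bars": [], "beats": []}
normal_analysis = {"bars": [{"start": 0.0}], "beats": [{"start": 0.0}]}

print(has_rhythm_data(empty_analysis))   # False
print(has_rhythm_data(normal_analysis))  # True
```

Tracks failing this check could be skipped before they reach the unpacking step.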

### Actually, investigate first

Don't actually know if the above logic is what's going on.  Let's figure this out with an example of spotipy search

In [184]:
test_dict = sp.search('e%', limit=50, offset=951)
test_dict['tracks']['total']

HTTP Error for GET to https://api.spotify.com/v1/search with Params: {'q': 'e%', 'limit': 50, 'offset': 951, 'type': 'track', 'market': None} returned 404 due to Not found.


SpotifyException: http status: 404, code:-1 - https://api.spotify.com/v1/search?q=e%25&limit=50&offset=951&type=track:
 Not found., reason: None

In [180]:
test_dict = sp.search('e%', limit=1, offset=0)
test_dict['tracks']['total']

10031

In [181]:
test_dict = sp.search('e%', limit=25, offset=0)
test_dict['tracks']['total']

10772

In [182]:
test_dict = sp.search('e%', limit=39, offset=0)
test_dict['tracks']['total']

10888

In [178]:
test_dict['tracks']['items'][24]['name']

'GOD DID (feat. Rick Ross, Lil Wayne, Jay-Z, John Legend & Fridayy)'

Ok, so the way this works is:
- return `full_list` from search terms
- starting part of list is chopped off according to `offset`
    - essentially, `full_list` becomes `full_list[offset:]`
- `full_list` is truncated by `limit` and returned
    - essentially, returns `full_list[offset:offset+limit]`
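
The same slicing model on a plain Python list, as a toy illustration (the `page` helper is hypothetical and makes no API calls):

```python
def page(full_list: list, offset: int, limit: int) -> list:
    """Mimic search paging: drop `offset` items, then keep at most `limit`."""
    return full_list[offset:offset + limit]


full_list = list(range(100))
print(page(full_list, offset=10, limit=5))  # [10, 11, 12, 13, 14]
print(page(full_list, offset=98, limit=5))  # [98, 99]
```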
    
    

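And it looks like the API only serves the first 1000 results of a search: even when `total` reports ~10,000 tracks, the last index we can reach is 999. That would explain the 404 above, since `offset=951` with `limit=50` reaches index 1000.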
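
Under that reading, a random offset would need to satisfy `offset + limit <= 1000`. A minimal sketch, where `RESULT_CAP` is an assumption inferred from the 404 above rather than a documented constant, and `random_valid_offset` is a hypothetical helper:

```python
import random

RESULT_CAP = 1000  # assumption inferred from the observed 404, not a documented value


def random_valid_offset(total: int, limit: int, rng: random.Random) -> int:
    """Pick an offset so that offset + limit stays within the reachable results."""
    reachable = min(total, RESULT_CAP)
    max_offset = max(reachable - limit, 0)
    return rng.randint(0, max_offset)  # inclusive on both ends


rng = random.Random(7)  # seeded only so the sketch is repeatable
offset = random_valid_offset(total=10031, limit=50, rng=rng)
print(offset)
```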

So, let's try something different: use MusicBrainz
https://musicbrainz.org/

Gather a list of track titles from MusicBrainz, randomly sample ~100-200k of them, then search for the track names on Spotify. If a track pops up, download the info. If not (i.e. it's not on Spotify), grab another random track name from MusicBrainz and try to download that one instead.
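
A sketch of that sample-and-retry loop, with the Spotify lookup abstracted behind a `search_fn` callable so it runs without network access. `collect_tracks`, `search_fn`, and the tiny `catalogue` are all hypothetical stand-ins; a real `search_fn` would wrap a Spotify search and return `None` on a miss.

```python
import random
from typing import Callable, Optional


def collect_tracks(
    titles: list,
    search_fn: Callable[[str], Optional[dict]],
    n_wanted: int,
    rng: random.Random,
) -> list:
    """Try titles in random order until n_wanted are found or titles run out."""
    found = []
    for title in rng.sample(titles, len(titles)):  # random order, no repeats
        if len(found) >= n_wanted:
            break
        hit = search_fn(title)
        if hit is not None:  # title exists in the catalogue
            found.append(hit)
    return found


# Stand-in for a real Spotify lookup: only two of the four titles "exist"
catalogue = {"Song A": {"id": "a1"}, "Song C": {"id": "c3"}}
titles = ["Song A", "Song B", "Song C", "Song D"]
hits = collect_tracks(titles, catalogue.get, n_wanted=2, rng=random.Random(1))
print(hits)
```

Titles that miss are simply skipped, so the loop keeps drawing until it has enough hits or the sampled list is exhausted.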