## Data Collection
 * [generate](#Artist-List) a list of artists from the "library"
 * [search](#Loop-for-Similar-Artists) for up to 10 similar artists for each artist via the [last.fm API](#last.fm-API-settings)
   * gather similar artist data:
     * name (string)
     * match score (float, 0-1) --> saved as (integer, 0-1e6)
     * whether the similar artist is in the library (boolean)
     * last.fm URL (string, valid URL)
   * if an artist is not found, try replacing "&" and "+" with "and"
     * if successful, save alias names as a dictionary for future use
   * [save](#Save-Data) similar artist data, aliases, and failed searches as JSON files
 * [sample data after collection](#Data-Preview)
 * *optional*: analyze failed searches and try again
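A minimal sketch of the per-artist record described above, assuming a last.fm similar-artist entry shaped like the example response further down (`name`, `match` as a string, `url`). `SimilarArtist` and `to_record` are hypothetical names; the notebook builds the same dictionary inline inside its loop. This sketch rounds the scaled score while the notebook truncates with `int()`, so values can differ by one.

```python
from typing import TypedDict


class SimilarArtist(TypedDict):
    """One similar-artist record, mirroring the fields listed above."""
    name: str      # similar artist's name as returned by last.fm
    score: int     # match score scaled from 0-1 to an integer 0-1,000,000
    link: str      # last.fm URL of the similar artist
    library: bool  # True if the name (case-insensitive) is already in the library


def to_record(similar: dict, library_lower: set) -> SimilarArtist:
    """Convert one entry of the similarartists.artist list into a record."""
    return SimilarArtist(
        name=similar['name'],
        score=round(float(similar['match']) * 1_000_000),  # 'match' arrives as a string
        link=similar['url'],
        library=similar['name'].lower() in library_lower,
    )


example = {'name': 'Q-Tip', 'match': '1', 'url': 'https://www.last.fm/music/Q-Tip'}
print(to_record(example, {'q-tip', 'the pharcyde'}))
# {'name': 'Q-Tip', 'score': 1000000, 'link': 'https://www.last.fm/music/Q-Tip', 'library': True}
```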

----
### Import packages

```python
import requests
import json
from pathlib import Path
import pandas as pd
from tqdm import tqdm
from time import sleep
from collections import defaultdict
```

----
### Artist List

I keep my music library in a [SQLite database](https://github.com/NBPub/MusicDB-sql-practice#part-one---db-creation) and gathered the list of artists from it. Other methods may suit you better:
 * use folder names with `Path.iterdir()` if the library is stored with **Artist Names** as folders
 * use the API of your streaming service
   * [last.fm User Library](https://www.last.fm/api/show/library.getArtists)
   * [Spotify Followed Artists](https://developer.spotify.com/documentation/web-api/reference/get-followed)
 * manually create a list of artists to use
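For the folder-based option, a minimal sketch assuming a layout of `<library root>/<Artist Name>/<Album>/...`; `artists_from_folders` is a hypothetical helper, and the demo builds a throwaway folder instead of touching a real library.

```python
import tempfile
from pathlib import Path


def artists_from_folders(library_root: str) -> list[str]:
    """Return the names of the folders directly under library_root, sorted.

    Assumes <library_root>/<Artist Name>/<Album>/...; loose files at the
    top level (cover art, playlists) are skipped.
    """
    return sorted(p.name for p in Path(library_root).iterdir() if p.is_dir())


# demo on a throwaway folder; point this at your own music folder instead
with tempfile.TemporaryDirectory() as tmp:
    for name in ['Q-Tip', 'A Tribe Called Quest']:
        (Path(tmp) / name).mkdir()
    (Path(tmp) / 'cover.jpg').touch()
    demo_artists = artists_from_folders(tmp)

print(demo_artists)  # ['A Tribe Called Quest', 'Q-Tip']
```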

```python
import sqlite3

con = sqlite3.connect('seven.db')
cur = con.cursor()

artist_list = cur.execute('SELECT name FROM Artists').fetchall()
artist_list = [val[0] for val in artist_list]

# capitalization shouldn't affect whether a similar artist is found in the library;
# a set makes those membership checks fast
artist_lower = {val.lower() for val in artist_list}
print(len(artist_list))
```

```
1538
```

```python
con.close()
```
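The cells above open and close the connection by hand. A minimal sketch of the same query wrapped so the connection closes even if the query raises, assuming an `Artists` table with a `name` column as in the notebook's database; `library_artists` is a hypothetical helper and the demo table is made up.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def library_artists(db_path: str) -> list[str]:
    """Read every name from an Artists(name) table, then close the connection."""
    # `with con:` on a sqlite3 connection only commits or rolls back;
    # contextlib.closing() is what actually calls con.close()
    with closing(sqlite3.connect(db_path)) as con:
        rows = con.execute('SELECT name FROM Artists').fetchall()
    return [name for (name,) in rows]


# demo on a throwaway database with a hypothetical two-row Artists table
with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, 'demo.db')
    with closing(sqlite3.connect(db_path)) as con:
        con.execute('CREATE TABLE Artists (name TEXT)')
        con.executemany('INSERT INTO Artists VALUES (?)', [('Q-Tip',), ('Hot Chip',)])
        con.commit()
    demo_names = library_artists(db_path)

print(sorted(demo_names))  # ['Hot Chip', 'Q-Tip']
```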

----
### last.fm API settings

Specify a valid user-agent header and API key (see the [documentation](https://www.last.fm/api)). Example data returned for [similar artists info](https://www.last.fm/api/show/artist.getSimilar) is shown below.

```python
headers = {'user-agent': 'your-project-name/0.0.1'}
key = 'your-API-key'
base = 'http://ws.audioscrobbler.com/2.0/?method=artist.getsimilar&artist='
```

```python
example_artist = 'A Tribe Called Quest'
URL = f'{base}{example_artist}&limit=10&api_key={key}&format=json'
r = requests.get(URL, headers=headers)
json.loads(r.content)
```

```
{'similarartists': {'artist': [{'name': 'Q-Tip',
    'match': '1',
    'url': 'https://www.last.fm/music/Q-Tip',
    'image': [{'#text': 'https://lastfm.freetls.fastly.net/i/u/34s/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'small'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/64s/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'medium'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/174s/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'large'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'extralarge'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'mega'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': ''}],
    'streamable': '0'},
   {'name': 'The Pharcyde',
    'mbid': 'd7134426-a937-43bf-bc54-f10ad8102ed9',
    'match': '0.988053',
    'url'
```
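Because the artist name is pasted into the URL unescaped, an `&` in a name ends the `artist` parameter early, and a `+` is typically decoded as a space; that may be part of why the "and" replacement in the loop helps. A minimal sketch of percent-encoding the whole query with the standard library, assuming the same endpoint and parameters as the cells above; `similar_url` is a hypothetical helper. Passing a dictionary via `requests.get(url, params=...)` performs equivalent encoding.

```python
from urllib.parse import urlencode

API_ROOT = 'http://ws.audioscrobbler.com/2.0/'


def similar_url(artist: str, api_key: str, limit: int = 10) -> str:
    """Build an artist.getsimilar URL with every parameter percent-encoded."""
    query = urlencode({
        'method': 'artist.getsimilar',
        'artist': artist,  # '&' -> %26, '+' -> %2B, spaces -> '+'
        'limit': limit,
        'api_key': api_key,
        'format': 'json',
    })
    return f'{API_ROOT}?{query}'


print(similar_url('Diplo & Datsik', 'your-API-key'))
# http://ws.audioscrobbler.com/2.0/?method=artist.getsimilar&artist=Diplo+%26+Datsik&limit=10&api_key=your-API-key&format=json
```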

----
### Loop for Similar Artists

Notes:
 * similar-artist data and failed searches are stored in `defaultdict(list)` objects; alias names are stored in a plain dictionary
 * [tqdm](https://tqdm.github.io/) is used to track loop progress
   * the progress bar could be customized to show more information
 * rate limits are not specified for the API, but a conservative sleep time (0.5 s per artist) was used for the final data collection
   * this can be reduced; the example run in this notebook slept only 0.1 s per artist
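The fixed `sleep()` in the loop pauses for the full time even when the request itself was slow. A minimal alternative, assuming you want a minimum interval between request starts rather than a fixed pause after each one; `Throttle` is a hypothetical helper, and its 0.5 s default only mirrors the notebook's value, not a documented last.fm limit.

```python
import time


class Throttle:
    """Enforce a minimum interval (seconds) between successive wait() calls."""

    def __init__(self, interval: float = 0.5):
        self.interval = interval
        self._last = None  # monotonic timestamp of the previous call

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)  # only sleep for whatever is left of the interval
        self._last = time.monotonic()


throttle = Throttle(0.05)
start = time.monotonic()
for _ in range(3):
    throttle.wait()  # in the real loop, call this right before requests.get(...)
elapsed = time.monotonic() - start  # roughly 2 x 0.05 s
```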
 
```python
sim_artists = defaultdict(list)
artists_alias = {}
fails = defaultdict(list)
```

*A sample of the artist list (every fifth artist) is used for the example in this notebook:*

```python
example_list = artist_list[::5]
print(len(example_list))
```

```
308
```


```python
for artist in tqdm(example_list):
    alias = False
    URL = f'{base}{artist}&limit=10&api_key={key}&format=json'
    r = requests.get(URL, headers=headers)

    # Checking for a valid response may be useful in development
    # if r.status_code != 200:
    #     fails[artist].append('request')
    #     continue

    # dict.get() with defaults avoids a chain of if statements
    data = json.loads(r.content)
    message = data.get('message', '')
    data = data.get('similarartists', {}).get('artist', [])

    if not data:  # replace symbols with "and" and search again
        new_artist = artist.replace('+', 'and').replace('&', 'and')
        URL = f'{base}{new_artist}&limit=10&api_key={key}&format=json'
        r = requests.get(URL, headers=headers)
        data = json.loads(r.content)
        message = data.get('message', '')
        data = data.get('similarartists', {}).get('artist', [])
        alias = True  # mark to save the alias, or note the attempt in the error message

    sleep(0.5)  # pause between artists, see the rate-limit note above

    if data:  # gather data if present
        try:  # try/except isn't strictly needed, but helps during development
            for similar in data:
                sim_artists[artist].append({
                    'name': similar['name'],
                    'score': int(float(similar['match']) * 1e6),  # 0-1 float -> 0-1e6 integer
                    'link': similar['url'],
                    'library': similar['name'].lower() in artist_lower,
                })
            if alias:
                artists_alias[artist] = new_artist
        except Exception as e:
            fails[artist].append(str(e))
    else:
        # no data: the alias was tried, and an error message may be present
        if message:
            fails[artist].append(f'{message} Alias tried' if alias else message)
        else:
            fails[artist].append('No Matches! Alias tried.' if alias else 'No Matches!')
```

```
100%|██████████| 308/308 [01:27<00:00,  3.52it/s]
```
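The request-and-retry logic in the loop can be separated from the network so it is testable offline. A minimal sketch, assuming `fetch` is any callable that takes a search string and returns the parsed JSON dictionary (for example a wrapper around `requests.get(...).json()`); `search_with_alias` and `fake_fetch` are hypothetical names. Unlike the loop above, it skips the second request when the replacement leaves the name unchanged.

```python
from typing import Callable, Optional


def _similar(payload: dict) -> list:
    """Pull the similar-artist list out of an artist.getsimilar JSON payload."""
    return payload.get('similarartists', {}).get('artist', [])


def search_with_alias(artist: str, fetch: Callable[[str], dict]) -> tuple[list, Optional[str], str]:
    """Return (similar artists, alias used or None, last error message).

    Searches the stored name first, then retries with '+' and '&' replaced by 'and'.
    """
    payload = fetch(artist)
    similar = _similar(payload)
    if similar:
        return similar, None, ''
    alias = artist.replace('+', 'and').replace('&', 'and')
    if alias == artist:  # nothing to replace, so skip the duplicate request
        return [], None, payload.get('message', '')
    payload = fetch(alias)
    similar = _similar(payload)
    if similar:
        return similar, alias, ''
    return [], None, payload.get('message', '')


# offline stand-in for the API; both payloads are made up for the demo
def fake_fetch(name: str) -> dict:
    if name == 'Diplo and Datsik':
        return {'similarartists': {'artist': [{'name': 'Datsik', 'match': '1', 'url': 'u'}]}}
    return {'message': 'not found'}


print(search_with_alias('Diplo & Datsik', fake_fetch)[1])  # Diplo and Datsik
print(search_with_alias('tab spencer', fake_fetch))  # ([], None, 'not found')
```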


----
### Data Preview
#### Similar Artist Data

```python
print(list(sim_artists.values())[:5])
```

```
[[{'name': 'The Juan Maclean', 'score': 1000000, 'link': 'https://www.last.fm/music/The+Juan+Maclean', 'library': False}, {'name': 'Holy Ghost!', 'score': 942644, 'link': 'https://www.last.fm/music/Holy+Ghost%21', 'library': True}, {'name': 'The Rapture', 'score': 803641, 'link': 'https://www.last.fm/music/The+Rapture', 'library': True}, {'name': 'Cut Copy', 'score': 652309, 'link': 'https://www.last.fm/music/Cut+Copy', 'library': True}, {'name': 'LCD Soundsystem', 'score': 631905, 'link': 'https://www.last.fm/music/LCD+Soundsystem', 'library': True}, {'name': 'Hot Chip', 'score': 598909, 'link': 'https://www.last.fm/music/Hot+Chip', 'library': True}, {'name': 'Yacht', 'score': 593413, 'link': 'https://www.last.fm/music/Yacht', 'library': False}, {'name': 'The Faint', 'score': 587127, 'link': 'https://www.last.fm/music/The+Faint', 'library': False}, {'name': 'Soulwax', 'score': 570667, 'link': 'https://www.last.fm/music/Soulwax', 'library': False}, {'name': 'Fujiya & Miyagi', 'score': 
```
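Since each record carries a `library` flag, a quick tally shows how many of each artist's neighbours are already in the library and which outside artists come up most often. A minimal sketch assuming `sim_artists` has the shape printed above; `library_overlap` and `missing_artists` are hypothetical helpers, and summing scores is just one possible ranking.

```python
from collections import Counter


def library_overlap(sim_artists: dict) -> dict:
    """Map each searched artist to how many of its similar artists are already in the library."""
    return {artist: sum(rec['library'] for rec in recs) for artist, recs in sim_artists.items()}


def missing_artists(sim_artists: dict) -> Counter:
    """Total the scores of similar artists that are NOT in the library, across all searches."""
    totals = Counter()
    for recs in sim_artists.values():
        for rec in recs:
            if not rec['library']:
                totals[rec['name']] += rec['score']
    return totals


# tiny demo: 'Artist A' reuses records from the preview above (link omitted);
# 'Artist B' and its score are made up
demo = {
    'Artist A': [
        {'name': 'The Juan Maclean', 'score': 1000000, 'library': False},
        {'name': 'Holy Ghost!', 'score': 942644, 'library': True},
    ],
    'Artist B': [{'name': 'The Juan Maclean', 'score': 500000, 'library': False}],
}
print(library_overlap(demo))  # {'Artist A': 1, 'Artist B': 0}
print(missing_artists(demo).most_common(1))  # [('The Juan Maclean', 1500000)]
```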

#### Alias Artist Names
*keys are library artist names as stored in the database; values are the strings used for successful searches*

```python
artists_alias
```

```
{'Alina Baraz & Galimatias': 'Alina Baraz and Galimatias',
 'David Byrne & St. Vincent': 'David Byrne and St. Vincent',
 'Diplo & Datsik': 'Diplo and Datsik',
 'Donnie Trumpet & The Social Experiment': 'Donnie Trumpet and The Social Experiment',
 'Dr. Dre & Eminem': 'Dr. Dre and Eminem',
 'Emerson, Lake & Palmer': 'Emerson, Lake and Palmer',
 'Pete Philly & Perquisite': 'Pete Philly and Perquisite',
 'Rising Appalachia & The Human Experience': 'Rising Appalachia and The Human Experience',
 'Rosie Thomas & Sufjan Stevens': 'Rosie Thomas and Sufjan Stevens',
 'Wynton Kelly Trio & Wes Montgomery': 'Wynton Kelly Trio and Wes Montgomery',
 'Yo-Yo Ma & Bobby McFerrin': 'Yo-Yo Ma and Bobby McFerrin'}
```
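The alias dictionary is meant for reuse on later runs. A minimal sketch of that reuse, assuming the file name written in the Save Data cell; `load_aliases` and `query_name` are hypothetical helpers.

```python
import json
from pathlib import Path


def load_aliases(path: str = 'ex_artists_alias.json') -> dict:
    """Load saved aliases, or return an empty dict if the file does not exist yet."""
    p = Path(path)
    return json.loads(p.read_text(encoding='utf-8')) if p.exists() else {}


def query_name(artist: str, aliases: dict) -> str:
    """Search string for an artist: the saved alias if there is one, else the stored name."""
    return aliases.get(artist, artist)


aliases = {'Dr. Dre & Eminem': 'Dr. Dre and Eminem'}  # subset of the output above
print(query_name('Dr. Dre & Eminem', aliases))  # Dr. Dre and Eminem
print(query_name('Hot Chip', aliases))  # Hot Chip
```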

#### Failed Searches
*example for one artist*

```python
fails['tab spencer']
```

```
['No Matches! Alias tried.']
```
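For the optional step of analyzing failed searches, grouping artists by failure message is a quick first pass before deciding what to retry. A minimal sketch; `group_failures` is a hypothetical helper, and apart from `tab spencer` the demo entries are made up.

```python
from collections import defaultdict


def group_failures(fails: dict) -> dict:
    """Invert {artist: [messages]} into {message: [artists]} for triage."""
    by_reason = defaultdict(list)
    for artist, messages in fails.items():
        for message in messages:
            by_reason[message].append(artist)
    return dict(by_reason)


# 'tab spencer' is from the output above; the other entries are hypothetical
demo_fails = {
    'tab spencer': ['No Matches! Alias tried.'],
    'Artist X': ['No Matches! Alias tried.'],
    'Artist Y': ["'match'"],  # e.g. str(KeyError('match')) caught by the try/except
}
for reason, artists in group_failures(demo_fails).items():
    print(f'{len(artists):>3}  {reason}')
```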

----
### Save Data

Saved data will be loaded and used for network graph creation and plotting.

```python
with open('ex_sim_artists.json', 'w', encoding='utf-8') as file:
    json.dump(sim_artists, file)

with open('ex_artists_alias.json', 'w', encoding='utf-8') as file:
    json.dump(artists_alias, file)

with open('ex_fails.json', 'w', encoding='utf-8') as file:
    json.dump(fails, file)
```
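JSON has no notion of `defaultdict`, so these files load back as plain dictionaries. A minimal sketch of restoring the `defaultdict(list)` behaviour when reading them for the graphing step; `load_defaultdict` is a hypothetical helper and the demo file name is made up.

```python
import json
import os
import tempfile
from collections import defaultdict


def load_defaultdict(path: str) -> defaultdict:
    """Load a JSON object that was saved from a defaultdict(list) back into one."""
    with open(path, encoding='utf-8') as file:
        return defaultdict(list, json.load(file))


# round-trip demo in a throwaway folder
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo_fails.json')
    with open(path, 'w', encoding='utf-8') as file:
        json.dump(defaultdict(list, {'tab spencer': ['No Matches! Alias tried.']}), file)
    loaded = load_defaultdict(path)

print(loaded['tab spencer'])  # ['No Matches! Alias tried.']
print(loaded['not searched yet'])  # [] because missing keys default to an empty list again
```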