### In this notebook, we'll extract a subset of our main dataset and write it to a CSV file, so that the same subset can be used in all our mini models

Start by defining pickle helper functions and loading the pickled variables we'll need

In [1]:
import pickle

def writePickle(Variable, fname):
    """Pickle Variable to pickle_vars/<fname>.pkl."""
    with open("pickle_vars/" + fname + ".pkl", 'wb') as f:
        pickle.dump(Variable, f)

def readPickle(fname):
    """Load the object stored in pickle_vars/<fname>.pkl."""
    with open("pickle_vars/" + fname + ".pkl", 'rb') as f:
        return pickle.load(f)

def readPicklePast(fname):
    """Load the object stored in ../pickle_vars/<fname>.pkl (one directory up)."""
    with open("../pickle_vars/" + fname + ".pkl", 'rb') as f:
        return pickle.load(f)

In [2]:
EN_PRGE_metadata_dict = readPicklePast("EN_PRGE_metadata_dict") # metadata dataset for English-only songs with raw and projected genre labels
children_to_parent_genre_dict = readPicklePast("children_to_parent_genre_dict") # a dictionary that maps all children genre labels to parent genre labels

We aim to form a sub-dataset that has the following properties: <br>
- It will have 10 artists from each parent genre
- The parent genre set will exclude 'Rest', and include all the remaining genres (i.e. 'Reggae', 'R&B', 'Punk', 'Pop', 'Blues', 'Folk', '(Electronic) Dance', 'Jazz', 'Heavy Metal', 'Country', 'Rock', 'Hip Hop' and 'Gospel&Religious')
- Each artist will have a total of 100 song lyrics. This set of 100 lyrics will be allocated as 80-10-10 for the training, evaluation and test sets respectively.
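
The 80-10-10 allocation per artist can be sketched as follows. This is only an illustrative sketch: the helper name `split_80_10_10` and the fixed seed are assumptions, not part of the notebook, and the real split may be done differently.

```python
import random

def split_80_10_10(song_ids, seed=450):
    """Shuffle exactly 100 song ids and split them into train/eval/test lists."""
    if len(song_ids) != 100:
        raise ValueError("expected exactly 100 song ids per artist")
    shuffled = list(song_ids)              # copy, so the caller's sequence is untouched
    random.Random(seed).shuffle(shuffled)  # private RNG, does not disturb global state
    return shuffled[:80], shuffled[80:90], shuffled[90:]

# Hypothetical ids 0..99 standing in for one artist's 100 songs
train_ids, eval_ids, test_ids = split_80_10_10(range(100))
print(len(train_ids), len(eval_ids), len(test_ids))
```

Splitting per artist, rather than over the pooled dataset, keeps every artist represented in all three sets.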

Let's start by collecting each entry (object or instance) in the dataset under a dictionary with the following example structure: <br>
dict = { 'parent_genre_1' : {'artist_name_1' : [object_id_1, object_id_2, ..., object_id_i], 'artist_name_2' : [object_id_j, ..., object_id_k], ... }, <br>
&emsp;&emsp;&emsp; 'parent_genre_2' : {'artist_name_10' : [object_id_l, object_id_m, ..., object_id_p], 'artist_name_11' : [object_id_q, ..., object_id_t], ... }, <br>
&emsp;&emsp;&emsp; 'parent_genre_3'...}
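
As a small illustration of this nested structure, the sketch below builds it from a toy metadata dictionary with `collections.defaultdict`. The toy records and their `(genre, artist)` layout are invented for the example; the real metadata tuples hold more fields.

```python
from collections import defaultdict

# Hypothetical toy records: object_id -> (parent_genre, artist_name)
toy_metadata = {
    "id_1": ("Heavy Metal", "Metallica"),
    "id_2": ("Heavy Metal", "Metallica"),
    "id_3": ("Jazz", "Cab Calloway"),
}

# Missing genres and artists are created on first access
collection = defaultdict(lambda: defaultdict(list))
for object_id, (parent_genre, artist_name) in toy_metadata.items():
    collection[parent_genre][artist_name].append(object_id)

# Convert to plain dicts: a defaultdict with a lambda factory cannot be pickled
collection = {genre: dict(artists) for genre, artists in collection.items()}
print(collection)
```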

In [3]:
parent_genre_collection_dict = dict()
for parent_genre_label in set(children_to_parent_genre_dict.values()):
    parent_genre_collection_dict[parent_genre_label] = dict()
del parent_genre_collection_dict['Rest'] # we'll discard the 'Rest' genre

for object_id, metadata in EN_PRGE_metadata_dict.items():
    genre = metadata[1]
    artist_name = metadata[2]
    parent_genre = children_to_parent_genre_dict[genre]
    if parent_genre == 'Rest':
        continue # skip songs whose parent genre is 'Rest'
    # create the artist's list the first time we see the artist, then append the song id
    parent_genre_collection_dict[parent_genre].setdefault(artist_name, list()).append(object_id)



In [4]:
# an example
print(parent_genre_collection_dict["Heavy Metal"]["Metallica"])

['ObjectId(5714dedb25ac0d8aee4ad800)', 'ObjectId(5714dedb25ac0d8aee4ad802)', 'ObjectId(5714dedb25ac0d8aee4ad803)', 'ObjectId(5714dedb25ac0d8aee4ad806)', 'ObjectId(5714dedb25ac0d8aee4ad807)', 'ObjectId(5714dedb25ac0d8aee4ad808)', 'ObjectId(5714dedb25ac0d8aee4ad809)', 'ObjectId(5714dedb25ac0d8aee4ad80e)', 'ObjectId(5714dedb25ac0d8aee4ad80f)', 'ObjectId(5714dedb25ac0d8aee4ad812)', 'ObjectId(5714dedb25ac0d8aee4ad813)', 'ObjectId(5714dedb25ac0d8aee4ad814)', 'ObjectId(5714dedb25ac0d8aee4ad816)', 'ObjectId(5714dedb25ac0d8aee4ad818)', 'ObjectId(5714dedb25ac0d8aee4ad81a)', 'ObjectId(5714dedb25ac0d8aee4ad81b)', 'ObjectId(5714dedb25ac0d8aee4ad81d)', 'ObjectId(5714dedb25ac0d8aee4ad81e)', 'ObjectId(5714dedb25ac0d8aee4ad822)', 'ObjectId(5714dedb25ac0d8aee4ad824)', 'ObjectId(5714dedb25ac0d8aee4ad825)', 'ObjectId(5714dedb25ac0d8aee4ad826)', 'ObjectId(5714dedb25ac0d8aee4ad829)', 'ObjectId(5714dedb25ac0d8aee4ad82d)', 'ObjectId(5714dedb25ac0d8aee4ad82f)', 'ObjectId(5714dedb25ac0d8aee4ad830)', 'ObjectId(5

For each parent genre category, find the list of artists that have at least 100 songs:

In [5]:
_100_plus = dict()
for parent_genre_label in list(set(list(children_to_parent_genre_dict.values()))):
    _100_plus[parent_genre_label] = list()
del _100_plus['Rest'] # we'll discard the 'Rest' genre

for parent_genre, artists in parent_genre_collection_dict.items():
    for artist in artists.keys():
        if len(artists[artist]) >= 100:
            _100_plus[parent_genre].append(artist)
    

In [6]:
for genre_label, artist_list in _100_plus.items():
    print(genre_label, "has", len(artist_list), "artists with more than 100 songs in the dataset.\n")

Jazz has 10 artists with more than 100 songs in the dataset.

Country has 113 artists with more than 100 songs in the dataset.

Hip Hop has 102 artists with more than 100 songs in the dataset.

(Electronic) Dance has 17 artists with more than 100 songs in the dataset.

Folk has 29 artists with more than 100 songs in the dataset.

R&B has 48 artists with more than 100 songs in the dataset.

Gospel&Religious has 24 artists with more than 100 songs in the dataset.

Reggae has 8 artists with more than 100 songs in the dataset.

Pop has 64 artists with more than 100 songs in the dataset.

Heavy Metal has 71 artists with more than 100 songs in the dataset.

Blues has 21 artists with more than 100 songs in the dataset.

Punk has 43 artists with more than 100 songs in the dataset.

Rock has 250 artists with more than 100 songs in the dataset.



According to the counts above, 'Reggae' has only 8 artists with at least 100 songs, so it cannot satisfy our criterion of 10 such artists per genre. Therefore: <br>
1- We'll remove 'Reggae' and work with the remaining genre labels <br>
2- For each remaining genre label, we'll randomly select 10 artists from the set of all eligible artists <br>
3- For each artist, we'll randomly select 100 songs (instances), and form our subset
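
Steps 2 and 3 amount to two rounds of sampling without replacement. A minimal sketch under stated assumptions: `sample_subset` and the toy genre are hypothetical, and this shows only the plain random plan, without any similarity filtering.

```python
import random

def sample_subset(artists_to_songs, n_artists=10, n_songs=100, seed=450):
    """Pick n_artists at random, then n_songs at random for each chosen artist."""
    rng = random.Random(seed)
    # Sort for a reproducible starting order before sampling
    eligible = sorted(a for a, songs in artists_to_songs.items() if len(songs) >= n_songs)
    chosen = rng.sample(eligible, n_artists)  # raises ValueError if too few eligible artists
    return {artist: rng.sample(artists_to_songs[artist], n_songs) for artist in chosen}

# Toy genre: 12 hypothetical artists with 120 songs each
toy_genre = {f"artist_{i}": [f"song_{i}_{j}" for j in range(120)] for i in range(12)}
subset = sample_subset(toy_genre)
print(len(subset), {len(songs) for songs in subset.values()})
```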

In [7]:
# remove 'Reggae'
del _100_plus['Reggae']

!!! Important Point - Even though exact duplicate lyrics were detected and removed earlier, some lyrics still partially overlap. For such songs, we need to carry out a similarity analysis. !!!
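
One measure suited to this kind of analysis is soft cosine similarity, which extends cosine similarity with a term-similarity matrix S: sim(a, b) = aᵀSb / (√(aᵀSa) · √(bᵀSb)). The dense, pure-Python sketch below only illustrates the formula. It is not gensim's implementation, and the toy vocabulary and the 0.8 off-diagonal value are invented.

```python
import math

def bilinear(a, S, b):
    """Compute a^T S b for dense Python lists."""
    return sum(a[i] * S[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))

def soft_cosine(a, b, S):
    """Soft cosine similarity of count vectors a and b under term-similarity matrix S."""
    denom = math.sqrt(bilinear(a, S, a)) * math.sqrt(bilinear(b, S, b))
    return bilinear(a, S, b) / denom if denom > 0 else 0.0

# Toy vocabulary: ["love", "adore", "car"]; "love" and "adore" are treated as related
S = [[1.0, 0.8, 0.0],
     [0.8, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
doc_a = [1, 0, 1]   # "love car"
doc_b = [0, 1, 1]   # "adore car"

print(soft_cosine(doc_a, doc_b, identity))  # plain cosine, about 0.5
print(soft_cosine(doc_a, doc_b, S))         # soft cosine, about 0.9
```

With the identity matrix the measure reduces to ordinary cosine similarity. Related but non-identical words raise the score, which is why it suits lyrics that overlap only partially.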

In [26]:
# import useful gensim packages for creating our bag of words out of the sub corpus we would like to work with
import gensim
from gensim.matutils import softcossim 
from gensim import corpora
import gensim.downloader as api
from gensim.utils import simple_preprocess

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning) 


# download fast text model to create a similarity matrix to be used for similarity calculations
fasttext_model300 = api.load('fasttext-wiki-news-subwords-300')

In [29]:
# gather all lyrics in one list
all_lyrics = list()
for genre_label, artist_list in _100_plus.items():
    for artist in artist_list:
        song_id_list = parent_genre_collection_dict[genre_label][artist]
        for song_id in song_id_list:
            all_lyrics.append(EN_PRGE_metadata_dict[song_id][5])


In [30]:
# tokenize all lyrics and create a token dictionary
lyrics_tokens = [[token for token in simple_preprocess(lyric)] for lyric in all_lyrics]
tokens_dictionary = corpora.Dictionary(lyrics_tokens)
print("The token dictionary created by all examples (including training and test) is:",tokens_dictionary.token2id)



In [31]:
# using the token dictionary, convert all lyrics into tokens with bag of words
all_tokenized = [simple_preprocess(lyric) for lyric in all_lyrics]
corpus = [tokens_dictionary.doc2bow(lyric, allow_update=True) for lyric in all_tokenized]
corpus_word_counts = [[(tokens_dictionary[id], count) for id, count in line] for line in corpus]
print("Some examples of samples written in word counts:\n", corpus_word_counts[0], '\n', corpus_word_counts[1])

Some examples of samples written in word counts:
 [('and', 3), ('as', 1), ('at', 1), ('bee', 2), ('bewitched', 1), ('but', 1), ('crown', 1), ('deep', 1), ('done', 1), ('dream', 1), ('feet', 1), ('found', 1), ('golden', 1), ('ground', 1), ('hand', 1), ('have', 1), ('in', 2), ('land', 1), ('lies', 1), ('long', 1), ('look', 1), ('love', 1), ('lover', 1), ('maybe', 1), ('me', 1), ('my', 1), ('nmy', 1), ('of', 1), ('off', 1), ('one', 1), ('palm', 1), ('re', 1), ('seems', 1), ('she', 1), ('sleepin', 2), ('sweet', 1), ('that', 1), ('the', 4), ('told', 1), ('true', 1), ('walk', 1), ('when', 1), ('with', 2), ('you', 1), ('your', 1)] 
 [('and', 2), ('as', 2), ('but', 2), ('deep', 1), ('in', 1), ('love', 2), ('me', 5), ('of', 2), ('one', 1), ('re', 3), ('that', 1), ('the', 1), ('true', 1), ('when', 1), ('with', 2), ('you', 8), ('always', 1), ('be', 3), ('bet', 1), ('cause', 1), ('cloudy', 1), ('come', 6), ('days', 1), ('don', 1), ('ever', 1), ('fine', 1), ('gonna', 3), ('guess', 1), ('happy', 1),
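
The `(word, count)` pairs printed above are a bag-of-words view of each lyric. The dependency-free sketch below shows the underlying idea of mapping tokens to integer ids and counting them. The `tokenize`, `build_vocab` and `to_bow` helpers are hypothetical, simplified stand-ins, not gensim's `simple_preprocess` or `Dictionary.doc2bow`.

```python
import re
from collections import Counter

def tokenize(text):
    """Simplified tokenizer: lowercase alphabetic runs of two or more letters."""
    return re.findall(r"[a-z]{2,}", text.lower())

def build_vocab(token_lists):
    """Assign each new token the next free integer id."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

def to_bow(tokens, vocab):
    """Return sorted (token_id, count) pairs for the tokens found in vocab."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    return sorted(counts.items())

docs = ["Come on, come on", "You and me, me and you"]  # invented toy lyrics
token_lists = [tokenize(d) for d in docs]
vocab = build_vocab(token_lists)
bows = [to_bow(tokens, vocab) for tokens in token_lists]
print(vocab)
print(bows)
```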

In [32]:
# out of this word count corpus, calculate a similarity matrix with the fasttext model
print("calculating similarity matrix... this may take a while!")
similarity_matrix = fasttext_model300.similarity_matrix(tokens_dictionary, tfidf=None, threshold=0.0, exponent=2.0, nonzero_limit=100)

calculating similarity matrix... this may take a while!


In [126]:
# save the similarity matrix for later use
def writecsPickle(Variable, fname):
    filename = fname +".pkl"
    f = open("cosine_model_pickle_vars/"+filename, 'wb')
    pickle.dump(Variable, f)
    f.close()
writecsPickle(similarity_matrix, "similarity_matrix_for_dataset_formation")

In designing the song selection, we have to take two points into account: <br>
1- Some artists appear under more than one genre

In [92]:
artist_appearances = list()
for artist in _100_plus.values():
    artist_appearances.extend(artist)
multiples = list()
for artist in artist_appearances:
    if artist_appearances.count(artist) > 1:
        print(artist, artist_appearances.count(artist))
        multiples.append(artist)
multiples = set(multiples)
print(multiples)

The Supremes 2
The Supremes 2
Chris Rea 2
Chris Rea 2
{'Chris Rea', 'The Supremes'}
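
The same check can be done in a single pass with `collections.Counter`, which reports each artist once and avoids repeated `list.count` calls. A minimal sketch: the function name and the toy genre-to-artist mapping are invented for illustration.

```python
from collections import Counter

def artists_in_multiple_genres(genre_to_artists):
    """Return the set of artists listed under more than one genre."""
    # set(artists) guards against an artist being listed twice within one genre
    counts = Counter(a for artists in genre_to_artists.values() for a in set(artists))
    return {artist for artist, n in counts.items() if n > 1}

# Hypothetical toy mapping
toy = {
    "genre_a": ["artist_x", "artist_y"],
    "genre_b": ["artist_x", "artist_z"],
    "genre_c": ["artist_w"],
}
print(artists_in_multiple_genres(toy))
```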


2- Jazz has exactly 10 unique artists with at least 100 songs. Therefore, if we'd like to keep Jazz, we have to accept all of its artists without applying any further criteria <br>
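
The selection itself can be viewed as a greedy filter with a gradually relaxed similarity threshold. A candidate song is accepted only if it is not too similar to any song already accepted. If a full pass yields too few songs, the threshold is raised and the pass is repeated. The sketch below illustrates that idea under stated assumptions: `select_diverse` and `jaccard` are hypothetical names, and word-set Jaccard overlap stands in for soft cosine similarity purely to keep the example dependency-free.

```python
def select_diverse(items, similarity, target, start_threshold=0.8, step=0.03):
    """Greedily pick `target` items; relax the threshold after each full pass."""
    if len(items) < target:
        raise ValueError("not enough items to reach the target")
    selected = []   # indices of accepted items, in acceptance order
    max_sims = []   # per accepted item: its highest similarity to earlier picks
    threshold = start_threshold
    while len(selected) < target:
        for idx, item in enumerate(items):
            if len(selected) == target:
                break
            if idx in selected:
                continue  # already accepted in an earlier pass
            sims = [similarity(item, items[j]) for j in selected]
            if all(s <= threshold for s in sims):
                selected.append(idx)
                if sims:
                    max_sims.append(max(sims))
        threshold += step  # too few items so far: tolerate slightly more similar ones
    return [items[i] for i in selected], max(max_sims, default=0.0)

def jaccard(a, b):
    """Word-set overlap between two non-empty strings (toy similarity)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

lyrics = ["la la love", "la la love you", "rain on me", "sun is up"]  # invented
picked, worst = select_diverse(lyrics, jaccard, target=3, start_threshold=0.5)
print(picked, worst)
```

The second return value is the largest similarity among the accepted items. A caller can compare it against a cut-off to decide whether to keep or drop the artist.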

In [94]:
# now, we'll select our songs with the basis of similarity criteria
import random
random.seed(450) # use the same seed to generate same order each time
subset_song_id_list = list() # a list for all the song (object) ids that will be evaluated in the model
max_sim_dict = dict() # for each selected artist, the maximum similarity value among their selected songs

for genre_label, artist_list in _100_plus.items(): # we'd like to loop over each genre label and artists associated with it
    random.shuffle(artist_list) # shuffle the artist list with the constant seed
    artist_count = 0
    for artist in artist_list:
        if artist_count == 10:
            break
        if artist in multiples: # skip artists that belong to multiple genres
            continue
        
        sim_thres = 0.8 # for each artist, we'll start with a similarity threshold of 0.8
        selected_song_ids = list() # also for each artist, we'll have a selected song ids list that will be filled until its count is 100
        selected_song_lyrics = list() # also for each artist, we'll have a selected song lyrics list that will be filled until its count is 100
        similarities = list() # for each artist, we'll keep a record of similarity values among songs
        print(artist, "from the", genre_label, "genre is being processed!")
        artist_songs = parent_genre_collection_dict[genre_label][artist] # get artist song ids
        lyrics = [EN_PRGE_metadata_dict[song_id][5] for song_id in artist_songs] # by song ids, get a list of actual lyrics
        while len(selected_song_ids) < 100: # as long as the number of songs selected per artist is below 100
            for lyric, song_id in zip(lyrics, artist_songs):
                if song_id in selected_song_ids: # if that song_id has already been accepted to the list, skip it
                    continue
                if len(selected_song_ids) == 0:
                    selected_song_ids.append(song_id)
                    selected_song_lyrics.append(lyric)
                else:
                    print(len(selected_song_ids))
                    if len(selected_song_ids) == 100:
                        break
                    sim_log = list()
                    for already in selected_song_lyrics: # iterate over lyrics that have already been chosen
                        # tokenize both lyric samples separately
                        already_tokenized = tokens_dictionary.doc2bow(simple_preprocess(already), allow_update=True)
                        lyric_tokenized = tokens_dictionary.doc2bow(simple_preprocess(lyric), allow_update=True)
                        # check their similarity value
                        sim_val = softcossim(lyric_tokenized, already_tokenized, similarity_matrix)
                        
                        # if we encounter any song that has already been added and has a huge similarity with the candidate song, we break the loop and continue with the next song candidate
                        if sim_val > sim_thres:
                            break
                        else:
                            sim_log.append(sim_val)
                    if len(sim_log) == len(selected_song_ids): # in other words, if the loop hasn't been broken
                        selected_song_ids.append(song_id)
                        selected_song_lyrics.append(lyric)
                        similarities.append(max(sim_log)) # add the maximum similarity value to the similarities list kept for each artist
            sim_thres += 0.03 # after our complete loop, if we don't have sufficient number of songs per artist \
                             # (i.e while loop continues), we need to increase the threshold and continue with the loop again
        # after the while loop is completed, before proceeding with a new artist, check criteria, then update two major records
        if max(similarities) < 0.88 or genre_label == 'Jazz':
            artist_count += 1
            max_sim_dict[artist] = max(similarities)
            subset_song_id_list.extend(selected_song_ids)
            print(max_sim_dict[artist])
            print(len(subset_song_id_list))
        else: # if the artist has multiple songs with high similarity (sim >= 0.88), then drop this artist
            pass      
        
        
        

Cab Calloway from the Jazz genre is being processed!
[... per-song progress counters omitted ...]
0.8180857083247919
100
Michael Franks from the Jazz genre is being processed!
[... per-song progress counters omitted ...]
0.8151669839839244
[... output truncated ...]
0.8292926713718953
1300
Loretta Lynn from the Country genre is being processed!
[... per-song progress counters omitted ...]
0.7995712997371398
1400
Dwight Yoakam from the Country genre is being processed!
[... output truncated ...]
Esham from the Hip Hop genre is being processed!
[... output truncated ...]
Trina from the Hip Hop genre is being processed!
[... output truncated ...]
Tyga from the Hip Hop genre is being processed!
[... output truncated ...]
Cam'ron from the Hip Hop genre is being processed!
[... output truncated ...]
Necro from the Hip Hop genre is being processed!
[... output truncated ...]
Lloyd Banks from the Hip Hop genre is being processed!
[... output truncated ...]
0.7969758314568866
3900
Leæther Strip from the (Electronic) Dance genre is being processed!
[... per-song progress counters omitted ...]
0.8213028177252728
4000
Joan Baez from the Folk genre is being processed!
[... output truncated ...]
0.8571006128469085
5000
James Brown from the R&B genre is being processed!
[... output truncated ...]
Lloyd from the R&B genre is being processed!
[... output truncated ...]
0.8279624591343646
7500
Bobby Darin from the Pop genre is being processed!
[... per-song progress counters omitted ...]
0.799664634294115
7600
Avril Lavigne from the Pop genre is being processed!
[... output truncated ...]
0.7990649994350998
8500
Brainstorm from the Heavy Metal genre is being processed!
[... per-song progress counters omitted ...]
19
19
20
20
20
20
21
21
22
22
22
22
23
23
23
23
23
23
23
24
25
26
27
27
27
27
27
28
29
30
30
30
30
30
30
30
30
31
31
31
32
32
32
32
33
33
34
34
34
35
36
36
37
37
38
38
38
38
38
38
38
38
39
39
39
39
39
40
40
41
41
42
42
42
43
43
43
44
45
45
46
46
47
47
47
47
47
47
47
47
47
48
48
48
48
48
49
49
49
49
49
49
49
49
49
49
50
50
50
50
50
51
51
51
51
51
51
51
51
51
51
51
51
52
52
52
52
52
52
52
52
52
52
52
52
52
52
52
52
52
52
52
52
53
54
54
54


31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
56
57
58
59
59
60
60
61
62
63
64
65
66
67
68
68
69
70
71
72
72
72
73
74
75
76
77
78
79
80
81
82
83
84
85
85
86
86
87
87
88
88
89
90
91
91
91
92
92
92
92
92
92
92
92
92
92
92
92
92
92
92
92
92
92
92
92
92
92
93
93
93
93
93
93
93
93
93
93
93
93
93
93
93
93
93
93
93
93
93
93
93
93
93
93
93
93
93
93
93
93
93
93
93
93
93
94
94
94
94
94
94
95
95
95
95
95
95
95
95
95
96
96
96
96
96
96
96
96
96
96
96
96
96
96
96
96
96
96
96
96
96
96
96
96
97
97
98
98
98
98
98
98
98
98
98
98
98
98
98
98
98
98
98
98
98
98
98
98
98
98
98
98
98
98
98
98
98
98
99
99
99
99
99
99
99
99
99
99
99
99
99
99
99
99
99
99
99
99
99
99
99
99
99
99
99
99
99
99
99
99
99
99
99
99
99
100
0.8388180234189605
9200
Larry Norman from the Blues genre is being processed!
1
2
3
4
5
6
7
7
8
9
9
10
11
12
13
14
14
15
16
17
18
19
20
21
21
22
22
23
23
24
24
25
25
25
25
26
26
26
27
28
28
28
28
29
30
30
31
31
32
33
34
35
35
35
36
37
38
39
40
40
40
41
41
42
43
43
43
44

79
79
79
79
80
80
81
81
81
82
82
83
83
83
83
83
84
84
84
84
84
84
85
85
85
85
85
85
85
85
85
85
85
85
85
85
85
86
86
86
86
86
86
86
86
86
86
86
86
86
86
86
86
86
86
86
86
86
86
86
86
86
86
87
87
87
87
87
87
87
87
88
88
88
88
88
88
88
88
88
88
88
88
88
89
89
89
90
90
90
91
92
92
92
93
93
93
93
93
93
93
94
94
94
94
94
94
94
94
95
95
95
95
95
95
95
95
95
95
95
95
95
95
95
95
95
95
95
95
95
96
96
96
97
97
97
97
97
97
97
98
98
98
99
100
Napalm Death from the Punk genre is being processed!
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
25
26
27
28
29
30
30
31
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
46
47
48
49
49
50
51
52
53
54
55
56
57
58
59
60
61
62
62
63
64
65
66
67
67
68
68
69
70
71
72
72
73
74
75
75
76
77
78
79
80
80
81
82
83
84
85
86
87
88
89
90
91
91
92
93
94
95
96
96
97
98
99
100
0.7985830986472479
10300
Bad Religion from the Punk genre is being processed!
1
2
3
4
4
5
6
7
8
9
10
11
12
13
14
15
16
17
17
18
19
20
21
22
23
23
24
25
26
27
28
28
28
28
29
30
31
3

83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
0.7976097709795029
11100
Asia from the Rock genre is being processed!
1
2
3
3
4
5
6
7
7
8
9
9
10
11
12
13
14
15
16
17
18
19
20
21
21
22
23
24
25
26
27
28
29
29
30
31
31
32
33
34
35
35
35
35
36
37
38
39
39
40
40
41
42
43
43
44
44
45
45
46
47
47
47
48
49
49
50
51
52
52
53
54
55
56
57
58
59
59
60
61
62
63
63
64
65
65
66
67
67
68
69
70
71
72
73
73
74
75
76
76
76
77
77
77
77
77
78
78
78
79
79
79
79
79
79
79
79
79
79
79
79
80
80
80
80
80
80
80
80
80
80
80
80
80
81
81
81
81
81
81
81
81
81
82
82
82
83
83
83
83
83
83
84
84
84
84
84
84
85
85
86
86
86
86
86
86
87
87
87
87
87
88
89
89
89
90
90
90
90
91
91
91
91
91
91
91
91
92
92
92
92
92
92
92
92
92
92
92
92
92
92
92
92
92
92
92
92
92
92
93
93
93
94
95
95
95
95
95
95
95
95
95
95
96
96
96
96
96
97
97
97
97
97
97
97
97
97
97
97
97
97
97
97
97
97
97
97
97
97
97
97
97
97
97
97
97
97
97
97
97
97
97
97
98
98
98
98
98
98
98
98
98
98
98
99
99
99
99
100
0.8556129962290832
11200
Van Morrison from the Ro

3- Some songs appear more than once in the selected subset. We should detect those and replace them with songs that are not already in the subset

In [100]:
# detect and record duplicates
duplicate_ids = list()
for ids in subset_song_id_list:
    if subset_song_id_list.count(ids) > 1:
        duplicate_ids.append(ids)
duplicate_ids = set(duplicate_ids)
print(duplicate_ids)

{'ObjectId(5714dec725ac0d8aee3b3749)', 'ObjectId(5714dec725ac0d8aee3b374d)', 'ObjectId(5714dec725ac0d8aee3b374e)', 'ObjectId(5714dec725ac0d8aee3b374b)'}
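
The same duplicates can also be found in a single pass with `collections.Counter`, which avoids calling `list.count` once per element. This is only a minimal sketch; `find_duplicates` and `example_ids` are hypothetical names with made-up ids, not part of the notebook.

```python
from collections import Counter

def find_duplicates(ids):
    """Return the set of ids that occur more than once in `ids`."""
    counts = Counter(ids)
    return {item for item, n in counts.items() if n > 1}

# hypothetical ids, for illustration only
example_ids = ["id_a", "id_b", "id_a", "id_c", "id_b", "id_d"]
duplicates = find_duplicates(example_ids)
print(sorted(duplicates))
```

In the notebook this would presumably be called as `find_duplicates(subset_song_id_list)`.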


In [104]:
# find the owners of these songs
for song_id in duplicate_ids:
    print(EN_PRGE_metadata_dict[song_id][2])

Billie Holiday
Billie Holiday
Billie Holiday
Billie Holiday


In [109]:
# the problematic artist seems to be Billie Holiday. find all of her song ids in the collection,
# remove those items, and then randomly select 100 songs from Billie Holiday and append them
print(len(subset_song_id_list))
for song_id in set(parent_genre_collection_dict["Jazz"]["Billie Holiday"]):
    try:
        subset_song_id_list.remove(song_id)
        print(len(subset_song_id_list), "when song with id", song_id, "removed")
    except ValueError:  # list.remove raises ValueError when the id is absent
        print(song_id, "is not in our collection")
billie_holiday = random.sample(parent_genre_collection_dict["Jazz"]["Billie Holiday"], 100)
subset_song_id_list.extend(billie_holiday)
print(len(subset_song_id_list))

12000
11999 when song with id ObjectId(5714dec725ac0d8aee3b3ad9) removed
11998 when song with id ObjectId(5714dec725ac0d8aee3b3aca) removed
11997 when song with id ObjectId(5714dec725ac0d8aee3b374d) removed
11996 when song with id ObjectId(5714dec725ac0d8aee3b39b6) removed
11995 when song with id ObjectId(5714dec725ac0d8aee3b378c) removed
11994 when song with id ObjectId(5714dec725ac0d8aee3b385f) removed
11993 when song with id ObjectId(5714dec725ac0d8aee3b3969) removed
11992 when song with id ObjectId(5714dec725ac0d8aee3b3abb) removed
11991 when song with id ObjectId(5714dec725ac0d8aee3b375f) removed
ObjectId(5714dec725ac0d8aee3b3acf) is not in our collection
11990 when song with id ObjectId(5714dec725ac0d8aee3b37d5) removed
11989 when song with id ObjectId(5714dec725ac0d8aee3b3abc) removed
11988 when song with id ObjectId(5714dec725ac0d8aee3b395c) removed
11987 when song with id ObjectId(5714dec725ac0d8aee3b3916) removed
11986 when song with id ObjectId(5714dec725ac0d8aee3b3adc) remo

Manual inspection shows that Billie Holiday now has 4 extra songs, 3 of which are duplicates. Remove the 3 duplicates plus 1 randomly selected song manually.

In [113]:
subset_song_id_list.remove('ObjectId(5714dec725ac0d8aee3b3749)')
subset_song_id_list.remove('ObjectId(5714dec725ac0d8aee3b374d)')
subset_song_id_list.remove('ObjectId(5714dec725ac0d8aee3b374b)')
subset_song_id_list.remove('ObjectId(5714dec725ac0d8aee3b3ab7)')

Finally, check that we have 100 songs per artist in the dataset, and a total of 120 artists

In [114]:
counter_dict = dict()
for ids in subset_song_id_list:
    artist = EN_PRGE_metadata_dict[ids][2]
    # start from 0 the first time an artist is seen
    counter_dict[artist] = counter_dict.get(artist, 0) + 1
print(counter_dict)
print(len(counter_dict))

{'Cab Calloway': 100, 'Michael Franks': 100, 'Billie Holiday': 100, 'Paul Anka': 100, 'The Manhattan Transfer': 100, 'Sarah Vaughan': 100, 'Diana Krall': 100, 'Tony Bennett': 100, 'Al Jarreau': 100, 'Ella Fitzgerald': 100, 'Ferlin Husky': 100, 'Reba McEntire': 100, 'Trisha Yearwood': 100, 'Loretta Lynn': 100, 'Dwight Yoakam': 100, 'Jim Ed Brown': 100, 'Freddie Hart': 100, 'The Oak Ridge Boys': 100, 'David Allan Coe': 100, 'George Strait': 100, 'Jay-Z': 100, 'Kanye West': 100, 'T.I.': 100, 'Three 6 Mafia': 100, 'Tyga': 100, 'Ludacris': 100, '50 Cent': 100, 'Wiz Khalifa': 100, 'Rick Ross': 100, 'Canibus': 100, 'Björk': 100, 'Brave Combo': 100, 'Tricky': 100, 'The Jackson 5': 100, 'Wumpscut': 100, 'Skinny Puppy': 100, 'Praga Khan': 100, 'Bobby O': 100, 'Joy Electric': 100, 'Leæther Strip': 100, 'Joan Baez': 100, 'Greg Brown': 100, "Daniel O'Donnell": 100, 'Judy Collins': 100, 'Devendra Banhart': 100, 'Bruce Cockburn': 100, 'Pete Seeger': 100, 'Josh Rouse': 100, 'Gordon Lightfoot': 100, 'A
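
Reading 120 counts by eye is error-prone, so the check can also be automated. The sketch below is an assumption about how one might do it: `check_subset` and the toy data are hypothetical and do not come from the notebook.

```python
from collections import Counter

def check_subset(artist_per_song, n_artists, songs_per_artist):
    """Return {artist: count} for artists whose count differs from
    `songs_per_artist`; raise if the number of distinct artists is wrong."""
    counts = Counter(artist_per_song)
    if len(counts) != n_artists:
        raise ValueError(f"expected {n_artists} artists, found {len(counts)}")
    return {a: n for a, n in counts.items() if n != songs_per_artist}

# hypothetical toy data: 2 artists, 3 songs expected for each
toy = ["Artist A"] * 3 + ["Artist B"] * 4
offenders = check_subset(toy, n_artists=2, songs_per_artist=3)
print(offenders)
```

For the real subset, the first argument would be `[EN_PRGE_metadata_dict[i][2] for i in subset_song_id_list]`, with `n_artists=120` and `songs_per_artist=100`; an empty result means every artist has exactly 100 songs.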

To finalize our subset, let's form a dictionary that takes song_id's as its keys, and a list of ['artist_name', 'genre', 'lyrics'] as its values.

In [116]:
sub_dataset = dict()
for song_id in subset_song_id_list:
    song_info = EN_PRGE_metadata_dict[song_id]
    # [artist_name, parent genre, lyrics]
    sub_dataset[song_id] = [song_info[2], children_to_parent_genre_dict[song_info[1]], song_info[5]]

In [117]:
# some examples
print(list(sub_dataset.items())[1100:1103])

[('ObjectId(5714dee125ac0d8aee4f0fdc)', ['Reba McEntire', 'Country', "She saw it in the window, just a callin' out her name.\nShe mowed the grass, took out the trash and saved, saved, saved.\nShe bought it on a Monday, had a gig on Friday night.\nIn the garage, in front of her mom, she came alive!\n\nShe likes to play, she loves to rock.\nYeah, she's closer to the bottom but she's headed for the top.\nShe's got a dream to be a star dressed in black like Johnny Cash, with a pink guitar.\n\nNo, she didn't go to college, she just up and hit the road.\nWhere ever they were jamming she would go go go.\nAnd every single hole in the wall from here to Shreveport, she'd have them in the palm of her hands, screamin' for more!\n\nChrous:\nShe likes to play, she loves to rock!\nYeah, she's closer to the bottom but she's headed for the top.\nShe's got a dream to be a star dressed in black like Johnny Cash, with a pink guitar.\n\nSome day you're gonna see her up there on the Opry stage.\nAnd soon yo

In [118]:
# write the sub_dataset dictionary to a pickle file
writePickle(sub_dataset, "sub_dataset")

Next, we'll convert our dictionary to a csv file

In [119]:
sub_dataset = readPickle("sub_dataset")

In [120]:
# start with a dictionary that maps ids to lyrics
ids2lyrics = dict((ids, info[2]) for ids, info in sub_dataset.items())

In [121]:
# collect two separate sorted lists of unique artist and genre labels
unique_artist_list = sorted({info[0] for info in sub_dataset.values()})
unique_genre_list = sorted({info[1] for info in sub_dataset.values()})

# create 2 dictionaries that map artist labels to integer values (starting from 1), and vice versa
artist2id = {a: i for i, a in enumerate(unique_artist_list, start=1)}
id2artist = {i: a for i, a in enumerate(unique_artist_list, start=1)}

# the same for genre labels
genre2id = {g: i for i, g in enumerate(unique_genre_list, start=1)}
id2genre = {i: g for i, g in enumerate(unique_genre_list, start=1)}

In [122]:
writePickle(artist2id, "artist2id")
writePickle(id2artist, "id2artist")
writePickle(genre2id, "genre2id")
writePickle(id2genre, "id2genre")

In [123]:
# continue with another dictionary that maps each artist label to the list of that artist's song ids in the dataset
artist_to_song_ids = dict((artist, []) for artist in unique_artist_list)
for ids, info in sub_dataset.items():
    artist_to_song_ids[info[0]].append(ids)

Now, we'll use a function to write our complete sub-dataset into a csv file. <br>
Each row of this csv has the format: artist_label_index, genre_label_index, lyrics

In [124]:
import csv

# write the complete sub-dataset to a single csv file, one row per song
# (utf-8 is set explicitly so that non-ASCII characters in the lyrics are written the same way on every platform)
with open('sub_dataset.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    for artist_label, id_list in artist_to_song_ids.items():
        for ids in id_list:
            writer.writerow([artist2id[artist_label], genre2id[sub_dataset[ids][1]], ids2lyrics[ids]])
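
Lyrics contain commas, quotes and line breaks, so it is worth confirming that such rows survive a write and read cycle. The sketch below uses an in-memory buffer and made-up rows (both are assumptions for illustration) in the same `artist_label_index, genre_label_index, lyrics` layout.

```python
import csv
import io

# hypothetical rows: [artist_label_index, genre_label_index, lyrics]
rows = [
    [1, 2, "first line, with a comma\nsecond line"],
    [3, 1, 'a "quoted" word'],
]

# newline="" keeps the line endings exactly as csv.writer emits them
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)

# read the rows back; csv.reader returns every field as a string
buffer.seek(0)
restored = [[int(a), int(g), lyrics] for a, g, lyrics in csv.reader(buffer)]
print(restored == rows)
```

The `csv` module quotes fields that contain delimiters, quote characters or line breaks, which is why a multi-line lyric still comes back as a single field.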


In [125]:
# In other files, use the following helper function to retrieve the complete sub_dataset from the csv file above

import numpy as np
import pandas as pd
from keras.utils import to_categorical  # keras.utils.np_utils is not available in recent Keras releases

def load_data():
    """Return (lyrics, one-hot artist labels, one-hot genre labels) read from sub_dataset.csv."""
    data = pd.read_csv('sub_dataset.csv', header=None)
    data = data.dropna()

    # column 2 holds the lyrics
    x = np.array(data[2])

    # labels are stored 1-based in the csv; shift them to 0-based before one-hot encoding
    y_artist = to_categorical(data[0] - 1)
    y_genre = to_categorical(data[1] - 1)

    return (x, y_artist, y_genre)
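

The 80-10-10 allocation per artist that this notebook aims for can be applied to the loaded data in several ways. The following is one possible sketch, not the method used in the later notebooks: `split_per_artist`, its `seed` parameter and the toy labels are all hypothetical.

```python
import numpy as np

def split_per_artist(artist_ids, fractions=(0.8, 0.1, 0.1), seed=0):
    """Return (train_idx, eval_idx, test_idx) so that each artist's songs
    are shuffled and then divided according to `fractions`."""
    rng = np.random.default_rng(seed)
    artist_ids = np.asarray(artist_ids)
    train, evaluation, test = [], [], []
    for artist in np.unique(artist_ids):
        # shuffled positions of this artist's songs
        idx = rng.permutation(np.flatnonzero(artist_ids == artist))
        n_train = int(round(fractions[0] * len(idx)))
        n_eval = int(round(fractions[1] * len(idx)))
        train.extend(idx[:n_train])
        evaluation.extend(idx[n_train:n_train + n_eval])
        test.extend(idx[n_train + n_eval:])  # whatever remains
    return np.array(train), np.array(evaluation), np.array(test)

# hypothetical integer labels: 3 artists with 100 songs each
labels = np.repeat([0, 1, 2], 100)
train_idx, eval_idx, test_idx = split_per_artist(labels)
print(len(train_idx), len(eval_idx), len(test_idx))
```

With the output of `load_data()`, integer artist labels can be recovered with `y_artist.argmax(axis=1)`, and the returned index arrays can then be used to slice `x`, `y_artist` and `y_genre`.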