In [94]:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import pairwise_distances

pd.set_option('display.max_colwidth', 0)

In [49]:
annotations = pd.read_csv('annotations10k.tsv', sep='\t')

In [50]:
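# Hold out 20% of the lyric/annotation pairs for validation.
# random_state is fixed only so the split is reproducible across runs.
lyric_train, lyric_val, y_train, y_val = train_test_split(
    annotations.lyric, annotations.annotation, test_size=0.2, random_state=0)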

In [106]:
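# Learn the TF-IDF vocabulary and IDF weights from the training lyrics only.
vectorizer = TfidfVectorizer(stop_words='english', strip_accents='unicode')
# The returned matrix is only displayed here; the next cell recomputes it with transform().
vectorizer.fit_transform(lyric_train)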

<7868x10459 sparse matrix of type '<class 'numpy.float64'>'
	with 54490 stored elements in Compressed Sparse Row format>

In [107]:
X_train, X_val = vectorizer.transform(lyric_train), vectorizer.transform(lyric_val)

In [108]:
d = pairwise_distances(X_val, X_train, metric='cosine')
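For reference, the cosine metric used above is one minus the cosine similarity, so d[r, c] is near 0 when validation lyric r and training lyric c share heavily weighted terms and 1 when they share none. The sketch below checks this on small dense arrays; the arrays `a` and `b` are made-up illustration values, not data from this notebook.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

# Hypothetical 2-D vectors standing in for TF-IDF rows.
a = np.array([[1.0, 0.0], [1.0, 1.0]])
b = np.array([[2.0, 0.0], [0.0, 3.0]])

# Cosine distance computed by hand: 1 - (a . b) / (|a| * |b|).
norms = np.linalg.norm(a, axis=1)[:, None] * np.linalg.norm(b, axis=1)[None, :]
manual = 1.0 - (a @ b.T) / norms

# The same quantity from scikit-learn.
dist = pairwise_distances(a, b, metric='cosine')

print(np.allclose(manual, dist))
```

Cosine distance ignores vector length, so `a[0]` and `b[0]` are at distance 0 even though their magnitudes differ.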

In [109]:
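# For each validation lyric, the row position of its closest training lyric in X_train.
i = np.argmin(d, axis=1)
# i holds positions within the training split, not index labels of `annotations`,
# so look the rows up positionally in lyric_train / y_train.
neighbors, predictions = lyric_train.iloc[i], y_train.iloc[i]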
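The argmin result is a row position within the training matrix, so it matters which object it is looked up against. The following is a minimal sketch on a made-up four-row frame (the `toy` data and all names in it are hypothetical, not taken from annotations10k.tsv) showing that a positional lookup on the training split returns the true nearest neighbor, while a label lookup on the full frame can return an unrelated row.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import pairwise_distances

# Hypothetical toy data with the same two columns as the notebook's frame.
toy = pd.DataFrame({
    'lyric': ['all about the money', 'church on sunday morning',
              'ride in my car', 'money and cars'],
    'annotation': ['about money', 'about church', 'about driving', 'about wealth'],
})

# A shuffled "training" split (index labels 3, 1, 2) and a one-row validation split.
toy_train = toy.iloc[[3, 1, 2]]
toy_val = toy.iloc[[0]]

vec = TfidfVectorizer()
A_train = vec.fit_transform(toy_train.lyric)
A_val = vec.transform(toy_val.lyric)

# Position (within A_train) of the closest training lyric for each validation lyric.
idx = np.argmin(pairwise_distances(A_val, A_train, metric='cosine'), axis=1)

# Positional lookup on the training split: the actual nearest neighbor.
by_position = toy_train.lyric.iloc[idx]
# Label lookup on the full frame: a different row (here the validation row itself).
by_label = toy.lyric.loc[idx]

print(by_position.values, by_label.values)
```

In this toy case the only term shared with the validation lyric is "money", so the nearest training lyric is "money and cars"; the label-based lookup instead returns the validation lyric itself, because label 0 in the full frame is not row 0 of the shuffled training split.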

In [110]:
results = pd.DataFrame({
    'lyric': lyric_val.values,
    'real_annotation': y_val.values,
    'nearest_neighbor': neighbors.values,
    'predicted_annotation': predictions.values,
})

In [112]:
results.sample(5)

Unnamed: 0,lyric,real_annotation,nearest_neighbor,predicted_annotation
679,About these rappers that I came after when they was boring,"Lucy has the ability to upgrade “boring” rappers, taking them from mediocrity to mega status. All that is required is you sell your soul…",But I had them singles though,"“ Jesus Walks ”, “ Diamonds ”, “ Through the Wire ”, etc."
909,"Makin' sure my punctuation curve, every letter here's true Livin' my life in the margin and that metaphor was proof","Kendrick poetically justifies why his writing is vital to hip-hop. To explain this, he sneaks 4 literary mechanisms into the inner-workings of his rhymes (punctuation, letter, margin, metaphor):","I can't call it,","Since in the next line he says ‘I got the swerve like alcoholics’, it could mean that he can’t call what will happen in his life, or predict what happens, meaning his life ‘swerves’, or makes a sudden turn. After all, it was a shocking turn of events that occurred in a second that changed Kanye’s life, and let us hear this masterpiece…"
62,Had my life threatened by best friends with selfish intents What I'm supposed to do? Ride around with a bulletproof car and some tints?,"Kanye mentions a stolen laptop in this song and in “Real Friends.” The contents are potentially life threatening, with personal information, dates, phone numbers and locations all present on the hard drive. Kanye has purchased bulletproof and armoured cars for the safety of his family.",Church is definitely on the move And we gonna continue to hustle and grow and develop by far,"This is a movement about continuing to work hard towards your goal. GLC makes an allusion back to his Gangster Disciple roots and the book that started the whole movement, “The Blueprint: From Gangster Disciple to Growth and Development.”"
1431,I don’t give a butt about no politics in rap,"“Politics” here means rap politics (i.e. gossip, rap beef, etc.), national and global politics (i.e a song like “The Blacker The Berry” . He expressed a similar disdain for rap politics on “The Heart, Pt. 2” : [BLOCKQUOTE]",I took a nap in the pulpit I never like how a suit fit,"The pulpit is the area in a Church where sermons are given and pastors preach to their congregations. Chainz is an incredibly religious person, so sleeping in the pulpit could be his apology for waiting nearly 4 years to do another collaboration with Kanye. It’s standard to wear a suit to church, although it’s rare to see 2 Chainz wearing one. He stands nearly 2 metres tall, making it difficult to tailor one to his frame."
841,"But, it'll be YOUR money No more borrowing from mom for my high!","Even though you still waste money on stupidity like getting high at least the one paying for the high now and you don’t have to trouble your mom with it. It brings a sense of accomplishment with it, even though in the end it is still wrong.","Anthony was the oldest of seven Well-respected, calm and collected Laughin' and jokin' made life easier; hard times, Momma on crack","Anthony “Top Dawg” Tiffith is the founder of , the label Kendrick signed to as a 16-year-old that he’s stuck with since."
