### Named entity recognition counting tests

Source tutorial:
https://www.wisecube.ai/blog/named-entity-recognition-ner-with-python/

Entity category list taken from:
https://www.kaggle.com/code/curiousprogrammer/entity-extraction-and-classification-using-spacy

In [1]:
import pandas as pd
import spacy

In [2]:
nlp = spacy.load("en_core_web_sm")

In [3]:
ner_categories = ["PERSON", "NORP", "ORG", "GPE", "PRODUCT", "WORK_OF_ART", "EVENT",
                  "TIME", "MONEY", "QUANTITY", "ORDINAL", "CARDINAL", "FAC"]

In [4]:
df_path = "dataframes/genres/combined.csv"
df = pd.read_csv(df_path, index_col=0)
df.head()

| Unnamed: 0 | Artist | Song Title | Full Title | Release Date | Year | Month | Day | Pageviews | url | featured_count | producer_count | writer_count | Song Lyrics | Artist Image |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aaliyah | Are You That Somebody? | Are You That Somebody? by Aaliyah (Ft. Timbaland) | 1998-05-26 | 1998.0 | 5.0 | 26.0 | 373960 | https://genius.com/Aaliyah-are-you-that-somebo... | 1 | 1 | 2 | 79 ContributorsTranslationsFrançaisAre You Tha... | https://images.genius.com/3fea34947a97beb226fc... |
| 1 | Aaliyah | Enough Said | Enough Said by Aaliyah (Ft. Drake) | 2012-08-05 | 2012.0 | 8.0 | 5.0 | 316333 | https://genius.com/Aaliyah-enough-said-lyrics | 1 | 1 | 3 | 64 ContributorsEnough Said Lyrics\n(Uh) 'Cause... | https://images.genius.com/3fea34947a97beb226fc... |
| 2 | Aaliyah | At Your Best (You Are Love) | At Your Best (You Are Love) by Aaliyah | 1994-08-22 | 1994.0 | 8.0 | 22.0 | 285549 | https://genius.com/Aaliyah-at-your-best-you-ar... | 0 | 1 | 6 | 57 ContributorsAt Your Best (You Are Love) Lyr... | https://images.genius.com/3fea34947a97beb226fc... |
| 3 | Aaliyah | Miss You | Miss You by Aaliyah | 2002-11-16 | 2002.0 | 11.0 | 16.0 | 245608 | https://genius.com/Aaliyah-miss-you-lyrics | 0 | 1 | 3 | 36 ContributorsMiss You Lyrics\nOh, hey\nYeah-... | https://images.genius.com/3fea34947a97beb226fc... |
| 4 | Aaliyah | Age Ain’t Nothing But a Number | Age Ain't Nothing But a Number by Aaliyah | 1994-12-06 | 1994.0 | 12.0 | 6.0 | 207419 | https://genius.com/Aaliyah-age-aint-nothing-bu... | 0 | 1 | 1 | 54 ContributorsAge Ain’t Nothing But a Number ... | https://images.genius.com/3fea34947a97beb226fc... |


#### Save Artists with their summed lyrics to a separate df file for ease of use

In [7]:
artists = list(df['Artist'].unique())

In [12]:
# Concatenate every song's lyrics into one string per artist.
# The head() output above shows no genre or gender column in df,
# so those two fields are filled in further down from lyrics_combined.csv.
rows = []
for artist in artists:
    artist_df = df[df['Artist'] == artist]
    artist_lyrics = ' '.join(artist_df['Song Lyrics'].str.strip())
    rows.append({'Artist': artist, 'all_lyrics': artist_lyrics})

# Build the frame once instead of calling pd.concat inside the loop;
# genre and gender start out empty (NaN).
all_lyrics_df = pd.DataFrame(rows, columns=['Artist', 'all_lyrics', 'genre', 'gender'])

In [14]:
all_lyrics_df.reset_index(inplace=True, drop=True)

In [16]:
all_lyrics_df.to_csv('dataframes/lyrics_combined_unclean.csv')

In [29]:
clean_lyrics_df = pd.read_csv('dataframes/lyrics_combined.csv', index_col=0)

In [28]:
# clean_lyrics_df.reset_index(inplace=True, drop=True)

In [27]:
# clean_lyrics_df.to_csv('dataframes/lyrics_combined.csv')

In [31]:
# clean_lyrics_df['genre']

In [32]:
all_lyrics_df['genre'] = clean_lyrics_df['genre']
all_lyrics_df['gender'] = clean_lyrics_df['gender']
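
The two assignments above align rows by index position, so the labels are only correct if both files list the artists in the same order. A safer variant keys the join on the artist name. The sketch below assumes the cleaned file also carries an `Artist` column, which the notebook does not confirm. `attach_labels`, the second artist name, and the genre and gender values are hypothetical.

```python
import pandas as pd

def attach_labels(lyrics_df, labels_df, key="Artist", label_cols=("genre", "gender")):
    """Left-join label columns onto lyrics_df by artist name (assumed key column)."""
    # Keep one label row per artist so the join cannot duplicate lyrics rows.
    labels = labels_df[[key, *label_cols]].drop_duplicates(subset=key)
    # Drop any empty placeholder label columns before merging.
    base = lyrics_df.drop(columns=[c for c in label_cols if c in lyrics_df.columns])
    return base.merge(labels, on=key, how="left", validate="many_to_one")

# Toy frames with made-up values; note the label rows are in a different order.
demo_lyrics = pd.DataFrame({
    "Artist": ["Aaliyah", "Artist B"],
    "all_lyrics": ["first lyrics", "second lyrics"],
    "genre": [None, None],
    "gender": [None, None],
})
demo_labels = pd.DataFrame({
    "Artist": ["Artist B", "Aaliyah"],
    "genre": ["genre_b", "genre_a"],
    "gender": ["gender_b", "gender_a"],
})
demo_merged = attach_labels(demo_lyrics, demo_labels)
print(demo_merged)
```

With `how="left"` the row order of the lyrics frame is preserved, and an artist missing from the label file ends up with NaN labels instead of silently inheriting another artist's.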

In [33]:
all_lyrics_df.to_csv('dataframes/lyrics_combined_unclean.csv')

In [34]:
# test = df['Song Lyrics']

In [None]:
# test.str.strip()

In [37]:
test_text = all_lyrics_df['all_lyrics'][0]
print(test_text[:250])

79 ContributorsTranslationsFrançaisAre You That Somebody? Lyrics
Dirty South, can y'all really feel me?
East Coast, feel me, West Coast, feel me
Dirty South, can y'all really feel me?
East Coast, feel me, West Coast, feel me
Dirty South, can y'all re
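
The printed text starts with the Genius page header ("79 Contributors...Lyrics"), and the same header recurs at the start of every song in the joined string. That is why fragments such as `79: CARDINAL` and `ContributorsEnough: ORG` appear in the entity output further down. One possible way to strip these headers before running spaCy is a regular expression. This is only a heuristic sketch, not necessarily how `lyrics_combined.csv` was cleaned. `strip_genius_headers` is a hypothetical helper, and the sample string is assembled from fragments shown in this notebook.

```python
import re

# Heuristic pattern for the header Genius prepends to each song:
# "<N> Contributors[Translations...]<Song title> Lyrics"
GENIUS_HEADER = re.compile(r"\s*\d+ Contributors.*? Lyrics\s*")

def strip_genius_headers(text: str) -> str:
    """Replace every header match with a newline and trim both ends."""
    return GENIUS_HEADER.sub("\n", text).strip()

sample = (
    "79 ContributorsTranslationsFrançaisAre You That Somebody? Lyrics\n"
    "Dirty South, can y'all really feel me? "
    "64 ContributorsEnough Said Lyrics\n(Uh) 'Cause"
)
cleaned = strip_genius_headers(sample)
print(cleaned)
```

The pattern only matches when the number, the word "Contributors", and " Lyrics" sit on the same line, so ordinary lyric lines are left alone. A lyric line that happens to have that exact shape would be removed too, so spot-check the result on real data.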


In [38]:
test_doc = nlp(test_text)

In [41]:
entities = []
for ent in test_doc.ents:
    if ent.label_ in ner_categories:
        entities.append((ent.text, ent.label_))

In [42]:
for entity, category in entities:
    print(f"{entity}: {category}")

79: CARDINAL
one: CARDINAL
Timbaland: GPE
Japan: GPE
Aaliyah: GPE
Makin: PERSON
64: CARDINAL
ContributorsEnough: ORG
Baby: PRODUCT
Manchester: GPE
Balotelli: PERSON
150: CARDINAL
150: CARDINAL
Yo: ORG
57: CARDINAL
Haah: GPE
36: CARDINAL
Said: PERSON
Said: PERSON
Said: PERSON
Said: PERSON
Said: PERSON
Said: PERSON
Said: PERSON
54: CARDINAL
Aaliyah: GPE
Throwin: PERSON
Throwin: PERSON
Throwin: PERSON
Throwin: PERSON
tonight: TIME
Throwin: PERSON
Throwin: PERSON
Baby: PRODUCT
Aaliyah: GPE
Throwin: PERSON
Throwin: PERSON
Throwin: PERSON
Throwin: PERSON
Throwin: PERSON
Throwin: PERSON
43: CARDINAL
ContributorsOne: FAC
Love: WORK_OF_ART
Love: WORK_OF_ART
Love: WORK_OF_ART
Love: WORK_OF_ART
Love: WORK_OF_ART
Love: WORK_OF_ART
Love: WORK_OF_ART
Love: WORK_OF_ART
one: CARDINAL
a million: CARDINAL
feelin: GPE
one: CARDINAL
a million: CARDINAL
feelin: GPE
Need: ORG
one: CARDINAL
a million: CARDINAL
feelin: GPE
one: CARDINAL
a million: CARDINAL
feelin: GPE
Anything: GPE
one: CARDINAL
a million: CA
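
The cells above list the entities but stop short of the counting step named in the title. A minimal sketch of that step with `collections.Counter`, using the first few pairs copied from the output above. `count_entity_labels` and `sample_entities` are hypothetical names. In the notebook, the `entities` list built in the earlier cell could be passed in directly.

```python
from collections import Counter

ner_categories = ["PERSON", "NORP", "ORG", "GPE", "PRODUCT", "WORK_OF_ART", "EVENT",
                  "TIME", "MONEY", "QUANTITY", "ORDINAL", "CARDINAL", "FAC"]

def count_entity_labels(entities, categories):
    """Return {label: count} for every category, including labels with zero hits."""
    counts = Counter(label for _text, label in entities)
    return {category: counts.get(category, 0) for category in categories}

# First eleven (text, label) pairs from the printed output above.
sample_entities = [
    ("79", "CARDINAL"), ("one", "CARDINAL"), ("Timbaland", "GPE"),
    ("Japan", "GPE"), ("Aaliyah", "GPE"), ("Makin", "PERSON"),
    ("64", "CARDINAL"), ("ContributorsEnough", "ORG"), ("Baby", "PRODUCT"),
    ("Manchester", "GPE"), ("Balotelli", "PERSON"),
]
label_counts = count_entity_labels(sample_entities, ner_categories)
print(label_counts)
```

These counts inherit the tagger's mistakes: in the output above "Aaliyah" and "Timbaland" are labelled `GPE` and "Throwin" is labelled `PERSON`, so raw per-label totals on lyrics should be read with caution.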

In [43]:
spacy.displacy.render(test_doc, style="ent")