## Summary
This notebook turns a dataset of song lyrics into a multi-task training set for a model that learns both grapheme-phoneme conversion and couplet generation. Starting from a .csv file, it cleans the lyrics, splits them into lines, and keeps pairs of lines whose final words form a perfect rhyme. Each rhyming couplet is then phonemized. The final dataset contains two kinds of task: translating a line between graphemes and phonemes (in both directions), and mapping the first line of a couplet to the second. The intent is that the model learns not only to predict the next line of a couplet but also to pick up the rhythm and meter of the lyrics from their phonemic patterns.

In [1]:
%matplotlib inline

In [2]:
from google.colab import drive
drive.mount('/content/MyDrive')

Mounted at /content/MyDrive


In [3]:
import pandas as pd

lyrics_df = pd.read_csv("/content/MyDrive/MyDrive/NLP Project/lyrics.csv")
print(lyrics_df.head())
lyrics_df = lyrics_df[["Title","Artist", "Lyrics"]]
lyrics_df.rename({"Lyrics": "lyrics_g", "Title": "title", "Artist": "artist"}, axis=1, inplace=True)
print(lyrics_df.head())
lyrics_df.head()
lyrics_df.describe()


                                   Title                   Artist  \
0                   Love The Way You Lie     Eminem feat. Rihanna   
1            Godzilla (feat. Juice WRLD)  Eminem feat. Juice WRLD   
2                            The Monster     Eminem feat. Rihanna   
3                             Not Afraid                   Eminem   
4  Venom (Music From The Motion Picture)                   Eminem   

                                              Lyrics  
0  Just gonna stand there and watch me burn?\nWel...  
1  (Ugh, you're a monster)\n\nI can swallow a bot...  
2  I'm friends with the monster that's under my b...  
3  Yeah, It's been a ride...\nI guess I had to go...  
4  I got a song filled with shit for the strong-w...  
                                   title                   artist  \
0                   Love The Way You Lie     Eminem feat. Rihanna   
1            Godzilla (feat. Juice WRLD)  Eminem feat. Juice WRLD   
2                            The Monster     Emi

Unnamed: 0,title,artist,lyrics_g
count,1204,1204,1204
unique,1059,645,1105
top,WAP (feat. Megan Thee Stallion),Rihanna,It may not mean nothin' to y'all\nBut understa...
freq,4,30,4


## Clean Lyrics

In [4]:
#apply regex to remove "Embed" from end of each lyric
import re
pattern = r"\d+Embed$"

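# regex=True is required in pandas >= 2.0, where str.replace treats the pattern as a literal by default
lyrics_df["lyrics_g"] = lyrics_df["lyrics_g"].str.replace(pattern, "", regex=True)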
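
As a quick check of the cleanup pattern, the sketch below applies it to a made-up string with the standard `re` module. The sample lyric is hypothetical; it only assumes that scraped lyrics end in a number followed by `Embed`, as the pattern implies.

```python
import re

pattern = r"\d+Embed$"

# Hypothetical scraped lyric with the trailing "<number>Embed" residue.
sample = "i love the way you lie42Embed"
cleaned = re.sub(pattern, "", sample)
print(cleaned)

# Text that merely contains the word "embed" elsewhere is left untouched.
untouched = re.sub(pattern, "", "embed me in your heart")
print(untouched)
```

Because the pattern is anchored with `$`, only the residue at the very end of a lyric is removed.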

In [5]:
#convert to lowercase
lyrics_df["lyrics_g"] = lyrics_df["lyrics_g"].apply(lambda x: x.lower())

In [6]:
#drop duplicates
lyrics_df.drop_duplicates(inplace=True)

In [7]:
#drop languages that aren't english
!pip install langdetect
from langdetect import detect

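def detect_lyric(x):
    # langdetect raises on empty or non-linguistic input; treat those lyrics as undetected
    try:
        return detect(x)
    except Exception:
        return None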

lyrics_df["language"] = lyrics_df["lyrics_g"].apply(detect_lyric)
lyrics_df= lyrics_df[lyrics_df.language == "en"]

Collecting langdetect
  Downloading langdetect-1.0.9.tar.gz (981 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: langdetect
  Building wheel for langdetect (setup.py) ... done
  Created wheel for langdetect: filename=langdetect-1.0.9-py3-none-any.whl size=993227 sha256=73cfdbca984d01124c6ab2eccd2e9b8d89dc48688872d71a047f2ccfb9e25a8e
  Stored in directory: /root/.cache/pip/wheels/95/03/7d/59ea870c70ce4e5a370638b5462a7711ab78fba2f655d05106
Successfully built langdetect
Installing collected packages: langdetect
Successfully installed langdetect-1.0.9


In [8]:
#drop Anuel AA, a Spanish-language artist, regardless of the detected language
lyrics_df = lyrics_df[lyrics_df["artist"] != "Anuel AA"]

In [9]:
#drop all empty lyrics
lyrics_df = lyrics_df[lyrics_df["lyrics_g"]!=""]

In [10]:
lyrics = lyrics_df["lyrics_g"].to_list()
print(lyrics[0])

just gonna stand there and watch me burn?
well, that's alright because i like the way it hurts
just gonna stand there and hear me cry?
well, that's alright because i love the way you lie
i love the way you lie

i can't tell you what it really is
i can only tell you what it feels like
and right now, there's a steel knife in my windpipe
i can't breathe, but i still fight while i can fight
as long as the wrong feels right, it's like i'm in flight
high off of love, drunk from my hate
it's like i'm huffing paint, and i love her the more i suffer, i suffocate
and right before i'm about to drown, she resuscitates me
she fuckin' hates me, and i love it

"wait! where you going?" "i'm leaving you"
"no, you ain't, come back"
we're running right back, here we go again
it's so insane 'cause when it's going good, it's going great
i'm superman with the wind at his back, she's lois lane
but when it's bad, it's awful
i feel so ashamed, i snapped, "who's that dude?"
i don't even know his name
i laid han

In [11]:
lyrics = list(map(lambda x: x.replace("::", ":"),lyrics))
print(lyrics[0])

just gonna stand there and watch me burn?
well, that's alright because i like the way it hurts
just gonna stand there and hear me cry?
well, that's alright because i love the way you lie
i love the way you lie

i can't tell you what it really is
i can only tell you what it feels like
and right now, there's a steel knife in my windpipe
i can't breathe, but i still fight while i can fight
as long as the wrong feels right, it's like i'm in flight
high off of love, drunk from my hate
it's like i'm huffing paint, and i love her the more i suffer, i suffocate
and right before i'm about to drown, she resuscitates me
she fuckin' hates me, and i love it

"wait! where you going?" "i'm leaving you"
"no, you ain't, come back"
we're running right back, here we go again
it's so insane 'cause when it's going good, it's going great
i'm superman with the wind at his back, she's lois lane
but when it's bad, it's awful
i feel so ashamed, i snapped, "who's that dude?"
i don't even know his name
i laid han

In [12]:
verses = [lyric.split(":") for lyric in lyrics]

In [13]:
import itertools

verses = list(itertools.chain(*verses))
display(verses)

['just gonna stand there and watch me burn?\nwell, that\'s alright because i like the way it hurts\njust gonna stand there and hear me cry?\nwell, that\'s alright because i love the way you lie\ni love the way you lie\n\ni can\'t tell you what it really is\ni can only tell you what it feels like\nand right now, there\'s a steel knife in my windpipe\ni can\'t breathe, but i still fight while i can fight\nas long as the wrong feels right, it\'s like i\'m in flight\nhigh off of love, drunk from my hate\nit\'s like i\'m huffing paint, and i love her the more i suffer, i suffocate\nand right before i\'m about to drown, she resuscitates me\nshe fuckin\' hates me, and i love it\n\n"wait! where you going?" "i\'m leaving you"\n"no, you ain\'t, come back"\nwe\'re running right back, here we go again\nit\'s so insane \'cause when it\'s going good, it\'s going great\ni\'m superman with the wind at his back, she\'s lois lane\nbut when it\'s bad, it\'s awful\ni feel so ashamed, i snapped, "who\'s th
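
To make the split-and-flatten step concrete, here is a minimal sketch on two invented lyrics. The strings are hypothetical; the point is only to show what `split(":")` followed by `itertools.chain` produces, including the case where a lyric contains no colon at all.

```python
import itertools

# Hypothetical lyrics: one with section labels, one without any colon.
lyrics = ["intro: yeah\nverse 1: first line", "no colon here"]

# Split each lyric on ":" (gives a list of lists), then flatten into one list.
verses = [lyric.split(":") for lyric in lyrics]
flat = list(itertools.chain.from_iterable(verses))

for chunk in flat:
    print(repr(chunk))
```

A lyric without a colon passes through as a single chunk, so it stays a whole song rather than a verse. Chunks also keep any leading space and embedded newlines.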

In [14]:
def create_couplets(verses: list):
    couplets = []
    for i in range(1,len(verses)):
        couplet = verses[i-1] + "\n" + verses[i]
        couplets.append(couplet)

    return couplets


couplets = create_couplets(verses)
#display(couplets)

couplets_df = pd.DataFrame(couplets, columns=["couplets_g"])
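
`create_couplets` joins each chunk to the chunk that follows it. For comparison, a finer-grained variant would pair adjacent lines inside a single lyric. The sketch below shows that variant under stated assumptions: `line_couplets` is a hypothetical helper, not part of the notebook, and the sample text is invented.

```python
def line_couplets(lyric: str) -> list[tuple[str, str]]:
    """Pair each non-empty line of a lyric with the line that follows it."""
    lines = [line.strip() for line in lyric.split("\n") if line.strip()]
    # zip(lines, lines[1:]) yields (line 0, line 1), (line 1, line 2), ...
    return list(zip(lines, lines[1:]))


sample = "under my bed\ninside of my head\n\nyou're trying to save me"
pairs = line_couplets(sample)
for first, second in pairs:
    print(f"{first!r} -> {second!r}")
```

Blank lines are skipped here, so a pair can span a stanza break. Whether that is desirable depends on how the couplets are used downstream.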

In [15]:
! pip install phyme

Collecting phyme
  Downloading Phyme-0.0.9.tar.gz (1.3 MB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: phyme
  Building wheel for phyme (setup.py) ... done
  Created wheel for phyme: filename=Phyme-0.0.9-py3-none-any.whl size=1379042 sha256=c7111766b7d8730b714c8c773aa5d8bd511b4a7ab554120c654eb35ec9ff9635
  Stored in directory: /root/.cache/pip/wheels/e1/d6/ec/c4ab763b3515017f54b2144837ae5eac2ce58cd748410377d4
Successfully built phyme
Installing collected packages: phyme
Successfully installed phyme-0.0.9


In [16]:
! /Users/austinpaxton/anaconda3/envs/lyric_generation_capstone/bin/python3.10 -m pip install phyme

/bin/bash: line 1: /Users/austinpaxton/anaconda3/envs/lyric_generation_capstone/bin/python3.10: No such file or directory


Phyme documentation: https://github.com/jameswenzel/Phyme#

In [17]:
import re

def get_last_words(couplet):
    last_words = []
    lines = couplet.split("\n")
    for line in lines:
        line_words = line.split(" ")
        last_word = re.sub(r"[^a-zA-Z]+", "",line_words[-1]) #remove everything that is not a letter from last word
        last_words.append(last_word)
    return last_words


couplets_df["last_words"] = couplets_df["couplets_g"].apply(get_last_words)
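
The last-word extraction can be traced on a single couplet. The sketch below repeats the same logic for one line at a time; `last_word` is a hypothetical name used only for illustration.

```python
import re


def last_word(line: str) -> str:
    """Return the final space-separated token of a line with non-letters removed."""
    return re.sub(r"[^a-zA-Z]+", "", line.split(" ")[-1])


couplet = (
    "just gonna stand there and watch me burn?\n"
    "well, that's alright because i like the way it hurts"
)
last_words = [last_word(line) for line in couplet.split("\n")]
print(last_words)

# A blank line, or a line that ends in a space, yields an empty string.
print(repr(last_word("")))
print(repr(last_word("i love it ")))
```

Empty strings produced this way then reach the rhyme check as words in their own right.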

In [18]:
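import itertools
import re
from Phyme import Phyme

ph = Phyme()


def check_perfect_rhyme(words: list, ph: Phyme):
    """Return True/False for the first two words, or an error string if Phyme lacks the first word."""
    # single-letter words and chunks with fewer than two lines cannot form a couplet
    if len(words) < 2 or len(words[0]) == 1:
        return False
    try:
        # get rhymes for first word in list and reformat output dictionary into a list
        rhymes = ph.get_perfect_rhymes(words[0]).values()
        rhymes = list(itertools.chain(*rhymes))
        pattern = r"\(\d\)"  # strip variant markers such as "(2)"
        rhymes = [re.sub(pattern, "", rhyme) for rhyme in rhymes]
        return words[1] in rhymes
    except KeyError as ke:
        # word is not in Phyme's dictionary; keep the message so misses can be counted later
        return f"{ke} NOT FOUND"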

couplets_df["rhyme"] = couplets_df.apply(lambda row: check_perfect_rhyme(row["last_words"],ph),axis=1)
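
For intuition about what a perfect rhyme means here, the sketch below compares the phonemes of two words from the last primary-stressed vowel onward. It is a simplified illustration, not Phyme's implementation. The three ARPAbet pronunciations are a hand-written toy dictionary, and `rhyme_part` and `is_perfect_rhyme` are hypothetical helpers.

```python
# Toy pronunciation dictionary in ARPAbet (a trailing "1" marks primary stress).
TOY_PRONUNCIATIONS = {
    "bed": ["B", "EH1", "D"],
    "head": ["HH", "EH1", "D"],
    "breath": ["B", "R", "EH1", "TH"],
}


def rhyme_part(phones: list[str]) -> list[str]:
    """Return the phones from the last primary-stressed vowel to the end."""
    stressed = [i for i, phone in enumerate(phones) if phone.endswith("1")]
    return phones[stressed[-1]:] if stressed else phones


def is_perfect_rhyme(a: str, b: str, prons: dict = TOY_PRONUNCIATIONS) -> bool:
    """True when both words are known and their rhyme parts are identical."""
    if a not in prons or b not in prons:
        return False
    return rhyme_part(prons[a]) == rhyme_part(prons[b])


print(is_perfect_rhyme("bed", "head"))
print(is_perfect_rhyme("bed", "breath"))
```

Under this simplified definition a word also rhymes with itself, which is one reason repeated lines need a separate filter.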

In [19]:
display(couplets_df)
couplets_df.groupby("rhyme").count()

Unnamed: 0,couplets_g,last_words,rhyme
0,just gonna stand there and watch me burn?\nwel...,"[burn, hurts, cry, lie, lie, , is, like, windp...",False
1,"(ugh, you're a monster)\n\ni can swallow a bot...","[monster, , godzilla, dealer, party, the, mani...",False
2,i'm friends with the monster that's under my b...,"[bed, head, breath, crazy, , newsweek, choosey...",True
3,"yeah, it's been a ride...\ni guess i had to go...","[ride, one, place, me, there, , em, em, mayhem...",False
4,i got a song filled with shit for the strong-w...,"[strongwilled, deal, you, belong, field, mitoc...",'STRONGWILLED' NOT FOUND
...,...,...,...
1093,00 and then three more are bein' cracked (fact...,"[facts, , therapy, that, whack, haha, , facts,...",False
1094,i guess right now you've got the last laugh\n\...,"[laugh, , uninterested, indifferent, here, her...",False
1095,"lemme shout out bobby cause, 6ix in there like...","[like, goodness, squad, god, this, my, go, yea...",False
1096,"bitch, i did it, i made it, i'm loved and i'm ...","[hated, gated, faded, related, side, lied, , u...",False


Unnamed: 0_level_0,couplets_g,last_words
rhyme,Unnamed: 1_level_1,Unnamed: 2_level_1
False,772,772
True,178,178
'' NOT FOUND,6,6
'ADHD' NOT FOUND,1,1
'AHAHAH' NOT FOUND,1,1
...,...,...
'YALL' NOT FOUND,3,3
'YAYO' NOT FOUND,1,1
'YC' NOT FOUND,1,1
'YERP' NOT FOUND,1,1


In [20]:
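# "rhyme" mixes booleans and "NOT FOUND" strings, so compare explicitly; .copy() avoids SettingWithCopyWarning later
rhyme_couplets_df = couplets_df[couplets_df["rhyme"] == True].copy()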
display(rhyme_couplets_df)

Unnamed: 0,couplets_g,last_words,rhyme
2,i'm friends with the monster that's under my b...,"[bed, head, breath, crazy, , newsweek, choosey...",True
14,"feels like a close, it's coming to\nfuck am i ...","[to, do, over, know, , is, song, along, droppe...",True
20,"yeah (yeah, yeah, yeah, yeah, yeah, yeah, yeah...","[yeah, yeah, woo, whatever, check, em, , timel...",True
29,"yeah\nyeah, yeah\ni said, i feel invincible (y...","[yeah, yeah, me, to, wait, hahaha, yeah, grrt,...",True
40,"uh-huh, uh-huh (yeah)\nuh, uh, uh-huh (yeah)\n...","[yeah, yeah, yeah, look, , glocks, tecs, dots,...",True
...,...,...,...
1080,you must remember me\nyou must remember me\nle...,"[me, me, know, know, know, ayy, know, know, , ...",True
1086,i know that you think this song is for you\ni ...,"[you, you, you, you, , woah, long, long, long,...",True
1091,"aw, yeah\nevery time i fall, you know i get up...","[yeah, yeah, yeah, yeah, yeah, yeah, yeah, yea...",True
1092,yeah\nmy anxiety was takin' over (yeah)\nremov...,"[yeah, yeah, on, dreams, damn, , game, plane, ...",True


In [21]:
def split_couplet(couplet):
    lines = couplet.split("\n")
    return lines


rhyme_couplets_df["couplet_split"] = rhyme_couplets_df["couplets_g"].apply(split_couplet)
rhyme_couplets_df["line1_g"] = rhyme_couplets_df["couplet_split"].apply(lambda x: x[0])
rhyme_couplets_df["line2_g"] = rhyme_couplets_df["couplet_split"].apply(lambda x: x[1])
display(rhyme_couplets_df)



Unnamed: 0,couplets_g,last_words,rhyme,couplet_split,line1_g,line2_g
2,i'm friends with the monster that's under my b...,"[bed, head, breath, crazy, , newsweek, choosey...",True,[i'm friends with the monster that's under my ...,i'm friends with the monster that's under my bed,get along with the voices inside of my head
14,"feels like a close, it's coming to\nfuck am i ...","[to, do, over, know, , is, song, along, droppe...",True,"[feels like a close, it's coming to, fuck am i...","feels like a close, it's coming to",fuck am i gonna do?
20,"yeah (yeah, yeah, yeah, yeah, yeah, yeah, yeah...","[yeah, yeah, woo, whatever, check, em, , timel...",True,"[yeah (yeah, yeah, yeah, yeah, yeah, yeah, yea...","yeah (yeah, yeah, yeah, yeah, yeah, yeah, yeah)","what? (uh) what? (yeah, yeah, yeah, yeah, yeah..."
29,"yeah\nyeah, yeah\ni said, i feel invincible (y...","[yeah, yeah, me, to, wait, hahaha, yeah, grrt,...",True,"[yeah, yeah, yeah, i said, i feel invincible (...",yeah,"yeah, yeah"
40,"uh-huh, uh-huh (yeah)\nuh, uh, uh-huh (yeah)\n...","[yeah, yeah, yeah, look, , glocks, tecs, dots,...",True,"[uh-huh, uh-huh (yeah), uh, uh, uh-huh (yeah),...","uh-huh, uh-huh (yeah)","uh, uh, uh-huh (yeah)"
...,...,...,...,...,...,...
1080,you must remember me\nyou must remember me\nle...,"[me, me, know, know, know, ayy, know, know, , ...",True,"[you must remember me, you must remember me, l...",you must remember me,you must remember me
1086,i know that you think this song is for you\ni ...,"[you, you, you, you, , woah, long, long, long,...",True,"[i know that you think this song is for you, i...",i know that you think this song is for you,i used to long for you and adore you
1091,"aw, yeah\nevery time i fall, you know i get up...","[yeah, yeah, yeah, yeah, yeah, yeah, yeah, yea...",True,"[aw, yeah, every time i fall, you know i get u...","aw, yeah","every time i fall, you know i get up (aw, yeah)"
1092,yeah\nmy anxiety was takin' over (yeah)\nremov...,"[yeah, yeah, on, dreams, damn, , game, plane, ...",True,"[yeah, my anxiety was takin' over (yeah), remo...",yeah,my anxiety was takin' over (yeah)


In [22]:
#remove boring couplets where line 1 is same as line 2
rhyme_couplets_df = rhyme_couplets_df[rhyme_couplets_df["line1_g"]!= rhyme_couplets_df["line2_g"]]
display(rhyme_couplets_df)

Unnamed: 0,couplets_g,last_words,rhyme,couplet_split,line1_g,line2_g
2,i'm friends with the monster that's under my b...,"[bed, head, breath, crazy, , newsweek, choosey...",True,[i'm friends with the monster that's under my ...,i'm friends with the monster that's under my bed,get along with the voices inside of my head
14,"feels like a close, it's coming to\nfuck am i ...","[to, do, over, know, , is, song, along, droppe...",True,"[feels like a close, it's coming to, fuck am i...","feels like a close, it's coming to",fuck am i gonna do?
20,"yeah (yeah, yeah, yeah, yeah, yeah, yeah, yeah...","[yeah, yeah, woo, whatever, check, em, , timel...",True,"[yeah (yeah, yeah, yeah, yeah, yeah, yeah, yea...","yeah (yeah, yeah, yeah, yeah, yeah, yeah, yeah)","what? (uh) what? (yeah, yeah, yeah, yeah, yeah..."
29,"yeah\nyeah, yeah\ni said, i feel invincible (y...","[yeah, yeah, me, to, wait, hahaha, yeah, grrt,...",True,"[yeah, yeah, yeah, i said, i feel invincible (...",yeah,"yeah, yeah"
40,"uh-huh, uh-huh (yeah)\nuh, uh, uh-huh (yeah)\n...","[yeah, yeah, yeah, look, , glocks, tecs, dots,...",True,"[uh-huh, uh-huh (yeah), uh, uh, uh-huh (yeah),...","uh-huh, uh-huh (yeah)","uh, uh, uh-huh (yeah)"
...,...,...,...,...,...,...
1079,i could give a fuck the shit you on\nbut i'm r...,"[on, on, gone, yeah, on, on, gone, long, , one...",True,"[i could give a fuck the shit you on, but i'm ...",i could give a fuck the shit you on,but i'm really tryna put you on
1086,i know that you think this song is for you\ni ...,"[you, you, you, you, , woah, long, long, long,...",True,"[i know that you think this song is for you, i...",i know that you think this song is for you,i used to long for you and adore you
1091,"aw, yeah\nevery time i fall, you know i get up...","[yeah, yeah, yeah, yeah, yeah, yeah, yeah, yea...",True,"[aw, yeah, every time i fall, you know i get u...","aw, yeah","every time i fall, you know i get up (aw, yeah)"
1092,yeah\nmy anxiety was takin' over (yeah)\nremov...,"[yeah, yeah, on, dreams, damn, , game, plane, ...",True,"[yeah, my anxiety was takin' over (yeah), remo...",yeah,my anxiety was takin' over (yeah)


In [25]:
# write to csv so it can be phonemized
rhyme_couplets_df.to_csv("/content/MyDrive/MyDrive/NLP Project/rhyme_couplets.csv",index=False)

In [26]:
!pip install phonemizer
from phonemizer import phonemize, version
from phonemizer.separator import Separator
import pandas as pd
from datetime import datetime



In [27]:
!sudo apt-get install festival espeak-ng mbrola

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  alsa-utils espeak-ng-data festlex-cmu festlex-poslex festvox-kallpc16k
  libatopology2 libespeak-ng1 libestools2.5 libfftw3-single3 libpcaudio0
  libsonic0 sgml-base
Suggested packages:
  dialog pidgin-festival festival-freebsoft-utils libfftw3-bin libfftw3-dev
  mbrola-voice cicero sgml-base-doc
The following NEW packages will be installed:
  alsa-utils espeak-ng espeak-ng-data festival festlex-cmu festlex-poslex
  festvox-kallpc16k libatopology2 libespeak-ng1 libestools2.5 libfftw3-single3
  libpcaudio0 libsonic0 mbrola sgml-base
0 upgraded, 15 newly installed, 0 to remove and 45 not upgraded.
Need to get 13.2 MB of archives.
After this operation, 39.2 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/main amd64 sgml-base all 1.30 [12.5 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy/main amd64 li

In [29]:
# from festival import festival
input_file = "/content/MyDrive/MyDrive/NLP Project/rhyme_couplets.csv"
couplets_df = pd.read_csv(input_file)

line1_g = couplets_df["line1_g"]
line2_g = couplets_df["line2_g"]

line1_p = []
line2_p = []
batch_size = 200
separator = Separator(phone="-", word=" ", syllable="|")

# phonemize in batches for efficiency; slicing past the end of a Series is safe,
# so the final (shorter) batch needs no special case
for i in range(0, len(couplets_df), batch_size):
    batch1 = line1_g[i:i + batch_size]
    batch2 = line2_g[i:i + batch_size]
    print(batch1)

    # phonemize every batch (previously this ran only inside the else branch,
    # so only the final batch was processed)
    batch1_p = phonemize(batch1, language="en-us", backend="festival",
                         separator=separator, strip=True)
    print(batch1_p)

    batch2_p = phonemize(batch2, language="en-us", backend="festival",
                         separator=separator, strip=True)

    line1_p.extend(batch1_p)
    line2_p.extend(batch2_p)

couplets_df["line1_p"] = line1_p
couplets_df["line2_p"] = line2_p

couplets_df.to_csv("/content/MyDrive/MyDrive/NLP Project/rhyme_couplets_f-phonemized_07-30-23.csv", index=False)

0      i'm friends with the monster that's under my bed
1                    feels like a close, it's coming to
2       yeah (yeah, yeah, yeah, yeah, yeah, yeah, yeah)
3                                                  yeah
4                                 uh-huh, uh-huh (yeah)
                             ...                       
135                 i could give a fuck the shit you on
136          i know that you think this song is for you
137                                            aw, yeah
138                                                yeah
139                     where is my mind? it's far away
Name: line1_g, Length: 140, dtype: object
['ay-m f-r-eh-n-d-z w-ih-dh dh-ax m-aa-n|s-t-er dh-ae-t-s ah-n|d-er m-ay b-eh-d', 'f-iy-l-z l-ay-k ax k-l-ow-s ih-t-s k-ah|m-ax-ng t-ax', 'y-ae y-ae y-ae y-ae y-ae y-ae y-ae y-ae', 'y-ae', 'ah hh-ah ah hh-ah y-ae', 'ay l-ay-k m-ay y-ah|m-iy y-eh|l-ow y-eh|l-ow', 'b-ey|b-iy w-eh-n w-iy g-aa-n s-l-ay-d', 'eh-m eh-m eh-m ah ah ah ah', 'dh-ax-s
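
The batching pattern can be checked without festival installed. In the sketch below, `fake_phonemize` is a stand-in that simply upper-cases text. It is not the phonemizer API, and `batches` is a hypothetical helper. The point is that every batch, including the short final one, must be processed so that the output stays aligned with the input rows.

```python
def batches(items: list, batch_size: int):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fake_phonemize(lines: list[str]) -> list[str]:
    # Stand-in for a real phonemizer call; returns one output per input line.
    return [line.upper() for line in lines]


lines = [f"line {n}" for n in range(450)]
batch_sizes = [len(batch) for batch in batches(lines, 200)]

phonemized = []
for batch in batches(lines, 200):
    phonemized.extend(fake_phonemize(batch))

print(batch_sizes)
print(len(phonemized) == len(lines))
```

With 140 couplets and a batch size of 200, as in the run above, the loop executes exactly once.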

## Reload Phonemized CSV and Assemble Tasks for Training

In [30]:

import pandas as pd
couplets_gp_df = pd.read_csv("/content/MyDrive/MyDrive/NLP Project/rhyme_couplets_f-phonemized_07-30-23.csv")

display(couplets_gp_df)

Unnamed: 0,couplets_g,last_words,rhyme,couplet_split,line1_g,line2_g,line1_p,line2_p
0,i'm friends with the monster that's under my b...,"['bed', 'head', 'breath', 'crazy', '', 'newswe...",True,"[""i'm friends with the monster that's under my...",i'm friends with the monster that's under my bed,get along with the voices inside of my head,ay-m f-r-eh-n-d-z w-ih-dh dh-ax m-aa-n|s-t-er ...,g-eh-t ax|l-ao-ng w-ih-dh dh-ax v-oy|s-ax-z ih...
1,"feels like a close, it's coming to\nfuck am i ...","['to', 'do', 'over', 'know', '', 'is', 'song',...",True,"[""feels like a close, it's coming to"", 'fuck a...","feels like a close, it's coming to",fuck am i gonna do?,f-iy-l-z l-ay-k ax k-l-ow-s ih-t-s k-ah|m-ax-n...,f-ah-k ae-m ay g-aa|n-ax d-uw
2,"yeah (yeah, yeah, yeah, yeah, yeah, yeah, yeah...","['yeah', 'yeah', 'woo', 'whatever', 'check', '...",True,"['yeah (yeah, yeah, yeah, yeah, yeah, yeah, ye...","yeah (yeah, yeah, yeah, yeah, yeah, yeah, yeah)","what? (uh) what? (yeah, yeah, yeah, yeah, yeah...",y-ae y-ae y-ae y-ae y-ae y-ae y-ae y-ae,w-ah-t ah w-ah-t y-ae y-ae y-ae y-ae y-ae y-ae...
3,"yeah\nyeah, yeah\ni said, i feel invincible (y...","['yeah', 'yeah', 'me', 'to', 'wait', 'hahaha',...",True,"['yeah', 'yeah, yeah', ""i said, i feel invinci...",yeah,"yeah, yeah",y-ae,y-ae y-ae
4,"uh-huh, uh-huh (yeah)\nuh, uh, uh-huh (yeah)\n...","['yeah', 'yeah', 'yeah', 'look', '', 'glocks',...",True,"['uh-huh, uh-huh (yeah)', 'uh, uh, uh-huh (yea...","uh-huh, uh-huh (yeah)","uh, uh, uh-huh (yeah)",ah hh-ah ah hh-ah y-ae,ah ah ah hh-ah y-ae
...,...,...,...,...,...,...,...,...
135,i could give a fuck the shit you on\nbut i'm r...,"['on', 'on', 'gone', 'yeah', 'on', 'on', 'gone...",True,"['i could give a fuck the shit you on', ""but i...",i could give a fuck the shit you on,but i'm really tryna put you on,ay k-uh-d g-ih-v ax f-ah-k dh-ax sh-iy-t y-uw ...,b-ah-t ay-m r-ih|l-iy t-r-ih|n-ax p-uh-t y-uw ...
136,i know that you think this song is for you\ni ...,"['you', 'you', 'you', 'you', '', 'woah', 'long...",True,"['i know that you think this song is for you',...",i know that you think this song is for you,i used to long for you and adore you,ay n-ow dh-ae-t y-uw th-ih-ng-k dh-ax-s s-ao-n...,ay y-uw-z-d t-ax l-ao-ng f-ao-r y-uw ae-n-d ax...
137,"aw, yeah\nevery time i fall, you know i get up...","['yeah', 'yeah', 'yeah', 'yeah', 'yeah', 'yeah...",True,"['aw, yeah', 'every time i fall, you know i ge...","aw, yeah","every time i fall, you know i get up (aw, yeah)",ao y-ae,ax|v-er|iy t-ay-m ay f-ao-l y-uw n-ow ay g-eh-...
138,yeah\nmy anxiety was takin' over (yeah)\nremov...,"['yeah', 'yeah', 'on', 'dreams', 'damn', '', '...",True,"['yeah', ""my anxiety was takin' over (yeah)"", ...",yeah,my anxiety was takin' over (yeah),y-ae,m-ay ax-ng|z-ay|ax|t-iy w-ax-z t-ey|k-ax-n ow|...


In [31]:
#check for lines that failed to phonemize (missing values are read back from the CSV as NaN, so use isna() rather than == None)
couplets_gp_df[couplets_gp_df["line1_p"].isna() | couplets_gp_df["line2_p"].isna()]

Unnamed: 0,couplets_g,last_words,rhyme,couplet_split,line1_g,line2_g,line1_p,line2_p


In [32]:
import random

# create tasks for multi-task learning
# couplet tasks are wrapped in "~ ... ~", translation tasks in "[ ... ]"
# ~ line 1 grapheme =1G->2G= line 2 grapheme ~
# ~ line 1 phoneme =1P->2P= line 2 phoneme ~
# [ line 1 grapheme =1G->1P= line 1 phoneme ]
# [ line 1 phoneme =1P->1G= line 1 grapheme ]
# [ line 2 grapheme =2G->2P= line 2 phoneme ]
# [ line 2 phoneme =2P->2G= line 2 grapheme ]


line1_g = couplets_gp_df["line1_g"].to_list()
line2_g = couplets_gp_df["line2_g"].to_list()
line1_p = couplets_gp_df["line1_p"].to_list()
line2_p = couplets_gp_df["line2_p"].to_list()

tasks = []

for i in range(len(line1_g)):
    tasks.append(f"~ {line1_g[i]} =1G->2G= {line2_g[i]} ~")
    tasks.append(f"~ {line1_p[i]} =1P->2P= {line2_p[i]} ~")
    tasks.append(f"[ {line1_g[i]} =1G->1P= {line1_p[i]} ]")
    tasks.append(f"[ {line1_p[i]} =1P->1G= {line1_g[i]} ]")
    tasks.append(f"[ {line2_g[i]} =2G->2P= {line2_p[i]} ]")
    tasks.append(f"[ {line2_p[i]} =2P->2G= {line2_g[i]} ]")

random.shuffle(tasks)

display(len(tasks))
train_test_split = round(0.99*len(tasks))

couplets_train = tasks[:train_test_split]
couplets_test = tasks[train_test_split:]

with open("/content/MyDrive/MyDrive/NLP Project/train_couplets.txt", "w") as f:
    for task in couplets_train:
        f.write(task+"\n")

with open("/content/MyDrive/MyDrive/NLP Project/test_couplets.txt", "w") as f:
    for task in couplets_test:
        f.write(task+"\n")


840
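
To show how the task lines can be consumed later, the sketch below parses a line back into its parts and reproduces the split arithmetic. `TASK_RE` and `parse_task` are hypothetical names, and the regular expression assumes exactly the format written by the loop above.

```python
import re

# Matches "~ source =1G->2G= target ~" and "[ source =1G->1P= target ]".
TASK_RE = re.compile(
    r"^[~\[] (?P<source>.*) =(?P<src>[12][GP])->(?P<dst>[12][GP])= (?P<target>.*) [~\]]$"
)


def parse_task(line: str) -> dict:
    """Split one task line into kind, source/target tags, and source/target text."""
    match = TASK_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a task line: {line!r}")
    kind = "couplet" if line.startswith("~") else "translation"
    return {"kind": kind, **match.groupdict()}


translation = parse_task("[ yeah =1G->1P= y-ae ]")
couplet = parse_task("~ feels like a close, it's coming to =1G->2G= fuck am i gonna do? ~")
print(translation)
print(couplet)

# 140 couplets x 6 tasks each, with a 99% training split.
n_tasks = 140 * 6
n_train = round(0.99 * n_tasks)
n_test = n_tasks - n_train
print(n_tasks, n_train, n_test)
```

The count matches the 840 printed above. With a 99% split, only a handful of tasks remain in the test file.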