#### Haiku Detector
Haikus have been around for centuries. This Japanese form of poetry originally consisted of short, nature-themed poems of exactly 17 sound units (on, usually approximated as syllables in English). These 17 units are divided over three lines, following a 5-7-5 pattern. Haikus by modern writers often do not include nature as a theme, but rather emphasize the funny and unexpected charm of a well-written haiku. This code takes modern song lyrics and aims to detect haikus with unexpected charm hidden within the lyrics.

Popular packages such as nltk and syllapy are used to count the syllables in the lyrics and match them to the 5-7-5 syllable pattern. Any detected haikus are then printed as output.
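
The matching idea can be sketched independently of the notebook: a run of words forms a 5-7-5 haiku when the running syllable total lands exactly on 5, 12 and 17 at word boundaries. The example below is a minimal illustration, not the notebook's implementation. The `SYLLABLES` table and the `split_575` helper are hypothetical names, and the hand-made table stands in for a real syllable counter such as `syllapy.count`.

```python
from itertools import accumulate

# Hypothetical hand-made syllable table, standing in for a real counter
# such as syllapy.count; the values are for illustration only.
SYLLABLES = {
    "an": 1, "old": 1, "silent": 2, "pond": 1,
    "a": 1, "frog": 1, "jumps": 1, "into": 2, "the": 1,
    "splash": 1, "silence": 2, "again": 2,
}

def split_575(words, count=SYLLABLES.get):
    """Return three lists of words if they split into 5-7-5 syllables, else None."""
    # Running syllable total after each word; unknown words count as 0.
    totals = list(accumulate(count(w.lower()) or 0 for w in words))
    # Word boundaries must fall exactly at 5 and 12, and the total must be 17.
    if not totals or totals[-1] != 17 or 5 not in totals or 12 not in totals:
        return None
    first = totals.index(5) + 1
    second = totals.index(12) + 1
    return [words[:first], words[first:second], words[second:]]

text = "An old silent pond a frog jumps into the pond splash silence again"
haiku = split_575(text.split())
if haiku:
    for line in haiku:
        print(" ".join(line))
```

Passing the counter as a parameter keeps the splitting logic separate from the choice of syllable library.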

The dataset can be found on Kaggle at: https://www.kaggle.com/datasets/marzenah/azlyrics-recorded-songs-with-lyrics

In [1]:
import pandas as pd
import re
import syllapy
import nltk
from nltk.corpus import words

In [2]:
# Read the dataset and display the first five rows
data = pd.read_csv('h_artists_songs.csv') 

data.head()

Unnamed: 0,Artist_Name,Song_Title,Year,Lyrics_URL,Lyrics
0,H1GHR MUSIC,H1GHR,2020,https://www.azlyrics.com/lyrics/h1ghrmusic/h1g...,\n\r\nH1GHR\nH1GHR\nH1GHR\n\nThe clique gettin...
1,H1GHR MUSIC,Melanin Handsome,2020,https://www.azlyrics.com/lyrics/h1ghrmusic/mel...,\n\n[Romanized:]\n\nNone of your business\nEot...
2,H1GHR MUSIC,How We Rock,2020,https://www.azlyrics.com/lyrics/h1ghrmusic/how...,\n\n[Romanized:]\n\nThis is how we rock yeah\n...
3,H1GHR MUSIC,DDDD Freestyle (뚝딱Freestyle),2020,https://www.azlyrics.com/lyrics/h1ghrmusic/ddd...,\n\n[Romanized:]\n\nToo many hustlers' here\nO...
4,H1GHR MUSIC,4eva,2020,https://www.azlyrics.com/lyrics/h1ghrmusic/4ev...,\n\n[Romanized:]\n\nH1GHR than the sky so fire...


In [3]:
# Load only the Lyrics column and display the new dataframe
df = pd.DataFrame(data['Lyrics'])

df

Unnamed: 0,Lyrics
0,\n\r\nH1GHR\nH1GHR\nH1GHR\n\nThe clique gettin...
1,\n\n[Romanized:]\n\nNone of your business\nEot...
2,\n\n[Romanized:]\n\nThis is how we rock yeah\n...
3,\n\n[Romanized:]\n\nToo many hustlers' here\nO...
4,\n\n[Romanized:]\n\nH1GHR than the sky so fire...
...,...
34567,\n\n[Romanized:]\n\nMaltuwa haengdongeul kkumi...
34568,\n\n[Romanized:]\n\nNaneun malhae I don't care...
34569,\n\n[Romanized:]\n\nYojeum ttara yeminhae deo\...
34570,\n\r\n무뎌져 My pain\n난 너를 보면 타올라\n취할 것 같아 Awake\...


In [19]:
# Convert the lyrics dataframe into a list
lyrics = df['Lyrics'].tolist()

# Replace punctuation and other characters that are neither word characters nor whitespace with spaces
lyrics = [re.sub(r"[^\w\s]", ' ', lyric) for lyric in lyrics]

# Remove the word "Romanized" as it is not part of the lyrics
lyrics = [re.sub('Romanized', ' ', text) for text in lyrics]

# Show example lyrics
lyrics[:20]

['\n\r\nH1GHR\nH1GHR\nH1GHR\n\nThe clique getting big bring a bigger table\nAll we do is win name a bigger label\nKings and Queens come claim your throne\nThe fallen angels singing our song they\nTried dying a legacy only to fail\nThis is the jungle not for the weak and frail\nYield the power given otherwise lose it\nLook at how we living H1GHR MUSIC\nLook at how we living H1GHR MUSIC\nLook at how we living H1GHR MUSIC\nAV and Souf Souf the streets on lock\nJay Park  Sik K the kings of pop\nPH 1  HAON yeah they never flop\nLook at how we living H1GHR MUSIC\nChaCha my partner in crime dollar signs\nGochild  Phe REDS  BIG Naughty  WOOGIE  and GroovyRoom\nH1GHR MUSIC we killin father time\nCause we foreva  foreva eva was outcasted\nThey try to kill us but we back alive\nThis the motha fuckin H1GHR academy TRADE L\nWelcome to the family yeah you can t refute it\nLook at how we living H1GHR MUSIC\n\nH1GHR\nWe livin  it up H1GHR\nH1GHR\nH1GHR\nEverything we do we make it golden\nH1GHR\nWe ju
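
The effect of the two substitutions is easier to see on a short string. The sketch below applies the same regular expressions to a made-up sample (the `sample` text is an assumption for illustration, not a row from the dataset). Newlines survive the cleaning, and apostrophes become spaces, so contractions are split in two, which matches the "can t" visible in the output above.

```python
import re

# Made-up sample in the style of the dataset's Lyrics column.
sample = "\n\n[Romanized:]\n\nYou can't refute it,\nwe livin' it up!"

# Replace every character that is neither a word character nor whitespace with a space.
cleaned = re.sub(r"[^\w\s]", " ", sample)
# Drop the "Romanized" marker, which is not part of the lyrics.
cleaned = re.sub("Romanized", " ", cleaned)

print(cleaned.split())
# ['You', 'can', 't', 'refute', 'it', 'we', 'livin', 'it', 'up']
```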

In [5]:
# Download the English word corpus from NLTK
nltk.download('words')

# Create an English dictionary object
english_words = set(words.words())

# Keep only English words, flattening all songs into a single list of words
lyrics = [word for line in lyrics for word in line.split() if word.lower() in english_words]

# Show example lyrics
lyrics[:20]

[nltk_data] Downloading package words to /home/codespace/nltk_data...
[nltk_data]   Package words is already up-to-date!


['The',
 'clique',
 'getting',
 'big',
 'bring',
 'a',
 'bigger',
 'table',
 'All',
 'we',
 'do',
 'is',
 'win',
 'name',
 'a',
 'bigger',
 'label',
 'and',
 'come',
 'claim']
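
Note that this step turns the list of songs into one flat list of words, so song and line boundaries are no longer available afterwards. The sketch below reproduces the same comprehension on a tiny input. The `vocabulary` set is a hypothetical stand-in for `set(words.words())`, and the `songs` strings are made up for illustration.

```python
# Hypothetical miniature vocabulary standing in for set(words.words()).
vocabulary = {"the", "clique", "getting", "big", "bring", "a", "bigger", "table"}

# Two made-up "songs", each containing newline-separated lines.
songs = ["H1GHR H1GHR\nThe clique getting big", "bring a bigger table\nEotteohge"]

# Same shape as the notebook's comprehension: one flat list across all songs.
# The lookup is lowercased, but the original casing of each word is kept.
flat = [word for song in songs for word in song.split() if word.lower() in vocabulary]

print(flat)
# ['The', 'clique', 'getting', 'big', 'bring', 'a', 'bigger', 'table']
```

Because the filter is an exact lookup, any form missing from the word list is dropped. In the output above, for example, "Kings" and "Queens" no longer appear before "come claim", presumably because those forms are not in the corpus.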

In [18]:
# Function to check for the 5-7-5 pattern in a list of words

def recognize_575_pattern(words):

    print_line = []
    lines = []

    # Cumulative syllable totals recorded at each line break
    syllable_counts = []
    count = 0

    # Count whole words only, so a line break never falls inside a word
    for word in words:
        print_line.append(word)
        count += syllapy.count(word)
        if count == 5 or count == 12:  # A line ends when the running total reaches 5 (line 1) or 12 (line 2)
            syllable_counts.append(count)
            lines.append(print_line)
            print_line = []

    # The third line is complete only if all words together total exactly 17 syllables
    if count == 17:
        syllable_counts.append(count)
        lines.append(print_line)

    # True only if the breaks fell at 5, 12 and 17 syllables, in that order (5-7-5 pattern)
    return lines, syllable_counts == [5, 12, 17]


# Slide a 10-word window over the lyrics in steps of 5 words and print the haikus found
for window in [lyrics[i:i+10] for i in range(0, len(lyrics), 5)]:
    haiku_lines, pattern_matched = recognize_575_pattern(window)

    if pattern_matched:
        print("This lyric follows the 5-7-5 pattern: \n")

        # Print the haiku in three lines following the 5-7-5 pattern
        for haiku_line in haiku_lines:
            print(' '.join(haiku_line))
        print("\n")


This lyric follows the 5-7-5 pattern: 

say so Because I
flex big Melanin handsome
Melanin handsome


This lyric follows the 5-7-5 pattern: 

Ne nae nae bogo
nan nae Melanin handsome
Melanin handsome


This lyric follows the 5-7-5 pattern: 

more thing I tell ya
Banger Melanin handsome
Melanin handsome


This lyric follows the 5-7-5 pattern: 

that activation
Never stop that s on God
world domination


This lyric follows the 5-7-5 pattern: 

What about tango
Purge Locked and loaded gang
already Ready


This lyric follows the 5-7-5 pattern: 

UP IT UP Stupid
enemy Magazine Back
Pack Value Gamble


This lyric follows the 5-7-5 pattern: 

winning Everyday
we on a champagne diet
Everyday we on


This lyric follows the 5-7-5 pattern: 

winning Everyday
we on a champagne diet
Everyday we on


This lyric follows the 5-7-5 pattern: 

winning Everyday
we on a champagne diet
Everyday we on


This lyric follows the 5-7-5 pattern: 

winning Everyday
we on a champagne diet
Everyday we on


This lyr