1. Lexical Richness
* Measure: Vocabulary diversity in the lyrics.
* Metrics:
  * Type-Token Ratio (TTR): Ratio of unique words to the total number of words.
  * Length-robust alternatives (MTLD, HD-D): Lexical diversity measures designed to be less sensitive to text length than raw TTR, which tends to fall as a text gets longer.


In [2]:
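from nltk.tokenize import word_tokenize  # needs NLTK tokenizer data, e.g. nltk.download("punkt")

simple = "I love you. You love me. We are happy together."
sophisticated = "Adoration intertwines with reciprocity, forging an eternal bond of joy."

for text in [simple, sophisticated]:
    # word_tokenize keeps punctuation as tokens and preserves case,
    # so "." and "You"/"you" each count toward the ratio.
    tokens = word_tokenize(text)
    unique_tokens = set(tokens)
    ttr = len(unique_tokens) / len(tokens)  # types / tokens
    print(f"Text: {text}\nType-Token Ratio (TTR): {ttr}\n")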


Text: I love you. You love me. We are happy together.
Type-Token Ratio (TTR): 0.7692307692307693

Text: Adoration intertwines with reciprocity, forging an eternal bond of joy.
Type-Token Ratio (TTR): 1.0
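
The 0.77 reported for the first text partly reflects tokenization: each "." is counted as a token, and "You" and "you" are treated as different types. A minimal sketch of a normalized variant follows, assuming a simple regex tokenizer in place of `word_tokenize`; the `normalized_ttr` helper is illustrative and not part of NLTK.

```python
import re

def normalized_ttr(text):
    """Type-token ratio over lowercased word tokens, ignoring punctuation."""
    # Assumption: a word is a run of ASCII letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

simple = "I love you. You love me. We are happy together."
sophisticated = "Adoration intertwines with reciprocity, forging an eternal bond of joy."

for text in [simple, sophisticated]:
    print(f"Text: {text}\nNormalized TTR: {normalized_ttr(text):.2f}\n")
```

With punctuation and case removed, the first text has 8 distinct words out of 10 (0.80) and the second has 10 out of 10 (1.00). Texts this short leave little room to separate styles, which is one reason to prefer length-robust measures on full lyrics.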



2. Readability Scores
* Measure: How complex the lyrics are in terms of readability.
* Metrics:
  * Flesch Reading Ease Score: Higher scores mean simpler text.
  * Gunning Fog Index: Estimates years of education needed to understand the text.
  * Dale-Chall Index: Combines the proportion of words not on a list of familiar words with the average sentence length.

In [3]:
from textstat import flesch_reading_ease, gunning_fog, dale_chall_readability_score

simple = "The cat sat on the mat. The sun is shining."
sophisticated = "The feline gracefully reclined upon the ornate rug while the celestial sphere radiated brilliance."

for text in [simple, sophisticated]:
    print(f"Text: {text}")
    print("Flesch Reading Ease:", flesch_reading_ease(text))
    print("Gunning Fog Index:", gunning_fog(text))
    print("Dale-Chall Score:", dale_chall_readability_score(text), "\n")


Text: The cat sat on the mat. The sun is shining.
Flesch Reading Ease: 108.7
Gunning Fog Index: 2.0
Dale-Chall Score: 0.25 

Text: The feline gracefully reclined upon the ornate rug while the celestial sphere radiated brilliance.
Flesch Reading Ease: 31.89
Gunning Fog Index: 14.17
Dale-Chall Score: 13.35 
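
To make the Flesch score less of a black box, the sketch below applies the standard formula, 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words), directly. The syllable counter is a crude vowel-group heuristic chosen for illustration (an assumption, not textstat's method), so results can differ from the library's on harder vocabulary.

```python
import re

def count_syllables(word):
    """Rough syllable estimate: vowel groups, minus a silent trailing 'e'."""
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_reading_ease_sketch(text):
    """Flesch Reading Ease computed from the published formula."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z]+", text.lower())
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

simple = "The cat sat on the mat. The sun is shining."
sophisticated = "The feline gracefully reclined upon the ornate rug while the celestial sphere radiated brilliance."

for text in [simple, sophisticated]:
    print(f"Text: {text}\nFlesch (sketch): {flesch_reading_ease_sketch(text):.2f}\n")
```

The first text has 10 words, 2 sentences, and 11 estimated syllables, which gives about 108.7. The second scores far lower because one long sentence and many polysyllabic words raise both penalty terms.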



3. Semantic Depth
* Measure: Depth of meaning and abstractness.
* Metrics:
  * WordNet Synset Depth: Use WordNet (via nltk) to calculate the average depth of words in a lexical taxonomy.
  * Sentiment Complexity: Use tools like VADER or TextBlob to detect nuanced sentiment variation within the lyrics.

In [4]:
from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize

simple = "Love is kind and good."
sophisticated = "Altruism embodies kindness and magnanimity, transcending mundane affection."

for text in [simple, sophisticated]:
    tokens = word_tokenize(text)
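    # Per word: take its deepest synset, with depth measured as the length
    # of that synset's first hypernym path (root to synset).
    synset_depths = [max(len(ss.hypernym_paths()[0]) for ss in wn.synsets(word))
                     for word in tokens if wn.synsets(word)]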
    avg_depth = sum(synset_depths) / len(synset_depths) if synset_depths else 0
    print(f"Text: {text}\nAverage Semantic Depth: {avg_depth}\n")

Text: Love is kind and good.
Average Semantic Depth: 8.75

Text: Altruism embodies kindness and magnanimity, transcending mundane affection.
Average Semantic Depth: 6.571428571428571



4. Syntactic Complexity
* Measure: Sentence structure and grammatical sophistication.
* Metrics:
  * Average Sentence Length: Number of words per sentence.
  * Parse Tree Depth: Using NLP parsers like spaCy to calculate the depth of syntactic trees.
  * Clause-to-Sentence Ratio: Ratio of clauses to sentences.


In [5]:
import spacy

nlp = spacy.load("en_core_web_sm")

simple = "The dog runs fast."
sophisticated = "Bounding swiftly across the verdant meadow, the canine displayed unparalleled agility."

for text in [simple, sophisticated]:
    doc = nlp(text)
    sent_lengths = [len(sent.text.split()) for sent in doc.sents]
    avg_length = sum(sent_lengths) / len(sent_lengths)
    print(f"Text: {text}\nAverage Sentence Length: {avg_length}\n")


Text: The dog runs fast.
Average Sentence Length: 4.0

Text: Bounding swiftly across the verdant meadow, the canine displayed unparalleled agility.
Average Sentence Length: 11.0
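
Parse tree depth is listed among the metrics but the cell above only computes sentence length. A minimal sketch of the depth calculation follows, assuming the dependency parse is already available as a list of head indices where the root points to itself (spaCy exposes this as `token.head.i`). The `tree_depth` helper and both hand-written parses are illustrative assumptions, not parser output.

```python
def tree_depth(heads):
    """Longest token-to-root path, in edges, of a dependency tree.

    heads[i] is the index of token i's head; the root is its own head.
    """
    def depth(i):
        steps = 0
        while heads[i] != i:
            i = heads[i]
            steps += 1
        return steps

    return max(depth(i) for i in range(len(heads))) if heads else 0

# Hand-annotated (hypothetical) parse of "The dog runs fast."
# tokens: The(0) dog(1) runs(2) fast(3) .(4)
simple_heads = [1, 2, 2, 2, 2]

# Hand-annotated (hypothetical) parse of
# "Bounding swiftly across the meadow, the dog ran."
# tokens: Bounding(0) swiftly(1) across(2) the(3) meadow(4)
#         ,(5) the(6) dog(7) ran(8) .(9)
nested_heads = [8, 0, 0, 4, 2, 8, 7, 8, 8, 8]

print("Depth (simple):", tree_depth(simple_heads))
print("Depth (nested):", tree_depth(nested_heads))
```

For a single-sentence `doc`, the input could be built as `[token.head.i for token in doc]`. The participial phrase in the second example adds nesting (the, meadow, across, Bounding, ran), so depth rises even when sentence length changes little.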



5. Rhyme Density and Patterning
* Measure: Intricacy of rhyming schemes and patterns.
* Metrics:
  * Rhyme Density: Ratio of rhyming words to total words.
  * Rhyme Complexity: Use phonetic matching (e.g., the pronouncing library) to detect internal rhymes and multisyllabic rhymes.

In [6]:
import pronouncing

simple = "The cat sat on the mat."
sophisticated = "Though the twilight fades, serenades of cascading shades pervade."

for text in [simple, sophisticated]:
    lines = text.split(".")
    rhyme_pairs = 0
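    for line in lines:
        words = line.split()  # punctuation such as "fades," stays attached
        if len(words) > 1 and pronouncing.rhymes(words[-1]):
            # Count rhymes of the line's final word that also occur in the line.
            rhymes = [w for w in pronouncing.rhymes(words[-1]) if w in words]
            rhyme_pairs += len(rhymes)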
    rhyme_density = rhyme_pairs / len(text.split())
    print(f"Text: {text}\nRhyme Density: {rhyme_density}\n")


Text: The cat sat on the mat.
Rhyme Density: 0.3333333333333333

Text: Though the twilight fades, serenades of cascading shades pervade.
Rhyme Density: 0.0
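
The 0.0 for the second text comes from the method rather than the lyric: only words that rhyme with the line's final word are counted, so the internal rhyme of "fades", "serenades", and "shades" is never compared. The sketch below groups words by rhyming part instead. To stay self-contained it uses a small hand-entered ARPAbet table (`RHYMING_PART`, an illustrative assumption); with the pronouncing library the same values could be looked up via `pronouncing.rhyming_part(pronouncing.phones_for_word(word)[0])`.

```python
import string

# Hand-entered rhyming parts (ARPAbet, from the last stressed vowel onward).
# Illustrative assumption: only the words needed for this example are listed.
RHYMING_PART = {
    "fades": "EY1 D Z",
    "serenades": "EY1 D Z",
    "shades": "EY1 D Z",
    "pervade": "EY1 D",
    "cat": "AE1 T",
    "sat": "AE1 T",
    "mat": "AE1 T",
}

def internal_rhyme_density(text, rhyming_part=RHYMING_PART):
    """Share of words that share a rhyming part with a different word."""
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    # Group distinct words by rhyming part so a repeated word
    # does not count as rhyming with itself.
    groups = {}
    for w in set(words):
        part = rhyming_part.get(w)
        if part is not None:
            groups.setdefault(part, set()).add(w)
    rhyming = sum(1 for w in words
                  if len(groups.get(rhyming_part.get(w), ())) > 1)
    return rhyming / len(words)

simple = "The cat sat on the mat."
sophisticated = "Though the twilight fades, serenades of cascading shades pervade."

for text in [simple, sophisticated]:
    print(f"Text: {text}\nInternal Rhyme Density: {internal_rhyme_density(text):.2f}\n")
```

Under this table, 3 of 6 words rhyme in the first text and 3 of 9 in the second. "pervade" is a near rhyme that exact matching still misses, so a fuller treatment would also need a similarity measure over phoneme sequences.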



6. Sentiment Variability
* Measure: Range of emotions expressed in the lyrics.
* Metrics:
  * Sentiment Variability: Standard deviation of per-sentence compound sentiment scores.

In [7]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import statistics

analyzer = SentimentIntensityAnalyzer()

simple = "I am happy. I feel good."
sophisticated = "Though elation surged like a tidal wave, an undertow of melancholic introspection lingered."

for text in [simple, sophisticated]:
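    lines = text.split(". ")  # sentence split; a one-sentence text yields one segment
    scores = [analyzer.polarity_scores(line)['compound'] for line in lines]
    # stdev needs at least two scores, so single-sentence texts report 0
    variability = statistics.stdev(scores) if len(scores) > 1 else 0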
    print(f"Text: {text}\nSentiment Variability: {variability}\n")


Text: I am happy. I feel good.
Sentiment Variability: 0.09298454172603096

Text: Though elation surged like a tidal wave, an undertow of melancholic introspection lingered.
Sentiment Variability: 0
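
A single-sentence text always scores 0 here, because splitting on ". " leaves one segment and the code falls back to 0, even when the sentence swings from elation to melancholy. The sketch below also splits on clause punctuation and accepts any scoring function. `toy_score` and its tiny lexicon are hypothetical stand-ins so the block runs without VADER; `lambda s: analyzer.polarity_scores(s)['compound']` could be passed in their place.

```python
import re
import statistics

def sentiment_variability(text, score_fn):
    """Standard deviation of sentiment scores across clauses."""
    # Split on sentence-final and clause-level punctuation.
    segments = [s.strip() for s in re.split(r"[.!?;,]+", text) if s.strip()]
    scores = [score_fn(s) for s in segments]
    return statistics.stdev(scores) if len(scores) > 1 else 0.0

# Hypothetical toy lexicon, for illustration only (not VADER's values).
TOY_LEXICON = {"happy": 0.6, "good": 0.4, "elation": 0.7, "melancholic": -0.6}

def toy_score(segment):
    """Sum of toy lexicon values for the words in a segment."""
    words = re.findall(r"[a-z]+", segment.lower())
    return sum(TOY_LEXICON.get(w, 0.0) for w in words)

simple = "I am happy. I feel good."
sophisticated = "Though elation surged like a tidal wave, an undertow of melancholic introspection lingered."

for text in [simple, sophisticated]:
    value = sentiment_variability(text, toy_score)
    print(f"Text: {text}\nClause-level Variability: {value:.2f}\n")
```

Splitting at the comma gives the second text two segments with opposite polarity, so its variability is no longer forced to zero. Clause splitting on punctuation is itself a rough choice; a parser-based clause segmenter would be more reliable.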



7. Linguistic Uniqueness
* Measure: Rarity of the vocabulary used.
* Implementation:
  * Compare the words in the lyrics against a frequency list (e.g., the Google Books Ngram corpus, or the wordfreq package as in the cell below).
  * Calculate the percentage of rare words.

In [8]:
from wordfreq import word_frequency

simple = "The cat sat on the mat."
sophisticated = "Ephemeral whispers danced through the corridors of oblivion."

def uniqueness_ratio(lyrics):
    tokens = lyrics.lower().split()
    clean_tokens = [word.strip(",.") for word in tokens]
    
    # Use a threshold to determine "rare" words
    threshold = 1e-6  # Frequency below this is rare
    rare_words = [word for word in clean_tokens if word_frequency(word, 'en') < threshold]
    ratio = len(rare_words) / len(clean_tokens)
    return ratio

for text in [simple, sophisticated]:
    print(f"Text: {text}\nLinguistic Uniqueness Ratio: {uniqueness_ratio(text):.2f}\n")

Text: The cat sat on the mat.
Linguistic Uniqueness Ratio: 0.00

Text: Ephemeral whispers danced through the corridors of oblivion.
Linguistic Uniqueness Ratio: 0.12

