# Solution: Applying Off-the-Shelf Natural Language Processing Models

In [1]:
import pandas as pd
import spacy
import transformers

In [2]:
# Read explicitly as UTF-8: the reviews contain curly apostrophes (’)
with open('data/singer-review.txt', 'r', encoding='utf-8') as f:
    reviews = f.readlines()
# Convert to DataFrame
df = pd.DataFrame(reviews, columns=['text'])
len(df)

193

In [3]:
# Use spaCy to extract named entities
nlp = spacy.load("en_core_web_sm")
def get_ner_tags(text):
    """Return the first PERSON entity mentioned in text, or '' if there is none."""
    doc = nlp(text)
    # Collect the text of every entity labeled PERSON
    entities = []
    for ent in doc.ents:
        if ent.label_ == 'PERSON':
            entities.append(ent.text)
    return entities[0] if len(entities) > 0 else ''

df['person'] = df['text'].apply(get_ner_tags)

In [4]:
df['person'].value_counts()

person
                     81
Madonna              23
Maddie               23
Lady Gaga            16
Taylor Swift         13
Gaga                 12
Taylor Swift’s        8
Lady Gaga’s           4
Swifty’s              3
Gaga’s                3
Taylor Swift's        2
Swifty’s Easter       1
Lady Gaga's           1
Grammy                1
Madge                 1
Gaga’s Chromatica     1
Name: count, dtype: int64
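
The counts above split the same singer across several surface forms, for example `Taylor Swift’s` and `Taylor Swift's`. One possible cleanup step, sketched here with a hypothetical helper `strip_possessive` that is not part of the original notebook, removes a trailing possessive marker in either apostrophe style before counting. It does not handle over-long spans such as `Swifty’s Easter`, which still need the substring matching used in the next cell.

```python
def strip_possessive(mention: str) -> str:
    """Remove a trailing possessive marker ('s or ’s) from an entity mention."""
    for suffix in ("'s", "’s"):
        if mention.endswith(suffix):
            return mention[: -len(suffix)]
    return mention

# Sample mentions taken from the value_counts() output above
mentions = ["Taylor Swift’s", "Taylor Swift's", "Lady Gaga", "Gaga’s"]
cleaned = [strip_possessive(m) for m in mentions]
print(cleaned)
```

In the notebook this could be applied with `df['person'].apply(strip_possessive)` before calling `value_counts()`.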

In [5]:
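# Map each extracted mention to a canonical singer name for the 'singer' column
def combine_name(entity_mention):
    # Madonna and her nickname Maddie both map to Madonna
    if 'Madonna' in entity_mention or 'Maddie' in entity_mention:
        return 'Madonna'
    # Any mention containing 'Swift' (e.g. Swifty’s) maps to Taylor Swift
    elif 'Swift' in entity_mention:
        return 'Taylor Swift'
    # Any mention containing 'Gaga' maps to Lady Gaga
    elif 'Gaga' in entity_mention:
        return 'Lady Gaga'
    # Everything else, including rows with no PERSON entity, stays unlabeled
    return ''
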
df['singer'] = df['person'].apply(combine_name)

In [6]:
df['singer'].value_counts()

singer
                83
Madonna         46
Lady Gaga       37
Taylor Swift    27
Name: count, dtype: int64
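
The unlabeled bucket grew from 81 to 83 rows because `Grammy` and `Madge` match none of the substrings. As a sketch only, the same mapping can be written as a lookup table so that adding an alias is a one-line change. The names `ALIASES` and `canonical_singer` are hypothetical, and treating `Madge` as a nickname for Madonna is an assumption about these reviews, not something the notebook confirms.

```python
# Hypothetical alias table: canonical name -> substrings that identify it.
# Assumption: 'Madge' refers to Madonna in this dataset.
ALIASES = {
    "Madonna": ("Madonna", "Maddie", "Madge"),
    "Taylor Swift": ("Swift",),
    "Lady Gaga": ("Gaga",),
}

def canonical_singer(mention: str) -> str:
    """Return the canonical singer for a mention, or '' if no alias matches."""
    for canonical, keys in ALIASES.items():
        if any(key in mention for key in keys):
            return canonical
    return ""

for mention in ["Madge", "Swifty’s Easter", "Gaga’s Chromatica", "Grammy", ""]:
    print(repr(mention), "->", repr(canonical_singer(mention)))
```

`Grammy` still maps to the empty string, which is the desired result since it is an NER false positive rather than a singer.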

In [7]:
def get_adjectives(text):
    doc = nlp(text)
    adjectives = []
    for token in doc:
        if token.pos_ == 'ADJ':
            adjectives.append(token.text)
    return adjectives
df['adjectives'] = df['text'].apply(get_adjectives)

# Merge the adjective lists of all reviews for each singer into one deduplicated list
def combine_adjectives(adjective_list):
    combined = []
    for adjectives in adjective_list:
        combined.extend(adjectives)
    return list(set(combined))
adj_by_singer = df.groupby('singer')['adjectives'].apply(combine_adjectives)

# Print adjectives for each singer
for singer, adjectives in adj_by_singer.items():
    print(f"{singer}: {', '.join(adjectives)}\n")

: subtle, deep, Swifty, few, biggest, -, full, masterful, postmodern, academic, loyal, passionate, classic, cinematic, artistic, religious, institutional, historic, balanced, strategic, Shallow, unapologetic, red, best, perfect, stellar, unforgettable, outlive, next, deliberate, acoustic, powerful, magical, stunning, short, orchestral, lyrical, relevant, better, earlier, new, millennial, visual, pop, many, bold, cultural, personal, little, recorded, fearless, unmatched, multidimensional, countless, political, gold, much, last, tailored, latest, re, meta, separate, safe, intertextual, more

Lady Gaga: public, deep, honest, -, full, live, artistic, social, beautiful, best, mental, pure, incredible, admirable, amazing, unfiltered, post, next, chronic, magical, recent, visual, dismiss, new, traditional, equal, theatrical, fearless, vocal, garde, avant, intellectual, iconic, own, devoted, charitable, traumatic, latest, raw, emotional, jarring, greater, early, profound

Madonna: unbelievable
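
Deduplicating with `set` discards how often each adjective occurs, so a word used once looks as important as one used in many reviews. A minimal sketch of a frequency-based alternative, using a hypothetical helper `top_adjectives` and made-up toy data rather than the actual reviews:

```python
from collections import Counter

def top_adjectives(adjective_lists, n=3):
    """Count lowercased adjectives across several lists and return the n most frequent."""
    counts = Counter(adj.lower() for adjs in adjective_lists for adj in adjs)
    return counts.most_common(n)

# Toy input standing in for the per-review 'adjectives' lists of one singer
toy = [["bold", "new"], ["Bold", "raw"], ["bold", "new"]]
print(top_adjectives(toy, n=2))  # [('bold', 3), ('new', 2)]
```

In the notebook it could replace `combine_adjectives` in the `groupby(...).apply(...)` call. Lowercasing merges variants such as `Shallow` and `shallow`, which may or may not be wanted when the capitalized form is a song title.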

In [None]:
from transformers import pipeline
classifier = pipeline('sentiment-analysis', model='cardiffnlp/twitter-roberta-base-sentiment-latest')
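def get_sentiment(text):
    # Truncate reviews that exceed the model's maximum input length
    result = classifier(text, truncation=True)
    # By default the pipeline returns only the top-scoring label
    return result[0]['label']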
df['sentiment'] = df['text'].apply(get_sentiment)
df.groupby('singer')['sentiment'].value_counts(normalize=True)
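
The last line reports, for each singer, the fraction of reviews carrying each sentiment label. To make that computation concrete, here is a plain-Python sketch of the same idea. The helper `sentiment_shares` and the toy rows are hypothetical and are not output from the model.

```python
from collections import Counter, defaultdict

def sentiment_shares(rows):
    """Map each singer to {label: share of that singer's reviews}.

    rows is an iterable of (singer, label) pairs.
    """
    counts = defaultdict(Counter)
    for singer, label in rows:
        counts[singer][label] += 1
    return {
        singer: {label: c / sum(labels.values()) for label, c in labels.items()}
        for singer, labels in counts.items()
    }

# Toy (singer, label) pairs, not real classifier output
rows = [("A", "positive"), ("A", "positive"), ("A", "negative"), ("B", "neutral")]
shares = sentiment_shares(rows)
print(shares)
```

Each singer's shares sum to 1, which is what `normalize=True` gives per group in the pandas version.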