## Before You Begin
**1.1 NLTK Setup**
   - Install the NLTK library (refer to the previous Python file).
   - Once NLTK is installed, download the text data files (corpora) it needs. This notebook only uses the `vader_lexicon` resource, which is downloaded with `nltk.download('vader_lexicon')` in the sentiment analysis cell.

We will use a song lyrics dataset from Kaggle to identify songs with similar lyrics. The dataset contains the artist, song title, link, and lyrics for 57,650 songs.

In [1]:
# Import library
import pandas as pd
import nltk

In [2]:
data = pd.read_csv('songdata.csv')
data.head()

| | artist | song | link | text |
|---|---|---|---|---|
| 0 | ABBA | Ahe's My Kind Of Girl | /a/abba/ahes+my+kind+of+girl_20598417.html | Look at her face, it's a wonderful face \nAnd... |
| 1 | ABBA | Andante, Andante | /a/abba/andante+andante_20002708.html | Take it easy with me, please \nTouch me gentl... |
| 2 | ABBA | As Good As New | /a/abba/as+good+as+new_20003033.html | I'll never know why I had to go \nWhy I had t... |
| 3 | ABBA | Bang | /a/abba/bang_20598415.html | Making somebody happy is a question of give an... |
| 4 | ABBA | Bang-A-Boomerang | /a/abba/bang+a+boomerang_20002668.html | Making somebody happy is a question of give an... |


## Question 1

In [4]:
# Clean the \n from lyrics (text column)

data['cleaned_lyrics'] = data['text'].str.replace('\n', "")
print(data['cleaned_lyrics'])

0        Look at her face, it's a wonderful face  And i...
1        Take it easy with me, please  Touch me gently ...
2        I'll never know why I had to go  Why I had to ...
3        Making somebody happy is a question of give an...
4        Making somebody happy is a question of give an...
                               ...                        
57645    Irie days come on play  Let the angels fly let...
57646    Power to the workers  More power  Power to the...
57647    all you need  is something i'll believe  flash...
57648    northern star  am i frightened  where can i go...
57649    come in  make yourself at home  i'm a bit late...
Name: cleaned_lyrics, Length: 57650, dtype: object
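
Replacing `\n` with an empty string joins the last word of one line to the first word of the next whenever the line break is not preceded by a space. A slightly more defensive sketch, assuming the same `text` column, collapses every run of whitespace (newlines included) into a single space and trims the ends. `clean_lyrics` is a hypothetical helper name, and the sample strings are toy data rather than rows from `songdata.csv`.

```python
import pandas as pd

def clean_lyrics(text: pd.Series) -> pd.Series:
    # Collapse every run of whitespace (newlines, tabs, repeated spaces) into one space,
    # then strip leading/trailing whitespace from each lyric.
    return text.str.replace(r"\s+", " ", regex=True).str.strip()

# Toy input: one line break preceded by a space, one without
sample = pd.Series(["Take it easy \nTouch me", "one line\nnext line"])
print(clean_lyrics(sample).tolist())  # ['Take it easy Touch me', 'one line next line']
```

Applied to the notebook, this would read `data['cleaned_lyrics'] = clean_lyrics(data['text'])`.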


## Question 2



In [5]:
# List all the rows with "Imagine" in the title

data[data['song'].str.contains('Imagine')]

| | artist | song | link | text | cleaned_lyrics |
|---|---|---|---|---|---|
| 1769 | Bon Jovi | Imagine | /b/bon+jovi/imagine_20525130.html | Imagine there's no heaven, \nIt's easy if you... | Imagine there's no heaven, It's easy if you t... |
| 4215 | Diana Ross | Imagine | /d/diana+ross/imagine_20040404.html | Imagine there's no heaven \nIt's easy if you ... | Imagine there's no heaven It's easy if you tr... |
| 6885 | Glee | Imagine | /g/glee/imagine_20854234.html | Imagine there's no countries, \nIt isn't hard... | Imagine there's no countries, It isn't hard t... |
| 7340 | Guns N' Roses | Imagine | /g/guns+n+roses/imagine_20254363.html | Imagine there's no heaven \nIt's easy if you ... | Imagine there's no heaven It's easy if you tr... |
| 15678 | Pearl Jam | Hard To Imagine | /p/pearl+jam/hard+to+imagine_20106382.html | Paint a picture, using only gray \nLight your... | Paint a picture, using only gray Light your p... |
| 19748 | Train | Imagine | /t/train/imagine_21054702.html | Finally met Virginia on a slow summer night \... | Finally met Virginia on a slow summer night T... |
| 24406 | Avril Lavigne | Imagine | /a/avril+lavigne/imagine_20785697.html | Imagine there's no Heaven \nIt's easy if you ... | Imagine there's no Heaven It's easy if you tr... |
| 24783 | The Beatles | Imagine | /b/beatles/imagine_20254326.html | Imagine there's no heaven \nIt's easy if you ... | Imagine there's no heaven It's easy if you tr... |
| 29441 | Demi Lovato | I Can Only Imagine | /d/demi+lovato/i+can+only+imagine_20868017.html | I can only imagine \nSurrounded by your glory... | I can only imagine Surrounded by your glory, ... |
| 40519 | Kirk Franklin | Imagine Me | /k/kirk+franklin/imagine+me_20370453.html | Imagine me \nLoving what I see when the mirro... | Imagine me Loving what I see when the mirror ... |
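
`str.contains` is case-sensitive by default and treats its pattern as a regular expression, so a title written as "imagine" in lowercase would not be listed. A sketch of a case-insensitive literal match, using the pandas parameters `case=False`, `regex=False`, and `na=False` (missing titles count as non-matches instead of producing NaN). `titles_containing` is a hypothetical helper, and the small DataFrame is toy data.

```python
import pandas as pd

def titles_containing(df: pd.DataFrame, word: str) -> pd.DataFrame:
    # Literal (non-regex), case-insensitive substring match on the song column;
    # na=False treats missing titles as non-matches.
    mask = df["song"].str.contains(word, case=False, regex=False, na=False)
    return df[mask]

songs = pd.DataFrame({
    "artist": ["A", "B", "C", "D"],
    "song": ["Imagine", "hard to imagine", "Yesterday", None],
})
print(titles_containing(songs, "Imagine")["song"].tolist())  # ['Imagine', 'hard to imagine']
```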


## Question 3

In [6]:
# Extract the first line of the first song's lyrics.

first_sentence = data['text'].str.split("\n").str.get(0)
print(first_sentence[0])

Look at her face, it's a wonderful face  
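
The cell above splits every lyric into all of its lines only to keep the first one, and the result keeps the trailing spaces of the original line. A small variant, assuming the same `text` column: `n=1` stops after the first split, and `str.strip()` trims the surrounding whitespace. `first_lines` is a hypothetical helper name and the sample is toy data.

```python
import pandas as pd

def first_lines(text: pd.Series) -> pd.Series:
    # Split at most once on "\n", keep the part before the first line break,
    # and trim surrounding whitespace.
    return text.str.split("\n", n=1).str[0].str.strip()

sample = pd.Series(["Look at her face  \nAnd in her eyes", "single line  "])
print(first_lines(sample).tolist())  # ['Look at her face', 'single line']
```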


In [12]:
# Find the sentiment of the lyric extracted in Question 3. Hint: use NLTK's vader_lexicon or TextBlob's sentiment property.

nltk.download('vader_lexicon')     #https://www.nltk.org/howto/sentiment.html
from nltk.sentiment.vader import SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()
scores = sid.polarity_scores(first_sentence[0])
print(scores)

#=======================================================================================================
!pip install textblob
from textblob import TextBlob
# t=TextBlob(first_sentence[0])
# print(t)
# t.sentiment

{'neg': 0.0, 'neu': 0.619, 'pos': 0.381, 'compound': 0.5719}


[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\admin\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!



**`SentimentIntensityAnalyzer().polarity_scores`**
1. `neg` is the proportion of the text that conveys negative sentiment. (Range: 0 to 1)
2. `neu` is the proportion of the text that is neutral, i.e. conveys neither strong positive nor strong negative sentiment. (Range: 0 to 1)
3. `pos` is the proportion of the text that conveys positive sentiment. (Range: 0 to 1)
4. `compound` is a single normalized score for the whole text, from -1 (most negative) to +1 (most positive). It is not computed from `neg`, `neu`, and `pos`: VADER sums the valence scores of the individual words, adjusts them with its rules for negation, degree words, capitalization, and punctuation, and then normalizes the total into [-1, 1].

`neg`, `neu`, and `pos` sum to 1 (up to rounding), as in the result above: 0.0 + 0.619 + 0.381 = 1.0.
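
The normalization step behind `compound` is simple enough to sketch directly. This assumes the formula used in the VADER implementation, `compound = s / sqrt(s² + α)` with α = 15, where `s` is the summed, rule-adjusted word valence. The ±0.05 cut-offs in `label` follow the convention suggested by VADER's authors for classifying a text; they are not defined anywhere in this notebook. Both function names are hypothetical.

```python
import math

def normalize(score: float, alpha: float = 15.0) -> float:
    """Map an unbounded summed valence score into [-1, 1], VADER-style."""
    norm = score / math.sqrt(score * score + alpha)
    # The formula already stays inside (-1, 1); the clamp only guards against rounding.
    return max(-1.0, min(1.0, norm))

def label(compound: float, threshold: float = 0.05) -> str:
    # Threshold convention recommended in the VADER documentation.
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# A summed valence of 2.7 normalizes to about 0.5719,
# numerically the same as the compound score printed for the first lyric line above.
c = normalize(2.7)
print(round(c, 4), label(c))  # 0.5719 positive
```

Because the curve flattens as `|s|` grows, long texts with many sentiment words tend to saturate near ±1, which is worth keeping in mind when comparing whole-song lyrics.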

**`TextBlob.sentiment`**
1. `polarity` is a float in the range [-1.0, 1.0]: -1 indicates highly negative sentiment, 0 indicates neutral sentiment (neither positive nor negative), and +1 indicates highly positive sentiment.
2. `subjectivity` is a float in the range [0.0, 1.0]: 0.0 indicates very objective (factual) content and 1.0 indicates very subjective (opinionated) content.
3. Objective text: text with a subjectivity score close to 0 is likely to contain factual information, statistical data, or descriptions without personal opinions or emotional tone.
4. Subjective text: text with a subjectivity score closer to 1 tends to contain personal opinions, evaluations, emotions, or subjective judgments.

## Question 4

In [13]:
# Apply TextBlob sentiment analysis to every lyric and store the polarity and subjectivity scores in lists

def analyze_sentiment(lyric):
    # Create a TextBlob object
    tb = TextBlob(lyric)
    
    # Perform sentiment analysis
    sentiment = tb.sentiment.polarity
    subjectivity = tb.sentiment.subjectivity
    return sentiment, subjectivity

# Initialize empty lists for sentiment scores and subjectivities
sentiments = []
subjectivities = []

# Iterate over each lyric in the DataFrame
for i in range(len(data)):
    lyric = data.loc[i, 'cleaned_lyrics']
    sentiment, subjectivity = analyze_sentiment(lyric)
    sentiments.append(sentiment)
    subjectivities.append(subjectivity)

# Add sentiment and subjectivity scores to the DataFrame
data['sentiment_score'] = sentiments
data['subjectivity'] = subjectivities

# Display the cleaned lyrics, sentiment score, and subjectivity for the first 10 rows
print(data[['song', 'artist', 'cleaned_lyrics', 'sentiment_score', 'subjectivity']].head(10))

                    song artist  \
0  Ahe's My Kind Of Girl   ABBA   
1       Andante, Andante   ABBA   
2         As Good As New   ABBA   
3                   Bang   ABBA   
4       Bang-A-Boomerang   ABBA   
5     Burning My Bridges   ABBA   
6              Cassandra   ABBA   
7             Chiquitita   ABBA   
8            Crazy World   ABBA   
9        Crying Over You   ABBA   

                                      cleaned_lyrics  sentiment_score  \
0  Look at her face, it's a wonderful face  And i...         0.447619   
1  Take it easy with me, please  Touch me gently ...         0.202222   
2  I'll never know why I had to go  Why I had to ...         0.300881   
3  Making somebody happy is a question of give an...         0.355000   
4  Making somebody happy is a question of give an...         0.355000   
5  Well, you hoot and you holler and you make me ...        -0.339935   
6  Down in the street they're all singing and sho...        -0.097061   
7  Chiquitita, tell me what's 
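
The loop above reads rows with `data.loc[i, ...]`, which only works while the DataFrame keeps its default 0 to n-1 index (it breaks after filtering or re-indexing). A sketch of an index-independent alternative uses `Series.apply`. `add_sentiment_columns` is a hypothetical helper name; `scorer` would be the `analyze_sentiment` function defined above, and a toy scorer stands in for TextBlob here so the example runs without it.

```python
import pandas as pd

def add_sentiment_columns(df, scorer, text_col="cleaned_lyrics"):
    """Return a copy of df with sentiment_score and subjectivity columns added.

    scorer: callable mapping a string to a (polarity, subjectivity) pair,
    for example the analyze_sentiment function defined above.
    """
    out = df.copy()
    # Missing lyrics become "" so the scorer always receives a string.
    pairs = out[text_col].fillna("").apply(scorer)
    out["sentiment_score"] = [polarity for polarity, _ in pairs]
    out["subjectivity"] = [subj for _, subj in pairs]
    return out

# Toy stand-in for TextBlob (hypothetical): "positive" if the word "love" appears.
def toy_scorer(text):
    return (0.5, 0.6) if "love" in text else (0.0, 0.0)

songs = pd.DataFrame({"cleaned_lyrics": ["all you need is love", None, "rain"]},
                     index=[10, 20, 30])
print(add_sentiment_columns(songs, toy_scorer)[["sentiment_score", "subjectivity"]])
```

In the notebook this would be `data = add_sentiment_columns(data, analyze_sentiment)`; returning a copy leaves the input DataFrame unchanged.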

## Question 5

In [14]:
# Export the results from pandas DataFrame to a CSV file using the to_csv method.

output_path = 'sentiment_analysis_results.csv'
data.to_csv(output_path, index=True)
print(f"Results exported to {output_path}")

Results exported to sentiment_analysis_results.csv
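
With `index=True`, the DataFrame index is written as an unnamed first column, so reading the file back with a plain `pd.read_csv` adds an extra `Unnamed: 0` column. A sketch of a round trip that restores the index with `index_col=0`, using a temporary directory and a toy DataFrame (two scores copied from the Question 4 output) rather than the full results:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"song": ["Bang", "Cassandra"], "sentiment_score": [0.355, -0.097061]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sentiment_analysis_results.csv")
    df.to_csv(path, index=True)

    naive = pd.read_csv(path)                  # index comes back as an extra column
    restored = pd.read_csv(path, index_col=0)  # index comes back as the index

print(list(naive.columns))     # ['Unnamed: 0', 'song', 'sentiment_score']
print(list(restored.columns))  # ['song', 'sentiment_score']
```

Writing with `index=False` avoids the extra column entirely when the default integer index carries no meaning.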
