# YouTube Comments Sentiment Analysis

Sentiment analysis is the process of detecting positive or negative sentiment in text. It’s often used by businesses to detect sentiment in social data, gauge brand reputation, and understand customers.

In this project, I performed a sentiment analysis of comments from the popular media platform YouTube. The project is broken down into the following sections:

### Polarity
Each comment in the GBcomments dataset is analyzed to determine the polarity of its text. Polarity is scored on a scale from -1 to 1: values closer to 1 indicate more positivity, while values closer to -1 indicate more negativity. The data was collected from a YouTube video (video id: jt2OHQh0HoQ) and the sentiment analysis was conducted using the TextBlob library. A word cloud was generated with the WordCloud library for both the positive and the negative comments to find the most common words used in each.

### Trending Tags
Video data from the USvideos dataset, collected from YouTube, was analyzed to find the most commonly used tags across videos. A very simple analysis was performed and a word cloud was generated showing the most commonly used tags.

### Correlation of Views, Likes, and Dislikes
Using the same USvideos dataset, I analyzed the likes, dislikes, and views of each video. Using seaborn and Matplotlib, I generated a regression plot to examine the correlation between views and dislikes, as well as a plot for views and likes. A heatmap was also generated to display the correlation matrix.

### Emoji Analysis
Using the GBcomments dataset, I analyzed the emojis used in the comments under each video included in the dataset.



In [1]:
#import all of the necessary libraries to perform sentiment analysis

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


In [2]:
#Install textblob library
!pip install textblob




In [3]:
from textblob import TextBlob


In [4]:
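# error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; on_bad_lines='warn' reports and skips malformed rows
comments = pd.read_csv('/Users/breyannabroughton/Desktop/DA/Youtube Project/Data Resources/GBcomments.csv', on_bad_lines = 'warn')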



b'Skipping line 113225: expected 4 fields, saw 5\n'
b'Skipping line 158379: expected 4 fields, saw 7\nSkipping line 241590: expected 4 fields, saw 5\nSkipping line 245637: expected 4 fields, saw 7\n'
b'Skipping line 521402: expected 4 fields, saw 5\n'
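
The `Skipping line ...` messages mean the parser found rows with more fields than the header declares. As a rough sketch (not part of the original notebook), Python's built-in `csv` module can list such rows before loading. The sample text and the `find_bad_rows` helper below are made up for illustration.

```python
import csv
import io

def find_bad_rows(handle):
    """Return (line_number, field_count) for rows whose field count differs from the header's."""
    reader = csv.reader(handle)
    header = next(reader)
    expected = len(header)
    bad = []
    for row in reader:
        if len(row) != expected:
            # reader.line_num is the number of source lines read so far
            bad.append((reader.line_num, len(row)))
    return bad

# Made-up sample: the third line contains an extra, unquoted comma.
sample = io.StringIO(
    "video_id,comment_text,likes,replies\n"
    "abc,nice video,1,0\n"
    "abc,wow, so cool,2,0\n"
)
print(find_bad_rows(sample))
```

Rows flagged this way are typically comments containing unquoted commas, which is why they are skipped rather than parsed.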


**After importing the libraries and the YouTube comments dataset, I'll explore the data to see the various datatypes and drop rows with missing values (if any).**

In [5]:
comments.head()

|  | video_id | comment_text | likes | replies |
|---|---|---|---|---|
| 0 | jt2OHQh0HoQ | It's more accurate to call it the M+ (1000) be... | 0 | 0 |
| 1 | jt2OHQh0HoQ | To be there with a samsung phone\n😂😂😂 | 1 | 0 |
| 2 | jt2OHQh0HoQ | Thank gosh, a place I can watch it without hav... | 0 | 0 |
| 3 | jt2OHQh0HoQ | What happened to the home button on the iPhone... | 0 | 0 |
| 4 | jt2OHQh0HoQ | Power is the disease. Care is the cure. Keep... | 0 | 0 |


**Check for *null* values in the dataset**

In [6]:
comments.isna().sum()

video_id         0
comment_text    28
likes            0
replies          0
dtype: int64

**Since only 28 rows had missing data, I dropped them and worked with the remaining data, setting the `inplace` parameter to `True` so that the dataframe is updated directly.**

In [7]:
comments.dropna(inplace = True)

In [8]:
#call the sentiment polarity of the first comment
TextBlob('Its more accurate to call it the M+ (1000) be...').sentiment.polarity

0.45000000000000007

**I'm running the sentiment polarity for the whole dataset and adding a polarity column to the dataset to display the results.**

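In [12]:
# compute the polarity of every comment first
polarity=[]

for i in comments['comment_text']:
    polarity.append(TextBlob(i).sentiment.polarity)

In [None]:
# then attach the scores as a new column and display the result
comments['polarity']=polarity

comments.head(20)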

In [None]:
## filter the dataframe ##

In [None]:
comments_positive=comments[comments['polarity']==1]

In [None]:
comments_positive.shape


In [None]:
comments_positive.head()
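
Filtering on `polarity == 1` keeps only comments whose score is exactly 1, so mildly positive comments are left out. A looser alternative is to bucket scores with a threshold. The sketch below assumes a cutoff of 0.5 and a hypothetical helper named `label_polarity`; neither comes from the original analysis.

```python
def label_polarity(score, threshold=0.5):
    """Map a polarity score in [-1, 1] to a coarse label.

    The 0.5 threshold is an illustrative assumption, not a TextBlob default.
    """
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

# Example scores, including the 0.45 value computed for the first comment.
scores = [1.0, 0.45, 0.0, -0.7, -1.0]
labels = [label_polarity(s) for s in scores]
print(labels)
```

On the dataframe, the same helper could be applied with `comments['polarity'].apply(label_polarity)`.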

In [None]:
# Wordcloud representation of sentiments
#!pip install wordcloud

In [None]:
from wordcloud import WordCloud,STOPWORDS

In [None]:
stopwords=set(STOPWORDS)

In [None]:
# call function to display words used in positive comments

In [None]:
total_comments=' '.join(comments_positive['comment_text'])

In [None]:
wordcloud=WordCloud(width=1000,height=500,stopwords=STOPWORDS).generate(total_comments)

In [None]:
plt.figure(figsize=(15,5))
plt.imshow(wordcloud)
plt.axis('off')

**The most common words used in positive comments are 'awesome', 'best', 'perfect', 'beautiful'.**

In [None]:
# call function to display words in negative comments

In [None]:
comments_negative=comments[comments['polarity']==-1]

In [None]:
comments_negative.head()

In [None]:
total_comments=' '.join(comments_negative['comment_text'])

In [None]:
wordcloud=WordCloud(width=1000,height=500,stopwords=STOPWORDS).generate(total_comments)

In [None]:
plt.figure(figsize=(15,5))
plt.imshow(wordcloud)
plt.axis('off')

**The most common words used in negative comments were 'boring', 'terrible', 'worst', and 'horrible'.**

## Analyzing Trending Tags and Views 
### *What are the trending tags on YouTube?*

In [None]:
#import USvideos dataset

In [None]:
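# error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; on_bad_lines='warn' reports and skips malformed rows
videos = pd.read_csv('/Users/breyannabroughton/Desktop/DA/Youtube Project/Data Resources/USvideos.csv', on_bad_lines = 'warn')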

In [None]:
videos.head()

In [None]:
tags_complete=' '.join(videos['tags'])

In [None]:
# tags_complete is a plain string (str has no .head()), so preview it with slicing
tags_complete[:500]

In [None]:
# remove characters that are not alphabetical

In [None]:
import re

In [None]:
tags=re.sub('[^a-zA-Z]',' ',tags_complete)

In [None]:
tags

In [None]:
#remove extra spacing

In [None]:
tags=re.sub(' +',' ',tags)
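
To make the two substitutions concrete, here is a small sketch on a made-up tag string (the sample value is an assumption, not a row from USvideos). It applies the same two patterns as the cells above.

```python
import re

# Made-up sample; real values in the tags column may look different.
raw = 'iPhone X|"makeup tutorial"|2017 music-video'

# Step 1: replace every non-alphabetical character with a space.
letters_only = re.sub('[^a-zA-Z]', ' ', raw)

# Step 2: collapse runs of spaces into a single space.
cleaned = re.sub(' +', ' ', letters_only)

print(repr(cleaned))
```

Note that digits are removed along with punctuation, so tags such as years disappear from the word cloud.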

In [None]:
# create wordcloud for trending tags on YouTube

In [None]:
wordcloud=WordCloud(width=1000,height=500,stopwords=set(STOPWORDS)).generate(tags)

In [None]:
plt.figure(figsize=(15,5))
plt.imshow(wordcloud)
plt.axis('off')

**The top trending tags amongst US YouTube videos are 'iPhone X', 'makeup tutorial', 'music video', and 'none'.**
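
After the cleaning step, the text is a stream of words, so the boundaries between tags are lost. If each `tags` value is a `|`-separated list of optionally quoted tags (an assumption about the file layout that should be checked against the data), whole tags could be counted with `collections.Counter` instead. The sample rows and the `count_tags` helper are hypothetical.

```python
from collections import Counter

def count_tags(tag_strings, sep="|"):
    """Count whole tags across rows, assuming `sep`-separated, optionally quoted tags."""
    counts = Counter()
    for tag_string in tag_strings:
        for tag in tag_string.split(sep):
            # Drop surrounding whitespace and quotes, and ignore case.
            tag = tag.strip().strip('"').lower()
            if tag:
                counts[tag] += 1
    return counts

# Made-up rows standing in for videos['tags'].
sample_rows = [
    'iPhone X|"makeup tutorial"|"music video"',
    '"music video"|pop',
    'iPhone X|"Music Video"',
]
top_tags = count_tags(sample_rows).most_common(2)
print(top_tags)
```

This gives exact counts per tag, which complements the visual impression from the word cloud.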

## Analyzing how likes, views, and dislikes correlate with each other

In [None]:
sns.regplot(data=videos,x='views',y='likes')
plt.title('Regression plot for views & likes')

**The regression plot shows a positive correlation between views and likes.**

In [None]:
sns.regplot(data=videos,x='views',y='dislikes')
plt.title('Regression plot for views & dislikes')

**The regression plot shows a positive correlation between views and dislikes.**

In [None]:
#draw a correlation matrix for views, likes, and dislikes

In [None]:
df_corr=videos[['views','likes','dislikes']]

In [None]:
df_corr.corr()

In [None]:
sns.heatmap(df_corr.corr(),annot= True)

**Views and likes have the highest correlation.**
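
`DataFrame.corr()` uses the Pearson correlation coefficient by default. As a sketch of what each cell of the heatmap represents, the coefficient can be computed by hand. The numbers below are made up and `pearson` is a hypothetical helper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up figures, not taken from USvideos.
views = [100, 200, 300, 400]
likes = [10, 20, 30, 40]      # grows exactly in step with views
dislikes = [4, 1, 3, 2]       # only loosely (and negatively) related to views

print(round(pearson(views, likes), 3))
print(round(pearson(views, dislikes), 3))
```

A value near 1 means the two columns rise together almost linearly, a value near -1 means one falls as the other rises, and a value near 0 means there is no linear relationship.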

## Emoji analysis of the comments

In [None]:
comments.head()

In [None]:
comments['comment_text'][1]

In [None]:
#identify unicode character for certain emojis

In [None]:
print('\U0001F600')

In [None]:
#import emoji library

In [None]:
!pip install emoji

In [None]:
import emoji

In [None]:
len(comments)

In [None]:
comment=comments['comment_text'][1]

In [None]:
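# emoji.UNICODE_EMOJI_ENGLISH is only available in emoji < 2.0; newer releases provide emoji.EMOJI_DATA instead
[c for c in comment if c in emoji.UNICODE_EMOJI_ENGLISH]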

In [None]:
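# collect every emoji character from every comment into one string
# (named all_emojis/found rather than str/list to avoid shadowing Python built-ins)
all_emojis=''
for i in comments['comment_text']:
    found=[c for c in i if c in emoji.UNICODE_EMOJI_ENGLISH]
    for ele in found:
        all_emojis=all_emojis+ele
    

In [None]:
len(all_emojis)

In [None]:
all_emojis

**A total of 333,278 emojis were counted across all of the comments. Next, I count how many times each distinct emoji was used.**

In [None]:
result={}
for i in set(all_emojis):
    result[i]=all_emojis.count(i)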

In [None]:
result

In [None]:
result.items()

In [None]:
final={}
for key, value in sorted(result.items(),key =lambda item:item[1]):
    final[key]=value


In [None]:
final 

In [None]:
keys=[*final.keys()]

In [None]:
keys

In [None]:
values=[*final.values()]

In [None]:
values

**Create a dataframe of the indexed emojis for readability.**

In [None]:
df=pd.DataFrame({'chars' :keys[-20:],'num':values[-20:]})

In [None]:
df
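
The counting and sorting cells above can be condensed with `collections.Counter`, whose `most_common(n)` method returns the `n` most frequent items in descending order. This is an alternative sketch, not the code that produced the results here, and the sample string is made up.

```python
from collections import Counter

# Made-up stand-in for the string of all emojis extracted from the comments.
emoji_string = "😂😂😂👍👍🔥😂👍😍"

# Count each distinct character, then keep the two most frequent.
top = Counter(emoji_string).most_common(2)
chars = [char for char, _ in top]
nums = [num for _, num in top]

print(chars)
print(nums)
```

Passing these lists to `pd.DataFrame({'chars': chars, 'num': nums})` would give the same two-column layout, ordered from most to least used.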

**Install the plotly library to create a bar chart that displays a visual of the emoji analysis.**

In [None]:
!pip install plotly

In [None]:
import plotly.graph_objs as go
from plotly.offline import iplot

In [None]:
trace=go.Bar(
x=df['chars'],
y=df['num']
)

iplot([trace])

**This bar chart displays the 20 most frequently used emojis and how many times each was used in the dataset.**

In [None]:
##fin