# # Sentiment Analysis Project

# ## Import Dependencies

In [76]:
from transformers import TFAutoModelForSequenceClassification, AutoTokenizer
from summarizer import Summarizer
import tensorflow as tf
import numpy as np
import pandas as pd
from googleapiclient.discovery import build

# ## Define Functions

In [77]:
def load_model():
    tokenizer = AutoTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')
    model = TFAutoModelForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')
    return tokenizer, model

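def sentiment_score(tokenizer, model, review):
    # Truncate at the token level so long comments cannot exceed BERT's 512-token limit
    tokens = tokenizer.encode(review, return_tensors='tf', truncation=True, max_length=512)
    result = model(tokens)
    # The model has five classes (1 to 5 stars); argmax is 0-based, so add 1
    return int(tf.argmax(result.logits, axis=1)[0]) + 1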

def load_reviews_into_dataframe(reviews):
    return pd.DataFrame({'comment': list(reviews)})

def generate_summary_extractive(reviews):
    model = Summarizer()
    summary = model('\n'.join(reviews), min_length=50, max_length=200)
    return summary

# Retrieve top-level comments for a video via the YouTube Data API v3 (first page only, at most 100 threads)
def get_youtube_comments(api_key, video_id, max_results=100):
    youtube = build('youtube', 'v3', developerKey=api_key)
    request = youtube.commentThreads().list(
        part="snippet",
        videoId=video_id,
        maxResults=max_results,
        textFormat="plainText"
    )
    response = request.execute()

    comments = [item['snippet']['topLevelComment']['snippet']['textDisplay'] for item in response['items']]
    return comments
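
`get_youtube_comments` issues a single request, so it returns at most one page of results. The following is a minimal sketch of how more pages could be collected, assuming the `list_next` pagination helper that the Google API Python client generates for paged collections. The function name `get_all_youtube_comments` and the `max_pages` cap are hypothetical and not part of the original notebook.

```python
def get_all_youtube_comments(youtube, video_id, max_pages=5):
    """Collect top-level comments across several result pages.

    `youtube` is a client built with googleapiclient.discovery.build.
    `max_pages` is a hypothetical safety cap, not an API parameter.
    """
    comments = []
    request = youtube.commentThreads().list(
        part="snippet",
        videoId=video_id,
        maxResults=100,  # the API caps a single page at 100 threads
        textFormat="plainText",
    )
    pages = 0
    while request is not None and pages < max_pages:
        response = request.execute()
        comments.extend(
            item['snippet']['topLevelComment']['snippet']['textDisplay']
            for item in response.get('items', [])
        )
        # list_next returns None when the response has no nextPageToken
        request = youtube.commentThreads().list_next(request, response)
        pages += 1
    return comments
```

Passing the client in, rather than the API key, keeps the function easy to exercise with a stand-in object. Each page counts against the API quota, which is why the sketch caps the number of pages.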

# ## Main Execution

### The comments come from a video on the channel https://www.youtube.com/@SpawnPoiint.
### Video title: My YouTube Setup: How I Make Videos with an iPhone - Starting a YouTube Channel!
### Video link: https://www.youtube.com/watch?v=wLsJXKuJUZ4

In [78]:
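# Set up YouTube API key and video ID
# Read the key from the environment instead of committing it to the notebook
# (YOUTUBE_API_KEY is an assumed variable name)
import os
youtube_api_key = os.environ['YOUTUBE_API_KEY']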
youtube_video_id = 'wLsJXKuJUZ4'
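
The video ID is the `v` query parameter of the video link. A small sketch of deriving it from the URL with the standard library follows; `extract_video_id` is a hypothetical helper, and it only handles `watch?v=` links, not shortened `youtu.be` URLs.

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url):
    """Return the `v` query parameter of a YouTube watch URL, or None."""
    query = parse_qs(urlparse(url).query)
    values = query.get('v')
    return values[0] if values else None

print(extract_video_id('https://www.youtube.com/watch?v=wLsJXKuJUZ4'))
```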

In [79]:
# Collect YouTube Comments
youtube_comments = get_youtube_comments(youtube_api_key, youtube_video_id)

In [81]:
# Instantiate Model
tokenizer, model = load_model()

Some layers from the model checkpoint at nlptown/bert-base-multilingual-uncased-sentiment were not used when initializing TFBertForSequenceClassification: ['dropout_37']
- This IS expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertForSequenceClassification were initialized from the model checkpoint at nlptown/bert-base-multilingual-uncased-sentiment.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForSequenceClassification for predictions without further training.


In [82]:
# Load Reviews into DataFrame and Score
df = load_reviews_into_dataframe(youtube_comments)
df['sentiment'] = df['comment'].apply(lambda x: sentiment_score(tokenizer, model, x[:512]))
df.head()

   comment                                            sentiment
0  Did this video help? Hitting the 'THANKS' butt...  3
1  Nice Job! I realize now I have a long way to go.   5
2  Amazing video. Super helpful. Thank you for sh...  5
3  do you use the same setup now? in end of 2023?     1
4  Just started my channel. Thanks for the infor...  5


In [83]:
# Display a specific review and its sentiment score
sample_review_index = 3
print(f"Review: {df['comment'].iloc[sample_review_index]}")
print(f"Sentiment Score: {df['sentiment'].iloc[sample_review_index]}")

Review: do you use the same setup now? in end of 2023?
Sentiment Score: 1
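
`sentiment_score` returns only the most likely star rating, so a score such as the 1 above says nothing about how confident the model was. One way to check is to turn the five logits into probabilities with a softmax. The sketch below uses plain Python and made-up logits; the helper names are hypothetical, and the real logits would have to be taken from `model(tokens).logits`.

```python
import math

def star_probabilities(logits):
    """Softmax over the class logits, returned as {stars: probability}."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {stars: e / total for stars, e in enumerate(exps, start=1)}

def predicted_stars(logits):
    """Same rule as sentiment_score: index of the largest logit, plus one."""
    return max(range(len(logits)), key=lambda i: logits[i]) + 1

# Hypothetical logits, invented for illustration (not model output)
example_logits = [1.2, 1.0, 1.1, 0.4, 0.9]
probs = star_probabilities(example_logits)
print(predicted_stars(example_logits), {k: round(v, 3) for k, v in probs.items()})
```

When the top probability is only slightly above the others, as in this invented case, the predicted rating is close to a coin toss and may deserve less weight in the overall score.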


In [84]:
# Calculate Overall Sentiment Score
overall_sentiment_score = df['sentiment'].mean()
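
The mean compresses all ratings into one number and can hide a split between very positive and very negative comments. A minimal sketch of a complementary view, the share of comments per star rating, is shown here; `rating_distribution` is a hypothetical helper, and the sample input is the five scores from the `df.head()` output earlier in the notebook.

```python
from collections import Counter

def rating_distribution(scores):
    """Return {stars: share of comments} for ratings 1 through 5."""
    counts = Counter(scores)
    total = len(scores) or 1  # avoid division by zero on empty input
    return {stars: counts.get(stars, 0) / total for stars in range(1, 6)}

# The five scores shown by df.head()
sample_scores = [3, 5, 5, 1, 5]
print(rating_distribution(sample_scores))
```

For the full data set the same function could be called as `rating_distribution(df['sentiment'].tolist())`.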

In [85]:
# Generate Overall Summary Extractively
overall_summary = generate_summary_extractive(df['comment'])

  super()._check_params_vs_input(X, default_n_init=10)


In [86]:
# Display Overall Sentiment Score and Summary
print("Overall Sentiment Score:", overall_sentiment_score)
print("\nOverall Summary Extracted:", overall_summary)

Overall Sentiment Score: 4.515151515151516

Overall Summary Extracted: Hitting the 'THANKS' button helps me out too 🙏
Nice Job! Two years after you made this video, I found it and really enjoyed it! This was extremely helpful and validating 👏👏thank you !! Can you share a template to your excel planning sheet? I’ve been wanting to make more complex videos for my academic channel. This is incredible 🤷🏾‍♂️
I enjoyed watching this video. I am annoyed at it as i have many ideas i want to share for my business, but do not know how to get rid of these black sides. I appreciate you taking the time to walk us through the process. Liked and subscribed...
That was a great video, Thank you brother
Shot on iPhone. all the settings are on point
Technically you've been filming with Sony
Which software do you use to edit such lovely video
Outstanding! Keep it up bro
I'm so sick and tired of people advertising crap, then they promote what they got with their code, feeling like a pyramid scheme. I’m als