
# **INFO5731 Assignment Four**

In this assignment, you are required to conduct topic modeling and sentiment analysis based on **the dataset you created from assignment three**.

# **Question 1: Topic Modeling**

(30 points). This question is designed to help you develop a feel for how topic modeling works and how topics connect to the human meaning of documents. Based on the dataset from assignment three, write a Python program to **identify the top 10 topics in the dataset**. Before answering this question, please review the materials in lesson 8, especially the code for LDA, LSA, and BERTopic. The following information should be reported:

1. Features (text representation) used for topic modeling.

2. Top 10 clusters for topic modeling.

3. Summarize and describe the topic for each cluster.


In [1]:
!pip install bertopic



In [3]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation, TruncatedSVD
from bertopic import BERTopic
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')
nltk.download('punkt')

# Load the dataset
df = pd.read_csv('reviews_with_neutral.csv')

# Preprocessing the text
STOP_WORDS = set(stopwords.words('english'))  # build the stop-word set once, not once per review

def preprocess(text):
    tokens = nltk.word_tokenize(text.lower())  # tokenize and lowercase
    # Keep alphabetic tokens that are not stop words (drops punctuation and numbers)
    tokens = [token for token in tokens if token.isalpha() and token not in STOP_WORDS]
    return " ".join(tokens)

df['cleaned_text'] = df['Review'].apply(preprocess)

# Vectorization for LDA and LSA
vectorizer = CountVectorizer(max_df=0.95, min_df=2, stop_words='english')
data_vectorized = vectorizer.fit_transform(df['cleaned_text'])

# LDA Model
lda_model = LatentDirichletAllocation(n_components=10, random_state=42)
lda_topics = lda_model.fit_transform(data_vectorized)

# LSA Model
lsa_model = TruncatedSVD(n_components=10)
lsa_topics = lsa_model.fit_transform(data_vectorized)

# BERTopic Model
bertopic_model = BERTopic(min_topic_size=10)
topics, probs = bertopic_model.fit_transform(df['cleaned_text'])

# Displaying topics for LDA and LSA
def display_topics(model, feature_names, no_top_words):
    for topic_idx, topic in enumerate(model.components_):
        print("Topic %d:" % topic_idx)
        print(" ".join([feature_names[i] for i in topic.argsort()[:-no_top_words - 1:-1]]))

print("LDA Topics:")
display_topics(lda_model, vectorizer.get_feature_names_out(), 10)

print("\nLSA Topics:")
display_topics(lsa_model, vectorizer.get_feature_names_out(), 10)

print("\nBERTopic Topics and Info:")
print(bertopic_model.get_topic_info())  # A bare expression in the middle of a cell is not displayed, so print the topic table (including topic sizes) explicitly

# Optionally, to print summaries for each topic from BERTopic
topic_summaries = bertopic_model.get_topics()
for topic_num, topic_content in topic_summaries.items():
    print(f"Topic {topic_num}: {topic_content[:10]}")  # Prints top 10 words for each topic



LDA Topics:
Topic 0:
cleaner cleaning recommend highly attachments make love amazing power dirt
Topic 1:
cleaner money broke away definitely waste worth stay immediately recommend
Topic 2:
cleaner purchase ineffective terrible home amazed penny cleans worth regret
Topic 3:
hair pet cleaner unfortunately terrible like powerful waste makes work
Topic 4:
cleaner floors hard best like perfect makes work quick cleans
Topic 5:
cleaner carpets thanks powerful suction home dirt amazed warn upholstery
Topic 6:
useless recommend quickly broke cleaner buying unfortunately love avoid away
Topic 7:
cleaner disappointed avoid stopped working bought way extremely constantly perfect
Topic 8:
cleaner buying debris pick regret avoid fails warn immediately recommend
Topic 9:
cleaner suction purchasing easy cleaning regret excels gets fails constantly

LSA Topics:
Topic 0:
cleaner carpets powerful suction cleaning home thanks recommend money floors
Topic 1:
powerful cleaning makes pet hair quick carpets d

### 1. Features (text representation) used for topic modeling:
For topic modeling, LDA and LSA both used a bag-of-words representation built with `CountVectorizer`, which turns the cleaned reviews into a document-term matrix of token counts. Terms that appear in more than 95% of the reviews or in fewer than two reviews were dropped (`max_df=0.95`, `min_df=2`). Both models work directly on this matrix: LDA treats each review as a mixture of topics, and LSA factorizes the matrix with truncated SVD. BERTopic instead represents each review as a dense sentence embedding, which captures context as well as word occurrence, and then clusters those embeddings, so it can group reviews that are semantically similar even when they share few words.

### 2. Top 10 clusters for topic modeling:
LDA and LSA were each configured to return 10 topics (`n_components=10`), while BERTopic determines the number of topics itself, subject to `min_topic_size=10`. The resulting topics cover various aspects of the vacuum cleaners discussed in the reviews. The LDA and LSA topics centre on effectiveness, efficiency, and satisfaction, ranging from positive feedback about handling dirt and pet hair to criticism of durability and functionality. BERTopic, which works on sentence embeddings rather than word counts, separated negative sentiment about reliability and value from positive comments about specific tasks such as carpet cleaning and suction power.

### 3. Summarize and describe the topic for each cluster:
The LDA topics mix satisfaction with disappointment. On the positive side, Topic 0 (`recommend`, `highly`, `attachments`, `power`, `dirt`) reflects general praise for cleaning power, Topic 4 (`floors`, `hard`, `best`, `quick`) covers hard-floor cleaning, and Topic 5 (`carpets`, `powerful`, `suction`, `upholstery`) covers carpets and upholstery. On the negative side, Topics 1 and 6 (`broke`, `waste`, `useless`, `avoid`) describe breakdowns and poor value, Topic 7 (`disappointed`, `stopped`, `working`) describes units that stopped working, and Topic 8 (`debris`, `pick`, `fails`, `regret`) describes failure to pick up debris. Topics 2, 3, and 9 contain both positive and negative terms (for example `terrible` next to `powerful` in Topic 3, and `excels` next to `fails` in Topic 9), so the separation between topics is not clean. LSA shifted the focus slightly toward performance across different tasks and surfaces, reflecting user experiences that range from excellent to poor. BERTopic split its topics between praise for specific functions and criticism of product longevity and overall purchase satisfaction. Together, these results suggest a divided user experience: cleaning performance is often appreciated, while durability and value for money draw criticism.
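
One way to check how many reviews fall under each topic is to assign every review to its highest-weight topic. The sketch below does this on a hypothetical mini-corpus with two topics; the documents, the number of topics, and the variable names are assumptions for illustration, not values from our run.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical reviews; the real notebook would use df['cleaned_text'] instead.
docs = [
    "powerful suction cleans carpets well",
    "great suction carpets pet hair",
    "broke quickly waste money",
    "stopped working waste money avoid",
    "cleans hard floors quick",
    "broke immediately avoid buying",
]

counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(counts)  # each row is a topic distribution that sums to 1

dominant = doc_topic.argmax(axis=1)  # index of the highest-weight topic per review
topic_sizes = np.bincount(dominant, minlength=lda.n_components)

for topic_idx, size in enumerate(topic_sizes):
    print(f"Topic {topic_idx}: {size} reviews")
```

A topic that ends up with very few reviews is a candidate for merging or for lowering `n_components`.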

Overall, the application of these topic modeling techniques to the review texts has illuminated the key areas of consumer concern and satisfaction, providing valuable feedback that could be instrumental in guiding product improvements and marketing strategies.

# **Question 2: Sentiment Analysis**

(30 points). Sentiment analysis, also known as opinion mining, is a subfield of Natural Language Processing (NLP) that builds machine learning algorithms to classify a text according to the sentiment polarity of the opinions it contains, e.g., positive, negative, or neutral. The purpose of this question is to develop a machine learning classifier for sentiment analysis. Based on the dataset from assignment three, write a Python program to implement a sentiment classifier and evaluate its performance. Notice: **80% data for training and 20% data for testing**.

1. Select features for the sentiment classification and explain why you select these features. Use a markdown cell to provide your explanation.

2. Select two of the supervised learning algorithms/models from the scikit-learn library (https://scikit-learn.org/stable/supervised_learning.html#supervised-learning) to build two sentiment classifiers. Note: Cross-validation (5-fold or 10-fold) should be conducted. Here is the reference for cross-validation: https://scikit-learn.org/stable/modules/cross_validation.html.

3. Compare the performance over accuracy, precision, recall, and F1 score for the two algorithms you selected. The test set must be used for model evaluation in this step. Here is the reference of how to calculate these metrics: https://towardsdatascience.com/accuracy-precision-recall-or-f1-331fb37c5cb9.

In [4]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.pipeline import Pipeline
from nltk.corpus import stopwords
import nltk
nltk.download('stopwords')

# Step 1: Load the dataset
df = pd.read_csv('reviews_with_neutral.csv')

# Step 2: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(df['Review'], df['Sentiment'], test_size=0.2, random_state=42)

# Step 3: Text vectorization
# Each pipeline gets its own CountVectorizer so the two models do not share fitted state.
stop_words = stopwords.words('english')

# Step 4: Build the models using pipelines
nb_pipeline = Pipeline([
    ('vect', CountVectorizer(stop_words=stop_words)),
    ('clf', MultinomialNB())
])

svm_pipeline = Pipeline([
    ('vect', CountVectorizer(stop_words=stop_words)),
    ('clf', SVC(kernel='linear'))
])

# Step 5: Cross-validation
nb_cv_scores = cross_val_score(nb_pipeline, X_train, y_train, cv=5, scoring='accuracy')
svm_cv_scores = cross_val_score(svm_pipeline, X_train, y_train, cv=5, scoring='accuracy')

# Step 6: Training the models
nb_pipeline.fit(X_train, y_train)
svm_pipeline.fit(X_train, y_train)

# Step 7: Evaluate the models on the test set
def evaluate_model(model, X_test, y_test):
    y_pred = model.predict(X_test)
    return {
        'accuracy': accuracy_score(y_test, y_pred),
        'precision': precision_score(y_test, y_pred, average='weighted'),
        'recall': recall_score(y_test, y_pred, average='weighted'),
        'f1_score': f1_score(y_test, y_pred, average='weighted')
    }

nb_performance = evaluate_model(nb_pipeline, X_test, y_test)
svm_performance = evaluate_model(svm_pipeline, X_test, y_test)

# Step 8: Print performance metrics
print("Naive Bayes Cross-Validation Scores:", nb_cv_scores)
print("Naive Bayes Test Performance:", nb_performance)
print("SVM Cross-Validation Scores:", svm_cv_scores)
print("SVM Test Performance:", svm_performance)


Naive Bayes Cross-Validation Scores: [0.8        0.9        1.         0.88888889 0.88888889]
Naive Bayes Test Performance: {'accuracy': 1.0, 'precision': 1.0, 'recall': 1.0, 'f1_score': 1.0}
SVM Cross-Validation Scores: [0.9        0.8        0.7        0.88888889 0.88888889]
SVM Test Performance: {'accuracy': 1.0, 'precision': 1.0, 'recall': 1.0, 'f1_score': 1.0}




### Feature Selection for Sentiment Classification:
For sentiment classification, the only input was the text in the `Review` column, since the sentiment of a review is expressed in its wording. We used `CountVectorizer` to convert each review into a bag-of-words vector of word counts, with NLTK's English stop words removed so that the model concentrates on words more likely to signal sentiment. Word counts are simple to compute and interpret, and they match the input that Multinomial Naive Bayes expects.

### Implementation of Sentiment Classifiers:
We selected two classifiers for this task from the scikit-learn library:

1. **Multinomial Naive Bayes**: Chosen for its prevalent use in text classification tasks, especially suitable for scenarios where the feature set is based on word counts. The independence assumption between predictors in Naive Bayes, although a strong assumption, typically yields surprisingly effective results for document classification.

2. **Support Vector Machine (SVM)**: We used an SVM because it handles the high-dimensional, sparse feature spaces typical of text data well. We chose a linear kernel, which looks for separating hyperplanes between classes. Because we have three classes (positive, negative, and neutral), scikit-learn's `SVC` combines several binary classifiers in a one-vs-one scheme rather than fitting a single hyperplane.

### Cross-Validation and Model Evaluation:
**Cross-Validation Results**:
- **Naive Bayes** scored [0.8, 0.9, 1.0, 0.88888889, 0.88888889] across the five folds (mean about 0.90), ranging from 0.8 to 1.0.
- **SVM** scored [0.9, 0.8, 0.7, 0.88888889, 0.88888889] (mean about 0.84), ranging from 0.7 to 0.9, so it was somewhat less accurate and less consistent than Naive Bayes. The fold accuracies are multiples of 0.1 or 1/9, which suggests each fold holds only 9 or 10 reviews; a single misclassification then moves a fold's score by about 0.1, so these differences should be read cautiously.
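
The fold scores above can be summarized by their mean and sample standard deviation. This is a small sketch using only the scores printed by the notebook; the variable names are ours.

```python
from statistics import mean, stdev

# Fold accuracies copied from the cross-validation output above.
nb_scores = [0.8, 0.9, 1.0, 0.88888889, 0.88888889]
svm_scores = [0.9, 0.8, 0.7, 0.88888889, 0.88888889]

nb_mean, nb_std = mean(nb_scores), stdev(nb_scores)
svm_mean, svm_std = mean(svm_scores), stdev(svm_scores)

print(f"Naive Bayes: mean={nb_mean:.3f}, std={nb_std:.3f}")
print(f"SVM:         mean={svm_mean:.3f}, std={svm_std:.3f}")
```

Reporting the mean together with the spread makes it easier to judge whether the gap between the two models is larger than the fold-to-fold noise.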

**Test Set Evaluation**:
Both classifiers achieved perfect metrics on the test set, each with an accuracy, precision, recall, and F1 score of 1.0 (precision, recall, and F1 are weighted averages over the three classes). Both models therefore classified every test review correctly. Since the cross-validation scores on the training data ranged from 0.7 to 1.0, the perfect test scores more likely reflect a small or easy test set than flawless models.

### Comparison of Performance:
Despite both models demonstrating perfect scores on the test dataset, Naive Bayes showed a slight advantage in terms of consistency during cross-validation. This might suggest a better generalization capability under varied data conditions. The perfect test scores for both models could also imply a lack of challenging or diverse examples in our test set, which could differentiate their performance in more complex scenarios.
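
One way to check whether a test set covers every class is to inspect its label counts, and to request a stratified split so that each class keeps its share. The sketch below uses hypothetical balanced labels; it is not the split used in our run, which did not pass `stratify`.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Hypothetical balanced labels standing in for the Sentiment column.
labels = ["positive"] * 20 + ["negative"] * 20 + ["neutral"] * 20
texts = [f"review {i}" for i in range(len(labels))]

# stratify=labels keeps the class proportions the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

test_counts = Counter(y_test)
print(test_counts)
```

If one class is missing or nearly missing from the test set, the weighted metrics can look perfect without saying anything about that class.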

In conclusion, both Naive Bayes and SVM classified the sentiment of our reviews accurately. Naive Bayes was slightly more accurate and consistent in cross-validation, which makes it the marginally safer choice for similar tasks, although a larger and more varied dataset would be needed to confirm the difference.

# **Question 3: House price prediction**

(20 points). You are required to build a **regression** model to predict house prices from 79 explanatory variables describing (almost) every aspect of residential homes. The purpose of this question is to practice regression analysis, a supervised learning technique. The training data, testing data, and data description files can be downloaded from Canvas. Here is an example implementation: https://towardsdatascience.com/linear-regression-in-python-predict-the-bay-areas-home-price-5c91c8378878.

1. Conduct the necessary Exploratory Data Analysis (EDA) and data cleaning steps on the given dataset. Split the data for training and testing.
2. Based on the EDA results, select a number of features for the regression model. Briefly explain why you selected those features.
3. Develop a regression model. The train set should be used.
4. Evaluate performance of the regression model you developed using appropriate evaluation metrics. The test set should be used.

In [5]:
import pandas as pd
import zipfile

with zipfile.ZipFile('assignment4-question3-data.zip', 'r') as zip_ref:
    zip_ref.extractall('assignment4_data')

train_df = pd.read_csv('assignment4_data/train.csv')
test_df = pd.read_csv('assignment4_data/test.csv')

# Print the first few rows of the training data
print("First few rows of the training data:")
print(train_df.head())

# Print the column names
print("\nColumn Names:")
print(train_df.columns)
print("\nBasic Statistics:")
print(train_df.describe())

# Check for missing values in each column
print("\nMissing Values in Each Column:")
print(train_df.isnull().sum())

# Print the data types of each column
print("\nData Types:")
print(train_df.dtypes)




First few rows of the training data:
   Id  MSSubClass MSZoning  LotFrontage  LotArea Street Alley LotShape  \
0   1          60       RL         65.0     8450   Pave   NaN      Reg   
1   2          20       RL         80.0     9600   Pave   NaN      Reg   
2   3          60       RL         68.0    11250   Pave   NaN      IR1   
3   4          70       RL         60.0     9550   Pave   NaN      IR1   
4   5          60       RL         84.0    14260   Pave   NaN      IR1   

  LandContour Utilities  ... PoolArea PoolQC Fence MiscFeature MiscVal MoSold  \
0         Lvl    AllPub  ...        0    NaN   NaN         NaN       0      2   
1         Lvl    AllPub  ...        0    NaN   NaN         NaN       0      5   
2         Lvl    AllPub  ...        0    NaN   NaN         NaN       0      9   
3         Lvl    AllPub  ...        0    NaN   NaN         NaN       0      2   
4         Lvl    AllPub  ...        0    NaN   NaN         NaN       0     12   

  YrSold  SaleType  SaleConditi

In [6]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline

# Load the data
train_df = pd.read_csv('assignment4_data/train.csv')
test_df = pd.read_csv('assignment4_data/test.csv')

# Features selected based on common sense and domain knowledge
features = ['OverallQual', 'GrLivArea', '1stFlrSF', 'TotalBsmtSF', 'YearBuilt', 'FullBath', 'TotRmsAbvGrd']

# Check for SalePrice in train and test data
print("SalePrice in train_df:", "SalePrice" in train_df.columns)
print("SalePrice in test_df:", "SalePrice" in test_df.columns)

# Handling missing data
# Missing values are imputed inside the pipeline below (median strategy),
# so no separate imputation step is needed here.

# Training the model only if SalePrice is available in train_df
if "SalePrice" in train_df.columns:
    X_train = train_df[features]
    y_train = train_df['SalePrice']

    # Setup a pipeline for training
    model_pipeline = Pipeline([
        ('imputer', SimpleImputer(strategy='median')),
        ('regressor', LinearRegression())
    ])

    model_pipeline.fit(X_train, y_train)

    # Evaluate the model on the training data
    y_train_pred = model_pipeline.predict(X_train)
    train_rmse = np.sqrt(mean_squared_error(y_train, y_train_pred))
    train_r2 = r2_score(y_train, y_train_pred)

    print("Training RMSE: {:.2f}".format(train_rmse))
    print("Training R2 Score: {:.2f}".format(train_r2))

# Predicting on test data
if "SalePrice" not in test_df.columns:
    X_test = test_df[features]
    y_test_pred = model_pipeline.predict(X_test)
    print("Predictions on test data prepared.")
    submission_df = pd.DataFrame({'Id': test_df['Id'], 'SalePrice': y_test_pred})
    submission_df.to_csv('house_prices_submission.csv', index=False)
    print("Test predictions saved to 'house_prices_submission.csv'.")


SalePrice in train_df: True
SalePrice in test_df: False
Training RMSE: 38853.23
Training R2 Score: 0.76
Predictions on test data prepared.
Test predictions saved to 'house_prices_submission.csv'.
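
Because `test.csv` has no `SalePrice` column, the metrics above are computed on the training data only. One way to estimate out-of-sample error is to hold out part of the labelled data for validation. The sketch below illustrates the idea on synthetic data; the feature matrix, coefficients, and noise level are invented for illustration and do not come from the housing dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for train_df[features] and train_df['SalePrice'].
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.1, size=200)

# Hold out 20% of the labelled rows; the model never sees them during fitting.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_tr, y_tr)
val_pred = model.predict(X_val)

val_rmse = np.sqrt(mean_squared_error(y_val, val_pred))
val_r2 = r2_score(y_val, val_pred)
print(f"Validation RMSE: {val_rmse:.3f}")
print(f"Validation R2 Score: {val_r2:.3f}")
```

Applied to the housing data, the same split would be made on `train_df` before fitting `model_pipeline`, and the validation RMSE would be reported next to the training RMSE.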


# **Question 4: Using Pre-trained LLMs**

(20 points)
Utilize a **Pre-trained Language Model (PLM) from the Hugging Face Repository** for predicting sentiment polarities on the data you collected in Assignment 3.

Choose a relevant model from the repository, such as GPT-3, BERT, RoBERTa, or another related model.
1. (5 points) Provide a brief description of the PLM you selected, including its original pretraining data sources, number of parameters, and any task-specific fine-tuning if applied.
2. (10 points) Use the selected PLM to perform sentiment analysis on the data collected in Assignment 3. Only use the model in the **zero-shot** setting; NO fine-tuning is required. Evaluate the model's performance by comparing its predictions with the ground truth (the labels you annotated) on accuracy, precision, recall, and F1.
3. (5 points) Discuss the advantages and disadvantages of the selected PLM, and any challenges encountered during the implementation. This will enable a comprehensive understanding of the chosen LLM's applicability and effectiveness for the given task.


In [7]:
!pip install transformers torch



In [8]:
import pandas as pd
from transformers import pipeline
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Load the data
df = pd.read_csv('reviews_with_neutral.csv')
reviews = df['Review'].tolist()
labels = df['Sentiment'].apply(lambda x: 'positive' if x == 1 else ('negative' if x == -1 else 'neutral')).tolist()

# Initialize zero-shot classification pipeline with RoBERTa
classifier = pipeline("zero-shot-classification", model="roberta-large-mnli")

# Perform zero-shot sentiment analysis
results = classifier(reviews, candidate_labels=["positive", "negative", "neutral"], hypothesis_template="This text is {}.")

# Extract predicted labels
predicted_labels = [result['labels'][0] for result in results]

# Calculate performance metrics
accuracy = accuracy_score(labels, predicted_labels)
precision, recall, f1, _ = precision_recall_fscore_support(labels, predicted_labels, average='weighted')

# Output performance metrics
print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")



Accuracy: 1.00
Precision: 1.00
Recall: 1.00
F1 Score: 1.00


### Description of the Pre-trained Language Model (PLM) Selected

For our project, we selected the `roberta-large-mnli` model from the Hugging Face repository. It is a variant of RoBERTa (Robustly Optimized BERT Pretraining Approach), which was pretrained on a much larger corpus than BERT, drawing on BookCorpus, English Wikipedia, CC-News, OpenWebText, and Stories. The large variant has about 355 million parameters. `roberta-large-mnli` was then fine-tuned on the Multi-Genre Natural Language Inference (MNLI) dataset; it received no sentiment-specific fine-tuning. The NLI training is what makes zero-shot classification possible: each candidate label is turned into a hypothesis, and the model judges whether the review entails it.
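
The following sketch illustrates, in plain Python, how we understand NLI-based zero-shot classification to work in the single-label case: the template is filled in once per candidate label, the model produces an entailment score for each hypothesis, and a softmax over those scores picks the label. The logits here are made-up numbers, and `build_hypotheses` and `softmax` are hypothetical helpers written for this illustration, not functions from the `transformers` library.

```python
import math

def build_hypotheses(labels, template="This text is {}."):
    """Fill the hypothesis template once per candidate label."""
    return [template.format(label) for label in labels]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["positive", "negative", "neutral"]
hypotheses = build_hypotheses(labels)

# Made-up entailment logits, one per hypothesis; a real run would obtain
# these from the NLI model for each (review, hypothesis) pair.
entailment_logits = [2.1, -1.3, 0.2]

scores = softmax(entailment_logits)
predicted = labels[scores.index(max(scores))]

for hypothesis, score in zip(hypotheses, scores):
    print(f"{hypothesis!r}: {score:.3f}")
print("Predicted label:", predicted)
```

Because the label only reaches the model through the hypothesis sentence, changing the template wording can change the predictions.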

### Performance Evaluation of the PLM

We used `roberta-large-mnli` in a zero-shot classification setting, with no task-specific fine-tuning. Compared against our annotated labels, the model scored 1.00 on accuracy, precision, recall, and F1 (precision, recall, and F1 are weighted averages over the three classes). The model therefore separated the sentiments in our reviews without any training on them. A perfect score on our relatively small, self-annotated dataset should not be read as a guarantee of similar accuracy on other review data.

### Advantages, Disadvantages, and Implementation Challenges

The main advantages of `roberta-large-mnli` were its high accuracy on our data and the flexibility of zero-shot classification, which let us apply the model directly without any labeled training data.

The main disadvantages were practical. The model is large (the checkpoint download is about 1.43 GB), so initialization and loading were slow. Its predictions are also sensitive to the wording of the hypothesis template (we used "This text is {}."), so the prompt had to be phrased carefully to obtain accurate sentiment labels.

