<a href="https://colab.research.google.com/github/Yuvika-14/Jwoc-Brics-Sentiment-Analysis/blob/main/Gradient_Boosting.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Importing the libraries**

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import GradientBoostingClassifier  # Import Gradient Boosting classifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

**Importing the dataset**

In [2]:
df = pd.read_csv('dataset.csv')
df.columns

Index(['Unnamed: 0', 'textDisplay', 'likeCount', 'label'], dtype='object')

In [3]:
# Only the last expression in a cell is displayed, so df.shape is not shown below
df.shape
df.head(4)

|   | Unnamed: 0 | textDisplay | likeCount | label |
|---|---|---|---|---|
| 0 | 0 | hurray for Brics nation, I hope this changes t... | 0 | 0 |
| 1 | 1 | BRICS are modern Axis Powers😂 | 0 | 0 |
| 2 | 2 | As France is asked to remove Troops from Niger... | 0 | 0 |
| 3 | 3 | BE WARNED | 0 | 0 |


**Handling missing data, if any**

In [4]:
df.isnull().sum()

Unnamed: 0     0
textDisplay    0
likeCount      0
label          0
dtype: int64

In [5]:
# Keep only the text and label columns, then drop rows with missing values
df = df[['textDisplay', 'label']]
df = df.dropna()


**Splitting the dataset into training and test sets**

In [6]:
X_train, X_test, y_train, y_test = train_test_split(
    df['textDisplay'], df['label'], test_size=0.2, random_state=42
)
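
The split above is purely random, so the class ratio in the test set can drift from the ratio in the full data. A minimal sketch of a stratified alternative, assuming that keeping the label ratio equal in both parts is wanted. `toy_df` is hypothetical stand-in data (not `dataset.csv`), and `stratify=` is not used in the cell above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame with the same column names as the notebook's df
toy_df = pd.DataFrame({
    "textDisplay": [f"comment {i}" for i in range(100)],
    "label": [0] * 40 + [1] * 60,
})

# stratify= keeps the share of each label (about) equal in train and test
X_tr, X_te, y_tr, y_te = train_test_split(
    toy_df["textDisplay"], toy_df["label"],
    test_size=0.2, random_state=42, stratify=toy_df["label"],
)

train_share = y_tr.mean()  # fraction of label 1 in the training part
test_share = y_te.mean()   # fraction of label 1 in the test part
print(len(X_tr), len(X_te), train_share, test_share)
```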

In [7]:
# Convert the text data into numerical features using TF-IDF
tfidf_vectorizer = TfidfVectorizer(max_features=5000)
X_train_tfidf = tfidf_vectorizer.fit_transform(X_train)
X_test_tfidf = tfidf_vectorizer.transform(X_test)
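
To see why the vectorizer is fitted on the training text only and then merely applied to the test text, here is a small sketch on made-up sentences. `train_texts`, `test_texts` and `vec` are illustrative names, not part of the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up sentences, for illustration only
train_texts = ["brics summit was great", "brics summit was bad", "great news"]
test_texts = ["great summit", "unseen words only"]

vec = TfidfVectorizer(max_features=5000)
train_matrix = vec.fit_transform(train_texts)  # learns vocabulary and IDF weights
test_matrix = vec.transform(test_texts)        # reuses them, nothing is refitted

# Test documents are mapped onto the training vocabulary, so both matrices
# have the same number of columns; words never seen in training are ignored.
print(sorted(vec.vocabulary_))
print(train_matrix.shape, test_matrix.shape)
print(test_matrix[1].nnz)  # number of non-zero features in "unseen words only"
```

Fitting on the test text as well would leak information about it into the features.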

**Training the Gradient Boosting model**

In [8]:
gb_model = GradientBoostingClassifier()

# Train the Gradient Boosting model
gb_model.fit(X_train_tfidf, y_train)
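
`GradientBoostingClassifier()` is created above with scikit-learn's default settings. A minimal sketch that spells out the main ones (100 trees, learning rate 0.1, tree depth 3). `random_state=42` is an assumption added here so repeated runs give the same model; it is not set in the cell above, and `gb_explicit` is an illustrative name.

```python
from sklearn.ensemble import GradientBoostingClassifier

# Same main settings as the defaults, written out explicitly
gb_explicit = GradientBoostingClassifier(
    n_estimators=100,   # number of boosting stages (trees)
    learning_rate=0.1,  # shrinkage applied to each tree's contribution
    max_depth=3,        # depth of each individual tree
    random_state=42,    # assumption: fixed seed for reproducibility
)

default_params = GradientBoostingClassifier().get_params()
explicit_params = gb_explicit.get_params()
print({k: explicit_params[k] for k in ("n_estimators", "learning_rate", "max_depth")})
```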


**Predicting the test results**

In [9]:
y_pred_gb = gb_model.predict(X_test_tfidf)

**Evaluating the model**

In [10]:
accuracy_gb = accuracy_score(y_test, y_pred_gb)
conf_matrix_gb = confusion_matrix(y_test, y_pred_gb)
classification_rep_gb = classification_report(y_test, y_pred_gb)


**Printing the evaluation metrics**

In [11]:
# Print the evaluation metrics
print("Gradient Boosting Model Metrics:")
print("Accuracy:", accuracy_gb)
print("Confusion Matrix:\n", conf_matrix_gb)
print("Classification Report:\n", classification_rep_gb)

Gradient Boosting Model Metrics:
Accuracy: 0.5304347826086957
Confusion Matrix:
 [[29 21]
 [33 32]]
Classification Report:
               precision    recall  f1-score   support

           0       0.47      0.58      0.52        50
           1       0.60      0.49      0.54        65

    accuracy                           0.53       115
   macro avg       0.54      0.54      0.53       115
weighted avg       0.54      0.53      0.53       115
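
The numbers in the classification report can be recomputed by hand from the confusion matrix printed above. In scikit-learn's `confusion_matrix`, rows are true labels and columns are predicted labels. The sketch below uses plain Python; `majority_baseline` (always predicting the more frequent test label) is added here for comparison and is not computed in the notebook.

```python
# Confusion matrix copied from the output above: rows = true, columns = predicted
conf = [[29, 21],
        [33, 32]]

tn, fp = conf[0]  # true label 0: predicted 0, predicted 1
fn, tp = conf[1]  # true label 1: predicted 0, predicted 1
total = tn + fp + fn + tp

accuracy = (tn + tp) / total
precision_0 = tn / (tn + fn)  # of all predicted 0, how many are truly 0
recall_0 = tn / (tn + fp)     # of all true 0, how many were found
precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)

# Accuracy of always predicting the more frequent label in the test set
majority_baseline = max(tn + fp, fn + tp) / total

print(round(accuracy, 4), round(majority_baseline, 4))
print(round(precision_0, 2), round(recall_0, 2))
print(round(precision_1, 2), round(recall_1, 2))
```

On this test set the model's accuracy (61 of 115 correct) is below the majority-label baseline (65 of 115), so the default configuration is not yet better than always predicting label 1.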

