<a href="https://colab.research.google.com/github/Mohammed-Taha20/sentiment-analysis/blob/main/sentiment_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# settings

## Kaggle API

**Generate an API Token:**

a) Go to the Kaggle website and log in to your account.

b) Navigate to your Account settings (click on your user profile picture in the header, then select "Account").

c) Scroll down to the API section.

d) Click on the "Create New API Token" button. This will download a file named kaggle.json containing your API credentials (username and key).

e) Run the next cell to upload the kaggle.json file to Colab.


In [None]:
from google.colab import files
files.upload()

**Place the kaggle.json file:**

a) Linux/OSX/UNIX-based systems: Move the kaggle.json file to the ~/.kaggle/ directory. If the .kaggle directory does not exist, create it.

b) Windows: Move the kaggle.json file to `C:\Users\<Windows-username>\.kaggle\`.

c) Important: Set the permissions of the kaggle.json file to 600 with chmod (read/write for the owner only) for security reasons.

In [3]:
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
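
The same placement can also be done from Python, which covers the Windows path mentioned above as well. This is a minimal sketch: the helper name `install_kaggle_token` and its `home` parameter are hypothetical and exist only for this illustration.

```python
import shutil
import stat
from pathlib import Path


def install_kaggle_token(src="kaggle.json", home=None):
    """Copy kaggle.json into <home>/.kaggle/ and restrict it to the owner."""
    home = Path(home) if home is not None else Path.home()
    target_dir = home / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    target = target_dir / "kaggle.json"
    shutil.copyfile(src, target)
    # Equivalent of `chmod 600`: read/write for the owner only.
    # On Windows, chmod can only toggle the read-only flag.
    target.chmod(stat.S_IRUSR | stat.S_IWUSR)
    return target
```

Calling `install_kaggle_token()` with no arguments copies `kaggle.json` from the current directory into the home directory of the current user.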

## Hugging Face

**Hugging Face API**

a) Log in to your Hugging Face account.

b) Create a Hugging Face access token at this link: https://huggingface.co/settings/tokens

In [None]:
from huggingface_hub import login
from google.colab import userdata

# Paste your token here, or load it from Colab Secrets instead,
# e.g. HF_TOKEN = userdata.get("HF_TOKEN") (the secret name is an assumption).
HF_TOKEN = "Your_huggingface_token"

if HF_TOKEN:
    login(HF_TOKEN)
    print("Successfully logged in to Hugging Face!")
else:
    print("Hugging Face token not found. Please add it.")


## Dataset API

In [4]:
!kaggle datasets download arhamrumi/amazon-product-reviews

Dataset URL: https://www.kaggle.com/datasets/arhamrumi/amazon-product-reviews
License(s): CC0-1.0
Downloading amazon-product-reviews.zip to /content
 60% 69.0M/115M [00:00<00:00, 363MB/s]
100% 115M/115M [00:00<00:00, 428MB/s] 


In [5]:
!unzip /content/amazon-product-reviews.zip

Archive:  /content/amazon-product-reviews.zip
  inflating: Reviews.csv             


In [15]:
!pip install -q transformers accelerate bitsandbytes langchain_core langchain_community langchain langchain_huggingface torch

## imports

In [1]:
import pandas as pd
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, BitsAndBytesConfig
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_huggingface import HuggingFacePipeline, ChatHuggingFace
import torch


# data

## preprocessing

In [None]:
reviews = pd.read_csv("/content/Reviews.csv", nrows=300)
reviews

In [3]:
reviews.drop(["Id", "ProductId", "UserId", "ProfileName", "HelpfulnessNumerator", "HelpfulnessDenominator", "Time","Summary" ], axis=1, inplace=True)
reviews

| | Score | Text |
|---|---|---|
| 0 | 5 | I have bought several of the Vitality canned d... |
| 1 | 1 | Product arrived labeled as Jumbo Salted Peanut... |
| 2 | 4 | This is a confection that has been around a fe... |
| 3 | 2 | If you are looking for the secret ingredient i... |
| 4 | 5 | Great taffy at a great price. There was a wid... |
| ... | ... | ... |
| 295 | 5 | I've been feeling extremely tired around the l... |
| 296 | 4 | The energy shot truly does work! It had a terr... |
| 297 | 5 | I've tried 5-hour energy, red rain, NOS, and o... |
| 298 | 5 | If you're looking for an energy boost without ... |


In [None]:
reviews.info()

In [None]:
reviews.describe()

In [None]:
reviews.isna().sum()

# LLM model

In [4]:
model_name_or_path = "meta-llama/Meta-Llama-3-8B-Instruct"
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

In [None]:

# Load the 8-bit quantized model and let accelerate place it on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    quantization_config=quantization_config,
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)

In [35]:
system_prompt = """ You will be given a customer review. \
    Return a JSON output that has: \
    Review_text: the text of the review. \
    Score: a score from 1 to 5 that indicates how satisfied the client is (focus on the words that describe the product). \
    Product: the name of the product being reviewed (if it appears in Review_text, else return "not mentioned"). \
    Category: the category of the product (if Product is "not mentioned", it should be "not mentioned"). \

    Only return the JSON output and nothing else. \
    Be concise and do not return any introduction or conclusion. \
    """



In [None]:
pipe = pipeline(
    task = "text-generation",
    model = model,
    tokenizer = tokenizer,
    max_new_tokens = 128,
    temperature = 0.1,
    do_sample = True,
    return_full_text = False
)
hf_pipeline = HuggingFacePipeline(pipeline=pipe)

template = ChatPromptTemplate(
    [
        ("system", system_prompt),
        ("human", "{user_query}"),
    ]
)

llm = ChatHuggingFace(llm=hf_pipeline)
parser = JsonOutputParser()
chain = template | llm | parser
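
The `JsonOutputParser` at the end of the chain expects the model to reply with valid JSON. If a reply arrives with extra text around the JSON, a fallback along the following lines can recover the object from the raw string. This is a sketch of the idea in plain Python, not LangChain's implementation, and `extract_json` is a hypothetical helper.

```python
import json


def extract_json(text):
    """Return the first JSON object found in `text`, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever follows it.
            obj, _ = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    return None


print(extract_json('Sure! {"Score": 4} Done.'))  # {'Score': 4}
```

Such a helper would only be needed if the chain were changed to return raw text, for example by swapping in `StrOutputParser`.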

# validation
Don't run this section; it takes a lot of time.

In [43]:
user_query = reviews.iloc[0:100,1]
reviews_list=[]
for review in user_query:
    reviews_list.append({"user_query":review})

In [None]:
result = chain.batch(reviews_list)
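
Because the full batch is slow, it can help to process the reviews in smaller chunks so that partial results survive an interruption. The sketch below shows only the chunking logic; `chunked` and `run_in_chunks` are hypothetical helpers, and `run_batch` stands in for a callable such as `chain.batch`.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_chunks(items, run_batch, size=10):
    """Apply `run_batch` to each chunk and collect the results in order."""
    results = []
    for chunk in chunked(items, size):
        results.extend(run_batch(chunk))
    return results
```

In this notebook it could be called as `result = run_in_chunks(reviews_list, chain.batch, size=10)`.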

In [66]:
result_scores = []
for rec in result:
    if "score" in rec.keys():
        result_scores.append(rec["score"])
    elif "Score" in rec.keys():
        result_scores.append(rec["Score"])
    else:
        result_scores.append(0)
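
The loop above assumes every parsed record is a dictionary with a numeric value under `score` or `Score`. A more defensive variant is sketched below: it matches the key case-insensitively, accepts numeric strings, and falls back to 0 like the loop above. The name `extract_score` is hypothetical.

```python
def extract_score(rec, default=0):
    """Return the score from a parsed record as an int in 1..5, else `default`."""
    if not isinstance(rec, dict):
        return default
    for key, value in rec.items():
        if str(key).strip().lower() == "score":
            try:
                score = int(float(value))
            except (TypeError, ValueError, OverflowError):
                return default
            # Reject values outside the 1-5 scale requested in the prompt.
            return score if 1 <= score <= 5 else default
    return default
```

With this helper the loop reduces to `result_scores = [extract_score(rec) for rec in result]`.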

In [67]:
scores = reviews.iloc[0:100,0]

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pearsonr, spearmanr

# Convert both sides to float arrays so the arithmetic is element-wise,
# even if the model returned a score as a numeric string.
y_pred = np.asarray(result_scores, dtype=float)
y_true = np.asarray(scores, dtype=float)

mae = np.mean(np.abs(y_true - y_pred))
mse = np.mean((y_true - y_pred)**2)
pearson, _ = pearsonr(y_true, y_pred)
spearman_corr, _ = spearmanr(y_true, y_pred)

print("Mean Absolute Error (MAE):", mae)
print("Mean Squared Error (MSE):", mse)
print("Pearson Correlation Coefficient:", pearson)
print("Spearman's Rank Correlation Coefficient:", spearman_corr)
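
For reference, the first three metrics above reduce to a few lines of arithmetic. The sketch below recomputes MAE, MSE and the Pearson coefficient in plain Python on a made-up five-review example; the numbers are illustrative only and are not results from this notebook, and the function names are hypothetical.

```python
from math import sqrt


def mean_abs_error(y_true, y_pred):
    """Average absolute difference between true and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_sq_error(y_true, y_pred):
    """Average squared difference; penalizes large misses more heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Illustrative data: three exact predictions, one off by 1, one off by 2.
demo_true = [5, 1, 4, 2, 5]
demo_pred = [5, 2, 4, 2, 3]

print(mean_abs_error(demo_true, demo_pred))  # 0.6
print(mean_sq_error(demo_true, demo_pred))   # 1.0
print(pearson_r(demo_true, demo_pred))
```

The single miss of two points contributes 4 of the 5 units of squared error, which is why MSE reacts more strongly to outliers than MAE does.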

In [None]:
plt.figure(figsize=(25, 5))

plt.subplot(1, 2, 1)
x = np.arange(len(y_true))
plt.plot(x, y_true, label='True Values', color='blue')
plt.plot(x, y_pred, label='Predicted Values', color='red', linewidth=2.5, marker="o", alpha=0.8)

# Shade the gap between the true and predicted score for each review.
plt.fill_between(x, y_true, y_pred, color='gray', alpha=0.3, label='Prediction Error')
plt.xlabel('Review Index', fontsize=15)
plt.ylabel('Score', fontsize=15)
plt.title('True Values vs. Predictions', fontsize=15)
plt.legend()

plt.show()

# try here

In [None]:
user_query = input("Enter a review: ")

In [None]:
chain.invoke({"user_query":user_query})