<a href="https://colab.research.google.com/github/OziomaEunice/Sentiment_GPT/blob/develop/Python_LLaMA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **LLaMA For Sentiment Analysis**

In [1]:
# Install bitsandbytes, which loads the model in a low-memory (quantized) form
# and so helps prevent the Colab session from crashing.
# Also install or upgrade the other required libraries.

#! pip install -q -U bitsandbytes
#! pip install -q -U git+https://github.com/huggingface/transformers.git
#! pip install -q -U git+https://github.com/huggingface/peft.git
#! pip install -q -U git+https://github.com/huggingface/accelerate.git
#! pip install -q sentencepiece
#! pip install -q langchain
#! pip install accelerate
#! pip install --upgrade bitsandbytes
! pip install -i https://test.pypi.org/simple/ bitsandbytes --upgrade
! pip install --upgrade transformers accelerate

Looking in indexes: https://test.pypi.org/simple/


In [2]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) n
Token is valid (permission: read).
Your token has been saved to /root/.ca

In [3]:
# Load the model and tokeniser with the Hugging Face transformers library

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # Llama 2 base model, 7B parameters

# load_in_4bit=True quantizes the weights to 4 bits as they are loaded
# (via bitsandbytes), so the model takes up far less GPU memory
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True, device_map="auto")
tokeniser = AutoTokenizer.from_pretrained(model_id)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.


Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/3.50G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/188 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/776 [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]
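
The deprecation warning above asks for a `BitsAndBytesConfig` passed through `quantization_config` instead of `load_in_4bit=True`. Below is a minimal sketch of the equivalent load, assuming recent `transformers` and `bitsandbytes` releases and a GPU runtime. The helper name `load_4bit_model` is hypothetical, and the imports sit inside the function only so that the cell defines cleanly before the libraries are installed.

```python
def load_4bit_model(model_id="meta-llama/Llama-2-7b-hf"):
    """Load a causal LM with 4-bit weights (hypothetical helper, not from the notebook)."""
    # Imported lazily: these names come from transformers, and bitsandbytes
    # is needed once the quantized weights are actually loaded.
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # Same setting as load_in_4bit=True, expressed through the config object
    # that the warning recommends.
    quant_config = BitsAndBytesConfig(load_in_4bit=True)

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # let accelerate place layers on the available devices
    )
    tokeniser = AutoTokenizer.from_pretrained(model_id)
    return model, tokeniser


# Usage (downloads the full checkpoint, so it is left commented out here):
# model, tokeniser = load_4bit_model()
```

Passing the config object keeps the behaviour the same while leaving room for other 4-bit options later, should they be needed.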

In [22]:
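# Build a text-generation pipeline around the quantized model
from transformers import pipeline

generate_pipeline_for_text = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokeniser,
    return_full_text=True,  # output contains the prompt as well as the continuation
    temperature=0.1,  # low temperature: sampling stays close to the most likely tokens
    max_new_tokens=512,
    repetition_penalty=1.1,  # without this penalty the output starts to repeat itself
)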
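
To make the effect of `repetition_penalty` concrete, here is a simplified pure-Python sketch of the usual rule, as I understand it: the score of every token already present in the sequence is divided by the penalty when it is positive and multiplied by it when it is negative, so repeated tokens become less likely. The function name and the toy values are illustrative assumptions, not the library's own code.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.1):
    """Return a copy of `logits` with already-generated tokens made less likely."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        # Positive scores shrink; negative scores become more negative.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


# Toy vocabulary of 4 tokens; tokens 0 and 2 were already generated.
toy_logits = [2.2, 1.0, -1.0, 0.5]
penalised = apply_repetition_penalty(toy_logits, generated_ids=[0, 2, 0])
print(penalised)
```

A penalty of 1.0 leaves the scores unchanged, and larger values discourage repeats more strongly.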

In [23]:
# import pandas and io
import pandas as pd
import io

# The dataset was uploaded to this Colab session's storage (which is temporary),
# so read the CSV file from there.
df = pd.read_csv('/Test_IMDB_Dataset.csv', encoding='utf-8')

In [24]:
df.head()

Unnamed: 0,review
0,One of the other reviewers has mentioned that ...
1,A wonderful little production. <br /><br />The...
2,I thought this was a wonderful way to spend ti...
3,Basically there's a family where a little boy ...
4,"Petter Mattei's ""Love in the Time of Money"" is..."
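
The preview shows raw `<br />` tags inside the reviews (row 1), and these would end up in any prompt built from them. Here is a small sketch of cleaning a review before use. The helper name `clean_review`, the length cap, and the sample string are my own assumptions.

```python
import html
import re


def clean_review(text, max_chars=1500):
    """Strip HTML tags, collapse whitespace, and cap the length of a review."""
    text = re.sub(r"<br\s*/?>", " ", text, flags=re.IGNORECASE)  # line-break tags
    text = re.sub(r"<[^>]+>", " ", text)  # any other leftover tags
    text = html.unescape(text)  # e.g. &amp; becomes &
    text = re.sub(r"\s+", " ", text).strip()
    return text[:max_chars]


# Made-up sample in the same shape as the dataset rows.
sample = "Great film.<br /><br />Loved   it &amp; would watch again."
print(clean_review(sample))
```

On the DataFrame itself this could be applied with `df['review'].map(clean_review)`.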


In [25]:
text_input = """
  Evaluate the text as either positive or negative. What do you think of the sentiment of the review in row 2?
"""

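One thing to note: the prompt above mentions "row 2" but never contains the review, and the model has no access to `df`, so it can only continue the text it is given. Here is a minimal sketch of embedding the review in a few-shot prompt (which tends to suit a base model such as `Llama-2-7b-hf` better than a bare question) and reading the label back out. `build_prompt`, `extract_label`, and the example reviews are hypothetical and not part of the original notebook.

```python
FEW_SHOT = (
    "Review: I loved every minute of this film.\n"
    "Sentiment: positive\n\n"
    "Review: A dull, badly acted waste of two hours.\n"
    "Sentiment: negative\n\n"
)


def build_prompt(review):
    """Put the review text itself into the prompt, ending where the label goes."""
    return f"{FEW_SHOT}Review: {review.strip()}\nSentiment:"


def extract_label(generated_text, prompt):
    """Return 'positive', 'negative', or None from the model's continuation."""
    # With return_full_text=True the pipeline echoes the prompt, so drop it first.
    if generated_text.startswith(prompt):
        continuation = generated_text[len(prompt):]
    else:
        continuation = generated_text
    words = continuation.strip().lower().split()
    first_word = words[0].strip(".,!:;") if words else ""
    return first_word if first_word in ("positive", "negative") else None


# With the notebook's objects this would be used roughly as:
# prompt = build_prompt(df.loc[2, "review"])
# result = generate_pipeline_for_text(prompt, max_new_tokens=5)
# print(extract_label(result[0]["generated_text"], prompt))

# Stand-in for a model response, so the parsing can be checked without a GPU.
demo_prompt = build_prompt("An enjoyable, light-hearted comedy.")
fake_output = demo_prompt + " positive\n\nReview: Another one"
print(extract_label(fake_output, demo_prompt))
```

Ending the prompt at `Sentiment:` and keeping `max_new_tokens` small means the model only has to produce the label, rather than 512 tokens of free text.
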
In [26]:
result = generate_pipeline_for_text(text_input)
print(result[0]['generated_text'])

 
  Evaluate the text as either positive or negative. What do you think of the sentiment of the review in row 2?

  ```python
  # Load the data
  df = pd.read_csv('../data/sentiment-analysis/sentiment140.csv')

  # Get the sentiment score for each review
  df['Sentiment Score'] = df['Review Text'].apply(lambda x: get_sentiment_score(x))

  # Print the top 5 reviews with the highest and lowest scores
  print(df[df['Sentiment Score'] == max(df['Sentiment Score']), :].head())
  print(df[df['Sentiment Score'] == min(df['Sentiment Score']), :].tail())
  ```

 ![](../images/sentiment-analysis/sentiment140.png)

### 3. Sentiment Analysis using TensorFlow

In this section, we will use a more advanced approach to perform sentiment analysis on our dataset. We will use [TensorFlow](https://www.tensorflow.org/) to train a neural network model that can classify whether a given sentence is positive or negative.

First, let's import the necessary libraries.

```python
import tensorflow as tf
from ten