## MODEL SETUP

Installing [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) using CMake.
This might take some time.

In [None]:
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python

Collecting llama-cpp-python
  Downloading llama_cpp_python-0.2.75.tar.gz (48.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48.7/48.7 MB 12.2 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting diskcache>=5.6.1 (from llama-cpp-python)
  Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.5/45.5 kB 5.9 MB/s eta 0:00:00
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... done
  Created wheel for llama-cpp-python: filename=llama_cpp_python-0.2.75-cp310-cp310-linux_x86_64.whl size=76309122 sha256=9a1b9ea2ff24cb49d36f369eac77898332c8d3d435df9a7b834bb5f4c7d6bc74
  Stored in direct

Importing what is needed to download and load the model.

In [None]:
# Lets us download model files from the Hugging Face Hub
from huggingface_hub import hf_hub_download
# The Llama class is a wrapper for llama.cpp models
from llama_cpp import Llama

This bit of code downloads the model from Hugging Face.
This might take some time.

In [None]:
model_name = "myclassunil/Emollama-chat-13b-v0.1.gguf"
model_file = "Emollama-chat-13b-v0.1.gguf"
model_path = hf_hub_download(model_name,
                             filename=model_file,
                             local_dir='/content')

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Emollama-chat-13b-v0.1.gguf:   0%|          | 0.00/13.8G [00:00<?, ?B/s]

Loading the model and offloading all the layers to the GPU.

In [None]:
llm = Llama(model_path=model_path,
            n_gpu_layers=-1)

llama_model_loader: loaded meta data with 21 key-value pairs and 363 tensors from /content/Emollama-chat-13b-v0.1.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = .
llama_model_loader: - kv   2:                           llama.vocab_size u32              = 32000
llama_model_loader: - kv   3:                       llama.context_length u32              = 2048
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 5120
llama_model_loader: - kv   5:                          llama.block_count u32              = 40
llama_model_loader: - kv   6:                  llama.feed_forward_length u32              = 13824
llama_model_loader: - kv   7:                 llama.rope.dimension_count u32   

The model is now usable!

## DATA SETUP

Loading the reviews from our JSON file.

In [1]:
import json
filename = 'json_reviews_examples.json'
path_to_data = str(filename)

with open(path_to_data) as f:
    sites = json.load(f)
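
The loops in this notebook treat `sites` as a dictionary that maps each site name to a list of review strings. The sketch below writes and reloads a tiny file with that shape. The file name `example_reviews.json` and the review texts are made up for illustration and are not taken from the project data.

```python
import json
import os
import tempfile

# Hypothetical data with the shape the notebook expects:
# {site name: [review text, review text, ...]}
example_sites = {
    "Tour Eiffel": ["Great view from the top.", "Long queue, but worth it."],
    "Colisee": ["An incredible piece of history."],
}

# Write the example to a temporary directory, then load it back.
example_path = os.path.join(tempfile.mkdtemp(), "example_reviews.json")
with open(example_path, "w", encoding="utf-8") as f:
    json.dump(example_sites, f, ensure_ascii=False)

with open(example_path, encoding="utf-8") as f:
    loaded_sites = json.load(f)

# Each key is a site name and each value is a list of strings.
for site_name, reviews in loaded_sites.items():
    print(site_name, len(reviews))
```

If the real file follows a different layout, the nested loops over `sites` and `sites[site]` would need to be adapted accordingly.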

Making sure that everything works properly by printing the reviews.

In [2]:
for site in sites:
  for review in sites[site]:
    print(review + "\n")

This place is well worth the visit and make sure to book your all inclusive ticket well in advance, which will let you skip with waiting lines, and will also give you access to the Roman Forum.
I would definitely allow yourself at least a couple of hours to walk around the Forum as it is such a vast space with lots to see and do. Your all access tickets will allow you to see the exhibitions inside too.
As for the Colosseum, what an incredible piece of history. Words cannot describe how stunning this structure is. Definitely a highlight of our trip in Rome.

As a lover of history, I can't recommend visiting enough. It's absolutely breathtaking. Pictures can't do justice to the scale and beauty of this monument. Highly recommend booking a tour in advance to skip the queues and to learn so much more about the place. It is completely worth doing the underground also.

The restoration and landscaping of the Colosseum have been done very well. It manages to shine from the moment you see it u

## USING EMOLLM

Importing regular expressions for later use.

In [3]:
import re

We'll start by defining a function for each one of the five tasks accessible through EmoLLM.
For transparency reasons, the names of the variables are taken from ["EmoLLMs: A Series of Emotional Large Language Models and Annotation Tools for Comprehensive Affective Analysis"](https://doi.org/10.48550/arXiv.2401.08508) (Zhiwei Liu and others, 2024).

In [4]:
# We need to define the values we'll deem acceptable from the LLM.

# Possible answers for the v_oc task.
acceptable_v_oc = ['3: very positive mental state can be inferred',
                   '2: moderately positive mental state can be inferred',
                   '1: slightly positive mental state can be inferred',
                   '0: neutral or mixed mental state can be inferred',
                   '-1: slightly negative mental state can be inferred',
                   '-2: moderately negative mental state can be inferred',
                   '-3: very negative mental state can be inferred']

# Possible answers for the e_c task.
acceptable_e_c = ['anger',
                  'anticipation',
                  'disgust',
                  'fear',
                  'joy',
                  'love',
                  'optimism',
                  'pessimism',
                  'sadness',
                  'surprise',
                  'trust']

# This is the list of sentiments that are supported for ei_reg and ei_oc.
main_sentiments = ['joy', 'anger', 'fear', 'sadness']

# Possible answers for the ei_oc task.
base_ei_oc = ['0: no E can be inferred',
              '1: low amount of E can be inferred',
              '2: moderate amount of E can be inferred',
              '3: high amount of E can be inferred']

# Adapting the possible answers to include particular emotions.
acceptable_ei_oc = []
for sentiment in main_sentiments:
  for message in base_ei_oc:
    acceptable_ei_oc.append(message.replace("E", sentiment))

# Estimating the valence of the review and representing it as a float.
def get_v_reg(text, max_tokens):
  prompt= f'''
  Human:
  Task: Evaluate the valence intensity of the writer's mental state based on the text, assigning it a real-valued score from 0 (most negative) to 1 (most positive).
  Text: {text}
  Intensity Score:
  '''
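  v_reg = "Aberrant answer from EmoLLM : "
  # Strip surrounding whitespace so we store a clean string.
  raw_v_reg = llm(prompt, max_tokens=max_tokens)['choices'][0]['text'].strip()
  try:
    if 0 <= float(raw_v_reg) <= 1:
      v_reg = raw_v_reg
    else:
      # A number outside [0, 1] is also aberrant; keep it for inspection.
      v_reg += raw_v_reg
  except ValueError:
    # The answer is not a number at all.
    print("Something went wrong with get_v_reg")
    v_reg += raw_v_reg
  return v_reg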

# For each task, the default value is "Aberrant answer from EmoLLM"
# if the model does well, we replace this message with its answer
# otherwise, we add the aberrant answer in case it holds useful data anyway.

# Estimates the valence of the review and represents it as an ordinal class.
def get_v_oc(text, max_tokens):
  prompt = f'''
  Human:
  Task: Categorize the text into an ordinal class that best characterizes the writer's mental state, considering various degrees of positive and negative sentiment intensity. 3: very positive mental state can be inferred. 2: moderately positive mental state can be inferred. 1: slightly positive mental state can be inferred. 0: neutral or mixed mental state can be inferred. -1: slightly negative mental state can be inferred. -2: moderately negative mental state can be inferred. -3: very negative mental state can be inferred
  Text: {text}
  Intensity Class:
  '''
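  v_oc = "Aberrant answer from EmoLLM : "
  # Strip surrounding whitespace so the membership test can match.
  raw_v_oc = llm(prompt, max_tokens=max_tokens)['choices'][0]['text'].strip()
  if raw_v_oc in acceptable_v_oc:
    v_oc = raw_v_oc
  else:
    v_oc += raw_v_oc
  # Return the validated answer, not the raw one.
  return v_oc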

# Identifies which sentiments are present in the review.
def get_e_c(text, max_tokens):
  prompt = f'''
  Task: Categorize the text's emotional tone as either 'neutral or no emotion' or identify the presence of one or more of the given emotions (anger, anticipation, disgust, fear, joy, love, optimism, pessimism, sadness, surprise, trust).
  Text: {text}
  This tweet contains emotions:
  '''
  e_c = "Aberrant answer from EmoLLM : "
  raw_e_c = llm(prompt, max_tokens=max_tokens)['choices'][0]['text']
  # Split the answer on non-letters and drop the empty strings.
  temp_e_c = [s for s in re.split(r'[^a-zA-Z]', raw_e_c) if s]
  # The answer is valid only if every word is a known emotion.
  if all(s in acceptable_e_c for s in temp_e_c):
    e_c = temp_e_c
  else:
    e_c += str(raw_e_c)
  return e_c

# Estimates the intensity of a sentiment and represents it as a float.
def get_ei_reg(text, max_tokens, sentiment):
  prompt = f'''
    Human:
    Task: Assign a numerical value between 0 (least E) and 1 (most E) to represent the intensity of emotion E expressed in the text.
    Text: {text}
    Emotion: {sentiment}
    Intensity Score:
  '''
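  ei_reg = "Aberrant answer from EmoLLM : "
  # Strip surrounding whitespace so we store a clean string.
  raw_ei_reg = llm(prompt, max_tokens=max_tokens)['choices'][0]['text'].strip()
  try:
    if 0 <= float(raw_ei_reg) <= 1:
      ei_reg = raw_ei_reg
    else:
      # A number outside [0, 1] is also aberrant; keep it for inspection.
      ei_reg += raw_ei_reg
  except ValueError:
    # The answer is not a number at all.
    print("Something went wrong with get_ei_reg")
    ei_reg += raw_ei_reg
  return ei_reg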

# Estimates the intensity of a sentiment and represents it as an ordinal class.
def get_ei_oc(text, max_tokens, sentiment):
  prompt = f'''
  Task: Categorize the tweet into an intensity level of the specified emotion E, representing the mental state of the tweeter. 0: no E can be inferred. 1: low amount of E can be inferred. 2: moderate amount of E can be inferred. 3: high amount of E can be inferred.
  Tweet: {text}
  Emotion: {sentiment}
  Intensity Score:
  '''
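  ei_oc = "Aberrant answer from EmoLLM : "
  # Strip surrounding whitespace so the membership test can match.
  raw_ei_oc = llm(prompt, max_tokens=max_tokens)['choices'][0]['text'].strip()
  if raw_ei_oc in acceptable_ei_oc:
    ei_oc = raw_ei_oc
  else:
    ei_oc += raw_ei_oc
  # Return a single string, like the other task functions.
  return ei_oc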
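
The regression helpers above pass the model's text to `float()`, so an answer such as `Score: 0.75` is flagged as aberrant even though it contains a usable number. One possible way to recover such answers, sketched here under the assumption that the first number in the text is the intended score, is to extract it with a regular expression. The helper name `parse_score` is hypothetical and is not part of EmoLLM or llama-cpp-python.

```python
import re

# Matches the first (optionally negative) number in a string,
# e.g. "0.75", "1", ".5" or "-0.5".
_NUMBER = re.compile(r'-?\d*\.\d+|-?\d+')

def parse_score(raw_text):
    """Return the first number in raw_text as a float in [0, 1], else None."""
    match = _NUMBER.search(raw_text)
    if match is None:
        return None
    value = float(match.group())
    # Values outside the expected range are treated as aberrant.
    return value if 0 <= value <= 1 else None

print(parse_score(" 0.75\n"))
print(parse_score("Intensity Score: 0.2"))
print(parse_score("no idea"))
```

Whether this leniency is desirable depends on how strictly aberrant answers should be tracked; keeping the raw text alongside the parsed value preserves both.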

We can now create a function that applies the five tools given by EmoLLM to our reviews.

In [5]:
# Processing the reviews using EmoLLM
def sentiment_analysis(sites,max_tokens):
  # We create a dict to hold the sentiment analysis results.
  sa_results = {}
  # We create a dict for each site results.
  for i, site in enumerate(sites):
    sa_site = {}
    # Processing the different reviews with the five EmoLLM tasks.
    for j, review in enumerate(sites[site]):
      v_reg = get_v_reg(review, max_tokens)
      v_oc = get_v_oc(review, max_tokens)
      e_c = get_e_c(review, max_tokens)
      ei_reg = []
      ei_oc = []
      for sentiment in e_c:
        if(sentiment in main_sentiments):
          current_reg = {sentiment : get_ei_reg(review, max_tokens, sentiment)}
          current_oc = {sentiment : get_ei_oc(review, max_tokens, sentiment)}
          ei_reg.append(current_reg)
          ei_oc.append(current_oc)
      sa_review = {
        'v_reg' : v_reg,
        'v_oc' : v_oc,
        'e_c' : e_c,
        'ei_reg' : ei_reg,
        'ei_oc' : ei_oc
      }
      # Key by review index and site name, matching the mockup output format.
      sa_site[f'review_{j}'] = sa_review
    sa_results[site] = sa_site
  return sa_results
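
Every task function only reads `llm(prompt, max_tokens=...)['choices'][0]['text']`, so the validation logic can be exercised without a GPU by substituting a stand-in callable that returns a dictionary of the same shape. The sketch below is only an illustration: `FakeLlama` and its canned answers are hypothetical, and `classify` merely demonstrates the validation pattern used above. It is not one of the notebook's functions.

```python
class FakeLlama:
    """Hypothetical stand-in that mimics the response shape used above."""

    def __init__(self, canned_text):
        self.canned_text = canned_text

    def __call__(self, prompt, max_tokens=16):
        # Only the fields the notebook reads are reproduced here.
        return {'choices': [{'text': self.canned_text}]}


def classify(llm, prompt, acceptable, max_tokens=16):
    """Validate an answer against a list of acceptable strings."""
    raw = llm(prompt, max_tokens=max_tokens)['choices'][0]['text'].strip()
    if raw in acceptable:
        return raw
    return "Aberrant answer from EmoLLM : " + raw


labels = ['0: no joy can be inferred', '1: low amount of joy can be inferred']
good = classify(FakeLlama(' 1: low amount of joy can be inferred\n'), 'prompt', labels)
bad = classify(FakeLlama('lots of joy'), 'prompt', labels)
print(good)
print(bad)
```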

Because GPU access was limited for this project, we needed a way to generate placeholder results. The function below produces a mockup of the output file; it is not needed in the final version of the project.
The mockup is imperfect. For example, the regression scores and the ordinal classes are drawn independently, so they are not correlated. Furthermore, it never generates aberrant answers.

In [8]:
# The random library is needed to generate random data
import random
# As GPU resources are scarce, this function generates placeholder data.
def mockup_sentiment_analysis(sites):
  # We create a dict to hold the sentiment analysis results.
  sa_results = {}
  # We create a dict for each site results.
  for i, site in enumerate(sites):
    sa_site = {}
    # Processing the different reviews with the five EmoLLM tasks.
    for j, review in enumerate(sites[site]):
      v_reg = str(round(random.random(), 3))
      v_oc = random.choice(acceptable_v_oc)
      e_c = []
      for r in range(random.randrange(1, len(acceptable_e_c))):
        random_e_c = random.choice(acceptable_e_c)
        if(random_e_c not in e_c):
          e_c.append(random_e_c)
      ei_reg = []
      ei_oc = []
      for sentiment in e_c:
        if(sentiment in main_sentiments):
          current_reg = {sentiment : str(round(random.random(), 3))}
          random_oc = random.choice(base_ei_oc).replace("E", sentiment)
          current_oc = {sentiment : random_oc}
          ei_reg.append(current_reg)
          ei_oc.append(current_oc)
      sa_review = {
        'v_reg' : v_reg,
        'v_oc' : v_oc,
        'e_c' : e_c,
        'ei_reg' : ei_reg,
        'ei_oc' : ei_oc
      }
      sa_site[f'review_{j}'] = sa_review
    sa_results[site] = sa_site
  return sa_results
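
One way to address the missing correlation noted above would be to derive the ordinal class from the regression score instead of drawing both at random. The sketch below splits the interval [0, 1] into seven equal-width bins, one per valence class. This binning is an assumption made for illustration and is not how EmoLLM relates the two tasks. The names `V_OC_LABELS` and `v_oc_from_v_reg` are hypothetical.

```python
import random

# Valence classes ordered from most negative to most positive,
# using the same labels as acceptable_v_oc in the notebook.
V_OC_LABELS = [
    '-3: very negative mental state can be inferred',
    '-2: moderately negative mental state can be inferred',
    '-1: slightly negative mental state can be inferred',
    '0: neutral or mixed mental state can be inferred',
    '1: slightly positive mental state can be inferred',
    '2: moderately positive mental state can be inferred',
    '3: very positive mental state can be inferred',
]

def v_oc_from_v_reg(v_reg):
    """Map a score in [0, 1] to one of seven equal-width bins."""
    # min() keeps v_reg == 1.0 inside the last bin.
    index = min(int(v_reg * len(V_OC_LABELS)), len(V_OC_LABELS) - 1)
    return V_OC_LABELS[index]

# Draw a mock score, then derive a class that is consistent with it.
score = round(random.random(), 3)
print(score, v_oc_from_v_reg(score))
```

The same idea could be applied to the `ei_reg` and `ei_oc` pairs with four bins instead of seven.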

Finally, we can dump the output of this process into a JSON file.

In [9]:
placeholder = mockup_sentiment_analysis(sites)
print(placeholder)
with open('placeholder_sentiment_analysis_data.json', 'w') as fp:
    json.dump(placeholder, fp)

{'Tour Eiffel': {'review_0': {'v_reg': '0.229', 'v_oc': '3: very positive mental state can be inferred', 'e_c': ['surprise', 'joy', 'pessimism', 'disgust', 'fear', 'sadness', 'anger'], 'ei_reg': [{'joy': '0.147'}, {'fear': '0.049'}, {'sadness': '0.855'}, {'anger': '0.478'}], 'ei_oc': [{'joy': '0: no joy can be inferred'}, {'fear': '3: high amount of fear can be inferred'}, {'sadness': '0: no sadness can be inferred'}, {'anger': '2: moderate amount of anger can be inferred'}]}, 'review_1': {'v_reg': '0.833', 'v_oc': '0: neutral or mixed mental state can be inferred', 'e_c': ['love', 'sadness', 'joy'], 'ei_reg': [{'sadness': '0.961'}, {'joy': '0.354'}], 'ei_oc': [{'sadness': '3: high amount of sadness can be inferred'}, {'joy': '3: high amount of joy can be inferred'}]}, 'review_2': {'v_reg': '0.444', 'v_oc': '2: moderately positive mental state can be inferred', 'e_c': ['disgust'], 'ei_reg': [], 'ei_oc': []}}, 'Colisee': {'review_0': {'v_reg': '0.975', 'v_oc': '-1: slightly negative men
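
Downstream code can reload the file and aggregate the scores. As a minimal sketch, the snippet below computes the mean `v_reg` per site from a dictionary with the same layout as the output above. The values in `example_results` are invented for illustration, and `mean_valence_per_site` is a hypothetical helper. Scores stored as aberrant-answer strings are skipped.

```python
def mean_valence_per_site(results):
    """Average the numeric v_reg scores of each site, skipping bad values."""
    means = {}
    for site, reviews in results.items():
        scores = []
        for review in reviews.values():
            try:
                scores.append(float(review['v_reg']))
            except ValueError:
                # Aberrant answers are stored as text and cannot be converted.
                continue
        means[site] = sum(scores) / len(scores) if scores else None
    return means

# Invented values, same nesting as the placeholder file.
example_results = {
    'Tour Eiffel': {
        'review_0': {'v_reg': '0.25'},
        'review_1': {'v_reg': '0.75'},
    },
    'Colisee': {
        'review_0': {'v_reg': 'Aberrant answer from EmoLLM : n/a'},
    },
}
print(mean_valence_per_site(example_results))
```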