<a href="https://colab.research.google.com/github/caiocmello/Introduction-to-SA-Training-CAIS/blob/main/session_3/Autolabel_Sentiment_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <center><img src='https://drive.google.com/uc?id=1FzW9bN0SDepyn5pD-BZIP_9VewVmJWya'></center>

# Workshop: **Introduction to Sentiment Analysis: Potentials and limitations**

---

# **Session 3 Practical**: Sentiment analysis using ChatGPT via the Autolabel library

In this notebook, we will explore using *ChatGPT* via the [*Autolabel* Python library](https://github.com/refuel-ai/autolabel) for a sentiment analysis task.

## Prerequisites
In order to run the code in this notebook, you need a key for the [*OpenAI* API](https://openai.com/blog/openai-api).

**NB**: Using the *OpenAI* API is not free. You will need to set up a payment method to use it. The [costs for using the *OpenAI* API](https://openai.com/pricing) depend on the [model(s)](https://platform.openai.com/docs/models) you use and the number of [tokens](https://platform.openai.com/tokenizer) in the prompts (input) and responses (output).
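To make the relationship between tokens and cost concrete, here is a minimal sketch of the arithmetic. The function name `estimate_cost`, the token counts, and the per-1,000-token prices are all placeholders chosen for illustration; they are not actual *OpenAI* prices, so check the pricing page for the current values of the model you use.

```python
def estimate_cost(n_prompt_tokens, n_completion_tokens, price_in_per_1k, price_out_per_1k):
    """Estimate the cost of a request from token counts and per-1,000-token prices."""
    input_cost = (n_prompt_tokens / 1000) * price_in_per_1k
    output_cost = (n_completion_tokens / 1000) * price_out_per_1k
    return input_cost + output_cost


# Hypothetical numbers for illustration only:
# 129 comments, roughly 250 prompt tokens and 2 response tokens per comment.
n_comments = 129
cost = estimate_cost(
    n_prompt_tokens=n_comments * 250,
    n_completion_tokens=n_comments * 2,
    price_in_per_1k=0.001,   # placeholder price, not a real quote
    price_out_per_1k=0.002,  # placeholder price, not a real quote
)
print(f"Estimated cost: {cost:.4f}")
```

Because the prompt is sent once per comment, the input side usually dominates the cost of a classification task like this one.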

Notably, [*Autolabel* also supports other Large Language Models (LLMs)](https://docs.refuel.ai/guide/llms/llms/?h=key) besides the ones offered by *OpenAI*. Some of those are free to use, but they may be more complicated to set up and may not perform as well as the *OpenAI* models (depending on the task and the available computing resources).

Of course, we also need to install the *Autolabel* library and its dependencies. *Note*: How you need to install and set up *Autolabel* depends on the LLM you want to use (see the [*Autolabel* documentation](https://docs.refuel.ai/) for details).

In [None]:
!pip3 install 'refuel-autolabel[openai]'

*Note*: If this cell throws errors or if you cannot execute the following cells, you may have to run this cell again to complete/fix the installation process.

## Setup

After installing the *Autolabel* library, you need to set your API key.

**NB**: Treat your API key like a password and never share it (especially not on *GitHub* or other public locations)! If somebody else has your API key, this can incur costs for you. To increase safety, it is advisable to set a usage limit in your *OpenAI* account (under "Billing"). You can also always revoke API keys and generate new ones.

In [None]:
import os

# provide your own OpenAI API key here
os.environ['OPENAI_API_KEY'] = 'sk-xxxxxxxxxxxxxxxxxxx'
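If you would rather not type the key into a notebook cell (where it could end up in a saved or shared copy), one possible alternative is to prompt for it at runtime. This is a minimal sketch using Python's standard `getpass` module; the helper name `ensure_api_key` is made up for this example.

```python
import os
from getpass import getpass


def ensure_api_key(var_name="OPENAI_API_KEY"):
    """Return the key from the environment, prompting for it only if it is missing."""
    key = os.environ.get(var_name)
    if not key:
        # getpass hides the input, so the key is not displayed in the notebook
        key = getpass(f"Enter your {var_name}: ")
        os.environ[var_name] = key
    return key


# Usage in the notebook (uncomment to be prompted for the key):
# ensure_api_key()
```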

Next, we need to set the configuration for the task.

In [None]:
config = {
    "task_name": "YouTubeCommentSentiment",
    "task_type": "classification", # classification task
    "model": {
        "provider": "openai",
        "name": "gpt-3.5-turbo" # the model we want to use
    },
    "dataset": {
        "label_column": "label",
        "delimiter": ","
    },
    "prompt": {
        "task_guidelines": "You are an expert at analyzing the sentiment of YouTube comments. Most of the comments are in English, but they may also be in another language. Many of the comments also include emojis. \nYour job is to classify the provided YouTube comments into one of the following labels: \n{labels}",
        "labels": [
            "positive",
            "negative",
            "neutral",
        ],
        "few_shot_examples": [
            {
                "example": "This looks awful why Sony why",
                "label": "negative"
            },
            {
                "example": "It will be an AMAZING MOVIE!!",
                "label": "positive"
            },
            {
                "example": "Gonna watch the movie just for the meme of it",
                "label": "neutral"
            }
        ],
        "few_shot_selection": "fixed",
        "few_shot_num": 3,
        "example_template": "Input: {example}\nOutput: {label}"
    }
}

Note that we are asking the LLM to only use three labels (positive, neutral, negative). We could, e.g., also ask it to provide a sentiment score instead (as has been done in the study by [Rathje et al., 2023](https://osf.io/sekf5)).
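To get a feeling for how the guidelines, labels, few-shot examples, and example template fit together, the following sketch assembles a prompt with plain Python string formatting. It only approximates the idea: the function `preview_prompt` is hypothetical, the guidelines are shortened, and the prompt that *Autolabel* actually sends may contain additional instructions and a different layout.

```python
guidelines = (
    "You are an expert at analyzing the sentiment of YouTube comments.\n"
    "Your job is to classify the provided YouTube comments into one of the "
    "following labels: \n{labels}"
)
labels = ["positive", "negative", "neutral"]
few_shot_examples = [
    {"example": "This looks awful why Sony why", "label": "negative"},
    {"example": "It will be an AMAZING MOVIE!!", "label": "positive"},
    {"example": "Gonna watch the movie just for the meme of it", "label": "neutral"},
]
example_template = "Input: {example}\nOutput: {label}"


def preview_prompt(guidelines, labels, examples, template, new_comment):
    """Build an approximate few-shot prompt (not Autolabel's exact output)."""
    parts = [guidelines.format(labels="\n".join(labels))]
    # each few-shot example is rendered with the same template
    parts += [template.format(**ex) for ex in examples]
    # the comment to classify gets an empty label for the model to fill in
    parts.append(template.format(example=new_comment, label=""))
    return "\n\n".join(parts)


prompt = preview_prompt(guidelines, labels, few_shot_examples, example_template, "Nice trailer")
print(prompt)
```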

## Running the classification task

In order to run the sentiment analysis classification task, we need to initialize the labeling agent and pass it the configuration we have previously created.

In [None]:
# create an agent for labeling
from autolabel import LabelingAgent

agent = LabelingAgent(config, cache=False)

In this notebook, we will use part of the data on which the examples in the lecture were based. The file `comments.csv` in the [*GitHub* repository for the workshop](https://github.com/caiocmello/Introduction-to-SA-Training-CAIS) contains a sample of comments, all of which include at least one emoji. To compare the impact of emojis on sentiment classification, the comments in the sample that contain both text and emojis have been duplicated, so that they are included once with and once without emojis. The final test dataset contains 129 comments.

To use the data, download the file from the *GitHub* repo and upload it to your *Google Drive*. Then mount your Drive via the "Files" menu in *Colab* (i.e., the little folder symbol in the upper left). The following code cells read the data from `/content/drive/MyDrive/data/comments_test.csv`, so you may need to adjust the file name and path in those cells to match your upload (*note*: once you have mounted your Drive, you can get the file path by clicking on the three dots next to the file name and selecting "Copy path").

*Note*: If you want to reduce the cost of running the example in this notebook, you can delete rows from the `.csv` file before running the task.
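Instead of deleting rows by hand, you could also write a smaller copy of the file. The sketch below uses Python's standard `csv` module and keeps the header plus the first `n_rows` comments; the function name `write_subset` and the output file name in the usage comment are made up for this example.

```python
import csv


def write_subset(src_path, dst_path, n_rows):
    """Copy the header and the first n_rows data rows; return the number of rows copied."""
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(next(reader))  # header row
        written = 0
        for row in reader:
            if written >= n_rows:
                break
            writer.writerow(row)
            written += 1
    return written


# Usage (hypothetical output name; adjust both paths to your Drive):
# write_subset('/content/drive/MyDrive/data/comments_test.csv',
#              '/content/drive/MyDrive/data/comments_small.csv', 20)
```

Note that taking the first rows is the simplest option but not a random sample, so the smaller file may not be representative of the full dataset.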

Before we send our request to the *OpenAI* API, we might want to see an example prompt. This will also provide us with an estimate of how much the request for the full `.csv` file will cost.

In [None]:
agent.plan('/content/drive/MyDrive/data/comments_test.csv')

Now we can run the sentiment analysis classification task for the whole dataset.

*Note*: Depending on your internet connection and the current *OpenAI* API workload, this may take a bit.

In [None]:
labels, output_df, metrics = agent.run('/content/drive/MyDrive/data/comments_test.csv')

We can now check the results (by printing the first 20 rows from the resulting dataset).

*Note*: Some comments may not receive a sentiment label.

In [None]:
output_df.head(20)

If everything worked, the resulting `.csv` file should have been stored in the same (*Google Drive*) folder as the input data. The file name should end in `_labeled.csv`.
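To get a quick overview of the result, including how many comments did not receive a label, you could count the values in the label column of the saved file. This is a sketch using only the standard library; the function name `count_labels` is made up, and the name of the column that holds the predicted label is an assumption you need to check against the header of your own `_labeled.csv` file.

```python
import csv
from collections import Counter


def count_labels(path, column):
    """Count the values in one column of a CSV file; empty cells are counted as '<missing>'."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        return Counter((row.get(column) or "").strip() or "<missing>" for row in reader)


# Usage (both the path and the column name are placeholders):
# count_labels('/content/drive/MyDrive/data/comments_test_labeled.csv', 'name_of_label_column')
```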

One important final note: The results of text classifications using LLMs should be validated (see [Pangakis et al., 2023](https://arxiv.org/abs/2306.00176)). This is possible in various ways, namely through comparisons with results from a) repeated runs using the same LLM, b) other LLMs, c) other automated classification (in our case sentiment analysis) methods, or d) human coders. Validation through comparison with human coders is still widely considered the gold standard. Notably, however, these approaches can also be combined to, e.g., compute [inter-rater reliability](https://en.wikipedia.org/wiki/Inter-rater_reliability) within and across methods/classification approaches.
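As an illustration of such a comparison, the following sketch computes Cohen's kappa, one common measure of agreement between two raters, in plain Python. The function `cohen_kappa` and the two toy label lists are invented for this example; in practice you would pass in, for instance, the LLM labels and the labels assigned by a human coder for the same comments.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long lists of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both label lists must be non-empty and of equal length.")
    n = len(labels_a)
    # observed agreement: share of items with identical labels
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # expected chance agreement, based on each rater's label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy data for illustration only (not results from the workshop dataset)
llm_labels = ["positive", "positive", "negative", "negative"]
human_labels = ["positive", "negative", "negative", "negative"]
kappa = cohen_kappa(llm_labels, human_labels)
print(f"Cohen's kappa: {kappa:.2f}")
```

Kappa corrects the raw agreement rate for the agreement expected by chance, which matters when one label (e.g., neutral) is much more frequent than the others.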