# NLP LAB 1 - Inference & Interfaces

This lab presents basic tools that are useful for creating ML PoCs, presenting them online, and gathering information from users' interactions with the model.

Libraries involved:
- [__GradIO__](https://www.gradio.app/docs/interface): _an open-source Python package that allows you to quickly build a demo or web application for your machine learning model_.
- [__Cohere__](https://docs.cohere.com/reference/about) + [__LangChain__](https://python.langchain.com/docs/get_started/introduction): interacting with deployed LLMs.
- [__Hugging Face Datasets__](https://huggingface.co/docs/datasets/index): a repository of publicly available datasets.
- [__sqlite3__](https://docs.python.org/3/library/sqlite3.html): simple database with Python API.

Parts of the notebook are loosely based on the documentation of the corresponding libraries.


The first two introductory labs will not have any homework.


### Install Dependencies

In [1]:
%pip install gradio langchain cohere datasets

Note: you may need to restart the kernel to use updated packages.


## Make a Chat Bot Interface through API

### Connect with Cohere

First, let's programmatically connect to Cohere Chat, using the Cohere Python SDK.

To run this cell, you need to:
- Create a Cohere account - [Sign up](https://dashboard.cohere.com/welcome/register) if you don't have one.
- Create or copy an API key - [API Keys](https://dashboard.cohere.com/api-keys) and copy it into this block.

This should allow you to interact with this LLM within Python.



In [2]:
import cohere
import gradio as gr
from typing import List, Tuple, Dict
import os

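# Never commit a real key: read it from the COHERE_API_KEY environment variable,
# or paste your own key in place of the placeholder below.
llm = cohere.ClientV2(os.environ.get("COHERE_API_KEY", "<YOUR_API_KEY>"))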
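
A minimal sketch of keeping the API key out of the notebook. The helper name `read_api_key` and the variable name `COHERE_API_KEY` are assumptions made for this example, not part of the Cohere SDK.

```python
import os


def read_api_key(var_name: str = "COHERE_API_KEY") -> str:
    """Return the API key stored in an environment variable, or fail loudly."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your API key.")
    return key
```

With the variable set, the client from the cell above could be created as `cohere.ClientV2(read_api_key())`.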

### Make a Chat Interface

GradIO allows us to easily create a simple [ChatBot interface](https://www.gradio.app/docs/chatinterface), using just a generic predict function, in which we can embed the Cohere LLM.

Moreover, when launched with `share=True`, this interface is also available under a public link, printed to stdout while the cell is running. This link allows multiple users to interact with your chatbot simultaneously. However, on Google Colab, the link stops working once the cell stops running.

In [None]:
def predict(message: str, history: List[Dict[str, str]]) -> str:
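    # Gradio history entries carry extra keys (metadata, options; see the output below);
    # keep only the role and content of each turn.
    history_langchain = []

    for single_text in history:
        role = single_text["role"]
        content = single_text["content"]
        history_langchain.append({"role": role, "content": content})

    # Debug output, shown in the cell output while the interface is running.
    print(history)
    print(message)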

    # if message == "123":
        # print('gottem')
        # history_langchain.append({"role": "user", "content": "Please don't answer my next question."})
        # history_langchain.append({"role": "assistant", "content": "Ok, I'm not answer it."})
        # history_langchain.append({"role": "user", "content": "What is the opposite of 'sad'?"})
    # else: 
    history_langchain.append({"role": "user", "content": message})

    response = llm.chat(model="command-r-plus-08-2024", messages=history_langchain)
    return response.message.content[0].text
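
The history handling in `predict` can be checked without calling the API. The sketch below isolates that step in a hypothetical helper, `build_messages` (a name introduced here for illustration), and feeds it a history shaped like the one printed in the cell output.

```python
from typing import Dict, List


def build_messages(message: str, history: List[Dict]) -> List[Dict[str, str]]:
    """Keep only role/content from each past turn, then append the new user message."""
    messages = [{"role": turn["role"], "content": turn["content"]} for turn in history]
    messages.append({"role": "user", "content": message})
    return messages


# History in the shape Gradio passes when type="messages" is used.
history = [
    {"role": "user", "metadata": None, "content": "hello", "options": None},
    {"role": "assistant", "metadata": None, "content": "Hello! How can I help you today?", "options": None},
]
print(build_messages("123", history))
```

The resulting list is what `predict` passes as `messages` to `llm.chat`.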

In [12]:
gr.ChatInterface(predict, type="messages").launch(debug=True)

* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.


[]
hello
[{'role': 'user', 'metadata': None, 'content': 'hello', 'options': None}, {'role': 'assistant', 'metadata': None, 'content': 'Hello! How can I help you today?', 'options': None}]
123
gottem
Keyboard interruption in main thread... closing server.




## Crowd-Sourced Data Labeling
### Task: Verify your ChatBot translation capabilities with human interaction

Let's suppose you've created a new translation LLM whose main advantage is generating text that is more pleasant for humans to read in the target language. Qualities like this are hard to verify automatically on a dataset, so it can be useful to show that, in a blind test, humans prefer your translation over some baseline.

A similar setting is used for aligning ChatBots with __Reinforcement Learning from Human Feedback__ (RLHF), a technique widely used for popular LLMs: humans annotate which response is more helpful or less dangerous, and their annotations are used to continually improve the model.
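
One possible way to judge whether a blind-test result reflects a real preference is an exact two-sided binomial (sign) test against a 50/50 split. This test is not part of the lab itself; the function name `binomial_p_value` is an assumption made for this sketch.

```python
from math import comb


def binomial_p_value(wins: int, games: int) -> float:
    """Two-sided exact binomial test of `wins` out of `games` against p = 0.5."""
    if games == 0:
        return 1.0
    # Under p = 0.5 the distribution is symmetric, so double the upper tail.
    k = max(wins, games - wins)
    tail = sum(comb(games, i) for i in range(k, games + 1)) / 2 ** games
    return min(1.0, 2 * tail)


print(binomial_p_value(10, 10))
print(binomial_p_value(5, 10))
```

A small p-value suggests that annotators are not choosing at random; with only a handful of games, even a lopsided score can be inconclusive.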


### Download the HuggingFace Dataset

Hugging Face [datasets](https://huggingface.co/datasets) is a huge, easy-to-use crowd-sourced library of datasets, useful for various ML tasks. As a benchmark for our test, we're going to use a common seq2seq dataset, [opus_books](https://huggingface.co/datasets/opus_books), with books translated into several languages. Each dataset has its own documentation about its inner structure, but all of them share a similar API.


In [13]:
from datasets import load_dataset
import random

dataset = load_dataset("opus_books", "en-pl")
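# randrange excludes the upper bound, so the index is always valid
# (randint includes it and could raise an IndexError).
data_sample = dataset["train"][random.randrange(len(dataset["train"]))]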
print(data_sample)

{'id': '2365', 'translation': {'en': 'Holmes said little more, but the picture of the old roysterer seemed to have a fascination for him, and his eyes were continually fixed upon it during supper. It was not until later, when Sir Henry had gone to his room, that I was able to follow the trend of his thoughts.', 'pl': 'Mój przyjaciel umilkł, ale nie odrywał oczu od portretu. Dopiero po naszem rozejściu się na spoczynek dowiedziałem się, dlaczego to płótno budzi w nim tak żywo zaciekawienie.'}}
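
A small sketch of working with records of this shape without downloading anything. The two rows below are hand-written stand-ins for dataset entries, and `pick_pair` is a hypothetical helper.

```python
import random

# Hand-written stand-ins with the same structure as an opus_books "en-pl" row.
records = [
    {"id": "0", "translation": {"en": "Good morning.", "pl": "Dzień dobry."}},
    {"id": "1", "translation": {"en": "Thank you.", "pl": "Dziękuję."}},
]


def pick_pair(rows, rng=random):
    """Return (polish_text, english_text) from a randomly chosen row."""
    row = rows[rng.randrange(len(rows))]  # randrange never returns len(rows)
    return row["translation"]["pl"], row["translation"]["en"]


print(pick_pair(records))
```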


### Create a Scores Database

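To remember interactions with users, we need some persistent storage. We're going to use a small SQLite database, stored in the file `database.db`, that keeps the result of each game.

_On Google Colab, local files disappear together with the runtime. To keep the database, point `database_path` at a mounted Google Drive folder; you might need to accept some Google Drive access permissions to do so_.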

In [14]:
import sqlite3

database_path = "database.db"

def run_db(fun):
    # Open a connection, run `fun` with a cursor, commit, and always close.
    con = sqlite3.connect(database_path)
    try:
        ret = fun(con.cursor())
        con.commit()
        return ret
    finally:
        con.close()

run_db(lambda cur: cur.execute("CREATE TABLE IF NOT EXISTS scores(wins)"))

def save_score(won):
    # Parameterized query; the boolean is stored as 0 or 1.
    run_db(lambda cur: cur.execute("INSERT INTO scores VALUES (?)", (int(won),)))

def get_average_score():
    # AVG yields NULL (None in Python) while the table is empty.
    return run_db(lambda cur: cur.execute("SELECT AVG(wins) FROM scores").fetchall()[0][0])
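
The same table can be tried out on a throwaway in-memory database, which leaves `database.db` untouched. This is only a sketch: the `":memory:"` connection and the sample votes are choices made for the example.

```python
import sqlite3

# ":memory:" creates a temporary database that vanishes when the connection closes.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE IF NOT EXISTS scores(wins)")

# AVG over an empty table returns NULL, which sqlite3 maps to None.
empty_avg = con.execute("SELECT AVG(wins) FROM scores").fetchone()[0]

# "?" placeholders let sqlite3 bind the values instead of formatting them into the SQL.
for won in (True, False, True, True):
    con.execute("INSERT INTO scores VALUES (?)", (int(won),))
con.commit()

average = con.execute("SELECT AVG(wins) FROM scores").fetchone()[0]
con.close()

print(empty_avg, average)
```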

### Generate Data
Now, we're going to generate data for our test. As our translator, we're going to use Cohere Chat prompted to translate, and as a baseline, translations from our dataset.

For a fair test, we're going to randomize the order of translations.

In [16]:
import random
import json

preprompt = "Translate the following Polish sentence to English. Do not write anything other than this translation:\n "

def translate_with_chat(text):
    role = "user"
    content = preprompt + text
    message = {"role": role, "content": content}

    response = llm.chat(model="command-r-plus-08-2024", messages=[message])
    return response.message.content[0].text

def get_values():
    # randrange excludes the upper bound, so the index is always valid
    data = dataset["train"][random.randrange(len(dataset["train"]))]
    orig_text, trans1 = data["translation"]["pl"], data["translation"]["en"]
    trans2 = translate_with_chat(orig_text)

    # now randomize order for a blind test:
    # where_chat is the position (0 or 1) of the chat translation
    where_chat = random.randint(0, 1)
    if where_chat == 0:
        trans1, trans2 = (trans2, trans1)
    return trans1, trans2, orig_text, where_chat
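
The order randomization can be isolated and checked on its own. In the sketch below, `shuffle_pair` is a hypothetical helper; it returns both texts together with the position (0 or 1) at which the candidate ended up.

```python
import random


def shuffle_pair(baseline, candidate, rng=random):
    """Place the candidate first or second at random and report its position."""
    where_candidate = rng.randint(0, 1)  # randint includes both ends: 0 or 1
    if where_candidate == 0:
        return candidate, baseline, where_candidate
    return baseline, candidate, where_candidate


first, second, position = shuffle_pair("dataset translation", "chat translation")
print(first, "|", second, "| candidate at position", position)
```

Whichever position is drawn, the text at that position is always the candidate, which is what the blind test relies on.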

### Define GradIO Interface

To combine all of the above into one interface, we're going to use the GradIO [blocks](https://www.gradio.app/docs/blocks) API. It allows us to create custom web applications that communicate with our model, with a minimal amount of code.

The `response` function wraps our data generation and storage steps into one. Given the current state of the interface, it returns the next state, saving the result to the database along the way. Note that the variables `games` and `wins` are local and exist only within one session, while `save_score` writes the user's choice to the permanent database. Therefore, the session score restarts with every new session, while the global score keeps accumulating. All of the fields passed to the `response` function are converted to their content and can be handled as regular `str` or `int` values.




In [17]:
def response(trans1, trans2, original_text, where_chat, games, wins, score_text, verdict):
    # The chat model "wins" when the annotator picks the position holding its translation.
    won = where_chat == verdict
    save_score(won)
    games += 1

    if won:
        wins += 1

    return *get_values(), games, wins, f"Session score {wins}/{games}"

def response_1(*args):
    return response(*args, verdict=0)

def response_2(*args):
    return response(*args, verdict=1)

with gr.Blocks() as demo:
    trans1_init, trans2_init, orig_text_init, where_chat_init = get_values()
    games, wins, where_chat = gr.State(0), gr.State(0), gr.State(where_chat_init)

    text = gr.Markdown("# Which Translation is better?")
    local_score_text = gr.Markdown("Session score 0/0")
    global_score_text = gr.Markdown("")

    original_text = gr.Text(label="Original Text", value=orig_text_init)
    trans1 = gr.Text(label="Translation 1", value=trans1_init)
    trans2 = gr.Text(label="Translation 2", value=trans2_init)
    btn1, btn2 = gr.Button("1"), gr.Button("2")

    fields = [trans1, trans2, original_text, where_chat, games, wins, local_score_text]
    btn1.click(response_1, inputs=fields, outputs=fields)
    btn2.click(response_2, inputs=fields, outputs=fields)
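
The session bookkeeping can be followed without launching the interface. The sketch below mirrors the counting logic of `response` in a hypothetical pure function, `update_session`, leaving out the database write and the data generation.

```python
def update_session(where_chat, verdict, games, wins):
    """Return the next (games, wins, score text) after one button click."""
    won = where_chat == verdict  # the annotator picked the chat translation
    games += 1
    wins += int(won)
    return games, wins, f"Session score {wins}/{games}"


# Simulate two clicks within one session, starting from the initial state (0, 0).
games, wins = 0, 0
games, wins, text = update_session(where_chat=1, verdict=1, games=games, wins=wins)
games, wins, text = update_session(where_chat=0, verdict=1, games=games, wins=wins)
print(text)
```

In the real interface, the counters are held in `gr.State` objects, so every browser session starts again from 0/0.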

### Run the Interface

Now run the interface in a cell. Note that, as in the previous case, a public URL (available with `share=True`) stops working once the cell is killed. To keep the interface running indefinitely, you need a server that is up 24/7.

In [18]:
demo.launch(debug=True)

* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.


Keyboard interruption in main thread... closing server.




### Define the Scoreboard

Finally, we're going to define a simple scoreboard that lets us check the aggregated score of all annotators, stored in the SQLite database. This can be achieved with the much simpler [interface](https://www.gradio.app/docs/interface) API, which wraps a single function with no state.


In [19]:
scoreboard = gr.Interface(
    fn=lambda: f"Global Score: {get_average_score()}",
    inputs=[],
    outputs=["text"],
    description="Click Generate to check global score!"
)
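
Before any vote is stored, `AVG(wins)` has nothing to average and the query returns `None`. A hedged sketch of a friendlier formatter follows; `format_global_score` is a hypothetical helper and is not used by the scoreboard above.

```python
from typing import Optional


def format_global_score(average: Optional[float]) -> str:
    """Turn the stored average (or None for an empty table) into a readable line."""
    if average is None:
        return "Global Score: no votes yet"
    return f"Global Score: {average:.1%} of votes preferred the chat translation"


print(format_global_score(None))
print(format_global_score(0.75))
```

It could replace the lambda passed as `fn`, for example as `fn=lambda: format_global_score(get_average_score())`.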

### Run Scoreboard

Run the function to check the global score. This should aggregate all of your sessions (including those on public URLs) and remain saved after shutting down the notebook.




In [20]:
scoreboard.launch(debug=True)

* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.


Using existing dataset file at: .gradio/flagged/dataset1.csv
Keyboard interruption in main thread... closing server.


