# Data Augmentation using Generative LLMs


<center><img src="https://media.licdn.com/dms/image/v2/D4E12AQGLk5R0lcfr8A/article-cover_image-shrink_720_1280/article-cover_image-shrink_720_1280/0/1713883053110?e=1737590400&v=beta&t=0b0YHpIGPl8KxMzLfEhhwFbkcrT3ZLz2BDTaax-4HNM" alt="llama" style="width:60%"></center>

<br><br><br><br><br><br><br>
### Setting things up:

First, we need to set some things up. This usually starts by **creating a virtual environment**.

Open your preferred command line. Make sure **Python** is available. This means you should be able to enter the Python prompt by typing `python`.

**Navigate/create a folder** where you want to run your system. 

Inside that folder you can run the following commands:

```
conda deactivate           # deactivate conda in case you have it active
python -m venv venv       # create a folder 'venv' that will contain your virtual environment
source venv/bin/activate   # activate your virtual environment; you should now see '(venv)' on the left of your terminal prompt
```
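To confirm that the activation worked, you can ask Python itself. This is a minimal sketch using only the standard library; the helper name `in_virtualenv` is made up for this example.

```python
import sys

def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv folder, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix

print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```

If the first line prints `False`, the `source venv/bin/activate` step most likely did not run in this terminal.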


You are now inside your clean virtual environment: only Python and its basic tooling (such as `pip`) should be available.
**Let's start installing the necessary things**!

**Run the following commands** inside your terminal:

```
pip install llama-cpp-python
pip install openai
pip install sse_starlette
pip install starlette_context
pip install pydantic_settings
pip install "fastapi[all]"
pip install jupyter
```

We should now have everything we need to run LLMs locally... We're almost there!
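A quick way to double-check the installation is to ask Python whether each package can be found. The sketch below assumes the usual import names for the packages listed above (for example, `llama-cpp-python` is imported as `llama_cpp`); the helper `missing_packages` is hypothetical.

```python
from importlib.util import find_spec

# Import names can differ from pip names,
# e.g. llama-cpp-python is imported as llama_cpp.
REQUIRED = ["llama_cpp", "openai", "sse_starlette", "starlette_context",
            "pydantic_settings", "fastapi"]

def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]

missing = missing_packages(REQUIRED)
print("missing:", missing if missing else "nothing, all set")
```

Anything reported as missing can be installed again with `pip install` from inside the virtual environment.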

<br><br>

#### What are we missing?


<br><br><br><br><br><br><br><br><br><br><br><br>
# An LLM!  
### (Where can we find them? Aren't they huge?)

<br><br><br><br><br><br><br><br><br><br><br><br>


# Quantization
#### What is it? 
[Read more here](https://medium.com/@techresearchspace/what-is-quantization-in-llm-01ba61968a51)

<br><br><br><br><br><br><br><br><br><br>


**Advantages:**
 - Lower memory consumption: a lower bit width means less memory is needed to store the weights.
 - Faster inference: lower-precision weights need less memory bandwidth, so computation is more efficient.
 - Less energy consumption: larger models need more data movement and storage, which costs more energy; a smaller model therefore uses comparatively less energy.
 - Smaller models: a model can be quantized to fit the need and deployed on devices with smaller hardware specifications.

**Disadvantages:**
 - Potential loss in accuracy, because high-precision weights are squeezed into a lower-precision representation.
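The trade-off between size and accuracy can be made concrete with a toy example. This is a deliberately simplified sketch (one scale for all weights, made-up values); the quantization types used in real GGUF files are more elaborate, so treat it as intuition only.

```python
def quantize(weights, bits=4):
    """Map floats to signed integers of the given bit width, plus one shared scale."""
    qmax = 2 ** (bits - 1) - 1                  # 7 for 4 bits, 127 for 8 bits
    largest = max(abs(w) for w in weights) or 1.0   # avoid dividing by zero
    scale = largest / qmax
    return [round(w / scale) for w in weights], scale

def dequantize(q_weights, scale):
    """Recover approximate floats from the integers and the scale."""
    return [q * scale for q in q_weights]

weights = [0.82, -0.41, 0.05, -0.93, 0.30, 0.11]    # made-up values
for bits in (8, 4):
    q, scale = quantize(weights, bits)
    restored = dequantize(q, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(f"{bits}-bit: ints={q} max error={worst:.4f}")
```

Fewer bits mean smaller integers to store, but also a larger rounding error when the weights are recovered.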

### We'll be using Quantized LLMs, in GGUF format
[Click here to learn more about GGUF and how to find GGUF models](https://huggingface.co/docs/hub/en/gguf)

[QuantFactory](https://huggingface.co/QuantFactory) is a good place to start!

<br><br><br><br><br><br><br><br><br><br><br>

### Once you download your GGUF model, you should be able to run it (from within the virtual environment):

`
python -m llama_cpp.server --host 0.0.0.0 --port 9000 --model ./models/Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf --n_ctx 4096
`

This launches the **LLM as a server** with an OpenAI-compatible API, so it **can be queried for as long as it keeps running** (leave this terminal open).
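Before moving on, it can help to check that the server actually answers. The sketch below assumes the server is reachable at `http://localhost:9000/v1` (adjust host and port to your launch command) and that it exposes the OpenAI-style `/v1/models` listing; the helper names are made up for this example.

```python
import http.client
import json
import urllib.request

def models_url(base_url: str) -> str:
    """Build the model-listing URL from an OpenAI-style base URL."""
    return base_url.rstrip("/") + "/models"

def server_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the server answers the model-listing request with JSON."""
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
            json.load(resp)              # the reply is expected to be JSON
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or a non-JSON reply
        return False

print(server_is_up("http://localhost:9000/v1"))
```

If this prints `False`, check that the server terminal is still running and that the port matches.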
<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>

# Using the LLM

In [11]:
from openai import OpenAI
from collections import defaultdict as dd

# Point to the server
client = OpenAI(base_url="http://localhost:9000/v1", api_key="cltl")

#client = OpenAI(base_url="http://130.37.53.128:9002/v1", api_key="cltl")


## 1) In chat mode
- Try to understand how the contents of `history` are used
- What are the different `"roles"`?
- How do we stop the chat?

In [25]:
history = [
    {"role": "system", "content": "You are a chat-based assistant. Be cooperative and polite. Try to be concise."},
]

while True:
    completion = client.chat.completions.create(
        model="local-model",
        messages=history,
        temperature=0.8,
        stream=True,
    )

    new_message = {"role": "assistant", "content": ""}

    for chunk in completion:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
            new_message["content"] += chunk.choices[0].delta.content

    history.append(new_message)
    print("\n")
    userinput = input("> ")
    if userinput.lower() in ["bye", "quit", "exit"]:
        print("\n BYE BYE!")
        break
    history.append({"role": "user", "content": userinput})
    print("\n")

How can I assist you today?



>  With nothing. Thank you.




Feel free to reach out if you need anything in the future. Have a great day!



>  exit



 BYE BYE!
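Because the whole `history` list is sent again with every request, it keeps growing, and the server was started with a limited context (`--n_ctx 4096`). One possible way to keep long chats manageable is to drop the oldest turns while keeping the system message. This is only a sketch: `trim_history` and `max_messages` are hypothetical, and counting messages is a crude stand-in for counting tokens.

```python
def trim_history(history, max_messages=20):
    """Keep the system message(s) plus the most recent non-system messages."""
    system = [m for m in history if m["role"] == "system"]
    others = [m for m in history if m["role"] != "system"]
    recent = others[-max_messages:] if max_messages > 0 else []
    return system + recent

# Made-up conversation: one system message followed by 30 alternating turns
demo = [{"role": "system", "content": "Be concise."}]
for i in range(30):
    role = "user" if i % 2 == 0 else "assistant"
    demo.append({"role": role, "content": f"message {i}"})

trimmed = trim_history(demo, max_messages=20)
print(len(demo), "->", len(trimmed))
```

In the chat loop, this could be applied as `messages=trim_history(history)` instead of `messages=history`.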


<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>

## 2) In single prompt mode
- Is `history` still used?
- What does temperature do? What should theoretically happen if we set it to 0?
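As a rough intuition for the temperature question, the sketch below shows the textbook softmax with temperature on made-up scores. How a particular server handles a temperature of exactly 0 is an assumption here (modelled as a greedy choice), and real samplers usually add further steps.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; temperature rescales the scores first."""
    if temperature <= 0:
        # Limit case: all probability goes to the highest score (greedy choice)
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [z / temperature for z in logits]
    top = max(scaled)                        # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                     # made-up scores for three candidate tokens
for t in (1.0, 0.6, 0.2, 0.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate the probability on the top candidate; higher temperatures spread it out, which is why answers vary more between runs.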


In [84]:
def query_LLM(model_client, prompt, temp=0.6):
    history = [
        {"role": "system", "content": prompt},
    ]
    completion = model_client.chat.completions.create(
        model="local-model", # this field is currently unused
        messages=history,
        temperature=temp,
        stream=False,
    )

    return completion.choices[0].message.content

In [24]:
prompt = "What is prompt engineering?"
print(query_LLM(client, prompt, temp=0.6))

Prompt engineering, also known as prompt design or prompt optimization, is the process of crafting and refining input prompts to elicit specific, accurate, and informative responses from language models. It involves understanding how language models generate text based on inputs and designing optimal prompts that guide them towards desired outputs.

The goal of prompt engineering is to improve the quality and relevance of a model's responses by making its internal workings more interpretable and controllable. This requires a combination of natural language processing (NLP) expertise, domain knowledge, and understanding of how language models process text.

Prompt engineering has several applications across various industries:

1. **Chatbots and Virtual Assistants**: Crafting prompts to elicit specific information or actions from users.
2. **Language Translation**: Designing prompts for language translation systems to capture nuances and context.
3. **Text Summarization**: Creating prom

In [125]:
prompt = "What is the best drink in the world? Your answer should contain the name of a single drink and nothing more."
print(query_LLM(client, prompt, temp=1))

Coffee.


<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>

# Data Augmentation -- Machine Translation as a case study
1) We must know what we want
2) We must know if we can/how to ask it
3) We must learn how to process its output

## 1) Who can help me understand what you need for your assignment?

<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>

## 2) We must know if we can/how to ask it
- Am I using the right LLM for the task?
- Should we be using the chat or the single prompt method?

In [26]:
prompt = "Can you translate text?"
# prompt = "What languages were you trained with?"
print(query_LLM(client, prompt, temp=0.6))

I can translate text from one language to another. I support translations in many languages, including but not limited to:

* European languages: English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Chinese (Simplified and Traditional), Japanese, Korean
* Indian languages: Hindi, Bengali, Marathi, Gujarati, Punjabi, Tamil, Telugu, Malayalam, Kannada
* Middle Eastern languages: Arabic, Hebrew, Persian (Farsi)
* Asian languages: Thai, Vietnamese, Indonesian, Malaysian

To translate text, you can:

1. Type or paste the text into this chat window.
2. Specify the source language and target language.

For example, you can say "Translate 'Hello' from English to Spanish" or simply type "translate hello en es".

Keep in mind that my translation abilities are based on machine learning algorithms, so while I strive for accuracy, there may be occasional errors or nuances that get lost in translation. If you need a professional or high-stakes translation, it's always best to consu

<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>

## 3) We must learn how to process its output
- How do we stop all the yapping?
- What is the difference between zero-/one-/few-shot prompting?
- How do we get the output in the format we want?
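To make the zero-/one-/few-shot distinction tangible, here is a small sketch that assembles the three kinds of prompt as plain strings. The `build_prompt` helper, the `English:`/`Portuguese:` layout, and the example pairs are all illustrative choices, not a required format.

```python
def build_prompt(instruction, query, examples=()):
    """Assemble a prompt: zero-shot with no examples, one-/few-shot otherwise."""
    parts = [instruction.strip()]
    for source, target in examples:
        # Each solved example shows the model the expected output format
        parts.append(f"English: {source}\nPortuguese: {target}")
    # The final item is left open for the model to complete
    parts.append(f"English: {query}\nPortuguese:")
    return "\n\n".join(parts)

instruction = "Translate from English to Portuguese. Answer with the translation only."
examples = [("Good morning", "Bom dia"), ("Thank you", "Obrigado")]

zero_shot = build_prompt(instruction, "See you tomorrow")
one_shot = build_prompt(instruction, "See you tomorrow", examples[:1])
few_shot = build_prompt(instruction, "See you tomorrow", examples)
print(few_shot)
```

Any of these strings can be passed to `query_LLM(client, ...)` to compare how the number of examples affects the format of the answer.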

<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>

In [47]:
# Working example (make sure you understand why we use a WHILE loop, and why we need the TRY statement)
import json

prompt = """
Please provide 5 English sentences and their respective Portuguese translations. Each English sentence should be between 10 and 15 words long and must contain an idiom.
Your answer should be a list of lists in Python. The first element of each inner list should be the English sentence and the second element should be the Portuguese translation.
Provide only the list and nothing else. For example:
[["english sentence", "portuguese translation"], ...]
"""

mt_list = []

while len(mt_list) < 15:
    answer = query_LLM(client, prompt, temp=0.2)

    try:
        # The answer is only usable if it parses as a JSON list
        pairs = json.loads(answer)

        for item in pairs:
            mt_list.append(item)
    except (json.JSONDecodeError, TypeError):
        print("Failed to parse: ", answer)
        print("Trying again!\n\n\n")
print("\n\n\nDONE:")
for pair in mt_list:
    print(pair)
    print("\n")




DONE:
['The company is between a rock and a hard place with these financial decisions.', 'A empresa está entre uma pedra e um morro com essas decisões financeiras.']


["He's burning the midnight oil to meet this project deadline, but it's worth it.", 'Ele está queimando o óleo da meia-noite para cumprir a data-limite desse projeto, mas vale a pena.']


['The new policy is a double-edged sword for our business, both positive and negative impacts.', 'A nova política é uma espada de dois gumes para nossa empresa, tanto impactos positivos quanto negativos.']


["After the scandal, the company's reputation was left in tatters, a complete loss of trust.", 'Depois do escândalo, a reputação da empresa ficou em frangalhos, uma perda completa de confiança.']


['The team is feeling under the weather after that grueling project, they need some rest.', 'O time está se sentindo mal de saúde após aquele projeto agoniante, eles precisam de um pouco de descanso.']


['The company is between a rock
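Two things can go wrong with the raw answers: the model may wrap the list in extra text, and at low temperature it may return the same sentences again (the output above repeats "The company is between a rock..."). The sketch below is one possible way to handle both; `extract_pairs` and `add_new_pairs` are hypothetical helpers, and the sample answer is made up.

```python
import json

def extract_pairs(answer):
    """Return [english, portuguese] pairs found in an LLM answer, or [] if none."""
    start, end = answer.find("["), answer.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        data = json.loads(answer[start:end + 1])   # ignore text around the list
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    pairs = []
    for item in data:
        # Keep only well-formed items: exactly two non-empty strings
        if (isinstance(item, list) and len(item) == 2
                and all(isinstance(s, str) and s.strip() for s in item)):
            pairs.append([item[0].strip(), item[1].strip()])
    return pairs

def add_new_pairs(collected, pairs):
    """Append only pairs whose English side has not been seen yet."""
    seen = {en.lower() for en, _ in collected}
    for en, pt in pairs:
        if en.lower() not in seen:
            collected.append([en, pt])
            seen.add(en.lower())
    return collected

sample = ('Sure! Here is the list:\n'
          '[["Break a leg tonight.", "Boa sorte hoje à noite."], ["only one element"]]\n'
          'Hope this helps!')
collected = add_new_pairs([], extract_pairs(sample))
collected = add_new_pairs(collected, extract_pairs(sample))   # duplicates are skipped
print(collected)
```

When duplicates are skipped, the collection loop may need more requests to reach its target, so it is worth capping the number of attempts or raising the temperature slightly.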

<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>

# Quality Estimation / "Evaluation" with BLEU

<center><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdPDCCJH2WVAuEWUvp-RWdQITk9L8dB2p62GVI9CzLHd_hC2cED4wovkTY07sSZmYHtiWcHbSUhPRzbg_2DYyHiq_9gElMN85ZwZAI2gPcuwQNleQATdqUlrd8klzjOLhvh-weaAWdqkA2/s1600/BLEU4.png" alt="llama" style="width:90%"><br><a href="https://kv-emptypages.blogspot.com/2019/04/understanding-mt-quality-bleu-scores.html">Taken from/Read more here</a></center>



<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>

In [None]:
!pip install nltk

In [6]:
import nltk
from nltk.util import ngrams
from nltk.translate.bleu_score import sentence_bleu
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt_tab to /Users/lmc/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


True

In [13]:
source_sent = "le professeur est arrivé en retard à cause de la circulation"
reference_transl = "the teacher arrived late because of the traffic"
reference_transl_tok = nltk.word_tokenize(reference_transl)
ngrams_reference = []
for n in [1,2,3,4]:
    ngrams_reference = ngrams_reference + list(ngrams(reference_transl_tok,n))

print(ngrams_reference)

[('the',), ('teacher',), ('arrived',), ('late',), ('because',), ('of',), ('the',), ('traffic',), ('the', 'teacher'), ('teacher', 'arrived'), ('arrived', 'late'), ('late', 'because'), ('because', 'of'), ('of', 'the'), ('the', 'traffic'), ('the', 'teacher', 'arrived'), ('teacher', 'arrived', 'late'), ('arrived', 'late', 'because'), ('late', 'because', 'of'), ('because', 'of', 'the'), ('of', 'the', 'traffic'), ('the', 'teacher', 'arrived', 'late'), ('teacher', 'arrived', 'late', 'because'), ('arrived', 'late', 'because', 'of'), ('late', 'because', 'of', 'the'), ('because', 'of', 'the', 'traffic')]


In [20]:
transl_list = [
    "The professor was delayed due to the congestion",
    "Congestion was responsible for the teacher being late",
    "The teacher was late due to the traffic",
    "The professor arrived late because circulation",
    "The teacher arrived late because of the traffic"
]

transl_list_tokenized = []
for sent in transl_list:
    transl_list_tokenized.append(nltk.word_tokenize(sent))

print(transl_list_tokenized)

[['The', 'professor', 'was', 'delayed', 'due', 'to', 'the', 'congestion'], ['Congestion', 'was', 'responsible', 'for', 'the', 'teacher', 'being', 'late'], ['The', 'teacher', 'was', 'late', 'due', 'to', 'the', 'traffic'], ['The', 'professor', 'arrived', 'late', 'because', 'circulation'], ['The', 'teacher', 'arrived', 'late', 'because', 'of', 'the', 'traffic']]


In [21]:
from nltk.translate.bleu_score import sentence_bleu

ngram_weights = (0.10, 0.30, 0.30, 0.30) # weights for 1-gram, 2-gram, 3-gram, 4-gram

for translation in transl_list_tokenized:

    # Find the translation n-grams
    # Not needed for the score, just to see the overlap
    sent_ngrams = []
    for n in [1,2,3,4]:
        sent_ngrams = sent_ngrams + list(ngrams(translation,n))
    
    
    bleu_score1 = sentence_bleu([reference_transl_tok], translation)  # default: uniform weights over 1- to 4-grams; the first argument is a list of references
    bleu_score2 = sentence_bleu([reference_transl_tok], translation, weights=ngram_weights)  # same, with our custom n-gram weights

    print(bleu_score1, bleu_score2)

    # n-gram overlap (n = 1..4) between the reference and this translation
    print(set(ngrams_reference) & set(sent_ngrams))
    print("\n\n")

1.0832677820940877e-231 1.052691193011681e-277
{('the',)}



7.176381577237209e-155 1.2950316234712509e-185
{('the', 'teacher'), ('teacher',), ('late',), ('the',)}



7.711523862191631e-155 1.3328284280434942e-185
{('the',), ('teacher',), ('late',), ('traffic',), ('the', 'traffic')}



4.1382219658909647e-78 1.695647221393335e-93
{('arrived', 'late'), ('arrived', 'late', 'because'), ('because',), ('late',), ('late', 'because'), ('arrived',)}



0.8408964152537145 0.834236890454548
{('teacher', 'arrived'), ('of', 'the', 'traffic'), ('late', 'because', 'of', 'the'), ('the',), ('late', 'because'), ('because', 'of'), ('arrived', 'late', 'because', 'of'), ('arrived', 'late'), ('teacher', 'arrived', 'late', 'because'), ('teacher', 'arrived', 'late'), ('because', 'of', 'the', 'traffic'), ('arrived',), ('late', 'because', 'of'), ('because', 'of', 'the'), ('late',), ('of', 'the'), ('because',), ('arrived', 'late', 'because'), ('teacher',), ('traffic',), ('the', 'traffic'), ('of',)}
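To see where these numbers come from, the score can be recomputed by hand for the last translation. This is a minimal single-reference sketch of BLEU with uniform weights and no smoothing, using `str.split()` in place of `nltk.word_tokenize` (equivalent here because the sentences contain no punctuation); the function names are made up. Unlike this sketch, which returns 0 when an n-gram order has no match, NLTK appears to substitute a tiny positive value, which would explain the near-zero scores above.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count every n-gram of length n in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(reference, hypothesis, n):
    """Return (clipped n-gram matches, number of hypothesis n-grams)."""
    hyp = ngram_counts(hypothesis, n)
    ref = ngram_counts(reference, n)
    matches = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return matches, sum(hyp.values())

def simple_bleu(reference, hypothesis, max_n=4):
    """Single-reference BLEU with uniform weights and no smoothing."""
    log_sum = 0.0
    for n in range(1, max_n + 1):
        matches, total = modified_precision(reference, hypothesis, n)
        if matches == 0:
            return 0.0                       # one empty order zeroes the geometric mean
        log_sum += math.log(matches / total) / max_n
    c, r = len(hypothesis), len(reference)
    brevity_penalty = 1.0 if c >= r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_sum)

reference = "the teacher arrived late because of the traffic".split()
hypothesis = "The teacher arrived late because of the traffic".split()
for n in range(1, 5):
    print(n, modified_precision(reference, hypothesis, n))   # (7, 8), (6, 7), (5, 6), (4, 5)
print(simple_bleu(reference, hypothesis))                    # about 0.8409
```

The result agrees with the `0.8408964152537145` reported by `sentence_bleu`, and the per-order counts show that one n-gram is lost at every order.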





## Why is the last one not 1.0?