<a href="https://colab.research.google.com/github/datafyresearcher/datafy-finetuning-beginner/blob/main/notebooks/Basic/05_QuestionAnswer_LLMFinetuning_Free.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Finetune a question-answer LLM on your data using [Lamini](https://www.lamini.ai/)

- Prepare question-answer pairs
- Load them into the LLM
- Finetune the LLM on them in under 15 minutes

It's completely free! What's special is that the LLM learns not only how to answer questions, but also new, up-to-date information that general-purpose LLMs aren't aware of.

We include some question-answer datasets for you to finetune on:
- Lamini engineering docs
- Taylor Swift recent facts
- Open-Source LLMs
- BTS recent facts

# Setup 🛠️
### Note: You will be asked to sign in with the Google account connected to your Lamini account.


In [1]:
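# @title Step 1: Authenticate with Google

from google.colab import auth
import requests
import os
import yaml

def authenticate_powerml():
  # Sign in with Google (Colab shows an authentication prompt)
  auth.authenticate_user()
  # IPython shell magic: captures the gcloud access token as a list of output lines
  gcloud_token = !gcloud auth print-access-token
  # Exchange the Google access token for a PowerML/Lamini API key;
  # `params` URL-encodes the token instead of concatenating it into the URL
  powerml_token_response = requests.get(
      'https://api.powerml.co/v1/auth/verify_gcloud_token',
      params={'token': gcloud_token[0]},
  )
  print(powerml_token_response)
  powerml_token_response.raise_for_status()  # stop here if verification failed
  return powerml_token_response.json()['token']

key = authenticate_powerml()

# API key and endpoint, written to the config file the Lamini library reads
config = {
    "production": {
        "key": key,
        "url": "https://api.powerml.co"
    }
}

keys_dir_path = '/root/.powerml'
os.makedirs(keys_dir_path, exist_ok=True)

keys_file_path = os.path.join(keys_dir_path, 'configure_llama.yaml')
with open(keys_file_path, 'w') as f:
  yaml.dump(config, f, default_flow_style=False)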


<Response [200]>


In [3]:
# @title Step 2: Install the open-source [Lamini library](https://pypi.org/project/lamini/) to use LLMs easily

#===> This block installs Lamini only when it detects Google Colab; elsewhere it just prints a message.

if 'google.colab' in str(get_ipython()):
  print('Running on CoLab')
  # Install the package
  !pip install --upgrade --force-reinstall --ignore-installed -qqq lamini
else:
  print('Not running on CoLab')


Running on CoLab

# 🚨 Note: When the installation finishes, open the "Runtime" menu and click "Restart session", then continue with the next cell.

# 🚨 The restart is needed because Lamini requires a more recent version of NumPy than the one preinstalled in Colab.


# Prepare your data 📊

Upload your question-answer data in the following format (JSONL, one JSON object per line):
```
{"question": "type your question", "answer": "answer to the question"}

```
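If your pairs already live in Python, the standard `json` module can write them in this format. This is a minimal sketch; `write_qa_jsonl` and `my_qa_data.jsonl` are hypothetical names, and the pairs are placeholders for your own data.

```python
import json


def write_qa_jsonl(pairs, path):
    """Write (question, answer) tuples to `path`, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for question, answer in pairs:
            record = {"question": question, "answer": answer}
            # ensure_ascii=False keeps non-ASCII text (e.g. emoji) readable in the file
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Placeholder pairs; replace them with your own data
pairs = [
    ("type your question", "answer to the question"),
    ("type another question", "its answer"),
]
write_qa_jsonl(pairs, "my_qa_data.jsonl")
```

Writing one `json.dumps` result per line guarantees that each record is valid JSON, even when answers contain quotes or newlines.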
Or upload it as a CSV file with `question` and `answer` as the column headers:
```
question,answer
"type your question","answer to the question"
```
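The notebook can load CSV files directly (see step 2 below), but a missing or misspelled column header is easy to overlook. The following sketch, using Python's standard `csv` and `json` modules, checks for the two required columns and converts the CSV to the JSONL format shown earlier. `csv_to_qa_jsonl` is a hypothetical helper, not part of Lamini; any extra columns are dropped.

```python
import csv
import json


def csv_to_qa_jsonl(csv_path, jsonl_path):
    """Convert a CSV with 'question' and 'answer' columns to JSONL; return the row count."""
    with open(csv_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        # fieldnames is None for an empty file, so fall back to an empty list
        missing = {"question", "answer"} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"CSV is missing required columns: {sorted(missing)}")
        rows = [{"question": r["question"], "answer": r["answer"]} for r in reader]
    with open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in rows:
            dst.write(json.dumps(row, ensure_ascii=False) + "\n")
    return len(rows)


# Example (after the sample files are downloaded below):
# csv_to_qa_jsonl("seed_bts.csv", "seed_bts.jsonl")
```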
You can also download a sample `seed_lamini_docs.jsonl` file containing Lamini question-answer data 🦙

We also have more examples related to Taylor Swift &nbsp;👑, BTS &nbsp;💜, and open LLMs &nbsp;📚; try them out!

In [4]:
!wget -q -O "seed_lamini_docs.jsonl" "https://drive.google.com/uc?export=download&id=1SfGp1tVuLTs0WYDugZcxX-EHrmDtYrYJ"
!wget -q -O "seed_taylor_swift.jsonl" "https://drive.google.com/uc?export=download&id=119sHYYImcXEbGyvS3wWGpkSEVIFdLy6Z"
!wget -q -O "seed_bts.csv" "https://drive.google.com/uc?export=download&id=1lblhdhKwoiOjlvfk8tr7Ieo4KpvjRm6n"
!wget -q -O "seed_open_llm.jsonl" "https://drive.google.com/uc?export=download&id=1S7oPPko-UmOr-bqkZ_PREfGKO2f73ZiK"
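Because these files come from Google Drive links, `wget` can occasionally save an HTML page (for example, an error or confirmation page) instead of the data. A quick sanity check, assuming the JSONL samples use the `question`/`answer` format described above, is to parse the first non-empty line. `looks_like_qa_jsonl` is a hypothetical helper, not part of Lamini.

```python
import json


def looks_like_qa_jsonl(path):
    """Return True if the first non-empty line of `path` is a JSON object with
    'question' and 'answer' keys; False otherwise (e.g. an HTML page or an empty file)."""
    # errors="replace" avoids a crash if the download is not valid UTF-8
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                return False
            return isinstance(record, dict) and all(k in record for k in ("question", "answer"))
    return False  # the file is empty


# Usage after the downloads above (the CSV file is not JSONL, so it is not checked here):
# for name in ["seed_lamini_docs.jsonl", "seed_taylor_swift.jsonl", "seed_open_llm.jsonl"]:
#     print(name, looks_like_qa_jsonl(name))
```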

# **Finetune your LLM 🦙**

Finetuning has a simple interface. The basic steps are:


## 1. Instantiate the LLM

To finetune a different base model, pass the `model_name` parameter to `QuestionAnswerModel()`, for example:
```
  model = QuestionAnswerModel(model_name="YOUR_MODEL_NAME")
```
The free tier currently supports only a limited set of models; you can find the list [here](https://lamini-ai.github.io/notebooks/#lamini-finetuning-for-free).

In [4]:
from llama import QuestionAnswerModel
import time

# Instantiate the model; the training data is loaded in the next step
finetune_model = QuestionAnswerModel()

## 2. Load your data into the LLM

In [5]:
# Copy the first `keep_rows` lines (10 by default, or fewer if the file is shorter) of a JSONL file to a new file.

from itertools import islice

def read_and_process_jsonl(input_path, output_path, keep_rows=10):
    # islice stops after keep_rows lines, so the rest of the file is never read
    with open(input_path, 'r', encoding='utf-8') as src:
        selected_rows = list(islice(src, keep_rows))

    with open(output_path, 'w', encoding='utf-8') as dst:
        dst.writelines(selected_rows)

# Input: the sample file downloaded above; output: the trimmed file loaded into the model next
read_and_process_jsonl('seed_lamini_docs.jsonl', 'seed_lamini_docs_output.jsonl', keep_rows=10)
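The cell above keeps the first 10 rows. If the source file is ordered (for example, grouped by topic), those rows may not be representative. As an alternative that the original notebook does not use, here is a sketch that takes a reproducible random sample with Python's `random.Random(seed).sample`; `sample_jsonl` is a hypothetical helper, and blank lines are skipped.

```python
import random


def sample_jsonl(input_path, output_path, keep_rows=10, seed=0):
    """Write a random sample of up to `keep_rows` non-empty lines; return the sample size."""
    with open(input_path, encoding="utf-8") as src:
        # Ensure every kept line ends with a newline so the output stays one record per line
        lines = [line if line.endswith("\n") else line + "\n" for line in src if line.strip()]
    rng = random.Random(seed)  # a fixed seed gives the same sample on every run
    sample = rng.sample(lines, min(keep_rows, len(lines)))
    with open(output_path, "w", encoding="utf-8") as dst:
        dst.writelines(sample)
    return len(sample)


# Example, as a drop-in replacement for the call above:
# sample_jsonl('seed_lamini_docs.jsonl', 'seed_lamini_docs_output.jsonl', keep_rows=10)
```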


In [6]:
finetune_model.load_question_answer_from_jsonlines("seed_lamini_docs_output.jsonl")
# OR, for CSV data with 'question' and 'answer' columns:
# finetune_model.load_question_answer_from_csv("seed_bts.csv")

## 3. Train the LLM

Once the model finishes training, you can view its responses, chat with it, and compare it to the base model at https://app.lamini.ai/train 👈

In [7]:
# Train the model (took about 4-5 minutes in the run below)
start=time.time()
finetune_model.train() # enable_peft=True
print(f"Time taken: {time.time()-start} seconds")

Training job submitted! Check status of job 4374 here: https://app.lamini.ai/train/4374
Finetuning process completed, model name is: 646526d469641d43640692e6399c8b312eebd57d88bcad6ad39a2393e373790a
Time taken: 283.47936820983887 seconds


## 4. Compare your LLM: before and after training (optional)

In [17]:
# Helper for printing the evaluation results once training has finished
def print_training_results(results):
    print("-"*100)
    print("Training Results")
    print(results)
    print("-"*100)

In [18]:
# Evaluate base and finetuned models to compare performance
results = finetune_model.get_eval_results()
print_training_results(results)

----------------------------------------------------------------------------------------------------
Training Results
{'job_id': 4374, 'eval_results': [{'input': "What are the different types of documents available in the repository (e.g., installation guide, API documentation, developer's guide)?", 'outputs': [{'model_name': '646526d469641d43640692e6399c8b312eebd57d88bcad6ad39a2393e373790a', 'output': ' All of these are available in the documentation.\n\nHow can I find the specific documentation I need for a particular feature or function? You can ask this model about documentation, which is trained on our publicly available docs and source code, or you can go to https://lamini-ai.github.io/.\n\nHow frequently is the documentation updated to reflect changes in the code? Documentation on such a fast moving project is difficult to update regularly - that’s why we’ve built this model to continually update users on the status of our product.\n\nDoes the documentation provide information a
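The raw dictionary printed above is hard to scan. Based on the structure visible in that output (a `job_id` plus an `eval_results` list whose items hold an `input` question and a list of `outputs`, each with `model_name` and `output`), a small formatter can list each question with every model's answer. This is a sketch that assumes that structure holds for all items; `format_eval_results` is a hypothetical helper, not a Lamini function.

```python
def format_eval_results(results, max_chars=300):
    """Return a readable string: each eval question followed by each model's answer.

    Assumes the structure shown above:
    {'job_id': ..., 'eval_results': [{'input': str,
                                      'outputs': [{'model_name': str, 'output': str}, ...]}, ...]}
    """
    lines = []
    for i, item in enumerate(results.get("eval_results", []), start=1):
        lines.append(f"Q{i}: {item['input']}")
        for out in item.get("outputs", []):
            text = " ".join(out["output"].split())  # collapse newlines and repeated spaces
            if len(text) > max_chars:
                text = text[:max_chars] + "..."
            # Model names can be long hashes, so show only the first 12 characters
            lines.append(f"  [{out['model_name'][:12]}] {text}")
        lines.append("")  # blank line between questions
    return "\n".join(lines)


# Usage, with `results` from finetune_model.get_eval_results():
# print(format_eval_results(results))
```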

## 5. Run your trained LLM

In [9]:
answer = finetune_model.get_answer("How can I add data to Lamini?")
answer

'\n\nI have a model called Lamini that I am using to build a form. I have a model called Lamini that I am using to build a form. I have a model called Lamini that I am using to build a form. I have a model called Lamini that I am using to build a form. I have a model called Lamini that I am using to build a form. I have a model called Lamini that I am using to build a form. I have a model called Lamini that I am using to build a form. I have a model called Lamini that I am using to build a form. I have a model called Lamini that I am using to build a form. I have a model called Lamini that I am using to build a form. I have a model called Lamini that I am using to build a form. I have a model called Lamini that I am using to build a form. I have a model called Lamini that I am using to build a form. I have a model called Lamini that I am using to build a form. I have a model called Lamini that I am using to build a form'

In [11]:
answer = finetune_model.get_answer("How frequently is the documentation updated to reflect changes in the code?")
answer

' All our public documentation is available here https://lamini-ai.github.io/'

In [13]:
answer = finetune_model.get_answer("How can I find the specific documentation I need for a particular feature or function?")
answer

" Or is there a list of all the documentation available for a particular feature or function? Or is there a list of all the documentation available for a particular function? - soren\n\nHi, I'm looking for the documentation for a particular feature or function. I've looked on the web and on the docs, but can't find the specific documentation I need for a particular feature or function? Or is there a list of all the documentation available for a particular feature or function? Or is there a list of all the documentation available for a particular function? - soren\n\nHi, I'm looking for the documentation for a particular feature or function. I've looked on the web and on the docs, but can't find the specific documentation I need for a particular feature or function? Or is there a list of all the documentation available for a particular function? Or is there a list of all the documentation available for a particular function? - soren\n\nHi, I'm looking for the documentation for a particu

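Some of the answers above repeat the same sentence over and over. One rough heuristic for flagging such output (not part of Lamini, and only one of many possible measures) is the fraction of word n-grams that repeat an earlier n-gram. `repeated_ngram_ratio` is a hypothetical helper, and any cutoff you pick, such as 0.5, is an arbitrary starting point rather than a tested threshold.

```python
def repeated_ngram_ratio(text, n=5):
    """Fraction of word n-grams in `text` that duplicate an earlier n-gram.

    0.0 means no n-gram repeats; values close to 1.0 mean the text keeps
    looping over the same phrase. Texts shorter than n words return 0.0.
    """
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    return 1 - len(set(ngrams)) / len(ngrams)


# Usage, with `answer` from one of the cells above:
# print(repeated_ngram_ratio(answer))
```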
## Congratulations, you've finetuned an LLM 🎉

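As you can see in the evaluation results, the base model goes off the rails, while finetuning steers the LLM toward answers grounded in your data. With only 10 training examples, some answers (like the first and third ones above) can still be repetitive, so try finetuning on more of your question-answer pairs.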