# My First Finetuning

Walk through Lamini's finetuning pipeline to train a toy model on toy data.

- It's free: $0 per training run.
- It's fast: less than 15 minutes.
- It works like a nearly unlimited prompt size. The toy example here takes in ~120k tokens, more than the largest prompt sizes.
- The model learns new information, rather than just trying to make sense of it given what it already learned (as in retrieval-augmented generation).


This example builds a basic question-answer LLM over your data.

# Prepare data 📊

Upload question-answer data in JSON Lines format (`.jsonl`), with one object per line:
```
{"question": "type your question", "answer": "answer to the question"}
```
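
If your question-answer pairs start out as Python objects, a file in this shape could be produced and checked roughly as follows. This is a minimal sketch using only the standard library; the helper names `write_qa_jsonl` and `validate_qa_jsonl` are made up for illustration and are not part of Lamini's library.

```python
import json

REQUIRED_KEYS = {"question", "answer"}


def write_qa_jsonl(pairs, path):
    """Write (question, answer) tuples to `path`, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for question, answer in pairs:
            f.write(json.dumps({"question": question, "answer": answer}) + "\n")


def validate_qa_jsonl(path):
    """Return the number of records; raise ValueError on a malformed line."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"line {line_number}: not valid JSON") from err
            if not isinstance(record, dict):
                raise ValueError(f"line {line_number}: expected a JSON object")
            missing = REQUIRED_KEYS - record.keys()
            if missing:
                raise ValueError(f"line {line_number}: missing {sorted(missing)}")
            count += 1
    return count
```

Calling `validate_qa_jsonl` on a file before uploading it catches a missing `answer` key locally, before a training job is submitted.
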
Alternatively, upload the data as a CSV file with `question` and `answer` as the column headers:
```
question,answer
type your question,answer to the question
```
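
For the CSV route, Python's `csv` module handles quoting of commas and quotation marks inside answers. The sketch below writes and re-reads such a file; `write_qa_csv` and `read_qa_csv` are hypothetical helper names, not Lamini functions.

```python
import csv

FIELDNAMES = ["question", "answer"]


def write_qa_csv(pairs, path):
    """Write (question, answer) tuples to a CSV file with the expected header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        for question, answer in pairs:
            writer.writerow({"question": question, "answer": answer})


def read_qa_csv(path):
    """Read the CSV back, raising ValueError if a required column is absent."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = set(FIELDNAMES) - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing column(s): {sorted(missing)}")
        return [{"question": row["question"], "answer": row["answer"]}
                for row in reader]
```

Reading the file back with `read_qa_csv` is a quick way to confirm the header row is what the loader expects.
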
We download a sample `seed_lamini_docs.jsonl` file, with Lamini question-answer data in it 🦙

In [5]:
!wget -q -O "seed_lamini_docs.jsonl" "https://drive.google.com/uc?export=download&id=1SfGp1tVuLTs0WYDugZcxX-EHrmDtYrYJ"
#!wget -q -O "seed_taylor_swift.jsonl" "https://drive.google.com/uc?export=download&id=119sHYYImcXEbGyvS3wWGpkSEVIFdLy6Z"
#!wget -q -O "seed_bts.csv" "https://drive.google.com/uc?export=download&id=1lblhdhKwoiOjlvfk8tr7Ieo4KpvjRm6n"
#!wget -q -O "seed_open_llm.jsonl" "https://drive.google.com/uc?export=download&id=1S7oPPko-UmOr-bqkZ_PREfGKO2f73ZiK"
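
Before training, it can help to look at what was downloaded. The following sketch counts the records in a question-answer JSONL file and gives a very rough size estimate by splitting on whitespace. The word count is a crude stand-in and is not the tokenizer's token count; `summarize_qa_jsonl` is a hypothetical helper name.

```python
import json


def summarize_qa_jsonl(path, preview=2):
    """Return record count, whitespace-word count, and the first few records."""
    records = 0
    words = 0
    first = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            records += 1
            # Crude size estimate: whitespace-separated words, not model tokens.
            words += len(str(record.get("question", "")).split())
            words += len(str(record.get("answer", "")).split())
            if len(first) < preview:
                first.append(record)
    return {"records": records, "words": words, "preview": first}
```

For the sample file, this would be called as `summarize_qa_jsonl("seed_lamini_docs.jsonl")`.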

In [6]:
# Functions for printing results during training...
def print_training_results(results):
    print("-"*100)
    print("Training Results")
    print(results)
    print("-"*100)

# ...and after training (inference/runtime)
def print_inference(question, finetune_answer, base_answer):
    print('Running Inference for: '+ question)
    print("-"*100)
    print("Finetune model answer: ", finetune_answer)
    print("-"*100)
    print("Base model answer: ", base_answer)
    print("-"*100)

# Finetune LLM 🦙

Finetuning takes five steps:

1. Instantiate the LLM

```
from llama import QuestionAnswerModel
model = QuestionAnswerModel()
```

2. Load your data into the LLM, from either a JSON Lines file

```
model.load_question_answer_from_jsonlines("seed_lamini_docs.jsonl")
```

or a CSV file

```
model.load_question_answer_from_csv("seed_bts.csv")
```

3. Train the LLM

```
model.train()
```

4. Compare your LLM before and after training (optional)

```
results = model.get_eval_results()
```

5. Run your trained LLM

```
answer = model.get_answer("How can I add data to Lamini?")
```

In [9]:
from llama import QuestionAnswerModel
import time

# Instantiate the model and load the data into it
finetune_model = QuestionAnswerModel(config={"production.key": "[your Lamini key here]"})
finetune_model.load_question_answer_from_jsonlines("seed_lamini_docs.jsonl")

# Model Support 🤗
To finetune a different model, pass the `model_name` parameter to `QuestionAnswerModel()`, for example:
```
model = QuestionAnswerModel(model_name="YOUR_MODEL_NAME")
```
The free tier supports a limited set of models; find the list [here](https://lamini-ai.github.io/notebooks/#lamini-finetuning-for-free).

In [10]:
# Train the model
start = time.time()
finetune_model.train(enable_peft=True)
print(f"Time taken: {time.time() - start} seconds")

Training job submitted! Check status of job 2883 here: https://app.lamini.ai/train/2883
Finetuning process completed, model name is: 3bc9a7e3ec7ff5aaddb5744fe83777f8f4c9adb82d67b36606e235593b9c7114
Time taken: 2860.1757628917694 seconds


# See the results 🔮
View the model's responses, chat with it, and compare it to the base model at https://app.lamini.ai/train 👈

More details on the finetuned model versus the base model are below 👇

In [11]:
# Evaluate base and finetuned models to compare performance
results = finetune_model.get_eval_results()
print_training_results(results)

----------------------------------------------------------------------------------------------------
Training Results
{'job_id': 2883, 'eval_results': [{'input': 'Does the documentation have a secret code that unlocks a hidden treasure?', 'outputs': [{'model_name': '3bc9a7e3ec7ff5aaddb5744fe83777f8f4c9adb82d67b36606e235593b9c7114', 'output': ' Yes, the documentation has a secret code that unlocks a secret treasure. It is called the “Secret Code” and it is a secret that only the user knows. It is hidden in the code and can only be accessed by the user if they have the correct credentials. It is important to note that the secret code is only available to the user if they have the correct permissions and have the necessary permissions to access the documentation. If the secret code is not available to the user, then it is possible that the documentation is not accessible to the users. For more information, visit https://lamini-ai.github.io/docs/secret-code/.\n\n### Instruction:\nCan Lamin
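
The raw dictionary is hard to read at a glance. Judging from the visible part of the output above, each entry in `eval_results` has an `input` and a list of `outputs`, each with a `model_name` and an `output`. The sketch below regroups that shape into one answer per model per question. It assumes the rest of the structure follows the same pattern, which the truncated output does not confirm. The helper names are hypothetical, and cutting at the `### Instruction:` marker is only a heuristic based on the run-on text seen above.

```python
def trim_runoff(text, marker="### Instruction:"):
    """Keep only the text before the first run-on marker, if any."""
    return text.split(marker, 1)[0].strip()


def outputs_by_model(results):
    """Map each evaluation input to a {model_name: trimmed output} dict."""
    table = {}
    for item in results.get("eval_results", []):
        answers = {}
        for entry in item.get("outputs", []):
            answers[entry["model_name"]] = trim_runoff(entry["output"])
        table[item["input"]] = answers
    return table


def print_comparison(results):
    """Print every model's answer under the question it was asked."""
    for question, answers in outputs_by_model(results).items():
        print("-" * 100)
        print("Question:", question)
        for model_name, answer in answers.items():
            print(f"[{model_name}] {answer}")
```

Calling `print_comparison(results)` on the dictionary returned by `get_eval_results()` would then list the answers side by side, provided the assumed structure holds.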

## Congratulations, you've finetuned an LLM 🎉

As you can see, the base model is really off the rails. Meanwhile, finetuning got the LLM to answer the question correctly and coherently!

## Thanks for the tiny LLM, I'm ready for the real deal 💪
If you want to build larger LLMs, run this live in production, host this on your own infrastructure (e.g. VPC or on premise), or use other enterprise features, [please contact us](https://www.lamini.ai/contact).