<a href="https://colab.research.google.com/github/datafyresearcher/datafy-finetuning-beginner/blob/main/notebooks/Basic/06_LLM_Custom_InputOutput_FinetuningFree.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# [Lamini](https://www.lamini.ai/): The LLM engine for rapidly customizing models 🦙
Walk through Lamini's finetuning pipeline, so you can train custom models on your data.

- It's free at $0 per training run.
- It's fast at less than 15 minutes.
- It's like having a nearly unlimited prompt size. The toy example here takes in ~120k tokens, more than most prompt windows can hold.
- The model learns new information, rather than only trying to make sense of retrieved text given what it already knows, as in retrieval-augmented generation (RAG). The best models use both RAG and finetuning together, which can be done easily with Lamini.


Here, you'll also explore ways to customize an LLM to your data, across different use cases, including but not limited to:
* Chatting / answering questions
* Scoring customer support conversations
* Extracting values from HTML forms
* Querying over code/engineering docs
* Reasoning and routing agents
* Writing articles
* Summarizing content and suggesting actions, e.g. from meeting transcripts
* Searching content, e.g. from Google docs
* Recommending content, e.g. health recs from patient EMR data

# Setup 🛠️
#### Note: You will be asked to sign in with the Google account connected to your Lamini account.


In [1]:
# @title Step 1: Authenticate with Google

from google.colab import auth
import requests
import os
import yaml

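def authenticate_powerml():
  # Sign in with Google, then exchange the gcloud access token for a Lamini (PowerML) key
  auth.authenticate_user()
  gcloud_token = !gcloud auth print-access-token
  powerml_token_response = requests.get('https://api.powerml.co/v1/auth/verify_gcloud_token?token=' + gcloud_token[0])
  print(powerml_token_response)
  # Fail with a clear HTTP error instead of a confusing KeyError on 'token'
  powerml_token_response.raise_for_status()
  return powerml_token_response.json()['token']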

key = authenticate_powerml()

config = {
    "production": {
        "key": key,
        "url": "https://api.powerml.co"
    }
}

keys_dir_path = '/root/.powerml'
os.makedirs(keys_dir_path, exist_ok=True)

keys_file_path = keys_dir_path + '/configure_llama.yaml'
with open(keys_file_path, 'w') as f:
  yaml.dump(config, f, default_flow_style=False)


<Response [200]>
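
The authentication cell builds the verification URL by string concatenation. As a small hardening sketch (not part of the original notebook), the token can be passed through `urllib.parse.urlencode` so that any special characters are escaped. The endpoint is the one used in the cell above; `build_verify_url` and the sample token are hypothetical names chosen for illustration.

```python
from urllib.parse import urlencode

# Endpoint taken from the authentication cell above
VERIFY_ENDPOINT = "https://api.powerml.co/v1/auth/verify_gcloud_token"

def build_verify_url(gcloud_token: str) -> str:
    """Return the verification URL with the token percent-encoded."""
    query = urlencode({"token": gcloud_token.strip()})
    return f"{VERIFY_ENDPOINT}?{query}"

# Hypothetical token value, for illustration only
print(build_verify_url("ya29.example+token/with=symbols"))
```

Passing `params={"token": ...}` to `requests.get` performs the same encoding, so either approach should work.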


In [2]:
# @title Step 2: Install the open-source [Lamini library](https://pypi.org/project/lamini/) to use LLMs easily

# Run this block only when using Google Colab; skip it otherwise.

if 'google.colab' in str(get_ipython()):
  print('Running on CoLab')
  # Install the package
  !pip install --upgrade --force-reinstall --ignore-installed -qqq lamini
else:
  print('Not running on CoLab')

Running on CoLab

# 🚨 Note: After the installation finishes, open the "Runtime" menu, click "Restart session", and then continue with the next cell.

# 🚨 The restart is needed only because Lamini uses a more recent version of numpy than Colab.
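
To see whether a restart is actually needed, one option is to compare the version of a package already loaded in memory with the version now installed on disk. The sketch below uses only the standard library; `needs_restart` is a hypothetical helper, and it assumes the import name and the distribution name are the same (true for numpy, not for every package).

```python
import importlib.metadata
import sys

def needs_restart(package: str) -> bool:
    """True if `package` is already imported at a different version than the one on disk."""
    module = sys.modules.get(package)
    if module is None:
        return False  # not imported yet, so a fresh import picks up the new install
    loaded = getattr(module, "__version__", None)
    try:
        installed = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return False  # no installed distribution to compare against
    return loaded is not None and loaded != installed

if needs_restart("numpy"):
    print("numpy changed on disk: restart the session before continuing")
else:
    print("no restart needed for numpy")
```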

# Build & Run 🏃‍♀️

Set up your LLM interface, with expected input and output types.
* These are based on standard Pydantic types.
* In this simple example, your input type is a question and the output type is an answer.
* Note: Each field takes a `Context` description written in natural language. This can be lightly prompt engineered.
* We include some other common types at the end of this notebook.

In [1]:
from llama import Type, Context

class Input(Type):
    question: str = Context("question")

class Output(Type):
    answer: str = Context("answer to question")


Next, it's model time:
* Instantiate a model with the `InputOutputRunner` class.
* Specify which `input_type` and `output_type` you want to use.
* `model_name` is the base model you'll use. This one works on the free tier. Please [contact us](https://www.lamini.ai/contact) for larger models.

In [2]:
from llama import InputOutputRunner

llm = InputOutputRunner(input_type=Input, output_type=Output, model_name="meta-llama/Llama-2-7b-chat-hf")

Then, it's data time:
* The `load_data_from_jsonlines` method takes the path to a JSON Lines file and loads its contents into the model. The [Lamini library docs](https://lamini-ai.github.io/runners/input_output_runner/) describe more ways to load data, from other file types and from basic Python types.
* Set `verbose=True` to see whether your data was loaded correctly into the input and output types, as well as how much data was added, which is always a good sanity check :)
* We recommend ~100-1000 examples to see some improvement in the base model here.
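
Before loading, it can help to check the file yourself. The sketch below assumes one JSON object per line with `question` and `answer` keys, matching the `Input` and `Output` fields defined earlier; `check_jsonl` is a hypothetical helper, and the demo rows are made up so the example runs on its own.

```python
import json
import os
import tempfile

def check_jsonl(path, required_keys=("question", "answer")):
    """Count valid examples and collect (line_number, problem) pairs for bad lines."""
    valid, problems = 0, []
    with open(path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                problems.append((line_number, f"invalid JSON: {err.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((line_number, "not a JSON object"))
                continue
            missing = [key for key in required_keys if key not in record]
            if missing:
                problems.append((line_number, f"missing keys: {missing}"))
            else:
                valid += 1
    return valid, problems

# Self-contained demo on a temporary file: one good line and two bad ones (made-up rows)
demo_lines = [
    json.dumps({"question": "What is Lamini?", "answer": "An LLM engine."}),
    json.dumps({"question": "Missing answer?"}),
    "{not valid json",
]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.jsonl")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("\n".join(demo_lines) + "\n")
    valid_count, problems = check_jsonl(demo_path)

print(valid_count, problems)
```

In the notebook, the same function could be pointed at `lamini_docs.jsonl` once it has been downloaded.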

In [3]:
!wget -q -O "lamini_docs.jsonl" "https://drive.google.com/uc?export=download&id=1rJDDI2wvEL4npvtOUaJ_-1zuCjBgSxHw"

In [4]:
# Keep only the first `keep_rows` lines of a JSONL file (or all of them if the
# file is shorter) and write them to a new file. This keeps the demo run small.

from itertools import islice

def read_and_process_jsonl(input_path, output_path, keep_rows=10):
    # islice stops after keep_rows lines, so a shorter file is simply copied whole
    with open(input_path, 'r', encoding='utf-8') as src, \
         open(output_path, 'w', encoding='utf-8') as dst:
        dst.writelines(islice(src, keep_rows))

# Trim the downloaded dataset to its first 10 examples
read_and_process_jsonl('lamini_docs.jsonl', 'lamini_docs_output.jsonl', keep_rows=10)
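
Taking the first rows keeps whatever order the file happens to have. If a more representative subset is wanted, a seeded random sample is one alternative. This is a sketch rather than part of the original notebook: `sample_lines` is a hypothetical helper, and the demo rows are placeholders.

```python
import random

def sample_lines(lines, keep_rows=10, seed=0):
    """Return up to keep_rows lines chosen at random, reproducibly for a given seed."""
    keep_rows = min(keep_rows, len(lines))
    return random.Random(seed).sample(lines, keep_rows)

# Placeholder rows standing in for the lines of a real JSONL file
demo = [f'{{"question": "q{i}", "answer": "a{i}"}}\n' for i in range(25)]
subset = sample_lines(demo, keep_rows=10, seed=42)
print(len(subset))
```

Fixing the seed means a rerun trains on the same subset, which makes results easier to compare.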


In [5]:
llm.load_data_from_jsonlines("lamini_docs_output.jsonl", verbose=True)

{'question': "What are the different types of documents available in the repository (e.g., installation guide, API documentation, developer's guide)?"} {'answer': 'Lamini has documentation on Getting Started, Authentication, Question Answer Model, Python Library, Batching, Error Handling, Advanced topics, and class documentation on LLM Engine available at https://lamini-ai.github.io/.'}
{'question': 'What is the recommended way to set up and configure the code repository?'} {'answer': 'Lamini can be downloaded as a python package and used in any codebase that uses python. Additionally, we provide a language agnostic REST API. We’ve seen users develop and train models in a notebook environment, and then switch over to a REST API to integrate with their production environment.'}
{'question': 'How can I find the specific documentation I need for a particular feature or function?'} {'answer': 'You can ask this model about documentation, which is trained on our publicly available docs and s

The moment you've been waiting for: training!
* This will take ~10-15 min (once it's through the free-tier queue).
* Keep this cell running.
* You can see your model status on your [dashboard](https://app.lamini.ai/train).
* The run is private by default. Setting `is_public` to True makes the run shareable and public.

In [6]:
llm.train(is_public=True)

Training job submitted! Check status of job 4376 here: https://app.lamini.ai/train/4376
Finetuning process completed, model name is: 5a19a1c990c656331eb4d364532cc477b2693dc5c706090bed5176503e1e95f2


Finally, compare your finetuned model's results with the base model's, either in code here or on your [dashboard](https://app.lamini.ai/train).

Please note that the dashboard results are not parsed, so you can easily see (and debug) the raw prompts.
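
The `input` field of each evaluation result holds the raw prompt sent to the model. As a debugging aid, the sketch below rebuilds that prompt for a single input field and a single output field. It is reconstructed from the one prompt visible in this notebook's evaluation output, so treat it as an approximation rather than Lamini's actual template code; `build_prompt` is a hypothetical name.

```python
def build_prompt(input_name: str, input_value: str, output_name: str) -> str:
    """Rebuild the prompt layout seen in the evaluation output (one field each way)."""
    return (
        "Task Definition:\n"
        f"Given: {input_name}\n"
        f"Generate: {output_name}\n\n"
        "Task:\n"
        f"Given: {input_name}: {input_value}\n\n"
        f"Generate: {output_name}\n"
        f"{output_name}: "
    )

prompt = build_prompt("question", "How do I add data? Please help", "answer")
print(prompt)
```

Comparing a rebuilt prompt with the `input` string in the results is a quick way to spot a field that was loaded under the wrong name.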

In [7]:
llm.evaluate()

{'job_id': 4376,
 'eval_results': [{'input': "Task Definition:\nGiven: question\nGenerate: answer\n\nTask:\nGiven: question: What are the different types of documents available in the repository (e.g., installation guide, API documentation, developer's guide)?\n\nGenerate: answer\nanswer: ",
   'outputs': [{'model_name': '5a19a1c990c656331eb4d364532cc477b2693dc5c706090bed5176503e1e95f2',
     'output': ' Task Definition:\nGiven: question\nGenerate: answer'},
    {'model_name': 'Base model (meta-llama/Llama-2-7b-chat-hf)',
     'output': "\nThere are several types of documents available in the repository, including:\n\n* Installation guide: This document provides step-by-step instructions for installing and configuring the repository.\n* API documentation: This document provides detailed information about the repository's API, including its endpoints, parameters, and response formats.\n* Developer's guide: This document provides detailed information about developing applications that in

## Congratulations, you've finetuned an LLM 🎉

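* As you can see, the base model answers generically: it describes the kinds of documents a repository might contain, not Lamini's actual documentation.
* Finetuning is what teaches the LLM to answer questions about new Lamini information. Note that in the run shown above, which trained on only 10 examples, the finetuned model merely echoed the prompt template; use the recommended ~100-1000 examples to see real improvement.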
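
To compare the models side by side in code, you can walk the dictionary printed by the evaluation cell. The sketch below assumes the structure shown in that output (`eval_results`, each with an `input` and a list of `outputs` holding `model_name` and `output`); `summarize_eval` is a hypothetical helper, and the sample is excerpted from the output above with the base model's answer shortened.

```python
def summarize_eval(results: dict, width: int = 80) -> list:
    """Return rows of (example_index, model_name, echoes_prompt, output_preview)."""
    rows = []
    for index, example in enumerate(results["eval_results"]):
        for candidate in example["outputs"]:
            text = candidate["output"].strip()
            # An output that is just a piece of the prompt means the model echoed it
            echoes_prompt = bool(text) and text in example["input"]
            preview = " ".join(text.split())[:width]
            rows.append((index, candidate["model_name"], echoes_prompt, preview))
    return rows

# Excerpt of the evaluation output above (base model answer shortened)
sample = {
    "job_id": 4376,
    "eval_results": [
        {
            "input": "Task Definition:\nGiven: question\nGenerate: answer\n\nTask:\nGiven: question: What are the different types of documents available in the repository (e.g., installation guide, API documentation, developer's guide)?\n\nGenerate: answer\nanswer: ",
            "outputs": [
                {
                    "model_name": "5a19a1c990c656331eb4d364532cc477b2693dc5c706090bed5176503e1e95f2",
                    "output": " Task Definition:\nGiven: question\nGenerate: answer",
                },
                {
                    "model_name": "Base model (meta-llama/Llama-2-7b-chat-hf)",
                    "output": "\nThere are several types of documents available in the repository, including:",
                },
            ],
        }
    ],
}

for row in summarize_eval(sample):
    print(row)
```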

Thanks for the tiny LLM, I'm ready for the real deal 💪

* If you want to build larger LLMs, run this live in production, host this on your own infrastructure (e.g. VPC or on premise), or use other enterprise features, [please contact us](https://www.lamini.ai/contact).

## Inference: Running the LLM after it's trained

In [8]:
new_input = Input(question="How do I add data? Please help")
llm(new_input)

Output(answer=' To add data, you can follow these steps: Step 1: Determine the type of data you want to add and')

Note that the output is in the Output type you specified.

To use the model later, instantiate it by name like this.

In [9]:
model_name = "5a19a1c990c656331eb4d364532cc477b2693dc5c706090bed5176503e1e95f2" # Or your model ID here

In [10]:
llm_later = InputOutputRunner(input_type=Input, output_type=Output, model_name=model_name)
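
Since a Colab session is temporary, it may be convenient to write the model name to a file so that a later session can pick it up. This is a minimal standard-library sketch; the helper names and the file name `finetuned_model.json` are hypothetical, and in Colab you would want the file somewhere persistent.

```python
import json
import os
import tempfile
from pathlib import Path

def save_model_name(model_name: str, path: str) -> None:
    """Store the finetuned model name in a small JSON file."""
    Path(path).write_text(json.dumps({"model_name": model_name}), encoding="utf-8")

def load_model_name(path: str) -> str:
    """Read the model name back from the JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))["model_name"]

# Round trip in a temporary directory so the example leaves nothing behind
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "finetuned_model.json")
    save_model_name("5a19a1c990c656331eb4d364532cc477b2693dc5c706090bed5176503e1e95f2", demo_path)
    restored = load_model_name(demo_path)

print(restored)
```

The restored string can then be passed as `model_name` when instantiating the runner, as in the cell above.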

## Customize inputs and outputs to your heart's desire!

Here are some other common examples that we've seen:

In [None]:
#######################################
# Score customer support conversations
#######################################
class InputConversation(Type):
    conversation: str = Context("conversation turns for customer support")

class OutputScore(Type):
    reason_for_score: str = Context("think step by step on how to score the customer conversation")
    score: int = Context("score of 0 (bad, customer is unhappy), 1 (good, customer is happy with results), or -1 (unsure, unclear resolution)")


#######################################
# Extract HTML elements
#######################################
class InputHTMLDOM(Type):
    html_dom: str = Context("html dom of a form page with first name, last name, and credit card number fields filled out")

class OutputForm(Type):
    first_name: str = Context("first name of the person")
    last_name: str = Context("last name of the person")
    credit_card_number: str = Context("credit card number of the person")


#######################################
# Code/engineering docs
#######################################
class FunctionQuestion(Type):
    function: str = Context("function to ask about")

class FunctionAnswer(Type):
    inputs: list = Context("list of inputs to the function")
    outputs: list = Context("list of outputs from the function")
    description: str = Context("function description in 2 to 5 lines")


#######################################
# Reasoning and routing agent
#######################################
class UserQuery(Type):
    query: str = Context("user's query")

class AgentAction(Type):
    thinking_steps: list = Context("think step by step about what action the agent needs to take")
    action: str = Context("action to take")
    backup_action: str = Context("fallback action to take")


#######################################
# Article writing
#######################################
class Topic(Type):
    topic: str = Context("what topic the article should cover")

class Article(Type):
    title: str = Context("article title, written in markdown, e.g. starting with #")
    article: str = Context("article body text, written in markdown")


#######################################
# Content summarization
#######################################
class Transcript(Type):
    meeting_transcript: str = Context("transcript of the meeting (may contain some transcription errors)")

class McKinseySummary(Type):
    summary: str = Context("content summary in 3 bullet points")
    action_items: list = Context("next action items to take")


#######################################
# Search
#######################################
# Supporting documents
class Documents(Type):
    documents: list = Context("google docs dump")

# Input (named SearchQuery so it does not overwrite the routing agent's UserQuery above)
class SearchQuery(Type):
    search_query: str = Context("user's query")

# Output
class Result(Type):
    keywords: list = Context("keywords from the search query and results")
    results: list = Context("top 3 search results")
    reference_docs: list = Context("reference ID of the documents retrieved to support this search query") # likely this is done another way for precision, but this is a start


#######################################
# Recommendation
#######################################
class EMR(Type):
    latest_hospital_visit: str = Context("patient's most recent hospital visit")

class Recommendation(Type):
    health_recommendations: list = Context("health recommendations")
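
Each of these type pairs needs training data to match. The sketch below prepares JSON Lines records for the customer support scoring example, assuming that the record keys should match the field names of the input and output types, the way `question` and `answer` do in `lamini_docs.jsonl`. `make_score_example` is a hypothetical helper and the conversations are made up.

```python
import json

# Scores allowed by the OutputScore description: 0 (bad), 1 (good), -1 (unsure)
ALLOWED_SCORES = {0, 1, -1}

def make_score_example(conversation: str, reason_for_score: str, score: int) -> dict:
    """Build one record whose keys match the InputConversation and OutputScore fields."""
    if type(score) is not int or score not in ALLOWED_SCORES:
        raise ValueError(f"score must be one of {sorted(ALLOWED_SCORES)}, got {score!r}")
    return {
        "conversation": conversation,
        "reason_for_score": reason_for_score,
        "score": score,
    }

# Made-up examples, for illustration only
examples = [
    make_score_example(
        "Customer: My order is late. Agent: I have refunded the shipping fee. Customer: Thanks!",
        "The agent resolved the complaint and the customer thanked them.",
        1,
    ),
    make_score_example(
        "Customer: The app keeps crashing. Agent: Please try again later.",
        "No fix was offered and the customer did not confirm a resolution.",
        -1,
    ),
]

# One JSON object per line, ready to be written to a .jsonl file
jsonl_text = "\n".join(json.dumps(example) for example in examples) + "\n"
print(jsonl_text)
```

Validating the score up front catches labeling mistakes before they reach a training run.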
