<img  src="https://raw.githubusercontent.com/doguilmak/InferenceVision/main/assets/Inference%20Vision%20Cover.png" alt="github.com/doguilmak/InferenceVision"/>

# Introduction

In this notebook, we demonstrate how to use a fine-tuned `EleutherAI/pythia-1b` model for question-answering (Q&A) tasks with the InferenceVision library. The guide walks you through setting up the environment, loading the fine-tuned model, and interacting with it for Q&A.

<br>


    

## Model Information

The `EleutherAI/pythia-1b` model is a decoder-only language model from the Pythia series with roughly one billion parameters. That capacity lets it capture complex patterns and nuances in text, which makes it suitable for fine-tuning on custom Q&A tasks.

<br>


### Key Features:
- **Large Capacity**: With roughly one billion parameters, this model is capable of understanding questions and generating complex responses.
- **Versatility**: It can be adapted to various NLP tasks, including question answering and text generation.
- **Efficiency**: The model's architecture ensures effective handling of large-scale data and intricate language patterns.

This script demonstrates how to clone the fine-tuned InferenceVision LLM
from Hugging Face, load the model and tokenizer, and query it interactively.

## Setting Up the Environment

To get started with the fine-tuned model, follow these steps:

1. **Install Dependencies**:
   - Ensure that you have the necessary libraries installed. This script uses `transformers` to handle the model and tokenizer, along with `torch` and the standard-library modules `os` and `subprocess`.

2. **Clone Model Repository from Hugging Face**:
   - The model repository is hosted on Hugging Face Hub. We use `git clone` to retrieve the model weights and tokenizer files.

3. **Load Pre-trained Model**:
   - Use `AutoModelForCausalLM` and `AutoTokenizer` from the `transformers` library to load the fine-tuned model directly from the cloned directory.

4. **Set Device**:
   - Automatically detect and use a GPU if available; otherwise fall back to CPU. This ensures optimal inference performance on different systems.
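
Steps 1 and 4 above can be sketched as a small pre-flight check. This is a minimal sketch, not part of the original script: the helper names `missing_packages` and `pick_device_name` are hypothetical, and the device helper falls back to `"cpu"` when `torch` is not installed at all.

```python
import importlib.util

# Third-party packages the notebook imports later on
REQUIRED = ("torch", "transformers")


def missing_packages(names=REQUIRED):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def pick_device_name():
    """Return "cuda" if torch sees a GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        # Assumption for this sketch: without torch, report CPU instead of failing
        return "cpu"
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
print("Device:", pick_device_name())
```

If the list of missing packages is not empty, installing them (for example with `pip install torch transformers`) before running the remaining cells should resolve the import errors.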


## Interacting with the Model

Once the environment is set up and the model is loaded, you can start interacting with it. The notebook includes an interactive chat component that lets you enter questions and receive answers generated by the model. You can exit the conversation by typing `exit` or `quit`.

```
if __name__ == "__main__":
    print("InferenceVision QA Chat (type 'exit' to quit)")
    while True:
        question = input("\nEnter your question: ")
        if question.lower().strip() in ("exit", "quit"):
            print("Exiting. Goodbye!")
            break
        answer = answer_question(question)
        print(f"Answer: {answer}\n")
```
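
To try the loop without loading the model, the same logic can be written with the input, output, and answer functions passed in as arguments. This is a sketch under assumptions: `run_chat` and the stub answer function are hypothetical names that do not appear in the notebook, and the scripted questions stand in for keyboard input.

```python
def run_chat(answer_fn, read_fn=input, write_fn=print):
    """Run the Q&A loop and return how many questions were answered."""
    write_fn("InferenceVision QA Chat (type 'exit' to quit)")
    answered = 0
    while True:
        question = read_fn("\nEnter your question: ")
        # Same exit condition as the notebook's loop
        if question.lower().strip() in ("exit", "quit"):
            write_fn("Exiting. Goodbye!")
            return answered
        write_fn(f"Answer: {answer_fn(question)}\n")
        answered += 1


# Scripted session: one question, then "exit"; a stub replaces the model
scripted = iter(["What is InferenceVision?", "exit"])
log = []
count = run_chat(
    answer_fn=lambda q: f"(stub) {q}",
    read_fn=lambda prompt: next(scripted),
    write_fn=log.append,
)
```

In the notebook itself, calling `run_chat(answer_question)` with the default `input` and `print` would behave like the loop shown above.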

Make sure your runtime uses a **GPU** (_not_ CPU or TPU) and, if the option is available, _Python 3_. To change these settings, go to `Runtime -> Change runtime type`, select the options above, and press SAVE.


In [None]:
!nvidia-smi
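
The same check can be done from Python, which is handy when the notebook runs outside Colab and `nvidia-smi` may be missing. A minimal sketch, assuming only the standard library; `gpu_summary` is a hypothetical helper name, and the query flags are standard `nvidia-smi` options.

```python
import shutil
import subprocess


def gpu_summary() -> str:
    """Return a one-line description of the GPUs visible to nvidia-smi."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return "nvidia-smi not found; the runtime is probably CPU-only"
    result = subprocess.run(
        [exe, "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return f"nvidia-smi failed: {result.stderr.strip()}"
    return result.stdout.strip()


print(gpu_summary())
```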

<hr>

In [None]:
# @markdown **Creating the Environment**

# @markdown To download the LLM weights and tokenizer files, please run the code block.

import os
import subprocess
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO_URL = "https://huggingface.co/doguilmak/inferencevision-llm"
CLONE_DIR = "inferencevision-llm"

# Copy the current environment so the git subprocess inherits it
auth_env = os.environ.copy()

# If the directory doesn't exist, clone the repo
if not os.path.isdir(CLONE_DIR):
    print(f"Cloning repository from {REPO_URL} into {CLONE_DIR}...")
    subprocess.run(["git", "clone", REPO_URL, CLONE_DIR], env=auth_env, check=True)
else:
    print(f"Repository directory '{CLONE_DIR}' already exists. Skipping clone.")
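
Hugging Face model repositories typically store large weight files with Git LFS. If `git-lfs` is not installed, a plain `git clone` can leave small text pointer files in place of the real weights, and loading the model then fails. The sketch below scans a directory for such pointers. The helper names are hypothetical, and whether this particular repository uses LFS is an assumption.

```python
import os

# First line of every Git LFS pointer file
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: str) -> bool:
    """Return True if the file starts with the Git LFS pointer header."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_HEADER)) == LFS_HEADER


def find_lfs_pointers(root: str) -> list:
    """Return sorted paths under root that are still LFS pointer files."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git internals
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            full = os.path.join(dirpath, name)
            if is_lfs_pointer(full):
                hits.append(full)
    return sorted(hits)
```

If `find_lfs_pointers("inferencevision-llm")` returns any paths, running `git lfs install` followed by `git lfs pull` inside the cloned directory should fetch the actual files.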

In [None]:
# @markdown To set up the environment for using the model, follow these steps:
# @markdown 1. **Install Dependencies**: Ensure that the necessary libraries are installed.
# @markdown 2. **Load Pre-trained Model**: Use the provided script to load the model and tokenizer.
# @markdown 3. **Set Device**: Determine whether to use GPU or CPU based on availability.

model_dir = os.path.join(CLONE_DIR, "llm")
print(f"Loading model and tokenizer from {model_dir}...")
model = AutoModelForCausalLM.from_pretrained(model_dir)
tokenizer = AutoTokenizer.from_pretrained(model_dir)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
model.to(device)

def answer_question(question: str, max_length: int = 128) -> str:
    """
    Generate an answer to the input question using the fine-tuned LLM.

    `max_length` is the maximum number of tokens generated beyond the prompt.
    """
    inputs = tokenizer(
        question,
        return_tensors="pt",
        truncation=True,
        max_length=512
    )
    input_ids = inputs.input_ids.to(device)
    attention_mask = inputs.attention_mask.to(device)

    outputs = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        max_length=input_ids.shape[-1] + max_length,
        do_sample=False
    )
    # generate() returns the prompt tokens followed by the continuation,
    # so decode only the tokens that come after the prompt
    new_tokens = outputs[0][input_ids.shape[-1]:]

    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
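
For a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones, so the answer can be isolated at the token level. The sketch below illustrates the idea with plain lists standing in for tensors. The token ids are made up, and `split_generated` is a hypothetical helper, not part of the notebook.

```python
from typing import List, Sequence


def split_generated(output_ids: Sequence[int], prompt_len: int) -> List[int]:
    """Return only the tokens that follow the prompt."""
    if prompt_len > len(output_ids):
        raise ValueError("prompt is longer than the generated sequence")
    return list(output_ids[prompt_len:])


# Made-up token ids: 3 prompt tokens followed by 4 generated tokens
prompt_ids = [101, 102, 103]
output_ids = prompt_ids + [7, 8, 9, 10]
new_ids = split_generated(output_ids, len(prompt_ids))
print(new_ids)  # [7, 8, 9, 10]
```

Slicing token ids avoids relying on the decoded prompt being an exact string prefix of the decoded output, which is not guaranteed for every tokenizer.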

The `EleutherAI/pythia-1b` model is part of the Pythia series, which is designed to handle a wide range of natural language processing tasks. Its large number of parameters enables it to capture intricate patterns and nuances in the data, making it suitable for fine-tuning on custom Q&A tasks.

<img  src="https://raw.githubusercontent.com/doguilmak/InferenceVision/refs/heads/main/assets/qa_top.png" height=200 width=1000 alt="github.com/doguilmak/InferenceVision"/>

In [None]:
# @markdown Run the code and interact with our advanced language model for conversations!
if __name__ == "__main__":
    print("InferenceVision QA Chat (type 'exit' to quit)")
    while True:
        question = input("\nEnter your question: ")
        if question.lower().strip() in ("exit", "quit"):
            print("Exiting. Goodbye!")
            break
        answer = answer_question(question)
        print(f"Answer: {answer}\n")

<img  src="https://raw.githubusercontent.com/doguilmak/InferenceVision/refs/heads/main/assets/qa_bottom.png" height=100 width=1500 alt="github.com/doguilmak/InferenceVision"/>