## Installing dependencies

In [None]:
%%capture
!pip install 'instructlab[cuda]' \
   -C cmake.args="-DLLAMA_CUDA=on" \
   -C cmake.args="-DLLAMA_NATIVE=off"
!pip install vllm

## Defining working directory

In [None]:
WORKSPACE = "/workspace"

## Taxonomy

Now I need to clone the taxonomy repository:

docs: https://docs.instructlab.ai/taxonomy/

In [None]:
# Remove the existing taxonomy folder, if any
!rm -rf {WORKSPACE}/taxonomy

# Cloning the taxonomy repository
!git clone https://github.com/instructlab/taxonomy.git {WORKSPACE}/taxonomy

## Initialize instructlab

Now I need to initialize **instructlab**. Remember to pass the absolute path of the cloned taxonomy repository; with the `WORKSPACE` defined above, it is `/workspace/taxonomy`.

Note: If you are in a Jupyter notebook, you will not be able to interact with the command output. In that case, run the next command in a terminal.

In [None]:
!ilab config init \
    --taxonomy-path {WORKSPACE}/taxonomy \
    --non-interactive

## Download the models

In this case we will be using unsloth Llama 3.2. The cell below downloads `instructlab/merlinite-7b-lab`, which the data generation step uses.

Note: The download needs `HF_TOKEN` to be set in the environment.
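
Because the download depends on `HF_TOKEN`, it can help to fail early when the variable is missing. A minimal sketch, assuming a hypothetical helper named `require_env` (it is not part of instructlab or Hugging Face tooling):

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty.

    `require_env` is a hypothetical notebook helper, not an instructlab API.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the next cell")
    return value
```

Calling `require_env("HF_TOKEN")` in a cell before the download would stop the notebook with a clear message instead of a failed download.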

In [None]:
# Download with safetensors
!ilab model download \
--repository instructlab/merlinite-7b-lab

## Synthetic Data

Let's clone the minimodel repository.

Inside the minimodel repo, I've created the `knowledge/taxonomy` folder with `minima_qna.yaml`, following the knowledge template provided by the `taxonomy` repository.

The `minima_qna.yaml` file contains the questions and answers extracted from the minima whitepaper and brand guidelines.

In [None]:
import os

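# Add GitHub access token; stop early if it is missing, otherwise git
# would be configured with the literal string "None" as the token
github_token = os.getenv("GITHUB_TOKEN")
if not github_token:
    raise RuntimeError("GITHUB_TOKEN is not set in the environment")
!git config --global url."https://{github_token}@github.com/".insteadOf https://github.com/

# Remove any existing minimodel folder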
!rm -rf {WORKSPACE}/minimodel

# Clone the minimodel repository
!git clone -b feat/translating-knowledge --single-branch https://github.com/akio-code/minimodel.git {WORKSPACE}/minimodel

### Copying the minima taxonomy

Now we need to copy `minima_qna.yaml` and `attribution.txt` to the **taxonomy** repository. They will be used to generate the data with instructlab.

In [None]:
# Create the destination folder
!mkdir -p {WORKSPACE}/taxonomy/knowledge/business

# Copy from minimodel
!cp {WORKSPACE}/minimodel/knowledge/taxonomy/minima_qna.yaml {WORKSPACE}/taxonomy/knowledge/business/qna.yaml
!cp {WORKSPACE}/minimodel/knowledge/taxonomy/attribution.txt {WORKSPACE}/taxonomy/knowledge/business/
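
A quick way to confirm the copy worked is to check that both files are present in the destination folder. The sketch below assumes a hypothetical helper, `missing_taxonomy_files`; it only checks file presence and does not validate the YAML content.

```python
from pathlib import Path


def missing_taxonomy_files(leaf_dir, required=("qna.yaml", "attribution.txt")):
    """Return the names in `required` that are not regular files inside `leaf_dir`.

    Hypothetical helper for this notebook; the default file names mirror the
    two files copied in the cell above.
    """
    leaf = Path(leaf_dir)
    return [name for name in required if not (leaf / name).is_file()]
```

With the paths used here, `missing_taxonomy_files(f"{WORKSPACE}/taxonomy/knowledge/business")` should return an empty list once both `cp` commands have succeeded.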

Now I check that the taxonomy is valid:

In [None]:
!ilab taxonomy diff

## Generating the data

docs: https://docs.instructlab.ai/adding-data-to-model/creating_new_knowledge_or_skills/

In [None]:
# Remove the existing datasets folder, if any
!rm -rf {WORKSPACE}/datasets

# Generating the synthetic data
!ilab data generate \
--taxonomy-path {WORKSPACE}/taxonomy \
--output-dir {WORKSPACE}/datasets \
--model /root/.cache/instructlab/models/instructlab/merlinite-7b-lab \
--pipeline simple \
--gpus 1
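
Once generation finishes, it is worth checking how much data was produced. The sketch below assumes the generated files are written as JSON Lines (`*.jsonl`) directly under the output directory; if your instructlab version uses a different layout, adjust the glob pattern. `count_jsonl_records` is a hypothetical helper, not an instructlab command.

```python
from pathlib import Path


def count_jsonl_records(output_dir):
    """Map each *.jsonl file name in `output_dir` to its number of non-empty lines.

    Assumption: one JSON record per line, files located at the top level of
    `output_dir` (subdirectories are not searched).
    """
    counts = {}
    for path in sorted(Path(output_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            counts[path.name] = sum(1 for line in handle if line.strip())
    return counts
```

For example, `count_jsonl_records(f"{WORKSPACE}/datasets")` returns a dictionary of file names and record counts; an empty dictionary suggests that nothing was generated or that the files live elsewhere.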