<a href="https://colab.research.google.com/github/Vermeulen321/oumi_public/blob/main/notebooks/Oumi%20-%20A%20Tour.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div class="align-center">
<a href="https://oumi.ai/"><img src="https://oumi.ai/docs/en/latest/_static/logo/header_logo.png" height="200"></a>

[![Documentation](https://img.shields.io/badge/Documentation-latest-blue.svg)](https://oumi.ai/docs/en/latest/index.html)
[![Discord](https://img.shields.io/discord/1286348126797430814?label=Discord)](https://discord.gg/oumi)
[![GitHub Repo stars](https://img.shields.io/github/stars/oumi-ai/oumi)](https://github.com/oumi-ai/oumi)
</div>

👋 Welcome to Open Universal Machine Intelligence (Oumi)!

🚀 Oumi is a fully open-source platform that streamlines the entire lifecycle of foundation models - from [data preparation](https://oumi.ai/docs/en/latest/resources/datasets/datasets.html) and [training](https://oumi.ai/docs/en/latest/user_guides/train/train.html) to [evaluation](https://oumi.ai/docs/en/latest/user_guides/evaluate/evaluate.html) and [deployment](https://oumi.ai/docs/en/latest/user_guides/launch/launch.html). Whether you're developing on a laptop, launching large-scale experiments on a cluster, or deploying models in production, Oumi provides the tools and workflows you need.

🤝 Make sure to join our [Discord community](https://discord.gg/oumi) to get help, share your experiences, and contribute to the project! If you are interested in joining one of the community's open-science efforts, check out our [open collaboration](https://oumi.ai/community) page.

⭐ If you like Oumi and you would like to support it, please give it a star on [GitHub](https://github.com/oumi-ai/oumi).

# A Tour of Oumi

This tutorial will give you a brief overview of Oumi's core functionality. We'll cover:

1. Training a model
1. Performing model inference
1. Evaluating a model against common benchmarks
1. Launching jobs
1. Customizing datasets and clouds

# 📋 Prerequisites
## Oumi Installation

First, let's install Oumi. You can find more detailed instructions [here](https://oumi.ai/docs/en/latest/get_started/installation.html).

If you have a GPU, you can run the following commands to install Oumi:


In [None]:
%pip install uv -q
!uv pip install oumi --no-progress --system

Note: you may need to restart the kernel to use updated packages.
Using Python 3.11.8 environment at: /Users/oussamaelachqar/miniconda3/envs/oumi
Audited 1 package in 15ms


❗**WARNING:** After the first `pip install`, you may have to restart the notebook for the package updates to take effect (Colab Menu: `Runtime` -> `Restart Session`).
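
After restarting, it can help to confirm that the package is actually visible to the current kernel. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is our own and is not part of Oumi.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


oumi_version = installed_version("oumi")
if oumi_version is None:
    print("oumi is not installed in this environment; rerun the install cell")
else:
    print(f"oumi {oumi_version} is installed")
```

If the check reports that the package is missing right after an install, restarting the session is usually the first thing to try.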

In [1]:
import os
from pathlib import Path

tutorial_dir = "tour_tutorial"

Path(tutorial_dir).mkdir(parents=True, exist_ok=True)
os.environ["TOKENIZERS_PARALLELISM"] = "false"  # Disable warnings from HF.

# ⚒️ Training a Model

Oumi supports training both custom and out-of-the-box models. Want to try out a model on HuggingFace? You can do that. Want to train your own custom PyTorch model? No problem.

## A Quick Demo

Let's try training a pre-existing model on HuggingFace. We'll use SmolLM2 135M as it's small and trains quickly.

Oumi uses [training configuration files](https://oumi.ai/docs/en/latest/api/oumi.core.configs.html#oumi.core.configs.TrainingConfig) to specify training parameters. We've already created a training config for SmolLM2 — let's give it a try!

In [2]:
yaml_content = f"""
model:
  model_name: "HuggingFaceTB/SmolLM2-135M-Instruct"
  torch_dtype_str: "bfloat16"
  trust_remote_code: True

data:
  train:
    datasets:
      - dataset_name: "yahma/alpaca-cleaned"
    target_col: "prompt"

training:
  trainer_type: "TRL_SFT"
  per_device_train_batch_size: 2
  max_steps: 10 # Quick "mini" training, for demo purposes only.
  run_name: "smollm2_135m_sft"
  output_dir: "{tutorial_dir}/output"
"""

with open(f"{tutorial_dir}/train.yaml", "w") as f:
    f.write(yaml_content)

In [3]:
from oumi.core.configs import TrainingConfig
from oumi.train import train

config = TrainingConfig.from_yaml(str(Path(tutorial_dir) / "train.yaml"))

train(config)

[2025-02-04 05:45:07,571][oumi][rank0][pid:5475][MainThread][INFO]][torch_utils.py:66] Torch version: 2.4.1+cu121. NumPy version: 1.26.4
[2025-02-04 05:45:07,574][oumi][rank0][pid:5475][MainThread][INFO]][torch_utils.py:68] CUDA is not available!
[2025-02-04 05:45:07,583][oumi][rank0][pid:5475][MainThread][INFO]][train.py:133] Oumi version: 0.1.3
[2025-02-04 05:45:07,591][oumi][rank0][pid:5475][MainThread][INFO]][train.py:174] TrainingConfig:
TrainingConfig(data=DataParams(train=DatasetSplitParams(datasets=[DatasetParams(dataset_name='yahma/alpaca-cleaned',
                                                                                dataset_path=None,
                                                                                subset=None,
                                                                                split='train',
                                                                                dataset_kwargs={},
                                                  

max_steps is given, it will override any value given in num_train_epochs


[2025-02-04 05:45:13,852][oumi][rank0][pid:5475][MainThread][INFO]][device_utils.py:283] GPU Metrics Before Training: GPU runtime info: None.
[2025-02-04 05:45:13,855][oumi][rank0][pid:5475][MainThread][INFO]][train.py:312] Training init time: 6.284s
[2025-02-04 05:45:13,860][oumi][rank0][pid:5475][MainThread][INFO]][train.py:313] Starting training... (TrainerType.TRL_SFT, transformers: 4.45.2)


Step,Training Loss



[2025-02-04 06:56:38,965][oumi][rank0][pid:5475][MainThread][INFO]][train.py:320] Training is Complete.
[2025-02-04 06:56:38,972][oumi][rank0][pid:5475][MainThread][INFO]][device_utils.py:283] GPU Metrics After Training: GPU runtime info: None.
[2025-02-04 06:56:38,976][oumi][rank0][pid:5475][MainThread][INFO]][train.py:327] Saving final state...
[2025-02-04 06:56:38,981][oumi][rank0][pid:5475][MainThread][INFO]][train.py:332] Saving final model...
[2025-02-04 06:56:40,816][oumi][rank0][pid:5475][MainThread][INFO]][hf_trainer.py:102] Model has been saved at tour_tutorial/output
[2025-02-04 06:56:40,822][oumi][rank0][pid:5475][MainThread][INFO]][train.py:339] 

» We're always looking for feedback. What's one thing we can improve? https://oumi.ai/feedback


Congratulations, you've trained your first model using Oumi!
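
The log above reports `CUDA is not available!`, so this particular run executed on CPU. To see what that cost, the sketch below computes the wall-clock training time from the two timestamps printed on the `Starting training...` and `Training is Complete.` lines. The timestamps are copied from the output above; the variable names are ours.

```python
from datetime import datetime

# Timestamps copied from the "Starting training..." and "Training is Complete."
# log lines in the output above.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
started = datetime.strptime("2025-02-04 05:45:13,860", LOG_TIME_FORMAT)
finished = datetime.strptime("2025-02-04 06:56:38,965", LOG_TIME_FORMAT)

elapsed = finished - started
max_steps = 10  # Matches `training.max_steps` in train.yaml.

print(f"Total training time: {elapsed}")
print(f"Average per step: {elapsed.total_seconds() / max_steps:.1f}s")
```

On a machine with a GPU the same ten steps should finish much faster, which is why the installation section suggests using one if it is available.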

You can also train your own custom PyTorch model. We cover that in depth in our [Finetuning Tutorial](https://github.com/oumi-ai/oumi/blob/main/notebooks/Oumi%20-%20Finetuning%20Tutorial.ipynb).

# 🧠 Model Inference

Now that you've trained a model, let's run inference.

In [4]:
yaml_content = f"""
model:
  model_name: "{tutorial_dir}/output"
  torch_dtype_str: "bfloat16"

generation:
  max_new_tokens: 128
  batch_size: 1
"""

with open(f"{tutorial_dir}/infer.yaml", "w") as f:
    f.write(yaml_content)

In [5]:
from oumi.core.configs import InferenceConfig
from oumi.infer import infer

config = InferenceConfig.from_yaml(str(Path(tutorial_dir) / "infer.yaml"))

input_text = (
    "Remember that we didn't train for long, so the results might not be great."
)

results = infer(config=config, inputs=[input_text])

print(results[0])

[2025-02-04 07:02:21,714][oumi][rank0][pid:5475][MainThread][INFO]][models.py:185] Building model using device_map: auto (DeviceRankInfo(world_size=1, rank=0, local_world_size=1, local_rank=0))...
[2025-02-04 07:02:21,719][oumi][rank0][pid:5475][MainThread][INFO]][models.py:255] Using model class: <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'> to instantiate model.
[2025-02-04 07:02:22,745][oumi][rank0][pid:5475][MainThread][INFO]][native_text_inference_engine.py:111] Setting EOS token id to `2`


Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)


conversation_id=None messages=[USER: Remember that we didn't train for long, so the results might not be great., ASSISTANT: I'm sorry for the inconvenience, but as a chatbot, I don't have the ability to access or process data from external sources. I'm designed to provide information and guidance based on the information I receive from users. I'm designed to be helpful and informative, but I don't have the capability to access or process data from external sources. If you have any specific questions or need help with a particular topic, feel free to ask.] metadata={}


We can also run inference using the pretrained model by slightly tweaking our config:

In [6]:
base_model_config = InferenceConfig.from_yaml(str(Path(tutorial_dir) / "infer.yaml"))
base_model_config.model.model_name = "HuggingFaceTB/SmolLM2-135M-Instruct"

input_text = "Input for the pretrained model: What is your name? "

results = infer(config=base_model_config, inputs=[input_text])

print(results[0])

[2025-02-04 07:21:13,116][oumi][rank0][pid:5475][MainThread][INFO]][models.py:185] Building model using device_map: auto (DeviceRankInfo(world_size=1, rank=0, local_world_size=1, local_rank=0))...
[2025-02-04 07:21:13,245][oumi][rank0][pid:5475][MainThread][INFO]][models.py:255] Using model class: <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'> to instantiate model.
[2025-02-04 07:21:14,272][oumi][rank0][pid:5475][MainThread][INFO]][native_text_inference_engine.py:111] Setting EOS token id to `2`
conversation_id=None messages=[USER: Input for the pretrained model: What is your name? , ASSISTANT: My name is Alex Chen. I'm a data scientist and AI assistant, trained on a vast dataset of text data, which I use to train my models for various tasks.] metadata={}
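
Printing a result shows the whole conversation, including the prompt. If you only want the model's reply, a small helper can pull out the last assistant message. This is a sketch under an assumption: each result exposes a `messages` list (as the printed output suggests) whose items have `role` and `content` attributes. The helper name `last_assistant_reply` and the stand-in objects are ours, used so the example runs on its own.

```python
from types import SimpleNamespace


def last_assistant_reply(conversation):
    """Return the text of the final assistant message, or None if there is none."""
    for message in reversed(conversation.messages):
        # `role` may be an enum or a plain string; normalize to lowercase text.
        role = str(getattr(message.role, "value", message.role)).lower()
        if role == "assistant":
            return message.content
    return None


# Stand-in object for illustration only (assumed attribute names); in the
# notebook you would pass `results[0]` instead.
demo = SimpleNamespace(
    messages=[
        SimpleNamespace(role="user", content="What is your name?"),
        SimpleNamespace(role="assistant", content="My name is Alex Chen."),
    ]
)
print(last_assistant_reply(demo))
```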


# 📊 Evaluating a Model against Common Benchmarks

You can use Oumi to evaluate pretrained and tuned models against standard benchmarks. For example, let's evaluate our tuned model on the MMLU college computer science task (`mmlu_college_computer_science`), run through LM Harness:

In [7]:
yaml_content = f"""
model:
  model_name: "{tutorial_dir}/output"
  torch_dtype_str: "bfloat16"

tasks:
  - evaluation_platform: lm_harness
    task_name: mmlu_college_computer_science

generation:
  batch_size: null # This will let LM HARNESS find the maximum possible batch size.
output_dir: "{tutorial_dir}/output/evaluation"
"""

with open(f"{tutorial_dir}/eval.yaml", "w") as f:
    f.write(yaml_content)

In [9]:
from oumi.core.configs import EvaluationConfig
from oumi.evaluate import evaluate

eval_config = EvaluationConfig.from_yaml(str(Path(tutorial_dir) / "eval.yaml"))

# To evaluate on a different benchmark, change the task name before calling
# `evaluate`. For example, uncomment the following line to run HellaSwag instead:
# eval_config.tasks[0].task_name = "hellaswag"

evaluate(eval_config)

AttributeError: 'EvaluationConfig' object has no attribute 'data'

# ☁️ Launching Jobs

Oftentimes you'll need to run various tasks (training, evaluation, etc.) on remote hardware that's better suited for the task. Oumi can handle this for you by launching jobs on various compute clusters. For more information about running jobs, see our [Running Jobs Remotely tutorial](https://github.com/oumi-ai/oumi/blob/main/notebooks/Oumi%20-%20Running%20Jobs%20Remotely.ipynb). For running jobs on custom clusters, see our [Launching Jobs on Custom Clusters tutorial](https://github.com/oumi-ai/oumi/blob/main/notebooks/Oumi%20-%20Launching%20Jobs%20on%20Custom%20Clusters.ipynb).


Today, Oumi supports running jobs on several cloud provider platforms.

For the latest list, we can call the `which_clouds` function:

In [10]:
import oumi.launcher as launcher

print("Supported Clouds in Oumi:")
for cloud in launcher.which_clouds():
    print(cloud)

Supported Clouds in Oumi:
local
polaris
runpod
gcp
lambda
aws
azure


Let's run a simple "Hello World" job locally to demonstrate how to use the Oumi job launcher. This job will echo `"Hello, World!"`, then run the same training job we executed above. Running this job on a cloud provider like GCP simply involves changing the `cloud` field.

In [11]:
yaml_content = f"""
name: hello-world
resources:
  cloud: local

working_dir: .

envs:
  TEST_ENV_VARIABLE: '"Hello, World!"'
  OUMI_LOGGING_DIR: "{tutorial_dir}/logs"

run: |
  echo "$TEST_ENV_VARIABLE"
  oumi train -c {tutorial_dir}/train.yaml
"""

with open(f"{tutorial_dir}/job.yaml", "w") as f:
    f.write(yaml_content)
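
Since the target is controlled by the `cloud` field, the same job definition can be rendered for any of the names printed by `which_clouds`. The sketch below wraps the YAML in a small function; `job_yaml_for` is a hypothetical helper of ours, not an Oumi API. A real remote run may also need additional resource settings and credentials, which this tour does not cover.

```python
def job_yaml_for(cloud: str, tutorial_dir: str = "tour_tutorial") -> str:
    """Render the hello-world job definition for the given cloud name."""
    return f"""
name: hello-world
resources:
  cloud: {cloud}

working_dir: .

envs:
  TEST_ENV_VARIABLE: '"Hello, World!"'
  OUMI_LOGGING_DIR: "{tutorial_dir}/logs"

run: |
  echo "$TEST_ENV_VARIABLE"
  oumi train -c {tutorial_dir}/train.yaml
"""


# Same job as above, but targeting GCP instead of the local machine.
gcp_job = job_yaml_for("gcp")
print(gcp_job)
```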

In [12]:
import time

job_config = launcher.JobConfig.from_yaml(str(Path(tutorial_dir) / "job.yaml"))
cluster, job_status = launcher.up(job_config, cluster_name=None)

while job_status and not job_status.done:
    print("Job is running...")
    time.sleep(15)
    job_status = cluster.get_job(job_status.id)
print("Job is done!")

Job is running...
Job is running...
...
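
The polling loop above has no upper bound and prints one line per poll, which gets noisy for long jobs. A possible variant adds a timeout and prints less often. This is a sketch: `wait_for_job` is our own helper, and it only assumes what the loop above already uses, namely a status object with `done` and `id` attributes and a lookup function such as `cluster.get_job`. The stand-in objects at the end exist only so the example runs without a cluster.

```python
import time
from types import SimpleNamespace


def wait_for_job(get_status, initial_status, poll_seconds=15.0, timeout_seconds=3600.0):
    """Poll until the status reports `done`, or raise TimeoutError."""
    status = initial_status
    deadline = time.monotonic() + timeout_seconds
    polls = 0
    while status and not status.done:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {status.id} still running after {timeout_seconds}s")
        polls += 1
        if polls % 20 == 1:  # Print on the 1st, 21st, 41st, ... poll only.
            print(f"Job is running... (poll {polls})")
        time.sleep(poll_seconds)
        status = get_status(status.id)
    return status


# Stand-in job that finishes on the third lookup. With Oumi you would call
# wait_for_job(cluster.get_job, job_status) instead.
_remaining = [False, False, True]


def fake_get_job(job_id):
    return SimpleNamespace(id=job_id, done=_remaining.pop(0))


final = wait_for_job(fake_get_job, SimpleNamespace(id="demo", done=False), poll_seconds=0.01)
print("Job is done!")
```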

The job created logs under our tutorial directory. Let's take a look at the directory:

In [13]:
logs_dir = f"{tutorial_dir}/logs"
os.listdir(logs_dir)

['2025_02_04_11_03_17_652_0.stderr', '2025_02_04_11_03_17_652_0.stdout']

Now let's print the contents of each log file.

In [14]:
for log_file in Path(logs_dir).iterdir():
    print(f"Log file: {log_file}")
    with open(log_file) as f:
        print(f.read())

Log file: tour_tutorial/logs/2025_02_04_11_03_17_652_0.stderr
2025-02-04 11:03:37.336208: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1738667017.647543   82345 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738667017.733658   82345 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
max_steps is given, it will override any value given in num_train_epochs

  0%|          | 0/10 [00:00<?, ?it/s]
 10%|█         | 1/10 [06:37<59:36, 397.39s/it]
 20%|██        | 2/10 [13:25<53:49, 403.66s/it]
 30%|███       | 3/10 [19:03<43:36, 373.81s/it]
 40%|████      | 4/10 [27:56<43:39, 436.54s/it]
 50%|█████     | 5/10 [33:40<33:35, 403.17s/it]
 60%|██████    | 6/10 
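
The stderr log contains the training progress bar. Rather than reading it by eye, you can extract the step count and elapsed time from the last complete progress line. The sketch below uses a regular expression written against the line format shown above; the helper names are ours, and the sample text is copied from the output (including its truncated final line, which is skipped).

```python
import re

# Matches progress lines such as " 50%|█████     | 5/10 [33:40<33:35, 403.17s/it]".
PROGRESS_RE = re.compile(r"\|\s*(\d+)/(\d+)\s*\[([\d:]+)<")


def to_seconds(clock: str) -> int:
    """Convert 'MM:SS' or 'H:MM:SS' to seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def latest_progress(log_text: str):
    """Return (steps_done, total_steps, elapsed_seconds) from the last full progress line."""
    matches = PROGRESS_RE.findall(log_text)
    if not matches:
        return None
    done, total, elapsed = matches[-1]
    return int(done), int(total), to_seconds(elapsed)


sample = """
 40%|████      | 4/10 [27:56<43:39, 436.54s/it]
 50%|█████     | 5/10 [33:40<33:35, 403.17s/it]
 60%|██████    | 6/10 
"""
done, total, elapsed = latest_progress(sample)
if done:  # Avoid dividing by zero on the initial 0/10 line.
    print(f"{done}/{total} steps in {elapsed}s ({elapsed / done:.0f}s per step)")
```

To apply this to a real run, pass the text of the `.stderr` file from `logs_dir` instead of `sample`.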

# ⚙️ Customizing Datasets and Clouds

Oumi offers rich customization that allows users to build custom solutions on top of our existing building blocks. Several of Oumi's primary resources (Datasets, Clouds, etc.) leverage the Oumi Registry when invoked.

This registry allows users to build custom classes that function as drop-in replacements for core functionality.

For more details on registering custom datasets, see the [tutorial here](https://github.com/oumi-ai/oumi/blob/main/notebooks/Oumi%20-%20Datasets%20Tutorial.ipynb).

For a tutorial on writing a custom cloud/cluster for running jobs, see the [tutorial here](https://github.com/oumi-ai/oumi/blob/main/notebooks/Oumi%20-%20Launching%20Jobs%20on%20Custom%20Clusters.ipynb).

You can find further information about the required registry decorators [here](https://oumi.ai/docs/en/latest/api/oumi.core.registry.html#oumi.core.registry.register_cloud_builder).
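
To make the registry idea concrete, here is a generic illustration of a decorator-based registry in plain Python. It is not Oumi's implementation, and every name in it (`register_dataset`, `build_dataset`, `my_custom_dataset`) is hypothetical. It only shows the general pattern: a decorator records a class or function under a name, and other code looks it up by that name. For Oumi's actual decorators, refer to the registry documentation linked above.

```python
from typing import Callable, Dict

# Generic illustration of a decorator-based registry; the names below are
# hypothetical and are NOT Oumi's API.
_DATASET_BUILDERS: Dict[str, Callable[[], list]] = {}


def register_dataset(name: str):
    """Decorator that stores a builder function under `name`."""

    def decorator(builder):
        if name in _DATASET_BUILDERS:
            raise ValueError(f"Dataset '{name}' is already registered")
        _DATASET_BUILDERS[name] = builder
        return builder

    return decorator


def build_dataset(name: str) -> list:
    """Look up a registered builder by name and call it."""
    try:
        builder = _DATASET_BUILDERS[name]
    except KeyError:
        raise KeyError(f"Unknown dataset '{name}'") from None
    return builder()


@register_dataset("my_custom_dataset")
def my_custom_dataset():
    return [{"prompt": "Hello", "response": "World"}]


# Code that only knows the name can now build the custom dataset.
print(build_dataset("my_custom_dataset"))
```

The benefit of this pattern is that configuration files can refer to a component by name, and the framework resolves that name at run time without needing to import your code explicitly.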

# 🧭 What's Next?

Now that you've completed the basic tour, you're ready to tackle the other [notebook guides & tutorials](https://oumi.ai/docs/en/latest/get_started/tutorials.html).

If you have not already, make sure to take a look at the [Quickstart](https://oumi.ai/docs/en/latest/get_started/quickstart.html) for an overview of our CLI.