# What is MLRun and Why It Matters

MLRun is an open-source MLOps orchestration framework that integrates feature stores, model training, deployment, and monitoring into a single, composable environment. It’s Kubernetes-native and designed for real-time and batch ML pipelines with traceability and governance baked in.

In [1]:
import mlrun

In [2]:
# Show the API server URL
mlrun.get_run_db()

HTTPRunDB('http://dragon.local:30070')

In [3]:
# Set the base project name
project_name = "mlrun-demo"

# Initialize the MLRun project object
project = mlrun.get_or_create_project(
    name=project_name, 
    context="./",
    user_project=True)

# Display the current project name
project_name = project.metadata.name
print(f'Full project name: {project_name}')

> 2025-07-31 10:33:02,340 [info] Project loaded successfully: {"project_name":"mlrun-demo-johannes"}
Full project name: mlrun-demo-johannes
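
The `-johannes` suffix in the output above comes from `user_project=True`, which makes the project name unique per user. Below is a minimal sketch of that naming, assuming the suffix is simply the lowercased username joined with a hyphen. `user_scoped_name` is a hypothetical helper for illustration, not MLRun's own code.

```python
import getpass


def user_scoped_name(base_name, username=None):
    """Return '<base_name>-<username>', mirroring the name seen in the log above."""
    # Fall back to the OS login name when no username is passed. This is an
    # assumption: MLRun may resolve the current user differently.
    user = username or getpass.getuser()
    return f"{base_name}-{user.lower()}"


print(user_scoped_name("mlrun-demo", "johannes"))
```

Because the final name depends on who runs the notebook, later cells should read it from `project.metadata.name` rather than hard-coding it.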


## 1. FeatureSet Ingest

- https://docs.mlrun.org/en/latest/feature-store/feature-sets.html
- https://www.iguazio.com/blog/the-complete-guide-to-using-the-iguazio-feature-store-with-azure-ml-part-2/

In [4]:
import pandas as pd
import mlrun.feature_store as fstore
from mlrun.feature_store import FeatureSet
from mlrun.datastore import ParquetTarget

In [5]:
# read the source data from the CSV file
df_source = pd.read_csv("data/iris.csv")

# create a str primary key for the feature set
df_source.reset_index(drop=False, inplace=True)
df_source.rename(columns={"index": "id"}, inplace=True)
df_source["id"] = df_source["id"].astype(str)


df_source.head()

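| | id | sepal_length_cm | sepal_width_cm | petal_length_cm | petal_width_cm | target | label |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 | setosa |
| 1 | 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 | setosa |
| 2 | 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 | setosa |
| 3 | 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 | setosa |
| 4 | 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 | setosa |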


In [6]:
# create the feature set
fs_iris = FeatureSet(name="iris_features",
                     entities=["id"])

# # Add a local Parquet target
# fs_iris.set_targets([ParquetTarget(path=project.artifact_path)], with_defaults=False)

# ingest the source data
fs_iris.ingest(df_source)
# df_iris = fstore.ingest(featureset=fs_iris,
#                         source=df_source)

# define a feature vector over all features in the set, with "label" as the label column
fv_iris = fstore.FeatureVector(name="iris_vector",
                               features=["iris_features.*"],
                               label_feature="iris_features.label",
                               with_indexes=True)
fv_iris.save()

In [7]:
# # Delete a feature set by name and project
# fstore.delete_feature_set(name="iris_features",
#                           project=project_name,
#                           force=True)


In [8]:
# Retrieve the feature vector and read its offline features
print(f"Retrieving the feature set from:\n{fv_iris.uri}")

offline_features = fstore.get_feature_vector(fv_iris.uri).get_offline_features()
offline_features.to_dataframe().head()

Retrieving the feature set from:
store://feature-vectors/mlrun-demo-johannes/iris_vector


| id | sepal_length_cm | sepal_width_cm | petal_length_cm | petal_width_cm | target | label |
|---|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 | setosa |
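
The URI printed above has the shape `store://<kind>/<project>/<name>`. Here is a small sketch of splitting it into its parts, which is handy when logging or comparing resources across projects. `parse_store_uri` and `StoreUri` are hypothetical names, not part of MLRun, and the optional `:tag` suffix is an assumption about the URI format.

```python
from typing import NamedTuple, Optional


class StoreUri(NamedTuple):
    kind: str
    project: str
    name: str
    tag: Optional[str]


def parse_store_uri(uri: str) -> StoreUri:
    """Split 'store://<kind>/<project>/<name>[:tag]' into its components."""
    prefix = "store://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a store URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected store://<kind>/<project>/<name>, got {uri!r}")
    kind, project, name_and_tag = parts
    # An optional ':tag' suffix is assumed here; without it, tag is None.
    name, _, tag = name_and_tag.partition(":")
    return StoreUri(kind, project, name, tag or None)


parsed = parse_store_uri("store://feature-vectors/mlrun-demo-johannes/iris_vector")
print(parsed.kind, parsed.project, parsed.name)
```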


## 2. Register and Run Training

- https://www.iguazio.com/blog/the-complete-guide-to-using-the-iguazio-feature-store-with-azure-ml-part-3/

In [9]:
# create the function for training the model
fn_train = project.set_function(
    func="01_train.py",
    name="train",
    kind="job",
    image="mlrun/mlrun")

In [10]:
# run the training function
run = fn_train.run(
    inputs={"dataset": fv_iris.uri},
    handler="train_model",
    artifact_path=project.artifact_path,
    local=False)

> 2025-07-31 10:33:04,313 [info] Storing function: {"db":"http://dragon.local:30070","name":"train-train-model","uid":"df7dbe5101cc47e7af5f0bf4f03784ef"}
> 2025-07-31 10:33:04,604 [info] Job is running in the background, pod: train-train-model-xmj2z
Training model...
> 2025-07-31 08:35:45,231 [error] Execution error, Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/mlrun/runtimes/local.py", line 506, in exec_from_params
    val = mlrun.handler(
  File "/opt/conda/lib/python3.9/site-packages/mlrun/package/__init__.py", line 137, in wrapper
    func_outputs = func(*args, **kwargs)
  File "01_train.py", line 11, in train_model
    data_uri = str(context.get_input("dataset"))
  File "/opt/conda/lib/python3.9/site-packages/mlrun/execution.py", line 549, in get_input
    return self._data_stores.object(
  File "/opt/conda/lib/python3.9/site-packages/mlrun/datastore/datastore.py", line 197, in object
    meta, url = self.get_store_artifact(
  File "/opt/conda/

| project | uid | iter | start | end | state | kind | name | labels | inputs | parameters | results |
|---|---|---|---|---|---|---|---|---|---|---|---|
| mlrun-demo-johannes | ...3784ef | 0 | Jul 31 08:35:44 | 2025-07-31 08:35:45.238275+00:00 | error | run | train-train-model | v3io_user=johannes, kind=job, owner=johannes, mlrun/client_version=1.9.1, mlrun/client_python_version=3.9.23, host=train-train-model-xmj2z | dataset | | |

> 2025-07-31 10:35:49,177 [info] Run execution finished: {"name":"train-train-model","status":"error"}
> 2025-07-31 10:35:49,180 [error] Run did not finish successfully: {"state":"error","status":{"end_time":"2025-07-31T08:35:45.238275+00:00","error":"Resource store://feature-vectors/mlrun-demo-johannes/iris_vector does not have a valid/persistent offline target","last_update":"2025-07-31T08:35:45.511000+00:00","start_time":"2025-07-31T08:35:44.896000+00:00","state":"error"}}


RunError: Resource store://feature-vectors/mlrun-demo-johannes/iris_vector does not have a valid/persistent offline target
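
The error says the feature vector has no persistent offline target. The retrieval cell earlier called `get_offline_features()` without a `target`, so the joined data was only returned in memory and nothing was written for the job to read. One likely remedy, offered as a sketch rather than a verified fix, is to materialize the vector to a Parquet target before passing its URI to the training run. Both helpers below are hypothetical. The first is duck-typed so it can be read and exercised without a cluster, and it assumes `get_offline_features` accepts a `target` keyword.

```python
def materialize_vector(vector, target):
    """Write the vector's joined features to `target` and return them as a DataFrame.

    `vector` is expected to behave like an MLRun FeatureVector and `target`
    like a ParquetTarget; both are passed in so this helper has no hard imports.
    """
    response = vector.get_offline_features(target=target)
    return response.to_dataframe()


def materialize_iris_vector(vector_uri):
    """Cluster-side wiring (not executed here); assumes a reachable MLRun API."""
    import mlrun.feature_store as fstore
    from mlrun.datastore import ParquetTarget

    vector = fstore.get_feature_vector(vector_uri)
    # Assumption: ParquetTarget() without a path resolves to a default
    # location under the project's artifact path.
    return materialize_vector(vector, ParquetTarget())
```

In the notebook this would be called as `materialize_iris_vector(fv_iris.uri)` before re-running the training cell. Whether that alone clears the error depends on the cluster's storage configuration.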