# What is MLRun and Why It Matters

MLRun is an open-source MLOps orchestration framework that combines a feature store, model training, deployment, and monitoring in a single, composable environment. It is Kubernetes-native and designed for both real-time and batch ML pipelines, with traceability and governance built in.

In [7]:
import mlrun

In [8]:
# Show the API server URL
mlrun.get_run_db()

HTTPRunDB('http://dragon:30070')

In [9]:
# Set the base project name
project_name = "mlrun-demo"

# Initialize the MLRun project object
project = mlrun.get_or_create_project(
    name=project_name,
    context="./",
    user_project=True,
)

# Display the current project name
project_name = project.metadata.name
print(f'Full project name: {project_name}')

> 2025-07-15 12:51:32,220 [info] Project loaded successfully: {"project_name":"mlrun-demo-johannes"}
Full project name: mlrun-demo-johannes
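
The log line above shows that `user_project=True` turned the base name `mlrun-demo` into `mlrun-demo-johannes`. The following is a minimal sketch of that naming pattern, assuming the suffix is simply the current username joined with a hyphen. `scoped_project_name` is a hypothetical helper for illustration, not an MLRun function, and MLRun may resolve the username differently (for example from environment variables).

```python
import getpass
from typing import Optional


def scoped_project_name(base_name: str, user: Optional[str] = None) -> str:
    """Return '<base_name>-<user>', the pattern seen in the log output above.

    Hypothetical helper: falls back to the local OS username when no user is given.
    """
    user = user or getpass.getuser()
    return f"{base_name}-{user}"


print(scoped_project_name("mlrun-demo", "johannes"))
```

Scoping the project by user keeps several people working against the same API server from overwriting each other's project.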


## 1. FeatureSet Ingest

- https://docs.mlrun.org/en/latest/feature-store/feature-sets.html
- https://www.iguazio.com/blog/the-complete-guide-to-using-the-iguazio-feature-store-with-azure-ml-part-2/

In [10]:
import pandas as pd
import mlrun.feature_store as fstore
from mlrun.feature_store import FeatureSet
from mlrun.datastore import ParquetTarget

In [11]:
# read the source data from the CSV file
df_source = pd.read_csv("data/iris.csv")

# Create a string primary key ("id") from the row index; it serves as the entity of the feature set
df_source.reset_index(drop=False, inplace=True)
df_source.rename(columns={"index": "id"}, inplace=True)
df_source["id"] = df_source["id"].astype(str)

df_source.head()

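| | id | sepal_length_cm | sepal_width_cm | petal_length_cm | petal_width_cm | target | label |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 | setosa |
| 1 | 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 | setosa |
| 2 | 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 | setosa |
| 3 | 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 | setosa |
| 4 | 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 | setosa |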


In [12]:
# create the feature set
fs_iris = FeatureSet(name="iris_features",
                     entities=["id"])

# # Add a local Parquet target
# fs_iris.set_targets([ParquetTarget(path=project.artifact_path)], with_defaults=False)

# ingest the source data
df_iris = fstore.ingest(featureset=fs_iris,
                        source=df_source)

# Create a feature vector that selects all features of the feature set and marks "label" as the label column
fv_iris = fstore.FeatureVector(name="iris_vector",
                               features=["iris_features.*"],
                               label_feature="iris_features.label",
                               with_indexes=True)
fv_iris.save()

In [13]:
# # Delete a feature set by name and project
# fstore.delete_feature_set(name="iris_features",
#                           project=project_name,
#                           force=True)


In [14]:
# Retrieve the feature vector as an offline dataset
print(f"Retrieving the feature set from:\n{fv_iris.uri}")

offline_features = fv_iris.get_offline_features(target=ParquetTarget())

offline_features.to_dataframe().head()

Retrieving the feature set from:
store://feature-vectors/mlrun-demo-johannes/iris_vector
> 2025-07-15 12:51:34,515 [info] wrote target: {'partitioned': True, 'size': 7445, 'name': 'parquet', 'kind': 'parquet', 'updated': '2025-07-15T10:51:34.515607+00:00', 'status': 'ready', 'path': 's3://mlrun/projects/mlrun-demo-johannes/FeatureStore/iris_vector/parquet/vectors/iris_vector-latest.parquet'}


| id | sepal_length_cm | sepal_width_cm | petal_length_cm | petal_width_cm | target | label |
|---|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 | setosa |
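
The feature vector URI printed above has the shape `store://<kind>/<project>/<name>`. The sketch below splits such a URI into its parts using only the standard library. `StoreRef` and `parse_store_uri` are hypothetical names introduced here, not part of MLRun, and the sketch assumes the plain three-part form shown in the output; it does not handle any additional suffixes a URI might carry.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class StoreRef(NamedTuple):
    kind: str     # e.g. "feature-vectors"
    project: str  # e.g. "mlrun-demo-johannes"
    name: str     # e.g. "iris_vector"


def parse_store_uri(uri: str) -> StoreRef:
    """Split 'store://<kind>/<project>/<name>' into its three components."""
    parsed = urlparse(uri)
    if parsed.scheme != "store" or not parsed.netloc:
        raise ValueError(f"not a store URI: {uri!r}")
    parts = parsed.path.strip("/").split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected store://<kind>/<project>/<name>, got {uri!r}")
    project, name = parts
    return StoreRef(kind=parsed.netloc, project=project, name=name)


ref = parse_store_uri("store://feature-vectors/mlrun-demo-johannes/iris_vector")
print(ref.kind, ref.project, ref.name)
```

Because the URI carries the project name, the same string can be passed to a job as an input, as the training step does, without hard-coding a storage path.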


## 2. Register and Run Training

- https://www.iguazio.com/blog/the-complete-guide-to-using-the-iguazio-feature-store-with-azure-ml-part-3/

In [15]:
# create the function for training the model
fn_train = project.set_function(
    func="01_train.py",
    name="train",
    kind="job",
    image="mlrun/mlrun")

In [16]:
# Run the training function as a Kubernetes job, passing the feature vector as input
run = fn_train.run(
    inputs={"dataset": fv_iris.uri},
    handler="train_model",
    artifact_path=project.artifact_path,
    local=False)

> 2025-07-15 12:51:34,673 [info] Storing function: {"db":"http://dragon:30070","name":"train-train-model","uid":"854cd17016ac43be8b0ddd3aeb5e6420"}
> 2025-07-15 12:51:34,745 [info] Job is running in the background, pod: train-train-model-6wtw7
Training model...
s3://mlrun/projects/mlrun-demo-johannes/FeatureStore/iris_vector/parquet/vectors/iris_vector-latest.parquet
  id  sepal_length_cm  sepal_width_cm  ...  petal_width_cm  target   label
0  0              5.1             3.5  ...             0.2       0  setosa
1  1              4.9             3.0  ...             0.2       0  setosa
2  2              4.7             3.2  ...             0.2       0  setosa
3  3              4.6             3.1  ...             0.2       0  setosa
4  4              5.0             3.6  ...             0.2       0  setosa

[5 rows x 7 columns]
> 2025-07-15 10:53:54,256 [info] To track results use the CLI: {"info_cmd":"mlrun get run 854cd17016ac43be8b0ddd3aeb5e6420 -p mlrun-demo-johannes","logs_cmd":

| project | uid | iter | start | state | kind | name | labels | inputs | parameters | results |
|---|---|---|---|---|---|---|---|---|---|---|
| mlrun-demo-johannes | ...5e6420 | 0 | Jul 15 10:53:53 | completed | run | train-train-model | v3io_user=johannes; kind=job; owner=johannes; mlrun/client_version=1.7.2; mlrun/client_python_version=3.9.23; host=train-train-model-6wtw7 | dataset | | |

> 2025-07-15 12:54:02,023 [info] Run execution finished: {"name":"train-train-model","status":"completed"}
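
The contents of `01_train.py` are not shown in this notebook. The run log only reveals that the `train_model` handler prints "Training model..." and the first rows of the dataset. The following is a rough sketch of what such a handler could look like, not the actual file. It assumes that `context` behaves like an MLRun run context (with `log_result`) and that `dataset` behaves like an MLRun `DataItem` (with `as_df`). Both are duck-typed here so the sketch has no imports, and the `n_rows` result key is made up for illustration.

```python
def train_model(context, dataset):
    """Hypothetical handler skeleton; the real 01_train.py may differ.

    context: expected to offer log_result(key, value), like an MLRun run context.
    dataset: expected to offer as_df(), like an MLRun DataItem.
    """
    print("Training model...")

    # Load the input (here: the feature vector's offline data) as a DataFrame
    df = dataset.as_df()
    print(df.head())

    # Model fitting would go here; it is omitted because the source does not show it.

    # Record a simple result on the run ("n_rows" is an illustrative key)
    context.log_result("n_rows", len(df))
```

The parameter name `dataset` matches the key used in `inputs={"dataset": fv_iris.uri}`, which is how the input is bound to the handler argument.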
