# Part 4: Score the trained model



Microsoft Fabric allows you to operationalize machine learning models with a scalable function called PREDICT, which supports batch scoring in any compute engine. You can generate batch predictions directly from a Microsoft Fabric notebook or from a given model's item page. Learn about [PREDICT](https://aka.ms/fabric-predict).  

To generate batch predictions on the test dataset, you'll use version 1 of the trained churn model. You'll load the test dataset into a Spark DataFrame and create an MLFlowTransformer object to generate batch predictions. You can then invoke the PREDICT function in one of the following three ways:

- Using the Transformer API from SynapseML
- Using the Spark SQL API
- Using a PySpark user-defined function (UDF)

## Prerequisites

- Complete [Part 3: Train and register machine learning models](https://learn.microsoft.com/fabric/data-science/tutorial-data-science-train-models).
- Attach the same lakehouse you used in Part 3 to this notebook.

In [1]:
!pip install scikit-learn==1.6.1


Collecting scikit-learn==1.6.1
  Downloading scikit_learn-1.6.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (18 kB)
Collecting threadpoolctl>=3.1.0 (from scikit-learn==1.6.1)
  Downloading threadpoolctl-3.6.0-py3-none-any.whl.metadata (13 kB)
Downloading scikit_learn-1.6.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.5 MB)
Downloading threadpoolctl-3.6.0-py3-none-any.whl (18 kB)
Installing collected packages: threadpoolctl, scikit-learn
  Attempting uninstall: threadpoolctl
    Found existing installation: threadpoolctl 2.2.0
    Uninstalling threadpoolctl-2.2.0:
      Successfully uninstalled threadpoolctl-2.2.0
  Attempting uninstall: scikit-learn
    Found existing installation: scikit-learn 1.2.2
    Uninstalling scikit-learn-1.2.2:
      Successfully uninstalled scikit-learn-1.2.2
Successfully instal

## Load the test data

Load the test data that you saved in Part 3.

In [2]:
df_test = spark.read.format("delta").load("Tables/df_test")
display(df_test)


### PREDICT with the Transformer API

To use the Transformer API from SynapseML, you first need to create an MLFlowTransformer object.

### Instantiate MLFlowTransformer object

The MLFlowTransformer object is a wrapper around the MLFlow model that you registered in Part 3. It allows you to generate batch predictions on a given DataFrame. To instantiate the MLFlowTransformer object, you'll need to provide the following parameters:

- The columns from the test DataFrame that the model needs as input (in this case, all of them).
- A name for the new output column (in this case, `predictions`).
- The model name and model version to use for generating the predictions (in this case, `lgbm_sm` and version 1).

In [3]:
from synapse.ml.predict import MLFlowTransformer

model = MLFlowTransformer(
    inputCols=list(df_test.columns),
    outputCol='predictions',
    modelName='lgbm_sm',
    modelVersion=1
)


Downloading artifacts:   0%|          | 0/9 [00:00<?, ?it/s]

 - scikit-learn (current: 1.6.1, required: scikit-learn==1.7.2)
To fix the mismatches, call `mlflow.pyfunc.get_model_dependencies(model_uri)` to fetch the model's environment and install dependencies using the resulting environment file.
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
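
The warning above reports that the installed scikit-learn (1.6.1) differs from the version the model requires (1.7.2). The following is a minimal sketch of how such mismatches could be detected before scoring. It assumes you already have the model's pinned requirements as `name==version` strings, for example lines read from the environment file that `mlflow.pyfunc.get_model_dependencies` returns. The helper name `find_version_mismatches` and its `installed` argument are hypothetical and used only for illustration.

```python
from importlib import metadata


def find_version_mismatches(requirements, installed=None):
    """Return {package: (installed_version, required_version)} for pinned
    requirements whose installed version differs.

    requirements: iterable of strings such as "scikit-learn==1.7.2".
    installed: optional {package: version} mapping (hypothetical test hook);
        when omitted, versions are looked up in the current environment.
    """
    mismatches = {}
    for line in requirements:
        line = line.strip()
        # Skip blanks, comments, and anything that is not an exact pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, required = (part.strip() for part in line.split("==", 1))
        if installed is not None:
            current = installed.get(name)
        else:
            try:
                current = metadata.version(name)
            except metadata.PackageNotFoundError:
                current = None  # package is not installed at all
        if current != required:
            mismatches[name] = (current, required)
    return mismatches


# Values taken from the warning shown in the cell output.
result = find_version_mismatches(
    ["scikit-learn==1.7.2"], installed={"scikit-learn": "1.6.1"}
)
print(result)
```

An empty dictionary means every pinned package matches. The sketch only handles exact `==` pins; range specifiers and environment markers are ignored.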



Now that you have the MLFlowTransformer object, you can use it to generate batch predictions.

In [4]:
import pandas

predictions = model.transform(df_test)
display(predictions)


### PREDICT with the Spark SQL API

In [5]:
from pyspark.ml.feature import SQLTransformer 

# Substitute "model_name", "model_version", and "features" below with values for your own model name, model version, and feature columns
model_name = 'lgbm_sm'
model_version = 1
features = df_test.columns

sqlt = SQLTransformer().setStatement( 
    f"SELECT PREDICT('{model_name}/{model_version}', {','.join(features)}) as predictions FROM __THIS__")

# Substitute "df_test" below with your own test dataset
display(sqlt.transform(df_test))
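
The statement in the cell above joins the column names directly into the SQL text, which works for simple column names. The following sketch builds the same statement but wraps each column name in backticks, the Spark SQL identifier quoting, so that names containing spaces or reserved words still parse. It assumes PREDICT accepts quoted column references like any other Spark SQL expression. The helper `build_predict_statement` is hypothetical, and the column names in the example are placeholders rather than the tutorial's actual schema.

```python
def build_predict_statement(model_name, model_version, features,
                            output_col="predictions"):
    """Build a Spark SQL PREDICT statement for use with SQLTransformer."""
    if not features:
        raise ValueError("features must contain at least one column name")
    # Quote each identifier with backticks; a literal backtick inside a
    # name is escaped by doubling it.
    quoted = ", ".join("`" + f.replace("`", "``") + "`" for f in features)
    return (
        f"SELECT PREDICT('{model_name}/{model_version}', {quoted}) "
        f"AS {output_col} FROM __THIS__"
    )


# Placeholder column names, for illustration only.
statement = build_predict_statement("lgbm_sm", 1, ["Age", "Credit Score"])
print(statement)
```

The resulting string can be passed to `SQLTransformer().setStatement(...)` in place of the inline f-string.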


### PREDICT with a user-defined function (UDF)

In [6]:
from pyspark.sql.functions import col

# Substitute "model" and "features" below with your own MLFlowTransformer object and feature columns
my_udf = model.to_udf()
features = df_test.columns

display(df_test.withColumn("predictions", my_udf(*[col(f) for f in features])))


## Write model prediction results to the lakehouse

Once you have generated batch predictions, write the model prediction results back to the lakehouse.  

In [7]:
# Save predictions to lakehouse to be used for generating a Power BI report
table_name = "df_test_with_predictions_v1"
predictions.write.format('delta').mode("overwrite").save(f"Tables/{table_name}")
print(f"Spark DataFrame saved to delta table: {table_name}")



Spark DataFrame saved to delta table: df_test_with_predictions_v1
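
The `_v1` suffix in the table name above appears to track the model version used for scoring; that reading is an assumption, not something the tutorial states. Under that assumption, the following sketch derives the table name and path from the version number so that predictions from different model versions land in separate tables. The helper `prediction_table_path` is hypothetical.

```python
import re


def prediction_table_path(base_name, model_version, root="Tables"):
    """Return (table_name, path) for a predictions table tagged with the
    model version."""
    table_name = f"{base_name}_v{model_version}"
    # Restrict the name to letters, digits, and underscores so it stays a
    # plain identifier.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table_name):
        raise ValueError(f"Unsupported table name: {table_name!r}")
    return table_name, f"{root}/{table_name}"


table_name, path = prediction_table_path("df_test_with_predictions", 1)
print(table_name, path)
```

The returned `path` can be passed to `predictions.write.format('delta').mode("overwrite").save(path)` as in the cell above.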


## Next step

Use the predictions you just saved to [create a report in Power BI](https://learn.microsoft.com/fabric/data-science/tutorial-data-science-create-report).