# Part 2: Execute a simple ML model - Further pipeline configuration

## Import libraries

In [None]:
from craft_ai_sdk import CraftAiSdk
import dotenv
import os
import pandas as pd
from sklearn import datasets

dotenv.load_dotenv()

## Load environment variables

In [None]:
CRAFT_AI_SDK_TOKEN = os.environ.get("CRAFT_AI_SDK_TOKEN")
CRAFT_AI_ENVIRONMENT_URL = os.environ.get("CRAFT_AI_ENVIRONMENT_URL")
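
`os.environ.get` returns `None` when a variable is missing, so a typo in the `.env` file is not reported at this point. Below is a minimal sketch of an early check. The helper `require_env` and the variable name `EXAMPLE_VARIABLE` are hypothetical and only serve the illustration; they are not part of the SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check your .env file"
        )
    return value


# Demo with a made-up variable name so the snippet runs anywhere.
os.environ["EXAMPLE_VARIABLE"] = "example-value"
print(require_env("EXAMPLE_VARIABLE"))
```

In this notebook, the same helper could be called as `require_env("CRAFT_AI_SDK_TOKEN")` and `require_env("CRAFT_AI_ENVIRONMENT_URL")` in place of the two `os.environ.get` calls.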

## SDK instantiation

In [None]:
sdk = CraftAiSdk(sdk_token=CRAFT_AI_SDK_TOKEN, environment_url=CRAFT_AI_ENVIRONMENT_URL)

## Clean up the previous part

We can start by cleaning up the objects we created in the hello world use case.

To do so, we can simply use the `delete_pipeline` function of the SDK.

In [None]:
sdk.delete_pipeline(pipeline_name="part-1-hello-world")

## Upload dataset to Data Store

This use case uses the famous Iris dataset.

With the Craft.AI platform, your environment comes with computational resources and file storage. This file storage is what we call the **data store**.

You can upload and download files and organize them using the SDK.

We will start by uploading this dataset to the data store using the `upload_data_store_object` function of the SDK.

You have to pass two arguments:
- `filepath_or_buffer`: path of the file to be uploaded or a file-like object
- `object_path_in_datastore`: path to save the file to

You can find further information in the SDK documentation.

In the following cell, we first load the Iris dataset from sklearn, then write it to disk before uploading it to the data store. We then remove the file from disk. You could avoid writing the dataset to disk by passing a file-like object instead of a path.

In [None]:
iris = datasets.load_iris(as_frame=True)
iris_df = pd.concat([iris.data, iris.target], axis=1)
iris_df.to_parquet("iris.parquet")

sdk.upload_data_store_object(
    filepath_or_buffer="iris.parquet",
    object_path_in_datastore="get_started/dataset/iris.parquet",
)

os.remove("iris.parquet")
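
The file-like alternative mentioned above can be sketched as follows. This is only an illustration: `upload_via_buffer` is a hypothetical helper, not an SDK function, and it relies solely on the `filepath_or_buffer` and `object_path_in_datastore` arguments described earlier. The `sdk` object is passed in as a parameter so that the snippet does not depend on a live environment.

```python
import io


def upload_via_buffer(sdk, write_fn, object_path_in_datastore):
    """Serialize data into an in-memory buffer and upload it without touching the disk.

    `write_fn` is any callable that writes bytes to a file-like object,
    for example `iris_df.to_parquet`.
    Returns the number of bytes held in the buffer.
    """
    buffer = io.BytesIO()
    write_fn(buffer)
    buffer.seek(0)  # rewind so the upload reads from the start of the buffer
    sdk.upload_data_store_object(
        filepath_or_buffer=buffer,
        object_path_in_datastore=object_path_in_datastore,
    )
    return buffer.getbuffer().nbytes
```

With the objects defined in this notebook, the call would look like `upload_via_buffer(sdk, iris_df.to_parquet, "get_started/dataset/iris.parquet")`, which makes the `os.remove` step unnecessary.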

We can also list all the objects contained in the data store using the `list_data_store_objects` function.

In [None]:
sdk.list_data_store_objects()

We can also get information about a specific object with the `get_data_store_object_information` function.

In [None]:
sdk.get_data_store_object_information("get_started/dataset/iris.parquet")
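
Once the data store holds many files, it can help to narrow the listing down to one folder. The sketch below assumes that `list_data_store_objects` returns a list of dictionaries, each with a `"path"` key; that shape is an assumption and should be checked against the SDK documentation. `paths_under` is a hypothetical helper, and `sample_listing` is made-up stand-in data so the snippet runs without a live environment.

```python
def paths_under(objects, prefix):
    """Return the paths of the listed objects that start with `prefix`."""
    return [obj["path"] for obj in objects if obj["path"].startswith(prefix)]


# Made-up stand-in for the result of sdk.list_data_store_objects().
sample_listing = [
    {"path": "get_started/dataset/iris.parquet"},
    {"path": "other/file.csv"},
]

print(paths_under(sample_listing, "get_started/"))
# → ['get_started/dataset/iris.parquet']
```

If the assumption holds, the real call would be `paths_under(sdk.list_data_store_objects(), "get_started/")`.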

## Pipeline creation with the SDK

### Create a pipeline

Now it's time to create the **pipeline** embedding our code.

We will do exactly what we did previously in the *Hello_world* section, but this time we will use more advanced options.

The `container_config` argument can contain many other settings to configure our pipeline.
Here we will focus on one new parameter:
- `requirements_path`: the relative path of our `requirements.txt` file. Adding it to `container_config` ensures the file is taken into account, so that all the necessary libraries are installed in the pipeline.

You can find further information and configuration settings in the SDK documentation.

The `requirements_path` can also be **specified by default** directly in your project, on the platform (in the Settings of your project). If you specify this argument when creating the pipeline, that value takes precedence over the one set at the project level.
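
Before creating the pipeline, it can be useful to confirm that the referenced files exist locally. The sketch below assumes that `function_path` and `requirements_path` are resolved relative to `local_folder`, which matches how they are written in this notebook but should be confirmed in the SDK documentation. `missing_pipeline_files` is a hypothetical helper, not part of the SDK.

```python
from pathlib import Path


def missing_pipeline_files(local_folder, relative_paths):
    """Return the entries of `relative_paths` that are not files under `local_folder`."""
    root = Path(local_folder)
    return [rel for rel in relative_paths if not (root / rel).is_file()]
```

For this notebook, `missing_pipeline_files("../../get_started", ["src/part-2-iris-train.py", "requirements.txt"])` should return an empty list when both files are in place.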


In [None]:
sdk.create_pipeline(
    pipeline_name="part-2-iristrain",
    function_path="src/part-2-iris-train.py",
    function_name="trainIris",
    description="This function creates a classifier model for iris and makes predictions on the test data set",
    container_config={
        "local_folder": "../../get_started",
        "requirements_path": "requirements.txt",
    },
)

## Run your pipeline

In [None]:
sdk.run_pipeline(pipeline_name="part-2-iristrain")

## Check model creation

We can check that the model was created by inspecting the data store.

Using the previously introduced `get_data_store_object_information` function, we can easily verify that the model has been created and uploaded correctly.

In [None]:
sdk.get_data_store_object_information("get_started/models/iris_knn_model.joblib")

Finally, to clean up the data store, we can use the `delete_data_store_object` function.

In [None]:
# sdk.delete_data_store_object("get_started/models/iris_knn_model.joblib")
# sdk.delete_data_store_object("get_started/dataset/iris.parquet")