### e-SparX Starter Notebook

Hey there! This notebook will show you how to build a simple e-SparX pipeline.

In [None]:
# import packages (make sure you've installed pandas)
import esparx
import pandas as pd

Let us assume we have downloaded a raw data file from https://www.epe.ed.tum.de/emt/startseite/ and want to register it as a dataset in e-SparX. Please choose all artifact names yourself: in the current e-SparX implementation, each artifact name can only exist once. Also choose a pipeline name, which must not exist yet either.
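
Since every name has to be unique, one way to avoid clashes with other users is to append a short random suffix. The snippet below is a minimal sketch of that idea; the helper `unique_name` and the prefixes are hypothetical and not part of e-SparX, and it assumes that hyphens are acceptable in artifact and pipeline names.

```python
import uuid


def unique_name(prefix: str) -> str:
    """Return `prefix` followed by a hyphen and 8 random hex characters."""
    return f"{prefix}-{uuid.uuid4().hex[:8]}"


# Hypothetical prefixes; pick whatever describes your artifacts.
raw_name = unique_name("raw-data")
pipeline_name = unique_name("starter-pipeline")
print(raw_name, pipeline_name)
```

If you prefer readable names, typing them by hand as in the next cell works just as well, as long as nobody has used them before.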

In [None]:
raw_data_file_name = "x" # pick your name here (as string)
my_pipeline_name = "p" # pick your pipeline name here (as string)

Let's register the raw files in e-SparX! We have two methods to register data artifacts: `register_dataset_pandas` (for `pd.DataFrame` datasets) and `register_dataset_free` (for all other datasets). So we will use `register_dataset_free` for our raw files.

We also want a pipeline that the artifact belongs to. e-SparX creates it automatically the first time we mention the pipeline name while registering an artifact.

In [None]:
esparx.register_dataset_free(
    name=raw_data_file_name,
    description="This is my first dataset artifact. It's actually a mock artifact, so don't get too excited.",
    file_type="ZIP",
    source_url="https://www.epe.ed.tum.de/emt/startseite/",
    pipeline_name=my_pipeline_name,
)

You should now already be able to find your pipeline [here](http://10.152.14.197:3030/)!

Next, assume that you're doing some data processing within this Jupyter Notebook. Your outcome will be two different processed datasets.

In e-SparX, we want to register this processing notebook and the two processed datasets. Let's go.

In [None]:
# please give all artifacts your personal name of choice
processing_script_name = "a" # pick your notebook name here (as string)
processed_dataset_1_name = "b" # pick your name here (as string)
processed_dataset_2_name = "c" # pick your name here (as string)

In [None]:
# assume some processing happens here

In [None]:
# register script in e-SparX using 'register_code'
esparx.register_code(
    name=processing_script_name,
    description="Some fancy processing happened here (or not really).",
    file_type="IPYNB",
    pipeline_name=my_pipeline_name,
    source_name=raw_data_file_name,
)

Note that each register method accepts a `source_name`. This lets you specify how your artifact connects to the others. In this case, the raw data is the source artifact for your processing notebook. Go ahead and see how that looks in your pipeline online!

Also, did you see that when passing the pipeline name, e-SparX did not create the pipeline again, but rather found it and just connected the artifact to it?

In general, you're safe running a command again: e-SparX will recognize what's new and what's not.
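
To make the two ideas above concrete (a pipeline is created on first mention, and `source_name` defines a connection between artifacts), here is a toy in-memory model. It is only an illustration under our own assumptions: the class `ToyRegistry` is hypothetical, does not talk to any server, and is not how e-SparX is implemented.

```python
class ToyRegistry:
    """In-memory stand-in for illustration; NOT the e-SparX implementation."""

    def __init__(self):
        self.pipelines = {}  # pipeline name -> set of artifact names
        self.edges = set()   # (source artifact, target artifact) pairs

    def register(self, name, pipeline_name, source_name=None):
        # setdefault creates the pipeline on first mention and reuses it afterwards.
        self.pipelines.setdefault(pipeline_name, set()).add(name)
        if source_name is not None:
            self.edges.add((source_name, name))


registry = ToyRegistry()
registry.register("x", "p")                   # raw data, creates pipeline "p"
registry.register("a", "p", source_name="x")  # notebook, connected to the raw data
registry.register("a", "p", source_name="x")  # repeated call changes nothing

print(sorted(registry.pipelines["p"]))  # ['a', 'x']
print(sorted(registry.edges))           # [('x', 'a')]
```

Because artifacts and connections are kept in sets, running the same registration twice leaves the toy pipeline unchanged, which mirrors the behaviour described above.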

Finally, let's address the finished datasets. This time, we will assume these are `pd.DataFrame` objects.

In [None]:
mock_df_1 = pd.DataFrame({"A": [1.2, 2.0, 3.5], "B": [4.1, 5.0, 6.1]})
mock_df_2 = pd.DataFrame({"C": [7.3, 8.3, 9.3], "D": [10.2, 11.2, 12.2]})

In [None]:
esparx.register_dataset_pandas(
    name=processed_dataset_1_name,
    description="This is my first processed dataset artifact.",
    file_type="none",
    df=mock_df_1, # pass the df here
    pipeline_name=my_pipeline_name,
    source_name=processing_script_name,
)
esparx.register_dataset_pandas(
    name=processed_dataset_2_name,
    description="This is my second processed dataset artifact.",
    file_type="none",
    df=mock_df_2, # pass the df here
    pipeline_name=my_pipeline_name,
    source_name=processing_script_name,
)

Note that for `register_dataset_pandas`, we actually pass the `pd.DataFrame`. Go to the web interface and click on the artifact! You will see that e-SparX automatically stores and displays some key information for `pd.DataFrame` artifacts.
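
This notebook does not list which fields e-SparX extracts, so the following is only a sketch of the kind of metadata that can be derived from a `pd.DataFrame` without storing the data itself. The `summary` dictionary and its keys are our own assumption, not the e-SparX schema.

```python
import pandas as pd

mock_df = pd.DataFrame({"A": [1.2, 2.0, 3.5], "B": [4.1, 5.0, 6.1]})

# Hypothetical summary; the keys are illustrative, not the e-SparX schema.
summary = {
    "n_rows": len(mock_df),
    "columns": list(mock_df.columns),
    "dtypes": {col: str(dtype) for col, dtype in mock_df.dtypes.items()},
}
print(summary)
# {'n_rows': 3, 'columns': ['A', 'B'], 'dtypes': {'A': 'float64', 'B': 'float64'}}
```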

Finally, let's clean up our mock example. You can only delete artifacts that you created yourself. You can only delete pipelines when they are entirely empty and you created them in the first place.
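
The rule above implies an order: artifacts first, pipeline last. A small helper can capture that order. The function `clean_up` is hypothetical and takes the two delete functions as arguments, so the sketch runs here with stand-in callables instead of a live server.

```python
from typing import Callable, Iterable


def clean_up(
    artifact_names: Iterable[str],
    pipeline_name: str,
    delete_artifact: Callable[[str], object],
    delete_pipeline: Callable[[str], object],
) -> None:
    """Delete every artifact first, then the (now empty) pipeline."""
    for name in artifact_names:
        delete_artifact(name)
    delete_pipeline(pipeline_name)


# Stand-in callables that only record the calls they receive.
calls = []
clean_up(
    ["x", "a", "b", "c"],
    "p",
    delete_artifact=lambda name: calls.append(("artifact", name)),
    delete_pipeline=lambda name: calls.append(("pipeline", name)),
)
print(calls[-1])  # ('pipeline', 'p')
```

In this notebook you could pass `esparx.delete_artifact` and `esparx.delete_pipeline` instead of the stand-ins; the next two cells perform the same steps explicitly.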

In [None]:
esparx.delete_artifact(raw_data_file_name)
esparx.delete_artifact(processing_script_name)
esparx.delete_artifact(processed_dataset_1_name)
esparx.delete_artifact(processed_dataset_2_name)

In [None]:
esparx.delete_pipeline(my_pipeline_name)

Well done! Now you're good to go to grab some last information from the `README` and explore some full-blown use cases in the `usecases` folder. We're excited to see your first ML pipelines soon!