# Getting Started with Taipy Core on Notebooks

!!! important "Supported Python versions"

    Taipy requires **Python 3.8** or newer.

Welcome to the **Getting Started** guide for Taipy Core. This tour shows you how to use Taipy Core to orchestrate pipelines. Taipy Core implements a modern backend for any data-driven application based on your business case.

<div align="center">
 <img src=https://github.com/Avaiga/taipy-getting-started-core/blob/develop/step_00/imd_end_interface.png width=700>
</div>

# Taipy Core

Taipy Core is the Taipy component that handles pipeline orchestration. There are several reasons to use it:

- Taipy Core efficiently manages the execution of your functions/pipelines.

- Taipy Core manages data sources and monitors KPIs.

- Taipy Core provides easy management of multiple pipelines and end-user scenarios, which comes in handy in the context of Machine Learning or Mathematical optimization.

To understand the Scenario Management aspect of Taipy, you need to grasp four essential concepts.

Each step of the **"Getting Started"** will focus on basic concepts of *Taipy*. Note that every step is dependent on 
the code of the previous one. After completing the last step, you will have the skills to develop your own Taipy 
application. 

## Before we begin

Only Taipy has to be installed. The **Taipy** package requires Python 3.8 or newer.



In [0]:
# !pip install taipy


## Using Notebooks


# Configuration and execution
## Four fundamental concepts in Taipy Core:
- Data Nodes: are the translation of variables in Taipy. Data Nodes don't contain the data but know how to retrieve it. They can refer to any data: any Python object (string, int, list, dict, model, dataframe, etc), a Pickle file, a CSV file, an SQL database, etc. They know how to read and write data. You can even write your own custom Data Node if needed to access a particular data format.

- Tasks: are the translation of functions in Taipy.

- Pipelines: are a list of tasks executed with intelligent scheduling created automatically by Taipy. They usually represent a sequence of Tasks/functions corresponding to different algorithms like a simple baseline Algorithm or a more sophisticated Machine-Learning pipeline.

- Scenarios: End users often need to modify parameters to reflect different business situations. Taipy Scenarios provide the framework to execute pipelines under different conditions (i.e., with data or parameters modified by the end user).
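
To make these concepts concrete, here is a minimal plain-Python sketch. It does not use Taipy: `ToyDataNode`, `ToyTask`, and `run_pipeline` are hypothetical names invented for illustration, and Taipy's real entities do far more (storage, scheduling, and so on).

```python
class ToyDataNode:
    """Hypothetical stand-in for a Data Node: it knows how to read and write a value."""

    def __init__(self, name, default=None):
        self.name = name
        self._value = default

    def read(self):
        return self._value

    def write(self, value):
        self._value = value


class ToyTask:
    """Hypothetical stand-in for a Task: a function wired to input and output nodes."""

    def __init__(self, function, inputs, output):
        self.function = function
        self.inputs = inputs
        self.output = output

    def run(self):
        # Read every input, call the function, store the result in the output node.
        args = [node.read() for node in self.inputs]
        self.output.write(self.function(*args))


def run_pipeline(tasks):
    """In this sketch, a pipeline is just an ordered list of tasks run in sequence."""
    for task in tasks:
        task.run()


def double(nb):
    return nb * 2


input_node = ToyDataNode("input", default=21)
output_node = ToyDataNode("output")
run_pipeline([ToyTask(double, [input_node], output_node)])
print(output_node.read())  # 42
```

In this picture, a scenario is one such set of nodes and tasks run with a particular input.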


## What is a configuration?

A configuration is the structure, or model, of a scenario. It describes the Directed Acyclic Graph (DAG) of tasks and Data Nodes, as well as how the data is stored and how the code is run. Because Taipy can create multiple instances of this structure with different data, we need a way to define it, and that is the purpose of the configuration step.
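
The difference between a configuration and its instances can be pictured with a small plain-Python sketch. Nothing here is Taipy code: `scenario_template` and `instantiate` are hypothetical names, and the dictionary layout is an assumption chosen only to show that each instance gets its own copy of the data.

```python
import copy
import itertools

# Hypothetical configuration: a plain description, not a running object.
scenario_template = {
    "data_nodes": {"input": 21, "output": None},
    "tasks": [("double", "input", "output")],
}

_ids = itertools.count(1)


def instantiate(template):
    """Create an independent instance with its own copy of the data."""
    return {
        "id": f"scenario_{next(_ids)}",
        "data": copy.deepcopy(template["data_nodes"]),
    }


first = instantiate(scenario_template)
second = instantiate(scenario_template)

# Changing one instance leaves the other instance and the template untouched.
second["data"]["input"] = 54
print(first["data"]["input"], second["data"]["input"])  # 21 54
```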


Let's create our first configuration and then create our entities to submit through Taipy Studio or direct Python Code.



In [1]:
from taipy import Config
import taipy as tp

# Normal function used by Taipy
def double(nb):
    return nb * 2



<div align="center">
 <img src=https://github.com/Avaiga/taipy-getting-started-core/blob/develop/step_01/config_01.svg width=700>
</div>

- Two Data Nodes are configured ('input' and 'output'). The 'input' Data Node has its _default_data_ set to 21. Both are stored as Pickle files by default and are unique to their scenario.

- The task links the two Data Nodes through the Python function _double_.

- The pipeline will contain this one task, and the scenario will contain this one pipeline.

<div align="center">
 <img src=https://github.com/Avaiga/taipy-getting-started-core/blob/develop/step_01/config_01.gif width=700>
</div>



Here is the code to configure a simple scenario.



In [2]:
# Configuration of Data Nodes
input_data_node_cfg = Config.configure_data_node("input", default_data=21)
output_data_node_cfg = Config.configure_data_node("output")

# Configuration of tasks
task_cfg = Config.configure_task("double",
                                 double,
                                 input_data_node_cfg,
                                 output_data_node_cfg)

# Configuration of the pipeline and scenario
pipeline_cfg = Config.configure_pipeline("my_pipeline", [task_cfg])
scenario_cfg = Config.configure_scenario("my_scenario", [pipeline_cfg])


In [3]:
# Run of the Core
tp.Core().run()

# Creation of the scenario and execution
scenario = tp.create_scenario(scenario_cfg)
tp.submit(scenario)

print("Value at the end of task", scenario.output.read())


Results:
```
[2022-12-22 16:20:02,740][Taipy][INFO] job JOB_double_699613f8-7ff4-471b-b36c-d59fb6688905 is completed.
Value at the end of task 42
```


# Basic functions

Let's discuss the basic functions that come along with Taipy.

- `<Data Node>.write(<new value>)`: this is how data is changed through Taipy. _write_ updates the _last_edit_date_ of the Data Node, which influences whether a task can be skipped.

- `tp.get_scenarios()`: this function returns a list of all the scenarios

- `tp.get(<Taipy object ID>)`: this function returns an entity based on the id of the entity

- `tp.delete(<Taipy object ID>)`: this function deletes the entity and nested elements based on the id of the entity
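
The link between _write_ and task skipping can be pictured with a simplified rule: a task needs to run again when any of its inputs was edited after its output. The sketch below is plain Python with a hypothetical helper (`needs_rerun`). It is an assumed rule for illustration and does not reproduce Taipy's actual skipping logic.

```python
from datetime import datetime, timedelta


def needs_rerun(input_edit_dates, output_edit_date):
    """Return True when the output is missing or older than any input.

    Simplified, assumed rule for illustration only.
    """
    if output_edit_date is None:
        return True
    return any(edited > output_edit_date for edited in input_edit_dates)


t0 = datetime(2022, 12, 22, 16, 20, 0)
input_edited = t0
output_edited = t0 + timedelta(seconds=1)  # output produced after the input

print(needs_rerun([input_edited], output_edited))  # False: output is up to date

# A write on the input moves its edit date forward.
input_edited = t0 + timedelta(seconds=5)
print(needs_rerun([input_edited], output_edited))  # True: output is stale
```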

## Utility of having scenarios

Taipy lets the user create multiple instances of the same configuration. Data can differ between instances and can be used to compare different scenarios.

Data can naturally differ depending on the input Data Nodes or the randomness of functions. Moreover, the user can change them with the _write_ function.

<div align="center">
 <img src=https://github.com/Avaiga/taipy-getting-started-core/blob/develop/step_02/config_02.svg width=700>
</div>



In [4]:
scenario = tp.create_scenario(scenario_cfg, name="Scenario")
tp.submit(scenario)
print("First submit", scenario.output.read())


Results:
```
[2022-12-22 16:20:02,874][Taipy][INFO] job JOB_double_a5ecfa4d-1963-4776-8f68-0859d22970b9 is completed.
First submit 42
```

## _write_ function

Using _write_, data of a Data Node can be changed. The syntax is `<Scenario>.<Pipeline>.<Data Node>.write(value)`. If there is just one pipeline, we can just write `<Scenario>.<Data Node>.write(value)`.




In [5]:
print("Before write", scenario.input.read())
scenario.input.write(54)
print("After write",scenario.input.read())



Results:
```
Before write 21
After write 54
```

The submission of the scenario will update the output values.




In [6]:
tp.submit(scenario)
print("Second submit",scenario.output.read())


Results:
```
[2022-12-22 16:20:03,011][Taipy][INFO] job JOB_double_7eee213f-062c-4d67-b0f8-4b54c04e45e7 is completed.
Second submit 108
```

## Other useful functions

- how to access all the scenarios



In [7]:
print([s.input.read() for s in tp.get_scenarios()])



Results:
```
[21, 54]
```

- get an entity from its id



In [8]:
scenario = tp.get(scenario.id)



- delete an entity through its id. Example: how to delete a scenario.



In [9]:
tp.delete(scenario.id)



# Different types of Data Nodes:

- *[Pickle](https://docs.taipy.io/en/latest/manuals/core/config/data-node-config/#pickle)* (default): Taipy can store and read any kind of serializable data.

- *[CSV](https://docs.taipy.io/en/latest/manuals/core/config/data-node-config/#csv)*: Taipy can read and store any dataframe as a CSV.

- *[JSON](https://docs.taipy.io/en/latest/manuals/core/config/data-node-config/#json)*: Taipy can read and store any JSONable data as a JSON file.

- *[SQL](https://docs.taipy.io/en/latest/manuals/core/config/data-node-config/#sql)*: Taipy can read and store a table or database.

- *[Generic](https://docs.taipy.io/en/latest/manuals/core/config/data-node-config/#generic)*: Taipy provides a generic Data Node that can read and store any data based on the reading and writing functions provided by the user.
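
For the Generic Data Node, the user supplies the reading and writing functions. As a rough sketch of what such a pair could look like, the hypothetical functions below store a list of numbers in a plain text file, one value per line. The function names, the file format, and the path are assumptions for illustration. How these functions are passed to the configuration is described in the Generic Data Node documentation linked above.

```python
import os
import tempfile


def write_numbers(numbers, path):
    """Hypothetical writing function: one number per line."""
    with open(path, "w", encoding="utf-8") as f:
        for number in numbers:
            f.write(f"{number}\n")


def read_numbers(path):
    """Hypothetical reading function: parse each non-empty line as a float."""
    with open(path, encoding="utf-8") as f:
        return [float(line) for line in f if line.strip()]


# Round trip through a temporary file to check that both functions agree.
numbers_path = os.path.join(tempfile.mkdtemp(), "numbers.txt")
write_numbers([1, 2.5, 3], numbers_path)
print(read_numbers(numbers_path))  # [1.0, 2.5, 3.0]
```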

The execution graph used to explain the different concepts is quite simple.

1) Three Data Nodes:
- _historical_data_: initial CSV DataFrame
- _month_data_: DataFrame after filtering on the month (a _pandas.DataFrame_ stored as a Pickle file)
- _nb_of_values_: number of values in this month (an int stored as a Pickle file)

2) Two tasks linking these Data Nodes:
- _filter_current_: keeps the rows of the DataFrame that belong to the current month
- _count_values_: calculates the number of elements in this month

3) One pipeline in a scenario gathering these two tasks.
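
The configuration in this step reads a file named `time_series.csv`, and the functions expect a `Date` column. If you do not have the file at hand, the sketch below writes a small stand-in using only the standard library. The `Value` column, the date range, the values, and the file name `sample_time_series.csv` are all made up for illustration. They do not match the dataset used to produce the results shown in this guide.

```python
import csv
import datetime as dt
import os
import tempfile


def write_sample_csv(path, start, days):
    """Write one row per day with a 'Date' column and a made-up 'Value' column."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Date", "Value"])
        for offset in range(days):
            day = start + dt.timedelta(days=offset)
            writer.writerow([day.isoformat(), offset % 7])


# Written to the temporary directory so that no existing file is overwritten.
sample_path = os.path.join(tempfile.gettempdir(), "sample_time_series.csv")
write_sample_csv(sample_path, dt.date(2022, 9, 1), days=61)
print("Sample written to", sample_path)
```

To try it with the configuration of this step, you could point `default_path` at the printed location.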

<div align="center">
 <img src=https://github.com/Avaiga/taipy-getting-started-core/blob/develop/step_03/config_03.svg width=700>
</div>



In [10]:
import datetime as dt
import pandas as pd


In [11]:
def filter_current(df):
    current_month = dt.datetime.now().month
    df['Date'] = pd.to_datetime(df['Date']) 
    df = df[df['Date'].dt.month == current_month]
    return df

def count_values(df):
    return len(df)




<div align="center">
 <img src=https://github.com/Avaiga/taipy-getting-started-core/blob/develop/step_03/config_03.gif width=700>
</div>





In [12]:
# here is a CSV Data Node
historical_data_cfg = Config.configure_csv_data_node(id="historical_data",
                                                     default_path="time_series.csv")
month_values_cfg =  Config.configure_data_node(id="month_data")
nb_of_values_cfg = Config.configure_data_node(id="nb_of_values")


In [13]:
task_filter_current_cfg = Config.configure_task(id="filter_current",
                                                 function=filter_current,
                                                 input=historical_data_cfg,
                                                 output=month_values_cfg)

task_count_values_cfg = Config.configure_task(id="count_values",
                                                 function=count_values,
                                                 input=month_values_cfg,
                                                 output=nb_of_values_cfg)


In [14]:
pipeline_cfg = Config.configure_pipeline(id="my_pipeline",
                                         task_configs=[task_filter_current_cfg,
                                                       task_count_values_cfg])

scenario_cfg = Config.configure_scenario(id="my_scenario",
                                         pipeline_configs=[pipeline_cfg])

#scenario_cfg = Config.configure_scenario_from_tasks(id="my_scenario",
#                                                    task_configs=[task_filter_current_cfg,
#                                                                  task_count_values_cfg])


In [15]:
tp.Core().run()

scenario_1 = tp.create_scenario(scenario_cfg, creation_date=dt.datetime(2022,10,7), name="Scenario 2022/10/7")
scenario_1.submit()

scenario_2 = tp.create_scenario(scenario_cfg, creation_date=dt.datetime(2022,10,7), name="Scenario 2022/10/7")
scenario_2.submit()


Results:
```
[2022-12-22 16:20:03,424][Taipy][INFO] job JOB_filter_current_257edf8d-3ca3-46f5-aec6-c8a413c86c43 is completed.
[2022-12-22 16:20:03,510][Taipy][INFO] job JOB_count_values_90c9b3c7-91e7-49ef-9064-69963d60f52a is completed.
[2022-12-22 16:20:03,755][Taipy][INFO] job JOB_filter_current_4adc91ee-cd64-4ebf-819b-8643da0282fd is completed.
[2022-12-22 16:20:03,901][Taipy][INFO] job JOB_count_values_968c8c34-2ed4-4f89-995c-a4137af82beb is completed.
```



# Cycles

So far, we have talked about how having different scenarios helps us to oversee our assumptions about the future. For example, in business, it is critical to weigh different options to come up with an optimal solution. However, this decision-making process isn't just a one-time task but rather a recurrent operation that happens over a time period. This is why we want to introduce Cycles.

A cycle can be thought of as a place to store different and recurrent scenarios within a time frame. In Taipy Core, each Cycle will have a unique primary scenario representing the reference scenario for a time period.


In this step's example, scenarios are attached to a MONTHLY cycle. Cycles are useful because Taipy provides dedicated functions to navigate through them. Given a Cycle, Taipy can return all the scenarios created in that month. You can also retrieve every primary scenario ever made to quickly see how they evolve over time.
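
As a rough mental model, a monthly cycle groups scenarios by the year and month of their creation date. The plain-Python sketch below illustrates that grouping with the dates used in this guide. `monthly_cycle_key` is a hypothetical helper, and Taipy's own Cycle entities are not built this way.

```python
import datetime as dt
from collections import defaultdict

# (name, creation date) pairs matching the dates used in this guide.
scenarios = [
    ("Scenario 2022/10/7", dt.datetime(2022, 10, 7)),
    ("Scenario 2022/10/5", dt.datetime(2022, 10, 5)),
    ("Scenario 2021/9/1", dt.datetime(2021, 9, 1)),
]


def monthly_cycle_key(creation_date):
    """Hypothetical key identifying the month a scenario belongs to."""
    return (creation_date.year, creation_date.month)


cycles = defaultdict(list)
for name, created in scenarios:
    cycles[monthly_cycle_key(created)].append(name)

for key, names in sorted(cycles.items()):
    # In this sketch, the first scenario created in a cycle is treated as primary.
    print(key, "primary:", names[0], "all:", names)
```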




In [16]:
from taipy.core.config import Frequency

def filter_by_month(df, month):
    df['Date'] = pd.to_datetime(df['Date']) 
    df = df[df['Date'].dt.month == month]
    return df

def count_values(df):
    return len(df)



<div align="center">
 <img src=https://github.com/Avaiga/taipy-getting-started-core/blob/develop/step_04/config_04.svg width=700>
</div>





In [17]:
historical_data_cfg = Config.configure_csv_data_node(id="historical_data",
                                                     default_path="time_series.csv")
month_cfg =  Config.configure_data_node(id="month")
month_values_cfg =  Config.configure_data_node(id="month_data")
nb_of_values_cfg = Config.configure_data_node(id="nb_of_values")


task_filter_by_month_cfg = Config.configure_task(id="filter_by_month",
                                                 function=filter_by_month,
                                                 input=[historical_data_cfg, month_cfg],
                                                 output=month_values_cfg)

task_count_values_cfg = Config.configure_task(id="count_values",
                                              function=count_values,
                                              input=month_values_cfg,
                                              output=nb_of_values_cfg)

pipeline_cfg = Config.configure_pipeline(id="my_pipeline",
                                         task_configs=[task_filter_by_month_cfg,
                                                       task_count_values_cfg])

scenario_cfg = Config.configure_scenario(id="my_scenario",
                                         pipeline_configs=[pipeline_cfg],
                                         frequency=Frequency.MONTHLY)


#scenario_cfg = Config.configure_scenario_from_tasks(id="my_scenario",
#                                                    task_configs=[task_filter_by_month_cfg,
#                                                                  task_count_values_cfg])





As you can see, a Cycle is easy to set up once you have chosen the desired frequency. In this snippet, since we have specified `frequency=Frequency.MONTHLY`, each scenario is automatically attached to the correct period (month) when it is created.





In [18]:
tp.Core().run()

scenario_1 = tp.create_scenario(scenario_cfg,
                                creation_date=dt.datetime(2022,10,7),
                                name="Scenario 2022/10/7")
scenario_2 = tp.create_scenario(scenario_cfg,
                                creation_date=dt.datetime(2022,10,5),
                                name="Scenario 2022/10/5")



Scenarios 1 and 2 belong to the same cycle, but they don't share the same Data Nodes. Each scenario has its own.




In [19]:
scenario_1.month.write(10)
scenario_2.month.write(10)


print("Month Data Node of Scenario 1", scenario_1.month.read())
print("Month Data Node of Scenario 2", scenario_2.month.read())

scenario_1.submit()
scenario_2.submit()




Results:
```
Month Data Node of Scenario 1 10
Month Data Node of Scenario 2 10
[2022-12-22 16:20:04,746][Taipy][INFO] job JOB_filter_by_month_a4d3c4a7-5ec9-4cca-8a1b-578c910e255a is completed.
[2022-12-22 16:20:04,833][Taipy][INFO] job JOB_count_values_a81b2f60-e9f9-4848-aa58-272810a0b755 is completed.
[2022-12-22 16:20:05,026][Taipy][INFO] job JOB_filter_by_month_22a3298b-ac8d-4b55-b51f-5fab0971cc9e is completed.
[2022-12-22 16:20:05,084][Taipy][INFO] job JOB_count_values_a52b910a-4024-443e-8ea2-f3cdda6c1c9d is completed.
```

## Primary scenarios

Each cycle has a primary scenario: the reference scenario of the cycle. By default, the first scenario created for a cycle is primary. `tp.set_primary(<Scenario>)` changes which scenario is primary in a cycle, and `<Scenario>.is_primary` returns a boolean indicating whether the scenario is primary.
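
The rule that a cycle has exactly one primary scenario can be sketched in a few lines of plain Python. The `set_primary` function below is a toy written for this illustration, not Taipy's `tp.set_primary`, and the dictionary representation of a cycle is an assumption.

```python
def set_primary(cycle, scenario_name):
    """Mark one scenario as primary and unmark every other one in the same cycle.

    `cycle` is a hypothetical dict mapping scenario names to an is_primary flag.
    """
    if scenario_name not in cycle:
        raise KeyError(scenario_name)
    for name in cycle:
        cycle[name] = (name == scenario_name)


# The first scenario created in the cycle starts as primary.
october = {"scenario_1": True, "scenario_2": False}
set_primary(october, "scenario_2")
print(october)  # {'scenario_1': False, 'scenario_2': True}
```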



In [20]:
print("Scenario 1 before", scenario_1.is_primary)
print("Scenario 2 before", scenario_2.is_primary)

tp.set_primary(scenario_2)


print("Scenario 1 after", scenario_1.is_primary)
print("Scenario 2 after", scenario_2.is_primary)


Results:

```
Scenario 1 before True
Scenario 2 before False
Scenario 1 after False
Scenario 2 after True
```

Scenario 3 will be alone in another Cycle due to its creation date and will be the default primary scenario.



In [21]:
scenario_3 = tp.create_scenario(scenario_cfg,
                                creation_date=dt.datetime(2021,9,1),
                                name="Scenario 2021/9/1")
scenario_3.month.write(9)
scenario_3.submit()

print("Is scenario 3 primary?", scenario_3.is_primary)



Results:

```
[2022-12-22 16:20:05,317][Taipy][INFO] job JOB_filter_by_month_8643e5cf-e863-434f-a1ba-18222d6faab8 is completed.
[2022-12-22 16:20:05,376][Taipy][INFO] job JOB_count_values_72ab71be-f923-4898-a8a8-95ec351c24d9 is completed.

Is scenario 3 primary? True
```

As you can see, every scenario has been submitted and executed entirely, even though the results of these tasks are all the same. Caching helps skip such redundant tasks.

## Useful functions concerning cycles

- `tp.get_primary_scenarios()`: returns a list of all primary scenarios

- `tp.get_scenarios(cycle=<Cycle>)`: returns all the scenarios in the cycle

- `tp.get_cycles()`: returns the list of cycles

- `tp.get_primary(<Cycle>)`: returns the primary scenario of the cycle



# Scoping 

Scoping determines how Data Nodes are shared between cycles, scenarios, and pipelines. Indeed, multiple scenarios can have their own Data Nodes or share the same one. For example, the initial/historical dataset is usually shared by all the scenarios/pipelines/cycles. It has a Global Scope and will be unique in the entire application.




In [22]:
from taipy.core.config import Scope


In [23]:
def filter_by_month(df, month):
    df['Date'] = pd.to_datetime(df['Date']) 
    df = df[df['Date'].dt.month == month]
    return df

def count_values(df):
    return len(df)



- **Pipeline** scope: two pipelines can reference different Data Nodes even if their names are the same. For example, we can have a _prediction_ Data Node of an ARIMA model (ARIMA pipeline) and a _prediction_ Data Node of a RandomForest model (RandomForest pipeline). A scenario can contain multiple pipelines.

- **Scenario** scope: pipelines share the same Data Node within a scenario. 

- **Cycle** scope: scenarios from the same cycle share the same Data Node.

- **Global** scope: unique Data Node for all the scenarios/pipelines/cycles.
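
One way to picture the four scopes is as the key under which a Data Node is stored: the wider the scope, the fewer identifiers are part of the key, so more entities end up sharing the same node. The sketch below is plain Python with hypothetical names (`ToyScope`, `sharing_key`). It is only a mental model, not how Taipy stores Data Nodes.

```python
from enum import Enum


class ToyScope(Enum):
    """Hypothetical mirror of the four scopes described above."""
    PIPELINE = 1
    SCENARIO = 2
    CYCLE = 3
    GLOBAL = 4


def sharing_key(config_id, scope, pipeline, scenario, cycle):
    """Return the key of a Data Node; equal keys mean the node is shared."""
    if scope is ToyScope.GLOBAL:
        return (config_id,)
    if scope is ToyScope.CYCLE:
        return (config_id, cycle)
    if scope is ToyScope.SCENARIO:
        return (config_id, cycle, scenario)
    return (config_id, cycle, scenario, pipeline)


# Two scenarios in the same monthly cycle.
s1 = dict(pipeline="my_pipeline", scenario="scenario_1", cycle="2022-10")
s2 = dict(pipeline="my_pipeline", scenario="scenario_2", cycle="2022-10")

# A Cycle-scoped node is shared by both scenarios.
month_shared = sharing_key("month", ToyScope.CYCLE, **s1) == sharing_key("month", ToyScope.CYCLE, **s2)
print(month_shared)  # True

# A Scenario-scoped node (scope assumed here for illustration) is not shared.
result_shared = sharing_key("nb_of_values", ToyScope.SCENARIO, **s1) == sharing_key("nb_of_values", ToyScope.SCENARIO, **s2)
print(result_shared)  # False
```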




In [24]:
historical_data_cfg = Config.configure_csv_data_node(id="historical_data",
                                                 default_path="time_series.csv",
                                                 scope=Scope.GLOBAL)
month_cfg =  Config.configure_data_node(id="month", scope=Scope.CYCLE)

month_values_cfg = Config.configure_data_node(id="month_data",
                                               scope=Scope.CYCLE)
nb_of_values_cfg = Config.configure_data_node(id="nb_of_values")


task_filter_by_month_cfg = Config.configure_task(id="filter_by_month",
                                                 function=filter_by_month,
                                                 input=[historical_data_cfg,month_cfg],
                                                 output=month_values_cfg)

task_count_values_cfg = Config.configure_task(id="count_values",
                                                 function=count_values,
                                                 input=month_values_cfg,
                                                 output=nb_of_values_cfg)

pipeline_cfg = Config.configure_pipeline(id="my_pipeline",
                                         task_configs=[task_filter_by_month_cfg,
                                                       task_count_values_cfg])

scenario_cfg = Config.configure_scenario(id="my_scenario",
                                         pipeline_configs=[pipeline_cfg],
                                         frequency=Frequency.MONTHLY)


#scenario_cfg = Config.configure_scenario_from_tasks(id="my_scenario",
#                                                    task_configs=[task_filter_by_month_cfg,
#                                                                  task_count_values_cfg])


In [25]:
tp.Core().run()

scenario_1 = tp.create_scenario(scenario_cfg,
                                creation_date=dt.datetime(2022,10,7),
                                name="Scenario 2022/10/7")
scenario_2 = tp.create_scenario(scenario_cfg,
                               creation_date=dt.datetime(2022,10,5),
                               name="Scenario 2022/10/5")
scenario_3 = tp.create_scenario(scenario_cfg,
                                creation_date=dt.datetime(2021,9,1),
                                name="Scenario 2021/9/1")



Scenarios 1 and 2 belong to the same cycle, so the month only needs to be defined once for both of them because _month_ has a Cycle scope.

<div align="center">
 <img src=https://github.com/Avaiga/taipy-getting-started-core/blob/develop/step_05/sommething.svg width=700>
</div>




In [26]:
scenario_1.month.write(10)
print("Scenario 1: month", scenario_1.month.read())
print("Scenario 2: month", scenario_2.month.read())


Results:
```
Scenario 1: month 10
Scenario 2: month 10
```




In [27]:
print("\nScenario 1: submit")
scenario_1.submit()
print("Value", scenario_1.nb_of_values.read())



Results:
```
Scenario 1: submit
[2022-12-22 16:20:05,810][Taipy][INFO] job JOB_filter_by_month_d71cfd10-f674-40c8-b7a5-c66bea8773ef is completed.
[2022-12-22 16:20:05,902][Taipy][INFO] job JOB_count_values_cbe0b3b3-2531-440a-9413-48845c9cfdf1 is completed.
Value 849
```  




In [28]:
print("Scenario 2: submit")
scenario_2.submit()
print("Value", scenario_2.nb_of_values.read())



Results:
```
Scenario 2: submit
[2022-12-22 16:20:06,356][Taipy][INFO] job JOB_filter_by_month_705fcb69-64fc-4f66-a5f3-90169e09f8bf is completed.
[2022-12-22 16:20:06,426][Taipy][INFO] job JOB_count_values_5a8eea88-4477-48ab-a401-8036bada2267 is completed.
Value 849
```




In [29]:
print("\nScenario 3: submit")
scenario_3.month.write(9)
scenario_3.submit()
print("Value", scenario_3.nb_of_values.read())
