# Data Module
The data module helps you build complex data pipelines simply. Here, "simply" means that the code stays readable and easy to maintain, thanks to the modularity of the pipelines built with this module.

## Overview
A data pipeline built with this module is assembled like a pipeline in a shell console.  It is composed of multiple sub-scripts that are chained (piped) together with the pipe ```|``` operator.  This way, the sub-scripts can be reused across multiple data pipes, which helps you build new data pipes more quickly by composing existing code.
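
To make the idea of piping concrete, here is a minimal sketch of how a `|` operator can chain small steps in plain Python.  The `Stage` class and the two helper functions are hypothetical names used for illustration only; this is not how the data module itself is implemented.

```python
class Stage:
    """A tiny composable stage: wraps a function and chains with `|`."""

    def __init__(self, func, name=None):
        self.func = func
        self.name = name or func.__name__

    def __or__(self, other):
        # `a | b` builds a new Stage that feeds a's output into b.
        return Stage(lambda value: other.func(self.func(value)),
                     name=f"{self.name} -> {other.name}")

    def run(self, value):
        return self.func(value)


def strip_blank(lines):
    # Drop lines that contain only whitespace.
    return [line for line in lines if line.strip()]


def to_upper(lines):
    return [line.upper() for line in lines]


clean = Stage(strip_blank) | Stage(to_upper)
print(clean.name)                    # strip_blank -> to_upper
print(clean.run(["a", "  ", "b"]))   # ['A', 'B']
```

Because the result of `a | b` is itself a `Stage`, it can be piped again, which is the property that makes small steps reusable across pipelines.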

## Anatomy of a data pipeline
The root class of the data pipelines is ```DataPipe```.  This class is recursive and composable: a single small script that performs an elementary task is a ```DataPipe```, and a composition of multiple elementary scripts is also a ```DataPipe```.  A data pipeline is therefore built by composing multiple elementary pipes, which are split into four categories:

- ```Fetch``` pipes fetch data from an external source: the internet, a database, a file, etc.
- ```Process``` pipes transform the data coming from the upstream pipe, for example to change the data structure, impute NaNs, or filter the data.
- ```Collate``` pipes are used to merge two branches of a pipeline.  For example, say you have one pipeline that fetches chart data from one data source and another that fetches fundamental data formatted as reports.  Each kind of raw data needs to be processed differently, so each has its own subpipe.  However, they need to be combined at the end to form a single dataset.  This is where a ```Collate``` pipe comes in handy: it can align the two series and merge the output of the two subpipes into one pipe.
- ```Cache``` pipes cache the output of a data pipeline so that the wrapped pipeline section runs again only when the cache has expired or needs to be revalidated.  Otherwise, the cached data is returned.  The default cache pipe supports multiple ways of revalidating the cache.
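
As an illustration of what "aligning two series" can mean for a Collate step, the sketch below keeps only the dates that two inputs have in common.  It uses plain dictionaries keyed by ISO date strings; the function name and the sample data are made up for the example and are not part of the data module.

```python
def collate_on_dates(prices, reports):
    """Keep only the dates present in both inputs and pair their values."""
    common = sorted(set(prices) & set(reports))
    return {day: (prices[day], reports[day]) for day in common}


# Hypothetical outputs of two independent branches.
prices = {"2024-01-02": 101.5, "2024-01-03": 102.0, "2024-01-04": 99.8}
reports = {"2024-01-03": "Q4 report", "2024-01-04": "guidance update"}

merged = collate_on_dates(prices, reports)
print(merged)
# {'2024-01-03': (102.0, 'Q4 report'), '2024-01-04': (99.8, 'guidance update')}
```

An inner join on dates is only one possible alignment policy; a real Collate pipe could just as well keep every date and fill the gaps.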

## Basic Example
The following example shows how to build a simple pipe that fetches chart data for a ticker using the yfinance API.

In [8]:
# Import the pipe
from backtest.data import FetchCharts
# Build the pipe
pipe = FetchCharts(["NVDA"])
# Print the representation of the pipe to have a clear view of what it will do.
print(pipe)

┌ DataPipe(DataPipeType.FETCH, FetchCharts) ┐
│                                           │
│ FetchCharts                               │
│                                           │
└───────────────────────────────────────────┘


As we can see in the previous example, our pipe consists of a single elementary pipe, which fetches the charts of the tickers given as parameters.  For now, the pipe is only built; it hasn't run.  To run it, we simply call the ```get``` method and pass the ```frm``` and ```to``` parameters, which are datetimes.

In [7]:
from datetime import datetime
my_charts = pipe.get(frm=datetime(2022, 1, 1), to=datetime(2024,1,1))
print(f"The type of the output is:  {type(my_charts)}")
print(f"The keys are:               {my_charts.keys()}")
print(f"The type of the values are: {type(my_charts['NVDA'])}")

The type of the output is:  <class 'dict'>
The keys are:               dict_keys(['NVDA'])
The type of the values are: <class 'pandas.core.frame.DataFrame'>


---
Let's make the example more complex by adding elementary pipes to our pipe.  We will fetch the charts, drop the charts that are None (which means the asset didn't exist at the requested time), and impute the NaN values with the previous value.

In [14]:
from backtest.data import FilterNoneCharts, CausalImpute

pipe = FetchCharts(["NVDA", "AAPL", "TXYZ"]) | FilterNoneCharts() | CausalImpute()
# Let's see what the pipe is doing
print(pipe)

# Let's fetch the data
data = pipe.get(frm=datetime(2022, 1, 1), to=datetime(2024,1,1))
print(f"The type of the output is:  {type(data)}")
print(f"The keys are:               {data.keys()}")
print(f"The type of the values are: {type(data['NVDA'])}")

TXYZ: No timezone found, symbol may be delisted


┌ DataPipe(DataPipeType.PROCESS, CausalImpute) ───┐
│                                                 │
│ FetchCharts -> FilterNoneCharts -> CausalImpute │
│                                                 │
└─────────────────────────────────────────────────┘
The type of the output is:  <class 'dict'>
The keys are:               dict_keys(['NVDA', 'AAPL'])
The type of the values are: <class 'pandas.core.frame.DataFrame'>


---
As you can see, the ```FilterNoneCharts``` pipe dropped the 'TXYZ' asset because it doesn't exist, so only the NVDA and AAPL charts were returned.  Without this elementary pipe, the output would have contained a 'TXYZ' key mapped to None.
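
The two processing steps used above can be pictured with a small, library-free sketch.  It works on plain lists instead of DataFrames, and the function names are stand-ins chosen for the example; the real `FilterNoneCharts` and `CausalImpute` pipes may behave differently in details.

```python
import math


def filter_none(charts):
    """Drop tickers whose chart is None (the asset did not exist)."""
    return {ticker: series for ticker, series in charts.items() if series is not None}


def causal_impute(series):
    """Replace each NaN with the most recent earlier value (forward fill)."""
    filled, last = [], math.nan
    for value in series:
        if not math.isnan(value):
            last = value
        filled.append(last)
    return filled


charts = {"NVDA": [10.0, math.nan, 12.0], "TXYZ": None}
cleaned = {t: causal_impute(s) for t, s in filter_none(charts).items()}
print(cleaned)  # {'NVDA': [10.0, 10.0, 12.0]}
```

The imputation is called causal because it only looks backwards in time: a NaN at the very start of a series has no earlier value and therefore stays NaN.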

**Questions**  
1.1 - Which of the four types presented earlier does the FetchCharts pipe belong to?  
1.2 - What is the type of the FilterNoneCharts pipe?

Now, let's build a real-world pipe that you might find useful.

In [21]:
from backtest.data import PadNan, ToTSData, Cache

# Build the pipe
pipe = FetchCharts(["NVDA", "AAPL", "MSFT"]) | FilterNoneCharts() | CausalImpute() | PadNan() | ToTSData() | Cache()
print(pipe)

# Fetch the data
data = pipe.get(frm=datetime(2022, 1, 1), to=datetime(2024,1,1))
print(f"The type of the output is:                     {type(data)}")
print(f"The length of the list is:                     {len(data)}")
print(f"The type of the element of the list is         {type(data[0])}")
print(f"The keys of the inner dict is                  {data[0].keys()}")
print(f"The type of the elements in the inner dict is: {type(data[0]['NVDA'])}")

┌ DataPipe(DataPipeType.CACHE, Cache) ───────────────────────────────────────────┐
│                                                                                │
│ FetchCharts -> FilterNoneCharts -> CausalImpute -> PadNan -> ToTSData -> Cache │
│                                                                                │
└────────────────────────────────────────────────────────────────────────────────┘
The type of the output is:                     <class 'list'>
The length of the list is:                     1
The type of the element of the list is         <class 'dict'>
The keys of the inner dict is                  dict_keys(['NVDA', 'AAPL', 'MSFT'])
The type of the elements in the inner dict is: <class 'backtest.engine.tsData.TSData'>


---
As you can see, we built a complex data structure with a single line of code.  As if that were not enough, the result was automatically cached, so we do not have to query the API again.  Now you might wonder what this output data structure is and why this pipe may be useful to you in the future.  Let's break it down!

#### The output
The final output is the data structure that the backtest object needs: the output of this pipe is the input of the backtest.  More specifically, the backtest object needs the data formatted as a list of dictionaries, where each index of the list corresponds to a specific time resolution (ours has a length of 1 because we only had a single time resolution, 1 day, which is the default).  The dictionaries have string keys and TSData object values.  The keys are the tickers of the assets, and the values are the charts wrapped in a TSData object, which contains metadata useful to the backtest engine.  Without the pipe, it would have been tedious to write a function that fetches the data, preprocesses it, and puts it in the right format.  Building this pipeline from reusable elementary pipes, however, is a breeze!

#### The pipes
Let's break down what each pipe does.  The first three pipes have already been covered, so we will start with the fourth one.

**PadNan**: This pipe ensures that every time series (charts in our case) has the same length by padding the start of the shorter charts with NaN.  The backtest object requires every series to have the same length.  
**ToTSData**: This pipe converts a dictionary of DataFrames to the data structure presented above.  It is recommended to always use this pipe if the pipeline is meant to prepare the data for the backtest object.  
**Cache**: Finally, the cache pipe caches the output of the pipe on the first run.  On subsequent runs, this pipe is called first and checks whether a cache file is associated with this data.  If so, it skips all of the pipes before it (from FetchCharts to ToTSData) and returns the cached data.  If other pipes had been added after the cache pipe, their output would not have been cached.  Because we didn't pass any parameters to the cache pipe, the data is cached indefinitely.  However, if you change the pipe structure or some values in the pipe (for example, adding a ticker to the list of tickers), the change is detected automatically and the whole pipe is fully revalidated.
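
To picture the padding described for PadNan, here is a minimal sketch on plain lists.  The function name and the "NEWCO" ticker are invented for the example, and the real pipe works on DataFrames, so treat this only as an illustration of the idea.

```python
import math


def pad_start_with_nan(series_by_ticker):
    """Left-pad every series with NaN so that all have the longest length."""
    longest = max(len(series) for series in series_by_ticker.values())
    return {
        ticker: [math.nan] * (longest - len(series)) + list(series)
        for ticker, series in series_by_ticker.items()
    }


# A long-listed asset next to a recently listed one (made-up data).
padded = pad_start_with_nan({"NVDA": [1.0, 2.0, 3.0], "NEWCO": [9.0]})
print({ticker: len(series) for ticker, series in padded.items()})
# {'NVDA': 3, 'NEWCO': 3}
```

Padding at the start rather than at the end keeps the most recent observations of every asset aligned on the same index.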

## Complex pipes
Prebuilt pipes are handy for simple pipelines.  But how can you build a complex pipeline that fetches data from multiple sources, transforms each of them independently, and finally merges everything together?  We'll see how to do this here.

### Building custom elementary pipes
As said earlier, there are four types of pipes, and each of them has some particularities to keep in mind when building a custom elementary pipe.  In this section, you will see how to create those custom pipes with examples.

#### Fetch
The fetch pipes are designed to fetch data from an external source.  However, to keep the examples simple, we will fetch the data from a global variable here.  There are two ways to build a custom fetch pipe.  The first and simplest one is to use a decorator.  The second one is to derive a class.  Here are the pros and cons of both methods:

**Decorator**  
Pros:
- Functional approach
- Simpler
- Quicker to code, with less boilerplate

Cons:
- Cannot receive parameters during initialization
- Cannot have a state

**Deriving a class**  
Pros:
- More flexibility
- Can receive parameters during initialization
- Can have a state

Cons:
- More boilerplate
- Might take longer to code

To make a custom pipe using the **decorator** technique, you only need to add the ```@Fetch``` decorator at the top of your function.  The function must have a particular signature.  It must take ```frm``` and ```to``` as positional parameters, which are datetimes.  This means that your pipe should always fetch data between those two datetimes to avoid unexpected results.  It must also have the ```po``` parameter, a keyword parameter that receives a ```PipeOutput```.  The ```PipeOutput``` object corresponds to the output of the previous pipe.  If the current pipe is the first of the pipeline, the ```po``` parameter will be ```None```.  Your function must also accept the positional and keyword arguments that can be passed to the pipeline, which is why ```*args``` and ```**kwargs``` are added.  Why this is required will be explained in more detail later.  Finally, if we print the type of the function, we find that it is no longer a function but a ```Fetch``` object.  We built a custom DataPipe!

In [33]:
from backtest.data import Fetch

MY_VALUES = [1.618, 2.71828, 3.1416, 42.]

@Fetch
def FetchDec(frm, to, *args, po, **kwargs):
    return MY_VALUES

print(f"The type of FetchDec is: {type(FetchDec)}\n")
print(FetchDec())

The type of FetchDec is: <class 'backtest.data.pipes.Fetch'>

┌ DataPipe(DataPipeType.FETCH, FetchDec) ┐
│                                        │
│ FetchDec                               │
│                                        │
└────────────────────────────────────────┘


---
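If it seems surprising that a decorated function stops being a function, the following library-free sketch shows the general Python mechanism: a class used as a decorator replaces the function with an instance of that class.  `FetchLike` is a hypothetical stand-in written for this illustration and is much simpler than the real `Fetch`.

```python
class FetchLike:
    """Stand-in decorator class: wraps a function in a pipe-like object."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __call__(self):
        # Calling the decorated name yields a pipe object, mirroring `FetchDec()`.
        # Returning self is a simplification made for this sketch.
        return self

    def get(self, frm, to, *args, **kwargs):
        # The first pipe of a pipeline has no upstream output, hence po=None.
        return self.func(frm, to, *args, po=None, **kwargs)


@FetchLike
def fetch_constants(frm, to, *args, po, **kwargs):
    return [1.618, 2.71828]


print(type(fetch_constants).__name__)       # FetchLike
print(fetch_constants().get(None, None))    # [1.618, 2.71828]
```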
Now, we will build a custom pipe using the **class derivation** technique.  To do so, you need to derive a class from the DataPipe base class (Fetch is only meant to be used as a decorator, and deriving from it isn't recommended).  We need to override the constructor and the fetch method.  The constructor can take as many parameters as we like.  We also need to initialize the super class by passing the pipe type and the name of the pipe; it is recommended to use the name of the class.  Then, we can override the fetch method and implement the logic there, as we did with the decorator technique.  However, it must return a PipeOutput, not an arbitrary object as in the decorator technique.  In addition, the PipeOutput must reference the current object, ```self```.

In [34]:
from backtest.data import DataPipe, DataPipeType, PipeOutput

class FetchClf(DataPipe):
    def __init__(self, my_param="I can receive params!"):
        super().__init__(DataPipeType.FETCH, "FetchClf")    # PipeType, Pipe Name
        self.my_param = my_param

    def fetch(self, frm, to, *args, po, **kwargs):
        print(self.my_param)
        return PipeOutput(MY_VALUES, self)

print(f"The type of FetchClf is: {type(FetchClf())}\n")
print(FetchClf())

The type of FetchClf is: <class '__main__.FetchClf'>

┌ DataPipe(DataPipeType.FETCH, FetchClf) ┐
│                                        │
│ FetchClf                               │
│                                        │
└────────────────────────────────────────┘


#### Process
Custom process pipes are made the same way as Fetch pipes: either use the ```Process``` decorator or, with the class derivation method, use the ```PROCESS``` data pipe type and override the ```process``` method instead of ```fetch```.

#### Collate
There are again two ways to make a custom collate pipe.  The first one is the **decorator**.  It is similar to the Fetch and Process syntax, but it takes two pipe outputs as input: ```po1``` and ```po2```.  In the following example, we make a Collate pipe that assumes the values of the pipe outputs are lists and concatenates them.

In [35]:
from backtest.data import Collate

@Collate
def CollateDec(frm, to, *args, po1, po2, **kwargs):
    return po1.value + po2.value

print(f"The type of CollateDec is: {type(CollateDec)}\n")
print(CollateDec(FetchDec(), FetchClf()))

The type of CollateDec is: <class 'backtest.data.pipes.Collate'>

┌ DataPipe(DataPipeType.COLLATE, CollateDec) ┐
│                                            │
│ FetchDec -> ┐                              │
│             │ -> CollateDec                │
│ FetchClf -> ┘                              │
│                                            │
└────────────────────────────────────────────┘


---
The **class derivation** method is trickier because we need to manually handle the pipe ids, which we haven't seen yet.  So, we will go through an example, and one part of the code will be explained later on.  As with a Fetch pipe, we need to derive from the DataPipe class and initialize the super class, this time with the COLLATE data pipe type.  We also need to register the two branches in a list of two pipes called ```_pipes```; the attribute must have exactly this name.  With this line, the DataPipe is able to handle both branches under the hood and pass their outputs to the collate method.  Finally, we can override the collate method and implement it as with the decorator technique.  However, note that it must return a PipeOutput that references the current object (```self```).

In [36]:
class CollateClf(DataPipe):
    def __init__(self, pipe1, pipe2):
        super().__init__(DataPipeType.COLLATE, "CollateClf")
        # We register the two branches in order for the DataPipe to handle them.
        self._pipes = [pipe1, pipe2]
        
        # Technical line explained later
        self._pipe_id = pipe2._increment_id(pipe1._pipe_id + 1)

    def collate(self, frm, to, *args, po1, po2, **kwargs):
        return PipeOutput(po1.value + po2.value, self)

print(f"The type of CollateClf is: {type(CollateClf)}\n")
print(CollateClf(FetchDec(), FetchClf()))

The type of CollateClf is: <class 'abc.ABCMeta'>

┌ DataPipe(DataPipeType.COLLATE, CollateClf) ┐
│                                            │
│ FetchDec -> ┐                              │
│             │ -> CollateClf                │
│ FetchClf -> ┘                              │
│                                            │
└────────────────────────────────────────────┘


#### Putting everything together
In the following example, we will see that our implementations using the decorator technique and the class derivation technique give the same result.

In [41]:
pipe_dec = CollateDec(FetchDec(), FetchDec())
dec_out = pipe_dec.get(None, None)
print(pipe_dec)
print("\n" + "="*100 + "\n")
pipe_clf = CollateClf(FetchClf(), FetchClf())
clf_out = pipe_clf.get(None, None)
print(pipe_clf)
print("\n" + "="*100 + "\n")
print(f"Decorator: {dec_out}")
print(f"Class derivation: {clf_out}")

┌ DataPipe(DataPipeType.COLLATE, CollateDec) ┐
│                                            │
│ FetchDec -> ┐                              │
│             │ -> CollateDec                │
│ FetchDec -> ┘                              │
│                                            │
└────────────────────────────────────────────┘


I can receive params!
I can receive params!
┌ DataPipe(DataPipeType.COLLATE, CollateClf) ┐
│                                            │
│ FetchClf -> ┐                              │
│             │ -> CollateClf                │
│ FetchClf -> ┘                              │
│                                            │
└────────────────────────────────────────────┘


Decorator: [1.618, 2.71828, 3.1416, 42.0, 1.618, 2.71828, 3.1416, 42.0]
Class derivation: [1.618, 2.71828, 3.1416, 42.0, 1.618, 2.71828, 3.1416, 42.0]


---
In the previous example, we can see that both methods give the same result.  The string 'I can receive params!' is printed twice because FetchClf is called twice (once per branch), and we built that pipe to print this string each time it is called.

## Deep dive into Caching
Implementing robust caching from scratch can be tedious.  This is why the datapipe API comes with built-in caching support.  By default, it supports simple caching and revalidation mechanisms, and it can be extended to handle arbitrarily complex ones.  There are again two ways to extend the caching mechanism.  The first one is the simplest but the least elegant: using a combination of decorator and callbacks.  If that is what you want to do, I recommend taking a look at the example in the docs (Cache object).  The other way is to derive the class.  We won't go into the technical details of building a custom caching mechanism in this tutorial since most people won't need to.  If you need to understand those mechanisms, I suggest reading the docs on the Cache object and checking the implementation of the JSONCache object in the source code.  I believe it is well documented and can serve as a good example.

Now, coming back to the tutorial, we will take a look at the two prebuilt caching pipes: how to use them, and a brief overview of how they work under the hood.

### How to interpret a pipe using caching
When a cache pipe is added to a datapipe, it automatically wraps everything to its left.  This means that the pipe ```fetch() | process() | cache()``` is equivalent to ```cache(process(fetch()))``` in functional notation.  In this example, the output of the process pipe is cached, and the process and fetch pipes will not run again while the cache is still valid.
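
The functional reading of a cached pipe can be tried out with ordinary functions.  In this sketch, `fetch`, `process`, and `cache` are plain Python stand-ins (not the library's pipes), and the cache never expires; it only shows that the wrapped steps run once.

```python
calls = []  # Records which steps actually ran.


def fetch():
    calls.append("fetch")
    return [3, 1, 2]


def process(values):
    calls.append("process")
    return sorted(values)


def cache(compute):
    """Wrap a zero-argument callable and remember its first result."""
    stored = {}

    def cached():
        if "value" not in stored:
            stored["value"] = compute()
        return stored["value"]

    return cached


pipeline = cache(lambda: process(fetch()))
first = pipeline()
second = pipeline()
print(first, second)  # [1, 2, 3] [1, 2, 3]
print(calls)          # ['fetch', 'process']
```

The second call returns the stored result without touching `fetch` or `process`, which is the behaviour described above for everything to the left of the cache pipe.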

### Cache and JSONCache
There are two types of prebuilt cache pipes: ```Cache``` and ```JSONCache```.  The first one caches the data in pickle format.  It is more flexible than JSON and can result in a smaller file size for big objects.  However, a pickle file is harder to inspect than a text file such as JSON.  This is why there is another cache pipe that stores the cache in JSON: ```JSONCache```.  Out of the box, JSON is limited to basic data types such as floats, strings, booleans, etc.  This is why we extended the JSON serialization mechanism to handle more complex data types and to be easily extensible to others.  By default, it supports several complex data types that you are likely to use, such as pandas DataFrames and numpy arrays.  Most objects are also serializable by default.  However, if you want a specific way to serialize your custom object, check out this example:

In [4]:
from backtest.data import json_extension as je
import json

class MyClass:
    def __init__(self, a, b):
        self.a = a
        self.b = b
    
    def __tojson__(self):
        return {"a": self.a, "b": self.b}
    
    @classmethod
    def __fromjson__(cls, d):
        return cls(d["a"], d["b"])
    def __repr__(self):
        return f"MyClass({self.a}, {self.b})"
    
je.add_types(MyClass)
obj = MyClass(1, 2)
print(json.dumps(obj, cls=je.JSONEncoder))
# To deserialize the object
d = '{"__TYPE__": "MyClass", "data": {"a": 1, "b": 2}}'
print(json.loads(d, cls=je.JSONDecoder))
# To unregister a type, use the remove_types function
je.remove_types(MyClass)

print() # For spacing in the output
# However, in this case, it wasn't necessary to implement a custom serializer because the serializer can handle it by default:
class MyClass2:
    def __init__(self, a, b):
        self.a = a
        self.b = b
    
    def __repr__(self):
        return f"MyClass2({self.a}, {self.b})"

obj = MyClass2(1, 2)
print(json.dumps(obj, cls=je.JSONEncoder))
# To deserialize the object
d = '{"__TYPE__": "MyClass2", "data": {"a": 1, "b": 2}}'
print(json.loads(d, cls=je.JSONDecoder))

{"__TYPE__": "MyClass", "data": {"a": 1, "b": 2}}
MyClass(1, 2)

{"__TYPE__": "MyClass2", "data": {"a": 1, "b": 2}}
MyClass2(1, 2)


---
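For readers curious about how such type-tagged JSON can work in general, here is a sketch built only on the standard `json` module.  It reuses the `__TYPE__`/`data` layout and the `__tojson__`/`__fromjson__` hooks seen above, but `REGISTRY`, `register`, `TaggedEncoder`, `tagged_hook`, and `Point` are names invented for this illustration; the actual `json_extension` implementation may differ.

```python
import json

REGISTRY = {}  # Maps a class name to the class, for decoding.


def register(cls):
    REGISTRY[cls.__name__] = cls
    return cls


class TaggedEncoder(json.JSONEncoder):
    def default(self, obj):
        # Called by json only for objects it cannot serialize natively.
        if type(obj).__name__ in REGISTRY:
            return {"__TYPE__": type(obj).__name__, "data": obj.__tojson__()}
        return super().default(obj)


def tagged_hook(d):
    # Called for every decoded JSON object, innermost first.
    cls = REGISTRY.get(d.get("__TYPE__"))
    return cls.__fromjson__(d["data"]) if cls else d


@register
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __tojson__(self):
        return {"x": self.x, "y": self.y}

    @classmethod
    def __fromjson__(cls, d):
        return cls(d["x"], d["y"])


text = json.dumps(Point(1, 2), cls=TaggedEncoder)
print(text)  # {"__TYPE__": "Point", "data": {"x": 1, "y": 2}}
restored = json.loads(text, object_hook=tagged_hook)
print(type(restored).__name__, restored.x, restored.y)  # Point 1 2
```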
The following example registers a custom class for use with the JSONCache pipe.  The syntax is basically the same.

In [7]:
from backtest.data import Fetch, JSONCache

# Register the class.  (Usually not necessary, but we show it for the purpose of the tutorial)
je.add_types(MyClass)

OBJ = MyClass(1, 2)

@Fetch
def FetchNum(frm, to, *args, po, **kwargs):
    return OBJ

pipe = FetchNum() | JSONCache()

# First run: it isn't cached yet
print(pipe.get(None, None))

OBJ = MyClass(42, 42)

# Second run: it is cached and loaded from the cache
print(pipe.get(None, None))

MyClass(1, 2)
MyClass(1, 2)


### Overview of the mechanism
The great question: 'How does it work under the hood?'  
- On the first run, the cache pipe runs after the pipes it wraps and stores their output in memory.  If the ```store``` parameter is set to True, the output is also stored on disk, either in a pickle or JSON file depending on the caching pipe you chose, or through the ```caching_cb``` passed as a parameter.
- On subsequent runs, the cache pipe is called first (before the pipes it wraps).  If the cache is stored on disk, it is loaded using the default loading mechanism, or the ```loading_cb``` if provided.  Then, the pipe verifies whether the cache is still valid.  By default, it checks that the cache is not too old (stored datetime + ```timeout``` parameter) and that it hasn't been hit more than ```max_requests``` times.  If a ```revalidate_cb``` is provided, it is called to determine whether the cache is still valid.
- If the cache is no longer valid, it is revalidated and the part of the pipe wrapped inside the cache object is run again.
- If the cache is still valid, the cached data is returned.
- If the pipe has changed (its structure or the parameters of its pipes), the cache object detects it and triggers a full revalidation.  This means that every cache object in the pipe revalidates its cache.
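
The default validity checks listed above (age and number of hits) can be sketched in a few lines.  `SimpleCache` is a hypothetical in-memory class written for this illustration; it borrows the parameter names `timeout` and `max_requests` from the text, but its exact semantics (for instance, counting only cache hits) are assumptions, not the library's behaviour.

```python
from datetime import datetime, timedelta


class SimpleCache:
    def __init__(self, compute, timeout=None, max_requests=None):
        self.compute = compute
        self.timeout = timeout            # Maximum age of the cached value.
        self.max_requests = max_requests  # Maximum number of cache hits.
        self.stored_at = None
        self.hits = 0
        self.value = None

    def _is_valid(self, now):
        if self.stored_at is None:
            return False
        if self.timeout is not None and now > self.stored_at + self.timeout:
            return False
        if self.max_requests is not None and self.hits >= self.max_requests:
            return False
        return True

    def get(self, now):
        if self._is_valid(now):
            self.hits += 1
        else:
            # Revalidate: rerun the wrapped computation and reset the counters.
            self.value = self.compute()
            self.stored_at = now
            self.hits = 0
        return self.value


runs = []


def expensive():
    runs.append(1)
    return len(runs)  # 1 on the first run, 2 on the second, ...


cache = SimpleCache(expensive, timeout=timedelta(hours=1), max_requests=2)
t0 = datetime(2024, 1, 1, 9, 0)
results = [
    cache.get(t0),                          # first run: computed
    cache.get(t0 + timedelta(minutes=5)),   # hit 1
    cache.get(t0 + timedelta(minutes=10)),  # hit 2
    cache.get(t0 + timedelta(minutes=15)),  # max_requests reached: recomputed
    cache.get(t0 + timedelta(hours=2)),     # older than timeout: recomputed
]
print(results)  # [1, 1, 1, 2, 3]
```

Passing the current time explicitly keeps the sketch deterministic; a real cache would read the clock itself.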

## Technicality of DataPipe and Pipe Ids

This section explains the technicalities of how datapipes are built under the hood.  During the build process, each elementary pipe is given an id that is unique inside the pipe.  However, because the pipe can be built in multiple steps, the ids aren't fixed until the pipe is run.  This means that as long as a pipe hasn't run, any id can be changed internally.  Once the pipe is run, the ids become fixed and cannot be changed anymore.  We call this process forging the pipe, because a pipe is flexible before it is run but becomes fixed afterwards.  Also, during the forging process, the pipe ids may change to become unique across all pipes, not only within the current pipe.  Let's see an example:
```python
pipe1 = fetch() | process()
pipe1.get(datetime(2020, 1, 1), datetime(2021, 1, 1))    # Forge the pipe to reserve the ids
pipe2 = fetch() | process()
pipe2.get(datetime(2020, 1, 1), datetime(2021, 1, 1))    # Forge the pipe to get the real ids
```
In the previous code, the first pipe *could have* the following ids: (fetch: 1, process: 2) and the second (fetch: 3, process: 4).  We say 'could have' because the ids depend on which pipes were built previously.  In this example, even though both pipes are identical, they have different ids.  However, if the get method had not been called, their ids would have been the same, *i.e.* (fetch: 1, process: 2).  This is because the pipes wouldn't have been forged, so the ids would be unique within each pipe but not across all pipes.
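
One way to picture forging is a pipeline that carries local ids until its first run and then draws globally unique ids from a shared counter.  The sketch below is a toy model under that assumption; the `Pipeline` class, its `forge` method, and the shared counter are invented for illustration and do not reflect the actual internals of DataPipe.

```python
import itertools

_global_ids = itertools.count()  # Shared by every forged pipeline.


class Pipeline:
    def __init__(self, names):
        self.names = list(names)
        # Before forging, ids are only unique within this pipeline.
        self.ids = list(range(len(self.names)))
        self.forged = False

    def forge(self):
        # On the first run, swap local ids for globally unique ones, then freeze.
        if not self.forged:
            self.ids = [next(_global_ids) for _ in self.names]
            self.forged = True
        return dict(zip(self.names, self.ids))


pipe1 = Pipeline(["fetch", "process"])
pipe2 = Pipeline(["fetch", "process"])
print(pipe1.ids == pipe2.ids)  # True: identical local ids before forging

first = pipe1.forge()
second = pipe2.forge()
print(first)          # {'fetch': 0, 'process': 1}
print(second)         # {'fetch': 2, 'process': 3}
print(pipe1.forge())  # {'fetch': 0, 'process': 1} (ids are now fixed)
```

In this toy model the ids a pipeline ends up with depend on how many pipelines were forged before it, which mirrors the 'could have' caveat in the text.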

### Why does it matter?
Usually, you won't need to bother with the pipe ids.  However, there are some situations where it is useful to understand the concept, for example when using caching in Jupyter notebooks.  This is because caching pipes use their pipe_id to identify the cache file and revalidate the cache.  If the pipe_id of a cache pipe changes, it might use the cache of another pipe, which could cause bugs.  Usually, this does not happen because, in a script, pipes are always built in the same order and therefore deterministically get the same pipe_id.  However, this is not the case in Jupyter notebooks, where cells can be run in different orders depending on the user's intentions.  The same pipeline could therefore get different ids depending on how the user runs the notebook, which can cause problems if the pipe uses caching.  To tackle this problem, there is a method called ```set_id``` that sets the ids of the pipe and forges it in order to fix the ids.  Because the ids are not automatically assigned in this case, no verification is done to check that they are unique.  (They are unique inside the pipe, but could overlap with the ids of another pipe.)  You must therefore make sure that the assigned ids are unique.  A good rule of thumb is to first run your notebook with automatically assigned ids, then specify the same ids in the ```set_id``` method.  

**Example**:  
During the first run, we do not set any pipe ids.

In [1]:
from backtest.data import PadNan, ToTSData, CausalImpute, FilterNoneCharts, FetchCharts
from datetime import datetime

# Build the pipe
pipe1 = FetchCharts(["NVDA", "AAPL", "MSFT"]) | FilterNoneCharts() | CausalImpute() | PadNan() | ToTSData()
pipe1.get(datetime(2020, 1, 1), datetime(2021, 1, 1))   # Run to forge the pipe
pipe2 = FetchCharts(["NVDA", "AAPL", "MSFT"]) | FilterNoneCharts() | CausalImpute() | PadNan() | ToTSData()
pipe2.get(datetime(2020, 1, 1), datetime(2021, 1, 1))   # Run to forge the pipe

print("\n\n")

for p in pipe1:
    print(f'{p.name}: {p.pipe_id}')
print("="*100)
for p in pipe2:
    print(f'{p.name}: {p.pipe_id}')




FetchCharts: 0
FilterNoneCharts: 1
CausalImpute: 2
PadNan: 3
ToTSData: 4
FetchCharts: 5
FilterNoneCharts: 6
CausalImpute: 7
PadNan: 8
ToTSData: 9


Then, we change the cell to add the ```set_id()``` method call.

In [2]:
from backtest.data import PadNan, ToTSData, CausalImpute, FilterNoneCharts, FetchCharts
from datetime import datetime

# Build the pipe
pipe1 = FetchCharts(["NVDA", "AAPL", "MSFT"]) | FilterNoneCharts() | CausalImpute() | PadNan() | ToTSData()
pipe1.set_id(0)    # Always pick the smallest id
pipe1.get(datetime(2020, 1, 1), datetime(2021, 1, 1))   # Run to forge the pipe
pipe2 = FetchCharts(["NVDA", "AAPL", "MSFT"]) | FilterNoneCharts() | CausalImpute() | PadNan() | ToTSData()
pipe2.set_id(5)
pipe2.get(datetime(2020, 1, 1), datetime(2021, 1, 1))   # Run to forge the pipe

print("\n\n")

for p in pipe1:
    print(f'{p.name}: {p.pipe_id}')
print("="*100)
for p in pipe2:
    print(f'{p.name}: {p.pipe_id}')




FetchCharts: 0
FilterNoneCharts: 1
CausalImpute: 2
PadNan: 3
ToTSData: 4
FetchCharts: 5
FilterNoneCharts: 6
CausalImpute: 7
PadNan: 8
ToTSData: 9


---
Now, if you run the first cell multiple times, you will see that the ids keep incrementing, defining new pipes at every run.  However, no matter how many times you run the second cell, the pipe ids will stay the same.  It won't create new pipes; it will just re-initialize the pipelines.

## Question Answers
1.1: Fetch  
1.2: Process