# Step Saving And Lifecycle 

<p style="font-size:18px">This page introduces the lifecycle of a Neuraxle BaseStep. You can find a detailed component API reference here.</p>

## Lifecycle
<img src="images/lifecycle.png" style="max-width:600px" />

1. **BaseStep.__init__()**: Initialize the step's properties and fitted state here.
2. **set_hyperparams()**: Set the step's hyperparameters.
3. **setup()**: Initialize the step before it runs. Create heavy resources (e.g., anything that lives on the GPU) here, and NOT in the constructor.
4. **fit(data_inputs, expected_outputs)**: Fit the step on the given data inputs and expected outputs.
5. **transform(data_inputs)**: Transform the given data inputs.
6. **save(context, full_dump)**: Save the step, using the execution context to create the directory the step is saved into.
7. **teardown()**: Tear down the step after program execution. This is the inverse of setup: override it if you need to clear memory.
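
The order of these calls can be sketched without Neuraxle at all. The class below is a hypothetical stand-in (`ToyStep` is not part of Neuraxle, and its method bodies are assumptions made for illustration). It records each lifecycle call and only allocates its "heavy" resource in `setup()`. Saving is left out here because it is covered in the next section.

```python
class ToyStep:
    """Hypothetical stand-in for a step; NOT the Neuraxle BaseStep API."""

    def __init__(self):
        # Cheap attributes only: hyperparameters and fitted state.
        self.hyperparams = {}
        self.mean = None
        self.buffer = None  # heavy resource, created later in setup()
        self.calls = ["__init__"]

    def set_hyperparams(self, hyperparams):
        self.hyperparams = dict(hyperparams)
        self.calls.append("set_hyperparams")
        return self

    def setup(self):
        # Allocate the expensive resource here, not in the constructor.
        self.buffer = [0.0] * self.hyperparams.get("buffer_size", 4)
        self.calls.append("setup")
        return self

    def fit(self, data_inputs, expected_outputs=None):
        # The fitted state is the mean of the training inputs.
        self.mean = sum(data_inputs) / len(data_inputs)
        self.calls.append("fit")
        return self

    def transform(self, data_inputs):
        self.calls.append("transform")
        return [x - self.mean for x in data_inputs]

    def teardown(self):
        # Inverse of setup(): release the heavy resource.
        self.buffer = None
        self.calls.append("teardown")
        return self


step = ToyStep().set_hyperparams({"buffer_size": 8}).setup()
outputs = step.fit([1.0, 2.0, 3.0]).transform([1.0, 2.0, 3.0])
step.teardown()
print(outputs)     # inputs centered on the fitted mean
print(step.calls)  # the lifecycle order
```

Keeping the constructor cheap means a step can be created, configured, and copied freely, and only pays for its resources once it is actually about to run.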

## Step Saving

In Neuraxle, each step has a list of savers that can save and load it. Steps are saved using the execution context, which creates the directory the step is saved into. Saving loops through all of the step's savers in reverse order.

The cool thing about this is that you don't even need the source code to load your steps. This enables a lot of things, like parallel processing and distributed computing.

### Saver

Some savers save only part of an object, while others save all of it, or whatever remains.
The JoblibStepSaver has to be called last because it needs a stripped version of the step.
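
To make the ordering concrete, here is a minimal sketch of a saver chain in plain Python. Every name in it is hypothetical (`ToyQueuedStep`, `ToyQueueSaver`, `ToyJsonSaver`, `save`, `load`), JSON stands in for joblib, and the forward loop on load is an assumption rather than a description of Neuraxle's internals. Since saving walks the list in reverse, the catch-all saver sits first in the list so that it runs last, after the partial saver has stripped what cannot be serialized.

```python
import json
import queue


class ToyQueuedStep:
    """Hypothetical step holding a queue, which cannot be serialized."""

    def __init__(self, factor=2):
        self.factor = factor
        self.queue = queue.Queue()


class ToyQueueSaver:
    """Partial saver: strips the queue on save and recreates it on load."""

    def save_step(self, step):
        step.queue = None
        return step

    def load_step(self, step):
        step.queue = queue.Queue()
        return step


class ToyJsonSaver:
    """Catch-all saver: serializes whatever attributes remain on the step."""

    def save_step(self, step):
        return json.dumps(vars(step))

    def load_step(self, payload):
        # Rebuild the step without calling __init__, then restore its attributes.
        step = ToyQueuedStep.__new__(ToyQueuedStep)
        vars(step).update(json.loads(payload))
        return step


# The catch-all saver is listed first so that it runs last when saving.
SAVERS = [ToyJsonSaver(), ToyQueueSaver()]


def save(step):
    for saver in reversed(SAVERS):  # strip first, serialize last
        step = saver.save_step(step)
    return step


def load(payload):
    step = payload
    for saver in SAVERS:  # deserialize first, then restore the stripped parts
        step = saver.load_step(step)
    return step


payload = save(ToyQueuedStep(factor=3))
restored = load(payload)
print(payload)
print(restored.factor, type(restored.queue).__name__)
```

If the catch-all saver ran before the queue was stripped, `json.dumps` would fail on the `Queue` object, which is the toy equivalent of why the generic saver needs a stripped step.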

You might need to create your own saver if you are using a step that is not serializable. For instance, this will most likely happen if the step is a deep learning model. 

Fortunately, we have already built a set of savers for TensorFlow 1 and 2 in [Neuraxle-Tensorflow](https://github.com/Neuraxio/Neuraxle-TensorFlow). We plan to do the same thing for PyTorch soon.

Here is an example of a custom saver that strips the multiprocessing Queue from a step called SequentialQueuedPipeline:

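```python
from multiprocessing import Queue

from neuraxle.base import BaseSaver, BaseStep, ExecutionContext


class ObservableQueueStepSaver(BaseSaver):
    """Strips the queue and observers before saving, and recreates the queue on load."""

    def save_step(self, step: 'BaseStep', context: 'ExecutionContext') -> 'BaseStep':
        # Drop the queue and its observers so the remaining step can be serialized.
        step.queue = None
        step.observers = []
        return step

    def can_load(self, step: 'BaseStep', context: 'ExecutionContext') -> bool:
        # Nothing was written to disk by this saver, so loading is always possible.
        return True

    def load_step(self, step: 'BaseStep', context: 'ExecutionContext') -> 'BaseStep':
        # Give the loaded step a fresh, empty queue.
        step.queue = Queue()
        return step
```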


### Execution Context

The execution context object contains the hierarchy of steps in the pipeline.
The first item in parents is the root, the second is the step nested inside it, and so on, like a stack. Note: you should not have to worry about pushing steps to the context, because the handler methods already do it for you.
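
The stack idea can be illustrated with a small, hypothetical class. `ToyContext` is not the Neuraxle `ExecutionContext`, and the folder layout it produces is an assumption made for this sketch. It only shows how a list of parents, pushed from the root down, can determine the directory a nested step is saved into.

```python
import os


class ToyContext:
    """Hypothetical context: a root folder plus a stack of parent step names."""

    def __init__(self, root, parents=None):
        self.root = root
        self.parents = list(parents or [])

    def push(self, step_name):
        # Return a new context so that sibling steps do not affect each other.
        return ToyContext(self.root, self.parents + [step_name])

    def get_path(self):
        # The first parent is the root of the pipeline, the last is the deepest step.
        return os.path.join(self.root, *self.parents)


context = ToyContext("cache")
nested = context.push("pipeline").push("wrapper").push("model")
print(nested.parents)
print(nested.get_path())
```

Returning a new context on each push, rather than mutating a shared one, keeps the path of one branch of the pipeline independent from the others.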

### Full Dump Saving

To save the full pipeline even if steps are not invalidated or initialized, you can use full dump saving: 

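```python
# Import paths are assumed from the Neuraxle package layout; adjust them to your installed version.
from sklearn.linear_model import LogisticRegression

from neuraxle.base import ExecutionContext
from neuraxle.hyperparams.distributions import Boolean, Choice, LogUniform, RandInt
from neuraxle.hyperparams.space import HyperparameterSpace
from neuraxle.pipeline import Pipeline
from neuraxle.steps.data import DataShuffler
from neuraxle.steps.flow import TrainOnlyWrapper
from neuraxle.steps.numpy import NumpyRavel
from neuraxle.steps.output_handlers import OutputTransformerWrapper
from neuraxle.steps.sklearn import SKLearnWrapper

# DATA_INPUTS, EXPECTED_OUTPUTS and tmpdir are assumed to be defined beforehand.
PIPELINE_NAME = 'saved_pipeline_name'

pipeline = Pipeline([
    TrainOnlyWrapper(DataShuffler()),
    OutputTransformerWrapper(NumpyRavel()),
    SKLearnWrapper(LogisticRegression(), HyperparameterSpace({
        'C': LogUniform(0.01, 10.0),
        'fit_intercept': Boolean(),
        'dual': Boolean(),
        'penalty': Choice(['l1', 'l2']),
        'max_iter': RandInt(20, 200)
    }))
], cache_folder='cache_folder').set_name(PIPELINE_NAME)

pipeline, outputs = pipeline.fit_transform(DATA_INPUTS, EXPECTED_OUTPUTS)
pipeline.save(ExecutionContext(tmpdir), full_dump=True)
```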

### Full Dump Loading

To load a full pipeline without any source code, you can load the full dump using the execution context load method:

```python
from neuraxle.base import ExecutionContext

loaded_pipeline = ExecutionContext(tmpdir).load(PIPELINE_NAME)
outputs = loaded_pipeline.transform(DATA_INPUTS)
```