# Define a way for a pipeline to be executed

- We need a way to define a set of instructions that applies different functions in a given order. In this example, all functions are methods of a class, `MathOperators`.
- The instructions are stored in a YAML file, `ex01_pipeline.yaml`. This file can be created in several ways and edited with any text editor; here it is created with Python.
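
For reference, the YAML file written by the code in this notebook holds a single top-level `pipeline` key whose value is a list of operation names. The minimal sketch below assumes PyYAML is installed (the notebook already imports it). It dumps the same dictionary to a string instead of a file and parses it back, showing that the printed text is a `pipeline:` line followed by one `- name` line per step and that the round trip returns an equal Python dictionary.

```python
import yaml

# Same structure the notebook writes to ex01_pipeline.yaml
pipeline_structure = {'pipeline': ['add', 'subtract', 'add', 'subtract', 'multiply']}

# Serialize to a YAML string (block style: one list item per line)
yaml_text = yaml.dump(pipeline_structure, default_flow_style=False)
print(yaml_text)

# Parsing the text back yields an equal Python dictionary
loaded = yaml.safe_load(yaml_text)
print(loaded == pipeline_structure)  # True
```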

## Why use a .yaml file?

- Human-readable format: YAML is designed to be easily readable by humans, with a simple syntax and minimal punctuation. This makes it easy for developers, data engineers, and other stakeholders to understand and review the pipeline structure and flow at a glance.

- Separation of concerns: By describing the pipeline in a YAML file, you can separate the pipeline's structure and flow from the implementation details of the pipeline's steps. This allows you to modify the pipeline structure without changing the underlying code, promoting modularity and maintainability.

- Easier collaboration: Since the pipeline is described in a separate YAML file, multiple team members can work on different parts of the project simultaneously. For example, one team member can work on the pipeline's steps while another team member modifies the pipeline's structure in the YAML file.

- Flexibility and reusability: A YAML-based pipeline description can be easily reused or adapted for different projects or use cases. By changing the YAML file, you can create new pipelines or modify existing ones without rewriting the entire codebase.

- Easy integration with external tools: Many tools read YAML configuration files out of the box. Describing your pipeline in YAML keeps it in the same format as other systems that are configured with YAML, such as workflow engines (e.g., Argo Workflows) or infrastructure-as-code tools (e.g., Ansible).

- Cross-language compatibility: YAML is a language-independent data serialization format, so it can be used with a variety of programming languages. This makes it easier to use the same pipeline description across different languages or platforms, reducing the need for language-specific pipeline configurations.
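
The separation-of-concerns and reusability points can be illustrated with a small sketch: the Python runner below never changes, yet two different YAML texts produce two different pipelines. This assumes PyYAML, inlines the YAML as strings rather than files, and uses a hypothetical helper, `run_pipeline`, which is not part of the notebook's code.

```python
import yaml


class MathOperators:
    def add(self, x, y):
        return x + y

    def multiply(self, x, y):
        return x * y


def run_pipeline(yaml_text, x, y):
    """Hypothetical runner: apply each named step in order, chaining results."""
    operators = MathOperators()
    # Map the names allowed in the YAML to the methods that implement them
    steps_by_name = {'add': operators.add, 'multiply': operators.multiply}
    for name in yaml.safe_load(yaml_text)['pipeline']:
        x = steps_by_name[name](x, y)
    return x


# Two pipeline definitions: only the YAML differs, the Python code does not
add_then_multiply = """
pipeline:
- add
- multiply
"""
multiply_then_add = """
pipeline:
- multiply
- add
"""

print(run_pipeline(add_then_multiply, 2, 3))  # (2 + 3) * 3 = 15
print(run_pipeline(multiply_then_add, 2, 3))  # (2 * 3) + 3 = 9
```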

```python
import yaml


# Define a class whose methods are the available pipeline steps
class MathOperators:
    def add(self, x, y):
        return x + y

    def subtract(self, x, y):
        return x - y

    def multiply(self, x, y):
        return x * y


# Define the pipeline structure using a Python dictionary:
# the 'pipeline' key holds the ordered list of operation names
pipeline_structure = {
    'pipeline': [
        'add',
        'subtract',
        'add',
        'subtract',
        'multiply'
    ]
}

# Write the structure to a YAML file so the pipeline can be inspected
# and edited without touching this code
with open('ex01_pipeline.yaml', 'w') as file:
    yaml.dump(pipeline_structure, file, default_flow_style=False)


# Read a YAML file and return its contents as Python objects.
# safe_load builds only standard YAML types (e.g. dicts, lists, strings,
# numbers) and does not construct arbitrary Python objects.
def read_yaml_file(file_path):
    with open(file_path, 'r') as file:
        return yaml.safe_load(file)


###
# Example 1: calling functions independently
###
# main() reads the YAML pipeline and runs every step on the same inputs
def main():
    # Read the pipeline from the YAML file
    pipeline_yaml = read_yaml_file('ex01_pipeline.yaml')
    pipeline = pipeline_yaml['pipeline']

    # Create an instance of MathOperators
    operators = MathOperators()

    # Sample input values
    x, y = 5, 3

    # Follow the pipeline and execute the corresponding operations
    for operation in pipeline:
        if operation == 'add':
            result = operators.add(x, y)
            print(f"{x} + {y} = {result}")
        elif operation == 'subtract':
            result = operators.subtract(x, y)
            print(f"{x} - {y} = {result}")
        elif operation == 'multiply':
            result = operators.multiply(x, y)
            print(f"{x} * {y} = {result}")
        else:
            # Fail loudly instead of silently skipping a misspelled step
            raise ValueError(f"Unknown operation in pipeline: {operation!r}")


# Call main() only when this file is run directly (then __name__ == '__main__').
# When the file is imported as a module, __name__ is the module's name, so main() is not called.
if __name__ == '__main__':
    main()
```

```
5 + 3 = 8
5 - 3 = 2
5 + 3 = 8
5 - 3 = 2
5 * 3 = 15
```
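
The `if`/`elif` chain in `main()` must be extended every time a method is added to `MathOperators`. One common alternative, sketched below, looks each step up by name with Python's built-in `getattr` and checks every name before running anything, so a typo in the YAML fails before any step executes. The helper name `run_steps` and the "public methods only" rule are illustrative assumptions, and the pipeline list is inlined instead of being read from `ex01_pipeline.yaml` so the block runs on its own.

```python
class MathOperators:
    def add(self, x, y):
        return x + y

    def subtract(self, x, y):
        return x - y

    def multiply(self, x, y):
        return x * y


def run_steps(pipeline, x, y):
    """Hypothetical helper: run each named step on the same inputs (as in Example 1)."""
    operators = MathOperators()

    # Validate the whole pipeline first: every entry must name a public method
    for name in pipeline:
        if (not isinstance(name, str) or name.startswith('_')
                or not callable(getattr(operators, name, None))):
            raise ValueError(f"Unknown operation in pipeline: {name!r}")

    # Dispatch by name instead of an if/elif chain
    return [getattr(operators, name)(x, y) for name in pipeline]


pipeline = ['add', 'subtract', 'add', 'subtract', 'multiply']
print(run_steps(pipeline, 5, 3))  # [8, 2, 8, 2, 15]
```

The leading-underscore check keeps the YAML file from reaching private or special methods such as `__init__`; adding a new operation now only requires adding a method to the class.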


```python
###
# Example 2: using the output of each function as the input to the next function in the pipeline
###
# Relies on MathOperators, read_yaml_file and ex01_pipeline.yaml from the first code cell

# main(x, y) reads the YAML pipeline and executes it, chaining the results
def main(x, y):
    # Read the pipeline from the YAML file
    pipeline_yaml = read_yaml_file('ex01_pipeline.yaml')
    pipeline = pipeline_yaml['pipeline']

    # Create an instance of MathOperators
    operators = MathOperators()

    # The input values are now passed in as arguments instead of being hard-coded

    # Follow the pipeline and execute the corresponding operations
    for operation in pipeline:
        if operation == 'add':
            result = operators.add(x, y)
            print(f"{x} + {y} = {result}")
        elif operation == 'subtract':
            result = operators.subtract(x, y)
            print(f"{x} - {y} = {result}")
        elif operation == 'multiply':
            result = operators.multiply(x, y)
            print(f"{x} * {y} = {result}")
        else:
            # Without this, an unknown step would reuse a stale result (or raise NameError)
            raise ValueError(f"Unknown operation in pipeline: {operation!r}")

        # Feed the result into the next step as its first operand
        x = result


if __name__ == '__main__':
    main(7, 3)
```

```
7 + 3 = 10
10 - 3 = 7
7 + 3 = 10
10 - 3 = 7
7 * 3 = 21
```
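
Example 2 is a left fold: each step takes the running value and the fixed `y`, and produces the next running value. The sketch below expresses the same chaining with `functools.reduce` from the standard library. The function name `chain_pipeline` is hypothetical, and the pipeline list is inlined rather than read from the YAML file.

```python
from functools import reduce


class MathOperators:
    def add(self, x, y):
        return x + y

    def subtract(self, x, y):
        return x - y

    def multiply(self, x, y):
        return x * y


def chain_pipeline(pipeline, x, y):
    """Hypothetical helper: feed each step's result into the next step (as in Example 2)."""
    operators = MathOperators()
    # reduce starts from x and applies step(acc, y) for every name, left to right
    return reduce(lambda acc, name: getattr(operators, name)(acc, y), pipeline, x)


pipeline = ['add', 'subtract', 'add', 'subtract', 'multiply']
print(chain_pipeline(pipeline, 7, 3))  # 21
```

Unlike the notebook version, this returns only the final value and does not print the intermediate steps. It also does not validate names, so an unknown step raises `AttributeError`.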
