# Python programming

## Functional Programming

Functional programming is a programming paradigm that builds programs by applying and combining functions, favoring immutable data and functions without side effects.

In functional programming, functions are treated as **first-class citizens**, which means they can be passed as arguments to other functions, returned as values, and stored in variables.

Functional programming is an important paradigm for the Data Engineering world because it offers several benefits that are particularly valuable in this domain:

- **Scalability**: Functional programming emphasizes immutability and pure functions, which can make it easier to scale code for large datasets and distributed systems.

- **Modularity**: Functional programming is well-suited for creating modular code that can be easily combined and reused. 

- **Maintainability**: Functional programming can make code more maintainable by reducing the number of side effects and mutable state.

- **Testability**: Functional programming can make code more testable by emphasizing pure functions and minimizing side effects.

- **Interoperability**: Distributed computing frameworks such as Spark and Hadoop are built around functional-style operations such as map and reduce, so code written in a functional style carries over naturally to them.

### Pure functions

Pure functions have no side effects and always return the same output for a given input. You can write pure functions by avoiding mutation of shared or input data and by using only the input arguments to calculate the output.

In [2]:
def add_numbers(x, y):
  return x + y

print(add_numbers(4, 5))

9
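
To make the contrast concrete, the following minimal sketch places an impure function next to a pure one. The names `append_total_impure` and `append_total_pure` are hypothetical and used only for illustration.

```python
# Impure: mutates its argument, so the caller's list changes as a side effect.
def append_total_impure(values):
    values.append(sum(values))
    return values

# Pure: builds and returns a new list; the input is left untouched.
def append_total_pure(values):
    return [*values, sum(values)]

data = [1, 2, 3]
result = append_total_pure(data)
print(result)  # [1, 2, 3, 6]
print(data)    # [1, 2, 3]
```

Because the pure version never changes `data`, it can be called repeatedly, or from several places, with the same result each time.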


### Higher-order functions

Higher-order functions are functions that take other functions as arguments or return functions as values.

A closely related idea is **function composition**, in which the output of one function is passed as the input to another.

When you pass a function to another function, the passed-in function is sometimes referred to as a **callback**, because the outer function calls back to it, which lets the caller customize the outer function's behavior.

In [3]:
def apply_func(f, x):
  return f(x)

def square(x):
  return x * x

result = apply_func(square, 3)
print(result)

9
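
A higher-order function can also return a function. The sketch below is one possible illustration; `make_multiplier` and `compose` are hypothetical helper names, not part of the standard library.

```python
def make_multiplier(factor):
    # Returns a new function that remembers `factor` (a closure).
    def multiply(x):
        return x * factor
    return multiply

def compose(f, g):
    # Returns a function that applies g first, then f: compose(f, g)(x) == f(g(x)).
    return lambda x: f(g(x))

def increment(x):
    return x + 1

double = make_multiplier(2)
double_then_increment = compose(increment, double)

print(double(5))                 # 10
print(double_then_increment(5))  # 11
```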


### Lambda functions

Anonymous functions that don't have a name and are defined inline. You can create lambda functions using the `lambda` keyword.

The syntax of a lambda expression is as follows:
`lambda <parameter_list>: <expression>`

In [4]:
square = lambda x: x * x
result = square(3)
print(result)

9


### Map

The `map()` function applies a given function to each element of an iterable (e.g., a list or a tuple). In Python 3 it returns a lazy iterator over the results, which is why the example below wraps it in `list()`.

The syntax is as follows: `map(function, iterable)`

In [5]:
numbers = [1, 2, 3, 4]
squared = map(lambda x: x**2, numbers)
print(list(squared))

[1, 4, 9, 16]


### Filter

The `filter()` function calls a given function on each element of an iterable and returns a lazy iterator over the elements for which the function returns a true value.

The syntax is as follows: `filter(function, iterable)`


In [6]:
numbers = [1, 2, 3, 4]
even_numbers = filter(lambda x: x % 2 == 0, numbers)
print(list(even_numbers))

[2, 4]


### Reduce

The `reduce()` function, from the `functools` module, applies a two-argument function cumulatively to the elements of an iterable, producing a single result.

The syntax is as follows: `reduce(function, iterable[, initializer])`

In [7]:
from functools import reduce
numbers = [1, 2, 3, 4]
total = reduce(lambda x, y: x + y, numbers)  # named total to avoid shadowing the built-in sum()
print(total)

10
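
`reduce()` also accepts an optional third argument, the initializer. The short sketch below shows how it changes the starting value and how `reduce()` can compute something other than a sum.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# With an initializer (third argument), the accumulation starts from that value.
total_from_100 = reduce(lambda acc, n: acc + n, numbers, 100)
print(total_from_100)  # 110

# The initializer is also the result for an empty iterable;
# without one, reduce() raises TypeError on empty input.
print(reduce(lambda acc, n: acc + n, [], 0))  # 0

# reduce() is not limited to sums: here it finds the largest element.
largest = reduce(lambda a, b: a if a > b else b, numbers)
print(largest)  # 4
```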


### List comprehensions

List comprehensions are a concise and readable way to create a new list from an existing iterable, such as a list or a tuple. They provide an alternative way to achieve results similar to those obtained with the `map` and `filter` functions.

The syntax is as follows: `new_list = [expression for item in iterable if condition]`

In [8]:
numbers = [1, 2, 3, 4, 5]
squares = [num**2 for num in numbers]
print(squares)

[1, 4, 9, 16, 25]
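
The `if condition` part of the syntax is optional. As a small sketch, the comprehension below combines an expression and a condition, and compares the result with an equivalent `map`/`filter` pipeline.

```python
numbers = [1, 2, 3, 4, 5]

# Expression and condition together: square only the even numbers.
even_squares = [num**2 for num in numbers if num % 2 == 0]
print(even_squares)  # [4, 16]

# The equivalent map/filter pipeline gives the same list.
even_squares_map = list(map(lambda n: n**2, filter(lambda n: n % 2 == 0, numbers)))
assert even_squares == even_squares_map
```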


#### List comprehensions vs Map function

In [9]:
numbers = [1, 2, 3, 4, 5]
squares_list_comp = [num**2 for num in numbers]
squares_map = list(map(lambda num: num**2, numbers))
assert(squares_list_comp == squares_map)

#### List comprehensions vs Filter function


In [10]:
numbers = [1, 2, 3, 4, 5]
even_list_comp = [num for num in numbers if num % 2 == 0]
even_filter = list(filter(lambda num: num % 2 == 0, numbers))
assert(even_list_comp == even_filter)

### Functional Programming in Pandas

The following example shows how to apply functional programming to perform some cleaning operations in a dataset.
The dataset is provided by Open Africa and contains Historical and Projected Rainfall and Runoff for 2 Lake Victoria Sub-Regions.

The code below performs the following transformations:
1. Reads an Excel file from a URL using `pandas`. This operation returns a pandas `DataFrame`. The read skips the first two rows and selects only columns `B` to `D`.
2. Displays the `DataFrame`.
3. Splits the column `Month, period` by `,` and assigns the two parts to new columns `Month` and `Period`.
4. Removes the suffix `mm` that appears in some rows of the columns `Lake Victoria` and `Simiyu`.
5. Casts the columns `Lake Victoria` and `Simiyu` to `float64`.
6. Drops the column `Month, period`.

In [11]:
import pandas as pd
url = 'https://open.africa/dataset/c5cb0206-a1ed-44eb-a0e4-86bd0b94fbe9/resource/8e4db8de-dd9e-44e3-b32f-8680974e7158/download/messy-data.xlsx'
df = pd.read_excel(url, skiprows=2, usecols='B:D')
print(df.dtypes)
display(df)

Month, period    object
Lake Victoria    object
Simiyu           object
dtype: object


Unnamed: 0,"Month, period",Lake Victoria,Simiyu
0,"Jan,2001-2019",3.176mm,2.908474
1,"Feb,2001-2019",3.477mm,1.8mm
2,"Mar,2001-2019",4.687053,2.981053
3,"Apr,2001-2019",7.004526,4.753579
4,"May,2001-2019",9.362789,4.077474
5,"Jun,2001-2019",3.430211,1.046947
6,"Jul,2001-2019",1.764421,0.195211
7,"Aug,2001-2019",2.812632,0.333632
8,"Sep,2001-2019",3.978895,1.205842
9,"Oct,2001-2019",5.318421,2.454737


In [12]:
def split_string(index):
    return lambda x: str(x).split(",")[index]

remove_mm = lambda x: str(x).replace("mm", "")

operations_dict = {
    'Month': df['Month, period'].apply(split_string(0)),
    'Period': df['Month, period'].apply(split_string(1)),
    'Simiyu': df['Simiyu'].apply(remove_mm).astype('float64'),
    'Lake Victoria': df['Lake Victoria'].apply(remove_mm).astype('float64')
}

df_cleansed = (
    df
    .assign(**operations_dict)
    .drop('Month, period', axis=1)
)
print(df_cleansed.dtypes)
df_cleansed

Lake Victoria    float64
Simiyu           float64
Month             object
Period            object
dtype: object


Unnamed: 0,Lake Victoria,Simiyu,Month,Period
0,3.176,2.908474,Jan,2001-2019
1,3.477,1.8,Feb,2001-2019
2,4.687053,2.981053,Mar,2001-2019
3,7.004526,4.753579,Apr,2001-2019
4,9.362789,4.077474,May,2001-2019
5,3.430211,1.046947,Jun,2001-2019
6,1.764421,0.195211,Jul,2001-2019
7,2.812632,0.333632,Aug,2001-2019
8,3.978895,1.205842,Sep,2001-2019
9,5.318421,2.454737,Oct,2001-2019
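
The same cleaning steps can also be sketched without pandas, which makes the role of the pure functions easier to see. This is only an illustration under stated assumptions: the two rows are copied from the raw output shown earlier, and `to_millimetres` and `clean_row` are hypothetical helper names.

```python
# Two sample rows copied from the raw data shown earlier, as plain dictionaries.
rows = [
    {"Month, period": "Jan,2001-2019", "Lake Victoria": "3.176mm", "Simiyu": 2.908474},
    {"Month, period": "Feb,2001-2019", "Lake Victoria": "3.477mm", "Simiyu": "1.8mm"},
]

def to_millimetres(value):
    # Pure helper: strip an optional "mm" suffix and convert to float.
    return float(str(value).replace("mm", ""))

def clean_row(row):
    # Pure function: builds a new dictionary and leaves `row` unchanged.
    month, period = row["Month, period"].split(",")
    return {
        "Lake Victoria": to_millimetres(row["Lake Victoria"]),
        "Simiyu": to_millimetres(row["Simiyu"]),
        "Month": month,
        "Period": period,
    }

cleaned = list(map(clean_row, rows))
print(cleaned[1])
# {'Lake Victoria': 3.477, 'Simiyu': 1.8, 'Month': 'Feb', 'Period': '2001-2019'}
```

Since `clean_row` never modifies its input, the original `rows` list stays available for comparison, much like `df` remains unchanged next to `df_cleansed`.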


## Object Oriented Programming

Object-Oriented Programming (OOP) is a programming paradigm that focuses on creating objects that contain both data and functions.

Python supports object-oriented programming: every value in Python is an object, including numbers, strings, and functions. Python is a multi-paradigm language, however, so code can also be written in a procedural or functional style without defining classes.

- **Class** is a blueprint or a template for creating objects.
- **Objects** are *instances* of classes, which have their own unique *properties* and *methods*.

Object Oriented programming is an approach for modeling concrete, real-world things, like cars. 
OOP models real-world entities as software objects that have some data associated with them and can perform certain functions. [Object-Oriented Programming (OOP) in Python 3 – Real Python](https://realpython.com/python3-object-oriented-programming/)


### Create a class

In [13]:
class Car:
  def __init__(self, make, model, year):
    self.make = make
    self.model = model
    self.year = year
  
  def get_make(self):
    return self.make
  
  def get_model(self):
    return self.model
  
  def get_year(self):
    return self.year

  def __str__(self) -> str:
    return f'''
        Make: {self.make},
        Model: {self.model},
        Year: {self.year}'''

In this example, we have created a `Car` class with three properties `make`, `model`, and `year`.
We have also defined three methods, `get_make()`, `get_model()`, and `get_year()`, which allow us to retrieve the values of these properties, and a `__str__()` method that formats the object as a string.

### The `__init__()` method

The `__init__()` method is a special method in Python that initializes the attributes of a new instance. It is commonly called the **constructor** because Python calls it automatically when an instance of the class is created.

The **self** parameter in the `__init__()` method refers to the instance of the class that is being created. It is a convention in Python to use self as the name of the first parameter in all instance methods, including the `__init__()` method.

To create an **instance** of this class, we simply call the class constructor and provide values for the properties:

In [14]:
my_car = Car("Toyota", "Corolla", 2022)

### Calling object methods

We can use the methods of the `my_car` object to retrieve the values of its attributes:

In [15]:
print(my_car.get_make()) # Output: Toyota
print(my_car.get_model()) # Output: Corolla
print(my_car.get_year()) # Output: 2022

Toyota
Corolla
2022


### Accessing attributes by dot notation

You can access the attributes of the class using **dot notation**:

In [16]:
my_car.model # Output: Corolla

'Corolla'

You can modify any instance attribute of the object because custom objects are **mutable** by default. An object is mutable if it can be altered after it has been created.

In [17]:
my_car.year = 2023
print(my_car.year) # Output: 2023

2023


### The `__str__()` method

The `__str__()` method is a special method in Python that is used to define how an object should be represented as a string. It is automatically called when the **str()** function is called on an object or when the object is printed using the `print()` function, and it can be used to format the string representation of an object in any way that is appropriate for the class.

In [18]:
print(my_car) # Output: Make: Toyota,
              # Model: Corolla,
              # Year: 2023


        Make: Toyota,
        Model: Corolla,
        Year: 2023


### Abstraction

Abstraction is a fundamental concept in object-oriented programming that allows developers to create complex systems by hiding the implementation details of the code and providing a simple interface for the user. In Python, abstraction can be achieved through the use of classes and methods that provide a level of indirection between the user and the underlying code. 
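
One common way to express abstraction in Python is an abstract base class from the standard `abc` module. The sketch below is a minimal illustration; the `Vehicle` and `Motorbike` classes are hypothetical and not used elsewhere in this document.

```python
from abc import ABC, abstractmethod

class Vehicle(ABC):
    # The abstract base class defines *what* a vehicle can do, not *how*.
    @abstractmethod
    def start(self):
        ...

class Motorbike(Vehicle):
    def start(self):
        # The implementation detail is hidden behind the start() interface.
        return "Motorbike engine started."

bike = Motorbike()
print(bike.start())  # Motorbike engine started.

# Vehicle itself cannot be instantiated because start() is abstract.
try:
    Vehicle()
except TypeError as error:
    print(f"Cannot instantiate Vehicle: {error}")
```

Code that works with a `Vehicle` only needs to know that `start()` exists, not how each subclass implements it.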

### Inheritance

Inheritance is a key feature of object-oriented programming in Python that allows a class to inherit the properties and behavior of another class.

The class that inherits the properties and behavior is called the **subclass**, child or derived class, and the class that provides the properties and behavior is called the **superclass**, parent, or base class.

To create a class that inherits from another class, put the parent class name in parentheses after the child class name in the `class` statement, for example `class SportsCar(Car):`.

The subclass can access the properties and behavior of the superclass using the `super()` function, and it can also define its own properties and behavior that are specific to the subclass. 


In [19]:
class Car:
    def __init__(self, make, model, year):
        self.make = make
        self.model = model
        self.year = year

    def drive(self):
        print("Driving the car.")

class SportsCar(Car):
    def accelerate(self):
        print("Accelerating the sports car.")

When an object of the SportsCar class is created, it has access to the `__init__()` and `drive()` methods from the Car class, as well as the `accelerate()` method from the SportsCar class:

In [20]:
my_sports_car = SportsCar("Ferrari", "488 GTB", 2019)
my_sports_car.drive()        # Output: Driving the car.
my_sports_car.accelerate()   # Output: Accelerating the sports car.

Driving the car.
Accelerating the sports car.


The use of inheritance in Python provides several benefits to programmers, including:

* **Code reuse**: Inheritance allows subclasses to inherit properties and methods from a superclass, which promotes code reuse. This reduces the amount of code that needs to be written, tested, and maintained, which saves time and effort.

* **Modularity**: Inheritance helps to organize code into a hierarchy of related classes, which makes it easier to understand and modify. By breaking down complex systems into smaller, more manageable classes, inheritance promotes modularity and enhances the overall maintainability of the code.
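
The `super()` function mentioned earlier can be sketched as follows. This variant of `SportsCar` adds a hypothetical `top_speed` attribute (the value is illustrative only) and extends `drive()` instead of replacing it.

```python
class Car:
    def __init__(self, make, model, year):
        self.make = make
        self.model = model
        self.year = year

    def drive(self):
        return "Driving the car."

class SportsCar(Car):
    def __init__(self, make, model, year, top_speed):
        # Reuse the parent initializer, then add a subclass-specific attribute.
        super().__init__(make, model, year)
        self.top_speed = top_speed

    def drive(self):
        # Extend, rather than replace, the parent behavior.
        return super().drive() + " Fast."

my_sports_car = SportsCar("Ferrari", "488 GTB", 2019, top_speed=330)
print(my_sports_car.make, my_sports_car.top_speed)  # Ferrari 330
print(my_sports_car.drive())                        # Driving the car. Fast.
```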

### Encapsulation

Encapsulation is a key concept in object-oriented programming (OOP) that refers to bundling data and the methods that operate on it within a class, and to limiting direct access to that data from outside the class. Many languages enforce this with **access modifiers**, which **restrict the visibility** of certain data and methods. Python has no enforced access modifiers; it relies on naming conventions, plus a name-mangling mechanism, to signal which attributes are meant for internal use.

In Python, three naming conventions are commonly described as access levels:

* **Public**: Names without a leading underscore are public and can be accessed from anywhere in the program, both within and outside of the class. In Python, all data and methods are public by default.

* **Private**: Names with a double underscore prefix, such as `__data` or `__method()`, are treated as private. Python mangles these names to `_ClassName__data`, so they cannot be reached under their original name from outside the class or from its subclasses. They remain reachable through the mangled name, so this guards against accidental access rather than providing strict protection.

* **Protected**: Names with a single underscore prefix, such as `_data` or `_method()`, are treated as protected: by convention they are intended for use within the class in which they are defined and within its subclasses. Python does not enforce this; the underscore is only a signal to other developers.

In [21]:
class Car:
    def __init__(self, make, model, year):
        self.__make = make  # private attribute
        self.__model = model  # private attribute
        self.__year = year  # private attribute
        self.__mileage = 0  # private attribute

    def get_make(self):
        return self.__make  # public method to access private attribute

    def get_model(self):
        return self.__model  # public method to access private attribute

    def get_year(self):
        return self.__year  # public method to access private attribute

    def get_mileage(self):
        return self.__mileage  # public method to access private attribute

    def drive(self, miles):
        self.__mileage += miles  # public method that modifies a private attribute

In this example, the Car class has several private attributes (`__make`, `__model`, `__year`, and `__mileage`) that are not reachable under those names from outside the class, so callers are expected to read and modify them through the public methods (`get_make()`, `get_model()`, `get_year()`, `get_mileage()`, and `drive()`). This encapsulation guards the internal state of the Car object against accidental outside interference and routes any modification of the object's state through the appropriate methods.

In [22]:
my_car = Car('Toyota', 'Camry', 2022)

To drive the car and update its mileage, you would call the `drive()` method:

In [25]:
my_car.drive(100)
print(my_car.get_mileage())  # Output: 100

100
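
The following sketch shows what the underscore prefixes do in practice. It uses a cut-down `Car` class with a hypothetical `_owner` attribute, added only for illustration.

```python
class Car:
    def __init__(self, make):
        self._owner = "unknown"  # single underscore: "internal use" by convention only
        self.__make = make       # double underscore: name-mangled to _Car__make

    def get_make(self):
        return self.__make

my_car = Car("Toyota")
print(my_car.get_make())  # Toyota
print(my_car._owner)      # unknown (nothing prevents this access)

try:
    my_car.__make
except AttributeError:
    print("my_car.__make is not reachable under that name")

# The attribute still exists under its mangled name, so this is not strict privacy.
print(my_car._Car__make)  # Toyota
```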


### Polymorphism

Polymorphism is a concept in object-oriented programming that allows objects of different classes to be used **interchangeably**. It means that a single method can behave differently depending on the object it is operating on. 

The use of polymorphism promotes code reusability, simplicity, flexibility, extensibility, and testability.

In Python, polymorphism is most commonly achieved through method **overriding**. Method **overloading**, as found in some other languages, is not supported directly, but it can be approximated.

* **Method Overriding**: Method overriding allows a subclass to provide a different implementation of a method that is already defined in its superclass. This means that the same method can behave differently depending on the type of object that is calling it.

* **Method Overloading**: In languages that support it, method overloading allows a class to have multiple methods with the same name but different parameters. In Python, a later definition with the same name replaces the earlier one, so similar behavior is usually achieved with default or variable-length arguments, or with `functools.singledispatch`.

In [26]:
class Car:
    def __init__(self, make, model, year):
        self.make = make
        self.model = model
        self.year = year

    def start(self):
        print("Starting the car...")

class ElectricCar(Car):
    def __init__(self, make, model, year, battery_size):
        super().__init__(make, model, year)
        self.battery_size = battery_size

    def start(self):
        print("Starting the electric car...")

class GasolineCar(Car):
    def __init__(self, make, model, year, fuel_type):
        super().__init__(make, model, year)
        self.fuel_type = fuel_type

    def start(self):
        print("Starting the gasoline car...")

def start_car(car):
    car.start()

my_electric_car = ElectricCar("Tesla", "Model S", 2022, "100 kWh")
my_gasoline_car = GasolineCar("Ford", "Mustang", 2022, "Regular unleaded")

start_car(my_electric_car)
start_car(my_gasoline_car)

Starting the electric car...
Starting the gasoline car...


In this example, we have a Car class that has a `start()` method. We also have two subclasses, *ElectricCar* and *GasolineCar*, that inherit from *Car* and also have a `start()` method. The ElectricCar and GasolineCar classes have their own implementations of the `start()` method, which behave differently depending on the type of car.

We then define a `start_car()` function that takes a Car object as its argument and calls its `start()` method. When we pass an ElectricCar object to the `start_car()` function, the `start()` method of the ElectricCar class is called, and when we pass a GasolineCar object, the `start()` method of the GasolineCar class is called.
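
Method overloading behaves differently in Python than in languages with signature-based dispatch. The sketch below, built around a hypothetical `Horn` class and `describe` function, shows that a second definition replaces the first, and how default arguments or `functools.singledispatch` can approximate overloading.

```python
from functools import singledispatch

class Horn:
    def honk(self):
        return "Beep"

    # This second definition replaces the first one; it does not overload it.
    def honk(self, times=1):
        return " ".join(["Beep"] * times)

horn = Horn()
print(horn.honk())   # Beep
print(horn.honk(3))  # Beep Beep Beep

# singledispatch picks an implementation based on the type of the first argument.
@singledispatch
def describe(value):
    return f"object: {value}"

@describe.register
def _(value: int):
    return f"integer: {value}"

@describe.register
def _(value: str):
    return f"string: {value}"

print(describe(2022))     # integer: 2022
print(describe("Tesla"))  # string: Tesla
print(describe(3.5))      # object: 3.5
```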

### OOP for data processing

Suppose we want to create a functionality to process data from multiple data formats such as CSV and JSON, and after processing we want to save the data.

In this example, we define a DataProcessor class that has methods for processing and saving data. We then define two subclasses, `CSVProcessor` and `JSONProcessor`, which inherit from the DataProcessor class and override the `process_data` and `save_data` methods to handle CSV and JSON data, respectively.

In the main code, we create instances of the `CSVProcessor` and `JSONProcessor` classes and use them to process and save data. This allows us to easily reuse and extend our data processing code, making it more modular and maintainable.

By using OOP in this way, we can create flexible and reusable data processing tools that can handle a wide range of data formats and be easily extended to support new ones.

In [None]:
class DataProcessor:
    def __init__(self, data):
        self.data = data

    def process_data(self):
        # Process the data
        # ...
        pass

    def save_data(self, filename):
        # Save the processed data to a file
        # ...
        pass

class CSVProcessor(DataProcessor):
    def __init__(self, data, delimiter=','):
        super().__init__(data)
        self.delimiter = delimiter

    def process_data(self):
        # Process the CSV data
        # ...
        pass

    def save_data(self, filename):
        # Save the processed CSV data to a file
        # ...
        pass


class JSONProcessor(DataProcessor):
    def __init__(self, data):
        super().__init__(data)

    def process_data(self):
        # Process the JSON data
        # ...
        pass

    def save_data(self, filename):
        # Save the processed JSON data to a file
        # ...
        pass


# Main code
csv_data = "1,John,Doe\n2,Jane,Smith\n3,Bob,Johnson"
json_data = '[{"id": 1, "name": "John Doe"}, {"id": 2, "name": "Jane Smith"}, {"id": 3, "name": "Bob Johnson"}]'  # a valid JSON array

csv_processor = CSVProcessor(csv_data)
csv_processor.process_data()
csv_processor.save_data('processed.csv')

json_processor = JSONProcessor(json_data)
json_processor.process_data()
json_processor.save_data('processed.json')
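
The skeleton above leaves the method bodies empty. As one possible way to fill in `process_data()`, the sketch below parses the input with the standard `csv` and `json` modules. The `records` attribute is an assumption made for this illustration, and `save_data()` is omitted to keep the example free of file writes.

```python
import csv
import io
import json

class DataProcessor:
    def __init__(self, data):
        self.data = data
        self.records = []  # hypothetical attribute holding the parsed rows

    def process_data(self):
        raise NotImplementedError("Subclasses must implement process_data()")

class CSVProcessor(DataProcessor):
    def __init__(self, data, delimiter=","):
        super().__init__(data)
        self.delimiter = delimiter

    def process_data(self):
        # Parse each CSV line into a list of fields.
        reader = csv.reader(io.StringIO(self.data), delimiter=self.delimiter)
        self.records = [row for row in reader]
        return self.records

class JSONProcessor(DataProcessor):
    def process_data(self):
        # Parse a JSON array into a list of dictionaries.
        self.records = json.loads(self.data)
        return self.records

csv_data = "1,John,Doe\n2,Jane,Smith\n3,Bob,Johnson"
json_data = '[{"id": 1, "name": "John Doe"}, {"id": 2, "name": "Jane Smith"}]'

# Both processors share the same interface, so they can be handled uniformly.
for processor in (CSVProcessor(csv_data), JSONProcessor(json_data)):
    records = processor.process_data()
    print(type(processor).__name__, len(records))
```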

## Advanced concepts

#### Decorators

A decorator is a special type of function that can be used to modify the behavior of another function. Decorators are written using the **@** symbol followed by the name of the decorator function, and they are placed before the function they are decorating. 

Decorators are useful in many situations, such as:

* **Logging**: Decorators can be used to add logging functionality to a function by printing messages before and after it is called.

* **Caching**: Decorators can be used to add caching functionality to a function by storing the results of expensive computations and returning them from cache instead of recomputing them.

* **Authorization**: Decorators can be used to add authorization functionality to a function by checking if the user has the necessary permissions before allowing the function to be called.

* **Timing**: Decorators can be used to add timing functionality to a function by measuring the time it takes to execute and printing the result.

In [31]:
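import functools

def add_nitro_boost(func):
    @functools.wraps(func)  # preserve the decorated function's name and docstring
    def wrapper(self, *args, **kwargs):
        print("Adding nitro boost!")
        return func(self, *args, **kwargs)  # pass through arguments and the return value
    return wrapper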

class Car:
    def __init__(self, make, model):
        self.make = make
        self.model = model
    
    @add_nitro_boost
    def accelerate(self):
        print(f"{self.make} {self.model} is accelerating!")

In this example, we define a decorator function called `add_nitro_boost()`. This decorator takes a function as its argument and returns a new function called `wrapper()`. The `wrapper()` function adds some additional behavior to the original function by printing a message before calling it.

We then define a Car class with an `accelerate()` method. We apply the add_nitro_boost decorator to the `accelerate()` method using the @ symbol. When we call the `accelerate()` method on a Car object, the decorator adds the additional behavior of printing a message before calling the method.

In [30]:
my_car = Car("Tesla", "Model S")
my_car.accelerate()


Adding nitro boost!
Tesla Model S is accelerating!
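
Two of the use cases listed earlier, timing and caching, can be sketched briefly. The `timed` decorator and the functions it wraps are hypothetical examples; `functools.wraps` and `functools.lru_cache` come from the standard library.

```python
import functools
import time

def timed(func):
    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.6f} seconds")
        return result
    return wrapper

@timed
def sum_of_squares(n):
    return sum(i * i for i in range(n))

print(sum_of_squares(1000))  # 332833500

@functools.lru_cache(maxsize=None)
def fibonacci(n):
    # Results are cached, so each value of n is computed only once.
    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(30))  # 832040
```

The elapsed time printed by `timed` varies from run to run, so it is not shown here.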


#### Functions

A function is a block of reusable code that performs a specific task. Functions can take inputs, process them, and produce outputs, and they can be called from other parts of a program as needed.

Functions are important to use in Python and other programming languages for several reasons:

* **Reusability**: Functions can be used in multiple parts of a program, which makes it easier to write code that is reusable and maintainable.

* **Modularity**: Functions allow us to break down a large program into smaller, more manageable pieces. This makes it easier to write, test, and debug code.

* **Abstraction**: Functions allow us to abstract away complex operations and hide the details from the user. This makes it easier to use the function without needing to understand the underlying implementation.

* **Code organization**: Functions help to organize code and make it easier to read and understand. By breaking a program into smaller functions, we can focus on one task at a time and avoid getting overwhelmed by the complexity of the entire program.


Let's say we have the following code, which calculates the area of a rectangle and prints it out:

In [32]:
length = 10
width = 5
area = length * width
print(f"The area of the rectangle is {area}")

The area of the rectangle is 50


This code is simple enough, but what if we wanted to calculate the area of a different rectangle with different dimensions? We could copy and paste this code and make some changes, but that would be repetitive and error-prone.

Instead, we can define a function to calculate the area of a rectangle:

In [33]:
def calculate_area(length, width):
    return length * width

We can then call this function with different values for length and width:

In [34]:
rectangles = [[10, 5], [8, 3]] # [Rectangle1[length, width], Rectangle2[length, width]]
for rectangle in rectangles:
  print(f"The area of the rectangle is {calculate_area(*rectangle)}")

The area of the rectangle is 50
The area of the rectangle is 24


This code does the same thing as the previous code, but it's more organized and easier to read. By defining a separate function for calculating the area, we can reuse this code in multiple places without having to copy and paste it.

Additionally, splitting code into functions can make it easier to test and debug. If we have a problem with the area calculation, we can focus on the `calculate_area()` function without having to worry about the rest of the program.
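
As a small sketch of testing a function in isolation, the example below uses plain `assert` statements. The name `rectangle_area` and the input validation are assumptions added for illustration, not part of the code above.

```python
def rectangle_area(length, width):
    """Return the area of a rectangle; reject negative dimensions."""
    if length < 0 or width < 0:
        raise ValueError("length and width must be non-negative")
    return length * width

# Quick checks that exercise the function on its own.
assert rectangle_area(10, 5) == 50
assert rectangle_area(8, 3) == 24
assert rectangle_area(0, 7) == 0

try:
    rectangle_area(-1, 2)
except ValueError as error:
    print(f"Rejected invalid input: {error}")
```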

#### Modules and libraries

In Python, a **module** is a file containing Python definitions and statements. It typically contains a set of related functions, classes, and variables that can be imported and used in other Python scripts. 

Modules are used to break up large programs into smaller, more manageable pieces. This can make the code easier to read, debug, and maintain. It also allows code to be reused in different contexts.

A **library**, on the other hand, is a collection of related modules that are designed to be used together.

Python has a large *standard library* that includes many useful modules for tasks like working with files, connecting to networks, and parsing data.

In addition to the standard library, there are many *third-party libraries* available for Python. These libraries are typically open-source and can be installed using a package manager like **pip**. Some popular Python libraries include NumPy for scientific computing, Pandas for data analysis, and Django for web development.

Here's an example of how to import and use the built-in math module:

In [35]:
import math

# Calculate the square root of 9
x = math.sqrt(9)

# Calculate the value of pi
y = math.pi

# Print the results
print(x)  # 3.0
print(y)  # 3.141592653589793

3.0
3.141592653589793


You can also import specific functions or variables from a module using the `from` keyword.

In [36]:
from math import sqrt

# Calculate the square root of 16
x = sqrt(16)

# Print the result
print(x)  # 4.0

4.0


A third-party library can be installed using pip with the command:
`pip install <name_of_library>`

In [37]:
!pip install numpy

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


After the library is installed, it can be imported and used in the script:

In [38]:
import numpy as np

# Define two arrays
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# Calculate the dot product
dot_product = np.dot(x, y)

# Print the result
print(dot_product)  # 32

32


In [42]:
help(np.dot)

Help on function dot in module numpy:

dot(...)
    dot(a, b, out=None)
    
    Dot product of two arrays. Specifically,
    
    - If both `a` and `b` are 1-D arrays, it is inner product of vectors
      (without complex conjugation).
    
    - If both `a` and `b` are 2-D arrays, it is matrix multiplication,
      but using :func:`matmul` or ``a @ b`` is preferred.
    
    - If either `a` or `b` is 0-D (scalar), it is equivalent to :func:`multiply`
      and using ``numpy.multiply(a, b)`` or ``a * b`` is preferred.
    
    - If `a` is an N-D array and `b` is a 1-D array, it is a sum product over
      the last axis of `a` and `b`.
    
    - If `a` is an N-D array and `b` is an M-D array (where ``M>=2``), it is a
      sum product over the last axis of `a` and the second-to-last axis of `b`::
    
        dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
    
    Parameters
    ----------
    a : array_like
        First argument.
    b : array_like
        Second argument.
    out : 