# Generators

Generators are functions that return an object that can be iterated over. What makes them special is that they produce their items lazily: one at a time, and only when you ask for the next one. Because of this, they are much more memory efficient than fully built sequences such as lists when you have to deal with large data sets. They are a powerful advanced Python technique.

In [10]:
def mygenerator():
    yield 1
    yield 2
    yield 3

for i in mygenerator():
    print(i)

1
2
3
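
To see the lazy behaviour step by step, a generator can also be driven by hand with the built-in `next()`. The sketch below uses a generator like the one above under the hypothetical name `count_to_three`; the `trace` list is only there to record when the function body actually runs.

```python
trace = []  # illustrative log of what the generator body does

def count_to_three():
    for value in (1, 2, 3):
        trace.append(f"about to yield {value}")
        yield value

gen = count_to_three()   # nothing in the body has run yet
steps_before = len(trace)

first = next(gen)        # runs the body up to the first yield
second = next(gen)       # resumes right after the first yield

remaining = list(gen)    # consumes whatever is left

# A finished generator raises StopIteration on the next request
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

print(steps_before, first, second, remaining, exhausted)
```

A `for` loop does the same thing internally: it calls `next()` repeatedly and stops when `StopIteration` is raised.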


Let's have a closer look at why this matters. Below, the same sequence is produced twice: once as a list and once with a generator. The generator is memory efficient because, instead of storing every value of the collection up front, it computes each value only when it is needed.

In [25]:
import sys

# Function that builds and returns the full list
def firstn(n):
    nums = []
    num = 0
    while num < n:
        nums.append(num)
        num+=1
    return nums

# Generator that yields the same values one at a time
def firstn_generator(n):
    num = 0
    while num < n:
        yield num
        num += 1

print(sys.getsizeof(firstn(1000000)))
print(sys.getsizeof(firstn_generator(1000000)))

8448728
200


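We can see that the generator object is orders of magnitude smaller than the list. Note that `sys.getsizeof` reports only the size of the container itself, not of the integers the list refers to, so the real difference is even larger. Furthermore, we don't have to wait for the whole collection to be built in memory before we can use its first item, so processing can start sooner. Consequently, generators are very handy when the items of a collection can be computed from a known pattern.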
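
As a sketch of a collection that follows a known pattern, the Fibonacci numbers can be written as a generator that never ends, leaving it to the consumer to decide how many items to take. The name `fibonacci` is an illustrative choice, and `itertools.islice` from the standard library is used to cut the stream off.

```python
from itertools import islice

def fibonacci():
    """Yield the Fibonacci numbers one by one, without an upper bound."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Only the first ten values are ever computed
first_ten = list(islice(fibonacci(), 10))
print(first_ten)
```

A list-returning version of this function could not exist at all, because the sequence has no end.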

Generator expressions are handier still. They are written like list comprehensions, but with parentheses instead of square brackets, which makes them a concise shortcut for defining a simple generator.

In [113]:
# Using expression as generator definition
mygenerator = (i for i in range(100000) if i % 2 == 0)
print("Size of generator is: ", sys.getsizeof(mygenerator))

# Using expression as list definition
mylist = [i for i in range(100000) if i % 2 == 0]
print("Size of list is: ", sys.getsizeof(mylist))

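# Count the items produced by the generator
# (a name other than `sum` avoids shadowing the built-in function)
count = 0
for i in mygenerator:
    count += 1
print(count)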

Size of generator is:  208
Size of list is:  444376
50000
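
One property worth keeping in mind is that a generator can be consumed only once. The following minimal sketch, with an illustrative name `even_squares`, shows that a second pass over the same generator expression yields nothing.

```python
# Generator expression producing the squares of the even numbers below 10
even_squares = (i * i for i in range(10) if i % 2 == 0)

first_pass = list(even_squares)    # consumes every item
second_pass = list(even_squares)   # the generator is already exhausted

print(first_pass)
print(second_pass)
```

If the values are needed more than once, either recreate the generator or store the results in a list.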


# Threading vs Multiprocessing

## Thread

A thread is an entity within a process that can be scheduled for execution. All threads within a process share the same memory space, which allows them to communicate quickly but also requires careful management to avoid conflicts.

| Pros                                         | Explanation                                                                                     | Cons                                    | Explanation                                                                                     |
|----------------------------------------------|--------------------------------------------------------------------------------------------------|-----------------------------------------|--------------------------------------------------------------------------------------------------|
| All threads within a process share the same memory | This allows threads to communicate quickly and share data easily.                                | Threading is limited by GIL: Only one thread at a time | In Python, the Global Interpreter Lock (GIL) means that only one thread can execute Python bytecode at a time, limiting concurrency. |
| Lightweight                                  | Threads use fewer resources and less overhead than processes.                                    | No effect for CPU-bound tasks           | For CPU-bound tasks, threads do not provide performance improvements due to the GIL.             |
| Starting a thread is faster than starting a process | Creating a thread incurs less overhead compared to creating a process.                           | Not interruptible/killable              | Threads cannot be forcibly terminated from the outside; they have to finish on their own or check a stop flag. |
| Great for I/O-bound tasks                    | Threads can handle multiple I/O operations concurrently, improving performance.                  | Careful with race conditions            | Shared memory access can lead to race conditions if not managed properly.                        |



**When to Use Threads:**

- *I/O-bound tasks:* Threads are excellent for tasks that spend a lot of time waiting for I/O operations, such as reading from a disk, network communication, or user inputs.
- *Lightweight operations:* When you need to perform many small tasks simultaneously, threads are more efficient because they require less overhead to create and manage.
- *Quick startup time:* If your application needs to start many parallel tasks quickly, threads are advantageous due to their fast startup time compared to processes.

**Avoid Threads:**

- *CPU-bound tasks:* Due to the Global Interpreter Lock (GIL) in Python, threads are not suitable for CPU-bound tasks because only one thread can execute Python code at a time.
- *High control needs:* If you need to be able to forcibly stop a task, threads are not ideal since they cannot be interrupted or killed externally.
- *Complex synchronization:* If your tasks require complex sharing and synchronization of data, the potential for race conditions and the complexity of managing thread safety might outweigh the benefits.

In [115]:
# Import the basic threading library
from threading import Thread
import os
import time

# Number of threads to start
threads = []
num_threads = 10

# The function that each thread will run
def square_numbers():
    for i in range(100):
        i * i

# Create one thread per task
for i in range(num_threads):
    t = Thread(target=square_numbers)
    threads.append(t)

# Start each thread
for t in threads:
    t.start()

# Block the main thread until all threads are finished
for t in threads:
    t.join()

print('end main')

end main
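
The claim that threads help with I/O-bound work can be checked with a small timing experiment. This is only a sketch: `time.sleep` stands in for a real network or disk wait, and `fake_download`, `delay` and `num_tasks` are hypothetical names chosen for illustration.

```python
import time
from threading import Thread

def fake_download(delay):
    # time.sleep stands in for waiting on a network or disk operation
    time.sleep(delay)

delay = 0.2
num_tasks = 5

# Sequential: the waiting times add up
start = time.perf_counter()
for _ in range(num_tasks):
    fake_download(delay)
sequential = time.perf_counter() - start

# Threaded: the waiting times overlap
start = time.perf_counter()
workers = [Thread(target=fake_download, args=(delay,)) for _ in range(num_tasks)]
for w in workers:
    w.start()
for w in workers:
    w.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

The threaded run should take roughly as long as a single wait, because a sleeping thread releases the GIL and lets the others proceed.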


Since threads live in the same memory space, sharing data is easy. In Python we can do it with a global variable; here the variable simulates a database. We have to be careful, though, as the integrity of the data is easily broken when several threads interfere with each other. Therefore we have to use synchronization: a lock is acquired before the shared data is processed and released afterwards.

In addition, a queue can be used to exchange data between threads. A queue is a linear first-in, first-out (FIFO) data structure, and Python's `queue.Queue` does its own locking, so threads do not have to compete for an explicit lock. Note that any shared input or output stream has to be protected in the same way as the database.

In [146]:
# Import special type for lock
from threading import Lock

database_value = 0

def increase(lock):
    global database_value

    # Only one thread at a time can hold the lock
    with lock:
        local_copy = database_value
        local_copy += 1
        time.sleep(0.03)
        database_value = local_copy

# Creating threads
lock = Lock()
thread1 = Thread(target=increase, args=(lock,))
thread2 = Thread(target=increase, args=(lock,))

# Starting threads
thread1.start()
thread2.start()

# Wait for both threads to finish
thread1.join()
thread2.join()

print("End value of database: ", database_value)

End value of database:  2
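
The queue-based alternative mentioned above can be sketched with the standard-library `queue.Queue`. The worker function, the number of workers, and the use of `None` as an end marker are assumptions made for this illustration, not something prescribed by the library.

```python
from queue import Queue
from threading import Thread

def worker(tasks, results):
    while True:
        item = tasks.get()        # blocks until an item is available
        if item is None:          # sentinel: no more work for this worker
            break
        results.put(item * item)

tasks = Queue()
results = Queue()
num_workers = 3

threads = [Thread(target=worker, args=(tasks, results)) for _ in range(num_workers)]
for t in threads:
    t.start()

# Hand out the work, followed by one sentinel per worker
for n in range(1, 6):
    tasks.put(n)
for _ in threads:
    tasks.put(None)

for t in threads:
    t.join()

# The order of results depends on scheduling, so sort before printing
squares = sorted(results.get() for _ in range(5))
print(squares)
```

No explicit `Lock` appears here because the queue synchronizes access internally.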


The main thread is the initial thread of execution in any Python program: when a script starts, a single main thread is created by default. A daemon thread, in contrast, is a thread that runs in the background and is typically used for tasks that should not block the program from exiting.

| Aspect           | Main Thread                                              | Daemon Thread                                            |
|------------------|----------------------------------------------------------|----------------------------------------------------------|
| **Role**         | Handles the primary logic and essential tasks.           | Handles background tasks like logging and monitoring.    |
| **Lifecycle**    | Program keeps running until the main thread and all other non-daemon threads finish. | Program exits when only daemon threads are left running. |
| **Termination**  | Prevents program from exiting until it completes.        | Does not prevent program from exiting.                   |
| **Creation**     | Automatically created when the program starts.           | Created by setting `daemon` attribute to `True` before starting the thread. |
| **Usage**        | Used for core application logic and critical operations. | Used for periodic or auxiliary tasks that can terminate abruptly. |
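
A minimal sketch of a daemon thread follows. The `heartbeat` function and the timings are hypothetical; the only library feature relied on is the `daemon` flag of `threading.Thread`, set here through the constructor before the thread is started.

```python
import time
from threading import Thread

heartbeats = []

def heartbeat():
    # Background loop that never ends on its own
    while True:
        heartbeats.append(time.time())
        time.sleep(0.05)

# The daemon flag has to be set before start()
background = Thread(target=heartbeat, daemon=True)
background.start()

time.sleep(0.2)   # stands in for the real work of the main thread
print("daemon:", background.daemon, "alive:", background.is_alive())

# When the main thread ends here, the interpreter exits without
# waiting for the daemon thread to finish its loop
```

Without `daemon=True`, the endless loop would keep the program alive forever.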


## Process

A process is an instance of a program running in its own memory space. Each process is independent and isolated from others, which provides stability and security but requires more resources.

| Pros                                               | Explanation                                                                                     | Cons                                               | Explanation                                                                                     |
|----------------------------------------------------|--------------------------------------------------------------------------------------------------|----------------------------------------------------|--------------------------------------------------------------------------------------------------|
| Takes advantage of multiple CPUs and cores         | Processes can run on different CPUs or cores, providing true parallelism.                       | Heavyweight                                        | Processes use more system resources and are heavier than threads.                               |
| Separate memory space -> Memory is not shared between processes | Each process has its own memory space, which enhances stability and security.                   | Starting a process is slower than starting a thread | Creating a new process takes more time and resources compared to a thread.                      |
| Great for CPU-bound processing                     | Processes can fully utilize multiple cores for CPU-intensive tasks.                             | More memory                                        | Each process requires its own memory allocation, leading to higher memory usage.                |
| New process is started independently from other processes | Processes are isolated, so the failure of one does not affect others.                            | IPC (inter-process communication) is more complicated | Communicating between processes is more complex and requires additional mechanisms.              |
| Processes are interruptible/killable               | Processes can be terminated externally, allowing for better control.                            |                                                    |                                                                                                  |
| One GIL for each process -> avoids GIL limitation  | Each process has its own GIL, so they can run Python code concurrently.                         |                                                    |                                                                                                  |


**When to Use Processes:**

- *CPU-bound tasks:* Processes can fully utilize multiple CPU cores because each process runs independently and has its own Python interpreter and GIL.
- *Memory isolation:* If you need tasks to run in isolated memory spaces to avoid conflicts and improve stability, processes are the better choice.
- *Parallel execution:* For true parallel execution on multiple cores, processes are necessary.

**Avoid Processes:**

- *I/O-bound tasks:* The overhead of creating and managing processes can be too high for tasks that spend a lot of time waiting for I/O operations.
- *Resource efficiency:* If you need to run many tasks simultaneously with minimal overhead, the resource demands of processes may be too high.
- *Complex IPC:* Communication between processes is more complicated than between threads, which can add complexity to your application if frequent inter-process communication is needed.

In [117]:
# basic multiprocessing library
from multiprocessing import Process

# A good number of processes is the number of cores
processes = []
num_processes = os.cpu_count()

# Create and start one process per core
# (when run as a script, this part belongs under `if __name__ == "__main__":`)
for i in range(num_processes):
    p = Process(target=square_numbers)
    processes.append(p)
    p.start()

# Block the main process until all worker processes are finished
for p in processes:
    p.join()

print('end main')

end main


Sharing data is a little more complicated, but the `Value` and `Array` classes of the `multiprocessing` module simplify it a lot.
- The `Value` class allows you to create a shared object that can be accessed by multiple processes. This is useful for sharing simple data types such as integers or floats.
- The `Array` class allows you to create a shared array of fixed size. This is useful for sharing a list-like structure between processes.
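
The two classes described above can be sketched as follows. The function names `add_to_shared` and `run_demo` are hypothetical, and the example assumes it is run as a script, which is why the part that starts processes sits behind the `if __name__ == "__main__":` guard.

```python
from multiprocessing import Process, Value, Array

def add_to_shared(counter, data, amount):
    # get_lock() returns the lock that protects the shared object
    with counter.get_lock():
        counter.value += amount
    with data.get_lock():
        for i in range(len(data)):
            data[i] += amount

def run_demo():
    counter = Value('i', 0)              # shared C int, starting at 0
    data = Array('d', [0.0, 1.0, 2.0])   # shared array of C doubles
    workers = [Process(target=add_to_shared, args=(counter, data, 1))
               for _ in range(2)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    return counter.value, list(data)

if __name__ == "__main__":
    print(run_demo())
```

The type codes `'i'` and `'d'` are the same ones used by the standard `array` module; the lock is needed because `+=` on a shared value is a read followed by a write.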

In [184]:
# Value, Array, Lock and Queue are all imported from multiprocessing;
# this example exchanges the results through a Queue
from multiprocessing import Value, Array, Lock, Queue

def square(numbers, queue):
    for i in numbers: 
        queue.put(i*i)

def make_negative(numbers, queue):
    for i in numbers:
        queue.put(-1*i)

numbers = range(1, 6)
shared_queue = Queue()

p1 = Process(target=square, args=(numbers,shared_queue))
p2 = Process(target=make_negative, args=(numbers,shared_queue))

p1.start()
p2.start()

p1.join()
p2.join()

while not shared_queue.empty():
    print(shared_queue.get())

Process pools, provided by the multiprocessing module in Python, are a way to manage a pool of worker processes that can execute tasks concurrently. This abstraction simplifies the process of parallel execution by managing the creation, distribution, and synchronization of tasks among multiple processes. The main class used for this purpose is `Pool`:
- *Concurrent Execution:* Allows multiple tasks to be executed in parallel using multiple processes.
- *Task Distribution:* Automatically distributes tasks among the available worker processes.
- *Result Collection:* Collects results from the worker processes and returns them to the main process.
- *Task Management:* Handles process creation, task assignment, and result retrieval, reducing boilerplate code.

In [None]:
from multiprocessing import Pool

def cube(number):
    return number * number * number

numbers = range(10)
pool = Pool()

# With no argument, Pool() starts one worker process per available CPU.
# map() splits the iterable into chunks and runs them in parallel
result = pool.map(cube, numbers)

# apply() runs a single call in a worker; the arguments are passed as a tuple
pool.apply(cube, (numbers[0],))

pool.close()
pool.join()
print(result)

# Function Arguments

## The difference between arguments and parameters

In programming, especially in the context of functions, the terms "arguments" and "parameters" are often used interchangeably, but they have distinct meanings. Here’s a breakdown of the differences:
- **Parameters** are the variables listed in a function's definition. They act as placeholders for the values that will be passed to the function.
- **Arguments** are the actual values or data you pass into the function when you call it. They are the concrete values that get substituted for the function's parameters.

In Python, function arguments can be passed in two main ways: as positional arguments and as keyword arguments. **Positional arguments** are passed in a specific order: the position of the argument in the function call determines which parameter it corresponds to in the function definition. **Keyword arguments** are passed by explicitly naming each parameter and assigning it a value. Their order doesn't matter, as each argument is identified by its parameter name.

In [None]:
def greet(name, message):
    print(f"{message}, {name}!")

# Positional arguments
greet("Alice", "Hello")

# Keyword arguments
greet(name="Alice", message="Hello")
greet(message="Goodbye", name="Bob")

## Default arguments

Default arguments in Python are a way to specify default values for function parameters. If a function call does not provide a value for a parameter with a default value, the default value is used. This allows for more flexible and concise function definitions and calls.

In [None]:
def greet(name, message="Hello"):
    print(f"{message}, {name}!")

# Function call with both arguments
greet("Alice", "Good morning")  # Output: Good morning, Alice!

# Function call with only the required argument
greet("Bob")  # Output: Hello, Bob!


Using mutable objects as default arguments can lead to unexpected behavior because the default value is created only once, when the function is defined, and is then shared across all calls to the function.

In [None]:
def append_to_list(value, my_list=[]):
    my_list.append(value)
    return my_list

# The list is shared between calls
print(append_to_list(1))  # Output: [1]
print(append_to_list(2))  # Output: [1, 2]
print(append_to_list(3))  # Output: [1, 2, 3]
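
A common way around this pitfall is to use `None` as the default and create the list inside the function. The sketch below is a variant of the function above, not the only possible fix.

```python
def append_to_list(value, my_list=None):
    # Create a fresh list on every call that does not pass one in
    if my_list is None:
        my_list = []
    my_list.append(value)
    return my_list

# Each call without a list now starts from an empty one
print(append_to_list(1))  # Output: [1]
print(append_to_list(2))  # Output: [2]

# Passing a list explicitly still works as before
existing = [0]
print(append_to_list(1, existing))  # Output: [0, 1]
```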

## Args and Kwargs

The concepts of `*args` and `**kwargs` in Python are used for passing a variable number of arguments to a function. They provide flexibility in function calls, allowing you to handle an arbitrary number of positional and keyword arguments.
- `*args`: Used to pass a variable number of positional arguments to a function. These arguments are accessible as a tuple.
- `**kwargs`: Used to pass a variable number of keyword arguments to a function. These arguments are accessible as a dictionary.
- Combination: Both can be used together to create highly flexible functions that can handle any number of positional and keyword arguments.
- Order: When used together, `*args` must come before `**kwargs`.

In [None]:
def example(pos1, pos2, *args, kw1=None, kw2=None, **kwargs):
    print(f"pos1: {pos1}, pos2: {pos2}")
    print(f"kw1: {kw1}, kw2: {kw2}")
    print(f"args: {args}")
    print(f"kwargs: {kwargs}")

example(1, 2, 3, 4, kw1='a', extra='b')

When defining a function with `*args` and `**kwargs`, the order of parameters should be:

- Positional parameters (required)
- `*args`: collects additional positional arguments passed to the function into a tuple.
- Keyword parameters with default values (optional)
- `**kwargs`: collects additional keyword arguments into a dictionary that is accessible within the function.
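
A frequent use of both collectors is forwarding: a wrapper accepts whatever it is given and passes it on unchanged. The sketch below uses the hypothetical names `call_logged` and `power` purely for illustration.

```python
def call_logged(func, *args, **kwargs):
    # args is a tuple and kwargs is a dict; both are passed on unchanged
    print(f"calling {func.__name__} with args={args} kwargs={kwargs}")
    return func(*args, **kwargs)

def power(base, exponent=2):
    return base ** exponent

squared = call_logged(power, 3)
cubed = call_logged(power, 3, exponent=3)
print(squared, cubed)
```

The wrapper never needs to know the signature of the function it calls, which is why this pattern is common in decorators.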

## Unpacking arguments

We already discussed tuple unpacking for return values, but it is also possible to pass the elements of a list as separate arguments to a function using the asterisk operator. Keep in mind that the length of the container must match the number of parameters. The same works for any iterable, such as a tuple, and a dictionary can be unpacked into keyword arguments with `**`.

In [None]:
def foo(a, b, c):
    print(a, b, c)

my_list=[0, 1, 2]
foo(*my_list)
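
The following sketch shows what happens with other containers and with a length mismatch. It uses a variant of `foo` that returns its arguments instead of printing them, so the results can be inspected.

```python
def foo(a, b, c):
    return (a, b, c)

# Any iterable can be unpacked with *, not only lists
from_tuple = foo(*(0, 1, 2))

# A dictionary is unpacked with **; its keys must match the parameter names
from_dict = foo(**{'a': 0, 'b': 1, 'c': 2})

# Too many (or too few) elements raise a TypeError
try:
    foo(*[0, 1, 2, 3])
    mismatch_detected = False
except TypeError:
    mismatch_detected = True

print(from_tuple, from_dict, mismatch_detected)
```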

## Local vs Global variables

*Local variables* are declared within a function and are only accessible within that function. Their scope is limited to the function in which they are defined.

*Global variables* are declared outside of all functions and are accessible throughout the entire script, including from within any function.

| Aspect           | Local Variables                                | Global Variables                               |
|------------------|------------------------------------------------|-----------------------------------------------|
| **Definition**   | Declared within a function or namespace        | Declared outside all functions.               |
| **Scope**        | Accessible only within the function.           | Accessible throughout the entire script.      |
| **Lifetime**     | Exists only during the function execution.     | Exists for the duration of the program's execution. |


In [None]:
def bar():
    number = 3
    print('number inside function: ', number)

number = 0
bar()
print(number)
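
Assigning to a name inside a function creates a local variable by default. To change a global variable from within a function, the name has to be declared with the `global` keyword. A minimal sketch with illustrative names:

```python
counter = 0

def increment_local():
    counter = 100      # creates a new local variable; the global is untouched
    return counter

def increment_global():
    global counter     # assignments now target the module-level variable
    counter += 1
    return counter

local_result = increment_local()
after_local = counter      # still the original global value

increment_global()
after_global = counter     # changed by the function

print(local_result, after_local, after_global)
```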

## Passing parameters

In call by value, a copy of the actual parameter's value is passed to the function. Changes made to the parameter inside the function do not affect the original variable. In call by reference, a reference to the actual parameter is passed to the function. Changes made to the parameter inside the function affect the original variable.

There are two rules to consider in Python. First, what is passed in is a reference, but the reference itself is passed by value. Second, mutable and immutable data types behave differently: a mutable object can be changed in place within a function, but if you rebind the parameter to a new object inside the function, the outside reference is not affected. Furthermore, although an immutable object cannot be modified within a function, an immutable element stored in a mutable object can be replaced by reassigning it.

In [None]:
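def ool(a_list):
    # Rebinding: the local name now refers to a new list
    a_list = [200, 300, 400]
    # These changes only affect the new local list
    a_list.append(4)
    a_list[0] = -100

a_list = [1, 2, 3, 4]
ool(a_list)
print(a_list)  # the outer list is unchanged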
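
For contrast, the sketch below places in-place mutation and rebinding side by side. The function names `mutate` and `rebind` are hypothetical.

```python
def mutate(items):
    items.append(99)        # changes the object the caller also refers to

def rebind(items):
    items = items + [99]    # builds a new list; only the local name changes
    return items

original = [1, 2, 3]

mutate(original)
after_mutate = list(original)   # snapshot taken after the in-place change

returned = rebind(original)     # original itself is left as it was

print(after_mutate)
print(original)
print(returned)
```

Only `mutate` is visible to the caller through `original`; the result of `rebind` has to be returned to be of any use.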

# Shallow vs Deep copying

The built-in copy module in Python provides functions to create shallow and deep copies of objects. This is particularly useful when you want to duplicate complex objects like lists, dictionaries, or user-defined classes, but the way the copies are made can have significant implications for how the original and copied objects interact.

A **shallow copy** creates a new object, but it inserts references into it to the objects found in the original. Therefore, if the original object contains references to mutable objects (like lists or dictionaries), both the original and the shallow copy will refer to the same mutable objects.

In [None]:
import copy
org = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]

# On the second level both will be affected
cpy = copy.copy(org)
cpy[0][1] = -100

print(org)
print(cpy)

A **deep copy** creates a new object and recursively copies all objects found in the original, meaning that it creates a completely independent copy of the entire structure. This means that modifications to the original object do not affect the deep copy, and vice versa.

In [None]:
# Here the nested lists are duplicated as well
cpy = copy.deepcopy(org)
cpy[0][1] = 0

print(org)
print(cpy)

| Aspect                    | Shallow Copy                                                      | Deep Copy                                                      |
|---------------------------|-------------------------------------------------------------------|----------------------------------------------------------------|
| **Definition**            | Creates a new object but copies references to nested objects.     | Creates a new object and recursively copies all nested objects. |
| **Usage of Memory**       | Less memory is used since nested objects are shared.              | More memory is used since all objects are copied.               |
| **Impact of Modifications**| Changes in mutable objects are reflected in both original and copied objects. | Changes in the original do not affect the deep copy and vice versa. |
| **Function**              | `copy.copy()`                                                     | `copy.deepcopy()`                                              |


# Context Managers

Context managers in Python are constructs that allocate and release resources precisely when you need them. They are a great tool for resource management: resources are released reliably, and the code becomes cleaner.

In [None]:
# Without a context manager: explicit try/finally
file = open('notes.txt', 'w')
try:
    file.write('some todoo...')
finally:
    file.close()

# Code with context managers
with open('notes.txt', 'w') as file:
    file.write('This is a file context manager')

**Benefits of Using Context Managers:**
- *Resource Management:* Ensures that resources like files, network connections, and locks are properly managed, avoiding resource leaks.
- *Error Handling:* Automatically handles exceptions and ensures cleanup code is executed.
- *Readability:* Provides a clean and readable way to manage resources, reducing boilerplate code.

## Writing a custom context manager

You can write custom context managers in Python by implementing two special methods: `__enter__` and `__exit__`.
- `__enter__` method: Executed when the execution flow enters the context of the `with` statement. It can return an object to be used within the `with` block.
- `__exit__` method: Executed when the execution flow leaves the context of the `with` statement. It can handle exceptions, taking three arguments (`exc_type`, `exc_val`, `exc_tb`) that describe the exception, if any. Returning `True` suppresses the exception, while returning `False` propagates it.

In [None]:
class CustomContextManager:
    def __enter__(self):
        # Code to acquire resource
        print("Entering the context")
        return self  # This is optional, depends on what you need

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Code to release resource
        print("Exiting the context")
        # Handle exceptions if necessary
        # Return True to suppress the exception, or False to propagate it
        return False

# Usage
with CustomContextManager() as manager:
    print("Inside the context")
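
The effect of the return value of `__exit__` can be seen in the following sketch. The class `Suppress` is a hypothetical example written for illustration only; the standard library already offers `contextlib.suppress` for real use.

```python
class Suppress:
    """Illustrative context manager that swallows one exception type."""

    def __init__(self, exc_class):
        self.exc_class = exc_class
        self.caught = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # exc_type is None when the block finished without an exception
        if exc_type is not None and issubclass(exc_type, self.exc_class):
            self.caught = exc_val
            return True    # suppress the exception
        return False       # propagate anything else

with Suppress(ZeroDivisionError) as guard:
    1 / 0
    print("never reached")

print("continued after the block:", repr(guard.caught))
```

The rest of the `with` block is skipped once the exception is raised, but execution continues normally after the block.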

Alternatively, you can use the contextlib module, which provides a decorator for simpler context manager creation.

- `contextlib.contextmanager` decorator: A simpler way to create context managers using a generator function, where the setup code goes before the `yield` and the cleanup code goes after.

In [None]:
from contextlib import contextmanager

@contextmanager
def custom_context():
    try:
        # Code to acquire resource
        print("Entering the context")
        yield
    finally:
        # Code to release resource
        print("Exiting the context")

# Usage
with custom_context():
    print("Inside the context")
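
The value given to `yield` becomes the object bound by `as`. As a sketch of this, the hypothetical `timer` context manager below hands out a dictionary and fills in the elapsed time during cleanup.

```python
import time
from contextlib import contextmanager

@contextmanager
def timer(label):
    result = {'label': label, 'elapsed': None}
    start = time.perf_counter()
    try:
        yield result    # bound to the name after "as"
    finally:
        # Runs on normal exit and when the block raises
        result['elapsed'] = time.perf_counter() - start

with timer("sleep") as t:
    time.sleep(0.05)

print(f"{t['label']} took {t['elapsed']:.3f}s")
```

Because the measurement happens in the `finally` clause, the elapsed time is recorded even if the code inside the `with` block fails.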