# Python Test: Module 2

## Python exercises

### Exercise 1

Write a Python function called `list_chunking` that takes in an `elements_list` and a `chunk_size` as arguments and returns a new list containing sublists of the specified chunk size. If the original list cannot be divided evenly into sublists of equal size, the remaining elements should be placed in the last sublist.

Example:
```python
list_chunking([1, 2, 3, 4, 5, 6, 7, 8], 3)
>>> [[1, 2, 3], [4, 5, 6], [7, 8]]

list_chunking(['a', 'b', 'c', 'd', 'e'], 2)
>>> [['a', 'b'], ['c', 'd'], ['e']]

```

hint: Remember that you can select sections of a list using the following syntax: `my_list[start:end]`

hint: Remember that you can create sequences using `range(start, end, step)`

hint: Try to manually select the different chunks of the list and see if you can find a pattern
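The first two hints can be tried out in isolation. The sketch below uses a made-up list called `letters` (an assumption for this example, not part of the exercise) to show what slicing and a stepped `range` produce. Combining the two into `list_chunking` is left to you.

```python
# Hypothetical sample data, used only to illustrate the hints
letters = ['a', 'b', 'c', 'd', 'e']

# Slicing returns a new list from index `start` up to, but not including, `end`
first_two = letters[0:2]

# A slice that runs past the end of the list is simply cut short
last_part = letters[4:6]

# range(start, end, step) yields start, start + step, ... while below end
starts = list(range(0, len(letters), 2))

print(first_two)   # ['a', 'b']
print(last_part)   # ['e']
print(starts)      # [0, 2, 4]
```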

### Exercise 2: Python Dictionaries 


Write a Python function called `reverse_dictionary` that takes in a dictionary as an argument and returns a new dictionary where the keys and values are swapped. 

Expect both the keys and the values in the original dictionary to be unique. If that's not the case, the function should print a warning message and return `None`.

Example:

```python
reverse_dictionary({'a': 1, 'b': 2, 'c': 3})
>>> {1: 'a', 2: 'b', 3: 'c'}

reverse_dictionary({'x': 'apple', 'y': 'banana', 'z': 'banana'})
>>> Error: multiple keys for one value
```


hint: Remember that the dictionary items can be accessed with the `items()` method.

hint: There is no need to continue the loop if you find a duplicate value.

hint: We can check whether a key is already in a dictionary with the `in` keyword.
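As a small illustration of these three hints, the sketch below loops over a made-up `stock` dictionary (an assumption for this example, unrelated to the exercise data) with `items()`, uses `in` to test for a key, and leaves the loop early with `break`.

```python
# Hypothetical data for illustration only
stock = {'apple': 3, 'banana': 0, 'cherry': 7}

# `in` tests whether a key exists in a dictionary
has_banana = 'banana' in stock
has_mango = 'mango' in stock

# items() yields (key, value) pairs; break stops the loop as soon as
# the first out-of-stock product is found
first_missing = None
for product, quantity in stock.items():
    if quantity == 0:
        first_missing = product
        break

print(has_banana)      # True
print(has_mango)       # False
print(first_missing)   # banana
```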

### Exercise 3: Python Dictionaries 

This exercise is an extension of the previous one. Write a Python function called `reverse_dictionary` that takes in a dictionary as an argument and returns a new dictionary where the keys and values are swapped.

Now the keys in the original dictionary are still unique, but its values may contain duplicates, so several keys can map to the same value. In such cases, the keys that share a value should be stored as a list in the resulting dictionary. Be aware that a value that appears only once should keep its single key, not a one-element list.

Example:

```python
reverse_dictionary({'a': 1, 'b': 2, 'c': 3})
>>> {1: 'a', 2: 'b', 3: 'c'}

reverse_dictionary({'x': 'apple', 'y': 'banana', 'z': 'banana'})
>>> {'apple': 'x', 'banana': ['y', 'z']}
```


hint: The first time we encounter a repeated value, we should create a list with the previous key and the current key. The next time we encounter the same value, we should append the current key to the list.

hint: We can use `isinstance()` to check whether a variable is a list.

hint: Remember that a dictionary can't have a list as a key and therefore we will never have a list as a single value in the reverse dictionary.
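A minimal sketch of the `isinstance()` check mentioned above. The `describe` helper is hypothetical and only reports what kind of value it received; deciding what to do in each case is part of the exercise.

```python
def describe(value):
    """Say whether `value` is a list (hypothetical helper for illustration)."""
    if isinstance(value, list):
        return f"list with {len(value)} item(s)"
    return "single value"

print(describe('x'))          # single value
print(describe(['y', 'z']))   # list with 2 item(s)
```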

## Object Oriented Programming



**Using an Enum class**


Before creating the credentials class we will learn how to use an Enum class. Enum classes are used to create a set of constants. Enums are useful when you want to restrict the values a variable can take (e.g. days of the week, months of the year, etc.).

Let's create an Enum class for the days of the week. We will call it `WeekDays` and it will have the following values: `MONDAY`, `TUESDAY`, `WEDNESDAY`, `THURSDAY`, `FRIDAY`, `SATURDAY`, `SUNDAY`.

> **Note**: Enum classes are usually defined in a separate file. For the sake of simplicity we will define it in the same file as the rest of the code.

```python
from enum import Enum

class WeekDays(Enum):
    MONDAY = 1
    TUESDAY = 2
    WEDNESDAY = 3
    THURSDAY = 4
    FRIDAY = 5
    SATURDAY = 6
    SUNDAY = 7
```

The `WeekDays` Enum defines seven members, one for each day of the week, with values 1 through 7.

Here's how you can use this Enum:


```python
# usage
print(WeekDays.MONDAY)        # Output: WeekDays.MONDAY
print(WeekDays.MONDAY.name)   # Output: MONDAY
print(WeekDays.MONDAY.value)  # Output: 1
```

```
WeekDays.MONDAY
MONDAY
1
```


In this example, `WeekDays.MONDAY`, `WeekDays.TUESDAY`, and so on, are enumeration members. You don't need to create these instances yourself. They're created automatically when the enum is defined.


You can use enumeration members in comparisons, print them, and so on. For example:

```python
today = WeekDays.MONDAY

if today == WeekDays.MONDAY:
    print("Today is Monday.")
```

```
Today is Monday.
```


You can also iterate over the enum:

```python
for day in WeekDays:
    print(day)
```

```
WeekDays.MONDAY
WeekDays.TUESDAY
WeekDays.WEDNESDAY
WeekDays.THURSDAY
WeekDays.FRIDAY
WeekDays.SATURDAY
WeekDays.SUNDAY
```


Enums are used when a variable can take one out of a small set of possible values. Enums add readability to your code by assigning names to these values, which makes your code easier to read and maintain.
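A short sketch of how this restriction shows up in practice, using only the standard `enum` API and the `WeekDays` class from above (redefined here so the block runs on its own). The variable names are chosen for this example only.

```python
from enum import Enum

class WeekDays(Enum):
    MONDAY = 1
    TUESDAY = 2
    WEDNESDAY = 3
    THURSDAY = 4
    FRIDAY = 5
    SATURDAY = 6
    SUNDAY = 7

# Look up a member by its value
by_value = WeekDays(3)

# Look up a member by its name
by_name = WeekDays['FRIDAY']

# Each member exists only once, so identity comparison also works
same_member = by_value is WeekDays.WEDNESDAY

# A value outside the defined set is rejected with a ValueError
rejected = False
try:
    WeekDays(8)
except ValueError:
    rejected = True

print(by_value)      # WeekDays.WEDNESDAY
print(by_name)       # WeekDays.FRIDAY
print(same_member)   # True
print(rejected)      # True
```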

### Exercise 4:

Implement a simple program to simulate a traffic light. The traffic light can be in one of three states: Red, Yellow, or Green. Use an Enum to represent these states.

Steps:

1. Define an Enum called `TrafficLight` with three members: RED, YELLOW, and GREEN.

2. Write a function called `change_light` that takes a `TrafficLight` member as an argument. This function should print a message indicating what drivers should do when they see this color light:
   1.  If it's called with `TrafficLight.RED`, it should print "STOP". 
   2.  If it's called with `TrafficLight.YELLOW`, it should print "CAUTION". 
   3.  If it's called with `TrafficLight.GREEN`, it should print "GO".

3. In your main program, create a loop that cycles through the different traffic light states, calling your `change_light` function for each state.

This exercise should give you some practice working with enums, including defining an enum, using enum members, and passing enum members as function arguments. It should also reinforce the idea that enums are a good choice when you need a variable that can only take one of a small set of specific values.

Try to complete the `change_light` function and the main program loop.

```python
from enum import Enum

class TrafficLight(Enum):
    RED = 1
    # your implementation here

def change_light(state):
    # your implementation here
    pass

# main program
for light in TrafficLight:
    change_light(light)
```

### Exercise 5

Consider a data preprocessing pipeline that involves multiple steps. You have to define an Enum class that represents these steps, and create a function that takes an Enum member and returns a string describing what the step does. The steps in the pipeline are:

1. Loading: Load the dataset from a CSV file.
2. Cleaning: Handle missing data and outliers.
3. Transformation: Apply transformations like normalization or standardization.
4. Feature_Selection: Choose which features to use in the model.
5. Model_Training: Train the machine learning model.
6. Evaluation: Evaluate the performance of the model.

Use these steps as Enum members, and create a function named `describe_step` that takes one of these steps and returns a string description of it. For instance, if the function is called with the `Loading` step, it should return "Load the dataset from a CSV file."

Use this as a starting point for your solution:

```python
from enum import Enum

class PreprocessingSteps(Enum):
    Loading = 1
    # add the remaining steps here

def describe_step(step):
    if step == PreprocessingSteps.Loading:
        return "Load the dataset from a CSV file."
    # add an elif branch for each remaining step here
    else:
        return "Invalid step"
```
To demonstrate your solution, loop through all the members of the Enum and call `describe_step` on each one.

```python
# main program
for step in PreprocessingSteps:
    print(f"{step.name}: {describe_step(step)}")
```
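A chain of `if`/`elif` branches is one way to write `describe_step`. As a possible alternative, enum members are hashable, so they can serve as dictionary keys. The sketch below illustrates the idea with a made-up `Season` enum and `describe_season` function, neither of which is part of the exercise.

```python
from enum import Enum

# Hypothetical enum, used only to illustrate the lookup technique
class Season(Enum):
    WINTER = 1
    SUMMER = 2
    AUTUMN = 3

# Enum members used as dictionary keys; AUTUMN is left out on purpose
DESCRIPTIONS = {
    Season.WINTER: "Cold months.",
    Season.SUMMER: "Warm months.",
}

def describe_season(season):
    # dict.get returns the fallback string when the member has no entry
    return DESCRIPTIONS.get(season, "Invalid season")

print(describe_season(Season.WINTER))   # Cold months.
print(describe_season(Season.AUTUMN))   # Invalid season
```

The trade-off is mostly one of taste: the dictionary keeps the descriptions in one place, while the `if`/`elif` chain in the starting point makes each case explicit.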


### Exercise 6: CSVLoader

In this exercise, we will create a simple and functional CSVLoader class that loads data from a CSV file and returns it as a list of dictionaries.

1. Create a `CSVLoader` class that doesn't take any input when initialized.
2. Add a method `load_data()` that takes a filepath as input and then reads the CSV file from the filepath and returns the data as a **list of dictionaries**.
3. Although the `CSVLoader` class doesn't take any input when initialized, create an internal variable to store the different filepaths that the `load_data()` method has been called with. This variable should be a **list of strings**.
4. Create an attribute called `extensions` that stores the string `"csv"` in a list.
5. Also provide a method `get_loaded_filepaths()` that returns the list of filepaths that the `load_data()` method has been called with and prints the number of files that have been loaded.
6. Create a method `get_supported_extensions()` that returns the extensions attribute.
7. Use the Python `csv` module to parse the CSV file.
8. If the CSV file is empty, the `load_data()` method should return an empty list.
9. If the CSV file does not exist, the `load_data()` method should print an error message and return an empty list.

**Optional 1:**
1. Add a `use_headers` optional argument to the `load_data()` method that takes a boolean value. If `use_headers` is `True`, the first row of the CSV file should be treated as column headers. If `use_headers` is `False`, the first row of the CSV file should be treated as data values.
   - If `use_headers` is `True`, the first row will be used as keys for the dictionaries in the list. Each subsequent row of the CSV file should contain data values, which will be used as values for the dictionaries in the list.
   - If `use_headers` is `False`, the keys for the dictionaries will be the integers 0, 1, 2, and so on.

**Optional 2**: 
1. Implement a similar loader for JSON files and name the class `JSONLoader`. You can use the `json` module to parse the JSON file: open the file with `open()` and pass the file object to `json.load()`, which returns the parsed Python object (a dictionary or a list, depending on the contents of the file).

Example usage:

```python
loader = CSVLoader()
data = loader.load_data('data.csv', use_headers=True)
print(data)
```

Example output:

```
[   
    {'name': 'Alice', 'age': '30','weight': '65', 'height': '170'},    
    {'name': 'Bob', 'age': '25', 'weight': '70', 'height': '175'},    
    {'name': 'Charlie', 'age': '35', 'weight': '80', 'height': '180'},    
    {'name': 'David', 'age': '40', 'weight': '75', 'height': '178'},
]

```
hint: First try to open the file with `open` and then use the `csv` module to parse the file. 

hint: You can open the CSV file at `files/data.csv` to test your code. Remember that the path may vary depending on your operating system.

hint: In order to check if a file exists in Python, you can use the `os.path.exists()` function from the `os` module. This function takes a filepath as input and returns `True` if the file exists, and `False` otherwise.
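The hints above can be explored with a small, self-contained sketch. It writes a made-up two-row CSV file to a temporary directory (the file name `people.csv` and its contents are assumptions for this example) and then reads it back with `csv.reader` and `csv.DictReader` from the standard library. It is not a `CSVLoader` implementation, only a look at what the `csv` module returns.

```python
import csv
import os
import tempfile

# Create a tiny CSV file so the example does not depend on files/data.csv
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "people.csv")
with open(path, "w", newline="") as f:
    f.write("name,age\nAlice,30\nBob,25\n")

# os.path.exists tells us whether a path is present on disk
file_found = os.path.exists(path)
missing_found = os.path.exists(os.path.join(tmp_dir, "missing.csv"))

# csv.reader yields every row, header included, as a list of strings
with open(path, newline="") as f:
    rows = list(csv.reader(f))

# csv.DictReader uses the first row as keys and yields one dict per later row
with open(path, newline="") as f:
    records = list(csv.DictReader(f))

print(file_found, missing_found)   # True False
print(rows)      # [['name', 'age'], ['Alice', '30'], ['Bob', '25']]
print(records)   # [{'name': 'Alice', 'age': '30'}, {'name': 'Bob', 'age': '25'}]
```

Note that every value comes back as a string, which matches the example output above.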


```python
class CSVLoader:
    def __init__(self):
        # your implementation here
        pass

    def get_supported_extensions(self):
        # your implementation here
        pass

    def get_loaded_filepaths(self):
        # your implementation here
        pass

    def load_data(self, filepath, use_headers=True):
        # your implementation here
        pass
```

### Exercise 7: Custom Exceptions

Before diving into Exercise 7, let's briefly discuss exceptions and their importance in error handling.

Exceptions are a way to handle errors and exceptional situations that may occur during the execution of a program. When an error or exception occurs, it disrupts the normal flow of the program and can be caught and handled using try-except blocks.

Exceptions consist of a type and a message. The type represents the specific type of exception, and the message provides additional information about the error. Python provides many built-in exception types, such as `ValueError`, `TypeError`, and `FileNotFoundError`. However, it's also possible to create custom exception types to handle specific errors in your code.
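To make the "type and message" idea concrete, here is a minimal sketch with a built-in exception. The variable names are chosen for this example only.

```python
try:
    number = int("abc")   # int() cannot parse this string
except ValueError as error:
    # The type identifies what kind of error occurred...
    error_type = type(error).__name__
    # ...and the message carries the details
    error_message = str(error)
    print(f"{error_type}: {error_message}")
```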

Custom exceptions are useful when you want to raise an exception that is specific to your code and provides meaningful information about the error. By creating custom exceptions, you can make your code more maintainable and readable, as well as improve error handling and debugging.

In the example below, we define a custom exception called `InvalidInputError` that is raised when the second argument of the `divide_numbers()` function is zero. We catch the exception using a try-except block and handle it by printing a custom error message.



```python
# This is a custom exception that inherits from the built-in Exception class
# There is no need to define any methods in this class
class InvalidInputError(Exception):
    pass

# This function raises the custom exception when the second argument is zero
def divide_numbers(a, b):
    if b == 0:
        raise InvalidInputError("Cannot divide by zero")
    return a / b
```

Calling the function with zero as the second argument raises the exception:

```python
divide_numbers(10, 0)
```

```
InvalidInputError: Cannot divide by zero
```

```python
# Finally, we can catch the custom exception using a try-except block
try:
    result = divide_numbers(10, 0)
    print(result)
except InvalidInputError as e:
    print("Invalid input:", str(e))
```

```
Invalid input: Cannot divide by zero
```


In this exercise, we will prepare the built-in and custom exception classes that the upcoming exercises use for error handling.

1. In the previous exercise, `load_data()` printed an error message if the file did not exist. Now, instead of printing a message, `raise` Python's built-in `FileNotFoundError` exception and, of course, don't return a list.

2. Also prepare an `UnsupportedFileTypeError` class that inherits from the built-in `Exception` class. The generic data loader in the upcoming exercises will raise this custom exception when a file extension is not supported.

3. Prepare an `ExtensionAlreadyRegisteredError` class that inherits from the built-in `Exception` class. This exception will be raised when registering a file extension that is already registered.

By creating custom exception classes, we can handle specific errors related to file handling and data loading in a more structured and informative way. 
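As a rough sketch of the idea in step 1, the hypothetical `read_first_line` function below (not part of the exercise) raises `FileNotFoundError` itself instead of printing, and the caller decides how to react. The file name is made up and assumed not to exist.

```python
import os

def read_first_line(filepath):
    """Return the first line of a text file (hypothetical helper for illustration)."""
    if not os.path.exists(filepath):
        # Raise instead of printing, so the caller can decide what to do
        raise FileNotFoundError(f"File not found: {filepath}")
    with open(filepath) as f:
        return f.readline()

try:
    read_first_line("this_file_should_not_exist_12345.csv")
except FileNotFoundError as error:
    outcome = f"Could not load file: {error}"
    print(outcome)
```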



```python
# Your code here
```

### Exercise 8: File Loader Map

A map, also known as a dictionary or associative array, is a data structure that stores a collection of `key-value` pairs. It allows efficient lookup, insertion, and deletion of elements based on their keys. In Python, the built-in `dict` class represents a map.

Maps are useful when we have a set of unique keys and want to associate each key with a corresponding value. They provide a way to organize and access data in a structured manner, allowing us to quickly retrieve values based on their keys.

In the context of the exercise, we will use a map, specifically a dictionary, to create a mapping between `file extensions` and `file loaders`. This mapping allows us to dynamically select the appropriate file loader based on the file extension.

By using a map, we can decouple the file loading logic from the code that selects the appropriate loader. This provides flexibility and extensibility, as we can easily add or modify file loaders without modifying the code that uses them.

In the `FileLoaderMap` class, the `loader_map` dictionary serves as the map, where the keys are the file extensions and the values are the corresponding file loader classes. The `register_loader` method allows us to add entries to the map, associating file extensions with loader classes. The `get_loader` method retrieves the loader class based on a given file extension.

By utilizing a map, we can create a flexible and configurable system for handling different file types and loaders.
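The decoupling described above can be sketched with a plain dictionary whose values are classes. The `Circle`, `Square`, and `Triangle` classes and the `make_shape` function are hypothetical and deliberately unrelated to file loading, so the exercise itself stays open.

```python
# Hypothetical classes, used only to illustrate a key-to-class mapping
class Circle:
    def describe(self):
        return "a circle"

class Square:
    def describe(self):
        return "a square"

# The map: string keys associated with classes (not instances)
shape_map = {'circle': Circle, 'square': Square}

def make_shape(name):
    # `in` checks the keys of the dictionary
    if name not in shape_map:
        raise KeyError(f"No shape registered for '{name}'")
    shape_class = shape_map[name]   # look up the class by its key
    return shape_class()            # create an instance of that class

# Supporting a new shape only needs a new entry; make_shape stays unchanged
class Triangle:
    def describe(self):
        return "a triangle"

shape_map['triangle'] = Triangle

print(make_shape('square').describe())     # a square
print(make_shape('triangle').describe())   # a triangle
```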

In this exercise we will create a `FileLoaderMap` class that will serve as the map for file loaders. The map will store the mapping between file extensions and file loaders. The map will be used in the next exercise to dynamically select the appropriate file loader based on the file extension.

1. Create a `FileLoaderMap` class that will serve as the map for file loaders.

2. Inside the `FileLoaderMap` class, define a dictionary called `loader_map` to store the mapping between file extensions and file loaders.
   ```python
   class FileLoaderMap:
       def __init__(self):
           self.loader_map = {}
   ```

3. Add a method called `register_loader` that takes three parameters: `extension` (a string representing the file extension), `loader` (a class representing the file loader), and `force` (a boolean value indicating whether to overwrite an existing entry in the map). The method should add an entry to the `loader_map` dictionary, where the key is the file extension and the value is the file loader class. If an entry with the same key already exists in the map, the method should raise an `ExtensionAlreadyRegisteredError` exception if `force` is `False`, or print a warning message if `force` is `True`.
   ```python
    def register_loader(self, ...):
        # Raise error or a warning if extension is already registered
        # depending on the value of the force argument
        if ...:
            if ...:
                raise 
            else:
                print(...)
        # proceed with adding the entry to the map
    ```

4. Add a method called `get_loader` that takes a `file_extension` parameter and returns the file loader class associated with that extension from the `loader_map` dictionary. If the file extension is not found in the map, the method should raise an `UnsupportedFileTypeError` exception.
   ```python
    def get_loader(self, file_extension):
        if ...:
            raise ...
        return ...
    ```

5. Initialize an instance of the `FileLoaderMap` class and register two file loaders: `CSVLoader` for the '.csv' extension and `TXTLoader` for the '.txt' extension (use mock class names for now).

Example usage:

```python

# Mock classes for demonstration purposes
# You can use your CSVLoader from Exercise 6 or any other class, or just use the mock classes below

class CSVLoader:
    pass

class TXTLoader:
    pass


# Named file_loader_map to avoid shadowing the built-in map()
file_loader_map = FileLoaderMap()
file_loader_map.register_loader('.csv', CSVLoader)
file_loader_map.register_loader('.txt', TXTLoader)

loader_class = file_loader_map.get_loader('.csv')
print(loader_class)  # Output: <class '__main__.CSVLoader'>

loader_class = file_loader_map.get_loader('.xlsx')  # Raises UnsupportedFileTypeError
```
 


```python
class FileLoaderMap:
    def __init__(self):
        self.loader_map = {}

    def register_loader(self):
        # your implementation here
        pass

    def get_loader(self):
        # your implementation here
        pass
```

### Exercise 9: File Loader Manager

As a data scientist, you often encounter data stored in different file formats such as CSV, JSON, or Excel. Each format requires specific handling and loading mechanisms, which can be cumbersome and time-consuming. To address this challenge, we introduce a new class called `FileLoaderManager`, which provides a simplified and flexible approach to loading data from multiple file formats.

The objective of this exercise is to introduce you to the concept of a flexible data loader that simplifies the process of loading data from various file formats. The exercise aims to highlight the benefits of encapsulation, code reusability, and flexibility in data loading tasks.

In this exercise, you will build the `FileLoaderManager` class that encapsulates the functionality of loading data from different file formats. Follow the step-by-step guide below to complete the exercise:

![img](schema.png)

1. Create the `FileLoaderManager` class.
2. Inside the class, define an `__init__` method that initializes the `FileLoaderManager` instance. This method should initialize an internal variable called `data_file_loader` to store the mapping between file extensions and file loaders. You can use the `FileLoaderMap` class from the previous exercise or a plain dictionary (in which case you will have to handle more cases yourself inside the class).
   ```python
   class FileLoaderManager:
       def __init__(self):
            self.data_file_loader = ...
            ...
    ```
3. Implement a method called `register_loader` that takes two mandatory parameters: `extension` and `loader` and one optional `force`. `force` should be set as `False` by default. This method should register a loader for a specific file extension. 
   ```python
   class FileLoaderManager:
       def register_loader(self, ... ):
            # Register the loader using the data_file_loader variable
            ...
    ```
4. Implement a method called `register_all_loaders`. This method should register all available loaders for the supported file extensions available in the `AVAILABLE_LOADERS` list. (Hint: You can use the `register_loader` method for each loader and extension). **This method should be called inside the `__init__` method so that all loaders are registered when the `FileLoaderManager` instance is initialized.**
   ```python
   class FileLoaderManager:
       def register_all_loaders(self, ... ):
            # Register the loader using the register_loader method
            for loader in ... :
                ...
                # Use the register_loader method with the force parameter set to True
                for extension in ... :
                    self.register_loader(..., force = True)
            
    ```
5. Implement a method called `load_data` that takes a `file_path` parameter. This method should handle the loading of data based on the file extension. The `load_data` method should return the loaded data by calling the `.load_data()` method from the proper FileLoader.
    ```python
    class FileLoaderManager:
        def load_data(self, ... ):
            # Get the file extension from the file_path
            file_extension = ...
            # Get the loader from the data_file_loader variable
            loader = ...    
            # Load the data using the loader
            data = loader.load_data(...)
            return data
    ```
6. Also implement a method called `get_supported_extensions` that returns a list of the supported file extensions.
7. Test your implementation by creating an instance of `FileLoaderManager`, registering loaders for different file extensions, and loading sample data from different file formats.

Your task is to complete the class skeleton by implementing the methods mentioned in the steps above. Test your implementation with different file formats to ensure the `FileLoaderManager` class works as expected.
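Step 5 needs the extension of a file path. One way to get it, assuming the standard library's `os.path.splitext`, is sketched below. The `get_extension` helper is hypothetical, and lowercasing the result is a design choice of this example rather than a requirement of the exercise.

```python
import os

def get_extension(file_path):
    """Return the extension of `file_path`, including the leading dot (hypothetical helper)."""
    # splitext splits "name.ext" into ("name", ".ext")
    _, extension = os.path.splitext(file_path)
    return extension.lower()

print(get_extension("./files/data.csv"))   # .csv
print(get_extension("report.JSON"))        # .json
print(repr(get_extension("README")))       # ''
```

Because the returned extension keeps its leading dot, it lines up with keys such as `'.csv'` used by the mock loaders.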

Example usage:

```python
...

AVAILABLE_LOADERS = [CSVLoader, JSONLoader]

...

# Using the Mock classes given
file_loader = FileLoaderManager()
file_loader.register_loader(".txt", TXTLoader())
file_loader.get_supported_extensions()
>>> ['.txt', '.csv', '.json']

file_loader.load_data("./files/data.csv")
>>> Loading data from ./files/data.csv using CSVLoader
```

```python
# You can use your CSVLoader implementation (and any other class)
# or just use the mock classes below
class FileLoader:
    def __init__(self):
        self.file_paths = []
        self.extensions = []

    def load_data(self, filepath):
        print(f"Loading data from {filepath} using {self.__class__.__name__}")
        self.file_paths.append(filepath)

    def get_supported_extensions(self):
        return self.extensions

    def get_loaded_filepaths(self):
        return self.file_paths

class CSVLoader(FileLoader):
    def __init__(self):
        super().__init__()
        self.extensions = ['.csv']

class TXTLoader(FileLoader):
    def __init__(self):
        super().__init__()
        self.extensions = ['.txt']

class JSONLoader(FileLoader):
    def __init__(self):
        super().__init__()
        self.extensions = ['.json']

AVAILABLE_LOADERS = [CSVLoader, TXTLoader, JSONLoader]

class FileLoaderManager:
    def __init__(self):
        # Initialize the FileLoaderManager instance
        pass

    def register_loader(self):
        # Register a loader for a specific file extension
        pass

    def register_all_loaders(self):
        # Register all available loaders for the supported file extensions
        pass

    def load_data(self):
        # Handle the loading of data based on the file extension
        pass

    def get_supported_extensions(self):
        # Return a list of the supported file extensions
        pass
```