### Explanation


#### Methods
Your class should support two methods, namely fit and transform with the following behaviours:

#### fit : *fit(X)*
- Fit the imputer on X.

Parameters:	
- X : {array-like, sparse matrix}, shape (n_samples, n_features)
- Input data, where n_samples is the number of samples and n_features is the number of features.

Returns:	
- self : object
- Returns self.

In other words: fit receives the "matrix" of incomplete data together with the "boundaries" of the area for which we want to impute (calculate) missing values (i.e. a single column, or the entire matrix). It computes the fill values for that area and returns the fitted imputer object itself.


#### transform : *transform(X)*
- Impute all missing values in X and return X with the new values.

Parameters:	
- X : {array-like, sparse matrix}, shape = [n_samples, n_features]
- The input data to complete.
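
The fit/transform contract described above can be sketched in a few lines. This is only a minimal illustration, not the class developed below: `ColumnMeanImputer` and the attribute `fill_values_` are hypothetical names, and the sketch assumes the input is a pandas DataFrame.

```python
import numpy as np
import pandas as pd


class ColumnMeanImputer:
    """Hypothetical minimal imputer illustrating the fit/transform contract."""

    def fit(self, X):
        # Learn one fill value per numeric column from the data passed to fit.
        self.fill_values_ = X.mean(numeric_only=True)
        # Returning self allows chaining: imputer.fit(X).transform(X)
        return self

    def transform(self, X):
        # fillna returns a new frame; X itself is left unchanged.
        return X.fillna(self.fill_values_)


df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [10.0, 20.0, np.nan]})
imputer = ColumnMeanImputer()
result = imputer.fit(df).transform(df)
print(result)
```

Because fit returns self, fitting and transforming can be written as one chained expression, and the values learned in fit can later be reused on other data with the same columns.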
 

## Imputer Classes

In [2]:

# Necessary libraries for the task. pandas is used to build and display the data structures. NumPy supplies np.nan, the marker for missing values.
import pandas as pd
import numpy as np

# Class for performing data imputation. It selects a strategy and delegates fit/transform to it.
class HomebrewImputer:
    # strategy defaults to an empty string, so one of "mean", "median", or "mode"
    # must be passed. Imputation is only supported column-wise (axis=0).
    def __init__(self, strategy="", axis=0):
        self.strategy = strategy
        self.axis = axis
        self.imputer = None

    # The imputation is based on the strategy chosen by the user. Mean, median,
    # and mode are the strategies currently implemented.
    def fit(self, X, columns=None):
        if self.strategy == "mean":
            self.imputer = MeanImputer(axis=self.axis)
        elif self.strategy == "median":
            self.imputer = MedianImputer()
        elif self.strategy == "mode":
            self.imputer = ModeImputer(axis=self.axis)
        else:
            # Any other value is either a typo or a strategy that is not implemented.
            raise ValueError("The imputation strategy is not valid or not implemented")
        # Optionally restrict the fit to selected columns. iloc is integer-location
        # based indexing, so the columns are given by position.
        if columns is not None:
            X = X.iloc[:, columns]
        # Fit the selected imputer on the data and return self, as the spec requires.
        self.imputer.fit(X)
        return self

    # Apply the fitted imputer to X. Returns None if fit has not been called yet.
    def transform(self, X):
        if self.imputer is not None:
            return self.imputer.transform(X)

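# Class for mean imputation.
class MeanImputer:
    def __init__(self, axis=0):
        self.axis = axis

    # Calculate the mean of each numeric column; these are the fill values.
    # numeric_only=True skips text columns such as "Country", for which
    # recent pandas versions would otherwise raise an error.
    def fit(self, X):
        self.imputation_values = X.mean(axis=self.axis, numeric_only=True)

    # Fill the missing values with the means computed in fit.
    def transform(self, X):
        return X.fillna(self.imputation_values)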
    
# Class for median imputation. This version always computes column medians,
# so it takes no axis argument and needs no __init__ method.
class MedianImputer:
    # Calculate the median of each numeric column.
    def fit(self, X):
        self.imputation_values = X.median(numeric_only=True)

    # Fill the missing values with the medians computed in fit.
    def transform(self, X):
        return X.fillna(self.imputation_values)
    
# Class for mode imputation.
class ModeImputer:
    def __init__(self, axis=0):
        self.axis = axis

    # Calculate the mode of each column. mode() lists tied values in ascending
    # order, so iloc[0] takes the smallest of the most frequent values.
    def fit(self, X):
        self.imputation_values = X.mode(axis=self.axis).iloc[0]

    # Fill the missing values with the modes computed in fit.
    def transform(self, X):
        return X.fillna(self.imputation_values)

# Example data to impute.
X = pd.DataFrame({
    'Country': ['France', 'Spain', 'Germany', 'Spain', 'Germany', 'France', 'Spain', 'France', 'Germany', 'France'],
    'Num1': [44.0, 27.0, 30.0, 38.0, 40.0, 35.0, np.nan, 48.0, 50.0, 37.0],
    'Num2': [72000.0, 48000.0, 54000.0, 61000.0, np.nan, 58000.0, 52000.0, 79000.0, 83000.0, 67000.0]
})
# The user chooses which strategy to impute with.
strategy = "mean"
# The user chooses which columns to impute, by integer position.
columns_to_impute = [1, 2]
# Create an instance of the HomebrewImputer class.
imputer = HomebrewImputer(strategy=strategy, axis=0)
# Fit the imputer on the dataset, restricted to the chosen columns.
imputer.fit(X, columns=columns_to_impute)
# Perform the imputation and get the imputed data.
imputed_data = imputer.transform(X)

# Print the original data and imputed data for comparison.
print("Original Data: \n", X)
print("")
print("Imputed Data : \n", imputed_data)


Original Data: 
    Country  Num1     Num2
0   France  44.0  72000.0
1    Spain  27.0  48000.0
2  Germany  30.0  54000.0
3    Spain  38.0  61000.0
4  Germany  40.0      NaN
5   France  35.0  58000.0
6    Spain   NaN  52000.0
7   France  48.0  79000.0
8  Germany  50.0  83000.0
9   France  37.0  67000.0

Imputed Data : 
    Country       Num1          Num2
0   France  44.000000  72000.000000
1    Spain  27.000000  48000.000000
2  Germany  30.000000  54000.000000
3    Spain  38.000000  61000.000000
4  Germany  40.000000  63777.777778
5   France  35.000000  58000.000000
6    Spain  38.777778  52000.000000
7   France  48.000000  79000.000000
8  Germany  50.000000  83000.000000
9   France  37.000000  67000.000000


## Demonstration of how to use the strategy imputer

### Testing mean strategy

In [3]:

# The user chooses which strategy to impute with.
strategy = "mean"
# The user chooses which columns to impute, by integer position.
columns_to_impute = [1, 2]
# Create an instance of the HomebrewImputer class.
imputer = HomebrewImputer(strategy=strategy, axis=0)
# Fit the imputer on the dataset, restricted to the chosen columns.
imputer.fit(X, columns=columns_to_impute)
# Perform the imputation and get the imputed data.
imputed_data = imputer.transform(X)

# Print the original data and imputed data for comparison.
print("Original Data: \n", X)
print("")
print("Imputed Data : \n", imputed_data)


Original Data: 
    Country  Num1     Num2
0   France  44.0  72000.0
1    Spain  27.0  48000.0
2  Germany  30.0  54000.0
3    Spain  38.0  61000.0
4  Germany  40.0      NaN
5   France  35.0  58000.0
6    Spain   NaN  52000.0
7   France  48.0  79000.0
8  Germany  50.0  83000.0
9   France  37.0  67000.0

Imputed Data : 
    Country       Num1          Num2
0   France  44.000000  72000.000000
1    Spain  27.000000  48000.000000
2  Germany  30.000000  54000.000000
3    Spain  38.000000  61000.000000
4  Germany  40.000000  63777.777778
5   France  35.000000  58000.000000
6    Spain  38.777778  52000.000000
7   France  48.000000  79000.000000
8  Germany  50.000000  83000.000000
9   France  37.000000  67000.000000


### Testing median strategy

In [4]:

# The user chooses which strategy to impute with.
strategy = "median"
# The user chooses which columns to impute, by integer position.
columns_to_impute = [1, 2]
# Create an instance of the HomebrewImputer class.
imputer = HomebrewImputer(strategy=strategy, axis=0)
# Fit the imputer on the dataset, restricted to the chosen columns.
imputer.fit(X, columns=columns_to_impute)
# Perform the imputation and get the imputed data.
imputed_data = imputer.transform(X)

# Print the original data and imputed data for comparison.
print("Original Data: \n", X)
print("")
print("Imputed Data : \n", imputed_data)

Original Data: 
    Country  Num1     Num2
0   France  44.0  72000.0
1    Spain  27.0  48000.0
2  Germany  30.0  54000.0
3    Spain  38.0  61000.0
4  Germany  40.0      NaN
5   France  35.0  58000.0
6    Spain   NaN  52000.0
7   France  48.0  79000.0
8  Germany  50.0  83000.0
9   France  37.0  67000.0

Imputed Data : 
    Country  Num1     Num2
0   France  44.0  72000.0
1    Spain  27.0  48000.0
2  Germany  30.0  54000.0
3    Spain  38.0  61000.0
4  Germany  40.0  61000.0
5   France  35.0  58000.0
6    Spain  38.0  52000.0
7   France  48.0  79000.0
8  Germany  50.0  83000.0
9   France  37.0  67000.0


### Testing mode strategy

In [5]:

# The user chooses which strategy to impute with.
strategy = "mode"
# The user chooses which columns to impute, by integer position.
columns_to_impute = [1, 2]
# Create an instance of the HomebrewImputer class.
imputer = HomebrewImputer(strategy=strategy, axis=0)
# Fit the imputer on the dataset, restricted to the chosen columns.
imputer.fit(X, columns=columns_to_impute)
# Perform the imputation and get the imputed data.
imputed_data = imputer.transform(X)

# Print the original data and imputed data for comparison.
print("Original Data: \n", X)
print("")
print("Imputed Data : \n", imputed_data)

Original Data: 
    Country  Num1     Num2
0   France  44.0  72000.0
1    Spain  27.0  48000.0
2  Germany  30.0  54000.0
3    Spain  38.0  61000.0
4  Germany  40.0      NaN
5   France  35.0  58000.0
6    Spain   NaN  52000.0
7   France  48.0  79000.0
8  Germany  50.0  83000.0
9   France  37.0  67000.0

Imputed Data : 
    Country  Num1     Num2
0   France  44.0  72000.0
1    Spain  27.0  48000.0
2  Germany  30.0  54000.0
3    Spain  38.0  61000.0
4  Germany  40.0  48000.0
5   France  35.0  58000.0
6    Spain  27.0  52000.0
7   France  48.0  79000.0
8  Germany  50.0  83000.0
9   France  37.0  67000.0
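
The mode output above fills the gaps with 27.0 and 48000.0, which are the smallest values in their columns. A short check, using only the Num1 values from the example data, suggests why: no value repeats, so every value ties as the mode, and pandas returns the tied values in ascending order. The series named `tied` is a hypothetical extra example.

```python
import numpy as np
import pandas as pd

# Num1 column from the example data; every non-missing value occurs once.
num1 = pd.Series([44.0, 27.0, 30.0, 38.0, 40.0, 35.0, np.nan, 48.0, 50.0, 37.0])

modes = num1.mode()      # NaN is ignored by default (dropna=True)
print(len(modes))        # all distinct values tie as "most frequent"
print(modes.iloc[0])     # the first entry is the smallest value

# Hypothetical series with a two-way tie between 1.0 and 3.0.
tied = pd.Series([3.0, 3.0, 1.0, 1.0, 2.0])
print(tied.mode().tolist())
```

Mode imputation is therefore most meaningful for columns with repeated values, such as categories; on continuous data like Num1 it degenerates to "fill with the minimum".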


### The user can choose which column to impute. Index 1 corresponds to column "Num1" and index 2 corresponds to column "Num2".

In [6]:

# The user chooses which strategy to impute with.
strategy = "mean"
# The user chooses which columns to impute, by integer position.
columns_to_impute = [1]
# Create an instance of the HomebrewImputer class.
imputer = HomebrewImputer(strategy=strategy, axis=0)
# Fit the imputer on the dataset, restricted to the chosen columns.
imputer.fit(X, columns=columns_to_impute)
# Perform the imputation and get the imputed data.
imputed_data = imputer.transform(X)

# Print the original data and imputed data for comparison.
print("Original Data: \n", X)
print("")
print("Imputed Data : \n", imputed_data)


Original Data: 
    Country  Num1     Num2
0   France  44.0  72000.0
1    Spain  27.0  48000.0
2  Germany  30.0  54000.0
3    Spain  38.0  61000.0
4  Germany  40.0      NaN
5   France  35.0  58000.0
6    Spain   NaN  52000.0
7   France  48.0  79000.0
8  Germany  50.0  83000.0
9   France  37.0  67000.0

Imputed Data : 
    Country       Num1     Num2
0   France  44.000000  72000.0
1    Spain  27.000000  48000.0
2  Germany  30.000000  54000.0
3    Spain  38.000000  61000.0
4  Germany  40.000000      NaN
5   France  35.000000  58000.0
6    Spain  38.777778  52000.0
7   France  48.000000  79000.0
8  Germany  50.000000  83000.0
9   France  37.000000  67000.0


In [8]:

# The user chooses which strategy to impute with.
strategy = "mean"
# The user chooses which columns to impute, by integer position.
columns_to_impute = [2]
# Create an instance of the HomebrewImputer class.
imputer = HomebrewImputer(strategy=strategy, axis=0)
# Fit the imputer on the dataset, restricted to the chosen columns.
imputer.fit(X, columns=columns_to_impute)
# Perform the imputation and get the imputed data.
imputed_data = imputer.transform(X)

# Print the original data and imputed data for comparison.
print("Original Data: \n", X)
print("")
print("Imputed Data : \n", imputed_data)


Original Data: 
    Country  Num1     Num2
0   France  44.0  72000.0
1    Spain  27.0  48000.0
2  Germany  30.0  54000.0
3    Spain  38.0  61000.0
4  Germany  40.0      NaN
5   France  35.0  58000.0
6    Spain   NaN  52000.0
7   France  48.0  79000.0
8  Germany  50.0  83000.0
9   France  37.0  67000.0

Imputed Data : 
    Country  Num1          Num2
0   France  44.0  72000.000000
1    Spain  27.0  48000.000000
2  Germany  30.0  54000.000000
3    Spain  38.0  61000.000000
4  Germany  40.0  63777.777778
5   France  35.0  58000.000000
6    Spain   NaN  52000.000000
7   France  48.0  79000.000000
8  Germany  50.0  83000.000000
9   France  37.0  67000.000000


However, index 0 has no effect: the "Country" column holds text rather than numbers, so no mean can be computed for it, and it contains no missing values to fill.

In [9]:

# The user chooses which strategy to impute with.
strategy = "mean"
# The user chooses which columns to impute, by integer position.
columns_to_impute = [0]
# Create an instance of the HomebrewImputer class.
imputer = HomebrewImputer(strategy=strategy, axis=0)
# Fit the imputer on the dataset, restricted to the chosen columns.
imputer.fit(X, columns=columns_to_impute)
# Perform the imputation and get the imputed data.
imputed_data = imputer.transform(X)

# Print the original data and imputed data for comparison.
print("Original Data: \n", X)
print("")
print("Imputed Data : \n", imputed_data)

Original Data: 
    Country  Num1     Num2
0   France  44.0  72000.0
1    Spain  27.0  48000.0
2  Germany  30.0  54000.0
3    Spain  38.0  61000.0
4  Germany  40.0      NaN
5   France  35.0  58000.0
6    Spain   NaN  52000.0
7   France  48.0  79000.0
8  Germany  50.0  83000.0
9   France  37.0  67000.0

Imputed Data : 
    Country  Num1     Num2
0   France  44.0  72000.0
1    Spain  27.0  48000.0
2  Germany  30.0  54000.0
3    Spain  38.0  61000.0
4  Germany  40.0      NaN
5   France  35.0  58000.0
6    Spain   NaN  52000.0
7   France  48.0  79000.0
8  Germany  50.0  83000.0
9   France  37.0  67000.0




## Brief Reflection

The strategy-pattern code was designed to be robust and extensible for future use. A new imputation strategy can be added without changing the overall structure of the code. The following features make the design resistant to change:


Extensibility. The Strategy Pattern allows a new imputation strategy to be added by writing a new class that follows the same interface. However, the design is likely to become more complicated as imputation classes and features accumulate, and a new feature could force a change to the user interface just to stay compatible.
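
As an illustration of that extensibility, a new strategy only needs the same fit and transform methods and the same `imputation_values` attribute as the existing imputer classes. The `ConstantImputer` below is a hypothetical example, not part of the original design; to make it selectable, the dispatch in `HomebrewImputer.fit` would still need one extra branch.

```python
import numpy as np
import pandas as pd


class ConstantImputer:
    """Hypothetical new strategy: fill every missing cell with a fixed value."""

    def __init__(self, fill_value=0.0):
        self.fill_value = fill_value

    def fit(self, X):
        # Nothing is learned from the data; one fill value is stored per column
        # so the attribute looks the same as in the other imputer classes.
        self.imputation_values = pd.Series(self.fill_value, index=X.columns)
        return self

    def transform(self, X):
        return X.fillna(self.imputation_values)


X = pd.DataFrame({"Num1": [44.0, np.nan], "Num2": [np.nan, 48000.0]})
imputed = ConstantImputer(fill_value=-1.0).fit(X).transform(X)
print(imputed)
```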

Maintainability. The imputation strategies are separated into classes, each responsible for one strategy. This isolation makes maintenance and debugging easier: if one strategy does not function, the others keep working. On the other hand, adding more classes could make the code structure harder to navigate.

Flexibility. The design is flexible in that it lets users choose which columns to impute. However, it only works with axis=0. Supporting axis=1 would require modifying the code structure.
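
To show why axis=1 needs more than a changed parameter: as far as I am aware, `fillna` with a Series of fill values matches them against column labels, so row means cannot be passed in directly. One possible workaround, sketched here on a small hypothetical all-numeric frame, is to transpose, fill, and transpose back.

```python
import numpy as np
import pandas as pd

# Small all-numeric frame; transposing mixed text/number data would change dtypes.
X = pd.DataFrame({
    "Num1": [44.0, np.nan, 30.0],
    "Num2": [46.0, 50.0, np.nan],
})

# One mean per row, indexed by the row labels 0, 1, 2.
row_means = X.mean(axis=1)

# After transposing, the row labels become column labels, so fillna can
# match each row mean to the right cells. Transpose back afterwards.
imputed = X.T.fillna(row_means).T
print(imputed)
```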



## Pros and Cons of the Design


Benefits:

Extensibility: The factory pattern allows new strategies to be added without disturbing the imputation strategies already implemented.

Easier management: Moving the responsibility for strategy creation into the factory class promotes a cleaner code structure.

Reusability: The factory can be reused across different projects that require similar strategy instantiation.

Maintenance: Debugging and maintaining the code is more straightforward with a separate factory. A strategy can be implemented in isolation by introducing a new class and adding it as an option in "Strategyfactory".

Negatives:

Complexity: Introducing a factory pattern can add complexity to the design.

Implementation: Depending on the kind of imputation strategy being added, implementing it may take some time.

Missing features: The design cannot read data from a CSV file, and it can only impute column-wise (axis=0).
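
One way to limit the complexity concern listed above is to replace a growing if/elif chain with a dictionary that maps strategy names to functions. This is a sketch of an alternative, not the factory used in this notebook; `STRATEGIES` and `compute_fill_values` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Hypothetical registry: strategy name -> function returning per-column fill values.
STRATEGIES = {
    "mean": lambda X: X.mean(),
    "median": lambda X: X.median(),
    "mode": lambda X: X.mode().iloc[0],
}


def compute_fill_values(strategy, X):
    # Unknown names are rejected with the same kind of error as in the notebook.
    if strategy not in STRATEGIES:
        raise ValueError(f"Unknown imputation strategy: {strategy!r}")
    return STRATEGIES[strategy](X)


X = pd.DataFrame({"Num1": [44.0, 27.0, np.nan, 38.0]})
imputed = X.fillna(compute_fill_values("median", X))
print(imputed)
```

Adding a strategy then means adding one dictionary entry, and the lookup code stays unchanged.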


## Improved version of the design

In [2]:
import pandas as pd
import numpy as np

class Strategyfactory:
    def create_imputer(self, strategy, axis=0):
        if strategy == "mean":
            return MeanImputer(axis=axis)
        elif strategy == "median":
            return MedianImputer(axis=axis)
        elif strategy == "mode":
            return ModeImputer(axis=axis)
        else:
            raise ValueError("The imputation strategy is not valid or not implemented")

class HomebrewImputer:
    def __init__(self, Strategyfactory, axis=0):
        self.Strategyfactory = Strategyfactory
        self.axis = axis
        self.imputer = None

    def fit(self, X, rows=None, strategy=""):
        self.imputer = self.Strategyfactory.create_imputer(strategy, axis=self.axis)
        if rows is not None:
            X = X.loc[rows]
        self.imputer.fit(X)
        # Return self so that fit matches the specified interface.
        return self

    def transform(self, X):
        if self.imputer is not None:
            return self.imputer.transform(X)

    def set_axis(self, axis):
        self.axis = axis

class MeanImputer:
    def __init__(self, axis=0):
        self.axis = axis

    def fit(self, X):
        self.imputation_values = X.mean(axis=self.axis)

    def transform(self, X):
        return X.fillna(self.imputation_values)

class MedianImputer:
    def __init__(self, axis=0):
        self.axis = axis

    def fit(self, X):
        self.imputation_values = X.median(axis=self.axis)

    def transform(self, X):
        return X.fillna(self.imputation_values)

class ModeImputer:
    def __init__(self, axis=0):
        self.axis = axis

    def fit(self, X):
        self.imputation_values = X.mode(axis=self.axis).iloc[0]

    def transform(self, X):
        return X.fillna(self.imputation_values)

X = pd.DataFrame({
    'Country': ['France', 'Spain', 'Germany', 'Spain', 'Germany', 'France', 'Spain', 'France', 'Germany', 'France'],
    'Num1': [44.0, 27.0, 30.0, 38.0, 40.0, 35.0, np.nan, 48.0, 50.0, 37.0],
    'Num2': [72000.0, 48000.0, 54000.0, 61000.0, np.nan, 58000.0, 52000.0, 79000.0, 83000.0, 67000.0]
})

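# Use a separate name for the instance so the Strategyfactory class is not overwritten.
factory = Strategyfactory()
strategy = "median"
# Rows whose values are used to compute the fill values.
imp = [1, 2]
imputer = HomebrewImputer(factory, axis=0)
imputer.fit(X, rows=imp, strategy=strategy)
imputed_data = imputer.transform(X)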
print("Original Data: \n", X)
print("Imputed Data : \n", imputed_data)

Original Data: 
    Country  Num1     Num2
0   France  44.0  72000.0
1    Spain  27.0  48000.0
2  Germany  30.0  54000.0
3    Spain  38.0  61000.0
4  Germany  40.0      NaN
5   France  35.0  58000.0
6    Spain   NaN  52000.0
7   France  48.0  79000.0
8  Germany  50.0  83000.0
9   France  37.0  67000.0
Imputed Data : 
    Country  Num1     Num2
0   France  44.0  72000.0
1    Spain  27.0  48000.0
2  Germany  30.0  54000.0
3    Spain  38.0  61000.0
4  Germany  40.0  51000.0
5   France  35.0  58000.0
6    Spain  28.5  52000.0
7   France  48.0  79000.0
8  Germany  50.0  83000.0
9   France  37.0  67000.0




## The current version cannot change the axis to 1

The imputation strategy fills missing values in the Num1 and Num2 columns using statistics computed from the rows listed in imp. The HomebrewImputer class manages the imputation process, and the imputer classes perform the imputation itself. The output displays the original data and the imputed data to show that the strategy works.
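
The fill values in the output above (28.5 and 51000.0) can be checked by hand: they are the medians of rows 1 and 2 only, not of the whole column. The sketch below reproduces that with plain pandas calls. Unlike the notebook code, it selects the two numeric columns explicitly, which is an assumption made here to keep the text column out of the median.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Num1": [44.0, 27.0, 30.0, 38.0, 40.0, 35.0, np.nan, 48.0, 50.0, 37.0],
    "Num2": [72000.0, 48000.0, 54000.0, 61000.0, np.nan,
             58000.0, 52000.0, 79000.0, 83000.0, 67000.0],
})

# Statistics are learned from rows 1 and 2 only.
fit_rows = [1, 2]
medians = X.loc[fit_rows, ["Num1", "Num2"]].median()
print(medians)  # Num1: median(27, 30), Num2: median(48000, 54000)

# The learned values are then applied to every missing cell in the full frame.
imputed = X.fillna(medians)
print(imputed.loc[[4, 6], ["Num1", "Num2"]])
```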

## How does the information travel when a user inputs information?

The process starts when a user specifies which part of the data to use and which imputation strategy to execute. A Strategyfactory instance is handed to the HomebrewImputer class when it is created. When the "fit" method is called, HomebrewImputer passes the chosen strategy to the factory, which creates the matching imputer instance, and then uses that imputer to calculate the fill values from the selected data. Finally, the "transform" method applies those values to the "NaN" cells and returns the result.