## Homework #1 - Object-Oriented Programming Principles.

In this homework, I put my knowledge of the OOP principles into practice.

## Objective: Create a simplified machine learning pipeline that demonstrates Object-Oriented Programming (OOP) principles

### Data Specification:
* X: A list (or list of lists) representing your features. For simplicity, you can assume
one-dimensional features (e.g., [1, 2, 3]), where each number could represent the
house size for each different example in your data
* y: A list of numeric values representing labels or target values (e.g., [10, 14, 12] for
regression - house price, or [0, 1, 0] for classification - pricing category: 0 - not
expensive, 1 - expensive)

### Classes to implement:

#### Base Class: MLModel
* Define two methods, fit(X, y) and predict(X). Both should raise NotImplementedError by default. This ensures that child classes are responsible for implementing the details.

#### Child Classes:
* DummyClassifier
  * The fit(X, y) method should compute and store the most common target class based on y (store it in self.__mode)
  * The predict(X) method should return a list of the same length as X, where each element would be the computed value self.__mode
* MeanRegressor:
  * In the fit(X, y) method, compute the mean of y and store it (e.g., self.__mean).
  * In the predict(X) method, return this stored mean for every input (e.g., \[self.__mean, self.__mean, ...\]).

#### Dataset Class:
* During object initialization, store the features (X) and targets (y) in private attributes
(e.g., __features and __targets).
* Provide a method get_data() to return (X, y).
* Implement a split(train_fraction=0.8) method that returns (X_train, y_train), (X_test, y_test) by slicing the data.

### Final Demonstration:
* Create variables X and y with any values, load them into your Dataset, and split the data into X_train, y_train, X_test, y_test by calling the split method
* Instantiate each model class, train them on (X_train, y_train) by calling the fit method, then predict on X_test.
* Print out the predictions for each model.
* Highlight Polymorphism, Encapsulation, and Inheritance through your implementation and demonstration

### Base class - MLModel
In the cell below, I specify a Base class (sort of abstract class) that will be "implemented" by its subclasses. It has 2 "abstract" methods:
* fit(X, y) - that will take the list or list of lists parameter X (feature / features) and y - list of numeric values (target).
* predict(X) - that will take the list or list of lists parameter X (feature / features).

Neither method is abstract in the strict sense; I simulate that behavior by having both raise NotImplementedError, which signals that subclasses must implement them.

In [1]:
class MLModel:
    def fit(self, X: list[int | float] | list[list[int | float]], y: list) -> None:
        raise NotImplementedError

    def predict(self, X: list[int | float] | list[list[int | float]]):
        raise NotImplementedError

    # This will be used to demonstrate the use of Inheritance in the Final Demonstration
    def inherited_method(self):
        print("INHERITED METHOD EXAMPLE!")
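
The effect of this design can be sketched with a standalone copy of the base class. The subclass name `IncompleteModel` is hypothetical and exists only for this illustration: it overrides `predict` but forgets `fit`, so the error appears as soon as `fit` is called.

```python
class MLModel:
    def fit(self, X, y):
        raise NotImplementedError

    def predict(self, X):
        raise NotImplementedError


class IncompleteModel(MLModel):  # hypothetical subclass that overrides only predict
    def predict(self, X):
        return [0 for _ in X]


model = IncompleteModel()

try:
    model.fit([1, 2, 3], [0, 1, 0])  # falls back to MLModel.fit
    fit_failed = False
except NotImplementedError:
    fit_failed = True

print(fit_failed)                 # the inherited fit raised NotImplementedError
print(model.predict([1, 2, 3]))   # the overridden predict works normally
```

Note that, unlike `abc.ABC` with `@abstractmethod`, this approach reports the missing method only when it is called, not when the object is created.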

### Subclass #1 - DummyClassifier
In the cell below, I present the DummyClassifier class, which inherits from the MLModel base class (the DummyClassifier(MLModel) syntax is what expresses Inheritance) and
overrides both methods of its parent class.
* fit(X, y) - finds the most common target class in y (the list of targets) using Counter.most_common(), which returns a list of (value, occurrences) tuples; I take the value from
the first tuple.
* predict(X) - returns a list that repeats self.__mode once per element of X, built with a list comprehension.

In [2]:
from overrides import override
from collections import Counter

from Homework_1_OOP.Models.MLModel import MLModel


class DummyClassifier(MLModel):
    def __init__(self):
        self.__mode: int = 0

    @override
    def fit(self, X: list[int | float] | list[list[int | float]], y: list) -> None:
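        # Check for an empty y first; otherwise the check below would mask it.
        if len(y) == 0:
            raise ValueError("y contains no values inside.")
        # Accept any y made up of 0s and 1s, including a single-class y such as [1, 1].
        if not set(y) <= {0, 1}:
            raise ValueError(f"y must contain only 0 and 1 values - set(y) = {set(y)}")
        if len(X) != len(y):
            raise ValueError(f"X and y must be of the same length - len(X) = {len(X)} and len(y) = {len(y)}.")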

        occurrence_count = Counter(y)
        self.__mode = occurrence_count.most_common(1)[0][0] # most_common(1) => only one most common value (still list of tuples),
                                                            # most_common(1)[0] => return the tuple (value, occurrences),
                                                            # most_common(1)[0][0] => return the most common value.
    @override
    def predict(self, X: list[int | float] | list[list[int | float]]) -> list:
        return [self.__mode for _ in range(len(X))]
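
The `Counter.most_common` lookup used in `fit` can be tried in isolation. The following is a small sketch with made-up label lists; it also shows what happens on a tie, where the element encountered first is returned.

```python
from collections import Counter

labels = [1, 1, 0, 0, 1, 0, 1, 1]  # made-up labels: five 1s and three 0s

pairs = Counter(labels).most_common(1)  # list with one (value, occurrences) tuple
value, occurrences = pairs[0]
print(pairs, value, occurrences)

# Tie: 0 and 1 both occur twice; the element seen first (0) comes out on top.
tie_value = Counter([0, 1, 1, 0]).most_common(1)[0][0]
print(tie_value)
```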

### Subclass #2 - MeanRegressor
In the cell below, I present the MeanRegressor class, which also inherits from the MLModel base class and overrides both methods of its parent class.
* fit(X, y) - for this method, it will find the mean value of the y parameter (list of Target Variables) via the use of ...
* predict(X) - returns a list that repeats self.__mean once per input, again built with a list comprehension.

In [3]:
from overrides import override

from Homework_1_OOP.Models.MLModel import MLModel


class MeanRegressor(MLModel):
    def __init__(self):
        self.__mean: float = 0.0

    @override
    def fit(self, X: list[int | float] | list[list[int | float]], y: list) -> None:
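        # Guard against an empty y, which would cause a ZeroDivisionError below.
        if len(y) == 0:
            raise ValueError("y contains no values inside.")
        if len(X) != len(y):
            raise ValueError(f"X and y must be of the same length - len(X) = {len(X)} and len(y) = {len(y)}")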

        self.__mean = sum(y) / len(y)

    @override
    def predict(self, X: list[int | float] | list[list[int | float]]) -> list:
        return [self.__mean for _ in range(len(X))]

### Class - Dataset
In the cell below, I present the Dataset class, which stores the features and targets passed to its constructor and offers the following methods:
* get_data() - returns both the features and the targets stored in the Dataset object,
* split(train_fraction) - splits the data in order, without shuffling: the first train_fraction share of the samples (rounded up) goes to the training set and the remainder to the test set.

In [4]:
import math


class Dataset:
    def __init__(self, features: list | list[list], targets: list):
        if len(features) != len(targets):
            raise ValueError(f"X and y must be of the same length - len(X) = {len(features)} and len(y) = {len(targets)}")
        self.__features = features
        self.__targets = targets

    def get_data(self) -> tuple[list, list]:
        return self.__features, self.__targets

    def split(self, train_fraction: float = 0.8) -> tuple[tuple[list, list], tuple[list, list]]:
        if not (0.0 <= train_fraction <= 1.0):
            raise ValueError(f"Train Fraction should be in range from 0.0 to 1.0 - train_fraction = {train_fraction}")

        slice_index: int = math.ceil(len(self.__features) * train_fraction)

        X_train: list = self.__features[:slice_index]
        y_train: list = self.__targets[:slice_index]
        X_test: list = self.__features[slice_index:]
        y_test: list = self.__targets[slice_index:]

        return (X_train, y_train), (X_test, y_test)
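
The slicing logic of `split` can be sketched as a free function. The name `split_lists` is hypothetical and is not part of the homework code; it only mirrors the `math.ceil` rule used above.

```python
import math


def split_lists(features, targets, train_fraction=0.8):
    """Hypothetical free-function version of Dataset.split (no validation)."""
    cut = math.ceil(len(features) * train_fraction)
    return (features[:cut], targets[:cut]), (features[cut:], targets[cut:])


X = [5, 2, 1, 4, 6, 7, 4, 1, 2, 3]
y = [1, 1, 0, 0, 1, 0, 1, 1, 1, 1]

# 10 samples * 0.8 = 8, so the first 8 samples form the training set.
(X_train, y_train), (X_test, y_test) = split_lists(X, y)
print(X_train, X_test)

# 3 samples * 0.5 = 1.5, rounded up to 2 training samples.
(small_train, _), (small_test, _) = split_lists([10, 20, 30], [0, 1, 0], 0.5)
print(small_train, small_test)
```

Because the slice is taken in order, the split is deterministic: the same data and fraction always give the same training and test sets.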

### Execution
For the Final Demonstration, I will present the step-by-step execution of the code that embeds all of the above functionalities.

#### Step 1
For this Step, I initialize a sample dataset, with arbitrary values:

In [5]:
X_classification: list = [5, 2, 1, 4, 6, 7, 4, 1, 2, 3]
X_classification_multiple: list[list] = [
    [0, 2, 1, 4, 6, 7, 4, 1, 2, 3],
    [1, 2, 1, 4, 6, 7, 4, 1, 2, 3],
    [2, 2, 1, 4, 6, 7, 4, 1, 2, 3],
    [3, 2, 1, 4, 6, 7, 4, 1, 2, 3],
    [4, 2, 1, 4, 6, 7, 4, 1, 2, 3],
    [5, 2, 1, 4, 6, 7, 4, 1, 2, 3],
    [6, 2, 1, 4, 6, 7, 4, 1, 2, 3],
    [7, 2, 1, 4, 6, 7, 4, 1, 2, 3],
    [8, 2, 1, 4, 6, 7, 4, 1, 2, 3],
    [9, 2, 1, 4, 6, 7, 4, 1, 2, 3]
]
y_classification: list = [1, 1, 0, 0, 1, 0, 1, 1, 1, 1]

X_regression: list = [1, 7, 25, 12, 10, 23, 10, 11, 9, 34]
X_regression_multiple: list[list] = [
    [0, 7, 25, 12, 10, 23, 10, 11, 9, 34],
    [1, 7, 25, 12, 10, 23, 10, 11, 9, 34],
    [2, 7, 25, 12, 10, 23, 10, 11, 9, 34],
    [3, 7, 25, 12, 10, 23, 10, 11, 9, 34],
    [4, 7, 25, 12, 10, 23, 10, 11, 9, 34],
    [5, 7, 25, 12, 10, 23, 10, 11, 9, 34],
    [6, 7, 25, 12, 10, 23, 10, 11, 9, 34],
    [7, 7, 25, 12, 10, 23, 10, 11, 9, 34],
    [8, 7, 25, 12, 10, 23, 10, 11, 9, 34],
    [9, 7, 25, 12, 10, 23, 10, 11, 9, 34]
]
y_regression: list = [50, 10, 100, 15, 13, 4, 7, 10, 24, 76] # sum = 209 for training set with 0.8 split. mean = 209 / 8 = 26.125

#### Step 2
Next, I will instantiate a Dataset object and I will use its split method on the dataset described above.

In [6]:
dataset_classification: Dataset = Dataset(features=X_classification, targets=y_classification)
dataset_classification_multiple: Dataset = Dataset(features=X_classification_multiple, targets=y_classification)
dataset_regression: Dataset = Dataset(features=X_regression, targets=y_regression)
dataset_regression_multiple: Dataset = Dataset(features=X_regression_multiple, targets=y_regression)

datasets_list: list[Dataset] = [dataset_classification, dataset_classification_multiple, dataset_regression, dataset_regression_multiple]
titles: list[str] = ["CLASSIFICATION", "CLASSIFICATION MULTIPLE FEATURES", "REGRESSION", "REGRESSION MULTIPLE FEATURES"]
datasets_dict: dict = dict(zip(titles, datasets_list))

for index, (title, dataset) in enumerate(datasets_dict.items()):
    print(f"Case {index + 1}: {title}")
    (X_train, y_train), (X_test, y_test) = dataset.split()
    print(f"X_train = {X_train}")
    print(f"y_train = {y_train}")
    print(f"X_test = {X_test}")
    print(f"y_test = {y_test}")
    datasets_dict[title] = (X_train, y_train), (X_test, y_test)

Case 1: CLASSIFICATION
X_train = [5, 2, 1, 4, 6, 7, 4, 1]
y_train = [1, 1, 0, 0, 1, 0, 1, 1]
X_test = [2, 3]
y_test = [1, 1]
Case 2: CLASSIFICATION MULTIPLE FEATURES
X_train = [[0, 2, 1, 4, 6, 7, 4, 1, 2, 3], [1, 2, 1, 4, 6, 7, 4, 1, 2, 3], [2, 2, 1, 4, 6, 7, 4, 1, 2, 3], [3, 2, 1, 4, 6, 7, 4, 1, 2, 3], [4, 2, 1, 4, 6, 7, 4, 1, 2, 3], [5, 2, 1, 4, 6, 7, 4, 1, 2, 3], [6, 2, 1, 4, 6, 7, 4, 1, 2, 3], [7, 2, 1, 4, 6, 7, 4, 1, 2, 3]]
y_train = [1, 1, 0, 0, 1, 0, 1, 1]
X_test = [[8, 2, 1, 4, 6, 7, 4, 1, 2, 3], [9, 2, 1, 4, 6, 7, 4, 1, 2, 3]]
y_test = [1, 1]
Case 3: REGRESSION
X_train = [1, 7, 25, 12, 10, 23, 10, 11]
y_train = [50, 10, 100, 15, 13, 4, 7, 10]
X_test = [9, 34]
y_test = [24, 76]
Case 4: REGRESSION MULTIPLE FEATURES
X_train = [[0, 7, 25, 12, 10, 23, 10, 11, 9, 34], [1, 7, 25, 12, 10, 23, 10, 11, 9, 34], [2, 7, 25, 12, 10, 23, 10, 11, 9, 34], [3, 7, 25, 12, 10, 23, 10, 11, 9, 34], [4, 7, 25, 12, 10, 23, 10, 11, 9, 34], [5, 7, 25, 12, 10, 23, 10, 11, 9, 34], [6, 7, 25, 12, 10, 23, 

Here, when I create an object of class Dataset, I pass in X and y (features and targets), and the constructor stores them in the private fields __features and __targets. Because of
the double-underscore prefix, Python mangles these names, so code outside the class is not meant to access them directly. To provide a controlled access point, the method
get_data() returns them as a tuple. This is how the Dataset demonstrates Encapsulation.
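
What the double-underscore prefix does in practice can be sketched with a stripped-down stand-in for the Dataset class (validation and `split` are left out here for brevity). Python renames such attributes internally, a mechanism called name mangling, so a direct access from outside the class fails.

```python
class Dataset:
    """Minimal stand-in for the Dataset class above (no validation, no split)."""

    def __init__(self, features, targets):
        self.__features = features  # stored as _Dataset__features
        self.__targets = targets    # stored as _Dataset__targets

    def get_data(self):
        return self.__features, self.__targets


dataset = Dataset([1, 2, 3], [10, 14, 12])

try:
    dataset.__features  # no attribute with this exact name exists
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True

print(direct_access_failed)
print(dataset.get_data())          # the intended access point
print(dataset._Dataset__features)  # the mangled name is still reachable
```

As the last line shows, name mangling is a convention that prevents accidental access rather than a strict access control, so `get_data()` remains the interface that callers are expected to use.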

#### Step 4
In the next step, I use both MLModel implementations to illustrate the algorithms at work and the use of Polymorphism and Inheritance.

As mentioned in the MLModel class description, the base class has a method that is inherited by the concrete model classes (MeanRegressor and DummyClassifier). This method just
prints a string to the console. In addition, I annotate the model variables with the type MLModel. This is valid because the concrete models are subclasses of the MLModel base
class, so their instances can be used wherever an MLModel is expected.


In [7]:
classifier_model: MLModel = DummyClassifier()
regression_model: MLModel = MeanRegressor()

classifier_model.inherited_method()
regression_model.inherited_method()

INHERITED METHOD EXAMPLE!
INHERITED METHOD EXAMPLE!


Next, I demonstrate how Polymorphism, in the form of method overriding, works in this hierarchy. Since both DummyClassifier and MeanRegressor inherit from the base class MLModel,
they override the fit(X, y) and predict(X) methods and implement them in their own ways.
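
The core idea can be shown in a minimal standalone sketch: one helper function calls `fit` and `predict` on any MLModel, and the concrete class decides what happens. The classes `MinModel` and `MaxModel` and the helper `train_and_predict` are hypothetical and serve only this illustration.

```python
class MLModel:
    def fit(self, X, y):
        raise NotImplementedError

    def predict(self, X):
        raise NotImplementedError


class MinModel(MLModel):  # hypothetical: always predicts the smallest target seen
    def fit(self, X, y):
        self._value = min(y)

    def predict(self, X):
        return [self._value for _ in X]


class MaxModel(MLModel):  # hypothetical: always predicts the largest target seen
    def fit(self, X, y):
        self._value = max(y)

    def predict(self, X):
        return [self._value for _ in X]


def train_and_predict(model: MLModel, X_train, y_train, X_test):
    # The same two calls work for every subclass of MLModel.
    model.fit(X_train, y_train)
    return model.predict(X_test)


results = [train_and_predict(m, [1, 2, 3], [3, 1, 2], [7, 8]) for m in (MinModel(), MaxModel())]
print(results)
```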

In [8]:
for index, (title, datasets) in enumerate(datasets_dict.items()):
    print(f"Case {index + 1}: {title}")
    print("Dataset:")
    X_train = datasets[0][0]
    y_train = datasets[0][1]
    X_test = datasets[1][0]
    y_test = datasets[1][1]
    print(f"X_train = {X_train}")
    print(f"y_train = {y_train}")
    print(f"X_test = {X_test}")
    print(f"y_test = {y_test}")
    if index <= 1:
        classifier_model.fit(X=X_train, y=y_train)
        print(f"Predicted classes - {classifier_model.predict(X=X_test)}")
    else:
        regression_model.fit(X=X_train, y=y_train)
        print(f"Predicted values - {regression_model.predict(X=X_test)}")  # predict on the test features, not on y_test

Case 1: CLASSIFICATION
Dataset:
X_train = [5, 2, 1, 4, 6, 7, 4, 1]
y_train = [1, 1, 0, 0, 1, 0, 1, 1]
X_test = [2, 3]
y_test = [1, 1]
Predicted classes - [1, 1]
Case 2: CLASSIFICATION MULTIPLE FEATURES
Dataset:
X_train = [[0, 2, 1, 4, 6, 7, 4, 1, 2, 3], [1, 2, 1, 4, 6, 7, 4, 1, 2, 3], [2, 2, 1, 4, 6, 7, 4, 1, 2, 3], [3, 2, 1, 4, 6, 7, 4, 1, 2, 3], [4, 2, 1, 4, 6, 7, 4, 1, 2, 3], [5, 2, 1, 4, 6, 7, 4, 1, 2, 3], [6, 2, 1, 4, 6, 7, 4, 1, 2, 3], [7, 2, 1, 4, 6, 7, 4, 1, 2, 3]]
y_train = [1, 1, 0, 0, 1, 0, 1, 1]
X_test = [[8, 2, 1, 4, 6, 7, 4, 1, 2, 3], [9, 2, 1, 4, 6, 7, 4, 1, 2, 3]]
y_test = [1, 1]
Predicted classes - [1, 1]
Case 3: REGRESSION
Dataset:
X_train = [1, 7, 25, 12, 10, 23, 10, 11]
y_train = [50, 10, 100, 15, 13, 4, 7, 10]
X_test = [9, 34]
y_test = [24, 76]
Predicted values - [26.125, 26.125]
Case 4: REGRESSION MULTIPLE FEATURES
Dataset:
X_train = [[0, 7, 25, 12, 10, 23, 10, 11, 9, 34], [1, 7, 25, 12, 10, 23, 10, 11, 9, 34], [2, 7, 25, 12, 10, 23, 10, 11, 9, 34], [3, 7, 25, 12,

As can be seen, the results are consistent. In the next cells, I take one specific example each for regression and classification and walk through it in detail.

In [9]:
print("REGRESSION MODEL")
datasets: tuple = datasets_dict.get("REGRESSION")  # ((X_train, y_train), (X_test, y_test))
X_train = datasets[0][0]
y_train = datasets[0][1]
X_test = datasets[1][0]
y_test = datasets[1][1]
print(f"X_train = {X_train}")
print(f"y_train = {y_train}")
print(f"X_test = {X_test}")
print(f"y_test = {y_test}")

REGRESSION MODEL
X_train = [1, 7, 25, 12, 10, 23, 10, 11]
y_train = [50, 10, 100, 15, 13, 4, 7, 10]
X_test = [9, 34]
y_test = [24, 76]


In [10]:
regression_model: MLModel = MeanRegressor()
regression_model.fit(X_train, y_train)
print(f"X_test: {X_test}")
print(f"Predicted Values: {regression_model.predict(X_test)}")

X_test: [9, 34]
Predicted Values: [26.125, 26.125]


In the above cell, the predicted value is 26.125. To check it, I repeat the computation by hand; as shown below, the result is correct.
$$
\begin{split}
{\vec{y}}_{train} = \begin{pmatrix}
50 \cr
10 \cr
100 \cr
15 \cr
13 \cr
4 \cr
7 \cr
10 \cr
\end{pmatrix}
\\[10pt]
{sum}_{y_{train}} = \displaystyle\sum_{i=1}^{len({\vec{y}}_{train})} {y_{train}^{(i)}} = 209
\\[10pt]
{mean}_{y_{train}} = \frac{{sum}_{y_{train}}}{len(y_{train})} = \frac{209}{8} = 26.125
\\[10pt]
\hat{y} = \begin{pmatrix}
26.125 \cr
26.125 \cr
\end{pmatrix}
\end{split}
$$
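
The same hand calculation can be cross-checked in plain Python, independently of the MeanRegressor class. This is only a sketch that repeats the arithmetic above on the training targets of the regression example.

```python
y_train = [50, 10, 100, 15, 13, 4, 7, 10]
X_test = [9, 34]

total = sum(y_train)            # sum of the training targets
mean = total / len(y_train)     # mean over the 8 training targets
y_hat = [mean for _ in X_test]  # one prediction per test sample

print(total, mean, y_hat)
```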

In [11]:
print("CLASSIFICATION MODEL")
datasets: tuple = datasets_dict.get("CLASSIFICATION")  # ((X_train, y_train), (X_test, y_test))
X_train = datasets[0][0]
y_train = datasets[0][1]
X_test = datasets[1][0]
y_test = datasets[1][1]
print(f"X_train = {X_train}")
print(f"y_train = {y_train}")
print(f"X_test = {X_test}")
print(f"y_test = {y_test}")

CLASSIFICATION MODEL
X_train = [5, 2, 1, 4, 6, 7, 4, 1]
y_train = [1, 1, 0, 0, 1, 0, 1, 1]
X_test = [2, 3]
y_test = [1, 1]


In [12]:
classifier_model: MLModel = DummyClassifier()
classifier_model.fit(X_train, y_train)
print(f"X_test: {X_test}")
print(f"Predicted Classes: {classifier_model.predict(X_test)}")

X_test: [2, 3]
Predicted Classes: [1, 1]


In the above cell, the predicted class is 1. To check it, I repeat the computation by hand, as in the previous example; as shown below, the result is correct.
$$
\begin{split}
{\vec{y}}_{train} = \begin{pmatrix}
1 \cr
1 \cr
0 \cr
0 \cr
1 \cr
0 \cr
1 \cr
1 \cr
\end{pmatrix}
\\[10pt]
max\_y = \arg\max_{y \in {\vec{y}}_{train}} \left( \text{count}(y) \right) = 1 \text{ (nr. of occurrences = 5)}
\\[10pt]
\hat{y} = \begin{pmatrix}
1 \cr
1 \cr
\end{pmatrix}
\end{split}
$$
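
The occurrence counts can also be cross-checked without the DummyClassifier class. The sketch below counts each label directly and uses `statistics.mode` from the standard library as an independent check.

```python
from statistics import mode

y_train = [1, 1, 0, 0, 1, 0, 1, 1]
X_test = [2, 3]

# Count how often each distinct label occurs in the training targets.
counts = {label: y_train.count(label) for label in sorted(set(y_train))}
most_common_label = mode(y_train)
y_hat = [most_common_label for _ in X_test]

print(counts, most_common_label, y_hat)
```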

In the above code, both the classification and the regression model are used through the same MLModel interface, while each call to fit and predict is dispatched to the implementation of the concrete class, DummyClassifier or MeanRegressor.