## Homework #1 - Object-Oriented Programming Principles

In this homework, I put my knowledge of the OOP principles into practice.

## Objective: Create a simplified machine learning pipeline that demonstrates Object-Oriented Programming (OOP) principles

### Data Specification:
* X: A list (or list of lists) representing your features. For simplicity, you can assume
one-dimensional features (e.g., [1, 2, 3]), where each number could represent the
house size for each different example in your data
* y: A list of numeric values representing labels or target values (e.g., [10, 14, 12] for
regression - house price, or [0, 1, 0] for classification - pricing category: 0 - not
expensive, 1 - expensive)

### Classes to implement:

#### Base Class: MLModel
* Define two methods, fit(X, y) and predict(X). Both should raise NotImplementedError by default. This ensures that child classes are responsible for implementing the details.

#### Child Classes:
* DummyClassifier
  * The fit(X, y) method should compute and store the most common target class based on y (store it in self.__mode)
  * The predict(X) method should return a list of the same length as X, where each element would be the computed value self.__mode
* MeanRegressor:
  * In the fit(X, y) method, compute the mean of y and store it (e.g., self.__mean).
  * In the predict(X) method, return this stored mean for every input (e.g., \[self.__mean, self.__mean, ...\]).

#### Dataset Class:
* During object initialization, store the features (X) and targets (y) in private attributes
(e.g., __features and __targets).
* Provide a method get_data() to return (X, y).
* Implement a split(train_fraction=0.8) method that returns (X_train, y_train), (X_test, y_test) by slicing the data.

### Final Demonstration:
* Create variables X and y with any values, load them into your Dataset, and split the data into X_train, y_train, X_test, y_test by calling the split method.
* Instantiate each model class, train each one on (X_train, y_train) by calling the fit method, then predict on X_test.
* Print out the predictions for each model.
* Highlight Polymorphism, Encapsulation, and Inheritance through your implementation and demonstration

### Base class - MLModel
In the cell below, I define a base class (a kind of abstract class) whose methods are "implemented" by its subclasses. It has 2 "abstract" methods:
* fit(X, y) - takes X, a list or list of lists (the feature or features), and y, a list of numeric values (the targets).
* predict(X) - takes X, a list or list of lists (the feature or features).

Neither method is abstract in the strict sense. The abstract behavior is simulated by having each method raise NotImplementedError, so a subclass that does not override a method fails as soon as that method is called.

In [17]:
class MLModel:
    def fit(self, X: list[int | float] | list[list[int | float]], y: list) -> None:
        raise NotImplementedError

    def predict(self, X: list[int | float] | list[list[int | float]]):
        raise NotImplementedError
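
For comparison, here is a minimal sketch of how the same contract could be expressed with Python's `abc` module. It is not part of the homework solution; the names `AbstractMLModel` and `ConstantModel` are hypothetical and exist only for this illustration. The difference it shows is that an ABC refuses to be instantiated at all, while MLModel fails only when fit() or predict() is called.

```python
from abc import ABC, abstractmethod


class AbstractMLModel(ABC):
    """Hypothetical ABC-based counterpart of MLModel (illustration only)."""

    @abstractmethod
    def fit(self, X: list, y: list) -> None:
        """Learn from features X and targets y."""

    @abstractmethod
    def predict(self, X: list) -> list:
        """Return one prediction per element of X."""


class ConstantModel(AbstractMLModel):
    """Hypothetical subclass that always predicts a fixed value."""

    def __init__(self, value: float = 0.0):
        self._value = value

    def fit(self, X: list, y: list) -> None:
        pass  # a constant model has nothing to learn

    def predict(self, X: list) -> list:
        return [self._value for _ in X]


# Instantiating the abstract class raises TypeError immediately.
try:
    AbstractMLModel()
    abstract_instantiation_failed = False
except TypeError:
    abstract_instantiation_failed = True

# A subclass that overrides both methods can be instantiated and used.
predictions = ConstantModel(1.5).predict([1, 2, 3])
```

The NotImplementedError approach used in this homework is simpler and matches the assignment text; the ABC variant only moves the failure earlier.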

### Subclass #1 - DummyClassifier
In the cell below, I present the DummyClassifier class, which "implements" the MLModel "abstract" class. Declaring it as DummyClassifier(MLModel) is what expresses Inheritance: DummyClassifier becomes a subclass of the MLModel base class and must override (implement) the methods of its parent.
* fit(X, y) - finds the most common target class in y (the list of target values) using the Counter.most_common() method, which returns a list of (value, occurrences) tuples; I take the value from the first tuple in that list.
* predict(X) - returns a list that repeats self.__mode once for each element of X, built with a list comprehension.

In [18]:
from overrides import override
from collections import Counter

from Homework_1_OOP.Models.MLModel import MLModel


class DummyClassifier(MLModel):
    def __init__(self):
        self.__mode: int = 0

    @override
    def fit(self, X: list[int | float] | list[list[int | float]], y: list) -> None:
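        # Check for emptiness first, otherwise the label check below would mask it.
        if len(y) == 0:
            raise ValueError("y contains no values inside.")
        # Subset test: a y holding a single class (e.g. all zeros) is still valid.
        if not set(y) <= {0, 1}:
            raise ValueError(f"y must contain only 0 and 1 values - set(y) = {set(y)}")
        if len(X) != len(y):
            raise ValueError(f"X and y must be of the same length - len(X) = {len(X)} and len(y) = {len(y)}.")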

        occurrence_count = Counter(y)
        # most_common(1)       => only one most common value (still a list of tuples),
        # most_common(1)[0]    => the tuple (value, occurrences),
        # most_common(1)[0][0] => the most common value.
        self.__mode = occurrence_count.most_common(1)[0][0]

    @override
    def predict(self, X: list[int | float] | list[list[int | float]]) -> list:
        return [self.__mode for _ in range(len(X))]
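
The indexing on the result of Counter.most_common() can be traced step by step. The sketch below uses made-up labels (not the homework data) and plain variables instead of the class, just to show what each intermediate value looks like.

```python
from collections import Counter

y = [0, 1, 1, 0, 1]  # illustrative labels, assumed for this example

counts = Counter(y)           # counts occurrences of each label
top = counts.most_common(1)   # list with one (value, occurrences) tuple
value, occurrences = top[0]   # unpack that tuple
mode = value                  # the most common label

# predict() repeats the stored mode once per input example.
X_test = [7, 8, 9, 10]        # illustrative features, assumed
predictions = [mode for _ in range(len(X_test))]
```

Here `top` is `[(1, 3)]`, so `mode` is 1 and `predictions` is `[1, 1, 1, 1]`. When two labels occur equally often, most_common() keeps them in the order they were first encountered, so the label that appears first in y wins.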

### Subclass #2 - MeanRegressor
In the cell below, I present the MeanRegressor class, which "implements" the MLModel "abstract" class, making MeanRegressor a subclass of the MLModel base class. As before, it must override (implement) the methods of its parent class.
* fit(X, y) - finds the mean value of y (the list of target values) via the use of ...
* predict(X) - returns a list that repeats self.__mean once for each input, again built with a list comprehension.

In [None]:
from overrides import override

from Homework_1_OOP.Models.MLModel import MLModel


class MeanRegressor(MLModel):
    def __init__(self):
        self.__mean: float = 0.0

    @override
    def fit(self, X: list[int | float] | list[list[int | float]], y: list) -> None:
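        if len(y) == 0:
            raise ValueError("y contains no values inside.")
        if len(X) != len(y):
            raise ValueError(f"X and y must be of the same length - len(X) = {len(X)} and len(y) = {len(y)}")

        # The mean is computed over the targets y, not over the features X.
        self.__mean = sum(y) / len(y)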

    @override
    def predict(self, X: list[int | float] | list[list[int | float]]) -> list:
        return [self.__mean for _ in range(len(X))]
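
As a small worked example of the mean-of-targets idea, the sketch below computes the mean of the sample targets from the data specification ([10, 14, 12]) in two ways: by hand with sum() and len(), and with statistics.fmean from the standard library as a cross-check. The test features are assumed values chosen for illustration.

```python
from statistics import fmean

y_train = [10, 14, 12]  # house prices from the data specification example

mean_manual = sum(y_train) / len(y_train)  # (10 + 14 + 12) / 3
mean_stdlib = fmean(y_train)               # same result via the standard library

# Every input receives the same prediction, regardless of its feature value.
X_test = [4, 5]  # illustrative features, assumed
predictions = [mean_manual for _ in range(len(X_test))]
```

Both computations give 12.0, so `predictions` is `[12.0, 12.0]`. Note that the features play no role in the computed value, which is why a mean regressor is a useful baseline rather than a real model.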

### Class - Dataset
In the cell below, I present the Dataset class, which stores the features and the target values passed to its constructor in private attributes and offers the following methods:
* get_data() - returns both the features and the targets that were assigned to the Dataset object,
* split(train_fraction) - splits the dataset by slicing: train_fraction is a float giving the share of the data that goes into the Training set, and the remainder goes into the Test set.

In [None]:
import math


class Dataset:
    def __init__(self, features: list | list[list], targets: list):
        if len(features) != len(targets):
            raise ValueError(f"X and y must be of the same length - len(X) = {len(features)} and len(y) = {len(targets)}")
        self.__features = features
        self.__targets = targets

    def get_data(self) -> tuple[list, list]:
        return self.__features, self.__targets

    def split(self, train_fraction: float = 0.8) -> tuple[tuple[list, list], tuple[list, list]]:
        if not (0.0 <= train_fraction <= 1.0):
            raise ValueError(f"Train Fraction should be in range from 0.0 to 1.0 - train_fraction = {train_fraction}")

        slice_index: int = math.ceil(len(self.__features) * train_fraction)
        print(f"Slice index: {slice_index}")

        X_train: list = self.__features[:slice_index]
        y_train: list = self.__targets[:slice_index]
        X_test: list = self.__features[slice_index:]
        y_test: list = self.__targets[slice_index:]

        return (X_train, y_train), (X_test, y_test)
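
To make the slicing arithmetic and the Encapsulation aspect concrete, here is a minimal sketch with a stripped-down stand-in for Dataset. The class name `TinyDataset` and the sample values are hypothetical, and validation is omitted to keep the example short.

```python
import math


class TinyDataset:
    """Hypothetical, simplified stand-in for Dataset (no validation)."""

    def __init__(self, features: list, targets: list):
        self.__features = features  # stored as _TinyDataset__features
        self.__targets = targets    # stored as _TinyDataset__targets

    def split(self, train_fraction: float = 0.8):
        # Same rule as Dataset.split: round the cut point up.
        cut = math.ceil(len(self.__features) * train_fraction)
        return (
            (self.__features[:cut], self.__targets[:cut]),
            (self.__features[cut:], self.__targets[cut:]),
        )


data = TinyDataset([1, 2, 3, 4, 5], [10, 14, 12, 16, 18])
(X_train, y_train), (X_test, y_test) = data.split(0.8)
# ceil(5 * 0.8) = 4, so four examples go to training and one to testing.

# Name mangling: the double-underscore attribute is not reachable
# under its plain name from outside the class.
has_plain_name = hasattr(data, "__features")
has_mangled_name = hasattr(data, "_TinyDataset__features")
```

With these values, `X_train` is `[1, 2, 3, 4]` and `X_test` is `[5]`. The attribute check shows that Python's "private" attributes are protected by renaming rather than by true access control. Because the split slices without shuffling, data that is sorted by target would produce unrepresentative Training and Test sets.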