
Let’s dive into a mock interview question that ties together data quality monitoring and OOP fundamentals.

---

**Interview Question:**

"Let's design a basic **Data Quality Monitoring System** in Python. This system should allow us to track data quality metrics for different data sources, such as completeness and uniqueness. We'll structure it to follow OOP principles, making it modular and extensible for future metrics."

### Requirements

1. **Abstract Base Class**: Create an abstract base class `DataQualityMetric`, which defines the blueprint for all metrics.
   - This class should have an `evaluate` method that will be implemented in each concrete metric.

2. **Concrete Metrics**: Define at least two concrete classes that inherit from `DataQualityMetric`:
   - **CompletenessMetric**: Checks for missing values in a data source.
   - **UniquenessMetric**: Checks for duplicate entries in a data source.
   
3. **DataSource Class**: A `DataSource` class represents the data to be monitored. It has a method to load data and another method to run metrics on it.

4. **MetricEvaluator Class**: Finally, a class `MetricEvaluator` that takes in a `DataSource` and a list of `DataQualityMetric` objects, and evaluates each metric on the data source.

**Assume we’re working with a pandas DataFrame as the data source.**

---

### Example Usage

Here's a quick example of what this would look like when put together:

```python
import pandas as pd

# Define some data with missing values and duplicates for testing
data = pd.DataFrame({
    "id": [1, 2, 3, 4, None, 5, 2],
    "name": ["Alice", "Bob", None, "David", "Eve", "Frank", "Alice"]
})
```

Then, the code should work like this:

```python
source = DataSource(data)
evaluator = MetricEvaluator(source, [CompletenessMetric(), UniquenessMetric()])
results = evaluator.run_metrics()
print(results)
```
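
Before designing the classes, it can help to look at the raw pandas checks the two metrics are likely to wrap. The sketch below applies `isnull()` and `duplicated()` directly to the sample data; the variable names `missing_per_column` and `duplicate_rows` are illustrative only and are not part of the required design.

```python
import pandas as pd

# Same sample data as in the example above
data = pd.DataFrame({
    "id": [1, 2, 3, 4, None, 5, 2],
    "name": ["Alice", "Bob", None, "David", "Eve", "Frank", "Alice"]
})

# Number of missing values in each column
missing_per_column = data.isnull().sum()
print(missing_per_column)

# Number of rows that exactly repeat an earlier row (all columns must match)
duplicate_rows = int(data.duplicated().sum())
print(duplicate_rows)  # 0
```

Each column has one missing value, but no row is an exact copy of another: `id` 2 appears twice, yet with different names. Whether "uniqueness" means whole rows or a single key column is a design decision worth raising with the interviewer.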

### Let’s Begin the Design

1. **Step 1**: Start by defining the `DataQualityMetric` base class with the `evaluate` method.
  
2. **Step 2**: Implement `CompletenessMetric` and `UniquenessMetric` classes that inherit from `DataQualityMetric`.

---

Let’s start with **Step 1**: Go ahead and define the `DataQualityMetric` class as an abstract class, including the `evaluate` method.


```python
from abc import ABC, abstractmethod


class DataQualityMetric(ABC):
    """Blueprint that every data quality metric must follow."""

    @abstractmethod
    def evaluate(self, df):
        """Return True if the metric finds a problem in the DataFrame."""
        raise NotImplementedError("Subclasses must implement this method.")


class CompletenessMetric(DataQualityMetric):
    """Checks for missing values in a data source."""

    def evaluate(self, df):
        # True means at least one missing value was found
        return bool(df.isnull().any().any())


class UniquenessMetric(DataQualityMetric):
    """Checks for duplicate entries in a data source."""

    def evaluate(self, df):
        # True means at least one row is an exact copy of an earlier row
        return bool(df.duplicated().any())


class DataSource:
    """Wraps the DataFrame to be monitored."""

    def __init__(self, data):
        self.__data = data

    def load_data(self):
        return self.__data


class MetricEvaluator:
    def __init__(self, source: DataSource, metric_list: list[DataQualityMetric]):
        self.source = source
        self.metric_list = metric_list

    def run_metrics(self):
        df = self.source.load_data()
        if df.empty:
            raise ValueError("No data")
        # Key each result by the metric's class name
        results = {}
        for metric in self.metric_list:
            results[type(metric).__name__] = metric.evaluate(df)
        return results
```
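
One way to check that the abstract base class really enforces the blueprint is to try instantiating it. The sketch below uses a stripped-down copy of `DataQualityMetric`; `IncompleteMetric` and `can_instantiate` are hypothetical names introduced only for this illustration.

```python
from abc import ABC, abstractmethod


class DataQualityMetric(ABC):
    @abstractmethod
    def evaluate(self, df):
        ...


class IncompleteMetric(DataQualityMetric):
    """Hypothetical subclass that forgets to implement evaluate."""


def can_instantiate(cls):
    # ABCs with unimplemented abstract methods raise TypeError on instantiation
    try:
        cls()
        return True
    except TypeError:
        return False


print(can_instantiate(DataQualityMetric))  # False
print(can_instantiate(IncompleteMetric))   # False
```

The `TypeError` is raised at construction time, before `evaluate` is ever called, so a `NotImplementedError` inside the abstract method body is only reached if a subclass explicitly calls `super().evaluate(...)`.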


```python
import pandas as pd

# Define some data with missing values and duplicates for testing
data = pd.DataFrame({
    "id": [1, 2, 3, 4, None, 5, 2],
    "name": ["Alice", "Bob", None, "David", "Eve", "Frank", "Alice"]
})
```


```python
source = DataSource(data)
evaluator = MetricEvaluator(source, [CompletenessMetric(), UniquenessMetric()])
results = evaluator.run_metrics()
print(results)
```

Output:

```
{'CompletenessMetric': True, 'UniquenessMetric': False}
```
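
`UniquenessMetric` reports `False` here even though `id` 2 and the name "Alice" each appear twice, because the row-level comparison only counts rows that match on every column. If the intent is to catch repeated keys, one option is a column-scoped metric. The following is a minimal sketch, assuming a hypothetical `ColumnUniquenessMetric` class with a `column` constructor argument, neither of which is part of the original design.

```python
from abc import ABC, abstractmethod

import pandas as pd


class DataQualityMetric(ABC):
    @abstractmethod
    def evaluate(self, df):
        ...


class ColumnUniquenessMetric(DataQualityMetric):
    """Hypothetical metric: flags duplicates within a single column."""

    def __init__(self, column):
        self.column = column

    def evaluate(self, df):
        # Drop missing values so they are left to a completeness check
        values = df[self.column].dropna()
        # True means at least one value appears more than once
        return bool(values.duplicated().any())


data = pd.DataFrame({
    "id": [1, 2, 3, 4, None, 5, 2],
    "name": ["Alice", "Bob", None, "David", "Eve", "Frank", "Alice"]
})

id_has_duplicates = ColumnUniquenessMetric("id").evaluate(data)
name_has_duplicates = ColumnUniquenessMetric("name").evaluate(data)
print(id_has_duplicates)    # True
print(name_has_duplicates)  # True
```

Because `MetricEvaluator` keys its results by class name, two instances of the same metric class would overwrite each other in the results dictionary, so a design along these lines would also need a per-instance name.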
