## **Encapsulation**

Encapsulation is the OOP concept of **bundling data and methods** that operate on that data within a single unit (i.e., a class), while **restricting direct access** to some of the object’s components.

>  **Goal**: Protect internal data and provide controlled access via class-defined methods.


## Why Encapsulation Matters
- Protects an object’s integrity by preventing unintended or harmful modifications.
- Encourages clean interfaces for interaction.
- Promotes modularity and code reuse, and makes debugging easier.

## Key Concepts in Encapsulation

| Concept              | Description                                                                 |
|----------------------|-----------------------------------------------------------------------------|
| **Private Attributes** | Internal data hidden from direct access (e.g., `__df`)                        |
| **Getter/Setter Methods** | Functions to retrieve or modify private data safely                          |
| **Access Control**     | Deciding who/what can see or edit which data                                 |
| **Interface Exposure** | Only expose necessary behavior via public methods (e.g., `.filter_by_role`) |
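
As a minimal sketch of the getter/setter idea in the table, the hypothetical `TeamMember` class below (not part of this notebook's dataset code) keeps `experience_years` in a double-underscore attribute and exposes it through a Python `property`, so every write passes a validation check first. The class name and the validation rule are assumptions made for illustration.

```python
class TeamMember:
    """Hypothetical example class, used only to illustrate getters and setters."""

    def __init__(self, name, experience_years):
        self.name = name
        self.experience_years = experience_years  # routed through the setter below

    @property
    def experience_years(self):
        # Getter: controlled read access to the private value
        return self.__experience_years

    @experience_years.setter
    def experience_years(self, value):
        # Setter: reject values that would corrupt the object's state
        if value < 0:
            raise ValueError("experience_years cannot be negative")
        self.__experience_years = value


member = TeamMember("Alice", 3)
member.experience_years = 4      # accepted by the setter
print(member.experience_years)   # 4

try:
    member.experience_years = -1  # rejected by the setter
except ValueError as err:
    print(err)
```

Callers still write `member.experience_years = ...` as if it were a plain attribute, but the class decides which values are acceptable.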


In [1]:
import pandas as pd
import numpy as np

# Create a dataset for data science team members
data = {
    "Name": [
        "Alice", "Bob", "Clara", "Dan", "Ella", "Frank", "Grace", "Hassan", "Ivy", "Jack",
        "Kate", "Leo", "Maya", "Nina", "Omar", "Paul", "Quinn", "Rita", "Steve", None
    ],
    "Role": [
        "Data Scientist", "Data Engineer", "ML Engineer", "Data Analyst", "Data Scientist",
        "ML Engineer", "Data Engineer", "Data Scientist", "Data Analyst", "ML Engineer",
        "Data Engineer", "Data Scientist", "Data Analyst", "Data Engineer", "ML Engineer",
        "Data Scientist", "Data Analyst", "ML Engineer", "Data Engineer", "Data Scientist"
    ],
    "Experience_Years": [
        3, 5, 2, 4, 6, 1, 7, 5, 3, 2,
        4, 6, 2, 5, 3, 7, 4, 1, 6, None
    ],
    "Specialization": [
        "NLP", "ETL", "Deep Learning", "Visualization", "Computer Vision", "Reinforcement Learning", 
        "Data Warehousing", "Machine Learning", "BI Tools", "MLOps",
        "Big Data", "Forecasting", "Dashboards", "Cloud Pipelines", "AutoML", "Recommendation", 
        "SQL", "TensorFlow", "Spark", "NLP"
    ],
    "Certifications": [
        2, 1, 3, 1, 4, 0, 2, 3, 1, 2,
        2, 4, 1, 2, 3, 3, 1, 0, 2, 2
    ],
    "Remote": [
        True, False, True, False, True, True, False, True, False, True,
        False, True, False, True, True, False, False, True, False, True
    ]
}

In [2]:
df = pd.DataFrame(data)
df.head(20)

Unnamed: 0,Name,Role,Experience_Years,Specialization,Certifications,Remote
0,Alice,Data Scientist,3.0,NLP,2,True
1,Bob,Data Engineer,5.0,ETL,1,False
2,Clara,ML Engineer,2.0,Deep Learning,3,True
3,Dan,Data Analyst,4.0,Visualization,1,False
4,Ella,Data Scientist,6.0,Computer Vision,4,True
5,Frank,ML Engineer,1.0,Reinforcement Learning,0,True
6,Grace,Data Engineer,7.0,Data Warehousing,2,False
7,Hassan,Data Scientist,5.0,Machine Learning,3,True
8,Ivy,Data Analyst,3.0,BI Tools,1,False
9,Jack,ML Engineer,2.0,MLOps,2,True


### To understand and appreciate encapsulation, let's compare a class without encapsulation and a class with it

### **1. Class Without Encapsulation (No Access Control)**

In [3]:
class CandidateDataNoEncapsulation:
    def __init__(self, df):
        self.df = df  # Public attribute — accessible and modifiable from outside

    def clean_data(self):
        self.df.dropna(inplace=True)

    def filter_by_role(self, role):
        return self.df[self.df["Role"] == role]

In [5]:
candidates = CandidateDataNoEncapsulation(df)
candidates.df

Unnamed: 0,Name,Role,Experience_Years,Specialization,Certifications,Remote
0,Alice,Data Scientist,3.0,NLP,2,True
1,Bob,Data Engineer,5.0,ETL,1,False
2,Clara,ML Engineer,2.0,Deep Learning,3,True
3,Dan,Data Analyst,4.0,Visualization,1,False
4,Ella,Data Scientist,6.0,Computer Vision,4,True
5,Frank,ML Engineer,1.0,Reinforcement Learning,0,True
6,Grace,Data Engineer,7.0,Data Warehousing,2,False
7,Hassan,Data Scientist,5.0,Machine Learning,3,True
8,Ivy,Data Analyst,3.0,BI Tools,1,False
9,Jack,ML Engineer,2.0,MLOps,2,True


#### **Let's tamper with the dataset and see what happens when we replace the DataFrame with something else.**

In [6]:
candidates.df = "Oops! The dataset is now a string"

In [7]:
candidates.df

'Oops! The dataset is now a string'
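
The overwrite does more than change what the attribute displays: the class's own methods stop working. The sketch below reproduces this with a trimmed copy of `CandidateDataNoEncapsulation` and a small stand-in DataFrame (`small_df` and `candidates_demo` are names assumed for this example, so the notebook's own variables are left alone).

```python
import pandas as pd


class CandidateDataNoEncapsulation:
    def __init__(self, df):
        self.df = df  # public attribute

    def filter_by_role(self, role):
        return self.df[self.df["Role"] == role]


# Small stand-in dataset (assumed for this sketch)
small_df = pd.DataFrame({
    "Name": ["Alice", "Clara"],
    "Role": ["Data Scientist", "ML Engineer"],
})

candidates_demo = CandidateDataNoEncapsulation(small_df)
print(len(candidates_demo.filter_by_role("ML Engineer")))  # 1 matching row

# Overwrite the public attribute, as in the cell above
candidates_demo.df = "Oops! The dataset is now a string"

try:
    candidates_demo.filter_by_role("ML Engineer")
    failed = False
except TypeError as err:
    # A str cannot be indexed with "Role", so the method breaks
    failed = True
    print(f"filter_by_role failed: {err}")
```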

#### The DataFrame has been replaced with a string, and nothing stopped us. This is where encapsulation comes in.

### **2. Class With Basic Encapsulation**

In [10]:
class CandidateDataEncapsulation:
    def __init__(self, df):
        self.__df = df  # Private attribute

    def clean_data(self):
        self.__df.dropna(inplace=True)

    def filter_by_role(self, role):
        return self.__df[self.__df["Role"] == role]

    def get_all(self):
        return self.__df


### Create the object from the DataFrame again and try to access `__df` directly

In [14]:
candidates = CandidateDataEncapsulation(df)
candidates.__df 

AttributeError: 'CandidateDataEncapsulation' object has no attribute '__df'

This fails because `__df` is a **private attribute**. A double-underscore prefix triggers **name mangling**: inside the class body, Python rewrites `__df` to `_CandidateDataEncapsulation__df`, so no attribute called `__df` exists on the object when you look it up from outside. Note that this is renaming rather than strict access control. The mangled name is still reachable, but using it signals that you are deliberately bypassing the class's interface.
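
To make the renaming concrete, here is a small sketch using a trimmed copy of `CandidateDataEncapsulation`. A plain list stands in for the DataFrame (an assumption made only to keep the example dependency-free), and `demo` is a name introduced for this example.

```python
class CandidateDataEncapsulation:
    def __init__(self, df):
        self.__df = df  # actually stored as _CandidateDataEncapsulation__df

    def get_all(self):
        return self.__df


# A plain list stands in for the DataFrame in this sketch
demo = CandidateDataEncapsulation(["Alice", "Bob"])

print(hasattr(demo, "__df"))                             # False
print(hasattr(demo, "_CandidateDataEncapsulation__df"))  # True

# The mangled name reaches the same object (possible, but discouraged)
print(demo._CandidateDataEncapsulation__df is demo.get_all())  # True
```

In practice, treat the mangled name as off limits and go through the public methods instead.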


##### **To view the DataFrame held inside the class, use the method the class provides:**

In [12]:
candidates.get_all()

Unnamed: 0,Name,Role,Experience_Years,Specialization,Certifications,Remote
0,Alice,Data Scientist,3.0,NLP,2,True
1,Bob,Data Engineer,5.0,ETL,1,False
2,Clara,ML Engineer,2.0,Deep Learning,3,True
3,Dan,Data Analyst,4.0,Visualization,1,False
4,Ella,Data Scientist,6.0,Computer Vision,4,True
5,Frank,ML Engineer,1.0,Reinforcement Learning,0,True
6,Grace,Data Engineer,7.0,Data Warehousing,2,False
7,Hassan,Data Scientist,5.0,Machine Learning,3,True
8,Ivy,Data Analyst,3.0,BI Tools,1,False
9,Jack,ML Engineer,2.0,MLOps,2,True


- You **cannot** access the attribute as `__df` from outside the class.
- You are expected to use methods like `get_all()` to work with the data.

### **3. Read-Only Encapsulation**

In [4]:
class CandidateDataReadOnlyEncapsulation:
    def __init__(self, df):
        self.__df = df  # Still private

    def clean_data(self):
        self.__df.dropna(inplace=True)

    def filter_by_role(self, role):
        return self.__df[self.__df["Role"] == role].copy()  # Copy, so callers cannot alter internal data

    def get_data(self):
        return self.__df.copy()  # Copy, so callers cannot alter internal data

**Create the object from the DataFrame defined in the cell above**

In [18]:
candidates = CandidateDataReadOnlyEncapsulation(df)

In [19]:
readonly_df = candidates.get_data()
readonly_df

Unnamed: 0,Name,Role,Experience_Years,Specialization,Certifications,Remote
0,Alice,Data Scientist,3.0,NLP,2,True
1,Bob,Data Engineer,5.0,ETL,1,False
2,Clara,ML Engineer,2.0,Deep Learning,3,True
3,Dan,Data Analyst,4.0,Visualization,1,False
4,Ella,Data Scientist,6.0,Computer Vision,4,True
5,Frank,ML Engineer,1.0,Reinforcement Learning,0,True
6,Grace,Data Engineer,7.0,Data Warehousing,2,False
7,Hassan,Data Scientist,5.0,Machine Learning,3,True
8,Ivy,Data Analyst,3.0,BI Tools,1,False
9,Jack,ML Engineer,2.0,MLOps,2,True


## **Basic Implementation vs. Read-Only Implementation in Encapsulation**

As defined at the start of this section, **encapsulation** bundles data (attributes) and the methods that operate on it into a single class, and restricts direct access to some of the object's components, typically through **private** attributes and methods.

The difference between a basic encapsulation implementation and a read-only implementation lies in the **level of access** and **protection** provided to the internal data (i.e., the dataset `__df` in this case).

### **1. Basic Encapsulation (`CandidateDataEncapsulation`)**


- **Private Data**: The internal dataset (`__df`) is marked as **private**, which means it cannot be accessed directly from outside the class.
  
- **Methods**: The class has methods like `clean_data()`, `filter_by_role()`, and `get_all()` to interact with the internal data. These methods manipulate and return the dataset as needed, but they **do not prevent modification** by external code after the data is accessed. In other words, external code can receive a reference to the dataset and modify it, which might lead to unintentional changes.

#### Example:

In [21]:
# Creating an instance of CandidateDataEncapsulation
data_encapsulation = CandidateDataEncapsulation(df)

# Cleaning the data
data_encapsulation.clean_data()

# Accessing the data directly through the get_all() method
data = data_encapsulation.get_all()
data

Unnamed: 0,Name,Role,Experience_Years,Specialization,Certifications,Remote
0,Alice,Data Scientist,3.0,NLP,2,True
1,Bob,Data Engineer,5.0,ETL,1,False
2,Clara,ML Engineer,2.0,Deep Learning,3,True
3,Dan,Data Analyst,4.0,Visualization,1,False
4,Ella,Data Scientist,6.0,Computer Vision,4,True
5,Frank,ML Engineer,1.0,Reinforcement Learning,0,True
6,Grace,Data Engineer,7.0,Data Warehousing,2,False
7,Hassan,Data Scientist,5.0,Machine Learning,3,True
8,Ivy,Data Analyst,3.0,BI Tools,1,False
9,Jack,ML Engineer,2.0,MLOps,2,True


In [None]:
# Try modifying the data
#data.loc[0, 'Role'] = 'Lead Data Scientist'

data.iloc[0, data.columns.get_loc('Role')] = 'Lead Data Scientist'

# Check the internal data after modification
data_encapsulation.get_all().head(1)

Unnamed: 0,Name,Role,Experience_Years,Specialization,Certifications,Remote
0,Alice,Lead Data Scientist,3.0,NLP,2,True


### You can see that we have successfully changed the internal data in an unauthorised way, through the reference `data` returned by the `get_all()` method.

- **Risk of Data Modification**: The key issue with this approach is that external users have access to the dataset (directly through the `get_all()` method) and can **modify** it outside the class. This means that there's no control over unintended modifications to the dataset after it's accessed.

---

### **2. Read-Only Encapsulation (`CandidateDataReadOnlyEncapsulation`)**

- **Private Data**: The dataset (`__df`) is still **private**, and cannot be accessed directly from outside the class.

- **Read-Only Access**: Unlike the basic implementation, the class's methods return **copies** of the data (`.copy()`) instead of references to the original dataset.
  
    - The `filter_by_role()` method returns a **copy** of the filtered dataset.
    - The `get_data()` method returns a **copy** of the internal dataset, preventing external code from modifying the internal data.


- **Safety of Data**: Because callers receive a **copy**, any change they make to it does not reach the internal dataset (`__df`). This protects the original data from accidental modification through the objects the class returns.

#### Example:

In [None]:
# Creating an instance of CandidateDataReadOnlyEncapsulation
read_only_encapsulation = CandidateDataReadOnlyEncapsulation(df)

# Cleaning the data
read_only_encapsulation.clean_data()

# Accessing the filtered data for "ML Engineer" role
ml_engineers = read_only_encapsulation.filter_by_role("ML Engineer")
ml_engineers

Unnamed: 0,Name,Role,Experience_Years,Specialization,Certifications,Remote
2,Clara,ML Engineer,2.0,Deep Learning,3,True
5,Frank,ML Engineer,1.0,Reinforcement Learning,0,True
9,Jack,ML Engineer,2.0,MLOps,2,True
14,Omar,ML Engineer,3.0,AutoML,3,True
17,Rita,ML Engineer,1.0,TensorFlow,0,True


In [6]:
# Modifying the filtered data (but not the internal data)

ml_engineers.iloc[0, ml_engineers.columns.get_loc('Role')] = 'Lead ML Engineer'

In [None]:
ml_engineers.iloc[0, ml_engineers.columns.get_loc('Role')]

'Lead ML Engineer'

In [None]:
# Re-fetch filtered data from internal state
ml_engineers_check = read_only_encapsulation.filter_by_role("ML Engineer")

# Print the first row to confirm it's unchanged
ml_engineers_check.head(1)

Unnamed: 0,Name,Role,Experience_Years,Specialization,Certifications,Remote
2,Clara,ML Engineer,2.0,Deep Learning,3,True


#### Why the two outputs differ

- `ml_engineers.iloc[0, ml_engineers.columns.get_loc('Role')]` returns `'Lead ML Engineer'`.
- `ml_engineers_check.head(1)` still shows `ML Engineer`.

When we modify `ml_engineers`, we are working on a copy of the internal data returned by `filter_by_role()`, so the change affects only that copy. When we later fetch a fresh filtered result with the same method, we get an unmodified version of the internal data, because the class returns `.copy()` instead of a direct reference. This demonstrates one of the key benefits of encapsulation: **protecting internal state** from unintended external changes, which preserves data integrity and makes program behavior safer.
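
One caveat is worth sketching. The classes in this notebook store the caller's DataFrame by reference in `__init__`, so `clean_data()` with `inplace=True` also drops rows from the caller's `df`, and later edits to `df` flow into the object. The hypothetical `CandidateDataDefensiveCopy` class below (an assumed variant, not one of the classes defined above) copies on the way in as well as on the way out. `source_df` and `store` are stand-in names for this example.

```python
import pandas as pd


class CandidateDataDefensiveCopy:
    """Hypothetical variant: copies on the way in as well as on the way out."""

    def __init__(self, df):
        self.__df = df.copy()  # own copy, independent of the caller's object

    def clean_data(self):
        # Reassign rather than mutate in place
        self.__df = self.__df.dropna()

    def get_data(self):
        return self.__df.copy()


# Small stand-in dataset (assumed for this sketch)
source_df = pd.DataFrame({
    "Name": ["Alice", "Bob", None],
    "Role": ["Data Scientist", "Data Engineer", "Data Scientist"],
})

store = CandidateDataDefensiveCopy(source_df)
store.clean_data()

print(len(source_df))         # 3: the caller's DataFrame keeps its None row
print(len(store.get_data()))  # 2: only the internal copy was cleaned

# Later edits to the caller's DataFrame do not leak into the object
source_df.loc[0, "Role"] = "Changed outside"
print(store.get_data().loc[0, "Role"])  # Data Scientist
```

The extra copy costs memory, so whether it is worth it depends on the size of the data and on how much you trust the calling code.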

---

### **Key Differences Between Basic and Read-Only Encapsulation**

| **Feature**              | **Basic Encapsulation**                                     | **Read-Only Encapsulation**                             |
|--------------------------|-------------------------------------------------------------|---------------------------------------------------------|
| **Access to Data**        | External code can access and modify the dataset (`get_all()` returns a reference to `__df`) | External code can only access a **copy** of the dataset (`get_data()` and `filter_by_role()` return copies) |
| **Data Safety**           | No protection; data can be modified directly by external code | Data is protected; modifications to the returned copy do not affect the original data |
| **Use Case**              | Suitable when internal data needs to be accessed and modified freely | Ideal for when you want to prevent accidental or unauthorized modification of internal data |

### **When to Use Each Approach**

- **Basic Encapsulation**: Use this approach when you want to **allow external code to modify the internal data**. This might be acceptable in scenarios where data modifications are expected, and there is no need for strict data protection.
  
  - Example use case: An application where external systems are responsible for updating records in the dataset (e.g., a customer management system).

- **Read-Only Encapsulation**: Use this approach when you want to **protect the internal data** and only allow external code to view or copy it without modifying it. This is important when data integrity is crucial and should not be accidentally modified.

  - Example use case: A data processing system where you need to ensure the original dataset remains unchanged while still allowing for filtering and reporting operations.
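
The two approaches can also be combined: keep reads copy-based, and route any permitted write through a method that validates it first. The sketch below assumes a hypothetical `CandidateDataControlledUpdate` class with a hypothetical `update_role()` method and an assumed set of allowed roles. None of these appear in the classes above.

```python
import pandas as pd


class CandidateDataControlledUpdate:
    """Hypothetical variant: copy-based reads plus one validated write path."""

    # Assumed set of valid roles, based on the roles used in this notebook
    ALLOWED_ROLES = {"Data Scientist", "Data Engineer", "ML Engineer", "Data Analyst"}

    def __init__(self, df):
        self.__df = df.copy()

    def get_data(self):
        return self.__df.copy()

    def update_role(self, name, new_role):
        # Validate before touching the internal state
        if new_role not in self.ALLOWED_ROLES:
            raise ValueError(f"Unknown role: {new_role!r}")
        mask = self.__df["Name"] == name
        if not mask.any():
            raise KeyError(f"No candidate named {name!r}")
        self.__df.loc[mask, "Role"] = new_role


team = CandidateDataControlledUpdate(pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Role": ["Data Scientist", "Data Engineer"],
}))

team.update_role("Bob", "ML Engineer")   # valid change goes through
print(team.get_data().loc[1, "Role"])    # ML Engineer

try:
    team.update_role("Alice", "Wizard")  # invalid role is rejected
except ValueError as err:
    print(err)
```

External code can still change records, but only in the ways the class explicitly allows.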

### Conclusion

The main difference between the **basic implementation** and the **read-only implementation** of encapsulation is **data protection**. While the basic encapsulation allows external code to modify the dataset once it's accessed, the read-only encapsulation ensures that any external interaction with the data is done through copies, preventing accidental or unauthorized modifications. This makes the read-only implementation safer for scenarios where data integrity is crucial.