
# **Python Packages and Modules**

### **Encapsulation Mechanism**
- A **package** is a way to group related **modules** into a single unit.
- A **package** is simply a **folder or directory** containing Python modules.
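- A folder that contains an `__init__.py` file (which may be empty) is a **regular Python package**. That file runs the first time the package is imported.
- Since Python 3.3, a folder *without* `__init__.py` can also be imported as a **namespace package**, but this document uses regular packages.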
- A package can contain **sub-packages** as well.

#### **Advantages of Using Packages**
1. **Resolves naming conflicts** by organizing modules into different namespaces.
2. **Uniquely identifies components**: every module gets a fully qualified dotted name, for example `Loan.HomeLoan.x`.
3. **Improves modularity** by structuring the code into reusable modules.



### **Package Structure Example**
```plaintext
Loan/
│── __init__.py
│── HomeLoan/
│   ├── __init__.py
│   ├── x.py
│   ├── y.py
│── VehicleLoan/
│   ├── __init__.py
│   ├── m.py
│   ├── n.py
```
- `Loan` is the main package.
- It contains two sub-packages: `HomeLoan` and `VehicleLoan`.
- Each sub-package has its own modules (`x.py`, `y.py`, etc.).
- The presence of `__init__.py` in each folder makes it a **Python package**.
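A minimal sketch, assuming you want to try the `Loan` layout without creating the folders by hand. It builds the tree in a temporary directory, puts that directory on `sys.path`, and imports modules through their dotted names. The functions `home_loan_info()` and `vehicle_loan_info()` are hypothetical placeholders. `y.py` and `n.py` are left empty.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Recreate the Loan/ tree in a temporary directory. The function bodies in
# x.py and m.py are hypothetical; y.py and n.py are left empty.
root = Path(tempfile.mkdtemp())
files = {
    "Loan/__init__.py": "",
    "Loan/HomeLoan/__init__.py": "",
    "Loan/HomeLoan/x.py": "def home_loan_info():\n    return 'from Loan.HomeLoan.x'\n",
    "Loan/HomeLoan/y.py": "",
    "Loan/VehicleLoan/__init__.py": "",
    "Loan/VehicleLoan/m.py": "def vehicle_loan_info():\n    return 'from Loan.VehicleLoan.m'\n",
    "Loan/VehicleLoan/n.py": "",
}
for rel_path, source in files.items():
    path = root / rel_path
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(source)

# The directory that *contains* Loan/ must be on the module search path.
sys.path.insert(0, str(root))
importlib.invalidate_caches()  # files were created after interpreter start

import Loan.HomeLoan.x           # full dotted path to a module
from Loan.VehicleLoan import m   # module imported from its sub-package

print(Loan.HomeLoan.x.home_loan_info())  # from Loan.HomeLoan.x
print(m.vehicle_loan_info())             # from Loan.VehicleLoan.m
print(m.__name__)                        # Loan.VehicleLoan.m
```

The module's `__name__` is its full dotted path. This is what lets the same short module name appear in different sub-packages without clashing.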



### **Example 1: Creating and Importing a Package**
#### **Directory Structure**
```plaintext
D:\Python_classes
│── test.py
│── pack1/
│   ├── __init__.py  # Empty file
│   ├── module1.py
```

#### **Contents of `module1.py`**
```python
def f1():
    print("Hello, this is from module1 present in pack1")
```

#### **Contents of `test.py` (Version 1)**
```python
import pack1.module1
pack1.module1.f1()
```

#### **Alternative Import Syntax (Version 2)**
```python
from pack1.module1 import f1
f1()
```
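Two more import forms also work for Example 1: `import ... as` for an alias, and `from pack1 import module1` to import the module object itself. The sketch below rebuilds `pack1/` in a temporary directory so it runs anywhere. It then tries both forms and confirms that they all refer to the same cached module.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Recreate the pack1/ layout from Example 1 in a temporary directory.
root = Path(tempfile.mkdtemp())
(root / "pack1").mkdir()
(root / "pack1" / "__init__.py").write_text("")  # empty, as in the example
(root / "pack1" / "module1.py").write_text(
    "def f1():\n"
    '    print("Hello, this is from module1 present in pack1")\n'
)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Version 3: import the module under a shorter alias
import pack1.module1 as m1
m1.f1()

# Version 4: import the module object from its package
from pack1 import module1
module1.f1()

# Every form refers to the same module object cached in sys.modules
print(m1 is module1 is sys.modules["pack1.module1"])  # True
```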

### **Example 2: Using Nested Packages**
#### **Directory Structure**
```plaintext
D:\Python_classes
│── test.py
│── com/
│   ├── __init__.py  # Empty file
│   ├── module1.py
│   ├── durgasoft/
│       ├── __init__.py  # Empty file
│       ├── module2.py
```

#### **Contents of `module1.py`**
```python
def f1():
    print("Hello, this is from module1 present in com")
```

#### **Contents of `module2.py`**
```python
def f2():
    print("Hello, this is from module2 present in com.durgasoft")
```

#### **Contents of `test.py`**
```python
from com.module1 import f1
from com.durgasoft.module2 import f2

f1()
f2()
```

#### **Expected Output**
```plaintext
D:\Python_classes> py test.py
Hello, this is from module1 present in com
Hello, this is from module2 present in com.durgasoft
```
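Importing `com.durgasoft.module2` first imports the parent packages `com` and `com.durgasoft`, running each `__init__.py` once. A minimal sketch to make this visible, assuming (unlike Example 2, where they are empty) that each `__init__.py` prints a message:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Recreate the com/ tree from Example 2. Unlike the example, each __init__.py
# prints a message (a demo-only assumption) so we can see when it runs.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "com" / "durgasoft"
pkg_dir.mkdir(parents=True)
(root / "com" / "__init__.py").write_text('print("running com/__init__.py")\n')
(pkg_dir / "__init__.py").write_text('print("running com/durgasoft/__init__.py")\n')
(pkg_dir / "module2.py").write_text(
    "def f2():\n"
    '    print("Hello, this is from module2 present in com.durgasoft")\n'
)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Prints "running com/__init__.py", then "running com/durgasoft/__init__.py"
from com.durgasoft.module2 import f2

# Already cached in sys.modules, so neither __init__.py runs again
import com.durgasoft.module2

f2()
print("com" in sys.modules, "com.durgasoft" in sys.modules)  # True True
```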



### **Python Code Structure**
- A **library** contains multiple **packages**.
- A **package** contains multiple **modules**.
- A **module** contains **functions, classes, and variables**.

#### **Diagram Representation**
```plaintext
Library
│── pack1/
│   ├── module1.py
│   ├── module2.py
│── pack2/
│   ├── module1.py
│   ├── module2.py
│── packN/
│   ├── moduleN.py
```
- Each **module** contains **functions, variables, and classes**.
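In the diagram, `pack1` and `pack2` each contain a `module1.py`. Because a module is addressed through its package, the two never collide. A minimal sketch of that naming-conflict resolution follows. The `greet()` functions are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Two packages, each containing a module called module1 (as in the diagram).
# The greet() functions are hypothetical.
root = Path(tempfile.mkdtemp())
for pkg in ("pack1", "pack2"):
    (root / pkg).mkdir()
    (root / pkg / "__init__.py").write_text("")
    (root / pkg / "module1.py").write_text(
        f"def greet():\n    return 'module1 in {pkg}'\n"
    )
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import pack1.module1
import pack2.module1

print(pack1.module1.greet())           # module1 in pack1
print(pack2.module1.greet())           # module1 in pack2
print(pack1.module1 is pack2.module1)  # False: two distinct modules
```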



### **Case Study: Data Engineering Pipeline using Python Packages and Modules**  



### **Problem Statement**  
A **financial institution** wants to build an **automated credit risk assessment system**. The system should:  
1. **Ingest data** from multiple sources (CSV, databases, APIs).  
2. **Preprocess and clean the data** (handle missing values, normalize features).  
3. **Perform feature engineering** (generate new features).  
4. **Train an ML model** to classify loan applicants as **low risk or high risk**.  
5. **Deploy the model** for real-time predictions.  



### **Project Structure**  
We organize our pipeline using **Python packages and modules** to ensure a modular and maintainable codebase.  

```plaintext
credit_risk_pipeline/
│── main.py                      # Entry point of the pipeline
│── config.py                     # Configuration settings
│── data/
│   ├── raw/                      # Raw data files
│   ├── processed/                 # Processed data files
│── src/
│   ├── __init__.py               # Makes src a package
│   ├── data_ingestion.py         # Module for data collection
│   ├── data_preprocessing.py     # Module for cleaning data
│   ├── feature_engineering.py    # Module for feature transformation
│   ├── model_training.py         # Module for training ML model
│   ├── model_inference.py        # Module for making predictions
│── models/
│   ├── trained_model.pkl         # Trained ML model
│── requirements.txt              # Dependencies
```
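The `main.py` snippets below use imports such as `from src.data_ingestion import DataIngestion`. These work because running `python main.py` puts the script's own directory (the project root) first on `sys.path`. The modules also use relative paths like `data/raw/`, which resolve against the current working directory, so the pipeline is expected to run from the project root.

A minimal sketch checks this with a stripped-down copy of the layout. The stub `DataIngestion.describe()` method is hypothetical and stands in for the pandas-based class below.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# A stripped-down copy of the project layout. The stub DataIngestion class is
# hypothetical: it only stands in for the real pandas-based module.
root = Path(tempfile.mkdtemp()) / "credit_risk_pipeline"
(root / "src").mkdir(parents=True)
(root / "src" / "__init__.py").write_text("")
(root / "src" / "data_ingestion.py").write_text(
    "class DataIngestion:\n"
    "    def describe(self):\n"
    "        return 'stub ingestion'\n"
)
(root / "main.py").write_text(
    "from src.data_ingestion import DataIngestion\n"
    "print(DataIngestion().describe())\n"
)

# Run main.py from the project root, as the pipeline expects.
result = subprocess.run(
    [sys.executable, "main.py"], cwd=root, capture_output=True, text=True, check=True
)
print(result.stdout.strip())  # stub ingestion
```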



### **1. Data Ingestion Module (`data_ingestion.py`)**  
This module is meant to collect data from **CSV files, databases, and APIs**. The class below implements the CSV part: it loads files from, and saves them to, the `data/raw/` directory.  

```python
import pandas as pd
import os

class DataIngestion:
    def __init__(self, data_path="data/raw/"):
        self.data_path = data_path

    def load_csv(self, file_name):
        """Load data from a CSV file."""
        file_path = os.path.join(self.data_path, file_name)
        return pd.read_csv(file_path)

    def save_data(self, df, file_name):
        """Save data to CSV."""
        file_path = os.path.join(self.data_path, file_name)
        df.to_csv(file_path, index=False)
        print(f"Data saved to {file_path}")
```

**Usage in `main.py`**
```python
from src.data_ingestion import DataIngestion

data_ingestor = DataIngestion()
df = data_ingestor.load_csv("credit_data.csv")
print(df.head())
```
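`DataIngestion` covers only CSV files, while the problem statement also lists databases as a source. The following is a minimal sketch of a SQL loader. The `SqlIngestion` class, its `load_sql` method, and the `applicants` table are hypothetical. An in-memory SQLite database (Python's built-in `sqlite3`) stands in for a real one. `pd.read_sql_query` accepts a `sqlite3` connection directly. For most other databases, pandas expects an SQLAlchemy connectable instead.

```python
import sqlite3

import pandas as pd


class SqlIngestion:
    """Hypothetical companion to DataIngestion for SQL sources."""

    def load_sql(self, query, connection):
        """Run a SQL query and return the result as a DataFrame."""
        return pd.read_sql_query(query, connection)


# Demo: an in-memory SQLite database stands in for a production database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE applicants (income REAL, loan_amount REAL)")
conn.executemany(
    "INSERT INTO applicants VALUES (?, ?)",
    [(52000.0, 15000.0), (38000.0, 22000.0)],
)
conn.commit()

df_sql = SqlIngestion().load_sql("SELECT * FROM applicants", conn)
print(df_sql.shape)  # (2, 2)
conn.close()
```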



### **2. Data Preprocessing Module (`data_preprocessing.py`)**  
Handles **missing values and normalization**. Categorical **encoding** also belongs to this stage, but the class below does not implement it.  

```python
from sklearn.preprocessing import StandardScaler
import pandas as pd

class DataPreprocessing:
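    def handle_missing_values(self, df):
        """Fill missing numeric values with each column's median."""
        # numeric_only=True skips text columns, which would otherwise make
        # df.median() raise a TypeError in pandas 2.x
        return df.fillna(df.median(numeric_only=True))

    def normalize_features(self, df, columns):
        """Normalize numerical features using StandardScaler (modifies df in place)."""
        # Fitting on the full dataset before the train/test split lets test-set
        # statistics leak into training; in production, fit on training data only
        scaler = StandardScaler()
        df[columns] = scaler.fit_transform(df[columns])
        return df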
```
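A minimal sketch of a one-hot encoding step using `pd.get_dummies`. Such a step could sit alongside the methods of `DataPreprocessing`. The `CategoricalEncoding` class, its `encode_categorical` method, and the `employment_type` column are hypothetical illustrations, not part of the original dataset.

```python
import pandas as pd


class CategoricalEncoding:
    """Hypothetical extra preprocessing step: one-hot encode categoricals."""

    def encode_categorical(self, df, columns):
        """Replace each listed column with one 0/1 indicator column per category."""
        return pd.get_dummies(df, columns=columns, dtype=int)


df = pd.DataFrame(
    {
        "income": [52000, 38000, 61000],
        "employment_type": ["salaried", "self_employed", "salaried"],
    }
)
encoded = CategoricalEncoding().encode_categorical(df, ["employment_type"])
print(encoded.columns.tolist())
# ['income', 'employment_type_salaried', 'employment_type_self_employed']
```

Encoding text columns this way before training matters because `LogisticRegression` only accepts numeric input.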

**Usage in `main.py`**
```python
from src.data_preprocessing import DataPreprocessing

preprocessor = DataPreprocessing()
df_cleaned = preprocessor.handle_missing_values(df)
df_normalized = preprocessor.normalize_features(df_cleaned, ["income", "loan_amount"])
```



### **3. Feature Engineering Module (`feature_engineering.py`)**  
Creates new features to improve model performance.  

```python
class FeatureEngineering:
    def create_credit_ratio(self, df):
        """Create a new feature: loan-to-income ratio."""
        df["credit_ratio"] = df["loan_amount"] / df["income"]
        return df
```

**Usage in `main.py`**
```python
from src.feature_engineering import FeatureEngineering

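feature_engineer = FeatureEngineering()
# Compute the ratio on the cleaned but *unscaled* values: after StandardScaler,
# income and loan_amount are centred on zero, so a ratio of the scaled columns
# is not meaningful (and can divide by values near zero)
ratios = feature_engineer.create_credit_ratio(preprocessor.handle_missing_values(df))["credit_ratio"]
df_features = df_normalized.assign(credit_ratio=ratios)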
```



### **4. Model Training Module (`model_training.py`)**  
Trains a **classification model** (e.g., Logistic Regression).  

```python
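import os
import pickle

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

class ModelTraining:
    def train_model(self, df, target_column):
        """Train a logistic regression model, report held-out accuracy, and save it."""
        X = df.drop(columns=[target_column])
        y = df[target_column]

        # Hold out 20% of the rows so the model is checked on unseen data
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
        model = LogisticRegression()
        model.fit(X_train, y_train)
        print(f"Test accuracy: {model.score(X_test, y_test):.3f}")

        # Save model (create models/ first in case it does not exist yet)
        os.makedirs("models", exist_ok=True)
        with open("models/trained_model.pkl", "wb") as f:
            pickle.dump(model, f)

        print("Model trained and saved!")
        return model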
```

**Usage in `main.py`**
```python
from src.model_training import ModelTraining

trainer = ModelTraining()
trainer.train_model(df_features, "risk_category")
```



### **5. Model Inference Module (`model_inference.py`)**  
Loads the trained model and makes predictions. Only load pickle files from trusted sources, because unpickling can execute arbitrary code.  

```python
import pickle
import pandas as pd

class ModelInference:
    def __init__(self, model_path="models/trained_model.pkl"):
        with open(model_path, "rb") as f:
            self.model = pickle.load(f)

    def predict(self, df):
        """Make predictions on new data."""
        return self.model.predict(df)
```

**Usage in `main.py`**
```python
from src.model_inference import ModelInference

inference = ModelInference()
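# Drop the target column: the model was trained on the feature columns only
predictions = inference.predict(df_features.drop(columns=["risk_category"]))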
print(predictions)
```



### **Benefits of This Structure**

1. **Modular Design**:  
   - Each functionality is implemented in a separate module, making the code **organized and reusable**.  

2. **Scalability**:  
   - New data sources, transformations, or models can be added without changing the entire pipeline.  

3. **Reusability**:  
   - The `data_preprocessing.py` and `feature_engineering.py` modules can be reused in different projects.  

4. **Separation of Concerns**:  
   - **Data ingestion, processing, feature engineering, training, and inference** are clearly separated.  
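Reusing a module in another project means importing it, and importing a module runs its top-level code. The usual Python idiom for keeping a file importable without side effects, such as running the whole pipeline, is the `if __name__ == "__main__":` guard. The original `main.py` snippets do not use it. A minimal sketch follows: `steps.py` and its `run()` function are hypothetical, and the same file is run both ways in subprocesses.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
(root / "steps.py").write_text(
    "def run():\n"
    "    return 'pipeline ran'\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    print(run())\n"
)

# Executed directly: __name__ is '__main__', so the guarded block runs.
direct = subprocess.run(
    [sys.executable, "steps.py"], cwd=root, capture_output=True, text=True, check=True
)

# Imported by other code: __name__ is 'steps', so nothing is printed.
imported = subprocess.run(
    [sys.executable, "-c", "import steps"], cwd=root, capture_output=True, text=True, check=True
)

print(repr(direct.stdout))    # 'pipeline ran\n'
print(repr(imported.stdout))  # ''
```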
