## ✅ 4. Ethical Considerations & Limitations

### 1. Ethical Considerations
#### 1.1. Fairness & Bias

A model may unfairly treat certain customer groups if:
- the data contains historical biases (e.g., income, age, contract type)
- uneven representation of categories impacts learning
- one socioeconomic group churns disproportionately, and the model begins to "penalize" it automatically

__Risks__:
- Discrimination against customers based on age, income, or education
- Deterioration of service conditions due to false predictions

__What can be done__:
- evaluate fairness metrics (Disparate Impact, Equalized Odds)
- remove or mask socially sensitive features, and check the remaining features for proxies of them
- apply fairness regularization
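
As a minimal sketch of the first item, both fairness metrics can be computed directly from the predictions, the true labels, and one sensitive attribute. The toy data, the group names (`young`, `senior`), and the helper function names are made up for illustration and do not come from the dataset. The sketch assumes every group contains both churners and non-churners, otherwise the rates are undefined.

```python
def selection_rate(y_pred, group, value):
    """Share of customers in one group that the model flags as churners."""
    flagged = [p for p, g in zip(y_pred, group) if g == value]
    return sum(flagged) / len(flagged)


def disparate_impact(y_pred, group, group_a, group_b):
    """Ratio of selection rates between two groups (1.0 means parity)."""
    return selection_rate(y_pred, group, group_a) / selection_rate(y_pred, group, group_b)


def tpr_fpr(y_true, y_pred, group, value):
    """True positive rate and false positive rate for one group."""
    tp = fn = fp = tn = 0
    for t, p, g in zip(y_true, y_pred, group):
        if g != value:
            continue
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), fp / (fp + tn)


def equalized_odds_gap(y_true, y_pred, group, group_a, group_b):
    """Largest difference in TPR or FPR between two groups (0.0 means parity)."""
    tpr_a, fpr_a = tpr_fpr(y_true, y_pred, group, group_a)
    tpr_b, fpr_b = tpr_fpr(y_true, y_pred, group, group_b)
    return max(abs(tpr_a - tpr_b), abs(fpr_a - fpr_b))


# Toy example (illustrative values only): 1 = churn, 0 = no churn
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0]
group = ["young"] * 4 + ["senior"] * 4

di = disparate_impact(y_pred, group, "senior", "young")
gap = equalized_odds_gap(y_true, y_pred, group, "senior", "young")
print(round(di, 2))  # 0.33
print(gap)           # 0.5
```

A disparate impact ratio far from 1.0, or a large equalized odds gap, suggests that the model treats the two groups differently and that the features and training data deserve a closer look.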

#### 1.2. Privacy & Confidentiality

The dataset contains customer data that may qualify as PII (Personally Identifiable Information), such as:
- demographics
- payment behavior
- credit limits
- geolocation features

When using such data, it is important to comply with:
- GDPR
- the organization's data retention policy
- the data minimization principle (keep only the features the model needs)

We must ensure:
- the data contains no information that could identify a specific person
- data is not sent to public third-party services
- secure storage is used
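
One possible way to approach the first point is to drop direct identifiers and replace the customer ID with a keyed hash before the data reaches the modeling pipeline. The sketch below is an assumption, not the project's actual procedure: the column names (`name`, `email`, `phone`, `customer_id`, `age_band`) and the key value are hypothetical placeholders.

```python
import hashlib
import hmac

# Placeholder only: a real key should come from a secrets manager, not source code.
SECRET_KEY = b"replace-with-a-managed-secret"

# Hypothetical column names for fields that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize_id(customer_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace an ID with a keyed SHA-256 digest (stable, but not reversible without the key)."""
    return hmac.new(key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize_record(record: dict) -> dict:
    """Drop direct identifiers and pseudonymize the customer ID."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["customer_id"] = pseudonymize_id(str(record["customer_id"]))
    return cleaned


record = {"customer_id": 12345, "name": "A. Example", "email": "a@example.com", "age_band": "30-39"}
print(minimize_record(record))
```

Pseudonymized data generally still counts as personal data under GDPR, because combinations of the remaining features can re-identify a person, so this step reduces risk rather than removing it.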

#### 1.3. Responsible Use

It is important to understand that the model:
- does not make final decisions
- cannot be the sole factor in account closure
- must be used as a decision-support tool, not as a trigger for automatic actions

The organization must:
- implement a "human-in-the-loop" approach
- use the model only to improve service
- not use it to impose sanctions against customers
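
A "human-in-the-loop" setup could look roughly like the sketch below: the model score only creates a review task with a suggested retention action, and nothing is executed without approval. The threshold, the `ReviewTask` type, and the `route_prediction` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

REVIEW_THRESHOLD = 0.5  # hypothetical cut-off, to be tuned on validation data


@dataclass
class ReviewTask:
    """A suggestion for a human reviewer; it never triggers an action by itself."""
    customer_ref: str
    churn_score: float
    suggested_action: str
    requires_human_approval: bool = True


def route_prediction(customer_ref: str, churn_score: float) -> Optional[ReviewTask]:
    """Turn a high churn score into a review task instead of an automatic action."""
    if churn_score < REVIEW_THRESHOLD:
        return None  # no follow-up needed
    # Only service-improving suggestions are produced, never sanctions.
    return ReviewTask(customer_ref, churn_score, "offer retention outreach")


print(route_prediction("customer-001", 0.82))
print(route_prediction("customer-002", 0.10))
```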

### 2. Limitations
#### 2.1. Dataset Imbalance

The churn rate is ≈ 16%, which means:
- the model tends to predict the majority class (Non-Churn)
- accuracy is misleading: a model that always predicts Non-Churn already scores about 84%
- training and evaluation require:
  - a stratified split
  - oversampling/undersampling
  - ROC-AUC and PR-AUC as metrics
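
The effect of the imbalance can be illustrated on synthetic labels with a 16% churn rate. The sketch below uses only the standard library; the `stratified_split` helper is a hypothetical, simplified stand-in for a library routine such as scikit-learn's `train_test_split(..., stratify=y)`.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with the class ratio preserved in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# Synthetic labels (not the real dataset): 160 churners out of 1000 customers
labels = [1] * 160 + [0] * 840
train_idx, test_idx = stratified_split(labels)
y_test = [labels[i] for i in test_idx]

# Trivial baseline: always predict the majority class (Non-Churn)
y_pred = [0] * len(y_test)
accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)
recall = sum(p == 1 and t == 1 for p, t in zip(y_pred, y_test)) / sum(y_test)

print(sum(y_test) / len(y_test))  # 0.16, churn rate preserved in the test set
print(accuracy)                   # 0.84
print(recall)                     # 0.0, no churner is detected
```

The baseline reaches 84% accuracy while finding no churners at all, which is why recall-oriented and ranking metrics are more informative here.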

#### 2.2. Weak Correlations of Features with Churn

According to the correlation matrix:
- no numeric field correlates strongly with Churn
- most behavioral factors are weakly related to churn

Consequences:
- simple linear models are likely to perform poorly
- nonlinear models should be tried, for example:
  - RandomForest
  - XGBoost
  - CatBoost
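
A weak linear correlation does not by itself mean that a feature is uninformative. The toy example below uses invented values, not the dataset: `x` is a hypothetical feature (say, a deviation from the customer's typical activity) and churn happens at both extremes. The Pearson correlation is zero, yet a simple nonlinear rule separates the classes perfectly.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Invented data: churn (1) occurs when activity deviates strongly in either direction
x = [-3, -2, -1, 0, 1, 2, 3]
churn = [1, 1, 0, 0, 0, 1, 1]

r = pearson(x, churn)

# A nonlinear rule of the kind a tree-based model can learn from two splits
rule_pred = [1 if abs(v) >= 2 else 0 for v in x]
rule_accuracy = sum(p == t for p, t in zip(rule_pred, churn)) / len(churn)

print(abs(r) < 1e-9)   # True: no linear correlation
print(rule_accuracy)   # 1.0
```

The reverse does not follow automatically: weak correlations may also mean the features carry little signal, so the tree-based models listed above still need to be validated on held-out data.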

#### 2.3. Limited Feature Scope

The dataset only contains:
- demographics
- credit limits
- transaction history

It lacks:
- customer behavior over a long period
- external factors
- purchase history outside of card transactions
- sentiment analysis of communications

This limits what the model can actually "learn".

#### 2.4. Potential Data Quality Issues

Although the dataset appears clean:
- hidden outliers are possible
- there is no temporal component
- categorical features are simplified and truncated

#### 2.5. No Real Causality

We can estimate correlations, but not causes.

The model cannot say:

> "A customer will leave because they reduced card usage."

It can only detect a statistical pattern.

#### 2.6. Generalization Issues

The model was trained on data from a single company, so it may perform poorly:
- in other countries
- on other tariffs
- with other types of clients

Applying the model in such settings requires further experimentation and adaptation.