# Problem Formulation

## Business Problem
Customer churn occurs when existing customers stop using a company's services or products. High churn rates lead to revenue loss, higher customer acquisition costs (lost customers must be replaced), and competitive disadvantages. Financial institutions, in particular, face significant churn as customers migrate to fintech alternatives. Reducing churn is therefore critical to sustaining profitability.

This project focuses on a bank that has been losing customers amid growing competition from digital banking services. Customers may leave because of unsatisfactory service, high fees, or better offers from competitors. Identifying the main drivers of churn will allow the bank to take proactive measures to retain customers and improve its services.

## Key Business Objectives
1. **Predict customer churn:** Build a machine learning model to identify customers likely to leave.
2. **Understand churn factors:** Analyze key predictors influencing churn to aid strategic decision-making.
3. **Automate the pipeline:** Develop a scalable, automated system to ingest, process, and analyze data.
4. **Deploy a predictive model:** Provide real-time churn predictions for proactive intervention.

## Key Data Sources
We will use two data sources for this project:
1. **Bank Customer Churn Data (Kaggle CSV)**
   - Customer demographics (age, gender, tenure, etc.)
   - Account details (balance, number of products, etc.)
   - Activity and financial indicators (credit card usage, estimated salary, etc.)
   - Churn label (indicating whether the customer left or stayed)
2. **API-based Customer Data**
   - Real-time updates on customer transactions and interactions.
   - Additional attributes to enhance prediction accuracy.
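A minimal sketch of how the two sources could be combined into model inputs, assuming the CSV uses the column names of the widely used Kaggle `Churn_Modelling.csv` file (`CustomerId`, `Exited`, etc.) and that the API returns at most one record per customer keyed by `CustomerId`. The function `load_churn_data`, the `api_records` format, and the `recent_txn_count` field are hypothetical placeholders, and the sample rows are synthetic.

```python
import io
from typing import Optional

import pandas as pd

# Column names assumed from the Kaggle "Churn_Modelling.csv" schema.
ID_COLUMNS = ["RowNumber", "CustomerId", "Surname"]  # identifiers, not predictive features
LABEL_COLUMN = "Exited"  # 1 = customer left, 0 = customer stayed


def load_churn_data(csv_source, api_records: Optional[list] = None):
    """Load the churn CSV, optionally left-join API attributes, and split features from the label.

    `api_records` is a hypothetical list of dicts, each with a "CustomerId" key plus
    extra attributes fetched from the customer API.
    """
    df = pd.read_csv(csv_source)
    if api_records:
        api_df = pd.DataFrame(api_records)
        # Left join keeps every CSV customer; those without API data get NaN values.
        # validate="many_to_one" raises if the API returns two records for one customer.
        df = df.merge(api_df, on="CustomerId", how="left", validate="many_to_one")
    y = df[LABEL_COLUMN].astype(int)
    X = df.drop(columns=ID_COLUMNS + [LABEL_COLUMN], errors="ignore")
    return X, y


# Two synthetic rows in the assumed CSV layout (illustration only).
sample_csv = io.StringIO(
    "RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,"
    "NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited\n"
    "1,101,Smith,600,France,Female,42,2,0.0,1,1,1,100000.0,1\n"
    "2,102,Jones,720,Spain,Male,35,5,50000.0,2,0,0,80000.0,0\n"
)
X, y = load_churn_data(sample_csv, api_records=[{"CustomerId": 101, "recent_txn_count": 3}])
```

The left join is a deliberate choice: the CSV remains the source of truth for which customers exist, and gaps in API coverage surface as missing values for the cleaning step rather than silently dropping customers.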

## Expected Outputs
1. **Clean datasets for exploratory data analysis (EDA):**
   - Remove duplicates, handle missing values, and correct inconsistencies.
2. **Transformed features for machine learning:**
   - Apply encoding, scaling, feature engineering, and aggregation.
3. **Deployable churn prediction model:**
   - Train, evaluate, and save the best-performing model.
   - Provide a REST API endpoint for real-time churn predictions.
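The three outputs above can be sketched as a single scikit-learn workflow: clean the raw table, impute/scale/encode features inside a `Pipeline`, train, and persist the fitted pipeline with `joblib` so a serving process can reload it. This is a sketch under stated assumptions, not the project's implementation: the column subset follows the Kaggle `Churn_Modelling.csv` names, `LogisticRegression` is only an assumed baseline (the "best-performing model" is not specified), the helper names are hypothetical, and the data is synthetic.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Column subset assumed from the Kaggle "Churn_Modelling.csv" schema.
NUMERIC = ["Age", "Balance", "NumOfProducts"]
CATEGORICAL = ["Geography", "Gender"]
LABEL = "Exited"


def clean_churn_data(df: pd.DataFrame) -> pd.DataFrame:
    """Remove exact duplicates, drop unlabeled rows, and normalize text values."""
    df = df.drop_duplicates().dropna(subset=[LABEL]).copy()
    for col in CATEGORICAL:
        # Correct inconsistent spellings such as " spain" or "GERMANY".
        df[col] = df[col].str.strip().str.title()
    return df


def build_pipeline() -> Pipeline:
    """Impute, scale, and one-hot encode features, then fit a baseline classifier."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # missing numbers -> column median
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, NUMERIC),
        ("cat", categorical, CATEGORICAL),
    ])
    # class_weight="balanced" reweights classes so the minority (churn) class is not ignored.
    model = LogisticRegression(class_weight="balanced", max_iter=1000)
    return Pipeline([("preprocess", preprocess), ("model", model)])


# Synthetic data for illustration only; not drawn from the real dataset.
rng = np.random.default_rng(0)
n = 300
raw = pd.DataFrame({
    "Age": rng.integers(18, 80, n).astype(float),
    "Balance": rng.uniform(0, 200_000, n),
    "NumOfProducts": rng.integers(1, 5, n),
    "Geography": rng.choice(["France", " spain", "GERMANY"], n),
    "Gender": rng.choice(["Male", "female"], n),
})
raw[LABEL] = (raw["Age"] + rng.normal(0, 10, n) > 55).astype(int)
raw.loc[0, "Balance"] = np.nan                            # a missing value to impute
raw = pd.concat([raw, raw.iloc[[1]]], ignore_index=True)  # an exact duplicate row

df = clean_churn_data(raw)
X_train, X_test, y_train, y_test = train_test_split(
    df[NUMERIC + CATEGORICAL], df[LABEL], test_size=0.25, stratify=df[LABEL], random_state=0
)
pipeline = build_pipeline().fit(X_train, y_train)
churn_prob = pipeline.predict_proba(X_test)[:, 1]  # probability of churn per customer

# Save the fitted pipeline so a serving process can reload it without retraining.
model_path = os.path.join(tempfile.mkdtemp(), "churn_model.joblib")
joblib.dump(pipeline, model_path)
reloaded = joblib.load(model_path)
```

Keeping preprocessing inside the saved pipeline means the REST endpoint (not shown) can load the file once at startup and call `predict_proba` on raw request fields, with no risk of training and serving applying different transformations.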

## Measurable Evaluation Metrics
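- **Accuracy, Precision, Recall, and F1 Score:** Evaluate classification performance. Because churners are usually a minority of customers, precision, recall, and F1 on the churn class are more informative than accuracy alone.
- **ROC-AUC Score:** Assess the model's ability to distinguish churners from non-churners.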
- **Feature Importance Analysis:** Identify key drivers of customer churn.
- **Model Deployment Latency:** Measure the response time of the deployed prediction endpoint to ensure real-time predictions meet business requirements.
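A minimal sketch of how these metrics might be computed with scikit-learn, assuming the model outputs churn probabilities and a 0.5 decision threshold. Permutation importance is used here as one model-agnostic way to rank churn drivers; the project does not specify its feature-importance method. The latency helper times only an in-process call, so it excludes the network and serialization overhead of a REST endpoint. The function names are hypothetical and the data is synthetic.

```python
import time

import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate_churn_model(y_true, churn_prob, threshold=0.5):
    """Compute the classification metrics listed above from churn probabilities."""
    y_pred = (np.asarray(churn_prob) >= threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        # ROC-AUC ranks the raw probabilities, so it does not depend on the threshold.
        "roc_auc": roc_auc_score(y_true, churn_prob),
    }


def median_latency_ms(predict_fn, sample, n_calls=100):
    """Median wall-clock time of a single in-process prediction call, in milliseconds."""
    timings = []
    for _ in range(n_calls):
        start = time.perf_counter()
        predict_fn(sample)
        timings.append((time.perf_counter() - start) * 1000)
    return float(np.median(timings))


# Synthetic data: column 0 drives churn, column 1 is pure noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + 0.3 * rng.normal(size=500) > 0).astype(int)
model = LogisticRegression().fit(X, y)

# Training data is reused here for brevity; use a held-out test set in practice.
metrics = evaluate_churn_model(y, model.predict_proba(X)[:, 1])
importance = permutation_importance(model, X, y, scoring="roc_auc", n_repeats=5, random_state=0)
latency = median_latency_ms(model.predict_proba, X[:1])
```

Shuffling the predictive column destroys most of the ROC-AUC while shuffling the noise column barely changes it, which is the signal `importances_mean` reports; the median is used for latency because occasional slow calls would skew a mean.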

---


