# Work Plan

The goal is to build a predictive model that forecasts whether a customer is likely to cancel their service. Before building the model, we will label each user by whether they canceled or continued their service, then use those labels to train a model that predicts future cancellations.

## Data Preprocessing

- Understand the four data frames:

    - Check for duplicates, missing values, and improper data types.

- Depending on the results, preprocessing could involve:

    - Removing duplicates.
    - Handling missing values (filling or removing them).
    - Converting data to correct types.
    - Encoding data for easier analysis.

- Consider merging all four datasets into one.
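
The preprocessing steps above could look roughly like the following sketch. The two small data frames, the `customerID` key, the `TotalCharges` and `InternetService` columns, and the `"No"` marker for active customers are all assumptions made for illustration; only `EndDate` is named in this plan.

```python
import pandas as pd

# Hypothetical stand-ins for two of the four data frames.
# Every column name except EndDate is an assumption.
contract = pd.DataFrame({
    "customerID": ["A1", "A2", "A2", "A3"],
    "EndDate": ["No", "2020-01-01", "2020-01-01", "No"],
    "TotalCharges": ["29.85", "108.15", "108.15", " "],
})
internet = pd.DataFrame({
    "customerID": ["A1", "A2"],
    "InternetService": ["DSL", "Fiber optic"],
})


def summarize(df: pd.DataFrame) -> dict:
    """Report duplicate rows, missing values, and dtypes for one data frame."""
    return {
        "duplicates": int(df.duplicated().sum()),
        "missing": df.isna().sum().to_dict(),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }


report = summarize(contract)

# Remove exact duplicate rows.
contract = contract.drop_duplicates().reset_index(drop=True)

# Convert a numeric column stored as text; unparseable entries become NaN.
contract["TotalCharges"] = pd.to_numeric(contract["TotalCharges"], errors="coerce")

# Fill the resulting gaps (here with 0; dropping the rows is the alternative).
contract["TotalCharges"] = contract["TotalCharges"].fillna(0.0)

# Left-join so customers without an internet plan are kept after merging.
merged = contract.merge(internet, on="customerID", how="left")
merged["InternetService"] = merged["InternetService"].fillna("No")
```

A left join is used here on the assumption that the contract table lists every customer, so no one is dropped when a service table has no row for them.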

## Exploratory Data Analysis

- Visualize the data (histograms, box plots, correlation matrices).
- Identify outliers and examine features that may relate to the target.
- Once the target is encoded, check for class imbalance and adjust if needed.
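
As a minimal sketch of the numeric side of these checks, the snippet below computes the class balance, flags outliers with the 1.5 × IQR rule, and ranks features by correlation with the target. The data is synthetic, and the column names `MonthlyCharges`, `tenure_months`, and `churned` are assumptions rather than names taken from the real datasets.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the merged data; column names are assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "MonthlyCharges": rng.normal(65, 20, 200).round(2),
    "tenure_months": rng.integers(1, 72, 200),
})
df.loc[0, "MonthlyCharges"] = 500.0  # planted outlier for the demo
df["churned"] = (rng.random(200) < 0.25).astype(int)

# Class balance: share of each target class.
class_share = df["churned"].value_counts(normalize=True)


def iqr_outliers(s: pd.Series) -> pd.Series:
    """Return the values lying outside 1.5 * IQR of the quartiles."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)]


outliers = iqr_outliers(df["MonthlyCharges"])

# Correlation of each feature with the target.
corr_with_target = df.corr()["churned"].drop("churned")

print(class_share)
print(outliers)
print(corr_with_target)
```

The visual counterparts (histograms, box plots, a correlation heatmap) could be drawn from the same data frame with pandas plotting helpers such as `DataFrame.hist` and `DataFrame.boxplot`, which require matplotlib.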

## Model Training

- Use a supervised learning model.
  - Split data into features and target (derived from the EndDate column).
  - Use a 60/20/20 split for training, validation, and testing.
- Aim for an AUC-ROC of at least 0.75, but strive for 0.88.
- Test multiple models: Dummy (baseline), Decision Tree, Logistic Regression, Random Forest, CatBoost, LightGBM, and XGBoost (only if necessary).
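
The training steps above might be sketched as follows, using only the scikit-learn models from the list (CatBoost, LightGBM, and XGBoost are separate packages and are left out here). The data is synthetic, the feature names are hypothetical, and treating an `EndDate` value of `"No"` as "still active" is an assumption about how the real column is encoded.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic features; the column names are assumptions.
rng = np.random.default_rng(42)
n = 1000
X = pd.DataFrame({
    "MonthlyCharges": rng.normal(65, 20, n),
    "tenure_months": rng.integers(1, 72, n).astype(float),
})

# Synthetic EndDate: short-tenure customers are more likely to have one.
logit = 1.5 - 0.08 * X["tenure_months"] + 0.01 * (X["MonthlyCharges"] - 65)
p_cancel = 1 / (1 + np.exp(-logit))
end_date = np.where(rng.random(n) < p_cancel, "2020-01-01", "No")

# Binary target derived from EndDate (assumes "No" marks an active customer).
y = pd.Series((end_date != "No").astype(int), name="churned")

# 60/20/20 split: hold out 20% for testing, then 25% of the rest for validation.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=42
)

models = {
    "dummy": DummyClassifier(strategy="prior"),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

# Fit on the training set and compare AUC-ROC on the validation set only.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_valid)[:, 1]
    scores[name] = roc_auc_score(y_valid, proba)

for name, auc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {auc:.3f}")
```

The dummy model gives the baseline any real model has to beat, and the test set stays untouched until a final model has been chosen on the validation scores.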

## Model Testing

- Evaluate performance on the test set for AUC-ROC and accuracy.
- Fine-tune the model or adjust the training set size if necessary (consider a 70/15/15 split if 60% of the data is insufficient for training).
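
A minimal sketch of the test-set evaluation, assuming a fitted scikit-learn style classifier with `predict_proba`. The `evaluate` helper, the synthetic data from `make_classification`, and the choice of a random forest are illustrative assumptions, not the project's final setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split


def evaluate(model, X, y) -> dict:
    """Return AUC-ROC (from probabilities) and accuracy (from labels)."""
    proba = model.predict_proba(X)[:, 1]
    return {
        "roc_auc": roc_auc_score(y, proba),
        "accuracy": accuracy_score(y, model.predict(X)),
    }


# Synthetic, mildly imbalanced data standing in for the prepared dataset.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.75, 0.25], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

test_metrics = evaluate(model, X_test, y_test)
meets_target = bool(test_metrics["roc_auc"] >= 0.75)  # project threshold

print(test_metrics, meets_target)
```

AUC-ROC is computed from predicted probabilities while accuracy uses hard labels, so the two can disagree on imbalanced data; reporting both makes that visible.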

## Conclusions

- Final conclusions will be drawn after project completion.
- The project must meet these targets:
     - Code is error-free and organized.
     - Documentation explains each section of code.
     - Data is processed and prepared.
     - EDA with visuals is completed.
     - Models are trained.
     - The final model has an AUC-ROC of at least 0.75.