<div style="background-color: pink; font-size: 40px; text-align: center;"> Detailed Explanation of Binary Classification Bank Churn Project with Prediction: 90.49% </div>

<div style="background-color: orange; font-size: 30px; text-align: center;"> Datasets </div>
    
1. https://www.kaggle.com/competitions/playground-series-s4e1
2. https://www.kaggle.com/datasets/shantanudhakadd/bank-customer-churn-prediction

# **Part 1: Libraries and Settings**

#### **Libraries:**
- **NumPy (`np`):** NumPy is a powerful library for numerical operations in Python. It provides support for large, multi-dimensional arrays and matrices, along with mathematical functions to operate on these elements.

- **Pandas (`pd`):** Pandas is a data manipulation library that provides data structures like DataFrames for easy handling and analysis of structured data.

- **Matplotlib (`plt`):** Matplotlib is a 2D plotting library for creating static, animated, and interactive visualizations in Python.

- **Seaborn (`sns`):** Seaborn is a statistical data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

- **TensorFlow (`tf`):** TensorFlow is an open-source machine learning framework developed by Google. It is widely used for building and training machine learning models, particularly deep learning models.

- **Optuna:** Optuna is an optimization framework for hyperparameter tuning. It simplifies the optimization of machine learning model hyperparameters.

- **Category Encoders:** A library for encoding categorical variables. Different encoding methods like One-Hot, M-Estimate, CatBoost, and Ordinal encoding are available.

- **Scikit-learn:** A comprehensive machine learning library that includes various tools for data preprocessing, model selection, evaluation, and more.

- **XGBoost, LightGBM, CatBoost:** These are gradient boosting libraries widely used for classification and regression tasks. They provide efficient implementations of boosting algorithms.

#### **Settings:**
- **Seaborn Theme and Palette:** Sets the theme and color palette for Seaborn plots to ensure consistent and visually appealing visualizations.

- **Pandas Settings:** Configures various display options for Pandas DataFrames, such as maximum rows, columns, and float formatting.

- **File Reading:** Reads the training and test datasets from CSV files using Pandas. The `index_col` parameter specifies the column to use as the index.
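
The settings and file-reading steps described above can be sketched as follows. This is a minimal illustration, not the notebook's actual configuration: the option values, the in-memory CSV text, and the `demo` variable are assumptions made so the example runs without the competition files.

```python
import io
import pandas as pd

# Display options in the spirit of the settings described above (values are illustrative).
pd.set_option("display.max_rows", 100)
pd.set_option("display.max_columns", None)
pd.set_option("display.float_format", "{:.4f}".format)

# Stand-in for a CSV file on disk; the real notebook reads the competition files instead.
csv_text = "id,CreditScore,Exited\n0,668,0\n1,627,1\n"
demo = pd.read_csv(io.StringIO(csv_text), index_col="id")

print(demo.index.name)     # the "id" column became the index
print(list(demo.columns))  # and is no longer a regular column
```

With `index_col` set, the identifier column does not end up among the features by accident.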


# **Part 2: Data Loading and Exploration**

#### **Data Exploration - Training Dataset (`train`):**
```python
train.head(10)
```
- Displays the first 10 rows of the training dataset for initial inspection.

```python
desc = pd.DataFrame(index=list(train))
desc['type'] = train.dtypes
desc['count'] = train.count()
desc['nunique'] = train.nunique()
desc['%unique'] = desc['nunique'] / len(train) * 100
desc['null'] = train.isnull().sum()
desc['%null'] = desc['null'] / len(train) * 100
desc['min'] = train.min()
desc['max'] = train.max()
desc
```
- Creates a descriptive DataFrame (`desc`) containing information about each column in the training dataset.
  - `type`: Data type of each column.
  - `count`: Non-null count of values.
  - `nunique`: Number of unique values.
  - `%unique`: Percentage of unique values.
  - `null`: Number of null values.
  - `%null`: Percentage of null values.
  - `min`: Minimum value.
  - `max`: Maximum value.

- Provides insights into the structure and characteristics of the training dataset, including the presence of null values, data types, and descriptive statistics.
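
Because the same summary is built for three datasets, it can be wrapped in a helper. The sketch below is one way to do that; the function name `describe_frame` and the toy DataFrame are hypothetical and not part of the original notebook.

```python
import pandas as pd

def describe_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column summary: dtype, counts, uniqueness, nulls, min and max."""
    desc = pd.DataFrame(index=list(df))
    desc["type"] = df.dtypes
    desc["count"] = df.count()
    desc["nunique"] = df.nunique()
    desc["%unique"] = desc["nunique"] / len(df) * 100
    desc["null"] = df.isnull().sum()
    desc["%null"] = desc["null"] / len(df) * 100
    desc["min"] = df.min()
    desc["max"] = df.max()
    return desc

# Toy data with one missing value and one text column.
toy = pd.DataFrame({"Age": [33.0, 41.0, None], "Geography": ["France", "Spain", "France"]})
summary = describe_frame(toy)
print(summary)
```

Calling `describe_frame(train)`, `describe_frame(test)`, and `describe_frame(orig_train)` would then produce the same tables as the three repeated blocks.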

#### **Data Exploration - Test Dataset (`test`):**
```python
test.head(10)
```
- Displays the first 10 rows of the test dataset for initial inspection.

```python
desc = pd.DataFrame(index=list(test))
desc['type'] = test.dtypes
desc['count'] = test.count()
desc['nunique'] = test.nunique()
desc['%unique'] = desc['nunique'] / len(test) * 100
desc['null'] = test.isnull().sum()
desc['%null'] = desc['null'] / len(test) * 100
desc['min'] = test.min()
desc['max'] = test.max()
desc
```
- Creates a descriptive DataFrame (`desc`) for the test dataset, similar to what was done for the training dataset.

- Provides insights into the structure and characteristics of the test dataset.

#### **Data Exploration - Original Dataset (`orig_train`):**
```python
orig_train.head(10)
```
- Displays the first 10 rows of the original training dataset (`Churn_Modelling.csv`) for initial inspection.

```python
desc = pd.DataFrame(index=list(orig_train))
desc['type'] = orig_train.dtypes
desc['count'] = orig_train.count()
desc['nunique'] = orig_train.nunique()
desc['%unique'] = desc['nunique'] / len(orig_train) * 100
desc['null'] = orig_train.isnull().sum()
desc['%null'] = desc['null'] / len(orig_train) * 100
desc['min'] = orig_train.min()
desc['max'] = orig_train.max()
desc
```
- Creates a descriptive DataFrame (`desc`) for the original training dataset, similar to what was done for the other datasets.

- Provides insights into the structure and characteristics of the original training dataset.

#### **Feature Categorization:**
```python
numerical_features = list(test._get_numeric_data())
categorical_features = list(test.drop(numerical_features, axis=1))
```
- Separates features into numerical and categorical types based on the test dataset.
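
`_get_numeric_data` is a private pandas method. A sketch of the same split using the public `select_dtypes` API is shown below; the toy frame is an assumption, and the two approaches may treat boolean columns differently.

```python
import pandas as pd

# Small stand-in for the test set.
toy_test = pd.DataFrame({
    "CreditScore": [668, 627],
    "Geography": ["France", "Spain"],
    "Balance": [0.0, 1234.5],
    "Gender": ["Male", "Female"],
})

# Public-API equivalent of the numeric / non-numeric split.
numerical_features = list(toy_test.select_dtypes(include="number"))
categorical_features = list(toy_test.select_dtypes(exclude="number"))

print(numerical_features)
print(categorical_features)
```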

These steps aim to understand the data, identify potential issues (such as missing values), and categorize features for further analysis and preprocessing.

# **Part 3: Data Preparation**

#### **Combining Datasets:**
```python
X = pd.concat([orig_train, train]).reset_index(drop=True)
```
- Concatenates the original training dataset (`orig_train`) and the training dataset (`train`) along rows (`axis=0`).
- Resets the index to create a new combined dataset (`X`).

```python
y = X.pop('Exited')
```
- Extracts the target variable (`Exited`) from the combined dataset (`X`) and assigns it to `y`.
- The target variable is removed from `X` to create the feature matrix.

```python
orig_comp_combo = train.merge(orig_train, on=list(test), how='left')
orig_comp_combo.index = train.index

orig_test_combo = test.merge(orig_train, on=list(test), how='left')
orig_test_combo.index = test.index
```
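- Left-joins the training dataset (`train`) to the original dataset (`orig_train`) on every column listed in `test`, that is, on all feature columns. A competition row is matched only when all of its feature values also appear in the original dataset.
- Because both frames contain `Exited`, pandas suffixes the two columns: `Exited_x` holds the competition label and `Exited_y` holds the matched original label (NaN when there is no match). The result is stored in `orig_comp_combo`, and its index is set to match the index of the training dataset.

- The same join is performed for the test dataset, resulting in `orig_test_combo`. Since `test` has no `Exited` column, the matched original label keeps the plain name `Exited`.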
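
The effect of the left merge on column names and unmatched rows can be seen on a toy example. All three frames below are made up for illustration; only the `merge` call mirrors the code above.

```python
import pandas as pd

train_toy = pd.DataFrame({"CustomerId": [1, 2, 3], "Age": [30, 40, 50], "Exited": [0, 1, 0]})
orig_toy = pd.DataFrame({"CustomerId": [2, 9], "Age": [40, 60], "Exited": [0, 1]})
test_toy = pd.DataFrame({"CustomerId": [9, 4], "Age": [60, 20]})

keys = list(test_toy)  # feature columns shared by all three frames

# Both sides have 'Exited', so pandas adds the _x / _y suffixes.
train_combo = train_toy.merge(orig_toy, on=keys, how="left")
# The test side has no 'Exited', so the original label keeps its name.
test_combo = test_toy.merge(orig_toy, on=keys, how="left")

print(train_combo)
print(test_combo)
```

Rows without a counterpart in the original data carry NaN in the merged label column, which later code can use to tell matched from unmatched rows.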

#### **Cross-Validation Setup:**
```python
seed = 42
splits = 30
skf = StratifiedKFold(n_splits=splits, random_state=seed, shuffle=True)
```
- Sets the random seed for reproducibility (`seed`).
- Configures the Stratified K-Folds cross-validator (`skf`) with 30 splits and shuffling, which is commonly used for classification tasks.
- The stratified nature ensures that each fold maintains the same distribution of target classes as the original dataset.
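
A small check of the stratification property, using made-up labels with 20% positives and 5 folds instead of 30 to keep the output short:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

y_toy = np.array([0] * 80 + [1] * 20)   # 20% positives
X_toy = np.zeros((len(y_toy), 1))       # features are irrelevant to the split

skf_toy = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Positive rate inside each validation fold.
fold_rates = [y_toy[val_idx].mean() for _, val_idx in skf_toy.split(X_toy, y_toy)]
print(fold_rates)
```

Each validation fold keeps the overall positive rate, which matters for an imbalanced target such as churn.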

#### **TensorFlow and Random Seed Configuration:**
```python
tf.keras.utils.set_random_seed(seed)
tf.config.experimental.enable_op_determinism()
```
- Sets the random seed for TensorFlow (`set_random_seed`) to ensure reproducibility.
- Enables deterministic operations in TensorFlow (`enable_op_determinism`), further contributing to reproducibility.

This part focuses on preparing the data for machine learning modeling: combining datasets, extracting the target variable, and configuring cross-validation and random seeds so that results are consistent and reproducible during model training.

# **Part 4: Feature Engineering**

#### **Custom Transformer Functions:**

1. **Nullify:**
   ```python
   def nullify(x):
       x_copy = x.copy()
       x_copy['Balance'] = x_copy['Balance'].replace({0: np.nan})
       return x_copy

   Nullify = FunctionTransformer(nullify)
   ```
   - Replaces zero balances in the 'Balance' column with NaN, utilizing the `replace` method.
   - Implemented as a custom transformer using `FunctionTransformer`.

2. **SalaryRounder:**
   ```python
   def salary_rounder(x):
       x_copy = x.copy()
       x_copy['EstimatedSalary'] = (x_copy['EstimatedSalary'] * 100).astype(np.uint64)
       return x_copy

   SalaryRounder = FunctionTransformer(salary_rounder)
   ```
   - Multiplies the 'EstimatedSalary' column by 100 and casts it to uint64, so salaries become integer cents. The cast truncates any leftover fraction rather than rounding it.
   - Another custom transformer using `FunctionTransformer`.

3. **AgeRounder:**
   ```python
   def age_rounder(x):
       x_copy = x.copy()
       x_copy['Age'] = (x_copy['Age'] * 10).astype(np.uint16)
       return x_copy

   AgeRounder = FunctionTransformer(age_rounder)
   ```
   - Multiplies the 'Age' column by 10 and converts it to uint16.
   - A custom transformer similar to the previous ones.

4. **BalanceRounder:**
   ```python
   def balance_rounder(x):
       x_copy = x.copy()
       x_copy['Balance'] = (x_copy['Balance'] * 100).astype(np.uint64)
       return x_copy

   BalanceRounder = FunctionTransformer(balance_rounder)
   ```
   - Multiplies the 'Balance' column by 100 and converts it to uint64.
   - Implemented as a custom transformer using `FunctionTransformer`.

5. **FeatureGenerator:**
   ```python
   def feature_generator(x):
       x_copy = x.copy()
       # x_copy['IsSenior'] = (x_copy['Age'] >= 600).astype(np.uint8)
       x_copy['IsActive_by_CreditCard'] = x_copy['HasCrCard'] * x_copy['IsActiveMember']
       x_copy['Products_Per_Tenure'] = x_copy['Tenure'] / x_copy['NumOfProducts']
       x_copy['ZeroBalance'] = (x_copy['Balance'] == 0).astype(np.uint8)
       x_copy['AgeCat'] = np.round(x_copy.Age / 20).astype(np.uint16)
       x_copy['AllCat'] = x_copy['Surname'] + x_copy['Geography'] + x_copy['Gender'] + x_copy.EstimatedSalary.astype(
           'str') + x_copy.CreditScore.astype('str') + x_copy.Age.astype('str') + x_copy.NumOfProducts.astype(
           'str') + x_copy.Tenure.astype('str') + x_copy.CustomerId.astype('str')

       return x_copy

   FeatureGenerator = FunctionTransformer(feature_generator)
   ```
   - Generates new features based on existing columns.
   - Creates features such as 'IsActive_by_CreditCard', 'Products_Per_Tenure', 'ZeroBalance', 'AgeCat', and 'AllCat'.
   - Implemented as a custom transformer using `FunctionTransformer`.

6. **SVDRounder:**
   ```python
   def svd_rounder(x):
       x_copy = x.copy()
       for col in [column for column in list(x) if 'SVD' in column]:
           x_copy[col] = (x_copy[col] * 1e18).astype(np.int64)

       return x_copy

   SVDRounder = FunctionTransformer(svd_rounder)
   ```
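   - Scales the values in every column whose name contains 'SVD' by 1e18 and casts them to int64. The cast truncates the fractional part, and since int64 tops out near 9.22e18, this assumes the SVD components stay below roughly 9.2 in magnitude.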
   - Custom transformer created with `FunctionTransformer`.

These custom transformers are designed to preprocess and engineer features in the dataset. They encapsulate specific operations, making it easier to apply them consistently and modularly within the data preparation pipeline.
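
One detail of the "rounder" transformers is that `astype` on a float column truncates instead of rounding. The sketch below contrasts the two behaviours; `to_cents_truncated` follows the pattern used above, while `to_cents_rounded` is a hypothetical alternative and not something the notebook uses.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

def to_cents_truncated(x):
    # Same pattern as salary_rounder: scale, then cast (the cast truncates).
    x_copy = x.copy()
    x_copy["EstimatedSalary"] = (x_copy["EstimatedSalary"] * 100).astype(np.uint64)
    return x_copy

def to_cents_rounded(x):
    # Hypothetical variant: round to the nearest integer before casting.
    x_copy = x.copy()
    x_copy["EstimatedSalary"] = np.round(x_copy["EstimatedSalary"] * 100).astype(np.uint64)
    return x_copy

# 0.29 * 100 evaluates to 28.999999999999996 in floating point.
toy = pd.DataFrame({"EstimatedSalary": [0.29, 1500.25]})
truncated = FunctionTransformer(to_cents_truncated).fit_transform(toy)
rounded = FunctionTransformer(to_cents_rounded).fit_transform(toy)

print(truncated["EstimatedSalary"].tolist())
print(rounded["EstimatedSalary"].tolist())
```

Whether the off-by-one difference matters depends on how the column is used downstream; for a column that is later treated as a category, it changes which rows share a value.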

> 1. **IsActive_by_CreditCard:**
>    - **Logic:** Multiplying 'HasCrCard' (whether the customer has a credit card) by 'IsActiveMember' (whether the customer is an active member).
>    - **Purpose:** This feature captures the interaction between having a credit card and being an active member, potentially revealing patterns specific to this combination.
> 
> 2. **Products_Per_Tenure:**
>    - **Logic:** Dividing 'Tenure' (the number of years the customer has been with the bank) by 'NumOfProducts' (the number of bank products the customer uses). Despite its name, the value is tenure per product, not products per tenure.
>    - **Purpose:** This feature normalizes the tenure by the number of products, giving a rough measure of how many years of tenure there are per product held.
> 
> 3. **ZeroBalance:**
>    - **Logic:** Creating a binary indicator for whether the 'Balance' is zero.
>    - **Purpose:** This binary feature identifies customers with zero balances, which might have distinct behaviors or characteristics.
> 
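> 4. **AgeCat:**
>    - **Logic:** Dividing 'Age' by 20 and rounding to the nearest integer to assign the customer to an age category. In the pipelines shown later, `AgeRounder` runs first and multiplies 'Age' by 10, so each category spans about two years of actual age.
>    - **Purpose:** This discretizes the age into categories, potentially capturing non-linear relationships with the target variable.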
> 
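> 5. **AllCat:**
>    - **Logic:** Concatenating 'Surname', 'Geography', 'Gender', and the string forms of 'EstimatedSalary', 'CreditScore', 'Age', 'NumOfProducts', 'Tenure', and 'CustomerId' into one string.
>    - **Purpose:** This feature combines information from multiple columns into a single categorical feature. Because it includes 'CustomerId' and 'Surname', the string acts much like a fingerprint of the whole record, so identical strings indicate repeated records.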
> 
> These feature engineering techniques aim to uncover hidden patterns, relationships, or variations in the data that might be valuable for predictive modeling. It's an iterative process where the creation of features is guided by domain knowledge, intuition, and experimentation.
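
The numeric features described above can be reproduced on a few made-up rows. The values are assumptions chosen for illustration; 'Age' is given already multiplied by 10, as it would be after `AgeRounder`.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Age": [340, 420, 580],          # ages 34, 42, 58 after the x10 scaling
    "HasCrCard": [1, 1, 0],
    "IsActiveMember": [1, 0, 1],
    "Tenure": [4, 9, 0],
    "NumOfProducts": [2, 1, 3],
    "Balance": [0.0, 1200.5, 0.0],
})

# Interaction: 1 only when the customer has a card AND is active.
toy["IsActive_by_CreditCard"] = toy["HasCrCard"] * toy["IsActiveMember"]
# Ratio: years of tenure per product held.
toy["Products_Per_Tenure"] = toy["Tenure"] / toy["NumOfProducts"]
# Indicator for an empty account.
toy["ZeroBalance"] = (toy["Balance"] == 0).astype(np.uint8)
# Binning: scaled age divided by 20, i.e. actual age divided by 2.
toy["AgeCat"] = np.round(toy["Age"] / 20).astype(np.uint16)

print(toy[["IsActive_by_CreditCard", "Products_Per_Tenure", "ZeroBalance", "AgeCat"]])
```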

# **Part 5: Feature Engineering (Custom Transformer Classes)**

#### 5.1 FeatureDropper:
```python
class FeatureDropper(BaseEstimator, TransformerMixin):
    def __init__(self, cols):
        self.cols = cols

    def fit(self, x, y):
        return self

    def transform(self, x):
        return x.drop(self.cols, axis=1)
```
- **Logic:** Drops the columns listed in `cols` from the input DataFrame `x`.
- **Purpose:** Provides a pipeline step for removing unnecessary or unwanted columns during data preprocessing.
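
The class above requires `y` in `fit`, which works inside a supervised pipeline but fails if the transformer is fitted on features alone. A minimal sketch of a variant with an optional `y` is shown below; the class name `ColumnDropper` and the toy data are hypothetical.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import make_pipeline

class ColumnDropper(BaseEstimator, TransformerMixin):
    """Hypothetical variant of FeatureDropper whose fit accepts an optional y."""

    def __init__(self, cols):
        self.cols = cols

    def fit(self, x, y=None):
        return self  # stateless: nothing to learn

    def transform(self, x):
        return x.drop(self.cols, axis=1)

toy = pd.DataFrame({"CustomerId": [1, 2], "Surname": ["Okafor", "Li"], "Age": [33, 41]})
pipe = make_pipeline(ColumnDropper(cols=["CustomerId", "Surname"]))
reduced = pipe.fit_transform(toy)  # no y needed

print(list(reduced.columns))
```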

#### 5.2 Categorizer:
```python
class Categorizer(BaseEstimator, TransformerMixin):
    def __init__(self, cols: list):
        self.cols = cols

    def fit(self, x, y):
        return self

    def transform(self, x):
        return x.astype({cat: 'category' for cat in self.cols})
```
- **Logic:** Converts the columns listed in `cols` to the pandas 'category' data type.
- **Purpose:** Useful for models that can consume the 'category' dtype directly, such as LightGBM or CatBoost.

#### 5.3 Vectorizer:
```python
class Vectorizer(BaseEstimator, TransformerMixin):
    def __init__(self, max_features=1000, cols=['Surname'], n_components=3):
        self.max_features = max_features
        self.cols = cols
        self.n_components = n_components

    def fit(self, x, y):
        self.vectorizer_dict = {}
        self.decomposer_dict = {}

        for col in self.cols:
            self.vectorizer_dict[col] = TfidfVectorizer(max_features=self.max_features).fit(x[col].astype(str), y)
            self.decomposer_dict[col] = TruncatedSVD(random_state=seed, n_components=self.n_components).fit(
                self.vectorizer_dict[col].transform(x[col].astype(str)), y
            )

        return self

    def transform(self, x):
        vectorized = {}

        for col in self.cols:
            vectorized[col] = self.vectorizer_dict[col].transform(x[col].astype(str))
            vectorized[col] = self.decomposer_dict[col].transform(vectorized[col])

        vectorized_df = pd.concat([pd.DataFrame(vectorized[col]).rename({
            f'truncatedsvd{i}': f'{col}SVD{i}' for i in range(self.n_components)
        }, axis=1) for col in self.cols], axis=1)

        return pd.concat([x.reset_index(drop=True), vectorized_df], axis=1)
```
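- **Logic:**
  - Fits a TfidfVectorizer and a TruncatedSVD for each specified column, after casting the column to strings.
  - Transforms each column into `n_components` dense features and appends them to the input as `{col}SVD{i}` columns.
  - The rename step expects the SVD output columns to be called `truncatedsvd0`, `truncatedsvd1`, and so on. scikit-learn uses these names when pandas output is enabled (for example via `set_config(transform_output='pandas')`), which is presumably part of the notebook's settings that are not shown here.
- **Purpose:** This class is designed for columns treated as text (like 'Surname'). It converts the text into numerical features using TF-IDF and reduces dimensionality with TruncatedSVD, a combination that is common in Natural Language Processing (NLP) tasks.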
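
The TF-IDF plus TruncatedSVD step can be tried on a handful of made-up surnames. This is a simplified sketch of the idea rather than the `Vectorizer` class itself: it handles one column, uses 2 components, and names the output columns explicitly.

```python
import pandas as pd
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

surnames = pd.Series(["Okafor", "Li", "Okafor", "Rossi", "Li", "Nguyen"])

# With the default tokenizer each single-word surname becomes one token,
# so the TF-IDF matrix has one column per distinct surname.
tfidf = TfidfVectorizer(max_features=1000)
sparse = tfidf.fit_transform(surnames.astype(str))

# Compress the sparse matrix to a few dense components.
svd = TruncatedSVD(n_components=2, random_state=42)
dense = svd.fit_transform(sparse)

svd_df = pd.DataFrame(dense, columns=[f"SurnameSVD{i}" for i in range(2)])
print(sparse.shape, svd_df.shape)
```

For single-token values the TF-IDF matrix is close to a one-hot encoding, so the SVD components act as a low-dimensional summary of which value a row holds.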

# **Part 6: Model Cross-Validation:**
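The notebook's `cross_val_score` helper is broken down step by step below. It shares its name with `sklearn.model_selection.cross_val_score` but has a different signature and return value.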

#### 6.1 `cross_val_score` Function:

```python
def cross_val_score(estimator, cv=skf, label='', include_original=True, show_importance=False, add_reverse=False):
    X = train.copy()
    y = X.pop('Exited')
```
- **Explanation:**
  - `cross_val_score` is a function that performs cross-validation for a given machine learning model (`estimator`).
  - `cv=skf` sets the default cross-validation strategy to Stratified K-Fold (`skf` is defined earlier).
  - `label=''` is a parameter for labeling purposes.
  - `include_original`, `show_importance`, and `add_reverse` are optional parameters for including the original dataset, showing feature importance, and augmenting the training set by reversing, respectively.
  - `X` is a copy of the training data, and `y` is the target variable (`Exited` column).

```python
    # initiate prediction arrays and score lists
    val_predictions = np.zeros((len(X)))
    train_scores, val_scores = [], []
    feature_importances_table = pd.DataFrame({'value': 0}, index=list(X.columns))
    test_predictions = np.zeros((len(test)))
```
- **Explanation:**
  - Initializes arrays and lists to store predictions and scores during cross-validation.
  - `val_predictions`: An array to store validation set predictions.
  - `train_scores` and `val_scores`: Lists to store training and validation ROC AUC scores.
  - `feature_importances_table`: A DataFrame to store feature importance values (initialized with zeros).
  - `test_predictions`: An array to store final test set predictions.

```python
    # training model, predicting prognosis probability, and evaluating metrics
    for fold, (train_idx, val_idx) in enumerate(cv.split(X, y)):
```
- **Explanation:**
  - Initiates a loop for each fold in the cross-validation.
  - `fold`: Current fold index.
  - `train_idx` and `val_idx`: Indices for the training and validation sets for the current fold.

```python
        model = clone(estimator)
```
- **Explanation:**
  - Creates a clone of the original machine learning model (`estimator`) for the current fold.

```python
        # define train set
        X_train = X.iloc[train_idx].reset_index(drop=True)
        y_train = y.iloc[train_idx].reset_index(drop=True)
```
- **Explanation:**
  - Defines the training set using the current fold indices and resets the index for consistency.

```python
        # define validation set
        X_val = X.iloc[val_idx].reset_index(drop=True)
        y_val = y.iloc[val_idx].reset_index(drop=True)
```
- **Explanation:**
  - Defines the validation set using the current fold indices and resets the index for consistency.

```python
        if include_original:
            X_train = pd.concat([orig_train.drop('Exited', axis=1), X_train]).reset_index(drop=True)
            y_train = pd.concat([orig_train.Exited, y_train]).reset_index(drop=True)
```
- **Explanation:**
  - Optionally includes the original dataset in the training set if `include_original` is True.

```python
        if add_reverse:
            X_train = pd.concat([X_train, X_train.iloc[::-1]]).reset_index(drop=True)
            y_train = pd.concat([y_train, y_train.iloc[::-1]]).reset_index(drop=True)
```
- **Explanation:**
  - Optionally appends a reversed copy of the training set if `add_reverse` is True. This duplicates every row in the opposite order rather than adding new information.

```python
        # train model
        model.fit(X_train, y_train)
```
- **Explanation:**
  - Trains the machine learning model on the defined training set.

```python
        # make predictions
        train_preds = model.predict_proba(X_train)[:, 1]
        val_preds = model.predict_proba(X_val)[:, 1]
```
- **Explanation:**
  - Generates predictions (probability of positive class) for the training and validation sets.

```python
        val_predictions[val_idx] += val_preds
        test_predictions += model.predict_proba(test)[:, 1] / cv.get_n_splits()
```
- **Explanation:**
  - Accumulates the validation set predictions and averages the test set predictions over folds.
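
The bookkeeping for out-of-fold and test predictions can be isolated in a short loop. The sketch below uses synthetic data and a plain logistic regression as stand-ins; the variable names are assumptions and the notebook's preprocessing is omitted.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

X_all, y_all = make_classification(n_samples=300, n_features=5, random_state=42)
X_tr, y_tr, X_te = X_all[:200], y_all[:200], X_all[200:]

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
base = LogisticRegression(max_iter=1000)

oof = np.zeros(len(X_tr))                        # one prediction per training row
test_pred = np.zeros(len(X_te))                  # running average over folds
times_validated = np.zeros(len(X_tr), dtype=int)

for train_idx, val_idx in cv.split(X_tr, y_tr):
    model = clone(base).fit(X_tr[train_idx], y_tr[train_idx])
    # Each row lands in exactly one validation fold, so += fills it once.
    oof[val_idx] += model.predict_proba(X_tr[val_idx])[:, 1]
    # Every fold model contributes 1/n_splits of the final test prediction.
    test_pred += model.predict_proba(X_te)[:, 1] / cv.get_n_splits()
    times_validated[val_idx] += 1

print(times_validated.min(), times_validated.max())
```

The out-of-fold array is what later stages (such as blending) can use as an honest estimate of how the model behaves on unseen rows.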

```python
        if show_importance:
            feature_importances_table['value'] += permutation_importance(model, X_val, y_val, random_state=seed,
                                                                         scoring=make_scorer(roc_auc_score,
                                                                                             needs_proba=True),
                                                                         n_repeats=5).importances_mean / cv.get_n_splits()
```
- **Explanation:**
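  - Optionally accumulates permutation importances if `show_importance` is True: the drop in validation ROC AUC when a feature is shuffled, averaged over 5 repeats and over the folds.
  - `needs_proba=True` was deprecated in scikit-learn 1.4 in favour of `response_method`, so the scorer call may need updating on recent versions.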
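
A version-independent way to request the same metric is to pass the scorer name as a string. The sketch below assumes synthetic data and a logistic regression in place of the notebook's pipelines.

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

X_toy, y_toy = make_classification(n_samples=300, n_features=6, n_informative=3, random_state=42)
X_fit, y_fit = X_toy[:200], y_toy[:200]
X_val, y_val = X_toy[200:], y_toy[200:]

model = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)

# 'roc_auc' is a built-in scorer name, so no make_scorer call is needed.
result = permutation_importance(
    model, X_val, y_val, scoring="roc_auc", n_repeats=5, random_state=42
)
print(result.importances_mean)  # one value per feature
```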

```python
        # evaluate model for a fold
        train_score = roc_auc_score(y_train, train_preds)
        val_score = roc_auc_score(y_val, val_preds)
```
- **Explanation:**
  - Evaluates the ROC AUC scores for the training and validation sets.

```python
        # append model score for a fold to list
        train_scores.append(train_score)
        val_scores.append(val_score)
```
- **Explanation:**
  - Appends the ROC AUC scores to the corresponding lists.

```python
    if show_importance:
        plt.figure(figsize=(20, 30))
        plt.title(f'Features with Biggest Importance of {np.mean(val_scores):.5f} ± {np.std(val_scores):.5f} Model',
                  size=25, weight='bold')
        sns.barplot(feature_importances_table.sort_values('value', ascending=False).T, orient='h', palette='viridis')
        plt.show()
    else:
        print(
            f'Val Score: {np.mean(val_scores):.5f} ± {np.std(val_scores):.5f} | Train Score: {np.mean(train_scores):.5f} ± {np.std(train_scores):.5f} | {label}')
```
- **Explanation:**
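  - If `show_importance` is True, displays a bar plot of the accumulated feature importances, with the mean and standard deviation of the validation ROC AUC in the title.
  - Otherwise, prints the mean and standard deviation of the ROC AUC scores for the training and validation sets, followed by `label`.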

```python
    val_predictions = np.where(orig_comp_combo.Exited_y == 1, 0,
                               np.where(orig_comp_combo.Exited_y == 0, 1, val_predictions))
    test_predictions = np.where(orig_test_combo.Exited == 1, 0,
                                np.where(orig_test_combo.Exited == 0, 1, test_predictions))
```
- **Explanation:**
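  - Overrides the model's predictions for rows that matched a record in the original dataset: where the matched original label is 1 the prediction is set to 0, and where it is 0 the prediction is set to 1. Rows without a match (NaN) keep the model's prediction.
  - In other words, the code assumes that a competition row duplicating an original record carries the opposite label. The override is applied after `val_scores` are computed, so the reported fold scores do not reflect it.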
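
The nested `np.where` can be traced on a few made-up values. `matched_label` stands in for `orig_comp_combo.Exited_y`, and `model_pred` for the model's probabilities; both are assumptions for illustration.

```python
import numpy as np
import pandas as pd

matched_label = pd.Series([1.0, 0.0, np.nan, np.nan])  # NaN = no match in the original data
model_pred = np.array([0.70, 0.20, 0.40, 0.90])

# Matched rows get the inverse of the original label; unmatched rows are untouched.
final_pred = np.where(matched_label == 1, 0,
                      np.where(matched_label == 0, 1, model_pred))

print(final_pred)
```

Comparisons with NaN are always False, which is why unmatched rows fall through both conditions and keep the model output.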

```python
    return val_scores, val_predictions, test_predictions
```
- **Explanation:**
  - Returns the validation scores, predictions, and test predictions for analysis.

### Overall Purpose:
This function performs a robust cross-validation for a given machine learning model, providing evaluation scores, predictions, and optional feature importance analysis. It incorporates various parameters for flexibility in the training process. The augmented training sets, inclusion of the original dataset, and feature importance analysis contribute to a thorough evaluation of the model.

# **Part 7: Logistic Regression Model**

##### Model Pipeline:
```python
Log = make_pipeline(
    SalaryRounder,
    AgeRounder,
    FeatureGenerator,
    Vectorizer(cols=['Surname', 'AllCat', 'EstimatedSalary', 'CreditScore'], max_features=500, n_components=4),
    CatBoostEncoder(cols=cat_features + [f'SurnameSVD{i}' for i in range(4)]),
    StandardScaler(),
    LogisticRegression(random_state=seed, max_iter=1000000000)
)
```

##### Explanation:
1. **SalaryRounder, AgeRounder, FeatureGenerator:**
   - `SalaryRounder`: Rounds the 'EstimatedSalary' feature.
   - `AgeRounder`: Rounds the 'Age' feature.
   - `FeatureGenerator`: Generates additional features based on the existing ones, such as interaction features.

2. **Vectorizer:**
   - Uses TF-IDF vectorization followed by Truncated Singular Value Decomposition (SVD) to turn string-cast columns into a few dense numeric components.
   - 'Surname', 'AllCat', 'EstimatedSalary', and 'CreditScore' are each vectorized with at most 500 TF-IDF features and reduced to 4 components.

3. **CatBoostEncoder:**
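   - Applies CatBoost encoding to the columns in `cat_features` and to the four `SurnameSVD` components. `cat_features` is not defined in the excerpts shown; it presumably lists the categorical columns.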
   - CatBoost encoding is a method to encode categorical features based on the target variable.

4. **StandardScaler:**
   - Standardizes the numerical features to have a mean of 0 and a standard deviation of 1.
   - Puts all features on a comparable scale, which helps the solver converge and keeps large-valued features from dominating the regularization penalty.

5. **LogisticRegression:**
   - Implements logistic regression as the final classifier.
   - Uses the standard logistic regression algorithm with a large number of iterations (`max_iter=1000000000`) to ensure convergence.
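
The last two steps of the pipeline, scaling followed by logistic regression, can be sketched on synthetic data. The feature values below are made up to mimic columns on very different scales; the encoders and custom transformers are left out.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Two features on very different scales (think credit score vs. salary in cents).
X_toy = np.column_stack([rng.normal(650, 100, 500), rng.normal(1e7, 5e6, 500)])
y_toy = (X_toy[:, 0] > 650).astype(int)

# After scaling, every column has mean 0 and standard deviation 1.
scaled = StandardScaler().fit_transform(X_toy)
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X_toy, y_toy)
proba = clf.predict_proba(X_toy)[:, 1]
```

Keeping the scaler inside the pipeline means it is refitted on each training fold, so validation rows never influence the scaling statistics.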

##### Cross-validation and Training:
```python
_, oof_list['Log'], predict_list['Log'] = cross_val_score(Log)
```
- Calls the `cross_val_score` function to perform cross-validation for the Logistic Regression model.
- Discards the per-fold validation scores (`_`), stores the out-of-fold validation predictions in `oof_list['Log']`, and stores the averaged test predictions in `predict_list['Log']`.

##### Overall Purpose:
This logistic regression model is part of a pipeline that includes feature engineering, vectorization, encoding, scaling, and logistic regression training. The pipeline is designed for robust performance on a binary classification task. The feature engineering and encoding steps aim to capture meaningful patterns in the data, while logistic regression serves as the final classifier. The model is evaluated using cross-validation to assess its generalization performance.

**Explanation:**

- SalaryRounder, AgeRounder: Preprocessing steps that convert specific columns to scaled integers.
- FeatureGenerator: Adds the engineered features described in Part 4.
- Vectorizer: Converts the selected columns into numerical vectors using TF-IDF, then reduces dimensionality with TruncatedSVD.
- CatBoostEncoder: Encodes categorical features using CatBoost encoding.
- StandardScaler: Standardizes the feature values.
- LogisticRegression: Implements logistic regression for binary classification.

# **Part 8: TensorFlow Model**

##### Model Definition:
```python
class TensorFlower(BaseEstimator, ClassifierMixin):

    def fit(self, x, y):
        # Model Architecture
        inputs = tf.keras.Input((x.shape[1],))  # shape as a one-element tuple: one input per feature
        inputs_norm = tf.keras.layers.BatchNormalization()(inputs)

        z = tf.keras.layers.Dense(32)(inputs_norm)
        z = tf.keras.layers.BatchNormalization()(z)
        z = tf.keras.layers.LeakyReLU()(z)

        z = tf.keras.layers.Dense(64)(z)
        z = tf.keras.layers.BatchNormalization()(z)
        z = tf.keras.layers.LeakyReLU()(z)

        z = tf.keras.layers.Dense(16)(z)
        z = tf.keras.layers.BatchNormalization()(z)
        z = tf.keras.layers.LeakyReLU()(z)

        z = tf.keras.layers.Dense(4)(z)
        z = tf.keras.layers.BatchNormalization()(z)
        z = tf.keras.layers.LeakyReLU()(z)

        z = tf.keras.layers.Dense(1)(z)
        z = tf.keras.layers.BatchNormalization()(z)
        outputs = tf.keras.activations.sigmoid(z)

        # Model Compilation
        self.model = tf.keras.Model(inputs, outputs)
        self.model.compile(loss='binary_crossentropy', optimizer=tf.keras.optimizers.AdamW(1e-4))

        # Model Training
        self.model.fit(x.to_numpy(), y, epochs=10, verbose=0)
        self.classes_ = np.unique(y)

        return self

    def predict_proba(self, x):
        predictions = np.zeros((len(x), 2))
        predictions[:, 1] = self.model.predict(x, verbose=0)[:, 0]
        predictions[:, 0] = 1 - predictions[:, 1]
        return predictions

    def predict(self, x):
        return np.argmax(self.predict_proba(x), axis=1)
```

##### Model Architecture:
1. **Input Layer:**
   - `tf.keras.Input(...)`: Defines the input layer with one input per feature column of the input data (`x.shape[1]`).

2. **Normalization Layer:**
   - `inputs_norm = tf.keras.layers.BatchNormalization()(inputs)`: Applies batch normalization to normalize the input features.

3. **Dense Layers with Leaky ReLU Activation:**
   - Several densely connected layers (`Dense`) with different numbers of units and batch normalization.
   - Each dense layer is followed by a Leaky Rectified Linear Unit (Leaky ReLU) activation function.

4. **Output Layer:**
   - `z = tf.keras.layers.Dense(1)(z)`: Final dense layer with a single unit, representing the binary classification output.
   - Batch normalization and a sigmoid activation function are applied to produce probabilities.

##### Model Compilation and Training:
- **Compilation:**
  - `self.model.compile(loss='binary_crossentropy', optimizer=tf.keras.optimizers.AdamW(1e-4))`: Compiles the model with binary cross-entropy loss and the AdamW optimizer.

- **Training:**
  - `self.model.fit(x.to_numpy(), y, epochs=10, verbose=0)`: Trains the model on the input data (`x`) and labels (`y`) for 10 epochs.

##### Prediction Functions:
1. **Predict Probabilities:**
   - `def predict_proba(self, x)`: Returns the predicted probabilities for each class in a 2D array.

2. **Predict Class Labels:**
   - `def predict(self, x)`: Returns the predicted class labels by selecting the class with the highest probability.
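
The shape handling in `predict_proba` and `predict` can be checked without TensorFlow. In the sketch below, `sigmoid_out` is a made-up stand-in for the `(n, 1)` array returned by the Keras model, and `to_two_column_proba` is a hypothetical helper that mirrors the method body.

```python
import numpy as np

def to_two_column_proba(positive_proba):
    """Turn an (n, 1) array of P(class 1) into an (n, 2) array [P(class 0), P(class 1)]."""
    predictions = np.zeros((len(positive_proba), 2))
    predictions[:, 1] = positive_proba[:, 0]
    predictions[:, 0] = 1 - predictions[:, 1]
    return predictions

sigmoid_out = np.array([[0.1], [0.8], [0.5]])   # stand-in for model.predict(x)
proba = to_two_column_proba(sigmoid_out)
labels = np.argmax(proba, axis=1)               # ties resolve to class 0

print(proba)
print(labels)
```

The two-column layout is what scikit-learn tools expect from `predict_proba`, which is why the helper `cross_val_score` can slice `[:, 1]` for every model.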

##### Model Pipeline:
```python
TensorFlowey = make_pipeline(
    SalaryRounder,
    AgeRounder,
    FeatureGenerator,
    CatBoostEncoder(cols=cat_features),
    TensorFlower()
)
```

- Incorporates the `TensorFlower` model into a scikit-learn pipeline.
- Preprocesses features using the `SalaryRounder`, `AgeRounder`, and `FeatureGenerator`.
- Encodes categorical features using CatBoostEncoder.
- Applies the TensorFlow model (`TensorFlower`) for training and prediction.

##### Cross-validation and Training:
```python
_, oof_list['TF'], predict_list['TF'] = cross_val_score(TensorFlowey)
```
- Calls the `cross_val_score` function to perform cross-validation for the TensorFlow model.
- Discards the per-fold validation scores (`_`), stores the out-of-fold validation predictions in `oof_list['TF']`, and stores the averaged test predictions in `predict_list['TF']`.

This TensorFlow model is a simple feedforward neural network designed for binary classification, and it's integrated into a scikit-learn compatible pipeline for ease of use and integration with other models.

**Explanation:**

- TensorFlower class: Custom scikit-learn compatible classifier that builds a Keras model with the Functional API from BatchNormalization, Dense, and LeakyReLU layers.
- TensorFlowey: A pipeline that combines preprocessing with the custom TensorFlow model.
- cross_val_score: Cross-validates and evaluates the TensorFlow model.

# **Part 9: XGBoost Model**

##### Objective Function for Optuna:
```python
def xgb_objective(trial):
    # Hyperparameter search space
    params = {
        'eta': trial.suggest_float('eta', .001, .3, log=True),
        'max_depth': trial.suggest_int('max_depth', 2, 30),
        'subsample': trial.suggest_float('subsample', .5, 1),
        'colsample_bytree': trial.suggest_float('colsample_bytree', .1, 1),
        'min_child_weight': trial.suggest_float('min_child_weight', .1, 20, log=True),
        'reg_lambda': trial.suggest_float('reg_lambda', .01, 20, log=True),
        'reg_alpha': trial.suggest_float('reg_alpha', .01, 10, log=True),
        'n_estimators': 1000,
        'random_state': seed,
        'tree_method': 'hist',
    }

    # XGBoost model within an Optuna pipeline
    optuna_model = make_pipeline(
        SalaryRounder,
        AgeRounder,
        FeatureGenerator,
        Vectorizer(cols=['Surname', 'AllCat', 'EstimatedSalary', 'CustomerId'], max_features=1000, n_components=3),
        CatBoostEncoder(cols=['CustomerId', 'Surname', 'EstimatedSalary', 'AllCat', 'CreditScore']),
        MEstimateEncoder(cols=['Geography', 'Gender']),
        XGBClassifier(**params)
    )

    # Cross-validation and scoring
    optuna_score, _, _ = cross_val_score(optuna_model)

    return np.mean(optuna_score)
```

- Defines an objective function for Optuna hyperparameter optimization.
- Searches for optimal hyperparameters for XGBoost using Optuna.
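
Several parameters in the search space use `log=True`, which samples uniformly on a logarithmic scale. The NumPy sketch below illustrates the effect for the `eta` bounds; it is an illustration of the sampling idea, not Optuna's internal implementation.

```python
import numpy as np

rng = np.random.default_rng(42)
low, high = 0.001, 0.3   # same bounds as the 'eta' range above

# Uniform on the raw scale vs. uniform on the log scale.
linear_draws = rng.uniform(low, high, size=10_000)
log_draws = np.exp(rng.uniform(np.log(low), np.log(high), size=10_000))

# Share of draws below 0.01 under each scheme.
print((linear_draws < 0.01).mean(), (log_draws < 0.01).mean())
```

Log-scale sampling spends far more trials on small values, which suits parameters such as learning rates and regularization strengths whose effect varies by order of magnitude.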

##### Optuna Study:
```python
xgb_study = optuna.create_study(direction='maximize')

# xgb_study.optimize(xgb_objective, 50)
xgb_params = {'eta': 0.04007938900538817, 'max_depth': 5, 'subsample': 0.8858539721226424,
              'colsample_bytree': 0.41689519430449395, 'min_child_weight': 0.4225662401139526,
              'reg_lambda': 1.7610231110037127, 'reg_alpha': 1.993860687732973}
```

- Creates an Optuna study for hyperparameter optimization with a maximization objective.
- Optionally, performs hyperparameter optimization (commented out in the code).

##### XGBoost Model Pipeline:
```python
XGB = make_pipeline(
    SalaryRounder,
    AgeRounder,
    FeatureGenerator,
    Vectorizer(cols=['Surname', 'AllCat', 'EstimatedSalary', 'CustomerId'], max_features=1000, n_components=3),
    CatBoostEncoder(cols=['CustomerId', 'Surname', 'EstimatedSalary', 'AllCat', 'CreditScore']),
    MEstimateEncoder(cols=['Geography', 'Gender']),
    XGBClassifier(**xgb_params, random_state=seed, tree_method='hist', n_estimators=1000)
)
```

- Defines the XGBoost model pipeline using the hard-coded `xgb_params` together with the fixed settings `random_state=seed`, `tree_method='hist'`, and `n_estimators=1000`.
- Preprocesses features, encodes categorical features, and applies the XGBoost classifier.

##### Cross-validation and Training:
```python
_, oof_list['XGB'], predict_list['XGB'] = cross_val_score(XGB, show_importance=False)
```

- Calls the notebook's `cross_val_score` helper to perform cross-validation for the XGBoost model.
- Discards the per-fold validation scores (`_`) and stores the out-of-fold validation predictions in `oof_list['XGB']` and the test predictions in `predict_list['XGB']`.

This XGBoost model is integrated into a scikit-learn compatible pipeline for consistent preprocessing and model training. Its hyperparameters come from the hard-coded `xgb_params` dictionary, or can be re-tuned by uncommenting the `xgb_study.optimize` call.
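
The notebook's `cross_val_score` helper is defined earlier in the document. The sketch below only illustrates the general pattern it appears to follow, returning per-fold scores, out-of-fold predictions, and averaged test predictions. The name `cv_with_oof`, the synthetic data, and the logistic regression model are all assumptions made for the example.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold


def cv_with_oof(model, X, y, X_test, n_splits=5, seed=0):
    """Return per-fold AUCs, out-of-fold predictions, and mean test predictions."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    oof = np.zeros(len(X))
    test_pred = np.zeros(len(X_test))
    for train_idx, valid_idx in folds.split(X, y):
        # Fit a fresh copy on the training part of the fold.
        fitted = clone(model).fit(X[train_idx], y[train_idx])
        # Each training row is predicted exactly once, by a model that never saw it.
        oof[valid_idx] = fitted.predict_proba(X[valid_idx])[:, 1]
        # Test predictions are averaged over the fold models.
        test_pred += fitted.predict_proba(X_test)[:, 1] / n_splits
        scores.append(roc_auc_score(y[valid_idx], oof[valid_idx]))
    return scores, oof, test_pred


X_all, y_all = make_classification(n_samples=600, n_features=8, random_state=0)
X_train, y_train, X_test = X_all[:500], y_all[:500], X_all[500:]
scores, oof, test_pred = cv_with_oof(
    LogisticRegression(max_iter=1000), X_train, y_train, X_test
)
print(f"mean fold AUC: {np.mean(scores):.4f}")
```

Because every out-of-fold prediction comes from a model that did not train on that row, the `oof` vector can later be used to fit ensemble weights without leaking the target.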

**Explanation:**

- `xgb_objective`: defines the optimization objective that Optuna uses to find the best hyperparameters for XGBoost.
- `XGBClassifier`: a gradient boosting classifier that uses decision trees as base learners.

# **Part 10: LightGBM Model**

##### Objective Function for Optuna:
```python
def lgb_objective(trial):
    # Hyperparameter search space
    params = {
        'learning_rate': trial.suggest_float('learning_rate', .001, .1, log=True),
        'max_depth': trial.suggest_int('max_depth', 2, 20),
        'subsample': trial.suggest_float('subsample', .5, 1),
        'min_child_weight': trial.suggest_float('min_child_weight', .1, 15, log=True),
        'reg_lambda': trial.suggest_float('reg_lambda', .1, 20, log=True),
        'reg_alpha': trial.suggest_float('reg_alpha', .1, 10, log=True),
        'n_estimators': 1000,
        'random_state': seed,
        # 'boosting_type' : 'dart',
    }

    # LightGBM model within an Optuna pipeline
    optuna_model = make_pipeline(
        SalaryRounder,
        AgeRounder,
        FeatureGenerator,
        Vectorizer(cols=['Surname', 'AllCat'], max_features=1000, n_components=3),
        CatBoostEncoder(cols=['Surname', 'AllCat', 'CreditScore', 'Age']),
        MEstimateEncoder(cols=['Geography', 'Gender', 'NumOfProducts']),
        StandardScaler(),
        LGBMClassifier(**params)
    )

    # Cross-validation and scoring
    optuna_score, _, _ = cross_val_score(optuna_model)

    return np.mean(optuna_score)
```

- Defines an objective function for Optuna hyperparameter optimization.
- Searches for optimal hyperparameters for LightGBM using Optuna.

##### Optuna Study:
```python
lgb_study = optuna.create_study(direction='maximize')

# lgb_study.optimize(lgb_objective, 100)
lgb_params = {'learning_rate': 0.01864960338160943, 'max_depth': 9, 'subsample': 0.6876252164703066,
              'min_child_weight': 0.8117588782708633, 'reg_lambda': 6.479178739677389, 'reg_alpha': 3.2952573115561234}
```

- Creates an Optuna study for hyperparameter optimization with a maximization objective.
- Optionally, performs hyperparameter optimization (commented out in the code).

##### LightGBM Model Pipeline:
```python
LGB = make_pipeline(
    SalaryRounder,
    AgeRounder,
    FeatureGenerator,
    Vectorizer(cols=['Surname', 'AllCat'], max_features=1000, n_components=3),
    CatBoostEncoder(cols=['Surname', 'AllCat', 'CreditScore', 'Age']),
    MEstimateEncoder(cols=['Geography', 'Gender', 'NumOfProducts']),
    StandardScaler(),
    LGBMClassifier(**lgb_params, random_state=seed, n_estimators=1000)
)
```

- Defines the LightGBM model pipeline using the hard-coded `lgb_params` together with the fixed settings `random_state=seed` and `n_estimators=1000`.
- Preprocesses features, encodes categorical features, and applies the LightGBM classifier.

##### Cross-validation and Training:
```python
_, oof_list['LGB'], predict_list['LGB'] = cross_val_score(LGB, show_importance=False)
```

- Calls the notebook's `cross_val_score` helper to perform cross-validation for the LightGBM model.
- Discards the per-fold validation scores (`_`) and stores the out-of-fold validation predictions in `oof_list['LGB']` and the test predictions in `predict_list['LGB']`.

This LightGBM model is integrated into a scikit-learn compatible pipeline for consistent preprocessing and model training. Its hyperparameters come from the hard-coded `lgb_params` dictionary, or can be re-tuned by uncommenting the `lgb_study.optimize` call.

**Explanation:**

- `lgb_objective`: defines the optimization objective that Optuna uses to find the best hyperparameters for LightGBM.
- `LGBMClassifier`: a gradient boosting classifier built on tree-based learning algorithms.

# **Part 11: CatBoost Models**

In this section, three CatBoost models (`CB`, `CB_Bayes`, `CB_Bernoulli`) are each built as a scikit-learn pipeline with `make_pipeline` and then evaluated using cross-validation.

##### 1. **Basic CatBoost Model (`CB`):**
```python
CB = make_pipeline(
    SalaryRounder,
    AgeRounder,
    FeatureGenerator,
    Vectorizer(cols=['Surname', 'AllCat'], max_features=1000, n_components=4),
    SVDRounder,
    CatBoostClassifier(random_state=seed, verbose=0, cat_features=cat_features + [f'SurnameSVD{i}' for i in range(4)],
                       has_time=True)
)

_, oof_list['CB'], predict_list['CB'] = cross_val_score(CB, show_importance=False)
```

   - **Pipeline Setup:**
     - A scikit-learn `Pipeline` is created (`CB`) to organize a sequence of preprocessing and modeling steps.
     - Preprocessing steps include rounding specific numerical features (`SalaryRounder`, `AgeRounder`), generating new features (`FeatureGenerator`), vectorizing text-based features and reducing them with truncated SVD (`Vectorizer`), and post-processing the SVD components (`SVDRounder`, which by its name rounds them so they can be passed as categorical features), before finally using a CatBoostClassifier.

   - **Model Configuration:**
     - The CatBoostClassifier is configured with specific parameters:
       - `random_state`: Ensures reproducibility.
       - `verbose=0`: Suppresses CatBoost's output during training.
       - `cat_features`: Specifies categorical features, including additional features generated by truncated SVD.
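       - `has_time=True`: Tells CatBoost to use the rows in the order they are given instead of randomly permuting them when it computes ordered statistics for categorical features and selects the tree structure.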

   - **Cross-Validation:**
     - The `cross_val_score` function is used to perform cross-validation on this CatBoost model.
     - The results are stored in the `oof_list['CB']` (out-of-fold predictions) and `predict_list['CB']` (test predictions).
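
CatBoost turns categorical features into numbers with ordered target statistics: each row is encoded using only the targets of rows that come before it, which is why row order (and therefore `has_time`) matters. The sketch below is a simplified illustration of that idea. The function name `ordered_target_encode`, the smoothing parameter `a`, and the toy data are assumptions, and CatBoost's internal formula differs in detail.

```python
def ordered_target_encode(categories, targets, a=1.0):
    """Encode each row from the targets of earlier rows with the same category."""
    prior = sum(targets) / len(targets)  # global mean used as a smoothing prior
    running_sum = {}
    running_count = {}
    encoded = []
    for cat, target in zip(categories, targets):
        s = running_sum.get(cat, 0.0)
        n = running_count.get(cat, 0)
        # The current row's own target is not used, which limits target leakage.
        encoded.append((s + a * prior) / (n + a))
        running_sum[cat] = s + target
        running_count[cat] = n + 1
    return encoded


categories = ['a', 'b', 'a', 'a', 'b']
targets = [1, 0, 0, 1, 1]
encoded = ordered_target_encode(categories, targets)
print([round(v, 4) for v in encoded])  # [0.6, 0.6, 0.8, 0.5333, 0.3]
```

Reordering the rows changes the encoded values, so a fixed order gives a deterministic encoding, whereas random permutations average over several possible orders.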

##### 2. **CatBoost with Bayesian Bootstrap (`CB_Bayes`):**
```python
CB_Bayes = make_pipeline(
    SalaryRounder,
    AgeRounder,
    FeatureGenerator,
    Vectorizer(cols=['Surname', 'AllCat'], max_features=1000, n_components=4),
    SVDRounder,
    CatBoostClassifier(random_state=seed, verbose=0, cat_features=cat_features + [f'SurnameSVD{i}' for i in range(4)],
                       bootstrap_type='Bayesian', has_time=True)
)

_, oof_list['CB_Bayes'], predict_list['CB_Bayes'] = cross_val_score(CB_Bayes, show_importance=False)
```

   - **Model Variation:**
     - The second CatBoost model (`CB_Bayes`) is similar to the basic model but introduces a change in the bootstrap sampling type.
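     - `bootstrap_type='Bayesian'`: Uses Bayesian bootstrap sampling, in which every row receives a random weight in each iteration (the spread of the weights is controlled by `bagging_temperature`).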

   - **Cross-Validation:**
     - The model is evaluated using cross-validation, and the results are stored in `oof_list['CB_Bayes']` and `predict_list['CB_Bayes']`.

##### 3. **CatBoost with Bernoulli Bootstrap (`CB_Bernoulli`):**
```python
CB_Bernoulli = make_pipeline(
    SalaryRounder,
    AgeRounder,
    FeatureGenerator,
    Vectorizer(cols=['Surname', 'AllCat'], max_features=1000, n_components=4),
    SVDRounder,
    CatBoostClassifier(random_state=seed, verbose=0, cat_features=cat_features + [f'SurnameSVD{i}' for i in range(4)],
                       bootstrap_type='Bernoulli', has_time=True)
)

_, oof_list['CB_Bernoulli'], predict_list['CB_Bernoulli'] = cross_val_score(CB_Bernoulli, show_importance=False)
```

   - **Model Variation:**
     - The third CatBoost model (`CB_Bernoulli`) introduces another change in the bootstrap sampling type.
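     - `bootstrap_type='Bernoulli'`: Uses Bernoulli bootstrap sampling, in which each row is either included or excluded in an iteration with a probability set by `subsample`.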

   - **Cross-Validation:**
     - The model is evaluated using cross-validation, and the results are stored in `oof_list['CB_Bernoulli']` and `predict_list['CB_Bernoulli']`.

These three models differ only in their bootstrap sampling strategy. Their cross-validation results show how sensitive CatBoost is to that choice, and their out-of-fold predictions give the ensemble several slightly different CatBoost views of the same features.
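
The difference between the two bootstrap types can be pictured as two ways of drawing per-row sample weights. The sketch below is a rough illustration under stated assumptions: `subsample = 0.8` and `bagging_temperature = 1.0` are illustrative values, and the weight formula for the Bayesian case follows the CatBoost documentation as I understand it rather than CatBoost's actual source code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows = 1_000

# Bernoulli: each row is either kept (weight 1) or dropped (weight 0).
subsample = 0.8
bernoulli_weights = (rng.random(n_rows) < subsample).astype(float)

# Bayesian: every row stays in play but receives a random non-negative weight.
bagging_temperature = 1.0
uniform = 1.0 - rng.random(n_rows)  # values in (0, 1], which avoids log(0)
bayesian_weights = (-np.log(uniform)) ** bagging_temperature

print(f"Bernoulli: {bernoulli_weights.mean():.2f} of rows kept")
print(f"Bayesian: mean weight {bayesian_weights.mean():.2f}")
```

With a temperature of 0 every Bayesian weight becomes 1, so the temperature acts as a dial between no resampling and increasingly aggressive reweighting.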

# **Part 12: Voting Ensemble Detailed Explanation**

```python
# Voting Ensemble
weights = RidgeClassifier(random_state=seed).fit(oof_list, train.Exited).coef_[0]
weights /= weights.sum()
pd.DataFrame(weights, index=list(oof_list), columns=['weight per model'])

# _, ensemble_oof, predictions = cross_val_score(voter, show_importance = False)
print(f'Score: {(roc_auc_score(train.Exited, oof_list.to_numpy() @ weights)):.5f}')
predictions = predict_list.to_numpy() @ weights
```

**1. RidgeClassifier for Weight Learning:**
   - A `RidgeClassifier` is employed to learn the optimal weights for combining predictions from different models.
   - The classifier is trained on the out-of-fold (oof) predictions (`oof_list`) from various models, using the actual target values (`train.Exited`).
   - The `random_state=seed` ensures reproducibility.

**2. Normalization of Weights:**
   - The coefficients obtained from the trained `RidgeClassifier` represent the weights assigned to each model in the ensemble.
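   - These weights are divided by their sum so that they add up to 1. They can be read as proportions when all coefficients are positive; Ridge coefficients are not constrained to be positive, so a model can receive a negative weight.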

**3. Display Weights:**
   - The learned weights are presented in a DataFrame to provide transparency on how much influence each model has in the ensemble.

**4. Ensemble Evaluation:**
   - The ROC AUC score is calculated for the ensemble using the learned weights (`oof_list.to_numpy() @ weights`).
   - This score is printed to assess the overall performance of the ensemble.

**5. Final Test Predictions:**
   - The final predictions for the test set are obtained by multiplying the predictions from individual models (`predict_list.to_numpy()`) with their respective weights.
   - The matrix product (`@`) computes this weighted sum across models for each sample in the test set.

**Note:**
- The use of a RidgeClassifier for weight learning implies a linear combination of models, providing interpretable and stable results.
- The ensemble leverages the diversity captured by individual models, assigning each a weight learned from its out-of-fold predictions on the training data.
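
The weighting steps above can be tried end to end on synthetic data. In the sketch below the three columns of `oof` are made-up stand-ins for the out-of-fold predictions of three models with different noise levels; none of the numbers come from the notebook.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_rows = 500
y = rng.integers(0, 2, size=n_rows)

# Three hypothetical models: the same signal with increasing amounts of noise.
oof = np.column_stack([
    np.clip(0.2 + 0.6 * y + rng.normal(0, noise, n_rows), 0, 1)
    for noise in (0.2, 0.3, 0.4)
])

# Learn one coefficient per model, then rescale the coefficients to sum to 1.
weights = RidgeClassifier().fit(oof, y).coef_[0]
weights = weights / weights.sum()

# The matrix product gives one blended score per row.
blended = oof @ weights
auc = roc_auc_score(y, blended)
print(weights.round(3), f"AUC: {auc:.4f}")
```

Since ROC AUC depends only on the ranking of the scores, rescaling the weights by a positive sum does not change the reported score; it only makes the weights easier to compare.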

# <div style="background-color: lightgreen; font-size: 40px; text-align: center;"> Detailed explanation of each model </div>

### 1. **Linear Models:**

#### Logistic Regression:
   - **Explanation:**
     - Logistic Regression is a classic linear model used for binary classification problems.
     - It models the probability that an instance belongs to a particular class.
     - It's based on the logistic function, which maps any real-valued number into a range between 0 and 1.
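
As a small illustration of the logistic function, the sketch below maps a linear score to a probability. The weights, bias, and feature values are arbitrary numbers chosen for the example, and `predict_proba` here is a hypothetical helper rather than the scikit-learn method.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, features):
    """Probability of the positive class for one row of features."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


# score = 0.1 + 0.8 * 1.0 - 0.5 * 2.0 = -0.1, slightly favoring the negative class
p = predict_proba([0.8, -0.5], 0.1, [1.0, 2.0])
print(f"{p:.4f}")  # 0.4750
```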

#### RidgeClassifier:
   - **Explanation:**
     - RidgeClassifier is a linear model that uses Ridge Regression for classification.
     - Ridge Regression is an extension of linear regression with regularization to prevent overfitting.
     - It introduces a penalty term (L2 regularization) to the linear regression loss function.

### 2. **Tree-Based Models:**

#### XGBClassifier:
   - **Explanation:**
     - XGBoost is an ensemble learning method known for its speed and performance.
     - It builds decision trees one after another, with each new tree fitted to correct the errors of the trees before it, and sums their outputs to form the prediction.
     - It uses a gradient boosting framework and incorporates regularization techniques.
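
The core boosting loop can be sketched in a few lines. This is plain gradient boosting with squared-error loss on a synthetic regression problem, using scikit-learn trees; it is not XGBoost's actual algorithm, which adds second-order gradient information and regularization terms. All data and settings below are assumptions made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # start from a constant model
mse_history = [np.mean((y - prediction) ** 2)]

for _ in range(50):
    # For squared error, the negative gradient is simply the residual.
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residuals)
    # Add a shrunken copy of the new tree to the running prediction.
    prediction += learning_rate * tree.predict(X)
    mse_history.append(np.mean((y - prediction) ** 2))

print(f"MSE before: {mse_history[0]:.4f}, after: {mse_history[-1]:.4f}")
```

The `learning_rate` here plays the same role as `eta` in the XGBoost search space: smaller values need more trees but usually generalize better.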

#### LGBMClassifier:
   - **Explanation:**
     - LightGBM is a gradient boosting framework developed by Microsoft.
     - It is designed for distributed and efficient training.
     - LightGBM uses a histogram-based approach for tree building, leading to faster training times.
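
The histogram idea can be illustrated with NumPy alone: continuous feature values are bucketed once, and split candidates are then scored per bucket instead of per distinct value. The sketch below is a simplified picture, not LightGBM's actual binning code; the bin count, the quantile-based edges, and the random `gradients` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
feature = rng.normal(size=10_000)
gradients = rng.normal(size=10_000)  # stand-in for per-row loss gradients

n_bins = 16
# Interior quantile cut points split the feature into buckets of similar size.
edges = np.quantile(feature, np.linspace(0, 1, n_bins + 1)[1:-1])
bin_index = np.digitize(feature, edges)  # integers from 0 to n_bins - 1

# One pass over the data builds a gradient sum and a row count per bin.
grad_sum = np.bincount(bin_index, weights=gradients, minlength=n_bins)
counts = np.bincount(bin_index, minlength=n_bins)

# Only n_bins - 1 split positions remain to be scored, via cumulative sums.
left_grad = np.cumsum(grad_sum)[:-1]
left_count = np.cumsum(counts)[:-1]
print(len(np.unique(feature)), "distinct values reduced to", len(left_grad), "split candidates")
```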

#### CatBoostClassifier:
   - **Explanation:**
     - CatBoost is a gradient boosting library developed by Yandex, designed for categorical features.
     - It handles categorical features efficiently without the need for one-hot encoding.
     - It includes built-in support for handling missing data.

### 3. **Neural Network Model:**

#### TensorFlow (Custom Neural Network):
   - **Explanation:**
     - TensorFlow is an open-source machine learning framework developed by Google.
     - A custom neural network model is built using TensorFlow for this project.
     - It consists of multiple layers, including dense layers with batch normalization and leaky ReLU activation functions.
     - The model is trained using binary crossentropy loss and the AdamW optimizer.

### 4. **Ensemble Models:**

#### Voting Ensemble:
   - **Explanation:**
     - A Voting Ensemble combines predictions from multiple models.
     - Each model's prediction is multiplied by a learned weight, and the weighted predictions are summed.
     - The weights are learned using RidgeClassifier on out-of-fold predictions.

### Overall:
   - **Objective:**
     - The goal is to predict the target variable "Exited," indicating whether a customer will exit the bank or not.
   - **Diversity:**
     - Multiple types of models are tested to capture different patterns and relationships in the data.
   - **Evaluation:**
     - Model performance is assessed using cross-validation, and metrics like AUC-ROC score are used to evaluate the models' effectiveness.

This comprehensive approach with various model types aims to improve the predictive performance and robustness of the overall solution.

# <div style="background-color: lightblue; font-size: 40px; text-align: center;"> Conclusion: Customer Churn Prediction Project </div>

In this comprehensive customer churn prediction project, our objective was to construct an effective predictive model within a banking context. The project unfolded across three pivotal phases: data exploration, feature engineering, and model evaluation.

**1. Data Exploration:**
   - Rigorously examined the dataset, encompassing both training and testing data, to obtain a profound understanding of its features.
   - Explored customer demographics, financial indicators, and transactional behavior, extracting valuable insights.
   - Addressed data cleanliness issues by investigating data types, handling missing values, and analyzing variable distributions.

**2. Feature Engineering:**
   - Implemented diverse techniques to augment model predictability and enhance generalization capabilities.
   - Introduced innovative features such as `IsActive_by_CreditCard`, `Products_Per_Tenure`, and `ZeroBalance` to capture critical patterns.
   - Employed feature transformations, including TF-IDF vectorization and singular value decomposition (SVD) on selected categorical variables (`Surname`, `AllCat`, `EstimatedSalary`, `CreditScore`), to uncover latent patterns.

**3. Model Evaluation:**
   - Evaluated a variety of machine learning models, including Logistic Regression, a custom TensorFlow model (`TensorFlower`), XGBoost, LightGBM, and three CatBoost variants.
   - Applied cross-validation techniques to ensure robust performance assessment.
   - Achieved the highest validation ROC AUC, 0.90492, using the Voting Ensemble with Ridge Classifier weights.

**Key Findings:**
   - The logistic regression model established a strong baseline, achieving a validation score of approximately 0.8842.
   - The custom TensorFlow model (`TensorFlower`) surpassed the logistic regression model with a validation score of around 0.8923.
   - The Voting Ensemble with Ridge Classifier weights emerged as the top-performing model, achieving the highest validation score of 0.90492.

**Recommendations and Future Work:**
   - Based on our analysis, we recommend deploying the Voting Ensemble model with Ridge Classifier weights for customer churn prediction, given its outstanding performance.
   - Continuous monitoring and periodic recalibration will be crucial to ensure the model's sustained effectiveness in dynamic business environments.
   - Future work may explore additional ensemble techniques and feature engineering strategies to further refine model performance.

**In summary, this project culminates in a robust predictive model that empowers proactive identification of customers at risk of churning, providing a valuable tool for targeted retention strategies and fostering a customer-centric approach within the banking domain.**