# Data Splitting
This notebook loads the processed dataset, splits it into training and testing sets, and saves both sets for further analysis.


## Step 1: Import Libraries and Configuration
In this step, we import the necessary libraries and load the configuration settings. The configuration file (`config.json`) supplies the path to the processed dataset. The output paths for the training and testing sets are built from the project root. All paths are absolute, so each file is located unambiguously. The paths for the processed, training, and testing data are printed for verification, including the copies destined for the `Data_Preparation` folder.


```python
# Import necessary libraries
import json
import os

import pandas as pd  # type: ignore

# __file__ is not defined inside a Jupyter notebook, so resolve paths from the
# current working directory, which by default is the folder containing the notebook.
notebook_dir = os.getcwd()
project_root = os.path.dirname(notebook_dir)

# Load configuration (config.json sits one level above the notebook folder)
config_path = os.path.join(project_root, 'config.json')
print(f"Config path: {config_path}")
with open(config_path, 'r') as f:
    config = json.load(f)

# Convert relative paths to absolute paths anchored at the project root
processed_data_path = os.path.join(project_root, config['processed_data_path'])
train_path = os.path.join(project_root, 'data', 'train', 'train_dataset.csv')
test_path = os.path.join(project_root, 'data', 'test', 'test_dataset.csv')
train_path_prep = os.path.join(project_root, 'Data_Preparation', 'training_sets', 'train_dataset.csv')
test_path_prep = os.path.join(project_root, 'Data_Preparation', 'testing_sets', 'test_dataset.csv')

# Print the absolute paths for verification
print(f"Processed data path: {processed_data_path}")
print(f"Train path: {train_path}")
print(f"Test path: {test_path}")
print(f"Train path for Data Preparation: {train_path_prep}")
print(f"Test path for Data Preparation: {test_path_prep}")
```


## Step 2: Load Processed Data
In this step, we define a function `load_data` to load data from a specified CSV file path. We use this function to load the processed dataset. The first few rows of the dataset are displayed to confirm successful loading and to inspect the data.
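A minimal sketch of `load_data`, assuming it is a thin wrapper around `pandas.read_csv` (the notebook's actual implementation is not shown). The demo writes a tiny CSV with hypothetical columns `tenure` and `Churn` to a temporary folder; in the notebook you would pass `processed_data_path` instead.

```python
import os
import tempfile

import pandas as pd


def load_data(file_path: str) -> pd.DataFrame:
    """Load a CSV file into a DataFrame, failing early with a clear message if it is missing."""
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"Data file not found: {file_path}")
    return pd.read_csv(file_path)


# Demo on a small temporary CSV (hypothetical columns); the notebook would call
# load_data(processed_data_path) here.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "processed_sample.csv")
    pd.DataFrame({"tenure": [1, 24, 60], "Churn": [1, 0, 0]}).to_csv(sample_path, index=False)
    df = load_data(sample_path)
    print(df.head())  # display the first rows to confirm the load
```

The explicit existence check is optional, because `read_csv` already raises `FileNotFoundError`, but it makes a wrong path in `config.json` easier to spot.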


## Step 3: Split Data into Training and Testing Sets
Here, we define the `split_data` function to separate the dataset into features (`X`) and the target variable (`y`). The data is then split into training and testing sets using an 80/20 ratio (80% for training, 20% for testing). Within each set, the features and target are concatenated back into a single DataFrame, which is displayed to verify the split.
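A sketch of `split_data` under stated assumptions. The notebook does not name the target column, so `Churn` is hypothetical, and the `random_state` of 42 is chosen only for reproducibility. scikit-learn's `train_test_split(X, y, test_size=0.2, random_state=42)` is the common one-liner for this. This version swaps in plain pandas shuffling so it needs nothing beyond pandas.

```python
import pandas as pd


def split_data(df: pd.DataFrame, target_column: str, test_size: float = 0.2, random_state: int = 42):
    """Return (train_df, test_df); each keeps the feature columns plus the target column."""
    # Separate features (X) from the target variable (y)
    X = df.drop(columns=[target_column])
    y = df[target_column]

    # Shuffle the row labels reproducibly, then reserve test_size of them for testing
    shuffled_index = df.sample(frac=1.0, random_state=random_state).index
    n_test = round(len(df) * test_size)
    test_index = shuffled_index[:n_test]
    train_index = shuffled_index[n_test:]

    # Concatenate features and target back into one DataFrame per set
    train_df = pd.concat([X.loc[train_index], y.loc[train_index]], axis=1)
    test_df = pd.concat([X.loc[test_index], y.loc[test_index]], axis=1)
    return train_df, test_df


# Toy data with hypothetical column names
df = pd.DataFrame({
    "tenure": range(10),
    "MonthlyCharges": [20.0 + i for i in range(10)],
    "Churn": [0, 1] * 5,
})
train_df, test_df = split_data(df, target_column="Churn")
print(f"Train: {train_df.shape}, Test: {test_df.shape}")  # Train: (8, 3), Test: (2, 3)
```

Churn targets are often imbalanced. In that case a stratified split (for example, the `stratify=y` argument of `train_test_split`) keeps the churn rate similar in both sets.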


## Step 4: Save the Training and Testing Datasets
In this step, we save the training and testing datasets to the paths defined in Step 1. Each set is written to two locations: the general `data` folder and the `Data_Preparation` folder. A confirmation message is printed after each file is written.
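The following is a minimal sketch of the save step. It uses a hypothetical helper, `save_dataset`, that writes one DataFrame to several paths and creates missing parent folders first. It also assumes the index is dropped (`index=False`), since the shuffled row labels from the split carry no meaning. The demo uses a temporary folder in place of the project root.

```python
import os
import tempfile
from typing import Iterable

import pandas as pd


def save_dataset(df: pd.DataFrame, paths: Iterable[str]) -> None:
    """Hypothetical helper: write df as CSV to every path, creating parent folders as needed."""
    for path in paths:
        os.makedirs(os.path.dirname(path), exist_ok=True)
        df.to_csv(path, index=False)  # drop the shuffled index
        print(f"Saved {len(df)} rows to {path}")


# Demo: a temporary directory stands in for the project root
with tempfile.TemporaryDirectory() as tmp_root:
    train_df = pd.DataFrame({"tenure": [1, 24], "Churn": [1, 0]})
    train_paths = [
        os.path.join(tmp_root, "data", "train", "train_dataset.csv"),
        os.path.join(tmp_root, "Data_Preparation", "training_sets", "train_dataset.csv"),
    ]
    save_dataset(train_df, train_paths)
    saved_ok = all(os.path.exists(p) for p in train_paths)
    reloaded = pd.read_csv(train_paths[1])  # read back to confirm the round trip
```

In the notebook, the equivalent calls would be `save_dataset(train_df, [train_path, train_path_prep])` and `save_dataset(test_df, [test_path, test_path_prep])`.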


## Summary
In this notebook, we performed the following steps:
1. Imported necessary libraries and loaded configuration settings.
2. Loaded the processed dataset.
3. Split the data into training and testing sets.
4. Saved the training and testing datasets.

The next steps will involve building and evaluating predictive models using the training and testing datasets.
