# Data Preprocessing

## Overview
This notebook covers three preprocessing tasks: loading and cleaning the dataset, handling missing data and encoding categorical variables, and applying feature scaling and normalization. Feature engineering is carried out in the next stage, in the `feature_engineering` folder.



## Task 1: Load and Preprocess the Dataset


### Step 1: Import Libraries and Set Up Paths

**Objective:**
Import necessary libraries and set up file paths for data loading and preprocessing.

**Code:**
```python
import pandas as pd
import os
from data_loader import load_data
from data_cleaner import clean_data

# Define file paths
raw_data_path = 'data/raw/Dataset (ATS)-1.csv'
interim_cleaned_data_path = 'data/interim/cleaned_dataset.csv'
```

In [1]:
```python
# Import necessary libraries
import os
import sys
import json
import pandas as pd

# Ensure the utils module can be found by setting paths
notebook_dir = os.path.dirname(os.path.abspath(''))
project_root = os.path.abspath(os.path.join(notebook_dir, '..'))
utils_path = os.path.join(project_root, 'utils')

# Print the directories for verification
print(f"Notebook directory: {notebook_dir}")
print(f"Project root: {project_root}")
print(f"Utils path: {utils_path}")

# Add the utils directory to the system path
sys.path.append(utils_path)

# Print the current system path for verification
print("sys.path:", sys.path)

# Import custom modules
try:
    from data_loader import load_data
    from data_cleaner import clean_data
    # Returns: Successful import of the modules if they exist
    print("Modules imported successfully.")
except ModuleNotFoundError as e:
    # Returns: Error message if the module import fails
    print(f"ModuleNotFoundError: {e}")

# Load configuration file
config_path = os.path.join(project_root, 'config.json')
# Print the path to the configuration file for verification
print(f"Config path: {config_path}")

# Check if the configuration file exists
if not os.path.exists(config_path):
    # Returns: Error message if the configuration file does not exist
    print(f"Config file does not exist at {config_path}")
else:
    # Open and read the configuration file
    with open(config_path, 'r') as f:
        config = json.load(f)
        # Returns: Dictionary `config` containing the configuration settings

    # Set file paths from the configuration settings
    raw_data_path = os.path.join(project_root, config['raw_data_path'])
    interim_cleaned_data_path = os.path.join(project_root, config['interim_cleaned_data_path'])
    preprocessed_data_path = os.path.join(project_root, config['preprocessed_data_path'])

    # Print the paths for verification
    print(f"Raw data path: {raw_data_path}")
    print(f"Interim cleaned data path: {interim_cleaned_data_path}")
    print(f"Preprocessed data path: {preprocessed_data_path}")
    # Returns: Paths for raw data, interim cleaned data, and preprocessed data
```

```text
Notebook directory: d:\Customer-Churn-Analysis\notebooks
Project root: d:\Customer-Churn-Analysis
Utils path: d:\Customer-Churn-Analysis\utils
sys.path: ['d:\\Customer-Churn-Analysis\\notebooks\\data_preprocessing', 'd:\\Customer-Churn-Analysis\\notebooks\\data_preprocessing\\:', 'c:\\Users\\iambh\\anaconda3\\envs\\churn_analysis\\python38.zip', 'c:\\Users\\iambh\\anaconda3\\envs\\churn_analysis\\DLLs', 'c:\\Users\\iambh\\anaconda3\\envs\\churn_analysis\\lib', 'c:\\Users\\iambh\\anaconda3\\envs\\churn_analysis', '', 'c:\\Users\\iambh\\anaconda3\\envs\\churn_analysis\\lib\\site-packages', 'c:\\Users\\iambh\\anaconda3\\envs\\churn_analysis\\lib\\site-packages\\win32', 'c:\\Users\\iambh\\anaconda3\\envs\\churn_analysis\\lib\\site-packages\\win32\\lib', 'c:\\Users\\iambh\\anaconda3\\envs\\churn_analysis\\lib\\site-packages\\Pythonwin', 'd:\\Customer-Churn-Analysis\\utils']
Executing data_loader.py
Modules imported successfully.
Config path: d:\Customer-Churn-Analysis\config.json
Raw data p
```

**Summary:**
I have successfully imported the necessary libraries and set up the file paths for data loading and preprocessing. These libraries and paths will be used throughout the data preprocessing steps.



### Step 2: Load Data

**Objective:**
Load the raw dataset from a CSV file into a pandas DataFrame.

**Code:**
```python
# Load data
df = load_data(raw_data_path)
df.head()
```

In [2]:
```python
# Load the raw data
print(f"Attempting to load raw data from: {raw_data_path}")

# Call the load_data function to load the raw data from the specified path
df = load_data(raw_data_path)

# Check if data is loaded
if df is not None:
    # Display the first few rows of the DataFrame
    display(df.head())
    # Returns: The first few rows of the loaded DataFrame for verification
else:
    # Print an error message if the DataFrame is None
    print(f"File not found at {raw_data_path}. Please check the file path.")
    # Returns: Error message indicating that the file was not found at the specified path
```

```text
Attempting to load raw data from: d:\Customer-Churn-Analysis\data/raw/Dataset (ATS)-1.csv
Data loaded successfully from d:\Customer-Churn-Analysis\data/raw/Dataset (ATS)-1.csv
```

| | gender | SeniorCitizen | Dependents | tenure | PhoneService | MultipleLines | InternetService | Contract | MonthlyCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Female | 0 | No | 1 | No | No | DSL | Month-to-month | 29.85 | No |
| 1 | Male | 0 | No | 34 | Yes | No | DSL | One year | 56.95 | No |
| 2 | Male | 0 | No | 2 | Yes | No | DSL | Month-to-month | 53.85 | Yes |
| 3 | Male | 0 | No | 45 | No | No | DSL | One year | 42.3 | No |
| 4 | Female | 0 | No | 2 | Yes | No | Fiber optic | Month-to-month | 70.7 | Yes |


**Summary:**
The raw dataset has been successfully loaded into a pandas DataFrame. The data is now ready for the cleaning process, where I will handle any missing values and encode categorical variables.
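
The `load_data` helper comes from the `data_loader` module and is not reproduced in this notebook. A minimal sketch of a loader with the behaviour seen above (a success message on load, `None` when the file is missing) might look like the following. The name `load_data_sketch` and the function body are assumptions for illustration, not the project's actual code.

```python
import os
import tempfile
from typing import Optional

import pandas as pd


def load_data_sketch(path: str) -> Optional[pd.DataFrame]:
    """Hypothetical stand-in for load_data: read a CSV, or return None if it is missing."""
    if not os.path.exists(path):
        print(f"File not found at {path}. Please check the file path.")
        return None
    df = pd.read_csv(path)
    print(f"Data loaded successfully from {path}")
    return df


# Usage: round-trip a tiny CSV through the sketch, then try a path that does not exist.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    pd.DataFrame({"tenure": [1, 34], "Churn": ["No", "Yes"]}).to_csv(demo_path, index=False)
    loaded = load_data_sketch(demo_path)
    missing = load_data_sketch(os.path.join(tmp, "absent.csv"))
```

Returning `None` instead of raising is what lets the later cells guard each step with `if df is not None`.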



### Step 3: Clean Data

**Objective:**
Clean the dataset by handling missing values and encoding categorical variables.

**Code:**
```python
# Clean the data
df_cleaned = clean_data(df)
df_cleaned.head()
```

In [3]:
```python
# Clean the loaded data
if df is not None:
    # Apply the clean_data function to the loaded DataFrame
    df_cleaned = clean_data(df)

    # Display the first few rows of the cleaned DataFrame
    display(df_cleaned.head())
    # Returns: The first few rows of the cleaned DataFrame for verification
else:
    # Print an error message if the loaded DataFrame is None
    print("Data loading failed, skipping cleaning step.")
    # Returns: Error message indicating that data loading failed and the cleaning step is skipped
```

```text
Missing values handled by dropping rows with missing values.
Categorical columns identified: Index(['gender', 'Dependents', 'PhoneService', 'MultipleLines',
       'InternetService', 'Contract', 'Churn'],
      dtype='object')
Categorical columns ['gender', 'Dependents', 'PhoneService', 'MultipleLines', 'InternetService', 'Contract', 'Churn'] encoded.
```

| | SeniorCitizen | tenure | MonthlyCharges | gender_Female | gender_Male | Dependents_No | Dependents_Yes | PhoneService_No | PhoneService_Yes | MultipleLines_No | MultipleLines_Yes | InternetService_DSL | InternetService_Fiber optic | Contract_Month-to-month | Contract_One year | Contract_Two year | Churn_No | Churn_Yes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 29.85 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 1 | 0 | 34 | 56.95 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| 2 | 0 | 2 | 53.85 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 3 | 0 | 45 | 42.3 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| 4 | 0 | 2 | 70.7 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |


**Summary:**
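The dataset has been cleaned by dropping rows with missing values and one-hot encoding the seven categorical columns (`gender`, `Dependents`, `PhoneService`, `MultipleLines`, `InternetService`, `Contract`, and `Churn`). The cleaned data is now ready to be saved and used for further processing.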



### Step 4: Save Clean Data

**Objective:**
Save the cleaned data to the specified path.

**Code:**
```python
# Save cleaned data
df_cleaned.to_csv(interim_cleaned_data_path, index=False)
df_cleaned.to_csv(preprocessed_data_path, index=False)
```

In [4]:
```python
# Save the cleaned data
if df_cleaned is not None:
    # Save the cleaned data to the interim directory
    df_cleaned.to_csv(interim_cleaned_data_path, index=False)
    # Save the cleaned data to the preprocessed_dataset directory
    df_cleaned.to_csv(preprocessed_data_path, index=False)

    # Print confirmation messages indicating successful saving
    print(f"Cleaned data saved to interim at {interim_cleaned_data_path}")
    print(f"Cleaned data saved to preprocessed_dataset at {preprocessed_data_path}")
    # Returns: Confirmation messages indicating the cleaned data has been saved to both paths
else:
    # Print an error message if the cleaned DataFrame is None
    print("Data cleaning failed, skipping save step.")
    # Returns: Error message indicating that data cleaning failed and the save step is skipped
```

```text
Cleaned data saved to interim at d:\Customer-Churn-Analysis\data/interim/cleaned_dataset.csv
Cleaned data saved to preprocessed_dataset at d:\Customer-Churn-Analysis\Data_Preparation/preprocessed_dataset/cleaned_dataset.csv
```


**Summary:**
The cleaned dataset has been saved to both the interim and the preprocessed paths. This dataset will serve as the input for Task 2, where I will handle missing data and encode categorical variables.
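
The `clean_data` function used in Step 3 is imported from the `data_cleaner` module, and its source is not shown here. Judging only from its log messages (rows with missing values dropped, text columns one-hot encoded), a rough equivalent can be sketched with pandas. The name `clean_data_sketch` and the use of `pd.get_dummies` are assumptions; the real module may use a different encoder.

```python
import pandas as pd


def clean_data_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for clean_data: drop incomplete rows, one-hot encode text columns."""
    # Drop every row that has at least one missing value.
    cleaned = df.dropna().reset_index(drop=True)
    # Treat every non-numeric column as categorical.
    categorical = cleaned.select_dtypes(exclude="number").columns.tolist()
    if not categorical:
        return cleaned
    # One float indicator column per category; untouched columns stay in front.
    return pd.get_dummies(cleaned, columns=categorical, dtype=float)


# Usage on a toy frame: the third row has a missing tenure and is dropped.
demo = pd.DataFrame({
    "tenure": [1, 34, None],
    "Contract": ["Month-to-month", "One year", "One year"],
    "Churn": ["No", "Yes", "No"],
})
cleaned_demo = clean_data_sketch(demo)
print(cleaned_demo.columns.tolist())
```

Indicator columns are named `<column>_<category>`, which is the same pattern as `Contract_One year` and `Churn_Yes` in the Step 3 output.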


## Task 2: Handle Missing Data and Encode Categorical Variables


### Step 5: Handle Missing Data

**Objective:**
Handle missing data using mean imputation.

**Code:**
```python
from handle_missing_and_encode import handle_missing_data

# Handle missing data
df_missing_handled = handle_missing_data(df_cleaned)
df_missing_handled.head()
```

In [5]:
```python
# Import the function to handle missing data from the custom module
try:
    from handle_missing_and_encode import handle_missing_data
    # Returns: Successful import of the function if the module is found
except ModuleNotFoundError as e:
    print("Module import unsuccessful:", e)
    # Returns: Error message if the module import fails

# Apply the function to handle missing data on the cleaned dataset
df_missing_handled = handle_missing_data(df_cleaned)
# Returns: DataFrame with handled missing data if the handling is successful

if df_missing_handled is not None:
    # Display the first few rows of the DataFrame after handling missing data
    display(df_missing_handled.head())
    # Returns: The first few rows of the DataFrame with handled missing data
else:
    print("Handling missing data failed.")
    # Returns: Error message indicating that handling missing data failed
```

```text
Executing handle_missing_and_encode.py
Missing data handled by mean imputation.
```

| | SeniorCitizen | tenure | MonthlyCharges | gender_Female | gender_Male | Dependents_No | Dependents_Yes | PhoneService_No | PhoneService_Yes | MultipleLines_No | MultipleLines_Yes | InternetService_DSL | InternetService_Fiber optic | Contract_Month-to-month | Contract_One year | Contract_Two year | Churn_No | Churn_Yes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 1.0 | 29.85 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 1 | 0.0 | 34.0 | 56.95 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| 2 | 0.0 | 2.0 | 53.85 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 3 | 0.0 | 45.0 | 42.3 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| 4 | 0.0 | 2.0 | 70.7 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |


**Summary:**
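Missing data in the dataset has been handled using mean imputation. Because `clean_data` already dropped rows with missing values in Step 3, this step acts mainly as a safeguard: it guarantees that the numerical columns contain no missing values before analysis. In the preview, the integer columns (`SeniorCitizen`, `tenure`) now appear as floats.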



### Step 6: Encode Categorical Variables

**Objective:**
Encode categorical variables using one-hot encoding.

**Code:**
```python
from handle_missing_and_encode import encode_categorical_variables

# Encode categorical variables
df_encoded = encode_categorical_variables(df_missing_handled)
df_encoded.head()
```

In [6]:
```python
# Import the function to encode categorical variables from the custom module
try:
    from handle_missing_and_encode import encode_categorical_variables
    # Returns: Successful import of the function if the module is found
except ModuleNotFoundError as e:
    print("Module import unsuccessful:", e)
    # Returns: Error message if the module import fails

# Apply the function to encode categorical variables on the DataFrame with handled missing data
df_encoded = encode_categorical_variables(df_missing_handled)
# Returns: DataFrame with encoded categorical variables if encoding is successful

if df_encoded is not None:
    # Display the first few rows of the DataFrame after encoding categorical variables
    display(df_encoded.head())
    # Returns: The first few rows of the DataFrame with encoded categorical variables
else:
    print("Encoding categorical variables failed.")
    # Returns: Error message indicating that encoding categorical variables failed
```

```text
No categorical columns found to encode.
```

| | SeniorCitizen | tenure | MonthlyCharges | gender_Female | gender_Male | Dependents_No | Dependents_Yes | PhoneService_No | PhoneService_Yes | MultipleLines_No | MultipleLines_Yes | InternetService_DSL | InternetService_Fiber optic | Contract_Month-to-month | Contract_One year | Contract_Two year | Churn_No | Churn_Yes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 1.0 | 29.85 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 1 | 0.0 | 34.0 | 56.95 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| 2 | 0.0 | 2.0 | 53.85 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 3 | 0.0 | 45.0 | 42.3 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| 4 | 0.0 | 2.0 | 70.7 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |


**Summary:**
No categorical columns remained to encode at this point, because `clean_data` had already one-hot encoded them in Step 3, so the DataFrame passes through this step unchanged. One-hot encoding converts categorical data into a numerical format, which most machine learning algorithms require.
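
Both Task 2 helpers come from the `handle_missing_and_encode` module, whose source is not shown in this notebook. The two behaviours logged above (mean imputation of numeric columns, and a pass-through when no categorical columns remain) can be sketched as follows. The function names ending in `_sketch` and their bodies are assumptions for illustration only.

```python
import pandas as pd


def handle_missing_data_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: fill gaps in numeric columns with each column's mean."""
    out = df.copy()
    numeric = out.select_dtypes(include="number").columns
    # fillna with a Series of means fills each column with its own mean.
    out[numeric] = out[numeric].fillna(out[numeric].mean())
    print("Missing data handled by mean imputation.")
    return out


def encode_categorical_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: one-hot encode non-numeric columns, if any remain."""
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    if not categorical:
        print("No categorical columns found to encode.")
        return df
    return pd.get_dummies(df, columns=categorical, dtype=float)


# Usage: each column has one gap, and every column is already numeric.
demo = pd.DataFrame({"tenure": [1.0, None, 5.0], "MonthlyCharges": [20.0, 40.0, None]})
imputed = handle_missing_data_sketch(demo)
encoded = encode_categorical_sketch(imputed)
```

In the toy frame the gap in `tenure` is filled with 3.0 (the mean of 1 and 5) and the gap in `MonthlyCharges` with 30.0, and the encoding step returns its input untouched.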



### Step 7: Save Preprocessed Data

**Objective:**
Save the preprocessed data to the specified path.

**Code:**
```python
# Save preprocessed data (preprocessed_data_path was set from config.json in Step 1)
df_encoded.to_csv(preprocessed_data_path, index=False)
```

In [7]:
```python
# Save the processed DataFrame to the specified interim and preprocessed paths
if df_encoded is not None:
    # Save the encoded DataFrame to the interim cleaned data path
    df_encoded.to_csv(interim_cleaned_data_path, index=False)
    # Save the encoded DataFrame to the preprocessed data path
    df_encoded.to_csv(preprocessed_data_path, index=False)
    # Print confirmation messages indicating successful saving
    print("Processed data saved to interim at", interim_cleaned_data_path)
    print("Processed data saved to preprocessed_dataset at", preprocessed_data_path)
    # Returns: Confirmation messages indicating the processed data has been saved to both paths
else:
    # Print an error message if the encoded DataFrame is None
    print("Processed data saving failed.")
    # Returns: Error message indicating that processed data saving failed
```

```text
Processed data saved to interim at d:\Customer-Churn-Analysis\data/interim/cleaned_dataset.csv
Processed data saved to preprocessed_dataset at d:\Customer-Churn-Analysis\Data_Preparation/preprocessed_dataset/cleaned_dataset.csv
```


**Summary:**
The preprocessed dataset, with handled missing values and encoded categorical variables, has been saved to both the interim and the preprocessed paths, overwriting the files written in Step 4. This dataset is now prepared for the next phase of feature scaling and normalization.


## Task 3: Feature Scaling and Normalization


### Step 8: Import Scaling Functions

**Objective:**
Import scaling functions for standard scaling and min-max scaling.

**Code:**
```python
from scaler import apply_standard_scaling, apply_min_max_scaling
```

In [8]:
```python
# Import the functions to apply scaling from the custom module
try:
    from scaler import apply_standard_scaling, apply_min_max_scaling
    print("Scaling modules imported successfully.")
    # Returns: Confirmation message indicating successful import of scaling modules
except ModuleNotFoundError as e:
    print("Module import unsuccessful:", e)
    # Returns: Error message indicating unsuccessful import of scaling modules

# Define paths for saving scaled data
standard_scaled_data_path = '../data_preparation/scaling_techniques/standard_scaled_dataset.csv'
min_max_scaled_data_path = '../data_preparation/scaling_techniques/min_max_scaled_dataset.csv'
# Returns: Paths for saving the standard scaled data and the min-max scaled data
```

```text
Executing scaler.py
Scaling modules imported successfully.
```


**Summary:**
The scaling functions for standard scaling and min-max scaling have been successfully imported. These functions will be used to normalize the dataset in the subsequent steps.
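
The `scaler` module itself is not shown in this notebook. As a rough guide to what a standard-scaling helper computes, standardization replaces each value x in a column with (x - mean) / std. The pandas-only sketch below assumes the population standard deviation (`ddof=0`), which is the convention scikit-learn's `StandardScaler` uses; the name `standard_scale_sketch` is hypothetical, and the real `apply_standard_scaling` may differ.

```python
import pandas as pd


def standard_scale_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: z-score every numeric column."""
    out = df.copy()
    numeric = out.select_dtypes(include="number").columns
    # Subtract each column's mean and divide by its population standard deviation.
    # A constant column has std 0 and would come out as NaN in this sketch.
    out[numeric] = (out[numeric] - out[numeric].mean()) / out[numeric].std(ddof=0)
    return out


# Usage on the five preview rows of two columns (toy input, not the full dataset).
demo = pd.DataFrame({
    "tenure": [1.0, 34.0, 2.0, 45.0, 2.0],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 70.70],
})
scaled = standard_scale_sketch(demo)
print(scaled.mean().round(6).tolist(), scaled.std(ddof=0).round(6).tolist())
```

The scaled values here will not match the notebook's output, because the means and standard deviations come from five rows rather than from the full dataset.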



### Step 9: Apply Standard Scaling

**Objective:**
Apply standard scaling to the dataset.

**Code:**
```python
# Apply standard scaling
df_standard_scaled = apply_standard_scaling(df_encoded)
df_standard_scaled.head()
```

In [9]:
```python
# Apply Standard Scaling to the cleaned data
if df_encoded is not None:
    # Apply Standard scaling to the encoded DataFrame
    df_standard_scaled = apply_standard_scaling(df_encoded)
    # Display the first few rows of the scaled DataFrame
    display(df_standard_scaled.head())
    # Returns: The first few rows of the Standard scaled DataFrame
else:
    # Print an error message if the encoded DataFrame is None
    print("Data scaling failed.")
    # Returns: Error message indicating that data scaling failed
```

```text
Numeric columns for scaling: Index(['SeniorCitizen', 'tenure', 'MonthlyCharges', 'gender_Female',
       'gender_Male', 'Dependents_No', 'Dependents_Yes', 'PhoneService_No',
       'PhoneService_Yes', 'MultipleLines_No', 'MultipleLines_Yes',
       'InternetService_DSL', 'InternetService_Fiber optic',
       'Contract_Month-to-month', 'Contract_One year', 'Contract_Two year',
       'Churn_No', 'Churn_Yes'],
      dtype='object')
Standard scaling applied.
```

| | SeniorCitizen | tenure | MonthlyCharges | gender_Female | gender_Male | Dependents_No | Dependents_Yes | PhoneService_No | PhoneService_Yes | MultipleLines_No | MultipleLines_Yes | InternetService_DSL | InternetService_Fiber optic | Contract_Month-to-month | Contract_One year | Contract_Two year | Churn_No | Churn_Yes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.439916 | -1.277445 | -1.160323 | 1.009559 | -1.009559 | 0.654012 | -0.654012 | 3.05401 | -3.05401 | 0.854176 | -0.854176 | 0.88566 | -0.88566 | 0.904184 | -0.514249 | -0.562975 | 0.601023 | -0.601023 |
| 1 | -0.439916 | 0.066327 | -0.259629 | -0.990532 | 0.990532 | 0.654012 | -0.654012 | -0.327438 | 0.327438 | 0.854176 | -0.854176 | 0.88566 | -0.88566 | -1.10597 | 1.944582 | -0.562975 | 0.601023 | -0.601023 |
| 2 | -0.439916 | -1.236724 | -0.36266 | -0.990532 | 0.990532 | 0.654012 | -0.654012 | -0.327438 | 0.327438 | 0.854176 | -0.854176 | 0.88566 | -0.88566 | 0.904184 | -0.514249 | -0.562975 | -1.663829 | 1.663829 |
| 3 | -0.439916 | 0.514251 | -0.746535 | -0.990532 | 0.990532 | 0.654012 | -0.654012 | 3.05401 | -3.05401 | 0.854176 | -0.854176 | 0.88566 | -0.88566 | -1.10597 | 1.944582 | -0.562975 | 0.601023 | -0.601023 |
| 4 | -0.439916 | -1.236724 | 0.197365 | 1.009559 | -1.009559 | 0.654012 | -0.654012 | -0.327438 | 0.327438 | 0.854176 | -0.854176 | -1.129102 | 1.129102 | 0.904184 | -0.514249 | -0.562975 | -1.663829 | 1.663829 |


**Summary:**
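Standard scaling has been applied to the dataset, transforming each feature to have a mean of 0 and a standard deviation of 1. This helps algorithms that are sensitive to feature magnitude, such as gradient-based and distance-based models. Note that all 18 numeric columns were scaled, including the one-hot indicators and the `Churn_No`/`Churn_Yes` columns.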



### Step 10: Apply Min-Max Scaling

**Objective:**
Apply min-max scaling to the dataset.

**Code:**
```python
# Apply min-max scaling
df_min_max_scaled = apply_min_max_scaling(df_encoded)
df_min_max_scaled.head()
```

In [10]:
```python
# Apply Min-Max Scaling to the cleaned data
if df_encoded is not None:
    # Apply Min-Max scaling to the encoded DataFrame
    df_min_max_scaled = apply_min_max_scaling(df_encoded)
    # Display the first few rows of the scaled DataFrame
    display(df_min_max_scaled.head())
    # Returns: The first few rows of the Min-Max scaled DataFrame
else:
    # Print an error message if the encoded DataFrame is None
    print("Data scaling failed.")
    # Returns: Error message indicating that data scaling failed
```

```text
Numeric columns for scaling: Index(['SeniorCitizen', 'tenure', 'MonthlyCharges', 'gender_Female',
       'gender_Male', 'Dependents_No', 'Dependents_Yes', 'PhoneService_No',
       'PhoneService_Yes', 'MultipleLines_No', 'MultipleLines_Yes',
       'InternetService_DSL', 'InternetService_Fiber optic',
       'Contract_Month-to-month', 'Contract_One year', 'Contract_Two year',
       'Churn_No', 'Churn_Yes'],
      dtype='object')
Min-Max scaling applied.
```

| | SeniorCitizen | tenure | MonthlyCharges | gender_Female | gender_Male | Dependents_No | Dependents_Yes | PhoneService_No | PhoneService_Yes | MultipleLines_No | MultipleLines_Yes | InternetService_DSL | InternetService_Fiber optic | Contract_Month-to-month | Contract_One year | Contract_Two year | Churn_No | Churn_Yes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.013889 | 0.115423 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 1 | 0.0 | 0.472222 | 0.385075 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| 2 | 0.0 | 0.027778 | 0.354229 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 3 | 0.0 | 0.625 | 0.239303 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| 4 | 0.0 | 0.027778 | 0.521891 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |


**Summary:**
Min-max scaling has been applied to the dataset, rescaling each feature to the range 0 to 1. This puts all features on a comparable scale so that no feature dominates simply because of its units; the one-hot indicator columns, already 0 or 1, are unchanged.
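
Min-max scaling maps each value x to (x - min) / (max - min). As a worked check in plain Python, the scaled `tenure` values in the table above are consistent with a column minimum of 0 and a maximum of 72. Those two bounds are inferred from the output rather than read from the dataset, so treat them as an assumption.

```python
def min_max(x: float, lo: float, hi: float) -> float:
    """Rescale x from the interval [lo, hi] to [0, 1]."""
    if hi == lo:
        raise ValueError("min and max are equal; the column is constant")
    return (x - lo) / (hi - lo)


# Assumed bounds for tenure, inferred from the scaled output above.
TENURE_MIN, TENURE_MAX = 0, 72

# Raw tenure values from the first five rows of the loaded dataset.
raw_tenure = [1, 34, 2, 45, 2]
scaled_tenure = [round(min_max(t, TENURE_MIN, TENURE_MAX), 6) for t in raw_tenure]
print(scaled_tenure)  # [0.013889, 0.472222, 0.027778, 0.625, 0.027778]

# A 0/1 indicator column already spans [0, 1], so it is left unchanged.
print(min_max(0, 0, 1), min_max(1, 0, 1))  # 0.0 1.0
```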



### Step 11: Save Scaled Data

**Objective:**
Save the scaled datasets to the specified paths.

**Code:**
```python
# Save scaled data
standard_scaled_data_path = '../data_preparation/scaling_techniques/standard_scaled_dataset.csv'
min_max_scaled_data_path = '../data_preparation/scaling_techniques/min_max_scaled_dataset.csv'

df_standard_scaled.to_csv(standard_scaled_data_path, index=False)
df_min_max_scaled.to_csv(min_max_scaled_data_path, index=False)
```

In [11]:
```python
import os

# Save the scaled data to the specified paths
def save_scaled_data(df, path):
    """
    Save the scaled data to the specified path.

    Parameters:
    df (pd.DataFrame): The DataFrame containing the scaled data.
    path (str): The path where the scaled data should be saved.

    Returns:
    None
    """
    # Get the directory from the specified path
    directory = os.path.dirname(path)
    # Create the directory if it is missing (skipped when the path has no directory part)
    if directory:
        os.makedirs(directory, exist_ok=True)
    # Save the DataFrame to the specified path
    df.to_csv(path, index=False)
    print(f"Data saved at {path}")
    # Returns: Confirmation message indicating the data has been saved

# Check if the standard scaled DataFrame is not None and save it
if df_standard_scaled is not None:
    save_scaled_data(df_standard_scaled, standard_scaled_data_path)
    # Returns: Confirmation message indicating the standard scaled data has been saved
else:
    print("Standard scaled data saving failed.")
    # Returns: Error message indicating the standard scaled data saving failed

# Check if the min-max scaled DataFrame is not None and save it
if df_min_max_scaled is not None:
    save_scaled_data(df_min_max_scaled, min_max_scaled_data_path)
    # Returns: Confirmation message indicating the min-max scaled data has been saved
else:
    print("Min-max scaled data saving failed.")
    # Returns: Error message indicating the min-max scaled data saving failed
```

```text
Data saved at ../data_preparation/scaling_techniques/standard_scaled_dataset.csv
Data saved at ../data_preparation/scaling_techniques/min_max_scaled_dataset.csv
```


**Summary:**
The scaled datasets, both standard scaled and min-max scaled, have been saved to their respective paths. These datasets are now ready for further analysis and machine learning model training.


## Next Steps

1. Conduct exploratory data analysis (EDA) to understand the data distribution, relationships between variables, and identify any anomalies or patterns.
2. Document each step and summarize the results to ensure clarity and reproducibility for all team members and stakeholders involved in the project.
3. Build and evaluate machine learning models using the preprocessed and scaled data.
4. Interpret the results and iterate on the models as needed to improve performance.
