## Model

This notebook is the modeling part of the replication. Because the original notebook did not set a random seed, the ROC_AUC scores here do not exactly match the original ones, but all models run successfully.

Import the required packages. NumPy is imported here as well because later cells call np.array.

In [10]:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

Split the data into a training set and a test set with a 3:1 ratio (test_size=0.25).

In [None]:
X = training_data.iloc[:, 1:]
y = training_data['seriousdlqin2yrs']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
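The split above is not seeded, which is why scores differ between runs. The idea behind a reproducible split can be sketched in plain Python as follows. This is only an illustration: the helper name seeded_split and the seed value 42 are assumptions made for the example. With scikit-learn, the usual way to get the same effect is to pass an integer random_state to train_test_split.

```python
import random

def seeded_split(n_rows, test_size=0.25, seed=42):
    """Return (train_idx, test_idx) from a reproducible shuffle of row indices."""
    rng = random.Random(seed)            # private generator with a fixed seed
    indices = list(range(n_rows))
    rng.shuffle(indices)                 # same seed -> same permutation
    n_test = int(round(n_rows * test_size))
    return indices[n_test:], indices[:n_test]

train_a, test_a = seeded_split(100)
train_b, test_b = seeded_split(100)

print(len(train_a), len(test_a))   # sizes follow the 3:1 ratio
print(test_a == test_b)            # identical split on every call
```

Calling the helper with a different seed gives a different, but again repeatable, split.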

### KNN Classifier

The 1st baseline model is the KNN classifier. The following are the parameters we choose.

- n_neighbors=5: It means that the classifier will consider the 5 nearest neighbors to a given data point when making a prediction.

- weights='uniform': It means all neighbors have equal weight.

- algorithm='auto': The model will automatically select the most appropriate algorithm among 'ball_tree', 'kd_tree', and 'brute' based on the input data.

- leaf_size=30: This parameter is only relevant when the 'ball_tree' or 'kd_tree' algorithm is used. It sets the leaf size of the tree data structure, which affects the speed of building and querying the tree and the memory needed to store it.

- metric='minkowski': This parameter specifies the distance metric used to compute the distances between points.

- p=2: This is the power parameter of the **Minkowski distance** metric. With p=2 it reduces to the commonly used **Euclidean distance**.
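The relation between the metric and p settings can be checked with a small plain-Python sketch. The function name minkowski and the two sample points are chosen only for this illustration.

```python
def minkowski(a, b, p=2):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

origin, point = (0.0, 0.0), (3.0, 4.0)

print(minkowski(origin, point, p=2))  # p=2: Euclidean distance
print(minkowski(origin, point, p=1))  # p=1: Manhattan distance
```

For these points, p=2 gives the length of the hypotenuse of a 3-4-5 triangle, while p=1 sums the coordinate differences.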



In [29]:
knn = KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30,
                           metric='minkowski', metric_params=None, p=2)
knn.fit(X_train, y_train)

The following cell computes the accuracy of the KNN classifier on the test set.

In [31]:
knn.score(X_test, y_test)

0.9310133333333334

The following cell computes the ROC_AUC score of the KNN classifier from the predicted probability of the positive class. The following are the parameters we choose.

- average='macro': This parameter determines how per-class ROC AUC scores are aggregated for multiclass or multilabel targets: the score is computed for each class independently and the unweighted mean is taken. In our case y is binary, so this parameter has no effect and could be omitted.

- sample_weight=None: All samples have the same weight.
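For a binary target, the ROC AUC can be read as the fraction of (positive, negative) pairs in which the positive sample receives the higher score. The sketch below implements that pairwise definition in plain Python; the function name roc_auc and the toy labels and scores are assumptions for the example. It compares every pair, so it is meant for illustration rather than for large datasets.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, scores))  # 3 of the 4 pairs are ordered correctly -> 0.75
```

This is also why the cells in this notebook pass the probabilities from predict_proba rather than hard class labels: the score depends only on how the samples are ranked.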

In [40]:
y_pred = knn.predict_proba(np.array(X_test.values))[:,1]
roc_auc_score(y_test, y_pred , average='macro', sample_weight=None)



0.5902430034525364

### Logistic Regression Classifier

The 2nd baseline model is the Logistic Regression classifier. The following are the parameters we choose.

- penalty='l1': We choose the L1 type of regularization penalty here.

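- dual=False: This parameter chooses between the dual and the primal formulation of the optimization problem. False solves the primal problem, which is preferred when the number of samples is larger than the number of features. The dual formulation is only implemented for the L2 penalty with the 'liblinear' solver.


- fit_intercept=True: This parameter determines whether to include an intercept term in the logistic regression model.

- intercept_scaling=1: This parameter is used only with the 'liblinear' solver when fit_intercept=True. A constant feature equal to intercept_scaling is appended to each sample, so the intercept is regularized together with the other coefficients.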

- multi_class='ovr': This parameter determines the strategy for handling multi-class classification problems. 'ovr' stands for one-vs-rest.

- class_weight=None: This parameter allows you to assign different weights to different classes. It can be useful when dealing with imbalanced class distributions. None means all classes have equal weight.

- solver='liblinear': This parameter determines the algorithm used for optimization. The 'liblinear' solver is suitable for small datasets and supports the L1 penalty.

- max_iter=100: This parameter specifies the maximum number of iterations for the optimization algorithm to converge.

- tol=0.0001: This parameter specifies the tolerance of the stopping criterion of the optimization algorithm.

- C=1: C is the inverse of the regularization strength, so $\frac{1}{C}$ acts as the strength and smaller values of C mean stronger regularization. It determines the trade-off between fitting the training data and keeping the model coefficients small to avoid overfitting.

- random_state=None: This parameter sets the random seed. None means no fixed seed is used; pass an integer for reproducible results.

- verbose=2: This parameter controls the verbosity of the training process. Setting it to 2 enables the output of progress messages during training.
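To make the role of C concrete, the L1-penalized objective can be written as below. This is a sketch of the standard formulation with the labels recoded as $y_i \in \{-1, +1\}$, ignoring the intercept_scaling detail. The symbols $w$ (coefficients), $b$ (intercept), and $n$ (number of training samples) are notation introduced here, not taken from the notebook.

$$
\min_{w,\,b} \; \lVert w \rVert_1 + C \sum_{i=1}^{n} \log\left(1 + \exp\left(-y_i \left(x_i^\top w + b\right)\right)\right)
$$

A larger C puts more weight on the data-fit term relative to the penalty $\lVert w \rVert_1$, which corresponds to weaker regularization.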



In [32]:
lr = LogisticRegression(penalty='l1', dual=False, fit_intercept=True, intercept_scaling=1,
                        multi_class='ovr', class_weight=None,
                        solver='liblinear', tol=0.0001, C=1.0, max_iter=100,
                        random_state=None, verbose=2)
lr.fit(X_train, y_train)

[LibLinear]

The following cell computes the accuracy of the Logistic Regression classifier on the test set.

In [33]:
lr.score(X_test, y_test)

0.9354666666666667

The following cell computes the ROC_AUC score of the Logistic Regression classifier.

In [35]:
y_pred = lr.predict_proba(np.array(X_test.values))[:,1]
roc_auc_score(y_test, y_pred , average='macro', sample_weight=None)



0.8494489584447374

### Random Forest Classifier

The 3rd baseline model is the Random Forest classifier. The following are the parameters we choose.

- n_estimators=10: This parameter specifies the number of decision trees to be created in the random forest.

- criterion='gini': The Gini impurity is used to measure the quality of each split in the decision trees.

- max_depth=None: This parameter specifies the maximum depth of each decision tree in the random forest. The default value is None, which means that the decision trees will be grown until all leaves are pure or until all leaves contain fewer samples than min_samples_split.

- min_samples_split=2: This parameter sets the minimum number of samples required to split an internal node in the decision trees. The value here is 2, which means that a node must have at least two samples to be considered for splitting.

- min_samples_leaf=1: This parameter sets the minimum number of samples required to be at a leaf node in the decision trees.

- min_weight_fraction_leaf=0: This parameter sets the minimum weighted fraction of the total number of samples required to be at a leaf node. 0 means that the weight of the samples is not considered.

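- max_features='auto': This parameter determines the number of features to consider when looking for the best split at each node. For a classifier, 'auto' uses the square root of the total number of features, the same as 'sqrt'. Newer scikit-learn releases deprecated and later removed 'auto', so 'sqrt' is the equivalent setting there.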

- max_leaf_nodes=None: This parameter limits the maximum number of leaf nodes in each decision tree. The default value is None, which means there is no maximum limit.

- bootstrap=True: This parameter determines whether bootstrap samples are used when building decision trees.

- oob_score=False: This parameter determines whether to use out-of-bag samples to estimate the generalization accuracy of the random forest.

- n_jobs=1: This parameter specifies the number of parallel jobs to run during training and prediction.

- random_state=None: This parameter sets the random seed. None means no fixed seed is used, so the bootstrap samples and feature subsets differ between runs; pass an integer for reproducible results.

- verbose=0: This parameter controls the verbosity of the training process. Setting it to 0 suppresses progress messages during training.
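The Gini criterion named in the list above can be illustrated with a short plain-Python sketch. The helper names gini_impurity and split_impurity and the toy label lists are assumptions for the example; they are not part of scikit-learn.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a collection of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(left, right):
    """Impurity of a candidate split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

print(gini_impurity([0, 0, 1, 1]))            # evenly mixed node -> 0.5
print(gini_impurity([1, 1, 1]))               # pure node -> 0.0
print(split_impurity([0, 0, 0], [1, 1, 0]))   # lower than the parent's 0.5
```

A tree grown with criterion='gini' prefers, at each node, the candidate split with the lowest weighted impurity.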

In [37]:
rf = RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2,
                               min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto',
                               max_leaf_nodes=None, bootstrap=True, oob_score=False, n_jobs=1,
                               random_state=None, verbose=0)
rf.fit(X_train, y_train)



The following cell computes the accuracy of the Random Forest classifier on the test set.

In [38]:
rf.score(X_test, y_test)

0.93128

The following cell computes the ROC_AUC score of the Random Forest classifier.

In [39]:
y_pred = rf.predict_proba(np.array(X_test.values))[:,1]
roc_auc_score(y_test, y_pred, average='macro', sample_weight=None)



0.7723554070633775

### AdaBoost Classifier

In [45]:
from sklearn.tree import DecisionTreeClassifier

ada = AdaBoostClassifier(n_estimators=200, learning_rate=1.0)
ada.fit(X_train, y_train)

The following cell computes the accuracy of the AdaBoost classifier on the test set.

In [46]:
ada.score(X_test, y_test)

0.93416

The following cell computes the ROC_AUC score of the AdaBoost classifier.

In [47]:
y_pred = ada.predict_proba(np.array(X_test.values))[:,1]
roc_auc_score(y_test, y_pred, average='macro', sample_weight=None)



0.8607267479275506

### Gradient Boosting Classifier

The 5th baseline model is the Gradient Boosting classifier. The following are the parameters we choose.

- loss='deviance': This parameter specifies the loss function to be optimized during the gradient boosting process. 'deviance' corresponds to the logistic regression loss (log loss) for binary classification problems. Newer scikit-learn releases name this loss 'log_loss'.

- learning_rate=0.1: This parameter controls the shrinkage of the contribution of each tree in the ensemble.

- n_estimators=200: This parameter specifies the number of trees to be built.

- subsample=1: This parameter controls the fraction of samples to be used for fitting each tree. Values less than 1.0 introduce stochasticity into the gradient boosting process, which can help reduce overfitting.

- min_samples_split=2: This parameter sets the minimum number of samples required to split an internal node in each decision tree.

- min_samples_leaf=1: This parameter sets the minimum number of samples required to be at a leaf node in each decision tree.

- min_weight_fraction_leaf=0: This parameter sets the minimum weighted fraction of the total number of samples required to be at a leaf node. 0 means that the weight of the samples is not considered.

- max_depth=3: This parameter specifies the maximum depth of each decision tree.

- init=None: This parameter specifies the estimator used to compute the initial predictions of the ensemble. None means a dummy estimator that predicts the class priors is used.

- random_state=None: This parameter sets the random seed. None means no fixed seed is used; pass an integer for reproducible results.

- max_features=None: This parameter determines the number of features to consider when looking for the best split at each node. None means all features are considered.

- verbose=0: This parameter controls the verbosity of the training process. Setting it to 0 suppresses progress messages during training.
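The way learning_rate and n_estimators interact can be summarized by the stage-wise update below. This is the standard additive form of gradient boosting. The symbols $F_m$ (ensemble after stage $m$), $h_m$ (tree fitted at stage $m$), $\nu$ (learning rate), and $M$ (number of stages) are notation introduced here.

$$
F_m(x) = F_{m-1}(x) + \nu \, h_m(x), \qquad m = 1, \dots, M
$$

With the settings above, $\nu = 0.1$ and $M = 200$, and each $h_m$ is a tree of depth at most 3. A smaller $\nu$ shrinks the contribution of every tree, so more stages are usually needed to reach the same fit.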

In [48]:
gb = GradientBoostingClassifier(loss='deviance', learning_rate=0.1, n_estimators=200, subsample=1.0,
                                min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3,
                                init=None, random_state=None, max_features=None, verbose=0)
gb.fit(X_train, y_train)



The following cell computes the accuracy of the Gradient Boosting classifier on the test set.

In [49]:
gb.score(X_test, y_test)

0.9351466666666667

The following cell computes the ROC_AUC score of the Gradient Boosting classifier.

In [50]:
y_pred = gb.predict_proba(np.array(X_test.values))[:,1]
roc_auc_score(y_test, y_pred, average='macro', sample_weight=None)



0.8658695722050163

## Cross Validation

In [15]:
from sklearn.model_selection import cross_val_score

# Generate the cross-validation score for each model
def cvDictGen(functions, metric, X_train=X, y_train=y, cv=5, verbose=1):
    """
    functions: list of estimators to be evaluated.

    metric: Scoring metric passed to cross_val_score.

    X_train: Feature matrix used for cross-validation (defaults to the full X).

    y_train: Target vector used for cross-validation (defaults to the full y).

    cv: Number of cross-validation folds.

    verbose: Verbosity level passed to cross_val_score.
    """

    cvDict = {}
    for func in functions:
        cvScore = cross_val_score(func, X_train, y_train, cv=cv, verbose=verbose, scoring=metric)
        cvDict[str(func).split('(')[0]] = [cvScore.mean(), cvScore.std()]

    return cvDict

# Normalize the cross-validation scores across models
def cvDictNormalize(cvDict):

    cvDictNormalized = {}
    norm1 = sum([lst[0] for lst in cvDict.values()])
    norm2 = sum([lst[1] for lst in cvDict.values()])
    for key in cvDict.keys():
        cvDictNormalized[key] = ['{:0.2f}'.format((cvDict[key][0]/norm1)),
                                 '{:0.2f}'.format((cvDict[key][1]/norm2))]
    return cvDictNormalized

The unnormalized cross-validation result (mean and standard deviation of the ROC_AUC score over the 5 folds):

In [53]:
cvDict = cvDictGen(functions=[knn, lr, rf, ada, gb], metric='roc_auc')
cvDict

[LibLinear][LibLinear][LibLinear][LibLinear][LibLinear]



{'KNeighborsClassifier': [0.595871676666674, 0.0024190748279330353],
 'LogisticRegression': [0.849200473963769, 0.0036337968565900935],
 'RandomForestClassifier': [0.7777348588207509, 0.005894740403345748],
 'AdaBoostClassifier': [0.8586812528322133, 0.0021126810786966793],
 'GradientBoostingClassifier': [0.8639067246517946, 0.0026203413657827955]}

The normalized cross-validation result (each mean and each standard deviation divided by its sum over the five models):

In [71]:
cvDictNormalize(cvDict)

{'KNeighborsClassifier': ['0.15', '0.15'],
 'LogisticRegression': ['0.22', '0.22'],
 'RandomForestClassifier': ['0.20', '0.35'],
 'AdaBoostClassifier': ['0.22', '0.13'],
 'GradientBoostingClassifier': ['0.22', '0.16']}

In [72]:
from joblib import dump

models = [knn, lr, rf, ada, gb]
# Store the fitted model
for mol in models:
  dump(mol, f'/content/drive/MyDrive/STAT3011/{mol}.joblib')

In [75]:
from joblib import load

# Load the stored model
ada = load('/content/drive/MyDrive/STAT3011/AdaBoostClassifier(n_estimators=200).joblib')
gb = load("/content/drive/MyDrive/STAT3011/GradientBoostingClassifier(loss='deviance', n_estimators=200).joblib")

### Hyperparameter optimization using randomized search

In [76]:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint

This step is the randomized cross-validation (CV) search for AdaBoost. Parameter combinations are randomly chosen and evaluated with cross-validation. Finally, the combination with the best CV performance is selected.

The following are the parameters for RandomizedSearchCV.

- estimator=ada: It specifies the model that will be tuned using randomized search.

- param_distributions: This parameter is a dictionary containing the parameter distributions to sample from during the randomized search. Each key in the dictionary represents a hyperparameter of the estimator, and the corresponding value is a distribution or list of possible values for that hyperparameter.

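- n_iter=5: This parameter specifies the number of parameter settings that will be sampled from param_distributions. Because the only entry here is a list of five values, sampling is done without replacement and all five candidates are evaluated.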

- scoring='roc_auc': This parameter specifies the scoring metric that will be used to evaluate the performance of each parameter setting.

- cv=None: This parameter determines the cross-validation strategy used to evaluate the performance of each parameter setting. None means the default 5-fold cross-validation will be used.

- verbose=2: This parameter controls the verbosity of the training process. Setting it to 2 enables the output of progress messages during training.
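The sampling-and-scoring loop behind a randomized search can be sketched in plain Python as follows. Everything in this block is illustrative: random_search, sample_params, and toy_score are hypothetical names, and toy_score is a made-up stand-in for a cross-validated ROC_AUC score, not a real model evaluation.

```python
import random

def random_search(sample_params, score_fn, n_iter=5, seed=0):
    """Evaluate n_iter randomly sampled settings and return the best one."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_iter):
        params = sample_params(rng)               # draw one candidate setting
        trials.append((score_fn(params), params))
    best_score, best_params = max(trials, key=lambda t: t[0])
    return best_params, best_score, trials

# Hypothetical search space with integer ranges for two tree hyperparameters.
def sample_params(rng):
    return {"n_estimators": rng.randint(10, 499), "max_depth": rng.randint(1, 9)}

# Toy stand-in for a cross-validated score; it peaks at n_estimators=200, max_depth=5.
def toy_score(params):
    return -abs(params["n_estimators"] - 200) / 500 - abs(params["max_depth"] - 5) / 10

best_params, best_score, trials = random_search(sample_params, toy_score, n_iter=10)
print(best_params, round(best_score, 3))
```

In RandomizedSearchCV the role of toy_score is played by the mean cross-validated score of the estimator refitted with each sampled setting.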

In [94]:
adaHyperParams = {'n_estimators': [10,50,100,200,420]}

gridSearchAda = RandomizedSearchCV(estimator=ada, param_distributions=adaHyperParams,
                                   n_iter=5, scoring='roc_auc', cv=None, verbose=2).fit(X_train, y_train)

Fitting 5 folds for each of 5 candidates, totalling 25 fits
[CV] END ....................................n_estimators=10; total time=   3.8s
[CV] END ....................................n_estimators=10; total time=   2.0s
[CV] END ....................................n_estimators=10; total time=   1.4s
[CV] END ....................................n_estimators=10; total time=   1.8s
[CV] END ....................................n_estimators=10; total time=   0.9s
[CV] END ....................................n_estimators=50; total time=   4.1s
[CV] END ....................................n_estimators=50; total time=   6.3s
[CV] END ....................................n_estimators=50; total time=   5.5s
[CV] END ....................................n_estimators=50; total time=   6.9s
[CV] END ....................................n_estimators=50; total time=   5.6s
[CV] END ...................................n_estimators=100; total time=  12.6s
[CV] END ...................................n_est

In [95]:
gridSearchAda.best_params_, gridSearchAda.best_score_

({'n_estimators': 100}, 0.8562169556184855)

#### GradientBoosting

This step is the randomized cross validation (CV) search for the GradientBoosting.

In [97]:
gbHyperParams = {'loss' : ['deviance', 'exponential'],
                 'n_estimators': randint(10, 500),
                 'max_depth': randint(1,10)}

gridSearchGB = RandomizedSearchCV(estimator=gb, param_distributions=gbHyperParams,
                                  n_iter=10, scoring='roc_auc', cv=3, verbose=2).fit(X_train, y_train)

Fitting 3 folds for each of 10 candidates, totalling 30 fits
[CV] END ........loss=deviance, max_depth=3, n_estimators=75; total time=  13.7s
[CV] END ........loss=deviance, max_depth=3, n_estimators=75; total time=  10.9s
[CV] END ........loss=deviance, max_depth=3, n_estimators=75; total time=  16.0s
[CV] END .......loss=deviance, max_depth=8, n_estimators=341; total time= 2.1min
[CV] END .......loss=deviance, max_depth=8, n_estimators=341; total time= 2.1min
[CV] END .......loss=deviance, max_depth=8, n_estimators=341; total time= 2.1min
[CV] END .......loss=deviance, max_depth=4, n_estimators=480; total time= 1.5min
[CV] END .......loss=deviance, max_depth=4, n_estimators=480; total time= 1.5min
[CV] END .......loss=deviance, max_depth=4, n_estimators=480; total time= 1.5min
[CV] END ....loss=exponential, max_depth=7, n_estimators=347; total time= 1.9min
[CV] END ....loss=exponential, max_depth=7, n_estimators=347; total time= 1.9min
[CV] END ....loss=exponential, max_depth=7, n_estimators=347; total time= 1.9min
[CV] END ........loss=deviance, max_depth=9, n_estimators=39; total time=  16.9s
[CV] END ........loss=deviance, max_depth=9, n_estimators=39; total time=  17.1s
[CV] END ........loss=deviance, max_depth=9, n_estimators=39; total time=  16.1s
[CV] END ....loss=exponential, max_depth=6, n_estimators=110; total time=  31.6s
[CV] END ....loss=exponential, max_depth=6, n_estimators=110; total time=  30.0s
[CV] END ....loss=exponential, max_depth=6, n_estimators=110; total time=  30.8s
[CV] END ....loss=exponential, max_depth=9, n_estimators=129; total time=  54.1s
[CV] END ....loss=exponential, max_depth=9, n_estimators=129; total time=  53.7s
[CV] END ....loss=exponential, max_depth=9, n_estimators=129; total time=  55.4s
[CV] END ....loss=exponential, max_depth=5, n_estimators=161; total time=  37.1s
[CV] END ....loss=exponential, max_depth=5, n_estimators=161; total time=  38.2s
[CV] END ....loss=exponential, max_depth=5, n_estimators=161; total time=  37.6s
[CV] END .......loss=deviance, max_depth=7, n_estimators=496; total time= 2.7min
[CV] END .......loss=deviance, max_depth=7, n_estimators=496; total time= 2.7min
[CV] END .......loss=deviance, max_depth=7, n_estimators=496; total time= 2.7min
[CV] END ....loss=exponential, max_depth=1, n_estimators=181; total time=   9.5s
[CV] END ....loss=exponential, max_depth=1, n_estimators=181; total time=   9.9s
[CV] END ....loss=exponential, max_depth=1, n_estimators=181; total time=  10.2s


In [98]:
gridSearchGB.best_params_, gridSearchGB.best_score_

({'loss': 'exponential', 'max_depth': 5, 'n_estimators': 161},
 0.860167795214304)

### Train models with the new hyperparameters

In this step we fit the models with the best parameters found by the search and check their cross-validation performance.

In [26]:
bestGb = gridSearchGB_best_estimator_.fit(X_train, y_train)
bestAda = gridSearchAda_best_estimator_.fit(X_train, y_train)



In [99]:
bestGb = gridSearchGB.best_estimator_.fit(X_train, y_train)
bestAda = gridSearchAda.best_estimator_.fit(X_train, y_train)

best_cvDict = cvDictGen(functions=[bestGb, bestAda], metric='roc_auc')
best_cvDict

{'GradientBoostingClassifier': [0.863453647332082, 0.0026127847416857145],
 'AdaBoostClassifier': [0.8591468159728505, 0.002580346154686282]}

The ROC_AUC score of AdaBoost with the parameters found by the randomized search.

In [100]:
y_pred_ada = bestAda.predict_proba(np.array(X_test.values))[:,1]
roc_auc_score(y_test, y_pred_ada, average='macro', sample_weight=None)



0.8610617616093497

The ROC_AUC score of Gradient Boosting with the parameters found by the randomized search.

In [101]:
y_pred_gb = bestGb.predict_proba(np.array(X_test.values))[:,1]
roc_auc_score(y_test, y_pred_gb, average='macro', sample_weight=None)



0.866200223773587

### Feature Transformation

In this step, we transform every feature $x$ in X_train into $\log(x+1)$, which can make skewed distributions closer to normal and reduce the impact of extreme values. It is particularly useful for features that span a large range of values.
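The effect of the $\log(x+1)$ transform on a skewed feature can be seen in a few lines of plain Python. The values in raw are hypothetical and chosen only to span several orders of magnitude.

```python
import math

# Hypothetical, heavily right-skewed feature values.
raw = [0, 1, 9, 99, 999, 9999, 99999]
transformed = [math.log1p(x) for x in raw]   # log(x + 1), defined at x = 0

print([round(t, 2) for t in transformed])

# The range shrinks from tens of thousands to roughly a dozen.
print(max(raw) - min(raw), round(max(transformed) - min(transformed), 2))

# The transform is strictly increasing, so the ordering of samples is preserved.
print(transformed == sorted(transformed))
```

Because the ordering is preserved, tree-based models, which split on thresholds, can find equivalent split points before and after the transformation, so large changes in their scores should not be expected from this step alone.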

In [4]:
import numpy as np
from sklearn.preprocessing import FunctionTransformer

transformer = FunctionTransformer(np.log1p)
X_train_transform = transformer.transform(np.array(X_train))

Then we fit the models on the transformed features, using the parameters found by the randomized search.

In [7]:
bestGbFitted_transformed = gridSearchGB.best_estimator_.fit(X_train_transform, y_train)
bestAdaFitted_transformed = gridSearchAda.best_estimator_.fit(X_train_transform, y_train)



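Then we check the cross-validation result. The ROC_AUC scores of the two models are essentially unchanged compared with the untransformed fit (about 0.8633 vs. 0.8635 for Gradient Boosting and 0.8591 vs. 0.8591 for AdaBoost). Note that cvDictGen is called with its default X_train=X and y_train=y, so cross_val_score refits each model on the untransformed features here.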

In [17]:
cvDictbestpara_transform = cvDictGen(functions=[bestGbFitted_transformed, bestAdaFitted_transformed], metric='roc_auc')
cvDictbestpara_transform


{'GradientBoostingClassifier': [0.8633363804012559, 0.0023194922989105814],
 'AdaBoostClassifier': [0.8590733597107562, 0.001772821157510175]}

In [18]:
transformer = FunctionTransformer(np.log1p)
X_test_transform = transformer.transform(np.array(X_test))

In [19]:
pred_y_ada_transform = bestAdaFitted_transformed.predict_proba(np.array(X_test_transform))[:,1]
roc_auc_score(y_test, pred_y_ada_transform , average='macro', sample_weight=None)

0.8610617616093497

In [20]:
pred_y_gb_transform = bestGbFitted_transformed.predict_proba(np.array(X_test_transform))[:,1]
roc_auc_score(y_test, pred_y_gb_transform , average='macro', sample_weight=None)

0.8662139534262193

### Voting-based ensemble model

The voting-based ensemble model combines the predictions of multiple individual models to make a final prediction. Each individual model is trained independently. When making predictions, each model generates its own prediction, and a voting mechanism then determines the final prediction.

There are different types of voting mechanisms used in ensemble models, and here we choose **soft voting**. It takes the probabilities predicted by each individual model for each class. The final prediction is determined by averaging the predicted probabilities across all models and selecting the class with the highest combined probability.

Here we use a weighted average and give Gradient Boosting twice the weight of AdaBoost.
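The weighted soft-voting rule can be sketched for a single sample in plain Python. The function name soft_vote and the probability values are hypothetical and chosen so that the weights change the outcome.

```python
def soft_vote(probas, weights):
    """Weighted average of per-model class probabilities for one sample."""
    total = sum(weights)
    n_classes = len(probas[0])
    return [sum(w * p[k] for w, p in zip(weights, probas)) / total
            for k in range(n_classes)]

# Hypothetical predicted probabilities [P(class 0), P(class 1)] for one sample.
gb_proba = [0.35, 0.65]
ada_proba = [0.70, 0.30]

avg = soft_vote([gb_proba, ada_proba], weights=[2, 1])
predicted_class = max(range(len(avg)), key=lambda k: avg[k])
print([round(p, 3) for p in avg], predicted_class)

# With equal weights the same two models would favor class 0 instead.
print([round(p, 3) for p in soft_vote([gb_proba, ada_proba], weights=[1, 1])])
```

The weights [2, 1] mirror the setting used in the next cell, where the first estimator is the Gradient Boosting model.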

In [21]:
from sklearn.ensemble import VotingClassifier

voting_transform = VotingClassifier(estimators=[('gb', bestGbFitted_transformed),
                                         ('ada', bestAdaFitted_transformed)], voting='soft',weights=[2,1])
voting_transform = voting_transform.fit(X_train_transform, y_train)



In [22]:
y_pred_vote_transform = voting_transform.predict_proba(np.array(X_test_transform))[:,1]
voting_transform.score(X_test_transform, y_test)

0.9355733333333334

The ROC_AUC score of the voting ensemble model (0.86627) is only marginally higher than that of the best single model, Gradient Boosting on the transformed features (0.86621).

In [23]:
roc_auc_score(y_test, y_pred_vote_transform , average='macro', sample_weight=None)

0.8662745151327746

We also try to ensemble the models trained with untransformed features. However, the ROC_AUC score again shows no meaningful improvement.

In [27]:
voting = VotingClassifier(estimators=[('gb', bestGb), ('ada', bestAda)],
                                 voting='soft',weights=[2,1])
voting = voting.fit(X_train, y_train)

y_pred_vote = voting.predict_proba(np.array(X_test.values))[:,1]
roc_auc_score(y_test, y_pred_vote, average='macro', sample_weight=None)



0.8663124310949759

### Testing on Real Test Dataset

Finally, we need to predict on the test set.

In [28]:
import pandas as pd

# Read the test dataset and drop the index column
test_data = pd.read_csv('/content/cs-test.csv').drop('Unnamed: 0', axis=1)
# Remove "-" from each column heading and convert it to lowercase
test_data.columns = [col.replace('-', '').lower() for col in test_data.columns]

#### Data Preprocessing on the test set

In [29]:
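# Drop the first column (the target column), matching X = training_data.iloc[:, 1:]
test_data = test_data.iloc[:, 1:]
# Fill missing values with the median of each column
test_data.fillna((test_data.median()), inplace=True)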

#### Prediction with the voting ensemble model

In [30]:
pred_y_voting = voting.predict_proba(np.array(test_data.values))[:,1]
print(len(pred_y_voting))



101503


In [31]:
output = pd.DataFrame({'ID':test_data.index, 'probability':pred_y_voting})
output.to_csv("./predictions.csv", index=False)

In [32]:
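# Apply the same log(x + 1) transform that was used for training
test_data_transform = transformer.transform(np.array(test_data))

# Predict on the transformed features, since voting_transform was fitted on them
pred_y_voting_transform = voting_transform.predict_proba(test_data_transform)[:,1]
print(len(pred_y_voting_transform))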

101503


In [33]:
output_transform = pd.DataFrame({'ID':test_data.index, 'probability':pred_y_voting_transform})
# Write the predictions of the transformed-feature model (not the earlier output frame)
output_transform.to_csv("./predictions_voting_Feature_transformation.csv", index=False)