In [None]:
pip install keras==2.12.0

Collecting keras==2.12.0
  Downloading keras-2.12.0-py2.py3-none-any.whl (1.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 6.8 MB/s eta 0:00:00
Installing collected packages: keras
  Attempting uninstall: keras
    Found existing installation: keras 2.15.0
    Uninstalling keras-2.15.0:
      Successfully uninstalled keras-2.15.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.15.0 requires keras<2.16,>=2.15.0, but you have keras 2.12.0 which is incompatible.
Successfully installed keras-2.12.0


In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import pandas as pd
from sklearn.model_selection import train_test_split
import warnings

warnings.filterwarnings('ignore')

# Load the dataset and split into features and target variable
df = pd.read_csv('/content/train_df.csv')
x = df.drop('success_indicator', axis=1)
y = df['success_indicator']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=77)

# Two-step pipeline; each grid below swaps its own estimator into the 'classifier' step
pipeline = Pipeline([('scaler', StandardScaler()), ('classifier', None)])

param_grids = [
    {'classifier': [LogisticRegression()],
     'classifier__C': [0.1, 1, 10]},
    {'classifier': [RandomForestClassifier()],
     'classifier__n_estimators': [100, 200, 300]},
    {'classifier': [MLPClassifier()],
     'classifier__hidden_layer_sizes': [(100,), (50, 50), (25, 25, 25)],
     'classifier__activation': ['relu', 'tanh'],
     'classifier__solver': ['adam']},
    {'classifier': [SVC()],
     'classifier__C': [0.1, 1, 10],
     'classifier__kernel': ['linear', 'rbf']},
    {'classifier': [KNeighborsClassifier()],
     'classifier__n_neighbors': [3, 5, 7],
     'classifier__weights': ['uniform', 'distance']}
]

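# Track the best pipeline across all grids by mean cross-validated accuracy
best_model = None
best_score = 0

for param_grid in param_grids:
    # Tune one model family at a time with 5-fold cross-validation
    gs = GridSearchCV(pipeline, param_grid, cv=5, scoring='accuracy')
    gs.fit(x_train, y_train)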

    # Score the refit best estimator on the held-out test set
    y_pred = gs.predict(x_test)
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)

    print("Best Parameters:", gs.best_params_)
    print("Best Score:", gs.best_score_)
    print("Accuracy:", accuracy)
    print("Precision:", precision)
    print("Recall:", recall)
    print("F1 Score:", f1)
    print("***************************************************")

    # Select on the cross-validation score, not on the test-set metrics
    if gs.best_score_ > best_score:
        best_model = gs.best_estimator_
        best_score = gs.best_score_

print("Best Model:", best_model)
print("Best Score:", best_score)

Best Parameters: {'classifier': LogisticRegression(C=0.1), 'classifier__C': 0.1}
Best Score: 0.781875
Accuracy: 0.78875
Precision: 0.8231481481481482
Recall: 0.8581081081081081
F1 Score: 0.8402646502835539
***************************************************
Best Parameters: {'classifier': RandomForestClassifier(), 'classifier__n_estimators': 100}
Best Score: 0.83984375
Accuracy: 0.843125
Precision: 0.8558476881233001
Recall: 0.9111969111969112
F1 Score: 0.8826554464703131
***************************************************
Best Parameters: {'classifier': MLPClassifier(hidden_layer_sizes=(50, 50)), 'classifier__activation': 'relu', 'classifier__hidden_layer_sizes': (50, 50), 'classifier__solver': 'adam'}
Best Score: 0.82984375
Accuracy: 0.839375
Precision: 0.8524886877828054
Recall: 0.9092664092664092
F1 Score: 0.8799626342830452
***************************************************
Best Parameters: {'classifier': SVC(C=10), 'classifier__C': 10, 'classifier__kernel': 'rbf'}
Best Score: 0.

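Among the results shown above, the RandomForestClassifier has the highest mean cross-validated accuracy, which is the `Best Score` that GridSearchCV reports and the value the selection loop compares. The SVC output is cut off and no KNeighborsClassifier result appears, so this comparison covers only the first three models. The RandomForestClassifier results are: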

- **Best Parameters**: {'classifier': RandomForestClassifier(), 'classifier__n_estimators': 100}
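- **Best Score** (mean 5-fold cross-validated accuracy on the training split): 0.83984375
- **Accuracy** (held-out test set, as are the three metrics below): 0.843125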
- **Precision**: 0.8558476881233001
- **Recall**: 0.9111969111969112
- **F1 Score**: 0.8826554464703131
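
To make the selection rule concrete, here is a minimal sketch of how the winner is picked by `best_score_`, the mean cross-validated score of the best parameter setting in each search. It runs on synthetic data from `make_classification` rather than `train_df.csv`, and the names `X_demo`, `y_demo`, `grids`, `searches`, and `winner` are illustrative assumptions, not part of the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not the notebook's dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

pipe = Pipeline([('scaler', StandardScaler()), ('classifier', LogisticRegression())])
grids = [
    {'classifier': [LogisticRegression(max_iter=1000)],
     'classifier__C': [0.1, 1]},
    {'classifier': [RandomForestClassifier(random_state=0)],
     'classifier__n_estimators': [20, 50]},
]

# One search per model family, as in the loop above
searches = []
for grid in grids:
    search = GridSearchCV(pipe, grid, cv=5, scoring='accuracy')
    search.fit(X_demo, y_demo)
    searches.append(search)

# The winner is the search with the highest mean cross-validated accuracy
winner = max(searches, key=lambda s: s.best_score_)

# best_score_ is the mean test-fold score at the best parameter setting
mean_cv = winner.cv_results_['mean_test_score'][winner.best_index_]
print(type(winner.best_estimator_.named_steps['classifier']).__name__,
      winner.best_score_)
```

`GridSearchCV` also accepts a list of grids as `param_grid`, so a single search over `grids` would pick the same kind of winner without an explicit loop. The loop form is only needed when per-family metrics should be printed along the way.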

Here are some reasons why the RandomForestClassifier might have been chosen as the best model:

1. **High Accuracy**: The RandomForestClassifier has the highest test accuracy (0.843125) of the three models with complete results, ahead of the MLPClassifier (0.839375) and LogisticRegression (0.78875). Its mean cross-validated accuracy (0.83984375) is also the highest of the three.

2. **Balanced Precision and Recall**: Precision (0.8558) and recall (0.9112) are both high. About 86% of the samples predicted positive are truly positive (precision), and the model finds about 91% of the actual positive cases (recall).

3. **High F1 Score**: The F1 score, the harmonic mean of precision and recall, is 0.8827. Because the harmonic mean is pulled toward the smaller of the two values, a high F1 means neither precision nor recall is being sacrificed for the other.

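4. **Robustness to Overfitting**: A random forest averages many decision trees, each trained on a bootstrap sample with random feature subsets, which reduces variance compared with a single decision tree. This is a general property of the method; here it is consistent with the test accuracy (0.843) being close to the cross-validated accuracy (0.840).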

5. **Capability to Capture Complex Relationships**: Decision trees split on feature thresholds, so a forest can model non-linear relationships and feature interactions without manual feature engineering. Its lead over LogisticRegression (0.843 vs 0.789 test accuracy) is consistent with non-linear structure in this dataset.
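
As a quick check on item 3, the F1 score can be recomputed from the precision and recall printed for the RandomForestClassifier. This sketch uses only the numbers reported above and the standard harmonic-mean formula.

```python
# Precision and recall as printed for the RandomForestClassifier above
precision = 0.8558476881233001
recall = 0.9111969111969112

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.8827
```

The result agrees with the reported F1 Score of 0.8826554464703131.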

Overall, the RandomForestClassifier is the strongest of the models with complete results on every reported metric, although its margin over the MLPClassifier is small (0.843 vs 0.839 test accuracy).
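
The overfitting claim in item 4 can be probed directly by comparing training and validation accuracy for a single tree and a forest. The sketch below is one way to do that, assuming synthetic data from `make_classification` in place of `train_df.csv`; `X_demo`, `y_demo`, `models`, and `gaps` are hypothetical names, and which model shows the smaller gap depends on the data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the notebook's dataset)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)

models = {
    'single tree': DecisionTreeClassifier(random_state=0),
    'random forest': RandomForestClassifier(n_estimators=50, random_state=0),
}

# Gap between mean training accuracy and mean validation accuracy per model
gaps = {}
for name, model in models.items():
    cv = cross_validate(model, X_demo, y_demo, cv=5,
                        scoring='accuracy', return_train_score=True)
    train_acc = cv['train_score'].mean()
    val_acc = cv['test_score'].mean()
    gaps[name] = train_acc - val_acc
    print(f"{name}: train={train_acc:.3f} validation={val_acc:.3f} "
          f"gap={gaps[name]:.3f}")
```

A large gap between training and validation accuracy is a sign of overfitting. Running the same comparison on `x_train` and `y_train` would show whether the forest's advantage holds for this dataset.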