## 🌏 Steps to follow in scikit-learn

#### 1️⃣ Get the data ready 📂
Load your dataset, split into X (features) and y (labels/answers).

#### 2️⃣ Pick the right model 🎯
Choose the algorithm that fits your problem (classification, regression, clustering, etc.).

#### 3️⃣ Train the model 🏋️‍♂️
Feed your X and y to the model so it learns patterns.

#### 4️⃣ Test the model 🧪
Check how well it predicts using test data.

#### 5️⃣ Make it better 🚀
Tune settings (hyperparameters), try different algorithms, or get better data.

#### 6️⃣ Save it 💾
Store the trained model so you can use it later without retraining.



In [1]:
# Import all Stuff (●'◡'●)
# Standard imports
import numpy as np
import pandas as pd
import sklearn
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
%matplotlib inline

### 1️⃣ Get the data ready 📂
Load your dataset, split into X (features) and y (labels/answers).

In [5]:
heart_disease = pd.read_csv("data/heart-disease.csv")
heart_disease

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
3,56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
298,57,0,0,140,241,0,1,123,1,0.2,1,0,3,0
299,45,1,3,110,264,0,1,132,0,1.2,1,0,3,0
300,68,1,0,144,193,1,1,141,0,3.4,1,2,3,0
301,57,1,0,130,131,0,1,115,1,1.2,1,1,3,0


In [6]:
# Create X (features matrix)
X = heart_disease.drop("target" , axis=1)

# Create Y (labels vector)
Y = heart_disease["target"]
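
A quick sanity check after splitting features from labels is to look at the shapes and the class balance. The sketch below uses a tiny made-up DataFrame (`toy`, with invented values) rather than the real heart-disease file, so it only illustrates the pattern.

```python
import pandas as pd

# Tiny made-up frame (hypothetical values): feature columns plus a "target" column
toy = pd.DataFrame({
    "age": [63, 37, 41, 56],
    "chol": [233, 250, 204, 236],
    "target": [1, 1, 0, 0],
})

X_toy = toy.drop("target", axis=1)  # every column except the label
y_toy = toy["target"]               # the label column only

print(X_toy.shape, y_toy.shape)     # rows must match between X and y
print(y_toy.value_counts())         # how many samples per class
```

If the row counts differ, or one class is very rare, it is worth fixing that before training.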

### 2️⃣ Pick the right model 🎯
Choose the algorithm that fits your problem (classification, regression, clustering, etc.).

In [14]:
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=100) # (clf is short for classifier)

# We'll keep the default hyperparameters
# clf.get_params()
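
One possible way to decide between algorithms is to compare them with cross-validation. This is only a sketch: it uses synthetic data from `make_classification` (an assumption, not the heart-disease data), and `LogisticRegression` is just an example of a second candidate.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption): 300 samples, 13 features
X_syn, y_syn = make_classification(n_samples=300, n_features=13, random_state=0)

candidates = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
}

cv_means = {}
for name, model in candidates.items():
    # 5-fold cross-validation: train on 4 folds, score on the 5th, repeat
    scores = cross_val_score(model, X_syn, y_syn, cv=5)
    cv_means[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

The model with the higher mean score is a reasonable starting point, though small differences may be noise.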

### 3️⃣ Train the model 🏋️‍♂️
Feed your X and y to the model so it learns patterns.

In [11]:
X_train , X_test , Y_train , Y_test = train_test_split(X , Y , test_size=0.2)
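
The split above has no `random_state`, so every run produces a different train/test split and slightly different scores. A minimal sketch of a reproducible, stratified split follows; the arrays are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 100 samples, 2 features, 40 of class 0 and 60 of class 1
X_syn = np.arange(200).reshape(100, 2)
y_syn = np.array([0] * 40 + [1] * 60)

# random_state fixes the shuffle; stratify keeps the class ratio in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42, stratify=y_syn
)

print(X_tr.shape, X_te.shape)
print("Share of class 1 in test set:", y_te.mean())
```

Calling `train_test_split` again with the same `random_state` returns the same rows, which makes results easier to compare between runs.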

In [16]:
clf.fit(X_train , Y_train); # This will Train it 😎

In [19]:
# Make predictions on the test set
Y_preds = clf.predict(X_test)
Y_preds

array([1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0,
       0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0])

In [20]:
from sklearn.metrics import accuracy_score
print("Test Accuracy:", accuracy_score(Y_test, Y_preds))


Test Accuracy: 0.8852459016393442


### 4️⃣ Test the model 🧪
Check how well it predicts using test data.

In [21]:
clf.score(X_train , Y_train)

1.0

In [22]:
clf.score(X_test , Y_test)

0.8852459016393442

In [23]:
from sklearn.metrics import classification_report , confusion_matrix , accuracy_score

print(classification_report(Y_test , Y_preds))

              precision    recall  f1-score   support

           0       0.91      0.80      0.85        25
           1       0.87      0.94      0.91        36

    accuracy                           0.89        61
   macro avg       0.89      0.87      0.88        61
weighted avg       0.89      0.89      0.88        61
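
`confusion_matrix` is imported above but not used. As a small sketch of what it returns, here it is on hand-made labels (`y_true` and `y_hat` are invented, not the notebook's predictions).

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hand-made labels for illustration only
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_hat = [0, 1, 0, 1, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_hat)
# Rows are the true classes, columns are the predicted classes
tn, fp, fn, tp = cm.ravel()

print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print("Accuracy:", accuracy_score(y_true, y_hat))
```

For binary labels the four cells are the true negatives, false positives, false negatives, and true positives, which is the raw material behind the precision and recall columns in the report.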



### 5️⃣ Make it better 🚀
Tune settings (hyperparameters), try different algorithms, or get better data.

In [39]:
# Try different numbers of n_estimators

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from termcolor import colored
from collections import defaultdict

np.random.seed(42)
results = []
acc_map = defaultdict(list)

# Training + logs
for i in range(10, 100, 10):
    clf = RandomForestClassifier(n_estimators=i).fit(X_train, Y_train)
    acc = round(clf.score(X_test, Y_test) * 100, 2)
    results.append((i, acc))
    
    emoji = "🏆" if acc >= 90 else "📈" if acc >= 80 else "⚠️"
    color = "green" if acc >= 90 else "yellow" if acc >= 80 else "red"
    print(colored(f"{i} estimators → {acc:.2f}% {emoji}", color, attrs=["bold"]))
    
    acc_map[acc].append(i)

# Summary
print(colored("\n📊 Summary", "magenta", attrs=["bold", "underline"]))
best_acc = max(acc_map)
worst_acc = min(acc_map)

for acc, ests in sorted(acc_map.items(), key=lambda x: x[0], reverse=True):
    if acc == best_acc:
        desc = "🥇 Highest accuracy"
        color = "green"
        emoji = "🏆"
    elif acc == worst_acc:
        desc = "💀 Lowest accuracy"
        color = "red"
        emoji = "💀"
    else:
        desc = "📈 Mid-range accuracy"
        color = "yellow"
        emoji = "📈"
    print(colored(f"{desc}: {acc:.2f}% → {', '.join(map(str, ests))} estimators {emoji}", color, attrs=["bold"]))


[1m[32m10 estimators → 90.16% 🏆[0m
[1m[33m20 estimators → 83.61% 📈[0m
[1m[32m30 estimators → 90.16% 🏆[0m
[1m[33m40 estimators → 88.52% 📈[0m
[1m[32m50 estimators → 90.16% 🏆[0m
[1m[33m60 estimators → 83.61% 📈[0m
[1m[33m70 estimators → 85.25% 📈[0m
[1m[33m80 estimators → 88.52% 📈[0m
[1m[33m90 estimators → 88.52% 📈[0m
[4m[1m[35m
📊 Summary[0m
[1m[32m🥇 Highest accuracy: 90.16% → 10, 30, 50 estimators 🏆[0m
[1m[33m📈 Mid-range accuracy: 88.52% → 40, 80, 90 estimators 📈[0m
[1m[33m📈 Mid-range accuracy: 85.25% → 70 estimators 📈[0m
[1m[31m💀 Lowest accuracy: 83.61% → 20, 60 estimators 💀[0m


### 6️⃣ Save it 💾
Store the trained model so you can use it later without retraining.

In [40]:
# 6. Save a model and load it ✅
import pickle

# Use a context manager so the file is closed after writing
with open("practice-heart-disease.pkl", "wb") as f:
    pickle.dump(clf, f) # Save ✅

In [42]:
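# Import and use it. 🧠
# Note: this loads "random_forest_model_1.pkl", a different file from the one
# saved above, so the score below belongs to that model, not to the clf just saved.
with open("random_forest_model_1.pkl", "rb") as f:
    loaded_model = pickle.load(f)
loaded_model.score(X_test , Y_test)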

0.9672131147540983
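
To check that a save/load round trip really gives back the same model, one option is to compare predictions before and after. This sketch uses synthetic data and a temporary directory with a hypothetical file name (`demo-model.pkl`) so it does not touch any real files.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data and a small model (assumptions for the demo)
X_syn, y_syn = make_classification(n_samples=200, n_features=13, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_syn, y_syn)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo-model.pkl"  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump(model, f)            # save
    with open(path, "rb") as f:
        restored = pickle.load(f)        # load

# The restored model should predict exactly what the original predicts
same = bool((restored.predict(X_syn) == model.predict(X_syn)).all())
print("Predictions identical:", same)
```

Only unpickle files you trust, since loading a pickle can run arbitrary code, and load with the same scikit-learn version that saved the model where possible.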

### ---------------------------------------------------------------THE END-----------------------------------------------------------