👨‍🏫 **Jupyter By:** [Ahmad Ahmadi](https://www.linkedin.com/in/ahmad-ahmadi95/) 

🌏 **Website: [izlearn.ir](https://izlearn.ir)**

🔗 **[Support Me](#support-me)**

---

• **Table of Contents**

- [1. What is Scikit-learn](#1)
- [2. Why Scikit-Learn?](#2)
- [3. Scikit-Learn Workflow](#3)
- [4. What is Machine Learning](#4)
- [5. Heart Disease Classifier](#5)
- [6. Extra Tips](#6)

## 🔰 Scikit-Learn

<a name="1"></a>
### 1. What is Scikit-learn (sklearn)?

🔸 **Scikit-learn** is a Python **machine learning** library.

🔸 **Scikit-learn** helps us build **machine learning models** and **evaluate** them.

<a name="2"></a>
### 2. Why Scikit-Learn? 

🔸 Built on **NumPy**, **SciPy**, and **Matplotlib**, all part of the Python scientific stack.

🔸 Has many **built-in machine learning models**.

🔸 Provides many methods **to evaluate our machine learning models**.

<a name="3"></a>
### 3. Scikit-Learn Workflow

1. Get data ready
2. Select a model (to suit your problem)
3. Fit the model to the data and make predictions
4. Evaluate the model 
5. Improve through experimentation
6. Save & reload your trained model

<a name="4"></a>
### 4. What is Machine Learning 

🔸 **Machine learning** is a branch of **artificial intelligence** and **computer science** that focuses on ***using data*** and ***algorithms*** to enable **AI** to imitate the way that humans learn, gradually improving **its accuracy**. [source](https://www.ibm.com/topics/machine-learning)

---

<a name="5"></a>
### 5. Heart Disease Classifier

In [1]:
# 0. Import the packages
import numpy as np
import pandas as pd

In [2]:
# 1. Import the data you want to use
heart_disease = pd.read_csv("data/heart-disease.csv")

In [3]:
heart_disease.head()

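| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |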


In [4]:
# Create X (features matrix): every column except the label
X = heart_disease.drop(columns="target")

# Create y (labels)
y = heart_disease["target"]

In [5]:
# 2. Select a suitable model and hyperparameters
from sklearn.ensemble import RandomForestClassifier
rf_clf = RandomForestClassifier()

# We use the default hyperparameters
# Use rf_clf.get_params() to see all of them
rf_clf.get_params()

{'bootstrap': True,
 'ccp_alpha': 0.0,
 'class_weight': None,
 'criterion': 'gini',
 'max_depth': None,
 'max_features': 'sqrt',
 'max_leaf_nodes': None,
 'max_samples': None,
 'min_impurity_decrease': 0.0,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'monotonic_cst': None,
 'n_estimators': 100,
 'n_jobs': None,
 'oob_score': False,
 'random_state': None,
 'verbose': 0,
 'warm_start': False}
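
Any of the hyperparameters listed above can be changed, either when the model is created or afterwards. The following is a minimal sketch of both options; the specific values (`n_estimators=50`, `max_depth=5`, `random_state=42`) are arbitrary choices for illustration, not recommendations.

```python
from sklearn.ensemble import RandomForestClassifier

# Option 1: pass hyperparameters to the constructor
clf = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=42)

# Option 2: change them later with set_params() (do this before fitting)
clf.set_params(n_estimators=200)

# get_params() reflects the current settings
params = clf.get_params()
print(params["n_estimators"])
print(params["max_depth"])
```

Hyperparameters that are not set explicitly keep the defaults shown in the `get_params()` output.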

In [6]:
# 3. Split the data into training and test sets, then fit the model on the training set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.2)

In [7]:
print(f"X_train shape: {X_train.shape}")
print(f"X_test shape: {X_test.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"y_test shape: {y_test.shape}")

X_train shape: (242, 13)
X_test shape: (61, 13)
y_train shape: (242,)
y_test shape: (61,)
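
`train_test_split` shuffles the rows randomly, so the shapes stay the same between runs but the rows in each set (and therefore the scores later on) can change. The sketch below shows one way to make the split repeatable. It uses a small made-up array in place of the heart disease data, and the values `random_state=42` and `stratify=y_demo` are assumptions chosen for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for the real features and labels (made-up values)
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.array([0, 1] * 10)

# random_state fixes the shuffle; stratify keeps the class ratio in both sets
split_a = train_test_split(X_demo, y_demo, test_size=0.2,
                           random_state=42, stratify=y_demo)
split_b = train_test_split(X_demo, y_demo, test_size=0.2,
                           random_state=42, stratify=y_demo)

X_tr, X_te, y_tr, y_te = split_a
print(X_tr.shape, X_te.shape)

# Same random_state, same split: the two test sets are identical
same_split = np.array_equal(split_a[1], split_b[1])
print(same_split)
```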


In [8]:
rf_clf.fit(X_train, y_train)

In [9]:
# make a prediction
y_preds = rf_clf.predict(X_test)

In [10]:
y_preds

array([1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0,
       1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0])
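
Besides hard 0/1 labels, a fitted `RandomForestClassifier` can also report how confident it is in each prediction through `predict_proba`. The sketch below is a small illustration on synthetic data from `make_classification`, not on the heart disease dataset; the sample sizes and `random_state=0` are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary classification data (a stand-in for real data)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_demo, y_demo)

# One row per sample, one column per class; each row sums to 1
proba = clf.predict_proba(X_demo[:3])
labels = clf.predict(X_demo[:3])

print(proba)
print(labels)

# predict() returns the class with the highest probability
matches = np.array_equal(labels, clf.classes_[proba.argmax(axis=1)])
print(matches)
```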

In [11]:
# 4. Evaluate the model on the training data (how well it fits the data it has already seen)
rf_clf.score(X_train, y_train)

1.0

In [12]:
# Evaluate the model on the 'test' data
rf_clf.score(X_test, y_test)

0.7704918032786885

In [13]:
# More evaluation metrics
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

In [14]:
print(classification_report(y_test, y_preds))

              precision    recall  f1-score   support

           0       0.80      0.62      0.70        26
           1       0.76      0.89      0.82        35

    accuracy                           0.77        61
   macro avg       0.78      0.75      0.76        61
weighted avg       0.77      0.77      0.76        61



In [15]:
print(confusion_matrix(y_test, y_preds))

[[16 10]
 [ 4 31]]
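
The numbers in the classification report can be recomputed by hand from this confusion matrix. In scikit-learn's layout, rows are the true labels and columns are the predicted labels. The sketch below works through the arithmetic for class 1 in plain Python; the variable names are chosen only for this example.

```python
# Confusion matrix from the output above: rows = true labels, columns = predictions
cm = [[16, 10],
      [4, 31]]

tn, fp = cm[0]  # true class 0: predicted 0, predicted 1
fn, tp = cm[1]  # true class 1: predicted 0, predicted 1

accuracy = (tp + tn) / (tn + fp + fn + tp)
precision_1 = tp / (tp + fp)  # of all predicted 1s, how many were right
recall_1 = tp / (tp + fn)     # of all true 1s, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(f"accuracy:  {accuracy:.2f}")     # 0.77
print(f"precision: {precision_1:.2f}")  # 0.76
print(f"recall:    {recall_1:.2f}")     # 0.89
print(f"f1-score:  {f1_1:.2f}")         # 0.82
```

These match the row for class 1 and the accuracy line of the report.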


In [16]:
print(accuracy_score(y_test, y_preds))

0.7704918032786885


In [17]:
# 5. Improve the model
# 5.1. Try different values of 'n_estimators' (a hyperparameter)
np.random.seed(7)
for i in range(10, 110, 10):
    print(f"Trying model with {i} estimators...")
    rf_clf = RandomForestClassifier(n_estimators=i).fit(X_train, y_train)
    print(f"Model accuracy on test data: {rf_clf.score(X_test, y_test)*100:.2f}%")
    print("---")

Trying model with 10 estimators...
Model accuracy on test data: 81.97%
---
Trying model with 20 estimators...
Model accuracy on test data: 75.41%
---
Trying model with 30 estimators...
Model accuracy on test data: 78.69%
---
Trying model with 40 estimators...
Model accuracy on test data: 78.69%
---
Trying model with 50 estimators...
Model accuracy on test data: 77.05%
---
Trying model with 60 estimators...
Model accuracy on test data: 78.69%
---
Trying model with 70 estimators...
Model accuracy on test data: 77.05%
---
Trying model with 80 estimators...
Model accuracy on test data: 78.69%
---
Trying model with 90 estimators...
Model accuracy on test data: 80.33%
---
Trying model with 100 estimators...
Model accuracy on test data: 78.69%
---
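
Reading the best result off a long printout is error-prone, so it can help to collect the scores and pick the maximum in code. The sketch below does this with the accuracies copied from the output above; in a real run the dictionary would be filled inside the loop instead of typed by hand.

```python
# Test accuracies (in %) copied from the output above, keyed by n_estimators
scores = {10: 81.97, 20: 75.41, 30: 78.69, 40: 78.69, 50: 77.05,
          60: 78.69, 70: 77.05, 80: 78.69, 90: 80.33, 100: 78.69}

# Pick the key whose score is highest
best_n = max(scores, key=scores.get)
print(f"Best n_estimators: {best_n} ({scores[best_n]:.2f}%)")
```

With only 61 test samples, each sample is worth about 1.64 percentage points, so the gap between 81.97% and 78.69% corresponds to just two test samples.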


---

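💪🏼 According to the output above, the model initialized with **n_estimators=10** has the highest test accuracy, **81.97%**. Note that after the loop, `rf_clf` holds the last model trained (**n_estimators=100**, 78.69%), and that is the model saved in the next step.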

---

In [18]:
# 6. Save a model and load it
import pickle

In [19]:
with open("random_forest_classifier_model_01.pkl", mode="wb") as f:
    pickle.dump(rf_clf, f)  # the 'with' block closes the file automatically

In [20]:
with open("random_forest_classifier_model_01.pkl", mode="rb") as f:
    loaded_model = pickle.load(f)
loaded_model

In [21]:
loaded_model.score(X_test, y_test)

0.7868852459016393
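
`joblib` is a commonly used alternative to `pickle` for saving scikit-learn models. The sketch below shows a save and reload round trip with `joblib.dump` and `joblib.load`. It trains a small model on synthetic data and writes to a temporary directory; the file name `demo_model.joblib` is made up for the example.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small model on synthetic data (a stand-in for the real classifier)
X_demo, y_demo = make_classification(n_samples=100, n_features=5, random_state=0)
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_demo, y_demo)

# Save to a temporary directory and load the model back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.joblib")  # hypothetical file name
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should make exactly the same predictions
same_predictions = bool((restored.predict(X_demo) == model.predict(X_demo)).all())
print(same_predictions)
```

As with `pickle`, only load model files from sources you trust, and prefer loading them with the same scikit-learn version that saved them.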

<a name="6"></a>
### 6. Extra Tips

🔸 How to see the current **sklearn version** we're using?

🔸 How to disable **warnings** in the Jupyter environment?

In [22]:
import sklearn

In [23]:
# Show the version of 'sklearn' we're using
sklearn.show_versions()


System:
    python: 3.9.5 (tags/v3.9.5:0a7dcbd, May  3 2021, 17:27:52) [MSC v.1928 64 bit (AMD64)]
executable: c:\users\pegasus\desktop\my_projects\myenv2\scripts\python.exe
   machine: Windows-10-10.0.19045-SP0

Python dependencies:
      sklearn: 1.5.2
          pip: 24.2
   setuptools: 56.0.0
        numpy: 2.0.2
        scipy: 1.13.1
       Cython: None
       pandas: 2.2.2
   matplotlib: 3.9.2
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 20
         prefix: libscipy_openblas
       filepath: C:\Users\PEGASUS\Desktop\my_projects\myenv2\Lib\site-packages\numpy.libs\libscipy_openblas64_-caad452230ae4ddb57899b8b3a33c55c.dll
        version: 0.3.27
threading_layer: pthreads
   architecture: Haswell

       user_api: openmp
   internal_api: openmp
    num_threads: 20
         prefix: vcomp
       filepath: C:\Users\PEGASUS\Desktop\my_projects\myenv2\Lib\site-packages\sklearn\.lib
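
If only the scikit-learn version number is needed, rather than the full system report, the `__version__` attribute is a shorter option:

```python
import sklearn

# Just the version string of the installed scikit-learn
version = sklearn.__version__
print(version)
```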

In [24]:
# How to ignore warnings in Jupyter
import warnings
warnings.filterwarnings("ignore")
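
`warnings.filterwarnings("ignore")` silences every warning for the rest of the session, which can hide useful messages. The sketch below shows a narrower approach using the standard library's `warnings.catch_warnings`: one warning category is ignored, and only inside a `with` block. The `noisy` function is a hypothetical helper written just for this demonstration.

```python
import warnings


def noisy():
    """Hypothetical helper that emits a warning, for demonstration only."""
    warnings.warn("this API is deprecated", DeprecationWarning)
    return "done"


# record=True collects any warning that still gets through,
# so we can check afterwards that nothing did.
with warnings.catch_warnings(record=True) as caught:
    # Ignore only DeprecationWarning, and only inside this block
    warnings.simplefilter("ignore", category=DeprecationWarning)
    result = noisy()

print(result)
print(len(caught))
```

Once the `with` block ends, the previous warning settings are restored.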

---
<a name='support-me'></a>
### THE END 

🔸 If you want to get in touch with **me** (Ahmad Ahmadi), click this [link](https://t.me/izlearn_support) to reach me on **Telegram**.

🔸 If you have a **LinkedIn** account, **[this](https://linkedin.com/in/ahmad-ahmadi95/)** is my page. I will be happy to have you as a connection.

🔸 Finally, you can **donate🎁** and **support** our team using this **[link](https://zarinp.al/454855)** (Iran) or this **[link](https://paylink.payment4.com/fa/izlearn/2572d5b8-9ece-48ba-8014-3192ed9274e2)** (outside Iran). 🤙🏼