# Introduction to scikit-learn (sklearn)
This notebook introduces scikit-learn, a widely used Python library for machine 
learning. It covers the library's core concepts and functionality, including 
data preprocessing, model selection, and evaluation metrics, with practical 
examples and code snippets to help you get started on your own machine learning 
projects.

What we are going to cover:

0. An end-to-end scikit-learn workflow
1. Getting the data ready
2. Choose the right estimator/algorithm for our problem
3. Fit the model/algorithm and use it to make predictions on our data 
4. Evaluating a model
5. Improve a model
6. Save and load a trained model
7. Putting it all together


## 0. An end-to-end scikit-learn workflow

In [15]:
# 1. Get the data ready
# Import the libraries
%matplotlib inline
import matplotlib.pyplot as plt 
import numpy as np
import pandas as pd

In [19]:
heart_disease = pd.read_csv('data sets/heart-disease.csv')
heart_disease.head()

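|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|----|------|--------|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |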


In [20]:
print(heart_disease.columns.tolist())


['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target']


In [21]:
# Prepare the data for a machine learning model by separating features from labels.
# Create x (features matrix): every column except "target"
x = heart_disease.drop("target", axis=1)

# Create y (labels)
y = heart_disease["target"]
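
The feature/label split above can be checked on a small frame without needing the CSV file. The following is a minimal sketch: the `toy` DataFrame and the names `features` and `labels` are illustrative assumptions, with values copied from three rows and three columns of the `head()` output shown earlier.

```python
import pandas as pd

# Illustrative stand-in for heart_disease: three rows and three columns
# copied from the head() output above (not the full dataset).
toy = pd.DataFrame({
    "age": [63, 37, 41],
    "chol": [233, 250, 204],
    "target": [1, 1, 1],
})

# drop() returns a new DataFrame; the original `toy` keeps its "target" column.
features = toy.drop("target", axis=1)

# Selecting a single column returns a pandas Series.
labels = toy["target"]

print(features.columns.tolist())
print(features.shape, labels.shape)
```

Because `drop` is not applied in place, the original frame can still be used later, for example to inspect the label distribution.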


In [None]:
# 2. Choose the right model and hyperparameters
# RandomForestClassifier lives in the sklearn.ensemble module.
# Instantiating it with no arguments uses the default hyperparameters.
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier()


# We will keep the default hyperparameters for now.



# clf.get_params() returns a dict of the hyperparameters currently set on `clf`.
# It is useful for inspecting the defaults or for checking the exact values the model is using.
clf.get_params()
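
To show how the returned dictionary relates to the constructor arguments, here is a minimal sketch. The names `model` and `tuned` and the specific values (`n_estimators=50`, `max_depth=5`, `random_state=42`) are assumptions chosen for illustration, not settings used elsewhere in this notebook.

```python
from sklearn.ensemble import RandomForestClassifier

# A model built with no arguments reports scikit-learn's defaults.
model = RandomForestClassifier()
params = model.get_params()
print(sorted(params)[:5])  # a few of the hyperparameter names

# Keyword arguments passed to the constructor override the defaults
# (the values here are arbitrary, for illustration only).
tuned = RandomForestClassifier(n_estimators=50, random_state=42)
print(tuned.get_params()["n_estimators"])

# set_params() changes hyperparameters on an existing instance.
tuned.set_params(max_depth=5)
print(tuned.get_params()["max_depth"])
```

Hyperparameters only take effect when the model is fitted, so changing them after calling `fit` requires fitting again.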

