## Basic Classification and Prediction Using scikit-learn

Scikit-learn is an efficient and easy-to-use library for predictive data analysis. Most models follow the same three steps: import the estimator class, fit it to your data, and use it to predict on new data. Let's look at how each step works!

In [36]:
# if you haven't installed scikit-learn yet, run the following command in a notebook cell to install it
# (the PyPI package is named scikit-learn; the old "sklearn" package name is deprecated)
! pip install scikit-learn
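After installing, you can confirm that scikit-learn imports correctly and check which version you have. A minimal sketch:

```python
# Confirm that scikit-learn is importable and print the installed version
import sklearn

print(sklearn.__version__)
```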

### Read in data

In [15]:
# pandas is a data-analysis library built around the DataFrame, a labeled table of rows and columns
import pandas as pd

data = pd.read_csv('heart.csv')
data.head()

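| | age | sex | cp | trtbps | chol | fbs | restecg | thalachh | exng | oldpeak | output |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 1 |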
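Before modeling, it is worth checking the table's size, column types, missing values, and label balance. So that the sketch runs without heart.csv, it rebuilds only the five rows printed above into a hypothetical `sample` DataFrame; on the full dataset you would call the same methods on `data`.

```python
import pandas as pd

# Hypothetical stand-in: the five rows shown by data.head(), so this runs without heart.csv
sample = pd.DataFrame(
    [[63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 1],
     [37, 1, 2, 130, 250, 0, 1, 187, 0, 3.5, 1],
     [41, 0, 1, 130, 204, 0, 0, 172, 0, 1.4, 1],
     [56, 1, 1, 120, 236, 0, 1, 178, 0, 0.8, 1],
     [57, 0, 0, 120, 354, 0, 1, 163, 1, 0.6, 1]],
    columns=['age', 'sex', 'cp', 'trtbps', 'chol', 'fbs', 'restecg',
             'thalachh', 'exng', 'oldpeak', 'output'])

print(sample.shape)                      # (number of rows, number of columns)
print(sample.dtypes)                     # data type of each column
print(sample.isnull().sum())             # count of missing values per column
print(sample['output'].value_counts())   # how many rows carry each label
```

On these five rows every label is 1, which is why a quick look at `value_counts()` on the full `data` is useful before training.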


Split the data into the feature variables (X) and the output label (y):

In [16]:
# separate the feature variables (X) from the output label (y)
# iloc[:, 0:9] selects the first nine columns by position (age through exng)
X = data.iloc[:, 0:9]
y = data['output']

Now X contains only the feature values, and y contains only the labels (1 = higher chance of a heart attack, 0 = lower chance).
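Because `iloc[:, 0:9]` selects columns by position, X holds age through exng and leaves out oldpeak. Listing the columns by name selects the same nine features and keeps working if the CSV's column order ever changes. The sketch below compares the two approaches on a hypothetical two-row `sample` rebuilt from the rows shown above.

```python
import pandas as pd

# Hypothetical stand-in: the first two rows shown by data.head()
sample = pd.DataFrame(
    [[63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 1],
     [37, 1, 2, 130, 250, 0, 1, 187, 0, 3.5, 1]],
    columns=['age', 'sex', 'cp', 'trtbps', 'chol', 'fbs', 'restecg',
             'thalachh', 'exng', 'oldpeak', 'output'])

# Selection by position: column indexes 0 through 8
X_by_position = sample.iloc[:, 0:9]

# Selection by name: the same nine features, independent of column order
feature_cols = ['age', 'sex', 'cp', 'trtbps', 'chol', 'fbs', 'restecg',
                'thalachh', 'exng']
X_by_name = sample[feature_cols]
y = sample['output']

print(list(X_by_position.columns) == feature_cols)
print(X_by_position.equals(X_by_name))
```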

In [17]:
X.head()

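| | age | sex | cp | trtbps | chol | fbs | restecg | thalachh | exng |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 |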


In [18]:
y

0      1
1      1
2      1
3      1
4      1
      ..
298    0
299    0
300    0
301    0
302    0
Name: output, Length: 303, dtype: int64

Our data is ready now. Let's build the Logistic Regression classifier.

### Logistic Regression
One common classifier is logistic regression, which scikit-learn provides as the `LogisticRegression` class in `sklearn.linear_model`. The following cells walk through each step and apply it to our demo data. To build a logistic regression classifier for your own data, you can follow the same steps in order.
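For reference, logistic regression models the probability that a row $\mathbf{x}$ of X belongs to class 1 as the logistic (sigmoid) function of a weighted sum of its features, and `predict()` returns 1 when that probability exceeds 0.5:

```latex
\hat{p} = P(y = 1 \mid \mathbf{x})
        = \sigma(\mathbf{w}^\top \mathbf{x} + b)
        = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x} + b)}},
\qquad
\hat{y} =
\begin{cases}
  1 & \text{if } \hat{p} > 0.5 \\
  0 & \text{otherwise}
\end{cases}
```

The weights $\mathbf{w}$ and intercept $b$ are learned by `fit()` and can be inspected afterwards as `clf.coef_` and `clf.intercept_`. By default scikit-learn also adds an L2 penalty on $\mathbf{w}$, whose strength is set by the parameter `C` (default 1.0; smaller values mean stronger regularization).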

In [19]:
# Step 1: import logistic regression from sklearn. You will not see any output since we are just importing modules and libraries
from sklearn.linear_model import LogisticRegression

In [20]:
# Step 2: create a logistic regression classifier by instantiating the class
# Note: you won't see any output since we are just creating the classifier

clf = LogisticRegression()

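If you want to get the same results every time, you can set `random_state` to an integer. (For `LogisticRegression`, `random_state` only matters for the `'sag'`, `'saga'`, and `'liblinear'` solvers, which shuffle the data; the default `'lbfgs'` solver is already deterministic.)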

In [21]:
clf = LogisticRegression(random_state=0)
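To check that the default `'lbfgs'` solver gives identical results without `random_state`, the sketch below fits it twice and compares the learned coefficients. It uses synthetic data from `make_classification` (9 features, like X) as a stand-in for the heart data; `X_demo` and `y_demo` are hypothetical names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for X and y: 200 samples, 9 features, 2 classes
X_demo, y_demo = make_classification(n_samples=200, n_features=9, random_state=0)

# Fit the default (lbfgs) classifier twice without setting random_state
coef_a = LogisticRegression().fit(X_demo, y_demo).coef_
coef_b = LogisticRegression().fit(X_demo, y_demo).coef_

# lbfgs does not shuffle the data, so both fits learn the same weights
print(np.allclose(coef_a, coef_b))
```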

Now you have an initialized logistic regression classifier. You can fit it to the data:

In [22]:
# Step 3: fit the classifier to our data
# Note: fit() trains the model in place; if the solver stops before converging,
# scikit-learn prints a ConvergenceWarning like the one below

clf.fit(X, y)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
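The warning above means the default `'lbfgs'` solver hit its iteration limit (`max_iter=100` by default) before converging. This is common when features sit on very different scales, such as `chol` values in the hundreds next to 0/1 columns like `sex`. As the message suggests, you can raise `max_iter` (for example `LogisticRegression(max_iter=1000)`) or standardize the features first. The sketch below takes the second route, chaining `StandardScaler` and `LogisticRegression` with `make_pipeline`; it uses synthetic data with deliberately stretched columns as a hypothetical stand-in for X and y.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X and y, with some columns stretched to very different scales
X_demo, y_demo = make_classification(n_samples=300, n_features=9, random_state=0)
X_demo = X_demo * np.array([1, 1, 1, 100, 300, 1, 1, 150, 1])

# Standardize each feature to mean 0 and unit variance, then fit logistic regression
pipe = make_pipeline(StandardScaler(), LogisticRegression(random_state=0))
pipe.fit(X_demo, y_demo)

# n_iter_ reports how many iterations the solver actually used
print(pipe.named_steps['logisticregression'].n_iter_)

# The pipeline applies the same scaling automatically before predicting
print(pipe.predict(X_demo[:3]))
```

Keeping the scaler inside the pipeline means new patients are scaled with the statistics learned from the training data, so you never have to scale them by hand.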


The classifier is now ready to make predictions on new data. Call the `predict()` method to do it:

In [None]:
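# suppose we have a new patient whose variable values look like this
# (a DataFrame with the same column names as X avoids scikit-learn's "feature names" warning)
new_data = pd.DataFrame([[57, 0, 0, 140, 241, 0, 1, 123, 1]], columns=X.columns)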

In [24]:
# Step 4: call predict() to classify the new patient

prediction_lr = clf.predict(new_data)

print(prediction_lr)

[0]

Our classifier predicts that this patient (female, age 57, chest pain type 0, resting blood pressure 140 mm Hg, cholesterol 241 mg/dl, fasting blood sugar not above 120 mg/dl, resting ECG result 1, maximum heart rate 123, with exercise-induced angina) is NOT likely to have a heart attack.
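Two follow-ups are common at this point. First, because `clf` was fit on every row, a single prediction says nothing about accuracy on unseen patients; holding out a test set with `train_test_split` gives an estimate. Second, `predict_proba()` returns the estimated probability of each class, which is more informative than the 0/1 label. The sketch uses synthetic data from `make_classification` as a hypothetical stand-in for the 303-row X and y; `test_size=0.2` and `stratify` are common choices, not requirements.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the heart data: 303 rows, 9 features, binary label
X_demo, y_demo = make_classification(n_samples=303, n_features=9, random_state=0)

# Hold out 20% of the rows; stratify keeps the class balance similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0)

# Fit only on the training rows
model = make_pipeline(StandardScaler(), LogisticRegression(random_state=0))
model.fit(X_train, y_train)

# Accuracy on rows the model never saw during fitting
y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))

# Probabilities of class 0 and class 1 for the first three test rows;
# predict() picks class 1 only when its probability is above 0.5
proba = model.predict_proba(X_test[:3])
print(proba)
```

On the real data you would pass `X` and `y` to `train_test_split` in place of `X_demo` and `y_demo`.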