### Name : Mandar Gurjar
### Roll No : 35
### Branch : IT
### Semester : 5
### Aim : To perform Logistic Regression

In [2]:
##import libraries

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_auc_score

In [3]:
## Read Data from csv file

data  = pd.read_csv('heart_disease.csv')
# print(data.head(5))
print(data.columns)
## Data is successfully imported

Index(['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',
       'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target'],
      dtype='object')


In [4]:
## Preprocessing begins here

## One-hot encode the categorical columns

data_encode = pd.get_dummies(data, columns=['sex', 'cp', 'restecg', 'exang', 'slope', 'thal'], drop_first=True)
# print(data_encode.head())
# Data encoded successfully


In [5]:
X = data_encode.drop('target',axis=1)
y = data_encode['target']

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3,random_state=42)


In [6]:
## Data scaling

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

In [7]:
## Define the model and perform logistic regression

model = LogisticRegression()
model.fit(X_train,y_train)

In [8]:
## Setup Prediction variables

y_pred = model.predict(X_test)

In [9]:
## Evaluate the logistic regression model

# Accuracy
print("Accuracy : ", accuracy_score(y_test, y_pred))

Accuracy :  0.8181818181818182


In [10]:

# Confusion Matrix
print("Confusion Matrix : \n",confusion_matrix(y_test,y_pred))

Confusion Matrix : 
 [[122  37]
 [ 19 130]]


In [11]:

# Classification Report
print("Classification Report : \n",classification_report(y_test,y_pred))


Classification Report : 
               precision    recall  f1-score   support

           0       0.87      0.77      0.81       159
           1       0.78      0.87      0.82       149

    accuracy                           0.82       308
   macro avg       0.82      0.82      0.82       308
weighted avg       0.82      0.82      0.82       308



In [12]:

# ROC AUC score

print("ROC AUC score : ",roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

ROC AUC score :  0.9007640032079693


# Heart Disease Prediction with Logistic Regression

## 1. Importing Libraries
The following libraries are imported for data processing, model building, and evaluation:
- `pandas`: For handling data in a DataFrame structure.
- `numpy`: For numerical operations.
- `train_test_split`: To split the dataset into training and testing sets.
- `StandardScaler`: For scaling features.
- `LogisticRegression`: For building a logistic regression model.
- `accuracy_score`, `confusion_matrix`, `classification_report`, `roc_auc_score`: For evaluating model performance.

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_auc_score
```

## 2. Loading the Dataset
The dataset is read from a CSV file named `heart_disease.csv` using `pandas`. The columns of the dataset are printed to verify successful import.

```python
data = pd.read_csv('heart_disease.csv')
print(data.columns)
```

## 3. One-Hot Encoding Categorical Values
The categorical features `sex`, `cp`, `restecg`, `exang`, `slope`, and `thal` are one-hot encoded. With `drop_first=True`, the first level of each feature is dropped, so the remaining dummy columns are not perfectly collinear (the dummy variable trap).

```python
data_encode = pd.get_dummies(data, columns=['sex', 'cp', 'restecg', 'exang', 'slope', 'thal'], drop_first=True)
```
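
To make the effect of `drop_first=True` concrete, here is a minimal sketch on a toy DataFrame. The values are made up for illustration and are not taken from `heart_disease.csv`.

```python
import pandas as pd

# Toy frame; values are hypothetical and only illustrate the encoding.
toy = pd.DataFrame({"cp": [0, 1, 2, 3], "age": [63, 37, 41, 56]})

# Without drop_first: one dummy column per level of `cp`.
full = pd.get_dummies(toy, columns=["cp"])

# With drop_first=True: the first level (cp_0) is dropped and becomes
# the implicit baseline, encoded as all remaining dummies being 0/False.
reduced = pd.get_dummies(toy, columns=["cp"], drop_first=True)

print(list(full.columns))     # ['age', 'cp_0', 'cp_1', 'cp_2', 'cp_3']
print(list(reduced.columns))  # ['age', 'cp_1', 'cp_2', 'cp_3']
```

A row with `cp == 0` is still identifiable in `reduced`, because all of `cp_1`, `cp_2`, and `cp_3` are false for it.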

## 4. Splitting the Data
The target variable `y` is separated from the feature variables `X`. The data is then split into training and testing sets using `train_test_split` with 70% for training and 30% for testing.

```python
X = data_encode.drop('target', axis=1)
y = data_encode['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```
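
As an optional variation that the notebook above does not use, `train_test_split` accepts a `stratify` argument that keeps the class proportions equal in both splits. The sketch below uses a toy array (the names `X_toy` and `y_toy` are hypothetical) to show the 70/30 split sizes and the preserved class ratio.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 rows, 30% of them in class 1 (made up for illustration).
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array([0] * 70 + [1] * 30)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42, stratify=y_toy
)

print(X_tr.shape, X_te.shape)  # (70, 2) (30, 2)
# Share of class 1 in each split; stratification keeps both at 30%.
print(y_tr.mean(), y_te.mean())
```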

## 5. Scaling the Data
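`StandardScaler` standardizes each feature to zero mean and unit variance. The scaler is fitted on the training set only, and the same transformation is then applied to the test set, so no information from the test data leaks into preprocessing.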

```python
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```
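
The fit-on-train, transform-on-test pattern can be checked on synthetic data. This is a small sketch with randomly generated arrays (not the heart disease features): the training columns end up with mean 0 and standard deviation 1, while the test columns are shifted and scaled with the training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic features, generated only for this illustration.
rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=10.0, size=(100, 2))
test = rng.normal(loc=50.0, scale=10.0, size=(40, 2))

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean/std from train
test_scaled = scaler.transform(test)        # reuses the train mean/std

# Training columns are exactly standardized.
print(train_scaled.mean(axis=0).round(6), train_scaled.std(axis=0).round(6))

# Test columns equal (test - train_mean) / train_std, so they are only
# approximately centered.
manual = (test - train.mean(axis=0)) / train.std(axis=0)
print(np.allclose(test_scaled, manual))
```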

## 6. Building and Training the Logistic Regression Model
A logistic regression model is created with scikit-learn's default settings and trained on the scaled training data.

```python
model = LogisticRegression()
model.fit(X_train, y_train)
```
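
For a binary target, the fitted model turns a linear score into a probability through the sigmoid function. The following sketch, which assumes a synthetic dataset from `make_classification` rather than the heart disease data, recomputes the class-1 probability by hand from `coef_` and `intercept_` and compares it with `predict_proba`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem, used only to illustrate the formula.
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
demo_model = LogisticRegression().fit(X_demo, y_demo)

# Linear score z = w . x + b, then sigmoid(z) = 1 / (1 + exp(-z)).
z = X_demo @ demo_model.coef_.ravel() + demo_model.intercept_[0]
manual_proba = 1.0 / (1.0 + np.exp(-z))

sk_proba = demo_model.predict_proba(X_demo)[:, 1]
print(np.allclose(manual_proba, sk_proba))
```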

## 7. Making Predictions
Predictions are made on the test data using the trained model.

```python
y_pred = model.predict(X_test)
```
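
For a binary model, `predict` corresponds to a 0.5 cut-off on the class-1 probability. The sketch below, again on synthetic data and with hypothetical variable names, shows that equivalence and how a lower cut-off (0.3 is an arbitrary choice here) labels at least as many samples positive.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem, used only for illustration.
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=1)
clf = LogisticRegression().fit(X_demo, y_demo)

proba = clf.predict_proba(X_demo)[:, 1]
default_pred = clf.predict(X_demo)

# Reproduce predict() with an explicit 0.5 threshold.
manual_pred = (proba > 0.5).astype(int)
print((default_pred == manual_pred).all())

# A lower threshold flags more samples as positive (higher recall,
# usually at the cost of precision).
lower_cutoff_pred = (proba > 0.3).astype(int)
print(default_pred.sum(), lower_cutoff_pred.sum())
```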

## 8. Evaluating the Model
Several evaluation metrics are used to assess the performance of the logistic regression model:

- **Accuracy**: The fraction of test samples that the model classifies correctly.
- **Confusion Matrix**: Displays the counts of true negative, false positive, false negative, and true positive predictions.
- **Classification Report**: Provides per-class precision, recall, and F1-score, along with the support (number of test samples) for each class.
- **ROC AUC Score**: Evaluates how well the predicted probabilities rank positive samples above negative ones, independent of any single decision threshold.

```python
# Accuracy
print("Accuracy: ", accuracy_score(y_test, y_pred))

# Confusion Matrix
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

# Classification Report
print("Classification Report:\n", classification_report(y_test, y_pred))

# ROC AUC Score
print("ROC AUC Score: ", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```
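
The headline metrics can be recomputed by hand from the confusion matrix printed in the notebook run, `[[122, 37], [19, 130]]`. This sketch assumes scikit-learn's layout for binary labels (rows are true classes, columns are predicted classes) and reports precision and recall for class 1.

```python
import numpy as np

# Confusion matrix from the notebook output above.
cm = np.array([[122, 37],
               [19, 130]])

# For binary labels, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)  # of predicted positives, how many are correct
recall = tp / (tp + fn)     # of actual positives, how many are found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
# accuracy=0.82 precision=0.78 recall=0.87 f1=0.82
```

These values match the class-1 row and the accuracy line of the classification report.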