# Description of Heart Disease:

**Heart disease is often used interchangeably with cardiovascular disease (CVD), the broader class of conditions that affect the heart and blood vessels. It encompasses a range of disorders, including coronary artery disease, heart failure, valvular heart disease, and the cardiomyopathies. Cardiovascular disease is a leading cause of morbidity and mortality worldwide and places a significant burden on healthcare systems.**

**Common risk factors for heart disease include high blood pressure, high cholesterol levels, smoking, obesity, diabetes, and a sedentary lifestyle. Over time, these risk factors can lead to the accumulation of plaque in the arteries, reducing blood flow to the heart and increasing the likelihood of heart attacks, heart failure, and other cardiovascular events.**

# Solving Heart Disease with Machine Learning:

**Machine learning (ML) offers promising avenues for addressing heart disease through early detection, risk prediction, and personalized treatment strategies. Here's how machine learning can contribute to tackling heart disease:**

- **Risk Prediction Models:** ML algorithms can analyze vast datasets to identify patterns and correlations that may not be apparent through traditional methods. By incorporating a patient's medical history, lifestyle factors, and genetic information, ML models can predict an individual's risk of developing heart disease.
- **Early Detection and Diagnosis:** ML algorithms can analyze medical imaging data, such as cardiac MRI or CT scans, to detect subtle signs of heart disease at an early stage. This enables healthcare professionals to intervene proactively and implement preventive measures.
- **Personalized Treatment Plans:** ML can assist in developing personalized treatment plans by considering individual patient characteristics, responses to medications, and lifestyle factors. This approach allows for more targeted and effective interventions, improving overall patient outcomes.
- **Remote Monitoring:** Wearable devices and remote monitoring solutions equipped with ML algorithms can continuously assess a person's vital signs and detect anomalies indicative of cardiovascular issues. This real-time monitoring can facilitate early intervention and reduce the risk of complications.
- **Rehabilitation and Lifestyle Management:** ML-based applications can provide personalized recommendations for lifestyle modifications, including diet and exercise plans. These applications can adapt and evolve based on the individual's progress, fostering long-term adherence to healthy habits.
- **Data-Driven Research and Drug Development:** Machine learning can accelerate research by analyzing large-scale genetic and clinical datasets to identify novel biomarkers, drug targets, and treatment strategies. This data-driven approach may lead to the development of more effective therapies for heart disease.

**While machine learning holds great promise in the fight against heart disease, its integration into healthcare requires careful consideration of ethical and privacy concerns. Collaboration between healthcare professionals, data scientists, and regulatory bodies is crucial to ensure the responsible and effective implementation of these technologies in the cardiovascular domain.**

```python
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
```

```python
df = pd.read_csv('/kaggle/input/heart-disease/heart_disease_data.csv')
```

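```python
df.head()
```

| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |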


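The column names are terse abbreviations. The mapping below is a reference sketch based on how these fields are commonly documented for the UCI Cleveland heart disease data, which this CSV appears to derive from; treat every description as an assumption and check it against the dataset's own notes. `COLUMN_DESCRIPTIONS` and `describe_columns` are hypothetical names introduced here.

```python
# Hypothetical data dictionary for the 14 columns in heart_disease_data.csv.
# Descriptions follow the usual UCI Cleveland documentation (an assumption, not from the notebook).
COLUMN_DESCRIPTIONS = {
    "age": "age in years",
    "sex": "sex (1 = male, 0 = female)",
    "cp": "chest pain type (integer code)",
    "trestbps": "resting blood pressure on admission (mm Hg)",
    "chol": "serum cholesterol (mg/dl)",
    "fbs": "fasting blood sugar > 120 mg/dl (1 = true, 0 = false)",
    "restecg": "resting electrocardiographic result (integer code)",
    "thalach": "maximum heart rate achieved",
    "exang": "exercise-induced angina (1 = yes, 0 = no)",
    "oldpeak": "ST depression induced by exercise relative to rest",
    "slope": "slope of the peak exercise ST segment (integer code)",
    "ca": "number of major vessels colored by fluoroscopy",
    "thal": "thal test result (integer code; see the dataset notes)",
    "target": "heart disease label (1 or 0), the prediction target",
}


def describe_columns(columns):
    """Return 'name: description' lines, flagging any column missing from the dictionary."""
    return [f"{name}: {COLUMN_DESCRIPTIONS.get(name, 'UNKNOWN (not documented)')}"
            for name in columns]


# Usage with a few of the columns shown by df.head(); in the notebook, pass df.columns instead.
for line in describe_columns(["age", "sex", "cp", "target"]):
    print(line)
```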
```python
df.shape
```

```
(303, 14)
```

```python
df.describe()
```

| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 |
| mean | 54.366337 | 0.683168 | 0.966997 | 131.623762 | 246.264026 | 0.148515 | 0.528053 | 149.646865 | 0.326733 | 1.039604 | 1.39934 | 0.729373 | 2.313531 | 0.544554 |
| std | 9.082101 | 0.466011 | 1.032052 | 17.538143 | 51.830751 | 0.356198 | 0.52586 | 22.905161 | 0.469794 | 1.161075 | 0.616226 | 1.022606 | 0.612277 | 0.498835 |
| min | 29.0 | 0.0 | 0.0 | 94.0 | 126.0 | 0.0 | 0.0 | 71.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 47.5 | 0.0 | 0.0 | 120.0 | 211.0 | 0.0 | 0.0 | 133.5 | 0.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 |
| 50% | 55.0 | 1.0 | 1.0 | 130.0 | 240.0 | 0.0 | 1.0 | 153.0 | 0.0 | 0.8 | 1.0 | 0.0 | 2.0 | 1.0 |
| 75% | 61.0 | 1.0 | 2.0 | 140.0 | 274.5 | 0.0 | 1.0 | 166.0 | 1.0 | 1.6 | 2.0 | 1.0 | 3.0 | 1.0 |
| max | 77.0 | 1.0 | 3.0 | 200.0 | 564.0 | 1.0 | 2.0 | 202.0 | 1.0 | 6.2 | 2.0 | 4.0 | 3.0 | 1.0 |


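The summary shows features on very different scales: `chol` spans 126 to 564, while binary columns such as `sex` and `fbs` never leave 0 to 1. The z-score, z = (x - mean) / std, expresses a value in standard deviations from the mean and is the idea behind feature standardization. The sketch plugs in the `chol` mean and std copied from the `df.describe()` table. Note that `describe()` reports the sample standard deviation (ddof=1), whereas scikit-learn's `StandardScaler` divides by the population standard deviation (ddof=0), so the two give slightly different values.

```python
def z_score(value, mean, std):
    """Standardize one value: how many standard deviations it lies from the mean."""
    return (value - mean) / std


# Mean and std of `chol`, copied from the df.describe() output.
CHOL_MEAN = 246.264026
CHOL_STD = 51.830751

print(round(z_score(564.0, CHOL_MEAN, CHOL_STD), 2))  # max cholesterol -> 6.13
print(round(z_score(126.0, CHOL_MEAN, CHOL_STD), 2))  # min cholesterol -> -2.32
```

A maximum more than six standard deviations above the mean, sitting next to 0/1 features, is one reason an unscaled gradient-based solver such as lbfgs can need many iterations to converge.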
```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   age       303 non-null    int64
 1   sex       303 non-null    int64
 2   cp        303 non-null    int64
 3   trestbps  303 non-null    int64
 4   chol      303 non-null    int64
 5   fbs       303 non-null    int64
 6   restecg   303 non-null    int64
 7   thalach   303 non-null    int64
 8   exang     303 non-null    int64
 9   oldpeak   303 non-null    float64
 10  slope     303 non-null    int64
 11  ca        303 non-null    int64
 12  thal      303 non-null    int64
 13  target    303 non-null    int64
dtypes: float64(1), int64(13)
memory usage: 33.3 KB
```


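`df.info()` shows every column stored as an integer (apart from the float `oldpeak`), including categorical codes such as `cp`. Logistic regression treats an integer column as a quantity, which implies, for example, that chest pain type 3 is "more" than type 1 by a fixed step. A common alternative is one-hot encoding. This sketch assumes `cp`, `restecg` and `thal` are nominal (unordered) categories, which the notebook does not state, and runs `pandas.get_dummies` on a tiny hypothetical frame; in the notebook you would pass `df` itself. `one_hot_encode` is a hypothetical helper name.

```python
import pandas as pd


def one_hot_encode(frame, columns=("cp", "restecg", "thal")):
    """Replace each listed integer-coded column with 0/1 indicator columns.

    The default column list is an assumption about which features are nominal.
    """
    return pd.get_dummies(frame, columns=list(columns), dtype=int)


# Tiny hypothetical frame using the same column names as heart_disease_data.csv.
demo = pd.DataFrame({"age": [63, 37, 41], "cp": [3, 2, 1],
                     "restecg": [0, 1, 0], "thal": [1, 2, 2]})
encoded = one_hot_encode(demo)
print(encoded.columns.tolist())
```

Encoding the full frame before `train_test_split` keeps the training and test sets on the same set of columns; scikit-learn's `OneHotEncoder` inside a `ColumnTransformer` is the pipeline-based alternative.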
```python
df.isna().sum()
```

```
age         0
sex         0
cp          0
trestbps    0
chol        0
fbs         0
restecg     0
thalach     0
exang       0
oldpeak     0
slope       0
ca          0
thal        0
target      0
dtype: int64
```

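`isna()` reports no missing values, but the `describe()` table hints that missingness could be encoded as ordinary numbers: `ca` reaches 4.0 and `thal` goes down to 0.0, while the UCI documentation, as commonly cited, describes `ca` as 0 to 3 and `thal` as a three-category code. Whether these values are placeholders is an assumption to verify, not something the notebook establishes. The hypothetical helper `count_out_of_range` counts values outside an assumed valid range; the demo frame is made up for illustration.

```python
import pandas as pd

# Assumed valid ranges (inclusive), based on common UCI documentation; adjust as needed.
EXPECTED_RANGES = {"ca": (0, 3), "thal": (1, 3)}


def count_out_of_range(frame, ranges=EXPECTED_RANGES):
    """Count, per column, how many values fall outside the expected inclusive range."""
    return {col: int((~frame[col].between(lo, hi)).sum())
            for col, (lo, hi) in ranges.items()}


# Hypothetical rows: one ca value of 4 and one thal value of 0.
demo = pd.DataFrame({"ca": [0, 2, 4, 1], "thal": [2, 0, 3, 1]})
print(count_out_of_range(demo))  # {'ca': 1, 'thal': 1}
```

If the counts on the real `df` are non-zero, options include replacing those codes with `NaN` and imputing them, or dropping the few affected rows.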
```python
df['target'].value_counts()
```

```
target
1    165
0    138
Name: count, dtype: int64
```

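The classes are roughly, but not exactly, balanced. A useful reference point for any accuracy score on this data is the majority-class baseline: a "model" that always predicts the most common label. Using the counts from `value_counts()`:

```python
# Class counts copied from df['target'].value_counts().
counts = {1: 165, 0: 138}

total = sum(counts.values())
majority_label = max(counts, key=counts.get)
baseline_accuracy = counts[majority_label] / total

print(total, majority_label, round(baseline_accuracy, 3))  # 303 1 0.545
```

A classifier should clearly beat about 54.5% accuracy to be useful here. The figure also matches the mean of `target` (0.544554) in the `describe()` table, since that mean is the fraction of rows labelled 1.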
```python
x = df.drop('target', axis=1)  # features: the 13 columns other than the label
y = df['target']               # label to predict
```

```python
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=100)
```

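`train_test_split` as called here shuffles without stratification, so the test set's class ratio can drift from the overall 165:138 split. Passing `stratify=y` asks scikit-learn to preserve that ratio in both parts. The sketch below uses synthetic labels with the same class counts instead of the real CSV (an assumption made so it runs on its own); applying `stratify` to the real data would also change the accuracy figures reported later.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the notebook's class counts: 165 positives, 138 negatives.
y_demo = np.array([1] * 165 + [0] * 138)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder single feature

_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=100, stratify=y_demo
)

print(len(y_tr), len(y_te))                            # 242 61
print(round(y_demo.mean(), 3), round(y_te.mean(), 3))  # overall vs. test positive rate
```

With 303 rows and `test_size=0.2`, the test set holds 61 rows (0.2 × 303 = 60.6, rounded up), leaving 242 for training; these are the same sizes as in the unstratified split.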
```python
model = LogisticRegression()  # defaults: solver='lbfgs', max_iter=100
```

```python
model.fit(x_train, y_train)
```

```
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
```


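The warning means the default lbfgs solver hit its iteration limit (`max_iter=100`) before converging. As the message suggests, the usual remedies are scaling the features, raising `max_iter`, or both. A minimal sketch that does both wraps `StandardScaler` and `LogisticRegression` in a pipeline and scores it with stratified 5-fold cross-validation. It runs on synthetic data from `make_classification` (an assumption, since the Kaggle CSV is not available here); with the real data you would pass `x` and `y`, and the scores would differ from the figures in this notebook. `build_model` is a hypothetical helper name.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_model(max_iter=1000):
    """Standardize features, then fit logistic regression (scaler is refit on each training fold)."""
    return make_pipeline(StandardScaler(), LogisticRegression(max_iter=max_iter))


# Synthetic stand-in shaped like the notebook's feature matrix: 303 rows, 13 features.
X_demo, y_demo = make_classification(n_samples=303, n_features=13, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=100)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning raised during fitting
    scores = cross_val_score(build_model(), X_demo, y_demo, cv=cv, scoring="accuracy")

convergence_warnings = [w for w in caught if issubclass(w.category, ConvergenceWarning)]
print("fold accuracies:", scores.round(3))
print("mean accuracy:", round(scores.mean(), 3))
print("convergence warnings:", len(convergence_warnings))
```

Keeping the scaler inside the pipeline matters: fitting `StandardScaler` on the full dataset before splitting would leak test-set statistics into training.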
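```python
# accuracy_score expects (y_true, y_pred); accuracy is symmetric, so the value is unchanged
y_train_pred = model.predict(x_train)
train_acc = accuracy_score(y_train, y_train_pred)
train_acc
```

```
0.8636363636363636
```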

```python
y_test_pred = model.predict(x_test)
test_acc = accuracy_score(y_test, y_test_pred)
test_acc
```

```
0.8524590163934426
```
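
With 303 rows and `test_size=.2`, the test set has 61 patients, so a test accuracy of 0.8524590163934426 corresponds to 52 correct predictions (52/61), and the training accuracy of 0.8636363636363636 to 209 of 242. A single 61-row test set is small. A rough normal-approximation (Wald) standard error for an accuracy estimate, sqrt(p(1 - p) / n), gives a feel for how much the figure could move under a different split; this is a back-of-the-envelope sketch, not a formal confidence interval.

```python
import math


def accuracy_standard_error(correct, total):
    """Normal-approximation standard error of an accuracy estimate: sqrt(p * (1 - p) / n)."""
    p = correct / total
    return math.sqrt(p * (1 - p) / total)


# 52 of 61 test rows correct, matching the test accuracy reported above.
acc = 52 / 61
se = accuracy_standard_error(52, 61)
print(round(acc, 4), round(se, 3))  # 0.8525 0.045
```

A standard error of roughly 4.5 percentage points means the 1.1-point gap between training (0.864) and test (0.852) accuracy is well within sampling noise, while both scores sit well above the 54.5% obtained by always predicting the majority class (165 of 303).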