# Heart Disease Prediction using Support Vector Machine

### Importing Necessary Libraries

In [2]:
import numpy as np 
import pandas as pd
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

### Loading the dataset

In [7]:
heart_data = pd.read_csv('heart.csv')
heart_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1025 entries, 0 to 1024
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       1025 non-null   int64  
 1   sex       1025 non-null   int64  
 2   cp        1025 non-null   int64  
 3   trestbps  1025 non-null   int64  
 4   chol      1025 non-null   int64  
 5   fbs       1025 non-null   int64  
 6   restecg   1025 non-null   int64  
 7   thalach   1025 non-null   int64  
 8   exang     1025 non-null   int64  
 9   oldpeak   1025 non-null   float64
 10  slope     1025 non-null   int64  
 11  ca        1025 non-null   int64  
 12  thal      1025 non-null   int64  
 13  target    1025 non-null   int64  
dtypes: float64(1), int64(13)
memory usage: 112.2 KB


### Selecting the Target Variable

In [10]:
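# Features: every column except the label
X = heart_data.drop(columns='target')
# Label column (the prediction function below treats 0 as "no heart disease")
Y = heart_data['target']
heart_data['target'].value_counts()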

target
1    526
0    499
Name: count, dtype: int64

### Splitting the data

In [13]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, stratify=Y, random_state=2)
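
With `stratify=Y`, the split aims to keep the class ratio of the full dataset in both partitions. As a rough check under that assumption, the expected test-set counts can be worked out from the class totals printed by `value_counts()`. The helper name `expected_stratified_counts` is illustrative, and simple rounding stands in for scikit-learn's exact allocation rule.

```python
# Rough arithmetic check of a stratified 80/20 split.
# Class totals come from the value_counts() output; rounding only
# approximates how scikit-learn allocates rows to each partition.
class_counts = {1: 526, 0: 499}
test_size = 0.2


def expected_stratified_counts(counts, fraction):
    """Return the approximate number of rows per class in the held-out part."""
    return {label: round(n * fraction) for label, n in counts.items()}


total_rows = sum(class_counts.values())
expected_test = expected_stratified_counts(class_counts, test_size)

print("total rows:", total_rows)
print("expected test rows per class:", expected_test)
print("expected test size:", sum(expected_test.values()))
```

For this dataset the sketch gives roughly 105 rows of class 1 and 100 rows of class 0 in the test set, 205 in total, which is 20% of the 1025 rows.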

### Fitting the Model

In [16]:
# SVC is already imported in the first cell; fit a linear-kernel classifier
clf1 = SVC(kernel="linear")
clf1.fit(X_train, Y_train)
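
A linear-kernel SVM classifies a row by the sign of a weighted sum of its features plus an intercept. The sketch below illustrates that rule with made-up numbers: `w`, `b`, and the three-feature rows are hypothetical and are not the values learned by `clf1` (after fitting with a linear kernel, those are exposed as `clf1.coef_` and `clf1.intercept_`).

```python
import numpy as np

# Hypothetical weights and intercept for a 3-feature linear classifier.
w = np.array([0.8, -0.5, 0.1])
b = -0.2


def linear_decision(X, weights, intercept):
    """Signed score w.x + b for each row of X."""
    return X @ weights + intercept


def linear_predict(X, weights, intercept):
    """Label 1 where the score is positive, otherwise label 0."""
    return (linear_decision(X, weights, intercept) > 0).astype(int)


X_demo = np.array([
    [1.0, 0.0, 2.0],  # score = 0.8 + 0.0 + 0.2 - 0.2 = 0.8, so class 1
    [0.0, 1.0, 1.0],  # score = 0.0 - 0.5 + 0.1 - 0.2 = -0.6, so class 0
])
scores = linear_decision(X_demo, w, b)
labels = linear_predict(X_demo, w, b)
print(scores, labels)
```

Because the score is a plain weighted sum, features with large numeric ranges (such as `chol` or `trestbps`) can dominate it unless the inputs are scaled.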

### Testing Accuracy of the model

In [19]:
from sklearn.metrics import classification_report

y_pred = clf1.predict(X_test)

# classification_report expects the true labels first, then the predictions
classification_rep = classification_report(Y_test, y_pred)
print("Classification Report:")
print(classification_rep)

Classification Report:
              precision    recall  f1-score   support

           0       0.90      0.72      0.80       100
           1       0.78      0.92      0.84       105

    accuracy                           0.82       205
   macro avg       0.84      0.82      0.82       205
weighted avg       0.84      0.82      0.82       205
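
`classification_report` takes the true labels as its first argument and the predictions as its second, and per-class precision and recall depend on that order. The plain-Python sketch below uses small made-up label lists rather than the heart data to show that swapping the two arguments exchanges precision and recall. `precision_recall` is an illustrative helper, not a scikit-learn function.

```python
def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class, computed from raw counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    predicted = sum(1 for p in y_pred if p == label)
    actual = sum(1 for t in y_true if t == label)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


# Made-up labels: four rows of class 0 and two rows of class 1.
y_true_demo = [0, 0, 0, 0, 1, 1]
y_pred_demo = [0, 0, 1, 1, 1, 1]

correct_order = precision_recall(y_true_demo, y_pred_demo, label=0)
swapped_order = precision_recall(y_pred_demo, y_true_demo, label=0)

print("true labels first:", correct_order)
print("arguments swapped:", swapped_order)
```

With the true labels first, class 0 has precision 1.0 and recall 0.5; with the arguments swapped the two values trade places, and the reported support would count predicted rather than actual rows.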



### Defining a Prediction Function

- Function definition: `predict_heart_disease` takes two arguments: `model` (the trained SVM model) and `input_data` (a tuple holding the 13 feature values for a single person, in the same column order as the training data).
- Convert input data to a NumPy array: `np.asarray(input_data)` turns the tuple into an array, the kind of input the model's `predict` method works with.
- Reshape input data: the array is reshaped to (1, -1), that is, one row with as many columns as there are values. The -1 tells NumPy to infer the number of columns from the number of elements.
- Make prediction: the model's `predict` method is called on the reshaped array, and the result is stored in `prediction`.
- Print prediction result: the function prints "The Person does not have Heart Disease" if `prediction[0]` is 0, and "The Person has Heart Disease" otherwise.

In [28]:
def predict_heart_disease(model, input_data):
    # Change the input data to a numpy array
    input_data_as_numpy_array = np.asarray(input_data)
    
    # Reshape the numpy array as we are predicting for only one instance
    input_data_reshaped = input_data_as_numpy_array.reshape(1, -1)
    
    # Make the prediction
    prediction = model.predict(input_data_reshaped)
    
    # Print the prediction result
    if prediction[0] == 0:
        print('The Person does not have Heart Disease')
    else:
        print('The Person has Heart Disease')
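
Because `reshape(1, -1)` accepts any number of values, a tuple with a missing or extra feature is only caught once it reaches the model. A possible variant, sketched below, checks the feature count first and returns the label instead of printing it. `FEATURE_NAMES`, `predict_label`, and the `ThresholdModel` stand-in are assumptions for illustration; in the notebook the trained `clf1` would be passed instead.

```python
import numpy as np

# Column order used for training: the 13 non-target columns listed by info().
FEATURE_NAMES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
                 "thalach", "exang", "oldpeak", "slope", "ca", "thal"]


def predict_label(model, input_data, n_features=len(FEATURE_NAMES)):
    """Validate one row of features and return the model's predicted label."""
    row = np.asarray(input_data, dtype=float)
    if row.ndim != 1 or row.size != n_features:
        raise ValueError(f"expected {n_features} values, got shape {row.shape}")
    return int(model.predict(row.reshape(1, -1))[0])


class ThresholdModel:
    """Hypothetical stand-in with a scikit-learn-style predict method."""

    def predict(self, X):
        # Toy rule on the 'oldpeak' column; not a learned or medical rule.
        return (X[:, FEATURE_NAMES.index("oldpeak")] < 1.0).astype(int)


sample = (62, 0, 0, 140, 268, 0, 0, 160, 0, 3.6, 0, 2, 2)
label = predict_label(ThresholdModel(), sample)
print("predicted label:", label)
```

Returning the label keeps the printing decision with the caller, which makes the function easier to reuse and to test.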

### Prediction on a Sample Input

In [37]:
input_data = (62, 0, 0, 140, 268, 0, 0, 160, 0, 3.6, 0, 2, 2)
predict_heart_disease(clf1, input_data)

The Person does not have Heart Disease

### Interactive Prediction from User Input




In [55]:
import numpy as np

# --- Assumes clf1 is already trained and available in this session ---
# from sklearn.svm import SVC
# clf1 = SVC(kernel="linear")
# clf1.fit(X_train, Y_train)

def ask_numeric(prompt, cast=int, min_val=None, max_val=None, allowed=None):
    """Ask for input until a valid numeric/choice is provided."""
    while True:
        try:
            raw = input(prompt).strip()
            if allowed is not None:
                if raw not in allowed:
                    print(f"  → Enter one of: {', '.join(allowed)}")
                    continue
                return cast(raw)
            val = cast(raw)
            if min_val is not None and val < min_val:
                print(f"  → Value must be >= {min_val}")
                continue
            if max_val is not None and val > max_val:
                print(f"  → Value must be <= {max_val}")
                continue
            return val
        except ValueError:
            print("  → Invalid entry; try again.")

def show_feature_help():
    print("\nEnter patient values. If unsure, enter the closest estimate.\n")
    print("Feature descriptions / expected values (based on common heart datasets):")
    print(" age: integer (years) — e.g. 29 - 77")
    print(" sex: 0 = female, 1 = male")
    print(" cp (chest pain type): 0,1,2,3  (larger usually = more typical angina)")
    print(" trestbps: resting blood pressure (mm Hg) — typical 90 - 200")
    print(" chol: serum cholesterol (mg/dl) — typical 100 - 600")
    print(" fbs: fasting blood sugar > 120 mg/dl (0 = false, 1 = true)")
    print(" restecg: 0,1,2 (resting ECG result categories)")
    print(" thalach: maximum heart rate achieved — typical 70 - 210")
    print(" exang: exercise induced angina (0 = no, 1 = yes)")
    print(" oldpeak: ST depression induced by exercise relative to rest (float, e.g. 0.0 - 6.0)")
    print(" slope: slope of the peak exercise ST segment (0,1,2)")
    print(" ca: number of major vessels colored by fluoroscopy (0 - 3, integer)")
    print(" thal: 0 = normal, 1/2/3 = different types of thalassemia encoding (dataset-specific)\n")

def collect_user_input():
    show_feature_help()
    age = ask_numeric("age (years): ", int, min_val=1, max_val=120)
    sex = ask_numeric("sex (0=female, 1=male): ", int, allowed=["0","1"])
    cp = ask_numeric("cp (0-3): ", int, min_val=0, max_val=3)
    trestbps = ask_numeric("trestbps (resting BP mm Hg): ", int, min_val=50, max_val=300)
    chol = ask_numeric("chol (serum chol mg/dl): ", int, min_val=50, max_val=1000)
    fbs = ask_numeric("fbs (>120 mg/dl) (0=no,1=yes): ", int, allowed=["0","1"])
    restecg = ask_numeric("restecg (0-2): ", int, min_val=0, max_val=2)
    thalach = ask_numeric("thalach (max heart rate achieved): ", int, min_val=30, max_val=250)
    exang = ask_numeric("exang (exercise induced angina: 0=no, 1=yes): ", int, allowed=["0","1"])
    oldpeak = ask_numeric("oldpeak (ST depression, e.g. 0.0): ", float, min_val=0.0, max_val=10.0)
    slope = ask_numeric("slope (0-2): ", int, min_val=0, max_val=2)
    ca = ask_numeric("ca (0-3): ", int, min_val=0, max_val=3)
    thal = ask_numeric("thal (commonly 0-3): ", int, min_val=0, max_val=3)

    return (age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpeak, slope, ca, thal)

def explain_risk(features):
    """Print a short explanation of which inputs are in 'higher risk' ranges."""
    age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpeak, slope, ca, thal = features

    print("\nRisk hints based on entered values (general heuristics):")
    if age >= 50:
        print(" - Age: ≥50 years increases baseline heart risk.")
    else:
        print(" - Age: younger than 50 (lower baseline risk).")

    if sex == 1:
        print(" - Sex: male (dataset-level risk often higher for men).")
    else:
        print(" - Sex: female.")

    if cp in (2,3):
        print(" - Chest pain: type 2/3 often indicates more concerning chest pain patterns.")
    else:
        print(" - Chest pain: type 0/1 less typical for major ischemic pain (dataset-encoding dependent).")

    if trestbps >= 140:
        print(f" - Resting BP: {trestbps} mmHg (elevated; higher cardiovascular risk).")
    else:
        print(f" - Resting BP: {trestbps} mmHg (normal/moderate).")

    if chol >= 240:
        print(f" - Cholesterol: {chol} mg/dl (high; associated with higher risk).")
    else:
        print(f" - Cholesterol: {chol} mg/dl (normal/moderate).")

    if fbs == 1:
        print(" - Fasting blood sugar >120 mg/dl: YES — metabolic risk factor.")
    else:
        print(" - Fasting blood sugar >120 mg/dl: NO.")

    if thalach < 120:
        print(f" - Max heart rate (thalach) {thalach}: relatively low — may be concerning.")
    else:
        print(f" - Max heart rate (thalach) {thalach}: reasonable.")

    if exang == 1:
        print(" - Exercise-induced angina: YES — associated with higher likelihood of disease.")
    else:
        print(" - Exercise-induced angina: NO.")

    if oldpeak >= 2.0:
        print(f" - ST depression (oldpeak) = {oldpeak}: elevated — can indicate ischemia.")
    else:
        print(f" - ST depression (oldpeak) = {oldpeak}: lower.")

    if ca > 0:
        print(f" - Number of major vessels (ca) = {ca}: higher number may indicate more severe disease.")
    else:
        print(" - Number of major vessels (ca) = 0: lower measured vessel involvement.")

    if thal == 0:
        print(" - Thal: coded as 0 (often normal); dataset-specific encodings matter.")
    else:
        print(f" - Thal: coded as {thal} (dataset-specific abnormality codes may increase risk).")


def predict_with_model(model, input_tuple):
    # Ensure shape (1, -1)
    arr = np.asarray(input_tuple).reshape(1, -1)
    pred = model.predict(arr)  # assumes model exists
    return pred[0]

def main():
    print("=== Heart-disease predictor ===")
    features = collect_user_input()
    explain_risk(features)

    # Use trained model
    try:
        result = predict_with_model(clf1, features)
    except NameError:
        print("\nERROR: trained model 'clf1' not found in this session. Make sure clf1 is loaded and trained.")
        return
    except Exception as e:
        print("\nPrediction error:", e)
        return

    print("\n=== Model prediction ===")
    if result == 0:
        print("Prediction: The person does NOT have heart disease.")
    else:
        print("Prediction: The person HAS heart disease.")

if __name__ == "__main__":
    main()


=== Heart-disease predictor ===

Enter patient values. If unsure, enter the closest estimate.

Feature descriptions / expected values (based on common heart datasets):
 age: integer (years) — e.g. 29 - 77
 sex: 0 = female, 1 = male
 cp (chest pain type): 0,1,2,3  (larger usually = more typical angina)
 trestbps: resting blood pressure (mm Hg) — typical 90 - 200
 chol: serum cholesterol (mg/dl) — typical 100 - 600
 fbs: fasting blood sugar > 120 mg/dl (0 = false, 1 = true)
 restecg: 0,1,2 (resting ECG result categories)
 thalach: maximum heart rate achieved — typical 70 - 210
 exang: exercise induced angina (0 = no, 1 = yes)
 oldpeak: ST depression induced by exercise relative to rest (float, e.g. 0.0 - 6.0)
 slope: slope of the peak exercise ST segment (0,1,2)
 ca: number of major vessels colored by fluoroscopy (0 - 3, integer)
 thal: 0 = normal, 1/2/3 = different types of thalassemia encoding (dataset-specific)



age (years):  60
sex (0=female, 1=male):  0
cp (0-3):  0
trestbps (resting BP mm Hg):  140
chol (serum chol mg/dl):  268
fbs (>120 mg/dl) (0=no,1=yes):  0
restecg (0-2):  0
thalach (max heart rate achieved):  160
exang (exercise induced angina: 0=no, 1=yes):  0
oldpeak (ST depression, e.g. 0.0):  3.6
slope (0-2):  0
ca (0-3):  2
thal (commonly 0-3):  2



Risk hints based on entered values (general heuristics):
 - Age: ≥50 years increases baseline heart risk.
 - Sex: female.
 - Chest pain: type 0/1 less typical for major ischemic pain (dataset-encoding dependent).
 - Resting BP: 140 mmHg (elevated; higher cardiovascular risk).
 - Cholesterol: 268 mg/dl (high; associated with higher risk).
 - Fasting blood sugar >120 mg/dl: NO.
 - Max heart rate (thalach) 160: reasonable.
 - Exercise-induced angina: NO.
 - ST depression (oldpeak) = 3.6: elevated — can indicate ischemia.
 - Number of major vessels (ca) = 2: higher number may indicate more severe disease.
 - Thal: coded as 2 (dataset-specific abnormality codes may increase risk).

=== Model prediction ===
Prediction: The person does NOT have heart disease.
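
The validation inside `ask_numeric` is tied to `input()`, which makes it awkward to check without typing values by hand. One option, sketched here as an assumption rather than as part of the original notebook, is to move the checks into a pure function. `parse_numeric` and `is_valid` are hypothetical names; the `cast`, `min_val`, `max_val`, and `allowed` parameters mirror those of `ask_numeric`.

```python
def parse_numeric(raw, cast=int, min_val=None, max_val=None, allowed=None):
    """Convert one raw string to a number, raising ValueError if it is invalid."""
    text = raw.strip()
    if allowed is not None and text not in allowed:
        raise ValueError(f"enter one of: {', '.join(allowed)}")
    value = cast(text)  # int() and float() raise ValueError on bad text
    if min_val is not None and value < min_val:
        raise ValueError(f"value must be >= {min_val}")
    if max_val is not None and value > max_val:
        raise ValueError(f"value must be <= {max_val}")
    return value


def is_valid(raw, **rules):
    """Return True when parse_numeric accepts the raw string."""
    try:
        parse_numeric(raw, **rules)
        return True
    except ValueError:
        return False


print(parse_numeric(" 140 ", int, min_val=50, max_val=300))
print(parse_numeric("3.6", float, min_val=0.0, max_val=10.0))
print(is_valid("2", allowed=["0", "1"]))
print(is_valid("abc", cast=int))
```

An interactive loop like `ask_numeric` could then call `parse_numeric` inside its `try` block and print the error message before asking again.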


