# CIS 508 - Assignment 1

## Problem Introduction and Motivation

> *What is heart disease?* <br>
The term “heart disease” refers to several types of heart conditions. In the United States, the most common type of heart disease is coronary artery disease (CAD), which can lead to heart attack. As reported by the **National Center for Chronic Disease Prevention and Health Promotion (NCCDPHP)**, heart disease is the leading cause of death for men, women, and people of most racial and ethnic groups in the United States.
Technology now shapes nearly every industry, and healthcare is no exception. Machine learning in particular can process volumes of patient data beyond what a person can review by hand, then turn that analysis into medical insights that help clinicians plan and deliver care. Predicting heart disease is one example of how machine learning can be useful. The chart below illustrates the severity of heart disease.

<img src = "https://world-heart-federation.org/wp-content/uploads/2021/07/WHF-CVD-Number-1-Killer-2021.jpg">

> *Problem Statement:* <br>
We are provided with data on almost 300 medical patients who were evaluated for the presence of heart disease. Some were diagnosed with heart disease; others were found not to have it. The goal is to build a classification model that predicts the diagnosis from just 5 patient characteristics.

> *How does this help?* <br>
With the help of the model's prediction, a patient who receives a positive result can see a doctor and take the necessary precautions.

## Project Dependencies

In [1]:
#Step 1: Import all the packages required to process our data
import pandas as pd                                                          #pandas is a library for loading, analyzing, and cleaning tabular data
from sklearn.linear_model import LogisticRegression                          #scikit-learn (sklearn) provides regression and classification algorithms; we use its logistic regression classifier to build the prediction model
from sklearn.metrics import accuracy_score                                   #Once predictions are made, we need to check how accurate the model is. accuracy_score returns the fraction of predictions that match the true labels
import pickle                                                                #pickle serializes and deserializes Python objects, so the trained model can be saved to a file and loaded again later

## Data Preparation

In [2]:

#Step 2: Once the libraries are imported, we load the historical patient data on which the classification model is trained.
#We do this with pandas, which reads the CSV file into a 2-D data structure called a DataFrame that is easy to view and explore

df = pd.read_csv('heart_disease.csv')

#Step 3: Once the DataFrame is created and the data is understood, we identify the feature variables and the target variable for our classification.
#What are feature variables?
#Feature variables are individual measurable properties or characteristics of a phenomenon on which the target variable depends. For example, the price of a house depends on features like area, locality, age, etc.
#What are target variables?
#Target variables are variables whose value needs to be predicted. In the example above, the price of the house is the target variable.

X = df.iloc[:, 1:]                                                          #X holds all the feature variables: every column except the first, selected by position with iloc
y = df.iloc[:, 0]                                                           #y holds the target variable (the first column): whether a person has heart disease or not
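
The positional split above can be tried on a small table first. The sketch below uses a made-up four-row DataFrame; the column names and values are hypothetical, and only the layout (target in the first column, features after it) is assumed to match `heart_disease.csv`.

```python
import pandas as pd

# Hypothetical miniature dataset: names and values are invented for
# illustration; only the column layout mirrors the cell above.
toy = pd.DataFrame({
    "heart_disease": [0, 1, 0, 1],
    "age": [41, 63, 37, 58],
    "sex": [0, 1, 1, 1],
    "max_heart_rate": [172, 150, 187, 131],
})

X_toy = toy.iloc[:, 1:]   # every column except the first -> features
y_toy = toy.iloc[:, 0]    # first column only -> target

print(X_toy.columns.tolist())     # names of the feature columns
print(y_toy.name)                 # name of the target column
print(X_toy.shape, y_toy.shape)   # (rows, features) and (rows,)
```

Because `iloc` selects by position rather than by name, this split only works if the target really is the first column of the file; checking `df.columns` before splitting is a cheap safeguard.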




## Modeling

In [3]:
#Step 4: Once the feature variables and target variable are ready, we pass them to our classifier so it can learn to predict the results
model = LogisticRegression(max_iter=800)                                   #We create a Logistic Regression model object with max_iter=800. max_iter caps the number of solver iterations allowed for the model to converge, i.e. to reach a state where further training no longer changes the model
model.fit(X,y)                                                             #We train the model on our feature and target data

#Step 5: Once the model is built, we check its accuracy.
predictions = model.predict(X)                                             #We store the predictions the model produces for all the rows of feature variables
print(accuracy_score(y,predictions))                                       #We compare the original target values with the predicted values, row by row, and print the fraction that match. Note that this is accuracy on the same data the model was trained on

#Step 6: We now save our classifier as a pickle file, opened in write-binary mode (wb)
with open('classifier', mode='wb') as pickle_out:                         #pickle_out is a file object that WRITES BINARY (wb) to a new file called 'classifier', created in the same folder as this notebook. The with block closes the file automatically
    pickle.dump(model, pickle_out)                                        #dump serializes the trained model object into bytes and stores them in the classifier file

0.7542087542087542
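
The accuracy printed above is measured on the same rows the model was trained on, so it tends to be optimistic. A common alternative is to hold out part of the data and score the model only on rows it has not seen. The sketch below illustrates the idea on synthetic data from `make_classification`, an assumed stand-in for `heart_disease.csv` (300 rows, 5 features); the numbers it prints say nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the patient data (assumption: 300 rows, 5 features).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

# Hold out 25% of the rows; stratify keeps the class balance similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0
)

demo_model = LogisticRegression(max_iter=800)
demo_model.fit(X_train, y_train)   # the model only ever sees the training rows

train_acc = accuracy_score(y_train, demo_model.predict(X_train))
test_acc = accuracy_score(y_test, demo_model.predict(X_test))
print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
```

With the real data, the same pattern would apply to `X` and `y` from the data preparation step, and the test accuracy would be the figure to report.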

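Saving a model is only useful if it can be loaded back unchanged. The following is a minimal sketch of a pickle round trip, using a tiny made-up training set and a temporary directory (both assumptions for illustration) rather than the notebook's real `classifier` file.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny made-up training set (assumption: two features, binary target).
X_small = np.array([[0.0, 1.0], [1.0, 0.0], [0.2, 0.9], [0.9, 0.1]])
y_small = np.array([0, 1, 0, 1])
small_model = LogisticRegression(max_iter=800).fit(X_small, y_small)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "classifier")   # hypothetical location

    # "wb" = write binary; the with block closes the file even if dump fails.
    with open(path, "wb") as f:
        pickle.dump(small_model, f)

    # "rb" = read binary, which is how the app loads the model later.
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored object should behave exactly like the original one.
same = bool((restored.predict(X_small) == small_model.predict(X_small)).all())
print("restored model gives identical predictions:", same)
```

Two caveats apply to pickle files in general: only unpickle files from a trusted source, since loading can execute arbitrary code, and load the model with the same scikit-learn version that saved it where possible.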

## Deployment

In [4]:

%%writefile app.py

#Step 7: Import all the necessary libraries
import pickle
import streamlit as st

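with open('classifier', 'rb') as pickle_in:                 #Open our classifier binary file in read-binary mode; the with block closes it afterwards
    classifier = pickle.load(pickle_in)                     #Restore the trained model that the notebook saved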

# Define the function which will make the prediction using data
# inputs from users
@st.cache()                                                 #Note: st.cache is deprecated in recent Streamlit releases, which provide st.cache_data instead
def prediction(age, sex,
               non_anginal_pain, max_heart_rate, exercise_induced_angina):
    
    # Make the prediction; predict returns one label per input row, so we take the first
    label = classifier.predict(
        [[age, sex, non_anginal_pain, max_heart_rate, exercise_induced_angina]])[0]
    
    if label == 0:
        pred = 'You probably do not suffer from heart disease'
    else:
        pred = 'PLEASE SEE A DOCTOR! You might be suffering from heart disease'
    return pred

# This is the main function in which we define our webpage
def main():
    
    # Create input fields
    age = st.number_input(
        "age",
        min_value=30,
        max_value=100,
        value=34,
        step=1,
    )

    sex = st.number_input(
        "sex -(0-1)",
        min_value=0,
        max_value=1,
        value=1,
    )

    non_anginal_pain = st.number_input(
        "non_anginal_pain -(0-1)",
        min_value=0,
        max_value=1,
        value=1,
    )

    max_heart_rate = st.number_input(
        "max_heart_rate",
        min_value=50,
        max_value=210,
        value=55,
        step=1,
    )

    exercise_induced_angina = st.number_input(
        "exercise_induced_angina -(0-1)",
        min_value=0,
        max_value=1,
        value=1,
    )

    result = ""
    
    # When 'Predict' is clicked, make the prediction and store it
    if st.button("Predict"):
        result = prediction(age,sex,non_anginal_pain, max_heart_rate, exercise_induced_angina)
        st.success(result)
        
if __name__=='__main__':
    main()
    

Overwriting app.py
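
The prediction logic in `app.py` can be exercised without starting Streamlit. The sketch below swaps the pickled model for a hypothetical `StubClassifier` with an invented rule, so that the input shape passed to `predict` and the mapping from label to message are easy to follow; the helper name `describe` and the short messages are also made up for this illustration.

```python
class StubClassifier:
    """Hypothetical stand-in for the pickled model: it flags a row when
    the fifth value (exercise_induced_angina) is 1."""

    def predict(self, rows):
        return [1 if row[4] == 1 else 0 for row in rows]


def describe(classifier, age, sex, non_anginal_pain, max_heart_rate,
             exercise_induced_angina):
    # predict expects a 2-D input: a list of rows, each row a list of
    # feature values. The order is assumed to match the training columns.
    row = [age, sex, non_anginal_pain, max_heart_rate, exercise_induced_angina]
    label = classifier.predict([row])[0]   # one row in, one label out
    return "no heart disease predicted" if label == 0 else "heart disease predicted"


stub = StubClassifier()
print(describe(stub, 34, 1, 1, 155, 0))   # -> no heart disease predicted
print(describe(stub, 61, 1, 0, 120, 1))   # -> heart disease predicted
```

Keeping the model as an argument, as `describe` does here, makes this kind of check possible with any object that has a `predict` method.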


## Running our App

In [None]:
!streamlit run app.py