In [138]:
import streamlit as st
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
import matplotlib.pyplot as plt


### Website: 
https://machinelearning-g7zjenvie2g438jznnjdig.streamlit.app/

These lines import the modules the Streamlit application needs. Streamlit builds the web interface, pandas loads and summarizes the data, scikit-learn provides the three classifiers (DecisionTreeClassifier, LogisticRegression, and KNeighborsClassifier) along with train_test_split and accuracy_score, and Matplotlib draws the plots.

In [139]:
st.header("Maternal Health risk High risk, Mid risk and Low risk predictions")

DeltaGenerator()

This line sets the header title for your Streamlit app.

In [140]:
health_data = pd.read_csv("Maternal Health Risk Data Set.csv")

This line reads the "Maternal Health Risk Data Set.csv" file into a pandas DataFrame, a tabular data structure used for analysis.

In [141]:
describe = health_data.describe()

This line computes descriptive statistics for the DataFrame, such as mean, standard deviation, minimum, maximum, etc., and stores the result in the describe variable.

In [142]:
st.subheader("Data Description", divider="blue")
st.dataframe(describe)

DeltaGenerator()

These lines create a subheader and display the descriptive statistics in a table format with a blue divider line.

In [143]:
st.subheader("Overview of the data", divider="blue")
st.dataframe(health_data)


DeltaGenerator()

These lines create another subheader and display the entire health data in a table format with a blue divider line.

In [144]:
fig, ax = plt.subplots()
ax.scatter(health_data.Age, health_data.HeartRate, color="red")

<matplotlib.collections.PathCollection at 0x2880974d0>

These lines create a scatter plot with Matplotlib, plotting the "Age" column against the "HeartRate" column as red dots.

In [145]:
st.subheader("Distribution between the Age and the Heart rate", divider="blue")
st.pyplot(fig)

DeltaGenerator()

These lines create a subheader for the scatter plot and display the plot using st.pyplot.

In [146]:
st.subheader("input data")
input_data = health_data.drop(columns="RiskLevel")
output_data = health_data["RiskLevel"]

These lines create a subheader and separate your input data (all columns except "RiskLevel") and output data (only "RiskLevel").

In [147]:
st.dataframe(input_data)

DeltaGenerator()

This line displays the input data in a table format.

In [148]:
input_train, input_test, output_train, output_test = train_test_split(input_data, output_data, test_size=0.3)


This line splits the data into training and testing sets using train_test_split from scikit-learn, reserving 30% of the data for testing. No random_state is set, so each run produces a different split.
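Because the split is random, accuracy scores change from run to run. The following is a minimal sketch of a reproducible, stratified split, assuming a fixed seed is acceptable for this analysis. The `toy` DataFrame is a made-up stand-in for `health_data`, and `random_state=42` is an arbitrary choice, not something the original app uses.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for health_data (values are made up for illustration only).
toy = pd.DataFrame({
    "Age": [25, 35, 29, 30, 23, 42, 19, 50, 31, 27, 38, 22],
    "HeartRate": [86, 70, 80, 70, 76, 90, 66, 88, 72, 77, 84, 68],
    "RiskLevel": ["high risk", "low risk", "mid risk"] * 4,
})
X = toy.drop(columns="RiskLevel")
y = toy["RiskLevel"]

# random_state fixes the shuffle; stratify keeps class proportions similar
# in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
print(len(X_train), len(X_test))
```

With a fixed seed, the same rows land in the test set on every run, which makes accuracy figures comparable between models and between sessions.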

In [149]:
model_decisiontree = DecisionTreeClassifier()
model_logisticr = LogisticRegression()
model_KN = KNeighborsClassifier(n_neighbors=5)

model_decisiontree.fit(input_train, output_train)
model_logisticr.fit(input_train, output_train)
model_KN.fit(input_train, output_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


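These lines create three classifiers (Decision Tree, Logistic Regression, and K-Nearest Neighbors with 5 neighbors) and fit each one to the training data. The warning above comes from LogisticRegression: its solver hit the iteration limit before converging, and scikit-learn suggests raising max_iter or scaling the features.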
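The two remedies named in the warning can be combined in a scikit-learn pipeline. This is a sketch under assumptions, not the app's actual code: the data comes from `make_classification` as a synthetic stand-in for `input_train` and `output_train`, and `max_iter=1000` is an illustrative value.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic three-class data standing in for input_train / output_train.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)

# Standardize the features, then fit logistic regression with a higher
# iteration cap than the default of 100.
scaled_logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_logreg.fit(X, y)
train_accuracy = scaled_logreg.score(X, y)
print(round(train_accuracy, 2))
```

The pipeline applies the same scaling at prediction time, so it can be used in place of the bare `LogisticRegression()` model without changing the later `predict` calls.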

In [150]:
predictions_decisionTree = model_decisiontree.predict(input_test)
predictions_logisticr = model_logisticr.predict(input_test)
predictions_KN = model_KN.predict(input_test)


These lines make predictions using the fitted models on the test data.

In [151]:
algorithms = st.radio("Select an algorithm", ["Decision Tree", "Logistic Regression", "KN neighrest Neighbour", "Compare"])


This line creates a radio button in the Streamlit app, allowing the user to select an algorithm (model) or choose to compare them.

In [152]:
if algorithms == "Decision Tree":
    st.header("Decision Tree")

    score_decisionTree = accuracy_score(output_test, predictions_decisionTree)

    # Round to two decimal places
    rounded_score_decisionTree = round(score_decisionTree, 2)

    # Convert to a percentage
    percentage_score_decisionTree = rounded_score_decisionTree * 100

    # Visualize the percentage in a Streamlit application
    st.metric(label="Accuracy Percentage", value=percentage_score_decisionTree)
    st.write("Accuracy Score Visualization")

    # Define labels for the pie chart
    # Add more model names if needed
    labels = ["Decision Tree accuracy", "Not accurate"]

    # Define the percentage scores corresponding to the labels
    sizes = [percentage_score_decisionTree,
             100 - percentage_score_decisionTree]

    # Define colors for the sections of the pie chart
    colors = ['lightblue', 'lightgray']

    # Create the pie chart
    fig1, ax1 = plt.subplots()
    ax1.pie(sizes, labels=labels, colors=colors,
            autopct='%1.1f%%', startangle=90)
    # Equal aspect ratio ensures that pie is drawn as a circle.
    ax1.axis('equal')

    # Display the pie chart in Streamlit
    st.pyplot(fig1)

elif algorithms == "Logistic Regression":
    st.header("Logistic Regression")
    score_logisticr = accuracy_score(output_test, predictions_logisticr)
    rounded_score_logisticr = round(score_logisticr, 3)
    percentage_score_logisticr = rounded_score_logisticr * 100
    st.metric(label="Accuracy Percentage", value=percentage_score_logisticr)

    st.write("Accuracy Score Visualization")

    # Define labels for the pie chart
    # Add more model names if needed
    labels = ["Logistic Regression accuracy", "Not accurate"]

    # Define the percentage scores corresponding to the labels
    sizes = [percentage_score_logisticr,
             100 - percentage_score_logisticr]

    # Define colors for the sections of the pie chart
    colors = ['lightgreen', 'lightgray']

    # Create the pie chart
    fig1, ax1 = plt.subplots()
    ax1.pie(sizes, labels=labels, colors=colors,
            autopct='%1.1f%%', startangle=90)
    # Equal aspect ratio ensures that pie is drawn as a circle.
    ax1.axis('equal')

    # Display the pie chart in Streamlit
    st.pyplot(fig1)
elif algorithms == "KN neighrest Neighbour":
    st.header("K-Nearest Neighbors")

    score_KN = accuracy_score(output_test, predictions_KN)
    rounded_score_KN = round(score_KN, 3)
    percentage_score_KN = rounded_score_KN * 100
    st.metric(label="Accuracy Percentage", value=percentage_score_KN)

    st.write("Accuracy Score Visualization")

    # Define labels for the pie chart
    # Add more model names if needed
    labels = ["KNN accuracy", "Not accurate"]

    # Define the percentage scores corresponding to the labels
    sizes = [percentage_score_KN,
             100 - percentage_score_KN]

    # Define colors for the sections of the pie chart
    colors = ['lightyellow', 'lightgray']

    # Create the pie chart
    fig1, ax1 = plt.subplots()
    ax1.pie(sizes, labels=labels, colors=colors,
            autopct='%1.1f%%', startangle=90)
    # Equal aspect ratio ensures that pie is drawn as a circle.
    ax1.axis('equal')

    # Display the pie chart in Streamlit
    st.pyplot(fig1)

These conditions check which algorithm the user selected and run the matching block.

Each block computes the model's accuracy on the test set, shows it as a metric, and draws a pie chart of the accurate versus inaccurate share.
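The three branches differ only in the model name, the score, and the slice color, so the shared logic could be factored into one helper. The sketch below is one possible refactoring, not the app's code: `accuracy_pie` is a hypothetical name, and the `Agg` backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def accuracy_pie(score, model_name, color):
    """Return (figure, percentage) for an accuracy `score` between 0 and 1."""
    percentage = round(score * 100, 1)
    fig, ax = plt.subplots()
    ax.pie(
        [percentage, 100 - percentage],
        labels=[f"{model_name} accuracy", "Not accurate"],
        colors=[color, "lightgray"],
        autopct="%1.1f%%",
        startangle=90,
    )
    ax.axis("equal")  # draw the pie as a circle
    return fig, percentage


# Example call with an illustrative score of 0.8.
fig, percentage = accuracy_pie(0.8, "Decision Tree", "lightblue")
```

In the app, each branch would then reduce to a call to this helper followed by `st.metric(...)` with the returned percentage and `st.pyplot(fig)` with the returned figure.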

In [153]:
def compare_accuracy_scores():
    # Calculate the accuracy scores for different models
    score_logisticr = accuracy_score(output_test, predictions_logisticr)
    score_KN = accuracy_score(output_test, predictions_KN)
    score_decisionTree = accuracy_score(output_test, predictions_decisionTree)
    accuracy_scores = [score_decisionTree, score_logisticr, score_KN]
    models = ['Decision Tree', 'Logistic Regression', 'K-Nearest Neighbors']

    # Create a bar chart on its own figure so it does not draw on the scatter plot
    fig2, ax2 = plt.subplots()
    ax2.bar(models, accuracy_scores, color=['blue', 'green', 'red'])
    ax2.set_xlabel('Model')
    ax2.set_ylabel('Accuracy Score')
    ax2.set_title('Comparison of Accuracy Scores for Different Models')
    ax2.set_ylim(0, 1)  # Set the y-axis range to be between 0 and 1 for accuracy.

    # Display the accuracy scores on top of the bars.
    for i, score in enumerate(accuracy_scores):
        ax2.text(i, score, f'{score:.2f}', ha='center', va='bottom')

    # Show the bar chart in the Streamlit app
    st.pyplot(fig2)


This function calculates and displays the accuracy scores of all three models and compares them using a bar chart.

In [154]:
if algorithms == "Compare":
    st.header("Comparison of Accuracy Scores for Different Models")
    compare_accuracy_scores()


If the user selected "Compare", this condition calls the compare_accuracy_scores() function to display the comparison of accuracy scores.

## Introduction ##


Predicting maternal health risk is an important part of healthcare because it helps protect both expectant mothers and their unborn children. This project uses three classification algorithms, Decision Trees, Logistic Regression, and K-Nearest Neighbors (KNN), to assign each case to one of three risk levels: high risk, mid risk, or low risk. The algorithms learn from the health indicators recorded in the dataset, such as age and vital signs like heart rate.


### The different Algorithms ###

The three techniques approach the problem differently: KNN classifies a case by its similarity to neighboring cases, Decision Trees build hierarchical decision rules, and Logistic Regression models the probability of each risk level. With these algorithms, healthcare professionals can tailor care to expectant mothers, allocate resources effectively, and make informed decisions, which may improve maternal and newborn outcomes.

### 1. Logistic Regression ### 

Training: The logistic regression model is trained by finding the coefficient values (β0, β1, β2, ...) that maximize the likelihood of the observed data. Gradient descent and other optimization algorithms are frequently used for this. The coefficients are updated iteratively, minimizing a loss function (typically the negative log-likelihood) until convergence.

Prediction: Once trained, the model can make predictions on new data. In the binary case, it estimates the probability that the outcome is 1 given the input features, and predicts 1 if this probability exceeds a threshold (often 0.5) and 0 otherwise. With three risk levels, as in this project, the model estimates a probability for each class and predicts the most probable one.
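The binary prediction step can be sketched in a few lines of plain Python. This is an illustration of the idea rather than scikit-learn's implementation; the function names and the coefficient values are hypothetical.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_binary(features, coefficients, intercept, threshold=0.5):
    """Return (probability, label) for one sample."""
    # Linear score: intercept plus the weighted sum of the features.
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    probability = sigmoid(z)
    # Predict 1 only when the probability is strictly above the threshold.
    return probability, int(probability > threshold)


# Hypothetical coefficients chosen only for illustration.
probability, label = predict_binary([2.0, 1.0], [0.5, -1.0], intercept=0.0)
```

Here the weighted sum is exactly zero, so the probability sits on the threshold and the sample is labeled 0; any positive score would push it above 0.5.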

### 2. KNN (K-Nearest Neighbors) ###

Training: During the training phase, the method simply stores the full dataset, including the input features and their labels.

Distance Metric: To determine how similar two data points are, KNN uses a distance metric such as the Euclidean or Manhattan distance. The right choice depends on the problem and the type of data.

Prediction:

To classify a new data point, KNN finds the k nearest points in the training set using the selected distance metric. The hyperparameter "K" must be chosen in advance.

The algorithm then counts how many of those k neighbors belong to each class and assigns the new point to the most common class.
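The steps above can be sketched from scratch in plain Python. This is a teaching illustration assuming Euclidean distance and simple majority voting, not the KNeighborsClassifier implementation; `knn_predict` and the sample points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training label with its Euclidean distance to the query,
    # then sort so the closest points come first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count the labels of the k closest points and return the most common.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up two-feature points for illustration only.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.5)]
labels = ["low risk", "low risk", "low risk", "high risk", "high risk"]
prediction = knn_predict(points, labels, query=(1.2, 1.1), k=3)
```

The "training" step is just keeping `points` and `labels` around, which matches the description of KNN storing the full dataset.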



## Explanation of the results 

Decision Trees: With 30% of the data held out as the test set, the Decision Tree model consistently achieved the highest accuracy, ranging from 70% to 80% across runs. This suggests that decision trees are a reasonable approach for classifying maternal health risk on this dataset. Their hierarchical decision rules can be especially useful here because they can capture complex relationships in the data.
 
K-Nearest Neighbors (KNN): The KNN model's accuracy was between 65% and 70%. Although it did not outperform Decision Trees, KNN still gives reasonably dependable predictions. KNN relies on the similarity of nearby data points, so the value of the hyperparameter "K" affects how well it performs, and tuning it carefully might improve accuracy.
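One common way to tune "K" is to compare cross-validated accuracy over a few candidate values. The sketch below assumes 5-fold cross-validation and an arbitrary list of candidates. It runs on synthetic data from `make_classification`, so the value it selects says nothing about the best K for the maternal health data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic three-class data standing in for input_data / output_data.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)

# Mean 5-fold cross-validated accuracy for each candidate K.
candidate_ks = [1, 3, 5, 7, 9, 11]
mean_scores = {}
for k in candidate_ks:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    mean_scores[k] = scores.mean()

# Keep the K with the highest mean accuracy.
best_k = max(mean_scores, key=mean_scores.get)
print(best_k, round(mean_scores[best_k], 3))
```

Applied to the real data, the selected value would replace the fixed `n_neighbors=5` used when the model is created.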

Logistic Regression: The accuracy of logistic regression was the lowest, ranging from 50% to 60%. Logistic regression remains an effective tool for modeling class probabilities, but it did not perform as well as the other two algorithms here. The convergence warning raised during training indicates that the solver stopped before converging, so scaling the features or raising max_iter might help, as might feature engineering or more sophisticated variants of the model.


## Conclusion: 


The specific needs and goals of the healthcare setting should guide the choice of algorithm for predicting maternal health risk. In these experiments, Decision Trees were the most accurate choice, which makes them the strongest candidate for risk assessment. KNN also made reasonably accurate predictions, which may be useful in some circumstances. Logistic regression was less accurate here, but its probability estimates may still be useful for other maternal healthcare questions.


## Use of AI (ChatGPT):

    * Explain briefly Logistic Regression
    * Explain briefly K-Nearest Neighbors
    * make a graph to compare the accuracy of the 3 algorithms in streamlit

    * How to use K-Nearest Neighbors in python machine learning

    



