
This project describes a step-by-step procedure for deploying a machine-learning-powered REST API for stroke diagnosis using FastAPI and Docker.


The purpose of this project is to create an API for accurate medical diagnosis of stroke using machine learning and an individual patient's characteristics. This gives medical staff useful information for delivering the needed treatment and reducing risks and consequences.

Stroke is a condition that occurs when the blood supply to the brain is interrupted or reduced due to a blockage (ischemic stroke) or rupture of a blood vessel (hemorrhagic stroke). Without blood, the brain will not get oxygen and nutrients, so cells in some areas of the brain will die. This condition causes parts of the body controlled by the damaged area of the brain to not function properly.

Stroke is an emergency condition that needs to be treated as soon as possible, because brain cells can die in just a matter of minutes. Prompt and appropriate treatment measures can minimize the level of brain damage and prevent possible complications.

In this machine learning project, the overall objective is to develop a system that predicts the likelihood of a patient having a stroke based on several factors, including age and certain diseases (hypertension, heart disease). This helps medical professionals identify high-risk patients.

As previously explained, a stroke can kill brain cells in a matter of minutes. Applying machine learning to the known causative factors can therefore be very useful for detecting early tendencies of stroke in patients.

Dataset Description

The stroke dataset from Kaggle is used to predict whether a patient is likely to have a stroke based on input parameters such as gender, age, various diseases, and smoking status. Each row in the data provides relevant information about one patient.

Feature            Description
id                 unique identifier
gender             Male, Female or Other
age                age of the patient
hypertension       0 if the patient doesn't have hypertension, 1 if the patient has hypertension
heart_disease      0 if the patient doesn't have any heart disease, 1 if the patient has a heart disease
ever_married       No or Yes
work_type          children, Govt_job, Never_worked, Private or Self-employed
Residence_type     Rural or Urban
avg_glucose_level  average glucose level in blood
bmi                body mass index
smoking_status     formerly smoked, never smoked, smokes or Unknown
stroke             1 if the patient had a stroke, 0 if not
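
The feature table above maps naturally onto a set of type conversions. The following is a minimal sketch, assuming the CSV header uses exactly the feature names listed; the sample row is made up for illustration and is not taken from the dataset.

```python
import csv
import io

# Column name -> converter, following the feature table above.
CONVERTERS = {
    "id": int,
    "gender": str,
    "age": float,
    "hypertension": int,
    "heart_disease": int,
    "ever_married": str,
    "work_type": str,
    "Residence_type": str,
    "avg_glucose_level": float,
    "bmi": float,
    "smoking_status": str,
    "stroke": int,
}


def parse_rows(text: str) -> list:
    """Parse CSV text and cast every column to the type in CONVERTERS."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {name: CONVERTERS[name](value) for name, value in row.items()}
        for row in reader
    ]


# Illustrative input only: one invented patient row under the assumed header.
SAMPLE = (
    ",".join(CONVERTERS) + "\n"
    "1,Female,45,0,0,Yes,Private,Urban,95.5,27.3,never smoked,0\n"
)

rows = parse_rows(SAMPLE)
print(rows[0]["age"], rows[0]["stroke"])
```

A real file may contain missing or non-numeric entries, in which case the converters would need to handle them explicitly.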

Project Directory

├── app
│   ├── Models                              > model directory
│   │   ├── RandomForest.pkl                > pickled random forest model
│   │   └── preprocessing_pipeline.pkl      > pickled preprocessing pipeline
│   ├── utilities
│   │   └──                      > script containing helper functions used in modeling and preprocessing
│   ├── Dockerfile
│   ├── docker-compose.yaml
│   ├──                 > pydantic-driven input validation
│   ├──                              > FastAPI app
│   ├──                         > model development script
│   └── requirements.txt                    > dependencies
├── data                                    > data directory
│   └── stroke_data.csv                     > stroke dataset
├── test                                    > set of tests
│   ├──                      > centralized data registry tests
│   ├──                  > FastAPI app tests
│   ├──                   > test for prediction/inference
│   └──               > test for preprocessing pipeline
└──                               > documentation
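
The two pickled artifacts in the Models directory would typically be loaded once at application start-up. The sketch below shows one way to do that with the standard library; `load_artifacts` is a hypothetical helper name, and the demo writes placeholder objects to a temporary directory so that it runs without the real files.

```python
import pickle
import tempfile
from pathlib import Path


def load_artifacts(model_dir: Path):
    """Load the preprocessing pipeline and the model from model_dir."""
    with (model_dir / "preprocessing_pipeline.pkl").open("rb") as handle:
        pipeline = pickle.load(handle)
    with (model_dir / "RandomForest.pkl").open("rb") as handle:
        model = pickle.load(handle)
    return pipeline, model


# Demo with placeholder objects (plain dicts), not the real artifacts.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    placeholders = [
        ("preprocessing_pipeline.pkl", {"kind": "pipeline"}),
        ("RandomForest.pkl", {"kind": "model"}),
    ]
    for name, obj in placeholders:
        with (demo_dir / name).open("wb") as handle:
            pickle.dump(obj, handle)
    pipeline, model = load_artifacts(demo_dir)

print(pipeline, model)
```

Unpickling the real files requires the library that produced them to be installed in the same environment, and pickle files should only be loaded from trusted sources.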


  • Successfully created a RESTful API using FastAPI, implementing the following:
    • A GET request on the root redirects to the ReDoc documentation page by default.
    • A POST request performs model inference.
    • Type hinting is used throughout.
    • A Pydantic model ingests the body of the POST request. This model contains an example.
    • Note: some names in the data contain hyphens, which Python does not allow in variable names. The column names in the CSV are left unmodified; the functionality of FastAPI/Pydantic is used to deal with this instead.
  • Several unit tests for the API: one for the GET and two for the POST, one for each prediction.

Build Docker Image

  • You can build an image, then create and start a container from the Dockerfile, using the following commands:
cd app

docker-compose up --build

The command above finds the docker-compose file, builds the image, and then starts the container.

  • You can then access the API on localhost via port 8000:

localhost:8000 (shows the documentation page by default)

  • You can interact with the API on localhost via: localhost:8000/docs

  • Here are some screenshots:

  • In the screenshot above, the API reports a 24.4% likelihood of stroke for the patient.
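
Besides the interactive page at localhost:8000/docs, the running container can also be called from a script. The sketch below uses only the standard library; the `/predict` path and the payload values are assumptions, so check the documentation page for the actual route and schema.

```python
import json
import urllib.request

# Hypothetical inference route; the real path is listed on localhost:8000/docs.
API_URL = "http://localhost:8000/predict"


def build_request(patient: dict, url: str = API_URL) -> urllib.request.Request:
    """Build a JSON POST request for one patient record."""
    body = json.dumps(patient).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Illustrative payload using the dataset's feature names.
patient = {
    "gender": "Male",
    "age": 67,
    "hypertension": 0,
    "heart_disease": 1,
    "ever_married": "Yes",
    "work_type": "Private",
    "Residence_type": "Urban",
    "avg_glucose_level": 228.69,
    "bmi": 36.6,
    "smoking_status": "formerly smoked",
}

request = build_request(patient)
# send(request) needs the container to be running, so it is not called here.
```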


The following tech stack was used for this project:

  • Python ✅
  • FastAPI ✅
  • Docker Compose ✅
  • sklearn ✅


  • For the unit tests, inputs are loaded from JSON, so that the code is not hard-coded to the provided data.

  • As a result, the unit tests (all of which can be found under /test) can exercise patient data using specific JSON inputs. All the endpoints retrieve the data they need from the provided JSON.
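
The JSON-driven approach described above can be sketched as follows. The case format, the file name `patients.json`, and `fake_predict` are hypothetical; in the real tests the prediction would come from the API or the loaded model rather than from this stand-in rule.

```python
import json
import tempfile
from pathlib import Path


def load_cases(path: Path) -> list:
    """Read test cases from a JSON file instead of hard-coding them."""
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def fake_predict(patient: dict) -> int:
    """Hypothetical stand-in for the API's inference call."""
    return int(patient["age"] >= 60 and patient["heart_disease"] == 1)


def run_cases(cases: list) -> list:
    """Return the names of cases whose prediction differs from the expectation."""
    return [
        case["name"]
        for case in cases
        if fake_predict(case["input"]) != case["expected"]
    ]


# Invented cases, one per expected prediction.
CASES = [
    {"name": "low_risk", "input": {"age": 30, "heart_disease": 0}, "expected": 0},
    {"name": "high_risk", "input": {"age": 70, "heart_disease": 1}, "expected": 1},
]

with tempfile.TemporaryDirectory() as tmp:
    case_file = Path(tmp) / "patients.json"
    case_file.write_text(json.dumps(CASES), encoding="utf-8")
    failures = run_cases(load_cases(case_file))

print(failures)
```

Keeping the cases in a file means new patients can be added to the test set without touching the test code.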


Areas of improvement for this project include:

  • Building more endpoints
  • Building an endpoint for extracting feature importances (e.g. through the use of the LIME algorithm)
  • Optimising the machine learning models through experimentation, with experiment configurations and results properly stored (e.g. in a spreadsheet, or through DVC/MLflow integration)
  • Web hosting