Predicting COVID-19 ICU Admissions
Brazil has been one of the countries most affected by the COVID-19 pandemic, with more than 16 million confirmed cases and 454,429 confirmed deaths as of May 26, 2021. The country was unprepared for the pandemic and, with hospital capacity under strain, was unable to respond adequately.

 

A data science team at a top-tier hospital in Brazil has released a dataset on the Kaggle platform, inviting solutions and findings from the public. The team is using machine learning (ML) to help reduce the strain on the hospital's ICU beds. The objective is to develop an ML model that predicts whether a patient with a confirmed case of COVID-19 will require admission to the ICU.

 

The dataset collected for the research is publicly available on the data science platform Kaggle, where a full description of the dataset can also be found: https://www.kaggle.com/datasets/S%C3%ADrio-Libanes/covid19

 

 

Tasks & Deliverables

 

In this assignment, you are challenged to carry out the full ML model development lifecycle in line with the objective of the dataset. This includes the following elements:

1. Conduct Exploratory Data Analysis (EDA) to gain an understanding of the data.
2. Preprocess the data to prepare it for the ML model.
3. Develop a few candidate ML models and pick the most promising one. Evaluate it on a variety of metrics and analyse its performance (does it overfit, can it be improved by changing the data preprocessing, etc.).
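The evaluation step described above can be illustrated with a minimal sketch in plain Python. It is only an illustration under stated assumptions: the labels are made up and do not come from the dataset, the names BinaryMetrics, evaluate and overfitting_gap are hypothetical helpers, and the train and validation scores passed to overfitting_gap are placeholder numbers. In a real notebook a library such as scikit-learn would normally supply these metrics.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    """Evaluation metrics for a binary classifier (1 = ICU admission)."""

    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return BinaryMetrics(accuracy, precision, recall, f1)


def overfitting_gap(train_score, validation_score):
    """Return how much better the model scores on training data.

    A large positive gap suggests the model is overfitting.
    """
    return train_score - validation_score


# Hypothetical labels, made up for illustration only (not from the dataset).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = evaluate(y_true, y_pred)
print(metrics)

# Placeholder scores: training accuracy vs. held-out validation accuracy.
print(overfitting_gap(0.98, metrics.accuracy))
```

Reporting recall alongside accuracy matters for this problem, because a missed ICU admission (a false negative) is usually more costly than a false alarm.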
Deliverables:
1. Upload your code to a GitHub repository as a Jupyter Notebook. Commit your progress whenever you work on the project (rather than just uploading the file at the end). Ensure you have a README that explains what you are doing.
2. Executive Presentation: Prepare a presentation for the board of executives in which you explain why your model can help save lives and why the hospital should start using it, and present its performance as well as the trends in the data you discovered during EDA.
3. Research Report: Write a paper of up to 2000 words (this can be less as long as the important information is included) in which you present your methods and findings to an expert audience. Your paper should roughly follow this structure:
   - Introduction (why are we analysing the data, and what is the problem?)
   - Data Description (describe the data you are working with in detail)
   - Methods (what methods are you using for your research?)
   - Machine Learning model development (what decisions did you make, and why?)
   - Performance analysis
   - Conclusion
 

 

 

The project will be marked based on the following assessment criteria:

 

  Assessment Elements                  Marks
  Exploratory Data Analysis (EDA)      25%
  Data Preparation for ML              15%
  ML Model Development                 10%
  GitHub Repository                    10%
  Executive Presentation               20%
  Research Report                      20%
  Total                                100%

 

 

Proficient (95+)

 

The student confidently applies all concepts learned in the course. They have made the project their own and have gone beyond what would normally be expected. Their research is completely independent, with a very well explained methodology. The code is efficient and clean, with complete docstrings and adherence to PEP 8 guidelines. The text could reasonably come from a Bachelor's or graduate Data Science student and contains all the elements one would expect from a “real” research report. The presentation is creative and shows the data accurately whilst being convincing and telling the story of a successful project. The student has exceeded expectations.

 

 

Competence (85-94)

 

The student understands all basic concepts and the majority of advanced concepts. They use the learned skills creatively and take on the project independently. Interesting results are found, and their methodology makes logical sense. The code executes without errors and proper commenting technique (e.g. docstrings, PEP 8) is used. The text is structured as a research report and the presentation is at a level that would be suitable for a work environment. The student shows competence in all areas of the taught material.

 

Average (80-84)

 

The student understands the basic concepts as well as some of the more advanced material. The code executes without obvious errors. It is written in basic but clean syntax and has regular commenting. The student shows self-motivated analysis of the data and conducts part of the research somewhat independently. The writing is good with a decent structure. Few mistakes are found, and some creative thinking as well as accurate naming of concepts are shown.

 

Basic (70-79)

 

The student understands all the basic concepts but still struggles with some of the advanced material. The code executes with few or no errors. Whilst not the most efficient or cleanest code, it is understandable and has decent commenting. The student finds all the basic trends and has done some self-motivated analysis. The report is average to slightly above average and contains only a few mistakes. There is good reason to believe that the student can join a business as an early-career Data Scientist in the (near) future.

 

Novice (50-69)

 

The student has an understanding of the very basic concepts in data science. The code executes with few errors but is not efficient or cleanly written; only a few comments are present. The analysis is basic and does not go beyond the obvious. The writing is below average. Overall, the student does not show sufficient skills to be considered an early-career Data Scientist.

 

Fail (0-49)

 

There are clear gaps even in the understanding of basic concepts. The code has multiple errors and is very difficult to understand. Few or no comments exist. The writing is sloppy and contains various conceptual errors as well as grammar and spelling mistakes.

 

 

 

 