SamusRam/covid-19-person-level-eda

Table of Contents

  1. Installation
  2. Project Motivation
  3. File Descriptions
  4. Results
  5. Licensing, Authors, and Acknowledgements

Installation

No libraries beyond the Anaconda distribution of Python should be needed to run the code here. The code should run without issues on any Python 3.* version.

Moreover, the easiest way to continue analyzing this data without setting up a local environment is to fork the corresponding Kaggle kernel.

Project Motivation

Most available COVID-19 statistics are cumulative figures for large populations, e.g. country-level counts of infected people. Valuable as they are, they tell us little about the COVID-19 situation in the age/gender group our parents belong to, or in that of our partners or our siblings. In this project I've performed and shared a preliminary analysis of the currently available person-level information about age and gender groups in the Czech Republic.

Goals

  • Firstly, we'll do exploratory data analysis (EDA). The main goal of the EDA is to drill down into the person-level details as far as possible. The detailed drill-down is illustrated on Czechia because better data are at hand.

  • Secondly, we'll target the following important question: Does COVID-19 attack all age-gender groups evenly?

  • Thirdly, we'll attempt to quantify how much more at risk of contracting COVID-19 selected age-gender groups are.
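
To make the second and third goals concrete, here is a minimal sketch of one way such a comparison could be set up; it is not taken from the notebook. It assumes case counts and population sizes are already available per age-gender group. All numbers are made-up placeholders, and the function names (`incidence_per_100k`, `relative_risk`, `chi_square_statistic`) are hypothetical.

```python
# Illustrative sketch only: every number below is a made-up placeholder,
# not real data from any of the sources used in this project.
cases = {
    ("20-29", "F"): 120,
    ("20-29", "M"): 100,
    ("60-69", "F"): 90,
    ("60-69", "M"): 150,
}
population = {
    ("20-29", "F"): 600_000,
    ("20-29", "M"): 500_000,
    ("60-69", "F"): 600_000,
    ("60-69", "M"): 500_000,
}


def incidence_per_100k(cases, population):
    """Confirmed cases per 100,000 inhabitants for each age-gender group."""
    return {group: 100_000 * cases[group] / population[group] for group in cases}


def relative_risk(cases, population, group, reference):
    """Ratio of the incidence in `group` to the incidence in `reference`."""
    rates = incidence_per_100k(cases, population)
    return rates[group] / rates[reference]


def chi_square_statistic(cases, population):
    """Pearson chi-square statistic against the 'evenly affected' hypothesis.

    Under that hypothesis, each group's expected case count is proportional
    to its share of the total population.
    """
    total_cases = sum(cases.values())
    total_population = sum(population[group] for group in cases)
    statistic = 0.0
    for group, observed in cases.items():
        expected = total_cases * population[group] / total_population
        statistic += (observed - expected) ** 2 / expected
    return statistic


rates = incidence_per_100k(cases, population)
risk_ratio = relative_risk(cases, population, ("60-69", "M"), ("20-29", "F"))
statistic = chi_square_statistic(cases, population)
```

A statistic of zero means the cases are spread exactly in proportion to population; larger values indicate a less even spread. Note that confirmed-case counts also reflect who gets tested, so ratios like these describe detected cases rather than true infection risk.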

File Descriptions

There is one main notebook, with its own table of contents for easier navigation between the different parts of the analysis.

Data about COVID-19 patients in the Czech Republic are openly available on the Ministry of Health of the Czech Republic webpage. To put the COVID-19 situation into demographic context, we use data from the Czech Statistical Office website. For a brief comparison with the population of Canada, data from the Roche UNCOVER COVID-19 Challenge are used, together with demographic data from the Statistics Canada website. The Novel Corona Virus 2019 Dataset by SRK is used for data quality validation.
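
Before person-level records can be set against demographic tables, they typically need to be aggregated into the same age-gender groups. The sketch below shows one possible way to do that. The record fields (`age`, `gender`), the 10-year bin width, and the open-ended top bin are assumptions made for illustration; they are not the actual schema of any of the datasets above.

```python
from collections import Counter


def age_group(age, width=10, top=80):
    """Map an age in years to a label such as '30-39', or '80+' for the top bin."""
    if age >= top:
        return f"{top}+"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def count_by_group(records):
    """Count person-level records per (age group, gender) pair."""
    return Counter((age_group(r["age"]), r["gender"]) for r in records)


# Hypothetical person-level records, one dict per confirmed case.
records = [
    {"age": 34, "gender": "F"},
    {"age": 37, "gender": "F"},
    {"age": 62, "gender": "M"},
    {"age": 85, "gender": "F"},
]
counts = count_by_group(records)
```

The bin edges should be chosen to match those of the demographic table, otherwise the per-group rates cannot be computed consistently.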

Results

The main findings have been summarized in a Medium post.

Licensing, Authors, and Acknowledgements

Credit for the data goes to the Ministry of Health of the Czech Republic, the Czech Statistical Office, and the Kaggle platform, including its partners and community.
Please feel free to use the code as you like and, if interested, to carry on with the analysis.

About

Analysis of COVID-19 person-level data from The Czech Republic
