
A self-driven project utilizing ARIMA, Seq2Seq, and XGBoost to help design the COVID19 forecasting algorithm. Data sources are from Kaggle Competition and JHU CSSE.


Olliang/COVID-19-Forecasting


COVID-19-Forecasting


Introduction


This project was prompted by the COVID19 Global Forecasting Kaggle competition and aims to use data science to forecast the spread of the coronavirus around the world. A pandemic is a heavy topic for everyone. I wanted to contribute my knowledge of data science to help discover patterns in the coronavirus spread and the important features that affect it. Hopefully my findings can help some regions take the right actions.

The techniques I am planning to use for forecasting are:

  • ARIMA
  • Seq2Seq + LSTM (Deep Learning)
  • XGBoost (Machine Learning)

Main Files


  • covid19 - EDA.ipynb - Notebook performing Exploratory Data Analysis on global confirmed cases and deaths before June 10th
  • covid19 - ARIMA.ipynb - Notebook applying ARIMA models to forecast global confirmed cases and deaths

Data Sources

(Note: all of these data files are in the data folder of this GitHub repository.)

  1. Kaggle: COVID19 Global Forecasting (Week 5)
  • train.csv
  • test.csv
  • submission.csv

  2. JHU CSSE COVID-19 Dataset
  • time_series_covid19_confirmed_global.csv
  • time_series_covid19_deaths_global.csv
  • time_series_covid19_recovered_global.csv
  • time_series_covid19_confirmed_US.csv
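
The JHU CSSE global time series files are published in a wide layout: one row per province or country, a few identifying columns, then one column of cumulative counts per date. Time series models usually want one series per country instead. The following is a minimal sketch of that reshaping using only the Python standard library. It is not code from the notebooks. It assumes the identifying columns are named `Province/State`, `Country/Region`, `Lat`, and `Long`, that date headers look like `1/22/20`, and that counts are whole numbers. The helper names `country_totals` and `daily_new` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Tiny made-up sample in the assumed JHU CSSE wide layout (values are illustrative only).
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Exampleland,10.0,20.0,0,1,3
North,Samplestan,30.0,40.0,2,2,5
South,Samplestan,31.0,41.0,1,4,4
"""

# Assumed identifying columns; every other column is treated as a date.
ID_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")


def country_totals(csv_file):
    """Sum cumulative counts over provinces: {country: [(date, total), ...]}."""
    reader = csv.DictReader(csv_file)
    date_columns = [c for c in reader.fieldnames if c not in ID_COLUMNS]
    totals = defaultdict(lambda: defaultdict(int))
    for row in reader:
        for col in date_columns:
            day = datetime.strptime(col, "%m/%d/%y").date()
            totals[row["Country/Region"]][day] += int(row[col])
    # Sort each country's series chronologically.
    return {country: sorted(by_day.items()) for country, by_day in totals.items()}


def daily_new(series):
    """First-difference a cumulative series to get new counts per day."""
    return [(d2, v2 - v1) for (_, v1), (d2, v2) in zip(series, series[1:])]


totals = country_totals(io.StringIO(SAMPLE_CSV))
samplestan = totals["Samplestan"]
print(samplestan)
print(daily_new(samplestan))
```

To run this on a real file, pass an open file handle in place of the `io.StringIO` sample, for example `open("data/time_series_covid19_confirmed_global.csv", newline="")`, adjusting the path to wherever the file sits. Differencing the cumulative counts, as `daily_new` does, is one common way to look at daily changes before fitting a model such as ARIMA.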
