Covid19-Prediction-Model-----UCLA-CS145-----Intro-to-Data-Mining

Course project for UCLA CS145, Introduction to Data Mining

Running the Model

The main driver script is run.py. It takes a single argument, the model type, which must be one of: NN, PR, AR, ARIMA, ARMA, MA, SARIMA.

Models used for prediction:

PR: Polynomial Regression
NN: Neural Network
AR: Autoregression
MA: Moving Average
ARIMA: Autoregressive Integrated Moving Average
ARMA: Autoregressive Moving Average
SARIMA: Seasonal ARIMA

Example:

python run.py NN

This generates a result CSV file in the Kaggle submission format. To change the configuration, edit the constants declared in run.py, polynomial_regression.py, neural_network.py, or prediction_model.py (the superclass of all prediction models).
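
As a rough illustration of the single-argument interface, the sketch below validates a model code with argparse. It is not taken from run.py: the function name parse_model_type and the use of argparse are assumptions, and only the list of model codes comes from this README.

```python
import argparse

# Model codes listed in this README; everything else here is a hypothetical sketch.
MODEL_TYPES = ["NN", "PR", "AR", "ARIMA", "ARMA", "MA", "SARIMA"]


def parse_model_type(argv):
    """Return the validated model code from a list of command-line arguments."""
    parser = argparse.ArgumentParser(description="Run a COVID-19 prediction model.")
    parser.add_argument("model", choices=MODEL_TYPES, help="model type to run")
    # argparse exits with an error message if the code is not in MODEL_TYPES.
    return parser.parse_args(argv).model


print(parse_model_type(["NN"]))
```

Restricting the argument with choices means a mistyped code fails immediately instead of after the data has been loaded.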

Initializing Input Data

Partitioning daily report data by state

To transform input data, run:

python transform_input.py

It creates one csv file per state, each containing that state's daily reports. Miscellaneous states in the input data set are ignored.

NOTE Each time this script is run, all the <state>.csv files are truncated and refilled from the daily report files.
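
The partitioning step can be pictured roughly as follows. This is a minimal sketch using only the standard library, not the code in transform_input.py: the function partition_by_state, the added "Date" column, and the sample rows are all made up for illustration.

```python
import csv
import io
from collections import defaultdict


def partition_by_state(report_text, report_date, buckets=None):
    """Group the rows of one daily report by Province_State.

    report_text: contents of one daily report CSV
    report_date: label for the report, stored in a hypothetical "Date" column
    buckets: dict mapping state name to a list of rows, extended in place
    """
    if buckets is None:
        buckets = defaultdict(list)
    for row in csv.DictReader(io.StringIO(report_text)):
        row["Date"] = report_date
        buckets[row["Province_State"]].append(row)
    return buckets


# Made-up sample data, not real report values.
sample = "Province_State,Confirmed,Deaths\nCalifornia,100,2\nTexas,80,1\n"
buckets = partition_by_state(sample, "04-12-2020")
print(sorted(buckets))
```

Writing each bucket with open(path, "w") would truncate the existing file first, which matches the behavior described in the note above.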

Data format (copied from https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data)

USA daily state reports (csse_covid_19_daily_reports_us)

This table contains aggregated data for each US state.

Create the Test.csv

To create the test.csv file, run:

python create_test_csv.py

Get MAPE

To compute the mean absolute percentage error (MAPE) of the predictions against the truth data, run:

python mape.py
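
For reference, MAPE is the mean of |truth - prediction| / |truth|, expressed as a percentage. The sketch below shows that formula in plain Python. It is not the contents of mape.py, and skipping zero-valued truth entries is an assumption of this sketch.

```python
def mape(truth, predicted):
    """Mean absolute percentage error, in percent."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    # Assumption: entries whose true value is zero are skipped,
    # because the ratio is undefined there.
    terms = [abs(t - p) / abs(t) for t, p in zip(truth, predicted) if t != 0]
    if not terms:
        raise ValueError("no non-zero truth values to compare against")
    return 100.0 * sum(terms) / len(terms)


# Made-up numbers: each prediction is off by 10% of the true value.
print(mape([100, 200], [110, 180]))
```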

File naming convention

MM-DD-YYYY.csv in UTC.
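
Because the month comes first in this naming scheme, sorting the file names as strings puts January 2021 before December 2020. A small sketch of parsing the date instead (the helper report_date and the file names are illustrative):

```python
from datetime import datetime
from pathlib import Path


def report_date(filename):
    """Parse the date encoded in a daily report name such as 04-12-2020.csv."""
    return datetime.strptime(Path(filename).stem, "%m-%d-%Y").date()


names = ["01-01-2021.csv", "12-31-2020.csv", "04-12-2020.csv"]
# Sort chronologically by the parsed date rather than by the raw string.
ordered = sorted(names, key=report_date)
print(ordered)
```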

Field description

Province_State - The name of the State within the USA.
Country_Region - The name of the Country (US).
Last_Update - The most recent date the file was pushed.
Lat - Latitude.
Long_ - Longitude.
Confirmed - Aggregated case count for the state.
Deaths - Aggregated death toll for the state.
Recovered - Aggregated Recovered case count for the state.
Active - Aggregated confirmed cases that have not been resolved (Active cases = total cases - total recovered - total deaths).
FIPS - Federal Information Processing Standards code that uniquely identifies counties within the USA.
Incident_Rate - cases per 100,000 persons.
People_Tested - Total number of people who have been tested.
People_Hospitalized - Total number of people hospitalized. (Nullified on Aug 31, see Issue #3083)
Mortality_Rate - Number of recorded deaths * 100 / Number of confirmed cases.
UID - Unique Identifier for each row entry.
ISO3 - Officially assigned country code identifiers.
Testing_Rate - Total test results per 100,000 persons. The "total test results" are equal to "Total test results (Positive + Negative)" from COVID Tracking Project.
Hospitalization_Rate - US Hospitalization Rate (%): = Total number hospitalized / Number cases. The "Total number hospitalized" is the "Hospitalized – Cumulative" count from COVID Tracking Project. The "hospitalization rate" and "Total number hospitalized" are only presented for those states which provide cumulative hospital data. (Nullified on Aug 31, see Issue #3083)
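
Several of the fields above are derived from others. The sketch below restates the Active, Mortality_Rate, and per-100,000 formulas as code. The function names are hypothetical, the numbers are made up, and the population argument is an assumption because population is not a column in the daily reports.

```python
def active_cases(confirmed, recovered, deaths):
    """Active = total cases - total recovered - total deaths."""
    return confirmed - recovered - deaths


def mortality_rate(confirmed, deaths):
    """Deaths * 100 / confirmed cases, or None when there are no cases."""
    return deaths * 100 / confirmed if confirmed else None


def per_100k(count, population):
    """Rate per 100,000 persons, as used by Incident_Rate and Testing_Rate."""
    return count * 100_000 / population


# Made-up values for illustration only.
print(active_cases(1000, 300, 50))
print(mortality_rate(1000, 50))
print(per_100k(1000, 2_000_000))
```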

Neural Network Model

For more details on the neural network model, refer to neural_network.py.

This class trains a neural network and uses GridSearch to find the best parameters.

You can add or remove parameters and their values to search for the optimal NN settings. Please modify only the following in neural_network.py:

self.parameters = {
    'hidden_layer_sizes': [(80, 80), (70, 70), (60, 60)],
    'activation': ['relu'],
    'solver': ['adam'],
    'learning_rate': ['adaptive'],
    'learning_rate_init': [0.0001, 0.001, 0.005, 0.0005]
}
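
To get a feel for the cost of this grid, the sketch below enumerates every combination it describes using only the standard library. The helper expand_grid is hypothetical and is not how neural_network.py runs the search. The parameter names resemble those of scikit-learn's MLP estimators, which suggests GridSearchCV, but that is an assumption.

```python
from itertools import product

# Same grid as above, copied into a standalone variable.
parameters = {
    'hidden_layer_sizes': [(80, 80), (70, 70), (60, 60)],
    'activation': ['relu'],
    'solver': ['adam'],
    'learning_rate': ['adaptive'],
    'learning_rate_init': [0.0001, 0.001, 0.005, 0.0005],
}


def expand_grid(grid):
    """Return one dict per combination of parameter values in the grid."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


candidates = expand_grid(parameters)
print(len(candidates))  # 3 * 1 * 1 * 1 * 4 = 12 combinations
```

Each value added to a list multiplies the number of models that must be trained, so it helps to widen one parameter at a time.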
