This exercise is about data organization, orchestration, and coding, including creating Docker images.
The aims of this exercise are to:
- Evaluate your coding (e.g. data reorganization)
- Understand your data orchestration capabilities (e.g. Docker)
- Understand how you design a solution (overall thinking)
The data is daily COVID-19 case data from the United States, located at https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports_us as a series of CSV files, one for each day. Information about the data is available at https://github.com/CSSEGISandData/COVID-19.
Put your code in a public repository hosted on GitHub, or in a private repository with muschellij2 added as a collaborator with read access.
Note: the goal is the solution. If any step below poses an unreasonable challenge at any time and you need to reach the end result in a different way due to time constraints, please communicate that.
- Create a GitHub repository for this exercise.
- Create a Docker image that reads in the data from the day before and compiles a report/printout of the cases for the day before. If the data is not there, print a diagnostic message. If you are using R, you can use the rocker images as a base: https://github.com/rocker-org/rocker-versioned2
- Set up GitHub Actions to build this Docker image
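A minimal workflow for the image build might look like the following; the workflow name, branch, and the choice of GitHub Container Registry (ghcr.io) as the push target are assumptions, and it expects a Dockerfile at the repository root:

```yaml
name: build-image
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          push: true
          tags: ghcr.io/${{ github.repository }}:latest
```

Pushing to ghcr.io with the built-in `GITHUB_TOKEN` avoids managing separate registry credentials.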
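As a sketch of the "read yesterday's data" step, the daily report files in the JHU repository appear to be named `MM-DD-YYYY.csv`, and the raw-content base URL below is assumed from the repository layout; verify both against the repo before relying on them:

```python
import datetime
import urllib.request
import urllib.error
from typing import Optional

# Base raw-content URL for the daily US reports (assumed from the repo layout).
BASE_URL = ("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
            "csse_covid_19_data/csse_covid_19_daily_reports_us/")

def report_filename(day: datetime.date) -> str:
    """Daily report files are named MM-DD-YYYY.csv."""
    return day.strftime("%m-%d-%Y") + ".csv"

def fetch_yesterday(today: Optional[datetime.date] = None) -> Optional[str]:
    """Return the CSV text for yesterday's report, or None after a diagnostic."""
    today = today or datetime.date.today()
    yesterday = today - datetime.timedelta(days=1)
    url = BASE_URL + report_filename(yesterday)
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Diagnostic message for when the file has not been published yet.
        print(f"No report found for {yesterday} at {url} (HTTP {err.code})")
        return None
```

A script like this (or an R equivalent) would be the entrypoint of the Docker image, so the container does one run of the report and exits.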
Using this Docker image:
- filter rows that are only in the United States,
- take the mean cases (Confirmed variable) and deaths (Deaths) by state, averaging over counties (Admin2). Print this out in the action.
- Append the results to a file from the previous days’ results.
- run this pipeline on a schedule (daily) using GitHub Actions.
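For the daily schedule, a workflow along these lines would run the container each morning; the cron time, image name, and results filename are placeholders to adapt:

```yaml
name: daily-report
on:
  schedule:
    - cron: "30 6 * * *"   # every day at 06:30 UTC
  workflow_dispatch: {}    # allow manual runs for debugging
jobs:
  report:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run pipeline in the image
        run: docker run --rm -v "$PWD:/work" ghcr.io/${{ github.repository }}:latest
      - name: Commit appended results
        run: |
          git config user.name github-actions
          git config user.email github-actions@github.com
          git add results.csv
          git commit -m "Daily results" || echo "Nothing to commit"
          git push
```

Note that GitHub's cron schedules run in UTC and scheduled jobs can start late during busy periods, so the run time should leave slack after the upstream data is published.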
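The aggregation steps above could be sketched as follows in pandas; the column names (Country_Region, Province_State, Admin2, Confirmed, Deaths) follow the JHU CSV schema, the sample data is made up for illustration, and the history filename is an arbitrary choice:

```python
import os
import pandas as pd

def summarize_by_state(df: pd.DataFrame) -> pd.DataFrame:
    """Mean Confirmed and Deaths per state, averaged over counties (Admin2)."""
    us = df[df["Country_Region"] == "US"]  # keep rows that are in the US only
    return (us.groupby("Province_State")[["Confirmed", "Deaths"]]
              .mean()
              .reset_index())

# Illustrative toy data using the JHU column names (values are made up).
sample = pd.DataFrame({
    "Country_Region": ["US", "US", "US", "Canada"],
    "Province_State": ["Maryland", "Maryland", "Virginia", "Ontario"],
    "Admin2": ["Baltimore", "Howard", "Fairfax", ""],
    "Confirmed": [100, 50, 80, 30],
    "Deaths": [4, 2, 3, 1],
})

result = summarize_by_state(sample)
print(result)

# Append to a running history file (write the header only when the file is new).
result.to_csv("history.csv", mode="a",
              header=not os.path.exists("history.csv"), index=False)
```

In the real pipeline a date column would be added before appending, so each day's rows in the history file are distinguishable.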
Please provide a half/full page description (either separate or in a README) of:
- the challenges in getting this up and running
- improvements you’d make in this pipeline if more time were available, or any issues with the solution and how you’d perform checks on it
- additional cleaning you would consider performing on a data set like this.