# Perspectives on COVID-19: State Data

I am curious about COVID-19, and this notebook is my effort to find context and perspective from responsible, public data.

All data used is from:

- [The *New York Times*](https://github.com/nytimes/covid-19-data)
- [“Deaths and Mortality”, CDC](https://www.cdc.gov/nchs/fastats/deaths.htm)
- [“Stats of the State of South Carolina”, CDC](https://www.cdc.gov/nchs/pressroom/states/southcarolina/southcarolina.htm)
- [South Carolina Department of Health and Environmental Control (DHEC)](https://www.scdhec.gov/vital-records/parentage/sc-vital-records-data-and-statistics)
- [The Office for National Statistics, UK](https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/datasets/weeklyprovisionalfiguresondeathsregisteredinenglandandwales)

## Configuring Libraries for the Almond Kernel

First, we make a Bintray repository, and the libraries it hosts, available to the almond kernel.

In [16]:
val myBT = coursierapi.MavenRepository.of("https://dl.bintray.com/neelsmith/maven")

interp.repositories() ++= Seq(myBT)

myBT: coursierapi.MavenRepository = MavenRepository(https://dl.bintray.com/neelsmith/maven)

Next, we bring in specific libraries from the new repository using almond's `$ivy` magic:

In [17]:
import $ivy.`org.plotly-scala::plotly-almond:0.7.1`
import plotly._, plotly.element._, plotly.layout._, plotly.Almond._

// if you want to have the plots available without an internet connection:
// init(offline=true)

// restrict the output height to avoid scrolling in output cells
repl.pprinter() = repl.pprinter().copy(defaultHeight = 3)

## Imports

From this point on, your notebook consists of completely generic Scala, with the CITE Libraries available to use.

In [18]:
import almond.display.UpdatableDisplay
import almond.interpreter.api.DisplayData.ContentType
import almond.interpreter.api.{DisplayData, OutputHandler}

import java.io.File
import java.io.PrintWriter

import scala.io.Source

import java.text.SimpleDateFormat
import java.util.Date

## Useful Functions

Save a string to a named file:

In [19]:
def saveString(s: String, filePath: String = "", fileName: String = "temp.txt"): Unit = {
    val writer = new PrintWriter(new File(s"${filePath}${fileName}"))
    // Close the writer even if the write fails
    try writer.write(s)
    finally writer.close()
}

defined function saveString

Like `.split`, but preserving the character we split on:

In [20]:
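def splitWithSplitter(text: String, puncs: String): Vector[String] = {
    // Alternative that also splits before each delimiter:
    // val regexWithSplitter = s"((?<=${puncs})|(?=${puncs}))"
    // Zero-width lookbehind: split after each match of `puncs`, so the delimiter stays attached
    val regexWithSplitter = s"((?<=${puncs}))"
    text.split(regexWithSplitter).toVector.filter(_.size > 0)
}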

defined function splitWithSplitter
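
A small usage sketch may make the behaviour concrete. The sample sentence and the character class `[,.]` are made up for illustration; the function body is repeated from the cell above only so the snippet runs on its own.

```scala
// Same definition as in the cell above, repeated so this snippet is self-contained.
def splitWithSplitter(text: String, puncs: String): Vector[String] = {
  val regexWithSplitter = s"((?<=${puncs}))"
  text.split(regexWithSplitter).toVector.filter(_.size > 0)
}

// `puncs` is interpolated into a regex, so a character class works well here.
val pieces: Vector[String] = splitWithSplitter("Hello, world. Bye", "[,.]")
// pieces == Vector("Hello,", " world.", " Bye")
```

Each delimiter stays attached to the end of the piece that precedes it, and leading spaces are kept as they are.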

Pretty Print Things:

In [21]:
def showMe(v: Any): Unit = {
  v match {
    // Vector is an Iterable, so one case covers both collections
    case it: Iterable[_] => println(s"""\n----\n${it.mkString("\n")}\n----\n""")
    case _ => println(s"\n-----\n${v}\n----\n")
  }
}

defined function showMe

## Load Some Data

Load up-to-date data from the NY Times. Source: <https://github.com/nytimes/covid-19-data>.

In [22]:
val dataLines: Vector[String] = {
    scala.io.Source.fromURL("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv").mkString.split("\n").toVector
}

// quick test
val badLines = dataLines.filter( l => {
    l.split(",").size != 5
})

assert ( badLines.size == 0 )

dataLines: Vector[String] = Vector(
  "date,state,fips,cases,deaths",
...
badLines: Vector[String] = Vector()

## Comparative Data

Source for USA Data: [“Deaths and Mortality”, CDC](https://www.cdc.gov/nchs/fastats/deaths.htm).

Source for South Carolina Data: [“Stats of the State of South Carolina”, CDC](https://www.cdc.gov/nchs/pressroom/states/southcarolina/southcarolina.htm)

### USA

In [23]:
val usa_2017_Total_Deaths: Int = 2813502

// Chronic lower respiratory diseases
val usa_2017_Total_Deaths_CLRD: Int = 160201

// Influenza and Pneumonia
val usa_2017_Total_Deaths_IPn: Int = 55672

// All respiratory-related deaths
val usa_2017_Total_Deaths_Resp: Int = usa_2017_Total_Deaths_CLRD + usa_2017_Total_Deaths_IPn

// Suicide
val usa_2017_Suicide_Total: Int = 47173

// Heart Disease
val usa_2017_Heart_Total: Int = 647457

// Accidents
val usa_2017_Accident_Total: Int = 169936

// Cancer
val usa_2017_Cancer_Total: Int = 599108

usa_2017_Total_Deaths: Int = 2813502
usa_2017_Total_Deaths_CLRD: Int = 160201
usa_2017_Total_Deaths_IPn: Int = 55672
usa_2017_Total_Deaths_Resp: Int = 215873
usa_2017_Suicide_Total: Int = 47173
usa_2017_Heart_Total: Int = 647457
usa_2017_Accident_Total: Int = 169936
usa_2017_Cancer_Total: Int = 599108

### South Carolina

Source for 2018 South Carolina Data: [“Stats of the State of South Carolina”, CDC](https://www.cdc.gov/nchs/pressroom/states/southcarolina/southcarolina.htm) and [SC DHEC](https://www.scdhec.gov/vital-records/parentage/sc-vital-records-data-and-statistics)

In [24]:
// Total Deaths
val sc_2018_Total_Deaths: Int = 50633

// Chronic Lower Respiratory Disease
val sc_2017_Total_Deaths_CLRD: Int = 2990

// Influenza and Pneumonia
val sc_2017_Total_Deaths_IPn: Int = 882

// All Respiratory-Related Deaths
val sc_2017_Total_Deaths_All_Resp = sc_2017_Total_Deaths_CLRD + sc_2017_Total_Deaths_IPn

sc_2018_Total_Deaths: Int = 50633
sc_2017_Total_Deaths_CLRD: Int = 2990
sc_2017_Total_Deaths_IPn: Int = 882
sc_2017_Total_Deaths_All_Resp: Int = 3872

## Make Data Structures

For State data-points from the NY Times:

In [25]:
case class StateDatum( date: String, state: String, cases: Int, deaths: Int)

defined class StateDatum

For Aggregate data (just a Vector of the above):

In [26]:
case class RunningTally( days: Vector[StateDatum] )

defined class RunningTally

A daily snapshot (for the USA or a single state):

In [27]:
case class DailySnapshot( date: String, newCases: Int, newDeaths: Int, totalCases: Int, totalDeaths: Int)

defined class DailySnapshot

## Load Data

In [28]:
val vst: Vector[StateDatum] = dataLines.tail.map( dl => {
    val fields: Vector[String] = dl.split(",").toVector
    val date: String = fields(0)
    val state: String = fields(1)
    val cases: Int = fields(3).toInt
    val deaths: Int = fields(4).toInt
    StateDatum(date, state, cases, deaths)
})

val rt = RunningTally(vst)

vst: Vector[StateDatum] = Vector(
  StateDatum("2020-01-21", "Washington", 1, 0),
...
rt: RunningTally = RunningTally(
  Vector(
...
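
The parsing step above assumes every line has five well-formed fields. What follows is a minimal sketch of a more defensive variant, run on a few in-memory sample lines rather than the live file. `SampleDatum` (a stand-in with the same fields as `StateDatum`), `parseLine`, and every sample line after the first data row are assumptions made for illustration.

```scala
import scala.util.Try

// Hypothetical stand-in with the same fields as StateDatum.
case class SampleDatum(date: String, state: String, cases: Int, deaths: Int)

// Hypothetical helper: returns None instead of throwing on a malformed line.
def parseLine(line: String): Option[SampleDatum] = {
  val fields = line.trim.split(",").toVector
  if (fields.size != 5) None
  else Try(SampleDatum(fields(0), fields(1), fields(3).toInt, fields(4).toInt)).toOption
}

val sampleLines: Vector[String] = Vector(
  "date,state,fips,cases,deaths",   // header, skipped with .tail
  "2020-01-21,Washington,53,1,0",
  "2020-01-22,Washington,53,1,0",   // made-up row
  "not,a,valid,data,line"           // five fields, but cases and deaths are not numbers
)

// Keep only the lines that parse cleanly.
val parsed: Vector[SampleDatum] = sampleLines.tail.flatMap(line => parseLine(line).toList)
```

Here the malformed last line is dropped silently. Whether to drop, log, or fail on such lines is a design choice this sketch does not settle.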

## Data Functions

`totalNewDaily` takes our running-tally data (`rt.days` by default) and an `Option[String]` naming a state. The default, `None`, sums all states into a single "USA" series. It returns a `Vector[DailySnapshot]`, one entry per day, holding new cases, new deaths, total cases, and total deaths.

In [29]:
def totalNewDaily(data: Vector[StateDatum] = rt.days, state: Option[String] = None): Vector[DailySnapshot] = {
    val sortedData: Vector[StateDatum] = {
        val filtered: Vector[StateDatum] = {
            state match {
                case Some(s) => data.filter(_.state == s)
                case None => {
                    val groupedByDay: Vector[(String, Vector[StateDatum])] = {
                        data.groupBy(_.date).toVector
                    }
                    val merged: Vector[StateDatum] = groupedByDay.map( gbd => {
                        val vec = gbd._2
                        val vecDate: String = vec.head.date
                        val vecState: String = "USA"
                        val vecCases: Int = vec.map(_.cases).sum
                        val vecDeaths: Int = vec.map(_.deaths).sum
                        StateDatum(vecDate, vecState, vecCases, vecDeaths)
                    })
                    merged
                }
            }
        }
        val sorted: Vector[StateDatum] = filtered.sortBy(_.date)
        sorted
    }
    // We don't want a running, cumulative tally, but new cases/deaths each day
    sortedData.zipWithIndex.map( sd => {
        val d: StateDatum = sd._1
        val i: Int = sd._2
        val newCases: Int = {
            if (i == 0) d.cases
            else {
                val totalToday: Int = d.cases
                val totalPrev: Int = {
                    sortedData(i-1).cases
                }
                totalToday - totalPrev
            }
        }
        val newDeaths: Int = {
            if (i == 0) d.deaths
            else {
                val totalToday: Int = d.deaths
                val totalPrev: Int = {
                    sortedData(i-1).deaths
                }
                totalToday - totalPrev
            }
        }
        DailySnapshot(d.date, newCases, newDeaths, d.cases, d.deaths)
    })
    
}

defined function totalNewDaily
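
The two steps inside `totalNewDaily`, summing states per date and turning cumulative totals into daily increments, can be sketched on a tiny made-up data set. `SampleRow` and all of the figures are assumptions for illustration, and the sketch uses `zip` where the function above uses index lookups.

```scala
// Hypothetical row type; the figures below are invented, not NY Times data.
case class SampleRow(date: String, state: String, cases: Int)

val sample: Vector[SampleRow] = Vector(
  SampleRow("2020-03-01", "A", 1),
  SampleRow("2020-03-02", "A", 4),
  SampleRow("2020-03-02", "B", 2),
  SampleRow("2020-03-03", "A", 9),
  SampleRow("2020-03-03", "B", 5)
)

// Step 1: sum all states per date, then sort by date
// (ISO-formatted dates sort correctly as plain strings).
val national: Vector[(String, Int)] =
  sample.groupBy(_.date).toVector
    .map { case (date, rows) => (date, rows.map(_.cases).sum) }
    .sortBy(_._1)

// Step 2: subtract the previous day's total; the first day is compared with 0.
val totals: Vector[Int] = national.map(_._2)
val newCases: Vector[Int] =
  totals.zip(0 +: totals.dropRight(1)).map { case (today, prev) => today - prev }
// totals == Vector(1, 6, 14); newCases == Vector(1, 5, 8)
```

The daily increments always add back up to the final cumulative total, which is a handy sanity check.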

## Compare Any Three States: Total Deaths

Edit the value for `states`, immediately below, to specify three states to compare.

In [47]:
val states:Vector[String] = Vector(
    "Washington",
    "New York",
    "Louisiana"
)

case class StateNumbers(index: Int, state:String, nums:Vector[Int])

// We need to start all the states on the same day, so left-pad shorter series with zeros
def padVector(shortVector: Vector[Int], maxSize: Int): Vector[Int] = {
    Vector.fill(maxSize - shortVector.size)(0) ++ shortVector
}

// Get the data for each state
val state_death_vec: Vector[StateNumbers] = states.zipWithIndex.map( z => {
    val s: String = z._1
    val i: Int = z._2
    StateNumbers(i, s, totalNewDaily(state = Some(s)).map(_.totalDeaths))
})

// Normalize the size
val state_death_vec_normalized: Vector[StateNumbers] = {
    val maxSize = state_death_vec.map(_.nums.size).max
    state_death_vec.map( sdv => {
        StateNumbers(sdv.index, sdv.state, padVector(sdv.nums, maxSize))
    })
}

val colors: Vector[(Int, Int, Int, Double)] = {
    Vector(
        (204,0,0,0.95),
        (0,0,204,0.95),
        (0,204,0,0.95),
    )
}

val statePlotters: Vector[Scatter] = state_death_vec_normalized.map( sd => {
    Scatter(
  (1 to sd.nums.size),
  sd.nums,
  name = s"${sd.state}",
  mode = ScatterMode(ScatterMode.Lines),
  marker = Marker(
    color = Color.RGBA(
        colors(sd.index)._1,
        colors(sd.index)._2,
        colors(sd.index)._3,
        colors(sd.index)._4
    ),
  )
)
})

val data = statePlotters

val layout = Layout("COVID-19 Deaths: " + states.mkString(", "))

plot(data, layout)

/*
val total_deaths_0: Vector[Int] = totalNewDaily(state = None).map(_.totalDeaths)

val curve_total_deaths_trace = Scatter(
  (1 to curve_total_deaths.size),
  curve_total_deaths,
  name = "2020 Covid Death-Toll",
  mode = ScatterMode(ScatterMode.Lines),
  marker = Marker(
    color = Color.RGBA(204, 0, 0, 0.95),
    line = Line(
      color = Color.RGBA(217, 217, 217, 1.0),
      width = 1.0
    ),
    symbol = Symbol.Circle(),
    size = 3
  )
)

val data = Seq(curve_total_deaths_trace)

val layout = Layout("Total US Deaths")

plot(data, layout)
*/

states: Vector[String] = Vector("Washington", "New York", "Louisiana")
defined class StateNumbers
defined function padVector
state_death_vec: Vector[StateNumbers] = Vector(
  StateNumbers(
...
state_death_vec_normalized: Vector[StateNumbers] = Vector(
  StateNumbers(
...
colors: Vector[(Int, Int, Int, Double)] = Vector(
  (204, 0, 0, 0.95),
...
statePlotters: Vector[Scatter] = Vector(
  Scatter(
...
data: Vector[Scatter] = Vector(
  Scatter(
...
layout: Layout = Layout(
  Some("COVID-19 Deaths: Washington, New York, Louisiana"
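
The normalization step, which pads each shorter series with leading zeros so that all series share the same length, can be illustrated on small made-up vectors. `leftPad` is a hypothetical helper name and the numbers are invented.

```scala
// Hypothetical helper: left-pad a series with zeros up to `size` entries.
// Vector.fill with a non-positive count yields an empty Vector, so longer inputs are left unchanged.
def leftPad(nums: Vector[Int], size: Int): Vector[Int] =
  Vector.fill(size - nums.size)(0) ++ nums

// Two invented cumulative series of different lengths.
val series: Vector[Vector[Int]] = Vector(Vector(1, 2, 5, 9), Vector(3, 7))

val longest: Int = series.map(_.size).max
val aligned: Vector[Vector[Int]] = series.map(leftPad(_, longest))
// aligned == Vector(Vector(1, 2, 5, 9), Vector(0, 0, 3, 7))
```

Padding on the left lines the series up by date only if every series ends on the same day, which is the assumption the plot above relies on.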

## South Carolina Data

Source for 2018 South Carolina Data: [“Stats of the State of South Carolina”, CDC](https://www.cdc.gov/nchs/pressroom/states/southcarolina/southcarolina.htm) and [SC DHEC](https://www.scdhec.gov/vital-records/parentage/sc-vital-records-data-and-statistics)

In [52]:
// Total Deaths
val sc_2018_Total_Deaths: Int = 50633

// Chronic Lower Respiratory Disease
val sc_2017_Total_Deaths_CLRD: Int = 2990

// Influenza and Pneumonia
val sc_2017_Total_Deaths_IPn: Int = 882

// All Respiratory-Related Deaths
val sc_2017_Total_Deaths_All_Resp = sc_2017_Total_Deaths_CLRD + sc_2017_Total_Deaths_IPn

// SC Covid Death Toll
val sc_covid_death_toll = totalNewDaily(state = Some("South Carolina")).map(_.totalDeaths)

// SC Total Cases
val sc_covid_total_cases = totalNewDaily(state = Some("South Carolina")).map(_.totalCases)

sc_2018_Total_Deaths: Int = 50633
sc_2017_Total_Deaths_CLRD: Int = 2990
sc_2017_Total_Deaths_IPn: Int = 882
sc_2017_Total_Deaths_All_Resp: Int = 3872
sc_covid_death_toll: Vector[Int] = Vector(
  0,
...
sc_covid_total_cases: Vector[Int] = Vector(
  2,
...

## SC: COVID vs. Flu-Season

Over a year like 2017, the death toll from influenza and pneumonia mounts day by day. Here we chart that toll, assuming the annual deaths are spread evenly across the year.

But the flu season, as defined by the CDC, lasts six months, from October through March. So we also chart the mounting toll as if the same deaths all fell within a six-month season.

Source for 2018 South Carolina Data: [“Stats of the State of South Carolina”, CDC](https://www.cdc.gov/nchs/pressroom/states/southcarolina/southcarolina.htm) and [SC DHEC](https://www.scdhec.gov/vital-records/parentage/sc-vital-records-data-and-statistics)

In [51]:


// Cumulative toll if the annual deaths were spread evenly over 365 days.
// Note: Int / Int is integer division, so 882 / 365 truncates to 2 deaths per day.
val sc_2018_mounting_flu_toll: Vector[Int] = {
    val daily = sc_2017_Total_Deaths_IPn / 365
    (1 to sc_covid_death_toll.size).toVector.map( i => daily * i )
}

// Cumulative toll if the same deaths all fell within a six-month season (twice the daily rate).
val sc_2018_mounting_flu_toll_6mo: Vector[Int] = {
    val daily = sc_2017_Total_Deaths_IPn / 365 * 2
    (1 to sc_covid_death_toll.size).toVector.map( i => daily * i )
}

val sc_2018_mounting_flu_toll_trace = Scatter(
  (1 to sc_covid_death_toll.size),
  sc_2018_mounting_flu_toll,
  name = "2018 Influenza Deaths",
  mode = ScatterMode(ScatterMode.Lines),
  marker = Marker(
    color = Color.RGBA(204, 204, 0, 0.95),
    line = Line(
      color = Color.RGBA(217, 217, 217, 1.0),
      width = 1.0
    ),
    symbol = Symbol.Circle(),
    size = 3
  )
)

val sc_2018_mounting_flu_toll_6mo_trace = Scatter(
  (1 to sc_covid_death_toll.size),
  sc_2018_mounting_flu_toll_6mo,
  name = "2018 Influenza Deaths (6 mo. season)",
  mode = ScatterMode(ScatterMode.Lines),
  marker = Marker(
    color = Color.RGBA(0, 204, 204, 0.95),
    line = Line(
      color = Color.RGBA(217, 217, 217, 1.0),
      width = 1.0
    ),
    symbol = Symbol.Circle(),
    size = 3
  )
)

val sc_2020_covid_toll_trace = Scatter(
  (1 to sc_covid_death_toll.size),
  sc_covid_death_toll,
  name = "2020 Covid Deaths",
  mode = ScatterMode(ScatterMode.Lines),
  marker = Marker(
    color = Color.RGBA(0, 0, 204, 0.95),
    line = Line(
      color = Color.RGBA(217, 217, 217, 1.0),
      width = 1.0
    ),
    symbol = Symbol.Circle(),
    size = 3
  )
)

val data = Seq(
        sc_2018_mounting_flu_toll_trace,
        sc_2018_mounting_flu_toll_6mo_trace,
        sc_2020_covid_toll_trace
)

val layout = Layout("COVID Death Toll vs. Flu Season, 2018")

plot(data, layout)

sc_2018_mounting_flu_toll: Vector[Int] = Vector(
  2,
...
sc_2018_mounting_flu_toll_6mo: Vector[Int] = Vector(
  4,
...
sc_2018_mounting_flu_toll_trace: Scatter = Scatter(
  Some(
...
sc_2018_mounting_flu_toll_6mo_trace: Scatter = Scatter(
  Some(
...
sc_2020_covid_toll_trace: Scatter = Scatter(
  Some(
...
data: Seq[Scatter] = List(
  Scatter(
...
layout: Layout = Layout(
  Some("COVID Death Toll vs. Flu Season, 2018"),
...
res50_7: String = "plot-cd67cda4-d14b-49b1-ba05-0bfbe862157a"
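
One detail worth noting about the flu-toll arithmetic: the daily rate in the cell above is computed with integer division, which rounds down. The following sketch compares that with a floating-point rate that is rounded only at the end. The 60-day horizon and all of the names are chosen for illustration; 882 is the `sc_2017_Total_Deaths_IPn` figure from earlier in the notebook.

```scala
val annualDeaths: Int = 882   // same figure as sc_2017_Total_Deaths_IPn
val days: Int = 60            // arbitrary horizon, for illustration only

// Integer division truncates: 882 / 365 == 2
val intDaily: Int = annualDeaths / 365
val intToll: Int = intDaily * days

// Floating-point daily rate (about 2.42), rounded only at the end
val exactDaily: Double = annualDeaths.toDouble / 365
val exactToll: Int = math.round(exactDaily * days).toInt
// intToll == 120, exactToll == 145
```

Under these assumptions the truncated rate understates the 60-day toll by 25 deaths, so the flu curves in the chart sit somewhat lower than an exact even spread would put them.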

## SC Case-Fatality-Rate

A running calculation of cumulative deaths divided by cumulative cases, one value per day.

In [53]:
val cfr: Vector[Double] = sc_covid_death_toll.zip(sc_covid_total_cases).map( c => {
    c._1.toDouble / c._2.toDouble
})


val case_fatality_rate = Scatter(
  (1 to cfr.size),
  cfr,
  name = "Case Fatality Rate",
  mode = ScatterMode(ScatterMode.Lines),
  marker = Marker(
    color = Color.RGBA(204, 0, 0, 0.95),
    line = Line(
      color = Color.RGBA(217, 217, 217, 1.0),
      width = 1.0
    ),
    symbol = Symbol.Circle(),
    size = 3
  )
)

val data = Seq(case_fatality_rate)

val layout = Layout("SC: 2020 Covid CFR")

plot(data, layout)

cfr: Vector[Double] = Vector(
  0.0,
...
case_fatality_rate: Scatter = Scatter(
  Some(
...
data: Seq[Scatter] = List(
  Scatter(
...
layout: Layout = Layout(
  Some("SC: 2020 Covid CFR"),
...
res52_4: String = "plot-3ffbc8b0-b5cd-4596-ba4b-13d19b027093"
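
The ratio above divides by the cumulative case count, so a day with zero recorded cases would produce `NaN` (`0.0 / 0.0`) rather than a number. Below is a minimal sketch of a guarded variant on made-up series; `cfrSafe` and the sample figures are assumptions for illustration, not part of the notebook's data.

```scala
// Invented cumulative series, not real data.
val sampleDeaths: Vector[Int] = Vector(0, 0, 1, 3)
val sampleCases: Vector[Int]  = Vector(0, 2, 10, 60)

// None while there are no cases yet, otherwise deaths / cases.
val cfrSafe: Vector[Option[Double]] = sampleDeaths.zip(sampleCases).map {
  case (_, 0) => None
  case (d, c) => Some(d.toDouble / c)
}
// cfrSafe == Vector(None, Some(0.0), Some(0.1), Some(0.05))
```

For plotting, the `None` entries could be dropped or mapped to a gap. Which is preferable depends on how the chart should treat days before the first case.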