How about working with active cases instead of total cases? #29

Open
alexeev opened this issue Mar 30, 2020 · 22 comments

Comments

@alexeev

alexeev commented Mar 30, 2020

I believe plotting active cases instead of total confirmed cases would allow a better comparison of how effective different countries' measures against the spread of covid-19 are.

Because recovered people are not infectious, the current graph is biased towards countries where a greater fraction of total cases are already closed (recovered or died), like China and Korea. A larger number of closed cases makes the denominator bigger, but that is not the result of any government containment measure, so countries that were hit first look better than they actually are.

Working with active cases would also help show second waves, which are otherwise invisible on the current chart.

@aatishb
Owner

aatishb commented Apr 2, 2020

A few issues I see with this:

  1. Currently JHU doesn't provide active cases. They do provide recovered cases, so we can subtract these from the total number of cases. But I'm skeptical of how consistently these numbers are reported across countries. If there are large differences in reporting of recovered numbers across countries, we'd end up with misleading results.

  2. This change would cause the trajectories on the graph to move forwards and backwards (i.e. to traverse a loop: the end, zero active cases, is the same as the beginning). This could be visually confusing or distracting for most people and difficult to quickly interpret / understand.

  3. Since this can only be done for cases (not deaths) the graph style would change when switching from confirmed cases <-> deaths, which may also be confusing.

@alexeev
Author

alexeev commented Apr 4, 2020

Fair points, Aatish. I think I should withdraw my proposal.

@exclipy
Contributor

exclipy commented Apr 11, 2020

I think this view would be immensely valuable.

To quote Henry in his video: "But the spread of the disease only cares about two things: how many cases there are today and how many new cases there will be today." (accompanied by the illustration labeled "current # of infections").

As you say, this means the chart would move backwards, but I think that's perfect for visualizing the (possibly) cyclic nature of the pandemic:

  1. There are a few cases
  2. Few cases turn into many cases
  3. Many cases spur action
  4. Action reduces number of cases
  5. Go to 1.

When we visualize this, we can demonstrate to the public when we go back to square 1, and provide a warning not to repeat step 2.

@exclipy
Contributor

exclipy commented Apr 11, 2020

I put together a quick demo (code here):
http://htmlpreview.github.io/?https://raw.githubusercontent.com/exclipy/covidtrends/active-cases/index.html

What do you think?

@exclipy
Contributor

exclipy commented Apr 12, 2020

To address the understandability problem, perhaps we could ask Henry to make a followup video explaining the phase diagram.

Something like https://youtu.be/p_di4Zn4wz4?t=662

@guyko81

guyko81 commented Apr 14, 2020

I really like the demo from @exclipy (and the explanation of why it would be useful), although I think it would be better to see it with active cases, as proposed by @alexeev, instead of the last 3 weeks' new cases.
I don't see an issue with the reliability of these numbers, as the recovered+deceased numbers are as consistent as the total numbers (not worse or better).

Although I think there should be an option to choose between different values on X axis (total <-> actual).
The current chart has the advantage that it shows the first wave very clearly but it won't show the second wave - as pointed out by @exclipy
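
For readers comparing the two X-axis options discussed here, the "new cases in the last 3 weeks" quantity can be derived from the cumulative confirmed series alone. The following is a minimal Python sketch, not the code of the demo; the function name and sample numbers are hypothetical, and it assumes one cumulative value per day.

```python
def rolling_new_cases(cumulative, window_days):
    """New cases confirmed within the last `window_days` days, for each day.

    `cumulative` holds cumulative confirmed counts, one value per day.
    Days before the start of the series are treated as zero cases.
    """
    result = []
    for i, total in enumerate(cumulative):
        earlier = cumulative[i - window_days] if i >= window_days else 0
        result.append(total - earlier)
    return result


# Hypothetical cumulative confirmed counts, one value per day.
cumulative = [1, 2, 4, 8, 16, 20, 22, 23, 23, 24]

# In the variant discussed in this thread the X axis would use a 21-day
# window and the Y axis a 7-day window of the same series.
x_axis = rolling_new_cases(cumulative, 21)
y_axis = rolling_new_cases(cumulative, 7)
```

Because the window eventually drops old cases, the X value can shrink again, which is what lets the trajectory move back to the left after a wave.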

@SomeSunlight

Tx exclipy, I was looking for this view!

To argument 2) from aatishb: when we improve the diagram based on the valuable suggestion from alexeev, and points start to move back from right to left, we should not withdraw it all only because it does not look “nice”, or because people might be too stupid to understand it. I think it is the opposite: the demo view from exclipy shows reality even better than before, since any country hit by a wave will be “after the wave” just as it was “before the wave”, no difference, as long as there are still too many people who are not immune yet...

@exclipy
Contributor

exclipy commented Apr 18, 2020

Even if I must say so myself...

Wow, I'm looking at the latest data on my view today and there are some very interesting insights comparing the paths of different countries.

@UriGrod

UriGrod commented May 3, 2020

I like @exclipy 's feature very much!
I came here to make this same suggestion. It's the most important view these days, because of 2nd waves and the role of visualisations in helping the public understand them. Without it, or something similar, this visualisation is no longer "the place to be" as it was for me in the past weeks, because all it is showing now is a continuation of the same win...

The best example IMO is China's Heilongjiang area from the end of March:

The original chart shows an increase rate during the 2nd wave which seems to be way higher than the 2-d doubling helper-line, but ofc the actual total number doesn't actually double so quickly. That helper line is relevant to total cases doubling only when the active and total cases are close...
In any case, it's not possible to assess the second wave at all. And if it still looks like a second wave at all, it's mainly because 1st wave didn't bring this region to high numbers.
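
To make the point about the helper line concrete: under pure exponential growth of the cumulative total with a fixed doubling time, the new cases in any fixed window are a constant fraction of the total, which is why the helper line is a straight diagonal on the log-log chart. The Python sketch below works under that assumption only; the function name is hypothetical and this is not necessarily how the site draws its line.

```python
def new_cases_on_helper_line(total_cases, doubling_days, window_days=7):
    """New cases in the last `window_days` days implied by a doubling time.

    Assumes the cumulative total grows as N(t) = N0 * 2 ** (t / doubling_days),
    so N(t) - N(t - window) = N(t) * (1 - 2 ** (-window / doubling_days)).
    """
    fraction = 1 - 2 ** (-window_days / doubling_days)
    return total_cases * fraction


# With a 2-day doubling time, about 91% of all cases so far are from the
# last week; with a 7-day doubling time it is exactly half.
print(new_cases_on_helper_line(1000, 2))
print(new_cases_on_helper_line(1000, 7))
```

Once most of the cumulative total comes from an earlier wave, a new outbreak can no longer sit on this line relative to total cases, which matches the observation above that the line is only meaningful while active and total cases are close.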

But look at the same case using @exclipy 's feature (Customise to China, deselect all and select only Heilongjiang), or just look at this amazing screen grab (showing also how nicely it works on Linear as well):

Heilongjiang-Linear-cyclic

Wow! You can really see how the first wave died, and then nothing for a while, and then a second wave on that doubling-per-2d line... almost copying the first wave. It's visually cyclic and enables early assessment of the second wave, identifying it as such from the start. This wouldn't change even if it happened in e.g. Hubei, where numbers are much higher (BTW @exclipy I think there's an issue exposed by this example's last data point)

@aatishb 's concern about the line going backwards is valid, and same for a stagnant state (see the US last few days). However I strongly feel the people who come to look at a logarithmic chart of new on total, and have been discussing Covid data for so long (and will click on something to change the X-axis), are ready for the next step :)

There may be a way to make it a little less confusing:
@exclipy is it simple to arrange that the lines have some kind of visual "expiration date", e.g. that they only show the past few weeks of data, or that the older the data, the less prominent it becomes (being greyed out gradually)? I think that would do 2 things:
(a) make the whole thing easier on the eyes
(b) reflect the reality of the infection rates (assuming no significant natural immunisation rates), which aren't much affected by the past but rather only by current rates × human behaviour. That, IMO, was the best advantage of the original graph: not caring about "days from" but focussing on infection rate.
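
One possible shape for such a "visual expiration date" is an opacity that stays full for recent points and fades linearly to nothing for older ones. This is only a Python sketch of the idea; the function name and the default thresholds (7 and 28 days) are hypothetical choices, not values from the site or the demo.

```python
def fade_opacity(age_days, full_days=7, expire_days=28, min_opacity=0.0):
    """Opacity for a trajectory point that is `age_days` old.

    Points up to `full_days` old are fully opaque, points `expire_days`
    or older get `min_opacity`, and ages in between fade linearly.
    """
    if age_days <= full_days:
        return 1.0
    if age_days >= expire_days:
        return min_opacity
    remaining = (expire_days - age_days) / (expire_days - full_days)
    return min_opacity + (1.0 - min_opacity) * remaining


# Opacity for the most recent 35 days of a trajectory, newest point first.
opacities = [fade_opacity(age) for age in range(35)]
```

Setting `min_opacity` to a small positive value would keep old segments faintly visible instead of hiding them, which may be preferable when comparing a second wave against the first.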

@robertgalambos

I like the "visual expiration date" style solution. I think that could help against the crowded lines.
Another suggestion: if the mouse hovers over a given country, it lights up, which is good. Little arrows could be added to help show the direction, if that is easily possible in this framework.

I also think that we should use the active cases. The bigger averaging window for the confirmed cases is just a workaround to "estimate" the real active cases. If we could compare the two solutions, we could see whether the active cases data source behaves well enough to be displayed or not.

@robertgalambos

I have downloaded the original CSVs to make a quick plot and test the graph with active cases. First I validated that I get the same graph when using "confirmed cases" on the X axis.
This is the result:
image
Should be the same as here:
https://aatishb.com/covidtrends/?location=Austria&location=Germany&location=Hungary&location=Italy&location=South+Korea&location=Spain&location=US&location=United+Kingdom

After that I modified the X axis to show "active cases" only, calculated from the CSVs in the following way:
active cases = confirmed cases - recovered - deaths
I got the following plot:
image
I think this helps when a country has many confirmed cases and we want to see how well it gets those numbers down.
On the first plot South Korea shows just a dip, but on the second you can actually see how they bring down the active cases, and how many new cases there still are (in the last week).

  • Moving left on the graph means more people are recovering than getting ill.
  • Moving down means there are fewer and fewer new cases.
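
The calculation and the left/right reading rule above can be sketched as follows. This is a minimal Python sketch, not the script used for the plots; the function names and sample numbers are hypothetical, and clamping negative values to zero is an added assumption to absorb reporting glitches.

```python
def active_cases(confirmed, recovered, deaths):
    """Active cases per day: confirmed - recovered - deaths (all cumulative)."""
    # Clamp at zero (an assumption) in case closed cases are over-reported.
    return [max(c - r - d, 0) for c, r, d in zip(confirmed, recovered, deaths)]


def direction(previous_active, current_active):
    """Horizontal direction of travel between two consecutive points."""
    if current_active < previous_active:
        return "left"   # more cases were closed than newly confirmed
    if current_active > previous_active:
        return "right"  # more cases were newly confirmed than closed
    return "stalled"


# Hypothetical cumulative series, one value per day.
confirmed = [10, 20, 30, 32]
recovered = [0, 5, 20, 27]
deaths = [0, 1, 2, 2]

active = active_cases(confirmed, recovered, deaths)
moves = [direction(a, b) for a, b in zip(active, active[1:])]
```

Note that a case leaves the active count when it is closed for either reason, so a move to the left reflects recoveries and deaths together.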

I think the waves of a disease will plot circles on this graph, which would also make them comparable...is the second wave bigger...was it faster under control?
I also think that you can draw "safe regions" on the graph where the virus is under control...

What do you think?

PS: Sorry, I did not bother much with the plot coloring... no fading effect either... just a quick and dirty sketch of how it would look.

@robertgalambos

As for another aspect: I have normalized the X axis by each country's population, since big countries have bigger infrastructure and should be able to handle more confirmed/active cases.
The population data comes from here (there might be better sources):
https://www.worldometers.info/world-population/population-by-country/

As the US is above every other country on the diagonal, where is it really, when normalized with population size?
The first plot is the classic one with "confirmed cases" on the X axis, normalized to cases per 100,000 people:
image
As can be seen on this plot, the US is not that different from other countries. The only problem is that its curve is not going down at the moment.

The second plot has "active cases" on the X axis, also normalized to cases per 100,000 people:
image
I think that with the normalization by population, the waves of each country can be compared to each other better. It would show how well the US does compared to the UK or Germany, etc.

What do you think?

Would it make sense to update the webpage with options for:

  • Active cases on the X axis instead of confirmed cases
  • Normalizing by population, to make countries more comparable?
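
The population normalization used for these plots is a simple rescaling. A minimal Python sketch follows; the function name and the sample figures are hypothetical and are not taken from the population source linked above.

```python
def per_100k(cases, population):
    """Rescale a series of case counts to cases per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return [count * 100_000 / population for count in cases]


# Hypothetical example: the same raw counts in a small and a large country.
raw_counts = [50, 100, 400]
small_country = per_100k(raw_counts, 1_000_000)
large_country = per_100k(raw_counts, 100_000_000)
```

Applying the same factor to both axes shifts a country's curve along the diagonal of the log-log plot without changing its shape, so the doubling-time reading stays the same while the positions become comparable across countries.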

@guyko81

guyko81 commented May 24, 2020

I really like the charts (could have chosen a bit more different colours maybe?)
I think the problem comes from the code: it assumes that only 1 source arrives and an asynchronous function is written on that sole source. So it's not trivial to rewrite to use both confirmed cases and deaths together. At least it would require the rewriting of the main chart code.

@robertgalambos

I have taken a look at the code. It seems to be as you say. It does have some inner data model, but it is not prepared for taking all the input data, building a data model, modifying the data model, and then creating the plots. It is more of a one-go script, which needs some refactoring and also some sw architectural change to make this work.

Yes, the colors are difficult, especially when you put 8-10 countries on the plot. The interactive web UI would help a lot I think, with hover events, highlighting, and such stuff.
However, since the above plots I have added a title and axis titles, spaced the colors out a little, and zoomed in on the important part (leaving the noise in the bottom left corner out of the plot).
But it is of course just a simple script that produces static plots.

@hellbovine

Thanks for pointing me here Robert

Yours are exactly the graphs I'm looking for. It would be really nice if they were in the live version; a shame it isn't easy to do.

@UriGrod

UriGrod commented Jun 2, 2020

I like the active-case attempt by @robertgalambos very much. I still think that having the simpler "confirmed cases in the last 3 weeks" as the X-axis is better from a data-relevance point of view.

  1. As mentioned above, the active-case data will heavily depend on each country's case-closing criteria, as well as its death rate (because one dies from Corona faster than one recovers from it), whereas the confirmed-3-week data will have the same noise factor as the 1-week Y axis, i.e. just each country's different testing regimen, scope etc.

  2. IMO the main objective of the visualisation, and the reason I like it so much, is the almost direct visualisation of R, the infection rate, itself. But for a 2nd wave it fails to do so, and so the question here is whether 3w or active cases as X axis does that better for a second wave. @robertgalambos Would you create your graph for the Heilongjiang region of China, where a 2nd wave occurred? Then we can clearly see whether the visualisation helps discover and make sense of a 2nd wave.

@exclipy 's 3w x-axis nicely visualised (see gif in my comment above) how R was very much the same for a 2nd wave as it is for the 1st wave, until measures were taken. That IMO is a super important message which is unintuitive to so many of us (the "how can it be the same after we've already suffered so much...." argument).

@UriGrod

UriGrod commented Jun 2, 2020

@aatishb I'd like again to upvote the 3w x-axis. I've been using @exclipy 's version to watch Israel's 2nd wave unfold. In the original graph one can see cases going up again, but gets no idea about the rate of infection. Whereas it looks like Israel is approaching the Wave 1 infection rate, but maybe a little less, when looking at:

2020-06-02_ISR-2nd_wave

@UriGrod

UriGrod commented Jun 8, 2020

@aatishb an alternative idea, possibly super simple, is to have Total Cases since YYYY-MM-DD, where the default is as it is now, since the start, but any date can be entered. That would enable me to e.g. assess Israel's 2nd wave rate of infection against 10d doubling, by putting 2020-05-20 as the start date for the calculation of "Total Confirmed Cases".
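
A "total cases since a chosen date" series could be derived by subtracting the cumulative count from just before that date. The Python sketch below illustrates the idea only; the function name and the sample numbers are hypothetical, and it assumes the dates are sorted in ascending order.

```python
from datetime import date


def total_since(dates, cumulative, start):
    """Cumulative confirmed cases counted from `start` onwards.

    `dates` must be sorted ascending and aligned with `cumulative`.
    Cases confirmed before `start` are subtracted as a baseline.
    """
    baseline = 0
    for day, total in zip(dates, cumulative):
        if day < start:
            baseline = total
    return [
        (day, total - baseline)
        for day, total in zip(dates, cumulative)
        if day >= start
    ]


# Hypothetical cumulative counts around the start date mentioned above.
dates = [date(2020, 5, 18), date(2020, 5, 19), date(2020, 5, 20), date(2020, 5, 21)]
cumulative = [100, 110, 125, 150]
second_wave = total_since(dates, cumulative, date(2020, 5, 20))
```

With the default start at the beginning of the series the baseline is zero, so the result matches the current "Total Confirmed Cases" axis.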

@robertgalambos

@UriGrod The virus does not care about the confirmed cases in the last 3 weeks. It only cares about the active cases.

  1. Each country's "case-closing criteria" are as diverse and unknown as each country's confirmed-case criteria. We do not know what is better to use on the X axis, and this can also change from country to country. Our best guess is, in my opinion, the number of active cases.
  2. Yes, I will create a graph for that too. Sorry for the slow answer, but I did not have much time in the past week.

"how can it be the same after we've already suffered so much....". One only needs to check how much of the population got the virus. It is not that big a percentage. Every sanity-check calculation which I did for my country put it under 10%... And even with the current representative testing (where we do not have the final results yet) we still got under 10%. Even if I check the deaths from New York with a death rate of 2%, it is under 10%. So herd immunity is quite far away.

@UriGrod

UriGrod commented Jun 12, 2020

@robertgalambos
technically, I think the real number that matters to assess infection rate is "active and not yet in isolation". Example: In country A people are in isolation between being tested and getting the result. In country B they are "assumed innocent" so not in isolation unless tested positive. If everything else is equal, infection rate will be higher in country B. Of course, many of the Active but not in ISO aren't even tested.

So whereas Confirmed Last 3w doesn't equal "Active but not in Iso", it might approximate it better than Active Cases, especially in countries where ppl get into iso quickly. But I don't know this. I'm just saying it's an option. That's why it would be interesting to compare the Heilongjiang data with the 2 graphs. Because which one of these metrics better approximates the number of actively infecting cases is an empirical question.

"how can it be the same": is a false argument, ofc. I agree completely. I just made it to emphasize why it is super important to show the infection rate in a second wave. Seems though that no change will be made to @aatishb 's lovely website. There's not much action here anymore.

@robertgalambos

@UriGrod I made some plots for Heilongjiang. However, there is quite little data, so it is harder to see. I made normal and log-log plots. On the log-log plot the 0 is not handled; it is just missing as a point. I did not bother with that.

Normal plot:
image

Log-log plot:
image

The script as .m (matlab/octave) hard-coded script renamed as .txt:
plotterspec.txt

I also made a plot for Israel because it has more data, and we might see more clearly which of the two plots is better.

Normal plot:
image

Log-log plot (a little zoomed in):
image

The script as .m (matlab/octave) hard-coded script renamed as .txt:
plotterspec2.txt

They are similar... however, the 3-week confirmed cases on the X axis pull the data to the left, which could mislead by suggesting that the situation is better than it really is. That could be a problem when comparing countries.

What do you think?

@BYT3M3

BYT3M3 commented Jun 29, 2020

I think we can all agree that whether or not the total number of active cases is accurate, it is far better to approximate the total number of active cases than it is to plot the new cases versus the total number of confirmed cases. This is because with the total number of confirmed cases on the X-axis the graph will always be misleading and will always underestimate the growth rate of the epidemic. This underestimation was not much of a problem at the start of the pandemic, as the difference between the total number of confirmed cases and the total number of active cases was small.

We have to acknowledge that the graph of aatishb, as it is, is inherently misleading and can lead to dangerous underestimations. It has to go. Either we take some kind of measure to estimate the number of active cases, e.g. the total number of confirmed cases in the past X weeks (which implicitly assumes there is some kind of meaningful average to be put on the contagious period of a patient and does not assume that a confirmed case can't be infected again, which I consider to be justifiable assumptions), or we hold off on publishing a more accurate graph until more accurate data is available.
