How about working with active cases instead of total cases? #29
Comments
A few issues I see with this:
|
Fair points, Aatish. I think I should withdraw my proposal. |
I think this view would be immensely valuable. To quote Henry in his video: "But the spread of the disease only cares about two things: how many cases there are today and how many new cases there will be today." (accompanied by the illustration labeled "current # of infections"). As you say, this means the chart would move backwards, but I think that's perfect for visualizing the (possibly) cyclic nature of the pandemic:
When we visualize this, we can show the public when we are back at square 1, and warn them not to repeat step 2. |
I put together a quick demo (code here): What do you think? |
To address the understandability problem, perhaps we could ask Henry to make a followup video explaining the phase diagram. Something like https://youtu.be/p_di4Zn4wz4?t=662 |
I really like the demo from @exclipy (and the explanation of why it would be useful), though I think it would be better to use the active cases proposed by @alexeev instead of the new cases from the last 3 weeks. I also think there should be an option to choose between different values on the X axis (total <-> actual). |
Thanks exclipy, I was looking for this view! Regarding argument 2) from aatishb: if we improve the diagram based on the valuable suggestion from alexeev and points start to move back from right to left, we should not drop the whole idea only because it does not look “nice”, or because people might be too stupid to understand it. I think the opposite is true: the demo view from exclipy shows reality even better than before. Any country hit by a wave will be in the same position “after the wave” as “before the wave”, with no difference, as long as too many people are still not immune... |
Even if I must say so myself... Wow, I'm looking at the latest data on my view today and there are some very interesting insights comparing the paths of different countries. |
I like @exclipy's feature very much! The best example IMO is China's Heilongjiang area from the end of March. The original chart shows a rate of increase during the 2nd wave that seems way higher than the 2-day doubling helper line, but of course the actual total number doesn't double that quickly. That helper line reflects total cases doubling only when the active and total cases are close...
But look at the same case using @exclipy's feature (Customise to China, deselect all and select only Heilongjiang), or just look at this amazing screen grab (which also shows how nicely it works on Linear): Wow! You can really see how the first wave died, then nothing for a while, and then a second wave on that doubling-every-2-days line... almost copying the first wave. It's visually cyclic and lets you identify the second wave as such from the start. This wouldn't change even if it happened in e.g. Hubei, where numbers are much higher (BTW @exclipy, I think there's an issue exposed by this example's last data point).
@aatishb's concern about the line going backwards is valid, and the same goes for a stagnant state (see the US in the last few days). However, I strongly feel that people who come to look at a logarithmic chart of new cases against total cases, have been discussing Covid data for so long, and will click on something to change the X-axis are ready for the next step :) There may be a way to make it a little less confusing: |
I like the "visual expiration date" style solution. I think that could help against the crowded lines. I also think that we should use the active cases. The bigger averaging window for the confirmed cases is just a workaround to "estimate" the real active cases. If we could compare the two solutions, we could see whether the active cases data source behaves well enough to be displayed. |
I have downloaded the original CSVs to be able to make a quick plot and test the graph with active cases. First I validated that I get the same graph when using "confirmed cases" on the X axis. After that I modified the X axis to show "active cases" only, calculated from the CSV in the following way:
I think the waves of a disease will plot circles on this graph, which would also make them comparable... is the second wave bigger... was it brought under control faster? What do you think? PS: Sorry, I did not bother too much with the plot coloring... no fading effect either... just a quick and dirty view of how it would look. |
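For readers who want to try this kind of plot themselves, here is a minimal sketch. It assumes one common definition of active cases (cumulative confirmed minus cumulative recovered minus cumulative deaths), which may or may not match the calculation used in the comment above. The function name `active_cases` and the sample numbers are hypothetical.

```python
def active_cases(confirmed, recovered, deaths):
    """Per-day active cases from three cumulative series of equal length.

    Assumption: active = confirmed - recovered - deaths.
    """
    if not (len(confirmed) == len(recovered) == len(deaths)):
        raise ValueError("series must have the same length")
    # Clamp at zero: reporting corrections can make the difference negative.
    return [max(c - r - d, 0) for c, r, d in zip(confirmed, recovered, deaths)]


# Made-up numbers for illustration only, not real data.
confirmed = [10, 50, 120, 200, 210]
recovered = [0, 5, 30, 120, 190]
deaths = [0, 1, 4, 8, 10]
active = active_cases(confirmed, recovered, deaths)
print(active)
```

With these made-up numbers the active count rises and then falls again, which is what would make a point move back from right to left on the chart.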
As for another aspect: I have normalized the X axis by each country's population, as big countries have bigger infrastructure and should be able to handle more confirmed/active cases. As the US is above every other country on the diagonal, where is it really when normalized by population size? The second plot has "active cases" on the X axis, also normalized to cases per 100,000 people: What do you think? Would it make sense to update the webpage with the possibility to choose
|
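The per-population normalization described above could be sketched roughly as follows. The function name `per_100k` and the numbers are hypothetical and only illustrate the arithmetic.

```python
def per_100k(cases, population):
    """Scale a series of case counts to cases per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return [c * 100_000 / population for c in cases]


# Made-up example: a country of one million people.
normalized = per_100k([50, 500], 1_000_000)
print(normalized)
```

Applying the same scaling to both axes keeps the shape of each country's curve and only shifts it along the diagonal of a log-log plot, so countries of different sizes become easier to compare.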
I really like the charts (could have chosen a bit more different colours maybe?) |
I have taken a look at the code. It seems to be as you say. It does have some internal data model, but it is not prepared for taking all the input data, building a data model, modifying the data model, and creating the plots. It is more of a one-go script... which needs some refactoring to modify... and also some software architecture changes... to make it work. Yes, the colors are difficult, especially when you put 8-10 countries on the plot. An interactive web UI would help a lot, I think, with hover events, highlighting, and such. |
Thanks for pointing me here, Robert. Yours are exactly the graphs I'm looking for. It would be really nice if they were in the live version; a shame it isn't easy to do. |
I like the active-case attempt by @robertgalambos very much. I still think that having the simpler "confirmed cases in the last 3 weeks" as the X-axis is better from a data-relevance point of view.
@exclipy 's 3w x-axis nicely visualised (see gif in my comment above) how R was very much the same for a 2nd wave as it is for the 1st wave, until measures were taken. That IMO is a super important message which is unintuitive to so many of us (the "how can it be the same after we've already suffered so much...." argument). |
@aatishb I'd like to upvote the 3w x-axis again. I've been using @exclipy's version to watch Israel's 2nd wave unfold. In the original graph one can see cases going up again, but gets no idea of the rate of infection. In @exclipy's version, by contrast, it looks like Israel is approaching the Wave 1 infection rate, but maybe a little less, when looking at: |
@aatishb an alternative idea, possibly super simple, is to have Total Cases since YYYY-MM-DD, where the default is as it is now (since the start), but any date can be entered. That would enable me to e.g. assess Israel's 2nd wave rate of infection against the 10-day doubling line, by putting 2020-05-20 as the start date for the calculation of "Total Confirmed Cases". |
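A rough sketch of the "Total Cases since YYYY-MM-DD" idea, assuming the data is a date-sorted cumulative series. The function name `cases_since` and the numbers are hypothetical, not taken from the site's code or from real data.

```python
from datetime import date


def cases_since(dates, cumulative, start):
    """Cumulative cases counted from `start`, paired with their dates.

    Assumes `dates` is sorted ascending and aligned with `cumulative`.
    """
    baseline = 0
    for d, c in zip(dates, cumulative):
        if d < start:
            baseline = c  # last cumulative value before the start date
    return [(d, c - baseline) for d, c in zip(dates, cumulative) if d >= start]


# Made-up numbers for illustration only.
dates = [date(2020, 5, day) for day in range(18, 23)]
cumulative = [100, 105, 110, 130, 170]
second_wave = cases_since(dates, cumulative, date(2020, 5, 20))
print(second_wave)
```

Subtracting the baseline makes the second wave start near zero, so it can be compared against the doubling helper lines in the same way as the first wave.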
@UriGrod The virus does not care about the confirmed cases in the last 3 weeks. It only cares about the active cases.
"how can it be the same after we've already suffered so much....". One only needs to check how much of the population got the virus. It is not that big a percentage. Every sanity-check calculation I did for my country put it under 10%... And even with the current representative testing (where we do not have the final results yet) we are still under 10%. Even if I check the deaths in New York with a death rate of 2%, it is under 10%. So herd immunity is quite far away. |
@robertgalambos So whereas Confirmed Last 3w doesn't equal "Active but not in Iso", it might approximate it better than Active Cases, especially in countries where ppl get isolated quickly. But I don't know this. I'm just saying it's an option. That's why it would be interesting to compare the Heilongjiang data with the 2 graphs, because which of these metrics better approximates the number of actively infecting cases is an empirical question. "how can it be the same" is a false argument, ofc. I agree completely. I just made it to emphasize why it is super important to show the infection rate in a second wave. It seems, though, that no change will be made to @aatishb's lovely website. There's not much action here anymore. |
@UriGrod I made some plots for Heilongjiang. However, there is quite little data, so it is harder to see. I made normal and log-log plots. On the log-log plot, 0 is not handled; the point is simply missing. I did not bother with that. The script as a hard-coded .m (matlab/octave) script renamed to .txt: I also made a plot for Israel because it has more data, so we might see better which of the two plots works better. Log-log plot (a little zoomed in): The script as a hard-coded .m (matlab/octave) script renamed to .txt: They are similar... however, putting the 3-week confirmed cases on the X axis pulls the data to the left, which could mislead people into thinking the situation is better than it really is. That could be a problem when comparing countries. What do you think? |
I think we can all agree that, whether or not the total number of active cases is accurate, it is far better to approximate the total number of active cases than to plot the new cases versus the total number of confirmed cases. This is because with the total number of confirmed cases on the X-axis, the graph will always be misleading and will always underestimate the growth rate of the epidemic. This underestimation was not much of a problem at the start of the pandemic, as the difference between the total number of confirmed cases and the total number of active cases was small. We have to acknowledge that aatishb's graph as it is, is inherently misleading and can lead to dangerous underestimations. It has to go. Either we use some kind of measure to estimate the number of active cases, e.g. the total number of confirmed cases in the past X weeks (which implicitly assumes that a meaningful average can be put on the contagious period of a patient, and does not assume that a confirmed case can't be infected again, both of which I consider justifiable), or we hold off on publishing a more accurate graph until more accurate data is available. |
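The "confirmed cases in the past X weeks" estimate mentioned above can be sketched as a difference of cumulative totals. This is only an illustration under the stated assumption of a fixed contagious window. The function name `cases_in_last_days`, the default of 21 days, and the numbers are hypothetical.

```python
def cases_in_last_days(cumulative, window_days=21):
    """Cases confirmed within the trailing window, from a cumulative daily series.

    Used here as a rough proxy for active cases: anyone confirmed more than
    `window_days` ago is assumed to be no longer infectious.
    """
    result = []
    for i, total in enumerate(cumulative):
        # Before a full window has elapsed, every confirmed case still counts.
        before = cumulative[i - window_days] if i >= window_days else 0
        result.append(total - before)
    return result


# Made-up series with a short 3-day window so the effect is easy to see.
proxy = cases_in_last_days([1, 3, 6, 10, 15, 15, 15, 15], window_days=3)
print(proxy)
```

Once new cases stop, the proxy shrinks back toward zero while the cumulative total stays flat, which is why this choice of X axis lets a second wave show up as a new loop instead of a small bump at the far right.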
I believe plotting active cases instead of total confirmed cases would allow a better comparison of how effective different countries' measures against the spread of covid-19 are.
Because recovered people are not infectious, the current graph is biased towards countries with a greater fraction of recovered (and deceased) people among total cases, like China and Korea. A larger number of closed cases makes the denominator bigger, but it is not a government containment measure, so countries that were hit first look better than they actually are.
Working with active cases would also help with the second waves, otherwise invisible on the current chart.