# **CFR**
# **Author: Hendrix Wang**
# **Date: Third of Nov**
The purpose of this section is to calculate the CFR of each country and then analyze and compare the results.
In epidemiology, the case fatality rate (CFR) is the proportion of people diagnosed with a disease who die from it within a given period. The question "how deadly is COVID-19?" is a tricky one, because the headline figures do not fully reflect the actual numbers of cases, deaths, recoveries and tests. The media and the public often quote a single number that is supposed to tell us how dangerous the virus is, but how that number is derived needs to be clarified. How was it determined? Does it represent all ages and sexes? Does it cover both the best and the worst periods of the pandemic? These are only some of the basic questions that arise and cause confusion. Much of this confusion comes from the CFR being the figure that media outlets report most often, since it uses readily available data and is easy for the public to understand.

## Data Engineering 
Author: Sherry Wang 

This process notebook assumes that the reader has viewed all previous process notebooks as this is the final notebook that combines the analysis of all. 

The purpose of this section is to load the necessary libraries and the cleaned datasets from the `Cleaned Data` folder in GitHub.

The code below imports the essential packages for the analysis that follows.

In [44]:
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import datetime

#Extra libraries 
from math import log10, floor
import sympy as sp 
#import io
#import plotly.io as pio
#pio.renderers.default = "png"
pd.options.mode.chained_assignment = None 

The code below lists the countries of interest: United States, India, Russia, Brazil, South Africa, Australia (one from each continent). Having the country names in one list makes the later analysis easier.

In [45]:
countries=sorted(['United States','India','Russia','Brazil','South Africa', 'Australia'])

The code below reads in the cleaned data set, which merges the "Our World in Data" source with the "Johns Hopkins" source.

In [46]:
covid=pd.read_csv("https://raw.github.sydney.edu.au/swan9801/R13B-Group4-COVID/master/Process%20Notebooks/Cleaned%20data/covid.csv?token=AAABDAXJLHLBXQ7N47MLBJS7V7FOW")
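
Before relying on the loaded data, it can help to confirm that it has the columns the later functions index into. The sketch below is only an illustration: it uses a hypothetical three-row sample in place of `covid.csv`, and the column names are the ones used elsewhere in this notebook (`location`, `date`, `total_cases`, `total_deaths`).

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for covid.csv; the real file is much larger.
sample_csv = io.StringIO(
    "location,date,total_cases,total_deaths\n"
    "Australia,2020-03-01,27,1\n"
    "Australia,2020-03-02,30,1\n"
    "India,2020-03-01,3,0\n"
)
sample = pd.read_csv(sample_csv, parse_dates=["date"])

# Columns that the functions in this notebook index into.
required_columns = {"location", "date", "total_cases", "total_deaths"}
missing_columns = required_columns - set(sample.columns)

print("Missing columns:", sorted(missing_columns))
print(sample.dtypes)
```

The same check could be run on `covid` itself by replacing `sample` with `covid`.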

Now that the cleaned data is imported, we can move on to the CFR analysis. 

# CFR
We will explore the CFR as a percentage using the equation:

$CFR=\frac{\text{number of deaths from COVID-19}}{\text{number of confirmed cases from COVID-19}}\times 100$


To make it easy to calculate the CFR, I wrote the following function.

In [47]:
def calculate_CFR(country):
  #Extract all data entries for this country alone.
  covid_country=covid[covid["location"]==country]
  #Calculate the CFR as a percentage from the largest cumulative totals.
  CFR = covid_country["total_deaths"].max()/covid_country["total_cases"].max()*100
  return CFR
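
The function takes the maximum of each cumulative column, which matches the latest totals as long as the totals never decrease. The sketch below illustrates this on a hypothetical four-row frame for a made-up country ("Examplia") and compares it with reading the most recent row by date. The helper names `cfr_from_max` and `cfr_from_latest` are assumptions made for this illustration and are not part of the notebook.

```python
import pandas as pd

# Hypothetical cumulative counts for one made-up country.
toy = pd.DataFrame({
    "location": ["Examplia"] * 4,
    "date": pd.to_datetime(["2020-04-01", "2020-04-02", "2020-04-03", "2020-04-04"]),
    "total_cases": [100, 150, 200, 250],
    "total_deaths": [2, 3, 5, 5],
})

def cfr_from_max(frame, country):
    """CFR (%) from the largest cumulative totals, as in calculate_CFR."""
    rows = frame[frame["location"] == country]
    return rows["total_deaths"].max() / rows["total_cases"].max() * 100

def cfr_from_latest(frame, country):
    """CFR (%) from the most recent row by date."""
    latest = frame[frame["location"] == country].sort_values("date").iloc[-1]
    return latest["total_deaths"] / latest["total_cases"] * 100

print(cfr_from_max(toy, "Examplia"))     # 5 / 250 * 100
print(cfr_from_latest(toy, "Examplia"))  # same row here, so the same value
```

If a data correction ever lowered a cumulative total, the two approaches would diverge, and the row-based version would reflect the corrected figure.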

To draw each country's CFR trend chart more conveniently, the following function was created.

In [48]:
dict_colours={"Australia":"gold","Brazil":"forestGreen","India":"tomato","Russia":"crimson","South Africa":"mediumPurple","United States":"dodgerBlue"}
line_name = {"Australia":"Australia CFR","Brazil":"Brazil CFR","India":"India CFR","Russia":"Russia CFR","South Africa":"South Africa CFR","United States":"United States CFR"}
def plot(country):
  #Create a blank diagram.
  fig = go.Figure()
  #Extract all data entries for this country alone (copied so that adding a column does not touch covid).
  covid_country=covid[covid['location']==country].copy()
  #Select the label.
  name_=line_name[country]
  #Choose the colour of the line.
  country_colour=dict_colours[country]
  #Calculate the cumulative CFR as a percentage.
  covid_country['total_CFR']=(covid_country["total_deaths"]/covid_country["total_cases"])*100
  #The following code is to draw the CFR diagram
  fig.add_trace(go.Scatter(x=covid_country['date'], y=covid_country['total_CFR'], name=name_,
                         line=dict(color=country_colour, width=2)))
  fig.update_xaxes(title_text="Date",tickangle = 290)
  fig.update_yaxes(title_text="Percentage(%)")
  fig.update_layout(
      title={
          'text': 'The CFR for '+country,
          'y':0.95,
          'x':0.5,
          'xanchor': 'center',
          'yanchor': 'top'})
  return fig.show()
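
The `total_CFR` column divides one cumulative column by another for every date, so early rows with zero confirmed cases need some care. The sketch below shows one possible way to handle them, using hypothetical early-outbreak rows; the helper name `cfr_series` is an assumption for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical early-outbreak rows, including a day with zero confirmed cases.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-30", "2020-01-31", "2020-02-01", "2020-02-02"]),
    "total_cases": [0, 1, 4, 10],
    "total_deaths": [0, 0, 1, 1],
})

def cfr_series(frame):
    """Cumulative CFR (%) per row; rows with zero cases become NaN."""
    cases = frame["total_cases"].replace(0, np.nan)
    return frame["total_deaths"] / cases * 100

toy["total_CFR"] = cfr_series(toy)
print(toy[["date", "total_CFR"]])
```

Marking those rows as missing keeps undefined ratios out of the series instead of treating them as real CFR values.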

## Australia

Introduction: We chose Australia to represent Oceania. The outbreak in Australia is currently under control, and daily life is gradually returning to normal.

The following code uses a function to calculate the CFR.

In [49]:
calculate_CFR('Australia')


3.2836145101730505

Australia's current CFR is 3.2836%

The following code draws Australia's CFR trend chart.

In [50]:
plot('Australia')

Australia's CFR decreased during two periods: at the beginning of the epidemic and in July. It was stable from May to June and became stable again in October. However, the stability in October followed a large increase, which is a concern. This erratic pattern suggests that Australia may not yet have found a reliable way to reduce its CFR.

## India

Introduction: We chose India to represent Asia. The COVID-19 situation in India is still very serious. It is necessary to study the trend of the outbreak through large-scale data, and to use that data to understand why some countries can control COVID-19 well.

The following code uses a function to calculate the CFR.

In [51]:
calculate_CFR('India')

1.4862950954832361

India's current CFR is 1.4863%

The following code draws India's CFR trend chart.

In [52]:
plot('India')

As shown above, the CFR is not constant over time and fluctuates between mid-March and mid-July. While the case fatality rate in India is currently about 1.5%, it is difficult to interpret, because it reflects the severity of the virus in a particular context, time and population. As we do not have information on the age and gender of each patient, it is difficult to treat this number as the death rate of COVID-19, which varies between patients.

## United States

Introduction: We chose the United States to represent North America. The United States appears to be the most severely affected country, and the number of daily new cases keeps increasing.

The following code uses a function to calculate the CFR.

In [35]:
calculate_CFR('United States')

2.463810097859207

United States' current CFR is 2.4638%

The following code draws the United States' CFR trend chart.

In [36]:
plot('United States')

In the graph, the smoothed CFR shows the trend much more clearly and reduces the outliers that appeared as extreme spikes, which may have been data corrections. We can also see that the CFR on the left-hand side is much higher than on the right-hand side, where it seems to be stabilising.
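
The `plot` function above draws the cumulative CFR, which is already fairly smooth by construction, and this notebook does not show how any further smoothing was done. If a noisier daily series were used instead, a rolling mean is one common way to smooth it. The sketch below assumes a hypothetical daily CFR series with a single spike and a centred 7-day window; both the data and the window length are illustrative choices.

```python
import pandas as pd

# Hypothetical daily CFR values (%) with one spike, e.g. from a data correction.
dates = pd.date_range("2020-06-01", periods=10, freq="D")
daily_cfr = pd.Series([2.0, 2.1, 1.9, 2.0, 9.0, 2.0, 2.1, 1.9, 2.0, 2.0], index=dates)

# Centred 7-day rolling mean; min_periods=1 keeps values at the edges.
smoothed = daily_cfr.rolling(window=7, center=True, min_periods=1).mean()

print(pd.DataFrame({"daily": daily_cfr, "smoothed": smoothed}))
```

A longer window damps spikes more strongly but also delays and flattens genuine changes in the trend.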

## Russia

Introduction: We chose Russia to represent Europe. Russia is currently going through another outbreak of COVID-19.

The following code uses a function to calculate the CFR.

In [37]:
calculate_CFR('Russia')

1.725290441901581

Russia's current CFR is 1.7253%

The following code draws Russia's CFR trend chart.

In [38]:
plot('Russia')

In the figure, the smoothed CFR shows the trend more clearly. We can also see that the CFR on the right-hand side is much higher than on the left-hand side. The CFR appears to have risen and then stabilised.

## South Africa

Introduction: We chose South Africa to represent Africa. COVID-19 in South Africa is currently under control.

The following code uses a function to calculate the CFR.

In [39]:
calculate_CFR('South Africa')

2.6808642279494297

South Africa's current CFR is 2.6809%

The following code draws South Africa's CFR trend chart.

In [40]:
plot('South Africa')

South Africa's CFR chart is similar to Russia's: in both, the CFR has risen steadily. The difference is that Russia's CFR shows a clear trend of stabilising after the rise, while South Africa's CFR is clearly still rising.

## Brazil

Introduction: We chose Brazil to represent South America. It is currently not clear whether COVID-19 in Brazil is under control.

The following code uses a function to calculate the CFR.

In [41]:
calculate_CFR('Brazil')

2.882026466786821

Brazil's current CFR is 2.8820%

The following code draws Brazil's CFR trend chart.

In [42]:
plot('Brazil')

Brazil's CFR chart is similar to those of the United States and India: all three fell to a stable level after a period of sharp rise. It is possible that, for various reasons, these three countries have ended up handling the virus in similar ways.

# Mean and Uncertainty of CFR

Calculating each country's CFR individually only shows the situation in that country. To obtain a more reliable CFR value for COVID-19, I calculated the mean and uncertainty of the six countries' CFRs combined.

In [43]:
def CFR(*countries):
  #Calculate the individual CFR (as a percentage) of each country from its largest cumulative totals.
  all_CFR = []
  for country in countries:
    #Extract all data entries for this country alone.
    covid_country=covid[covid["location"]==country]
    all_CFR.append(covid_country["total_deaths"].max()/covid_country["total_cases"].max()*100)
  #Calculate the mean of the individual CFRs.
  average = sum(all_CFR)/len(all_CFR)
  #Estimate the uncertainty as half the range (max - min) of the individual CFRs.
  uncertainty = abs(max(all_CFR)-min(all_CFR))/2

  #Round the uncertainty to one significant figure.
  def round_to_1(x):
    return round(x, -int(floor(log10(abs(x)))))
  u=round_to_1(uncertainty)
  return ("{}±{}".format(average,u))

The following code uses the function defined above to calculate the mean and uncertainty.

In [None]:
CFR('Australia','India','United States','Russia','South Africa','Brazil')

'2.4203168066922207±0.9'

From the above calculation, the mean CFR of the six countries is 2.4203% with an uncertainty of ±0.9 percentage points, which is 2.4% ± 0.9% when the mean is rounded to match the uncertainty.
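
The arithmetic behind this result can be reproduced without the data set, using the six CFR values printed earlier in this notebook. The sketch below is a standalone illustration: the mean is the sum divided by six, and the uncertainty is half the difference between the largest and smallest value, rounded to one significant figure. The names `cfr_by_country` and `round_to_1_sig_fig` are chosen here for illustration.

```python
from math import floor, log10

# Final CFR values (%) printed earlier in this notebook.
cfr_by_country = {
    "Australia": 3.2836145101730505,
    "India": 1.4862950954832361,
    "United States": 2.463810097859207,
    "Russia": 1.725290441901581,
    "South Africa": 2.6808642279494297,
    "Brazil": 2.882026466786821,
}

values = list(cfr_by_country.values())
mean_cfr = sum(values) / len(values)

# Half the range between the largest and smallest value.
half_range = (max(values) - min(values)) / 2

def round_to_1_sig_fig(x):
    """Round x to one significant figure."""
    return round(x, -int(floor(log10(abs(x)))))

print(f"{mean_cfr:.2f} ± {round_to_1_sig_fig(half_range)}")  # 2.42 ± 0.9
```

Half the range is a simple spread estimate for a handful of values; it depends only on the two extreme countries, so it is sensitive to which countries are included.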

# Put all the CFRs for all the countries in one chart.

The mean and uncertainty calculated above may be reliable, but so far there is little evidence to support them. Below, the CFRs of all six countries are therefore plotted in a single graph for comparison.

The following code defines a function that draws all the CFRs in a single diagram.

In [None]:
dict_colours={"Australia":"gold","Brazil":"forestGreen","India":"tomato","Russia":"crimson","South Africa":"mediumPurple","United States":"dodgerBlue"}
line_name = {"Australia":"Australia CFR","Brazil":"Brazil CFR","India":"India CFR","Russia":"Russia CFR","South Africa":"South Africa CFR","United States":"United States CFR"}
def plot_CFR(*countries):
  #Create a blank diagram.
  fig6 = go.Figure()
  for country in countries:
    #Extract all data entries for this country alone (copied so that adding a column does not touch covid).
    covid_country=covid[covid['location']==country].copy()
    #Calculate the cumulative CFR as a percentage.
    covid_country['total_CFR']=(covid_country["total_deaths"]/covid_country["total_cases"])*100
    #Add this country's CFR line with its own label and colour.
    fig6.add_trace(go.Scatter(x=covid_country['date'], y=covid_country['total_CFR'], name=line_name[country],
                         line=dict(color=dict_colours[country], width=2)))
  fig6.update_xaxes(title_text="Date",tickangle = 290)
  fig6.update_yaxes(title_text="Percentage(%)")
  fig6.update_layout(
      title={
          'text': "Cumulative CFR for each country",
          'y':0.95,
          'x':0.5,
          'xanchor': 'center',
          'yanchor': 'top'})
  return fig6.show()

The following code draws the CFR diagram.

In [None]:
plot_CFR('Australia','India','United States','Russia','South Africa','Brazil')

We can see from the above graph that, however the CFR changes on the left-hand side of the graph, it becomes stable by the right-hand side. On the right-hand side, the CFR lines of all six countries fall within, or very close to, a common range, which corresponds to the mean and uncertainty calculated above (2.4203168066922207 ± 0.9). This is consistent with the values calculated above and suggests that the CFR of countries other than these six may also fall in this interval.
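
One way to check this reading of the graph against the final values printed earlier is to test each country's CFR against the interval directly. The sketch below is a standalone illustration that reuses the six values and the mean and uncertainty reported in this notebook; the variable names are chosen here for illustration.

```python
# Final CFR values (%) and the mean and uncertainty reported in this notebook.
cfr_by_country = {
    "Australia": 3.2836145101730505,
    "India": 1.4862950954832361,
    "United States": 2.463810097859207,
    "Russia": 1.725290441901581,
    "South Africa": 2.6808642279494297,
    "Brazil": 2.882026466786821,
}
mean_cfr = 2.4203168066922207
uncertainty = 0.9

lower, upper = mean_cfr - uncertainty, mean_cfr + uncertainty

# Record, for each country, whether its final CFR lies inside the interval.
inside = {country: lower <= cfr <= upper for country, cfr in cfr_by_country.items()}

for country, cfr in sorted(cfr_by_country.items()):
    status = "inside" if inside[country] else "outside"
    print(f"{country}: {cfr:.2f}% is {status} [{lower:.2f}, {upper:.2f}]")
```

Because the interval is centred on the mean rather than on the midpoint between the largest and smallest values, a country at either extreme is not guaranteed to fall inside it.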