![](https://data2x.org/wp-content/uploads/2020/06/ODWBlog3.png) Sex-disaggregated data available from GH5050 as of 11 June 2020. Source: https://data2x.org/tracking-gender-data-on-covid-19-part-2/

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
import plotly.graph_objects as go
import plotly.offline as py


# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
nRowsRead = 1000 # set to None to read the whole file
df = pd.read_csv('../input/cusersmarildownloadsdisaggregatedcsv/disaggregated.csv', delimiter=';', encoding = "ISO-8859-1", nrows = nRowsRead)
df.dataframeName = 'disaggregated.csv'
nRow, nCol = df.shape
print(f'There are {nRow} rows and {nCol} columns')
df.head()

To arrive at a more nuanced understanding of how countries report on COVID-19 cases and deaths, the global shares were analyzed by World Bank FY2020 income group. Only 5 low-income countries report cases and deaths in GH5050, and their capacity to report these indicators by sex remains low. Lower-middle-income countries, on the other hand, report a larger share of their cases by sex than any other income group. This is largely because two countries with some of the largest caseloads in the world do not report their cases by sex. If Brazil's nearly 780,000 cases were sex-disaggregated, the share of cases reported by sex for upper-middle-income countries would rise to 68 percent, and if the United States' 2 million cases were sex-disaggregated, the share for high-income countries would more than double to 92 percent. Reporting on deaths by sex, meanwhile, shows the most straightforward correlation between statistical capacity and income: high-income countries report practically all their deaths by sex, and this share declines with each step down the income-group ladder.
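
The shares above are ratios of sex-disaggregated counts to total counts within each income group, and the Brazil and United States scenarios are counterfactuals that move one country's cases into the numerator. A minimal pandas sketch of both calculations follows; the column names (`income_group`, `cases_total`, `cases_by_sex`) and every number in the toy table are hypothetical placeholders, not GH5050 figures.

```python
import pandas as pd

# Hypothetical toy table: one row per country (NOT real GH5050 data).
countries = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "income_group": ["High", "High", "Upper-middle", "Upper-middle"],
    "cases_total": [1000, 3000, 500, 1500],
    # Cases for which a sex breakdown is reported (0 = no breakdown).
    "cases_by_sex": [1000, 0, 500, 0],
})


def share_by_sex(frame: pd.DataFrame) -> pd.Series:
    """Share of cases reported by sex, per income group."""
    totals = frame.groupby("income_group")[["cases_by_sex", "cases_total"]].sum()
    return totals["cases_by_sex"] / totals["cases_total"]


def share_if_disaggregated(frame: pd.DataFrame, country: str) -> pd.Series:
    """Counterfactual: shares as if `country` reported all of its cases by sex."""
    adjusted = frame.copy()  # leave the input frame untouched
    mask = adjusted["country"] == country
    adjusted.loc[mask, "cases_by_sex"] = adjusted.loc[mask, "cases_total"]
    return share_by_sex(adjusted)


print(share_by_sex(countries))
print(share_if_disaggregated(countries, "B"))
```

In this toy table the high-income share jumps from 0.25 to 1.0 once country B is counted, which mirrors the mechanism (not the numbers) behind the United States scenario.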


![](https://data2x.org/wp-content/uploads/2020/06/ODWBlog5.png) Source: https://data2x.org/tracking-gender-data-on-covid-19-part-2/

In [None]:
import missingno as msno

p=msno.bar(df)
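
`msno.bar` draws the number of non-missing values in each column. The same figures can be checked in plain pandas when you want them as a table; the toy frame below is made up for illustration and only borrows two column names from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame with gaps (illustrative values only).
toy = pd.DataFrame({
    "Cases percent male": [48.0, np.nan, 51.5, 50.2],
    "Deaths percent male": [np.nan, np.nan, 60.1, 58.3],
})

# Non-missing count per column (what msno.bar plots) and the share of missing values.
summary = pd.DataFrame({
    "non_missing": toy.notna().sum(),
    "missing_share": toy.isna().mean(),
})
print(summary.sort_values("missing_share", ascending=False))
```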

In [None]:
#Code from Tanay Mehta https://www.kaggle.com/heyytanay/super-eda-all-models-0-80-val-acc/notebook

from colorama import Fore, Style

def count(string: str, color=Fore.RED):
    """
    Print `string` in the given colorama color, then reset the terminal style.
    """
    print(color+string+Style.RESET_ALL)

In [None]:
def statistics(dataframe, column):
    # Print summary statistics of one numeric column, each line in a different color.
    count(f"The average value in {column} is: {dataframe[column].mean():.2f}", Fore.RED)
    count(f"The maximum value in {column} is: {dataframe[column].max()}", Fore.BLUE)
    count(f"The minimum value in {column} is: {dataframe[column].min()}", Fore.YELLOW)
    count(f"The 25th percentile of {column} is: {dataframe[column].quantile(0.25)}", Fore.GREEN)
    count(f"The 50th percentile (median) of {column} is: {dataframe[column].quantile(0.50)}", Fore.CYAN)
    count(f"The 75th percentile of {column} is: {dataframe[column].quantile(0.75)}", Fore.MAGENTA)
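
`statistics` prints the mean, the extremes and the quartiles one line at a time. pandas' `Series.describe` returns the same figures, plus the count and standard deviation, in a single call; the values below are placeholders rather than GH5050 data.

```python
import pandas as pd

values = pd.Series([120, 450, 980, 1500, 3200], name="toy cases")  # placeholder data

# describe() reports count, mean, std, min, 25%, 50%, 75% and max by default.
summary = values.describe()
print(summary)
print(f"Average: {summary['mean']:.2f}, median: {summary['50%']}")
```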

In [None]:
# Print statistics for the "Cases where sex_disaggregated data is available" column
statistics(df, 'Cases where sex_disaggregated data is available')

In [None]:
# Plot the distribution of the "Cases where sex_disaggregated data is available" column.
# sns.histplot replaces the deprecated sns.distplot; kde=True keeps the smoothed curve.
col = 'Cases where sex_disaggregated data is available'
plt.style.use("classic")
sns.histplot(df[col], color='blue', kde=True)
plt.title(f"{col} [\u03BC : {df[col].mean():.2f} | \u03C3 : {df[col].std():.2f}]")
plt.xlabel(col)
plt.ylabel("Count")
plt.show()

In [None]:
# Print statistics for the "Deaths where sex_disaggregated data is available" column
statistics(df, 'Deaths where sex_disaggregated data is available')

In [None]:
# Plot the distribution of the "Deaths where sex_disaggregated data is available" column.
# sns.histplot replaces the deprecated sns.distplot; kde=True keeps the smoothed curve.
col = 'Deaths where sex_disaggregated data is available'
plt.style.use("classic")
sns.histplot(df[col], color='red', kde=True)
plt.title(f"{col} [\u03BC : {df[col].mean():.2f} | \u03C3 : {df[col].std():.2f}]")
plt.xlabel(col)
plt.ylabel("Count")
plt.show()

In [None]:
# Plot the distribution of the "Proportion of deaths in confirmed cases male_female ratio" column.
# sns.histplot replaces the deprecated sns.distplot; kde=True keeps the smoothed curve.
col = 'Proportion of deaths in confirmed cases male_female ratio'
plt.style.use("classic")
sns.histplot(df[col], color='green', kde=True)
plt.title(f"{col} [\u03BC : {df[col].mean():.2f} | \u03C3 : {df[col].std():.2f}]")
plt.xlabel(col)
plt.ylabel("Count")
plt.show()

In [None]:
plt.style.use("ggplot")
plt.figure(figsize=(18, 9))
# Recent seaborn versions require x and y as keyword arguments
sns.boxplot(x='Proportion of deaths in confirmed cases male_female ratio', y='Deaths where sex_disaggregated data is available', data=df)
plt.title("Deaths where sex_disaggregated data is available & Proportion of deaths in confirmed cases male_female ratio")
plt.xlabel("Proportion of deaths in confirmed cases male_female ratio")
plt.ylabel("Deaths where sex_disaggregated data is available")
plt.show()

In [None]:
# Count Plot
plt.style.use("classic")
plt.figure(figsize=(10, 8))
sns.countplot(x='Case_death_data_by_sex', data=df, palette='Accent_r')
plt.xlabel("Case_death_data_by_sex")
plt.ylabel("Count")
plt.title("Case_death_data_by_sex")
plt.xticks(rotation=45, fontsize=8)
plt.show()

In [None]:
# Side-by-side count plots of the male and female case percentages (1 row x 2 columns)
plt.figure(figsize=(30,10))
plt.subplot(121)
sns.countplot(x='Cases percent male', data=df, palette="gist_stern", edgecolor="black")
plt.xticks(rotation=45)
plt.subplot(122)
sns.countplot(x='Cases percent female', data=df, palette="gnuplot", edgecolor="black")
plt.xticks(rotation=45)
plt.show()
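
Because `Cases percent male` is a continuous percentage, `countplot` draws one bar per distinct value, which quickly gets crowded. A sketch of an alternative, assuming 10-point bands are a reasonable resolution, is to bin the values with `pd.cut` before counting; the numbers below are invented.

```python
import pandas as pd

pct_male = pd.Series([47.9, 49.2, 50.0, 51.3, 52.8, 55.1, 61.4])  # invented values

# Right-closed 10-point bands: (40, 50], (50, 60], (60, 70].
bands = pd.cut(pct_male, bins=[40, 50, 60, 70])
counts = bands.value_counts().sort_index()  # keep bands in ascending order
print(counts)
```

Note that 50.0 falls in the (40, 50] band because the intervals are closed on the right by default; pass `right=False` to `pd.cut` if left-closed bands suit the analysis better.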

In [None]:
# Side-by-side count plots of the male and female death percentages (1 row x 2 columns)
plt.figure(figsize=(30,10))
plt.subplot(121)
sns.countplot(x='Deaths percent male', data=df, palette="Set2", edgecolor="black")
plt.xticks(rotation=45)
plt.subplot(122)
sns.countplot(x='Deaths percent female', data=df, palette="Set3", edgecolor="black")
plt.xticks(rotation=45)
plt.show()

In [None]:
# Side-by-side count plots of the male and female proportions of deaths in confirmed cases
plt.figure(figsize=(30,10))
plt.subplot(121)
sns.countplot(x='Proportion of deaths in confirmed cases_male', data=df, palette="PuRd", edgecolor="black")
plt.xticks(rotation=45)
plt.subplot(122)
sns.countplot(x='Proportion of deaths in confirmed cases_female', data=df, palette="viridis", edgecolor="black")
plt.xticks(rotation=45)
plt.show()

In [None]:
fig = px.line(df, x="Deaths date", y="Deaths percent male", color_discrete_sequence=['purple'], 
              title="Deaths % male")
fig.show()

In [None]:
fig = px.line(df, x="Deaths date", y="Deaths percent female", color_discrete_sequence=['crimson'], 
              title="Deaths % female")
fig.show()
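
Before reading the two line charts side by side, it can be worth confirming that the male and female percentages are complementary. The check below is a sketch with invented rows; the 0.5-point tolerance is an arbitrary choice, and real rows may legitimately fall short of 100 if some deaths are recorded without a sex.

```python
import pandas as pd

toy = pd.DataFrame({  # invented rows, not GH5050 data
    "Deaths date": ["01/06/2020", "02/06/2020", "03/06/2020"],
    "Deaths percent male": [58.0, 61.2, 55.0],
    "Deaths percent female": [42.0, 38.8, 40.0],
})

TOLERANCE = 0.5  # percentage points; arbitrary choice
total = toy["Deaths percent male"] + toy["Deaths percent female"]

# Rows whose male + female percentages do not add up to roughly 100.
mismatch = toy[(total - 100).abs() > TOLERANCE]
print(mismatch)
```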

In [None]:
sns.set_theme(style='ticks')
scatter_df = df[["Cases where sex_disaggregated data is available", "Deaths where sex_disaggregated data is available", "Proportion of deaths in confirmed cases male_female ratio"]]
# pairplot creates its own figure, so no separate plt.figure() call is needed
sns.pairplot(scatter_df)
plt.show()

The pairplot is hard to read because the long column names are used as axis titles.
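
One way to tame the long titles is to rename the columns to short aliases just for plotting. The aliases below (`cases_sex`, `deaths_sex`, `mf_ratio`) are hypothetical choices, and the toy frame stands in for `df`.

```python
import pandas as pd

# Hypothetical short aliases for the long GH5050 column names.
SHORT_NAMES = {
    "Cases where sex_disaggregated data is available": "cases_sex",
    "Deaths where sex_disaggregated data is available": "deaths_sex",
    "Proportion of deaths in confirmed cases male_female ratio": "mf_ratio",
}

toy = pd.DataFrame({name: [1.0, 2.0] for name in SHORT_NAMES})  # stand-in for df

# rename() returns a new frame, so the original column names stay untouched.
plot_df = toy[list(SHORT_NAMES)].rename(columns=SHORT_NAMES)
print(plot_df.columns.tolist())
```

In the notebook itself this would become `sns.pairplot(df[list(SHORT_NAMES)].rename(columns=SHORT_NAMES))`, leaving `df` unchanged for the later cells.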

In [None]:
fig = px.bar(df, 
             x='Deaths date', y='Proportion of deaths in confirmed cases male_female ratio', color_discrete_sequence=['#2B3A67'],
             title='Proportion of deaths in confirmed cases male-female ratio', text='Deaths percent male')
fig.show()

In [None]:
fig = px.bar(df, 
             x='Deaths date', y='Proportion of deaths in confirmed cases_female', color_discrete_sequence=['crimson'],
             title='Proportion of deaths in confirmed cases-female', text='Deaths percent female')
fig.show()

In [None]:
import matplotlib.ticker as mticker

def plot_MaleFemaleRatio(col, df, title):
    """Bar plot of `col` summed within each distinct male/female death ratio."""
    fig, ax = plt.subplots(figsize=(18,6))
    df.groupby(['Proportion of deaths in confirmed cases male_female ratio'])[col].sum().plot(rot=45, kind='bar', ax=ax, legend=True, cmap='bone')
    # Thousands separators plus two decimals, so small ratio values are not truncated to integers
    ax.yaxis.set_major_formatter(mticker.StrMethodFormatter('{x:,.2f}'))
    ax.set(title=title, xlabel='Proportion of deaths in confirmed cases male_female ratio')
    return ax

In [None]:
plot_MaleFemaleRatio('Proportion of deaths in confirmed cases male_female ratio', df, 'Deaths Male-Female Ratio');

In [None]:
# Select several columns with a list (double brackets); tuple-style selection no longer works in pandas 2.x
ax = df.groupby('Proportion of deaths in confirmed cases male_female ratio')[['Proportion of deaths in confirmed cases male_female ratio', 'Deaths where sex_disaggregated data is available']].sum().plot(kind='bar', rot=45, figsize=(12,6), logy=True,
                                                                 title='Proportion of deaths in confirmed cases male_female ratio')
plt.xlabel('Proportion of deaths in confirmed cases male_female ratio')
plt.ylabel('Count (log scale)')

plt.show()

In [None]:
# Aggregate the sex-disaggregated columns by death date.
# Selecting the numeric columns before .sum() avoids a TypeError on text columns in pandas 2.x.
disaggregated = df.groupby('Deaths date')[['Deaths where sex_disaggregated data is available', 'Cases where sex_disaggregated data is available', 'Proportion of deaths in confirmed cases male_female ratio']].sum()
disaggregated.head()
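
`groupby('Deaths date')` sorts its keys, and when the dates are stored as text they sort alphabetically, so `10/06/2020` can land before `2/06/2020`. A sketch that parses the column first, assuming day-first dates (check the actual format in the CSV before relying on `dayfirst=True`):

```python
import pandas as pd

toy = pd.DataFrame({  # invented rows, not GH5050 data
    "Deaths date": ["2/06/2020", "10/06/2020", "2/06/2020"],
    "Deaths where sex_disaggregated data is available": [5, 7, 3],
})

# Assumption: day-first dates, so 10/06/2020 means 10 June 2020.
toy["Deaths date"] = pd.to_datetime(toy["Deaths date"], dayfirst=True)

# Grouping on real datetimes sorts chronologically; select numeric columns before summing.
by_date = toy.groupby("Deaths date")[["Deaths where sex_disaggregated data is available"]].sum()
print(by_date)
```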

In [None]:
plt.figure(figsize=(20,7))
plt.plot(disaggregated['Deaths where sex_disaggregated data is available'], label='Deaths where sex_disaggregated data is available')
plt.plot(disaggregated['Cases where sex_disaggregated data is available'], label='Cases where sex_disaggregated data is available')
plt.plot(disaggregated['Proportion of deaths in confirmed cases male_female ratio'], label='Proportion of deaths in confirmed cases male_female ratio')
plt.legend()
#plt.grid()
plt.title('Proportion of deaths in confirmed cases male_female')
plt.xticks(disaggregated.index,rotation=45)
plt.xlabel('Deaths date')
plt.ylabel('Count')
plt.show()

In [None]:
# Plot only the male-to-female death ratio, summed per date
plt.figure(figsize=(20,7))
plt.plot(disaggregated['Proportion of deaths in confirmed cases male_female ratio'], label='Sex-Disaggregated Deaths Ratio')
plt.legend()
#plt.grid()
plt.title('Sex-Disaggregated Data: Male-to-Female Ratio of Deaths in Confirmed Cases')
plt.xticks(disaggregated.index, rotation=45)
plt.xlabel('Deaths date')
plt.ylabel('Ratio (summed per date)')
plt.show()

In [None]:
# Plot the change from one date to the next (first difference); diff() leaves the first row NaN, so drop it
diff_disaggregated = disaggregated.diff().iloc[1:]
plt.figure(figsize=(20,7))
plt.plot(diff_disaggregated['Proportion of deaths in confirmed cases male_female ratio'], label='Change in Sex-Disaggregated Deaths Ratio')
plt.legend()
plt.grid()
plt.title('Sex-Disaggregated Data: Date-to-Date Change in the Male-to-Female Deaths Ratio')
plt.xticks(diff_disaggregated.index, rotation=45)
plt.ylabel('Change from previous date')
plt.show()
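
`DataFrame.diff()` subtracts the previous row from each row, so the first row has nothing to compare against and becomes `NaN`; that is why the cell trims it with `.iloc[1:]`. A small illustration with made-up values:

```python
import pandas as pd

ratio = pd.Series([1.2, 1.5, 1.4, 1.9],
                  index=["01/06", "02/06", "03/06", "04/06"])  # made-up values

change = ratio.diff()   # x[t] - x[t-1]; the first entry is NaN
print(change)
print(change.iloc[1:])  # same trimming as diff_disaggregated
```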

In [None]:
print('Sex-Disaggregated Data Statistics')

diff_disaggregated.describe()

In [None]:
#Code by Olga Belitskaya https://www.kaggle.com/olgabelitskaya/sequential-data/comments
from IPython.display import display,HTML
c1,c2,f1,f2,fs1,fs2=\
'#a83a32','#a8324e','Akronim','Smokum',30,15
def dhtml(string,fontcolor=c1,font=f1,fontsize=fs1):
    display(HTML("""<style>
    @import 'https://fonts.googleapis.com/css?family="""\
    +font+"""&effect=3d-float';</style>
    <h1 class='font-effect-3d-float' style='font-family:"""+\
    font+"""; color:"""+fontcolor+"""; font-size:"""+\
    str(fontsize)+"""px;'>%s</h1>"""%string))
    
    
dhtml('Kaggle Notebook Runner: Marília Prata, not a DS. Shh! @mpwolke' ) 