# Analysing the COVID-19 pandemic in Bosnia and Herzegovina

The analysis will be performed on a dataset gathered from the <a href="https://www.who.int/">WHO</a> website. The first part of the analysis is data cleaning, which is the most important part of any data analysis. As the saying goes: garbage in, garbage out.

The next part contains visualizations that give a clearer picture of the situation, so that we can apply statistical methods later.

Once data cleaning and visualization are finished, we will move on to data modeling, and then use the model to predict whether the situation is likely to improve in the future.

When all of this is done, we will draw conclusions and suggest what could be done in the future to improve the situation.

## Table of Contents

* [Importing the necessary libraries](#importing-the-necessary-libraries)
* [Data import and exploration](#data-import-and-exploration)
* [Data preprocessing](#data-preprocessing)

## Importing the necessary libraries

```python
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt

sns.set()  # use seaborn's default plot styling
```

## Data import and exploration

```python
rawData = pd.read_excel(os.path.join("../dataSet/rawData/", "mbih.xlsx"), engine='openpyxl')
```

```python
rawData.head()
```

|   | Unnamed: 0 | date | total_cases | new_cases | population | population_density | median_age | aged_65_older |
|---|---|---|---|---|---|---|---|---|
| 0 | 6456 | 2020-03-05 | 2 | 2 | 3280815 | 68.496 | 42.5 | 16.569 |
| 1 | 6457 | 2020-03-06 | 2 | 0 | 3280815 | 68.496 | 42.5 | 16.569 |
| 2 | 6458 | 2020-03-07 | 3 | 1 | 3280815 | 68.496 | 42.5 | 16.569 |
| 3 | 6459 | 2020-03-08 | 3 | 0 | 3280815 | 68.496 | 42.5 | 16.569 |
| 4 | 6460 | 2020-03-09 | 3 | 0 | 3280815 | 68.496 | 42.5 | 16.569 |


```python
rawData.info()  # check the data type and the number of non-null values of each column
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 304 entries, 0 to 303
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Unnamed: 0          304 non-null    int64  
 1   date                304 non-null    object 
 2   total_cases         304 non-null    int64  
 3   new_cases           304 non-null    int64  
 4   population          304 non-null    int64  
 5   population_density  304 non-null    float64
 6   median_age          304 non-null    float64
 7   aged_65_older       304 non-null    float64
dtypes: float64(3), int64(4), object(1)
memory usage: 19.1+ KB
```


```python
rawData.describe()  # quick overview of summary statistics (count, mean, std, quartiles) for each numeric column
```

|   | Unnamed: 0 | total_cases | new_cases | population | population_density | median_age | aged_65_older |
|---|---|---|---|---|---|---|---|
| count | 304.0 | 304.0 | 304.0 | 304.0 | 304.0 | 304.0 | 304.0 |
| mean | 6607.5 | 27216.855263 | 368.891447 | 3280815.0 | 68.496 | 42.5 | 16.569 |
| std | 87.90146 | 33921.539614 | 467.923894 | 0.0 | 0.0 | 0.0 | 3.558571e-15 |
| min | 6456.0 | 2.0 | 0.0 | 3280815.0 | 68.496 | 42.5 | 16.569 |
| 25% | 6531.75 | 2333.75 | 26.75 | 3280815.0 | 68.496 | 42.5 | 16.569 |
| 50% | 6607.5 | 12659.0 | 218.5 | 3280815.0 | 68.496 | 42.5 | 16.569 |
| 75% | 6683.25 | 34249.25 | 460.25 | 3280815.0 | 68.496 | 42.5 | 16.569 |
| max | 6759.0 | 112143.0 | 1953.0 | 3280815.0 | 68.496 | 42.5 | 16.569 |
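
The summary statistics show that `population`, `population_density`, `median_age` and `aged_65_older` hold the same value in every row: their std is 0.0, or 3.558571e-15 for `aged_65_older`, which is floating-point rounding noise rather than real variation. Columns like these carry no day-to-day information. Below is a minimal sketch of how they could be flagged, using a hypothetical helper `constant_columns` on a toy frame built from values in the tables above. It counts distinct values with `nunique()` instead of testing `std == 0`, because the rounding noise would let `aged_65_older` slip through an exact-zero test.

```python
import pandas as pd


def constant_columns(df: pd.DataFrame) -> list:
    """Hypothetical helper: names of columns that hold a single distinct value.

    nunique() ignores NaN by default, so a column that is constant apart
    from missing values is reported as well.
    """
    counts = df.nunique()
    return counts[counts == 1].index.tolist()


# Toy frame with values taken from the head() and describe() output above
toy = pd.DataFrame({
    "total_cases": [2, 2, 3],
    "population": [3280815, 3280815, 3280815],
    "population_density": [68.496, 68.496, 68.496],
    "median_age": [42.5, 42.5, 42.5],
})
print(constant_columns(toy))  # ['population', 'population_density', 'median_age']
```

Whether such columns are dropped or kept as country-level metadata is a modeling decision; the check only makes them visible.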


## Data preprocessing

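```python
# 'Unnamed: 0' had an empty header in the Excel file and only holds a running row number (6456-6759)
rawData.drop(columns="Unnamed: 0", inplace=True)
# Parse the date strings; astype('datetime64') without a unit raises a TypeError in pandas >= 2.0
rawData['date'] = pd.to_datetime(rawData['date'])
rawData.head()
```

|   | date | total_cases | new_cases | population | population_density | median_age | aged_65_older |
|---|---|---|---|---|---|---|---|
| 0 | 2020-03-05 | 2 | 2 | 3280815 | 68.496 | 42.5 | 16.569 |
| 1 | 2020-03-06 | 2 | 0 | 3280815 | 68.496 | 42.5 | 16.569 |
| 2 | 2020-03-07 | 3 | 1 | 3280815 | 68.496 | 42.5 | 16.569 |
| 3 | 2020-03-08 | 3 | 0 | 3280815 | 68.496 | 42.5 | 16.569 |
| 4 | 2020-03-09 | 3 | 0 | 3280815 | 68.496 | 42.5 | 16.569 |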


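```python
pd.options.display.max_rows = 1000
# Daily figures for Bosnia and Herzegovina, with Bosnian column names:
#   Datum = date, Potvrđeni slučajevi = confirmed cases, Broj testiranih = number tested,
#   Broj smrtnih slučajeva = number of deaths, Broj oporavljenih osoba = number of recovered persons,
#   Broj aktivnih slučajeva = number of active cases.
# All counts except the active cases are cumulative; rows are ordered newest first.
bihdata = pd.read_excel(os.path.join("../dataSet/rawData/", "bih.xlsx"), engine='openpyxl')
bihdata.drop(columns="Unnamed: 0", inplace=True)
bihdata
```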

|   | Datum | Potvrđeni slučajevi | Broj testiranih | Broj smrtnih slučajeva | Broj oporavljenih osoba | Broj aktivnih slučajeva |
|---|---|---|---|---|---|---|
| 0 | 30.12.2020 | 110985 | 511940 | 4050 | 77225 | 29710.0 |
| 1 | 29.12.2020 | 110454 | 509067 | 4024 | 76802 | 29628.0 |
| 2 | 28.12.2020 | 109911 | 505681 | 3976 | 76121 | 29814.0 |
| 3 | 27.12.2020 | 109691 | 503906 | 3953 | 75717 | 30021.0 |
| 4 | 26.12.2020 | 109330 | 502063 | 3923 | 75124 | 30283.0 |
| 5 | 25.12.2020 | 108891 | 499883 | 3901 | 74667 | 30323.0 |
| 6 | 24.12.2020 | 108298 | 496925 | 3878 | 73896 | 30524.0 |
| 7 | 23.12.2020 | 107570 | 493664 | 3838 | 73149 | 30583.0 |
| 8 | 22.12.2020 | 106896 | 490610 | 3792 | 72597 | 30507.0 |
| 9 | 21.12.2020 | 106222 | 487271 | 3706 | 71548 | 30968.0 |
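
In the rows shown, the active-case column equals confirmed cases minus deaths minus recovered persons; for 30.12.2020, 110985 - 4050 - 77225 = 29710. That identity gives a cheap consistency check on the file. The sketch below runs it on the first three rows copied from the output above; applying it to the full `bihdata` is an assumption about how one might use it, not something done in this notebook.

```python
import pandas as pd

# First three rows of bihdata, copied from the output above (cumulative totals, newest first)
sample = pd.DataFrame({
    "Datum": ["30.12.2020", "29.12.2020", "28.12.2020"],
    "Potvrđeni slučajevi": [110985, 110454, 109911],
    "Broj smrtnih slučajeva": [4050, 4024, 3976],
    "Broj oporavljenih osoba": [77225, 76802, 76121],
    "Broj aktivnih slučajeva": [29710.0, 29628.0, 29814.0],
})

# Active cases should equal confirmed - deaths - recovered on every day
expected_active = (
    sample["Potvrđeni slučajevi"]
    - sample["Broj smrtnih slučajeva"]
    - sample["Broj oporavljenih osoba"]
)
mismatches = sample[expected_active != sample["Broj aktivnih slučajeva"]]
print(len(mismatches))  # 0
```

Rows where the active count is missing would also be listed as mismatches, since NaN never compares equal to a number.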


```python
# Tests per day = today's cumulative test count minus the previous day's.
# bihdata is ordered newest first, so diff(-1) subtracts the next row (the previous day).
tested = (
    pd.DataFrame({
        "Datum": bihdata["Datum"].astype(str),
        "Broj testiranih dnevno": bihdata["Broj testiranih"].diff(-1),
    })
    .iloc[:-1]  # the oldest day has no earlier row to subtract, so diff(-1) is NaN there
    .astype({"Broj testiranih dnevno": "int64"})
)
```

In [9]:
arrayNegative = pd.DataFrame(columns = ["Datum", "Broj oporavljenih osoba"])
for index in range(0, len(bihdata["Broj oporavljenih osoba"])):    
    if index == len(bihdata["Broj testiranih"]) - 2:
        
        i, j = index, len(bihdata["Broj testiranih"]) - 1

        arrayNegative = arrayNegative.append(
            {"Datum": str(bihdata.iloc[index, 0]), "Broj oporavljenih osoba": bihdata.iloc[i, 4] - bihdata.iloc[j, 4]},
            ignore_index = True)
        
        break
    else:
        i, j = index, index + 1
        
        arrayNegative = arrayNegative.append(
            {"Datum": str(bihdata.iloc[index, 0]), "Broj oporavljenih osoba": bihdata.iloc[i, 4] - bihdata.iloc[j, 4]},
            ignore_index = True)        

```python
# Deaths per day = today's cumulative death count minus the previous day's (rows are newest first)
died = (
    pd.DataFrame({
        "Datum": bihdata["Datum"].astype(str),
        "Broj smrtnih slučajeva dnevno": bihdata["Broj smrtnih slučajeva"].diff(-1),
    })
    .iloc[:-1]  # the oldest day has no earlier row to subtract
    .astype({"Broj smrtnih slučajeva dnevno": "int64"})
)
```
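
The three cells above turn cumulative totals into daily values by subtracting the previous day's total, and they rely on `bihdata` being ordered newest first. A minimal sketch of a hypothetical helper, `cumulative_to_daily`, that handles several columns in one call and sorts by the parsed date first, so it would keep working if the file order changed. The `format="%d.%m.%Y"` default is an assumption based on the `Datum` values shown above. The sample rows are copied from `bihdata`; the daily deaths it produces (48 and 26) match the `died` table.

```python
import pandas as pd


def cumulative_to_daily(df, date_col, cumulative_cols, date_format="%d.%m.%Y"):
    """Hypothetical helper: convert cumulative counts into per-day increments.

    Rows are sorted oldest first by the parsed date, so the input order does
    not matter. The oldest day is dropped because it has no predecessor.
    """
    out = df[[date_col] + cumulative_cols].copy()
    out["_parsed"] = pd.to_datetime(out[date_col], format=date_format)
    out = out.sort_values("_parsed")
    # diff() subtracts the previous (older) row; the first row becomes NaN
    daily = out[cumulative_cols].diff()
    daily.insert(0, date_col, out[date_col])
    return daily.iloc[1:].reset_index(drop=True)


# Rows copied from bihdata above (newest first, as in the file)
sample = pd.DataFrame({
    "Datum": ["30.12.2020", "29.12.2020", "28.12.2020"],
    "Broj testiranih": [511940, 509067, 505681],
    "Broj smrtnih slučajeva": [4050, 4024, 3976],
})
print(cumulative_to_daily(sample, "Datum", ["Broj testiranih", "Broj smrtnih slučajeva"]))
```

The result is ordered oldest first and holds floats, because `diff()` introduces a NaN before the first row is dropped.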

```python
pd.options.display.max_rows = 1000
died
```

|   | Datum | Broj smrtnih slučajeva dnevno |
|---|---|---|
| 0 | 30.12.2020 | 26 |
| 1 | 29.12.2020 | 48 |
| 2 | 28.12.2020 | 23 |
| 3 | 27.12.2020 | 30 |
| 4 | 26.12.2020 | 22 |
| 5 | 25.12.2020 | 23 |
| 6 | 24.12.2020 | 40 |
| 7 | 23.12.2020 | 46 |
| 8 | 22.12.2020 | 86 |
| 9 | 21.12.2020 | 53 |


```python
# Format the dates as dd.mm.yyyy strings so they match bihdata's Datum values when merging
rawData['date'] = rawData['date'].dt.strftime('%d.%m.%Y')
```

```python
pd.options.display.max_rows = 1000
# Left-join the daily recovered counts onto rawData by date; dates without a match get NaN
fullDataFrame = pd.merge(
    left=rawData, right=arrayNegative[['Broj oporavljenih osoba', 'Datum']],
    how='left', left_on='date', right_on='Datum',
).drop('Datum', axis=1)
```

```python
# Add the daily number of tests in the same way
fullDataFrame = pd.merge(
    left=fullDataFrame, right=tested[['Datum', 'Broj testiranih dnevno']],
    how='left', left_on='date', right_on='Datum',
).drop('Datum', axis=1)
```

```python
# Add the daily number of deaths in the same way
fullDataFrame = pd.merge(
    left=fullDataFrame, right=died[['Datum', 'Broj smrtnih slučajeva dnevno']],
    how='left', left_on='date', right_on='Datum',
).drop('Datum', axis=1)
```

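```python
fullDataFrame
```

|   | date | total_cases | new_cases | population | population_density | median_age | aged_65_older | Broj oporavljenih osoba | Broj testiranih dnevno | Broj smrtnih slučajeva dnevno |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 05.03.2020 | 2 | 2 | 3280815 | 68.496 | 42.5 | 16.569 | NaN | NaN | NaN |
| 1 | 06.03.2020 | 2 | 0 | 3280815 | 68.496 | 42.5 | 16.569 | NaN | NaN | NaN |
| 2 | 07.03.2020 | 3 | 1 | 3280815 | 68.496 | 42.5 | 16.569 | NaN | NaN | NaN |
| 3 | 08.03.2020 | 3 | 0 | 3280815 | 68.496 | 42.5 | 16.569 | NaN | NaN | NaN |
| 4 | 09.03.2020 | 3 | 0 | 3280815 | 68.496 | 42.5 | 16.569 | NaN | NaN | NaN |
| 5 | 10.03.2020 | 5 | 2 | 3280815 | 68.496 | 42.5 | 16.569 | NaN | NaN | NaN |
| 6 | 11.03.2020 | 7 | 2 | 3280815 | 68.496 | 42.5 | 16.569 | NaN | NaN | NaN |
| 7 | 12.03.2020 | 11 | 4 | 3280815 | 68.496 | 42.5 | 16.569 | NaN | NaN | NaN |
| 8 | 13.03.2020 | 13 | 2 | 3280815 | 68.496 | 42.5 | 16.569 | NaN | NaN | NaN |
| 9 | 14.03.2020 | 18 | 5 | 3280815 | 68.496 | 42.5 | 16.569 | NaN | NaN | NaN |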
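
The first rows of `fullDataFrame` are empty (NaN) in the three merged columns because those March 2020 dates have no matching `Datum` in the daily tables built from `bihdata`. A left merge leaves such rows empty without any warning. pandas' `merge` accepts `indicator=True`, which adds a `_merge` column labelling each row `"both"` or `"left_only"`, and `validate="one_to_one"`, which raises `pandas.errors.MergeError` if either side contains duplicate keys. The sketch below shows both on toy frames (the dates and values are illustrative, the death counts are taken from the `died` table), and parses `Datum` with an explicit `format="%d.%m.%Y"` so both keys are real datetimes rather than strings; that parsing choice is an alternative, not what the notebook does.

```python
import pandas as pd

# Toy stand-ins for rawData and one daily table (illustrative rows only)
left = pd.DataFrame({"date": pd.to_datetime(["2020-03-05", "2020-12-29", "2020-12-30"])})
daily = pd.DataFrame({
    "Datum": ["29.12.2020", "30.12.2020"],
    "Broj smrtnih slučajeva dnevno": [48, 26],
})
# Parse the dd.mm.yyyy strings explicitly so day and month cannot be swapped
daily["Datum"] = pd.to_datetime(daily["Datum"], format="%d.%m.%Y")

merged = left.merge(
    daily, how="left", left_on="date", right_on="Datum",
    validate="one_to_one",  # raises pandas.errors.MergeError on duplicate dates
    indicator=True,         # adds a categorical `_merge` column
).drop(columns="Datum")

print(merged["_merge"].value_counts())
unmatched = merged.loc[merged["_merge"] == "left_only", "date"]
print(unmatched.dt.strftime("%d.%m.%Y").tolist())  # ['05.03.2020']
```

Counting the `left_only` rows makes it explicit how many days of the WHO data lack the daily figures, instead of discovering the NaN values later during modeling.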
