## Tratamento de dados

Para podermos obter resultados fiáveis, é preciso haver um tratamento dos dados que vem do dataset.
Pegando dos dados processados anteriormente feito inicialmente, começamos por extrai-los do ficheiro para podermos manipulá-los.

In [1]:
import pandas as pd

covid_data = pd.read_csv('covid_19_clean_complete.csv')
covid_data

Unnamed: 0,Province/State,Country/Region,Lat,Long,Date,Confirmed,Deaths,Recovered
0,,Afghanistan,33.000000,65.000000,1/22/20,0,0,0
1,,Albania,41.153300,20.168300,1/22/20,0,0,0
2,,Algeria,28.033900,1.659600,1/22/20,0,0,0
3,,Andorra,42.506300,1.521800,1/22/20,0,0,0
4,,Angola,-11.202700,17.873900,1/22/20,0,0,0
...,...,...,...,...,...,...,...,...
27451,,Western Sahara,24.215500,-12.885800,5/4/20,6,0,5
27452,,Sao Tome and Principe,0.186360,6.613081,5/4/20,23,3,4
27453,,Yemen,15.552727,48.516388,5/4/20,12,2,0
27454,,Comoros,-11.645500,43.333300,5/4/20,3,0,0


De seguida, após uma análise, verificou-se a presença de entrada respeitantes a navios que nalgum momento tiveram casos de Covid-19 e não estão portanto associados a nenhum país particular.
Além disso, consideramos que estes dados iriam criar ruído, pelo que optamos por ignorá-los e remover dos dados em análise.

In [2]:
covid_data = covid_data.drop(covid_data[covid_data['Province/State']=='Grand Princess'].index)
covid_data = covid_data.drop(covid_data[covid_data['Province/State']=='Diamond Princess'].index)
covid_data = covid_data.drop(covid_data[covid_data['Country/Region']=='Diamond Princess'].index)
covid_data = covid_data.drop(covid_data[covid_data['Country/Region']=='MS Zaandam'].index)
covid_data = covid_data.reset_index()
del covid_data['index']
covid_data

Unnamed: 0,Province/State,Country/Region,Lat,Long,Date,Confirmed,Deaths,Recovered
0,,Afghanistan,33.000000,65.000000,1/22/20,0,0,0
1,,Albania,41.153300,20.168300,1/22/20,0,0,0
2,,Algeria,28.033900,1.659600,1/22/20,0,0,0
3,,Andorra,42.506300,1.521800,1/22/20,0,0,0
4,,Angola,-11.202700,17.873900,1/22/20,0,0,0
...,...,...,...,...,...,...,...,...
27035,,Western Sahara,24.215500,-12.885800,5/4/20,6,0,5
27036,,Sao Tome and Principe,0.186360,6.613081,5/4/20,23,3,4
27037,,Yemen,15.552727,48.516388,5/4/20,12,2,0
27038,,Comoros,-11.645500,43.333300,5/4/20,3,0,0


A maioria das entradas da coluna **Province/State** tem valores nulos, pelo que procedemos a eliminá-los. Além disso, como ter uma entrada para uma região e país não é muito relevante, optamos por agregar as duas informações numa só coluna denominada de **Local**.

In [3]:
import numpy as np

covid_data['Province/State'] = covid_data.replace(np.nan, '', regex=True)
cols = ['Province/State', 'Country/Region']
covid_data['Local'] = covid_data[cols].apply(lambda row: ' / '.join(row.values.astype(str)) if row.values[0] != '' else ''.join(row.values.astype(str)), axis=1)
del covid_data['Province/State']
del covid_data['Country/Region']
covid_data

Unnamed: 0,Lat,Long,Date,Confirmed,Deaths,Recovered,Local
0,33.000000,65.000000,1/22/20,0,0,0,Afghanistan
1,41.153300,20.168300,1/22/20,0,0,0,Albania
2,28.033900,1.659600,1/22/20,0,0,0,Algeria
3,42.506300,1.521800,1/22/20,0,0,0,Andorra
4,-11.202700,17.873900,1/22/20,0,0,0,Angola
...,...,...,...,...,...,...,...
27035,24.215500,-12.885800,5/4/20,6,0,5,Western Sahara
27036,0.186360,6.613081,5/4/20,23,3,4,Sao Tome and Principe
27037,15.552727,48.516388,5/4/20,12,2,0,Yemen
27038,-11.645500,43.333300,5/4/20,3,0,0,Comoros


De seguida, vamos converter as datas em contagem de dias desde o início do dataset (22 de janeiro de 2020)

In [4]:
covid_data['Date'] = pd.to_datetime(covid_data['Date'],format='%m/%d/%y')
covid_data['Date'] -= pd.to_datetime("2020-01-22")
covid_data['Date'] /= np.timedelta64(1,'D')
covid_data = covid_data.rename(columns  = {'Date':'Days Passed'})
covid_data

Unnamed: 0,Lat,Long,Days Passed,Confirmed,Deaths,Recovered,Local
0,33.000000,65.000000,0.0,0,0,0,Afghanistan
1,41.153300,20.168300,0.0,0,0,0,Albania
2,28.033900,1.659600,0.0,0,0,0,Algeria
3,42.506300,1.521800,0.0,0,0,0,Andorra
4,-11.202700,17.873900,0.0,0,0,0,Angola
...,...,...,...,...,...,...,...
27035,24.215500,-12.885800,103.0,6,0,5,Western Sahara
27036,0.186360,6.613081,103.0,23,3,4,Sao Tome and Principe
27037,15.552727,48.516388,103.0,12,2,0,Yemen
27038,-11.645500,43.333300,103.0,3,0,0,Comoros


Por fim, adicionar as colunas da contagem do dia anterior. Este passo é um pouco mais longo tendo em conta o numero de linhas e a procura pelo valor anterior.

In [5]:
covid_data['Conf. Prev.'] = covid_data.apply(lambda row: 
                                                      covid_data[(covid_data['Local'] == row['Local']) & (covid_data['Days Passed'] == row['Days Passed']-1)]['Confirmed'].item()
                                                      if row['Days Passed'] > 0 else 0,axis=1)
covid_data['Deaths Prev.'] = covid_data.apply(lambda row: 
                                                      covid_data[(covid_data['Local'] == row['Local']) & (covid_data['Days Passed'] == row['Days Passed']-1)]['Deaths'].item()
                                                      if row['Days Passed'] > 0 else 0,axis=1)
covid_data['Recov. Prev.'] = covid_data.apply(lambda row: 
                                                      covid_data[(covid_data['Local'] == row['Local']) & (covid_data['Days Passed'] == row['Days Passed']-1)]['Recovered'].item()
                                                      if row['Days Passed'] > 0 else 0,axis=1)
covid_data

Unnamed: 0,Lat,Long,Days Passed,Confirmed,Deaths,Recovered,Local,Conf. Prev.,Deaths Prev.,Recov. Prev.
0,33.000000,65.000000,0.0,0,0,0,Afghanistan,0,0,0
1,41.153300,20.168300,0.0,0,0,0,Albania,0,0,0
2,28.033900,1.659600,0.0,0,0,0,Algeria,0,0,0
3,42.506300,1.521800,0.0,0,0,0,Andorra,0,0,0
4,-11.202700,17.873900,0.0,0,0,0,Angola,0,0,0
...,...,...,...,...,...,...,...,...,...,...
27035,24.215500,-12.885800,103.0,6,0,5,Western Sahara,6,0,5
27036,0.186360,6.613081,103.0,23,3,4,Sao Tome and Principe,16,1,4
27037,15.552727,48.516388,103.0,12,2,0,Yemen,10,2,0
27038,-11.645500,43.333300,103.0,3,0,0,Comoros,3,0,0


In [6]:
with pd.ExcelWriter('covid_19_distance.xlsx') as writer:
    covid_data.to_excel(writer)

## SVR

**Support Vector Regression** utiliza conceitos semelhantes aos aplicados no algoritmo de Support Vector Machine para aplicações em métodos de regressão.

Deste modo serão expostos alguns conceitos teóricos fundamentais para que se possa compreender este algoritmo:
-  **Hyper Plane** : Uma linha de separação que irá ajudar a prever os valores em causa.
-   **Boundary line** : Margens da hyperplane que separam os valores existentes.
-  **Support vectors** : Os pontos mais perto da "boundary line". A distância dos pontos é a minima.
    
Assim, neste algoritmo o objetivo é considerar os valores dentro da margem, ou seja aqueles com menor erro entre eles. Assim o objetivo é descobrir um hyper plain que mais se aproxime aos valores existentes, ou seja, que se tenha uma distância minima a um maior número destes pontos.

Desta forma a aplicação deste algoritmo vai ser feita auxiliando-nos da implementação presente na ferramenta scikit-learn. 

### Support Vector Regression

#### Parâmetros da pesquisa

Serão refer

   * **kernel** : rbf, sigmod foi implementado o default uma vez que a obtenção de resultados era muito superior às restantes opções
   * **epsilonfloat**: 0.1 e 0.2 - o valor associado ao valor da distância entre a "boundary line" e o "hyper plane"
   * **cache size**: 500 - permite diminuir o tempo de execução 
   * **C** : 1 uma vez que os dados com que trabalhamos não tem muito ruído

In [7]:
from sklearn.svm import SVR

In [8]:
covid_data

Unnamed: 0,Lat,Long,Days Passed,Confirmed,Deaths,Recovered,Local,Conf. Prev.,Deaths Prev.,Recov. Prev.
0,33.000000,65.000000,0.0,0,0,0,Afghanistan,0,0,0
1,41.153300,20.168300,0.0,0,0,0,Albania,0,0,0
2,28.033900,1.659600,0.0,0,0,0,Algeria,0,0,0
3,42.506300,1.521800,0.0,0,0,0,Andorra,0,0,0
4,-11.202700,17.873900,0.0,0,0,0,Angola,0,0,0
...,...,...,...,...,...,...,...,...,...,...
27035,24.215500,-12.885800,103.0,6,0,5,Western Sahara,6,0,5
27036,0.186360,6.613081,103.0,23,3,4,Sao Tome and Principe,16,1,4
27037,15.552727,48.516388,103.0,12,2,0,Yemen,10,2,0
27038,-11.645500,43.333300,103.0,3,0,0,Comoros,3,0,0


De forma semelhante ao algoritmo anterior procedemos à criação de sets para podermos treinar o nosso modelo, e por fim testá-lo.

In [9]:
from sklearn.model_selection import train_test_split


#criar set de treino e teste
train, test = train_test_split(covid_data, test_size=0.0096)
train

Unnamed: 0,Lat,Long,Days Passed,Confirmed,Deaths,Recovered,Local,Conf. Prev.,Deaths Prev.,Recov. Prev.
1816,0.186360,6.613081,6.0,0,0,0,Sao Tome and Principe,0,0,0
20196,8.538000,-80.782100,77.0,2249,59,16,Panama,2100,55,14
20562,53.709800,27.953400,79.0,1981,19,169,Belarus,1486,16,139
18737,40.143100,47.576900,72.0,443,5,32,Azerbaijan,400,5,26
6712,1.000000,32.000000,25.0,0,0,0,Uganda,0,0,0
...,...,...,...,...,...,...,...,...,...,...
13641,15.783500,-90.230800,52.0,1,0,0,Guatemala,0,0,0
26975,48.669000,19.699000,103.0,1413,25,643,Slovakia,1408,24,619
5173,-8.874217,125.727539,19.0,0,0,0,Timor-Leste,0,0,0
20861,30.975600,112.270700,80.0,67803,3219,64264,Hubei / China,67803,3216,64236


In [10]:
#colunas em que vamos basear as previsões
x_columns = ['Lat','Long','Days Passed', 'Conf. Prev.','Deaths Prev.','Recov. Prev.']
#colunas que queremos prever
y_columns = ['Confirmed','Deaths','Recovered']

O uso de apenas este algoritmo não possibilita a previsão para mais do que um output.
Assim, foi preciso auxiliarmo-nos num "wrapper" - MultiOutputRegressor - de forma a conseguirmos
contornar este problema. Esta classe irá criar uma instância do modelo para cada um dos outputs do problema.

Contudo como referido, é criado um modelo separado para cada output. Consequentemente, esta
solução não consegue garantir a dependência entre as várias varíáveis, ou seja, assume que
os outputs são totalmente independentes uns dos outros. 

No contexto deste problema os outputs tem relação entre si uma vez que o maior número de casos confirmados 
implicam um maior número de mortes e recuperados. Apesar disto, decidimos avançar na mesma com este algoritmo uma
vez que a dependência entre os vários outputs poderá não ser assim tão significativa para os resultados finais.

In [11]:
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

## RBF with epsilon=0.1

In [12]:
sc_X = StandardScaler()
sc_y = StandardScaler()
X = sc_X.fit_transform(train[x_columns])
Y = sc_y.fit_transform(train[y_columns])

In [16]:
print(sc_X.mean_)

[  21.55100794   23.64102449   51.49820762 3116.10220314  202.67961165
  818.09790889]


In [13]:
regressor_rbf = SVR(kernel='rbf', cache_size=500)
wrapper_rbf = MultiOutputRegressor(regressor_rbf)

In [14]:
wrapper_rbf.fit(X,Y)
predictions_rbf = wrapper_rbf.predict(test[x_columns])

In [15]:
predictions_rbf = pd.DataFrame(data=predictions_rbf,columns=['Confirmed Prediction','Deaths Prediction','Recovered Prediction'])
predictions_rbf['Local'] = test['Local'].tolist()
predictions_rbf['Confirmed Prediction'] = predictions_rbf['Confirmed Prediction'].apply(np.ceil)
predictions_rbf['Deaths Prediction'] = predictions_rbf['Deaths Prediction'].apply(np.ceil)
predictions_rbf['Recovered Prediction'] = predictions_rbf['Recovered Prediction'].apply(np.ceil)
predictions_rbf['Confirmed Actual'] = test['Confirmed'].tolist()
predictions_rbf['Deaths Actual'] = test['Deaths'].tolist()
predictions_rbf['Recovered Actual'] = test['Recovered'].tolist()
predictions_rbf['Days Passed'] = test['Days Passed'].tolist()
predictions_rbf = predictions_rbf[['Days Passed','Local', 'Confirmed Prediction', 'Confirmed Actual','Deaths Prediction', 'Deaths Actual','Recovered Prediction','Recovered Actual']]
predictions_rbf

Unnamed: 0,Days Passed,Local,Confirmed Prediction,Confirmed Actual,Deaths Prediction,Deaths Actual,Recovered Prediction,Recovered Actual
0,12.0,Fujian / China,5.0,179,6.0,0,6.0,1
1,83.0,Tianjin / China,5.0,185,6.0,3,6.0,168
2,48.0,Spain,5.0,1695,6.0,35,6.0,32
3,1.0,Poland,5.0,0,6.0,0,6.0,0
4,22.0,South Sudan,5.0,0,6.0,0,6.0,0
...,...,...,...,...,...,...,...,...
255,36.0,Hunan / China,5.0,1017,6.0,4,6.0,804
256,42.0,Gibraltar / United Kingdom,5.0,1,6.0,0,6.0,0
257,32.0,Croatia,5.0,0,6.0,0,6.0,0
258,38.0,Heilongjiang / China,5.0,480,6.0,13,6.0,0


In [21]:
with pd.ExcelWriter('results_svr_rbf.xlsx') as writer:
    predictions_rbf.to_excel(writer)

## Sigmoid with epsilon=0.1


In [22]:
regressor_s = SVR(kernel='sigmoid', cache_size=500)
wrapper_s = MultiOutputRegressor(regressor_s)

In [23]:
wrapper_s.fit(train[x_columns],train[y_columns])
predictions_s = wrapper_s.predict(test[x_columns])

In [24]:
predictions_s = pd.DataFrame(data=predictions_s,columns=['Confirmed Prediction','Deaths Prediction','Recovered Prediction'])
predictions_s['Local'] = test['Local'].tolist()
predictions_s['Confirmed Prediction'] = predictions_s['Confirmed Prediction'].apply(np.ceil)
predictions_s['Deaths Prediction'] = predictions_s['Deaths Prediction'].apply(np.ceil)
predictions_s['Recovered Prediction'] = predictions_s['Recovered Prediction'].apply(np.ceil)
predictions_s['Confirmed Actual'] = test['Confirmed'].tolist()
predictions_s['Deaths Actual'] = test['Deaths'].tolist()
predictions_s['Recovered Actual'] = test['Recovered'].tolist()
predictions_s['Days Passed'] = test['Days Passed'].tolist()
predictions_s = predictions_s[['Days Passed','Local', 'Confirmed Prediction', 'Confirmed Actual','Deaths Prediction', 'Deaths Actual','Recovered Prediction','Recovered Actual']]
predictions_s

Unnamed: 0,Days Passed,Local,Confirmed Prediction,Confirmed Actual,Deaths Prediction,Deaths Actual,Recovered Prediction,Recovered Actual
0,18.0,Shaanxi / China,28.0,208,14.0,0,15.0,25
1,102.0,Gibraltar / United Kingdom,26.0,144,12.0,0,14.0,132
2,78.0,Liechtenstein,18.0,78,7.0,1,7.0,55
3,69.0,South Sudan,10.0,0,-0.0,0,1.0,0
4,98.0,Hungary,218.0,2727,145.0,300,156.0,536
...,...,...,...,...,...,...,...,...
255,31.0,Somalia,10.0,0,-0.0,0,1.0,0
256,1.0,Shandong / China,10.0,6,-0.0,0,1.0,0
257,102.0,Malta,60.0,477,37.0,4,41.0,392
258,91.0,Malawi,12.0,23,2.0,3,2.0,3


In [25]:
with pd.ExcelWriter('results_svr_s.xlsx') as writer:
    predictions_s.to_excel(writer)

## Sigmoid with epsilon=0.2

In [26]:
regressor_s2 = SVR(kernel='sigmoid', epsilon=0.2, cache_size = 500)
wrapper_s2 = MultiOutputRegressor(regressor_s2)

In [27]:
wrapper_s2.fit(train[x_columns],train[y_columns])
predictions_s2 = wrapper_s2.predict(test[x_columns])

In [28]:
predictions_s2 = pd.DataFrame(data=predictions_s2,columns=['Confirmed Prediction','Deaths Prediction','Recovered Prediction'])
predictions_s2['Local'] = test['Local'].tolist()
predictions_s2['Confirmed Prediction'] = predictions_s2['Confirmed Prediction'].apply(np.ceil)
predictions_s2['Deaths Prediction'] = predictions_s2['Deaths Prediction'].apply(np.ceil)
predictions_s2['Recovered Prediction'] = predictions_s2['Recovered Prediction'].apply(np.ceil)
predictions_s2['Confirmed Actual'] = test['Confirmed'].tolist()
predictions_s2['Deaths Actual'] = test['Deaths'].tolist()
predictions_s2['Recovered Actual'] = test['Recovered'].tolist()
predictions_s2['Days Passed'] = test['Days Passed'].tolist()
predictions_s2 = predictions_s2[['Days Passed','Local', 'Confirmed Prediction', 'Confirmed Actual','Deaths Prediction', 'Deaths Actual','Recovered Prediction','Recovered Actual']]
predictions_s2

Unnamed: 0,Days Passed,Local,Confirmed Prediction,Confirmed Actual,Deaths Prediction,Deaths Actual,Recovered Prediction,Recovered Actual
0,18.0,Shaanxi / China,28.0,208,14.0,0,15.0,25
1,102.0,Gibraltar / United Kingdom,26.0,144,12.0,0,14.0,132
2,78.0,Liechtenstein,18.0,78,6.0,1,7.0,55
3,69.0,South Sudan,10.0,0,-0.0,0,1.0,0
4,98.0,Hungary,218.0,2727,145.0,300,156.0,536
...,...,...,...,...,...,...,...,...
255,31.0,Somalia,10.0,0,-0.0,0,1.0,0
256,1.0,Shandong / China,10.0,6,-0.0,0,1.0,0
257,102.0,Malta,60.0,477,37.0,4,41.0,392
258,91.0,Malawi,12.0,23,2.0,3,2.0,3


In [29]:
with pd.ExcelWriter('results_svr_s2.xlsx') as writer:
    predictions_s2.to_excel(writer)

## Rbf com epsilon=0.2

In [32]:
regressor_rbf2 = SVR(kernel='rbf', epsilon=0.2,cache_size=500)
wrapper_rbf2 = MultiOutputRegressor(regressor_rbf2)

In [33]:
wrapper_rbf2.fit(train[x_columns],train[y_columns])
predictions_rbf2 = wrapper_rbf2.predict(test[x_columns])

In [34]:
predictions_rbf2 = pd.DataFrame(data=predictions_rbf2,columns=['Confirmed Prediction','Deaths Prediction','Recovered Prediction'])
predictions_rbf2['Local'] = test['Local'].tolist()
predictions_rbf2['Confirmed Prediction'] = predictions_rbf2['Confirmed Prediction'].apply(np.ceil)
predictions_rbf2['Deaths Prediction'] = predictions_rbf2['Deaths Prediction'].apply(np.ceil)
predictions_rbf2['Recovered Prediction'] = predictions_rbf2['Recovered Prediction'].apply(np.ceil)
predictions_rbf2['Confirmed Actual'] = test['Confirmed'].tolist()
predictions_rbf2['Deaths Actual'] = test['Deaths'].tolist()
predictions_rbf2['Recovered Actual'] = test['Recovered'].tolist()
predictions_rbf2['Days Passed'] = test['Days Passed'].tolist()
predictions_rbf2 = predictions_rbf2[['Days Passed','Local', 'Confirmed Prediction', 'Confirmed Actual','Deaths Prediction', 'Deaths Actual','Recovered Prediction','Recovered Actual']]
predictions_rbf2

Unnamed: 0,Days Passed,Local,Confirmed Prediction,Confirmed Actual,Deaths Prediction,Deaths Actual,Recovered Prediction,Recovered Actual
0,18.0,Shaanxi / China,18.0,208,3.0,0,5.0,25
1,102.0,Gibraltar / United Kingdom,17.0,144,2.0,0,5.0,132
2,78.0,Liechtenstein,14.0,78,2.0,1,3.0,55
3,69.0,South Sudan,11.0,0,1.0,0,1.0,0
4,98.0,Hungary,115.0,2727,39.0,300,62.0,536
...,...,...,...,...,...,...,...,...
255,31.0,Somalia,11.0,0,1.0,0,1.0,0
256,1.0,Shandong / China,11.0,6,1.0,0,1.0,0
257,102.0,Malta,31.0,477,7.0,4,13.0,392
258,91.0,Malawi,11.0,23,1.0,3,1.0,3


In [35]:
with pd.ExcelWriter('results_svr_rbf2.xlsx') as writer:
    predictions_rbf2.to_excel(writer)

## Analysis Results

In [36]:
from sklearn.metrics import precision_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import recall_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import f1_score

## RBF with epsilon=0.1


In [37]:
y_pred = pd.DataFrame(data=predictions_rbf,columns=['Confirmed Prediction','Deaths Prediction','Recovered Prediction'])

### Precision

In [38]:
precision_score(test['Confirmed'], y_pred['Confirmed Prediction'],average='weighted',zero_division='warn')

  _warn_prf(average, modifier, msg_start, len(result))


0.00025832376578645234

In [39]:
precision_score(test['Deaths'], y_pred['Deaths Prediction'],average='weighted',zero_division='warn')

0.6455799755799756

In [40]:
precision_score(test['Recovered'], y_pred['Recovered Prediction'],average='weighted',zero_division='warn')

0.004149797570850202

### Accuracy

In [41]:
accuracy_score(test['Confirmed'], y_pred['Confirmed Prediction'])

0.011538461538461539

In [42]:
accuracy_score(test['Deaths'], y_pred['Deaths Prediction'])

0.3192307692307692

In [43]:
accuracy_score(test['Recovered'], y_pred['Recovered Prediction'])

0.04230769230769231

### Recall

In [44]:
recall_score(test['Confirmed'], y_pred['Confirmed Prediction'],average='weighted',zero_division='warn')

  _warn_prf(average, modifier, msg_start, len(result))


0.011538461538461539

In [45]:
recall_score(test['Deaths'], y_pred['Deaths Prediction'],average='weighted',zero_division='warn')

0.3192307692307692

In [46]:
recall_score(test['Recovered'], y_pred['Recovered Prediction'],average='weighted',zero_division='warn')

0.04230769230769231

### Confusion Matrix

In [47]:
confusion_matrix(test['Confirmed'], y_pred['Confirmed Prediction'])

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

In [48]:
confusion_matrix(test['Deaths'], y_pred['Deaths Prediction'])

array([[66, 84,  4, ...,  0,  0,  0],
       [ 0, 12,  7, ...,  0,  0,  0],
       [ 0,  2,  2, ...,  0,  0,  0],
       ...,
       [ 0,  0,  0, ...,  0,  0,  0],
       [ 0,  0,  0, ...,  0,  0,  0],
       [ 0,  0,  0, ...,  0,  0,  0]])

In [49]:
confusion_matrix(test['Recovered'], y_pred['Recovered Prediction'])

array([[  0, 124,   3, ...,   0,   0,   0],
       [  0,  10,   3, ...,   0,   0,   0],
       [  0,   1,   1, ...,   0,   0,   0],
       ...,
       [  0,   0,   0, ...,   0,   0,   0],
       [  0,   0,   0, ...,   0,   0,   0],
       [  0,   0,   0, ...,   0,   0,   0]])

In [50]:
def find_TP(y_true, y_pred):
    # counts the number of true positives (y_true = 1, y_pred = 1)
    return sum((y_true == 1) & (y_pred == 1))
def find_FN(y_true, y_pred):
    # counts the number of false negatives (y_true = 1, y_pred = 0)
    return # your code here
def find_FP(y_true, y_pred):
    # counts the number of false positives (y_true = 0, y_pred = 1)
    return # your code here
def find_TN(y_true, y_pred):
    # counts the number of true negatives (y_true = 0, y_pred = 0)
    return # your code here

In [51]:
print('TP:',find_TP(test['Confirmed'], y_pred['Confirmed Prediction']))
print('FN:',find_FN(test['Confirmed'], y_pred['Confirmed Prediction']))
print('FP:',find_FP(test['Confirmed'], y_pred['Confirmed Prediction']))
print('TN:',find_TN(test['Confirmed'], y_pred['Confirmed Prediction']))

TP: 0
FN: None
FP: None
TN: None


In [52]:
print('TP:',find_TP(test['Deaths'], y_pred['Deaths Prediction']))
print('FN:',find_FN(test['Deaths'], y_pred['Deaths Prediction']))
print('FP:',find_FP(test['Deaths'], y_pred['Deaths Prediction']))
print('TN:',find_TN(test['Deaths'], y_pred['Deaths Prediction']))

TP: 0
FN: None
FP: None
TN: None


In [53]:
print('TP:',find_TP(test['Recovered'], y_pred['Recovered Prediction']))
print('FN:',find_FN(test['Recovered'], y_pred['Recovered Prediction']))
print('FP:',find_FP(test['Recovered'], y_pred['Recovered Prediction']))
print('TN:',find_TN(test['Recovered'], y_pred['Recovered Prediction']))

TP: 0
FN: None
FP: None
TN: None


### F1 

In [54]:
f1_score(test['Confirmed'], y_pred['Confirmed Prediction'],average='weighted',zero_division='warn')

0.0005053340819764178

In [55]:
f1_score(test['Deaths'], y_pred['Deaths Prediction'],average='weighted',zero_division='warn')

0.3935856115463841

In [56]:
f1_score(test['Recovered'], y_pred['Recovered Prediction'],average='weighted',zero_division='warn')

0.007536439464150307

## Sigmoid epsilon=0.1

In [57]:
y_pred = pd.DataFrame(data=predictions_s,columns=['Confirmed Prediction','Deaths Prediction','Recovered Prediction'])

### Precision

In [58]:
precision_score(test['Confirmed'], y_pred['Confirmed Prediction'],average='weighted',zero_division='warn')

0.0014423076923076924

In [59]:
precision_score(test['Deaths'], y_pred['Deaths Prediction'],average='weighted',zero_division='warn')

0.6491878491878492

In [60]:
precision_score(test['Recovered'], y_pred['Recovered Prediction'],average='weighted',zero_division='warn')

0.007427149964463397

### Accuracy

In [61]:
accuracy_score(test['Confirmed'], y_pred['Confirmed Prediction'])

0.011538461538461539

In [62]:
accuracy_score(test['Deaths'], y_pred['Deaths Prediction'])

0.45

In [63]:
accuracy_score(test['Recovered'], y_pred['Recovered Prediction'])

0.04230769230769231

### Recall

In [64]:
recall_score(test['Confirmed'], y_pred['Confirmed Prediction'],average='weighted',zero_division='warn')

0.011538461538461539

In [65]:
recall_score(test['Deaths'], y_pred['Deaths Prediction'],average='weighted',zero_division='warn')

0.45

In [66]:
recall_score(test['Recovered'], y_pred['Recovered Prediction'],average='weighted',zero_division='warn')

0.04230769230769231

### Confusion Matrix

In [67]:
confusion_matrix(test['Confirmed'], y_pred['Confirmed Prediction'])

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

In [68]:
confusion_matrix(test['Deaths'], y_pred['Deaths Prediction'])

array([[  0,   0,   0, ...,   0,   0,   0],
       [  0,   0,   0, ...,   0,   0,   0],
       [  0,   0, 107, ...,   0,   0,   0],
       ...,
       [  0,   0,   0, ...,   0,   0,   0],
       [  0,   1,   0, ...,   0,   0,   0],
       [  1,   0,   0, ...,   0,   0,   0]])

In [69]:
confusion_matrix(test['Recovered'], y_pred['Recovered Prediction'])

array([[  0, 120,   1, ...,   0,   0,   0],
       [  0,   8,   2, ...,   0,   0,   0],
       [  0,   1,   0, ...,   0,   0,   0],
       ...,
       [  0,   0,   0, ...,   0,   0,   0],
       [  0,   0,   0, ...,   0,   0,   0],
       [  0,   0,   0, ...,   0,   0,   0]])

In [70]:
def find_TP(y_true, y_pred):
    # counts the number of true positives (y_true = 1, y_pred = 1)
    return sum((y_true == 1) & (y_pred == 1))
def find_FN(y_true, y_pred):
    # counts the number of false negatives (y_true = 1, y_pred = 0)
    return # your code here
def find_FP(y_true, y_pred):
    # counts the number of false positives (y_true = 0, y_pred = 1)
    return # your code here
def find_TN(y_true, y_pred):
    # counts the number of true negatives (y_true = 0, y_pred = 0)
    return # your code here

In [71]:
print('TP:',find_TP(test['Confirmed'], y_pred['Confirmed Prediction']))
print('FN:',find_FN(test['Confirmed'], y_pred['Confirmed Prediction']))
print('FP:',find_FP(test['Confirmed'], y_pred['Confirmed Prediction']))
print('TN:',find_TN(test['Confirmed'], y_pred['Confirmed Prediction']))

TP: 0
FN: None
FP: None
TN: None


In [72]:
print('TP:',find_TP(test['Deaths'], y_pred['Deaths Prediction']))
print('FN:',find_FN(test['Deaths'], y_pred['Deaths Prediction']))
print('FP:',find_FP(test['Deaths'], y_pred['Deaths Prediction']))
print('TN:',find_TN(test['Deaths'], y_pred['Deaths Prediction']))

TP: 0
FN: None
FP: None
TN: None


In [73]:
print('TP:',find_TP(test['Recovered'], y_pred['Recovered Prediction']))
print('FN:',find_FN(test['Recovered'], y_pred['Recovered Prediction']))
print('FP:',find_FP(test['Recovered'], y_pred['Recovered Prediction']))
print('TN:',find_TN(test['Recovered'], y_pred['Recovered Prediction']))

TP: 0
FN: None
FP: None
TN: None


### F1 

In [74]:
f1_score(test['Confirmed'], y_pred['Confirmed Prediction'],average='weighted',zero_division='warn')

0.002564102564102564

In [75]:
f1_score(test['Deaths'], y_pred['Deaths Prediction'],average='weighted',zero_division='warn')

0.5287771281373395

In [76]:
f1_score(test['Recovered'], y_pred['Recovered Prediction'],average='weighted',zero_division='warn')

0.011791901791901791

## RBF com epsilon =0.2

In [77]:
y_pred = pd.DataFrame(data=predictions_rbf2 ,columns=['Confirmed Prediction','Deaths Prediction','Recovered Prediction'])

### Precision

In [78]:
precision_score(test['Confirmed'], y_pred['Confirmed Prediction'],average='weighted',zero_division='warn')

0.0002526670409882089

In [79]:
precision_score(test['Deaths'], y_pred['Deaths Prediction'],average='weighted',zero_division='warn')

0.6407459207459207

In [80]:
precision_score(test['Recovered'], y_pred['Recovered Prediction'],average='weighted',zero_division='warn')

0.004143772893772895

### Accuracy

In [81]:
accuracy_score(test['Confirmed'], y_pred['Confirmed Prediction'])

0.011538461538461539

In [82]:
accuracy_score(test['Deaths'], y_pred['Deaths Prediction'])

0.06923076923076923

In [83]:
accuracy_score(test['Recovered'], y_pred['Recovered Prediction'])

0.04230769230769231

### Recall

In [84]:
recall_score(test['Confirmed'], y_pred['Confirmed Prediction'],average='weighted',zero_division='warn')

0.011538461538461539

In [85]:
recall_score(test['Deaths'], y_pred['Deaths Prediction'],average='weighted',zero_division='warn')

0.06923076923076923

In [86]:
recall_score(test['Recovered'], y_pred['Recovered Prediction'],average='weighted',zero_division='warn')

0.04230769230769231

### Confusion Matrix

In [87]:
confusion_matrix(test['Confirmed'], y_pred['Confirmed Prediction'])

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

In [88]:
confusion_matrix(test['Deaths'], y_pred['Deaths Prediction'])

array([[  2, 147,   5, ...,   0,   0,   0],
       [  0,  11,   8, ...,   0,   0,   0],
       [  0,   2,   2, ...,   0,   0,   0],
       ...,
       [  0,   0,   0, ...,   0,   0,   0],
       [  0,   0,   0, ...,   0,   0,   0],
       [  0,   0,   0, ...,   0,   0,   0]])

In [89]:
confusion_matrix(test['Recovered'], y_pred['Recovered Prediction'])

array([[  0, 123,   4, ...,   0,   0,   0],
       [  0,  10,   3, ...,   0,   0,   0],
       [  0,   1,   1, ...,   0,   0,   0],
       ...,
       [  0,   0,   0, ...,   0,   0,   0],
       [  0,   0,   0, ...,   0,   0,   0],
       [  0,   0,   0, ...,   0,   0,   0]])

In [90]:
def find_TP(y_true, y_pred):
    # counts the number of true positives (y_true = 1, y_pred = 1)
    return sum((y_true == 1) & (y_pred == 1))
def find_FN(y_true, y_pred):
    # counts the number of false negatives (y_true = 1, y_pred = 0)
    return # your code here
def find_FP(y_true, y_pred):
    # counts the number of false positives (y_true = 0, y_pred = 1)
    return # your code here
def find_TN(y_true, y_pred):
    # counts the number of true negatives (y_true = 0, y_pred = 0)
    return # your code here

In [91]:
print('TP:',find_TP(test['Confirmed'], y_pred['Confirmed Prediction']))
print('FN:',find_FN(test['Confirmed'], y_pred['Confirmed Prediction']))
print('FP:',find_FP(test['Confirmed'], y_pred['Confirmed Prediction']))
print('TN:',find_TN(test['Confirmed'], y_pred['Confirmed Prediction']))

TP: 0
FN: None
FP: None
TN: None


In [92]:
print('TP:',find_TP(test['Deaths'], y_pred['Deaths Prediction']))
print('FN:',find_FN(test['Deaths'], y_pred['Deaths Prediction']))
print('FP:',find_FP(test['Deaths'], y_pred['Deaths Prediction']))
print('TN:',find_TN(test['Deaths'], y_pred['Deaths Prediction']))

TP: 0
FN: None
FP: None
TN: None


In [93]:
print('TP:',find_TP(test['Recovered'], y_pred['Recovered Prediction']))
print('FN:',find_FN(test['Recovered'], y_pred['Recovered Prediction']))
print('FP:',find_FP(test['Recovered'], y_pred['Recovered Prediction']))
print('TN:',find_TN(test['Recovered'], y_pred['Recovered Prediction']))

TP: 0
FN: None
FP: None
TN: None


### F1 

In [94]:
f1_score(test['Confirmed'], y_pred['Confirmed Prediction'],average='weighted',zero_division='warn')

0.0004945054945054945

In [95]:
f1_score(test['Deaths'], y_pred['Deaths Prediction'],average='weighted',zero_division='warn')

0.04248629096146827

In [96]:
f1_score(test['Recovered'], y_pred['Recovered Prediction'],average='weighted',zero_division='warn')

0.007543664065403197

## Sigmoid epsilon = 0.2

In [97]:
y_pred = pd.DataFrame(data=predictions_s2 ,columns=['Confirmed Prediction','Deaths Prediction','Recovered Prediction'])

### Precision

In [98]:
precision_score(test['Confirmed'], y_pred['Confirmed Prediction'],average='weighted',zero_division='warn')

0.0015050167224080267

In [99]:
precision_score(test['Deaths'], y_pred['Deaths Prediction'],average='weighted',zero_division='warn')

0.6457407097407099

In [100]:
precision_score(test['Recovered'], y_pred['Recovered Prediction'],average='weighted',zero_division='warn')

0.007748637021892835

### Accuracy

In [101]:
accuracy_score(test['Confirmed'], y_pred['Confirmed Prediction'])

0.011538461538461539

In [102]:
accuracy_score(test['Deaths'], y_pred['Deaths Prediction'])

0.46153846153846156

In [103]:
accuracy_score(test['Recovered'], y_pred['Recovered Prediction'])

0.04230769230769231

### Recall

In [104]:
recall_score(test['Confirmed'], y_pred['Confirmed Prediction'],average='weighted',zero_division='warn')

0.011538461538461539

In [105]:
recall_score(test['Deaths'], y_pred['Deaths Prediction'],average='weighted',zero_division='warn')

0.46153846153846156

In [106]:
recall_score(test['Recovered'], y_pred['Recovered Prediction'],average='weighted',zero_division='warn')

0.04230769230769231

### Confusion Matrix

In [107]:
confusion_matrix(test['Confirmed'], y_pred['Confirmed Prediction'])

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

In [108]:
confusion_matrix(test['Deaths'], y_pred['Deaths Prediction'])

array([[  0,   0,   0, ...,   0,   0,   0],
       [  0,   0,   0, ...,   0,   0,   0],
       [  0,   0, 110, ...,   0,   0,   0],
       ...,
       [  0,   0,   0, ...,   0,   0,   0],
       [  0,   1,   0, ...,   0,   0,   0],
       [  1,   0,   0, ...,   0,   0,   0]])

In [109]:
confusion_matrix(test['Recovered'], y_pred['Recovered Prediction'])

array([[  0, 117,   4, ...,   0,   0,   0],
       [  0,   8,   2, ...,   0,   0,   0],
       [  0,   1,   0, ...,   0,   0,   0],
       ...,
       [  0,   0,   0, ...,   0,   0,   0],
       [  0,   0,   0, ...,   0,   0,   0],
       [  0,   0,   0, ...,   0,   0,   0]])

In [110]:
def find_TP(y_true, y_pred):
    # counts the number of true positives (y_true = 1, y_pred = 1)
    return sum((y_true == 1) & (y_pred == 1))
def find_FN(y_true, y_pred):
    # counts the number of false negatives (y_true = 1, y_pred = 0)
    return # your code here
def find_FP(y_true, y_pred):
    # counts the number of false positives (y_true = 0, y_pred = 1)
    return # your code here
def find_TN(y_true, y_pred):
    # counts the number of true negatives (y_true = 0, y_pred = 0)
    return # your code here

In [111]:
print('TP:',find_TP(test['Confirmed'], y_pred['Confirmed Prediction']))
print('FN:',find_FN(test['Confirmed'], y_pred['Confirmed Prediction']))
print('FP:',find_FP(test['Confirmed'], y_pred['Confirmed Prediction']))
print('TN:',find_TN(test['Confirmed'], y_pred['Confirmed Prediction']))

TP: 0
FN: None
FP: None
TN: None


In [112]:
print('TP:',find_TP(test['Deaths'], y_pred['Deaths Prediction']))
print('FN:',find_FN(test['Deaths'], y_pred['Deaths Prediction']))
print('FP:',find_FP(test['Deaths'], y_pred['Deaths Prediction']))
print('TN:',find_TN(test['Deaths'], y_pred['Deaths Prediction']))

TP: 0
FN: None
FP: None
TN: None


In [113]:
print('TP:',find_TP(test['Recovered'], y_pred['Recovered Prediction']))
print('FN:',find_FN(test['Recovered'], y_pred['Recovered Prediction']))
print('FP:',find_FP(test['Recovered'], y_pred['Recovered Prediction']))
print('TN:',find_TN(test['Recovered'], y_pred['Recovered Prediction']))

TP: 0
FN: None
FP: None
TN: None


### F1 

In [114]:
f1_score(test['Confirmed'], y_pred['Confirmed Prediction'],average='weighted',zero_division='warn')

0.0026627218934911242

In [115]:
f1_score(test['Deaths'], y_pred['Deaths Prediction'],average='weighted',zero_division='warn')

0.5363012250032001

In [116]:
f1_score(test['Recovered'], y_pred['Recovered Prediction'],average='weighted',zero_division='warn')

0.012201900663439125