# Covid small dataset

**Commands used in this project:**

* `import pandas as pd` - imports the pandas library.
* `pd.read_csv()` - loads a CSV file into a DataFrame in the Jupyter notebook.
* `df.count()` - counts the non-null values in each column.
* `df.isnull().sum()` - counts the missing values in each column.
* `import seaborn as sns` - imports the Seaborn library.
* `import matplotlib.pyplot as plt` - imports the Matplotlib plotting interface.
* `sns.heatmap(df.isnull())` - draws a heat map showing where values are missing in each column.
* `plt.show()` - displays the plot.
* `df.groupby('Col_name')` - groups the rows by the unique values of the column.
* `df.sort_values(by=['Col_name'])` - sorts the entire DataFrame by the values of the given column.
* `df[df.Col_1 == 'Element1']` - filtering: selects only the records where `Col_1` equals `Element1`.

## Objectives

**English**
1) Show the number of Confirmed, Deaths and Recovered cases in each Region.
2) Remove all records where the number of Confirmed cases is less than 10.
3) Which Region recorded the maximum number of Confirmed cases?
4) Which Region recorded the minimum number of Deaths?
5) How many Confirmed, Deaths and Recovered cases were reported from India up to 29 April 2020?
6-A) Sort the entire dataset by the number of Confirmed cases in ascending order.
6-B) Sort the entire dataset by the number of Recovered cases in descending order.

**Spanish**
1) Indique el número de casos confirmados, muertos y recuperados en cada región.
2) Elimine todos los registros en los que los casos confirmados sean inferiores a 10.
3) ¿En qué región se registró el máximo número de casos confirmados?
4) ¿En qué región se ha registrado el mínimo número de muertes?
5) ¿Cuántos casos confirmados, muertos y recuperados se registraron en la India hasta el 29 de abril de 2020?
6-A ) Ordena todos los datos con el número de casos confirmados en orden ascendente.
6-B ) Ordena todos los datos con el número de casos recuperados en orden descendente.

In [10]:
# Import the libraries and load the data file
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


df = pd.read_csv('A4Covid_19_data.csv')

In [3]:
df.head()

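|   | Date | State | Region | Confirmed | Deaths | Recovered |
|---|---|---|---|---|---|---|
| 0 | 4/29/2020 | NaN | Afghanistan | 1939 | 60 | 252 |
| 1 | 4/29/2020 | NaN | Albania | 766 | 30 | 455 |
| 2 | 4/29/2020 | NaN | Algeria | 3848 | 444 | 1702 |
| 3 | 4/29/2020 | NaN | Andorra | 743 | 42 | 423 |
| 4 | 4/29/2020 | NaN | Angola | 27 | 2 | 7 |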


**Question 1**
* Show the number of Confirmed, Deaths and Recovered cases in each Region.
* Indique el número de casos confirmados, muertos y recuperados en cada región.


In [4]:
df.count()

Date         321
State        140
Region       321
Confirmed    321
Deaths       321
Recovered    321
dtype: int64

In [8]:
# Check for missing values
df.isnull().sum()

Date           0
State        181
Region         0
Confirmed      0
Deaths         0
Recovered      0
dtype: int64
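
The two outputs above are complementary: `df.count()` reports non-null values and `df.isnull().sum()` reports missing ones, so for each column they add up to the number of rows (140 + 181 = 321 for `State`). A minimal sketch of that relationship, assuming a hypothetical four-row `sample` frame rather than the real file:

```python
import pandas as pd

# Hypothetical sample: country-level rows carry no State value.
sample = pd.DataFrame({
    "State": [None, None, "Tibet", "Maryland"],
    "Region": ["Afghanistan", "Albania", "Mainland China", "US"],
})

non_null = sample.count()               # non-null values per column
missing = sample.isnull().sum()         # missing values per column
missing_share = sample.isnull().mean()  # fraction of missing values per column

print(non_null + missing)  # equals the number of rows for every column
print(missing_share)
```

`isnull().mean()` is a convenient way to read the same information as a proportion, which can help when deciding whether a column such as `State` is usable for grouping.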

In [12]:
df.columns

Index(['Date', 'State', 'Region', 'Confirmed', 'Deaths', 'Recovered'], dtype='object')

In [21]:
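# Group the data by region and sum the numeric columns
# (selecting them explicitly keeps the text columns Date and State out of the sum)
df.groupby('Region')[['Confirmed', 'Deaths', 'Recovered']].sum()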

| Region | Confirmed | Deaths | Recovered |
|---|---|---|---|
| Afghanistan | 1939 | 60 | 252 |
| Albania | 766 | 30 | 455 |
| Algeria | 3848 | 444 | 1702 |
| Andorra | 743 | 42 | 423 |
| Angola | 27 | 2 | 7 |
| ... | ... | ... | ... |
| West Bank and Gaza | 344 | 2 | 71 |
| Western Sahara | 6 | 0 | 5 |
| Yemen | 6 | 0 | 1 |
| Zambia | 97 | 3 | 54 |
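
Grouping matters because some regions appear in several rows, one per state or territory, so a per-region figure is the sum of those rows. The sketch below illustrates this on a small `sample` frame built from a few rows shown in this notebook. It is only a subset, so the France total here is partial and does not match the full-dataset figure.

```python
import pandas as pd

# Subset of rows from the dataset (illustrative, not the full file).
sample = pd.DataFrame({
    "State": [None, "Saint Pierre and Miquelon", None, None],
    "Region": ["France", "France", "Italy", "Angola"],
    "Confirmed": [165093, 1, 203591, 27],
    "Deaths": [24087, 0, 27682, 2],
    "Recovered": [48228, 0, 71252, 7],
})

# One row per region; the two France rows are added together.
totals = sample.groupby("Region")[["Confirmed", "Deaths", "Recovered"]].sum()
print(totals)
```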


In [28]:
# Sort the regions by total confirmed cases, highest first
confirmed = df.groupby('Region')['Confirmed'].sum().sort_values(ascending=False)
confirmed

Region
US                       1039909
Spain                     236899
Italy                     203591
France                    166543
UK                        166441
                          ...   
Sao Tome and Principe          8
Papua New Guinea               8
Bhutan                         7
Western Sahara                 6
Yemen                          6
Name: Confirmed, Length: 187, dtype: int64

In [33]:
df.groupby('Region')[['Confirmed','Recovered']].sum().sort_values(ascending=False, by=['Confirmed', 'Recovered'])

| Region | Confirmed | Recovered |
|---|---|---|
| US | 1039909 | 120720 |
| Spain | 236899 | 132929 |
| Italy | 203591 | 71252 |
| France | 166543 | 49118 |
| UK | 166441 | 857 |
| ... | ... | ... |
| Sao Tome and Principe | 8 | 4 |
| Papua New Guinea | 8 | 0 |
| Bhutan | 7 | 5 |
| Western Sahara | 6 | 5 |


**Question 2**

* Remove all records where the number of Confirmed cases is less than 10.
* Elimine todos los registros en los que los casos confirmados sean inferiores a 10.

In [37]:
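# Keep only the records with at least 10 confirmed cases
filtr = df[df['Confirmed'] >= 10]
filtr.head()

|   | Date | State | Region | Confirmed | Deaths | Recovered |
|---|---|---|---|---|---|---|
| 0 | 4/29/2020 | NaN | Afghanistan | 1939 | 60 | 252 |
| 1 | 4/29/2020 | NaN | Albania | 766 | 30 | 455 |
| 2 | 4/29/2020 | NaN | Algeria | 3848 | 444 | 1702 |
| 3 | 4/29/2020 | NaN | Andorra | 743 | 42 | 423 |
| 4 | 4/29/2020 | NaN | Angola | 27 | 2 | 7 |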
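
The filter for this question has to test the `Confirmed` column, since that is the column named in the question. A minimal sketch of the boolean-mask approach, using a small `sample` frame assembled from values that appear in this notebook (the frame itself is illustrative):

```python
import pandas as pd

# Illustrative subset: two regions have fewer than 10 confirmed cases.
sample = pd.DataFrame({
    "Region": ["Angola", "Western Sahara", "Yemen", "India"],
    "Confirmed": [27, 6, 6, 33062],
    "Recovered": [7, 5, 1, 8437],
})

# True for the rows that should be removed.
low_confirmed = sample["Confirmed"] < 10

# Keep everything else; ~ negates the mask.
kept = sample[~low_confirmed]
print(kept)
```

Angola stays in the result because the mask looks at `Confirmed` (27), even though its `Recovered` count is below 10. A mask on `Recovered` would drop it and answer a different question.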


**Question 3**
* Which Region recorded the maximum number of Confirmed cases?
* ¿En qué región se registró el máximo número de casos confirmados?

In [40]:
df.groupby('Region')['Confirmed'].sum().sort_values(ascending=False)

Region
US                       1039909
Spain                     236899
Italy                     203591
France                    166543
UK                        166441
                          ...   
Sao Tome and Principe          8
Papua New Guinea               8
Bhutan                         7
Western Sahara                 6
Yemen                          6
Name: Confirmed, Length: 187, dtype: int64

In [42]:
top10 = df.groupby('Region')['Confirmed'].sum().sort_values(ascending=False).head(10)
top10

Region
US                1039909
Spain              236899
Italy              203591
France             166543
UK                 166441
Germany            161539
Turkey             117589
Russia              99399
Iran                93657
Mainland China      82862
Name: Confirmed, dtype: int64
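
Sorting shows the full ranking, but when only the top region is needed, `Series.idxmax()` returns its label directly. A short sketch, assuming a `confirmed` Series holding three of the regional totals listed above:

```python
import pandas as pd

# Three regional totals copied from the output above (illustrative subset).
confirmed = pd.Series(
    {"Spain": 236899, "US": 1039909, "Italy": 203591},
    name="Confirmed",
)

top_region = confirmed.idxmax()  # index label of the largest value
top_value = confirmed.max()      # the largest value itself
print(top_region, top_value)
```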

**Question 4**

* Which Region recorded the minimum number of Deaths?
* ¿En qué región se ha registrado el mínimo número de muertes?


In [44]:
df.groupby('Region')['Deaths'].sum().sort_values()

Region
Laos              0
Mongolia          0
Mozambique        0
Cambodia          0
Fiji              0
              ...  
France        24121
Spain         24275
UK            26166
Italy         27682
US            60967
Name: Deaths, Length: 187, dtype: int64
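
The output shows several regions tied at 0 deaths, so there is no single minimum region. `idxmin()` would report only the first of them. One way to list every tied region is to compare against the minimum value, sketched here on a subset of the totals above:

```python
import pandas as pd

# Subset of the per-region death totals shown above (illustrative).
deaths = pd.Series(
    {"Laos": 0, "Mongolia": 0, "Fiji": 0, "Italy": 27682, "US": 60967},
    name="Deaths",
)

# All regions whose total equals the minimum, not just the first one.
lowest = deaths[deaths == deaths.min()]
print(lowest.index.tolist())
```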

**Question 5**

* How many Confirmed, Deaths and Recovered cases were reported from India up to 29 April 2020?
* ¿Cuántos casos confirmados, muertos y recuperados se registraron en la India hasta el 29 de abril de 2020?


In [45]:
df.columns

Index(['Date', 'State', 'Region', 'Confirmed', 'Deaths', 'Recovered'], dtype='object')

In [46]:
df[df['Region']=='India']

|   | Date | State | Region | Confirmed | Deaths | Recovered |
|---|---|---|---|---|---|---|
| 74 | 4/29/2020 | NaN | India | 33062 | 1079 | 8437 |
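
India appears as a single row dated 4/29/2020, so the row itself is the answer. For a region split across several rows or dates, the same question could be answered by filtering on region and date and then summing. A sketch under that assumption, with an illustrative `sample` frame and a hypothetical `cutoff` variable:

```python
import pandas as pd

# Illustrative subset of rows from the dataset.
sample = pd.DataFrame({
    "Date": ["4/29/2020", "4/29/2020", "4/29/2020"],
    "Region": ["India", "Italy", "Spain"],
    "Confirmed": [33062, 203591, 236899],
    "Deaths": [1079, 27682, 24275],
    "Recovered": [8437, 71252, 132929],
})

# Parse the month/day/year strings so dates can be compared.
sample["Date"] = pd.to_datetime(sample["Date"], format="%m/%d/%Y")

cutoff = pd.Timestamp("2020-04-29")
mask = (sample["Region"] == "India") & (sample["Date"] <= cutoff)

# Sum the three count columns over the matching rows.
india_totals = sample.loc[mask, ["Confirmed", "Deaths", "Recovered"]].sum()
print(india_totals)
```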


**Question 6**

***6a***
* Sort the entire dataset by the number of Confirmed cases in ascending order.
* Ordena todos los datos con el número de casos confirmados en orden ascendente.

***6b***
* Sort the entire dataset by the number of Recovered cases in descending order.
* Ordena todos los datos con el número de casos recuperados en orden descendente.

In [47]:
# Sort by confirmed cases, ascending
df.sort_values(by=['Confirmed'], ascending=True)

|   | Date | State | Region | Confirmed | Deaths | Recovered |
|---|---|---|---|---|---|---|
| 285 | 4/29/2020 | Recovered | US | 0 | 0 | 120720 |
| 284 | 4/29/2020 | Recovered | Canada | 0 | 0 | 20327 |
| 203 | 4/29/2020 | Diamond Princess cruise ship | Canada | 0 | 1 | 0 |
| 305 | 4/29/2020 | Tibet | Mainland China | 1 | 0 | 1 |
| 289 | 4/29/2020 | Saint Pierre and Miquelon | France | 1 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 57 | 4/29/2020 | NaN | France | 165093 | 24087 | 48228 |
| 168 | 4/29/2020 | NaN | UK | 165221 | 26097 | 0 |
| 80 | 4/29/2020 | NaN | Italy | 203591 | 27682 | 71252 |
| 153 | 4/29/2020 | NaN | Spain | 236899 | 24275 | 132929 |


In [49]:
# Sort by recovered cases, descending
df.sort_values(by=['Recovered'], ascending=False)

|   | Date | State | Region | Confirmed | Deaths | Recovered |
|---|---|---|---|---|---|---|
| 153 | 4/29/2020 | NaN | Spain | 236899 | 24275 | 132929 |
| 285 | 4/29/2020 | Recovered | US | 0 | 0 | 120720 |
| 61 | 4/29/2020 | NaN | Germany | 161539 | 6467 | 120400 |
| 76 | 4/29/2020 | NaN | Iran | 93657 | 5957 | 73791 |
| 80 | 4/29/2020 | NaN | Italy | 203591 | 27682 | 71252 |
| ... | ... | ... | ... | ... | ... | ... |
| 248 | 4/29/2020 | Maryland | US | 20849 | 1078 | 0 |
| 246 | 4/29/2020 | Manitoba | Canada | 275 | 6 | 0 |
| 243 | 4/29/2020 | Louisiana | US | 27660 | 1845 | 0 |
| 241 | 4/29/2020 | Kentucky | US | 4537 | 234 | 0 |
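
These row-level sorts mix country rows with state or province rows, and some rows (such as the US row whose `State` is `Recovered`) carry only one of the three counts. If a per-region ranking is wanted instead, one option is to aggregate first and sort afterwards. A sketch on a subset of the rows shown in this notebook, so the US totals here are partial:

```python
import pandas as pd

# Illustrative subset: three US rows and the single Spain row.
sample = pd.DataFrame({
    "State": ["Recovered", "Maryland", "Kentucky", None],
    "Region": ["US", "US", "US", "Spain"],
    "Confirmed": [0, 20849, 4537, 236899],
    "Deaths": [0, 1078, 234, 24275],
    "Recovered": [120720, 0, 0, 132929],
})

# Sum per region, then rank the regions by recovered cases.
by_region = (
    sample.groupby("Region")[["Confirmed", "Deaths", "Recovered"]]
    .sum()
    .sort_values(by="Recovered", ascending=False)
)
print(by_region)
```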
