# Data Analytics with Pandas

In this tutorial we will learn how to use pandas for data analysis. You can think of pandas as an extremely powerful version of Excel, with a lot more features. 

A few basic concepts to be covered include:

    1. Creating data (lists, Series, DataFrames)
    2. Dealing with existing data (data input and output)
    3. Data manipulation: methods and operations on data
    4. Exercise: men at work!
        

In [45]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Creating Data

In [46]:
#creating series from lists
names = ['Rexy', 'Raphael', 'Rita', 'Razel', 'Randiel']
index1 = np.random.randint(70, 78, 5)
names
index1

array([74, 72, 74, 75, 70])

In [47]:
SeriesNames = pd.Series(names, index1)
SeriesNames

74       Rexy
72    Raphael
74       Rita
75      Razel
70    Randiel
dtype: object

In [48]:
SeriesNames[70]

'Randiel'
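
Because `index1` is drawn at random, labels can repeat (74 appears twice in the output above). The sketch below reuses the printed values as a fixed index, which is an assumption made only so the result is reproducible, and shows what a label lookup returns in each case. The names `series_names`, `unique_hit` and `repeated_hit` are illustrative.

```python
import pandas as pd

names = ['Rexy', 'Raphael', 'Rita', 'Razel', 'Randiel']
# Fixed copy of the random index printed above, so the result is reproducible
index1 = [74, 72, 74, 75, 70]
series_names = pd.Series(names, index=index1)

unique_hit = series_names.loc[70]     # a single value, because 70 occurs once
repeated_hit = series_names.loc[74]   # a Series, because 74 occurs twice

print(unique_hit)
print(list(repeated_hit))
print(series_names.index.is_unique)
```

If unique labels matter for later lookups, checking `index.is_unique` right after building the Series is a cheap safeguard.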

In [49]:
names2 = ['Damlie', 'Damon', 'Durim', 'Donald']
index2 = ['A', 'B', 'C', 'D']
seriesname2 = pd.Series(names2, index2)
seriesname2

A    Damlie
B     Damon
C     Durim
D    Donald
dtype: object

In [50]:
seriesname2['D']

'Donald'

In [51]:
#Creating Series from dictionaries
dict1 = {0:'Kampala', 1:'Arusha', 3:'Nyeri', 4:'Accra'}

In [52]:
location = pd.Series(dict1)
location

0    Kampala
1     Arusha
3      Nyeri
4      Accra
dtype: object

In [53]:
#creating a dataframe


In [54]:
#Querying the data to get size, population


In [55]:
#Adding a new column called Population density from Population & Size
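
The three cells above are left empty in the notebook. One possible way to fill them is sketched here; the town names and figures are hypothetical and chosen only for illustration, and the column names `Population` and `Size` are taken from the cell comments.

```python
import pandas as pd

# Hypothetical figures, for illustration only
towns = pd.DataFrame({
    'Town': ['Town_A', 'Town_B', 'Town_C'],
    'Population': [120000, 45000, 300000],
    'Size': [60.0, 90.0, 150.0],          # area in square kilometres
})

# Querying: number of rows and columns, and a subset of columns
n_rows, n_cols = towns.shape
subset = towns[['Population', 'Size']]

# Adding a new column computed from two existing ones
towns['Population density'] = towns['Population'] / towns['Size']
print(towns)
```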


### Dealing with already existing Data

In [56]:
covid = pd.read_csv('covidset.csv')
covid.head()

Unnamed: 0,dateRep,day,month,year,cases,deaths,countriesAndTerritories,geoId,countryterritoryCode,popData2019,continentExp
0,01/07/2020,1,7,2020,279,13,Afghanistan,AF,AFG,38041757.0,Asia
1,30/06/2020,30,6,2020,271,12,Afghanistan,AF,AFG,38041757.0,Asia
2,29/06/2020,29,6,2020,351,18,Afghanistan,AF,AFG,38041757.0,Asia
3,28/06/2020,28,6,2020,165,20,Afghanistan,AF,AFG,38041757.0,Asia
4,27/06/2020,27,6,2020,276,8,Afghanistan,AF,AFG,38041757.0,Asia


In [57]:
covid.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26982 entries, 0 to 26981
Data columns (total 11 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   dateRep                  26982 non-null  object 
 1   day                      26982 non-null  int64  
 2   month                    26982 non-null  int64  
 3   year                     26982 non-null  int64  
 4   cases                    26982 non-null  int64  
 5   deaths                   26982 non-null  int64  
 6   countriesAndTerritories  26982 non-null  object 
 7   geoId                    26873 non-null  object 
 8   countryterritoryCode     26918 non-null  object 
 9   popData2019              26918 non-null  float64
 10  continentExp             26982 non-null  object 
dtypes: float64(1), int64(5), object(5)
memory usage: 2.3+ MB


In [58]:
# converting to datetime data type
covid['dateRep'] = pd.to_datetime(covid['dateRep'], format='%d/%m/%Y')
covid.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26982 entries, 0 to 26981
Data columns (total 11 columns):
 #   Column                   Non-Null Count  Dtype         
---  ------                   --------------  -----         
 0   dateRep                  26982 non-null  datetime64[ns]
 1   day                      26982 non-null  int64         
 2   month                    26982 non-null  int64         
 3   year                     26982 non-null  int64         
 4   cases                    26982 non-null  int64         
 5   deaths                   26982 non-null  int64         
 6   countriesAndTerritories  26982 non-null  object        
 7   geoId                    26873 non-null  object        
 8   countryterritoryCode     26918 non-null  object        
 9   popData2019              26918 non-null  float64       
 10  continentExp             26982 non-null  object        
dtypes: datetime64[ns](1), float64(1), int64(5), object(4)
memory usage: 2.3+ MB


In [59]:
# covid['Day_of_week'] = covid['dateRep'].dt.dayofweek  # Monday=0 ... Sunday=6
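
Once a column has the datetime dtype, the `.dt` accessor exposes its calendar fields. A minimal sketch on two of the dates shown earlier, assuming the goal is the day of the week (the variable names are illustrative):

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(['01/07/2020', '30/06/2020']), format='%d/%m/%Y')

day_number = dates.dt.dayofweek   # Monday=0 ... Sunday=6
day_name = dates.dt.day_name()    # weekday name as a string
print(day_number.tolist(), day_name.tolist())
```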

In [60]:
covid.columns

Index(['dateRep', 'day', 'month', 'year', 'cases', 'deaths',
       'countriesAndTerritories', 'geoId', 'countryterritoryCode',
       'popData2019', 'continentExp'],
      dtype='object')

#### Selection and Indexing

In [61]:
covid[['cases', 'deaths']]

Unnamed: 0,cases,deaths
0,279,13
1,271,12
2,351,18
3,165,20
4,276,8
...,...,...
26977,0,0
26978,0,1
26979,0,0
26980,1,0


*DataFrame columns are just Series*

#### Selecting Rows

In [62]:
covid.head()

Unnamed: 0,dateRep,day,month,year,cases,deaths,countriesAndTerritories,geoId,countryterritoryCode,popData2019,continentExp
0,2020-07-01,1,7,2020,279,13,Afghanistan,AF,AFG,38041757.0,Asia
1,2020-06-30,30,6,2020,271,12,Afghanistan,AF,AFG,38041757.0,Asia
2,2020-06-29,29,6,2020,351,18,Afghanistan,AF,AFG,38041757.0,Asia
3,2020-06-28,28,6,2020,165,20,Afghanistan,AF,AFG,38041757.0,Asia
4,2020-06-27,27,6,2020,276,8,Afghanistan,AF,AFG,38041757.0,Asia


In [63]:
covid.loc[0]

dateRep                    2020-07-01 00:00:00
day                                          1
month                                        7
year                                      2020
cases                                      279
deaths                                      13
countriesAndTerritories            Afghanistan
geoId                                       AF
countryterritoryCode                       AFG
popData2019                         38041757.0
continentExp                              Asia
Name: 0, dtype: object

In [64]:
covid.loc[2]

dateRep                    2020-06-29 00:00:00
day                                         29
month                                        6
year                                      2020
cases                                      351
deaths                                      18
countriesAndTerritories            Afghanistan
geoId                                       AF
countryterritoryCode                       AFG
popData2019                         38041757.0
continentExp                              Asia
Name: 2, dtype: object

#### Selecting subset of rows and columns

In [65]:
covid['cases'][0]

279

In [66]:
covid.loc[0, 'cases']

279

In [67]:
covid[['cases', 'deaths']].iloc[[0, 2, 4]]

Unnamed: 0,cases,deaths
0,279,13
2,351,18
4,276,8
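
In `covid` the row labels happen to equal the row positions, so `.loc` and `.iloc` look interchangeable. The sketch below uses a small frame with made-up labels (10, 20, 30) to show the difference: `.loc` selects by label, `.iloc` by position.

```python
import pandas as pd

df = pd.DataFrame({'cases': [279, 271, 351], 'deaths': [13, 12, 18]},
                  index=[10, 20, 30])   # labels deliberately differ from positions

by_label = df.loc[20, 'cases']          # row labelled 20
by_position = df.iloc[1, 0]             # second row, first column
rows_cols = df.loc[[10, 30], ['cases', 'deaths']]   # subset of rows and columns

print(by_label, by_position)
print(rows_cols)
```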


#### Conditional Selection

An important feature of pandas is conditional selection using bracket notation, very similar to numpy:

In [68]:
covid[covid['year']==2019]

Unnamed: 0,dateRep,day,month,year,cases,deaths,countriesAndTerritories,geoId,countryterritoryCode,popData2019,continentExp
173,2019-12-31,31,12,2019,0,0,Afghanistan,AF,AFG,38041757.0,Asia
467,2019-12-31,31,12,2019,0,0,Algeria,DZ,DZA,43053054.0,Africa
1172,2019-12-31,31,12,2019,0,0,Armenia,AM,ARM,2957728.0,Europe
1457,2019-12-31,31,12,2019,0,0,Australia,AU,AUS,25203200.0,Oceania
1641,2019-12-31,31,12,2019,0,0,Austria,AT,AUT,8858775.0,Europe
...,...,...,...,...,...,...,...,...,...,...,...
24507,2019-12-31,31,12,2019,0,0,Thailand,TH,THA,69625581.0,Asia
25544,2019-12-31,31,12,2019,0,0,United_Arab_Emirates,AE,ARE,9770526.0,Asia
25728,2019-12-31,31,12,2019,0,0,United_Kingdom,UK,GBR,66647112.0,Europe
26019,2019-12-31,31,12,2019,0,0,United_States_of_America,US,USA,329064917.0,America


In [69]:
#More than one condition
covid[(covid['year']==2019) & (covid['countriesAndTerritories']=='China')].squeeze()
#(covid[covid['year']==2019][covid['countriesAndTerritories']=='China']).squeeze()

dateRep                    2019-12-31 00:00:00
day                                         31
month                                       12
year                                      2019
cases                                       27
deaths                                       0
countriesAndTerritories                  China
geoId                                       CN
countryterritoryCode                       CHN
popData2019                       1433783692.0
continentExp                              Asia
Name: 5475, dtype: object
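
Boolean masks combine with `&` (and), `|` (or) and `~` (not), and each comparison needs its own parentheses because these operators bind more tightly than `==`. A short sketch on a toy frame; the rows are illustrative and not taken from the dataset:

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2019, 2020, 2020, 2019],
    'country': ['China', 'China', 'Uganda', 'Uganda'],
    'cases': [27, 100, 1, 0],
})

both = df[(df['year'] == 2019) & (df['country'] == 'China')]      # AND
either = df[(df['year'] == 2019) | (df['country'] == 'China')]    # OR
not_china = df[~(df['country'] == 'China')]                        # NOT
several = df[df['country'].isin(['China', 'Uganda'])]              # membership test

print(len(both), len(either), len(not_china), len(several))
```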

#### Groupby

The groupby method lets you group rows that share a value and then call aggregate functions on each group.

In [70]:
df_c = covid.groupby('continentExp')
df_c

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000016BDFF50E80>

And then call aggregate methods off the object:

In [71]:
df_c[['cases', 'deaths']].mean()

Unnamed: 0_level_0,cases,deaths
continentExp,Unnamed: 1_level_1,Unnamed: 2_level_1
Africa,69.231926,1.734233
America,951.260179,45.086607
Asia,354.618629,8.829073
Europe,300.899236,23.646465
Oceania,9.692387,0.136831
Other,10.875,0.109375


In [72]:
df_c[['cases', 'deaths']].std()

Unnamed: 0_level_0,cases,deaths
continentExp,Unnamed: 1_level_1,Unnamed: 2_level_1
Africa,340.430511,7.956224
America,4422.759924,225.98932
Asia,1210.533348,43.696429
Europe,1085.048329,105.980585
Oceania,45.777897,0.595902
Other,27.530358,0.403051


In [73]:
df_c[['cases', 'deaths']].max()

Unnamed: 0_level_0,cases,deaths
continentExp,Unnamed: 1_level_1,Unnamed: 2_level_1
Africa,7210,168
America,54771,4928
Asia,19906,2003
Europe,11656,2004
Oceania,611,7
Other,134,2


In [74]:
df_c[['cases', 'deaths']].sum()

Unnamed: 0_level_0,cases,deaths
continentExp,Unnamed: 1_level_1,Unnamed: 2_level_1
Africa,405076,10147
America,5327057,252485
Asia,2261403,56303
Europe,2442700,191962
Oceania,9421,133
Other,696,7


In [75]:
df_c[['cases', 'deaths']].count()

Unnamed: 0_level_0,cases,deaths
continentExp,Unnamed: 1_level_1,Unnamed: 2_level_1
Africa,5851,5851
America,5600,5600
Asia,6377,6377
Europe,8118,8118
Oceania,972,972
Other,64,64


In [76]:
df_c['cases'].count().iloc[0]

5851

In [77]:
df_c[['cases', 'deaths']].describe()

Unnamed: 0_level_0,cases,cases,cases,cases,cases,cases,cases,cases,deaths,deaths,deaths,deaths,deaths,deaths,deaths,deaths
Unnamed: 0_level_1,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max
continentExp,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2
Africa,5851.0,69.231926,340.430511,-209.0,0.0,3.0,30.0,7210.0,5851.0,1.734233,7.956224,0.0,0.0,0.0,1.0,168.0
America,5600.0,951.260179,4422.759924,-2461.0,0.0,1.0,79.0,54771.0,5600.0,45.086607,225.98932,0.0,0.0,0.0,2.0,4928.0
Asia,6377.0,354.618629,1210.533348,0.0,0.0,8.0,156.0,19906.0,6377.0,8.829073,43.696429,0.0,0.0,0.0,2.0,2003.0
Europe,8118.0,300.899236,1085.048329,-766.0,0.0,9.0,103.0,11656.0,8118.0,23.646465,105.980585,-1918.0,0.0,0.0,4.0,2004.0
Oceania,972.0,9.692387,45.777897,0.0,0.0,0.0,2.0,611.0,972.0,0.136831,0.595902,0.0,0.0,0.0,0.0,7.0
Other,64.0,10.875,27.530358,-9.0,0.0,0.0,0.0,134.0,64.0,0.109375,0.403051,0.0,0.0,0.0,0.0,2.0


In [78]:
df_c[['cases', 'deaths']].describe().transpose()

Unnamed: 0,continentExp,Africa,America,Asia,Europe,Oceania,Other
cases,count,5851.0,5600.0,6377.0,8118.0,972.0,64.0
cases,mean,69.231926,951.260179,354.618629,300.899236,9.692387,10.875
cases,std,340.430511,4422.759924,1210.533348,1085.048329,45.777897,27.530358
cases,min,-209.0,-2461.0,0.0,-766.0,0.0,-9.0
cases,25%,0.0,0.0,0.0,0.0,0.0,0.0
cases,50%,3.0,1.0,8.0,9.0,0.0,0.0
cases,75%,30.0,79.0,156.0,103.0,2.0,0.0
cases,max,7210.0,54771.0,19906.0,11656.0,611.0,134.0
deaths,count,5851.0,5600.0,6377.0,8118.0,972.0,64.0
deaths,mean,1.734233,45.086607,8.829073,23.646465,0.136831,0.109375


In [79]:
df_c['cases'].describe().transpose()

continentExp,Africa,America,Asia,Europe,Oceania,Other
count,5851.0,5600.0,6377.0,8118.0,972.0,64.0
mean,69.231926,951.260179,354.618629,300.899236,9.692387,10.875
std,340.430511,4422.759924,1210.533348,1085.048329,45.777897,27.530358
min,-209.0,-2461.0,0.0,-766.0,0.0,-9.0
25%,0.0,0.0,0.0,0.0,0.0,0.0
50%,3.0,1.0,8.0,9.0,0.0,0.0
75%,30.0,79.0,156.0,103.0,2.0,0.0
max,7210.0,54771.0,19906.0,11656.0,611.0,134.0
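
When only a few statistics are needed, `agg` can compute them in one call instead of running `mean`, `max` and `sum` separately. A minimal sketch on toy data (the numbers are made up); the result has two-level column labels, so a single cell is addressed with a tuple:

```python
import pandas as pd

df = pd.DataFrame({
    'continent': ['Africa', 'Africa', 'Asia', 'Asia'],
    'cases': [10, 30, 5, 15],
    'deaths': [1, 3, 0, 2],
})

summary = df.groupby('continent')[['cases', 'deaths']].agg(['sum', 'mean', 'max'])
africa_total = summary.loc['Africa', ('cases', 'sum')]   # one cell of the summary
print(summary)
```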


In [80]:
covid.head()

Unnamed: 0,dateRep,day,month,year,cases,deaths,countriesAndTerritories,geoId,countryterritoryCode,popData2019,continentExp
0,2020-07-01,1,7,2020,279,13,Afghanistan,AF,AFG,38041757.0,Asia
1,2020-06-30,30,6,2020,271,12,Afghanistan,AF,AFG,38041757.0,Asia
2,2020-06-29,29,6,2020,351,18,Afghanistan,AF,AFG,38041757.0,Asia
3,2020-06-28,28,6,2020,165,20,Afghanistan,AF,AFG,38041757.0,Asia
4,2020-06-27,27,6,2020,276,8,Afghanistan,AF,AFG,38041757.0,Asia


**Questions**
1. Number of rows and columns in the data

In [81]:
covid.shape

(26982, 11)

2. Number of columns in the data

In [82]:
# len(covid.columns)
covid.shape[1]

11

3. Number of countries reported on in the data

In [83]:
covid["countriesAndTerritories"].nunique()

210

4. Specify the countries captured in the data

In [84]:
covid["countriesAndTerritories"].unique()

array(['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola',
       'Anguilla', 'Antigua_and_Barbuda', 'Argentina', 'Armenia', 'Aruba',
       'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain',
       'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin',
       'Bermuda', 'Bhutan', 'Bolivia',
       'Bonaire, Saint Eustatius and Saba', 'Bosnia_and_Herzegovina',
       'Botswana', 'Brazil', 'British_Virgin_Islands',
       'Brunei_Darussalam', 'Bulgaria', 'Burkina_Faso', 'Burundi',
       'Cambodia', 'Cameroon', 'Canada', 'Cape_Verde',
       'Cases_on_an_international_conveyance_Japan', 'Cayman_Islands',
       'Central_African_Republic', 'Chad', 'Chile', 'China', 'Colombia',
       'Comoros', 'Congo', 'Costa_Rica', 'Cote_dIvoire', 'Croatia',
       'Cuba', 'Curaçao', 'Cyprus', 'Czechia',
       'Democratic_Republic_of_the_Congo', 'Denmark', 'Djibouti',
       'Dominica', 'Dominican_Republic', 'Ecuador', 'Egypt',
       'El_Salvador', 'Equatorial_Guinea', 'Eri

5. Number of continents in the data

In [85]:
covid["continentExp"].nunique()

6

6. Which country or countries are included in the continent named 'Other' in the data?

In [86]:
covid[covid["continentExp"] == "Other"]["countriesAndTerritories"].unique()

array(['Cases_on_an_international_conveyance_Japan'], dtype=object)

7. How many African countries are in the data?

In [87]:
covid[covid["continentExp"] ==  "Africa"]["countriesAndTerritories"].nunique()

55

8. Which African country had the highest total cases in the month of June?

In [88]:
covid[covid["continentExp"] ==  "Africa"][['cases', "countriesAndTerritories"]]

Unnamed: 0,cases,countriesAndTerritories
289,336,Algeria
290,298,Algeria
291,305,Algeria
292,283,Algeria
293,240,Algeria
...,...,...
26977,0,Zimbabwe
26978,0,Zimbabwe
26979,0,Zimbabwe
26980,1,Zimbabwe


In [89]:
covid[covid["cases"] == covid["cases"].max()]

Unnamed: 0,dateRep,day,month,year,cases,deaths,countriesAndTerritories,geoId,countryterritoryCode,popData2019,continentExp
3520,2020-06-20,20,6,2020,54771,1206,Brazil,BR,BRA,211049519.0,America


In [90]:
# Keep rows that are in June (month 6) AND in Africa, then total the cases per country
african = covid[(covid["month"] == 6) & (covid["continentExp"] == "Africa")]
af1 = african.groupby("countriesAndTerritories")["cases"].sum()
af1.idxmax()

'South_Africa'


9. When did Uganda register the first covid case?

In [97]:
# Combine both conditions in one mask; chaining two boolean filters triggers a reindexing warning
uganda = covid[(covid["countriesAndTerritories"] == "Uganda") & (covid["cases"] != 0)]
print(uganda.sort_values("dateRep")["dateRep"].iloc[0])

2020-03-22 00:00:00


10. How many countries have recorded zero deaths to date?

In [98]:
sum(covid.groupby( "countriesAndTerritories" )["deaths"].sum()==0)

27

## Exercise

In [99]:
covid.head()

Unnamed: 0,dateRep,day,month,year,cases,deaths,countriesAndTerritories,geoId,countryterritoryCode,popData2019,continentExp
0,2020-07-01,1,7,2020,279,13,Afghanistan,AF,AFG,38041757.0,Asia
1,2020-06-30,30,6,2020,271,12,Afghanistan,AF,AFG,38041757.0,Asia
2,2020-06-29,29,6,2020,351,18,Afghanistan,AF,AFG,38041757.0,Asia
3,2020-06-28,28,6,2020,165,20,Afghanistan,AF,AFG,38041757.0,Asia
4,2020-06-27,27,6,2020,276,8,Afghanistan,AF,AFG,38041757.0,Asia


11. Which continent had the highest number of deaths in March?

In [107]:
march = covid[covid["month"] == 3 ]
march
# highest_death = covid.groupby("continentExp")["deaths"].sum().idxmax()
# highest_death

Unnamed: 0,dateRep,day,month,year,cases,deaths,countriesAndTerritories,geoId,countryterritoryCode,popData2019,continentExp
92,2020-03-31,31,3,2020,27,0,Afghanistan,AF,AFG,38041757.0,Asia
93,2020-03-30,30,3,2020,8,1,Afghanistan,AF,AFG,38041757.0,Asia
94,2020-03-29,29,3,2020,15,1,Afghanistan,AF,AFG,38041757.0,Asia
95,2020-03-28,28,3,2020,16,1,Afghanistan,AF,AFG,38041757.0,Asia
96,2020-03-27,27,3,2020,0,0,Afghanistan,AF,AFG,38041757.0,Asia
...,...,...,...,...,...,...,...,...,...,...,...
26977,2020-03-25,25,3,2020,0,0,Zimbabwe,ZW,ZWE,14645473.0,Africa
26978,2020-03-24,24,3,2020,0,1,Zimbabwe,ZW,ZWE,14645473.0,Africa
26979,2020-03-23,23,3,2020,0,0,Zimbabwe,ZW,ZWE,14645473.0,Africa
26980,2020-03-22,22,3,2020,1,0,Zimbabwe,ZW,ZWE,14645473.0,Africa


In [109]:
highest_death = march.groupby("continentExp")["deaths"].sum().idxmax()
highest_death

'Europe'

12. Which African country registered the first case?

In [120]:
african = covid[covid["continentExp"] == "Africa"]
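# Earliest non-zero report among African rows; select the column by name rather than by position
african[african["cases"] != 0].sort_values("dateRep")["countriesAndTerritories"].iloc[0]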

'Egypt'

13. In which month are the most cases reported?

In [121]:
# Month of the single largest daily report (not necessarily the month with the largest total)
covid[covid["cases"] == covid["cases"].max()]["month"].iloc[0]

6
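
"Most cases in a month" can mean the month containing the biggest single daily report or the month with the biggest total, and the two can disagree. The toy data below is invented to make them differ; on the real dataset they may well coincide.

```python
import pandas as pd

df = pd.DataFrame({
    'month': [5, 5, 5, 6, 6],
    'cases': [40, 40, 40, 90, 10],
})

# Month containing the single largest daily report
month_of_peak_day = df.loc[df['cases'].idxmax(), 'month']

# Month with the largest total across all its reports
month_of_peak_total = df.groupby('month')['cases'].sum().idxmax()

print(month_of_peak_day, month_of_peak_total)
```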

14. Is there a relation between the cases in America and the cases in Asia?

In [122]:
# Total cases per continent; two totals alone do not measure a correlation
correlation = covid[covid["continentExp"].isin(["America", "Asia"])]
correlation.groupby("continentExp")[["cases"]].sum().transpose()

continentExp,America,Asia
cases,5327057,2261403
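
Two totals cannot show a relationship; a correlation needs paired observations. One possible approach, sketched on invented numbers, is to build one daily series per continent and correlate them. The column names follow the dataset, while `daily` and `r` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'dateRep': pd.to_datetime(['2020-03-01', '2020-03-01', '2020-03-02',
                               '2020-03-02', '2020-03-03', '2020-03-03']),
    'continentExp': ['America', 'Asia'] * 3,
    'cases': [1, 10, 2, 20, 3, 30],
})

# One row per date, one column per continent, values = total cases that day
daily = df.pivot_table(index='dateRep', columns='continentExp',
                       values='cases', aggfunc='sum')

# Pearson correlation between the two daily series
r = daily['America'].corr(daily['Asia'])
print(daily)
print(r)
```

A high coefficient here would only say that the two daily series move together; it says nothing about one causing the other.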


15. Which European country has the highest frequency in number of deaths reported?

In [134]:
# Total deaths per European country; idxmax returns the country with the largest total
eurodeaths = covid[covid["continentExp"] == "Europe"]
eurodeaths.groupby("countriesAndTerritories")["deaths"].sum().idxmax()

'United_Kingdom'

16. What is the name of the country with the lowest daily reported cases in the month of March?

In [142]:
# Mean daily cases per country in March; idxmin returns the country with the lowest mean
marchcases = covid[covid["month"] == 3]
marchcases.groupby("countriesAndTerritories")["cases"].mean().idxmin()

'Cases_on_an_international_conveyance_Japan'

17. On average, how many cases were registered in Asia and Africa from March to June?

In [159]:
as_af = covid[(covid["continentExp"] == "Asia") | (covid["continentExp"] == "Africa")]
round(as_af[(as_af["month"] >= 3) & (as_af["month"] <= 6)]["cases"].mean())

242

18. Based on the African countries represented in the data, introduce a new column called 'Region' and answer the questions below. [Link to the Regions](https://en.wikipedia.org/wiki/List_of_regions_of_Africa)

In [170]:
# Country names must match the spelling used in countriesAndTerritories
north_A = "Algeria Canary_Islands Ceuta Egypt Libya Madeira Melilla Morocco Sudan Tunisia Western_Sahara".split()
east_A = """Burundi Comoros Djibouti Eritrea Ethiopia French_Southern_Territories Kenya Malawi Mauritius Mayotte 
            Mozambique Reunion Rwanda Seychelles Somalia South_Sudan Tanzania Uganda Zambia Zimbabwe""".split()
mid_A = """Angola Cameroon Central_African_Republic Chad Congo 
            Democratic_Republic_of_the_Congo Equatorial_Guinea Gabon São_Tomé_and_Príncipe""".split()
south_A = "Botswana Eswatini Lesotho Madagascar Namibia South_Africa".split()
west_A = """Benin Burkina_Faso Cape_Verde Cote_dIvoire Gambia Ghana Guinea Guinea_Bissau 
            Liberia Mali Mauritania Niger Nigeria Saint_Helena Senegal Sierra_Leone Togo""".split()

In [174]:
african

Unnamed: 0,dateRep,day,month,year,cases,deaths,countriesAndTerritories,geoId,countryterritoryCode,popData2019,continentExp
289,2020-07-01,1,7,2020,336,7,Algeria,DZ,DZA,43053054.0,Africa
290,2020-06-30,30,6,2020,298,8,Algeria,DZ,DZA,43053054.0,Africa
291,2020-06-29,29,6,2020,305,5,Algeria,DZ,DZA,43053054.0,Africa
292,2020-06-28,28,6,2020,283,7,Algeria,DZ,DZA,43053054.0,Africa
293,2020-06-27,27,6,2020,240,7,Algeria,DZ,DZA,43053054.0,Africa
...,...,...,...,...,...,...,...,...,...,...,...
26977,2020-03-25,25,3,2020,0,0,Zimbabwe,ZW,ZWE,14645473.0,Africa
26978,2020-03-24,24,3,2020,0,1,Zimbabwe,ZW,ZWE,14645473.0,Africa
26979,2020-03-23,23,3,2020,0,0,Zimbabwe,ZW,ZWE,14645473.0,Africa
26980,2020-03-22,22,3,2020,1,0,Zimbabwe,ZW,ZWE,14645473.0,Africa


In [176]:
def add(x):
    """Return the region label for a country name, or None if it is not listed."""
    if x in north_A:
        return "north_A"
    elif x in east_A:
        return "east_A"
    elif x in mid_A:
        return "mid_A"
    elif x in south_A:
        return "south_A"
    elif x in west_A:
        return "west_A"

# Work on an explicit copy so the new column is not set on a slice of covid
african = african.copy()
african["Region"] = african["countriesAndTerritories"].apply(add)
african.head()


Unnamed: 0,dateRep,day,month,year,cases,deaths,countriesAndTerritories,geoId,countryterritoryCode,popData2019,continentExp,Region
289,2020-07-01,1,7,2020,336,7,Algeria,DZ,DZA,43053054.0,Africa,north_A
290,2020-06-30,30,6,2020,298,8,Algeria,DZ,DZA,43053054.0,Africa,north_A
291,2020-06-29,29,6,2020,305,5,Algeria,DZ,DZA,43053054.0,Africa,north_A
292,2020-06-28,28,6,2020,283,7,Algeria,DZ,DZA,43053054.0,Africa,north_A
293,2020-06-27,27,6,2020,240,7,Algeria,DZ,DZA,43053054.0,Africa,north_A
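
An alternative to the `if`/`elif` chain is a dictionary lookup with `Series.map`, which also makes unmatched names easy to spot. This is a sketch under stated assumptions: the region lists are shortened for illustration, `Atlantis` is a made-up name standing in for a country missing from every list, and `country_to_region` and `unmatched` are illustrative names.

```python
import pandas as pd

# Short, illustrative lists; the full lists are the ones defined earlier
regions = {
    'north_A': ['Algeria', 'Egypt'],
    'east_A': ['Uganda', 'Kenya'],
}
# Invert to country -> region for direct lookup
country_to_region = {c: r for r, cs in regions.items() for c in cs}

df = pd.DataFrame({'countriesAndTerritories': ['Algeria', 'Uganda', 'Atlantis']})
df = df.assign(Region=df['countriesAndTerritories'].map(country_to_region))

# Names that matched no region come back as NaN
unmatched = df.loc[df['Region'].isna(), 'countriesAndTerritories'].tolist()
print(df)
print(unmatched)
```

Listing the unmatched names is a quick way to catch spelling differences between the region lists and the dataset.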


19. Which region has had the highest number of registered cases over time?

In [231]:

african.groupby("Region")["cases"].sum().idxmax()

'south_A'

20. Which region has registered a drop in the number of registered cases over time?

In [237]:
# Region with the lowest mean daily cases; a proxy only, since a mean does not show change over time
african.groupby("Region")["cases"].mean().idxmin()

'east_A'
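
A mean over the whole period cannot show whether cases went up or down. One way to look at the direction of change, sketched on invented numbers, is to total cases per region and month and compare the last month with the first. The choice of months 4 and 6 and the names `monthly`, `change` and `dropping` are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['east_A'] * 3 + ['south_A'] * 3,
    'month': [4, 5, 6] * 2,
    'cases': [100, 80, 60, 50, 200, 900],
})

# Rows = regions, columns = months, values = total cases
monthly = df.groupby(['Region', 'month'])['cases'].sum().unstack('month')

# Negative change means fewer cases in month 6 than in month 4
change = monthly[6] - monthly[4]
dropping = change[change < 0].index.tolist()

print(monthly)
print(dropping)
```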