# Check countries listed

The FAO data contains 207 countries and regions. We need to filter out the regions so that only countries remain, and then check which countries of the world are missing from the data.

## Get FAO dataset

Specifically, get the list of areas (countries and regions) it contains.

In [23]:
# Get dataset we're using
import pandas as pd
import pycountry

df = pd.read_csv('../DATA/FoodBalanceSheets_E_All_Data/FoodBalanceSheets_E_All_Data.csv',
                 encoding='iso-8859-1')
dfcountries = df[['Area Code', 'Area']].drop_duplicates()
dfcountries

Unnamed: 0,Area Code,Area
0,2,Afghanistan
1720,3,Albania
2993,4,Algeria
4383,7,Angola
6237,8,Antigua and Barbuda
7588,9,Argentina
8952,1,Armenia
10245,10,Australia
11669,11,Austria
13057,52,Azerbaijan


## Get other country lists, to compare with this

* FAO's own list of countries was pulled into a Google Sheet with the formula `=importhtml("http://www.fao.org/countryprofiles/iso3list/en/", "table", 1)`
* The UN's master list (M49), including FAO codes, came from https://unstats.un.org/unsd/methodology/m49/overview/
* The ISO 3166 code list was copy-pasted from https://www.iso.org/obp/ui/#search/code/


In [31]:
faos = pd.read_csv('FAO countries list.csv')
faos

Unnamed: 0,Short name,Official name,ISO3,ISO2,UNI,UNDP,FAOSTAT,GAUL
0,Afghanistan,the Islamic Republic of Afghanistan,AFG,AF,4,AFG,2,1
1,Albania,the Republic of Albania,ALB,AL,8,ALB,3,3
2,Algeria,the People's Democratic Republic of Algeria,DZA,DZ,12,DZA,4,4
3,Andorra,the Principality of Andorra,AND,AD,20,AND,6,7
4,Angola,the Republic of Angola,AGO,AO,24,AGO,7,8
5,Antigua and Barbuda,Antigua and Barbuda,ATG,AG,28,ATG,8,11
6,Argentina,the Argentine Republic,ARG,AR,32,ARG,9,12
7,Armenia,the Republic of Armenia,ARM,AM,51,ARM,1,13
8,Australia,Australia,AUS,AU,36,AUS,10,17
9,Austria,the Republic of Austria,AUT,AT,40,AUT,11,18


In [30]:
un49s = pd.read_csv('UNSD_M49_countries_list.csv')
un49s

Unnamed: 0,Global Code,Global Name,Region Code,Region Name,Sub-region Code,Sub-region Name,Intermediate Region Code,Intermediate Region Name,Country or Area,M49 Code,ISO-alpha3 Code,Least Developed Countries (LDC),Land Locked Developing Countries (LLDC),Small Island Developing States (SIDS),Developed / Developing Countries
0,1,World,2.0,Africa,15.0,Northern Africa,,,Algeria,12,DZA,,,,Developing
1,1,World,2.0,Africa,15.0,Northern Africa,,,Egypt,818,EGY,,,,Developing
2,1,World,2.0,Africa,15.0,Northern Africa,,,Libya,434,LBY,,,,Developing
3,1,World,2.0,Africa,15.0,Northern Africa,,,Morocco,504,MAR,,,,Developing
4,1,World,2.0,Africa,15.0,Northern Africa,,,Sudan,729,SDN,x,,,Developing
5,1,World,2.0,Africa,15.0,Northern Africa,,,Tunisia,788,TUN,,,,Developing
6,1,World,2.0,Africa,15.0,Northern Africa,,,Western Sahara,732,ESH,,,,Developing
7,1,World,2.0,Africa,202.0,Sub-Saharan Africa,14.0,Eastern Africa,British Indian Ocean Territory,86,IOT,,,,Developing
8,1,World,2.0,Africa,202.0,Sub-Saharan Africa,14.0,Eastern Africa,Burundi,108,BDI,x,x,,Developing
9,1,World,2.0,Africa,202.0,Sub-Saharan Africa,14.0,Eastern Africa,Comoros,174,COM,x,,x,Developing


In [28]:
# Build the ISO 3166-1 alpha-3 list from pycountry
iso3s = pd.DataFrame([[x.alpha_3, x.name] for x in pycountry.countries], columns=['ISO3', 'name'])
iso3s

Unnamed: 0,ISO3,name
0,ABW,Aruba
1,AFG,Afghanistan
2,AGO,Angola
3,AIA,Anguilla
4,ALA,Åland Islands
5,ALB,Albania
6,AND,Andorra
7,ARE,United Arab Emirates
8,ARG,Argentina
9,ARM,Armenia


## Look at what's missing from each list

* An FAO area code of 5000 or above denotes an aggregate region, not a country
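
A minimal sketch of that rule, treating codes of 5000 and above as regions, which matches the `< 5000` filter used in the comparison cell further down. The `REGION_CODE_START` constant and the `split_countries_regions` helper are names introduced here for illustration, and the sample rows are hand-made rather than read from the FAO file.

```python
import pandas as pd

# Assumption taken from the note above: codes from here upwards are aggregates
REGION_CODE_START = 5000

def split_countries_regions(areas, code_col='Area Code'):
    """Return (countries, regions) by splitting on the area-code threshold."""
    is_region = areas[code_col] >= REGION_CODE_START
    return areas[~is_region], areas[is_region]

# Illustrative rows only; not read from the FAO file
sample = pd.DataFrame({
    'Area Code': [2, 3, 5000, 5100],
    'Area': ['Afghanistan', 'Albania', 'World', 'Africa'],
})
countries, regions = split_countries_regions(sample)
print(countries['Area'].tolist())
print(regions['Area'].tolist())
```

Returning both halves keeps the regions available for a later sanity check instead of silently discarding them.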



In [16]:
len(iso3s)  # number of ISO 3166-1 entries from pycountry

249

In [49]:
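# Code sets from each source; FAO area codes below 5000 are countries
fdata = set(dfcountries[dfcountries['Area Code'] < 5000]['Area Code'].to_list())  # FAOSTAT codes in the data
ifao = set(faos['ISO3'])                    # ISO3 codes in FAO's country list
ffao = set(faos['FAOSTAT'])                 # FAOSTAT codes in FAO's country list
i49  = set(un49s['ISO-alpha3 Code'].to_list())  # ISO3 codes in the UN M49 list
iiso = set(iso3s['ISO3'].to_list())         # ISO3 codes from pycountry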

print('''
Numbers of countries in codes:
Data: {} 
FAOcodes: {} 
UN49: {} 
ISO3: {}

Number of regions in data: {}'''.format(len(fdata), len(ifao), len(i49), len(iiso), len(dfcountries) - len(fdata)))



Numbers of countries in codes:
Data: 175 
FAOcodes: 196 
UN49: 249 
ISO3: 249

Number of regions in data: 34
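
The two-way comparisons that follow all come down to set differences. As a rough sketch, a small helper can report both directions at once; `compare_codes` is a hypothetical name, and the codes below are made up for illustration rather than taken from the real lists.

```python
def compare_codes(left, right):
    """Return codes only in `left`, only in `right`, and in both, each sorted."""
    left, right = set(left), set(right)
    return {
        'only_left': sorted(left - right),
        'only_right': sorted(right - left),
        'both': sorted(left & right),
    }

# Made-up FAOSTAT-style codes, not taken from the real lists
data_codes = [2, 3, 4, 351]
fao_list_codes = [2, 3, 4, 6, 41]
result = compare_codes(data_codes, fao_list_codes)
print(result)
```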


In [42]:
# Countries in the dataset but not in FAO's country list (matched on FAOSTAT code)
dfcountries[dfcountries['Area Code'].isin(fdata - ffao)]

Unnamed: 0,Area Code,Area
24438,17,Bermuda
45381,351,China
47307,96,"China, Hong Kong SAR"
48785,128,"China, Macao SAR"
52045,214,"China, Taiwan Province of"
86668,70,French Polynesia
165382,153,New Caledonia


In [45]:
# Countries in FAOSTAT list but not dataset
faos[faos['FAOSTAT'].isin(ffao - fdata)].sort_values('FAOSTAT')

Unnamed: 0,Short name,Official name,ISO3,ISO2,UNI,UNDP,FAOSTAT,GAUL
3,Andorra,the Principality of Andorra,AND,AD,20,AND,6,7
12,Bahrain,the Kingdom of Bahrain,BHR,BH,48,BHR,13,21
19,Bhutan,the Kingdom of Bhutan,BTN,BT,64,BTN,18,31
27,Burundi,the Republic of Burundi,BDI,BI,108,BDI,29,43
37,Comoros,the Union of the Comoros,COM,KM,174,COM,45,58
39,Cook Islands,the Cook Islands,COK,CK,184,COK,47,60
55,Equatorial Guinea,the Republic of Equatorial Guinea,GNQ,GQ,226,GNQ,61,76
60,Faroe Islands,Faroe Islands,FRO,FO,234,FRO,64,82
99,Libya,State of Libya,LBY,LY,434,LBY,124,145
108,Marshall Islands,the Republic of the Marshall Islands,MHL,MH,584,MHL,127,157


In [46]:
# China has code 351 in the dataset; check how it is coded in FAO's country list
faos[faos['Official name'].str.contains('China')]

Unnamed: 0,Short name,Official name,ISO3,ISO2,UNI,UNDP,FAOSTAT,GAUL
35,China,the People's Republic of China,CHN,CN,156,CHN,41,53


In [54]:
# Countries in the UN M49 list but not in FAO's country list (matched on ISO3 code)
un49s[un49s['ISO-alpha3 Code'].isin(i49 - ifao)][['Country or Area', 'ISO-alpha3 Code']]

Unnamed: 0,Country or Area,ISO-alpha3 Code
6,Western Sahara,ESH
7,British Indian Ocean Territory,IOT
13,French Southern Territories,ATF
18,Mayotte,MYT
20,Réunion,REU
56,Saint Helena,SHN
60,Anguilla,AIA
62,Aruba,ABW
65,"Bonaire, Sint Eustatius and Saba",BES
66,British Virgin Islands,VGB
