### Sorting and Filtering

In [1]:
import pandas as pd
import numpy as np

We have already seen that we can obtain boolean masks from `Series` and `DataFrame` objects using the `isnull()` and `notnull()` methods.

Let's load up the `populations.csv` file we've studied before:

In [2]:
df = pd.read_csv('populations.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52 entries, 0 to 51
Data columns (total 4 columns):
 #   Column                                   Non-Null Count  Dtype 
---  ------                                   --------------  ----- 
 0   Geographic Area                          52 non-null     object
 1   July 1, 2001 Estimate                    52 non-null     int64 
 2   July 1, 2000 Estimate                    52 non-null     int64 
 3   April 1, 2000 Population Estimates Base  52 non-null     int64 
dtypes: int64(3), object(1)
memory usage: 1.8+ KB


In [3]:
df[:5]

Unnamed: 0,Geographic Area,"July 1, 2001 Estimate","July 1, 2000 Estimate","April 1, 2000 Population Estimates Base"
0,United States,284796887,282124631,281421906
1,Alabama,4464356,4451493,4447100
2,Alaska,634892,627601,626932
3,Arizona,5307331,5165274,5130632
4,Arkansas,2692090,2678030,2673400


We're going to make `Geographic Area` the index:

In [4]:
data = df.set_index('Geographic Area')
data[:5]

Geographic Area,"July 1, 2001 Estimate","July 1, 2000 Estimate","April 1, 2000 Population Estimates Base"
United States,284796887,282124631,281421906
Alabama,4464356,4451493,4447100
Alaska,634892,627601,626932
Arizona,5307331,5165274,5130632
Arkansas,2692090,2678030,2673400


We can generate boolean masks the same way as before (and basically the same way we generated masks for NumPy arrays).

Let's say we are only interested in records where the `July 1, 2001 Estimate` values are less than `3_000_000`:

In [5]:
mask = data['July 1, 2001 Estimate'] < 3_000_000
mask

Geographic Area
United States           False
Alabama                 False
Alaska                   True
Arizona                 False
Arkansas                 True
California              False
Colorado                False
Connecticut             False
Delaware                 True
District of Columbia     True
Florida                 False
Georgia                 False
Hawaii                   True
Idaho                    True
Illinois                False
Indiana                 False
Iowa                     True
Kansas                   True
Kentucky                False
Louisiana               False
Maine                    True
Maryland                False
Massachusetts           False
Michigan                False
Minnesota               False
Mississippi              True
Missouri                False
Montana                  True
Nebraska                 True
Nevada                   True
New Hampshire            True
New Jersey              False
New Mexico              

Alternatively, we could have used the positional index:

In [6]:
data.iloc[:, 0] < 3_000_000

Geographic Area
United States           False
Alabama                 False
Alaska                   True
Arizona                 False
Arkansas                 True
California              False
Colorado                False
Connecticut             False
Delaware                 True
District of Columbia     True
Florida                 False
Georgia                 False
Hawaii                   True
Idaho                    True
Illinois                False
Indiana                 False
Iowa                     True
Kansas                   True
Kentucky                False
Louisiana               False
Maine                    True
Maryland                False
Massachusetts           False
Michigan                False
Minnesota               False
Mississippi              True
Missouri                False
Montana                  True
Nebraska                 True
Nevada                   True
New Hampshire            True
New Jersey              False
New Mexico              

Now that we have a mask, we can use it to filter the rows in the dataset:

In [7]:
data[mask]

Geographic Area,"July 1, 2001 Estimate","July 1, 2000 Estimate","April 1, 2000 Population Estimates Base"
Alaska,634892,627601,626932
Arkansas,2692090,2678030,2673400
Delaware,796165,786234,783600
District of Columbia,571822,571066,572059
Hawaii,1224398,1212281,1211537
Idaho,1321006,1299258,1293953
Iowa,2923179,2927509,2926324
Kansas,2694641,2691750,2688418
Maine,1286670,1276961,1274923
Mississippi,2858029,2849100,2844658
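
To make these steps easy to reproduce without the CSV file, here is a minimal sketch that uses a small hand-built frame. The rows are copied from the output above, and the name `small` is only a stand-in for the full `data` frame. It builds the mask both by column label and by column position, then uses it to filter the rows:

```python
import pandas as pd

# A few rows copied from the output above; `small` is a hypothetical
# stand-in for the full `data` frame.
small = pd.DataFrame(
    {
        'July 1, 2001 Estimate': [4464356, 634892, 5307331, 2692090],
        'July 1, 2000 Estimate': [4451493, 627601, 5165274, 2678030],
    },
    index=pd.Index(
        ['Alabama', 'Alaska', 'Arizona', 'Arkansas'],
        name='Geographic Area',
    ),
)

# Build the mask by column label and by column position.
mask_by_label = small['July 1, 2001 Estimate'] < 3_000_000
mask_by_position = small.iloc[:, 0] < 3_000_000

# Both approaches produce the same boolean Series.
same_mask = mask_by_label.equals(mask_by_position)

# Indexing with the mask keeps only the rows where the mask is True.
filtered = small[mask_by_label]

print(same_mask)
print(filtered)
```

Only the Alaska and Arkansas rows survive the filter, since those are the only two values under three million in this small sample.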


We have seen how to sort a series by its index before. Let's quickly recall how that worked:

In [8]:
s = pd.Series([10, 20, 30, 40], index=['Z', 'y', 'x', 'w'])
s

Z    10
y    20
x    30
w    40
dtype: int64

We can then sort by that index using the `sort_index()` method:

In [9]:
s.sort_index()

Z    10
w    40
x    30
y    20
dtype: int64

This sorted the series using the standard sort order for strings (our index consists of strings), in which uppercase letters sort before lowercase ones. That is why `Z` sorted before `w`:

In [10]:
'Z' < 'w'

True
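
The reason behind this comparison is that Python compares strings by Unicode code point. A short sketch using only built-ins (no Pandas involved) shows the code points and the resulting default order:

```python
# Python compares strings by Unicode code point, and every uppercase
# ASCII letter has a smaller code point than every lowercase one.
code_points = (ord('Z'), ord('w'))
print(code_points)  # (90, 119)

# The built-in sorted() therefore places 'Z' first as well.
default_order = sorted(['Z', 'y', 'x', 'w'])
print(default_order)  # ['Z', 'w', 'x', 'y']
```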

If you think back to the `sorted()` function in Python, we could define a `key` function to sort by - the `sort_index` method also supports this same argument.

But there is a difference - whereas the `key` argument for the `sorted` function expects a function that receives one element of the iterable at a time, the `key` argument for the `sort_index()` method expects a vectorized function - i.e. one that is applied to the entire index at once, and whose result is then used as the sort key.

Pandas has a vectorized version of the `casefold()` string method - it is available through the `.str` accessor on both `Series` and `Index` objects:

In [11]:
s.index

Index(['Z', 'y', 'x', 'w'], dtype='object')

In [12]:
s.index.str.casefold()

Index(['z', 'y', 'x', 'w'], dtype='object')

There are other string methods too, such as `len`, `upper`, `lower` etc:

In [13]:
s.index.str.upper()

Index(['Z', 'Y', 'X', 'W'], dtype='object')

In [14]:
s.index.str.len()

Int64Index([1, 1, 1, 1], dtype='int64')

So, the `key` argument to `sort_index()` expects a function that will receive the index as its argument, and return a new index of the same length - precisely what `.str.casefold()` does.

In [15]:
s.sort_index(key=lambda ind: ind.str.casefold())

w    40
x    30
y    20
Z    10
dtype: int64

Notice that the series was sorted using the casefolded version of the values in the index, but the index itself was not modified - we still see `Z` in the index values, but this time sorted at the bottom of the list.
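
To make the contrast with `sorted()` concrete, here is a small sketch that runs both side by side. The helper `casefold_key` and the list `seen` are names made up for this illustration; `seen` simply records the type of whatever Pandas passes to the key function:

```python
import pandas as pd

labels = ['Z', 'y', 'x', 'w']
s = pd.Series([10, 20, 30, 40], index=labels)

# sorted(): the key function is called with one element at a time.
element_wise = sorted(labels, key=str.casefold)

# sort_index(): the key function is called with the whole Index and
# must return an index-like object of the same length.
seen = []  # records the type name of each argument passed to the key


def casefold_key(ind):
    seen.append(type(ind).__name__)
    return ind.str.casefold()


vectorized = s.sort_index(key=casefold_key)

print(element_wise)
print(list(vectorized.index))
print(seen)
```

Both approaches end up with the same ordering of labels, but the Pandas key function only ever sees an `Index` object, never an individual string.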

Let's look at another example, where we want to sort some numerical values by their absolute value:

In [16]:
s = pd.Series(list('abcdef'), index=[-1, -3, -5, 0, 2, 4])
s

-1    a
-3    b
-5    c
 0    d
 2    e
 4    f
dtype: object

In [17]:
s.sort_index(key=lambda ind: np.abs(ind))

 0    d
-1    a
 2    e
-3    b
 4    f
-5    c
dtype: object

Similarly, we can sort a `DataFrame` by its index.

In [18]:
data[:5]

Geographic Area,"July 1, 2001 Estimate","July 1, 2000 Estimate","April 1, 2000 Population Estimates Base"
United States,284796887,282124631,281421906
Alabama,4464356,4451493,4447100
Alaska,634892,627601,626932
Arizona,5307331,5165274,5130632
Arkansas,2692090,2678030,2673400


This dataset is already almost sorted alphabetically; only the first row (which is really a total row) is out of place.

We can sort the dataframe in string order the same way as the series:

In [19]:
data.sort_index(key=lambda ind: ind.str.casefold())

Geographic Area,"July 1, 2001 Estimate","July 1, 2000 Estimate","April 1, 2000 Population Estimates Base"
Alabama,4464356,4451493,4447100
Alaska,634892,627601,626932
Arizona,5307331,5165274,5130632
Arkansas,2692090,2678030,2673400
California,34501130,34000446,33871648
Colorado,4417714,4323410,4301261
Connecticut,3425074,3410079,3405565
Delaware,796165,786234,783600
District of Columbia,571822,571066,572059
Florida,16396515,16054328,15982378


Of course, we could sort using different key functions too, for example by the length of the string:

In [20]:
data.sort_index(key=lambda ind: ind.str.len())

Geographic Area,"July 1, 2001 Estimate","July 1, 2000 Estimate","April 1, 2000 Population Estimates Base"
Ohio,11373541,11359955,11353140
Utah,2269789,2241555,2233169
Iowa,2923179,2927509,2926324
Texas,21325018,20946503,20851820
Maine,1286670,1276961,1274923
Idaho,1321006,1299258,1293953
Oregon,3472867,3429293,3421399
Alaska,634892,627601,626932
Nevada,2106074,2018723,1998257
Kansas,2694641,2691750,2688418


We can also specify whether the sort should be ascending (the default), or descending:

In [21]:
data.sort_index(key=lambda ind: ind.str.casefold(), ascending=False)

Geographic Area,"July 1, 2001 Estimate","July 1, 2000 Estimate","April 1, 2000 Population Estimates Base"
Wyoming,494423,494001,493782
Wisconsin,5401906,5372243,5363675
West Virginia,1801916,1807099,1808344
Washington,5987973,5908372,5894121
Virginia,7187734,7104016,7078515
Vermont,613090,609709,608827
Utah,2269789,2241555,2233169
United States,284796887,282124631,281421906
Texas,21325018,20946503,20851820
Tennessee,5740021,5702027,5689283


We can also sort by values, not just by index.

For that we can use the `.sort_values()` method - but of course we have multiple columns we could sort by, so we can specify which one to use:

In [22]:
data.sort_values('July 1, 2001 Estimate')

Geographic Area,"July 1, 2001 Estimate","July 1, 2000 Estimate","April 1, 2000 Population Estimates Base"
Wyoming,494423,494001,493782
District of Columbia,571822,571066,572059
Vermont,613090,609709,608827
North Dakota,634448,640919,642200
Alaska,634892,627601,626932
South Dakota,756600,755509,754844
Delaware,796165,786234,783600
Montana,904433,903157,902195
Rhode Island,1058920,1050236,1048319
Hawaii,1224398,1212281,1211537


By default the sort order is ascending, but we can also specify a descending order:

In [23]:
data.sort_values('July 1, 2001 Estimate', ascending=False)

Geographic Area,"July 1, 2001 Estimate","July 1, 2000 Estimate","April 1, 2000 Population Estimates Base"
United States,284796887,282124631,281421906
California,34501130,34000446,33871648
Texas,21325018,20946503,20851820
New York,19011378,18989332,18976457
Florida,16396515,16054328,15982378
Illinois,12482301,12435970,12419293
Pennsylvania,12287150,12282591,12281054
Ohio,11373541,11359955,11353140
Michigan,9990817,9952006,9938444
New Jersey,8484431,8429007,8414350


Sometimes we may want to sort by more than one column at a time.

Let's load another data set to see how this works.

In [24]:
df = pd.read_csv('world_bank_countries.csv')
df[:5]

Unnamed: 0,CountryCode,ShortName,TableName,LongName,Alpha2Code,CurrencyUnit,SpecialNotes,Region,IncomeGroup,Wb2Code,...,GovernmentAccountingConcept,ImfDataDisseminationStandard,LatestPopulationCensus,LatestHouseholdSurvey,SourceOfMostRecentIncomeAndExpenditureData,VitalRegistrationComplete,LatestAgriculturalCensus,LatestIndustrialData,LatestTradeData,LatestWaterWithdrawalData
0,AFG,Afghanistan,Afghanistan,Islamic State of Afghanistan,AF,Afghan afghani,Fiscal year end: March 20; reporting period fo...,South Asia,Low income,AF,...,Consolidated central government,General Data Dissemination System (GDDS),1979,"Multiple Indicator Cluster Survey (MICS), 2010/11","Integrated household survey (IHS), 2008",,2013/14,,2013.0,2000.0
1,ALB,Albania,Albania,Republic of Albania,AL,Albanian lek,,Europe & Central Asia,Upper middle income,AL,...,Budgetary central government,General Data Dissemination System (GDDS),2011,"Demographic and Health Survey (DHS), 2008/09",Living Standards Measurement Study Survey (LSM...,Yes,2012,2011.0,2013.0,2006.0
2,DZA,Algeria,Algeria,People's Democratic Republic of Algeria,DZ,Algerian dinar,,Middle East & North Africa,Upper middle income,DZ,...,Budgetary central government,General Data Dissemination System (GDDS),2008,"Multiple Indicator Cluster Survey (MICS), 2012","Integrated household survey (IHS), 1995",,,2010.0,2013.0,2001.0
3,ASM,American Samoa,American Samoa,American Samoa,AS,U.S. dollar,,East Asia & Pacific,Upper middle income,AS,...,,,2010,,,Yes,2007,,,
4,ADO,Andorra,Andorra,Principality of Andorra,AD,Euro,,Europe & Central Asia,High income: nonOECD,AD,...,,,2011. Population data compiled from administra...,,,Yes,,,2006.0,


In this case, let's suppose we are only interested in the following columns:
- `ShortName`
- `Region`
- `CountryCode`
- `CurrencyUnit`

We could just create a new data frame with just those columns using fancy indexing:

In [25]:
data = df[['ShortName', 'Region', 'CountryCode', 'CurrencyUnit']]
data[:5]

Unnamed: 0,ShortName,Region,CountryCode,CurrencyUnit
0,Afghanistan,South Asia,AFG,Afghan afghani
1,Albania,Europe & Central Asia,ALB,Albanian lek
2,Algeria,Middle East & North Africa,DZA,Algerian dinar
3,American Samoa,East Asia & Pacific,ASM,U.S. dollar
4,Andorra,Europe & Central Asia,ADO,Euro


Or, if we had known ahead of time which columns we are interested in, we could also specify that when we load the data from the csv file:

In [26]:
df = pd.read_csv(
    'world_bank_countries.csv',
    header=0,
    usecols=[0, 1, 5, 7],
    names=['code', 'name', 'currency', 'region']
)
df

Unnamed: 0,code,name,currency,region
0,AFG,Afghanistan,Afghan afghani,South Asia
1,ALB,Albania,Albanian lek,Europe & Central Asia
2,DZA,Algeria,Algerian dinar,Middle East & North Africa
3,ASM,American Samoa,U.S. dollar,East Asia & Pacific
4,ADO,Andorra,Euro,Europe & Central Asia
...,...,...,...,...
242,WBG,West Bank and Gaza,Israeli new shekel,Middle East & North Africa
243,WLD,World,,
244,YEM,Yemen,Yemeni rial,Middle East & North Africa
245,ZMB,Zambia,New Zambian kwacha,Sub-Saharan Africa


Also, I would prefer having the columns ordered: `region, code, name, currency`.

We need to do some fancy indexing for this, but we can easily do it in the same step as loading the csv file:

In [27]:
df = pd.read_csv(
    'world_bank_countries.csv',
    header=0,
    usecols=[0, 1, 5, 7],
    names=['code', 'name', 'currency', 'region']
)[['region', 'code', 'name', 'currency']]
df

Unnamed: 0,region,code,name,currency
0,South Asia,AFG,Afghanistan,Afghan afghani
1,Europe & Central Asia,ALB,Albania,Albanian lek
2,Middle East & North Africa,DZA,Algeria,Algerian dinar
3,East Asia & Pacific,ASM,American Samoa,U.S. dollar
4,Europe & Central Asia,ADO,Andorra,Euro
...,...,...,...,...
242,Middle East & North Africa,WBG,West Bank and Gaza,Israeli new shekel
243,,WLD,World,
244,Middle East & North Africa,YEM,Yemen,Yemeni rial
245,Sub-Saharan Africa,ZMB,Zambia,New Zambian kwacha
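
If you want to experiment with `usecols`, `names` and the column reordering without the actual file, here is a minimal sketch that reads from an in-memory string instead. The CSV text and the name `df_small` are made up for this example and only mimic the layout of the real file:

```python
import io

import pandas as pd

# A made-up four-column CSV standing in for the real file.
csv_text = """CountryCode,ShortName,Alpha2Code,Region
AFG,Afghanistan,AF,South Asia
ALB,Albania,AL,Europe & Central Asia
"""

df_small = pd.read_csv(
    io.StringIO(csv_text),
    header=0,                          # row 0 is the original header, which we replace
    usecols=[0, 1, 3],                 # keep columns by position
    names=['code', 'name', 'region'],  # new labels for the kept columns
)[['region', 'code', 'name']]          # reorder using a list of labels

print(df_small)
```

Note that `header=0` is what tells Pandas to discard the original header row in favor of the labels in `names`.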


In [28]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 247 entries, 0 to 246
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   region    214 non-null    object
 1   code      247 non-null    object
 2   name      247 non-null    object
 3   currency  214 non-null    object
dtypes: object(4)
memory usage: 7.8+ KB


You'll notice that we have some data in the `region` column that is missing (214 non-null elements, but 247 entries in the data set).

So the first thing I want to do is to drop those rows from the data frame where the region is null.

Recall the `.notnull()` method:

In [29]:
df['region'].notnull()

0       True
1       True
2       True
3       True
4       True
       ...  
242     True
243    False
244     True
245     True
246     True
Name: region, Length: 247, dtype: bool

This allows us to filter out null values using a boolean mask:

In [30]:
data = df[df['region'].notnull()]
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 214 entries, 0 to 246
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   region    214 non-null    object
 1   code      214 non-null    object
 2   name      214 non-null    object
 3   currency  214 non-null    object
dtypes: object(4)
memory usage: 8.4+ KB
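
Pandas also offers the `dropna()` method, which can be restricted to specific columns with its `subset` argument. The sketch below uses a tiny made-up frame (`df_small` is a hypothetical name) to check that both approaches keep the same rows:

```python
import pandas as pd

# Made-up rows; None marks a missing region, as in the World row above.
df_small = pd.DataFrame({
    'region': ['South Asia', None, 'Sub-Saharan Africa'],
    'code': ['AFG', 'WLD', 'ZMB'],
})

# Boolean-mask approach used in the text.
by_mask = df_small[df_small['region'].notnull()]

# dropna() restricted to the region column drops the same rows.
by_dropna = df_small.dropna(subset=['region'])

print(by_mask.equals(by_dropna))
print(list(by_mask.index))
```

Either way, the surviving rows keep their original index labels, which is why the filtered frame above still reports an index running from 0 to 246 even though it only has 214 entries.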


Now we may want to sort this data set by region:

In [31]:
data.sort_values('region')

Unnamed: 0,region,code,name,currency
223,East Asia & Pacific,TON,Tonga,Tongan pa'anga
42,East Asia & Pacific,CHN,China,Chinese yuan
53,East Asia & Pacific,PRK,Dem. People's Rep. Korea,Democratic People's Republic of Korean won
229,East Asia & Pacific,TUV,Tuvalu,Australian dollar
73,East Asia & Pacific,FJI,Fiji,Fijian dollar
...,...,...,...,...
134,Sub-Saharan Africa,MDG,Madagascar,Malagasy ariary
124,Sub-Saharan Africa,LBR,Liberia,U.S. dollar
245,Sub-Saharan Africa,ZMB,Zambia,New Zambian kwacha
88,Sub-Saharan Africa,GNB,Guinea-Bissau,West African CFA franc


However, you'll notice that within each `region` we do not have a particular order to the countries.

In addition to sorting by `region`, we may also want to sort by `code`.

To do this, we simply specify multiple sort columns in a list instead of just a single string:

In [32]:
sorted_data = data.sort_values(['region', 'code'])
sorted_data

Unnamed: 0,region,code,name,currency
3,East Asia & Pacific,ASM,American Samoa,U.S. dollar
11,East Asia & Pacific,AUS,Australia,Australian dollar
27,East Asia & Pacific,BRN,Brunei,Brunei dollar
42,East Asia & Pacific,CHN,China,Chinese yuan
73,East Asia & Pacific,FJI,Fiji,Fijian dollar
...,...,...,...,...
230,Sub-Saharan Africa,UGA,Uganda,Ugandan shilling
199,Sub-Saharan Africa,ZAF,South Africa,South African rand
54,Sub-Saharan Africa,ZAR,Dem. Rep. Congo,Congolese franc
245,Sub-Saharan Africa,ZMB,Zambia,New Zambian kwacha
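
When sorting by several columns, the `ascending` argument can also be a list with one flag per sort column. Here is a minimal sketch on a small made-up frame (`df_small` is a hypothetical name, with a few codes borrowed from the data above):

```python
import pandas as pd

df_small = pd.DataFrame({
    'region': ['South Asia', 'East Asia & Pacific',
               'South Asia', 'East Asia & Pacific'],
    'code': ['AFG', 'CHN', 'BGD', 'AUS'],
})

# Both columns ascending: region first, then code within each region.
both_ascending = df_small.sort_values(['region', 'code'])

# One flag per sort column: region ascending, code descending.
mixed = df_small.sort_values(['region', 'code'], ascending=[True, False])

print(list(both_ascending['code']))  # ['AUS', 'CHN', 'AFG', 'BGD']
print(list(mixed['code']))           # ['CHN', 'AUS', 'BGD', 'AFG']
```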


We are only seeing a subset of the data here, so let's iterate through all the rows and columns to see the entire data set.

This is actually not as straightforward as it sounds - in part because data frames consist of columns - not rows - and here I want to iterate through the rows, printing out each column value.

Pandas implements an `iterrows()` method for data frames that iterates over rows instead of columns. Each row is returned as a `tuple`, containing the (row) index label, and a `Series` object containing the column values:

In [33]:
for row_label, row_series in sorted_data.iterrows():
    print(row_label, type(row_series))

3 <class 'pandas.core.series.Series'>
11 <class 'pandas.core.series.Series'>
27 <class 'pandas.core.series.Series'>
42 <class 'pandas.core.series.Series'>
73 <class 'pandas.core.series.Series'>
144 <class 'pandas.core.series.Series'>
85 <class 'pandas.core.series.Series'>
96 <class 'pandas.core.series.Series'>
100 <class 'pandas.core.series.Series'>
108 <class 'pandas.core.series.Series'>
32 <class 'pandas.core.series.Series'>
112 <class 'pandas.core.series.Series'>
113 <class 'pandas.core.series.Series'>
117 <class 'pandas.core.series.Series'>
132 <class 'pandas.core.series.Series'>
140 <class 'pandas.core.series.Series'>
154 <class 'pandas.core.series.Series'>
150 <class 'pandas.core.series.Series'>
164 <class 'pandas.core.series.Series'>
136 <class 'pandas.core.series.Series'>
158 <class 'pandas.core.series.Series'>
159 <class 'pandas.core.series.Series'>
176 <class 'pandas.core.series.Series'>
171 <class 'pandas.core.series.Series'>
173 <class 'pandas.core.series.Series'>
53 <class

Now we're really not interested in the row index in this case, so we'll ignore it:

In [34]:
for _, row_series in sorted_data.iterrows():
    print(row_series)
    print('-' * 20)

region      East Asia & Pacific
code                        ASM
name             American Samoa
currency            U.S. dollar
Name: 3, dtype: object
--------------------
region      East Asia & Pacific
code                        AUS
name                  Australia
currency      Australian dollar
Name: 11, dtype: object
--------------------
region      East Asia & Pacific
code                        BRN
name                     Brunei
currency          Brunei dollar
Name: 27, dtype: object
--------------------
region      East Asia & Pacific
code                        CHN
name                      China
currency           Chinese yuan
Name: 42, dtype: object
--------------------
region      East Asia & Pacific
code                        FJI
name                       Fiji
currency          Fijian dollar
Name: 73, dtype: object
--------------------
region      East Asia & Pacific
code                        FSM
name                 Micronesia
currency            U.S. dollar
Name: 14

Name: 244, dtype: object
--------------------
region       North America
code                   BMU
name               Bermuda
currency    Bermuda dollar
Name: 21, dtype: object
--------------------
region        North America
code                    CAN
name                 Canada
currency    Canadian dollar
Name: 34, dtype: object
--------------------
region      North America
code                  USA
name        United States
currency      U.S. dollar
Name: 234, dtype: object
--------------------
region          South Asia
code                   AFG
name           Afghanistan
currency    Afghan afghani
Name: 0, dtype: object
--------------------
region            South Asia
code                     BGD
name              Bangladesh
currency    Bangladeshi taka
Name: 15, dtype: object
--------------------
region              South Asia
code                       BTN
name                    Bhutan
currency    Bhutanese ngultrum
Name: 22, dtype: object
--------------------
region      

As you can see, each series object has an index containing the column labels.

Finally, let's print just the values by using the `values` attribute (which returns a NumPy array) of each `Series` object:

In [35]:
for _, row_series in sorted_data.iterrows():
    print(row_series.values)

['East Asia & Pacific' 'ASM' 'American Samoa' 'U.S. dollar']
['East Asia & Pacific' 'AUS' 'Australia' 'Australian dollar']
['East Asia & Pacific' 'BRN' 'Brunei' 'Brunei dollar']
['East Asia & Pacific' 'CHN' 'China' 'Chinese yuan']
['East Asia & Pacific' 'FJI' 'Fiji' 'Fijian dollar']
['East Asia & Pacific' 'FSM' 'Micronesia' 'U.S. dollar']
['East Asia & Pacific' 'GUM' 'Guam' 'U.S. dollar']
['East Asia & Pacific' 'HKG' 'Hong Kong SAR, China' 'Hong Kong dollar']
['East Asia & Pacific' 'IDN' 'Indonesia' 'Indonesian rupiah']
['East Asia & Pacific' 'JPN' 'Japan' 'Japanese yen']
['East Asia & Pacific' 'KHM' 'Cambodia' 'Cambodian riel']
['East Asia & Pacific' 'KIR' 'Kiribati' 'Australian dollar']
['East Asia & Pacific' 'KOR' 'Korea' 'Korean won']
['East Asia & Pacific' 'LAO' 'Lao PDR' 'Lao kip']
['East Asia & Pacific' 'MAC' 'Macao SAR, China' 'Macao pataca']
['East Asia & Pacific' 'MHL' 'Marshall Islands' 'U.S. dollar']
['East Asia & Pacific' 'MMR' 'Myanmar' 'Myanmar kyat']
['East Asia & Pacif

['Sub-Saharan Africa' 'SYC' 'Seychelles' 'Seychelles rupee']
['Sub-Saharan Africa' 'TCD' 'Chad' 'Central African CFA franc']
['Sub-Saharan Africa' 'TGO' 'Togo' 'West African CFA franc']
['Sub-Saharan Africa' 'TZA' 'Tanzania' 'Tanzanian shilling']
['Sub-Saharan Africa' 'UGA' 'Uganda' 'Ugandan shilling']
['Sub-Saharan Africa' 'ZAF' 'South Africa' 'South African rand']
['Sub-Saharan Africa' 'ZAR' 'Dem. Rep. Congo' 'Congolese franc']
['Sub-Saharan Africa' 'ZMB' 'Zambia' 'New Zambian kwacha']
['Sub-Saharan Africa' 'ZWE' 'Zimbabwe' 'U.S. dollar']
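
As an aside, there are other ways to see every row. The sketch below shows two of them on a small made-up frame (`df_small` is a hypothetical name): `itertuples()`, which yields one named tuple per row, and `to_string()`, which renders the whole frame as text without truncation. This is an alternative to the `iterrows()` approach above, not a replacement for it:

```python
import pandas as pd

df_small = pd.DataFrame({
    'region': ['East Asia & Pacific', 'South Asia'],
    'code': ['ASM', 'AFG'],
    'name': ['American Samoa', 'Afghanistan'],
})

# itertuples(index=False) yields one named tuple per row, without the
# row label.
rows = [tuple(row) for row in df_small.itertuples(index=False)]
for row in rows:
    print(*row, sep=' | ')

# to_string() renders every row, unlike the abbreviated default display.
text = df_small.to_string(index=False)
print(text)
```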
