# Scraping Africa's Energy and Economy

We will scrape **Economy** and **Energy** data for African countries from the CIA World Factbook.

## Checkpoint 0

Base URL and links we'll use

### Base URL

Let's first assign a base URL, to which we will append the slug for each country we want.
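A small helper can make the URL construction explicit. This is a sketch that assumes lowercasing a country name and replacing spaces with hyphens gives the Factbook slug (the same rule the list comprehension in cell 6 applies); `BASE_URL` and `country_url` are illustrative names, not part of the original notebook.

```python
BASE_URL = "https://www.cia.gov/the-world-factbook/countries/"

def country_url(name):
    """Build a Factbook country URL from a display name (hypothetical helper)."""
    # same rule as the notebook: lowercase, then spaces -> hyphens
    slug = name.lower().replace(" ", "-")
    return BASE_URL + slug

print(country_url("South Africa"))
# https://www.cia.gov/the-world-factbook/countries/south-africa
```

Names such as Côte d'Ivoire don't follow this rule, which is why some entries in the country list are written as slugs directly.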

In [1]:
!pip install requests beautifulsoup4 --upgrade --quiet

In [2]:
import requests

In [3]:
from bs4 import BeautifulSoup as bs

In [4]:
baseurl="https://www.cia.gov/the-world-factbook/countries/"

### African Countries

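Let's create a list of African countries. Some entries are written directly as URL slugs (for example `cote-divoire` and `gambia-the`) because lowercasing and hyphenating the full country name would not produce a working slug.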

In [5]:
african_countries=['Nigeria','Egypt', 'South Africa','Algeria','Morocco','Angola','Kenya','Ethiopia','Tanzania',
 'Ghana','cote-divoire','congo-democratic-republic-of-the','Uganda','Tunisia','Cameroon','Sudan','Libya','Zimbabwe',
 'Senegal','Zambia','Gabon','Guinea','Mali','Burkina Faso','Botswana','Mozambique','Benin','Equatorial Guinea','Madagascar',
 'Niger', 'congo-republic-of-the','Chad','Namibia','Rwanda', 'Malawi','Mauritius', 'Mauritania','Somalia','Togo','South Sudan',
 'Eswatini','Sierra Leone','Liberia','Djibouti','Burundi','Lesotho','Central African Republic','Eritrea','gambia-the','cabo-verde',
 'Seychelles','Guinea-Bissau','Comoros','sao-tome-and-principe']

In [6]:
african_countries=[country.lower().replace(" ","-") for country in african_countries]

In [7]:
african_countries[39]

'south-sudan'

In [8]:
baseurl+african_countries[0]

'https://www.cia.gov/the-world-factbook/countries/nigeria'

## Checkpoint 1

Getting h2s and h3s

### Find h2s

Find all the headings on each country's page. (This step isn't strictly necessary, since we only need the **Energy** and **Economy** headings.)

However, collecting all the `h2`s lets us build a function whose logic we can reuse in the subsequent steps.
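An offline sketch of the same `h2` extraction, run on a made-up HTML snippet instead of a live page. It drops the "Photos" heading by its text rather than by position, assuming (from the notebook's own comment) that this heading is literally called "Photos". The reused `requests.Session` and 30-second timeout in the comment are suggestions, not part of the original code.

```python
from bs4 import BeautifulSoup

def h2_texts(html, skip=("Photos",)):
    """Return the text of every <h2> in `html`, leaving out headings listed in `skip`."""
    soup = BeautifulSoup(html, "html.parser")
    texts = [h2.get_text(strip=True) for h2 in soup.find_all("h2")]
    return [text for text in texts if text not in skip]

# On the live site the HTML could come from one reused session, e.g.:
#   session = requests.Session()
#   html = session.get(baseurl + country, timeout=30).content
sample = "<h2>Photos</h2><h2>Introduction</h2><h2>Economy</h2><h2>Energy</h2>"
print(h2_texts(sample))  # ['Introduction', 'Economy', 'Energy']
```

Filtering by text keeps working if a page has no photo section, whereas `pop(0)` would silently remove a real heading.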

In [9]:
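def allh2s(countrieslis):
    '''Return, for each country slug, the text of every h2 on its Factbook page.'''
    all_h2s=[]
    for country in countrieslis:
        url=baseurl+country
        response=requests.get(url, timeout=30)
        # name the parser explicitly to avoid bs4's "no parser specified" warning
        countryfacts=bs(response.content, "html.parser")
        countryh2s=countryfacts.find_all('h2')
        h2stext=[h2.text for h2 in countryh2s]
        all_h2s.append(h2stext)
    return all_h2s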

In [10]:
# Note how long this takes to run: it downloads all 54 country pages
h2ss=allh2s(african_countries)

In [11]:
# Remove the "Photos" heading, which is the first h2 on each page
for h2 in h2ss:
    h2.pop(0)

In [12]:
h2ss

[['Introduction',
  'Geography',
  'People and Society',
  'Environment',
  'Government',
  'Economy',
  'Energy',
  'Communications',
  'Transportation',
  'Military and Security',
  'Terrorism',
  'Transnational Issues'],
 ['Introduction',
  'Geography',
  'People and Society',
  'Environment',
  'Government',
  'Economy',
  'Energy',
  'Communications',
  'Transportation',
  'Military and Security',
  'Terrorism',
  'Transnational Issues'],
 ['Introduction',
  'Geography',
  'People and Society',
  'Environment',
  'Government',
  'Economy',
  'Energy',
  'Communications',
  'Transportation',
  'Military and Security',
  'Terrorism',
  'Transnational Issues'],
 ['Introduction',
  'Geography',
  'People and Society',
  'Environment',
  'Government',
  'Economy',
  'Energy',
  'Communications',
  'Transportation',
  'Military and Security',
  'Terrorism',
  'Transnational Issues'],
 ['Introduction',
  'Geography',
  'People and Society',
  'Environment',
  'Government',
  'Economy',
 

Note that some countries have 11 headings while others have 12: the Factbook includes a **Terrorism** section for some countries but not for others. We need to account for this when scraping to avoid misaligned data.
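To see which countries account for the difference, one could count the lengths of the heading lists and list the countries without a **Terrorism** section. A sketch on toy data: the names and lists below are placeholders standing in for `african_countries` and `h2ss`, and `countries_missing` is a hypothetical helper.

```python
from collections import Counter

def countries_missing(countries, h2_lists, heading="Terrorism"):
    """Return the countries whose h2 list lacks `heading` (hypothetical helper)."""
    return [country for country, h2s in zip(countries, h2_lists) if heading not in h2s]

# toy data standing in for african_countries and h2ss
countries = ["country-a", "country-b", "country-c"]
h2_lists = [
    ["Economy", "Energy", "Terrorism"],
    ["Economy", "Energy"],
    ["Economy", "Energy", "Terrorism"],
]
print(Counter(len(h2s) for h2s in h2_lists))   # Counter({3: 2, 2: 1})
print(countries_missing(countries, h2_lists))  # ['country-b']
```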

In [13]:
# count the countries with 11 headings (i.e. no Terrorism section)
count=sum(1 for h2 in h2ss if len(h2)==11)
count

28


### Finding h3s under h2s: Energy and Economy

Now, let's get every `h3` subheading under the **Energy** and **Economy** headings.
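The lookup relies on each `h2` section sitting inside a `div` whose `id` is the lowercased, hyphenated heading, which is the same assumption `allh3s` makes. A sketch that returns a dictionary keyed by section name, so the result does not depend on list positions; `h3s_by_section` is a hypothetical helper and the HTML snippet is made up for illustration.

```python
from bs4 import BeautifulSoup

def h3s_by_section(html, sections=("Economy", "Energy")):
    """Map each section name to the h3 texts inside <div id="<lowercased name>">."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for name in sections:
        div = soup.find("div", id=name.lower().replace(" ", "-"))
        # a page without this section gets an empty list instead of an AttributeError
        result[name] = [] if div is None else [h3.get_text(strip=True) for h3 in div.find_all("h3")]
    return result

sample = """
<div id="economy"><h3>Economic overview</h3><h3>Public debt</h3></div>
<div id="energy"><h3>Electricity access</h3></div>
"""
print(h3s_by_section(sample))
# {'Economy': ['Economic overview', 'Public debt'], 'Energy': ['Electricity access']}
```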

In [15]:
def allh3s(countrieslis,h2lis):
    '''Return, per country, a list holding the h3 texts under Economy and under Energy.'''
    totalh3=[]
    for country,h2s in zip(countrieslis,h2lis):
        url=baseurl+country
        response=requests.get(url, timeout=30)
        countryfacts=bs(response.content, "html.parser")
        new=[]
        print(country)
        for h2 in h2s:
            if h2 in ('Economy','Energy'):
                section=countryfacts.find("div",attrs={"id":h2.lower()})
                new.append([h3.text for h3 in section.find_all('h3')])
        # append once per country, after all of its sections have been collected
        totalh3.append(new)
    return totalh3

In [16]:
# Note how long this function takes to run as well
totalh3s=allh3s(african_countries,h2ss)

nigeria
egypt
south-africa
algeria
morocco
angola
kenya
ethiopia
tanzania
ghana
cote-divoire
congo-democratic-republic-of-the
uganda
tunisia
cameroon
sudan
libya
zimbabwe
senegal
zambia
gabon
guinea
mali
burkina-faso
botswana
mozambique
benin
equatorial-guinea
madagascar
niger
congo-republic-of-the
chad
namibia
rwanda
malawi
mauritius
mauritania
somalia
togo
south-sudan
eswatini
sierra-leone
liberia
djibouti
burundi
lesotho
central-african-republic
eritrea
gambia-the
cabo-verde
seychelles
guinea-bissau
comoros
sao-tome-and-principe
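`allh2s`, `allh3s` and `allps` each download all 54 pages again, which is a large part of why these cells are slow. One option, sketched here under the assumption that the pages don't change between cells, is to cache each page's bytes the first time it is fetched. `make_cached_fetcher` is a hypothetical helper, and the fake fetcher stands in for `requests.get` so the example runs offline.

```python
import functools

def make_cached_fetcher(fetch):
    """Wrap fetch(url) -> bytes so that each URL is downloaded at most once."""
    @functools.lru_cache(maxsize=None)
    def cached_fetch(url):
        return fetch(url)
    return cached_fetch

calls = []

def fake_fetch(url):
    calls.append(url)  # records every real "download"
    return b"<html></html>"

get_page = make_cached_fetcher(fake_fetch)
get_page("https://example.org/nigeria")
get_page("https://example.org/nigeria")
print(len(calls))  # 1
```

In the notebook, `get_page = make_cached_fetcher(lambda url: requests.get(url, timeout=30).content)` could then replace the repeated `requests.get(url)` calls, with `bs(get_page(url), "html.parser")` building each soup.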


In [17]:
totalh3s

[[['Economic overview',
   'Real GDP (purchasing power parity)',
   'Real GDP growth rate',
   'Real GDP per capita',
   'GDP (official exchange rate)',
   'Inflation rate (consumer prices)',
   'Credit ratings',
   'GDP - composition, by sector of origin',
   'GDP - composition, by end use',
   'Agricultural products',
   'Industries',
   'Industrial production growth rate',
   'Labor force',
   'Labor force - by occupation',
   'Unemployment rate',
   'Unemployment, youth ages 15-24',
   'Population below poverty line',
   'Gini Index coefficient - distribution of family income',
   'Household income or consumption by percentage share',
   'Budget',
   'Budget surplus (+) or deficit (-)',
   'Public debt',
   'Taxes and other revenues',
   'Fiscal year',
   'Current account balance',
   'Exports',
   'Exports - partners',
   'Exports - commodities',
   'Imports',
   'Imports - partners',
   'Imports - commodities',
   'Reserves of foreign exchange and gold',
   'Debt - external',
   

## Checkpoint 2

Linking `h2`s and `h3`s (the `h2` part is also not strictly necessary, but it is a useful stepping stone before `Dealing with h3s`).

### Dealing with h2s

In [18]:
# find the highest number of elements in the h2ss list
lenh2ss=[len(h2) for h2 in h2ss]
max(lenh2ss)

12

Next, let's find which `h2`s that longest list contains.

In [19]:
# find the h2s in the index in which the elements are maximum
totalh2=h2ss[lenh2ss.index(max(lenh2ss))]

In [20]:
# function to do the above
def linkh2s(h2s):
    lenh2ss=[len(h2) for h2 in h2s]
    maxh2s=h2s[lenh2ss.index(max(lenh2ss))]
    return maxh2s     

### Dealing with h3s

Some subheadings (`h3`s) may appear for some countries but not for others, so we have to collect **every** heading that appears in **any** country.

There are many ways to do this; here we use `np.hstack` and `np.unique` (refer to the first notebook to see how we did this on a smaller scale).

`totalh3s` holds every `h3` under the **Economy** and **Energy** headings of each African country's World Factbook page, so it contains many repeats. Can we reduce it to a list in which each `h3` appears exactly once, in its original order?
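For comparison, here is the `np.unique` idea on a small toy list next to a plain-Python alternative. `np.unique(..., return_index=True)` returns the index of each value's first occurrence, so sorting those indices restores the original order; `dict.fromkeys` gives the same result because dictionaries preserve insertion order (Python 3.7+). Both function names are hypothetical.

```python
import numpy as np

def unique_in_order_np(items):
    """np.unique sorts its output, so reorder the values by their first-occurrence index."""
    arr = np.asarray(items)
    _, first_idx = np.unique(arr, return_index=True)
    return arr[np.sort(first_idx)].tolist()

def unique_in_order(items):
    """Plain-Python alternative: dict keys keep insertion order."""
    return list(dict.fromkeys(items))

headings = ["Budget", "Exports", "Budget", "Imports", "Exports"]
print(unique_in_order_np(headings))  # ['Budget', 'Exports', 'Imports']
print(unique_in_order(headings))     # ['Budget', 'Exports', 'Imports']
```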

In [21]:
import numpy as np

In [22]:
# Get h3s as a list of arrays: index 0 holds every country's Economy h3s, index 1 the Energy h3s
h3arr=[[np.hstack(country_h3s[j]) for country_h3s in totalh3s] for j in range(2)]

In [23]:
h3arr

[[array(['Economic overview', 'Real GDP (purchasing power parity)',
         'Real GDP growth rate', 'Real GDP per capita',
         'GDP (official exchange rate)', 'Inflation rate (consumer prices)',
         'Credit ratings', 'GDP - composition, by sector of origin',
         'GDP - composition, by end use', 'Agricultural products',
         'Industries', 'Industrial production growth rate', 'Labor force',
         'Labor force - by occupation', 'Unemployment rate',
         'Unemployment, youth ages 15-24', 'Population below poverty line',
         'Gini Index coefficient - distribution of family income',
         'Household income or consumption by percentage share', 'Budget',
         'Budget surplus (+) or deficit (-)', 'Public debt',
         'Taxes and other revenues', 'Fiscal year',
         'Current account balance', 'Exports', 'Exports - partners',
         'Exports - commodities', 'Imports', 'Imports - partners',
         'Imports - commodities', 'Reserves of foreign exchan

In [24]:
# np.unique sorts its output, which we don't want, so we keep only the index of
# each element's first occurrence and use it below to rebuild the list in its original order
h3arri=[np.unique(np.hstack(h3arr[i]),return_index=True)[1] for i in range(len(h3arr))]

In [25]:
# keep each unique h3 once, in the order it first appeared
totalh3=[np.hstack(arrs)[np.sort(idx)].tolist() for arrs,idx in zip(h3arr,h3arri)]

In [26]:
# this represents the unique h3s
totalh3

[['Economic overview',
  'Real GDP (purchasing power parity)',
  'Real GDP growth rate',
  'Real GDP per capita',
  'GDP (official exchange rate)',
  'Inflation rate (consumer prices)',
  'Credit ratings',
  'GDP - composition, by sector of origin',
  'GDP - composition, by end use',
  'Agricultural products',
  'Industries',
  'Industrial production growth rate',
  'Labor force',
  'Labor force - by occupation',
  'Unemployment rate',
  'Unemployment, youth ages 15-24',
  'Population below poverty line',
  'Gini Index coefficient - distribution of family income',
  'Household income or consumption by percentage share',
  'Budget',
  'Budget surplus (+) or deficit (-)',
  'Public debt',
  'Taxes and other revenues',
  'Fiscal year',
  'Current account balance',
  'Exports',
  'Exports - partners',
  'Exports - commodities',
  'Imports',
  'Imports - partners',
  'Imports - commodities',
  'Reserves of foreign exchange and gold',
  'Debt - external',
  'Exchange rates'],
 ['Electricity 

### Scraping the paragraphs (`p`s)

We need to distinguish each `h3` from the others. If you inspect the `h3`s, you'll see they don't have `id`s or distinctive `class`es like the `h2`s.

Can you figure out a way to do this?

In [27]:
# flatten the per-section h3 lists into one list
h3r=[h3 for section in totalh3 for h3 in section]

In [28]:
# a flat list of every unique h3 under Economy and Energy across the African countries
h3r

['Economic overview',
 'Real GDP (purchasing power parity)',
 'Real GDP growth rate',
 'Real GDP per capita',
 'GDP (official exchange rate)',
 'Inflation rate (consumer prices)',
 'Credit ratings',
 'GDP - composition, by sector of origin',
 'GDP - composition, by end use',
 'Agricultural products',
 'Industries',
 'Industrial production growth rate',
 'Labor force',
 'Labor force - by occupation',
 'Unemployment rate',
 'Unemployment, youth ages 15-24',
 'Population below poverty line',
 'Gini Index coefficient - distribution of family income',
 'Household income or consumption by percentage share',
 'Budget',
 'Budget surplus (+) or deficit (-)',
 'Public debt',
 'Taxes and other revenues',
 'Fiscal year',
 'Current account balance',
 'Exports',
 'Exports - partners',
 'Exports - commodities',
 'Imports',
 'Imports - partners',
 'Imports - commodities',
 'Reserves of foreign exchange and gold',
 'Debt - external',
 'Exchange rates',
 'Electricity access',
 'Electricity',
 'Electrici

We will distinguish the `h3`s by the links (`href`s) inside them. There are other ways to do this. Can you think of some, and how they fare compared to this method?
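One alternative is to match the `h3` by its visible text and then take the next `<p>`. A sketch on a made-up HTML snippet; the structure is an assumption based on the `<a href="/the-world-factbook/field/...">` link shown in cell 33, not copied from the live site, and `paragraph_by_heading` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

SAMPLE_HTML = """
<div id="economy">
  <div><h3><a href="/the-world-factbook/field/budget">Budget</a></h3><p>budget text</p></div>
  <div><h3><a href="/the-world-factbook/field/exports">Exports</a></h3><p>exports text</p></div>
</div>
"""

def paragraph_by_heading(soup, section_id, heading):
    """Return the first <p> after the h3 whose text equals `heading`, or None."""
    section = soup.find("div", id=section_id)
    if section is None:
        return None
    for h3 in section.find_all("h3"):
        if h3.get_text(strip=True) == heading:
            p = h3.find_next("p")
            return p.get_text(strip=True) if p is not None else None
    return None

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(paragraph_by_heading(soup, "economy", "Exports"))  # exports text
print(paragraph_by_heading(soup, "economy", "Imports"))  # None
```

Matching on text avoids building slugs, but any wording or whitespace change on the page breaks the match; matching on `href` instead depends on the Factbook keeping its `/the-world-factbook/field/...` URL scheme.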

Let's first use the `h3`s to build the link slugs that should match the `href`s.
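The slug conversion can also be written as a small function. This sketch swaps the repeated `.replace('--','-')` calls for a regular expression that collapses a run of hyphens of any length, and otherwise applies the same steps as cell 30: strip the characters in `'!@#$(),+'`, lowercase, turn spaces into hyphens, and drop one trailing hyphen. `heading_to_slug` is a hypothetical name, and like the notebook it assumes Python 3.9+ for `str.removesuffix`.

```python
import re

# str.translate deletes every character mapped to None
STRIP_CHARS = dict.fromkeys(map(ord, "!@#$(),+"), None)

def heading_to_slug(heading):
    """Turn an h3 heading into the slug used in its /the-world-factbook/field/ link."""
    slug = heading.translate(STRIP_CHARS).lower().replace(" ", "-")
    # collapse any run of hyphens ("---" -> "-"), then drop one trailing hyphen
    return re.sub(r"-{2,}", "-", slug).removesuffix("-")

print(heading_to_slug("Budget surplus (+) or deficit (-)"))       # budget-surplus-or-deficit
print(heading_to_slug("GDP - composition, by sector of origin"))  # gdp-composition-by-sector-of-origin
```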

In [29]:
# create a translation table for str.translate:
# mapping each of these characters to None deletes it from a string
translation_table = dict.fromkeys(map(ord, '!@#$(),+'), None)

In [30]:
h3rlinks=[link.translate(translation_table).lower().replace(" ","-").replace('--','-').replace('--','-').removesuffix('-')  for link in h3r]

In [31]:
h3rlinks

['economic-overview',
 'real-gdp-purchasing-power-parity',
 'real-gdp-growth-rate',
 'real-gdp-per-capita',
 'gdp-official-exchange-rate',
 'inflation-rate-consumer-prices',
 'credit-ratings',
 'gdp-composition-by-sector-of-origin',
 'gdp-composition-by-end-use',
 'agricultural-products',
 'industries',
 'industrial-production-growth-rate',
 'labor-force',
 'labor-force-by-occupation',
 'unemployment-rate',
 'unemployment-youth-ages-15-24',
 'population-below-poverty-line',
 'gini-index-coefficient-distribution-of-family-income',
 'household-income-or-consumption-by-percentage-share',
 'budget',
 'budget-surplus-or-deficit',
 'public-debt',
 'taxes-and-other-revenues',
 'fiscal-year',
 'current-account-balance',
 'exports',
 'exports-partners',
 'exports-commodities',
 'imports',
 'imports-partners',
 'imports-commodities',
 'reserves-of-foreign-exchange-and-gold',
 'debt-external',
 'exchange-rates',
 'electricity-access',
 'electricity',
 'electricity-generation-sources',
 'coal',
 '

In [32]:
# As a trial, retrieve the link for "Economic overview" on Nigeria's page
url=baseurl+"nigeria"
response=requests.get(url, timeout=30)
countryfacts=bs(response.content, "html.parser")
h3surl="/the-world-factbook/field/"
links=countryfacts.find("div",attrs={"id":"economy"}).find(href=(h3surl+"economic-overview"))
print(h3surl+"economic-overview")

/the-world-factbook/field/economic-overview


In [33]:
links

<a href="/the-world-factbook/field/economic-overview">Economic overview</a>

In [34]:
h3rlinks[:34] #till the end of economy headings

['economic-overview',
 'real-gdp-purchasing-power-parity',
 'real-gdp-growth-rate',
 'real-gdp-per-capita',
 'gdp-official-exchange-rate',
 'inflation-rate-consumer-prices',
 'credit-ratings',
 'gdp-composition-by-sector-of-origin',
 'gdp-composition-by-end-use',
 'agricultural-products',
 'industries',
 'industrial-production-growth-rate',
 'labor-force',
 'labor-force-by-occupation',
 'unemployment-rate',
 'unemployment-youth-ages-15-24',
 'population-below-poverty-line',
 'gini-index-coefficient-distribution-of-family-income',
 'household-income-or-consumption-by-percentage-share',
 'budget',
 'budget-surplus-or-deficit',
 'public-debt',
 'taxes-and-other-revenues',
 'fiscal-year',
 'current-account-balance',
 'exports',
 'exports-partners',
 'exports-commodities',
 'imports',
 'imports-partners',
 'imports-commodities',
 'reserves-of-foreign-exchange-and-gold',
 'debt-external',
 'exchange-rates']
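Slicing at a hard-coded 34 works for this data but would break silently if the Factbook added or dropped an Economy heading. A sketch that derives the cut points from the section lengths instead; `split_by_sections` is a hypothetical helper, and the slugs below are a short toy list.

```python
from itertools import accumulate

def split_by_sections(flat, section_lengths):
    """Cut a flat list into consecutive chunks of the given lengths."""
    bounds = [0, *accumulate(section_lengths)]
    return [flat[start:end] for start, end in zip(bounds, bounds[1:])]

slugs = ["economic-overview", "budget", "exports", "electricity", "coal"]
economy, energy = split_by_sections(slugs, [3, 2])
print(economy)  # ['economic-overview', 'budget', 'exports']
print(energy)   # ['electricity', 'coal']
```

With the notebook's variables, `split_by_sections(h3rlinks, [len(section) for section in totalh3])` should give the same two slices as `h3rlinks[:34]` and `h3rlinks[34:]`.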

In [35]:
def allps(countrieslis,h2lis):
    '''
    Takes the list of country slugs and the matching list of h2s (one list per country).
    Returns, for each country, [country] followed by one [paragraph] per h3 under Economy and Energy.
    If a country doesn't have one of the h3s, [0] is appended in its place.
    '''
    n_econ=len(totalh3[0])  # number of Economy h3s (34); the remaining slugs belong to Energy
    section_links={'Economy':h3rlinks[:n_econ],'Energy':h3rlinks[n_econ:]}
    totalp=[]
    for country,h2s in zip(countrieslis,h2lis):
        url=baseurl+country
        response=requests.get(url, timeout=30)
        countryfacts=bs(response.content, "html.parser")
        new=[[country]]
        print(country)
        for h2 in h2s:
            if h2 not in section_links:
                continue
            section=countryfacts.find("div",attrs={"id":h2.lower()})
            for h3 in section_links[h2]:
                try:
                    link=section.find(href=h3surl+h3)
                    new.append([link.find_next("p").get_text()])
                except AttributeError:
                    # the section, the link or the paragraph is missing on this page
                    new.append([0])
        totalp.append(new)
    return totalp

In [36]:
h3rlinks

['economic-overview',
 'real-gdp-purchasing-power-parity',
 'real-gdp-growth-rate',
 'real-gdp-per-capita',
 'gdp-official-exchange-rate',
 'inflation-rate-consumer-prices',
 'credit-ratings',
 'gdp-composition-by-sector-of-origin',
 'gdp-composition-by-end-use',
 'agricultural-products',
 'industries',
 'industrial-production-growth-rate',
 'labor-force',
 'labor-force-by-occupation',
 'unemployment-rate',
 'unemployment-youth-ages-15-24',
 'population-below-poverty-line',
 'gini-index-coefficient-distribution-of-family-income',
 'household-income-or-consumption-by-percentage-share',
 'budget',
 'budget-surplus-or-deficit',
 'public-debt',
 'taxes-and-other-revenues',
 'fiscal-year',
 'current-account-balance',
 'exports',
 'exports-partners',
 'exports-commodities',
 'imports',
 'imports-partners',
 'imports-commodities',
 'reserves-of-foreign-exchange-and-gold',
 'debt-external',
 'exchange-rates',
 'electricity-access',
 'electricity',
 'electricity-generation-sources',
 'coal',
 '

In [37]:
test=allps(african_countries[17:21],h2ss[17:21])

zimbabwe
senegal
zambia
gabon


In [38]:
len(test[3])

46

In [39]:
totalps=allps(african_countries,h2ss)

nigeria
egypt
south-africa
algeria
morocco
angola
kenya
ethiopia
tanzania
ghana
cote-divoire
congo-democratic-republic-of-the
uganda
tunisia
cameroon
sudan
libya
zimbabwe
senegal
zambia
gabon
guinea
mali
burkina-faso
botswana
mozambique
benin
equatorial-guinea
madagascar
niger
congo-republic-of-the
chad
namibia
rwanda
malawi
mauritius
mauritania
somalia
togo
south-sudan
eswatini
sierra-leone
liberia
djibouti
burundi
lesotho
central-african-republic
eritrea
gambia-the
cabo-verde
seychelles
guinea-bissau
comoros
sao-tome-and-principe


## Checkpoint 3

At this point we have all the `h2`s, `h3`s and `p`s, which together form our data: the headings become the columns and the paragraphs become the values. The next step is to store the data.

Let's begin by linking the headings, as we did in the Angola notebook.

In [40]:
totalh2[1]+totalh3[1][1]

'GeographyElectricity'

In [41]:
totalh2

['Introduction',
 'Geography',
 'People and Society',
 'Environment',
 'Government',
 'Economy',
 'Energy',
 'Communications',
 'Transportation',
 'Military and Security',
 'Terrorism',
 'Transnational Issues']

In [42]:
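# totalh2 still holds all 12 headings, so totalh2[1] is 'Geography' rather than 'Energy'.
# Keep only Economy and Energy so the indices line up with totalh3.
totalh2=totalh2[5:7]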

In [43]:
h2sh3=[[f"{h2}: {h3}" for h3 in h3s] for h2,h3s in zip(totalh2,totalh3)]
h2sh3

[['Economy: Economic overview',
  'Economy: Real GDP (purchasing power parity)',
  'Economy: Real GDP growth rate',
  'Economy: Real GDP per capita',
  'Economy: GDP (official exchange rate)',
  'Economy: Inflation rate (consumer prices)',
  'Economy: Credit ratings',
  'Economy: GDP - composition, by sector of origin',
  'Economy: GDP - composition, by end use',
  'Economy: Agricultural products',
  'Economy: Industries',
  'Economy: Industrial production growth rate',
  'Economy: Labor force',
  'Economy: Labor force - by occupation',
  'Economy: Unemployment rate',
  'Economy: Unemployment, youth ages 15-24',
  'Economy: Population below poverty line',
  'Economy: Gini Index coefficient - distribution of family income',
  'Economy: Household income or consumption by percentage share',
  'Economy: Budget',
  'Economy: Budget surplus (+) or deficit (-)',
  'Economy: Public debt',
  'Economy: Taxes and other revenues',
  'Economy: Fiscal year',
  'Economy: Current account balance',
  '

In [44]:
# Let's add a country column
country=['countries']
h2sh3.insert(0,country)

In [45]:
h2sh3

[['countries'],
 ['Economy: Economic overview',
  'Economy: Real GDP (purchasing power parity)',
  'Economy: Real GDP growth rate',
  'Economy: Real GDP per capita',
  'Economy: GDP (official exchange rate)',
  'Economy: Inflation rate (consumer prices)',
  'Economy: Credit ratings',
  'Economy: GDP - composition, by sector of origin',
  'Economy: GDP - composition, by end use',
  'Economy: Agricultural products',
  'Economy: Industries',
  'Economy: Industrial production growth rate',
  'Economy: Labor force',
  'Economy: Labor force - by occupation',
  'Economy: Unemployment rate',
  'Economy: Unemployment, youth ages 15-24',
  'Economy: Population below poverty line',
  'Economy: Gini Index coefficient - distribution of family income',
  'Economy: Household income or consumption by percentage share',
  'Economy: Budget',
  'Economy: Budget surplus (+) or deficit (-)',
  'Economy: Public debt',
  'Economy: Taxes and other revenues',
  'Economy: Fiscal year',
  'Economy: Current accou

In [46]:
columnsarr=np.hstack(h2sh3)

## Checkpoint 4

For the last checkpoint, we will store the data, using the linked headings as columns. The goal is a CSV file, and an easy way to get one is to first turn our data into a pandas DataFrame.
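Each scraped value is wrapped in a one-element list (for example `['nigeria']`), so the DataFrame cells, and therefore the CSV, contain list representations. A sketch that unwraps those lists before building the DataFrame and writes the CSV without the integer index; `rows_to_frame` is a hypothetical helper and the rows are toy data shaped like `totalps`.

```python
import pandas as pd

def rows_to_frame(rows, columns):
    """Unwrap one-element list cells such as ['nigeria'] and build a DataFrame."""
    flat_rows = [
        [cell[0] if isinstance(cell, list) and len(cell) == 1 else cell for cell in row]
        for row in rows
    ]
    return pd.DataFrame(flat_rows, columns=columns)

# toy rows shaped like totalps: [country], then one [paragraph] (or [0]) per heading
rows = [
    [["nigeria"], ["overview text"], [0]],
    [["egypt"], ["overview text"], ["energy text"]],
]
df = rows_to_frame(rows, ["countries", "Economy: Economic overview", "Energy: Coal"])
csv_text = df.to_csv(index=False)  # with no path, to_csv returns the CSV as a string
print(csv_text)
```

With the notebook's variables this would be `rows_to_frame(totalps, columnsarr)`, followed by `.to_csv('africaecvseg.csv', index=False)`.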

In [47]:
import pandas as pd

In [48]:
totalps

[[['nigeria'],
  ['Nigeria is Sub Saharan Africa’s largest economy and relies heavily on oil as its main source of foreign exchange earnings and government revenues. Following the 2008-09 global financial crises, the banking sector was effectively recapitalized and regulation enhanced. Since then, Nigeria’s economic growth has been driven by growth in agriculture, telecommunications, and services. Economic diversification and strong growth have not translated into a significant decline in poverty levels; over 62% of Nigeria\'s over 180 million people still live in extreme poverty. \xa0 Despite its strong fundamentals, oil-rich Nigeria has been hobbled by inadequate power supply, lack of infrastructure, delays in the passage of legislative reforms, an inefficient property registration system, restrictive trade policies, an inconsistent regulatory environment, a slow and ineffective judicial system, unreliable dispute resolution mechanisms, insecurity, and pervasive corruption. Regulator

In [49]:
h2sh3

[['countries'],
 ['Economy: Economic overview',
  'Economy: Real GDP (purchasing power parity)',
  'Economy: Real GDP growth rate',
  'Economy: Real GDP per capita',
  'Economy: GDP (official exchange rate)',
  'Economy: Inflation rate (consumer prices)',
  'Economy: Credit ratings',
  'Economy: GDP - composition, by sector of origin',
  'Economy: GDP - composition, by end use',
  'Economy: Agricultural products',
  'Economy: Industries',
  'Economy: Industrial production growth rate',
  'Economy: Labor force',
  'Economy: Labor force - by occupation',
  'Economy: Unemployment rate',
  'Economy: Unemployment, youth ages 15-24',
  'Economy: Population below poverty line',
  'Economy: Gini Index coefficient - distribution of family income',
  'Economy: Household income or consumption by percentage share',
  'Economy: Budget',
  'Economy: Budget surplus (+) or deficit (-)',
  'Economy: Public debt',
  'Economy: Taxes and other revenues',
  'Economy: Fiscal year',
  'Economy: Current accou

In [50]:
africaecvseg=pd.DataFrame(totalps,columns=columnsarr)

In [51]:
africaecvseg.head()

Unnamed: 0,countries,Economy: Economic overview,Economy: Real GDP (purchasing power parity),Economy: Real GDP growth rate,Economy: Real GDP per capita,Economy: GDP (official exchange rate),Economy: Inflation rate (consumer prices),Economy: Credit ratings,"Economy: GDP - composition, by sector of origin","Economy: GDP - composition, by end use",...,Energy: Electricity,Energy: Electricity generation sources,Energy: Coal,Energy: Petroleum,Energy: Refined petroleum products - production,Energy: Refined petroleum products - exports,Energy: Refined petroleum products - imports,Energy: Natural gas,Energy: Carbon dioxide emissions,Energy: Energy consumption per capita
0,[nigeria],[Nigeria is Sub Saharan Africa’s largest econo...,"[$1,013,530,000,000 (2020 est.)$1,032,050,000,...",[0.8% (2017 est.)-1.6% (2016 est.)2.7% (2015 e...,"[$4,900 (2020 est.)$5,100 (2019 est.)$5,200 (2...",[$475.062 billion (2019 est.)],[11.3% (2019 est.)12.1% (2018 est.)16.5% (2017...,[Fitch rating: B (2020)Moody's rating: B2 (201...,[agriculture: 21.1% (2016 est.)industry: 22.5%...,[household consumption: 80% (2017 est.)governm...,...,[installed generating capacity: 11.691 million...,[fossil fuels: 78.1% of total installed capaci...,"[production: 44,000 metric tons (2020 est.)con...","[total petroleum production: 1,646,900 bbl/day...","[35,010 bbl/day (2017 est.)]","[2,332 bbl/day (2015 est.)]","[223,400 bbl/day (2015 est.)]","[production: 46,296,835,000 cubic meters (2019...",[104.494 million metric tonnes of CO2 (2019 es...,[8.466 million Btu/person (2019 est.)]
1,[egypt],[Occupying the northeast corner of the African...,"[$1,223,040,000,000 (2020 est.)$1,180,890,000,...",[4.2% (2017 est.)4.3% (2016 est.)4.4% (2015 es...,"[$12,000 (2020 est.)$11,800 (2019 est.)$11,400...",[$323.763 billion (2019 est.)],[9.3% (2019 est.)14.4% (2018 est.)29.6% (2017 ...,[Fitch rating: B+ (2019)Moody's rating: B2 (20...,[agriculture: 11.7% (2017 est.)industry: 34.3%...,[household consumption: 86.8% (2017 est.)gover...,...,[installed generating capacity: 59.826 million...,[fossil fuels: 88.7% of total installed capaci...,"[production: 262,000 metric tons (2020 est.)co...","[total petroleum production: 660,800 bbl/day (...","[547,500 bbl/day (2015 est.)]","[47,360 bbl/day (2015 est.)]","[280,200 bbl/day (2015 est.)]","[production: 64,292,955,000 cubic meters (2019...",[235.137 million metric tonnes of CO2 (2019 es...,[40.063 million Btu/person (2019 est.)]
2,[south-africa],[South Africa is a middle-income emerging mark...,[$680.04 billion (2020 est.)$730.91 billion (2...,[0.06% (2019 est.)0.7% (2018 est.)1.4% (2017 e...,"[$11,500 (2020 est.)$12,500 (2019 est.)$12,600...",[$350.032 billion (2019 est.)],[4.1% (2019 est.)4.6% (2018 est.)5.2% (2017 es...,[Fitch rating: BB- (2020)Moody's rating: Ba2 (...,[agriculture: 2.8% (2017 est.)industry: 29.7% ...,[household consumption: 59.4% (2017 est.)gover...,...,[installed generating capacity: 62.728 million...,[fossil fuels: 87.9% of total installed capaci...,[production: 248.388 million metric tons (2020...,"[total petroleum production: 97,900 bbl/day (2...","[487,100 bbl/day (2015 est.)]","[105,600 bbl/day (2015 est.)]","[195,200 bbl/day (2015 est.)]","[production: 1,229,544,000 cubic meters (2019 ...",[470.358 million metric tonnes of CO2 (2019 es...,[98.474 million Btu/person (2019 est.)]
3,[algeria],[Algeria's economy remains dominated by the st...,[$468.4 billion (2020 est.)$495.56 billion (20...,[1.4% (2017 est.)3.2% (2016 est.)3.7% (2015 es...,"[$10,700 (2020 est.)$11,500 (2019 est.)$11,600...",[$169.912 billion (2019 est.)],[1.9% (2019 est.)4.2% (2018 est.)5.6% (2017 es...,[note: The year refers to the year in which th...,[agriculture: 13.3% (2017 est.)industry: 39.3%...,[household consumption: 42.7% (2017 est.)gover...,...,[installed generating capacity: 21.694 million...,[fossil fuels: 98.9% of total installed capaci...,[production: 0 metric tons (2020 est.)consumpt...,"[total petroleum production: 1,414,800 bbl/day...","[627,900 bbl/day (2015 est.)]","[578,800 bbl/day (2015 est.)]","[82,930 bbl/day (2015 est.)]","[production: 87,853,976,000 cubic meters (2019...",[151.633 million metric tonnes of CO2 (2019 es...,[61.433 million Btu/person (2019 est.)]
4,[morocco],[Morocco has capitalized on its proximity to E...,[$259.42 billion (2020 est.)$279.3 billion (20...,[2.5% (2019 est.)2.96% (2018 est.)3.98% (2017 ...,"[$6,900 (2020 est.)$7,500 (2019 est.)$7,400 (2...",[$118.858 billion (2019 est.)],[0.2% (2019 est.)2% (2018 est.)0.7% (2017 est.)],[Fitch rating: BB+ (2020)Moody's rating: Ba1 (...,[agriculture: 14% (2017 est.)industry: 29.5% (...,[household consumption: 58% (2017 est.)governm...,...,[installed generating capacity: 14.187 million...,[fossil fuels: 81.6% of total installed capaci...,[production: 0 metric tons (2020 est.)consumpt...,[total petroleum production: 0 bbl/day (2021 e...,"[66,230 bbl/day (2017 est.)]","[9,504 bbl/day (2015 est.)]","[229,300 bbl/day (2015 est.)]",[production: 105.678 million cubic meters (201...,[60.2 million metric tonnes of CO2 (2019 est.)...,[24.59 million Btu/person (2019 est.)]


In [52]:
africaecvseg.to_csv('africaecvseg.csv', index=False)  # skip the integer index column

In [53]:
import jovian

In [54]:
jovian.commit(filename="africafactbookegsipc.ipynb")

[jovian] Updating notebook "andrewkamaukim/africafactbook" on https://jovian.ai/
[jovian] Committed successfully! https://jovian.ai/andrewkamaukim/africafactbook


'https://jovian.ai/andrewkamaukim/africafactbook'

## Next Steps

Next, we will explore and visualize the data we have scraped!