# Web Scraping with Beautiful Soup and the wptools API
## Introduction
This project extracts data from several websites. The aim is to build a list of the top 100 universities in Nigeria. The list is extracted with the Python package [Beautiful Soup](https://pypi.org/project/beautifulsoup4/) from the [Webometrics Ranking table of 2020](https://www.theabusites.com/webometrics-ranking-2019/), found on the THEABUSITE website.

"The Webometrics Ranking of World Universities, also known as Ranking Web of Universities, is a ranking system for the world’s universities based on a composite indicator that takes into account both the volume of the Web content (number of web pages and files) and the visibility and impact of these web publications according to the number of external links (site citations) they received."

Additional data about each university, such as its motto, year of establishment, chancellor, vice chancellor and number of students, is extracted from Wikipedia with [wptools](https://pypi.org/project/wptools/), a Python package that wraps the Wikipedia (MediaWiki) API.

## Importing Required Libraries/Packages

In [2]:
import requests as r
import pandas as pd
from bs4 import BeautifulSoup
import os
import wptools as wp
import time
import json
import numpy as np

## Accessing the website using the requests library and saving the content to a file

In [2]:
# URL of the page that contains the table of the top 100 universities
url = 'https://www.theabusites.com/webometrics-ranking-2019/'
html = r.get(url, timeout=30)
html.raise_for_status()  # stop early if the download failed

# Create a folder in the working directory to save the page content
folder = 'top_100_universities_in_ng'
os.makedirs(folder, exist_ok=True)
with open(os.path.join(folder, 'webometrics_ranking_2019.html'), mode='wb') as file:
    file.write(html.content)
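Because the page is written to disk, repeated runs of the notebook do not need to download it again. A minimal sketch of that idea, assuming a hypothetical `cached_download` helper (not part of requests) and a `fetch` callable that returns the page bytes:

```python
import os


def cached_download(path, fetch):
    """Return the bytes stored at `path`, calling `fetch()` only when the file is missing.

    Hypothetical helper: `fetch` is any zero-argument callable returning bytes.
    """
    if not os.path.exists(path):
        # Create the parent folder on first use, like os.makedirs in the cell above
        os.makedirs(os.path.dirname(path) or '.', exist_ok=True)
        with open(path, 'wb') as fh:
            fh.write(fetch())
    # Every call, cached or not, reads the content back from disk
    with open(path, 'rb') as fh:
        return fh.read()
```

For example, `cached_download(os.path.join(folder, 'webometrics_ranking_2019.html'), lambda: r.get(url, timeout=30).content)` would only contact the site when the file does not exist yet.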

## Accessing the saved content for extraction

In [3]:
with open("top_100_universities_in_ng/webometrics_ranking_2019.html", 'rb') as file:
    # reading bytes lets Beautiful Soup detect the page encoding itself
    soup = BeautifulSoup(file, 'lxml')
    table = soup.find('table')

In [4]:
rows = table.find_all('tr')
top = []
# Skip the header row; every other row describes one university
for row in rows[1:]:
    cells = row.find_all('td')
    top.append({
        'ranking': int(cells[0].text),
        'world_rank': int(cells[1].text),
        'universities': cells[2].text,
        # the first link in the row is the university's website
        'website': row.find('a')['href'],
        'presence_rank': cells[4].text,
        'impact_rank': cells[5].text,
        'openness_rank': cells[6].text,
        'excellence_rank': cells[7].text,
    })

col = ['ranking', 'world_rank', 'universities', 'website', 'presence_rank',
       'impact_rank', 'openness_rank', 'excellence_rank']
df = pd.DataFrame(top, columns=col)

In [5]:
df

| | ranking | world_rank | universities | website | presence_rank | impact_rank | openness_rank | excellence_rank |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1322 | University of Ibadan | https://www.ui.edu.ng/ | 2113 | 2088 | 1057 | 1561 |
| 1 | 2 | 1742 | Covenant University Ota | http://covenantuniversity.edu.ng/ | 1169 | 3884 | 1356 | 1797 |
| 2 | 3 | 1805 | University of Nigeria | http://www.unn.edu.ng/ | 1311 | 3279 | 1038 | 2243 |
| 3 | 4 | 1984 | University of Lagos | https://unilag.edu.ng/ | 161 | 4143 | 1521 | 2312 |
| 4 | 5 | 2053 | Obafemi Awolowo University | http://oauife.edu.ng/ | 2916 | 4560 | 1616 | 2025 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 96 | 14054 | Yobe State University (Bukar Abba Ibrahim Univ... | https://www.ysu.edu.ng/ | 17040 | 15459 | 6490 | 6084 |
| 96 | 97 | 14091 | Taraba State University Jalingo | http://www.tsuniversity.edu.ng/ | 18418 | 14883 | 6690 | 6084 |
| 97 | 98 | 14110 | Ondo State University of Science & Technology ... | http://www.osustech.edu.ng/ | 23528 | 12868 | 7168 | 6084 |
| 98 | 99 | 14347 | Anchor University Lagos | https://aul.edu.ng/ | 9304 | 17841 | 5779 | 6084 |


In [16]:
top_100_urls = [
    'https://en.wikipedia.org/wiki/University_of_Ibadan',
    'https://en.wikipedia.org/wiki/Covenant_University',
    'https://en.wikipedia.org/wiki/University_of_Nigeria',
    'https://en.wikipedia.org/wiki/University_of_Lagos',
    'https://en.wikipedia.org/wiki/Obafemi_Awolowo_University',
    'https://en.wikipedia.org/wiki/Ahmadu_Bello_University',
    'https://en.wikipedia.org/wiki/University_of_Ilorin',
    'https://en.wikipedia.org/wiki/Federal_University_of_Technology_Akure',
    'https://en.wikipedia.org/wiki/University_of_Port_Harcourt',
    'https://en.wikipedia.org/wiki/Adekunle_Ajasin_University',
    'https://en.wikipedia.org/wiki/University_of_Benin_(Nigeria)',
    'https://en.wikipedia.org/wiki/Federal_University_of_Technology,_Minna',
    'https://en.wikipedia.org/wiki/Ladoke_Akintola_University_of_Technology',
    'https://en.wikipedia.org/wiki/Rivers_State_University',
    'https://en.wikipedia.org/wiki/University_of_Calabar',
    'https://en.wikipedia.org/wiki/Bayero_University_Kano',
    'https://en.wikipedia.org/wiki/Lagos_State_University',
    'https://en.wikipedia.org/wiki/University_of_Jos',
    'https://en.wikipedia.org/wiki/Federal_University_of_Technology_Owerri',
    'https://en.wikipedia.org/wiki/University_of_Uyo',
    'https://en.wikipedia.org/wiki/Nnamdi_Azikiwe_University',
    'https://en.wikipedia.org/wiki/Olabisi_Onabanjo_University',
    'https://en.wikipedia.org/wiki/Federal_University_of_Agriculture,_Abeokuta',
    'https://en.wikipedia.org/wiki/University_of_Abuja',
    'https://en.wikipedia.org/wiki/University_of_Maiduguri',
    'https://en.wikipedia.org/wiki/Usmanu_Danfodiyo_University',
    'https://en.wikipedia.org/wiki/Abubakar_Tafawa_Balewa_University',
    'https://en.wikipedia.org/wiki/Ebonyi_State_University',
    'https://en.wikipedia.org/wiki/Federal_University_of_Petroleum_Resources_Effurun',
    'https://en.wikipedia.org/wiki/Benue_State_University',
    'https://en.wikipedia.org/wiki/American_University_of_Nigeria',
    'https://en.wikipedia.org/wiki/Federal_University_Oye_Ekiti',
    'https://en.wikipedia.org/wiki/University_of_Agriculture,_Makurdi',
    'https://en.wikipedia.org/wiki/Niger_Delta_University',
    'https://en.wikipedia.org/wiki/African_University_of_Science_and_Technology',
    'https://en.wikipedia.org/wiki/Skyline_University',
    'https://en.wikipedia.org/wiki/Landmark_University',
    'https://en.wikipedia.org/wiki/Delta_State_University,_Abraka',
    'https://en.wikipedia.org/wiki/Ekiti_State_University',
    'https://en.wikipedia.org/wiki/Babcock_University',
    'https://en.wikipedia.org/wiki/Michael_Okpara_University_of_Agriculture',
    'https://en.wikipedia.org/wiki/Alex_Ekwueme_Federal_University_Ndufu_Alike_Ikwo',
    'https://en.wikipedia.org/wiki/Osun_State_University',
    'https://en.wikipedia.org/wiki/Cross_River_University_of_Technology',
    'https://en.wikipedia.org/wiki/Redeemer%27s_University_Nigeria',
    'https://en.wikipedia.org/wiki/Kwara_State_University',
    'https://en.wikipedia.org/wiki/Michael_Okpara_University_of_Agriculture',
    'https://en.wikipedia.org/wiki/Abia_State_University_Uturu',
    'https://en.wikipedia.org/wiki/Federal_University,_Dutsin-Ma',
    'https://en.wikipedia.org/wiki/Edo_University,_Iyamho',
    'https://en.wikipedia.org/wiki/Umaru_Musa_Yar%27adua_University',
    'https://en.wikipedia.org/wiki/Nigerian_Defence_Academy',
    'https://en.wikipedia.org/wiki/Imo_State_University',
    'https://en.wikipedia.org/wiki/Enugu_State_University_of_Science_and_Technology',
    '',
    'https://en.wikipedia.org/wiki/Joseph_Ayo_Babalola_University',
    'https://en.wikipedia.org/wiki/Federal_University_Dutse',
    'https://en.wikipedia.org/wiki/Akwa_Ibom_State_University',
    'https://en.wikipedia.org/wiki/Kaduna_State_University',
    'https://en.wikipedia.org/wiki/Federal_University,_Otuoke',
    'https://en.wikipedia.org/wiki/Lagos_Business_School',
    'https://en.wikipedia.org/wiki/Modibbo_Adama_Federal_University_of_Technology,_Yola',
    'https://en.wikipedia.org/wiki/Godfrey_Okoye_University',
    'https://en.wikipedia.org/wiki/Tai_Solarin_University_of_Education',
    'https://en.wikipedia.org/wiki/Chukwuemeka_Odumegwu_Ojukwu_University',
    '',
    'https://en.wikipedia.org/wiki/Igbinedion_University',
    'https://en.wikipedia.org/wiki/Auchi_Polytechnic',
    'https://en.wikipedia.org/wiki/Federal_University,_Lokoja',
    'https://en.wikipedia.org/wiki/Ibrahim_Badamasi_Babangida_University',
    'https://en.wikipedia.org/wiki/Ambrose_Alli_University',
    'https://en.wikipedia.org/wiki/Elizade_University',
    'https://en.wikipedia.org/wiki/Kogi_State_University',
    'https://en.wikipedia.org/wiki/National_Open_University_of_Nigeria',
    'https://en.wikipedia.org/wiki/Yaba_College_of_Technology',
    'https://en.wikipedia.org/wiki/Baze_University',
    'https://en.wikipedia.org/wiki/Nile_University_of_Nigeria',
    'https://en.wikipedia.org/wiki/University_of_Medical_Sciences,_Ondo',
    'https://en.wikipedia.org/wiki/Nasarawa_State_University',
    'https://en.wikipedia.org/wiki/Federal_Polytechnic,_Ilaro',
    'https://en.wikipedia.org/wiki/Pan-Atlantic_University',
    'https://en.wikipedia.org/wiki/Ajayi_Crowther_University',
    'https://en.wikipedia.org/wiki/Adeleke_University',
    'https://en.wikipedia.org/wiki/Federal_University,_Wukari',
    'https://en.wikipedia.org/wiki/Lead_City_University',
    'https://en.wikipedia.org/wiki/Federal_University_of_Lafia',
    'https://en.wikipedia.org/wiki/Benson_Idahosa_University',
    'https://en.wikipedia.org/wiki/Al-Hikmah_University',
    'https://en.wikipedia.org/wiki/Bauchi_State_University',
    'https://en.wikipedia.org/wiki/Kebbi_State_University_of_Science_and_Technology',
    'https://en.wikipedia.org/wiki/Bells_University_of_Technology',
    'https://en.wikipedia.org/wiki/Kano_State_University_of_Technology',
    'https://en.wikipedia.org/wiki/Bingham_University',
    'https://en.wikipedia.org/wiki/Lagos_State_University_of_Science_and_Technology',
    'https://en.wikipedia.org/wiki/Yobe_State_University',
    'https://en.wikipedia.org/wiki/Taraba_State_University',
    'https://en.wikipedia.org/wiki/Ondo_State_University_of_Science_and_Technology',
    'https://en.wikipedia.org/wiki/Anchor_University',
    'https://en.wikipedia.org/wiki/Federal_University,_Birnin_Kebbi'
]
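Two entries above are empty strings, and some titles are percent-encoded (`Redeemer%27s_University_Nigeria`, `Umaru_Musa_Yar%27adua_University`). A small hypothetical helper such as `url_to_title` could turn each URL into a plain page title before it is passed to `wp.page`, returning `None` for the empty placeholders:

```python
from urllib.parse import unquote


def url_to_title(url):
    """Turn a Wikipedia article URL into a plain page title (hypothetical helper).

    Returns None for the empty placeholder strings in the URL list.
    """
    if not url:
        return None
    # Keep only the last path segment and decode escapes such as %27 -> '
    slug = unquote(url.rstrip('/').rsplit('/', 1)[-1])
    # MediaWiki treats underscores and spaces in titles alike; spaces read better
    return slug.replace('_', ' ')
```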

In [None]:
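from urllib.parse import unquote

start = time.time()
with open('top_100_universities_in_ng/wikipedia_info.txt', 'w') as opened_file:
    for url in top_100_urls:
        all_info = {}
        if url:  # empty strings are placeholders with no Wikipedia URL
            # decode escapes such as %27 so wptools receives the real title
            title = unquote(url.split('/')[-1])
            try:
                pg = wp.page(title, silent=True).get()
                # pages without an infobox give None; store {} instead
                all_info = pg.data.get('infobox') or {}
            except Exception as e:  # e.g. page not found or network error
                print(title, e)
        # one JSON object per line keeps the file aligned with top_100_urls
        json.dump(all_info, opened_file)
        opened_file.write('\n')

print("%s seconds" % (time.time() - start))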
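Looping over about 100 pages, individual Wikipedia requests can fail transiently. A hypothetical helper like `fetch_with_retry` (not part of wptools) retries a failing call a few times with a pause between attempts; `fetch` is assumed to be any callable taking a title, for instance `lambda t: wp.page(t, silent=True).get().data.get('infobox')`:

```python
import time


def fetch_with_retry(fetch, title, retries=3, delay=1.0):
    """Call `fetch(title)` up to `retries` times, sleeping `delay` seconds after each failure.

    Returns the first successful result, or None if every attempt raises.
    """
    for attempt in range(1, retries + 1):
        try:
            return fetch(title)
        except Exception as exc:  # broad on purpose: network and lookup errors alike
            print(f"attempt {attempt} for {title!r} failed: {exc}")
            if attempt < retries:
                time.sleep(delay)  # be polite to the API before trying again
    return None
```

Returning `None` instead of raising lets the loop write an empty record and move on, so one missing page does not stop the whole run.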

In [6]:
info = []
with open('top_100_universities_in_ng/wikipedia_info.txt', 'r') as f:
    for line in f:
        # a line holding 'null' (no infobox) is treated as an empty dict
        each_uni = json.loads(line) or {}
        # .get() returns NaN for fields missing from the infobox without raising;
        # np.nan is used because the np.NaN alias was removed in NumPy 2.0
        name = each_uni.get('name', np.nan)
        motto = each_uni.get('motto', np.nan)
        estab = each_uni.get('established', np.nan)
        typ1 = typ = each_uni.get('type', np.nan)
        chanc = each_uni.get('chancellor', np.nan)
        vice_chanc = each_uni.get('vice_chancellor', np.nan)
        stu = each_uni.get('students', np.nan)
        undrgrd = each_uni.get('undergrad', np.nan)
        pstgrd = each_uni.get('postgrad', np.nan)
        acad_staff = each_uni.get('academic_staff', np.nan)
        admin_staff = each_uni.get('administrative_staff', np.nan)
        city = each_uni.get('city', np.nan)
        state = each_uni.get('state', np.nan)
        camp = each_uni.get('campus', np.nan)

        info.append({
            'name': name,
            'motto': motto,
            'established': estab,
            'type': typ,
            'chancellor': chanc,
            'vice_chancellor': vice_chanc,
            'students': stu,
            'undergraduates': undrgrd,
            'postgraduates': pstgrd,
            'academic_staff': acad_staff,
            'administrative_staff': admin_staff,
            'city': city,
            'state': state,
            'campus': camp
        })

cols = ['name', 'motto', 'established', 'type', 'chancellor',
       'vice_chancellor', 'students', 'undergraduates', 'postgraduates',
       'academic_staff', 'administrative_staff', 'city', 'state', 'campus']

In [7]:
pd.DataFrame(info, columns = cols)

| | name | motto | established | type | chancellor | vice_chancellor | students | undergraduates | postgraduates | academic_staff | administrative_staff | city | state | campus |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | University of Ibadan | "''Recte Sapere Fons''" (To think straight is ... | {{start date and age\|1948}} | [[public university\|Public]] | [[Sa'adu Abubakar\|Saad Abubakar]], [[Sultan of... | Professor [[Kayode Adebowale ]] | 41743 | | | | | [[Ibadan]] | [[Oyo State\|Oyo]] | |
| 1 | Covenant University | ''Raising a New Generation of Leaders'' | 21 October 2002 | Private | [[David Oyedepo]] | [[Abiodun H. Adebayo]] | | | | | | [[Ota, Ogun State]] | | Urban |
| 2 | University of Nigeria | ''To Restore the Dignity of Man'' | 1955 | [[public university\|Public]] | | [[Charles Igwe Arizechukwu]] | 36000 | | | | | [[Nsukka]] | [[Enugu state\|Enugu]] | Rural<br /> {{convert\|871\|ha\|acre}} (Nsukka ca... |
| 3 | University of Lagos | In deed and in truth | 1962 | [[Public university\|Public]] [[research univer... | Alhaji (Dr.) Abubakar IBN Umar Garbai El-Kanem... | [[Oluwatoyin Ogundipe\|Prof. Oluwatoyin Ogundipe]] | 55,000 (2017) | 43,784 (2017) | 9,070 (2017) | 1,736 (2017) | 552 (2017) | [[Lagos]] | | Urban |
| 4 | Obafemi Awolowo University | For Learning and Culture | 1961 | [[public university\|Public]] | Etsu [[Yahaya Abubakar]] | [[Adebayo Simeon Bamire]] | about 35,000 | 13000 | 7500 | | | [[Ile-Ife]] | [[Osun State\|Osun]] | Urban {{convert\|2020\|ha\|acre}} |
| 5 | Ahmadu Bello University | | 4 October 1962 | [[Public University\|Public]], [[Research unive... | ''Igwe'' [[Nnaemeka Alfred Ugochukwu Achebe\|Nn... | Professor [[Kabir Bala]] | | | | | | [[Zaria]] | [[Kaduna State]] | [[urban area\|Urban]] |
| 6 | University of Ilorin | ''Probitas Doctrina: Better by far.'' | 1975 | Public | [[Abdulmumini Kabir Usman]] | [[Sulyman Age Abdulkareem]] | 50000 | | | | | [[Ilorin]] | [[Kwara State]] | Urban |
| 7 | | ''Technology for Self Reliance'' | 1981 | [[Public university\|Public]] | Alhaji (Dr.) Umar Faruk II, the Emir of Katagum | [[Adenike Oladiji]] | 15000 | 13000 | 2000 | | 300 | [[Akure, Ondo State]] | | Obanla, Obakekere and Centre for Entrepreneurs... |
| 8 | University of Port Harcourt | For Enlightenment and Self-Reliance | 1975 | Public | | [[Prof. Owunari Georgewill]] | 35,000-39,999 | | | | | [[Port Harcourt]] | [[Rivers State]] | [[urban area\|Urban]] |
| 9 | Adekunle Ajasin University | For Learning and Service | December 1999 | Public | | Olugbenga E. Ige | over 20,000 | | | | | [[Akungba Akoko\|Akungba-Akoko]] | [[Ondo State]] | |
9,Adekunle Ajasin University,For Learning and Service,December 1999,Public,,Olugbenga E. Ige,"over 20,000",,,,,[[Akungba Akoko|Akungba-Akoko]],[[Ondo State]],


In [10]:
cha = {'[[': "", ']]': ""}
for k, v in cha.items():
    typ1 = typ1.replace(k, v)
    
typ1

'Public university|Public research university'

In [11]:
cha = {'[[': "", ']]': ""}
for k, v in cha.items():
    name = name.replace(k, v)
    
name

'University of Lagos'
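Removing only `[[` and `]]` keeps both halves of a piped link, which is why the `type` value above reads `Public university|Public research university`. In MediaWiki syntax `[[target|label]]` displays `label`, so a regular expression that keeps only the displayed text gives cleaner values. The helper name `strip_wikilinks` is hypothetical:

```python
import re

# [[target|label]] -> label, [[target]] -> target
WIKILINK = re.compile(r'\[\[(?:[^\]|]*\|)?([^\]]*)\]\]')


def strip_wikilinks(text):
    """Replace every MediaWiki internal link in `text` with its displayed text."""
    if not isinstance(text, str):
        return text  # leave NaN placeholders untouched
    return WIKILINK.sub(r'\1', text)
```

Applied to the University of Lagos infobox, `strip_wikilinks('[[Public university|Public]] [[research university]]')` gives `'Public research university'`.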

In [12]:
# a separate name avoids overwriting the `info` list built above
infobox = pg.data['infobox']
infobox

{'name': 'University of Lagos',
 'image': 'UniLagos.jpg',
 'motto': 'In deed and in truth',
 'established': '1962',
 'type': '[[Public university|Public]] [[research university]]',
 'chancellor': 'Alhaji (Dr.) Abubakar IBN Umar Garbai El-Kanemi, The Shehu of Borno',
 'vice_chancellor': '[[Oluwatoyin Ogundipe|Prof. Oluwatoyin Ogundipe]]',
 'academic_staff': '1,736 (2017)',
 'administrative_staff': '552 (2017)',
 'students': '55,000 (2017)',
 'undergrad': '43,784 (2017)',
 'postgrad': '9,070 (2017)',
 'city': '[[Lagos]]',
 'country': 'Nigeria',
 'coor': '{{Coord|6|31|0|N|3|23|10|E|type:edu|display|=|inline,title}}',
 'campus': 'Urban',
 'website': '{{url|https://unilag.edu.ng}}',
 'colors': 'Gold and maroon<br /> {{color box|Gold}} {{color box|Maroon}}'}
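The `coor` field stores the campus location as a `{{Coord}}` template in degrees, minutes and seconds, and each part converts to decimal degrees as d + m/60 + s/3600. A sketch of that conversion; `parse_coord` is a hypothetical helper that only handles the `d|m|s|N/S|d|m|s|E/W` layout shown above and returns `None` otherwise:

```python
import re

NUM = r'(\d+(?:\.\d+)?)'
COORD = re.compile(
    r'\{\{coord\|' + NUM + r'\|' + NUM + r'\|' + NUM + r'\|([NS])\|'
    + NUM + r'\|' + NUM + r'\|' + NUM + r'\|([EW])',
    re.IGNORECASE,
)


def parse_coord(template):
    """Convert a {{Coord|d|m|s|N|d|m|s|E|...}} template to (lat, lon) in decimal degrees."""
    m = COORD.search(template)
    if not m:
        return None
    lat = float(m[1]) + float(m[2]) / 60 + float(m[3]) / 3600
    lon = float(m[5]) + float(m[6]) / 60 + float(m[7]) / 3600
    # southern latitudes and western longitudes are negative
    if m[4].upper() == 'S':
        lat = -lat
    if m[8].upper() == 'W':
        lon = -lon
    return round(lat, 6), round(lon, 6)
```

For the University of Lagos value `{{Coord|6|31|0|N|3|23|10|E|...}}` this gives roughly (6.516667, 3.386111).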
