## Web Scraping the Wikipedia Page for the World's Billionaires 2023

### Import libraries to use

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

In [2]:
#URL of the Wikipedia page to be scraped

url = 'https://en.wikipedia.org/wiki/The_World%27s_Billionaires'

In [3]:
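#fetch the page's HTML and parse it so it can be searched from Python

page = requests.get(url)

#'html.parser' is Python's built-in parser; naming it explicitly avoids parser-selection surprises
soup = BeautifulSoup(page.content, 'html.parser')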
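
The request above assumes the download succeeds. A slightly more defensive variant, sketched below, adds a timeout, raises on HTTP errors, and names the parser explicitly. The helper name `fetch_soup`, its `get` parameter, and the 10-second timeout are illustrative assumptions, not part of the original notebook.

```python
from bs4 import BeautifulSoup
import requests


def fetch_soup(url, get=requests.get, timeout=10):
    """Download a page and return it parsed (hypothetical helper)."""
    response = get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page
    response.raise_for_status()
    # 'html.parser' ships with Python, so no extra install is needed
    return BeautifulSoup(response.content, 'html.parser')
```

With this helper, the cell could read `soup = fetch_soup(url)`. The `get` parameter exists only so the function can be tried without network access by passing in a stand-in.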

The table of interest is the third one on the page, so we select it by its index (2, because indexing starts at 0).

In [4]:
#retrieve the 3rd table from the web page

table = soup.find_all('table')[2]
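
Selecting the table by position works only as long as the page layout stays the same. As a hedged alternative, the sketch below picks the first table whose header cells contain a given label. The function name `find_table_by_header` and the sample HTML are made up for illustration; the real page's markup may differ.

```python
from bs4 import BeautifulSoup


def find_table_by_header(soup, label):
    """Return the first <table> with a <th> whose text equals label, else None."""
    for candidate in soup.find_all('table'):
        header_texts = [th.text.strip() for th in candidate.find_all('th')]
        if label in header_texts:
            return candidate
    return None


# Tiny made-up page with two tables; only the second has a 'Net worth (USD)' column
sample_html = """
<table><tr><th>Year</th></tr><tr><td>2023</td></tr></table>
<table><tr><th>No.</th><th>Net worth (USD)</th></tr>
<tr><td>1</td><td>$211 billion</td></tr></table>
"""
sample_soup = BeautifulSoup(sample_html, 'html.parser')
sample_table = find_table_by_header(sample_soup, 'Net worth (USD)')
print(sample_table.find('td').text)
```

The trade-off is that a lookup by header text breaks if the column is renamed, while a lookup by index breaks if a table is added or removed earlier on the page.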

In [5]:
#fetch the header cells of the table

header = table.find_all('th')

In [6]:
#strip surrounding whitespace from each header cell to get clean column titles

headers_title = [th.text.strip() for th in header]

print(headers_title)

['No.', 'Name', 'Net worth (USD)', 'Age', 'Nationality', 'Primary source(s) of wealth']


In [7]:
#use the header titles as the column names of an empty dataframe

df = pd.DataFrame(columns=headers_title)
df

| No. | Name | Net worth (USD) | Age | Nationality | Primary source(s) of wealth |
|---|---|---|---|---|---|


In [8]:
#collect every row (<tr>) of the table

column_data = table.find_all('tr')

In [9]:
all_row_data = []

for row in column_data[1:]:
    row_data = row.find_all('td')
    
    #check if the row data contains the expected number of elements
    if len(row_data) == len(headers_title):
        individual_row_data = [data.text.strip() for data in row_data]
        all_row_data.append(individual_row_data)
    else:
        print("Skipping a row due to mismatch in number of elements")

#create a DataFrame using the extracted row data and headers
df = pd.DataFrame(all_row_data, columns=headers_title)

#display the dataframe
print(df)

  No.                      Name Net worth (USD) Age    Nationality  \
0   1  Bernard Arnault & family    $211 billion  74         France   
1   2                 Elon Musk    $180 billion  51  United States   
2   3                Jeff Bezos    $114 billion  59  United States   
3   4             Larry Ellison    $107 billion  78  United States   
4   5            Warren Buffett    $106 billion  92  United States   
5   6                Bill Gates    $104 billion  67  United States   
6   7         Michael Bloomberg   $94.5 billion  81  United States   
7   8      Carlos Slim & family     $93 billion  83         Mexico   
8   9             Mukesh Ambani   $83.4 billion  65          India   
9  10             Steve Ballmer   $80.7 billion  67  United States   

          Primary source(s) of wealth  
0                                LVMH  
1                       Tesla, SpaceX  
2                              Amazon  
3                  Oracle Corporation  
4                  Berkshire 
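
The header and row steps above can also be bundled into one function, which makes the logic easy to try on a small HTML snippet. This is a sketch under assumptions: `table_to_dataframe` is a hypothetical name, and the sample table is hand-written from the first two rows of the output above rather than fetched from Wikipedia.

```python
from bs4 import BeautifulSoup
import pandas as pd


def table_to_dataframe(table):
    """Convert a bs4 <table> tag into a DataFrame (hypothetical helper)."""
    headers = [th.text.strip() for th in table.find_all('th')]
    rows = []
    for tr in table.find_all('tr'):
        cells = [td.text.strip() for td in tr.find_all('td')]
        # Keep only rows whose cell count matches the header; this also
        # drops the header row itself, which has <th> cells and no <td>
        if len(cells) == len(headers):
            rows.append(cells)
    return pd.DataFrame(rows, columns=headers)


sample_html = """
<table>
  <tr><th>No.</th><th>Name</th><th>Net worth (USD)</th></tr>
  <tr><td>1</td><td>Bernard Arnault &amp; family</td><td>$211 billion</td></tr>
  <tr><td>2</td><td>Elon Musk</td><td>$180 billion</td></tr>
  <tr><td colspan="3">footnote row with a single cell</td></tr>
</table>
"""
sample_table = BeautifulSoup(sample_html, 'html.parser').find('table')
sample_df = table_to_dataframe(sample_table)
print(sample_df)
```

The last sample row has one cell instead of three, so it is skipped in the same way the loop above skips mismatched rows.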

In [10]:
#save the extracted data as a CSV file in the working directory, so the read step can find it

df.to_csv("world billionaires.csv", index=False)

In [11]:
#read the CSV back, using the first column ('No.') as the index

df_billionaires = pd.read_csv("world billionaires.csv", index_col=0, header=0)
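
Because the file is written with `index=False` and read back with `index_col=0`, the first column (`No.`) becomes the index of `df_billionaires`. The sketch below reproduces that round trip with two rows taken from the output above; the temporary directory and the name `sample.csv` are stand-ins, not the notebook's real path.

```python
import os
import tempfile

import pandas as pd

sample_df = pd.DataFrame(
    [['1', 'Bernard Arnault & family', '$211 billion'],
     ['2', 'Elon Musk', '$180 billion']],
    columns=['No.', 'Name', 'Net worth (USD)'],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'sample.csv')  # stand-in file name
    # index=False keeps the default 0..n-1 row labels out of the file
    sample_df.to_csv(csv_path, index=False)
    # index_col=0 turns the first column, 'No.', into the row index
    reloaded = pd.read_csv(csv_path, index_col=0)

print(reloaded.index.name)
print(list(reloaded.columns))
```

Omitting `index=False` on write would store the default row labels as an extra unnamed first column, and `index_col=0` would then pick that column instead of `No.`.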

In [12]:
#view the first ten rows of the data

print("-----List of the World's Billionaires 2023-----")
display(df_billionaires.head(10))

-----List of the World's Billionaires 2023-----


| No. | Name | Net worth (USD) | Age | Nationality | Primary source(s) of wealth |
|---|---|---|---|---|---|
| 1 | Bernard Arnault & family | $211 billion | 74 | France | LVMH |
| 2 | Elon Musk | $180 billion | 51 | United States | Tesla, SpaceX |
| 3 | Jeff Bezos | $114 billion | 59 | United States | Amazon |
| 4 | Larry Ellison | $107 billion | 78 | United States | Oracle Corporation |
| 5 | Warren Buffett | $106 billion | 92 | United States | Berkshire Hathaway |
| 6 | Bill Gates | $104 billion | 67 | United States | Microsoft |
| 7 | Michael Bloomberg | $94.5 billion | 81 | United States | Bloomberg L.P. |
| 8 | Carlos Slim & family | $93 billion | 83 | Mexico | Telmex, América Móvil, Grupo Carso |
| 9 | Mukesh Ambani | $83.4 billion | 65 | India | Reliance Industries |
| 10 | Steve Ballmer | $80.7 billion | 67 | United States | Microsoft |
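
The `Net worth (USD)` column holds strings such as `$94.5 billion`, so it cannot be sorted or summed numerically as it stands. One possible clean-up, assuming every value follows the `$<number> billion` pattern seen above, is sketched here. The new column name `net_worth_billion_usd` is an illustrative choice.

```python
import pandas as pd

sample_df = pd.DataFrame({
    'Name': ['Bernard Arnault & family', 'Michael Bloomberg', 'Steve Ballmer'],
    'Net worth (USD)': ['$211 billion', '$94.5 billion', '$80.7 billion'],
})

# Capture the digits (and optional decimal part) between '$' and 'billion'
amounts = sample_df['Net worth (USD)'].str.extract(
    r'\$([\d.]+)\s*billion', expand=False
)
# Values that do not match the pattern become NaN rather than raising an error
sample_df['net_worth_billion_usd'] = pd.to_numeric(amounts, errors='coerce')

print(sample_df['net_worth_billion_usd'].tolist())
```

Keeping the original text column alongside the numeric one makes it easy to spot rows where the pattern did not match.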


The scraped table shows the ten people at the top of the 2023 billionaires list, together with their net worth, age, nationality, and primary sources of wealth. The ranking changes from year to year, which reflects how quickly fortunes shift and how varied the routes to billionaire status are.