This project implements a simple ETL (Extract, Transform, Load) pipeline that collects data about the world's largest banks from an archived Wikipedia page, processes it, and saves it both as a CSV file and in an SQLite database.
The main goal is to automate the entire data workflow, from extraction through transformation to loading and querying.
- Scrapes the table of the world's largest banks from an archived copy of the Wikipedia page *List of largest banks* (see the extraction sketch below).
- Extracts each bank's name and market capitalization in billions of USD.
- Reads exchange rates from `exchange_rate.csv` and converts market capitalization from USD to GBP, EUR, and INR, adding the converted values as new DataFrame columns (see the transformation sketch below).
- Saves the transformed data to `Largest_banks_data.csv`.
- Loads the data into an SQLite database, `Banks.db`, as the table `Largest_banks` (see the loading sketch below).
- Queries the table: displays all rows and calculates the average market capitalization in GBP (see the query sketch below).
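
A minimal extraction sketch using requests, BeautifulSoup, and pandas. The snapshot URL and the table and cell positions are assumptions about the archived page's layout, not verified values:

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Hypothetical archive snapshot; substitute the actual web.archive.org URL.
URL = "https://web.archive.org/web/20230908091635/https://en.wikipedia.org/wiki/List_of_largest_banks"

def extract(url: str) -> pd.DataFrame:
    """Scrape the largest-banks table into a DataFrame of Name and MC_USD_Billion."""
    html = requests.get(url).text
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find_all("tbody")[0]  # assumes the target table is the first on the page
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) >= 3:  # skip header and malformed rows
            name = cells[1].get_text(strip=True)  # assumes column order: rank, name, market cap
            mc = float(cells[2].get_text(strip=True).replace(",", ""))
            rows.append({"Name": name, "MC_USD_Billion": mc})
    return pd.DataFrame(rows)
```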
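A sketch of the transformation step. The derived column names (`MC_GBP_Billion`, etc.) and the layout of `exchange_rate.csv` (columns `Currency` and `Rate`) are assumptions:

```python
import pandas as pd

def transform(df: pd.DataFrame, rates_path: str = "exchange_rate.csv") -> pd.DataFrame:
    """Add GBP, EUR, and INR market-cap columns using rates from a CSV file."""
    # Assumes the CSV has rows like: GBP,0.8 under headers Currency,Rate.
    rates = pd.read_csv(rates_path).set_index("Currency")["Rate"]
    for cur in ("GBP", "EUR", "INR"):
        df[f"MC_{cur}_Billion"] = (df["MC_USD_Billion"] * rates[cur]).round(2)
    return df
```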
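Persisting the result is straightforward with pandas and the standard-library sqlite3 module; the file and table names follow the ones listed above:

```python
import sqlite3
import pandas as pd

def load(df: pd.DataFrame,
         csv_path: str = "Largest_banks_data.csv",
         db_path: str = "Banks.db") -> None:
    """Save the transformed DataFrame as a CSV file and as an SQLite table."""
    df.to_csv(csv_path, index=False)
    with sqlite3.connect(db_path) as conn:
        df.to_sql("Largest_banks", conn, if_exists="replace", index=False)
```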
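And a sketch of the two queries, again assuming the `MC_GBP_Billion` column name from the transform step:

```python
import sqlite3
import pandas as pd

def run_queries(db_path: str = "Banks.db") -> None:
    """Print all rows, then the average market capitalization in GBP."""
    with sqlite3.connect(db_path) as conn:
        print(pd.read_sql("SELECT * FROM Largest_banks", conn))
        print(pd.read_sql("SELECT AVG(MC_GBP_Billion) FROM Largest_banks", conn))
```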