Pawel-Srodek55/Data-enginerring-project-from-course-Python-Project-for-Data-Engineering

A project I made for the course "Python Project for Data Engineering".

This project implements a simple ETL (Extract – Transform – Load) pipeline that collects data about the world’s largest banks from a Wikipedia archive, processes it, and saves it both as a CSV file and in an SQLite database.

The main goal is to automate the entire data workflow – from extraction, through transformation, to loading and querying.

Features

Extract

Scrapes data from the table on the archived Wikipedia page "List of largest banks".

Extracts each bank’s name and market capitalization (in billions of USD).
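
The extraction step could look roughly like the sketch below. It is a dependency-free illustration using the standard library's html.parser, not necessarily how the project does it. The column layout (rank, bank name, market cap), the names TableRowParser and extract_banks, and the sample markup with its made-up values are all assumptions; the real pipeline would fetch the archived page over HTTP first.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows use <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None


def extract_banks(html):
    """Return (name, market_cap_usd_billion) tuples from the table rows."""
    parser = TableRowParser()
    parser.feed(html)
    # Assumed column order: rank, bank name, market cap in USD billions.
    return [(row[1], float(row[2])) for row in parser.rows if len(row) >= 3]


# Placeholder markup with made-up values, standing in for the downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>Bank name</th><th>Market cap (US$ billion)</th></tr>
  <tr><td>1</td><td><a href="#">Bank A</a></td><td>100.5</td></tr>
  <tr><td>2</td><td><a href="#">Bank B</a></td><td>50.25</td></tr>
</table>
"""

banks = extract_banks(SAMPLE_HTML)
print(banks)
```

Parsing into plain tuples keeps the sketch independent of any third-party package; a DataFrame can be built from the same tuples afterwards.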

Transform

Reads exchange rates from exchange_rate.csv.

Converts market capitalization from USD to GBP, EUR, and INR.

Adds new columns to the DataFrame.
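
The conversion described above can be sketched as follows. This version uses plain dictionaries rather than a DataFrame so it runs without extra packages. The layout of exchange_rate.csv (Currency and Rate columns), the output column names such as MC_GBP_Billion, the rounding to two decimals, and the rate values are assumptions for illustration only.

```python
import csv
import io

# Stand-in for exchange_rate.csv; column names and rates are placeholders.
RATES_CSV = """Currency,Rate
GBP,0.8
EUR,0.9
INR,80.0
"""


def read_rates(csv_file):
    """Map each currency code to its rate against USD."""
    return {row["Currency"]: float(row["Rate"]) for row in csv.DictReader(csv_file)}


def transform(banks, rates):
    """Add one converted market-cap column per target currency."""
    records = []
    for name, usd in banks:
        record = {"Name": name, "MC_USD_Billion": usd}
        for currency in ("GBP", "EUR", "INR"):
            # Rounding to two decimals is an assumed convention.
            record[f"MC_{currency}_Billion"] = round(usd * rates[currency], 2)
        records.append(record)
    return records


rates = read_rates(io.StringIO(RATES_CSV))
transformed = transform([("Bank A", 100.0), ("Bank B", 50.0)], rates)
for record in transformed:
    print(record)
```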

Load

Saves the transformed data into Largest_banks_data.csv.

Loads the data into an SQLite database (Banks.db) under the table name Largest_banks.
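
A minimal sketch of the load step, using only the csv and sqlite3 modules from the standard library. The file and table names come from the description above; the column names, the helper names load_to_csv and load_to_db, and the sample records are assumptions. The demo writes into a temporary directory so it does not overwrite real output files.

```python
import csv
import os
import sqlite3
import tempfile

# Assumed column names; the README does not list them.
FIELDS = ["Name", "MC_USD_Billion", "MC_GBP_Billion", "MC_EUR_Billion", "MC_INR_Billion"]


def load_to_csv(records, csv_path):
    """Write the records to a CSV file with a header row."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(records)


def load_to_db(records, conn, table_name="Largest_banks"):
    """Replace the table contents with the given records."""
    columns = ", ".join(f"{name} {'TEXT' if name == 'Name' else 'REAL'}" for name in FIELDS)
    placeholders = ", ".join(f":{name}" for name in FIELDS)
    conn.execute(f"DROP TABLE IF EXISTS {table_name}")
    conn.execute(f"CREATE TABLE {table_name} ({columns})")
    conn.executemany(f"INSERT INTO {table_name} VALUES ({placeholders})", records)
    conn.commit()


# Made-up sample records standing in for the transformed data.
records = [
    {"Name": "Bank A", "MC_USD_Billion": 100.0, "MC_GBP_Billion": 80.0,
     "MC_EUR_Billion": 90.0, "MC_INR_Billion": 8000.0},
    {"Name": "Bank B", "MC_USD_Billion": 50.0, "MC_GBP_Billion": 40.0,
     "MC_EUR_Billion": 45.0, "MC_INR_Billion": 4000.0},
]

workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "Largest_banks_data.csv")
db_path = os.path.join(workdir, "Banks.db")

load_to_csv(records, csv_path)
conn = sqlite3.connect(db_path)
load_to_db(records, conn)
row_count = conn.execute("SELECT COUNT(*) FROM Largest_banks").fetchone()[0]
conn.close()
print(row_count)
```

Dropping and recreating the table makes repeated runs produce the same result, which is one reasonable choice for a pipeline that always reloads the full data set.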

Query (SQL)

Displays all rows from the table.

Calculates the average market capitalization in GBP.

Retrieves the first 5 bank names.
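
The three queries listed above might be written as shown here. The sketch runs against an in-memory SQLite database filled with made-up rows; the column names Name and MC_GBP_Billion and the helper run_query are assumptions, and the actual project queries Banks.db instead.

```python
import sqlite3


def run_query(conn, sql):
    """Print the statement and its result rows, then return the rows."""
    print(sql)
    rows = conn.execute(sql).fetchall()
    for row in rows:
        print(row)
    return rows


# In-memory stand-in for Banks.db with placeholder values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Largest_banks (Name TEXT, MC_GBP_Billion REAL)")
conn.executemany(
    "INSERT INTO Largest_banks VALUES (?, ?)",
    [(f"Bank {i}", 10.0 * i) for i in range(1, 7)],
)

all_rows = run_query(conn, "SELECT * FROM Largest_banks")
avg_gbp = run_query(conn, "SELECT AVG(MC_GBP_Billion) FROM Largest_banks")
first_five = run_query(conn, "SELECT Name FROM Largest_banks LIMIT 5")
conn.close()
```

LIMIT 5 without ORDER BY returns rows in whatever order SQLite produces them, so "first 5" here means the first five rows returned rather than the five largest banks.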

Logging

Every ETL step is logged in code_log.txt with timestamps.
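
Timestamped logging of this kind can be sketched with a small helper. The function name log_progress, the timestamp format, and the "timestamp : message" line layout are assumptions; only the file name code_log.txt comes from the description above. The demo writes to a temporary file instead of the real log.

```python
import os
import tempfile
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format


def log_progress(message, log_path="code_log.txt"):
    """Append one timestamped line to the log file."""
    timestamp = datetime.now().strftime(TIMESTAMP_FORMAT)
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(f"{timestamp} : {message}\n")


# Demo against a temporary file so the real code_log.txt is left untouched.
log_path = os.path.join(tempfile.mkdtemp(), "code_log.txt")
log_progress("Extraction complete", log_path)
log_progress("Transformation complete", log_path)

with open(log_path, encoding="utf-8") as f:
    log_lines = f.read().splitlines()
print(log_lines)
```

Opening the file in append mode keeps the history of earlier runs, so each ETL stage adds a line rather than replacing the log.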
