
A data engineering project that creates a database and its tables and loads data from CSV files into those tables, all using Python.


JulienAganze/World-Cup-Data-Engineering-Project-Postgresql


World Cup Data Engineering Project

Aim of the project

The aim of this project is to explore how PostgreSQL and Python can interact.

Description of the used data

The data used in this project was obtained from Kaggle and consists of three CSV files with World Cup data from 1930 up to 2022. Each file is described below, together with an image showing its columns.

  1. Attendance: general information about each World Cup (host country, total attendance, etc.). [Image: attendance columns]

  2. Awards: the winners of each individual award at every World Cup. [Image: awards columns]

  3. Finals: information about every final (winner, score, etc.). [Image: finals columns]

General Project Overview

Libraries used

The main Python library used here is psycopg2, the most popular PostgreSQL database adapter for Python. We also use pandas, since some of its functions and methods are needed for this project.
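A minimal sketch of how psycopg2 is typically used: open a connection, run a query through a cursor, then close the connection. The connection details (host `localhost`, port 5432, user and password `postgres`) and the helper names `connection_params` and `server_version` are hypothetical placeholders, not taken from the project.

```python
# Placeholder connection settings (assumptions, not from the original project).
DEFAULT_PARAMS = {
    "host": "localhost",
    "port": 5432,  # PostgreSQL's default port
    "user": "postgres",
    "password": "postgres",
}


def connection_params(dbname, **overrides):
    """Keyword arguments for psycopg2.connect(), e.g. for the worldcup database."""
    params = dict(DEFAULT_PARAMS, dbname=dbname)  # copy, so defaults stay untouched
    params.update(overrides)
    return params


def server_version(dbname="postgres"):
    # Imported here so connection_params() can be used without psycopg2 installed.
    import psycopg2

    conn = psycopg2.connect(**connection_params(dbname))
    try:
        # A cursor sends SQL to the server and fetches the results.
        with conn.cursor() as cur:
            cur.execute("SELECT version();")
            return cur.fetchone()[0]
    finally:
        # psycopg2's "with conn:" only ends the transaction, so close explicitly.
        conn.close()


if __name__ == "__main__":
    print(server_version())
```

Passing keyword arguments rather than a hand-built connection string keeps credentials in one place and makes it easy to point the same code at a different database.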

Project overview

Getting the dataset from Kaggle

The first task is to download the three CSV files that make up the dataset from Kaggle; for this project they were placed in the working directory.
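Once the CSV files are in the working directory, pandas can load them into DataFrames. The sketch below assumes the hypothetical file names `attendance.csv`, `awards.csv` and `finals.csv` (the README does not give the real ones). It also normalizes headers such as "Host Country" into lowercase names like `host_country`, which are easier to use as PostgreSQL column names; this normalization step is an illustrative assumption, not necessarily what the original project does.

```python
import re

# Hypothetical file names: the README does not list the actual CSV names.
CSV_FILES = {
    "attendance": "attendance.csv",
    "awards": "awards.csv",
    "finals": "finals.csv",
}


def to_sql_name(column):
    """Turn a CSV header such as 'Host Country' into 'host_country'."""
    # Replace every run of non-alphanumeric characters with one underscore.
    name = re.sub(r"[^0-9a-zA-Z]+", "_", column.strip()).strip("_").lower()
    # Unquoted PostgreSQL identifiers cannot start with a digit.
    return f"col_{name}" if name[:1].isdigit() else name


def load_csvs(files=CSV_FILES):
    # Imported here so to_sql_name() only needs the standard library.
    import pandas as pd

    frames = {}
    for table, path in files.items():
        df = pd.read_csv(path)
        df.columns = [to_sql_name(c) for c in df.columns]
        frames[table] = df
    return frames


if __name__ == "__main__":
    for table, df in load_csvs().items():
        print(table, df.shape, list(df.columns))
```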

Creating the database and its tables in PostgreSQL

Here we used psycopg2, which lets us interact with PostgreSQL from Python. After making sure that a connection to the PostgreSQL server was available, we created a database called worldcup with three tables: attendance, awards and finals.
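A sketch of the database and table creation with psycopg2, not the project's actual code. One detail worth knowing: PostgreSQL refuses to run `CREATE DATABASE` inside a transaction block, and psycopg2 opens a transaction implicitly, so the connection used for that statement needs `autocommit = True`. The column definitions in `TABLES` and the function names are hypothetical, since the README only names the tables; adjust them to the actual CSV headers.

```python
# Hypothetical column definitions: the README names the tables, not their columns.
TABLES = {
    "attendance": [("year", "INTEGER"), ("host_country", "TEXT"),
                   ("total_attendance", "INTEGER")],
    "awards": [("year", "INTEGER"), ("award", "TEXT"), ("winner", "TEXT")],
    "finals": [("year", "INTEGER"), ("winner", "TEXT"),
               ("runner_up", "TEXT"), ("score", "TEXT")],
}


def create_table_sql(table, columns):
    """Build a CREATE TABLE IF NOT EXISTS statement from (name, type) pairs.

    Names come from the trusted TABLES constant, never from user input,
    so plain string formatting is acceptable here.
    """
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return f"CREATE TABLE IF NOT EXISTS {table} ({cols});"


def create_database_and_tables(dbname="worldcup", **conn_params):
    # Imported here so create_table_sql() can be used without psycopg2 installed.
    import psycopg2
    from psycopg2 import sql

    # Step 1: CREATE DATABASE cannot run inside a transaction block, so connect
    # to the default "postgres" database with autocommit switched on.
    conn = psycopg2.connect(dbname="postgres", **conn_params)
    conn.autocommit = True
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1 FROM pg_database WHERE datname = %s;", (dbname,))
            if cur.fetchone() is None:  # only create it if it does not exist yet
                cur.execute(
                    sql.SQL("CREATE DATABASE {};").format(sql.Identifier(dbname))
                )
    finally:
        conn.close()

    # Step 2: connect to the new database and create the three tables.
    conn = psycopg2.connect(dbname=dbname, **conn_params)
    try:
        with conn:  # commits on success, rolls back on an exception
            with conn.cursor() as cur:
                for table, columns in TABLES.items():
                    cur.execute(create_table_sql(table, columns))
    finally:
        conn.close()


if __name__ == "__main__":
    # Placeholder credentials; replace with your own.
    create_database_and_tables(host="localhost", user="postgres", password="postgres")
```

`sql.Identifier` quotes the database name safely, while the existence check against `pg_database` makes the script safe to run more than once.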

Inserting values into the three tables

Here we inserted all the values from the CSV files into the three tables created in PostgreSQL. (At this stage we ran into a problem with the last table, "finals", but it turned out to be a simple syntax error.)
All the detailed code and explanation can be found here
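A sketch of the insertion step, assuming the data has been read with `pd.read_csv` and the DataFrame column names already match the table columns. It uses `psycopg2.extras.execute_values`, which sends many rows in a single `INSERT ... VALUES` statement; the original project may have used a different approach, such as calling `cursor.execute` once per row. Empty CSV cells become `NaN` in pandas, so they are converted to `None` (SQL `NULL`) first. The helper names are hypothetical.

```python
import math


def clean_value(value):
    """Make one cell safe for psycopg2: NaN becomes None (SQL NULL)."""
    if hasattr(value, "item"):  # NumPy scalars -> plain Python int/float/bool
        value = value.item()
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


def rows_for_insert(records):
    """Convert an iterable of row tuples into a list of cleaned tuples."""
    return [tuple(clean_value(v) for v in row) for row in records]


def insert_sql(table, columns):
    """INSERT statement with the single %s placeholder execute_values expects."""
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES %s;"


def insert_dataframe(conn, table, df):
    # Imported here so the pure helpers above work without psycopg2 installed.
    from psycopg2.extras import execute_values

    rows = rows_for_insert(df.itertuples(index=False, name=None))
    with conn.cursor() as cur:
        execute_values(cur, insert_sql(table, df.columns), rows)
    conn.commit()
```

Given a dict mapping table names to DataFrames and an open connection to `worldcup`, loading everything is a short loop: `for table, df in frames.items(): insert_dataframe(conn, table, df)`.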
