# Olympics Data Analysis & Visualization Project

## Objective
To clean, structure, analyze, and visualize historical Olympic Games data using:

- Excel (Data Cleaning)
- DuckDB (SQL Database & Analysis)
- Python (Query Execution & Validation)
- Power BI (Data Visualization & Dashboarding)

## Project Workflow

1. Data Cleaning in Excel
2. Relational Database Creation using DuckDB
3. SQL-based Exploratory Data Analysis
4. Insight Generation
5. Interactive Dashboard Creation in Power BI

## Database Setup

A persistent DuckDB database file named `olympics.duckdb` was created.

Each cleaned CSV file was imported into its own table using `read_csv_auto()`, which infers the delimiter, header row, and column types from the file contents.

Reasons for using DuckDB:
- Lightweight, in-process analytical database with no server to manage
- Fast CSV ingestion
- SQL can be run directly from Python
- Columnar engine suited to analytical queries on large datasets

In [None]:
import pandas as pd
import numpy as np
import duckdb as db

# Open the persistent database file (created on first use)
connection = db.connect('olympics.duckdb')

# CREATE OR REPLACE lets this cell be rerun against the persistent file
# without failing because the tables already exist.
connection.execute("""
CREATE OR REPLACE TABLE city AS
SELECT * FROM read_csv_auto('C:/Hanish/sports-analysis/processed data/city_clean.csv');

CREATE OR REPLACE TABLE games AS
SELECT * FROM read_csv_auto('C:/Hanish/sports-analysis/processed data/games_clean.csv');

CREATE OR REPLACE TABLE sport AS
SELECT * FROM read_csv_auto('C:/Hanish/sports-analysis/processed data/sport_clean.csv');

CREATE OR REPLACE TABLE event AS
SELECT * FROM read_csv_auto('C:/Hanish/sports-analysis/processed data/event_clean.csv');

CREATE OR REPLACE TABLE person AS
SELECT * FROM read_csv_auto('C:/Hanish/sports-analysis/processed data/person_clean.csv');

CREATE OR REPLACE TABLE medal AS
SELECT * FROM read_csv_auto('C:/Hanish/sports-analysis/processed data/medal_clean.csv');

CREATE OR REPLACE TABLE competitor_event AS
SELECT * FROM read_csv_auto('C:/Hanish/sports-analysis/processed data/competitor_event_clean.csv');

CREATE OR REPLACE TABLE person_region AS
SELECT * FROM read_csv_auto('C:/Hanish/sports-analysis/processed data/person_region_clean.csv');

CREATE OR REPLACE TABLE games_competitor AS
SELECT * FROM read_csv_auto('C:/Hanish/sports-analysis/processed data/games_competitor_clean.csv');

CREATE OR REPLACE TABLE noc_region AS
SELECT * FROM read_csv_auto('C:/Hanish/sports-analysis/processed data/noc_region_clean.csv');

CREATE OR REPLACE TABLE games_city AS
SELECT * FROM read_csv_auto('C:/Hanish/sports-analysis/processed data/games_city_clean.csv');
""")

## Data Validation

After importing, the following checks were performed:

- Verified table creation using `SHOW TABLES`
- Checked row counts for large tables
- Tested relational joins between tables
- Confirmed that foreign key relationships are logically consistent

The database is ready for analysis.

In [7]:
connection.execute("SHOW TABLES").fetchall()

[('city',),
 ('competitor_event',),
 ('event',),
 ('games',),
 ('games_city',),
 ('games_competitor',),
 ('medal',),
 ('noc_region',),
 ('person',),
 ('person_region',),
 ('sport',)]