This Project explores a historical Olympics dataset using SQL to uncover trends in global sporting performance.
- π οΈ Problem Statement
- π Dataset
- π§ Tools & Skills
- π Case Study Questions
- π Solutions
The Olympics Games, held every four years, showcase the pinnacle of International athletic performance and sportmanship. With decades of historical data available, Analysing Olympic performance trends provides valuable insights into global sporting dynamics and country-wise achievements.
The goal of this project is to analyse a comprehensive Olympics Dataset using SQL to answer Key Questions.
By Using SQL to extract, filter, group and aggregate data, this project aims to provide clear, data-driven insights into Olympic history, highlight dominant nations and athletes and identify trends that may inform fuy=uture analyses or predictions.
The analyses is based on the Olympics History Dataset available on Kaggle
- SQL (for querying and analysis)
- Data Cleaning
- Aggregation & Joins
- CTEs and Subqueries
- Window Functions
Each of the following case study questions can be answered using a single SQL statement:
View List of Questions
- How many olympics games have been held?
- List down all Olympics games held so far
- Mention the total no of nations who participated in each olympics game?
- Which year saw the highest and lowest no of countries participating in olympics?
- Which nation has participated in all of the olympic games?
- Identify the sport which was played in all summer olympics
- Which Sports were just played only once in the olympics?
- Fetch the total no of sports played in each olympic games
- Fetch details of the oldest athletes to win a gold medal.
- Find the Ratio of male and female athletes participated in all olympic games.
- Fetch the top 5 athletes who have won the most gold medals.
- Fetch the top 5 athletes who have won the most medals (gold/silver/bronze).
- Fetch the top 5 most successful countries in olympics. Success is defined by no of medals won.
- List down total gold, silver and bronze medals won by each country
- List down total gold, silver and bronze medals won by each country corresponding to each olympic games.
- Which countries have never won gold medal but have won silver/bronze medals?
View Solution
- How many olympics games have been held
SELECT COUNT(DISTINCT [Games]) AS TotalGamesHeld
FROM [dbo].[athlete_events$]
SELECT [Games],count([Games])AS GameFrequency
FROM [dbo].[athlete_events$]
GROUP BY [Games]- List down all Olympics games held so far?
SELECT DISTINCT [Games]
FROM [dbo].[athlete_events$]
ORDER BY [Games] ASC- Total no of nations who participated in each olympics game
SELECT [Games], COUNT (DISTINCT [Team]) AS Nations
FROM [dbo].[athlete_events$]
GROUP BY [Games]
ORDER BY Nations DESC
--OR--
SELECT a.[Games], COUNT (DISTINCT b.[region]) AS Nations
FROM [dbo].[athlete_events$] a
JOIN [dbo].[noc_regions$] b
ON a.[NOC] =b.[NOC]
GROUP BY [Games]
ORDER BY Nations DESC- Year with highest and lowest no. of countries participating in olympics
WITH CountryCount AS (
SELECT [Year], COUNT(DISTINCT[Team]) AS Country_Count
FROM [dbo].[athlete_events$]
GROUP BY [Year]
)
SELECT [Year], Country_Count,
CASE
WHEN Country_Count =(SELECT MAX(Country_Count) FROM CountryCount)
THEN 'Highest'
WHEN Country_Count =(SELECT MIN(Country_Count) FROM CountryCount)
THEN 'Lowest'
END AS CATEGORY
FROM CountryCount
WHERE Country_Count =(SELECT MAX(Country_Count) FROM CountryCount)
OR
Country_Count =(SELECT MIN(Country_Count) FROM CountryCount)- Which nation has participated in all of the olympic games?
SELECT [Team]
FROM [dbo].[athlete_events$]
GROUP BY [Team]
HAVING COUNT (DISTINCT [Games]) =
(SELECT COUNT (DISTINCT [Games]) FROM [dbo].[athlete_events$])
--IF NO NATION PARTICIPATED--
IF EXISTS (
SELECT [Team]
FROM [dbo].[athlete_events$]
GROUP BY [Team]
HAVING COUNT (DISTINCT [Games]) =
(SELECT COUNT (DISTINCT [Games]) FROM [dbo].[athlete_events$])
)
BEGIN
SELECT [Team]
FROM [dbo].[athlete_events$]
GROUP BY [Team]
HAVING COUNT(DISTINCT [Games])=(SELECT COUNT(DISTINCT [Games]) FROM [dbo].[athlete_events$])
END
ELSE
BEGIN
SELECT 'NO NATION PARTICIPATED IN ALL GAMES' AS Result
END- The sport which was played in all summer olympics
SELECT distinct([Sport])
FROM [dbo].[athlete_events$]
WHERE [Season] = 'Summer'
GROUP BY [Sport]
HAVING COUNT(DISTINCT Games) = (SELECT COUNT(DISTINCT Games)
FROM [dbo].[athlete_events$])
--IF NO SPORT WAS PLAYED IN ALL SUMMER OLYMPICS--
IF EXISTS (
SELECT DISTINCT ([Sport])
FROM [dbo].[athlete_events$]
WHERE [Season] = 'Summer'
GROUP BY [Sport]
HAVING COUNT(DISTINCT Games) = (SELECT COUNT(DISTINCT Games)
FROM [dbo].[athlete_events$]))
BEGIN
SELECT [Sport]
FROM [dbo].[athlete_events$]
GROUP BY [Sport]
HAVING COUNT(DISTINCT [Games])=(SELECT COUNT(DISTINCT [Games]) FROM [dbo].[athlete_events$])
END
ELSE
BEGIN
SELECT 'NO SPORT WAS PLAYED IN ALL SUMMER OLYMPICS' AS Result
END- Which Sports were just played only once in the olympics?
SELECT [Sport]
FROM [dbo].[athlete_events$]
GROUP BY [Sport]
HAVING COUNT(DISTINCT[Year])=1- Total no of sports played in each olympic games
SELECT DISTINCT [Games], COUNT(DISTINCT[Sport]) AS TotalSportPlayed
FROM [dbo].[athlete_events$]
GROUP BY [Games]
ORDER BY TotalSportPlayed DESC- Details of the oldest athletes to win a gold medal
SELECT *
FROM [dbo].[athlete_events$]
WHERE [Age] = (SELECT MAX([Age]) FROM [dbo].[athlete_events$] where [Medal]= 'Gold')
--or--
SELECT TOP 1 *
FROM [dbo].[athlete_events$]
WHERE [Medal]= 'Gold'
ORDER BY [Age] DESC- Ratio of male and female athletes participated in all olympic games
SELECT [Sex], count([Sex]) as Gender_Ratio
FROM [dbo].[athlete_events$]
GROUP BY [Sex]- Top 5 athletes who have won the most gold medals
SELECT TOP 5 [Name],COUNT([Medal]) AS medalWon
FROM [dbo].[athlete_events$]
WHERE [Medal] ='Gold'
GROUP BY [Name]
order by medalWon DESC- Top 5 athletes who have won the most medals (gold/silver/bronze)
SELECT TOP 5 [Name],[Medal],COUNT([Medal]) AS medalWon
FROM [dbo].[athlete_events$]
WHERE [Medal] IN ('Gold', 'Silver', 'Bronze')
GROUP BY [Name],[Medal]
ORDER BY medalWon DESC- Top 5 most successful countries in olympics. Success is defined by no of medals won
SELECT TOP 5 [Team], [Medal], COUNT([Medal]) AS MedalCount
FROM [dbo].[athlete_events$]
WHERE [Medal] <> 'NA'
GROUP BY [Team],[Medal]
ORDER BY MedalCount DESC- Total gold, silver and bronze medals won by each country
SELECT [Team],
SUM(CASE WHEN [Medal]='Gold'
THEN 1 ELSE 0 END) AS GOLD,
SUM(CASE WHEN [Medal]='Silver'
THEN 1 ELSE 0 END) AS SILVER,
SUM(CASE WHEN [Medal]='Bronze'
THEN 1 ELSE 0 END) AS BRONZE
FROM [dbo].[athlete_events$]
GROUP BY [Team]
ORDER BY GOLD DESC, SILVER DESC, BRONZE DESC- Total gold, silver and broze medals won by each country corresponding to each olympic games
SELECT DISTINCT([Games]),[Team] AS Country,
SUM(CASE WHEN [Medal]='Gold'
THEN 1 ELSE 0 END) AS GOLD,
SUM(CASE WHEN [Medal]='Silver'
THEN 1 ELSE 0 END) AS SILVER,
SUM(CASE WHEN [Medal]='Bronze'
THEN 1 ELSE 0 END) AS BRONZE
FROM [dbo].[athlete_events$]
GROUP BY [Team],[Games]
ORDER BY GOLD DESC, SILVER DESC, BRONZE DESC- Which countries have never won gold medal but have won silver/bronze medals
SELECT [Team] AS Country
FROM [dbo].[athlete_events$]
GROUP BY [Team]
HAVING
SUM(CASE WHEN [Medal] = 'Gold'
THEN 1 ELSE 0 END) =0
AND SUM(CASE WHEN [Medal] = 'Silver'
THEN 1 ELSE 0 END) >0
AND SUM(CASE WHEN [Medal] = 'Bronze'
THEN 1 ELSE 0 END) >0- π οΈ Problem Statement
- π Dataset
- π Case Study Questions
- π Solutions
Danny wants to use the data to answer a few simple questions about his customers, especially about their visiting patterns, how much money theyβve spent and also which menu items are their favourite. Having this deeper connection with his customers will help him deliver a better and more personalised experience for his loyal customers.
He plans on using these insights to help him decide whether he should expand the existing customer loyalty program - additionally he needs help to generate some basic datasets so his team can easily inspect the data without needing to use SQL.
Danny has provided you with a sample of his overall customer data due to privacy issues - but he hopes that these examples are enough for you to write fully functioning SQL queries to help him answer his questions!
We were provided with three datasets:
- Sales
View Table
| customer_id | order_date | product_id |
|---|---|---|
| A | 2021-01-01 | 1 |
| A | 2021-01-01 | 2 |
| A | 2021-01-07 | 2 |
| A | 2021-01-10 | 3 |
| A | 2021-01-11 | 3 |
| A | 2021-01-11 | 3 |
| B | 2021-01-01 | 2 |
| B | 2021-01-02 | 2 |
| B | 2021-01-04 | 1 |
| B | 2021-01-11 | 1 |
| B | 2021-01-16 | 3 |
| B | 2021-02-01 | 3 |
| C | 2021-01-01 | 1 |
| C | 2021-01-01 | 3 |
| C | 2021-02-07 | 3 |
- Menu
View Table
| product_id | product_name | price |
|---|---|---|
| 1 | sushi | 10 |
| 2 | curry | 15 |
| 3 | ramen | 12 |
- Members
View Table
| customer_id | join_date |
|---|---|
| A | 2021-01-07 |
| B | 2021-01-09 |
Each of the following case study questions can be answered using a single SQL statement:
View List of Questions
- What is the total amount each customer spent at the restaurant?
- How many days has each customer visited the restaurant?
- What was the first item from the menu purchased by each customer?
- What is the most purchased item on the menu and how many times was it purchased by all customers?
- Which item was the most popular for each customer?
- Which item was purchased first by the customer after they became a member?
- Which item was purchased just before the customer became a member?
- What is the total items and amount spent for each member before they became a member?
- If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?
- In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?
- Use the available data to create a comprehensive data using the Join function.
- Danny also requires further information about the ranking of customer products, but he purposely does not need the ranking for non-member purchases so he expects null ranking values for the records when customers are not yet part of the loyalty program.
View Solution
- What is the total amount each customer spent at the restaurant?
SELECT S.[Customer ID], Sum([Price]) AS TotalAmountSpent
FROM [dbo].[SALES] AS S
JOIN [dbo].[MENU] as M
ON S.[Product ID] = M.[Product ID]
GROUP BY [Customer ID] - How many days has each customer visited the restaurant?
SELECT [Customer ID], COUNT(DISTINCT[Order Date]) as Days_Visited
FROM [dbo].[SALES]
GROUP BY [Customer ID]
ORDER BY Days_Visited DESC- What was the first item from the menu purchased by each customer?
SELECT [Order Date] ,[Customer ID],[Product Name]
FROM (SELECT S.[Customer ID],S.[Order Date], MN.[Product Name],
DENSE_RANK()
OVER( PARTITION BY S.[Customer ID]
ORDER BY S.[Order Date]) AS menu_rank
FROM [dbo].[SALES] AS S
INNER JOIN [dbo].[MENU] AS MN
ON S.[Product ID] =MN.[Product ID]
GROUP BY [Customer ID],[Product Name],[Order Date]) t
WHERE menu_rank = 1; - What is the most purchased item on the menu and how many times was it purchased by all customers?
SELECT MN.[Product Name], COUNT(S.[Product ID]) AS CountOfPurchase
FROM [dbo].[MENU] AS MN
LEFT JOIN [dbo].[SALES] AS S
ON MN.[Product ID] = S.[Product ID]
GROUP BY [Product Name]
ORDER BY CountOfPurchase DESC - Which item was the most popular for each customer?
WITH NEW AS (
SELECT S.[Customer ID],MN.[Product Name], COUNT(S.[Product ID]) AS CountOfOrders,
RANK ()
OVER( PARTITION BY S.[Customer ID]
ORDER BY COUNT(S.[Product ID]) DESC) AS RankOfOrders
FROM [dbo].[SALES] AS S
JOIN [dbo].[MENU] AS MN
ON S.[Product ID]= MN.[Product ID]
GROUP BY [Customer ID],[Product Name])
SELECT [Customer ID],[Product Name],CountOfOrders,RankOfOrders
FROM NEW
WHERE RankOfOrders = 1- Which item was purchased first by the customer after they became a member?
SELECT NEW.[Customer ID],NEW.[Product Name]
FROM (SELECT S.[Customer ID],S.[Order Date],MN.[Product Name]
FROM [dbo].[SALES]AS S
JOIN [dbo].[MENU]AS MN
ON S.[Product ID] = MN.[Product ID]) NEW
JOIN [dbo].[MEMBERS] AS MB
ON NEW.[Customer ID] = MB.[Customer ID]
WHERE NEW.[Order Date] >= MB.[Join Date]- Which item was purchased just before the customer became a member?
WITH NM AS (
SELECT S.[Customer ID],MN.[Product Name],
DENSE_RANK()
OVER( PARTITION BY S.[Customer ID]
ORDER BY S.[Order Date]) AS RankOfOrder
FROM [dbo].[SALES] AS S
JOIN [dbo].[MENU] AS MN
ON S.[Product ID] = MN.[Product ID]
JOIN [dbo].[MEMBERS] AS MB
ON S.[Customer ID] = MB.[Customer ID]
WHERE S.[Order Date] <MB.[Join Date])
SELECT [Customer ID],[Product Name],RankOfOrder
FROM NM
WHERE RankOfOrder = 1- What is the total items and amount spent for each member before they became a member?
SELECT S.[Customer ID], COUNT(S.[Product ID]) AS ProductCount,SUM(MN.[Price]) AS TotalAmount
FROM [dbo].[SALES] AS S
JOIN [dbo].[MENU] AS MN
ON S.[Product ID] = MN.[Product ID]
JOIN [dbo].[MEMBERS] AS MB
ON S.[Customer ID] = MB.[Customer ID]
WHERE S.[Order Date] < MB.[Join Date]
GROUP BY S.[Customer ID]
ORDER BY S.[Customer ID] ASC- If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?
SELECT S.[Customer ID],MN.[Product Name],
SUM(
CASE
WHEN MN.[Product Name] = 'SUSHI'
THEN MN.[Price] *20
ELSE MN.[Price] * 10
END)
AS ProductPoints
FROM [dbo].[SALES] AS S
JOIN [dbo].[MENU] AS MN
ON S.[Product ID] = MN.[Product ID]
GROUP BY S.[Customer ID], MN.[Product Name]- In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?
WITH dates
AS (SELECT *,
Date_add(join_date, interval 7 day) AS valid_date,
Last_day(join_date) AS last_date
FROM members)
SELECT s.customer_id,
SUM(CASE
WHEN s.order_date BETWEEN d.join_date AND d.valid_date THEN
m.price * 20
END) AS total_points
FROM dates d
join sales s
ON s.customer_id = d.customer_id
join menu m
ON m.product_id = s.product_id
WHERE s.order_date <= d.last_date
GROUP BY s.customer_id; - Use the available data to create a comprehensive data using the Join function.
SELECT S.[Customer ID],S.[Order Date],MN.[Product Name],MN.[Price]
(CASE
WHEN S.[Order Date] >= MB.[Join Date] THEN 'JOINED'
ELSE 'NOT JOINED'
END)
AS JoinStatus
FROM [dbo].[SALES] AS S
JOIN [dbo].[MENU] AS MN
ON S.[Product ID] = MN.[Product ID]
JOIN [dbo].[MEMBERS] AS MB
ON S.[Customer ID] = MB.[Customer ID]- Danny also requires further information about the ranking of customer products, but he purposely does not need the ranking for non-member purchases so he expects null ranking values for the records when customers are not yet part of the loyalty program.
WITH new_table
AS (SELECT s.customer_id,
s.order_date,
m.product_name,
m.price,
( CASE
WHEN s.order_date >= mb.join_date THEN 'Y'
ELSE 'N'
END ) AS member
FROM sales s
LEFT JOIN menu m
ON s.product_id = m.product_id
LEFT JOIN members mb
ON mb.customer_id = s.customer_id)
SELECT *,
( CASE
WHEN member = "n" THEN "null"
ELSE Rank()
OVER (
partition BY customer_id, member
ORDER BY order_date)
END ) AS ranking
FROM new_table; 
