While studying SQL, I would forget most of the concepts I had learned as soon as I looked away from them. This project was my chance to practice what I had learned and to make sure it sticks.
Although I enjoyed this project and learned a lot, I found it frustrating in the beginning because I had to search Google a lot and had no idea how to use Stack Overflow.
When I started the analysis, I knew nothing about football (zilch) or how World Cup tournaments are held. But @the1stt (an avid football fan) helped me understand the rules of football and how teams qualify for the World Cup.
This dataset contains all World Cup matches and players from 1930-2014. I wanted to draw insights from this data, and thankfully, Maven Analytics suggested questions that would allow me to do that.
Here is a step-by-step guide for installing Microsoft SQL Server Management Studio and importing your first dataset.
The World Cup dataset contains three tables:
- [World Cup Matches]: 852 rows and 20 columns
- [World Cup Players]: 37,784 rows and 9 columns
- [World Cup Preview]: 20 rows and 10 columns
World Cup Matches
All columns in this table have a data type of Varchar(50), meaning that each can store values up to 50 characters long.
- Year
- Datetime
- Stage
- Stadium
- City
- Home Team Name
- Home Team Goals
- Away Team Goals
- Away Team Name
- Win conditions
- Attendance
- Half-time Home Goals
- Half-time Away Goals
- Referee
- Assistant 1
- Assistant 2
- RoundID
- MatchID
- Home Team Initials
- Away Team Initials
- 2nd Half Home Goals
- 2nd Half Away Goals
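Because every column is stored as text, numeric fields such as Attendance have to be converted before they can be compared or aggregated correctly. The sketch below illustrates why. It is not the project's actual query: it runs against an in-memory SQLite database from Python so that it is self-contained, and the table name `matches`, the lowercase column names, and the sample rows are all invented for illustration. In SQL Server the same idea would be written with `CAST(... AS INT)`.

```python
import sqlite3

# Hypothetical miniature of the matches table: every column is stored as text,
# mirroring the Varchar(50) import described above. The rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (year TEXT, attendance TEXT)")
conn.executemany(
    "INSERT INTO matches VALUES (?, ?)",
    [("1930", "4444"), ("1930", "18346"), ("1934", "9000")],
)

# Text is compared character by character, so '9000' ranks above '18346'.
text_max = conn.execute("SELECT MAX(attendance) FROM matches").fetchone()[0]

# Casting to INTEGER first yields the true numeric maximum.
int_max = conn.execute(
    "SELECT MAX(CAST(attendance AS INTEGER)) FROM matches"
).fetchone()[0]

print(text_max, int_max)
```

Without the cast, the "largest" attendance is decided by the first differing character rather than by the size of the number, which would quietly distort any ranking or trend built on it.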
World Cup Players
All columns in this table have a data type of Varchar(50), meaning that each can store values up to 50 characters long.
- RoundID
- MatchID
- Team Initials
- Coach Name
- Line-up
- Shirt Number
- Player Name
- Position
- Event
World Cup Preview
All columns in this table have a data type of Varchar(50), meaning that each can store values up to 50 characters long.
- Year
- Country
- Winner
- Runners-Up
- Third
- Fourth
- GoalsScored
- QualifiedTeams
- MatchesPlayed
- Attendance
SQL concepts used in this project:
- Subqueries
- Unions
- Window Functions
- Aggregate Functions
- Converting Data Types
- Common Table Expressions (CTE)
- Conditional filters (AND, OR)
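As a rough illustration of how several of these concepts fit together, the sketch below counts wins per team per year and keeps a running total. It combines a CTE, `UNION ALL`, data type conversion, an aggregate, and a window function. It is an assumption-laden example rather than the project's own query: it uses an in-memory SQLite database through Python (window functions need SQLite 3.25 or later), and the table name, column names, and sample rows are illustrative only.

```python
import sqlite3

# Hypothetical miniature of the matches table; names and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches ("
    "year TEXT, home_team TEXT, home_goals TEXT, away_team TEXT, away_goals TEXT)"
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?, ?)",
    [
        ("1930", "Uruguay", "4", "Argentina", "2"),
        ("1930", "Argentina", "6", "USA", "1"),
        ("1934", "Italy", "2", "Czechoslovakia", "1"),
        ("1950", "Brazil", "1", "Uruguay", "2"),
    ],
)

query = """
WITH winners AS (
    -- UNION ALL stacks home wins and away wins into one list of winners.
    SELECT CAST(year AS INTEGER) AS match_year, home_team AS team
    FROM matches
    WHERE CAST(home_goals AS INTEGER) > CAST(away_goals AS INTEGER)
    UNION ALL
    SELECT CAST(year AS INTEGER), away_team
    FROM matches
    WHERE CAST(away_goals AS INTEGER) > CAST(home_goals AS INTEGER)
),
wins_per_year AS (
    -- Aggregate: one row per team and tournament year.
    SELECT team, match_year, COUNT(*) AS wins
    FROM winners
    GROUP BY team, match_year
)
-- Window function: cumulative wins for each team across tournaments.
SELECT team, match_year, wins,
       SUM(wins) OVER (PARTITION BY team ORDER BY match_year) AS running_wins
FROM wins_per_year
ORDER BY team, match_year
"""

running_wins = conn.execute(query).fetchall()
for row in running_wins:
    print(row)
```

Splitting the logic into named CTEs keeps each step readable: first collect the winners, then count them, then accumulate the counts over time.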
This project explored the following questions:
- How has attendance trended over time? (both average per game and total per year)
- Do certain cities tend to draw bigger crowds?
- Do certain teams see larger attendance?
- Which teams have won the most games? How has the number of wins by country trended over time?
- Based on the Home Team and Away Team columns, does there seem to be a "home team advantage"?
- Do any teams seem to be stronger in either the first half or the second half? (think about both offense and defense)
- Which players had the most successful scoring careers? How about the longest careers?
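The first question, attendance over time, could be approached roughly as follows. This is a minimal sketch under stated assumptions, not the analysis from the full documentation: it uses an in-memory SQLite database through Python, the table and column names and the rows are invented, and it assumes that some attendance values may be blank, which `NULLIF` turns into NULL so that they are skipped by `AVG` and `SUM`.

```python
import sqlite3

# Hypothetical miniature of the matches table; all values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (year TEXT, attendance TEXT)")
conn.executemany(
    "INSERT INTO matches VALUES (?, ?)",
    [
        ("1930", "1000"),
        ("1930", "3000"),
        ("1934", "5000"),
        ("1934", "7000"),
        ("1934", ""),  # assumed blank attendance, ignored by AVG and SUM
    ],
)

query = """
SELECT CAST(year AS INTEGER)                        AS match_year,
       COUNT(*)                                     AS matches_played,
       AVG(CAST(NULLIF(attendance, '') AS INTEGER)) AS avg_per_game,
       SUM(CAST(NULLIF(attendance, '') AS INTEGER)) AS total_attendance
FROM matches
GROUP BY CAST(year AS INTEGER)
ORDER BY match_year
"""

attendance_by_year = conn.execute(query).fetchall()
for row in attendance_by_year:
    print(row)
```

The other attendance questions follow the same pattern with a different grouping column, for example city or team name instead of year.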
A step-by-step guide is included in this documentation, as well as detailed information about the analysis process.
This project used data from the Maven Analytics Data Playground. They made it easy to search for insights by suggesting questions to explore.
The full documentation is on my Medium page. From the Ask Phase to the Share Phase, I discuss my entire process.
Yes, you can!
You'll find me:
- Posting memes or talking about data on Twitter
- Writing articles about complex data concepts and making them digestible on Medium
- Posting data visualization inspiration and data infographics on Instagram
Distributed under the no License. See LICENSE.txt for more information.
Please ⭐️ this repository if this project helped you, or buy me a coffee!