Skip to content

IsabelleSchuessler/SQL-Layoffs-Analysis

Repository files navigation

SQL-Layoffs-Analysis

πŸ’Ό Data Cleaning and Exploratory Analysis of Company Layoffs using MySQL

This portfolio project demonstrates real-world data cleaning and exploratory data analysis (EDA) using MySQL.
The dataset represents global company layoffs from 2020–2023, including total layoffs, company stage, country, and industry.
The goal was to prepare raw, messy data for analysis, extract insights, and identify patterns in layoffs across industries and years.


🧹 PART 1 & 2 – Data Cleaning in SQL

Files:

  • 17.Data Cleaning PROJECT PART 1.sql
  • 17.Data Cleaning PROJECT PART 2.sql

πŸ”§ Steps Performed:

  1. Created staging tables to protect the raw dataset while performing transformations.
  2. Removed duplicates using ROW_NUMBER() and Common Table Expressions (CTE).
  3. Standardized text fields:
    • Trimmed extra spaces in company names.
    • Unified variations like β€œCrypto%” β†’ β€œCrypto”.
    • Cleaned country names (β€œUnited States.” β†’ β€œUnited States”).
  4. Converted date column from text to proper DATE format using STR_TO_DATE().
  5. Handled missing data:
    • Filled missing industries using company matches.
    • Removed rows with null values for both total_laid_off and percentage_laid_off.
  6. Dropped unnecessary columns after cleaning (e.g., helper columns for row numbers).

βœ… Results:

  • Dataset became fully clean and ready for analysis.
  • All text fields standardized, duplicates removed, and date formats corrected.

πŸ” PART 3 – Exploratory Data Analysis (EDA)

File:

  • 18.MySQL Exploratory Data Analysis Full Project.sql

πŸ“Š Analysis Goals:

  • Identify which companies and industries had the highest layoffs.
  • Examine layoff trends by year, month, country, and funding stage.
  • Find top 5 companies with most layoffs each year.
  • Calculate rolling totals of layoffs over time.

πŸ“ˆ Techniques Used:

  • GROUP BY, ORDER BY for aggregations.
  • Window functions (DENSE_RANK(), SUM() OVER) for trend ranking.
  • Common Table Expressions (CTEs) for organizing intermediate results.

🧠 Key Insights:

  • The Tech industry had the highest total layoffs overall.
  • Layoffs peaked between 2022–2023, with a strong correlation to economic slowdowns.
  • The United States accounted for most layoffs globally.
  • Companies in the Late-Stage and Post-IPO phases were most affected.

βš™οΈ Tools & Skills Demonstrated

  • MySQL Workbench / SQL Server Management Studio
  • Data Cleaning, Standardization, and Validation
  • Exploratory Data Analysis (EDA)
  • CTEs, Window Functions, Aggregations
  • Data Storytelling through SQL

πŸ“˜ Example Query (Top 5 Companies per Year)

WITH Company_Year AS (
    SELECT company, YEAR(`date`) AS years, SUM(total_laid_off) AS total_laid_off
    FROM layoffs_staging2
    GROUP BY company, YEAR(`date`)
),
Company_Year_Rank AS (
    SELECT *,
           DENSE_RANK() OVER (PARTITION BY years ORDER BY total_laid_off DESC) AS Ranking
    FROM Company_Year
    WHERE years IS NOT NULL
)
SELECT *
FROM Company_Year_Rank
WHERE Ranking <= 5;


---
## 🧠 What I Learned

How to clean real-world messy data using SQL.

Importance of standardizing fields before analysis.

How to summarize trends by company, country, and time period.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published