💾 Data Cleaning using SQL

📄 Project Description

This project focuses on cleaning a real-world employee layoffs dataset sourced from Kaggle. The raw data was imported into MySQL, where several SQL operations were used to clean and prepare the dataset for analysis.

The objective was to resolve issues in the raw data such as:

- Duplicate records
- Inconsistent text formatting
- Missing or null values
- Incorrect data types

🛠️ Tools Used

- MySQL
- MySQL Workbench
- Kaggle (for data source)

🧹 Data Cleaning Workflow

✅ 1. Duplicate Removal

- Identified duplicate rows by numbering the rows within each group of identical values, using ROW_NUMBER() in a CTE.
- Kept the first occurrence in each group (row number 1) and deleted the rest.
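
A minimal sketch of this step, assuming MySQL 8.0 or later (needed for window functions). The table names layoffs_staging and layoffs_staging2 are hypothetical, and the PARTITION BY list uses only columns named in this README; in practice it should cover every column that defines a duplicate. MySQL does not allow deleting rows through a CTE, so the sketch materializes the row numbers into a new table before deleting.

```sql
-- Preview the duplicates: any row numbered above 1 repeats an earlier row.
-- (layoffs_staging is an assumed name for the working copy of the raw data.)
WITH duplicate_cte AS (
    SELECT s.*,
           ROW_NUMBER() OVER (
               PARTITION BY s.company, s.industry, s.total_laid_off,
                            s.percentage_laid_off, s.`date`
           ) AS row_num
    FROM layoffs_staging AS s
)
SELECT *
FROM duplicate_cte
WHERE row_num > 1;

-- Materialize the row numbers into a second table (assumed name) so the
-- duplicates can actually be deleted.
CREATE TABLE layoffs_staging2 AS
SELECT s.*,
       ROW_NUMBER() OVER (
           PARTITION BY s.company, s.industry, s.total_laid_off,
                        s.percentage_laid_off, s.`date`
       ) AS row_num
FROM layoffs_staging AS s;

-- Keep row_num = 1 in each group and delete the rest.
-- Note: MySQL Workbench's safe-update mode may block this statement
-- because the WHERE clause does not use a key column.
DELETE FROM layoffs_staging2
WHERE row_num > 1;
```

Working on a copy rather than the imported table keeps the raw data available if a cleaning step needs to be redone.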

✅ 2. Data Standardization

- Trimmed leading and trailing whitespace from the company column.
- Replaced inconsistent values, for example:
  - "united states" → "United States"
  - "crypto currency" → "crypto"
- Converted the date column from TEXT to DATE by parsing the values with STR_TO_DATE() and then changing the column type with ALTER TABLE.
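
The standardization steps above could look roughly like the following. This is a sketch under assumptions: layoffs_staging2 is a hypothetical table name, country is an assumed column name for the "United States" values, and the '%m/%d/%Y' format string is a guess at how the raw dates are written; adjust it to match the actual data.

```sql
-- Remove leading and trailing spaces from company names.
UPDATE layoffs_staging2
SET company = TRIM(company);

-- Collapse variant spellings into one canonical value.
-- (country is an assumed column name.)
UPDATE layoffs_staging2
SET country = 'United States'
WHERE country = 'united states';

UPDATE layoffs_staging2
SET industry = 'crypto'
WHERE industry = 'crypto currency';

-- Rewrite the text dates in MySQL's standard date form.
-- The format string is an assumption about the raw data.
UPDATE layoffs_staging2
SET `date` = STR_TO_DATE(`date`, '%m/%d/%Y');

-- With every value now parseable, change the column type itself.
ALTER TABLE layoffs_staging2
MODIFY COLUMN `date` DATE;
```

The order matters here: the values are rewritten first, so the later type change does not fail on strings MySQL cannot read as dates.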

✅ 3. Handling Null & Missing Values

- Replaced empty strings in industry with NULL so that all missing values are represented the same way.
- Used a self-join to fill in missing industry values from other rows of the same company.
- Removed rows where both total_laid_off and percentage_laid_off were null.
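
As a rough illustration of these three steps, again assuming a hypothetical table named layoffs_staging2:

```sql
-- Treat empty strings as missing values.
UPDATE layoffs_staging2
SET industry = NULL
WHERE industry = '';

-- Self-join: for each row with no industry, copy the industry from
-- another row of the same company that has one.
UPDATE layoffs_staging2 AS t1
JOIN layoffs_staging2 AS t2
    ON t1.company = t2.company
SET t1.industry = t2.industry
WHERE t1.industry IS NULL
  AND t2.industry IS NOT NULL;

-- Rows with neither layoff figure carry no usable information.
DELETE FROM layoffs_staging2
WHERE total_laid_off IS NULL
  AND percentage_laid_off IS NULL;
```

Converting empty strings to NULL first lets the self-join test a single condition (IS NULL). Companies with no populated industry in any row stay NULL after this step.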

✅ 4. Final Cleanup

Dropped helper columns like row_num used during the cleaning process.
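
For example, assuming the cleaned data lives in a hypothetical table named layoffs_staging2:

```sql
-- row_num was only needed to find duplicates; remove it from the final table.
ALTER TABLE layoffs_staging2
DROP COLUMN row_num;

-- Optional check that the column is gone.
DESCRIBE layoffs_staging2;
```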

📌 Outcome

The dataset is now clean, consistent, and ready for:

- Exploratory Data Analysis (EDA)
- Dashboard development (e.g., in Power BI, Tableau)
- Reporting and visualization

aravind178/Data_cleaning_using_SQL
