Welcome to the Data Cleaning in SQL repository! This project demonstrates the application of SQL techniques to clean and prepare raw datasets for analysis. It serves as a practical example of how to transform messy data into structured, reliable information using SQL.
In this project, I focused on cleaning a raw dataset by addressing common data quality issues such as:
- Removing duplicates
- Handling missing or NULL values
- Standardizing data formats
- Correcting data inconsistencies
The goal was to prepare the dataset for further analysis, ensuring its integrity and reliability.
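The four cleaning steps above can be sketched end to end on a toy table. This is a minimal illustration, not the queries from the repository's script. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server. The customers table, its columns, the sample rows, and the 'unknown' placeholder for missing emails are all hypothetical.

```python
import sqlite3

# Hypothetical raw rows: an exact duplicate, a NULL email,
# inconsistent name casing/whitespace, and three spellings of one country.
RAW_ROWS = [
    (1, "  Alice ", "alice@example.com", "usa"),
    (1, "  Alice ", "alice@example.com", "usa"),
    (2, "BOB", None, "United States"),
    (3, "carol", "carol@example.com", "U.S.A."),
]


def clean(conn):
    """Load the sample rows into a hypothetical customers table and clean it."""
    conn.execute(
        "CREATE TABLE customers (id INTEGER, name TEXT, email TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", RAW_ROWS)

    # 1. Remove duplicates: keep the lowest rowid in each group of identical rows.
    #    rowid is SQLite-specific; in MySQL you would key on a primary key column.
    conn.execute(
        """
        DELETE FROM customers
        WHERE rowid NOT IN (
            SELECT MIN(rowid) FROM customers
            GROUP BY id, name, email, country
        )
        """
    )

    # 2. Handle missing values: replace NULL emails with a placeholder.
    conn.execute("UPDATE customers SET email = 'unknown' WHERE email IS NULL")

    # 3. Standardize formats: trim whitespace and capitalize only the first letter.
    #    SQLite concatenates with ||; MySQL would use CONCAT() instead.
    conn.execute(
        """
        UPDATE customers
        SET name = UPPER(SUBSTR(TRIM(name), 1, 1)) || LOWER(SUBSTR(TRIM(name), 2))
        """
    )

    # 4. Correct inconsistencies: map every variant of "USA" to one label.
    conn.execute(
        """
        UPDATE customers
        SET country = 'United States'
        WHERE UPPER(REPLACE(country, '.', '')) = 'USA'
        """
    )

    return conn.execute(
        "SELECT id, name, email, country FROM customers ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for row in clean(conn):
        print(row)
    conn.close()
```

Running the standardization steps before deduplication would also catch rows that differ only in casing or whitespace, which the exact-match GROUP BY above keeps as separate records.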
Tools used:
- SQL: used SQL queries for data manipulation and cleaning.
- MySQL Workbench: executed the SQL scripts and managed the database.
- CSV files: imported and exported data in CSV format.
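CSV has no notion of NULL, so a common pitfall when importing is that blank cells can arrive as empty strings, which IS NULL checks then miss. The sketch below loads CSV text into an all-TEXT staging table and stores blank cells as NULL. It is an assumption-laden illustration using Python's csv and sqlite3 modules rather than the MySQL Workbench import the project used; the load_csv helper and the raw_customers table name are hypothetical.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_text):
    """Load CSV text into a new staging table whose columns are all TEXT.

    Assumes the table name and header row come from a trusted file, because
    they are interpolated into the SQL (values still use ? placeholders).
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')

    # Blank cells arrive as empty strings; store them as NULL so that
    # IS NULL checks find them. Non-blank cells are kept untouched.
    rows = [[cell if cell.strip() else None for cell in row] for row in reader]
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample = "id,name,email\n1, Alice ,alice@example.com\n2,Bob,\n"
    print(load_csv(conn, "raw_customers", sample))
    print(conn.execute("SELECT * FROM raw_customers").fetchall())
    conn.close()
```

Loading everything as TEXT first keeps the raw values intact (including stray whitespace), so the trimming and type conversion can happen later in SQL where each change is visible.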
Project structure:
- Dataset/: contains the raw and cleaned datasets in CSV format.
- Data Cleaning SQL Project Queries.sql: SQL script with all the queries used for the data cleaning tasks.
Key SQL techniques:
- SELECT DISTINCT: to identify and remove duplicate records
- IS NULL / IS NOT NULL: to detect and handle missing values
- UPDATE: to correct data inconsistencies
- ALTER TABLE: to modify table structures when necessary
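The four statement types listed above can work together in one short pass. This is a minimal sketch, not the project's actual queries: it runs on Python's sqlite3 module, and the orders table, the amount_imputed flag column, and the choice to fill missing amounts with 0 are hypothetical. The SQL statements themselves use syntax that MySQL 8.0 should also accept, though that is not verified here.

```python
import sqlite3


def run_demo(conn):
    """Apply the four listed statement types to a hypothetical orders table."""
    conn.executescript(
        """
        CREATE TABLE orders (order_id INTEGER, status TEXT, amount REAL);
        INSERT INTO orders VALUES
            (10, 'Shipped', 25.0),
            (10, 'Shipped', 25.0),
            (11, ' shipped', NULL),
            (12, 'PENDING', 40.0);
        """
    )

    # SELECT DISTINCT: copy only unique rows into a new table,
    # leaving the raw table untouched for comparison.
    conn.execute("CREATE TABLE orders_clean AS SELECT DISTINCT * FROM orders")

    # IS NULL: measure how many amounts are missing before changing anything.
    missing_before = conn.execute(
        "SELECT COUNT(*) FROM orders_clean WHERE amount IS NULL"
    ).fetchone()[0]

    # ALTER TABLE: add a flag column so filled-in values stay traceable.
    conn.execute(
        "ALTER TABLE orders_clean ADD COLUMN amount_imputed INTEGER DEFAULT 0"
    )

    # UPDATE: fill missing amounts with 0 (a placeholder policy) and flag them.
    conn.execute(
        "UPDATE orders_clean SET amount = 0, amount_imputed = 1 WHERE amount IS NULL"
    )

    # UPDATE: normalize inconsistent status labels ("Shipped", " shipped").
    conn.execute("UPDATE orders_clean SET status = LOWER(TRIM(status))")

    # IS NOT NULL: confirm every row now has an amount.
    not_null_after = conn.execute(
        "SELECT COUNT(*) FROM orders_clean WHERE amount IS NOT NULL"
    ).fetchone()[0]

    rows = conn.execute(
        "SELECT order_id, status, amount, amount_imputed "
        "FROM orders_clean ORDER BY order_id"
    ).fetchall()
    return missing_before, not_null_after, rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    missing, present, rows = run_demo(conn)
    print(f"missing before: {missing}, present after: {present}")
    for row in rows:
        print(row)
    conn.close()
```

Writing the deduplicated rows to a new table, instead of deleting from the original, keeps the raw data available for checking the cleaning results.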
To replicate this project:
- Clone the repository:
git clone https://github.com/Shivangkus/Data-Cleaning-in-SQL.git
- Open the Data Cleaning SQL Project Queries.sql file in MySQL Workbench.
- Execute the SQL queries step by step to clean the dataset.
- Export the cleaned dataset and import it into your preferred analysis tool.
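Besides exporting from MySQL Workbench, the cleaned table can be written to CSV programmatically. The sketch below is a hedged example using Python's csv and sqlite3 modules; the export_csv helper and the customers_clean table are hypothetical.

```python
import csv
import io
import sqlite3


def export_csv(conn, query):
    """Run a query and return its result set as CSV text with a header row."""
    cur = conn.execute(query)
    out = io.StringIO()
    # lineterminator="\n" avoids the csv module's default "\r\n" line endings.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])  # column names
    writer.writerows(cur.fetchall())  # None (SQL NULL) is written as an empty field
    return out.getvalue()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers_clean (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO customers_clean VALUES (?, ?)", [(1, "Alice"), (2, None)]
    )
    print(export_csv(conn, "SELECT id, name FROM customers_clean ORDER BY id"), end="")
    conn.close()
```

Because NULL becomes an empty field, the exported file no longer distinguishes NULL from an empty string, so it helps to fill or flag missing values before exporting.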
Next Steps
After cleaning the data, you can proceed with:
- Exploratory Data Analysis (EDA): to uncover patterns and insights
- Data visualization: using tools like Tableau or Power BI
- Statistical analysis: for deeper understanding and modeling
License
This project is licensed under the MIT License.