In this project I clean a raw dataset and practice SQL commands.
This project demonstrates how to clean and transform raw housing data using SQL.
The dataset used is housing_data. The cleaning process involves:
- Handling missing values in propertyaddress.
- Splitting propertyaddress and owneraddress into separate columns (address, city, state).
- Standardizing values in the soldasvacant column (converting Y/N to Yes/No).
- Checking and ensuring no nulls in the saleprice column.
- Removing duplicate records based on multiple attributes.
- Creating a clean view of the dataset (clean_data).
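The steps above can be sketched roughly as follows. This is an illustration, not the contents of data_cleaning.sql. It assumes PostgreSQL, and the columns uniqueid, parcelid, saledate, and legalreference, the new column names (property_street and so on), and the sample rows are all hypothetical. Only housing_data, clean_data, propertyaddress, owneraddress, soldasvacant, and saleprice come from this project.

```sql
-- Hypothetical sample table so the sketch runs on its own (PostgreSQL assumed).
-- Skip this setup when working against the real housing_data table.
CREATE TEMP TABLE housing_data (
    uniqueid        integer PRIMARY KEY,  -- assumed row identifier
    parcelid        text,                 -- assumed parcel identifier
    propertyaddress text,
    owneraddress    text,
    saledate        date,                 -- assumed
    saleprice       numeric,
    legalreference  text,                 -- assumed
    soldasvacant    text
);
INSERT INTO housing_data VALUES
    (1, 'P-001', '12 Oak St, Nashville', '12 Oak St, Nashville, TN', '2015-03-01', 150000, 'REF-1', 'N'),
    (2, 'P-001', NULL,                   '12 Oak St, Nashville, TN', '2016-07-15', 175000, 'REF-2', 'Y'),
    (3, 'P-002', '9 Elm Ave, Franklin',  '9 Elm Ave, Franklin, TN',  '2014-01-20', 210000, 'REF-3', 'Yes'),
    (4, 'P-002', '9 Elm Ave, Franklin',  '9 Elm Ave, Franklin, TN',  '2014-01-20', 210000, 'REF-3', 'Yes');

-- 1. Fill a missing propertyaddress from another row with the same parcelid.
UPDATE housing_data AS a
SET propertyaddress = b.propertyaddress
FROM housing_data AS b
WHERE a.parcelid = b.parcelid
  AND a.uniqueid <> b.uniqueid
  AND a.propertyaddress IS NULL
  AND b.propertyaddress IS NOT NULL;

-- 2. Split the addresses, assuming 'street, city' and 'street, city, state'.
ALTER TABLE housing_data
    ADD COLUMN property_street text,
    ADD COLUMN property_city   text,
    ADD COLUMN owner_street    text,
    ADD COLUMN owner_city      text,
    ADD COLUMN owner_state     text;

UPDATE housing_data
SET property_street = trim(split_part(propertyaddress, ',', 1)),
    property_city   = trim(split_part(propertyaddress, ',', 2)),
    owner_street    = trim(split_part(owneraddress, ',', 1)),
    owner_city      = trim(split_part(owneraddress, ',', 2)),
    owner_state     = trim(split_part(owneraddress, ',', 3));

-- 3. Standardize soldasvacant: Y/N become Yes/No, other values are untouched.
UPDATE housing_data
SET soldasvacant = CASE soldasvacant WHEN 'Y' THEN 'Yes' ELSE 'No' END
WHERE soldasvacant IN ('Y', 'N');

-- 4. Count rows with a null saleprice; the goal is for this to be 0.
SELECT count(*) AS null_saleprice_rows
FROM housing_data
WHERE saleprice IS NULL;

-- 5. Remove duplicates: keep the lowest uniqueid in each group of rows that
--    match on every attribute listed in PARTITION BY.
DELETE FROM housing_data
WHERE uniqueid IN (
    SELECT uniqueid
    FROM (
        SELECT uniqueid,
               row_number() OVER (
                   PARTITION BY parcelid, propertyaddress, saledate,
                                saleprice, legalreference
                   ORDER BY uniqueid
               ) AS rn
        FROM housing_data
    ) AS ranked
    WHERE rn > 1
);

-- 6. Expose the cleaned columns through the clean_data view.
CREATE OR REPLACE VIEW clean_data AS
SELECT uniqueid, parcelid, property_street, property_city,
       owner_street, owner_city, owner_state,
       saledate, saleprice, legalreference, soldasvacant
FROM housing_data;
```

The string functions in step 2 are the part most likely to change between databases: split_part is PostgreSQL-specific, so MySQL or SQLite would need a different expression there. The UPDATE ... FROM form in step 1 is also not portable to MySQL.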
The final cleaned dataset can be exported as housing_clean.csv. The command depends on the database.
PostgreSQL (psql):
\copy (SELECT * FROM clean_data) TO 'housing_clean.csv' CSV HEADER;
MySQL:
SELECT *
INTO OUTFILE '/path/to/housing_clean.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM clean_data;
SQLite (sqlite3 shell):
.headers on
.mode csv
.output housing_clean.csv
SELECT * FROM clean_data;
.output stdout
Project files:
- data_cleaning.sql: SQL script for cleaning the Nashville housing dataset.
- housing_clean.csv: Final cleaned dataset (to be generated after running the export).
Data Cleaning SQL Project