This project focuses on cleaning, transforming, and organizing a dataset of meteorite landings using SQLite. It demonstrates how to handle real-world data issues such as missing values, inconsistent formatting, and the need for meaningful sorting and indexing.
The data comes from a CSV file containing meteorite landings, with the following key fields:
id: ID of the meteorite operationname: Name of the meteoriteclass: Classificationmass: Mass (in grams)year: Year of falllat: Geographic coordinateslong: Geographic coordinatesnametype: Should be used to filter out irrelevant entries
- Convert empty values to NULL for the fields:
mass,year,lat, andlong. - Round decimal columns (mass, lat, long) to the nearest hundredths place (e.g., 70.4777 β 70.48).
- Filter out meteorites with
nametype= 'Relict', as these are not considered relevant falls. - Extract the
yearfrom the full date if necessary, and ensure it's stored as an integer. - Assign unique
IDs starting from 1 based on:year(ascending),name(alphabetical) within the same year - Store the cleaned and transformed data in a final meteorites table, ready for analysis or visualization.
- SQLite 3
- SQL window functions (ROW_NUMBER())
- CSV import and manipulation
- Data wrangling techniques
meteorites.csv(original dataset)import.sql(script to import CSV and create temp table)transform.sql(SQL script to clean and insert into final table)README.md(Project documentation)
- Using the
import.sqlimport the CSV into a temporary table (temp_table). Be careful when using the ".import --csv --skip 1 meteorites.csv temp_table" command, as you will need to run it directly in the terminal. This is because SQL treats it as text in the import.sql file. - Run the transformation script (
transform.sql) to clean and insert into the final meteorites table. - Query the meteorites table for sorted, cleaned data ready for analysis.