The Iris, Titanic, and other seaborn example datasets in .db
format for practicing SQL queries, exploratory analysis, machine learning, etc.
Each database starts with an Observation
table, which contains only boolean and numeric types. Text data are split off into related tables.
Querying the Iris dataset with Python:
import sqlite3
import pandas as pd
db = sqlite3.connect('iris.db')
sql = """
SELECT
O.petal_length,
O.petal_width,
O.sepal_length,
O.sepal_width,
S.species
FROM Observation AS O
JOIN Species AS S ON S.species_id = O.species_id
"""
df = pd.read_sql_query(sql, db)
df.head()
Result:
petal_length | petal_width | sepal_length | sepal_width | species | |
---|---|---|---|---|---|
0 | 1.4 | 0.2 | 5.1 | 3.5 | setosa |
1 | 1.4 | 0.2 | 4.9 | 3.0 | setosa |
2 | 1.3 | 0.2 | 4.7 | 3.2 | setosa |
3 | 1.5 | 0.2 | 4.6 | 3.1 | setosa |
4 | 1.4 | 0.2 | 5.0 | 3.6 | setosa |
Table | Column | Type |
---|---|---|
Dataset | dataset_id | BIGINT |
Dataset | dataset | TEXT |
Observation | x | FLOAT |
Observation | y | FLOAT |
Observation | dataset_id | BIGINT |
Table | Column | Type |
---|---|---|
Attention | attention_id | BIGINT |
Attention | attention | TEXT |
Observation | "Unnamed: 0" | BIGINT |
Observation | subject | BIGINT |
Observation | solutions | BIGINT |
Observation | score | FLOAT |
Observation | attention_id | BIGINT |
Table | Column | Type |
---|---|---|
Abbrev | abbrev_id | BIGINT |
Abbrev | abbrev | TEXT |
Observation | total | FLOAT |
Observation | speeding | FLOAT |
Observation | alcohol | FLOAT |
Observation | not_distracted | FLOAT |
Observation | no_previous | FLOAT |
Observation | ins_premium | FLOAT |
Observation | ins_losses | FLOAT |
Observation | abbrev_id | BIGINT |
Table | Column | Type |
---|---|---|
Clarity | clarity_id | BIGINT |
Clarity | clarity | TEXT |
Color | color_id | BIGINT |
Color | color | TEXT |
Cut | cut_id | BIGINT |
Cut | cut | TEXT |
Observation | carat | FLOAT |
Observation | depth | FLOAT |
Observation | "table" | FLOAT |
Observation | price | BIGINT |
Observation | x | FLOAT |
Observation | y | FLOAT |
Observation | z | FLOAT |
Observation | cut_id | BIGINT |
Observation | color_id | BIGINT |
Observation | clarity_id | BIGINT |
Table | Column | Type |
---|---|---|
Align | align_id | BIGINT |
Align | align | TEXT |
Choice | choice_id | BIGINT |
Choice | choice | TEXT |
Observation | time | BIGINT |
Observation | coherence | FLOAT |
Observation | firing_rate | FLOAT |
Observation | align_id | BIGINT |
Observation | choice_id | BIGINT |
Table | Column | Type |
---|---|---|
Diet | diet_id | BIGINT |
Diet | diet | TEXT |
Kind | kind_id | BIGINT |
Kind | kind | TEXT |
Observation | "Unnamed: 0" | BIGINT |
Observation | id | BIGINT |
Observation | pulse | BIGINT |
Observation | diet_id | BIGINT |
Observation | time_id | BIGINT |
Observation | kind_id | BIGINT |
Time | time_id | BIGINT |
Time | time | TEXT |
Table | Column | Type |
---|---|---|
Month | month_id | BIGINT |
Month | month | TEXT |
Observation | year | BIGINT |
Observation | passengers | BIGINT |
Observation | month_id | BIGINT |
Table | Column | Type |
---|---|---|
Event | event_id | BIGINT |
Event | event | TEXT |
Observation | timepoint | BIGINT |
Observation | signal | FLOAT |
Observation | subject_id | BIGINT |
Observation | event_id | BIGINT |
Observation | region_id | BIGINT |
Region | region_id | BIGINT |
Region | region | TEXT |
Subject | subject_id | BIGINT |
Subject | subject | TEXT |
Table | Column | Type |
---|---|---|
Observation | timepoint | FLOAT |
Observation | subject | BIGINT |
Observation | "BOLD signal" | FLOAT |
Observation | "ROI_id" | BIGINT |
Roi | "ROI_id" | BIGINT |
Roi | "ROI" | TEXT |
Table | Column | Type |
---|---|---|
Observation | sepal_length | FLOAT |
Observation | sepal_width | FLOAT |
Observation | petal_length | FLOAT |
Observation | petal_width | FLOAT |
Observation | species_id | BIGINT |
Species | species_id | BIGINT |
Species | species | TEXT |
Table | Column | Type |
---|---|---|
Name | name_id | BIGINT |
Name | name | TEXT |
Observation | mpg | FLOAT |
Observation | cylinders | BIGINT |
Observation | displacement | FLOAT |
Observation | horsepower | FLOAT |
Observation | weight | BIGINT |
Observation | acceleration | FLOAT |
Observation | model_year | BIGINT |
Observation | origin_id | BIGINT |
Observation | name_id | BIGINT |
Origin | origin_id | BIGINT |
Origin | origin | TEXT |
Table | Column | Type |
---|---|---|
Method | method_id | BIGINT |
Method | method | TEXT |
Observation | number | BIGINT |
Observation | orbital_period | FLOAT |
Observation | mass | FLOAT |
Observation | distance | FLOAT |
Observation | year | BIGINT |
Observation | method_id | BIGINT |
Table | Column | Type |
---|---|---|
Day | day_id | BIGINT |
Day | day | TEXT |
Observation | total_bill | FLOAT |
Observation | tip | FLOAT |
Observation | size | BIGINT |
Observation | sex_id | BIGINT |
Observation | smoker_id | BIGINT |
Observation | day_id | BIGINT |
Observation | time_id | BIGINT |
Sex | sex_id | BIGINT |
Sex | sex | TEXT |
Smoker | smoker_id | BIGINT |
Smoker | smoker | TEXT |
Time | time_id | BIGINT |
Time | time | TEXT |
Table | Column | Type |
---|---|---|
Alive | alive_id | BIGINT |
Alive | alive | TEXT |
Class | class_id | BIGINT |
Class | class | TEXT |
Deck | deck_id | BIGINT |
Deck | deck | TEXT |
EmbarkTown | embark_town_id | BIGINT |
EmbarkTown | embark_town | TEXT |
Embarked | embarked_id | BIGINT |
Embarked | embarked | TEXT |
Observation | survived | BIGINT |
Observation | pclass | BIGINT |
Observation | age | FLOAT |
Observation | sibsp | BIGINT |
Observation | parch | BIGINT |
Observation | fare | FLOAT |
Observation | adult_male | BOOLEAN |
Observation | alone | BOOLEAN |
Observation | sex_id | BIGINT |
Observation | embarked_id | BIGINT |
Observation | class_id | BIGINT |
Observation | who_id | BIGINT |
Observation | deck_id | BIGINT |
Observation | embark_town_id | BIGINT |
Observation | alive_id | BIGINT |
Sex | sex_id | BIGINT |
Sex | sex | TEXT |
Who | who_id | BIGINT |
Who | who | TEXT |
These databases were created by converting the .csv files in the seaborn data repository to SQLite using Pandas and SQLAlchemy. The datasets themselves are all either open source or in the public domain.
Conversion and documentation by David James Knight.