🖼️ Art Analytics

Analyse how style, rarity, nationality and other factors influence the market value of famous paintings using MySQL and Python.

Overview

This project loads a famous_paintings relational database and puts six economic hypotheses to the test:

#	Hypothesis	Short description
1	The Value of Style	Certain art movements (e.g. Classicism, Cubism) systematically fetch higher prices.
2	The Value of Rarity	Scarce styles command a premium over saturated ones.
3	The Nationality Premium	Does an artist’s nationality influence price within the same movement?
4	Size Matters (todo)	Are larger canvases worth more?
5	Scarcity & Lifespan	Shorter-lived artists may generate scarcity and higher prices.
6	Subject Matter & Price	Some themes (e.g. biblical scenes) outperform others.

The heavy lifting happens in:

sql/sql-database.sql – sets up the schema, loads the data, and defines the window‑function queries for each hypothesis.
notebooks/famous_art_data_visualizations.ipynb – runs the queries, cleans the output with pandas, and visualises results with matplotlib/seaborn.
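As a rough illustration of the kind of window-function query involved, the sketch below ranks styles by average sale price. It is not taken from sql/sql-database.sql. It runs against an in-memory SQLite database (3.25+) as a stand-in for MySQL so it can be tried without a server. The column names (work_id, style, sale_price) and the toy rows are assumptions made for the example.

```python
import sqlite3

# Toy stand-in for the famous_paintings database.
# Column names and all rows are assumptions, not real dataset values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE work (work_id INTEGER PRIMARY KEY, style TEXT);
    CREATE TABLE product_size (work_id INTEGER, sale_price REAL);
    INSERT INTO work VALUES
        (1, 'Classicism'), (2, 'Classicism'),
        (3, 'Cubism'),
        (4, 'Impressionism'), (5, 'Impressionism');
    INSERT INTO product_size VALUES
        (1, 1000), (2, 1200), (3, 900), (4, 500), (5, 700);
""")

# Aggregate first, then rank the aggregated rows with a window function.
QUERY = """
WITH style_avg AS (
    SELECT w.style, AVG(p.sale_price) AS avg_price
    FROM work AS w
    JOIN product_size AS p ON p.work_id = w.work_id
    GROUP BY w.style
)
SELECT style,
       avg_price,
       RANK() OVER (ORDER BY avg_price DESC) AS price_rank
FROM style_avg
ORDER BY price_rank;
"""

rows = conn.execute(QUERY).fetchall()
for style, avg_price, price_rank in rows:
    print(f"{price_rank}. {style}: {avg_price:.2f}")
conn.close()
```

The same CTE plus RANK() pattern should carry over to MySQL 8.0+, since both engines support window functions, though the project's actual queries may be structured differently.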

About Brushline Analytics

Brushline Analytics is the data‑science studio behind this repository. Our mission is smarter art investing through data: we turn historical auction records into actionable signals for collectors, galleries, and funds. This repo is one of several open‑core components we maintain to showcase our engineering practices and to help prospective teammates explore our stack before interview day.

Engineering principles

Clarity first – readable code, rigorous tests, explanatory notebooks.
Automation over toil – Makefile tasks, GitHub Actions CI, dbt for repeatable transformations.
Own the entire flow – from raw CSV to interactive dashboards.

If you’re evaluating us as a potential employer, jump to Contributing to see how we work day‑to‑day.

1 Clone & create an environment

$ git clone https://github.com/<your-handle>/famous-art-analytics.git
$ cd famous-art-analytics
$ python -m venv .venv
$ source .venv/bin/activate   # Windows: .venv\Scripts\activate
$ pip install -r requirements.txt

2 Set up the MySQL database

$ mysql -u <user> -p < sql/sql-database.sql

This creates a famous_paintings database populated with tables such as artist, work, subject, product_size, canvas_size and more.

3 Run the analysis notebook

$ jupyter lab notebooks/famous_art_data_visualizations.ipynb

Execute the cells; feel free to tweak queries or chart styles and re‑run.

Repository Layout

.
├── notebooks/
│   └── famous_art_data_visualizations.ipynb
├── sql/
│   └── sql-database.sql
├── data/            # raw CSV dump (git‑ignored)
├── reports/         # exported charts and tables
└── README.md

Requirements

Python 3.10+
MySQL 8.0+ (or MariaDB 10.5+)
Python packages (see requirements.txt): pandas, numpy, sqlalchemy, ipython-sql, matplotlib, seaborn, jupyterlab

Analysis Workflow

Load the SQL schema and sample data.
Execute the hypothesis queries directly in the MySQL client or from the notebook.
Pull the results into pandas dataframes.
Build descriptive and inferential visualisations (bar charts, box plots, scatter plots).
Export high-resolution figures to reports/ for use in blogs or slide decks.
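The export step can be sketched with the standard library alone. The notebook itself works with pandas and matplotlib; this example swaps in the csv module to stay dependency-free and only covers exporting a result table. The helper name export_table and the file name used below are hypothetical.

```python
import csv
from pathlib import Path


def export_table(rows, header, out_dir, filename):
    """Write query rows to out_dir/filename as CSV and return the file path."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    target = out_path / filename
    with target.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)   # column names first
        writer.writerows(rows)    # then one line per result row
    return target
```

A call such as export_table(rows, ["style", "avg_price"], "reports", "style_avg_price.csv") would place the table next to the exported charts.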

Findings Preview

Hypothesis	Early signal
Style	Classicism tops the average price list, followed by American Landscape and Orientalism.
Rarity	Styles with <100 works often out‑earn the mainstream genres.
Nationality	Within Impressionism, American painters’ median sale price is ~1.6× French contemporaries.
Lifespan	Works by artists who lived 65–74 years fetch the highest average price.
Subject	Biblical scenes and seascapes outperform portraits.

Landscape, Maritime, and Architecture paintings command the highest average prices, while Still Life and Floral works consistently underperform.
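To make the nationality signal concrete, here is a minimal sketch of how a median price ratio between two groups within one movement can be computed. The prices are invented for illustration and are not dataset values, so the resulting ratio says nothing about the real figure. The helper median_by_group is hypothetical.

```python
from collections import defaultdict
from statistics import median

# Invented (nationality, sale_price) pairs for a single movement.
sales = [
    ("American", 800), ("American", 1000), ("American", 1200),
    ("French", 400), ("French", 500), ("French", 900),
]


def median_by_group(pairs):
    """Return {group: median price} for an iterable of (group, price) pairs."""
    grouped = defaultdict(list)
    for group, price in pairs:
        grouped[group].append(price)
    return {group: median(prices) for group, prices in grouped.items()}


medians = median_by_group(sales)
premium = medians["American"] / medians["French"]
print(f"American/French median ratio: {premium:.2f}x")
```

Medians are used rather than means so that a single record-breaking sale does not dominate the comparison.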

Roadmap

Make it easy to run locally (Docker later if needed) — Owner: @chandlershortlidge
Organize the SQL into reusable chunks — Owner: @chandlershortlidge
Add a basic dashboard with a few charts — Owner: @chandlershortlidge

Contributing

Pull requests are welcome! Please open an issue to discuss any substantial changes first. Make sure that:

pre-commit run --all-files passes
Your SQL passes sqlfluff lint

License

MIT License

Acknowledgements

Kaggle’s Famous Paintings dataset
The r/DataIsBeautiful community for visualisation inspiration
