Analyse how style, rarity, nationality and other factors influence the market value of famous paintings using MySQL and Python.
- Overview
- Quick Start
- Repository Layout
- Requirements
- Analysis Workflow
- Findings Preview
- Roadmap
- Contributing
- License
- Acknowledgements
This project loads a famous_paintings relational database and puts six economic hypotheses to the test:
# | Hypothesis | Short description |
---|---|---|
1 | The Value of Style | Certain art movements (e.g. Classicism, Cubism) systematically fetch higher prices. |
2 | The Value of Rarity | Scarce styles command a premium over saturated ones. |
3 | The Nationality Premium | Does an artist’s nationality influence price within the same movement? |
4 | **Size Matters **(todo) | Are larger canvases worth more? |
5 | Scarcity & Lifespan | Shorter-lived artists may generate scarcity and higher prices. |
6 | Subject Matter & Price | Some themes (e.g. biblical scenes) outperform others. |
The heavy lifting happens in:
sql/sql-database.sql
– sets up the schema, loads data, and declares window‑function queries for each hypothesis.notebooks/famous_art_data_visualizations.ipynb
– runs the queries, cleans the output withpandas
, and visualises results withmatplotlib
/seaborn
.
Brushline Analytics is the data‑science studio behind this repository. Our mission is smarter art investing through data: we turn historical auction records into actionable signals for collectors, galleries, and funds. This repo is one of several open‑core components we maintain to showcase our engineering practices and to help prospective teammates explore our stack before interview day.
- Clarity first – readable code, rigorous tests, explanatory notebooks.
- Automation over toil – Makefile tasks, GitHub Actions CI, dbt for repeatable transformations.
- Own the entire flow – from raw CSV to interactive dashboards.
If you’re evaluating us as a potential employer, jump to Contributing to see how we work day‑to‑day.
$1
$ git clone https://github.com/<your-handle>/famous-art-analytics.git
$ cd famous-art-analytics
$ python -m venv .venv
$ source .venv/bin/activate # Windows: .venv\Scripts\activate
$ pip install -r requirements.txt
$ mysql -u <user> -p < sql/sql-database.sql
This creates a famous_paintings database populated with tables such as artist
, work
, subject
, product_size
, canvas_size
and more.
$ jupyter lab notebooks/famous_art_data_visualizations.ipynb
Execute the cells; feel free to tweak queries or chart styles and re‑run.
.
├── notebooks/
│ └── famous_art_data_visualizations.ipynb
├── sql/
│ └── sql-database.sql
├── data/ # raw CSV dump (git‑ignored)
├── reports/ # exported charts and tables
└── README.md
- Python 3.10+
- MySQL 8.0+ (or MariaDB 10.5+)
- Python packages (see
requirements.txt
):pandas
,numpy
,sqlalchemy
,ipython-sql
,matplotlib
,seaborn
,jupyterlab
- Load the SQL schema and sample data.
- Execute the hypothesis queries directly in MySQL client or from the notebook.
- Pull the results into
pandas
dataframes. - Build descriptive and inferential visualisations (bar charts, box plots, scatter plots).
- Export high‑resolution figures to /reports for use in blogs or slide decks.
Hypothesis | Early signal |
---|---|
Style | Classicism tops the average price list, followed by American Landscape and Orientalism. |
Rarity | Styles with <100 works often out‑earn the mainstream genres. |
Nationality | Within Impressionism, American painters’ median sale price is ~1.6× French contemporaries. |
Lifespan | Works by artists who lived 65–74 years fetch the highest average price. |
Subject | Biblical scenes and seascapes outperform portraits. |
Landscape, Maritime, and Architecture paintings command the highest average prices, while Still Life and Floral works consistently underperform.
- Make it easy to run locally (Docker later if needed) — Owner: @chandlershortlidge
- Organize the SQL into reusable chunks — Owner: @chandlershortlidge
- Add a basic dashboard with a few charts — Owner: @chandlershortlidge
Pull requests are welcome! Please open an issue to discuss any substantial changes first. Make sure that:
pre-commit run --all-files
passes- Your SQL complies with
sqlfluff
MIT License
- Kaggle’s Famous Paintings dataset
- The r/DataIsBeautiful community for visualisation inspiration