Instead of treating LEGO sets as toys, this project treats the entire Rebrickable schema as a multi-decade industrial dataset. The goal was to write complex relational queries and Python aggregation scripts to map exactly how LEGO transitioned from a mid-century plastics manufacturer into a 21st-century global intellectual property powerhouse.
- The IP Monopolization Test: Statistically isolated the "Star Wars Effect" to determine exactly how heavily LEGO relies on licensed cinematic universes over internal themes in the modern era.
- The Complexity Scaling Index: Evaluated how the average brick count per individual set has transformed over time to track LEGO’s strategic pivot toward adult collectors (AFOLs).
- Color Palette Diversification: Charted the chronological expansion of the manufacturing color spectrum from standard solid primary bricks to modern translucent and specialized chemical finishes.
- SQL Advanced Windowing: Engineered analytical views utilizing
ROW_NUMBER(),OVER(PARTITION BY), and conditional aggregations (CASE WHEN) to isolate year-over-year theme dominance without relying on slow subqueries. - Python Data Merging: Utilized Pandas to join highly granular, multi-table structural dimensions (inventories, colors, sets, and themes) into unified data frames for rapid exploration.
- Relational Schema Mapping: Decoded complex relational dependencies where missing parent-theme IDs required explicit left-join logic to maintain data integrity.
- Star Wars Hegemony: In the 21st century, licensed Star Wars sets captured an incredibly disproportionate share of total set production, cementing it as a foundational economic driver for the brand.
- Palette Explosion: The variety of unique parts and specialized colors scaled exponentially post-2000, shifting away from a minimalist inventory model to support highly detailed, collector-grade product lines.
-
LEGO_Data_Analysis.sql— Optimized relational database scripts, view creation logic, and analytical window queries. -
LEGO Data Analysis - Answered.ipynb— Production-ready Jupyter notebook containing the full data workflow from ingestion to final aggregation.