# Lego set analysis

![legoswalk.png](imgs/legoswalk.png)

## Examining the history of Lego sets

The Rebrickable dataset includes data on every LEGO set that has ever been sold: the names of the sets, the bricks they contain, and more. The bricks might be small, but this is big data! In this project, you will use this dataset together with the pandas library to dig into the history of LEGO sets, including the percentage of all sets that are Star Wars themed.

## Where did I get my data from?

![schema_v3.png](imgs/schema_v3.png)

The LEGO parts, sets, colors, and inventories of every official LEGO set in the Rebrickable database are available for download as CSV files at https://rebrickable.com/downloads/. These files are automatically updated daily. If you need more detail, you can use the API, which provides real-time data but has rate limits that prevent bulk downloading.
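
Both `sets.csv` and `themes.csv` carry a `name` column, so joining them with pandas yields suffixed columns (`name_x` for the left table, `name_y` for the right). The following is a minimal sketch of that behaviour on made-up rows; the set numbers and names are hypothetical and are not taken from the Rebrickable files.

```python
import io

import pandas as pd

# Made-up rows for illustration only; not real Rebrickable data.
sets_csv = io.StringIO(
    "set_num,name,year,theme_id\n"
    "A-1,Toy Starfighter,1999,1\n"
    "B-1,Toy Castle,2000,2\n"
)
themes_csv = io.StringIO(
    "id,name\n"
    "1,Star Wars\n"
    "2,Castle\n"
)

toy_sets = pd.read_csv(sets_csv)
toy_themes = pd.read_csv(themes_csv)

# A left join keeps every set, even one whose theme_id has no match in themes.
toy_merged = toy_sets.merge(
    toy_themes, left_on="theme_id", right_on="id", how="left"
)

# The clashing 'name' columns are disambiguated with the default suffixes.
print(list(toy_merged.columns))
# ['set_num', 'name_x', 'year', 'theme_id', 'id', 'name_y']
```

Passing `suffixes=("_set", "_theme")` to `merge` is one way to get more readable column names directly, instead of renaming afterwards.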



In [3]:
import pandas as pd

# Show all columns when printing a DataFrame
pd.set_option('display.max_columns', None)

# Load the datasets
lego_sets = pd.read_csv('data/sets.csv').dropna(subset=["set_num"])
themes = pd.read_csv('data/themes.csv')

# Merge sets with themes to get theme names
merged_table = lego_sets.merge(
    themes[["id", "name"]],
    left_on="theme_id", right_on="id", how="left"
)
#print(merged_table)
#print(merged_table.columns)

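# Keep only the columns of interest and give them descriptive names
# (name_x is the set name from sets.csv, name_y is the theme name from themes.csv)
sets_with_themes = merged_table[["set_num", "name_x", "year", "name_y"]].rename(
    columns={"name_x": "set_name", "name_y": "theme_name"}
)

# Filter to sets whose theme is named exactly 'Star Wars'
star_wars_sets = sets_with_themes[sets_with_themes["theme_name"] == "Star Wars"]

# Percentage of Star Wars sets among all sets (int() truncates the decimals)
the_force = int(len(star_wars_sets) / len(sets_with_themes) * 100)
print(the_force)

# Count Star Wars sets by year
star_wars_by_year = star_wars_sets.groupby("year")["set_num"].count()
print(star_wars_by_year)

# Year with the most Star Wars sets (idxmax returns the first year if several tie)
new_era = star_wars_by_year.idxmax()
print(new_era)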


3
year
1999    14
2000    25
2001    12
2002    28
2003    25
2004    22
2005    27
2006    11
2007    15
2008    22
2009    35
2010    28
2011    33
2012    42
2013    42
2014    47
2015    69
2016    63
2017    64
2018    69
2019    57
2020    46
2021    37
2022    40
2023    47
2024    55
2025    48
2026    12
Name: set_num, dtype: int64
2015
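
Note that the printed counts show 2015 and 2018 tied at 69 sets, and `idxmax()` reports only the first of them. As a rough sketch of how all tied years could be listed, the snippet below rebuilds a small Series from a few of the counts printed above (the variable names are illustrative, not part of the original analysis).

```python
import pandas as pd

# A few yearly counts copied from the printed output above.
counts = pd.Series(
    {2014: 47, 2015: 69, 2016: 63, 2017: 64, 2018: 69, 2019: 57},
    name="set_num",
)
counts.index.name = "year"

# idxmax() returns the label of the first occurrence of the maximum.
first_peak = counts.idxmax()

# Comparing against the maximum keeps every year that reaches it.
all_peaks = counts[counts == counts.max()].index.tolist()

print(first_peak)  # 2015
print(all_peaks)   # [2015, 2018]
```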
