1. Introduction
In this project, we will analyze a fascinating dataset on every Lego block that has ever been built!

2. Reading Data
A comprehensive database of Lego blocks is provided by Rebrickable. The data is available as CSV files.

In [1]:
# Import modules
import pandas as pd

# Read colors data
colors = pd.read_csv('colors.csv')

# Print the first few rows
colors.head()

   id            name     rgb  is_trans
0  -1       [Unknown]  0033B2         f
1   0           Black  05131D         f
2   1            Blue  0055BF         f
3   2           Green  237841         f
4   3  Dark Turquoise  008F9B         f


3. Exploring Colors
How many distinct colors are available?

In [2]:
# Count the distinct color names
num_colors = colors['name'].nunique()
print(num_colors)

184
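
Counting unique names is one way to define "distinct". Another is to count unique rgb codes, which can give a smaller number if two named colors share a hex value. A minimal sketch on a small made-up frame (the rows and the names `toy_colors`, `distinct_names`, and `distinct_rgb` are illustrative assumptions, not taken from colors.csv):

```python
import pandas as pd

# Hypothetical toy data: two differently named colors share one rgb code
toy_colors = pd.DataFrame({
    'name': ['Black', 'Blue', 'Navy'],
    'rgb': ['05131D', '0055BF', '0055BF'],
})

distinct_names = toy_colors['name'].nunique()  # unique color names
distinct_rgb = toy_colors['rgb'].nunique()     # unique hex codes

print(distinct_names, distinct_rgb)
```

Running the same two calls on the real `colors` frame would show whether the two definitions agree there.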


4. Transparent Colors in Lego Sets
The colors data has a column named is_trans that indicates whether a color is transparent or not. We will explore the distribution of transparent vs. non-transparent colors.

In [3]:
# colors_summary: Distribution of colors based on transparency
colors_summary = colors.groupby('is_trans').count()
print(colors_summary)

           id  name  rgb
is_trans                
f         151   151  151
t          33    33   33
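
The same breakdown can also be expressed as a share of all colors, which is sometimes easier to read than raw counts. A minimal sketch, assuming a toy frame in place of `colors` (the rows and the names `toy_colors`, `trans_counts`, and `trans_share` are hypothetical):

```python
import pandas as pd

# Hypothetical toy data standing in for the `colors` frame
toy_colors = pd.DataFrame({
    'name': ['Black', 'Blue', 'Trans-Clear', 'Green'],
    'is_trans': ['f', 'f', 't', 'f'],
})

# Number of rows for each transparency flag
trans_counts = toy_colors['is_trans'].value_counts()

# The same breakdown as a fraction of all rows
trans_share = toy_colors['is_trans'].value_counts(normalize=True)

print(trans_counts)
print(trans_share)
```

For the counts printed above, 33 of 184 colors are transparent, which works out to roughly 18%.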


5. Explore Lego Sets
This dataset contains a comprehensive list of sets over the years and the number of parts that each set contained. Let us use this data to explore how the average number of parts in Lego sets has varied over the years.

In [5]:
%matplotlib inline
# Read sets data as `sets`
sets = pd.read_csv('sets.csv')

In [8]:
sets.head()

   set_num                        name  year  theme_id  num_parts
0    001-1                       Gears  1965         1         43
1   0011-2           Town Mini-Figures  1978        84         12
2   0011-3  Castle 2 for 1 Bonus Offer  1987       199          0
3   0012-1          Space Mini-Figures  1979       143         12
4   0013-1          Space Mini-Figures  1979       143         12


In [13]:
sets.shape

(15543, 5)

In [18]:
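# Create a summary of the average number of parts by year: `parts_by_year`
# (pivot_table aggregates with the mean by default; it is stated explicitly here)
parts_by_year = pd.pivot_table(sets, index='year', values='num_parts', aggfunc='mean')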
print(parts_by_year)

       num_parts
year            
1949   99.600000
1950    1.000000
1953   13.500000
1954   12.357143
1955   36.607143
...          ...
2016  211.414439
2017  223.295918
2018  215.929368
2019  208.743713
2020  212.680203

[70 rows x 1 columns]
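
For readers more used to `groupby`, the per-year average can be written that way as well. A minimal sketch on made-up rows (the frame `toy_sets` and the names `by_pivot` and `by_groupby` are assumptions for illustration, not part of the original notebook):

```python
import pandas as pd

# Hypothetical toy data standing in for the `sets` frame
toy_sets = pd.DataFrame({
    'year': [1978, 1979, 1979, 1987],
    'num_parts': [12, 12, 20, 0],
})

# pivot_table: average `num_parts` within each `year`
by_pivot = pd.pivot_table(
    toy_sets, index='year', values='num_parts', aggfunc='mean'
)

# groupby formulation of the same per-year average
by_groupby = toy_sets.groupby('year')[['num_parts']].mean()

print(by_pivot)
print(by_groupby)
```

Both forms produce one row per year. Which one to use is largely a matter of taste.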


6. Lego Themes Over Years
Lego blocks ship under multiple themes. Let us try to get a sense of how the number of themes shipped has varied over the years.

In [22]:
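# themes_by_year: number of theme_id entries by year
# Note: `count` tallies non-null theme_id values (one per set), so a theme
# with several sets in the same year is counted more than once.
themes_by_year = (
    sets[['year', 'theme_id']]
    .groupby(by='year', as_index=False)
    .agg({'theme_id': 'count'})
)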
themes_by_year.head()

   year  theme_id
0  1949         5
1  1950         6
2  1953         4
3  1954        14
4  1955        28
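
Counting `theme_id` per year yields one entry per set, so the figures above may be closer to "sets per year" than "themes per year". If the goal is the number of distinct themes, `nunique` is likely the better aggregation. A minimal sketch on made-up rows (the frame `toy_sets` and the column names `n_sets` and `n_themes` are hypothetical):

```python
import pandas as pd

# Hypothetical toy data: in 1979, two of the three sets share theme 143
toy_sets = pd.DataFrame({
    'year': [1978, 1979, 1979, 1979],
    'theme_id': [84, 143, 143, 199],
})

# `count`: number of non-null theme_id entries, i.e. one per set
sets_per_year = toy_sets.groupby('year', as_index=False).agg(
    n_sets=('theme_id', 'count')
)

# `nunique`: number of distinct theme_id values per year
themes_per_year = toy_sets.groupby('year', as_index=False).agg(
    n_themes=('theme_id', 'nunique')
)

print(sets_per_year)
print(themes_per_year)
```

On the toy rows the two aggregations disagree for 1979, and the same check on the real `sets` frame would show how far the two measures diverge.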
