# Evolution of LEGO

<div style="text-align: right"> May 01, 2021 </div>

<div style="text-align: right"> Sample Notebook by Junghoo Kim </div>

## Introduction

### Motivation

Ever since I played with my first LEGO set, I've noticed huge increases in both the variety of colors and the variety of themes in available LEGO sets. The LEGO blocks that I remember playing with were generic square blocks with the iconic red, white, yellow, and green colors. I remember growing up and thinking Star Wars-themed sets were so cool, but now I see even more amazing sets available, like [Frozen-themed sets](https://www.lego.com/en-ca/themes/disney-frozen-2), [Super Mario-themed sets](https://www.lego.com/en-ca/themes/super-mario), and even a Venom set!

<img src="data/venom.jpeg" alt="LEGO Venom set" style="width: 200px;"/>

I feel as if there has been a big increase in the total number of LEGO sets produced over the years. The variety of themes has increased for sure, but are there fan-favourite themes that are still being produced most often? Has the distribution of colors used for LEGO parts changed as the more popular themes have shifted away from colors like red and yellow used in the classic LEGO sets? Last but not least, how has the number of parts in each set changed over the years? Does each set still contain just as many pieces as it did 20 years ago, or are there more or fewer pieces now? We will address these questions using an interactive dashboard.


### Questions of interest

1. How has the total number of LEGO sets produced changed over the years?
2. How has the distribution of the number of LEGO sets with different themes changed over the years?
3. Which colors were used most often for LEGO parts over the years?
4. How has the number of parts in each set changed over the years?

<br></br>

## Analysis

### Data Imports

**Note to students**: Even though reading data from URLs was necessary for the sample project due to the LEGO dataset being too large, you won't need to read data from URLs for your final project.

In [1]:
# Import libraries needed for this assignment
import altair as alt
import pandas as pd
import os

alt.data_transformers.enable("data_server")

# Data URLs 
themes_url = "https://raw.githubusercontent.com/UBC-MDS/exploratory-data-viz/main/data/lego-themes.csv"
sets_url = "https://raw.githubusercontent.com/UBC-MDS/exploratory-data-viz/main/data/lego-sets.csv"
inventories_url = "https://raw.githubusercontent.com/UBC-MDS/exploratory-data-viz/main/data/lego_inventories.csv"
inventory_parts_url = "https://raw.githubusercontent.com/UBC-MDS/exploratory-data-viz/main/data/lego_inventory_parts.csv"
colors_url = "https://raw.githubusercontent.com/UBC-MDS/exploratory-data-viz/main/data/lego-colors.csv"
combined_url = "https://raw.githubusercontent.com/UBC-MDS/exploratory-data-viz/main/data/lego-combined.csv"

# DataFrames read from the csv files at the URLs above
themes_df = pd.read_csv(themes_url)
sets_df = pd.read_csv(sets_url)
inventories_df = pd.read_csv(inventories_url)
inventory_parts_df = pd.read_csv(inventory_parts_url)
colors_df = pd.read_csv(colors_url)
combined_df = pd.read_csv(combined_url)

### Dataset description
The descriptions below were taken directly from the [website](https://www.kaggle.com/rtatman/lego-database) where the datasets were obtained.

"LEGO is a popular brand of toy building bricks. They are often sold in sets with in order to build a specific object. Each set contains a number of parts in different shapes, sizes and colors. This database contains information on which parts are included in different LEGO sets. It was originally compiled to help people who owned some LEGO sets already figure out what other sets they could build with the pieces they had."

The LEGO dataset is composed of $8$ tables: `colors.csv`, `inventories.csv`, `inventory_parts.csv`, `inventory_sets.csv`, `part_categories.csv`, `parts.csv`, `sets.csv`, and `themes.csv`. Each table is stored in a `.csv` file and contains different information about LEGO pieces, including shapes, sizes, sets, colors, and themes. The tables below summarize the `themes`, `sets`, `inventories`, `inventory_parts`, and `colors` data, as well as which columns will be used to answer which questions. Matching font colors indicate the columns that are shared between different `.csv` files:

### [`themes.csv`](https://www.kaggle.com/rtatman/lego-database?select=themes.csv): 

| Column                               | Description                                     |
|--------------------------------------|:------------------------------------------------|
| <font color='sky blue'>**id**</font> | Theme unique ID                                 |
| name                                 | Name of the theme (Q2)                          |
| parent_id                            | Unique ID for the larger theme, if there is one |

### [`sets.csv`](https://www.kaggle.com/rtatman/lego-database?select=sets.csv): 

| Column                                     | Description                                                  |
|--------------------------------------------|:-------------------------------------------------------------|
| <font color='orange'>**set_num**</font>    | Unique set ID                                                |
| name                                       | The name of the set (Q2, Q4)                                 |
| year                                       | Year the set was published. (Q1, Q2, Q3, Q4)                 |
| <font color='sky blue'>**theme_id**</font> | Unique ID for the theme used for the set (from `themes.csv`) |
| num_parts                                  | The number of parts included in the set  (Q4)                |

### [`inventories.csv`](https://www.kaggle.com/rtatman/lego-database?select=inventories.csv): 

| Column                                  | Description                        |
|-----------------------------------------|:-----------------------------------|
| <font color='blue'>**id**</font>        | Unique ID for this inventory entry |
| version                                 | Version number                     |
| <font color='orange'>**set_num**</font> | Set number (from `sets.csv`).      |

### [`inventory_parts.csv`](https://www.kaggle.com/rtatman/lego-database?select=inventory_parts.csv): 

| Column                                     | Description                                                                                        |
|--------------------------------------------|:---------------------------------------------------------------------------------------------------|
| <font color='blue'>**inventory_id**</font> | Unique ID for the inventory this part is appearing in. Same as the id value in `inventories.csv`   |
| part_num                                   | Unique ID for the part                                                                             |
| <font color='green'>**color_id**</font>    | Unique ID for the color, as per `colors.csv`                                                       |
| quantity                                   | The number of copies of this part included in the set (Q3)                                         |
| is_spare                                   | Whether or not this is a spare part. Spare parts are additional parts not needed to finish the set |



### [`colors.csv`](https://www.kaggle.com/rtatman/lego-database?select=colors.csv): 

| Column                            | Description                                               |
|-----------------------------------|:----------------------------------------------------------|
| <font color='green'>**id**</font> | Unique ID for this color                                  |
| name                              | The human-readable name of the color (Q3)                 |
| rgb                               | The approximate RGB color (Q3)                            |
| is_trans                          | Whether or not the given color is transparent/translucent |


Here is the schema of how the `.csv` files are related to each other:

**Note to students**: This schema was included here to provide some background information about the LEGO dataset. You are **not** required to provide schema for your dataset in your final project.

![schema](data/raw/downloads_schema.png)

### Data Summary Tables and Methods

In [2]:
themes_df.info()
print("\n")
themes_df.describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 614 entries, 0 to 613
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   id         614 non-null    int64  
 1   name       614 non-null    object 
 2   parent_id  503 non-null    float64
dtypes: float64(1), int64(1), object(1)
memory usage: 14.5+ KB




|       | id         | parent_id  |
|-------|-----------:|-----------:|
| count | 614.0      | 503.0      |
| mean  | 307.5      | 274.294235 |
| std   | 177.390811 | 176.070151 |
| min   | 1.0        | 1.0        |
| 25%   | 154.25     | 126.0      |
| 50%   | 307.5      | 264.0      |
| 75%   | 460.75     | 430.0      |
| max   | 614.0      | 591.0      |


In [3]:
sets_df.info()
print("\n")
sets_df.describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11673 entries, 0 to 11672
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   set_num    11673 non-null  object
 1   name       11673 non-null  object
 2   year       11673 non-null  int64 
 3   theme_id   11673 non-null  int64 
 4   num_parts  11673 non-null  int64 
dtypes: int64(3), object(2)
memory usage: 456.1+ KB




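|       | year        | theme_id   | num_parts  |
|-------|------------:|-----------:|-----------:|
| count | 11673.0     | 11673.0    | 11673.0    |
| mean  | 2001.972758 | 311.308575 | 162.2624   |
| std   | 13.475364   | 177.999101 | 330.192108 |
| min   | 1950.0      | 1.0        | -1.0       |
| 25%   | 1997.0      | 161.0      | 10.0       |
| 50%   | 2005.0      | 324.0      | 45.0       |
| 75%   | 2012.0      | 470.0      | 172.0      |
| max   | 2017.0      | 614.0      | 5922.0     |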


In [4]:
inventories_df.info()
print("\n")
inventories_df.describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11681 entries, 0 to 11680
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   id       11681 non-null  int64 
 1   version  11681 non-null  int64 
 2   set_num  11681 non-null  object
dtypes: int64(2), object(1)
memory usage: 273.9+ KB




|       | id          | version  |
|-------|------------:|---------:|
| count | 11681.0     | 11681.0  |
| mean  | 8412.481551 | 1.001541 |
| std   | 4880.737513 | 0.057018 |
| min   | 1.0         | 1.0      |
| 25%   | 4156.0      | 1.0      |
| 50%   | 8404.0      | 1.0      |
| 75%   | 12585.0     | 1.0      |
| max   | 18708.0     | 5.0      |


In [5]:
inventory_parts_df.info()
print("\n")
inventory_parts_df.describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 580251 entries, 0 to 580250
Data columns (total 5 columns):
 #   Column        Non-Null Count   Dtype 
---  ------        --------------   ----- 
 0   inventory_id  580251 non-null  int64 
 1   part_num      580251 non-null  object
 2   color_id      580251 non-null  int64 
 3   quantity      580251 non-null  int64 
 4   is_spare      580251 non-null  object
dtypes: int64(3), object(2)
memory usage: 22.1+ MB




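|       | inventory_id | color_id   | quantity |
|-------|-------------:|-----------:|---------:|
| count | 580251.0     | 580251.0   | 580251.0 |
| mean  | 8605.285444  | 78.472787  | 3.32473  |
| std   | 4958.375522  | 622.238597 | 8.229816 |
| min   | 1.0          | -1.0       | 1.0      |
| 25%   | 4352.0       | 1.0        | 1.0      |
| 50%   | 8635.0       | 15.0       | 2.0      |
| 75%   | 12794.0      | 71.0       | 4.0      |
| max   | 18708.0      | 9999.0     | 1440.0   |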


In [6]:
colors_df.info()
print("\n")
colors_df.describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 135 entries, 0 to 134
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   id        135 non-null    int64 
 1   name      135 non-null    object
 2   rgb       135 non-null    object
 3   is_trans  135 non-null    object
dtypes: int64(1), object(3)
memory usage: 4.3+ KB




|       | id         |
|-------|-----------:|
| count | 135.0      |
| mean  | 253.037037 |
| std   | 878.441466 |
| min   | -1.0       |
| 25%   | 34.5       |
| 50%   | 85.0       |
| 75%   | 231.0      |
| max   | 9999.0     |


In [7]:
combined_df.info()
print("\n")
combined_df.describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 580251 entries, 0 to 580250
Data columns (total 11 columns):
 #   Column        Non-Null Count   Dtype 
---  ------        --------------   ----- 
 0   Unnamed: 0    580251 non-null  int64 
 1   inventory_id  580251 non-null  int64 
 2   quantity      580251 non-null  int64 
 3   set_num       580251 non-null  object
 4   name          580251 non-null  object
 5   year          580251 non-null  int64 
 6   num_parts     580251 non-null  int64 
 7   id_theme      580251 non-null  int64 
 8   name_theme    580251 non-null  object
 9   id_color      580251 non-null  int64 
 10  name_color    580251 non-null  object
dtypes: int64(7), object(4)
memory usage: 48.7+ MB




|       | Unnamed: 0    | inventory_id | quantity | year        | num_parts  | id_theme   | id_color   |
|-------|--------------:|-------------:|---------:|------------:|-----------:|-----------:|-----------:|
| count | 580251.0      | 580251.0     | 580251.0 | 580251.0    | 580251.0   | 580251.0   | 580251.0   |
| mean  | 290125.0      | 8605.285444  | 3.32473  | 2005.695068 | 506.80587  | 274.152367 | 78.472787  |
| std   | 167504.179861 | 4958.375522  | 8.229816 | 10.860828   | 606.258946 | 190.329691 | 622.238597 |
| min   | 0.0           | 1.0          | 1.0      | 1950.0      | 1.0        | 1.0        | -1.0       |
| 25%   | 145062.5      | 4352.0       | 1.0      | 2000.0      | 134.0      | 100.0      | 1.0        |
| 50%   | 290125.0      | 8635.0       | 2.0      | 2009.0      | 328.0      | 236.0      | 15.0       |
| 75%   | 435187.5      | 12794.0      | 4.0      | 2014.0      | 665.0      | 466.0      | 71.0       |
| max   | 580250.0      | 18708.0      | 1440.0   | 2017.0      | 5922.0     | 614.0      | 9999.0     |


`combined_df` is the DataFrame with all the relevant columns from the above `.csv` files merged together. This DataFrame links which inventory parts are associated with which sets, in what year they were produced, which color the LEGO pieces are, etc.

We see that there are no missing values in this DataFrame.

Notice the memory usage of almost 50 MB! This will definitely be an issue if we try to make plots using the DataFrame as the data source, because the data would be embedded in every chart. To reduce the notebook file size, we will use the URLs of the `.csv` files in the GitHub repository as the data source instead.

<br></br>

### How has the number of released sets changed over the years?

In [8]:
sets_per_year = (
    alt.Chart(sets_url)
    .mark_bar(color="navy")
    .encode(
        alt.X("year:O", title="Year"),
        alt.Y("count()", title="Number of Sets"),
        tooltip=[alt.Tooltip("count()", title="Number of Sets")])
    .properties(width=350)
)

sets_per_year.properties(title="Fig 1. Number of sets produced each year from 1950-2017")

We can see from the above visualization that there is a fairly clear increasing trend in the number of sets produced each year from 1950 to 2017. But this makes me ask the following question: do the sets released in recent years have the same themes as the sets released in the earlier years? If the themes have changed over the years, what are some of the themes that have "defined" each era? Let's find out!
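
The bar heights in Fig 1 come from Altair's `count()` aggregate grouped by `year`. As a cross-check, the same numbers can be computed in pandas. The sketch below uses a small made-up table rather than the real `sets_df`; the names `toy_sets` and `num_sets` are hypothetical.

```python
import pandas as pd

# Made-up stand-in for sets_df: one row per set
toy_sets = pd.DataFrame({
    "set_num": ["a", "b", "c", "d", "e", "f"],
    "year": [1950, 1950, 1999, 1999, 1999, 2017],
})

# One row per year with the number of sets released that year
sets_per_year_table = (
    toy_sets.groupby("year")
    .size()
    .rename("num_sets")
    .reset_index()
)

print(sets_per_year_table)
```

Running the same `groupby` on the real `sets_df` should reproduce the values shown in the chart's tooltips.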

### How has the distribution of the number of sets with different themes changed over the years?

In [9]:
# On mouse click
select_year_click = alt.selection_multi(encodings=["x"], on='click', nearest=True)

sets_per_year_click = (
    sets_per_year.encode(
        color=alt.condition(select_year_click, alt.value("navy"), alt.value("lightgray")))
    .properties(height=100, width=350)
    .add_selection(select_year_click)
    .properties(title={
        "text" : "Number of sets produced each year from 1950-2017.",
        "subtitle" : ["Click on a bar to select the year. Hold shift to select multiple years.", "Double-click to clear selection(s)."]
    })
)

top_themes = (
    alt.Chart(sets_url)
    .transform_filter(select_year_click)     # filter for selected year
    .mark_bar()
    .encode(
        alt.X("name:N", title="Theme", sort='-y'),
        alt.Y("sets_count:Q", title="Number of Sets"),
        alt.Color(value="navy"),
        tooltip=[alt.Tooltip("sets_count:Q", title="Number of Sets with Theme")])
    .transform_lookup(
        lookup='theme_id',
        from_=alt.LookupData(data=themes_url, key='id',
                         fields=['name']))
    .transform_aggregate(
        sets_count="count()",
        groupby=["name"])
    .transform_window(
        rank='rank(sets_count)',
        sort=[alt.SortField("sets_count", order="descending")])
    .transform_filter(alt.datum.rank <= 10)
    .properties(title="Most used themes in selected year(s)",
                height=350, width=350)
    .add_selection(select_year_click)
)

sets_per_year_click & top_themes

That's interesting! Sets produced in the 1950s mostly had "Basic Set", "Town Plan", "Traffic", and "Supplemental" themes. These are the generic yet iconic LEGO themes that I remember.

In contrast, you can start seeing some trendy themes in the 2000s and 2010s. For example, in 2004, when the "Harry Potter and the Prisoner of Azkaban" film was released, there were 11 sets produced with the "Prisoner of Azkaban" theme! Similarly, in 2011, when the "Pirates of the Caribbean: On Stranger Tides" movie was released, there were 16 LEGO sets released with the "Pirates of the Caribbean" theme.

It's also interesting to see themes related to other major events, such as 33 LEGO sets with the "Soccer" theme in 2002, when the 17th FIFA World Cup took place.

Despite these variations in the trendy themes across years, we see that Star Wars themes such as "Star Wars Clone Wars", "Star Wars Episode 4/5/6", and "Star Wars Episode 7" appear consistently among the most used themes between 2008 and 2017.
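
The theme chart chains a lookup, an aggregate, a window rank, and a filter. A pandas sketch of roughly the same pipeline can make those steps easier to follow. The helper name `top_themes_for_years` and the toy data are hypothetical, and `rank(method="min")` is used here on the assumption that tied themes should share a rank.

```python
import pandas as pd

# Made-up stand-ins for themes_df and sets_df
toy_themes = pd.DataFrame({"id": [1, 2, 3], "name": ["Basic Set", "Star Wars", "Soccer"]})
toy_sets = pd.DataFrame({
    "set_num": ["a", "b", "c", "d", "e", "f"],
    "year": [2002, 2002, 2002, 2002, 1955, 2002],
    "theme_id": [3, 3, 2, 3, 1, 2],
})


def top_themes_for_years(sets, themes, years, k=10):
    """Return the k themes with the most sets released in the given years."""
    # Filter to the selected years (the click selection in the chart)
    selected = sets[sets["year"].isin(years)]
    # Look up the theme name for each set (transform_lookup)
    named = selected.merge(themes[["id", "name"]], left_on="theme_id",
                           right_on="id", how="left")
    # Count sets per theme (transform_aggregate)
    counts = named.groupby("name").size().rename("sets_count").reset_index()
    # Rank themes by count, largest first (transform_window)
    counts["rank"] = counts["sets_count"].rank(method="min", ascending=False)
    # Keep the top k (transform_filter)
    top = counts[counts["rank"] <= k]
    return top.sort_values("sets_count", ascending=False).reset_index(drop=True)


top_2002 = top_themes_for_years(toy_sets, toy_themes, years=[2002], k=10)
print(top_2002)
```

Passing several years in `years` corresponds to shift-clicking multiple bars in the chart.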

It appears that there have been some major changes in the themes used for LEGO sets. Have the colors used for the LEGO blocks also changed over the years? We can answer that, too!

### Which colors were used most often over the years?

In [10]:
color_slider = alt.binding_range(
    step=1,
    min=5,
    max=25)

select_colors = alt.selection_single(
    fields=['num_colors'],
    bind=color_slider,
    init={'num_colors' : 10},
    name='Select')

# Exclude ids -1 and 9999 and keep one row per color name,
# so that the names and rgb values below stay aligned
colors_filtered = colors_df.query("id >= 0 and id < 9999").drop_duplicates("name")

colors = colors_filtered["name"].tolist()
# I'm adding "#" prefix to the rgb values (e.g. "0033B2") so that Altair knows these are RGB values
rgbs = [f"#{rgb}" for rgb in colors_filtered["rgb"]]

colors_parts = (
    alt.Chart(combined_url)
    .transform_filter(select_year_click)
    # Parentheses are required: `&` binds more tightly than `>=` and `<` in Python
    .transform_filter((alt.datum.id_color >= 0) & (alt.datum.id_color < 9999))
    .mark_bar(stroke="black")
    .encode(
        alt.X("total_parts:Q", title="Number of Parts"),
        alt.Y("name_color:N", title="Color", sort='-x'),
        color=alt.Color('name_color:N', scale=alt.Scale(domain=colors, range=rgbs), legend=None),
        tooltip=[alt.Tooltip("total_parts:Q", title="Number of Parts with Color")])
    .transform_aggregate(
        total_parts="sum(quantity)",
        groupby=["name_color"])
    .transform_window(
        rank='rank(total_parts)',
        sort=[alt.SortField("total_parts", order="descending")])
    .add_selection(select_colors)
    .transform_filter(alt.datum.rank <= select_colors.num_colors)
    .properties(title={
        "text" : "Top colors with most number of parts in selected year(s)",
        "subtitle" : "Use slider below to select number of colors to show."},
                height=550)
)

((sets_per_year_click & top_themes) | colors_parts)

We see in the above visualization that the colors used for the LEGO blocks have become more diverse in recent years. Now the color palette used includes colors such as "Lime", "Trans-Light Blue", and "Pearl Gold". 

In addition, we can see that while grey colors were not used very much in the early years, light grey and dark grey colors are used much more in recent years. This change appears to have started around 1978.
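
To check an observation like this outside the chart, the same ranking can be sketched in pandas: keep the selected years, drop the color ids outside the 0 to 9998 range, sum `quantity` per color, and keep the top N. The helper name `top_colors` and the toy rows are hypothetical, and treating id -1 as a placeholder rather than a real color is an assumption based on the filter used in the notebook.

```python
import pandas as pd

# Made-up stand-in for combined_df
toy_combined = pd.DataFrame({
    "year": [1960, 1960, 2015, 2015, 2015, 2015],
    "name_color": ["Red", "White", "Light Gray", "Red", "Light Gray", "Placeholder"],
    "id_color": [4, 15, 7, 4, 7, -1],
    "quantity": [10, 4, 8, 3, 6, 2],
})


def top_colors(combined, years, n=10):
    """Return total part quantity per color for the given years, top n only."""
    in_years = combined["year"].isin(years)
    # Each comparison is parenthesized because `&` binds tighter than `>=` and `<`
    real_color = (combined["id_color"] >= 0) & (combined["id_color"] < 9999)
    totals = combined[in_years & real_color].groupby("name_color")["quantity"].sum()
    return totals.nlargest(n)


top_2015 = top_colors(toy_combined, years=[2015], n=2)
print(top_2015)
```

Comparing the output for an early year and a recent year on the real data gives a quick numeric view of how the palette has shifted.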

So far, we've seen that there have been some major changes in the themes used for the LEGO sets, as well as the color palette used for the LEGO blocks. I'm super interested in these newer LEGO sets with trendy themes and a wide array of colors! But do these newer LEGO sets come with just as many pieces as they did in the earlier years?

### How has the number of parts in each set changed over the years?

In [11]:
# Boolean selection for point/line marks
scatter_check = alt.binding_checkbox()
line_check = alt.binding_checkbox()

scatter_selection = alt.selection_single(bind=scatter_check, name="Hide Scatter")
line_selection = alt.selection_single(bind=line_check, name="Hide Mean Trend Line")

max_parts = sets_df["num_parts"].max()

parts_per_set_scatter = (
    alt.Chart(sets_url)
    .mark_point(size=5, fill="navy")
    .encode(
        alt.X("year:O", title="Year"),
        alt.Y("num_parts:Q", title="Number of Parts in a Set",
              scale = alt.Scale(type="sqrt", domain=[0, max_parts])),
        opacity=alt.condition(scatter_selection, alt.value(0.5), alt.value(0.0)),
        tooltip=[alt.Tooltip("name:N", title="Name of Set"),
                 alt.Tooltip("num_parts:Q", title="Number of Parts")])
    .add_selection(scatter_selection)
)
    
parts_per_set_line = (
    alt.Chart(sets_url)
    .mark_line(color="red", strokeWidth=5)
    .encode(
        alt.X("year:O", title="Year"),
        alt.Y("num_parts:Q", aggregate="mean", title="Number of Parts in a Set",
              scale = alt.Scale(type="sqrt", domain=[0, max_parts])),
        opacity=alt.condition(line_selection, alt.value(1), alt.value(0.0)),
        tooltip=[alt.Tooltip("year:O", title="Year"),
                 alt.Tooltip("num_parts:Q", aggregate="mean", title="Mean Number of Parts", format='.2f')])
    .add_selection(line_selection)
)

parts_per_set = (
    (parts_per_set_scatter + parts_per_set_line)
    .properties(height=100, width=350, title="Number of parts in a set")
)

sets_per_year_click & parts_per_set 

**Side Note**: Unfortunately, the checkbox widgets do not work after the first click. This issue is documented in the [Altair repo](https://github.com/altair-viz/altair/issues/1428) as well as the [Vega-Lite repo](https://github.com/vega/vega-lite/issues/4870), and will hopefully be fixed in future versions.

As seen in the plot above, the number of parts in a set has generally increased over the years. Using the tooltips, we can also see some sets with far more parts than the mean, such as the Taj Mahal set released in 2008 with 5922 parts, and the Millennium Falcon - UCS set released in 2007 with 5195 parts.
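
The red trend line and the outliers found through the tooltips can also be computed directly. The sketch below uses made-up rows in place of `sets_df`. Note that the summary table for `sets_df` shows a minimum `num_parts` of -1; the sketch assumes such values mark an unknown part count and excludes them, which the chart above does not do.

```python
import pandas as pd

# Made-up stand-in for sets_df
toy_sets = pd.DataFrame({
    "name": ["Small Car", "Unknown Count", "Castle", "Big Tower", "Spaceship"],
    "year": [1980, 1980, 1980, 2008, 2008],
    "num_parts": [20, -1, 40, 5000, 1000],
})

# Assumption: negative part counts are placeholders, so drop them first
known = toy_sets[toy_sets["num_parts"] >= 0]

# Mean number of parts per set for each year (the red trend line)
mean_parts_per_year = known.groupby("year")["num_parts"].mean()

# The sets with the most parts overall (the outliers seen in the tooltips)
largest_sets = known.nlargest(2, "num_parts")[["name", "year", "num_parts"]]

print(mean_parts_per_year)
print(largest_sets)
```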

<br></br>

## Discussion

Ever since the first LEGO wooden blocks were made by a Danish carpentry workshop in 1932, LEGO has had a significant influence on the popular culture of the 20th and 21st centuries. In our analysis, we have seen how LEGO sets have transformed between 1950 and 2017.

In Fig 1, we see that the number of different LEGO sets produced each year has increased over the years. An exception to this general trend is seen between 2002 and 2006, when the number of different sets decreased from 447 in 2002 to 283 in 2006. This drop may reflect the decline in LEGO's profits between 1992 and 2004, described in the [Wikipedia article on the History of Lego](https://en.wikipedia.org/wiki/History_of_Lego).

We have also seen how the distribution of popular themes has changed over the years, and the themes appear to match contemporary trends such as films or sporting events. Interestingly, Star Wars themes have consistently placed among the themes with the most sets released, which makes sense. According to the above Wikipedia article, the first LEGO sets featuring licensed intellectual property were Star Wars and Winnie the Pooh sets released in 1999. Since then, various sets have followed suit, featuring other blockbuster movies such as Harry Potter and Steven Spielberg movies.

The visualization of the color palette used over the years shows that the colors used for LEGO parts have become more diverse, as expected. In addition, while light gray and dark gray colors were not used as much in earlier years, they have been used increasingly in recent years. There are some fascinating observations about the changing color palette of LEGO blocks in [this article](https://www.brothers-brick.com/2015/10/14/the-changing-palette-of-lego-1975-2014/).

Last but not least, we see that there has been a general increasing trend in the number of parts included in a set. To be honest, I expected the number of parts in each set to have decreased over the years, as the sets and the themes have become more specific and specialized. I'm sure many fans of the LEGO franchise will be relieved to learn that, on average, LEGO sets produced now contain more pieces than they did in the 1950s. According to the dataset used here, the Taj Mahal set had the most LEGO pieces of any set up until 2017, with 5922 pieces. (It appears that [a few LEGO sets](https://thecollector.io/features/2017/09/the-25-biggest-lego-sets-ever/) with more pieces have been released since then, but the Taj Mahal set still maintains a respectable 4th position as of Apr 01, 2021.)

This has been a very interesting dive into the history of LEGO sets! In the future, I would like to examine how the color palette used and the mean number of parts in a set differ across different themes.
<br></br>

## Dashboard

In [12]:
# Resizing top_themes to be shorter to accommodate another plot
top_themes = (
    top_themes    
    .properties(title="Most used themes in selected year(s)",
                height=100, width=350)
    .add_selection(select_year_click)
)

# Boolean selection for point/line marks
scatter_check = alt.binding_checkbox()
line_check = alt.binding_checkbox()

scatter_selection = alt.selection_single(bind=scatter_check, name="Hide Scatter")
line_selection = alt.selection_single(bind=line_check, name="Hide Mean Trend Line")

max_parts = sets_df["num_parts"].max()

parts_per_set_scatter = (
    alt.Chart(sets_url)
    .mark_point(size=5, fill="navy")
    .encode(
        alt.X("year:O", title="Year"),
        alt.Y("num_parts:Q", title="Number of Parts in a Set",
              scale = alt.Scale(type="sqrt", domain=[0, max_parts])),
        opacity=alt.condition(scatter_selection, alt.value(0.5), alt.value(0.0)),
        tooltip=[alt.Tooltip("name:N", title="Name of Set"),
                 alt.Tooltip("num_parts:Q", title="Number of Parts")])
    .add_selection(scatter_selection)
)
    
parts_per_set_line = (
    alt.Chart(sets_url)
    .mark_line(color="red", strokeWidth=5)
    .encode(
        alt.X("year:O", title="Year"),
        alt.Y("num_parts:Q", aggregate="mean", title="Number of Parts in a Set",
              scale = alt.Scale(type="sqrt", domain=[0, max_parts])),
        opacity=alt.condition(line_selection, alt.value(1), alt.value(0.0)),
        tooltip=[alt.Tooltip("year:O", title="Year"),
                 alt.Tooltip("num_parts:Q", aggregate="mean", title="Mean Number of Parts", format='.2f')])
    .add_selection(line_selection)
)

parts_per_set = (
    (parts_per_set_scatter + parts_per_set_line)
    .properties(height=100, width=350, title="Number of parts in a set")
)

((sets_per_year_click & top_themes & parts_per_set) | colors_parts)

<br></br>

## References

Not all the work in this notebook is original. Parts that were borrowed from other resources are as follows:

### Resources used
- Programming in Python for Data Science sample final project for inspiration
- [Data Source](https://www.kaggle.com/rtatman/lego-database)
- Altair documentation including, but not limited to, 
    - [Top K Items](https://altair-viz.github.io/gallery/top_k_items.html)
    - [Top-K plot with Others](https://altair-viz.github.io/gallery/top_k_with_others.html)
    - [Custom Color Mapping](https://altair-viz.github.io/user_guide/customization.html#color-domain-and-range)
- Image of Venom LEGO set taken from the [LEGO store](https://www.lego.com/en-ca/product/venom-76187)
- [Wikipedia article on the History of Lego](https://en.wikipedia.org/wiki/History_of_Lego)
- [Article on history of LEGO color palette](https://www.brothers-brick.com/2015/10/14/the-changing-palette-of-lego-1975-2014/)
- [Article on the 25 biggest LEGO sets ever](https://thecollector.io/features/2017/09/the-25-biggest-lego-sets-ever/)

![lego-may-the-4th.jpg](data/lego-may-the-4th.jpg)

Image credit to [this blog.](https://vaderfan2187.wordpress.com/2017/09/28/lets-rank-the-lego-may-the-4th-promotional-polybags/)