# Examining Lego's Rich History

###### _**A Project by Ian Fernandez**_

[Website](https://ianfernandez.com/) / [LinkedIn](https://ianfernandez.com/) 

## Introduction 

Due in large part to its ubiquitous toy line, box office success in film adaptations, and lucrative video gaming franchises, Lego is a household name all over the globe. This study will investigate a significant turning point in Lego's development: the production and distribution of licensed kits, such as those based on Star Wars, Super Heroes, and Harry Potter.

While it's not generally known, Lego has weathered its fair share of storms since its inception in the early 20th century. The late 1990s were a challenging time. As this article illustrates, Lego's survival was contingent on the release of its first licensed series, Star Wars, and the continued success of an internal brand (Bionicle).

This analysis aims to answer the following questions about the history of Lego:

1. _**What proportion of all licensed sets ever released were themed after Star Wars?**_

2. _**Lego has released Star Wars sets every year since 1999. In which year was Star Wars not the most popular licensed theme?**_

## Data Preparation and Cleaning

In [1]:
# Import pandas and load the sets data
import pandas as pd
lego_sets = pd.read_csv("datasets/lego_sets.csv")

# Dataset shape and column summary (info() prints directly and returns None)
print(lego_sets.shape)
lego_sets.info()

# First rows
lego_sets.head()

(11986, 6)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11986 entries, 0 to 11985
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   set_num       11833 non-null  object 
 1   name          11833 non-null  object 
 2   year          11986 non-null  int64  
 3   num_parts     6926 non-null   float64
 4   theme_name    11833 non-null  object 
 5   parent_theme  11986 non-null  object 
dtypes: float64(1), int64(1), object(4)
memory usage: 562.0+ KB


,set_num,name,year,num_parts,theme_name,parent_theme
0,00-1,Weetabix Castle,1970,471.0,Castle,Legoland
1,0011-2,Town Mini-Figures,1978,,Supplemental,Town
2,0011-3,Castle 2 for 1 Bonus Offer,1987,,Lion Knights,Castle
3,0012-1,Space Mini-Figures,1979,12.0,Supplemental,Space
4,0013-1,Space Mini-Figures,1979,12.0,Supplemental,Space


In [2]:
# Drop rows that have no set number
lego_sets_clean = lego_sets.dropna(subset = ['set_num'])
lego_sets_clean.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 11833 entries, 0 to 11832
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   set_num       11833 non-null  object 
 1   name          11833 non-null  object 
 2   year          11833 non-null  int64  
 3   num_parts     6835 non-null   float64
 4   theme_name    11833 non-null  object 
 5   parent_theme  11833 non-null  object 
dtypes: float64(1), int64(1), object(4)
memory usage: 647.1+ KB
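
The row count falls from 11,986 to 11,833 because 153 rows have no `set_num`. The sketch below reproduces that step on a tiny made-up frame; `toy_sets`, `toy_clean`, and `missing_per_column` are hypothetical names, and the third row is invented to stand in for an incomplete record.

```python
import pandas as pd

# Toy stand-in for lego_sets.csv; the last row is invented for illustration.
toy_sets = pd.DataFrame(
    {
        "set_num": ["00-1", "0011-2", None],
        "name": ["Weetabix Castle", "Town Mini-Figures", None],
        "year": [1970, 1978, 1999],
        "parent_theme": ["Legoland", "Town", "Star Wars"],
    }
)

# Count missing values per column before deciding what to drop.
missing_per_column = toy_sets.isna().sum()

# Keep only rows that have a set number; other columns may still contain NaN.
toy_clean = toy_sets.dropna(subset=["set_num"])

print(missing_per_column)
print(len(toy_sets), "->", len(toy_clean))
```

Restricting `dropna` to `subset=["set_num"]` matters here: `num_parts` is missing for thousands of otherwise valid sets, so dropping on every column would discard far more rows than intended.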


In [3]:
parent_themes = pd.read_csv("datasets/parent_themes.csv")

#Parent themes information 
parent_themes.info()
parent_themes.head(10)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 111 entries, 0 to 110
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   id           111 non-null    int64 
 1   name         111 non-null    object
 2   is_licensed  111 non-null    bool  
dtypes: bool(1), int64(1), object(1)
memory usage: 2.0+ KB


,id,name,is_licensed
0,1,Technic,False
1,22,Creator,False
2,50,Town,False
3,112,Racers,False
4,126,Space,False
5,147,Pirates,False
6,155,Modular Buildings,False
7,158,Star Wars,True
8,186,Castle,False
9,204,Designer Sets,False


In [4]:
# Create licensed_themes to determine the number of licensed sets that were Star Wars themed
licensed_themes = parent_themes[parent_themes['is_licensed'] == True]['name']
print(licensed_themes.head(10))
type(licensed_themes)

7                    Star Wars
12                Harry Potter
16    Pirates of the Caribbean
17               Indiana Jones
18                        Cars
19                      Ben 10
20            Prince of Persia
21       SpongeBob SquarePants
23                   Toy Story
33                      Avatar
Name: name, dtype: object


pandas.core.series.Series
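
Because `is_licensed` is already a boolean column, it can serve as the row mask on its own. A minimal sketch, using three rows copied from the preview above; `toy_themes` and `toy_licensed` are hypothetical names.

```python
import pandas as pd

# Toy stand-in for parent_themes.csv (three rows taken from the preview above).
toy_themes = pd.DataFrame(
    {
        "id": [1, 22, 158],
        "name": ["Technic", "Creator", "Star Wars"],
        "is_licensed": [False, False, True],
    }
)

# Select rows with the boolean column and the 'name' column in a single .loc call.
toy_licensed = toy_themes.loc[toy_themes["is_licensed"], "name"]

print(toy_licensed.tolist())
```

Selecting rows and the column in one `.loc` call also avoids chained indexing.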

## Answering the Questions

### _Q1. What proportion of all licensed sets ever released were themed after Star Wars?_

In [5]:
# Boolean mask: True where a set's parent theme is a licensed theme
licensed = lego_sets_clean['parent_theme'].isin(licensed_themes)

licensed_sets = lego_sets_clean[licensed]

print(licensed_sets['parent_theme'].unique())

licensed_sets.head(5)

['Star Wars' 'Super Heroes' 'Harry Potter'
 'The Hobbit and Lord of the Rings' 'Disney Princess' 'Indiana Jones'
 'Prince of Persia' 'Minecraft' 'Toy Story' 'Cars'
 'Pirates of the Caribbean' 'The Lone Ranger'
 'Teenage Mutant Ninja Turtles' 'Jurassic World' 'Scooby-Doo'
 "Disney's Mickey Mouse" 'SpongeBob SquarePants' 'Avatar' 'Disney'
 'Angry Birds' 'Ghostbusters' 'Ben 10']


,set_num,name,year,num_parts,theme_name,parent_theme
44,10018-1,Darth Maul,2001,1868.0,Star Wars,Star Wars
45,10019-1,Rebel Blockade Runner - UCS,2001,,Star Wars Episode 4/5/6,Star Wars
54,10026-1,Naboo Starfighter - UCS,2002,,Star Wars Episode 1,Star Wars
57,10030-1,Imperial Star Destroyer - UCS,2002,3115.0,Star Wars Episode 4/5/6,Star Wars
95,10075-1,Spider-Man Action Pack,2002,25.0,Spider-Man,Super Heroes


In [6]:
lego_joined = lego_sets_clean.merge(parent_themes,how='left',
                                   left_on='parent_theme',
                                   right_on='name',
                                   suffixes=("","_y" ))

lego_joined.head()

,set_num,name,year,num_parts,theme_name,parent_theme,id,name_y,is_licensed
0,00-1,Weetabix Castle,1970,471.0,Castle,Legoland,411,Legoland,False
1,0011-2,Town Mini-Figures,1978,,Supplemental,Town,50,Town,False
2,0011-3,Castle 2 for 1 Bonus Offer,1987,,Lion Knights,Castle,186,Castle,False
3,0012-1,Space Mini-Figures,1979,12.0,Supplemental,Space,126,Space,False
4,0013-1,Space Mini-Figures,1979,12.0,Supplemental,Space,126,Space,False


In [7]:
lego_joined_licensed = lego_joined[lego_joined['is_licensed'] == True]

# Drop the columns added by the merge so the result can be compared with licensed_sets
licensed_sets_alt = lego_joined_licensed.drop(columns=['id', 'name_y', 'is_licensed'])

licensed_sets_alt.equals(licensed_sets)

True
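
The `isin` filter and the left join arrive at the same rows. The sketch below repeats that comparison on toy data and adds two optional `merge` arguments, `validate` and `indicator`, as sanity checks. All frame and variable names here are hypothetical, and the checks are a suggestion rather than part of the original analysis.

```python
import pandas as pd

# Toy frames; set numbers and themes are taken from the previews above.
toy_sets = pd.DataFrame(
    {
        "set_num": ["00-1", "10018-1", "10075-1"],
        "parent_theme": ["Legoland", "Star Wars", "Super Heroes"],
    }
)
toy_themes = pd.DataFrame(
    {
        "name": ["Legoland", "Star Wars", "Super Heroes"],
        "is_licensed": [False, True, True],
    }
)

# Route 1: membership test against the licensed theme names.
licensed_names = toy_themes.loc[toy_themes["is_licensed"], "name"]
via_isin = toy_sets[toy_sets["parent_theme"].isin(licensed_names)]

# Route 2: left join, then filter on the joined flag.
joined = toy_sets.merge(
    toy_themes,
    how="left",
    left_on="parent_theme",
    right_on="name",
    validate="many_to_one",  # raises MergeError if a theme name appears twice
    indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
)
# eq(True) also treats unmatched rows (NaN) as not licensed.
via_merge = joined.loc[joined["is_licensed"].eq(True), list(toy_sets.columns)]

unmatched = int((joined["_merge"] == "left_only").sum())
print(via_merge["set_num"].tolist(), via_isin["set_num"].tolist(), unmatched)
```

If `unmatched` were nonzero, some sets would have a `parent_theme` with no entry in the themes table, and the two routes would be worth re-checking.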

In [8]:
# Total number of licensed sets (all parent themes combined)
all_sets = len(licensed_sets)
all_sets

1179

In [9]:
star_wars_sets = len(licensed_sets[licensed_sets['parent_theme'] == 'Star Wars'])
star_wars_sets

609

In [10]:
ratio = star_wars_sets/all_sets

# int() truncates rather than rounds: 609/1179 is about 51.65%, reported here as 51
the_force = int(ratio*100)

print(the_force)

51
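
The reported percentage depends on how the ratio is converted to a whole number. A short worked example using the two counts printed above; `all_licensed_sets`, `share`, `truncated`, and `rounded` are illustrative names.

```python
# Counts taken from the cell outputs above.
star_wars_sets = 609
all_licensed_sets = 1179

share = star_wars_sets / all_licensed_sets * 100

truncated = int(share)     # drops the fractional part
rounded = round(share, 1)  # keeps one decimal place

print(truncated, rounded)  # 51 51.7
```

Either figure supports the same reading: Star Wars accounts for slightly more than half of all licensed sets in the dataset.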


### _Q2. Lego has released Star Wars sets every year since 1999. In which year was Star Wars not the most popular licensed theme?_

In [11]:
# Count sets per year (rows) and licensed parent theme (columns)
licensed_pivot = licensed_sets.pivot_table(index='year',
                                           columns='parent_theme',
                                           values='set_num',
                                           aggfunc='count')

licensed_pivot

year,Angry Birds,Avatar,Ben 10,Cars,Disney,Disney Princess,Disney's Mickey Mouse,Ghostbusters,Harry Potter,Indiana Jones,...,Pirates of the Caribbean,Prince of Persia,Scooby-Doo,SpongeBob SquarePants,Star Wars,Super Heroes,Teenage Mutant Ninja Turtles,The Hobbit and Lord of the Rings,The Lone Ranger,Toy Story
1999,,,,,,,,,,,...,,,,,13.0,,,,,
2000,,,,,,,5.0,,,,...,,,,,26.0,,,,,
2001,,,,,,,,,11.0,,...,,,,,14.0,,,,,
2002,,,,,,,,,19.0,,...,,,,,28.0,3.0,,,,
2003,,,,,,,,,3.0,,...,,,,,32.0,5.0,,,,
2004,,,,,,,,,14.0,,...,,,,,20.0,6.0,,,,
2005,,,,,,,1.0,,5.0,,...,,,,,28.0,1.0,,,,
2006,,2.0,,,,,,,,,...,,,,3.0,11.0,8.0,,,,
2007,,,,,,,,,1.0,,...,,,,2.0,16.0,2.0,,,,
2008,,,,,,,,,,12.0,...,,,,3.0,23.0,5.0,,,,


In [12]:
# Highest per-theme set count in each year
yearly_max = licensed_pivot.max(axis='columns')

yearly_max

year
1999    13.0
2000    26.0
2001    14.0
2002    28.0
2003    32.0
2004    20.0
2005    28.0
2006    11.0
2007    16.0
2008    23.0
2009    39.0
2010    30.0
2011    32.0
2012    43.0
2013    35.0
2014    45.0
2015    58.0
2016    61.0
2017    72.0
dtype: float64

In [13]:
licensed_pivot[licensed_pivot['Star Wars'] < yearly_max]

year,Angry Birds,Avatar,Ben 10,Cars,Disney,Disney Princess,Disney's Mickey Mouse,Ghostbusters,Harry Potter,Indiana Jones,...,Pirates of the Caribbean,Prince of Persia,Scooby-Doo,SpongeBob SquarePants,Star Wars,Super Heroes,Teenage Mutant Ninja Turtles,The Hobbit and Lord of the Rings,The Lone Ranger,Toy Story
2017,,,,,,6.0,,,,,...,1.0,,,,55.0,72.0,,,,


In [14]:
# Most frequent licensed theme per year (mode returns several values when counts tie)
most_popular_sets = licensed_sets.groupby('year')['parent_theme'].agg(pd.Series.mode)

most_popular_sets

year
1999       Star Wars
2000       Star Wars
2001       Star Wars
2002       Star Wars
2003       Star Wars
2004       Star Wars
2005       Star Wars
2006       Star Wars
2007       Star Wars
2008       Star Wars
2009       Star Wars
2010       Star Wars
2011       Star Wars
2012       Star Wars
2013       Star Wars
2014       Star Wars
2015       Star Wars
2016       Star Wars
2017    Super Heroes
Name: parent_theme, dtype: object

In [15]:
# First year whose most common licensed theme is not Star Wars
new_era = int(most_popular_sets[most_popular_sets != 'Star Wars'].index[0])
new_era

2017
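
The mode-based approach works here because every year has a single leading theme. If two themes tied, `pd.Series.mode` would return both, and a comparison against `'Star Wars'` would no longer be a simple scalar test. The sketch below shows one way to keep ties visible. The rows are made up (2001 is a deliberate tie) and all names are hypothetical.

```python
import pandas as pd

# Made-up rows: 2000 has a clear leader, 2001 is a deliberate tie.
toy_licensed = pd.DataFrame(
    {
        "year": [2000, 2000, 2000, 2001, 2001],
        "parent_theme": [
            "Star Wars", "Star Wars", "Harry Potter",
            "Star Wars", "Harry Potter",
        ],
    }
)

# Number of sets per (year, theme) pair.
counts = toy_licensed.groupby(["year", "parent_theme"]).size()

# Keep every theme whose count equals that year's maximum, so ties remain.
yearly_max = counts.groupby(level="year").transform("max")
leaders = counts[counts == yearly_max].reset_index(name="n_sets")

# A year that appears more than once among the leaders has a tie.
tied_years = leaders.loc[leaders["year"].duplicated(), "year"].unique().tolist()

print(leaders)
print(tied_years)  # [2001]
```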

## Conclusion

### _A1. Star Wars accounts for 609 of the 1,179 licensed sets ever released, or 51.7% (51% when truncated to a whole number)._
### _A2. 2017 was the year in which Star Wars was not the most popular licensed theme: Super Heroes led with 72 sets to Star Wars' 55._