![star_wars_unsplash](star_wars_unsplash.jpg)

Lego is a household name across the world, supported by a diverse toy line, hit movies, and a series of successful video games. In this project, we are going to explore a key development in the history of Lego: the introduction of licensed sets such as Star Wars, Super Heroes, and Harry Potter.

The introduction of its first licensed series, Star Wars, was a hit that sparked a series of collaborations with more themed sets.


## lego_sets.csv

| Column     | Description              |
|------------|--------------------------|
| `"set_num"` | A code that is unique to each set in the dataset. This column is critical, and a missing value indicates the set is a duplicate or invalid! |
| `"name"` | The name of the set. |
| `"year"` | The year the set was released. |
| `"num_parts"` | The number of parts contained in the set. This column is not central to our analyses, so missing values are acceptable. |
| `"theme_name"` | The name of the sub-theme of the set. |
| `"parent_theme"` | The name of the parent theme the set belongs to. Matches the `"name"` column of `parent_themes.csv`. |

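The column notes above imply a simple cleaning rule: a missing `"set_num"` disqualifies a row, while a missing `"num_parts"` does not. The following is a minimal sketch of that check on a small in-memory frame rather than the real file. `toy_sets`, `missing_per_column`, and `toy_clean` are hypothetical names, and the last row is made up purely to show a missing identifier.

```python
import pandas as pd

# Toy rows for illustration only. The first three mirror rows shown in this
# notebook; the fourth is a made-up row with a missing set_num.
toy_sets = pd.DataFrame({
    "set_num": ["00-1", "0011-2", "0012-1", None],
    "name": ["Weetabix Castle", "Town Mini-Figures", "Space Mini-Figures", None],
    "year": [1970, 1978, 1979, 1979],
    "num_parts": [471.0, None, 12.0, 12.0],
    "theme_name": ["Castle", "Supplemental", "Supplemental", None],
    "parent_theme": ["Legoland", "Town", "Space", "Space"],
})

# Count missing values in each column to see where the gaps are.
missing_per_column = toy_sets.isna().sum()
print(missing_per_column)

# Drop only rows that lack the critical identifier; a missing num_parts is tolerated.
toy_clean = toy_sets.dropna(subset=["set_num"])
print(len(toy_sets), "rows before,", len(toy_clean), "rows after")
```

Restricting `dropna` with `subset` keeps rows such as `0011-2`, which has no part count but is otherwise valid.
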
## parent_themes.csv

| Column     | Description              |
|------------|--------------------------|
| `"id"` | A code that is unique to every theme. |
| `"name"` | The name of the parent theme. |
| `"is_licensed"` | A Boolean column specifying whether the theme is a licensed theme. |

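Because `"parent_theme"` in the sets table matches `"name"` in the themes table, the licensing flag can also be attached to each set with a join. This is a sketch under assumptions, not the approach used in the cells below: the frames are toy stand-ins, the `id` values are placeholders, and the `is_licensed` value for Castle is assumed for illustration.

```python
import pandas as pd

# Toy stand-ins; set rows are taken from outputs shown in this notebook.
toy_sets = pd.DataFrame({
    "set_num": ["10018-1", "10075-1", "0011-3"],
    "name": ["Darth Maul", "Spider-Man Action Pack", "Castle 2 for 1 Bonus Offer"],
    "parent_theme": ["Star Wars", "Super Heroes", "Castle"],
})
toy_themes = pd.DataFrame({
    "id": [1, 2, 3],                      # placeholder ids, not the real ones
    "name": ["Star Wars", "Super Heroes", "Castle"],
    "is_licensed": [True, True, False],   # Castle's flag is an assumption
})

# Rename the theme's "name" to "parent_theme" so it does not collide with the
# set's own "name" column, then left-join so every set row is kept.
theme_flags = toy_themes[["name", "is_licensed"]].rename(columns={"name": "parent_theme"})
merged = toy_sets.merge(theme_flags, on="parent_theme", how="left", validate="many_to_one")
print(merged[["set_num", "parent_theme", "is_licensed"]])
```

The `validate="many_to_one"` argument makes pandas raise an error if a theme name appears more than once in the themes table, which would otherwise silently duplicate set rows.
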
In [None]:
import pandas as pd

# Load the sets data
lego_sets = pd.read_csv('data/lego_sets.csv')

# Drop rows missing key identifying fields; missing num_parts values are acceptable
lego_sets_clean = lego_sets.dropna(subset=['set_num', 'name', 'theme_name'])
lego_sets_clean.head()

Unnamed: 0,set_num,name,year,num_parts,theme_name,parent_theme
0,00-1,Weetabix Castle,1970,471.0,Castle,Legoland
1,0011-2,Town Mini-Figures,1978,,Supplemental,Town
2,0011-3,Castle 2 for 1 Bonus Offer,1987,,Lion Knights,Castle
3,0012-1,Space Mini-Figures,1979,12.0,Supplemental,Space
4,0013-1,Space Mini-Figures,1979,12.0,Supplemental,Space


In [46]:
parent_themes = pd.read_csv('data/parent_themes.csv')

# Names of the parent themes flagged as licensed
licensed_themes = parent_themes.loc[parent_themes["is_licensed"], "name"]
licensed_themes.head()

7                    Star Wars
12                Harry Potter
16    Pirates of the Caribbean
17               Indiana Jones
18                        Cars
Name: name, dtype: object

In [47]:
licensed = lego_sets_clean["parent_theme"].isin(licensed_themes)
licensed_sets = lego_sets_clean[licensed]
licensed_sets.head()

Unnamed: 0,set_num,name,year,num_parts,theme_name,parent_theme
44,10018-1,Darth Maul,2001,1868.0,Star Wars,Star Wars
45,10019-1,Rebel Blockade Runner - UCS,2001,,Star Wars Episode 4/5/6,Star Wars
54,10026-1,Naboo Starfighter - UCS,2002,,Star Wars Episode 1,Star Wars
57,10030-1,Imperial Star Destroyer - UCS,2002,3115.0,Star Wars Episode 4/5/6,Star Wars
95,10075-1,Spider-Man Action Pack,2002,25.0,Spider-Man,Super Heroes


In [48]:
# Total number of licensed sets (after cleaning)
all_sets = len(licensed_sets)
star_wars_sets = len(licensed_sets[licensed_sets["parent_theme"] == "Star Wars"])
# int() truncates, so the share is rounded down to a whole percent
the_force = int((star_wars_sets / all_sets) * 100)
print(f'The percentage of licensed sets that are Star Wars themed is {the_force}%.')

The percentage of licensed sets that are Star Wars themed is 51%.

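The share computed in the cell above covers one theme. As a sketch of how the same calculation extends to every licensed theme at once, `value_counts(normalize=True)` returns each theme's fraction of the rows. `toy_licensed` below is a hypothetical four-row stand-in for `licensed_sets`, so its percentages are not the real ones.

```python
import pandas as pd

# Hypothetical stand-in for licensed_sets; rows taken from outputs shown earlier.
toy_licensed = pd.DataFrame({
    "set_num": ["10018-1", "10019-1", "10026-1", "10075-1"],
    "parent_theme": ["Star Wars", "Star Wars", "Star Wars", "Super Heroes"],
})

# Fraction of rows per theme, scaled to percentages.
theme_share = toy_licensed["parent_theme"].value_counts(normalize=True) * 100
print(theme_share.round(1))
```

Unlike the `int()` call above, this keeps the fractional part, so any rounding is an explicit choice.
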

In [49]:
# Count sets per release year (rows) and licensed parent theme (columns)
licensed_pivot = licensed_sets.pivot_table(index="year", columns="parent_theme", values="set_num", aggfunc="count")

In [50]:
# Year with the highest Star Wars set count (idxmax ignores years with no Star Wars sets)
new_era = int(licensed_pivot["Star Wars"].idxmax())
print(f"The year when the most Star Wars sets were released was {new_era}.")

The year when the most Star Wars sets were released was 2016.
