![star_wars_unsplash](star_wars_unsplash.jpg)

Lego is a household name across the world, supported by a diverse toy line, hit movies, and a series of successful video games. In this project, we are going to explore a key development in the history of Lego: the introduction of licensed sets such as Star Wars, Super Heroes, and Harry Potter.

The introduction of its first licensed series, Star Wars, was a hit that sparked a series of collaborations on more themed sets. The partnerships team has asked you to analyze this success. Before you begin, they suggest reading the descriptions of the two datasets below.

## The Data

You have been provided with two datasets to use. A summary and preview are provided below.

### lego_sets.csv

| Column     | Description              |
|------------|--------------------------|
| `"set_num"` | A code that is unique to each set in the dataset. This column is critical, and a missing value indicates the set is a duplicate or invalid! |
| `"name"` | The name of the set. |
| `"year"` | The year the set was released. |
| `"num_parts"` | The number of parts contained in the set. This column is not central to our analyses, so missing values are acceptable. |
| `"theme_name"` | The name of the sub-theme of the set. |
| `"parent_theme"` | The name of the parent theme the set belongs to. Matches the `"name"` column of `parent_themes.csv`. |
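
The missing-value rules in the table above can be sketched on a few toy rows. This is a minimal illustration, not the project's solution: the values in `toy_sets` are invented for the example, and only the column names come from the description.

```python
import pandas as pd

# Toy rows shaped like lego_sets.csv (illustrative values, not real data).
toy_sets = pd.DataFrame({
    "set_num": ["A-1", "B-1", None, "C-1"],
    "name": ["Alpha", "Beta", "Gamma", "Delta"],
    "year": [1999, 1999, 2000, 2001],
    "num_parts": [10.0, None, 25.0, None],
    "theme_name": ["X", "Y", "X", "Z"],
    "parent_theme": ["Star Wars", "Town", "Star Wars", "Space"],
})

# Drop only the rows where the critical key is missing;
# rows with a missing num_parts are kept on purpose.
clean_sets = toy_sets.dropna(subset=["set_num"])

# After cleaning, every remaining set_num is present and unique.
assert clean_sets["set_num"].notna().all()
assert clean_sets["set_num"].is_unique

print(clean_sets.shape)  # (3, 6)
```

Passing `subset=["set_num"]` restricts the check to the key column, so the acceptable gaps in `"num_parts"` do not cause rows to be dropped.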

### parent_themes.csv

| Column     | Description              |
|------------|--------------------------|
| `"id"` | A code that is unique to every theme. |
| `"name"` | The name of the parent theme. |
| `"is_licensed"` | A Boolean column specifying whether the theme is a licensed theme. |
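
Because `"parent_theme"` in `lego_sets.csv` is described as matching `"name"` in `parent_themes.csv`, the two tables can be joined on that pair of columns. The sketch below uses invented toy frames (`toy_sets`, `toy_themes`) to show one way to do this and to compute a licensed-theme share and a peak year; it assumes every parent theme appears exactly once in the themes table.

```python
import pandas as pd

# Toy frames shaped like the two datasets (illustrative values, not real data).
toy_sets = pd.DataFrame({
    "set_num": ["A-1", "B-1", "C-1", "D-1", "E-1"],
    "year": [1999, 1999, 2000, 2001, 2000],
    "parent_theme": ["Star Wars", "Town", "Star Wars", "Harry Potter", "Star Wars"],
})
toy_themes = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Star Wars", "Town", "Harry Potter"],
    "is_licensed": [True, False, True],
})

# Join each set to its parent theme; validate guards against duplicate theme names.
merged = toy_sets.merge(
    toy_themes,
    left_on="parent_theme",
    right_on="name",
    how="inner",
    validate="many_to_one",
)

# Keep licensed sets only, then take the share that is Star Wars.
licensed = merged[merged["is_licensed"]]
star_wars_share = (licensed["parent_theme"] == "Star Wars").mean() * 100

# Year in which the most Star Wars sets were released.
star_wars = licensed[licensed["parent_theme"] == "Star Wars"]
peak_year = star_wars["year"].value_counts().idxmax()

print(star_wars_share)  # 75.0
print(peak_year)        # 2000
```

An inner join is used here so that `"is_licensed"` stays a clean Boolean column that can be used directly as a mask.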

In [50]:
# Import pandas, read and inspect the datasets
import pandas as pd

lego_sets = pd.read_csv('data/lego_sets.csv')
lego_sets.head()

|   | set_num | name | year | num_parts | theme_name | parent_theme |
|---|---------|------|------|-----------|------------|--------------|
| 0 | 00-1 | Weetabix Castle | 1970 | 471.0 | Castle | Legoland |
| 1 | 0011-2 | Town Mini-Figures | 1978 | NaN | Supplemental | Town |
| 2 | 0011-3 | Castle 2 for 1 Bonus Offer | 1987 | NaN | Lion Knights | Castle |
| 3 | 0012-1 | Space Mini-Figures | 1979 | 12.0 | Supplemental | Space |
| 4 | 0013-1 | Space Mini-Figures | 1979 | 12.0 | Supplemental | Space |


In [51]:
parent_themes = pd.read_csv('data/parent_themes.csv')
parent_themes.head()

|   | id | name | is_licensed |
|---|----|------|-------------|
| 0 | 1 | Technic | False |
| 1 | 22 | Creator | False |
| 2 | 50 | Town | False |
| 3 | 112 | Racers | False |
| 4 | 126 | Space | False |


In [52]:
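# Check "set_num" for duplicate and missing values.
# Note: duplicated() also flags repeated NaN values, so True here can come from missing set_nums alone.
duplicates = lego_sets["set_num"].duplicated().any()
print(duplicates)
# Count how many times each non-missing set_num occurs
duplicates_list = lego_sets.pivot_table(columns=["set_num"], aggfunc="size")
print(duplicates_list)
missing_value = lego_sets["set_num"].isna().any()
print(missing_value)
print(lego_sets.shape)
# Drop rows with a missing set_num, as the data description requires
lego_sets_no_na = lego_sets.dropna(subset=["set_num"])
print(lego_sets_no_na.shape)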

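# Percentage of licensed sets that are Star Wars themed
# This joins the sub-theme column "theme_name" to the parent_themes "name" column.
# The data description lists "parent_theme" as the matching column, so sets whose
# sub-theme name is not also a parent theme name drop out of the result.
sets_themes = lego_sets_no_na.merge(parent_themes, how="right", left_on="theme_name", right_on="name", suffixes=("_set", "_theme"))
print(sets_themes.head())
print(sets_themes.shape)
# count() gives non-null counts per column; the "set_num" entry is read off below
total_licensed_count = sets_themes[sets_themes["is_licensed"]].count()
total_starwars_count = sets_themes[sets_themes["is_licensed"] & (sets_themes["name_theme"] == "Star Wars")].count()
print(total_licensed_count["set_num"])
print(total_starwars_count["set_num"])
# int() truncates the percentage to a whole number
the_force = int(total_starwars_count["set_num"] / total_licensed_count["set_num"] * 100)
print(the_force)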

# Year with the most Star Wars set releases

# Count sets per year, split into non-Star Wars (False) and Star Wars (True) columns
pivot_table_years = lego_sets_no_na.pivot_table(values="set_num", index="year", columns=[lego_sets_no_na.parent_theme == "Star Wars"], fill_value=0, aggfunc="count")
print(pivot_table_years)
pivot_table_df = pivot_table_years.reset_index()
print(pivot_table_df)
# Sort by the Star Wars (True) column so the busiest year comes first
pivot_table_df_srt = pivot_table_df.sort_values(True, ascending=False)
print(pivot_table_df_srt)
# The first row's first column holds the year
new_era = int(pivot_table_df_srt.iloc[0, 0])
print(new_era)


True
set_num
00-1            1
00-2            1
00-3            1
00-4            1
00-6            1
               ..
tominifigs-1    1
trucapam-1      1
tsuper-1        1
vwkit-1         1
wwgp1-1         1
Length: 11833, dtype: int64
True
(11986, 6)
(11833, 6)
   set_num             name_set    year  ...  id name_theme is_licensed
0  10072-1        TECHNIC Beams  2003.0  ...   1    Technic       False
1  10073-1               Bushes  2003.0  ...   1    Technic       False
2  10074-1          Cross Axles  2003.0  ...   1    Technic       False
3  10076-1  TECHNIC Gear Wheels  2002.0  ...   1    Technic       False
4  10077-1        TECHNIC Motor  2003.0  ...   1    Technic       False

[5 rows x 9 columns]
(4746, 9)
489
211
43
parent_theme  False  True 
year                      
1950              7      0
1953              4      0
1954             14      0
1955             28      0
1956             12      0
...             ...    ...
2013            558     35
2014            