![star_wars_unsplash](star_wars_unsplash.jpg)

Lego is a household name across the world, supported by a diverse toy line, hit movies, and a series of successful video games. In this project, we are going to explore a key development in the history of Lego: the introduction of licensed sets such as Star Wars, Super Heroes, and Harry Potter.

The introduction of its first licensed series, Star Wars, was a hit that sparked a series of collaborations on further themed sets. The partnerships team has asked you to analyze this success. Before diving into the analysis, they suggest reading the descriptions of the two datasets, given below.

## The Data

You have been provided with two datasets to use. A summary and preview are provided below.

## lego_sets.csv

| Column     | Description              |
|------------|--------------------------|
| `"set_num"` | A code that is unique to each set in the dataset. This column is critical, and a missing value indicates the set is a duplicate or invalid! |
| `"name"` | The name of the set. |
| `"year"` | The year the set was released. |
| `"num_parts"` | The number of parts contained in the set. This column is not central to our analyses, so missing values are acceptable. |
| `"theme_name"` | The name of the sub-theme of the set. |
| `"parent_theme"` | The name of the parent theme the set belongs to. Matches the `"name"` column of `parent_themes.csv`. |

## parent_themes.csv

| Column     | Description              |
|------------|--------------------------|
| `"id"` | A code that is unique to every theme. |
| `"name"` | The name of the parent theme. |
| `"is_licensed"` | A Boolean column specifying whether the theme is a licensed theme. |

#### The team responsible for the Star Wars partnership has asked for specific information in preparation for their meeting:

#### 1. What percentage of all licensed sets ever released were Star Wars themed? Save your answer as a variable `the_force`, as an integer (e.g. 25).

#### 2. In which year was the highest number of Star Wars sets released? Save your answer as a variable `new_era`, as an integer (e.g. 2012).

In [267]:
# Import pandas, read and inspect the datasets
import pandas as pd

lego_sets = pd.read_csv('data/lego_sets.csv')
lego_sets.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11986 entries, 0 to 11985
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   set_num       11833 non-null  object 
 1   name          11833 non-null  object 
 2   year          11986 non-null  int64  
 3   num_parts     6926 non-null   float64
 4   theme_name    11833 non-null  object 
 5   parent_theme  11986 non-null  object 
dtypes: float64(1), int64(1), object(4)
memory usage: 562.0+ KB
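
The `info()` output above shows 11833 non-null `set_num` values out of 11986 rows, and the data description marks a missing `set_num` as a duplicate or invalid set. A minimal sketch of how such rows could be counted and dropped is shown here. The rows and the names `toy_sets`, `n_missing`, and `clean_sets` are made up for illustration and are not part of the original notebook.

```python
import pandas as pd

# Toy rows (hypothetical); the None entries mimic sets with a missing set_num
toy_sets = pd.DataFrame({
    "set_num": ["a-1", None, "a-2", None],
    "name": ["Set A1", None, "Set A2", None],
    "year": [2001, 2001, 2002, 2002],
    "parent_theme": ["Star Wars", "Star Wars", "Star Wars", "Star Wars"],
})

# Count the rows that lack a set_num before removing them
n_missing = int(toy_sets["set_num"].isna().sum())

# Keep only the rows that have a set_num
clean_sets = toy_sets.dropna(subset=["set_num"])

print(n_missing)        # 2
print(len(clean_sets))  # 2
```

Whether to apply this filter to `lego_sets` before the later steps is a judgment call based on the column description; the cells below work on the unfiltered data.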


In [268]:
parent_themes = pd.read_csv('data/parent_themes.csv')
parent_themes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 111 entries, 0 to 110
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   id           111 non-null    int64 
 1   name         111 non-null    object
 2   is_licensed  111 non-null    bool  
dtypes: bool(1), int64(1), object(1)
memory usage: 2.0+ KB


#### Question 1

In [269]:
# Rename 'name' so both tables share the join key 'parent_theme'
parent_themes = parent_themes.rename(columns={'name': 'parent_theme'})
print(parent_themes)

      id  parent_theme  is_licensed
0      1       Technic        False
1     22       Creator        False
2     50          Town        False
3    112        Racers        False
4    126         Space        False
..   ...           ...          ...
106  605  Nexo Knights        False
107  606   Angry Birds         True
108  607  Ghostbusters         True
109  608        Disney         True
110  610    Brickheadz        False

[111 rows x 3 columns]


In [270]:
print(lego_sets['parent_theme'].value_counts())

parent_theme
Town              1116
Seasonal           928
Star Wars          609
Technic            536
Service Packs      456
                  ... 
Avatar               2
LEGO Exclusive       1
Universe             1
Disney               1
Ghostbusters         1
Name: count, Length: 109, dtype: int64


In [271]:
merged = parent_themes.merge(lego_sets, on='parent_theme')
print(merged)

        id parent_theme  is_licensed  set_num                            name  \
0        1      Technic        False   1030-1  TECHNIC I: Simple Machines Set   
1        1      Technic        False   1032-1           TECHNIC II Set {4.5v}   
2        1      Technic        False   1038-1              ERBIE the Robo-Car   
3        1      Technic        False   1039-1            Manual Control Set 1   
4        1      Technic        False   1061-1                Single Disk Pack   
...    ...          ...          ...      ...                             ...   
11981  610   Brickheadz        False  41593-1            Captain Jack Sparrow   
11982  610   Brickheadz        False  41594-1         Captain Armando Salazar   
11983  610   Brickheadz        False  41595-1                           Belle   
11984  610   Brickheadz        False  41596-1                           Beast   
11985  610   Brickheadz        False  DCBHZ-1                    Wonder Woman   

       year  num_parts     
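
The merged frame ends at index 11985, so every row of `lego_sets` appears to have found a parent theme. The reverse is less clear: `value_counts()` reports 109 distinct parent themes, while `parent_themes` has 111 rows. A hedged sketch of how the join could be checked uses the `validate` and `indicator` parameters of `DataFrame.merge` on made-up data. The frames `toy_themes` and `toy_sets` and their values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for parent_themes and lego_sets
toy_themes = pd.DataFrame({
    "id": [1, 2, 3],
    "parent_theme": ["Theme A", "Theme B", "Theme C"],
    "is_licensed": [False, True, True],
})
toy_sets = pd.DataFrame({
    "set_num": ["a-1", "a-2", "b-1"],
    "parent_theme": ["Theme A", "Theme A", "Theme B"],
})

# validate raises if a theme name repeats on the left side;
# indicator adds a _merge column showing where each row came from
checked = toy_themes.merge(
    toy_sets,
    on="parent_theme",
    how="outer",
    validate="one_to_many",
    indicator=True,
)

# Rows present in only one of the two tables
unmatched = checked[checked["_merge"] != "both"]
print(unmatched["parent_theme"].tolist())  # ['Theme C']
```

An inner merge, as used above, silently drops such unmatched themes, which is harmless here because a theme without sets cannot affect either question.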

In [272]:
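# Keep only Star Wars sets, then only those flagged as licensed
sorted_by_star = merged[merged['parent_theme'] == 'Star Wars']
licensed_star = sorted_by_star[sorted_by_star['is_licensed']]

# All licensed sets, across every parent theme
is_licensed = merged[merged['is_licensed']]

# Share of licensed sets that are Star Wars, truncated to an integer percentage
the_force = int((len(licensed_star) / len(is_licensed)) * 100)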

print(the_force)

45
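
Note that the figure above is computed on all merged rows, including those without a `set_num`, which the data description treats as invalid. A sketch of the same calculation with those rows filtered out first, on made-up data, could look like this. The frame `toy` and the names `licensed`, `share`, and `toy_force` are hypothetical, and the result says nothing about what the real data would give.

```python
import pandas as pd

# Hypothetical merged rows; one licensed Star Wars row has no set_num
toy = pd.DataFrame({
    "set_num": ["a-1", "a-2", None, "b-1", "c-1"],
    "parent_theme": ["Star Wars", "Star Wars", "Star Wars", "Theme B", "Theme C"],
    "is_licensed": [True, True, True, True, False],
})

# Licensed sets only, with invalid (missing set_num) rows removed
licensed = toy[toy["is_licensed"]].dropna(subset=["set_num"])

# The mean of a Boolean mask is the fraction of True values
share = (licensed["parent_theme"] == "Star Wars").mean()

# Truncate to an integer percentage, as in the cell above
toy_force = int(share * 100)
print(toy_force)  # 66
```

Taking the mean of the mask avoids building two intermediate frames and dividing their lengths by hand.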


#### Question 2

In [273]:
new_era_df = sorted_by_star.groupby('year')['theme_name'].count().sort_values(ascending=False)

print(new_era_df)

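# The series is sorted in descending order, so the first index label is the top year
new_era = int(new_era_df.index[0])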

year
2016    61
2015    58
2017    55
2014    45
2012    43
2009    39
2013    35
2003    32
2011    32
2010    30
2002    28
2005    28
2000    26
2008    23
2004    20
2007    16
2001    14
1999    13
2006    11
Name: theme_name, dtype: int64
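
The top year can also be read off programmatically instead of from the printed table. A minimal sketch on made-up data, assuming one row per set, is given here. The frame `toy_star_wars` and the names `sets_per_year` and `toy_new_era` are hypothetical.

```python
import pandas as pd

# Hypothetical Star Wars sets, one row per set
toy_star_wars = pd.DataFrame({
    "set_num": ["a-1", "a-2", "a-3", "a-4"],
    "year": [2001, 2002, 2002, 2003],
})

# Number of sets released in each year
sets_per_year = toy_star_wars["year"].value_counts()

# idxmax returns the index label (the year) with the highest count
toy_new_era = int(sets_per_year.idxmax())
print(toy_new_era)  # 2002
```

Unlike `groupby('year')['theme_name'].count()`, `value_counts()` on the `year` column does not depend on another column being non-null.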
