## 1. Introduction
<p>In the world of Pokémon academia, one name towers above any other – Professor Samuel Oak. While his colleague Professor Elm specializes in Pokémon evolution, Oak has dedicated his career to understanding the relationship between Pokémon and their human trainers. A former trainer himself, the professor has first-hand experience of how obstinate Pokémon can be – particularly when they hold legendary status.</p>
<p>For his latest research project, Professor Oak has decided to investigate the defining characteristics of legendary Pokémon to improve our understanding of their temperament. Hearing of our expertise in classification problems, he has enlisted us as the lead researchers.</p>
<p>Our journey begins at the professor's research lab in Pallet Town, Kanto. The first step is to open up the Pokédex, an encyclopaedic guide to 801 Pokémon from all seven generations.</p>
<p><img src="https://assets.datacamp.com/production/project_712/img/legendary_pokemon.jpg" alt="Legendary Pokémon"></p>
<p><em>Source: <a href="https://www.flickr.com/photos/bagogames/">bagogames</a> on Flickr</em></p>

These are the variables in the *Pokemon* dataset (*pokedex.csv*):

REVIEW THIS

- **name**: The English name of the Pokémon
- **japanese_name**: The original Japanese name of the Pokémon
- **pokedex_number**: The entry number of the Pokémon in the National Pokédex
- **percentage_male**: The percentage of the species that are male; blank if the Pokémon is genderless
- **type1**: The primary type of the Pokémon
- **type2**: The secondary type of the Pokémon
- **classification**: The classification of the Pokémon as described by the Sun and Moon Pokédex
- **height_m**: The height of the Pokémon in metres
- **weight_kg**: The weight of the Pokémon in kilograms
- **capture_rate**: The capture rate of the Pokémon
- **baseeggsteps**: The number of steps required to hatch an egg of the Pokémon
- **abilities**: A stringified list of abilities that the Pokémon is capable of having
- **experience_growth**: The experience growth of the Pokémon
- **base_happiness**: The base happiness of the Pokémon
- **against_?**: Eighteen features that denote the amount of damage taken against an attack of a particular type
- **hp**: The base HP of the Pokémon
- **attack**: The base Attack of the Pokémon
- **defense**: The base Defense of the Pokémon
- **sp_attack**: The base Special Attack of the Pokémon
- **sp_defense**: The base Special Defense of the Pokémon
- **speed**: The base Speed of the Pokémon
- **generation**: The numbered generation in which the Pokémon was first introduced
- **is_legendary**: Denotes whether the Pokémon is legendary


### Load your data and take a look at it

In [None]:
# Load libraries

# Import the dataset and convert variables

# Look at the first rows


# Examine the structure and get some descriptive statistics



## 2. How many Pokémon are legendary?
<p>After browsing the Pokédex, we can see several variables that could feasibly explain what makes a Pokémon legendary. We have a series of numerical fighter stats – <code>attack</code>, <code>defense</code>, <code>speed</code> and so on – as well as a categorization of Pokémon <code>type</code> (bug, dark, dragon, etc.). <code>is_legendary</code> is the binary classification variable we will eventually be predicting, tagged <code>1</code> if a Pokémon is legendary and <code>0</code> if it is not.</p>
<p>Before we explore these variables in any depth, let's find out how many Pokémon are legendary out of the 801 total.</p>
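
The count and its share of the total can be sketched in a few lines of plain Python. This is only an illustration on made-up rows: the names `mon_a` to `mon_d` and the helper `legendary_summary` are hypothetical, and the real figures have to come from `pokedex.csv`.

```python
# Toy stand-in for the Pokédex: made-up rows carrying only the is_legendary flag.
pokedex = [
    {"name": "mon_a", "is_legendary": 0},
    {"name": "mon_b", "is_legendary": 1},
    {"name": "mon_c", "is_legendary": 0},
    {"name": "mon_d", "is_legendary": 0},
]


def legendary_summary(rows):
    """Return (count, proportion) of rows flagged as legendary."""
    n_legendary = sum(row["is_legendary"] for row in rows)
    return n_legendary, n_legendary / len(rows)


count, share = legendary_summary(pokedex)
print(f"{count} of {len(pokedex)} legendary ({share:.1%})")
```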

In [None]:
# 





## 3. Legendary Pokémon by height and weight
<p>We now know that there are INSERT HERE legendary Pokémon – a sizable minority at INSERT HERE% of the population! Let's start to explore some of their distinguishing characteristics.</p>
<p>First of all, we'll plot the relationship between <code>height_m</code> and <code>weight_kg</code> for all 801 Pokémon, highlighting those that are classified as legendary. We'll also add conditional labels to the plot, which will only print a Pokémon's name if it is taller than 7.5 m or heavier than 600 kg.</p>
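
The conditional labelling rule is independent of the plotting library, so it can be sketched on its own. The rows below are invented for illustration, `plot_label` is a hypothetical helper, and treating a missing height or weight as "not extreme" is an assumption rather than something the project specifies.

```python
HEIGHT_LIMIT_M = 7.5     # label Pokémon taller than this
WEIGHT_LIMIT_KG = 600    # ... or heavier than this


def plot_label(row):
    """Return the name for extreme Pokémon and an empty string otherwise."""
    tall = row["height_m"] is not None and row["height_m"] > HEIGHT_LIMIT_M
    heavy = row["weight_kg"] is not None and row["weight_kg"] > WEIGHT_LIMIT_KG
    return row["name"] if tall or heavy else ""


# Made-up rows; None stands in for a missing measurement.
rows = [
    {"name": "mon_a", "height_m": 0.7, "weight_kg": 6.9},
    {"name": "mon_b", "height_m": 9.0, "weight_kg": 300.0},
    {"name": "mon_c", "height_m": 3.5, "weight_kg": 950.0},
    {"name": "mon_d", "height_m": None, "weight_kg": None},
]
labels = [plot_label(row) for row in rows]
print(labels)
```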

In [None]:
# Prepare the plot


# Print the plot



## 4. Legendary Pokémon by type
<p>It seems that legendary Pokémon are generally INSERT HERE and INSERT HERE, but with many exceptions. For example, Onix (Gen 1), Steelix (Gen 2) and Wailord (Gen 3) are all extremely tall, but none of them have legendary status. There must be other factors at play.</p>
<p>We will now look at the effect of a Pokémon's <code>type</code> on its legendary/non-legendary classification. There are 18 possible types, ranging from the common (Grass / Normal / Water) to the rare (Fairy / Flying / Ice). We will calculate the proportion of legendary Pokémon within each category, and then plot these proportions using a simple bar chart.</p>
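
As a rough sketch of the calculation behind the bar chart, the proportion of legendary Pokémon per type is a grouped mean of the `is_legendary` flag. The example assumes the primary type column `type1` is the one being grouped on (the text only says `type`), and the rows and the helper name `legendary_share_by_type` are made up.

```python
from collections import defaultdict


def legendary_share_by_type(rows, type_key="type1"):
    """Proportion of legendary rows within each type, largest first."""
    totals = defaultdict(int)
    legendary = defaultdict(int)
    for row in rows:
        totals[row[type_key]] += 1
        legendary[row[type_key]] += row["is_legendary"]
    shares = {t: legendary[t] / totals[t] for t in totals}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


# Invented rows, not taken from pokedex.csv.
pokedex = [
    {"type1": "psychic", "is_legendary": 1},
    {"type1": "psychic", "is_legendary": 0},
    {"type1": "water", "is_legendary": 0},
    {"type1": "water", "is_legendary": 0},
    {"type1": "poison", "is_legendary": 0},
]
shares = legendary_share_by_type(pokedex)
for type_name, share in shares:
    print(f"{type_name:<8} {share:.0%}")
```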

In [None]:
# Prepare the data


# Prepare the plot

# Print the plot


## 5. Legendary Pokémon by fighter stats
<p>There are clear differences between Pokémon types in their relation to legendary status. While more than INSERT HERE% of flying and psychic Pokémon are legendary, there is no such thing as a legendary poison or fighting Pokémon!</p>
<p>Before fitting the model, we will consider the influence of a Pokémon's fighter stats (<code>attack</code>, <code>defense</code>, etc.) on its status. Rather than considering each stat in isolation, we will produce a boxplot for all of them simultaneously using the <code>facet_wrap()</code> function.</p>
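
Faceting draws one panel per level of a single column, so the six stat columns first have to be stacked into one long "stat" column. The sketch below shows only that reshaping step in plain Python, plus the per-panel medians a boxplot would mark. The rows are invented, the helper names are hypothetical, and the plotting itself is left out.

```python
from statistics import median

STATS = ["attack", "defense", "hp", "sp_attack", "sp_defense", "speed"]


def to_long(rows, stat_columns=STATS):
    """One (is_legendary, stat, value) record per Pokémon and stat."""
    return [
        {"is_legendary": row["is_legendary"], "stat": stat, "value": row[stat]}
        for row in rows
        for stat in stat_columns
    ]


def median_by_panel(long_rows):
    """Median value per (stat, is_legendary) pair, i.e. the centre line of each box."""
    groups = {}
    for rec in long_rows:
        groups.setdefault((rec["stat"], rec["is_legendary"]), []).append(rec["value"])
    return {key: median(values) for key, values in groups.items()}


# Made-up fighter stats for three Pokémon.
pokedex = [
    {"is_legendary": 0, "attack": 50, "defense": 40, "hp": 45,
     "sp_attack": 60, "sp_defense": 50, "speed": 65},
    {"is_legendary": 0, "attack": 70, "defense": 60, "hp": 55,
     "sp_attack": 40, "sp_defense": 50, "speed": 45},
    {"is_legendary": 1, "attack": 110, "defense": 90, "hp": 100,
     "sp_attack": 150, "sp_defense": 90, "speed": 130},
]
long_rows = to_long(pokedex)
medians = median_by_panel(long_rows)
print(len(long_rows), medians[("attack", 0)], medians[("attack", 1)])
```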

In [None]:
# Prepare the data


# Prepare the plot

# Print the plot



## 6. Create a training/test split
<p>As we might expect, legendary Pokémon outshine their ordinary counterparts in all fighter stats. Although we haven't formally tested a difference in means, the boxplots suggest a clear difference with respect to all six variables. Nonetheless, there are a number of outliers in each case, meaning that some legendary Pokémon are anomalously weak.</p>
<p>We have now explored all of the predictor variables we will use to explain what makes a Pokémon legendary. Before fitting our model, we will split the <code>pokedex</code> into a training set (<code>pokedex_train</code>) and a test set (<code>pokedex_test</code>). This will allow us to test the model on unseen data.</p>
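
A minimal sketch of a random split, assuming a 60/40 partition and a fixed seed for reproducibility. Neither the fraction nor the seed is given in the text, and `train_test_split` is a hypothetical helper operating on placeholder rows.

```python
import random


def train_test_split(rows, train_fraction=0.6, seed=1234):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    rng = random.Random(seed)          # local generator, so the split is reproducible
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder rows identified only by their Pokédex number.
pokedex = [{"pokedex_number": i} for i in range(1, 11)]
pokedex_train, pokedex_test = train_test_split(pokedex)
print(len(pokedex_train), len(pokedex_test))
```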

In [None]:

# Create training and test set


## 7. Fit a decision tree
<p>Now that we have our training and test sets, we can go about building our classifier. But before we fit a random forest, we will fit a simple <strong>classification decision tree</strong>. This will give us a baseline fit against which to compare the results of the random forest, as well as an informative graphical representation of the model.</p>
<p>For the decision tree, and later for the random forest, we need to omit incomplete observations. This will remove a few Pokémon with missing values for <code>height_m</code> and <code>weight_kg</code> from the training set.</p>
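
To make the two steps concrete, here is a sketch that drops incomplete rows and then searches for a single split on one variable. A full classification tree repeats such a search recursively over all predictors. Using Gini impurity as the split criterion is an assumption (it is a common choice for classification trees), and the data and helper names are invented.

```python
def complete_cases(rows, columns):
    """Keep only rows with no missing (None) value in the given columns."""
    return [r for r in rows if all(r[c] is not None for c in columns)]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, feature, target="is_legendary"):
    """Threshold on one feature that minimises the weighted Gini impurity."""
    values = sorted({r[feature] for r in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2      # candidate cut halfway between neighbours
        left = [r[target] for r in rows if r[feature] <= threshold]
        right = [r[target] for r in rows if r[feature] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Made-up training rows; one of them lacks height and weight.
pokedex_train = [
    {"height_m": 0.5, "weight_kg": 5.0, "is_legendary": 0},
    {"height_m": 1.0, "weight_kg": 20.0, "is_legendary": 0},
    {"height_m": None, "weight_kg": None, "is_legendary": 1},
    {"height_m": 5.0, "weight_kg": 400.0, "is_legendary": 1},
    {"height_m": 6.0, "weight_kg": 700.0, "is_legendary": 1},
]
train = complete_cases(pokedex_train, ["height_m", "weight_kg"])
print(len(train), best_split(train, "height_m"))
```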

In [None]:
# Remove NAs


# Fit decision tree

# Plot decision tree



NOW DESCRIBE THE TREE

## 8. Fit a random forest
<p>A single decision tree is unstable: small variations in the training data can produce a very different tree. It therefore makes sense to fit a <strong>random forest</strong>, an ensemble method that grows many decision trees on bootstrap samples of the data and aggregates their predictions. This should give us a more robust model that classifies Pokémon with greater accuracy.</p>
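
The idea can be illustrated with a deliberately simplified stand-in: a "forest" of one-split trees (stumps), each fitted to a bootstrap sample using one randomly chosen predictor, with the final score being the share of trees that vote legendary. Real random forests grow much deeper trees. The tree count, seed, data and all helper names here are assumptions made for illustration.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    """Most common 0/1 label (ties go to 1)."""
    return int(sum(labels) * 2 >= len(labels))


def fit_stump(rows, feature, target):
    """Best single threshold on `feature`, predicting the majority class on each side."""
    values = sorted({r[feature] for r in rows})
    all_labels = [r[target] for r in rows]
    # Fallback: no useful split found, so predict one constant class.
    best = (gini(all_labels), None, majority(all_labels), majority(all_labels))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r[target] for r in rows if r[feature] <= threshold]
        right = [r[target] for r in rows if r[feature] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if score < best[0]:
            best = (score, threshold, majority(left), majority(right))
    return (feature,) + best[1:]


def predict_stump(stump, row):
    feature, threshold, left_class, right_class = stump
    if threshold is None or row[feature] <= threshold:
        return left_class
    return right_class


def fit_forest(rows, features, target="is_legendary", n_trees=25, seed=1234):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]   # bootstrap: draw with replacement
        feature = rng.choice(features)              # random predictor for this stump
        forest.append(fit_stump(sample, feature, target))
    return forest


def predict_forest(forest, row):
    """Share of trees voting 'legendary' for this row."""
    return sum(predict_stump(stump, row) for stump in forest) / len(forest)


# Made-up training rows: four ordinary and four legendary Pokémon.
pokedex_train = [
    {"attack": 40, "speed": 30, "is_legendary": 0},
    {"attack": 50, "speed": 35, "is_legendary": 0},
    {"attack": 55, "speed": 45, "is_legendary": 0},
    {"attack": 60, "speed": 50, "is_legendary": 0},
    {"attack": 120, "speed": 100, "is_legendary": 1},
    {"attack": 130, "speed": 110, "is_legendary": 1},
    {"attack": 140, "speed": 120, "is_legendary": 1},
    {"attack": 150, "speed": 130, "is_legendary": 1},
]
forest = fit_forest(pokedex_train, ["attack", "speed"])
weak = {"attack": 40, "speed": 30}
strong = {"attack": 150, "speed": 130}
print(predict_forest(forest, weak), predict_forest(forest, strong))
```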

In [None]:

# Fit random forest

# Print model output


NOW EXPLAIN THE FOREST

## 9. Assess model fit
PROVIDE A PROPER ASSESSMENT OF THE RF
<p>In order to allow direct comparison with the decision tree, we will plot the <strong>ROC curves</strong> for both models, which show the true positive rate (TPR) against the false positive rate (FPR) as the classification threshold varies. The closer the curve is to the top left of the plot, the higher the area under the curve (AUC) and the better the model.</p>
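
For intuition, the points of a ROC curve and its AUC can be computed by hand from predicted scores. The sketch below lowers the threshold one score at a time and integrates with the trapezoidal rule. The labels and scores are invented, the helper names are hypothetical, and the code assumes both classes are present in the test set.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs obtained by lowering the decision threshold step by step."""
    positives = sum(labels)
    negatives = len(labels) - positives   # both counts must be non-zero
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every observation tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up test labels and predicted probabilities of being legendary.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
print(round(auc(points), 3))
```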

In [None]:

# Plot the ROC curves: first for the decision tree, then for the random forest



## 10. Analyze variable importance
<p>It's clear from the ROC curves that the INSERT HERE is a substantially better model, boasting an AUC of INSERT HERE% versus the decision tree's INSERT HERE%. When calculating variable importance, it makes sense to do so with the best model available, so we'll use the INSERT HERE for the final part of our analysis.</p>
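
The text does not say which importance measure is used. One common choice is permutation importance (the mean decrease in accuracy when a predictor's values are shuffled), sketched here under that assumption. The fixed rule `toy_model`, the rows, the repeat count and the seed are all invented for illustration.

```python
import random


def accuracy(predict, rows, target):
    """Share of rows whose predicted class matches the observed one."""
    return sum(predict(r) == r[target] for r in rows) / len(rows)


def permutation_importance(predict, rows, features, target="is_legendary",
                           n_repeats=20, seed=1234):
    """Mean drop in accuracy after shuffling each feature's values across rows."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, target)
    importance = {}
    for feature in features:
        drops = []
        for _ in range(n_repeats):
            shuffled_values = [r[feature] for r in rows]
            rng.shuffle(shuffled_values)
            # Copy each row with only this feature replaced by a shuffled value.
            permuted = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled_values)]
            drops.append(baseline - accuracy(predict, permuted, target))
        importance[feature] = sum(drops) / n_repeats
    return importance


def toy_model(row):
    """Stand-in classifier that only looks at attack."""
    return int(row["attack"] > 100)


# Made-up test rows.
pokedex_test = [
    {"attack": 45, "speed": 90, "is_legendary": 0},
    {"attack": 60, "speed": 40, "is_legendary": 0},
    {"attack": 80, "speed": 120, "is_legendary": 0},
    {"attack": 110, "speed": 50, "is_legendary": 1},
    {"attack": 150, "speed": 95, "is_legendary": 1},
]
importance = permutation_importance(toy_model, pokedex_test, ["attack", "speed"])
print(importance)
```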


In [None]:
# Print variable importance measures

# Create a dotchart of variable importance


## 11. Conclusion
WRITE YOUR CONCLUSIONS HERE


### Congratulations on completing your research into legendary Pokémon – Professor Oak is excited to share the findings! 
