***
Using just the **Sex** feature for each passenger, we are able to increase the accuracy of our predictions by a significant margin. Now, let's consider using an additional feature to see if we can further improve our predictions. For example, consider all of the male passengers aboard the RMS Titanic: Can we find a subset of those passengers that had a higher rate of survival? Let's start by looking at the **Age** of each male, by again using the `survival_stats` function. This time, we'll use a fourth parameter to filter out the data so that only passengers with the **Sex** 'male' will be included.  
Run the code cell below to plot the survival outcomes of male passengers based on their age.

In [None]:
vs.survival_stats(data, outcomes, 'Age', ["Sex == 'male'"])

Examining the survival statistics, the majority of males younger than 10 survived the ship sinking, whereas most males age 10 or older *did not survive* the ship sinking. Let's continue to build on our previous prediction: If a passenger was female, then we will predict they survive. If a passenger was male and younger than 10, then we will also predict they survive. Otherwise, we will predict they do not survive.  
Fill in the missing code below so that the function will make this prediction.  
**Hint:** You can start your implementation of this function using the prediction code you wrote earlier from `predictions_1`.

In [None]:
def predictions_2(data):
    """ Model with two features: 
            - Predict a passenger survived if they are female.
            - Predict a passenger survived if they are male and younger than 10. """
    
    predictions = []
    for _, passenger in data.iterrows():
        
        # Predict survival (1) for all females and for males younger than 10;
        # predict non-survival (0) for everyone else
        if passenger['Sex'] == 'female':
            predictions.append(1)
        elif passenger['Sex'] == 'male' and passenger['Age'] < 10:
            predictions.append(1)
        else:
            predictions.append(0)
    
    # Return our predictions
    return pd.Series(predictions)

# Make the predictions
predictions = predictions_2(data)
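
The same rule can also be written without an explicit loop. The following is a minimal sketch, not part of the original exercise: `predictions_2_vectorized` and the `toy` frame are hypothetical names, and the toy values are made up purely for illustration. A missing age compares as not younger than 10, which matches the behaviour of the loop above.

```python
import pandas as pd

# Toy stand-in for the notebook's `data` frame (made-up values, for illustration only)
toy = pd.DataFrame({
    'Sex': ['female', 'male', 'male', 'male'],
    'Age': [30.0, 5.0, 40.0, float('nan')],
})

def predictions_2_vectorized(data):
    """Predict 1 for females and for males younger than 10, otherwise 0."""
    is_female = data['Sex'] == 'female'
    # NaN < 10 evaluates to False, so males with a missing age are predicted 0
    is_young_male = (data['Sex'] == 'male') & (data['Age'] < 10)
    return (is_female | is_young_male).astype(int).reset_index(drop=True)

toy_predictions = predictions_2_vectorized(toy)
print(toy_predictions.tolist())
```

Boolean masks keep the two conditions visible side by side, which can make it easier to add further conditions later.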

### Question 3
*How accurate would a prediction be that all female passengers and all male passengers younger than 10 survived?*  
**Hint:** Run the code cell below to see the accuracy of this prediction.

In [None]:
print(accuracy_score(outcomes, predictions))

**Answer**: *Predictions have an accuracy of 79.35%.*
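
To make the accuracy figure concrete, here is a minimal sketch of how such a percentage could be computed, assuming accuracy means the share of passengers whose predicted outcome equals the true outcome. The helper `accuracy_percent` and the toy series are hypothetical; this is not the notebook's own `accuracy_score` function.

```python
import pandas as pd

def accuracy_percent(truth, pred):
    """Return the percentage of positions where the prediction equals the truth."""
    if len(truth) != len(pred):
        raise ValueError("Number of predictions does not match number of outcomes")
    # Element-wise comparison gives booleans; their mean is the fraction correct
    return (truth == pred).mean() * 100

# Made-up outcomes and predictions, for illustration only
toy_outcomes = pd.Series([1, 0, 0, 1, 0])
toy_preds = pd.Series([1, 0, 1, 1, 0])

score = accuracy_percent(toy_outcomes, toy_preds)
print("{:.2f}%".format(score))
```

Four of the five toy predictions match, so the sketch reports 80%.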

***
Adding the feature **Age** as a condition in conjunction with **Sex** improves the accuracy slightly compared with using **Sex** alone. Now it's your turn: Find a series of features and conditions to split the data on to obtain an outcome prediction accuracy of at least 80%. This may require multiple features and multiple levels of conditional statements to succeed. You can use the same feature multiple times with different conditions.  
**Pclass**, **Sex**, **Age**, **SibSp**, and **Parch** are some suggested features to try.

Use the `survival_stats` function below to examine various survival statistics.  
**Hint:** To use multiple filter conditions, put each condition in the list passed as the last argument. Example: `["Sex == 'male'", "Age < 18"]`
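
As a rough illustration of what a list of filter conditions means, the sketch below applies each condition string in turn with pandas' `DataFrame.query`, so only rows satisfying every condition remain. This is an assumption about the semantics, not a description of how `survival_stats` is implemented; `apply_filters` and the `toy` frame are hypothetical.

```python
import pandas as pd

# Made-up rows, for illustration only
toy = pd.DataFrame({
    'Sex': ['male', 'male', 'female', 'male'],
    'Age': [12.0, 35.0, 8.0, 17.0],
    'Survived': [1, 0, 1, 0],
})

def apply_filters(frame, conditions):
    """Keep only the rows that satisfy every condition string."""
    for condition in conditions:
        frame = frame.query(condition)
    return frame

subset = apply_filters(toy, ["Sex == 'male'", "Age < 18"])
print(subset)
```

Applying the conditions one after another is equivalent to joining them with a logical AND.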

In [None]:
vs.survival_stats(data, outcomes, 'Age', ["Sex == 'male'", "Age < 18"])

In [None]:
vs.survival_stats(data, outcomes, 'Fare',["Sex == 'male'", "Age > 10", "Pclass == 1"])

After exploring the survival statistics visualization, fill in the missing code below so that the function will make your prediction.  
Make sure to keep track of the various features and conditions you tried before arriving at your final prediction model.  
**Hint:** You can start your implementation of this function using the prediction code you wrote earlier from `predictions_2`.

### Question 3 - Explore features to reach a prediction accuracy of at least 80%

In [None]:
# Let's take a look at all the features once again...
data.columns.values

**Right now we know that women and children had a higher chance of survival, perhaps because of the "women and children first" idea. Now, there are some possible things to explore:**

* *Did lower or higher classes survive at higher rates?*
* *Perhaps the fare determined location on the ship, and hence survival?*
* *Maybe people with bigger families (spouses, siblings, parents, children) had a hard time moving?*
* *What about the cabin or the place of embarkation?*

**Now let's investigate all these possibilities!**

In [None]:
# Investigate the impact of class variable on women
vs.survival_stats(data, outcomes, 'Pclass',["Sex == 'female'"])

*It's clear that women in class 1 and class 2 survived at higher rates than those
in class 3. Perhaps there were only a few lifeboats, and they were located near or reserved for
classes 1 and 2, where the wealthier passengers were?*
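
A plot comparison like this can be backed up with numbers. The sketch below computes the survival rate per class for female passengers using a group-by; the `toy` frame and `toy_outcomes` series are made-up stand-ins for the notebook's `data` and `outcomes`, so the rates printed here say nothing about the real dataset.

```python
import pandas as pd

# Made-up passengers and outcomes, for illustration only
toy = pd.DataFrame({
    'Sex':    ['female'] * 6 + ['male'] * 2,
    'Pclass': [1, 1, 2, 3, 3, 3, 1, 3],
})
toy_outcomes = pd.Series([1, 1, 1, 0, 1, 0, 0, 0], name='Survived')

# Select women, then average the 0/1 outcomes within each class
women = toy['Sex'] == 'female'
rate_by_class = toy_outcomes[women].groupby(toy.loc[women, 'Pclass']).mean()
print(rate_by_class)
```

Because the outcomes are coded as 0 and 1, the mean within each group is the survival rate for that group.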

**Now let's dig deeper into the women in class 3 and see if we can differentiate them better.**

In [None]:
# Dig deeper into women in class 3
vs.survival_stats(data, outcomes, 'Embarked',["Sex == 'female'", "Pclass == 3"])

*It seems that women in class 3 who embarked at C or Q were more likely
to survive than not.*

In [None]:
vs.survival_stats(data, outcomes, 'Fare',["Sex == 'female'", "Pclass == 3"])

*Women in class 3 who paid less than $20 in fare also survived at higher rates!
That is surprising; maybe the lowest-fare cabins were for staff and less privileged passengers
but were situated in a good position to allow survival.*
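
For a continuous feature such as **Fare**, one way to check a threshold like $20 numerically is to bin the values and compare survival rates per bin. The sketch below uses `pd.cut` for the binning; the bin edges, labels, and toy values are all assumptions chosen for illustration and are not taken from the dataset.

```python
import pandas as pd

# Made-up fares and outcomes, for illustration only
toy = pd.DataFrame({
    'Fare':     [7.25, 8.05, 15.5, 21.0, 34.4, 46.9],
    'Survived': [1, 1, 0, 0, 0, 0],
})

# Left-closed bins: [0, 20), [20, 40), [40, 60)
fare_band = pd.cut(toy['Fare'], bins=[0, 20, 40, 60], right=False,
                   labels=['<20', '20-40', '40-60'])

# Mean of the 0/1 outcome within each fare band is the survival rate
rate_by_band = toy['Survived'].groupby(fare_band, observed=False).mean()
print(rate_by_band)
```

Changing the bin edges is a quick way to test whether a different fare threshold separates the groups more cleanly.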

In [None]:
vs.survival_stats(data, outcomes, 'Parch',["Sex == 'female'", "Pclass == 3"])

*It seems women in class 3 with no parents or children aboard also survived at high rates;
maybe they were unencumbered!*

In [None]:
vs.survival_stats(data, outcomes, 'Age',["Sex == 'female'", "Pclass == 3"])

*Also, women in class 3 who were younger than 20 all survived.*

In [None]:
vs.survival_stats(data, outcomes, 'Cabin',["Sex == 'female'", "Pclass == 3"])

In [None]:
vs.survival_stats(data, outcomes, 'Ticket',["Sex == 'female'", "Pclass == 3"])

*But both Cabin and Ticket have too many categories for visualization at the moment.
They are good candidates for feature engineering in the future though...*
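
As one possible direction for that feature engineering, the sketch below reduces **Cabin** to a single deck letter, which would leave far fewer categories to plot. It assumes cabin values look like a letter followed by a number (for example `C85`) and that missing cabins should get their own category; the `'Unknown'` label and the toy values are hypothetical.

```python
import pandas as pd

# Made-up cabin values, for illustration only; None marks a missing cabin
toy_cabin = pd.Series(['C85', None, 'E46', 'C123', None, 'G6'])

# Take the first character as the deck letter and label missing cabins explicitly
deck = toy_cabin.str[0].fillna('Unknown')
print(deck.value_counts())
```

The resulting column could be added to the data frame and passed to `survival_stats` like any other categorical feature.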

#### OK, that's a lot of exploration of female passengers. Now let's see what happened to the men...

In [None]:
vs.survival_stats(data, outcomes, 'Pclass',["Sex == 'male'", "Age > 10"])

*We know that most male children (under 10) survived. Looking at males older than 10, most of
them did not survive, especially those in class 2. So let's investigate males in classes
1 and 3.*