#### Removing Values

You have seen:

1. sklearn breaks when missing values are introduced
2. reasons for dropping missing values

It is time to make sure you are comfortable with the methods for dropping missing values in pandas.  You can drop values by row or by column, and you can drop based on whether **any** value is missing in a particular row or column or **all** values in a row or column are missing.
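The four combinations described above (rows or columns, **any** or **all**) can be sketched with `dropna`'s `axis` and `how` parameters. The `demo` frame below is a hypothetical toy example, not part of the exercise data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, used only for illustration.
demo = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                     'b': [np.nan, np.nan, np.nan],
                     'c': [7.0, np.nan, 9.0]})

rows_any = demo.dropna(axis=0, how='any')  # drop rows with at least one NaN
rows_all = demo.dropna(axis=0, how='all')  # drop rows where every value is NaN
cols_any = demo.dropna(axis=1, how='any')  # drop columns with at least one NaN
cols_all = demo.dropna(axis=1, how='all')  # drop columns where every value is NaN

print(rows_any.shape)          # every row has a NaN in 'b', so no rows remain
print(list(rows_all.index))    # only the fully missing row (index 1) is dropped
print(cols_any.shape)          # every column has a NaN, so no columns remain
print(list(cols_all.columns))  # only the fully missing column 'b' is dropped
```

Note how aggressive `how='any'` is on this frame: a single fully missing column is enough to remove every row.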

A useful collection of pandas resources is available [here](https://chrisalbon.com/).  Specifically, Chris takes a close look at dropping rows and columns [here](https://chrisalbon.com/python/data_wrangling/pandas_dropping_column_and_rows/).  Another resource, on dropping rows where certain columns are NaN, can be found [here](https://stackoverflow.com/questions/13413590/how-to-drop-rows-of-pandas-dataframe-whose-value-in-certain-columns-is-nan).

In [2]:
import numpy as np
import pandas as pd
import RemovingValues as t
import matplotlib.pyplot as plt
%matplotlib inline

small_dataset = pd.DataFrame({'col1': [1, 2, np.nan, np.nan, 5, 6], 
                              'col2': [7, 8, np.nan, 10, 11, 12],
                              'col3': [np.nan, 14, np.nan, 16, 17, 18]})

small_dataset

Unnamed: 0,col1,col2,col3
0,1.0,7.0,
1,2.0,8.0,14.0
2,,,
3,,10.0,16.0
4,5.0,11.0,17.0
5,6.0,12.0,18.0


#### Question 1

**1.** Drop any row with a missing value.

In [37]:
small_dataset.dropna?

Signature:
small_dataset.dropna(
    axis=0,
    how='any',
    thresh=None,
    subset=None,
    inplace=False,
)
Docstring:
Remove missing values.

See the :ref:`User Guide <missing_data>` for more on which values are
considered missing, and how to work with missing data.

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
    Determine if rows or columns which contain missing values are
    removed.

    * 0, or 'index' : Drop rows which contain missing values.
    * 1, or 'columns' : Drop columns which contain missing value.

    .. versionchanged:: 1.0.0

       Pass tuple or list to drop on multiple
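The signature above also lists a `thresh` parameter, which the questions below do not use. As a minimal sketch, `thresh=n` keeps only the rows that have at least `n` non-missing values. In recent pandas versions `thresh` cannot be combined with `how`, so it is passed on its own here.

```python
import numpy as np
import pandas as pd

small_dataset = pd.DataFrame({'col1': [1, 2, np.nan, np.nan, 5, 6],
                              'col2': [7, 8, np.nan, 10, 11, 12],
                              'col3': [np.nan, 14, np.nan, 16, 17, 18]})

# Keep rows with at least two non-missing values
at_least_two = small_dataset.dropna(thresh=2)

# Keep rows with at least three non-missing values (here: fully observed rows)
at_least_three = small_dataset.dropna(thresh=3)

print(list(at_least_two.index))
print(list(at_least_three.index))
```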

In [35]:
# Drop any row with a missing value
all_drop = small_dataset.dropna()

# Print the result
all_drop

Unnamed: 0,col1,col2,col3
1,2.0,8.0,14.0
4,5.0,11.0,17.0
5,6.0,12.0,18.0


In [36]:
t.all_drop_test(all_drop) #test

Nice job! That looks right!
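Before dropping rows, it can help to see how much data the drop would remove. The sketch below counts missing values with `isna` and shows that, for this dataset, `dropna()` with its defaults matches a boolean mask built from `notna`.

```python
import numpy as np
import pandas as pd

small_dataset = pd.DataFrame({'col1': [1, 2, np.nan, np.nan, 5, 6],
                              'col2': [7, 8, np.nan, 10, 11, 12],
                              'col3': [np.nan, 14, np.nan, 16, 17, 18]})

# Number of missing values in each column
missing_per_column = small_dataset.isna().sum()

# Number of rows that contain at least one missing value
rows_with_missing = small_dataset.isna().any(axis=1).sum()

# Keeping only fully observed rows by hand gives the same frame as dropna()
manual_drop = small_dataset[small_dataset.notna().all(axis=1)]

print(missing_per_column.to_dict())
print(rows_with_missing)
print(manual_drop.equals(small_dataset.dropna()))
```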


#### Question 2

**2.** Drop only the row with all missing values.

In [42]:
# Drop only rows with all missing values
all_row = small_dataset.dropna(how='all')

# Print the result
all_row

Unnamed: 0,col1,col2,col3
0,1.0,7.0,
1,2.0,8.0,14.0
3,,10.0,16.0
4,5.0,11.0,17.0
5,6.0,12.0,18.0


In [41]:
t.all_row_test(all_row) #test

That wasn't quite as expected.  Try again, or take a look at the solution notebook if you get stuck.
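Note that the test cell above is numbered `In [41]`, while the cell that computes `all_row` is `In [42]`, so the message may reflect an earlier attempt rather than the code shown. One way to check the result independently of the course's `t` module is sketched below. It is only a sanity check under that assumption, not a replacement for the provided test.

```python
import numpy as np
import pandas as pd

small_dataset = pd.DataFrame({'col1': [1, 2, np.nan, np.nan, 5, 6],
                              'col2': [7, 8, np.nan, 10, 11, 12],
                              'col3': [np.nan, 14, np.nan, 16, 17, 18]})

all_row = small_dataset.dropna(how='all')

# Count rows in the result where every column is still missing
fully_missing_left = all_row.isna().all(axis=1).sum()

# Rows that were removed from the original frame
removed = small_dataset.index.difference(all_row.index)

print(fully_missing_left)
print(list(removed))
```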


#### Question 3

**3.** Drop only the rows with missing values in column 3 (`col3`).

In [47]:
# Drop only rows with missing values in column 3
only3_drop = small_dataset.dropna(subset=['col3'])

# Print the result
only3_drop

Unnamed: 0,col1,col2,col3
1,2.0,8.0,14.0
3,,10.0,16.0
4,5.0,11.0,17.0
5,6.0,12.0,18.0


In [48]:
t.only3_drop_test(only3_drop) #test

Nice job! That looks right!


#### Question 4

**4.** Drop only the rows with missing values in column 3 (`col3`) or column 1 (`col1`).

In [49]:
# Drop rows with missing values in column 1 or column 3
only3or1_drop = small_dataset.dropna(subset=['col1', 'col3'])

# Print the result
only3or1_drop

Unnamed: 0,col1,col2,col3
1,2.0,8.0,14.0
4,5.0,11.0,17.0
5,6.0,12.0,18.0


In [50]:
t.only3or1_drop_test(only3or1_drop) #test

Nice job! That looks right!
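The `subset` argument can also be combined with `how`. As a small sketch of the difference, `how='any'` (the behavior used in Question 4) drops a row when either listed column is missing, while `how='all'` drops it only when both listed columns are missing.

```python
import numpy as np
import pandas as pd

small_dataset = pd.DataFrame({'col1': [1, 2, np.nan, np.nan, 5, 6],
                              'col2': [7, 8, np.nan, 10, 11, 12],
                              'col3': [np.nan, 14, np.nan, 16, 17, 18]})

# Drop rows where col1 OR col3 is missing
either_missing = small_dataset.dropna(subset=['col1', 'col3'], how='any')

# Drop rows only where col1 AND col3 are both missing
both_missing = small_dataset.dropna(subset=['col1', 'col3'], how='all')

print(list(either_missing.index))
print(list(both_missing.index))
```

On this dataset only row 2 has both columns missing, so `how='all'` keeps rows 0 and 3 even though each has one gap.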
