#### Removing Values

You have seen:

1. that sklearn breaks when you introduce missing values
2. reasons for dropping missing values

It is time to make sure you are comfortable with the methods for dropping missing values in pandas. You can drop by row or by column, and you can drop based on whether **any** value in a row or column is missing or whether **all** values in a row or column are missing.
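
A minimal sketch of the row/column and any/all combinations, assuming the same toy data used in the notebook cells below (redefined here as `df` so the snippet runs on its own). The `axis` and `how` parameters belong to `DataFrame.dropna`; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Same toy data as the notebook's small_dataset
df = pd.DataFrame({'col1': [1, 2, np.nan, np.nan, 5, 6],
                   'col2': [7, 8, np.nan, 10, 11, 12],
                   'col3': [np.nan, 14, np.nan, 16, 17, 18]})

# Rows (axis=0 is the default)
rows_any = df.dropna(axis=0, how='any')  # drop a row if any value is missing
rows_all = df.dropna(axis=0, how='all')  # drop a row only if every value is missing

# Columns (axis=1)
cols_any = df.dropna(axis=1, how='any')  # drop a column if any value is missing
cols_all = df.dropna(axis=1, how='all')  # drop a column only if every value is missing

print(rows_any.shape, rows_all.shape)  # (3, 3) (5, 3)
print(cols_any.shape, cols_all.shape)  # (6, 0) (6, 3)
```

Every column in this toy data contains at least one missing value, so column-wise `how='any'` removes all of them, which shows how aggressive that option can be.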

A useful collection of pandas resources is available [here](https://chrisalbon.com/). In particular, Chris takes a close look at dropping missing values [here](https://chrisalbon.com/python/data_wrangling/pandas_dropping_column_and_rows/). A related Stack Overflow discussion can be found [here](https://stackoverflow.com/questions/13413590/how-to-drop-rows-of-pandas-dataframe-whose-value-in-certain-columns-is-nan).

```python
import numpy as np
import pandas as pd
import RemovingValues as t  # provides the answer-checking tests used below
import matplotlib.pyplot as plt
%matplotlib inline

small_dataset = pd.DataFrame({'col1': [1, 2, np.nan, np.nan, 5, 6],
                              'col2': [7, 8, np.nan, 10, 11, 12],
                              'col3': [np.nan, 14, np.nan, 16, 17, 18]})

small_dataset
```

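|   | col1 | col2 | col3 |
|---|------|------|------|
| 0 | 1.0 | 7.0 | NaN |
| 1 | 2.0 | 8.0 | 14.0 |
| 2 | NaN | NaN | NaN |
| 3 | NaN | 10.0 | 16.0 |
| 4 | 5.0 | 11.0 | 17.0 |
| 5 | 6.0 | 12.0 | 18.0 |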


#### Question 1

**1.** Drop any row with a missing value.

```python
# Equivalent boolean-mask approach:
# dropped_anymissing = small_dataset[~small_dataset.isna().any(axis=1)]
dropped_anymissing = small_dataset.dropna()  # how='any' is the default

dropped_anymissing
```

|   | col1 | col2 | col3 |
|---|------|------|------|
| 1 | 2.0 | 8.0 | 14.0 |
| 4 | 5.0 | 11.0 | 17.0 |
| 5 | 6.0 | 12.0 | 18.0 |


```python
all_drop = dropped_anymissing  # Drop any row with a missing value

# print result
all_drop
```

|   | col1 | col2 | col3 |
|---|------|------|------|
| 1 | 2.0 | 8.0 | 14.0 |
| 4 | 5.0 | 11.0 | 17.0 |
| 5 | 6.0 | 12.0 | 18.0 |


```python
t.all_drop_test(all_drop)  # test
```

Nice job! That looks right!
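
A quick check, assuming the same `small_dataset` (redefined so the snippet runs alone): `dropna()` with its default `how='any'` keeps exactly the rows that the commented boolean-mask line keeps, and by default it returns a new DataFrame instead of modifying the original. The names `via_dropna` and `via_mask` are illustrative.

```python
import numpy as np
import pandas as pd

small_dataset = pd.DataFrame({'col1': [1, 2, np.nan, np.nan, 5, 6],
                              'col2': [7, 8, np.nan, 10, 11, 12],
                              'col3': [np.nan, 14, np.nan, 16, 17, 18]})

via_dropna = small_dataset.dropna()                          # how='any' by default
via_mask = small_dataset[small_dataset.notna().all(axis=1)]  # keep rows with no missing values

print(via_dropna.equals(via_mask))  # True
print(small_dataset.shape)          # (6, 3): the original is unchanged
```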


#### Question 2

**2.** Drop only the row with all missing values.

```python
# Equivalent boolean-mask approach:
# dropped_allmissing = small_dataset[~small_dataset.isna().all(axis=1)]
dropped_allmissing = small_dataset.dropna(how='all')

dropped_allmissing
```

|   | col1 | col2 | col3 |
|---|------|------|------|
| 0 | 1.0 | 7.0 | NaN |
| 1 | 2.0 | 8.0 | 14.0 |
| 3 | NaN | 10.0 | 16.0 |
| 4 | 5.0 | 11.0 | 17.0 |
| 5 | 6.0 | 12.0 | 18.0 |


```python
all_row = dropped_allmissing  # Drop only rows with all missing values

# print result
all_row
```

|   | col1 | col2 | col3 |
|---|------|------|------|
| 0 | 1.0 | 7.0 | NaN |
| 1 | 2.0 | 8.0 | 14.0 |
| 3 | NaN | 10.0 | 16.0 |
| 4 | 5.0 | 11.0 | 17.0 |
| 5 | 6.0 | 12.0 | 18.0 |


```python
t.all_row_test(all_row)  # test
```

Nice job! That looks right!
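
Between "any" and "all" there is a middle ground: the `thresh` parameter of `dropna` keeps rows that have at least that many non-missing values. A minimal sketch, assuming the same `small_dataset`; with three columns, `thresh=1` keeps the same rows as `how='all'` and `thresh=3` keeps the same rows as `how='any'`.

```python
import numpy as np
import pandas as pd

small_dataset = pd.DataFrame({'col1': [1, 2, np.nan, np.nan, 5, 6],
                              'col2': [7, 8, np.nan, 10, 11, 12],
                              'col3': [np.nan, 14, np.nan, 16, 17, 18]})

# thresh=N keeps rows that have at least N non-missing values
at_least_1 = small_dataset.dropna(thresh=1)  # drops only the all-missing row
at_least_3 = small_dataset.dropna(thresh=3)  # requires every column to be present

print(list(at_least_1.index))  # [0, 1, 3, 4, 5]
print(list(at_least_3.index))  # [1, 4, 5]
```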


#### Question 3

**3.** Drop only the rows with missing values in column 3.

```python
# Equivalent boolean-mask approach:
# selected_colname = small_dataset.columns[2]
# dropped_col3missing = small_dataset[~small_dataset[selected_colname].isna()]

dropped_col3missing = small_dataset.dropna(subset=['col3'])

dropped_col3missing
```

|   | col1 | col2 | col3 |
|---|------|------|------|
| 1 | 2.0 | 8.0 | 14.0 |
| 3 | NaN | 10.0 | 16.0 |
| 4 | 5.0 | 11.0 | 17.0 |
| 5 | 6.0 | 12.0 | 18.0 |


```python
only3_drop = dropped_col3missing  # Drop only rows with missing values in column 3

# print result
only3_drop
```

|   | col1 | col2 | col3 |
|---|------|------|------|
| 1 | 2.0 | 8.0 | 14.0 |
| 3 | NaN | 10.0 | 16.0 |
| 4 | 5.0 | 11.0 | 17.0 |
| 5 | 6.0 | 12.0 | 18.0 |


```python
t.only3_drop_test(only3_drop)  # test
```

Nice job! That looks right!
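
Because `subset` restricts the check to the listed columns, missing values in other columns survive the drop. A small sketch, assuming the same `small_dataset` and an illustrative name `only3`, counts missing values per column before and after dropping on `col3`; row 3 keeps its missing `col1` value.

```python
import numpy as np
import pandas as pd

small_dataset = pd.DataFrame({'col1': [1, 2, np.nan, np.nan, 5, 6],
                              'col2': [7, 8, np.nan, 10, 11, 12],
                              'col3': [np.nan, 14, np.nan, 16, 17, 18]})

only3 = small_dataset.dropna(subset=['col3'])

# Missing-value counts per column (col1, col2, col3)
print(small_dataset.isna().sum().tolist())  # [2, 1, 2]
print(only3.isna().sum().tolist())          # [1, 0, 0]
```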


#### Question 4

**4.** Drop only the rows with missing values in column 3 or column 1.

```python
# Equivalent boolean-mask approach:
# selected_colnames = [small_dataset.columns[0], small_dataset.columns[2]]
# dropped_col13missing = small_dataset[~small_dataset[selected_colnames].isna().any(axis=1)]

dropped_col13missing = small_dataset.dropna(subset=['col1', 'col3'])  # how='any' by default

dropped_col13missing
```

|   | col1 | col2 | col3 |
|---|------|------|------|
| 1 | 2.0 | 8.0 | 14.0 |
| 4 | 5.0 | 11.0 | 17.0 |
| 5 | 6.0 | 12.0 | 18.0 |


```python
only3or1_drop = dropped_col13missing  # Drop rows with missing values in column 1 or column 3

# print result
only3or1_drop
```

|   | col1 | col2 | col3 |
|---|------|------|------|
| 1 | 2.0 | 8.0 | 14.0 |
| 4 | 5.0 | 11.0 | 17.0 |
| 5 | 6.0 | 12.0 | 18.0 |


```python
t.only3or1_drop_test(only3or1_drop)  # test
```

Nice job! That looks right!
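
Combining `subset` with `how` switches between "or" and "and" semantics across the listed columns. A minimal sketch, assuming the same `small_dataset` (variable names are illustrative): the default `how='any'` reproduces Question 4, while `how='all'` drops a row only when both `col1` and `col3` are missing.

```python
import numpy as np
import pandas as pd

small_dataset = pd.DataFrame({'col1': [1, 2, np.nan, np.nan, 5, 6],
                              'col2': [7, 8, np.nan, 10, 11, 12],
                              'col3': [np.nan, 14, np.nan, 16, 17, 18]})

# how='any': drop the row if col1 OR col3 is missing
either_missing = small_dataset.dropna(subset=['col1', 'col3'], how='any')
# how='all': drop the row only if col1 AND col3 are both missing
both_missing = small_dataset.dropna(subset=['col1', 'col3'], how='all')

print(list(either_missing.index))  # [1, 4, 5]
print(list(both_missing.index))    # [0, 1, 3, 4, 5]
```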
