#### Removing Values

In this [Repo](https://github.com/A2Amir/Data-Science-Process/blob/master/Code/What%20Happened.ipynb), you have seen:

1. how sklearn breaks when the data contains missing values
2. reasons for dropping missing values

It is time to make sure you are comfortable with the methods for dropping missing values in pandas. You can drop values by row or by column, and you can drop based on whether **any** value is missing in a particular row or column or **all** values in a row or column are missing.
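The questions below all drop rows, so here is a minimal sketch of the column-wise case. The `demo` frame and the names `cols_any` and `cols_all` are made up for illustration and are not part of the exercise:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'a' is complete, 'b' has one gap, 'c' is entirely missing
demo = pd.DataFrame({'a': [1.0, 2.0, 3.0],
                     'b': [4.0, np.nan, 6.0],
                     'c': [np.nan, np.nan, np.nan]})

# axis=1 drops columns; how='any' removes every column with at least one NaN
cols_any = demo.dropna(axis=1, how='any')

# how='all' removes only the columns in which every value is NaN
cols_all = demo.dropna(axis=1, how='all')

print(list(cols_any.columns))
print(list(cols_all.columns))
```

With `how='any'` only column `a` survives, while `how='all'` removes just column `c`.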

A useful collection of pandas resources is available [here](https://chrisalbon.com/). Specifically, Chris takes a close look at dropping rows and columns [here](https://chrisalbon.com/python/data_wrangling/pandas_dropping_column_and_rows/). Another resource can be found [here](https://stackoverflow.com/questions/13413590/how-to-drop-rows-of-pandas-dataframe-whose-value-in-certain-columns-is-nan).

In [1]:
import numpy as np
import pandas as pd
import RemovingValues as t
import matplotlib.pyplot as plt
%matplotlib inline

small_dataset = pd.DataFrame({'col1': [1, 2, np.nan, np.nan, 5, 6], 
                              'col2': [7, 8, np.nan, 10, 11, 12],
                              'col3': [np.nan, 14, np.nan, 16, 17, 18]})

small_dataset

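|   | col1 | col2 | col3 |
|---|------|------|------|
| 0 | 1.0 | 7.0 | NaN |
| 1 | 2.0 | 8.0 | 14.0 |
| 2 | NaN | NaN | NaN |
| 3 | NaN | 10.0 | 16.0 |
| 4 | 5.0 | 11.0 | 17.0 |
| 5 | 6.0 | 12.0 | 18.0 |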


In [2]:
# Find all columns that have missing values
small_dataset.isna().any()

col1    True
col2    True
col3    True
dtype: bool

In [3]:
small_dataset.isnull()

|   | col1 | col2 | col3 |
|---|------|------|------|
| 0 | False | False | True |
| 1 | False | False | False |
| 2 | True | True | True |
| 3 | True | False | False |
| 4 | False | False | False |
| 5 | False | False | False |
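Summing the boolean mask gives a quick count of how many values are missing, which helps when deciding whether to drop by row or by column. This is a small sketch that recreates `small_dataset` so it runs on its own; the names `missing_per_column` and `missing_per_row` are illustrative:

```python
import numpy as np
import pandas as pd

# Same data as the small_dataset defined earlier in this notebook
small_dataset = pd.DataFrame({'col1': [1, 2, np.nan, np.nan, 5, 6],
                              'col2': [7, 8, np.nan, 10, 11, 12],
                              'col3': [np.nan, 14, np.nan, 16, 17, 18]})

# True counts as 1, so summing the mask counts the NaNs in each column
missing_per_column = small_dataset.isna().sum()

# axis=1 sums across the columns, giving the number of NaNs in each row
missing_per_row = small_dataset.isna().sum(axis=1)

print(missing_per_column)
print(missing_per_row)
```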


#### Question 1

**1.** Drop any row with a missing value.

In [4]:
# Drop any row with a missing value
all_drop = small_dataset.dropna(axis=0, how='any')

# Print the result
all_drop

|   | col1 | col2 | col3 |
|---|------|------|------|
| 1 | 2.0 | 8.0 | 14.0 |
| 4 | 5.0 | 11.0 | 17.0 |
| 5 | 6.0 | 12.0 | 18.0 |


In [5]:
t.all_drop_test(all_drop) #test

Nice job! That looks right!


#### Question 2

**2.** Drop only the row with all missing values.

In [6]:
# Drop only rows in which all values are missing
all_row = small_dataset.dropna(axis=0, how='all')

# Print the result
all_row

|   | col1 | col2 | col3 |
|---|------|------|------|
| 0 | 1.0 | 7.0 | NaN |
| 1 | 2.0 | 8.0 | 14.0 |
| 3 | NaN | 10.0 | 16.0 |
| 4 | 5.0 | 11.0 | 17.0 |
| 5 | 6.0 | 12.0 | 18.0 |


In [7]:
t.all_row_test(all_row) #test

Nice job! That looks right!
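Between `how='any'` and `how='all'` there is a middle ground: `dropna` also accepts a `thresh` argument, the minimum number of non-missing values a row needs in order to be kept. A minimal sketch, recreating `small_dataset` so the block runs on its own; the names `at_least_two` and `at_least_three` are illustrative and not part of the exercise:

```python
import numpy as np
import pandas as pd

# Same data as the small_dataset defined earlier in this notebook
small_dataset = pd.DataFrame({'col1': [1, 2, np.nan, np.nan, 5, 6],
                              'col2': [7, 8, np.nan, 10, 11, 12],
                              'col3': [np.nan, 14, np.nan, 16, 17, 18]})

# Keep rows that have at least 2 non-missing values
at_least_two = small_dataset.dropna(axis=0, thresh=2)

# Keep rows that have at least 3 non-missing values (here: fully complete rows)
at_least_three = small_dataset.dropna(axis=0, thresh=3)

print(at_least_two.index.tolist())
print(at_least_three.index.tolist())
```

On this dataset `thresh=2` happens to match the `how='all'` result and `thresh=3` matches `how='any'`, because there are only three columns. Recent pandas versions do not allow passing `how` and `thresh` together, so the sketch uses `thresh` alone.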


#### Question 3

**3.** Drop only the rows with missing values in column 3.

In [8]:
# Drop only rows with missing values in column 3
only3_drop = small_dataset.dropna(axis=0, subset=['col3'])

# Print the result
only3_drop

|   | col1 | col2 | col3 |
|---|------|------|------|
| 1 | 2.0 | 8.0 | 14.0 |
| 3 | NaN | 10.0 | 16.0 |
| 4 | 5.0 | 11.0 | 17.0 |
| 5 | 6.0 | 12.0 | 18.0 |


In [9]:
t.only3_drop_test(only3_drop) #test

Nice job! That looks right!


#### Question 4

**4.** Drop only the rows with missing values in column 3 or column 1.

In [10]:
# Drop rows with missing values in column 1 or column 3
only3or1_drop = small_dataset.dropna(axis=0, subset=['col1', 'col3'])

# Print the result
only3or1_drop

|   | col1 | col2 | col3 |
|---|------|------|------|
| 1 | 2.0 | 8.0 | 14.0 |
| 4 | 5.0 | 11.0 | 17.0 |
| 5 | 6.0 | 12.0 | 18.0 |


In [11]:
t.only3or1_drop_test(only3or1_drop) #test

Nice job! That looks right!
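The `subset` argument used in Questions 3 and 4 can also be expressed as a boolean mask, which makes the "column 1 or column 3" logic explicit. The sketch below recreates `small_dataset` so it runs on its own; `keep_mask` and `masked` are illustrative names:

```python
import numpy as np
import pandas as pd

# Same data as the small_dataset defined earlier in this notebook
small_dataset = pd.DataFrame({'col1': [1, 2, np.nan, np.nan, 5, 6],
                              'col2': [7, 8, np.nan, 10, 11, 12],
                              'col3': [np.nan, 14, np.nan, 16, 17, 18]})

# A row is kept only if both col1 and col3 are present,
# i.e. it is dropped if col1 OR col3 is missing
keep_mask = small_dataset[['col1', 'col3']].notna().all(axis=1)
masked = small_dataset[keep_mask]

# The same selection written with dropna and subset
only3or1_drop = small_dataset.dropna(axis=0, subset=['col1', 'col3'])

print(masked.equals(only3or1_drop))
```

The mask form is handy when the condition for dropping a row involves more than missingness, since it can be combined with other boolean conditions.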
