# Methods for handling missing values
**pandas provides the following methods to handle missing values:**

- `isna`: Returns a boolean Series that is `True` wherever a value is missing.
- `notna`: Returns the exact opposite of `isna`, so it is `True` wherever a value is present.
- `fillna`: Fills missing values, for example with a single constant value.
- `dropna`: Drops the missing values from the Series.
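
Before turning to the movie data, the four methods can be tried on a small Series. This is only a minimal sketch: the Series `s` and its values are made up for illustration and are not part of the movie data set.

```python
import numpy as np
import pandas as pd

# Hypothetical toy Series with two missing values (not from movie.csv)
s = pd.Series([1.0, np.nan, 3.0, np.nan])

missing_mask = s.isna()    # True where the value is missing
present_mask = s.notna()   # True where the value is present
filled = s.fillna(0)       # replace every missing value with 0
dropped = s.dropna()       # keep only the non-missing values

print(missing_mask.tolist())  # [False, True, False, True]
print(filled.tolist())        # [1.0, 0.0, 3.0, 0.0]
print(dropped.tolist())       # [1.0, 3.0]
```

None of these calls modifies `s`; each one returns a new Series.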

In [1]:
import pandas as pd

In [2]:
movie = pd.read_csv("data/movie.csv")

In [4]:
movie.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4916 entries, 0 to 4915
Data columns (total 22 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   title            4916 non-null   object 
 1   year             4810 non-null   float64
 2   color            4897 non-null   object 
 3   content_rating   4616 non-null   object 
 4   duration         4901 non-null   float64
 5   director_name    4814 non-null   object 
 6   director_fb      4814 non-null   float64
 7   actor1           4909 non-null   object 
 8   actor1_fb        4909 non-null   float64
 9   actor2           4903 non-null   object 
 10  actor2_fb        4903 non-null   float64
 11  actor3           4893 non-null   object 
 12  actor3_fb        4893 non-null   float64
 13  gross            4054 non-null   float64
 14  genres           4916 non-null   object 
 15  num_reviews      4867 non-null   float64
 16  num_voted_users  4916 non-null   int64  
 17  plot_keywords 

In [6]:
# number of missing values in the year column
movie["year"].isna().sum()

106

In [9]:
# get all rows in which the year column is missing
filter_1 = movie["year"].isna()
missing_year = movie[filter_1]


In [10]:
missing_year["year"].count()

0

In [11]:
missing_year["year"]

4      NaN
176    NaN
257    NaN
276    NaN
398    NaN
        ..
4683   NaN
4688   NaN
4704   NaN
4752   NaN
4912   NaN
Name: year, Length: 106, dtype: float64

**Let us use the `movie` data set for the following examples:**

In [1]:
# use isna and notna to filter data

In [2]:
# use isna to count the number of missing values

In [3]:
# use fillna to fill missing values.

In [4]:
# use dropna to drop missing values

# Sorting

The `sort_values` method sorts the Series **from least to greatest** by default.

By default, it places **missing values at the end**.
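
The sorting behaviour can be checked on a small Series. This is a minimal sketch: the Series `s` is a made-up example, and `ascending` and `na_position` are the `sort_values` parameters that control direction and where missing values go.

```python
import numpy as np
import pandas as pd

# Hypothetical toy Series with one missing value
s = pd.Series([3.0, np.nan, 1.0, 2.0])

ascending = s.sort_values()                     # least to greatest, NaN last
descending = s.sort_values(ascending=False)     # greatest to least, NaN still last
nan_first = s.sort_values(na_position="first")  # NaN moved to the front

print(ascending.tolist())   # [1.0, 2.0, 3.0, nan]
print(descending.tolist())  # [3.0, 2.0, 1.0, nan]
print(nan_first.tolist())   # [nan, 1.0, 2.0, 3.0]
```

Each value keeps its original index label after sorting, so `ascending.index` is `[2, 3, 0, 1]` here.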

## Exercises

### Exercise 1
<span  style="color:green; font-size:16px">What percentage of the values in the actor 1 Facebook likes column are missing?</span>


### Exercise 2
<span  style="color:green; font-size:16px">Use the notna method to find the number of non-missing values in the actor 1 Facebook likes column. Verify that this
number matches the result of the count method.</span>

### Exercise 3
<span  style="color:green; font-size:16px">Use one line of code to fill the missing values of actor1_fb with the maximum of actor2_fb. Save the result to the
variable actor1_fb_full.</span>

### Exercise 4
<span  style="color:green; font-size:16px">Verify the results of Exercise 3 by selecting just the values of actor1_fb_full that were filled with the maximum of actor2_fb.</span>


# Uniqueness

**There are a few methods that deal with unique values in a Series:**

- `unique`: Returns a NumPy array of all the unique values in order of their appearance.
- `nunique`: Returns the number of unique values in the Series.
- `drop_duplicates`: Returns a pandas Series with duplicate values removed, keeping the first occurrence of each value.
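
The difference between the three methods is easiest to see side by side. This is a minimal sketch on a made-up Series `s`, not on the movie data.

```python
import pandas as pd

# Hypothetical toy Series with repeated values
s = pd.Series([2, 1, 2, 3, 1])

unique_values = s.unique()     # NumPy array, in order of first appearance
n_unique = s.nunique()         # number of distinct values
deduped = s.drop_duplicates()  # Series that keeps the first occurrence of each value

print(unique_values.tolist())  # [2, 1, 3]
print(n_unique)                # 3
print(deduped.index.tolist())  # [0, 1, 3]
```

Unlike `unique`, `drop_duplicates` keeps the original index labels, which shows where each value first appeared.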

In [None]:
# count the number of unique values 