# Methods for handling missing values
**pandas provides the following methods to handle missing values:**

- `isna`: Returns a boolean Series that is `True` wherever a value is missing.
- `notna`: The exact opposite of `isna`; `True` wherever a value is present.
- `fillna`: Fills missing values in a variety of ways, for example with a constant or with values from another Series.
- `dropna`: Drops the missing values from the Series.
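
As a quick illustration of the four methods listed above, here is a minimal sketch on a small made-up Series (the values are hypothetical and only stand in for a column such as `movie["year"]`):

```python
import numpy as np
import pandas as pd

# Hypothetical toy Series standing in for a column such as movie["year"]
years = pd.Series([2009.0, np.nan, 2015.0, np.nan])

missing_mask = years.isna()    # True where the value is missing
present_mask = years.notna()   # True where the value is present

# True counts as 1, so summing the mask gives the number of missing values
n_missing = int(missing_mask.sum())

filled = years.fillna(0)       # replace every missing value with 0
dropped = years.dropna()       # keep only the non-missing values

print(n_missing)
print(filled.tolist())
print(dropped.tolist())
```

Neither `fillna` nor `dropna` changes `years` here; each returns a new Series.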

In [23]:
import pandas as pd

In [35]:
movie = pd.read_csv("data/movie.csv")

In [25]:
movie[movie["year"].isna()]

Unnamed: 0,title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
4,Star Wars: Episode VII - The Force Awakens,,,,,Doug Walker,131.0,Doug Walker,131.0,Rob Walker,...,,,Documentary,,8,,,,,7.1
176,Miami Vice,,Color,TV-14,60.0,,,Don Johnson,982.0,Philip Michael Thomas,...,184.0,,Action|Crime|Drama|Mystery|Thriller,21.0,16769,cult tv|detective|drugs|police|undercover,English,USA,1500000.0,7.5
257,The A-Team,,Color,TV-PG,60.0,,,George Peppard,669.0,Dirk Benedict,...,432.0,,Action|Adventure|Crime,29.0,25402,1980s|cult tv|famous opening theme|good versus...,English,USA,,7.6
276,"10,000 B.C.",,,,22.0,Christopher Barnard,0.0,Mathew Buck,5.0,,...,,,Comedy,,6,,,,,7.2
398,Hannibal,,Color,TV-14,44.0,,,Caroline Dhavernas,544.0,Scott Thompson,...,148.0,,Crime|Drama|Horror|Mystery|Thriller,103.0,159910,blood|cannibalism|fbi|manipulation|psychiatrist,English,USA,,8.6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4683,Heroes,,Color,TV-14,60.0,,,Sendhil Ramamurthy,1000.0,Masi Oka,...,833.0,,Drama|Fantasy|Sci-Fi|Thriller,75.0,202115,father daughter relationship|serial killer|sup...,English,USA,,7.7
4688,Home Movies,,Color,TV-PG,22.0,,,Brendon Small,59.0,Ron Lynch,...,6.0,,Animation|Comedy|Drama,11.0,7458,coach|friend|school|series|tv series,English,USA,,8.2
4704,Revolution,,Color,TV-14,43.0,,,Billy Burke,2000.0,Tracy Spiridakos,...,576.0,,Action|Adventure|Drama|Sci-Fi,23.0,72017,2020s|near future|one word series title|post a...,English,USA,,6.7
4752,Happy Valley,,Color,TV-MA,58.0,,,Shirley Henderson,887.0,James Norton,...,250.0,,Crime|Drama,11.0,12848,caravan|police|police sergeant|tied to a chair...,English,UK,,8.5


In [6]:
# number of missing values in the year column
movie["year"].isna().sum()


106

In [9]:
# get all rows in which the year column is missing
filter_1 = movie["year"].isna()
missing_year = movie[filter_1]


In [10]:
missing_year["year"].count()

0

In [11]:
missing_year["year"]

4      NaN
176    NaN
257    NaN
276    NaN
398    NaN
        ..
4683   NaN
4688   NaN
4704   NaN
4752   NaN
4912   NaN
Name: year, Length: 106, dtype: float64

In [None]:
# use fillna to fill missing values.

In [13]:
# copy the DataFrame so that filling values does not also modify movie
complete_year = movie.copy()
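
A point worth keeping in mind at this step: plain assignment of a DataFrame only binds a second name to the same object, whereas `.copy()` produces an independent one. The sketch below uses a hypothetical toy frame (not the movie data) to show the difference, and assigns the filled column back rather than relying on `inplace=True`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the column name mirrors the movie data
toy = pd.DataFrame({"year": [2009.0, np.nan, 2015.0]})

alias = toy                # same object under a second name
independent = toy.copy()   # separate object with its own data

# Assign the filled column back instead of using inplace=True
independent["year"] = independent["year"].fillna(2024)

print(alias is toy)                       # both names point at one object
print(toy["year"].isna().sum())           # the original still has its missing value
print(independent["year"].isna().sum())   # the copy has none left
```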

In [15]:
movie.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4916 entries, 0 to 4915
Data columns (total 22 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   title            4916 non-null   object 
 1   year             4810 non-null   float64
 2   color            4897 non-null   object 
 3   content_rating   4616 non-null   object 
 4   duration         4901 non-null   float64
 5   director_name    4814 non-null   object 
 6   director_fb      4814 non-null   float64
 7   actor1           4909 non-null   object 
 8   actor1_fb        4909 non-null   float64
 9   actor2           4903 non-null   object 
 10  actor2_fb        4903 non-null   float64
 11  actor3           4893 non-null   object 
 12  actor3_fb        4893 non-null   float64
 13  gross            4054 non-null   float64
 14  genres           4916 non-null   object 
 15  num_reviews      4867 non-null   float64
 16  num_voted_users  4916 non-null   int64  
 17  plot_keywords 

In [18]:
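# assign the filled column back; calling fillna with inplace=True on a selected column is deprecated in recent pandas
complete_year["year"] = complete_year["year"].fillna(2024)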

In [20]:
complete_year.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4916 entries, 0 to 4915
Data columns (total 22 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   title            4916 non-null   object 
 1   year             4916 non-null   float64
 2   color            4897 non-null   object 
 3   content_rating   4616 non-null   object 
 4   duration         4901 non-null   float64
 5   director_name    4814 non-null   object 
 6   director_fb      4814 non-null   float64
 7   actor1           4909 non-null   object 
 8   actor1_fb        4909 non-null   float64
 9   actor2           4903 non-null   object 
 10  actor2_fb        4903 non-null   float64
 11  actor3           4893 non-null   object 
 12  actor3_fb        4893 non-null   float64
 13  gross            4054 non-null   float64
 14  genres           4916 non-null   object 
 15  num_reviews      4867 non-null   float64
 16  num_voted_users  4916 non-null   int64  
 17  plot_keywords 

In [21]:
complete_year[complete_year["year"]==2024]

Unnamed: 0,title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
4,Star Wars: Episode VII - The Force Awakens,2024.0,,,,Doug Walker,131.0,Doug Walker,131.0,Rob Walker,...,,,Documentary,,8,,,,,7.1
176,Miami Vice,2024.0,Color,TV-14,60.0,,,Don Johnson,982.0,Philip Michael Thomas,...,184.0,,Action|Crime|Drama|Mystery|Thriller,21.0,16769,cult tv|detective|drugs|police|undercover,English,USA,1500000.0,7.5
257,The A-Team,2024.0,Color,TV-PG,60.0,,,George Peppard,669.0,Dirk Benedict,...,432.0,,Action|Adventure|Crime,29.0,25402,1980s|cult tv|famous opening theme|good versus...,English,USA,,7.6
276,"10,000 B.C.",2024.0,,,22.0,Christopher Barnard,0.0,Mathew Buck,5.0,,...,,,Comedy,,6,,,,,7.2
398,Hannibal,2024.0,Color,TV-14,44.0,,,Caroline Dhavernas,544.0,Scott Thompson,...,148.0,,Crime|Drama|Horror|Mystery|Thriller,103.0,159910,blood|cannibalism|fbi|manipulation|psychiatrist,English,USA,,8.6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4683,Heroes,2024.0,Color,TV-14,60.0,,,Sendhil Ramamurthy,1000.0,Masi Oka,...,833.0,,Drama|Fantasy|Sci-Fi|Thriller,75.0,202115,father daughter relationship|serial killer|sup...,English,USA,,7.7
4688,Home Movies,2024.0,Color,TV-PG,22.0,,,Brendon Small,59.0,Ron Lynch,...,6.0,,Animation|Comedy|Drama,11.0,7458,coach|friend|school|series|tv series,English,USA,,8.2
4704,Revolution,2024.0,Color,TV-14,43.0,,,Billy Burke,2000.0,Tracy Spiridakos,...,576.0,,Action|Adventure|Drama|Sci-Fi,23.0,72017,2020s|near future|one word series title|post a...,English,USA,,6.7
4752,Happy Valley,2024.0,Color,TV-MA,58.0,,,Shirley Henderson,887.0,James Norton,...,250.0,,Crime|Drama,11.0,12848,caravan|police|police sergeant|tied to a chair...,English,UK,,8.5


**Let us use the `movie` dataset for the following examples:**

In [2]:
# use isna to count the number of missing values

In [36]:
# use dropna to drop missing values
# drop any row in which year is missing
movie = movie.dropna(subset=["year"])
movie.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4810 entries, 0 to 4915
Data columns (total 22 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   title            4810 non-null   object 
 1   year             4810 non-null   float64
 2   color            4795 non-null   object 
 3   content_rating   4552 non-null   object 
 4   duration         4798 non-null   float64
 5   director_name    4810 non-null   object 
 6   director_fb      4810 non-null   float64
 7   actor1           4803 non-null   object 
 8   actor1_fb        4803 non-null   float64
 9   actor2           4800 non-null   object 
 10  actor2_fb        4800 non-null   float64
 11  actor3           4792 non-null   object 
 12  actor3_fb        4792 non-null   float64
 13  gross            4052 non-null   float64
 14  genres           4810 non-null   object 
 15  num_reviews      4770 non-null   float64
 16  num_voted_users  4810 non-null   int64  
 17  plot_keywords 
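
To make the effect of `subset` concrete, here is a small sketch on a hypothetical two-column frame (the column names mirror the movie data, the values are made up). Without `subset`, `dropna` removes a row if any of its columns is missing; with `subset`, only the listed columns are checked:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing year and one missing color
toy = pd.DataFrame({
    "year": [2009.0, np.nan, 2015.0],
    "color": ["Color", "Color", None],
})

by_year = toy.dropna(subset=["year"])  # only the year column is checked
any_missing = toy.dropna()             # a row is dropped if any column is missing

print(by_year.index.tolist())
print(any_missing.index.tolist())
```

The original row labels are kept after dropping, which is why the index of the result is no longer a contiguous range.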

### Exercises:

In [None]:
# filter rows in which color is missing

In [37]:
# drop rows in which color is missing and save the result to a new variable

In [None]:
# fill rows in which color is missing with "Color"

# Sorting:

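The `sort_values` method sorts the Series **from least to greatest** by default. Pass `ascending=False` to reverse the order.

By default it places **missing values at the end**, regardless of the sort direction. Pass `na_position="first"` to put them at the start instead.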
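
A minimal sketch of this behaviour on a hypothetical toy Series (the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy Series with one missing value
s = pd.Series([3.0, np.nan, 1.0, 2.0])

ascending = s.sort_values()                    # least to greatest, NaN last
descending = s.sort_values(ascending=False)    # greatest to least, NaN still last
na_first = s.sort_values(na_position="first")  # NaN first, then least to greatest

print(ascending.index.tolist())
print(descending.index.tolist())
print(na_first.index.tolist())
```

Each value keeps its original index label, so printing the index shows how the rows were reordered.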

## Exercises

### Exercise 1
<span  style="color:green; font-size:16px">What percentage of actor 1 Facebook likes are missing?</span>


### Exercise 2
<span  style="color:green; font-size:16px">Use the notna method to find the number of non-missing values in the actor 1 Facebook like column. Verify this
number is the same as the result of the count method.</span>

### Exercise 3
<span  style="color:green; font-size:16px">Use one line of code to fill the missing values of actor1_fb with the maximum of actor2_fb. Save this result to
variable actor1_fb_full</span>

### Exercise 4
<span  style="color:green; font-size:16px">Verify the results of Exercise 3 by selecting just the values of actor1_fb_full that were filled with the maximum of actor2_fb.</span>


# Uniqueness

**There are a few methods that deal with unique values in a Series:**

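- `unique`: Returns a numpy array of all the unique values in order of their appearance. A missing value, if present, is included.
- `nunique`: Returns the number of unique values in the Series. Missing values are not counted unless `dropna=False` is passed.
- `drop_duplicates`: Returns a pandas Series containing the first occurrence of each unique value, with its original index label.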
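
The three methods above can be compared side by side on a hypothetical toy Series (the rating strings are made up and only resemble the `content_rating` column):

```python
import pandas as pd

# Hypothetical toy Series with repeated values and one missing value
ratings = pd.Series(["PG", "R", "PG", None, "R", "G"])

uniques = ratings.unique()                  # unique values in order of appearance
n_unique = ratings.nunique()                # missing value is not counted
n_with_na = ratings.nunique(dropna=False)   # missing value counts as one more
firsts = ratings.drop_duplicates()          # first occurrence of each value

print(len(uniques))
print(n_unique, n_with_na)
print(firsts.index.tolist())
```

Note that `unique` and `drop_duplicates` both keep the missing value once, while `nunique` ignores it by default.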

In [None]:
# count the number of unique values 