# 7. More Series Methods

## Other Useful Methods
In the previous notebook, we covered the most essential and common attributes and statistical methods for Pandas Series objects. In this notebook, we will cover several other useful and common methods from the [Series API](http://pandas.pydata.org/pandas-docs/stable/api.html#series).

In [1]:
import pandas as pd
var_m = pd.read_csv('data/movie.csv', index_col='title')
var_m.head()

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
Avatar,2009.0,Color,PG-13,178.0,James Cameron,0.0,CCH Pounder,1000.0,Joel David Moore,936.0,...,855.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,723.0,886204,avatar|future|marine|native|paraplegic,English,USA,237000000.0,7.9
Pirates of the Caribbean: At World's End,2007.0,Color,PG-13,169.0,Gore Verbinski,563.0,Johnny Depp,40000.0,Orlando Bloom,5000.0,...,1000.0,309404152.0,Action|Adventure|Fantasy,302.0,471220,goddess|marriage ceremony|marriage proposal|pi...,English,USA,300000000.0,7.1
Spectre,2015.0,Color,PG-13,148.0,Sam Mendes,0.0,Christoph Waltz,11000.0,Rory Kinnear,393.0,...,161.0,200074175.0,Action|Adventure|Thriller,602.0,275868,bomb|espionage|sequel|spy|terrorist,English,UK,245000000.0,6.8
The Dark Knight Rises,2012.0,Color,PG-13,164.0,Christopher Nolan,22000.0,Tom Hardy,27000.0,Christian Bale,23000.0,...,23000.0,448130642.0,Action|Thriller,813.0,1144337,deception|imprisonment|lawlessness|police offi...,English,USA,250000000.0,8.5
Star Wars: Episode VII - The Force Awakens,,,,,Doug Walker,131.0,Doug Walker,131.0,Rob Walker,12.0,...,,,Documentary,,8,,,,,7.1


In [3]:
var_d = var_m['duration']
var_d.head()

title
Avatar                                        178.0
Pirates of the Caribbean: At World's End      169.0
Spectre                                       148.0
The Dark Knight Rises                         164.0
Star Wars: Episode VII - The Force Awakens      NaN
Name: duration, dtype: float64

## Methods for handling missing values
Pandas provides the following methods to handle missing values:

* **`isna`** - Returns a Series of booleans that is `True` wherever a value is missing
* **`notna`** - Exact opposite of **`isna`**: `True` wherever a value is present
* **`fillna`** - Fills missing values in a variety of ways
* **`dropna`** - Drops the missing values from the Series
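
As a quick illustration of the four methods listed above, here is a minimal sketch on a tiny made-up Series (the values and the index labels `'a'` through `'d'` are assumptions for the example, not taken from the movie dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a column with missing values
s = pd.Series([178.0, np.nan, 148.0, np.nan], index=['a', 'b', 'c', 'd'])

missing_mask = s.isna()    # True where the value is missing
present_mask = s.notna()   # True where the value is present
filled = s.fillna(0)       # new Series with missing values replaced by 0
dropped = s.dropna()       # new Series with missing values removed

print(missing_mask.tolist())  # [False, True, False, True]
print(present_mask.tolist())  # [True, False, True, False]
print(filled.tolist())        # [178.0, 0.0, 148.0, 0.0]
print(dropped.tolist())       # [178.0, 148.0]
```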

### Counting the number of missing values
Pandas doesn't have a single Series method that counts the number of missing values, but you can compute the count in two ways:

* Use the **`count`** method to find the number of non-missing values and subtract this from the total number of values
* Use the **`isna`** method to return a Series of booleans and chain the **`sum`** method

In [4]:
len(var_d) - var_d.count()

15

In [5]:
var_d.isna().sum()

15

## Finding the percentage of missing values
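To find the proportion of missing values in a Series, chain the **`mean`** method to the **`isna`** method. Because `True` counts as 1 and `False` as 0, the mean of the boolean Series is the fraction of values that are missing. Multiply the result by 100 to express it as a percentage.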

In [6]:
var_d.isna().mean()

0.0030512611879576893
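
The value returned above is a fraction between 0 and 1, so roughly 0.3% of the durations are missing. The following sketch, using a small made-up Series rather than the movie data, shows the conversion from fraction to percentage:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing value out of four
s = pd.Series([90.0, np.nan, 120.0, 100.0])

frac_missing = s.isna().mean()    # mean of [False, True, False, False]
pct_missing = frac_missing * 100  # convert the fraction to a percentage

print(frac_missing)  # 0.25
print(pct_missing)   # 25.0
```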

## Filling missing values
Occasionally, it will be necessary to fill missing values. Pandas provides the **`fillna`** method to do so. There are many strategies for replacing missing values, but here we will only cover how to fill them with a constant. A popular choice is the median or mean of the Series.

In [7]:
var_d.head()

title
Avatar                                        178.0
Pirates of the Caribbean: At World's End      169.0
Spectre                                       148.0
The Dark Knight Rises                         164.0
Star Wars: Episode VII - The Force Awakens      NaN
Name: duration, dtype: float64

Find the median and replace missing values with it.

In [8]:
med_of_var_d = var_d.median()
var_d.fillna(med_of_var_d).head()

title
Avatar                                        178.0
Pirates of the Caribbean: At World's End      169.0
Spectre                                       148.0
The Dark Knight Rises                         164.0
Star Wars: Episode VII - The Force Awakens    103.0
Name: duration, dtype: float64

You can use any constant number directly as well:

In [9]:
var_d.fillna(-99).head()

title
Avatar                                        178.0
Pirates of the Caribbean: At World's End      169.0
Spectre                                       148.0
The Dark Knight Rises                         164.0
Star Wars: Episode VII - The Force Awakens    -99.0
Name: duration, dtype: float64
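
Note that `fillna` returns a new Series and leaves the original untouched, so the result must be assigned to a name if you want to keep it. The sketch below uses made-up values and fills with the mean instead of the median:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value
s = pd.Series([100.0, np.nan, 200.0])

# mean() skips missing values by default, so this is (100 + 200) / 2
filled_mean = s.fillna(s.mean())

print(filled_mean.tolist())  # [100.0, 150.0, 200.0]
print(s.isna().sum())        # 1, the original Series still has its missing value
```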

## Dropping missing values
The **`dropna`** method returns a new Series with the missing values removed. Notice that the length has decreased by 15, the number of missing values counted earlier.

In [10]:
len(var_d.dropna())

4901
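
The relationship between the lengths before and after dropping can be checked directly. This is a minimal sketch on made-up data; it also shows that the surviving values keep their original index labels:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two missing values
s = pd.Series([178.0, np.nan, 148.0, np.nan, 164.0])

cleaned = s.dropna()
n_removed = len(s) - len(cleaned)

print(len(s), len(cleaned), n_removed)  # 5 3 2
print(n_removed == s.isna().sum())      # True
print(cleaned.index.tolist())           # [0, 2, 4]
```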

# Sorting
The **`sort_values`** method sorts the Series from least to greatest by default. It places missing values at the end.

In [11]:
var_d.sort_values().head()

title
The Touch                                7.0
Shaun the Sheep                          7.0
Robot Chicken                           11.0
Vessel                                  14.0
Wal-Mart: The High Cost of Low Price    20.0
Name: duration, dtype: float64

The **`ascending`** parameter can be set to **`False`** to sort from greatest to least:

In [12]:
var_d.sort_values(ascending=False).head()

title
Trapped                     511.0
Carlos                      334.0
Blood In, Blood Out         330.0
Heaven's Gate               325.0
The Legend of Suriyothai    300.0
Name: duration, dtype: float64
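
Missing values stay at the end regardless of the sort direction. If you want them first, `sort_values` accepts an `na_position` parameter. A small sketch on made-up data (the index labels are placeholders):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value
s = pd.Series([164.0, np.nan, 148.0], index=['x', 'y', 'z'])

default_order = s.sort_values()                    # ascending, NaN last
descending = s.sort_values(ascending=False)        # descending, NaN still last
nan_first = s.sort_values(na_position='first')     # ascending, NaN first

print(default_order.index.tolist())  # ['z', 'x', 'y']
print(descending.index.tolist())     # ['x', 'z', 'y']
print(nan_first.index.tolist())      # ['y', 'z', 'x']
```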

## Sorting the index
Since Series also have an index, Pandas allows you to sort by it as well with the **`sort_index`** method.

In [None]:
var_d.sort_index().head()

In [None]:
var_d.sort_index(ascending=False).head()
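
Since the cells above are shown without output, here is a self-contained sketch of what sorting by index looks like. It uses three titles and durations copied from the head of the dataset, placed in a deliberately unsorted order:

```python
import pandas as pd

# Three rows taken from the head of the duration Series, in shuffled order
s = pd.Series(
    [148.0, 178.0, 164.0],
    index=['Spectre', 'Avatar', 'The Dark Knight Rises'],
    name='duration',
)

by_title = s.sort_index()                     # alphabetical by index label
by_title_desc = s.sort_index(ascending=False)  # reverse alphabetical

print(by_title.index.tolist())       # ['Avatar', 'Spectre', 'The Dark Knight Rises']
print(by_title_desc.index.tolist())  # ['The Dark Knight Rises', 'Spectre', 'Avatar']
print(by_title.tolist())             # [178.0, 148.0, 164.0]
```

The values travel with their labels, so only the order of the rows changes.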

## Index of maximum and minimum
Instead of finding the maximum or minimum of the values of the Series, you can return the index label of the maximum or minimum with **`idxmax`** and **`idxmin`**.

In [13]:
var_d.idxmax()

'Trapped'

Verify the result by sorting:

In [14]:
var_d.sort_values(ascending=False).head()

title
Trapped                     511.0
Carlos                      334.0
Blood In, Blood Out         330.0
Heaven's Gate               325.0
The Legend of Suriyothai    300.0
Name: duration, dtype: float64

You can also verify the result with boolean indexing:

In [15]:
func_filt_var_d = var_d == var_d.max()
var_d[func_filt_var_d]

title
Trapped    511.0
Name: duration, dtype: float64
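
The two approaches differ when the maximum is tied: `idxmax` returns only the label of the first occurrence, while boolean indexing returns every matching row. A minimal sketch on made-up data (labels `'a'` through `'d'` are placeholders):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data where the maximum value appears twice
s = pd.Series([90.0, 120.0, 120.0, np.nan], index=['a', 'b', 'c', 'd'])

first_max_label = s.idxmax()   # label of the first maximum; NaN is skipped
first_min_label = s.idxmin()   # label of the first minimum
all_max = s[s == s.max()]      # every row equal to the maximum

print(first_max_label)          # b
print(first_min_label)          # a
print(all_max.index.tolist())   # ['b', 'c']
```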

# Exercises

### Problem 1
<span  style="color:green; font-size:16px">What percentage of actor 1 Facebook likes are missing?</span>

In [None]:
# your code here

### Problem 2
<span  style="color:green; font-size:16px">Use the `notna` method to find the number of non-missing values in the actor 1 Facebook likes column. Verify that this number matches the result of the `count` method.</span>

In [None]:
# your code here

### Problem 3
<span  style="color:green; font-size:16px">How many unique directors are there? Look up the `unique` and `nunique` methods.</span>

In [None]:
# your code here

### Problem 4
<span  style="color:green; font-size:16px">Select the `year` column, sort it, and drop any duplicates. Look up the `drop_duplicates` method.</span>

In [None]:
# your code here