# CMPINF 2100
## Review: Wave 1, Group 5 

### Understanding Unique Values

Sarah Scott

#### Overview: 

Many summary methods important to Exploratory Data Analysis require the ability to identify the **unique values** in a pandas Series or DataFrame. For small numerical data sets that are easy to summarize with a quick visualization, this might seem like an unimportant step. However, for large data sets or categorical/string variables, it is an essential one.

We can retrieve the number of unique values using `.nunique()`, which **counts** the number (n) of unique values in the series or column. Comparing this number to the `.size` attribute (the total number of elements) shows how much repetition the series/column contains: if the two are equal, every value is distinct.
  
In a similar vein, the `.unique()` method lists the unique values in a series/column. This allows us to identify the unique elements themselves. 

One important pandas method that focuses on unique values is `.value_counts()`. This method counts how many times each unique value occurs and returns those counts, sorted from most to least frequent by default.

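*Important Note*: By default, `.nunique()` and `.value_counts()` do **not** count missing values, so `dropna=False` is often passed to them when missing values should be included. `.unique()`, by contrast, takes no `dropna` argument and includes a missing value in its result if one is present.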
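To make the missing-value behavior concrete, here is a minimal sketch. The series `travelers` and its contents are hypothetical and are not part of the original example; they exist only to show how a missing value changes each result.

```python
import numpy as np
import pandas as pd

# Hypothetical series with two missing entries, for illustration only.
travelers = pd.Series(['Hobbit', 'Hobbit', np.nan, 'Wizard', np.nan])

# .nunique() skips missing values unless dropna=False is passed.
n_default = travelers.nunique()
n_with_missing = travelers.nunique(dropna=False)

# .value_counts() likewise omits missing values unless dropna=False is passed.
counts_default = travelers.value_counts()
counts_with_missing = travelers.value_counts(dropna=False)

# .unique() has no dropna argument; the missing value appears in the result.
unique_values = travelers.unique()

print(n_default, n_with_missing)
print(counts_with_missing)
print(unique_values)
```

With `dropna=False`, the missing values are treated as one additional category, so the counts from `.value_counts()` add up to the `.size` of the series.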

#### Example: 

The example below demonstrates the use of `.unique()`, `.nunique()`, and `.value_counts()`.

In [1]:
import numpy as np
import pandas as pd

In [3]:
the_fellowship = pd.Series( [ 'Hobbit', 'Hobbit', 'Hobbit', 'Hobbit', 'Wizard', 'Man', 'Man', 'Elf', 'Dwarf' ])

In [11]:
print(the_fellowship.size)
print(the_fellowship.nunique())
print(the_fellowship.unique())

9
5
['Hobbit' 'Wizard' 'Man' 'Elf' 'Dwarf']


In the above code, `.size` shows that there are 9 values in the series `the_fellowship`. However, `.nunique()` shows that only five of those values are unique. We can see what those unique values are using `.unique()`.

In [12]:
the_fellowship.value_counts()

Hobbit    4
Man       2
Wizard    1
Elf       1
Dwarf     1
dtype: int64

Using `.value_counts()`, we can identify how many times each unique value occurs: in `the_fellowship`, there are four hobbits, two men, one wizard, one elf, and one dwarf.

### Review Questions

**Question 1:** TRUE/FALSE: The `.unique()` method only retrieves the *number* of unique values in a series/column. It does not list the unique values.  

**ANSWER**: False. The `.unique()` method *does* list the unique values.

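**Question 2:** You are a teacher with a class of 100 students. You have created a DataFrame (`class_list`) of their names with three columns: first name (`first`), last name (`last`), and full name (`full`). You want to find out whether any students share a first name by counting how many distinct first names appear in the class. Which expression would you use?

* A `class_list['first'].nunique()`
* B `class_list['first'].unique()`
* C `first.nunique()`
* D `class_list['full'].nunique()`

**ANSWER:** A, `class_list['first'].nunique()`. This returns the number of distinct first names; if it is less than 100, at least two students share a first name. Bracket notation is used here because `first` and `last` are also DataFrame method names in many pandas versions, so `class_list.first` may not return the column.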
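The reasoning behind Question 2 can be sketched on a small stand-in for the class list. The five rows below are hypothetical, and the sketch selects columns with bracket notation (`class_list['first']`), which is a choice made here to avoid any clash between column names and DataFrame method names.

```python
import pandas as pd

# Hypothetical five-student stand-in for the 100-student class_list.
class_list = pd.DataFrame({
    'first': ['Sam', 'Sam', 'Rosie', 'Merry', 'Rosie'],
    'last': ['Gamgee', 'Took', 'Cotton', 'Brandybuck', 'Baggins'],
})
class_list['full'] = class_list['first'] + ' ' + class_list['last']

# Number of distinct first names; fewer than the row count means repeats exist.
n_first = class_list['first'].nunique()
n_students = len(class_list)

# Called on the whole DataFrame, .nunique() returns one count per column.
per_column = class_list.nunique()

# .value_counts() shows which first names are shared, and by how many students.
first_counts = class_list['first'].value_counts()
shared = first_counts[first_counts > 1]

print(n_first, n_students)
print(per_column)
print(shared)
```

Here `.nunique()` answers "how many distinct first names are there?", while filtering the result of `.value_counts()` answers the follow-up question of which names are repeated.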