# CHAPTER 5
# Getting Started with pandas
- **pandas** contains data structures and data manipulation tools designed to make data cleaning and analysis fast and easy in Python.
- **pandas** is designed for working with tabular data.
- **pandas** became an open source project in 2010.

In [None]:
# Import convention for pandas
import pandas as pd

# Import Series and DataFrame into the local namespace
from pandas import Series, DataFrame

## Introduction to pandas Data Structures
### Series
- **Series** = one-dimensional array-like object containing a sequence of values (of similar types to NumPy types) and an associated array of data labels, called its **index**.
- Another way to think about a **Series** is as a fixed-length, ordered dict, as it is a mapping of index values to data values.
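
The dict analogy can be sketched in a few lines. This is only an illustration; the values and labels below are made up for the example.

```python
import pandas as pd

# A small labelled Series (values and labels are illustrative)
s = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

# Membership tests look at the index labels, as with dict keys
has_b = 'b' in s
has_e = 'e' in s

# Converting back to a plain dict keeps the label -> value mapping
as_dict = s.to_dict()
print(has_b, has_e, as_dict)
```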

In [None]:
# Example of a series
obj = pd.Series([4, 7, -5, 3])
obj

# Since we did not specify an index for the data, a default one consisting of the integers 0 through N - 1 
# (where N is the length of the data) is created.

In [None]:
# Array representation of the Series using the values attribute
obj.values

In [None]:
# Index object of the Series using the index attribute
obj.index

In [None]:
# Create a Series with an index identifying each data point with a label
obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
obj2

In [None]:
# Let's look at the index
obj2.index

In [None]:
# You can use labels in the index when selecting single values or a set of values
obj2[['c', 'a', 'd']]

# Here ['c', 'a', 'd'] is interpreted as a list of indices, even though it contains strings instead of integers

- Using NumPy functions or NumPy-like operations, such as filtering with a boolean array, scalar multiplication, or applying math functions, will preserve the index-value link.

In [None]:
# Select only Series elements that are > 0
obj2[obj2>0]

In [None]:
# Multiply each element in Series obj2 by 2
obj2*2
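
Math functions behave the same way as filtering and scalar multiplication. A minimal sketch with `np.exp`, assuming the same illustrative values as `obj2`:

```python
import numpy as np
import pandas as pd

s = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

# Applying a NumPy ufunc returns a Series that keeps the original labels
result = np.exp(s)
print(result)
```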

In [None]:
# Create a Series from a dict
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = pd.Series(sdata)
obj3

# In recent pandas versions the index keeps the dict's insertion order; older versions
# sorted the keys instead

In [None]:
# You can pass the dict keys in the order you want them to appear in the resulting Series
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = pd.Series(sdata, index=states)
obj4

# Since no value for 'California' was found, it appears as NaN (not a number), which
# pandas uses to mark missing or NA values.

- The **isnull** and **notnull** functions in pandas should be used to detect missing data.

In [None]:
# Check for missing values using isnull function or instance method
pd.isnull(obj4)

obj4.isnull()
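
The complementary check, `notnull`, is handy for filtering. A small sketch, rebuilding a Series like `obj4` so the block runs on its own:

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
s = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])

present = s.notnull()               # True where a value exists
clean = s[present]                  # keep only the non-missing entries
n_missing = int(s.isnull().sum())   # count the missing entries
print(clean, n_missing)
```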

- A useful Series feature for many applications is that it automatically aligns by index label in arithmetic operations.

In [None]:
# Check obj3 and obj4
print(obj3, '\n')
print(obj4)

In [None]:
# Add elements of Series obj3 & obj4
obj3 + obj4

- Both the Series object itself and its index have a name attribute.

In [None]:
# name attribute for the Series object itself & its index
obj4.name = 'population'
obj4.index.name = 'state'

obj4

In [None]:
# A Series’s index can be altered in-place by assignment
print(obj, '\n')

obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']

print(obj)

### DataFrame
- **DataFrame** = a rectangular table of data that contains an ordered collection of columns, each of which can be a different value type (numeric, string, etc.).
- The **DataFrame** has both a row and column index; it can be thought of as a dict of Series all sharing the same index.
- You can construct a DataFrame from a dict of equal-length lists or NumPy arrays.

In [None]:
# Construct a DataFrame from a dictionary
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}

frame = pd.DataFrame(data)
frame

# The resulting DataFrame will have its index assigned automatically, as with Series.
# Recent pandas versions keep the columns in the dict's insertion order; older versions sorted them

In [None]:
# Use the head method to select only the first 5 rows for large DataFrames
frame.head()

In [None]:
# If you specify a sequence of columns, the DataFrame’s columns will be arranged in that order
pd.DataFrame(data, columns=['year', 'state', 'pop'])

In [None]:
# If you pass a column that isn’t contained in the dict, it will appear with missing values in the result
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                      index=['one', 'two', 'three', 'four', 'five', 'six'])
frame2

In [None]:
# The columns attribute returns the column labels as an Index
frame2.columns

In [None]:
# Retrieve a column in a DataFrame as a Series

# Dict-like notation works for any column name
frame2['state']

# Attribute-like access works when the column name is a valid Python variable name
frame2.state

In [None]:
# Rows can be retrieved by label with the special loc attribute (use iloc for integer position)
frame2.loc['three']

In [None]:
# Columns can be modified by assignment
frame2['debt'] = 16.5
frame2

- When you are assigning lists or arrays to a column, the value’s length must match the length of the DataFrame. 
- If you assign a Series, its labels will be realigned exactly to the DataFrame’s index, inserting missing values in any holes.
- Assigning a column that doesn’t exist will create a new column. 
- The **del** keyword will delete columns as with a dict.

In [None]:
# Create a Series
val = pd.Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])
val

In [None]:
# Assign the Series val to the column 'debt'
frame2['debt'] = val
frame2

**REMEMBER**: New columns cannot be created with the attribute-like syntax *frame2.state*. You must use the dict-like syntax *frame2['state']*.
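
The column rules listed above (creating a column by assignment, removing it with `del`) can be sketched as follows. The column name `eastern` is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'state': ['Ohio', 'Ohio', 'Nevada'],
                   'year': [2000, 2001, 2001]})

# Assigning to a column that does not exist yet creates it
df['eastern'] = df['state'] == 'Ohio'
cols_after_add = list(df.columns)

# del removes the column again, as with a dict key
del df['eastern']
cols_after_del = list(df.columns)
print(cols_after_add, cols_after_del)
```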

In [None]:
# You can also construct a DataFrame from a nested dict
pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}

# pandas will interpret the outer dict keys as the columns and the inner keys as the row indices
frame3 = pd.DataFrame(pop)
frame3

In [None]:
# You can transpose the DataFrame (swap rows and columns) with similar syntax to a NumPy array
frame3.T

**TABLE: Possible data inputs to DataFrame constructor**


| Type                  | Notes |
| :---                  |    :----    |
|2D ndarray| A matrix of data, passing optional row and column labels
|dict of arrays, lists, or tuples| Each sequence becomes a column in the DataFrame; all sequences must be the same length
|NumPy structured/record array| Treated as the “dict of arrays” case
|dict of Series| Each value becomes a column; indexes from each Series are unioned together to form the result’s row index if no explicit index is passed
|dict of dicts| Each inner dict becomes a column; keys are unioned to form the row index as in the “dict of Series” case
|List of dicts or Series| Each item becomes a row in the DataFrame; union of dict keys or Series indexes become the DataFrame’s column labels
|List of lists or tuples| Treated as the “2D ndarray” case
|Another DataFrame| The DataFrame’s indexes are used unless different ones are passed
|NumPy MaskedArray| Like the “2D ndarray” case except masked values become NA/missing in the DataFrame result 

### Index Objects
- pandas’s **Index** objects are responsible for holding the axis labels and other metadata (like the axis name or names).
- Any array or other sequence of labels you use when constructing a Series or DataFrame is internally converted to an **Index**.
- **Index** objects are immutable and thus can’t be modified by the user.
- In addition to being array-like, an **Index** also behaves like a fixed-size set. But unlike Python sets, a pandas Index can contain duplicate labels.

In [None]:
# Create a pandas Series & check the index
obj = pd.Series(range(3), index=['a', 'b', 'c'])
obj.index

In [None]:
# You can slice an index
obj.index[1:]
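
Immutability and duplicate labels can both be checked directly. A minimal sketch with made-up labels:

```python
import pandas as pd

idx = pd.Index(['a', 'b', 'c'])

# Item assignment on an Index raises TypeError
try:
    idx[1] = 'z'
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Unlike a Python set, an Index may hold duplicate labels
dup = pd.Index(['foo', 'foo', 'bar'])
dup_is_unique = dup.is_unique
print(assignment_failed, dup_is_unique)
```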

**TABLE: Some Index methods and properties**

| Method                  | Description |
| :---                  |    :----    |
|append| Concatenate with additional Index objects, producing a new Index
|difference| Compute set difference as an Index
|intersection| Compute set intersection
|union| Compute set union
|isin| Compute boolean array indicating whether each value is contained in the passed collection
|delete| Compute new Index with element at index i deleted
|drop| Compute new Index by deleting passed values
|insert| Compute new Index by inserting element at index i
|is_monotonic| Returns True if each element is greater than or equal to the previous element (spelled is_monotonic_increasing in recent pandas versions)
|is_unique| Returns True if the Index has no duplicate values
|unique| Compute the array of unique values in the Index
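
A few of the methods in the table, tried on two small Index objects. The labels are illustrative only.

```python
import pandas as pd

left = pd.Index(['a', 'b', 'c'])
right = pd.Index(['b', 'c', 'd'])

both = left.intersection(right)      # labels present in both
either = left.union(right)           # labels present in either
only_left = left.difference(right)   # labels only in left
mask = left.isin(['a', 'c'])         # boolean NumPy array
shorter = left.drop('b')             # new Index without 'b'
print(both, either, only_left, mask, shorter)
```

Each call returns a new object; `left` and `right` themselves are unchanged.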

## Essential Functionality
### Reindexing
- **reindex** method creates a new object with the data conformed to a new index.

In [None]:
# Create a pandas Series
obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
obj

In [None]:
# Calling reindex rearranges the data according to the new index,
# introducing missing values for any index labels that were not already present
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
obj2

- For ordered data like time series, it may be desirable to do some interpolation or filling of values when reindexing. 
- The **method** option allows us to do this, using a method such as **ffill**, which forward-fills the values.

In [None]:
# Create a pandas Series
obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
obj3

In [None]:
# Reindex using the ffill method
obj3.reindex(range(6), method='ffill')

- With DataFrame, **reindex** can alter either the (row) index, columns, or both. 
- When passed only a sequence, it reindexes the rows in the result.
- The columns can be reindexed with the **columns** keyword.

In [None]:
# Import numpy
import numpy as np

# Create a pandas DataFrame
frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])
frame

In [None]:
# Reindex the frame object
frame2 = frame.reindex(['a', 'b', 'c', 'd'])
frame2
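
Reindexing the columns works the same way through the `columns` keyword. A short sketch; the list `states` is an assumption made for the example.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])

# 'Utah' is not an existing column, so it is filled with missing values
states = ['Texas', 'Utah', 'California']
by_cols = frame.reindex(columns=states)
print(by_cols)
```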

**TABLE: reindex function arguments**

| Argument                  | Description |
| :---                  |    :----    |
|index| New sequence to use as index. Can be Index instance or any other sequence-like Python data structure. An Index will be used exactly as is without any copying.
|method| Interpolation (fill) method; 'ffill' fills forward, while 'bfill' fills backward.
|fill_value| Substitute value to use when introducing missing data by reindexing.
|limit| When forward- or backfilling, maximum size gap (in number of elements) to fill.
|tolerance| When forward- or backfilling, maximum size gap (in absolute numeric distance) to fill for inexact matches.
|level| Match simple Index on level of MultiIndex; otherwise select subset of.
|copy| If True, always copy underlying data even if new index is equivalent to old index; if False, do not copy the data when the indexes are equivalent.
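
Two of the arguments from the table, `fill_value` and `limit`, in a small sketch with made-up data:

```python
import pandas as pd

s = pd.Series([4.5, 7.2, -5.3], index=['d', 'b', 'a'])

# fill_value replaces the NaN that reindexing would otherwise introduce
filled = s.reindex(['a', 'b', 'c', 'd'], fill_value=0)

# limit caps how many consecutive positions a forward fill may cover
t = pd.Series([1.0, 2.0], index=[0, 4])
limited = t.reindex(range(6), method='ffill', limit=1)
print(filled)
print(limited)
```

With `limit=1`, only the first gap position after each known value is filled; positions 2 and 3 stay missing.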

### Dropping Entries from an Axis
- Functions like **drop**, which modify the size or shape of a Series or DataFrame, return a new object by default; passing **inplace=True** makes them manipulate the object **in-place** without returning a new one.

In [None]:
# Create a Series
obj = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])
obj

In [None]:
# Drop values for a Series
new_obj = obj.drop('c')
new_obj

In [None]:
# Create an example DataFrame
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])
data

In [None]:
# Calling drop with a sequence of labels will drop values from the row labels (axis 0)
data.drop(['Colorado', 'Ohio'])

In [None]:
# You can drop values from the columns by passing axis=1 or axis='columns'
data.drop('two', axis=1)
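
The in-place variant can be sketched like this. Note that it destroys the dropped data, so use it with care.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])

# With inplace=True the Series itself is modified and None is returned
result = obj.drop('c', inplace=True)
print(result, list(obj.index))
```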

### Indexing, Selection, and Filtering
- **Series** indexing (obj[...]) works analogously to NumPy array indexing, except you can use the Series’s index values instead of only integers.
- Slicing with labels behaves differently than normal Python slicing in that the endpoint is inclusive.

In [None]:
# Create a pandas Series
obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])
obj

In [None]:
# Select values for index 'b'
obj['b']  # equivalent to obj.iloc[1], the second element by position

In [None]:
# Select a slice
obj[2:4]

In [None]:
# Select a slice with labels
obj['b':'c']

In [None]:
# Create an example DataFrame
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])
data

In [None]:
# Select a certain column
data['two']

In [None]:
# Select several columns
data[['three', 'one']]

In [None]:
# Use a boolean array to filter rows
data[data['three'] > 5]

In [None]:
# A scalar comparison produces a boolean DataFrame
data < 5

In [None]:
# Indexing with a boolean DataFrame
data[data < 5] = 0
data

#### Selection with loc and iloc
- **loc** and **iloc** = special indexing operators that let you select a subset of the rows and columns from a DataFrame with NumPy-like notation, using either axis labels (**loc**) or integers (**iloc**).
- Both indexing functions work with slices in addition to single labels or lists of labels.

In [None]:
# Our example DataFrame
data

In [None]:
# Select a single row and multiple columns by label
data.loc['Colorado', ['two', 'three']]

In [None]:
# Select a single row and multiple columns using iloc
data.iloc[2, [3, 0, 1]]
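
Slices work with both operators as well. A minimal sketch that rebuilds the example DataFrame so it runs independently of the earlier cells:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Label slice: the endpoint 'Utah' is included
by_label = data.loc[:'Utah', 'two']

# Positional slice of the first three columns, then a boolean row filter
by_pos = data.iloc[:, :3][data['three'] > 5]
print(by_label)
print(by_pos)
```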

**TABLE: Indexing options with DataFrame**

| Type                  | Notes |
| :---                  |    :----    |
|df[val]| Select single column or sequence of columns from the DataFrame; special case conveniences: boolean array (filter rows), slice (slice rows), or boolean DataFrame (set values based on some criterion)
|df.loc[val]| Selects single row or subset of rows from the DataFrame by label
|df.loc[:, val]| Selects single column or subset of columns by label
|df.loc[val1,val2]| Select both rows and columns by label
|df.iloc[where]| Selects single row or subset of rows from the DataFrame by integer position
|df.iloc[:, where]| Selects single column or subset of columns by integer position
|df.iloc[where_i,where_j]| Select both rows and columns by integer position
|df.at[label_i,label_j]| Select a single scalar value by row and column label
|df.iat[i, j]| Select a single scalar value by row and column position (integers)
|reindex method| Select either rows or columns by labels
|get_value, set_value methods| Select single value by row and column label (removed in pandas 1.0; use at instead)

## Arithmetic and Data Alignment
- When you are adding together objects, if any index pairs are not the same, the respective index in the result will be the union of the index pairs. 
- For users with database experience, this is similar to an automatic outer join on the index labels.
- The internal data alignment introduces missing values in the label locations that don’t overlap. 
- Missing values will then propagate in further arithmetic computations.
- In the case of DataFrame, alignment is performed on both the rows and the columns.

In [None]:
# Create 2 pandas Series
s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1],
               index=['a', 'c', 'e', 'f', 'g'])
print(s1, '\n')
print(s2)

In [None]:
# Add series s1 and s2
s1 + s2

In [None]:
# Create 2 example DataFrames
df1 = pd.DataFrame(np.arange(9.).reshape((3, 3)), columns=list('bcd'),
                   index=['Ohio', 'Texas', 'Colorado'])
df2 = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
                   index=['Utah', 'Ohio', 'Texas', 'Oregon'])

In [None]:
# Check df1
df1

In [None]:
# Check df2
df2

In [None]:
# Adding df1 & df2 together returns a DataFrame whose index and columns are the unions
# of the ones in each DataFrame
df1 + df2

### Arithmetic methods with fill values
- In arithmetic operations between differently indexed objects, you might want to fill with a special value, like 0, when an axis label is found in one object but not the other.

In [None]:
# Create 2 example DataFrames
df1 = pd.DataFrame(np.arange(12.).reshape((3, 4)),
                   columns=list('abcd'))
df2 = pd.DataFrame(np.arange(20.).reshape((4, 5)),
                   columns=list('abcde'))

In [None]:
# Check df1
df1

In [None]:
# Check df2
df2

In [None]:
# Use the loc indexer to set a NaN value in df2
df2.loc[1, 'b'] = np.nan
df2

In [None]:
# Adding df1 & df2 together results in NA values in the locations that don’t overlap
df1 + df2

In [None]:
# Use add method on df1 with the argument fill_value = 0
df1.add(df2, fill_value=0)

**TABLE: Flexible arithmetic methods**

| Method                  | Description |
| :---                  |    :----    |
|add, radd| Methods for addition (+)|
|sub, rsub| Methods for subtraction (-)|
|div, rdiv| Methods for division (/)|
|floordiv, rfloordiv| Methods for floor division (//)|
|mul, rmul| Methods for multiplication (*)|
|pow, rpow| Methods for exponentiation (**)|
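
Each method in the table has a counterpart starting with `r` that flips the arguments. A short sketch with made-up values showing that `1 / df` and `df.rdiv(1)` agree:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1., 7.).reshape((2, 3)), columns=list('abc'))

# The operator form and the reflexive method compute the same thing
with_operator = 1 / df
with_method = df.rdiv(1)
print(with_method)
```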

### Operations between DataFrame and Series

In [None]:
# Create an example DataFrame
frame = pd.DataFrame(np.arange(12.).reshape((4, 3)),
                     columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
frame

In [None]:
# Create an example Series
series = frame.iloc[0]
series

- By default, arithmetic between DataFrame and Series matches the index of the Series on the DataFrame’s columns, broadcasting down the rows.

In [None]:
# Subtract the series values from the DataFrame
frame - series

- If an index value is not found in either the DataFrame’s columns or the Series’s index, the objects will be reindexed to form the union.

In [None]:
# Create another Series
series2 = pd.Series(range(3), index=['b', 'e', 'f'])
frame + series2

In [None]:
# Create another series
series3 = frame['d']
series3

In [None]:
# Check the example DataFrame
frame

In [None]:
# If you want to instead broadcast over the columns, matching on the rows, you have to
# use one of the arithmetic methods
frame.sub(series3, axis='index')

# The axis that you pass is the axis to match on. In this case we mean to match
# on the DataFrame’s row index (axis='index' or axis=0) and broadcast across the columns

### Function Application and Mapping
- Many of the most common array statistics (like sum and mean) are DataFrame methods, so using apply is not necessary.
- Element-wise Python functions can be used with **applymap** for DataFrames and **map** for Series (pandas 2.1 and later also provide **DataFrame.map** as the preferred spelling of applymap).

In [None]:
# Create an example DataFrame
frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
frame

In [None]:
# NumPy ufuncs (element-wise array methods) also work with pandas objects
np.abs(frame)

In [None]:
# Use apply method to apply a function to each column

# Create a lambda function which computes the difference between the maximum and minimum of a Series
f = lambda x: x.max() - x.min()

# Apply the lambda to each column in frame
frame.apply(f)

# The result is a Series having the columns of frame as its index

In [None]:
# If you pass axis='columns' to apply, the function will be invoked once per row
frame.apply(f, axis='columns')

In [None]:
# Apply an element-wise function to a DataFrame

# Define the function (named fmt to avoid shadowing the built-in format)
fmt = lambda x: '%.2f' % x

# Apply the function element-wise
frame.applymap(fmt)
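
For a single column, the Series `map` method does the same element-wise job. A sketch with fixed illustrative values instead of random ones, so the result is predictable:

```python
import pandas as pd

frame = pd.DataFrame([[1.234, -0.5], [2.0, 3.14159]], columns=['b', 'e'])

# map applies the function to every element of one column (a Series)
formatted = frame['e'].map(lambda x: '%.2f' % x)
print(formatted)
```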

### Sorting and Ranking
- To sort lexicographically by row or column index, use the **sort_index** method, which returns a new, sorted object.
- With a DataFrame, you can sort by index on either axis.
- The data is sorted in ascending order by default, but can be sorted in descending order.
- To sort a Series by its values, use its **sort_values** method.
- Any missing values are sorted to the end of the Series by default.
- When sorting a DataFrame, you can use the data in one or more columns as the sort keys.

In [None]:
# Create and sort a Series
obj = pd.Series(range(4), index=['d', 'a', 'b', 'c'])
obj.sort_index()

In [None]:
# Create and sort a DataFrame by rows
frame = pd.DataFrame(np.arange(8).reshape((2, 4)),
                     index=['three', 'one'],
                     columns=['d', 'a', 'b', 'c'])
frame.sort_index()

In [None]:
# Sort by columns
frame.sort_index(axis=1)

In [None]:
# Sort in descending order
frame.sort_index(axis=1, ascending=False)

In [None]:
# Create a Series and sort by its values
obj = pd.Series([4, 7, -3, 2])
obj.sort_values()
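
How missing values are placed during sorting can be seen in a small sketch with made-up data:

```python
import numpy as np
import pandas as pd

obj = pd.Series([4, np.nan, 7, np.nan, -3, 2])

asc = obj.sort_values()                       # NaN goes last by default
desc = obj.sort_values(ascending=False)       # NaN still goes last
first = obj.sort_values(na_position='first')  # put NaN first instead
print(asc)
print(desc)
print(first)
```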

In [None]:
# Create DataFrame
frame = pd.DataFrame({'a': [0, 1, 0, 1], 'b': [4, 7, -3, 2]})
frame

In [None]:
# Sort DataFrame by columns a & b
frame.sort_values(by=['a', 'b'])

- **Ranking** assigns ranks from one through the number of valid data points in an array.
- By default **rank** breaks ties by assigning each group the mean rank.
- Ranks can also be assigned according to the order in which they’re observed in the data.
- DataFrame can compute ranks over the rows or the columns.

In [None]:
# Create a Series 
obj = pd.Series([7, -5, 7, 4, 2, 0, 4])
obj

In [None]:
# Use the rank method
obj.rank()
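
The tie-breaking options mentioned above can be compared side by side. A sketch using the same values as `obj`:

```python
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

average = obj.rank()                                # ties share the mean rank
by_order = obj.rank(method='first')                 # ties ranked in order of appearance
desc_max = obj.rank(ascending=False, method='max')  # descending, ties get the highest rank
print(average.tolist())
print(by_order.tolist())
print(desc_max.tolist())
```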

In [None]:
# Create a DataFrame
frame = pd.DataFrame({'b': [4.3, 7, -3, 2], 'a': [0, 1, 0, 1],
                      'c': [-2, 5, 8, -2.5]})
frame

In [None]:
# Rank across the columns, i.e. within each row
frame.rank(axis='columns')

## Summarizing and Computing Descriptive Statistics
- pandas objects are equipped with a set of common mathematical and statistical methods.
- Most of these fall into the category of reductions or summary statistics, methods that extract a single value (like the sum or mean) from a Series or a Series of values from the rows or columns of a DataFrame.
- Compared with the similar methods found on NumPy arrays, they have built-in handling for missing data.
- NA values are excluded unless the entire slice (row or column in this case) is NA. This can be disabled with the **skipna** option.

In [None]:
# Create an example DataFrame
df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],
                   [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'],
                  columns=['one', 'two'])
df

In [None]:
# Calling DataFrame’s sum method returns a Series containing column sums
df.sum()

In [None]:
# Passing axis='columns' or axis=1 sums across the columns instead
df.sum(axis='columns')

In [None]:
# Calculate the mean of each row (across the columns) without skipping NA values
df.mean(axis='columns', skipna=False)

In [None]:
# The describe method produces multiple summary statistics in one shot
df.describe()

**TABLE: Descriptive and summary statistics**

| Method                  | Description |
| :---                  |    :----    |
|count| Number of non-NA values
|describe| Compute set of summary statistics for Series or each DataFrame column
|min, max| Compute minimum and maximum values
|argmin, argmax| Compute index locations (integers) at which minimum or maximum value obtained, respectively
|idxmin, idxmax| Compute index labels at which minimum or maximum value obtained, respectively
|quantile| Compute sample quantile ranging from 0 to 1
|sum| Sum of values
|mean| Mean of values
|median| Arithmetic median (50% quantile) of values
|mad| Mean absolute deviation from mean value
|prod| Product of all values
|var| Sample variance of values
|std| Sample standard deviation of values
|skew| Sample skewness (third moment) of values
|kurt| Sample kurtosis (fourth moment) of values
|cumsum| Cumulative sum of values
|cummin, cummax |Cumulative minimum or maximum of values, respectively
|cumprod| Cumulative product of values
|diff| Compute first arithmetic difference (useful for time series)
|pct_change| Compute percent changes

### Correlation and Covariance
- Let’s consider some DataFrames of stock prices and volumes obtained from Yahoo! Finance using the add-on **pandas-datareader** package.

In [None]:
# Import the pandas-datareader package.
import pandas_datareader.data as web

In [None]:
# Use the pandas_datareader module to download some data for a few stock tickers
all_data = {ticker: web.get_data_yahoo(ticker)
            for ticker in ['AAPL', 'IBM', 'MSFT', 'GOOG']}

In [None]:
# Create a price DataFrame
price = pd.DataFrame({ticker: data['Adj Close']
                      for ticker, data in all_data.items()})
price

In [None]:
# Create a volume DataFrame
volume = pd.DataFrame({ticker: data['Volume']
                       for ticker, data in all_data.items()})
volume

In [None]:
# Compute percent changes of the prices
returns = price.pct_change()
returns.tail()

In [None]:
# The corr method of Series computes the correlation of the overlapping, non-NA, 
# aligned-by-index values in two Series
returns['MSFT'].corr(returns['IBM'])

In [None]:
# DataFrame’s corr and cov methods, on the other hand, return a full correlation or
# covariance matrix as a DataFrame
returns.corr()

In [None]:
# Compute covariance matrix
returns.cov()

In [None]:
# Using DataFrame’s corrwith method, you can compute pairwise correlations
# between a DataFrame’s columns or rows with another Series or DataFrame
returns.corrwith(returns.IBM)
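
Since the download above needs network access, here is an offline sketch of the same calls on synthetic data. The column names `A`, `B`, `C` and the random values are assumptions for illustration only, not real stock returns.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
base = rng.standard_normal(250)

# B is built to track A closely; C is independent noise
fake_returns = pd.DataFrame({
    'A': base,
    'B': 0.5 * base + 0.1 * rng.standard_normal(250),
    'C': rng.standard_normal(250),
})

pair = fake_returns['A'].corr(fake_returns['B'])   # single correlation
matrix = fake_returns.corr()                       # full correlation matrix
vs_a = fake_returns.corrwith(fake_returns['A'])    # each column against A
print(pair)
print(matrix)
print(vs_a)
```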

### Unique Values, Value Counts, and Membership

In [None]:
# unique gives you an array of the unique values in a Series
obj = pd.Series(['c', 'a', 'd', 'a', 'a', 'b', 'b', 'c', 'c'])
obj.unique()

# The unique values are not necessarily returned in sorted order, but can be sorted
# after the fact if needed, e.g. uniques = obj.unique() followed by uniques.sort()

In [None]:
# value_counts computes a Series containing value frequencies
obj.value_counts()

In [None]:
# isin performs a vectorized set membership check and can be useful in filtering a
# dataset down to a subset of values in a Series or column in a DataFrame
mask = obj.isin(['b', 'c'])
mask

In [None]:
# Select only values in the mask
obj[mask]

**TABLE: Unique, value counts, and set membership methods**

| Method                  | Description |
| :---                  |    :----    |
|isin| Compute boolean array indicating whether each Series value is contained in the passed sequence of values
|match| Compute integer indices for each value in an array into another array of distinct values; helpful for data alignment and join-type operations
|unique| Compute array of unique values in a Series, returned in the order observed
|value_counts| Return a Series containing unique values as its index and frequencies as its values, ordered by count in descending order

## Book Progress

In [None]:
# Plot book progress

import plotly.graph_objects as go

fig = go.Figure(go.Indicator(
    mode = "number+gauge+delta",
    gauge = {'shape': "bullet"},
    delta = {'reference': 341},
    value = 509,
    domain = {'x': [0.1, 1], 'y': [0.2, 0.9]},
    title = {'text': "Book Pages"}))

fig.show()