# **Chapter 02: The Series object**  

In this chapter, we'll learn about:
- Instantiating `Series` objects from lists, dictionaries, tuples, and more
- Setting a custom index on a `Series`
- Accessing attributes and invoking methods on a `Series`
- Performing mathematical operations on one or more `Series`
- Passing the `Series` to Python’s built-in functions

<br>  

`Series`:  
- one of pandas' core data structures  
- a one-dimensional labeled array for homogeneous data  
    - *array*:  
    an ordered collection of values comparable to a list in Python  
    - *homogeneous*:  
    the values are of the same data type  
    (ex. all integers or all Booleans, ...)  
    - *labeled*:  
        - pandas assigns each `Series` value a *label*  
        - *label*: an identifier we can use to locate the value
- pandas assigns each `Series` value an *order*  
    - *order*:  
    a position in line  
    it starts counting from 0  
    &rightarrow; the first `Series` value occupies position 0, the second value occupies position 1, and so on  
- the `Series` is a one-dimensional data structure  
&because; we need one reference point to access a value: either a label or a position  
- a `Series` combines and expands the best features of Python's native data structures  
    - like a list, it holds its values in a sequenced order  
    - like a dictionary, it assigns a label(key) to each value  
    &rightarrow; we gain the benefits of both of those objects plus more than 180 methods for data manipulation  

<br>

We're going to familiarize ourselves with the mechanics of a `Series` object,  
learn how to calculate the sum and average of `Series` values,  
apply mathematical operations to each value in a `Series`, ...  

<br>
<br>

## **2.1 Overview of a Series**  
---  
Let's create some `Series` objects.  
We'll begin by importing the pandas and NumPy packages.  
(We'll use NumPy in section 2.1.4.)  
- use the `import` keyword  
- using the `as` keyword, assign an alias to each package
- the popular community aliases for libraries:  
    - pandas: `pd`
    - NumPy: `np`

In [1]:
import pandas as pd
import numpy as np

`pd` 
- the `pd` namespace holds the top-level exports of the `pandas` package,  
a bundle of more than 100 classes, functions, exceptions, constants, ...  
- think of `pd` as being the lobby to the library  
&rightarrow; an entrance room where we can access pandas' available features
- pandas' exports are available as attributes on `pd`  
we can access an attribute with dot syntax:  
`pd.attribute`  

<br>
<br>

### 2.1.1 Classes and instances  
*class*:  
- a blueprint for a Python object  
&rightarrow; The `pd.Series` class is a template
- to work with the class, we need to create a concrete instance of it  
- instantiate an object from a class with parentheses:  

In [2]:
pd.Series()

Series([], dtype: object)

### 2.1.2 Populating the Series with values  
*constructor*:  
- a method that builds an object from a class  
ex) when we wrote `pd.Series()`, we used the `Series` constructor to create a new `Series` object  


starting state:  
- think of it as the object's initial configuration (settings)  
- we can often set it by passing arguments to the constructor  
    - *argument*: an input we pass to a method  

<br>  

practice creating `Series` from manual data:  
- goal:  
to get comfortable with the look and feel of the data structure  
- the first argument:  
an iterable object whose values will populate the `Series`  
    - we can pass various inputs:  
        - lists  
        - dictionaries  
        - tuples  
        - NumPy `ndarray`s  
        - ...  
- we're going to create a `Series` object with data from a Python list  
    - declare a list of four strings  
    - pass the list to the `Series` constructor



In [3]:
ice_cream_flavors = [
    "Chocolate",
    "Vanilla",
    "Strawberry",
    "Rum Raisin",
]

pd.Series(ice_cream_flavors)

0     Chocolate
1       Vanilla
2    Strawberry
3    Rum Raisin
dtype: object

We've created a new `Series` with the four values from the `ice_cream_flavors` list.  
Notice that pandas preserves the order of the values.  

<br>

*parameter*:  
- a name given to an expected input to a function or method  
- Python matches every argument we pass to a constructor with a parameter  
- the `Series` constructor defines six parameters:  
    - `data`
    - `index`  
    - `dtype`  
    - `name`  
    - `copy`  
    - `fastpath`  
- we can use these parameters to set the object's initial state  
&rightarrow; parameters can be regarded as being configuration options for the object  
- if we don't pass an argument to a parameter, Python uses the parameter's *default argument*  
ex) if we don't pass a value for the `name` parameter, Python will use `None` which is the default argument for the `name` parameter  
- a parameter with a default argument is inherently optional  
&because; it will always have some argument, either explicitly from its invocation or implicitly from its definition  
ex) we were able to instantiate a `Series` without arguments because all 6 parameters are optional  
- we can connect parameters and arguments explicitly with keyword arguments  
- keyword arguments are advantageous  
    because:
    - they provide context for what each constructor argument represents  
    - they permit us to pass arguments in any order  
    &therefore; the following examples create the same `Series` object:  
    `pd.Series(data=ice_cream_flavors, index=None)`  
    `pd.Series(index=None, data=ice_cream_flavors)`
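A minimal sketch of the point about keyword order. It rebuilds the `ice_cream_flavors` list from above and uses the `equals` method, which this section has not introduced, to compare the two results.

```python
import pandas as pd

ice_cream_flavors = ["Chocolate", "Vanilla", "Strawberry", "Rum Raisin"]

# Same arguments, passed in a different keyword order
first = pd.Series(data=ice_cream_flavors, index=None)
second = pd.Series(index=None, data=ice_cream_flavors)

# equals() reports whether the two Series hold the same contents
print(first.equals(second))
```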

<br>

We'll pass the argument to the `data` parameter explicitly by using a keyword argument:

In [4]:
pd.Series(data=ice_cream_flavors)

0     Chocolate
1       Vanilla
2    Strawberry
3    Rum Raisin
dtype: object

### 2.1.3 Customizing the Series index  
We mentioned that pandas assigns a position in line to each `Series` value.  


*index*:  
- the collection of incrementing integers on the left side of the `Series`  
- each number in the index signifies a value's order within the `Series`  
- starts counting from 0  
`"Chocolate"` occupies index 0, `"Vanilla"` occupies index 1, and so on  
- the term "index" describes both the collection of identifiers and an individual identifier  
&rightarrow; the following two statements are both valid:  
    1. "The index of the Series consists of integers"
    2. "The value `Strawberry` is found at index 2"  
- the last index position will always be 1 less than the total number of values  
&because; the index starts counting from 0
&rightarrow; the `Series` has four ice cream flavors, so the index counts up to 3  
- we can assign each `Series` value an index label  
    - index labels can be of any immutable data type:  
    strings, tuples, datetimes, ...  
    - its flexibility makes a `Series` powerful  
    we can reference a value by its order or by its key/label  
    &rightarrow; in a sense, each value has two identifiers  
- `index` which is a parameter of the `Series` constructor:  
sets the index labels of the `Series`  
    - if we don't pass an argument to the parameter, pandas defaults to a numeric index starting from 0  
    - with this type of index, the label and the position identifier are one and the same
- the index permits duplicates,  
a detail that distinguishes a `Series` from a Python dictionary  
    - however, duplicate labels are not recommended  
    - it is ideal to avoid duplicate index labels whenever possible  
    &because; a unique index allows the library to locate index labels more quickly
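A small sketch of what duplicate index labels look like in practice. The `orders` data is made up for illustration, and the square-bracket lookup is used here ahead of its formal introduction.

```python
import pandas as pd

# Hypothetical data: two entries share the label "Monday"
orders = pd.Series(
    data=["Chocolate", "Vanilla", "Strawberry"],
    index=["Monday", "Monday", "Friday"],
)

print(orders.index.is_unique)  # False, because "Monday" appears twice
print(orders["Friday"])        # a unique label returns a single value
print(orders["Monday"])        # a duplicated label returns a Series of all matches
```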

<br>  

practice:  
- we'll construct a `Series` with a custom index  
- we can pass objects of different data types to the `data` and `index` parameters  
but, they must have the same length  
- we're going to pass a list of strings (length: 4) to the `data` parameter  
- we're going to pass a tuple of strings (length: 4) to the `index` parameter

In [5]:
ice_cream_flavors = [
    "Chocolate",
    "Vanilla",
    "Strawberry",
    "Rum Raisin"
]

days_of_week = ("Monday", "Wednesday", "Friday", "Saturday")

# The following statements are equivalent
pd.Series(ice_cream_flavors, days_of_week)
pd.Series(data = ice_cream_flavors, index = days_of_week)

Monday        Chocolate
Wednesday       Vanilla
Friday       Strawberry
Saturday     Rum Raisin
dtype: object

practice explanation:
- pandas uses shared index positions to associate the values from the `ice_cream_flavors` list and the `days_of_week` tuple  
- pandas sees `"Rum Raisin"` and `"Saturday"` at index position 3 in their respective objects  
&rightarrow; pandas ties them together in the `Series`  
- even though the index consists of string labels,  
pandas still assigns each `Series` value an index position  
&rightarrow; we can access the value `"Vanilla"` either by the index label `"Wednesday"` or by index position 1  
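The two ways of reaching `"Vanilla"` can be sketched as follows. The `loc` (label-based) and `iloc` (position-based) accessors are real pandas features, but they are not covered in this section, so treat their use here as a preview.

```python
import pandas as pd

flavors = pd.Series(
    data=["Chocolate", "Vanilla", "Strawberry", "Rum Raisin"],
    index=("Monday", "Wednesday", "Friday", "Saturday"),
)

by_label = flavors.loc["Wednesday"]  # look up by index label
by_position = flavors.iloc[1]        # look up by index position

print(by_label, by_position)
```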

<br>

`dtype` statement at the bottom of the output:  
- reflects the data type of the values in the `Series`  
- for most data types, pandas will display a predictable type  
(such as `bool`, `float`, `int`, ...)  
- for strings and more-complex objects, pandas will display `object`  
- pandas does its best to infer an appropriate data type for the `Series` from the `data` parameter's values  
- we can force coercion to a different type via the constructor's `dtype` parameter  

<br>

practice:  
- we'll create `Series` objects from lists of Boolean, integer, and floating-point values  
- we'll create a `Series` object from list of integer values and specify the `dtype` parameter to force the data type to be `float64`


In [6]:
bunch_of_bools = [True, False, False]
pd.Series(bunch_of_bools)

0     True
1    False
2    False
dtype: bool

In [7]:
stock_prices = [985.32, 950.44]
time_of_day = ["Open", "Close"]
pd.Series(data = stock_prices, index = time_of_day)

Open     985.32
Close    950.44
dtype: float64

In [8]:
lucky_numbers = [4, 8, 15, 16, 23, 42]
pd.Series(lucky_numbers)

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64

In [9]:
lucky_numbers = [4, 8, 15, 16, 23, 42]
pd.Series(lucky_numbers, dtype = "float")

0     4.0
1     8.0
2    15.0
3    16.0
4    23.0
5    42.0
dtype: float64

practice explanation:  
- the `float64` data type indicates that each floating-point value occupies 64 bits (8 bytes) of your computer's memory (RAM)  
- the `int64` data type indicates that each integer value occupies 64 bits (8 bytes) of your computer's memory (RAM)  
- the `Series` constructor expects the `dtype` parameter to be third in line, after `data` and `index`  
&therefore; we cannot pass it directly after `lucky_numbers`, because pandas would treat it as the `index` argument  
&therefore; the simplest option is to use a keyword argument
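As a hedged illustration of the parameter order, the sketch below passes `dtype` once by keyword and once positionally with `None` as a placeholder for `index`. The positional form works only while `dtype` stays third in the signature, so the keyword form is the safer habit.

```python
import pandas as pd

lucky_numbers = [4, 8, 15, 16, 23, 42]

# Keyword argument: the parameter order does not matter
with_keyword = pd.Series(lucky_numbers, dtype="float")

# Positional arguments: None fills the index slot so "float" lands on dtype
with_placeholder = pd.Series(lucky_numbers, None, "float")

print(with_keyword.equals(with_placeholder))
```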

<br>
<br>

### 2.1.4 Creating a Series with missing values  
Our `Series` so far have been simple and complete.  
But, in the real world, data is a lot messier.  
Perhaps the most frequent problem that analysts encounter is missing values.  


`nan` object
- when pandas sees a missing value during a file import,  
the library substitutes NumPy's `nan` object  
- `nan`: an abbreviation for "not a number"  
- `nan` is a catch-all term for an undefined value  
$\therefore$ `nan` is a placeholder object that represents nullness or absence  

<br>

practice:  
Let's sneak a missing value into a `Series`.  
We assigned the NumPy library to the alias `np` when we imported it.  
The `nan` attribute is available as a top-level export of the library.  
- nestle `np.nan` (the `nan` attribute) inside a list of temperatures  
- pass the temperatures list to the `Series` constructor


In [10]:
temperatures = [94, 88, np.nan, 91]

pd.Series(data = temperatures)

0    94.0
1    88.0
2     NaN
3    91.0
dtype: float64

practice explanation:  
- the `Series` `dtype` is `float64`  
$\because$ pandas automatically converts numeric values from integers to floating-points when it spots a `nan` value  
- this allows the library to store numeric values and missing values in the same homogeneous `Series`
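A short sketch contrasting a complete `Series` with one that holds a missing value. The `isna` method is not introduced in this section; it is used here only to mark where the gap sits.

```python
import numpy as np
import pandas as pd

complete = pd.Series([94, 88, 91])
with_gap = pd.Series([94, 88, np.nan, 91])

print(complete.dtype)            # an integer dtype
print(with_gap.dtype)            # float64, because of the nan
print(with_gap.isna().tolist())  # True marks the missing position
```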

<br>
<br>

## **2.2 Creating a Series from Python objects**  
---  
The `Series` constructor's `data` parameter accepts various inputs:
- native Python data structures  
- objects from other libraries  


The `Series` object that pandas returns operates the same way irrespective of its data source.


We'll explore how the `Series` constructor deals with:  
- dictionaries
- tuples
- sets
- NumPy arrays  

<br>

*dictionary*:
- a collection of key-value pairs  
- when it is passed to the `Series` constructor,  
the constructor sets each key as the corresponding index label in the `Series`

In [11]:
calorie_info = {
    "Cereal": 125,
    "Chocoalte Bar": 406,
    "Ice Cream Sundae": 342
}

diet = pd.Series(calorie_info)
diet

Cereal              125
Chocoalte Bar       406
Ice Cream Sundae    342
dtype: int64

*tuple*:  
- an immutable list  
$\rightarrow$ we cannot add, remove, or replace elements in a tuple after creating it  
- when it is passed to the `Series` constructor,  
the constructor populates the `Series` in an expected manner

In [12]:
pd.Series(data = ("Red", "Green", "Blue"))

0      Red
1    Green
2     Blue
dtype: object

- to create a `Series` that stores tuples,  
wrap the tuples in a list  
$\rightarrow$ tuples work well for row values that consist of multiple parts or components

In [13]:
rgb_colors = [(120, 41, 26), (196, 165, 45)]

pd.Series(data = rgb_colors)

0     (120, 41, 26)
1    (196, 165, 45)
dtype: object

*set*:  
- an unordered collection of unique values  
- declare it with a pair of curly braces, the same syntax a dictionary uses  
- Python uses the presence of key-value pairs to distinguish a dictionary from a set  
- when it is passed to the `Series` constructor,  
pandas raises a `TypeError` exception  
- a set has neither the concept of order (like a list)  
nor the concept of association(like a dictionary)  
$\therefore$ the library cannot assume an order in which to store the set's values  
- if you want to create a `Series` from a set,  
transform it to an ordered data structure before passing it to the constructor
- we're going to make a set,  
and then convert it to a list(using the `list` function)  
before passing it to the `Series` constructor


In [14]:
my_set = {"Ricky", "Bobby"}

pd.Series(list(my_set))

0    Bobby
1    Ricky
dtype: object
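The sketch below shows both halves of the set discussion: the `TypeError` and a conversion to an ordered structure. It uses `sorted` rather than `list` as an illustrative choice, because a set's iteration order is not guaranteed and sorting makes the result repeatable.

```python
import pandas as pd

my_set = {"Ricky", "Bobby"}

# Passing the set directly fails because a set has no defined order
raised_type_error = False
try:
    pd.Series(my_set)
except TypeError as error:
    raised_type_error = True
    print(f"TypeError: {error}")

# sorted() returns a list in alphabetical order
ordered = pd.Series(sorted(my_set))
print(ordered)
```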

*NumPy array*:  
- NumPy `ndarray` object  
- a common storage format for moving data around  
- many data science libraries use NumPy arrays  
- when it is passed to the `Series` constructor,  
the constructor populates the `Series` with the array's values preserving the order
- we're going to make an `ndarray` using NumPy's `randint` function,  
and pass it to the `Series` constructor  

In [15]:
random_data = np.random.randint(1, 101, 10)
random_data

array([ 59,  13,  35,  82,  88,  42,  44,   4,  31, 100])

In [16]:
pd.Series(random_data)

0     59
1     13
2     35
3     82
4     88
5     42
6     44
7      4
8     31
9    100
dtype: int64

## **2.3 Series attributes**  
---  
*attribute*:  
- a piece of data belonging to an object  
- reveal information about the object's internal state  
- its value may be another object  

<br>

A `Series` is composed of several smaller objects.  
These objects can be regarded as puzzle pieces that make up a greater whole.  


Let's explore `Series` attributes through the `diet` `Series` that we created from the `calorie_info` dictionary in section 2.2:

In [17]:
diet

Cereal              125
Chocoalte Bar       406
Ice Cream Sundae    342
dtype: int64

- this `Series` uses:  
the NumPy `ndarray` object to store the calorie counts  
and the pandas `Index` object to store the food names in the index  
- we can access these nested objects through `Series` attributes  
- the `values` attribute:  
exposes the `ndarray` object that stores the values  
    - we can pass the object to Python's built-in `type` function to confirm its type

In [18]:
diet.values

array([125, 406, 342])

In [19]:
type(diet.values)

numpy.ndarray

- the `index` attribute:  
    - returns the `Index` object that stores the `Series` labels
    - it is built into pandas

In [20]:
diet.index

Index(['Cereal', 'Chocoalte Bar', 'Ice Cream Sundae'], dtype='object')

In [21]:
type(diet.index)

pandas.core.indexes.base.Index

- the `dtype` attribute:  
    - returns the data type of the `Series` values  
    - it reveals helpful details about the object  

In [22]:
diet.dtype

dtype('int64')

- the `size` attribute:
    - returns the number of values in the `Series`

In [23]:
diet.size

3

- the `shape` attribute:
    - returns a tuple with the dimensions of a pandas data structure  
    - for a `Series`, the tuple's only value will be the `Series`' size  
    $\because$ a `Series` is one-dimensional  
    - the comma after the value is a standard visual output for one-element tuples in Python

In [24]:
diet.shape

(3,)

- the `is_unique` attribute:
    - returns a Boolean value indicating whether all the `Series` values are unique

In [25]:
diet.is_unique

True

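- the `is_monotonic_increasing` and `is_monotonic_decreasing` attributes:
    - `is_monotonic_increasing` returns a Boolean value indicating whether each `Series` value is greater than or equal to the one before it
    - `is_monotonic_decreasing` returns a Boolean value indicating whether each `Series` value is less than or equal to the one before it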

In [26]:
diet.is_monotonic_increasing

False

In [27]:
diet.is_monotonic_decreasing

False

summary: 
- attributes ask an object for information on its internal state  
- attributes reveal nested objects, which can have their own functionalities  
- in Python, everything is an object,  
including integers, strings, and Booleans  
$\therefore$ an attribute that returns a number is technically no different from one that returns a complex object such as an `ndarray`.
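To illustrate that attributes hand back full objects with their own functionality, here is a minimal sketch. It rebuilds a `diet`-style `Series` so the block runs on its own; the `max` and `tolist` calls belong to the nested `ndarray` and `Index` objects, not to the `Series`.

```python
import pandas as pd

diet = pd.Series({"Cereal": 125, "Chocolate Bar": 406, "Ice Cream Sundae": 342})

largest = diet.values.max()    # a method of the nested ndarray
labels = diet.index.tolist()   # a method of the nested Index
size_type = type(diet.size)    # even a plain count is an object with a type

print(largest, labels, size_type)
```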

<br>
<br>

## **2.4 Retrieving the first and last rows**  
---  
In this section, we'll start exploring what we can do with `Series` objects.  
A Python object has both attributes and methods.   

<br>

*attribute*:  
- a piece of data belonging to an object:  
a characteristic or detail that the data structure can reveal about itself


*method*:  
- a function that belongs to an object:  
an action or command that we ask the object to perform  
- typically involve some analysis, calculation, or manipulation of the object's attributes  


$\rightarrow$ attributes = an object's state  
$\rightarrow$ methods = an object's behavior  

<br>

Let's create our largest `Series` yet:  
- make an object using Python's built-in `range` function  
    - generates a sequence of numbers from a starting point up to, but not including, an endpoint  
    - the `range` function has 3 parameters:  
        - a lower bound (inclusive)  
        - an upper bound (exclusive)  
        - a step (the gap between consecutive numbers)  
- pass the `range` object to the `Series` constructor

In [28]:
values = range(0, 500, 5)

nums = pd.Series(values)
nums

0       0
1       5
2      10
3      15
4      20
     ... 
95    475
96    480
97    485
98    490
99    495
Length: 100, dtype: int64

Now, we have a `Series` with 100 values.  
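A quick check of why the `Series` has exactly 100 values: the upper bound of `range` is excluded, so the sequence stops at 495. This is a sketch using only Python built-ins plus the constructor call from above.

```python
import pandas as pd

values = range(0, 500, 5)

print(len(values))    # 100
print(values[-1])     # 495, the last multiple of 5 below the upper bound
print(500 in values)  # False, the upper bound itself is excluded

nums = pd.Series(values)
print(nums.size)      # 100
```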

<br>

Let's invoke some simple `Series` methods:  
- `head` method:  
    - returns rows from the beginning or top of the data set  
    - parameter:  
        - `n`: sets the number of rows to extract  
        - its default value = 5
    - we're going to invoke the method in two ways:  
        - using the positional argument
        - using the keyword argument

In [29]:
nums.head(3)

0     0
1     5
2    10
dtype: int64

In [30]:
nums.head(n=3)

0     0
1     5
2    10
dtype: int64

- `tail` method  
    - returns rows from the bottom or end of a `Series` 
    - parameter:  
        - `n`: sets the number of rows to extract 
        - its default value = 5

In [31]:
nums.tail(6)

94    470
95    475
96    480
97    485
98    490
99    495
dtype: int64

We can use `head` and `tail` methods to preview the beginning and end of a data set quickly.  
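As a small sketch of the default argument, calling either method with no arguments returns five rows. The `tolist` method is used only to keep the printed output compact.

```python
import pandas as pd

nums = pd.Series(range(0, 500, 5))

first_rows = nums.head()  # n defaults to 5
last_rows = nums.tail()   # n defaults to 5

print(first_rows.tolist())  # [0, 5, 10, 15, 20]
print(last_rows.tolist())   # [475, 480, 485, 490, 495]
```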

<br>
<br>

## **2.5 Mathematical operations**  
---  
Let's dive into some more advanced `Series` methods.  
A `Series` object includes plenty of statistical and mathematical methods.  
Let's see a few of them in action.  

<br>
<br>

### 2.5.1 Statistical operations  
We'll begin by creating a `Series`:  
- create a `Series` from a list of ascending integers
- include an `np.nan` value in the middle
- the `Series` has a missing value  
$\rightarrow$ pandas will coerce the integers to floating-point values

In [32]:
numbers = pd.Series([1, 2, 3, np.nan, 4, 5])
numbers

0    1.0
1    2.0
2    3.0
3    NaN
4    4.0
5    5.0
dtype: float64

Let's invoke some statistical methods:  
- `count`  
    - counts the number of non-missing (non-null) values

In [33]:
numbers.count()

5

- `sum`  
    - adds the `Series` values together
    - most mathematical methods ignore missing values by default  
        - we can force the inclusion of missing values:  
        pass an argument of `False` to the `skipna` parameter  
    - parameters:  
        - `skipna`:  
            - takes a Boolean value  
            - its default value = `True`  
            - `True`: excludes missing values  
            - `False`: includes missing values  
        - `min_count`:  
            - sets the minimum number of valid values a `Series` must hold for pandas to calculate the sum  
            - takes an integer value  


In [34]:
numbers.sum()

15.0

In [35]:
numbers.sum(skipna = False)

nan

Pandas returns a `nan`.  
$\because$ It cannot add the unknown `nan` value to the cumulative sum.

In [36]:
numbers.sum(min_count = 3)

15.0

In [37]:
numbers.sum(min_count = 6)

nan

Pandas returns a `nan`.  
$\because$ The `Series` holds only 5 valid (non-missing) values, which is fewer than the `min_count` argument of 6.
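The threshold behavior of `min_count` can be sketched by tying it to the result of `count`. The variable name `valid` is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

numbers = pd.Series([1, 2, 3, np.nan, 4, 5])

valid = int(numbers.count())  # 5 non-missing values

at_threshold = numbers.sum(min_count=valid)         # enough valid values
above_threshold = numbers.sum(min_count=valid + 1)  # one more than available

print(at_threshold)     # 15.0
print(above_threshold)  # nan
```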

<br>

- `product`
    - multiplies all `Series` values together  
    - parameters:  
        - `skipna`:  
            - takes a Boolean value  
            - its default value = `True`  
            - `True`: excludes missing values  
            - `False`: includes missing values  
        - `min_count`:  
            - sets the minimum number of valid values a `Series` must hold for pandas to calculate the product  
            - takes an integer value  
        

In [38]:
numbers.product()

120.0

In [39]:
numbers.product(skipna = False)

nan

In [40]:
numbers.product(min_count = 3)

120.0

- `cumsum`:  
    - cumulative sum  
    - returns a new `Series` with a running sum of values  
    - each index position holds the sum of values up to and including the value at that index
    - it helps determine which values contribute most to the total

In [41]:
numbers

0    1.0
1    2.0
2    3.0
3    NaN
4    4.0
5    5.0
dtype: float64

In [42]:
numbers.cumsum()

0     1.0
1     3.0
2     6.0
3     NaN
4    10.0
5    15.0
dtype: float64

Let's walk through some of the calculations in the output:  
- the cumsum at index 0:  
    - `1.0`  
    $\because$ there is nothing to add  
    - the first value in the `numbers` `Series`
- the cumsum at index 1:  
    - `3.0`  
    $\because$  the sum of `1.0` at index 0 and `2.0` at index position 1  
- the cumsum at index 2:  
    - `6.0`  
    $\because$ the sum of `1.0`, `2.0`, and `3.0`
- the `numbers` `Series` has a `nan` value at index 3  
pandas cannot add a missing value to the cumsum  
$\rightarrow$ pandas places a `nan` at the same index in the returned `Series`
- the cumsum at index 4:  
    - `10.0`  
    $\because$ pandas adds the previous cumsum with the current index's value ($1.0 + 2.0 + 3.0 + 4.0$)  
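The walk-through above can be reproduced with a plain loop. This is a sketch of the default behavior (`skipna=True`), not of how pandas implements it internally; `running_total` and `manual` are illustrative names.

```python
import numpy as np
import pandas as pd

numbers = pd.Series([1, 2, 3, np.nan, 4, 5])

running_total = 0.0
manual = []
for value in numbers:
    if pd.isna(value):
        # A missing input yields a missing output; the total is left unchanged
        manual.append(np.nan)
    else:
        running_total += value
        manual.append(running_total)

print(manual)
print(pd.Series(manual).equals(numbers.cumsum()))
```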

<br>

If we pass `False` to the `skipna` parameter, the returned `Series` lists the cumulative sum up to the index with the first missing value and then `NaN` for the remaining values:


In [43]:
numbers.cumsum(skipna = False)

0    1.0
1    3.0
2    6.0
3    NaN
4    NaN
5    NaN
dtype: float64

- `pct_change`:  
    - returns the percentage difference from one `Series` value to the next  
    - at each index:  
        - subtracts the previous index's value from the current index's value
        - divides the difference by the previous index's value
    - pandas can calculate a percentage difference only if both indexes have valid values
    - in the pandas version used for these notes, defaults to a *forward-fill* strategy for missing values  
        - pandas replaces a `nan` with the last valid value it encountered
    - parameters:
        - `fill_method`:  
        customizes the protocol by which `pct_change` substitutes `NaN` values  
        (pandas 2.1 and later deprecate this parameter, so newer versions may emit a warning or behave differently)

In [44]:
numbers

0    1.0
1    2.0
2    3.0
3    NaN
4    4.0
5    5.0
dtype: float64

In [45]:
numbers.pct_change()

0         NaN
1    1.000000
2    0.500000
3    0.000000
4    0.333333
5    0.250000
dtype: float64

Here's how pandas operates:
- at index 0:  
    - cannot compare the value `1.0` with any previous value  
    $\rightarrow$ index 0 in the returned `Series` has `NaN` value
- at index 1:
    - compares index 1's value of `2.0` with index 0's value of `1.0`  
    - the percentage change between 2.0 and 1.0 is 100% (double)  
    $\rightarrow$ the value of index 1 is `1.00000`  
- at index 2:  
    - compares index 2's value of `3.0` with index 1's value of `2.0`  
    - the percentage change between 3.0 and 2.0 is 50% (an increase of one half)  
    $\rightarrow$ the value of index 2 is `0.50000`
- at index 3:  
    - the `Series` has a `NaN` missing value  
    - pandas substitutes the last encountered value (`3.0` from index 2) in its place  
    - the percentage change between the substituted `3.0` at index 3 and the `3.0` at index 2 is `0`
- at index 4:  
    - compares index 4's value of `4.0` with the previous row's value  
    - pandas again substitutes the `nan` with the last valid value (`3.0` from index 2)
    - the percentage change between `4` and `3` is `0.33333`  

<br>

The following table (Figure 2.3) shows a visual representation of a forward-fill percentage-change calculation:  


| |`numbers`| | | operation | | | output |
|:-:|:-----:|:-:|:-:|:-----:|:-:|:-:|:-----:|
| 0 | 1.0 | | 0 | NaN | | 0 | NaN |
| 1 | 2.0 | | 1 | $\frac{2.0 - 1.0}{1.0}$ | | 1 | 1.00000 |
| 2 | 3.0 | | 2 | $\frac{3.0 - 2.0}{2.0}$ | | 2 | 0.50000 |
| 3 | NaN | | 3 | $\frac{3.0 - 3.0}{3.0}$ | | 3 | 0.00000 |
| 4 | 4.0 | | 4 | $\frac{4.0 - 3.0}{3.0}$ | | 4 | 0.33333 |
| 5 | 5.0 | | 5 | $\frac{5.0 - 4.0}{4.0}$ | | 5 | 0.25000 |  
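The table can be reproduced step by step without calling `pct_change`. This sketch fills the gap explicitly with the `ffill` method and uses `shift` to line each value up with the one before it; neither method has been introduced in this section.

```python
import numpy as np
import pandas as pd

numbers = pd.Series([1, 2, 3, np.nan, 4, 5])

filled = numbers.ffill()     # 1, 2, 3, 3, 4, 5
previous = filled.shift(1)   # NaN, 1, 2, 3, 3, 4
forward_change = (filled - previous) / previous

print(forward_change.round(6).tolist())
```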

<br>

`fill_method` parameter:  
- available across many methods  
$\rightarrow$ it is worth taking the time to familiarize yourself with it  
- with the default forward-fill strategy,  
pandas replaces a `nan` value with the last valid observation  
- we can pass an explicit argument of `"pad"` or `"ffill"`
$\rightarrow$ pandas will use the forward-fill strategy  
- we can pass `"bfill"` for a *backfill* strategy  
$\rightarrow$ pandas replaces a `nan` value with the next valid observation


Let's pass the `fill_method` parameter a value of `"bfill"`:

In [46]:
# The two lines below are equivalent

numbers.pct_change(fill_method="bfill")
numbers.pct_change(fill_method="backfill")

0         NaN
1    1.000000
2    0.500000
3    0.333333
4    0.000000
5    0.250000
dtype: float64

output explanation:  
- the values at index positions 3 and 4 differ between the forward-fill and backfill strategies
- at index 0:  
    - pandas cannot compare the value `1.0` in the `Series` with any previous value  
    $\rightarrow$ index 0 in the returned `Series` has a `NaN` value  
- at index 3:  
    - pandas runs into a `NaN` in the `Series`  
    - pandas substitutes the next valid value (`4.0` at index 4) in its place  
    - the percentage change between `4.0` at index 3 and `3.0` at index 2 in `numbers` is `0.33333`
- at index 4:
    - pandas compares `4.0` with index 3's value  
    - pandas replaces the `NaN` at index 3 with `4.0` which is the next valid value  
    - the percentage change between `4.0` at index 4 and `4.0` at index 3 is `0.00000`


The following table (Figure 2.4) shows a visual representation of a backfill percentage-change calculation:  


| |`numbers`| | | operation | | | output |
|:-:|:-----:|:-:|:-:|:-----:|:-:|:-:|:-----:|
| 0 | 1.0 | | 0 | NaN | | 0 | NaN |
| 1 | 2.0 | | 1 | $\frac{2.0 - 1.0}{1.0}$ | | 1 | 1.00000 |
| 2 | 3.0 | | 2 | $\frac{3.0 - 2.0}{2.0}$ | | 2 | 0.50000 |
| 3 | NaN | | 3 | $\frac{4.0 - 3.0}{3.0}$ | | 3 | 0.33333 |
| 4 | 4.0 | | 4 | $\frac{4.0 - 4.0}{4.0}$ | | 4 | 0.00000 |
| 5 | 5.0 | | 5 | $\frac{5.0 - 4.0}{4.0}$ | | 5 | 0.25000 |
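The backfill table follows the same arithmetic with a different fill step. As a sketch, the block below swaps in the `bfill` method and computes the change by hand with `shift`, so it does not depend on the `fill_method` parameter.

```python
import numpy as np
import pandas as pd

numbers = pd.Series([1, 2, 3, np.nan, 4, 5])

backfilled = numbers.bfill()     # 1, 2, 3, 4, 4, 5
previous = backfilled.shift(1)   # NaN, 1, 2, 3, 4, 4
backward_change = (backfilled - previous) / previous

print(backward_change.round(6).tolist())
```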

<br>

- `mean`:  
    - returns the average of the values in the `Series`  
    - average is the result of dividing the sum of values by the count of values

In [47]:
numbers.mean()

3.0

- `median`:
    - returns the middle number in a sorted `Series` of values
    - half of the `Series` values will be below the median
    - half of the values will be above the median

In [48]:
numbers.median()

3.0

- `std`:
    - returns the *standard deviation* of the `Series` values
    - *standard deviation*:  
    a measure of the variation in the data

In [49]:
numbers.std()

1.5811388300841898
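The result above can be checked by hand. By default, pandas computes the sample standard deviation, which divides by one less than the count of valid values; the sketch below spells that out with illustrative variable names.

```python
import math

import numpy as np
import pandas as pd

numbers = pd.Series([1, 2, 3, np.nan, 4, 5])

valid = numbers.dropna()  # std ignores missing values by default
squared_deviations = (valid - valid.mean()) ** 2
sample_std = math.sqrt(squared_deviations.sum() / (len(valid) - 1))

print(sample_std)
```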

- `max`:
    - retrieves the largest value from the `Series`
    - for a `Series` of strings, pandas compares the values alphabetically
    - the largest string is the one closest to the end of the alphabet

In [50]:
numbers.max()

5.0

In [52]:
animals = pd.Series(["koala", "aardvark", "zebra"])

animals.max()

'zebra'

- `min`:
    - retrieves the smallest value from the `Series`
    - the smallest string is the one closest to the start of the alphabet

In [51]:
numbers.min()

1.0

In [53]:
animals.min()

'aardvark'

- `describe`:  
    - returns a summary of the `Series` statistics
    - returns a `Series` of statistical evaluations:  
    count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%), and maximum

In [54]:
numbers.describe()

count    5.000000
mean     3.000000
std      1.581139
min      1.000000
25%      2.000000
50%      3.000000
75%      4.000000
max      5.000000
dtype: float64

- `sample`:  
    - selects a random assortment of values from the `Series`
    - it is possible for the order of values to differ between the new `Series` and the original  

In [55]:
numbers.sample(3)

0    1.0
2    3.0
5    5.0
dtype: float64

output explanation:  
- `sample` returns the selected values together with their original index labels  
- the returned `Series` keeps the `float64` data type of `numbers`,  
even when the random selection contains no `NaN` values
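Because `sample` is random, the rows it returns change from run to run. A minimal sketch of making the draw repeatable with the `random_state` parameter follows; the seed value 42 is arbitrary.

```python
import numpy as np
import pandas as pd

numbers = pd.Series([1, 2, 3, np.nan, 4, 5])

# The same seed produces the same selection on every call
first_draw = numbers.sample(3, random_state=42)
second_draw = numbers.sample(3, random_state=42)

print(first_draw.equals(second_draw))  # True
print(first_draw.dtype)                # float64, the dtype of the original Series
```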

<br>

- `unique`:
    - returns a NumPy `ndarray` of unique values from the `Series`

In [56]:
authors = pd.Series(["Hemingway", "Orwell", "Dostoevsky", "Fitzgerald", "Orwell"])

authors.unique()

array(['Hemingway', 'Orwell', 'Dostoevsky', 'Fitzgerald'], dtype=object)

- `nunique`:
    - returns the number of unique values in the `Series`
    - when the `Series` has no missing values, its return value equals the length of the array that the `unique` method returns

In [57]:
authors.nunique()

4
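One caveat worth sketching: the two methods treat missing values differently. `unique` keeps a missing value as its own entry, while `nunique` skips missing values unless we pass `dropna=False`. The data below is a made-up variant of `authors` with one gap.

```python
import numpy as np
import pandas as pd

authors_with_gap = pd.Series(["Hemingway", "Orwell", np.nan, "Orwell"])

unique_length = len(authors_with_gap.unique())           # NaN is included
default_count = authors_with_gap.nunique()               # NaN is skipped
full_count = authors_with_gap.nunique(dropna=False)      # NaN is counted

print(unique_length, default_count, full_count)  # 3 2 3
```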