# **Chapter 02: The Series object**  

We're going to learn about:
- Instantiating `Series` objects from lists, dictionaries, tuples, and more
- Setting a custom index on a `Series`
- Accessing attributes and invoking methods on a `Series`
- Performing mathematical operations on one or more `Series`
- Passing the `Series` to Python’s built-in functions

<br>  

`Series`:  
- one of pandas' core data structures  
- a one-dimensional labeled array for homogeneous data  
    - *array*:  
    an ordered collection of values comparable to a list in Python  
    - *homogeneous*:  
    the values are of the same data type  
    (ex. all integers or all Booleans, ...)  
    - *labeled*:  
        - pandas assigns each `Series` value a *label*  
        - *label*: an identifier we can use to locate the value
- pandas assigns each `Series` value an *order*  
    - *order*:  
    a position in line  
    it starts counting from 0  
    &rightarrow; the first `Series` value occupies position 0, the second value occupies position 1, and so on  
- the `Series` is a one-dimensional data structure  
&because; we need one reference point to access a value: either a label or a position  
- a `Series` combines and expands the best features of Python's native data structures  
    - like a list, it holds its values in a sequenced order  
    - like a dictionary, it assigns a label(key) to each value  
    &rightarrow; we gain the benefits of both of those objects plus more than 180 methods for data manipulation  
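
To make the list and dictionary comparison concrete, here is a minimal sketch. The `calories` data and its labels are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data: three values, each with a string label
calories = pd.Series([95, 105, 50], index=["apple", "banana", "cherry"])

# Like a list: the values keep their order and the object has a length
print(list(calories))
print(len(calories))

# Like a dictionary: the labels behave like keys
print("banana" in calories)
print(calories["banana"])

# Beyond both: Series methods such as sum() operate on all values
print(calories.sum())
```

The `in` check looks at the labels rather than the values, which mirrors how `in` works on a dictionary's keys.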

<br>

We're going to familiarize ourselves with the mechanics of a `Series` object,  
learn how to calculate the sum and average of `Series` values,  
apply mathematical operations to each value in a `Series`, ...  

<br>

## **2.1 Overview of a Series**  
---  
Let's create some `Series` objects.  
We'll begin by importing the pandas and NumPy packages.  
(We're going to use NumPy in section 2.1.4.)  
- use the `import` keyword  
- using the `as` keyword, assign an alias to each package
- the popular community aliases for libraries:  
    - pandas: `pd`
    - NumPy: `np`

In [1]:
import pandas as pd
import numpy as np

`pd` 
- the `pd` namespace holds the top-level exports of the `pandas` package,  
a bundle of more than 100 classes, functions, exceptions, constants, ...  
- think of `pd` as being the lobby to the library  
&rightarrow; an entrance room where we can access pandas' available features
- pandas' exports are available as attributes on `pd`  
we can access an attribute with dot syntax:  
`pd.attribute`  
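
As a quick way to explore the namespace, the sketch below uses dot syntax and Python's built-in `dir` function. The variable name `public_names` is hypothetical, and the exact number of exports depends on the installed pandas version.

```python
import pandas as pd

# Exports are reachable as attributes on the pd namespace
print(pd.Series)        # the Series class itself
print(pd.__version__)   # the installed pandas version string

# dir() lists attribute names; skip the private ones that start with "_"
public_names = [name for name in dir(pd) if not name.startswith("_")]
print(len(public_names))
print("Series" in public_names, "DataFrame" in public_names)
```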

<br>

### 2.1.1 Classes and instances  
*class*:  
- a blueprint for a Python object  
&rightarrow; The `pd.Series` class is a template
- to work with it, we need to create a concrete instance of it  
- instantiate an object from a class with parentheses:  

In [2]:
pd.Series()

Series([], dtype: object)

### 2.1.2 Populating the Series with values  
*constructor*:  
- a method that builds an object from a class  
ex) when we wrote `pd.Series()`, we used the `Series` constructor to create a new `Series` object  


starting state:  
- think of it as being an object's initial configuration (settings)  
- we can often set it by passing arguments to the constructor  
    - *argument*: an input we pass to a method  

<br>  

practice creating `Series` from manual data:  
- goal:  
to get comfortable with the look and feel of the data structure  
- the first argument:  
an iterable object whose values will populate the `Series`  
    - we can pass various inputs:  
        - lists  
        - dictionaries  
        - tuples  
        - NumPy `ndarray`s  
        - ...  
- we're going to create a `Series` object with data from a Python list  
    - declare a list of four strings  
    - pass the list to the `Series` constructor



In [3]:
ice_cream_flavors = [
    "Chocolate",
    "Vanilla",
    "Strawberry",
    "Rum Raisin",
]

pd.Series(ice_cream_flavors)

0     Chocolate
1       Vanilla
2    Strawberry
3    Rum Raisin
dtype: object

We've created a new `Series` with the four values from the `ice_cream_flavors` list.  
Notice that pandas preserves the order of the values.  

<br>

*parameter*:  
- a name given to an expected input to a function or method  
- Python matches every argument we pass to a constructor with a parameter  
- the `Series` constructor defines six parameters:  
    - `data`
    - `index`  
    - `dtype`  
    - `name`  
    - `copy`  
    - `fastpath`  
- we can use these parameters to set the object's initial state  
&rightarrow; parameters can be regarded as being configuration options for the object  
- if we don't pass an argument to a parameter, Python uses the parameter's *default argument*  
ex) if we don't pass a value for the `name` parameter, Python will use `None`, which is the default argument for the `name` parameter  
- a parameter with a default argument is inherently optional  
&because; it will always have some argument, either explicitly from its invocation or implicitly from its definition  
ex) we were able to instantiate a `Series` without arguments because all 6 parameters are optional  
- we can connect parameters and arguments explicitly with keyword arguments  
- keyword arguments are advantageous  
    because:
    - they provide context for what each constructor argument represents  
    - they permit us to pass arguments in any order  
    &therefore; the following examples create the same `Series` object:  
    `pd.Series(data=ice_cream_flavors, index=None)`  
    `pd.Series(index=None, data=ice_cream_flavors)`

<br>

We'll pass the argument to the `data` parameter explicitly by using a keyword argument:

In [4]:
pd.Series(data=ice_cream_flavors)

0     Chocolate
1       Vanilla
2    Strawberry
3    Rum Raisin
dtype: object

### 2.1.3 Customizing the Series index  
We mentioned that pandas assigns a position in line to each `Series` value.  


*index*:  
- the collection of incrementing integers on the left side of the `Series`  
- each number in the index signifies a value's order within the `Series`  
- starts counting from 0  
`"Chocolate"` occupies index 0, `"Vanilla"` occupies index 1, and so on  
- the term "index" describes both the collection of identifiers and an individual identifier  
&rightarrow; the following two expressions are both valid:  
    1. "The index of the Series consists of integers"
    2. "The value `Strawberry` is found at index 2"  
- the last index position will always be 1 less than the total number of values  
&because; the index starts counting from 0
&rightarrow; the `Series` has four ice cream flavors, so the index counts up to 3  
- we can assign each `Series` value an index label  
    - index labels can be of any immutable data type:  
    strings, tuples, datetimes, ...  
    - its flexibility makes a `Series` powerful  
    we can reference a value by its order or by its key/label  
    &rightarrow; in a sense, each value has two identifiers  
- `index` which is a parameter of the `Series` constructor:  
sets the index labels of the `Series`  
    - if we don't pass an argument to the parameter, pandas defaults to a numeric index starting from 0  
    - with this type of index, the label and the position identifier are one and the same
- the index permits duplicates,  
a detail that distinguishes a `Series` from a Python dictionary  
    - however, duplication is not recommended  
    - it is ideal to avoid duplicate index labels whenever possible  
    &because; a unique index allows the library to locate index labels more quickly
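
The sketch below shows what a duplicate label looks like in practice. The `orders` data is hypothetical, and the `.loc` accessor and `is_unique` attribute are used here only to illustrate the point.

```python
import pandas as pd

# Hypothetical data: the label "Monday" appears twice
orders = pd.Series(
    ["Chocolate", "Vanilla", "Strawberry"],
    index=["Monday", "Monday", "Friday"],
)

# The index reports whether all of its labels are distinct
print(orders.index.is_unique)

# A duplicated label returns every matching value as a Series
print(orders.loc["Monday"])

# A unique label returns a single value
print(orders.loc["Friday"])
```

Note how the same lookup syntax returns either one value or several depending on the label, which is one practical reason to keep labels unique.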

<br>  

practice:  
- we'll construct a `Series` with a custom index  
- we can pass objects of different data types to the `data` and `index` parameters  
but, they must have the same length  
- we're going to pass a list of strings (length: 4) to the `data` parameter  
- we're going to pass a tuple of strings (length: 4) to the `index` parameter

In [5]:
ice_cream_flavors = [
    "Chocolate",
    "Vanilla",
    "Strawberry",
    "Rum Raisin"
]

days_of_week = ("Monday", "Wednesday", "Friday", "Saturday")

# The following statements are equivalent
pd.Series(ice_cream_flavors, days_of_week)
pd.Series(data = ice_cream_flavors, index = days_of_week)

Monday        Chocolate
Wednesday       Vanilla
Friday       Strawberry
Saturday     Rum Raisin
dtype: object

practice explanation:
- pandas uses shared index positions to associate the values from the `ice_cream_flavors` list and the `days_of_week` tuple  
- pandas sees `"Rum Raisin"` and `"Saturday"` at index position 3 in their respective objects  
&rightarrow; pandas ties them together in the `Series`  
- even though the index consists of string labels,  
pandas still assigns each `Series` value an index position  
&rightarrow; we can access the value `"Vanilla"` either by the index label `"Wednesday"` or by index position 1  
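
As a minimal sketch of those two access paths, the example below uses the `.loc` accessor for labels and the `.iloc` accessor for positions. Neither accessor has been introduced in these notes yet, and the variable name `flavors_by_day` is hypothetical.

```python
import pandas as pd

ice_cream_flavors = ["Chocolate", "Vanilla", "Strawberry", "Rum Raisin"]
days_of_week = ("Monday", "Wednesday", "Friday", "Saturday")

flavors_by_day = pd.Series(data=ice_cream_flavors, index=days_of_week)

# Access by index label
print(flavors_by_day.loc["Wednesday"])

# Access by index position (counting from 0)
print(flavors_by_day.iloc[1])
```

Both lines should print `Vanilla`.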

<br>

`dtype` statement at the bottom of the output:  
- reflects the data type of the values in the `Series`  
- for most data types, pandas will display a predictable type  
(such as `bool`, `float`, `int`, ...)  
- for strings and more-complex objects, pandas will display `object`  
- pandas does its best to infer an appropriate data type for the `Series` from the `data` parameter's values  
- we can force coercion to a different type via the constructor's `dtype` parameter  

<br>

practice:  
- we'll create `Series` objects from lists of Boolean, integer, and floating-point values  
- we'll create a `Series` object from a list of integer values and specify the `dtype` parameter to force the data type to be `float64`


In [6]:
bunch_of_bools = [True, False, False]
pd.Series(bunch_of_bools)

0     True
1    False
2    False
dtype: bool

In [7]:
stock_prices = [985.32, 950.44]
time_of_day = ["Open", "Close"]
pd.Series(data = stock_prices, index = time_of_day)

Open     985.32
Close    950.44
dtype: float64

In [8]:
lucky_numbers = [4, 8, 15, 16, 23, 42]
pd.Series(lucky_numbers)

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64

In [9]:
lucky_numbers = [4, 8, 15, 16, 23, 42]
pd.Series(lucky_numbers, dtype = "float")

0     4.0
1     8.0
2    15.0
3    16.0
4    23.0
5    42.0
dtype: float64

practice explanation:  
- the `float64` data type indicates that each floating-point value occupies 64 bits (8 bytes) of your computer's memory (RAM)  
- the `int64` data type indicates that each integer value occupies 64 bits (8 bytes) of your computer's memory (RAM)  
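- the `Series` constructor expects the `dtype` parameter to be third in line, after `data` and `index`  
&therefore; we cannot pass it positionally right after `lucky_numbers`, because pandas would treat it as the `index` argument  
&therefore; we use a keyword argument to target `dtype` directly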
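
To illustrate the parameter order, the sketch below compares the keyword form with a purely positional form that fills the `index` slot with `None`. The variable names `by_keyword` and `by_position` are hypothetical, and the positional form assumes a pandas version whose constructor still accepts these parameters positionally in the order listed in these notes.

```python
import pandas as pd

lucky_numbers = [4, 8, 15, 16, 23, 42]

# Keyword form: names the target parameter, so its position doesn't matter
by_keyword = pd.Series(lucky_numbers, dtype="float")

# Positional form: `None` must occupy the second slot (index)
# so that "float" lands in the third slot (dtype)
by_position = pd.Series(lucky_numbers, None, "float")

print(by_keyword.dtype)
print(by_keyword.equals(by_position))
```

The keyword form is easier to read because it doesn't need a placeholder for `index`.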

