# Series

A **Series** is simply a single column of data.

In [4]:
import pandas as pd

---

## Create a Series Object from a List
- A pandas **Series** is a one-dimensional labelled array.
- A Series combines the best features of a list and a dictionary.
- A Series maintains a single collection of ordered values (i.e. a single column of data).
- We can assign each value an identifier (an index label), which does not have to be unique.

In [7]:
# dtype stands for "data type" and shows the type of the values stored in the Series,
# e.g. int64, float64, object, bool.

ice_cream = ["Chocolate", "Vanilla", "Strawberry", "Rocky Road"]
pd.Series(ice_cream)

0     Chocolate
1       Vanilla
2    Strawberry
3    Rocky Road
dtype: object

In [8]:
pd.Series(ice_cream[1])

0    Vanilla
dtype: object

In [9]:
lottery_numbers = [4, 8, 15, 16, 23, 42]
pd.Series(lottery_numbers)

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64

In [10]:
registration = [True, False, True, False, False, True]
pd.Series(registration)

0     True
1    False
2     True
3    False
4    False
5     True
dtype: bool

---

## Create a Series Object from a Dictionary

In [13]:
sushi = {
    "Salmon": "Orange",
    "Tuna": "White",
    "Eel": "Brown",
    "Cod": "White",
}
pd.Series(sushi)

Salmon    Orange
Tuna       White
Eel        Brown
Cod        White
dtype: object

In [14]:
# pd.Series(sushi[1])  # KeyError: a dictionary is looked up by key, not by position
pd.Series(sushi["Salmon"])

0    Orange
dtype: object

---

## Intro to Series Methods
- The syntax to invoke a method on any object is `object.method()`.
- The `sum` method adds together the **Series'** values.
- The `product` method multiplies the **Series'** values.
- The `mean` method finds the average of the **Series'** values.
- The `std` method finds the standard deviation of the **Series'** values.

In [17]:
prices = pd.Series([2.99, 4.50, 6.34, 2.75, 6.23])

In [18]:
prices.sum()

22.81

In [19]:
prices.product()

1461.4827727500003

In [20]:
prices.mean()

4.561999999999999

In [21]:
prices.std()

1.71040053788579

---

## Intro to Attributes
- An **attribute** is a piece of data that lives on an object.
- An **attribute** is a fact, a detail, a characteristic of the object.
- Access an attribute with `object.attribute` syntax.
- The `size` attribute returns a count of the number of values in the **Series**.
- The `is_unique` attribute returns True if the **Series** has no duplicate values.
- The `values` and `index` attributes return the underlying objects that hold the **Series'** values and index labels.

In [24]:
adjectives = pd.Series(["Intelligent", "God-Like", "Powerful", "Ambitious", "Transcendent"])
adjectives

0     Intelligent
1        God-Like
2        Powerful
3       Ambitious
4    Transcendent
dtype: object

In [25]:
adjectives.size

5

In [26]:
adjectives.is_unique

True

In [27]:
adjectives.values

array(['Intelligent', 'God-Like', 'Powerful', 'Ambitious', 'Transcendent'],
      dtype=object)

In [28]:
adjectives.index

RangeIndex(start=0, stop=5, step=1)

In [29]:
type(adjectives.values)

numpy.ndarray

In [30]:
type(adjectives.index)

pandas.core.indexes.range.RangeIndex

---

## Methods vs Attributes
A method is a function (or command) that does something, e.g. summing a list of numbers.

An attribute is simply a piece of data that belongs to the object; a characteristic detail, e.g. its size or age.

---

## Parameters and Arguments
- A **parameter** is the name for an expected input to a function/method/class instantiation.
- An **argument** is the concrete value we provide for a parameter during invocation.
- We can pass arguments either sequentially (based on parameter order) or with explicit parameter names written out.
- The first two parameters for the **Series** constructor are `data` and `index`, which represent the values and the index labels.

#### Example
def add_sum(parameter_1, parameter_2):
    return parameter_1 + parameter_2

add_sum(argument_1, argument_2)

In [35]:
fruits = ["Apple", "Orange", "Plum", "Pineapple", "Banana", "Strawberry", "Grape"]
weekdays = ["Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun"]

In [36]:
# NOTE: Both arguments MUST be the same length
pd.Series(fruits, weekdays)

Mon          Apple
Tue         Orange
Wed           Plum
Thur     Pineapple
Fri         Banana
Sat     Strawberry
Sun          Grape
dtype: object

In [37]:
pd.Series(index=weekdays, data=fruits)

Mon          Apple
Tue         Orange
Wed           Plum
Thur     Pineapple
Fri         Banana
Sat     Strawberry
Sun          Grape
dtype: object

In [38]:
pd.Series()

Series([], dtype: object)

In [39]:
pd.Series(dtype=int)

Series([], dtype: int64)

---

## Import Series with the pd.read_csv Function
- A **CSV** is a plain text file that uses line breaks to separate rows and commas to separate row values.
- Pandas ships with many different `read_` functions for different types of files.
- The `read_csv` function accepts many different parameters. The first one specifies the file name/path.
- The `read_csv` function will import the dataset as a **DataFrame**, a 2-dimensional table.
- The `usecols` parameter accepts a list of the column(s) to import.
- The `squeeze` method converts a single-column **DataFrame** to a **Series**.

In [42]:
pokemon = pd.read_csv("pokemon.csv")
pokemon.head()

         Name           Type
0   Bulbasaur  Grass, Poison
1     Ivysaur  Grass, Poison
2    Venusaur  Grass, Poison
3  Charmander           Fire
4  Charmeleon           Fire

In [43]:
pd.read_csv("pokemon.csv", usecols=["Name"]).head().squeeze()

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [44]:
google = pd.read_csv("google_stock_price.csv", usecols=["Price"]).squeeze()
google

0         2.490664
1         2.515820
2         2.758411
3         2.770615
4         2.614201
           ...    
4788    132.080002
4789    132.998001
4790    135.570007
4791    137.050003
4792    138.429993
Name: Price, Length: 4793, dtype: float64

---

## Passing Series to Python's Built-In Functions
- The `len` function returns the length of the **Series**.
- The `type` function returns the type of an object.
- The `list` function converts the **Series** to a list.
- The `dict` function converts the **Series** to a dictionary.
- The `sorted` function converts the **Series** to a sorted list.
- The `max` function returns the largest value in the **Series**.
- The `min` function returns the smallest value in the **Series**.

In [164]:
len(pokemon)  # number of rows in the DataFrame

1010

In [170]:
type(pokemon)  # DataFrame
type(google)  # Series

pandas.core.series.Series

In [192]:
list(google)[:5]

[2.490664, 2.51582, 2.758411, 2.770615, 2.614201]

In [226]:
sorted(pokemon["Name"])[:5]

['Abomasnow', 'Abra', 'Absol', 'Accelgor', 'Aegislash']

In [265]:
dict(pokemon)  # DataFrame: column names become keys, each column (a Series) a value

{'Name': 0          Bulbasaur
 1            Ivysaur
 2           Venusaur
 3         Charmander
 4         Charmeleon
             ...     
 1005    Iron Valiant
 1006        Koraidon
 1007        Miraidon
 1008    Walking Wake
 1009     Iron Leaves
 Name: Name, Length: 1010, dtype: object,
 'Type': 0          Grass, Poison
 1          Grass, Poison
 2          Grass, Poison
 3                   Fire
 4                   Fire
               ...       
 1005     Fairy, Fighting
 1006    Fighting, Dragon
 1007    Electric, Dragon
 1008       Water, Dragon
 1009      Grass, Psychic
 Name: Type, Length: 1010, dtype: object}

In [231]:
max(google)

151.863495

In [233]:
min(google)

2.47049
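Most of the cells above pass the `pokemon` DataFrame rather than a Series. A minimal sketch applying the same built-ins to a small Series (`scores` and its names are hypothetical, made up for illustration):

```python
import pandas as pd

# Hypothetical Series with string index labels
scores = pd.Series([88, 92, 75], index=["Ana", "Ben", "Cy"])

n = len(scores)            # number of values: 3
kind = type(scores)        # pandas.core.series.Series
as_list = list(scores)     # values only, in their current order
as_dict = dict(scores)     # index labels become the keys
ordered = sorted(scores)   # new list, sorted ascending
largest = max(scores)      # 92
smallest = min(scores)     # 75
```

Note that `dict` keeps the index labels, while `list` and `sorted` keep only the values.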

---

## Check for Inclusion with Python's `in` Keyword
- The `in` keyword checks if a value exists within an object.
- The `in` keyword will look for a value in the **Series's** index.
- Use the `index` and `values` attributes to access "nested" objects within the **Series**.
- Combine the `in` keyword with `values` to search within the **Series's** values.

In [316]:
"car" in "racecar"  # True
4 in [1, 2, 3]  # False

# NOTE: `in` checks a Series' index (and a DataFrame's column labels), not the values
# Use the `values` attribute to search the values instead
"Bulbasaur" in pokemon  # False
"Bulbasaur" in pokemon.values

True

In [309]:
5 in pokemon.index  # True
1050 in pokemon.index

False

---

## The sort_values Method
- The `sort_values` method sorts a **Series'** values in order.
- By default, pandas applies an ascending sort (smallest to largest).
- Customize the sort order with the `ascending` parameter.
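
A minimal sketch, reusing the `prices` values from earlier. `sort_values` returns a new **Series**, and each index label stays attached to its value:

```python
import pandas as pd

prices = pd.Series([2.99, 4.50, 6.34, 2.75, 6.23])

ascending = prices.sort_values()                   # smallest to largest (default)
descending = prices.sort_values(ascending=False)   # largest to smallest

# The original Series is left unchanged; the sorted result keeps labels 3, 0, 1, 4, 2
```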

---

## The sort_index Method
- The `sort_index` method sorts a **Series** by its index.
- The `sort_index` method also accepts an `ascending` parameter to set sort order.
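
A minimal sketch with a hypothetical `stock` Series whose string labels start out of order:

```python
import pandas as pd

# Hypothetical Series with out-of-order index labels
stock = pd.Series([5, 2, 9], index=["Plum", "Apple", "Mango"])

by_label = stock.sort_index()                       # Apple, Mango, Plum
by_label_desc = stock.sort_index(ascending=False)   # Plum, Mango, Apple
```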

---

## Extract Series Value by Index Position
- Use the `iloc` accessor to extract a **Series** value by its index position.
- `iloc` is short for "index location".
- Python's list slicing syntaxes (slices, slices from start, slices to end, etc.) are supported with **Series** objects.
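
A minimal sketch using the `ice_cream` flavors from earlier, showing single positions, a list of positions, and the slice forms:

```python
import pandas as pd

ice_cream = pd.Series(["Chocolate", "Vanilla", "Strawberry", "Rocky Road"])

first = ice_cream.iloc[0]         # single value by position
some = ice_cream.iloc[[1, 3]]     # list of positions -> new Series
middle = ice_cream.iloc[1:3]      # slice: positions 1 and 2 (end excluded)
from_start = ice_cream.iloc[:2]   # first two values
to_end = ice_cream.iloc[2:]       # position 2 onward
last = ice_cream.iloc[-1]         # negative positions count from the end
```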

---

## Extract Series Value by Index Label
- Use the `loc` accessor to extract a **Series** value by its index label.
- Pass a list to extract multiple values by index label.
- If any index label in the list does not exist, pandas raises a `KeyError`.
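
A minimal sketch built from the `fruits`/`weekdays` Series above. Unlike position slices, label slices with `loc` include the end label. The label `"Funday"` is a made-up missing label used to trigger the error:

```python
import pandas as pd

fruits = ["Apple", "Orange", "Plum", "Pineapple", "Banana", "Strawberry", "Grape"]
weekdays = ["Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun"]
schedule = pd.Series(fruits, weekdays)

monday = schedule.loc["Mon"]             # single value by label
weekend = schedule.loc[["Sat", "Sun"]]   # list of labels -> new Series
midweek = schedule.loc["Tue":"Thur"]     # label slices INCLUDE the end label

missing_raised = False
try:
    schedule.loc[["Mon", "Funday"]]      # hypothetical label that does not exist
except KeyError:
    missing_raised = True
```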

---

## The get Method on a Series
- The `get` method extracts a **Series** value by index label. It is an alternative option to square brackets.
- The `get` method's second argument sets the fallback value to return if the label does not exist. Without it, `get` returns `None` instead of raising an error.
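
A minimal sketch using the `sushi` data from earlier; `"Octopus"` is a hypothetical missing label:

```python
import pandas as pd

sushi = pd.Series({"Salmon": "Orange", "Tuna": "White", "Eel": "Brown", "Cod": "White"})

found = sushi.get("Eel")                     # "Brown"
missing = sushi.get("Octopus")               # None, no error raised
fallback = sushi.get("Octopus", "Unknown")   # second argument is the fallback
```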

---

## Overwrite a Series Value
- Use the `loc/iloc` accessor to target an index label/position, then use an equal sign to provide a new value.
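
A minimal sketch with a small hypothetical `schedule` Series, overwriting one value by label and one by position:

```python
import pandas as pd

schedule = pd.Series(["Apple", "Orange", "Plum"], index=["Mon", "Tue", "Wed"])

schedule.loc["Tue"] = "Mango"   # overwrite by index label
schedule.iloc[0] = "Kiwi"       # overwrite by index position
```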

---

## The copy Method
- A **copy** is a duplicate/replica of an object.
- Changes to a copy do not modify the original object.
- A **view** is a different way of looking at the *same* data.
- Changes to a view *do* modify the original object.
- The `copy` method creates a copy of a pandas object.
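
A minimal sketch of the copy half only. Whether a given operation returns a view has varied across pandas versions (recent versions are moving to Copy-on-Write), so this example does not rely on view behavior:

```python
import pandas as pd

original = pd.Series([1, 2, 3])
duplicate = original.copy()   # independent copy of the data

duplicate.iloc[0] = 100       # modify only the copy; original stays [1, 2, 3]
```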

---

## Math Methods on Series Objects
- The `count` method returns the number of values in the **Series**. It excludes missing values; the `size` attribute includes missing values.
- The `sum` method adds together the **Series's** values.
- The `product` method multiplies together the **Series's** values.
- The `mean` method calculates the average of the **Series's** values.
- The `std` method calculates the standard deviation of the **Series's** values.
- The `max` method returns the largest value in the **Series**.
- The `min` method returns the smallest value in the **Series**.
- The `median` method returns the median of the **Series** (the value in the middle).
- The `mode` method returns the mode of the **Series** (the most frequent value). It returns a new **Series**, because several values can tie.
- The `describe` method returns a summary with various mathematical calculations.
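
A minimal sketch with hypothetical data containing one missing value, to show where `count` and `size` differ:

```python
import pandas as pd

# Hypothetical data with one missing value (None is stored as NaN)
nums = pd.Series([3, 1, 4, 1, 5, None])

counted = nums.count()      # 5: the missing value is excluded
total_slots = nums.size     # 6: the missing value is included
total = nums.sum()          # 14.0: NaN is skipped by default
middle = nums.median()      # 3.0: middle of the sorted values 1, 1, 3, 4, 5
most_common = nums.mode()   # a Series holding 1.0 (can hold several values on ties)
summary = nums.describe()   # count, mean, std, min, quartiles, max
```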

---

## Broadcasting
- **Broadcasting** describes the process of applying an arithmetic operation to every value in an array (e.g., a **Series**).
- We can combine mathematical operators with a **Series** to apply the mathematical operation to every value.
- There are also methods to accomplish the same results (`add`, `sub`, `mul`, `div`, etc.)
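
A minimal sketch reusing the `prices` values, comparing an operator with its method equivalent:

```python
import pandas as pd

prices = pd.Series([2.99, 4.50, 6.34, 2.75, 6.23])

with_fee = prices + 1    # operator applied to every value
same = prices.add(1)     # method equivalent of +
halved = prices / 2      # same result as prices.div(2)
```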

---

## The value_counts Method
- The `value_counts` method returns the number of times each unique value occurs in the **Series**.
- Setting the `normalize` parameter to `True` returns the relative frequencies (proportions between 0 and 1) of the values instead of the counts.
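
A minimal sketch using the sushi colors from earlier as a hypothetical Series of values:

```python
import pandas as pd

sushi_colors = pd.Series(["Orange", "White", "Brown", "White"])

counts = sushi_colors.value_counts()                 # White -> 2, Orange -> 1, Brown -> 1
shares = sushi_colors.value_counts(normalize=True)   # White -> 0.5, the others -> 0.25
```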

---

## The apply Method
- The `apply` method accepts a function. It invokes that function on every **Series** value and returns the results as a new **Series**.
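
A minimal sketch; `shout` is a hypothetical helper defined only for this example:

```python
import pandas as pd

ice_cream = pd.Series(["Chocolate", "Vanilla", "Strawberry", "Rocky Road"])

def shout(flavor):
    """Hypothetical helper: upper-case a flavor and add an exclamation mark."""
    return flavor.upper() + "!"

loud = ice_cream.apply(shout)    # calls shout() once per value
lengths = ice_cream.apply(len)   # built-in functions work too
```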

---

## The map Method
- The `map` method "maps" or connects each **Series** value to another value.
- We can pass the method a dictionary or a **Series**. Both types connect keys to values.
- The `map` method uses our argument to connect or bridge together the values.
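
A minimal sketch with a hypothetical `orders` Series, mapping through the `sushi` dictionary and through a Series built from it. A value with no matching key (`"Octopus"` here) becomes `NaN`:

```python
import pandas as pd

orders = pd.Series(["Salmon", "Tuna", "Cod", "Octopus"])

# Dictionary: its keys are matched against the Series values
colors = {"Salmon": "Orange", "Tuna": "White", "Eel": "Brown", "Cod": "White"}
by_dict = orders.map(colors)   # "Octopus" has no key -> NaN

# Series: its index plays the role of the dictionary keys
color_series = pd.Series(colors)
by_series = orders.map(color_series)
```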