# Series

A Series is simply a single column of data.

In [1]:
import pandas as pd

---

## Create a Series Object from a List
- A pandas **Series** is a one-dimensional labelled array.
- A Series combines the best features of a list and a dictionary.
- A Series maintains a single collection of ordered values (i.e. a single column of data).
- We can assign each value an identifier, which does not have to *be* unique.

In [2]:
# dtype stands for data type, and will show what kind of data type the series is.
# i.e. int, object, bool, etc.

ice_cream = ["Chocolate", "Vanilla", "Strawberry", "Rocky Road"]
pd.Series(ice_cream)

0     Chocolate
1       Vanilla
2    Strawberry
3    Rocky Road
dtype: object

In [3]:
pd.Series(ice_cream[1])

0    Vanilla
dtype: object

In [4]:
lottery_numbers = [4, 8, 15, 16, 23, 42]
pd.Series(lottery_numbers)

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64

In [5]:
registration = [True, False, True, False, False, True]
pd.Series(registration)

0     True
1    False
2     True
3    False
4    False
5     True
dtype: bool
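
The bullet above about identifiers not having to be unique can be illustrated with a small sketch. The `scoops` data and its labels are made up for this example; the point is only that a repeated label returns every matching value.

```python
import pandas as pd

# Hypothetical data: two values share the label "cone".
scoops = pd.Series(
    ["Chocolate", "Vanilla", "Strawberry"],
    index=["cone", "cup", "cone"],
)

# A unique label returns a single value.
cup_flavor = scoops.loc["cup"]

# A repeated label returns every matching value as a smaller Series.
cone_flavors = scoops.loc["cone"]

print(cup_flavor)
print(cone_flavors.tolist())
```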

---

## Create a Series Object from a Dictionary

In [6]:
sushi = {
    "Salmon": "Orange",
    "Tuna": "White",
    "Eel": "Brown",
    "Cod": "White",
}
pd.Series(sushi)

Salmon    Orange
Tuna       White
Eel        Brown
Cod        White
dtype: object

In [7]:
# pd.Series(sushi[1])  # Error
pd.Series(sushi["Salmon"])

0    Orange
dtype: object

---

## Intro to Series Methods
- The syntax to invoke a method on any object is `object.method()`.
- The `sum` method adds together the **Series'** values.
- The `product` method multiplies the **Series'** values.
- The `mean` method finds the average of the **Series'** values.
- The `std` method finds the standard deviation of the **Series'** values.

In [8]:
prices = pd.Series([2.99, 4.50, 6.34, 2.75, 6.23])

In [9]:
prices.sum()

22.81

In [10]:
prices.product()

1461.4827727500003

In [11]:
prices.mean()

4.561999999999999

In [12]:
prices.std()

1.71040053788579
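
As a rough cross-check of the methods above, the sketch below recomputes the mean by hand and compares `std` with Python's `statistics.stdev`. It relies on `std` defaulting to the sample standard deviation (`ddof=1`); the helper names `manual_mean` and `sample_std` are only for illustration.

```python
import math
import statistics

import pandas as pd

prices = pd.Series([2.99, 4.50, 6.34, 2.75, 6.23])

# mean() is the sum divided by the number of values.
manual_mean = prices.sum() / prices.size

# std() defaults to the sample standard deviation (ddof=1), which is
# the same quantity statistics.stdev computes.
sample_std = statistics.stdev(prices.tolist())

print(math.isclose(prices.mean(), manual_mean))
print(math.isclose(prices.std(), sample_std))
```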

---

## Intro to Attributes
- An **attribute** is a piece of data that lives on an object.
- An **attribute** is a fact, a detail, a characteristic of the object.
- Access an attribute with `object.attribute` syntax.
- The `size` attribute returns a count of the number of values in the **Series**.
- The `is_unique` attribute returns True if the **Series** has no duplicate values.
- The `values` and `index` attributes return the underlying objects that hold the **Series'** values and index labels.

In [13]:
adjectives = pd.Series(["Intelligent", "God-Like", "Powerful", "Ambitious", "Transcendent"])
adjectives

0     Intelligent
1        God-Like
2        Powerful
3       Ambitious
4    Transcendent
dtype: object

In [14]:
adjectives.size

5

In [15]:
adjectives.is_unique

True

In [16]:
adjectives.values

array(['Intelligent', 'God-Like', 'Powerful', 'Ambitious', 'Transcendent'],
      dtype=object)

In [17]:
adjectives.index

RangeIndex(start=0, stop=5, step=1)

In [18]:
type(adjectives.values)

numpy.ndarray

In [19]:
type(adjectives.index)

pandas.core.indexes.range.RangeIndex

---

## Methods vs Attributes:
A method is a function (or command) that does something, e.g. summing up a list of numbers.

An attribute is simply a piece of data that belongs to the object: a characteristic or detail, e.g. its size or age.

---

## Parameters and Arguments
- A **parameter** is the name for an expected input to a function/method/class instantiation.
- An **argument** is the concrete value we provide for a parameter during invocation.
- We can pass arguments either sequentially (based on parameter order) or with explicit parameter names written out.
- The first two parameters for the **Series** constructor are `data` and `index`, which represent the values and the index labels.

#### Example
    def add_sum(parameter_1, parameter_2):
        return parameter_1 + parameter_2

    add_sum(argument_1, argument_2)

In [20]:
fruits = ["Apple", "Orange", "Plum", "Pineapple", "Banana", "Strawberry", "Grape"]
weekdays = ["Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun"]

In [21]:
# NOTE: Both arguments MUST be the same length
pd.Series(fruits, weekdays)

Mon          Apple
Tue         Orange
Wed           Plum
Thur     Pineapple
Fri         Banana
Sat     Strawberry
Sun          Grape
dtype: object

In [22]:
pd.Series(index=weekdays, data=fruits)

Mon          Apple
Tue         Orange
Wed           Plum
Thur     Pineapple
Fri         Banana
Sat     Strawberry
Sun          Grape
dtype: object

In [23]:
pd.Series()

Series([], dtype: object)

In [24]:
pd.Series(dtype=int)

Series([], dtype: int64)
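
The note about both arguments needing the same length can be checked with a short sketch. Here the `weekdays` list is deliberately one label short, so the constructor is expected to raise a `ValueError`; the variable `mismatch_message` is just a name chosen for this example.

```python
import pandas as pd

fruits = ["Apple", "Orange", "Plum"]
weekdays = ["Mon", "Tue"]  # deliberately one label short

mismatch_message = None
try:
    pd.Series(data=fruits, index=weekdays)
except ValueError as error:
    # pandas refuses to pair 3 values with 2 labels.
    mismatch_message = str(error)

print(mismatch_message)
```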

---

## Import Series with the pd.read_csv Function
- A **CSV** is a plain text file that uses line breaks to separate rows and commas to separate row values.
- Pandas ships with many different `read_` functions for different types of files.
- The `read_csv` function accepts many different parameters. The first one specifies the file name/path.
- The `read_csv` function will import the dataset as a **DataFrame**, a 2-dimensional table.
- The `usecols` parameter accepts a list of the column(s) to import.
- The `squeeze` method converts a single-column **DataFrame** to a **Series**.

In [25]:
pokemon = pd.read_csv("data_files/pokemon.csv")
pokemon.head()

         Name           Type
0   Bulbasaur  Grass, Poison
1     Ivysaur  Grass, Poison
2    Venusaur  Grass, Poison
3  Charmander           Fire
4  Charmeleon           Fire


In [26]:
pd.read_csv("data_files/pokemon.csv", usecols=["Name"]).head().squeeze()

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [27]:
google = pd.read_csv("data_files/google_stock_price.csv", usecols=["Price"]).squeeze()
google

0         2.490664
1         2.515820
2         2.758411
3         2.770615
4         2.614201
           ...    
4788    132.080002
4789    132.998001
4790    135.570007
4791    137.050003
4792    138.429993
Name: Price, Length: 4793, dtype: float64
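
To try the same steps without the course's data files, the sketch below reads a small CSV from memory. The `csv_text` contents are made up, and `io.StringIO` stands in for a file path. It passes `"columns"` to `squeeze` so that only the column axis is collapsed; a bare `squeeze()` on a one-row, one-column DataFrame would return a scalar instead of a Series.

```python
from io import StringIO

import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
csv_text = "Name,Type\nBulbasaur,Grass\nCharmander,Fire\nSquirtle,Water\n"

# read_csv returns a DataFrame, even when only one column is kept.
frame = pd.read_csv(StringIO(csv_text), usecols=["Name"])

# squeeze("columns") collapses the single column into a Series.
names = frame.squeeze("columns")

print(type(frame).__name__)
print(type(names).__name__)
```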

---

## Passing Series to Python's Built-In Functions
- The `len` function returns the length of the **Series**.
- The `type` function returns the type of an object.
- The `list` function converts the **Series** to a list.
- The `dict` function converts the **Series** to a dictionary.
- The `sorted` function converts the **Series** to a sorted list.
- The `max` function returns the largest value in the **Series**.
- The `min` function returns the smallest value in the **Series**.

In [28]:
len(pokemon)

1010

In [29]:
type(pokemon)  # DataFrame
type(google)  # Series

pandas.core.series.Series

In [30]:
list(google)[:5]

[2.490664, 2.51582, 2.758411, 2.770615, 2.614201]

In [31]:
sorted(pokemon["Name"])[:5]

['Abomasnow', 'Abra', 'Absol', 'Accelgor', 'Aegislash']

In [32]:
dict(pokemon)

{'Name': 0          Bulbasaur
 1            Ivysaur
 2           Venusaur
 3         Charmander
 4         Charmeleon
             ...     
 1005    Iron Valiant
 1006        Koraidon
 1007        Miraidon
 1008    Walking Wake
 1009     Iron Leaves
 Name: Name, Length: 1010, dtype: object,
 'Type': 0          Grass, Poison
 1          Grass, Poison
 2          Grass, Poison
 3                   Fire
 4                   Fire
               ...       
 1005     Fairy, Fighting
 1006    Fighting, Dragon
 1007    Electric, Dragon
 1008       Water, Dragon
 1009      Grass, Psychic
 Name: Type, Length: 1010, dtype: object}

In [33]:
max(google)

151.863495

In [34]:
min(google)

2.47049
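
The cells above mix a DataFrame (`pokemon`) and a Series (`google`), so here is a minimal sketch that applies each built-in to one small Series. The `scores` data is invented for the example.

```python
import pandas as pd

scores = pd.Series([30, 10, 20], index=["c", "a", "b"])

size = len(scores)            # number of values
as_list = list(scores)        # values in their stored order
as_sorted = sorted(scores)    # values as a new sorted list
as_dict = dict(scores)        # index labels become dictionary keys
largest, smallest = max(scores), min(scores)

print(size, as_list, as_sorted, largest, smallest)
```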

---

## Check for Inclusion with Python's `in` Keyword
- The `in` keyword checks if a value exists within an object.
- The `in` keyword will look for a value in the **Series's** index.
- Use the `index` and `values` attributes to access "nested" objects within the **Series**.
- Combine the `in` keyword with `values` to search within the **Series's** values.

In [35]:
"car" in "racecar"  # True
4 in [1, 2, 3]  # False

# NOTE: `in` checks the index labels of a Series (or the column labels of a
# DataFrame such as `pokemon`), not the values.
# Use the `values` attribute to search the values instead.
"Bulbasaur" in pokemon  # False
"Bulbasaur" in pokemon.values

True

In [36]:
5 in pokemon.index  # True
1050 in pokemon.index

False
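
A small sketch of the difference between searching the index and searching the values, using a made-up `stock` Series with string labels so the two cannot be confused:

```python
import pandas as pd

stock = pd.Series([10, 20, 30], index=["a", "b", "c"])

label_found = "a" in stock            # checks the index labels
value_as_label = 10 in stock          # 10 is a value, not a label
value_found = 10 in stock.values      # checks the values
explicit_label = "a" in stock.index   # same check as the first line, spelled out

print(label_found, value_as_label, value_found, explicit_label)
```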

---

## The sort_values Method
- The `sort_values` method sorts a **Series'** values in order.
- By default, pandas applies an ascending sort (smallest to largest).
- Customize the sort order with the `ascending` parameter.
- Numbers are sorted by numerical value [0-9].
- Letters are sorted by alphabetical value [A-Z].

NOTE: These do NOT change the original series. We are getting a new series every time.

In [37]:
google.sort_values().head()

10    2.470490
0     2.490664
13    2.509095
11    2.514326
12    2.515820
Name: Price, dtype: float64

In [38]:
google.sort_values(ascending=False).head()

4395    151.863495
4345    151.000000
4346    150.141754
4336    150.000000
4341    150.000000
Name: Price, dtype: float64

In [40]:
pokemon["Name"].sort_values().head()

459    Abomasnow
62          Abra
358        Absol
616     Accelgor
680    Aegislash
Name: Name, dtype: object

In [41]:
pokemon["Name"].sort_values(ascending=False).head()

717     Zygarde
633    Zweilous
40        Zubat
569       Zorua
570     Zoroark
Name: Name, dtype: object
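
The note that sorting returns a new Series can be verified with a short sketch. The `prices` values are invented; notice that the original order and the original index labels travel with each value.

```python
import pandas as pd

prices = pd.Series([3.5, 1.2, 2.8])

ascending = prices.sort_values()
descending = prices.sort_values(ascending=False)

# Each value keeps the index label it had before sorting.
print(ascending.index.tolist())

# The original Series is untouched.
print(prices.tolist())
```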

---

## The sort_index Method
- The `index_col` parameter of `read_csv` sets which column becomes the index. Pass the column's name (in this case "Name") and that column's values become the index labels.
- The `sort_index` method sorts a **Series** by its index.
- The `sort_index` method also accepts an `ascending` parameter to set sort order.

In [42]:
pokemon = pd.read_csv("data_files/pokemon.csv", index_col="Name")
pokemon.squeeze()  # Still need to use squeeze to turn it into a series

Name
Bulbasaur          Grass, Poison
Ivysaur            Grass, Poison
Venusaur           Grass, Poison
Charmander                  Fire
Charmeleon                  Fire
                      ...       
Iron Valiant     Fairy, Fighting
Koraidon        Fighting, Dragon
Miraidon        Electric, Dragon
Walking Wake       Water, Dragon
Iron Leaves       Grass, Psychic
Name: Type, Length: 1010, dtype: object

In [43]:
pokemon.sort_index().head()

                   Type
Name
Abomasnow    Grass, Ice
Abra            Psychic
Absol              Dark
Accelgor            Bug
Aegislash  Steel, Ghost


In [44]:
pokemon.sort_index(ascending=False).head()

                    Type
Name
Zygarde   Dragon, Ground
Zweilous    Dark, Dragon
Zubat     Poison, Flying
Zorua               Dark
Zoroark             Dark
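
The cells above call `sort_index` on a DataFrame; the method works the same way on a Series. A minimal sketch with a small, made-up `types` Series:

```python
import pandas as pd

types = pd.Series(
    ["Fire", "Grass, Poison", "Water"],
    index=["Charmander", "Bulbasaur", "Squirtle"],
)

# Sort by index label; each value moves together with its label.
by_label = types.sort_index()
by_label_desc = types.sort_index(ascending=False)

print(by_label.index.tolist())
print(by_label_desc.index.tolist())
```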


---

## Extract Series Value by Index Position
- Use the `iloc` accessor to extract a **Series** value by its index position.
- `iloc` is short for "integer location".
- Python's list slicing syntaxes (slices, slices from start, slices to end, etc.) are supported with **Series** objects.

In [45]:
# Grab a series of names
pokemon = pd.read_csv("data_files/pokemon.csv", usecols=["Name"]).squeeze()

In [46]:
# Get a single index
pokemon.iloc[0]
pokemon.iloc[-2]

'Walking Wake'

In [47]:
# Get multiple index positions
pokemon.iloc[ [1, 100, 3, 52] ]

1         Ivysaur
100     Electrode
3      Charmander
52        Persian
Name: Name, dtype: object

In [48]:
# Get a range of index positions
pokemon.iloc[10:20]
pokemon.iloc[1000::3]

1000       Wo-Chien
1003         Chi-Yu
1006       Koraidon
1009    Iron Leaves
Name: Name, dtype: object
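
Only the last expression of each cell above is displayed, so this sketch stores every `iloc` result in its own variable. The `letters` Series is a stand-in chosen for this example.

```python
import pandas as pd

letters = pd.Series(["a", "b", "c", "d", "e"])

first = letters.iloc[0]          # single position
last = letters.iloc[-1]          # negative positions count from the end
picked = letters.iloc[[4, 0]]    # list of positions, in the order given
middle = letters.iloc[1:4]       # slice; the stop position is excluded
every_other = letters.iloc[::2]  # slice with a step

print(first, last)
print(picked.tolist(), middle.tolist(), every_other.tolist())
```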

---

## Extract Series Value by Index Label
- Use the `loc` accessor to extract a **Series** value by its index label.
- Pass a list to extract multiple values by index label.
- If one index label/position in the list does not exist, Pandas will raise an error.

NOTE: Even though you don't see the numerical index, you can still extract values by position with `iloc`.

In [49]:
pokemon = pd.read_csv("data_files/pokemon.csv", index_col="Name").squeeze()
pokemon.head()

Name
Bulbasaur     Grass, Poison
Ivysaur       Grass, Poison
Venusaur      Grass, Poison
Charmander             Fire
Charmeleon             Fire
Name: Type, dtype: object

In [50]:
pokemon.loc["Bulbasaur"]

'Grass, Poison'

In [51]:
pokemon.loc[ ["Bulbasaur", "Charmander", "Caterpie"] ]

Name
Bulbasaur     Grass, Poison
Charmander             Fire
Caterpie                Bug
Name: Type, dtype: object

In [52]:
pokemon.iloc[0]

'Grass, Poison'
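
The bullet about missing labels can be demonstrated with a short sketch. The `types` Series is a small, made-up sample, and "Digimon" is a placeholder label that is intentionally absent from the index.

```python
import pandas as pd

types = pd.Series(
    ["Grass, Poison", "Fire", "Water"],
    index=["Bulbasaur", "Charmander", "Squirtle"],
)

# Every label exists, so the values come back in the order requested.
found = types.loc[["Squirtle", "Bulbasaur"]]

# One missing label is enough for loc to raise a KeyError.
try:
    types.loc[["Bulbasaur", "Digimon"]]
except KeyError:
    missing_label_raised = True
else:
    missing_label_raised = False

print(found.tolist(), missing_label_raised)
```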

---

## The get Method on a Series
- The `get` method extracts a **Series** value by index label. It is an alternative option to square brackets.
- The `get` method's second argument sets the fallback value to return if the label/position does not exist.

In [53]:
pokemon.head()

Name
Bulbasaur     Grass, Poison
Ivysaur       Grass, Poison
Venusaur      Grass, Poison
Charmander             Fire
Charmeleon             Fire
Name: Type, dtype: object

In [54]:
error_msg = "Pokemon not found"

pokemon.get("Bulbasaur", error_msg)

'Grass, Poison'

In [55]:
pokemon.get("Digimon", error_msg)

'Pokemon not found'

In [56]:
pokemon.get(["Bulbasaur", "Digimon", "Charmander"], error_msg)

'Pokemon not found'
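
A compact sketch of how `get` differs from `loc` when a label is missing. The `types` data is invented, and "Digimon" is a placeholder label that does not exist in the index. Without a second argument, `get` falls back to `None` rather than raising an error.

```python
import pandas as pd

types = pd.Series(
    ["Grass, Poison", "Fire"],
    index=["Bulbasaur", "Charmander"],
)

hit = types.get("Bulbasaur")
miss_default = types.get("Digimon")                      # no fallback given
miss_custom = types.get("Digimon", "Pokemon not found")  # explicit fallback

print(hit, miss_default, miss_custom)
```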

---

## Overwrite a Series Value
- Use the `loc/iloc` accessor to target an index label/position, then use an equal sign to provide a new value.

In [57]:
pokemon = pd.read_csv("data_files/pokemon.csv", usecols=["Name"]).squeeze()
pokemon.head()

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [58]:
pokemon.iloc[0] = "Borisaur"

In [59]:
pokemon.head(3)

0    Borisaur
1     Ivysaur
2    Venusaur
Name: Name, dtype: object

In [60]:
# NOTE: Both lists MUST be the same length
pokemon.iloc[ [0, 1, 2] ] = ["Firemon", "Flamemon", "Blazemon"]

In [61]:
pokemon.head()

0       Firemon
1      Flamemon
2      Blazemon
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [62]:
pokemon = pd.read_csv("data_files/pokemon.csv", index_col="Name").squeeze()
pokemon.head()

Name
Bulbasaur     Grass, Poison
Ivysaur       Grass, Poison
Venusaur      Grass, Poison
Charmander             Fire
Charmeleon             Fire
Name: Type, dtype: object

In [63]:
pokemon.loc["Bulbasaur"] = "Awesomeness"
pokemon.iloc[1] = "Silly"

pokemon.head()

Name
Bulbasaur       Awesomeness
Ivysaur               Silly
Venusaur      Grass, Poison
Charmander             Fire
Charmeleon             Fire
Name: Type, dtype: object

---

## The copy Method
- A **copy** is a duplicate/replica of an object.
- Changes to a copy do not modify the original object.
- A **view** is a different way of looking at the *same* data.
- Changes to a view *do* modify the original object.
- The `copy` method creates a copy of a pandas object.

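In the View example below, `pokemon_series` behaves as a **view**: `squeeze` can return a Series that shares its underlying data with `pokemon_df` instead of duplicating it. As a result, assigning `pokemon_series[0] = "Barbie's dream house"` also changes the first row of `pokemon_df`. This sharing depends on the pandas version and settings; with Copy-on-Write enabled, the same assignment would leave `pokemon_df` unchanged.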

### View / Changing

In [64]:
pokemon_df = pd.read_csv("data_files/pokemon.csv", usecols=["Name"])
pokemon_df.head(3)

        Name
0  Bulbasaur
1    Ivysaur
2   Venusaur


In [65]:
pokemon_series = pokemon_df.squeeze()

pokemon_series[0] = "Barbie's dream house"

In [66]:
pokemon_df.head(3)

                   Name
0  Barbie's dream house
1               Ivysaur
2              Venusaur


### Copy

In [67]:
pokemon_df = pd.read_csv("data_files/pokemon.csv", usecols=["Name"])
pokemon_df.head(3)

        Name
0  Bulbasaur
1    Ivysaur
2   Venusaur


In [68]:
pokemon_series = pokemon_df.squeeze().copy()

pokemon_series[0] = "Barbie's dream house"

pokemon_series.head(3)

0    Barbie's dream house
1                 Ivysaur
2                Venusaur
Name: Name, dtype: object

In [69]:
pokemon_df.head(3)

        Name
0  Bulbasaur
1    Ivysaur
2   Venusaur
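
The same guarantee holds for a plain Series: an explicit `copy` is independent of its source. A minimal sketch, with `original` and `duplicate` as names chosen for this example:

```python
import pandas as pd

original = pd.Series(["Bulbasaur", "Ivysaur", "Venusaur"])

# copy() duplicates the data, so edits to the copy stay in the copy.
duplicate = original.copy()
duplicate.iloc[0] = "Placeholder"

print(original.iloc[0])
print(duplicate.iloc[0])
```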


---

## Math Methods on Series Objects
- The `count` method returns the number of values in the **Series**. It excludes missing values; the `size` attribute includes missing values.
- The `sum` method adds together the **Series's** values.
- The `product` method multiplies together the **Series's** values.
- The `mean` method calculates the average of the **Series's** values.
- The `std` method calculates the standard deviation of the **Series's** values.
- The `max` method returns the largest value in the **Series**.
- The `min` method returns the smallest value in the **Series**.
- The `median` method returns the median of the **Series** (the value in the middle).
- The `mode` method returns the mode of the **Series** (the most frequent value).
- The `describe` method returns a summary with various mathematical calculations.

In [75]:
google = pd.read_csv("data_files/google_stock_price.csv", usecols=["Price"]).squeeze()

In [90]:
google.count()  # Excludes any missing (null or NaN) values
google.size  # Counts ALL values, whether or not they are missing

google.sum()
google.product()  # Returns inf (infinity) because the result is too big for a float64

google.mean()
google.median()
google.mode()
google.std()

google.max()
google.min()

google.describe()

count    4793.000000
mean       40.211377
std        37.274753
min         2.470490
25%        12.767395
50%        26.327717
75%        56.311001
max       151.863495
Name: Price, dtype: float64
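
The Google price data shows the same number for `count` and `size` (4793), which suggests it has no missing values, so the difference is easier to see on a Series that does. The `readings` data below is invented for this sketch; by default `sum` and `mean` also skip missing values.

```python
import pandas as pd

readings = pd.Series([1.5, None, 3.0, float("nan")])

total_slots = readings.size      # counts every position, missing or not
present = readings.count()       # counts only non-missing values

# Missing values are skipped by default in sum() and mean().
total = readings.sum()
average = readings.mean()

print(total_slots, present, total, average)
```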

---

## Broadcasting
- **Broadcasting** describes the process of applying an arithmetic operation to every element of an array (e.g., a **Series**).
- We can combine mathematical operators with a **Series** to apply the mathematical operation to every value.
- There are also methods to accomplish the same results (`add`, `sub`, `mul`, `div`, etc.)

In [92]:
google.head(3)

0    2.490664
1    2.515820
2    2.758411
Name: Price, dtype: float64

In [103]:
google.add(10)
google + 10

google.sub(10)
google - 10

google.mul(2)
google * 2

google.div(2)
google / 2

0        1.245332
1        1.257910
2        1.379206
3        1.385307
4        1.307100
          ...    
4788    66.040001
4789    66.499000
4790    67.785004
4791    68.525002
4792    69.214996
Name: Price, Length: 4793, dtype: float64
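
A short sketch confirming that the operator and method forms give the same result and leave the original Series alone. The `prices` values are made up for the example.

```python
import pandas as pd

prices = pd.Series([2.0, 4.0, 6.0])

# The scalar is applied to every value; a new Series is returned.
with_fee = prices + 10
halved = prices.div(2)

# The operator and the method are interchangeable.
same_result = prices.add(10).equals(prices + 10)

print(with_fee.tolist(), halved.tolist(), same_result)
```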

---

## The value_counts Method
- The `value_counts` method returns the number of times **each unique value** occurs in the **Series**.
- The `normalize` parameter returns the relative frequencies/percentages of the values instead of the counts.

In [108]:
pokemon = pd.read_csv("data_files/pokemon.csv", index_col="Name").squeeze()

In [121]:
pokemon.value_counts()

pokemon.value_counts(ascending=True)  # ascending is false by default

# Find how many fire pokemon there are
pokemon.value_counts().loc["Fire"]

# Get the percent (in number form) for how frequently that item shows up
# i.e. what percent of pokemon are water type? 0.07 or 7%
pokemon.value_counts(normalize=True)
# You can convert the number to percent by multiplying by 100
pokemon.value_counts(normalize=True).mul(100)

Type
Water               7.326733
Normal              7.326733
Grass               4.554455
Psychic             3.861386
Fire                3.564356
                      ...   
Dragon, Normal      0.099010
Flying, Fighting    0.099010
Dragon, Water       0.099010
Ground, Fighting    0.099010
Fairy, Psychic      0.099010
Name: proportion, Length: 200, dtype: float64
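
On a small, invented Series the counts and proportions are easy to verify by hand. With `normalize=True` the results are fractions that sum to 1.

```python
import math

import pandas as pd

types = pd.Series(["Fire", "Water", "Fire", "Grass", "Fire"])

counts = types.value_counts()
shares = types.value_counts(normalize=True)

fire_count = counts.loc["Fire"]
fire_share = shares.loc["Fire"]

print(fire_count, fire_share)
```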

---

## The apply Method
- The `apply` method accepts a function. It invokes that function on every `Series` value.

In [124]:
pokemon = pd.read_csv("data_files/pokemon.csv", usecols=["Name"]).squeeze()
pokemon.head()

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [126]:
pokemon.apply(len).head()

0     9
1     7
2     8
3    10
4    10
Name: Name, dtype: int64

In [132]:
def count_of_a(name):
    return name.count("a")

pokemon.apply(count_of_a).head()

0    2
1    1
2    1
3    2
4    1
Name: Name, dtype: int64

---

## The map Method
- The `map` method "maps" or connects each **Series** value to another value.
- We can pass the method a dictionary or a **Series**. Both types connect keys to values.
- The `map` method uses our argument to connect or bridge together the values.

In [139]:
pokemon = pd.read_csv("data_files/pokemon.csv", index_col="Name").squeeze()
pokemon.head()

Name
Bulbasaur     Grass, Poison
Ivysaur       Grass, Poison
Venusaur      Grass, Poison
Charmander             Fire
Charmeleon             Fire
Name: Type, dtype: object

In [141]:
attack_power = {
    "Grass": 10,
    "Fire": 15,
    "Water": 20,
    "Fairy, Fighting": 50,
    "Grass, Psychic": 100,
}

In [143]:
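# NOTE: If even one value is NaN, the dtype will default to float
# NOTE: The second parameter of `map` is `na_action`, not a fallback value,
# so passing a number raises an error.
fallback = 0
pokemon.map(attack_power, fallback)

ValueError: na_action must either be 'ignore' or None, 0 was passed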
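
Because the second parameter of `map` is `na_action` rather than a fallback, one possible way to supply a default is to chain `fillna` onto the result. This is a sketch of that approach on a small, invented `types` Series, not necessarily how the notes intended to continue.

```python
import pandas as pd

types = pd.Series(["Grass", "Fire", "Ghost"])
attack_power = {"Grass": 10, "Fire": 15}

# Values without a matching key ("Ghost") become NaN.
mapped = types.map(attack_power)

# Replace the NaN values with a fallback, then restore an integer dtype.
fallback = 0
filled = mapped.fillna(fallback).astype(int)

print(mapped.isna().tolist())
print(filled.tolist())
```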