# Chapter 6: Operators (& Dunder Methods)

In [2]:
import pandas as pd
import numpy as np

In [3]:
url = "http://github.com/mattharrison/datasets/raw/master/data/vehicles.csv.zip"
df = pd.read_csv(url)
city_mpg = df.city08
highway_mpg = df.highway08


## 6.2 Dunder Methods

In [4]:
2 + 4

6

In [5]:
# under the covers, Python runs this
(2).__add__(4)

6
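The same dispatch applies to user-defined classes. A minimal sketch, assuming a hypothetical `Meters` class that exists only for illustration: defining `__add__` is what lets instances work with the `+` operator.

```python
class Meters:
    """Hypothetical wrapper class, used only to illustrate operator dispatch."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Python calls this method for `Meters(...) + other`
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        # Signal that this operand type is not supported
        return NotImplemented


total = Meters(2) + Meters(4)
print(total.value)
```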

## 6.3 Index Alignment

- Most math operations can be applied to a series together with another series
- A scalar can also be used as the other operand
- When operating on two series, pandas aligns the indexes before performing the operation
- Aligning takes each index entry in the left series and matches it with every entry that has the same label in the index of the right series
- For predictable results, make sure that the indexes are unique (no duplicates) and common to both series; duplicate labels produce one row per matching pair, and labels found in only one series produce NaN

In [6]:
# repeated index series
s1 = pd.Series([10, 20, 30], index=[1, 2, 2])
s2 = pd.Series([35, 44, 53], index=[2, 2, 4], name='s2')

In [7]:
s1

1    10
2    20
2    30
dtype: int64

In [8]:
s2

2    35
2    44
4    53
Name: s2, dtype: int64

In [9]:
s1 + s2

1     NaN
2    55.0
2    64.0
2    65.0
2    74.0
4     NaN
dtype: float64
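One way to spot this situation before doing arithmetic is to check `Index.is_unique`. The sketch below repeats `s1` and `s2` from above and adds two hypothetical series, `a` and `b`, with unique labels for comparison; only the labels shared by both produce a number.

```python
import pandas as pd

s1 = pd.Series([10, 20, 30], index=[1, 2, 2])
s2 = pd.Series([35, 44, 53], index=[2, 2, 4], name='s2')

# Both indexes contain the label 2 twice, so neither is unique
print(s1.index.is_unique, s2.index.is_unique)

# Hypothetical series with unique labels: each shared label matches once
a = pd.Series([10, 20, 30], index=[1, 2, 3])
b = pd.Series([35, 44, 53], index=[2, 3, 4])
result = a + b
print(result)
```

With unique labels the result has one row per label in either index, rather than one row per matching pair.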

## 6.4 Broadcasting

- When we perform math operations with a scalar, pandas broadcasts the operation to all values
- An advantage of broadcasting is that the operations are optimized and run very quickly on the CPU (vectorization)
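A small sketch of scalar broadcasting, using a hypothetical three-element series: the scalar is applied to every value, and the index is left unchanged.

```python
import pandas as pd

s = pd.Series([10, 20, 30])  # hypothetical values

# The scalar is broadcast to every element of the series
plus_five = s + 5
doubled = s * 2

print(plus_five)
print(doubled)
```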

## 6.5 Iteration

- Avoid using a for loop with a series; a vectorized operation gives the same result and is typically much faster
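To illustrate, the sketch below computes the same result twice on a hypothetical series of mileage values: once by iterating in Python and once with a single vectorized operation. The values are made up for the example.

```python
import pandas as pd

mpg = pd.Series([19, 9, 23, 10, 17])  # hypothetical mileage values

# Looping: every element is handled one at a time in Python
looped = pd.Series([value * 2 for value in mpg], index=mpg.index)

# Vectorized: one operation over the whole series
vectorized = mpg * 2

print(looped.equals(vectorized))
```

Both approaches return the same values; the vectorized form is shorter and leaves the looping to pandas.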

## 6.6 Operator Methods

- Dunder methods fill in ``NaN`` when one of the operands is missing following index alignment

In [10]:
s1 + s2

1     NaN
2    55.0
2    64.0
2    65.0
2    74.0
4     NaN
dtype: float64

In [11]:
s1.add(s2)

1     NaN
2    55.0
2    64.0
2    65.0
2    74.0
4     NaN
dtype: float64

In [12]:
# operator methods let us specify a fill value for missing operands
s1.add(s2, fill_value=0)

1    10.0
2    55.0
2    64.0
2    65.0
2    74.0
4    53.0
dtype: float64

## 6.7 Chaining

- Most pandas methods do not mutate data in place but return a new object. This allows us to chain method calls. Examples of operator methods:

| Method | Operator | Description |
| --- | --- | --- |
| s.add(s2) | s + s2 | Adds series |
| s.radd(s2) | s2 + s | Adds series |
| s.sub(s2) | s - s2 | Subtracts series |
| s.rsub(s2) | s2 - s | Subtracts series |
| s.mul(s2), s.multiply(s2) | s * s2 | Multiplies series |
| s.rmul(s2) | s2 * s | Multiplies series |
| s.div(s2), s.truediv(s2) | s / s2 | Divides series |
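The `r`-prefixed methods swap the operands, which only changes the result for non-commutative operations such as subtraction. A quick check on two hypothetical series, `s` and `s2`:

```python
import pandas as pd

s = pd.Series([1, 2, 3])      # hypothetical values
s2 = pd.Series([10, 20, 30])  # hypothetical values

# s.sub(s2) is s - s2, while s.rsub(s2) is s2 - s
forward = s.sub(s2)
reflected = s.rsub(s2)

print(forward.equals(s - s2))
print(reflected.equals(s2 - s))

# mul and multiply are two names for the same operation
print(s.mul(s2).equals(s.multiply(s2)))
```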

In [13]:
# calculate the average of city and highway mileage using operators
(city_mpg + highway_mpg) / 2

0        22.0
1        11.5
2        28.0
3        11.0
4        20.0
         ... 
41139    22.5
41140    24.0
41141    21.0
41142    21.0
41143    18.5
Length: 41144, dtype: float64

In [14]:
# chaining to calculate the average of city and highway mileage
(city_mpg
.add(highway_mpg)
.div(2))

0        22.0
1        11.5
2        28.0
3        11.0
4        20.0
         ... 
41139    22.5
41140    24.0
41141    21.0
41142    21.0
41143    18.5
Length: 41144, dtype: float64