In [1]:
import pandas as pd
url = 'https://github.com/mattharrison/datasets/raw/master/data/vehicles.csv.zip'
df = pd.read_csv(url)




In [2]:
city_mpg = df.city08
highway_mpg = df.highway08

city_mpg, highway_mpg

(0        19
 1         9
 2        23
 3        10
 4        17
          ..
 41139    19
 41140    20
 41141    18
 41142    18
 41143    16
 Name: city08, Length: 41144, dtype: int64,
 0        25
 1        14
 2        33
 3        12
 4        23
          ..
 41139    26
 41140    28
 41141    24
 41142    24
 41143    21
 Name: highway08, Length: 41144, dtype: int64)

The built-in dir function lists the attributes of an object. A series has several hundred of them.

In [3]:
len(dir(city_mpg))

420

<h1>Operators & Dunder Methods</h1>

Dunder (double-underscore) methods are the hooks Python calls when you use an operator, so they determine how an object reacts to operations.

When you run 2 + 4, under the covers Python runs (2).__add__(4).

In [4]:
print(2 + 4)
print((2).__add__(4))

6
6
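The same mechanism works for your own classes. The following is a minimal sketch. The Meters class is hypothetical, invented only for illustration, and shows how defining __add__ makes the + operator work on a custom object.

```python
class Meters:
    """Hypothetical value type, used only to illustrate operator overloading."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Python calls this method when it evaluates `self + other`.
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        # Returning NotImplemented lets Python try other.__radd__ or raise TypeError.
        return NotImplemented

    def __repr__(self):
        return f"Meters({self.value})"


total = Meters(2) + Meters(4)
print(total)  # Meters(6)
```

A pandas series follows the same pattern: its __add__ method is what runs when you write one series plus another.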


<h2>Index Alignment</h2>

You can apply most math operations to a series together with another series or with a scalar.
When you operate on two series, pandas aligns the indexes first, matching each index entry in the series on the left with the entries that have the same label on the right.

Because of index alignment, you want to make sure that the indexes:
<ul>
    <li>Are unique</li>
    <li>Are common to both series</li>
</ul>
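Both conditions can be checked before doing any math. The following is a sketch using two small made-up series (left and right are illustrative names, not part of the vehicles data). It relies on the Index.is_unique attribute and the Index.equals method.

```python
import pandas as pd

# Illustrative series with duplicated and non-overlapping labels.
left = pd.Series([10, 20, 30], index=[1, 2, 2])
right = pd.Series([35, 44, 53], index=[2, 2, 4])

# Are the labels unique within each series?
left_unique = left.index.is_unique
right_unique = right.index.is_unique

# Do both series carry exactly the same labels in the same order?
same_index = left.index.equals(right.index)

# Which labels appear in only one of the two series?
unmatched = set(left.index) ^ set(right.index)

print(left_unique, right_unique, same_index, sorted(unmatched))
```

Here every check fails, so adding these two series would produce both missing values and repeated rows.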

In [5]:
(city_mpg + highway_mpg) / 2

0        22.0
1        11.5
2        28.0
3        11.0
4        20.0
         ... 
41139    22.5
41140    24.0
41141    21.0
41142    21.0
41143    18.5
Length: 41144, dtype: float64

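If you don't have matching, distinct indexes, you will end up with missing values and extra rows: a label found in only one series produces NaN, and a duplicated label produces one row for every pairing of the duplicates.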

In [6]:
s1 = pd.Series([10,20,30], index=[1,2,2])
s2 = pd.Series([35,44,53], index=[2,2,4], name='s2')

In [7]:
s1 + s2

1     NaN
2    55.0
2    64.0
2    65.0
2    74.0
4     NaN
dtype: float64

<h2>Broadcasting</h2>

When you perform a math operation with a scalar, pandas broadcasts the operation to all values.
Broadcasting is fast because a numeric pandas series is stored as a single block of memory, so the operation runs in optimized, vectorized code rather than a Python-level loop.
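As a small sketch of broadcasting, the series below holds a few made-up mileage values (mpg is an illustrative name). The scalar is applied to every element without an explicit loop.

```python
import pandas as pd

# A handful of illustrative values, not the full vehicles data.
mpg = pd.Series([19, 9, 23], name="city08")

# The scalar 2 is broadcast across every element of the series.
doubled = mpg * 2

# Comparisons broadcast the same way and return a boolean series.
efficient = mpg > 15

print(doubled.tolist())    # [38, 18, 46]
print(efficient.tolist())  # [True, False, True]
```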

<h2>Iteration</h2>

The .__iter__ method is what allows iteration in a for loop.
You should avoid using a for loop with a series, because you lose the benefits of vectorization.
Vectorized operations, such as comparisons and boolean indexing, are better ways to search and filter.
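To make the contrast concrete, here is a sketch that filters the same made-up values twice: once with a for loop and once with a boolean mask. The threshold of 15 is arbitrary.

```python
import pandas as pd

# Illustrative values only.
mpg = pd.Series([19, 9, 23, 10, 17])

# Loop version: works, but processes one Python object at a time.
looped = []
for value in mpg:
    if value > 15:
        looped.append(value)

# Vectorized version: the comparison builds a boolean mask in one step,
# and indexing with the mask keeps only the matching rows.
mask = mpg > 15
filtered = mpg[mask]

print(looped)             # [19, 23, 17]
print(filtered.tolist())  # [19, 23, 17]
```

The vectorized version also keeps the original index labels, which the plain list discards.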

<h2>Operators</h2>

Pandas also provides methods for the standard operators, such as .add.
These methods let you change the behavior by passing parameters.
For example, the .add method has the optional fill_value parameter, which substitutes a value for entries that are missing after alignment.
Using the .add method with default parameters produces the same result as the + operator.

In [8]:
s1 + s2

1     NaN
2    55.0
2    64.0
2    65.0
2    74.0
4     NaN
dtype: float64

In [9]:
s1.add(s2)

1     NaN
2    55.0
2    64.0
2    65.0
2    74.0
4     NaN
dtype: float64
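The sketch below rebuilds the two small series from above and passes fill_value=0 to .add. A label present in only one series is then treated as 0 on the other side instead of producing NaN.

```python
import pandas as pd

s1 = pd.Series([10, 20, 30], index=[1, 2, 2])
s2 = pd.Series([35, 44, 53], index=[2, 2, 4], name='s2')

# Default behavior: labels 1 and 4 have no partner, so they become NaN.
plain = s1.add(s2)

# With fill_value=0, the missing side is replaced by 0 before adding.
filled = s1.add(s2, fill_value=0)

print(plain.isna().sum())   # 2
print(filled.isna().sum())  # 0
print(filled.loc[1], filled.loc[4])  # 10.0 53.0
```

The duplicated label 2 still yields four rows, because fill_value only affects missing entries, not how the indexes are aligned.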