# Series Deep Dive

In [9]:
import pandas as pd

url = 'https://github.com/mattharrison/datasets/raw/master/data/vehicles.csv.zip'

df = pd.read_csv(url)

city_mpg = df["city08"]
highway_mpg = df["highway08"]

print(city_mpg.head())
print(highway_mpg.head())


print("Number of Series attributes:", len(dir(city_mpg)))

0    19
1     9
2    23
3    10
4    17
Name: city08, dtype: int64
0    25
1    14
2    33
3    12
4    23
Name: highway08, dtype: int64
Number of Series attributes: 419



**Dunder methods** (.__add__, .__iter__, etc.) provide many numeric operations, looping, attribute access, and index access. For the numeric operations, these return Series.

**Operator methods** that correspond to many of the numeric operators and accept extra parameters (such as fill_value) to tweak the behaviour. For example, there is an .add method in addition to .__add__.

**Aggregation methods** and properties, which reduce the values in a series to a single scalar value. The .mean, .max, and .sum methods and the .is_monotonic_increasing property (.is_monotonic in older pandas versions) are all examples.

**Conversion methods**. Some of these start with .to_ and export the data to other formats.

**Manipulation methods** such as .sort_values and .drop_duplicates, which return Series objects whose values keep their original index labels.

**Indexing and accessor methods** and attributes such as .loc and .iloc. These return Series or scalars.

**String manipulation methods** using .str.

**Date manipulation methods** using .dt.

**Plotting methods** using .plot.

**Categorical manipulation methods** using .cat.

**Transformation methods** such as .unstack, .reset_index, .agg, and .transform.

**Attributes** such as .index and .dtype.
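
To make the categories above concrete, here is a minimal sketch that touches one member of most of them. The small `mpg`, `makes`, and `dates` series are made-up stand-ins (not the vehicles dataset), and plotting is left out because it needs matplotlib installed.

```python
import pandas as pd

# Hypothetical stand-in for city_mpg (first five values shown earlier)
mpg = pd.Series([19, 9, 23, 10, 17], name="city08")

# Dunder method vs. operator method: same result
same = (mpg + 1).equals(mpg.add(1))

# Aggregation: reduce the series to one scalar
mean_mpg = mpg.mean()
is_sorted = mpg.is_monotonic_increasing

# Conversion: export to a plain Python list
as_list = mpg.to_list()

# Manipulation: values are reordered but keep their index labels
sorted_mpg = mpg.sort_values()

# Indexing: by position (.iloc) and by label (.loc)
first = mpg.iloc[0]
by_label = mpg.loc[2]

# String and date accessors (illustrative data)
makes = pd.Series(["Alfa Romeo", "Ferrari"])
upper = makes.str.upper()
dates = pd.Series(pd.to_datetime(["2021-03-01", "2022-07-15"]))
years = dates.dt.year

# Attributes
print(mpg.index, mpg.dtype)
```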


# Operators (& Dunder Methods)

In [16]:
average_mpg = (city_mpg + highway_mpg) / 2
print(average_mpg.head())

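# Index alignment: pandas matches values by index label, not by position.
# Labels found on only one side produce NaN, and duplicate labels are paired in
# every combination, so keep indices unique and shared when combining series.
# Example of non-aligning indices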
s1 = pd.Series([10, 20, 30], index=[1, 2, 2])
s2 = pd.Series([35, 4, 53], index=[2, 2, 4], name='s2')

print(s1 + s2)

0    22.0
1    11.5
2    28.0
3    11.0
4    20.0
dtype: float64
1     NaN
2    55.0
2    24.0
2    65.0
2    34.0
4     NaN
dtype: float64
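
The second output above has six rows even though each input has only three. A small sketch of why, reusing s1 and s2 from the cell: label 2 appears twice on each side, so it is paired 2 x 2 = 4 times, while labels 1 and 4 exist on only one side and become NaN. The `a` and `b` series are illustrative additions showing the well-behaved case with unique labels.

```python
import pandas as pd

s1 = pd.Series([10, 20, 30], index=[1, 2, 2])
s2 = pd.Series([35, 4, 53], index=[2, 2, 4], name="s2")

# A quick check worth running before combining series
unique_left = s1.index.is_unique

total = s1 + s2

# Label 2 is duplicated on both sides, so every pairing is produced
rows_for_label_2 = int((total.index == 2).sum())

# Labels present on only one side have no partner and become NaN
unmatched = total.index[total.isna().to_numpy()].tolist()

# With unique labels, alignment pairs values by label regardless of order
a = pd.Series([1, 2], index=["x", "y"])
b = pd.Series([10, 20], index=["y", "x"])
aligned = a + b

print(unique_left, rows_for_label_2, unmatched)
print(aligned)
```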


### Broadcasting

When you perform math operations with a scalar, pandas *broadcasts* the operation to all values. In the average_mpg example above, the division by 2 is applied to every value of the summed series. This makes mathematical operations easy to write and the resulting code easy to read.
Many of these math operations are optimized and run very quickly on the CPU. This is called *vectorization*. (A numeric pandas series is a block of memory, and modern CPUs leverage a technology called Single Instruction / Multiple Data (SIMD) to apply a math operation to the block of memory.)
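
As a rough illustration of broadcasting, the sketch below compares a scalar operation with the explicit Python loop it replaces. The `city` series is a made-up sample, not the full dataset; on a large series the broadcast form would typically be much faster, though no timings are claimed here.

```python
import pandas as pd

# Illustrative sample values
city = pd.Series([19, 9, 23, 10, 17])

# Broadcasting: the scalar 10 is applied to every value in one operation
broadcast = city + 10

# The equivalent explicit loop, one Python-level addition per value
looped = pd.Series([value + 10 for value in city])

# Both approaches produce the same series
print(broadcast.equals(looped))
```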

In [19]:
# fill_value=0 substitutes 0 for a label that is missing from one side, avoiding the NaNs
print(s1.add(s2, fill_value=0))

# Chaining to calculate average mpg
print(city_mpg
        .add(highway_mpg)
        .div(2))

1    10.0
2    55.0
2    24.0
2    65.0
2    34.0
4    53.0
dtype: float64
0        22.0
1        11.5
2        28.0
3        11.0
4        20.0
         ... 
41139    22.5
41140    24.0
41141    21.0
41142    21.0
41143    18.5
Length: 41144, dtype: float64
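
The effect of fill_value is easier to see with unique labels. This is a small sketch with made-up series `a` and `b`: the plain operator leaves NaN wherever a label is missing from one side, while .add with fill_value=0 treats the missing side as 0.

```python
import pandas as pd

# Illustrative series that share only the label "y"
a = pd.Series([1.0, 2.0], index=["x", "y"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# Operator form: "x" and "z" have no partner, so they become NaN
plain = a + b

# Method form: the missing side is replaced by 0 before adding
filled = a.add(b, fill_value=0)

print(plain)
print(filled)
```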


| Method | Operator | Description |
|---|---|---|
| s.add(s2) | s + s2 | Adds series |
| s.radd(s2) | s2 + s | Adds series (reversed operands) |
| s.sub(s2) | s - s2 | Subtracts series |
| s.rsub(s2) | s2 - s | Subtracts series (reversed operands) |
| s.mul(s2) | s * s2 | Multiplies series |
| s.rmul(s2) | s2 * s | Multiplies series (reversed operands) |
| s.div(s2) | s / s2 | Divides series |
| s.rdiv(s2) | s2 / s | Divides series (reversed operands) |
| s.mod(s2) | s % s2 | Modulo of series division |
| s.rmod(s2) | s2 % s | Modulo of series division (reversed operands) |
| s.floordiv(s2) | s // s2 | Floor divides series |
| s.rfloordiv(s2) | s2 // s | Floor divides series (reversed operands) |
| s.pow(s2) | s ** s2 | Exponential power of series |
| s.eq(s2) | s == s2 | Elementwise equals of series |
| s.ne(s2) | s != s2 | Elementwise not equals of series |
| s.gt(s2) | s > s2 | Elementwise greater than |
| s.ge(s2) | s >= s2 | Elementwise greater than or equal to |
| s.lt(s2) | s < s2 | Elementwise less than |
| s.le(s2) | s <= s2 | Elementwise less than or equal to |
| np.invert(s) | ~s | Elementwise inversion of boolean series |
| np.logical_and(s, s2) | s & s2 | Elementwise logical and of boolean series |
| np.logical_or(s, s2) | s \| s2 | Elementwise logical or of boolean series |
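
A short sketch checking a few rows of the method/operator table, using made-up series `s` and `s2`. It shows that a reversed method such as .rsub swaps the operands, and that comparison methods return boolean series that can be combined with &, | and ~.

```python
import numpy as np
import pandas as pd

# Illustrative series
s = pd.Series([1, 2, 3])
s2 = pd.Series([3, 2, 1])

# Method and operator forms give the same result
sub_matches = s.sub(s2).equals(s - s2)

# The reversed method swaps the operands: s.rsub(s2) is s2 - s
rsub_matches = s.rsub(s2).equals(s2 - s)

# Comparison methods return boolean series
low = s.lt(3)            # same as s < 3
odd = s.mod(2).eq(1)     # same as s % 2 == 1

# Boolean series combine with &, | and ~
both = low & odd
either = low | odd
neither = ~either

# The NumPy functions from the table behave the same way
and_matches = np.logical_and(low, odd).tolist() == both.tolist()
invert_matches = np.invert(low).tolist() == (~low).tolist()

print(both.tolist(), either.tolist(), neither.tolist())
```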

### Exercise - Operators & Dunder Methods

1) Add a numeric series to itself
2) Add 10 to a numeric series
3) Add a numeric series to itself using the .add method
4) Read the documentation for the .add method

In [26]:
UK_temps = [10, 11, 17, 18, 15, 21, 23]

s = pd.Series(UK_temps)

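# 1) Add two numeric series (a second series, s2, stands in for "itself" here)
heatwave = [10, 12, 11, 13, 12, 13, 10]
s2 = pd.Series(heatwave)
print(s + s2)

# 2) Add 10 to a numeric series
print(s + 10)

# 3) Add the two series with the .add method, then add 10 by chaining
print(s.add(s2).add(10))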

0    20
1    23
2    28
3    31
4    27
5    34
6    33
dtype: int64
0    20
1    21
2    27
3    28
4    25
5    31
6    33
dtype: int64
0    30
1    33
2    38
3    41
4    37
5    44
6    43
dtype: int64
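
The solutions above add a second series rather than the series to itself. For a literal reading of exercises 1 and 3, a minimal sketch using the same UK_temps values might look like this. The help call for exercise 4 is left as a comment because it prints the full docstring.

```python
import pandas as pd

UK_temps = [10, 11, 17, 18, 15, 21, 23]
s = pd.Series(UK_temps)

# 1) Add a numeric series to itself with the operator
doubled_operator = s + s

# 3) Add a numeric series to itself with the .add method
doubled_method = s.add(s)

# Both forms agree, and both equal multiplying by 2
print(doubled_operator.equals(doubled_method))
print(doubled_operator.equals(s * 2))

# 4) Read the documentation for the .add method
# help(pd.Series.add)
```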


# Aggregate Methods