## TL;DR

Pandas can also perform fast **element-wise** operations (ufuncs) on `Series` and `DataFrame` objects:

- Unary operations: preserve index and column labels

- Binary operations: automatically align indices 

  - Labels missing from one operand:
    - Filled with `NaN` by default
    - Use the **object methods** (with `fill_value`) if a missing value needs to be filled

  | Python Operator | Pandas Method(s)                 |
  | :-------------- | :------------------------------- |
  | `+`             | `add()`                          |
  | `-`             | `sub()`, `subtract()`            |
  | `*`             | `mul()`, `multiply()`            |
  | `/`             | `truediv()`, `div()`, `divide()` |
  | `//`            | `floordiv()`                     |
  | `%`             | `mod()`                          |
  | `**`            | `pow()`                          |

- Operations between `Series` and `DataFrame`
  - Behave like operations between two-dimensional and one-dimensional NumPy arrays
  - Row-wise by default (use the Python operators)
  - Use the **object methods** with `axis=0` for column-wise operations
    

In [38]:
import pandas as pd
import numpy as np

Pandas inherits much of NumPy's ability to perform quick element-wise operations, and ufuncs are the key to this.

Pandas adds two behaviors of its own:

- For unary operations: ufuncs preserve index and column labels in the output.

- For binary operations: Pandas automatically aligns indices when passing the objects to the ufunc.

## Ufuncs

### Index Preservation

Any NumPy ufunc works on Pandas `Series` and `DataFrame` objects.

In [3]:
rng = np.random.RandomState(42)
ser = pd.Series(rng.randint(0, 10, 4))
ser 

0    6
1    3
2    7
3    4
dtype: int64

In [4]:
df = pd.DataFrame(rng.randint(0, 10, (3, 4)),
                  columns=['A', 'B', 'C', 'D'])
df

|      | A    | B    | C    | D    |
| :--- | :--- | :--- | :--- | :--- |
| 0    | 6    | 9    | 2    | 6    |
| 1    | 7    | 4    | 3    | 7    |
| 2    | 7    | 2    | 5    | 4    |


When a NumPy ufunc is applied to either of these objects, the result is **another** Pandas object *with the indices preserved*.

In [5]:
np.exp(ser)

0     403.428793
1      20.085537
2    1096.633158
3      54.598150
dtype: float64

In [6]:
np.sin(df * np.pi / 4)

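|      | A         | B            | C         | D            |
| :--- | :-------- | :----------- | :-------- | :----------- |
| 0    | -1.0      | 0.7071068    | 1.0       | -1.0         |
| 1    | -0.707107 | 1.224647e-16 | 0.707107  | -0.7071068   |
| 2    | -0.707107 | 1.0          | -0.707107 | 1.224647e-16 |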
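Index preservation is easier to see when the labels are not the default integers. The following is a minimal sketch with made-up data (the name `values` and its string labels are illustrative, not from the cells above):

```python
import numpy as np
import pandas as pd

# Hypothetical Series with string labels instead of the default 0..n-1
values = pd.Series([0.0, 1.0, 2.0], index=["a", "b", "c"])

# A unary ufunc is applied element by element
result = np.exp(values)

# The result is still a Series and carries the same labels
print(type(result).__name__)
print(result.index.tolist())
```

The ufunc only changes the values; the labels pass through untouched.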


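### Index Alignment

For binary operations, Pandas aligns the indices of the two objects before performing the operation.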



#### Index Alignment in Series

In [7]:
area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                  'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127}, name='population')

In [8]:
area

Alaska        1723337
Texas          695662
California     423967
Name: area, dtype: int64

In [9]:
population

California    38332521
Texas         26448193
New York      19651127
Name: population, dtype: int64

In [10]:
population / area

Alaska              NaN
California    90.413926
New York            NaN
Texas         38.018740
dtype: float64

The resulting `Series` contains the *union* of the indices of the two input `Series`:

In [11]:
area.index.union(population.index)

Index(['Alaska', 'California', 'New York', 'Texas'], dtype='object')

Any item for which one or the other does not have an entry is marked with `NaN` ("Not a Number"), which is how Pandas marks missing data by default.
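To find which labels failed to match, the `NaN` entries can be inspected directly. This sketch rebuilds the two `Series` from above (without the `name` argument) and uses `isna()`; the variable names `density` and `unmatched` are illustrative:

```python
import pandas as pd

area = pd.Series({"Alaska": 1723337, "Texas": 695662, "California": 423967})
population = pd.Series({"California": 38332521, "Texas": 26448193,
                        "New York": 19651127})

density = population / area

# Labels present in only one of the two inputs end up as NaN
unmatched = sorted(density[density.isna()].index)
print(unmatched)

# dropna() keeps only the labels that both inputs share
print(sorted(density.dropna().index))
```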

In [12]:
A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

A + B

0    NaN
1    5.0
2    9.0
3    NaN
dtype: float64
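The same alignment applies when a binary ufunc is called directly on two `Series`. The sketch below assumes a reasonably recent Pandas release and contrasts label-based alignment with what plain NumPy arrays do:

```python
import numpy as np
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

via_operator = A + B
via_ufunc = np.add(A, B)  # binary ufunc called on the two Series

# Both spellings align on labels before adding
print(via_operator.equals(via_ufunc))

# Plain NumPy arrays have no labels, so they are added by position instead
positional = A.to_numpy() + B.to_numpy()
print(positional)
```

Dropping to NumPy arrays is therefore a way to opt out of alignment when positional arithmetic is what is wanted.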

If `NaN` is not the desired result, the fill value can be changed by using the appropriate object method in place of the operator.

In [13]:
# fill_value sets the value substituted for any element
# that is missing from one of the operands
A.add(B, fill_value=0)

0    2.0
1    5.0
2    9.0
3    5.0
dtype: float64
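Note that `fill_value` is applied to the missing operand *before* the operation, which is not the same as replacing `NaN` afterwards. A short sketch of the difference, reusing the `A` and `B` defined above:

```python
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

# Missing operands are treated as 0, so the one existing value survives
filled_before = A.add(B, fill_value=0)
print(filled_before.tolist())  # [2.0, 5.0, 9.0, 5.0]

# Here the sum is already NaN and is then overwritten with 0
filled_after = (A + B).fillna(0)
print(filled_after.tolist())   # [0.0, 5.0, 9.0, 0.0]
```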

#### Index Alignment in DataFrame

A similar alignment takes place for *both* columns and indices when operating on `DataFrame` objects:

In [16]:
A = pd.DataFrame(rng.randint(0, 20, size=(2, 2)),
                 columns=list('AB'))

In [17]:
A

|      | A    | B    |
| :--- | :--- | :--- |
| 0    | 1    | 11   |
| 1    | 5    | 1    |


In [18]:
B = pd.DataFrame(rng.randint(0, 10, (3, 3)),
                 columns=list('BAC'))

In [19]:
B

|      | B    | A    | C    |
| :--- | :--- | :--- | :--- |
| 0    | 4    | 0    | 9    |
| 1    | 5    | 8    | 0    |
| 2    | 9    | 2    | 6    |


In [20]:
A + B

|      | A    | B    | C    |
| :--- | :--- | :--- | :--- |
| 0    | 1.0  | 15.0 | NaN  |
| 1    | 13.0 | 6.0  | NaN  |
| 2    | NaN  | NaN  | NaN  |


In [21]:
# Fill with the mean of all values in A (stack() flattens A into one Series)
fill = A.stack().mean()
A.add(B, fill_value=fill)

|      | A    | B    | C    |
| :--- | :--- | :--- | :--- |
| 0    | 1.0  | 15.0 | 13.5 |
| 1    | 13.0 | 6.0  | 4.5  |
| 2    | 6.5  | 13.5 | 10.5 |
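The filled result can be traced by hand. This sketch rebuilds `A` and `B` from the values printed above rather than from `rng`, and computes the overall mean with `A.to_numpy().mean()` as an alternative to `A.stack().mean()`:

```python
import pandas as pd

# Same values as the A and B shown above, written out explicitly
A = pd.DataFrame({"A": [1, 5], "B": [11, 1]})
B = pd.DataFrame({"B": [4, 5, 9], "A": [0, 8, 2], "C": [9, 0, 6]})

# Mean of all four values in A: (1 + 11 + 5 + 1) / 4
fill = A.to_numpy().mean()
print(fill)  # 4.5

result = A.add(B, fill_value=fill)

# A has no row 2 and no column C, so the fill value stands in for A there
print(result.loc[2, "C"])  # 4.5 + 6 = 10.5
print(result.loc[0, "C"])  # 4.5 + 9 = 13.5
```

Every cell here has a value in `B`, so no `NaN` remains; a cell missing from *both* frames would still come out as `NaN`.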


Python operators and their equivalent Pandas object methods:

| Python Operator | Pandas Method(s)                 |
| :-------------- | :------------------------------- |
| `+`             | `add()`                          |
| `-`             | `sub()`, `subtract()`            |
| `*`             | `mul()`, `multiply()`            |
| `/`             | `truediv()`, `div()`, `divide()` |
| `//`            | `floordiv()`                     |
| `%`             | `mod()`                          |
| `**`            | `pow()`                          |
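As a quick sanity check of the table, each operator can be compared with its method form. The two `Series` below are made up for illustration:

```python
import pandas as pd

s = pd.Series([7, 8, 9])
t = pd.Series([2, 3, 4])

# Each pair holds the operator result and the matching method result
pairs = {
    "+": (s + t, s.add(t)),
    "-": (s - t, s.sub(t)),
    "*": (s * t, s.mul(t)),
    "/": (s / t, s.div(t)),
    "//": (s // t, s.floordiv(t)),
    "%": (s % t, s.mod(t)),
    "**": (s ** t, s.pow(t)),
}

for op, (by_operator, by_method) in pairs.items():
    print(op, by_operator.equals(by_method))

# The longer names are aliases of the short ones
print(s.subtract(t).equals(s.sub(t)))
print(s.divide(t).equals(s.truediv(t)))
```

The method forms are worth knowing because they accept extra arguments such as `fill_value` and `axis`, which the operators cannot.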


### Operations Between DataFrame and Series

When performing operations between a `DataFrame` and a `Series`, the index and column alignment is similarly maintained. Operations between a `DataFrame` and a `Series` are similar to operations between a two-dimensional and one-dimensional NumPy array. 

In [22]:
A = rng.randint(10, size=(3, 4))
A

array([[3, 8, 2, 4],
       [2, 6, 4, 8],
       [6, 1, 3, 8]])

In [23]:
A[0]

array([3, 8, 2, 4])

In [24]:
A - A[0]

array([[ 0,  0,  0,  0],
       [-1, -2,  2,  4],
       [ 3, -7,  1,  4]])

According to NumPy's broadcasting rules, subtraction between a two-dimensional array and one of its rows is applied **row-wise**.

In Pandas, the convention similarly operates **row-wise by default**.

In [30]:
df = pd.DataFrame(A, columns=list('QRST'))
df

|      | Q    | R    | S    | T    |
| :--- | :--- | :--- | :--- | :--- |
| 0    | 3    | 8    | 2    | 4    |
| 1    | 2    | 6    | 4    | 8    |
| 2    | 6    | 1    | 3    | 8    |


Row-wise operation (just use the Python operators):

In [26]:
df.iloc[0] # Access the first row

Q    3
R    8
S    2
T    4
Name: 0, dtype: int64

In [29]:
df - df.iloc[0]

|      | Q    | R    | S    | T    |
| :--- | :--- | :--- | :--- | :--- |
| 0    | 0    | 0    | 0    | 0    |
| 1    | -1   | -2   | 2    | 4    |
| 2    | 3    | -7   | 1    | 4    |


Column-wise operation: use the object methods and specify the `axis` keyword:

In [31]:
df['Q']

0    3
1    2
2    6
Name: Q, dtype: int64

In [33]:
df.subtract(df['Q'], axis=0)

|      | Q    | R    | S    | T    |
| :--- | :--- | :--- | :--- | :--- |
| 0    | 0    | 5    | -1   | 1    |
| 1    | 0    | 4    | 2    | 6    |
| 2    | 0    | -5   | -3   | 2    |
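For comparison, the same column-wise subtraction in plain NumPy needs the column reshaped so that it broadcasts across columns. A minimal sketch, with the array written out from the values shown above:

```python
import numpy as np
import pandas as pd

A = np.array([[3, 8, 2, 4],
              [2, 6, 4, 8],
              [6, 1, 3, 8]])
df = pd.DataFrame(A, columns=list("QRST"))

# NumPy: index with a list to keep the first column 2-D, shape (3, 1),
# so it broadcasts across the four columns
numpy_result = A - A[:, [0]]

# Pandas: axis=0 matches the Series index against the DataFrame's row index
pandas_result = df.sub(df["Q"], axis=0)

print(np.array_equal(numpy_result, pandas_result.to_numpy()))
```

The Pandas version matches on labels rather than on array shape, which is why it needs an `axis` argument instead of a reshape.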


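Without `axis=0`, the plain operator matches the `Series` index (0, 1, 2) against the column labels, so nothing aligns and every entry is `NaN`:

In [34]:
df - df['Q']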

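|      | Q    | R    | S    | T    | 0    | 1    | 2    |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0    | NaN  | NaN  | NaN  | NaN  | NaN  | NaN  | NaN  |
| 1    | NaN  | NaN  | NaN  | NaN  | NaN  | NaN  | NaN  |
| 2    | NaN  | NaN  | NaN  | NaN  | NaN  | NaN  | NaN  |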
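The two spellings can be checked side by side. This sketch rebuilds `df` from the values shown above; the names `unaligned` and `by_rows` are illustrative:

```python
import pandas as pd

df = pd.DataFrame([[3, 8, 2, 4], [2, 6, 4, 8], [6, 1, 3, 8]],
                  columns=list("QRST"))

# Plain operator: the Series index (0, 1, 2) is compared with the column
# labels (Q, R, S, T). No label matches, so every cell is NaN.
unaligned = df - df["Q"]
print(unaligned.shape)               # (3, 7)
print(unaligned.isna().all().all())  # True

# Method with an explicit axis: the Series index is matched against the rows.
# axis="index" is another spelling of axis=0.
by_rows = df.sub(df["Q"], axis="index")
print(by_rows.equals(df.sub(df["Q"], axis=0)))  # True
```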


Like the operations discussed above, these `DataFrame` and `Series` operations automatically align indices between the two objects.

In [36]:
df

|      | Q    | R    | S    | T    |
| :--- | :--- | :--- | :--- | :--- |
| 0    | 3    | 8    | 2    | 4    |
| 1    | 2    | 6    | 4    | 8    |
| 2    | 6    | 1    | 3    | 8    |


In [35]:
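half_row = df.iloc[0, ::2]  # first row, every other column (a Series)
half_row

Q    3
S    2
Name: 0, dtype: int64

In [37]:
df - half_row

|      | Q    | R    | S    | T    |
| :--- | :--- | :--- | :--- | :--- |
| 0    | 0.0  | NaN  | 0.0  | NaN  |
| 1    | -1.0 | NaN  | 2.0  | NaN  |
| 2    | 3.0  | NaN  | 1.0  | NaN  |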