## Ufuncs: Index Preservation

Pandas is designed to work with NumPy, so any NumPy ufunc will work on Pandas `Series` and `DataFrame` objects.

In [1]:
import pandas as pd
import numpy as np

rng = np.random.RandomState(42)
ser = pd.Series(rng.randint(0,10,4))
ser

0    6
1    3
2    7
3    4
dtype: int32

In [2]:
df = pd.DataFrame(rng.randint(0,10,(3,4)), columns=['A','B','C','D'])
df

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | 6 | 9 | 2 | 6 |
| 1 | 7 | 4 | 3 | 7 |
| 2 | 7 | 2 | 5 | 4 |


If we apply a NumPy ufunc to either of these objects, the result is another Pandas object with the indices preserved:

In [5]:
another_ser = np.exp(ser)
print(type(another_ser))
print(another_ser)

<class 'pandas.core.series.Series'>
0     403.428793
1      20.085537
2    1096.633158
3      54.598150
dtype: float64


In [8]:
np.sin(df*np.pi/4)

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | -1.0 | 0.7071068 | 1.0 | -1.0 |
| 1 | -0.707107 | 1.224647e-16 | 0.707107 | -0.7071068 |
| 2 | -0.707107 | 1.0 | -0.707107 | 1.224647e-16 |
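
Index preservation is easier to see when the labels are not the default integers. The following is a minimal sketch using a made-up `Series` (the name `temps` and its values are illustrative assumptions, not data from this notebook):

```python
import numpy as np
import pandas as pd

# Hypothetical example data: string labels instead of the default 0..n-1.
temps = pd.Series([0.0, 1.0, 2.0], index=['a', 'b', 'c'])

# Apply a NumPy ufunc directly to the Series.
result = np.exp(temps)

# The result is still a Series, and its labels are the same as the input's.
print(type(result).__name__)
print(result.index.equals(temps.index))
```

Both checks should confirm that the ufunc changed only the values, not the container type or the labels.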


## Ufuncs: Index Alignment

For binary operations on two `Series` or `DataFrame` objects, Pandas aligns the indices in the process of performing the operation.

### Index alignment in Series

In [9]:
area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                  'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127}, name='population')

In [12]:
# Note that the index of area differs from the index of population
print(area)
print(population)

Alaska        1723337
Texas          695662
California     423967
Name: area, dtype: int64
California    38332521
Texas         26448193
New York      19651127
Name: population, dtype: int64


In [14]:
population / area

Alaska              NaN
California    90.413926
New York            NaN
Texas         38.018740
dtype: float64

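The resulting array contains the union of the indices of the two input arrays, which can be determined with the `union()` method of the index:

In [15]:
# Older pandas versions also accepted `area.index | population.index`;
# in pandas 2.x the `|` operator no longer performs a set union on an Index.
area.index.union(population.index)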

Index(['Alaska', 'California', 'New York', 'Texas'], dtype='object')
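
The other set operations are available as `Index` methods too. A short sketch, rebuilding the two indices by hand so that it runs on its own (the variable names are illustrative):

```python
import pandas as pd

# Same labels as the area and population Series above.
area_idx = pd.Index(['Alaska', 'Texas', 'California'])
pop_idx = pd.Index(['California', 'Texas', 'New York'])

# Labels present in either index: this is what alignment produces.
union = area_idx.union(pop_idx)

# Labels present in both: only these get a non-NaN result in population / area.
common = area_idx.intersection(pop_idx)

# Labels present in area only: these become NaN in the result.
only_area = area_idx.difference(pop_idx)

print(union)
print(common)
print(only_area)
```

The intersection identifies the rows that will carry real values after a binary operation, while the rest of the union is filled with `NaN`.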

Any item for which one or the other does not have an entry is marked with `NaN`. The same holds for every arithmetic operation: missing values are filled in with `NaN` by default.

In [17]:
A = pd.Series([2,4,6], index=[1,2,3])
B = pd.Series([1,3,5], index=[0,1,2])
print(A+B)

0    NaN
1    5.0
2    9.0
3    NaN
dtype: float64


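If using `NaN` values is not the desired behavior, the fill value can be modified by calling the appropriate object method in place of the operator. For example, `A.add(B)` is equivalent to `A + B`, but accepts an explicit `fill_value` for elements that are missing from `A` or `B`: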

In [23]:
print(A.add(B, fill_value=0))
print(A.add(B, fill_value=1))  # fill_value stands in for an entry that is missing from one of the two Series

0    1.0
1    5.0
2    9.0
3    6.0
dtype: float64
0    2.0
1    5.0
2    9.0
3    7.0
dtype: float64
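
One detail worth checking is what happens when a value is missing on both sides. The sketch below suggests that `fill_value` is only used when exactly one operand has a value; the Series `C` and `D` are hypothetical additions for illustration:

```python
import numpy as np
import pandas as pd

A = pd.Series([2, 4, 6], index=[1, 2, 3])
B = pd.Series([1, 3, 5], index=[0, 1, 2])

# Labels 1 and 2 exist in both Series, so fill_value is not used there.
# Labels 0 and 3 exist on one side only, so the missing side is treated as 0.
total = A.add(B, fill_value=0)

# Hypothetical case: label 2 is NaN in BOTH inputs.
C = pd.Series([2.0, np.nan], index=[1, 2])
D = pd.Series([1.0, np.nan], index=[1, 2])

# With both sides missing, the result stays NaN despite fill_value.
both_missing = C.add(D, fill_value=0)

print(total)
print(both_missing)
```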


### Index alignment in DataFrame

In [25]:
A = pd.DataFrame(rng.randint(0, 20, (2, 2)), columns=list('AB'))
A

|   | A | B |
|---|---|---|
| 0 | 0 | 11 |
| 1 | 11 | 16 |


In [37]:
B = pd.DataFrame(rng.randint(0, 10, (3, 3)), columns=list('BCA'))
B

|   | B | C | A |
|---|---|---|---|
| 0 | 4 | 8 | 6 |
| 1 | 1 | 3 | 8 |
| 2 | 1 | 9 | 8 |


In [39]:
A + B


|   | A | B | C |
|---|---|---|---|
| 0 | 6.0 | 15.0 | NaN |
| 1 | 19.0 | 17.0 | NaN |
| 2 | NaN | NaN | NaN |


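Notice that the indices are aligned correctly irrespective of their order in the two objects, and that the indices in the result are sorted. As with `Series`, the `add()` method accepts a `fill_value` to use in place of missing entries: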

In [30]:
A.add(B, fill_value=0)

|   | A | B | C |
|---|---|---|---|
| 0 | 6.0 | 15.0 | 8.0 |
| 1 | 19.0 | 17.0 | 3.0 |
| 2 | 8.0 | 1.0 | 9.0 |


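Another natural choice for the fill value is the mean of all values in `A`, computed by first stacking the rows of `A`:

In [33]:
A.stack().mean()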

9.5
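
This mean can then be passed as the `fill_value`. A minimal sketch, with `A` and `B` written out explicitly using the values displayed above so that the example does not depend on the random state:

```python
import pandas as pd

# Same values as the A and B frames shown above, written out by hand.
A = pd.DataFrame([[0, 11], [11, 16]], columns=list('AB'))
B = pd.DataFrame([[4, 8, 6], [1, 3, 8], [1, 9, 8]], columns=list('BCA'))

# stack() flattens the frame into a single Series, so mean() covers every cell.
fill = A.stack().mean()

# Cells missing from A (row 2 and column C) are treated as `fill` before adding.
result = A.add(B, fill_value=fill)
print(result)
```

Cells that exist in both frames are summed as usual; only the cells absent from `A` pick up the mean.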

The following table lists Python operators and their equivalent Pandas object methods:


| Python Operator | Pandas Method(s)                      |
|-----------------|---------------------------------------|
| ``+``           | ``add()``                             |
| ``-``           | ``sub()``, ``subtract()``             |
| ``*``           | ``mul()``, ``multiply()``             |
| ``/``           | ``truediv()``, ``div()``, ``divide()``|
| ``//``          | ``floordiv()``                        |
| ``%``           | ``mod()``                             |
| ``**``          | ``pow()``                             |
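
The equivalences in the table can be spot-checked on a small pair of `Series`. This is only a sketch with made-up values (`s1` and `s2` are illustrative names):

```python
import pandas as pd

# Hypothetical example data sharing the same index.
s1 = pd.Series([7, 8, 9], index=['a', 'b', 'c'])
s2 = pd.Series([2, 3, 4], index=['a', 'b', 'c'])

# Each entry pairs the operator form with the corresponding method form.
pairs = {
    '+': (s1 + s2, s1.add(s2)),
    '-': (s1 - s2, s1.sub(s2)),
    '*': (s1 * s2, s1.mul(s2)),
    '/': (s1 / s2, s1.div(s2)),
    '//': (s1 // s2, s1.floordiv(s2)),
    '%': (s1 % s2, s1.mod(s2)),
    '**': (s1 ** s2, s1.pow(s2)),
}

# Series.equals compares values, index, and dtype.
all_match = all(op.equals(method) for op, method in pairs.values())
print(all_match)
```

The method forms are the ones to reach for when an extra argument such as `fill_value` or `axis` is needed.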

## Ufuncs: Operations Between DataFrame and Series

Operations between a `DataFrame` and a `Series` are similar to operations between a two-dimensional and a one-dimensional NumPy array. Consider the difference between a two-dimensional array and one of its rows:

In [41]:
A = rng.randint(10,size=(3,4))
A

array([[1, 5, 5, 9],
       [3, 5, 1, 9],
       [1, 9, 3, 7]])

In [42]:
A - A[0]


array([[ 0,  0,  0,  0],
       [ 2,  0, -4,  0],
       [ 0,  4, -2, -2]])

Under NumPy's broadcasting rules, the subtraction is applied to each row. Pandas follows the same convention and operates row-wise by default:

In [43]:
df = pd.DataFrame(A, columns=list('QRST'))
df

|   | Q | R | S | T |
|---|---|---|---|---|
| 0 | 1 | 5 | 5 | 9 |
| 1 | 3 | 5 | 1 | 9 |
| 2 | 1 | 9 | 3 | 7 |


In [48]:
df - df.iloc[0]

|   | Q | R | S | T |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | 0 | -4 | 0 |
| 2 | 0 | 4 | -2 | -2 |


To operate column-wise instead, use the object methods mentioned earlier and specify the `axis` keyword:

In [50]:
df.subtract(df['R'], axis=0)

|   | Q | R | S | T |
|---|---|---|---|---|
| 0 | -4 | 0 | 0 | 4 |
| 1 | -2 | 0 | -4 | 4 |
| 2 | -8 | 0 | -6 | -2 |
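
The two directions can be compared side by side, together with a NumPy counterpart of the column-wise case. This sketch rebuilds `A` and `df` from the values shown above; the names `row_wise`, `col_wise`, and `numpy_col_wise` are illustrative:

```python
import numpy as np
import pandas as pd

A = np.array([[1, 5, 5, 9],
              [3, 5, 1, 9],
              [1, 9, 3, 7]])
df = pd.DataFrame(A, columns=list('QRST'))

# axis='columns' (the default) matches the Series index against column labels.
row_wise = df.sub(df.iloc[0], axis='columns')

# axis='index' matches the Series index against row labels instead.
col_wise = df.sub(df['R'], axis='index')

# NumPy counterpart of the column-wise case: keep column R as a (3, 1) array
# so that it broadcasts across every column.
numpy_col_wise = A - A[:, [1]]

print(row_wise.equals(df - df.iloc[0]))
print((col_wise.to_numpy() == numpy_col_wise).all())
```

In NumPy the direction is controlled by the shape of the operand, whereas in Pandas it is controlled by which labels the `Series` index is matched against.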


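Like the operations discussed above, these `DataFrame`/`Series` operations automatically align indices between the two elements:

In [57]: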
halfrow = df.iloc[1,::2]
halfrow

Q    3
S    1
Name: 1, dtype: int32

In [58]:
df - halfrow

|   | Q | R | S | T |
|---|---|---|---|---|
| 0 | -2.0 | NaN | 4.0 | NaN |
| 1 | 0.0 | NaN | 0.0 | NaN |
| 2 | -2.0 | NaN | 2.0 | NaN |
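
If the `NaN` columns are not wanted, one possible workaround is to expand the `Series` to all columns before subtracting. The sketch below assumes that a missing entry should count as 0, which may not suit every use case; `full_row` is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame([[1, 5, 5, 9],
                   [3, 5, 1, 9],
                   [1, 9, 3, 7]], columns=list('QRST'))
halfrow = df.iloc[1, ::2]  # Q and S from row 1

# Expand halfrow to every column, using 0 where it had no entry (an assumption).
full_row = halfrow.reindex(df.columns, fill_value=0)

# Columns R and T are now left unchanged instead of becoming NaN.
result = df - full_row
print(result)
```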
