# Operating on Data in Pandas
## Ufuncs: Index Preservation

In [1]:
import pandas as pd
import numpy as np

In [2]:
rng = np.random.RandomState(42)
ser = pd.Series(rng.randint(0, 10, 4))
ser

0    6
1    3
2    7
3    4
dtype: int32

In [4]:
df = pd.DataFrame(rng.randint(0, 10, (3, 4)),
                 columns=['A', 'B', 'C', 'D'])
df

   A  B  C  D
0  1  7  5  1
1  4  0  9  5
2  8  0  9  2


If we apply a NumPy ufunc to either of these objects, the result is another Pandas object with the indices preserved:

In [5]:
np.exp(ser)

0     403.428793
1      20.085537
2    1096.633158
3      54.598150
dtype: float64

In [6]:
np.sin(df * np.pi / 4)

              A         B         C         D
0  7.071068e-01 -0.707107 -0.707107  0.707107
1  1.224647e-16  0.000000  0.707107 -0.707107
2 -2.449294e-16  0.000000  0.707107  1.000000
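
Index preservation is easier to see when the labels are not the default integers. The following is a minimal sketch using made-up data (the names `labelled` and `frame` and their values are illustrative assumptions, not part of the cells above):

```python
import numpy as np
import pandas as pd

# Illustrative data with string labels (not taken from the cells above)
labelled = pd.Series([0.0, 1.0, 2.0], index=['a', 'b', 'c'])
frame = pd.DataFrame({'x': [1.0, 4.0], 'y': [9.0, 16.0]}, index=['r1', 'r2'])

exp_ser = np.exp(labelled)    # still a Series, labelled a, b, c
sqrt_frame = np.sqrt(frame)   # still a DataFrame with the same rows and columns

print(type(exp_ser).__name__, list(exp_ser.index))
print(type(sqrt_frame).__name__, list(sqrt_frame.index), list(sqrt_frame.columns))
```

In both cases the ufunc works on the underlying values and the labels are carried over to the result unchanged.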


## Index alignment in Series

For binary operations on two `Series` or `DataFrame` objects, Pandas will align indices in the process of performing the operation. This is very convenient when working with incomplete data, as we'll see in some of the examples that follow.

In [7]:
area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                  'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127}, name='population')

In [8]:
population / area

Alaska              NaN
California    90.413926
New York            NaN
Texas         38.018740
dtype: float64

In [9]:
area.index.union(population.index)

Index(['Alaska', 'California', 'New York', 'Texas'], dtype='object')

The result is indexed by the union of the two input indices. Any label that appears in only one of the two `Series` is marked with NaN, or "Not a Number," which is how Pandas marks missing data.
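
To make the alignment rule concrete, the sketch below rebuilds the two `Series` from the cell above and separates the labels that matched from those that did not. The variable names `density`, `missing`, and `shared` are introduced here for illustration only:

```python
import pandas as pd

area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                  'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127}, name='population')

density = population / area

# Labels present in only one operand come out as NaN
missing = density[density.isna()].index.tolist()
# Labels present in both operands have a computed value
shared = density.dropna().index.tolist()

print(missing)
print(shared)
```

Alaska has no population entry and New York has no area entry, so those two labels end up in `missing`.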

In [10]:
A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])
A + B

0    NaN
1    5.0
2    9.0
3    NaN
dtype: float64

If NaN is not the desired result, use the corresponding object method in place of the operator and pass a fill value. For example, calling `A.add(B)` is equivalent to calling `A + B`, but it also accepts an optional `fill_value` that stands in for any element missing from `A` or `B`:

In [11]:
A.add(B, fill_value=0)

0    2.0
1    5.0
2    9.0
3    5.0
dtype: float64
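
The other arithmetic operators have method counterparts that accept `fill_value` in the same way. As a brief sketch using the same `A` and `B` (the names `diff` and `prod` are illustrative):

```python
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

# Like A - B, but a missing entry on either side is treated as 0
diff = A.sub(B, fill_value=0)
# Like A * B, but a missing entry on either side is treated as 1
prod = A.mul(B, fill_value=1)

print(diff)
print(prod)
```

Choosing the identity element of the operation (0 for addition and subtraction, 1 for multiplication) leaves the value that is present unchanged, but whether that is appropriate depends on the data.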

## Index alignment in DataFrame

In [12]:
A = pd.DataFrame(rng.randint(0, 20, (2, 2)),
                 columns=list('AB'))
A

    A   B
0  11  19
1   2   4


In [15]:
B = pd.DataFrame(rng.randint(0, 10, (3, 3)),
                 columns=list('BAC'))
B

   B  A  C
0  0  3  1
1  7  3  1
2  5  5  9


In [16]:
A + B

      A     B   C
0  14.0  19.0 NaN
1   5.0  11.0 NaN
2   NaN   NaN NaN


In [17]:
# Use the mean of all values in A as the fill value
fill = A.stack().mean()
A.add(B, fill_value=fill)

      A     B     C
0  14.0  19.0  10.0
1   5.0  11.0  10.0
2  14.0  14.0  18.0
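
For `DataFrame` operands, alignment happens on both the row index and the columns, regardless of the order in which the columns are stored. The sketch below hard-codes the values printed above so that each cell can be checked by hand; the names `total` and `filled` are illustrative:

```python
import pandas as pd

# Same values as the A and B frames shown above, written out explicitly
A = pd.DataFrame([[11, 19], [2, 4]], columns=list('AB'))
B = pd.DataFrame([[0, 3, 1], [7, 3, 1], [5, 5, 9]], columns=list('BAC'))

total = A + B
print(list(total.columns))      # union of both column sets
print(total.loc[0, 'A'])        # 11 + 3, matched by label, not by position

fill = A.stack().mean()         # (11 + 19 + 2 + 4) / 4
filled = A.add(B, fill_value=fill)
print(filled.loc[2, 'C'])       # fill value + 9
```

Note that `B` stores its columns in the order B, A, C, yet `A['A']` is still added to `B['A']`.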


## Ufuncs: Operations Between DataFrame and Series

In [18]:
A = rng.randint(10, size=(3, 4))
A

array([[3, 5, 1, 9],
       [1, 9, 3, 7],
       [6, 8, 7, 4]])

In [19]:
A - A[0]

array([[ 0,  0,  0,  0],
       [-2,  4,  2, -2],
       [ 3,  3,  6, -5]])

In [21]:
df = pd.DataFrame(A, columns=list('QRST'))
df

   Q  R  S  T
0  3  5  1  9
1  1  9  3  7
2  6  8  7  4


In [22]:
# Subtract the first row from every row (the Series is aligned with the columns)
df - df.iloc[0]

   Q  R  S  T
0  0  0  0  0
1 -2  4  2 -2
2  3  3  6 -5


In [23]:
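# Operate column-wise: axis=0 aligns the Series with the row index,
# so column R is subtracted from every column
df.subtract(df['R'], axis=0)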

   Q  R  S  T
0 -2  0 -4  4
1 -8  0 -6 -2
2 -2  0 -1 -4
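
The `axis` keyword decides which labels of the `DataFrame` the `Series` index is matched against. The following sketch, using the same values as `df` above, contrasts the two cases and shows what happens if the keyword is left out for a column; the names `row_wise`, `col_wise`, and `mismatch` are illustrative:

```python
import pandas as pd

df = pd.DataFrame([[3, 5, 1, 9], [1, 9, 3, 7], [6, 8, 7, 4]],
                  columns=list('QRST'))

# Default: the Series index (Q, R, S, T) is matched against the columns
row_wise = df - df.iloc[0]

# axis=0: the Series index (0, 1, 2) is matched against the row index
col_wise = df.sub(df['R'], axis=0)

# Without axis=0, the labels 0, 1, 2 are compared with Q, R, S, T;
# nothing matches, so every entry of the result is NaN
mismatch = df - df['R']

print(row_wise)
print(col_wise)
print(mismatch.isna().all().all())
```

The all-NaN result in the last case is a common sign that a column was subtracted without specifying `axis=0`.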


In [24]:
halfrow = df.iloc[0, ::2]
halfrow

Q    3
S    1
Name: 0, dtype: int32

In [25]:
df - halfrow

     Q   R    S   T
0  0.0 NaN  0.0 NaN
1 -2.0 NaN  2.0 NaN
2  3.0 NaN  6.0 NaN
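
Columns R and T are NaN because `halfrow` has no entry for them. If the intent is instead to leave those columns unchanged, one possible approach (an assumption about the desired behavior, not something the cells above do) is to extend the `Series` to all columns with `reindex` before subtracting. The name `padded` is illustrative:

```python
import pandas as pd

df = pd.DataFrame([[3, 5, 1, 9], [1, 9, 3, 7], [6, 8, 7, 4]],
                  columns=list('QRST'))
halfrow = df.iloc[0, ::2]            # entries for Q and S only

# Give the Series an entry for every column, using 0 where it had none
padded = halfrow.reindex(df.columns, fill_value=0)

result = df - padded
print(result)
```

Here R and T keep their original values because 0 is subtracted from them, while Q and S are shifted by the first row as before.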
