## Arithmetic and Data Alignment

One of the most important pandas features is the behavior of arithmetic between objects with different indexes. When you add two objects whose indexes are not identical, values are matched by label, and the index of the result is the union of the two indexes. Let’s look at a simple example:

In [1]:
import pandas as pd
import numpy as np
from pandas import Series, DataFrame

In [22]:
s1 = Series(np.arange(4), index=['a', 'c', 'd', 'e'])

# np.ones(5, dtype=int) builds five integer ones; np.ones_like(5) returns a single scalar 1
s2 = Series(np.ones(5, dtype=int), index=['a', 'c', 'e', 'f', 'g'])

s1, s2

(a    0
 c    1
 d    2
 e    3
 dtype: int32,
 a    1
 c    1
 e    1
 f    1
 g    1
 dtype: int32)

In [31]:
s1[:4] = [7.3, -2.5, 3.4, 1.5]

s2[:5] = [-2.1, 3.6, -1.5, 4, 3.1]

s1, s2

(a    7.3
 c   -2.5
 d    3.4
 e    1.5
 dtype: float64,
 a   -2.1
 c    3.6
 e   -1.5
 f    4.0
 g    3.1
 dtype: float64)

Adding these together yields:

In [41]:
s1 + s2

a    5.2
c    1.1
d    NaN
e    0.0
f    NaN
g    NaN
dtype: float64

The internal data alignment introduces NaN values at the labels that do not appear in both objects. These missing values then propagate through any further arithmetic.
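
As a minimal sketch of this rule (the names `left` and `right` are illustrative and not part of the example above), the following shows that values are paired by label rather than by position, and that the result's index is the union of both indexes:

```python
import numpy as np
import pandas as pd

# Two Series that share the labels 'a' and 'c' but list them in different orders.
left = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
right = pd.Series([10.0, 20.0, 30.0], index=["c", "a", "d"])

total = left + right

# Values are paired by label: 'a' is 1 + 20 and 'c' is 3 + 10.
# 'b' and 'd' each appear in only one operand, so they become NaN.
print(total)

# The result index is the sorted union of the two indexes.
print(list(total.index))
```

Position plays no role here: `right` lists `'c'` first, yet its value is still added to the `'c'` entry of `left`.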

> In the case of DataFrame, alignment is performed on both the rows and the columns:

In [47]:
df1 = DataFrame(np.arange(9).reshape(3, 3), columns=list('abc'),
                index=['Loralai', 'Quetta', 'Multan'])

df1

|         | a | b | c |
|---------|---|---|---|
| Loralai | 0 | 1 | 2 |
| Quetta  | 3 | 4 | 5 |
| Multan  | 6 | 7 | 8 |


In [51]:
df1[:3] = [[0,3,2], [-1, 0, -3], [-5, 2, 6]]

df1

|         | a  | b | c  |
|---------|----|---|----|
| Loralai | 0  | 3 | 2  |
| Quetta  | -1 | 0 | -3 |
| Multan  | -5 | 2 | 6  |


In [56]:
df2 = DataFrame([[2, -3, 4], [3, 0, 2], [3, 5, -2], [4, -5, 6]], columns=list('bcd'),
                index=['Pshin', 'Loralai', 'Multan', 'Duki'])


df2

|         | b | c  | d  |
|---------|---|----|----|
| Pshin   | 2 | -3 | 4  |
| Loralai | 3 | 0  | 2  |
| Multan  | 3 | 5  | -2 |
| Duki    | 4 | -5 | 6  |


Adding these together returns a DataFrame whose index and columns are the unions of the ones in each DataFrame:

In [57]:
df1 + df2

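|         | a   | b   | c    | d   |
|---------|-----|-----|------|-----|
| Duki    | NaN | NaN | NaN  | NaN |
| Loralai | NaN | 6.0 | 2.0  | NaN |
| Multan  | NaN | 5.0 | 11.0 | NaN |
| Pshin   | NaN | NaN | NaN  | NaN |
| Quetta  | NaN | NaN | NaN  | NaN |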
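
The sketch below rebuilds the two frames from this example and checks the union behavior directly. It is only an illustration; the name `result` and the checks themselves are additions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[0, 3, 2], [-1, 0, -3], [-5, 2, 6]],
                   columns=list("abc"),
                   index=["Loralai", "Quetta", "Multan"])
df2 = pd.DataFrame([[2, -3, 4], [3, 0, 2], [3, 5, -2], [4, -5, 6]],
                   columns=list("bcd"),
                   index=["Pshin", "Loralai", "Multan", "Duki"])

result = df1 + df2

# Row and column labels of the sum are the sorted unions of the operands' labels.
assert list(result.index) == sorted(set(df1.index) | set(df2.index))
assert list(result.columns) == sorted(set(df1.columns) | set(df2.columns))

# Only cells whose row AND column exist in both frames hold a number:
# rows Loralai and Multan, columns b and c.
print(result.notna().sum().sum())
print(result.loc["Multan", "c"])  # 6 + 5
```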


### Arithmetic methods with fill values

In arithmetic operations between differently-indexed objects, you might want to fill with a special value, like 0, when an axis label is found in one object but not the other:

In [73]:
df1 = DataFrame(np.arange(12).reshape(3,4), columns=list('abcd'), index=list('123'))

df1

|   | a | b | c  | d  |
|---|---|---|----|----|
| 1 | 0 | 1 | 2  | 3  |
| 2 | 4 | 5 | 6  | 7  |
| 3 | 8 | 9 | 10 | 11 |


In [77]:
df2 = DataFrame(np.arange(20).reshape(4,5), columns=list('abcde'), index=list('1234'))

df2

|   | a  | b  | c  | d  | e  |
|---|----|----|----|----|----|
| 1 | 0  | 1  | 2  | 3  | 4  |
| 2 | 5  | 6  | 7  | 8  | 9  |
| 3 | 10 | 11 | 12 | 13 | 14 |
| 4 | 15 | 16 | 17 | 18 | 19 |


> Adding these together results in NA values in the locations that don’t overlap:

In [82]:
df3 = df1 + df2

df3

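|   | a    | b    | c    | d    | e   |
|---|------|------|------|------|-----|
| 1 | 0.0  | 2.0  | 4.0  | 6.0  | NaN |
| 2 | 9.0  | 11.0 | 13.0 | 15.0 | NaN |
| 3 | 18.0 | 20.0 | 22.0 | 24.0 | NaN |
| 4 | NaN  | NaN  | NaN  | NaN  | NaN |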


Using the add method on df2, I pass df1 and an argument to fill_value, so a label missing from one object is treated as 0 there instead of producing NaN:

In [95]:
df2.add(df1, fill_value=0)

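|   | a    | b    | c    | d    | e    |
|---|------|------|------|------|------|
| 1 | 0.0  | 2.0  | 4.0  | 6.0  | 4.0  |
| 2 | 9.0  | 11.0 | 13.0 | 15.0 | 9.0  |
| 3 | 18.0 | 20.0 | 22.0 | 24.0 | 14.0 |
| 4 | 15.0 | 16.0 | 17.0 | 18.0 | 19.0 |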
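
One detail that is easy to miss: `fill_value` only substitutes for a value that is missing in one of the two operands. A cell that is missing in both stays NaN. The small frames `x` and `y` below are hypothetical and chosen only to make such a cell appear:

```python
import numpy as np
import pandas as pd

# x has only column 'a'; y has only column 'b'; they share the row label 'r2'.
x = pd.DataFrame({"a": [1.0, 2.0]}, index=["r1", "r2"])
y = pd.DataFrame({"b": [10.0, 20.0]}, index=["r2", "r3"])

out = x.add(y, fill_value=0)
print(out)

# ('r1', 'a') exists only in x, so y contributes the fill value 0.
assert out.loc["r1", "a"] == 1.0
# ('r3', 'a') and ('r1', 'b') exist in neither frame, so they remain NaN.
assert np.isnan(out.loc["r3", "a"])
assert np.isnan(out.loc["r1", "b"])
```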


Relatedly, when reindexing a Series or DataFrame, you can also specify a different fill value:

In [96]:
df1.reindex(columns=df2.columns, fill_value=0)

|   | a | b | c  | d  | e |
|---|---|---|----|----|---|
| 1 | 0 | 1 | 2  | 3  | 0 |
| 2 | 4 | 5 | 6  | 7  | 0 |
| 3 | 8 | 9 | 10 | 11 | 0 |


![Flexible arithmetic methods](../../Pictures/Flexible%20arithmetic%20methods.png)
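
The following sketch does not reproduce the figure; it only illustrates a few of pandas' flexible arithmetic methods and how they relate to the ordinary operators. The frame `df` is a made-up example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1.0, 7.0).reshape(2, 3), columns=list("abc"))

# Each operator has a method form...
assert (df + 1).equals(df.add(1))
assert (df - 1).equals(df.sub(1))
assert (df * 2).equals(df.mul(2))
assert (df / 2).equals(df.div(2))

# ...and each method has a reversed ("r") twin that swaps the operands.
assert (1 - df).equals(df.rsub(1))
assert (1 / df).equals(df.rdiv(1))

print(df.rdiv(1))
```

The method forms are useful mainly because they accept extra arguments such as `fill_value` and `axis`, which the operators cannot take.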

### Operations between DataFrame and Series 

As with NumPy arrays, arithmetic between DataFrame and Series is well-defined. First, as a motivating example, consider the difference between a 2D array and one of its rows:

In [100]:
arr = np.arange(12).reshape(3, 4)

arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [101]:
arr[0]

array([0, 1, 2, 3])

In [102]:
arr - arr[0]

array([[0, 0, 0, 0],
       [4, 4, 4, 4],
       [8, 8, 8, 8]])

This is called broadcasting: the row `arr[0]` is subtracted from every row of `arr`. Operations between a DataFrame and a Series behave similarly:

In [103]:
frame = DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
                  index=['Utah', 'Ohio', 'Texas', 'Oregon'])

frame

|        | b   | d    | e    |
|--------|-----|------|------|
| Utah   | 0.0 | 1.0  | 2.0  |
| Ohio   | 3.0 | 4.0  | 5.0  |
| Texas  | 6.0 | 7.0  | 8.0  |
| Oregon | 9.0 | 10.0 | 11.0 |


In [105]:
series = frame.iloc[0]

series

b    0.0
d    1.0
e    2.0
Name: Utah, dtype: float64

By default, arithmetic between DataFrame and Series matches the index of the Series on the DataFrame's columns, broadcasting down the rows:

In [106]:
frame - series

|        | b   | d   | e   |
|--------|-----|-----|-----|
| Utah   | 0.0 | 0.0 | 0.0 |
| Ohio   | 3.0 | 3.0 | 3.0 |
| Texas  | 6.0 | 6.0 | 6.0 |
| Oregon | 9.0 | 9.0 | 9.0 |
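
To connect this with the NumPy example, here is a sketch that compares the label-based result with plain positional broadcasting. The names `by_label`, `by_position`, and `shuffled` are illustrative additions.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list("bde"),
                     index=["Utah", "Ohio", "Texas", "Oregon"])
series = frame.iloc[0]

# When the labels line up exactly, the result equals NumPy broadcasting.
by_label = frame - series
by_position = frame.to_numpy() - series.to_numpy()
assert np.array_equal(by_label.to_numpy(), by_position)

# Unlike NumPy, pandas pairs values by label, so reordering the Series
# does not change the answer.
shuffled = series[["e", "b", "d"]]
assert (frame - shuffled).equals(by_label)
```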


If an index value is not found in either the DataFrame’s columns or the Series’s index, the objects will be reindexed to form the union:

In [108]:
series2 = Series(range(3), index=['b', 'e', 'f'])


series2

b    0
e    1
f    2
dtype: int64

In [109]:
frame + series2

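|        | b   | d   | e    | f   |
|--------|-----|-----|------|-----|
| Utah   | 0.0 | NaN | 3.0  | NaN |
| Ohio   | 3.0 | NaN | 6.0  | NaN |
| Texas  | 6.0 | NaN | 9.0  | NaN |
| Oregon | 9.0 | NaN | 12.0 | NaN |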


If you want to instead broadcast over the columns, matching on the rows, you have to use one of the arithmetic methods. For example:

In [115]:
series3 = frame['d']

series3, frame

(Utah       1.0
 Ohio       4.0
 Texas      7.0
 Oregon    10.0
 Name: d, dtype: float64,
           b     d     e
 Utah    0.0   1.0   2.0
 Ohio    3.0   4.0   5.0
 Texas   6.0   7.0   8.0
 Oregon  9.0  10.0  11.0)

In [117]:
frame.sub(series3, axis=0)

|        | b    | d   | e   |
|--------|------|-----|-----|
| Utah   | -1.0 | 0.0 | 1.0 |
| Ohio   | -1.0 | 0.0 | 1.0 |
| Texas  | -1.0 | 0.0 | 1.0 |
| Oregon | -1.0 | 0.0 | 1.0 |


The axis number that you pass is the axis to match on. In this case we mean to match on the DataFrame’s row index (axis=0) and broadcast across the columns.
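
As a sketch of why the `axis` argument matters here (the names `col_d`, `by_rows`, and `by_cols` are illustrative), compare the method call with the plain operator on the same data:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list("bde"),
                     index=["Utah", "Ohio", "Texas", "Oregon"])
col_d = frame["d"]

# axis=0 and axis="index" are two spellings of the same choice.
by_rows = frame.sub(col_d, axis=0)
assert by_rows.equals(frame.sub(col_d, axis="index"))

# The plain operator matches the Series index against the column labels.
# None of 'Utah', 'Ohio', 'Texas', 'Oregon' is a column, so the result has
# the union of both label sets as columns and every cell is NaN.
by_cols = frame - col_d
print(by_cols.shape)
assert by_cols.isna().all().all()
```

Seeing a result that is wider than expected and entirely NaN is usually a sign that the operation matched on columns when rows were intended.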