# Arithmetic

A key feature of pandas is how arithmetic behaves for objects with different indices. When you add two objects whose indices are not identical, the index of the result is the union of both indices. For users with database experience, this is comparable to an automatic [outer join](https://en.wikipedia.org/wiki/Join_(SQL)#Outer_join) on the index labels. Let’s look at an example:

In [1]:
import numpy as np
import pandas as pd


rng = np.random.default_rng()
s1 = pd.Series(rng.normal(size=5))
s2 = pd.Series(rng.normal(size=7))

If you add these two Series, you get:

In [2]:
s1 + s2

0    2.596929
1   -2.795545
2   -0.119064
3    0.849508
4   -0.061194
5         NaN
6         NaN
dtype: float64

The internal data alignment introduces missing values at the labels that do not overlap. Missing values then propagate through further arithmetic calculations.
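
The alignment is easier to follow with explicit labels. The following is a minimal sketch with made-up labels and values; `a`, `b` and `total` are illustrative names and not part of the example above:

```python
import pandas as pd

# Two Series with partially overlapping labels (illustrative values)
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "w"])

total = a + b

# The result is indexed by the union of both indices.
# Only "y" occurs in both Series; all other labels become NaN.
print(total)

# Missing values propagate through further arithmetic
doubled = total * 2
print(doubled.isna().sum())
```

Only the label `y` has a partner in both Series, so it is the only entry with a numeric result.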

For DataFrames, alignment is performed for both rows and columns:

In [3]:
df1 = pd.DataFrame(rng.normal(size=(5, 3)))
df2 = pd.DataFrame(rng.normal(size=(7, 2)))

When the two DataFrames are added together, the result is a DataFrame whose index and columns are the unions of those in each of the DataFrames above:

In [4]:
df1 + df2

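|   | 0 | 1 | 2 |
|:--|--:|--:|--:|
| 0 | -0.078026 | 0.643059 | NaN |
| 1 | -0.383531 | 2.018909 | NaN |
| 2 | -2.77013 | -0.751184 | NaN |
| 3 | -0.679346 | 0.926763 | NaN |
| 4 | -1.093289 | 1.424987 | NaN |
| 5 | NaN | NaN | NaN |
| 6 | NaN | NaN | NaN |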


Since column 2 does not appear in both DataFrame objects, its values appear as missing in the result. The same applies to the rows whose labels do not appear in both objects.
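
To see the alignment on both axes at a glance, here is a small sketch with two hypothetical frames, `left` and `right`, that share exactly one row label and one column label:

```python
import numpy as np
import pandas as pd

# Illustrative frames filled with ones; labels are made up
left = pd.DataFrame(np.ones((2, 2)), index=["a", "b"], columns=["p", "q"])
right = pd.DataFrame(np.ones((2, 2)), index=["b", "c"], columns=["q", "r"])

result = left + right

# Rows and columns are each the union of the labels of both frames.
# Only the cell at row "b", column "q" exists in both, so it is the
# only cell that is not NaN.
print(result)
```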

## Arithmetic methods with fill values

In arithmetic operations between differently indexed objects, it can be useful to substitute a special value (e.g. `0`) when an axis label is found in one object but not in the other. The `add` method accepts a `fill_value` argument for this:

In [5]:
df12 = df1.add(df2, fill_value=0)

df12

|   | 0 | 1 | 2 |
|:--|--:|--:|--:|
| 0 | -0.078026 | 0.643059 | 0.136076 |
| 1 | -0.383531 | 2.018909 | -0.660599 |
| 2 | -2.77013 | -0.751184 | -1.709924 |
| 3 | -0.679346 | 0.926763 | -1.403627 |
| 4 | -1.093289 | 1.424987 | -0.283248 |
| 5 | 0.030022 | -1.465972 | NaN |
| 6 | -0.508131 | 0.52797 | NaN |


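The two remaining NaN values belong to cells that are missing from both DataFrames, so `fill_value` has nothing to fill there. In the following example, we set them to `0`: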

In [6]:
df12.iloc[[5, 6], [2]] = 0

In [7]:
df12

|   | 0 | 1 | 2 |
|:--|--:|--:|--:|
| 0 | -0.078026 | 0.643059 | 0.136076 |
| 1 | -0.383531 | 2.018909 | -0.660599 |
| 2 | -2.77013 | -0.751184 | -1.709924 |
| 3 | -0.679346 | 0.926763 | -1.403627 |
| 4 | -1.093289 | 1.424987 | -0.283248 |
| 5 | 0.030022 | -1.465972 | 0.0 |
| 6 | -0.508131 | 0.52797 | 0.0 |

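`fill_value` only substitutes a value that is missing in one of the two objects. The following sketch uses two hypothetical frames, `left` and `right`, to show the three possible cases, and uses `fillna` as a label-independent alternative to the positional assignment:

```python
import pandas as pd

# Illustrative frames; labels and values are made up
left = pd.DataFrame({"p": [1.0, 2.0]}, index=["a", "b"])
right = pd.DataFrame({"p": [10.0], "q": [20.0]}, index=["b"])

filled = left.add(right, fill_value=0)

# ("a", "p"): only in left   -> 1.0 + 0
# ("b", "q"): only in right  -> 0 + 20.0
# ("a", "q"): in neither     -> stays NaN
print(filled)

# fillna replaces whatever is still missing afterwards
complete = filled.fillna(0)
print(complete)
```

Unlike `iloc`, `fillna` does not need the positions of the missing cells, which may be convenient when they are not known in advance.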

## Arithmetic methods

Method | Description
:----- | :----------
`add`, `radd` | methods for addition (`+`)
`sub`, `rsub` | methods for subtraction (`-`)
`div`, `rdiv` | methods for division (`/`)
`floordiv`, `rfloordiv` | methods for floor division (`//`)
`mul`, `rmul` | methods for multiplication (`*`)
`pow`, `rpow` | methods for exponentiation (`**`)

The `r` prefix stands for _reverse_: these methods swap the operands, so `df1.rsub(1)` is equivalent to `1 - df1`.
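
As a small sketch of the difference between a method and its reversed counterpart (the Series `s` is made up for illustration):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 4.0])

# div keeps the operand order: s / 2
halves = s.div(2)

# rdiv swaps the operands: 2 / s
ratios = s.rdiv(2)

print(halves.tolist())
print(ratios.tolist())
```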

## Operations between DataFrame and Series

As with NumPy arrays of different dimensions, the arithmetic between DataFrame and Series is also defined.

In [8]:
s1 + df12

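|   | 0 | 1 | 2 | 3 | 4 |
|:--|--:|--:|--:|--:|--:|
| 0 | 0.583883 | -1.140178 | 0.991236 | NaN | NaN |
| 1 | 0.278378 | 0.235672 | 0.194562 | NaN | NaN |
| 2 | -2.108221 | -2.534422 | -0.854764 | NaN | NaN |
| 3 | -0.017437 | -0.856475 | -0.548466 | NaN | NaN |
| 4 | -0.43138 | -0.35825 | 0.571912 | NaN | NaN |
| 5 | 0.691931 | -3.24921 | 0.855161 | NaN | NaN |
| 6 | 0.153778 | -1.255268 | 0.855161 | NaN | NaN |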


When we add `s1` to `df12`, the addition is performed once for each row. This is called _broadcasting_. By default, arithmetic between a DataFrame and a Series matches the index of the Series against the columns of the DataFrame and broadcasts down the rows.

If an index value is not found in both the columns of the DataFrame and the index of the Series, the objects are reindexed to form the union. This is why columns `3` and `4` in the result above contain only missing values.
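
The default behaviour can be sketched with a small hypothetical frame and a Series whose labels only partly match the column labels (all names and values here are illustrative):

```python
import numpy as np
import pandas as pd

# frame holds [[0, 1, 2], [3, 4, 5]] in columns "a", "b", "c"
frame = pd.DataFrame(np.arange(6.0).reshape(2, 3), columns=["a", "b", "c"])
row = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

result = frame + row

# The Series labels are matched against the column labels and the
# Series is added to every row. "a" is missing from the Series and
# "d" is missing from the frame, so both columns are all NaN.
print(result)
```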

If you instead want to broadcast over the columns and match on the rows, you must use one of the arithmetic methods, for example:

In [9]:
df12.add(s2, axis="index")

|   | 0 | 1 | 2 |
|:--|--:|--:|--:|
| 0 | 1.856994 | 2.578079 | 2.071096 |
| 1 | -1.395838 | 1.006602 | -1.672906 |
| 2 | -3.744354 | -1.725408 | -2.684148 |
| 3 | -0.239294 | 1.366814 | -0.963576 |
| 4 | -1.067525 | 1.450751 | -0.257484 |
| 5 | 0.005172 | -1.490822 | -0.02485 |
| 6 | -0.612072 | 0.424029 | -0.103941 |


The axis you pass is the axis to match on. In this case, the Series is matched against the row index of the DataFrame (`axis='index'` or `axis=0`) and broadcast across the columns.
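
A compact sketch of matching on the rows, again with made-up data (`frame` and `col` are illustrative names):

```python
import numpy as np
import pandas as pd

# frame holds [[0, 1], [2, 3], [4, 5]] with the default row labels 0, 1, 2
frame = pd.DataFrame(np.arange(6.0).reshape(3, 2), columns=["a", "b"])

# The Series index 0, 1, 2 matches the row labels of the frame
col = pd.Series([100.0, 200.0, 300.0])

# Each Series value is added to every column of the row with the same label
result = frame.add(col, axis="index")
print(result)
```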

## Function application and mapping

NumPy ufuncs (`numpy.ufunc`, element-wise array functions) also work with pandas objects:

In [10]:
np.abs(df12)

|   | 0 | 1 | 2 |
|:--|--:|--:|--:|
| 0 | 0.078026 | 0.643059 | 0.136076 |
| 1 | 0.383531 | 2.018909 | 0.660599 |
| 2 | 2.77013 | 0.751184 | 1.709924 |
| 3 | 0.679346 | 0.926763 | 1.403627 |
| 4 | 1.093289 | 1.424987 | 0.283248 |
| 5 | 0.030022 | 1.465972 | 0.0 |
| 6 | 0.508131 | 0.52797 | 0.0 |

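A ufunc applied to a pandas object returns a pandas object of the same kind and keeps its labels. A minimal sketch with a made-up labelled Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0], index=["a", "b", "c"])

# np.sqrt is applied element-wise; the result is again a Series
# that carries the original index
roots = np.sqrt(s)
print(roots)
```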

Another common operation is to apply a function that takes a one-dimensional array to each column or row. The [pandas.DataFrame.apply](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html) method does just that:

In [11]:
df12

|   | 0 | 1 | 2 |
|:--|--:|--:|--:|
| 0 | -0.078026 | 0.643059 | 0.136076 |
| 1 | -0.383531 | 2.018909 | -0.660599 |
| 2 | -2.77013 | -0.751184 | -1.709924 |
| 3 | -0.679346 | 0.926763 | -1.403627 |
| 4 | -1.093289 | 1.424987 | -0.283248 |
| 5 | 0.030022 | -1.465972 | 0.0 |
| 6 | -0.508131 | 0.52797 | 0.0 |


In [12]:
f = lambda x: x.max() - x.min()

df12.apply(f)

0    2.800152
1    3.484882
2    1.846000
dtype: float64

Here the function `f`, which calculates the difference between the maximum and minimum of a Series, is called once for each column of the frame. The result is a Series with the columns of the frame as its index.

If you pass `axis='columns'` to `apply`, the function is called once per row instead:

In [13]:
df12.apply(f, axis="columns")

0    0.721086
1    2.679508
2    2.018946
3    2.330389
4    2.518277
5    1.495994
6    1.036101
dtype: float64

Many of the most common array statistics (such as `sum` and `mean`) are DataFrame methods, so the use of `apply` is not necessary.
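
For instance, the range computed with `apply` can also be expressed with the built-in `max` and `min` methods. The sketch below uses a small made-up frame to show that both routes agree:

```python
import pandas as pd

frame = pd.DataFrame({"a": [1.0, 5.0, 3.0], "b": [2.0, 2.0, 8.0]})

# Range per column, once via apply and once via DataFrame methods
via_apply = frame.apply(lambda col: col.max() - col.min())
via_methods = frame.max() - frame.min()

# Range per row, using the axis argument of the built-in methods
row_range = frame.max(axis="columns") - frame.min(axis="columns")

print(via_methods)
print(row_range)
```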

The function passed to `apply` does not have to return a single value; it can also return a Series with multiple values:

In [14]:
def f(x):
    return pd.Series([x.min(), x.max()], index=["min", "max"])

df12.apply(f)

|   | 0 | 1 | 2 |
|:--|--:|--:|--:|
| min | -2.77013 | -1.465972 | -1.709924 |
| max | 0.030022 | 2.018909 | 0.136076 |

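For this particular case, `DataFrame.agg` with a list of function names produces the same table without a custom function. A short sketch on a made-up frame, where `min_max` is a hypothetical helper mirroring the function above:

```python
import pandas as pd

frame = pd.DataFrame({"a": [1.0, 5.0, 3.0], "b": [2.0, 2.0, 8.0]})


def min_max(col):
    # Return several statistics per column as a labelled Series
    return pd.Series([col.min(), col.max()], index=["min", "max"])


via_apply = frame.apply(min_max)

# agg accepts a list of function names and labels the rows accordingly
via_agg = frame.agg(["min", "max"])

print(via_agg)
```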

You can also use element-wise Python functions. Suppose you want to round each floating point value in `df12` to two decimal places. You can do this with [pandas.DataFrame.applymap](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.applymap.html), which has been deprecated since pandas 2.1 in favour of `DataFrame.map`:

In [15]:
f = lambda x: round(x, 2)

df12.applymap(f)

|   | 0 | 1 | 2 |
|:--|--:|--:|--:|
| 0 | -0.08 | 0.64 | 0.14 |
| 1 | -0.38 | 2.02 | -0.66 |
| 2 | -2.77 | -0.75 | -1.71 |
| 3 | -0.68 | 0.93 | -1.4 |
| 4 | -1.09 | 1.42 | -0.28 |
| 5 | 0.03 | -1.47 | 0.0 |
| 6 | -0.51 | 0.53 | 0.0 |


The reason for the name `applymap` is that Series has a `map` method for applying an element-wise function:

In [16]:
df12[2].map(f)

0    0.14
1   -0.66
2   -1.71
3   -1.40
4   -0.28
5    0.00
6    0.00
Name: 2, dtype: float64
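
Two alternatives are worth knowing here, sketched on a made-up frame. For rounding specifically, `DataFrame.round` avoids calling a Python function for every element. For other element-wise functions, `DataFrame.map` takes over from `applymap` from pandas 2.1 onwards; the `hasattr` fallback below is only one assumed way of supporting both older and newer versions:

```python
import pandas as pd

frame = pd.DataFrame({"a": [0.123, 1.987], "b": [2.5, 3.001]})

# Built-in, vectorised rounding
rounded = frame.round(2)

# DataFrame.map exists from pandas 2.1; fall back to applymap before that
elementwise = frame.map if hasattr(frame, "map") else frame.applymap

# Format every value as a string with two decimal places
formatted = elementwise(lambda x: f"{x:.2f}")

print(rounded)
print(formatted)
```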