# Operating on Data in Pandas

Copied from [https://github.com/jakevdp/PythonDataScienceHandbook](https://github.com/jakevdp/PythonDataScienceHandbook) with modifications to demonstrate notebook diffing.

One of the essential pieces of NumPy is the ability to perform quick element-wise operations, both with basic arithmetic (addition, subtraction, multiplication, etc.) and with more sophisticated operations (trigonometric functions, exponential and logarithmic functions, etc.).
Pandas inherits much of this functionality from NumPy, and the ufuncs that we introduced in [Computation on NumPy Arrays: Universal Functions](https://gitnotebooks.com/blog) are key to this.

Pandas includes a couple of useful twists, however: for unary operations like negation and trigonometric functions, these ufuncs will *preserve index and column labels* in the output, and for binary operations such as addition and multiplication, Pandas will automatically *align indices* when passing the objects to the ufunc.
This means that keeping the context of data and combining data from different sources–both potentially error-prone tasks with raw NumPy arrays–become essentially foolproof ones with Pandas.
We will additionally see that there are well-defined operations between one-dimensional ``Series`` structures and two-dimensional ``DataFrame`` structures.

In [26]:
import pandas as pd
import numpy as np
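
To illustrate the label-preserving behavior of unary operations described above, here is a minimal sketch. The names `ser` and `df`, their values, and their labels are made up for this illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the labels are arbitrary.
ser = pd.Series([0.0, 1.0, 2.0], index=["a", "b", "c"])
df = pd.DataFrame([[0.0, 1.0], [2.0, 3.0]],
                  index=["x", "y"], columns=["P", "Q"])

exp_ser = np.exp(ser)  # a Series that keeps the index a, b, c
neg_df = -df           # a DataFrame with the same row and column labels
sin_df = np.sin(df)    # a NumPy ufunc applied to a DataFrame keeps labels too

print(exp_ser.index.tolist())
print(neg_df)
```

In each case the result is a Pandas object of the same kind as the input, carrying the same labels.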

When two ``Series`` are combined with a binary operation, the index of the result is the union of the two input indices.
Any item for which one or the other does not have an entry is marked with ``NaN``, or "Not a Number," which is how Pandas marks missing data (see further discussion of missing data in [Handling Missing Data](03.04-Missing-Values.ipynb)).
Index matching works this way for all of Python's built-in arithmetic expressions; any missing values are filled in with ``NaN`` by default:

In [27]:
A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])
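
As a short sketch of the default behavior, the two ``Series`` defined above can simply be added with the ``+`` operator. The variable name `total` is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

total = A + B
# Labels 0 and 3 appear in only one operand, so they come out as NaN.
# Labels 1 and 2 appear in both: 4 + 1 = 5 and 6 + 3 = 9.
print(total)
```

Because ``NaN`` is a floating-point value, the result is upcast to ``float64`` even though both inputs hold integers.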

If using ``NaN`` values is not the desired behavior, the fill value can be modified using the appropriate object methods in place of the operators.
For example, calling ``A.add(B)`` is equivalent to calling ``A + B``, but allows optional explicit specification of the fill value for any elements in ``A`` or ``B`` that might be missing. The same holds for ``A.subtract(B)`` and ``A - B``:

In [28]:
A.subtract(B, fill_value=0)

0    2.0
1    3.0
2    3.0
3   -5.0
dtype: float64
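
The relationship between the operators and their method counterparts can be checked directly. The following is a small sketch using the same two ``Series``; the variable names are chosen for this illustration.

```python
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

# Without fill_value, the method and the operator agree (NaN included).
same_as_operator = A.add(B).equals(A + B)

# With fill_value=0, a label missing from one operand is treated as 0.
filled_sum = A.add(B, fill_value=0)
filled_diff = A.subtract(B, fill_value=0)

print(same_as_operator)
print(filled_sum)
```

Note that ``fill_value`` is substituted for the missing operand before the operation is applied, which is why label 3 in the difference is ``0 - 5``.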

Notice that indices are aligned correctly regardless of their order in the two objects, and that the indices of the result are sorted.
As was the case with ``Series``, we can use the associated object's arithmetic method and pass any desired ``fill_value`` to be used in place of missing entries.
Here we'll fill with the sum of all values in ``A`` (computed by first stacking the rows of ``A``):

In [29]:
rng = np.random.RandomState(42)
A = pd.DataFrame(rng.randint(0, 20, (2, 2)),
                 columns=list('AB'))
B = pd.DataFrame(rng.randint(0, 10, (3, 3)),
                 columns=list('BAC'))
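
Since the frames above hold random values, the alignment is easier to follow with fixed numbers. This is a sketch with hypothetical frames `left` and `right` that mirror the shapes and column orders of ``A`` and ``B``; the values are invented.

```python
import pandas as pd

# Hypothetical stand-ins for A and B with hand-picked values.
left = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"))
right = pd.DataFrame([[10, 20, 30], [40, 50, 60], [70, 80, 90]],
                     columns=list("BAC"))

result = left + right
# Columns are matched by label, not by position:
#   result.loc[0, "A"] is 1 + 20, and result.loc[0, "B"] is 2 + 10.
# Column C and row 2 exist only in `right`, so they are NaN.
print(result)
```

The result covers the union of the row and column labels, with the columns in sorted order.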

In [127]:
# Large cells? No problem. Cells are collapsed to showcase the diff

fill = A.stack().sum()
A.add(B, fill_value=fill)

      A     B
0  19.0  26.0
1   8.0  19.0
2  53.0  56.0
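
To make the effect of ``fill_value`` on a ``DataFrame`` easier to trace, here is a sketch with fixed numbers. The frames `left` and `right` are hypothetical stand-ins with invented values. Stacking turns the frame into a single ``Series``, so any reduction over it yields a scalar that can serve as the fill.

```python
import pandas as pd

# Hypothetical stand-ins with hand-picked values.
left = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"))
right = pd.DataFrame([[10, 20, 30], [40, 50, 60], [70, 80, 90]],
                     columns=list("BAC"))

stacked = left.stack()     # one Series holding all four values of `left`
fill_sum = stacked.sum()   # 1 + 2 + 3 + 4 = 10
fill_mean = stacked.mean() # 10 / 4 = 2.5

filled = left.add(right, fill_value=fill_sum)
# Wherever `left` has no entry (all of column C, all of row 2),
# fill_sum is used in its place before adding.
print(filled)
```

Entries present in both frames are added as usual. An entry missing from both frames would still come out as ``NaN``, but that case does not arise here.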


## Ufuncs: Operations Between DataFrame and Series

When performing operations between a ``DataFrame`` and a ``Series``, the index and column alignment is similarly maintained.
Operations between a ``DataFrame`` and a ``Series`` are similar to operations between a two-dimensional and one-dimensional NumPy array.
Consider one common operation, where we find the difference of a two-dimensional array and one of its rows:

In [31]:
A = rng.randint(10, size=(3, 4))
A

array([[7, 7, 2, 5],
       [4, 1, 7, 5],
       [1, 4, 0, 9]])

In [32]:
A - A[0]

array([[ 0,  0,  0,  0],
       [-3, -6,  5,  0],
       [-6, -3, -2,  4]])
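
The same subtraction can be sketched in Pandas. The array values below are copied from the output above so the example does not depend on the random state. The column labels ``Q``, ``R``, ``S``, ``T`` and the variable names are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

values = np.array([[7, 7, 2, 5],
                   [4, 1, 7, 5],
                   [1, 4, 0, 9]])
df = pd.DataFrame(values, columns=list("QRST"))  # hypothetical column labels

# By default the Series index is matched against the DataFrame columns,
# so the first row is subtracted from every row.
row_diff = df - df.iloc[0]

# To operate column-wise, use the method form and pass axis=0,
# which matches the Series index against the DataFrame index instead.
col_diff = df.subtract(df["R"], axis=0)

print(row_diff)
print(col_diff)
```

The row-wise result holds the same numbers as ``A - A[0]`` on the raw array, with the column labels carried along.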

### GitNotebooks v1 Features

<table>
    <thead><tr><th>Feature</th><th>Supported</th></tr></thead>
    <tbody>
        <tr>
            <td>Visual notebook diffs</td>
            <td>✓</td>
        </tr>
        <tr>
            <td>Line comments</td>
            <td>✗</td>
        </tr>
        <tr>
            <td>Markdown comment</td>
            <td>✗</td>
        </tr>
        <tr>
            <td>Dataframe diffing</td>
            <td>✗</td>
        </tr>
    </tbody>
</table>