# 03.03 - Operating on Data in Pandas

Pandas inherits the NumPy <code>ufuncs</code> seen in section 02.03.

Additionally, for unary operations ufuncs will **preserve index and column labels** in the output, and for binary operations, Pandas will automatically **align indices** when passing the objects to the ufunc. 

### Ufuncs: Index Preservation

Because Pandas is designed to work with NumPy, any NumPy ufunc will work on Pandas <code>Series</code> and <code>DataFrame</code> objects.

In [1]:
import pandas as pd
import numpy as np

In [2]:
rng = np.random.RandomState(42)
ser = pd.Series(rng.randint(0, 10, 4))
ser

0    6
1    3
2    7
3    4
dtype: int32

In [3]:
df = pd.DataFrame(rng.randint(0, 10, (3, 4)),
                  columns=['A', 'B', 'C', 'D'])
df

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | 6 | 9 | 2 | 6 |
| 1 | 7 | 4 | 3 | 7 |
| 2 | 7 | 2 | 5 | 4 |


In [4]:
np.exp(ser)

0     403.428793
1      20.085537
2    1096.633158
3      54.598150
dtype: float64

In [5]:
np.cos(df * np.pi / 4)

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | -1.83697e-16 | 0.7071068 | 6.123234e-17 | -1.83697e-16 |
| 1 | 0.7071068 | -1.0 | -0.7071068 | 0.7071068 |
| 2 | 0.7071068 | 6.123234e-17 | -0.7071068 | -1.0 |


As we can see, the index (and, for the DataFrame, the column labels) is preserved in the output of both operations. Tiny values such as 6.123234e-17 are floating-point approximations of zero.
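Because the examples above use the default integer index, the preservation is easy to overlook. A minimal sketch with labeled data (the labels and values are made up for illustration) shows that both the labels and the container type survive a unary ufunc:

```python
import numpy as np
import pandas as pd

# Hypothetical labeled Series; values chosen so the square roots are exact
s = pd.Series([1.0, 4.0, 9.0], index=['x', 'y', 'z'])
root = np.sqrt(s)  # unary ufunc applied element-wise

print(type(root).__name__)   # Series
print(root.index.tolist())   # ['x', 'y', 'z']
print(root.tolist())         # [1.0, 2.0, 3.0]

# The same holds for a DataFrame: row and column labels are kept
frame = pd.DataFrame({'p': [0.0, 1.0], 'q': [2.0, 3.0]}, index=['r0', 'r1'])
neg = np.negative(frame)
print(neg.columns.tolist(), neg.index.tolist())  # ['p', 'q'] ['r0', 'r1']
```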

### Ufuncs: Index Alignment

As an example, suppose we are combining two **different data sources** and have only the top three US states by area and the top three US states by population:

In [6]:
area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                  'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127}, name='population')

In [7]:
population / area

Alaska              NaN
California    90.413926
New York            NaN
Texas         38.018740
dtype: float64

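The resulting Series contains the **union** of the indices of the two inputs; any label that is missing from one of them is marked <code>NaN</code>. This union can be computed directly with the <code>Index.union</code> method (older Pandas versions also accepted <code>area.index | population.index</code>, but using <code>|</code> as a set operation on an Index has since been deprecated):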

In [8]:
area.index.union(population.index)

Index(['Alaska', 'California', 'New York', 'Texas'], dtype='object')
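The counterpart of the union is the intersection: only labels present in both inputs get a non-NaN result. A short sketch, reusing the <code>area</code> and <code>population</code> data above (the name <code>density</code> is just illustrative), checks this with <code>Index.intersection</code> and <code>dropna</code>:

```python
import pandas as pd

area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                  'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127}, name='population')

density = population / area  # aligned on the union of the two indices

# Labels present in both inputs: the only ones with a non-NaN result
common = area.index.intersection(population.index)
print(sorted(common))                   # ['California', 'Texas']

# Dropping the NaN entries leaves exactly those labels
print(sorted(density.dropna().index))   # ['California', 'Texas']
```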

If <code>NaN</code> is not the desired result, we can specify a fill value for missing entries by calling the equivalent method instead of using the operator:

In [9]:
A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])
A.add(B, fill_value=0.0)

0    2.0
1    5.0
2    9.0
3    5.0
dtype: float64
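One subtlety, per the Pandas documentation for <code>Series.add</code>: <code>fill_value</code> replaces a value that is missing on *one* side, but if both sides are missing at a label, the result stays <code>NaN</code>. A minimal sketch with made-up data:

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1.0, np.nan, 3.0], index=['a', 'b', 'c'])
s2 = pd.Series([10.0, np.nan], index=['a', 'b'])

total = s1.add(s2, fill_value=0)
# 'a': both present          -> 1.0 + 10.0
# 'b': missing on both sides -> fill_value is not used, stays NaN
# 'c': missing only in s2    -> 3.0 + 0
print(total)
```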

### Index Alignment in DataFrames

A similar type of alignment takes place for both columns and indices when performing operations on DataFrames:

In [10]:
A = pd.DataFrame(rng.randint(0, 20, (2, 2)),
                 columns=list('AB'))
A

|   | A | B |
|---|---|---|
| 0 | 1 | 11 |
| 1 | 5 | 1 |


In [11]:
B = pd.DataFrame(rng.randint(0, 10, (3, 3)),
                 columns=list('BAC'))
B

|   | B | A | C |
|---|---|---|---|
| 0 | 4 | 0 | 9 |
| 1 | 5 | 8 | 0 |
| 2 | 9 | 2 | 6 |


In [12]:
A + B

|   | A | B | C |
|---|---|---|---|
| 0 | 1.0 | 15.0 | NaN |
| 1 | 13.0 | 6.0 | NaN |
| 2 | NaN | NaN | NaN |
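Conceptually, the addition first aligns both frames on the union of their row and column labels and then adds element-wise. <code>DataFrame.align</code> exposes that alignment step on its own. This sketch hard-codes the values of <code>A</code> and <code>B</code> shown above under the illustrative names <code>left</code> and <code>right</code>, and checks that adding the outer-aligned frames reproduces <code>left + right</code>:

```python
import pandas as pd

# Same values as A and B above, hard-coded so the sketch is self-contained
left = pd.DataFrame([[1, 11], [5, 1]], columns=list('AB'))
right = pd.DataFrame([[4, 0, 9], [5, 8, 0], [9, 2, 6]], columns=list('BAC'))

# Outer-align both frames on the union of their row and column labels;
# labels absent from a frame are filled with NaN
left_al, right_al = left.align(right, join='outer')
print(left_al.shape, right_al.shape)              # (3, 3) (3, 3)

# Element-wise addition of the aligned frames matches left + right
print((left_al + right_al).equals(left + right))  # True
```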


Here, too, we can supply a custom fill value. In this case, we fill with the mean of all values in <code>A</code>, computed by first stacking its rows into a single Series:

In [13]:
mean_fill = A.stack().mean()
A.add(B, fill_value=mean_fill)

|   | A | B | C |
|---|---|---|---|
| 0 | 1.0 | 15.0 | 13.5 |
| 1 | 13.0 | 6.0 | 4.5 |
| 2 | 6.5 | 13.5 | 10.5 |
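Note that <code>fill_value</code> substitutes for the *missing operand* before the addition, which differs from filling the *result* afterwards with <code>fillna</code>. A small sketch contrasting the two, using the same values as <code>A</code> and <code>B</code> above (hard-coded, with a fill value of 0 chosen for illustration):

```python
import pandas as pd

left = pd.DataFrame([[1, 11], [5, 1]], columns=list('AB'))
right = pd.DataFrame([[4, 0, 9], [5, 8, 0], [9, 2, 6]], columns=list('BAC'))

# fill_value: a missing operand is treated as 0, so the other operand survives
before = left.add(right, fill_value=0)

# fillna: add first (NaN wherever a label is missing), then replace the NaNs
after = (left + right).fillna(0)

print(before.loc[2, 'A'], after.loc[2, 'A'])  # 2.0 0.0
print(before.loc[0, 'C'], after.loc[0, 'C'])  # 9.0 0.0
```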


The following table lists each Python operator and its equivalent Pandas method(s):

<pre>
Python operator   Pandas method(s)
+                 add()
-                 sub(), subtract()
*                 mul(), multiply()
/                 truediv(), div(), divide()
//                floordiv()
%                 mod()
**                pow()
</pre>
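A quick way to convince yourself that each operator and its method are interchangeable is to compare them directly. This sketch uses Python's standard <code>operator</code> module and two made-up Series; it also checks one of the reversed methods (<code>rsub</code>), which swaps the operands:

```python
import operator
import pandas as pd

s = pd.Series([7, 8, 9])
t = pd.Series([2, 3, 4])

# Python operator function -> equivalent Pandas method name
pairs = {
    operator.add: 'add',
    operator.sub: 'sub',
    operator.mul: 'mul',
    operator.truediv: 'truediv',
    operator.floordiv: 'floordiv',
    operator.mod: 'mod',
    operator.pow: 'pow',
}

for op, name in pairs.items():
    via_operator = op(s, t)
    via_method = getattr(s, name)(t)
    print(name, via_operator.equals(via_method))  # True for each pair

# Reversed variant: s.rsub(t) computes t - s
print(s.rsub(t).equals(t - s))  # True
```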

### Ufuncs: Operations Between DataFrame and Series

Operations between a <code>DataFrame</code> and a <code>Series</code> are similar to operations between a two-dimensional and a one-dimensional NumPy array.

Consider one common operation, where we find the difference of a two-dimensional array and one of its rows:

In [14]:
A = rng.randint(10, size=(3, 4))
A

array([[3, 8, 2, 4],
       [2, 6, 4, 8],
       [6, 1, 3, 8]])

In [15]:
A - A[0]

array([[ 0,  0,  0,  0],
       [-1, -2,  2,  4],
       [ 3, -7,  1,  4]])
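In plain NumPy, subtracting a *column* instead requires keeping it two-dimensional, so that broadcasting stretches it across the columns rather than the rows. A sketch using the same values as <code>A</code> above (hard-coded, since they depend on the random state):

```python
import numpy as np

A = np.array([[3, 8, 2, 4],
              [2, 6, 4, 8],
              [6, 1, 3, 8]])

# A[0] has shape (4,), so it is broadcast down the rows (row-wise)
print(A - A[0])

# To subtract a column, index with a list to keep shape (3, 1)
col = A[:, [1]]      # second column as a (3, 1) array
print(A - col)
# [[-5  0 -6 -4]
#  [-4  0 -2  2]
#  [ 5  0  2  7]]
```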

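Pandas follows the same convention: by default, an operation between a <code>DataFrame</code> and a <code>Series</code> is applied row-wise, matching the Series index against the DataFrame columns. To operate column-wise instead, we can call the corresponding method and specify the <code>axis</code> keyword: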

In [17]:
df = pd.DataFrame(A, columns=list('QRST'))
df

|   | Q | R | S | T |
|---|---|---|---|---|
| 0 | 3 | 8 | 2 | 4 |
| 1 | 2 | 6 | 4 | 8 |
| 2 | 6 | 1 | 3 | 8 |


In [18]:
df.subtract(df['R'], axis=0)

|   | Q | R | S | T |
|---|---|---|---|---|
| 0 | -5 | 0 | -6 | -4 |
| 1 | -4 | 0 | -2 | 2 |
| 2 | 5 | 0 | 2 | 7 |
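For comparison, a sketch of the row-wise default with the operator, plus a case showing that label alignment still applies between a DataFrame and a Series: a Series holding only some of the column labels (the name <code>halfrow</code> is illustrative) leaves the other columns as <code>NaN</code>. The values of <code>df</code> are hard-coded from the output above:

```python
import pandas as pd

df = pd.DataFrame([[3, 8, 2, 4],
                   [2, 6, 4, 8],
                   [6, 1, 3, 8]], columns=list('QRST'))

# Default: the Series index is matched against the columns (row-wise)
print(df - df.iloc[0])

# Alignment still applies: this Series holds only columns Q and S
halfrow = df.iloc[0, ::2]
print(halfrow.index.tolist())   # ['Q', 'S']
print(df - halfrow)             # columns R and T come out as NaN
```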
