### Ufuncs: Index Preservation 
* Because Pandas is designed to work with NumPy, any NumPy ufunc will work on Pandas Series and DataFrame objects.
* Let’s start by defining a simple Series and DataFrame on which to demonstrate this:

In [1]:
import numpy as np
import pandas as pd

In [24]:
rng = np.random.RandomState(42) # reproducible
ser = pd.Series(rng.randint(0,10,4))
ser

0    6
1    3
2    7
3    4
dtype: int64

In [4]:
df = pd.DataFrame(rng.randint(0,10,(3,4)), columns=['A', 'B', 'C', 'D'])
df

   A  B  C  D
0  6  9  2  6
1  7  4  3  7
2  7  2  5  4


In [6]:
# Apply a NumPy ufunc to either of these objects; the result is another Pandas object with the indices preserved
np.exp(ser)

0     403.428793
1      20.085537
2    1096.633158
3      54.598150
dtype: float64

In [8]:
np.sin((df * np.pi) / 4)

          A             B         C             D
0 -1.000000  7.071068e-01  1.000000 -1.000000e+00
1 -0.707107  1.224647e-16  0.707107 -7.071068e-01
2 -0.707107  1.000000e+00 -0.707107  1.224647e-16


In [9]:
np.sin(df.loc[0, 'A'] * np.pi / 4)

-1.0
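
As a small additional check, the sketch below applies a ufunc to a Series with a non-default string index. The name `temps` and its labels and values are made up for illustration; only `np.exp` and the Series constructor come from the cells above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the labels and values are illustrative only.
temps = pd.Series([0.0, 1.0, 2.0], index=['mon', 'tue', 'wed'])

# The ufunc is applied elementwise and the labels carry over unchanged.
result = np.exp(temps)

print(type(result).__name__)             # Series
print(result.index.equals(temps.index))  # True
```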

### Ufuncs: Index Alignment

* For binary operations on two Series or DataFrame objects, Pandas will align indices in the process of performing the operation. 
* This is very convenient when you are working with incomplete data, as we’ll see in some of the examples that follow.



In [10]:
area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                         'California': 423967}, name='area')

population = pd.Series({'California': 38332521, 'Texas': 26448193,
                               'New York': 19651127}, name='population')

In [11]:
# Compute population density
population / area

Alaska              NaN
California    90.413926
New York            NaN
Texas         38.018740
dtype: float64

* The resulting Series contains the union of the indices of the two input Series, which we can compute with the union method of the index:

In [14]:
area.index.union(population.index) # Union operation called on indexes

Index(['Alaska', 'California', 'New York', 'Texas'], dtype='object')
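
One way to see where the NaN values come from is to split the labels into those shared by both inputs and those found in only one. The following is a minimal sketch of that check; the names `density`, `shared` and `one_sided` are introduced here for illustration.

```python
import pandas as pd

area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                  'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127}, name='population')

density = population / area

# Labels present in both inputs produce numbers ...
shared = area.index.intersection(population.index)
# ... while labels present in only one input produce NaN.
one_sided = area.index.symmetric_difference(population.index)

print(sorted(shared))                    # ['California', 'Texas']
print(sorted(one_sided))                 # ['Alaska', 'New York']
print(density[one_sided].isna().all())   # True
print(density[shared].notna().all())     # True
```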

In [16]:
# Any label missing from either Series gives NaN in the result by default
A = pd.Series([2,4,6], index=[0,1,2])
B = pd.Series([1,3,5], index=[1,2,3])
A + B

0    NaN
1    5.0
2    9.0
3    NaN
dtype: float64

In [18]:
# Values are computed only for the intersection of the indexes; the result is indexed by their union
(A.index.intersection(B.index), A.index.union(B.index))

(Int64Index([1, 2], dtype='int64'), Int64Index([0, 1, 2, 3], dtype='int64'))

In [20]:
# Method with fill_value to account for nulls
A.add(B, fill_value=0) # where a label is missing from one Series, 0 stands in for the missing value, so the result is the other Series' value

0    2.0
1    5.0
2    9.0
3    5.0
dtype: float64
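
To make the role of fill_value concrete, the sketch below reproduces the result by hand: both Series are reindexed to the union of their labels with 0 as the filler, then added. The names `labels`, `manual` and `product` are introduced for illustration. The manual version matches here because every label exists in at least one of the two Series.

```python
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

# With fill_value=0, a label missing from one operand is treated as 0
# in that operand, so the result is just the other operand's value.
total = A.add(B, fill_value=0)

# Manual version: reindex both to the union of labels, then add.
labels = A.index.union(B.index)
manual = A.reindex(labels, fill_value=0) + B.reindex(labels, fill_value=0)

# The same idea works for the other arithmetic methods, e.g. mul with 1.
product = A.mul(B, fill_value=1)

print(total.tolist())    # [2.0, 5.0, 9.0, 5.0]
print(manual.tolist())   # [2, 5, 9, 5]
print(product.tolist())  # [2.0, 4.0, 18.0, 5.0]
```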

In [28]:
# Index Alignment in DataFrame
A = pd.DataFrame(rng.randint(0,20,(2,2)), columns=list('AB'))
B = pd.DataFrame(rng.randint(0,10,(3,3)), columns=list('BAC'))
display(A)
display(B)

    A   B
0   1  19
1  14   6

   B  A  C
0  7  2  0
1  3  1  7
2  3  1  5


In [29]:
A + B

      A     B   C
0   3.0  26.0 NaN
1  15.0   9.0 NaN
2   NaN   NaN NaN


* Notice that indices are aligned correctly irrespective of their order in the two objects, and indices in the result are sorted.
* As was the case with Series, we can use the associated object’s arithmetic method and pass any desired fill_value to be used in place of missing entries.
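
The alignment step can also be inspected on its own with the align method, which returns both frames reindexed to the union of their row and column labels (an outer join is its default). This is a sketch for inspection only. The frames are rebuilt from the literal values displayed above rather than from rng, and `A_aligned` and `B_aligned` are names introduced here.

```python
import pandas as pd

# Same values as the A and B displayed above, written out literally.
A = pd.DataFrame([[1, 19], [14, 6]], columns=list('AB'))
B = pd.DataFrame([[7, 2, 0], [3, 1, 7], [3, 1, 5]], columns=list('BAC'))

# Both frames are expanded to rows 0..2 and columns A, B, C;
# cells that did not exist in the original frame become NaN.
A_aligned, B_aligned = A.align(B)

print(A_aligned)
print(B_aligned)

# A has no column C and no row 2, so five of its aligned cells are NaN,
# and those are exactly the NaN cells of A + B.
print(int(A_aligned.isna().sum().sum()))   # 5
print(int((A + B).isna().sum().sum()))     # 5
```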

In [30]:
A.stack() # reshapes from wide to long: the columns become an inner index level, giving a Series

0  A     1
   B    19
1  A    14
   B     6
dtype: int64

In [31]:
A.stack().mean() # mean of all values in A, not a per-column mean

10.0

In [34]:
A.stack().index # the stacked result is a Series with a MultiIndex of (row, column) pairs, so it has no columns

MultiIndex([(0, 'A'),
            (0, 'B'),
            (1, 'A'),
            (1, 'B')],
           )
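
The reason for stacking before taking the mean is that the mean of a DataFrame is computed per column. The sketch below contrasts the two and shows an equivalent NumPy route, assuming the frame has no missing values. The names `via_stack`, `via_numpy` and `per_column` are illustrative.

```python
import pandas as pd

# Same values as the A displayed above, written out literally.
A = pd.DataFrame([[1, 19], [14, 6]], columns=list('AB'))

# Mean over every cell of A, computed two ways.
via_stack = A.stack().mean()     # flatten to a Series, then average
via_numpy = A.to_numpy().mean()  # average the underlying 2-D array

# A.mean() on its own returns one mean per column instead.
per_column = A.mean()

print(via_stack, via_numpy)   # 10.0 10.0
print(per_column.to_dict())   # {'A': 7.5, 'B': 12.5}
```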

In [35]:
A.add(B, fill_value=A.stack().mean())

      A     B     C
0   3.0  26.0  10.0
1  15.0   9.0  17.0
2  11.0  13.0  15.0


In [39]:
# Above, fill_value is the mean of all values in A (the mean of the "long" stacked Series), which is 10.0
# Column C and row 2 exist only in B, so their results are 10 + B's value; the result's columns are returned sorted
B[sorted(B.columns.tolist())] # B with its columns sorted, for comparison with the result in the cell above

   A  B  C
0  2  7  0
1  1  3  7
2  1  3  5


* In row 2 and in column C, neither of which exists in A, every result is the fill_value of 10 plus the corresponding value in B.
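
A few individual cells can be checked by hand to confirm this reading. The sketch rebuilds A and B from the literal values shown above; `fill` and `result` are names introduced here.

```python
import pandas as pd

A = pd.DataFrame([[1, 19], [14, 6]], columns=list('AB'))
B = pd.DataFrame([[7, 2, 0], [3, 1, 7], [3, 1, 5]], columns=list('BAC'))

fill = A.stack().mean()  # 10.0, the mean of all four values in A

# Cells that exist only in B (all of column C and all of row 2) are
# computed as fill + B's value; cells in both frames are summed as usual.
result = A.add(B, fill_value=fill)

print(result.loc[2, 'A'])   # 11.0 -> 10 + 1
print(result.loc[0, 'C'])   # 10.0 -> 10 + 0
print(result.loc[0, 'A'])   # 3.0  -> 1 + 2
```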

### Ufuncs: Operations Between DataFrame & Series

In [40]:
A_1 = rng.randint(10, size=(3,4))
A_1

array([[5, 9, 3, 5],
       [1, 9, 1, 9],
       [3, 7, 6, 8]])

In [42]:
A_1 - A_1[0]

array([[ 0,  0,  0,  0],
       [-4,  0, -2,  4],
       [-2, -2,  3,  3]])

* According to NumPy’s broadcasting rules (see “Computation on Arrays: Broadcasting”), subtraction between a two-dimensional array and one of its rows is applied row-wise.
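
For comparison, subtracting a column in plain NumPy requires reshaping it so that broadcasting stretches it across the columns instead. A minimal sketch, using the literal values of A_1 shown above (`row_wise` and `col_wise` are illustrative names):

```python
import numpy as np

A_1 = np.array([[5, 9, 3, 5],
                [1, 9, 1, 9],
                [3, 7, 6, 8]])

# A_1[0] has shape (4,), so it is stretched down the rows.
row_wise = A_1 - A_1[0]

# To subtract a column instead, give it shape (3, 1) so that it is
# stretched across the columns.
col_wise = A_1 - A_1[:, 1][:, np.newaxis]

print(row_wise)
print(col_wise)
```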

In [43]:
# In Pandas, the convention similarly operates row-wise by default:
df_1 = pd.DataFrame(A_1, columns=list('QRST'))
df_1

   Q  R  S  T
0  5  9  3  5
1  1  9  1  9
2  3  7  6  8


In [45]:
df_1 - df_1.iloc[0] # each row has row 0 subtracted from it, column by column
# Ex: the result at [2,'Q'] is df_1.loc[2,'Q'] - df_1.loc[0,'Q'] = 3 - 5 = -2

   Q  R  S  T
0  0  0  0  0
1 -4  0 -2  4
2 -2 -2  3  3


In [47]:
# To operate column-wise instead of the default row-wise, specify the axis
df_1.subtract(df_1.loc[:,'R'], axis=0)
# Without axis (subtracting row 0), the 8 at df_1.loc[2,'T'] has df_1.loc[0,'T'] subtracted from it: 8 - 5 = 3
# With axis=0 (subtracting column R), the 8 at df_1.loc[2,'T'] has df_1.loc[2,'R'] subtracted from it: 8 - 7 = 1

   Q  R  S  T
0 -4  0 -6 -4
1 -8  0 -8  0
2 -4  0 -1  1
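
The sketch below checks this column-wise result against the raw NumPy computation and shows that axis='index' is an alias for axis=0. df_1 is rebuilt from the literal values shown above; `by_rows`, `same` and `raw` are names introduced here.

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame([[5, 9, 3, 5],
                     [1, 9, 1, 9],
                     [3, 7, 6, 8]], columns=list('QRST'))

# axis=0 (alias 'index') matches the Series labels against the row index.
by_rows = df_1.subtract(df_1['R'], axis=0)
same = df_1.sub(df_1['R'], axis='index')

# The NumPy equivalent needs an explicit reshape of the column.
raw = df_1.to_numpy() - df_1['R'].to_numpy()[:, np.newaxis]

print(by_rows.equals(same))                      # True
print(np.array_equal(by_rows.to_numpy(), raw))   # True
```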


In [48]:
halfrow = df_1.iloc[0, ::2]
halfrow

Q    5
S    3
Name: 0, dtype: int64

In [49]:
df_1 - halfrow

     Q   R    S   T
0  0.0 NaN  0.0 NaN
1 -4.0 NaN -2.0 NaN
2 -2.0 NaN  3.0 NaN
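
If the NaN columns are not wanted, two possible workarounds are sketched below: restrict the operation to the columns the Series actually has, or pad the Series with zeros first. Neither comes from the cells above, and the names `shared_only`, `padded` and `all_columns` are illustrative.

```python
import pandas as pd

df_1 = pd.DataFrame([[5, 9, 3, 5],
                     [1, 9, 1, 9],
                     [3, 7, 6, 8]], columns=list('QRST'))
halfrow = df_1.iloc[0, ::2]   # Series with labels Q and S

# Option 1: operate only on the columns the Series actually has.
shared_only = df_1[halfrow.index] - halfrow

# Option 2: pad the Series with zeros so R and T pass through unchanged.
padded = halfrow.reindex(df_1.columns, fill_value=0)
all_columns = df_1 - padded

print(shared_only)
print(all_columns)
```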


In [50]:
df_1

   Q  R  S  T
0  5  9  3  5
1  1  9  1  9
2  3  7  6  8


* This preservation and alignment of indices and columns means that operations on data in Pandas will always maintain the data context, which prevents the types of silly errors that might come up when you are working with heterogeneous and/or misaligned data in raw NumPy arrays.
