## Arithmetic, Function Application, Mapping with pandas
### Working with pandas
*Curtis Miller*

Here we will see several examples of concepts discussed in the slides.

### `Series` Arithmetic
Let's first set things up.

In [None]:
import pandas as pd
from pandas import Series, DataFrame
import numpy as np

srs1 = Series([1, 9, -4, 3, 3])
srs2 = Series([2, 3, 4, 5, 10], index=[0, 1, 2, 3, 5])
print(srs1)

In [None]:
print(srs2)

Notice that the indices do not line up: `srs1` has labels 0 through 4, while `srs2` has labels 0, 1, 2, 3, and 5, even though both `Series` have length 5.

Predict the outcomes:

In [None]:
srs1 + srs2

In [None]:
srs1 * srs2

In [None]:
srs1 ** srs2
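
Arithmetic aligns on index labels and produces `NaN` wherever a label appears in only one operand. As a minimal sketch using the same values as `srs1` and `srs2`, the flexible method `Series.add` accepts a `fill_value` that stands in for the missing side instead:

```python
import pandas as pd

srs1 = pd.Series([1, 9, -4, 3, 3])                        # labels 0..4
srs2 = pd.Series([2, 3, 4, 5, 10], index=[0, 1, 2, 3, 5])

aligned_sum = srs1 + srs2                  # labels 4 and 5 have no partner -> NaN
filled_sum = srs1.add(srs2, fill_value=0)  # a missing partner is treated as 0

print(aligned_sum)
print(filled_sum)
```

The result index is the union of both label sets in either case; only the handling of unmatched labels differs.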

In [None]:
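# Comparison operators do not align indices: with different labels this raises
# ValueError ("Can only compare identically-labeled Series objects")
srs1 > srs2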

In [None]:
srs1 <= srs2    # The complement of >, but it raises the same ValueError here

In [None]:
# Same labels in a different order: the comparison still raises ValueError
srs1 > Series([1, 2, 3, 4, 5], index=[4, 3, 2, 1, 0])
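
If label-aligned comparison is what you actually want, the flexible method `Series.gt` aligns first, much like arithmetic does. A small sketch: labels present in only one `Series` are compared against `NaN`, which yields `False`.

```python
import pandas as pd

srs1 = pd.Series([1, 9, -4, 3, 3])
srs2 = pd.Series([2, 3, 4, 5, 10], index=[0, 1, 2, 3, 5])

# The > operator refuses to compare Series whose labels differ.
try:
    srs1 > srs2
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# gt() aligns on labels first, then compares element by element.
aligned_gt = srs1.gt(srs2)
print(aligned_gt)
```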

In [None]:
np.sqrt(srs2)

In [None]:
np.abs(srs1)

In [None]:
type(np.abs(srs1))
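
NumPy ufuncs such as `np.sqrt` and `np.abs` hand the work back to pandas, so the result keeps both the `Series` type and the index labels. A quick check, with string labels chosen here only for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series([4.0, 9.0, 16.0], index=["x", "y", "z"])
root = np.sqrt(s)   # a NumPy ufunc applied to a Series returns a Series

print(type(root).__name__)
print(root)
```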

In [None]:
# Define a custom element-wise function with np.vectorize (not a true NumPy ufunc): notice the decorator notation?
@np.vectorize
def trunc(x):
    return x if x > 0 else 0

trunc(np.array([-1, 5, 4, -3, 0]))

In [None]:
trunc(srs1)

In [None]:
type(trunc(srs1))
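
Because `np.vectorize` returns a plain `ndarray`, the index labels are lost. One way to keep them is to rebuild a `Series` around the result; for this particular truncation, pandas' built-in `Series.clip` does the same thing directly. A sketch, with labels `a` to `e` assumed for illustration:

```python
import numpy as np
import pandas as pd

@np.vectorize
def trunc(x):
    return x if x > 0 else 0

s = pd.Series([1, 9, -4, 3, 3], index=list("abcde"))

rewrapped = pd.Series(trunc(s), index=s.index)  # reattach the original labels
clipped = s.clip(lower=0)                       # built-in equivalent of trunc

print(rewrapped)
print(clipped)
```

`clip` is usually preferable when it fits, since it stays vectorized inside pandas instead of calling a Python function per element.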

### `Series` Methods and Function Application
Having seen basic arithmetic with Series, let's look at useful Series methods.

In [None]:
# Mean of a series
srs1.mean()

In [None]:
srs1.std()

In [None]:
srs1.max()

In [None]:
srs1.argmax()   # Integer position of the maximum; srs1.idxmax() returns its index label

In [None]:
srs1.cumsum()

In [None]:
srs1.abs()    # An alternative to the abs function in NumPy
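
With the default `0, 1, 2, ...` index, `argmax` and `idxmax` happen to agree. With string labels (assumed here for illustration) the difference shows, at least in recent pandas versions where `argmax` returns a position:

```python
import pandas as pd

s = pd.Series([1, 9, -4, 3, 3], index=["a", "b", "c", "d", "e"])

pos = s.argmax()      # integer position of the maximum
label = s.idxmax()    # index label of the maximum
running = s.cumsum()  # running total; labels are preserved

print(pos, label)
print(running.tolist())  # [1, 10, 6, 9, 12]
```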

Now let's look at function application and mapping.

In [None]:
srs1.apply(lambda x: x if x > 2 else 2)
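
Instead of hard-coding the threshold in a lambda, `Series.apply` can forward extra arguments to the function, either positionally through `args` or as keyword arguments. A sketch with a hypothetical helper `floor_at`:

```python
import pandas as pd

def floor_at(x, floor):
    """Hypothetical helper: return x, but never less than `floor`."""
    return x if x > floor else floor

s = pd.Series([1, 9, -4, 3, 3])

floored_2 = s.apply(floor_at, args=(2,))  # positional extra argument
floored_0 = s.apply(floor_at, floor=0)    # keyword extra argument

print(floored_2.tolist())  # [2, 9, 2, 3, 3]
print(floored_0.tolist())  # [1, 9, 0, 3, 3]
```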

In [None]:
srs3 = Series(['alpha', 'beta', 'gamma', 'delta'], index=['a', 'b', 'c', 'd'])
print(srs3)

In [None]:
obj = {"alpha": 1, "beta": 2, "gamma": -1, "delta": -3}
srs3.map(obj)

In [None]:
srs4 = Series(obj)
print(srs4)

In [None]:
srs3.map(srs4)
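
When mapping with a dictionary or a `Series`, any value without a matching key becomes `NaN`. A small sketch; the value `"epsilon"` is added here only to show a miss:

```python
import pandas as pd

greek = pd.Series(["alpha", "beta", "gamma", "epsilon"])
lookup = {"alpha": 1, "beta": 2, "gamma": -1}

mapped = greek.map(lookup)   # "epsilon" has no key in lookup -> NaN
print(mapped)
```

Because of the `NaN`, the result is stored as floats even though every mapped value is an integer.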

In [None]:
srs1.map(lambda x: x if x > 2 else 2)    # Works like apply

### `DataFrame`s
Many of the tricks that work with `Series` work with `DataFrame`s, but with some more complication.

In [None]:
df = DataFrame(np.arange(15).reshape(5, 3), columns=["AAA", "BBB", "CCC"])
print(df)

In [None]:
# AAA and BBB become 0; CCC is all NaN because the right operand has no CCC column
df - df.loc[:,["AAA", "BBB"]]
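
As with `Series`, the flexible `DataFrame.sub` method accepts a `fill_value`, which is used where a cell exists on only one side. A minimal sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(15).reshape(5, 3), columns=["AAA", "BBB", "CCC"])
partial = df.loc[:, ["AAA", "BBB"]]

diff = df.sub(partial, fill_value=0)   # CCC is missing on the right -> treated as 0
print(diff)
```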

In [None]:
df.mean()

In [None]:
df.std()

In [None]:
# This is known as standardization
(df - df.mean())/df.std()
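
This works because `df.mean()` and `df.std()` are `Series` indexed by column name; subtracting or dividing a `DataFrame` by such a `Series` matches its labels against the columns and broadcasts down the rows. A quick sanity check: every standardized column should have mean 0 and sample standard deviation (`ddof=1`, the pandas default) equal to 1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(15).reshape(5, 3), columns=["AAA", "BBB", "CCC"])
z = (df - df.mean()) / df.std()   # column-wise standardization

print(z.mean().round(10))   # each column is 0 up to rounding
print(z.std())              # each column is 1
```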

Let's now look at vectorized functions (ufuncs) applied to a `DataFrame`.

In [None]:
np.sqrt(df)

In [None]:
# trunc comes from np.vectorize, not a true ufunc: it returns an ndarray, not a DataFrame
trunc(df)
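
If you need the labels back, one option is to rebuild a `DataFrame` from the array using the original index and columns. A sketch on a small frame made up for illustration:

```python
import numpy as np
import pandas as pd

@np.vectorize
def trunc(x):
    return x if x > 0 else 0

df = pd.DataFrame([[-1, 2], [3, -4]], columns=["AAA", "BBB"])

arr = trunc(df)                                          # plain ndarray, labels lost
back = pd.DataFrame(arr, index=df.index, columns=df.columns)

print(type(arr).__name__)
print(back)
```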

In [None]:
# Mixed data
df2 = DataFrame({"AAA": [1, 2, 3, 4], "BBB": [0, -9, 9, 3], "CCC": ["Bob", "Terry", "Matt", "Simon"]})
print(df2)

In [None]:
# Produces a TypeError: np.sqrt is not defined for the strings in column CCC
np.sqrt(df2)

In [None]:
# Let's select JUST numeric data
# The select_dtypes() method selects columns based on their dtype
# np.number indicates numeric dtypes
# Here we select columns only with numeric data
df2.select_dtypes([np.number])

In [None]:
# BBB contains -9, so its square root is NaN (NumPy also emits a RuntimeWarning)
np.sqrt(df2.select_dtypes([np.number]))
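
Selecting the numeric columns drops `CCC` from the result. To transform only the numeric columns while keeping the rest of the frame, one approach is to assign back into a copy. A sketch; `np.abs` is used here instead of `np.sqrt` only to avoid the `NaN` from `-9`:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({"AAA": [1, 2, 3, 4], "BBB": [0, -9, 9, 3],
                    "CCC": ["Bob", "Terry", "Matt", "Simon"]})

numeric_cols = df2.select_dtypes(include=[np.number]).columns
result = df2.copy()
result[numeric_cols] = np.abs(result[numeric_cols])   # ufunc on numeric columns only

print(result)
```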

A brief look at function application. Here we work with a function that computes the geometric mean, which is defined as:

$$\text{geometric mean} = \left(\prod_{i = 1}^n x_i\right)^{\frac{1}{n}}$$

In [None]:
# Define a function for the geometric mean
def geomean(srs):
    return srs.prod() ** (1 / len(srs))   # prod method is product of all elements of srs

# Demo
geomean(Series([2, 3, 4]))
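
For positive values the same quantity can be computed through logarithms, since the log of the geometric mean is the arithmetic mean of the logs:

$$\left(\prod_{i = 1}^n x_i\right)^{\frac{1}{n}} = \exp\left(\frac{1}{n}\sum_{i = 1}^n \log x_i\right)$$

A sketch with a hypothetical `geomean_log`, assuming all values are strictly positive. It avoids forming the full product, which for long integer `Series` can overflow. (SciPy also provides `scipy.stats.gmean`.)

```python
import numpy as np
import pandas as pd

def geomean_log(srs):
    """Hypothetical helper: exp(mean(log x)); assumes every value is > 0."""
    return np.exp(np.log(srs).mean())

s = pd.Series([2, 3, 4])
print(geomean_log(s))              # about 2.884
print(s.prod() ** (1 / len(s)))    # same value via the product formula
```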

In [None]:
df.apply(geomean)

In [None]:
df.apply(geomean, axis='columns')
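
With the default `axis=0` the function receives each column as a `Series`; with `axis='columns'` it receives each row. A small sketch makes the shapes visible:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(15).reshape(5, 3), columns=["AAA", "BBB", "CCC"])

col_ranges = df.apply(lambda col: col.max() - col.min())   # one value per column
row_sums = df.apply(lambda row: row.sum(), axis="columns")  # one value per row

print(col_ranges)
print(row_sums)
```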

In [None]:
# Apply a truncation function to each element of df
# (pandas >= 2.1 deprecates applymap in favor of the equivalent DataFrame.map)
df.applymap(lambda x: x if x > 3 else 3)
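
A version-tolerant sketch, assuming only that `DataFrame.map` exists from pandas 2.1 onward and `applymap` before that; `floor3` is a hypothetical helper. For this particular function, the vectorized `df.clip(lower=3)` gives the same values without a per-element Python call.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(15).reshape(5, 3), columns=["AAA", "BBB", "CCC"])

def floor3(x):
    """Hypothetical helper: values below 3 are raised to 3."""
    return x if x > 3 else 3

# DataFrame.map exists from pandas 2.1; older versions only have applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap
result = elementwise(floor3)

print(result)
```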