# 12.1 Categorical Data

## 12.1.1 Background and Motivation

A column in a table may contain repeated instances of a smaller set of distinct values. Functions like unique and value_counts enable us to extract the distinct values from an array and compute their frequencies, respectively.

In [1]:
import numpy as np
import pandas as pd

values = pd.Series(['apple','orange','apple','apple'] * 2)
values

0     apple
1    orange
2     apple
3     apple
4     apple
5    orange
6     apple
7     apple
dtype: object

In [2]:
pd.unique(values)

array(['apple', 'orange'], dtype=object)

In [3]:
values.unique()

array(['apple', 'orange'], dtype=object)

In [4]:
pd.value_counts(values)

apple     6
orange    2
dtype: int64

In [5]:
values.value_counts()

apple     6
orange    2
dtype: int64

Many data systems (for data warehousing, statistical computing, or other uses) have developed specialized approaches for representing data with repeated values for more efficient storage and computation. In data warehousing, a best practice is to use so-called dimension tables containing the distinct values and to store the primary observations as integer keys referencing the dimension table:

In [6]:
values = pd.Series([0,1,0,0] * 2)
dim = pd.Series(['apple', 'orange'])

In [7]:
values

0    0
1    1
2    0
3    0
4    0
5    1
6    0
7    0
dtype: int64

In [8]:
dim

0     apple
1    orange
dtype: object

In [9]:
dim.take(values)

0     apple
1    orange
0     apple
0     apple
0     apple
1    orange
0     apple
0     apple
dtype: object

This representation as integers is called the categorical or dictionary-encoded representation. The array of distinct values can be called the categories, dictionary, or levels of the data. The integer values that reference the categories are called the category codes, or simply codes.
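As a minimal sketch (one way among several), pd.factorize can derive such an encoding directly from the string Series used earlier: it returns an integer codes array together with the distinct values, in order of first appearance. Looking the codes back up with take recovers the original data.

```python
import pandas as pd

values = pd.Series(['apple', 'orange', 'apple', 'apple'] * 2)

# factorize returns (codes, uniques); uniques are in order of first appearance
codes, uniques = pd.factorize(values)
print(codes)    # integer codes referencing uniques
print(uniques)  # the "dictionary" of distinct values

# Decoding: look up each code in the dictionary of distinct values
decoded = pd.Series(uniques).take(codes).reset_index(drop=True)
print(decoded.equals(values))
```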

The categorical representation can yield significant performance improvements when doing analytics. You can also perform transformations on the categories while leaving the codes unmodified. Some example transformations that can be made at relatively low cost are:
- renaming categories
- appending a new category without changing the order or position of the existing categories
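A minimal sketch of both transformations, using the rename_categories and add_categories methods of the cat accessor (introduced in 12.1.4); in both cases only the small categories array changes and the integer codes come out identical:

```python
import pandas as pd

fruit = pd.Series(['apple', 'orange', 'apple', 'apple'] * 2, dtype='category')

# Renaming rewrites only the categories array, not the codes
renamed = fruit.cat.rename_categories(['APPLE', 'ORANGE'])

# Appending a new category leaves existing categories (and codes) in place
extended = fruit.cat.add_categories(['banana'])

print(list(renamed.cat.categories))
print(list(extended.cat.categories))
# The integer codes are identical in all three versions
print(fruit.cat.codes.equals(renamed.cat.codes),
      fruit.cat.codes.equals(extended.cat.codes))
```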

## 12.1.2 Categorical Type in pandas

pandas has a special Categorical type for holding data that uses the integer-based categorical representation or encoding. 

In [11]:
fruits = ['apple', 'orange', 'apple', 'apple'] * 2
N = len(fruits)
df = pd.DataFrame({'fruit': fruits,
                   'basket_id': np.arange(N),
                   'count': np.random.randint(3,15,size = N),
                   'weight': np.random.uniform(0,4,size = N)},
                   columns = ['basket_id', 'fruit', 'count', 'weight'])
df

   basket_id   fruit  count    weight
0          0   apple      9  3.129319
1          1  orange     12  0.290536
2          2   apple     12  1.603510
3          3   apple     11  0.752839
4          4   apple     13  0.575561
5          5  orange      4  1.085071
6          6   apple      6  1.492150
7          7   apple     12  1.619720


df['fruit'] is an array of Python string objects. We can convert it to categorical:

In [15]:
df['fruit']

0     apple
1    orange
2     apple
3     apple
4     apple
5    orange
6     apple
7     apple
Name: fruit, dtype: object

In [21]:
type(df['fruit'].values)

numpy.ndarray

In [16]:
fruit_cat = df['fruit'].astype(dtype = 'category')
fruit_cat

0     apple
1    orange
2     apple
3     apple
4     apple
5    orange
6     apple
7     apple
Name: fruit, dtype: category
Categories (2, object): [apple, orange]

In [22]:
type(fruit_cat.values)

pandas.core.arrays.categorical.Categorical

The values for fruit_cat are not a NumPy array, but an instance of pandas.Categorical:

In [23]:
c = fruit_cat.values
type(c)

pandas.core.arrays.categorical.Categorical

The Categorical object has categories and codes attributes:

In [24]:
c.categories

Index(['apple', 'orange'], dtype='object')

In [25]:
c.codes

array([0, 1, 0, 0, 0, 1, 0, 0], dtype=int8)

You can convert a DataFrame column to categorical by assigning the converted result back to the column:

In [26]:
df['fruit'] = df['fruit'].astype(dtype = 'category')
df.fruit

0     apple
1    orange
2     apple
3     apple
4     apple
5    orange
6     apple
7     apple
Name: fruit, dtype: category
Categories (2, object): [apple, orange]

In [27]:
df

   basket_id   fruit  count    weight
0          0   apple      9  3.129319
1          1  orange     12  0.290536
2          2   apple     12  1.603510
3          3   apple     11  0.752839
4          4   apple     13  0.575561
5          5  orange      4  1.085071
6          6   apple      6  1.492150
7          7   apple     12  1.619720


You can also create a pandas.Categorical directly from other types of Python sequences:

In [28]:
my_categories = pd.Categorical(values = ['foo', 'bar', 'baz', 'foo', 'bar'])
my_categories

[foo, bar, baz, foo, bar]
Categories (3, object): [bar, baz, foo]

In [29]:
my_categories.categories

Index(['bar', 'baz', 'foo'], dtype='object')

In [30]:
my_categories.codes

array([2, 0, 1, 2, 0], dtype=int8)

In [31]:
categories = ['foo', 'bar', 'baz']
codes = [0, 1, 2, 0, 0, 1]
my_cats_2 = pd.Categorical.from_codes(codes, categories)
my_cats_2

[foo, bar, baz, foo, foo, bar]
Categories (3, object): [foo, bar, baz]

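Unless explicitly specified, categorical conversions assume no specific ordering of the categories, so the categories array may be in a different order than you expect, depending on the input data. With from_codes or any of the other constructors, you can indicate that the categories have a meaningful ordering: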

In [32]:
ordered_cat = pd.Categorical.from_codes(codes, categories, ordered = True)
ordered_cat

[foo, bar, baz, foo, foo, bar]
Categories (3, object): [foo < bar < baz]

In [36]:
ordered_cat.ordered

True

In [37]:
my_cats_2.as_ordered()

[foo, bar, baz, foo, foo, bar]
Categories (3, object): [foo < bar < baz]
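As a sketch of the difference the constructor arguments make (the values here are illustrative): without categories, pandas infers them from the data, sorting them when possible; passing categories explicitly fixes their order and hence the codes, and ordered=True lets comparisons and min/max follow that order.

```python
import pandas as pd

data = ['foo', 'bar', 'baz', 'foo', 'bar']

# No categories given: pandas infers them from the data (sorted when possible)
inferred = pd.Categorical(data)

# Explicit categories fix the order (and hence the codes) regardless of the data;
# ordered=True declares that this order is meaningful
explicit = pd.Categorical(data, categories=['foo', 'bar', 'baz'], ordered=True)

print(list(inferred.categories), list(inferred.codes))
print(list(explicit.categories), list(explicit.codes))

# With ordered=True, comparisons and min/max follow the category order
print(explicit.min(), explicit.max())
print(explicit < 'baz')
```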

**Note:** Categorical data need not be strings. A categorical array can consist of any immutable (hashable) value types.
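A small sketch with non-string values, assuming integer and boolean data as illustrative examples; the codes still index into the (inferred, sorted) categories:

```python
import pandas as pd

# Integer values as categories: codes index into the sorted integer categories
sizes = pd.Series([10, 20, 10, 30, 20]).astype('category')
print(sizes.cat.categories.tolist())
print(sizes.cat.codes.tolist())

# Boolean flags work as categories too
flags = pd.Categorical([True, False, True])
print(list(flags.categories), list(flags.codes))
```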

## 12.1.3 Computations with Categoricals

Using Categorical in pandas generally behaves the same way as using the non-encoded version (like an array of strings). Some parts of pandas, like the groupby function, perform better when working with categoricals. There are also some functions that can utilize the ordered flag.

For example, consider some random numeric data, and use the pandas.qcut binning function. This returns a pandas.Categorical:

In [41]:
np.random.seed(12345)
draws = np.random.randn(1000)
draws[:5]

array([-0.20470766,  0.47894334, -0.51943872, -0.5557303 ,  1.96578057])

In [42]:
bins = pd.qcut(draws, q = 4)
bins

[(-0.684, -0.0101], (-0.0101, 0.63], (-0.684, -0.0101], (-0.684, -0.0101], (0.63, 3.928], ..., (-0.0101, 0.63], (-0.684, -0.0101], (-2.9499999999999997, -0.684], (-0.0101, 0.63], (0.63, 3.928]]
Length: 1000
Categories (4, interval[float64]): [(-2.9499999999999997, -0.684] < (-0.684, -0.0101] < (-0.0101, 0.63] < (0.63, 3.928]]

In [43]:
bins.value_counts()

(-2.9499999999999997, -0.684]    250
(-0.684, -0.0101]                250
(-0.0101, 0.63]                  250
(0.63, 3.928]                    250
dtype: int64

In [44]:
bins = pd.qcut(draws, q = 4, labels = ['Q1', 'Q2', 'Q3', 'Q4'])
bins

[Q2, Q3, Q2, Q2, Q4, ..., Q3, Q2, Q1, Q3, Q4]
Length: 1000
Categories (4, object): [Q1 < Q2 < Q3 < Q4]

In [45]:
bins.value_counts()

Q1    250
Q2    250
Q3    250
Q4    250
dtype: int64

In [50]:
bins.categories

Index(['Q1', 'Q2', 'Q3', 'Q4'], dtype='object')

In [52]:
bins.codes[:10]

array([1, 2, 1, 1, 3, 3, 2, 2, 3, 3], dtype=int8)

In [49]:
bins.dtype

CategoricalDtype(categories=['Q1', 'Q2', 'Q3', 'Q4'], ordered=True)

The labeled bins categorical does not contain information about the bin edges in the data, so we can use groupby to extract some summary statistics:

In [53]:
bins = pd.Series(bins, name = 'quartile')
bins

0      Q2
1      Q3
2      Q2
3      Q2
4      Q4
       ..
995    Q3
996    Q2
997    Q1
998    Q3
999    Q4
Name: quartile, Length: 1000, dtype: category
Categories (4, object): [Q1 < Q2 < Q3 < Q4]

In [54]:
results = (pd.Series(draws)
           .groupby(bins)
           .agg(['count', 'min', 'max'])
           .reset_index())
results

   quartile  count        min        max
0        Q1    250  -2.949343  -0.685484
1        Q2    250  -0.683066  -0.010115
2        Q3    250  -0.010032   0.628894
3        Q4    250   0.634238   3.927528


In [55]:
results['quartile']

0    Q1
1    Q2
2    Q3
3    Q4
Name: quartile, dtype: category
Categories (4, object): [Q1 < Q2 < Q3 < Q4]

**Better performance with categoricals**

If you do a lot of analytics on a particular dataset, converting to categorical can yield substantial overall performance gains. A categorical version of a DataFrame column will often use significantly less memory, too.

In [60]:
N = 10_000_000
draws = pd.Series(np.random.randn(N))
labels = pd.Series(['foo', 'bar', 'baz', 'qux'] * (N // 4))
labels

0          foo
1          bar
2          baz
3          qux
4          foo
          ... 
9999995    qux
9999996    foo
9999997    bar
9999998    baz
9999999    qux
Length: 10000000, dtype: object

In [63]:
categories = labels.astype(dtype = 'category')
categories

0          foo
1          bar
2          baz
3          qux
4          foo
          ... 
9999995    qux
9999996    foo
9999997    bar
9999998    baz
9999999    qux
Length: 10000000, dtype: category
Categories (4, object): [bar, baz, foo, qux]

In [64]:
labels.memory_usage()

80000128

In [65]:
categories.memory_usage()

10000320

In [66]:
%time _ = labels.astype('category')

CPU times: user 264 ms, sys: 28.8 ms, total: 293 ms
Wall time: 293 ms


labels uses significantly more memory than categories (80000128 versus 10000320 bytes, as reported by memory_usage). The conversion to category is not free (about 293 ms here), but it is a one-time cost.
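One caveat worth checking: by default (deep=False), memory_usage counts only the 8-byte pointers of an object column, not the Python string objects they point to, so the figure for labels above understates its real footprint. The sketch below assumes a smaller N (to keep it quick) and an explicit dtype=object, and compares shallow and deep measurements:

```python
import pandas as pd

N = 100_000
labels = pd.Series(['foo', 'bar', 'baz', 'qux'] * (N // 4), dtype=object)
categories = labels.astype('category')

# deep=False (the default) counts only the object pointers for strings;
# deep=True also counts the Python string objects themselves
shallow = labels.memory_usage()
deep = labels.memory_usage(deep=True)
cat_deep = categories.memory_usage(deep=True)

print(shallow, deep, cat_deep)
```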

GroupBy operations can be significantly faster with categoricals because the underlying algorithms use the integer-based codes array instead of an array of strings.
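A rough way to check this on your own machine is to time the same aggregation grouped by the object column and by its categorical version. The sketch below uses a smaller N and timeit; the actual speedup depends on the pandas version and the data, so treat the printed numbers as indicative only. observed=True restricts the result to categories that actually occur:

```python
import timeit
import numpy as np
import pandas as pd

N = 200_000
rng = np.random.default_rng(0)
draws = pd.Series(rng.standard_normal(N))
labels = pd.Series(['foo', 'bar', 'baz', 'qux'] * (N // 4), dtype=object)
categories = labels.astype('category')

def by_strings():
    return draws.groupby(labels).mean()

def by_codes():
    # observed=True: only categories that actually appear in the data
    return draws.groupby(categories, observed=True).mean()

by_str = by_strings()
by_cat = by_codes()

t_str = timeit.timeit(by_strings, number=5)
t_cat = timeit.timeit(by_codes, number=5)
print(f'object dtype: {t_str:.3f}s   category dtype: {t_cat:.3f}s')
```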

## 12.1.4 Categorical Methods

Series containing categorical data have several special methods similar to the Series.str specialized string methods. These methods also provide convenient access to the categories and codes.

In [67]:
s = pd.Series(['a', 'b', 'c', 'd'] * 2)
s

0    a
1    b
2    c
3    d
4    a
5    b
6    c
7    d
dtype: object

In [68]:
cat_s = s.astype(dtype = 'category')
cat_s

0    a
1    b
2    c
3    d
4    a
5    b
6    c
7    d
dtype: category
Categories (4, object): [a, b, c, d]

The special accessor attribute cat provides access to categorical methods:

In [69]:
cat_s.cat.codes

0    0
1    1
2    2
3    3
4    0
5    1
6    2
7    3
dtype: int8

In [70]:
cat_s.values.codes

array([0, 1, 2, 3, 0, 1, 2, 3], dtype=int8)

In [71]:
cat_s.cat.categories

Index(['a', 'b', 'c', 'd'], dtype='object')

In [72]:
cat_s.values.categories

Index(['a', 'b', 'c', 'd'], dtype='object')

In [73]:
actual_categories = ['a', 'b', 'c', 'd', 'e']
cat_s2 = cat_s.cat.set_categories(actual_categories)
cat_s2

0    a
1    b
2    c
3    d
4    a
5    b
6    c
7    d
dtype: category
Categories (5, object): [a, b, c, d, e]

In [75]:
cat_s.value_counts()

d    2
c    2
b    2
a    2
dtype: int64

In [76]:
cat_s2.value_counts()

d    2
c    2
b    2
a    2
e    0
dtype: int64

In large datasets, categoricals are often used as a convenient tool for memory savings and better performance. After you filter a large DataFrame or Series, many of the categories may not appear in the data. To help with this, we can use the remove_unused_categories method to trim unobserved categories:

In [77]:
cat_s3 = cat_s[cat_s.isin(['a', 'b'])]
cat_s3

0    a
1    b
4    a
5    b
dtype: category
Categories (4, object): [a, b, c, d]

In [78]:
cat_s3.cat.remove_unused_categories()

0    a
1    b
4    a
5    b
dtype: category
Categories (2, object): [a, b]

**Creating dummy variables for modeling**

When you are using statistics or machine learning tools, you will often transform categorical data into dummy variables, also known as one-hot encoding. This involves creating a DataFrame with a column for each distinct category. These columns contain 1s for occurrences of a given category and 0 otherwise.

In [79]:
cat_s = pd.Series(['a', 'b', 'c', 'd'] * 2, dtype = 'category')
cat_s

0    a
1    b
2    c
3    d
4    a
5    b
6    c
7    d
dtype: category
Categories (4, object): [a, b, c, d]

In [80]:
pd.get_dummies(cat_s)

   a  b  c  d
0  1  0  0  0
1  0  1  0  0
2  0  0  1  0
3  0  0  0  1
4  1  0  0  0
5  0  1  0  0
6  0  0  1  0
7  0  0  0  1
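In pandas 2.0 and later, get_dummies returns boolean columns by default rather than the 1s and 0s shown above. A sketch that requests integers explicitly and adds a column-name prefix (the prefix 'key' is an arbitrary choice for illustration):

```python
import pandas as pd

cat_s = pd.Series(['a', 'b', 'c', 'd'] * 2, dtype='category')

# dtype=int requests 0/1 integers; prefix is prepended to each column name
dummies = pd.get_dummies(cat_s, prefix='key', dtype=int)
print(dummies.columns.tolist())
print(dummies.head(2))
```

Each row has exactly one 1, in the column of its category.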


# 12.2 Advanced GroupBy Use

## 12.2.1 Group Transforms and "Unwrapped" GroupBys

Earlier we used the apply method in grouped operations for performing transformations. There is another built-in method called transform, which is similar to apply but imposes more constraints on the kind of function you can use:
- It can produce a scalar value to be broadcast to the shape of the group 
- It can produce an object of the same shape as the input group 
- It must not mutate its input 

In [81]:
df = pd.DataFrame({'key':['a','b','c'] * 4,
                   'value':np.arange(12.)})
df

    key  value
0     a    0.0
1     b    1.0
2     c    2.0
3     a    3.0
4     b    4.0
5     c    5.0
6     a    6.0
7     b    7.0
8     c    8.0
9     a    9.0
10    b   10.0
11    c   11.0


In [82]:
g = df.groupby('key').value
g.mean()

key
a    4.5
b    5.5
c    6.5
Name: value, dtype: float64

In [83]:
df.groupby('key').mean()

     value
key
a      4.5
b      5.5
c      6.5


Suppose instead we wanted to produce a Series of the same shape as df['value'] but with values replaced by the average grouped by 'key'. We can pass the function lambda x: x.mean() to transform. This is analogous to a window function in SQL.

In [84]:
g.transform(lambda x: x.mean())

0     4.5
1     5.5
2     6.5
3     4.5
4     5.5
5     6.5
6     4.5
7     5.5
8     6.5
9     4.5
10    5.5
11    6.5
Name: value, dtype: float64

In [87]:
df.groupby('key').transform(lambda x: x.mean())

    value
0     4.5
1     5.5
2     6.5
3     4.5
4     5.5
5     6.5
6     4.5
7     5.5
8     6.5
9     4.5
10    5.5
11    6.5
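To make the SQL analogy concrete, here is a sketch that computes the same group mean both with transform and with a SQL window function, using Python's built-in sqlite3 module and an in-memory database. It assumes SQLite 3.25 or newer (the release that added window functions); the table name t and the column idx are illustrative:

```python
import sqlite3
import numpy as np
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'c'] * 4,
                   'value': np.arange(12.)})

# pandas: broadcast each group's mean back onto every row
via_pandas = df.groupby('key')['value'].transform('mean')

# SQL: the same result with AVG(...) OVER (PARTITION BY ...)
con = sqlite3.connect(':memory:')
df.to_sql('t', con, index=True, index_label='idx')
query = ('SELECT idx, AVG("value") OVER (PARTITION BY "key") AS mean_value '
         'FROM t ORDER BY idx')
via_sql = pd.read_sql(query, con)['mean_value']
con.close()

print(np.allclose(via_pandas.to_numpy(), via_sql.to_numpy()))
```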


For built-in aggregation functions, we can pass a string alias as with the GroupBy agg method:

In [88]:
g.transform('mean')

0     4.5
1     5.5
2     6.5
3     4.5
4     5.5
5     6.5
6     4.5
7     5.5
8     6.5
9     4.5
10    5.5
11    6.5
Name: value, dtype: float64

Like apply, transform works with functions that return Series, but the result must be the same size as the input:

In [89]:
# multiply each group by 2 using a lambda function 
g.transform(lambda x: x * 2)

0      0.0
1      2.0
2      4.0
3      6.0
4      8.0
5     10.0
6     12.0
7     14.0
8     16.0
9     18.0
10    20.0
11    22.0
Name: value, dtype: float64

In [91]:
# compute the ranks in descending order for each group 
g.transform(lambda x: x.rank(ascending = False))

0     4.0
1     4.0
2     4.0
3     3.0
4     3.0
5     3.0
6     2.0
7     2.0
8     2.0
9     1.0
10    1.0
11    1.0
Name: value, dtype: float64

As a more complicated example, consider a group transformation function composed from simple aggregations. In this case we can obtain equivalent results using either transform or apply:

In [93]:
def normalize(x):
    return (x - x.mean()) / x.std()

In [94]:
g.transform(normalize)

0    -1.161895
1    -1.161895
2    -1.161895
3    -0.387298
4    -0.387298
5    -0.387298
6     0.387298
7     0.387298
8     0.387298
9     1.161895
10    1.161895
11    1.161895
Name: value, dtype: float64

In [95]:
g.apply(normalize)

0    -1.161895
1    -1.161895
2    -1.161895
3    -0.387298
4    -0.387298
5    -0.387298
6     0.387298
7     0.387298
8     0.387298
9     1.161895
10    1.161895
11    1.161895
Name: value, dtype: float64

Built-in aggregate functions like 'mean' or 'sum' are often much faster than a general apply function. These also have a "fast path" when used with transform. This allows us to perform a so-called unwrapped group operation:

In [96]:
g.transform('mean')

0     4.5
1     5.5
2     6.5
3     4.5
4     5.5
5     6.5
6     4.5
7     5.5
8     6.5
9     4.5
10    5.5
11    6.5
Name: value, dtype: float64

In [101]:
g.agg('mean')

key
a    4.5
b    5.5
c    6.5
Name: value, dtype: float64

In [104]:
g.apply(np.mean)

key
a    4.5
b    5.5
c    6.5
Name: value, dtype: float64

In [105]:
normalized = (df['value'] - g.transform('mean')) / g.transform('std')
normalized

0    -1.161895
1    -1.161895
2    -1.161895
3    -0.387298
4    -0.387298
5    -0.387298
6     0.387298
7     0.387298
8     0.387298
9     1.161895
10    1.161895
11    1.161895
Name: value, dtype: float64

While an unwrapped group operation may involve multiple group aggregations, the overall benefit of vectorized operations often outweighs this.
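A rough sketch for comparing the two approaches on a larger, synthetic dataset (the sizes and the 500 random group keys are arbitrary assumptions). It first checks that both versions agree, then times them; absolute timings vary by machine and pandas version:

```python
import timeit
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
N = 50_000
df = pd.DataFrame({'key': rng.integers(0, 500, size=N),
                   'value': rng.standard_normal(N)})
g = df.groupby('key')['value']

def normalize(x):
    return (x - x.mean()) / x.std()

def unwrapped():
    # Two built-in group aggregations broadcast back to the rows, then plain
    # vectorized arithmetic on whole columns
    return (df['value'] - g.transform('mean')) / g.transform('std')

# transform(normalize) calls the Python function once per group
slow = g.transform(normalize)
fast = unwrapped()
print(np.allclose(slow, fast, equal_nan=True))

print('transform(normalize):', timeit.timeit(lambda: g.transform(normalize), number=3))
print('unwrapped:           ', timeit.timeit(unwrapped, number=3))
```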

## 12.2.2 Grouped Time Resampling

For time series data, the resample method is semantically a group operation based on a time intervalization, that is, binning timestamps into fixed-width intervals:

In [106]:
N = 15
times = pd.date_range(start = '2017-05-20 00:00', periods = N, freq = '1min')
df = pd.DataFrame({'time': times,
                   'value': np.arange(N)})
df

                   time  value
0   2017-05-20 00:00:00      0
1   2017-05-20 00:01:00      1
2   2017-05-20 00:02:00      2
3   2017-05-20 00:03:00      3
4   2017-05-20 00:04:00      4
5   2017-05-20 00:05:00      5
6   2017-05-20 00:06:00      6
7   2017-05-20 00:07:00      7
8   2017-05-20 00:08:00      8
9   2017-05-20 00:09:00      9
10  2017-05-20 00:10:00     10
11  2017-05-20 00:11:00     11
12  2017-05-20 00:12:00     12
13  2017-05-20 00:13:00     13
14  2017-05-20 00:14:00     14


In [107]:
df.set_index('time').resample('5min').count()

                     value
time
2017-05-20 00:00:00      5
2017-05-20 00:05:00      5
2017-05-20 00:10:00      5


In [108]:
df.set_index('time').resample('5min').sum()

                     value
time
2017-05-20 00:00:00     10
2017-05-20 00:05:00     35
2017-05-20 00:10:00     60


Suppose that a DataFrame contains multiple time series, marked by an additional group key column:

In [110]:
df2 = pd.DataFrame({'time': times.repeat(3),
                    'key': np.tile(['a', 'b', 'c'], N),
                    'value': np.arange(N * 3.)})
df2[:7]

                  time  key  value
0  2017-05-20 00:00:00    a    0.0
1  2017-05-20 00:00:00    b    1.0
2  2017-05-20 00:00:00    c    2.0
3  2017-05-20 00:01:00    a    3.0
4  2017-05-20 00:01:00    b    4.0
5  2017-05-20 00:01:00    c    5.0
6  2017-05-20 00:02:00    a    6.0


To do the same resampling for each value of 'key', we introduce the pandas.Grouper object:

In [116]:
time_key = pd.Grouper(freq = '5min')  # passing freq makes this a time-based Grouper (TimeGrouper)
time_key

TimeGrouper(freq=<5 * Minutes>, axis=0, sort=True, closed='left', label='left', how='mean', convention='e', base=0)

In [117]:
resampled = (df2.set_index('time')
             .groupby(['key', time_key])
             .sum())
resampled

                         value
key time
a   2017-05-20 00:00:00   30.0
    2017-05-20 00:05:00  105.0
    2017-05-20 00:10:00  180.0
b   2017-05-20 00:00:00   35.0
    2017-05-20 00:05:00  110.0
    2017-05-20 00:10:00  185.0
c   2017-05-20 00:00:00   40.0
    2017-05-20 00:05:00  115.0
    2017-05-20 00:10:00  190.0


In [118]:
df2.set_index('time').groupby(['key', pd.Grouper(freq = '5min')]).count()

                         value
key time
a   2017-05-20 00:00:00      5
    2017-05-20 00:05:00      5
    2017-05-20 00:10:00      5
b   2017-05-20 00:00:00      5
    2017-05-20 00:05:00      5
    2017-05-20 00:10:00      5
c   2017-05-20 00:00:00      5
    2017-05-20 00:05:00      5
    2017-05-20 00:10:00      5


In [119]:
resampled.reset_index()

   key                 time  value
0    a  2017-05-20 00:00:00   30.0
1    a  2017-05-20 00:05:00  105.0
2    a  2017-05-20 00:10:00  180.0
3    b  2017-05-20 00:00:00   35.0
4    b  2017-05-20 00:05:00  110.0
5    b  2017-05-20 00:10:00  185.0
6    c  2017-05-20 00:00:00   40.0
7    c  2017-05-20 00:05:00  115.0
8    c  2017-05-20 00:10:00  190.0
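pandas.Grouper also accepts a key argument naming the column that holds the timestamps, which avoids the set_index step. A sketch using the same df2 construction as above:

```python
import numpy as np
import pandas as pd

N = 15
times = pd.date_range('2017-05-20 00:00', periods=N, freq='1min')
df2 = pd.DataFrame({'time': times.repeat(3),
                    'key': np.tile(['a', 'b', 'c'], N),
                    'value': np.arange(N * 3.)})

# key='time' tells the Grouper which column holds the timestamps,
# so there is no need to move 'time' into the index first
resampled = (df2.groupby(['key', pd.Grouper(key='time', freq='5min')])
             .sum()
             .reset_index())
print(resampled)
```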


# 12.3 Techniques for Method Chaining

When applying a sequence of transformations to a dataset, you may find yourself creating numerous temporary variables that are never used in the analysis:

In [None]:
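df = load_data()  # load_data is a placeholder for any function returning a DataFrame
df2 = df[df['col2'] < 0].copy()  # copy so the new column is not set on a view of df
df2['col1_demeaned'] = df2['col1'] - df2['col1'].mean()
result = df2.groupby('key').col1_demeaned.std()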

The DataFrame.assign method is a functional alternative to column assignments of the form df[k] = v. Rather than modifying the object in place, it returns a new DataFrame with the indicated modifications. These two statements are equivalent:

In [None]:
# Usual non-functional way
df2 = df.copy()
df2['k'] = v
# Functional assign way
df2 = df.assign(k=v)
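A small sketch with an illustrative DataFrame, showing that assign leaves the original object untouched and that a column value can also be given as a callable that receives the DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1.0, 2.0, 3.0]})

# assign returns a new DataFrame; df itself is left unchanged
df2 = df.assign(k=10, doubled=lambda x: x['col1'] * 2)

print(df.columns.tolist())
print(df2)
```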

In [None]:
result = (df2.assign(col1_demeaned = df2.col1 - df2.col1.mean())
          .groupby('key')
          .col1_demeaned.std())

assign and many other pandas functions accept function-like arguments, also known as callables. To show callables in action, consider this fragment and its callable-based equivalent:

In [None]:
df = load_data()
df2 = df[df['col2'] < 0]

In [None]:
df = (load_data()
      [lambda x: x['col2'] < 0])

Here, the result of load_data is not assigned to a variable, so the function passed into [] is called with the object at that stage of the method chain.

In [None]:
result = (load_data() # load data from origin
          [lambda x: x.col2 < 0] # filter rows 
          .assign(col1_demeaned = lambda x: x.col1 - x.col1.mean()) # mutate, add new column
          .groupby('key') # group by
          .col1_demeaned # select 
          .std()) # summarise function 
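The chain above refers to load_data, which is not defined in this section. Below, load_data is a hypothetical stand-in that returns random data with key, col1 and col2 columns, so that the full chain can actually be run:

```python
import numpy as np
import pandas as pd

def load_data():
    # Hypothetical stand-in for load_data(): any function returning a DataFrame
    rng = np.random.default_rng(12345)
    return pd.DataFrame({'key': ['a', 'b'] * 50,
                         'col1': rng.standard_normal(100),
                         'col2': rng.standard_normal(100)})

result = (load_data()
          [lambda x: x.col2 < 0]                                      # filter rows
          .assign(col1_demeaned=lambda x: x.col1 - x.col1.mean())    # add a column
          .groupby('key')
          .col1_demeaned
          .std())
print(result)
```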

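## 12.3.1 The pipe Method

Sometimes you need to use your own functions, or functions from third-party libraries, within a method chain; this is where the pipe method comes in. Consider a sequence of function calls:
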
In [None]:
a = f(df, arg1 = v1)
b = g(a, v2, arg3 = v3)
c = h(b, arg4 = v4)
# a -> b -> c

In [None]:
result = (df.pipe(f, arg1 = v1)
            .pipe(g, v2, arg3 = v3)
            .pipe(h, arg4 = v4)) 
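To see that the two forms are equivalent, here is a sketch in which f, g and h are hypothetical functions (made up for illustration) that each take a DataFrame as their first argument:

```python
import pandas as pd

# Hypothetical helper functions, each taking a DataFrame as first argument
def f(df, arg1):
    return df.assign(a=df['x'] + arg1)

def g(df, v2, arg3):
    return df.assign(b=df['a'] * v2 + arg3)

def h(df, arg4):
    return df[df['b'] > arg4]

df = pd.DataFrame({'x': [1, 2, 3, 4]})

nested = h(g(f(df, arg1=1), 2, arg3=0), arg4=5)
piped = (df.pipe(f, arg1=1)
           .pipe(g, 2, arg3=0)
           .pipe(h, arg4=5))
print(piped.equals(nested))
```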

In [None]:
# f(df) and df.pipe(f) are equivalent: df is the object, and f() is the function
result = df.pipe(f)  # same as f(df)

A potentially useful pattern for pipe is to generalize sequences of operations into reusable functions. As an example, consider subtracting group means from a column:

In [None]:
g = df.groupby(['key1','key2'])
df['col1'] = df['col1'] - g['col1'].transform('mean')

In [None]:
def group_demean(df, by, cols):
    result = df.copy()
    g = df.groupby(by)
    for c in cols:
        result[c] = df[c] - g[c].transform('mean')
    return result

In [None]:
result = (df[df.col1 < 0]
          .pipe(group_demean, ['key1', 'key2'], ['col1']))
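A runnable sketch of the same pipe call, using group_demean as defined above (repeated so the block is self-contained) on a small hypothetical DataFrame. Note that the group means are computed after the col1 < 0 filter:

```python
import pandas as pd

def group_demean(df, by, cols):
    result = df.copy()
    g = df.groupby(by)
    for c in cols:
        # Subtract each row's group mean for column c
        result[c] = df[c] - g[c].transform('mean')
    return result

# Hypothetical sample data with two grouping keys
df = pd.DataFrame({'key1': ['x', 'x', 'y', 'y', 'x'],
                   'key2': [1, 1, 1, 2, 2],
                   'col1': [-1.0, -3.0, -2.0, -5.0, 4.0]})

result = (df[df.col1 < 0]
          .pipe(group_demean, ['key1', 'key2'], ['col1']))
print(result)
```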

In [128]:
pd.DataFrame.pipe?

Signature: pd.DataFrame.pipe(self, func, *args, **kwargs)
Docstring:
Apply func(self, \*args, \*\*kwargs).

Parameters
----------
func : function
    Function to apply to the Series/DataFrame.
    ``args``, and ``kwargs`` are passed into ``func``.
    Alternatively a ``(callable, data_keyword)`` tuple where
    ``data_keyword`` is a string indicating the keyword of
    ``callable`` that expects the Series/DataFrame.
args : iterable, optional
    Positional arguments passed into ``func``.
kwargs : mapping, optional
    A dictionary of keyword arguments passed into ``func``.

Returns
-------
object : the return type of ``func``.

See Also
--------
DataFrame.apply
DataFrame.applymap
Series.map

Notes
-----

Use ``.pipe`` when chaining together functions that expect
Series, DataFrames or Gr
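
The docstring mentions a (callable, data_keyword) tuple form of func, for functions that expect the DataFrame as a keyword argument rather than as their first argument. A sketch with a hypothetical summarize function:

```python
import pandas as pd

# Hypothetical function that expects the DataFrame as the keyword 'data'
def summarize(column, data):
    return data[column].sum()

df = pd.DataFrame({'col1': [1, 2, 3]})

# (callable, data_keyword): pipe calls summarize('col1', data=df)
total = df.pipe((summarize, 'data'), 'col1')
print(total)
```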

# 12.4 Conclusion