# Aggregations

In pandas, `df.agg('mean')` and `df.agg(['mean', 'sum'])` are two ways of calling the same `agg` method on a DataFrame: the first passes a single aggregation function, the second passes a list of them.

When you use `df.agg('mean')`, you pass a single aggregation function to apply to each column of the DataFrame. Here `'mean'` requests the mean (average) of each column. The result is a Series whose index holds the DataFrame's column names and whose values are the column means.

When you pass something other than a single function, `agg` behaves differently. It accepts several argument types: for example, a list of functions computes multiple aggregations at once, and a dictionary maps column names to the functions to apply to each. Passing a list returns a DataFrame instead of a Series.

Here's an example to illustrate the difference:

In [None]:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Using df.agg('mean')
result1 = df.agg('mean')
print(result1)
# Output:
# A    2.0
# B    5.0
# dtype: float64

# Using df.agg with a list of functions
result2 = df.agg(['mean', 'sum'])
print(result2)
# Output:
#         A     B
# mean  2.0   5.0
# sum   6.0  15.0



In the first example (`df.agg('mean')`), the mean of each column is computed separately, giving a Series with the column names as its index and the means as its values. In the second example (`df.agg(['mean', 'sum'])`), both aggregations are applied to each column, giving a DataFrame with one row per operation (`mean`, `sum`) and one column per original column (`A`, `B`). Because the mean is a float, every value in that DataFrame is stored as a float.
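The shapes described above can be checked directly. This is a small sketch that rebuilds the same `df` and inspects the type and labels of each result; the variable names `single` and `multiple` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

single = df.agg('mean')             # one function -> Series
multiple = df.agg(['mean', 'sum'])  # list of functions -> DataFrame

print(type(single).__name__)     # Series
print(list(single.index))        # ['A', 'B']
print(type(multiple).__name__)   # DataFrame
print(list(multiple.index))      # ['mean', 'sum']
print(list(multiple.columns))    # ['A', 'B']
print(multiple.loc['sum', 'B'])  # 15.0
```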

In summary, `df.agg('mean')` applies a single aggregation function to each column and returns a Series, while passing a list such as `df.agg(['mean', 'sum'])` applies several aggregations to each column and returns a DataFrame with one row per aggregation.
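Besides a single function or a list, `agg` also accepts a dictionary that maps column names to functions, which is handy when each column needs a different aggregation. The sketch below reuses the same small `df`; the names `per_column` and `per_column_lists` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# One function per column -> Series indexed by column name
per_column = df.agg({'A': 'mean', 'B': 'sum'})
print(per_column['A'])  # 2.0
print(per_column['B'])  # 15.0

# Lists of functions per column -> DataFrame; cells for combinations
# that were not requested (for example 'sum' of A) are filled with NaN
per_column_lists = df.agg({'A': ['min', 'max'], 'B': ['sum']})
print(per_column_lists)
```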

## More Examples

In [None]:
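# Assumes s is a numeric Series defined earlier in the notebook.
# Count of values that meet some criterion: sum the Boolean mask (True counts as 1)
s.gt(20).sum()

# Percentage of values that meet the criterion: scale the mask to 0/100, then take the mean
s.gt(20).mul(100).mean()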
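The two idioms above work because a Boolean mask behaves like a column of 0s and 1s. A worked sketch on a small made-up Series (the values are illustrative, not from the original notebook):

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the idiom
s = pd.Series([10, 25, 30, 15, 40])

mask = s.gt(20)                    # [False, True, True, False, True]
count_above = mask.sum()           # each True counts as 1
pct_above = mask.mul(100).mean()   # mean of [0, 100, 100, 0, 100]

print(count_above)  # 3
print(pct_above)    # 60.0
```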

In [None]:
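# Defining a custom aggregation function

import numpy as np

def second_to_last(s):
    """Return the second-to-last value of the Series."""
    return s.iloc[-2]

# Mix a string name, a NumPy function, a built-in, and a custom function.
# Recent pandas versions may emit a FutureWarning for np.var and max;
# the strings 'var' and 'max' are the warning-free equivalents.
s.agg(['mean', np.var, max, second_to_last])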
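To see what a custom aggregation returns, here is a self-contained sketch on a small made-up Series. It uses string names for the built-in aggregations to avoid warnings in newer pandas versions; the sample values and the name `result` are assumptions for illustration.

```python
import pandas as pd

def second_to_last(s):
    """Return the second-to-last value of the Series."""
    return s.iloc[-2]

# Hypothetical sample data, used only to illustrate the call
s = pd.Series([10, 25, 30, 15, 40])

# The result is a Series labelled by aggregation name;
# a custom function is labelled with its __name__
result = s.agg(['mean', 'var', 'max', second_to_last])

print(result['mean'])            # 24.0
print(result['var'])             # 142.5 (sample variance, ddof=1)
print(result['max'])             # 40.0
print(result['second_to_last'])  # 15.0
```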

## String Aggregations

```
'all'                      Returns True if every value is truthy.
'any'                      Returns True if any value is truthy.
'autocorr'                 Returns Pearson correlation of series with shifted self. Can override lag as keyword argument (default is 1).
'corr'                     Returns Pearson correlation of series with other series. Need to specify other.
'count'                    Returns count of non-missing values.
'cov'                      Returns covariance of series with other series. Need to specify other.
'dtype'                    Type of the series.
'dtypes'                   Type of the series.
'empty'                    True if no values in series.
'hasnans'                  True if missing values in series.
'idxmax'                   Returns index value of maximum value.
'idxmin'                   Returns index value of minimum value.
'is_monotonic'             True if values never decrease (removed in pandas 2.0; use 'is_monotonic_increasing').
'is_monotonic_decreasing'  True if values never increase.
'is_monotonic_increasing'  True if values never decrease.
'kurt'                     Returns "excess" kurtosis (0 is normal distribution). Values greater than 0 have more outliers than normal.
'mad'                      Returns the mean absolute deviation (removed in pandas 2.0).
'max'                      Returns the maximum value.
'mean'                     Returns the mean value.
'median'                   Returns the median value.
'min'                      Returns the minimum value.
'nbytes'                   Returns the number of bytes of the data.
'ndim'                     Returns the number of dimensions (1) of the data.
'nunique'                  Returns the count of unique values.
'quantile'                 Returns the median value by default. Can override q to specify another quantile.
'sem'                      Returns the unbiased standard error of the mean.
'size'                     Returns the number of values, including missing ones.
'skew'                     Returns the unbiased skew of the data. Negative indicates tail is on the left side.
'std'                      Returns the sample standard deviation of the data.
'sum'                      Returns the sum of the series.
```
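A few of the strings above in use, as a minimal sketch on a made-up Series. Keyword arguments given to `agg` are forwarded to the named method, which is how `q` is overridden for `'quantile'`; the sample values and variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the calls
s = pd.Series([10, 25, 30, 15, 40])

n_values = s.agg('count')            # non-missing values
n_unique = s.agg('nunique')          # distinct values
q25 = s.agg('quantile', q=0.25)      # override the default q=0.5
summary = s.agg(['min', 'median', 'max'])

print(n_values)           # 5
print(n_unique)           # 5
print(q25)                # 15.0
print(summary['median'])  # 25.0
```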