<a href="https://colab.research.google.com/github/JonaJS/E_Pandas/blob/main/Chptr7_Aggregate_Methods.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Aggregate methods collapse the values of a series down to a scalar. Say we work at a restaurant. Our boss will want to know:

-   How many people came in (count)
-   How much food was ordered (count)
-   What was the total revenue (sum)
-   When did people come (skew)
-   What was the average purchase amount (mean)

Aggregations allow you to take detailed data and collapse it to a single value.
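To make the restaurant analogy concrete, here is a minimal sketch. The `orders` series and its values are made up for illustration and are not part of the vehicles dataset used in the rest of this chapter.

```python
import pandas as pd

# Hypothetical order totals for one evening (made-up data)
orders = pd.Series([12.50, 8.00, 23.75, 8.00, 15.25], name='order_total')

n_orders = orders.count()   # how many orders came in
revenue = orders.sum()      # total revenue
avg_order = orders.mean()   # average purchase amount

print(n_orders, revenue, avg_order)
```

Each call takes the five detailed values and returns a single number.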


### Aggregations

In [None]:
import pandas as pd
URL = 'https://github.com/mattharrison/datasets/raw/master/data/vehicles.csv.zip'

In [None]:
df = pd.read_csv(URL)
city_mpg = df.city08
highway_mpg = df.highway08


To calculate the *mean* value of a series, we can use the aggregation method `.mean`:

In [None]:
city_mpg.mean()

18.369045304297103

There are also a few aggregation properties:
- These start with `.is_`
- Because they are properties, not methods, they are not called. They evaluate to `True` or `False`

In [None]:
# Series.is_unique
# Return boolean if values in the object are unique.

city_mpg.is_unique

False

In [None]:
# Series.is_monotonic_increasing
# Return boolean if values in the object are monotonically increasing.

city_mpg.is_monotonic_increasing

False
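Both properties return `False` for `city_mpg`, so a small contrast may help. The sketch below uses two made-up series (`ids` and `ratings` are hypothetical names) to show a `True` and a `False` case, and what happens if a property is called like a method.

```python
import pandas as pd

ids = pd.Series([1, 2, 3, 5])       # hypothetical: no repeats, already sorted
ratings = pd.Series([3, 1, 3, 2])   # hypothetical: has a repeat, not sorted

print(ids.is_unique)                     # True
print(ids.is_monotonic_increasing)       # True
print(ratings.is_unique)                 # False
print(ratings.is_monotonic_increasing)   # False

# Properties are accessed without parentheses. Adding them tries to call
# the boolean result, which raises a TypeError.
try:
    ids.is_unique()
except TypeError as err:
    print('TypeError:', err)
```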

One method to be aware of is `.quantile`. By default it returns the 50% quantile (the median). You can specify another level, or you can pass in a list of levels. In the latter case, `.quantile` no longer returns a scalar but a *Series* object indexed by the requested levels.

In [None]:
city_mpg.quantile()

17.0

In [None]:
city_mpg.quantile(.5)

17.0

In [None]:
city_mpg.quantile(.9)

24.0

In [None]:
city_mpg.quantile([.1, .5, .9])

0.1    13.0
0.5    17.0
0.9    24.0
Name: city08, dtype: float64
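The change in return type is easy to check. This is a small sketch on a made-up series `s` (not the vehicles data), assuming the default linear interpolation.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])   # hypothetical data

median = s.quantile()                 # default level is 0.5, returns a scalar
levels = s.quantile([0.25, 0.75])     # list of levels, returns a Series

print(median)                          # 30.0
print(isinstance(levels, pd.Series))   # True
print(levels.loc[0.25])                # 20.0, looked up by quantile level
```

Because the result is indexed by level, individual quantiles can be pulled out with `.loc`.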

### Count and Mean of an attribute

If we want the count of values that meet some criteria, we can use the `.sum` method on a boolean series, and `.mean` gives the fraction. For example, if we want the count and percent of cars with city mileage greater than 20, we can use the following code.

In [None]:
# count
(city_mpg
 .gt(20)
 .sum()
)

10272

In [None]:
# percent
(city_mpg
 .gt(20)
 .mul(100)
 .mean()
)

24.965973167412017

In [None]:
city_mpg

0        19
1         9
2        23
3        10
4        17
         ..
41139    19
41140    20
41141    18
41142    18
41143    16
Name: city08, Length: 41144, dtype: int64

**This trick comes from the fact that Python treats `True` as `1` and `False` as `0`. If you sum up a series of boolean values, the result is the count of `True` values.** **If you take the `mean` of a series of boolean values, the result is the fraction of values that are `True`.**


You can use this trick with any series of boolean values.
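As a sketch of the same idea on other boolean series, the example below uses a made-up `temps` series (hypothetical data) and also combines two conditions with `&` before counting.

```python
import pandas as pd

temps = pd.Series([61, 75, 82, 68, 90])   # hypothetical readings

is_hot = temps.gt(70)            # boolean series: False, True, True, False, True
n_hot = is_hot.sum()             # count of True values
frac_hot = is_hot.mean()         # fraction of True values

# Any boolean series works, including combined conditions
n_warm = (temps.gt(70) & temps.lt(90)).sum()

print(n_hot, frac_hot, n_warm)
```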

### .agg and Aggregation strings

Finally, the `.agg` method does aggregations. But like `.quantile`, it can also return something other than a scalar, depending on how it is called.

In [None]:
# We can use .agg to calculate the mean:
city_mpg.agg('mean')

18.369045304297103

However, that is easier with `city_mpg.mean()`. Where `.agg` shines is in its ability to perform multiple aggregations. In that case, it returns a series. You can pass in the names of aggregation methods, NumPy reduction functions, Python aggregations, or your own aggregation function.

In [None]:
import numpy as np
def second_to_last(s): # second-to-last value
    return s.iloc[-2]

In [None]:
city_mpg.agg(['mean', np.var, max, second_to_last])

mean               18.369045
var                62.503036
max               150.000000
second_to_last     18.000000
Name: city08, dtype: float64
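To see the scalar-versus-series behavior of `.agg` side by side, here is a minimal sketch on a five-value series (the first five `city_mpg` values shown earlier in this notebook). The helper `first_to_last_change` is a hypothetical custom aggregation, and the string names are used in place of `np.var` and `max`.

```python
import pandas as pd

def first_to_last_change(s):
    """Hypothetical custom aggregation: last value minus first value."""
    return s.iloc[-1] - s.iloc[0]

mpg = pd.Series([19, 9, 23, 10, 17])

single = mpg.agg('max')                                   # one name: scalar
several = mpg.agg(['min', 'max', first_to_last_change])   # list: Series

print(single)
print(several)
```

The custom function appears in the result under its own name, just as `second_to_last` does in the output above.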

Aggregation strings on pages 46/47.

Aggregation methods and properties on pages 47/48.

In [None]:
# Find the count of non-missing values of a series.
(city_mpg
 .count()
)

41144

In [None]:
# Find the number of entries of a series.
(city_mpg
 .size
)

41144

In [None]:
# Find the number of unique entries of a series.
(city_mpg
 .nunique()
)

105

In [None]:
# Find the mean value of a series.
(city_mpg
 .mean()
)

18.369045304297103

In [None]:
# Find the maximum value of a series.
(city_mpg
 .max()
)

150

In [None]:
# Use the .agg method to find all of the above.
(city_mpg
 .agg(['count', 'size', 'nunique', 'mean', 'max'])
)

count      41144.000000
size       41144.000000
nunique      105.000000
mean          18.369045
max          150.000000
Name: city08, dtype: float64
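`count` and `size` agree for `city_mpg` because that series has no missing values. The sketch below uses a made-up `readings` series (hypothetical data) containing one `NaN` to show where they differ.

```python
import numpy as np
import pandas as pd

readings = pd.Series([4.0, np.nan, 7.0, 4.0])   # hypothetical, one missing value

n_entries = readings.size                   # every entry, missing or not
n_present = readings.count()                # non-missing entries only
n_unique = readings.nunique()               # distinct values, NaN dropped by default
n_unique_with_nan = readings.nunique(dropna=False)   # NaN counted as a value

print(n_entries, n_present, n_unique, n_unique_with_nan)
```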