# Named aggregations
In the previous lesson, we selected the product_group and price columns before applying the groupby and mean functions so that the output showed only the product groups and average prices. With named aggregations, we do not have to select the columns beforehand. We simply specify the name of the column and the aggregate function to apply to it. Another advantage is that we can assign a more descriptive name to the aggregated column. Let’s repeat the example from the previous lesson with a named aggregation.

In [None]:
import pandas as pd

grocery = pd.read_csv("grocery.csv")

# Average price per product group; the output column is named avg_price
print(
  grocery.groupby("product_group").agg(
    avg_price=("price", "mean")
  )
)
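
To make the comparison concrete, here is a minimal sketch that does not depend on grocery.csv. The `sample` DataFrame and its values are made up for illustration; only the column names follow the lesson. It checks that the named aggregation produces the same averages as selecting the columns first, with the difference that we choose the name of the output column.

```python
import pandas as pd

# Hypothetical sample data standing in for grocery.csv
sample = pd.DataFrame({
    "product_group": ["fruit", "fruit", "dairy", "dairy"],
    "price": [2.0, 4.0, 1.5, 2.5],
    "sales_quantity": [10, 20, 5, 15],
})

# Select the columns first, then group and take the mean
selected = sample[["product_group", "price"]].groupby("product_group").mean()

# Named aggregation: no column selection, and we pick the output name
named = sample.groupby("product_group").agg(avg_price=("price", "mean"))

print(selected)  # the aggregated column keeps the name "price"
print(named)     # the aggregated column is named "avg_price"
```

Both results hold the same values per group. Only the column label differs.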

We can perform multiple aggregations in a single operation. Let’s calculate the total sales quantity for each group, in addition to the average price.

In [None]:
import pandas as pd

grocery = pd.read_csv("grocery.csv")

# Two named aggregations in one call: mean of price and sum of sales_quantity
print(
  grocery.groupby("product_group").agg(
    avg_price=("price", "mean"),
    total_sales=("sales_quantity", "sum")
  )
)

# The sort_values function
In some cases, we may want to sort the groups according to the aggregated values. This is especially useful when there are many groups. The sort_values function can be used for this task. Let’s calculate the average price and total sales quantity for each product and then sort the results by total sales quantity in descending order.

In [None]:
import pandas as pd

grocery = pd.read_csv("grocery.csv")

# Aggregate per product, then sort by total_sales from highest to lowest
print(
  grocery.groupby("product_description").agg(
    avg_price=("price", "mean"),
    total_sales=("sales_quantity", "sum")
  ).sort_values(
    by="total_sales",
    ascending=False
  )
)

The sort_values function sorts in ascending order by default, but we can change this behavior by setting its ascending parameter to False, as we did above. By default, the groups are shown as the index of the resulting DataFrame. If we want to have the groups as a regular column instead, we need to set the as_index parameter of the groupby function to False.
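
The following sketch illustrates both defaults on a small in-memory DataFrame. The `sample` data is hypothetical and only mirrors the column names used in this lesson. Note that as_index is passed to groupby, not to agg or sort_values.

```python
import pandas as pd

# Hypothetical sample data standing in for grocery.csv
sample = pd.DataFrame({
    "product_description": ["apple", "milk", "apple", "bread", "milk"],
    "price": [1.0, 2.0, 3.0, 2.5, 4.0],
    "sales_quantity": [10, 30, 5, 50, 10],
})

# as_index=False keeps the group labels as a regular column
result = sample.groupby("product_description", as_index=False).agg(
    avg_price=("price", "mean"),
    total_sales=("sales_quantity", "sum"),
).sort_values(by="total_sales")  # no ascending argument, so ascending order

print(result)
```

Here product_description appears as an ordinary column next to avg_price and total_sales, and the rows run from the smallest total_sales to the largest.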

The groupby function accepts multiple columns for grouping as well. In that case, groups are generated based on the combinations of the distinct values in each column that occur in the data. When we use multiple columns for grouping, the column names must be passed as a list.

Suppose we have a DataFrame with three columns: brand, color, and price. The distinct categories in the brand column are Ford and Toyota, and the distinct categories in the color column are white and red. If we group the rows by the brand and color columns, and every combination appears in the data, the following groups will be generated:

- Ford and white
- Ford and red
- Toyota and white
- Toyota and red

Let’s see an example on the grocery DataFrame to demonstrate the use of multiple columns with the groupby function.

In [None]:
import pandas as pd

grocery = pd.read_csv("grocery.csv")

# Group by each combination of product description and product group
print(
  grocery.groupby(
    ["product_description", "product_group"]
  ).agg(
    avg_price=("price", "mean"),
    total_sales=("sales_quantity", "sum")
  )
)
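
The brand and color scenario described earlier can be sketched the same way. The `cars` DataFrame below is hypothetical and its prices are made up. It only serves to show which groups appear when two columns are passed to groupby.

```python
import pandas as pd

# Hypothetical data for the brand and color scenario; the prices are made up
cars = pd.DataFrame({
    "brand": ["Ford", "Ford", "Toyota", "Toyota", "Ford"],
    "color": ["white", "red", "white", "red", "white"],
    "price": [20000, 22000, 21000, 23000, 24000],
})

# One group per (brand, color) combination that occurs in the data
by_brand_color = cars.groupby(["brand", "color"]).agg(
    avg_price=("price", "mean")
)

print(by_brand_color)
```

The result has four rows, one for each brand and color pair, and the two grouping columns together form the index of the output.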

The groupby function is highly flexible and powerful, and it is frequently used in data analysis tasks.