# **Guided LAB - 343.4.6 - Pandas Grouping and Aggregate Functions**

---



## **Lab Overview:**

In this lab, we will demonstrate how to group a DataFrame by a single column or by multiple columns, and how to apply aggregate functions to the resulting groups.

## **Learning Objective**
By the end of this lab, learners will be able to:
- Utilize the groupby() function.
- Combine aggregate functions with the groupby() function for data manipulation.

## **Dataset**
**In this lab, we will use the student_scores.csv dataset. [Click here to download the dataset.](https://drive.google.com/file/d/1GxvbD5kV6-zzrbDS3uXUlPtm14sSZkBc/view?usp=sharing)**

## **Introduction:**

Similar to the SQL GROUP BY clause, the pandas DataFrame.groupby() function collects rows that share the same key values into groups so that aggregate functions can be applied to each group. A group-by operation involves splitting the data, applying a function to each group, and finally combining the results.

In pandas, you can combine groupby() with sum(), aggregate(), and many other methods.
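The split, apply, and combine steps described above can be sketched by hand and compared with a single groupby() call. The small DataFrame below is made-up sample data (an assumption for illustration); the real student_scores.csv may contain different values.

```python
import pandas as pd

# Hypothetical sample data; the real student_scores.csv may differ.
sample = pd.DataFrame({
    "Subject": ["Calculus", "Calculus", "Physics", "Physics"],
    "score": [80, 90, 70, 60],
})

# Split: one sub-DataFrame per distinct Subject value.
pieces = {subject: part for subject, part in sample.groupby("Subject")}

# Apply: compute the mean score of each piece separately.
means = {subject: part["score"].mean() for subject, part in pieces.items()}

# Combine: gather the per-group results into a single Series.
manual_result = pd.Series(means, name="score")

# groupby() followed by mean() performs all three steps in one expression.
grouped_result = sample.groupby("Subject")["score"].mean()

print(manual_result)
print(grouped_result)
```

Both results hold one mean score per subject, which is why the one-line form is usually preferred.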

**Syntax of groupby() function**

```
DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)
```

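- by – Column name, list of column names, or index label(s) to group by.
- axis – Defaults to 0. Accepts 0 or ‘index’, 1 or ‘columns’. This parameter is deprecated in recent pandas versions.
- level – Used with a MultiIndex to group by one or more index levels.
- as_index – Defaults to True, which returns the group keys as the index of the result. Set it to False for SQL-style output, where the keys stay as regular columns.
- sort – Defaults to True. Specifies whether to sort the group keys in the result.
- group_keys – Defaults to True. When calling apply(), specifies whether to add the group keys to the index of the result.
- observed – Applies only if any of the groupers are Categorical. If True, only observed category values are shown.
- dropna – Defaults to True. If True, rows whose group key contains NA values are dropped. If False, NA values are treated as a group key of their own.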


# **Instruction:**
To work through several examples of grouping, first import the student_scores.csv dataset into pandas.

In [None]:
import pandas as pd
df = pd.read_csv("student_scores.csv", header=0)
df

In [None]:
df.shape

In [None]:
df.describe()

In [None]:
df.info()

# **Split Data into Groups**

- The **by** parameter accepts one column or multiple columns.
- A pandas object can be split into groups in many ways. The **groups** attribute lists the row labels that belong to each group.




### **Example: Groupby using a single column – The groups are formed from a single column.**

In [None]:
item_group = df.groupby('first_name')
item_group.groups

### **Example: Groupby using multiple columns – The groups are formed from multiple columns.**

In [None]:
Groupby_MultipleColumns = df.groupby(["first_name", "last_name"])
Groupby_MultipleColumns.groups
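To make the shape of the **groups** attribute concrete, here is a minimal sketch on made-up data (the names and scores are assumptions, not taken from student_scores.csv). With multiple grouping columns, each key is a tuple.

```python
import pandas as pd

# Hypothetical sample data; the real dataset may differ.
sample = pd.DataFrame({
    "first_name": ["Ana", "Ana", "Ben"],
    "last_name": ["Lee", "Lee", "Cruz"],
    "score": [80, 90, 70],
})

by_student = sample.groupby(["first_name", "last_name"])

# .groups maps each key to the row labels that fall into that group.
group_labels = {key: list(rows) for key, rows in by_student.groups.items()}
print(group_labels)

# ngroups gives the number of groups; size() gives the rows in each group.
group_total = by_student.ngroups
group_sizes = by_student.size()
print(group_total)
print(group_sizes)
```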

### **Example: Iterating through Groups**
You can also print the elements of each group by iterating through the groups with a for loop.

In [None]:
for name,group in item_group:
    print('{}:'.format(name))
    print(group)

### **Example: Selecting a Group**
The **get_group()** method is used to select a particular group.

In [None]:
item_group = df.groupby('Subject')
#item_group.groups
item_group.get_group('Calculus')
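When the grouping uses more than one column, get_group() expects the key as a tuple. The sketch below uses made-up data and a hypothetical student name to show this, along with a membership check that avoids the KeyError raised for a key that does not exist.

```python
import pandas as pd

# Hypothetical sample data; the real dataset may differ.
sample = pd.DataFrame({
    "first_name": ["Ana", "Ana", "Ben"],
    "last_name": ["Lee", "Lee", "Cruz"],
    "Subject": ["Calculus", "Physics", "Calculus"],
    "score": [80, 90, 70],
})

by_student = sample.groupby(["first_name", "last_name"])

# With two grouping columns, the key is a (first_name, last_name) tuple.
ana_rows = by_student.get_group(("Ana", "Lee"))
print(ana_rows)

# Check that a key exists before selecting it; get_group() raises
# KeyError for keys that are not present.
missing_key = ("Zed", "Nobody")
has_missing_key = missing_key in by_student.groups
print(has_missing_key)
```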

### **Example: Groupby – Aggregations**

You can use an aggregation function such as mean() or sum() to get the aggregate value of each group. Aggregation functions are applied after the groupby object has been created.

Let’s calculate the average score of each Subject.

In [None]:
# Directly using mean() function
agg_group_subject = df.groupby('Subject')['score'].mean()
agg_group_subject

## **Alternatively: the line below gives the same output.**


In [None]:
agg_group_subject = df.groupby('Subject')['score'].aggregate('mean')
agg_group_subject

Let’s calculate the average score of each student.

In [None]:
# Group by both name columns so that each student forms one group
agg_group_stu = df.groupby(["first_name", "last_name"])['score'].mean()
print(agg_group_stu)

## **Alternatively: the line below gives the same output.**

In [None]:
agg_group_stu = df.groupby(["first_name", "last_name"])['score'].aggregate('mean')
print(agg_group_stu)

### **Example: Multiple aggregations on groups formed from multiple columns:**
You can group by multiple columns and apply more than one aggregation function at once by passing a list to aggregate().

Let’s calculate the average and total score of each student.

In [None]:
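# Pass the function names as strings; recent pandas versions warn
# when NumPy callables such as np.mean are passed to aggregate()
agg_group = df.groupby(["first_name", "last_name"])['score'].aggregate(['mean', 'sum'])
print(agg_group)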
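As an alternative sketch, pandas also supports named aggregation, which lets you choose the output column names. The data and the column names average_score and total_score below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real dataset may differ.
sample = pd.DataFrame({
    "first_name": ["Ana", "Ana", "Ben"],
    "last_name": ["Lee", "Lee", "Cruz"],
    "score": [80, 90, 70],
})

# Each keyword names an output column; the tuple gives
# (source column, aggregation function).
summary = sample.groupby(["first_name", "last_name"]).agg(
    average_score=("score", "mean"),
    total_score=("score", "sum"),
)
print(summary)
```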

### **Example: Let’s count the number of records for each student**

In [None]:
agg_group_count = df.groupby(["first_name", "last_name"])["id"].count()

agg_group_count
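Note that count() tallies the non-missing values of the selected column within each group, so it gives one number per student rather than the number of students. The following sketch, on made-up data with one missing score, contrasts count() with size() and ngroups.

```python
import pandas as pd

# Hypothetical sample data with one missing score.
sample = pd.DataFrame({
    "first_name": ["Ana", "Ana", "Ben"],
    "last_name": ["Lee", "Lee", "Cruz"],
    "score": [80.0, None, 70.0],
})

by_student = sample.groupby(["first_name", "last_name"])

# size(): number of rows per group, including rows with missing values.
rows_per_student = by_student.size()

# count(): number of non-missing values in the selected column per group.
scores_per_student = by_student["score"].count()

# ngroups: number of distinct groups, here the number of distinct students.
number_of_students = by_student.ngroups

print(rows_per_student)
print(scores_per_student)
print(number_of_students)
```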

### **Example: Find the highest score of each student**

In [None]:
# Select the score column so that max() is applied only to the scores
df.groupby(["first_name", "last_name"])["score"].max()

### **Example: Find the lowest score of each student**

In [None]:
# Select the score column so that min() is applied only to the scores
df.groupby(["first_name", "last_name"])["score"].min()
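If you also want to know which subject a student's highest score belongs to, one possible approach is idxmax(), which returns the row label of the maximum in each group. This is a sketch on made-up data; the column names mirror those used in this lab, but the values are assumptions.

```python
import pandas as pd

# Hypothetical sample data; the real dataset may differ.
sample = pd.DataFrame({
    "first_name": ["Ana", "Ana", "Ben", "Ben"],
    "last_name": ["Lee", "Lee", "Cruz", "Cruz"],
    "Subject": ["Calculus", "Physics", "Calculus", "Physics"],
    "score": [80, 90, 70, 60],
})

by_student = sample.groupby(["first_name", "last_name"])

# Lowest and highest score per student in one table.
score_range = by_student["score"].agg(["min", "max"])
print(score_range)

# idxmax() returns the row label of each student's highest score;
# .loc then retrieves the full rows, including the Subject.
best_rows = sample.loc[by_student["score"].idxmax()]
print(best_rows)
```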


## **Submission Instructions**
- Submit your completed lab using the Start Assignment button on the assignment page in Canvas.
- Your submission can be either of the following:
  - If you are using a notebook, all tasks should be written and submitted in a single notebook file, for example: (**your_name_labname.ipynb**).
  - If you are using a Python script, all tasks should be written and submitted in a single Python script file, for example: **(your_name_labname.py)**.
- Add appropriate comments and any additional instructions if required.
