
Let's dive into the `groupby()` method in pandas. It's a powerful tool for summarizing and aggregating data by category.

**What is `groupby()`?**

In essence, `groupby()` does exactly what its name suggests: it groups rows in a DataFrame based on the values in one or more columns. It allows you to:

* **Split:** Divide the DataFrame into groups based on specified criteria.
* **Apply:** Perform calculations or transformations on each group independently.
* **Combine:** Merge the per-group results back into a single DataFrame or Series.

**How it Works (The "Split-Apply-Combine" Strategy)**

1.  **Split:**
    * You specify the column(s) by which you want to group the data.
    * Pandas then identifies the unique values in those columns and creates one group per unique value (or per unique combination of values, when you group by several columns).

2.  **Apply:**
    * Once the groups are formed, you can apply various aggregation functions (e.g., `sum()`, `mean()`, `count()`, `min()`, `max()`) or custom functions to each group.
    * This is where you perform the calculations or transformations you need.

3.  **Combine:**
    * Finally, pandas combines the results from each group back into a new DataFrame or Series, often with the grouping columns as the index.
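
To make the three steps concrete, here is a minimal sketch that performs them by hand and then compares the result with the one-line `groupby()` call. The DataFrame and the variable names (`pieces`, `sums`, `manual_result`) are made up for illustration; in practice you would let pandas do all three steps for you.

```python
import pandas as pd

# Hypothetical data, chosen only for illustration
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A', 'C'],
    'Value': [10, 15, 20, 25, 30, 35],
})

# Split: iterating over a GroupBy object yields (key, sub-DataFrame) pairs
pieces = {key: group for key, group in df.groupby('Category')}

# Apply: compute one result per group
sums = {key: group['Value'].sum() for key, group in pieces.items()}

# Combine: assemble the per-group results into a single Series
manual_result = pd.Series(sums, name='Value').rename_axis('Category')

# The one-line equivalent that pandas provides
builtin_result = df.groupby('Category')['Value'].sum()

print(manual_result)
print(builtin_result)
```

Both printed Series should hold the same per-category totals, which is the point: `groupby()` followed by an aggregation is shorthand for the manual loop.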

**Simple Code Examples**

Let's illustrate with some examples using a hypothetical DataFrame:

```python
import pandas as pd

data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'C'],
    'Value': [10, 15, 20, 25, 30, 35]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Group by 'Category' and calculate the sum of 'Value'
grouped_sum = df.groupby('Category')['Value'].sum()
print("\nSum of 'Value' by Category:")
print(grouped_sum)

# Group by 'Category' and calculate the mean of 'Value'
grouped_mean = df.groupby('Category')['Value'].mean()
print("\nMean of 'Value' by Category:")
print(grouped_mean)

# Group by 'Category' and count the number of occurrences
grouped_count = df.groupby('Category')['Value'].count()
print("\nCount of 'Value' by Category:")
print(grouped_count)

# Group by and get multiple aggregate functions at once.
grouped_agg = df.groupby('Category')['Value'].agg(['sum','mean', 'count'])
print("\nAggregated values by Category:")
print(grouped_agg)
```

```
Original DataFrame:
  Category  Value
0        A     10
1        B     15
2        A     20
3        B     25
4        A     30
5        C     35

Sum of 'Value' by Category:
Category
A    60
B    40
C    35
Name: Value, dtype: int64

Mean of 'Value' by Category:
Category
A    20.0
B    20.0
C    35.0
Name: Value, dtype: float64

Count of 'Value' by Category:
Category
A    3
B    2
C    1
Name: Value, dtype: int64

Aggregated values by Category:
          sum  mean  count
Category                  
A          60  20.0      3
B          40  20.0      2
C          35  35.0      1
```


**Explanation of the Code**

* `df.groupby('Category')`: This creates a `DataFrameGroupBy` object, which represents the groups formed from the `'Category'` column.
* `['Value'].sum()`: This selects the `'Value'` column from each group (giving a `SeriesGroupBy` object) and sums that column within each group.
* `['Value'].mean()`: This selects the `'Value'` column from each group and computes its mean within each group. The result is a float Series even though the input values are integers.
* `['Value'].count()`: This selects the `'Value'` column from each group and counts the non-null values in each group.
* `['Value'].agg(['sum','mean', 'count'])`: This selects the `'Value'` column from each group, applies the sum, mean, and count functions to it, and returns a DataFrame with one column per function.
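
The intermediate objects described above can be inspected directly. The sketch below prints their type names and also shows how `count()` differs from `size()` when a group contains a missing value. The `NaN` entry is an assumption added purely for illustration; the example data earlier in this notebook has no missing values.

```python
import numpy as np
import pandas as pd

# Same shape as the earlier example, but with one missing value (hypothetical)
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A', 'C'],
    'Value': [10, np.nan, 20, 25, 30, 35],
})

grouped = df.groupby('Category')
print(type(grouped).__name__)           # DataFrameGroupBy
print(type(grouped['Value']).__name__)  # SeriesGroupBy

# count() skips missing values; size() counts every row in the group
value_counts = grouped['Value'].count()
group_sizes = grouped.size()
print(value_counts)
print(group_sizes)
```

Here group `B` has two rows but only one non-null `'Value'`, so `count()` and `size()` disagree for that group.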

**Key Points**

* `groupby()` covers most per-category summaries: totals, averages, counts, and similar statistics.
* You can group by multiple columns by passing a list of column names to `groupby()`. The result is then indexed by every combination of values that occurs in those columns.
* You can apply custom functions to groups using the `apply()` method.
* The `agg()` method applies several aggregate functions in a single call.
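
The points above can be sketched as follows. The `sales` DataFrame, its `'Region'` column, and the helper `value_range` are hypothetical names introduced only for this example. The last call uses named aggregation, a keyword form of `agg()` that lets you choose the output column names.

```python
import pandas as pd

# Hypothetical data with a second grouping column
sales = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West', 'East'],
    'Category': ['A', 'B', 'A', 'A', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

# Group by two columns: the result is indexed by (Region, Category) pairs
by_two = sales.groupby(['Region', 'Category'])['Value'].sum()
print(by_two)

# Custom function: range (max minus min) of the values in one group
def value_range(values):
    return values.max() - values.min()

ranges = sales.groupby('Region')['Value'].apply(value_range)
print(ranges)

# Named aggregation: keyword names become the output column names
summary = sales.groupby('Region')['Value'].agg(total='sum', average='mean')
print(summary)
```

For simple reductions like `value_range`, passing the function to `agg()` works as well; `apply()` is the more general choice when the function may return something other than a single value per group.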

