### Grouping Data: 

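`groupby()` splits the rows of a DataFrame into groups based on the values in one or more specified columns. It returns a `DataFrameGroupBy` object rather than a new DataFrame; you get a result by applying an operation, such as an aggregation, to that object.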

grouped = df.groupby('column_name')
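To see what a `DataFrameGroupBy` object holds, the sketch below iterates over the groups of a small sample DataFrame. The data is illustrative and mirrors the examples later in this section. Each iteration yields a group key together with the sub-DataFrame of rows that share that key, and `get_group()` fetches a single group directly.

```python
import pandas as pd

# Illustrative sample data (same shape as the examples below)
df = pd.DataFrame({
    'city': ['New York', 'Los Angeles', 'New York', 'Chicago'],
    'pop': [8000000, 4000000, 8500000, 2700000],
})

grouped = df.groupby('city')
print(type(grouped).__name__)  # DataFrameGroupBy

# Iterating yields (key, sub-DataFrame) pairs; keys are sorted by default
for city, group in grouped:
    print(city, len(group))

# get_group() returns the rows that belong to one key
print(grouped.get_group('New York'))
```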

#### Aggregation Functions:

After grouping, you can apply aggregation functions such as `sum()`, `mean()`, and `count()`, each of which reduces every group to a single value.

For instance:
result = df.groupby('city')['pop'].sum()
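A minimal sketch applying three of these aggregations to the same grouped column, using a small illustrative DataFrame. Each call returns a Series indexed by city. Note that `count()` counts the non-missing values in each group, which is not always the same as the number of rows.

```python
import pandas as pd

# Illustrative sample data
df = pd.DataFrame({
    'city': ['New York', 'Los Angeles', 'New York', 'Chicago'],
    'pop': [8000000, 4000000, 8500000, 2700000],
})

totals = df.groupby('city')['pop'].sum()    # total population per city
means = df.groupby('city')['pop'].mean()    # average population per city
counts = df.groupby('city')['pop'].count()  # non-missing 'pop' values per city

print(totals)
print(means)
print(counts)
```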

#### Multiple Columns:

You can group by multiple columns by passing a list to groupby(). 

For example:

df.groupby(['city', 'state'])[['pop']].sum()
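The double brackets in `[['pop']]` matter: selecting a list of columns gives a DataFrame, while selecting a single column name gives a Series. The sketch below, on illustrative data, compares the two; in both cases the index has one level per grouping column.

```python
import pandas as pd

# Illustrative sample data
df = pd.DataFrame({
    'city': ['New York', 'Los Angeles', 'New York', 'Chicago'],
    'state': ['NY', 'CA', 'NY', 'IL'],
    'pop': [8000000, 4000000, 8500000, 2700000],
})

# List of columns -> DataFrame result
as_frame = df.groupby(['city', 'state'])[['pop']].sum()
# Single column name -> Series result
as_series = df.groupby(['city', 'state'])['pop'].sum()

print(type(as_frame).__name__)   # DataFrame
print(type(as_series).__name__)  # Series
print(list(as_frame.index.names))
```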

In [1]:
import pandas as pd

# Sample DataFrame
data = {'city': ['New York', 'Los Angeles', 'New York', 'Chicago'],
        'state': ['NY', 'CA', 'NY', 'IL'],
        'pop': [8000000, 4000000, 8500000, 2700000]}
df = pd.DataFrame(data)
df

          city  state      pop
0     New York     NY  8000000
1  Los Angeles     CA  4000000
2     New York     NY  8500000
3      Chicago     IL  2700000


In [2]:
# Group by city and sum the population
result = df.groupby('city')['pop'].sum()
print(result)

city
Chicago         2700000
Los Angeles     4000000
New York       16500000
Name: pop, dtype: int64


#### Grouping and Aggregating with Multiple Functions:

You can apply multiple aggregation functions at once using the .agg() method.

In [3]:
import pandas as pd

# Sample DataFrame
data = {
    'city': ['New York', 'Los Angeles', 'New York', 'Chicago', 'Los Angeles', 'Chicago'],
    'state': ['NY', 'CA', 'NY', 'IL', 'CA', 'IL'],
    'pop': [8000000, 4000000, 8500000, 2700000, 5000000, 3000000]
}
df = pd.DataFrame(data)

# Group by city and calculate both sum and mean population
result = df.groupby('city')['pop'].agg(['sum', 'mean'])
print(result)

                  sum       mean
city                            
Chicago       5700000  2850000.0
Los Angeles   9000000  4500000.0
New York     16500000  8250000.0
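Passing a list to `.agg()` names the output columns after the functions. If you want to choose the column names yourself, `.agg()` also accepts named aggregation, where each keyword argument becomes an output column and its value is a `(source column, function)` pair. A sketch on the same sample data; the names `total_pop`, `mean_pop`, and `max_pop` are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['New York', 'Los Angeles', 'New York', 'Chicago', 'Los Angeles', 'Chicago'],
    'state': ['NY', 'CA', 'NY', 'IL', 'CA', 'IL'],
    'pop': [8000000, 4000000, 8500000, 2700000, 5000000, 3000000],
})

# Named aggregation: output_name=(input_column, aggregation)
summary = df.groupby('city').agg(
    total_pop=('pop', 'sum'),
    mean_pop=('pop', 'mean'),
    max_pop=('pop', 'max'),
)
print(summary)
```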


#### Grouping by Multiple Columns:

You can group by more than one column. The result then has a hierarchical index (a `MultiIndex`) with one level per grouping column.

In [4]:
result = df.groupby(['city', 'state'])['pop'].sum()
print(result)

city         state
Chicago      IL        5700000
Los Angeles  CA        9000000
New York     NY       16500000
Name: pop, dtype: int64
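A sketch of two common follow-ups on a MultiIndexed result, using the same sample data: looking up one entry with a `(city, state)` tuple, and turning the index levels back into ordinary columns, either afterwards with `reset_index()` or up front with `as_index=False`.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['New York', 'Los Angeles', 'New York', 'Chicago', 'Los Angeles', 'Chicago'],
    'state': ['NY', 'CA', 'NY', 'IL', 'CA', 'IL'],
    'pop': [8000000, 4000000, 8500000, 2700000, 5000000, 3000000],
})

result = df.groupby(['city', 'state'])['pop'].sum()

# Look up one (city, state) combination in the MultiIndex
print(result.loc[('New York', 'NY')])

# reset_index() moves the index levels back into regular columns
flat = result.reset_index()
print(flat)

# as_index=False keeps the grouping keys as columns from the start
flat2 = df.groupby(['city', 'state'], as_index=False)['pop'].sum()
print(flat2)
```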


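#### Filtering Groups:

You can keep only the groups that meet a condition, for example by aggregating each group first and then applying a Boolean mask to the aggregated result.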

In [5]:
# Group by city and keep only cities with a total population greater than 7 million.
# Select 'pop' before summing; otherwise sum() also concatenates the 'state' strings (e.g. 'CACA').

grouped = df.groupby('city')[['pop']].sum()
filtered = grouped[grouped['pop'] > 7000000]
print(filtered)

                  pop
city                 
Los Angeles   9000000
New York     16500000
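Masking the aggregated result returns one row per qualifying group. If you instead want the original rows of every group that passes the test, `GroupBy.filter()` does that: it calls the function once per group (passing the group's sub-DataFrame) and keeps all rows of the groups for which it returns `True`. A sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['New York', 'Los Angeles', 'New York', 'Chicago', 'Los Angeles', 'Chicago'],
    'state': ['NY', 'CA', 'NY', 'IL', 'CA', 'IL'],
    'pop': [8000000, 4000000, 8500000, 2700000, 5000000, 3000000],
})

# Keep the original rows of every city whose total population exceeds 7 million
big_cities = df.groupby('city').filter(lambda g: g['pop'].sum() > 7000000)
print(big_cities)
```

Chicago's total (5,700,000) is below the threshold, so both of its rows are dropped, while the rows for New York and Los Angeles keep their original index labels and order.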
