<h1>GroupBy Method</h1>

<p>The pandas GroupBy method (<code>DataFrame.groupby</code>) splits a DataFrame into groups based on the values of one or more columns and applies an aggregation function to each group. It yields per-group summary statistics such as the mean, sum, or median, and it can apply different operations to several columns at once. This makes it a core tool for understanding the structure and behavior of data, with applications ranging from finance and business to data science and academic research.
</p>

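
<p>As a minimal sketch of the summary statistics mentioned above, the block below computes the mean, sum, and median of one numeric column per group in a single call. It uses the same sample data as the cells that follow. The name <code>summary</code> is only illustrative.</p>

```python
import pandas as pd

# Same sample data as in the cells below.
df = pd.DataFrame({
    'Category': ['Junior', 'Junior', 'Mid-level', 'Mid-level', 'Senior', 'Senior'],
    'Name': ['Jorge', 'Carlos', 'Roberta', 'Patrícia', 'Bruno', 'Vera'],
    'Sele': [200, 120, 340, 124, 243, 350],
})

# Select the numeric column first, then compute several statistics per group.
summary = df.groupby('Category')['Sele'].agg(['mean', 'sum', 'median'])
print(summary)
```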

In [1]:
import pandas as pd

data = {'Category':['Junior', 'Junior', 'Mid-level', 'Mid-level', 'Senior', 'Senior'],
        'Name':['Jorge', 'Carlos', 'Roberta', 'Patrícia', 'Bruno', 'Vera'],
        'Sele':[200,120,340,124,243,350]}

In [2]:
df = pd.DataFrame(data)
df

    Category      Name  Sele
0     Junior     Jorge   200
1     Junior    Carlos   120
2  Mid-level   Roberta   340
3  Mid-level  Patrícia   124
4     Senior     Bruno   243
5     Senior      Vera   350


In [3]:
# groupby splits the rows into groups; an aggregation function is then
# applied to each group. Here we group by category and sum each group.
# Summing strings is not meaningful, so numeric_only=True restricts the
# result to the numeric column. (Recent pandas versions would otherwise
# concatenate the names.)

group = df.groupby('Category')
group.sum(numeric_only=True)

           Sele
Category
Junior      320
Mid-level   464
Senior      593
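
<p>To make the "split, then apply" idea concrete, the sketch below iterates over the groups and sums each one by hand. It is only an illustration of what the aggregation does conceptually, not how pandas implements it internally. The dictionary <code>manual</code> is a name introduced here.</p>

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['Junior', 'Junior', 'Mid-level', 'Mid-level', 'Senior', 'Senior'],
    'Name': ['Jorge', 'Carlos', 'Roberta', 'Patrícia', 'Bruno', 'Vera'],
    'Sele': [200, 120, 340, 124, 243, 350],
})

group = df.groupby('Category')

# Iterating a GroupBy yields (key, sub-DataFrame) pairs, one per group.
manual = {}
for category, rows in group:
    manual[category] = int(rows['Sele'].sum())

print(manual)
```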


In [4]:
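# The same GroupBy object can also compute the mean of each group.
# numeric_only=True skips the Name column, whose mean is undefined.

group.mean(numeric_only=True)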

            Sele
Category
Junior     160.0
Mid-level  232.0
Senior     296.5


In [5]:
# Aggregations can be chained directly onto groupby without storing it in a variable,
# for example to retrieve the maximum of each column within each group.
# For the Name column, the maximum is the alphabetically last name.

df.groupby('Category').max()

              Name  Sele
Category
Junior       Jorge   200
Mid-level  Roberta   340
Senior        Vera   350
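
<p>Because <code>max()</code> works column by column, the name and the value in one output row do not necessarily come from the same input row. The sketch below uses a hypothetical frame called <code>sales</code> to show the difference, and then uses <code>idxmax</code> to pick the complete row with the highest value in each group.</p>

```python
import pandas as pd

# Hypothetical data for illustration.
sales = pd.DataFrame({
    'Category': ['Junior', 'Junior', 'Mid-level', 'Mid-level', 'Senior', 'Senior'],
    'Name': ['Jorge', 'Carlos', 'Roberta', 'Patrícia', 'Bruno', 'Vera'],
    'Sele': [150, 432, 190, 230, 410, 155],
})

# Column-wise maximum: for Junior this pairs 'Jorge' (alphabetical maximum)
# with 432, which is actually Carlos's value.
columnwise = sales.groupby('Category').max()

# idxmax returns the index label of the largest Sele in each group;
# loc then retrieves those complete rows.
top_rows = sales.loc[sales.groupby('Category')['Sele'].idxmax()]

print(columnwise)
print(top_rows)
```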


In [6]:
# Minimum of each column within each group.

df.groupby('Category').min()

               Name  Sele
Category
Junior       Carlos   120
Mid-level  Patrícia   124
Senior        Bruno   243


In [12]:
# Create a second DataFrame to test GroupBy further.
# To get an independent DataFrame with the same contents as the first one, use the copy() method.
# An assignment such as df2 = df only binds a second name to the same object, so any
# change made through df2 would also show up in df.

df2 = df.copy()
df2['Sele'] = [150, 432, 190, 230, 410, 155]
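
<p>The difference between plain assignment and <code>copy()</code> can be checked directly. This is a small sketch with a two-row frame. The names <code>alias</code> and <code>independent</code> are introduced here for illustration only.</p>

```python
import pandas as pd

df = pd.DataFrame({'Sele': [200, 120]})

alias = df                # second name for the same object
independent = df.copy()   # separate object with its own data

alias.loc[0, 'Sele'] = 999        # visible through df as well
independent.loc[1, 'Sele'] = -1   # df is not affected

print(alias is df)
print(independent is df)
print(df)
```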

In [8]:
df

    Category      Name  Sele
0     Junior     Jorge   200
1     Junior    Carlos   120
2  Mid-level   Roberta   340
3  Mid-level  Patrícia   124
4     Senior     Bruno   243
5     Senior      Vera   350


In [10]:
df2

    Category      Name  Sele
0     Junior     Jorge   150
1     Junior    Carlos   432
2  Mid-level   Roberta   190
3  Mid-level  Patrícia   230
4     Senior     Bruno   410
5     Senior      Vera   155


In [11]:
# pd.concat stacks the rows of df2 below those of df.
# The original index labels are kept, so 0 to 5 appear twice.

df3 = pd.concat([df, df2])
df3

    Category      Name  Sele
0     Junior     Jorge   200
1     Junior    Carlos   120
2  Mid-level   Roberta   340
3  Mid-level  Patrícia   124
4     Senior     Bruno   243
5     Senior      Vera   350
0     Junior     Jorge   150
1     Junior    Carlos   432
2  Mid-level   Roberta   190
3  Mid-level  Patrícia   230
4     Senior     Bruno   410
5     Senior      Vera   155
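
<p>By default <code>pd.concat</code> keeps the index labels of its inputs, so the concatenated frame contains duplicate labels. If a fresh 0..n-1 index is preferred, <code>ignore_index=True</code> can be passed. The sketch below uses two small hypothetical frames, <code>first</code> and <code>second</code>.</p>

```python
import pandas as pd

# Two small hypothetical frames for illustration.
first = pd.DataFrame({'Name': ['Jorge', 'Carlos'], 'Sele': [200, 120]})
second = pd.DataFrame({'Name': ['Jorge', 'Carlos'], 'Sele': [150, 432]})

stacked = pd.concat([first, second])                        # keeps labels 0, 1, 0, 1
renumbered = pd.concat([first, second], ignore_index=True)  # relabels rows 0, 1, 2, 3

print(list(stacked.index))
print(list(renumbered.index))
```

<p>Duplicate labels do not affect <code>groupby</code>, which groups on column values, but they make label-based lookups such as <code>stacked.loc[0]</code> return more than one row.</p>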


In [17]:
# groupby also accepts a list of columns.
# Each name now appears twice in df3, so the sum adds both of its values.

df3.groupby(['Category', 'Name']).sum()

                    Sele
Category  Name
Junior    Carlos     552
          Jorge      350
Mid-level Patrícia   354
          Roberta    530
Senior    Bruno      653
          Vera       505
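
<p>Grouping by several columns produces a result indexed by a two-level MultiIndex. When ordinary columns are more convenient, <code>as_index=False</code> keeps the grouping keys as columns. The sketch below uses a reduced, hypothetical version of the concatenated data. The names <code>nested</code> and <code>flat</code> are illustrative.</p>

```python
import pandas as pd

# Reduced sample: each (Category, Name) pair appears twice.
df3 = pd.DataFrame({
    'Category': ['Junior', 'Junior', 'Senior', 'Junior', 'Junior', 'Senior'],
    'Name': ['Jorge', 'Carlos', 'Vera', 'Jorge', 'Carlos', 'Vera'],
    'Sele': [200, 120, 350, 150, 432, 155],
})

# Default: the grouping keys become a two-level row index.
nested = df3.groupby(['Category', 'Name']).sum()

# as_index=False: the grouping keys stay as regular columns.
flat = df3.groupby(['Category', 'Name'], as_index=False)['Sele'].sum()

print(nested)
print(flat)
```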
