# **Revision of the GroupBy Object in Pandas: GroupBy Foundation**

In pandas, a `GroupBy` object is central to data aggregation and transformation. It is the result of calling `groupby()` on a DataFrame, which groups the rows based on the values in one or more columns.

Calling `groupby()` does not aggregate anything by itself. The `GroupBy` object it returns is an intermediate step that records how the rows are split into groups; the actual work happens when you apply an aggregation function or another operation, which then runs on each subset defined by the grouping criteria. Common aggregation functions you can apply to a `GroupBy` object include `sum()`, `mean()`, `count()`, `max()`, and `min()`.

Here's a basic example of how you can create a `GroupBy` object and perform aggregation with it:



In [2]:
import pandas as pd
import numpy as np

# Create a sample DataFrame
data = {
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 20, 15, 25, 30]
}

df = pd.DataFrame(data)

# Group the data by the 'Category' column
grouped = df.groupby('Category')


In [3]:
# Calculate the sum of 'Value' for each group
sum_values = grouped['Value'].sum()

In [4]:
# Display the result
print(sum_values)

Category
A    55
B    45
Name: Value, dtype: int64


In this example, we group the DataFrame `df` by the `Category` column, which creates a `GroupBy` object. Selecting `Value` and calling `sum()` on it then returns a Series, indexed by `Category`, that holds the total for each group.
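Before aggregating, it can help to look inside the `GroupBy` object itself. The following is a minimal sketch that reuses the same sample data; the variable names (`n_groups`, `group_rows`, `group_a`, `sizes`) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 20, 15, 25, 30],
})
grouped = df.groupby('Category')

# Number of groups, and the row labels that belong to each group
n_groups = grouped.ngroups
group_rows = {key: list(idx) for key, idx in grouped.groups.items()}

# Pull out the rows of a single group as a regular DataFrame
group_a = grouped.get_group('A')

# Iterating over a GroupBy yields (key, sub-DataFrame) pairs
sizes = {key: len(sub_df) for key, sub_df in grouped}

print(n_groups)
print(group_rows)
print(group_a)
print(sizes)
```

Each group is an ordinary DataFrame, so anything you can do to `df` can also be done to one group at a time.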

## **Practical Use** 

In [5]:
# Raw string so the backslashes in the Windows path are not treated as escape sequences
movies = pd.read_csv(r'Data\Day35\imdb-top-1000.csv')

In [7]:
movies.head(3)

| | Series_Title | Released_Year | Runtime | Genre | IMDB_Rating | Director | Star1 | No_of_Votes | Gross | Metascore |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Shawshank Redemption | 1994 | 142 | Drama | 9.3 | Frank Darabont | Tim Robbins | 2343110 | 28341469.0 | 80.0 |
| 1 | The Godfather | 1972 | 175 | Crime | 9.2 | Francis Ford Coppola | Marlon Brando | 1620367 | 134966411.0 | 100.0 |
| 2 | The Dark Knight | 2008 | 152 | Action | 9.0 | Christopher Nolan | Christian Bale | 2303232 | 534858444.0 | 84.0 |


### Applying built-in aggregation functions on GroupBy objects

In [13]:
genres = movies.groupby('Genre')

In [14]:
# Sum only the numeric columns within each genre
genres.sum(numeric_only=True)

| Genre | Runtime | IMDB_Rating | No_of_Votes | Gross | Metascore |
|---|---|---|---|---|---|
| Action | 22196 | 1367.3 | 72282412 | 32632260000.0 | 10499.0 |
| Adventure | 9656 | 571.5 | 22576163 | 9496922000.0 | 5020.0 |
| Animation | 8166 | 650.3 | 21978630 | 14631470000.0 | 6082.0 |
| Biography | 11970 | 698.6 | 24006844 | 8276358000.0 | 6023.0 |
| Comedy | 17380 | 1224.7 | 27620327 | 15663870000.0 | 9840.0 |
| Crime | 13524 | 857.8 | 33533615 | 8452632000.0 | 6706.0 |
| Drama | 36049 | 2299.7 | 61367304 | 35409970000.0 | 19208.0 |
| Family | 215 | 15.6 | 551221 | 439110600.0 | 158.0 |
| Fantasy | 170 | 16.0 | 146222 | 782726700.0 | 0.0 |
| Film-Noir | 312 | 23.9 | 367215 | 125910500.0 | 287.0 |


In [17]:
# Average only the numeric columns within each genre
genres.mean(numeric_only=True)

| Genre | Runtime | IMDB_Rating | No_of_Votes | Gross | Metascore |
|---|---|---|---|---|---|
| Action | 129.046512 | 7.949419 | 420246.581395 | 189722400.0 | 73.41958 |
| Adventure | 134.111111 | 7.9375 | 313557.819444 | 131901700.0 | 78.4375 |
| Animation | 99.585366 | 7.930488 | 268032.073171 | 178432600.0 | 81.093333 |
| Biography | 136.022727 | 7.938636 | 272805.045455 | 94049520.0 | 76.240506 |
| Comedy | 112.129032 | 7.90129 | 178195.658065 | 101057200.0 | 78.72 |
| Crime | 126.392523 | 8.016822 | 313398.271028 | 78996560.0 | 77.08046 |
| Drama | 124.737024 | 7.957439 | 212343.612457 | 122525900.0 | 79.701245 |
| Family | 107.5 | 7.8 | 275610.5 | 219555300.0 | 79.0 |
| Fantasy | 85.0 | 8.0 | 73111.0 | 391363300.0 | NaN |
| Film-Noir | 104.0 | 7.966667 | 122405.0 | 41970180.0 | 95.666667 |
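`sum()` and `mean()` apply one function to every numeric column. When different columns need different aggregations, `agg()` with named aggregation is one option. The sketch below uses a small made-up DataFrame (`toy`) that only mimics the column names of the movies data; its values and the output column names are assumptions for illustration.

```python
import pandas as pd

# Made-up rows that mimic the columns used in this section
toy = pd.DataFrame({
    'Genre': ['Drama', 'Drama', 'Action', 'Action', 'Comedy'],
    'Series_Title': ['d1', 'd2', 'a1', 'a2', 'c1'],
    'Runtime': [142, 175, 152, 148, 96],
    'IMDB_Rating': [9.3, 9.2, 9.0, 8.8, 8.1],
})

# Named aggregation: output_column=(input_column, function)
summary = toy.groupby('Genre').agg(
    total_runtime=('Runtime', 'sum'),
    mean_rating=('IMDB_Rating', 'mean'),
    n_movies=('Series_Title', 'count'),
)

print(summary)
```

The result has one row per genre and exactly the columns named in the call, which avoids aggregating columns you do not need.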


### Find the top 3 genres by total earnings

In [20]:
movies.groupby('Genre')['Gross'].sum().sort_values(ascending=False).head(3)

Genre
Drama     3.540997e+10
Action    3.263226e+10
Comedy    1.566387e+10
Name: Gross, dtype: float64

In [21]:
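# Same result, but this sums every numeric column before selecting 'Gross', so it does more work
movies.groupby('Genre').sum(numeric_only=True)['Gross'].sort_values(ascending=False).head(3)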

Genre
Drama     3.540997e+10
Action    3.263226e+10
Comedy    1.566387e+10
Name: Gross, dtype: float64
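Both orderings give the same numbers; selecting the column first simply avoids aggregating columns that are thrown away. The sketch below checks this on a made-up DataFrame (`toy`, with invented values) and also shows `Series.nlargest`, which can stand in for `sort_values(ascending=False).head(3)`.

```python
import pandas as pd

# Made-up data; only the column names follow the movies dataset
toy = pd.DataFrame({
    'Genre': ['Drama', 'Action', 'Drama', 'Comedy', 'Action', 'Crime'],
    'Series_Title': ['d1', 'a1', 'd2', 'c1', 'a2', 'k1'],
    'Gross': [100.0, 250.0, 300.0, 50.0, 75.0, 20.0],
})

# Select the column first, then aggregate
select_then_sum = toy.groupby('Genre')['Gross'].sum()

# Aggregate all numeric columns, then select
sum_then_select = toy.groupby('Genre').sum(numeric_only=True)['Gross']

same = select_then_sum.equals(sum_then_select)

# nlargest returns the n largest values in descending order
top3 = select_then_sum.nlargest(3)

print(same)
print(top3)
```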

### Find the genre with the highest average IMDB rating

In [22]:
movies.groupby('Genre')['IMDB_Rating'].mean().sort_values(ascending=False).head(1)

Genre
Western    8.35
Name: IMDB_Rating, dtype: float64
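When only the single best group is needed, `idxmax()` returns its label directly without sorting the whole result. This is a minimal sketch on made-up ratings (the `toy` DataFrame is an assumption, not the real dataset).

```python
import pandas as pd

# Made-up ratings for illustration
toy = pd.DataFrame({
    'Genre': ['Western', 'Drama', 'Western', 'Drama', 'Action'],
    'IMDB_Rating': [8.5, 8.0, 8.2, 7.9, 8.1],
})

mean_rating = toy.groupby('Genre')['IMDB_Rating'].mean()

best_genre = mean_rating.idxmax()   # index label of the largest mean
best_value = mean_rating.max()      # the largest mean itself

print(best_genre, best_value)
```

Note that `idxmax()` returns only the first label if several groups tie for the maximum.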

### Find the most popular director (by total number of votes)

In [23]:
movies.groupby('Director')['No_of_Votes'].sum().sort_values(ascending=False).head(1)

Director
Christopher Nolan    11578345
Name: No_of_Votes, dtype: int64

### Find the number of movies for each lead actor

In [24]:
movies.groupby('Star1')['Series_Title'].count().sort_values(ascending=False)

Star1
Tom Hanks             12
Robert De Niro        11
Clint Eastwood        10
Al Pacino             10
Leonardo DiCaprio      9
                      ..
Glen Hansard           1
Giuseppe Battiston     1
Giulietta Masina       1
Gerardo Taracena       1
Ömer Faruk Sorak       1
Name: Series_Title, Length: 660, dtype: int64
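`count()` counts non-missing values in the selected column, so a row with a missing title would not be counted. `size()` and `value_counts()` count rows instead. The sketch below uses made-up data (`toy`) with one deliberately missing title to show the difference.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing title
toy = pd.DataFrame({
    'Star1': ['Tom Hanks', 'Tom Hanks', 'Al Pacino', 'Al Pacino', 'Al Pacino'],
    'Series_Title': ['t1', 't2', 'p1', np.nan, 'p3'],
})

# count(): non-missing titles per actor
by_count = toy.groupby('Star1')['Series_Title'].count()

# size(): rows per actor, missing values included
by_size = toy.groupby('Star1').size()

# value_counts(): rows per actor, already sorted in descending order
by_value_counts = toy['Star1'].value_counts()

print(by_count)
print(by_size)
print(by_value_counts)
```

If the column has no missing values, all three approaches give the same counts.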

A `GroupBy` object is a powerful tool for group-wise operations on data. It lets analysts aggregate, filter, and transform data according to specific grouping criteria, which helps reveal patterns in the data and supports informed decisions.
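The examples above cover aggregation only. As a minimal sketch of the other two operations, the code below applies `filter()` and `transform()` to a made-up DataFrame; the data, the threshold of two movies, and the new column names are all assumptions for illustration.

```python
import pandas as pd

# Made-up data for illustration
toy = pd.DataFrame({
    'Genre': ['Drama', 'Drama', 'Drama', 'Action', 'Action', 'Comedy'],
    'IMDB_Rating': [9.0, 8.0, 7.0, 8.5, 7.5, 8.2],
})

# filter(): keep the rows of every genre that has at least 2 movies
big_genres = toy.groupby('Genre').filter(lambda g: len(g) >= 2)

# transform(): compute a per-genre value and broadcast it back to each row
toy['genre_mean'] = toy.groupby('Genre')['IMDB_Rating'].transform('mean')
toy['diff_from_genre_mean'] = toy['IMDB_Rating'] - toy['genre_mean']

print(big_genres)
print(toy)
```

Unlike an aggregation, `filter()` returns a subset of the original rows and `transform()` returns a result with the same length as the input, so both fit naturally back into the original DataFrame.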