In [1]:
import pandas as pd         # type: ignore

In [2]:
df = pd.read_csv("../datasets/drinks.csv")
df.head(10)

,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0,0,0.0,Asia
1,Albania,89,132,54,4.9,Europe
2,Algeria,25,0,14,0.7,Africa
3,Andorra,245,138,312,12.4,Europe
4,Angola,217,57,45,5.9,Africa
5,Antigua & Barbuda,102,128,45,4.9,North America
6,Argentina,193,25,221,8.3,South America
7,Armenia,21,179,11,3.8,Europe
8,Australia,261,72,212,10.4,Oceania
9,Austria,279,75,191,9.7,Europe


> ### 👉 **`values`** : returns the DataFrame's data as a NumPy array (without the index or column labels)

In [3]:
df.values

array([['Afghanistan', 0, 0, 0, 0.0, 'Asia'],
       ['Albania', 89, 132, 54, 4.9, 'Europe'],
       ['Algeria', 25, 0, 14, 0.7, 'Africa'],
       ...,
       ['Yemen', 6, 0, 0, 0.1, 'Asia'],
       ['Zambia', 32, 19, 4, 2.5, 'Africa'],
       ['Zimbabwe', 64, 18, 4, 4.7, 'Africa']], dtype=object)
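
The `dtype=object` in the output above appears because the array has to hold strings and numbers together. A minimal sketch of that behaviour, using a hypothetical three-column toy frame rather than the real `drinks.csv`, and `to_numpy()`, which returns the same data as `values` here:

```python
import pandas as pd

# Hypothetical toy rows that mirror a few columns of the drinks dataset.
toy = pd.DataFrame({
    "country": ["Albania", "Algeria"],
    "beer_servings": [89, 25],
    "total_litres_of_pure_alcohol": [4.9, 0.7],
})

# Strings and numbers together: NumPy falls back to a generic object array.
mixed = toy.to_numpy()

# Numeric columns only: the integers are upcast so one numeric dtype fits all.
numeric = toy[["beer_servings", "total_litres_of_pure_alcohol"]].to_numpy()

print(mixed.dtype)    # object
print(numeric.dtype)  # float64
```

Selecting only the numeric columns before converting keeps the array in a numeric dtype, which is usually what numerical code expects.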

> ### 👉 **`value_counts`** : counts how often each unique row occurs when called on a DataFrame (call it on a single column to count that column's unique values)

In [4]:
df.value_counts()     

country        beer_servings  spirit_servings  wine_servings  total_litres_of_pure_alcohol  continent    
Afghanistan    0              0                0              0.0                           Asia             1
Lithuania      343            244              56             12.9                          Europe           1
Nicaragua      78             118              1              3.5                           North America    1
Niger          3              2                1              0.1                           Africa           1
Nigeria        42             5                2              9.1                           Africa           1
                                                                                                            ..
Grenada        199            438              28             11.9                          North America    1
Guatemala      53             69               2              2.2                           North America    1
Guinea
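
Every count above is 1 because each country forms its own unique row. Counting a single column is often more informative. The sketch below uses a small hypothetical frame (not the real dataset) to contrast the two calls:

```python
import pandas as pd

# Hypothetical toy data: five countries on two continents.
toy = pd.DataFrame({
    "country": ["Albania", "Algeria", "Angola", "Austria", "Zambia"],
    "continent": ["Europe", "Africa", "Africa", "Europe", "Africa"],
})

# On a DataFrame: counts unique rows, so each row appears once here.
per_row = toy.value_counts()

# On a column (Series): counts each unique value, most frequent first.
per_column = toy["continent"].value_counts()

# normalize=True returns proportions instead of raw counts.
shares = toy["continent"].value_counts(normalize=True)

print(per_column)
print(shares)
```

On the drinks data, the equivalent call would be `df["continent"].value_counts()`.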

In [5]:
df.shape

(193, 6)

> ### 👉 **`info()`** : prints a summary of the DataFrame: column names, data types, non-null counts, and memory usage

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 193 entries, 0 to 192
Data columns (total 6 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   country                       193 non-null    object 
 1   beer_servings                 193 non-null    int64  
 2   spirit_servings               193 non-null    int64  
 3   wine_servings                 193 non-null    int64  
 4   total_litres_of_pure_alcohol  193 non-null    float64
 5   continent                     193 non-null    object 
dtypes: float64(1), int64(3), object(2)
memory usage: 9.2+ KB
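
The `+` in `9.2+ KB` signals that the figure is a lower bound: for `object` columns, pandas by default does not measure the Python strings they point to. A hedged sketch on a hypothetical toy frame, assuming the text column is stored with `object` dtype as in the output above:

```python
import pandas as pd

# Hypothetical toy frame with one text column and one integer column.
toy = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria"],
    "beer_servings": [0, 89, 25],
})

# Default (shallow) estimate: what info() reports without extra arguments.
shallow = toy.memory_usage().sum()

# Deep estimate: also inspects the string contents of text columns.
deep = toy.memory_usage(deep=True).sum()

print(shallow, deep)

# info() accepts the same switch and then prints the measured total.
toy.info(memory_usage="deep")
```

The deep measurement is slower on large frames, which is why it is opt-in.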


# 📊 **Difference Between `describe()`, `groupby()`, and `agg()` in Pandas**

| Function | Usage | Scope | Output |
|----------|-------|-------|--------|
| `df.describe()` | Summarizes **numerical** columns | Entire DataFrame | Provides count, mean, std, min, 25%, 50%, 75%, and max |
| `df.groupby("continent").beer_servings.describe()` | Summarizes **one numerical column** (`beer_servings`) **within each group** | Grouped by "continent" | The same statistics as `describe()`, one row per group |
| `df.groupby("continent").beer_servings.agg(["count","min","max","mean","sum","median"])` | Custom aggregation on specific column(s) | Grouped by "continent" | Only selected metrics (count, min, max, etc.) |

### 🚀 **Quick Insights**
- 🔹 **`describe()`** → Great for a **quick statistical overview** of numeric data.  
- 🔹 **`groupby().describe()`** → Same but **applied within groups** (e.g., per continent).  
- 🔹 **`groupby().agg([...])`** → Allows **custom selection** of aggregation functions!  



In [9]:
df.describe().T

,count,mean,std,min,25%,50%,75%,max
beer_servings,193.0,106.160622,101.143103,0.0,20.0,76.0,188.0,376.0
spirit_servings,193.0,80.994819,88.284312,0.0,4.0,56.0,128.0,438.0
wine_servings,193.0,49.450777,79.697598,0.0,1.0,8.0,59.0,370.0
total_litres_of_pure_alcohol,193.0,4.717098,3.773298,0.0,1.3,4.2,7.2,14.4


In [8]:
df.describe()       # summary statistics for every numeric column

,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol
count,193.0,193.0,193.0,193.0
mean,106.160622,80.994819,49.450777,4.717098
std,101.143103,88.284312,79.697598,3.773298
min,0.0,0.0,0.0,0.0
25%,20.0,4.0,1.0,1.3
50%,76.0,56.0,8.0,4.2
75%,188.0,128.0,59.0,7.2
max,376.0,438.0,370.0,14.4


In [10]:
df.groupby("continent").beer_servings.describe()  # beer_servings statistics for each continent

continent,count,mean,std,min,25%,50%,75%,max
Africa,53.0,61.471698,80.557816,0.0,15.0,32.0,76.0,376.0
Asia,44.0,37.045455,49.469725,0.0,4.25,17.5,60.5,247.0
Europe,45.0,193.777778,99.631569,0.0,127.0,219.0,270.0,361.0
North America,23.0,145.434783,79.621163,1.0,80.0,143.0,198.0,285.0
Oceania,16.0,89.6875,96.641412,0.0,21.0,52.5,125.75,306.0
South America,12.0,175.083333,65.242845,93.0,129.5,162.5,198.0,333.0


In [12]:
df.groupby("continent").beer_servings.agg(["count","min","max","mean","sum","median"])

continent,count,min,max,mean,sum,median
Africa,53,0,376,61.471698,3258,32.0
Asia,44,0,247,37.045455,1630,17.5
Europe,45,0,361,193.777778,8720,219.0
North America,23,1,285,145.434783,3345,143.0
Oceania,16,0,306,89.6875,1435,52.5
South America,12,93,333,175.083333,2101,162.5


# 🤔 So..??
## 📊 Choosing Between `describe()`, `groupby().describe()`, and `groupby().agg([...])`

### 🔹 **1- `df.describe()`** → **Best for quick exploration** ✅  
- Provides a **summary of all numerical columns** in the dataset.  
- Fast and easy, no need for grouping.  
- Useful for **initial data exploration**.

### 🔹 **2- `df.groupby("continent").beer_servings.describe()`** → **Detailed analysis within each group** 🔎  
- Gives the same summary as `describe()` but **for each group separately** (e.g., for each continent).  
- Useful when the data has a **categorical column** to group by (here, `continent`).  

### 🔹 **3- `df.groupby("continent").beer_servings.agg([...])`** → **More flexibility in choosing specific statistics** 🎯  
- Allows selecting only the **metrics you need** (e.g., `mean`, `sum`, `median`).  
- **More efficient** if you don’t need every value from `describe()`, because only the requested statistics are computed.  
- Commonly used in **Feature Engineering** for Machine Learning.  
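
As one illustration of the flexibility point, the sketch below uses pandas named aggregation to compute different statistics for different columns, then adds a per-row feature relative to the group mean. The toy data and the column name `beer_vs_continent_mean` are hypothetical, and `transform` is just one possible way to attach group statistics to rows:

```python
import pandas as pd

# Hypothetical toy data: two countries per continent.
toy = pd.DataFrame({
    "country": ["Albania", "Andorra", "Algeria", "Angola"],
    "continent": ["Europe", "Europe", "Africa", "Africa"],
    "beer_servings": [89, 245, 25, 217],
    "wine_servings": [54, 312, 14, 45],
})

# Named aggregation: output_column=(input_column, function).
stats = toy.groupby("continent").agg(
    beer_mean=("beer_servings", "mean"),
    beer_max=("beer_servings", "max"),
    wine_median=("wine_servings", "median"),
)

# transform("mean") returns the group mean aligned to every original row,
# so it can be subtracted directly to build a new feature column.
group_mean = toy.groupby("continent")["beer_servings"].transform("mean")
toy["beer_vs_continent_mean"] = toy["beer_servings"] - group_mean

print(stats)
print(toy)
```

Named aggregation yields flat, readable column names, which tends to be convenient when the result feeds into a later modeling step.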

### 🎯 **Best Choice in Data Science?**  
- **For quick exploration** → `df.describe()`  
- **To analyze differences between groups** → `groupby().describe()`  
- **For full control and efficiency** → `groupby().agg([...])` ✅ (**a common choice in production code**)  

> **Summary**: If you need a **quick overview**, use `describe()`. If you're **preparing data for modeling**, use `groupby().agg([...])`, because it computes only the statistics you ask for and lets you choose them per group. 🚀  