# 🐼 Pandas - Class 5: Sorting & Basic Statistics
Welcome to **Class 5** of our Pandas series. Today we'll learn how to sort data and calculate basic statistics.

## 1. Sorting Data
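- Use `sort_values(by='col')` to sort rows by the values in a column. It returns a new DataFrame and leaves the original unchanged.
- Pass `ascending=False` for descending order (the default is ascending).
- Sort by multiple columns by passing a list to `by`; `ascending` also accepts a list with one flag per column.
- `sort_index()` sorts by row labels by default; pass `axis=1` to sort by column labels instead.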

In [5]:
import pandas as pd

# New dataset
data = {
    "Product": ["Pen", "Notebook", "Pencil", "Marker", "Eraser", "Sharpener"],
    "Price": [10, 50, 5, 30, 8, 12],
    "Stock": [100, 60, 200, 80, 150, 120],
    "Rating": [4.5, 4.7, 4.2, 4.8, 4.3, 4.4]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
df

Original DataFrame:


,Product,Price,Stock,Rating
0,Pen,10,100,4.5
1,Notebook,50,60,4.7
2,Pencil,5,200,4.2
3,Marker,30,80,4.8
4,Eraser,8,150,4.3
5,Sharpener,12,120,4.4


In [12]:
# 1. Sort by a single column (Price, ascending)
print("\nSorted by Price (ascending):")
df.sort_values(by="Price")   # By default ascending is True


Sorted by Price (ascending):


,Product,Price,Stock,Rating
2,Pencil,5,200,4.2
4,Eraser,8,150,4.3
0,Pen,10,100,4.5
5,Sharpener,12,120,4.4
3,Marker,30,80,4.8
1,Notebook,50,60,4.7
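
One detail worth seeing in code: `sort_values` returns a new DataFrame rather than reordering `df` in place, and the sorted rows keep their original index labels. The sketch below uses a trimmed three-row copy of the product data (an assumption made to keep it short) and the `ignore_index=True` option to renumber the rows.

```python
import pandas as pd

# Trimmed copy of the product data, for illustration only
df = pd.DataFrame({
    "Product": ["Pen", "Notebook", "Pencil"],
    "Price": [10, 50, 5],
})

# sort_values returns a new DataFrame; df keeps its original row order
by_price = df.sort_values(by="Price")

# ignore_index=True relabels the sorted rows 0, 1, 2, ...
by_price_reset = df.sort_values(by="Price", ignore_index=True)

print(df["Product"].tolist())          # ['Pen', 'Notebook', 'Pencil']
print(by_price.index.tolist())         # [2, 0, 1]
print(by_price_reset.index.tolist())   # [0, 1, 2]
```

To keep a sorted result, assign it to a variable as above.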


In [8]:
# 2. Sort by a single column (Rating, descending)
print("\nSorted by Rating (descending):")
df.sort_values(by="Rating", ascending=False)


Sorted by Rating (descending):


,Product,Price,Stock,Rating
3,Marker,30,80,4.8
1,Notebook,50,60,4.7
0,Pen,10,100,4.5
5,Sharpener,12,120,4.4
4,Eraser,8,150,4.3
2,Pencil,5,200,4.2


In [9]:
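# 3. Sort by multiple columns (Price ascending, then Stock descending)
# Stock only breaks ties between rows with equal Price; this dataset has no
# repeated prices, so the result matches the single-column sort by Price.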
print("\nSorted by Price (asc) and Stock (desc):")
df.sort_values(by=["Price", "Stock"], ascending=[True, False])


Sorted by Price (asc) and Stock (desc):


,Product,Price,Stock,Rating
2,Pencil,5,200,4.2
4,Eraser,8,150,4.3
0,Pen,10,100,4.5
5,Sharpener,12,120,4.4
3,Marker,30,80,4.8
1,Notebook,50,60,4.7
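
The second sort key only matters when the first one has ties. As a minimal sketch, the hypothetical `items` data below (not part of the lesson's dataset) repeats each price twice so the effect of sorting Stock in descending order becomes visible.

```python
import pandas as pd

# Hypothetical data with repeated prices so the second sort key matters
items = pd.DataFrame({
    "Product": ["Pen", "Gel Pen", "Pencil", "Crayon"],
    "Price": [10, 10, 5, 5],
    "Stock": [100, 250, 200, 300],
})

# Price ascending first; within equal prices, Stock descending
ranked = items.sort_values(by=["Price", "Stock"], ascending=[True, False])

print(ranked["Product"].tolist())  # ['Crayon', 'Pencil', 'Gel Pen', 'Pen']
```

Within each price group, the item with the larger stock comes first.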


In [10]:
# 4. Sort by index (row labels)
print("\nSorted by index (descending):")
df.sort_index(ascending=False)


Sorted by index (descending):


,Product,Price,Stock,Rating
5,Sharpener,12,120,4.4
4,Eraser,8,150,4.3
3,Marker,30,80,4.8
2,Pencil,5,200,4.2
1,Notebook,50,60,4.7
0,Pen,10,100,4.5
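
`sort_index()` can also sort the column labels instead of the row labels. The sketch below assumes a two-row copy of the product data and passes `axis=1` to order the columns alphabetically.

```python
import pandas as pd

# Two-row copy of the product data, for illustration only
df = pd.DataFrame({
    "Product": ["Pen", "Notebook"],
    "Price": [10, 50],
    "Stock": [100, 60],
    "Rating": [4.5, 4.7],
})

# axis=1 sorts the column labels alphabetically; row order is unchanged
by_column_name = df.sort_index(axis=1)

print(by_column_name.columns.tolist())  # ['Price', 'Product', 'Rating', 'Stock']
```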


## 2. Descriptive Statistics
- `mean()`, `median()`, `mode()` for central tendency.
- `std()` for the sample standard deviation (pandas divides by n - 1 by default).
- `describe()` for a summary of stats (count, mean, std, min, quartiles, max).

In [13]:
# 1. Mean of numeric columns
print("\nMean of numeric columns:")
df.mean(numeric_only=True)


Mean of numeric columns:


Price,19.166667
Stock,118.333333
Rating,4.483333


In [14]:
# 2. Median of numeric columns
print("\nMedian of numeric columns:")
df.median(numeric_only=True)


Median of numeric columns:


Price,11.0
Stock,110.0
Rating,4.45
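
The mean Price (about 19.17) sits well above the median (11.0) because two expensive items pull the mean upward. As a rough illustration, the sketch below appends one hypothetical item priced at 500 to the Price values and compares how far each statistic moves.

```python
import pandas as pd

prices = pd.Series([10, 50, 5, 30, 8, 12])

# Hypothetical extra item priced far above the rest
prices_with_outlier = pd.concat([prices, pd.Series([500])], ignore_index=True)

print("mean:  ", prices.mean(), "->", prices_with_outlier.mean())
print("median:", prices.median(), "->", prices_with_outlier.median())  # 11.0 -> 12.0
```

The mean jumps sharply while the median barely changes, which is why the median is often preferred for skewed data.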


In [15]:
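# 3. Mode of each column
# Every value here is unique, so every value counts as a mode. Each column is
# listed in its own sorted order, so a row below does not describe one product.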
print("\nMode of each column:")
df.mode()


Mode of each column:


,Product,Price,Stock,Rating
0,Eraser,5,60,4.2
1,Marker,8,80,4.3
2,Notebook,10,100,4.4
3,Pen,12,120,4.5
4,Pencil,30,150,4.7
5,Sharpener,50,200,4.8
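
Because no value repeats in this dataset, `mode()` returns every value. A small sketch with hypothetical ratings that do repeat shows the more typical behavior, including what happens when two values tie for most frequent.

```python
import pandas as pd

# Hypothetical ratings with repeats: 4.5 appears three times
ratings = pd.Series([4.5, 4.7, 4.5, 4.8, 4.7, 4.5])
print(ratings.mode().tolist())  # [4.5]

# When several values tie for most frequent, mode() returns all of them
tied = pd.Series([1, 1, 2, 2, 3])
print(tied.mode().tolist())     # [1, 2]
```

This is why `mode()` returns a Series (or DataFrame) rather than a single number.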


In [16]:
# 4. Standard deviation
print("\nStandard deviation of numeric columns:")
df.std(numeric_only=True)


Standard deviation of numeric columns:


Price,17.486185
Stock,50.760877
Rating,0.231661
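
The values above are sample standard deviations: pandas divides by n - 1 by default (`ddof=1`). The following sketch recomputes the Price figure by hand and also shows the population version with `ddof=0`; the variable names are illustrative.

```python
import pandas as pd

price = pd.Series([10, 50, 5, 30, 8, 12])

sample_std = price.std()            # default ddof=1: divide by n - 1
population_std = price.std(ddof=0)  # divide by n

# Manual sample standard deviation for comparison
squared_deviations = (price - price.mean()) ** 2
manual_sample_std = (squared_deviations.sum() / (len(price) - 1)) ** 0.5

print(sample_std, manual_sample_std)
print(population_std)
```

The population value is always slightly smaller, and the gap shrinks as the number of rows grows.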


In [17]:
# 5. Full descriptive summary
print("\nDescriptive statistics summary:")
df.describe()


Descriptive statistics summary:


,Price,Stock,Rating
count,6.0,6.0,6.0
mean,19.166667,118.333333,4.483333
std,17.486185,50.760877,0.231661
min,5.0,60.0,4.2
25%,8.5,85.0,4.325
50%,11.0,110.0,4.45
75%,25.5,142.5,4.65
max,50.0,200.0,4.8
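
The `25%`, `50%`, and `75%` rows of `describe()` are quantiles. As a quick check, this sketch reproduces them for Price with `quantile()`, which uses linear interpolation between data points by default.

```python
import pandas as pd

price = pd.Series([10, 50, 5, 30, 8, 12], name="Price")

# Quartiles computed directly
quartiles = price.quantile([0.25, 0.5, 0.75])
print(quartiles.tolist())  # [8.5, 11.0, 25.5]

# The same numbers appear in the describe() summary
summary = price.describe()
print(summary[["25%", "50%", "75%"]].tolist())
```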


## 3. Counting Values
- `value_counts()` shows the frequency of each unique value in a Series, sorted from most to least frequent.
- Use `normalize=True` to get proportions (fractions that sum to 1) instead of counts; multiply by 100 for percentages.

In [18]:
# 1. Count how many times each product appears
print("\nFrequency of each product:")
df["Product"].value_counts()


Frequency of each product:


Product,count
Pen,1
Notebook,1
Pencil,1
Marker,1
Eraser,1
Sharpener,1


In [19]:
# 2. Show proportions (fractions of the total) instead of counts
print("\nProportion of each product:")
df["Product"].value_counts(normalize=True)


Proportion of each product:


Product,proportion
Pen,0.166667
Notebook,0.166667
Pencil,0.166667
Marker,0.166667
Eraser,0.166667
Sharpener,0.166667
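
Every product appears once here, so the counts are not very informative. The sketch below uses a hypothetical order log with repeated products (an assumption, not part of the lesson's data) and converts the proportions to percentages.

```python
import pandas as pd

# Hypothetical order log with repeated products
orders = pd.Series(
    ["Pen", "Pencil", "Pen", "Marker", "Pen", "Pencil"], name="Product"
)

counts = orders.value_counts()
# normalize=True gives fractions; multiply by 100 and round for percentages
percent = orders.value_counts(normalize=True).mul(100).round(1)

print(counts.to_dict())   # {'Pen': 3, 'Pencil': 2, 'Marker': 1}
print(percent.to_dict())  # {'Pen': 50.0, 'Pencil': 33.3, 'Marker': 16.7}
```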


In [20]:
# 3. Example: value_counts on another column (Rating)
print("\nFrequency of each rating:")
df["Rating"].value_counts()


Frequency of each rating:


Rating,count
4.5,1
4.7,1
4.2,1
4.8,1
4.3,1
4.4,1


## 4. Correlation & Covariance
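- `corr()` computes the pairwise correlation between numeric columns (Pearson by default).
- `cov()` computes the pairwise covariance; unlike correlation, its size depends on the units of the columns.
- Correlation values range from -1 (perfect negative linear relationship) to +1 (perfect positive linear relationship); values near 0 indicate little or no linear relationship.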

In [21]:
# 1. Correlation between all numeric columns
print("\nCorrelation matrix:")
df.corr(numeric_only=True)


Correlation matrix:


,Price,Stock,Rating
Price,1.0,-0.804028,0.820402
Stock,-0.804028,1.0,-0.921257
Rating,0.820402,-0.921257,1.0


In [22]:
# 2. Covariance between all numeric columns
print("\nCovariance matrix:")
df.cov(numeric_only=True)


Covariance matrix:


,Price,Stock,Rating
Price,305.766667,-713.666667,3.323333
Stock,-713.666667,2576.666667,-10.833333
Rating,3.323333,-10.833333,0.053667


In [23]:
# 3. Correlation between two specific columns
print("\nCorrelation between Price and Stock:")
df["Price"].corr(df["Stock"])


Correlation between Price and Stock:


np.float64(-0.8040280933316356)

In [24]:
# 4. Covariance between two specific columns
print("\nCovariance between Price and Stock:")
df["Price"].cov(df["Stock"])


Covariance between Price and Stock:


np.float64(-713.6666666666667)
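
Correlation and covariance are closely related: Pearson correlation is the covariance divided by the product of the two standard deviations. The sketch below checks this for Price and Stock, using the same values as the dataset above.

```python
import pandas as pd

price = pd.Series([10, 50, 5, 30, 8, 12])
stock = pd.Series([100, 60, 200, 80, 150, 120])

cov = price.cov(stock)

# Pearson correlation = covariance / (std of x * std of y)
corr_from_cov = cov / (price.std() * stock.std())
corr = price.corr(stock)

print(round(corr, 6), round(corr_from_cov, 6))  # -0.804028 -0.804028
```

Dividing by the standard deviations removes the units, which is why correlation always falls between -1 and +1 while covariance does not.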

## Mini Practice
1. Build a DataFrame with columns: Name, Age, Score, Height.
2. Sort the data by Score (descending).
3. Get mean, median, mode, and std of numeric columns.
4. Use `value_counts()` on Name or any categorical column.
5. Compute `corr()` and `cov()` for all numeric columns.

In [25]:
import pandas as pd

# 1. Build a DataFrame with columns: Name, Age, Score, Height
data = {
    "Name": ["Liam", "Sophia", "Ethan", "Olivia", "Noah"],
    "Age": [27, 34, 29, 31, 26],
    "Score": [80, 95, 88, 76, 90],
    "Height": [172, 165, 180, 158, 175]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
df

Original DataFrame:


,Name,Age,Score,Height
0,Liam,27,80,172
1,Sophia,34,95,165
2,Ethan,29,88,180
3,Olivia,31,76,158
4,Noah,26,90,175


In [27]:
# 2. Sort the data by Score (descending)
sorted_df = df.sort_values(by="Score", ascending=False)
print("\nSorted by Score (descending):")
sorted_df


Sorted by Score (descending):


,Name,Age,Score,Height
1,Sophia,34,95,165
4,Noah,26,90,175
2,Ethan,29,88,180
0,Liam,27,80,172
3,Olivia,31,76,158


In [31]:
# 3. Get mean, median, mode, and std of numeric columns
print("\nMean of numeric columns:")
df.mean(numeric_only=True)


Mean of numeric columns:


Age,29.4
Score,85.8
Height,170.0


In [32]:
print("\nMedian of numeric columns:")
df.median(numeric_only=True)


Median of numeric columns:


Age,29.0
Score,88.0
Height,172.0


In [33]:
# No value repeats, so mode() lists every value, sorted independently per column
print("\nMode of columns:")
df.mode()


Mode of columns:


,Name,Age,Score,Height
0,Ethan,26,76,158
1,Liam,27,80,165
2,Noah,29,88,172
3,Olivia,31,90,175
4,Sophia,34,95,180


In [34]:
print("\nStandard deviation of numeric columns:")
df.std(numeric_only=True)


Standard deviation of numeric columns:


Age,3.209361
Score,7.694154
Height,8.631338


In [35]:
# 4. Use value_counts on Name or any categorical column
print("\nFrequency of each Name:")
df["Name"].value_counts()


Frequency of each Name:


Name,count
Liam,1
Sophia,1
Ethan,1
Olivia,1
Noah,1


In [36]:
# 5. Compute corr() and cov() for all numeric columns
print("\nCorrelation matrix:")
df.corr(numeric_only=True)


Correlation matrix:


,Age,Score,Height
Age,1.0,0.257155,-0.613694
Score,0.257155,1.0,0.387738
Height,-0.613694,0.387738,1.0


In [37]:
print("\nCovariance matrix:")
df.cov(numeric_only=True)


Covariance matrix:


,Age,Score,Height
Age,10.3,6.35,-17.0
Score,6.35,59.2,25.75
Height,-17.0,25.75,74.5


---
## Summary
- Learned to sort by columns with `sort_values` and by index with `sort_index`.
- Explored descriptive statistics: mean, median, mode, std, describe.
- Counted values with `value_counts`.
- Calculated correlation and covariance.