# 🐼 Pandas - Class 5: Sorting & Basic Statistics
Welcome to **Class 5** of our Pandas series. Today we’ll learn how to sort data and calculate basic statistics.

## 1. Sorting Data
- Use `sort_values(by='col')` to sort by column values.
- Use `ascending=False` for descending order.
- Sort by multiple columns by passing a list.
- `sort_index()` sorts by row labels; pass `axis=1` to sort by column labels instead.
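
The example dataset in this class has no repeated prices, so the second sort key never gets a chance to change the order. As a minimal sketch of the multi-column and `sort_index()` bullets, the snippet below uses a small hypothetical `items` table (made up for illustration, not the class dataset) with tied prices:

```python
import pandas as pd

# Hypothetical data with tied prices so the second sort key has an effect
items = pd.DataFrame({
    "Product": ["Pen", "Pencil", "Eraser", "Marker"],
    "Price": [10, 5, 10, 5],
    "Stock": [100, 200, 150, 80],
})

# Price ascending; within equal prices, Stock descending
by_price_then_stock = items.sort_values(
    by=["Price", "Stock"], ascending=[True, False]
)
print(by_price_then_stock)
# Product order: Pencil, Marker, Eraser, Pen

# axis=1 sorts the column labels (alphabetically) instead of the row labels
columns_sorted = items.sort_index(axis=1)
print(columns_sorted.columns.tolist())
# ['Price', 'Product', 'Stock']
```

Note that the original row labels travel with each row; `sort_values()` returns a new DataFrame and leaves `items` unchanged.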

In [1]:
import pandas as pd

# New dataset
data = {
    "Product": ["Pen", "Notebook", "Pencil", "Marker", "Eraser", "Sharpener"],
    "Price": [10, 50, 5, 30, 8, 12],
    "Stock": [100, 60, 200, 80, 150, 120],
    "Rating": [4.5, 4.7, 4.2, 4.8, 4.3, 4.4]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# 1. Sort by a single column (Price, ascending)
print("\nSorted by Price (ascending):")
print(df.sort_values(by="Price"))

# 2. Sort by a single column (Rating, descending)
print("\nSorted by Rating (descending):")
print(df.sort_values(by="Rating", ascending=False))

# 3. Sort by multiple columns (Price ascending, then Stock descending)
print("\nSorted by Price (asc) and Stock (desc):")
print(df.sort_values(by=["Price", "Stock"], ascending=[True, False]))

# 4. Sort by index (row labels)
print("\nSorted by index (descending):")
print(df.sort_index(ascending=False))


Original DataFrame:
     Product  Price  Stock  Rating
0        Pen     10    100     4.5
1   Notebook     50     60     4.7
2     Pencil      5    200     4.2
3     Marker     30     80     4.8
4     Eraser      8    150     4.3
5  Sharpener     12    120     4.4

Sorted by Price (ascending):
     Product  Price  Stock  Rating
2     Pencil      5    200     4.2
4     Eraser      8    150     4.3
0        Pen     10    100     4.5
5  Sharpener     12    120     4.4
3     Marker     30     80     4.8
1   Notebook     50     60     4.7

Sorted by Rating (descending):
     Product  Price  Stock  Rating
3     Marker     30     80     4.8
1   Notebook     50     60     4.7
0        Pen     10    100     4.5
5  Sharpener     12    120     4.4
4     Eraser      8    150     4.3
2     Pencil      5    200     4.2

Sorted by Price (asc) and Stock (desc):
     Product  Price  Stock  Rating
2     Pencil      5    200     4.2
4     Eraser      8    150     4.3
0        Pen     10    100     4.5
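5  Sharpener     12    120     4.4
3     Marker     30     80     4.8
1   Notebook     50     60     4.7

Sorted by index (descending):
     Product  Price  Stock  Rating
5  Sharpener     12    120     4.4
4     Eraser      8    150     4.3
3     Marker     30     80     4.8
2     Pencil      5    200     4.2
1   Notebook     50     60     4.7
0        Pen     10    100     4.5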

## 2. Descriptive Statistics
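- `mean()`, `median()`, `mode()` for central tendency. `mode()` returns every value tied for most frequent, so a column whose values are all unique returns all of them in sorted order.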
- `std()` for standard deviation.
- `describe()` for a summary of stats (count, mean, std, min, quartiles, max).
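
To see how `mode()` behaves with and without a repeated value, and what `std()` computes by default, here is a small sketch. The `scores` and `all_unique` Series are made-up illustration data, not the class dataset. In pandas, `std()` uses the sample formula (`ddof=1`) by default; passing `ddof=0` gives the population version.

```python
import pandas as pd

# Hypothetical data: 4 appears three times, so it is the single mode
scores = pd.Series([4, 5, 4, 3, 5, 4])
print(scores.mode().tolist())       # [4]

# When every value is unique, all of them tie and mode() returns them sorted
all_unique = pd.Series([3, 1, 2])
print(all_unique.mode().tolist())   # [1, 2, 3]

# std() divides by n - 1 by default (sample standard deviation)
sample_std = scores.std()
# ddof=0 divides by n instead (population standard deviation)
population_std = scores.std(ddof=0)
print(sample_std > population_std)  # True
```

Because `mode()` can return more than one value, its result is always a Series (or a DataFrame when called on a DataFrame), never a single number.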

In [2]:
# 1. Mean of numeric columns
print("\nMean of numeric columns:")
print(df.mean(numeric_only=True))

# 2. Median of numeric columns
print("\nMedian of numeric columns:")
print(df.median(numeric_only=True))

# 3. Mode of each column
print("\nMode of each column:")
print(df.mode())

# 4. Standard deviation
print("\nStandard deviation of numeric columns:")
print(df.std(numeric_only=True))

# 5. Full descriptive summary
print("\nDescriptive statistics summary:")
print(df.describe())


Mean of numeric columns:
Price      19.166667
Stock     118.333333
Rating      4.483333
dtype: float64

Median of numeric columns:
Price      11.00
Stock     110.00
Rating      4.45
dtype: float64

Mode of each column:
     Product  Price  Stock  Rating
0     Eraser      5     60     4.2
1     Marker      8     80     4.3
2   Notebook     10    100     4.4
3        Pen     12    120     4.5
4     Pencil     30    150     4.7
5  Sharpener     50    200     4.8

Standard deviation of numeric columns:
Price     17.486185
Stock     50.760877
Rating     0.231661
dtype: float64

Descriptive statistics summary:
           Price       Stock    Rating
count   6.000000    6.000000  6.000000
mean   19.166667  118.333333  4.483333
std    17.486185   50.760877  0.231661
min     5.000000   60.000000  4.200000
25%     8.500000   85.000000  4.325000
50%    11.000000  110.000000  4.450000
75%    25.500000  142.500000  4.650000
max    50.000000  200.000000  4.800000


## 3. Counting Values
- `value_counts()` shows the frequency of unique values in a Series.
- Use `normalize=True` to see proportions (fractions that sum to 1) instead of counts; multiply by 100 for percentages.
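
Every product in the class dataset appears exactly once, so the counts are all 1. As a rough sketch of what `value_counts()` looks like when values repeat, the snippet below uses a hypothetical `categories` Series (invented for illustration):

```python
import pandas as pd

# Hypothetical category labels with repeats
categories = pd.Series(
    ["Writing", "Paper", "Writing", "Writing", "Paper", "Accessory"]
)

# Raw frequencies, most common first
counts = categories.value_counts()
print(counts)

# Proportions between 0 and 1 that sum to 1
shares = categories.value_counts(normalize=True)
print(shares)

# Convert proportions to percentages, rounded to one decimal place
percent = (shares * 100).round(1)
print(percent)
```

Here "Writing" has a count of 3, a proportion of 0.5, and a percentage of 50.0.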

In [3]:

# 1. Count how many times each product appears
print("\nFrequency of each product:")
print(df["Product"].value_counts())

# 2. Show proportions (fractions of the total) instead of counts
print("\nPercentage frequency of each product:")
print(df["Product"].value_counts(normalize=True))

# 3. Example: value_counts on another column (Rating)
print("\nFrequency of each rating:")
print(df["Rating"].value_counts())


Frequency of each product:
Product
Pen          1
Notebook     1
Pencil       1
Marker       1
Eraser       1
Sharpener    1
Name: count, dtype: int64

Percentage frequency of each product:
Product
Pen          0.166667
Notebook     0.166667
Pencil       0.166667
Marker       0.166667
Eraser       0.166667
Sharpener    0.166667
Name: proportion, dtype: float64

Frequency of each rating:
Rating
4.5    1
4.7    1
4.2    1
4.8    1
4.3    1
4.4    1
Name: count, dtype: int64


## 4. Correlation & Covariance
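- `corr()` computes the pairwise correlation between numeric columns (Pearson by default).
- `cov()` computes the pairwise covariance.
- Correlation values range from -1 (perfect negative linear relationship) to +1 (perfect positive linear relationship); values near 0 indicate little or no linear relationship.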
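
Correlation and covariance are closely related: Pearson correlation is the covariance divided by the product of the two standard deviations. The sketch below checks that relationship using the Price and Stock values from this class; the variable names (`price`, `stock`, `manual_corr`) are chosen here for illustration.

```python
import pandas as pd

# Price and Stock values from the class dataset
price = pd.Series([10, 50, 5, 30, 8, 12])
stock = pd.Series([100, 60, 200, 80, 150, 120])

# Sample covariance (pandas uses ddof=1 for both cov() and std())
cov_ps = price.cov(stock)

# Pearson correlation rebuilt by hand: cov(x, y) / (std(x) * std(y))
manual_corr = cov_ps / (price.std() * stock.std())

print(manual_corr)
print(price.corr(stock))  # agrees with manual_corr up to floating-point error
```

Dividing by the standard deviations removes the units, which is why correlation always falls between -1 and +1 while covariance depends on the scale of the columns.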

In [4]:

# 1. Correlation between all numeric columns
print("\nCorrelation matrix:")
print(df.corr(numeric_only=True))

# 2. Covariance between all numeric columns
print("\nCovariance matrix:")
print(df.cov(numeric_only=True))

# 3. Correlation between two specific columns
print("\nCorrelation between Price and Stock:")
print(df["Price"].corr(df["Stock"]))

# 4. Covariance between two specific columns
print("\nCovariance between Price and Stock:")
print(df["Price"].cov(df["Stock"]))


Correlation matrix:
           Price     Stock    Rating
Price   1.000000 -0.804028  0.820402
Stock  -0.804028  1.000000 -0.921257
Rating  0.820402 -0.921257  1.000000

Covariance matrix:
             Price        Stock     Rating
Price   305.766667  -713.666667   3.323333
Stock  -713.666667  2576.666667 -10.833333
Rating    3.323333   -10.833333   0.053667

Correlation between Price and Stock:
-0.8040280933316356

Covariance between Price and Stock:
-713.6666666666667


## Mini Practice
1. Build a DataFrame with columns: Name, Age, Score, Height.
2. Sort the data by Score (descending).
3. Get mean, median, mode, and std of numeric columns.
4. Use `value_counts()` on Name or any categorical column.
5. Compute `corr()` and `cov()` for all numeric columns.

In [5]:
import pandas as pd

# 1. Build a DataFrame with columns: Name, Age, Score, Height
data = {
    "Name": ["Liam", "Sophia", "Ethan", "Olivia", "Noah"],
    "Age": [27, 34, 29, 31, 26],
    "Score": [80, 95, 88, 76, 90],
    "Height": [172, 165, 180, 158, 175]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# 2. Sort the data by Score (descending)
sorted_df = df.sort_values(by="Score", ascending=False)
print("\nSorted by Score (descending):")
print(sorted_df)

# 3. Get mean, median, mode, and std of numeric columns
print("\nMean of numeric columns:")
print(df.mean(numeric_only=True))

print("\nMedian of numeric columns:")
print(df.median(numeric_only=True))

print("\nMode of columns:")
print(df.mode())

print("\nStandard deviation of numeric columns:")
print(df.std(numeric_only=True))

# 4. Use value_counts on Name or any categorical column
print("\nFrequency of each Name:")
print(df["Name"].value_counts())

# 5. Compute corr() and cov() for all numeric columns
print("\nCorrelation matrix:")
print(df.corr(numeric_only=True))

print("\nCovariance matrix:")
print(df.cov(numeric_only=True))



Original DataFrame:
     Name  Age  Score  Height
0    Liam   27     80     172
1  Sophia   34     95     165
2   Ethan   29     88     180
3  Olivia   31     76     158
4    Noah   26     90     175

Sorted by Score (descending):
     Name  Age  Score  Height
1  Sophia   34     95     165
4    Noah   26     90     175
2   Ethan   29     88     180
0    Liam   27     80     172
3  Olivia   31     76     158

Mean of numeric columns:
Age        29.4
Score      85.8
Height    170.0
dtype: float64

Median of numeric columns:
Age        29.0
Score      88.0
Height    172.0
dtype: float64

Mode of columns:
     Name  Age  Score  Height
0   Ethan   26     76     158
1    Liam   27     80     165
2    Noah   29     88     172
3  Olivia   31     90     175
4  Sophia   34     95     180

Standard deviation of numeric columns:
Age       3.209361
Score     7.694154
Height    8.631338
dtype: float64

Frequency of each Name:
Name
Liam      1
Sophia    1
Ethan     1
Olivia    1
Noah      1
Name: count, dtype: int64

---
## Summary
- Learned to sort by columns with `sort_values` and by index with `sort_index`.
- Explored descriptive statistics: mean, median, mode, std, describe.
- Counted values with `value_counts`.
- Calculated correlation and covariance.