# Pandas Mechanics: Understanding Methods & Functions

## Course Goal
Before we analyze big data, we need to understand the **grammar** of Pandas.

You have likely seen code like this:
```python
df.isna().sum()
```
In this lesson, we will break down exactly what is happening here, step by step.

**Topics:**
1.  **The Dot (`.`) Notation**: How to talk to Pandas objects.
2.  **Aggregation Functions**: How `sum()`, `mean()`, and `count()` work down columns vs across rows.
3.  **Method Chaining**: Connecting multiple actions.

### ⚙️ Setup: Create Practice Data
Run this cell to create a DataFrame with some missing numbers.

```python
import pandas as pd
import numpy as np

# Creating a dictionary with some missing values (np.nan)
data = {
    'Math': [90, 80, np.nan, 70],
    'Science': [85, np.nan, 95, 75],
    'English': [88, 92, 78, 85]
}

df = pd.DataFrame(data, index=['Student A', 'Student B', 'Student C', 'Student D'])

print("--- Student Grades ---")
print(df)
```

```
--- Student Grades ---
           Math  Science  English
Student A  90.0     85.0       88
Student B  80.0      NaN       92
Student C   NaN     95.0       78
Student D  70.0     75.0       85
```


## 1. The Dot Notation (`.`)

In Python, everything is an **object**, and a DataFrame is no exception.
Objects have **methods**: functions that belong to the object and act on its data.

To call a method, write the object, then a **dot**, then the method name and parentheses.
* `df.sum()` literally translates to: "Hey DataFrame, run your summation method on your own data."
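The dot is ordinary Python method-call syntax. The object before the dot is passed to the method automatically. The sketch below rebuilds the practice data so it runs on its own. It shows that calling the method through the class, `pd.DataFrame.sum(df)`, gives the same result as `df.sum()`. It also shows that not everything after a dot is a method: `df.shape` is an attribute, so it takes no parentheses. The variable names `dot_form` and `class_form` are only for illustration, and in real code you would always use the dot form.

```python
import numpy as np
import pandas as pd

# Rebuild the practice data from the setup cell
df = pd.DataFrame(
    {'Math': [90, 80, np.nan, 70],
     'Science': [85, np.nan, 95, 75],
     'English': [88, 92, 78, 85]},
    index=['Student A', 'Student B', 'Student C', 'Student D'],
)

# df is an object whose type is DataFrame
print(type(df).__name__)  # DataFrame

# The dot form and the explicit class form run the same method on df
dot_form = df.sum()
class_form = pd.DataFrame.sum(df)
print(dot_form.equals(class_form))  # True

# An attribute: a stored value, accessed with a dot but no parentheses
print(df.shape)  # (4, 3)
```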

### Example: The `isna()` Method
`isna()` asks the same question of every cell: "Is this value missing (null)?"

```python
# This returns a "Mask" (A DataFrame of True/False)
check_null = df.isna()

print(check_null)
```

```
            Math  Science  English
Student A  False    False    False
Student B  False     True    False
Student C   True    False    False
Student D  False    False    False
```


**Understanding the Output:**
* **False** = The data is there (Not Null).
* **True** = The data is missing (Null).
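The mask has the same rows and columns as `df`, so you can read a single answer from it with `.loc`. Pandas also provides `notna()`, which returns the opposite mask. The sketch below rebuilds the practice data so it runs on its own. The variable name `mask` is only for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the practice data from the setup cell
df = pd.DataFrame(
    {'Math': [90, 80, np.nan, 70],
     'Science': [85, np.nan, 95, 75],
     'English': [88, 92, 78, 85]},
    index=['Student A', 'Student B', 'Student C', 'Student D'],
)

mask = df.isna()

# Look up one cell of the mask: (row label, column label)
print(mask.loc['Student B', 'Science'])  # True  (this score is missing)
print(mask.loc['Student A', 'Math'])     # False (this score is present)

# notna() is the inverse of isna(): True wherever a value is present
print(df.notna().equals(~mask))  # True
```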

## 2. Aggregation Functions (`sum`, `mean`)

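When you run an aggregation method such as `sum()` or `mean()` on a DataFrame, it works **column by column** by default. Each column is collapsed into a single value, and the result is a Series indexed by column name.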

### The `sum()` Method
Let's see what happens when we sum our grades.

```python
print(df.sum())
```

```
Math       240.0
Science    255.0
English    343.0
dtype: float64
```
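To make "column by column" concrete, the sketch below sums each column on its own and checks that the pieces match `df.sum()`. It rebuilds the practice data so it runs on its own. It also shows that `sum()` skipped the NaN values above. With the `skipna=False` parameter of `DataFrame.sum`, any column containing a missing value sums to NaN. The variable name `per_column` is only for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the practice data from the setup cell
df = pd.DataFrame(
    {'Math': [90, 80, np.nan, 70],
     'Science': [85, np.nan, 95, 75],
     'English': [88, 92, 78, 85]},
    index=['Student A', 'Student B', 'Student C', 'Student D'],
)

# Sum one column at a time (the NaN in Math is skipped)
print(df['Math'].sum())  # 240.0

# Collect every per-column sum into a Series and compare with df.sum()
per_column = pd.Series({col: float(df[col].sum()) for col in df.columns})
print(per_column.equals(df.sum()))  # True

# With skipna=False, Math and Science become NaN; English stays 343.0
print(df.sum(skipna=False))
```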


### The Trick: Summing Booleans (True/False)
In Python, `True` behaves like **1** and `False` like **0** in arithmetic (`bool` is a subclass of `int`).

If we `sum()` the output of `isna()`, each True adds 1 and each False adds 0. The result is the number of missing values in each column.
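Here is that rule in plain Python, without Pandas. It is a quick check of the claim that True counts as 1 and False as 0.

```python
# bool is a subclass of int, so True/False act like 1/0 in arithmetic
print(isinstance(True, int))     # True
print(True + True)               # 2
print(sum([True, False, True]))  # 2  (counts the True values)
```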

```python
# Step 1: Get the True/False table
bool_table = df.isna()

# Step 2: Sum the True/False table (True counts as 1)
error_count = bool_table.sum()

print(error_count)
```

```
Math       1
Science    1
English    0
dtype: int64
```


## 3. Method Chaining (Putting it together)

Instead of saving intermediate steps to variables (like `bool_table` above), we can **chain** the methods together.

`df.isna().sum()`

**How Python reads this:**
1.  Take `df`.
2.  Apply `.isna()` $\rightarrow$ Result is a True/False DataFrame.
3.  Apply `.sum()` **on that result** $\rightarrow$ Result is the number of True values in each column.

```python
# The most common way to check for missing data
print(df.isna().sum())
```

```
Math       1
Science    1
English    0
dtype: int64
```
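A chain can keep going. Long chains are often wrapped in parentheses so each step sits on its own line. The sketch below rebuilds the practice data so it runs on its own, then shows two variations. The first adds a second `.sum()` to total the per-column counts. The second ends the chain with `.mean()` instead, which gives the fraction of missing values per column because True counts as 1. The variable name `total_missing` is only for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the practice data from the setup cell
df = pd.DataFrame(
    {'Math': [90, 80, np.nan, 70],
     'Science': [85, np.nan, 95, 75],
     'English': [88, 92, 78, 85]},
    index=['Student A', 'Student B', 'Student C', 'Student D'],
)

# One method per line; the outer parentheses let the chain span lines
total_missing = (
    df
    .isna()   # True/False mask
    .sum()    # missing count per column
    .sum()    # add the per-column counts together
)
print(total_missing)  # 2

# mean() of a True/False column = share of True values
# (Math 0.25, Science 0.25, English 0.0)
print(df.isna().mean())
```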


## 4. Other Common Functions

Here is how other aggregation functions behave.

### `mean()` (Average)
Calculates the average. Note: Pandas **ignores** NaN values by default (`skipna=True`), so each average uses only the values that are present.

```python
print("--- Average Score per Subject ---")
print(df.mean())
```

```
--- Average Score per Subject ---
Math       80.00
Science    85.00
English    85.75
dtype: float64
```
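To confirm that NaN is skipped rather than treated as 0, the sketch below checks the Math average by hand from its three present scores. If the NaN counted as 0, the average would be 240 / 4 = 60 instead. The sketch also shows the `skipna=False` parameter of `DataFrame.mean`, which makes any column with a missing value return NaN. It rebuilds the practice data so it runs on its own, and the variable name `strict` is only for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the practice data from the setup cell
df = pd.DataFrame(
    {'Math': [90, 80, np.nan, 70],
     'Science': [85, np.nan, 95, 75],
     'English': [88, 92, 78, 85]},
    index=['Student A', 'Student B', 'Student C', 'Student D'],
)

# Math has three present scores: 90, 80, 70
print(df['Math'].mean())   # 80.0
print((90 + 80 + 70) / 3)  # 80.0

# Turn off NaN-skipping: Math and Science become NaN, English stays 85.75
strict = df.mean(skipna=False)
print(strict)
```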


### `count()`
Counts the number of **non-null** entries in each column.

```python
print("--- Valid Tests Taken ---")
print(df.count())
```

```
--- Valid Tests Taken ---
Math       3
Science    3
English    4
dtype: int64
```
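Two quick checks tie `count()` back to the earlier sections. First, counting present values gives the same result as summing the `notna()` mask. Second, dividing `sum()` by `count()` reproduces `mean()`, which is exactly why the averages above ignore NaN. The sketch below rebuilds the practice data so it runs on its own. The variable name `manual_mean` is only for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the practice data from the setup cell
df = pd.DataFrame(
    {'Math': [90, 80, np.nan, 70],
     'Science': [85, np.nan, 95, 75],
     'English': [88, 92, 78, 85]},
    index=['Student A', 'Student B', 'Student C', 'Student D'],
)

# count() == number of True values in the notna() mask
print(df.count().equals(df.notna().sum()))  # True

# mean() == sum of present values / number of present values
manual_mean = df.sum() / df.count()
print(np.allclose(manual_mean, df.mean()))  # True
```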


## 5. Axis Direction (Rows vs Columns)

By default, `sum()` and `mean()` work vertically (down each column).
If you want to calculate the average grade for **each student** (across the row), you must specify `axis=1`.

* `axis=0`: Down the column (Default)
* `axis=1`: Across the row

```python
print("--- Average Score per Student (Row-wise) ---")
print(df.mean(axis=1))
```

```
--- Average Score per Student (Row-wise) ---
Student A    87.666667
Student B    86.000000
Student C    86.500000
Student D    76.666667
dtype: float64
```
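Pandas also accepts the names `'index'` (same as `0`) and `'columns'` (same as `1`) for `axis`. The same argument works for other aggregations such as `sum()` and `count()`. The sketch below rebuilds the practice data so it runs on its own. It checks that the name and the number give the same row-wise result, then totals points and counts tests per student. For example, Student B has two scores, 80 + 92 = 172.

```python
import numpy as np
import pandas as pd

# Rebuild the practice data from the setup cell
df = pd.DataFrame(
    {'Math': [90, 80, np.nan, 70],
     'Science': [85, np.nan, 95, 75],
     'English': [88, 92, 78, 85]},
    index=['Student A', 'Student B', 'Student C', 'Student D'],
)

# axis=1 and axis='columns' both mean "across the row"
print(df.mean(axis='columns').equals(df.mean(axis=1)))  # True

# Other aggregations accept the same argument
print(df.sum(axis=1))    # total points per student (NaN skipped)
print(df.count(axis=1))  # number of scores each student has
```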


--- 
## 🟢 Practice Exercise

1. Use chaining to find the `mean()` of the DataFrame.
2. Use chaining to `round()` that result to 2 decimal places.
   * *Hint: You can chain `.mean().round(2)`*

```python
# YOUR CODE HERE
print(df.mean().round(2))
```

```
Math       80.00
Science    85.00
English    85.75
dtype: float64
```
