# Session 22: Deep Dive into Pandas Inspection

**Unit 2: Data Tools and Platforms**
**Hour: 22**
**Mode: Practical Lab**

---

### 1. Objective

This lab reinforces and expands on our initial data inspection techniques. Mastering these commands is crucial as they form the basis of the **Explore** phase and help identify what needs to be done in the **Scrub** phase.

We will use the Telco Churn dataset throughout this lab.

### 2. Setup

As always, we start by importing Pandas and loading our data.

In [None]:
import pandas as pd

url = 'https://raw.githubusercontent.com/IBM/telco-customer-churn-on-icp4d/master/data/Telco-Customer-Churn.csv'
df = pd.read_csv(url)

### 3. Inspection Commands Review

Let's quickly review the commands from our first look at this data.

In [None]:
# View first 3 rows
df.head(3)

In [None]:
# Check dimensions (rows, columns)
df.shape

In [None]:
# Get technical summary
df.info()
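One thing worth looking for in the `.info()` output is a column whose dtype is text even though it should hold numbers. The sketch below uses a small made-up frame (not the Telco data) in which a `TotalCharges`-style column contains one blank string, which is enough for pandas to store the whole column as text. `pd.to_numeric(..., errors='coerce')` is one common way to convert such a column. Treat it as an illustration, not as the cleaning approach this course prescribes.

```python
import pandas as pd

# Toy data, invented for illustration; the blank string mimics a missing value
toy = pd.DataFrame({
    'tenure': [1, 34, 0],
    'TotalCharges': ['29.85', '1889.5', ' '],
})

# The column holds text, so pandas does not treat it as numeric
is_numeric_before = pd.api.types.is_numeric_dtype(toy['TotalCharges'])
print(is_numeric_before)        # False

# errors='coerce' turns unparseable entries into NaN instead of raising
converted = pd.to_numeric(toy['TotalCharges'], errors='coerce')
print(converted.dtype)          # float64
print(converted.isna().sum())   # 1
```

If a column you expected to be numeric shows up as text in your own `.info()` output, note it down as a candidate for the **Scrub** phase.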

### 4. New Inspection Techniques

#### 4.1. `.tail()` - View the Last Few Rows

Similar to `.head()`, `.tail()` returns the last rows of the DataFrame (five by default). It is useful for checking that the file loaded completely and does not end in truncated rows or leftover footer lines.

In [None]:
df.tail(3)
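As a small illustration of why the end of a file matters, the sketch below builds a made-up CSV (not the Telco data) that ends with a summary line, as some exports do. `.tail()` exposes the stray row, and `skipfooter` is one `read_csv` option for dropping it at load time. The column names and values here are assumptions chosen for the example.

```python
import io
import pandas as pd

# Made-up CSV text with a summary line at the bottom
csv_text = "\n".join([
    "customerID,tenure",
    "A-001,5",
    "A-002,40",
    "TOTAL,45",
])

toy = pd.read_csv(io.StringIO(csv_text))

# The footer shows up as the last row of the DataFrame
print(toy.tail(1))

# skipfooter drops trailing lines while reading (it needs the python engine)
clean = pd.read_csv(io.StringIO(csv_text), skipfooter=1, engine='python')
print(clean.shape)  # (2, 2)
```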

#### 4.2. `.columns` - Get a List of Column Names

This attribute returns all the column names as a pandas `Index` object, which behaves much like a list. It's very useful when you have many columns and want to copy-paste a name without making a typo.

In [None]:
df.columns
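A short sketch of what you can do with `.columns`, using a made-up one-row frame. The stray space in `' tenure'` is deliberate: it imitates a header typo that would otherwise cause a confusing `KeyError` later.

```python
import pandas as pd

# Toy frame with a leading space in one header (invented for illustration)
toy = pd.DataFrame({
    'customerID': ['A-001'],
    ' tenure': [5],
    'Contract': ['One year'],
})

# .tolist() converts the Index into a plain Python list
names = toy.columns.tolist()
print(names)                     # ['customerID', ' tenure', 'Contract']

# Membership tests work directly on the Index
found_before = 'tenure' in toy.columns
print(found_before)              # False, because of the leading space

# .str.strip() returns a cleaned Index that can be assigned back
toy.columns = toy.columns.str.strip()
print('tenure' in toy.columns)   # True
```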

#### 4.3. `.describe()` - Get a Statistical Summary

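This is one of the most powerful inspection tools. By default, it summarizes only the **numerical columns** and calculates key descriptive statistics for each. Columns stored as text are left out, and a 0/1 flag such as `SeniorCitizen` is treated as numeric, so its mean is simply the share of 1s.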

In [None]:
df.describe()

**How to Interpret `.describe()`:**
*   `count`: The number of non-null values. Notice it's the same for `tenure` and `MonthlyCharges`.
*   `mean`: The average value. The average tenure is about 32 months.
*   `std`: The standard deviation, a measure of how spread out the data is.
*   `min`: The minimum value. The minimum tenure is 0 months.
*   `25%` (Q1): The first quartile. 25% of customers have a tenure of 9 months or less.
*   `50%` (Q2): The median. 50% of customers have a tenure of 29 months or less.
*   `75%` (Q3): The third quartile. 75% of customers have a tenure of 55 months or less.
*   `max`: The maximum value. The longest-serving customer has been with the company for 72 months.
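To make the rows of the summary concrete, here is a minimal sketch on a five-row toy frame (not the real dataset) whose values were picked so the quartiles are easy to verify by eye. It also shows that calling `.describe()` on a text column gives a different summary: `count`, `unique`, `top` (the most frequent value) and `freq` (how often it occurs).

```python
import pandas as pd

# Toy data chosen so the quartiles land exactly on existing values
toy = pd.DataFrame({
    'tenure': [0, 9, 29, 55, 72],
    'Contract': ['Month-to-month', 'Month-to-month', 'One year',
                 'Two year', 'Month-to-month'],
})

stats = toy.describe()               # only the numeric column is summarized
print(stats.loc['mean', 'tenure'])   # 33.0
print(stats.loc['50%', 'tenure'])    # 29.0

# The percentile rows are the column's quantiles
print(toy['tenure'].quantile(0.25))  # 9.0
print(toy['tenure'].quantile(0.75))  # 55.0

# On a text column, describe() reports count, unique, top and freq instead
text_stats = toy['Contract'].describe()
print(text_stats['top'], text_stats['freq'])  # Month-to-month 3
```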

#### 4.4. `.value_counts()` - Count How Often Each Value Appears

This is a quick way to understand the distribution of a **categorical column**. It counts the rows for each distinct value and sorts the result from most to least frequent.

In [None]:
# Let's see the breakdown of contract types
df['Contract'].value_counts()

With `normalize=True`, the counts become proportions between 0 and 1. Multiplying by 100 turns them into percentages.

In [None]:
df['Contract'].value_counts(normalize=True) * 100

**Finding:** Over half of all customers are on a risky month-to-month contract. This reinforces our finding from the BI tool lab.
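One detail to keep in mind when reading these percentages: by default, `.value_counts()` ignores missing values, so the shares are relative to the non-missing rows only. The sketch below uses a made-up `Contract` column with one missing entry to show the difference; the counts are invented for illustration and are not the Telco figures.

```python
import pandas as pd

# Toy contract column: 20 known values plus one missing entry
contracts = pd.Series(
    ['Month-to-month'] * 11 + ['One year'] * 4 + ['Two year'] * 5 + [None],
    name='Contract',
)

# Missing values are dropped by default, so shares are out of 20 rows
shares = (contracts.value_counts(normalize=True) * 100).round(1)
print(shares['Month-to-month'])   # 55.0

# dropna=False keeps missing values as their own category
with_missing = contracts.value_counts(dropna=False)
print(with_missing.sum())         # 21
```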

### 5. Conclusion

In this lab, you expanded your data inspection toolkit with four powerful tools (three methods and one attribute):
1.  `.tail()`: To check the end of the data.
2.  `.columns`: To get a list of column names.
3.  `.describe()`: To get a quick statistical summary of numerical data.
4.  `.value_counts()`: To understand the distribution of categorical data.

Using these in combination with `.head()`, `.shape`, and `.info()` provides a comprehensive first look at any dataset.

**Next Session:** We will move from inspecting data to selecting it. You'll learn how to pull out specific columns and rows to focus your analysis.