# Session 23: Data Selection with Pandas

**Unit 2: Data Tools and Platforms**
**Hour: 23**
**Mode: Practical Lab**

---

### 1. Objective

This lab focuses on how to select and isolate specific parts of your data. This is a fundamental skill required before any analysis or visualization. You'll learn how to select columns and how to select rows based on conditions.

We will continue using the Telco Churn dataset.

### 2. Setup

Import Pandas and load the dataset.

```python
import pandas as pd

url = 'https://raw.githubusercontent.com/IBM/telco-customer-churn-on-icp4d/master/data/Telco-Customer-Churn.csv'
df = pd.read_csv(url)
```

### 3. Selecting Columns

There are two main ways to select a single column.

#### 3.1. Bracket Notation (Recommended)

Put the column name, as a string, inside square brackets `[]`. This returns the column as a Pandas **Series**.

```python
df['gender']
```
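To confirm what bracket notation returns, here is a minimal sketch on a small, hypothetical DataFrame that only mimics a few Telco column names. The values are made up, so it runs without downloading the CSV.

```python
import pandas as pd

# Hypothetical toy rows that mimic a few Telco columns (not real data)
df = pd.DataFrame({
    'customerID': ['A-001', 'A-002', 'A-003'],
    'gender': ['Female', 'Male', 'Female'],
    'tenure': [1, 34, 72],
    'Churn': ['Yes', 'No', 'No'],
})

gender = df['gender']      # bracket notation with the column name as a string
print(type(gender))        # <class 'pandas.core.series.Series'>
print(gender.name)         # gender
print(len(gender))         # 3: one value per row of df
```

The Series keeps the column name in its `.name` attribute and has exactly one value per row of the DataFrame.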

#### 3.2. Dot Notation

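You can also write a dot `.` followed by the column name. This is convenient, but it only works when the column name is a valid Python identifier (so not for names containing spaces) and does not clash with an existing DataFrame attribute or method (a column named `count`, for example, would give you the `DataFrame.count` method instead of the column). Bracket notation works in every case.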

```python
df.gender
```
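The limitations of dot notation can be seen on a tiny, hypothetical DataFrame whose column names were chosen on purpose to break it: `'Monthly Charges'` (contains a space) and `'count'` (clashes with a DataFrame method). Neither column name is taken from the Telco dataset.

```python
import pandas as pd

# Hypothetical toy columns chosen to show where dot notation breaks down
df = pd.DataFrame({
    'gender': ['Female', 'Male'],
    'Monthly Charges': [29.85, 56.95],  # name contains a space
    'count': [3, 5],                    # name clashes with the DataFrame.count method
})

print(df.gender.equals(df['gender']))  # True: a simple name works either way

# df.Monthly Charges would be a SyntaxError; only brackets can reach this column
charges = df['Monthly Charges']

# df.count is the bound method DataFrame.count, not the 'count' column
print(callable(df.count))              # True
counts = df['count']                   # brackets always return the column
```

Because of cases like these, sticking to bracket notation avoids surprises.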

#### 3.3. Selecting Multiple Columns

To select multiple columns, you pass a **list of column names** inside the square brackets. This returns a new, smaller **DataFrame**.

```python
# Note the double square brackets: [[...]]
# The outer brackets are for selecting, the inner brackets create the list.
customer_info = df[['customerID', 'gender', 'tenure', 'Churn']]
customer_info.head()
```
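The difference between single and double brackets is easy to check with `type()`. This sketch uses hypothetical toy values under Telco-style column names; it shows that a list with a single name still returns a DataFrame, and that columns come back in the order you list them.

```python
import pandas as pd

# Hypothetical toy data with Telco-style column names
df = pd.DataFrame({
    'customerID': ['A-001', 'A-002'],
    'gender': ['Female', 'Male'],
    'tenure': [1, 34],
    'Churn': ['Yes', 'No'],
})

as_series = df['gender']               # string inside brackets -> Series
as_frame = df[['gender']]              # list with one name -> one-column DataFrame
subset = df[['tenure', 'customerID']]  # columns are returned in the order listed

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(list(subset.columns))      # ['tenure', 'customerID']
print(df.shape)                  # (2, 4): the original DataFrame is unchanged
```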

### 4. Selecting Rows based on a Condition (Boolean Indexing)

This is one of the most powerful features of Pandas. You can filter your DataFrame to see only the rows that meet a certain condition. This works in two steps.

#### Step 1: Create the Boolean Series

First, you write a logical condition. Pandas will apply this condition to every row in the specified column and return a Series of `True` or `False` values.

```python
# Condition: Find all customers who have churned
is_churned = df['Churn'] == 'Yes'
is_churned.head()
```

This Series shows `True` for every row where the customer churned and `False` otherwise.
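A few properties of this boolean Series are worth checking directly. The sketch below uses a hypothetical four-row DataFrame; it shows that the comparison runs element by element, that the result has `bool` dtype and the same index as the original rows, and that summing it counts the `True` values.

```python
import pandas as pd

# Hypothetical toy data (not the real dataset)
df = pd.DataFrame({
    'customerID': ['A-001', 'A-002', 'A-003', 'A-004'],
    'Churn': ['Yes', 'No', 'Yes', 'No'],
})

is_churned = df['Churn'] == 'Yes'         # compared element by element
print(is_churned.dtype)                   # bool
print(is_churned.tolist())                # [True, False, True, False]
print(is_churned.sum())                   # 2: each True counts as 1 when summed
print(is_churned.index.equals(df.index))  # True: one flag per row, same index labels
```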

#### Step 2: Pass the Boolean Series back to the DataFrame

Now, you use this `True`/`False` Series to select rows from the original DataFrame. Pandas will only return the rows where the Series has a `True` value.

```python
churned_customers = df[is_churned]
churned_customers.head()
```

Notice that the new `churned_customers` DataFrame only contains rows where `Churn` is 'Yes'.
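One detail that is easy to miss in `head()` output: the filtered rows keep their original index labels, so the index of the result can have gaps. This minimal sketch on hypothetical toy data shows that, and that filtering leaves the original DataFrame untouched.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    'customerID': ['A-001', 'A-002', 'A-003', 'A-004'],
    'Churn': ['Yes', 'No', 'Yes', 'No'],
})

is_churned = df['Churn'] == 'Yes'
churned_customers = df[is_churned]

print(churned_customers.index.tolist())          # [0, 2]: original row labels are kept
print(churned_customers['customerID'].tolist())  # ['A-001', 'A-003']
print(len(df))                                   # 4: filtering does not modify df
```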

```python
churned_customers.shape
```

The first number in the shape is the row count, and it matches the number of `'Yes'` values we counted with `.value_counts()` in the previous lab!
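The same check can be written as code. In this sketch on hypothetical toy data, `shape` is unpacked into rows and columns, and the row count is compared with the `'Yes'` entry of `value_counts()`.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    'customerID': ['A-001', 'A-002', 'A-003', 'A-004', 'A-005'],
    'Churn': ['Yes', 'No', 'No', 'Yes', 'No'],
})

churned_customers = df[df['Churn'] == 'Yes']
n_rows, n_cols = churned_customers.shape  # shape is (rows, columns)
counts = df['Churn'].value_counts()       # Series indexed by the distinct values

print(n_rows, n_cols)            # 2 2
print(counts['Yes'])             # 2
print(n_rows == counts['Yes'])   # True
```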

#### Combining the steps

In practice, the two steps are usually combined into a single line by writing the condition directly inside the brackets:

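```python
# One-line equivalent of Steps 1 and 2 above
churned_customers = df[df['Churn'] == 'Yes']

# The same pattern with a numeric condition: customers with more than 60 months of tenure
long_term_customers = df[df['tenure'] > 60]
long_term_customers.head()
```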
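To see that the one-line form really is the same as the two-step version, and how it pairs with the column selection from Section 3, here is a minimal sketch on hypothetical toy data. It filters rows first, then keeps only the columns of interest.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    'customerID': ['A-001', 'A-002', 'A-003'],
    'tenure': [1, 64, 72],
    'Churn': ['Yes', 'No', 'Yes'],
})

# Two-step version: build the boolean Series, then pass it to df
is_long_term = df['tenure'] > 60
two_step = df[is_long_term]

# One-line version: the condition is written directly inside the brackets
one_line = df[df['tenure'] > 60]
print(one_line.equals(two_step))  # True

# Filter rows first, then select a list of columns from the result
long_term_churn = df[df['tenure'] > 60][['customerID', 'Churn']]
print(long_term_churn)
```

Reading data this way is fine; assigning new values through two chained selections like this is a different matter and is best avoided.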

### 5. Conclusion

In this lab, you learned the essentials of data selection in Pandas:
1.  How to select single or multiple **columns**.
2.  How to select **rows** by creating a boolean condition.
3.  How to combine these steps into a single line of code for concise filtering.

This ability to isolate specific subsets of your data is fundamental to asking and answering targeted questions.

**Next Session:** We will take this one step further by learning how to filter based on *multiple* conditions at once.