
# 🎯 Notebook 04: Indexing and Selection

Time to slice and dice.

In this notebook, you'll:
- Use `.loc[]` for label-based selection
- Use `.iloc[]` for position-based selection
- Create boolean masks to filter rows
- Select specific columns and rows (solo or together)

Let the chaos of index manipulation begin.

---

In [1]:
import pandas as pd
import seaborn as sns

# We'll use the Titanic dataset again
df = sns.load_dataset("titanic")
df.head(3)

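| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False | NaN | Southampton | yes | True |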


## 🔎 Select rows by label with `.loc[]`

In [2]:
# .loc[row_label, column_label]
# Titanic uses default RangeIndex, so labels are integers
print(df.loc[0])

# Row range (inclusive!)
df.loc[0:4, ["age", "fare", "sex"]]

survived                 0
pclass                   3
sex                   male
age                   22.0
sibsp                    1
parch                    0
fare                  7.25
embarked                 S
class                Third
who                    man
adult_male            True
deck                   NaN
embark_town    Southampton
alive                   no
alone                False
Name: 0, dtype: object


| | age | fare | sex |
|---|---|---|---|
| 0 | 22.0 | 7.25 | male |
| 1 | 38.0 | 71.2833 | female |
| 2 | 26.0 | 7.925 | female |
| 3 | 35.0 | 53.1 | female |
| 4 | 35.0 | 8.05 | male |
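
Because the Titanic frame has a default RangeIndex, labels and positions look identical, which hides what `.loc[]` is really doing. Here is a minimal sketch on a made-up frame (the `toy` DataFrame and its string labels are invented for illustration, not taken from the dataset) showing that `.loc[]` matches labels and that a label slice includes both endpoints:

```python
import pandas as pd

# Hypothetical toy data with string labels, not from the Titanic dataset
toy = pd.DataFrame(
    {"age": [22.0, 38.0, 26.0, 35.0], "fare": [7.25, 71.28, 7.93, 53.10]},
    index=["a", "b", "c", "d"],
)

# Label slice: both "b" and "d" are included in the result
by_label = toy.loc["b":"d", ["age"]]
print(by_label)

# A single label returns that row as a Series
row_c = toy.loc["c"]
print(row_c["fare"])
```

With string labels there is no way to confuse a label with a position, which makes the inclusive slice easier to see.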


## 🔢 Select rows by position with `.iloc[]`

In [3]:
# iloc uses pure positions
df.iloc[0]  # First row (not displayed: a cell only shows its last expression)

# Row slice, column slice
df.iloc[1:4, 0:5]

| | survived | pclass | sex | age | sibsp |
|---|---|---|---|---|---|
| 1 | 1 | 1 | female | 38.0 | 1 |
| 2 | 1 | 3 | female | 26.0 | 0 |
| 3 | 1 | 1 | female | 35.0 | 1 |
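
Note that `df.iloc[1:4]` stops before position 4, while `df.loc[0:4]` includes label 4. The sketch below uses an invented frame whose labels are deliberately not 0, 1, 2, ... (the `toy` data is an assumption for illustration) to put the two side by side:

```python
import pandas as pd

# Hypothetical toy data with non-default integer labels
toy = pd.DataFrame(
    {"age": [22.0, 38.0, 26.0, 35.0, 35.0]},
    index=[10, 20, 30, 40, 50],
)

by_position = toy.iloc[1:3]  # positions 1 and 2; the end position is excluded
by_label = toy.loc[20:40]    # labels 20, 30 and 40; the end label is included

print(list(by_position.index))
print(list(by_label.index))

# Position 0 exists, but no row carries the label 0
first_row = toy.iloc[0]
try:
    toy.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False
print(label_zero_found)
```

This mismatch shows up in practice after filtering: a filtered frame keeps its original labels, so `.loc[0]` can fail even though `.iloc[0]` still works.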


## 🎯 Boolean Masking (Filtering)

In [4]:
# Filter passengers under age 10
kids = df[df["age"] < 10]
kids[["age", "sex", "class"]].head()

| | age | sex | class |
|---|---|---|---|
| 7 | 2.0 | male | Third |
| 10 | 4.0 | female | Third |
| 16 | 2.0 | male | Third |
| 24 | 8.0 | female | Third |
| 43 | 3.0 | female | Second |
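
It can help to look at the mask on its own before using it. The following sketch, on a small invented frame (the `toy` values are assumptions, not Titanic rows), shows that a comparison produces one `True`/`False` per row, and that a missing age compares as `False`, so it is silently dropped by the filter:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; one age is deliberately missing
toy = pd.DataFrame(
    {
        "age": [2.0, 30.0, np.nan, 8.0],
        "sex": ["male", "female", "male", "female"],
    }
)

# The comparison yields a boolean Series aligned with the rows
mask = toy["age"] < 10
print(mask.tolist())

# Indexing with the mask keeps only the rows where it is True
kids = toy[mask]
print(kids)
```

The filtered frame keeps the original row labels, which is why the output above shows labels like 7, 10, and 16 rather than 0, 1, 2.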


## 🧠 Compound Conditions

In [5]:
# First-class passengers older than 18 with a fare above 100
rich_folks = df[(df["age"] > 18) & (df["class"] == "First") & (df["fare"] > 100)]
rich_folks[["age", "class", "fare"]].head()

| | age | class | fare |
|---|---|---|---|
| 27 | 19.0 | First | 263.0 |
| 88 | 23.0 | First | 263.0 |
| 118 | 24.0 | First | 247.5208 |
| 195 | 58.0 | First | 146.5208 |
| 215 | 31.0 | First | 113.275 |
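
Each condition above is wrapped in parentheses because `&` binds more tightly than `>` or `==`, and Python's plain `and`/`or` do not work element-wise on a Series. Here is a small sketch on invented data (the `toy` frame and the mask names are assumptions for illustration) of the three element-wise operators, `&`, `|`, and `~`:

```python
import pandas as pd

# Hypothetical toy data, not from the Titanic dataset
toy = pd.DataFrame(
    {
        "age": [19.0, 45.0, 8.0, 60.0],
        "class": ["First", "Third", "First", "Second"],
        "fare": [263.0, 8.05, 120.0, 13.0],
    }
)

# Naming each mask keeps long filters readable
is_first = toy["class"] == "First"
is_pricey = toy["fare"] > 100

both = toy[is_first & is_pricey]            # AND: both conditions hold
either = toy[is_first | (toy["age"] > 50)]  # OR: at least one holds
neither = toy[~(is_first | is_pricey)]      # NOT: invert the combined mask

print(list(both.index))
print(list(either.index))
print(list(neither.index))
```

Storing masks in variables is a style choice; it makes it easier to reuse and combine conditions without a wall of brackets.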


## 🧹 Select Columns Only

In [6]:
df[["sex", "age", "embarked"]].head()

| | sex | age | embarked |
|---|---|---|---|
| 0 | male | 22.0 | S |
| 1 | female | 38.0 | C |
| 2 | female | 26.0 | S |
| 3 | female | 35.0 | S |
| 4 | male | 35.0 | S |
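
The double brackets matter: a list of column names returns a DataFrame, while a single name returns a Series. The sketch below, again on an invented `toy` frame (an assumption for illustration), shows the difference and then selects rows and columns together in one `.loc[]` call:

```python
import pandas as pd

# Hypothetical toy data, not from the Titanic dataset
toy = pd.DataFrame(
    {
        "sex": ["male", "female"],
        "age": [22.0, 38.0],
        "embarked": ["S", "C"],
    }
)

one_col_series = toy["age"]   # single name: a Series
one_col_frame = toy[["age"]]  # list of names: a one-column DataFrame
print(type(one_col_series).__name__)
print(type(one_col_frame).__name__)

# Rows (boolean mask) and columns (list of names) in a single step
rows_and_cols = toy.loc[toy["age"] > 30, ["sex", "embarked"]]
print(rows_and_cols)
```

Passing the mask and the column list to `.loc[]` together avoids chaining two separate selections.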


## 🎯 Your Turn

1. Select the first row using `.iloc[]`, then again using `.loc[]`.
2. Slice the first 5 rows and select only `survived`, `sex`, and `age` columns.
3. Filter the dataset for female passengers over 40.
4. Create a DataFrame of passengers in 2nd class under age 15.
5. Use `.loc[]` to select rows 10–20 (both ends included) and only the columns `fare`, `class`, and `age`.

🎯 **Bonus:** Find passengers with missing `age` and print only their `who`, `age`, and `pclass` (use `.isnull()`). The seaborn version of the dataset has no `name` column, so `who` stands in for it.

In [7]:
# Your slicing magic goes here!
import seaborn as sns
import pandas as pd

titanic = sns.load_dataset("titanic")


In [10]:
print(titanic.iloc[0])


survived                 0
pclass                   3
sex                   male
age                   22.0
sibsp                    1
parch                    0
fare                  7.25
embarked                 S
class                Third
who                    man
adult_male            True
deck                   NaN
embark_town    Southampton
alive                   no
alone                False
Name: 0, dtype: object


In [11]:
print(titanic.loc[0])

survived                 0
pclass                   3
sex                   male
age                   22.0
sibsp                    1
parch                    0
fare                  7.25
embarked                 S
class                Third
who                    man
adult_male            True
deck                   NaN
embark_town    Southampton
alive                   no
alone                False
Name: 0, dtype: object


In [12]:
subset = titanic.loc[0:4, ["survived", "sex", "age"]]
print(subset)


   survived     sex   age
0         0    male  22.0
1         1  female  38.0
2         1  female  26.0
3         1  female  35.0
4         0    male  35.0


In [13]:
females_over_40 = titanic[(titanic["sex"] == "female") & (titanic["age"] > 40)]
print(females_over_40)


     survived  pclass     sex   age  sibsp  parch      fare embarked   class  \
11          1       1  female  58.0      0      0   26.5500        S   First   
15          1       2  female  55.0      0      0   16.0000        S  Second   
52          1       1  female  49.0      1      0   76.7292        C   First   
132         0       3  female  47.0      1      0   14.5000        S   Third   
167         0       3  female  45.0      1      4   27.9000        S   Third   
177         0       1  female  50.0      0      0   28.7125        C   First   
194         1       1  female  44.0      0      0   27.7208        C   First   
195         1       1  female  58.0      0      0  146.5208        C   First   
254         0       3  female  41.0      0      2   20.2125        S   Third   
259         1       2  female  50.0      0      1   26.0000        S  Second   
268         1       1  female  58.0      0      1  153.4625        S   First   
272         1       2  female  41.0     

In [14]:
young_2nd_class = titanic[(titanic["class"] == "Second") & (titanic["age"] < 15)]
print(young_2nd_class)


     survived  pclass     sex    age  sibsp  parch     fare embarked   class  \
9           1       2  female  14.00      1      0  30.0708        C  Second   
43          1       2  female   3.00      1      2  41.5792        C  Second   
58          1       2  female   5.00      1      2  27.7500        S  Second   
78          1       2    male   0.83      0      2  29.0000        S  Second   
183         1       2    male   1.00      2      1  39.0000        S  Second   
193         1       2    male   3.00      1      1  26.0000        S  Second   
237         1       2  female   8.00      0      2  26.2500        S  Second   
340         1       2    male   2.00      1      1  26.0000        S  Second   
407         1       2    male   3.00      1      1  18.7500        S  Second   
446         1       2  female  13.00      0      1  19.5000        S  Second   
530         1       2  female   2.00      1      1  26.0000        S  Second   
535         1       2  female   7.00    

In [15]:
subset_rows = titanic.loc[10:20, ["fare", "class", "age"]]
print(subset_rows)


       fare   class   age
10  16.7000   Third   4.0
11  26.5500   First  58.0
12   8.0500   Third  20.0
13  31.2750   Third  39.0
14   7.8542   Third  14.0
15  16.0000  Second  55.0
16  29.1250   Third   2.0
17  13.0000  Second   NaN
18  18.0000   Third  31.0
19   7.2250   Third   NaN
20  26.0000  Second  35.0


In [17]:
# Rows with a missing age, restricted to three columns, in a single .loc[] call
missing_age = titanic.loc[titanic["age"].isnull(), ["who", "age", "pclass"]]
print(missing_age)

       who  age  pclass
5      man  NaN       3
17     man  NaN       2
19   woman  NaN       3
26     man  NaN       3
28   woman  NaN       3
..     ...  ...     ...
859    man  NaN       3
863  woman  NaN       3
868    man  NaN       3
878    man  NaN       3
888  woman  NaN       3

[177 rows x 3 columns]
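
Since an `.isnull()` mask is just a Series of booleans, it can be counted and inverted as well as used for filtering. Here is a minimal sketch on invented data (the `toy` frame is an assumption, not part of the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two missing ages
toy = pd.DataFrame(
    {
        "who": ["man", "woman", "man"],
        "age": [np.nan, 30.0, np.nan],
        "pclass": [3, 1, 2],
    }
)

missing = toy["age"].isnull()

# True counts as 1, so summing the mask counts the missing values
n_missing = int(missing.sum())
print(n_missing)

# ~ flips the mask to keep only the rows with a known age
known = toy.loc[~missing, ["who", "age"]]
print(known)
```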


---
## 🧼 Why This Matters

Being able to surgically select parts of your dataset is the core of real analysis. Whether you're cleaning junk or feeding a model, proper indexing = power.

Next: cleaning your data with `.dropna()`, `.fillna()`, and more. The fun begins when you try to fix the stuff that's broken.