## Goal
Master slicing, row selection, and column selection in pandas: core skills for cleaning and preparing data for machine learning.

## Step 0: Import Pandas and Create a Sample DataFrame

In [1]:
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 45],
    'Score': [85.5, 90.3, 78.4, 92.1, 88.0],
    'Passed': [True, True, False, True, True]
}

df = pd.DataFrame(data)
print(df)

      Name  Age  Score  Passed
0    Alice   25   85.5    True
1      Bob   30   90.3    True
2  Charlie   35   78.4   False
3    David   40   92.1    True
4      Eva   45   88.0    True


## Step 1: Selecting Columns

**Select one column (returns a Series)**

In [2]:
df['Name']

0      Alice
1        Bob
2    Charlie
3      David
4        Eva
Name: Name, dtype: object

**Select multiple columns (returns a DataFrame)**

In [3]:
df[['Name', 'Score']]

      Name  Score
0    Alice   85.5
1      Bob   90.3
2  Charlie   78.4
3    David   92.1
4      Eva   88.0
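
The difference between single and double brackets is easy to miss, so here is a minimal sketch that checks the returned types directly. The three-row DataFrame and the variable names `as_series` and `as_frame` are assumptions made for illustration.

```python
import pandas as pd

# Small illustrative DataFrame (a subset of the tutorial data)
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [85.5, 90.3, 78.4],
})

as_series = df['Name']     # single brackets with a column name -> Series
as_frame = df[['Name']]    # a list with one column name -> DataFrame

print(type(as_series).__name__)         # Series
print(type(as_frame).__name__)          # DataFrame
print(as_series.shape, as_frame.shape)  # (3,) (3, 1)
```

Many ML libraries expect a 2-D feature matrix, so the double-bracket form is usually the safer choice for `X`, even with a single feature.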


**Practical Tip:**

Use column selection before feeding data to a model, for example:

In [6]:
X = df[['Age', 'Score']]   # Features
print(X)

   Age  Score
0   25   85.5
1   30   90.3
2   35   78.4
3   40   92.1
4   45   88.0


In [7]:
Y = df['Passed']           # Target
print(Y)

0     True
1     True
2    False
3     True
4     True
Name: Passed, dtype: bool


## Step 2: Slicing Rows by Index

**Syntax: df[start:end]**

In [8]:
df[1:4]   # rows at positions 1, 2, 3

      Name  Age  Score  Passed
1      Bob   30   90.3    True
2  Charlie   35   78.4   False
3    David   40   92.1    True


**NOTE:**
  
* start is inclusive
* end is exclusive
* The slice counts row positions, not index labels. With the default integer index the two happen to coincide.
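
To see that an integer slice inside `df[...]` counts positions rather than labels, here is a minimal sketch using a DataFrame whose index does not start at 0. The name `df_custom` and the labels 10 through 50 are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical DataFrame with a non-default integer index
df_custom = pd.DataFrame(
    {'Age': [25, 30, 35, 40, 45]},
    index=[10, 20, 30, 40, 50],
)

# Integer slice: takes the rows at positions 1, 2, 3
by_position = df_custom[1:4]

print(by_position.index.tolist())  # [20, 30, 40]
```

None of the labels 1, 2, or 3 exist in this index, yet the slice still returns three rows, which shows it is working by position.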

## Step 3: Row Selection with .loc[] and .iloc[]

**df.loc[row_label, column_label]**

Used for label-based indexing.

In [10]:
df.loc[2]         # Row with label 2 (i.e., Charlie's data)

Name      Charlie
Age            35
Score        78.4
Passed      False
Name: 2, dtype: object

In [11]:
df.loc[2, 'Score']  # Only Charlie's score

78.4

**df.iloc[row_index, column_index]**

Used for position-based indexing.

In [12]:
df.iloc[2]         # Row at position 2 (same row as above, because the default labels match positions)

Name      Charlie
Age            35
Score        78.4
Passed      False
Name: 2, dtype: object

In [13]:
df.iloc[2, 2]      # Row 2, Column 2 (Score)

78.4
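
`.loc[]` and `.iloc[]` return the same rows here only because the default index labels equal the row positions. Once rows are filtered out, the two diverge. The sketch below illustrates this; the variable name `passed` is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Passed': [True, True, False, True, True],
})

# Filtering keeps the original labels: 0, 1, 3, 4 (label 2 is gone)
passed = df[df['Passed']]

print(passed.loc[3, 'Name'])   # label 3      -> David
print(passed.iloc[3, 0])       # position 3   -> Eva
print(2 in passed.index)       # False, so passed.loc[2] would raise KeyError
```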

## Step 4: Slicing Rows and Columns Together

**Use .loc[] for labels:**

In [15]:
df.loc[1:3, ['Name', 'Score']]  # labels 1 through 3 (end label included), only Name and Score columns

      Name  Score
1      Bob   90.3
2  Charlie   78.4
3    David   92.1


**Use .iloc[] for positions:**

In [16]:
df.iloc[1:4, 0:2]   # positions 1 to 3 (end position 4 excluded), columns 0 and 1

      Name  Age
1      Bob   30
2  Charlie   35
3    David   40
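
The two examples above return the same three rows even though one slice is written `1:3` and the other `1:4`. A minimal sketch of the reason, using the same slice bounds with both accessors (the variable names are assumptions made for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 45],
})

label_slice = df.loc[1:3]      # label-based: the end label 3 is included
position_slice = df.iloc[1:3]  # position-based: the end position 3 is excluded

print(label_slice['Name'].tolist())     # ['Bob', 'Charlie', 'David']
print(position_slice['Name'].tolist())  # ['Bob', 'Charlie']
```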


## Step 5: Filtering Rows Based on Conditions (ML Preprocessing!)

**Get students who passed**

In [17]:
df[df['Passed']]   # 'Passed' is already boolean, so comparing with == True is unnecessary

    Name  Age  Score  Passed
0  Alice   25   85.5    True
1    Bob   30   90.3    True
3  David   40   92.1    True
4    Eva   45   88.0    True


**Get students with score > 85**

In [18]:
df[df['Score'] > 85]

    Name  Age  Score  Passed
0  Alice   25   85.5    True
1    Bob   30   90.3    True
3  David   40   92.1    True
4    Eva   45   88.0    True


**Practical Tip:**

Filter the rows first, then select features and target from the filtered result before training:

In [21]:
high_scores = df[df['Score'] > 85]
X = high_scores[['Age', 'Score']]
Y = high_scores['Passed']
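
As an alternative sketch, the row filter and the column selection can be combined in a single `.loc[]` call by passing a boolean mask as the row selector. The variable name `mask` is an assumption made for illustration; the result is intended to match the two-step version above.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 45],
    'Score': [85.5, 90.3, 78.4, 92.1, 88.0],
    'Passed': [True, True, False, True, True],
})

mask = df['Score'] > 85               # boolean Series, one value per row

X = df.loc[mask, ['Age', 'Score']]    # filtered rows, feature columns
Y = df.loc[mask, 'Passed']            # filtered rows, target column

print(X.shape)        # (4, 2)
print(Y.tolist())     # [True, True, True, True]
```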

## Step 6: Boolean Indexing with Multiple Conditions

In [22]:
df[(df['Score'] > 85) & (df['Age'] < 40)]

    Name  Age  Score  Passed
0  Alice   25   85.5    True
1    Bob   30   90.3    True


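Wrap each condition in parentheses (). The `&` operator binds more tightly than comparison operators such as `>` and `<`, so without parentheses Python does not evaluate the expression the way you intended.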
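
Besides `&` (and), pandas boolean masks can be combined with `|` (or) and negated with `~` (not). A minimal sketch on the same data follows; the thresholds and variable names are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 45],
    'Score': [85.5, 90.3, 78.4, 92.1, 88.0],
    'Passed': [True, True, False, True, True],
})

# OR: score above 90, or age below 30
either = df[(df['Score'] > 90) | (df['Age'] < 30)]

# NOT: students who did not pass
not_passed = df[~df['Passed']]

print(either['Name'].tolist())      # ['Alice', 'Bob', 'David']
print(not_passed['Name'].tolist())  # ['Charlie']
```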

## Step 7: Setting and Resetting the Index

In [23]:
df.set_index('Name', inplace=True)
df.loc['Alice']   # Now you can access rows by names!

Age         25
Score     85.5
Passed    True
Name: Alice, dtype: object

To restore the default integer index and turn Name back into a regular column:

In [24]:
df.reset_index(inplace=True)
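
The calls above modify `df` in place. A minimal sketch of the non-mutating alternative, where `set_index` and `reset_index` return new DataFrames and the original stays unchanged (the names `indexed` and `restored` are assumptions made for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

indexed = df.set_index('Name')       # new DataFrame indexed by Name
print(indexed.loc['Alice', 'Age'])   # 25
print('Name' in df.columns)          # True: the original df is untouched

restored = indexed.reset_index()     # Name becomes a regular column again
print(restored.columns.tolist())     # ['Name', 'Age']
```

Assigning the result to a new variable keeps the original DataFrame available, which can make notebook cells safer to re-run.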

## Real ML Use Case — Select Features & Target

Suppose you're preparing data for a classifier:

In [26]:
from sklearn.model_selection import train_test_split

# Select input features and target
X = df[['Age', 'Score']]    # feature matrix
Y = df['Passed']            # target

# Split into train/test sets; with only 5 rows, test_size=0.2 holds out a single row
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)