# Pandas Indexing and Selection
**`04-indexing-selection.ipynb`**

This notebook shows how to **access, select, and filter data** in a pandas DataFrame or Series.  
Almost every **analysis or cleaning step** starts by selecting the right rows and columns, so it pays to know which indexer does what.

---

## Step 1: Import Libraries

In [1]:
import pandas as pd
import numpy as np


---

## Step 2: Creating a Sample DataFrame

In [2]:
data = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 22, 28],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"],
    "Salary": [50000, 60000, 55000, 65000]
}

df = pd.DataFrame(data)
print(df)

      Name  Age         City  Salary
0    Alice   25     New York   50000
1      Bob   30  Los Angeles   60000
2  Charlie   22      Chicago   55000
3    David   28      Houston   65000


---

## Step 3: Accessing Columns

In [3]:
# Single column
print(df['Name'])

# Multiple columns
print(df[['Name', 'Salary']])

0      Alice
1        Bob
2    Charlie
3      David
Name: Name, dtype: object
      Name  Salary
0    Alice   50000
1      Bob   60000
2  Charlie   55000
3    David   65000
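
The two calls above return different types, which matters when chaining further operations. The sketch below rebuilds a reduced version of the sample frame (only two columns, an assumption made to keep it short) and compares single brackets with a one-element list.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Salary": [50000, 60000, 55000, 65000],
})

as_series = df["Salary"]    # single brackets with one name -> Series
as_frame = df[["Salary"]]   # a list of names (even just one) -> DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (4, 1)
```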



---

## Step 4: Accessing Rows

### By Integer Location (`iloc`)

In [4]:
# First row
print(df.iloc[0])

# First two rows
print(df.iloc[0:2])

# Specific row and column
print(df.iloc[1, 2])  # Row 1, column 2 (City)

Name         Alice
Age             25
City      New York
Salary       50000
Name: 0, dtype: object
    Name  Age         City  Salary
0  Alice   25     New York   50000
1    Bob   30  Los Angeles   60000
Los Angeles
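
Because `iloc` follows Python's positional rules, negative positions and step values also work. A minimal sketch, using a two-column copy of the sample data (the variable names are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 22, 28],
})

last_row = df.iloc[-1]      # last row, returned as a Series
last_two = df.iloc[-2:]     # last two rows, returned as a DataFrame
every_other = df.iloc[::2]  # rows at positions 0 and 2

print(last_row["Name"])            # David
print(list(last_two["Name"]))      # ['Charlie', 'David']
print(list(every_other["Name"]))   # ['Alice', 'Charlie']
```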


### By Label (`loc`)

In [5]:
# Set 'Name' as index
df_indexed = df.set_index('Name')
print(df_indexed)

# Access row by index label
print(df_indexed.loc['Bob'])

# Access multiple rows
print(df_indexed.loc[['Alice', 'David']])

         Age         City  Salary
Name                             
Alice     25     New York   50000
Bob       30  Los Angeles   60000
Charlie   22      Chicago   55000
David     28      Houston   65000
Age                30
City      Los Angeles
Salary          60000
Name: Bob, dtype: object
       Age      City  Salary
Name                        
Alice   25  New York   50000
David   28   Houston   65000
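
`loc` raises a `KeyError` when a requested label is not in the index. The sketch below shows one possible way to guard against that; the name `"Eve"` and the `wanted` list are made up for illustration.

```python
import pandas as pd

df_indexed = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 22, 28],
}).set_index("Name")

# A label that is not in the index raises KeyError
try:
    df_indexed.loc["Eve"]
except KeyError as exc:
    print("Missing label:", exc)

# Keep only the labels that actually exist before selecting
wanted = ["Alice", "Eve"]
present = [name for name in wanted if name in df_indexed.index]
found = df_indexed.loc[present]
print(found)
```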


---

## Step 5: Accessing Rows and Columns Together

In [6]:
# Single value
print(df_indexed.loc['Charlie', 'Salary'])

# Multiple columns
print(df_indexed.loc['Charlie', ['Age', 'Salary']])

# Multiple rows and columns
print(df_indexed.loc[['Alice', 'David'], ['City', 'Salary']])

55000
Age          22
Salary    55000
Name: Charlie, dtype: object
           City  Salary
Name                   
Alice  New York   50000
David   Houston   65000



---



## Step 6: Boolean Indexing / Conditional Selection


In [7]:
# Filter rows where Age > 25
print(df[df['Age'] > 25])

# Multiple conditions
print(df[(df['Age'] > 25) & (df['City'] == 'Houston')])

# Same filter with the query method ("and" reads more clearly than "&" inside a query string)
print(df.query("Age > 25 and City == 'Houston'"))


    Name  Age         City  Salary
1    Bob   30  Los Angeles   60000
3  David   28      Houston   65000
    Name  Age     City  Salary
3  David   28  Houston   65000
    Name  Age     City  Salary
3  David   28  Houston   65000
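
Conditions can also be combined with `|` (or), negated with `~`, or tested against a list of values with `isin`. The following sketch reuses the sample data; `min_age` is a hypothetical local variable that shows how `query` can reference Python variables with `@`.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 22, 28],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"],
    "Salary": [50000, 60000, 55000, 65000],
})

# OR: each condition needs its own parentheses
either = df[(df["Age"] < 25) | (df["Salary"] > 60000)]

# NOT: invert a boolean mask with ~
not_houston = df[~(df["City"] == "Houston")]

# Membership test against several values
coastal = df[df["City"].isin(["New York", "Los Angeles"])]

# query can read local variables through the @ prefix
min_age = 25
older = df.query("Age > @min_age")

print(list(either["Name"]))       # ['Charlie', 'David']
print(list(not_houston["Name"]))  # ['Alice', 'Bob', 'Charlie']
print(list(coastal["Name"]))      # ['Alice', 'Bob']
print(list(older["Name"]))        # ['Bob', 'David']
```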



---



## Step 7: Selecting by Position with `iloc`


In [8]:
# First two rows and first three columns
print(df.iloc[0:2, 0:3])

# Specific rows and columns
print(df.iloc[[0,2],[1,3]])  # Rows 0 & 2, Columns 1 & 3

    Name  Age         City
0  Alice   25     New York
1    Bob   30  Los Angeles
   Age  Salary
0   25   50000
2   22   55000



---



## Step 8: Selecting by Label with `loc`

In [9]:
# Select rows labelled 'Alice' and 'Bob' and columns 'Age' and 'Salary'
df2 = df.set_index('Name')
print(df2.loc[['Alice','Bob'], ['Age', 'Salary']])


       Age  Salary
Name              
Alice   25   50000
Bob     30   60000



---


## Step 9: Setting and Resetting Index


In [10]:
# Current DataFrame
print(df)

# Set 'City' as index
df_city = df.set_index('City')
print(df_city)

# Reset index
df_reset = df_city.reset_index()
print(df_reset)

      Name  Age         City  Salary
0    Alice   25     New York   50000
1      Bob   30  Los Angeles   60000
2  Charlie   22      Chicago   55000
3    David   28      Houston   65000
                Name  Age  Salary
City                             
New York       Alice   25   50000
Los Angeles      Bob   30   60000
Chicago      Charlie   22   55000
Houston        David   28   65000
          City     Name  Age  Salary
0     New York    Alice   25   50000
1  Los Angeles      Bob   30   60000
2      Chicago  Charlie   22   55000
3      Houston    David   28   65000
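
Two details are easy to miss here: `set_index` returns a new DataFrame and leaves the original untouched, and `reset_index(drop=True)` discards the old index instead of turning it into a column. A short sketch, assuming a filtered frame whose index has gaps:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 22, 28],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"],
})

df_city = df.set_index("City")
print("City" in df.columns)        # True: the original still has the column
print("City" in df_city.columns)   # False: it moved into the index

# Filtering keeps the original row labels, so the index has gaps
filtered = df[df["Age"] > 25]
print(list(filtered.index))        # [1, 3]

# drop=True renumbers from 0 without adding an 'index' column
renumbered = filtered.reset_index(drop=True)
print(list(renumbered.index))      # [0, 1]
```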



---



## Step 10: Using `.at` and `.iat` for Fast Access

In [11]:
# .at (label-based)
print(df_indexed.at['David', 'Salary'])

# .iat (position-based)
print(df_indexed.iat[2, 2])  # Row position 2 (Charlie), column position 2 (Salary)


65000
55000
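
`.at` and `.iat` can also be used to assign a single value. The sketch below is one way to do it; the new figures are invented for illustration and are not part of the original data.

```python
import pandas as pd

df_indexed = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 22, 28],
    "Salary": [50000, 60000, 55000, 65000],
}).set_index("Name")

# Label-based assignment of one cell
df_indexed.at["David", "Salary"] = 70000

# Position-based assignment: row 0, column 0 (Alice's Age)
df_indexed.iat[0, 0] = 26

print(df_indexed.at["David", "Salary"])  # 70000
print(df_indexed.at["Alice", "Age"])     # 26
```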



---



## Step 11: Slicing DataFrames

In [12]:
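# Slice rows by position (explicit .iloc avoids ambiguity with label slicing)
print(df_indexed.iloc[0:2])

# Slice columns by label (both endpoints are included)
print(df_indexed.loc[:, 'Age':'Salary'])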


       Age         City  Salary
Name                           
Alice   25     New York   50000
Bob     30  Los Angeles   60000
         Age         City  Salary
Name                             
Alice     25     New York   50000
Bob       30  Los Angeles   60000
Charlie   22      Chicago   55000
David     28      Houston   65000
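
The main difference between the two slicing styles is the endpoint: a `loc` label slice includes the stop label, while an `iloc` position slice excludes the stop position. A small side-by-side sketch on the same sample data:

```python
import pandas as pd

df_indexed = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 22, 28],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"],
    "Salary": [50000, 60000, 55000, 65000],
}).set_index("Name")

rows_by_label = df_indexed.loc["Alice":"Charlie"]  # includes 'Charlie'
rows_by_pos = df_indexed.iloc[0:2]                 # excludes position 2

cols_by_label = df_indexed.loc[:, "Age":"City"]    # includes 'City'
cols_by_pos = df_indexed.iloc[:, 0:1]              # excludes position 1

print(list(rows_by_label.index))      # ['Alice', 'Bob', 'Charlie']
print(list(rows_by_pos.index))        # ['Alice', 'Bob']
print(list(cols_by_label.columns))    # ['Age', 'City']
print(list(cols_by_pos.columns))      # ['Age']
```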



---


## Step 12: Accessing Elements in a Series

In [13]:
# Create a Series
s = pd.Series([100, 200, 300, 400], index=['a', 'b', 'c', 'd'])
print(s['b'])      # Access by label
print(s.iloc[1])   # Access by position (plain s[1] is deprecated for non-integer indexes)
print(s['b':'d'])  # Slice by label


200
200
b    200
c    300
d    400
dtype: int64
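
Plain `[]` on a Series is treated as a label lookup, which becomes confusing when the labels are themselves integers. The sketch below uses a hypothetical Series with labels 10, 20, 30 to show why `.loc` and `.iloc` are the safer, explicit choice.

```python
import pandas as pd

s_int = pd.Series([100, 200, 300], index=[10, 20, 30])

by_label = s_int.loc[20]      # label 20
by_position = s_int.iloc[1]   # second element
print(by_label, by_position)  # 200 200

# With an integer index, s_int[1] looks for the LABEL 1, which does not exist
try:
    s_int[1]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)          # True
```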





---



## Step 13: Practical Example

In [14]:
# Employee DataFrame
employees = pd.DataFrame({
    "EmployeeID": [101, 102, 103, 104],
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Department": ["HR", "IT", "Finance", "IT"],
    "Salary": [50000, 60000, 55000, 65000]
})

# Set index as EmployeeID
employees = employees.set_index('EmployeeID')
print(employees)

# Employees in IT department
it_employees = employees[employees['Department'] == 'IT']
print(it_employees)

# Salary of employee 103
print(employees.at[103, 'Salary'])

               Name Department  Salary
EmployeeID                            
101           Alice         HR   50000
102             Bob         IT   60000
103         Charlie    Finance   55000
104           David         IT   65000
             Name Department  Salary
EmployeeID                          
102           Bob         IT   60000
104         David         IT   65000
55000
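
The filter and the column selection can be combined in a single `loc` call by passing a boolean mask for the rows and a list for the columns. A possible extension of the example above; `idxmax` is used here only to illustrate looking up a row by a computed label.

```python
import pandas as pd

employees = pd.DataFrame({
    "EmployeeID": [101, 102, 103, 104],
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Department": ["HR", "IT", "Finance", "IT"],
    "Salary": [50000, 60000, 55000, 65000],
}).set_index("EmployeeID")

# Rows: boolean mask; columns: list of labels
it_pay = employees.loc[employees["Department"] == "IT", ["Name", "Salary"]]
print(it_pay)

# Label of the highest salary, then a label-based lookup
top_id = employees["Salary"].idxmax()
print(top_id, employees.loc[top_id, "Name"])  # 104 David
```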



---


## ✅ Summary

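* **Columns**: Access using `[]` or dot notation (dot notation only works when the column name is a valid Python identifier and does not clash with a DataFrame attribute or method).
* **Rows**: Access using `.loc` (label) or `.iloc` (position).
* **Boolean indexing**: Filter rows with a condition; wrap each condition in parentheses when combining them.
* **Slicing**: Works on rows or columns; `.loc` slices include both endpoints, while `.iloc` slices exclude the stop position.
* `.at` and `.iat` provide fast access to single values.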
* Proper indexing and selection are essential for **data analysis and cleaning**.
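
Dot notation is mentioned in the summary but not demonstrated earlier, so here is a brief sketch of where it works and where it does not. The column names `count` and `Home City` are hypothetical, chosen to trigger the two common pitfalls.

```python
import pandas as pd

df_dot = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "count": [1, 2],
    "Home City": ["New York", "Chicago"],
})

# Works: 'Name' is a valid identifier with no clash
same = df_dot.Name.equals(df_dot["Name"])
print(same)                       # True

# Pitfall 1: 'count' is also a DataFrame method, so the attribute is the method
is_method = callable(df_dot.count)
print(is_method)                  # True

# Brackets always reach the column
print(list(df_dot["count"]))      # [1, 2]

# Pitfall 2: a name with a space cannot be written as an attribute at all
print(list(df_dot["Home City"]))  # ['New York', 'Chicago']
```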

---
