# Data Selection and Indexing in Pandas

This notebook covers four ways to select data from Pandas DataFrames and Series: column selection, label-based selection with `.loc`, position-based selection with `.iloc`, and boolean indexing.

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np

print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")

Pandas version: 2.2.3
NumPy version: 2.2.4


## Sample Data

Let's create a sample DataFrame to demonstrate various selection and indexing techniques.

In [2]:
# Create sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [25, 30, 35, 28, 32],
    'City': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney'],
    'Salary': [50000, 60000, 70000, 55000, 65000],
    'Department': ['HR', 'IT', 'Finance', 'IT', 'HR']
}

df = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E'])
print("Sample DataFrame:")
print(df)

Sample DataFrame:
      Name  Age      City  Salary Department
A    Alice   25  New York   50000         HR
B      Bob   30    London   60000         IT
C  Charlie   35     Paris   70000    Finance
D    Diana   28     Tokyo   55000         IT
E      Eve   32    Sydney   65000         HR


## Column Selection

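You can select columns using bracket notation or dot notation. Bracket notation works for any column name; dot notation works only when the name is a valid Python identifier and does not collide with an existing DataFrame attribute or method.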

In [3]:
# Select single column
print("Single column selection:")
print(df['Name'])

# Select multiple columns
print("\nMultiple column selection:")
print(df[['Name', 'Age']])

# Select using dot notation (the name must be a valid Python identifier
# and must not clash with a DataFrame attribute or method)
print("\nUsing dot notation:")
print(df.Name)

Single column selection:
A      Alice
B        Bob
C    Charlie
D      Diana
E        Eve
Name: Name, dtype: object

Multiple column selection:
      Name  Age
A    Alice   25
B      Bob   30
C  Charlie   35
D    Diana   28
E      Eve   32

Using dot notation:
A      Alice
B        Bob
C    Charlie
D      Diana
E        Eve
Name: Name, dtype: object
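
The difference between the two notations is easiest to see on awkward column names. The following is a minimal sketch, assuming a made-up frame `df_demo` whose column names (`unit price`, `count`) are chosen only for illustration. It also shows that single brackets return a Series while a list of names returns a DataFrame.

```python
import pandas as pd

# Hypothetical frame: one column name contains a space,
# the other shares its name with the DataFrame.count method
df_demo = pd.DataFrame({"unit price": [1.5, 2.0], "count": [3, 4]})

# Single brackets return a Series; a list of names returns a DataFrame
as_series = df_demo["count"]
as_frame = df_demo[["count"]]
print(type(as_series).__name__)
print(type(as_frame).__name__)

# Bracket notation works for any column name
print(df_demo["unit price"].tolist())

# Dot notation resolves to the DataFrame.count method, not the 'count' column
print(callable(df_demo.count))
```

For this reason bracket notation is usually the safer default in reusable code, with dot notation kept for quick interactive exploration.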


## Label-based Selection (.loc)

The `.loc` indexer selects rows and columns by label. Unlike ordinary Python slices, a label slice such as `'A':'C'` includes both endpoints.

In [4]:
# Select row by label
print("Select row 'B':")
print(df.loc['B'])

# Select multiple rows
print("\nSelect rows 'A' and 'C':")
print(df.loc[['A', 'C']])

# Select row and column
print("\nSelect row 'B', column 'Name':")
print(df.loc['B', 'Name'])

# Select multiple rows and columns
print("\nSelect rows 'A' to 'C', columns 'Name' and 'Age':")
print(df.loc['A':'C', ['Name', 'Age']])

Select row 'B':
Name             Bob
Age               30
City          London
Salary         60000
Department        IT
Name: B, dtype: object

Select rows 'A' and 'C':
      Name  Age      City  Salary Department
A    Alice   25  New York   50000         HR
C  Charlie   35     Paris   70000    Finance

Select row 'B', column 'Name':
Bob

Select rows 'A' to 'C', columns 'Name' and 'Age':
      Name  Age
A    Alice   25
B      Bob   30
C  Charlie   35
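
The inclusive behaviour of label slices applies to columns as well as rows, and a label that is missing from the index raises `KeyError`. The following is a small sketch under those assumptions; `df_loc` is a cut-down copy of the sample data, rebuilt here so the cell runs on its own.

```python
import pandas as pd

# Cut-down copy of the sample data (illustrative only)
df_loc = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie", "Diana"],
        "Age": [25, 30, 35, 28],
        "City": ["New York", "London", "Paris", "Tokyo"],
    },
    index=["A", "B", "C", "D"],
)

# Label slices include both endpoints: 'A', 'B' and 'C' are all returned
rows = df_loc.loc["A":"C"]
print(rows.index.tolist())

# Column labels can be sliced the same way
cols = df_loc.loc[:, "Name":"Age"]
print(cols.columns.tolist())

# A label that is not in the index raises KeyError
try:
    df_loc.loc["Z"]
except KeyError as exc:
    print("KeyError:", exc)
```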


## Integer-based Selection (.iloc)

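The `.iloc` indexer selects rows and columns by integer position, counting from 0. Position slices follow Python conventions and exclude the stop value, so `0:3` covers positions 0, 1, and 2.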

In [5]:
# Select row by integer position
print("Select row at position 1:")
print(df.iloc[1])

# Select multiple rows by position
print("\nSelect rows at positions 0 and 2:")
print(df.iloc[[0, 2]])

# Select row and column by position
print("\nSelect row 1, column 0:")
print(df.iloc[1, 0])

# Select range of rows and columns
print("\nSelect rows 0 to 2, columns 0 to 2:")
print(df.iloc[0:3, 0:3])

Select row at position 1:
Name             Bob
Age               30
City          London
Salary         60000
Department        IT
Name: B, dtype: object

Select rows at positions 0 and 2:
      Name  Age      City  Salary Department
A    Alice   25  New York   50000         HR
C  Charlie   35     Paris   70000    Finance

Select row 1, column 0:
Bob

Select rows 0 to 2, columns 0 to 2:
      Name  Age      City
A    Alice   25  New York
B      Bob   30    London
C  Charlie   35     Paris
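
The distinction between `.loc` and `.iloc` matters most when the index itself holds integers, because a label and a position can then look alike. Here is a minimal sketch, assuming a made-up Series `s` whose labels (10, 20, 30) deliberately differ from its positions.

```python
import pandas as pd

# Hypothetical Series whose integer labels differ from its positions
s = pd.Series(["x", "y", "z"], index=[10, 20, 30])

# .loc looks up the label 10; .iloc looks up position 0
print(s.loc[10])
print(s.iloc[0])

# Negative positions count from the end, as with Python lists
print(s.iloc[-1])

# Position slices exclude the stop value: positions 0 and 1 only
print(s.iloc[0:2].tolist())

# Label slices include the stop label: labels 10 and 20
print(s.loc[10:20].tolist())
```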


## Boolean Indexing

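Boolean indexing filters rows with a boolean Series that is `True` for the rows to keep. Combine conditions with `&` (and), `|` (or), and `~` (not), and wrap each condition in parentheses because these operators bind more tightly than comparisons such as `>` and `==`.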

In [6]:
# Boolean indexing
print("Employees older than 30:")
print(df[df['Age'] > 30])

# Multiple conditions
print("\nEmployees in IT department with salary > 55000:")
print(df[(df['Department'] == 'IT') & (df['Salary'] > 55000)])

# Using isin() for multiple values
print("\nEmployees in HR or Finance:")
print(df[df['Department'].isin(['HR', 'Finance'])])

# Using ~ for negation
print("\nEmployees NOT in IT:")
print(df[~(df['Department'] == 'IT')])

Employees older than 30:
      Name  Age    City  Salary Department
C  Charlie   35   Paris   70000    Finance
E      Eve   32  Sydney   65000         HR

Employees in IT department with salary > 55000:
  Name  Age    City  Salary Department
B  Bob   30  London   60000         IT

Employees in HR or Finance:
      Name  Age      City  Salary Department
A    Alice   25  New York   50000         HR
C  Charlie   35     Paris   70000    Finance
E      Eve   32    Sydney   65000         HR

Employees NOT in IT:
      Name  Age      City  Salary Department
A    Alice   25  New York   50000         HR
C  Charlie   35     Paris   70000    Finance
E      Eve   32    Sydney   65000         HR
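
A boolean mask can also be stored in a variable and passed to `.loc`, which filters rows and picks columns in one step. The following is a sketch under that assumption; `df_bool` and `mask` are illustrative names, and the frame is rebuilt from the sample data so the cell runs on its own. The last part shows why Python's `and` cannot stand in for `&`.

```python
import pandas as pd

# Copy of the sample data without the City column (illustrative only)
df_bool = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
        "Age": [25, 30, 35, 28, 32],
        "Salary": [50000, 60000, 70000, 55000, 65000],
        "Department": ["HR", "IT", "Finance", "IT", "HR"],
    },
    index=["A", "B", "C", "D", "E"],
)

# A condition produces a boolean Series aligned with the index;
# | keeps rows where either condition holds
mask = (df_bool["Department"] == "HR") | (df_bool["Age"] > 30)
print(mask.tolist())

# .loc accepts the mask for rows and a list of labels for columns
subset = df_bool.loc[mask, ["Name", "Salary"]]
print(subset)

# Python's `and` asks for the truth value of a whole Series, which is ambiguous
try:
    (df_bool["Age"] > 30) and (df_bool["Salary"] > 55000)
except ValueError as exc:
    print("ValueError:", exc)
```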


## Summary

You have learned various data selection and indexing techniques in Pandas:

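- **Column Selection**: Using brackets `[]` for any column name, or dot notation for names that are valid Python identifiers
- **Label-based Selection (.loc)**: Select by row/column labels; label slices include both endpoints
- **Integer-based Selection (.iloc)**: Select by integer positions; position slices exclude the stop value
- **Boolean Indexing**: Filter rows based on conditions combined with `&`, `|`, and `~`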

Together, these methods cover the most common ways to access data in DataFrames and Series.