# Selecting, Filtering and Indexing Data in Pandas

One of pandas' strengths is how concisely it lets you select, filter, and index the data in a DataFrame.

Key operations include:

- **Column selection**: Access specific columns by their names.
- **Row selection**: Access rows by position (`iloc`) or label (`loc`).
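- **Filtering**: Keep only the rows that satisfy one or more conditions, such as `Age > 30`.
- **Boolean indexing**: The mechanism behind filtering. A comparison produces a Series of `True`/`False` values (a mask), and passing that mask to the DataFrame returns the rows marked `True`.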
- **Slicing**: Select a range of rows or columns.


In [11]:
import pandas as pd

In [12]:
# Create a sample DataFrame
data = {
    "Name": ["Hayley", "Taylor", "Claire", "Aurora", "Evangeline"],
    "Age": [25, 30, 35, 40, 45],
    "City": ["Houston", "Los Angeles", "Chicago", "New York", "Phoenix"],
    "Salary": [100000, 70000, 70000, 80000, 90000]
}
df = pd.DataFrame(data)
print("Original DataFrame:\n", df)

Original DataFrame:
          Name  Age         City  Salary
0      Hayley   25      Houston  100000
1      Taylor   30  Los Angeles   70000
2      Claire   35      Chicago   70000
3      Aurora   40     New York   80000
4  Evangeline   45      Phoenix   90000


In [13]:
# Selecting a single column
print("\nSelect 'Name' column:\n", df["Name"])


Select 'Name' column:
 0        Hayley
1        Taylor
2        Claire
3        Aurora
4    Evangeline
Name: Name, dtype: object


In [14]:
# Selecting multiple columns
print("\nSelect 'Name' and 'Salary' columns:\n", df[["Name", "Salary"]])


Select 'Name' and 'Salary' columns:
          Name  Salary
0      Hayley  100000
1      Taylor   70000
2      Claire   70000
3      Aurora   80000
4  Evangeline   90000


In [15]:
# Selecting rows by position (iloc); the end position (2) is excluded
print("\nSelect first two rows:\n", df.iloc[0:2])


Select first two rows:
      Name  Age         City  Salary
0  Hayley   25      Houston  100000
1  Taylor   30  Los Angeles   70000


In [16]:
# Selecting rows by label (loc); unlike iloc, the end label (3) is included
print("\nSelect rows with labels 1 to 3:\n", df.loc[1:3])


Select rows with labels 1 to 3:
      Name  Age         City  Salary
1  Taylor   30  Los Angeles   70000
2  Claire   35      Chicago   70000
3  Aurora   40     New York   80000
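
With the default integer index, labels and positions happen to coincide, which can hide the difference between `loc` and `iloc`. The sketch below makes the difference visible; it uses a small hypothetical frame (`df_demo`, with string labels `"a"` to `"d"`) that is not part of the sample data above.

```python
import pandas as pd

# Hypothetical frame with string labels, so labels and positions differ
df_demo = pd.DataFrame(
    {"Age": [25, 30, 35, 40]},
    index=["a", "b", "c", "d"],
)

# iloc slices by position: positions 1 and 2, end position excluded
by_position = df_demo.iloc[1:3]

# loc slices by label: "b" through "d", end label included
by_label = df_demo.loc["b":"d"]

print(by_position.index.tolist())  # ['b', 'c']
print(by_label.index.tolist())     # ['b', 'c', 'd']
```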


In [17]:
# Filtering rows: Age > 30
print("\nRows where Age > 30:\n", df[df["Age"] > 30])


Rows where Age > 30:
          Name  Age      City  Salary
2      Claire   35   Chicago   70000
3      Aurora   40  New York   80000
4  Evangeline   45   Phoenix   90000


In [18]:
# Filtering with multiple conditions: Age > 30 and Salary < 90000
print("\nRows where Age > 30 and Salary < 90000:\n", df[(df["Age"] > 30) & (df["Salary"] < 90000)])


Rows where Age > 30 and Salary < 90000:
      Name  Age      City  Salary
2  Claire   35   Chicago   70000
3  Aurora   40  New York   80000
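
Conditions can also be combined with `|` (or) and negated with `~` (not), and `Series.isin` is a compact way to test membership in a list of values. The following is a minimal sketch that rebuilds the same sample data so it runs on its own; the variable names are illustrative. Each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` or `==` in Python.

```python
import pandas as pd

# Same sample data as above, rebuilt so this cell is self-contained
df = pd.DataFrame({
    "Name": ["Hayley", "Taylor", "Claire", "Aurora", "Evangeline"],
    "Age": [25, 30, 35, 40, 45],
    "City": ["Houston", "Los Angeles", "Chicago", "New York", "Phoenix"],
    "Salary": [100000, 70000, 70000, 80000, 90000],
})

# OR: rows where Age < 30 or Salary >= 90000
young_or_high = df[(df["Age"] < 30) | (df["Salary"] >= 90000)]

# NOT: every row except those in Chicago
not_chicago = df[~(df["City"] == "Chicago")]

# Membership: rows whose City is one of several values
in_cities = df[df["City"].isin(["Houston", "Phoenix"])]

print(young_or_high["Name"].tolist())  # ['Hayley', 'Evangeline']
print(len(not_chicago))                # 4
print(in_cities["Name"].tolist())      # ['Hayley', 'Evangeline']
```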


In [19]:
# Boolean indexing: store the True/False mask in a variable, then use it to select rows
is_chicago = df["City"] == "Chicago"
print("\nRows where City is Chicago:\n", df[is_chicago])


Rows where City is Chicago:
      Name  Age     City  Salary
2  Claire   35  Chicago   70000
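
A boolean mask can also be passed to `loc` together with a list of columns, which filters rows and selects columns in one step. This is a small sketch on the same sample data, rebuilt here so the cell is self-contained; `mask` and `subset` are illustrative names.

```python
import pandas as pd

# Same sample data as above, rebuilt so this cell is self-contained
df = pd.DataFrame({
    "Name": ["Hayley", "Taylor", "Claire", "Aurora", "Evangeline"],
    "Age": [25, 30, 35, 40, 45],
    "City": ["Houston", "Los Angeles", "Chicago", "New York", "Phoenix"],
    "Salary": [100000, 70000, 70000, 80000, 90000],
})

# The mask is a Series of True/False values, one per row
mask = df["Salary"] == 70000

# Rows where the mask is True, restricted to two columns
subset = df.loc[mask, ["Name", "City"]]

print(mask.tolist())            # [False, True, True, False, False]
print(subset["Name"].tolist())  # ['Taylor', 'Claire']
```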


In [20]:
# Slicing rows by position: the end position (5) is excluded, so this returns rows 2, 3 and 4
print("\nRows from index 2 to 4:\n", df[2:5])


Rows from index 2 to 4:
          Name  Age      City  Salary
2      Claire   35   Chicago   70000
3      Aurora   40  New York   80000
4  Evangeline   45   Phoenix   90000
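
Slicing applies to columns as well as rows. The sketch below, again on a self-contained copy of the sample data, slices columns by label with `loc` and by position with `iloc`, and then takes a block of rows and columns at once; the variable names are illustrative.

```python
import pandas as pd

# Same sample data as above, rebuilt so this cell is self-contained
df = pd.DataFrame({
    "Name": ["Hayley", "Taylor", "Claire", "Aurora", "Evangeline"],
    "Age": [25, 30, 35, 40, 45],
    "City": ["Houston", "Los Angeles", "Chicago", "New York", "Phoenix"],
    "Salary": [100000, 70000, 70000, 80000, 90000],
})

# Column slice by label: "Age" through "Salary", end label included
cols_by_label = df.loc[:, "Age":"Salary"]

# Column slice by position: columns 0 and 1, end position excluded
cols_by_position = df.iloc[:, 0:2]

# Rows 1-2 and columns 1-2 in a single positional slice
block = df.iloc[1:3, 1:3]

print(cols_by_label.columns.tolist())     # ['Age', 'City', 'Salary']
print(cols_by_position.columns.tolist())  # ['Name', 'Age']
print(block.shape)                        # (2, 2)
```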


# Real-World Analogy: Library Search

- **Column selection** is like choosing which details you want to see about each book (title, author, genre).
- **Row selection** is like picking specific books by their position on the shelf or by their catalog number.
- **Filtering** is like searching for books published after 2010 and written by a certain author.
- **Boolean indexing** is like marking books that match a search query and then pulling only those off the shelf.
- **Slicing** is like taking a contiguous run of books from one point on the shelf to another.
