# Pandas Data Selection

## `loc` - Label-Based Indexing

The `loc` indexer in Pandas selects DataFrame data by label: row and column names, or a boolean array with the same length as the axis being indexed. It is used with square brackets (`df.loc[...]`) rather than called like a regular method. It is the natural choice when rows and columns carry meaningful labels such as IDs or names [Molin and Jee, 2021, Pandas Developers, 2023].

The syntax for using `loc` is:

```python
df.loc[row_indexer, column_indexer]
```

- `row_indexer`: Specifies the row labels to select, which can be a single label, a list of labels, a slice, or a boolean array. Unlike Python slicing, a label slice such as `'a':'c'` includes both endpoints.

- `column_indexer`: Specifies the column labels to select, with similar indexing options.
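The sketch below illustrates three of the indexer types listed above on a small hypothetical DataFrame (the extra row `ID4` is invented for illustration): an inclusive label slice, a single-label lookup that returns a scalar, and a boolean mask built from two conditions.

```python
import pandas as pd

# Hypothetical data with string labels
df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie', 'Dana'], 'Age': [25, 30, 35, 40]},
    index=['ID1', 'ID2', 'ID3', 'ID4'],
)

# Label slice: BOTH endpoints are included with loc
sliced = df.loc['ID2':'ID3', 'Name']
print(sliced.tolist())  # ['Bob', 'Charlie']

# One label per axis returns a single scalar
print(df.loc['ID4', 'Age'])  # 40

# Boolean masks combine with & and |; wrap each condition in parentheses
mask = (df['Age'] >= 30) & (df['Name'] != 'Charlie')
print(df.loc[mask, 'Name'].tolist())  # ['Bob', 'Dana']
```

The parentheses around each condition are required because `&` binds more tightly than comparison operators in Python.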

<font color='Blue'><b>Example:</b></font>

```python
import pandas as pd
from IPython.display import display  # available by default in Jupyter; imported for scripts

# Create a dictionary for the DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}

# Create a DataFrame with a custom index
df = pd.DataFrame(data, index=['ID1', 'ID2', 'ID3'])

# Original DataFrame
print("Original DataFrame:")
display(df)

# Access rows with labels 'ID1' and 'ID3' and all columns
print("\nAccess rows with labels ID1 and ID3 and all columns:")
selected_rows = df.loc[['ID1', 'ID3'], :]
display(selected_rows)

# Access rows where Age > 30, keeping only the 'Name' and 'Age' columns
print("\nAccess rows based on a condition and specific columns:")
conditioned_rows = df.loc[df['Age'] > 30, ['Name', 'Age']]
display(conditioned_rows)
```

<center>
<img src="https://raw.githubusercontent.com/HatefDastour/hatefdastour.github.io/master/_notes/Introduction_to_Digital_Engineering/_images/Pandas_Row_Selection_Fig1.png" alt="picture" width="750">
</center>

## `iloc` - Position-Based Indexing

The `iloc` indexer selects DataFrame data by integer position, counting from 0, just like indexing a Python list. It ignores the index labels entirely, so `df.iloc[0]` is always the first row, whatever that row's label is [Molin and Jee, 2021, Pandas Developers, 2023].

The syntax for using `iloc` is:
```python
df.iloc[row_indexer, column_indexer]
```

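- `row_indexer`: Specifies the integer positions of the rows to select: a single integer, a list of integers, a slice, or a boolean array. As with Python lists, a slice such as `0:2` excludes the stop position.
- `column_indexer`: Specifies the integer positions of the columns to select, with the same options.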
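The difference between positions and labels is easiest to see when the index consists of integers that do not start at 0. The following sketch uses a hypothetical index `[10, 20, 30]` to contrast `iloc` with `loc`, and also shows an exclusive position slice and a negative position.

```python
import pandas as pd

# Hypothetical data whose integer index does NOT start at 0
df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=[10, 20, 30],
)

# Position slice: the stop position (2) is excluded, as with Python lists
print(df.iloc[0:2, 0].tolist())  # ['Alice', 'Bob']

# Negative positions count from the end: last row, second column
print(df.iloc[-1, 1])  # 35

# iloc uses positions, loc uses labels, even when the labels are integers
print(df.iloc[1, 0])       # Bob (second row by position)
print(df.loc[20, 'Name'])  # Bob (row whose label is 20)

try:
    df.loc[0]  # no row carries the label 0
except KeyError:
    print("KeyError: label 0 not in index")
```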

<font color='Blue'><b>Example:</b></font>

```python
import pandas as pd
from IPython.display import display  # available by default in Jupyter; imported for scripts

# Create a dictionary for the DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35]}

# Create a DataFrame (default RangeIndex 0, 1, 2)
df = pd.DataFrame(data)

# Original DataFrame
print("Original DataFrame:")
display(df)

# Access the first two rows (positions 0 and 1) and all columns
print("\nAccess the first two rows and all columns:")
first_two_rows = df.iloc[:2, :]
display(first_two_rows)

# Access rows at positions 0 and 2 and columns at positions 0 and 1
print("\nAccess specific rows and columns by position:")
selected_rows_columns = df.iloc[[0, 2], [0, 1]]
display(selected_rows_columns)
```

## `at` - Single Value Selection

The `at` indexer is designed for fast access to, or modification of, a single scalar value. It behaves like `loc` restricted to exactly one row label and one column label; its position-based counterpart is `iat` [Molin and Jee, 2021, Pandas Developers, 2023].

The syntax for using `at` is:
```python
df.at[row_label, column_label]
```

- `row_label`: Specifies the label of the row where the desired element is located.
- `column_label`: Specifies the label of the column where the element is located.
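A short sketch of `at` next to its positional counterpart `iat`, using a hypothetical DataFrame with string labels. Both read and write one value at a time.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=['ID1', 'ID2', 'ID3'],
)

# at: one row label and one column label -> a single scalar
print(df.at['ID2', 'Age'])  # 30

# iat: the positional counterpart (row position 1, column position 1)
print(df.iat[1, 1])  # 30

# Both can also assign a single value in place
df.iat[0, 1] = 26
print(df.at['ID1', 'Age'])  # 26
```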

<font color='Blue'><b>Example:</b></font>

```python
import pandas as pd
from IPython.display import display  # available by default in Jupyter; imported for scripts

# Create a dictionary for the DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35]}

# Create a DataFrame (default RangeIndex 0, 1, 2)
df = pd.DataFrame(data)

# Original DataFrame
print("Original DataFrame:")
display(df)

# Modify the element at row label 1 and column label 'Name'
df.at[1, 'Name'] = 'Robert'

# Updated DataFrame
print("\nUpdated DataFrame:")
display(df)

# Access and print the element at row label 2 and column label 'Age'
age = df.at[2, 'Age']
print("\nAge:", age)
```

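Because `at` accepts exactly one row label and one column label (no lists, slices, or boolean masks), it can skip much of the general-purpose indexing logic of `loc`, which makes it faster for reading or writing individual values.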

<center>
<img src="https://raw.githubusercontent.com/HatefDastour/hatefdastour.github.io/master/_notes/Introduction_to_Digital_Engineering/_images/pd_Selection_Fig1.png" alt="picture" width="700">
</center>

## Attribute Access

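You can also select a single column with attribute-style access, but only if the column name is a valid Python identifier (no spaces, no leading digit) and does not clash with an existing DataFrame attribute or method such as `count` or `shape`. Bracket access (`df['column_name']`) works in every case.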

```python
# Access columns as attributes
column_data = df.column_name
```
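The following sketch shows where attribute access stops working. The column names `count` and `unit price` are hypothetical, chosen only to trigger the two limitations described above.

```python
import pandas as pd

# Hypothetical columns chosen to show the limits of attribute access
df = pd.DataFrame({'count': [1, 2], 'unit price': [9.5, 12.0], 'Name': ['A', 'B']})

# Works: 'Name' is a valid identifier and not an existing DataFrame attribute
print(df.Name.tolist())  # ['A', 'B']

# 'count' clashes with the DataFrame.count method, so the method is returned
print(callable(df.count))  # True
print(df['count'].tolist())  # [1, 2]

# 'unit price' contains a space, so only bracket access works
print(df['unit price'].tolist())  # [9.5, 12.0]
```

Since a name clash fails silently (you get a method instead of an error), bracket access is the safer default in reusable code.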

<font color='Blue'><b>Example:</b></font>

```python
import pandas as pd
from IPython.display import display  # available by default in Jupyter; imported for scripts

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# Original DataFrame
print("Original DataFrame:")
display(df)

# Access the 'Name' column as an attribute (equivalent to df['Name'])
name_column = df.Name

# 'Name' Column
print("\n'Name' Column:")
display(name_column)
```

## `query()` Method

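The `query()` method filters rows with a boolean expression written as a string. Column names can be used directly inside the expression, which is often more concise than chaining bracketed conditions such as `df[(df['a'] > 1) & (df['b'] < 5)]`.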

```python
# Using query() to filter data
result = df.query(expr)
```

You can see the full syntax [here](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html).
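Two features of the expression syntax come up often: `@name` refers to a Python variable in the surrounding scope, and backticks quote column names that are not valid identifiers (supported in recent pandas versions). The sketch below uses a hypothetical `Unit Price` column and a local `threshold` variable to show both.

```python
import pandas as pd

df = pd.DataFrame({'Value1': [10, 20, 30, 40],
                   'Unit Price': [5, 15, 25, 35]})

threshold = 15  # an ordinary local Python variable

# '@' pulls a variable from the surrounding scope into the expression
r1 = df.query("Value1 > @threshold")
print(r1['Value1'].tolist())  # [20, 30, 40]

# Backticks quote column names containing spaces
r2 = df.query("`Unit Price` < 30 and Value1 >= 20")
print(r2.index.tolist())  # [1, 2]
```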

<font color='Blue'><b>Example:</b></font>

```python
import pandas as pd
from IPython.display import display  # available by default in Jupyter; imported for scripts

# Create a DataFrame
data = {'Value1': [10, 20, 30, 40],
        'Value2': [5, 15, 25, 35]}
df = pd.DataFrame(data)

# Original DataFrame
print("Original DataFrame:")
display(df)

# Keep rows where Value1 > 20 and Value2 == 35
result = df.query("Value1 > 20 and Value2 == 35")

# Filtered Result
print("\nFiltered Result:")
display(result)
```

<center>
<img src="https://raw.githubusercontent.com/HatefDastour/hatefdastour.github.io/master/_notes/Introduction_to_Digital_Engineering/_images/Pandas_Row_Selection_Fig2.png" alt="picture" width="750">
</center>

## `isin()` Method

The `isin()` method returns a boolean mask that is `True` wherever a value appears in a given collection (for example, a list). The mask can then be used to filter rows.

```python
# Filtering using isin() method
selected_rows = df[df['column'].isin([value1, value2])]
```

**Advantages**:
- Efficient way to filter data based on multiple values in a column.
- Useful for scenarios where you need to extract rows with specific values.

**Disadvantages**:
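- Limited to exact membership tests against a predefined set of values; it cannot express ranges or other comparisons.
- When called on a single column, it tests only that column; conditions involving several columns require combining masks with `&` or `|`.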
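Both limitations are usually handled by treating the `isin()` result as an ordinary boolean mask. This sketch, with a hypothetical `Score` column added to the example data, negates the mask with `~` and combines it with a condition on another column using `&`.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'Score': [90, 75, 60, 40, 85, 95]})

# ~ negates the mask: rows whose Category is NOT in the list
not_ac = df[~df['Category'].isin(['A', 'C'])]
print(not_ac.index.tolist())  # [1, 4]

# Combine isin() with a condition on another column using &
mask = df['Category'].isin(['A', 'C']) & (df['Score'] > 50)
print(df.loc[mask, 'Score'].tolist())  # [90, 60, 95]
```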

<font color='Blue'><b>Example:</b></font>

```python
import pandas as pd
from IPython.display import display  # available by default in Jupyter; imported for scripts

# Create a DataFrame
data = {'Category': ['A', 'B', 'C', 'A', 'B', 'C']}
df = pd.DataFrame(data)

# Original DataFrame
print("Original DataFrame:")
display(df)

# Keep rows whose Category is 'A' or 'C'
selected_rows = df[df['Category'].isin(['A', 'C'])]

# Selected Rows
print("\nSelected Rows:")
display(selected_rows)
```

<center>
<img src="https://raw.githubusercontent.com/HatefDastour/hatefdastour.github.io/master/_notes/Introduction_to_Digital_Engineering/_images/Pandas_Row_Selection_Fig3.png" alt="picture" width="700">
</center>

## `where()` Method

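The `where()` method returns an object with the same shape as the original, keeping values where the condition is `True` and replacing the others with the value given by `other` (NaN by default). Unlike boolean indexing, it does not drop any rows.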

```python
# Using where() to filter data conditionally
filtered_df = df.where(df['column'] > value)
```

You can see the full syntax [here](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.where.html).
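The sketch below contrasts `where()` with boolean indexing on the same data, shows the `other` parameter, and uses `mask()`, the pandas method that applies the opposite rule (replace where the condition is `True`). Note that filling an integer column with NaN converts it to float.

```python
import pandas as pd

df = pd.DataFrame({'Value': [10, 20, 30, 40]})

# Boolean indexing drops rows that fail the condition
print(len(df[df['Value'] > 20]))  # 2

# where() keeps every row; failing entries become NaN by default
kept = df.where(df > 20)
print(kept['Value'].tolist())  # [nan, nan, 30.0, 40.0]

# other= supplies a replacement value instead of NaN
replaced = df.where(df > 20, other=0)
print(replaced['Value'].tolist())  # [0, 0, 30, 40]

# mask() is the inverse: it replaces entries where the condition IS true
masked = df.mask(df > 20, other=0)
print(masked['Value'].tolist())  # [10, 20, 0, 0]
```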


<font color='Blue'><b>Example:</b></font>

```python
import pandas as pd
from IPython.display import display  # available by default in Jupyter; imported for scripts

# Create a DataFrame
data = {'Value': [10, 20, 30, 40]}
df = pd.DataFrame(data)

# Original DataFrame
print("Original DataFrame:")
display(df)

# Keep values > 20; replace the rest with pd.NA instead of the default NaN
filtered_df = df.where(df['Value'] > 20, other=pd.NA)

# Filtered Data
print("\nFiltered Data:")
display(filtered_df)
```

<center>
<img src="https://raw.githubusercontent.com/HatefDastour/hatefdastour.github.io/master/_notes/Introduction_to_Digital_Engineering/_images/Pandas_Row_Selection_Fig5.png" alt="picture" width="700">
</center>