# More on Selecting Data in Pandas

In the previous lecture, we covered the basic indexing methods in Pandas, such as `loc`, `iloc`, and square bracket notation `[]`. These methods allow you to access and select data from Pandas Series and DataFrames based on labels, integer positions, or boolean masks. Mastering these basic indexing techniques is crucial for effective data manipulation and analysis in Pandas.


In this lecture, we build on those basics with four further selection tools: boolean indexing, `isin()`, `query()`, and `where()`. They let you express more complex conditions, work with specific subsets of data, and replace values conditionally.


Before diving into the selected indexing methods, let's quickly recap the basic indexing methods covered in the previous lecture:

- `loc`: Label-based indexing for accessing rows and columns by their labels.
```python
df.loc[row_labels, column_labels]
```

- `iloc`: Integer-based indexing for accessing rows and columns by their integer positions.
```python
df.iloc[row_positions, column_positions]
```

- Square bracket notation `[]`: Selecting columns by label, or rows by slice or boolean mask.
```python
df[column_label]
df[boolean_mask]
```


These basic indexing methods form the foundation for data selection and manipulation in Pandas. They allow you to retrieve specific rows, columns, or subsets of data based on various criteria.


In this lecture, we will focus on the following selected indexing methods:

1. **Boolean Indexing**: Selecting data based on boolean conditions and masks.
   - Creating boolean masks
   - Selecting rows and columns based on conditions
   - Combining boolean conditions using logical operators

2. **Indexing with `isin()`**: Selecting data based on a list of values.
   - Filtering rows based on a list of values
   - Combining `isin()` with boolean indexing

3. **Indexing with `query()`**: Selecting data using string expressions.
   - Filtering data based on complex conditions
   - Choosing between `query()` and `.loc`

4. **Indexing with `where()`**: Selecting and replacing values based on conditions.
   - Selecting values based on a boolean mask
   - Replacing values based on conditions
   - Combining `where()` with other indexing methods


These selected indexing methods provide additional capabilities and flexibility compared to the basic indexing techniques. They allow you to handle more complex filtering and selection scenarios, work with specific subsets of data, and perform advanced data operations efficiently.


Throughout this lecture, we will explore each of these indexing methods in detail, providing explanations, examples, and practical use cases. We will also discuss best practices, performance considerations, and common pitfalls to be aware of when using these methods.


By the end of this lecture, you will have a solid understanding of these selected indexing methods and how to apply them effectively in your data manipulation and analysis tasks using Pandas.


## <a id='toc1_'></a>[Boolean Indexing](#toc0_)

Boolean indexing is a powerful technique in Pandas that allows you to select rows and columns based on boolean conditions. It enables you to filter and extract specific subsets of data that satisfy certain criteria. Boolean indexing is a fundamental skill for data manipulation and analysis in Pandas.


Boolean indexing works by creating a boolean mask, which is an array of True and False values corresponding to each element in the Series or DataFrame. The boolean mask determines which elements are selected and which are not.


When you apply a boolean mask to a Series or DataFrame, only the elements corresponding to True values in the mask are selected, while the elements corresponding to False values are excluded.


Boolean indexing provides a concise and efficient way to filter and select data based on specific conditions.


### <a id='toc1_1_'></a>[Creating Boolean Masks](#toc0_)


To perform boolean indexing, you first need to create a boolean mask. A boolean mask is typically created by applying a condition or a series of conditions to a Series or DataFrame.


Here's an example of creating a boolean mask for a Series:


In [1]:
import pandas as pd

In [2]:
series = pd.Series([1, 2, 3, 4, 5])
series > 3

0    False
1    False
2    False
3     True
4     True
dtype: bool

In this example, the condition `series > 3` creates a boolean mask where True values correspond to elements greater than 3 and False values correspond to elements less than or equal to 3.


Similarly, you can create boolean masks for DataFrames by applying conditions to one or more columns:


In [3]:
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10]})
mask = df['A'] > 3
mask

0    False
1    False
2    False
3     True
4     True
Name: A, dtype: bool

In this case, the condition `df['A'] > 3` creates a boolean mask based on the values in column 'A' of the DataFrame.


### <a id='toc1_2_'></a>[Selecting Rows and Columns Based on Boolean Conditions](#toc0_)


Once you have created a boolean mask, you can use it to select rows and columns from a Series or DataFrame.


For a Series, you can directly apply the boolean mask to select elements:


In [4]:
series[mask]

3    4
4    5
dtype: int64

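This selects only the elements of the Series where the corresponding values in the boolean mask are True. Note that `mask` was built from `df['A']`; it can be applied to `series` because both share the same index (0 through 4).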


For a DataFrame, you can use the boolean mask to select rows:


In [5]:
df[mask]

Unnamed: 0,A,B
3,4,9
4,5,10


This will select only the rows from the DataFrame where the corresponding values in the boolean mask are True.


You can also use boolean indexing to select columns based on conditions:


In [6]:
df.mean() > 5

A    False
B     True
dtype: bool

In [7]:
df.loc[:, df.mean() > 5]

Unnamed: 0,B
0,6
1,7
2,8
3,9
4,10


In [8]:
df.loc[df['B'] > 8, df.mean() > 5]

Unnamed: 0,B
3,9
4,10


In these examples, the condition `df.mean() > 5` creates a boolean mask over the columns based on their mean values, so `df.loc[:, df.mean() > 5]` keeps only the columns whose mean is greater than 5. The last call additionally passes a row mask (`df['B'] > 8`), which filters rows and columns in a single step.


### <a id='toc1_3_'></a>[Combining Boolean Conditions Using Logical Operators](#toc0_)


Boolean indexing allows you to combine multiple boolean conditions using logical operators such as `&` (and), `|` (or), and `~` (not).


Here's an example of combining boolean conditions for a DataFrame:


In [9]:
mask = (df['A'] > 3) & (df['B'] < 9)
df[mask]

Unnamed: 0,A,B


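In this case, the boolean mask is created by combining two conditions using the `&` operator. The resulting mask is True only for rows where both conditions are satisfied. Here no row meets both conditions (the rows with `A > 3` have `B` values of 9 and 10), so the result is an empty DataFrame that still has the columns 'A' and 'B'.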


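You can also use the `|` operator to select rows where either condition is True, and the `~` operator to negate a condition. Wrap each comparison in parentheses, because `&`, `|`, and `~` bind more tightly than comparison operators such as `>` and `<`. Python's `and`, `or`, and `not` keywords do not work element-wise on a Series and raise a `ValueError`.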
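As a minimal sketch of the `|` and `~` operators, the snippet below reuses the numeric `df` from the earlier cells. The variable names `either` and `not_large` are illustrative and not part of the original notebook.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10]})

# OR: keep rows where A is below 2 or B is above 9
either = df[(df['A'] < 2) | (df['B'] > 9)]

# NOT: keep rows where A is not greater than 3
not_large = df[~(df['A'] > 3)]

print(either)
print(not_large)
```

Each comparison sits in its own parentheses so that the comparison is evaluated before `|` or `~` is applied.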


### <a id='toc1_4_'></a>[Practical Examples and Use Cases](#toc0_)


Boolean indexing is widely used in various data manipulation and analysis tasks. Here are a few practical examples and use cases:

1. Filtering data based on specific criteria:
   - Selecting rows where a column value falls within a certain range
   - Filtering out outliers or invalid data points
   - Extracting data that meets specific conditions

2. Conditional data modification:
   - Updating values in a DataFrame based on boolean conditions
   - Applying different operations to subsets of data based on criteria

3. Data cleaning and preprocessing:
   - Removing rows or columns with missing or invalid values
   - Handling outliers and extreme values based on statistical thresholds

4. Feature selection and engineering:
   - Selecting relevant features based on statistical measures or domain knowledge
   - Creating new features based on conditional transformations


Boolean indexing provides a flexible and efficient way to handle these scenarios and more. By leveraging boolean masks and logical operators, you can precisely select and manipulate the desired subsets of data in your Pandas Series and DataFrames.
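The range-filtering use case listed above can be sketched as follows. The bounds 2 and 4 are arbitrary values chosen for illustration, and `Series.between()` is shown only as a shorthand for the two-sided comparison.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10]})

# Two explicit conditions combined with &
in_range = df[(df['A'] >= 2) & (df['A'] <= 4)]

# Same filter using between(), which includes both bounds by default
same = df[df['A'].between(2, 4)]

print(in_range)
```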


It's important to note that boolean indexing creates a new Series or DataFrame containing only the selected elements. The original data remains unchanged unless you explicitly assign the result back to the original variable.
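A small sketch of this behavior, under the assumption that you want to update values in the original DataFrame: filtering with `df[mask]` produces a separate object, while assigning through `.loc` with the same mask modifies `df` itself. The names `subset` and `mask` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10]})
mask = df['A'] > 3

# Selecting returns a new DataFrame; df is left untouched
subset = df[mask]

# Conditional modification: assign through .loc to change df itself
df.loc[mask, 'B'] = 0

print(subset)  # still holds the original B values for the selected rows
print(df)      # B is now 0 where A > 3
```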


Boolean indexing is a fundamental technique in Pandas that every data scientist and analyst should master. It enables you to filter, select, and manipulate data based on specific conditions, making it a powerful tool for data exploration, cleaning, and analysis.

## <a id='toc2_'></a>[Indexing with `isin`](#toc0_)

Pandas provides the `isin()` method as a convenient way to select rows from a Series or DataFrame based on a list of values. The `isin()` method allows you to check whether each element in a Series or DataFrame is contained in a specified set of values and returns a boolean mask that can be used for indexing.


The `isin()` method is used to test whether each element in a Series or DataFrame is included in a provided set of values. It returns a boolean Series or DataFrame of the same shape, where True indicates that the element is present in the specified set, and False indicates that the element is not present.


The basic syntax of the `isin()` method is as follows:

```python
series.isin(values)
dataframe.isin(values)
```

- `series` or `dataframe`: The Series or DataFrame to be tested.
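- `values`: The values to check against. For a Series, this must be a set or list-like object (such as a list, tuple, or another Series); passing a single string raises a `TypeError`. For a DataFrame, it can be an iterable, a Series, a DataFrame, or a dict mapping column names to the values allowed in each column.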
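The following sketch contrasts the two forms: `Series.isin()` returns a boolean Series, while `DataFrame.isin()` returns a boolean DataFrame of the same shape. The small DataFrame and the dict of allowed values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})

# Series.isin: one True/False per element of column A
series_mask = df['A'].isin([1, 3])

# DataFrame.isin with a dict: each column is checked against its own list
frame_mask = df.isin({'A': [1, 3], 'B': ['b']})

print(series_mask)
print(frame_mask)
```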


### <a id='toc2_1_'></a>[Selecting Rows Based on a List of Values](#toc0_)


One common use case of the `isin()` method is to select rows from a DataFrame based on a list of values in a specific column.


Here's an example:


In [10]:
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['a', 'b', 'c', 'd', 'e']})
values = [2, 4]
df[df['A'].isin(values)]

Unnamed: 0,A,B
1,2,b
3,4,d


In this example, the `isin()` method is applied to column 'A' of the DataFrame `df`. It checks each value in column 'A' against the list of values `[2, 4]`. The resulting boolean mask is then used to index the DataFrame, selecting only the rows where the value in column 'A' is either 2 or 4.


### <a id='toc2_2_'></a>[Filtering Data Using `isin()`](#toc0_)


The `isin()` method is particularly useful for filtering data based on a specific set of values. It allows you to include or exclude rows that match certain criteria.


Here's an example of filtering data using `isin()`:


In [11]:
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['a', 'b', 'c', 'd', 'e']})
values_to_exclude = ['b', 'd']
df[~df['B'].isin(values_to_exclude)]

Unnamed: 0,A,B
0,1,a
2,3,c
4,5,e


In this case, the `isin()` method is used to create a boolean mask that identifies rows where the value in column 'B' is either 'b' or 'd'. The `~` operator is used to negate the boolean mask, effectively selecting the rows where the value in column 'B' is not 'b' or 'd'. The resulting filtered DataFrame contains only the rows that do not match the specified values.


### <a id='toc2_3_'></a>[Combining `isin()` with Boolean Indexing](#toc0_)


The `isin()` method can be combined with boolean indexing to create more complex filtering conditions. By combining `isin()` with logical operators such as `&` (and), `|` (or), and `~` (not), you can select rows based on multiple criteria.


Here's an example:


In [12]:
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['a', 'b', 'c', 'd', 'e'], 'C': [10, 20, 30, 40, 50]})
df[(df['A'].isin([2, 4])) & (df['C'] > 25)]

Unnamed: 0,A,B,C
3,4,d,40


In this example, the `isin()` method is used to select rows where the value in column 'A' is either 2 or 4. This condition is then combined with another condition using the `&` operator, selecting rows where the value in column 'C' is greater than 25. The resulting DataFrame contains only the rows that satisfy both conditions.


### <a id='toc2_4_'></a>[Practical Examples and Use Cases](#toc0_)


The `isin()` method is widely used in various data manipulation and analysis tasks. Here are a few practical examples and use cases:

1. Filtering data based on a predefined set of values:
   - Selecting rows based on a list of valid categories or labels
   - Excluding rows that contain specific unwanted values

2. Comparing and combining datasets:
   - Identifying common values between two datasets using `isin()`
   - Filtering one dataset by the keys present (or absent) in another, similar to a semi-join or anti-join

3. Data cleaning and preprocessing:
   - Removing rows with invalid or unexpected values
   - Handling missing data by checking against a set of valid values

4. Categorical data analysis:
   - Filtering rows based on specific categories or labels
   - Comparing distributions or statistics across different categories


The `isin()` method provides a simple and efficient way to select and filter data based on a set of values. It is particularly useful when you have a predefined list of values or when you need to compare data against a specific set of criteria.
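As a hedged illustration of comparing two datasets, the sketch below filters one table by the keys found in another. The `orders` and `customers` tables and their column names are hypothetical.

```python
import pandas as pd

# Hypothetical tables, invented for this example
orders = pd.DataFrame({'order_id': [101, 102, 103, 104],
                       'customer_id': [1, 2, 3, 2]})
customers = pd.DataFrame({'customer_id': [2, 3, 5],
                          'name': ['Ann', 'Bo', 'Cy']})

key_match = orders['customer_id'].isin(customers['customer_id'])

# Orders whose customer exists in the customers table
known = orders[key_match]

# Orders whose customer is missing from the customers table
unknown = orders[~key_match]

print(known)
print(unknown)
```

Unlike a merge, this keeps only the columns of `orders` and never duplicates rows.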


By combining `isin()` with boolean indexing and logical operators, you can create complex filtering conditions and select precisely the data you need for your analysis.


It's important to note that the `isin()` method is case-sensitive for string comparisons. If you need case-insensitive matching, you can convert the data to a consistent case before using `isin()`.
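A minimal sketch of case-insensitive matching, assuming the lookup values are already lowercase. The sample strings are invented for illustration.

```python
import pandas as pd

s = pd.Series(['Apple', 'banana', 'CHERRY'])
wanted = ['apple', 'cherry']  # assumed to be lowercase already

# Case-sensitive: none of the original strings match exactly
strict = s.isin(wanted)

# Normalize the data to lowercase first, then test membership
relaxed = s.str.lower().isin(wanted)

print(s[relaxed])
```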


Indexing with `isin()` is a valuable tool in the Pandas toolkit for data selection and filtering. It simplifies the process of working with specific subsets of data and allows you to focus on the relevant information for your analysis.

## <a id='toc3_'></a>[Indexing with query](#toc0_)

Pandas provides the `query()` method as a powerful and expressive way to select data from a DataFrame based on complex conditions. The `query()` method allows you to filter rows using a string expression that represents a boolean condition. It offers a concise and readable syntax for specifying complex filtering criteria.


The `query()` method is used to select rows from a DataFrame based on a boolean expression provided as a string. It allows you to specify complex conditions using a natural language-like syntax, making the code more readable and intuitive.


The basic syntax of the `query()` method is as follows:


```python
dataframe.query(expr)
```


- `dataframe`: The DataFrame to be queried.
- `expr`: A string expression representing the boolean condition to be evaluated for each row. The expression can include column names, operators, and constants.


The `query()` method evaluates the provided expression for each row in the DataFrame and returns a new DataFrame containing only the rows that satisfy the condition.


### <a id='toc3_1_'></a>[Selecting Data Using String Expressions](#toc0_)

The `query()` method allows you to specify the filtering condition using a string expression. The expression can include column names, operators, and constants, similar to how you would write a boolean condition in Python.


Here's an example of using the `query()` method to select data based on a simple condition:


In [13]:
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50], 'C': ['a', 'b', 'c', 'd', 'e']})
df.query('A > 2')

Unnamed: 0,A,B,C
2,3,30,c
3,4,40,d
4,5,50,e


In this example, the `query()` method is used to select rows from the DataFrame `df` where the value in column 'A' is greater than 2. The returned DataFrame contains only the rows that satisfy this condition.


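You can also use other operators and combine multiple conditions using `&` (and), `|` (or), and `~` (not), or the equivalent keywords `and`, `or`, and `not`. Inside a `query()` expression, `&` and `|` take the precedence of `and` and `or`, so the comparisons do not need to be wrapped in parentheses. Here's an example: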


In [14]:
df.query('A > 2 & B < 40')

Unnamed: 0,A,B,C
2,3,30,c


This query selects rows where the value in column 'A' is greater than 2 and the value in column 'B' is less than 40.
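The sketch below checks that the keyword form, the symbol form, and an ordinary boolean mask select the same rows. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [10, 20, 30, 40, 50],
                   'C': ['a', 'b', 'c', 'd', 'e']})

with_keywords = df.query('A > 2 and B < 40')
with_symbols = df.query('(A > 2) & (B < 40)')

# Equivalent boolean mask; parentheses are required outside query()
with_mask = df[(df['A'] > 2) & (df['B'] < 40)]

print(with_keywords)
```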


### <a id='toc3_2_'></a>[Filtering Data Based on Complex Conditions](#toc0_)


The `query()` method really shines when you need to filter data based on complex conditions. It allows you to express intricate filtering logic using a combination of comparison, arithmetic, and logical operators, method calls, and references to local variables.


Here are a few examples of using the `query()` method with complex conditions:

1. Filtering based on multiple columns:

In [15]:
df.query('A < B & C == "d"')

Unnamed: 0,A,B,C
3,4,40,d


This query selects rows where the value in column 'A' is less than the value in column 'B', and the value in column 'C' is equal to 'd'.


2. Using mathematical operations:

In [16]:
df.query('A ** 2 > 10')

Unnamed: 0,A,B,C
3,4,40,d
4,5,50,e


This query selects rows where the square of the value in column 'A' is greater than 10.


3. Applying functions and methods:

In [17]:
df.query('C.str.startswith("b")')

Unnamed: 0,A,B,C
1,2,20,b


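This query selects rows where the value in column 'C' starts with the letter 'b'. It uses the `str.startswith()` method to check the condition. Depending on your pandas version and whether `numexpr` is installed, method calls like this may require passing `engine='python'` to `query()`.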

4. Referencing variables:

In [18]:
threshold = 3
df.query('A > @threshold')

Unnamed: 0,A,B,C
3,4,40,d
4,5,50,e


This query selects rows where the value in column 'A' is greater than the value of the variable `threshold`. The `@` symbol is used to reference variables within the query expression.


The `query()` method provides a powerful and flexible way to filter data based on complex conditions. It allows you to express your filtering logic using a concise and readable syntax, making your code more maintainable and easier to understand.
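As a further sketch of variable references, `@` also works with lists, which can be combined with the `in` operator inside the expression. The names `wanted` and `low` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [10, 20, 30, 40, 50],
                   'C': ['a', 'b', 'c', 'd', 'e']})

wanted = ['a', 'c', 'e']  # hypothetical list of accepted values
low = 2                   # hypothetical lower bound

# 'in' inside query() plays the role of isin()
by_query = df.query('A > @low and C in @wanted')

# The same filter written as a boolean mask
by_mask = df[(df['A'] > low) & (df['C'].isin(wanted))]

print(by_query)
```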


### <a id='toc3_3_'></a>[When to Use `.query()` vs `.loc`](#toc0_)


There is often more than one way to do things in Pandas. You may be wondering if you should use `.query()` or `.loc`.


If you do a lot of chaining (which is recommended), `.query()` has the advantage of working on the intermediate DataFrame. One could argue that `.loc` does as well, but often when using boolean arrays with `.loc`, users insert a boolean array based on the original data, not the intermediate data. You need to pass a function (such as a `lambda`) to `.loc` to get access to the intermediate DataFrame.


On the flip side, `.query()` does not support column selection, but `.loc` does.
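The trade-off can be sketched with a short chain. The `total` column is created mid-chain, so it exists only on the intermediate DataFrame; the column name and the threshold of 30 are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [10, 20, 30, 40, 50],
                   'C': ['a', 'b', 'c', 'd', 'e']})

# query() sees the intermediate DataFrame, including the new column
via_query = (
    df.assign(total=lambda d: d['A'] + d['B'])
      .query('total > 30')
)

# .loc needs a callable to reach the same intermediate DataFrame
via_loc = (
    df.assign(total=lambda d: d['A'] + d['B'])
      .loc[lambda d: d['total'] > 30]
)

# .loc can also pick columns in the same step; query() cannot
rows_and_cols = (
    df.assign(total=lambda d: d['A'] + d['B'])
      .loc[lambda d: d['total'] > 30, ['A', 'total']]
)

print(rows_and_cols)
```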


It's not a situation where you should only learn one of these constructs and neglect the other. Learn both `.query()` and `.loc` and figure out which one is appropriate given your requirements.


### <a id='toc3_4_'></a>[Practical Examples and Use Cases](#toc0_)


The `query()` method is widely used in various data manipulation and analysis tasks. Here are a few practical examples and use cases:

1. Filtering data based on numeric ranges:
   - Selecting rows where a column value falls within a specific range
   - Filtering outliers or extreme values based on statistical thresholds

2. Conditional data selection:
   - Selecting rows based on multiple conditions across different columns
   - Filtering data based on a combination of numeric and categorical criteria

3. Data cleaning and preprocessing:
   - Removing rows with invalid or missing values based on complex conditions
   - Applying conditional transformations or corrections to specific subsets of data

4. Exploratory data analysis:
   - Filtering data based on domain-specific criteria or business rules
   - Selecting relevant subsets of data for further analysis or visualization


The `query()` method provides a powerful and expressive way to filter and select data based on complex conditions. By leveraging the string-based expression syntax, you can create intricate filtering logic that is both concise and readable.


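It's important to be mindful of the column names and data types when using the `query()` method. The expression must follow Python-like syntax, including its rules for operator precedence and parentheses, with a few extensions such as the `@` prefix for local variables. Column names that are not valid Python identifiers (for example, names containing spaces) must be wrapped in backticks.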
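A minimal sketch of querying a column whose name contains a space, assuming a pandas version that supports backtick quoting (0.25 or later). The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data with a column name that is not a valid identifier
sales = pd.DataFrame({'unit price': [4.0, 6.5, 9.0],
                      'qty': [10, 3, 7]})

# Backticks let the expression refer to 'unit price'
expensive_bulk = sales.query('`unit price` > 5 and qty > 5')

print(expensive_bulk)
```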


Indexing with `query()` is a valuable tool in the Pandas library for efficient and expressive data filtering. It allows you to quickly select relevant subsets of data based on complex conditions, making your data manipulation and analysis tasks more streamlined and effective.

## <a id='toc4_'></a>[Indexing with where](#toc0_)

Pandas provides the `where()` method as a powerful tool for selecting and replacing values in a Series or DataFrame based on boolean conditions. The `where()` method allows you to specify a boolean mask or a condition to determine which values to keep or replace. It offers a concise and flexible way to perform conditional operations on your data.


The `where()` method is used to select or replace values in a Series or DataFrame based on a boolean condition. It takes a boolean mask or a callable as the condition and returns a new Series or DataFrame with the values that satisfy the condition unchanged, while the values that do not satisfy the condition are replaced with a specified value or `NaN` (Not a Number).


The basic syntax of the `where()` method is as follows:


```python
series.where(cond, other=nan, *, inplace=False, axis=None, level=None)
dataframe.where(cond, other=nan, *, inplace=False, axis=None, level=None)
```

- `cond`: A boolean Series/DataFrame, array-like, or callable that specifies the condition to be evaluated. True values are kept, while False values are replaced.
- `other`: The value to replace the False values with. It can be a scalar, Series, DataFrame, or callable. If not specified, False values are replaced with `NaN`.
- `inplace`: If True, the operation is performed in-place, modifying the original Series or DataFrame. If False (default), a new Series or DataFrame is returned.
- `axis`: The axis used to align `cond` and `other` with the data, if alignment is needed (0 for index, 1 for columns).
- `level`: The MultiIndex level used for alignment, if needed.

The signature above reflects pandas 2.x. Older versions also accepted an `errors` argument, which did not affect the result; it was deprecated and later removed, so avoid passing it.
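Both `cond` and `other` may be callables, in which case each is called with the Series or DataFrame itself. The sketch below uses arbitrary rules (keep even numbers, negate odd ones) purely for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# cond: keep even values; other: replace the rest with their negation
result = s.where(lambda x: x % 2 == 0, lambda x: -x)

print(result)
```

Callables are convenient in method chains, where the intermediate object has no variable name to refer to.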


### <a id='toc4_1_'></a>[Selecting Values Based on a Boolean Mask](#toc0_)


One common use case of the `where()` method is to select values from a Series or DataFrame based on a boolean mask. The boolean mask can be created using comparison operators, logical operators, or any other condition that returns a boolean Series or DataFrame.


Here's an example of using the `where()` method to select values based on a condition:


In [19]:
s = pd.Series([1, 2, 3, 4, 5])
s.where(s > 2)

0    NaN
1    NaN
2    3.0
3    4.0
4    5.0
dtype: float64

In this example, the `where()` method is applied to the Series `s`. The condition `s > 2` creates a boolean mask where True values correspond to elements greater than 2. The returned Series contains the original values where the condition is True and `NaN` where the condition is False. Because `NaN` is a floating-point value, the dtype changes from `int64` to `float64`.


Similarly, you can apply the `where()` method to a DataFrame:


In [20]:
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})
df.where(df > 20)

Unnamed: 0,A,B
0,,
1,,
2,,30.0
3,,40.0
4,,50.0


In this case, the condition `df > 20` is applied to each element of the DataFrame `df`. The returned DataFrame contains the original values where the condition is True and `NaN` where the condition is False.
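To make the contrast with plain boolean indexing concrete, the sketch below applies the same condition both ways. Boolean indexing drops the non-matching elements, while `where()` keeps the original shape and fills the gaps with `NaN`.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

filtered = s[s > 2]      # only the matching elements remain
masked = s.where(s > 2)  # same length as s, NaN where the condition fails

print(len(filtered), len(masked))
print(masked.dtype)
```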


### <a id='toc4_2_'></a>[Replacing Values Based on Conditions](#toc0_)


Another powerful feature of the `where()` method is the ability to replace values based on conditions. By specifying the `other` parameter, you can replace the values that do not satisfy the condition with a specified value or another Series/DataFrame.


Here's an example of replacing values using the `where()` method:


In [21]:
s = pd.Series([1, 2, 3, 4, 5])
s.where(s > 2, 0)

0    0
1    0
2    3
3    4
4    5
dtype: int64

In this example, the `where()` method is used to replace values in the Series `s`. The condition `s > 2` determines which values to keep. The values that do not satisfy the condition are replaced with 0, as specified by the `other` parameter.


You can also use a Series or DataFrame as the `other` parameter to replace values based on corresponding elements from another Series or DataFrame:


In [22]:
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})
mask = df['A'] > 2
df.where(mask, df['B'], axis=0)

Unnamed: 0,A,B
0,10,10
1,20,20
2,3,30
3,4,40
4,5,50


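In this example, the `where()` method is applied to the DataFrame `df`. Because `mask` is a Series and `axis=0` is given, the condition and the replacement values are aligned on the row index and applied to every column. In rows where `mask` is False (rows 0 and 1), each value is replaced with the value of column 'B' from the same row. Column 'A' therefore changes, while column 'B' is replaced with its own values and looks unchanged.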
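If the intent is to change only column 'A', one possible sketch is to call `where()` on that column alone and write the result back with `assign()`. This is an alternative formulation, not the approach used in the cell above.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})

# Keep A where it is greater than 2, otherwise take the value from B
new_a = df['A'].where(df['A'] > 2, df['B'])

# assign() returns a new DataFrame and leaves df unchanged
result = df.assign(A=new_a)

print(result)
```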


### <a id='toc4_3_'></a>[Combining where() with Other Indexing Methods](#toc0_)


The `where()` method can be combined with other indexing methods, such as `loc` and `iloc`, to perform conditional selection and replacement on specific rows or columns.


Here's an example of combining `where()` with `loc`:


In [23]:
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})
df

Unnamed: 0,A,B
0,1,10
1,2,20
2,3,30
3,4,40
4,5,50


In [24]:
df.loc[df['A'] > 2, 'B'].where(lambda x: x < 40)

2    30.0
3     NaN
4     NaN
Name: B, dtype: float64

In this example, the `loc` accessor first selects column 'B' for the rows where the value in column 'A' is greater than 2. The `where()` method is then applied to that Series with a callable condition, keeping only the values that are less than 40 and replacing the others with `NaN`.


### <a id='toc4_4_'></a>[Practical Examples and Use Cases](#toc0_)


The `where()` method is widely used in various data manipulation and analysis tasks. Here are a few practical examples and use cases:

1. Handling missing data:
   - Replacing missing values with a specific value or the mean/median of the column
   - Filling missing values based on conditions or patterns

2. Conditional data transformation:
   - Applying different transformations to subsets of data based on conditions
   - Scaling or normalizing values based on specific criteria

3. Data cleaning and preprocessing:
   - Removing or replacing invalid or outlier values based on statistical thresholds
   - Correcting data inconsistencies or errors based on predefined rules

4. Conditional calculations and aggregations:
   - Calculating metrics or statistics only for a subset of data that meets certain conditions
   - Applying different aggregation functions based on categories or groups
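Two of the use cases above can be sketched briefly: filling missing values with the mean, and capping values above a threshold. The sample data and the cap of 100 are invented for illustration.

```python
import pandas as pd

# Handling missing data: keep existing values, fill gaps with the mean
s = pd.Series([1.0, float('nan'), 3.0])
filled = s.where(s.notna(), s.mean())

# Handling outliers: keep values up to the cap, replace larger ones
v = pd.Series([5, 12, 7, 300])
cap = 100  # hypothetical threshold
capped = v.where(v <= cap, cap)

print(filled)
print(capped)
```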


The `where()` method provides a flexible and efficient way to perform conditional selection and replacement operations on your data. It allows you to keep or modify values based on boolean conditions, making it a powerful tool for data manipulation and analysis.


It's important to note that the `where()` method returns a new Series or DataFrame by default, preserving the original data. If you want to modify the data in-place, you can set the `inplace` parameter to True.


When using the `where()` method with callable conditions or `other` parameters, be mindful of the shapes and alignments of the Series or DataFrame to avoid potential errors or unexpected results.


Indexing with `where()` is a valuable technique in the Pandas library for conditional data selection and manipulation. By leveraging the power of boolean masks and conditional replacement, you can efficiently transform and analyze your data based on specific criteria.

## <a id='toc5_'></a>[Hands-on Examples and Exercises](#toc0_)

In this section, we will explore practical examples and exercises that demonstrate the usage of the selected indexing methods: boolean indexing, indexing with `isin()`, indexing with `query()`, and indexing with `where()`. These examples and exercises will help reinforce your understanding and application of these concepts in real-world scenarios.


### <a id='toc5_1_'></a>[Example 1: Boolean Indexing](#toc0_)


Suppose you have a DataFrame `df` with columns 'A', 'B', and 'C'. You want to select rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10.


In [25]:
df = pd.DataFrame({'A': [1, 8, 3, 10, 5], 'B': [5, 12, 7, 8, 9], 'C': ['a', 'b', 'c', 'd', 'e']})
df


Unnamed: 0,A,B,C
0,1,5,a
1,8,12,b
2,3,7,c
3,10,8,d
4,5,9,e


In [26]:
df[(df['A'] > 5) & (df['B'] < 10)]


Unnamed: 0,A,B,C
3,10,8,d


In this example, boolean indexing is used to select rows that satisfy the given conditions. The result contains only the rows where column 'A' is greater than 5 and column 'B' is less than 10.


### <a id='toc5_2_'></a>[Example 2: Indexing with `isin()`](#toc0_)


Suppose you have a DataFrame `df` with a column 'Category'. You want to select rows where the value in the 'Category' column is either 'A' or 'C'.


In [27]:
df = pd.DataFrame({'Category': ['A', 'B', 'C', 'A', 'D', 'C'], 'Value': [10, 20, 30, 40, 50, 60]})
df[df['Category'].isin(['A', 'C'])]

Unnamed: 0,Category,Value
0,A,10
2,C,30
3,A,40
5,C,60


In this example, the `isin()` method is used to select rows where the value in the 'Category' column is either 'A' or 'C'. The result contains only the rows that match the specified categories.


### <a id='toc5_3_'></a>[Example 3: Indexing with `query()`](#toc0_)


Suppose you have a DataFrame `df` with columns 'Price' and 'Quantity'. You want to select rows where the price is greater than 50 and the quantity is less than 100.


In [28]:
df = pd.DataFrame({'Price': [40, 60, 80, 100], 'Quantity': [50, 80, 120, 90]})
df.query('Price > 50 & Quantity < 100')

Unnamed: 0,Price,Quantity
1,60,80
3,100,90


In this example, the `query()` method is used to select rows based on the given conditions. The result contains only the rows that satisfy the price and quantity criteria.


### <a id='toc5_4_'></a>[Example 4: Indexing with `where()`](#toc0_)


Suppose you have a DataFrame `df` with columns 'Score' and 'Grade'. You want to replace the grade with 'A' for scores greater than 90, 'B' for scores between 80 and 90, and 'C' for scores below 80.


In [29]:
df = pd.DataFrame({'Score': [85, 92, 78, 95], 'Grade': ['', '', '', '']})
df

Unnamed: 0,Score,Grade
0,85,
1,92,
2,78,
3,95,


In [30]:
# Keep the current grade where Score <= 90; scores above 90 become 'A'
df['Grade'] = df['Grade'].where(df['Score'] <= 90, 'A')
df

Unnamed: 0,Score,Grade
0,85,
1,92,A
2,78,
3,95,A


In [31]:
df['Grade'] = df['Grade'].where((df['Score'] < 80) | (df['Score'] > 90), 'B')
df

Unnamed: 0,Score,Grade
0,85,B
1,92,A
2,78,
3,95,A


In [32]:
# Keep the current grade where Score >= 80; scores below 80 become 'C'
df['Grade'] = df['Grade'].where(df['Score'] >= 80, 'C')
df

Unnamed: 0,Score,Grade
0,85,B
1,92,A
2,78,C
3,95,A


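In this example, the `where()` method is used to conditionally assign grades based on the score values. Because `where()` keeps values where the condition is True and replaces them where it is False, each condition is written as the opposite of the rule being applied. After the three steps, the 'Grade' column of `df` holds the appropriate grade for each score.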
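An alternative sketch of the same grading rules starts from a default grade of 'C' and promotes it in two steps. It assumes 'B' covers scores from 80 to 90 inclusive; the `grade` variable is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Score': [85, 92, 78, 95]})

# Start with the lowest grade for every row
grade = pd.Series('C', index=df.index)

# Keep 'C' only below 80; everything else becomes 'B'
grade = grade.where(df['Score'] < 80, 'B')

# Keep the current grade up to 90; scores above 90 become 'A'
grade = grade.where(df['Score'] <= 90, 'A')

df['Grade'] = grade
print(df)
```

Ordering the steps from the lowest grade to the highest means each call only needs a single threshold.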

## <a id='toc6_'></a>[Conclusion](#toc0_)

In this lecture, we explored several powerful indexing methods in Pandas: boolean indexing, indexing with `isin()`, indexing with `query()`, and indexing with `where()`. These methods provide efficient and flexible ways to select, filter, and manipulate data in Pandas Series and DataFrames.


Recap of the key points covered in this lecture:

1. **Boolean Indexing**: Boolean indexing allows you to select rows and columns based on boolean conditions. By creating boolean masks or using boolean operators, you can filter data that satisfies specific criteria. Boolean indexing is a fundamental technique for data selection and subsetting.

2. **Indexing with `isin()`**: The `isin()` method enables you to select rows based on a list of values. It checks whether each element in a Series or DataFrame is present in a specified set of values and returns a boolean mask. Indexing with `isin()` is particularly useful when you need to filter data based on a predefined set of values or categories.

3. **Indexing with `query()`**: The `query()` method provides a concise and expressive way to select data based on complex conditions. By using a string expression that represents a boolean condition, you can filter rows that satisfy the given criteria. Indexing with `query()` allows you to write readable and intuitive code for data selection, especially when dealing with multiple conditions.

4. **Indexing with `where()`**: The `where()` method is used to select or replace values based on boolean conditions. It takes a boolean mask or a callable as the condition and returns a new Series or DataFrame with the values that satisfy the condition unchanged, while the values that do not satisfy the condition are replaced with a specified value or `NaN`. Indexing with `where()` is useful for conditional data selection and replacement.


Mastering these indexing techniques is crucial for efficient data manipulation and analysis in Pandas. By leveraging the power of boolean indexing, `isin()`, `query()`, and `where()`, you can quickly and effectively select, filter, and transform your data based on specific conditions and criteria.


Efficient indexing allows you to extract relevant subsets of data, handle missing values, apply conditional transformations, and perform complex data operations. It enables you to focus on the data that matters most for your analysis and make data-driven decisions.


Moreover, being proficient in these indexing methods can significantly improve your productivity and code readability. You can write concise and expressive code that clearly communicates your data selection and manipulation logic, making your code more maintainable and easier to understand for yourself and others.


Remember, the key to mastering Pandas indexing is practice and application. Apply these indexing techniques to real-world datasets, experiment with different scenarios, and challenge yourself to solve complex data selection and manipulation problems. With practice and experience, you will develop a strong intuition for choosing the appropriate indexing method for each situation and become a confident and efficient Pandas user.