To apply a condition to a column in a pandas DataFrame, you can use Boolean indexing or the `.loc` indexer to filter or modify rows. Here are some common ways to use conditions on a DataFrame column:

```python
import pandas as pd
```

```python
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'score': [85, 90, 75, 80]
}
data
```

Output:

```
{'name': ['Alice', 'Bob', 'Charlie', 'David'],
 'age': [25, 30, 35, 40],
 'score': [85, 90, 75, 80]}
```

```python
df = pd.DataFrame(data)
df
```

Output:

```
      name  age  score
0    Alice   25     85
1      Bob   30     90
2  Charlie   35     75
3    David   40     80
```


### 1. Filtering Rows Based on a Condition
You can filter the rows of a DataFrame by applying a condition to a column.

```python
# Get rows where 'age' is greater than 30
filtered_df = df[df['age'] > 30]
filtered_df
```

Output:

```
      name  age  score
2  Charlie   35     75
3    David   40     80
```
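Conditions can also be combined. The sketch below rebuilds the same example data. It uses `&` (and), `|` (or) and `~` (not) on Boolean Series.

- Wrap each comparison in parentheses, because `&` and `|` bind more tightly than `>` or `>=` in Python.
- Python's `and`/`or` keywords do not work here. They try to reduce a whole Series to a single True/False value, which raises a `ValueError`.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'score': [85, 90, 75, 80],
})

# Rows where age is over 30 AND score is at least 80
both = df[(df['age'] > 30) & (df['score'] >= 80)]

# Rows where age is under 30 OR score is at least 90
either = df[(df['age'] < 30) | (df['score'] >= 90)]

# Rows where age is NOT greater than 30
not_older = df[~(df['age'] > 30)]

print(both['name'].tolist())       # ['David']
print(either['name'].tolist())     # ['Alice', 'Bob']
print(not_older['name'].tolist())  # ['Alice', 'Bob']
```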


### 2. Modifying Values Based on a Condition
To modify the values in a column based on a condition, use the `.loc` indexer. Pass the condition as the row selector and the column name as the column selector.

```python
# Set 'score' to 100 for all rows where 'age' is greater than 30
df.loc[df['age'] > 30, 'score'] = 100
df
```

Output:

```
      name  age  score
0    Alice   25     85
1      Bob   30     90
2  Charlie   35    100
3    David   40    100
```
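Assigning through `.loc` changes `df` in place. If you would rather get a modified copy and leave the original untouched, pandas offers two related methods:

- `Series.mask(cond, other)` replaces values where `cond` is True.
- `Series.where(cond, other)` does the opposite: it keeps values where `cond` is True and replaces the rest.

A minimal sketch on the original, unmodified data:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'score': [85, 90, 75, 80],
})

# mask(): replace score with 100 where age > 30, keep it elsewhere
capped = df['score'].mask(df['age'] > 30, 100)

# where(): keep score where age > 30, replace it with 0 elsewhere
kept = df['score'].where(df['age'] > 30, 0)

print(capped.tolist())       # [85, 90, 100, 100]
print(kept.tolist())         # [0, 0, 75, 80]
print(df['score'].tolist())  # [85, 90, 75, 80]  (original unchanged)
```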


### 3. Adding a New Column Based on a Condition
A comparison such as `df['score'] >= 80` returns a Boolean Series, which you can assign directly as a new column.

```python
# Create a new column 'passed': True where 'score' is greater than or equal to 80
df['passed'] = df['score'] >= 80
df
```

Output:

```
      name  age  score  passed
0    Alice   25     85    True
1      Bob   30     90    True
2  Charlie   35    100    True
3    David   40    100    True
```
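Because `passed` holds True/False values, it can be reused directly. It works as a filter, and it can be summed or averaged, since True counts as 1 and False as 0. The small sketch below assumes the original scores (before the `.loc` update), so Charlie's 75 fails.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'score': [85, 90, 75, 80],
})
df['passed'] = df['score'] >= 80

failed = df[~df['passed']]          # Boolean column used directly as a filter
n_passed = int(df['passed'].sum())  # True counts as 1, False as 0
pass_rate = df['passed'].mean()     # fraction of True values

print(failed['name'].tolist())  # ['Charlie']
print(n_passed, pass_rate)      # 3 0.75
```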


### 4. Using `apply()` with a Lambda Function for Complex Conditions
For conditions that are awkward to express as a single comparison, you can pass a function (often a lambda) to `Series.apply()`. The function is called once for each value in the column.

```python
# Create a new column 'category' based on the 'age' column
df['category'] = df['age'].apply(lambda x: 'Young' if x < 30 else 'Old')
df
```

Output:

```
      name  age  score  passed  category
0    Alice   25     85    True     Young
1      Bob   30     90    True       Old
2  Charlie   35    100    True       Old
3    David   40    100    True       Old
```
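Because `apply()` calls the lambda once per value, it is fine for small frames but can be slow on large ones. For this kind of labeling, NumPy can evaluate the conditions on whole columns at once:

- `numpy.where` handles two outcomes.
- `numpy.select` handles several outcomes.

In the sketch below, the intermediate 'Middle' band and its thresholds are illustrative only; they are not part of the example above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
})

# Two outcomes: same labels as the apply() lambda in step 4
df['category'] = np.where(df['age'] < 30, 'Young', 'Old')

# Several outcomes: the first matching condition wins; default covers the rest
conditions = [df['age'] < 30, df['age'] < 40]
choices = ['Young', 'Middle']  # hypothetical extra band, for illustration
df['band'] = np.select(conditions, choices, default='Old')

print(df[['name', 'category', 'band']])
```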


### Summary
- Use `df[condition]` to filter rows.
- Use `df.loc[condition, 'column'] = value` to modify values.
- Use `df['new_column'] = condition` or `df['new_column'] = df['column'].apply(func)` to create new columns based on conditions.

The expression `df[df['age'] > 30]` is a form of Boolean indexing used to filter the rows of a DataFrame based on a condition. Let me break down how it works step by step:

### Step-by-Step Explanation
#### Condition Evaluation: `df['age'] > 30`

`df['age']` selects the "age" column from the DataFrame, which results in a Series (a one-dimensional array-like object).
The comparison `> 30` is applied element-wise to this Series. This produces a new Boolean Series with `True` or `False` for each element, depending on whether the condition is met.
For the example DataFrame:

```python
df['age'] > 30
```

Output:

```
0    False
1    False
2     True
3     True
Name: age, dtype: bool
```

This output shows that the rows with index labels 2 and 3 (the third and fourth rows) satisfy the condition `age > 30`.

#### Filtering with Boolean Indexing: `df[df['age'] > 30]`

This is where Boolean indexing comes into play.
We pass the Boolean Series (`df['age'] > 30`) to the original DataFrame (`df`) to select the rows where the condition is `True`.
In other words, `df[condition]` returns only the rows for which the condition is `True`.
The filtered DataFrame contains only the rows where `age > 30`:

```
      name  age  score
2  Charlie   35     75
3    David   40     80
```

### Why Use `df[]` Twice?
The expression contains two separate indexing operations, one nested inside the other:

- Inside: `df['age']` selects the "age" column, and `> 30` turns it into a Boolean mask that marks which rows meet the criterion.
- Outside: `df[...]` uses that Boolean mask to select the rows of the DataFrame where the mask is `True`.

This is a powerful feature of pandas, allowing you to easily filter rows based on conditions applied to column values.
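The inside/outside split becomes explicit if you store the mask in a variable first. A minimal sketch using the same data; the name `mask` is just an illustrative variable name:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'score': [85, 90, 75, 80],
})

# Inside: build the Boolean mask (one True/False per row, same index as df)
mask = df['age'] > 30
print(mask.tolist())  # [False, False, True, True]

# Outside: keep only the rows where the mask is True
filtered = df[mask]
print(filtered.index.tolist())  # [2, 3]

# The two-step version is identical to the one-liner
assert filtered.equals(df[df['age'] > 30])
```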

The line `df.loc[df['age'] > 30, 'score'] = 100` conditionally modifies values in a pandas DataFrame. Let’s break down the syntax to understand how it works:

### Breakdown of Syntax
#### `df.loc[]`

`.loc[]` is a label-based indexer in pandas.
It lets you access a group of rows and columns by labels or by a Boolean array.
The general form is `df.loc[row_condition, column_selection]`.

#### `df['age'] > 30`

`df['age']` selects the "age" column of the DataFrame, returning a pandas Series of all values in that column.
The comparison `> 30` is applied element-wise to the "age" column, resulting in a Boolean Series of `True` and `False` values.
This Boolean Series is used to pick out the rows where the condition is met (i.e., where "age" is greater than 30).

Output of `df['age'] > 30`:

```
0    False
1    False
2     True
3     True
Name: age, dtype: bool
```

This Boolean Series will be used to identify which rows should be selected.

#### `df.loc[df['age'] > 30, 'score']`

`df.loc[df['age'] > 30, 'score']` selects the "score" column for the rows where "age" is greater than 30.
The Boolean condition picks the rows (those where it is `True`), and the column label `'score'` picks the column. Only the cells where they intersect are targeted for modification.
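Before assigning, it can help to look at what the selection on the left-hand side of `=` actually picks out. A short sketch on the original data:

- A single column label gives a Series.
- A list of labels gives a DataFrame.
- In both cases only the rows where the mask is True are kept, with their original index labels.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'score': [85, 90, 75, 80],
})
mask = df['age'] > 30

scores = df.loc[mask, 'score']            # one label -> Series
subset = df.loc[mask, ['name', 'score']]  # list of labels -> DataFrame

print(type(scores).__name__, scores.tolist())        # Series [75, 80]
print(type(subset).__name__, subset.index.tolist())  # DataFrame [2, 3]
```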

#### `= 100`

This assigns the value 100 to the selected rows in the "score" column.
Specifically, for all rows where the "age" column is greater than 30, the corresponding values in the "score" column are set to 100.
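This is also why the row selection and the column selection go into a single `.loc[...]` call. Splitting them, as in `df[df['age'] > 30]['score'] = 100` (chained indexing), does not change `df`. The assignment lands on a temporary copy produced by the Boolean selection. pandas typically warns about this: a `SettingWithCopyWarning` in older versions, or a chained-assignment warning under Copy-on-Write.

The sketch below compares the two forms. `make_df` is a hypothetical helper that just rebuilds the example data, and warnings are silenced only so the demo prints cleanly.

```python
import warnings

import pandas as pd


def make_df():
    # Hypothetical helper: rebuild the original example data
    return pd.DataFrame({
        'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'age': [25, 30, 35, 40],
        'score': [85, 90, 75, 80],
    })


# Chained indexing: chained[mask] builds a new DataFrame, and the
# assignment is applied to that temporary object, not to `chained`
chained = make_df()
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # hide the pandas warning for this demo
    chained[chained['age'] > 30]['score'] = 100
print(chained['score'].tolist())  # [85, 90, 75, 80]  (unchanged)

# Single .loc call: row mask and column label together modify the frame itself
with_loc = make_df()
with_loc.loc[with_loc['age'] > 30, 'score'] = 100
print(with_loc['score'].tolist())  # [85, 90, 100, 100]
```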

### Summary
- `df.loc[row_condition, column_selection]`: selects rows and columns using labels or Boolean conditions.
- `df['age'] > 30`: produces a Boolean Series that selects the rows where "age" is greater than 30.
- `= 100`: assigns the value 100 to the selected cells in the "score" column.

This line effectively updates the "score" values for all rows that meet the condition on the "age" column, providing a way to modify the DataFrame based on specified criteria.