
# Filtering, Sorting & Feature Creation in Pandas
Filtering rows, sorting them, and deriving new columns are core steps in most pandas workflows. This section covers:

- Boolean indexing & filtering rows

- Sorting data by values or index

- Creating new features/columns from existing data


## 1. Boolean Indexing & Advanced Filtering

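Boolean indexing filters rows with a Boolean Series. A comparison such as `df['Age'] > 30` returns one True/False value per row, and `df[...]` keeps only the rows marked True.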

### Basic Boolean Filtering

In [1]:
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'David'],
    'Age': [25, 32, 40, 29],
    'Score': [88.5, 92.0, 79.5, 85.0]
})

# Filter rows where Age > 30
df[df['Age'] > 30]

|   | Name  | Age | Score |
| - | ----- | --- | ----- |
| 1 | Bob   | 32  | 92.0  |
| 2 | Carol | 40  | 79.5  |
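
To see why the filter above works, it can help to look at the intermediate Boolean mask on its own. The sketch below rebuilds the same `df` so it runs by itself; the variable names `mask`, `over_30`, and `names_over_30` are illustrative and not part of the original notebook.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'David'],
    'Age': [25, 32, 40, 29],
    'Score': [88.5, 92.0, 79.5, 85.0]
})

# The comparison yields one True/False value per row
mask = df['Age'] > 30
print(mask.tolist())  # [False, True, True, False]

# Indexing with the mask keeps only the rows marked True
over_30 = df[mask]

# .loc accepts the same mask and can select a column at the same time
names_over_30 = df.loc[mask, 'Name']
print(names_over_30.tolist())  # ['Bob', 'Carol']
```

Storing the mask in a variable is optional, but it makes longer filters easier to read and reuse.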


### Combining Conditions with & and |


In [2]:
# People over 30 AND score above 80
df[(df['Age'] > 30) & (df['Score'] > 80)]

|   | Name | Age | Score |
| - | ---- | --- | ----- |
| 1 | Bob  | 32  | 92.0  |


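> Wrap each condition in parentheses when combining them. `&` and `|` bind more tightly than comparison operators such as `>`, so `df['Age'] > 30 & df['Score'] > 80` is not parsed as two separate comparisons. Use `&` and `|` rather than Python's `and` and `or`, which do not work element-wise on a Series.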
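
The heading mentions `|`, so here is a minimal sketch of an OR filter, plus `~` for negation. It rebuilds the same `df` to stay self-contained; the thresholds and the names `young_or_top` and `not_over_30` are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'David'],
    'Age': [25, 32, 40, 29],
    'Score': [88.5, 92.0, 79.5, 85.0]
})

# OR: people under 30, or anyone with a score above 90
young_or_top = df[(df['Age'] < 30) | (df['Score'] > 90)]
print(young_or_top['Name'].tolist())  # ['Alice', 'Bob', 'David']

# NOT: ~ inverts a Boolean mask (here: everyone who is not over 30)
not_over_30 = df[~(df['Age'] > 30)]
print(not_over_30['Name'].tolist())  # ['Alice', 'David']
```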

### Using .isin() to filter on multiple values

In [3]:
df[df['Name'].isin(['Alice', 'David'])]

|   | Name  | Age | Score |
| - | ----- | --- | ----- |
| 0 | Alice | 25  | 88.5  |
| 3 | David | 29  | 85.0  |
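
Because `.isin()` returns an ordinary Boolean mask, it can be negated or combined with other conditions. The following is a small sketch on the same data; the names `others` and `selected_high` and the score threshold of 86 are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'David'],
    'Age': [25, 32, 40, 29],
    'Score': [88.5, 92.0, 79.5, 85.0]
})

# Exclude the listed names by negating the mask with ~
others = df[~df['Name'].isin(['Alice', 'David'])]
print(others['Name'].tolist())  # ['Bob', 'Carol']

# Combine .isin() with a numeric condition
selected_high = df[df['Name'].isin(['Alice', 'David']) & (df['Score'] > 86)]
print(selected_high['Name'].tolist())  # ['Alice']
```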


## 2. Sorting Rows
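You can sort a DataFrame by column values with `sort_values()` or by its index with `sort_index()`. By default both return a new, sorted DataFrame and leave `df` unchanged, so assign the result if you want to keep it.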

### Sort by column value (ascending or descending)

In [4]:
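# Sort by Score ascending (the default order).
# A cell displays only its last expression, so this result is not shown.
df.sort_values(by='Score', ascending=True)

# Sort by Age descending (this is the result displayed below)
df.sort_values(by='Age', ascending=False)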

|   | Name  | Age | Score |
| - | ----- | --- | ----- |
| 2 | Carol | 40  | 79.5  |
| 1 | Bob   | 32  | 92.0  |
| 3 | David | 29  | 85.0  |
| 0 | Alice | 25  | 88.5  |
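
The sketch below illustrates two details of `sort_values()`: it returns a new DataFrame rather than modifying `df`, and it accepts a list of columns for tie-breaking. The `Team` column is hypothetical and added here only to create groups; it is not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'David'],
    'Age': [25, 32, 40, 29],
    'Score': [88.5, 92.0, 79.5, 85.0]
})

# sort_values returns a sorted copy; df keeps its original order
sorted_df = df.sort_values(by='Age', ascending=False)
print(df['Name'].tolist())         # ['Alice', 'Bob', 'Carol', 'David']
print(sorted_df['Name'].tolist())  # ['Carol', 'Bob', 'David', 'Alice']

# Replace the old index labels with a fresh 0..n-1 index after sorting
ranked = sorted_df.reset_index(drop=True)
print(ranked.index.tolist())  # [0, 1, 2, 3]

# Hypothetical 'Team' column: sort by Team ascending, then Score descending
with_team = df.assign(Team=['A', 'B', 'A', 'B'])
by_team = with_team.sort_values(by=['Team', 'Score'], ascending=[True, False])
print(by_team['Name'].tolist())  # ['Alice', 'Carol', 'Bob', 'David']
```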


### Sort by index

In [5]:
df.sort_index()

|   | Name  | Age | Score |
| - | ----- | --- | ----- |
| 0 | Alice | 25  | 88.5  |
| 1 | Bob   | 32  | 92.0  |
| 2 | Carol | 40  | 79.5  |
| 3 | David | 29  | 85.0  |
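
Since `df` is already in index order, the call above returns it unchanged. The effect of `sort_index()` is easier to see on a frame whose rows have been reordered, as in this small sketch (the names `by_age`, `restored`, and `reversed_df` are illustrative).

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'David'],
    'Age': [25, 32, 40, 29],
    'Score': [88.5, 92.0, 79.5, 85.0]
})

# Sorting by a column shuffles the index labels along with the rows
by_age = df.sort_values(by='Age', ascending=False)
print(by_age.index.tolist())  # [2, 1, 3, 0]

# sort_index puts the rows back in index order
restored = by_age.sort_index()
print(restored.index.tolist())  # [0, 1, 2, 3]

# ascending=False sorts the index from largest to smallest label
reversed_df = df.sort_index(ascending=False)
print(reversed_df['Name'].tolist())  # ['David', 'Carol', 'Bob', 'Alice']
```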


## 3. Feature Creation: Add New Columns

### Deriving a new column based on other columns

In [7]:
# Add a new column 'Status' based on Score
df['Status'] = df['Score'].apply(lambda x: 'Pass' if x >= 85 else 'Fail')
df['Status']

|   | Status |
| - | ------ |
| 0 | Pass   |
| 1 | Pass   |
| 2 | Fail   |
| 3 | Pass   |
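
`.apply()` with a lambda calls a Python function once per row. For a simple two-way condition like this one, `numpy.where` is a common vectorized alternative. The sketch below swaps in that technique and is not the notebook's original method; it uses the same threshold of 85.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'David'],
    'Age': [25, 32, 40, 29],
    'Score': [88.5, 92.0, 79.5, 85.0]
})

# np.where(condition, value_if_true, value_if_false) works on the whole column at once
df['Status'] = np.where(df['Score'] >= 85, 'Pass', 'Fail')
print(df['Status'].tolist())  # ['Pass', 'Pass', 'Fail', 'Pass']
```

On a four-row frame the difference is negligible; the vectorized form mainly pays off on large DataFrames.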


### Create numeric transformations

In [8]:
# Normalize the Score column (min-max scaling)
df['Score_Normalized'] = (df['Score'] - df['Score'].min()) / (df['Score'].max() - df['Score'].min())
df['Score_Normalized']

|   | Score_Normalized |
| - | ---------------- |
| 0 | 0.72             |
| 1 | 1.0              |
| 2 | 0.0              |
| 3 | 0.44             |
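
If the same scaling is needed for several columns, the formula can be wrapped in a small helper. `min_max_scale` below is a hypothetical function written for this example, not a pandas API. Returning zeros for a constant column is an assumption made here to avoid dividing by zero; other conventions are possible.

```python
import pandas as pd

def min_max_scale(s: pd.Series) -> pd.Series:
    """Rescale a numeric Series to the range [0, 1]."""
    span = s.max() - s.min()
    if span == 0:
        # Assumption: a constant column maps to all zeros instead of NaN
        return pd.Series(0.0, index=s.index)
    return (s - s.min()) / span

scores = pd.Series([88.5, 92.0, 79.5, 85.0])
scaled = min_max_scale(scores)
print(scaled.round(2).tolist())  # [0.72, 1.0, 0.0, 0.44]

# A constant column has no spread, so every value becomes 0.0
constant = min_max_scale(pd.Series([5, 5, 5]))
print(constant.tolist())  # [0.0, 0.0, 0.0]
```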


### Creating new columns using vectorized operations

In [9]:
# Add 5 bonus points to everyone’s score
df['Score_Bonus'] = df['Score'] + 5
df['Score_Bonus']

|   | Score_Bonus |
| - | ----------- |
| 0 | 93.5        |
| 1 | 97.0        |
| 2 | 84.5        |
| 3 | 90.0        |
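
Vectorized operations can also be chained. As a sketch, suppose the bonus were 10 points and scores were capped at 100. Both numbers are assumptions for illustration, as is the column name `Score_Capped`; `Series.clip` enforces the cap without a loop.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'David'],
    'Age': [25, 32, 40, 29],
    'Score': [88.5, 92.0, 79.5, 85.0]
})

# Add a hypothetical 10-point bonus, then cap the result at 100
df['Score_Capped'] = (df['Score'] + 10).clip(upper=100)
print(df['Score_Capped'].tolist())  # [98.5, 100.0, 89.5, 95.0]
```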


## Summary Table

| Task                       | Method                           |
| -------------------------- | -------------------------------- |
| Filter rows by condition   | `df[df['col'] > value]`          |
| Multiple filters           | `df[(cond1) & (cond2)]`          |
| Filter by values           | `df[df['col'].isin([...])]`      |
| Sort by values             | `df.sort_values(by='col')`       |
| Sort by index              | `df.sort_index()`                |
| New column from condition  | `df['col'].apply(lambda x: ...)` |
| Vectorized math operations | `df['new'] = df['col'] + 5`      |

