# Filter

The most common way to filter a DataFrame is to pass an expression as an "index" that pandas uses to decide which records to keep and which to discard. You write the expression by combining a column of your DataFrame with an "operator" like `==`, `>` or `<` and a value to compare each row against. The expression evaluates to `True` or `False` for every row, and pandas keeps only the rows marked `True`.
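
To make the idea concrete, here is a minimal sketch using a tiny made-up DataFrame. The `toy` name and its values are invented for illustration and are not taken from the accident data. The comparison produces one `True` or `False` per row, and placing that result in square brackets keeps only the `True` rows.

```python
import pandas as pd

# A tiny made-up DataFrame, for illustration only
toy = pd.DataFrame({
    "state": ["IA", "CA", "IA", "TX"],
    "total_fatalities": [1, 0, 3, 2],
})

# The comparison returns a Series holding True or False for each row
mask = toy["state"] == "IA"
print(mask.tolist())

# Using that Series as an "index" keeps only the rows marked True
iowa = toy[mask]
print(iowa)
```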

```{note}
If you are familiar with writing [SQL](https://en.wikipedia.org/wiki/SQL) to manipulate databases, pandas' filtering system is somewhat similar to a WHERE clause. The [official pandas documentation](https://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/comparison_with_sql.html#where) offers direct translations between the two.
```
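
For readers coming from SQL, the comparison can be sketched side by side. This example is only an illustration: it copies a few made-up rows into a temporary in-memory SQLite table (the `accidents` table name and the `toy` DataFrame are assumptions, not part of the lesson's data) and runs the same filter both ways.

```python
import sqlite3

import pandas as pd

# Made-up rows for illustration; not the real accident data
toy = pd.DataFrame({
    "state": ["IA", "CA", "IA"],
    "total_fatalities": [1, 0, 3],
})

# Copy the DataFrame into a temporary in-memory SQLite table
conn = sqlite3.connect(":memory:")
toy.to_sql("accidents", conn, index=False)

# The SQL version: a WHERE clause
sql_result = pd.read_sql_query(
    "SELECT * FROM accidents WHERE state = 'IA'", conn
)

# The pandas version: a filter expression in square brackets
pandas_result = toy[toy["state"] == "IA"].reset_index(drop=True)

# Print both results to compare them by eye
print(sql_result)
print(pandas_result)
conn.close()
```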

## Setup data

Let's start by loading our accident data:

In [None]:
import pandas as pd
accident_list = pd.read_csv("https://raw.githubusercontent.com/palewire/first-python-notebook/main/docs/src/_static/ntsb-accidents.csv")
accident_list["latimes_make_and_model"] = accident_list["latimes_make_and_model"].str.upper()

print(f"Loaded {len(accident_list)} total accidents")
accident_list.head()

## Basic filtering

Let's try filtering against the `state` field. Save a state's postal code into a variable. This will allow us to reuse it later:

In [None]:
# Set a state to filter by
my_state = "IA"
print(f"Filtering data for state: {my_state}")

Now we will ask pandas to narrow down our list of accidents to just those in our state of interest. We will create a filter expression and place it between two square brackets following the DataFrame we wish to filter:

In [None]:
# Filter the full list and preview the first rows of the result
accident_list[accident_list["state"] == my_state].head()

Now let's save the results of that filter into a new variable separate from the full list we imported from the CSV file. Since it includes only accidents in our chosen state, let's call it `my_accidents`:

In [None]:
my_accidents = accident_list[accident_list["state"] == my_state]
print(f"Found {len(my_accidents)} accidents in {my_state}")

To check our work and find out how many records are left after the filter, let's run the DataFrame inspection commands we learned earlier:

In [None]:
print("First few accidents in the filtered data:")
my_accidents.head()

In [None]:
print("Shape of filtered data:")
my_accidents.shape
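
One more way to check a filter, sketched here with made-up rows rather than the accident data, is to list the distinct values left in the filtered column. If the filter worked, only the chosen state should remain. The `toy` and `my_rows` names are hypothetical.

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({"state": ["IA", "CA", "IA", "TX"]})
my_state = "IA"
my_rows = toy[toy["state"] == my_state]

# shape is a (rows, columns) tuple, so the first item is the row count
print(my_rows.shape[0])

# Only the chosen state should appear among the remaining values
print(my_rows["state"].unique())
```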

## Different types of filters

You can filter using different operators. Here are some common ones:

In [None]:
# Filter for accidents with more than 2 fatalities
fatal_accidents = accident_list[accident_list["total_fatalities"] > 2]
print(f"Accidents with more than 2 fatalities: {len(fatal_accidents)}")

In [None]:
# Filter for accidents with exactly 0 fatalities
non_fatal = accident_list[accident_list["total_fatalities"] == 0]
print(f"Non-fatal accidents: {len(non_fatal)}")

In [None]:
# Filter for Robinson helicopters
robinson = accident_list[accident_list["latimes_make"] == "ROBINSON"]
print(f"Robinson helicopter accidents: {len(robinson)}")
robinson.head()
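
The other comparison operators follow the same pattern. Here is a brief sketch with invented values. The `toy` DataFrame is hypothetical and only borrows the column names used above.

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "latimes_make": ["ROBINSON", "BELL", "ROBINSON", "AIRBUS"],
    "total_fatalities": [0, 2, 3, 1],
})

# != keeps rows that do NOT match the value
not_robinson = toy[toy["latimes_make"] != "ROBINSON"]

# >= and <= include the boundary value itself
two_or_more = toy[toy["total_fatalities"] >= 2]
one_or_fewer = toy[toy["total_fatalities"] <= 1]

print(len(not_robinson), len(two_or_more), len(one_or_fewer))
```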

## Combining filters

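You can combine multiple filter conditions using the `&` (and) and `|` (or) operators. Each condition must be wrapped in parentheses because Python evaluates `&` and `|` before comparisons like `==` and `>`. Without the parentheses, the expression would be grouped incorrectly and raise an error or return the wrong rows: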

In [None]:
# Robinson accidents with fatalities
robinson_fatal = accident_list[
    (accident_list["latimes_make"] == "ROBINSON") &
    (accident_list["total_fatalities"] > 0)
]
print(f"Fatal Robinson accidents: {len(robinson_fatal)}")

In [None]:
# Accidents in California or Texas
ca_or_tx = accident_list[
    (accident_list["state"] == "CA") |
    (accident_list["state"] == "TX")
]
print(f"Accidents in CA or TX: {len(ca_or_tx)}")
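
Two related tools are worth a short sketch, again using made-up rows: the `~` operator, which inverts a condition, and the `isin` method, which can stand in for a chain of `|` comparisons against the same column. The `toy` DataFrame below is hypothetical.

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "state": ["CA", "TX", "IA", "CA", "NY"],
    "total_fatalities": [0, 2, 1, 3, 0],
})

# isin matches any value in the list, like several == checks joined by |
ca_or_tx = toy[toy["state"].isin(["CA", "TX"])]

# ~ flips True and False, so this keeps every row outside CA and TX
elsewhere = toy[~toy["state"].isin(["CA", "TX"])]

# Conditions can still be combined with &, each one in its own parentheses
ca_fatal = toy[(toy["state"] == "CA") & (toy["total_fatalities"] > 0)]

print(len(ca_or_tx), len(elsewhere), len(ca_fatal))
```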

## Filtering with string methods

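You can also filter using string methods for more complex text matching. The `str.contains` method checks whether each value includes a piece of text. The match is case-sensitive by default, and passing `na=False` tells pandas to treat rows with missing values as non-matches: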

In [None]:
# Find accidents where location contains "Airport"
airport_accidents = accident_list[accident_list["location"].str.contains("Airport", na=False)]
print(f"Accidents at airports: {len(airport_accidents)}")
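
Here is a sketch of a few variations on text matching, using invented location strings (the values in the real data may look different). It assumes nothing beyond standard pandas string methods: `case=False` makes `str.contains` ignore capitalization, and `str.startswith` checks only the start of each value.

```python
import pandas as pd

# Made-up rows for illustration only, including one missing value
toy = pd.DataFrame({
    "location": ["Ames Municipal Airport", "near airport road", None, "Lake Tahoe"],
})

# Case-sensitive match; na=False counts the missing value as "no match"
exact_case = toy[toy["location"].str.contains("Airport", na=False)]

# case=False also matches the lowercase "airport"
any_case = toy[toy["location"].str.contains("airport", case=False, na=False)]

# str.startswith checks only the beginning of each value
starts_with_lake = toy[toy["location"].str.startswith("Lake", na=False)]

print(len(exact_case), len(any_case), len(starts_with_lake))
```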

Filtering is one of the tools you will reach for most often in pandas. These techniques let you narrow your analysis to the subset of records most relevant to your research questions.