# Sort

Another simple but common technique for analyzing data is sorting. Sorting ranks the rows of a DataFrame by the values in a particular column, which makes it easy to see which entries come first and last in the table.

Let's start by preparing our data:

In [None]:
import pandas as pd

# Load accident data
accident_list = pd.read_csv("https://raw.githubusercontent.com/palewire/first-python-notebook/main/docs/src/_static/ntsb-accidents.csv")
accident_list["latimes_make_and_model"] = accident_list["latimes_make_and_model"].str.upper()

# Create accident counts
accident_counts = accident_list.groupby("latimes_make_and_model").size().reset_index().rename(columns={0: "accidents"})

# Load survey data and merge
survey = pd.read_csv("https://raw.githubusercontent.com/palewire/first-python-notebook/main/docs/src/_static/faa-survey.csv")
survey["latimes_make_and_model"] = survey["latimes_make_and_model"].str.upper()
merged_list = pd.merge(accident_counts, survey, on="latimes_make_and_model")

# Calculate accident rates
merged_list["per_hour"] = merged_list.accidents / merged_list.total_hours
merged_list["per_100k_hours"] = (merged_list.accidents / merged_list.total_hours) * 100_000

print("Data prepared for sorting analysis")
merged_list.head()

## Basic sorting

The [`sort_values`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html) method is how pandas sorts DataFrames. Pass it the name of the column to sort by, as a quoted string. Try sorting by our computed field:

In [None]:
merged_list.sort_values("per_100k_hours")

Note that by default `sort_values` returns the DataFrame sorted in ascending order from lowest to highest. You can show the largest values first by passing in an optional keyword argument called `ascending`. When it is set to `False`, the DataFrame is sorted in descending order:

In [None]:
merged_list.sort_values("per_100k_hours", ascending=False)
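
One detail worth seeing on a small scale: `sort_values` returns a new, sorted DataFrame and leaves the original untouched, so you need to assign the result to a variable if you want to keep it. The sketch below uses a tiny made-up table (the `toy` DataFrame and its values are invented for illustration and are not part of the accident data):

```python
import pandas as pd

# Hypothetical toy data, invented for illustration (not from the NTSB or FAA files)
toy = pd.DataFrame(
    {
        "model": ["A", "B", "C"],
        "per_100k_hours": [2.5, 0.8, 1.6],
    }
)

# sort_values returns a new DataFrame sorted from highest to lowest
ranked = toy.sort_values("per_100k_hours", ascending=False)

# The original DataFrame keeps its row order
print(toy["model"].tolist())
print(ranked["model"].tolist())

# The sorted copy keeps its old index labels; reset_index(drop=True) renumbers from 0
ranked = ranked.reset_index(drop=True)
print(ranked)
```

Resetting the index is optional. It is mainly handy when you want the row labels to double as a ranking.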

## Sorting by multiple columns

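You can also sort by multiple columns by passing a list of column names. The first column is the primary sort, and each later column breaks ties in the ones before it. The `ascending` argument accepts a matching list of booleans, one per column: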

In [None]:
# Sort by accidents (descending) then by total hours (descending)
merged_list.sort_values(["accidents", "total_hours"], ascending=[False, False])
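
The sort directions do not have to match. As a minimal sketch, assuming a small invented table with ties in the `accidents` column (the `toy` DataFrame below is hypothetical, not the real merged data), you can rank by most accidents first and break ties by fewest flight hours:

```python
import pandas as pd

# Hypothetical toy data with ties in the "accidents" column
toy = pd.DataFrame(
    {
        "model": ["A", "B", "C", "D"],
        "accidents": [10, 25, 10, 25],
        "total_hours": [500, 900, 300, 400],
    }
)

# Primary sort: accidents, highest first
# Secondary sort: total_hours, lowest first, applied only among tied rows
mixed = toy.sort_values(["accidents", "total_hours"], ascending=[False, True])
print(mixed)
```

Here the two models with 25 accidents come first, and within each tied pair the model with fewer hours is listed ahead of the other.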

## Finding the top and bottom values

Sometimes you just want the highest or lowest values. The `nlargest` and `nsmallest` methods return them directly, taking the number of rows you want and the column to rank by:

In [None]:
# Top 5 models by accident rate
print("Top 5 models by accident rate per 100k hours:")
print(merged_list.nlargest(5, "per_100k_hours")[["latimes_make_and_model", "per_100k_hours"]])

In [None]:
# Bottom 5 models by accident rate
print("Bottom 5 models by accident rate per 100k hours:")
print(merged_list.nsmallest(5, "per_100k_hours")[["latimes_make_and_model", "per_100k_hours"]])
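
For intuition, `nlargest` gives the same rows as sorting in descending order and taking the first few with `head`. Its optional `keep` argument controls what happens when rows tie for the last spot. The sketch below illustrates both points on an invented table (the `toy` DataFrame and its values are hypothetical):

```python
import pandas as pd

# Hypothetical toy data; models C and D are tied
toy = pd.DataFrame(
    {
        "model": ["A", "B", "C", "D", "E"],
        "per_100k_hours": [2.5, 0.8, 1.6, 1.6, 3.1],
    }
)

# These two expressions select the same two rows
top_two = toy.nlargest(2, "per_100k_hours")
same_top_two = toy.sort_values("per_100k_hours", ascending=False).head(2)

# By default (keep="first"), a tie for the last spot goes to the row that appears first
top_three_first = toy.nlargest(3, "per_100k_hours")

# keep="all" returns every tied row, so the result can be longer than n
top_three_all = toy.nlargest(3, "per_100k_hours", keep="all")

print(top_three_first)
print(top_three_all)
```

`nsmallest` accepts the same `keep` argument and works the same way at the other end of the ranking.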

## Practice exercises

Congratulations! With sorting, you've covered most of the basic skills necessary to access and analyze data with pandas. Here are some practice questions you can answer using the techniques we've learned:

1. What's the date of the most recent fatal helicopter accident in Texas?
2. How many fatalities occurred in Texas accidents?
3. What helicopter model logged the most flight hours?
4. Where did the accident with the NTSB number `ERA13LA057` occur?

Each one can be answered by combining sorting with the filtering techniques covered earlier. Give them a try!
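
If you are unsure how to begin, the general pattern is to filter down to the rows you care about and then sort, sum, or look up a value. The sketch below shows that pattern on an invented table. Every column name and value in it is hypothetical, so you will need to substitute the real column names from `accident_list` and `merged_list`:

```python
import pandas as pd

# Hypothetical toy table; column names and values are invented for illustration
toy = pd.DataFrame(
    {
        "record_id": ["X1", "X2", "X3", "X4"],
        "region": ["north", "south", "north", "south"],
        "event_date": ["2015-03-01", "2016-07-12", "2017-01-20", "2014-11-05"],
        "injuries": [0, 2, 1, 3],
    }
)

# Parse the text dates so they sort as dates rather than as strings
toy["event_date"] = pd.to_datetime(toy["event_date"])

# Filter, then sort: the most recent record in the "north" region
north = toy[toy["region"] == "north"]
latest_north = north.sort_values("event_date", ascending=False).head(1)

# Filter, then total a column: injuries in the "south" region
south_injuries = toy[toy["region"] == "south"]["injuries"].sum()

# Filter on an identifier to look up a single record
match = toy[toy["record_id"] == "X2"]

print(latest_north)
print(south_injuries)
print(match)
```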

In [None]:
# Space for your practice exercises
# Try to answer the questions above using pandas methods