# Compute

Computing new values from existing data is one of the most common tasks in data analysis. Whether you're calculating rates and percentages or creating categorical variables, pandas lets you derive new columns with column arithmetic, conditional logic, date accessors, `apply()`, and ranking.

## Setup data

In [None]:
import pandas as pd

# Load and prepare the data
accident_list = pd.read_csv("https://raw.githubusercontent.com/palewire/first-python-notebook/main/docs/src/_static/ntsb-accidents.csv")
accident_list["latimes_make_and_model"] = accident_list["latimes_make_and_model"].str.upper()

print(f"Loaded {len(accident_list)} accidents")
accident_list.head()

## Basic arithmetic operations

You can perform mathematical operations on columns to create new computed fields:

In [None]:
# Add up fatalities, serious injuries, and minor injuries (uninjured people are not counted)
accident_list["total_people"] = accident_list["total_fatalities"] + accident_list["total_serious_injuries"] + accident_list["total_minor_injuries"]

print("Added total_people column:")
accident_list[["total_fatalities", "total_serious_injuries", "total_minor_injuries", "total_people"]].head()
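One thing to watch with column arithmetic is missing data. The sketch below uses a small made-up DataFrame (the `sample` values are illustrative, not taken from the NTSB file) to compare the `+` operator with `DataFrame.sum(axis=1)`. Whether the real columns contain missing values is not something this page establishes, so treat this as a precaution rather than a required step.

```python
import pandas as pd

# Illustrative data only; None stands in for a missing count
sample = pd.DataFrame({
    "total_fatalities": [2, 0, 1],
    "total_serious_injuries": [1, None, 0],
    "total_minor_injuries": [0, 3, None],
})
count_columns = ["total_fatalities", "total_serious_injuries", "total_minor_injuries"]

# The + operator propagates missing values: one NaN makes the whole row's total NaN
plus_total = (
    sample["total_fatalities"]
    + sample["total_serious_injuries"]
    + sample["total_minor_injuries"]
)

# sum(axis=1) skips NaN by default, so a missing count is treated as zero
row_total = sample[count_columns].sum(axis=1)

print(plus_total.tolist())
print(row_total.tolist())
```

Which behavior is right depends on what a missing count means in your data: treating it as zero is convenient, but it can understate totals if the value was simply never recorded.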

## Conditional calculations

Use `numpy.where()` or boolean indexing for conditional calculations:

In [None]:
import numpy as np

# Create severity categories
accident_list["severity"] = np.where(
    accident_list["total_fatalities"] > 0, "Fatal",
    np.where(accident_list["total_serious_injuries"] > 0, "Serious", "Minor")
)

print("Severity categories:")
accident_list["severity"].value_counts()
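The same labels can be built with boolean indexing instead of nested `np.where()` calls. This is a minimal sketch on a made-up `sample` DataFrame (hypothetical values), assigning through `.loc` with a boolean mask:

```python
import pandas as pd

# Illustrative data only
sample = pd.DataFrame({
    "total_fatalities": [2, 0, 0],
    "total_serious_injuries": [1, 3, 0],
})

# Start with the fallback label, then overwrite from least to most severe
sample["severity"] = "Minor"
sample.loc[sample["total_serious_injuries"] > 0, "severity"] = "Serious"
# Assigned last so that a fatal accident wins even if it also had serious injuries
sample.loc[sample["total_fatalities"] > 0, "severity"] = "Fatal"

print(sample["severity"].tolist())
```

The order of the assignments carries the precedence here, in the way the nesting order does with `np.where()`.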

## Working with dates

Convert date strings to datetime objects for date calculations:

In [None]:
# Convert date column to datetime
accident_list["date"] = pd.to_datetime(accident_list["date"])

# Extract year, month, day
accident_list["year"] = accident_list["date"].dt.year
accident_list["month"] = accident_list["date"].dt.month
accident_list["day_of_week"] = accident_list["date"].dt.day_name()

print("Accidents by year:")
print(accident_list["year"].value_counts().sort_index())
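Once a column holds datetimes, you can also do arithmetic on it. The sketch below, on a made-up `sample` DataFrame, shows two things: `errors="coerce"` as one way to handle strings that do not parse, and subtraction to get elapsed days. The `days_since_first` column name is hypothetical.

```python
import pandas as pd

# Illustrative data only; the last value is deliberately not a date
sample = pd.DataFrame({"date": ["2019-01-15", "2019-03-01", "not a date"]})

# errors="coerce" turns unparseable strings into NaT instead of raising an error
sample["date"] = pd.to_datetime(sample["date"], errors="coerce")

# Subtracting datetimes gives timedeltas; .dt.days extracts the whole days
first_date = sample["date"].min()
sample["days_since_first"] = (sample["date"] - first_date).dt.days

print(sample)
```

Rows where the date could not be parsed end up with a missing value in the computed column, so they are easy to find and inspect afterward.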

## Using apply() for complex calculations

For logic that is awkward to express as column arithmetic, pass a custom function to the `apply()` method. With `axis=1`, the function receives one row at a time:

In [None]:
def calculate_fatality_rate(row):
    """Calculate what percentage of people involved died"""
    if row["total_people"] == 0:
        return 0
    return (row["total_fatalities"] / row["total_people"]) * 100

accident_list["fatality_rate"] = accident_list.apply(calculate_fatality_rate, axis=1)

print("Fatality rate statistics:")
print(accident_list["fatality_rate"].describe())
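Row-wise `apply()` calls a Python function once per row, which can be slow on large tables. For a calculation this simple, column arithmetic may do the same job. The sketch below runs on a made-up `sample` DataFrame and uses `Series.where()` to reproduce the zero-denominator rule:

```python
import pandas as pd

# Illustrative data only
sample = pd.DataFrame({
    "total_fatalities": [2, 0, 1],
    "total_people": [4, 0, 1],
})

# Divide whole columns at once; rows with a zero denominator produce NaN here
rate = sample["total_fatalities"] / sample["total_people"] * 100

# Keep the computed rate where the denominator is nonzero, otherwise use 0
sample["fatality_rate"] = rate.where(sample["total_people"] != 0, 0)

print(sample["fatality_rate"].tolist())
```

`apply()` remains a reasonable choice when the logic has many branches or needs values that are hard to express column by column.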

## Ranking and percentiles

Create rankings and percentile scores:

In [None]:
# Rank accidents by total people involved
accident_list["severity_rank"] = accident_list["total_people"].rank(ascending=False, method="dense")

# Show the most severe accidents (rank 1 = most people involved)
most_severe = accident_list.nsmallest(10, "severity_rank")
print("Top 10 most severe accidents by people involved:")
most_severe[["accident_number", "date", "location", "total_people", "total_fatalities", "severity_rank"]]
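For percentile scores, `rank()` accepts `pct=True`, which returns each rank as a fraction of the number of rows. A minimal sketch on a made-up `sample` DataFrame, with a hypothetical `people_percentile` column:

```python
import pandas as pd

# Illustrative data only
sample = pd.DataFrame({"total_people": [1, 4, 4, 10]})

# pct=True scales ranks to the 0-1 range; multiply by 100 for percentiles.
# With the default method="average", tied values share the mean of their ranks.
sample["people_percentile"] = sample["total_people"].rank(pct=True) * 100

print(sample)
```

Here the two tied rows both land at 62.5, and the largest value sits at 100.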

Column arithmetic, conditional labels, date parts, row-wise functions, and rankings cover most of the derived fields an analysis needs. Combine them to turn raw records into the specific metrics your investigation calls for.