# Columns

One of the most useful features of pandas is the ability to pull individual columns out of your DataFrame and analyze them on their own. That lets you focus on the specific variables that matter most to your investigation.

## Setup data

Let's start by loading our accident data:

In [None]:
import pandas as pd
accident_list = pd.read_csv("https://raw.githubusercontent.com/palewire/first-python-notebook/main/docs/src/_static/ntsb-accidents.csv")

print(f"Loaded {len(accident_list)} accidents")
print(f"Columns: {list(accident_list.columns)}")

## Accessing individual columns

We'll begin with the `latimes_make_and_model` column, which records the standardized name of each helicopter that crashed. To access its contents separately from the rest of the DataFrame, add a pair of square brackets after the DataFrame's name and put the column's name inside them, in quotes:

In [None]:
# Access a specific column
accident_list["latimes_make_and_model"]

That will list the column out as a `Series`, just like the ones we created from scratch earlier. Just as we did then, you can now start tacking on additional methods that will analyze the contents of the column.
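To make the distinction concrete, here is a minimal sketch using a tiny, made-up DataFrame. The `toy` name and its values are hypothetical and are not taken from the accident data. It shows that single brackets return a `Series`, while a list of names inside the brackets returns a `DataFrame`, even when the list holds only one column.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
toy = pd.DataFrame({
    "model": ["R44", "R22", "R44"],
    "total_fatalities": [1, 0, 2],
})

# Single brackets with a column name return a Series
model_series = toy["model"]
print(type(model_series).__name__)

# A list of names inside the brackets returns a DataFrame,
# even when the list contains just one column
model_frame = toy[["model"]]
print(type(model_frame).__name__)
```

The difference matters because some methods behave differently on a `Series` than on a `DataFrame`, so it helps to know which one you are holding.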

## Analyzing column data

Pandas has a built-in method that counts how often each value appears in a column. It's called [`value_counts`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html) and it's just as easy to use as `sum`, `min` or `max`. All you need to do is add a period after the closing bracket of the column selection and chain the method onto the tail end of your code:

In [None]:
accident_list["latimes_make_and_model"].value_counts()
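If you would rather see shares than raw counts, `value_counts` accepts a `normalize=True` argument. The sketch below uses a short, made-up `Series` (the `models` name and its values are hypothetical) so the result is easy to check by hand.

```python
import pandas as pd

# Hypothetical values, invented for illustration only
models = pd.Series(["R44", "R22", "R44", "R44"])

# Raw counts, sorted with the most frequent value first
counts = models.value_counts()

# Fractions of the total instead of counts; they sum to 1
shares = models.value_counts(normalize=True)

print(counts)
print(shares)
```

The same argument should work on `accident_list["latimes_make_and_model"]`, where it would report each model's share of all accidents.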

## Working with numeric columns

Let's look at the `total_fatalities` column and see which statistical methods we can apply to it:

In [None]:
# Basic statistics for total fatalities
fatalities = accident_list["total_fatalities"]
print(f"Total fatalities across all accidents: {fatalities.sum()}")
print(f"Average fatalities per accident: {fatalities.mean():.2f}")
print(f"Maximum fatalities in single accident: {fatalities.max()}")
print(f"Minimum fatalities: {fatalities.min()}")
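Rather than calling each statistic separately, you can ask for several at once with `describe`. This is a small sketch on made-up numbers (the `fatalities_toy` name and its values are hypothetical), not output from the accident data.

```python
import pandas as pd

# Hypothetical fatality counts, invented for illustration only
fatalities_toy = pd.Series([0, 1, 1, 4], name="total_fatalities")

# describe() returns count, mean, std, min, quartiles and max in one Series
summary = fatalities_toy.describe()
print(summary)

# The median is less sensitive to a single large accident than the mean
median_value = fatalities_toy.median()
print(f"Median: {median_value}")
```

Comparing the mean with the median is a quick way to spot a column skewed by a few extreme values.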

## String operations on columns

For text columns, pandas provides special string methods through the `.str` accessor:

In [None]:
# Convert make and model to uppercase for consistency
accident_list["latimes_make_and_model"] = accident_list["latimes_make_and_model"].str.upper()
print("Converted to uppercase:")
accident_list["latimes_make_and_model"].value_counts().head()

In [None]:
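# Check if location contains certain words
# Assumes a "location" column exists; if it does not, substitute a text
# column from the list printed during setup
airport_mask = accident_list["location"].str.contains("Airport", na=False)
print(f"Accidents at locations containing 'Airport': {airport_mask.sum()}")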
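By default `str.contains` is case-sensitive, which can silently miss matches. The sketch below, on a made-up `Series` (the `places` name and its values are hypothetical), compares the default behavior with `case=False` and shows how `na=False` treats missing values as non-matches.

```python
import pandas as pd

# Hypothetical place names, invented for illustration only
places = pd.Series(["Municipal Airport", "airport road", None, "Lake Shore"])

# Case-sensitive match; na=False counts missing values as non-matches
strict = places.str.contains("Airport", na=False)

# case=False ignores capitalization
loose = places.str.contains("airport", case=False, na=False)

print(f"Case-sensitive matches: {strict.sum()}")
print(f"Case-insensitive matches: {loose.sum()}")

# A boolean mask can also be used to filter the original values
print(places[loose])
```

Because the mask is a `Series` of `True` and `False` values, `sum` counts the matches and the bracket syntax keeps only the matching rows.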

## Creating new columns

You can create new columns by assigning values to a new column name:

In [None]:
# Create a column indicating if accident was fatal
accident_list["is_fatal"] = accident_list["total_fatalities"] > 0
print("Fatal vs non-fatal accidents:")
accident_list["is_fatal"].value_counts()

In [None]:
# Create a severity category based on fatalities
def categorize_severity(fatalities):
    if fatalities == 0:
        return "Non-fatal"
    elif fatalities <= 2:
        return "Low fatality"
    else:
        return "High fatality"

accident_list["severity_category"] = accident_list["total_fatalities"].apply(categorize_severity)
print("Accident severity categories:")
accident_list["severity_category"].value_counts()
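As an alternative to `apply` with a custom function, the same three categories could be produced with `pd.cut`, which bins a numeric column in one call. This is a sketch on made-up numbers (the `toy` DataFrame is hypothetical), and it assumes the counts are never negative.

```python
import pandas as pd

# Hypothetical fatality counts, invented for illustration only
toy = pd.DataFrame({"total_fatalities": [0, 1, 2, 3, 7]})

# Bins include their right edge by default: (-1, 0], (0, 2], (2, inf]
# Assumes counts are non-negative, so -1 is a safe lower edge
toy["severity_category"] = pd.cut(
    toy["total_fatalities"],
    bins=[-1, 0, 2, float("inf")],
    labels=["Non-fatal", "Low fatality", "High fatality"],
)

print(toy)
```

One difference from the `apply` version is that `pd.cut` returns a categorical column with the labels in a fixed order, which can be handy when sorting or charting.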

## Selecting multiple columns

You can select multiple columns by passing a list of column names:

In [None]:
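# Select specific columns for analysis
# Every name here must match the column list printed during setup or a
# column created above; adjust any name that does not
key_columns = ["accident_number", "date", "state", "latimes_make_and_model", "total_fatalities", "severity_category"]
summary_data = accident_list[key_columns]
summary_data.head()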
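If you plan to add or change columns on a subset like this, one common precaution is to call `.copy()` so the subset is independent of the original DataFrame. Depending on your pandas version, skipping this step may trigger a `SettingWithCopyWarning`. The sketch below uses a made-up DataFrame (the `toy` and `subset` names are hypothetical).

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
toy = pd.DataFrame({
    "state": ["CA", "TX"],
    "total_fatalities": [1, 0],
    "year": [2010, 2012],
})

# .copy() makes the subset independent of the original DataFrame
subset = toy[["state", "total_fatalities"]].copy()

# Adding a column to the copy leaves the original untouched
subset["is_fatal"] = subset["total_fatalities"] > 0

print(list(subset.columns))
print(list(toy.columns))
```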

## Column information

You can get information about your columns and their data types:

In [None]:
# Get column data types
print("Column data types:")
print(accident_list.dtypes)

In [None]:
# Check for missing values in each column
print("Missing values per column:")
print(accident_list.isnull().sum())
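Raw counts of missing values can be hard to judge without knowing the table's size, so it can help to look at the share of missing values too. Here is a minimal sketch on a made-up DataFrame (the `toy` name and its values are hypothetical). It uses `isna`, which is an alias of `isnull`.

```python
import pandas as pd

# Hypothetical toy data with one missing state, invented for illustration only
toy = pd.DataFrame({
    "state": ["CA", None, "TX"],
    "total_fatalities": [1, 0, 2],
})

# Number of missing values in each column
missing_counts = toy.isna().sum()

# Fraction of missing values in each column (the mean of True/False values)
missing_share = toy.isna().mean()

print(missing_counts)
print(missing_share)
```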

Working with individual columns is fundamental to pandas data analysis. These techniques allow you to clean, transform, and analyze your data column by column, building up to more complex analyses that combine multiple variables.