# Data Analysis with Pandas: Pivot Tables
This notebook loads a homicide dataset, reshapes it into long format, and summarizes it with pandas pivot tables.

## Step 1: Importing Required Libraries
We start by importing pandas. It provides `read_csv` and the `DataFrame` methods used throughout this notebook, such as `stack` and `pivot_table`.

In [None]:
import pandas as pd

## Step 2: Loading the Dataset
We load the consolidated homicide statistics from a CSV file. The file uses a semicolon (`;`) instead of a comma as its delimiter, so we pass `sep=";"` to `pd.read_csv`.

In [None]:
df = pd.read_csv("../data/ipea/homicidios-consolidados.csv", sep=";")
# Display the first few rows of the dataset to understand its structure
df.head()
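The following minimal sketch shows why `sep=";"` matters. It reads a hypothetical two-row sample held in memory with `io.StringIO`. The column names `homicidios` and `taxa` and all values are invented for illustration and are not taken from the real file.

```python
import io

import pandas as pd

# Hypothetical sample in the same semicolon-separated layout (invented values).
raw = (
    "nome;período;homicidios;taxa\n"
    "Estado A;2019;300;34.0\n"
    "Estado A;2020;250;28.0\n"
)

# With the correct delimiter, each field becomes its own column.
df_sample = pd.read_csv(io.StringIO(raw), sep=";")

# With the default comma delimiter, each whole line lands in a single column.
df_wrong = pd.read_csv(io.StringIO(raw))

print(df_sample.shape)  # (2, 4)
print(df_wrong.shape)   # (2, 1)
```

If the real file writes decimals with a comma, `read_csv` also accepts `decimal=","`. Check the output of `df.head()` before adding it.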

## Step 3: Reshaping the Data
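We reshape the dataset from wide to long format. Once `nome` and `período` are set as the index, `stack` moves every remaining column into a new innermost index level. The result is a Series with one row per (`nome`, `período`, metric) combination, and `reset_index` turns that Series back into a DataFrame. The pivot tables in Step 4 start from this long layout.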

In [None]:
# Reshape the dataset by stacking columns into rows
df_stack = (
    df.set_index(["nome", "período"])  # Set 'nome' and 'período' as the index
    .stack()  # Stack the remaining columns into rows
    .reset_index()  # Reset the index to return to a tabular format
)
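The effect of `stack` is easiest to see on a tiny frame. This sketch uses a hypothetical wide table with two invented metric columns. After `set_index`, `stack`, and `reset_index`, each (nome, período, metric) triple becomes one row.

```python
import pandas as pd

# Hypothetical wide table: one row per (nome, período), one column per metric.
wide = pd.DataFrame({
    "nome": ["A", "A", "B"],
    "período": [2019, 2020, 2019],
    "homicidios": [10, 12, 7],
    "taxa": [1.5, 1.8, 0.9],
})

long = wide.set_index(["nome", "período"]).stack().reset_index()

# 3 rows x 2 metric columns -> 6 long rows.
# reset_index names the unnamed stacked level "level_2" and the values column 0.
print(long.columns.tolist())  # ['nome', 'período', 'level_2', 0]
print(long.shape)             # (6, 4)
```

Because the integer and float metrics end up in one value column, the integers are upcast to float. The original `stack` implementation also drops missing values by default. pandas 2.1 added a `future_stack=True` option whose new behavior keeps them, so row counts can differ between pandas versions if the file has gaps.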

### Renaming Columns
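After reshaping, `reset_index` gives the stacked level the default name `level_2` and names the value column `0`. We replace all four names by position, which also drops the accent from `período` for consistency.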

In [None]:
# Rename the columns for clarity
df_stack.columns = ["nome", "periodo", "metrica", "valor"]
# Display the reshaped dataset
df_stack
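Assigning `df_stack.columns` by position works only while the column order stays as shown. An alternative is to name the levels before stacking, so no positional renaming is needed. The sketch below applies this to the same hypothetical toy table. `rename_axis(columns=...)` names the column axis, `stack` carries that name into the new index level, and `reset_index(name=...)` names the value column.

```python
import pandas as pd

# Same hypothetical wide table as a stand-in for the real file.
wide = pd.DataFrame({
    "nome": ["A", "A", "B"],
    "período": [2019, 2020, 2019],
    "homicidios": [10, 12, 7],
    "taxa": [1.5, 1.8, 0.9],
})

df_stack_named = (
    wide.rename(columns={"período": "periodo"})  # drop the accent up front
    .set_index(["nome", "periodo"])
    .rename_axis(columns="metrica")  # this name becomes the stacked level's name
    .stack()
    .reset_index(name="valor")       # name the value column instead of 0
)

print(df_stack_named.columns.tolist())  # ['nome', 'periodo', 'metrica', 'valor']
```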

## Step 4: Creating Pivot Tables
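`pivot_table` reorganizes long-format data into a wide summary table. The `index` columns become the rows, the distinct values of the `columns` column become the new column headers, and the `values` column fills the cells. Rows that share the same index/column combination are combined with `aggfunc`, which defaults to the mean.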
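The "summarize" part matters when rows collide. In this hypothetical long table, `A` has two `homicidios` rows. `pivot_table` combines them with `aggfunc`, whereas `DataFrame.pivot` never aggregates and refuses to reshape duplicate entries.

```python
import pandas as pd

# Hypothetical long data: ("A", "homicidios") appears twice.
long = pd.DataFrame({
    "nome": ["A", "A", "A", "B"],
    "metrica": ["homicidios", "homicidios", "taxa", "homicidios"],
    "valor": [10, 12, 1.5, 7],
})

# pivot_table aggregates the duplicates (here with sum instead of the default mean).
summed = long.pivot_table(values="valor", index="nome", columns="metrica", aggfunc="sum")
print(summed)
# ("B", "taxa") has no rows, so that cell is NaN.

# pivot only reshapes, so duplicate (index, column) pairs raise ValueError.
try:
    long.pivot(index="nome", columns="metrica", values="valor")
    pivot_failed = False
except ValueError:
    pivot_failed = True
print("pivot failed:", pivot_failed)
```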

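### Pivot Table 1: Grouping by `nome` and `periodo`
This pivot table uses `nome` and `periodo` as the row index and puts each metric in its own column. No `aggfunc` is given, so the default mean applies. Assuming each (`nome`, `período`) pair appears only once in the original file, every cell holds a single value, and the table reproduces the original wide layout.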

In [None]:
# Create a pivot table grouped by 'nome' and 'periodo', with metrics as columns
df_stack.pivot_table(values="valor",
                     index=["nome", "periodo"],
                     columns="metrica")
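One way to confirm that this table simply restores the wide layout is a round trip on hypothetical data. The sketch below stacks a small wide table the same way as above, then pivots it back. With one value per (`nome`, `periodo`, `metrica`) cell, the default mean returns each original value unchanged.

```python
import pandas as pd

# Hypothetical wide table with unique (nome, periodo) pairs.
wide = pd.DataFrame({
    "nome": ["A", "A", "B"],
    "periodo": [2019, 2020, 2019],
    "homicidios": [10.0, 12.0, 7.0],
    "taxa": [1.5, 1.8, 0.9],
})

# Same reshaping as the notebook, then the same column names.
df_long = wide.set_index(["nome", "periodo"]).stack().reset_index()
df_long.columns = ["nome", "periodo", "metrica", "valor"]

# Default aggfunc is the mean; with one value per cell it changes nothing.
back = df_long.pivot_table(values="valor", index=["nome", "periodo"], columns="metrica")

original = wide.set_index(["nome", "periodo"])
print((back.to_numpy() == original.to_numpy()).all())  # True
```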

### Pivot Table 2: Aggregating by Mean
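This pivot table uses only `nome` as the index, so all periods for a given `nome` fall into the same group. `aggfunc="mean"` (the default, stated explicitly here) then averages each metric across those periods.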

In [None]:
# Create a pivot table grouped by 'nome', aggregating metrics using the mean function
df_stack.pivot_table(values="valor",
                     index=["nome"],
                     columns="metrica",
                     aggfunc="mean")  # Aggregation function: mean
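Because the index is only `nome`, each cell is an average over periods. The sketch below uses hypothetical data and checks the result against an equivalent `groupby(...).mean().unstack()`, which computes the same per-group means.

```python
import pandas as pd

# Hypothetical long data in the notebook's df_stack layout.
df_long = pd.DataFrame({
    "nome": ["A", "A", "A", "A", "B", "B"],
    "periodo": [2019, 2019, 2020, 2020, 2019, 2019],
    "metrica": ["homicidios", "taxa"] * 3,
    "valor": [10.0, 1.5, 12.0, 2.5, 7.0, 0.5],
})

by_nome = df_long.pivot_table(values="valor", index=["nome"], columns="metrica", aggfunc="mean")

# Equivalent computation: mean per (nome, metrica), then metrics as columns.
by_groupby = df_long.groupby(["nome", "metrica"])["valor"].mean().unstack("metrica")

print(by_nome)
# A: homicidios (10 + 12) / 2 = 11.0, taxa (1.5 + 2.5) / 2 = 2.0
# B: only 2019 exists, so its values are unchanged: 7.0 and 0.5
```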