# Triangle Ingest

This notebook demonstrates how to ingest triangle data from various external formats into Bermuda. We'll work with Excel and CSV files in different formats: long, wide, and array.

## Setup

In [None]:
import bermuda as tri
import pandas as pd
import altair as alt
from datetime import date
import os

# Render Altair charts as HTML
alt.renderers.enable("html")

# Generate the example data if it is missing (only the main workbook is checked)
if not os.path.exists('data/excel/triangle_data.xlsx'):
    print("Data files not found. Running data generation script...")
    !python create_excel_data.py
else:
    print("Data files found.")

## Understanding Data Formats

Bermuda supports three main tabular formats for triangle data:

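1. **Long Format**: each row holds a single value for one field of one cell
2. **Wide Format**: each row holds all field values for one cell, one column per field
3. **Array Format**: the traditional actuarial triangle layout, with accident periods as rows and development periods as columns (one field per table)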
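
To make the distinction concrete, here is a minimal pandas sketch of one tiny triangle (two accident years, one field) in all three layouts. The column names (`period_start`, `evaluation_date`, `field`, `value`, `dev_lag`) are illustrative assumptions, not necessarily the exact names Bermuda's readers expect.

```python
import pandas as pd

# Long format: one row per (cell, field) pair. Column names are hypothetical.
long_df = pd.DataFrame({
    "period_start": ["2020-01-01", "2020-01-01", "2021-01-01"],
    "evaluation_date": ["2020-12-31", "2021-12-31", "2021-12-31"],
    "field": ["paid_loss", "paid_loss", "paid_loss"],
    "value": [100.0, 150.0, 120.0],
})

# Wide format: one row per cell, one column per field.
wide_df = long_df.pivot_table(
    index=["period_start", "evaluation_date"], columns="field", values="value"
).reset_index()
wide_df.columns.name = None

# Development lag in months, assuming annual periods first evaluated at year end (lag 12).
dev_lag = (
    pd.to_datetime(wide_df["evaluation_date"]).dt.year
    - pd.to_datetime(wide_df["period_start"]).dt.year
) * 12 + 12

# Array format: accident periods as rows, development lags as columns, one field only.
array_df = wide_df.assign(dev_lag=dev_lag).pivot(
    index="period_start", columns="dev_lag", values="paid_loss"
)

print(long_df, wide_df, array_df, sep="\n\n")
```

The missing lower-right entry of `array_df` is the unobserved part of the triangle; long and wide layouts simply omit that cell.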

## 1. Long Format Ingestion

Long format is the most flexible: each row contains one value for one field of one cell.

In [None]:
# Load and examine the long format CSV
gl_long_df = pd.read_csv('data/excel/gl_long.csv')
print("Long format structure (first 5 rows):")
print(gl_long_df.head())
print(f"\nShape: {gl_long_df.shape}")
print(f"Columns: {list(gl_long_df.columns)}")

In [None]:
# Ingest long format CSV
gl_triangle = tri.long_csv_to_triangle('data/excel/gl_long.csv')
print("Triangle loaded from long CSV:")
print(gl_triangle)

In [None]:
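# The printout above uses the file's own field names. The source says
# 'paid_losses', 'incurred_losses' and 'earned_prem', but Bermuda expects
# 'paid_loss', 'reported_loss' and 'earned_premium', so reload with a mapping.
field_map = {
    'paid_losses': 'paid_loss',
    'incurred_losses': 'reported_loss',
    'earned_prem': 'earned_premium'
}

# Load as a DataFrame and apply the mapping. Depending on how the file was
# written, the names may appear as column headers or as values in a field
# column, so remap both.
gl_df = pd.read_csv('data/excel/gl_long.csv')
gl_df = gl_df.rename(columns=field_map).replace(field_map)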

# Now ingest from DataFrame
gl_triangle = tri.long_data_frame_to_triangle(gl_df)
print("Triangle with corrected column names:")
print(gl_triangle)
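
The renaming step generalizes to a reusable mapping. The sketch below assumes a long-format frame whose field names sit in a column called `field` (a hypothetical name) and uses a hypothetical helper, `standardize_fields`; it remaps the names and reports any that the mapping does not cover, so unexpected source fields do not slip through silently.

```python
import pandas as pd

# Hypothetical mapping from a source system's field names to Bermuda-style names.
FIELD_MAP = {
    "paid_losses": "paid_loss",
    "incurred_losses": "reported_loss",
    "earned_prem": "earned_premium",
}

def standardize_fields(df, field_col="field", mapping=FIELD_MAP):
    """Return a remapped copy of df plus a sorted list of field names the mapping missed."""
    out = df.copy()
    out[field_col] = out[field_col].replace(mapping)
    known = set(mapping.values())
    unmapped = sorted(set(out[field_col]) - known)
    return out, unmapped

raw = pd.DataFrame({
    "field": ["paid_losses", "incurred_losses", "earned_prem", "paid_alae"],
    "value": [100.0, 180.0, 500.0, 10.0],
})
clean, unmapped = standardize_fields(raw)
print(clean)
print("Unmapped fields:", unmapped)
```

Reporting unmapped names rather than dropping them keeps the decision about unknown fields with the analyst.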

## 2. Wide Format Ingestion

Wide format has one row per cell with all fields as columns.

In [None]:
# Load wide format from Excel
ca_wide_df = pd.read_excel('data/excel/triangle_data.xlsx', sheet_name='ca_wide_format')
print("Wide format structure (first 5 rows):")
print(ca_wide_df.head())
print(f"\nShape: {ca_wide_df.shape}")

In [None]:
# Ingest wide format
ca_triangle = tri.wide_data_frame_to_triangle(ca_wide_df)
print("Triangle loaded from wide format:")
print(ca_triangle)

## 3. Array Format Ingestion

Array format is the traditional actuarial triangle layout with accident periods as rows and development periods as columns.
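
An array carries no dates of its own, which is why the ingestion call below needs `period_resolution` and `eval_resolution` (in months, judging by the "Annual = 12" comments). As a rough sketch of what those settings imply, the hypothetical helper `array_to_records` maps each array entry back to a period and an evaluation date. It assumes row i starts `i * period_resolution` months after the first period, column j is evaluated `(j + 1) * eval_resolution` months after the period start, and `None` marks unobserved cells; this is an illustration of the idea, not Bermuda's implementation.

```python
import pandas as pd

def array_to_records(array, first_period_start, period_resolution=12, eval_resolution=12):
    """Flatten an array-layout triangle (rows = accident periods, columns = development
    steps) into (period_start, period_end, evaluation_date, value) records."""
    start = pd.Timestamp(first_period_start)
    records = []
    for i, row in enumerate(array):
        period_start = start + pd.DateOffset(months=i * period_resolution)
        period_end = period_start + pd.DateOffset(months=period_resolution) - pd.Timedelta(days=1)
        for j, value in enumerate(row):
            if value is None:  # unobserved cell below the diagonal
                continue
            evaluation_date = (
                period_start
                + pd.DateOffset(months=(j + 1) * eval_resolution)
                - pd.Timedelta(days=1)
            )
            records.append((period_start.date(), period_end.date(), evaluation_date.date(), value))
    return records

paid = [
    [100.0, 150.0, 170.0],
    [120.0, 160.0, None],
    [130.0, None, None],
]
for rec in array_to_records(paid, "2020-01-01"):
    print(rec)
```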

In [None]:
# Load array format from Excel
pa_array_df = pd.read_excel('data/excel/triangle_data.xlsx', sheet_name='pa_array_format')
print("Array format structure:")
print(pa_array_df)
print(f"\nShape: {pa_array_df.shape}")

In [None]:
# Ingest array format
# An array holds a single field, so we must name it
pa_triangle = tri.array_data_frame_to_triangle(
    pa_array_df,
    field='paid_loss',
    period_resolution=12,  # Annual accident periods (12 months)
    eval_resolution=12     # Annual evaluations (12 months apart)
)
print("Triangle loaded from array format:")
print(pa_triangle)

## Interactive Exercise: Load Your Own Data

Now it's your turn! Complete the code below to load triangle data in different formats.

### Exercise 1: Load Wide Format CSV

Complete the code to load the CA wide format from CSV:

In [None]:
# TODO: Load ca_wide.csv using the appropriate Bermuda function
# Hint: Use tri.wide_csv_to_triangle()

# ca_triangle_csv = tri.________('data/excel/ca_wide.csv')
# print(ca_triangle_csv)

### Exercise 2: Load Array Format from CSV

Complete the code to load PA array data from CSV:

In [None]:
# TODO: Load pa_array.csv and convert to triangle
# Remember: array format needs the field name!

# pa_csv_df = pd.read_csv('data/excel/pa_array.csv')
# pa_triangle_csv = tri.array_data_frame_to_triangle(
#     pa_csv_df,
#     field='______',  # What field should go here?
#     period_resolution=___,  # Annual = ?
#     eval_resolution=___     # Annual = ?
# )
# print(pa_triangle_csv)

### Exercise 3: Handle Multiple Fields in Array Format

Load both paid and reported losses from array format:

In [None]:
# TODO: Create a triangle with multiple fields from array format
# Hint: Use tri.array_triangle_builder() with lists of DataFrames and field names

# Load a sample triangle to get both fields
# sample = tri.binary_to_triangle('data/excel/ca_filtered.trib')
# 
# # Export to array format for both fields
# paid_array = tri.triangle_to_array_data_frame(sample, field='paid_loss')
# reported_array = tri.triangle_to_array_data_frame(sample, field='reported_loss')
# 
# # Now build a multi-field triangle
# multi_triangle = tri.array_triangle_builder(
#     dfs=[_____, _____],  # List of DataFrames
#     fields=['_____', '_____'],  # Corresponding field names
#     period_resolution=12,
#     eval_resolution=12
# )
# print(multi_triangle)
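
Conceptually, a multi-field triangle attaches several values to each cell. The pandas sketch below is independent of Bermuda (the frames, the `dev_lag` column and the helper `to_long` are made up for illustration); it melts two array-layout frames into long records and pivots them so each (period, lag) cell carries both `paid_loss` and `reported_loss`.

```python
import pandas as pd

# Two array-layout frames for the same periods: rows = accident years, columns = lags in months.
paid = pd.DataFrame({12: [100.0, 120.0], 24: [150.0, None]}, index=[2020, 2021])
reported = pd.DataFrame({12: [180.0, 200.0], 24: [190.0, None]}, index=[2020, 2021])

def to_long(array_df, field):
    """Melt an array frame into one row per observed (period, lag) cell for one field."""
    long_df = (
        array_df.rename_axis("period").reset_index()
        .melt(id_vars="period", var_name="dev_lag", value_name="value")
        .dropna(subset=["value"])
    )
    return long_df.assign(field=field)

combined_fields = pd.concat([to_long(paid, "paid_loss"), to_long(reported, "reported_loss")])
multi_field = combined_fields.pivot_table(
    index=["period", "dev_lag"], columns="field", values="value"
)
print(multi_field)
```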

## Creating Multi-Slice Triangles

Now let's combine our three triangles (GL, CA, PA) into a single multi-slice triangle.
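
Slices are distinguished by cell metadata, which is why each source is tagged with a distinct `line` detail before combining. The toy sketch below mimics that grouping with plain dataclasses; `ToyCell`, `tag` and `group_into_slices` are hypothetical stand-ins, not Bermuda classes, and Bermuda's own slicing rules may compare more metadata than shown here.

```python
from collections import defaultdict
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ToyCell:
    period_start: str
    evaluation_date: str
    value: float
    details: tuple = ()  # metadata details as sorted (key, value) pairs, so they are hashable

def tag(cells, **extra):
    """Return new cells whose details also include the extra key/value pairs."""
    return [replace(c, details=tuple(sorted({**dict(c.details), **extra}.items()))) for c in cells]

def group_into_slices(cells):
    """Group cells that share identical metadata details into one slice each."""
    slices = defaultdict(list)
    for c in cells:
        slices[c.details].append(c)
    return dict(slices)

gl = [ToyCell("2020-01-01", "2020-12-31", 100.0), ToyCell("2020-01-01", "2021-12-31", 150.0)]
ca = [ToyCell("2020-01-01", "2020-12-31", 80.0)]

untagged = group_into_slices(gl + ca)  # identical metadata, so everything lands in one slice
tagged = group_into_slices(tag(gl, line="GL") + tag(ca, line="CA"))
print(len(untagged), len(tagged))
```

Without the tags, cells from different sources share identical metadata and collapse into a single slice.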

In [None]:
# Add a 'line' detail to every cell's metadata to distinguish the slices
def tag_line(triangle, line):
    cells = []
    for cell in triangle:
        new_meta = cell.metadata.copy()
        new_meta.details['line'] = line
        cells.append(cell.copy(metadata=new_meta))
    return tri.Triangle(cells)

gl_with_meta = tag_line(gl_triangle, 'GL')
ca_with_meta = tag_line(ca_triangle, 'CA')
pa_with_meta = tag_line(pa_triangle, 'PA')

# Combine into multi-slice triangle
combined = gl_with_meta + ca_with_meta + pa_with_meta
print("Combined multi-slice triangle:")
print(combined)
print(f"\nNumber of slices: {len(combined.slices)}")

In [None]:
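# Visualize the combined triangle
# A chart created inside an if/else is not displayed automatically, so keep it and show it last
if len(combined.slices) > 1:
    chart = combined.plot_right_edge()
else:
    # If the slices did not separate, plot data completeness for the whole triangle
    chart = combined.plot_data_completeness()
chart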

## Saving Triangle Data

Bermuda supports multiple output formats for saving your work.

In [None]:
# Save to CSV (long format)
tri.triangle_to_long_csv(combined, 'data/excel/combined_long.csv')
print("Saved to long CSV: data/excel/combined_long.csv")

# Save to CSV (wide format)
tri.triangle_to_wide_csv(combined, 'data/excel/combined_wide.csv')
print("Saved to wide CSV: data/excel/combined_wide.csv")

In [None]:
# Save to JSON
tri.triangle_to_json(combined, 'data/excel/combined.json')
print("Saved to JSON: data/excel/combined.json")

# Peek at the JSON structure
import json
with open('data/excel/combined.json', 'r') as f:
    json_data = json.load(f)
    print(f"\nJSON has {len(json_data)} cells")
    print("First cell structure:")
    print(json.dumps(json_data[0], indent=2, default=str)[:500] + "...")

In [None]:
# Save to binary trib format (most efficient)
tri.triangle_to_binary(combined, 'data/excel/combined.trib')
print("Saved to binary trib: data/excel/combined.trib")

# Compare file sizes (os was imported in the setup cell)
sizes = {
    'CSV (long)': os.path.getsize('data/excel/combined_long.csv'),
    'CSV (wide)': os.path.getsize('data/excel/combined_wide.csv'),
    'JSON': os.path.getsize('data/excel/combined.json'),
    'Trib (binary)': os.path.getsize('data/excel/combined.trib')
}

print("\nFile size comparison:")
for format_name, size in sizes.items():
    print(f"  {format_name}: {size:,} bytes")

print(f"\nBinary format is {sizes['JSON'] / sizes['Trib (binary)']:.1f}x smaller than JSON")
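
Why does a binary encoding come out smaller? Text formats spell every number out as characters and repeat field names in every record, while a binary layout can store each float in a fixed 8 bytes and the field names once. The standard-library sketch below illustrates the effect on made-up numbers using `json` and `struct`; it is not the trib format, whose layout is not described here.

```python
import json
import struct

# Toy data: 1,000 cells, each with two float fields (values are made up).
cells = [
    {"paid_loss": 1000.0 + i * 1.25, "reported_loss": 1500.0 + i * 2.5}
    for i in range(1000)
]

# JSON: keys and digits are written out as text for every cell.
json_bytes = json.dumps(cells).encode("utf-8")

# Binary: field names stored once, then two little-endian doubles (16 bytes) per cell.
header = json.dumps(["paid_loss", "reported_loss"]).encode("utf-8")
body = b"".join(struct.pack("<2d", c["paid_loss"], c["reported_loss"]) for c in cells)
binary_bytes = header + body

print(f"JSON:   {len(json_bytes):,} bytes")
print(f"Binary: {len(binary_bytes):,} bytes")
print(f"Ratio:  {len(json_bytes) / len(binary_bytes):.1f}x")
```

The exact ratio depends on how many digits the values need and how much metadata each cell carries, which is why the comparison above is worth running on your own data.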

## Summary

In this notebook, we've covered:

1. **Data Format Types**: Long, wide, and array formats each have their use cases
2. **Ingestion Methods**: Different functions for each format
3. **Column Mapping**: Aligning external column names with Bermuda conventions
4. **Multi-Slice Triangles**: Combining triangles from different sources
5. **Export Options**: CSV, JSON, and binary formats with different trade-offs

The binary trib format is Ledger's proprietary format that:
- Saves space (often several times smaller than JSON; the comparison above shows the ratio for this data)
- Loads faster, since no text has to be parsed
- Preserves all metadata and structure
- Works across Bermuda versions

### Answer Key for Exercises

**Exercise 1:**
```python
ca_triangle_csv = tri.wide_csv_to_triangle('data/excel/ca_wide.csv')
```

**Exercise 2:**
```python
pa_csv_df = pd.read_csv('data/excel/pa_array.csv')
pa_triangle_csv = tri.array_data_frame_to_triangle(
    pa_csv_df,
    field='paid_loss',
    period_resolution=12,
    eval_resolution=12
)
```

**Exercise 3:**
```python
multi_triangle = tri.array_triangle_builder(
    dfs=[paid_array, reported_array],
    fields=['paid_loss', 'reported_loss'],
    period_resolution=12,
    eval_resolution=12
)
```