# R vs Python Comparison

This notebook compares outputs from the R implementation with the Python conversion
to ensure accuracy of the migration.

## Comparison Tests

1. Helper functions (bond returns, maturity calculations)
2. Yield curve fitting (Nelson-Siegel & Svensson)
3. Swap analysis results

**Note:** To run R code directly from Python you need R installed plus the rpy2 package.
Alternatively, run the R scripts separately and compare the saved outputs.
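
For the second route, here is a minimal sketch of the comparison step. It assumes each side writes its results to CSV with a shared key column (for example `write.csv(results, 'r_results.csv', row.names = FALSE)` in R and `DataFrame.to_csv(..., index=False)` in Python). The helper `max_abs_differences` and the file layout are hypothetical and are not part of the `yield_curves` package.

```python
import io

import pandas as pd


def max_abs_differences(py_df: pd.DataFrame, r_df: pd.DataFrame, key: str) -> pd.Series:
    """Align the Python and R result tables on `key` and return the largest
    absolute difference for every numeric column they share (hypothetical helper)."""
    # validate= raises if a key appears more than once on either side
    merged = py_df.merge(r_df, on=key, suffixes=("_py", "_r"), validate="one_to_one")
    if len(merged) != len(py_df) or len(merged) != len(r_df):
        raise ValueError("Python and R outputs do not cover the same keys")

    shared = [
        col for col in py_df.columns
        if col != key and col in r_df.columns and pd.api.types.is_numeric_dtype(py_df[col])
    ]
    return pd.Series(
        {col: (merged[f"{col}_py"] - merged[f"{col}_r"]).abs().max() for col in shared}
    )


# Made-up stand-ins for the two saved files; replace with real paths
py_csv = io.StringIO("Index,Return\n1,0.0100\n2,-0.0200\n")
r_csv = io.StringIO("Index,Return\n1,0.0100\n2,-0.0200000000001\n")

diffs = max_abs_differences(pd.read_csv(py_csv), pd.read_csv(r_csv), key="Index")
print(diffs)
```

The inner merge plus the length check catches rows that exist on only one side. An outer merge would silently turn those rows into NaN, and `max()` would then skip them.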

In [1]:
import sys
sys.path.insert(0, '../src')

import numpy as np
import pandas as pd
from pathlib import Path

from yield_curves.helpers import (
    monthly_bond_return,
    round_month_to_maturity,
    generate_month_end_series
)

print("Python modules loaded successfully!")

Python modules loaded successfully!


## Test 1: Monthly Bond Return Function

Compare the Python and R implementations of the Henckel monthly bond return formula.

In [2]:
# Test case: yields from example
test_yields = np.array([2.5, 2.6, 2.4, 2.7, 2.5])

print("Test Yields:", test_yields)
print("\nPython Bond Returns:")

python_returns = []
for i in range(1, len(test_yields)):
    ret = monthly_bond_return(test_yields, i)
    python_returns.append(ret)
    print(f"  Month {i}: {ret:.6f} ({ret*100:.4f}%)")

print("\nR Implementation:")
print("To compare with R:")
print("1. Source helper_fn.R in R")
print("2. Run: yield <- c(2.5, 2.6, 2.4, 2.7, 2.5)")
print("3. Run: sapply(2:5, function(i) monthly_bond_ret(yield, i))")
print("\nExpected R output should match Python results above.")

Test Yields: [2.5 2.6 2.4 2.7 2.5]

Python Bond Returns:
  Month 1: -0.006609 (-0.6609%)
  Month 2: 0.019723 (1.9723%)
  Month 3: -0.023948 (-2.3948%)
  Month 4: 0.019720 (1.9720%)

R Implementation:
To compare with R:
1. Source helper_fn.R in R
2. Run: yield <- c(2.5, 2.6, 2.4, 2.7, 2.5)
3. Run: sapply(2:5, function(i) monthly_bond_ret(yield, i))

Expected R output should match Python results above.
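
Typing the R commands by hand makes it easy to mistype a yield or shift the index range. The sketch below generates the R statements from the same NumPy array instead. It assumes that `monthly_bond_ret` in `helper_fn.R` takes a 1-based position where the Python function takes a 0-based one, which is what the `range(1, 5)` / `2:5` pairing above implies. Both helper names are hypothetical.

```python
import numpy as np


def r_vector(values) -> str:
    """Format a 1-D array as an R c(...) literal. repr() gives the shortest
    string that round-trips, so R parses back the same doubles."""
    return "c(" + ", ".join(repr(float(v)) for v in values) + ")"


def r_sapply_call(values, func: str = "monthly_bond_ret") -> str:
    """Build the R statements that mirror `for i in range(1, len(values))`.
    Python position i is R index i + 1, so the R range is 2:len(values)."""
    return (
        f"yield <- {r_vector(values)}\n"
        f"sapply(2:{len(values)}, function(i) {func}(yield, i))"
    )


test_yields = np.array([2.5, 2.6, 2.4, 2.7, 2.5])
print(r_sapply_call(test_yields))
```

The output can be pasted straight into an R session after `source('helper_fn.R')`.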


## Test 2: Time to Maturity Calculation

Compare month-based maturity calculations.

In [3]:
# Test cases
test_cases = [
    ('2020-01-15', '2025-01-15'),  # Exactly 5 years
    ('2020-01-15', '2020-07-15'),  # 6 months
    ('2018-12-31', '2026-12-31'),  # From main.R example
]

print("Python Time to Maturity Results:")
for current, maturity in test_cases:
    ttm = round_month_to_maturity(current, maturity, num_digits=4)
    print(f"  {current} to {maturity}: {ttm:.4f} years")

print("\nTo verify in R:")
print("source('helper_fn.R')")
for current, maturity in test_cases:
    print(f"round_mo_to_maturity('{current}', '{maturity}', NUM_DIG_MATURITY=4)")

Python Time to Maturity Results:
  2020-01-15 to 2025-01-15: 5.0000 years
  2020-01-15 to 2020-07-15: 0.5000 years
  2018-12-31 to 2026-12-31: 8.0000 years

To verify in R:
source('helper_fn.R')
round_mo_to_maturity('2020-01-15', '2025-01-15', NUM_DIG_MATURITY=4)
round_mo_to_maturity('2020-01-15', '2020-07-15', NUM_DIG_MATURITY=4)
round_mo_to_maturity('2018-12-31', '2026-12-31', NUM_DIG_MATURITY=4)
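
Hand-checkable reference values help when neither implementation has been verified yet. The sketch below uses one month-based definition, assuming time to maturity is whole calendar months divided by 12, with the day of month ignored. The actual rule in `round_month_to_maturity` / `round_mo_to_maturity` may treat partial months differently, so use it only as a cross-check. The function names are hypothetical. For the three cases above it gives the same 5.0, 0.5 and 8.0 years.

```python
from datetime import date


def whole_months_between(current: str, maturity: str) -> int:
    """Count calendar months between two ISO dates, ignoring the day of month."""
    c = date.fromisoformat(current)
    m = date.fromisoformat(maturity)
    return (m.year - c.year) * 12 + (m.month - c.month)


def month_based_ttm(current: str, maturity: str, num_digits: int = 4) -> float:
    """Hypothetical reference: whole months to maturity expressed in years."""
    return round(whole_months_between(current, maturity) / 12, num_digits)


for current, maturity in [
    ("2020-01-15", "2025-01-15"),
    ("2020-01-15", "2020-07-15"),
    ("2018-12-31", "2026-12-31"),
]:
    print(current, maturity, month_based_ttm(current, maturity))
```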


## Test 3: Month-End Series Generation

Compare date series generation.

In [4]:
# Generate series matching main.R example
orig_date = '2018-12-31'
mat_date = '2026-12-31'

df_python = generate_month_end_series(
    orig_date, mat_date,
    columns=['Time_to_Maturity', 'long_bond_yield', 'short_bond_yield']
)

print(f"Python generated {len(df_python)} month-end dates")
print(f"First date: {df_python.index[0]}")
print(f"Last date: {df_python.index[-1]}")
print(f"\nFirst 5 dates:")
print(df_python.index[:5])

print("\nTo compare with R:")
print("source('helper_fn.R')")
print(f"xts_res <- generate_month_end_xts('{orig_date}', '{mat_date}', cnames=c('Value'))")
print("index(xts_res)")

Python generated 84 month-end dates
First date: 2018-12-31 00:00:00
Last date: 2025-11-30 00:00:00

First 5 dates:
DatetimeIndex(['2018-12-31', '2019-01-31', '2019-02-28', '2019-03-31',
               '2019-04-30'],
              dtype='datetime64[ns]', freq='ME')

To compare with R:
source('helper_fn.R')
xts_res <- generate_month_end_xts('2018-12-31', '2026-12-31', cnames=c('Value'))
index(xts_res)
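
Checking that the date sequences are identical is easier as a set difference than by reading two printouts. The sketch below assumes the R dates have been exported to a list of ISO strings, for example with `write.csv(data.frame(date = index(xts_res)), 'r_dates.csv', row.names = FALSE)`. `compare_date_index` is a hypothetical helper. The grid uses `pd.offsets.MonthEnd()` rather than the `'ME'` alias, which requires pandas 2.2 or later.

```python
import pandas as pd


def compare_date_index(py_dates, r_dates):
    """Return (only_in_python, only_in_r) as sorted DatetimeIndex objects.
    Both inputs may be DatetimeIndex objects or ISO date strings."""
    py_idx = pd.DatetimeIndex(py_dates).normalize()
    r_idx = pd.DatetimeIndex(r_dates).normalize()
    return py_idx.difference(r_idx), r_idx.difference(py_idx)


# Illustration only: a full month-end grid versus a copy missing its last year
full_grid = pd.date_range("2018-12-31", "2026-12-31", freq=pd.offsets.MonthEnd())
shorter = full_grid[:-12]

only_py, only_r = compare_date_index(shorter, full_grid)
print(len(full_grid), len(only_py), len(only_r))
```

Both results being empty means the sequences match. Any non-empty side lists exactly which month-ends one implementation produces and the other does not.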


## Test 4: Load and Compare Actual Data

Load the same data files and compare processing.

In [5]:
# Load FX data (same file used by both R and Python)
fx_path = Path('../data/raw/wpu_exchange_rates.csv')

if fx_path.exists():
    fx_data = pd.read_csv(fx_path, parse_dates=['Date'], index_col='Date')
    print(f"Loaded FX data: {fx_data.shape}")
    print(f"Date range: {fx_data.index.min()} to {fx_data.index.max()}")
    print(f"\nFirst few rows:")
    print(fx_data.head())
    
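    # Simple returns without forward-filling gaps, so a missing rate gives
    # NaN returns exactly as R's diff(x) / head(x, -1) gives NA
    fx_returns = fx_data['WPUUSD'].pct_change(fill_method=None)
    print("\nPython FX returns statistics:")
    print(f"  Mean: {fx_returns.mean():.6f}")
    print(f"  Std: {fx_returns.std():.6f}")
    print(f"  Min: {fx_returns.min():.6f}")
    print(f"  Max: {fx_returns.max():.6f}")
    
    print("\nCompare with R using:")
    print("library(readr)")
    print("wpu <- read_csv('data/raw/wpu_exchange_rates.csv')")
    print("wpu_returns <- diff(wpu$WPUUSD) / head(wpu$WPUUSD, -1)")
    print("summary(wpu_returns)")
    print("sd(wpu_returns, na.rm = TRUE)")  # summary() does not report the std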
else:
    print(f"FX data not found at {fx_path}")
    print("Make sure to copy wpu_exchange_rates.csv to data/raw/")

Loaded FX data: (9082, 12)
Date range: 1989-11-30 00:00:00 to 2025-07-29 00:00:00

First few rows:
            WPUUSD  WPUEUR  WPUGBP  WPUAUD    WPUJPY  WPUCHF  WPUCAD  WPUCNY  \
Date                                                                           
1989-11-30  0.7907  0.6937  0.5042  1.0122  113.1182  1.2606  0.9199     NaN   
1989-12-01  0.7900  0.6938  0.5051  1.0125  113.1698  1.2627  0.9222     NaN   
1989-12-04  0.7898  0.6918  0.5055  1.0112  113.4541  1.2637  0.9197     NaN   
1989-12-05  0.7909  0.6919  0.5047  1.0146  113.9400  1.2567  0.9195     NaN   
1989-12-06  0.7905  0.6904  0.5045  1.0149  114.4042  1.2612  0.9191     NaN   

             WPUINR  WPUBRL  WPURUB  WPUMXN  
Date                                         
1989-11-30  13.3399     0.0     NaN     NaN  
1989-12-01  13.3268     0.0     NaN     NaN  
1989-12-04  13.3547     0.0     NaN     NaN  
1989-12-05  13.3523     0.0     NaN     NaN  
1989-12-06  13.3635     0.0     NaN     NaN  

Python FX returns
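
Gaps in the rate series are the most likely source of a Python/R mismatch here. R's `diff(x) / head(x, -1)` turns one missing rate into two `NA` returns. A forward-filled series (the older pandas `pct_change` default) instead turns the gap into a 0% return followed by a return that spans the gap. The sketch below shows the difference on made-up prices.

```python
import numpy as np
import pandas as pd

# Made-up prices with one missing observation
prices = pd.Series([1.0, 1.1, np.nan, 1.21, 1.331])

# R-style simple returns: (x[t] - x[t-1]) / x[t-1]; the NaN spoils two returns
r_style = prices.diff() / prices.shift(1)

# pandas without filling reproduces the R behaviour
no_fill = prices.pct_change(fill_method=None)

# Forward-filling first invents a 0% return and a return spanning the gap
ffilled = prices.ffill().pct_change()

print(pd.DataFrame({"r_style": r_style, "no_fill": no_fill, "ffilled": ffilled}))
```

Both pandas `mean()`/`std()` and R's `summary()` skip missing values, so the summary statistics agree only when the NaN positions agree.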


## Test 5: Numerical Precision Check

Check whether Python and R produce the same numerical results to within a small floating-point tolerance.

In [6]:
# Create a comprehensive test case
print("Comprehensive Bond Return Test")
print("=" * 60)

test_yields = np.array([2.0, 2.1, 2.2, 2.3, 2.4, 2.5])

print(f"\nYields: {test_yields}")
print(f"\nPython Results:")

results_df = pd.DataFrame({
    'Index': range(1, len(test_yields)),
    'Prior_Yield': test_yields[:-1],
    'Current_Yield': test_yields[1:],
    'Return': [monthly_bond_return(test_yields, i) for i in range(1, len(test_yields))]
})

print(results_df.to_string(index=False))

print("\n" + "="*60)
print("To replicate in R:")
print("""source('helper_fn.R')
yields <- c(2.0, 2.1, 2.2, 2.3, 2.4, 2.5)
results <- data.frame(
  Index = 1:5,
  Prior_Yield = yields[-length(yields)],
  Current_Yield = yields[-1],
  Return = sapply(2:6, function(i) monthly_bond_ret(yields, i))
)
print(results, digits=10)""")

print("\nExpected: Differences should be < 1e-10 (floating point precision)")

Comprehensive Bond Return Test

Yields: [2.  2.1 2.2 2.3 2.4 2.5]

Python Results:
 Index  Prior_Yield  Current_Yield    Return
     1          2.0            2.1 -0.007244
     2          2.1            2.2 -0.007116
     3          2.2            2.3 -0.006989
     4          2.3            2.4 -0.006862
     5          2.4            2.5 -0.006735

To replicate in R:
source('helper_fn.R')
yields <- c(2.0, 2.1, 2.2, 2.3, 2.4, 2.5)
results <- data.frame(
  Index = 1:5,
  Prior_Yield = yields[-length(yields)],
  Current_Yield = yields[-1],
  Return = sapply(2:6, function(i) monthly_bond_ret(yields, i))
)
print(results, digits=10)

Expected: Differences should be < 1e-10 (floating point precision)
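
The table above is printed to six decimals, which cannot reveal a difference at the 1e-10 level. The sketch below makes the check explicit, assuming the R returns have been copied or loaded into an array in the same order. `check_against_r` is a hypothetical helper. Note that 1e-10 is a chosen tolerance, far looser than double-precision rounding (about 2.2e-16 relative). That slack allows the two languages to evaluate the formula in a different order.

```python
import numpy as np
import pandas as pd


def check_against_r(py_values, r_values, atol=1e-10):
    """Raise AssertionError if any |py - r| exceeds atol; return the largest gap."""
    py = np.asarray(py_values, dtype=float)
    r = np.asarray(r_values, dtype=float)
    # rtol=0 turns assert_allclose into a pure absolute-tolerance check
    np.testing.assert_allclose(py, r, rtol=0, atol=atol)
    return float(np.max(np.abs(py - r)))


# Exact equality is too strict: the same value reached by a different
# sequence of operations can differ in the last bit
print(0.1 + 0.2 == 0.3)
print(check_against_r([0.1 + 0.2], [0.3]))

# Six decimals hide a 1e-11 gap; twelve decimals reveal it
gap = pd.DataFrame({"Python": [0.1], "R": [0.1 + 1e-11]})
print(gap.to_string(index=False, float_format=lambda x: f"{x:.12f}"))
```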


## Summary

Run the R code snippets shown above in RStudio and compare with the Python outputs.

### Expected Results
- Bond returns should match to ~10 decimal places
- Time to maturity should match exactly
- Date sequences should be identical
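- FX return statistics should agree to floating-point tolerance, provided both sides treat missing rates the same way (no forward-filling before computing returns)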

### Next Steps
1. Run original `main.R` to generate reference outputs
2. Run Python notebooks to generate new outputs
3. Compare final results (swap returns, cumulative performance)
4. Document any differences found
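
For step 3, small per-month differences compound in cumulative performance. Those series are therefore better compared with a relative tolerance plus a small absolute floor for values near zero. The sketch below assumes simple monthly returns and uses made-up values. `cumulative_performance` is a hypothetical helper, not part of `main.R` or the Python package.

```python
import numpy as np


def cumulative_performance(monthly_returns):
    """Cumulative simple return after each month: prod(1 + r) - 1."""
    r = np.asarray(monthly_returns, dtype=float)
    return np.cumprod(1.0 + r) - 1.0


# Made-up monthly returns standing in for the Python and R swap series
py_monthly = np.array([0.01, -0.02, 0.03])
r_monthly = py_monthly + 1e-12

py_cum = cumulative_performance(py_monthly)
r_cum = cumulative_performance(r_monthly)

# Relative tolerance for large cumulative values, absolute floor near zero
np.testing.assert_allclose(py_cum, r_cum, rtol=1e-9, atol=1e-12)
print(py_cum)
```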