# Teradata SpoolSpace History Report

This notebook demonstrates how to retrieve spool space usage history from Teradata PDCR data using the `PDCRInfoReport` class.

**Report Parameters:**
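- Database filter: `DWP01%` (intended to match names starting with DWP01; in the run recorded here the pattern is defined but not passed to the query, which matched all users with `%`)
- Time range: Last 3 years requested (the returned rows start at 2024-12-07)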
- Data source: `PDCRINFO.SpoolSpace_Hst`

## 1. Import Required Libraries

Import necessary libraries for PDCR reporting and data analysis.

In [1]:
import logging
import sys
from pathlib import Path
from datetime import date, timedelta
import pandas as pd

# Add the project root (current working directory) to the path so the `src` package is importable
sys.path.insert(0, str(Path.cwd()))

# Import the reporting module
from src.reports import PDCRInfoReport
from src.connection import TeradataConnectionError

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

print("✓ Libraries imported successfully!")

✓ Libraries imported successfully!


## 2. Configure Date Range

Calculate the date range for the last 3 years of data.

In [2]:
# Calculate last 3 years date range
end_date = date.today() - timedelta(days=1)  # Yesterday
start_date = end_date - timedelta(days=3*365)   # 3 years ago

# Database filter pattern
database_pattern = "DWP01%"

print(f"Date Range:")
print(f"  Start Date: {start_date}")
print(f"  End Date:   {end_date}")
print(f"  Database Pattern: {database_pattern}")
print(f"  Days: {(end_date - start_date).days + 1}")

Date Range:
  Start Date: 2023-01-15
  End Date:   2026-01-14
  Database Pattern: DWP01%
  Days: 1096
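Subtracting `3*365` days ignores leap days, so the start date can land one day away from the same calendar day three years earlier. The sketch below compares the two approaches; `years_before` is a hypothetical helper, not part of the project, and the end date is hard-coded to the value printed above.

```python
from datetime import date, timedelta


def years_before(d: date, years: int) -> date:
    """Return the same calendar day `years` earlier (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        # Only reachable for Feb 29 when the target year is not a leap year
        return d.replace(year=d.year - years, day=28)


end_date = date(2026, 1, 14)  # end date from the output above
approx_start = end_date - timedelta(days=3 * 365)  # approach used in the cell
exact_start = years_before(end_date, 3)            # calendar-exact alternative

print(approx_start)  # 2023-01-15
print(exact_start)   # 2023-01-14
```

For a report like this a one-day difference is usually harmless; the calendar-exact form mainly matters when the range has to line up with other year-based reports.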


## 3. Initialize PDCR Report Generator

Create an instance of the `PDCRInfoReport` class to access PDCR data.

In [3]:
try:
    # Initialize the report generator
    report = PDCRInfoReport()
    print("✓ PDCRInfoReport initialized successfully")
    
    # List available environments
    environments = report.conn_mgr.list_environments()
    print(f"✓ Available environments: {environments}")
    
except TeradataConnectionError as e:
    print(f"✗ Connection Error: {e}")
    print("\nPlease ensure:")
    print("1. td_env.yaml file exists in the project root")
    print("2. Copy td_env.yaml.template to td_env.yaml")
    print("3. Update credentials for your test/prod environments")

2026-01-15 16:27:57,937 - src.connection - INFO - Loaded configuration for: ['test', 'prod']


✓ PDCRInfoReport initialized successfully
✓ Available environments: ['test', 'prod']


## 4. Retrieve SpoolSpace History Data

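Query `PDCRINFO.SpoolSpace_Hst` for the last 3 years. Note that `database_pattern` is not passed to `get_spoolspace_history` in this call, and the logged bind parameters show both `LIKE` filters set to `%`, so rows for all users and accounts are returned rather than only `DWP01%`.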

In [15]:
try:
    # Retrieve spoolspace history
    df = report.get_spoolspace_history(
        env_name='test',  # Change to 'prod' for production data
        start_date=start_date,
        end_date=end_date
    )
    
    print(f"✓ Retrieved {len(df):,} rows from PDCRINFO.SpoolSpace_Hst")
    print(f"\nDataFrame Shape: {df.shape}")
    print(f"Memory Usage: {df.memory_usage(deep=True).sum() / 1024**2:.2f} MB")
    
except Exception as e:
    print(f"✗ Error retrieving spoolspace data: {e}")
    df = None

2026-01-15 16:28:25,746 - src.reports - INFO - Query Text: 
        SELECT
            LogDate, UserName, AccountName, CURRENTSPOOL, PEAKSPOOL,
		MAXSPOOL, PEAKSPOOLSKEW, CURRENTTEMP, PEAKTEMP, MAXTEMP, PEAKTEMPSKEW
        FROM PDCRINFO.SpoolSpace_Hst
        WHERE Logdate BETWEEN :start_date AND :end_date
          AND TRIM(UserName) LIKE :user_name
          AND TRIM(AccountName) LIKE :account_name
        ORDER BY 1, 2, 3;
        
2026-01-15 16:28:25,747 - src.reports - INFO - Fetching SpoolSpace history for test between 2023-01-15 and 2026-01-14
2026-01-15 16:28:25,748 - src.connection - INFO - Connection string: teradatasql://@teradw/ALL?logmech=BROWSER
2026-01-15 16:28:35,695 - sqlalchemy.engine.Engine - INFO - SELECT dbc.dbcinfov."InfoData"
FROM dbc.dbcinfov
WHERE dbc.dbcinfov."InfoKey" = ?
2026-01-15 16:28:35,698 - sqlalchemy.engine.Engine - INFO - [dialect teradatasql+teradatasql does not support caching 0.00273s] ('VERSION',)
2026-01-15 16:28:35,969 - sqlalchemy.engine.Engine - INFO - select database
2026-01-15 16:28:35,972 - sqlalchemy.engine.Engine - INFO - [dialect teradatasql+teradatasql does not support caching 0.00239s] ()
2026-01-15 16:28:36,057 - sqlalchemy.engine.Engine - INFO - BEGIN (implicit)
2026-01-15 16:28:36,060 - sqlalchemy.engine.Engine - INFO - SELECT 1
2026-01-15 16:28:36,062 - sqlalchemy.engine.Engine - INFO - [dialect teradatasql+teradatasql does not support caching 0.00421s] ()
2026-01-15 16:28:36,104 - sqlalchemy.engine.Engine - INFO - ROLLBACK
2026-01-15 16:28:36,149 - src.connection - INFO - Created connection to 'test' environment
2026-01-15 16:28:36,196 - sqlalchemy.engine.Engine - INFO - BEGIN (implicit)
2026-01-15 16:28:36,200 - sqlalchemy.engine.Engine - INFO - SELECT dbc."tablesV"."TableName"
FROM dbc."tablesV"
WHERE DatabaseName (NOT CASESPECIFIC) = ? (NOT CASESPECIFIC) AND TableName=? AND TableKind IN ('O', 'Q', 'T', 'V')
2026-01-15 16:28:36,206 - sqlalchemy.engine.Engine - INFO - [dialect teradatasql+teradatasql does not support caching 0.01035s] ('ALL', <sqlalchemy.sql.elements.TextClause object at 0x000002BCD0DECCD0>)
2026-01-15 16:28:36,212 - sqlalchemy.engine.Engine - INFO - 
        SELECT
            LogDate, UserName, AccountName, CURRENTSPOOL, PEAKSPOOL,
            MAXSPOOL, PEAKSPOOLSKEW, CURRENTTEMP, PEAKTEMP, MAXTEMP, PEAKTEMPSKEW
        FROM PDCRINFO.SpoolSpace_Hst
        WHERE Logdate BETWEEN ? AND ?
          AND TRIM(UserName) LIKE ?
          AND TRIM(AccountName) LIKE ?
        ORDER BY 1, 2, 3;
2026-01-15 16:28:36,219 - sqlalchemy.engine.Engine - INFO - [dialect teradatasql+teradatasql does not support caching 0.00726s] ('2023-01-15', '2026-01-14', '%', '%')
2026-01-15 16:28:42,099 - sqlalchemy.engine.Engine - INFO - ROLLBACK

✓ Retrieved 34,054 rows from PDCRINFO.SpoolSpace_Hst

DataFrame Shape: (34054, 11)
Memory Usage: 7.83 MB
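The bind parameters logged above (`'%', '%'`) indicate that the `DWP01%` pattern was not applied by the query. One way to narrow the result afterwards is to filter the DataFrame client-side. The following is a minimal sketch on a hypothetical sample; `filter_like_prefix` is an illustrative helper, not part of `PDCRInfoReport`, and it only emulates a trailing-`%` pattern with case-insensitive matching (an assumption about the intended `LIKE` behavior).

```python
import pandas as pd

# Hypothetical sample with a few UserName values in the style of the real data
sample = pd.DataFrame({
    "UserName": ["DWP01U_STG_ODI ", "rbhattacharya", "dwp01u_acc_tbu_orr", "DM_USER_CLD"],
    "CURRENTSPOOL": [12.0, 75.0, 0.06, 0.0],
})


def filter_like_prefix(frame: pd.DataFrame, column: str, pattern: str) -> pd.DataFrame:
    """Emulate a trailing-% LIKE pattern (e.g. 'DWP01%'), ignoring case and padding."""
    prefix = pattern.rstrip("%").upper()
    names = frame[column].astype(str).str.strip().str.upper()
    return frame[names.str.startswith(prefix)]


filtered = filter_like_prefix(sample, "UserName", "DWP01%")
print(filtered["UserName"].tolist())
```

Filtering in the database is normally preferable because it reduces the rows transferred; this client-side form is only a fallback when the query call cannot be changed.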


## 5. Display Sample Data

Preview the first few rows to understand the data structure.

In [16]:
if df is not None and not df.empty:
    print("First 10 rows:")
    display(df.head(10))
    
    print("\nColumn Data Types:")
    print(df.dtypes)
else:
    print("No data available to display.")

First 10 rows:


| | LogDate | UserName | AccountName | CURRENTSPOOL | PEAKSPOOL | MAXSPOOL | PEAKSPOOLSKEW | CURRENTTEMP | PEAKTEMP | MAXTEMP | PEAKTEMPSKEW |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2024-12-07 | ajukes | $L00DUSR&D&H | 0.0 | 33017680000.0 | 522122500000.0 | 1.52733 | 0.0 | 0.0 | 524288000.0 | |
| 1 | 2024-12-07 | baradmin | DBC | 0.0 | 337567700.0 | 536870900000.0 | 89.087129 | 0.0 | 0.0 | 524288000.0 | |
| 2 | 2024-12-07 | DM_USER_CLD | $L00TDDM&D&H | 0.0 | 465506300.0 | 3221225000000.0 | 69.943033 | 0.0 | 0.0 | 0.0 | |
| 3 | 2024-12-07 | DM_USER_CLD_01 | $L00TDDM&D&H | 0.0 | 7072883000.0 | 3221225000000.0 | 0.2333 | 0.0 | 0.0 | 0.0 | |
| 4 | 2024-12-07 | DM_USER_CLD_02 | $L00TDDM&D&H | 0.0 | 32301060.0 | 3221225000000.0 | 94.547843 | 0.0 | 0.0 | 0.0 | |
| 5 | 2024-12-07 | DM_USER_CLD_03 | $L00TDDM&D&H | 0.0 | 5224677000.0 | 3221225000000.0 | 0.585782 | 0.0 | 0.0 | 0.0 | |
| 6 | 2024-12-07 | DM_USER_CLD_04 | $L00TDDM&D&H | 0.0 | 13561030000.0 | 3221225000000.0 | 0.658228 | 0.0 | 0.0 | 0.0 | |
| 7 | 2024-12-07 | DM_USER_CLD_05 | $L00TDDM&D&H | 0.0 | 754126800.0 | 3221225000000.0 | 1.345486 | 0.0 | 0.0 | 0.0 | |
| 8 | 2024-12-07 | DM_USER_CLD_06 | $L00TDDM&D&H | 0.0 | 3015135000.0 | 3221225000000.0 | 1.542303 | 0.0 | 0.0 | 0.0 | |
| 9 | 2024-12-07 | DM_USER_CLD_07 | $L00TDDM&D&H | 0.0 | 6807446000.0 | 3221225000000.0 | 0.565386 | 0.0 | 0.0 | 0.0 | |



Column Data Types:
LogDate           object
UserName          object
AccountName       object
CURRENTSPOOL     float64
PEAKSPOOL        float64
MAXSPOOL         float64
PEAKSPOOLSKEW    float64
CURRENTTEMP      float64
PEAKTEMP         float64
MAXTEMP          float64
PEAKTEMPSKEW      object
dtype: object
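The dtype listing shows `LogDate` and `PEAKTEMPSKEW` as `object`, which can get in the way of date arithmetic and numeric aggregation. A minimal sketch of converting both columns, using a small hypothetical sample (the string value `"1.5"` is invented for illustration; the preview above only shows empty `PEAKTEMPSKEW` values):

```python
from datetime import date

import pandas as pd

# Hypothetical sample: date objects and a mixed None/string skew column
sample = pd.DataFrame({
    "LogDate": [date(2024, 12, 7), date(2024, 12, 8)],
    "PEAKSPOOL": [33017680000.0, 337567700.0],
    "PEAKTEMPSKEW": [None, "1.5"],
})

typed = sample.copy()
# Convert Python date objects to a datetime64 column
typed["LogDate"] = pd.to_datetime(typed["LogDate"])
# Convert to float; values that cannot be parsed become NaN
typed["PEAKTEMPSKEW"] = pd.to_numeric(typed["PEAKTEMPSKEW"], errors="coerce")

print(typed.dtypes)
```

With a `datetime64` column, date-based operations such as resampling or plotting on a time axis tend to behave more predictably than with `object` values.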


## 6. Data Summary Statistics

Analyze the spoolspace usage across all retrieved data.

In [20]:
if df is not None and not df.empty:
    print("=" * 80)
    print("SPOOLSPACE SUMMARY STATISTICS")
    print("=" * 80)
    
    # Date range
    print(f"\nDate Range:")
    print(f"  First Log Date: {df['LogDate'].min()}")
    print(f"  Last Log Date:  {df['LogDate'].max()}")
    print(f"  Unique Dates:   {df['LogDate'].nunique()}")
    
    # Database coverage
    print(f"\nDatabase Coverage:")
    print(f"  Unique Users: {df['UserName'].nunique()}")
    print(f"  Unique Accounts:  {df['AccountName'].nunique()}")
    
    # Space usage statistics (in bytes, convert to GB)
    print(f"\nCurrent Spool Usage (GB):")
    print(f"  Total:   {df['CURRENTSPOOL'].sum() / 1024**3:,.2f}")
    print(f"  Mean:    {df['CURRENTSPOOL'].mean() / 1024**3:,.2f}")
    print(f"  Median:  {df['CURRENTSPOOL'].median() / 1024**3:,.2f}")
    print(f"  Max:     {df['CURRENTSPOOL'].max() / 1024**3:,.2f}")
    
    print(f"\nPeak Spool Usage (GB):")
    print(f"  Total:   {df['PEAKSPOOL'].sum() / 1024**3:,.2f}")
    print(f"  Mean:    {df['PEAKSPOOL'].mean() / 1024**3:,.2f}")
    print(f"  Median:  {df['PEAKSPOOL'].median() / 1024**3:,.2f}")
    print(f"  Max:     {df['PEAKSPOOL'].max() / 1024**3:,.2f}")
    
    print(f"\nMax Spool Usage (GB):")
    print(f"  Total:   {df['MAXSPOOL'].sum() / 1024**3:,.2f}")
    print(f"  Mean:    {df['MAXSPOOL'].mean() / 1024**3:,.2f}")
    print(f"  Median:  {df['MAXSPOOL'].median() / 1024**3:,.2f}")
    print(f"  Max:     {df['MAXSPOOL'].max() / 1024**3:,.2f}")
else:
    print("No data available for analysis.")

SPOOLSPACE SUMMARY STATISTICS

Date Range:
  First Log Date: 2024-12-07
  Last Log Date:  2026-01-14
  Unique Dates:   403

Database Coverage:
  Unique Users: 273
  Unique Accounts:  9

Current Spool Usage (GB):
  Total:   124,311.54
  Mean:    3.65
  Median:  0.00
  Max:     1,879.88

Peak Spool Usage (GB):
  Total:   4,628,820.04
  Mean:    135.93
  Median:  10.75
  Max:     3,475.17

Max Spool Usage (GB):
  Total:   60,995,698.26
  Mean:    1,791.15
  Median:  1,000.00
  Max:     49,233.62
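The "Total" figures above add up every row across all log dates, so they are sums over daily snapshots rather than a point-in-time total. The sketch below, on a hypothetical two-day sample, shows the same four statistics computed for several columns in one `agg` call, followed by a per-day total that may be easier to interpret.

```python
import pandas as pd

GB = 1024**3

# Hypothetical sample: two users observed on two log dates
sample = pd.DataFrame({
    "LogDate": ["2026-01-13", "2026-01-13", "2026-01-14", "2026-01-14"],
    "UserName": ["a", "b", "a", "b"],
    "CURRENTSPOOL": [1 * GB, 3 * GB, 2 * GB, 4 * GB],
    "PEAKSPOOL": [2 * GB, 6 * GB, 4 * GB, 8 * GB],
})

# Sum, mean, median and max for several columns at once, converted to GB
stats_gb = sample[["CURRENTSPOOL", "PEAKSPOOL"]].agg(["sum", "mean", "median", "max"]) / GB
print(stats_gb.round(2))

# Per-day totals: each LogDate is one snapshot of all users
daily_total_gb = sample.groupby("LogDate")["CURRENTSPOOL"].sum() / GB
print(daily_total_gb)
```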


## 7. Top Users by Current Spool Usage

Identify the users with the highest current spool space usage, based on each user's most recent log entry.

In [32]:
if df is not None and not df.empty:
    # Get the most recent data for each user
    latest_data = df.loc[df.groupby('UserName')['LogDate'].idxmax()]
    print("columns:", latest_data.columns.tolist())
    # Sort by current spool usage
    top_dbs = latest_data.nlargest(20, 'CURRENTSPOOL')[[
        'UserName', 'AccountName', 
        'CURRENTSPOOL', 'PEAKSPOOL', 'MAXSPOOL', 'PEAKSPOOLSKEW', 'CURRENTTEMP', 'PEAKTEMP', 'MAXTEMP', 'PEAKTEMPSKEW'
    ]].copy()
    
    # Convert to GB for readability
    # Rounding will be done during display
    top_dbs['CURRENTSPOOL_GB'] = top_dbs['CURRENTSPOOL'] / 1024**3
    top_dbs['PEAKSPOOL_GB'] = top_dbs['PEAKSPOOL'] / 1024**3
    top_dbs['MAXSPOOL_GB'] = top_dbs['MAXSPOOL'] / 1024**3
    
    print("\nTop 20 Users by Current Spool Space Usage:")
    print("=" * 120)
    display(top_dbs[[
        'UserName', 'CURRENTSPOOL_GB', 'PEAKSPOOL_GB', 'MAXSPOOL_GB'
    ]].sort_values('CURRENTSPOOL_GB', ascending=False).round(2))
else:
    print("No data available for user ranking.")

columns: ['LogDate', 'UserName', 'AccountName', 'CURRENTSPOOL', 'PEAKSPOOL', 'MAXSPOOL', 'PEAKSPOOLSKEW', 'CURRENTTEMP', 'PEAKTEMP', 'MAXTEMP', 'PEAKTEMPSKEW']

Top 20 Users by Current Spool Space Usage:


| | UserName | CURRENTSPOOL_GB | PEAKSPOOL_GB | MAXSPOOL_GB |
|---|---|---|---|---|
| 34033 | rbhattacharya | 75.7 | 101.11 | 486.26 |
| 34003 | DWP01U_STG_ODI | 11.81 | 1836.27 | 2862.65 |
| 34037 | rranjan | 1.28 | 24.6 | 486.26 |
| 33424 | mdas | 1.23 | 0.85 | 486.26 |
| 33985 | DWP01U_ACC_TBU_ORR | 0.06 | 1038.36 | 2862.65 |
| 34035 | rlister | 0.02 | 0.16 | 600.0 |
| 33319 | gganguly | 0.02 | 0.02 | 486.26 |
| 27161 | amishra2 | 0.01 | 0.01 | 486.26 |
| 33870 | amraj | 0.0 | 438.1 | 486.26 |
| 33612 | hbansal | 0.0 | 0.02 | 486.26 |
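Selecting the latest row per user with `idxmax` means that a user whose last entry is older than the final log date still appears in the ranking with that older value. If only the most recent snapshot is of interest, filtering on the maximum `LogDate` is an alternative. A minimal sketch on hypothetical data (the user names are invented):

```python
import pandas as pd

# Hypothetical sample: one user was last logged several days before the others
sample = pd.DataFrame({
    "LogDate": pd.to_datetime(["2026-01-10", "2026-01-14", "2026-01-14"]),
    "UserName": ["inactive_user", "active_user", "other_user"],
    "CURRENTSPOOL": [5.0, 1.0, 2.0],
})

# Latest row per user (the approach used in the cell above)
latest_per_user = sample.loc[sample.groupby("UserName")["LogDate"].idxmax()]

# Rows from the most recent log date only
latest_snapshot = sample[sample["LogDate"] == sample["LogDate"].max()]

print(len(latest_per_user))  # 3
print(len(latest_snapshot))  # 2
```

Which variant fits depends on the question: the per-user form answers "what was each user's last known usage", the snapshot form answers "what was in use on the last logged day".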


## 8. User-Level Aggregation

Summarize the most recent spool usage for each user.

In [30]:
if df is not None and not df.empty:
    # Get most recent data
    latest_data = df.loc[df.groupby('UserName')['LogDate'].idxmax()]
    
    # Aggregate by user
    user_summary = latest_data.groupby('UserName').agg({
        'CURRENTSPOOL': 'first',
        'PEAKSPOOL': 'first',
        'MAXSPOOL': 'first'
    }).round(2)
    
    # Convert to GB
    user_summary['CURRENTSPOOL_GB'] = (user_summary['CURRENTSPOOL'] / 1024**3).round(2)
    user_summary['PEAKSPOOL_GB'] = (user_summary['PEAKSPOOL'] / 1024**3).round(2)
    user_summary['MAXSPOOL_GB'] = (user_summary['MAXSPOOL'] / 1024**3).round(2)
    
    # Sort by current usage
    user_summary = user_summary.sort_values('CURRENTSPOOL_GB', ascending=False)
    
    print("\nUser-Level Spool Space Usage Summary (Latest):")
    print("=" * 100)
    display(user_summary[['CURRENTSPOOL_GB', 'PEAKSPOOL_GB', 'MAXSPOOL_GB']])
    
    print(f"\nTotal Current Spool Space Across All Users (Latest): {user_summary['CURRENTSPOOL_GB'].sum():,.2f} GB")
else:
    print("No data available for user aggregation.")


User-Level Spool Space Usage Summary (Latest):


| UserName | CURRENTSPOOL_GB | PEAKSPOOL_GB | MAXSPOOL_GB |
|---|---|---|---|
| rbhattacharya | 75.70 | 101.11 | 486.26 |
| DWP01U_STG_ODI | 11.81 | 1836.27 | 2862.65 |
| rranjan | 1.28 | 24.60 | 486.26 |
| mdas | 1.23 | 0.85 | 486.26 |
| DWP01U_ACC_TBU_ORR | 0.06 | 1038.36 | 2862.65 |
| ... | ... | ... | ... |
| DM_USER_CLD_07 | 0.00 | 0.00 | 3000.00 |
| DM_USER_CLD_08 | 0.00 | 0.01 | 3000.00 |
| DM_USER_CLD_09 | 0.00 | 0.01 | 3000.00 |
| DM_USER_CLD_10 | 0.00 | 0.03 | 3000.00 |



Total Current Spool Space Across All Users (Latest): 90.13 GB


## 9. Visualize User Spool Usage

Visualize the top users by current spool space usage using a bar chart.

In [9]:
import matplotlib.pyplot as plt
import seaborn as sns

if df is not None and not df.empty:
    # Get the most recent row for each user
    # (the result set has a UserName column, not DatabaseName)
    latest_data = df.loc[df.groupby('UserName')['LogDate'].idxmax()]

    # Get top 10 users
    top_dbs = latest_data.nlargest(10, 'CURRENTSPOOL')[['UserName', 'CURRENTSPOOL']].copy()
    top_dbs['CURRENTSPOOL_GB'] = top_dbs['CURRENTSPOOL'] / 1024**3

    # Plot
    plt.figure(figsize=(12, 6))
    sns.barplot(
        data=top_dbs,
        x='UserName',
        y='CURRENTSPOOL_GB',
        palette='viridis'
    )
    plt.title('Top 10 Users by Current Spool Space Usage (GB)')
    plt.xlabel('User Name')
    plt.ylabel('Current Spool Space (GB)')
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()

## 10. Spool Usage Trends Over Time

Plot the total current spool usage across all users for each log date in the retrieved range.

In [10]:
# Plot total spool usage over time
import matplotlib.pyplot as plt

if df is not None and not df.empty:
    df_databases_over_time = df.groupby(['LogDate'])['CURRENTSPOOL'].sum().reset_index()
    plt.figure(figsize=(14, 6))
    plt.plot(df_databases_over_time['LogDate'], df_databases_over_time['CURRENTSPOOL'] / 1024**3, marker='o', linewidth=2)
    plt.title('Total Current Spool Space Usage Over 3 Years')
    plt.xlabel('Log Date')
    plt.ylabel('Total Current Spool Space (GB)')
    plt.grid()
    plt.tight_layout()
    plt.show()

## 11. User Spool Usage Distribution

Pie chart showing how current spool usage is distributed among the top users.

In [11]:
# Plot a pie chart of current spool usage per user
import matplotlib.pyplot as plt

if df is not None and not df.empty:
    # Most recent row per user (the result set has UserName, not DatabaseName)
    df_latest = df.loc[df.groupby('UserName')['LogDate'].idxmax()]
    df_db_usage = df_latest.groupby('UserName')['CURRENTSPOOL'].sum().reset_index()
    df_db_usage = df_db_usage.sort_values('CURRENTSPOOL', ascending=False)
    
    top_n = 5
    df_top = df_db_usage.head(top_n)
    # Group everything after the top N into a single "Other" slice (positional slice)
    df_other = pd.DataFrame({
        'UserName': ['Other'],
        'CURRENTSPOOL': [df_db_usage['CURRENTSPOOL'].iloc[top_n:].sum()]
    })
    df_pie = pd.concat([df_top, df_other])
    
    plt.figure(figsize=(10, 8))
    plt.pie(
        df_pie['CURRENTSPOOL'],
        labels=df_pie['UserName'],
        autopct='%1.1f%%',
        startangle=140
    )
    plt.title('Current Spool Space Usage by User (Latest)')
    plt.show()

## 12. Top User Spool Analysis

Analyze spool usage trends for the top 6 users over time.

In [12]:
# Plot the spool usage of the top 6 users over time in subplots
import matplotlib.pyplot as plt
import numpy as np

if df is not None and not df.empty:
    # Most recent row per user (the result set has UserName, not DatabaseName)
    df_latest = df.loc[df.groupby('UserName')['LogDate'].idxmax()]
    top_dbs = df_latest.nlargest(6, 'CURRENTSPOOL')['UserName'].tolist()
    
    fig, axes = plt.subplots(3, 2, figsize=(16, 12))
    axes = axes.flatten()
    
    for i, db_name in enumerate(top_dbs):
        ax = axes[i]
        df_db = df[df['UserName'] == db_name].sort_values('LogDate').reset_index(drop=True)
        
        # Plot the data
        ax.plot(df_db.index, df_db['CURRENTSPOOL'] / 1024**3, marker='o', label='Current Spool', linewidth=2)
        ax.plot(df_db.index, df_db['PEAKSPOOL'] / 1024**3, marker='s', label='Peak Spool', linewidth=2, alpha=0.7)
        
        # Add a regression line for trend
        if len(df_db) > 1:
            z = np.polyfit(df_db.index, df_db['CURRENTSPOOL'] / 1024**3, 1)
            p = np.poly1d(z)
            ax.plot(df_db.index, p(df_db.index), "r--", alpha=0.7, label='Trend', linewidth=2)
        
        # Set x-axis labels to show dates
        ax.set_xticks(df_db.index[::max(1, len(df_db)//5)])
        ax.set_xticklabels([str(d) for d in df_db.loc[df_db.index[::max(1, len(df_db)//5)], 'LogDate']], rotation=45)
        
        ax.set_title(f'{db_name} - Spool Usage Over Time')
        ax.set_xlabel('Log Date')
        ax.set_ylabel('Spool Space (GB)')
        ax.grid()
        ax.legend()
    
    plt.tight_layout()
    plt.show()
else:
    print("No data available for spool analysis.")

No data available for spool analysis.
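The trend line in this cell is fitted against the row position, which treats consecutive rows as equally spaced even when log dates are missing. A possible alternative, sketched here on hypothetical data, is to fit against the number of days elapsed since the first observation so that gaps are reflected in the slope.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a gap between the second and third log dates
sample = pd.DataFrame({
    "LogDate": pd.to_datetime(["2026-01-01", "2026-01-02", "2026-01-10"]),
    "CURRENTSPOOL_GB": [1.0, 2.0, 10.0],
})
usage = sample["CURRENTSPOOL_GB"].to_numpy()

# Fit against elapsed days: slope is GB per calendar day
days = (sample["LogDate"] - sample["LogDate"].min()).dt.days.to_numpy()
slope_by_day, intercept_by_day = np.polyfit(days, usage, 1)

# Fit against row position (as in the cell above): slope is GB per row
slope_by_row, _ = np.polyfit(np.arange(len(sample)), usage, 1)

print(round(float(slope_by_day), 2), round(float(slope_by_row), 2))
```

In this sample the per-row fit reports a much steeper slope than the per-day fit, because the eight-day gap is compressed into a single step.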


## 13. Export Results to CSV (Optional)

Save the results to CSV files for further analysis or reporting.

In [13]:
if df is not None and not df.empty:
    # Create output directory if it doesn't exist
    output_dir = Path('output')
    output_dir.mkdir(exist_ok=True)
    
    # Generate filename with date range
    filename = f"spoolspace_dwp01_{start_date}_{end_date}.csv"
    output_path = output_dir / filename
    
    # Save to CSV
    df.to_csv(output_path, index=False)
    print(f"✓ Data exported to: {output_path}")
    print(f"  Rows: {len(df):,}")
    print(f"  File size: {output_path.stat().st_size / 1024**2:.2f} MB")
else:
    print("No data to export.")

No data to export.


## 14. Close Connections

Properly clean up database connections when done.

In [14]:
try:
    if 'report' in locals():
        report.close()
        print("✓ All database connections closed successfully")
    else:
        print("No report instance to close")
except Exception as e:
    print(f"✗ Error closing connections: {e}")

✓ All database connections closed successfully
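When the same steps run as a script rather than cell by cell, `contextlib.closing` can guarantee that `close()` is called even if a query raises. The sketch below uses a hypothetical `FakeReport` stand-in, since only the existence of a `close()` method on `PDCRInfoReport` is known from the cell above.

```python
from contextlib import closing


class FakeReport:
    """Hypothetical stand-in for PDCRInfoReport; only close() is modeled."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


report = FakeReport()
try:
    # closing() calls report.close() on exit, including when an exception is raised
    with closing(report):
        raise RuntimeError("simulated query failure")
except RuntimeError:
    pass

print(report.closed)  # True
```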
