# Trial Balance Automation - MVP

**Purpose**: Load, validate, and analyze trial balance data

**Author**: Raiden Velarde Guillergan - Data Scientist 

**Date**: November 4, 2025

**Data Source**: `data/raw/Trial Balance/2025/September/`

## Workflow Diagram

```mermaid
flowchart TD
    Start([Start]) --> Init[1. Initialize<br/>Libraries & Logger]
    Init --> LoadFunc[2-3. Define<br/>Loading Functions]
    LoadFunc --> Load[4. Load Data<br/>TB + References]
    Load --> Separate[5. Separate Data]
    Separate --> AddDate[6. Add Date Column]
    AddDate --> Consolidate[7. Consolidate TB]
    Consolidate --> Pivot[8. Create Pivot Table]
    Pivot --> Match[9. Match GL Accounts]
    Match --> CheckNew{New Accounts?}
    CheckNew -->|Yes| Export[Export Updated COA]
    CheckNew -->|No| Done
    Export --> Done([End])
    
    style Start fill:#e1f5e1
    style Done fill:#ffe1e1
    style Pivot fill:#f0e1ff
    style Export fill:#e1f0ff
```

**Note**: Install the `Markdown Preview Mermaid Support` VS Code extension to render the diagram.  
**Full Documentation**: See `docs/workflow-diagram.md`

In [24]:
# Import required libraries
import pandas as pd
import numpy as np
from pathlib import Path
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
pd.set_option('display.float_format', '{:.2f}'.format)

print("✓ Libraries imported successfully")

✓ Libraries imported successfully


In [25]:
# Setup logging configuration
import logging

# Create logs directory if it doesn't exist
log_dir = Path('../logs')
log_dir.mkdir(parents=True, exist_ok=True)

# Create log filename with timestamp
log_filename = f"trial_balance_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log"
log_path = log_dir / log_filename

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler(log_path),
        logging.StreamHandler()  # Also print to console
    ]
)

logger = logging.getLogger(__name__)

logger.info("="*60)
logger.info("TRIAL BALANCE AUTOMATION - LOGGING INITIALIZED")
logger.info("="*60)
logger.info(f"Log file: {log_path}")
logger.info(f"Session started: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
logger.info(f"Working directory: {Path.cwd()}")

print("\n✓ Logging configured successfully")
print(f"📝 Log file: {log_path}")

2025-11-12 11:12:36,802 - INFO - TRIAL BALANCE AUTOMATION - LOGGING INITIALIZED
2025-11-12 11:12:36,805 - INFO - Log file: ..\logs\trial_balance_20251112_111236.log
2025-11-12 11:12:36,808 - INFO - Session started: 2025-11-12 11:12:36
2025-11-12 11:12:36,810 - INFO - Working directory: d:\UserProfile\Documents\@ VFC\pemi-automation\trial-balance\notebooks



✓ Logging configured successfully
📝 Log file: ..\logs\trial_balance_20251112_111236.log
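
One notebook-specific caveat: `logging.basicConfig` does nothing when the root logger already has handlers, so re-running the logging cell in the same kernel does not switch to the new log file. A minimal sketch of a re-runnable setup, assuming Python 3.8+ (where the `force=True` argument is available); `configure_logging` is a hypothetical helper name, not part of the notebook:

```python
import logging
import tempfile
from datetime import datetime
from pathlib import Path


def configure_logging(log_dir):
    """Hypothetical helper: (re)configure root logging with file and console handlers."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"trial_balance_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log"

    logging.basicConfig(
        level=logging.INFO,
        format='%(asctime)s - %(levelname)s - %(message)s',
        handlers=[logging.FileHandler(log_path), logging.StreamHandler()],
        force=True,  # Python 3.8+: remove and close existing root handlers first
    )
    return log_path


# Calling the helper twice still leaves exactly two handlers on the root logger.
demo_dir = Path(tempfile.mkdtemp())
configure_logging(demo_dir)
demo_log_path = configure_logging(demo_dir)
handler_count = len(logging.getLogger().handlers)
```

Without `force=True`, the second call would be ignored and messages would keep going to the first log file.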


## 1. Setup and Configuration

## 2. Data Loading Function

## 3. Reference Data Loading Function

In [26]:
def load_reference_data(base_path='../data/references'):
    """
    Load reference data (COA Mapping and Portfolio Mapping) from the latest files.
    Supports both CSV and XLSX file formats.
    
    Returns:
        dict: Dictionary containing:
            - 'coa_mapping': DataFrame from COA Mapping folder (latest file)
            - 'portfolio_mapping': DataFrame from Portfolio Mapping folder (latest file)
            - 'metadata': dict with loading information
    """
    
    base_path = Path(base_path)
    
    # Initialize result dictionary
    result = {
        'coa_mapping': None,
        'portfolio_mapping': None,
        'metadata': {
            'load_timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            'coa_mapping_file': None,
            'portfolio_mapping_file': None
        }
    }
    
    # Helper function to load file (CSV or XLSX)
    def load_file(file_path):
        if file_path.suffix.lower() == '.csv':
            return pd.read_csv(file_path)
        elif file_path.suffix.lower() in ['.xlsx', '.xls']:
            return pd.read_excel(file_path)
        else:
            raise ValueError(f"Unsupported file format: {file_path.suffix}")
    
    # Define folder paths
    coa_mapping_folder = base_path / 'COA Mapping'
    portfolio_mapping_folder = base_path / 'Portfolio Mapping'
    
    # ========== Load COA Mapping (Latest File) ==========
    if coa_mapping_folder.exists():
        print(f"📂 Loading COA Mapping from: {coa_mapping_folder}")
        
        # Get all CSV and XLSX files sorted by modification time (latest first)
        files = sorted(
            list(coa_mapping_folder.glob('*.csv')) + 
            list(coa_mapping_folder.glob('*.xlsx')) + 
            list(coa_mapping_folder.glob('*.xls')),
            key=lambda f: f.stat().st_mtime, 
            reverse=True
        )
        
        if not files:
            print(f"  ⚠️  WARNING: No CSV or XLSX files found in {coa_mapping_folder}")
        else:
            latest_file = files[0]
            result['coa_mapping'] = load_file(latest_file)
            result['metadata']['coa_mapping_file'] = latest_file.name
            
            print(f"  ✓ Loaded latest file: {latest_file.name}")
            print(f"    Records: {len(result['coa_mapping'])}")
            
            if len(files) > 1:
                print(f"    Note: {len(files)} files found, loaded the most recent")
    else:
        print(f"⚠️  WARNING: COA Mapping folder not found: {coa_mapping_folder}")
    
    # ========== Load Portfolio Mapping (Latest File) ==========
    if portfolio_mapping_folder.exists():
        print(f"\n📂 Loading Portfolio Mapping from: {portfolio_mapping_folder}")
        
        # Get all CSV and XLSX files sorted by modification time (latest first)
        files = sorted(
            list(portfolio_mapping_folder.glob('*.csv')) + 
            list(portfolio_mapping_folder.glob('*.xlsx')) + 
            list(portfolio_mapping_folder.glob('*.xls')),
            key=lambda f: f.stat().st_mtime, 
            reverse=True
        )
        
        if not files:
            print(f"  ⚠️  WARNING: No CSV or XLSX files found in {portfolio_mapping_folder}")
        else:
            latest_file = files[0]
            result['portfolio_mapping'] = load_file(latest_file)
            result['metadata']['portfolio_mapping_file'] = latest_file.name
            
            print(f"  ✓ Loaded latest file: {latest_file.name}")
            print(f"    Records: {len(result['portfolio_mapping'])}")
            
            if len(files) > 1:
                print(f"    Note: {len(files)} files found, loaded the most recent")
    else:
        print(f"⚠️  WARNING: Portfolio Mapping folder not found: {portfolio_mapping_folder}")
    
    return result
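
The COA Mapping and Portfolio Mapping branches repeat the same "find the most recently modified file" logic. A minimal sketch of how that step could be factored out; `find_latest_file` is a hypothetical helper, and the demo files are throwaway placeholders rather than real mapping files:

```python
import os
import tempfile
from pathlib import Path


def find_latest_file(folder, patterns=('*.csv', '*.xlsx', '*.xls')):
    """Hypothetical helper: return the most recently modified match, or None."""
    folder = Path(folder)
    candidates = [f for pattern in patterns for f in folder.glob(pattern)]
    if not candidates:
        return None
    return max(candidates, key=lambda f: f.stat().st_mtime)


# Demo with two throwaway files whose modification times are set explicitly.
demo_dir = Path(tempfile.mkdtemp())
older = demo_dir / 'mapping_old.csv'
newer = demo_dir / 'mapping_new.xlsx'
older.write_text('GL Account\n')
newer.write_text('placeholder')  # Not a real workbook; only the mtime matters here
os.utime(older, (1_000_000_000, 1_000_000_000))
os.utime(newer, (1_700_000_000, 1_700_000_000))

latest = find_latest_file(demo_dir)
```

Note that "latest" here means latest by modification time, as in the function above. Copying or re-saving an old file updates its modification time, so the date in the filename and the file that gets loaded can disagree.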

In [27]:
def load_trial_balance_data(base_path='../data/raw/Trial Balance'):
    """
    Load trial balance data dynamically based on the latest year and month folders.
    
    Returns:
        dict: Dictionary containing:
            - 'trial_balance': dict of DataFrames with date keys (from Trial Balance folder)
            - 'chart_of_accounts': DataFrame (from Chart of Accounts folder)
            - 'metadata': dict with loading information
    """
    
    base_path = Path(base_path)
    
    # Find the latest year folder (reverse sort to get latest first)
    year_folders = sorted((f for f in base_path.iterdir() if f.is_dir()), reverse=True)
    if not year_folders:
        raise ValueError(f"No year folders found in {base_path}")
    
    latest_year = year_folders[0]
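    print(f"📅 Latest year folder: {latest_year.name}")
    
    # Find the latest month folder. Month names are ordered chronologically;
    # a plain reverse sort is alphabetical and would rank "September" above "October".
    # Assumes folders use full English month names (e.g. "September").
    def month_number(folder):
        try:
            return datetime.strptime(folder.name, '%B').month
        except ValueError:
            return 0  # Unrecognized names rank below every real month
    
    month_folders = sorted(
        (f for f in latest_year.iterdir() if f.is_dir()),
        key=month_number,
        reverse=True
    )
    if not month_folders:
        raise ValueError(f"No month folders found in {latest_year}")
    
    latest_month = month_folders[0]
    print(f"📅 Latest month folder: {latest_month.name}")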
    
    # Initialize result dictionary
    result = {
        'trial_balance': {},
        'chart_of_accounts': None,
        'metadata': {
            'year': latest_year.name,
            'month': latest_month.name,
            'load_timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            'tb_files': [],  # List of loaded Trial Balance files
            'coa_file': None  # Chart of Accounts file
        }
    }
    
    # Define folder paths
    tb_folder = latest_month / 'Trial Balance'
    coa_folder = latest_month / 'Chart of Accounts'
    
    # ========== Load Trial Balance Files ==========
    if tb_folder.exists():
        print(f"\n📂 Loading Trial Balance files from: {tb_folder}")
        
        csv_files = list(tb_folder.glob('*.csv'))
        non_compliant_files = []
        
        for file in csv_files:
            filename = file.stem  # Remove .csv extension
            
            try:
                # Parse date from filename and convert to YYYY-MM-DD format
                file_date = datetime.strptime(filename, '%m-%d-%Y')
                date_key = file_date.strftime('%Y-%m-%d')
                
                # Load CSV and store in dictionary
                result['trial_balance'][date_key] = pd.read_csv(file)
                
                # Store file info in metadata
                result['metadata']['tb_files'].append({
                    'filename': file.name,
                    'date': date_key,
                    'records': len(result['trial_balance'][date_key])
                })
                
                print(f"  ✓ Loaded: {file.name} -> {date_key} ({len(result['trial_balance'][date_key])} records)")
                
            except ValueError:
                # File doesn't follow naming convention
                non_compliant_files.append(file.name)
                print(f"  ⚠️  WARNING: File does not follow naming convention (MM-DD-YYYY.csv): {file.name}")
        
        # Store non-compliant files in metadata if any
        if non_compliant_files:
            result['metadata']['non_compliant_files'] = non_compliant_files
        
        print(f"\n📊 Total Trial Balance files loaded: {len(result['trial_balance'])}")
        
    else:
        print(f"⚠️  WARNING: Trial Balance folder not found: {tb_folder}")
    
    # ========== Load Chart of Accounts ==========
    if coa_folder.exists():
        print(f"\n📂 Loading Chart of Accounts from: {coa_folder}")
        
        # Sort so that "the first file" is deterministic across platforms
        csv_files = sorted(coa_folder.glob('*.csv'))
        
        # Validate number of files
        if not csv_files:
            print(f"  ⚠️  WARNING: No CSV files found in {coa_folder}")
        elif len(csv_files) > 1:
            print("  ⚠️  WARNING: Multiple files found in Chart of Accounts folder!")
            print(f"              Expected only 1 file, found {len(csv_files)}:")
            for f in csv_files:
                print(f"              - {f.name}")
            print(f"              Loading the first file: {csv_files[0].name}")
        
        # Load first CSV file if available
        if csv_files:
            coa_file = csv_files[0]
            result['chart_of_accounts'] = pd.read_csv(coa_file)
            result['metadata']['coa_file'] = coa_file.name
            print(f"  ✓ Loaded: {coa_file.name} ({len(result['chart_of_accounts'])} accounts)")
    else:
        print(f"⚠️  WARNING: Chart of Accounts folder not found: {coa_folder}")
    
    return result

## 4. Load Data

In [28]:
# Load all data
data = load_trial_balance_data()

# print("\n" + "="*60)
# print("📋 DATA LOADING SUMMARY")
# print("="*60)
# print(f"Year: {data['metadata']['year']}")
# print(f"Month: {data['metadata']['month']}")
# print(f"Load Time: {data['metadata']['load_timestamp']}")
# print(f"\nTrial Balance DataFrames: {len(data['trial_balance'])}")
# print(f"Chart of Accounts: {'Loaded' if data['chart_of_accounts'] is not None else 'Not Loaded'}")

# if 'non_compliant_files' in data['metadata']:
#     print(f"\n⚠️  Non-compliant files: {len(data['metadata']['non_compliant_files'])}")
    
# print("\n" + "="*60)

📅 Latest year folder: 2025
📅 Latest month folder: September

📂 Loading Trial Balance files from: ..\data\raw\Trial Balance\2025\September\Trial Balance
  ✓ Loaded: 09-01-2025.csv -> 2025-09-01 (1314 records)
  ✓ Loaded: 09-02-2025.csv -> 2025-09-02 (1320 records)
  ✓ Loaded: 09-03-2025.csv -> 2025-09-03 (1326 records)
  ✓ Loaded: 09-04-2025.csv -> 2025-09-04 (1326 records)
  ✓ Loaded: 09-05-2025.csv -> 2025-09-05 (1330 records)
  ✓ Loaded: 09-06-2025.csv -> 2025-09-06 (1330 records)
  ✓ Loaded: 09-07-2025.csv -> 2025-09-07 (1330 records)
  ✓ Loaded: 09-08-2025.csv -> 2025-09-08 (1330 records)
  ✓ Loaded: 09-09-2025.csv -> 2025-09-09 (1330 records)
  ✓ Loaded: 09-10-2025.csv -> 2025-09-10 (1337 records)
  ✓ Loaded: 09-11-2025.csv -> 2025-09-11 (1339 records)
  ✓ Loaded: 09-12-2025.csv -> 2025-09-12 (1339 records)
  ✓ Loaded: 09-13-2025.csv -> 2025-09-13 (1339 records)
  ✓ Loaded: 09-14-2025.csv -> 2025-09-14 (1339 records)
  ✓ Loaded: 09-15-2025.cs

In [29]:
# Load reference data
reference_data = load_reference_data()

print("\n" + "="*60)
print("📋 REFERENCE DATA LOADING SUMMARY")
print("="*60)
print(f"Load Time: {reference_data['metadata']['load_timestamp']}")
print(f"\nCOA Mapping: {'Loaded' if reference_data['coa_mapping'] is not None else 'Not Loaded'}")
if reference_data['metadata']['coa_mapping_file']:
    print(f"  File: {reference_data['metadata']['coa_mapping_file']}")
print(f"\nPortfolio Mapping: {'Loaded' if reference_data['portfolio_mapping'] is not None else 'Not Loaded'}")
if reference_data['metadata']['portfolio_mapping_file']:
    print(f"  File: {reference_data['metadata']['portfolio_mapping_file']}")
print("\n" + "="*60)

📂 Loading COA Mapping from: ..\data\references\COA Mapping
  ✓ Loaded latest file: Chart of Accounts Mapping as of 11.04.2025.xlsx
    Records: 316
    Note: 2 files found, loaded the most recent

📂 Loading Portfolio Mapping from: ..\data\references\Portfolio Mapping
  ✓ Loaded latest file: PEMI_Account_name_porfolio_mapping.xlsx
    Records: 10

📋 REFERENCE DATA LOADING SUMMARY
Load Time: 2025-11-12 11:13:06

COA Mapping: Loaded
  File: Chart of Accounts Mapping as of 11.04.2025.xlsx

Portfolio Mapping: Loaded
  File: PEMI_Account_name_porfolio_mapping.xlsx



## 5. Separate Data by Source

In [30]:
# Separate data into distinct variables based on folder structure

# Trial Balance data (dictionary of DataFrames by date)
trial_balance_data = data['trial_balance']

# Chart of Accounts data (single DataFrame)
chart_of_accounts = data['chart_of_accounts']

# Metadata
metadata = data['metadata']

# Reference data
coa_mapping = reference_data['coa_mapping']
portfolio_mapping = reference_data['portfolio_mapping']

print("✓ Data separated successfully")
print(f"\n📊 Trial Balance: {len(trial_balance_data)} date(s)")
print(f"📊 Chart of Accounts: {len(chart_of_accounts) if chart_of_accounts is not None else 0} account(s)")
print(f"📊 COA Mapping: {len(coa_mapping) if coa_mapping is not None else 0} mapping(s)")
print(f"📊 Portfolio Mapping: {len(portfolio_mapping) if portfolio_mapping is not None else 0} mapping(s)")
print(f"📊 Metadata: {list(metadata.keys())}")

✓ Data separated successfully

📊 Trial Balance: 30 date(s)
📊 Chart of Accounts: 3840 account(s)
📊 COA Mapping: 316 mapping(s)
📊 Portfolio Mapping: 10 mapping(s)
📊 Metadata: ['year', 'month', 'load_timestamp', 'tb_files', 'coa_file']


## 6. Add Date Column to Trial Balance Data

In [31]:
# Add 'Date' column to each Trial Balance DataFrame
for date_key, df in trial_balance_data.items():
    df['Date'] = date_key

print("✓ Date column added to all Trial Balance DataFrames")
print(f"\nProcessed {len(trial_balance_data)} date(s)")

✓ Date column added to all Trial Balance DataFrames

Processed 30 date(s)
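
The loader reports how many files it found, but it does not flag a day that is missing from the month. A minimal sketch of a coverage check over the date keys, assuming one file is expected per calendar day between the first and last loaded date; `find_missing_dates` is a hypothetical helper and the keys below are toy values:

```python
import pandas as pd


def find_missing_dates(date_keys):
    """Hypothetical helper: list calendar days absent between the first and last key."""
    if not date_keys:
        return []
    loaded = pd.to_datetime(sorted(date_keys))
    expected = pd.date_range(loaded.min(), loaded.max(), freq='D')
    return [d.strftime('%Y-%m-%d') for d in expected.difference(loaded)]


# Toy keys in the same YYYY-MM-DD format used by trial_balance_data
demo_keys = ['2025-09-01', '2025-09-02', '2025-09-04', '2025-09-05']
missing = find_missing_dates(demo_keys)
```

In the notebook this could be called as `find_missing_dates(trial_balance_data.keys())`. The check only looks between the earliest and latest key, so it would not notice a missing first or last day of the month.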


## 7. Consolidate Trial Balance Data

In [32]:
# Consolidate all Trial Balance DataFrames into a single DataFrame
trial_balance_consolidated = pd.concat(trial_balance_data.values(), ignore_index=True)

print("✓ Trial Balance data consolidated")
print(f"\nTotal records: {len(trial_balance_consolidated):,}")
print(f"Date range: {trial_balance_consolidated['Date'].min()} to {trial_balance_consolidated['Date'].max()}")
print(f"Unique dates: {trial_balance_consolidated['Date'].nunique()}")
print(f"\nColumns: {trial_balance_consolidated.columns.tolist()}")

✓ Trial Balance data consolidated

Total records: 40,162
Date range: 2025-09-01 to 2025-09-30
Unique dates: 30

Columns: ['bookname', 'level1accountname', 'level2accountname', 'accountname', 'Portfolio', 'Portcode ', 'Ext. Portfolio Code', 'Ext. Portcode 2', 'openingdebit', 'openingcredit', 'perioddebit', 'periodcredit', 'closingdebit', 'closingcredit', 'netamt', 'GL Code', 'cust_accountid', 'Account ID', 'accounttype', 'accountsubtype', 'descr', 'Entity', 'Date']
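
The column list above includes `'Portcode '` with a trailing space, which makes `df['Portcode']` raise a `KeyError`. A minimal sketch of normalizing the labels, shown on a toy frame rather than the real data:

```python
import pandas as pd

# Toy frame reproducing the trailing space seen in 'Portcode '
demo_tb = pd.DataFrame({'accountname': ['Cash'], 'Portcode ': [607896], 'netamt': [0.0]})

# Strip leading/trailing whitespace from every column label
demo_tb.columns = demo_tb.columns.str.strip()
cleaned_columns = demo_tb.columns.tolist()
```

Applying the same line to `trial_balance_consolidated` right after the concat would keep later cells from depending on the exact spacing in the source files.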


In [33]:
# len(trial_balance_consolidated['Date'].unique())

trial_balance_consolidated

Unnamed: 0,bookname,level1accountname,level2accountname,accountname,Portfolio,Portcode,Ext. Portfolio Code,Ext. Portcode 2,openingdebit,openingcredit,perioddebit,periodcredit,closingdebit,closingcredit,netamt,GL Code,cust_accountid,Account ID,accounttype,accountsubtype,descr,Entity,Date
0,MAIN ACCOUNT,Bangko Sentral Ng Pilipinas Provident Fund (bs...,Asset,A/R - Others PHP,Bangko Sentral Ng Pilipinas Provident Fund (bs...,607896,,,1008.49,1008.49,0.00,0.00,1008.49,1008.49,0.00,,16204,16204,B/S,A,INVESTMENT,,2025-09-01
1,MAIN ACCOUNT,Bangko Sentral Ng Pilipinas Provident Fund (bs...,Asset,ACCRUED INTEREST RECEIVABLE,Bangko Sentral Ng Pilipinas Provident Fund (bs...,607896,,,106763.37,106763.37,0.00,0.00,106763.37,106763.37,0.00,,16149,16149,B/S,A,INVESTMENT,,2025-09-01
2,MAIN ACCOUNT,Bangko Sentral Ng Pilipinas Provident Fund (bs...,Asset,AIR - MMP - PHP,Bangko Sentral Ng Pilipinas Provident Fund (bs...,607896,,,209773.51,209773.51,3228.33,0.00,213001.84,209773.51,3228.33,,16369,16369,B/S,A,INVESTMENT,,2025-09-01
3,MAIN ACCOUNT,Bangko Sentral Ng Pilipinas Provident Fund (bs...,Asset,DUE FROM BROKERS,Bangko Sentral Ng Pilipinas Provident Fund (bs...,607896,,,54701864.23,54701864.23,3629042.67,0.00,58330906.90,54701864.23,3629042.67,,16367,16367,B/S,A,INVESTMENT,,2025-09-01
4,MAIN ACCOUNT,Bangko Sentral Ng Pilipinas Provident Fund (bs...,Asset,Dividend Receivable,Bangko Sentral Ng Pilipinas Provident Fund (bs...,607896,,,15387963.51,12854473.67,0.00,0.00,15387963.51,12854473.67,2533489.84,,16320,16320,B/S,A,INVESTMENT,,2025-09-01
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
40157,MAIN ACCOUNT,Social Security System Ima (sss)-607900,Income,Int. Income - CASA Bank Deposits-Tax-Pd (net)-PhP,Social Security System Ima (sss)-607900,607900,,,0.00,9319.86,0.00,0.00,0.00,9319.86,-9319.86,,16895,16895,P/L,I,INCOME,,2025-09-30
40158,MAIN ACCOUNT,Social Security System Ima (sss)-607900,Income,Trading Gain / Loss,Social Security System Ima (sss)-607900,607900,,,1398560.00,11153168.26,0.00,0.00,1398560.00,11153168.26,-9754608.26,,17019,17019,P/L,E,EXPENSE,,2025-09-30
40159,MAIN ACCOUNT,Social Security System Ima (sss)-607900,Income,Unrealized Gain / Loss,Social Security System Ima (sss)-607900,607900,,,74807865735.52,74773552443.83,335201253.77,325544964.47,75143066989.29,75099097408.30,43969580.99,,17017,17017,P/L,I,INCOME,,2025-09-30
40160,MAIN ACCOUNT,Social Security System Ima (sss)-607900,Liability,A/P - Other,Social Security System Ima (sss)-607900,607900,,,9314935.00,9314935.00,0.00,0.00,9314935.00,9314935.00,0.00,,16774,16774,B/S,L,CURRENT LIABILITIES,,2025-09-30


## 8. Create Pivot Table

In [34]:
# Create pivot table
trial_balance_pivot_table = trial_balance_consolidated.pivot_table(
    index='accountname',           # Rows: GL Account
    columns='level1accountname',   # Columns: Fund Name
    values='netamt',               # Values: Balance
    aggfunc='sum',                 # Sum the netamt
    fill_value=0                   # Fill missing values with 0
)

# Rename index and columns for clarity
trial_balance_pivot_table.index.name = 'GL Account'
trial_balance_pivot_table.columns.name = 'Fund Name'

print("✓ Pivot table created")
print(f"\nShape: {trial_balance_pivot_table.shape[0]} GL Accounts × {trial_balance_pivot_table.shape[1]} Funds")
print(f"Total Balance: {trial_balance_pivot_table.sum().sum():,.2f}")

# Display pivot table
trial_balance_pivot_table

✓ Pivot table created

Shape: 313 GL Accounts × 16 Funds
Total Balance: -0.25


Fund Name,Bangko Sentral Ng Pilipinas Provident Fund (bsppf)-607896,"De La Salle - College Of Saint Benilde, Inc. (dlsu-csb)-607894","De La Salle University, Inc. (dlsu)-607892",GOVERNMENT SERVICE INSURANCE SYSTEM-500032,Government Service Insurance System Ima (gsis)-607898,PHILEQUITY DIVIDEND YIELD FUND,PHILEQUITY DOLLAR INCOME FUND,"PHILEQUITY FUND, INC.","PHILEQUITY MANAGEMENT, INC.",PHILEQUITY PESO BOND FUND,PHILEQUITY PSE INDEX FUND,"Philequity Alpha One Fund, Inc.","Philequity Alpha One Fund, Inc. s","Philequity MSCI Philippines Index Fund, Inc.",Philequity Mgt Inc. Ima#008 - 85399-607902,Social Security System Ima (sss)-607900
GL Account,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
ANNUAL FEE - DIRECTOR,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.09,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
Annual Fee - Director,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
BIR Registration/License Fee,0.00,0.00,0.00,0.00,0.00,0.00,0.00,-3000.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
Business Permit,0.00,0.00,0.00,0.00,0.00,0.00,-25.50,-11283543.05,0.00,-0.13,0.00,0.00,0.00,0.00,0.00,0.00
Doc. Stamp Tax,0.00,0.00,0.00,0.00,0.00,-3373495.28,-0.00,-1358391.92,0.00,-0.12,-0.03,0.00,0.00,0.00,0.00,0.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"Union Bank of the Philippines, Inc.-6101011305",0.00,0.00,0.00,0.00,0.00,0.00,1974222.28,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
Unrealized Gain / Loss,-55505407.93,14759960.46,-43211680.26,0.00,58149555.65,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,92292705543.68,77587036.34
Vat Output - Others,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,-6253161246.30,0.00,0.00,0.00,0.00,0.00,0.00,0.00
Withdrawal-Principal,1530000000.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,42877500000.00,0.00
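
In the sample rows shown in section 7, `netamt` equals `closingdebit` minus `closingcredit`. If each daily file is a full snapshot of closing balances (an assumption about the source system, not something the notebook confirms), then summing `netamt` across all 30 dates counts the same balance once per day. A minimal sketch of pivoting only the most recent snapshot, using toy values and a made-up fund name:

```python
import pandas as pd

# Toy data: two daily snapshots of the same accounts (illustrative values)
demo_tb = pd.DataFrame({
    'Date': ['2025-09-29', '2025-09-29', '2025-09-30', '2025-09-30'],
    'accountname': ['Cash', 'Capital', 'Cash', 'Capital'],
    'level1accountname': ['FUND A'] * 4,
    'netamt': [100.0, -100.0, 120.0, -120.0],
})

# Keep only the most recent snapshot before pivoting
latest_date = demo_tb['Date'].max()  # ISO-formatted strings sort chronologically
latest_snapshot = demo_tb[demo_tb['Date'] == latest_date]

demo_pivot = latest_snapshot.pivot_table(
    index='accountname',
    columns='level1accountname',
    values='netamt',
    aggfunc='sum',
    fill_value=0,
)
cash_balance = demo_pivot.loc['Cash', 'FUND A']
```

If the files instead hold daily movements, the all-dates sum used in the notebook is the right aggregation and this filter should not be applied.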


In [35]:
list(trial_balance_pivot_table.columns)

['Bangko Sentral Ng Pilipinas Provident Fund (bsppf)-607896',
 'De La Salle - College Of Saint Benilde, Inc. (dlsu-csb)-607894',
 'De La Salle University, Inc. (dlsu)-607892',
 'GOVERNMENT SERVICE INSURANCE SYSTEM-500032',
 'Government Service Insurance System Ima (gsis)-607898',
 'PHILEQUITY DIVIDEND YIELD FUND',
 'PHILEQUITY DOLLAR INCOME FUND',
 'PHILEQUITY FUND, INC.',
 'PHILEQUITY MANAGEMENT, INC.',
 'PHILEQUITY PESO BOND FUND',
 'PHILEQUITY PSE INDEX FUND',
 'Philequity Alpha One Fund, Inc.',
 'Philequity Alpha One Fund, Inc. s',
 'Philequity MSCI Philippines Index Fund, Inc.',
 'Philequity Mgt Inc. Ima#008 - 85399-607902',
 'Social Security System Ima (sss)-607900']

In [36]:
def capitalize_pivot_columns(pivot_df):
    """
    Convert all column names in the pivot table to uppercase.
    The index ('GL Account') is left unchanged.
    
    Parameters:
        pivot_df (DataFrame): The pivot table with Fund Names as columns
    
    Returns:
        DataFrame: Copy of the pivot table with uppercase column names
    """
    # Create a copy to avoid modifying original
    df_copy = pivot_df.copy()
    
    # Uppercase all column names
    df_copy.columns = [col.upper() for col in df_copy.columns]
    
    print("✓ Pivot table columns capitalized")
    print(f"  Columns: {list(df_copy.columns)}")
    
    return df_copy

# Apply the function to the pivot table
trial_balance_pivot_table = capitalize_pivot_columns(trial_balance_pivot_table)

# Display updated columns
print(f"\n📋 Updated columns: {list(trial_balance_pivot_table.columns)}")


✓ Pivot table columns capitalized
  Columns: ['BANGKO SENTRAL NG PILIPINAS PROVIDENT FUND (BSPPF)-607896', 'DE LA SALLE - COLLEGE OF SAINT BENILDE, INC. (DLSU-CSB)-607894', 'DE LA SALLE UNIVERSITY, INC. (DLSU)-607892', 'GOVERNMENT SERVICE INSURANCE SYSTEM-500032', 'GOVERNMENT SERVICE INSURANCE SYSTEM IMA (GSIS)-607898', 'PHILEQUITY DIVIDEND YIELD FUND', 'PHILEQUITY DOLLAR INCOME FUND', 'PHILEQUITY FUND, INC.', 'PHILEQUITY MANAGEMENT, INC.', 'PHILEQUITY PESO BOND FUND', 'PHILEQUITY PSE INDEX FUND', 'PHILEQUITY ALPHA ONE FUND, INC.', 'PHILEQUITY ALPHA ONE FUND, INC. S', 'PHILEQUITY MSCI PHILIPPINES INDEX FUND, INC.', 'PHILEQUITY MGT INC. IMA#008 - 85399-607902', 'SOCIAL SECURITY SYSTEM IMA (SSS)-607900']

📋 Updated columns: ['BANGKO SENTRAL NG PILIPINAS PROVIDENT FUND (BSPPF)-607896', 'DE LA SALLE - COLLEGE OF SAINT BENILDE, INC. (DLSU-CSB)-607894', 'DE LA SALLE UNIVERSITY, INC. (DLSU)-607892', 'GOVERNMENT SERVICE INSURANCE SYSTEM-500032', 'GOVERNMENT SERVICE INSURANCE SYSTEM IMA (G
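
Uppercasing can make two fund names collide when they differ only by letter case. That does not happen in this run (16 columns before and after), but if it did, the pivot would end up with duplicate column labels. A minimal sketch of merging such columns by summing them, using hypothetical fund names:

```python
import pandas as pd

# Toy pivot where two labels differ only by letter case (hypothetical fund names)
demo_pivot = pd.DataFrame(
    {'Fund A, Inc.': [10.0, 0.0], 'FUND A, INC.': [5.0, 2.0], 'FUND B': [1.0, 1.0]},
    index=pd.Index(['Cash', 'Capital'], name='GL Account'),
)

upper = demo_pivot.copy()
upper.columns = [str(col).upper() for col in upper.columns]

# Sum columns that now share a label: transpose, group by label, transpose back
merged = upper.T.groupby(level=0).sum().T
merged_columns = merged.columns.tolist()
```

Whether two such columns really are the same fund is a business decision, so a check like `upper.columns.duplicated().any()` with a warning may be preferable to merging silently.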

## 9. Match GL Accounts with COA Mapping

In [37]:
# Get GL Accounts from pivot table (index)
pivot_gl_accounts = set(trial_balance_pivot_table.index)

# Get GL Accounts from COA Mapping
coa_gl_accounts = set(coa_mapping['GL Account'])

# Find accounts in pivot table that are NOT in COA Mapping
missing_in_coa = pivot_gl_accounts - coa_gl_accounts

# Find accounts in COA Mapping that are NOT in pivot table
missing_in_pivot = coa_gl_accounts - pivot_gl_accounts

print("="*60)
print("GL ACCOUNT MATCHING ANALYSIS")
print("="*60)
print(f"\n📊 Total GL Accounts in Pivot Table: {len(pivot_gl_accounts)}")
print(f"📊 Total GL Accounts in COA Mapping: {len(coa_gl_accounts)}")
print(f"\n✓ Matching Accounts: {len(pivot_gl_accounts & coa_gl_accounts)}")
print(f"⚠️  Accounts in Pivot but NOT in COA Mapping: {len(missing_in_coa)}")
print(f"ℹ️  Accounts in COA Mapping but NOT in Pivot: {len(missing_in_pivot)}")

# Display missing accounts
if missing_in_coa:
    print("\n" + "="*60)
    print("⚠️  NEW ACCOUNTS FOUND (Need to be added to COA Mapping):")
    print("="*60)
    for i, account in enumerate(sorted(missing_in_coa), 1):
        print(f"{i:3}. {account}")
else:
    print("\n✓ All accounts in pivot table exist in COA Mapping!")

# Create indicator DataFrame for new accounts
if missing_in_coa:
    new_accounts_df = pd.DataFrame({
        'GL Account': sorted(missing_in_coa),
        'Status': 'NEW - Not in COA Mapping',
        'TB Account Name': '',
        'Account Type': '',
        'FS Classification': ''
    })
    
    print(f"\n📝 Created DataFrame with {len(new_accounts_df)} new account(s) to be added")
    print("    Variable: new_accounts_df")
else:
    new_accounts_df = None
    print("\n✓ No new accounts to add")

GL ACCOUNT MATCHING ANALYSIS

📊 Total GL Accounts in Pivot Table: 313
📊 Total GL Accounts in COA Mapping: 316

✓ Matching Accounts: 313
⚠️  Accounts in Pivot but NOT in COA Mapping: 0
ℹ️  Accounts in COA Mapping but NOT in Pivot: 3

✓ All accounts in pivot table exist in COA Mapping!

✓ No new accounts to add
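
The set comparison is exact, so an account that differs from its mapping entry only by case or stray whitespace is reported as new. That is why the mapping carries both `ANNUAL FEE - DIRECTOR` and `Annual Fee - Director`. A minimal sketch that separates such near matches from genuinely new accounts; `normalize_account` and the sample names are hypothetical:

```python
def normalize_account(name):
    """Hypothetical normalization: trim, collapse inner whitespace, and casefold."""
    return ' '.join(str(name).split()).casefold()


# Toy inputs standing in for the pivot index and coa_mapping['GL Account']
pivot_accounts = ['Bank Charges', 'business permit ', 'Brand New Account']
coa_accounts = ['Bank Charges', 'Business Permit']

# Map each normalized key back to the spelling used in the mapping
coa_lookup = {normalize_account(a): a for a in coa_accounts}

# Exact comparison, as in the cell above
exact_new = set(pivot_accounts) - set(coa_accounts)

# Accounts with no counterpart even after normalization
truly_new = sorted(a for a in pivot_accounts if normalize_account(a) not in coa_lookup)

# Accounts flagged as new only because of case or whitespace differences
near_matches = sorted(a for a in exact_new if normalize_account(a) in coa_lookup)
```

Near matches could then be reviewed by hand instead of being appended to the mapping as blank rows.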


In [38]:
# Display new accounts DataFrame
if new_accounts_df is not None:
    print(f"📋 New Accounts to Add to COA Mapping ({len(new_accounts_df)} accounts):\n")
    display(new_accounts_df)
else:
    print("✓ No new accounts found")

✓ No new accounts found


In [39]:
# Create updated COA Mapping with new accounts inserted
if new_accounts_df is not None:
    # Combine original COA mapping with new accounts
    updated_coa_mapping = pd.concat([coa_mapping, new_accounts_df], ignore_index=True)
    
    # Sort by GL Account for better organization
    updated_coa_mapping = updated_coa_mapping.sort_values('GL Account').reset_index(drop=True)
    
    print("✓ Updated COA Mapping created with new accounts")
    print(f"\n📊 Original COA Mapping: {len(coa_mapping)} accounts")
    print(f"📊 New Accounts Added: {len(new_accounts_df)} accounts")
    print(f"📊 Updated COA Mapping: {len(updated_coa_mapping)} accounts")
    print("\n💾 Variable: updated_coa_mapping")
    
    # Create indicator column to show which accounts are new
    updated_coa_mapping['Is_New_Account'] = updated_coa_mapping['GL Account'].isin(missing_in_coa)
    
    print("\n✓ Added 'Is_New_Account' indicator column")
    print("   - True: Account is newly found (not in original COA Mapping)")
    print("   - False: Account existed in original COA Mapping")
else:
    updated_coa_mapping = coa_mapping.copy()
    updated_coa_mapping['Is_New_Account'] = False
    print("✓ No new accounts to add - using original COA Mapping")

✓ No new accounts to add - using original COA Mapping


In [40]:
# Display updated COA Mapping - showing only new accounts
print("📋 Updated COA Mapping - New Accounts Only:\n")
display(updated_coa_mapping[updated_coa_mapping['Is_New_Account']])

📋 Updated COA Mapping - New Accounts Only:



Unnamed: 0,GL Account,TB Account Name,Account Type,FS Classification,Status,Is_New_Account


In [41]:
# Export updated COA Mapping if new accounts were added
if new_accounts_df is not None and len(new_accounts_df) > 0:
    # Define export path
    export_folder = Path('../data/references/COA Mapping')
    export_folder.mkdir(parents=True, exist_ok=True)
    
    # Create filename with current date (MM.DD.YYYY format)
    current_date = datetime.now().strftime('%m.%d.%Y')
    export_filename = f'Chart of Accounts Mapping as of {current_date}.xlsx'
    export_path = export_folder / export_filename
    
    # Export to Excel
    updated_coa_mapping.to_excel(export_path, index=False, engine='openpyxl')
    
    print("="*60)
    print("📤 EXPORT SUCCESSFUL")
    print("="*60)
    print(f"✓ File exported to: {export_path}")
    print(f"✓ Filename: {export_filename}")
    print(f"✓ Total records: {len(updated_coa_mapping)}")
    print(f"✓ New accounts added: {len(new_accounts_df)}")
    print(f"✓ Export timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print("\n💡 Note: The 'Is_New_Account' column indicates which accounts are newly added (True)")
else:
    print("ℹ️  No new accounts to export - COA Mapping unchanged")

ℹ️  No new accounts to export - COA Mapping unchanged
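
Because the export filename contains only the date, a second export on the same day overwrites the first. A minimal sketch of picking an unused filename instead; `next_available_path` is a hypothetical helper, and the ` (n)` suffix is just one possible convention:

```python
import tempfile
from pathlib import Path


def next_available_path(path):
    """Hypothetical helper: append ' (n)' to the stem until the path is unused."""
    path = Path(path)
    candidate = path
    counter = 1
    while candidate.exists():
        candidate = path.with_name(f"{path.stem} ({counter}){path.suffix}")
        counter += 1
    return candidate


demo_dir = Path(tempfile.mkdtemp())
target = demo_dir / 'Chart of Accounts Mapping as of 11.04.2025.xlsx'
first_choice = next_available_path(target)   # Nothing exists yet
target.write_text('placeholder')             # Simulate an earlier export
second_choice = next_available_path(target)
```

Since `load_reference_data` picks the file with the newest modification time, a suffixed export would still be the one loaded on the next run.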


In [42]:
chart_of_accounts

Unnamed: 0,book,Account ID,accountname,level1accountname,level2accountname,levels,include_for_aum,accounttype,accountsubtype,Portcode,currencycode,active_status,Authorisation Status,Authorised By,Authorised On,createdby,Deleted,ac_type,gl_code,template_id
0,MAIN ACCOUNT,1329,Business Permit,GOVERNMENT SERVICE INSURANCE SYSTEM-500032,Expense,3,NO,P/L,EXPENSE,500032.00,PHP,Yes,AUTHORISED,IT,07-02-2016 12:13:46,TEST1,N,G/L Account,96-00000,256.00
1,MAIN ACCOUNT,1330,SEC Registration/License Fee,GOVERNMENT SERVICE INSURANCE SYSTEM-500032,Expense,3,NO,P/L,EXPENSE,500032.00,PHP,Yes,AUTHORISED,TEST1,07-01-2016 00:00:00,TEST1,N,G/L Account,96-00000,67.00
2,MAIN ACCOUNT,1331,BIR Registration/License Fee,GOVERNMENT SERVICE INSURANCE SYSTEM-500032,Expense,3,NO,P/L,EXPENSE,500032.00,PHP,Yes,AUTHORISED,TEST1,07-01-2016 00:00:00,TEST1,N,G/L Account,96-00000,257.00
3,MAIN ACCOUNT,1332,Doc. Stamp Tax,GOVERNMENT SERVICE INSURANCE SYSTEM-500032,Expense,3,NO,P/L,EXPENSE,500032.00,PHP,Yes,AUTHORISED,TEST1,07-01-2016 00:00:00,TEST1,N,G/L Account,96-00000,258.00
4,MAIN ACCOUNT,1333,Community TAx Cert (CTC),GOVERNMENT SERVICE INSURANCE SYSTEM-500032,Expense,3,NO,P/L,EXPENSE,500032.00,PHP,Yes,AUTHORISED,TEST1,07-01-2016 00:00:00,TEST1,N,G/L Account,96-00000,68.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3835,MAIN ACCOUNT,16916,East West Bank-Tektite-200021011982,"De La Salle - College Of Saint Benilde, Inc. (...",Asset,3,NO,B/S,CURRENT ASSET,607894.00,PHP,Yes,AUTHORISED,mmagcamit,01-31-2025 21:22:38,CREDENCE,N,,,
3836,MAIN ACCOUNT,16917,East West Bank-Tektite-200021012048,"De La Salle University, Inc. (dlsu)-607892",Asset,3,NO,B/S,CURRENT ASSET,607892.00,PHP,Yes,AUTHORISED,mmagcamit,01-31-2025 21:23:08,CREDENCE,N,,,
3837,MAIN ACCOUNT,16918,East West Bank-Tektite-200005350106,Government Service Insurance System Ima (gsis)...,Asset,3,NO,B/S,CURRENT ASSET,607898.00,PHP,Yes,AUTHORISED,mmagcamit,01-31-2025 21:23:43,CREDENCE,N,,,
3838,MAIN ACCOUNT,16919,Land Bank of the Philippines-The Luxe-3902112501,Social Security System Ima (sss)-607900,Asset,3,NO,B/S,CURRENT ASSET,607900.00,PHP,No,UNAUTHORISED,,,CREDENCE,N,,,


In [43]:
coa_mapping

Unnamed: 0,GL Account,TB Account Name,Account Type,FS Classification,Status,Is_New_Account
0,ANNUAL FEE - DIRECTOR,Annual Fee - Director,Expense,Director's Fee,,False
1,Annual Fee - Director,Annual Fee - Director,Expense,Director's Fee,,False
2,BIR Registration/License Fee,BIR Fees,Expense,Taxes and Licenses,,False
3,Bank Charges,Bank Charges,Expense,Others,,False
4,Business Permit,Municipal Permit,Expense,Taxes and Licenses,,False
...,...,...,...,...,...,...
311,"Union Bank of the Philippines, Inc.-6101011305",Cash in Bank,Asset,Cash and Cash Equivalents,,False
312,Unrealized Gain / Loss,,,,NEW - Not in COA Mapping,True
313,Vat Output - Others,VAT Payable,Expense,Accrued Expenses and Other Liabilities,,False
314,Withdrawal-Principal,,,,NEW - Not in COA Mapping,True
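
The output above shows two things worth checking before the mapping is reused: rows flagged `Is_New_Account` still have no `TB Account Name`, and `GL Account` holds case variants of the same name (rows 0 and 1). A minimal sketch of both checks follows, assuming a small stand-in frame (`coa_sample`, a hypothetical name) built from rows shown above rather than the real `coa_mapping`.

```python
import pandas as pd

# Hypothetical stand-in for coa_mapping, using rows shown in the output above
coa_sample = pd.DataFrame({
    "GL Account": [
        "ANNUAL FEE - DIRECTOR",
        "Annual Fee - Director",
        "Bank Charges",
        "Unrealized Gain / Loss",
    ],
    "TB Account Name": ["Annual Fee - Director", "Annual Fee - Director", "Bank Charges", None],
    "Is_New_Account": [False, False, False, True],
})

# Accounts that still need a TB Account Name and FS Classification
new_accounts = coa_sample.loc[coa_sample["Is_New_Account"], "GL Account"].tolist()

# GL Account values that collide once case and surrounding whitespace are ignored
normalized = coa_sample["GL Account"].str.strip().str.upper()
case_variants = coa_sample.loc[normalized.duplicated(keep=False), "GL Account"].tolist()

print(new_accounts)
print(case_variants)
```

Whether case variants should be merged or kept as separate mapping rows is a decision for the COA owner; the sketch only surfaces them.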


In [44]:
portfolio_mapping

Unnamed: 0,level1accountname\t,Fund_Code
0,"Philequity Alpha One Fund, Inc.",PAOF
1,PHILEQUITY DIVIDEND YIELD FUND\t,PDYFI
2,PHILEQUITY DOLLAR INCOME FUND\t,PDIF
3,"PHILEQUITY FUND, INC.\t",PEFI
4,"Philequity MSCI Philippines Index Fund, Inc.\t",PMPI
5,PHILEQUITY PESO BOND FUND\t,PPBF
6,PHILEQUITY PSE INDEX FUND\t,PPSE
7,GOVERNMENT SERVICE INSURANCE SYSTEM-500032,GSIS
8,"PHILEQUITY MANAGEMENT, INC.\t",PEMI
9,"Philequity Alpha One Fund, Inc. s\t",PAOFS


In [46]:
def capitalize_portfolio_mapping(portfolio_df):
    """
    Convert all values in the 'level1accountname' column of the portfolio mapping to uppercase.
    
    Parameters:
        portfolio_df (DataFrame): The portfolio mapping DataFrame
    
    Returns:
        DataFrame: Copy of the portfolio mapping with uppercase 'level1accountname' values, or None if input is None
    """
    # Handle None case
    if portfolio_df is None:
        print("⚠️  WARNING: portfolio_mapping is None - no data to capitalize")
        return None
    
    # Create a copy to avoid modifying original
    df_copy = portfolio_df.copy()
    
    # Match the column by substring because the header carries a trailing tab character
    level1_col = [col for col in df_copy.columns if 'level1accountname' in col.lower()]
    
    if level1_col:
        col_name = level1_col[0]
        # .str.upper() changes case only; trailing tabs in the values are kept
        df_copy[col_name] = df_copy[col_name].str.upper()
        print(f"✓ Portfolio mapping '{col_name}' column capitalized")
        print(f"  Updated {len(df_copy)} row(s)")
        print(f"  Unique values: {df_copy[col_name].nunique()}")
    else:
        print("⚠️  WARNING: 'level1accountname' column not found in portfolio_mapping")
    
    return df_copy

# Apply the function to portfolio_mapping
portfolio_mapping = capitalize_portfolio_mapping(portfolio_mapping)

# Display updated portfolio mapping (look the column up once instead of twice)
level1_cols = [] if portfolio_mapping is None else [
    col for col in portfolio_mapping.columns if 'level1accountname' in col.lower()
]
if level1_cols:
    print("\n📋 Updated portfolio_mapping unique values:")
    print(portfolio_mapping[level1_cols[0]].unique().tolist())
else:
    print("\nℹ️  No portfolio mapping data available to display")

✓ Portfolio mapping 'level1accountname	' column capitalized
  Updated 10 row(s)
  Unique values: 10

📋 Updated portfolio_mapping unique values:
['PHILEQUITY ALPHA ONE FUND, INC.', 'PHILEQUITY DIVIDEND YIELD FUND\t', 'PHILEQUITY DOLLAR INCOME FUND\t', 'PHILEQUITY FUND, INC.\t', 'PHILEQUITY MSCI PHILIPPINES INDEX FUND, INC.\t', 'PHILEQUITY PESO BOND FUND\t', 'PHILEQUITY PSE INDEX FUND\t', 'GOVERNMENT SERVICE INSURANCE SYSTEM-500032', 'PHILEQUITY MANAGEMENT, INC.\t', 'PHILEQUITY ALPHA ONE FUND, INC. S\t']
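
The unique values above still end in `\t`, and the column header itself carries a trailing tab, so an exact-match join on `level1accountname` could miss rows. A minimal sketch of one way to trim that whitespace follows; `strip_portfolio_whitespace` is a hypothetical helper that is not part of the notebook, and the sample frame only mimics the shape of `portfolio_mapping`.

```python
import pandas as pd

def strip_portfolio_whitespace(portfolio_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: trim leading/trailing whitespace (tabs included)
    from column headers and from every string cell."""
    cleaned = portfolio_df.copy()
    # Clean the headers first, e.g. 'level1accountname\t' -> 'level1accountname'
    cleaned.columns = [str(col).strip() for col in cleaned.columns]
    # Strip only string cells; other values pass through unchanged
    for col in cleaned.columns:
        cleaned[col] = cleaned[col].map(
            lambda value: value.strip() if isinstance(value, str) else value
        )
    return cleaned

# Sample rows shaped like the portfolio_mapping output above
sample = pd.DataFrame({
    "level1accountname\t": [
        "PHILEQUITY FUND, INC.\t",
        "GOVERNMENT SERVICE INSURANCE SYSTEM-500032",
    ],
    "Fund_Code": ["PEFI", "GSIS"],
})

cleaned_sample = strip_portfolio_whitespace(sample)
print(cleaned_sample.columns.tolist())
print(cleaned_sample["level1accountname"].tolist())
```

The helper returns a copy, so the original frame stays available for comparison if a later join behaves unexpectedly.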


## 10. Automation Workflow - [Next Steps]

In [None]:
# TODO: Add automation logic here
# - Validation
# - Reconciliation
# - Report generation
# - Export processed data

print("Ready for automation workflow implementation")
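
As a starting point for the validation item in the TODO list, here is a minimal sketch of one common trial balance check: total debits should equal total credits for each fund. The column names `Fund_Code`, `Debit`, and `Credit`, the function `check_tb_balanced`, and the 0.01 tolerance are all assumptions for illustration; the consolidated TB in this notebook may use different names.

```python
import pandas as pd

def check_tb_balanced(tb_df, group_col="Fund_Code", debit_col="Debit",
                      credit_col="Credit", tolerance=0.01):
    """Hypothetical check: sum debits and credits per group and flag groups
    whose difference exceeds the tolerance."""
    totals = tb_df.groupby(group_col)[[debit_col, credit_col]].sum()
    totals["Difference"] = totals[debit_col] - totals[credit_col]
    totals["Balanced"] = totals["Difference"].abs() <= tolerance
    return totals

# Made-up sample data, not taken from the September trial balance
tb_sample = pd.DataFrame({
    "Fund_Code": ["PEFI", "PEFI", "GSIS", "GSIS"],
    "Debit": [1000.0, 0.0, 500.0, 0.0],
    "Credit": [0.0, 1000.0, 0.0, 450.0],
})

balance_report = check_tb_balanced(tb_sample)
print(balance_report)
```

In this made-up sample, PEFI balances and GSIS is off by 50.00, which is the kind of row a reconciliation step would then investigate.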