# Migration Status Spreadsheet Notebook (Part 1)

## Overview
This notebook generates the data for the migration tracking spreadsheet.

## What it does
- Extracts migration data from the COLIN Extract database
- Retrieves filing information from the LEAR database
- Merges the two datasets and exports the result to an Excel file

## Output
A formatted Excel spreadsheet tracking corporation migration status.

In [None]:
%pip install pandas
%pip install sqlalchemy
%pip install python-dotenv
%pip install psycopg2-binary
%pip install openpyxl

In [1]:
import os
import pandas as pd
from sqlalchemy import create_engine, text
from sqlalchemy.exc import SQLAlchemyError, OperationalError
from dotenv import load_dotenv
from datetime import datetime

load_dotenv()

CONFIG = {
    'batch_size': 5000,
    'final_excel_fields': [
        'Admin Email', 'Incorporation Number', 'Company Name', 'Type',
        'Migration Status', 'Migrated Date', 'Filings Done', 'Last Filing Date'
    ],
    'excel_export': {
        'font_size': 12,
        'max_column_width': 50,
        # Assumption: fall back to the current directory when EXPORT_OUTPUT_DIR is unset or empty
        'output_dir': os.getenv('EXPORT_OUTPUT_DIR') or '.'
    }
}

# Configuration
BATCH_SIZE = CONFIG['batch_size']
FINAL_EXCEL_FIELDS = CONFIG['final_excel_fields']

print("Libraries imported and configuration loaded successfully.")

Libraries imported and configuration loaded successfully.


## Database Setup

Configure database connections for the COLIN Extract and LEAR databases using environment variables.

In [4]:
DATABASE_CONFIG = {
    'colin_extract': {
        'username': os.getenv("DATABASE_COLIN_EXTRACT_USERNAME"),
        'password': os.getenv("DATABASE_COLIN_EXTRACT_PASSWORD"),
        'host': os.getenv("DATABASE_COLIN_EXTRACT_HOST"),
        'port': os.getenv("DATABASE_COLIN_EXTRACT_PORT"),
        'name': os.getenv("DATABASE_COLIN_EXTRACT_NAME")
    },
    'lear': {
        'username': os.getenv("DATABASE_LEAR_USERNAME"),
        'password': os.getenv("DATABASE_LEAR_PASSWORD"),
        'host': os.getenv("DATABASE_LEAR_HOST"),
        'port': os.getenv("DATABASE_LEAR_PORT"),
        'name': os.getenv("DATABASE_LEAR_NAME")
    }
}


for db_key, db_config in DATABASE_CONFIG.items():
    # Build URI
    uri = f"postgresql://{db_config['username']}:{db_config['password']}@{db_config['host']}:{db_config['port']}/{db_config['name']}"
    DATABASE_CONFIG[db_key] = {'uri': uri}

print("Database configurations built successfully.")


Database configurations built successfully.
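
The f-string URI above can be misparsed if a credential contains characters such as `@` or `:`, and an unset environment variable silently becomes the string `None`. A minimal sketch of a stricter builder follows, using only the standard library. `build_postgres_uri`, `REQUIRED_KEYS`, and the sample values are hypothetical names for illustration, not part of the notebook.

```python
from urllib.parse import quote_plus

# Keys expected in each per-database config dict (mirrors DATABASE_CONFIG).
REQUIRED_KEYS = ("username", "password", "host", "port", "name")


def build_postgres_uri(cfg: dict) -> str:
    """Build a PostgreSQL URI, failing fast on missing settings."""
    missing = [key for key in REQUIRED_KEYS if not cfg.get(key)]
    if missing:
        raise ValueError(f"Missing database settings: {', '.join(missing)}")
    # Percent-encode credentials so characters like '@' or ':' are not
    # mistaken for URI delimiters.
    user = quote_plus(str(cfg["username"]))
    password = quote_plus(str(cfg["password"]))
    return f"postgresql://{user}:{password}@{cfg['host']}:{cfg['port']}/{cfg['name']}"


# Hypothetical values for illustration only.
example = {
    "username": "reader",
    "password": "p@ss:word",
    "host": "localhost",
    "port": "5432",
    "name": "demo",
}
uri = build_postgres_uri(example)
```

Raising on missing settings surfaces a misconfigured `.env` file at this step rather than later as an opaque connection error.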


## Create Database Engines

Create and test database connections for all configured databases.

In [5]:
engines = {}

for db_key, config in DATABASE_CONFIG.items():
    try:
        engine = create_engine(config['uri'])
        
        # Test connection
        with engine.connect() as conn:
            conn.execute(text("SELECT 1"))
        
        engines[db_key] = engine
        print(f"{db_key.upper()} database engine created and tested successfully.")
    
    except OperationalError as e:
        print(f"{db_key.upper()} database connection failed: {e}")
        raise
    except SQLAlchemyError as e:
        print(f"{db_key.upper()} database engine creation failed: {e}")
        raise
    except Exception as e:
        print(f"{db_key.upper()} unexpected error: {e}")
        raise

colin_engine = engines['colin_extract']
lear_engine = engines['lear']

print("All database engines ready for use.")


COLIN_EXTRACT database engine created and tested successfully.
LEAR database engine created and tested successfully.
All database engines ready for use.


## Extract Migration Data

Query the COLIN Extract database for the list of migrated corporations and their details.

In [6]:
colin_extract_query = """
SELECT
    c.admin_email AS "Admin Email",
    c.corp_num AS "Incorporation Number",
    cn.corp_name AS "Company Name",
    cp.corp_type_cd AS "Type",
    CASE
        WHEN cp.processed_status = 'COMPLETED' THEN 'Migrated'
        ELSE cp.processed_status
    END AS "Migration Status",
    cp.create_date::date AS "Migrated Date"
FROM
    corp_processing cp
JOIN
    corporation c ON cp.corp_num = c.corp_num
LEFT JOIN
    corp_name cn ON c.corp_num = cn.corp_num 
        AND cn.corp_name_typ_cd IN ('CO', 'NB')
        AND cn.end_event_id IS NULL
WHERE
    cp.environment = 'prod'
    AND cp.processed_status = 'COMPLETED'
ORDER BY
    cp.create_date DESC;
"""
    
try:
    with colin_engine.connect() as conn:
        colin_extract_df = pd.read_sql(colin_extract_query, conn)

    if colin_extract_df.empty:
        raise ValueError("COLIN database query returned empty result")
    
    print(f"Fetched {len(colin_extract_df)} rows from COLIN Extract database.")
    
except Exception as e:
    print(f"Error fetching data from COLIN Extract: {e}")
    raise

# Display results
with pd.option_context('display.max_rows', None):
    display(colin_extract_df)


Fetched 64 rows from COLIN Extract database.


Unnamed: 0,Admin Email,Incorporation Number,Company Name,Type,Migration Status,Migrated Date
0,vancorp@bennettjones.com,BC1033896,1033896 B.C. LTD.,BC,Migrated,2025-06-13
1,VanCorp@bennettjones.com,BC1341547,TRIUMPH PROPERTIES EVERSYDE LTD.,BC,Migrated,2025-06-13
2,VanCorp@bennettjones.com,BC1297308,EIGHTH ON ELM CAPITAL INC.,BC,Migrated,2025-06-13
3,VanCorp@bennettjones.com,BC1185476,TRIASIA INVESTMENT PARTNERS LTD.,BC,Migrated,2025-06-13
4,VanCorp@bennettjones.com,BC1180203,A.SPIRE BY NATURE INVESTMENTS LTD.,BC,Migrated,2025-06-13
5,VanCorp@bennettjones.com,BC1173897,PROVENTUS HOLDINGS LTD.,BC,Migrated,2025-06-13
6,VanCorp@bennettjones.com,BC1172657,TIME IS TIGHT COMMUNICATIONS LTD.,BC,Migrated,2025-06-13
7,VanCorp@bennettjones.com,BC1169589,NMT INTERNATIONAL SHIPPING CANADA LTD.,BC,Migrated,2025-06-13
8,VanCorp@bennettjones.com,BC1161928,DCLK CONSULTING CORP.,BC,Migrated,2025-06-13
9,VanCorp@bennettjones.com,BC1140525,FAIRFAX MINING CORP.,BC,Migrated,2025-06-13


## Get Filing Data

Retrieve and aggregate filing information from the LEAR database for the migrated corporations.

In [7]:
lear_combined_query = """
SELECT 
    b.id,
    b.identifier,
    COALESCE(
        STRING_AGG(f.filing_type, ', ' ORDER BY f.filing_type), 
        ''
    ) AS "Filings Done",
    MAX(f.filing_date)::date AS "Last Filing Date"
FROM businesses b
LEFT JOIN filings f ON b.id = f.business_id 
    AND f.source = 'LEAR' 
    AND f.status = 'COMPLETED'
WHERE b.identifier = ANY(%(identifiers)s)
GROUP BY b.id, b.identifier;
"""

corp_nums = colin_extract_df['Incorporation Number'].unique().tolist()
batches_identifiers = [corp_nums[i:i + BATCH_SIZE] for i in range(0, len(corp_nums), BATCH_SIZE)]

# Execute combined query with batch processing
lear_combined_results = []
for idx, batch_identifiers in enumerate(batches_identifiers):
    if not batch_identifiers:
        continue
    try:
        with lear_engine.connect() as conn:
            df = pd.read_sql(
                lear_combined_query,
                conn,
                params={"identifiers": batch_identifiers}
            )
        
        lear_combined_results.append(df)
        print(f"Batch {idx+1}: {len(df)} records fetched")
    except Exception as e:
        print(f"Batch {idx+1}/{len(batches_identifiers)} failed: {e}")
        continue

# Process combined results
if lear_combined_results:
    lear_combined_df = pd.concat(lear_combined_results, ignore_index=True)
    lear_combined_df = lear_combined_df.drop_duplicates('identifier', keep='last')
    print(f"Total combined records fetched: {len(lear_combined_df)}")
else:
    lear_combined_df = pd.DataFrame(columns=['id', 'identifier', 'Filings Done', 'Last Filing Date'])

# Display results
with pd.option_context('display.max_rows', None):
    display(lear_combined_df)


Batch 1: 64 records fetched
Total combined records fetched: 64


Unnamed: 0,id,identifier,Filings Done,Last Filing Date
0,631316,BC0754828,,
1,631317,BC0769801,,
2,631318,BC0910591,,
3,631319,BC0934777,,
4,631320,BC0971192,,
5,631321,BC1034551,,
6,631322,BC0988623,,
7,631323,BC0934782,,
8,631324,BC1033896,,
9,631325,BC1072742,,
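
The batching step keeps each `ANY(...)` array parameter to at most `BATCH_SIZE` identifiers. With 64 identifiers and a batch size of 5000, the run above produces a single batch. The slicing logic can be sketched in isolation as follows. `chunked` and the sample identifiers are hypothetical and only illustrate the idea.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Hypothetical identifiers for illustration only.
sample_ids = [f"BC{n:07d}" for n in range(1, 8)]
batches = list(chunked(sample_ids, 3))
# Seven identifiers with size 3 give batches of 3, 3, and 1.
```

An empty input yields no batches, so the `if not batch_identifiers` guard in the loop would not be needed with this helper.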


## Merge Data

Left-join the COLIN Extract migration data with the LEAR filing data on the incorporation number, then keep only the columns needed for the spreadsheet.

In [8]:
try:
    result = (colin_extract_df
              .merge(lear_combined_df, 
                     left_on='Incorporation Number', 
                     right_on='identifier', 
                     how='left'))
    
    # Select final fields
    merged_df = result[FINAL_EXCEL_FIELDS]
    print(f"Data merged successfully: {len(merged_df)} rows")
        
except Exception as e:
    print(f"Error merging data: {e}")
    raise

# Display merged results
with pd.option_context('display.max_rows', None):
    display(merged_df)

Data merged successfully: 64 rows


Unnamed: 0,Admin Email,Incorporation Number,Company Name,Type,Migration Status,Migrated Date,Filings Done,Last Filing Date
0,vancorp@bennettjones.com,BC1033896,1033896 B.C. LTD.,BC,Migrated,2025-06-13,,
1,VanCorp@bennettjones.com,BC1341547,TRIUMPH PROPERTIES EVERSYDE LTD.,BC,Migrated,2025-06-13,,
2,VanCorp@bennettjones.com,BC1297308,EIGHTH ON ELM CAPITAL INC.,BC,Migrated,2025-06-13,,
3,VanCorp@bennettjones.com,BC1185476,TRIASIA INVESTMENT PARTNERS LTD.,BC,Migrated,2025-06-13,,
4,VanCorp@bennettjones.com,BC1180203,A.SPIRE BY NATURE INVESTMENTS LTD.,BC,Migrated,2025-06-13,,
5,VanCorp@bennettjones.com,BC1173897,PROVENTUS HOLDINGS LTD.,BC,Migrated,2025-06-13,,
6,VanCorp@bennettjones.com,BC1172657,TIME IS TIGHT COMMUNICATIONS LTD.,BC,Migrated,2025-06-13,,
7,VanCorp@bennettjones.com,BC1169589,NMT INTERNATIONAL SHIPPING CANADA LTD.,BC,Migrated,2025-06-13,,
8,VanCorp@bennettjones.com,BC1161928,DCLK CONSULTING CORP.,BC,Migrated,2025-06-13,,
9,VanCorp@bennettjones.com,BC1140525,FAIRFAX MINING CORP.,BC,Migrated,2025-06-13,,
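
Because the merge uses `how='left'`, every COLIN row is kept, and corporations with no matching LEAR record get empty filing columns. One way to tell "no LEAR record" apart from "record with no filings" is pandas' `indicator` option, sketched here on made-up rows. The frames and values below are hypothetical and do not come from either database.

```python
import pandas as pd

# Hypothetical rows for illustration only.
colin = pd.DataFrame({
    "Incorporation Number": ["BC0000001", "BC0000002"],
    "Company Name": ["EXAMPLE ONE LTD.", "EXAMPLE TWO LTD."],
})
lear = pd.DataFrame({
    "identifier": ["BC0000001"],
    "Filings Done": ["annualReport"],
})

merged = colin.merge(
    lear,
    left_on="Incorporation Number",
    right_on="identifier",
    how="left",
    indicator=True,  # adds a "_merge" column: "both" or "left_only"
)

# Corporations present in COLIN but absent from LEAR.
unmatched = merged.loc[
    merged["_merge"] == "left_only", "Incorporation Number"
].tolist()
```

Logging `unmatched` before selecting the final columns would make missing LEAR records visible instead of leaving them as blank cells in the spreadsheet.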


## Export to Excel

Generate formatted Excel file with the merged migration tracking data.

In [None]:
from openpyxl.styles import Font

if merged_df.empty:
    raise ValueError("Data is empty, cannot export")

# Create output directory
os.makedirs(CONFIG['excel_export']['output_dir'], exist_ok=True)

# Generate filename
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
excel_filename = f"migration_status_{timestamp}.xlsx"
excel_filepath = os.path.join(CONFIG['excel_export']['output_dir'], excel_filename)

try:
    # Write to the full path so the file lands in the configured output directory
    with pd.ExcelWriter(excel_filepath, engine='openpyxl') as writer:
        # Export data
        merged_df.to_excel(writer, sheet_name='Migration Status', index=False)
        worksheet = writer.sheets['Migration Status']

        # Adjust format
        for row_num, row in enumerate(worksheet.iter_rows(), 1):
            for cell in row:
                cell.font = Font(
                    size=CONFIG['excel_export']['font_size'], 
                    bold=(row_num == 1)
                )

        # Freeze header row
        worksheet.freeze_panes = 'A2'

        # Adjust column width
        for column in worksheet.columns:
            max_length = 0
            column_letter = column[0].column_letter

            for cell in column:
                try:
                    if cell.value and len(str(cell.value)) > max_length:
                        max_length = len(str(cell.value))
                except (TypeError, AttributeError):
                    continue

            adjusted_width = min(max_length + 2, CONFIG['excel_export']['max_column_width'])
            worksheet.column_dimensions[column_letter].width = adjusted_width

    print(f"Excel export successful: {excel_filepath}")
    
except Exception as e:
    print(f"Excel export failed: {e}")
    raise
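
The column-width rule in the export cell is the longest cell text plus two characters of padding, capped at `max_column_width`. It can be sketched as a small standalone function. `fit_column_width` is a hypothetical helper, and unlike the loop above it skips only `None` values, so a numeric `0` still counts toward the width.

```python
from typing import Iterable


def fit_column_width(
    values: Iterable[object], max_width: int = 50, padding: int = 2
) -> int:
    """Return a width that fits the longest value, capped at `max_width`."""
    longest = max((len(str(v)) for v in values if v is not None), default=0)
    return min(longest + padding, max_width)


# Hypothetical column contents for illustration only.
width_short = fit_column_width(["Type", "BC", "BC"])        # longest text is "Type"
width_long = fit_column_width(["Company Name", "X" * 80])   # capped at max_width
width_empty = fit_column_width([None, None])                # padding only
```

Keeping the rule in one function makes the cap and padding easy to adjust without touching the worksheet loop.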