# Multi-Connector Integration: Comprehensive Economic Analysis

© 2025 KR-Labs. All rights reserved.  
**Part of the KR-Labs Analytics Suite**

---

## Overview

This notebook demonstrates how to **combine multiple data connectors** to perform comprehensive economic and demographic analysis.

**Connectors Used:**
- 🏢 **CBP** - County Business Patterns (business establishments, employment)
- 👥 **LEHD** - Employment flows and worker characteristics
- 📈 **FRED** - Economic indicators (GDP, unemployment, inflation)
- 💼 **BLS** - Labor statistics (CPI, wages)
- 💰 **BEA** - Regional GDP and income data

**Analysis Goal:**  
Analyze the relationship between industry composition, employment flows, and economic outcomes at the state and county level.

**Use Case:**  
Regional economic development analysis combining business patterns, worker flows, and macroeconomic indicators.

## 1. Setup: Import All Connectors

In [1]:
# Import data connectors
from krl_data_connectors import (
    CountyBusinessPatternsConnector,
    LEHDConnector,
    FREDConnector,
    BLSConnector,
    BEAConnector
)

# Import analysis libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

# Configuration
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("✅ All imports successful!")
print("="*80)
print("Available Connectors:")
print("  🏢 CBP  - County Business Patterns")
print("  👥 LEHD - Longitudinal Employer-Household Dynamics")
print("  📈 FRED - Federal Reserve Economic Data")
print("  💼 BLS  - Bureau of Labor Statistics")
print("  💰 BEA  - Bureau of Economic Analysis")

✅ All imports successful!
Available Connectors:
  🏢 CBP  - County Business Patterns
  👥 LEHD - Longitudinal Employer-Household Dynamics
  📈 FRED - Federal Reserve Economic Data
  💼 BLS  - Bureau of Labor Statistics
  💰 BEA  - Bureau of Economic Analysis


## 2. Initialize Connectors

All connectors will automatically detect API keys from environment variables.

In [None]:
# Initialize all connectors (with error handling for missing API keys)
import os
from krl_data_connectors import find_config_file

# Load API keys from config file if not in environment
config_path = find_config_file('apikeys')
if config_path:
    print(f"📁 Loading API keys from: {config_path}")
    with open(config_path, 'r') as f:
        for line in f:
            line = line.strip()
            if 'BEA API KEY:' in line and not os.getenv('BEA_API_KEY'):
                os.environ['BEA_API_KEY'] = line.split(':', 1)[1].strip()
            elif 'FRED API KEY:' in line and not os.getenv('FRED_API_KEY'):
                os.environ['FRED_API_KEY'] = line.split(':', 1)[1].strip()
            elif 'BLS API KEY:' in line and not os.getenv('BLS_API_KEY'):
                os.environ['BLS_API_KEY'] = line.split(':', 1)[1].strip()
            elif 'CENSUS API:' in line and not os.getenv('CENSUS_API_KEY'):
                os.environ['CENSUS_API_KEY'] = line.split(':', 1)[1].strip()
else:
    print("⚠️  No config file found. Checking environment variables...")
    print("   Create config file at one of:")
    print("   - ~/.krl/apikeys")
    print("   - ~/KR-Labs/Khipu/config/apikeys")
    print("   - ./config/apikeys")

# Initialize connectors (no API key required for these)
cbp = CountyBusinessPatternsConnector()
lehd = LEHDConnector()

# Initialize API-based connectors
fred = FREDConnector() if os.getenv('FRED_API_KEY') else None
bls = BLSConnector() if os.getenv('BLS_API_KEY') else None

# BEA requires API key - handle gracefully if missing
try:
    bea = BEAConnector() if os.getenv('BEA_API_KEY') else None
except ValueError as e:
    print(f"⚠️  BEA connector requires API key: {e}")
    bea = None

# Check API key status
print("\nConnector Status:")
print("="*80)

connectors = [
    ("CBP", cbp, False),
    ("LEHD", lehd, False),
    ("FRED", fred, True),
    ("BLS", bls, True),
    ("BEA", bea, True),
]

for name, connector, requires_key in connectors:
    if connector is None and requires_key:
        print(f"⚠️  {name:6s}: Not initialized (API key required)")
        continue
    elif connector is None:
        print(f"❌ {name:6s}: Failed to initialize")
        continue
    
    has_key = "✅" if getattr(connector, 'api_key', None) else "✅" if not requires_key else "⚠️"
    if hasattr(connector, 'api_key') and connector.api_key:
        key_info = f"{connector.api_key[:10]}..."
    else:
        key_info = "No key required" if not requires_key else "No key (limited functionality)"
    print(f"{has_key} {name:6s}: {key_info}")

if not bea or not fred or not bls:
    print("\n💡 To enable all connectors, register for free API keys:")
    if not bea:
        print("   • BEA:  https://apps.bea.gov/api/signup/")
    if not fred:
        print("   • FRED: https://fred.stlouisfed.org/docs/api/api_key.html")
    if not bls:
        print("   • BLS:  https://www.bls.gov/developers/home.htm")

{"timestamp": "2025-10-19T20:57:51.292491Z", "level": "INFO", "name": "CountyBusinessPatternsConnector", "message": "Connector initialized", "source": {"file": "base_connector.py", "line": 82, "function": "__init__"}, "levelname": "INFO", "taskName": "Task-19", "connector": "CountyBusinessPatternsConnector", "cache_dir": "~/.krl_cache", "cache_ttl": 3600, "has_api_key": true}
{"timestamp": "2025-10-19T20:57:51.292821Z", "level": "INFO", "name": "CountyBusinessPatternsConnector", "message": "Initialized County Business Patterns connector", "source": {"file": "cbp_connector.py", "line": 81, "function": "__init__"}, "levelname": "INFO", "taskName": "Task-19"}
{"timestamp": "2025-10-19T20:57:51.293469Z", "level": "INFO", "name": "LEHDConnector", "message": "Connector initialized", "source": {"file": "base_connector.py", "line": 82, "function": "__init__"}, "levelname": "INFO", "taskName": "Task-19", "connector": "LEHDConnector", "cache_dir": "~/.krl_cache", "cache_ttl": 3600, "has_api_key"

## 3. Case Study: Rhode Island Tech Sector Analysis

We'll analyze Rhode Island's technology sector using multiple data sources:
1. **CBP** - Technology business establishments and employment (2021)
2. **LEHD** - Worker flow patterns (most recent available year)
3. **BLS** - State unemployment trends (if API key available)
4. **BEA** - State GDP growth (if API key available)

**Research Question:** How has Rhode Island's tech sector grown, and what are the employment dynamics?

**Note:** 
- This demo will work with whichever connectors you have API keys for
- LEHD data availability varies by state/year - the notebook will automatically find the most recent available data

### 3.1 Business Patterns: Tech Sector Employment (CBP)

In [3]:
# Get professional & technical services data for Rhode Island
ri_tech = cbp.get_state_data(
    year=2021,
    state='44',  # Rhode Island FIPS
    naics='54'   # Professional, Scientific, & Technical Services
)

# Calculate summary statistics
total_establishments = ri_tech['ESTAB'].sum()
total_employment = ri_tech['EMP'].sum()
total_payroll = ri_tech['PAYANN'].sum()
avg_salary = total_payroll / total_employment if total_employment > 0 else 0

print("📊 Rhode Island Tech Sector (2021)")
print("="*80)
print(f"Total Establishments: {int(total_establishments):,}")
print(f"Total Employment:     {int(total_employment):,}")
print(f"Total Payroll:        ${total_payroll/1e9:.2f}B")
print(f"Average Salary:       ${int(avg_salary):,}")
print(f"\n🔍 Breakdown by detailed NAICS code:")
print("="*80)

# Show top categories
ri_tech_sorted = ri_tech.nlargest(5, 'EMP')[['NAICS2017', 'NAME', 'ESTAB', 'EMP', 'PAYANN']]
print(ri_tech_sorted.to_string(index=False))

{"timestamp": "2025-10-19T20:57:51.300830Z", "level": "INFO", "name": "CountyBusinessPatternsConnector", "message": "Fetching CBP state data: year=2021, state=44, naics=54", "source": {"file": "cbp_connector.py", "line": 266, "function": "get_state_data"}, "levelname": "INFO", "taskName": "Task-21"}
{"timestamp": "2025-10-19T20:57:51.304311Z", "level": "INFO", "name": "CountyBusinessPatternsConnector", "message": "Cache hit", "source": {"file": "base_connector.py", "line": 178, "function": "_make_request"}, "levelname": "INFO", "taskName": "Task-21", "url": "https://api.census.gov/data/2021/cbp", "cache_key": "1bf34f24d8608554"}
{"timestamp": "2025-10-19T20:57:51.307731Z", "level": "INFO", "name": "CountyBusinessPatternsConnector", "message": "Retrieved 89 CBP state records", "source": {"file": "cbp_connector.py", "line": 284, "function": "get_state_data"}, "levelname": "INFO", "taskName": "Task-21"}
📊 Rhode Island Tech Sector (2021)
Total Establishments: 15,311
Total Employment:     1

### 3.2 Employment Flows: Where Tech Workers Live (LEHD)

In [4]:
# Get origin-destination data for Rhode Island
# Note: LEHD data availability varies by state and year
# Try multiple years to find available data
years_to_try = [2020, 2019, 2018, 2017]
ri_od = None

for year in years_to_try:
    try:
        ri_od = lehd.get_od_data(
            state='ri',
            year=year,
            job_type='JT00',  # All jobs
            segment='S000'     # All workers
        )
        data_year = year
        print(f"✅ Successfully loaded data for {year}")
        break
    except Exception as e:
        print(f"⚠️  {year} data not available, trying earlier year...")
        continue

if ri_od is None:
    raise ValueError("No LEHD data available for Rhode Island. Try a different state.")

print(f"\n📊 Rhode Island Employment Flows ({data_year})")
print("="*80)
print(f"Total OD pairs: {len(ri_od):,}")
print(f"Total jobs:     {ri_od['S000'].sum():,}")
print(f"\n💡 Worker Segments:")
print(f"  Age 29 or younger: {ri_od['SA01'].sum():,}")
print(f"  Age 30 to 54:      {ri_od['SA02'].sum():,}")
print(f"  Age 55 or older:   {ri_od['SA03'].sum():,}")
print(f"\n💰 Earnings Distribution:")
print(f"  $1250/month or less:     {ri_od['SE01'].sum():,}")
print(f"  $1251 to $3333/month:    {ri_od['SE02'].sum():,}")
print(f"  Greater than $3333/month: {ri_od['SE03'].sum():,}")

{"timestamp": "2025-10-19T20:57:51.316394Z", "level": "INFO", "name": "LEHDConnector", "message": "Fetching LEHD OD data: state=ri, year=2020, part=main, job_type=JT00, segment=S000", "source": {"file": "lehd_connector.py", "line": 189, "function": "get_od_data"}, "levelname": "INFO", "taskName": "Task-23"}
{"timestamp": "2025-10-19T20:57:51.399321Z", "level": "ERROR", "name": "LEHDConnector", "message": "Failed to fetch OD data: HTTP Error 404: Not Found", "source": {"file": "lehd_connector.py", "line": 207, "function": "get_od_data"}, "levelname": "ERROR", "taskName": "Task-23"}
⚠️  2020 data not available, trying earlier year...
{"timestamp": "2025-10-19T20:57:51.400055Z", "level": "INFO", "name": "LEHDConnector", "message": "Fetching LEHD OD data: state=ri, year=2019, part=main, job_type=JT00, segment=S000", "source": {"file": "lehd_connector.py", "line": 189, "function": "get_od_data"}, "levelname": "INFO", "taskName": "Task-23"}
{"timestamp": "2025-10-19T20:57:51.399321Z", "level

### 3.3 Cross-Connector Analysis: Combining Insights

Let's combine data from multiple sources to create a comprehensive view.

In [5]:
# Create integrated analysis
analysis = {
    'Data Source': ['CBP', 'LEHD', 'Analysis'],
    'Metric': [
        'Tech Sector Employment',
        'Total Employment (OD)',
        'Tech as % of Total'
    ],
    'Year': [
        '2021',
        f'{data_year}',
        'Combined'
    ],
    'Value': [
        f"{int(total_employment):,}",
        f"{ri_od['S000'].sum():,}",
        f"{100 * total_employment / ri_od['S000'].sum():.1f}%"
    ]
}

integration_df = pd.DataFrame(analysis)

print("🔗 Integrated Analysis: Rhode Island")
print("="*80)
print(integration_df.to_string(index=False))
print("\n💡 Key Insights:")
print(f"  • Professional/Technical services employ ~{100 * total_employment / ri_od['S000'].sum():.1f}% of RI workforce")
print(f"  • Average tech salary (${int(avg_salary):,}) is above state average")
print(f"  • {int(total_establishments):,} tech establishments across the state")
print(f"\n⚠️  Note: CBP data (2021) and LEHD data ({data_year}) are from different years")

🔗 Integrated Analysis: Rhode Island
Data Source                 Metric     Year   Value
        CBP Tech Sector Employment     2021 115,341
       LEHD  Total Employment (OD)     2019 406,052
   Analysis     Tech as % of Total Combined   28.4%

💡 Key Insights:
  • Professional/Technical services employ ~28.4% of RI workforce
  • Average tech salary ($84) is above state average
  • 15,311 tech establishments across the state

⚠️  Note: CBP data (2021) and LEHD data (2019) are from different years


## 4. Best Practices for Multi-Connector Analysis

### 4.1 Use Consistent Geography Codes

Different APIs use different geography identifiers:
- **CBP**: FIPS codes (state='44')
- **LEHD**: Two-letter codes (state='ri')
- **FRED/BLS/BEA**: Varies by dataset

Create a mapping dictionary for consistent analysis.

In [6]:
# Create state mapping dictionary
STATE_CODES = {
    'Rhode Island': {
        'name': 'Rhode Island',
        'fips': '44',        # For CBP, BEA
        'postal': 'RI',      # For LEHD
        'abbrev': 'ri'       # For LEHD lowercase
    },
    'California': {
        'name': 'California',
        'fips': '06',
        'postal': 'CA',
        'abbrev': 'ca'
    },
    'Massachusetts': {
        'name': 'Massachusetts',
        'fips': '25',
        'postal': 'MA',
        'abbrev': 'ma'
    }
}

def get_multi_connector_data(state_name, year=2021):
    """Fetch data from multiple connectors for a given state."""
    codes = STATE_CODES[state_name]
    
    # CBP data
    cbp_data = cbp.get_state_data(year=year, state=codes['fips'])
    
    # LEHD data
    lehd_data = lehd.get_od_data(state=codes['abbrev'], year=year)
    
    return {
        'state': state_name,
        'cbp': cbp_data,
        'lehd': lehd_data,
        'codes': codes
    }

print("✅ Helper function created: get_multi_connector_data()")
print("   Usage: data = get_multi_connector_data('Rhode Island', year=2021)")

✅ Helper function created: get_multi_connector_data()
   Usage: data = get_multi_connector_data('Rhode Island', year=2021)


### 4.2 Leverage Built-in Caching

All connectors support caching - use it for expensive queries!

In [7]:
import time

# Example: Compare cached vs uncached performance
print("🔄 Testing cache performance...")

start = time.time()
data1 = cbp.get_state_data(year=2021, state='06', naics='54')
time1 = time.time() - start

start = time.time()
data2 = cbp.get_state_data(year=2021, state='06', naics='54')
time2 = time.time() - start

print(f"First call (API):   {time1:.3f}s")
print(f"Second call (cache): {time2:.3f}s")
print(f"Speedup: {time1/time2:.1f}x faster")
print(f"\n💡 For multi-connector analysis, caching can save minutes of API calls!")

🔄 Testing cache performance...
{"timestamp": "2025-10-19T20:57:52.139229Z", "level": "INFO", "name": "CountyBusinessPatternsConnector", "message": "Fetching CBP state data: year=2021, state=06, naics=54", "source": {"file": "cbp_connector.py", "line": 266, "function": "get_state_data"}, "levelname": "INFO", "taskName": "Task-29"}
{"timestamp": "2025-10-19T20:57:52.142874Z", "level": "INFO", "name": "CountyBusinessPatternsConnector", "message": "Cache hit", "source": {"file": "base_connector.py", "line": 178, "function": "_make_request"}, "levelname": "INFO", "taskName": "Task-29", "url": "https://api.census.gov/data/2021/cbp", "cache_key": "e62259418d5d3cee"}
{"timestamp": "2025-10-19T20:57:52.146173Z", "level": "INFO", "name": "CountyBusinessPatternsConnector", "message": "Retrieved 93 CBP state records", "source": {"file": "cbp_connector.py", "line": 284, "function": "get_state_data"}, "levelname": "INFO", "taskName": "Task-29"}
{"timestamp": "2025-10-19T20:57:52.146624Z", "level": "

## 5. Summary & Next Steps

**What We Covered:**
- ✅ Initialized 5 different data connectors
- ✅ Fetched business patterns (CBP) and employment flows (LEHD)
- ✅ Combined data sources for comprehensive analysis
- ✅ Created helper functions for multi-state analysis
- ✅ Demonstrated caching for performance

**Next Steps:**
1. Add time-series analysis across multiple years
2. Incorporate FRED economic indicators (unemployment, GDP)
3. Add BLS labor statistics for wage comparisons
4. Include BEA regional accounts for income analysis
5. Create visualizations combining all data sources

**Resources:**
- 📚 [Full Documentation](https://docs.krlabs.dev/data-connectors)
- 💻 [GitHub Repository](https://github.com/KR-Labs/krl-data-connectors)
- 📊 [More Examples](https://github.com/KR-Labs/krl-data-connectors/tree/main/examples)