# Business Acquisition Opportunity Scoring Algorithm

This notebook analyzes business listings from BizBuySell and applies a weighted, five-factor scoring system to rank acquisition opportunities by price, location, business type, and market potential.

## Section 1: Fetch Business Data

Import and run the fetch_businesses script to load data from the BizBuySell API with caching.
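
The `fetch_businesses` script itself is not shown in this notebook. As a rough illustration of what "with caching" could mean, the sketch below wraps any fetch function in a JSON file cache and stamps the payload with a `_cached_at` key like the one visible in the response. The helper name `cached_fetch`, the cache path, and the `max_age_hours` parameter are hypothetical and are not the script's actual interface.

```python
import json
import time
from pathlib import Path
from typing import Callable, Dict


def cached_fetch(fetch_fn: Callable[[], Dict], cache_path: Path,
                 max_age_hours: float = 24.0) -> Dict:
    """Return cached JSON if it is fresh enough, otherwise call fetch_fn.

    Hypothetical helper: the real fetch_businesses script may work differently.
    """
    if cache_path.exists():
        cached = json.loads(cache_path.read_text(encoding="utf-8"))
        age_seconds = time.time() - cached.get("_cached_at", 0)
        if age_seconds < max_age_hours * 3600:
            return cached

    # Cache missing or stale: fetch fresh data and stamp it before saving
    fresh = dict(fetch_fn())
    fresh["_cached_at"] = time.time()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(fresh), encoding="utf-8")
    return fresh
```

Passing the fetch function in as an argument keeps the cache logic independent of the API client, so it can be exercised with a stub during testing.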


In [33]:
import sys
from pathlib import Path

# Add current directory to path to import fetch_businesses
sys.path.insert(0, str(Path.cwd()))

from fetch_businesses import fetch_businesses

# Fetch business data (will use cache if available)
print("Fetching business data...")
api_response = fetch_businesses(use_cache=True)

print(f"✓ Data fetch complete")
print(f"Response keys: {api_response.keys()}")


Fetching business data...
Loaded 0 businesses from cache
✓ Data fetch complete
Response keys: dict_keys(['status', 'timeMs', 'value', 'message', '_cached_at'])


In [34]:
import json
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
from typing import List, Dict, Tuple
import re
from datetime import datetime

# Use the API response from cell 1 (already fetched)
raw_data = api_response

print("✓ API data loaded successfully")
print(f"Response contains: {list(raw_data.keys())}")
if 'bfsSearchResult' in raw_data:
    listings_count = len(raw_data['bfsSearchResult'].get('listings', []))
    print(f"Total listings from API: {listings_count}")

✓ API data loaded successfully
Response contains: ['status', 'timeMs', 'value', 'message', '_cached_at']


## Section 2: Extract and Explore Data Structure

Parse the nested JSON to extract business listings and examine relevant fields for analysis.
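
The schema-format branch of the extractor reaches several levels deep with chained `.get(..., {})` calls. A small helper can make that traversal easier to read and also tolerate a level that is present but set to `None`. This is a sketch only: the name `dig` and the sample payload are made up for illustration and are not part of the notebook.

```python
from typing import Any, Dict


def dig(data: Dict, *keys: str, default: Any = None) -> Any:
    """Walk nested dicts key by key, returning default when a level is missing."""
    current: Any = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Illustrative payload only; the field values are invented for the example
sample = {
    "item": {
        "offers": {
            "price": 55000,
            "availableAtOrFrom": {"address": {"addressRegion": "MA"}},
        }
    }
}

region = dig(sample, "item", "offers", "availableAtOrFrom", "address",
             "addressRegion", default="N/A")
missing = dig(sample, "item", "offers", "seller", "name", default="N/A")
print(region, missing)  # prints "MA N/A"
```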

In [35]:
# Extract business listings from API response
def extract_businesses(data: Dict) -> List[Dict]:
    """Extract business listings from the BizBuySell API response"""
    businesses = []
    
    try:
        # API response structure uses bfsSearchResult.listings
        if 'bfsSearchResult' in data:
            listings = data['bfsSearchResult'].get('listings', [])
        else:
            # Fallback to old schema structure if available
            listings = data['value']['schemaElements']['listProductItemSchema']
        
        for item in listings:
            # Handle API response format
            if isinstance(item, dict):
                # Determine the structure type
                if 'header' in item:  # API format
                    extracted = {
                        'position': item.get('positionNumber', 0),
                        'name': item.get('header', 'N/A'),
                        'productId': item.get('id', 'N/A'),
                        'description': item.get('snippet', 'N/A'),
                        'url': item.get('url', 'N/A'),
                        'price': item.get('price', 0),
                        'availability': 'http://schema.org/InStock',  # API listings are all in stock
                        'address_locality': item.get('cityState', '').split(',')[0].strip() if ',' in item.get('cityState', '') else item.get('cityState', 'N/A'),
                        'address_region': item.get('stateCode', 'N/A'),
                    }
                    businesses.append(extracted)
                elif '@type' in item and item.get('@type') == 'ListItem':  # Schema format
                    business = item.get('item', {})
                    extracted = {
                        'position': item.get('position'),
                        'name': business.get('name', 'N/A'),
                        'productId': business.get('productId', 'N/A'),
                        'description': business.get('description', 'N/A'),
                        'url': business.get('url', 'N/A'),
                        'price': business.get('offers', {}).get('price', 0),
                        'availability': business.get('offers', {}).get('availability', 'N/A'),
                        'address_locality': business.get('offers', {}).get('availableAtOrFrom', {}).get('address', {}).get('addressLocality', 'N/A'),
                        'address_region': business.get('offers', {}).get('availableAtOrFrom', {}).get('address', {}).get('addressRegion', 'N/A'),
                    }
                    businesses.append(extracted)
    except Exception as e:
        print(f"Error navigating data structure: {e}")
    
    return businesses

# Extract all businesses
businesses = extract_businesses(raw_data)
df_raw = pd.DataFrame(businesses)

print(f"✓ Extracted {len(df_raw)} business listings")
print(f"\nColumns: {df_raw.columns.tolist()}")
print(f"\nFirst few businesses:")
print(df_raw[['name', 'price', 'address_locality', 'address_region']].head(10))

✓ Extracted 56 business listings

Columns: ['position', 'name', 'productId', 'description', 'url', 'price', 'availability', 'address_locality', 'address_region']

First few businesses:
                                                name     price  \
0                      Lucrative Consulting Business   55000.0   
1                  LOOK-Turnkey Breakfast-$100k dn-.  225000.0   
2   Established Bakery and Deli with Loyal Following  160000.0   
3  Agreement Pending - Established Italian Restau...  250000.0   
4  Riverfront Tranquility: A Proven Day Spa Busin...  139000.0   
5    Online RV Furniture Store 3k Profit Projections  112990.0   
6   Senior Transition Franchise – Strong, Affluent...   79900.0   
7  Highly profitable window cleaning business in ...   60000.0   
8             Incredible Kids Focused Amazon Listing   72550.0   
9  Established Massage Business for Sale with Unl...   75000.0   

  address_locality address_region  
0   Norfolk County             MA  
1       Fram

## Section 3: Define Scoring Criteria and Weighting System

The acquisition scoring algorithm evaluates businesses across multiple dimensions:

**Scoring Factors:**
1. **Price-to-Value Ratio (25%)** - Lower prices relative to business type are better
2. **Location Desirability (20%)** - Boston metro area and high-demand markets score higher
3. **Business Stability (20%)** - Established, proven, profitable businesses, detected from keywords in the listing description
4. **Market Opportunity (15%)** - Growth potential and recurring revenue models
5. **Price Range Efficiency (20%)** - Optimal price window for ROI ($100K-$1M range)
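
To make the weighting concrete, the sketch below combines five component scores into one composite and then applies a business-type preference multiplier, as the scoring cells later do. The component values and the `preference` value are hypothetical numbers chosen for the example, not scores from the dataset.

```python
# Weights as listed above; they should sum to 1.0
weights = {
    'price_value': 0.25,
    'location': 0.20,
    'stability': 0.20,
    'opportunity': 0.15,
    'price_efficiency': 0.20,
}
assert abs(sum(weights.values()) - 1.0) < 1e-9

# Hypothetical component scores (0-100) for a single listing
components = {
    'price_value': 70,
    'location': 80,
    'stability': 65,
    'opportunity': 60,
    'price_efficiency': 90,
}

# Weighted sum of the components
composite = sum(components[name] * weights[name] for name in weights)

# Hypothetical preference multiplier (1.0 = neutral)
preference = 0.7
final_score = composite * preference

print(f"Composite before preference: {composite:.1f}")
print(f"Final score after preference: {final_score:.2f}")
```

Because the preference multiplies the whole composite, a 0.7 preference removes 30% of the score regardless of how strong the individual components are.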

In [36]:
# Define scoring weights and thresholds
SCORING_CONFIG = {
    'weights': {
        'price_value': 0.25,
        'location': 0.20,
        'stability': 0.20,
        'opportunity': 0.15,
        'price_efficiency': 0.20
    },
    'high_value_locations': [
        'Boston', 'Cambridge', 'Brookline', 'Newton', 'Wellesley',
        'Needham', 'Waltham', 'Arlington', 'Somerville', 'Watertown'
    ],
    'metro_areas': ['Middlesex County', 'Suffolk County', 'Essex County'],
    'price_range_target': (100000, 1000000),  # Optimal ROI range
    'established_keywords': [
        'established', 'profitable', 'well-established', 'proven',
        'turnkey', 'successful', 'steady', 'growing'
    ],
    'recurring_revenue_keywords': [
        'subscription', 'franchise', 'license', 'contract', 'recurring',
        'multi-unit', 'scalable', 'passive'
    ],
    'high_potential_industries': {
        'Healthcare': 1.2,  # High margins, recurring
        'Professional Services': 1.15,  # Recurring revenue
        'Technology/SaaS': 1.2,  # Scalable, recurring
        'Food Service': 0.9,  # Thin margins, labor-intensive
        'Retail': 0.85,  # Declining industry
        'Service': 1.0,  # Stable, repeatable
        'Education': 1.1,  # Growing demand
        'Real Estate/Property': 1.05,  # Stable income
    },
    # PERSONAL PREFERENCES - Adjust these to favor/disfavor business types
    # 1.0 = neutral, >1.0 = preferred, <1.0 = less preferred
    'business_type_preferences': {
        'Healthcare': 0.8,            
        'Technology/SaaS': 0.9,       
        'Professional Services': 1.0, 
        'Food Service': 0.7,          
        'Service': 1.0,               
        'Education': 0.8,             
        'Retail': 1.0,                
        'Other': 1.0,                 
    }
}

print("✓ Scoring configuration loaded")
print(f"Weights: {SCORING_CONFIG['weights']}")
print(f"Price efficiency target: ${SCORING_CONFIG['price_range_target'][0]:,} - ${SCORING_CONFIG['price_range_target'][1]:,}")
print(f"\nBusiness Type Preferences:")
for industry, preference in SCORING_CONFIG['business_type_preferences'].items():
    print(f"  {industry}: {preference:.2f}x")


✓ Scoring configuration loaded
Weights: {'price_value': 0.25, 'location': 0.2, 'stability': 0.2, 'opportunity': 0.15, 'price_efficiency': 0.2}
Price efficiency target: $100,000 - $1,000,000

Business Type Preferences:
  Healthcare: 0.80x
  Technology/SaaS: 0.90x
  Professional Services: 1.00x
  Food Service: 0.70x
  Service: 1.00x
  Education: 0.80x
  Retail: 1.00x
  Other: 1.00x


## Section 4: Implement Filtering Logic

Filter businesses based on key criteria to focus on viable acquisition targets.

In [37]:
def classify_industry(name: str, description: str) -> str:
    """Classify business into industry categories"""
    text = (name + ' ' + description).lower()
    
    # Healthcare & Personal Services
    if any(word in text for word in ['dental', 'medical', 'practice', 'healthcare', 'spa', 'salon', 'barber']):
        return 'Healthcare'
    # Technology/SaaS
    elif any(word in text for word in ['software', 'saas', 'tech', 'app', 'digital', 'web', 'it ']):
        return 'Technology/SaaS'
    # Food Service
    elif any(word in text for word in ['restaurant', 'pizza', 'bar', 'cafe', 'diner', 'bakery', 'catering']):
        return 'Food Service'
    # Retail - standalone stores (no franchise)
    elif any(word in text for word in ['convenience store', 'liquor store', 'retail', 'store', 'shop']) and 'franchise' not in text:
        return 'Retail'
    # Franchise/Multi-unit models
    elif any(word in text for word in ['franchise', 'franchised', 'c-store', 'quick service']):
        return 'Professional Services'  # Treat franchises as higher-potential
    # Service Businesses
    elif any(word in text for word in ['cleaning', 'maintenance', 'plumbing', 'hvac', 'maid', 'painting', 'vending', 'screen printing', 'lash', 'beauty', 'salon']):
        return 'Service'
    # Education
    elif any(word in text for word in ['education', 'school', 'training', 'tutoring', 'learning']):
        return 'Education'
    else:
        return 'Other'

def apply_filters(df: pd.DataFrame, min_price: float = 0, max_price: float = float('inf'), 
                  min_location_quality: bool = False) -> pd.DataFrame:
    """Apply basic filters to the dataset"""
    df_filtered = df.copy()
    
    # Filter by price range
    df_filtered = df_filtered[(df_filtered['price'] >= min_price) & (df_filtered['price'] <= max_price)]
    
    # Filter by availability (only InStock items)
    df_filtered = df_filtered[df_filtered['availability'] == 'http://schema.org/InStock']
    
    # Remove businesses with missing critical data
    df_filtered = df_filtered[df_filtered['price'] > 0]
    
    # Optional: filter by location quality
    if min_location_quality:
        quality_locations = SCORING_CONFIG['high_value_locations'] + SCORING_CONFIG['metro_areas']
        df_filtered = df_filtered[
            df_filtered['address_locality'].isin(quality_locations) | 
            df_filtered['address_region'].str.contains('MA', case=False, na=False)
        ]
    
    return df_filtered

# Apply filters
df_filtered = apply_filters(df_raw, min_price=50000, max_price=2000000)
print(f"✓ Applied filters:")
print(f"  - Original businesses: {len(df_raw)}")
print(f"  - After filtering: {len(df_filtered)}")
print(f"  - Filtered out: {len(df_raw) - len(df_filtered)}")

# Add industry classification
df_filtered['industry'] = df_filtered.apply(
    lambda row: classify_industry(row['name'], row['description']), 
    axis=1
)

print(f"\nIndustry breakdown:")
print(df_filtered['industry'].value_counts())


✓ Applied filters:
  - Original businesses: 56
  - After filtering: 56
  - Filtered out: 0

Industry breakdown:
industry
Other                    14
Technology/SaaS          11
Food Service              9
Retail                    8
Healthcare                6
Professional Services     5
Service                   2
Education                 1
Name: count, dtype: int64
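
`classify_industry` uses plain substring tests, so a short keyword such as `'app'` also matches inside longer words such as "apparel", which may push some listings into the wrong category. One possible alternative is whole-word matching with a regular expression. The sketch below assumes a hypothetical helper named `has_keyword` and uses only a subset of the technology keywords.

```python
import re
from typing import Iterable


def has_keyword(text: str, keywords: Iterable[str]) -> bool:
    """True if any keyword appears as a whole word or phrase (case-insensitive)."""
    pattern = r'\b(?:' + '|'.join(re.escape(k.strip()) for k in keywords) + r')\b'
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Subset of the Technology/SaaS keywords, for illustration
tech_words = ['software', 'saas', 'app', 'digital', 'web']

print('app' in 'apparel boutique')                                 # substring test: True
print(has_keyword('apparel boutique', tech_words))                 # whole-word test: False
print(has_keyword('Mobile app with recurring users', tech_words))  # True
```

The trade-off is that whole-word matching no longer catches plurals or compounds ("apps", "website"), so those would need to be listed explicitly.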


## Section 5: Calculate Opportunity Scores

Calculate composite acquisition opportunity scores based on weighted criteria.

In [38]:
# Define individual scoring functions
def score_price_value(price: float, industry: str) -> float:
    """Score based on price relative to industry benchmarks (0-100)"""
    # Normalize price: lower prices get higher scores
    # Industry-specific price expectations
    industry_benchmarks = {
        'Healthcare': 450000,
        'Technology/SaaS': 350000,
        'Professional Services': 250000,
        'Service': 200000,
        'Education': 300000,
        'Real Estate/Property': 600000,
        'Food Service': 300000,
        'Retail': 250000,
        'Other': 300000
    }
    
    benchmark = industry_benchmarks.get(industry, 300000)
    
    # Score: higher for prices below benchmark
    if price <= benchmark * 0.7:
        score = 100
    elif price <= benchmark:
        score = 85 - (price - benchmark * 0.7) / (benchmark * 0.3) * 15
    elif price <= benchmark * 1.5:
        score = 70 - (price - benchmark) / (benchmark * 0.5) * 30
    else:
        score = max(20, 40 - (price - benchmark * 1.5) / 500000 * 20)
    
    return max(0, min(100, score))

def score_location(locality: str, region: str) -> float:
    """Score based on location desirability (0-100)"""
    # High-value locations
    high_value = SCORING_CONFIG['high_value_locations']
    metro_areas = SCORING_CONFIG['metro_areas']
    
    if locality in high_value:
        return 95
    elif region in metro_areas or region == 'MA':
        return 80
    elif region in ['CT', 'RI', 'VT', 'NH', 'ME']:
        return 60  # New England region
    else:
        return 40

def score_stability(description: str) -> float:
    """Score based on indicators of business stability (0-100)"""
    text = description.lower()
    
    stability_score = 50  # Base score
    
    # Keywords indicating stability
    if 'established' in text:
        stability_score += 15
    if 'profitable' in text or 'profitability' in text:
        stability_score += 12
    if 'proven' in text:
        stability_score += 10
    if 'successful' in text:
        stability_score += 10
    if 'steady' in text:  # also covers 'steady growth'
        stability_score += 8
    if 'growing' in text or 'growth' in text:
        stability_score += 10
    if 'turnkey' in text:
        stability_score += 8
    
    # Negative indicators
    if 'struggling' in text or 'challenged' in text:
        stability_score -= 20
    # Note: substring match, so 'new' also fires on words such as 'renewal' or 'newly'
    if 'startup' in text or 'new' in text:
        stability_score -= 15
    if 'declining' in text:
        stability_score -= 25
    
    return max(0, min(100, stability_score))

def score_opportunity(name: str, description: str, industry: str) -> float:
    """Score based on growth and recurring revenue potential (0-100)"""
    text = (name + ' ' + description).lower()
    
    opportunity_score = 50  # Base score
    
    # Recurring revenue indicators
    if 'subscription' in text:
        opportunity_score += 20
    if 'franchise' in text:
        opportunity_score += 15
    if 'license' in text:
        opportunity_score += 12
    if 'recurring' in text:
        opportunity_score += 15
    if 'multi-unit' in text or 'multiunit' in text:
        opportunity_score += 18
    if 'scalable' in text:
        opportunity_score += 15
    if 'passive' in text:
        opportunity_score += 10
    
    # Growth indicators
    if 'growing' in text or 'growth' in text:
        opportunity_score += 10
    if 'expanding' in text:
        opportunity_score += 8
    
    # Industry-based opportunity multiplier
    industry_multipliers = SCORING_CONFIG['high_potential_industries']
    multiplier = industry_multipliers.get(industry, 1.0)
    opportunity_score = opportunity_score * multiplier
    
    return max(0, min(100, opportunity_score))

def score_price_efficiency(price: float) -> float:
    """Score based on being in optimal price range for ROI (0-100)"""
    min_target, max_target = SCORING_CONFIG['price_range_target']
    
    if min_target <= price <= max_target:
        # Perfect zone - award based on proximity to midpoint
        midpoint = (min_target + max_target) / 2
        deviation = abs(price - midpoint)
        max_deviation = (max_target - min_target) / 2
        score = 100 - (deviation / max_deviation) * 20
        return score
    elif price < min_target:
        # Below minimum: scores 50 just under the target and rises toward 70 as
        # the price approaches 0, so cheaper listings are not penalized further
        score = 50 + (min_target - price) / min_target * 20
        return score
    else:
        # Above maximum - lower ROI potential
        deviation = price - max_target
        score = max(20, 100 - (deviation / max_target) * 50)
        return score

def calculate_composite_score(row: pd.Series) -> float:
    """Calculate weighted composite score with personal preferences"""
    weights = SCORING_CONFIG['weights']
    
    price_value = score_price_value(row['price'], row['industry'])
    location = score_location(row['address_locality'], row['address_region'])
    stability = score_stability(row['description'])
    opportunity = score_opportunity(row['name'], row['description'], row['industry'])
    price_efficiency = score_price_efficiency(row['price'])
    
    composite = (
        price_value * weights['price_value'] +
        location * weights['location'] +
        stability * weights['stability'] +
        opportunity * weights['opportunity'] +
        price_efficiency * weights['price_efficiency']
    )
    
    # Apply personal business type preference (1.0 = neutral; franchises are classified as
    # Professional Services, which is currently neutral, so no franchise penalty applies)
    preference_multiplier = SCORING_CONFIG['business_type_preferences'].get(row['industry'], 1.0)
    composite = composite * preference_multiplier
    
    return composite

# Apply scoring functions to create score columns
print("Calculating opportunity scores...")
df_filtered['score_price_value'] = df_filtered.apply(lambda row: score_price_value(row['price'], row['industry']), axis=1)
df_filtered['score_location'] = df_filtered.apply(lambda row: score_location(row['address_locality'], row['address_region']), axis=1)
df_filtered['score_stability'] = df_filtered.apply(lambda row: score_stability(row['description']), axis=1)
df_filtered['score_opportunity'] = df_filtered.apply(lambda row: score_opportunity(row['name'], row['description'], row['industry']), axis=1)
df_filtered['score_price_efficiency'] = df_filtered.apply(lambda row: score_price_efficiency(row['price']), axis=1)
df_filtered['opportunity_score'] = df_filtered.apply(lambda row: calculate_composite_score(row), axis=1)

print(f"✓ Scores calculated for {len(df_filtered)} businesses")
print(f"  - Average opportunity score: {df_filtered['opportunity_score'].mean():.1f}")
print(f"  - Median opportunity score: {df_filtered['opportunity_score'].median():.1f}")
print(f"  - Score range: {df_filtered['opportunity_score'].min():.1f} - {df_filtered['opportunity_score'].max():.1f}")


Calculating opportunity scores...
✓ Scores calculated for 56 businesses
  - Average opportunity score: 62.5
  - Median opportunity score: 63.6
  - Score range: 44.5 - 84.2
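
The price-efficiency curve is piecewise, and evaluating it at a few prices shows how it behaves at the boundaries. The sketch below restates the formula from `score_price_efficiency` as a standalone function (the name `price_efficiency` and the default arguments are only for this example) and prints the score at several sample prices.

```python
def price_efficiency(price: float,
                     min_target: float = 100_000,
                     max_target: float = 1_000_000) -> float:
    """Standalone restatement of the piecewise price-efficiency formula."""
    if min_target <= price <= max_target:
        # Inside the target range: 100 at the midpoint, 80 at either edge
        midpoint = (min_target + max_target) / 2
        max_deviation = (max_target - min_target) / 2
        return 100 - abs(price - midpoint) / max_deviation * 20
    if price < min_target:
        # Below the range: 50 just under the edge, rising to 70 at price 0
        return 50 + (min_target - price) / min_target * 20
    # Above the range: falls from 100, floored at 20
    return max(20, 100 - (price - max_target) / max_target * 50)


for price in (50_000, 99_999, 100_000, 550_000, 1_000_000, 2_000_000):
    print(f"${price:>9,}: {price_efficiency(price):.1f}")
```

Two properties stand out: the score jumps from about 50 to 80 when the price crosses the $100,000 lower bound, and below that bound the score increases as the price falls. The score also jumps back up toward 100 just above the $1,000,000 upper bound before declining.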


## Section 6: Sort and Rank Businesses

Sort businesses by opportunity score to identify top acquisition candidates.

In [39]:
# Sort businesses by opportunity score
df_ranked = df_filtered.sort_values('opportunity_score', ascending=False).reset_index(drop=True)
df_ranked['rank'] = range(1, len(df_ranked) + 1)

# Display top 15 opportunities
print("=" * 120)
print("TOP ACQUISITION OPPORTUNITIES")
print("=" * 120)

top_15 = df_ranked.head(15)[['rank', 'name', 'industry', 'price', 'address_locality',
                             'address_region', 'opportunity_score', 'score_stability',
                             'score_location']]

for _, row in top_15.iterrows():
    print(f"\n{int(row['rank'])}. {row['name'][:70]}")
    print(f"   Industry: {row['industry']} | Price: ${row['price']:,.0f}")
    # Read locality and region from the row itself rather than re-indexing df_ranked
    print(f"   Location: {row['address_locality']}, {row['address_region']}")
    print(f"   📊 Opportunity Score: {row['opportunity_score']:.1f}/100")
    print(f"      └─ Stability: {row['score_stability']:.1f} | Location: {row['score_location']:.1f}")

# Summary table
print("\n" + "=" * 120)
print("DETAILED RANKING TABLE (Top 20)")
print("=" * 120)

summary_df = df_ranked.head(20)[['rank', 'name', 'industry', 'price', 'opportunity_score', 
                                   'score_price_value', 'score_stability', 'score_location', 
                                   'score_opportunity', 'score_price_efficiency']].copy()

# Shorten name for display
summary_df['name'] = summary_df['name'].str[:50]

print(summary_df.to_string(index=False))

print(f"\n✓ Total opportunities ranked: {len(df_ranked)}")
print(f"✓ Average opportunity score: {df_ranked['opportunity_score'].mean():.1f}")
print(f"✓ Median opportunity score: {df_ranked['opportunity_score'].median():.1f}")

TOP ACQUISITION OPPORTUNITIES

1. Established StretchLab Franchise With Loyal Clients
   Industry: Professional Services | Price: $100,000
   Location: Wellesley,  MA
   📊 Opportunity Score: 84.2/100
      └─ Stability: 65.0 | Location: 95.0

2. Established Commercial and Residential Cleaning Company
   Industry: Service | Price: $200,000
   Location: Boston,  MA
   📊 Opportunity Score: 73.9/100
      └─ Stability: 65.0 | Location: 95.0

3. Healthy Snack and Beverage Vending Business
   Industry: Service | Price: $128,900
   Location: Worcester,  MA
   📊 Opportunity Score: 73.3/100
      └─ Stability: 75.0 | Location: 40.0

4. Highly Successful Sandwich Deli, Chestnut Hill MA
   Industry: Retail | Price: $160,000
   Location: Middlesex County,  MA
   📊 Opportunity Score: 72.9/100
      └─ Stability: 85.0 | Location: 40.0

5. Expedia Cruises
   Industry: Professional Services | Price: $150,000
   Location: None, Available in Massachusetts
   📊 Opportunity 
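
In the ranked output, locations print with two spaces after the comma (for example `Wellesley,  MA`), and the Worcester and Middlesex County listings receive a location score of 40 rather than the 80 that `region == 'MA'` would give. Both observations are consistent with `address_region` carrying leading whitespace, although that is an inference from the output and has not been checked against the raw data. A minimal sketch of a whitespace-tolerant variant follows. The name `score_location_normalized` is hypothetical, and it also checks county names against the locality, which is a change from the original function.

```python
from typing import Optional

HIGH_VALUE = {'Boston', 'Cambridge', 'Brookline', 'Newton', 'Wellesley',
              'Needham', 'Waltham', 'Arlington', 'Somerville', 'Watertown'}
METRO_AREAS = {'Middlesex County', 'Suffolk County', 'Essex County'}
NEW_ENGLAND = {'CT', 'RI', 'VT', 'NH', 'ME'}


def score_location_normalized(locality: Optional[str], region: Optional[str]) -> float:
    """Location score (0-100) after stripping whitespace and normalizing case."""
    loc = (locality or '').strip()
    reg = (region or '').strip().upper()

    if loc in HIGH_VALUE:
        return 95
    if loc in METRO_AREAS or reg == 'MA':
        return 80
    if reg in NEW_ENGLAND:
        return 60
    return 40


print(score_location_normalized('Worcester', ' MA'))  # 80 instead of 40
```

Adopting a variant like this would change the location scores and therefore the rankings recorded in this notebook, so the outputs would need to be regenerated.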

## Section 7: Visualize Results

Create interactive visualizations to compare business opportunities.
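
The chart cells write HTML files into `./results`, which is assumed to exist already. If it does not, the write fails, so it can be worth creating the directory first. The helper name `ensure_output_dir` below is hypothetical.

```python
from pathlib import Path


def ensure_output_dir(path: str = './results') -> Path:
    """Create the output directory (and any parents) if it does not exist yet."""
    out = Path(path)
    out.mkdir(parents=True, exist_ok=True)
    return out


# Example use before saving a chart (fig1 is defined in the chart cell):
# results_dir = ensure_output_dir()
# fig1.write_html(str(results_dir / 'chart_1_top_opportunities.html'))
```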

In [40]:
# Visualization 1: Top 15 Opportunities Bar Chart
top_n = 15
viz_data = df_ranked.head(top_n).copy()
viz_data['short_name'] = viz_data['name'].str[:40]

fig1 = go.Figure(data=[
    go.Bar(
        x=viz_data['opportunity_score'].values,
        y=viz_data['short_name'].values,
        orientation='h',
        marker=dict(
            color=viz_data['opportunity_score'].values,
            colorscale='Viridis',
            showscale=True,
            colorbar=dict(title="Score")
        ),
        text=viz_data['opportunity_score'].round(1),
        textposition='auto',
        hovertemplate='<b>%{y}</b><br>Score: %{x:.1f}<extra></extra>'
    )
])

fig1.update_layout(
    title=f'Top {top_n} Business Acquisition Opportunities by Score',
    xaxis_title='Opportunity Score',
    yaxis_title='Business Name',
    height=600,
    margin=dict(l=250, r=100),
    template='plotly_white'
)
fig1.write_html('./results/chart_1_top_opportunities.html')
print("✓ Chart 1 displayed (saved to results/chart_1_top_opportunities.html)")

✓ Chart 1 displayed (saved to results/chart_1_top_opportunities.html)


In [41]:
# Visualization 2: Price vs Opportunity Score Scatter Plot
fig2 = px.scatter(
    df_ranked,
    x='price',
    y='opportunity_score',
    color='industry',
    size='score_stability',
    hover_name='name',
    hover_data={'price': '$,.0f', 'opportunity_score': ':.1f'},
    title='Business Price vs Acquisition Opportunity Score',
    labels={'price': 'Price ($)', 'opportunity_score': 'Opportunity Score'},
    height=600
)

fig2.update_layout(
    xaxis_title='Price ($)',
    yaxis_title='Opportunity Score',
    hovermode='closest',
    template='plotly_white'
)
fig2.write_html('./results/chart_2_price_vs_score.html')
print("✓ Chart 2 displayed (saved to results/chart_2_price_vs_score.html)")

✓ Chart 2 displayed (saved to results/chart_2_price_vs_score.html)


In [42]:
# Visualization 3: Score Components Breakdown for Top 5
top_5 = df_ranked.head(5).copy()

fig3 = go.Figure()

score_components = ['score_price_value', 'score_location', 'score_stability', 
                    'score_opportunity', 'score_price_efficiency']
component_labels = ['Price Value', 'Location', 'Stability', 'Opportunity', 'Price Efficiency']

# One trace per business, with its five component scores around the polar axis
for _, row in top_5.iterrows():
    fig3.add_trace(go.Scatterpolar(
        r=[row[component] for component in score_components],
        theta=component_labels,
        fill='toself',
        name=row['name'][:30]
    ))

fig3.update_layout(
    polar=dict(
        radialaxis=dict(
            visible=True,
            range=[0, 100]
        )),
    title='Score Components Breakdown - Top 5 Opportunities',
    height=600,
    showlegend=False,
    template='plotly_white'
)

# Create individual radars for each top 5
for idx, (_, row) in enumerate(top_5.iterrows()):
    fig_temp = go.Figure()
    
    fig_temp.add_trace(go.Scatterpolar(
        r=[row['score_price_value'], row['score_location'], row['score_stability'],
           row['score_opportunity'], row['score_price_efficiency']],
        theta=component_labels,
        fill='toself',
        name=row['name'][:30]
    ))
    
    fig_temp.update_layout(
        polar=dict(
            radialaxis=dict(
                visible=True,
                range=[0, 100]
            )),
        title=f"#{int(row['rank'])} - {row['name'][:50]}<br>Score: {row['opportunity_score']:.1f}",
        height=500,
        template='plotly_white'
    )
    rank_num = int(row['rank'])
    fig_temp.write_html(f'./results/chart_3_radar_rank_{rank_num}.html')

print("✓ Chart 3 (Radar charts) displayed (saved as chart_3_radar_rank_*.html)")

✓ Chart 3 (Radar charts) displayed (saved as chart_3_radar_rank_*.html)


In [43]:
# Visualization 4: Industry Distribution and Average Scores
from plotly.subplots import make_subplots

industry_stats = df_ranked.groupby('industry').agg({
    'opportunity_score': ['mean', 'count'],
    'price': 'mean'
}).round(1)

industry_stats.columns = ['Avg Score', 'Count', 'Avg Price']
industry_stats = industry_stats.sort_values('Avg Score', ascending=False)

fig4 = make_subplots(specs=[[{"secondary_y": True}]])

fig4.add_trace(
    go.Bar(x=industry_stats.index, y=industry_stats['Avg Score'], 
           name='Avg Score', marker_color='rgba(99, 110, 250, 0.7)'),
    secondary_y=False,
)

fig4.add_trace(
    go.Scatter(x=industry_stats.index, y=industry_stats['Count'], 
               name='Count', marker=dict(size=10, color='red')),
    secondary_y=True,
)

fig4.update_xaxes(title_text="Industry")
fig4.update_yaxes(title_text="Average Opportunity Score", secondary_y=False)
fig4.update_yaxes(title_text="Number of Businesses", secondary_y=True)
fig4.update_layout(
    title="Opportunity Scores by Industry",
    height=500,
    hovermode='x unified',
    template='plotly_white'
)
fig4.write_html('./results/chart_4_industry_analysis.html')
print("✓ Chart 4 displayed (saved to results/chart_4_industry_analysis.html)")
print("\nIndustry Analysis:")
print(industry_stats)

✓ Chart 4 displayed (saved to results/chart_4_industry_analysis.html)

Industry Analysis:
                       Avg Score  Count  Avg Price
industry                                          
Service                     73.6      2   164450.0
Professional Services       71.8      5   119380.0
Other                       66.7     14   143207.1
Retail                      65.9      8   147187.5
Technology/SaaS             63.6     11   120187.6
Healthcare                  57.8      6   154498.2
Education                   50.3      1    60000.0
Food Service                48.7      9   165333.3


In [44]:
# Visualization 5: Average Opportunity Score by Location (horizontal bar chart)
location_analysis = df_ranked.groupby('address_locality').agg({
    'opportunity_score': ['mean', 'count'],
    'price': 'mean'
}).round(1)

location_analysis.columns = ['Avg Score', 'Count', 'Avg Price']
location_analysis = location_analysis[location_analysis['Count'] >= 2].sort_values('Avg Score', ascending=False)

fig5 = go.Figure(data=[
    go.Bar(
        y=location_analysis.index,
        x=location_analysis['Avg Score'],
        orientation='h',
        marker=dict(
            color=location_analysis['Avg Score'],
            colorscale='RdYlGn',
            showscale=True,
            colorbar=dict(title="Avg Score")
        ),
        text=location_analysis['Avg Score'].round(1),
        textposition='auto',
        hovertemplate='<b>%{y}</b><br>Avg Score: %{x:.1f}<br>Count: %{customdata}<extra></extra>',
        customdata=location_analysis['Count']
    )
])

fig5.update_layout(
    title='Average Opportunity Score by Location (Min. 2 businesses)',
    xaxis_title='Average Opportunity Score',
    yaxis_title='Location',
    height=500,
    margin=dict(l=150),
    template='plotly_white'
)
fig5.write_html('./results/chart_5_location_analysis.html')
print("✓ Chart 5 displayed (saved to results/chart_5_location_analysis.html)")

✓ Chart 5 displayed (saved to results/chart_5_location_analysis.html)


## Section 8: Summary and Key Insights

Analyze overall results and provide actionable recommendations.
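
The summary cell buckets listings into four price segments using boolean masks. The same boundaries can be expressed as a single lookup, which makes the edge behavior explicit: a price exactly on a boundary falls into the higher segment. This is a sketch with hypothetical names (`SEGMENT_EDGES`, `SEGMENT_LABELS`, `price_segment`), not code from the notebook.

```python
from bisect import bisect_right

# Lower-inclusive boundaries matching the segments in the summary cell
SEGMENT_EDGES = [200_000, 500_000, 1_000_000]
SEGMENT_LABELS = ['Under $200K', '$200K - $500K', '$500K - $1M', 'Over $1M']


def price_segment(price: float) -> str:
    """Return the segment label for a price; boundary values go to the higher segment."""
    return SEGMENT_LABELS[bisect_right(SEGMENT_EDGES, price)]


for price in (100_000, 200_000, 499_999, 1_000_000):
    print(f"${price:,}: {price_segment(price)}")
```

Note that, as in the original masks, a listing priced at exactly $1,000,000 lands in the "Over $1M" segment.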

In [45]:
# Generate comprehensive summary report
print("\n" + "=" * 120)
print("ACQUISITION STRATEGY SUMMARY & RECOMMENDATIONS")
print("=" * 120)

top_1 = df_ranked.iloc[0]
print(f"\n🏆 HIGHEST OPPORTUNITY (Rank #1):")
print(f"   Name: {top_1['name']}")
print(f"   Industry: {top_1['industry']}")
print(f"   Price: ${top_1['price']:,.0f}")
print(f"   Location: {top_1['address_locality']}, {top_1['address_region']}")
print(f"   Overall Score: {top_1['opportunity_score']:.1f}/100")
print(f"   Why: Strong {top_1['industry']} opportunity with excellent location and stability")

# Segment analysis
print(f"\n📊 MARKET SEGMENTATION:")

price_segments = {
    'Under $200K': df_ranked[df_ranked['price'] < 200000],
    '$200K - $500K': df_ranked[(df_ranked['price'] >= 200000) & (df_ranked['price'] < 500000)],
    '$500K - $1M': df_ranked[(df_ranked['price'] >= 500000) & (df_ranked['price'] < 1000000)],
    'Over $1M': df_ranked[df_ranked['price'] >= 1000000]
}

for segment, segment_df in price_segments.items():
    if len(segment_df) > 0:
        print(f"\n   {segment}:")
        print(f"      Count: {len(segment_df)} businesses")
        print(f"      Avg Score: {segment_df['opportunity_score'].mean():.1f}")
        print(f"      Top Opportunity: {segment_df.iloc[0]['name'][:50]} ({segment_df.iloc[0]['opportunity_score']:.1f})")

# Industry recommendations
print(f"\n🎯 INDUSTRY RECOMMENDATIONS:")
industry_ranking = df_ranked.groupby('industry')['opportunity_score'].agg(['mean', 'count']).sort_values('mean', ascending=False)

for idx, (industry, row) in enumerate(industry_ranking.iterrows(), 1):
    if row['count'] > 0:
        print(f"   {idx}. {industry}: Avg Score {row['mean']:.1f} ({int(row['count'])} opportunities)")

# Risk factors
print(f"\n⚠️  KEY DECISION FACTORS:")
print(f"   • Price Range: ${df_ranked['price'].min():,.0f} - ${df_ranked['price'].max():,.0f}")
print(f"   • Average Price: ${df_ranked['price'].mean():,.0f}")
print(f"   • Median Price: ${df_ranked['price'].median():,.0f}")
# idxmax() returns the row index, so look up the locality name for that row
best_location = df_ranked.loc[df_ranked['score_location'].idxmax(), 'address_locality']
print(f"   • Best Location: {best_location} with score {df_ranked['score_location'].max():.1f}")
print(f"   • Most Established: {df_ranked.loc[df_ranked['score_stability'].idxmax(), 'name'][:50]}")

# Filtering recommendations
print(f"\n✅ ACQUISITION CRITERIA RECOMMENDATIONS:")
print(f"   • Target Price Range: $200,000 - $750,000 (optimal ROI window)")
print(f"   • Target Industries: Healthcare, Technology/SaaS, Professional Services")
print(f"   • Target Locations: Boston metro area (higher growth/stability)")
print(f"   • Minimum Stability Score: 70+ (established, proven track record)")
print(f"   • Minimum Opportunity Score: 70+ (strong growth/recurring revenue potential)")

# Create exportable ranking
export_df = df_ranked[[
    'rank', 'name', 'industry', 'price', 'address_locality', 
    'opportunity_score', 'score_stability', 'score_location', 'score_opportunity',
    'url'
]].head(30).copy()

export_df.columns = [
    'Rank', 'Business Name', 'Industry', 'Price', 'Location',
    'Opportunity Score', 'Stability', 'Location Score', 'Growth Score', 'URL'
]

print(f"\nüíæ TOP 30 OPPORTUNITIES (Ready for Export):")
print(export_df.to_string(index=False))


ACQUISITION STRATEGY SUMMARY & RECOMMENDATIONS

🏆 HIGHEST OPPORTUNITY (Rank #1):
   Name: Established StretchLab Franchise With Loyal Clients
   Industry: Professional Services
   Price: $100,000
   Location: Wellesley,  MA
   Overall Score: 84.2/100
   Why: Strong Professional Services opportunity with excellent location and stability

📊 MARKET SEGMENTATION:

   Under $200K:
      Count: 44 businesses
      Avg Score: 62.2
      Top Opportunity: Established StretchLab Franchise With Loyal Client (84.2)

   $200K - $500K:
      Count: 12 businesses
      Avg Score: 63.6
      Top Opportunity: Established Commercial and Residential Cleaning Co (73.9)

🎯 INDUSTRY RECOMMENDATIONS:
   1. Service: Avg Score 73.6 (2 opportunities)
   2. Professional Services: Avg Score 71.8 (5 opportunities)
   3. Other: Avg Score 66.7 (14 opportunities)
   4. Retail: Avg Score 65.9 (8 opportunities)
   5. Technology/SaaS: Avg Score 63.6 (11 opportunities)
   6. Healthcare: Avg Score 57.8 (6 opport
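
The price segmentation in the cell above is built from four hand-written boolean masks. As a hedged alternative, the same buckets can be sketched with `pd.cut`. The `toy` frame below is made-up illustration data standing in for `df_ranked`, and `segment_summary` is a hypothetical name; only the bin edges and labels are taken from the notebook.

```python
import pandas as pd

# Toy stand-in for df_ranked (assumption: the real frame has these columns).
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "price": [100_000, 199_999, 200_000, 750_000, 1_000_000],
    "opportunity_score": [84.2, 60.0, 73.9, 65.0, 55.0],
})

# right=False makes each bin left-closed, matching the >= / < boundaries
# used in price_segments. Prices below 0 would fall outside every bin.
bins = [0, 200_000, 500_000, 1_000_000, float("inf")]
labels = ["Under $200K", "$200K - $500K", "$500K - $1M", "Over $1M"]
toy["segment"] = pd.cut(toy["price"], bins=bins, labels=labels, right=False)

# One row per non-empty segment: listing count, mean score, best score.
segment_summary = (
    toy.groupby("segment", observed=True)["opportunity_score"]
       .agg(count="count", avg_score="mean", top_score="max")
)
print(segment_summary.round(1))
```

Unlike the mask dictionary, this keeps the segment as a column on the frame, so it can be reused for later grouping or plotting.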

In [46]:
# Quick Reference: Top 10 Condensed Table with Clickable URLs
from IPython.display import HTML, display

print("\n" + "=" * 120)
print("QUICK REFERENCE: TOP 10 OPPORTUNITIES (WITH CLICKABLE LINKS)")
print("=" * 120)

# Create HTML table with clickable links
html_content = """
<table style="border-collapse: collapse; width: 100%; font-family: Arial, sans-serif; font-size: 12px; color: #000;">
    <tr style="background-color: #f0f0f0; font-weight: bold; border-bottom: 2px solid #333;">
        <td style="padding: 8px; border: 1px solid #ddd; width: 4%; color: #000;">#</td>
        <td style="padding: 8px; border: 1px solid #ddd; width: 25%; color: #000;">Business</td>
        <td style="padding: 8px; border: 1px solid #ddd; width: 15%; color: #000;">Industry</td>
        <td style="padding: 8px; border: 1px solid #ddd; width: 10%; color: #000;">Price</td>
        <td style="padding: 8px; border: 1px solid #ddd; width: 8%; color: #000;">Score</td>
        <td style="padding: 8px; border: 1px solid #ddd; width: 18%; color: #000;">Location</td>
        <td style="padding: 8px; border: 1px solid #ddd; width: 20%; color: #000;">Link</td>
    </tr>
"""

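from html import escape  # standard library; keeps listing text from breaking the markup

for idx, row in df_ranked.head(10).iterrows():
    rank = int(row['rank'])
    # Escape free-text fields so characters like & or < render literally
    business_name = escape(str(row['name'])[:30])
    industry = escape(str(row['industry']))
    price = f"${row['price']/1000:.0f}K"
    score = f"{row['opportunity_score']:.1f}"
    location = escape(str(row['address_locality']))
    url = escape(str(row['url']), quote=True)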
    
    # Alternate row colors for better readability
    bg_color = "#ffffff" if rank % 2 == 1 else "#f9f9f9"
    
    html_content += f"""
    <tr style="background-color: {bg_color}; border-bottom: 1px solid #ddd;">
        <td style="padding: 8px; border: 1px solid #ddd; text-align: center; font-weight: bold; color: #000;">{rank}</td>
        <td style="padding: 8px; border: 1px solid #ddd; color: #000;">{business_name}</td>
        <td style="padding: 8px; border: 1px solid #ddd; color: #000;">{industry}</td>
        <td style="padding: 8px; border: 1px solid #ddd; text-align: right; color: #000;">{price}</td>
        <td style="padding: 8px; border: 1px solid #ddd; text-align: center; color: #000;">{score}/100</td>
        <td style="padding: 8px; border: 1px solid #ddd; color: #000;">{location}</td>
        <td style="padding: 8px; border: 1px solid #ddd;"><a href="{url}" target="_blank" style="color: #0066cc; text-decoration: none;">🔗 View</a></td>
    </tr>
"""

html_content += """
</table>
"""

display(HTML(html_content))
print("\n" + "=" * 120)


QUICK REFERENCE: TOP 10 OPPORTUNITIES (WITH CLICKABLE LINKS)


| # | Business | Industry | Price | Score | Location | Link |
|---|---|---|---|---|---|---|
| 1 | Established StretchLab Franchi | Professional Services | $100K | 84.2/100 | Wellesley | 🔗 View |
| 2 | Established Commercial and Res | Service | $200K | 73.9/100 | Boston | 🔗 View |
| 3 | Healthy Snack and Beverage Ven | Service | $129K | 73.3/100 | Worcester | 🔗 View |
| 4 | Highly Successful Sandwich Del | Retail | $160K | 72.9/100 | Middlesex County | 🔗 View |
| 5 | Expedia Cruises | Professional Services | $150K | 72.7/100 |  | 🔗 View |
| 6 | High-Demand Short Term Rental | Other | $159K | 71.6/100 | Falmouth | 🔗 View |
| 7 | Mission's Tortilla Route, Sale | Other | $112K | 71.6/100 | Salem | 🔗 View |
| 8 | Convenience Store w Screen Ken | Retail | $165K | 71.0/100 | Middlesex County | 🔗 View |
| 9 | iFOAM | Other | $200K | 70.9/100 |  | 🔗 View |
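
The quick-reference cell assembles its table by concatenating HTML strings. For a plainer table without the custom styling, a minimal sketch using pandas' own `DataFrame.to_html` is shown below. The `toy` rows, names, and `example.com` URLs are invented for illustration and are not taken from the listings.

```python
import pandas as pd

# Toy rows standing in for df_ranked.head(10); all values are made up.
toy = pd.DataFrame({
    "rank": [1, 2],
    "name": ["Tom & Jerry's Deli", "Example Cleaning Co"],
    "opportunity_score": [84.2, 73.9],
    "url": ["https://example.com/listing/1", "https://example.com/listing/2"],
})

# escape=True (the default) HTML-escapes characters such as "&" in cell text;
# render_links=True wraps cells that look like URLs in <a> tags.
table_html = toy.to_html(
    index=False,
    escape=True,
    render_links=True,
    float_format="{:.1f}".format,
)
print(table_html)
```

In a notebook the result can be rendered with `display(HTML(table_html))`. The trade-off is that the link text is the full URL rather than a short "View" label, so the hand-built table remains the better fit when a compact layout matters.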





In [47]:
# Print summary of top opportunities with proper formatting
for idx, row in df_ranked.head(10).iterrows():
    rank = int(row['rank'])
    print(f"{rank:>2}. {row['url']}")


 1. https://www.bizbuysell.com/business-opportunity/established-stretchlab-franchise-with-loyal-clients/2447834/
 2. https://www.bizbuysell.com/business-opportunity/established-commercial-and-residential-cleaning-company/2454604/
 3. https://www.bizbuysell.com/business-opportunity/healthy-snack-and-beverage-vending-business/2449395/
 4. https://www.bizbuysell.com/business-opportunity/highly-successful-sandwich-deli-chestnut-hill-ma/2445904/
 5. https://www.bizbuysell.com/franchise-for-sale/expedia-cruises/
 6. https://www.bizbuysell.com/business-opportunity/high-demand-short-term-rental-property-management/2448131/
 7. https://www.bizbuysell.com/business-opportunity/missions-tortilla-route-salem-ma/2451366/
 8. https://www.bizbuysell.com/business-opportunity/convenience-store-w-screen-keno-on-main-street/2410804/
 9. https://www.bizbuysell.com/franchise-for-sale/ifoam/
10. https://www.bizbuysell.com/business-opportunity/senior-transition-franchise-strong-affluent-massachusetts-territor

In [48]:
# Export results to files
import os

# Create results folder if it doesn't exist
results_folder = './results'
os.makedirs(results_folder, exist_ok=True)

# Prepare comprehensive export dataframe
export_full = df_ranked[[
    'rank', 'name', 'industry', 'price', 'address_locality', 'address_region',
    'opportunity_score', 'score_price_value', 'score_stability', 'score_location', 
    'score_opportunity', 'score_price_efficiency', 'description', 'url'
]].copy()

export_full.columns = [
    'Rank', 'Business Name', 'Industry', 'Price', 'City', 'State',
    'Overall Score', 'Price Value Score', 'Stability Score', 'Location Score',
    'Opportunity Score', 'Price Efficiency Score', 'Description', 'URL'
]

# Export to CSV
csv_path = os.path.join(results_folder, 'business_opportunities_ranked.csv')
export_full.to_csv(csv_path, index=False)
print(f"✓ CSV exported to: {csv_path}")

# Export to JSON
json_path = os.path.join(results_folder, 'business_opportunities_ranked.json')
export_full.to_json(json_path, orient='records', indent=2)
print(f"✓ JSON exported to: {json_path}")

# Export summary statistics
summary_stats = {
    'total_opportunities': len(df_ranked),
    'average_score': float(df_ranked['opportunity_score'].mean()),
    'median_score': float(df_ranked['opportunity_score'].median()),
    'min_price': float(df_ranked['price'].min()),
    'max_price': float(df_ranked['price'].max()),
    'median_price': float(df_ranked['price'].median()),
    'industry_breakdown': df_ranked['industry'].value_counts().to_dict(),
    'top_5': export_full.head(5).to_dict('records')
}

import json as json_lib
stats_path = os.path.join(results_folder, 'analysis_summary.json')
with open(stats_path, 'w') as f:
    json_lib.dump(summary_stats, f, indent=2)
print(f"✓ Summary statistics exported to: {stats_path}")

print(f"\n✓ All results exported to: {results_folder}")


✓ CSV exported to: ./results/business_opportunities_ranked.csv
✓ JSON exported to: ./results/business_opportunities_ranked.json
✓ Summary statistics exported to: ./results/analysis_summary.json

✓ All results exported to: ./results
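
One caveat with the summary export: Python's `json` module writes missing floats as the bare token `NaN` by default, which strict JSON parsers reject. If any exported field can be missing (an assumption; the notebook does not show whether this occurs), a small pre-processing step along these lines may help. `to_strict_json` and `example_stats` are hypothetical names introduced for this sketch.

```python
import json
import math

def to_strict_json(value):
    """Recursively replace NaN/inf floats with None so the result is strict JSON."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {str(k): to_strict_json(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_strict_json(v) for v in value]
    return value

# Hypothetical summary shaped like summary_stats, with missing prices.
example_stats = {
    "total_opportunities": 2,
    "median_price": float("nan"),
    "top_5": [
        {"Rank": 1, "Price": 100000.0},
        {"Rank": 2, "Price": float("nan")},
    ],
}

# allow_nan=False makes json.dumps raise instead of silently emitting NaN.
payload = json.dumps(to_strict_json(example_stats), indent=2, allow_nan=False)
print(payload)
```

Missing values then appear as `null` in the file, which downstream tools can read without special handling.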
