## Setup

Import the required libraries and define the API base URL.

In [1]:
import requests
import json
import pandas as pd
from IPython.display import display
import time

# API Configuration
API_BASE = "https://mssoldir-app-prd.azurewebsites.net/api/Industry"

print("✅ Libraries imported successfully")
print(f"📡 API Base URL: {API_BASE}")

✅ Libraries imported successfully
📡 API Base URL: https://mssoldir-app-prd.azurewebsites.net/api/Industry


## 1. Industry Menu API

The **getMenu** endpoint returns the complete industry hierarchy: each industry, its themes (returned as `subIndustries`), and each theme's solution areas (`solutionAreas`).
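As a rough illustration of that nesting, the sketch below walks a parsed getMenu response and yields one `(industry, theme, solution area)` tuple per solution area. It relies only on keys this notebook already reads (`industryName`, `subIndustries`, `subIndustryName`, `solutionAreas`, `solutionAreaName`) and treats a null list as empty; the function name `iter_solution_areas` and the sample payload are hypothetical.

```python
from typing import Any, Iterator


def iter_solution_areas(menu: list[dict[str, Any]]) -> Iterator[tuple[str, str, str]]:
    """Yield (industry, theme, solution area) for every solution area in a getMenu payload."""
    for industry in menu:
        industry_name = industry.get("industryName", "Unknown")
        # Themes are returned under "subIndustries"; the key may be absent or null.
        for theme in industry.get("subIndustries") or []:
            theme_name = theme.get("subIndustryName", "Unknown")
            # solutionAreas can be null (e.g. a placeholder theme), so fall back to [].
            for area in theme.get("solutionAreas") or []:
                yield industry_name, theme_name, area.get("solutionAreaName", "Unknown")


# Hypothetical payload shaped like the structure printed later in this notebook.
sample_menu = [
    {
        "industryName": "Defense Industrial Base",
        "subIndustries": [
            {
                "subIndustryName": "Sharpen Your Competitive Edge with AI-Infused Cloud Tech",
                "solutionAreas": [
                    {"solutionAreaName": "AI Business Solutions"},
                    {"solutionAreaName": "Security"},
                ],
            }
        ],
    },
    {
        "industryName": "Education",
        "subIndustries": [{"subIndustryName": "testing industry", "solutionAreas": None}],
    },
]

rows = list(iter_solution_areas(sample_menu))
```

The theme with a null `solutionAreas` simply contributes no rows instead of raising a `TypeError`.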

In [2]:
# Fetch the industry menu
menu_url = f"{API_BASE}/getMenu"
print(f"📥 Fetching industry menu from: {menu_url}")

response = requests.get(menu_url, timeout=30)
response.raise_for_status()
menu_data = response.json()

print(f"✅ Successfully fetched menu data")
print(f"📊 Number of industries: {len(menu_data)}")

📥 Fetching industry menu from: https://mssoldir-app-prd.azurewebsites.net/api/Industry/getMenu
✅ Successfully fetched menu data
📊 Number of industries: 10


### Display Industry Overview

In [3]:
# Create a DataFrame of industries and their themes
industry_overview = []

for industry in menu_data:
    industry_name = industry.get("industryName", "Unknown")
    sub_industries = industry.get("subIndustries", [])
    
    for sub in sub_industries:
        solution_areas = sub.get("solutionAreas") or []
        industry_overview.append({
            "Industry": industry_name,
            "Theme": sub.get("subIndustryName", "Unknown"),
            "Theme Slug": sub.get("industryThemeSlug", "N/A"),
            "Solution Areas": len(solution_areas)
        })

df_industries = pd.DataFrame(industry_overview)
print(f"\n📋 Total Industries: {df_industries['Industry'].nunique()}")
print(f"📋 Total Themes: {len(df_industries)}")
print(f"\n🔍 First 40 themes:")
display(df_industries.head(40))


📋 Total Industries: 10
📋 Total Themes: 40

🔍 First 40 themes:


| | Industry | Theme | Theme Slug | Solution Areas |
|---|---|---|---|---|
| 0 | Defense Industrial Base | Sharpen Your Competitive Edge with AI-Infused ... | sharpen-your-competitive-edge-with-cloud-based... | 3 |
| 1 | Education | Institutional Innovation | improve-operational-efficiencies-for-modernize... | 3 |
| 2 | Education | Simplify and Secure IT | simplified-and-secure-it-to-deliver-safe-and-s... | 2 |
| 3 | Education | Student Success | microsoft-and-its-top-education-partners-help1... | 3 |
| 4 | Education | testing industry | | 0 |
| 5 | Energy & Resources | Advance Your Net-Zero Journey | reach-net-zero-commitments-through-emissions-r... | 2 |
| 6 | Energy & Resources | Empower an AI-First Energy Workforce | automate-workflows-and-improve-ai-enabled-recr... | 2 |
| 7 | Energy & Resources | Grow Sustainable, AI-Powered Businesses | unlock-data-insights-that-accelerate-climate-i... | 2 |
| 8 | Energy & Resources | Operate for a Secure and Efficient Energy Future | energy-companies-transforming-operations-to-po... | 2 |
| 9 | Financial Services | Empowering Employees and Agents | microsoft-is-empowering-employees-in-the-finan... | 3 |


### Industry Summary Statistics

In [4]:
# Count themes per industry
themes_per_industry = df_industries.groupby('Industry').size().sort_values(ascending=False)

print("📊 Themes per Industry:")
for industry, count in themes_per_industry.items():
    print(f"  • {industry}: {count} theme{'s' if count != 1 else ''}")

📊 Themes per Industry:
  • Financial Services: 5 themes
  • Healthcare & Life Sciences: 5 themes
  • Manufacturing & Mobility: 5 themes
  • State & Local Government: 5 themes
  • Education: 4 themes
  • Energy & Resources: 4 themes
  • Media & Entertainment: 4 themes
  • Retail & Consumer Goods: 4 themes
  • Telecommunications: 3 themes
  • Defense Industrial Base: 1 theme
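The same tally can be computed without pandas. This sketch applies the standard library's `collections.Counter` to the raw getMenu list and pluralizes the noun so that a single theme reads "1 theme". The helper names and the sample data are hypothetical.

```python
from collections import Counter


def themes_per_industry(menu):
    """Count themes (subIndustries) per industry, most themes first."""
    counts = Counter()
    for industry in menu:
        # A null or missing subIndustries list counts as zero themes.
        counts[industry.get("industryName", "Unknown")] += len(industry.get("subIndustries") or [])
    return counts.most_common()


def format_count(n, noun):
    # Pluralize so a single theme reads "1 theme", not "1 themes".
    return f"{n} {noun}" if n == 1 else f"{n} {noun}s"


sample_menu = [
    {"industryName": "Education", "subIndustries": [{}, {}, {}, {}]},
    {"industryName": "Defense Industrial Base", "subIndustries": [{}]},
]

lines = [f"{name}: {format_count(n, 'theme')}" for name, n in themes_per_industry(sample_menu)]
```

Unlike the `groupby` on `df_industries`, which only sees industries that contribute at least one theme row, this version also lists an industry whose `subIndustries` list is empty, with a count of 0.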


### Inspect a Sample Industry Structure

In [5]:
# Display the first industry in detail (formatted JSON)
sample_industry = menu_data[0]

print(f"📄 Sample Industry Structure ({sample_industry.get('industryName', 'Unknown')}):")
print("\nIndustry Level:")
print(json.dumps({
    "industryId": sample_industry.get("industryId"),
    "industryName": sample_industry.get("industryName"),
    "industrySlug": sample_industry.get("industrySlug"),
    "hasMultipleTheme": sample_industry.get("hasMultipleThme"),  # the API returns this key as "hasMultipleThme"
    "hasSubMenu": sample_industry.get("hasSubMenu")
}, indent=2))

print("\nFirst Theme/Sub-Industry:")
if sample_industry.get("subIndustries"):
    first_theme = sample_industry["subIndustries"][0]
    print(json.dumps({
        "subIndustryName": first_theme.get("subIndustryName"),
        "subIndustrySlug": first_theme.get("subIndustrySlug"),
        "industryThemeSlug": first_theme.get("industryThemeSlug"),
        # solutionAreas can be null, so fall back to an empty list
        "solutionAreas": [area.get("solutionAreaName") for area in (first_theme.get("solutionAreas") or [])]
    }, indent=2))

📄 Sample Industry Structure (Defense Industrial Base):

Industry Level:
{
  "industryId": "1a3b63eb-c5ae-4006-96fd-30401132676f",
  "industryName": "Defense Industrial Base",
  "industrySlug": "defense-industrial-base",
  "hasMultipleTheme": false,
  "hasSubMenu": true
}

First Theme/Sub-Industry:
{
  "subIndustryName": "Sharpen Your Competitive Edge with AI-Infused Cloud Tech",
  "subIndustrySlug": "sharpen-your-competitive-edge-with-ai-infused-cloud-tech-483",
  "industryThemeSlug": "sharpen-your-competitive-edge-with-cloud-based-digital-first-offerings-infused-with-data-and-ai",
  "solutionAreas": [
    "AI Business Solutions",
    "Cloud and AI Platforms",
    "Security"
  ]
}


## 2. Theme Details API

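The **GetThemeDetalsByViewId** endpoint (the misspelling "Detals" is part of the endpoint name) takes a theme's `industryThemeSlug` as the `slug` query parameter and returns that theme's details: its solution areas (`themeSolutionAreas`), each listing its `partnerSolutions`, plus a separate list of spotlight solutions (`spotLightPartnerSolutions`).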
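For reference, the request made below with `requests.get(..., params={"slug": slug})` should be equivalent to the URL built here with `urllib.parse.urlencode`; the sketch also builds the public solution-page URL the notebook uses later. `theme_details_url`, `solution_page_url`, and `SOLUTION_PAGE_BASE` are illustrative names, not part of any published client.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://mssoldir-app-prd.azurewebsites.net/api/Industry"
# Base of the solution-detail page URLs this notebook builds from partnerSolutionSlug.
SOLUTION_PAGE_BASE = "https://solutions.microsoftindustryinsights.com/solutiondetails/"


def theme_details_url(slug: str) -> str:
    # Query string encoded the same way requests encodes a params dict.
    return f"{API_BASE}/GetThemeDetalsByViewId?{urlencode({'slug': slug})}"


def solution_page_url(solution_slug: str) -> str:
    # Percent-encode defensively; the slugs seen in this notebook are already URL-safe.
    return SOLUTION_PAGE_BASE + quote(solution_slug, safe="")


url = theme_details_url("improve-operational-efficiencies-for-modernized-school-experiences-850")
```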

In [6]:
# Pick a theme to explore - find one that has solutions
sample_theme_slug = None
sample_theme_name = None
sample_industry_name = None

print("🔍 Looking for a theme with solutions...")

for industry in menu_data:
    for sub in industry.get("subIndustries") or []:
        slug = sub.get("industryThemeSlug")
        if not slug:
            continue  # skip placeholder themes without a slug
        try:
            test_response = requests.get(
                f"{API_BASE}/GetThemeDetalsByViewId",
                params={"slug": slug},
                timeout=30
            )
            test_response.raise_for_status()
            test_data = test_response.json()
        except (requests.RequestException, ValueError):
            continue  # network/HTTP error or invalid JSON: try the next theme

        # A theme qualifies if any solution area, or the spotlight list, has solutions
        has_solutions = any(
            area.get("partnerSolutions") for area in test_data.get("themeSolutionAreas") or []
        ) or bool(test_data.get("spotLightPartnerSolutions"))

        if has_solutions:
            sample_theme_slug = slug
            sample_theme_name = sub.get("subIndustryName", "Unknown")
            sample_industry_name = industry.get("industryName", "Unknown")
            theme_data = test_data  # reuse the data we already fetched
            break

    if sample_theme_slug:
        break

if not sample_theme_slug:
    raise RuntimeError("Could not find a theme with solutions")

print(f"✅ Found theme with solutions: {sample_industry_name} > {sample_theme_name}")
print(f"   Slug: {sample_theme_slug}")
print(f"\n📊 Theme Overview:")
print(f"  • Theme Title: {theme_data.get('industryThemeTitle', 'N/A')}")
print(f"  • Solution Areas: {len(theme_data.get('themeSolutionAreas') or [])}")
print(f"  • Spotlight Solutions: {len(theme_data.get('spotLightPartnerSolutions') or [])}")

🔍 Looking for a theme with solutions...
✅ Found theme with solutions: Education > Institutional Innovation
   Slug: improve-operational-efficiencies-for-modernized-school-experiences-850

📊 Theme Overview:
  • Theme Title: N/A
  • Solution Areas: 3
  • Spotlight Solutions: 3
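The check inside the search loop above can be factored into a small predicate that tolerates null lists. This is a sketch assuming the `themeSolutionAreas`, `partnerSolutions`, and `spotLightPartnerSolutions` keys behave as they do in this notebook; `has_partner_solutions` and the sample payloads are hypothetical.

```python
from typing import Any


def has_partner_solutions(theme: dict[str, Any]) -> bool:
    """True if any solution area, or the spotlight list, contains at least one solution."""
    areas = theme.get("themeSolutionAreas") or []
    # An empty or null partnerSolutions list is falsy, so any() skips it.
    if any(area.get("partnerSolutions") for area in areas):
        return True
    return bool(theme.get("spotLightPartnerSolutions"))


empty_theme = {"themeSolutionAreas": [{"partnerSolutions": []}], "spotLightPartnerSolutions": None}
spotlight_only = {"themeSolutionAreas": [], "spotLightPartnerSolutions": [{"solutionName": "X"}]}
```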


### Extract Solutions from Theme

In [7]:
# Extract all solutions from this theme
solutions = []

# Regular solutions from solution areas
for area in theme_data.get("themeSolutionAreas") or []:  # lists can be null, so fall back to []
    area_name = area.get("solutionAreaName", "Unknown")
    
    for sol in area.get("partnerSolutions") or []:
        title = sol.get("solutionName") or ""
        
        # Extract partner from title (format: "Solution by Partner"); other titles get "Unknown"
        partner = title.split(" by ")[-1] if " by " in title else "Unknown"
        
        solutions.append({
            "Solution ID": sol.get("partnerSolutionId"),
            "Title": title,
            "Partner": partner,
            "Solution Area": area_name,
            "Type": "Regular",
            "Description Length": len(sol.get("solutionDescription") or ""),
            "URL": f"https://solutions.microsoftindustryinsights.com/solutiondetails/{sol.get('partnerSolutionSlug', '')}"
        })

# Spotlight solutions
for sol in theme_data.get("spotLightPartnerSolutions") or []:
    title = sol.get("solutionName") or ""
    partner = title.split(" by ")[-1] if " by " in title else "Unknown"
    
    solutions.append({
        "Solution ID": sol.get("partnerSolutionId"),
        "Title": title,
        "Partner": partner,
        "Solution Area": "Spotlight",
        "Type": "Spotlight",
        "Description Length": len(sol.get("solutionDescription") or ""),
        "URL": f"https://solutions.microsoftindustryinsights.com/solutiondetails/{sol.get('partnerSolutionSlug', '')}"
    })

df_solutions = pd.DataFrame(solutions)
print(f"\n✅ Extracted {len(df_solutions)} solutions from this theme")

if len(df_solutions) > 0:
    print(f"\n🔍 Sample Solutions:")
    display(df_solutions[['Title', 'Partner', 'Solution Area', 'Type']].head(10))
else:
    print("\n⚠️  No solutions found in this theme")


✅ Extracted 54 solutions from this theme

🔍 Sample Solutions:

| | Title | Partner | Solution Area | Type |
|---|---|---|---|---|
| 0 | greymatter Student Lifecycle CRM by Frequency ... | Frequency Foundry | AI Business Solutions | Regular |
| 1 | CDW | Unknown | AI Business Solutions | Regular |
| 2 | Blackbaud Enterprise Fundraising CRM | Unknown | AI Business Solutions | Regular |
| 3 | Higher Education Blueprint | Unknown | AI Business Solutions | Regular |
| 4 | PwC Digital Relationship Management Solution | Unknown | AI Business Solutions | Regular |
| 5 | NICE-20240419022414 | Unknown | AI Business Solutions | Regular |
| 6 | Nerdio Manager for Enterprise | Unknown | AI Business Solutions | Regular |
| 7 | PwC Total Professional Effort (TPE) | Unknown | AI Business Solutions | Regular |
| 8 | Terawe Corporation ManageX | Unknown | AI Business Solutions | Regular |
| 9 | AvePoint Inc | Unknown | AI Business Solutions | Regular |
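The extraction cell above reads regular and spotlight solutions in two separate loops. The sketch below merges them into one null-safe function and keeps only the first occurrence of each `partnerSolutionId`. Whether a spotlight solution can repeat a regular one is an assumption not verified here, and `collect_solutions` and the sample payload are hypothetical.

```python
from typing import Any


def collect_solutions(theme: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten regular and spotlight solutions from a GetThemeDetalsByViewId payload."""
    rows: list[dict[str, Any]] = []
    seen: set = set()

    def add(sol: dict[str, Any], area_name: str, kind: str) -> None:
        sol_id = sol.get("partnerSolutionId")
        # Assumption: the same partnerSolutionId may appear twice; keep the first copy.
        if sol_id in seen:
            return
        if sol_id is not None:
            seen.add(sol_id)
        rows.append({
            "Solution ID": sol_id,
            "Title": sol.get("solutionName") or "",
            "Solution Area": area_name,
            "Type": kind,
            # Length of the raw description (HTML included); null counts as 0.
            "Description Length": len(sol.get("solutionDescription") or ""),
        })

    for area in theme.get("themeSolutionAreas") or []:
        for sol in area.get("partnerSolutions") or []:
            add(sol, area.get("solutionAreaName", "Unknown"), "Regular")
    for sol in theme.get("spotLightPartnerSolutions") or []:
        add(sol, "Spotlight", "Spotlight")
    return rows


sample_theme = {
    "themeSolutionAreas": [{
        "solutionAreaName": "Security",
        "partnerSolutions": [
            {"partnerSolutionId": "a", "solutionName": "A", "solutionDescription": "<p>x</p>"},
            {"partnerSolutionId": "b", "solutionName": "B", "solutionDescription": None},
        ],
    }],
    "spotLightPartnerSolutions": [
        {"partnerSolutionId": "a", "solutionName": "A"},
        {"partnerSolutionId": "c", "solutionName": "C"},
    ],
}

rows = collect_solutions(sample_theme)
```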


### Inspect a Sample Solution

In [8]:
# Get the first solution for detailed inspection
if theme_data.get("themeSolutionAreas"):
    first_area = theme_data["themeSolutionAreas"][0]
    if first_area.get("partnerSolutions"):
        first_solution = first_area["partnerSolutions"][0]

        print("📄 Sample Solution Structure:")
        print("\nKey Fields:")
        print(json.dumps({
            "partnerSolutionId": first_solution.get("partnerSolutionId"),
            "solutionName": first_solution.get("solutionName"),
            "partnerSolutionSlug": first_solution.get("partnerSolutionSlug"),
            "orgName": first_solution.get("orgName"),  # often null; in the output below it repeats solutionName
            "publisherName": first_solution.get("publisherName"),  # often null
            "descriptionLength": len(first_solution.get("solutionDescription") or ""),
        }, indent=2))

        print("\n📝 Solution Description (first 500 chars):")
        description = first_solution.get("solutionDescription") or ""
        print(description[:500] + "..." if len(description) > 500 else description)
    else:
        print("⚠️  No partner solutions found in the first solution area")
else:
    print("⚠️  No solution areas found in this theme")

📄 Sample Solution Structure:

Key Fields:
{
  "partnerSolutionId": "bd8cdf5a-42b2-4396-920f-514c9e040430",
  "solutionName": "greymatter Student Lifecycle CRM by Frequency Foundry",
  "partnerSolutionSlug": "greymatter-student-lifecycle-crm-by-frequency-foundry",
  "orgName": "greymatter Student Lifecycle CRM by Frequency Foundry",
  "publisherName": null,
  "descriptionLength": 3978
}

📝 Solution Description (first 500 chars):
<p>Frequency Foundry&rsquo;s greymatter Student Lifecycle CRM is a Customer Relationship Management solution built from the ground up for higher education that includes robust and flexible, recruiting and admissions features, advising, retention, and early alert functionality, as well as coverage for nearly every aspect of the student lifecycle that has an impact on the student experience. The Foundry understands that every higher education institution is unique in its student experience challen...
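Because descriptions arrive as HTML with entities such as `&rsquo;`, the `descriptionLength` value counts markup as well as text. A minimal sketch built on the standard library's `html.parser` can recover plain text for reading or indexing. Which tags count as block boundaries is a judgment call, and `html_to_text` is a hypothetical helper.

```python
from html.parser import HTMLParser
from typing import Optional

# Tags treated as block boundaries: a space is emitted so adjacent blocks don't run together.
BLOCK_TAGS = {"p", "br", "div", "li", "ul", "ol", "h1", "h2", "h3", "h4", "h5", "h6", "tr", "td"}


class _TextExtractor(HTMLParser):
    def __init__(self) -> None:
        # convert_charrefs=True decodes entities such as &rsquo; before handle_data sees the text.
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment: Optional[str]) -> str:
    """Strip tags and decode entities, collapsing runs of whitespace to single spaces."""
    parser = _TextExtractor()
    parser.feed(fragment or "")
    parser.close()  # flush any trailing text still buffered by the parser
    return " ".join("".join(parser.parts).split())


text = html_to_text("<p>Frequency Foundry&rsquo;s greymatter Student Lifecycle CRM</p>")
```

This sketch does not skip `<script>` or `<style>` contents, which is acceptable only if descriptions never contain them.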


## 3. Complete Data Extraction (All Solutions)

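Next, fetch solutions from several themes for a broader view. To keep execution time reasonable, the cell below stops after the first three themes that have a slug and load successfully, and it collects only regular (non-spotlight) solutions.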

In [9]:
# Fetch solutions from the first 3 themes that have a slug (to keep execution time reasonable)
all_solutions = []
themes_to_fetch = 3

print(f"📥 Fetching solutions from {themes_to_fetch} themes...\n")

count = 0
for industry in menu_data:
    industry_name = industry.get("industryName")
    
    for theme in industry.get("subIndustries") or []:
        if count >= themes_to_fetch:
            break
            
        theme_name = theme.get("subIndustryName")
        theme_slug = theme.get("industryThemeSlug")
        
        if not theme_slug:
            continue
        
        print(f"  Fetching: {industry_name} > {theme_name}")
        
        try:
            response = requests.get(
                f"{API_BASE}/GetThemeDetalsByViewId",
                params={"slug": theme_slug},
                timeout=30
            )
            response.raise_for_status()
            details = response.json()  # separate name so section 2's theme_data is not overwritten
        except (requests.RequestException, ValueError) as e:
            print(f"    ⚠️  Error: {e}")
            continue
        
        # Extract regular solutions (spotlight solutions are not included here)
        for area in details.get("themeSolutionAreas") or []:
            for sol in area.get("partnerSolutions") or []:
                title = sol.get("solutionName") or ""
                partner = title.split(" by ")[-1] if " by " in title else "Unknown"
                
                all_solutions.append({
                    "Industry": industry_name,
                    "Theme": theme_name,
                    "Title": title,
                    "Partner": partner,
                    "Description Length": len(sol.get("solutionDescription") or "")
                })
        
        count += 1
        time.sleep(0.5)  # Be polite to the API
    
    if count >= themes_to_fetch:
        break

df_all = pd.DataFrame(all_solutions)
print(f"\n✅ Total solutions fetched: {len(df_all)}")
print(f"📊 Unique partners: {df_all['Partner'].nunique()}")

📥 Fetching solutions from 3 themes...

  Fetching: Defense Industrial Base > Sharpen Your Competitive Edge with AI-Infused Cloud Tech
  Fetching: Education > Institutional Innovation
  Fetching: Education > Simplify and Secure IT

✅ Total solutions fetched: 62
📊 Unique partners: 2


## 4. Partner Analysis

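Analyze the distribution of solutions across partners. Note that "Unknown" is a placeholder for titles that do not contain " by ", not an actual partner, so the partner counts below include it.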

In [10]:
# Count solutions per partner
partner_counts = df_all['Partner'].value_counts().head(10)

print("🏆 Top 10 Partners by Solution Count (from sample):")
for i, (partner, count) in enumerate(partner_counts.items(), 1):
    print(f"  {i}. {partner}: {count} solutions")

# Create a visualization-ready DataFrame
df_partner_summary = pd.DataFrame({
    'Partner': partner_counts.index,
    'Solution Count': partner_counts.values
})

display(df_partner_summary)

🏆 Top 10 Partners by Solution Count (from sample):
  1. Unknown: 60 solutions
  2. Frequency Foundry: 2 solutions


| | Partner | Solution Count |
|---|---|---|
| 0 | Unknown | 60 |
| 1 | Frequency Foundry | 2 |
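Since "Unknown" dominates the counts, it can help to return no partner at all when a title does not follow the "Solution by Partner" pattern, and to report how many titles matched. This sketch does that with `collections.Counter`; `extract_partner` and `partner_summary` are hypothetical helpers, and the sample titles are taken from the output above.

```python
from collections import Counter
from typing import Iterable, Optional


def extract_partner(title: Optional[str]) -> Optional[str]:
    """Return the text after the last ' by ' in a title, or None if the pattern is absent."""
    if not title or " by " not in title:
        return None
    partner = title.rsplit(" by ", 1)[1].strip()
    return partner or None


def partner_summary(titles: Iterable[str], top: int = 10) -> dict:
    partners = [extract_partner(t) for t in titles]
    matched = [p for p in partners if p is not None]
    return {
        "titles": len(partners),
        "matched": len(matched),
        # Unmatched titles are excluded rather than counted as a partner named "Unknown".
        "top_partners": Counter(matched).most_common(top),
    }


titles = [
    "greymatter Student Lifecycle CRM by Frequency Foundry",
    "CDW",
    "Blackbaud Enterprise Fundraising CRM",
]
summary = partner_summary(titles)
```

The `matched / titles` ratio makes the coverage of the title heuristic visible instead of hiding it inside the partner ranking.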


## 5. Industry Distribution

In [11]:
# Solutions per industry (from sample)
industry_counts = df_all['Industry'].value_counts()

print("📊 Solutions per Industry (from sample):")
for industry, count in industry_counts.items():
    print(f"  • {industry}: {count} solutions")

📊 Solutions per Industry (from sample):
  • Education: 62 solutions


## 6. Content Analysis

In [12]:
# Analyze description lengths
print("📝 Description Length Statistics:")
print(f"  • Average: {df_all['Description Length'].mean():.0f} characters")
print(f"  • Median: {df_all['Description Length'].median():.0f} characters")
print(f"  • Min: {df_all['Description Length'].min()} characters")
print(f"  • Max: {df_all['Description Length'].max()} characters")

# Distribution (right-closed bins; the open-ended last bin catches anything over 5,000)
print("\n📊 Description Length Distribution:")
bins = [0, 1000, 2000, 3000, 5000, float("inf")]
labels = ['<1K', '1K-2K', '2K-3K', '3K-5K', '>5K']
df_all['Length Category'] = pd.cut(df_all['Description Length'], bins=bins, labels=labels, include_lowest=True)

for category in labels:
    count = (df_all['Length Category'] == category).sum()
    print(f"  • {category}: {count} solutions")

📝 Description Length Statistics:
  • Average: 1723 characters
  • Median: 1099 characters
  • Min: 324 characters
  • Max: 5655 characters

📊 Description Length Distribution:
  • <1K: 29 solutions
  • 1K-2K: 17 solutions
  • 2K-3K: 4 solutions
  • 3K-5K: 9 solutions
  • >5K: 3 solutions
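The same statistics and buckets can be reproduced with the standard library, which makes the bin boundaries explicit: each edge belongs to the lower bucket, as with `pd.cut` and its default right-closed intervals. The input values below are figures reported above, used purely as sample input; `length_bucket` and `describe_lengths` are hypothetical helpers.

```python
from bisect import bisect_left
from statistics import mean, median

# Inclusive upper edges of the first four buckets; anything above 5000 falls in ">5K".
EDGES = [1000, 2000, 3000, 5000]
LABELS = ["<1K", "1K-2K", "2K-3K", "3K-5K", ">5K"]


def length_bucket(n: int) -> str:
    # bisect_left places a value equal to an edge in the lower bucket (right-closed bins).
    return LABELS[bisect_left(EDGES, n)]


def describe_lengths(lengths: list) -> dict:
    counts = {label: 0 for label in LABELS}
    for n in lengths:
        counts[length_bucket(n)] += 1
    return {"mean": mean(lengths), "median": median(lengths),
            "min": min(lengths), "max": max(lengths), "buckets": counts}


stats = describe_lengths([324, 1099, 3978, 5655])
```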


## 7. Sample Solution Details

Display the first five solution records.

In [13]:
print("📋 Sample Solution Records:\n")

for i, row in df_all.head(5).iterrows():
    print(f"{'='*70}")
    print(f"Solution #{i+1}")
    print(f"{'='*70}")
    print(f"Industry: {row['Industry']}")
    print(f"Theme: {row['Theme']}")
    print(f"Title: {row['Title']}")
    print(f"Partner: {row['Partner']}")
    print(f"Description Length: {row['Description Length']} chars")
    print()

📋 Sample Solution Records:

Solution #1
Industry: Education
Theme: Institutional Innovation
Title: greymatter Student Lifecycle CRM by Frequency Foundry
Partner: Frequency Foundry
Description Length: 3978 chars

Solution #2
Industry: Education
Theme: Institutional Innovation
Title: CDW
Partner: Unknown
Description Length: 519 chars

Solution #3
Industry: Education
Theme: Institutional Innovation
Title: Blackbaud Enterprise Fundraising CRM
Partner: Unknown
Description Length: 561 chars

Solution #4
Industry: Education
Theme: Institutional Innovation
Title: Higher Education Blueprint
Partner: Unknown
Description Length: 1995 chars

Solution #5
Industry: Education
Theme: Institutional Innovation
Title: PwC Digital Relationship Management Solution
Partner: Unknown
Description Length: 1111 chars



## 8. Data Export

Export the collected data to CSV for further analysis.

In [14]:
# Export to CSV
output_file = "isd_sample_solutions.csv"
df_all.to_csv(output_file, index=False)

print(f"💾 Exported {len(df_all)} solutions to: {output_file}")
print(f"\n📊 Summary:")
print(f"  • Total Solutions: {len(df_all)}")
print(f"  • Unique Industries: {df_all['Industry'].nunique()}")
print(f"  • Unique Themes: {df_all['Theme'].nunique()}")
print(f"  • Unique Partners: {df_all['Partner'].nunique()}")

💾 Exported 62 solutions to: isd_sample_solutions.csv

📊 Summary:
  • Total Solutions: 62
  • Unique Industries: 1
  • Unique Themes: 2
  • Unique Partners: 2
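Where pandas is not available (for example, in a lightweight downstream script), the same export can be written with the standard library's `csv` module. This is a sketch; `export_rows` is a hypothetical helper, the output goes to a temporary directory, and the single sample row mirrors the first record shown above.

```python
import csv
import tempfile
from pathlib import Path


def export_rows(rows: list, path: Path) -> int:
    """Write dict rows to CSV with a header taken from the first row; returns rows written."""
    if not rows:
        path.write_text("", encoding="utf-8")
        return 0
    # newline="" lets the csv module control line endings, as its documentation recommends.
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


rows = [{"Industry": "Education", "Theme": "Institutional Innovation",
         "Title": "greymatter Student Lifecycle CRM by Frequency Foundry",
         "Partner": "Frequency Foundry", "Description Length": 3978}]
out = Path(tempfile.mkdtemp()) / "isd_sample_solutions.csv"
written = export_rows(rows, out)
```

Like `DataFrame.to_csv(index=False)`, this writes a header row followed by one line per record; values come back as strings when read with `csv.DictReader`.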


## Summary

This notebook demonstrated:

1. ✅ **Industry Menu API** - Retrieved complete industry hierarchy
2. ✅ **Theme Details API** - Fetched solutions for specific themes
3. ✅ **Data Parsing** - Extracted and structured solution information
4. ✅ **Partner Extraction** - Parsed partner names from solution titles that follow the "Solution by Partner" pattern
5. ✅ **Analysis** - Analyzed partner and industry distributions
6. ✅ **Content Inspection** - Examined description characteristics

### Key Findings

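- The ISD API uses a **two-phase approach**: Menu → Theme Details
- Partner names appear in solution titles only when a title follows the "Solution by Partner" pattern; in this sample just 2 of 62 titles did, and the inspected solution's `orgName` repeated its title while `publisherName` was null
- Solution descriptions are HTML formatted (with entities such as `&rsquo;`) and vary widely in length: 324 to 5,655 characters in this sample, with a median of 1,099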
- Each solution has a unique ID and URL slug for direct access
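The two-phase flow can be sketched as a generator that takes the two fetch steps as functions, so it can be exercised without network access. The callables, the `pause` parameter, and the name `crawl` are assumptions for illustration; in practice `fetch_menu` would call getMenu and `fetch_theme` would call GetThemeDetalsByViewId with the theme's slug.

```python
import time
from typing import Any, Callable, Iterator


def crawl(fetch_menu: Callable[[], list],
          fetch_theme: Callable[[str], dict],
          pause: float = 0.5,
          sleep: Callable[[float], None] = time.sleep) -> Iterator[tuple[str, str, dict]]:
    """Phase 1: read the menu. Phase 2: fetch details for every theme that has a slug."""
    for industry in fetch_menu():
        for theme in industry.get("subIndustries") or []:
            slug = theme.get("industryThemeSlug")
            if not slug:  # e.g. placeholder themes with an empty slug
                continue
            yield industry.get("industryName", "Unknown"), slug, fetch_theme(slug)
            sleep(pause)  # be polite between detail requests


# Demo with in-memory stand-ins for the two endpoints.
menu = [
    {"industryName": "Education", "subIndustries": [{"industryThemeSlug": "a"}, {"industryThemeSlug": ""}]},
    {"industryName": "Defense Industrial Base", "subIndustries": [{"industryThemeSlug": "b"}]},
]
fetched = []


def fake_theme(slug: str) -> dict[str, Any]:
    fetched.append(slug)
    return {"slug": slug}


results = list(crawl(lambda: menu, fake_theme, sleep=lambda s: None))
```

Because it is a generator, a caller can stop early (as the notebook does after three themes) without fetching the remaining themes.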

### Next Steps

- Run `integrated-vectorization/` scripts to index all solutions
- Use `update-monitor/` scripts to detect changes
- Query indexed data via the FastAPI backend

---

**Documentation:** See [ISD_WEBSITE_STRUCTURE.md](ISD_WEBSITE_STRUCTURE.md) for detailed API documentation.