### 1. Setup and Configuration

##### 1.1 Import Libraries
 Import necessary Python libraries.

In [1]:
import os
import pandas as pd
import google.generativeai as genai
from google.api_core import exceptions as google_exceptions
from dotenv import load_dotenv
import asyncio
import json
import logging
import time
from typing import Dict, List, Tuple, Optional, Any

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

##### 1.2 Load API Key and Configure Gemini
Load the Gemini API key from a `.env` file located in the same directory as this notebook.

In [2]:
load_dotenv()
API_KEY = os.getenv("GEMINI_API_KEY")

if not API_KEY:
    logging.error("CRITICAL: GEMINI_API_KEY not found in .env file. Please create a .env file with your API key.")
else:
    try:
        genai.configure(api_key=API_KEY)
        logging.info("Gemini API Key loaded and configured.")
    except Exception as e:
        logging.error(f"Error configuring Gemini API: {e}")

2025-03-28 20:04:59,424 - INFO - Gemini API Key loaded and configured.
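The configuration cell only logs an error when the key is missing, so later cells still run and fail at the first API call. A fail-fast variant could look like the sketch below. `require_env` is a hypothetical helper introduced here for illustration; it is not part of the notebook or of `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise if it is unset/blank.

    Hypothetical helper: raises instead of logging so that execution stops early.
    """
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file.")
    return value


# Possible usage after load_dotenv():
#   API_KEY = require_env("GEMINI_API_KEY")
```

Raising here trades the notebook's "log and continue" behavior for an immediate, explicit failure; which is preferable depends on how the notebook is run.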


##### 1.3 Define Constants
Define file paths, the Gemini model name, and parameters for controlling API requests (concurrency limit, retry attempts).

In [3]:
FOUNDERS_FILE = "../data/founders1.csv"
INVESTORS_FILE = "../data/investors1.csv"
GENERATIVE_MODEL_NAME = "gemini-1.5-flash-latest" # or "gemini-pro"

In [4]:
# Rate Limiting & Retry Configuration
MAX_CONCURRENT_REQUESTS = 5
RETRY_ATTEMPTS = 3
INITIAL_RETRY_DELAY_SECONDS = 5 # (delay after first 429 error)
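To get a feel for what these retry settings imply, the sketch below computes the backoff delays under the assumption that the delay doubles after every rate-limit error, as the retry loop later in this notebook does. `backoff_schedule` is a hypothetical helper, not a function defined elsewhere in the notebook.

```python
from typing import List


def backoff_schedule(initial_delay: float, retry_attempts: int) -> List[float]:
    """Delays (in seconds) slept before each retry when the delay doubles every time."""
    return [initial_delay * (2 ** i) for i in range(retry_attempts)]


schedule = backoff_schedule(5, 3)
print(schedule, sum(schedule))  # [5, 10, 20] 35
```

With the values above, a request that keeps hitting the rate limit waits roughly 35 seconds in backoff before it is given up on.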

### 2. Data Loading and Preparation

##### 2.1 `load_data` Function
This function loads a CSV file (here `founders1.csv` or `investors1.csv`) into a pandas DataFrame and applies the following cleaning steps:
 *   Specifies the `id_column`'s data type as string during loading.
 *   Drops rows where the essential `id_column` is missing or empty.
 *   Fills missing values (`NaN`) in other text columns with empty strings.
 *   Fills missing values (`NaN`) in numeric columns with 0.

In [5]:
def load_data(filepath: str, id_column: str) -> Optional[pd.DataFrame]:
    """
    Loads data from a CSV file into a pandas DataFrame.
    Ensures the ID column is string and handles missing values.
    """
    try:
        df = pd.read_csv(filepath, dtype={id_column: str})
        logging.info(f"Successfully loaded data from {filepath}")

        # Validate ID column presence
        if id_column not in df.columns:
             logging.error(f"Error: ID column '{id_column}' not found in {filepath}")
             return None

        # Remove rows where the essential ID column is missing or blank.
        original_count = len(df)
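        df = df.dropna(subset=[id_column])
        # .copy() makes df an independent frame, so the column assignments below
        # do not trigger pandas' SettingWithCopyWarning
        df = df[df[id_column].str.strip() != ''].copy()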
        dropped_count = original_count - len(df)
        if dropped_count > 0:
            logging.warning(f"Dropped {dropped_count} rows from {filepath} due to missing/empty '{id_column}'.")

        if df.empty:
            logging.warning(f"DataFrame is empty after dropping rows with missing IDs from {filepath}.")
            return df

        # Clean other columns
        for col in df.columns:
            if col == id_column:
                continue
            if df[col].dtype == 'object':
                df[col] = df[col].fillna('').astype(str)
            elif pd.api.types.is_numeric_dtype(df[col]):
                df[col] = df[col].fillna(0)

        # Final confirmation of ID column 
        df[id_column] = df[id_column].astype(str)

        return df
    except FileNotFoundError:
        logging.error(f"Error: File not found at {filepath}")
        return None
    except Exception as e:
        logging.error(f"Error loading or processing data from {filepath}: {e}")
        return None
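The cleaning steps can be tried on a small in-memory CSV without touching the real data files. The sketch below repeats the same pandas calls on made-up rows; the sample data and column values are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical sample: one missing ID, one whitespace-only ID, missing text and numeric values.
csv_text = """startup_id,startup_name,mrr_usd
001,Acme,1000
,NoId,500
002,,
   ,Blank,10
"""

# Reading the ID as a string keeps leading zeros ("001" stays "001").
df = pd.read_csv(io.StringIO(csv_text), dtype={"startup_id": str})

# Drop rows whose ID is missing or blank.
df = df.dropna(subset=["startup_id"])
df = df[df["startup_id"].str.strip() != ""].copy()

# Fill remaining gaps: empty string for text, 0 for numbers.
df["startup_name"] = df["startup_name"].fillna("")
df["mrr_usd"] = df["mrr_usd"].fillna(0)

print(df.to_dict("records"))
```

Only the rows with IDs `001` and `002` survive, and the missing name and MRR in the second row become `''` and `0`.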


##### 2.2 Load the Datasets
Execute the `load_data` function for both founder and investor CSV files.

In [6]:
founders_df = load_data(FOUNDERS_FILE, id_column='startup_id')
investors_df = load_data(INVESTORS_FILE, id_column='investor_id')

if founders_df is not None:
    print(f"Loaded {len(founders_df)} founders.")
    
if investors_df is not None:
    print(f"Loaded {len(investors_df)} investors.")
    
if founders_df is None or investors_df is None or founders_df.empty or investors_df.empty:
    logging.error("Could not load one or both data files, or files are empty after cleaning. Stopping execution.")

2025-03-28 20:05:01,274 - INFO - Successfully loaded data from ../data/founders1.csv


2025-03-28 20:05:01,286 - INFO - Successfully loaded data from ../data/investors1.csv


Loaded 8 founders.
Loaded 8 investors.


### 3. Core Matching Logic


##### 3.1 `create_match_prompt` Function
This function constructs the prompt that will be sent to the Gemini API for each founder-investor pair. A well-structured prompt is key to getting accurate and parseable results.
 *   **Role Setting:** Instructs the AI on its persona (expert VC analyst).
 *   **Structured Data:** Presents founder and investor details clearly.
 *   **Explicit Task:** Defines the criteria for evaluation (industry, stage, funding, geography, qualitative fit).
 *   **JSON Output Format:** Crucially, demands the response *only* as a JSON object with `score` and `reasoning` fields. This makes automated processing reliable.
 *   **Scoring Guidance:** Helps the AI calibrate its score according to defined fit levels.

In [7]:
def create_match_prompt(founder_data: pd.Series, investor_data: pd.Series) -> str:

    # Prepare multi-value fields for readability in the prompt
    investor_industries = ", ".join(investor_data.get('preferred_industries', '').split('|'))
    investor_stages = ", ".join(investor_data.get('preferred_stages', '').split('|'))
    founder_industries = ", ".join(founder_data.get('industry', '').split('|'))
    founder_business_models = ", ".join(founder_data.get('business_model', '').split('|'))

    prompt = f"""
    Analyze the compatibility between the following Startup Founder and Investor. Provide a match score from 0 to 100 and a brief justification.

    **Context:** You are an expert Venture Capital analyst specialized in matching startups with the right investors.

    **Startup Founder Profile:**
    - Name: {founder_data.get('startup_name', 'N/A')}
    - Industry: {founder_industries}
    - Stage: {founder_data.get('startup_stage', 'N/A')}
    - Funding Required (USD): ${founder_data.get('funding_required_usd', 0):,}
    - Location: {founder_data.get('location_city', 'N/A')}, {founder_data.get('location_country', 'N/A')}
    - Business Model: {founder_business_models}
    - MRR (USD): ${founder_data.get('mrr_usd', 0):,}
    - User Count: {founder_data.get('user_count', 0)}
    - Team Size: {founder_data.get('team_size', 'N/A')}
    - Product Description: {founder_data.get('product_description', 'N/A')}
    - Unique Selling Proposition (USP): {founder_data.get('usp', 'N/A')}
    - Traction Summary: {founder_data.get('traction_summary', 'N/A')}

    **Investor Profile:**
    - Name: {investor_data.get('investor_name', 'N/A')} ({investor_data.get('investor_type', 'N/A')})
    - Preferred Industries: {investor_industries}
    - Investment Range (USD): ${investor_data.get('min_investment_usd', 0):,} - ${investor_data.get('max_investment_usd', 0):,}
    - Average Check Size (USD): ${investor_data.get('check_size_avg_usd', 0):,}
    - Preferred Stages: {investor_stages}
    - Geographic Focus: {investor_data.get('geographic_focus', 'N/A')}
    - Investment Thesis: {investor_data.get('investment_thesis', 'N/A')}
    - Example Portfolio Companies: {investor_data.get('portfolio_companies', 'N/A')}

    **Task:**
    Evaluate the match based on the following criteria:
    1.  **Industry Fit:** Does the startup's industry align with the investor's preferences?
    2.  **Stage Fit:** Does the startup's current stage match the investor's preferred investment stages?
    3.  **Funding/Check Size Fit:** Is the startup's required funding within the investor's typical investment range or average check size?
    4.  **Geographic Focus:** Does the startup's location align with the investor's geographic preferences?
    5.  **Qualitative Fit:** Consider the alignment between the startup's product, traction, USP, and business model with the investor's thesis and past investments. Is there a strategic or thesis-driven reason for this investor to be interested?

    **Output Format:**
    Return your response ONLY as a JSON object with the following structure:
    {{
      "score": <integer between 0 and 100>,
      "reasoning": "<string explaining the score based on the criteria>"
    }}

    **Scoring Guidance:**
    - 85-100: Excellent fit across most key criteria, strong qualitative alignment.
    - 70-84: Good fit, alignment on major criteria (e.g., industry, stage), reasonable qualitative fit.
    - 50-69: Partial fit, alignment on some criteria but mismatches on others (e.g., stage or check size slightly off, thesis alignment is okay but not perfect).
    - 25-49: Weak fit, significant mismatches in core criteria (e.g., wrong industry, wrong stage).
    - 0-24: Poor fit, fundamental mismatches across most criteria.

    Now, provide the JSON output for the match between {founder_data.get('startup_name', 'this startup')} and {investor_data.get('investor_name', 'this investor')}.
    """
    return prompt
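Three small formatting techniques carry most of the prompt template: splitting pipe-separated fields, thousands separators via the `:,` format spec, and doubled braces to emit literal `{` and `}` inside an f-string. The sketch below shows each in isolation. `join_multi_value` is a hypothetical helper that, unlike the function above, also trims whitespace and drops empty segments.

```python
def join_multi_value(raw: str) -> str:
    """Turn 'FinTech|SaaS' into 'FinTech, SaaS', dropping empty segments."""
    return ", ".join(part.strip() for part in raw.split("|") if part.strip())


funding_required_usd = 500000
max_score = 100

industries_line = f"- Industry: {join_multi_value('FinTech| SaaS|')}"
funding_line = f"- Funding Required (USD): ${funding_required_usd:,}"
# Doubled braces produce literal braces; single braces still interpolate.
json_hint = f'{{"score": <integer between 0 and {max_score}>}}'

print(industries_line)
print(funding_line)
print(json_hint)
```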


##### 3.2 `get_match_analysis_async` Function
This asynchronous function handles the interaction with the Gemini API for a single founder-investor pair.
 *   **Concurrency Control:** Uses an `asyncio.Semaphore` to cap the number of simultaneous API requests, which lowers the chance of hitting rate limits.
 *   **Retry Logic:** Catches `ResourceExhausted` (429) errors and implements exponential backoff retries.
 *   **Response Parsing:** Attempts to parse the expected JSON response. Handles potential errors during parsing or if the response structure is incorrect.
 *   **Error Handling:** Includes general exception handling for API call failures.
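The retry behavior from the list above can be sketched in isolation, without calling the Gemini API. In this minimal example `RateLimitError` is a hypothetical stand-in for `google_exceptions.ResourceExhausted`, and `flaky` simulates an endpoint that rejects the first two calls; the delays are shortened so the demo finishes quickly.

```python
import asyncio


class RateLimitError(Exception):
    """Hypothetical stand-in for google.api_core.exceptions.ResourceExhausted."""


async def call_with_backoff(func, retries: int = 3, initial_delay: float = 0.01):
    """Await func(); on RateLimitError, sleep and retry with a doubling delay."""
    delay = initial_delay
    for attempt in range(retries + 1):
        try:
            return await func()
        except RateLimitError:
            if attempt == retries:
                raise  # Out of retries: surface the error to the caller.
            await asyncio.sleep(delay)
            delay *= 2  # Exponential backoff


calls = {"count": 0}


async def flaky():
    """Simulated API call that is rate limited on its first two invocations."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimitError("429")
    return {"score": 80, "reasoning": "demo"}


# In a notebook, use `result = await call_with_backoff(flaky)` instead.
result = asyncio.run(call_with_backoff(flaky))
print(result, calls["count"])
```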

In [8]:
async def get_match_analysis_async(
    model: genai.GenerativeModel,
    prompt: str,
    investor_id: str,
    semaphore: asyncio.Semaphore
    ) -> Tuple[str, Optional[Dict[str, Any]]]:
    """
    Calls the Gemini API asynchronously, respecting the semaphore for rate limiting,
    and includes retries for 429 errors.
    Returns the investor_id and the parsed JSON response or None if an error occurs.
    """
    retries = RETRY_ATTEMPTS
    delay = INITIAL_RETRY_DELAY_SECONDS
    last_exception = None

    async with semaphore: # Acquire semaphore before making API call
        for attempt in range(retries + 1):
            try:
                # Small delay helps slightly spread requests even when semaphore allows multiple
                await asyncio.sleep(0.1 * attempt) # Slightly increasing delay within attempts

                logging.debug(f"Attempt {attempt+1}/{retries+1} for investor {investor_id}")
                response = await model.generate_content_async(prompt)

                # Check for empty or blocked response
                if not response.parts:
                     # Check for safety ratings if available and log if blocked
                    try:
                        if response.prompt_feedback.block_reason:
                            logging.warning(f"Request for investor {investor_id} blocked. Reason: {response.prompt_feedback.block_reason}")
                            return investor_id, None # Blocked, don't retry
                    except Exception:
                        pass # Ignore if feedback structure isn't as expected
                    logging.warning(f"Received empty response for investor {investor_id} (Attempt {attempt+1}).")
                    # Decide whether to retry empty responses or not; let's not retry for now.
                    return investor_id, None

                # Extract text and attempt to parse JSON
                raw_text = response.text
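                # Clean potential Markdown fence; only trim a closing fence that is actually present
                raw_text = raw_text.strip()
                if raw_text.startswith("```json"):
                    raw_text = raw_text[7:]
                elif raw_text.startswith("```"):
                    raw_text = raw_text[3:]
                if raw_text.endswith("```"):
                    raw_text = raw_text[:-3]
                raw_text = raw_text.strip()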

                try:
                    match_data = json.loads(raw_text)
                    # Validate structure
                    if isinstance(match_data, dict) and "score" in match_data and "reasoning" in match_data and isinstance(match_data['score'], int):
                        logging.info(f"Successfully received and parsed analysis for investor {investor_id}")
                        return investor_id, match_data
                    else:
                        logging.warning(f"Parsed JSON for investor {investor_id} lacks fields or score is not int. Data: {match_data}")
                        return investor_id, None # Malformed JSON structure
                except json.JSONDecodeError:
                    logging.error(f"Failed to decode JSON response for investor {investor_id}. Raw text: {raw_text}")
                    return investor_id, None # JSON decode error

            except google_exceptions.ResourceExhausted as e:
                last_exception = e
                if attempt < retries:
                    logging.warning(f"Rate limit hit (429) for investor {investor_id} on attempt {attempt+1}. Retrying in {delay:.2f}s...")
                    await asyncio.sleep(delay)
                    delay *= 2 # Exponential backoff
                else:
                    logging.error(f"Rate limit hit (429) for investor {investor_id}. Max retries ({retries}) exceeded.")
                continue # Go to next attempt in the loop or finish if max retries hit

            except Exception as e:
                last_exception = e
                logging.error(f"Error calling Gemini API for investor {investor_id} (Attempt {attempt+1}): {type(e).__name__} - {e}")
                # Break loop on non-429 errors unless specific transient errors are identified for retry
                break

        # If loop finished without returning a success
        logging.error(f"Failed to get analysis for investor {investor_id} after {attempt + 1} attempt(s). Last error: {last_exception or 'Unknown'}")
        return investor_id, None
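The parsing and validation step is easy to test on its own when it is pulled out of the API call. The sketch below is one possible standalone version; `parse_match_response` is a hypothetical helper, and it is slightly stricter than the function above because it also rejects boolean scores (in Python, `True` is an instance of `int`).

```python
import json
from typing import Any, Dict, Optional


def parse_match_response(raw_text: str) -> Optional[Dict[str, Any]]:
    """Parse a model reply into {'score': int, 'reasoning': ...} or return None."""
    text = raw_text.strip()
    if text.startswith("```"):
        # Drop the opening fence line, then a closing fence if one is present.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit("```", 1)[0].strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or "reasoning" not in data:
        return None
    score = data.get("score")
    if isinstance(score, bool) or not isinstance(score, int):
        return None
    return data


fenced = '```json\n{"score": 85, "reasoning": "Strong fit"}\n```'
print(parse_match_response(fenced))
print(parse_match_response("Sorry, I cannot help with that."))
```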

##### 3.3 `find_matches_for_founder` Function
This is the main asynchronous function that orchestrates the matching process for a *single* founder.
 *   Retrieves the specified founder's data.
 *   Initializes the Gemini model and the `asyncio.Semaphore`.
 *   Iterates through all valid investors:
     *   Validates the `investor_id`.
     *   Generates the prompt using `create_match_prompt`.
     *   Creates an asynchronous task for `get_match_analysis_async`, passing the semaphore.
 *   Uses `asyncio.gather` to run all API call tasks concurrently (up to the semaphore limit).
 *   Processes the results, extracting valid scores and reasons.
 *   Sorts the matches in descending order by score.
 *   Returns the sorted list of match dictionaries.

In [9]:
async def find_matches_for_founder(
    founder_id: str,
    founders_df: pd.DataFrame,
    investors_df: pd.DataFrame
    ) -> Optional[List[Dict[str, Any]]]:
    """
    Finds potential investor matches for a specific founder using Gemini API,
    with concurrency control and retries.
    """
    if founders_df is None or investors_df is None:
        logging.error("Input DataFrames are None.")
        return None

    founder_row = founders_df[founders_df['startup_id'] == founder_id]
    if founder_row.empty:
        logging.error(f"Founder with ID {founder_id} not found in the dataset.")
        return None

    founder_data = founder_row.iloc[0]
    logging.info(f"--- Finding matches for Founder: {founder_data.get('startup_name', 'N/A')} ({founder_id}) ---")

    # Check if API key was loaded successfully earlier
    if not API_KEY:
        logging.error("Cannot proceed without Gemini API Key.")
        return None

    try:
        # Initialize the generative model
        model = genai.GenerativeModel(GENERATIVE_MODEL_NAME)
        # Create a semaphore to limit concurrent requests
        semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)
    except Exception as e:
        logging.error(f"Failed to initialize Gemini Model or Semaphore: {e}")
        return None

    tasks = []
    investor_map = {} # To map investor_id back to investor details later

    # Create async tasks for each valid investor
    for index, investor_data in investors_df.iterrows():
        investor_id = investor_data.get('investor_id') # Use .get for safety

        # --- Check for valid investor_id (redundant if load_data worked, but safe) ---
        if not investor_id or str(investor_id).strip() == '':
             logging.warning(f"Skipping row {index} in investors file during task creation due to invalid investor_id: '{investor_id}'")
             continue

        investor_id = str(investor_id) # Ensure string key for map

        investor_map[investor_id] = investor_data # Store investor data
        prompt = create_match_prompt(founder_data, investor_data)
        tasks.append(get_match_analysis_async(model, prompt, investor_id, semaphore)) # Pass semaphore

    if not tasks:
        logging.warning("No valid investors found to process for this founder.")
        return []

    # Run tasks concurrently, respecting semaphore limit
    logging.info(f"Sending {len(tasks)} requests to Gemini API (max concurrency: {MAX_CONCURRENT_REQUESTS})...")
    results = await asyncio.gather(*tasks)
    logging.info("Received all responses from Gemini API.")

    # Process results
    matches = []
    successful_analyses = 0
    failed_analyses = 0
    for investor_id, analysis_result in results:
        # Check if result is valid and score is an integer
        if analysis_result and isinstance(analysis_result.get('score'), int):
            investor_info = investor_map.get(investor_id)
            if investor_info is not None:
                matches.append({
                    "investor_id": investor_id,
                    "investor_name": investor_info.get('investor_name', 'N/A'),
                    "score": analysis_result['score'],
                    "reasoning": analysis_result.get('reasoning', 'N/A')
                })
                successful_analyses += 1
            else:
                logging.error(f"Internal consistency error: Could not find investor info for ID {investor_id} after successful analysis.")
                failed_analyses +=1
        else:
            # Log failure only if it wasn't just a skipped invalid ID from the start
            if investor_id in investor_map:
                 logging.warning(f"No valid analysis received or processed for investor {investor_id}")
                 failed_analyses += 1
            # Else: it was likely skipped earlier due to invalid ID, no need to log failure again

    logging.info(f"Analysis summary for founder {founder_id}: {successful_analyses} successful, {failed_analyses} failed/skipped.")

    # Sort matches by score descending
    matches.sort(key=lambda x: x["score"], reverse=True)

    return matches
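How the semaphore and `asyncio.gather` interact can be checked with a simulation that makes no API calls. The sketch below starts 8 fake requests with a limit of 5 (the same numbers as in this notebook's run) and records the peak number running at once. `fake_request` and `run_all` are hypothetical names used only for this demo.

```python
import asyncio
from typing import Dict, List, Tuple


async def fake_request(i: int, semaphore: asyncio.Semaphore, state: Dict[str, int]) -> int:
    """Simulated API call that tracks how many calls are in flight."""
    async with semaphore:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # Stand-in for network latency
        state["active"] -= 1
        return i


async def run_all(n_tasks: int = 8, limit: int = 5) -> Tuple[List[int], int]:
    semaphore = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}
    # gather returns results in the order the awaitables were passed in,
    # regardless of which one finishes first.
    results = await asyncio.gather(
        *(fake_request(i, semaphore, state) for i in range(n_tasks))
    )
    return list(results), state["peak"]


# In a notebook, use `results, peak = await run_all()` instead.
results, peak = asyncio.run(run_all())
print(results, peak)
```

All 8 coroutines are scheduled immediately, but at most 5 are ever inside the `async with` block; the remaining 3 wait until a slot is released.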

### 4. Output Display

##### 4.1 `display_matches` Function (Top 5)
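This function takes the list of matches and displays them in a ranked, readable format. The `top_n` parameter limits the output to the top N results (default 5). Two versions follow: a plain-text version, and a rich version that redefines `display_matches` to render a styled pandas DataFrame. Because the second definition replaces the first, it is the one used in Section 5.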


In [10]:
def display_matches(founder_id: str, matches: List[Dict[str, Any]], top_n: int = 5):
    """Displays the ranked list of investor matches, showing only the top N."""
    if matches is None:
        print(f"\nMatch calculation failed for founder {founder_id}.")
        return
    if not matches:
        print(f"\nNo suitable investor matches found for founder {founder_id}.")
        return

    # Get the top N matches, or fewer if less than N matches were found
    top_matches = matches[:top_n]
    num_to_display = len(top_matches)

    # Get founder name for display
    founder_name = "N/A"
    if founders_df is not None:
        founder_info = founders_df[founders_df['startup_id'] == founder_id]
        if not founder_info.empty:
            founder_name = founder_info.iloc[0].get('startup_name', founder_id)


    print(f"\n--- Top {num_to_display} Investor Matches for {founder_name} ({founder_id}) (Max {top_n}) ---")
    if num_to_display == 0:
        print("  (No matches met the criteria or processing failed)")

    for i, match in enumerate(top_matches):
        print(f"\nRank {i+1}:")
        print(f"  Investor: {match['investor_name']} ({match['investor_id']})")
        print(f"  Match Score: {match['score']}/100")
        print(f"  Reasoning: {match['reasoning']}")
    print("-------------------------------------------------------------")
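Ranking comes down to a descending sort followed by a slice. The sketch below uses a few scores taken from the run shown later in this notebook; because Python's sort is stable, investors with equal scores keep their original relative order.

```python
matches = [
    {"investor_id": "INV001", "score": 75},
    {"investor_id": "INV002", "score": 35},
    {"investor_id": "INV004", "score": 75},
    {"investor_id": "INV006", "score": 85},
]

# Highest score first; ties (INV001 and INV004) stay in input order.
matches.sort(key=lambda m: m["score"], reverse=True)

top_n = 3
top_matches = matches[:top_n]  # Slicing never fails, even with fewer than top_n items
print([m["investor_id"] for m in top_matches])  # ['INV006', 'INV001', 'INV004']
```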

In [11]:
import pandas as pd
from IPython.display import display, HTML
from typing import List, Dict, Any

def display_matches(founder_id: str, matches: List[Dict[str, Any]], top_n: int = 5):
    """
    Displays the ranked list of top N investor matches using a Pandas DataFrame.
    """
    if matches is None:
        # Use HTML for consistent rich output formatting
        display(HTML(f"<p style='color:red;'>Match calculation failed for founder {founder_id}.</p>"))
        return
    if not matches:
        display(HTML(f"<p>No suitable investor matches found for founder {founder_id}.</p>"))
        return

    # Get the top N matches
    top_matches = matches[:top_n]
    num_to_display = len(top_matches)

    # Get founder name for display (handle potential global scope issue)
    founder_name = founder_id # Default to ID if DataFrame not found or error occurs
    try:
        # Check if founders_df exists in the global scope and is a DataFrame
        if 'founders_df' in globals() and isinstance(globals()['founders_df'], pd.DataFrame):
            founder_info = globals()['founders_df'][globals()['founders_df']['startup_id'] == founder_id]
            if not founder_info.empty:
                founder_name = founder_info.iloc[0].get('startup_name', founder_id)
        else:
            logging.warning("founders_df not found or not a DataFrame in global scope. Using Founder ID for display.")
    except Exception as e:
         logging.warning(f"Error accessing founders_df for name lookup: {e}. Using Founder ID.")


    # --- Display Title ---
    title_html = f"<h3>🏆 Top {num_to_display} Investor Matches for {founder_name} ({founder_id})</h3>"
    display(HTML(title_html))

    if num_to_display == 0:
        display(HTML("<p><i>(No matches met the criteria or processing failed)</i></p>"))
        print("-------------------------------------------------------------") # Keep separator
        return

    # --- Create DataFrame ---
    df_data = []
    for i, match in enumerate(top_matches):
        df_data.append({
            "Rank": i + 1,
            "Investor Name": match.get('investor_name', 'N/A'),
            "Investor ID": match.get('investor_id', 'N/A'),
            "Score": match.get('score', 'N/A'),
            "Reasoning": match.get('reasoning', 'N/A')
        })

    matches_df = pd.DataFrame(df_data)

    # --- Style DataFrame (Optional but recommended) ---
    # Center align Score and Rank, left align others
    # Add hover effects, customize widths etc.
    styles = [
        {'selector': 'th', 'props': [('text-align', 'left'), ('font-weight', 'bold')]},
        {'selector': 'td', 'props': [('text-align', 'left'), ('vertical-align', 'top')]},
        {'selector': 'td:nth-child(1), th:nth-child(1)', 'props': [('text-align', 'center'), ('width', '5%')]}, # Rank
        {'selector': 'td:nth-child(4), th:nth-child(4)', 'props': [('text-align', 'center'), ('width', '8%')]}, # Score
        {'selector': 'td:nth-child(5), th:nth-child(5)', 'props': [('width', '50%')]}, # Reasoning (wider)
        {'selector': 'tr:hover', 'props': [('background-color', '#f5f5f5')]} # Hover highlight
    ]
    styled_df = matches_df.style.set_table_styles(styles).hide(axis="index") # Hide the default pandas index

    # Display the styled DataFrame
    display(styled_df)
    print("-------------------------------------------------------------") # Keep separator


### 5. Execution

##### 5.1 Final Results
This cell runs the overall process for one founder: it checks that the data loaded correctly, selects and validates a founder ID, awaits `find_matches_for_founder`, and displays the results.

In [12]:
if founders_df is None or investors_df is None or founders_df.empty or investors_df.empty:
    logging.error("Aborting main execution: Data not loaded correctly.")
else:
    # --- Select Founder to Match ---
    # Choose a founder ID from your founders CSV file
    founder_to_match_id = "STP001" # Example: FinTechFlow

    # Validate the selected founder ID before spending API calls on it
    if founder_to_match_id not in founders_df['startup_id'].values:
        logging.error(f"Founder ID '{founder_to_match_id}' not found in the loaded founders data. Please choose a valid ID.")
        print(f"\nValid Founder IDs are: {founders_df['startup_id'].tolist()}")
    else:
        # --- Run Matching ---
        # Top-level await works in Jupyter/IPython; a plain script would need asyncio.run(...)
        matches = await find_matches_for_founder(founder_to_match_id, founders_df, investors_df)

        # --- Display Results ---
        display_matches(founder_to_match_id, matches, top_n=5) # Display top 5

2025-03-28 20:05:05,618 - INFO - --- Finding matches for Founder: FinTechFlow (STP001) ---
2025-03-28 20:05:05,621 - INFO - Sending 8 requests to Gemini API (max concurrency: 5)...
2025-03-28 20:05:09,277 - INFO - Successfully received and parsed analysis for investor INV002
2025-03-28 20:05:09,328 - INFO - Successfully received and parsed analysis for investor INV003
2025-03-28 20:05:09,506 - INFO - Successfully received and parsed analysis for investor INV005
2025-03-28 20:05:09,610 - INFO - Successfully received and parsed analysis for investor INV001
2025-03-28 20:05:10,004 - INFO - Successfully received and parsed analysis for investor INV004
2025-03-28 20:05:12,128 - INFO - Successfully received and parsed analysis for investor INV006
2025-03-28 20:05:12,633 - INFO - Successfully received and parsed analysis for investor INV008
2025-03-28 20:05:12,856 - INFO - Successfully received and parsed analysis for investor INV007
2025-03-28 20:05:12,857 - INFO - Received all responses fro

Rank,Investor Name,Investor ID,Score,Reasoning
1,Ben Carter,INV006,85,"FinTechFlow and Ben Carter exhibit a strong match across multiple criteria. 1. **Industry Fit:** Excellent. Both are focused on FinTech and SaaS, a perfect alignment. 2. **Stage Fit:** Excellent. FinTechFlow is Seed stage, which aligns perfectly with Ben Carter's preference. 3. **Funding/Check Size Fit:** Good. FinTechFlow needs $500,000, which falls within Ben Carter's investment range ($100,000-$500,000) and is close to his average check size of $250,000. This is a slightly smaller check than his average, but still within his range. 4. **Geographic Focus:** Excellent. Both are based in the USA. 5. **Qualitative Fit:** Good. FinTechFlow's AI-powered platform for automated financial reporting for SMEs aligns well with Ben Carter's investment thesis of 'Founder-friendly capital for early-stage SaaS, particularly in FinTech infrastructure.' The 20% MoM growth and 2 pilot customers represent promising early traction. While specific details about PayAPI and LedgerLite aren't provided, the focus on FinTech infrastructure suggests a potential thematic overlap. The strong team size of 8 is likely viewed as favorable by Ben. The USP of 50% faster reporting is a strong value proposition. The slight concern is the funding request, which could be considered on the high side for a single angel investor. However, given the strong alignment on other criteria and the potential for additional funding rounds, this slight mismatch does not heavily impact the overall score."
2,Alpha Ventures,INV001,75,"FinTechFlow and Alpha Ventures exhibit a good fit, though not perfect. Industry Fit (Excellent): Alpha Ventures' preference for SaaS, AI, and FinTech aligns perfectly with FinTechFlow's industry. Stage Fit (Good): Both are aligned on the Seed stage, a key positive. Funding/Check Size Fit (Fair): FinTechFlow's $500,000 request falls within Alpha Ventures' investment range. However, it's significantly below their average check size of $2,000,000. This could be a point of concern, as Alpha Ventures might prefer larger investments at this stage. Geographic Focus (Excellent): Both are located in North America (San Francisco). Qualitative Fit (Good): FinTechFlow's focus on AI-powered B2B SaaS solutions in the FinTech space directly aligns with Alpha Ventures' investment thesis. The 20% MoM growth and early traction are positive signals. However, the limited number of pilot customers (2) and relatively low MRR for a seed-stage company could be a point of concern for an investor who typically invests at a larger scale. The significant difference between FinTechFlow's funding ask and Alpha Venture's average check size might also lead Alpha Ventures to prioritize other investment opportunities that better align with their average investment size."
3,TechNexus Investors,INV004,75,"FinTechFlow presents a moderately good fit for TechNexus Investors. **Industry Fit (Good):** TechNexus's focus on AI aligns well with FinTechFlow's AI-powered platform. **Stage Fit (Good):** Both are aligned on the Seed stage. **Funding/Check Size Fit (Fair):** FinTechFlow's $500,000 request is significantly lower than TechNexus's average check size of $1,500,000. This could be a challenge, as TechNexus might prefer larger investments at this stage. While within their investment range, the discrepancy is notable. **Geographic Focus (Excellent):** Both are located in San Francisco, USA, which strongly aligns. **Qualitative Fit (Fair):** FinTechFlow's focus on a B2B SaaS model and its traction (albeit early) show some promise. However, TechNexus's investment thesis leans toward 'foundational technologies shaping the next iteration of the internet and user interaction.' While FinTechFlow uses AI, its application is more focused on efficiency within a specific niche (SME financial reporting) rather than a broader, foundational technology. The portfolio companies (SynthAI, MetaVerse One, ChainPlay) indicate a preference for potentially more disruptive, consumer-facing, or Web3-related technologies. The strong alignment on AI is the key positive here, offset by the less-disruptive nature of FinTechFlow's product compared to TechNexus's typical investments. The relatively small funding ask might make it less attractive for a firm that usually invests significantly more at seed."
4,Sarah Chen,INV002,35,"The fit between FinTechFlow and Sarah Chen is weak. While both align on the Seed stage (Stage Fit), a significant mismatch exists in the industry and funding requirements. Sarah Chen's focus on HealthTech, EdTech, and Sustainability doesn't align with FinTechFlow's FinTech focus (Industry Fit - low score). FinTechFlow requires $500,000, which far exceeds Sarah Chen's investment range of $50,000-$250,000 and her average check size of $100,000 (Funding/Check Size Fit - low score). Geographic focus is aligned (Geographic Focus - high score). The qualitative fit is also weak; Sarah Chen's impact-focused thesis doesn't strongly resonate with FinTechFlow's product, despite the AI element. While faster financial reporting is beneficial, it doesn't directly address a major societal challenge in the same way Sarah Chen's portfolio companies do (Qualitative Fit - low score). The overall low scores across key criteria result in a low match score."
5,Innovate Corp Ventures,INV007,35,"The match between FinTechFlow and Innovate Corp Ventures is weak. While Innovate Corp Ventures invests in AI, a key component of FinTechFlow's product, several significant mismatches exist. Firstly, FinTechFlow is at the Seed stage, whereas Innovate Corp Ventures prefers Series A and B. This is a major discrepancy. Secondly, FinTechFlow requires $500,000, far below Innovate Corp Ventures' average check size of $4,000,000 and minimum investment of $1,000,000. While Innovate Corp Ventures' geographic focus is global and encompasses FinTechFlow's San Francisco location, the qualitative fit is also weak. Innovate Corp Ventures' investment thesis focuses on strategic synergies with its parent company's healthcare and retail operations. FinTechFlow's B2B SaaS platform for SMEs in financial reporting, while using AI, doesn't present an obvious strategic fit with healthcare or retail. The limited traction of FinTechFlow (3 months old, 2 pilot customers) is also unlikely to appeal to an investor focused on later-stage companies with significant scale. The industry overlap in AI is the only significant point of alignment, making this a poor fit overall."


-------------------------------------------------------------
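The execution cell relies on top-level `await`, which Jupyter supports but a plain Python script does not. If this workflow were moved into a script, one option is to wrap the call in a coroutine and start it with `asyncio.run`. The sketch below shows only that pattern; `find_matches_stub` is a hypothetical stand-in that makes no API calls, and its return value is made up for the demo.

```python
import asyncio
from typing import Any, Dict, List


async def find_matches_stub(founder_id: str) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for find_matches_for_founder; makes no API calls."""
    await asyncio.sleep(0)
    return [{"founder_id": founder_id, "investor_id": "INV006", "score": 85}]


async def main() -> List[Dict[str, Any]]:
    # In the real workflow this would await find_matches_for_founder(...)
    return await find_matches_stub("STP001")


# asyncio.run creates an event loop, runs main() to completion, and closes the loop.
matches = asyncio.run(main())
print(matches)
```

Note that `asyncio.run` raises a `RuntimeError` when called from inside an already running event loop, which is why the notebook uses `await` directly.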


### 6. Conclusion
This notebook demonstrated how to use the Google Gemini API to build a founder-investor matching system. It loads data, generates prompts, calls the API concurrently with rate limiting and retries, parses the results, and displays the top matches with scores and reasoning.