# AI_EXTRACT

## Overview
This function uses an AI model to extract specific types of information from text. It is designed for pulling structured data (like dates, contacts, or key points) from unstructured text content such as emails, reports, or meeting notes.

## Usage
To use the `AI_EXTRACT` function in Excel, enter it as a formula in a cell, specifying your text, extract type, and any optional arguments as needed:

```excel
=AI_EXTRACT(text, extract_type, [temperature], [model], [max_tokens], [api_key], [api_url])
```
Replace each parameter with your desired value. The function returns a single-column list of extracted items.

## Parameters
| Parameter      | Type         | Required | Description                                                                                                 |
|---------------|--------------|----------|-------------------------------------------------------------------------------------------------------------|
| text          | string/range | Yes      | The text or cell reference containing the data to analyze.                                                  |
| extract_type  | string       | Yes      | The type of information to extract (e.g., "emails", "dates", "key points").                              |
| temperature   | float        | No       | Controls the randomness of the response (0.0 to 2.0, default 0.0). Lower values are more deterministic.     |
| model         | string       | No       | The AI model ID to use; it must support JSON mode (default: 'mistral-small-latest').                        |
| max_tokens    | int          | No       | Maximum number of tokens the model may generate for the extracted list (default 1000).                      |
| api_key       | string       | No       | API key for authentication. [Get a free API key from Mistral AI](https://console.mistral.ai/).              |
| api_url       | string       | No       | OpenAI-compatible API endpoint URL (e.g., https://api.mistral.ai/v1/chat/completions).                      |

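As a rough sketch of how these parameters might map onto an OpenAI-compatible chat completions request, the helper below builds the headers and JSON body. The name `build_request` and the placeholder key are hypothetical, and the prompt is shortened; the field names mirror the implementation further down in this notebook.

```python
def build_request(extract_type, text, temperature=0.0,
                  model="mistral-small-latest", max_tokens=1000,
                  api_key="YOUR_API_KEY"):
    """Hypothetical helper: returns (headers, payload) for a chat completions call."""
    prompt = (
        f"Extract the following from the text: {extract_type}\n\nText: {text}"
        "\n\nReturn ONLY a JSON object with a key 'items' whose value is a JSON array."
    )
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "model": model,
        "max_tokens": max_tokens,
        # JSON mode is requested here, which is why the model must support it
        "response_format": {"type": "json_object"},
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return headers, payload


headers, payload = build_request("dates", "Kickoff is on May 15, 2025.")
print(payload["model"], payload["response_format"])
```

Nothing is sent over the network here; the sketch only shows which argument ends up in which request field.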
## Return Value
| Return Value   | Type    | Description                                                                                                   |
|---------------|---------|---------------------------------------------------------------------------------------------------------------|
| Extracted Data| 2D list | A single-column list of extracted items as requested.                                                          |
| Error         | string  | Error message if extraction fails or input is invalid.                                                          |

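The two return shapes can be illustrated with a minimal sketch of the parsing step, assuming the model replies with a JSON object such as `{"items": [...]}`. The helper name `to_single_column` is hypothetical; the accepted keys and error messages follow the implementation in this notebook.

```python
import json


def to_single_column(content):
    """Hypothetical helper: turn the model's JSON text into a single-column 2D list."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        return "Error: Unable to extract data. The AI response wasn't in the expected format."
    # Accept {"items": [...]} plus the fallback keys "extracted" and "results"
    if isinstance(data, dict):
        for key in ("items", "extracted", "results"):
            if key in data:
                data = data[key]
                break
    if isinstance(data, list):
        return [[item] for item in data]  # one row per item, one column wide
    return "Error: Unable to parse response. Expected a list."


print(to_single_column('{"items": ["Acme Corporation", "Global Enterprises"]}'))
# [['Acme Corporation'], ['Global Enterprises']]
print(to_single_column("not json"))
```

Because errors come back as plain strings rather than exceptions, a caller can tell the two cases apart with `isinstance(result, str)`.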
## Demo
If either `api_key` or `api_url` is not provided, both default to the Boardflare demo values (`api_url`: https://llm.boardflare.com, `api_key`: your Microsoft login token, if available). The demo only works for users logged in with a Microsoft account and provides limited free usage. If no login token is available, the function returns a message asking you to log in or supply your own `api_key`. You can also obtain a free `api_key` from [Mistral AI](https://console.mistral.ai/) with your Microsoft account; Mistral AI offers more generous free usage and supports CORS.

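The fallback rule described above can be sketched as follows. The helper `resolve_credentials` and its `id_token` argument are hypothetical stand-ins for the `idToken` global that the implementation in this notebook reads.

```python
DEMO_URL = "https://llm.boardflare.com"


def resolve_credentials(api_key=None, api_url=None, id_token=None):
    """Hypothetical helper: returns (api_key, api_url), or None if nothing usable is available."""
    if api_key is not None and api_url is not None:
        return api_key, api_url      # caller supplied both: use them as-is
    if id_token is not None:
        return id_token, DEMO_URL    # either one missing: both fall back to the demo
    return None                      # not logged in and no complete credentials


print(resolve_credentials("my-key", "https://api.mistral.ai/v1/chat/completions"))
print(resolve_credentials(api_key="my-key", id_token="ms-login-token"))
# ('ms-login-token', 'https://llm.boardflare.com')
```

Note that supplying only one of the two values is not enough: in the second call the supplied `api_key` is replaced by the login token.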
## Limitations
- The quality of the extraction depends on the clarity of the extract_type and the text provided.
- Large text inputs may exceed model context limits and result in truncated or incomplete results.
- The function requires an internet connection to access the AI model.
- Model availability and output may vary depending on the provider or API changes.
- Sensitive or confidential data should not be sent to external AI services.
- `temperature` must be a number between 0 and 2 (inclusive). If it is not, an error message is returned rather than an exception being raised.
- If you hit the API rate limit for your provider, a message is returned instead of raising an exception.

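The argument checks mentioned in the list above can be sketched as a small validator that returns a message instead of raising. The names `validate_args` and `is_error` are hypothetical; the conditions and messages are taken from the implementation in this notebook.

```python
def validate_args(temperature, max_tokens):
    """Hypothetical helper: returns an error string, or None when the arguments are valid."""
    if not (isinstance(temperature, (int, float)) and 0.0 <= float(temperature) <= 2.0):
        return "Error: temperature must be a float between 0.0 and 2.0."
    if not (isinstance(max_tokens, int) and max_tokens > 0):
        return "Error: max_tokens must be a positive integer."
    return None


def is_error(result):
    """Errors are returned as plain strings; successful results are 2D lists."""
    return isinstance(result, str)


print(validate_args(0.2, 500))   # None
print(validate_args(3.0, 500))   # Error: temperature must be a float between 0.0 and 2.0.
print(validate_args(0.2, 0))     # Error: max_tokens must be a positive integer.
```

Returning strings keeps the behavior consistent with the rate-limit and request-failure cases, where a message is shown in the cell instead of an exception.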
## Benefits
- Automates extraction of structured data from unstructured text directly in Excel.
- Saves time and improves consistency in reporting and data entry.
- Enables dynamic, context-aware extraction using your own data.
- More flexible and powerful than manual or native Excel approaches for information extraction.

## Examples

### 1. Extracting Client Names from Meeting Notes
```excel
=AI_EXTRACT("During today's annual review, we discussed progress with Acme Corporation, Global Enterprises, and TechSolutions Inc. All three clients reported satisfaction with our services.", "client names")
```
**Sample Output:**
| Item               |
|--------------------|
| Acme Corporation   |
| Global Enterprises |
| TechSolutions Inc. |

### 2. Extracting Financial Metrics from a Report
```excel
=AI_EXTRACT("Q1 results exceeded expectations with revenue of $2.4M, an EBITDA margin of 18.5%, and customer acquisition costs decreasing by 12%. Cash reserves stand at $5.2M and our runway extends to 24 months.", "financial metrics")
```
**Sample Output:**
| Item                              |
|-----------------------------------|
| Revenue: $2.4M                    |
| EBITDA margin: 18.5%              |
| Customer acquisition costs: -12%  |
| Cash reserves: $5.2M              |
| Runway: 24 months                 |

### 3. Extracting Action Items from Email
```excel
=AI_EXTRACT("Hi team, Following our strategic planning session: 1) Mark needs to finalize the budget by Friday, 2) Sarah will contact vendors for new quotes, 3) Development team must provide timeline estimates by next Wednesday, and 4) Everyone should review the new marketing materials.", "action items")
```
**Sample Output:**
| Item                                                        |
|-------------------------------------------------------------|
| Mark needs to finalize the budget by Friday                 |
| Sarah will contact vendors for new quotes                   |
| Development team must provide timeline estimates by next Wednesday |
| Everyone should review the new marketing materials          |

### 4. Extracting Contact Information from Business Cards
```excel
=AI_EXTRACT("John Smith, Senior Project Manager, Innovative Solutions Inc., jsmith@innovativesolutions.com, +1 (555) 123-4567, 123 Business Avenue, Suite 400, San Francisco, CA 94107", "contact information")
```
**Sample Output:**
| Item                                              |
|---------------------------------------------------|
| Name: John Smith                                  |
| Title: Senior Project Manager                     |
| Company: Innovative Solutions Inc.                |
| Email: jsmith@innovativesolutions.com             |
| Phone: +1 (555) 123-4567                          |
| Address: 123 Business Avenue, Suite 400, San Francisco, CA 94107 |

### 5. Extracting Dates and Deadlines
```excel
=AI_EXTRACT("The initial design phase will be completed by May 15, 2025. The stakeholder review is scheduled for May 20-22, with development starting June 1. Testing will run through September 15, with final delivery expected by October 3, 2025.", "dates and deadlines")
```
**Sample Output:**
| Item                                   |
|----------------------------------------|
| Design completion: May 15, 2025        |
| Stakeholder review: May 20-22, 2025    |
| Development start: June 1, 2025        |
| Testing completion: September 15, 2025 |
| Final delivery: October 3, 2025        |

In [None]:
import requests
import json

def ai_extract(text, extract_type, temperature=0.0, model='mistral-small-latest', max_tokens=1000, api_key=None, api_url=None):
    """
    Uses AI to extract specific types of information from text.

    Args:
        text (str or list): The text to analyze (string or 2D list with a single cell)
        extract_type (str): Type of information to extract (e.g., 'emails', 'dates', 'action items')
        temperature (float, optional): Controls response creativity (0-2). Default is 0.0
        model (str, optional): ID of the model to use
        max_tokens (int, optional): Maximum tokens for response generation. Default is 1000
        api_key (str, optional): API key for authentication (e.g. for Mistral AI)
        api_url (str, optional): OpenAI compatible URL. (e.g., https://api.mistral.ai/v1/chat/completions)

    Returns:
        list: 2D list representing the extracted data as a single column, or a string error message
    """
    # Demo fallback logic (Boardflare)
    if api_key is None or api_url is None:
        if 'idToken' in globals():
            api_key = globals()['idToken']
            api_url = "https://llm.boardflare.com"
        else:
            return "Login on the Functions tab for limited demo usage, or sign up for a free Mistral AI account at https://console.mistral.ai/ and add your own api_key."

    # Handle 2D list input: only the first cell (row 0, column 0) is used
    if isinstance(text, list):
        if len(text) > 0 and len(text[0]) > 0:
            text = str(text[0][0])
        else:
            return "Error: Empty input text."

    # Validate temperature
    if not (isinstance(temperature, (int, float)) and 0.0 <= float(temperature) <= 2.0):
        return "Error: temperature must be a float between 0.0 and 2.0."

    # Validate max_tokens
    if not (isinstance(max_tokens, int) and max_tokens > 0):
        return "Error: max_tokens must be a positive integer."

    # Construct a specific prompt for data extraction
    extract_prompt = f"Extract the following from the text: {extract_type}\n\nText: {text}"
    extract_prompt += "\n\nReturn ONLY a JSON object with a key 'items' whose value is a JSON array of the items you extracted. "
    extract_prompt += "Each item should be a single value representing one extracted piece of information. "
    extract_prompt += "Do not include any explanatory text, just the JSON object. "
    extract_prompt += "For example: {\"items\": [\"item1\", \"item2\", \"item3\"]}"

    payload = {
        "messages": [{"role": "user", "content": extract_prompt}],
        "temperature": temperature,
        "model": model,
        "max_tokens": max_tokens,
        "response_format": {"type": "json_object"}
    }

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "Accept": "application/json"
    }

    try:
        response = requests.post(api_url, headers=headers, json=payload)
        response.raise_for_status()
        response_data = response.json()
        content = response_data["choices"][0]["message"]["content"]
        try:
            extracted_data = json.loads(content)
            if isinstance(extracted_data, dict) and "items" in extracted_data:
                extracted_data = extracted_data["items"]
            elif isinstance(extracted_data, dict):
                if "extracted" in extracted_data:
                    extracted_data = extracted_data["extracted"]
                elif "results" in extracted_data:
                    extracted_data = extracted_data["results"]
            if isinstance(extracted_data, list):
                return [[item] for item in extracted_data]
            else:
                return "Error: Unable to parse response. Expected a list."
        except (json.JSONDecodeError, ValueError):
            return "Error: Unable to extract data. The AI response wasn't in the expected format."
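    except requests.exceptions.RequestException as e:
        return f"Error: API request failed. {str(e)}"
    except (KeyError, IndexError, TypeError):
        # The response was valid JSON but lacked choices[0].message.content
        return "Error: Unexpected API response format."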

In [None]:
%pip install -q ipytest
import ipytest
ipytest.autoconfig()
import sys
from pathlib import Path
sys.path.insert(0, str(Path().resolve().parent.parent / "test"))
from test_utils import get_graph_token

def inject_id_token():
    # Acquire token using shared utility
    token = get_graph_token()
    globals()["idToken"] = token

inject_id_token()

def test_extract_client_names():
    text = "During today's annual review, we discussed progress with Acme Corporation, Global Enterprises, and TechSolutions Inc. All three clients reported satisfaction with our services."
    extract_type = "client names"
    result = ai_extract(text, extract_type)
    assert isinstance(result, list)
    assert any("Acme" in str(item[0]) or "Global" in str(item[0]) or "TechSolutions" in str(item[0]) for item in result)

def test_extract_financial_metrics():
    text = "Q1 results exceeded expectations with revenue of $2.4M, an EBITDA margin of 18.5%, and customer acquisition costs decreasing by 12%. Cash reserves stand at $5.2M and our runway extends to 24 months."
    extract_type = "financial metrics"
    result = ai_extract(text, extract_type)
    assert isinstance(result, list)
    assert any("$2.4M" in str(item[0]) or "18.5%" in str(item[0]) or "12%" in str(item[0]) or "$5.2M" in str(item[0]) or "24 months" in str(item[0]) for item in result)

def test_extract_action_items():
    text = "Hi team, Following our strategic planning session: 1) Mark needs to finalize the budget by Friday, 2) Sarah will contact vendors for new quotes, 3) Development team must provide timeline estimates by next Wednesday, and 4) Everyone should review the new marketing materials."
    extract_type = "action items"
    result = ai_extract(text, extract_type)
    assert isinstance(result, list)
    assert any("Mark" in str(item[0]) or "Sarah" in str(item[0]) or "Development team" in str(item[0]) or "Everyone" in str(item[0]) for item in result)

def test_extract_contact_information():
    text = "John Smith, Senior Project Manager, Innovative Solutions Inc., jsmith@innovativesolutions.com, +1 (555) 123-4567, 123 Business Avenue, Suite 400, San Francisco, CA 94107"
    extract_type = "contact information"
    result = ai_extract(text, extract_type)
    assert isinstance(result, list)
    assert any("John Smith" in str(item[0]) or "Senior Project Manager" in str(item[0]) or "Innovative Solutions" in str(item[0]) or "jsmith@innovativesolutions.com" in str(item[0]) or "555" in str(item[0]) or "Business Avenue" in str(item[0]) for item in result)

def test_extract_dates_deadlines():
    text = "The initial design phase will be completed by May 15, 2025. The stakeholder review is scheduled for May 20-22, with development starting June 1. Testing will run through September 15, with final delivery expected by October 3, 2025."
    extract_type = "dates and deadlines"
    result = ai_extract(text, extract_type)
    assert isinstance(result, list)
    assert any("May" in str(item[0]) or "June" in str(item[0]) or "September" in str(item[0]) or "October" in str(item[0]) for item in result)

def test_empty_input():
    text = []
    extract_type = "dates"
    result = ai_extract(text, extract_type)
    assert isinstance(result, str)
    assert "Error: Empty input text." in result

def test_all_parameters():
    text = "The quarterly board meeting is scheduled for March 18, 2025 at 2:00 PM in Conference Room A. The agenda includes Q1 financial review, marketing strategy update, and new product launch timeline."
    extract_type = "meeting details"
    result = ai_extract(text, extract_type, temperature=0.2, max_tokens=500, model="mistral-small-latest")
    assert isinstance(result, list)
    assert len(result) > 0

ipytest.run()

In [None]:
# Gradio Demo
import gradio as gr

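def run_ai_extract(text, extract_type, temperature, model, max_tokens):
    # gr.Number may deliver a float, but ai_extract requires an int for max_tokens
    result = ai_extract(text, extract_type, temperature=temperature, model=model, max_tokens=int(max_tokens))
    # Wrap error strings in a 2D list so gr.Dataframe can display them
    return result if isinstance(result, list) else [[result]]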

examples = [
    [
        "During today's annual review, we discussed progress with Acme Corporation, Global Enterprises, and TechSolutions Inc. All three clients reported satisfaction with our services.",
        "client names",
        0.0,
        "mistral-small-latest",
        1000
    ],
    [
        "Q1 results exceeded expectations with revenue of $2.4M, an EBITDA margin of 18.5%, and customer acquisition costs decreasing by 12%. Cash reserves stand at $5.2M and our runway extends to 24 months.",
        "financial metrics",
        0.0,
        "mistral-small-latest",
        1000
    ],
    [
        "Hi team, Following our strategic planning session: 1) Mark needs to finalize the budget by Friday, 2) Sarah will contact vendors for new quotes, 3) Development team must provide timeline estimates by next Wednesday, and 4) Everyone should review the new marketing materials.",
        "action items",
        0.0,
        "mistral-small-latest",
        1000
    ],
    [
        "John Smith, Senior Project Manager, Innovative Solutions Inc., jsmith@innovativesolutions.com, +1 (555) 123-4567, 123 Business Avenue, Suite 400, San Francisco, CA 94107",
        "contact information",
        0.0,
        "mistral-small-latest",
        1000
    ],
    [
        "The initial design phase will be completed by May 15, 2025. The stakeholder review is scheduled for May 20-22, with development starting June 1. Testing will run through September 15, with final delivery expected by October 3, 2025.",
        "dates and deadlines",
        0.0,
        "mistral-small-latest",
        1000
    ]
]

demo = gr.Interface(
    fn=run_ai_extract,
    inputs=[
        gr.Textbox(label="Text", lines=4, value="During today's annual review, we discussed progress with Acme Corporation, Global Enterprises, and TechSolutions Inc. All three clients reported satisfaction with our services."),
        gr.Textbox(label="Extract Type", lines=1, value="client names"),
        gr.Slider(0.0, 2.0, value=0.0, step=0.01, label="Temperature"),
        gr.Textbox(value="mistral-small-latest", label="Model"),
        gr.Number(value=1000, label="Max Tokens")
    ],
    outputs=gr.Dataframe(label="Extracted Items"),
    examples=examples,
    description="Extract structured information from unstructured text using AI. Enter the text, specify what to extract, and adjust parameters as needed.",
    flagging_mode="never",
)
demo.launch()