<a href="https://colab.research.google.com/github/alexfazio/firecrawl-cookbook/blob/main/openai_o1_firecrawl_integration.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Integrating OpenAI's o1 Reasoning Models with Firecrawl: A Step-by-Step Guide

By Alex Fazio (https://twitter.com/alxfazio)

GitHub repo: https://github.com/alexfazio/firecrawl-cookbook

OpenAI has released its o1 series of models, which target complex, multi-step reasoning. These models are designed to "think before they answer," producing a long internal chain of thought before responding. In this guide, we'll integrate them into an application, using the o1-preview model to steer a Firecrawl crawl of a website.

**This Jupyter notebook** demonstrates how to combine OpenAI's o1 reasoning models with Firecrawl to crawl a website and extract specific information from it.

By the end of this notebook, you'll be able to:

- Set up the Firecrawl and OpenAI environments
- Use the o1-preview model to pick a search term that focuses the crawl
- Map a website with Firecrawl to get a list of URLs relevant to a given objective
- Scrape the content of the top pages as Markdown
- Evaluate the extracted content using the o1 reasoning model to check if it meets the specified objective

This guide is designed for developers and data scientists who want to leverage advanced AI reasoning capabilities and web crawling technology to efficiently gather and analyze information from the web.

## Requirements

Before proceeding, ensure you have the following:

- Firecrawl API key: Essential for accessing the Firecrawl service
- OpenAI API key: Required for using the o1 reasoning models

## Introduction to o1 Models

The o1 models are large language models trained with reinforcement learning to excel in complex reasoning tasks. There are two models available:

- **o1-preview**: An early preview designed for reasoning about hard problems using broad general knowledge.
- **o1-mini**: A faster, cost-effective version ideal for coding, math, and science tasks that don't require extensive general knowledge.

While these models offer significant advancements, they are not intended to replace GPT-4o in all use cases. If your application requires image inputs, function calling, or consistent fast response times, GPT-4o and GPT-4o mini remain the optimal choices.
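
The guidance above can be captured as a rule of thumb in code. This is a minimal sketch: `TaskNeeds` and `choose_model` are hypothetical names, the decision order is our reading of the paragraph rather than an official OpenAI rule, and the strings are the model IDs this notebook refers to.

```python
from dataclasses import dataclass


@dataclass
class TaskNeeds:
    """Hypothetical description of what a task requires from the model."""
    needs_images: bool = False
    needs_function_calling: bool = False
    needs_fast_responses: bool = False
    needs_broad_knowledge: bool = True


def choose_model(task: TaskNeeds) -> str:
    """Encode the model-selection guidance as a simple rule of thumb."""
    # Image inputs, function calling, or latency-sensitive work -> GPT-4o family
    if task.needs_images or task.needs_function_calling or task.needs_fast_responses:
        return "gpt-4o"
    # Hard problems that lean on broad general knowledge -> o1-preview
    if task.needs_broad_knowledge:
        return "o1-preview"
    # Coding, math, or science without much world knowledge -> o1-mini
    return "o1-mini"


print(choose_model(TaskNeeds()))                             # o1-preview
print(choose_model(TaskNeeds(needs_broad_knowledge=False)))  # o1-mini
print(choose_model(TaskNeeds(needs_function_calling=True)))  # gpt-4o
```

The checks run in order, so a hard requirement such as function calling overrides the preference for a reasoning model.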

## Prerequisites

First, let's install the required libraries:

In [1]:
%pip install -q firecrawl-py openai python-dotenv


## Step 1: Import Necessary Libraries

In [None]:
import os
from firecrawl import FirecrawlApp
import json
from dotenv import load_dotenv
from openai import OpenAI

## Step 2: Load Environment Variables

Load the API keys from a `.env` file if one is present. On Google Colab, where there is usually no `.env` file, the cell prompts for any missing key instead, so the keys are never hard-coded in the notebook.

In [None]:
from getpass import getpass

# Load keys from a local .env file if one exists (does nothing otherwise)
load_dotenv()

# In Colab, or anywhere without a .env file, prompt for missing keys
# instead of hard-coding them in the notebook
for key in ("FIRECRAWL_API_KEY", "OPENAI_API_KEY"):
    if not os.getenv(key):
        os.environ[key] = getpass(f"Enter {key}: ")

# Retrieve API keys from environment variables
firecrawl_api_key = os.getenv("FIRECRAWL_API_KEY")
openai_api_key = os.getenv("OPENAI_API_KEY")
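
A missing or placeholder key only surfaces later, as an authentication error on the first API call. A small check can fail early instead. This is a sketch: `require_env` is a hypothetical helper, and treating any value that starts with `your_` as a placeholder is an assumption about how placeholder values are named.

```python
import os

PLACEHOLDER_PREFIX = "your_"  # assumption: placeholders look like 'your_..._here'


def require_env(name: str) -> str:
    """Return an environment variable, failing fast if it is unset or a placeholder."""
    value = os.getenv(name, "").strip()
    if not value or value.startswith(PLACEHOLDER_PREFIX):
        raise RuntimeError(f"{name} is missing or still a placeholder value")
    return value


# Demonstration with a throwaway variable holding a placeholder
os.environ["DEMO_API_KEY"] = "your_demo_key_here"
try:
    require_env("DEMO_API_KEY")
except RuntimeError as err:
    print(err)
```

With the helper defined, `openai_api_key = require_env("OPENAI_API_KEY")` would replace the plain `os.getenv` call.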

## Step 3: Initialize the FirecrawlApp and OpenAI Client

In [None]:
# Initialize the FirecrawlApp and OpenAI client
app = FirecrawlApp(api_key=firecrawl_api_key)
client = OpenAI(api_key=openai_api_key)
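
o1 requests can take noticeably longer than GPT-4o requests, and any network call to Firecrawl or OpenAI can fail transiently. A retry wrapper with exponential backoff is one way to absorb those failures. This is a minimal sketch: `call_with_retries` is a hypothetical helper, and it retries on every exception, whereas production code would usually retry only on rate-limit and timeout errors.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Call fn(), retrying with exponential backoff plus jitter on any exception."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            # base_delay, 2x, 4x, ... plus up to 10% random jitter
            delay = base_delay * (2 ** attempt)
            time.sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("attempts must be at least 1")
```

For example, `call_with_retries(lambda: app.scrape_url(link, params={'formats': ['markdown']}))` wraps a single scrape. On the OpenAI side, the `OpenAI` client constructor also accepts `max_retries` and `timeout` arguments, which may be enough on their own.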

## Step 4: Define the Objective and URL

In [None]:
url = "https://example.com"
objective = "Find the contact email for customer support"

## Step 5: Determine the Search Parameter Using o1-preview

In [None]:
map_prompt = f"""
The map function generates a list of URLs from a website and accepts a search parameter. Based on the objective: {objective}, suggest a 1-2 word search parameter to find the needed information. Only respond with 1-2 words.
"""

# OpenAI API call
completion = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {"role": "user", "content": map_prompt}
    ]
)

map_search_parameter = completion.choices[0].message.content.strip()
print(f"Search parameter: {map_search_parameter}")
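
The prompt asks for 1-2 words, but a reply may still arrive with quotes, trailing punctuation, or an extra word. A small normalization step keeps the `search` value predictable. `clean_search_parameter` is a hypothetical helper, and the two-word cap simply mirrors the prompt.

```python
import re


def clean_search_parameter(raw: str, max_words: int = 2) -> str:
    """Normalize a model reply into a short search term for map_url."""
    # Drop surrounding whitespace, quotes, and backticks the model may add
    text = raw.strip().strip("\"'`").strip()
    # Keep only word characters, whitespace, and hyphens (drops trailing periods, etc.)
    text = re.sub(r"[^\w\s-]", "", text)
    words = text.split()
    return " ".join(words[:max_words])


print(clean_search_parameter('"Contact support."'))  # Contact support
```

In the cell above, `map_search_parameter = clean_search_parameter(completion.choices[0].message.content)` would apply it.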

## Step 6: Map the Website Using the Search Parameter

In [None]:
map_website = app.map_url(url, params={"search": map_search_parameter})
print("Mapped URLs:", map_website)
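
The mapped list may contain near-duplicates (the same page with different `#fragment` anchors) or links to other hosts. Before scraping, you might deduplicate and keep only links on the target site. This is a sketch with a hypothetical `same_site_links` helper; it assumes you already have a plain list of URL strings and keeps the order in which `map_url` returned them.

```python
from urllib.parse import urldefrag, urlparse


def same_site_links(links, base_url):
    """Deduplicate links (ignoring #fragments) and keep only those on base_url's host."""
    base_host = urlparse(base_url).netloc.lower()
    seen = set()
    kept = []
    for link in links:
        clean, _fragment = urldefrag(link)
        if urlparse(clean).netloc.lower() != base_host or clean in seen:
            continue  # other host, or a page we already kept
        seen.add(clean)
        kept.append(clean)
    return kept


print(same_site_links(
    ["https://example.com/contact", "https://example.com/contact#email", "https://other.com/x"],
    "https://example.com",
))  # ['https://example.com/contact']
```

Note that `www.example.com` and `example.com` count as different hosts here; relax the host comparison if your site uses both.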

## Step 7: Scrape the Top Pages and Check for the Objective

In [None]:
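# map_url may return a plain list of URLs or a dict with a 'links' key,
# depending on the firecrawl-py version; normalize to a list
if isinstance(map_website, dict):
    links = map_website.get("links", [])
else:
    links = map_website or []

# Get top 3 links
top_links = links[:3]

extracted_info = None  # stays None if no page satisfies the objective

for link in top_links:
    # Scrape the page as Markdown
    scrape_result = app.scrape_url(link, params={'formats': ['markdown']})

    # Ask o1-preview whether the page satisfies the objective
    check_prompt = f"""
    Given the following scraped content and objective, determine if the objective is met with high confidence.
    If it is, extract the relevant information in a simple and concise JSON format.
    If the objective is not met with high confidence, respond with 'Objective not met'.

    Objective: {objective}
    Scraped content: {scrape_result.get('markdown', '')}
    """

    completion = client.chat.completions.create(
        model="o1-preview",
        messages=[
            {"role": "user", "content": check_prompt}
        ]
    )

    result = completion.choices[0].message.content.strip()

    # Tolerate variations such as "Objective not met." and move on to the next page
    if "objective not met" in result.lower():
        continue

    # The model may wrap its JSON in a ```json ... ``` fence; strip it before parsing
    if result.startswith("```"):
        result = result.strip("`").removeprefix("json").strip()

    try:
        extracted_info = json.loads(result)
        break
    except json.JSONDecodeError:
        continue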
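
Each check sends an entire page to o1-preview, and o1 models bill their hidden reasoning tokens as output tokens, so long pages raise cost and latency and can exceed the context window. A crude safeguard is to cap the Markdown before building `check_prompt`. This is a sketch: `truncate_markdown` is a hypothetical helper, the 20,000-character default is an arbitrary assumption, and characters are only a rough proxy for tokens.

```python
TRUNCATION_MARKER = "\n\n[... content truncated ...]"


def truncate_markdown(markdown: str, max_chars: int = 20_000) -> str:
    """Cap page content by character count before embedding it in a prompt.

    The returned string can exceed max_chars by the length of the marker.
    """
    if len(markdown) <= max_chars:
        return markdown
    # Prefer cutting at the last paragraph break inside the budget
    cut = markdown.rfind("\n\n", 0, max_chars)
    if cut <= 0:
        cut = max_chars  # no usable break: hard cut at the budget
    return markdown[:cut] + TRUNCATION_MARKER
```

Inside the loop, the prompt line would become `Scraped content: {truncate_markdown(scrape_result['markdown'])}`.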

## Step 8: Display the Extracted Information

In [None]:
if extracted_info:
    print("Extracted Information:")
    print(json.dumps(extracted_info, indent=2))
else:
    print("Objective not met with the available content.")

## Conclusion

In this notebook, we integrated OpenAI's o1 reasoning models with Firecrawl to extract specific information from a website: o1-preview chose a search term for mapping the site, Firecrawl scraped the top matching pages as Markdown, and o1-preview checked each page against the objective and returned the answer as JSON.

Beyond web extraction, the o1 models suit other reasoning-heavy work, such as advanced coding problems, mathematical computations, or intricate scientific queries.

Happy coding!