# Notion Connection with LlamaIndex

This notebook demonstrates how to connect to Notion using LlamaIndex, retrieve documents, and query them using OpenAI's LLM.

## Prerequisites
- A Notion integration token (create one at https://www.notion.so/my-integrations)
- An OpenAI API key
- The Notion page(s) you want to read must be shared with your integration

## Setup and Installation

First, let's install the required packages. Run the cell below to install all dependencies needed for this notebook.

In [None]:
# Install required packages
!pip install llama-index llama-index-readers-notion llama-index-llms-openai python-dotenv openai

# Verify installations
import importlib

def check_package(package_name):
    try:
        importlib.import_module(package_name)
        return True
    except ImportError:
        return False

packages = {
    "llama_index": "llama-index core",
    "llama_index.readers.notion": "Notion reader",
    "llama_index.llms.openai": "OpenAI integration",
    "openai": "OpenAI API"
}

all_installed = True
for package, display_name in packages.items():
    installed = check_package(package)
    print(f"{display_name}: {'✅ Installed' if installed else '❌ Not installed'}")
    all_installed = all_installed and installed

if all_installed:
    print("\n✅ All required packages are installed!")
else:
    print("\n⚠️ Some packages are missing. Run the installation command again.")

## Environment Setup

Load environment variables from a `.env` file. <br>
Note: `load_dotenv()` looks for `.env` in the current directory first and then walks up through the parent directories until it finds one.
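
As a rough illustration of how a `.env` file can be located, the sketch below walks from a starting directory up through its parents and returns the first match. It uses only the standard library; `find_env_file` is a hypothetical helper written for this notebook, not part of python-dotenv, whose real implementation may differ in detail.

```python
from pathlib import Path
from typing import Optional


def find_env_file(start_dir: Path, filename: str = ".env") -> Optional[Path]:
    """Return the nearest `filename` in start_dir or one of its parents (hypothetical helper)."""
    start_dir = start_dir.resolve()
    # Check the starting directory first, then each parent up to the filesystem root
    for directory in [start_dir, *start_dir.parents]:
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None


# Show which .env file, if any, would be found from the current directory
print(find_env_file(Path.cwd()))
```

If this prints `None`, no `.env` file is visible from the notebook's working directory, which would explain missing-key warnings in the next cell.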

In [None]:
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Get API keys from environment variables
NOTION_INTEGRATION_TOKEN = os.getenv("NOTION_INTEGRATION_TOKEN")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# Set Notion page IDs (comma-separated string if there are several)
page_ids_str = "9917363395904835a604ca7a6a358579"  # replace with your Notion page ID(s)
# Convert the comma-separated string to a list, dropping stray whitespace and empty entries
NOTION_PAGE_IDS = [page_id.strip() for page_id in page_ids_str.split(",") if page_id.strip()]

# Set environment variables for compatibility with libraries that expect them
os.environ["NOTION_INTEGRATION_TOKEN"] = NOTION_INTEGRATION_TOKEN or ""
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY or ""

# Verify API keys are set
if not NOTION_INTEGRATION_TOKEN:
    print("⚠️ Warning: NOTION_INTEGRATION_TOKEN is not set in .env file")
if not OPENAI_API_KEY:
    print("⚠️ Warning: OPENAI_API_KEY is not set in .env file")
if NOTION_INTEGRATION_TOKEN and OPENAI_API_KEY:
    print("✅ API keys are set")
    print(f"✅ Using Notion page IDs: {NOTION_PAGE_IDS}")

## Import Required Libraries

Next, we'll import the supporting libraries and configure logging.

In [None]:
import logging
import sys
import openai

from IPython.display import Markdown, display

# Configure basic logging to stdout. basicConfig already attaches a stdout handler,
# so adding a second StreamHandler would print every log line twice.
logging.basicConfig(stream=sys.stdout, level=logging.INFO)

## Import LlamaIndex Components

We need to import the necessary components from LlamaIndex, including the NotionPageReader.

In [None]:
from llama_index.core import SummaryIndex, Settings
from llama_index.readers.notion import NotionPageReader
from llama_index.llms.openai import OpenAI

# Verify that the imports worked
print("✅ LlamaIndex components imported successfully")

## Verify API Keys

Before proceeding, let's verify that our API keys are properly set.
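
A small helper can report every missing variable in one pass instead of stopping at the first one. The sketch below assumes the keys have been exported to `os.environ`, as the Environment Setup cell does; `missing_env_vars` is a hypothetical name, not a LlamaIndex or python-dotenv function.

```python
import os
from typing import Iterable, List


def missing_env_vars(names: Iterable[str]) -> List[str]:
    """Return the names whose environment variable is unset or blank (hypothetical helper)."""
    return [name for name in names if not os.environ.get(name, "").strip()]


# Report all missing keys at once
missing = missing_env_vars(["NOTION_INTEGRATION_TOKEN", "OPENAI_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required environment variables are set")
```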

In [None]:
# Set OpenAI API key
openai.api_key = OPENAI_API_KEY

if not NOTION_INTEGRATION_TOKEN:
    raise ValueError("No Notion integration token found. Please set NOTION_INTEGRATION_TOKEN above.")
    
if not OPENAI_API_KEY:
    raise ValueError("No OpenAI API key found. Please set OPENAI_API_KEY above.")

print("✅ API keys verified")

## Notion Page Connection

Now we'll connect to Notion and retrieve documents from specified pages.

**Note**: You need to provide the ID of a Notion page that has been shared with your integration. The ID is the last part of the page URL, typically a 32-character string.
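
If you only have the page URL, the ID can be pulled out with a regular expression. The sketch below is one possible approach, assuming the ID appears in the URL path as 32 hexadecimal characters, with or without UUID-style dashes; `extract_page_id` is a hypothetical helper, not part of LlamaIndex or the Notion API.

```python
import re
from typing import Optional

# Matches a 32-character hex ID, with or without UUID-style dashes
_PAGE_ID_PATTERN = re.compile(
    r"[0-9a-f]{32}|[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def extract_page_id(url: str) -> Optional[str]:
    """Return the page ID found in the path of a Notion URL, or None (hypothetical helper)."""
    # Ignore the query string and fragment, which can contain other hex IDs
    path = url.split("?", 1)[0].split("#", 1)[0]
    matches = _PAGE_ID_PATTERN.findall(path)
    if not matches:
        return None
    # Take the last match and normalise it to the undashed, lowercase form
    return matches[-1].replace("-", "").lower()


print(extract_page_id("https://www.notion.so/My-Page-9917363395904835a604ca7a6a358579?pvs=4"))
# 9917363395904835a604ca7a6a358579
```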

In [None]:
# This cell can take 1-2 minutes to run

# The Notion page IDs were defined in the Environment Setup cell
print(f"Attempting to load {len(NOTION_PAGE_IDS)} pages from Notion...")

try:
    # Load documents from Notion using the integration token from .env
    documents = NotionPageReader(integration_token=NOTION_INTEGRATION_TOKEN).load_data(
        page_ids=NOTION_PAGE_IDS
    )
    print(f"✅ Successfully loaded {len(documents)} documents from Notion")
    
    # Display brief information about the documents
    for i, doc in enumerate(documents):
        print(f"Document {i+1} - Title: {doc.metadata.get('title', 'Untitled')}")
        print(f"  - Length: {len(doc.text)} characters")
except Exception as e:
    print(f"❌ Error loading Notion pages: {e}")
    print("Please check your integration token and ensure your integration has access to the pages.")

> Note: The Notion API can be slow, so use a small page for testing. A typical page can take up to 2 minutes to load.

## Configure LLM and Create Index

Next, we'll set up the LLM and create an index from our Notion documents.

In [None]:
# Configure the OpenAI model with the API key from the environment
# (the LlamaIndex OpenAI class takes the model name via `model`)
llm = OpenAI(model="gpt-4o-mini", api_key=OPENAI_API_KEY)
Settings.llm = llm

# Create an index from the documents
try:
    index = SummaryIndex.from_documents(documents)
    print("✅ Index created successfully")
except Exception as e:
    print(f"❌ Error creating index: {e}")

## Query the Index

Now we can query our indexed documents to extract information.

In [None]:
# Set logging to DEBUG for detailed outputs
logging.getLogger().setLevel(logging.DEBUG)

# Create a query engine from the index
query_engine = index.as_query_engine()

# Define your query
query_text = "What is BPMN?"  # Replace with your own query

print(f"Querying: '{query_text}'")
response = query_engine.query(query_text)

print("\n" + "-"*50 + "\n")
print(f"Answer: {response}")

## Try Another Query

Feel free to experiment with different queries on your Notion content.

In [None]:
# Try another query
query_text = "Summarize the main points from this document"  # Replace with your own query

print(f"Querying: '{query_text}'")
response = query_engine.query(query_text)

# Display the response in a nicer format
print("\n" + "-"*50)
display(Markdown(f"**Answer:**\n\n{response}"))

## Conclusion

In this notebook, we demonstrated how to:
1. Connect to Notion using the LlamaIndex NotionPageReader
2. Retrieve documents from specific Notion pages
3. Create an index from those documents
4. Query the index using OpenAI's language models

This approach allows you to use your Notion content as a knowledge base for AI-powered queries and summaries.

We also found that the Notion API is quite slow, which adds a noticeable delay before the LLM can answer. To speed things up, we can set up a local cache that stores the documents fetched from Notion, so they do not have to be downloaded again for every query. To see how, open the [Simple caching system](simple_caching_system.ipynb) notebook.
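
To give a feel for the idea, here is a minimal file-based cache using only the standard library. It is a sketch under stated assumptions, not the implementation from the linked notebook: `load_with_cache` and `fake_fetch` are hypothetical names, and the records are plain dictionaries rather than LlamaIndex `Document` objects.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def load_with_cache(cache_path: Path, fetch: Callable[[], List[Dict]]) -> List[Dict]:
    """Return cached records if the cache file exists; otherwise fetch and store them."""
    if cache_path.exists():
        return json.loads(cache_path.read_text(encoding="utf-8"))
    records = fetch()
    cache_path.write_text(json.dumps(records), encoding="utf-8")
    return records


def fake_fetch() -> List[Dict]:
    """Stand-in for the slow Notion call (hypothetical; returns made-up data)."""
    return [{"text": "example page text", "metadata": {"page_id": "example-id"}}]
```

In this notebook, `fetch` could wrap the `NotionPageReader(...).load_data(...)` call and convert each document to a dictionary such as `{"text": doc.text, "metadata": doc.metadata}`. Deleting the cache file forces a fresh download on the next run.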

## Troubleshooting Tips

If you encounter issues:

1. **Integration Access**: Make sure your Notion integration has been granted access to the pages you're trying to query. For example, if your workspace has multiple pages, you need to add the integration to each page you want to access. For details, see [Notion - Create your first integration](https://developers.notion.com/docs/create-a-notion-integration).
2. **API Keys**: Verify that your API keys are correct and have the necessary permissions.
3. **Page IDs**: Ensure you're using the correct Notion page IDs.
4. **Dependencies**: Make sure you have installed all required packages:
   ```
   pip install llama-index llama-index-readers-notion llama-index-llms-openai python-dotenv openai
   ```