---
sidebar_label: PolarisAIDataInsight
---

# PolarisAIDataInsightLoader

This notebook provides a quick overview for getting started with the PolarisAIDataInsight [document loader](###TODO). For detailed documentation of all PolarisAIDataInsightLoader features and configuration options, see the [API reference](###TODO).

## Overview
### Integration details

| Class | Package | Local | Serializable | [JS support](###TODO)|
| :--- | :--- | :---: | :---: |  :---: |
| [PolarisAIDataInsightLoader](###TODO) | [langchain-polaris-ai-datainsight](https://pypi.org/project/langchain-polaris-ai-datainsight/) | ❌ | ❌ | ✅ |

### Loader features

| Source | Document Lazy Loading | Native Async Support |
| :---: | :---: | :---: |
| PolarisAIDataInsightLoader | ✅ | ❌ |

## Setup

To use the PolarisAIDataInsight document loader, you need to install the `langchain-polaris-ai-datainsight` integration package, create a **Polaris AI DataInsight** account, and get an API key.

### Credentials

Go to the [API keys page](https://datainsight.polarisoffice.com/api/keys) to sign up for PolarisAIDataInsight and generate an API key. Once you've done this, set the `POLARIS_AI_DATA_INSIGHT_API_KEY` environment variable:

In [None]:
import getpass
import os

os.environ["POLARIS_AI_DATA_INSIGHT_API_KEY"] = getpass.getpass("Enter your PolarisAIDataInsight API key: ")
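
When a notebook is re-run often, it can be convenient to prompt only if the key is not already set. The following is a minimal sketch using only the standard library. The helper name `ensure_api_key` and its `prompt` parameter are illustrative assumptions and are not part of the integration package.

```python
import getpass
import os
from typing import Callable


def ensure_api_key(
    env_var: str = "POLARIS_AI_DATA_INSIGHT_API_KEY",
    prompt: Callable[[str], str] = getpass.getpass,
) -> str:
    """Return the key stored in env_var, prompting only if it is missing.

    Hypothetical helper for illustration; not part of the integration package.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Nothing usable in the environment, so ask the user once.
        key = prompt(f"Enter value for {env_var}: ").strip()
        if not key:
            raise ValueError(f"{env_var} must not be empty")
        os.environ[env_var] = key
    return key
```

Calling `ensure_api_key()` before creating the loader leaves an existing key untouched and prompts only when the variable is unset or blank.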

### Installation

Install **langchain-polaris-ai-datainsight**.

In [None]:
%pip install -qU langchain-polaris-ai-datainsight

## Initialization

Now we can instantiate the loader object, which we will use to load documents:

In [None]:
from langchain_polaris_ai_datainsight import PolarisAIDataInsightLoader

loader = PolarisAIDataInsightLoader(
    file_path="/path/to/file",
    mode="page"     # "element", "page", or "single". (default is "single") 
)

## Load

In [None]:
docs = loader.load()
docs[0]

In [None]:
print(docs[0].metadata)

## Lazy Load

In [None]:
page = []
for doc in loader.lazy_load():
    page.append(doc)

print(page[0].metadata)
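
Lazy loading is most useful when documents are processed as they arrive instead of being collected into one list. The sketch below groups items from any iterator into small batches. The helper `in_batches` and the stand-in generator `fake_lazy_load` are hypothetical names used so the example runs without an API key; with the real loader you would pass `loader.lazy_load()` instead.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of up to batch_size items, consuming the iterable lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        # islice pulls at most batch_size items and stops early when exhausted.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def fake_lazy_load() -> Iterator[str]:
    """Stand-in for loader.lazy_load(); yields five placeholder pages."""
    for number in range(1, 6):
        yield f"page {number}"


batches = list(in_batches(fake_lazy_load(), batch_size=2))
print(batches)
```

Only one batch is held in memory at a time while iterating, which is the main reason to prefer `lazy_load()` over `load()` for large files.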

## Use with Vector Store

This example needs the OpenAI integration for embeddings, plus a test query and a test file. Install the integration first:

In [None]:
%pip install -qU langchain[openai]

The process is as follows:

1. Set an environment variable with your OpenAI API key (or the key for any other embedding model you plan to use).
2. Load the document and extract its contents with the loader.
3. Split the extracted text and store the chunks in a vector database.  
4. Retrieve relevant chunks from the vector store and pass them to the LLM to generate a response.

In [None]:
test_query = "What is the meaning of MIT License?"
test_file = "./example/example.docx"

In [None]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

In [None]:
from langchain_polaris_ai_datainsight import PolarisAIDataInsightLoader

loader = PolarisAIDataInsightLoader(
    file_path=test_file,  # defined above as "./example/example.docx"
)

docs = loader.load()
docs[0]

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
texts = text_splitter.split_documents(docs)

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

vector_store = InMemoryVectorStore.from_documents(
    texts,
    embeddings,
)
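
To build intuition for `chunk_size` and `chunk_overlap`, here is a simplified sliding-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally: that class prefers to break on separators such as paragraph and line breaks, so its chunks will differ. The function name `split_with_overlap` is a hypothetical helper for illustration only.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


demo_chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(demo_chunks)
```

Each chunk repeats the last two characters of the previous one. With the values used in this notebook, consecutive windows would share 200 characters in the same way, which helps keep a sentence that straddles a boundary retrievable from at least one chunk.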

In [None]:
retrieved_docs = vector_store.similarity_search(test_query)
retrieved_docs[0]
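
Conceptually, a similarity search embeds the query and ranks stored chunks by how close their vectors are to it. The sketch below does this with cosine similarity over hand-written two-dimensional vectors. The vectors, texts, and the helper names `cosine_similarity` and `top_k` are made up for illustration; real embeddings have far more dimensions and the vector store handles this ranking for you.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vector: Sequence[float],
    indexed: Sequence[Tuple[str, Sequence[float]]],
    k: int = 4,
) -> List[Tuple[str, float]]:
    """Return the k (text, score) pairs with the highest cosine similarity."""
    scored = [(text, cosine_similarity(query_vector, vector)) for text, vector in indexed]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy index: made-up vectors standing in for real embeddings.
toy_index = [
    ("license terms", [1.0, 0.0]),
    ("installation steps", [0.0, 1.0]),
    ("license warranty", [0.8, 0.6]),
]
ranked = top_k([1.0, 0.0], toy_index, k=2)
print([text for text, _ in ranked])
```

The chunk whose vector points in nearly the same direction as the query ranks first, and the unrelated chunk is dropped once `k` results have been selected.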

To collect resource information (for example, image paths) from the retrieved documents, use the following method:

In [None]:
from langchain_polaris_ai_datainsight import PolarisAIDataInsightLoader

resources_metadata = PolarisAIDataInsightLoader.get_resources_from_documents(retrieved_docs)
resources_metadata

In [None]:
from langchain_core.prompts import ChatPromptTemplate

# Example Prompt template 
prompt = ChatPromptTemplate.from_messages([
"""
{query}

Please compose your answer based on the information provided below.
For any images, charts, or tables cited in References, consult the details found in Resources.

** References **
{context}


** Resources **
{resources_metadata}
"""]
)
answer = prompt.invoke({
    "query": test_query,  # key must match the {query} placeholder in the template
    "context": "\n\n-------------\n\n".join([doc.page_content for doc in retrieved_docs]),
    "resources_metadata": resources_metadata
})
answer.messages[0].content
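
A prompt template fails at invocation time when a placeholder has no matching input key, so it can help to compare the two before calling `invoke`. The following is a small standard-library sketch, assuming the template uses plain `{name}` placeholders. The helper `missing_variables` is a hypothetical name, not a LangChain API.

```python
from string import Formatter
from typing import Dict, Set


def missing_variables(template: str, values: Dict[str, object]) -> Set[str]:
    """Return placeholder names in template that have no entry in values."""
    required = {
        field_name
        for _, field_name, _, _ in Formatter().parse(template)
        if field_name  # skip literal text segments, which have no field name
    }
    return required - set(values)


demo_template = "{query}\n\n** References **\n{context}"
print(missing_variables(demo_template, {"question": "q", "context": "c"}))
print(missing_variables(demo_template, {"query": "q", "context": "c"}))
```

The first call reports `query` as missing because the input uses the key `question`. The second call returns an empty set, which means every placeholder has a value.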