HonzaTuron/pinecone

Pinecone Integration

Integrate Apify Actors with Pinecone to seamlessly transfer and store data as vectors.

⚠️ Important: This Actor is intended for use alongside other Actors. For instance, when using the Website Content Crawler, enable this integration to store data as vectors in Pinecone.

Explore how to utilize vector stores on the Apify platform by reading our blog post: Understanding Pinecone and Its Importance for Your LLMs.

Description

This integration takes the data produced by other Apify Actors and stores it in Pinecone as vectors. It talks to OpenAI and Pinecone through langchain and performs the following steps:

  1. Receive the source Actor's dataset_id (passed automatically when run as an integration).
  2. Fetch the dataset items using the Apify SDK.
  3. [Optional] Segment text data into chunks with langchain's RecursiveCharacterTextSplitter (parameters like chunk_size and chunk_overlap are customizable).
  4. Compute embeddings via OpenAI.
  5. Store the resulting vectors in Pinecone.
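
The chunking, embedding, and storing steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the Actor's actual code: `split_text` is a simplified fixed-window splitter standing in for langchain's RecursiveCharacterTextSplitter (which also tries to break on separators such as paragraphs and sentences), `embed` and `store` are hypothetical callables standing in for the OpenAI and Pinecone clients, and the `"text"` item key is an illustrative field name.

```python
from typing import Callable, Iterable

Vector = list[float]


def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. Simplified stand-in
    for a recursive splitter: it cuts at fixed offsets and ignores separators.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def process_items(
    items: Iterable[dict],
    chunk_size: int,
    chunk_overlap: int,
    embed: Callable[[list[str]], list[Vector]],          # hypothetical: OpenAI stand-in
    store: Callable[[list[tuple[str, Vector]]], None],   # hypothetical: Pinecone stand-in
) -> int:
    """Chunk each item's text, embed the chunks, store them; return chunk count."""
    stored = 0
    for item in items:
        chunks = split_text(item["text"], chunk_size, chunk_overlap)
        if not chunks:
            continue  # nothing to embed for empty text
        vectors = embed(chunks)
        store(list(zip(chunks, vectors)))
        stored += len(chunks)
    return stored
```

Passing `embed` and `store` in as callables keeps the sketch runnable without network access; in a real run they would wrap the OpenAI embeddings call and the Pinecone upsert.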

Before You Start

Ensure you have the following prerequisites for this integration:

  • An OpenAI account and API token. Sign up for a free account at OpenAI.
  • A Pinecone database with a valid API key (pinecone_token).

Inputs

Refer to the input schema for detailed information:

  • index_name: Name of the Pinecone index.
  • pinecone_token: Your Pinecone access token (API key).
  • openai_token: Your OpenAI API token.
  • fields: Array of fields to save. For example, to push the name and user.description fields, set this to ["name", "user.description"].
  • metadata_values: Object of metadata values to save. For example, to push url and createdAt values to Pinecone, set this to {"url": "https://www.apify.com", "createdAt": "2021-09-01"}.
  • metadata_fields: Object of metadata fields to save. For example, to push the url and createdAt fields, set this to {"url": "url", "createdAt": "createdAt"}. If a key is also present in metadata_values, it is replaced.
  • chunk_size: Maximum character length for each text chunk.
  • chunk_overlap: Overlap in characters between consecutive text chunks.
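
As an illustration, a complete input combining the options above might look like the following, written as a Python dict. All values are placeholders or assumptions: the index name is hypothetical, the tokens must be replaced with your own keys, and the chunk_size and chunk_overlap numbers are illustrative rather than the Actor's defaults.

```python
import json

actor_input = {
    "index_name": "my-index",                 # hypothetical index name
    "pinecone_token": "<PINECONE_API_KEY>",   # placeholder, use your own key
    "openai_token": "<OPENAI_API_KEY>",       # placeholder, use your own key
    "fields": ["name", "user.description"],   # item fields to save
    "metadata_values": {"createdAt": "2021-09-01"},  # literal metadata values
    "metadata_fields": {"url": "url"},        # metadata read from each item
    "chunk_size": 1000,                       # illustrative, in characters
    "chunk_overlap": 100,                     # illustrative, in characters
}

# Serialize to JSON, the form in which Actor input is usually supplied.
actor_input_json = json.dumps(actor_input, indent=2)
```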

The fields, metadata_values, and metadata_fields inputs support dot notation for nested data.
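
Dot notation can be resolved with a short lookup helper. The sketch below is an assumption about how such a lookup could work, not the Actor's own implementation: `get_nested` and `select_fields` are hypothetical helper names, missing keys resolve to `None`, and metadata_fields is treated as a mapping from metadata key to item path.

```python
from typing import Any


def get_nested(item: dict, path: str, default: Any = None) -> Any:
    """Follow a dot-separated path such as "user.description" through nested dicts."""
    current: Any = item
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default  # path does not exist in this item
        current = current[key]
    return current


def select_fields(item: dict, paths: list[str]) -> dict:
    """Return the requested paths and their values from one dataset item."""
    return {path: get_nested(item, path) for path in paths}


# Example dataset item (illustrative data).
item = {
    "name": "Apify",
    "url": "https://www.apify.com",
    "user": {"description": "Web scraping platform"},
}

selected = select_fields(item, ["name", "user.description"])

# Assumed reading of metadata_fields: metadata key -> path within the item.
metadata = {key: get_nested(item, path) for key, path in {"url": "url"}.items()}
```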

Outputs

This integration saves selected fields from your Actor's output into your Pinecone database.

Community and Support
