diff --git a/README.md b/README.md
index dc5e1ce6..6d6b4064 100644
--- a/README.md
+++ b/README.md
@@ -95,6 +95,7 @@ Go to the [examples directory](examples) to try out with any of the examples, fo
 | [Text Embedding](examples/text_embedding) | Index text documents with embeddings for semantic search |
 | [Code Embedding](examples/code_embedding) | Index code embeddings for semantic search |
 | [PDF Embedding](examples/pdf_embedding) | Parse PDF and index text embeddings for semantic search |
+| [Manual Extraction](examples/manual_extraction) | Extract structured information from a manual using LLM |
 
 More coming and stay tuned! If there's any specific examples you would like to see, please let us know in our [Discord community](https://discord.com/invite/zpA9S2DR7s) 🌱.
 
diff --git a/examples/manual_extraction/README.md b/examples/manual_extraction/README.md
index ee7370ec..81e61a49 100644
--- a/examples/manual_extraction/README.md
+++ b/examples/manual_extraction/README.md
@@ -1,10 +1,21 @@
-Simple example for cocoindex: extract structured information from a Markdown file.
+In this example, we
+
+* Converts PDFs (generated from a few Python docs) into Markdown.
+* Extract structured information from the Markdown using LLM.
+* Use a custom function to further extract information from the structured output.
 
 ## Prerequisite
-[Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one.
+
+Before running the example, you need to:
+
+* [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one.
+* Install / configure LLM API. In this example we use Ollama, which runs LLM model locally. You need to get it ready following [this guide](https://cocoindex.io/docs/ai/llm#ollama). Alternatively, you can also follow the comments in source code to switch to OpenAI, and [configure OpenAI API key](https://cocoindex.io/docs/ai/llm#openai) before running the example.
 
 ## Run
+
+### Build the index
+
 Install dependencies:
 
 ```bash
@@ -23,10 +34,18 @@ Update index:
 python manual_extraction.py cocoindex update
 ```
 
-Run:
+### Query the index
+
+After index is build, you have a table with name `modules_info`. You can query it any time, e.g. start a Postgres shell:
 
 ```bash
-python manual_extraction.py
+psql postgres://cocoindex:cocoindex@localhost/cocoindex
+```
+
+And run the SQL query:
+
+```sql
+SELECT filename, module_info->'title' AS title, module_summary FROM modules_info;
 ```
 
 ## CocoInsight
@@ -35,5 +54,5 @@ CocoInsight is in Early Access now (Free) 😊 You found us! A quick 3 minute vi
 Run CocoInsight to understand your RAG data pipeline:
 
 ```
-python manual_extraction.py cocoindex server -c https://cocoindex.io/cocoinsight
+python manual_extraction.py cocoindex server -c https://cocoindex.io
 ```
diff --git a/examples/manual_extraction/manual_extraction.py b/examples/manual_extraction/manual_extraction.py
index 016fcc05..1bcb4ff9 100644
--- a/examples/manual_extraction/manual_extraction.py
+++ b/examples/manual_extraction/manual_extraction.py
@@ -112,7 +112,7 @@ def manual_extraction_flow(flow_builder: cocoindex.FlowBuilder, data_scope: coco
 
     modules_index.export(
         "modules",
-        cocoindex.storages.Postgres(),
+        cocoindex.storages.Postgres(table_name="modules_info"),
         primary_key_fields=["filename"],
     )
 
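
Note on querying the exported table from Python: the README change documents a `psql` session, and the same query could plausibly be issued from a script. The following is a minimal sketch, not part of this change. It assumes the connection URL, table name, and column names shown in the README hunk above; the helper names (`connection_params`, `build_query`, `fetch_modules`) are hypothetical, and the connection object is assumed to be any DB-API 2.0 connection.

```python
from urllib.parse import urlsplit

# Both values are copied from the README changes in this diff.
DATABASE_URL = "postgres://cocoindex:cocoindex@localhost/cocoindex"
TABLE_NAME = "modules_info"


def connection_params(url: str) -> dict:
    """Split a postgres:// URL into keyword arguments for a DB-API driver."""
    parts = urlsplit(url)
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,  # 5432 is the Postgres default port
        "user": parts.username,
        "password": parts.password,
        "dbname": parts.path.lstrip("/"),
    }


def build_query(table_name: str = TABLE_NAME) -> str:
    """Return the README's SELECT statement for the given table."""
    # Table names cannot be bound as query parameters, so accept only
    # plain identifiers before interpolating the name into the SQL text.
    if not table_name.isidentifier():
        raise ValueError(f"unsafe table name: {table_name!r}")
    return (
        "SELECT filename, module_info->'title' AS title, module_summary "
        f"FROM {table_name};"
    )


def fetch_modules(conn, table_name: str = TABLE_NAME) -> list:
    """Run the query on an open DB-API 2.0 connection and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(build_query(table_name))
        return cur.fetchall()
    finally:
        cur.close()
```

With a driver such as psycopg2 installed (an assumption; the example's dependencies are not shown in this diff), usage would look like `fetch_modules(psycopg2.connect(**connection_params(DATABASE_URL)))`.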