1 change: 1 addition & 0 deletions README.md
@@ -95,6 +95,7 @@ Go to the [examples directory](examples) to try out with any of the examples, fo
| [Text Embedding](examples/text_embedding) | Index text documents with embeddings for semantic search |
| [Code Embedding](examples/code_embedding) | Index code embeddings for semantic search |
| [PDF Embedding](examples/pdf_embedding) | Parse PDF and index text embeddings for semantic search |
| [Manual Extraction](examples/manual_extraction) | Extract structured information from a manual using LLM |

More are coming, so stay tuned! If there are any specific examples you would like to see, please let us know in our [Discord community](https://discord.com/invite/zpA9S2DR7s) 🌱.

29 changes: 24 additions & 5 deletions examples/manual_extraction/README.md
@@ -1,10 +1,21 @@
Simple example for cocoindex: extract structured information from a Markdown file.
In this example, we

* Convert PDFs (generated from a few Python docs) into Markdown.
* Extract structured information from the Markdown using an LLM.
* Use a custom function to further extract information from the structured output.
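
The last step above (a custom function applied to the LLM's structured output) could look roughly like the sketch below. The names `ModuleInfo`, `ModuleSummary`, `summarize_module`, and their fields are hypothetical, chosen only for illustration; the actual types live in `manual_extraction.py` and may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleInfo:
    """Hypothetical shape of the structured output produced by the LLM."""
    title: str
    classes: list[str] = field(default_factory=list)
    methods: list[str] = field(default_factory=list)


@dataclass
class ModuleSummary:
    """Hypothetical derived record computed from a ModuleInfo."""
    num_classes: int
    num_methods: int


def summarize_module(info: ModuleInfo) -> ModuleSummary:
    """Derive simple counts from the structured output (illustrative only)."""
    return ModuleSummary(
        num_classes=len(info.classes),
        num_methods=len(info.methods),
    )
```

The point of the sketch is that the post-processing step is plain Python over typed data, so it needs no further LLM call.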

## Prerequisite
[Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one.

Before running the example, you need to:

* [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one.
* Install and configure an LLM API. This example uses Ollama, which runs LLM models locally. Get it ready by following [this guide](https://cocoindex.io/docs/ai/llm#ollama). Alternatively, follow the comments in the source code to switch to OpenAI, and [configure the OpenAI API key](https://cocoindex.io/docs/ai/llm#openai) before running the example.

## Run


### Build the index

Install dependencies:

```bash
@@ -23,10 +34,18 @@ Update index:
python manual_extraction.py cocoindex update
```

Run:
### Query the index

After the index is built, you have a table named `modules_info`. You can query it at any time, e.g. by starting a Postgres shell:

```bash
python manual_extraction.py
psql postgres://cocoindex:cocoindex@localhost/cocoindex
```

And run the SQL query:

```sql
SELECT filename, module_info->'title' AS title, module_summary FROM modules_info;
```
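
The same query can also be run from Python. The following is a minimal sketch, not part of the example itself: it assumes the `psycopg2` driver is installed (the example does not mention it) and reuses the connection URL from the `psql` command above. `fetch_module_summaries` and `main` are hypothetical helper names.

```python
# SQL query taken from the README text above.
QUERY = (
    "SELECT filename, module_info->'title' AS title, module_summary "
    "FROM modules_info;"
)


def fetch_module_summaries(conn):
    """Run QUERY on a DB-API 2.0 connection and return each row as a dict."""
    cur = conn.cursor()
    try:
        cur.execute(QUERY)
        # The first element of each description entry is the column name.
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()


def main():
    # Assumption: psycopg2 is installed; it is not a stated dependency
    # of this example.
    import psycopg2

    conn = psycopg2.connect("postgres://cocoindex:cocoindex@localhost/cocoindex")
    try:
        for row in fetch_module_summaries(conn):
            print(row)
    finally:
        conn.close()
```

Calling `main()` prints one dict per indexed file. The helper takes the connection as a parameter so that a different DB-API driver can be swapped in.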

## CocoInsight
@@ -35,5 +54,5 @@ CocoInsight is in Early Access now (Free) 😊 You found us! A quick 3 minute vi
Run CocoInsight to understand your RAG data pipeline:

```
python manual_extraction.py cocoindex server -c https://cocoindex.io/cocoinsight
python manual_extraction.py cocoindex server -c https://cocoindex.io
```
2 changes: 1 addition & 1 deletion examples/manual_extraction/manual_extraction.py
@@ -112,7 +112,7 @@ def manual_extraction_flow(flow_builder: cocoindex.FlowBuilder, data_scope: coco

modules_index.export(
"modules",
cocoindex.storages.Postgres(),
cocoindex.storages.Postgres(table_name="modules_info"),
primary_key_fields=["filename"],
)
