18 changes: 16 additions & 2 deletions examples/manuals_llm_extraction/README.md
@@ -1,9 +1,16 @@
# Structured Data Extraction from PDF with Ollama and CocoIndex

![Structured data extraction with Ollama and CocoIndex](https://cocoindex.io/blogs/assets/images/cocoindex-ollama-structured-extraction-from-pdf-6ee15b1e0fe304063dc78f04153fb385.png)


In this example, we:

* Convert PDFs (generated from a few Python docs) into Markdown.
* Extract structured information from the Markdown using an LLM.
* Use a custom function to further extract information from the structured output.
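
To make the last two steps concrete, here is a minimal plain-Python sketch. It does not call an LLM or use the CocoIndex API: a hard-coded JSON string stands in for the model's answer. The schema (`ModuleInfo`, `ClassInfo`, `MethodInfo`), the `parse_module_info` helper, and the `summarize_module` function with its `ModuleSummary` result are illustrative assumptions chosen to line up with the `module_info` and `module_summary` columns queried later in this README. They are not the example's actual code.

```python
import json
from dataclasses import dataclass, field

# Hypothetical schema for what the LLM is asked to extract from each
# Markdown page. Only `title` is confirmed by the SQL query in this README.


@dataclass
class MethodInfo:
    name: str
    description: str


@dataclass
class ClassInfo:
    name: str
    description: str
    methods: list[MethodInfo] = field(default_factory=list)


@dataclass
class ModuleInfo:
    title: str
    description: str
    classes: list[ClassInfo] = field(default_factory=list)
    methods: list[MethodInfo] = field(default_factory=list)


@dataclass
class ModuleSummary:
    num_classes: int
    num_methods: int


def _parse_methods(items: list[dict]) -> list[MethodInfo]:
    """Build MethodInfo objects from a list of JSON method entries."""
    return [MethodInfo(name=m["name"], description=m.get("description", "")) for m in items]


def parse_module_info(llm_json: str) -> ModuleInfo:
    """Stand-in for step 2: turn the LLM's JSON answer into a typed ModuleInfo."""
    raw = json.loads(llm_json)
    classes = [
        ClassInfo(
            name=c["name"],
            description=c.get("description", ""),
            methods=_parse_methods(c.get("methods", [])),
        )
        for c in raw.get("classes", [])
    ]
    return ModuleInfo(
        title=raw["title"],
        description=raw.get("description", ""),
        classes=classes,
        methods=_parse_methods(raw.get("methods", [])),
    )


def summarize_module(module_info: ModuleInfo) -> ModuleSummary:
    """Step 3: a custom function that derives extra facts from the structured output."""
    return ModuleSummary(
        num_classes=len(module_info.classes),
        # Counts module-level functions only; methods inside classes are not included.
        num_methods=len(module_info.methods),
    )


# Hard-coded stand-in for an LLM response; no model is called here.
SAMPLE_LLM_OUTPUT = json.dumps({
    "title": "json",
    "description": "JSON encoder and decoder",
    "classes": [
        {
            "name": "JSONDecoder",
            "description": "Simple JSON decoder",
            "methods": [{"name": "decode", "description": "Return the Python object"}],
        }
    ],
    "methods": [
        {"name": "dumps", "description": "Serialize obj to a JSON formatted str"},
        {"name": "loads", "description": "Deserialize s to a Python object"},
    ],
})

if __name__ == "__main__":
    summary = summarize_module(parse_module_info(SAMPLE_LLM_OUTPUT))
    print(summary)  # ModuleSummary(num_classes=1, num_methods=2)
```

Keeping the derived summary in a separate function means the LLM only has to produce the raw structure, while cheap, deterministic facts such as counts are computed in ordinary code.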

Please give [CocoIndex on GitHub](https://github.com/cocoindex-io/cocoindex) a star to support us if you like our work. Thank you so much, with a warm coconut hug 🥥🤗. [![GitHub](https://img.shields.io/github/stars/cocoindex-io/cocoindex?color=5B5BD6)](https://github.com/cocoindex-io/cocoindex)

## Prerequisite

Before running the example, you need to:
@@ -47,14 +54,21 @@ And run the SQL query:
```sql
SELECT filename, module_info->'title' AS title, module_summary FROM modules_info;
```
You should see results like:

![Module Info Index](https://cocoindex.io/blogs/assets/images/module_info_index-ffaec6042ec3a18eaf94bed5b227a085.png)


## CocoInsight
CocoInsight is a tool to help you understand your data pipeline and data index. It is in Early Access now (free) 😊 You found us! Here is a quick 3-minute video tutorial about CocoInsight: [Watch on YouTube](https://youtu.be/ZnmyoHslBSc?si=pPLXWALztkA710r9).

Run CocoInsight to understand your data pipeline:

```sh
python main.py cocoindex server -c https://cocoindex.io
```

Then open the CocoInsight UI at [https://cocoindex.io/cocoinsight](https://cocoindex.io/cocoinsight). It connects to your local CocoIndex server with zero data retention.

You can view the pipeline flow and the data preview in the CocoInsight UI:

![CocoInsight UI](https://cocoindex.io/blogs/assets/images/cocoinsight-edd71690dcc35b6c5cf1cb31b51b6f6f.png)