davidthewatson/recursive_summarizer

 
 

Recursive Summarizer

This is a basic implementation of a recursive summarizer that uses openai (GPT-3) and requests-html to crawl, parse, and summarize text of any length, chunk by chunk.
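To make "recursive" concrete, here is a minimal sketch of the idea, not the code in this repo. It splits the text into chunks, summarizes each chunk, joins the partial summaries, and repeats until the result fits in a single chunk. The names `split_into_chunks` and `recursive_summarize`, the `max_chars` budget, and the `max_depth` guard are all assumptions made for illustration. `summarize_chunk` stands in for whatever function actually calls the model.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int) -> List[str]:
    """Split text on whitespace into chunks of at most max_chars characters.

    A single word longer than max_chars is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    length = 0
    for word in text.split():
        # The extra 1 is the joining space, needed only if the chunk is non-empty.
        extra = len(word) + (1 if current else 0)
        if current and length + extra > max_chars:
            chunks.append(" ".join(current))
            current, length = [word], len(word)
        else:
            current.append(word)
            length += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


def recursive_summarize(
    text: str,
    summarize_chunk: Callable[[str], str],
    max_chars: int = 4000,
    max_depth: int = 10,
) -> str:
    """Summarize text of any length by summarizing chunks, then the summaries."""
    text = " ".join(text.split())  # normalize whitespace
    if len(text) <= max_chars:
        return summarize_chunk(text)
    if max_depth == 0:
        # Hypothetical safety valve: give up recursing and summarize a truncated text.
        return summarize_chunk(text[:max_chars])
    partials = [summarize_chunk(chunk) for chunk in split_into_chunks(text, max_chars)]
    combined = " ".join(partials)
    if len(combined) >= len(text):
        # The summaries did not shrink the text, so stop to avoid looping forever.
        return summarize_chunk(combined[:max_chars])
    return recursive_summarize(combined, summarize_chunk, max_chars, max_depth - 1)
```

Passing the summarizer in as a function keeps the chunking logic testable without an API key. In real use it would wrap a call to the model.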

The usual instructions apply: Linux, macOS, or Windows, plus Python 3. Then:

python -m venv .venv
source .venv/bin/activate  # use activate.fish for fish, or the script for whatever shell you're using today
pip install -r requirements.txt

The only thing missing is an OpenAI API key. Get one and place it in a .env file as follows:

export OPENAI_API_KEY="YOUR_KEY_GOES_HERE"
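Because the file uses shell `export` syntax, one option is simply to `source .env` before running. If you would rather read it from Python, the following is a minimal standard-library sketch. `load_env_file` is a hypothetical helper, not something this repo is known to ship, and it handles only simple `export KEY="value"` lines like the one above.

```python
import os
import shlex
from typing import Dict, MutableMapping


def load_env_file(
    path: str = ".env",
    environ: MutableMapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Read KEY=value or export KEY="value" lines and add them to environ.

    Variables already set in environ are left untouched.
    """
    loaded: Dict[str, str] = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            if line.startswith("export "):
                line = line[len("export "):].lstrip()
            key, sep, value = line.partition("=")
            if not sep:
                continue  # not an assignment
            parts = shlex.split(value)  # strips shell-style quotes
            loaded[key.strip()] = parts[0] if parts else ""
    for key, value in loaded.items():
        environ.setdefault(key, value)
    return loaded
```

After calling `load_env_file()`, the key can be read with `os.environ["OPENAI_API_KEY"]`.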

After that, just adjust the crawling frontier URLs in the source to your liking. Mine are tuned for a custom CMS that I wrote from scratch.

The crawler starts from an index of content and is designed to limit its crawl to the corpus of text cascading from that index. It's generally easy to edit the filter to limit the crawl by keyword or whatever.
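As a rough illustration of that kind of filter, here is a sketch that keeps only links living under the index's directory on the same host, optionally narrowed by keyword. The names `in_corpus` and `filter_links` and the example URLs are made up for this sketch. The actual filter in the source may work differently.

```python
from typing import Iterable, List, Sequence
from urllib.parse import urldefrag, urljoin, urlparse


def in_corpus(url: str, index_url: str, keywords: Sequence[str] = ()) -> bool:
    """Return True if url is on the index's host and under the index's directory."""
    target, root = urlparse(url), urlparse(index_url)
    if target.scheme not in ("http", "https"):
        return False  # skip mailto:, javascript:, and similar links
    if target.netloc.lower() != root.netloc.lower():
        return False  # stay on the same host
    base_path = root.path.rsplit("/", 1)[0] + "/"  # directory holding the index
    if not target.path.startswith(base_path):
        return False
    # With no keywords, everything under the index passes.
    return not keywords or any(k.lower() in url.lower() for k in keywords)


def filter_links(
    links: Iterable[str], index_url: str, keywords: Sequence[str] = ()
) -> List[str]:
    """Resolve links against the index, drop fragments and duplicates, then filter."""
    seen = set()
    kept: List[str] = []
    for href in links:
        absolute, _fragment = urldefrag(urljoin(index_url, href))
        if absolute not in seen and in_corpus(absolute, index_url, keywords):
            seen.add(absolute)
            kept.append(absolute)
    return kept
```

The links could come from any parser. The filter only needs an iterable of href strings, so it stays independent of how pages are fetched.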

This is not Scrapy; it's a quick weekend hack.
