Content Engine

A script that:

Reads a list of sources from a file, one per line, in the form "type url", where type is rss or html
Fetches and parses the latest N items: RSS/Atom feeds with feedparser, HTML blog listing pages with BeautifulSoup and readability
Calls an LLM to generate 10 post ideas from the parsed content (structured: source links, source insight, post idea, description, format, how to use)
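
The "latest N items" step across several sources can be sketched as a merge and sort by publication date. This is a minimal sketch, not the project's code: the FeedEntry fields and the latest_entries() name are assumptions for illustration, and the real definitions in models.py and fetcher.py may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FeedEntry:
    # Hypothetical fields; the real FeedEntry in models.py may differ.
    title: str
    link: str
    published: Optional[datetime] = None


def latest_entries(groups: list[list[FeedEntry]], top_n: int = 10) -> list[FeedEntry]:
    """Merge per-source entry lists and keep the top_n most recent entries.

    Assumes every non-None `published` value is timezone-aware.
    Entries without a date sort last.
    """
    merged = [entry for group in groups for entry in group]
    oldest = datetime.min.replace(tzinfo=timezone.utc)
    merged.sort(key=lambda e: e.published or oldest, reverse=True)
    return merged[:top_n]
```

Sorting undated entries last is a design choice made for this sketch; dropping them would be an equally reasonable alternative.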

Uses OpenAI for LLM calls and LangFuse for tracing (via langfuse.openai).

Setup

# Use uv (recommended)
uv sync

# Or pip
pip install -e .

Copy .env.example to .env and set:

OPENAI_API_KEY — required
LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY — optional; if set, traces are sent to LangFuse

Usage

Put sources in urls.txt, one per line, in the form "type url" (separated by a tab or spaces).
- rss — RSS/Atom feed URL (e.g. rss https://hnrss.org/frontpage).
- html — blog listing page URL; the script will find article links on the page and fetch each article (e.g. html https://www.forrester.com/blogs/).
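
The line format above can be parsed roughly as follows. This is a sketch under assumptions: parse_sources() and the Source fields are illustrative names (the real loader is load_sources() in sources.py), and skipping blank lines and "#" comments is a guess rather than documented behaviour.

```python
from dataclasses import dataclass

SOURCE_TYPES = ("rss", "html")


@dataclass(frozen=True)
class Source:
    # Hypothetical shape; the real Source dataclass lives in models.py.
    type: str
    url: str


def parse_sources(text: str) -> list[Source]:
    """Parse 'type url' lines separated by a tab or spaces."""
    sources = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and comments are ignored
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2 or parts[0].lower() not in SOURCE_TYPES:
            raise ValueError(f"line {lineno}: expected '<rss|html> <url>', got {raw!r}")
        sources.append(Source(parts[0].lower(), parts[1].strip()))
    return sources
```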

Run:

python run.py

Options (env vars):

URLS_FILE — path to URLs file (default: urls.txt)
TOP_N — max number of latest entries to use (default: 10)
OPENAI_MODEL — model name (default: gpt-4o)
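
Reading these options with their documented defaults could look like the sketch below. The Settings class and load_settings() function are hypothetical names for illustration; only the variable names and defaults come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Settings:
    urls_file: str
    top_n: int
    openai_model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read options from the environment, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        urls_file=env.get("URLS_FILE", "urls.txt"),
        top_n=int(env.get("TOP_N", "10")),  # raises ValueError if not an integer
        openai_model=env.get("OPENAI_MODEL", "gpt-4o"),
    )
```

On a POSIX shell, a single run can override an option inline, for example: TOP_N=5 python run.py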

Output: 10 post ideas (with source links, insights, format) printed to stdout; LLM calls are logged to LangFuse when configured.

Adding new sources

When adding new URLs (especially html sources), follow docs/ADDING_SOURCES.md so that:

You verify how the site is parsed and which links are collected.
You add exclusions in parsers/html.py for category/section/landing URLs if the parser picks them up by mistake.
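
As an illustration of what such an exclusion might look like, the sketch below filters candidate links by path prefix. It does not reflect the actual contents of parsers/html.py: EXCLUDED_PATH_PREFIXES, its values, and is_article_link() are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical exclusion list; the real one (if any) lives in parsers/html.py.
EXCLUDED_PATH_PREFIXES = ("/category/", "/tag/", "/author/")


def is_article_link(url: str, listing_url: str) -> bool:
    """Return True if url looks like an article rather than a section page."""
    link = urlparse(url)
    listing = urlparse(listing_url)
    if link.netloc != listing.netloc:
        return False  # skip links that leave the site
    path = link.path.rstrip("/") + "/"
    if path == listing.path.rstrip("/") + "/":
        return False  # skip the listing page itself
    return not any(path.startswith(prefix) for prefix in EXCLUDED_PATH_PREFIXES)
```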

Project structure

content-engine/
├── run.py              # Entry point (loads .env, calls main)
├── main.py             # Pipeline: load sources → fetch → LLM → output
├── config.py           # Constants (DEFAULT_*, USER_AGENT, SOURCE_TYPES)
├── models.py           # FeedEntry, Source dataclasses
├── sources.py          # load_sources() from urls file
├── fetcher.py          # fetch_entries() — RSS + HTML, merge by date
├── parsers/
│   ├── __init__.py
│   ├── rss.py          # fetch_entries_rss()
│   └── html.py         # fetch_entries_html()
├── prompt_loader.py    # load_prompt(name, **variables)
├── llm.py              # generate_post_ideas()
├── prompts/            # Prompt templates ({{placeholder}})
│   ├── post_ideas_system.txt
│   └── post_ideas_user.txt     # {{count}}, {{content}}, {{sources}}
├── urls.txt
└── ...

Run from the project root so that imports resolve (python run.py).
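
The prompt templates in prompts/ use {{placeholder}} markers such as {{count}}, {{content}} and {{sources}}. One way to substitute them is sketched below. This is an assumption about the mechanism: the real load_prompt(name, **variables) in prompt_loader.py also reads the template from disk and may handle missing variables differently. render_prompt() is a hypothetical helper that works on an in-memory string.

```python
import re

# Matches {{name}} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, **variables: object) -> str:
    """Replace each {{name}} in template with str(variables[name])."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            # Assumption: fail loudly rather than leave the marker in place.
            raise KeyError(f"missing prompt variable: {name}")
        return str(variables[name])

    return _PLACEHOLDER.sub(substitute, template)
```

Failing on a missing variable keeps a half-filled prompt from reaching the LLM; leaving unknown markers untouched would be the more lenient alternative.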

OutRizz/content-engine
