
DataJourneyHQ/DemoWorld


Exciting Demos

Architecture: a system to evaluate open-source vs. closed-source models

```mermaid
flowchart TD
    U([👤 User]) -->|prompt + model choice| APP

    subgraph RW["☁️ Railway"]
        APP[["app.py
Streamlit UI"]]
        DB[("🐘 Postgres
llm_runs")]
        APP -->|log_run / log_comparison_runs| DB
    end

    subgraph Providers["🤖 Model Providers"]
        HF["🔓 HuggingFace Router
gpt-oss-120b"]
        GH["🏢 GitHub Models
dynamically picked"]
    end

    CAT["evaluate/
prompt_evaluator.py"] -. scores + picks .-> GH

    APP -->|single: HF_TOKEN| HF
    APP -->|single: GITHUB_TOKEN| GH
    APP -->|comparison: parallel| HF
    APP -->|comparison: parallel| GH

    HF -->|output + prompt_tokens
+ output_tokens| APP
    GH -->|output + prompt_tokens
+ output_tokens| APP

    DB -->|SQL queries| MB[["📊 Metabase
model tracing dashboard"]]

    classDef store fill:#1f6feb,stroke:#fff,color:#fff
    classDef dash fill:#6e40c9,stroke:#fff,color:#fff
    class DB store
    class MB dash
```

Flow:

  1. User selects a model (OSS, Commercial, or both) and submits a prompt
  2. app.py calls the provider(s) — parallel via ThreadPoolExecutor in comparison mode (sketched below)
  3. Token counts (prompt_tokens, output_tokens) are extracted from response.usage
  4. Every run is logged to Postgres — single row for oss/commercial, two linked rows (same run_group_id) for osscom
  5. Metabase connects directly to the Railway Postgres and visualises model usage, latency, and token consumption

A Streamlit app that runs the same prompt against an OSS model (gpt-oss-120b via HuggingFace) and a commercial GitHub Model (picked dynamically from the live catalog), side by side — then logs every call to Postgres on Railway.
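In comparison mode the fan-out and the paired insert look roughly like this. This is a minimal sketch, assuming OpenAI-style response objects: `call_oss` and `call_commercial` are hypothetical provider wrappers, and the `log_comparison_runs` signature is assumed (the real helper lives in db.py).

```python
# Sketch of comparison mode. call_oss / call_commercial are hypothetical
# wrappers around the two providers; the log_comparison_runs signature
# is assumed, not copied from db.py.
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

def run_comparison(prompt, call_oss, call_commercial, log_comparison_runs):
    run_group_id = str(uuid.uuid4())  # shared id linking the two rows
    start = time.time()
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_a = pool.submit(call_oss, prompt)         # HuggingFace Router
        fut_b = pool.submit(call_commercial, prompt)  # GitHub Models
        resp_a, resp_b = fut_a.result(), fut_b.result()
    elapsed_sec = time.time() - start

    rows = []
    for model_type, resp in (("oss", resp_a), ("commercial", resp_b)):
        rows.append({
            "run_group_id": run_group_id,
            "model_type": model_type,
            "output": resp.choices[0].message.content,
            "prompt_tokens": resp.usage.prompt_tokens,
            "output_tokens": resp.usage.completion_tokens,
            "elapsed_sec": elapsed_sec,
            "mode": "comparison",
        })
    log_comparison_runs(rows)  # two linked rows, same run_group_id
```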


Scenario 1 β€” Single OSS model run

```mermaid
flowchart LR
    U([👤 User]) -->|prompt| APP
    APP[["app.py"]] -->|HF_TOKEN| HF["🔓 HuggingFace Router
gpt-oss-120b"]
    HF -->|output + usage| APP
    APP -->|log_run
model_type=oss| DB[("🐘 Postgres
llm_runs")]
```
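A sketch of this path, assuming the HuggingFace router exposes an OpenAI-compatible endpoint; the base URL and model id below are assumptions, and the real call sites live in app.py and db.py:

```python
# Sketch of the single-OSS run; base_url and model id are assumptions.
import os
import time
from openai import OpenAI  # pip install openai

client = OpenAI(
    base_url="https://router.huggingface.co/v1",  # assumed router base URL
    api_key=os.environ["HF_TOKEN"],
)

with open("prompt/prompt.md", encoding="utf-8") as f:
    prompt = f.read()

start = time.time()
resp = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": prompt}],
)
elapsed_sec = time.time() - start

output = resp.choices[0].message.content
prompt_tokens = resp.usage.prompt_tokens      # stored as prompt_tokens
output_tokens = resp.usage.completion_tokens  # stored as output_tokens
# db.log_run(...) would persist these fields with model_type="oss"
```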

Scenario 2 β€” Single Commercial model run

```mermaid
flowchart LR
    U([👤 User]) -->|prompt| APP
    APP[["app.py"]] -->|GITHUB_TOKEN| CAT["evaluate/prompt_evaluator.py
fetch + score catalog"]
    CAT -->|best model_id| APP
    APP -->|GITHUB_TOKEN| GH["🏢 GitHub Models
dynamically picked"]
    GH -->|output + usage| APP
    APP -->|log_run
model_type=commercial| DB[("🐘 Postgres
llm_runs")]
```
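A hypothetical sketch of the fetch + score step; the catalog URL and the scoring heuristic below are assumptions, not the repo's code (the actual logic lives in evaluate/prompt_evaluator.py):

```python
# Hypothetical catalog fetch + score; URL and heuristic are assumptions.
import os
import requests

CATALOG_URL = "https://models.github.ai/catalog/models"  # assumed endpoint

def pick_best_model(token: str) -> str:
    resp = requests.get(
        CATALOG_URL,
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    candidates = resp.json()

    def score(model: dict) -> int:
        # Illustrative heuristic only: favour well-known chat model families.
        name = (model.get("id") or model.get("name") or "").lower()
        return sum(kw in name for kw in ("gpt", "4o", "large"))

    best = max(candidates, key=score)
    return best.get("id") or best.get("name")

if __name__ == "__main__":
    print(pick_best_model(os.environ["GITHUB_TOKEN"]))
```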

Scenario 3 β€” Side-by-side comparison (osscom)

```mermaid
flowchart LR
    U([👤 User]) -->|prompt| APP

    APP[["app.py
ThreadPoolExecutor"]] -->|parallel| HF["🔓 HuggingFace Router
gpt-oss-120b"]
    APP -->|parallel| GH["🏢 GitHub Models"]

    HF -->|out_a + tokens| APP
    GH -->|out_b + tokens| APP

    APP -->|log_comparison_runs
run_group_id shared| DB

    subgraph DB["🐘 Postgres · llm_runs"]
        RA["row A
model_type=oss
run_group_id=xyz"]
        RB["row B
model_type=commercial
run_group_id=xyz"]
    end
```

Run it locally

```bash
cd prompt_process_trace_setup
pip install -r requirements.txt
cp .env.example .env          # fill in GITHUB_TOKEN, HF_TOKEN, DATABASE_URL
streamlit run app.py
```

Deploy on Railway

  1. New project → deploy from this repo (root prompt_process_trace_setup/)
  2. Add a PostgreSQL plugin → Railway injects DATABASE_URL automatically
  3. Set GITHUB_TOKEN and HF_TOKEN as service variables
  4. First request auto-creates the llm_runs table

Key files

| File | Purpose |
| --- | --- |
| `app.py` | Streamlit UI — single or side-by-side mode |
| `db.py` | Postgres schema · `log_run()` · `log_comparison_runs()` |
| `evaluate/prompt_evaluator.py` | Fetches the live GitHub Models catalog and picks the best commercial model |
| `prompt/prompt.md` | The test prompt (movie review of Project Hail Mary) |

DB schema

| Column | Type | Notes |
| --- | --- | --- |
| `id` | serial | primary key |
| `run_at` | timestamptz | auto |
| `run_group_id` | text | shared UUID for osscom comparison rows |
| `model_id` | text | full model identifier |
| `model_type` | text | oss · commercial · osscom |
| `prompt` | text | |
| `output` | text | |
| `error` | text | null on success |
| `elapsed_sec` | float | wall-clock time |
| `prompt_tokens` | int | from response.usage |
| `output_tokens` | int | from response.usage |
| `mode` | text | single · comparison |
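A sketch of the auto-create step the deploy section mentions, assuming psycopg2 and the schema above; the repo's actual DDL in db.py may differ:

```python
# Sketch of db.py's table auto-creation; DDL derived from the schema table
# above, exact types/defaults in the repo may differ.
import os
import psycopg2  # pip install psycopg2-binary

DDL = """
CREATE TABLE IF NOT EXISTS llm_runs (
    id            serial PRIMARY KEY,
    run_at        timestamptz DEFAULT now(),
    run_group_id  text,
    model_id      text,
    model_type    text,             -- oss | commercial | osscom
    prompt        text,
    output        text,
    error         text,             -- null on success
    elapsed_sec   double precision, -- wall-clock time
    prompt_tokens int,
    output_tokens int,
    mode          text              -- single | comparison
);
"""

def ensure_schema() -> None:
    """Run on first request; idempotent thanks to IF NOT EXISTS."""
    with psycopg2.connect(os.environ["DATABASE_URL"]) as conn:
        with conn.cursor() as cur:
            cur.execute(DDL)
```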

References

25th March CrewAI Demos

  1. OSS Discovery agent, deployed via GitHub Actions @sayantikabanik
  2. API-first asset librarian for local-first document and image similarity analysis. https://github.com/arcnem-ai/omnivec @Kthom1

GitHub Actions β€” CrewAI OSS Discovery Agent

A workflow_dispatch workflow that runs the CrewAI agent on demand directly from GitHub Actions. No local setup, no OpenAI account — just a GitHub PAT.

How it works:

  1. You trigger it manually from the Actions tab with 3 inputs: criteria, programming_languages, project_types
  2. The agent uses gpt-4o-mini routed through GitHub's model endpoint (https://models.inference.ai.azure.com) — authenticated via your GitHub token; the wiring is sketched after this list
  3. It searches the web for open source projects, scrapes repos, and writes a discovery report
  4. The report is saved as both .md and .html and uploaded as a downloadable artifact
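A minimal sketch of the model wiring in step 2, assuming the endpoint speaks the OpenAI chat-completions protocol; the workflow's actual CrewAI configuration may differ:

```python
# Sketch of step 2's model routing; assumes an OpenAI-compatible endpoint.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://models.inference.ai.azure.com",
    api_key=os.environ["GH_MODELS_TOKEN"],  # GitHub PAT with Models read
)
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(resp.choices[0].message.content)
```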

Only 1 secret needed:

| Secret | What it is |
| --- | --- |
| `GH_MODELS_TOKEN` | Your GitHub PAT with Models read permission |

⚠️ Hallucination Risk — This Workflow Has a Known Issue

The workflow currently completes successfully even when the search tool fails.

Here's what actually happens at runtime:

```
Agent calls SerperDevTool to search the web
        │
        ▼ ❌ 403 Forbidden — SERPER_API_KEY missing or invalid
        │
        │  ERROR: 403 Client Error: Forbidden
        │  Tool: search_the_internet_with_serper
        │  Iteration: 26 — all 25 retries exhausted
        │
        ▼
Agent falls back to LLM training knowledge
        │
        ▼ ✅ GitHub Actions reports SUCCESS
Artifact uploaded — looks like a normal report
```

Why this is a risk

The output looks completely valid — proper markdown, real project names, GitHub URLs, star counts. But it is generated entirely from the LLM's training data (knowledge cutoff: early 2024), not from live web search. There are no guardrails in place to detect or reject this; one possible guardrail is sketched after the table below.

| Risk | Detail |
| --- | --- |
| Stale data | Projects may be archived, renamed, or no longer maintained |
| Fabricated URLs | Links may point to wrong or non-existent repos |
| False confidence | Report reads as authoritative with no warning it failed |
| CI shows green | Nothing in the pipeline signals that tools broke |
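One mitigation, sketched under assumptions (this is not part of the repo): a post-run step that scans the captured agent log for the known Serper failure signature and fails the job, so the Actions run cannot go green on fallback output. The log path is hypothetical; the workflow would need to tee agent output into it.

```python
# Hypothetical guardrail step (not in the repo): fail the job when the
# Serper tool errored, instead of uploading a hallucinated report.
import pathlib
import sys

LOG = pathlib.Path("crew_run.log")  # assumed path captured by the workflow

text = LOG.read_text(encoding="utf-8") if LOG.exists() else ""
search_failed = (
    "403 Client Error" in text
    and "search_the_internet_with_serper" in text
)
if not text or search_failed:
    print("::error::Serper search failed; report may be generated from "
          "training data only. Refusing to publish.")
    sys.exit(1)  # turns the Actions run red instead of silently succeeding
```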
