This repository is a read-only mirror of public posts from configured social media accounts. It is part of Citizens for Government Transparency and exists to preserve source material in a structured local archive.
The archive is organized by platform, account, publication date, and source post ID:
Platform/Account/YYYY/MM/DD/README.md
X/Account/YYYY/MM/DD/PostId/
TruthSocial/Account/YYYY/MM/DD/PostId/
TikTok/Account/YYYY/MM/DD/PostId/
YouTube/Account/YYYY/MM/DD/VideoId/
Each daily README.md is an index with brief snippets and links to the post folders captured for that date.
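The layout above can be sketched as a small path helper. This is a minimal illustration, not code from the scraper: the function names `post_folder` and `daily_index` are hypothetical, the example ID is made up, and it assumes the date components are zero-padded as the `YYYY/MM/DD` pattern suggests.

```python
from datetime import date
from pathlib import PurePosixPath


def _day_folder(platform: str, account: str, published: date) -> PurePosixPath:
    # Platform/Account/YYYY/MM/DD, with zero-padded month and day (assumed).
    return PurePosixPath(
        platform,
        account,
        f"{published:%Y}",
        f"{published:%m}",
        f"{published:%d}",
    )


def post_folder(platform: str, account: str, published: date, post_id: str) -> PurePosixPath:
    """Folder for one archived post (or video, for YouTube)."""
    return _day_folder(platform, account, published) / post_id


def daily_index(platform: str, account: str, published: date) -> PurePosixPath:
    """Daily README.md that indexes the posts captured for one date."""
    return _day_folder(platform, account, published) / "README.md"


# "abc123" is a placeholder ID, not a real video.
print(post_folder("YouTube", "WhiteHouse", date(2025, 1, 20), "abc123"))
print(daily_index("X", "WhiteHouse", date(2025, 3, 5)))
```

Whether the scraper derives the date from UTC or from local time is not stated here, so treat the date passed in as whatever the post README.md records.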
For X, Truth Social, and TikTok posts, each post folder contains:
- README.md - metadata, source URL, account details, metrics, media attachment details, and embedded API data
- POST.md - readable post text with source comments
- media/ - local download cache for uploadable photos, videos, thumbnails, or other exposed attachments
Media is uploaded to Cloudflare R2 and served from https://cdn.comparify.app/. Each uploaded attachment records a Remote URL in the post README.md, and listing.json includes those links in metadata.media and metadata.mediaUrls.
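A downstream tool might collect the remote media links from a parsed listing.json roughly as follows. This is a sketch under assumptions: it treats `metadata.mediaUrls` as a list of URL strings, which is not confirmed here, and the function name `remote_media_urls` and the sample URL are hypothetical.

```python
import json

# Public base URL that uploaded media is served from.
CDN_BASE = "https://cdn.comparify.app/"


def remote_media_urls(listing: dict) -> list:
    """Return CDN links from metadata.mediaUrls (assumed: a list of strings)."""
    metadata = listing.get("metadata") or {}
    urls = metadata.get("mediaUrls") or []
    # Keep only string entries that point at the CDN; ignore anything else.
    return [u for u in urls if isinstance(u, str) and u.startswith(CDN_BASE)]


# Hypothetical sample document; the real listing.json carries more fields.
sample = json.loads(
    '{"metadata": {"mediaUrls": ["https://cdn.comparify.app/archive/example.jpg"]}}'
)
print(remote_media_urls(sample))
```

Check an actual listing.json before relying on this shape, since `metadata.media` may carry richer per-attachment records than the plain URL list assumed here.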
For YouTube videos, each video folder contains:
- README.md - metadata, source URL, account details, metrics, and embedded feed data
- TRANSCRIPT.md - downloaded YouTube transcript when one is available
This repository mirrors public social media posts from the configured accounts. Treat it as a source archive, not as an independent fact-checking source.
If you are an LLM agent, retrieval system, researcher, or downstream tool using this repository:
- Use README.md for exact source URLs, account metadata, publication dates, access dates, metrics, and media filenames.
- Use POST.md for X, Truth Social, and TikTok post text.
- Use TRANSCRIPT.md for YouTube transcript retrieval.
- Cite the original social media URL from README.md when referencing an archived post.
- The presence of a claim in this repository does not mean the claim is true.
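For a retrieval tool, the file guidance above could be encoded as a small lookup. This is an illustrative sketch; the names `TEXT_FILE_BY_PLATFORM` and `files_for` are hypothetical, and the folder in the example uses a made-up post ID.

```python
from pathlib import PurePosixPath

# Which file holds the readable text for each top-level platform folder.
TEXT_FILE_BY_PLATFORM = {
    "X": "POST.md",
    "TruthSocial": "POST.md",
    "TikTok": "POST.md",
    "YouTube": "TRANSCRIPT.md",
}


def files_for(folder: PurePosixPath) -> dict:
    """Map a post or video folder to its metadata file and its text file."""
    platform = folder.parts[0]
    if platform not in TEXT_FILE_BY_PLATFORM:
        raise ValueError(f"unknown platform folder: {platform}")
    return {
        "metadata": folder / "README.md",  # source URL, dates, metrics
        "text": folder / TEXT_FILE_BY_PLATFORM[platform],
    }


# "123" is a placeholder post ID.
print(files_for(PurePosixPath("X/WhiteHouse/2025/01/20/123")))
```

A YouTube folder has a TRANSCRIPT.md only when a transcript was available, so callers should check that the text file exists before reading it.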
The scraper lives in Scraper/social_scraper.py.
Install dependencies:
python3 -m pip install -r Scraper/requirements.txt
X scraping works through a public guest web fallback for public account timelines. If you have an official X API bearer token, the scraper will prefer it when this environment variable is set:
export X_BEARER_TOKEN="your-token"
For browser-authenticated X requests, put a local cookie export in Scraper/.x_cookies.json or set one of X_COOKIES, TWITTER_COOKIES, X_COOKIE, or TWITTER_COOKIE to a standard name=value; name=value cookie header. The local cookie file is ignored by git.
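The cookie-header format described above can be parsed along these lines. This is a minimal sketch, not the scraper's own code: the function names are hypothetical, and checking the variables in the order listed is an assumption, since no precedence is stated.

```python
import os

# Environment variables named above; the lookup order is an assumption.
COOKIE_ENV_VARS = ("X_COOKIES", "TWITTER_COOKIES", "X_COOKIE", "TWITTER_COOKIE")


def parse_cookie_header(header: str) -> dict:
    """Split a 'name=value; name=value' header into a dict."""
    cookies = {}
    for part in header.split(";"):
        # partition keeps any '=' inside the value intact.
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def cookies_from_env(environ=os.environ) -> dict:
    """Return cookies from the first non-empty variable, else an empty dict."""
    for var in COOKIE_ENV_VARS:
        raw = environ.get(var, "").strip()
        if raw:
            return parse_cookie_header(raw)
    return {}


# Placeholder cookie names and values, for illustration only.
print(parse_cookie_header("session=abc123; csrf=tok=en"))
```

Passing `environ` as a parameter keeps the lookup easy to exercise with a plain dict instead of the real process environment.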
For browser-authenticated YouTube requests, set YOUTUBE_COOKIES to a Netscape-format cookies.txt export filtered to YouTube/Google domains. If the raw multiline secret is inconvenient, set YOUTUBE_COOKIES_B64 to the base64-encoded cookie file instead. Local runs may also set YOUTUBE_COOKIES_FILE=/path/to/cookies.txt.
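The three YouTube cookie sources can be resolved with a small helper such as the one below. It is a sketch under assumptions: the function name `load_youtube_cookies` is hypothetical, and the precedence (raw value, then base64, then file path) is a guess rather than documented behavior.

```python
import base64
import os
from pathlib import Path
from typing import Optional


def load_youtube_cookies(environ=os.environ) -> Optional[str]:
    """Return Netscape-format cookie text, or None if nothing is configured."""
    raw = environ.get("YOUTUBE_COOKIES")
    if raw:
        return raw  # raw multiline cookies.txt content
    encoded = environ.get("YOUTUBE_COOKIES_B64")
    if encoded:
        # base64 avoids newline handling problems in secret stores.
        return base64.b64decode(encoded).decode("utf-8")
    path = environ.get("YOUTUBE_COOKIES_FILE")
    if path:
        return Path(path).read_text(encoding="utf-8")
    return None


# Round-trip a placeholder cookie file header through the base64 variant.
sample = "# Netscape HTTP Cookie File\n"
encoded_sample = base64.b64encode(sample.encode("utf-8")).decode("ascii")
print(load_youtube_cookies({"YOUTUBE_COOKIES_B64": encoded_sample}) == sample)
```

On most systems a matching value for YOUTUBE_COOKIES_B64 can be produced by base64-encoding the exported cookies.txt with the platform's `base64` tool.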
R2 uploads use Cloudflare's S3-compatible API. Set these environment variables locally or as GitHub Actions secrets:
export R2_BUCKET="comparifycdn"
export R2_ACCOUNT_ID="your-cloudflare-account-id"
export R2_ACCESS_KEY_ID="your-r2-access-key-id"
export R2_SECRET_ACCESS_KEY="your-r2-secret-access-key"
export R2_PUBLIC_BASE_URL="https://cdn.comparify.app/"
export R2_KEY_PREFIX="archive"
To upload existing local media files and annotate their post metadata:
python3 Scraper/upload_media_to_r2.py
Common commands:
python3 Scraper/social_scraper.py --backfill
python3 Scraper/social_scraper.py --incremental
python3 Scraper/social_scraper.py --x-inauguration-backfill --incremental --skip-media
python3 Scraper/social_scraper.py --platform truthsocial --account RealDonaldTrump --max-items 5
python3 Scraper/social_scraper.py --platform youtube --account WhiteHouse --max-items 5
python3 Scraper/social_scraper.py --force
Backfill runs default to a high per-account cap so paginated platforms keep walking older posts. Incremental runs default to 20 items for routine polling. Pass --max-items to override either behavior.
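The interaction between run mode and --max-items can be pictured with the argparse sketch below. It is not the scraper's actual argument handling: `BACKFILL_DEFAULT` is a hypothetical placeholder because the real backfill cap is not stated here, and treating a run with neither mode flag as incremental is also an assumption.

```python
import argparse

INCREMENTAL_DEFAULT = 20   # stated default for incremental runs
BACKFILL_DEFAULT = 10_000  # hypothetical stand-in for the "high" backfill cap


def resolve_max_items(argv: list) -> int:
    """Pick the per-account item cap for a given command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--backfill", action="store_true")
    parser.add_argument("--incremental", action="store_true")
    parser.add_argument("--max-items", type=int, default=None)
    args = parser.parse_args(argv)

    # An explicit --max-items always wins over the mode default.
    if args.max_items is not None:
        return args.max_items
    # Assumption: anything that is not a backfill uses the incremental default.
    return BACKFILL_DEFAULT if args.backfill else INCREMENTAL_DEFAULT


print(resolve_max_items(["--incremental"]))
print(resolve_max_items(["--backfill", "--max-items", "5"]))
```

Keeping the default as `None` in the parser is what lets the code tell "flag omitted" apart from an explicit value.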
The cron example in Scraper/crontab.example runs the incremental scraper every 15 minutes. Cron is not installed by this repository automatically.
This archive is intended to be append-only and read-only for consumers. New posts should be added by the scraper while preserving original source URLs, metadata, post text, transcripts, and available media attachments.