Clone this repository and navigate into its directory:
git clone https://github.com/yourusername/CloudflareMarkdownExtractor.git
cd CloudflareMarkdownExtractor
Ensure you have the following tools installed on your system:
- cURL (preinstalled on most UNIX-like systems; for Windows use WSL or Git Bash)
- Python 3 (for running the extraction script)
Export your Cloudflare credentials as environment variables to avoid hardcoding them:
export CF_ACCOUNT_ID="your-cloudflare-account-id"
export CF_API_TOKEN="your-api-token-with-edit-permissions"
Retrieve the full API response for the Cloudflare blog post and save it to autorag-full-response.json:
curl -s -X POST \
"https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/browser-rendering/markdown" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${CF_API_TOKEN}" \
-d '{
"url": "https://blog.cloudflare.com/introducing-autorag-on-cloudflare/"
}' \
> autorag-full-response.json
Block requests for CSS, JavaScript, and images to speed up rendering and focus on core content:
curl -s -X POST \
"https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/browser-rendering/markdown" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${CF_API_TOKEN}" \
-d '{
"url": "https://blog.cloudflare.com/introducing-autorag-on-cloudflare/",
"rejectRequestPattern": [
"/^.*\\.(css|js|png|svg)$/"
]
}' \
> autorag-no-assets.json
Run the extraction script to pull the Markdown content from autorag-full-response.json and save it to autorag-blog.md:
python3 extract_markdown.py
This script will:
- Open autorag-full-response.json.
- Extract the result field containing the Markdown.
- Write the Markdown to autorag-blog.md.
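The three steps above can be sketched in Python as follows. This is a minimal illustration, not necessarily the contents of the repository's extract_markdown.py: the function name `extract_markdown` and its error handling are assumptions, and it relies only on the Markdown being stored as a string under the top-level result field, as described above.

```python
import json
from pathlib import Path


def extract_markdown(response_path, output_path):
    """Copy the `result` field of a saved API response into a Markdown file.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Step 1: open and parse the saved JSON response.
    payload = json.loads(Path(response_path).read_text(encoding="utf-8"))

    # Step 2: extract the `result` field, which should hold the Markdown text.
    markdown = payload.get("result") if isinstance(payload, dict) else None
    if not isinstance(markdown, str):
        raise ValueError(f"no Markdown string under 'result' in {response_path}")

    # Step 3: write the Markdown to the output file.
    Path(output_path).write_text(markdown, encoding="utf-8")
    return markdown
```

With this sketch, calling `extract_markdown("autorag-full-response.json", "autorag-blog.md")` would reproduce the behavior described above. Raising an error when result is missing or not a string is a design choice made here so that a failed API call does not silently produce an empty file.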
.
├── autorag-full-response.json # Complete API response (raw JSON)
├── autorag-no-assets.json # API response without asset requests
├── autorag-blog.md # Extracted Markdown content
├── extract_markdown.py # Python script to extract Markdown from JSON
└── README.md # This documentation
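
For readers who prefer Python to cURL, the two requests shown earlier in this README could be built along these lines. This is a sketch using only the standard library; the helper name `build_markdown_request` and the `reject_patterns` parameter are hypothetical and not part of this repository. The endpoint, headers, and JSON body mirror the cURL commands above.

```python
import json
import urllib.request

# Base URL taken from the cURL commands in this README.
API_BASE = "https://api.cloudflare.com/client/v4/accounts"


def build_markdown_request(account_id, api_token, url, reject_patterns=None):
    """Build (but do not send) a POST request for the markdown endpoint.

    Hypothetical helper: name and parameters are illustrative only.
    """
    body = {"url": url}
    if reject_patterns:
        # Same field as in the asset-blocking cURL example.
        body["rejectRequestPattern"] = list(reject_patterns)
    return urllib.request.Request(
        f"{API_BASE}/{account_id}/browser-rendering/markdown",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and write the response body to autorag-full-response.json or autorag-no-assets.json. The account ID and token can be read from `os.environ["CF_ACCOUNT_ID"]` and `os.environ["CF_API_TOKEN"]`, and the asset-blocking variant would pass `reject_patterns=[r"/^.*\.(css|js|png|svg)$/"]`.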