GitHub - daisyfaithauma/MarkdownExtractor

Installation

Clone this repository and navigate into its directory:

git clone https://github.com/daisyfaithauma/MarkdownExtractor.git
cd MarkdownExtractor

Ensure you have the following tools installed on your system:

cURL (preinstalled on most UNIX-like systems; for Windows use WSL or Git Bash)
Python 3 (for running the extraction script)

Configuration

Export your Cloudflare credentials as environment variables to avoid hardcoding them:

export CF_ACCOUNT_ID="your-cloudflare-account-id"
export CF_API_TOKEN="your-api-token-with-edit-permissions"
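
Before sending any requests, it can help to confirm that both variables are visible to the process. The following is a minimal Python sketch; the helper name `load_credentials` is hypothetical and not part of this repository.

```python
import os


def load_credentials(env=os.environ):
    """Return (account_id, api_token); raise if either variable is missing or empty."""
    required = ("CF_ACCOUNT_ID", "CF_API_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["CF_ACCOUNT_ID"], env["CF_API_TOKEN"]
```

Calling `load_credentials()` with no arguments reads from the current environment; passing a plain dictionary instead makes the check easy to try without touching real credentials.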

Usage

1. Save the Full JSON Response

Retrieve the full API response for the Cloudflare blog post introducing AutoRAG and save it to autorag-full-response.json:

curl -s -X POST \
  "https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/browser-rendering/markdown" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${CF_API_TOKEN}" \
  -d '{
    "url": "https://blog.cloudflare.com/introducing-autorag-on-cloudflare/"
  }' \
> autorag-full-response.json
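
For readers who prefer to stay in Python, the same request could be sketched with the standard library alone. The function names `build_markdown_request` and `save_full_response` are hypothetical and not part of this repository; the endpoint, headers, and body mirror the curl command above.

```python
import json
import os
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4/accounts"


def build_markdown_request(account_id, api_token, page_url):
    """Build the POST request that mirrors the curl command."""
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/{account_id}/browser-rendering/markdown",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


def save_full_response(page_url, out_path="autorag-full-response.json"):
    """Send the request and write the raw JSON response bytes to out_path."""
    request = build_markdown_request(
        os.environ["CF_ACCOUNT_ID"], os.environ["CF_API_TOKEN"], page_url
    )
    with urllib.request.urlopen(request) as response:
        raw = response.read()
    with open(out_path, "wb") as handle:
        handle.write(raw)
```

Calling `save_full_response("https://blog.cloudflare.com/introducing-autorag-on-cloudflare/")` would then play the role of the curl command. Building the request in a separate function keeps the network call out of the part that is easy to inspect.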

2. (Optional) Exclude Asset Requests

Block requests for CSS, JavaScript, PNG, and SVG files to speed up rendering and focus on core content:

curl -s -X POST \
  "https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/browser-rendering/markdown" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${CF_API_TOKEN}" \
  -d '{
    "url": "https://blog.cloudflare.com/introducing-autorag-on-cloudflare/",
    "rejectRequestPattern": [
      "/^.*\\.(css|js|png|svg)$/"
    ]
  }' \
> autorag-no-assets.json
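
To see which URLs this pattern covers, it can be tried locally. The sketch below assumes the service treats the string as a JavaScript-style regular expression tested against the full request URL; that behavior is an assumption here, and the helper name `would_be_rejected` is hypothetical. For a pattern this simple, Python's `re` module matches the same strings.

```python
import re

# The pattern from the request body, without the JavaScript-style leading and
# trailing slashes and with the JSON escaping (\\.) reduced to a regex escape (\.).
ASSET_PATTERN = re.compile(r"^.*\.(css|js|png|svg)$")


def would_be_rejected(request_url):
    """Return True if the URL ends in one of the listed asset extensions."""
    return ASSET_PATTERN.match(request_url) is not None


if __name__ == "__main__":
    samples = [
        "https://blog.cloudflare.com/static/app.js",
        "https://blog.cloudflare.com/styles/main.css?v=3",
        "https://blog.cloudflare.com/images/photo.jpg",
    ]
    for url in samples:
        print(url, would_be_rejected(url))
```

Because the pattern is anchored with `$`, a URL such as `main.css?v=3` does not match, and extensions outside the list (for example `.jpg`) are not blocked. Extending the alternation or dropping the anchor are possible adjustments if broader filtering is wanted.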

3. Extract the Markdown

Run the extraction script to pull the Markdown content from autorag-full-response.json and save it to autorag-blog.md:

python3 extract_markdown.py

This script will:

Open autorag-full-response.json.
Extract the result field containing the Markdown.
Write the Markdown to autorag-blog.md.
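
The contents of extract_markdown.py are not reproduced in this README, so the following is only a sketch of how the three steps above might look, not the repository's actual script. The function name and the error handling are assumptions; the file names and the `result` field come from the description above.

```python
import json


def extract_markdown(json_path="autorag-full-response.json", md_path="autorag-blog.md"):
    """Read the saved API response and write its `result` field to md_path."""
    # Step 1: open the saved JSON response.
    with open(json_path, encoding="utf-8") as handle:
        response = json.load(handle)

    # Step 2: extract the `result` field containing the Markdown.
    markdown = response.get("result")
    if not isinstance(markdown, str):
        # Assumption: a failed call carries no string `result`, so report
        # whatever the payload lists under `errors` instead of writing a file.
        raise ValueError(f"No Markdown in response: {response.get('errors')}")

    # Step 3: write the Markdown to the output file.
    with open(md_path, "w", encoding="utf-8") as handle:
        handle.write(markdown)
    return markdown
```

Pointing `json_path` at autorag-no-assets.json would extract the Markdown from the asset-free response in the same way.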

Repository Contents

.
├── autorag-full-response.json    # Complete API response (raw JSON)
├── autorag-no-assets.json        # API response with asset requests blocked
├── autorag-blog.md               # Extracted Markdown content
├── extract_markdown.py           # Python script to extract Markdown from JSON
└── README.md                     # This documentation
