daisyfaithauma/MarkdownExtractor

Installation

Clone this repository and navigate into its directory:

git clone https://github.com/daisyfaithauma/MarkdownExtractor.git
cd MarkdownExtractor

Ensure you have the following tools installed on your system:

  • cURL (preinstalled on most UNIX-like systems; for Windows use WSL or Git Bash)
  • Python 3 (for running the extraction script)

Configuration

Export your Cloudflare credentials as environment variables to avoid hardcoding them:

export CF_ACCOUNT_ID="your-cloudflare-account-id"
export CF_API_TOKEN="your-api-token-with-edit-permissions"
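
Before sending any request, it can help to confirm that both variables are actually set in the current shell. The following is a minimal sketch, not part of this repository; the helper name `missing_credentials` is hypothetical.

```python
import os

# The two variables exported in the Configuration step.
REQUIRED_VARS = ("CF_ACCOUNT_ID", "CF_API_TOKEN")


def missing_credentials(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

An empty list means both credentials are available; otherwise the returned names indicate which `export` line still needs to be run.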

Usage

1. Save the Full JSON Response

Retrieve the full API response for the Cloudflare blog post and save it to autorag-full-response.json:

curl -s -X POST \
  "https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/browser-rendering/markdown" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${CF_API_TOKEN}" \
  -d '{
    "url": "https://blog.cloudflare.com/introducing-autorag-on-cloudflare/"
  }' \
> autorag-full-response.json

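For readers who prefer to stay in Python, the same request can be sketched with the standard library. This is an illustrative equivalent of the cURL command above, not code from this repository; the function names `build_markdown_request` and `save_response` are hypothetical, and the endpoint path is copied from the command as written.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_markdown_request(account_id, api_token, page_url):
    """Build (but do not send) the POST request made by the cURL command."""
    endpoint = f"{API_BASE}/accounts/{account_id}/browser-rendering/markdown"
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


def save_response(request, out_path):
    """Send the request and write the raw JSON response to out_path."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

Passing the values of `CF_ACCOUNT_ID` and `CF_API_TOKEN` to `build_markdown_request`, then calling `save_response(request, "autorag-full-response.json")`, should produce the same file as step 1.
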
2. (Optional) Exclude Asset Requests

Block requests for CSS, JavaScript, PNG, and SVG files to speed up rendering and focus on core content. The response is saved to autorag-no-assets.json:

curl -s -X POST \
  "https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/browser-rendering/markdown" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${CF_API_TOKEN}" \
  -d '{
    "url": "https://blog.cloudflare.com/introducing-autorag-on-cloudflare/",
    "rejectRequestPattern": [
      "/^.*\\.(css|js|png|svg)$/"
    ]
  }' \
> autorag-no-assets.json

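To see which URLs the pattern above would catch, it can be checked locally. This sketch assumes the service treats the string as a JavaScript-style regex literal applied to the full request URL; that behavior is an assumption here, and Python's `re` module is used only as a stand-in, which is adequate for a pattern this simple. The helper names are hypothetical.

```python
import re

# The pattern from the request body after JSON decoding ("\\." becomes "\.").
REJECT_PATTERN = r"/^.*\.(css|js|png|svg)$/"


def compile_js_literal(literal):
    """Strip the slashes from a flag-less /.../ literal and compile the body."""
    if len(literal) < 2 or not (literal.startswith("/") and literal.endswith("/")):
        raise ValueError("expected a /pattern/ literal without flags")
    return re.compile(literal[1:-1])


def would_reject(url, literal=REJECT_PATTERN):
    """Return True if the pattern matches the given request URL."""
    return compile_js_literal(literal).search(url) is not None
```

Because the pattern ends with `$`, a URL that carries a query string, such as `style.css?v=2`, does not match and would still be requested. Other image formats such as JPEG are not covered either, so extend the alternation if those should be blocked too.
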
3. Extract the Markdown

Run the extraction script to pull the Markdown content from autorag-full-response.json and save it to autorag-blog.md:

python3 extract_markdown.py

This script will:

  1. Open autorag-full-response.json.
  2. Extract the result field containing the Markdown.
  3. Write the Markdown to autorag-blog.md.
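
The three steps above can be sketched as follows. This is a minimal illustration of what such a script might look like, not the actual contents of extract_markdown.py; the function names are hypothetical, and the check on a `success` flag assumes the response uses the usual Cloudflare API envelope.

```python
import json
from pathlib import Path


def extract_markdown(response):
    """Return the Markdown string stored in the response's 'result' field."""
    # Assumption: a failed call carries "success": false alongside "errors".
    if response.get("success") is False:
        raise ValueError(f"API reported failure: {response.get('errors')}")
    markdown = response.get("result")
    if not isinstance(markdown, str):
        raise ValueError("expected 'result' to be a Markdown string")
    return markdown


def convert(src="autorag-full-response.json", dst="autorag-blog.md"):
    """Read the saved JSON response and write its Markdown to dst."""
    response = json.loads(Path(src).read_text(encoding="utf-8"))
    Path(dst).write_text(extract_markdown(response), encoding="utf-8")
```

Calling `convert()` with its defaults reads autorag-full-response.json and writes autorag-blog.md; passing `src="autorag-no-assets.json"` would process the output of the optional step instead.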

Repository Contents

.
├── autorag-full-response.json    # Complete API response (raw JSON)
├── autorag-no-assets.json        # API response with asset requests blocked
├── autorag-blog.md               # Extracted Markdown content
├── extract_markdown.py           # Python script to extract Markdown from JSON
└── README.md                     # This documentation
