## **FireCrawl**
- FireCrawl crawls any website and converts it into LLM-ready data. It visits all accessible subpages and gives you clean markdown and metadata for each. No sitemap is required.

- FireCrawl handles complex tasks such as reverse proxies, caching, rate limits, and content blocked by JavaScript. It is built by the mendable.ai team.

In [1]:
from langchain_community.document_loaders.firecrawl import FireCrawlLoader

## **Scraping From the URLs**

In [None]:
from langchain_community.document_loaders import FireCrawlLoader

loader = FireCrawlLoader(
    api_key="api_key",                    # placeholder: replace with your Firecrawl API key
    url="https://firecrawl.dev",          # target site to scrape
    mode="scrape",                        # fetch only this URL, without following links
    api_url="https://api.firecrawl.dev"   # Firecrawl API endpoint
)


In [36]:
data = loader.lazy_load()

In [None]:
for d in data:
    print(d.page_content)
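
`lazy_load()` hands back an iterator rather than a list, so `data` above can be walked only once; a second loop over the same object would print nothing. The sketch below shows that behaviour with a stand-in generator. `fake_lazy_load` is hypothetical and makes no network calls; it only mimics the iterator shape.

```python
from typing import Iterator


def fake_lazy_load() -> Iterator[str]:
    """Hypothetical stand-in for loader.lazy_load(); yields page texts."""
    yield "page one"
    yield "page two"


data = fake_lazy_load()
first_pass = [d for d in data]   # consumes the iterator
second_pass = [d for d in data]  # the iterator is already exhausted

docs = list(fake_lazy_load())    # materialize once to reuse the results
```

If the documents are needed more than once, wrapping the call in `list(...)` (or calling `loader.load()`) keeps them in memory.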

In [12]:
!uv pip install firecrawl-py==4.1.1

Using Python 3.12.3 environment at: /home/md-al-amin/My-Projects/Agentic-Ai-Journey-with-HuggingFace-and-LangChain-Academy-Course/.venv
Audited 1 package in 4ms


In [15]:
import os
from dotenv import load_dotenv
from firecrawl.client import Firecrawl

load_dotenv()
api_key = os.getenv("FIRECRAWL_API_KEY")

fc = Firecrawl(api_key=api_key, api_url="https://api.firecrawl.dev")

result = fc.v2.scrape(url="https://firecrawl.dev", formats=["markdown"])
print(result)
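
`os.getenv` returns `None` when the variable is missing, and the client would then be built without a key. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper, not part of the Firecrawl SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

In the cell above, `api_key = require_env("FIRECRAWL_API_KEY")` would surface a missing `.env` entry before any request is made.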




## **Map**

In [16]:
# Map example
map_result = fc.v2.map(url="https://firecrawl.dev")
print(map_result)  # object whose `links` field lists LinkResult(url, title, description) entries


links=[LinkResult(url='https://www.firecrawl.dev', title='Firecrawl - The Web Data API for AI', description='The web crawling, scraping, and search API for AI. Built for scale. Firecrawl delivers the entire internet to AI agents and builders.'), LinkResult(url='https://www.firecrawl.dev/pricing', title='The Web Data API for AI - Firecrawl', description='+==++X=-:. ..:-::-=-== ---.. .:.--::.. .:-==::=--X==-----====--::+:::+... ..-....-:..::-::=-=-:-::--===++=-==-----== X+=-:.::-==----+==+XX+=-::.:+--==--::.'), LinkResult(url='https://www.firecrawl.dev/blog', title='The Web Data API for AI - Firecrawl', description='The web crawling, scraping, and search API for AI. Built for scale. Firecrawl delivers the entire internet to AI agents and builders.'), LinkResult(url='https://www.firecrawl.dev/playground', title='API, Docs and Playground - Firecrawl', description=':-=+=- -=X+X+===+---==--==--:..::...+....+ ..:::---.::.---=+==XXXXXXXX+XX++==++===--+===:+X+====+=--::--=+XXXXXXX+==++==+XX+=: 
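
The map output is a flat list of links, so a common next step is to keep only the URLs under one section of the site. The sketch below does that on a stand-in `Link` dataclass, which is hypothetical and only mirrors the `url`, `title`, and `description` fields printed above.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class Link:
    """Hypothetical stand-in mirroring the url/title/description fields above."""
    url: str
    title: Optional[str] = None
    description: Optional[str] = None


def urls_under(links: list, path_prefix: str) -> list:
    """Return unique URLs whose path starts with path_prefix, in input order."""
    seen = set()
    out = []
    for link in links:
        if urlparse(link.url).path.startswith(path_prefix) and link.url not in seen:
            seen.add(link.url)
            out.append(link.url)
    return out


sample = [
    Link("https://www.firecrawl.dev"),
    Link("https://www.firecrawl.dev/pricing"),
    Link("https://www.firecrawl.dev/blog"),
    Link("https://www.firecrawl.dev/blog"),
]
blog_urls = urls_under(sample, "/blog")
```

Because the helper reads only the `.url` attribute, it should also accept `map_result.links` directly, assuming the entries keep the shape shown in the output.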

## **Crawl**

In [25]:
crawl_job = fc.v2.crawl(
    url="https://firecrawl.dev/blog/*",  # wildcard path; this run returned 0 pages (see output)
    max_discovery_depth=3,
    limit=20
)

# Inspect all attributes
print(crawl_job.__dict__)  # shows all fields


{'status': 'completed', 'total': 0, 'completed': 0, 'credits_used': 0, 'expires_at': datetime.datetime(2025, 9, 2, 6, 5, 40, tzinfo=TzInfo(UTC)), 'next': None, 'data': []}


In [26]:
print(crawl_job.model_dump())  # Pydantic v2


{'status': 'completed', 'total': 0, 'completed': 0, 'credits_used': 0, 'expires_at': datetime.datetime(2025, 9, 2, 6, 5, 40, tzinfo=TzInfo(UTC)), 'next': None, 'data': []}
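
The crawl above reports `status: 'completed'` with `total: 0` and an empty `data` list, so the status alone does not show whether any pages came back. A small sketch of a sanity check on the dumped dictionary follows; `crawl_problem` is a hypothetical helper that relies only on the keys visible in the output.

```python
from typing import Optional


def crawl_problem(job: dict) -> Optional[str]:
    """Return a description of what looks wrong with a crawl result, or None."""
    if job.get("status") != "completed":
        return f"crawl not finished (status={job.get('status')!r})"
    if not job.get("data"):
        return "crawl completed but returned no pages"
    if job.get("completed", 0) < job.get("total", 0):
        return "crawl completed with fewer pages than expected"
    return None


# Same shape as the output above, minus the expiry timestamp.
empty_job = {"status": "completed", "total": 0, "completed": 0,
             "credits_used": 0, "next": None, "data": []}
problem = crawl_problem(empty_job)
```

With the real object, `crawl_problem(crawl_job.model_dump())` would flag the empty result before any code tries to read `crawl_job.data`.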


In [None]:
crawl_job = fc.v2.crawl(
    url="https://firecrawl.dev/",  # start from homepage
    max_discovery_depth=1,         # just one level deep
    limit=5,                       # max 5 pages
    crawl_entire_domain=True        # explore whole domain
)



In [28]:
crawl_job.__dict__

{'status': 'completed',
 'total': 5,
 'completed': 5,
 'credits_used': 5,
 'expires_at': datetime.datetime(2025, 9, 2, 6, 7, 26, tzinfo=TzInfo(UTC)),
 'next': None,

In [29]:
# Access the documents
for doc in crawl_job.data:
    print(doc.markdown[:500])  # print first 500 chars of each page
    print("------")


We just raised our Series A and shipped Firecrawl /v2 🎉. [Read the blog.](https://www.firecrawl.dev/blog/firecrawl-v2-series-a-announcement)

[2 Months Free — Annually](https://www.firecrawl.dev/pricing)

# Turn websites into   LLM-ready data

Power your AI apps with clean data crawled

from any website. [It's also open source.](https://github.com/firecrawl/firecrawl)

Scrape

Search
New

Map

Crawl

Scrape

\*AAZ

\[ .JSON \]

```json
1[\
2  {\
3    "url": "h!9=a0--axa0pAe.-o-",\
4    "markdown
------
We just raised our Series A and shipped Firecrawl /v2 🎉. [Read the blog.](https://www.firecrawl.dev/blog/firecrawl-v2-series-a-announcement)

//

Transparent

//

## Flexible pricing

Explore transparent pricing built for real-world scraping.Start for free, then scale as you grow.

Monthly

Annual20% off

Free Plan

A lightweight way to try scraping.

No cost, no card, no hassle.

500 credits

$0

one-time

Get started

Scrape500pages

2concurrent requests

Low rate limits

Hobby

Great 

## **Search**

In [33]:
# Search example
search_results = fc.v2.search(
    query="latest AI research papers 2025",
    limit=5,
    sources=["web", "news"],  # optionally add "images"
    # scrapeOptions={"formats": ["markdown"], "onlyMainContent": True}
)

print(search_results)

web=[SearchResultWeb(url='https://arxiv.org/list/cs.AI/current', title='Artificial Intelligence Aug 2025 - arXiv', description='Artificial Intelligence. Authors and titles for August 2025. Total of 3519 entries : 1-50 51-100 101-150 151-200 ...', category=None), SearchResultWeb(url='https://magazine.sebastianraschka.com/p/llm-research-papers-2025-list-one', title='LLM Research Papers: The 2025 List (January to June) - Ahead of AI', description='LLM Research Papers: The 2025 List (January to June) · 1. Reasoning Models · 2. Other Reinforcement Learning Methods for LLMs · 3. Other Inference- ...', category=None), SearchResultWeb(url='https://www.reddit.com/r/MachineLearning/comments/1i39iuh/d_recommendations_of_noteworthy_ai_papers_for/', title='[D] Recommendations of noteworthy AI papers for starters in 2025', description="I'm devising up a list of papers to recommend students just starting out in compsci. What are some must-read papers to give that is not too deep?", category=None), Se
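
The search result groups hits by source, with the `web` list visible in the output above. The sketch below flattens such an object into `(source, url, title)` rows. It uses `SimpleNamespace` stand-ins rather than real SDK objects, and it assumes that a `news` attribute, when present, holds items with the same `url` and `title` fields; that assumption is not confirmed by the output shown.

```python
from types import SimpleNamespace


def flatten_results(results, sources=("web", "news")):
    """Collect (source, url, title) tuples from each source list that is present."""
    rows = []
    for source in sources:
        # A missing or None attribute is treated as an empty list.
        for item in getattr(results, source, None) or []:
            rows.append((source, item.url, getattr(item, "title", None)))
    return rows


# Hypothetical stand-in shaped like the printed output above.
fake = SimpleNamespace(
    web=[SimpleNamespace(
        url="https://arxiv.org/list/cs.AI/current",
        title="Artificial Intelligence Aug 2025 - arXiv",
    )],
    news=None,
)
rows = flatten_results(fake)
```

Calling `flatten_results(search_results)` on the real response would give one row per hit, which is easier to scan than the raw repr.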