curl_cffi

Python binding for curl-impersonate via cffi.

Documentation | Chinese README (中文) | Discuss on Telegram

Unlike other pure-Python HTTP clients such as httpx or requests, curl_cffi can impersonate browsers' TLS signatures or JA3 fingerprints. If a website blocks you for no obvious reason, give this package a try.

Features

Supports JA3/TLS and HTTP/2 fingerprint impersonation.
Much faster than requests/httpx and on par with aiohttp/pycurl; see benchmarks.
Mimics the requests API, so there is no need to learn another one.
Pre-compiled, so you don't have to compile on your machine.
Supports asyncio with proxy rotation on each request.
Supports HTTP/2, which requests does not.
Supports WebSocket.

library	requests	aiohttp	httpx	pycurl	curl_cffi
http2	❌	❌	✅	✅	✅
sync	✅	❌	✅	✅	✅
async	❌	✅	✅	❌	✅
websocket	❌	✅	❌	❌	✅
fingerprints	❌	❌	❌	❌	✅
speed	🐇	🐇🐇	🐇	🐇🐇	🐇🐇

Install

pip install curl_cffi --upgrade

This should work on Linux, macOS and Windows out of the box. If it does not work on your platform, you may need to compile and install curl-impersonate first and set some environment variables such as LD_LIBRARY_PATH.

To install beta releases:

pip install curl_cffi --upgrade --pre

To install the unstable version from GitHub:

git clone https://github.com/yifeikong/curl_cffi/
cd curl_cffi
make preprocess
pip install .

Usage

Use the latest impersonate version available; do NOT copy chrome110 from these examples unchanged.

requests-like

from curl_cffi import requests

# Notice the impersonate parameter
r = requests.get("https://tls.browserleaks.com/json", impersonate="chrome110")

print(r.json())
# output: {..., "ja3n_hash": "aa56c057ad164ec4fdcb7a5a283be9fc", ...}
# the ja3n fingerprint should be the same as the target browser's

# http/socks proxies are supported
proxies = {"https": "http://localhost:3128"}
r = requests.get("https://tls.browserleaks.com/json", impersonate="chrome110", proxies=proxies)

proxies = {"https": "socks://localhost:3128"}
r = requests.get("https://tls.browserleaks.com/json", impersonate="chrome110", proxies=proxies)
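To check whether impersonation took effect, the `ja3n_hash` field in the response can be compared with a value recorded from a real browser. The following is a minimal sketch, assuming the response body has the shape shown in the output comment above. `fingerprint_matches` is a hypothetical helper, not part of curl_cffi, and the expected hash is whatever you captured from the target browser yourself.

```python
def fingerprint_matches(payload: dict, expected_ja3n: str) -> bool:
    """Return True if the reported ja3n hash equals the expected one."""
    reported = payload.get("ja3n_hash")
    # Hashes are hex strings, so compare them case-insensitively.
    return isinstance(reported, str) and reported.lower() == expected_ja3n.lower()


# Sample payload reusing the hash from the output comment above.
sample = {"ja3n_hash": "aa56c057ad164ec4fdcb7a5a283be9fc"}
print(fingerprint_matches(sample, "AA56C057AD164EC4FDCB7A5A283BE9FC"))
```

With a live request, `r.json()` would be passed as `payload`.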

Sessions

# sessions are supported
s = requests.Session()
# httpbin is an HTTP test website
s.get("https://httpbin.org/cookies/set/foo/bar")
print(s.cookies)
# <Cookies[<Cookie foo=bar for httpbin.org />]>
r = s.get("https://httpbin.org/cookies")
print(r.json())
# {'cookies': {'foo': 'bar'}}

Supported impersonate versions, as provided by my fork of curl-impersonate, are listed below. Only Chrome-like browsers are supported; Firefox support is tracked in #59.

chrome99
chrome100
chrome101
chrome104
chrome107
chrome110
chrome116 ^[1]
chrome119 ^[1]
chrome120 ^[1]
chrome99_android
edge99
edge101
safari15_3 ^[2]
safari15_5 ^[2]
safari17_0 ^[1]
safari17_2_ios ^[1]

Notes:

[1] Added in version 0.6.0.
[2] Fixed in version 0.6.0; previous HTTP/2 fingerprints were not correct.
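Since the advice is to use the latest impersonate version, the choice can be made programmatically rather than hard-coded. The sketch below is one possible approach, assuming the target names follow the `browser` + version pattern seen in the list. `TARGETS` is copied by hand from that list and `latest_target` is a hypothetical helper, not a curl_cffi API.

```python
import re

# Target names copied from the list above.
TARGETS = [
    "chrome99", "chrome100", "chrome101", "chrome104", "chrome107",
    "chrome110", "chrome116", "chrome119", "chrome120",
    "chrome99_android", "edge99", "edge101",
    "safari15_3", "safari15_5", "safari17_0", "safari17_2_ios",
]


def latest_target(browser: str, targets=TARGETS) -> str:
    """Return the highest-versioned target for `browser` without a platform suffix."""
    # Matches e.g. "chrome120" or "safari15_5", but skips suffixed names
    # such as "chrome99_android" or "safari17_2_ios".
    pattern = re.compile(rf"^{re.escape(browser)}(\d+)(?:_(\d+))?$")
    versions = []
    for name in targets:
        m = pattern.match(name)
        if m:
            major, minor = int(m.group(1)), int(m.group(2) or 0)
            versions.append(((major, minor), name))
    if not versions:
        raise ValueError(f"no target found for {browser!r}")
    # Tuples compare numerically, so chrome120 sorts above chrome99.
    return max(versions)[1]


print(latest_target("chrome"))  # chrome120
print(latest_target("safari"))  # safari17_0
```

The result can then be passed as the `impersonate` argument. The list still has to be kept in sync with the installed curl_cffi version by hand.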

asyncio

import asyncio
from curl_cffi.requests import AsyncSession

async def main():
    async with AsyncSession() as s:
        r = await s.get("https://example.com")

asyncio.run(main())

More concurrency:

import asyncio
from curl_cffi.requests import AsyncSession

urls = [
    "https://google.com/",
    "https://facebook.com/",
    "https://twitter.com/",
]

async def main():
    async with AsyncSession() as s:
        # Create all requests first, then wait for them together
        tasks = []
        for url in urls:
            task = s.get(url)
            tasks.append(task)
        results = await asyncio.gather(*tasks)

asyncio.run(main())
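The feature list mentions proxy rotation on each request. One way to prepare for it is to pair every URL with its own `proxies` dict before the requests are issued. This is a minimal sketch using only the standard library. `assign_proxies` is a hypothetical helper, and the proxy addresses are placeholders.

```python
from itertools import cycle


def assign_proxies(urls, proxy_urls):
    """Pair each URL with a proxies dict, cycling through proxy_urls."""
    if not proxy_urls:
        raise ValueError("at least one proxy URL is required")
    rotation = cycle(proxy_urls)
    # Each request gets the next proxy in round-robin order.
    return [(url, {"https": next(rotation)}) for url in urls]


# Placeholder proxy addresses, for illustration only.
plan = assign_proxies(
    ["https://google.com/", "https://facebook.com/", "https://twitter.com/"],
    ["http://localhost:3128", "http://localhost:3129"],
)
for url, proxies in plan:
    print(url, proxies["https"])
```

Inside the `AsyncSession` block, each pair could then become `s.get(url, proxies=proxies)`, assuming the per-request `proxies` argument behaves for async sessions as it does in the requests-like examples.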

WebSockets

from curl_cffi.requests import Session, WebSocket

def on_message(ws: WebSocket, message):
    print(message)

with Session() as s:
    ws = s.ws_connect(
        "wss://api.gemini.com/v1/marketdata/BTCUSD",
        on_message=on_message,
    )
    ws.run_forever()

curl-like

Alternatively, you can use the low-level curl-like API:

from curl_cffi import Curl, CurlOpt
from io import BytesIO

buffer = BytesIO()
c = Curl()
c.setopt(CurlOpt.URL, b'https://tls.browserleaks.com/json')
c.setopt(CurlOpt.WRITEDATA, buffer)

c.impersonate("chrome110")

c.perform()
c.close()
body = buffer.getvalue()
print(body.decode())

See the docs for more details.

scrapy

If you are using scrapy, check out these middlewares:

Acknowledgement

Originally forked from multippt/python_curl_cffi, which is under the MIT license.
Headers/Cookies files are copied from httpx, which is under the BSD license.
Asyncio support is inspired by Tornado's curl http client.
The WebSocket API is inspired by websocket_client.

[Sponsor] Bypass Cloudflare with API

Yescaptcha is a proxy service that bypasses Cloudflare and uses the API interface to obtain verified cookies (e.g. cf_clearance). Click here to register: https://yescaptcha.com/i/stfnIO

[Sponsor] ScrapeNinja

ScrapeNinja is a web scraping API with two engines: fast, with high performance and TLS fingerprint; and slower with a real browser under the hood.

ScrapeNinja handles headless browsers, proxies, timeouts, retries, and helps with data extraction, so you can just get the data in JSON. Rotating proxies are available out of the box on all subscription plans.
