71 commits
feafc3f  Create client.ts (NadavToledo1, Oct 28, 2025)
292df2d  Add files via upload (NadavToledo1, Oct 28, 2025)
cc78cc0  Delete src/client.ts (NadavToledo1, Oct 28, 2025)
94cce09  Create tst (NadavToledo1, Oct 28, 2025)
db06632  Add files via upload (NadavToledo1, Oct 28, 2025)
b5a6625  Delete src/api/tst (NadavToledo1, Oct 28, 2025)
bf6605d  Create tst (NadavToledo1, Oct 28, 2025)
9d17b0d  Create tst (NadavToledo1, Oct 28, 2025)
cbb11e4  Add files via upload (NadavToledo1, Oct 28, 2025)
c590d49  Create tst (NadavToledo1, Oct 28, 2025)
baea616  Add files via upload (NadavToledo1, Oct 28, 2025)
468a96f  Delete src/.github/tst (NadavToledo1, Oct 28, 2025)
f469f81  Create tst (NadavToledo1, Oct 28, 2025)
a6a8e5a  Add files via upload (NadavToledo1, Oct 28, 2025)
35e632a  Delete tests/tst (NadavToledo1, Oct 28, 2025)
b54f9f0  Create tst (NadavToledo1, Oct 28, 2025)
6d625f4  Add files via upload (NadavToledo1, Oct 28, 2025)
3407ff9  Delete src/.github/workflows directory (NadavToledo1, Oct 28, 2025)
e7a5e31  Delete .github/workflows/tst (NadavToledo1, Oct 28, 2025)
35f434b  Create tst (NadavToledo1, Oct 28, 2025)
916467a  Add files via upload (NadavToledo1, Oct 28, 2025)
cf23801  Create tst (NadavToledo1, Oct 28, 2025)
50f548a  Create tst (NadavToledo1, Oct 28, 2025)
a714d3b  Add files via upload (NadavToledo1, Oct 28, 2025)
c89460c  Delete examples/tst (NadavToledo1, Oct 28, 2025)
1feb6a7  Delete src/schemas/tst (NadavToledo1, Oct 28, 2025)
270c3de  Create tst (NadavToledo1, Oct 28, 2025)
e60e6e5  Update README.md (NadavToledo1, Oct 28, 2025)
5e39aed  Update README.md (NadavToledo1, Oct 28, 2025)
51ca070  Update client.py (NadavToledo1, Oct 28, 2025)
89f7a66  Update client.py (NadavToledo1, Oct 28, 2025)
93b7e65  Update client.py (NadavToledo1, Oct 28, 2025)
e06a3c8  Update client.py (NadavToledo1, Oct 28, 2025)
c240a96  Update client.py (NadavToledo1, Oct 28, 2025)
816985a  Update client.py (NadavToledo1, Oct 28, 2025)
fb6ec7c  Update client.py (NadavToledo1, Oct 28, 2025)
47eb72a  Update client.py (NadavToledo1, Oct 28, 2025)
c938723  Update client.py (NadavToledo1, Oct 28, 2025)
b480df3  Update client.py (NadavToledo1, Oct 28, 2025)
10632e0  Update README.md (NadavToledo1, Oct 28, 2025)
fcafb41  Create tst (NadavToledo1, Oct 29, 2025)
cf040c6  Add files via upload (NadavToledo1, Oct 29, 2025)
50969f5  Delete src/exceptions/tst (NadavToledo1, Oct 29, 2025)
a965fa8  Add files via upload (NadavToledo1, Oct 29, 2025)
9d368a1  Add files via upload (NadavToledo1, Oct 29, 2025)
99f3b79  Update client.py (NadavToledo1, Oct 29, 2025)
58b6d3e  Update README.md (NadavToledo1, Oct 29, 2025)
23f9f30  Update README.md (NadavToledo1, Oct 29, 2025)
8378083  Delete tests/__init__.py (NadavToledo1, Oct 29, 2025)
3675229  Update test_client.py (NadavToledo1, Oct 29, 2025)
b96e809  Update client.py (NadavToledo1, Oct 29, 2025)
65d11b6  Add files via upload (NadavToledo1, Oct 29, 2025)
d238355  Update client.py (NadavToledo1, Oct 30, 2025)
eef631c  Update client.py (NadavToledo1, Oct 30, 2025)
23216b4  Update test_client.py (NadavToledo1, Oct 30, 2025)
8b3bb16  Update README.md (NadavToledo1, Oct 30, 2025)
4e376a0  Update setup.py (NadavToledo1, Oct 30, 2025)
db9aee3  Update client.py (NadavToledo1, Oct 30, 2025)
96ae527  Update client.py (NadavToledo1, Oct 30, 2025)
b60f994  Update client.py (NadavToledo1, Oct 30, 2025)
6929beb  Create search.py (NadavToledo1, Nov 2, 2025)
b7f7676  Update client.py (NadavToledo1, Nov 2, 2025)
d691094  Update client.py (NadavToledo1, Nov 2, 2025)
a3b8830  Update pyproject.toml (NadavToledo1, Nov 2, 2025)
f0fbefd  Update setup.py (NadavToledo1, Nov 2, 2025)
4114c22  Update search.py (NadavToledo1, Nov 2, 2025)
b868ca3  Update client.py (NadavToledo1, Nov 2, 2025)
fc79ded  Update client.py (NadavToledo1, Nov 3, 2025)
4857448  Update client.py (NadavToledo1, Nov 3, 2025)
db8c569  Update search.py (NadavToledo1, Nov 3, 2025)
4fb3cd3  Add Search namespace + GPT search support (NadavToledo1, Nov 3, 2025)
71 changes: 71 additions & 0 deletions README.md
@@ -1,7 +1,11 @@

<img width="1300" height="200" alt="sdk-banner(1)" src="https://github.com/user-attachments/assets/c4a7857e-10dd-420b-947a-ed2ea5825cb8" />

<h3 align="center">Python SDK by Bright Data: easy-to-use, scalable methods for web search & scraping</h3>
<p></p>

## Installation
@@ -10,11 +14,19 @@ To install the package, open your terminal:
```python
pip install brightdata-sdk
```
> If using macOS, first open a virtual environment for your project.

## Quick Start

Create a [Bright Data](https://brightdata.com/cp/setting/) account and copy your API key.

### Initialize the Client

@@ -25,7 +37,11 @@ client = bdclient(api_token="your_api_token_here") # can also be defined as BRIG
```

### Launch your first request
Add a SERP function to your code:
```python
results = client.search("best selling shoes")

@@ -38,10 +54,17 @@ print(client.parse_content(results))

| Feature | Functions | Description
|--------------------------|-----------------------------|-------------------------------------
| **Scrape any website** | `scrape` | Scrape any website using Bright Data's scraping and anti-bot-detection capabilities
| **Web search (SERP)** | `search` | Search Google and other search engines by query (supports batch searches)
| **Web crawling** | `crawl` | Discover and scrape multiple pages from websites with advanced filtering and depth control
| **AI extraction** | `extract` | Extract specific information from websites using natural language queries and OpenAI
| **Content parsing** | `parse_content` | Extract text, links, images and structured data from API responses (JSON or HTML)
| **Browser automation** | `connect_browser` | Get WebSocket endpoint for Playwright/Selenium integration with Bright Data's scraping browser
| **Search ChatGPT** | `search_chatGPT` | Prompt ChatGPT and scrape its answers; supports multiple inputs and follow-up prompts
@@ -51,11 +74,19 @@ print(client.parse_content(results))
| **Client class** | `bdclient` | Handles authentication, automatic zone creation and management, and options for robust error handling
| **Parallel processing** | **all functions** | All functions use concurrent processing for multiple URLs or queries, and support multiple output formats

### Try using one of the functions

#### `search()`
```python
# Single query search
result = client.search("pizza restaurants")

# Try using multiple queries (parallel processing), with custom configuration
@@ -69,7 +100,11 @@ results = client.search(
```
#### `scrape()`
```python
# Single URL scrape
result = client.scrape("https://example.com")

# Multiple URLs (parallel processing) with custom options
@@ -83,13 +118,30 @@ results = client.scrape(
```
#### `search_gpt()`
```python
# Sync mode (immediate result)
result = client.search_gpt(
    prompt="Top startups in Tel Aviv",
    country="IL",
    web_search=True
)
print(result)

# Async mode (retrieve snapshot later)
result = client.search_gpt(
    prompt="Top startups in 2024",
    sync=False
)
print(result["snapshot_id"])
```
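In async mode, the returned `snapshot_id` can be fetched later with `download_snapshot` (a minimal sketch based on the snippet above; readiness polling and error handling are omitted):

```python
# Retrieve the async GPT search result once the snapshot is ready
snapshot_id = result["snapshot_id"]    # from the async call above
client.download_snapshot(snapshot_id)  # saves the snapshot content locally
```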

#### `search_linkedin`
@@ -125,7 +177,11 @@ print(results) # will print the snapshot_id, which can be downloaded using the d
result = client.crawl(
url="https://example.com/",
depth=2,
include_filter="/product/", # Only crawl URLs containing "/product/"
exclude_filter="/ads/", # Exclude URLs containing "/ads/"
custom_output_fields=["markdown", "url", "page_title"]
)
@@ -202,12 +258,23 @@ client.download_content(data)
```
**`download_snapshot`** (for async requests)
```python
# Save this function to a separate file
snapshot_id = "your_snapshot_id_here"  # <-- Replace with your actual snapshot ID
client.download_snapshot(snapshot_id)
```

> [!TIP]
> Hover over `search` or any other function in the package to see all its available parameters.

![Hover-Over1](https://github.com/user-attachments/assets/51324485-5769-48d5-8f13-0b534385142e)

@@ -251,7 +318,11 @@ Discover and scrape multiple pages from websites with advanced filtering.
- `url`: Single URL string or list of URLs to crawl (required)
- `ignore_sitemap`: Ignore sitemap when crawling (optional)
- `depth`: Maximum crawl depth relative to entered URL (optional)
- `include_filter`: Regex to include only certain URLs (e.g. "/product/")
- `exclude_filter`: Regex to exclude certain URLs (e.g. "/ads/")
- `custom_output_fields`: List of output fields to include (optional)
- `include_errors`: Include errors in response (default: True)
5 changes: 4 additions & 1 deletion pyproject.toml
@@ -60,6 +60,9 @@ Repository = "https://github.com/brightdata/bright-data-sdk-python"
"Bug Reports" = "https://github.com/brightdata/bright-data-sdk-python/issues"
Changelog = "https://github.com/brightdata/bright-data-sdk-python/blob/main/CHANGELOG.md"

[tool.setuptools]
package-dir = {"" = "src"}

[tool.setuptools.packages.find]
include = ["brightdata*"]
exclude = ["tests*"]
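Together, the `package-dir = {"" = "src"}` mapping above and the `include = ["brightdata*"]` filter declare a src layout: setuptools resolves the `brightdata` packages from the `src/` directory rather than the repository root.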
@@ -134,4 +137,4 @@ filterwarnings = [
"error",
"ignore::UserWarning",
"ignore::DeprecationWarning",
]
]
59 changes: 58 additions & 1 deletion setup.py
@@ -1,3 +1,4 @@
from setuptools import setup, find_packages

setup(
    name="brightdata-sdk",
    version="1.1.3",
    description="Python SDK for Bright Data Web Scraping and SERP APIs",
    author="Bright Data",
    author_email="support@brightdata.com",
    maintainer="Bright Data",
    maintainer_email="idanv@brightdata.com",
    license="MIT",
    packages=find_packages(where="src"),
    package_dir={"": "src"},
    include_package_data=True,
    python_requires=">=3.8",
    install_requires=[
        "requests>=2.25.0",
        "python-dotenv>=0.19.0",
        "aiohttp>=3.8.0",
        "beautifulsoup4>=4.9.0",
        "openai>=1.0.0",
    ],
    extras_require={
        "dev": [
@@ -59,6 +83,7 @@ def read_version():
"black>=21.0.0",
"isort>=5.0.0",
"flake8>=3.8.0",
<<<<<<< HEAD
],
},
keywords="brightdata, web scraping, proxy, serp, api, data extraction",
Expand All @@ -67,4 +92,36 @@ def read_version():
"Documentation": "https://github.com/brightdata/brightdata-sdk-python#readme",
"Source": "https://github.com/brightdata/brightdata-sdk-python",
},
)
)
=======
"mypy>=0.900",
],
"test": [
"pytest>=6.0.0",
"pytest-cov>=2.10.0",
],
},
classifiers=[
"Development Status :: 4 - Beta",
"Intended Audience :: Developers",
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Topic :: Internet :: WWW/HTTP",
"Topic :: Software Development :: Libraries :: Python Modules",
"Topic :: Internet :: WWW/HTTP :: Indexing/Search",
],
project_urls={
"Homepage": "https://github.com/brightdata/bright-data-sdk-python",
"Documentation": "https://github.com/brightdata/bright-data-sdk-python#readme",
"Repository": "https://github.com/brightdata/bright-data-sdk-python",
"Bug Reports": "https://github.com/brightdata/bright-data-sdk-python/issues",
"Changelog": "https://github.com/brightdata/bright-data-sdk-python/blob/main/CHANGELOG.md",
},
)
>>>>>>> mywork/main
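With the `dev` and `test` extras declared above, an editable development install would typically be `pip install -e ".[dev,test]"` run from the repository root (inferred from the metadata shown here, not a command documented in this PR).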
82 changes: 82 additions & 0 deletions src/__init__.py
@@ -0,0 +1,82 @@
"""
## Bright Data SDK for Python

A comprehensive SDK for Bright Data's Web Scraping and SERP APIs, providing
easy-to-use methods for web scraping, search engine result parsing, and data management.
## Functions:
First import the package and create a client:
```python
from brightdata import bdclient
client = bdclient("your-api-key")
```
Then use the client to call the desired functions:
#### scrape()
- Scrapes a website using the Bright Data Web Unlocker API with proxy support (or multiple websites concurrently)
- syntax: `results = client.scrape(url, country, max_workers, ...)`
#### `scrape_linkedin` class
- Scrapes LinkedIn data including posts, jobs, companies, and profiles; receives structured data as a result
- syntax: `results = client.scrape_linkedin.posts()/jobs()/companies()/profiles() # insert parameters per function`
#### search()
- Performs web searches using the Bright Data SERP API with customizable search engines (or multiple search queries concurrently)
- syntax: `results = client.search(query, search_engine, country, ...)`
#### `search_linkedin` class
- Searches LinkedIn for specific posts, jobs, and profiles; receives the relevant data as a result
- syntax: `results = client.search_linkedin.posts()/jobs()/profiles() # insert parameters per function`
#### search_chatGPT()
- Interact with ChatGPT using Bright Data's ChatGPT API, sending prompts and receiving responses
- syntax: `results = client.search_chatGPT(prompt, additional_prompt, max_workers, ...)`
#### download_content() / download_snapshot()
- Saves the scraped content to local files in various formats (JSON, CSV, etc.)
- syntax: `client.download_content(results)`
- syntax: `client.download_snapshot(snapshot_id)`
#### connect_browser()
- Get WebSocket endpoint for connecting to Bright Data's scraping browser with Playwright/Selenium
- syntax: `endpoint_url = client.connect_browser()` then use with browser automation tools
#### crawl()
- Crawl websites to discover and scrape multiple pages using Bright Data's Web Crawl API
- syntax: `result = client.crawl(url, include_filter, exclude_filter, depth, ...)`
#### parse_content()
- Parse and extract useful information from API responses (JSON or HTML)
- syntax: `parsed = client.parse_content(data, extract_text=True, extract_links=True)`

### Features:
- Web Scraping: Scrape websites using Bright Data Web Unlocker API with proxy support
- Search Engine Results: Perform web searches using Bright Data SERP API
- Web Crawling: Discover and scrape multiple pages from websites with advanced filtering
- Content Parsing: Extract text, links, images, and structured data from API responses
- Browser Automation: Simple authentication for Bright Data's scraping browser with Playwright/Selenium
- Multiple Search Engines: Support for Google, Bing, and Yandex
- Parallel Processing: Concurrent processing for multiple URLs or queries
- Robust Error Handling: Comprehensive error handling with retry logic
- Input Validation: Automatic validation of URLs, zone names, and parameters
- Zone Management: Automatic zone creation and management
- Multiple Output Formats: JSON, raw HTML, markdown, and more
"""

from .client import bdclient
from .exceptions import (
    BrightDataError,
    ValidationError,
    AuthenticationError,
    ZoneError,
    NetworkError,
    APIError
)
from .utils import parse_content, parse_multiple, extract_structured_data

__version__ = "1.1.3"
__author__ = "Bright Data"
__email__ = "support@brightdata.com"

__all__ = [
    'bdclient',
    'BrightDataError',
    'ValidationError',
    'AuthenticationError',
    'ZoneError',
    'NetworkError',
    'APIError',
    'parse_content',
    'parse_multiple',
    'extract_structured_data'
]
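Taken together, these exports support a small end-to-end flow like the following (a minimal sketch: the token is a placeholder, and treating `BrightDataError` as the umbrella exception type is an assumption not confirmed by this diff):

```python
from brightdata import bdclient, BrightDataError

client = bdclient(api_token="your_api_token_here")  # placeholder token

try:
    results = client.search("best selling shoes")
    print(client.parse_content(results))
except BrightDataError as err:  # assumed umbrella exception; see exports above
    print(f"Bright Data request failed: {err}")
```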
13 changes: 13 additions & 0 deletions src/api/__init__.py
@@ -0,0 +1,13 @@
from .scraper import WebScraper
from .search import SearchAPI
from .chatgpt import ChatGPTAPI
from .linkedin import LinkedInAPI
from .crawl import CrawlAPI

__all__ = [
    'WebScraper',
    'SearchAPI',
    'ChatGPTAPI',
    'LinkedInAPI',
    'CrawlAPI'
]