Researches is a Google scraper with minimal requirements.
Key designs:
- No BeautifulSoup. We want everything to run smoothly and fast, so parsing is done with selectolax instead.
- Simple API. Great developer experience is all that matters.
- Typed. Everything you see comes with type hints.
Note that researches does not clean up the data for you, which makes it better suited for LLM-based content consumption.
```python
from researches import search

search("Who invented papers?")
# Result(snippet=Snippet(…), aside=None, weather=None, web=[Web(…), …], …)
```
Requirements:
- A decent computer with an Internet connection
- Python ≥ 3.9 (dataclasses support)
- primp – 🪞 HTTP connections & fingerprint impersonation.
- selectolax – 🌯 The HTML parser.
Just start searching right away. Don't worry, Gemini won't hurt you (also Gemini).
```python
from researches import search

# Sync code
result = search(
    "US to Japan",  # query
    hl="en",        # language
    ua=None,        # custom user agent, or None to use ours
    # **kwargs,     # extra kwargs to pass to primp (optional)
)  # -> Result
```
For people who love async, we've also got you covered:
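```python
import asyncio

from researches import asearch

async def main():
    # Async code
    result = await asearch(
        "US to Japan",  # query
        hl="en",        # language
        ua=None,        # custom user agent, or None to use ours
        # **kwargs,     # extra kwargs to pass to primp (optional)
    )  # -> Result

asyncio.run(main())
```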
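If you need several queries at once, the async API pairs naturally with `asyncio.gather`. Below is a minimal sketch under a few assumptions: `search_many` and `fake_asearch` are hypothetical names (not part of researches), and the search function is passed in as a parameter, assumed to accept `hl=` like `asearch` above, so the snippet runs offline. In real use you would pass researches' `asearch` instead of the fake.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


# Hypothetical helper: `search_fn` is any async callable shaped like asearch(query, hl=...).
async def search_many(
    queries: List[str],
    search_fn: Callable[..., Awaitable[Any]],
    hl: str = "en",
    limit: int = 4,
) -> Dict[str, Any]:
    """Run several searches concurrently, with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def one(query: str) -> Any:
        async with semaphore:  # cap concurrency so we don't hammer Google
            return await search_fn(query, hl=hl)

    results = await asyncio.gather(*(one(q) for q in queries))
    # gather preserves input order, so zip pairs each query with its own result
    return dict(zip(queries, results))


# Hypothetical offline stand-in so this snippet runs without network access.
async def fake_asearch(query: str, hl: str = "en") -> str:
    await asyncio.sleep(0)
    return f"Result for {query!r} ({hl})"


print(asyncio.run(search_many(["US to Japan", "Who invented papers?"], fake_asearch)))
```

A semaphore keeps the fan-out bounded, which matters more for a scraper than raw speed. Note that duplicate queries collapse into one dictionary key.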
So, what does the `Result` class have to offer? At a glance (`?` marks fields that may be `None`):
```
result.snippet?
  ⤷ .text: str
  ⤷ .name: str?
result.aside?
  ⤷ .text: str
result.weather?
  ⤷ .c: str
  ⤷ .f: str
  ⤷ .precipitation: str
  ⤷ .humidty: str
  ⤷ .wind_metric: str
  ⤷ .wind_imperial: str
  ⤷ .description: str
  ⤷ .forecast: PartialWeatherForReport[]
      ⤷ .weekday: str
      ⤷ .high_c: str
      ⤷ .low_c: str
      ⤷ .high_f: str
      ⤷ .low_f: str
result.web: Web[]
  ⤷ .title: str
  ⤷ .url: str
  ⤷ .text: str
result.flights: Flight[]
  ⤷ .title: str
  ⤷ .description: str
  ⤷ .duration: str
  ⤷ .price: str
result.lyrics?
  ⤷ .text: str
  ⤷ .is_partial: bool
```
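Since fields marked `?` can be `None` and researches leaves the data uncleaned, you will usually flatten a `Result` yourself before handing it to an LLM. Here is a minimal sketch. The `Snippet`, `Aside`, `Web`, and `Result` dataclasses are hypothetical stand-ins that mirror a subset of the fields listed above, so the snippet runs on its own; `to_prompt` is a hypothetical helper that only touches the attribute names shown in the tree, so it should also accept the real objects if those names match.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-ins mirroring a subset of the documented Result shape.
@dataclass
class Snippet:
    text: str
    name: Optional[str] = None


@dataclass
class Aside:
    text: str


@dataclass
class Web:
    title: str
    url: str
    text: str


@dataclass
class Result:
    snippet: Optional[Snippet] = None
    aside: Optional[Aside] = None
    web: List[Web] = field(default_factory=list)


def to_prompt(result: Result, max_links: int = 3) -> str:
    """Flatten a Result into plain text for an LLM prompt, skipping missing optional fields."""
    parts: List[str] = []
    if result.snippet is not None:  # fields marked `?` may be None
        heading = f"Snippet ({result.snippet.name})" if result.snippet.name else "Snippet"
        parts.append(f"{heading}: {result.snippet.text}")
    if result.aside is not None:
        parts.append(f"Aside: {result.aside.text}")
    # Keep only the first few links to save context window space
    for i, web in enumerate(result.web[:max_links], start=1):
        parts.append(f"[{i}] {web.title} <{web.url}>\n{web.text}")
    return "\n\n".join(parts)
```

Checking `is not None` explicitly, rather than relying on truthiness, keeps the helper honest about which fields the type tree marks as optional.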
Data comes in different shapes and sizes, and Google plays this game extremely well. That includes randomizing CSS class names, which makes scraping data almost impossible.
Google sucks, but it's still the knowledge base we all need. For instance, there are these types of result pages:
- Links – What made Google "Google." Or, `&udm=14`.
- Weather – Weather forecast.
- Wikipedia (aside) – Wikipedia text.
- Flights – Flights.
- Lyrics – Both full and partial lyrics. (unstable)
...and many more. (Contribute!)
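The `&udm=14` above is a Google search URL parameter that requests the links-only "Web" results view. The sketch below, using only the standard library, shows what such a URL looks like. `google_search_url` is a hypothetical helper for illustration and is not necessarily how researches builds its requests.

```python
from urllib.parse import urlencode


def google_search_url(query: str, hl: str = "en", links_only: bool = False) -> str:
    """Build a Google search URL; links_only adds udm=14 (the plain "Web" results view)."""
    params = {"q": query, "hl": hl}
    if links_only:
        params["udm"] = "14"
    # urlencode escapes the query, turning spaces into "+"
    return "https://www.google.com/search?" + urlencode(params)


print(google_search_url("US to Japan", links_only=True))
```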
Scraper APIs out there are hella expensive, and ain't no way I'm paying or entering their free tier. So, I made my own that's perfect for extracting data with LLMs.
If you're looking for something other than Google, or something more general-purpose, check these out:
- air_web – A lightweight package for crawling with minimal code.
- ddginternal – A simple DuckDuckGo scraper.
(c) 2024 AWeirdDev, sus2790, and other silly people