Skip to content

Crawlora-org/crawlora-java-sdk

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Crawlora Java SDK

Maven Central License: MIT

Java client for the public Crawlora web-scraping API. It wraps every public endpoint with generated grouped helpers and a dynamic call interface, plus retries, pagination, middleware hooks, and client-side rate limiting. The published artifact has no runtime dependencies (built on the JDK's java.net.http.HttpClient with a hand-written JSON parser).

  • Base URL: https://api.crawlora.net/api/v1
  • Auth: API key (x-api-key) or JWT (Authorization)
  • JDK: 17+
  • Operation reference: docs/operations.md · recipes: docs/recipes.md

Install

Published to Maven Central under the net.crawlora namespace — no extra repository configuration needed.

Maven:

<dependency>
  <groupId>net.crawlora</groupId>
  <artifactId>crawlora-sdk</artifactId>
  <version>1.5.0-sdk.3</version>
</dependency>

Gradle:

implementation "net.crawlora:crawlora-sdk:1.5.0-sdk.3"

Quick start

import net.crawlora.CrawloraClient;
import java.util.List;
import java.util.Map;

// Reads CRAWLORA_API_KEY from the environment if apiKey(...) is omitted.
CrawloraClient client = CrawloraClient.builder()
        .apiKey(System.getenv("CRAWLORA_API_KEY"))
        .build();

@SuppressWarnings("unchecked")
Map<String, Object> result = (Map<String, Object>) client.bing().search(Map.of("q", "web scraping"));
for (Object item : (List<Object>) result.get("data")) {
    System.out.println(((Map<String, Object>) item).get("title"));
}

Responses are returned as plain Java values: Map<String,Object>, List<Object>, String, Long/Double, Boolean, or null.

Calling operations

Every group is a first-class typed accessor on the client (client.<group>().<method>(params)):

client.google().search(Map.of("q", "crawlora", "country", "US"));
client.youtube().video(Map.of("id", "dQw4w9WgXcQ"));
client.appStore().search(Map.of("term", "weather"));

You can still construct a group directly or dispatch dynamically by name:

import net.crawlora.groups.YoutubeGroup;

new YoutubeGroup(client).video(Map.of("id", "dQw4w9WgXcQ"));
client.groupOf("google").call("search", Map.of("q", "crawlora"));

CrawloraClient is AutoCloseable, so it works with try-with-resources:

try (CrawloraClient client = CrawloraClient.builder().apiKey(System.getenv("CRAWLORA_API_KEY")).build()) {
    client.bing().search(Map.of("q", "web scraping"));
}

Or call any operation dynamically by its id:

client.request("bing-search", Map.of("q", "web scraping", "page", 2), null);

// Discover operations:
net.crawlora.Operations.OPERATION_COUNT;           // total operations
net.crawlora.Operations.GROUPS.get("bing");        // {search=bing-search, ...}
net.crawlora.OperationId.BING_SEARCH;              // "bing-search"

Configuration

CrawloraClient client = CrawloraClient.builder()
        .apiKey("…")
        .timeout(30)             // seconds per request
        .retries(2)              // retry attempts on retryable failures
        .retryDelay(0.25)        // base backoff (exponential + jitter, honors Retry-After)
        .requestId(true)         // attach an x-request-id to every call
        .idempotencyKeys(true)   // stable Idempotency-Key on POST/PATCH
        .rateLimit(5)            // max requests/second (client-side)
        .maxConcurrency(4)       // max in-flight requests across threads
        .headers(Map.of("X-Tenant", "acme"))
        .build();

Per-request overrides go through RequestOptions:

import net.crawlora.RequestOptions;

client.request("bing-search", Map.of("q", "x"),
        new RequestOptions().responseType("text").timeout(5).retries(0));

Pagination

import net.crawlora.PaginateOptions;

// Numeric (page/offset) — stops on the first empty page:
for (Object review : client.paginateItems("airbnb-room-reviews",
        Map.of("id", "123"), new PaginateOptions().maxPages(5))) {
    System.out.println(((Map<String, Object>) review).get("text"));
}

// Cursor mode — supply the cursor param and a next-cursor extractor:
PaginateOptions opts = new PaginateOptions()
        .cursorParam("cursor")
        .nextCursor(page -> ((Map<String, Object>) page).get("next_cursor"));
for (Object page : client.paginate("producthunt-leaderboard", Map.of(), opts)) {
    System.out.println(((Map<String, Object>) page).get("data"));
}

paginate/paginateItems return a List; paginateStream/pageIterator/ itemIterator return a lazy Stream/Iterator.

Error handling

import net.crawlora.*;

try {
    client.bing().search(Map.of("q", "x"));
} catch (CrawloraClientError e) {     // 4xx
    System.err.println("rejected (" + e.getStatus() + "): " + e.getMessage() + " " + e.getCode());
} catch (CrawloraServerError e) {     // 5xx
    System.err.println("server error: " + e.getStatus());
} catch (CrawloraNetworkError e) {    // timeout / transport failure
    System.err.println("network: " + e.getMessage());
}

All inherit from CrawloraError, which exposes getStatus, getCode, getBody, getRawBody, getHeaders, and getRequestId.

License

MIT. See LICENSE.

About

Official Java SDK for the Crawlora web-scraping API — typed, dependency-free client with grouped & dynamic operation calls, retries, pagination, hooks, and rate limiting.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors