A simple, dependency-free Java client for WebCrawlerAPI that can be copy-pasted into any Java 17+ project.
- Zero Dependencies: Uses only standard Java HTTP client (no external libraries required)
- Simple Integration: Just copy `WebCrawlerAPI.java` into your project
- Complete API Coverage: Supports `crawl()`, `scrape()`, and `scrapeAsync()` methods
- Java 17+ Compatible: Works with Java 17 and newer versions (tested on Java 8+)
- No Build Tools Required: Can be compiled and run with just `javac` and `java`
- Fully Tested: 50+ unit tests, all passing ✅
Copy WebCrawlerAPI.java into your project's source directory.
// Create a client
WebCrawlerAPI client = new WebCrawlerAPI("your-api-key");
// Crawl a website
WebCrawlerAPI.CrawlResult result = client.crawl("https://example.com", "markdown", 10);
System.out.println("Found " + result.items.size() + " items");
// Scrape a single page
WebCrawlerAPI.ScrapeResult scrape = client.scrape("https://example.com", "markdown");
System.out.println("Content: " + scrape.content);The example/ directory contains a complete working example.
- Java 17 or newer
- A WebCrawlerAPI key (get one at https://webcrawlerapi.com)
- Copy the SDK file into the example directory:
cd example/src
cp ../../WebCrawlerAPI.java .
- Compile the example (run this from the example/ directory):
javac src/*.java -d bin
- Run the example:
# With production API
API_KEY=your-api-key java -cp bin Example
# With local development API
API_KEY=test-api-key API_BASE_URL=http://localhost:8080 java -cp bin Example

Or, all in one go from inside example/src:
cd example/src
cp ../../WebCrawlerAPI.java .
javac Example.java WebCrawlerAPI.java
API_KEY=your-api-key java Example

Constructors:
// Default constructor (uses https://api.webcrawlerapi.com)
WebCrawlerAPI client = new WebCrawlerAPI(String apiKey);
// Constructor with custom base URL (for testing)
WebCrawlerAPI client = new WebCrawlerAPI(String apiKey, String baseUrl);

Crawl a website and return all discovered pages.
CrawlResult crawl(String url, String scrapeType, int itemsLimit)
CrawlResult crawl(String url, String scrapeType, int itemsLimit, int maxPolls)

Parameters:
- `url` - The URL to crawl
- `scrapeType` - Type of content to extract: `"html"`, `"cleaned"`, or `"markdown"`
- `itemsLimit` - Maximum number of pages to crawl
- `maxPolls` - (Optional) Maximum polling attempts (default: 100)
Returns: CrawlResult containing job details and crawled items
Example:
CrawlResult result = client.crawl("https://example.com", "markdown", 10);
for (CrawlItem item : result.items) {
System.out.println("URL: " + item.url);
System.out.println("Content URL: " + item.getContentUrl("markdown"));
}

Scrape a single page synchronously (waits for completion).
ScrapeResult scrape(String url, String scrapeType)
ScrapeResult scrape(String url, String scrapeType, int maxPolls)

Parameters:
- `url` - The URL to scrape
- `scrapeType` - Type of content to extract: `"html"`, `"cleaned"`, or `"markdown"`
- `maxPolls` - (Optional) Maximum polling attempts (default: 100)
Returns: ScrapeResult containing the scraped content
Example:
ScrapeResult result = client.scrape("https://example.com", "markdown");
if ("done".equals(result.status)) {
System.out.println("Content: " + result.content);
}

Start a scrape job asynchronously (returns immediately).
String scrapeAsync(String url, String scrapeType)

Parameters:
- `url` - The URL to scrape
- `scrapeType` - Type of content to extract: `"html"`, `"cleaned"`, or `"markdown"`
Returns: Scrape ID (String) that can be used with getScrape()
Example:
String scrapeId = client.scrapeAsync("https://example.com", "html");
System.out.println("Scrape started: " + scrapeId);
// Later, check the status
ScrapeResult result = client.getScrape(scrapeId);

Get the status and result of a scrape job.
ScrapeResult getScrape(String scrapeId)

Parameters:
- `scrapeId` - The scrape ID returned from `scrapeAsync()`
Returns: ScrapeResult with current status and content (if done)
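
For asynchronous scrapes you can poll `getScrape()` until the job leaves `"in_progress"`. Below is a minimal sketch; the 2-second delay and 30-attempt cap are arbitrary choices for illustration, not SDK defaults:

```java
public class PollScrapeExample {
    public static void main(String[] args) throws Exception {
        WebCrawlerAPI client = new WebCrawlerAPI(System.getenv("API_KEY"));

        // Start the job and remember its ID
        String scrapeId = client.scrapeAsync("https://example.com", "markdown");

        // Poll until the scrape finishes or the (arbitrary) attempt cap is reached
        WebCrawlerAPI.ScrapeResult result = client.getScrape(scrapeId);
        int attempts = 0;
        while ("in_progress".equals(result.status) && attempts < 30) {
            Thread.sleep(2_000);
            result = client.getScrape(scrapeId);
            attempts++;
        }

        if ("done".equals(result.status)) {
            System.out.println(result.content);
        } else {
            System.err.println("Scrape did not finish, status: " + result.status);
        }
    }
}
```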
The result and item types are plain data classes:

public class CrawlResult {
public String id; // Job ID
public String status; // Job status: "new", "in_progress", "done", "error"
public String url; // Original URL
public String scrapeType; // Scrape type used
public int recommendedPullDelayMs; // Recommended delay between polls
public List<CrawlItem> items; // List of crawled items
}

public class CrawlItem {
public String url; // Page URL
public String status; // Item status
public String rawContentUrl; // URL to raw HTML content
public String cleanedContentUrl; // URL to cleaned content
public String markdownContentUrl; // URL to markdown content
// Helper method to get content URL based on scrape type
public String getContentUrl(String scrapeType);
}

public class ScrapeResult {
public String status; // Scrape status: "in_progress", "done", "error"
public String content; // Scraped content (based on scrape_type)
public String html; // Raw HTML content
public String markdown; // Markdown content
public String cleaned; // Cleaned text content
public String url; // Page URL
public int pageStatusCode; // HTTP status code
}

public class WebCrawlerAPIException extends Exception {
public String getErrorCode(); // Error code from API
}

Common error codes:
- `network_error` - Network/connection error
- `invalid_response` - Invalid API response
- `interrupted` - Operation was interrupted
- `unknown_error` - Unknown error occurred
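
One way to use these codes is to branch on `getErrorCode()`. This is only a sketch; whether a `network_error` is worth retrying, and how, depends on your application:

```java
public class ErrorCodeExample {
    public static void main(String[] args) {
        WebCrawlerAPI client = new WebCrawlerAPI(System.getenv("API_KEY"));
        try {
            WebCrawlerAPI.ScrapeResult result = client.scrape("https://example.com", "markdown");
            System.out.println(result.content);
        } catch (WebCrawlerAPI.WebCrawlerAPIException e) {
            // Guard against a missing code, then branch on the codes listed above
            String code = e.getErrorCode() != null ? e.getErrorCode() : "unknown_error";
            switch (code) {
                case "network_error":
                    System.err.println("Network problem, consider retrying: " + e.getMessage());
                    break;
                case "invalid_response":
                    System.err.println("Unexpected API response: " + e.getMessage());
                    break;
                case "interrupted":
                    Thread.currentThread().interrupt(); // restore the interrupt flag
                    break;
                default:
                    System.err.println("Error (" + code + "): " + e.getMessage());
            }
        }
    }
}
```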
Using the client from a Spring service:

@Service
public class WebScraperService {
private final WebCrawlerAPI client;
public WebScraperService(@Value("${webcrawlerapi.key}") String apiKey) {
this.client = new WebCrawlerAPI(apiKey);
}
public List<String> scrapeWebsite(String url) throws WebCrawlerAPI.WebCrawlerAPIException {
WebCrawlerAPI.CrawlResult result = client.crawl(url, "markdown", 20);
return result.items.stream()
.map(item -> item.url)
.collect(Collectors.toList());
}
}

A minimal standalone application:

public class MyApp {
public static void main(String[] args) {
WebCrawlerAPI client = new WebCrawlerAPI(System.getenv("API_KEY"));
try {
WebCrawlerAPI.ScrapeResult result = client.scrape(
"https://example.com",
"markdown"
);
if ("done".equals(result.status)) {
System.out.println(result.content);
}
} catch (WebCrawlerAPI.WebCrawlerAPIException e) {
System.err.println("Error: " + e.getMessage());
}
}
}

If you're using Maven, just add the file to src/main/java/:
src/
  main/
    java/
      com/
        yourcompany/
          WebCrawlerAPI.java   ← Copy here
          YourApp.java
For Gradle projects, add to src/main/java/:
src/
  main/
    java/
      com/
        yourcompany/
          WebCrawlerAPI.java   ← Copy here
          YourApp.java
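
In either layout, remember that a file placed under a package directory such as `com/yourcompany/` needs a matching package declaration as its first statement (the package name here is just the placeholder from the trees above):

```java
// First line of the copied WebCrawlerAPI.java when it lives in com/yourcompany/
package com.yourcompany;

public class WebCrawlerAPI {
    // ... rest of the copied file stays unchanged ...
}
```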
The example supports the following environment variables:
- `API_KEY` - Your WebCrawlerAPI key (required)
- `API_BASE_URL` - Custom API base URL (optional, for testing)
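
A sketch of how a program can honor both variables when creating the client, using the two documented constructors (the exact wiring inside Example.java may differ):

```java
public class ClientFromEnv {
    public static void main(String[] args) throws Exception {
        // API_KEY is required
        String apiKey = System.getenv("API_KEY");
        if (apiKey == null || apiKey.isEmpty()) {
            throw new IllegalStateException("API_KEY environment variable is required");
        }

        // API_BASE_URL is optional: use it when set (e.g. a local dev server),
        // otherwise fall back to the default production endpoint.
        String baseUrl = System.getenv("API_BASE_URL");
        WebCrawlerAPI client = (baseUrl == null || baseUrl.isEmpty())
                ? new WebCrawlerAPI(apiKey)
                : new WebCrawlerAPI(apiKey, baseUrl);

        WebCrawlerAPI.ScrapeResult result = client.scrape("https://example.com", "markdown");
        System.out.println(result.content);
    }
}
```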
All methods can throw WebCrawlerAPIException. Always handle exceptions appropriately:
try {
WebCrawlerAPI.CrawlResult result = client.crawl(url, "markdown", 10);
// Process result...
} catch (WebCrawlerAPI.WebCrawlerAPIException e) {
System.err.println("Error code: " + e.getErrorCode());
System.err.println("Error message: " + e.getMessage());
// Handle error...
}

The SDK includes a comprehensive test suite without requiring any test frameworks or build tools.
cd tests
./run-tests.sh

- 50+ Unit Tests: Test constructors, data classes, JSON parsing, validation
- Integration Tests: Test actual API calls (requires API key)
- Custom Test Framework: Zero dependencies, works with Java 8+
See tests/README.md for detailed testing documentation.
cd tests
# Run unit tests only
./run-tests.sh
# Run all tests including integration
API_KEY=your-key ./run-tests.sh

Output:
============================================================
Test Summary
============================================================
Total tests: 50
Passed: 50 ✓
Failed: 0
Success rate: 100%
============================================================
The SDK includes GitHub Actions workflows for automated testing and releases.
Tests run automatically on:
- Every push to main/master/develop branches
- Every pull request
- Multiple Java versions: 8, 11, 17, 21
Push a semantic version tag (without 'v' prefix):
# Create and push a release tag
git tag 1.0.0
git push origin 1.0.0

This automatically:
- Runs all tests
- Builds JAR and source ZIP
- Creates GitHub release with artifacts
- Generates SHA256 checksums
See .github/RELEASE.md for detailed release instructions.
Download pre-built releases from the Releases page:
- JAR file (compiled classes)
- Source ZIP (complete source with tests)
- SHA256 checksums for verification
- Simple JSON Parsing: Uses basic string operations instead of a JSON library. Works for WebCrawlerAPI responses but may need adjustments for complex nested structures.
- No Async I/O: Uses blocking HTTP calls. For high-throughput applications, consider using the original SDK with async libraries.
- Basic Error Handling: Error parsing is simplified compared to the full SDK.
This standalone SDK is provided as-is for use with WebCrawlerAPI.
For API documentation and support, visit:
- Documentation: https://docs.webcrawlerapi.com
- Website: https://webcrawlerapi.com
- GitHub: https://github.com/webcrawlerapi
| Feature | Standalone SDK | Full SDK (Gradle) |
|---|---|---|
| Dependencies | None | Gson, HttpClient |
| Setup | Copy single file | Gradle/Maven setup |
| JSON Parsing | Basic string ops | Full Gson support |
| Type Safety | Basic | Full with builders |
| Size | ~600 lines | Multiple files |
| Best For | Simple projects | Production apps |
Choose the standalone SDK when you want simplicity and no dependencies. Use the full SDK when you need advanced features and type safety.