Renderscript
An API to render a page inside a real Chromium (with JavaScript enabled) and send back the raw HTML.
This project is written directly for, and consumed by, the Algolia Crawler.
Secure: Leverages Context to isolate each page, prevent cookie sharing, control redirection, etc.
Performant: Ignores resources that are unnecessary for rendering HTML (e.g. images, video, fonts) and bundles an AdBlocker by default.
Automated: Renderscript abstracts everything needed to render a page and log in to a website, with minimal configuration required.
Usage
Local
yarn dev
Go to: http://localhost:3000
Docker
docker build . -t algolia/renderscript
docker run -p 3000:3000 -it algolia/renderscript
curl -X POST http://localhost:3000/render \
-H 'Content-Type: application/json' \
-d '{"url": "https://www.algolia.com/", "ua": "local_renderscript"}'
API
POST /render
Main endpoint. Renders the page and returns a JSON payload with all the page information.
Body parameters:
{
/**
* URL to render (for hash and query params support, use `encodeURIComponent` on it)
*/
url: string;
/**
* User-Agent to use.
*/
ua: string;
/**
* Enables AdBlocker
*/
adblock?: boolean;
/**
* Defines the time range: minimum and maximum execution time.
*/
waitTime?: {
min?: number;
max?: number;
};
/**
* Headers to Forward on navigation
*/
headersToForward?: {
[s: string]: string;
};
}
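As a rough illustration of the parameters above, the following sketch builds the options for a `POST /render` call. The `RenderParams` interface and the `buildRenderRequest` helper are hypothetical names introduced here, not part of Renderscript, and the `waitTime` values assume milliseconds, which this document does not state.

```typescript
// Request body for POST /render, mirroring the parameters documented above.
interface RenderParams {
  url: string;
  ua: string;
  adblock?: boolean;
  waitTime?: { min?: number; max?: number };
  headersToForward?: Record<string, string>;
}

// Hypothetical helper: builds the options object to hand to fetch().
function buildRenderRequest(params: RenderParams): {
  method: string;
  headers: Record<string, string>;
  body: string;
} {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(params),
  };
}

const renderRequest = buildRenderRequest({
  url: "https://www.algolia.com/",
  ua: "local_renderscript",
  adblock: true,
  // Assumption: values are in milliseconds; the unit is not documented here.
  waitTime: { min: 1000, max: 10000 },
});

// Usage against a local instance started as described in "Usage":
// const res = await fetch("http://localhost:3000/render", renderRequest);
```

Keeping the body construction in a small pure function makes it easy to inspect the payload before anything is sent.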
Response application/json:
{
/**
* HTTP Code of the rendered page.
*/
statusCode: number | null;
/**
* HTTP Headers of the rendered page.
*/
headers: Record<string, string>;
/**
* Body of the rendered page.
*/
body: string | null;
/**
* Metrics from different tasks during the rendering.
*/
metrics: Metrics;
/**
* The redirection that Renderscript caught.
*/
resolvedUrl: string | null;
/**
* Has the page reached timeout?
* When the timeout has been reached, we continue the rendering as usual
* but reduce the other timeouts to a minimum.
*/
timeout: boolean;
/**
* Any error encountered along the way.
* If this field is set, the rest of the payload is partial.
*/
error: string | null;
}
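One possible way to consume this response is sketched below. Treating any non-null `error` as a failure is a policy chosen for this example (the payload is documented as partial in that case), and `RenderOutcome` and `interpretRender` are hypothetical names, not part of Renderscript.

```typescript
// Subset of the POST /render response used here (headers and metrics omitted).
interface RenderResponse {
  statusCode: number | null;
  body: string | null;
  resolvedUrl: string | null;
  timeout: boolean;
  error: string | null;
}

type RenderOutcome =
  | { ok: true; html: string; finalUrl: string | null; timedOut: boolean }
  | { ok: false; reason: string };

// Hypothetical helper: decides whether a response is usable.
function interpretRender(res: RenderResponse): RenderOutcome {
  // A filled `error` means the rest of the payload is partial, so stop here.
  if (res.error !== null) {
    return { ok: false, reason: res.error };
  }
  if (res.body === null) {
    return { ok: false, reason: "empty body" };
  }
  // `timeout` is surfaced rather than treated as fatal, since rendering continues.
  return {
    ok: true,
    html: res.body,
    finalUrl: res.resolvedUrl,
    timedOut: res.timeout,
  };
}

const outcome = interpretRender({
  statusCode: 200,
  body: "<html></html>",
  resolvedUrl: null,
  timeout: false,
  error: null,
});
```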
GET /render
Used for debugging purposes. Directly dumps the HTML for easy inspection in your browser.
Query parameters: see POST /render parameters.
Response text/html:
CSP headers are set to prevent script execution on the rendered page.
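A debug URL for this endpoint could be assembled as in the sketch below. It assumes that `url` and `ua` are passed as query parameters under the same names as in POST /render, and it applies `encodeURIComponent` as the parameter description recommends. `buildDebugUrl` is a hypothetical helper name.

```typescript
// Hypothetical helper: builds a GET /render URL to open in a browser.
// Assumption: `url` and `ua` keep the names they have in POST /render.
function buildDebugUrl(base: string, url: string, ua: string): string {
  // encodeURIComponent keeps the target's own query string and hash intact.
  const encodedUrl = encodeURIComponent(url);
  const encodedUa = encodeURIComponent(ua);
  return `${base}/render?url=${encodedUrl}&ua=${encodedUa}`;
}

const debugUrl = buildDebugUrl(
  "http://localhost:3000",
  "https://www.algolia.com/?q=test#hits",
  "local_renderscript"
);
```

Without the encoding step, the `?q=test` and `#hits` parts of the target would be read as belonging to the Renderscript URL itself.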
POST /login
This endpoint loads a given login page, looks for input fields, enters the given credentials and validates the form.
It allows programmatically retrieving a session cookie from websites with CSRF protection.
Body parameters:
{
/**
* URL to render (for hash and query params support, use `encodeURIComponent` on it)
*/
url: string;
/**
* User-Agent to use.
*/
ua: string;
/**
* Username to enter on the login form. Renderscript expects to find an `input[type=text]` or `input[type=email]` on the login page.
*/
username: string;
/**
* Password to enter on the login form. Renderscript expects to find an `input[type=password]` on the login page.
*/
password: string;
/**
* Defines the time range: minimum and maximum execution time.
*/
waitTime?: {
min?: number;
max?: number;
};
/**
* Boolean (optional).
* If set to true, Renderscript will return the rendered HTML after the login request. Useful to debug visually.
*/
renderHTML?: boolean;
}
Response application/json:
{
/**
* HTTP Code of the rendered page.
*/
statusCode: number | null;
/**
* HTTP Headers of the rendered page.
*/
headers: Record<string, string>;
/**
* Metrics from different tasks during the rendering.
*/
metrics: Metrics;
/**
* Has the page reached timeout?
* When the timeout has been reached, we continue the rendering as usual
* but reduce the other timeouts to a minimum.
*/
timeout: boolean;
/**
* Any error encountered along the way.
* If this field is set, the rest of the payload is partial.
*/
error: string | null;
/**
* Cookies generated from a successful login.
*/
cookies: Cookie[];
/**
* The URL at the end of a successful login.
*/
resolvedUrl: string | null;
/**
* Body at the end of a successful login.
*/
body: string | null;
}
Response text/html:
If renderHTML: true, returns text/html.
CSP headers are set to prevent script execution on the rendered page.
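To show how the `cookies` field of the JSON response might be reused, here is a small sketch that turns it into a `Cookie` request header value. The shape of `Cookie` is not spelled out in this document, so the `LoginCookie` interface below assumes only `name` and `value` fields. `toCookieHeader` is a hypothetical helper.

```typescript
// Assumption: each cookie exposes at least `name` and `value`.
interface LoginCookie {
  name: string;
  value: string;
}

// Hypothetical helper: serializes cookies into a single Cookie header value.
function toCookieHeader(cookies: LoginCookie[]): string {
  return cookies.map((c) => `${c.name}=${c.value}`).join("; ");
}

const cookieHeader = toCookieHeader([
  { name: "session", value: "abc123" },
  { name: "csrf", value: "xyz" },
]);
```

The resulting string could then be sent on later requests to the same site, for instance through `headersToForward` on POST /render. That pairing is an assumption, not something this document confirms.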
GET /list
Lists currently open pages. Useful for debugging.
GET /healthy, GET /ready
Health checks for Kubernetes and others.
Credits
This project was heavily inspired by GoogleChrome/rendertron.
It was originally based on puppeteer-core, but we switched to Playwright.