This repository has been archived by the owner on May 5, 2022. It is now read-only.

REST APIs

vjsamuel edited this page Jul 9, 2014 · 2 revisions

Register jar

If the user has written custom UDFs or custom load/store functions for a Pig script (in Java), the user must first register the jar using this API. The API copies the jar to a dedicated location in the DFS and returns that location, which the user then references in the Pig script.

REST URL

http://localhost:8080/oink/jar/$jarName

Request Method

POST

Request Parameters

| Name | Data Type | Comment | Required/Optional |
| --- | --- | --- | --- |
| jarName | String | Name of the jar (must be unique). By convention, include a version number in jarName to maintain multiple versions of the same jar. | Required |

Request Body

| Data Type | Comment | Required/Optional |
| --- | --- | --- |
| Binary stream | Actual jar content | Required |

Response Status

| Status Code | Comment |
| --- | --- |
| 200 | Jar registered successfully. The response body contains the jar's DFS path. |
| 409 | Duplicate jar name: a jar with the same name is already registered. |
| 500 | An error occurred while copying the file to DFS. |

Response Body

| Data Type | Comment |
| --- | --- |
| String | DFS path where the jar file is stored. Use this path in the Pig script when referring to the jar. |

Sample Input

$ curl -X POST http://localhost:8080/oink/jar/automaton.jar -H "Content-Type: application/octet-stream" --data-binary @/path/to/automaton.jar

Sample Response

hdfs://localhost:54321/tmp/pig/jars/automaton.jar

Sample PIG Script

REGISTER 'hdfs://localhost:54321/tmp/pig/jars/automaton.jar';
...
...
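For clients written in Python, the registration call can also be built with the standard library. This is a minimal sketch, not part of the service: `build_register_jar_request` and `register_jar` are hypothetical helpers, and `BASE_URL` assumes the service runs at http://localhost:8080/oink as in the REST URL above.

```python
import urllib.error
import urllib.parse
import urllib.request

# Assumption: the service is deployed at this base URL (see the REST URL above).
BASE_URL = "http://localhost:8080/oink"


def build_register_jar_request(jar_name: str, jar_bytes: bytes,
                               base_url: str = BASE_URL) -> urllib.request.Request:
    """Build, but do not send, the POST request that registers a jar."""
    # Percent-encode the name so it is used as a single path segment.
    url = f"{base_url}/jar/{urllib.parse.quote(jar_name, safe='')}"
    return urllib.request.Request(
        url,
        data=jar_bytes,  # the raw jar content is the request body
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )


def register_jar(jar_name: str, jar_path: str) -> str:
    """Send the request and return the DFS path from the response body."""
    with open(jar_path, "rb") as f:
        req = build_register_jar_request(jar_name, f.read())
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read().decode("utf-8").strip()
    except urllib.error.HTTPError as err:
        # 409: a jar with this name already exists; 500: copying to DFS failed.
        raise RuntimeError(f"jar registration failed with HTTP {err.code}") from err
```

With a running service, `register_jar("automaton.jar", "/path/to/automaton.jar")` would be expected to return a DFS path like the sample response above, which then goes into the REGISTER statement.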

Unregister jar

If the user wants to update a jar (or no longer needs it), the user must first unregister it. Registering a jar under a name that is already registered fails with status 409.

REST URL

http://localhost:8080/oink/jar/$jarName

Request Method

DELETE

Request Parameter

| Name | Data Type | Comment | Required/Optional |
| --- | --- | --- | --- |
| jarName | String | Name of jar | Required |

Response Status

| Status Code | Comment |
| --- | --- |
| 200 | Jar unregistered successfully. |
| 404 | The provided jarName does not exist. |
| 500 | An error occurred while deleting the file from DFS. |

Sample Input

$ curl -X DELETE http://localhost:8080/oink/jar/automaton.jar

Sample Response

Jar deleted successfully
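Because registering an existing name returns 409, updating a jar under the same name means unregistering it and then registering the new content. The Python sketch below (hypothetical helpers, standard library only, base URL assumed to be http://localhost:8080/oink) builds the two calls in that order and treats a 404 on the DELETE as "nothing to remove".

```python
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080/oink"  # assumed deployment URL


def build_update_jar_requests(jar_name: str, jar_bytes: bytes,
                              base_url: str = BASE_URL) -> list:
    """Return [DELETE, POST] requests that replace a registered jar."""
    url = f"{base_url}/jar/{urllib.parse.quote(jar_name, safe='')}"
    unregister = urllib.request.Request(url, method="DELETE")
    register = urllib.request.Request(
        url, data=jar_bytes, method="POST",
        headers={"Content-Type": "application/octet-stream"})
    return [unregister, register]


def send_in_order(requests: list) -> None:
    """Send each request in turn, stopping at the first unexpected HTTP error."""
    for req in requests:
        try:
            with urllib.request.urlopen(req) as resp:
                print(req.get_method(), resp.status, resp.read().decode("utf-8"))
        except urllib.error.HTTPError as err:
            # A 404 on DELETE only means the jar was not registered yet.
            if req.get_method() == "DELETE" and err.code == 404:
                continue
            raise
```

Between the two calls the jar would be unavailable to scripts that reference it; registering a new versioned name instead (for example a hypothetical automaton-1.1.jar), as the jarName convention suggests, avoids that gap.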

Retrieve jar

If the user wants to retrieve a jar, the user should use this API.

REST URL

http://localhost:8080/oink/jar/$jarName

Request Method

GET

Request Parameters

| Name | Data Type | Comment | Required/Optional |
| --- | --- | --- | --- |
| jarName | String | Name of jar | Required |

Response Status

| Status Code | Comment |
| --- | --- |
| 200 | The jar exists and was fetched from DFS. |
| 404 | The provided jarName does not exist. |
| 500 | An error occurred while retrieving the file from DFS. |

Response Body

| Data Type | Comment |
| --- | --- |
| Binary stream | Requested jar as binary stream |

Sample Input

$ curl -X GET http://localhost:8080/oink/jar/automaton.jar -o automaton.jar
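Because the response is a raw binary stream, a client should write it to a file rather than print it. A minimal Python sketch, assuming the service at http://localhost:8080/oink; `save_jar` is a hypothetical helper that copies the body in chunks and, since a jar is a ZIP archive, uses `zipfile.is_zipfile` as a quick sanity check on the result.

```python
import shutil
import zipfile
from typing import BinaryIO


def save_jar(source: BinaryIO, dest: BinaryIO, chunk_size: int = 64 * 1024) -> bool:
    """Copy a jar response body into dest and report whether it is a valid ZIP/jar.

    dest must be seekable (for example a file opened with "w+b" or an io.BytesIO).
    """
    shutil.copyfileobj(source, dest, chunk_size)  # streams without loading it all
    dest.seek(0)
    return zipfile.is_zipfile(dest)


if __name__ == "__main__":
    import urllib.request

    # Assumption: the service runs at localhost:8080 as in the REST URL above.
    url = "http://localhost:8080/oink/jar/automaton.jar"
    with urllib.request.urlopen(url) as resp, open("automaton.jar", "w+b") as out:
        print("valid jar" if save_jar(resp, out) else "not a jar")
```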

Register Pig Script

In order to run a Pig script, the user must first register it using this API. The script is validated during registration:

  1. The script must not contain a DUMP statement. Because the output API reads only from DFS, use a STORE statement instead. If the Pig job generates no output, neither DUMP nor STORE is required.
  2. The Pig service provides the output path for Pig jobs, so the STORE statement must write to $output, for example "STORE ... into '$output' using PigStorage();".
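The two rules above can be checked on the client before uploading. The Python sketch below is an approximation written for this page, not the service's validator: it strips `--` and `/* */` comments with regular expressions and does not parse string literals (so a path such as 'dump.txt' would be flagged), rejects any DUMP, and requires every STORE to write into '$output'.

```python
import re
from typing import List

# Rough patterns for Pig Latin comments and the statements being checked.
COMMENT_RE = re.compile(r"--[^\n]*|/\*.*?\*/", re.DOTALL)
DUMP_RE = re.compile(r"\bDUMP\b", re.IGNORECASE)
STORE_RE = re.compile(r"\bSTORE\b([^;]*)", re.IGNORECASE)

DUMP_MSG = "DUMP is not allowed; use STORE ... into '$output' instead"
STORE_MSG = "STORE must write into '$output'"


def validate_script(script: str) -> List[str]:
    """Return the rule violations found; an empty list means none were found."""
    code = COMMENT_RE.sub("", script)  # ignore commented-out statements
    problems = []
    if DUMP_RE.search(code):
        problems.append(DUMP_MSG)
    for match in STORE_RE.finditer(code):
        if "'$output'" not in match.group(1):
            problems.append(STORE_MSG)
    return problems
```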

REST URL

http://localhost:8080/oink/script/$scriptName

Request Method

POST

Request Parameter

| Name | Data Type | Comment | Required/Optional |
| --- | --- | --- | --- |
| scriptName | String | Name of the script (must be unique). By convention, include a version number in scriptName to maintain multiple versions of the same script. | Required |

Request Body

| Data Type | Comment | Required/Optional |
| --- | --- | --- |
| Binary stream | Actual script content | Required |

Response Status

| Status Code | Comment |
| --- | --- |
| 200 | Script registered successfully. |
| 400 | Validation failure, for example the script uses DUMP or a STORE statement does not write to $output. |
| 409 | Duplicate script name: a script with the same name is already registered. |
| 500 | An error occurred while copying the file to DFS. |

Response Body

| Data Type | Comment |
| --- | --- |
| String | DFS path where script file is stored. This is just for reference. |

Sample Input

$ curl -X POST http://localhost:8080/oink/script/myScript.pig -H "Content-Type: application/octet-stream" --data-binary @/path/to/myScript.pig

Sample Response

hdfs://localhost:54321/tmp/pig/scripts/myScript.pig

Unregister script

If the user wants to update a script (or no longer needs it), the user must first unregister it.

REST URL

http://localhost:8080/oink/script/$scriptName

Request Method

DELETE

Request Parameter

| Name | Data Type | Comment | Required/Optional |
| --- | --- | --- | --- |
| scriptName | String | Name of script | Required |

Response Status

| Status Code | Comment |
| --- | --- |
| 200 | Script unregistered successfully. |
| 404 | The provided scriptName does not exist. |
| 500 | An error occurred while deleting the file from DFS. |

Sample Input

$ curl -X DELETE http://localhost:8080/oink/script/myScript.pig

Sample Response

Script deleted successfully

Retrieve Script

If the user wants to retrieve a script, the user must use this API.

REST URL

http://localhost:8080/oink/script/$scriptName

Request Method

GET

Request Parameter

| Name | Data Type | Comment | Required/Optional |
| --- | --- | --- | --- |
| scriptName | String | Name of script | Required |

Response Status

| Status Code | Comment |
| --- | --- |
| 200 | The script exists and was fetched from DFS. |
| 404 | The provided scriptName does not exist. |
| 500 | An error occurred while retrieving the file from DFS. |

Response Body

| Data Type | Comment |
| --- | --- |
| Binary stream | Requested script as binary stream |

Sample Input

$ curl -X GET http://localhost:8080/oink/script/myScript.pig

Sample Response

REGISTER 'hdfs://localhost:54321/tmp/pig/jars/data-fu.jar';
 
SET mapred.map.tasks.speculative.execution false;
 
define Quartile datafu.pig.stats.Quantile('0.0','0.25','0.5','0.75','1.0');
 
temperature = LOAD 'temperature.txt' AS (id:chararray, temp:double);

temperature = FILTER temperature BY id == '$id';
temperature = GROUP temperature BY id;
 
temperature_quartiles = FOREACH temperature {
  sorted = ORDER temperature by temp; -- must be sorted
  GENERATE group as id, Quartile(sorted.temp) as quartiles;
}

STORE temperature_quartiles into '$output' using PigStorage(',');

Submit request

To submit a request to run a registered Pig script, the user must use this API.

REST URL

http://localhost:8080/oink/request/$scriptName

Request Method

POST

Request Parameter

| Name | Data Type | Comment | Required/Optional |
| --- | --- | --- | --- |
| scriptName | String | Name of the script. The script must first be registered with the Register Pig Script API. | Required |

Request Body

| Parameter | Data Type | Comment | Required/Optional |
| --- | --- | --- | --- |
| inputParameters | Map<String, String> | Parameters as (key, value) pairs. Each $variable in the Pig script is replaced with the value of the matching key. The $output parameter need not be provided, because the service supplies it. | Required if the script refers to any $variable |
| httpcallback | String | HTTP callback URL used to notify the user about the request (for example, whether the Pig job has completed successfully or failed, and how much progress it has made) | Optional |

About httpCallback

Because Pig requests are processed asynchronously, the service supports an HTTP callback that indicates when the request has been submitted to Hadoop and when it has completed. The URL given in the "httpcallback" parameter must be hosted by the user as a service that implements a GET call returning HttpStatus.OK (200).

The callback is invoked as the Pig job makes progress and once more when the job completes. The callback URL can include three special parameters, described below:

| Name | Data Type | Comment | Parameter Type |
| --- | --- | --- | --- |
| id | String | Pig request ID | Path or query parameter |
| status | String | Status of the request: SUBMITTED, FAILED, or SUCCEEDED | Path or query parameter |
| stats | String | Base64-encoded PigRequestStats object (see the example below) | Query parameter only |

Example PigRequestStats object (the // comments are annotations, not part of the JSON):

{
"bytesWritten":12345678, //number of bytes written as output
"duration":1234, //time taken for Pig execution, in milliseconds
"errorMessage":"job failed because", //error message if the Pig job fails
"numberOfJobs":2, //number of jobs submitted as part of this Pig request
"status":"SUBMITTED", //status of the Pig job: "SUBMITTED", "FAILED", or "SUCCEEDED"
"progress":50 //percentage of progress made by the Pig job
}

Examples of httpcallback

http://machine1:8080/request/$id/$status
http://machine1:8080/$id?status=$status&stats=$stats
http://machine1:8080/request?id=$id&status=$status&stats=$stats
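On the receiving side, the callback service only needs to answer GET with 200. The Python sketch below uses the standard library's http.server and assumes the third URL form above (all three values as query parameters) and that stats is the standard Base64 encoding of the PigRequestStats JSON; path-parameter forms such as /request/$id/$status would need their own parsing. `parse_callback` and `CallbackHandler` are hypothetical names.

```python
import base64
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlsplit


def parse_callback(url: str) -> dict:
    """Extract id, status and the decoded stats from a callback URL or request path."""
    query = parse_qs(urlsplit(url).query)
    info = {
        "id": query.get("id", [None])[0],
        "status": query.get("status", [None])[0],
        "stats": None,
    }
    raw_stats = query.get("stats", [None])[0]
    if raw_stats:
        # If "+" was not percent-encoded, parse_qs turned it into a space; restore it.
        raw_stats = raw_stats.replace(" ", "+")
        info["stats"] = json.loads(base64.b64decode(raw_stats))
    return info


class CallbackHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        info = parse_callback(self.path)  # self.path includes the query string
        print(f"request {info['id']}: {info['status']} {info['stats']}")
        self.send_response(200)  # the service expects HttpStatus.OK
        self.end_headers()


if __name__ == "__main__":
    HTTPServer(("", 8080), CallbackHandler).serve_forever()
```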

Response Status

| Status Code | Comment |
| --- | --- |
| 200 | The Pig job was submitted to the service successfully. |
| 400 | The provided scriptName does not exist, or an input parameter is invalid. |
| 500 | An error occurred while accessing DFS. |

Response Body

| Data Type | Comment |
| --- | --- |
| String | Request ID in UUID format |

Sample Input

URL: http://localhost:8080/oink/request/myScript.pig
Header: Content-Type: application/json
Method: POST
Payload:
{
    "inputParameters": {
        "id": "abcd"
    },
    "httpCallback": "http://machine1:8080/request/$id/$status?stats=$stats"
}

Sample Response

94f9b962-86ff-4b46-ad2f-b2bfb28e2490
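The same submission can be assembled in Python. In this sketch (hypothetical helper, standard library only, base URL assumed to be http://localhost:8080/oink) the callback field is spelled httpCallback as in the sample payload; the request-body table spells it httpcallback, so check which spelling your deployment expects. As a design choice of the sketch, it refuses an output key, since the service supplies $output itself.

```python
import json
import urllib.parse
import urllib.request
from typing import Mapping, Optional

BASE_URL = "http://localhost:8080/oink"  # assumed deployment URL


def build_submit_request(script_name: str, input_parameters: Mapping[str, str],
                         http_callback: Optional[str] = None,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST that submits a registered script; the response body is a request ID."""
    if "output" in input_parameters:
        # Sketch's own rule: $output is always supplied by the service.
        raise ValueError("do not pass 'output'; the service provides $output")
    payload = {"inputParameters": dict(input_parameters)}
    if http_callback:
        payload["httpCallback"] = http_callback  # spelling taken from the sample payload
    return urllib.request.Request(
        f"{base_url}/request/{urllib.parse.quote(script_name, safe='')}",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_submit_request(
        "myScript.pig", {"id": "abcd"},
        "http://machine1:8080/request/$id/$status?stats=$stats")
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode("utf-8").strip())  # request ID in UUID format
```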

Retrieve PIG request input

Use this API to read the input of a submitted request.

REST URL

http://localhost:8080/oink/request/$id

Request Method

GET

Request Parameter

| Name | Data Type | Comment | Required/Optional |
| --- | --- | --- | --- |
| id | String (UUID format) | Request ID generated when the request was submitted | Required |

Response Status

| Status Code | Comment |
| --- | --- |
| 200 | The request ID exists and its input was fetched. |
| 404 | The request ID does not exist, or no input is available for this request. |
| 500 | An error occurred while retrieving the file from DFS. |

Response Body

| Data Type | Comment |
| --- | --- |
| JSON | Input submitted for the given request ID, returned as JSON of PigRequestParameters |

Sample Input

$ curl -X GET http://localhost:8080/oink/request/94f9b962-86ff-4b46-ad2f-b2bfb28e2490

Sample Response

{
    "inputParameters": {
        "id" : "1234"
    },
    "requestStartTime": "Mar 21, 2014 2:11:00 AM",
    "pigScript":"myScript.pig",
    "requestIp":"10.12.45.34"
}
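The requestStartTime field in this sample is a human-readable string rather than an ISO 8601 timestamp. Assuming the service always uses this exact format (English month abbreviation, 12-hour clock, no time zone), a Python client could convert it as follows; `parse_request_input` is a hypothetical helper.

```python
import json
from datetime import datetime

# Format of requestStartTime in the sample response, e.g. "Mar 21, 2014 2:11:00 AM".
# %b and %p depend on the current locale; the default C/English locale matches
# "Mar" and "AM". No time zone is given, so the result is a naive datetime.
START_TIME_FORMAT = "%b %d, %Y %I:%M:%S %p"


def parse_request_input(body: str) -> dict:
    """Parse the JSON response and convert requestStartTime to a datetime."""
    data = json.loads(body)
    data["requestStartTime"] = datetime.strptime(data["requestStartTime"],
                                                 START_TIME_FORMAT)
    return data
```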

Check status of Pig request

Use this API to find out whether a request has been submitted or has completed (successfully or not). It is an alternative to supplying an HTTP callback when submitting the request: the user can poll this API at regular intervals (say, every 5 minutes), and once it returns "SUCCEEDED", "FAILED", or "KILLED", execution has finished and the user can read the output.

REST URL

http://localhost:8080/oink/request/$id/status

Request Method

GET

Request Parameter

| Name | Data Type | Comment | Required/Optional |
| --- | --- | --- | --- |
| id | String (UUID format) | Request ID generated when the request was submitted | Required |

Response Status

| Status Code | Comment |
| --- | --- |
| 200 | The request ID exists and its status was retrieved. |
| 404 | The request ID does not exist. |
| 500 | An error occurred while retrieving the file from DFS. |

Response Body

| Data Type | Comment |
| --- | --- |
| String | Status, which can be "SUBMITTED", "SUCCEEDED", "FAILED", or "KILLED" |

Sample Input

$ curl -X GET http://localhost:8080/oink/request/94f9b962-86ff-4b46-ad2f-b2bfb28e2490/status

Sample Response

SUBMITTED
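The polling approach described above can be sketched as a loop that stops on a terminal status. This Python version (hypothetical helper) takes the status fetch and the sleep function as parameters so it can be tested without a server; the default interval of 300 seconds follows the "say, every 5 minutes" suggestion, and the 24-hour timeout is an arbitrary choice of this sketch.

```python
import time
from typing import Callable

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "KILLED"}


def wait_for_completion(fetch_status: Callable[[], str],
                        interval_seconds: float = 300,
                        timeout_seconds: float = 24 * 3600,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll fetch_status until it returns a terminal status, then return that status.

    Raises TimeoutError if the next poll would fall after the deadline.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status().strip()  # the body may end with a newline
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() + interval_seconds > deadline:
            raise TimeoutError(f"request still {status} after {timeout_seconds} s")
        sleep(interval_seconds)


if __name__ == "__main__":
    import urllib.request

    # Assumption: service at localhost:8080; request ID taken from the submit call.
    status_url = ("http://localhost:8080/oink/request/"
                  "94f9b962-86ff-4b46-ad2f-b2bfb28e2490/status")

    def fetch_status() -> str:
        with urllib.request.urlopen(status_url) as resp:
            return resp.read().decode("utf-8")

    print(wait_for_completion(fetch_status))
```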

Retrieve Pig request output

Use this API to read the output of a submitted request.

REST URL

http://localhost:8080/oink/request/$id/output

Request Method

GET

Request Parameter

| Name | Data Type | Comment | Required/Optional |
| --- | --- | --- | --- |
| id | String (UUID format) | Request ID generated when the request was submitted | Required |

Response Status

| Status Code | Comment |
| --- | --- |
| 200 | The request ID exists and its output was fetched. |
| 404 | The request ID does not exist, or no output is available for this request. |
| 500 | An error occurred while retrieving the file from DFS. |

Response Body

| Data Type | Comment |
| --- | --- |
| Binary stream | Output data of request |

Sample Input

$ curl -X GET http://localhost:8080/oink/request/94f9b962-86ff-4b46-ad2f-b2bfb28e2490/output

Sample Response

abcd,1,2,3,4,5

Retrieve Pig request statistics

Use this API to read the statistics of a submitted request. The statistics are updated each time the Pig job's progress is updated.

REST URL

http://localhost:8080/oink/request/$id/stats

Request Method

GET

Request Parameter

| Name | Data Type | Comment | Required/Optional |
| --- | --- | --- | --- |
| id | String (UUID format) | Request ID generated when the request was submitted | Required |

Response Status

| Status Code | Comment |
| --- | --- |
| 200 | The request ID exists and its statistics were fetched. |
| 404 | The request ID does not exist. |
| 500 | An error occurred while retrieving the file from DFS. |

Response Body

| Data Type | Comment |
| --- | --- |
| JSON | Statistics for the given request ID, returned as JSON of PigRequestStats |

Sample Input

$ curl -X GET http://localhost:8080/oink/request/94f9b962-86ff-4b46-ad2f-b2bfb28e2490/stats

Sample Response

{
    "bytesWritten": 19476,
    "duration": 178464,
    "numberOfJobs": 1,
    "status": "SUCCEEDED",
    "progress":"100"
}
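In this sample response progress is a string ("100"), while the PigRequestStats example in the Submit request section shows it as a number (50), so a client should accept both. The Python sketch below normalizes the documented fields; `RequestStats` is a hypothetical client-side dataclass, not the service's PigRequestStats class, and the same parser could be applied to the decoded stats of an HTTP callback.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestStats:
    """Client-side view of the documented PigRequestStats fields (hypothetical type)."""
    status: str
    progress: int  # percentage, 0-100
    bytes_written: int = 0
    duration_ms: int = 0
    number_of_jobs: int = 0
    error_message: Optional[str] = None


def parse_stats(body: str) -> RequestStats:
    """Parse a stats response body, accepting progress as either a number or a string."""
    raw = json.loads(body)
    return RequestStats(
        status=raw["status"],
        progress=int(raw.get("progress", 0)),  # "100" and 100 both become 100
        bytes_written=int(raw.get("bytesWritten", 0)),
        duration_ms=int(raw.get("duration", 0)),
        number_of_jobs=int(raw.get("numberOfJobs", 0)),
        error_message=raw.get("errorMessage"),
    )
```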

Cancel Running PIG Request

Use this API to cancel a submitted request.

REST URL

http://localhost:8080/oink/request/$id/cancel

Request Method

GET

Request Parameter

| Name | Data Type | Comment | Required/Optional |
| --- | --- | --- | --- |
| id | String (UUID format) | Request ID generated when the request was submitted | Required |

Response Status

| Status Code | Comment |
| --- | --- |
| 200 | The request ID exists and the request was cancelled. |
| 404 | The request ID does not exist. |
| 500 | An error occurred during the cancellation process. |

Response Body

| Data Type | Comment |
| --- | --- |
| String | Response message for cancellation process |

Sample Input

$ curl -X GET http://localhost:8080/oink/request/94f9b962-86ff-4b46-ad2f-b2bfb28e2490/cancel

Sample Response

Jobs cancelled
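All of the request endpoints above share the pattern /oink/request/$id, optionally followed by /status, /output, /stats, or /cancel, and each expects a UUID. A small Python helper (hypothetical, base URL assumed to be http://localhost:8080/oink) can validate the ID with the standard uuid module before building the URL; because cancel is documented as a GET, no method override is needed when calling it.

```python
import uuid

BASE_URL = "http://localhost:8080/oink"  # assumed deployment URL
ACTIONS = {"", "status", "output", "stats", "cancel"}  # "" = retrieve request input


def request_url(request_id: str, action: str = "", base_url: str = BASE_URL) -> str:
    """Build the URL of a /request/$id endpoint after checking that the ID is a UUID."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    canonical_id = str(uuid.UUID(request_id))  # raises ValueError for non-UUIDs
    suffix = f"/{action}" if action else ""
    return f"{base_url}/request/{canonical_id}{suffix}"


if __name__ == "__main__":
    import urllib.request

    rid = "94f9b962-86ff-4b46-ad2f-b2bfb28e2490"
    with urllib.request.urlopen(request_url(rid, "cancel")) as resp:  # cancel is a GET
        print(resp.read().decode("utf-8"))
```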
