REST APIs
If the user has written custom UDFs or custom loader/storage functions (in Java) for a Pig script, the user must first register the jar using this API. The API copies the jar into a dedicated location in the DFS. The response is the DFS location of the jar, which the user must reference in the Pig script.
http://localhost:8080/oink/jar/$jarName
POST
Name | Data Type | Comment | Required/Optional |
---|---|---|---|
jarName | String | Name of jar (it needs to be unique). As convention, you can have version number as part of jarName to maintain multiple versions of same jar | Required |
Data Type | Comment | Required/Optional |
---|---|---|
Binary stream | Actual jar content | Required |
Response Status | Comment |
---|---|
200 | Jar registered successfully. Request output will give path to location. |
409 | Duplicate jar name. If same jar name is already registered, then status code is 409 |
500 | If any error occurred during copying file to DFS, then status code is 500 |
Data Type | Comment |
---|---|
String | DFS path where jar file is stored. This path should be used in PIG script when referring to this jar. |
$ curl -X POST http://localhost:8080/oink/jar/automaton.jar -H "Content-Type: application/octet-stream" --data-binary @/path/to/automaton.jar
hdfs://localhost:54321/tmp/pig/jars/automaton.jar
REGISTER "hdfs://localhost:54321/tmp/pig/jars/automaton.jar"
...
...
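The same registration call can be made from a program. The following is a minimal Python sketch, assuming the service is reachable at `http://localhost:8080/oink` as in the REST URL above. The helper names `build_register_jar_request` and `register_jar` are hypothetical and not part of the service.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:8080/oink"  # assumed deployment address


def build_register_jar_request(jar_name: str, jar_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) the POST request that registers a jar."""
    return urllib.request.Request(
        url=f"{BASE_URL}/jar/{quote(jar_name)}",
        data=jar_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


def register_jar(jar_name: str, jar_path: str) -> str:
    """Upload the jar and return the DFS path from the response body.

    urllib raises urllib.error.HTTPError for non-2xx responses such as
    409 (duplicate jar name) and 500 (DFS copy failure).
    """
    with open(jar_path, "rb") as jar_file:
        request = build_register_jar_request(jar_name, jar_file.read())
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8").strip()
```

Calling `register_jar("automaton.jar", "/path/to/automaton.jar")` should return the DFS path that goes into the `REGISTER` statement of the Pig script.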
To update a jar, or to remove a jar that is no longer needed, the user must unregister it with this API.
http://localhost:8080/oink/jar/$jarName
DELETE
Name | Data Type | Comment | Optional/Required |
---|---|---|---|
jarName | String | Name of jar | Required |
Status Code | Comment |
---|---|
200 | Jar unregistered successfully. |
404 | If provided jarName does not exist. |
500 | If any error occurred during deleting file from DFS, then status code is 500 |
$ curl -X DELETE http://localhost:8080/oink/jar/automaton.jar
Jar deleted successfully
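Since updating a jar means unregistering it and registering it again, the two calls can be wrapped in one helper. This is a sketch only: `update_jar` and the `send` transport hook are hypothetical names, and `send` is assumed to perform the HTTP call and return the status code.

```python
from typing import Callable

# send(method, path, body) -> HTTP status code; a hypothetical transport hook
Send = Callable[[str, str, bytes], int]


def update_jar(send: Send, jar_name: str, new_jar: bytes) -> int:
    """Replace a registered jar: unregister the old copy, then register the new one."""
    path = f"/oink/jar/{jar_name}"
    delete_status = send("DELETE", path, b"")
    # 404 means nothing is registered under this name, so there is nothing to remove.
    if delete_status not in (200, 404):
        raise RuntimeError(f"unregister failed with status {delete_status}")
    # Returns the status of the registration call (200, 409 or 500 per the tables above).
    return send("POST", path, new_jar)
```

Passing the transport in as a function keeps the ordering logic easy to test without a running service.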
To retrieve a registered jar, the user should use this API.
http://localhost:8080/oink/jar/$jarName
GET
Name | Data Type | Comment | Optional/Required |
---|---|---|---|
jarName | String | Name of jar | Required |
Status Code | Comment |
---|---|
200 | If the provided jarName exists and the jar can be fetched from DFS. |
404 | If provided jarName does not exist. |
500 | If any error occurred during retrieving file from DFS, then status code is 500 |
Data Type | Comment |
---|---|
Binary stream | Requested jar as binary stream |
$ curl -X GET http://localhost:8080/oink/jar/automaton.jar
In order to run a Pig script, the user must first register the script using this API. The following validations are applied while registering the script:
- The script must not contain a "DUMP .." statement. Because the output API reads only from DFS, a "STORE .." statement should be used instead. If the Pig job generates no output, then neither a DUMP nor a STORE statement is required.
- The Pig service provides the output path for Pig jobs, so the STORE statement should look like "STORE ... into '$output' using PigStorage();".
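The two validations above can be approximated on the client before uploading a script. The sketch below is not the service's own validator: the function name `find_validation_problems` is hypothetical, and the check is a simple line-based regular expression that ignores `--` comments but does not handle `/* ... */` comments or statements that start mid-line.

```python
import re


def find_validation_problems(script: str) -> list[str]:
    """Return a list of likely registration failures for a Pig script."""
    problems = []
    # Drop "--" line comments so commented-out statements are ignored.
    code = re.sub(r"--.*", "", script)
    flags = re.IGNORECASE | re.MULTILINE
    if re.search(r"^\s*DUMP\b", code, flags):
        problems.append("DUMP statement found; use STORE instead")
    # Each STORE statement runs from the keyword to the next semicolon.
    for match in re.finditer(r"^\s*STORE\b[^;]*;", code, flags):
        if "'$output'" not in match.group(0):
            problems.append("STORE statement does not write into '$output'")
    return problems
```

An empty list means the script passes these two approximate checks; the service may still reject it with a 400 for other reasons.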
###REST URL
http://localhost:8080/oink/script/$scriptName
POST
Name | Data Type | Comment | Optional/Required |
---|---|---|---|
scriptName | String | Name of script (it needs to be unique). As convention, the user can have version number as part of scriptName to maintain multiple versions of same script | Required |
Data Type | Comment | Optional/Required |
---|---|---|
Binary stream | Actual script content | Required |
Status Code | Comment |
---|---|
200 | Script registered successfully. |
400 | Any validation failure, such as use of a DUMP statement or a STORE statement that does not use the $output variable |
409 | Duplicate script name. If same script name is already registered, then status code is 409 |
500 | If any error occurred during copying file to DFS, then status code is 500 |
Data Type | Comment |
---|---|
String | DFS path where script file is stored. This is just for reference. |
$ curl -X POST http://localhost:8080/oink/script/myScript.pig -H "Content-Type: application/octet-stream" --data-binary @/path/to/myScript.pig
hdfs://localhost:54321/tmp/pig/scripts/myScript.pig
To update a script, or to remove a script that is no longer needed, the user must unregister it with this API.
http://localhost:8080/oink/script/$scriptName
DELETE
Name | Data Type | Comment | Optional/Required |
---|---|---|---|
scriptName | String | Name of script | Required |
Status Code | Comment |
---|---|
200 | Script unregistered successfully. |
404 | If provided scriptName does not exist. |
500 | If any error occurred during deleting file from DFS, then status code is 500 |
$ curl -X DELETE http://localhost:8080/oink/script/myScript.pig
Script deleted successfully
To retrieve a registered script, the user must use this API.
http://localhost:8080/oink/script/$scriptName
GET
Name | Data Type | Comment | Optional/Required |
---|---|---|---|
scriptName | String | Name of script | Required |
Status Code | Comment |
---|---|
200 | If the provided scriptName exists and the script can be fetched from DFS. |
404 | If provided scriptName does not exist. |
500 | If any error occurred during retrieving file from DFS, then status code is 500 |
Data Type | Comment |
---|---|
Binary stream | Requested script as binary stream |
###Sample Input
$ curl -X GET http://localhost:8080/oink/script/myScript.pig
REGISTER 'hdfs://localhost:54321/tmp/pig/jars/data-fu.jar'
SET mapred.map.tasks.speculative.execution false;
define Quartile datafu.pig.stats.Quantile('0.0','0.25','0.5','0.75','1.0');
temperature = LOAD 'temperature.txt' AS (id:chararray, temp:double);
temperature = FILTER temperature BY id == $id;
temperature = GROUP temperature BY id;
temperature_quartiles = FOREACH temperature {
sorted = ORDER temperature by temp; -- must be sorted
GENERATE group as id, Quartile(sorted.temp) as quartiles;
}
STORE temperature_quartiles into '$output' using PigStorage(',');
To submit a request to run a registered Pig script, the user must use this API.
http://localhost:8080/oink/request/$scriptName
POST
Name | Data Type | Comment | Optional/Required |
---|---|---|---|
scriptName | String | Name of script | Required. This script needs to be registered first with register API. |
Parameter | Data Type | Comment | Optional/Required |
---|---|---|---|
inputParameters | Map<String, String> | Map of parameter (key, value) pairs. Any $variable in the Pig script is replaced with the value of the matching parameter. | Required if the script refers to any $variable. The $output parameter does not need to be provided, as it is passed by the service. |
httpcallback | String | HTTP callback URL which is used to send updates about the request (i.e., whether the Pig job has completed successfully, whether an error has occurred, and what the progress of the script is) | Optional |
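To see what a script looks like after its `$variable` references are filled in, the substitution can be previewed locally. This sketch uses Python's `string.Template`, which happens to share the `$name` syntax. It only approximates Pig's parameter substitution, and `preview_script` and the output path `/tmp/out` are hypothetical.

```python
from string import Template


def preview_script(script: str, input_parameters: dict[str, str], output_path: str) -> str:
    """Fill in $variables the way a submitted request is expected to.

    Raises KeyError if the script refers to a $variable that is missing
    from input_parameters, mirroring the "Required" rule in the table above.
    """
    values = dict(input_parameters)
    values["output"] = output_path  # supplied by the service, not by the caller
    return Template(script).substitute(values)
```

For example, `preview_script("STORE t into '$output';", {}, "/tmp/out")` yields `STORE t into '/tmp/out';`.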
Because Pig requests are processed asynchronously, the service supports an HTTP callback that indicates when the request has been submitted to Hadoop and when it has completed. The URL provided in the "httpcallback" parameter must be hosted as a service by the user, and this service must implement a GET call that returns HttpStatus.OK as its response.
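A callback endpoint only needs to accept a GET and answer with status 200. The following is a minimal sketch using Python's standard `http.server`; the names `CallbackHandler`, `make_server` and `received` are hypothetical, and a production receiver would need authentication and error handling that are omitted here.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

received = []  # (path, query parameters) of every notification seen so far


class CallbackHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urlparse(self.path)
        # Record the notification before acknowledging it.
        received.append((parsed.path, parse_qs(parsed.query)))
        self.send_response(200)  # the service expects an OK response
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(port: int = 0) -> HTTPServer:
    """Create the callback server; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), CallbackHandler)
```

Running `make_server(8080).serve_forever()` would start the receiver on port 8080 of the local machine.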
The callback is invoked as the Pig job makes progress and, finally, when the Pig job completes. The callback URL can be constructed with three special parameters, whose details are provided below:
Name | Data Type | Comment | Parameter? |
---|---|---|---|
id | String | PIG request id. | Can be Path parameter or query parameter. |
status | String | Status of request. It can be SUBMITTED, FAILED or SUCCEEDED | Can be Path parameter or query parameter. |
stats | Base64 encoded String for PigRequestStats object | PigRequestStats class. Example: `{ "bytesWritten":12345678, "duration":1234, "errorMessage":"job failed because", "numberOfJobs":2, "status":"SUBMITTED", "progress":50 }`, where bytesWritten is the number of bytes written as output, duration is the time taken for Pig execution in milliseconds, errorMessage is the error message in case of Pig job failure, numberOfJobs is the number of jobs submitted as part of this Pig request, status is the status of the Pig job ("SUBMITTED", "FAILED" or "SUCCEEDED"), and progress is the percentage of progress made by the Pig job. | Should be query parameter |
####Example of httpcallback
http://machine1:8080/request/$id/$status
http://machine1:8080/$id?status=$status&stats=$stats
http://machine1:8080/request?id=$id&status=$status&stats=$stats
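The placeholder expansion and the Base64 handling of `stats` can be sketched as follows. The exact encoding used by the service is not stated here, so this assumes the standard Base64 alphabet over UTF-8 JSON, percent-encoded when placed in the URL; `encode_stats`, `decode_stats` and `expand_callback` are hypothetical names.

```python
import base64
import json
from urllib.parse import quote


def encode_stats(stats: dict) -> str:
    """Serialize a stats object to JSON and Base64-encode it (assumed scheme)."""
    return base64.b64encode(json.dumps(stats).encode("utf-8")).decode("ascii")


def decode_stats(encoded: str) -> dict:
    """Reverse encode_stats; this is what a callback receiver would do."""
    return json.loads(base64.b64decode(encoded))


def expand_callback(template: str, request_id: str, status: str, stats: dict) -> str:
    """Replace the $id, $status and $stats placeholders in a callback URL."""
    # Percent-encode the Base64 text so "+", "/" and "=" survive in a query string.
    encoded_stats = quote(encode_stats(stats), safe="")
    url = template.replace("$status", status)
    url = url.replace("$stats", encoded_stats)
    return url.replace("$id", request_id)
```

A receiver that parses the query string with a standard URL library gets the percent-encoding undone automatically and can pass the value straight to `decode_stats`.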
###Response Status
Status Code | Comment |
---|---|
200 | If PIG job submitted successfully to service |
400 | If provided scriptName does not exist or any input parameter is not valid. |
500 | If any error occurred during DFS access, then status code is 500 |
###Response Body
Data Type | Comment |
---|---|
String | Request ID in UUID format |
###Sample Input
URL: http://localhost:8080/oink/request/myScript.pig
Header: Content-type : application/json
Method: POST
Payload:
{ "inputParameters" :
{
"id":"abcd"
},
"httpCallback":"http://machine1:8080/request/$id/$status?stats=$stats"
}
94f9b962-86ff-4b46-ad2f-b2bfb28e2490
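The sample request above can also be issued from Python. This is a sketch under the assumption that the service runs at `http://localhost:8080/oink`; the payload field names follow the sample payload, and `build_submit_request` and `submit` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8080/oink"  # assumed deployment address


def build_submit_request(
    script_name: str,
    input_parameters: dict,
    http_callback: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) the POST request that submits a Pig request."""
    payload = {"inputParameters": input_parameters}
    if http_callback is not None:
        payload["httpCallback"] = http_callback  # optional, as in the sample payload
    return urllib.request.Request(
        url=f"{BASE_URL}/request/{script_name}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(request: urllib.request.Request) -> str:
    """Send the request and return the request ID from the response body."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8").strip()
```

The returned string is the request ID that the status, output, stats and cancel APIs expect.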
To read the input of a submitted request, use this API.
http://localhost:8080/oink/request/$id
GET
Name | Data Type | Comment | Optional/Required |
---|---|---|---|
id | String (in UUID format) | Request ID which is generated while submitting request | Required |
Status Code | Comment |
---|---|
200 | If the request ID is present and the input can be fetched |
404 | If the request ID is not present or no input is available for this request |
500 | If any error occurred during retrieving file from DFS, then status code is 500 |
Data Type | Comment |
---|---|
JSON | Input to the service for given request ID. Json of PigRequestParameters is returned |
$ curl -X GET http://localhost:8080/oink/request/94f9b962-86ff-4b46-ad2f-b2bfb28e2490
{
"inputParameters": {
"id" : "1234"
},
"requestStartTime": "Mar 21, 2014 2:11:00 AM",
"pigScript":"myScript.pig",
"requestIp":"10.12.45.34"
}
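A client that consumes this response may want the start time as a real timestamp. The sketch below assumes the `requestStartTime` format shown in the sample (`Mar 21, 2014 2:11:00 AM`) and an English locale; the time zone is not stated, so the result is a naive `datetime`. `parse_request_input` is a hypothetical name.

```python
import json
from datetime import datetime

START_TIME_FORMAT = "%b %d, %Y %I:%M:%S %p"  # inferred from the sample response


def parse_request_input(body: str) -> dict:
    """Parse the JSON body and convert requestStartTime to a datetime."""
    record = json.loads(body)
    record["requestStartTime"] = datetime.strptime(
        record["requestStartTime"], START_TIME_FORMAT
    )
    return record
```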
To find out whether a request has been submitted or has completed (successfully or not), the user can use this API. It is an alternative to providing an HTTP callback while submitting the request. The user can poll this status API at regular intervals (say, every 5 minutes); once it returns "SUCCEEDED", "FAILED" or "KILLED", request execution has finished and the user can read the output.
http://localhost:8080/oink/request/$id/status
GET
Name | Data Type | Comment | Optional/Required |
---|---|---|---|
id | String (in UUID format) | Request ID which is generated while submitting request | Required |
Status Code | Comment |
---|---|
200 | If request ID is present and able to get status |
404 | If request ID is not present |
500 | If any error occurred during retrieving file from DFS, then status code is 500 |
Data Type | Comment |
---|---|
String | Status which can be "SUBMITTED" or "SUCCEEDED" or "FAILED" or "KILLED" |
$ curl -X GET http://localhost:8080/oink/request/94f9b962-86ff-4b46-ad2f-b2bfb28e2490/status
SUBMITTED
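The polling approach can be sketched as a loop that stops on a terminal status. The function `wait_for_completion` and its parameters are hypothetical; `fetch_status` is assumed to be any callable that performs the GET on the status API and returns the response body.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "KILLED"}


def wait_for_completion(
    fetch_status: Callable[[], str],
    interval_seconds: float = 300.0,  # 5 minutes, as suggested above
    max_polls: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the request reaches a terminal status and return that status."""
    for attempt in range(max_polls):
        status = fetch_status().strip()
        if status in TERMINAL_STATUSES:
            return status
        if attempt < max_polls - 1:
            sleep(interval_seconds)  # wait before asking again
    raise TimeoutError(f"request still not finished after {max_polls} polls")
```

Bounding the number of polls avoids waiting forever on a request whose status never changes.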
To read the output of a submitted request, use this API.
http://localhost:8080/oink/request/$id/output
GET
Name | Data Type | Comment | Optional/Required |
---|---|---|---|
id | String (in UUID format) | Request ID which is generated while submitting request | Required |
Status Code | Comment |
---|---|
200 | If the request ID is present and the output can be fetched |
404 | If the request ID is not present or no output is available for this request |
500 | If any error occurred during retrieving file from DFS, then status code is 500 |
Data Type | Comment |
---|---|
Binary stream | Output data of request |
$ curl -X GET http://localhost:8080/oink/request/94f9b962-86ff-4b46-ad2f-b2bfb28e2490/output
abcd,1,2,3,4,5
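When the script stores its result with `PigStorage(',')`, as in the sample script, the output body is delimited text that can be split into fields. This is a sketch that assumes one record per line and no quoting or escaping; `parse_output` is a hypothetical name.

```python
def parse_output(body: str, delimiter: str = ",") -> list[list[str]]:
    """Split delimited output text into rows of string fields."""
    return [line.split(delimiter) for line in body.splitlines() if line]
```

Fields are returned as strings; converting them to numbers is left to the caller, since the column types depend on the script.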
To read the statistics of a submitted request, use this API. The statistics are updated each time the Pig job's progress is updated.
http://localhost:8080/oink/request/$id/stats
GET
Name | Data Type | Comment | Optional/Required |
---|---|---|---|
id | String (in UUID format) | Request ID which is generated while submitting request | Required |
Status Code | Comment |
---|---|
200 | If request ID is present and able to fetch statistics |
404 | If request ID is not present |
500 | If any error occurred during retrieving file from DFS, then status code is 500 |
Data Type | Comment |
---|---|
JSON | Statistics for the given request ID. JSON of PigRequestStats is returned |
$ curl -X GET http://localhost:8080/oink/request/94f9b962-86ff-4b46-ad2f-b2bfb28e2490/stats
###Sample Response
{
"bytesWritten": 19476,
"duration": 178464,
"numberOfJobs": 1,
"status": "SUCCEEDED",
"progress":"100"
}
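A stats response can be turned into a one-line summary for logs. In this sketch, `summarize_stats` is a hypothetical helper; it converts `duration` from milliseconds to seconds and accepts `progress` either as a number or as a string, since both forms appear in the examples in this document.

```python
def summarize_stats(stats: dict) -> str:
    """Format a PigRequestStats JSON object as a short human-readable line."""
    seconds = stats["duration"] / 1000.0  # duration is reported in milliseconds
    progress = int(stats["progress"])  # accepts both 50 and "100"
    summary = "{}: {}% after {:.1f}s, {} job(s), {} bytes written".format(
        stats["status"], progress, seconds, stats["numberOfJobs"], stats["bytesWritten"]
    )
    error = stats.get("errorMessage")
    if error:
        summary += " (error: {})".format(error)
    return summary
```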
To cancel a submitted request, use this API.
http://localhost:8080/oink/request/$id/cancel
GET
Name | Data Type | Comment | Optional/Required |
---|---|---|---|
id | String (in UUID format) | Request ID which is generated while submitting request | Required |
Status Code | Comment |
---|---|
200 | If request ID is present and able to cancel |
404 | If request ID is not present |
500 | If any error occurred during the cancellation process |
Data Type | Comment |
---|---|
String | Response message for cancellation process |
$ curl -X GET http://localhost:8080/oink/request/94f9b962-86ff-4b46-ad2f-b2bfb28e2490/cancel
Jobs cancelled
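The request-level APIs (input, status, output, stats and cancel) differ only in the last path segment, so a client can build all of their URLs with one helper. This is a sketch: `request_url` is a hypothetical name and the base address is assumed to be `http://localhost:8080/oink`.

```python
import uuid
from typing import Optional

BASE_URL = "http://localhost:8080/oink"  # assumed deployment address
ACTIONS = ("status", "output", "stats", "cancel")


def request_url(request_id: str, action: Optional[str] = None) -> str:
    """Return the URL for a request-level API; no action means the input API."""
    uuid.UUID(request_id)  # raises ValueError if the ID is not a well-formed UUID
    if action is None:
        return f"{BASE_URL}/request/{request_id}"
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return f"{BASE_URL}/request/{request_id}/{action}"
```

All five of these APIs are called with GET, including cancel.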