Exception w/ apply_operations API #1356

Closed
quocvu opened this issue Nov 24, 2017 · 12 comments

@quocvu

quocvu commented Nov 24, 2017

Running this command

/usr/bin/curl -s -F operations=@test.json 'http://127.0.0.1:3333/command/core/apply-operations?project=2126170820257'

The test.json file has the following

[
  {
    "op": "core/column-addition",
    "description": "Create column item no at index 1 based on column Item # using expression grel:value",
    "engineConfig": {
      "mode": "row-based",
      "facets": []
    },
    "newColumnName": "item no",
    "columnInsertIndex": 1,
    "baseColumnName": "Item #",
    "expression": "grel:value",
    "onError": "set-to-blank"
  }
]

I get this exception, which seems to complain about the formatting of the content in the test.json file. However, that content was taken directly from the OpenRefine UI.

java.lang.NullPointerException
	at java.base/java.io.StringReader.<init>(StringReader.java:50)
	at org.json.JSONTokener.<init>(JSONTokener.java:73)
	at com.google.refine.util.ParsingUtilities.evaluateJsonStringToArray(ParsingUtilities.java:136)
	at com.google.refine.commands.history.ApplyOperationsCommand.doPost(ApplyOperationsCommand.java:63)
	at com.google.refine.RefineServlet.service(RefineServlet.java:177)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)
	at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:81)
	at org.mortbay.servlet.GzipFilter.doFilter(GzipFilter.java:155)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:938)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:755)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
	at java.base/java.lang.Thread.run(Thread.java:844)

Also, note that I passed the project id as a query string parameter, while the documentation says it should be a form parameter. When I pass it as a form parameter, I get a different exception complaining about a missing parameter.

@ostephens
Sponsor Member

In the 'test.json' file that you've posted there is an invisible Byte Order Mark character in the column name "Item #".

My guess is that this is the problem, but I need to do some more testing to check

@ostephens
Sponsor Member

ostephens commented Nov 24, 2017

Scratch that, seems like the presence of the BOM is a red herring. From what I can see the problem is with the request.

If you try:

curl -d operations='[{"op":"core/column-addition","description":"Create column item no at index 1 based on column Item # using expression grel:value","engineConfig":{"mode":"row-based","facets":[]},"newColumnName":"item no","columnInsertIndex":1,"baseColumnName":"Item #","expression":"grel:value","onError":"set-to-blank"}]' 'http://127.0.0.1:3333/command/core/apply-operations?project=2126170820257'

I think it will work (I could make this work locally)

However, I've not yet managed to get a curl which reads the JSON from the file working - suspect this is me not quite getting the right curl incantation rather than anything else
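For anyone scripting this rather than using curl, the request that worked above can be sketched in Python. This is only a minimal sketch, not an official client: the helper name `build_apply_operations_request` is made up for illustration, and the host, port and project id are simply the ones used in this thread. It puts `project` in the query string and sends `operations` as an `application/x-www-form-urlencoded` body, which is the content type `curl -d` uses.

```python
import json
import urllib.parse
import urllib.request


def build_apply_operations_request(base_url, project_id, operations):
    """Build (but do not send) a POST request for apply-operations.

    Hypothetical helper: `project` goes in the query string and
    `operations` goes in a URL-encoded form body.
    """
    query = urllib.parse.urlencode({"project": project_id})
    url = f"{base_url}/command/core/apply-operations?{query}"
    body = urllib.parse.urlencode({"operations": json.dumps(operations)})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Operation copied from this thread (second variant, using the Price column).
operations = [{
    "op": "core/column-addition",
    "description": "Create column item no at index 1 based on column Item # using expression grel:value",
    "engineConfig": {"mode": "row-based", "facets": []},
    "newColumnName": "price2",
    "columnInsertIndex": 1,
    "baseColumnName": "Price",
    "expression": "grel:value",
    "onError": "set-to-blank",
}]

request = build_apply_operations_request(
    "http://127.0.0.1:3333", "2126170820257", operations
)

# To actually send it (needs a running OpenRefine instance):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the URL and the encoded body before anything reaches the server.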

@jackyq2015
Contributor

You can also take a look at these if you are looking for a library to do RESTful operations:
https://groups.google.com/forum/#!topic/openrefine/a9EiNzHPvJA
https://github.com/felixlohmeier/openrefine-client

@ostephens
Sponsor Member

ostephens commented Nov 24, 2017

I'm at the limit of my knowledge of curl, but from a bit of poking around, it looks like to pass the content of the file to a named POST parameter, you have to use something like:

curl --data-urlencode operations@test.json 'http://127.0.0.1:3333/command/core/apply-operations?project=2126170820257'

However from what I've been able to see this will only work if there are no newlines \n in the file.
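If the newlines in the file are the obstacle, one way to sidestep curl is to percent-encode the file contents yourself. Below is a minimal Python sketch, assuming the file holds plain JSON (no `operations=` prefix); the function name `encode_operations_file` is hypothetical. Percent-encoding turns each newline into `%0A`, so pretty-printed JSON should survive the trip.

```python
import json
import os
import tempfile
import urllib.parse


def encode_operations_file(path):
    """Return an `operations=...` form body built from a JSON file."""
    with open(path, encoding="utf-8") as handle:
        raw_json = handle.read()
    json.loads(raw_json)  # fail early if the file is not valid JSON
    return urllib.parse.urlencode({"operations": raw_json})


# Demo with a throwaway pretty-printed file standing in for test.json.
pretty = '[\n  {\n    "op": "core/column-addition"\n  }\n]\n'
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(pretty)
form_body = encode_operations_file(tmp.name)
os.remove(tmp.name)
```

The resulting `form_body` string can be sent as the body of a POST with content type `application/x-www-form-urlencoded`.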

I think we've established this is not an OpenRefine issue but a curl issue, so I'm inclined to close? @quocvu @jackyq2015

@quocvu
Author

quocvu commented Nov 24, 2017

Just to confirm that the special character is not the problem: I tried with a different column and got the same stack trace.

[
  {
    "op": "core/column-addition",
    "description": "Create column item no at index 1 based on column Item # using expression grel:value",
    "engineConfig": {
      "mode": "row-based",
      "facets": []
    },
    "newColumnName": "price2",
    "columnInsertIndex": 1,
    "baseColumnName": "Price",
    "expression": "grel:value",
    "onError": "set-to-blank"
  }
]

@quocvu
Author

quocvu commented Nov 24, 2017

@ostephens it is rather surprising that you were able to use -d with curl, because the documentation says we need to post multipart form data, which is -F "operations=@test.json" or -F "operations={...some json here...}"

https://github.com/OpenRefine/OpenRefine/wiki/OpenRefine-API#apply-operations

@thadguidry
Member

Upgrade your curl? Compare versions?

@quocvu
Author

quocvu commented Nov 24, 2017

$ curl --version
curl 7.56.0 (x86_64-pc-linux-gnu) libcurl/7.56.0 OpenSSL/1.1.0f zlib/1.2.11 libpsl/0.18.0 (+libicu/59.1) libssh2/1.8.0 nghttp2/1.23.1
Release-Date: 2017-10-04
Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy PSL 

@quocvu
Author

quocvu commented Nov 25, 2017

I figured it out after trying different combinations. The documentation is inaccurate.

  1. The project id parameter is passed as a query parameter (current implementation), while it is documented as a form parameter (which would be cleaner, since it avoids mixing query and form parameters in one request).
  2. The form parameters are not multipart. Please see the curl docs on -d vs. -F (https://ec.haxx.se/http-postvspost.html). Multipart would make sense, since we allow uploading the operations file.

The quick fix is to correct the documentation, although I would prefer to change the code to match the documentation.

$ /usr/bin/curl -s -d @test.json 'http://127.0.0.1:3333/command/core/apply-operations?project=2381194801492'

with the test.json looking like this

operations=[
  {
    "op": "core/column-addition",
    "description": "Create column item no at index 1 based on column Item # using expression grel:value",
    "engineConfig": {
      "mode": "row-based",
      "facets": []
    },
    "newColumnName": "price2",
    "columnInsertIndex": 1,
    "baseColumnName": "Price",
    "expression": "grel:value",
    "onError": "set-to-blank"
  }
]
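To illustrate why -F and -d behave so differently here, the following Python sketch models a server that only reads form parameters from URL-encoded bodies. It is a toy model, not OpenRefine's code: `get_form_parameter` is a made-up name, the multipart boundary is invented, and the assumption that a multipart body yields no `operations` parameter is inferred from the NullPointerException in the stack trace at the top of this issue.

```python
import urllib.parse

FORM_URLENCODED = "application/x-www-form-urlencoded"


def get_form_parameter(content_type, body, name):
    """Toy model of a server that only parses URL-encoded form bodies."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != FORM_URLENCODED:
        return None  # multipart bodies are not parsed in this model
    values = urllib.parse.parse_qs(body).get(name)
    return values[0] if values else None


# Roughly what `curl -F operations=@test.json` sends (boundary is made up).
multipart_body = (
    "--boundary123\r\n"
    'Content-Disposition: form-data; name="operations"; filename="test.json"\r\n'
    "Content-Type: application/octet-stream\r\n"
    "\r\n"
    "[]\r\n"
    "--boundary123--\r\n"
)
from_multipart = get_form_parameter(
    "multipart/form-data; boundary=boundary123", multipart_body, "operations"
)

# Roughly what `curl -d 'operations=[]'` sends, with the brackets encoded.
from_urlencoded = get_form_parameter(
    FORM_URLENCODED, "operations=%5B%5D", "operations"
)
```

In this model the multipart request produces no `operations` value at all, which would match a null being handed to the JSON parser, while the URL-encoded request yields the JSON text.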

@quocvu
Author

quocvu commented Nov 25, 2017

After further examination, -d @test.json is still not working as expected. Because the data is no longer multipart, it must be URL-encoded. For example, the + signs in the JSON file are not preserved; I had to replace them with %2B to get things to work. I don't think this is what we want.
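The plus-sign problem can be reproduced without a server. In a URL-encoded form body, a literal `+` stands for a space, so it has to be sent as `%2B`. The sketch below uses Python's standard form parser as a stand-in for the server side; the expression string is a made-up example, not one taken from this issue.

```python
import urllib.parse

expression = "grel:value + 1"  # hypothetical expression containing a plus sign

# Sent as-is in a URL-encoded body, "+" is decoded as a space.
raw_body = "operations=" + expression
decoded_raw = urllib.parse.parse_qs(raw_body)["operations"][0]

# Percent-encoded first, the plus sign arrives intact.
encoded_body = urllib.parse.urlencode({"operations": expression})
decoded_encoded = urllib.parse.parse_qs(encoded_body)["operations"][0]

print(repr(decoded_raw))
print(repr(decoded_encoded))
```

This is why encoding the whole `operations` value before sending it is safer than hand-editing individual characters in the file.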

@ostephens
Sponsor Member

@quocvu as in my previous example, I'd suggest the simplest curl option is to use --data-urlencode having removed the newlines from the JSON in the operations file.

I'm looking at the code and the documentation to see what I can do about the issues. To be honest, I think the first step will be to update the documentation. We can then look at the code to see if anything can be done to support other options for passing parameter values/data to the API.

@ostephens
Sponsor Member

I've updated the HTTP API documentation at https://github.com/OpenRefine/OpenRefine/wiki/OpenRefine-API

If there are further issues with the documentation, or you would like the HTTP API to work differently to how it works currently, please open a new issue for each problem / request (and of course, feel free to add to the documentation where you can improve it directly)
