Implement GREL to parse URIs and extract key aspects #1857

Closed
ostephens opened this issue Nov 22, 2018 · 1 comment · Fixed by #4697
Labels
  • grel: The default expression language, GREL, could be improved in many ways!
  • Type: Feature Request: Identifies requests for new features or enhancements. These involve proposing new improvements.

Comments

@ostephens
Member

Is your feature request related to a problem or area of OpenRefine? Please describe.
Given a URI or URL, it can be very useful to break it down into its constituent parts and to check that it is a valid URI.

The gokbutils extension implements an 'extractHost' function, but the host is only one of the parts you might want to extract from a valid URI. The parts of interest include:

  • scheme or protocol
  • host
  • port
  • path
  • fragment (after a #)
  • query (after a ?)

Describe the solution you'd like
A GREL function, or set of GREL functions, that makes it easy to extract these parts from a string that looks like a URI. For example:

If value = "https://www.openrefine.org:80/documentation#download":

value.parseURI() -> an object whose elements can be accessed via dot notation or the 'get()' function. If 'value' is not a valid URI, return an error.
value.parseURI().host -> www.openrefine.org
and/or
value.parseURI().get("host") -> www.openrefine.org
and/or
value.parseURI().authority -> www.openrefine.org:80 (the authority includes the port)

and similarly (using either dot or 'get' syntax)

value.parseURI().path -> /documentation
value.parseURI().port -> 80
value.parseURI().scheme -> https
value.parseURI().fragment -> download

etc.
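As a rough illustration of the behaviour proposed above, here is a minimal Java sketch built on java.net.URI. The method name `parseUri`, the map keys, and the choice to return an empty Optional for invalid input are assumptions made for illustration. They are not an agreed GREL API. A map is used so that `get("host")`-style access works. For the example value, note that the authority comes back as "www.openrefine.org:80".

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the proposed parseURI() semantics (not OpenRefine code).
// Assumption: the result is a map keyed by component name, and an invalid URI
// yields an empty Optional instead of a map.
final class ParseUriSketch {

    static Optional<Map<String, Object>> parseUri(String value) {
        URI uri;
        try {
            uri = new URI(value);
        } catch (URISyntaxException e) {
            return Optional.empty(); // not a syntactically valid URI
        }
        Map<String, Object> parts = new LinkedHashMap<>();
        parts.put("scheme", uri.getScheme());
        parts.put("authority", uri.getAuthority()); // host plus ":port" if present
        parts.put("host", uri.getHost());
        parts.put("port", uri.getPort());           // -1 if no explicit port
        parts.put("path", uri.getPath());
        parts.put("query", uri.getQuery());         // null if there is no '?'
        parts.put("fragment", uri.getFragment());   // null if there is no '#'
        return Optional.of(parts);
    }

    public static void main(String[] args) {
        parseUri("https://www.openrefine.org:80/documentation#download")
            .ifPresentOrElse(
                parts -> parts.forEach((k, v) -> System.out.println(k + " -> " + v)),
                () -> System.out.println("not a valid URI"));
    }
}
```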
Describe alternatives you've considered
Some or all of this can be achieved using match() or split() with appropriate regular expressions, but a parseURI function would also check that the string is a valid URI, and it would be more accessible to users.
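The sketch below illustrates the validity point. It contrasts a naive regular expression with java.net.URI parsing. The pattern is an illustrative choice and was not proposed in the issue. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative contrast (hypothetical names): a naive regex "finds" a host in a
// malformed string, while java.net.URI rejects the same string.
final class RegexVsUri {

    // scheme "://" then group 2 = everything up to the next '/', ':', '?' or '#'
    private static final Pattern NAIVE =
        Pattern.compile("^([a-zA-Z][a-zA-Z0-9+.-]*)://([^/:?#]+)");

    static String regexHost(String s) {
        Matcher m = NAIVE.matcher(s);
        return m.find() ? m.group(2) : null;
    }

    static boolean isValidUri(String s) {
        try {
            new URI(s); // throws URISyntaxException on illegal syntax
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String bad = "http://exa mple.com/docs"; // contains an illegal space
        System.out.println(regexHost(bad));      // exa mple.com
        System.out.println(isValidUri(bad));     // false
    }
}
```

The URI check is purely syntactic. For example, "www.openrefine.org" on its own parses as a relative URI with a path and no host. A parseURI function might therefore also need to require a scheme or host, depending on the use case.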

Additional context
What I've described here essentially exposes some of the methods of https://docs.oracle.com/javase/7/docs/api/java/net/URI.html

e.g. URI.create(String) would be used to create a URI object from the string
URI.getHost would be used to get the host (URI.getAuthority returns the host together with any port and user information)
etc.
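Here is a minimal sketch of the java.net.URI calls mentioned above, using the example value from this issue. URI.create(String) throws an unchecked IllegalArgumentException, whereas the URI(String) constructor throws a checked URISyntaxException. The helper name `isValid` is hypothetical.

```java
import java.net.URI;

// Minimal sketch of URI.create, getHost and getAuthority (illustrative only).
final class UriCreateDemo {

    // Hypothetical helper: URI.create throws IllegalArgumentException
    // (wrapping a URISyntaxException) when the string is not a valid URI.
    static boolean isValid(String s) {
        try {
            URI.create(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        URI uri = URI.create("https://www.openrefine.org:80/documentation#download");
        System.out.println(uri.getHost());      // www.openrefine.org
        System.out.println(uri.getAuthority()); // www.openrefine.org:80
        System.out.println(isValid("http://exa mple.com/")); // false
    }
}
```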

@thadguidry
Member

There's also Thad's approach using Clojure, as documented on our wiki: https://github.com/OpenRefine/OpenRefine/wiki/Recipes#11-clojure

@thadguidry added the Type: Feature Request and grel labels Nov 28, 2018
@elroykanye elroykanye self-assigned this Apr 5, 2022
3 participants