Implement GREL to parse URIs and extract key aspects #1857

@ostephens

Description

Is your feature request related to a problem or area of OpenRefine? Please describe.
Given a URI or URL, it can be very useful to break it down into its constituent parts, and also to check that it is valid.

The gokbutils extension implements an 'extractHost' function, but this is only a single part of the set of things you might want to extract from a valid URI. Other aspects include:

  • scheme or protocol
  • host
  • port
  • path
  • fragment (after a #)
  • query (after a ?)

Describe the solution you'd like
A GREL command or set of GREL commands to make it easy to extract these from a string that looks like a URI. For example:

if value = "https://www.openrefine.org:80/documentation#download":

value.parseURI() -> a set of elements that can be accessed via dot notation or the 'get()' function. If 'value' is not a valid URI, an error is returned
value.parseURI().host -> www.openrefine.org
and/or
value.parseURI().get("host") -> www.openrefine.org
and/or
value.parseURI().authority -> www.openrefine.org:80 (the authority includes the port when one is given)

and similarly (using either dot or 'get' syntax)

value.parseURI().path -> /documentation
value.parseURI().port -> 80
value.parseURI().scheme -> https
value.parseURI().fragment -> download

etc.

Describe alternatives you've considered
Some or all of this can be achieved using match or split with appropriate regular expressions, but parseURI would also allow the string to be checked for validity as a URI, and would make the extraction more accessible to users who are not comfortable with regular expressions.
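
To illustrate the regex route and its limitation, here is a rough sketch in Java (the language OpenRefine is written in) using the generic component-splitting pattern from RFC 3986, Appendix B. The class name `RegexUriSplit` and the method `split` are made up for this example and are not part of OpenRefine or GREL. Note that the pattern splits a string into parts but does not validate it: almost any string matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexUriSplit {
    // Component-splitting pattern from RFC 3986, Appendix B.
    // Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
    private static final Pattern URI_PATTERN = Pattern.compile(
        "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // Hypothetical helper: returns {scheme, authority, path, query, fragment}.
    // Components that are absent come back as null.
    public static String[] split(String value) {
        Matcher m = URI_PATTERN.matcher(value);
        if (!m.matches()) {
            return null; // rare: e.g. a line break inside the fragment
        }
        return new String[] {
            m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)
        };
    }
}
```

With the example value above, `split` yields the scheme `https`, the authority `www.openrefine.org:80`, the path `/documentation`, no query, and the fragment `download`. A string such as `not a uri` is also accepted and simply ends up entirely in the path component, which is why a regex alone does not give a validity check.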

Additional context
What I've basically described here is implementing some of the methods of https://docs.oracle.com/javase/7/docs/api/java/net/URI.html

e.g. URI.create(String) would be used to create a URI object from the string
URI.getHost would be used to get the host (URI.getAuthority returns the whole authority, which also includes the port and any user info)
etc.
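
A minimal sketch of how the `java.net.URI` methods could map onto the proposed elements. The class `UriParts` and its methods `parse` and `isValid` are hypothetical names for illustration only; this is not how a GREL function would actually be registered in OpenRefine, just the underlying Java calls.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class UriParts {
    // Hypothetical helper: collects the components proposed in this issue.
    // URI.create throws IllegalArgumentException if value is not a valid URI.
    public static Map<String, Object> parse(String value) {
        URI uri = URI.create(value);
        Map<String, Object> parts = new LinkedHashMap<>();
        parts.put("scheme", uri.getScheme());
        parts.put("authority", uri.getAuthority()); // host plus port and user info, if any
        parts.put("host", uri.getHost());
        parts.put("port", uri.getPort());           // -1 when no port is given
        parts.put("path", uri.getPath());
        parts.put("query", uri.getQuery());         // null when absent
        parts.put("fragment", uri.getFragment());   // null when absent
        return parts;
    }

    // Hypothetical validity check built on the same exception.
    public static boolean isValid(String value) {
        try {
            URI.create(value);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        parse("https://www.openrefine.org:80/documentation#download")
            .forEach((name, part) -> System.out.println(name + " -> " + part));
    }
}
```

One thing to be aware of: `java.net.URI` accepts relative references, so a bare string without spaces or other illegal characters (for example `documentation`) counts as valid and is returned as a path with no scheme or host. A GREL function might want to treat that case separately.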

Labels

Type: Feature Request, grel