Drill-7344: Add Geo-IP Functions #1841
@cgivre if you want to include PR #1840 and #1841 in the upcoming release, please rework these PRs to have proper error handling, comply with the Drill coding style, etc. Similar PRs have already gone through review and been merged; you can look at their code and make the appropriate changes. I think that is more reasonable than having code reviewers point out the same issues over and over again :(
<repositories>
  <repository>
    <id>Jabylon Repository</id>
    <url>http://www.jabylon.org/maven/</url>
I don't think that we should add one more repository. Could you please use dependencies from maven central?
@Override
public void setup() {
  java.io.InputStream serviceFile = getClass().getClassLoader().getResourceAsStream("service-names-port-numbers.csv");
Not sure this is a good idea. First, the source of the file is unknown; second, if there is no library that provides this functionality, I am not sure we should provide such a function in Drill.
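For reference, the kind of table this CSV would feed can be sketched in a few lines of plain Java. This is a minimal illustration, not the code in this PR: the class name `ServiceNameLookup`, its methods, and the sample rows are all hypothetical, and it assumes each row starts with service name, port number, and transport protocol, in that order.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: not the PR's implementation and not a Drill API.
final class ServiceNameLookup {
  private final Map<String, String> byPortAndProtocol = new HashMap<>();

  /** Builds the table from rows shaped like "name,port,protocol,...". */
  ServiceNameLookup(List<String> csvRows) {
    for (String row : csvRows) {
      String[] cols = row.split(",", -1);
      if (cols.length < 3 || cols[0].isEmpty() || cols[2].isEmpty() || !isNumber(cols[1])) {
        continue; // skips the header, port ranges, and incomplete rows
      }
      // Keep the first name seen for each port/protocol pair.
      byPortAndProtocol.putIfAbsent(key(cols[1], cols[2]), cols[0]);
    }
  }

  /** Port given as an int. */
  Optional<String> serviceName(int port, String protocol) {
    return serviceName(Integer.toString(port), protocol);
  }

  /** Port given as a string. */
  Optional<String> serviceName(String port, String protocol) {
    if (port == null || protocol == null) {
      return Optional.empty();
    }
    return Optional.ofNullable(byPortAndProtocol.get(key(port, protocol)));
  }

  private static boolean isNumber(String s) {
    return !s.isEmpty() && s.chars().allMatch(c -> c >= '0' && c <= '9');
  }

  private static String key(String port, String protocol) {
    return port.trim() + "/" + protocol.trim().toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    // Sample rows for illustration only.
    ServiceNameLookup lookup = new ServiceNameLookup(List.of(
        "Service Name,Port Number,Transport Protocol,Description",
        "ssh,22,tcp,The Secure Shell (SSH) Protocol",
        "domain,53,udp,Domain Name Server"));
    System.out.println(lookup.serviceName(22, "TCP"));
    System.out.println(lookup.serviceName("53", "udp"));
    System.out.println(lookup.serviceName(9999, "tcp"));
  }
}
```

Returning `Optional` here is a design choice of the sketch; it leaves the decision about what to show for an unassigned port to the caller.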
@cgivre, please avoid adding large files to the project resources. One of the files you have added has a size of 57.5 MB (!) (GeoLite2-City.mmdb); the other files are several MB each.
@vvysotskyi
@cgivre taking into account that these files are needed for only a couple of functions, and given their large size, I think we should not allow them into Apache Drill. If this means not adding the Geo-IP functions, I think that is much better than enlarging the project. If users need such functions, they can add them to the classpath.
Hi Arina,
What if we just require that the files are in the classpath and don't ship them with Drill? If users want to use these functions, they can put the files in their classpath. Would that work?
… On Aug 13, 2019, at 11:46 AM, Arina Ielchiieva ***@***.***> wrote:
@cgivre <https://github.com/cgivre> taking into account that these files are needed for a couple of functions and their large size, I think we should not allow them into Apache Drill. If these will lead to not adding Geo-IP functions, I think it's much better than enlarging project size. If user needs such functions, he can add them in the classpath.
I think this brings too much overhead; if a user needs these functions, they can include a jar with the functions and files in the classpath. Having functions that require special data in the classpath in order to work is odd.
The reason I'd suggest this is that there are free functions, which are in widespread use, and there are also paid versions with the same functionality. If we moved it to the classpath, a user could choose whether they want to use the free or paid versions of these repositories.
-- C
… On Aug 13, 2019, at 11:49 AM, Arina Ielchiieva ***@***.***> wrote:
I think brings too much overhead, if user needs these functions he can include jar with functions and files in the classpath. Having functions that require some special data in the classpath to be working is odd.
My point is that it is strange and uncommon to have functions that do not work out of the box. As I said before, you can share these functions in your repo, and users can build them from source and add them to the Drill classpath if needed.
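The classpath option discussed in this thread could look roughly like the sketch below: check whether the data file is present, and fail with a clear message when it is not, instead of bundling it. The class `GeoDataLocator` and its methods are hypothetical names for illustration; only `ClassLoader.getResource` and `ClassLoader.getResourceAsStream` are standard Java.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not part of Drill or of this PR.
final class GeoDataLocator {
  private GeoDataLocator() {}

  /** Returns true if the named resource can be found on the classpath. */
  static boolean isAvailable(String resourceName) {
    return GeoDataLocator.class.getClassLoader().getResource(resourceName) != null;
  }

  /** Opens the resource, or fails with a message that says what is missing. */
  static InputStream open(String resourceName) {
    InputStream in = GeoDataLocator.class.getClassLoader().getResourceAsStream(resourceName);
    if (in == null) {
      throw new IllegalStateException(
          "Resource '" + resourceName + "' was not found on the classpath; "
              + "add it to the classpath to use the functions that depend on it.");
    }
    return in;
  }

  public static void main(String[] args) throws IOException {
    String name = "GeoLite2-City.mmdb";
    if (!isAvailable(name)) {
      System.out.println(name + " is not on the classpath");
      return;
    }
    try (InputStream in = open(name)) {
      System.out.println(name + " found; first byte: " + in.read());
    }
  }
}
```

The explicit null check matters because `getResourceAsStream` returns `null` for a missing resource rather than throwing, so without it the failure would surface later as a less helpful `NullPointerException`.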
Drill User Defined Functions
This README documents functions which users have submitted to Apache Drill.
Protocol Lookup Functions
These functions provide a convenience lookup capability for port numbers. They will accept port numbers as either an int or string.
get_host_name(<ip address>): This function accepts an IP address and returns the host name.
get_service_name(<port number>, <protocol>): This function returns the service name for a port and protocol combination.
get_short_service_name(<port number>, <protocol>): Same as above but returns a short protocol name.
GeoIP Functions for Apache Drill
This is a collection of GeoIP functions for Apache Drill. These functions are a wrapper for the MaxMind GeoIP Database.
IP Geo-Location is inherently imprecise and should never be relied on to get anything more than a general sense of where the traffic is coming from.
getCountryName( <ip> ): This function returns the country name of the IP address, "Unknown" if the IP is unknown or invalid.
getCountryConfidence( <ip> ): This function returns the confidence score of the country ISO code of the IP address.
getCountryISOCode( <ip> ): This function returns the country ISO code of the IP address, "Unknown" if the IP is unknown or invalid.
getCityName( <ip> ): This function returns the city name of the IP address, "Unknown" if the IP is unknown or invalid.
getCityConfidence( <ip> ): This function returns the confidence score of the city name of the IP address.
getLatitude( <ip> ): This function returns the latitude associated with the IP address.
getLongitude( <ip> ): This function returns the longitude associated with the IP address.
getTimezone( <ip> ): This function returns the timezone associated with the IP address.
getAccuracyRadius( <ip> ): This function returns the accuracy radius associated with the IP address, 0 if unknown.
getAverageIncome( <ip> ): This function returns the average income of the region associated with the IP address, 0 if unknown.
getMetroCode( <ip> ): This function returns the metro code of the region associated with the IP address, 0 if unknown.
getPopulationDensity( <ip> ): This function returns the population density associated with the IP address.
getPostalCode( <ip> ): This function returns the postal code associated with the IP address.
getCoordPoint( <ip> ): This function returns a point, for use in GIS functions, at the latitude/longitude associated with the IP address.
getASN( <ip> ): This function returns the autonomous system of the IP address, "Unknown" if the IP is unknown or invalid.
getASNOrganization( <ip> ): This function returns the autonomous system organization of the IP address, "Unknown" if the IP is unknown or invalid.
isEU( <ip> ), isEuropeanUnion( <ip> ): These functions return true if the IP address is located in the European Union, false if not.
isAnonymous( <ip> ): This function returns true if the IP address is anonymous, false if not.
isAnonymousVPN( <ip> ): This function returns true if the IP address is an anonymous virtual private network (VPN), false if not.
isHostingProvider( <ip> ): This function returns true if the IP address is a hosting provider, false if not.
isPublicProxy( <ip> ): This function returns true if the IP address is a public proxy, false if not.
isTORExitNode( <ip> ): This function returns true if the IP address is a known TOR exit node, false if not.
This product includes GeoLite2 data created by MaxMind, available from https://www.maxmind.com.
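The "Unknown" behaviour described for getCountryName and the similar string functions can be illustrated with a small plain-Java sketch. It is an assumption-laden illustration, not the functions' actual code: the class `GeoIpFallback` is hypothetical, the database is replaced by an in-memory map with a made-up entry, and only IPv4 literals are validated to keep it short.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the "Unknown" fallback; not the PR's implementation.
final class GeoIpFallback {
  static final String UNKNOWN = "Unknown";

  private GeoIpFallback() {}

  /** True only for dotted-quad IPv4 literals such as "192.0.2.1". */
  static boolean isIpv4Literal(String ip) {
    if (ip == null) {
      return false;
    }
    String[] parts = ip.trim().split("\\.", -1);
    if (parts.length != 4) {
      return false;
    }
    for (String part : parts) {
      boolean digitsOnly = !part.isEmpty() && part.length() <= 3
          && part.chars().allMatch(c -> c >= '0' && c <= '9');
      if (!digitsOnly || Integer.parseInt(part) > 255) {
        return false;
      }
    }
    return true;
  }

  /** Returns the looked-up name, or "Unknown" for invalid or unlisted addresses. */
  static String countryName(String ip, Function<String, String> database) {
    if (!isIpv4Literal(ip)) {
      return UNKNOWN; // invalid input never reaches the database
    }
    String name = database.apply(ip.trim());
    return name == null ? UNKNOWN : name;
  }

  public static void main(String[] args) {
    // Stand-in for a real GeoIP database; the entry is made up.
    Map<String, String> fakeDatabase = Map.of("192.0.2.1", "Exampleland");
    System.out.println(countryName("192.0.2.1", fakeDatabase::get));
    System.out.println(countryName("198.51.100.7", fakeDatabase::get));
    System.out.println(countryName("not-an-ip", fakeDatabase::get));
  }
}
```

Validating before the lookup keeps both failure cases, invalid input and an address missing from the database, on the same "Unknown" path, which matches how the list above describes them.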