Drill UDF for parsing User Agent Strings. This function is based on Niels Basjes' Java library for parsing user agent strings, which is available here: https://github.com/nielsbasjes/yauaa.
Using this function is fairly simple. The function parse_user_agent() takes a user agent string as an argument and returns a map of the available fields. Note that not every field will be present in every user agent string.
SELECT parse_user_agent( columns[0] ) as ua
FROM dfs.`/Users/cgivre/drill-httpd/ua.csv`;
The query above returns a map containing fields such as:
"OperatingSystemName":"Mac OS X",
"OperatingSystemNameVersion":"Mac OS X 10.10.1",
"LayoutEngineNameVersion":"Blink 39.0",
"LayoutEngineNameVersionMajor":"Blink 39",
"AgentNameVersion":"Chrome 39.0.2171.99",
"AgentNameVersionMajor":"Chrome 39",
The function returns a Drill map, so you can access any of the fields using Drill's table.map.key notation. For example, the query below illustrates how to extract a field from this map and summarize it:
SELECT uadata.ua.AgentNameVersion AS Browser,
COUNT( * ) AS BrowserCount
FROM (
SELECT parse_user_agent( columns[0] ) AS ua
FROM dfs.drillworkshop.`user-agents.csv`
) AS uadata
GROUP BY uadata.ua.AgentNameVersion
ORDER BY BrowserCount DESC;
To install this function, first download the contents of this repository and build it using Maven:
> git clone https://github.com/cgivre/drill-useragent-function.git
> cd drill-useragent-function
> mvn clean package -DskipTests
> cp ./target/*.jar <drill-path>/jars/3rdparty
Make sure you replace <drill-path> with the actual path to your Drill installation.
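If you would like the copy step to check that the destination exists and that the build actually produced a jar, a small wrapper along these lines may help. This is only a sketch for a POSIX shell: the function name copy_jars and its two arguments are hypothetical and not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): copy every jar from a
# build directory into Drill's jars/3rdparty directory, and report an error
# when the destination is missing or the build directory contains no jars.
copy_jars() {
    src_dir=$1      # build output directory, e.g. ./target
    drill_path=$2   # root of the Drill installation
    dest="$drill_path/jars/3rdparty"

    if [ ! -d "$dest" ]; then
        echo "error: $dest does not exist; check the Drill path" >&2
        return 1
    fi

    # If nothing matches, the glob stays literal, so test the first result.
    set -- "$src_dir"/*.jar
    if [ ! -e "$1" ]; then
        echo "error: no jars found in $src_dir; did the build succeed?" >&2
        return 1
    fi

    cp "$@" "$dest"/ && echo "copied $# jar(s) to $dest"
}
```

After sourcing the function, you would call it from the function folder as copy_jars ./target <drill-path>, again substituting your own Drill path.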
Next, you will have to download and build the UA parser. Navigate out of the function folder and run:
> git clone https://github.com/nielsbasjes/yauaa.git
> cd yauaa
> mvn clean package -DskipTests
> cp <path-to-yauaa>/analyzer/target/yauaa-0.11-SNAPSHOT.jar <drill-path>/jars/3rdparty
> cp <path-to-yauaa>/analyzer/target/yauaa-0.11-SNAPSHOT-udf.jar <drill-path>/jars/3rdparty