A Collection of Useful Hive UDFs.
Implementation: com.damientseng.dive.ql.udf.GenericUDAFMaxWhen
Signature: maxwhen(cmp, val)
Description: returns the value of column val from the row where cmp has the maximum value. Its counterpart is minwhen.
Example: for each uid, get the latest ip.
select
uid, maxwhen(dt, ip) as final_ip
from mydb.mytb
group by uid
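To make the semantics concrete, here is a minimal Java sketch of what maxwhen computes for a single group. It is not the code of GenericUDAFMaxWhen: the class name MaxWhenSketch, the list-based signature, and the handling of ties (first row wins), nulls (none expected in cmp), and empty groups (returns null) are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class MaxWhenSketch {
    // Returns val.get(i) for the index i at which cmp.get(i) is largest.
    // Assumptions of this sketch (not confirmed by the UDAF): both lists
    // have the same length, cmp holds no nulls, ties keep the first row seen.
    static <C extends Comparable<C>, V> V maxWhen(List<C> cmp, List<V> val) {
        C bestCmp = null;
        V bestVal = null;
        for (int i = 0; i < cmp.size(); i++) {
            C c = cmp.get(i);
            // Replace the current winner only on a strictly larger cmp.
            if (bestCmp == null || c.compareTo(bestCmp) > 0) {
                bestCmp = c;
                bestVal = val.get(i);
            }
        }
        return bestVal; // null when the group is empty
    }

    public static void main(String[] args) {
        // Hypothetical rows for one uid: (dt, ip) pairs.
        List<String> dt = Arrays.asList("2020-01-01", "2020-03-15", "2020-02-10");
        List<String> ip = Arrays.asList("10.0.0.1", "10.0.0.3", "10.0.0.2");
        System.out.println(maxWhen(dt, ip)); // prints 10.0.0.3
    }
}
```

Flipping the comparison to `< 0` would give the corresponding minwhen behaviour under the same assumptions.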
Implementation: com.damientseng.dive.ql.udf.GenericUDAFRecent
Signature: recent(flg, ch)
Description: a user-defined Analytics function that combines records without explicit joins. Check out this post for more details.
Implementation: com.damientseng.dive.ql.udf.GenericUDFLCS
Signature: lcs(str1, str2)
Description: returns the length of the longest common subsequence (LCS) of str1 and str2. This UDF adopts a dynamic programming approach with a time complexity of O(mn), where m and n are the lengths of str1 and str2 respectively.
Example:
select lcs('abcde', 'aced');
>> 3
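The dynamic programming idea can be sketched in plain Java as follows. This is the textbook LCS table, shown only to illustrate the approach; it is not taken from GenericUDFLCS. The class name LcsSketch is hypothetical, and the sketch compares UTF-16 code units, which may differ from how the UDF treats characters.

```java
final class LcsSketch {
    // Length of the longest common subsequence of a and b.
    // dp[i][j] holds the LCS length of a's first i chars and b's first j chars.
    static int lcs(String a, String b) {
        int m = a.length();
        int n = b.length();
        int[][] dp = new int[m + 1][n + 1]; // row 0 and column 0 stay 0
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    // Matching characters extend the LCS of both prefixes.
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    // Otherwise drop one character from either string.
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[m][n];
    }

    public static void main(String[] args) {
        System.out.println(lcs("abcde", "aced")); // prints 3 ("ace" or "acd")
    }
}
```

Each of the m × n cells is filled once in constant time, which is where the quadratic running time comes from.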
Implementation: com.damientseng.dive.ql.udf.GenericUDTFDuplicate
Signature: dup(c, col1, col2,...)
Description: a UDTF that makes c copies of each row, with the specified columns.
Example:
select dup(2, name, age) from customers;
Jack 37
Jack 37
Pony 35
Pony 35
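The row-multiplying behaviour shown above can be mimicked outside Hive with a few lines of Java. This is only a sketch of the semantics, not the code of GenericUDTFDuplicate: the class name DupSketch and the use of Object[] for a row are assumptions, and it presumes c is non-negative.

```java
import java.util.ArrayList;
import java.util.List;

final class DupSketch {
    // Emits c copies of every input row, preserving input order.
    // A row is modelled as an Object[] holding the selected columns.
    static List<Object[]> dup(int c, List<Object[]> rows) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] row : rows) {
            for (int k = 0; k < c; k++) {
                out.add(row.clone()); // independent copy per emitted row
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical contents of the customers table (name, age).
        List<Object[]> customers = new ArrayList<>();
        customers.add(new Object[] {"Jack", 37});
        customers.add(new Object[] {"Pony", 35});
        for (Object[] r : dup(2, customers)) {
            System.out.println(r[0] + " " + r[1]);
        }
    }
}
```

With the two hypothetical customers above, the loop prints each name and age twice, matching the shape of the Hive output.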