damientseng/Dive

Dive

A Collection of Useful Hive UDFs.

UDAF

MaxWhen

Implementation: com.damientseng.dive.ql.udf.GenericUDAFMaxWhen

Signature: maxwhen(cmp, val)

Description: returns the value of column val from the row where cmp has its maximum value. Its counterpart, minwhen, returns the value of val from the row where cmp has its minimum value.

Example: For each uid, get the latest ip.

select 
    uid, maxwhen(dt, ip) as final_ip
from mydb.mytb
group by uid
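
To make the semantics of maxwhen concrete, here is a minimal plain-Java sketch of the same "value at the maximum" aggregation, grouped by uid. It is only an illustration and not the code in GenericUDAFMaxWhen: the class MaxWhenSketch, the Row type, and the method maxWhenByUid are hypothetical names, dt is assumed to be a string that sorts chronologically, and ties are resolved by keeping the first row seen, which the actual UDAF may or may not do.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of maxwhen(cmp, val); not the Dive implementation.
class MaxWhenSketch {

    // One input row: grouping key (uid), comparison column (dt), value column (ip).
    static final class Row {
        final String uid;
        final String dt;
        final String ip;

        Row(String uid, String dt, String ip) {
            this.uid = uid;
            this.dt = dt;
            this.ip = ip;
        }
    }

    // For each uid, return the ip taken from the row with the largest dt.
    static Map<String, String> maxWhenByUid(List<Row> rows) {
        Map<String, Row> best = new LinkedHashMap<>();
        for (Row r : rows) {
            Row current = best.get(r.uid);
            // Replace the running winner only on a strictly larger dt
            // (assumption: on ties, the first row seen is kept).
            if (current == null || r.dt.compareTo(current.dt) > 0) {
                best.put(r.uid, r);
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Row> e : best.entrySet()) {
            result.put(e.getKey(), e.getValue().ip);
        }
        return result;
    }
}
```

The sketch keeps one running "best row" per group, which is the same constant-size state per key that an aggregate function would need to carry between rows.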

Recent

Implementation: com.damientseng.dive.ql.udf.GenericUDAFRecent

Signature: recent(flg, ch)

Description: a user-defined analytic function that combines records without explicit joins. Check out this post for more details.

UDF

LCS

Implementation: com.damientseng.dive.ql.udf.GenericUDFLCS

Signature: lcs(str1, str2)

Description: returns the length of the longest common subsequence (LCS) of str1 and str2. This UDF uses dynamic programming with a time complexity of $O(m \cdot n)$, where m and n are the lengths of str1 and str2 respectively.

Example:

select lcs('abcde', 'aced');
>> 3
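
The dynamic programming approach described above can be sketched in plain Java as follows. This is the textbook LCS table, not necessarily how GenericUDFLCS is written: the class LcsSketch and its lcs method are hypothetical names, and the sketch compares Java char values, so how the real UDF treats nulls or multi-byte characters is not covered here.

```java
// Hypothetical illustration of the O(m * n) LCS length computation; not the Dive implementation.
class LcsSketch {

    // Returns the length of the longest common subsequence of a and b.
    static int lcs(String a, String b) {
        int m = a.length();
        int n = b.length();
        // dp[i][j] = LCS length of the first i chars of a and the first j chars of b.
        int[][] dp = new int[m + 1][n + 1];
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    // Matching characters extend the best subsequence of both prefixes.
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    // Otherwise drop one character from either string and keep the better result.
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[m][n];
    }
}
```

For 'abcde' and 'aced', one longest common subsequence is 'ace', so LcsSketch.lcs("abcde", "aced") returns 3, matching the Hive example.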

UDTF

DUP

Implementation: com.damientseng.dive.ql.udf.GenericUDTFDuplicate

Signature: dup(c, col1, col2,...)

Description: a UDTF that makes c copies of each row, with the specified columns.

Example:

select dup(2, name, age) from customers;
Jack  37
Jack  37
Pony  35
Pony  35
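
The row duplication that dup performs can be pictured with the plain-Java sketch below. It is an illustration only and does not use the Hive UDTF API: the class DupSketch and its dup method are hypothetical names, and each row is modeled as a list holding the selected column values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of dup(c, col1, col2, ...); not the Dive implementation.
class DupSketch {

    // Emits c copies of every input row, keeping the input order.
    static List<List<Object>> dup(int c, List<List<Object>> rows) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> row : rows) {
            for (int i = 0; i < c; i++) {
                // Copy the row so that each emitted row is an independent list.
                out.add(new ArrayList<>(row));
            }
        }
        return out;
    }
}
```

With c = 2 and the rows (Jack, 37) and (Pony, 35), the sketch produces four rows in the same order as the Hive output above.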
