Distinct filter plugin for Embulk
This filter returns distinct records, using the columns you configure as the key.
- Plugin type: filter
- columns: column name list to distinguish records (array of string, required)
filters:
  - type: distinct
    columns: [c0, c1]
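To make the effect of `columns: [c0, c1]` concrete, here is a minimal standalone sketch of distinct-by-columns. It is not the plugin's code and does not use the Embulk API: `DistinctSketch`, `distinct`, and the list-of-strings record shape are hypothetical names chosen for illustration, and keeping the first record seen for each key is an assumption about the behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; the real plugin works on Embulk pages.
public class DistinctSketch {
    // Keeps the first record seen for each combination of values
    // found at the given key column indexes (assumed behaviour).
    static List<List<String>> distinct(List<List<String>> records, int[] keyIndexes) {
        Set<List<String>> seen = new HashSet<>();
        List<List<String>> out = new ArrayList<>();
        for (List<String> record : records) {
            List<String> key = new ArrayList<>();
            for (int i : keyIndexes) {
                key.add(record.get(i));
            }
            // Set.add returns false when the key was already present.
            if (seen.add(key)) {
                out.add(record);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Columns are c0, c1, c2; the key is (c0, c1).
        List<List<String>> records = Arrays.asList(
            Arrays.asList("a", "1", "x"),
            Arrays.asList("a", "1", "y"),
            Arrays.asList("b", "1", "z"));
        // prints [[a, 1, x], [b, 1, z]]
        System.out.println(distinct(records, new int[] {0, 1}));
    }
}
```

The second record is dropped because its (c0, c1) pair matches the first, even though c2 differs.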
$ ./gradlew classpath
$ embulk run -I lib example/config.yml
This plugin uses a lot of memory because it holds the distinct column values in memory.
- Further reduce the amount of memory the filter uses, e.g. use the CRC32 of the values as the distinct key?
- Ideas are welcome!
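The CRC32 idea above could look roughly like the sketch below, which stores one checksum per key instead of the values themselves. This is an exploration, not the plugin's implementation: `Crc32KeySketch`, `checksum`, and `isFirstOccurrence` are hypothetical names, and only `java.util.zip.CRC32` from the standard library is used. Two caveats apply. CRC32 has only 32 bits, so two different keys can collide and a genuinely distinct record would then be dropped. And a boxed `Long` in a `HashSet` has its own overhead, so the saving depends on how large the column values are.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.CRC32;

// Hypothetical illustration of using a CRC32 checksum as the distinct key.
public class Crc32KeySketch {
    // Stores one checksum per key seen so far instead of the raw values.
    private final Set<Long> seen = new HashSet<>();

    // Computes a CRC32 over the key column values, in order.
    static long checksum(String... values) {
        CRC32 crc = new CRC32();
        for (String v : values) {
            byte[] bytes = v.getBytes(StandardCharsets.UTF_8);
            crc.update(bytes, 0, bytes.length);
            // A zero byte after each value keeps ("ab", "c") and ("a", "bc")
            // from feeding identical bytes to the checksum.
            crc.update(0);
        }
        return crc.getValue();
    }

    // Returns true the first time a checksum is seen, false afterwards.
    // A CRC32 collision makes this return false for a record that is
    // actually distinct.
    boolean isFirstOccurrence(String... values) {
        return seen.add(checksum(values));
    }
}
```

A wider hash would lower the collision risk at the cost of more memory per key; which trade-off suits this plugin is an open question.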
$ ./gradlew gem # add -t to watch for file changes and rebuild continuously