Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add stringLast and stringFirst aggregators extension #5789

Merged
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
Show all changes
47 commits
Select commit Hold shift + click to select a range
a0049fd
Add lastString and firstString aggregators extension
May 20, 2018
09338b9
Remove duplicated class
May 20, 2018
3658b0c
Move first-last-string doc page to extensions-contrib
May 20, 2018
5975952
Fix ObjectStrategy compare method
May 28, 2018
adc773b
Fix doc bad aggregatos type name
May 29, 2018
c2f3672
Create FoldingAggregatorFactory classes to fix SegmentMetadataQuery
May 31, 2018
1c3bd6a
Add getMaxStringBytes() method to support JSON serialization
Jun 1, 2018
c75c88f
Fix null pointer exception at segment creation phase when the string …
Jun 6, 2018
d671922
Control the valueSelector object class on BufferAggregators
Jun 6, 2018
6945bf9
Perform all improvements
Jun 9, 2018
7dd12e0
Add java doc on SerializablePairLongStringSerde
Jun 9, 2018
217627f
Refactor ObjectStraty compare method
Jun 9, 2018
9b68a60
Remove unused ;
Jun 9, 2018
e552f04
Add aggregateCombiner unit tests. Rename BufferAggregators unit tests
Jun 9, 2018
48325f3
Remove unused imports
Jun 9, 2018
657e8a3
Add license header
Jun 9, 2018
e8a2ded
Add class name to java doc class serde
Jun 9, 2018
4d24159
Throw exception if value is unsupported class type
Jun 9, 2018
7eeef86
Merge branch 'master' into feature-first-last-string-aggregators
andresgomezfrr Jun 9, 2018
bbf84b8
Move first-last-string extension into druid core
Jun 10, 2018
b09217c
Update druid core docs
Jun 10, 2018
834662d
Fix null pointer exception when pair->string is null
Jun 10, 2018
482e8bf
Add null control unit tests
Jun 10, 2018
046c5a9
Remove unused imports
Jun 10, 2018
9829263
Add first/last string folding aggregator on AggregatorsModule to supp…
Jun 10, 2018
023fc88
Change SerializablePairLongString to extend SerializablePair
Jun 11, 2018
e6dee78
Change vars from public to private
Jun 11, 2018
1fb6789
Convert vars to primitive type
Jun 11, 2018
42b6ca3
Clarify compare comment
Jun 11, 2018
a9f2d61
Change IllegalStateException to ISE
Jun 11, 2018
5a1643f
Remove TODO comments
Jun 11, 2018
a9a1069
Control possible null pointer exception
Jun 11, 2018
1e53094
Add @Nullable annotation
Jul 27, 2018
129d594
Remove empty line
Jul 27, 2018
a5dea44
Remove unused parameter type
Jul 27, 2018
e7992fa
Improve AggregatorCombiner javadocs
Jul 27, 2018
aa4ff30
Add filterNullValues option at StringLast and StringFirst aggregators
Jul 27, 2018
1d11d46
Add filterNullValues option at agg documentation
Jul 27, 2018
33c944f
Fix checkstyle
Jul 27, 2018
7ffebe9
Merge branch 'master' of github.com:druid-io/druid into feature-first…
Jul 27, 2018
acfbed8
Update header license
Jul 27, 2018
ff56f3e
Fix StringFirstAggregatorFactory.VALUE_COMPARATOR
Jul 30, 2018
7ff6fc3
Fix StringFirstAggregatorCombiner
Jul 30, 2018
b074c5f
Fix if condition at StringFirstAggregateCombiner
Jul 31, 2018
21dfdc1
Remove filterNullValues from string first/last aggregators
Jul 31, 2018
19b55c3
Add isReset flag in FirstAggregatorCombiner
Jul 31, 2018
f674212
Change Arrays.asList to Collections.singletonList
Jul 31, 2018
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
41 changes: 41 additions & 0 deletions docs/content/development/extensions-core/first-last-string.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
---
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file's in the extensions-core folder, but the link to it in extensions.md points at extensions-contrib.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right!

layout: doc_page
---

# First/Last String Module

To use these aggregators, make sure you [include](../../operations/including-extensions.html) the extension in your config file:

```
druid.extensions.loadList=["druid-first-last-string"]
```

## First String aggregator

`firstString` computes the metric value with the minimum timestamp or `null` if no row exist

```json
{
"type" : "firstString",
"name" : <output_name>,
"fieldName" : <metric_name>,
"maxStringBytes" : <integer>
}
```

## Last String aggregator

```json
{
"type" : "lastString",
"name" : <output_name>,
"fieldName" : <metric_name>,
"maxStringBytes" : <integer>
}
```

`lastString` computes the metric value with the maximum timestamp or `null` if no row exist



Note: The default value of `maxStringBytes` is 1024.
1 change: 1 addition & 0 deletions docs/content/development/extensions.md
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,7 @@ All of these community extensions can be downloaded using *pull-deps* with the c
|kafka-emitter|Kafka metrics emitter|[link](../development/extensions-contrib/kafka-emitter.html)|
|druid-thrift-extensions|Support thrift ingestion |[link](../development/extensions-contrib/thrift.html)|
|druid-opentsdb-emitter|OpenTSDB metrics emitter |[link](../development/extensions-contrib/opentsdb-emitter.html)|
|druid-first-last-string|First/Last String Aggregators |[link](../development/extensions-contrib/first-last-string.html)|

## Promoting Community Extension to Core Extension

Expand Down
81 changes: 81 additions & 0 deletions extensions-contrib/first-last-string/pom.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Druid - a distributed column store.
~ Copyright 2012 - 2015 Metamarkets Group Inc.
~
~ Licensed under the Apache License, Version 2.0 (the "License");
~ you may not use this file except in compliance with the License.
~ You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>

<groupId>io.druid.extensions.contrib</groupId>
<artifactId>druid-first-last-string</artifactId>
<name>druid-first-last-string</name>
<description>druid-first-last-string</description>

<parent>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.13.0-SNAPSHOT</version>
<relativePath>../../pom.xml</relativePath>
</parent>

<dependencies>
<dependency>
<groupId>io.druid</groupId>
<artifactId>druid-processing</artifactId>
<version>${project.parent.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>io.druid</groupId>
<artifactId>druid-sql</artifactId>
<version>${project.parent.version}</version>
<scope>provided</scope>
</dependency>

<!-- Tests -->
<dependency>
<groupId>io.druid</groupId>
<artifactId>druid-processing</artifactId>
<version>${project.parent.version}</version>
<scope>test</scope>
<type>test-jar</type>
</dependency>
<dependency>
<groupId>io.druid</groupId>
<artifactId>druid-sql</artifactId>
<version>${project.parent.version}</version>
<type>test-jar</type>
<scope>test</scope>
</dependency>
<dependency>
<groupId>io.druid</groupId>
<artifactId>druid-server</artifactId>
<version>${project.parent.version}</version>
<scope>test</scope>
<type>test-jar</type>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.easymock</groupId>
<artifactId>easymock</artifactId>
<scope>test</scope>
</dependency>
</dependencies>

</project>
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
/*
* Licensed to Metamarkets Group Inc. (Metamarkets) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. Metamarkets licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package io.druid.query.aggregation;

import com.fasterxml.jackson.databind.Module;
import com.fasterxml.jackson.databind.jsontype.NamedType;
import com.fasterxml.jackson.databind.module.SimpleModule;
import com.google.common.collect.ImmutableList;
import com.google.inject.Binder;
import io.druid.initialization.DruidModule;
import io.druid.query.aggregation.first.StringFirstAggregatorFactory;
import io.druid.query.aggregation.last.StringLastAggregatorFactory;
import io.druid.segment.serde.ComplexMetrics;

import java.util.List;

/**
*/
public class FirstLastStringDruidModule implements DruidModule
{
public static final String STRING_LAST = "stringLast";
public static final String STRING_FIRST = "stringFirst";

@Override
public List<? extends Module> getJacksonModules()
{
return ImmutableList.of(
new SimpleModule("FirstLastStringModule").registerSubtypes(
new NamedType(StringLastAggregatorFactory.class, STRING_LAST),
new NamedType(StringFirstAggregatorFactory.class, STRING_FIRST)
)
);
}

@Override
public void configure(Binder binder)
{
if (ComplexMetrics.getSerdeForType("serializablePairLongString") == null) {
ComplexMetrics.registerSerde("serializablePairLongString", new SerializablePairSerde());
}
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,129 @@
/*
* Licensed to Metamarkets Group Inc. (Metamarkets) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. Metamarkets licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package io.druid.query.aggregation;

import io.druid.collections.SerializablePair;
import io.druid.data.input.InputRow;
import io.druid.segment.GenericColumnSerializer;
import io.druid.segment.column.ColumnBuilder;
import io.druid.segment.data.GenericIndexed;
import io.druid.segment.data.ObjectStrategy;
import io.druid.segment.serde.ComplexColumnPartSupplier;
import io.druid.segment.serde.ComplexMetricExtractor;
import io.druid.segment.serde.ComplexMetricSerde;
import io.druid.segment.serde.LargeColumnSupportedComplexColumnSerializer;
import io.druid.segment.writeout.SegmentWriteOutMedium;

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class SerializablePairSerde extends ComplexMetricSerde
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you please add some java doc for this class? It should contain the value types to be serde and where this class is used.

{

public SerializablePairSerde()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary class constructor.

{

}

@Override
public String getTypeName()
{
return "serializablePairLongString";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we define this string as a static variable in somewhere and use it?

}

@Override
public ComplexMetricExtractor getExtractor()
{
return new ComplexMetricExtractor()
{
@Override
public Class<SerializablePair> extractedClass()
{
return SerializablePair.class;
}

@Override
public Object extractValue(InputRow inputRow, String metricName)
{
return inputRow.getRaw(metricName);
}
};
}

@Override
public void deserializeColumn(ByteBuffer buffer, ColumnBuilder columnBuilder)
{
final GenericIndexed column = GenericIndexed.read(buffer, getObjectStrategy(), columnBuilder.getFileMapper());
columnBuilder.setComplexColumn(new ComplexColumnPartSupplier(getTypeName(), column));
}

@Override
public ObjectStrategy getObjectStrategy()
{
return new ObjectStrategy<SerializablePair>()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please specify type parameters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will create a new class SerializablePairLongString because if we specify type parameters later I can't do the SerializablePair<Long, String>.class

{
@Override
public int compare(SerializablePair o1, SerializablePair o2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please specify type parameters.

{
return o1.lhs.equals(o2.lhs) && o1.rhs.equals(o2.rhs) ? 1 : 0;
}

@Override
public Class<? extends SerializablePair> getClazz()
{
return SerializablePair.class;
}

@Override
public SerializablePair<Long, String> fromByteBuffer(ByteBuffer buffer, int numBytes)
{
final ByteBuffer readOnlyBuffer = buffer.asReadOnlyBuffer();

Long lhs = readOnlyBuffer.getLong();
Integer stringSize = readOnlyBuffer.getInt();

byte[] stringBytes = new byte[stringSize];
readOnlyBuffer.get(stringBytes, 0, stringSize);

return new SerializablePair<>(lhs, new String(stringBytes, StandardCharsets.UTF_8));
}

@Override
public byte[] toBytes(SerializablePair val)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please specify type parameters.

{
String rhsString = (String) val.rhs;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't have to cast if type parameters are set.


ByteBuffer bbuf = ByteBuffer.allocate(Long.BYTES + Integer.BYTES + rhsString.length());
bbuf.putLong((Long) val.lhs);
bbuf.putInt(Long.BYTES, rhsString.length());
bbuf.position(Long.BYTES + Integer.BYTES);
bbuf.put(rhsString.getBytes(StandardCharsets.UTF_8));

return bbuf.array();
}
};
}

@Override
public GenericColumnSerializer getSerializer(SegmentWriteOutMedium segmentWriteOutMedium, String column)
{
return LargeColumnSupportedComplexColumnSerializer.create(segmentWriteOutMedium, column, this.getObjectStrategy());
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
/*
* Licensed to Metamarkets Group Inc. (Metamarkets) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. Metamarkets licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package io.druid.query.aggregation.first;

import io.druid.query.aggregation.ObjectAggregateCombiner;
import io.druid.segment.ColumnValueSelector;

import javax.annotation.Nullable;

public class StringFirstAggregateCombiner extends ObjectAggregateCombiner<String>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a unit test for this class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{
String lastString;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be private.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and the name should be firstString.


@Override
public void reset(ColumnValueSelector selector)
{
lastString = (String) selector.getObject();
}

@Override
public void fold(ColumnValueSelector selector)
{
lastString = (String) selector.getObject();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this string keep changing to find the first string?

}

@Nullable
@Override
public String getObject()
{
return lastString;
}

@Override
public Class<String> classOfObject()
{
return String.class;
}
}