
Add OffsetsForTimes API to Consumer #235

Merged
merged 16 commits on Jul 25, 2017

Conversation

denisivan0v
Contributor

Closes #228.

@ghost

ghost commented Jun 20, 2017

It looks like @denisivan0v hasn't signed our Contributor License Agreement, yet.

The purpose of a CLA is to ensure that the guardian of a project's outputs has the necessary ownership or grants of rights over all contributions to allow them to distribute under the chosen licence.
Wikipedia

You can read and sign our full Contributor License Agreement here.

Once you've signed reply with [clabot:check] to prove it.

Appreciation of efforts,

clabot

@denisivan0v
Contributor Author

denisivan0v commented Jun 20, 2017

[clabot:check]

@ghost

ghost commented Jun 20, 2017

It looks like @denisivan0v hasn't signed our Contributor License Agreement, yet.

The purpose of a CLA is to ensure that the guardian of a project's outputs has the necessary ownership or grants of rights over all contributions to allow them to distribute under the chosen licence.
Wikipedia

You can read and sign our full Contributor License Agreement here.

Once you've signed reply with [clabot:check] to prove it.

Appreciation of efforts,

clabot

@denisivan0v
Contributor Author

[clabot:check]

@ghost

ghost commented Jun 20, 2017

@confluentinc It looks like @denisivan0v just signed our Contributor License Agreement. 👍

Always at your service,

clabot

@denisivan0v
Contributor Author

@mhowlett Is there any feed where I can get these artifacts https://ci.appveyor.com/project/mhowlett/confluent-kafka-dotnet/build/1.0.53/artifacts ?

@treziac
Contributor

treziac commented Jun 20, 2017

Thanks a lot for the PR!

They're not published online (only release packages are pushed to NuGet), but you can download them and add a local repository as a feed source to test them.
Will review later.

@mhowlett
Contributor

that was quick! thanks!

General comments:

  1. We'll need API docs on public methods
  2. We'll need an integration test to demonstrate it works in some basic way.

API choice:

I wouldn't use IReadOnlyDictionary. If we go with something compatible with dictionaries, I'd use IEnumerable<KeyValuePair< ... >> instead which is less specialized.

All of the following is up for discussion/change, just thinking out loud here:

Currently, no other method on Consumer (eg Position) accepts/returns dictionary compatible types. Use of IEnumerable<KeyValuePair<>> is quite natural, but for consistency with other methods, I'd say we should not use dictionaries for this right now. If we change to dictionaries, that is a future enhancement made across the entire library.

I'd invent a new type TopicPartitionTimestamp, where Timestamp should (definitely) be Confluent.Kafka's Timestamp, not DateTime. Input would be IEnumerable<TopicPartitionTimestamp>.

For return value I'd use IEnumerable<TopicPartitionOffset> (consistent with other methods). We use concrete List<..>s elsewhere, but we should definitely not be returning concrete types, this is a mistake. We should update other methods too, but not necessary in this PR.

I see in the returned collection you're including the Timestamp. Denormalized output is arguably better here, but it's nothing you couldn't easily do with a line of LINQ and also we don't do this anywhere else in the API, so I wouldn't do it here either. Again, discussion on this point encouraged!

Note to self: Read this closely, and review entire API:
https://docs.microsoft.com/en-us/dotnet/standard/design-guidelines/guidelines-for-collections
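Pulling the suggestions above together, the proposed shape might look roughly like this. This is an illustrative sketch only, not final signatures:

```csharp
// Sketch of the API shape discussed above (illustrative, not final).
public struct TopicPartitionTimestamp
{
    public TopicPartitionTimestamp(TopicPartition topicPartition, Timestamp timestamp)
    {
        TopicPartition = topicPartition;
        Timestamp = timestamp;
    }

    public TopicPartition TopicPartition { get; }
    public Timestamp Timestamp { get; }   // Confluent.Kafka's Timestamp, not DateTime
}

// On Consumer - input and output are plain IEnumerables rather than
// dictionary-compatible types, for consistency with the rest of the API:
//
// IEnumerable<TopicPartitionOffset> OffsetsForTimes(
//     IEnumerable<TopicPartitionTimestamp> timestampsToSearch,
//     TimeSpan timeout);
```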

@mhowlett
Contributor

Also, I'm a fan of using TimeSpan for specifying timeouts as I think it makes code more readable. I'd use that instead of millisecondsTimeout.
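For illustration, the difference at the call site (hypothetical call, exact method names aside):

```csharp
// TimeSpan reads naturally at a glance:
consumer.OffsetsForTimes(query, TimeSpan.FromSeconds(10));

// versus a raw milliseconds parameter:
consumer.OffsetsForTimes(query, millisecondsTimeout: 10000);
```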

@denisivan0v
Contributor Author

Thanks for your comments! I've made required changes to unify API and inserted API docs on public methods and types.

As for IEnumerable<TopicPartitionTimestamp> instead of IReadOnlyDictionary<>: it makes it possible to pass duplicate TopicPartitionTimestamp entries, and we'd get an exception during the librdkafka call.

Also, I think it is better to use IReadOnlyCollection<> instead of IEnumerable<>, because we always have a finite sequence of instances in memory.

I'm going to add integration test(s) later today as well.

@mhowlett
Contributor

excellent! looking good at a quick glance - look forward to the integration test :-)

I don't think librdkafka will produce an error for duplicate input (or at least, if I had written it, it wouldn't...).

I think you can feel confident using this branch in your own code (most conveniently via the artifact produced on appveyor). I'm open to changing my mind on our use of our collections but I think for now consistency should win and I'll open a general issue soliciting feedback across the entire API.

FYI: the timeline for getting this merged is as follows:

  1. I'm going on vacation for a few weeks from tomorrow, we'll get it merged to master soon after.
  2. The 0.11.0 release is frozen unfortunately, we've missed that release.
  3. we plan to set up more frequent pushes of preview snapshots to nuget.org, so it'll get in the first of those.

@@ -1005,8 +1005,12 @@ public class Producer<TKey, TValue> : ISerializingProducer<TKey, TValue>, IDispo
ISerializer<TValue> valueSerializer,
bool manualPoll, bool disableDeliveryReports)
{
var configWithoutKeySerializerProperties = KeySerializer.Configure(config, true);
var configWithoutValueSerializerProperties = ValueSerializer.Configure(config, false);
var configWithoutKeySerializerProperties = keySerializer != null
Contributor

Yep, I noticed it this week; it was introduced in #205, but I'm waiting for Matt to come back as there are other things to fix in it. This will be done in a separate PR, I think.

Contributor Author

That's OK if there will be another PR for this, but I need it fixed here because I'm going to use this CI build. I'll resolve conflicts if any arise.

@treziac
Contributor

treziac commented Jun 26, 2017

Still haven't had proper time to review/try it, but I should definitely have some this week ;)

@denisivan0v
Contributor Author

@mhowlett I've added an integration test and basic unit tests.

Also, I've found that most of the existing integration tests don't work (at least in my env).

Moreover, Producer<K,V> still hangs on ProduceAsync if K is Null. I spent several hours trying to fix it, but had no luck. There is no callback from librdkafka in this case.

@denisivan0v
Contributor Author

@treziac Looking forward to your review as well. Thanks!

Contributor

@treziac left a comment

Thanks again!

The major problem for me is the wrong values being returned without an exception, but that's more a librdkafka issue, I think.

You should add an integration test for what happens when using a wrong timestamp (in the future, or too far in the past), and check that we properly throw.

Still have to review and test properly for possible issues.

@@ -713,6 +719,11 @@ private class NativeMethods
int partition, out long low, out long high);

[DllImport(DllName, CallingConvention = CallingConvention.Cdecl)]
internal static extern ErrorCode rd_kafka_offsets_for_times(IntPtr rk,
/* rd_kafka_topic_partition_list_t * */ IntPtr topics,
Contributor

offsets, not topics

Contributor Author

Done

@@ -612,6 +612,31 @@ public WatermarkOffsets QueryWatermarkOffsets(TopicPartition topicPartition)
=> consumer.QueryWatermarkOffsets(topicPartition);

/// <summary>
/// Look up the offsets for the given partitions by timestamp. The returned offset for each partition is the
/// earliest offset whose timestamp is greater than or equal to the given timestamp in the corresponding partition.
///
Contributor

Remove empty line for consistency

Contributor

Also, since you repeat the doc, you may want to use include_docs.xml (Matt used it recently; it seems a good idea to use it whenever doc comments are duplicated).

So you just need:
/// <include file='include_docs.xml' path='API/Member[@name="Consumer_OffsetsForTimes"]/*' />

and to add the proper doc in that file.

Contributor Author

Done

@@ -391,6 +392,11 @@ internal static SafeTopicHandle topic_new(IntPtr rk, string topic, IntPtr conf)
out long low, out long high)
=> _get_watermark_offsets(rk, topic, partition, out low, out high);

private delegate ErrorCode OffsetsForTimes(IntPtr rk, IntPtr topics, IntPtr timeout_ms);
Contributor

offsets, not topics (based on reading rdkafka.h; I may change this comment later)

Contributor Author

Done

internal IEnumerable<TopicPartitionOffset> OffsetsForTimes(IEnumerable<TopicPartitionTimestamp> timestampsToSearch, int millisecondsTimeout)
{
IEnumerable<TopicPartitionOffset> offsets = timestampsToSearch
.Select(t => new TopicPartitionOffset(t.TopicPartition, t.Timestamp.UnixTimestampMs))
Contributor

It seems odd to have TopicPartitionOffset take a long timestamp (though this is required by librdkafka).
Add a short comment explaining why this is done, or perhaps create a new GetCTopicPartitionList(IEnumerable<TopicPartitionTimestamp> offsets) overload - not sure which is best.

Contributor Author

Added a comment with explanation

@@ -30,13 +30,19 @@ public static IEnumerable<object[]> KafkaParameters()
{
if (kafkaParameters == null)
{
var codeBase = typeof(Tests).GetTypeInfo().Assembly.CodeBase;
Contributor

Perhaps in another PR - I never took the time to do it, as I wanted to properly set up AppVeyor for integration tests first.

Contributor Author

Assembly.Location provides the full path of the tests DLL, and that's exactly what is needed here.

From the MSDN article for the CodeBase property:

To get the absolute path to the loaded manifest-containing file, use the Assembly.Location property instead.

I can roll this back and move it to another PR. Should I do that?

var assemblyDirectory = Path.GetDirectoryName(assemblyPath);
var jsonPath = Path.Combine(assemblyDirectory, "kafka.parameters.json");
dynamic json = JsonConvert.DeserializeObject(File.ReadAllText(jsonPath));
kafkaParameters = new List<object[]>() { new object[] { json.bootstrapServers.ToString(), json.topic.ToString(), json.partitionedTopic.ToString() } };
var json = JObject.Parse(File.ReadAllText(jsonPath));
Contributor

Any reason for going from JsonConvert.DeserializeObject to JObject.Parse?
Same thing - this would be better in another PR, along with the previous change.

Contributor Author

With JsonConvert.DeserializeObject we need a strongly typed object (or a dynamic object, as in this case). With JObject.Parse we get a parsed JSON tree as the result and can access the object's properties without creating any other objects. In my opinion, it's better to use JObject here than a dynamic object.

// See librdkafka implementation for details https://github.com/edenhill/librdkafka/blob/master/src/rdkafka.c#L2433
var result = consumer.OffsetsForTimes(
new[] {new TopicPartitionTimestamp(messages[0].TopicPartition, messages[0].Timestamp)},
TimeSpan.FromSeconds(10))
Contributor

I have some issues here; I think they are librdkafka related.
When passing TimeSpan.FromSeconds(1), I don't receive any error, but the offset is not updated - I retrieve the timestamp as the offset. Will try to test more this weekend - I won't have more time for reviewing today.

Contributor Author

That's correct. The current librdkafka implementation just returns from the wait loop when the timeout ends. Unfortunately, this leads to the issue where the offset is not updated.
See https://github.com/edenhill/librdkafka/blob/master/src/rdkafka.c#L2475 for details
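Given that behavior, callers may want to sanity-check results when using short timeouts. A defensive sketch (the `message` variable here is hypothetical, standing in for a previously produced message whose timestamp is being queried):

```csharp
// Defensive sketch: with a short timeout, librdkafka can return from the wait
// loop without resolving offsets, leaving the queried timestamp value sitting
// in the offset field. Use a generous timeout and check the result.
var results = consumer.OffsetsForTimes(
    new[] { new TopicPartitionTimestamp(message.TopicPartition, message.Timestamp) },
    TimeSpan.FromSeconds(10));

foreach (var result in results)
{
    // An "offset" equal to the queried Unix timestamp (ms) is a strong hint
    // that the lookup timed out before being resolved.
    if (result.Offset == message.Timestamp.UnixTimestampMs)
    {
        throw new TimeoutException("offsets_for_times likely timed out before resolving offsets");
    }
}
```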

{
for (int index = 0; index < N; index++)
{
var message = await producer.ProduceAsync(topic, $"test key {index}", $"test val {index}");
Contributor

To avoid overly long tests, prefer using an IDeliveryHandler, which allows you to produce all your messages asynchronously - this is not a blocker; feel free to change it or not.

Contributor Author

I tried using IDeliveryHandler to make the code shorter, but that approach gives me about the same amount of code. I'd like to leave it as is, if you don't mind.

/// <param name="timestamp">
/// A Kafka timestamp value.
/// </param>
public TopicPartitionTimestamp(string topic, int partition, Timestamp timestamp)
Contributor

We may modify Timestamp back to the version that accepted DateTime - if we want to retrieve the offset at a particular date, it's kind of heavy with the current API (new Timestamp(Timestamp.FromDateTime(...))). Or perhaps just add a Timestamp.FromDateTime static method, or an implicit/explicit conversion from DateTime to Timestamp.

Contributor Author

Added Timestamp.FromDateTime method

Contributor

I agree that we want to provide easier conversion from DateTime to Timestamp.

Of the options @treziac notes, I'm definitely against implicit conversion and TimestampType needs to be specified so explicit conversion is out too.

Between the static method and the constructor, I'm currently favoring a constructor because: 1. it's fewer characters; 2. we only ever want to return a Timestamp struct - we don't need the flexibility a static method provides; 3. the static method is slightly incorrectly named - DateTime is not its only parameter; 4. the constructor variant reads more easily to me.

I'm definitely open to being convinced back to using a static method if someone wants to present a good argument, but currently think we should change the FromDateTime method to an additional Timestamp constructor.
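To make the trade-off concrete, the two call sites being compared would read something like this (signatures are hypothetical; TimestampType is shown explicitly since it must be specified either way):

```csharp
// Constructor variant (favored above):
var ts1 = new Timestamp(DateTime.UtcNow, TimestampType.CreateTime);

// Static-method variant:
var ts2 = Timestamp.FromDateTime(DateTime.UtcNow, TimestampType.CreateTime);
```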

Contributor

For reference, some previous conversation related to DateTime/Timestamps is in #120

Contributor Author

Totally agree that a constructor is much better than the static method. Done in a517357

Contributor

@treziac left a comment

Edit: found the error by properly checking the librdkafka code. You need to adapt your code and add tests covering this scenario.


var list = GetTopicPartitionOffsetErrorList(cOffsets);
LibRdKafka.topic_partition_list_destroy(cOffsets);
return list.Select(t => t.TopicPartitionOffset).ToList();
Contributor

You can have errors per partition - the error returned by offsets_for_times is a general error (e.g. broker not available), but each partition may also return its own error. In that case, the offset will still be set to the timestamp. You can reproduce this easily by querying a nonexistent partition.

Two choices: either we return a list of TopicPartitionOffsetError (and the user will have to map it before assigning), or we throw an exception if any member has an error. I prefer the first solution, possibly with an implicit cast from TopicPartitionOffsetError to TopicPartitionOffset which would throw if an error is present.

Contributor Author

Not sure that I can reproduce the per-partition error when querying a nonexistent partition. In my case I got a general error.

But I've added error checks for every partition, and in case of any error OffsetsForTimes throws a KafkaException for the first error found.

Thoughts?

Contributor

I prefer the first solution,

yes, I agree (as noted elsewhere).

possibly with an implicit cast from TopicPartitionOffsetError to TopicPartitionOffset which would throw if an error is present

This is an interesting (good) idea. We could do implicit or explicit casting or have a ToTopicPartitionOffset method. A potential problem is confusion with the TopicPartitionOffset property (which doesn't throw). We could make that throw, but then that would be inconsistent with other properties.

It's intuitive to me that a cast would throw if there was a problem, so I like the idea of using a cast for conversion (though it could be argued the semantics aren't perfectly clear). We want an explicit cast though, as the operation can fail. See:

https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/types/casting-and-type-conversions

I think we should include that in this PR.
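A minimal sketch of what such an explicit conversion might look like, assuming TopicPartitionOffsetError exposes TopicPartition, Offset, and Error properties and that KafkaException accepts an Error (both per the discussion above):

```csharp
// Sketch only: explicit conversion that throws when the result carries an error.
public static explicit operator TopicPartitionOffset(TopicPartitionOffsetError tpoe)
{
    if (tpoe.Error.Code != ErrorCode.NoError)
    {
        // The conversion can fail, which is why the cast is explicit, not implicit.
        throw new KafkaException(tpoe.Error);
    }
    return new TopicPartitionOffset(tpoe.TopicPartition, tpoe.Offset);
}
```

A caller who wants the throwing behavior can then write `(TopicPartitionOffset)result`, while callers who prefer to inspect `Error` themselves remain free to do so.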

Contributor Author

Personally, I'm not a big fan of cast operations that can throw. The client can test the Error property before getting a TopicPartitionOffset, without a cast operation or a try/catch block.

However, it was easy to add an explicit cast operator and a unit test for it, so I did. Thanks!

@denisivan0v
Contributor Author

@treziac many thanks for your very detailed review. Please check the changes I've made.

I started to use the CI build in my project and will report other issues if any.

}

var list = GetTopicPartitionOffsetErrorList(cOffsets);
var error = list.Where(x => x.Error.Code != ErrorCode.NoError).Select(x => x.Error).FirstOrDefault();
Contributor

When I suggested making a TopicPartitionTimestamp type, I hadn't realized that offsets_for_times could return per-partition errors. Sorry! We should not have the TopicPartitionTimestamp type; replace it with a TopicPartitionTimestampError type and don't throw an exception when a partition-specific error occurs.

Contributor

Oh sorry, we return TopicPartitionOffset, not TopicPartitionTimestamp. So keep the TopicPartitionTimestamp type, but return TopicPartitionOffsetError instead of TopicPartitionOffset (and get rid of the check for per-partition errors here).

Contributor Author

Done in a517357, thanks!

@@ -16,4 +16,8 @@
<PackageReference Include="xunit.runner.visualstudio" Version="2.2.0" />
</ItemGroup>

<ItemGroup>
Contributor

I don't think we want this ItemGroup.

Contributor

Visual Studio keeps adding it; I think we could include it. I didn't check whether they changed this in 2.0.

Contributor

I think we can do this https://stackoverflow.com/a/33414270

Contributor Author

Done in a517357

@mhowlett
Contributor

thanks for the latest commits @denisivan0v - this is looking good to me now. Will review again tomorrow and probably merge :-)

@denisivan0v
Contributor Author

Many thanks for your changes and the upstream merge, @mhowlett. Looking forward to any updates here.

@mhowlett mhowlett merged commit 3b7bde7 into confluentinc:master Jul 25, 2017