feat: enhance datagen for use as a load generator #3230

Merged: 4 commits merged on Aug 22, 2019

@rodesai (Contributor) commented Aug 19, 2019

Resurrecting some ancient enhancements to datagen so that we can use it
to generate load:

  • Add a flag to disable printing each row
  • Add a flag to control the number of threads producing data
  • Add a flag to control the total message rate (msgs/second) across all the
    threads. The rate limiting is implemented using a token bucket.
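
For readers unfamiliar with the technique, a token bucket can be sketched roughly as follows. This is an illustration only, not the TokenBucket class added in this PR: the class name `TokenBucketSketch`, its constructor parameters, and the injectable clock are all assumptions made for the example.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of a token bucket; not the TokenBucket class from this PR.
final class TokenBucketSketch {
  private final double capacity;         // maximum burst size, in tokens
  private final double tokensPerSecond;  // steady-state refill rate
  private final LongSupplier nanoClock;  // injectable clock (an assumption, to ease testing)
  private double tokens;
  private long lastRefillNanos;

  TokenBucketSketch(final double capacity, final double tokensPerSecond,
                    final LongSupplier nanoClock) {
    if (capacity <= 0 || tokensPerSecond <= 0) {
      throw new IllegalArgumentException("capacity and rate must be positive");
    }
    this.capacity = capacity;
    this.tokensPerSecond = tokensPerSecond;
    this.nanoClock = nanoClock;
    this.tokens = capacity; // start with a full bucket
    this.lastRefillNanos = nanoClock.getAsLong();
  }

  // Takes one token if one is available. The method is synchronized so that
  // several producer threads can share a single bucket.
  synchronized boolean tryTake() {
    final long now = nanoClock.getAsLong();
    final double refill = (now - lastRefillNanos) * tokensPerSecond / 1e9;
    tokens = Math.min(capacity, tokens + refill);
    lastRefillNanos = now;
    if (tokens >= 1.0) {
      tokens -= 1.0;
      return true;
    }
    return false;
  }
}
```

In a sketch like this, each producer thread would call `tryTake()` before sending a message and pause briefly when it returns false. Sharing one instance (for example with `System::nanoTime` as the clock) is what caps the combined rate across all threads rather than the rate per thread.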

@rodesai requested a review from a team as a code owner August 19, 2019 18:44

@vcrfxia (Contributor) left a comment

Thanks @rodesai -- very cool features!!

private static int parseNumThreads(final String numThreadsString) {
  try {
    final int result = Integer.valueOf(numThreadsString, 10);
    if (result < 0) {

Contributor

Suggested change
- if (result < 0) {
+ if (result <= 0) {

And similarly for message rate below.
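
One way to apply the suggested check to both flags is a shared helper that rejects zero as well as negative values. The sketch below is an assumption about how that could look, not code from this PR; `PositiveIntArg`, `parsePositiveInt`, and the error messages are hypothetical names chosen for the example.

```java
// Hypothetical helper; not part of the PR under review.
final class PositiveIntArg {
  // Parses a base-10 integer and rejects anything that is not strictly positive.
  static int parsePositiveInt(final String name, final String value) {
    final int result;
    try {
      result = Integer.parseInt(value, 10);
    } catch (final NumberFormatException e) {
      throw new IllegalArgumentException(
          String.format("Invalid value for %s: '%s' is not an integer", name, value), e);
    }
    if (result <= 0) {
      throw new IllegalArgumentException(
          String.format("Invalid value for %s: %d must be greater than zero", name, result));
    }
    return result;
  }
}
```

`Integer.parseInt` returns a primitive `int` directly, whereas `Integer.valueOf` returns a boxed `Integer` that is then unboxed on assignment; the behavior is otherwise the same here.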

@@ -122,6 +157,9 @@ private static void usage() {
private final long maxInterval;
private final String schemaRegistryUrl;
private final InputStream propertiesFile;
private final int numThreads;

Contributor

Let's add these into the usage() string above, so users will know we've added awesome new features :)


for (final Thread t : threads) {
  try {
    t.join();

Contributor

What's the purpose of this call? I'm having trouble making sense of the docs.

Contributor Author

It blocks waiting for the thread to exit. Otherwise, the program will just exit without producing the records that the user asked it to.
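
A minimal sketch of the start-then-join pattern described here, under stated assumptions: `JoinSketch` and `runProducers` are hypothetical names, the counter stands in for actually producing records, and the workers are marked as daemon threads purely so that the example matches the "program would exit early" scenario. Whether DataGen's own threads are daemon threads is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of waiting for worker threads with Thread.join().
final class JoinSketch {
  static long runProducers(final int numThreads, final int recordsPerThread) {
    final AtomicLong produced = new AtomicLong();
    final List<Thread> threads = new ArrayList<>();
    for (int i = 0; i < numThreads; i++) {
      final Thread t = new Thread(() -> {
        for (int r = 0; r < recordsPerThread; r++) {
          produced.incrementAndGet(); // stand-in for producing one record
        }
      });
      t.setDaemon(true); // assumption: daemon threads do not keep the JVM alive
      t.start();
      threads.add(t);
    }
    // Block until every worker has finished; without this, the caller could
    // return (and the JVM could exit) before the work is done.
    for (final Thread t : threads) {
      try {
        t.join();
      } catch (final InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        throw new IllegalStateException("Interrupted while waiting for producers", e);
      }
    }
    return produced.get();
  }
}
```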

Contributor

The only reason producer threads would exit is if they're interrupted, right? So DataGen will always exit with exit code 1? (Not saying there's anything wrong with this, just clarifying my understanding.)

Contributor Author

No, they could also just finish producing the requested number of records.

Contributor

Ah, good call. That means this PR also changes the meaning of iterations from the total number of messages produced to the number of messages produced per thread.
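
To make the arithmetic concrete: with the per-thread meaning, 4 threads and 1000 iterations yield 4000 messages in total. The sketch below shows that calculation, plus one hypothetical way a total could be divided across threads if the original meaning were to be kept. `IterationMath` and both method names are assumptions for illustration; the PR does not contain this code.

```java
// Hypothetical illustration of per-thread versus total iteration counts.
final class IterationMath {
  // Total messages when each thread produces `iterations` messages.
  static long totalProduced(final int numThreads, final long iterations) {
    return Math.multiplyExact((long) numThreads, iterations);
  }

  // Alternative (not what the PR does): split a total across threads so that
  // the per-thread counts add up to exactly `total`.
  static long[] splitTotal(final long total, final int numThreads) {
    final long[] perThread = new long[numThreads];
    for (int i = 0; i < numThreads; i++) {
      // The first (total % numThreads) threads take one extra message.
      perThread[i] = total / numThreads + (i < total % numThreads ? 1 : 0);
    }
    return perThread;
  }
}
```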

@vcrfxia requested a review from a team August 19, 2019 20:44

import java.time.Instant;
import java.util.Objects;

public class TokenBucket {

Contributor

Contributor Author

No good reason. I threw this together without much thought like a year ago to do ksql benchmarking. I agree it's better to just use that.

@vcrfxia (Contributor) left a comment

Woo -- hooray for refactors! LGTM :)

  .put("schemaRegistryUrl", (builder, argVal) -> builder.schemaRegistryUrl = argVal)
  .put("propertiesFile",
-     (builder, argVal) -> builder.propertiesFile = toFileInputStream(argVal))
+     (builder, argVal) -> builder.propertiesFile = toFileInputStream(argVal).get())
  .put("msgRate", (builder, argVal) -> builder.msgRate = parseMsgRate(argVal))

Contributor

Looks like you intended to replace this (and parseNumThreads below) with parseInt?

@rodesai merged commit ddb970b into confluentinc:master on Aug 22, 2019