Skip to content

Conversation

@frankgh
Copy link
Contributor

@frankgh frankgh commented Jun 22, 2022

entire_sstable_stream_throughput_outbound and entire_sstable_inter_dc_stream_throughput_outbound were
introduced in CASSANDRA-17065. They were added to the LoaderOptions class but they are not being used in
the BulkLoader class. In this commit, we make use of the entire_sstable streaming options in BulkLoader.
In addittion, the throttle and interDcThrottle are being passed as MB/s but they should be MiB/s.
This commit fixes the verbiage as well as setting the correct value for those properties in BulkLoader.

@frankgh frankgh changed the base branch from trunk to cassandra-4.1 June 22, 2022 19:45
Copy link
Contributor

@ekaterinadimitrova2 ekaterinadimitrova2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good and thank you for adding all those tests. I found only one issue that is easy to miss if you haven't before.
Precision problem when we set the old deprecated throttle in megabits. More info in the comments
Also, I almost forgot we need to add News.txt entry probably under Deprecation

File config = new File(Paths.get(".", "test", "conf", "cassandra.yaml").normalize());
String[] args = { "--throttle", "200", "--inter-dc-throttle", "400", "-d", "127.9.9.1", "-f", config.absolutePath(), sstableDirName("legacy_sstables", "legacy_ma_simple") };
LoaderOptions options = LoaderOptions.builder().parseArgs(args).build();
// converts from megabytes to mebibytes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// converts from megabytes to mebibytes
// converts from megabits to mebibytes

File config = new File(Paths.get(".", "test", "conf", "cassandra.yaml").normalize());
String[] args = { "-tmib", "200", "-idctmib", "400", "-d", "127.9.9.1", "-f", config.absolutePath(), sstableDirName("legacy_sstables", "legacy_ma_simple") };
LoaderOptions options = LoaderOptions.builder().parseArgs(args).build();
// converts from megabytes to mebibytes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// converts from megabytes to mebibytes
// converts from megabits to mebibytes

File config = new File(Paths.get(".", "test", "conf", "cassandra.yaml").normalize());
String[] args = { "--throttle-mib", "200", "--inter-dc-throttle-mib", "400", "-d", "127.9.9.1", "-f", config.absolutePath(), sstableDirName("legacy_sstables", "legacy_ma_simple") };
LoaderOptions options = LoaderOptions.builder().parseArgs(args).build();
// converts from megabytes to mebibytes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// converts from megabytes to mebibytes
// converts from megabits to mebibytes

File config = new File(Paths.get(".", "test", "conf", "cassandra.yaml").normalize());
String[] args = { "-t", "200", "-idct", "400", "-d", "127.9.9.1", "-f", config.absolutePath(), sstableDirName("legacy_sstables", "legacy_ma_simple") };
LoaderOptions options = LoaderOptions.builder().parseArgs(args).build();
// converts from megabytes to mebibytes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// converts from megabytes to mebibytes
// converts from megabits to mebibytes

Comment on lines 511 to 581
if (cmd.hasOption(THROTTLE_MEGABITS) && cmd.hasOption(THROTTLE_MEBIBYTES))
{
errorMsg(String.format("Both '%s' and '%s' were provided. Please only provide one of the two options", THROTTLE_MEGABITS, THROTTLE_MEBIBYTES), options);
}

if (cmd.hasOption(INTER_DC_THROTTLE_MEGABITS) && cmd.hasOption(INTER_DC_THROTTLE_MEBIBYTES))
{
errorMsg(String.format("Both '%s' and '%s' were provided. Please only provide one of the two options", INTER_DC_THROTTLE_MEGABITS, INTER_DC_THROTTLE_MEBIBYTES), options);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! :-)

if (cmd.hasOption(THROTTLE_MEGABITS))
{
throttleMebibytes = DataRateSpec.IntMebibytesPerSecondBound.megabitsPerSecondInMebibytesPerSecond(Integer.parseInt(cmd.getOptionValue(THROTTLE_MEGABITS)))
.toMebibytesPerSecondAsInt();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this part has an issue and breaks the compatibility. The conversions between megabits and mebibytes are double numbers internally. Something very annoying to keep backward compatibility, etc.
So the issue with this approach here is that someone will set megabits but by using .toMebibytesPerSecondAsInt() to set the throttling we actually round the number. This can be caught if you try to add --throttle being 5megabits in testBulkLoader_WithArgs6. Then if you do assertEquals(5, DatabaseDescriptor.getStreamThroughputOutboundMegabitsPerSec()); you actually get 8.
As it is a tool and the difference doesn't seem super big might be ok.
The bad part is that if someone tries super small number just for testing or something, they will actually get 0 and unthrottle... That is how I identified that annoying issue when we worked on the new config framework and we switched to double in DataRateSpec to be precise as the RateLimiter gets double anyway. At the same time, of course, we do not allow the users to input double numbers, just int. The SetGetStreamThroughputTest tests small numbers but we didn't have this for the tool and it is very tiny detail easy to miss. :-(
I am thinking maybe we can introduce double variable to ensure precision and no surprises to people who continue using the megabits version? WDYT?
In general probably we can just document for people this behavior and leave it as is I guess but if we want to keep the compatibility and be precise everywhere, we will need to introduce double. I guess this shouldn't be hard so at this point of the release I guess I would prefer just to work our the precision to be correct.

CC @yifan-c

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me think about this. I have some ideas, but one of the options is to use a double type instead of int. Another idea is to keep two variables, one for mibs and one for MiBs. I will probably end up going with the solution that produces the more accurate result in the tests.

Copy link
Contributor

@ekaterinadimitrova2 ekaterinadimitrova2 Jun 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as the user sees only ints and input matches output - I am good :-) Probably both versions should work

{
LoaderOptions options = LoaderOptions.builder().build();
assertEquals(0, options.throttleMebibytes);
assertEquals(0, options.interDcThrottleMebibytes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: shall we add assertions also for the other two?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have another test for entire sstable default settings testEntireSSTableDefaultSettings. Do we want to combine them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed that one it seems. Maybe we can just combine them, it's up to you.

// converts from megabytes to mebibytes
assertEquals(24, options.throttleMebibytes);
assertEquals(48, options.interDcThrottleMebibytes);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After we fix the precision issue, I think it will be correct to check the megabits here

LoaderOptions options = LoaderOptions.builder().parseArgs(args).build();
// converts from megabytes to mebibytes
assertEquals(24, options.throttleMebibytes);
assertEquals(48, options.interDcThrottleMebibytes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After we fix the precision issue, I think it will be correct to check the megabits here

@@ -296,7 +296,7 @@ public IntMebibytesPerSecondBound(long mebibytesPerSecond)
// which were in megabits per second in 4.0. Do not start using it for any new properties
public static IntMebibytesPerSecondBound megabitsPerSecondInMebibytesPerSecond(long megabitsPerSecond)
{
final double MEBIBYTES_PER_MEGABIT = 0.119209289550781;
final double MEBIBYTES_PER_MEGABIT = 15625D / 131072D;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made these conversions more explicit, let me know if this is okay @ekaterinadimitrova2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is hard to see the reasoning behind the equations.

How about define the constants for MEGABIT, MEBIBYTES, etc. in bits and derive the conversion from the constants? For example,

static final double MEBIBYTES_IN_BITS = Math.pow(2, 20) * 8;
static final double MEGABIT_IN_BITS = Math.pow(10, 6); 
static final double MEBIBYTES_PER_MEGABIT = MEGABIT_IN_BITS / MEBIBYTES_IN_BITS;
... more for the other size units...

Copy link
Contributor Author

@frankgh frankgh Jun 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, I've updated the PR. The only difference is that I've defined MEBIBYTES_IN_BITS as:

static final double MEBIBYTES_IN_BITS = Math.pow(2, 23);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering what is the benefit from introducing these calculations? We can also add just a comment, at the same time the name already states - MEBIBYTES/MEGABIT.

Copy link
Contributor

@yifan-c yifan-c Jun 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is only to make the reasoning more clear since the procedure is coded. Adding a comment with the calculations also works. Meanwhile a few constants should not introduce any overkill. I think either way is OK.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, while you are right I prefer we leave it in the comments and don't do any more changes in this area

@@ -395,7 +395,7 @@ public double convert(double source, DataRateUnit sourceUnit)
};

static final double MAX = Long.MAX_VALUE;
static final double MEGABITS_PER_MEBIBYTE = 8.388608;
static final double MEGABITS_PER_MEBIBYTE = 131_072D / 15_625D;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another explicit conversion

@@ -38,6 +38,10 @@
* The Regexp used to parse the rate provided as String in cassandra.yaml.
*/
private static final Pattern UNITS_PATTERN = Pattern.compile("^(\\d+)(MiB/s|KiB/s|B/s)$");
static final double MEBIBYTES_IN_BITS = Math.pow(2, 23);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static final double MEBIBYTES_IN_BITS = Math.pow(2, 23);
static final double MEBIBYTE_IN_BITS = Math.pow(2, 23);

return conf.stream_throughput_outbound.toMebibytesPerSecondAsInt();
}

public static void setStreamThroughputOutboundMegabitsPerSec(long value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think value should stay int

return conf.inter_dc_stream_throughput_outbound.toMebibytesPerSecondAsInt();
}

public static void setInterDCStreamThroughputOutboundMegabitsPerSec(long value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think value should stay int

Comment on lines 76 to 77
DatabaseDescriptor.setStreamThroughputOutboundMegabitsPerSec(Double.valueOf(options.throttleMegabits).longValue());
DatabaseDescriptor.setInterDCStreamThroughputOutboundMegabitsPerSec(Double.valueOf(options.interDcThrottleMegabits).longValue());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we just leave the option in double? it is later casted to double anyway in megabitsPerSecondInMebibytesPerSecond?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DatabaseDescriptor.setStreamThroughputOutboundMegabitsPerSec and DatabaseDescriptor.setInterDCStreamThroughputOutboundMegabitsPerSec take int as a param. So we need to cast it. In your previous comment you mentioned that in public static void setStreamThroughputOutboundMegabitsPerSec(int value) value should stay as int

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, that one is called directly in nodetool and we should keep it int as otherwise we will introduce a bug by extending to long or double.

@@ -224,27 +238,27 @@ public Builder authProvider(AuthProvider authProvider)
return this;
}

public Builder throttle(int throttle)
public Builder throttleMegabits(int throttleMegabits)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering whether we should also add builder methods for the mebibytes version we added?
Also, considering all latest discussions around tools, are we allowed change the names of these Builder methods? Maybe better to deprecate the old name and point to the new ones?
Something like:

public Builder throttleMegabits(int throttleMegabits)
        {
            this.throttleMegabits = throttleMegabits;
            return this;
        }

        
        @Deprecated 
        public Builder throttle(int throttle)
        {
            return throttleMegabits(throttle);
        }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am kind of worried about the similarity of the names... It could be prone to oversights.
For example, code auto-completion may put throttleMegabits, while the actual intent is to have throttleMebibytes.
How about not introducing a new method with the similar name, but adding java docs to indicate the input throttle (better rename it to throttleMegabits) is in the megabits, and the method is deprecated.
@ekaterinadimitrova2 , what do you think?

Copy link
Contributor

@ekaterinadimitrova2 ekaterinadimitrova2 Aug 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh so you suggest we remove the throttleMegabits and end up only with the old deprecated throttle and the new throttleMebibytes
That also should work. Honestly, people again can make the same mistake and ignore throttle was in megabits but I guess less methods is better. Also, if we decide to get rid of the megabits version for those properties on a major version.

better rename it to throttleMegabits --> I think any renaming here will be breaking change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better rename it to throttleMegabits

I was not clear on what is "it". I mean to say that it is better to just rename the input parameter from throttle to throttleMegabits. Instead, we have

        @Deprecated 
        public Builder throttle(int throttleMegabits)
        {
           ...
        }


if (cmd.hasOption(THROTTLE_MEBIBYTES))
{
throttleMegabits = DataRateSpec.DataRateUnit.MEBIBYTES_PER_SECOND.toMegabitsPerSecond(Integer.parseInt(cmd.getOptionValue(THROTTLE_MEBIBYTES)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe new can add static import:
import static org.apache.cassandra.config.DataRateSpec.DataRateUnit.MEBIBYTES_PER_SECOND;

Copy link
Contributor

@ekaterinadimitrova2 ekaterinadimitrova2 Jun 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throttleMegabits ---> Internally we save in mebibytes in DataRateSpec and megabits is used just on the parsing level, backward compatibility. So I feel we need to do this the other way around. - store in throttleMebibytes and if someone wants to use the megabits, that is converted to mebibytes. Otherwise we will do more calculations - the user gives mebibytes, we make it megabits here, but then we use those megabits to set it internally in mebibytes again in DataRateSpec

{
interDcThrottle = Integer.parseInt(cmd.getOptionValue(INTER_DC_THROTTLE_MBITS));
interDcThrottleMegabits = DataRateSpec.DataRateUnit.MEBIBYTES_PER_SECOND.toMegabitsPerSecond(Integer.parseInt(cmd.getOptionValue(INTER_DC_THROTTLE_MEBIBYTES)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as for throttleMegabits

@@ -61,11 +61,11 @@ public static void setupClass()
public void testUpdateStreamThroughput()
{
// Initialized value check
assertEquals(defaultStreamThroughputMebibytesPerSec * BYTES_PER_MEBIBYTE, StreamRateLimiter.getRateLimiterRateInBytes(), 0);
assertEquals(defaultStreamThroughputMebibytesPerSec * BYTES_PER_MEBIBYTE, StreamRateLimiter.getRateLimiterRateInBytes(), 0.01);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I believe these changes can be skipped and we can be consistent if we change in the test class the constant MEBIBYTES_PER_MEGABIT to (same as DataRateSpec)

private static final double MEBIBYTES_IN_BITS = Math.pow(2, 23);
private static final double MEGABIT_IN_BITS = Math.pow(10, 6);
private static final double MEBIBYTES_PER_MEGABIT = MEGABIT_IN_BITS / MEBIBYTES_IN_BITS;

@@ -296,7 +296,7 @@ public IntMebibytesPerSecondBound(long mebibytesPerSecond)
// which were in megabits per second in 4.0. Do not start using it for any new properties
public static IntMebibytesPerSecondBound megabitsPerSecondInMebibytesPerSecond(long megabitsPerSecond)
{
final double MEBIBYTES_PER_MEGABIT = 0.119209289550781;
final double MEBIBYTES_PER_MEGABIT = 15625D / 131072D;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering what is the benefit from introducing these calculations? We can also add just a comment, at the same time the name already states - MEBIBYTES/MEGABIT.

Comment on lines 92 to 93
assertEquals(0, options.throttleMegabits, 0.01);
assertEquals(0, options.interDcThrottleMegabits, 0.01);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say 0 should be exactly 0, we should never deviate as it has special meaning - unthrottle.
I think it is safe to leave the delta being 0 and be sure this never regress

"-d",
"127.9.9.1:9041",
"--throttle",
"10",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to set something crazy like 3 which is less than 0.5 in mebibytes to ensure no one ever makes a change this to be rounded to 0 and unthrottle. Highly unlikely, but someone might be testing something, I like correctness :-)

Copy link
Contributor

@ekaterinadimitrova2 ekaterinadimitrova2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think I am +1, I left just a nit and we need to update also NEWS.txt for the new flags and methods and mark as deprecated the old ones. I have only one comment that I will leave to @yifan-c to express final preference in the BulkLoader class, line 76-77

public static final String ENTIRE_SSTABLE_THROTTLE_MBITS = "entire-sstable-throttle";
public static final String ENTIRE_SSTABLE_INTER_DC_THROTTLE_MBITS = "entire-sstable-inter-dc-throttle";
@Deprecated
public static final String THROTTLE_MEGABITS = "throttle";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My OCD around the tools is on alert these days and I was wondering about changing this public static final String names but I guess people should be using the actual string, not the constants? @yifan-c ? Should we deprecate them with the old names?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call!
Cassandra is not provided as a library, but people do use it for the purpose. Compatibility is best effort. Those constants can be perfectly as "private", but declared as "public". See https://issues.apache.org/jira/browse/CASSANDRA-10637
I think the changes of the constants names are not strictly required. We can deprecate the original names and add a comment saying "MBITS" are "MEGAMITS".
WDYT? @frankgh

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's best to keep the original name, since Cassandra could sometimes be used as a library. I will add the comment and I will retain the @Deprecated annotation as well.

Copy link
Contributor

@ekaterinadimitrova2 ekaterinadimitrova2 Jul 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sharing that ticket @yifan-c , I think we can even add it to the comments for other people like us, tempted to make changes here :-)

public final int entireSSTableThrottle;
public final int entireSSTableInterDcThrottle;
public final double throttleMegabits;
public final double interDcThrottleMegabits;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had for a moment a concern whether we change how and where we demand only ints from the users for those two but testing in LoaderOptionsTest with double numbers seems it triggers the same NumberFormatException for all 4 of them so it seems we are good

Comment on lines 104 to 105
assertEquals(200, options.throttleMegabits, 0.01);
assertEquals(400, options.interDcThrottleMegabits, 0.01);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assertEquals(200, options.throttleMegabits, 0.01);
assertEquals(400, options.interDcThrottleMegabits, 0.01);
assertEquals(200, options.throttleMegabits, 0);
assertEquals(400, options.interDcThrottleMegabits, 0);

Comment on lines 116 to 117
assertEquals(200, options.throttleMegabits, 0.01);
assertEquals(400, options.interDcThrottleMegabits, 0.01);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assertEquals(200, options.throttleMegabits, 0.01);
assertEquals(400, options.interDcThrottleMegabits, 0.01);
assertEquals(200, options.throttleMegabits, 0);
assertEquals(400, options.interDcThrottleMegabits, 0);

DatabaseDescriptor.setInterDCStreamThroughputOutboundMegabitsPerSec(options.interDcThrottle);
StreamResultFuture future = null;
DatabaseDescriptor.setStreamThroughputOutboundMegabitsPerSec(Double.valueOf(options.throttleMegabits).intValue());
DatabaseDescriptor.setInterDCStreamThroughputOutboundMegabitsPerSec(Double.valueOf(options.interDcThrottleMegabits).intValue());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it is not a big deal but I wanted to mention again that if we deprecate the megabits options and we go for mebibytes internally in DataRateSpec, probably better to add a setter in mebibytes and use mebibytes and do only one calculation megabits to mebibytes in the parking area for legacy reason in case someone wants to use the old ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added two setters setStreamThroughputOutboundMebibytesPerSec, setInterDCStreamThroughputOutboundMebibytesPerSec. However, I've decided to keep throttle and interDcThrottle in megabits, because the conversion from mebibytes to megabits produces a larger number, and it is less prone to loss of precision.

For example, if we have 0.1 and 0.3 mebibytes, the rounded conversion to megabits will be 1, and 3 megabits respectively. Because we cast these numbers to int in the setter, we get more precision in megabits. If we go the other way around, we lose precision, 3 and 1 both end up as 0 mebibytes.

Copy link
Contributor

@ekaterinadimitrova2 ekaterinadimitrova2 Jul 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering you kept throttleMegabits and interDcThrottleMegabits being double, I guess this comment is outdated.
I spent some time thinking and I think we should just make internally those 4 properties to be operated in bytes and change also throttleMegabits and interDcThrottleMegabits to being bytes. This will save a lot of calculations and headache.
If we step back and look at the StreaManager.StreamRateLimiter, we will see that the rate limiter expects bytes.
Why this was actually not done when we were switching to the new types - because the whole effort was huge and switching internally parameters to smallest possible unit was left for follow up improvement tickets. But in this case I think this is at this point more like a bug considering how many twists and calculations the bulk loader will trigger. Also, division is more expensive operation so it is good if we can get rid of it. WDYT? CC @yifan-c

@frankgh
Copy link
Contributor Author

frankgh commented Jul 6, 2022

Thanks for the review @ekaterinadimitrova2 and @yifan-c . I think this PR is in a good place for merging.

…ireSSTableInterDcThrottle

`entire_sstable_stream_throughput_outbound` and `entire_sstable_inter_dc_stream_throughput_outbound` were
introduced in CASSANDRA-17065. They were added to the LoaderOptions class but they are not being used in
the BulkLoader class. In this commit, we make use of the entire_sstable streaming options in BulkLoader.
In addittion, the `throttle` and `interDcThrottle` are being passed as MB/s but they should be MiB/s.
This commit fixes the verbiage as well as setting the correct value for those properties in BulkLoader.
@frankgh frankgh force-pushed the CASSANDRA-17677-4.1 branch from 76cda2b to b39abf4 Compare August 14, 2022 00:35
Copy link
Contributor

@ekaterinadimitrova2 ekaterinadimitrova2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left only one comment about the description, otherwise I am +1. we need only NEWS.txt (deprecation section and to explain we have changed the new flags for entire-stable an this is not compatible with alpha1 version)

options.addOption("idct", INTER_DC_THROTTLE_MBITS, "inter-dc-throttle", "inter-datacenter throttle speed in Mbits (default unlimited)");
options.addOption("e", ENTIRE_SSTABLE_THROTTLE_MBITS, "entire-sstable-throttle", "entire SSTable throttle speed in Mbits (default unlimited)");
options.addOption("eidct", ENTIRE_SSTABLE_INTER_DC_THROTTLE_MBITS, "entire-sstable-inter-dc-throttle", "entire SSTable inter-datacenter throttle speed in Mbits (default unlimited)");
options.addOption("t", THROTTLE_MBITS, "throttle", "throttle speed in Mbps (default 0 for unlimited), this option is deprecated, use \"throttle-mib\" instead");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use tmib, while throttle-mib explains it all, the actual option is different

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for the inter-dc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed offline and decided to keep the long name, and additionally, we decided to remove the short names that we're going to be introduced in this PR. I have updated the code, documentation, and tests to reflect the removal of the short names

-idct,--inter-dc-throttle <inter-dc-throttle> Inter-datacenter throttle speed in Mbits (default unlimited)
-idct,--inter-dc-throttle <inter-dc-throttle> (deprecated) Inter-datacenter throttle speed in Mbits (default 0 for unlimited)
Use -idctmib / --inter-dc-throttle-mib instead
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove -idctmib

-t,--throttle <throttle> Throttle
speed in Mbits (default unlimited)
-t,--throttle <throttle> (deprecated) Throttle speed in Mbits (default 0 for unlimited)
Use -tmib / --throttle-mib instead
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove -tmib

import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;
import java.net.*;
import java.net.InetAddress;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not against the changes. Those are good and more in line with the code style, but unrelated changes of the patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should I revert these changes?

@@ -224,27 +238,27 @@ public Builder authProvider(AuthProvider authProvider)
return this;
}

public Builder throttle(int throttle)
public Builder throttleMegabits(int throttleMegabits)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am kind of worried about the similarity of the names... It could be prone to oversights.
For example, code auto-completion may put throttleMegabits, while the actual intent is to have throttleMebibytes.
How about not introducing a new method with the similar name, but adding java docs to indicate the input throttle (better rename it to throttleMegabits) is in the megabits, and the method is deprecated.
@ekaterinadimitrova2 , what do you think?

@frankgh frankgh deleted the CASSANDRA-17677-4.1 branch January 31, 2024 01:28
ekaterinadimitrova2 pushed a commit that referenced this pull request Jun 9, 2025
### What is the issue
This is a follow up test for
riptano/cndb#13696.

### What does this PR fix and why was it fixed
The fix was already merged, but this confirms that without the fix,
collections are affected by the NoSuchElementException bug.
michaelsembwever pushed a commit to thelastpickle/cassandra that referenced this pull request Jan 7, 2026
### What is the issue
This is a follow up test for
riptano/cndb#13696.

### What does this PR fix and why was it fixed
The fix was already merged, but this confirms that without the fix,
collections are affected by the NoSuchElementException bug.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants