
Price taker model for DISPATCHES, Rehashed #1358

Open · wants to merge 100 commits into base: main

Conversation


@djlaky commented Feb 29, 2024

Fixes

Compared to #1201, the mathematical form of the operational constraints was corrected, unnecessary functions were removed or merged, and additional user flexibility was added for constructing cost objectives.

Summary/Motivation:

Resurrecting #1201 to finish the price-taker framework in accordance with project milestones.

The framework allows the user to construct price-taker models for design and/or operational optimization considering time-varying market price data.

Legal Acknowledgement

By contributing to this software project, I agree to the following terms and conditions for my contribution:

  1. I agree my contributions are submitted under the license terms described in the LICENSE.txt file at the top level of this directory.
  2. I represent I am authorized to make the contributions and grant the license. If my employer has rights to intellectual property that includes these contributions, I represent that I have received permission to make contributions and grant the required license on behalf of that employer.

radhakrishnatg and others added 30 commits May 25, 2023 21:46
… functions to add constraints through pyomo blocks
setup.py Outdated
@@ -97,6 +97,8 @@ class ExtraDependencies:
    ]
    grid = [
        "gridx-prescient>=2.2.1",  # idaes.tests.prescient
        "scikit-learn",
Member

Just to check, what are these dependencies required for and are they absolutely necessary?

Author
@djlaky May 6, 2024

They're used for KMeans clustering to reduce a large set of days (~365 days) to a smaller number (either an "optimal" number of clusters found by the elbow method, or a user-specified value):

for k in k_values:
    kmeans = KMeans(n_clusters=k).fit(daily_data.transpose())
    inertia_values.append(kmeans.inertia_)

# Identify the "elbow point"
elbow_point = KneeLocator(
    k_values, inertia_values, curve="convex", direction="decreasing"
)
n_clusters = elbow_point.knee

and:

kmeans = KMeans(n_clusters=n_clusters).fit(daily_data.transpose())

@radhakrishnatg, can you comment on whether these dependencies (kneed, scikit-learn) are necessary?

@blnicho requested a review from esrawli May 2, 2024 18:39
Param,
)

from idaes.apps.grid_integration import MultiPeriodModel
Member

This is not something for this PR, but one thing I have wanted to comment on for a while is to make sure that the DISPATCHES team is aware that IDAES steady-state flowsheets support time indexing.

m.fs = FlowsheetBlock(dynamic=False, time_set=[0, 1, 2, ...])

I do not think this tool has been using this capability, and it might help simplify things (or conversely make things more complicated). However, this would allow you to e.g. initialize all states (StateBlocks) in the multi-period model in one go using the existing initialization frameworks.

Author

We will note this for our continued development, thank you.

Contributor

@andrewlee94 Thank you for the note! I was not aware of time indexing. In DISPATCHES, we create one instance of the flowsheet, initialize it, and clone it. Also, in most cases, the models tend to be linear/quadratic, so initialization is not needed. If we want to use the time index, then I believe we have to make a lot of changes to the MultiPeriodModel class. Maybe for another PR.

Member

Definitely another PR.

For reference, state block initialization will handle the indexing just fine and will initialize all time points in one go for you. @Robbybp also has some dynamic initialization tools that copy results from one state to another that might work here as well.

Member

Yes, I have some tools in pyomo-mpc for working with dynamic models. These should be useful for working with time-indexed steady-state flowsheets, as this is very close to the use case they were designed for. To my understanding, MultiPeriodModel and the related tooling in Dispatches seem to be solving a similar problem from a different angle.

f"Could not find elbow point for given kmin, kmax. Consider increasing the range of kmin, kmax."
)

print(f"Optimal # of clusters is: {n_clusters}")
Member

We generally discourage print statements inside code; using a logger is generally better as it gives control over where the output goes.
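For reference, a minimal sketch of the logger-based alternative, using the standard-library logging module this file already imports (swapping in the IDAES logger, as suggested further below, would follow the same pattern):

import logging

_logger = logging.getLogger(__name__)

n_clusters = 11  # stand-in for the value computed by get_optimal_n_clusters

# Instead of: print(f"Optimal # of clusters is: {n_clusters}")
_logger.info("Optimal number of clusters is: %s", n_clusters)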

)

if plot == True:
    plt.plot(k_values, inertia_values)
Member

Should this be moved into a separate method so that it can be called by the user separately?

Member

Looking more closely, I would recommend breaking this into a separate method. This will make testing easier as well.
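A rough sketch of what that separate method could look like; the name plot_inertia_curve and its signature are illustrative, not part of the PR:

import matplotlib.pyplot as plt


def plot_inertia_curve(k_values, inertia_values, n_clusters=None):
    """Plot the k-means inertia against the number of clusters."""
    plt.plot(k_values, inertia_values)
    plt.xlabel("Number of clusters (k)")
    plt.ylabel("Inertia")
    if n_clusters is not None:
        # Mark the elbow point selected by get_optimal_n_clusters
        plt.axvline(x=n_clusters, color="red", linestyle="--")
    plt.show()

get_optimal_n_clusters would then only compute k_values and inertia_values and call this helper when plot=True, and the helper could be tested (or mocked) on its own.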

test_fig = plt.gcf()
plt.close("all")

assert test_fig is not None
Member

Does this really test much of the actual code? It runs the plotting code, but all this assert does is check that a figure object exists, which doesn't really test the plot. Note that testing plotting code is generally very hard - we generally only test the data used to generate the plot and leave the actual plotting untested.

Author

You're correct, this just tests whether the plot was successfully generated; it does not check whether the "correct" plot or "correct" data was displayed.

I'm inclined to remove the plotting test. Please see my other review comment for further information on the optimal clusters code snippet (link).


return daily_data

def get_optimal_n_clusters(
Author

It would appear that changing the platform where the code is run or changing the version of scikit-learn can produce a different "optimal" number of clusters, which changes the result of this plot as well.

That's why I made the test a range instead of a single value in another test:

def test_determine_optimal_num_clusters(excel_data):
    # Added a range for optimal cluster values based on how the
    # plot appears visually. Test can be removed in the future if
    # failure occurs. This may depend on scikit-learn and kneed and
    # the interaction thereof.
    # Older versions get n_clusters = 10, newer versions n_clusters = 11
    m = PriceTakerModel()
    daily_data = m.generate_daily_data(excel_data["BaseCaseTax"])
    n_clusters, inertia_values = m.get_optimal_n_clusters(daily_data)
    assert 9 <= n_clusters <= 15

Ultimately, whether the above test stays hinges on whether the finding of "optimal" clusters stays. I'm also inclined to keep the optimal clustering functionality.

@radhakrishnatg, can you please give us some guidance?

Member

This would imply that there should be bounds on the version of sklearn we support to ensure expected behaviour (or at least for testing purposes).
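Purely as an illustration, such a bound in the optional-dependency list of setup.py might look like the following; the version numbers are hypothetical and would need to be chosen from whichever scikit-learn releases reproduce the expected elbow point:

    grid = [
        "gridx-prescient>=2.2.1",  # idaes.tests.prescient
        "scikit-learn>=1.2,<1.6",  # hypothetical bounds for reproducible clustering
        "kneed",
    ]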

djlaky added 5 commits May 8, 2024 10:10
Using lighter weight parent class for design and operation model blocks.
Made sklearn and kneed optional imports to perform clustering. Logger messages and import errors added for when functions are called that use these packages. Updated setup.py to reflect that these are no longer new dependencies.
Updated tests for optional imports. Moved plotting test to within a separate function.

Ran black on all code.
Split tests so that optional import tests are now all separated from tests that do not use optional imports.
labels = kmeans.labels_

# Set any centroid values that are < 1e-4 to 0 to avoid noise
centroids = centroids * (centroids >= 1e-4)
Contributor

@djlaky Not sure what this is trying to do. From what I understand, you are trying to set LMP values that are less than 1e-4 to zero to avoid numerical issues. This is good, and we need it. But the condition cannot be value < 1e-4: LMP values can also be negative, and this would set those LMPs to zero as well. We need to set LMPs to zero only if their absolute value is < 1e-4.
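A minimal sketch of the suggested fix, assuming centroids is a NumPy array (which the elementwise comparison in the original line implies); the example values are made up:

import numpy as np

centroids = np.array([[25.3, -0.00005, 31.0], [0.00002, -12.7, 48.9]])

# Zero out only the entries whose magnitude is below the noise threshold,
# so legitimately negative LMPs (e.g. -12.7) are kept while near-zero
# noise of either sign is removed.
centroids = centroids * (np.abs(centroids) >= 1e-4)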

Author

I made the change. Good catch

return (
    min_capacity * op_mode[self.mp_model.set_period.at(t)]
    <= var_commodity[self.mp_model.set_period.at(t)]
)
Contributor

You don't have to change it, but you can write it this way:
min_capacity * op_mode[t] <= var_commodity[t]

Member

I think you should make that change - for one it simplifies the code and makes it easier to read, and secondly it removes one unnecessary method call that could potentially break in the future.

Author

This was originally in place to handle cases where multi-year or multi-day horizons (or both) were used to specify the multiperiod model. In this case, self.mp_model.set_period is a data structure full of tuples, i.e.:

instead of:
set_period = [period_1, period_2, ..., period_n]

we have:
set_period = [(period_1, year_1), (period_2, year_1), ..., (period_n, year_1), (period_1, year_2), ..., (period_n, year_n)]

That's why the structure self.range_time_steps was created. In this way, you can use the .at(t) functionality to get the tuple within the self.mp_model.set_period data structure indexed by the self.range_time_steps RangeSet.

Three realistic options:

  1. Leave it the way it is.
  2. Make a call t = self.mp_model.set_period.at(t) at the beginning of these functions to make it more readable.
  3. Update it to be more readable for now and inevitably change it back when we use representative days or multi-year simulations.

Since the multiperiod class is the way it is, I think this probably has to stay the way it is for the moment. We can address this in a future release when we fully support representative days (which will also use the tuple structure from the multiperiod class). Up to suggestions, though.

Member

Where and how is this rule used? My inclination would be to handle this at the component generation step, and only use the indexing elements you want. That way, this rule will only ever get a single value for t.

Member

Also, why not have two separate indexing sets in this case, one for period and one for year? That way they would be clearly distinguished (and year can be a single element if you don't need it).

Whilst dealing with arbitrary length indexing elements is possible, it does add some complexity and edge cases you need to be able to handle. This brings me to another question, do you have tests for both cases (and any edge cases you can envision occurring)?

Contributor

@djlaky We have declared:

op_mode = {
            t: self.mp_model.period[t].find_component(op_blk + ".op_mode")
            for t in self.mp_model.period
        }

so, t = self.mp_model.set_period.at(t). If elements of set_period are tuples, then t is also a tuple. Right now, we only plan to support representative days. The multiyear case can be handled by creating an instance of the PriceTakerModel for each year. If single index vs. tuple is the issue, then you can construct the constraint in this manner.

m.capacity_low_lim = Constraint(m.set_period)
for t in m.set_period:
    m.capacity_low_lim.body(min_capacity * op_mode[t] <= var_commodity[t])

This can handle both single index and double index. Double check the method name once again. It has to be either body or expr or something like that.
Alternatively, you can separate it out depending on full year vs. representative days.

Contributor

Apparently, there is another approach:

@m.Constraint(m.set_period)
def capacity_low_lim(blk, *t):
    return min_capacity * op_mode[t] <= var_commodity[t]

This works for both single index and tuple.

Author
@djlaky May 16, 2024

For constraints that have no skipping, sure, but for the ramping constraints and startup/shutdown there are issues:

https://github.com/djlaky/idaes-pse/blob/price-taker-model/idaes/apps/grid_integration/multiperiod/price_taker_model.py#L644-L656

    # The linearized ramping constraints
    @blk.Constraint(self.range_time_steps)
    def ramp_up_con(b, t):
        if t == 1:
            return Constraint.Skip
        else:
            return (
                var_ramping[self.mp_model.set_period.at(t)]
                - var_ramping[self.mp_model.set_period.at(t - 1)]
                <= (startup_rate - ramp_up_rate)
                * act_startup_rate[self.mp_model.set_period.at(t)]
                + ramp_up_rate * act_op_mode_rate[self.mp_model.set_period.at(t)]
            )

https://github.com/djlaky/idaes-pse/blob/f10ea59ad3f2a430f749748b42be8419e34982a3/idaes/apps/grid_integration/multiperiod/price_taker_model.py#L751-L759

    def Binary_relationhsip_con(b, t):
        if t == 1 or t > number_time_steps:
            return Constraint.Skip
        return (
            op_mode[self.mp_model.set_period.at(t)]
            - op_mode[self.mp_model.set_period.at(t - 1)]
            == start_up[self.mp_model.set_period.at(t)]
            - shut_down[self.mp_model.set_period.at(t)]
        )

Contributor
@radhakrishnatg left a comment

Looks good to me.


# Importing in the necessary variables
if not hasattr(self, "range_time_steps"):
    self.range_time_steps = RangeSet(len(self.mp_model.set_period))
Member

Can you comment on why this might not exist at this point? It seems to me that this is something that should exist before you get here.

Member
@andrewlee94 left a comment

  1. We will need documentation for all the new classes - at a minimum some technical reference for the API (autodocs), but an explanation with a simple demonstration of how to use the tools would be nice if possible (looking at the code, this will be very important).
  2. I do not see a test file for design_and_operation_model.py. This might be rolled into another test, but I recommend having a separate test file for this to help with future maintenance.
  3. In general, I think you could (and should) do a lot more unit testing of methods in isolation. From what I can see in the tests, there are not a lot of tests that target individual methods. E.g. I see no test that targets add_startup_shutdown and actually tests that it does what it is supposed to do (I see one test for a failure case, but that is it).
  4. Most of the tests I see are based on what I assume is an actual case study. I would suggest thinking about whether you can construct simpler dummy cases for unit testing all the methods in isolation, and where possible mock up any necessary starting information so that you have fewer moving parts and simpler cases to test with. The full case study is good for an end-to-end component/integration test of the system, but it often makes unit testing harder.


@pytest.mark.unit
def test_seed_value():
    m = PriceTakerModel()
Member

What is the purpose of this test? As far as I can tell, this test would only fail if PriceTakerModel somehow prevented users from creating new attributes. If you expect seed to be created by PriceTakerModel, you should use either assert isinstance(m.seed, expected_type) or assert hasattr(m, "seed").

Member

Seeing as seed is a property with a custom setter, I would suggest testing for the presence of _seed and the failure cases for seed here.
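A rough sketch of what such a test might look like, reusing the error message that appears later in this file; whether the backing attribute is really named _seed is an assumption that would need to be checked against the property implementation:

import pytest

# Module path taken from the file linked earlier in this conversation
from idaes.apps.grid_integration.multiperiod.price_taker_model import PriceTakerModel


@pytest.mark.unit
def test_seed_default_and_validation():
    m = PriceTakerModel()

    # The seed property should be backed by a private attribute
    assert hasattr(m, "_seed")

    # Setting a non-integer seed should raise with the documented message
    with pytest.raises(
        ValueError,
        match="seed must be an integer, but 12.34 is not an integer",
    ):
        m.seed = 12.34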

value = 12.34
with pytest.raises(
    ValueError,
    match=(f"seed must be an integer, but {value} is not an integer"),
Member

I would not use an f-string here - check for the expected string output to make sure nothing else is going wrong. Same for the above and below.


@pytest.mark.unit
def test_ramping_constraint_logger_messages(excel_data):
    des = DesignModel()
Member

This is one very long test involving a lot of with pytest.raises() checks. I would suggest breaking this into multiple tests which check each possible exception individually. This will help future maintenance, as you will be able to see all the exceptions in one pass; at the moment the test will fail at the first exception it encounters and other exceptions might stay hidden until you fix the first. This applies to some other tests I've seen too.

not (have_skl and have_kn),
reason="optional package 'scikit-learn' not installed",
)
@pytest.mark.unit
Member

I would suggest moving the with pytest.raises() to be only around the final piece of code you expect to fail. This way, any failures in the set-up code will cause different errors to be emitted, which might help with debugging.

Related to this:

  1. Are there any unit tests for each of the lines of code in the set-up part, to make sure these are tested in isolation? At the moment, this test depends on a lot of moving parts and I am not sure if they have been tested sufficiently to separate a failure in the setup from the failure you are actually trying to test. This goes for a lot of the other tests here too.
  2. Is there a way to dummy up the first part to avoid possible failures in the set up code?
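A small, self-contained sketch of the narrowed-scope pattern; build_inputs and check_cluster_range are stand-ins for the real setup code and the call under test, not functions from this PR:

import pytest


def build_inputs():
    # Stand-in for the multi-step setup in the real test
    return {"kmin": 10, "kmax": 2}


def check_cluster_range(kmin, kmax):
    # Stand-in for the call that is expected to fail
    if kmin >= kmax:
        raise ValueError("kmax must be greater than kmin")


@pytest.mark.unit
def test_bad_cluster_range():
    # Setup stays outside the context manager, so an unrelated failure
    # here is reported as an ordinary error rather than being mistaken
    # for the expected exception.
    inputs = build_inputs()

    # Only the single call expected to fail is wrapped
    with pytest.raises(ValueError, match="kmax must be greater than kmin"):
        check_cluster_range(**inputs)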

NonNegativeReals,
Constraint,
)
from functools import reduce
Member

Pylint will probably complain about import ordering here. You need to group imports, starting with Python standard-library modules, then third-party imports, then Pyomo, and finally IDAES.
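For example, the imports visible in this file would be grouped roughly like this (the exact list is illustrative):

# Standard library
import logging
from functools import reduce

# Third party
import matplotlib.pyplot as plt

# Pyomo
from pyomo.environ import Constraint, NonNegativeReals, Param

# IDAES
from idaes.apps.grid_integration import MultiPeriodModel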


import matplotlib.pyplot as plt

import logging
Member

This is probably OK, but would it be better to use the IDAES logger instead?



path_to_file,
)

if column_name is None:
Member

A minor comment, but I would generally put the simple checks first before trying to open the file. These checks are a lot cheaper than opening a file, and I prefer to fail sooner rather than later.
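A sketch of the ordering being suggested; read_lmp_data is a hypothetical stand-in for the actual method, and the pandas call is only illustrative:

import pandas as pd


def read_lmp_data(path_to_file, column_name=None):
    # Cheap argument checks first, so bad inputs fail before any file I/O
    if column_name is None:
        raise ValueError("column_name must be provided to select the LMP column")

    # Only open the (comparatively expensive) file once the inputs look sane
    return pd.read_csv(path_to_file)[column_name]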

test_fig = plt.gcf()
plt.close("all")

assert 9 <= n_clusters <= 15
Member

This is not a great test (I assume this is what you were referring to with different platforms/versions giving different results). It also does not test the inertia values, which would be good to include (if possible). Would it be possible to use a much simpler dummy data set here, one that gives a well-defined solution for testing?

Also - this is where breaking the plotting function out as a separate method would help, as you could have a separate test for it.
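For illustration, a dummy-data test might look like the sketch below. It assumes get_optimal_n_clusters accepts a 2-D array shaped like the output of generate_daily_data (hours as rows, days as columns, matching the transpose in the snippet above), that its cluster-range arguments can be left at their defaults, and that three well-separated groups produce an elbow at k = 3; all of that would need to be verified against the implementation.

import numpy as np
import pytest

from idaes.apps.grid_integration.multiperiod.price_taker_model import PriceTakerModel


@pytest.mark.unit
def test_optimal_n_clusters_dummy_data():
    # Three well-separated groups of 20 synthetic "days", 24 hours each
    rng = np.random.default_rng(42)
    low = rng.normal(10.0, 0.1, size=(24, 20))
    mid = rng.normal(50.0, 0.1, size=(24, 20))
    high = rng.normal(100.0, 0.1, size=(24, 20))
    daily_data = np.hstack([low, mid, high])

    m = PriceTakerModel()
    n_clusters, inertia_values = m.get_optimal_n_clusters(daily_data)

    # The dummy data contains exactly three clusters
    assert n_clusters == 3
    # Inertia should decrease monotonically as k grows
    assert all(a >= b for a, b in zip(inertia_values, inertia_values[1:]))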

@ksbeattie
Member

@djlaky we're hoping to cut a release at the end of the month - next week. Do you think this will be done in time for that?

@adowling2
Contributor

We will not have time to incorporate the feedback from Andrew. If we are okay with merging without those suggestions (maybe open an issue to track proposed changes), then yes, this can probably get merged.

@ksbeattie
Member

Pushing this to the Aug release

@ksbeattie
Member

@djlaky any update on this?

This reverts commit 46a3331.
Labels: DISPATCHES, Priority:Normal