What (should) happen at 48 hrs? #32

Open
andreww opened this issue May 11, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@andreww
Collaborator

andreww commented May 11, 2023

I've just done a demo of this and it turned out that the lowest carbon intensity predicted for Oxford was in 48 hrs time (the last half hour returned by the API). Currently we choose to schedule the start of the task then. Is this what we want to happen?

Probably worth thinking about some of this kind of edge case, and cooking up some example csv files so we can test them. But deciding what to do isn't obvious to me.

@tlestang
Collaborator

tlestang commented May 11, 2023

Our first 'live' bug report I think! 🍾

> the lowest carbon intensity predicted for Oxford was in 48 hrs time (the last half hour returned by the API). Currently we choose to schedule the start of the task then. Is this what we want to happen?

No, at least this is not my understanding of what should happen.

If the API returns carbon intensities $c_i$ at times $t_i$, with $\delta = t_{i+1} - t_i$, and $\Delta$ is the job runtime provided by the user (using the -d flag), then I expect the job start time to be $t_i$ with $i$ minimising the sum

$$s_i = \sum_{k = i}^{i + M} c_k,$$

with $M = \lceil \Delta / \delta \rceil$ and $0 \leq i \leq n - M$. So the latest possible job start time is $t_{n-M}$.

For instance if your job's duration is 3 hours (cats -d 180) the latest possible start time should be 45 hours from running cats. I thought this is what timeseries_conversion.get_lowest_carbon_intensity was doing, but we should have a look.
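
A minimal sketch of the minimisation above, assuming the forecast is a plain list of intensities $c_0, \dots, c_n$ at evenly spaced times. The function name `best_start_index` and its parameters are hypothetical and are not taken from cats.

```python
import math


def best_start_index(intensities, duration_min, step_min=30):
    """Return the index i minimising s_i = c_i + ... + c_{i+M}.

    Hypothetical helper, not the cats API. `intensities` holds c_0..c_n,
    `duration_min` is the job runtime (Delta) and `step_min` is the
    forecast spacing (delta), both in minutes.
    """
    m = math.ceil(duration_min / step_min)  # M = ceil(Delta / delta)
    n = len(intensities) - 1                # index of the last forecast point
    if m > n:
        raise ValueError("job does not fit inside the forecast")
    # Candidate starts are 0 <= i <= n - M, so no window runs off the end.
    sums = [sum(intensities[i:i + m + 1]) for i in range(n - m + 1)]
    # min() keeps the first minimum, so ties go to the earliest start.
    return min(range(len(sums)), key=sums.__getitem__)
```

With a 48-hour forecast at half-hourly resolution (97 points) and a steadily falling intensity, `best_start_index(forecast, 180)` returns 90, which is a start 45 hours out rather than the final data point.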

> Probably worth thinking about some of this kind of edge case, and cooking up some example csv files so we can test them. But deciding what to do isn't obvious to me.

Couldn't agree more - this stuff is easy to get almost right :)

@sadielbartholomew
Member

(I've nothing helpful to add here right now, but 💯 for the use of LaTeX in markdown, Thibault. I did not know that was possible!)

@colinsauze
Member

@tlestang's approach reduces the scope for when longer jobs can run but guarantees that you get a sensible answer. Two thoughts on other approaches:

  1. Could we add an option to let the user decide?
  2. Weather forecasts extend beyond 48 hours, with lower accuracy. By looking at the weather forecast you can probably make a sensible guess about whether the grid in your region will have a lower carbon intensity up to 5 days from now. The result won't be as good as the carbonintensity.org.uk predictions, but it might be good enough to tell you that it will be much more (or less) sunny or windy then, and that your job would be best run a bit later. However, this does require us to get into producing our own grid intensity forecast. And if people are going to wait 4 or 5 days for a job to run, we might need to start asking how important the job is, or asking for a target carbon intensity.

@tlestang
Collaborator

> I've just done a demo of this and it turned out that the lowest carbon intensity predicted for Oxford was in 48 hrs time (the last half hour returned by the API). Currently we choose to schedule the start of the task then. Is this what we want to happen?

Currently the cat_converter function is called with the "simple" method hardcoded as an argument:

def writecsv(...):
    # ...
    return cat_converter(outputfile, "simple", duration)

The 'simple' method just returns the time corresponding to the minimum of the timeseries, not the minimum of the running average over the job runtime:

def cat_converter(...):
    # ...
    if method == "simple":
        #  Return element with smallest 2nd value
        #  if multiple elements have the same value, return the first
        rtn = min(data, key=lambda x: x[1])
        rtn = {
            "timestamp": rtn[0],
            "carbon_intensity": rtn[1],
            "est_total_carbon": rtn[1],
        }

So yeah - if the min carbon intensity is the last data point, cats will return this as the start time. But that doesn't sound right to me.
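
To make the edge case concrete, here is a small sketch (not the cats code) that applies the same "smallest second element" rule to a made-up forecast whose minimum is the final data point. The forecast values, the 3-hour job length and the `simple_start` name are all assumptions for illustration.

```python
from datetime import datetime, timedelta

start = datetime(2023, 5, 11, 12, 0)
# Made-up half-hourly forecast: flat at 100, with a single dip at the very end.
data = [(start + timedelta(minutes=30 * k), 100.0) for k in range(96)]
data.append((start + timedelta(hours=48), 50.0))


def simple_start(data):
    # Same rule as the "simple" method quoted above: smallest second element,
    # first occurrence on ties.
    return min(data, key=lambda x: x[1])[0]


job = timedelta(minutes=180)        # hypothetical 3-hour job
chosen = simple_start(data)         # timestamp picked by the simple rule
forecast_end = data[-1][0]          # last time the forecast covers
overruns = chosen + job > forecast_end
```

Here `chosen` is the final timestamp and `overruns` is `True`: the job would run entirely past the end of the forecast, so the single low value says nothing about the intensity while it is running.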

@andreww andreww added the bug Something isn't working label May 26, 2023
@abhidg abhidg added this to the Version 1.0 milestone Oct 24, 2023
@colinsauze
Member

This has probably been resolved in #43 (to ensure that jobs complete within the 48 hour forecast), but there doesn't appear to be a test for it. Has anybody written a test that covers this? Is it in the main branch?

@tlestang
Collaborator

Directly following the CW2023 hackathon, cats internally implemented two methods of determining the job start time:

  • One, the so-called "simple" method, that just returned the time corresponding to the minimum of the carbon intensity timeseries.
  • Another, called "windowed", that would compute the time window (of the job's duration) that minimises carbon intensity over the next 48h and return the start of that window.

The second behaviour is what is actually expected from cats, but the first one has been the default for a while until #55 and d9fdddd6 were merged in.

The behaviour originally described by @andreww makes sense if cats just returned the minimum of the carbon intensity forecast independently of job duration. That's not the case anymore, see cats/optimise_starttime.py.

In terms of testing, I'm not sure what to add in addition to tests in tests/test_windowed_forecast.py?

@colinsauze
Member

Last week @Llannelongue suggested we create a test where the carbon intensity continuously falls over the 48 hour forecast. The correct behaviour would be that a 48 hour job gets scheduled to run at the start of the 48 hour period, not the end of it.
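
A sketch of what such a test could look like. The real entry point in cats/optimise_starttime.py is not shown in this thread, so `window_start` below is a hypothetical stand-in for the function under test, and the forecast values are made up.

```python
from itertools import accumulate


def window_start(intensities, n_points):
    """Stand-in for the function under test (hypothetical, not the cats API).

    Returns the first index of the contiguous run of `n_points` values
    with the smallest total, using prefix sums.
    """
    prefix = [0, *accumulate(intensities)]
    totals = [prefix[i + n_points] - prefix[i]
              for i in range(len(intensities) - n_points + 1)]
    return totals.index(min(totals))


def test_falling_forecast_full_length_job():
    # 48 h at half-hourly resolution, intensity falling at every step.
    forecast = [300 - 2 * k for k in range(97)]
    # A job spanning the whole forecast has exactly one valid start: index 0.
    assert window_start(forecast, len(forecast)) == 0


def test_falling_forecast_short_job():
    forecast = [300 - 2 * k for k in range(97)]
    # A 3-hour job (7 points) should take the last window that still fits,
    # not the final data point.
    assert window_start(forecast, 7) == len(forecast) - 7
```

The second test is an extra assumption on top of the suggestion above: it pins down the companion case where a short job is pushed as late as the forecast allows, but no later.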

@abhidg
Contributor

abhidg commented May 20, 2024

Related: #99
