Refactor code (and some other changes) #43

Llannelongue · 2023-05-29T22:46:27Z

This follows the discussions on #26 and during the meeting: now that we are moving toward a stable version, it would be worth refactoring the code to make it a bit more future-proof.

This is most likely not final at all, and it'd be great to hear everyone's thoughts to refine it, and also to check that it still works as expected (in particular, whether I have inadvertently deleted something important!). We can open dedicated issues for anything arising from discussions here.

Here is a schema of the draft new structure:

The guiding principles in refactoring were to try to:

keep each step separate, with a different class/script
minimise chained imports (file A imports B which imports C, file D also imports B, etc.)
ensure that the main steps can be easily followed from cats.run
make it easy (when possible) to add more API/different optimisation tools in the future

As a reminder, the main steps of CATS are:

Read data/information from the command line and the config file
Pull forecast data from a carbon intensity (CI) API
Find the window of time (start time + end time) that would minimise average carbon intensity within the known forecast
(optional) Estimate the total energy used and carbon footprint by starting or not at the best time
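
Step 3 above can be sketched as a sliding window over evenly spaced forecast points. This is only a minimal illustration, not the code in this PR: it assumes half-hourly values and a job duration that is a whole number of intervals, and the name `best_window` and the sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def best_window(times, values, n_intervals):
    """Return (start_time, average) of the n_intervals-long window with the lowest mean."""
    if n_intervals < 1 or n_intervals > len(values):
        raise ValueError("job duration must fit within the forecast")
    # Sum of the first window, then slide it one interval at a time.
    window_sum = sum(values[:n_intervals])
    best_sum, best_idx = window_sum, 0
    # The last admissible start index is len(values) - n_intervals,
    # i.e. the job must end within the known forecast.
    for i in range(1, len(values) - n_intervals + 1):
        window_sum += values[i + n_intervals - 1] - values[i - 1]
        if window_sum < best_sum:
            best_sum, best_idx = window_sum, i
    return times[best_idx], best_sum / n_intervals


# Hypothetical half-hourly forecast (gCO2e/kWh).
t0 = datetime(2023, 5, 29, 20, 30, tzinfo=timezone.utc)
values = [120, 100, 80, 70, 90, 110]
times = [t0 + i * timedelta(minutes=30) for i in range(len(values))]
start, average = best_window(times, values, n_intervals=2)
```

Keeping a running sum means each forecast point is added and subtracted once, so the search is linear in the forecast length.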

An overview of some changes beyond structure

I aimed to keep the functionality as it was as much as possible, only enhancing it where I could. Here are some changes:

Output has been changed a bit; here is the new one for now:

Using config.yml found in current directory
Using carbonintensity.org.uk for carbon intensity forecasts.
Using location provided: PE29

Carbon intensity forecast loaded for between 29/05/2023-20:30 UTC and 31/05/2023-20:30 UTC.

Best start time: 31/05/2023-12:00 UTC (in 39.1 hours)
	 Expected end time: 31/05/2023-17:50 UTC
	 Expected average carbon intensity: 70.03 gCO2e/kWh

Estimated carbon footprint of running job at best time: 10.68 gCO2e
	vs running it now: 13.26 gCO2e (2.58 gCO2e saved)
	vs running at worst time (31/05/2023-01:00 UTC): 17.54 gCO2e (6.87 gCO2e saved)
	- Estimated energy usage: 0.15 kWh

Regarding how to find the best start time:
- the "simple" method has been removed, as it doesn't really hold up as a way to minimise carbon intensity
- more discussion is needed regarding what data the carbonintensity.org.uk API sends back. More on this in issue "How to best estimate an average carbon intensity over the duration of a job" #42.
Now all carbon intensity values with timestamps (from the API or computed internally) are stored as a CarbonIntensityEstimate (defined in CI_api_interface.py but could be defined more centrally). It is a simple adaptation of the previously existing CarbonIntensityAverageEstimate and has properties such as value, start, end and timedelta and can be sorted by CI value.
Added some progress update/error messages throughout (more needed!)
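
As a rough illustration of the object described above, here is a minimal sketch of a timestamped estimate that sorts by CI value. It is not the definition in CI_api_interface.py: only the property names (value, start, end, timedelta) come from the description, and the dataclass layout is an assumption.

```python
import datetime as dt
from dataclasses import dataclass, field


@dataclass(order=True)
class CarbonIntensityEstimate:
    # Only `value` takes part in comparisons, so sorting and equality
    # depend on the CI value alone (an assumption of this sketch).
    value: float                               # gCO2e/kWh
    start: dt.datetime = field(compare=False)
    end: dt.datetime = field(compare=False)

    @property
    def timedelta(self) -> dt.timedelta:
        """Length of the period covered by this estimate."""
        return self.end - self.start


t0 = dt.datetime(2023, 5, 31, 12, 0, tzinfo=dt.timezone.utc)
half_hour = dt.timedelta(minutes=30)
estimates = [
    CarbonIntensityEstimate(90.0, t0, t0 + half_hour),
    CarbonIntensityEstimate(70.0, t0 + half_hour, t0 + 2 * half_hour),
]
best = min(estimates)   # lowest carbon intensity
```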

How this release addresses existing open issues

What (should) happen at 48 hrs? #32 : as it stands, it only plans jobs within the forecast window (the latest start time is end_of_forecast - duration). How to remove this constraint probably needs to be discussed further, as it's not trivial.
Multinational support: research electricity grid APIs for other countries #22 : this new structure should make it much easier to add new countries, directly in CI_api_interface.py (e.g. here)
It should finish closing Calculate corresponding carbon footprints #3 as carbon footprint estimates are now fully functional

Open questions:

Why are some things that are not errors printed on stderr and others on stdout? As a temporary measure, all updates/messages are sent to stdout and only errors/warnings to stderr, but it would be great to hear some thoughts on that.
I suspect I may have broken the pipeline with the at linux scheduler because of the different output. Any thoughts on the best way to pass on the start time?

What is still needed

Not necessarily before merging, but here is a non-exhaustive list of what still needs to be done (the idea of this PR is to discuss this new structure while polishing the details).

Write more tests
Comment the code/functions more thoroughly
Check how it deals with edge cases, absurd values, etc. (and add error messages)
Rewrite README
Check how the new structure works with the at linux command

tlestang

Thanks @Llannelongue for your work and the detailed PR description. As this is a large set of changes I won't be able to look at everything in one go.

About the API interface:

I don't see the benefit of introducing a new plain class for the API interface, compared to the currently implemented APIInterface namedtuple. It provides the same access to the request url and response parsing functions through dot notation and a lot less code.

More problematic to me is that the API-specific logic of forming the query URL and parsing the response is coupled back to the API interface code. Somebody wishing to interface cats with a new web API would have to modify the implementation of CI_API_interface itself, adding a new statement if self.choice_CI_API == 'myapi.org' and hardcoding API-specific code in there. I feel like this is a step back from the current implementation (introduced in #37), in which interfacing against a different API is only a matter of providing two functions (possibly grouping them into a new APIInterface object) without having to touch the code of cats itself at any point.
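
To make the decoupling concrete, here is a minimal sketch of the two-function approach. It is not the code from #37: the field name `get_request_url`, the example.org URL, the response layout and the injected `fetch` callable are all hypothetical, chosen so the example runs without network access.

```python
from collections import namedtuple

# Hypothetical field names: one function builds the URL, one parses the response.
APIInterface = namedtuple("APIInterface", ["get_request_url", "parse_response_data"])


def example_request_url(timestamp, location):
    # Placeholder URL scheme, not a real endpoint.
    return f"https://api.example.org/forecast/{timestamp}/{location}"


def example_parse_response(response):
    # Assumed response layout: {"data": [{"from": ..., "intensity": ...}, ...]}
    return [(entry["from"], entry["intensity"]) for entry in response["data"]]


EXAMPLE_API = APIInterface(example_request_url, example_parse_response)


def get_forecast(timestamp, location, api, fetch):
    """Generic query code: it does not change when a new API is added."""
    url = api.get_request_url(timestamp, location)
    return api.parse_response_data(fetch(url))
```

Supporting another API then means writing two new functions and grouping them into another `APIInterface`; `get_forecast` stays untouched.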

Lastly, do we really want to rename api_query.py and api_interface.py ?

cats/CI_api_interface.py

Llannelongue · 2023-05-30T17:20:14Z

Thanks for these @tlestang

I don't see the benefit of introducing a new plain class for the API interface, compared to the currently implemented APIInterface namedtuple. It provides the same access to the request url and response parsing functions through dot notation and a lot less code.

More on this below after the second comment, but regarding class vs namedtuple, I don't mind as long as it's easy to switch between APIs (so we would need a master function to pick the right namedtuple in case it's not provided directly by the user). I can draft a new commit reverting to namedtuple.

More problematic to me is that the API-specific logic of forming the query URL and parsing the response is coupled back to the API interface code. Somebody wishing to interface cats with a new web API would have to modify the implementation of CI_API_interface itself, adding a new statement if self.choice_CI_API == 'myapi.org' and hardcoding API-specific code in there. I feel like this is a step back from the current implementation (introduced in #37), in which interfacing against a different API is only a matter of providing two functions (possibly grouping them into a new APIInterface object) without having to touch the code of cats itself at any point.

This is an interesting point, and it probably needs more thinking. However, I'm not sure about not having to touch the code; I see two different use cases here:

1. We or other contributors will want to add other CI APIs (e.g. for other countries), and we ideally want to make them part of CATS so that these new APIs are available to the whole community. In this case, it would be good to have all the URL/parsing code in the same place, and api_interface is a good place for that (it also makes it easier to add things by copy-pasting). In terms of how much hassle it is to add an API, the current and new code are equivalent (api_interface needs to be modified either way, and the current code requires changing __init__.py as well), but the existing code doesn't let users easily pick an API; this is what the new argument --api-carbonintensity introduces.
2. The second use case is when users want to pass their own API wrapper directly to CATS without having to modify the code. In this case I agree, it would be good to make that possible in an easier way. But how would that work in practice? It would be good to have an idea of how the user would do it if we want to implement it.

I opened issue #44 to have a space to discuss (2) further.
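
One possible shape for the "master function" mentioned above is a registry keyed by API name, with the command-line choices derived from it. This is a sketch under assumptions rather than a proposal for the final design: `API_REGISTRY`, `select_api`, the namedtuple field names and the placeholder URL are hypothetical; only the `--api-carbonintensity` argument name comes from this PR.

```python
import argparse
from collections import namedtuple

APIInterface = namedtuple("APIInterface", ["get_request_url", "parse_response_data"])

# Hypothetical registry of the APIs shipped with the package.
API_REGISTRY = {
    "carbonintensity.org.uk": APIInterface(
        # Placeholder URL, not the real endpoint.
        get_request_url=lambda timestamp, location: (
            f"https://api.example.org/{timestamp}/{location}"
        ),
        parse_response_data=lambda response: response,
    ),
}


def select_api(name, registry=API_REGISTRY):
    """Pick the interface for `name`; a user-supplied registry covers use case 2."""
    try:
        return registry[name]
    except KeyError:
        raise ValueError(
            f"Unknown API {name!r}; expected one of {sorted(registry)}"
        ) from None


def build_parser():
    parser = argparse.ArgumentParser(prog="cats")
    # argparse stores this option as `api_carbonintensity`.
    parser.add_argument("--api-carbonintensity", choices=sorted(API_REGISTRY))
    return parser


args = build_parser().parse_args(["--api-carbonintensity", "carbonintensity.org.uk"])
api = select_api(args.api_carbonintensity)
```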

Lastly, do we really want to rename api_query.py and api_interface.py ?

The motivation behind that is to clarify that the code in there is related to pulling carbon intensities, in case more APIs for other parts of the code are included later. But at the moment, this is only future-proofing so I don't mind removing the CI_ in the names if it's an issue for now.

Llannelongue · 2023-05-30T20:57:47Z

Following up on that, commit 8c927c7 reintroduces namedtuple for the api interfaces.

tlestang · 2023-05-30T21:06:01Z

We or other contributors will want to add other CI APIs (e.g. for other countries), and we ideally want to make them part of CATS so that these new APIs are available to the whole community. In this case, it would be good to have all the URL/parsing code in the same place, and api_interface is a good place for that (it also makes it easier to add things by copy-pasting).

Agreed 100%, and that's the motivation behind #37. But the separation between the API query code and the API-specific bits wasn't done with a particular usage of cats in mind - it's generally useful for developers adding new APIs because they don't have to touch the api query module. It turns out it's also useful for users playing with cats as a library, as they can actually bring their own functions. Sure, the right default API interface must be selected in the init script for now, but that's about it. It could just as well be specified at the command line.

tlestang · 2023-05-30T21:22:42Z

cats/optimise_starttime.py

@@ -0,0 +1,91 @@
+from datetime import datetime, timezone


I'll add my general comment on this module here so we can have a thread.

I think this PR adds useful missing pieces but I'm a bit uncomfortable with the fact that it completely removes the timeseries_conversion and forecast modules. It makes more sense to me to build upon them instead of discarding them altogether: they already contain most of the functionality (re)implemented in this module. In particular, I'm not sure why WindowedForecast disappeared -- it was merged 4 days ago -- see #36. I think it would be straightforward to make it work with a simple summation of intensity values.

That being said, there are a few welcome things in this module that are missing from the current implementation:

Detecting cases where the job duration exceeds the maximum forecast duration.

Allowing the windowed/summed/integrated intensity over the window to be computed by just summing intervals.

Handling the fact that the first and last intervals in the time window are fractions of the default interval length (30 min).

Again -- I think we'd be better off integrating the above into the current code, rather than reimplementing a whole new module from scratch. Unless of course we think that there are strong reasons to do so, but they are not obvious to me at this stage.
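
For reference, the handling of fractional first and last intervals discussed above can be sketched as follows. This is neither WindowedForecast nor the code in this PR: it is a minimal illustration that assumes the forecast can be linearly interpolated between data points, and the function names and sample data are hypothetical.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta, timezone


def interpolate(times, values, t):
    """Linearly interpolate the forecast at time t (t must lie within the forecast)."""
    if not times[0] <= t <= times[-1]:
        raise ValueError("time outside the forecast")
    i = bisect_right(times, t) - 1
    if i == len(times) - 1:
        return float(values[-1])
    frac = (t - times[i]) / (times[i + 1] - times[i])  # timedelta / timedelta -> float
    return values[i] + frac * (values[i + 1] - values[i])


def average_intensity(times, values, start, duration):
    """Trapezoidal average over [start, start + duration], with partial end intervals."""
    if duration <= timedelta(0):
        raise ValueError("duration must be positive")
    end = start + duration
    lo = bisect_right(times, start)  # first forecast point strictly after start
    hi = bisect_left(times, end)     # first forecast point at or after end
    # Nodes: interpolated start, forecast points strictly inside, interpolated end.
    node_t = [start] + list(times[lo:hi]) + [end]
    node_v = (
        [interpolate(times, values, start)]
        + list(values[lo:hi])
        + [interpolate(times, values, end)]
    )
    area = sum(
        0.5 * (v0 + v1) * (t1 - t0).total_seconds()
        for t0, t1, v0, v1 in zip(node_t, node_t[1:], node_v, node_v[1:])
    )
    return area / duration.total_seconds()


# Hypothetical forecast: four half-hourly points (gCO2e/kWh).
t0 = datetime(2023, 5, 29, 20, 30, tzinfo=timezone.utc)
times = [t0 + i * timedelta(minutes=30) for i in range(4)]
values = [100, 200, 100, 100]
avg = average_intensity(times, values, t0 + timedelta(minutes=15), timedelta(hours=1))
```

In this example the job starts 15 minutes into the first interval and ends 15 minutes into the last one, so the first and last trapezoids each cover half an interval.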

Regarding WindowedForecast: it should definitely be kept. As highlighted in the PR release note and in the code here, the goal is to keep the trapezoidal integration as an option, especially while we figure out which one is right for the UK (issue #42). But it's a rather delicate function to get the bounds right (especially first and last segments), so rather than me just shoehorning it into this updated structure on my own, it'd be good to discuss the best way to include it. Very happy to see suggestions.

Regarding forecast it's not removed, just renamed as I'm not sure forecast still describes what this part of the code is doing. But it still serves the same purpose.

As for timeseries_conversion (and most of the code in this PR), it's of course reusing existing code, sometimes grouped differently based on how usage has evolved. Which functions in particular from timeseries_conversion have been removed completely but are still needed? My understanding is that csv_loader and cat_converter aren't needed anymore (as per PR #41), check_duration has just been moved to check_clean_arguments as that's what it does, and get_lowest_carbon_intensity is now in optimise_starttime for the same reason.

the goal is to keep the trapezoidal integration as an option [..]. But it's a rather delicate function to get the bounds right (especially first and last segments), so rather than me just shoehorning it into this updated structure on my own, it'd be good to discuss the best way to include it.

Don't you think it will be more effort to re-integrate something that is already there (and covered by a couple of tests) into new code, as opposed to starting from what is there? Sorry -- I understand the appeal of writing that module from scratch, but I'm questioning whether it will make the next few steps actually harder. Obviously that's just my own humble opinion, let's see what other people have to say about this!

Regarding forecast it's not removed, just renamed as I'm not sure forecast still describes what this part of the code is doing. But it still serves the same purpose.

Did forecast become optimise_starttime? It looks like a new module.

I'll submit a commit later today that reintroduces WindowedForecast closer to its original form so that we can hopefully get the best of both worlds

Turns out that properly integrating the first and last time windows into the trapezoidal model was a bit harder than planned (I didn't manage it with the original implementation, but this one is very close), but it's now done alongside a test scenario: commit 1aca7ab

Also @tlestang, if you think testing the integration method on the sine function is still needed, it can be put back in, as it's only commented out for now. For now there is a test on a dummy carbon intensity forecast.

cats/optimise_starttime.py

andreww · 2023-06-23T15:27:59Z

I'm not sure exactly where we are with this one - does it make sense to discuss it at the meeting on Wednesday? One possibility would be to try to chop the changes into smaller sets, but I think some wider discussion around the architecture could be useful.

My two cents is that a useful approach to thinking about this is to consider other future uses of (maybe bits of) the code base and think about how they would work. So, for cats, I imagine a user who has some big python package providing a service that runs some regular processing, and it's the processing that they want to schedule (no need to integrate with a scheduler). Or, somebody wants to post-process job scheduler logs.

Some (partial) answers to the questions above:

Why are some things that are not errors printed on stderr and others on stdout? As a temporary measure, all updates/messages are sent to stdout and only errors/warnings to stderr, but it would be great to hear some thoughts on that.

The Python code currently interfaces with at via shell backtick expansion: whatever cats writes to stdout becomes an argument to at, and (from the point of view of at) everything else is ignored and goes to the terminal. We could sensibly use Python's subprocess module to do this instead. This would give us more control but could limit what the user could do.
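
As a rough idea of what the subprocess route could look like, here is a minimal sketch. It is not how cats currently calls at: the function names are hypothetical, and it assumes an at implementation that accepts `-t` with a `[[CC]YY]MMDDhhmm` timestamp (interpreted in the local timezone) and reads the job from standard input.

```python
import subprocess
from datetime import datetime


def at_arguments(local_start):
    """Build the argument list for `at -t`; `local_start` must be in local time."""
    return ["at", "-t", local_start.strftime("%Y%m%d%H%M")]


def schedule_with_at(command, local_start):
    """Submit `command` to at, which reads the job to run from standard input."""
    return subprocess.run(
        at_arguments(local_start),
        input=command + "\n",
        text=True,
        capture_output=True,  # keep at's own messages for the caller to inspect
        check=True,           # raise CalledProcessError if at rejects the job
    )


# Building the arguments does not require at to be installed.
example_args = at_arguments(datetime(2023, 5, 31, 12, 0))
```

With this approach cats would be free to print human-readable progress on stdout, since the start time would no longer have to be the only thing written there.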

tlestang · 2023-07-14T16:26:28Z

I'm closing this for now as we discussed it won't be merged as-is. But let's keep the branch around for reference.

Llannelongue added 30 commits May 27, 2023 15:22

Create cats class with __init__ for args, config file and location

bcc5d64

Ignore Pycharm files from git

f0e4313

created a class for the API interface

252abf6

centralised API query in a class, calling itself the API interface and making the choice of CI API a parameter

71ca36a

Removed default value for choice_CI_API

b0b6285

rename postcode into location and clean postcode

adde463

rename postcode into location and added CI API choice

199cb1c

rename remaining postcode into location

802e703

integrate API forecast into main code

223340b

Convert timestamps[str] to datetime in `CI_API_interface.parse_response_data`

7eef370

Parse forecast CIs as a list of CarbonIntensityPointEstimate objects

b9e1512

Find best start time using slightly modified estimation of total carbon footprint

be4fc0c

Changed CarbonIntensityPointEstimate into CarbonIntensityEstimate to be more versatile and be the only new object used

e88e030

Integrated the best starttime into __init__

14309c2

improved clarity

c458ca2

renamed internal functions with _

b3be9b8

Print out progress and results

999a5e1

Updated main to new structure

05799df

Fixed internal imports

dca32cd

Rename API scripts to highlight that these are CI APIs (in case other APIs are added later)

417df12

Added help to parser messages

670902c

Write out progress and warnings to stdout and stderr for clarity and UX

74eaa5a

Return averageCI for now

e61fdcd

Add carbon footprint estimation

76f99b4

delete files

d37da42

update import

7f9d809

move args cleaning to separate file

93127dc

validate API choices

0f67184

validate duration and location

19a996b

minor fix

d371ba5

Llannelongue requested review from tlestang, andreww, sadielbartholomew, abhidg, colinsauze, ljcolling and asw-v4 May 29, 2023 22:46

Llannelongue self-assigned this May 29, 2023

Llannelongue added the enhancement New feature or request label May 29, 2023

cleaned up imports

4dac3a0

This was linked to issues May 30, 2023

Refactor the code for more modularity + self-explanatory function names #26

Closed

Calculate corresponding carbon footprints #3

Closed

Llannelongue removed the enhancement New feature or request label May 30, 2023

tlestang reviewed May 30, 2023

cats/CI_api_interface.py

Llannelongue mentioned this pull request May 30, 2023

Providing users with the option to pass their own API wrappers for carbon intensity #44

Open

reverting to namedtuples for the API interface

8c927c7

tlestang reviewed May 30, 2023

cats/optimise_starttime.py (outdated)

Llannelongue added 2 commits May 31, 2023 00:52

Replacing list storting sorting by min and max

99c88fa

Reincluded trapezoidal integration, and fixed the handling of the first and last windows

1aca7ab

Llannelongue mentioned this pull request Jun 5, 2023

Compute 'now' average intensity estimate alongside optmised one #46

Closed

tlestang closed this Jul 14, 2023

tlestang mentioned this pull request Jul 23, 2023

Account for job start/end time not exactly matching forecast data points #54

Merged

colinsauze mentioned this pull request May 17, 2024

What (should) happen at 48 hrs? #32

Open
