Retain source lineage when multiplying deterministic uni-source BeliefsDataFrames #33

Open
Flix6x opened this issue Nov 3, 2020 · 2 comments
Labels
enhancement New feature or request

Comments

@Flix6x
Collaborator

Flix6x commented Nov 3, 2020

I'm proposing a new feature here (multiplying data from different sensors), and would like to discuss how to handle source lineage, that is, the ability to track the source of a data value.

Consider the case of multiplying power data (in MW) with price data (in $/MWh) to obtain cost data (in $/h).
The relevant operations for each of the columns / index levels in the BeliefsDataFrame should be:

  • event_value: multiply
  • event_start: outer join *
  • event_end: outer join *
  • belief_time: max **
  • belief_horizon: min **
  • source: ?
  • cumulative_probability: 0.5 (deterministic BeliefsDataFrames only for now)
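
To make the list above concrete, here is a minimal sketch using plain pandas rather than the actual BeliefsDataFrame class. The helper names (make_frame, multiply), the source labels and the example values are all made up for illustration. Both frames use the event_start and belief_time perspective only, and the resulting source is simply passed in as an argument, since how to choose it is the open question.

```python
import pandas as pd

INDEX_LEVELS = ["event_start", "belief_time", "source", "cumulative_probability"]


def make_frame(values, belief_time, source):
    """Build a toy deterministic, single-source frame (a pandas stand-in, not a real BeliefsDataFrame)."""
    n = len(values)
    event_starts = pd.date_range(
        "2020-11-03 00:00", periods=n, freq=pd.Timedelta(hours=1), tz="UTC"
    )
    index = pd.MultiIndex.from_arrays(
        [event_starts, [pd.Timestamp(belief_time, tz="UTC")] * n, [source] * n, [0.5] * n],
        names=INDEX_LEVELS,
    )
    return pd.DataFrame({"event_value": values}, index=index)


def multiply(left, right, new_source):
    """Multiply two toy frames following the per-level operations listed above."""
    # event_start: outer join
    merged = left.reset_index().merge(
        right.reset_index(), on="event_start", how="outer", suffixes=("_left", "_right")
    )
    # belief_time: take the later of the two (fall back to the one that exists)
    bt_left, bt_right = merged["belief_time_left"], merged["belief_time_right"]
    belief_time = bt_left.where(bt_left >= bt_right, bt_right).fillna(bt_left)
    result = pd.DataFrame(
        {
            "event_start": merged["event_start"],
            "belief_time": belief_time,
            "source": new_source,  # placeholder: the open question of this issue
            "cumulative_probability": 0.5,  # deterministic frames only
            "event_value": merged["event_value_left"] * merged["event_value_right"],
        }
    )
    return result.set_index(INDEX_LEVELS).sort_index()


power = make_frame([1.0, 2.0, 3.0], "2020-11-02 12:00", "power meter")  # MW
price = make_frame([10.0, 20.0, 30.0], "2020-11-02 18:00", "price forecaster")  # $/MWh
costs = multiply(power, price, "power meter * price forecaster")  # $/h
print(costs)
```

Note that the result keeps only the later belief_time, so the earlier one is lost regardless of what we decide for the source.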

How should we handle the source? Three options I can think of: ***

  1. Set the source to None, accepting the loss of source lineage (note that we already lose one of the two belief_times).
  2. Set the source to a list of BeliefSources, introducing the concept of multi-sourced values.
  3. Create a new BeliefSource (if it doesn't exist already). This leaves open the possibility of source lineage, provided the new source holds information about its component sources. It could later be modelled as an AggregatedBeliefSource that subclasses BeliefSource, which would also introduce the concept of multi-sourced values.

* In case one of the frames uses event_start and the other uses event_end, respect the index perspective of the first frame.
** In case one of the frames uses belief_time and the other uses belief_horizon, respect the index perspective of the first frame.
*** There is an analogy here to the issue of how we would handle the sensor attribute of the resulting BeliefsDataFrame: 1. None, 2. a list of Sensors or 3. a new AggregatedSensor.
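
As a rough illustration of option 3, here is one way an AggregatedBeliefSource could keep track of its component sources. This is only a sketch: the BeliefSource class below is a toy stand-in with just a name (not the class from this package), and the components field, the lineage method and the combine_sources helper are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BeliefSource:
    """Toy stand-in for a source; it only carries a name."""

    name: str


@dataclass(frozen=True)
class AggregatedBeliefSource(BeliefSource):
    """Hypothetical subclass that remembers the sources it was derived from."""

    components: Tuple[BeliefSource, ...] = ()

    def lineage(self):
        """Return the leaf sources, flattening nested aggregated sources."""
        leaves = []
        for component in self.components:
            if isinstance(component, AggregatedBeliefSource):
                leaves.extend(component.lineage())
            else:
                leaves.append(component)
        return leaves


def combine_sources(*sources):
    """Create the aggregated source for a product of values from the given sources."""
    name = " * ".join(source.name for source in sources)
    return AggregatedBeliefSource(name=name, components=tuple(sources))


meter = BeliefSource("power meter")
forecaster = BeliefSource("price forecaster")
costs_source = combine_sources(meter, forecaster)
print(costs_source.name)
print(costs_source.lineage())
```

Because the dataclasses are frozen, two aggregated sources built from the same components compare equal and hash alike, which is one possible way to cover the "if it doesn't exist already" part. The same pattern would apply to an AggregatedSensor.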

@Flix6x Flix6x added the enhancement New feature or request label Nov 3, 2020
@nhoening
Contributor

nhoening commented Nov 3, 2020

I'd say we should consider how this multiplication feature will be used most of the time. Is the result going to be persisted, or is it only used temporarily? A source might be interesting to have in the former case, but not in the latter. And even in the former case, it's easy to add a source after the multiplication.

I believe in our current usage, we are doing the latter. We might use the former for performance reasons (caching multiplications), but we're not exactly sure yet.

I find the idea of multi-source values interesting, but I would separate them from this feature for now. I would make a comment about the information being lost (also the belief_horizon bit).

@Flix6x
Collaborator Author

Flix6x commented Nov 3, 2020

Separating this issue from the multiplication feature doesn't necessarily make things easier.

If we set the resulting source to None, then the resulting frame will not be a valid BeliefsDataFrame anymore (several properties of our subclass would fail), so the result would have to be a pandas DataFrame. That also means we lose the slicing and plotting methods of our subclass.


2 participants