# 07: How do the pre-system objects *actually* work?

**Author:** Benedikt Goodman \
**Assistants:** Mistral Large, ChatGPT-4, ChatGPT-4o


The pre-system is the part of the national accounts that keeps track of the relationships between all the time series we use to construct so-called indicators. As you may know, indicators are estimates of figures we do not have yet, and they are used by, for example, the KNR model to produce a picture of the state of the Norwegian economy.

In this part of the course we take a closer look at how the pre-system is built, and how object-oriented programming is used to create the building blocks of the system. We won't study every single line of code, since there are several thousand of them, but we will aim to understand the system a bit better than just knowing how to instantiate the objects and call `.evaluate()`.

## Recap: The pre-system at a conceptual level

*From Magnus's documentation:*

The **Indicator** subclass defines an indicator object that covers most indicators in the national accounts,

$$
  x_t = x_T\cdot\frac{k_t\sum_i w_{i,T} I_{i,t}}{\sum_{s\in T}k_s\sum_i w_{i,T} I_{i,s}},
$$

where $x$ is the national accounts variable in question, $w$ are weights, $I$ are indicators, and $T$ denotes the base year.
$k$ is a correction factor that equals one unless the user wants to apply a correction.

The **FDeflate** subclass takes an existing formula (for example an **Indicator** instance) as its starting point and deflates it,

$$
  \sum_{s\in T}x_s\cdot\frac{k_t x_t/\sum_i w_{i,T} I_{i,t}}{\sum_{s\in T}k_s x_s/\sum_i w_{i,T} I_{i,s}}.
$$

The **FInflate** subclass takes an existing formula (for example an **Indicator** instance) as its starting point and inflates it,

$$
  \sum_{s\in T}x_s\cdot\frac{k_t x_t\sum_i w_{i,T} I_{i,t}}{\sum_{s\in T}k_s x_s\sum_i w_{i,T} I_{i,s}}.
$$

**FSum** sums other **Formula** instances, **FSumProd** computes a sum-product, **FMult** multiplies two instances, and **FDiv** divides one instance by another.

All subclasses have the **what** property and the **evaluate** method. `formula.what` returns a textual representation of the formula's definition. This lets the user trace the formula back to one or more **Indicator** instances (every formula must ultimately end in **Indicator** instances). `formula.evaluate(annual_df, indicators_df, weights_df, correction_df)` returns a **Pandas** series: the formula evaluated on the given data.

## The system's programmatic logic

The system has a base class that all other classes in the system inherit from, called `Formula`. `Indicator` uses (and overrides) methods and attributes from `Formula`. The classes named `F...`, such as `FDeflate` or `FDiv`, take `Indicator` instances (or other formulas) and build instructions for how the data should be processed, depending on the type of object. For example, `FSum` creates a new formula that is simply the sum of the `Indicator` objects you give it.

"OK, cool. But what is a base class?" you might be thinking now. I had some help with the explanation that follows.

## Base classes, the fundamental building block of most object-oriented libraries

In object-oriented programming (OOP), a base class (also known as a superclass or parent class) is a class that provides common attributes and methods that other classes (called derived classes, subclasses, or child classes) can inherit and use.

Key Concepts of a Base Class

**Inheritance:**
- The base class defines general characteristics and behaviors that can be shared by multiple subclasses.
- Subclasses inherit these characteristics and behaviors, meaning they automatically have the attributes and methods defined in the base class.

**Reuse:**
- By defining common functionality in a base class, you can avoid repeating code in multiple subclasses. This makes your code more modular and easier to maintain.

**Extensibility:**
- Subclasses can add their own unique attributes and methods, or override the inherited methods to provide specific behaviors.

### A brief example of using a base class and inheritance to make a class hierarchy

```python
class Vehicle:
    """
    The base class representing a generic vehicle.
    It provides common attributes and methods that all vehicles have.
    """
    def __init__(self, make, model):
        # Initialize the make and model of the vehicle
        self.make = make
        self.model = model

    def start_engine(self):
        # Method to start the vehicle's engine
        print("Engine started")


class Car(Vehicle):
    """
    A subclass of Vehicle representing a car.
    Inherits common vehicle attributes and methods, and adds specific attributes and methods for cars.
    """
    def __init__(self, make, model, num_doors):
        # Initialize the make, model, and number of doors of the car
        super().__init__(make, model)
        self.num_doors = num_doors

    def honk_horn(self):
        # Method to honk the car's horn
        print("Beep beep!")


class Racecar(Car):
    """
    A subclass of Car representing a racecar.
    Inherits common car attributes and methods, adds specific attributes and methods for racecars,
    and overrides some methods.
    """
    def __init__(self, make, model, num_doors, team):
        # Initialize the make, model, number of doors, and team of the racecar
        super().__init__(make, model, num_doors)
        self.team = team

    # Override method from parent class to provide specific behavior for racecar
    def honk_horn(self):
        print('Meep meep!')

    def crash_and_burn(self):
        # Method specific to racecars to simulate a crash
        print(f'Oh dear, the racecar of team {self.team} crashed and burnt. The driver made it out, but is now scarred for life. Sad.')


class Bus(Vehicle):
    """
    A subclass of Vehicle representing a bus.
    Inherits common vehicle attributes and methods, and adds specific attributes and methods for buses.
    """
    def __init__(self, make, model, num_passengers):
        # Initialize the make, model, and number of passengers of the bus
        super().__init__(make, model)
        self.num_passengers = num_passengers

    def announce_stops(self):
        # Method to announce the next stop
        print("Next stop: ...")
```
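To see inheritance at work, the sketch below instantiates the hierarchy and checks where each attribute and method comes from. It uses a condensed copy of the classes above in which the methods *return* their messages instead of printing them, so the results are easy to inspect; the makes, models and team name are made-up example values.

```python
class Vehicle:
    def __init__(self, make, model):
        self.make = make
        self.model = model

    def start_engine(self):
        return "Engine started"


class Car(Vehicle):
    def __init__(self, make, model, num_doors):
        super().__init__(make, model)  # let Vehicle set make and model
        self.num_doors = num_doors

    def honk_horn(self):
        return "Beep beep!"


class Racecar(Car):
    def __init__(self, make, model, num_doors, team):
        super().__init__(make, model, num_doors)  # Car, which in turn calls Vehicle
        self.team = team

    def honk_horn(self):  # overrides Car.honk_horn
        return "Meep meep!"


class Bus(Vehicle):
    def __init__(self, make, model, num_passengers):
        super().__init__(make, model)
        self.num_passengers = num_passengers


car = Car("Volvo", "240", num_doors=5)
racer = Racecar("Ferrari", "F1", num_doors=0, team="Red")

# Attributes set at every level of the super().__init__ chain are present
print(racer.make, racer.model, racer.num_doors, racer.team)
# start_engine is not defined on Racecar, so Python finds it on Vehicle
print(racer.start_engine())
# honk_horn is looked up on Racecar first, so the override wins
print(car.honk_horn(), racer.honk_horn())
# The method resolution order: where Python looks for attributes, in order
print([cls.__name__ for cls in Racecar.__mro__])  # ['Racecar', 'Car', 'Vehicle', 'object']
# A Racecar is also a Vehicle, but a Bus is not a Car
print(isinstance(racer, Vehicle), isinstance(Bus("MAN", "Lion", 80), Car))
```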


### A UML (Unified Modeling Language) diagram of the class hierarchy

In programming it is sometimes useful to draw diagrams that show how classes, methods and attributes are related and shared within a hierarchy. One way of doing this is a Unified Modeling Language (UML) diagram. Here I enlisted ChatGPT to create an ASCII version of the classes above.
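Python can also report a class hierarchy by itself. The sketch below uses empty stand-ins with the same inheritance as the Vehicle example and the built-in `type.__subclasses__()` method, which lists a class's direct subclasses; `class_tree` is a hypothetical helper written for this illustration.

```python
# Empty stand-ins with the same inheritance as the Vehicle example
class Vehicle: ...
class Car(Vehicle): ...
class Racecar(Car): ...
class Bus(Vehicle): ...


def class_tree(cls: type, depth: int = 0) -> list[str]:
    """Return cls and all of its subclasses as indented lines (depth-first)."""
    lines = ["    " * depth + cls.__name__]
    for sub in cls.__subclasses__():  # direct subclasses only
        lines.extend(class_tree(sub, depth + 1))
    return lines


print("\n".join(class_tree(Vehicle)))
```

The same helper could be pointed at any base class to list what inherits from it.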

## The `Formula` class as a base class in a hierarchy

What is the Formula Class?

The `Formula` class is like a blueprint for creating objects that represent mathematical formulas. It provides a basic structure and functionality that other more specific types of formulas (like Indicator, FDeflate, etc.) can build on.


### An overview of the class

**Attributes (Data)**
- `_baseyear`: Stores the base year for calculations. It can be an integer or None.
- `_name`: Stores the name of the formula in lowercase.
- `_calls_on`: Keeps track of other formulas that this formula depends on.

**Methods (Actions)** \
`__init__`: This is the initializer method. When you create a new Formula object, you provide a name, and it sets up the initial values.\
Properties:
- `name`: The name of the formula.
- `baseyear`: Gets or sets the base year.
- `what`: Intended to describe what the formula does. It returns an empty string here but is meant to be overridden by subclasses.
- `calls_on`: Returns the dictionary of dependent formulas.
- `indicators` and `weights`: Intended to return lists related to the formula's indicators and weights. These are empty here but can be overridden.

**Other Methods:**
- `__repr__`: Provides a string representation of the formula object.
- `__call__`: Allows the object to be used like a function to evaluate the formula.
- `__add__`, `__mul__`, `__truediv__`: Define how formula objects can be combined using the +, * and / operators. *I.e. they enable Python to do math with these objects.*
- `info`: Prints details about the formula and its dependencies.
- `indicators_weights`: Placeholder method to return indicators and weights.
- `evaluate`: Evaluates the formula using provided data. This method is a placeholder and is intended to be overridden by subclasses.
- `_check_df`: A helper method to validate data frames used in evaluations.
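How `what`, `evaluate`, `__call__` and the operator methods fit together can be illustrated with a toy hierarchy. This is a minimal sketch, not the pre-system's code: `ToyFormula`, `ToyLeaf` and `ToyBinOp` are hypothetical names, the data is a plain dict instead of DataFrames, and we do not know from the overview which concrete classes the real `__add__`, `__mul__` and `__truediv__` return.

```python
import operator


class ToyFormula:
    """Hypothetical, simplified stand-in for a formula base class."""

    def __init__(self, name: str) -> None:
        self._name = name.lower()

    @property
    def name(self) -> str:
        return self._name

    @property
    def what(self) -> str:
        return ""  # placeholder, meant to be overridden by subclasses

    def evaluate(self, data: dict[str, float]) -> float:
        raise NotImplementedError("subclasses must implement evaluate")

    def __call__(self, data: dict[str, float]) -> float:
        # formula(data) is shorthand for formula.evaluate(data)
        return self.evaluate(data)

    # Operator overloading: each operator returns a new, combined formula object
    def __add__(self, other: "ToyFormula") -> "ToyBinOp":
        return ToyBinOp(self, other, "+")

    def __mul__(self, other: "ToyFormula") -> "ToyBinOp":
        return ToyBinOp(self, other, "*")

    def __truediv__(self, other: "ToyFormula") -> "ToyBinOp":
        return ToyBinOp(self, other, "/")

    def __repr__(self) -> str:
        return f"Formula: {self.name} = {self.what}"


class ToyLeaf(ToyFormula):
    """Leaf node: evaluates to the value stored under its own name."""

    @property
    def what(self) -> str:
        return self.name

    def evaluate(self, data: dict[str, float]) -> float:
        return data[self.name]


class ToyBinOp(ToyFormula):
    """Node that combines two formulas with +, * or /."""

    _OPS = {"+": operator.add, "*": operator.mul, "/": operator.truediv}

    def __init__(self, left: ToyFormula, right: ToyFormula, op: str) -> None:
        super().__init__(f"{left.name}{op}{right.name}")
        self._left, self._right, self._op = left, right, op

    @property
    def what(self) -> str:
        return f"({self._left.what}{self._op}{self._right.what})"

    def evaluate(self, data: dict[str, float]) -> float:
        # Evaluate both children recursively, then apply the operator
        return self._OPS[self._op](self._left(data), self._right(data))


x0, x1, p0 = ToyLeaf("X0"), ToyLeaf("x1"), ToyLeaf("p0")
ratio = (x0 + x1) / p0
print(ratio.what)                                # ((x0+x1)/p0)
print(ratio({"x0": 2.0, "x1": 4.0, "p0": 3.0}))  # 2.0
```

Because every node is itself a formula, the combined object can be combined again, and `what` builds its description recursively from the pieces.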


## A UML model of the entire hierarchy

Below is a model of all the objects related to `Formula` (i.e. that inherit from it), with their methods, attributes and properties. It's a lot to take in. However, notice how many of the same properties and methods occur across all the objects. This illustrates how `Formula` is a blueprint for the other classes.

**NB!: Most classes in this hierarchy take in `Indicator` objects and transform them. So, while `Formula` is the base class that works as the fundamental building block of the library, the `Indicator` child class is what users will use to build the pre-system.**

%%html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>UML Diagram</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            background-color: #2c2c2c;
            color: #ffffff;
            display: flex;
            flex-direction: column;
            align-items: center;
        }
        .class-box {
            border: 2px solid #ffcc00;
            border-radius: 8px;
            padding: 10px;
            margin: 10px;
            background-color: #404040;
            display: inline-block;
            text-align: left;
            width: 300px;
        }
        .class-name {
            font-weight: bold;
            font-size: 1.2em;
            color: #ffcc00;
            margin-bottom: 5px;
        }
        .attributes, .methods {
            margin: 5px 0;
        }
        .attributes-title, .methods-title {
            font-weight: bold;
            margin-top: 10px;
        }
        .methods {
            color: #cccccc;
        }
        .container {
            text-align: center;
        }
        .inheritance {
            display: flex;
            justify-content: center;
            background-color: #333333;
            padding: 10px;
            border-radius: 8px;
            margin-top: -10px;
        }
    </style>
</head>
<body>
    <div class="container">
        <div class="class-box">
            <div class="class-name">Formula</div>
            <div class="attributes-title">Attributes</div>
            <div class="attributes">
                - _baseyear: int | None<br>
                - _name: str<br>
                - _calls_on: dict[str, Formula]
            </div>
            <div class="methods-title">Methods</div>
            <div class="methods">
                + __init__(name: str)<br>
                + name: str<br>
                + baseyear: int | None<br>
                + baseyear(baseyear: int)<br>
                + what: str<br>
                + calls_on: dict[str, Formula]<br>
                + indicators: list[str]<br>
                + weights: list[str] | list[float]<br>
                + __repr__()<br>
                + __call__(annual_df, ...)<br>
                + __add__(other: Formula)<br>
                + __mul__(other: Formula)<br>
                + __truediv__(other: Formula)<br>
                + info(i: int)<br>
                + indicators_weights(trace: bool)<br>
                + evaluate(annual_df, ...)<br>
                + _check_df(df_name, df, baseyear, frequency)
            </div>
        </div>
        <div class="inheritance">
            <div class="class-box">
                <div class="class-name">Indicator</div>
                <div class="attributes-title">Attributes</div>
                <div class="attributes">
                    - _annual: str<br>
                    - _indicators: list[str]<br>
                    - _weights: list[str]<br>
                    - _correction: str<br>
                    - _normalise: bool<br>
                    - _aggregation: str
                </div>
                <div class="methods-title">Methods</div>
                <div class="methods">
                    + __init__(name, annual, indicators, weights, correction, normalise, aggregation)<br>
                    + indicators: list[str]<br>
                    + weights: list[str]<br>
                    + what: str<br>
                    + indicators_weights(trace: bool)<br>
                    + evaluate(annual_df, indicators_df, weights_df, correction_df, test_dfs)
                </div>
            </div>
            <div class="class-box">
                <div class="class-name">FDeflate</div>
                <div class="attributes-title">Attributes</div>
                <div class="attributes">
                    - _formula: Formula<br>
                    - _indicators: list[str]<br>
                    - _weights: list[str]<br>
                    - _correction: str<br>
                    - _normalise: bool
                </div>
                <div class="methods-title">Methods</div>
                <div class="methods">
                    + __init__(name, formula, indicators, weights, correction, normalise)<br>
                    + indicators: list[str]<br>
                    + weights: list[str]<br>
                    + what: str<br>
                    + indicators_weights(trace: bool)<br>
                    + evaluate(annual_df, indicators_df, weights_df, correction_df, test_dfs)
                </div>
            </div>
            <div class="class-box">
                <div class="class-name">FInflate</div>
                <div class="attributes-title">Attributes</div>
                <div class="attributes">
                    - _formula: Formula<br>
                    - _indicators: list[str]<br>
                    - _weights: list[str]<br>
                    - _correction: str<br>
                    - _normalise: bool
                </div>
                <div class="methods-title">Methods</div>
                <div class="methods">
                    + __init__(name, formula, indicators, weights, correction, normalise)<br>
                    + indicators: list[str]<br>
                    + weights: list[str]<br>
                    + what: str<br>
                    + indicators_weights(trace: bool)<br>
                    + evaluate(annual_df, indicators_df, weights_df, correction_df, test_dfs)
                </div>
            </div>
        </div>
        <div class="inheritance">
            <div class="class-box">
                <div class="class-name">FSum</div>
                <div class="attributes-title">Attributes</div>
                <div class="attributes">
                    - _formulae: list[Formula]
                </div>
                <div class="methods-title">Methods</div>
                <div class="methods">
                    + __init__(name, *formulae)<br>
                    + what: str<br>
                    + indicators_weights(trace: bool)<br>
                    + evaluate(annual_df, indicators_df, weights_df, correction_df, test_dfs)
                </div>
            </div>
            <div class="class-box">
                <div class="class-name">FSumProd</div>
                <div class="attributes-title">Attributes</div>
                <div class="attributes">
                    - _formulae: list[Formula]<br>
                    - _weights: list[float]
                </div>
                <div class="methods-title">Methods</div>
                <div class="methods">
                    + __init__(name, formulae, weights)<br>
                    + what: str<br>
                    + indicators_weights(trace: bool)<br>
                    + evaluate(annual_df, indicators_df, weights_df, correction_df, test_dfs)
                </div>
            </div>
            <div class="class-box">
                <div class="class-name">FMult</div>
                <div class="attributes-title">Attributes</div>
                <div class="attributes">
                    - _formula1: Formula<br>
                    - _formula2: Formula
                </div>
                <div class="methods-title">Methods</div>
                <div class="methods">
                    + __init__(name, formula1, formula2)<br>
                    + what: str<br>
                    + indicators_weights(trace: bool)<br>
                    + evaluate(annual_df, indicators_df, weights_df, correction_df, test_dfs)
                </div>
            </div>
            <div class="class-box">
                <div class="class-name">FDiv</div>
                <div class="attributes-title">Attributes</div>
                <div class="attributes">
                    - _formula1: Formula<br>
                    - _formula2: Formula
                </div>
                <div class="methods-title">Methods</div>
                <div class="methods">
                    + __init__(name, formula1, formula2)<br>
                    + what: str<br>
                    + indicators_weights(trace: bool)<br>
                    + evaluate(annual_df, indicators_df, weights_df, correction_df, test_dfs)
                </div>
            </div>
        </div>
        <div class="inheritance">
            <div class="class-box">
                <div class="class-name">AddCorr</div>
                <div class="attributes-title">Attributes</div>
                <div class="attributes">
                    - _formula: Formula<br>
                    - _correction_name: str
                </div>
                <div class="methods-title">Methods</div>
                <div class="methods">
                    + __init__(formula, correction_name)<br>
                    + baseyear: int | None<br>
                    + baseyear(baseyear: int)<br>
                    + what: str<br>
                    + indicators_weights(trace: bool)<br>
                    + evaluate(annual_df, indicators_df, weights_df, correction_df, test_dfs)
                </div>
            </div>
            <div class="class-box">
                <div class="class-name">MultCorr</div>
                <div class="attributes-title">Attributes</div>
                <div class="attributes">
                    - _formula: Formula<br>
                    - _correction_name: str
                </div>
                <div class="methods-title">Methods</div>
                <div class="methods">
                    + __init__(formula, correction_name)<br>
                    + baseyear: int | None<br>
                    + baseyear(baseyear: int)<br>
                    + what: str<br>
                    + indicators_weights(trace: bool)<br>
                    + evaluate(annual_df, indicators_df, weights_df, correction_df, test_dfs)
                </div>
            </div>
            <div class="class-box">
                <div class="class-name">FJoin</div>
                <div class="attributes-title">Attributes</div>
                <div class="attributes">
                    - _formula1: Formula<br>
                    - _formula0: Formula<br>
                    - _from_year: int
                </div>
                <div class="methods-title">Methods</div>
                <div class="methods">
                    + __init__(name, formula1, formula0, from_year)<br>
                    + indicators: list[str]<br>
                    + what: str<br>
                    + indicators_weights(trace: bool)<br>
                    + evaluate(annual_df, indicators_df, weights_df, correction_df, test_dfs)
                </div>
            </div>
        </div>
    </div>
</body>
</html>


## A closer look at what happens when we instantiate the `Indicator` class



```python
from pre_system.formula import Indicator

# Let's make an indicator object called xa1
xa1 = Indicator(
    name="xa1", 
    annual="xa", 
    indicators=["x0", "x1", "x2"], 
    weights=[0.4, 0.3, 0.3]
)
```




Below are snippets of the source code that runs when we instantiate the `Indicator` object. Remember that `Indicator` inherits from `Formula`, so all methods and properties carry over from `Formula` to `Indicator` and behave the same unless they are overridden.

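```python
# Needed so annotations such as dict[str, Formula] can refer to a class that is still being defined
from __future__ import annotations

# The parent class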
class Formula:
    _baseyear: int | None = None
    
    # The baseclass only really has a name of the formula.
    def __init__(self, name: str) -> None:
        """Initialize a Formula instance.

        Parameters
        ----------
        name : str
            The name of the formula.

        Raises:
        ------
        TypeError
            If `name` is not a string.
        """
        if not isinstance(name, str):
            raise TypeError("name must be str")
        self._name = name.lower()
        self._baseyear: int | None = None
        self._calls_on: dict[str, Formula] = {}
    
    # These allow users to access the properties via writing formula_instance.name or .baseyear
    @property
    def name(self) -> str:
        return self._name

    @property
    def baseyear(self) -> int | None:
        return self._baseyear

    @baseyear.setter
    def baseyear(self, baseyear: int) -> None:
        if not isinstance(baseyear, int):
            raise TypeError("baseyear must be int")
        self._baseyear = baseyear

    @property
    def what(self) -> str:
        return ""

    @property
    def calls_on(self) -> dict[str, Formula]:
        return self._calls_on

    @property
    def indicators(self) -> list[str]:
        return []

    @property
    def weights(self) -> list[str] | list[float]:
        return []

    def __repr__(self) -> str:
        return f"Formula: {self.name} = {self.what}"


# Here we see that indicator inherits from Formula
class Indicator(Formula):
    

    def __init__(
        self,
        # Mandatory positional arguments
        name: str,
        annual: str,
        indicators: list[str],
        # Optional default arguments. Takes default values unless defined otherwise
        weights: list[str] | list[float] | None = None,
        correction: str | None = None,
        normalise: bool = False,
        aggregation: str = "sum",
    ) -> None:
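        """Initialize an Indicator object.

        Parameters
        ----------
        name : str
            The name of the indicator.
        annual : str
            The name of the annual series.
        indicators : list[str]
            The list of indicator names.
        weights : list[str] | list[float], optional
            Weight names or numeric weights, by default None (equal weights).
        correction : str, optional
            The name of the correction series, by default None.
        normalise : bool, optional
            Whether to divide each indicator by its base-year sum, by default False.
        aggregation : str, optional
            How the base year is aggregated, "sum" or "avg", by default "sum".

        Raises
        ------
        TypeError
            If `annual`, `indicators` or `weights` have the wrong type.
        IndexError
            If `weights` is provided and has a different length than `indicators`.
        NameError
            If `aggregation` is not "sum" or "avg".
        """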
        super().__init__(name)
        # Input validation
        if not isinstance(annual, str):
            raise TypeError("annual must be str")
        if not isinstance(indicators, list):
            raise TypeError("indicator_names must be a list")
        if not all(isinstance(x, str) for x in indicators):
            raise TypeError("indicator_names must containt str")
        if weights and len(weights) != len(indicators):
            raise IndexError("weight_names must have same length as indicator_names")
        if weights and not all(isinstance(x, type(weights[0])) for x in weights):
            raise TypeError("all weights must be of same type")
        if aggregation.lower() not in ["sum", "avg"]:
            raise NameError("aggregation must be sum or avg")
            
        # Storage of arguments as attributes inside the object, note how they are all private
        self._annual = annual
        self._indicators = [x.strip() for x in indicators]
        self._weights = [] if weights is None else weights
        self._correction = correction
        self._normalise = normalise
        self._aggregation = aggregation.lower()
    
    # The below defines behaviour of the stored attributes. Some just return the attribute,
    # others, like the what property change the output according to rules before being displayed
    @property
    def indicators(self) -> list[str]:
        return self._indicators

    @property
    def weights(self) -> list[str] | list[float]:
        if self._weights:
            return self._weights
        return [1.0 for _ in self.indicators]

    @property
    def what(self) -> str:
        correction = f"{self._correction}*" if self._correction else ""

        if self._normalise:
            indicators = [
                f"{x}/sum({x}<date {self.baseyear}>)" for x in self._indicators
            ]
        else:
            indicators = self._indicators

        if self._weights:
            aggregated_indicators = "+".join(
                [
                    "*".join([str(x).lower(), y.lower()])
                    for x, y in zip(self._weights, indicators)
                ]
            )
        else:
            aggregated_indicators = "+".join([x.lower() for x in indicators])

        numerator = f"{correction}({aggregated_indicators})"
        denominator = f"{self._aggregation}({numerator}<date {self.baseyear}>)"
        fraction = f"{numerator}/{denominator}"

        return f"{self._annual.lower()}*<date {self.baseyear}>*{fraction}"
    
    # ....
    
```

The TL;DR version of what happens behind the scenes when we instantiate the `Indicator` class:

- You create an `Indicator` object with specific arguments.
- `Indicator.__init__` first calls `super().__init__(name)`, which lets `Formula` store the name in lowercase, set `baseyear` to `None` and set `calls_on` to an empty dict.
- It then validates the input, raising a `TypeError`, `IndexError` or `NameError` if something is wrong.
- Finally, it stores the indicators, weights, correction and other parameters as private attributes.
- Your `Indicator` object (`xa1`) is now ready to use.
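A cut-down sketch of that sequence, assuming only the behaviour visible in the source snippet above. `MiniFormula` and `MiniIndicator` are hypothetical names; they keep just the lowercase name, the `baseyear` property and setter, one validation check and the default equal weights.

```python
from __future__ import annotations


class MiniFormula:
    """Hypothetical cut-down Formula: name handling and the baseyear property."""

    def __init__(self, name: str) -> None:
        if not isinstance(name, str):
            raise TypeError("name must be str")
        self._name = name.lower()  # step 1: the parent stores a lowercase name
        self._baseyear: int | None = None

    @property
    def name(self) -> str:
        return self._name

    @property
    def baseyear(self) -> int | None:
        return self._baseyear

    @baseyear.setter
    def baseyear(self, baseyear: int) -> None:
        # The setter guards the private attribute against bad values
        if not isinstance(baseyear, int):
            raise TypeError("baseyear must be int")
        self._baseyear = baseyear


class MiniIndicator(MiniFormula):
    """Hypothetical cut-down Indicator."""

    def __init__(self, name, annual, indicators, weights=None):
        super().__init__(name)  # step 1: parent set-up
        if weights and len(weights) != len(indicators):  # step 2: validation
            raise IndexError("weights must have same length as indicators")
        self._annual = annual  # step 3: storage as private attributes
        self._indicators = [x.strip() for x in indicators]
        self._weights = [] if weights is None else weights

    @property
    def weights(self):
        # Same fallback as Indicator.weights: equal weights of 1.0
        return self._weights or [1.0 for _ in self._indicators]


xa1 = MiniIndicator("XA1", "xa", ["x0", " x1 ", "x2"])
print(xa1.name, xa1.weights)  # xa1 [1.0, 1.0, 1.0]

# Both of these are rejected before anything invalid is stored
for bad in (lambda: MiniIndicator("xa1", "xa", ["x0", "x1"], [0.5]),
            lambda: setattr(xa1, "baseyear", "2010")):
    try:
        bad()
    except (IndexError, TypeError) as err:
        print(type(err).__name__, err)
```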


Here is the generalised formula for the indicators we use in the pre-system:

$$
  x_t = x_T\cdot\frac{k_t\sum_i w_{i,T} I_{i,t}}{\sum_{s\in T}k_s\sum_i w_{i,T} I_{i,s}},
$$

Notice that this is essentially what we get when we access the `.what` property of an indicator. Instantiating an indicator object essentially just fills in the fields of this equation.
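To make the link between the equation and the string explicit, here is a hypothetical `what_string` function that copies the string-building steps of `Indicator.what` from the source snippet above, leaving out the `normalise` branch. `annual` plays the role of $x_T$, `weights` are $w_{i,T}$, `indicators` are $I_{i,t}$, `correction` is $k_t$, and `<date {baseyear}>` marks the restriction to the base year $T$.

```python
def what_string(annual, indicators, weights=None, correction=None,
                baseyear=None, aggregation="sum"):
    """Hypothetical re-implementation of Indicator.what (no normalisation)."""
    corr = f"{correction}*" if correction else ""  # k_t, omitted when k = 1
    if weights:
        # sum_i w_i * I_i written out term by term
        terms = "+".join(f"{str(w).lower()}*{i.lower()}"
                         for w, i in zip(weights, indicators))
    else:
        terms = "+".join(i.lower() for i in indicators)
    numerator = f"{corr}({terms})"
    denominator = f"{aggregation}({numerator}<date {baseyear}>)"
    return f"{annual.lower()}*<date {baseyear}>*{numerator}/{denominator}"


print(what_string("xa", ["x0", "x1", "x2"], [0.4, 0.3, 0.3], baseyear=2010))
# xa*<date 2010>*(0.4*x0+0.3*x1+0.3*x2)/sum((0.4*x0+0.3*x1+0.3*x2)<date 2010>)
print(what_string("xa", ["x0"], correction="k1", baseyear=2010))
# xa*<date 2010>*k1*(x0)/sum(k1*(x0)<date 2010>)
```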

```python
# Set the base year
xa1.baseyear = 2010

# Print the formula
xa1.info()

# Alternatively, get the formula as a string instead of printing it
xa1.what
```

```
xa1 = xa*<date 2010>*(0.4*x0+0.3*x1+0.3*x2)/sum((0.4*x0+0.3*x1+0.3*x2)<date 2010>)
```

```
'xa*<date 2010>*(0.4*x0+0.3*x1+0.3*x2)/sum((0.4*x0+0.3*x1+0.3*x2)<date 2010>)'
```

## The `.evaluate()` method

To use the `.evaluate()` method we first need some annual data and some indicator data. The method **calculates the series we want to estimate from the indicator series we gave the `Indicator` object.**

In other words, it calculates $x_t$.
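One consequence of the formula is worth spelling out: with the default `aggregation="sum"`, the estimated sub-annual values in the base year add up to the annual figure. Summing $x_t$ over the periods $t\in T$ makes the numerator identical to the denominator:

$$
  \sum_{t\in T} x_t = x_T\cdot\frac{\sum_{t\in T}k_t\sum_i w_{i,T} I_{i,t}}{\sum_{s\in T}k_s\sum_i w_{i,T} I_{i,s}} = x_T.
$$

Outside the base year, $x_t$ simply follows the movement of the weighted and corrected indicators.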

Here is an example that shows what is going on.

```python
import pandas as pd
import numpy as np

years = 14

# Annual data
annual_df = pd.DataFrame(
    np.exp(0.02 + np.random.normal(0, 0.01, (years, 10)).cumsum(axis=0)),
    columns=[f"x{i}" for i in "abcdefghij"],
    index=pd.period_range(start="2010", periods=years, freq="Y"),
)

annual_df.head()
```

| | xa | xb | xc | xd | xe | xf | xg | xh | xi | xj |
|---|---|---|---|---|---|---|---|---|---|---|
| 2010 | 1.01957 | 1.03076 | 1.020887 | 1.025334 | 1.028908 | 1.026316 | 1.013679 | 1.025766 | 1.026009 | 1.020325 |
| 2011 | 1.023118 | 1.034992 | 1.024469 | 1.012426 | 1.028389 | 1.018402 | 1.005889 | 1.034341 | 1.026612 | 1.023208 |
| 2012 | 1.037522 | 1.043341 | 1.033177 | 1.021352 | 1.032537 | 1.008865 | 0.992037 | 1.043256 | 1.022171 | 1.022008 |
| 2013 | 1.028696 | 1.041391 | 1.037473 | 1.023224 | 1.029041 | 1.014295 | 1.00534 | 1.025206 | 1.000156 | 1.0147 |
| 2014 | 1.030056 | 1.036476 | 1.034451 | 1.009878 | 1.050405 | 0.997726 | 1.022474 | 1.044302 | 1.010709 | 1.040385 |


```python
# Indicator data
indicator_df = pd.DataFrame(
    np.exp(0.02 + np.random.normal(0, 0.01, (years * 12, 10)).cumsum(axis=0)),
    columns=[f"x{i}" for i in range(5)] + [f"p{i}" for i in range(5)],
    index=pd.period_range(start="2010-01", periods=years * 12, freq="M"),
)

indicator_df.head()
```

| | x0 | x1 | x2 | x3 | x4 | p0 | p1 | p2 | p3 | p4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2010-01 | 1.025718 | 1.014746 | 1.026154 | 1.012834 | 1.00643 | 1.033913 | 1.012689 | 1.009228 | 1.012362 | 1.031728 |
| 2010-02 | 1.02028 | 1.012721 | 1.038317 | 1.024799 | 1.014637 | 1.029176 | 1.021844 | 1.002349 | 1.020483 | 1.045927 |
| 2010-03 | 1.025104 | 1.003164 | 1.029349 | 1.02095 | 1.012526 | 1.027394 | 1.033959 | 0.99082 | 1.027709 | 1.048834 |
| 2010-04 | 1.027958 | 1.0022 | 1.02161 | 1.033243 | 1.019329 | 1.028511 | 1.029923 | 1.01413 | 1.046426 | 1.061091 |
| 2010-05 | 1.029246 | 0.991116 | 1.000632 | 1.03071 | 1.030005 | 1.025052 | 1.018093 | 1.016343 | 1.042335 | 1.050544 |


```python
# Here we make the xa1 series using the evaluate method
xa1.evaluate(annual_df, indicator_df)
```

```
2010-01    0.086426
2010-02    0.086499
2010-03    0.086192
2010-04    0.086068
2010-05    0.085299
             ...   
2023-08    0.087124
2023-09    0.088223
2023-10    0.088118
2023-11    0.088165
2023-12    0.088161
Freq: M, Length: 168, dtype: float64
```

## Proof

```python
# Define constants
baseyear = 2010
weights = [0.4, 0.3, 0.3]
indicators = ['x0', 'x1', 'x2']

# We don't apply any correction factor, so the default is 1
k = pd.Series(1, index=indicator_df.index)

# The value of xa in 2010
xa = annual_df.loc['2010', 'xa']

# Calculate the weighted sum of indicators for each period
weighted_indicators = (indicator_df[indicators] * weights).sum(axis=1)

# Calculate the numerator
numerator = k * weighted_indicators

# Calculate the denominator
denominator = numerator.loc[numerator.index.year == baseyear].sum()

# Calculate the result
result = xa * (numerator / denominator)

result
```

```
2010-01    0.086426
2010-02    0.086499
2010-03    0.086192
2010-04    0.086068
2010-05    0.085299
             ...   
2023-08    0.087124
2023-09    0.088223
2023-10    0.088118
2023-11    0.088165
2023-12    0.088161
Freq: M, Length: 168, dtype: float64
```

To check that our manual calculation and the `.evaluate()` method give approximately the same result, we use the `assert_series_equal` function from `pandas.testing`.

```python
import pandas.testing as pdt

try:
    pdt.assert_series_equal(result, xa1.evaluate(annual_df, indicator_df), atol=0.00001)
    print("The series are approximately equal.")
except AssertionError as e:
    print("The series are not approximately equal.")
    print(e)
```

```
The series are approximately equal.
```
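The manual calculation also makes two properties of the formula easy to check numerically. Below is a self-contained sketch with its own synthetic data (seeded, so it does not reproduce the numbers above); `indicator_estimate` is a hypothetical helper that mirrors the manual calculation in the proof, not a pre-system function. It confirms that the twelve base-year months add up to the annual value and that a constant correction factor $k$ cancels out.

```python
import numpy as np
import pandas as pd


def indicator_estimate(x_T, indicators_df, weights, baseyear, k=None):
    """Hypothetical helper: x_t = x_T * k_t*sum_i w_i I_it / sum_{s in T} k_s*sum_i w_i I_is."""
    if k is None:
        k = pd.Series(1.0, index=indicators_df.index)  # no correction
    numerator = k * (indicators_df * weights).sum(axis=1)
    denominator = numerator[numerator.index.year == baseyear].sum()
    return x_T * numerator / denominator


rng = np.random.default_rng(seed=1)
idx = pd.period_range(start="2010-01", periods=36, freq="M")
indicators_df = pd.DataFrame(
    np.exp(rng.normal(0, 0.01, (36, 3)).cumsum(axis=0)),
    columns=["x0", "x1", "x2"],
    index=idx,
)

est = indicator_estimate(1.02, indicators_df, [0.4, 0.3, 0.3], baseyear=2010)

# Benchmarking: the months of the base year add up to the annual value
print(round(est[est.index.year == 2010].sum(), 10))  # 1.02

# A constant correction factor appears in both numerator and denominator
scaled = indicator_estimate(1.02, indicators_df, [0.4, 0.3, 0.3], 2010,
                            k=pd.Series(2.0, index=idx))
print(np.allclose(est, scaled))  # True
```

A correction that varies over time, on the other hand, does change the profile of $x_t$, which is what makes $k$ useful.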


## A closer look at the source code behind the `Indicator.evaluate()` method


```python
from __future__ import annotations

import numpy as np
import pandas as pd


# Methods from the Formula class
class Formula:
    def __init__(self, *args, **kwargs) -> None:
        ...  # rest of class goes here
        
    # Method that checks that conditions are met for DataFrame to be valid input
    @staticmethod
    def _check_df(
        df_name: str, df: pd.DataFrame, baseyear: int, frequency: str | None = None
    ) -> None:
        if not isinstance(df, pd.DataFrame):
            raise TypeError(f"{df_name} must be a Pandas.DataFrame")
        if not isinstance(df.index, pd.PeriodIndex):
            raise AttributeError(f"{df_name}.index must be Pandas.PeriodIndex")
        if frequency and df.index.freq != frequency:
            raise AttributeError(f"{df_name} must have frequency {frequency}")
        if baseyear not in df.index.year:
            raise IndexError(f"baseyear {baseyear} is out of range for annual_df")
        if not all(np.issubdtype(df[x].dtype, np.number) for x in df.columns):  # type: ignore [arg-type]
            raise TypeError(f"All columns in {df_name} must be numeric")

    def evaluate(
        self,
        annual_df: pd.DataFrame,
        indicators_df: pd.DataFrame,
        weights_df: pd.DataFrame | None = None,
        correction_df: pd.DataFrame | None = None,
        test_dfs: bool = True,
    ) -> pd.Series:
        """Evaluate the formula using the provided data.

        This function is only used by subclasses to check preconditions. In this
        baseclass it returns a dummy pd.Series object which is not used.

        Args:
            annual_df: The annual data used for evaluation.
            indicators_df: The indicator data used for evaluation.
            weights_df: The weight data used for evaluation. Optional and defaults to None.
            correction_df: The correction data used for evaluation. Optional and defaults to None.
            test_dfs: Whether the input DataFrames should be validated.

        Returns:
            A dummy pd.Series object. The return value is only valid for subclasses.

        Raises:
            ValueError: If the base year is not set.
            TypeError: If an input is not a DataFrame or has non-numeric columns.
            AttributeError: If the index of any input DataFrame is not a Pandas
                PeriodIndex or if the frequency is incorrect.
            IndexError: If the base year is out of range for the provided data.
        """
        # Does input validation
        if self.baseyear is None:
            raise ValueError("baseyear is None")
        
        # More validation, essentially defines a set of default checks to run
        # through whenever the method is being used by its child classes
        if test_dfs:
            self._check_df("annual_df", annual_df, self.baseyear, "YE")
            self._check_df("indicators_df", indicators_df, self.baseyear)

            if weights_df is not None:
                self._check_df("weights_df", weights_df, self.baseyear, "YE")

            if not isinstance(indicators_df.index, pd.PeriodIndex):
                raise AttributeError("indicators_df.index must be Pandas.PeriodIndex")

            if correction_df is not None:
                self._check_df(
                    "correction_df",
                    correction_df,
                    self.baseyear,
                    indicators_df.index.freqstr,
                )
                
        # If all is well, return empty series
        return pd.Series()


class Indicator(Formula):
    def __init__(
        self,
        name: str,
        annual: str,
        indicators: list[str],
        weights: list[str] | list[float] | None = None,
        correction: str | None = None,
        normalise: bool = False,
        aggregation: str = "sum",
    ) -> None:
        """Initialize an Indicator object.

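        Parameters
        ----------
        name : str
            The name of the indicator.
        annual : str
            The name of the annual data column in annual_df.
        indicators : list[str]
            The list of indicator column names in indicators_df.
        weights : list[str] | list[float], optional
            Weight column names in weights_df, or fixed numeric weights, by default None.
        correction : str, optional
            The name of the correction column in correction_df, by default None.
        normalise : bool, optional
            If True, each indicator is divided by its base-year sum, by default False.
        aggregation : str, optional
            "sum" or "avg": how the base-year values are aggregated, by default "sum".

        Raises
        ------
        TypeError
            If `annual`, `indicators` or `weights` have the wrong type.
        IndexError
            If `weights` is provided and has a different length than `indicators`.
        NameError
            If `aggregation` is not "sum" or "avg".
        """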
        super().__init__(name)
        
        # MOAR INPUT CHECKS
        if not isinstance(annual, str):
            raise TypeError("annual must be str")
        if not isinstance(indicators, list):
            raise TypeError("indicator_names must be a list")
        if not all(isinstance(x, str) for x in indicators):
            raise TypeError("indicator_names must contain str")
        if weights and len(weights) != len(indicators):
            raise IndexError("weight_names must have same length as indicator_names")
        if weights and not all(isinstance(x, type(weights[0])) for x in weights):
            raise TypeError("all weights must be of same type")
        if aggregation.lower() not in ["sum", "avg"]:
            raise NameError("aggregation must be sum or avg")

        self._annual = annual
        self._indicators = [x.strip() for x in indicators]
        self._weights = [] if weights is None else weights
        self._correction = correction
        self._normalise = normalise
        self._aggregation = aggregation.lower()

    
    # rest of class....
    
    
    def evaluate(
        self,
        annual_df: pd.DataFrame,
        indicators_df: pd.DataFrame,
        weights_df: pd.DataFrame | None = None,
        correction_df: pd.DataFrame | None = None,
        test_dfs: bool = True,
    ) -> pd.Series:
        """Evaluate the data using the provided DataFrames and return the evaluated series.

        Parameters
        ----------
        annual_df : pd.DataFrame
            The DataFrame containing annual data.
        indicators_df : pd.DataFrame
            The DataFrame containing indicator data.
        weights_df : pd.DataFrame, optional
            The DataFrame containing weight data. Defaults to None.
        correction_df : pd.DataFrame, optional
            The DataFrame containing correction data. Defaults to None.
        test_dfs : bool, optional
            Whether the parent class should validate the input DataFrames. Defaults to True.

        Raises
        ------
        ValueError
            If the baseyear is not set.
        TypeError
            If any of the input DataFrames is not of type pd.DataFrame.
        AttributeError
            If the index of any DataFrame is not of type pd.PeriodIndex or has incorrect frequency.
        IndexError
            If the baseyear is out of range for any of the DataFrames.
        NameError
            If the required column names are not present in the DataFrames.

        Returns
        -------
        pd.Series
            The evaluated series.
        """
        # Call evaluate() on the parent class (Formula) to run the shared input checks
        super().evaluate(
            annual_df, indicators_df, weights_df, correction_df, test_dfs=test_dfs
        )
        
        # EVEN MORE INPUT CHECKS
        if not isinstance(annual_df.index, pd.PeriodIndex):
            raise AttributeError("annual_df.index must be Pandas.PeriodIndex")

        if self._annual not in annual_df.columns:
            raise NameError(f"Cannot find {self._annual} in annual_df")

        if any(x not in indicators_df.columns for x in self._indicators):
            missing = [x for x in self._indicators if x not in indicators_df.columns]
            raise NameError(f'Cannot find {",".join(missing)} in indicators_df')
        
        # Extract relevant indicators for our series
        indicator_matrix = indicators_df.loc[:, self._indicators]
        
        if not isinstance(indicator_matrix.index, pd.PeriodIndex):
            raise AttributeError("indicator_matrix.index must be Pandas.PeriodIndex")

        if self._normalise:
            indicator_matrix = indicator_matrix.div(
                indicator_matrix.loc[indicator_matrix.index.year == self.baseyear].sum()
            )
        
        # If weights are given, validate them and pick out the base-year weights
        if self._weights:
            if all(isinstance(x, str) for x in self._weights):
                if weights_df is None:
                    raise NameError(f"{self.name} expects weights_df")
                if any(x not in weights_df.columns for x in self._weights):
                    missing = [x for x in self._weights if x not in weights_df.columns]  # type: ignore
                    raise NameError(f'Cannot find {",".join(missing)} in weights_df')
                if not isinstance(weights_df.index, pd.PeriodIndex):
                    raise AttributeError("weights_df.index must be Pandas.PeriodIndex")
                weight_vector = weights_df.loc[
                    weights_df.index.year == self.baseyear, self._weights
                ].to_numpy()  # type: ignore [misc]

            if all(isinstance(x, float) for x in self._weights):
                weight_vector = np.array([self._weights])

            weighted_indicators = pd.Series(
                indicator_matrix.to_numpy().dot(weight_vector.transpose())[:, 0],
                index=indicators_df.index,
            )
        
        # If no weights, sum indicators
        else:
            weighted_indicators = indicator_matrix.sum(axis=1, skipna=False)
        
        # If there are corrections, multiply the weighted indicators by the corrections
        # But first some validation of inputs :)
        if self._correction:
            if correction_df is None:
                raise NameError(f"{self.name} expects correction_df")
            if self._correction not in correction_df.columns:
                raise NameError(f"{self._correction} is not in correction_df")
            # Calculation happens here
            corrected_indicators = (
                weighted_indicators * correction_df.loc[:, self._correction]
            )
        else:
            # ...or here if no corrections are present
            corrected_indicators = weighted_indicators
        
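        # The actual calculation of x_t happens here: x_T times the corrected indicator,
        # divided by its base-year sum (or its base-year mean when aggregation is "avg")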
        evaluated_series = annual_df.loc[
            annual_df.index.year == self.baseyear, self._annual
        ].to_numpy() * corrected_indicators.div(
            corrected_indicators.loc[
                corrected_indicators.index.year == self.baseyear
            ].sum()
            if self._aggregation == "sum"
            else corrected_indicators.loc[
                corrected_indicators.index.year == self.baseyear
            ].mean()
        )
        
        return evaluated_series  # type: ignore [no-any-return]

```