# Introduction

This notebook demonstrates the *Multivariable Linear Regression* package.

The *Multivariable Linear Regression* package (short: `mvlr`) can be found in the `openenergyid.mvlr` module.

## Step 1: Importing the libraries

In [21]:
from openenergyid.mvlr import MultiVariableRegressionInput, find_best_mvlr

## Step 2: Loading Input Data

A few JSON samples are stored in the `data` folder.

The input model is described by the `MultiVariableRegressionInput` class. You can find the class definition in the `openenergyid.mvlr` module, or call `MultiVariableRegressionInput.model_json_schema()` to get the JSON schema of the input model.

The `MultiVariableRegressionInput.model_validate_json()` method loads the input model from a JSON string, such as the contents of a file, and validates it automatically. Regular constructors are also available.
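Before validating, it can help to peek at the raw JSON. The sketch below uses only the standard library and a small hand-written payload. The field names are assumed to match the input model as dumped later in this notebook; the values and the index format are invented for illustration and are not the contents of the sample file (the data rows are left out entirely). `summarize_payload` is a hypothetical helper, not part of `openenergyid`.

```python
import json

# Hypothetical, hand-written payload. Field names are assumed to match the
# input model; values are invented and the frame's data rows are omitted.
RAW = """
{
  "timezone": "Europe/Brussels",
  "independent_variables": [
    {"name": "temperatureEquivalent", "variants": ["HDD_16.5", "CDD_24"]},
    {"name": "solarRadiation", "variants": []}
  ],
  "dependent_variable": "energyProduction/solarPhotovoltaic",
  "frame": {
    "columns": [
      "energyProduction/solarPhotovoltaic",
      "temperatureEquivalent",
      "solarRadiation"
    ],
    "index": ["2021-01-01T00:00:00+01:00", "2021-01-02T00:00:00+01:00"]
  }
}
"""


def summarize_payload(raw: str) -> dict:
    """Return a small summary of a raw input payload (hypothetical helper)."""
    payload = json.loads(raw)
    frame = payload["frame"]
    names = [v["name"] for v in payload["independent_variables"]]
    # Variables that are declared but have no matching column in the frame.
    declared = {payload["dependent_variable"], *names}
    missing = sorted(declared - set(frame["columns"]))
    return {
        "dependent_variable": payload["dependent_variable"],
        "independent_variables": names,
        "n_rows": len(frame["index"]),
        "missing_columns": missing,
    }


summary = summarize_payload(RAW)
print(summary)
```

A non-empty `missing_columns` list would suggest a mismatch between the declared variables and the frame before the payload ever reaches the validator.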

In [19]:
with open('data/mvlr/sample_solar.json', 'r') as f:
    input_data = MultiVariableRegressionInput.model_validate_json(f.read())

In [20]:
input_data.model_dump()

{'timezone': 'Europe/Brussels',
 'independent_variables': [{'name': 'temperatureEquivalent',
   'variants': ['HDD_16.5', 'CDD_24', 'HDD_15', 'FDD_0']},
  {'name': 'solarRadiation', 'variants': []},
  {'name': 'windPower', 'variants': []}],
 'dependent_variable': 'energyProduction/solarPhotovoltaic',
 'frame': {'columns': ['energyProduction/solarPhotovoltaic',
   'temperatureEquivalent',
   'solarRadiation',
   'windPower'],
  'index': [datetime.datetime(2021, 1, 1, 0, 0, tzinfo=TzInfo(+01:00)),
   datetime.datetime(2021, 1, 2, 0, 0, tzinfo=TzInfo(+01:00)),
   datetime.datetime(2021, 1, 3, 0, 0, tzinfo=TzInfo(+01:00)),
   datetime.datetime(2021, 1, 4, 0, 0, tzinfo=TzInfo(+01:00)),
   datetime.datetime(2021, 1, 5, 0, 0, tzinfo=TzInfo(+01:00)),
   datetime.datetime(2021, 1, 6, 0, 0, tzinfo=TzInfo(+01:00)),
   datetime.datetime(2021, 1, 7, 0, 0, tzinfo=TzInfo(+01:00)),
   datetime.datetime(2021, 1, 8, 0, 0, tzinfo=TzInfo(+01:00)),
   datetime.datetime(2021, 1, 9, 0, 0, tzinfo=TzInfo(+01:00

## Step 3: Running the model

The `find_best_mvlr` function is the main function of the `mvlr` module. It takes an input model and returns a `MultivariableRegressionOutput` object.
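To make result fields such as `coef`, `intercept`, `r2` and `r2_adj` more concrete, here is a minimal sketch of an ordinary least squares fit with a single regressor, written in plain Python. It is not how `find_best_mvlr` is implemented; the function name `fit_simple_ols` and the toy data are invented for illustration.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + coef * x by ordinary least squares (one regressor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and cross-products of x and y around their means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    coef = sxy / sxx
    intercept = mean_y - coef * mean_x
    # R squared: share of the variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + coef * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    # Adjusted R squared with one regressor: n - 2 residual degrees of freedom.
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - 2)
    return {"coef": coef, "intercept": intercept, "r2": r2, "r2_adj": r2_adj}


# Invented toy data, roughly following y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = fit_simple_ols(x, y)
print(fit)
```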

In [22]:
result = find_best_mvlr(input_data)

In [23]:
result.model_dump()

{'dependent_variable': 'energyProduction/solarPhotovoltaic',
 'independent_variables': [{'name': 'solarRadiation',
   'coef': 0.0024924735991862625,
   't_stat': 12.250914745542833,
   'p_value': 2.404202661661546e-07,
   'std_err': 0.0002034520401909647,
   'confidence_interval': {'confidence': 0.95,
    'lower': 0.0020391542039252418,
    'upper': 0.0029457929944472832}}],
 'r2': 0.9375331511967008,
 'r2_adj': 0.9312864663163709,
 'f_stat': 150.08491210255892,
 'prob_f_stat': 2.40420266166154e-07,
 'intercept': {'name': 'Intercept',
  'coef': 2.733302450232918,
  't_stat': 0.13401884485997137,
  'p_value': 0.8960465198298829,
  'std_err': 20.39491127601338,
  'confidence_interval': {'confidence': 0.95,
   'lower': -42.70939174623032,
   'upper': 48.175996646696156}},
 'granularity': <Granularity.P1M: 'P1M'>,
 'frame': {'columns': ['energyProduction/solarPhotovoltaic', 'solarRadiation'],
  'index': [Timestamp('2021-01-01 00:00:00+0100', tz='Europe/Brussels'),
   Timestamp('2021-02-01 
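
The reported statistics can be cross-checked against each other using standard OLS identities. The sketch below copies the numbers from the `result.model_dump()` output above and assumes a model with a single regressor plus an intercept, as that output shows. The variable names are chosen for this example only.

```python
# Values copied from the result.model_dump() output (solarRadiation term).
coef = 0.0024924735991862625
std_err = 0.0002034520401909647
t_stat = 12.250914745542833
f_stat = 150.08491210255892
r2 = 0.9375331511967008
r2_adj = 0.9312864663163709
ci_upper = 0.0029457929944472832

# The t statistic is the coefficient divided by its standard error.
t_check = coef / std_err

# With a single regressor, the F statistic equals the squared t statistic.
f_check = t_stat ** 2

# r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1). Solve for n with p = 1.
p = 1
ratio = (1 - r2_adj) / (1 - r2)  # equals (n - 1) / (n - p - 1)
n_obs = (ratio * (p + 1) - 1) / (ratio - 1)

# The interval is coef +/- t_crit * std_err, so t_crit can be recovered.
t_crit = (ci_upper - coef) / std_err

print(t_check, f_check, round(n_obs), t_crit)
```

The recovered number of observations is 12, which would be consistent with one year of monthly data given the `P1M` granularity, although the truncated output does not confirm the full index. The recovered critical value of about 2.228 matches the two-sided 95% Student t quantile for 10 degrees of freedom.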