# Introduction

This notebook demonstrates the *Multivariable Linear Regression* package (short: `mvlr`), which can be found in the `openenergyid.mvlr` module.

## Step 1: Importing the libraries

In [1]:
from openenergyid.mvlr import MultiVariableRegressionInput, find_best_mvlr

## Step 2: Loading Input Data

A few JSON samples are stored in the `data` folder.

The input model is described by the `MultiVariableRegressionInput` class. You can find the class definition in the `openenergyid.mvlr` module, or you can call `MultiVariableRegressionInput.model_json_schema()` to get the JSON schema of the input model.

The `MultiVariableRegressionInput.model_validate_json()` method loads the input model from a JSON string, such as the contents of a JSON file, and validates it automatically. Regular constructors are also available.

In [2]:
with open("data/mvlr/sample_gas.json", "r") as f:
    input_data = MultiVariableRegressionInput.model_validate_json(f.read())

In [3]:
input_data.model_dump()

{'timezone': 'Europe/Brussels',
 'independent_variables': [{'name': 'temperatureEquivalent',
   'variants': ['HDD_14',
    'HDD_14.5',
    'HDD_15',
    'HDD_15.5',
    'HDD_16',
    'HDD_16.5',
    'HDD_17',
    'HDD_17.5',
    'HDD_18',
    'HDD_18.5',
    'CDD_20',
    'CDD_21',
    'CDD_22',
    'CDD_23',
    'CDD_24',
    'CDD_25',
    'FDD_0',
    'FDD_1',
    'FDD_2',
    'FDD_3',
    'FDD_4',
    'FDD_5'],
   'allow_negative_coefficient': True}],
 'dependent_variable': 'energyUse/naturalGas',
 'frame': {'index': [datetime.datetime(2021, 4, 3, 0, 0, tzinfo=TzInfo(+02:00)),
   datetime.datetime(2021, 4, 4, 0, 0, tzinfo=TzInfo(+02:00)),
   datetime.datetime(2021, 4, 5, 0, 0, tzinfo=TzInfo(+02:00)),
   datetime.datetime(2021, 4, 6, 0, 0, tzinfo=TzInfo(+02:00)),
   datetime.datetime(2021, 4, 7, 0, 0, tzinfo=TzInfo(+02:00)),
   datetime.datetime(2021, 4, 8, 0, 0, tzinfo=TzInfo(+02:00)),
   datetime.datetime(2021, 4, 9, 0, 0, tzinfo=TzInfo(+02:00)),
   datetime.datetime(2021, 4, 10, 0

## Step 3: Running the model

The `find_best_mvlr` function is the main function of the `mvlr` module. It takes an input model and returns a `MultivariableRegressionOutput` object.
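
The name `find_best_mvlr` and the list of `variants` in the input suggest that the function compares candidate variants and keeps the one that fits best. The block below is only an illustrative sketch of that idea in plain Python, not the package's implementation. It assumes a single regressor and uses R² as the only selection criterion; the helper names (`fit_simple_ols`, `pick_best_variant`), the column names, and the data are all made up for the example.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope, r2)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


def pick_best_variant(candidates, y):
    """Fit every candidate column against y and return the one with the highest R²."""
    fits = {name: fit_simple_ols(x, y) for name, x in candidates.items()}
    best = max(fits, key=lambda name: fits[name][2])
    return best, fits[best]


# Made-up toy data: y is exactly 10 + 2 * hypothetical_b.
candidates = {
    "hypothetical_a": [5.0, 1.0, 4.0, 2.0, 3.0],
    "hypothetical_b": [1.0, 2.0, 3.0, 4.0, 5.0],
}
y = [12.0, 14.0, 16.0, 18.0, 20.0]

best_name, (intercept, slope, r2) = pick_best_variant(candidates, y)
print(best_name, round(intercept, 6), round(slope, 6), round(r2, 6))
```

The actual function may apply other criteria (for example adjusted R² or significance checks), so treat this purely as a mental model for why only one of the listed variants shows up in a result.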

In [4]:
result = find_best_mvlr(input_data)

In [5]:
result.model_dump()

{'dependent_variable': 'energyUse/naturalGas',
 'independent_variables': [{'name': 'HDD_16',
   'coef': 9.156326327183008,
   't_stat': 16.68389326864723,
   'p_value': 3.700860402803631e-09,
   'std_err': 0.5488123293374092,
   'confidence_interval': {'confidence': 0.95,
    'lower': 7.948398534640814,
    'upper': 10.364254119725201}}],
 'r2': 0.9619840581695711,
 'r2_adj': 0.9585280634577139,
 'f_stat': 278.3522945996123,
 'prob_f_stat': 3.700860402803623e-09,
 'intercept': {'name': 'Intercept',
  'coef': 25.937621656190615,
  't_stat': 0.23178557399325664,
  'p_value': 0.8209600104917621,
  'std_err': 111.9035201774258,
  'confidence_interval': {'confidence': 0.95,
   'lower': -220.3603656153664,
   'upper': 272.23560892774765}},
 'granularity': <Granularity.P1M: 'P1M'>,
 'frame': {'index': [Timestamp('2021-04-01 00:00:00+0200', tz='Europe/Brussels'),
   Timestamp('2021-05-01 00:00:00+0200', tz='Europe/Brussels'),
   Timestamp('2021-06-01 00:00:00+0200', tz='Europe/Brussels'),
   T
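
To make the reported statistics easier to read, the following sketch re-derives a few of them from the numbers in the output above, using only the standard library. It relies on standard ordinary least squares relationships rather than on anything specific to `mvlr`. The `predict` helper and the HDD_16 value of 100 are hypothetical and only illustrate how the intercept and coefficient combine.

```python
import math

# Values copied from the result.model_dump() output shown above.
coef = 9.156326327183008        # HDD_16 coefficient
std_err = 0.5488123293374092    # standard error of that coefficient
t_stat = 16.68389326864723
f_stat = 278.3522945996123
ci_lower = 7.948398534640814
ci_upper = 10.364254119725201
intercept = 25.937621656190615

# The t-statistic is the coefficient divided by its standard error.
t_matches = math.isclose(coef / std_err, t_stat, rel_tol=1e-6)

# With a single regressor, the F-statistic equals the squared t-statistic.
f_matches = math.isclose(t_stat ** 2, f_stat, rel_tol=1e-6)

# The 95% interval is centred on the coefficient; its half-width, expressed in
# standard errors, is the critical value that was used to build it.
ci_centred = math.isclose((ci_lower + ci_upper) / 2, coef, rel_tol=1e-6)
critical_value = (ci_upper - coef) / std_err


def predict(hdd_16: float) -> float:
    """Fitted value of the dependent variable for one period with the given HDD_16."""
    return intercept + coef * hdd_16


print(t_matches, f_matches, ci_centred, round(critical_value, 2))
print(round(predict(100.0), 2))  # 100.0 is a hypothetical HDD_16 value
```

Note that the intercept's confidence interval in the output spans zero (its `p_value` is about 0.82), so the fitted line is driven almost entirely by the `HDD_16` term.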