The purpose of this repository is to provide very simple introductory tutorials for energy analytics novices. Every tutorial follows the same process. The main objective is to give a quick introduction to basic machine learning models in Python. Students can pick up where a tutorial ends and try to improve model performance simply by changing the hyperparameters.
- Data is preprocessed and cleaned (see preprocessing notebook)
- Only scikit-learn models are used.
- The focus is on regression models. The data is actually suited to forecasting, but time series models are not currently applied.
- State-of-the-Art (SotA) approaches are avoided. This is not an ML-Ops repository. Each notebook is a near copy of the others; only the model object differs. See vchapparo's repository in the resources for a sophisticated example.
- Hyperparameter tuning is not covered.
- Convenient reproducibility is at the center. You only need the input data file and the Jupyter Notebook.
- Performance is not the priority, and only basic metrics are used in evaluation. Still, discussing the results in terms of bias, absolute deviation and relative deviation is good practice. sklearn.metrics is used to show more of scikit-learn.
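
The process described in the list above can be sketched roughly as follows. This is a minimal illustration and not one of the repository's notebooks: the synthetic data stands in for the preprocessed input file, and the `evaluate` helper, the two model choices, and the metric names in the result dictionary are assumptions made for this example. Bias is computed here as the mean of prediction minus actual, absolute deviation as MAE, and relative deviation as MAPE.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed input data (assumption):
# three numeric features and a strictly positive target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 3))
y = 50 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 1, size=500)

# Keep the original row order, as one would for time-ordered data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False
)


def evaluate(model):
    """Fit a scikit-learn regressor and return basic evaluation metrics."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        # Bias: positive means the model over-predicts on average.
        "bias": float(np.mean(y_pred - y_test)),
        # Absolute deviation, in the units of the target.
        "mae": float(mean_absolute_error(y_test, y_pred)),
        # Relative deviation, as a fraction of the actual value.
        "mape": float(mean_absolute_percentage_error(y_test, y_pred)),
    }


# Only the model object changes between runs; hyperparameters such as
# n_estimators are the knobs to experiment with.
results = {
    "linear": evaluate(LinearRegression()),
    "forest": evaluate(RandomForestRegressor(n_estimators=50, random_state=0)),
}

for name, metrics in results.items():
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Swapping in another regressor only requires passing a different model object to `evaluate`, which mirrors how the notebooks differ from one another.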
Briefly, this repository covers fundamentals and provides hands-on applications (good if you are bored of the iris dataset).
You can use this repository under the permissive MIT License as long as you respect the data license (see here and here) and the scikit-learn license (see here).
- Add more models.
- Add explanations, more notes and guidelines to tutorials. Try to give the intuition of models with as few words as possible.