Ensemble models combine predictions from multiple individual models to improve overall performance and robustness. 
They leverage the strengths of various algorithms to produce a final prediction that is often more accurate and less prone to overfitting than any single model alone.

The main types of ensemble methods are:

Bagging (Bootstrap Aggregating)
How it Works: Multiple copies of the same algorithm are trained on different bootstrap samples of the training data (random samples drawn with replacement, so some rows appear more than once and others not at all). The final prediction is made by aggregating the predictions from all models, typically by averaging for regression tasks and majority voting for classification tasks.
Example: Random Forest is a popular bagging algorithm that uses decision trees as the base models and, in addition, considers only a random subset of features at each split.
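
The bagging procedure described above can be sketched with scikit-learn's `BaggingClassifier`. This is a minimal illustration, not the notebook's actual experiment: the synthetic data from `make_classification`, the choice of 25 estimators, and the variable names are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; not the electricity dataset)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 25 decision trees, each fit on its own bootstrap sample of the training set
bagging = BaggingClassifier(
    DecisionTreeClassifier(),  # base model, passed positionally
    n_estimators=25,
    bootstrap=True,            # sample rows with replacement
    random_state=0,
)
bagging.fit(X_train, y_train)

# Class predictions are aggregated across the 25 trees
bagging_acc = bagging.score(X_test, y_test)
print(f"Bagging test accuracy: {bagging_acc:.3f}")
```

Swapping `BaggingClassifier` for `RandomForestClassifier` would add the per-split feature subsampling that distinguishes a random forest from plain bagged trees.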

Boosting
How it Works: Models are trained sequentially, where each new model focuses on the errors made by the previous models. The predictions from all models are combined, often with weights assigned to each model based on its performance.
Example: AdaBoost and Gradient Boosting Machines (GBM) are common boosting techniques.
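
As a rough sketch of the two boosting techniques named above, the snippet below fits `AdaBoostClassifier` and `GradientBoostingClassifier` on synthetic data. The dataset, the 50 boosting rounds, and the variable names are assumptions chosen for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; not the electricity dataset)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# AdaBoost: each round reweights the training rows so that the next
# weak learner concentrates on previously misclassified examples
ada = AdaBoostClassifier(n_estimators=50, random_state=0)
ada.fit(X_train, y_train)
ada_acc = ada.score(X_test, y_test)

# Gradient boosting: each new tree is fit to the gradient of the loss
# of the ensemble built so far
gbm = GradientBoostingClassifier(n_estimators=50, random_state=0)
gbm.fit(X_train, y_train)
gbm_acc = gbm.score(X_test, y_test)

print(f"AdaBoost test accuracy: {ada_acc:.3f}")
print(f"Gradient boosting test accuracy: {gbm_acc:.3f}")
```

After fitting, `ada.estimator_weights_` holds the weight assigned to each weak learner in the combined prediction.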

Stacking (Stacked Generalization)
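How it Works: Multiple models (base learners) are trained on the same dataset. A meta-model (or second-level model) is then trained on the outputs of the base models to make the final prediction. To avoid leaking training labels into the meta-model, these outputs are usually out-of-fold predictions obtained through cross-validation. The meta-model learns how best to combine the predictions of the base models.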
Example: You might use logistic regression as a meta-model to combine the outputs of several different base classifiers.
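
The logistic-regression meta-model mentioned above might look like the following with scikit-learn's `StackingClassifier`. The two base learners, the synthetic data, and the 5-fold setting are assumptions for the sake of a small runnable example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; not the electricity dataset)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Base learners of different types
base_learners = [
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("knn", KNeighborsClassifier()),
]

# The meta-model is trained on cross-validated (out-of-fold)
# predictions of the base learners
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)

stack_acc = stack.score(X_test, y_test)
print(f"Stacking test accuracy: {stack_acc:.3f}")
```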

Voting
How it Works: Multiple models (typically of different types) are trained, and their predictions are combined through a voting scheme. For classification tasks, it could be majority voting or weighted voting, and for regression, it could be averaging.
Example: A simple voting classifier might take the majority vote of a decision tree, a support vector machine, and a k-nearest neighbors model.
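
A minimal sketch of such a majority-vote ensemble, using scikit-learn's `VotingClassifier` with `voting="hard"`. The synthetic data and default hyperparameters are assumptions; in practice each base model would be tuned separately.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; not the electricity dataset)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hard voting: each model casts one vote and the most common class wins
voting = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("svm", SVC()),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",
)
voting.fit(X_train, y_train)

voting_acc = voting.score(X_test, y_test)
print(f"Voting test accuracy: {voting_acc:.3f}")
```

Setting `voting="soft"` would average predicted class probabilities instead, which requires every base model to expose `predict_proba` (for `SVC`, that means passing `probability=True`).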

In [2]:
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import sklearn
import xgboost as xgb
from sklearn.datasets import fetch_openml
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

%matplotlib inline

# Suppress library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

Loading the data

In [3]:
elec_data = fetch_openml(name='electricity', version=1)

In [4]:
type(elec_data)

sklearn.utils._bunch.Bunch

In [5]:
elec_data.details

{'id': '151',
 'name': 'electricity',
 'version': '1',
 'description_version': '1',
 'format': 'ARFF',
 'creator': ['M. Harries', 'J. Gama', 'A. Bifet'],
 'collection_date': '1998-12-05',
 'upload_date': '2014-04-10T02:42:23',
 'language': 'English',
 'licence': 'Public',
 'url': 'https://api.openml.org/data/v1/download/2419/electricity.arff',
 'parquet_url': 'https://openml1.win.tue.nl/datasets/0000/0151/dataset_151.pq',
 'file_id': '2419',
 'default_target_attribute': 'class',
 'version_label': '1',
 'tag': ['AzurePilot',
  'concept_drift',
  'electricity',
  'Life Science',
  'mythbusting_1',
  'OpenML-CC18',
  'OpenML100',
  'study_1',
  'study_123',
  'study_135',
  'study_14',
  'study_15',
  'study_16',
  'study_20',
  'study_34',
  'study_37',
  'study_41',
  'study_7',
  'study_70',
  'study_99',
  'Transportation'],
 'visibility': 'public',
 'original_data_url': 'http://www.inescporto.pt/~jgama/ales/ales_5.html',
 'paper_url': 'http://citeseerx.ist.psu.edu/viewdoc/summary?doi

In [6]:
elec_data.data.shape

(45312, 8)

In [7]:
# Description of the data 
print(elec_data.DESCR)

**Author**: M. Harries, J. Gama, A. Bifet  
**Source**: [Joao Gama](http://www.inescporto.pt/~jgama/ales/ales_5.html) - 2009  
**Please cite**: None  

**Electricity** is a widely used dataset described by M. Harries and analyzed by J. Gama (see papers below). This data was collected from the Australian New South Wales Electricity Market. In this market, prices are not fixed and are affected by demand and supply of the market. They are set every five minutes. Electricity transfers to/from the neighboring state of Victoria were done to alleviate fluctuations.

The dataset (originally named ELEC2) contains 45,312 instances dated from 7 May 1996 to 5 December 1998. Each example of the dataset refers to a period of 30 minutes, i.e. there are 48 instances for each time period of one day. Each example on the dataset has 5 fields, the day of week, the time stamp, the New South Wales electricity demand, the Victoria electricity demand, the scheduled electricity transfer between states and the 

In [8]:
# Displaying feature names

elec_data.feature_names

['date',
 'day',
 'period',
 'nswprice',
 'nswdemand',
 'vicprice',
 'vicdemand',
 'transfer']

In [9]:
# Displaying target name

elec_data.target_names

['class']

In [10]:
# Getting the whole dataframe

elec_df = elec_data.frame

In [11]:
type(elec_df)

pandas.core.frame.DataFrame