## Adaptive Realtime Machine Learning

In recent years, with the rise of big data, incremental and online learning have gained attention, especially for learning from data streams, where the traditional assumption that the complete dataset is available up front no longer holds. Many existing real-time machine learning methods rely only on incremental learning, which limits the potential of real-time learning (decremental learning, parallel processing, real-time feature selection and deletion, etc.). ART-ML aims to enhance real-time learning by supporting both incremental and decremental learning.

## ART-ML Explained with the Iris Dataset

The Iris dataset is one of the classic datasets for learning machine learning concepts. The data were collected in the early 20th century and describe characteristics of three species of iris flowers. The dataset was first used by Sir R. A. Fisher, and it is perhaps the best-known dataset in the pattern recognition literature.

The dataset contains 3 classes of 50 instances each, where each class refers to a species of iris plant. One class (Iris setosa) is linearly separable from the other two; the latter are not linearly separable from each other.

Although real-time learning is not needed for a dataset as small as Iris, it is so widely used in machine learning that it makes a convenient example for explaining the ART-ML library and its features.

Let's start by importing some libraries and examining the data.

In [1]:
# Importing all the required libraries
import os
import math
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
from scipy import stats
from scipy.stats import norm
# scipy.stats.chisqprob was removed in SciPy 1.0; chi2.sf(x, df) gives the same upper-tail probability
from scipy.stats import chi2

warnings.filterwarnings('ignore')

In [2]:
# loading Iris dataset
from sklearn.datasets import load_iris
data = load_iris()
df_iris = pd.DataFrame(data.data, columns=data.feature_names)
df_iris.head()

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |


In [3]:
#Exploring the Iris dataset
df_iris.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 4 columns):
sepal length (cm)    150 non-null float64
sepal width (cm)     150 non-null float64
petal length (cm)    150 non-null float64
petal width (cm)     150 non-null float64
dtypes: float64(4)
memory usage: 4.8 KB


In [4]:
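# Reload Iris from a local CSV that also contains the species label column 'iris'
# (the path below is the author's; point it at the folder containing iris.csv)
os.chdir(r'\\SRVA\Homes$\pothulas\Desktop\RTLM')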
df_iris = pd.read_csv('iris.csv')
df_iris.head()

|   | sepal length | sepal width | petal length | petal width | iris |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


### Data Preparation

Since the Iris dataset has only 4 numerical features and 1 categorical feature (the species), we transform it to broaden the scope of the analysis: the four numerical features are kept as they are (x1 to x4), each is also binned into two binary indicators at a threshold (x5 to x12), and the species label is one-hot encoded (x13 to x15). This transformed dataset is used to demonstrate the different ART-ML features.

In [5]:
# Feature engineering of the Iris dataset, used to demonstrate different models in ART-ML
df_iris['x1'] = df_iris['sepal length']
df_iris['x2'] = df_iris['sepal width']
df_iris['x3'] = df_iris['petal length']
df_iris['x4'] = df_iris['petal width']
df_iris['x5'] = df_iris['sepal length'].apply(lambda x: 1 if x < 6 else 0)
df_iris['x6'] = df_iris['sepal length'].apply(lambda x: 1 if x >= 6 else 0)
df_iris['x7'] = df_iris['sepal width'].apply(lambda x: 1 if x < 3 else 0)
df_iris['x8'] = df_iris['sepal width'].apply(lambda x: 1 if x >= 3 else 0)
df_iris['x9'] = df_iris['petal length'].apply(lambda x: 1 if x < 4 else 0)
df_iris['x10'] = df_iris['petal length'].apply(lambda x: 1 if x >= 4 else 0)
df_iris['x11'] = df_iris['petal width'].apply(lambda x: 1 if x < 1 else 0)
df_iris['x12'] = df_iris['petal width'].apply(lambda x: 1 if x >= 1 else 0)
df_iris['x13'] = df_iris['iris'].apply(lambda x: 1 if x == 'Iris-setosa' else 0)
df_iris['x14'] = df_iris['iris'].apply(lambda x: 1 if x == 'Iris-virginica' else 0)
df_iris['x15'] = df_iris['iris'].apply(lambda x: 1 if x == 'Iris-versicolor' else 0)
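The same binning and one-hot encoding can also be written with vectorized pandas comparisons instead of row-by-row `apply`. The sketch below is a minimal alternative, assuming a DataFrame with the column names of `iris.csv`; `engineer_features` is a hypothetical helper, not part of ART-ML.

```python
import pandas as pd


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: the x1..x15 encoding using vectorized comparisons."""
    out = pd.DataFrame(index=df.index)
    numeric = ['sepal length', 'sepal width', 'petal length', 'petal width']
    thresholds = [6, 3, 4, 1]  # same cut points as the apply() version

    # x1..x4: the raw numerical values
    for i, col in enumerate(numeric, start=1):
        out[f'x{i}'] = df[col]

    # x5..x12: a "below threshold" and an "at or above threshold" indicator per feature
    for i, (col, t) in enumerate(zip(numeric, thresholds)):
        out[f'x{5 + 2 * i}'] = (df[col] < t).astype(int)
        out[f'x{6 + 2 * i}'] = (df[col] >= t).astype(int)

    # x13..x15: one-hot species indicators, in the same order as the original cell
    species = ['Iris-setosa', 'Iris-virginica', 'Iris-versicolor']
    for j, name in enumerate(species, start=13):
        out[f'x{j}'] = (df['iris'] == name).astype(int)
    return out


sample = pd.DataFrame({
    'sepal length': [5.1, 6.3], 'sepal width': [3.5, 2.9],
    'petal length': [1.4, 5.6], 'petal width': [0.2, 1.8],
    'iris': ['Iris-setosa', 'Iris-virginica'],
})
print(engineer_features(sample))
```

Vectorized comparisons avoid a Python-level function call per row, which matters more for large streaming batches than for the 150 Iris rows.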

In [6]:
col = ['sepal length','sepal width','petal length','petal width','iris']
df_iris = df_iris.drop(col, axis=1)
df_iris.head()

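|   | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12 | x13 | x14 | x15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |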


### Splitting the Dataset for explaining all the ART-ML functions

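Since we use the ART-ML library to demonstrate real-time learning from new data, the existing dataset is split into two parts. We first build everything from the first part and then update it with the second part. The first 75 rows contain all 50 setosa and 25 versicolor flowers; the last 75 rows contain the other 25 versicolor and all 50 virginica flowers.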

In [7]:
# splitting the dataset (first 75 datapoints in dataset)
df_iris1 = df_iris.head(75)

In [8]:
# splitting the dataset (Last 75 datapoints in dataset)
df_iris2 = df_iris.tail(75)

In [9]:
# Importing the ART-ML library
os.chdir(r'\\SRVA\Homes$\pothulas\Desktop\RTLM')
from artml import bet

### Generating BET

As a first step in adaptive real-time learning, we create a Basic Element Table (BET), which summarizes the whole dataset and can be updated in real time using only newly arriving data. The BET is the key data structure in ART-ML.

Here the Iris dataset is split into two parts to demonstrate the ART-ML library and how real-time learning works.

In [10]:
# Generate the Basic Element Table for the first part of the Iris dataset using create_bet from ART-ML
BET_iris = bet.create_bet(df_iris1)
BET_iris

,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15
x1,"[75, 400.59999999999997, 2169.8999999999996, 1...","[75, 400.59999999999997, 2169.8999999999996, 1...","[75, 400.59999999999997, 2169.8999999999996, 1...","[75, 400.59999999999997, 2169.8999999999996, 1...","[75, 400.59999999999997, 2169.8999999999996, 1...","[75, 400.59999999999997, 2169.8999999999996, 1...","[75, 400.59999999999997, 2169.8999999999996, 1...","[75, 400.59999999999997, 2169.8999999999996, 1...","[75, 400.59999999999997, 2169.8999999999996, 1...","[75, 400.59999999999997, 2169.8999999999996, 1...","[75, 400.59999999999997, 2169.8999999999996, 1...","[75, 400.59999999999997, 2169.8999999999996, 1...","[75, 400.59999999999997, 2169.8999999999996, 1...","[75, 400.59999999999997, 2169.8999999999996, 1...","[75, 400.59999999999997, 2169.8999999999996, 1..."
x2,"[75, 240.30000000000004, 786.8900000000002, 26...","[75, 240.30000000000004, 786.8900000000002, 26...","[75, 240.30000000000004, 786.8900000000002, 26...","[75, 240.30000000000004, 786.8900000000002, 26...","[75, 240.30000000000004, 786.8900000000002, 26...","[75, 240.30000000000004, 786.8900000000002, 26...","[75, 240.30000000000004, 786.8900000000002, 26...","[75, 240.30000000000004, 786.8900000000002, 26...","[75, 240.30000000000004, 786.8900000000002, 26...","[75, 240.30000000000004, 786.8900000000002, 26...","[75, 240.30000000000004, 786.8900000000002, 26...","[75, 240.30000000000004, 786.8900000000002, 26...","[75, 240.30000000000004, 786.8900000000002, 26...","[75, 240.30000000000004, 786.8900000000002, 26...","[75, 240.30000000000004, 786.8900000000002, 26..."
x3,"[75, 181.00000000000003, 578.1999999999999, 22...","[75, 181.00000000000003, 578.1999999999999, 22...","[75, 181.00000000000003, 578.1999999999999, 22...","[75, 181.00000000000003, 578.1999999999999, 22...","[75, 181.00000000000003, 578.1999999999999, 22...","[75, 181.00000000000003, 578.1999999999999, 22...","[75, 181.00000000000003, 578.1999999999999, 22...","[75, 181.00000000000003, 578.1999999999999, 22...","[75, 181.00000000000003, 578.1999999999999, 22...","[75, 181.00000000000003, 578.1999999999999, 22...","[75, 181.00000000000003, 578.1999999999999, 22...","[75, 181.00000000000003, 578.1999999999999, 22...","[75, 181.00000000000003, 578.1999999999999, 22...","[75, 181.00000000000003, 578.1999999999999, 22...","[75, 181.00000000000003, 578.1999999999999, 22..."
x4,"[75, 45.79999999999999, 49.72, 65.978000000000...","[75, 45.79999999999999, 49.72, 65.978000000000...","[75, 45.79999999999999, 49.72, 65.978000000000...","[75, 45.79999999999999, 49.72, 65.978000000000...","[75, 45.79999999999999, 49.72, 65.978000000000...","[75, 45.79999999999999, 49.72, 65.978000000000...","[75, 45.79999999999999, 49.72, 65.978000000000...","[75, 45.79999999999999, 49.72, 65.978000000000...","[75, 45.79999999999999, 49.72, 65.978000000000...","[75, 45.79999999999999, 49.72, 65.978000000000...","[75, 45.79999999999999, 49.72, 65.978000000000...","[75, 45.79999999999999, 49.72, 65.978000000000...","[75, 45.79999999999999, 49.72, 65.978000000000...","[75, 45.79999999999999, 49.72, 65.978000000000...","[75, 45.79999999999999, 49.72, 65.978000000000..."
x5,"[75, 61, 61, 61, 61, 75, 400.59999999999997, 2...","[75, 61, 61, 61, 61, 75, 240.30000000000004, 7...","[75, 61, 61, 61, 61, 75, 181.00000000000003, 5...","[75, 61, 61, 61, 61, 75, 45.79999999999999, 49...","[75, 61, 61, 61, 61, 75, 61, 61, 61, 61, 61, 61]","[75, 61, 61, 61, 61, 75, 14, 14, 14, 14, 0, 0]","[75, 61, 61, 61, 61, 75, 19, 19, 19, 19, 10, 10]","[75, 61, 61, 61, 61, 75, 56, 56, 56, 56, 51, 51]","[75, 61, 61, 61, 61, 75, 55, 55, 55, 55, 55, 55]","[75, 61, 61, 61, 61, 75, 20, 20, 20, 20, 6, 6]","[75, 61, 61, 61, 61, 75, 50, 50, 50, 50, 50, 50]","[75, 61, 61, 61, 61, 75, 25, 25, 25, 25, 11, 11]","[75, 61, 61, 61, 61, 75, 50, 50, 50, 50, 50, 50]","[75, 61, 61, 61, 61, 75, 0, 0, 0, 0, 0, 0]","[75, 61, 61, 61, 61, 75, 25, 25, 25, 25, 11, 11]"
x6,"[75, 14, 14, 14, 14, 75, 400.59999999999997, 2...","[75, 14, 14, 14, 14, 75, 240.30000000000004, 7...","[75, 14, 14, 14, 14, 75, 181.00000000000003, 5...","[75, 14, 14, 14, 14, 75, 45.79999999999999, 49...","[75, 14, 14, 14, 14, 75, 61, 61, 61, 61, 0, 0]","[75, 14, 14, 14, 14, 75, 14, 14, 14, 14, 14, 14]","[75, 14, 14, 14, 14, 75, 19, 19, 19, 19, 9, 9]","[75, 14, 14, 14, 14, 75, 56, 56, 56, 56, 5, 5]","[75, 14, 14, 14, 14, 75, 55, 55, 55, 55, 0, 0]","[75, 14, 14, 14, 14, 75, 20, 20, 20, 20, 14, 14]","[75, 14, 14, 14, 14, 75, 50, 50, 50, 50, 0, 0]","[75, 14, 14, 14, 14, 75, 25, 25, 25, 25, 14, 14]","[75, 14, 14, 14, 14, 75, 50, 50, 50, 50, 0, 0]","[75, 14, 14, 14, 14, 75, 0, 0, 0, 0, 0, 0]","[75, 14, 14, 14, 14, 75, 25, 25, 25, 25, 14, 14]"
x7,"[75, 19, 19, 19, 19, 75, 400.59999999999997, 2...","[75, 19, 19, 19, 19, 75, 240.30000000000004, 7...","[75, 19, 19, 19, 19, 75, 181.00000000000003, 5...","[75, 19, 19, 19, 19, 75, 45.79999999999999, 49...","[75, 19, 19, 19, 19, 75, 61, 61, 61, 61, 10, 10]","[75, 19, 19, 19, 19, 75, 14, 14, 14, 14, 9, 9]","[75, 19, 19, 19, 19, 75, 19, 19, 19, 19, 19, 19]","[75, 19, 19, 19, 19, 75, 56, 56, 56, 56, 0, 0]","[75, 19, 19, 19, 19, 75, 55, 55, 55, 55, 7, 7]","[75, 19, 19, 19, 19, 75, 20, 20, 20, 20, 12, 12]","[75, 19, 19, 19, 19, 75, 50, 50, 50, 50, 2, 2]","[75, 19, 19, 19, 19, 75, 25, 25, 25, 25, 17, 17]","[75, 19, 19, 19, 19, 75, 50, 50, 50, 50, 2, 2]","[75, 19, 19, 19, 19, 75, 0, 0, 0, 0, 0, 0]","[75, 19, 19, 19, 19, 75, 25, 25, 25, 25, 17, 17]"
x8,"[75, 56, 56, 56, 56, 75, 400.59999999999997, 2...","[75, 56, 56, 56, 56, 75, 240.30000000000004, 7...","[75, 56, 56, 56, 56, 75, 181.00000000000003, 5...","[75, 56, 56, 56, 56, 75, 45.79999999999999, 49...","[75, 56, 56, 56, 56, 75, 61, 61, 61, 61, 51, 51]","[75, 56, 56, 56, 56, 75, 14, 14, 14, 14, 5, 5]","[75, 56, 56, 56, 56, 75, 19, 19, 19, 19, 0, 0]","[75, 56, 56, 56, 56, 75, 56, 56, 56, 56, 56, 56]","[75, 56, 56, 56, 56, 75, 55, 55, 55, 55, 48, 48]","[75, 56, 56, 56, 56, 75, 20, 20, 20, 20, 8, 8]","[75, 56, 56, 56, 56, 75, 50, 50, 50, 50, 48, 48]","[75, 56, 56, 56, 56, 75, 25, 25, 25, 25, 8, 8]","[75, 56, 56, 56, 56, 75, 50, 50, 50, 50, 48, 48]","[75, 56, 56, 56, 56, 75, 0, 0, 0, 0, 0, 0]","[75, 56, 56, 56, 56, 75, 25, 25, 25, 25, 8, 8]"
x9,"[75, 55, 55, 55, 55, 75, 400.59999999999997, 2...","[75, 55, 55, 55, 55, 75, 240.30000000000004, 7...","[75, 55, 55, 55, 55, 75, 181.00000000000003, 5...","[75, 55, 55, 55, 55, 75, 45.79999999999999, 49...","[75, 55, 55, 55, 55, 75, 61, 61, 61, 61, 55, 55]","[75, 55, 55, 55, 55, 75, 14, 14, 14, 14, 0, 0]","[75, 55, 55, 55, 55, 75, 19, 19, 19, 19, 7, 7]","[75, 55, 55, 55, 55, 75, 56, 56, 56, 56, 48, 48]","[75, 55, 55, 55, 55, 75, 55, 55, 55, 55, 55, 55]","[75, 55, 55, 55, 55, 75, 20, 20, 20, 20, 0, 0]","[75, 55, 55, 55, 55, 75, 50, 50, 50, 50, 50, 50]","[75, 55, 55, 55, 55, 75, 25, 25, 25, 25, 5, 5]","[75, 55, 55, 55, 55, 75, 50, 50, 50, 50, 50, 50]","[75, 55, 55, 55, 55, 75, 0, 0, 0, 0, 0, 0]","[75, 55, 55, 55, 55, 75, 25, 25, 25, 25, 5, 5]"
x10,"[75, 20, 20, 20, 20, 75, 400.59999999999997, 2...","[75, 20, 20, 20, 20, 75, 240.30000000000004, 7...","[75, 20, 20, 20, 20, 75, 181.00000000000003, 5...","[75, 20, 20, 20, 20, 75, 45.79999999999999, 49...","[75, 20, 20, 20, 20, 75, 61, 61, 61, 61, 6, 6]","[75, 20, 20, 20, 20, 75, 14, 14, 14, 14, 14, 14]","[75, 20, 20, 20, 20, 75, 19, 19, 19, 19, 12, 12]","[75, 20, 20, 20, 20, 75, 56, 56, 56, 56, 8, 8]","[75, 20, 20, 20, 20, 75, 55, 55, 55, 55, 0, 0]","[75, 20, 20, 20, 20, 75, 20, 20, 20, 20, 20, 20]","[75, 20, 20, 20, 20, 75, 50, 50, 50, 50, 0, 0]","[75, 20, 20, 20, 20, 75, 25, 25, 25, 25, 20, 20]","[75, 20, 20, 20, 20, 75, 50, 50, 50, 50, 0, 0]","[75, 20, 20, 20, 20, 75, 0, 0, 0, 0, 0, 0]","[75, 20, 20, 20, 20, 75, 25, 25, 25, 25, 20, 20]"
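Reading the printed table, each BET cell appears to hold, for a row feature a and a column feature b: the count and the first four power sums of a, the same for b, then $\Sigma ab$ and $\Sigma (ab)^2$. For binary features every power sum collapses to a count, which is why the x5/x6 cell reads `[75, 61, 61, 61, 61, 75, 14, 14, 14, 14, 0, 0]`. This layout is inferred from the output rather than from ART-ML documentation; the sketch below builds such a cell with NumPy, and `bet_cell` is a hypothetical name.

```python
import numpy as np


def bet_cell(a, b):
    """Hypothetical sketch of one BET cell for a feature pair (a, b).

    Layout inferred from the printed ART-ML output:
    [N, Σa, Σa², Σa³, Σa⁴, N, Σb, Σb², Σb³, Σb⁴, Σab, Σ(ab)²]
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(a)
    cell = [n] + [float(np.sum(a ** k)) for k in range(1, 5)]    # row feature
    cell += [n] + [float(np.sum(b ** k)) for k in range(1, 5)]   # column feature
    cell += [float(np.sum(a * b)), float(np.sum((a * b) ** 2))]  # cross terms
    return cell


# Two binary indicators: every power sum collapses to a count
print(bet_cell([1, 1, 0, 1], [0, 1, 1, 1]))
# [4, 3.0, 3.0, 3.0, 3.0, 4, 3.0, 3.0, 3.0, 3.0, 2.0, 2.0]
```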


### Learn with Data

Using the BET generated from the data, we can build models and perform univariate and bivariate exploration. With traditional methods, learning from a data stream means recomputing everything from all the data available at that point. With a BET, we only need to update the BET itself using the new data, and every downstream step is then updated in real time.

The learn function updates the Basic Element Table with new data. Because all exploration steps and algorithms are built on the BET, every modeling step reflects the new data as soon as the BET is updated.

In [11]:
# Update the Basic Element Table with the new data points using bet.learn
BET_iris = bet.learn(BET_iris, df_iris2)
BET_iris

,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15
x1,"[150, 876.5, 5223.85, 31744.991, 196591.7005, ...","[150, 876.5, 5223.85, 31744.991, 196591.7005, ...","[150, 876.5, 5223.85, 31744.991, 196591.7005, ...","[150, 876.5, 5223.85, 31744.991, 196591.7005, ...","[150, 876.5, 5223.85, 31744.991, 196591.7005, ...","[150, 876.5, 5223.85, 31744.991, 196591.7005, ...","[150, 876.5, 5223.85, 31744.991, 196591.7005, ...","[150, 876.5, 5223.85, 31744.991, 196591.7005, ...","[150, 876.5, 5223.85, 31744.991, 196591.7005, ...","[150, 876.5, 5223.85, 31744.991, 196591.7005, ...","[150, 876.5, 5223.85, 31744.991, 196591.7005, ...","[150, 876.5, 5223.85, 31744.991, 196591.7005, ...","[150, 876.5, 5223.85, 31744.991, 196591.7005, ...","[150, 876.5, 5223.85, 31744.991, 196591.7005, ...","[150, 876.5, 5223.85, 31744.991, 196591.7005, ..."
x2,"[150, 458.1, 1427.05, 4533.315, 14682.1933, 15...","[150, 458.1, 1427.05, 4533.315, 14682.1933, 15...","[150, 458.1, 1427.05, 4533.315, 14682.1933, 15...","[150, 458.1, 1427.05, 4533.315, 14682.1933, 15...","[150, 458.1, 1427.05, 4533.315, 14682.1933, 15...","[150, 458.1, 1427.05, 4533.315, 14682.1933, 15...","[150, 458.1, 1427.05, 4533.315, 14682.1933, 15...","[150, 458.1, 1427.05, 4533.315, 14682.1933, 15...","[150, 458.1, 1427.05, 4533.315, 14682.1933, 15...","[150, 458.1, 1427.05, 4533.315, 14682.1933, 15...","[150, 458.1, 1427.05, 4533.315, 14682.1933, 15...","[150, 458.1, 1427.05, 4533.315, 14682.1933, 15...","[150, 458.1, 1427.05, 4533.315, 14682.1933, 15...","[150, 458.1, 1427.05, 4533.315, 14682.1933, 15...","[150, 458.1, 1427.05, 4533.315, 14682.1933, 15..."
x3,"[150, 563.8, 2583.0, 12974.014, 68227.386, 150...","[150, 563.8, 2583.0, 12974.014, 68227.386, 150...","[150, 563.8, 2583.0, 12974.014, 68227.386, 150...","[150, 563.8, 2583.0, 12974.014, 68227.386, 150...","[150, 563.8, 2583.0, 12974.014, 68227.386, 150...","[150, 563.8, 2583.0, 12974.014, 68227.386, 150...","[150, 563.8, 2583.0, 12974.014, 68227.386, 150...","[150, 563.8, 2583.0, 12974.014, 68227.386, 150...","[150, 563.8, 2583.0, 12974.014, 68227.386, 150...","[150, 563.8, 2583.0, 12974.014, 68227.386, 150...","[150, 563.8, 2583.0, 12974.014, 68227.386, 150...","[150, 563.8, 2583.0, 12974.014, 68227.386, 150...","[150, 563.8, 2583.0, 12974.014, 68227.386, 150...","[150, 563.8, 2583.0, 12974.014, 68227.386, 150...","[150, 563.8, 2583.0, 12974.014, 68227.386, 150..."
x4,"[150, 179.8, 302.3, 563.536, 1108.4546, 150, 8...","[150, 179.8, 302.3, 563.536, 1108.4546, 150, 4...","[150, 179.8, 302.3, 563.536, 1108.4546, 150, 5...","[150, 179.8, 302.3, 563.536, 1108.4546, 150, 1...","[150, 179.8, 302.3, 563.536, 1108.4546, 150, 8...","[150, 179.8, 302.3, 563.536, 1108.4546, 150, 6...","[150, 179.8, 302.3, 563.536, 1108.4546, 150, 5...","[150, 179.8, 302.3, 563.536, 1108.4546, 150, 9...","[150, 179.8, 302.3, 563.536, 1108.4546, 150, 6...","[150, 179.8, 302.3, 563.536, 1108.4546, 150, 8...","[150, 179.8, 302.3, 563.536, 1108.4546, 150, 5...","[150, 179.8, 302.3, 563.536, 1108.4546, 150, 1...","[150, 179.8, 302.3, 563.536, 1108.4546, 150, 5...","[150, 179.8, 302.3, 563.536, 1108.4546, 150, 5...","[150, 179.8, 302.3, 563.536, 1108.4546, 150, 5..."
x5,"[150, 83.0, 83.0, 83.0, 83.0, 150, 876.5, 5223...","[150, 83.0, 83.0, 83.0, 83.0, 150, 458.1, 1427...","[150, 83.0, 83.0, 83.0, 83.0, 150, 563.8, 2583...","[150, 83.0, 83.0, 83.0, 83.0, 150, 179.8, 302....","[150, 83.0, 83.0, 83.0, 83.0, 150, 83.0, 83.0,...","[150, 83.0, 83.0, 83.0, 83.0, 150, 67.0, 67.0,...","[150, 83.0, 83.0, 83.0, 83.0, 150, 57.0, 57.0,...","[150, 83.0, 83.0, 83.0, 83.0, 150, 93.0, 93.0,...","[150, 83.0, 83.0, 83.0, 83.0, 150, 61.0, 61.0,...","[150, 83.0, 83.0, 83.0, 83.0, 150, 89.0, 89.0,...","[150, 83.0, 83.0, 83.0, 83.0, 150, 50.0, 50.0,...","[150, 83.0, 83.0, 83.0, 83.0, 150, 100.0, 100....","[150, 83.0, 83.0, 83.0, 83.0, 150, 50.0, 50.0,...","[150, 83.0, 83.0, 83.0, 83.0, 150, 50.0, 50.0,...","[150, 83.0, 83.0, 83.0, 83.0, 150, 50.0, 50.0,..."
x6,"[150, 67.0, 67.0, 67.0, 67.0, 150, 876.5, 5223...","[150, 67.0, 67.0, 67.0, 67.0, 150, 458.1, 1427...","[150, 67.0, 67.0, 67.0, 67.0, 150, 563.8, 2583...","[150, 67.0, 67.0, 67.0, 67.0, 150, 179.8, 302....","[150, 67.0, 67.0, 67.0, 67.0, 150, 83.0, 83.0,...","[150, 67.0, 67.0, 67.0, 67.0, 150, 67.0, 67.0,...","[150, 67.0, 67.0, 67.0, 67.0, 150, 57.0, 57.0,...","[150, 67.0, 67.0, 67.0, 67.0, 150, 93.0, 93.0,...","[150, 67.0, 67.0, 67.0, 67.0, 150, 61.0, 61.0,...","[150, 67.0, 67.0, 67.0, 67.0, 150, 89.0, 89.0,...","[150, 67.0, 67.0, 67.0, 67.0, 150, 50.0, 50.0,...","[150, 67.0, 67.0, 67.0, 67.0, 150, 100.0, 100....","[150, 67.0, 67.0, 67.0, 67.0, 150, 50.0, 50.0,...","[150, 67.0, 67.0, 67.0, 67.0, 150, 50.0, 50.0,...","[150, 67.0, 67.0, 67.0, 67.0, 150, 50.0, 50.0,..."
x7,"[150, 57.0, 57.0, 57.0, 57.0, 150, 876.5, 5223...","[150, 57.0, 57.0, 57.0, 57.0, 150, 458.1, 1427...","[150, 57.0, 57.0, 57.0, 57.0, 150, 563.8, 2583...","[150, 57.0, 57.0, 57.0, 57.0, 150, 179.8, 302....","[150, 57.0, 57.0, 57.0, 57.0, 150, 83.0, 83.0,...","[150, 57.0, 57.0, 57.0, 57.0, 150, 67.0, 67.0,...","[150, 57.0, 57.0, 57.0, 57.0, 150, 57.0, 57.0,...","[150, 57.0, 57.0, 57.0, 57.0, 150, 93.0, 93.0,...","[150, 57.0, 57.0, 57.0, 57.0, 150, 61.0, 61.0,...","[150, 57.0, 57.0, 57.0, 57.0, 150, 89.0, 89.0,...","[150, 57.0, 57.0, 57.0, 57.0, 150, 50.0, 50.0,...","[150, 57.0, 57.0, 57.0, 57.0, 150, 100.0, 100....","[150, 57.0, 57.0, 57.0, 57.0, 150, 50.0, 50.0,...","[150, 57.0, 57.0, 57.0, 57.0, 150, 50.0, 50.0,...","[150, 57.0, 57.0, 57.0, 57.0, 150, 50.0, 50.0,..."
x8,"[150, 93.0, 93.0, 93.0, 93.0, 150, 876.5, 5223...","[150, 93.0, 93.0, 93.0, 93.0, 150, 458.1, 1427...","[150, 93.0, 93.0, 93.0, 93.0, 150, 563.8, 2583...","[150, 93.0, 93.0, 93.0, 93.0, 150, 179.8, 302....","[150, 93.0, 93.0, 93.0, 93.0, 150, 83.0, 83.0,...","[150, 93.0, 93.0, 93.0, 93.0, 150, 67.0, 67.0,...","[150, 93.0, 93.0, 93.0, 93.0, 150, 57.0, 57.0,...","[150, 93.0, 93.0, 93.0, 93.0, 150, 93.0, 93.0,...","[150, 93.0, 93.0, 93.0, 93.0, 150, 61.0, 61.0,...","[150, 93.0, 93.0, 93.0, 93.0, 150, 89.0, 89.0,...","[150, 93.0, 93.0, 93.0, 93.0, 150, 50.0, 50.0,...","[150, 93.0, 93.0, 93.0, 93.0, 150, 100.0, 100....","[150, 93.0, 93.0, 93.0, 93.0, 150, 50.0, 50.0,...","[150, 93.0, 93.0, 93.0, 93.0, 150, 50.0, 50.0,...","[150, 93.0, 93.0, 93.0, 93.0, 150, 50.0, 50.0,..."
x9,"[150, 61.0, 61.0, 61.0, 61.0, 150, 876.5, 5223...","[150, 61.0, 61.0, 61.0, 61.0, 150, 458.1, 1427...","[150, 61.0, 61.0, 61.0, 61.0, 150, 563.8, 2583...","[150, 61.0, 61.0, 61.0, 61.0, 150, 179.8, 302....","[150, 61.0, 61.0, 61.0, 61.0, 150, 83.0, 83.0,...","[150, 61.0, 61.0, 61.0, 61.0, 150, 67.0, 67.0,...","[150, 61.0, 61.0, 61.0, 61.0, 150, 57.0, 57.0,...","[150, 61.0, 61.0, 61.0, 61.0, 150, 93.0, 93.0,...","[150, 61.0, 61.0, 61.0, 61.0, 150, 61.0, 61.0,...","[150, 61.0, 61.0, 61.0, 61.0, 150, 89.0, 89.0,...","[150, 61.0, 61.0, 61.0, 61.0, 150, 50.0, 50.0,...","[150, 61.0, 61.0, 61.0, 61.0, 150, 100.0, 100....","[150, 61.0, 61.0, 61.0, 61.0, 150, 50.0, 50.0,...","[150, 61.0, 61.0, 61.0, 61.0, 150, 50.0, 50.0,...","[150, 61.0, 61.0, 61.0, 61.0, 150, 50.0, 50.0,..."
x10,"[150, 89.0, 89.0, 89.0, 89.0, 150, 876.5, 5223...","[150, 89.0, 89.0, 89.0, 89.0, 150, 458.1, 1427...","[150, 89.0, 89.0, 89.0, 89.0, 150, 563.8, 2583...","[150, 89.0, 89.0, 89.0, 89.0, 150, 179.8, 302....","[150, 89.0, 89.0, 89.0, 89.0, 150, 83.0, 83.0,...","[150, 89.0, 89.0, 89.0, 89.0, 150, 67.0, 67.0,...","[150, 89.0, 89.0, 89.0, 89.0, 150, 57.0, 57.0,...","[150, 89.0, 89.0, 89.0, 89.0, 150, 93.0, 93.0,...","[150, 89.0, 89.0, 89.0, 89.0, 150, 61.0, 61.0,...","[150, 89.0, 89.0, 89.0, 89.0, 150, 89.0, 89.0,...","[150, 89.0, 89.0, 89.0, 89.0, 150, 50.0, 50.0,...","[150, 89.0, 89.0, 89.0, 89.0, 150, 100.0, 100....","[150, 89.0, 89.0, 89.0, 89.0, 150, 50.0, 50.0,...","[150, 89.0, 89.0, 89.0, 89.0, 150, 50.0, 50.0,...","[150, 89.0, 89.0, 89.0, 89.0, 150, 50.0, 50.0,..."
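Because every entry of a BET cell is a count or a sum, combining batches is element-wise addition: the statistics of the union of two batches are the sum of their statistics, which is what lets `bet.learn` absorb new rows without revisiting old ones. The same property works in reverse, so subtracting a batch's statistics removes it from the summary, one way to realize the decremental learning mentioned in the introduction. A minimal sketch on NumPy arrays, assuming each cell is a vector of counts and sums as the printed tables suggest; `pair_stats`, `merge_stats` and `forget_stats` are hypothetical names, not ART-ML functions.

```python
import numpy as np


def pair_stats(a, b):
    """Counts and sums for a feature pair, as one vector (layout assumed)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.array([len(a), *(np.sum(a ** k) for k in range(1, 5)),
                     len(b), *(np.sum(b ** k) for k in range(1, 5)),
                     np.sum(a * b), np.sum((a * b) ** 2)])


def merge_stats(old, new):
    """Incremental learning: fold the statistics of a new batch into the summary."""
    return old + new


def forget_stats(total, removed):
    """Decremental learning: subtract a batch's statistics to drop it from the summary."""
    return total - removed


rng = np.random.default_rng(0)
x, y = rng.normal(size=100), rng.normal(size=100)

first, second = pair_stats(x[:60], y[:60]), pair_stats(x[60:], y[60:])
full = pair_stats(x, y)

updated = merge_stats(first, second)    # equals recomputing on all 100 rows
restored = forget_stats(full, second)   # equals the first batch alone
print(np.allclose(updated, full), np.allclose(restored, first))  # True True
```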


## Real Time Exploration:


### Univariate Statistics: 
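The univariate function generates descriptive statistics that summarize the central tendency, dispersion and shape of each feature's distribution. It plays a role similar to `df.describe()` in pandas (although it reports variance, coefficient of variation, skewness and kurtosis rather than quantiles), with the difference that these statistics are computed from the BET and can therefore be updated in real time.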

In [12]:
# Univariate analysis using the univariate function in ART-ML
from artml.explore import stats
stats.univariate(BET_iris)

|   | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12 | x13 | x14 | x15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 | 150.0 | 150.0 | 150.0 | 150.0 | 150.0 | 150.0 | 150.0 | 150.0 | 150.0 | 150.0 | 150.0 |
| Mean | 5.843333 | 3.054 | 3.758667 | 1.198667 | 0.553333 | 0.446667 | 0.38 | 0.62 | 0.406667 | 0.593333 | 0.333333 | 0.666667 | 0.333333 | 0.333333 | 0.333333 |
| Variance | 0.681122 | 0.186751 | 3.092425 | 0.578532 | 0.247156 | 0.247156 | 0.2356 | 0.2356 | 0.241289 | 0.241289 | 0.222222 | 0.222222 | 0.222222 | 0.222222 | 0.222222 |
| Standard_deviation | 0.825301 | 0.432147 | 1.758529 | 0.760613 | 0.497147 | 0.497147 | 0.485386 | 0.485386 | 0.491212 | 0.491212 | 0.471405 | 0.471405 | 0.471405 | 0.471405 | 0.471405 |
| coeff_of_variation | 14.12381 | 14.150183 | 46.785984 | 63.45489 | 89.845919 | 111.301661 | 127.733275 | 78.288136 | 120.789751 | 82.788481 | 141.421356 | 70.710678 | 141.421356 | 141.421356 | 141.421356 |
| skewness | 0.318087 | 0.337421 | -0.277232 | -0.106055 | -0.218916 | 0.218916 | 0.504496 | -0.504496 | 0.387733 | -0.387733 | 0.721472 | -0.721472 | 0.721472 | 0.721472 | 0.721472 |
| Kurtosis | -0.518269 | 0.335927 | -1.379572 | -1.316568 | -1.965035 | -1.965035 | -1.757046 | -1.757046 | -1.86193 | -1.86193 | -1.489243 | -1.489243 | -1.489243 | -1.489243 | -1.489243 |
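All of these statistics can be derived from a feature's count and its first four power sums, which is why they can be refreshed whenever the BET changes. For x1, the full-data BET gives $N = 150$, $\Sigma x = 876.5$ and $\Sigma x^2 = 5223.85$, so the mean is $876.5/150 = 5.843333$ and the variance is $5223.85/150 - 5.843333^2 \approx 0.681122$, matching the table above; this is the population variance (divisor $N$), not the sample variance that pandas reports. The sketch below uses the standard population-moment formulas; `univariate_from_sums` is a hypothetical name, and ART-ML appears to apply slightly different conventions for skewness and kurtosis, so those two values need not match the table exactly.

```python
import math


def univariate_from_sums(n, s1, s2, s3, s4):
    """Hypothetical sketch: descriptive statistics from N and Σx, Σx², Σx³, Σx⁴."""
    mean = s1 / n
    var = s2 / n - mean ** 2                      # population variance (divisor N)
    std = math.sqrt(var)
    # Central moments expanded in terms of raw moments
    m3 = s3 / n - 3 * mean * s2 / n + 2 * mean ** 3
    m4 = s4 / n - 4 * mean * s3 / n + 6 * mean ** 2 * s2 / n - 3 * mean ** 4
    return {
        'count': n,
        'mean': mean,
        'variance': var,
        'std': std,
        'coeff_of_variation': 100 * std / mean,   # in percent, as in the table
        'skewness': m3 / std ** 3,
        'excess_kurtosis': m4 / var ** 2 - 3,
    }


# Sums for x1 from the full-data BET (the last two are assumed to be Σx³ and Σx⁴)
x1 = univariate_from_sums(150, 876.5, 5223.85, 31744.991, 196591.7005)
print(round(x1['mean'], 6), round(x1['variance'], 6))  # 5.843333 0.681122
```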


### Bivariate Statistics:

The covariance and correlation functions generate the covariance and correlation matrices for all the features in the Basic Element Table.

In [13]:
# Covariance matrix using the covariance function in ART-ML
stats.covariance(BET_iris)

|   | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12 | x13 | x14 | x15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| x1 | 0.681122 | -0.039007 | 1.265191 | 0.513458 | -0.342644 | 0.342644 | 0.041533 | -0.041533 | -0.314956 | 0.314956 | -0.279111 | 0.279111 | -0.279111 | 0.248222 | 0.030889 |
| x2 | -0.039007 | 0.186751 | -0.319568 | -0.117195 | 0.039453 | -0.039453 | -0.157187 | 0.157187 | 0.08004 | -0.08004 | 0.121333 | -0.121333 | 0.121333 | -0.026667 | -0.094667 |
| x3 | 1.265191 | -0.319568 | 3.092425 | 1.287745 | -0.671796 | 0.671796 | 0.28504 | -0.28504 | -0.777858 | 0.777858 | -0.764889 | 0.764889 | -0.764889 | 0.597778 | 0.167111 |
| x4 | 0.513458 | -0.117195 | 1.287745 | 0.578532 | -0.275929 | 0.275929 | 0.095173 | -0.095173 | -0.324791 | 0.324791 | -0.318222 | 0.318222 | -0.318222 | 0.275778 | 0.042444 |
| x5 | -0.342644 | 0.039453 | -0.671796 | -0.275929 | 0.247156 | -0.247156 | -0.0236 | 0.0236 | 0.181644 | -0.181644 | 0.148889 | -0.148889 | 0.148889 | -0.137778 | -0.011111 |
| x6 | 0.342644 | -0.039453 | 0.671796 | 0.275929 | -0.247156 | 0.247156 | 0.0236 | -0.0236 | -0.181644 | 0.181644 | -0.148889 | 0.148889 | -0.148889 | 0.137778 | 0.011111 |
| x7 | 0.041533 | -0.157187 | 0.28504 | 0.095173 | -0.0236 | 0.0236 | 0.2356 | -0.2356 | -0.067867 | 0.067867 | -0.113333 | 0.113333 | -0.113333 | 0.013333 | 0.1 |
| x8 | -0.041533 | 0.157187 | -0.28504 | -0.095173 | 0.0236 | -0.0236 | -0.2356 | 0.2356 | 0.067867 | -0.067867 | 0.113333 | -0.113333 | 0.113333 | -0.013333 | -0.1 |
| x9 | -0.314956 | 0.08004 | -0.777858 | -0.324791 | 0.181644 | -0.181644 | -0.067867 | 0.067867 | 0.241289 | -0.241289 | 0.197778 | -0.197778 | 0.197778 | -0.135556 | -0.062222 |
| x10 | 0.314956 | -0.08004 | 0.777858 | 0.324791 | -0.181644 | 0.181644 | 0.067867 | -0.067867 | -0.241289 | 0.241289 | -0.197778 | 0.197778 | -0.197778 | 0.135556 | 0.062222 |


In [14]:
# Correlation matrix using the correlation function in ART-ML
stats.correlation(BET_iris)

|   | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12 | x13 | x14 | x15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| x1 | 1.0 | -0.109369 | 0.871754 | 0.817954 | -0.835114 | 0.835114 | 0.10368 | -0.10368 | -0.776905 | 0.776905 | -0.717416 | 0.717416 | -0.717416 | 0.63802 | 0.079396 |
| x2 | -0.109369 | 1.0 | -0.420516 | -0.356544 | 0.18364 | -0.18364 | -0.749371 | 0.749371 | 0.377057 | -0.377057 | 0.595601 | -0.595601 | 0.595601 | -0.130901 | -0.4647 |
| x3 | 0.871754 | -0.420516 | 1.0 | 0.962757 | -0.768427 | 0.768427 | 0.33394 | -0.33394 | -0.900496 | 0.900496 | -0.922688 | 0.922688 | -0.922688 | 0.721102 | 0.201587 |
| x4 | 0.817954 | -0.356544 | 0.962757 | 1.0 | -0.729707 | 0.729707 | 0.257789 | -0.257789 | -0.869305 | 0.869305 | -0.88751 | 0.88751 | -0.88751 | 0.769134 | 0.118376 |
| x5 | -0.835114 | 0.18364 | -0.768427 | -0.729707 | 1.0 | -1.0 | -0.0978 | 0.0978 | 0.743821 | -0.743821 | 0.635307 | -0.635307 | 0.635307 | -0.587896 | -0.047411 |
| x6 | 0.835114 | -0.18364 | 0.768427 | 0.729707 | -1.0 | 1.0 | 0.0978 | -0.0978 | -0.743821 | 0.743821 | -0.635307 | 0.635307 | -0.635307 | 0.587896 | 0.047411 |
| x7 | 0.10368 | -0.749371 | 0.33394 | 0.257789 | -0.0978 | 0.0978 | 1.0 | -1.0 | -0.284643 | 0.284643 | -0.495309 | 0.495309 | -0.495309 | 0.058272 | 0.437037 |
| x8 | -0.10368 | 0.749371 | -0.33394 | -0.257789 | 0.0978 | -0.0978 | -1.0 | 1.0 | 0.284643 | -0.284643 | 0.495309 | -0.495309 | 0.495309 | -0.058272 | -0.437037 |
| x9 | -0.776905 | 0.377057 | -0.900496 | -0.869305 | 0.743821 | -0.743821 | -0.284643 | 0.284643 | 1.0 | -1.0 | 0.854113 | -0.854113 | 0.854113 | -0.585403 | -0.26871 |
| x10 | 0.776905 | -0.377057 | 0.900496 | 0.869305 | -0.743821 | 0.743821 | 0.284643 | -0.284643 | -1.0 | 1.0 | -0.854113 | 0.854113 | -0.854113 | 0.585403 | 0.26871 |
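Both matrices follow from the pairwise sums in the BET: the covariance of a and b is $\Sigma ab/N - (\Sigma a/N)(\Sigma b/N)$, and the correlation divides it by the two standard deviations. Judging by the diagonal of the covariance matrix (0.681122 for x1, the same as the variance in the univariate table), ART-ML uses the population divisor $N$. A minimal sketch; `cov_corr_from_sums` is a hypothetical name.

```python
import math


def cov_corr_from_sums(n, sa, saa, sb, sbb, sab):
    """Hypothetical sketch: population covariance and Pearson correlation
    from N, Σa, Σa², Σb, Σb² and Σab."""
    mean_a, mean_b = sa / n, sb / n
    cov = sab / n - mean_a * mean_b
    var_a = saa / n - mean_a ** 2
    var_b = sbb / n - mean_b ** 2
    return cov, cov / math.sqrt(var_a * var_b)


# x5 and x6 are complementary indicators: Σx5 = Σx5² = 83, Σx6 = Σx6² = 67, Σx5·x6 = 0
cov, corr = cov_corr_from_sums(150, 83, 83, 67, 67, 0)
print(round(cov, 6), round(corr, 6))  # -0.247156 -1.0
```

Both printed values agree with the x5/x6 entries of the two tables above.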


### Statistical tests

The z-test and the t-test both assess whether the means of two groups differ significantly. The z-test assumes known population variances or large samples, while the t-test estimates the variances from the data and is preferred for small samples. Both are appropriate for comparing the mean of a numerical variable across the two categories of a binary categorical variable.
The Ztest and Ttest functions perform these statistical tests.

In [15]:
# Z-test of a numerical feature (x1) across the two groups of a binary feature (x8) using the Ztest function in ART-ML

# Rebuild the BET from the full dataset, then keep only the x1/x8 block
BET_iris = bet.create_bet(df_iris)
BET_iris_z_test = BET_iris.loc[['x1','x8']]
BET_iris_z_test = BET_iris_z_test[['x1','x8']]

stats.Ztest(BET_iris_z_test, 'x1', 'x8')

0.17140346776084492
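For a numerical feature x and a binary feature g, both group summaries appear to be recoverable from the single BET cell of the pair: because $g^2 = g$, $\Sigma g$ is the size of the group g = 1, $\Sigma xg$ its sum and $\Sigma (xg)^2$ its sum of squares, and the group g = 0 follows by subtraction from the totals. A minimal sketch of a two-sample z-test built this way; `ztest_from_cell` is a hypothetical name, and whether ART-ML's `Ztest` returns the p-value (as here) or the statistic is not confirmed by the output above.

```python
import math


def ztest_from_cell(n, sx, sxx, n1, sx1, sxx1):
    """Hypothetical sketch of a two-sample z-test from BET-style sums.

    n, sx, sxx    : count, Σx, Σx² over all rows
    n1, sx1, sxx1 : Σg, Σxg, Σ(xg)² for a binary feature g (rows with g = 1)
    Returns (z, two-sided p-value).
    """
    n0, sx0, sxx0 = n - n1, sx - sx1, sxx - sxx1       # rows with g = 0
    mean1, mean0 = sx1 / n1, sx0 / n0
    var1 = (sxx1 - n1 * mean1 ** 2) / (n1 - 1)         # sample variances
    var0 = (sxx0 - n0 * mean0 ** 2) / (n0 - 1)
    z = (mean1 - mean0) / math.sqrt(var1 / n1 + var0 / n0)
    p = math.erfc(abs(z) / math.sqrt(2))               # two-sided normal tail
    return z, p


x = [5.1, 4.9, 6.3, 5.8, 7.1, 6.5, 5.0, 5.7]
g = [1, 1, 0, 0, 0, 0, 1, 1]
z, p = ztest_from_cell(len(x), sum(x), sum(v * v for v in x),
                       sum(g), sum(v * w for v, w in zip(x, g)),
                       sum((v * w) ** 2 for v, w in zip(x, g)))
print(f"z = {z:.3f}, p = {p:.4f}")
```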

### Chi-square Test

The chi-square test determines whether two categorical variables are associated. It compares the observed frequencies (n) in each cell of the contingency table with the frequencies expected if the variables were independent (e), using $\chi^2 = \sum (n - e)^2 / e$. The chi-square distribution with the appropriate degrees of freedom then gives the probability (p-value) of a statistic at least this large under independence. A p-value near zero is strong evidence that the two variables are dependent; a large p-value means the data are consistent with independence, although it does not prove it.

The chi2 function checks the association between categorical variables in real time.

In [16]:
# Chi-square test between categorical features using the chi2 function in ART-ML
stats.chi2(BET_iris, ['x11', 'x12'] , ['x13','x14','x15'])

chi2: 150.0
df: 2
chisqprob: 2.67863696181e-33


2.6786369618080871e-33
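The printed result can be reproduced by hand. A statistic of 150 equals $N \cdot (\min(r, c) - 1)$, the maximum possible for a 2 × 3 table with $N = 150$, which means the petal-width indicator (x11: below 1 cm, x12: 1 cm or more) separates setosa perfectly from the other two species: all 50 setosa fall in x11 and all 100 others in x12. A minimal sketch with NumPy and SciPy; `scipy.stats.chi2.sf` is also the replacement for `chisqprob`, which was removed in SciPy 1.0.

```python
import numpy as np
from scipy.stats import chi2

# Observed counts: rows x11 (petal width < 1), x12 (>= 1);
# columns x13 (setosa), x14 (virginica), x15 (versicolor)
observed = np.array([[50, 0, 0],
                     [0, 50, 50]])

n = observed.sum()
# Expected counts under independence: row total * column total / N
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = chi2.sf(stat, dof)   # upper tail, what chisqprob(stat, dof) used to return

print(stat, dof, p_value)
```

For two degrees of freedom the chi-square upper tail is $e^{-\chi^2/2}$, so $p = e^{-75} \approx 2.68 \times 10^{-33}$, the value ART-ML printed.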

## Real time Algorithms

Once we have a BET, we can use the traditional algorithms built into the ART-ML library to make predictions. Whenever the BET is updated with streaming data, all the models below are updated in real time, so their predictions, and the business insights drawn from them, always reflect the latest data.

### Naive Bayesian

In the real-time version of the Bayesian classifiers, the likelihoods and prior probabilities are calculated from the Basic Element Table (BET), which can be updated in real time with new data.

In [17]:
# Real-time Gaussian naive Bayes on a numerical feature using naive_bayes.bayes_numerical from ART-ML
from artml.models import naive_bayes

BET_iris = bet.create_bet(df_iris)
# Selecting the particular section of BET table (Selecting only particular features for analysis)
BET_iris_NB = BET_iris.loc[['x1','x15']]
BET_iris_NB = BET_iris_NB[['x1','x15']]

bayes = naive_bayes.bayes_numerical()
bayes.fit(BET_iris_NB, [6],'x15')

0.5343249776550095
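A Gaussian naive Bayes posterior for one numerical predictor only needs, for each class, the count, the sum and the sum of squares of the predictor; with a binary target t these are $\Sigma t$, $\Sigma xt$ and $\Sigma (xt)^2$ from the BET cell, with the other class obtained by subtraction from the totals. The `[6]` argument above appears to be the x1 value being scored. The sketch below is a generic Gaussian naive Bayes with population variances; `gaussian_nb_posterior` is a hypothetical name, and it is not guaranteed to reproduce ART-ML's 0.534.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def gaussian_nb_posterior(x, n, sx, sxx, n1, sx1, sxx1):
    """Hypothetical sketch: P(t = 1 | x) for one numerical predictor.

    n, sx, sxx    : count, Σx, Σx² over all rows
    n1, sx1, sxx1 : Σt, Σxt, Σ(xt)² for the binary target t
    """
    n0, sx0, sxx0 = n - n1, sx - sx1, sxx - sxx1
    mean1, mean0 = sx1 / n1, sx0 / n0
    var1 = sxx1 / n1 - mean1 ** 2                    # population variance per class
    var0 = sxx0 / n0 - mean0 ** 2
    joint1 = (n1 / n) * normal_pdf(x, mean1, var1)   # prior * likelihood
    joint0 = (n0 / n) * normal_pdf(x, mean0, var0)
    return joint1 / (joint1 + joint0)


x = [4.8, 5.0, 5.2, 6.8, 7.0, 7.2]
t = [0, 0, 0, 1, 1, 1]
sums = (len(x), sum(x), sum(v * v for v in x),
        sum(t), sum(v * w for v, w in zip(x, t)), sum((v * w) ** 2 for v, w in zip(x, t)))
print(gaussian_nb_posterior(6.0, *sums))  # ≈ 0.5: midway between two symmetric classes
```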

In [18]:
BET_iris = bet.create_bet(df_iris)

In [19]:
# Selecting the particular section of BET table (Selecting only particular features for analysis)
BET_iris_NB = BET_iris.loc[['x5','x15']]
BET_iris_NB = BET_iris_NB[['x5','x15']]
BET_iris_NB

|   | x5 | x15 |
|---|---|---|
| x5 | [150, 83, 83, 83, 83, 150, 83, 83, 83, 83, 83,...] | [150, 83, 83, 83, 83, 150, 50, 50, 50, 50, 26,...] |
| x15 | [150, 50, 50, 50, 50, 150, 83, 83, 83, 83, 26,...] | [150, 50, 50, 50, 50, 150, 50, 50, 50, 50, 50,...] |


In [20]:
# Real-time categorical naive Bayes on a binary feature using naive_bayes.bayes_categorical from ART-ML

bayes = naive_bayes.bayes_categorical()
bayes.fit(BET_iris_NB, [1] ,'x15')

0.3132530120481928
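The printed value can be checked against the x5/x15 BET cell shown above: $N = 150$, $\Sigma x_5 = 83$, $\Sigma x_{15} = 50$ and $\Sigma x_5 x_{15} = 26$ (the eleventh entry of the cell). Bayes' rule gives $P(x_{15}=1 \mid x_5=1) = P(x_5=1 \mid x_{15}=1)\,P(x_{15}=1)/P(x_5=1) = (26/50)(50/150)/(83/150) = 26/83$, exactly the value ART-ML returned, so the call appears to score x5 = 1 for the class x15 = 1. A minimal sketch of the same arithmetic; `categorical_posterior` is a hypothetical name.

```python
from fractions import Fraction


def categorical_posterior(n, n_feature, n_class, n_joint):
    """Hypothetical sketch: P(class = 1 | feature = 1) via Bayes' rule from BET counts."""
    prior = Fraction(n_class, n)               # P(x15 = 1)
    likelihood = Fraction(n_joint, n_class)    # P(x5 = 1 | x15 = 1)
    evidence = Fraction(n_feature, n)          # P(x5 = 1)
    return likelihood * prior / evidence


post = categorical_posterior(n=150, n_feature=83, n_class=50, n_joint=26)
print(post, float(post))  # 26/83 0.3132530120481928
```

Exact fractions make it easy to see that the counts cancel down to $26/83$; in a streaming setting the four counts are simply re-read from the updated BET.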

### LDA

Real-time Linear Discriminant Analysis (LDA) classification can be performed using the pooled covariance matrix $C$ derived from the BET.

Use the `fit` method to train the classifier and the corresponding predict function to classify new data.

In [21]:
# Selecting the particular section of BET table (Selecting only particular features for analysis)
BET_iris_NB = BET_iris.loc[['x1','x4','x15']]
BET_iris_NB = BET_iris_NB[['x1','x4','x15']]
BET_iris_NB

|   | x1 | x4 | x15 |
|---|---|---|---|
| x1 | [150, 876.5000000000002, 5223.849999999998, 31...] | [150, 876.5000000000002, 5223.849999999998, 31...] | [150, 876.5000000000002, 5223.849999999998, 31...] |
| x4 | [150, 179.8000000000001, 302.3000000000001, 56...] | [150, 179.8000000000001, 302.3000000000001, 56...] | [150, 179.8000000000001, 302.3000000000001, 56...] |
| x15 | [150, 50, 50, 50, 50, 150, 876.5000000000002, ...] | [150, 50, 50, 50, 50, 150, 179.8000000000001, ...] | [150, 50, 50, 50, 50, 150, 50, 50, 50, 50, 50,...] |


In [22]:
# Real-time LDA using lda.LinearDiscriminantAnalysis from ART-ML
from artml.models import lda

lda_model = lda.LinearDiscriminantAnalysis()  # a new name avoids shadowing the lda module
lda_model.fit(BET_iris_NB, 'x15')

([5.797000000000002, 1.135000000000001],
 [5.936, 1.3259999999999998],
 array([ 0.13559494, -0.45102473]),
 -0.6931471805599453)
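Two-class LDA needs only the class means and a pooled within-class covariance matrix $C$, both computable from BET sums (each class scatter matrix is $\Sigma x x^\top - n\mu\mu^\top$). The discriminant is $w \cdot x + b$ with $w = C^{-1}(\mu_1 - \mu_0)$ and $b = -\tfrac{1}{2} w \cdot (\mu_1 + \mu_0) + \ln(\pi_1/\pi_0)$. In the output above, the first two lists appear to be the means of (x1, x4) for x15 = 0 and x15 = 1, and the last value, $-0.6931$, equals $\ln(50/100)$, the log ratio of the class priors. A minimal NumPy sketch on raw arrays for clarity; `fit_lda` and `predict_lda` are hypothetical names, and the pooled covariance here uses the divisor $n - 2$, which ART-ML may not.

```python
import numpy as np


def fit_lda(X, y):
    """Hypothetical sketch of two-class LDA with a pooled covariance matrix."""
    X, y = np.asarray(X, float), np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    # Pooled within-class covariance (unbiased divisor n0 + n1 - 2)
    scatter0 = (X0 - mu0).T @ (X0 - mu0)
    scatter1 = (X1 - mu1).T @ (X1 - mu1)
    C = (scatter0 + scatter1) / (n0 + n1 - 2)
    w = np.linalg.solve(C, mu1 - mu0)
    b = -0.5 * w @ (mu1 + mu0) + np.log(n1 / n0)   # log prior ratio term
    return w, b


def predict_lda(X, w, b):
    """Class 1 where the discriminant w·x + b is positive."""
    return (np.asarray(X, float) @ w + b > 0).astype(int)


rng = np.random.default_rng(42)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(100, 2)),
               rng.normal([3.0, 3.0], 1.0, size=(50, 2))])
y = np.array([0] * 100 + [1] * 50)
w, b = fit_lda(X, y)
accuracy = (predict_lda(X, w, b) == y).mean()
print(f"training accuracy: {accuracy:.3f}")
```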

### MLR

Multiple Linear Regression (MLR) models the linear relationship between a target (dependent variable) and one or more attributes (independent variables). Real-time MLR is built from covariance matrices derived from the BET, so the model is updated whenever the BET is.

In [23]:
# Modeling with Real time MLR algorithm using MLR function from ART-ML
BET_iris_MLR = BET_iris.loc[['x1','x2','x3','x4']]
BET_iris_MLR = BET_iris_MLR[['x1','x2','x3','x4']]

from artml.models import MLR
linreg = MLR.LinearRegression()
linreg.fit(BET_iris_MLR,'x4')

(-0.24872358602445588, array([-0.21027133,  0.22877721,  0.52608818]))
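Regression coefficients follow from the covariance matrix alone: $\beta = C_{xx}^{-1} c_{xy}$, where $C_{xx}$ is the covariance matrix of the predictors and $c_{xy}$ holds their covariances with the target, and the intercept is $\bar{y} - \beta \cdot \bar{x}$. The output above appears to list the intercept (−0.2487) followed by the coefficients of x1, x2 and x3. A minimal sketch; `mlr_from_covariance` is a hypothetical name. The covariance divisor cancels in $C_{xx}^{-1} c_{xy}$, so population and sample covariances give the same $\beta$.

```python
import numpy as np


def mlr_from_covariance(cov, means, target):
    """Hypothetical sketch: OLS intercept and slopes from a covariance matrix.

    cov    : covariance matrix of all columns (predictors and target)
    means  : vector of column means
    target : index of the target column
    """
    cov, means = np.asarray(cov, float), np.asarray(means, float)
    idx = [i for i in range(len(means)) if i != target]
    c_xx = cov[np.ix_(idx, idx)]          # predictor covariances
    c_xy = cov[idx, target]               # predictor-target covariances
    beta = np.linalg.solve(c_xx, c_xy)
    intercept = means[target] - beta @ means[idx]
    return intercept, beta


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 0.5 + X @ np.array([1.0, -2.0, 0.3]) + rng.normal(scale=0.1, size=200)
data = np.column_stack([X, y])

intercept, beta = mlr_from_covariance(np.cov(data, rowvar=False, bias=True),
                                      data.mean(axis=0), target=3)
print(intercept, beta)
```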

### PCA

Principal Component Analysis (PCA) is a classical statistical method; this linear transform is widely used in data analysis and data compression. The principal components (eigenvectors) of a dataset can be extracted directly from its covariance matrix, so real-time PCA only needs the covariance matrix derived from the BET.

In [24]:
# Real time PCA  using PCA function from ART-ML
from artml.models import realtimePCA

PCA = realtimePCA.PCA()
PCA.fit(BET_iris_MLR)

Eigenvectors: 
[[ 0.36158968 -0.65653988 -0.58099728  0.31725455]
 [-0.08226889 -0.72971237  0.59641809 -0.32409435]
 [ 0.85657211  0.1757674   0.07252408 -0.47971899]
 [ 0.35884393  0.07470647  0.54906091  0.75112056]]

Eigenvalues: 
[ 4.19667516  0.24062861  0.07800042  0.02352514]

Eigenvalues in descending order:
4.1966751632
0.240628614483
0.0780004153735
0.0235251402785
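Real-time PCA reduces to an eigendecomposition of the covariance matrix. In the output above the eigenvectors appear to be the columns of the matrix (NumPy's convention). The eigenvalues sum to about 4.5388, the trace of the covariance matrix of x1 to x4 (0.681122 + 0.186751 + 3.092425 + 0.578532), so the first component explains roughly 92.5% of the variance. A minimal sketch using `numpy.linalg.eigh`, which is suited to symmetric matrices; `pca_from_covariance` is a hypothetical name.

```python
import numpy as np


def pca_from_covariance(cov):
    """Hypothetical sketch: eigenvalues (descending), eigenvectors (columns), explained share."""
    eigenvalues, eigenvectors = np.linalg.eigh(np.asarray(cov, float))
    order = np.argsort(eigenvalues)[::-1]           # eigh returns ascending order
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    explained = eigenvalues / eigenvalues.sum()     # share of total variance
    return eigenvalues, eigenvectors, explained


rng = np.random.default_rng(7)
data = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))
cov = np.cov(data, rowvar=False, bias=True)
values, vectors, explained = pca_from_covariance(cov)
print(values, explained)
```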


### SVM Algorithm

SVM can be performed in real time using the linear Proximal SVM algorithm of Fung and Mangasarian.

Use `svm.SVR` for SVM regression and `svm.SVC` for SVM classification.

In [25]:
# Real-time SVM regression using svm.SVR from ART-ML
BET_iris_MLR = BET_iris.loc[['x1','x2','x3','x4']]
BET_iris_MLR = BET_iris_MLR[['x1','x2','x3','x4']]

from artml.models import svm
svr = svm.SVR()  # a new name avoids shadowing the svm module
svr.fit(BET_iris_MLR, 'x4', 2)

[1127.6500000000003, 531.5300000000002, 868.97, -179.8000000000001]


array([-0.20555047,  0.2064008 ,  0.52052792,  0.18645092])

In [26]:
# Real-time SVM classification using svm.SVC from ART-ML (BET_iris_NB holds the x1, x4, x15 block selected for LDA)

from artml.models import svm
svc = svm.SVC()

svc.fit(BET_iris_NB, 'x15')

array([-0.060328  ,  0.20023818,  0.22082574])
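In the linear proximal SVM of Fung and Mangasarian, the classifier $\operatorname{sign}(x \cdot w - \gamma)$ comes from one linear system, $[w; \gamma] = (I/\nu + E^\top E)^{-1} E^\top D e$, where $E = [A \;\; -e]$ appends a column of −1s to the data matrix $A$, $D$ holds the ±1 labels on its diagonal, and $\nu$ is a regularization weight. $E^\top E$ contains only feature sums and sums of products, and $E^\top D e$ only label-weighted sums, so both can be accumulated batch by batch like a BET. The list printed by the SVM regression call above looks like the analogous $E^\top y$: its entries match $\Sigma x_1 x_4$, $\Sigma x_2 x_4$, $\Sigma x_3 x_4$ and $-\Sigma x_4$ (for example $150 \times (0.513458 + 5.843333 \times 1.198667) \approx 1127.65$). A minimal NumPy sketch with labels in {−1, +1}; `psvm_fit` and `psvm_predict` are hypothetical names, and ART-ML's parameterization may differ.

```python
import numpy as np


def psvm_fit(A, d, nu=1.0):
    """Hypothetical sketch of the linear proximal SVM closed form.

    A  : (m, n) data matrix
    d  : length-m labels in {-1, +1}
    nu : regularization weight (larger nu fits the training data more tightly)
    """
    A, d = np.asarray(A, float), np.asarray(d, float)
    E = np.hstack([A, -np.ones((len(A), 1))])     # E = [A  -e]
    gram = E.T @ E                                # sums of products: additive over batches
    rhs = E.T @ d                                 # E^T D e = E^T d: also additive
    solution = np.linalg.solve(np.eye(E.shape[1]) / nu + gram, rhs)
    return solution[:-1], solution[-1]            # w, gamma


def psvm_predict(A, w, gamma):
    return np.sign(np.asarray(A, float) @ w - gamma)


rng = np.random.default_rng(3)
A = np.vstack([rng.normal(-2.0, 1.0, size=(60, 2)), rng.normal(2.0, 1.0, size=(60, 2))])
d = np.array([-1.0] * 60 + [1.0] * 60)
w, gamma = psvm_fit(A, d, nu=10.0)
accuracy = (psvm_predict(A, w, gamma) == d).mean()
print(f"training accuracy: {accuracy:.3f}")
```

Because only $E^\top E$ and $E^\top d$ enter the solve, a new batch can be folded in by adding its two products to the running totals and re-solving a small (n+1) × (n+1) system.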

That's it! Thanks for reading. Further exercises explore real-world datasets and case studies where ART-ML can offer advantages over traditional approaches and generate insights of real business value.

-------------------------------------------------------------------------------------------------------------------------------