# Lecture 7 - Advanced Power BI visualisation

Introducing advanced Power BI visualisations

#### Linear Regression (Craydec regression in Power BI)

The Craydec Regression chart is a scatter plot with a simple linear regression fit. The visual calculates Pearson's correlation coefficient and the $R^2$ value, and it draws the regression equation as a straight line (an abline) on the chart.

![Craydec_regression.PNG](attachment:Craydec_regression.PNG)

Let's understand linear regression. In the simplest case, the regression model assumes a linear relationship between the forecast variable $y$ and a single predictor variable $x$:

$$y_{t} = \beta _{0} + \beta _{1} * x_{t} + \varepsilon_{t}$$

The coefficients $\beta _{0}$ and $\beta _{1}$ denote the intercept and the slope of the line respectively. The intercept $\beta _{0}$ represents the predicted value of $y$ when $x = 0$. The slope $\beta _{1}$ represents the average predicted change in $y$ resulting from a one unit increase in $x$.

![linear_reg.PNG](attachment:linear_reg.PNG)

Notice that the observations do not lie on the straight line but are scattered around it. We can think of each observation $y_{t}$ as consisting of the systematic or explained part of the model, $\beta _{0} + \beta _{1} * x_{t}$, and the random “error”, $\varepsilon_{t}$. The “error” term does not imply a mistake, but a deviation from the underlying straight line model. It captures anything that may affect $y_{t}$ other than $x_{t}$.
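The quantities the regression visual reports can be reproduced by hand. The following is a minimal sketch, assuming ordinary least squares is used to estimate $\beta_{0}$ and $\beta_{1}$; the function name `simple_linear_regression` and the sample data are made up for illustration and are not taken from the Craydec visual.

```python
import math


def simple_linear_regression(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares.

    Returns the intercept b0, the slope b1, Pearson's r and R squared.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx                   # slope
    b0 = mean_y - b1 * mean_x        # intercept
    r = sxy / math.sqrt(sxx * syy)   # Pearson's correlation coefficient
    return b0, b1, r, r * r


# Illustrative data only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1, r, r2 = simple_linear_regression(x, y)
# The residuals play the role of the error term for each observation
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(f"intercept={b0:.3f} slope={b1:.3f} r={r:.4f} R2={r2:.4f}")
```

With a single predictor, $R^2$ is simply the square of Pearson's $r$, and the residuals of a least-squares fit with an intercept sum to zero.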

#### Clustering with outlier detection

![pb_clustering.PNG](attachment:pb_clustering.PNG)

In the real world, data is often not easy to separate, and patterns are not usually obvious. Clustering helps you find similarity groups in your data, and it is one of the most common tasks in data science. Finding the "outliers", which are the observations isolated from the rest of your data, is often a difficult analytics task in its own right. This explains why density-based clustering, which finds similarity groups and outliers in your data simultaneously, is one of the most common clustering approaches.

Some useful features:

* Define the fields to be used in clustering (two or more numerical variables)
* Optionally, provide the labels to be shown on top of each observation
* If the dimensionality of data is higher than two, consider data preprocessing
* The DBSCAN algorithm requires two parameters to control the granularity of clusters. They can be set manually by the user (recommended) or automatically by the underlying algorithm

R package dependencies (auto-installed): scales, fpc, car, dbscan
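To make the idea of finding clusters and outliers in a single pass concrete, here is a minimal pure-Python sketch of DBSCAN. It is not the code behind the Power BI visual (which relies on the R packages listed above). The two parameters are named `eps` (neighbourhood radius) and `min_pts` (minimum neighbours for a dense region) here, and the sample points are invented for illustration.

```python
import math

NOISE = -1  # label used for outliers


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself)
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1           # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)   # j is also a core point: keep growing
    return labels


# Two dense groups and one isolated observation (illustrative data)
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
          (9.0, 0.5)]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)
```

A smaller `eps` or a larger `min_pts` yields more, tighter clusters and more points flagged as outliers, which is what "granularity" refers to in the feature list.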

#### Decision tree visualisation

A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements.

Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal, but are also a popular tool in machine learning.

A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes). The paths from root to leaf represent classification rules.

In decision analysis, a decision tree and the closely related influence diagram are used as a visual and analytical decision support tool, where the expected values (or expected utility) of competing alternatives are calculated.

![detree1.PNG](attachment:detree1.PNG)

Several algorithms exist for deciding how to split a node into two or more sub-nodes. Each split is chosen so that the resulting sub-nodes are more homogeneous than their parent. In other words, the purity of the nodes increases with respect to the target variable. The decision tree evaluates candidate splits on all available variables and then selects the split that results in the most homogeneous sub-nodes.

The choice of algorithm also depends on the type of the target variable. Refer [here](http://www.ashukumar27.io/Decision-Trees-splitting/) for the detailed algorithms.
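The split-selection step described above can be sketched in a few lines. This example assumes a categorical target and uses Gini impurity as the homogeneity measure, which is only one of several possible criteria. The helper names `gini` and `best_split` and the tiny data set are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure node, larger for mixed nodes."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, target):
    """Try every numeric feature and threshold; keep the purest split.

    Returns (feature, threshold, weighted impurity of the two sub-nodes).
    """
    best = None
    features = [key for key in rows[0] if key != target]
    for feature in features:
        values = sorted({row[feature] for row in rows})
        # Candidate thresholds lie halfway between consecutive values
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [r[target] for r in rows if r[feature] <= threshold]
            right = [r[target] for r in rows if r[feature] > threshold]
            score = (len(left) * gini(left)
                     + len(right) * gini(right)) / len(rows)
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


# Illustrative data only
rows = [
    {"age": 22, "income": 30, "buys": "no"},
    {"age": 25, "income": 60, "buys": "no"},
    {"age": 30, "income": 35, "buys": "no"},
    {"age": 41, "income": 32, "buys": "yes"},
    {"age": 47, "income": 65, "buys": "yes"},
    {"age": 52, "income": 40, "buys": "yes"},
]
feature, threshold, score = best_split(rows, target="buys")
print(feature, threshold, score)
```

A full tree would apply `best_split` recursively to each sub-node until the nodes are pure or a stopping rule is reached.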

#### Time Series Forecasting Chart

![pb_forecast.PNG](attachment:pb_forecast.PNG)

Time series forecasting is the use of a model to predict future values based on previously observed values. This visual uses an [exponential smoothing](https://en.wikipedia.org/wiki/Exponential_smoothing) model to do so.

In exponential smoothing forecasting methods, a prediction is a weighted sum of past observations, and the model explicitly uses exponentially decreasing weights as the observations get older.

Time series forecasting is one of the prime tools of any business analyst, used to predict demand and inventory and to plan budgets, sales quotas, marketing campaigns and procurement. Accurate forecasts lead to better decisions. The current visual implements the well-known exponential smoothing method for forecasting. The prediction is based on trend and seasonality modelling. You can control the algorithm parameters and the visual attributes to suit your needs.

Some useful features:

* The underlying algorithm requires the input data to be an equally spaced time series
* The seasonal factor can be found automatically or set by the user
* The choice of additive or multiplicative effect for each component can be found automatically or set by the user

R package dependencies (auto-installed): graphics, scales, forecast, zoo, ggplot2, htmlWidgets, XML, plotly
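The exponentially decreasing weights mentioned above are easiest to see in the simplest variant, simple exponential smoothing, which models only a level. The sketch below is an illustration under that simplification and does not include the trend and seasonality components the visual supports. The smoothing parameter `alpha` and the function names are assumptions made for this example.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    The last level is the one-step-ahead forecast.
    """
    level = series[0]  # initialise the level with the first observation
    levels = [level]
    for y in series[1:]:
        # New level = weighted average of the newest value and the old level
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels


def implied_weights(alpha, n):
    """Weights placed on the n most recent observations, newest first."""
    return [alpha * (1 - alpha) ** k for k in range(n)]


# Illustrative data only
series = [10, 12, 11, 13]
levels = simple_exponential_smoothing(series, alpha=0.5)
forecast = levels[-1]
weights = implied_weights(0.5, 3)

print(levels)
print(forecast)
print(weights)
```

Each step back in time multiplies the weight by `1 - alpha`, so a larger `alpha` makes the forecast react faster to recent observations.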

#### Forecast Using Multiple Models by MAQ Software

![pb-multi-model.PNG](attachment:pb-multi-model.PNG)

Forecast Using Multiple Models by MAQ Software lets you implement four different forecasting models to learn from historical data and predict future values. The forecasting models include [Linear Regression](https://en.wikipedia.org/wiki/Linear_regression), [ARIMA](https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average), Exponential Smoothing, and [Neural Network](https://otexts.com/fpp2/nnetar.html). The book [Forecasting: Principles and Practice](https://otexts.com/fpp2/) is highly recommended for more statistical background.

This visual is excellent for forecasting budgets, sales, demand, or inventory.

R package dependencies (auto-installed): forecast, plotly, zoo, lubridate.

Some useful features:

* Use four different forecasting methods/models.
* Manually adjust the parameters of the learning model.
* Supports a wide range of date and time formats.
* Forecast options include the choice of algorithm, showing or hiding confidence intervals, deciding on the split point, and applying data transformation.
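One way to choose between forecasting models is to hold out the most recent observations and compare errors. The sketch below assumes that the "split point" divides the history into a training part and a test part. It compares only two simplified models (a linear trend and simple exponential smoothing) and leaves out ARIMA and neural networks. All function names and the data are hypothetical and are not taken from the MAQ Software visual.

```python
def linear_trend_forecast(train, horizon):
    """Fit y = a + b * t by least squares and extrapolate horizon steps."""
    n = len(train)
    t_mean = (n - 1) / 2
    y_mean = sum(train) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(train))
             / sum((t - t_mean) ** 2 for t in range(n)))
    intercept = y_mean - slope * t_mean
    return [intercept + slope * (n + h) for h in range(horizon)]


def ses_forecast(train, horizon, alpha=0.5):
    """Simple exponential smoothing: a flat forecast at the final level."""
    level = train[0]
    for y in train[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative data with a steady upward trend
series = [100, 104, 108, 112, 116, 120, 124, 128, 132, 136]
split_point = 7  # first 7 values for training, the rest for testing
train, test = series[:split_point], series[split_point:]

errors = {
    "linear trend": mean_absolute_error(
        test, linear_trend_forecast(train, len(test))),
    "exponential smoothing": mean_absolute_error(
        test, ses_forecast(train, len(test))),
}
best_model = min(errors, key=errors.get)
print(errors)
print(best_model)
```

On strongly trending data a level-only smoother lags behind, which is why the comparison on held-out data favours the trend model here. With different data the ranking can change.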