<a href="https://colab.research.google.com/github/ziababar/DataDen/blob/master/DataVisualization/Example4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Visualizing linear relationships

*Credit: These examples were originally sourced from the Seaborn [tutorials](https://seaborn.pydata.org/tutorial.html)*


Many datasets contain multiple quantitative variables, and the goal of an analysis is often to relate those variables to each other. We previously discussed functions that can accomplish this by showing the joint distribution of two variables. It can be very helpful, though, to use statistical models to estimate a simple relationship between two noisy sets of observations. The functions discussed in this chapter will do so through the common framework of linear regression.
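
As a rough illustration of that idea, the sketch below fits a straight line to two noisy sets of observations with `numpy.polyfit`. The data are synthetic, and the slope, intercept, and noise level are assumptions chosen only for illustration; seaborn's plotting functions are not involved here.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y depends linearly on x, plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

# Ordinary least-squares fit of a degree-1 polynomial (a straight line).
slope, intercept = np.polyfit(x, y, deg=1)
print(f"estimated slope: {slope:.2f}, estimated intercept: {intercept:.2f}")
```

The estimates should land close to the values used to generate the data (2.0 and 1.0), and the gap tends to shrink as the noise decreases or more observations are added.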

## Setting things up

Set up the environment and import the necessary libraries.

In [0]:
import seaborn as sns
sns.__version__

In [0]:
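# Pin seaborn to the version these examples were written for.
# If a different version was already imported above, restart the runtime afterwards.
!pip install seaborn=="0.9.0"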

In [0]:
import numpy as np

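Set various aesthetic parameters in one step. Passing color_codes=True remaps matplotlib's shorthand color codes (such as "b" and "g") to the seaborn palette. Details of the possible parameters can be found at https://seaborn.pydata.org/generated/seaborn.set.html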

In [0]:
sns.set(color_codes=True)

## Loading and Creating the Datasets

Several datasets are used in these examples; load them now.

In [0]:
tips = sns.load_dataset("tips")
anscombe = sns.load_dataset("anscombe")

In [0]:
tips.head(10)

In [0]:
anscombe.head(10)

## Linear Regression Models

Two main functions are used to visualize a linear relationship as determined through regression.

*   regplot()
> * draws a scatterplot of two variables and overlays that with a regression model (including a confidence interval).
> * accepts the x and y variables in a variety of formats, including simple numpy arrays, pandas Series objects, or references to variables in a pandas DataFrame object passed to the data parameter.
> * always shows a single relationship between two variables.

*   lmplot()
> * draws a scatterplot of two variables and overlays that with a regression model (including a confidence interval).
> * has data as a required parameter and the x and y variables must be specified as strings.
> * combines regplot() with FacetGrid to provide an easy interface for showing a linear regression on “faceted” plots, which let you explore interactions with up to three additional categorical variables (through the hue, col, and row parameters).
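
The difference in accepted inputs and return types between the two functions can be sketched as follows. The data are synthetic and the column names "x" and "y" are hypothetical stand-ins for a real dataset; treat this as a minimal illustration, not a complete tour of either function.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for a real dataset (hypothetical columns "x" and "y").
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 0.5 * x + rng.normal(scale=1.0, size=x.size)
df = pd.DataFrame({"x": x, "y": y})

# regplot() accepts plain numpy arrays and draws onto a single matplotlib Axes.
ax = sns.regplot(x=x, y=y)

# lmplot() needs a DataFrame plus column names, and returns a FacetGrid.
grid = sns.lmplot(x="x", y="y", data=df)

print(type(ax).__name__, type(grid).__name__)
```

Because lmplot() returns a FacetGrid, it manages its own figure, whereas regplot() can be drawn onto an existing Axes alongside other plots.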

### Example 1

In [0]:
# Scatterplot of tip against total bill, with a fitted regression line and confidence band.
sns.regplot(x="total_bill", y="tip", data=tips);

### Example 2

In [0]:
# The same relationship drawn with lmplot(), which builds its own figure (a FacetGrid).
sns.lmplot(x="total_bill", y="tip", data=tips);

### Example 3

In [0]:
# hue fits a separate regression for smokers and non-smokers on the same axes.
sns.lmplot(x="total_bill", y="tip", hue="smoker", data=tips);

### Example 4

In [0]:
# col adds one panel per value of time (Lunch, Dinner), keeping the hue split within each panel.
sns.lmplot(x="total_bill", y="tip", hue="smoker", col="time", data=tips);

### Example 5

In [0]:
# size (party size) takes only a few discrete values, so the points stack into vertical strips.
sns.lmplot(x="size", y="tip", data=tips);

### Example 6

In [0]:
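# x_estimator=np.mean plots the mean tip (with a confidence interval) for each party size instead of every point.
sns.lmplot(x="size", y="tip", data=tips, x_estimator=np.mean);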

## Non-Linear Regression Models

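lmplot() and regplot() can also fit models other than a simple straight line. Setting order to a value greater than 1 fits a polynomial regression, which captures curved trends in the dataset (Example 7). Setting robust=True fits a robust regression, which down-weights outliers that would otherwise pull the line away from the bulk of the data (Example 8).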
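
In lmplot() and regplot() the polynomial degree is set through the order parameter, and the seaborn documentation describes values greater than 1 as being estimated with `numpy.polyfit`. The sketch below calls `numpy.polyfit` directly to compare a straight line with a quadratic on Anscombe's dataset II. The values are typed in by hand so that nothing needs to be downloaded, and `residual_sum_of_squares` is a hypothetical helper introduced only for this illustration.

```python
import numpy as np

# Anscombe's quartet, dataset II, typed in by hand so the sketch needs no download.
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

def residual_sum_of_squares(degree):
    """Fit a polynomial of the given degree and return its total squared error."""
    coefficients = np.polyfit(x, y, deg=degree)
    fitted = np.polyval(coefficients, x)
    return float(np.sum((y - fitted) ** 2))

rss_linear = residual_sum_of_squares(1)
rss_quadratic = residual_sum_of_squares(2)
print(f"linear RSS: {rss_linear:.3f}, quadratic RSS: {rss_quadratic:.6f}")
```

The quadratic fit leaves far less unexplained variation than the straight line, which is why a second-order model suits this dataset.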

### Example 7

In [0]:
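# Dataset II follows a curve. Uncomment the order=1 line to see how poorly a straight line fits it.
#sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'II'"), order=1, ci=None, scatter_kws={"s": 80});
# order=2 fits a second-degree polynomial instead.
sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'II'"), order=2, ci=None, scatter_kws={"s": 80});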

### Example 8

In [0]:
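# Dataset III is linear apart from one outlier. Uncomment the robust=False line to see the outlier tilt the fit.
#sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'III'"), robust=False, ci=None, scatter_kws={"s": 80});
# robust=True down-weights the outlier; this option requires the statsmodels package.
sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'III'"), robust=True, ci=None, scatter_kws={"s": 80});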