<a href="https://colab.research.google.com/github/ziababar/DataDen/blob/master/DataVisualization/Example1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Visualizing Statistical Relationships


*Credit: These examples were originally sourced from the Seaborn [tutorials](https://seaborn.pydata.org/tutorial.html).*



Statistical analysis is a process of understanding how variables in a dataset relate to each other and how those relationships depend on other variables. Visualization can be a core component of this process because, when data are visualized properly, the human visual system can see trends and patterns that indicate a relationship.

## Setting things up

Perform the initial setup of the environment and import the necessary libraries.

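In [0]:
# Pin the seaborn version first; scatterplot and relplot were added in 0.9.0.
!pip install seaborn=="0.9.0"

In [0]:
# Import after installing so the pinned version is the one that gets loaded.
import seaborn as sns
sns.__version__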

In [0]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Set several aesthetic parameters in one step. The available parameters are documented at https://seaborn.pydata.org/generated/seaborn.set.html

In [0]:
sns.set(style="darkgrid")

## Loading and Creating the Datasets

Several datasets are used in these examples; load or create them now.

In [0]:
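# Example datasets that ship with seaborn (downloaded on first use)
tips = sns.load_dataset("tips")
fmri = sns.load_dataset("fmri")
# Synthetic random walk: cumulative sum of 500 standard-normal steps
df_linear = pd.DataFrame(dict(time=np.arange(500), value=np.random.randn(500).cumsum()))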

In [0]:
tips.head(10)

In [0]:
fmri.head(10)

In [0]:
df_linear.head(10)
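
The `df_linear` frame is a random walk: each `value` is the running sum of independent standard-normal steps. Because the steps are drawn without a seed, the table differs on every run. A minimal sketch of a reproducible variant follows, assuming a fixed seed of 42 and a hypothetical helper named `make_random_walk` (neither appears in the original notebook):

```python
import numpy as np
import pandas as pd


def make_random_walk(n_steps=500, seed=42):
    """Build a time/value frame like df_linear, but reproducibly.

    The seed and the function name are illustrative assumptions.
    """
    rng = np.random.RandomState(seed)   # local generator; global state is untouched
    steps = rng.randn(n_steps)          # independent standard-normal increments
    return pd.DataFrame(dict(time=np.arange(n_steps), value=steps.cumsum()))


df_walk = make_random_walk()
print(df_walk.head())
```

Calling `make_random_walk()` twice returns identical frames, which makes the line plot in the later examples repeatable between runs.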

## Visualization using Scatterplots

Scatterplots are used to illustrate the joint distribution of two variables using a cloud of points, where each point represents one observation in the dataset.

### Example 1

In [0]:
# Basic scatterplot: tip amount against the total bill
sns.scatterplot(x="total_bill", y="tip", data=tips);

### Example 2

In [0]:
# Color the points by a third variable (smoker status)
sns.scatterplot(x="total_bill", y="tip", hue="smoker", data=tips);

### Example 3

In [0]:
# Also vary the marker style by smoker status, so the groups stay distinguishable without color
sns.scatterplot(x="total_bill", y="tip", hue="smoker", style="smoker", data=tips);

### Example 4

In [0]:
# Scale the marker size by party size (the "size" column), ranging from 15 to 200
sns.scatterplot(x="total_bill", y="tip", size="size", sizes=(15, 200), data=tips);

## Visualization using Line Plots

In some datasets, one variable changes continuously as a function of another, typically time. Such data are better shown as a line plot, with time on one axis.

### Example 5

In [0]:
g = sns.relplot(x="time", y="value", kind="line", data=df_linear)

### Example 6

Some datasets have multiple observations at each time point. By default, these are aggregated at each time point: the mean is plotted as the line, and a 95% confidence interval is drawn as a shaded band around it.

In [0]:
sns.relplot(x="timepoint", y="signal", kind="line", data=fmri);

In [0]:
# ci=None turns off the confidence band and draws only the mean line
sns.relplot(x="timepoint", y="signal", ci=None, kind="line", data=fmri);
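
To make the aggregation in Example 6 concrete, the sketch below computes by hand roughly what the line and band represent: the mean of the observations at each time point and a bootstrapped 95% confidence interval around it. It uses a small synthetic frame rather than `fmri`. The helper `mean_with_ci`, the number of bootstrap resamples, and the seeds are illustrative assumptions, not seaborn's internal implementation.

```python
import numpy as np
import pandas as pd


def mean_with_ci(values, n_boot=1000, ci=95, seed=0):
    """Return (mean, low, high) using a percentile bootstrap of the mean."""
    rng = np.random.RandomState(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement and record the mean of each resample.
    idx = rng.randint(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    # Keep the central `ci` percent of the bootstrap distribution.
    low, high = np.percentile(boot_means, [(100 - ci) / 2, 100 - (100 - ci) / 2])
    return values.mean(), low, high


# Synthetic stand-in for fmri: 20 noisy observations at each of 5 time points.
rng = np.random.RandomState(1)
timepoints = np.repeat(np.arange(5), 20)
signal = np.sin(timepoints) + rng.normal(scale=0.3, size=timepoints.size)
demo = pd.DataFrame(dict(timepoint=timepoints, signal=signal))

# Aggregate per time point, as the line plot does before drawing.
rows = []
for t, group in demo.groupby("timepoint"):
    mean, low, high = mean_with_ci(group["signal"])
    rows.append(dict(timepoint=t, mean=mean, ci_low=low, ci_high=high))
summary = pd.DataFrame(rows)
print(summary)
```

In this picture, the `mean` column corresponds to the plotted line and `ci_low`/`ci_high` to the edges of the shaded band; with `ci=None` only the mean would be drawn.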