# Scatter Matrix
* A scatterplot matrix is a grid of plots built from *n* numerical arrays (data variables), *X1, X2, ..., Xn*, of the same length.
* The cell *(i,j)* of such a matrix displays the scatter plot of the variable *Xi* versus *Xj*.
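
To make the cell layout concrete, here is a minimal plain-Python sketch that enumerates the cells of a scatterplot matrix. The helper name `scatter_matrix_cells` is hypothetical and not part of Plotly; it only illustrates the indexing convention described above.

```python
from itertools import product


def scatter_matrix_cells(names):
    """Map each 1-based cell (i, j) to the pair of variables it displays.

    Hypothetical helper: cell (i, j) shows names[i-1] versus names[j-1].
    """
    n = len(names)
    return {
        (i, j): (names[i - 1], names[j - 1])
        for i, j in product(range(1, n + 1), repeat=2)
    }


cells = scatter_matrix_cells(["X1", "X2", "X3"])
print(len(cells))     # n * n cells in total
print(cells[(1, 2)])  # the pair of variables shown in cell (1, 2)
```

The diagonal cells pair each variable with itself, which is why plotting libraries often let you hide the diagonal or replace it with another kind of plot.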

## Using Plotly Express

In [None]:
# Plot the scatter matrix for the columns of the dataframe. 
# By default, all columns are considered.
import plotly.express as px
df = px.data.iris()
fig = px.scatter_matrix(df)
fig.show()
#df

Specify the columns to be represented with the *dimensions* argument, and set colors using a column of the dataframe:

In [None]:
import plotly.express as px
df = px.data.iris()
fig = px.scatter_matrix(df,
    dimensions=["sepal_width", "sepal_length", "petal_width", "petal_length"],
    color="species")
fig.show()

## Styled Scatter Matrix with Plotly Express

In [None]:
import plotly.express as px
df = px.data.iris()
fig = px.scatter_matrix(df,
    dimensions=["sepal_width", "sepal_length", "petal_width", "petal_length"],
    color="species", symbol="species",
    title="Scatter matrix of iris data set",
    labels={col:col.replace('_', ' ') for col in df.columns}) # remove underscore
fig.update_traces(diagonal_visible=False)
fig.show()

## Scatter Matrix with Graph Objects
* It is also possible to use the more generic `go.Splom` trace class.
* The Plotly *splom* trace does not require setting *x=Xi* and *y=Xj* for each scatter plot.
* All arrays, *X1, X2, ..., Xn*, are passed once, through a list of dicts called *dimensions*; each array/variable represents one dimension.
* A trace of type splom is defined as follows:

        trace = go.Splom(
            dimensions=[
                dict(label='string-1', values=X1),
                dict(label='string-2', values=X2),
                ...
                dict(label='string-n', values=Xn),
            ],
            ...
        )

* The label of each dimension is used as the axis title of the corresponding matrix cells.
* Full reference: https://plot.ly/python/reference/#splom
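
As a rough illustration of the *dimensions* structure, the sketch below builds the list of dicts from a mapping of labels to arrays. The helper `make_dimensions` is hypothetical (it is not a Plotly function), and the same-length check is an assumption added here to mirror the requirement stated at the top of this notebook.

```python
def make_dimensions(columns):
    """Build a list of dicts, one per variable, for a splom trace.

    `columns` maps each label to its array of values (hypothetical helper).
    """
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all arrays must have the same length")
    return [
        dict(label=label, values=list(values))
        for label, values in columns.items()
    ]


dimensions = make_dimensions({
    "string-1": [1.0, 2.0, 3.0],
    "string-2": [4.0, 5.0, 6.0],
})
print(dimensions[0])
```

The resulting list could then be passed as `go.Splom(dimensions=dimensions)`.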

## Splom of the Iris data set

In [None]:
import plotly.graph_objects as go
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/iris-data.csv')

# The Iris dataset contains four data variables, sepal length, sepal width, petal length,
# petal width, for 150 iris flowers. The flowers are labeled as `Iris-setosa`,
# `Iris-versicolor`, `Iris-virginica`.

# Define indices corresponding to flower categories, using pandas label encoding
index_vals = df['class'].astype('category').cat.codes

fig = go.Figure(data=go.Splom(
                dimensions=[dict(label='sepal length', values=df['sepal length'], visible=True),
                            dict(label='sepal width',  values=df['sepal width'], visible=True),
                            dict(label='petal length', values=df['petal length'], visible=True),
                            dict(label='petal width',  values=df['petal width'], visible=True)],
                            # To drop a variable from the splom, set visible=False in its
                            # dimension. The grid keeps the same number of cells, but the
                            # cells in that dimension's row and column are left empty.
                text=df['class'],
                marker=dict(color=index_vals,
                            showscale=False, # colors encode categorical variables
                            line_color='white', line_width=0.5),
                # diagonal_visible=True,  # set to False to hide the plots on the diagonal
                # showlowerhalf and showupperhalf both default to True; set one of them
                # to False to plot only the other half of the splom:
                # showupperhalf=False,
                )
               )


fig.update_layout(
    title='Iris Data set',
    dragmode='select',
    width=600,
    height=600,
    hovermode='closest',
)

fig.show()
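
The `index_vals` line in the cell above turns the class labels into integer codes, which are then used as marker colors. The plain-Python sketch below shows roughly what that encoding produces. It assumes, as pandas does for sortable values, that the categories are ordered by sorting the unique labels; the function name `label_codes` is hypothetical.

```python
def label_codes(labels):
    """Encode labels as integers, using the sorted unique labels as categories."""
    categories = sorted(set(labels))
    lookup = {category: code for code, category in enumerate(categories)}
    return [lookup[label] for label in labels], categories


codes, categories = label_codes(
    ["Iris-setosa", "Iris-virginica", "Iris-versicolor", "Iris-setosa"]
)
print(categories)  # ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
print(codes)       # [0, 2, 1, 0]
```

Because the codes are plain numbers, the marker colors come from a continuous colorscale. This is why the cell above sets `showscale=False`: a color bar would not be meaningful for categorical data.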

## Question: How can we display something else on the diagonal?

## Using Figure Factory
https://plot.ly/python/figure-factory-subplots/#plotlys-figure-factory-module

The figure factory function `create_scatterplotmatrix` lets us choose what to display on the diagonal through its `diag` argument.

In [None]:
import plotly.figure_factory as ff

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/iris-data.csv')

fig = ff.create_scatterplotmatrix(df, diag='histogram', index='class',
                                  height=800, width=800)
fig.show()

In [None]:
df