# Basic Visualization with plotly

Tutorial: https://plot.ly/python/ipython-notebook-tutorial/

Cheat sheet: https://images.plot.ly/plotly-documentation/images/python_cheat_sheet.pdf

This time I use plotly in offline mode, as preparation for handling confidential data.

In [None]:
##Visualization
import plotly.offline as offline
import plotly.graph_objs as go
offline.init_notebook_mode()

##import
import pandas as pd
import numpy as np

In [None]:
##acquire data
df = pd.read_csv('./creditcard.csv')
df0 = df[df.Class == 0]
df1 = df[df.Class == 1]
df1.head()

## Histogram

First, as an example, let's check the relationship between "Amount" and "Class".
I want to see how the "Amount" distribution differs between Class == 0 and Class == 1.
A histogram works for this. (Distplots are also great, but very heavy.)
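
Because the two classes differ a lot in size, raw counts are hard to compare directly. The sketch below shows one way to compare them: bin both classes on shared edges and convert the counts to per-class fractions. It uses synthetic numbers as stand-ins for `df0.Amount` and `df1.Amount` (an assumption, not the real data), and the variable names are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for df0.Amount and df1.Amount (assumption: not the real data)
amount0 = rng.exponential(scale=90.0, size=10_000)  # majority class
amount1 = rng.exponential(scale=120.0, size=50)     # minority class

# Shared bin edges of width 50, mirroring the xbins size used for the traces
upper = max(amount0.max(), amount1.max())
edges = np.arange(0, upper + 50, 50)

counts0, _ = np.histogram(amount0, bins=edges)
counts1, _ = np.histogram(amount1, bins=edges)

# Fraction of each class that falls in each bin, so the shapes are comparable
frac0 = counts0 / counts0.sum()
frac1 = counts1 / counts1.sum()

print(frac0[:5])
print(frac1[:5])
```

In plotly itself, setting `histnorm='probability'` on each `go.Histogram` trace should give a similar per-class normalization without computing the bins by hand.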

In [None]:
##make trace
trace0 = go.Histogram(
    x = df0.Amount,
    opacity = 0.7,
    name = 'class0',
    xbins = dict(
        start = 0,
        end = max(df.Amount),
        size = 50
    )
)
trace1 = go.Histogram(
    x = df1.Amount,
    opacity = 0.7,
    name = 'class1',
    xbins = dict(
        start = 0,
        end = max(df.Amount),
        size = 50
    )
)
data = [trace0, trace1]

##define layout
layout = go.Layout(
    #barmode='overlay',
    yaxis=dict(
        type='log',
        autorange=True,
        title = 'frequency'
    ),
    xaxis=dict(
        autorange=True,
        title = 'Amount'
    ),
    bargap=0.1,
    bargroupgap=0,
)

fig = go.Figure(data=data, layout=layout)

offline.iplot(fig)

↑ The y-axis is on a log scale.

Allowing for the class imbalance, there is little difference in the Amount distribution between the two classes.
For Amount > 2150, there are no Class == 1 records.
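
An observation like this, read off the plot, can be double-checked numerically. Here is a minimal sketch on a tiny made-up table (the values are invented for illustration, and `toy` and `threshold` are hypothetical names); the same two lines should work on `df` as well.

```python
import pandas as pd

# Hypothetical miniature of the creditcard table (values invented for illustration)
toy = pd.DataFrame({
    "Amount": [10.0, 2500.0, 99.0, 1800.0, 300.0, 5000.0],
    "Class":  [0, 0, 1, 1, 0, 0],
})

# Largest Amount observed in each class
max_by_class = toy.groupby("Class")["Amount"].max()

# Number of rows per class above the threshold (classes with no rows get 0)
threshold = 2150
above = (
    toy[toy["Amount"] > threshold]
    .groupby("Class")
    .size()
    .reindex([0, 1], fill_value=0)
)

print(max_by_class)
print(above)
```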

Another example

In [None]:
##make trace
trace0 = go.Histogram(
    x = df0.V2,
    opacity = 0.7,
    name = 'class0',
    xbins = dict(
        start = min(df.V2),
        end = max(df.V2),
        size = 5
    )
)
trace1 = go.Histogram(
    x = df1.V2,
    opacity = 0.7,
    name = 'class1',
    xbins = dict(
        start = min(df.V2),
        end = max(df.V2),
        size = 5
    )
)
data = [trace0, trace1]

##define layout
layout = go.Layout(
    #barmode='overlay',
    yaxis=dict(
        type='log',
        autorange=True,
        title = 'frequency'
    ),
    xaxis=dict(
        autorange=True,
        title = 'V2'
    ),
    bargap=0.1,
    bargroupgap=0.05,
)

fig = go.Figure(data=data, layout=layout)

offline.iplot(fig)
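
The two histogram cells repeat almost the same trace definition twice. A small helper can cut down the repetition. The sketch below is only one possible way to do it: `histogram_trace` is a hypothetical helper, not part of plotly, and it builds plain dicts with a `type` key, which plotly should accept in place of `go.Histogram` objects. The sample values are made up.

```python
def histogram_trace(values, name, start, end, size, opacity=0.7):
    """Build a histogram trace as a plain dict (hypothetical helper)."""
    return dict(
        type="histogram",
        x=list(values),
        opacity=opacity,
        name=name,
        xbins=dict(start=start, end=end, size=size),
    )


# Made-up values standing in for df0.V2 and df1.V2
values0 = [-3.0, 0.5, 1.2, 4.8]
values1 = [-7.5, 2.0, 9.1]

# Use the same bin range for both classes so the bars line up
lo = min(values0 + values1)
hi = max(values0 + values1)

traces = [
    histogram_trace(values0, "class0", lo, hi, size=5),
    histogram_trace(values1, "class1", lo, hi, size=5),
]
print(traces[0]["xbins"])
```

The resulting list can be used as `data` when building the figure, in the same way as the traces above.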

## 3D-PCA

PCA is a dimensionality reduction method.
Using Scatter3d (a plotly trace type), we can plot the data points in three dimensions.
Fortunately, Vn (n = 1, 2, ..., 28) are features already obtained by applying PCA to the original confidential data,
so I can use them directly here.
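
For reference, here is a minimal sketch of what PCA does, written with NumPy's SVD on random data. This is a generic illustration under my own assumptions; it is not the procedure that was actually used to produce the Vn columns, which is not described here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random data standing in for raw features: 200 samples, 6 features
X = rng.normal(size=(200, 6))
X[:, 1] = 0.8 * X[:, 0] + 0.2 * X[:, 1]  # make two features correlated

# 1. Center each feature
Xc = X - X.mean(axis=0)

# 2. SVD of the centered data; the rows of Vt are the principal directions
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# 3. Project onto the first three directions (like V1, V2, V3 in spirit)
scores = Xc @ Vt[:3].T

# Share of the total variance carried by each component
explained = S**2 / np.sum(S**2)
print(scores.shape, explained[:3])
```

The projected columns are uncorrelated with each other and ordered by decreasing variance, which is why the first three are a natural choice for a 3D plot.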

However, Scatter3d becomes heavy when it has to draw too many points;
a very large number of points can exhaust memory.
So let's undersample Class == 0 before visualizing.
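
One thing to keep in mind is that random undersampling gives a different subset on every run unless a seed is fixed. The sketch below uses a small synthetic frame (an assumption, `toy` is not the real data) to show that passing `random_state` to `DataFrame.sample` makes the subset reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic imbalanced table: 1980 rows of class 0 and 20 rows of class 1
toy = pd.DataFrame({
    "V1": rng.normal(size=2_000),
    "Class": np.r_[np.zeros(1_980, dtype=int), np.ones(20, dtype=int)],
})
toy0 = toy[toy.Class == 0]
toy1 = toy[toy.Class == 1]

# Keep 5% of the majority class; a fixed random_state makes the draw repeatable
toy0u = toy0.sample(frac=0.05, random_state=0)
toy0u_again = toy0.sample(frac=0.05, random_state=0)

print('Class 0:', len(toy0u), ', Class 1:', len(toy1))
```

Whether to fix the seed is a matter of taste: a fixed seed keeps the figure stable between runs, while a fresh draw each time shows how much the picture depends on the sample.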

In [None]:
print('Class 0:',len(df0),', Class 1:',len(df1))

In [None]:
## random undersampling of Class == 0
df0u = df0.sample(frac = 0.05)
print('Class 0:',len(df0u),', Class 1:',len(df1))

In [None]:
## make trace
trace0 = go.Scatter3d(
    x = df0u.V1,
    y = df0u.V2,
    z = df0u.V3,
    name = 'class0',
    mode = 'markers',
    opacity = 0.4,
    marker = dict(
        size = 2
    )
)
trace1 = go.Scatter3d(
    x = df1.V1,
    y = df1.V2,
    z = df1.V3,
    name = 'class1',
    mode = 'markers',
    marker = dict(
        size = 3
    )
)
## combine traces into a list
data = [trace0, trace1]

## define layout
layout = go.Layout(
    title='3D-PCA',
    width=600,
    height=500,
    scene = dict(
        xaxis = dict(
            nticks=4, range = [min(df.V1),max(df.V1)], title='V1'),
        yaxis = dict(
            nticks=4, range = [min(df.V2),max(df.V2)], title='V2'),
        zaxis = dict(
            nticks=4, range = [min(df.V3),max(df.V3)], title='V3')
    ),
    showlegend=True)

fig = dict(data=data, layout=layout)
offline.iplot(fig)