# Visualizations

##### Notes
- Work the problem on a small sample of the data first to confirm the results are as expected, then apply it to the larger set. Visualizations can be done in Excel if it helps.
- Expect to experiment. Not every technique will produce the intended result, but trying things is how you find the one that does.
- Look at the ebook that talks about visualizations.
- I personally don't believe in charting data for the sake of charting data. A chart should help answer the question being asked.

#### Pandas Plot Types

There are several plot types built into pandas, most of them statistical in nature:

* df.plot.area     
* df.plot.barh     
* df.plot.density  
* df.plot.hist     
* df.plot.line     
* df.plot.scatter
* df.plot.bar      
* df.plot.box      
* df.plot.hexbin   
* df.plot.kde      
* df.plot.pie

You can also call `df.plot(kind='hist')`, replacing the `kind` argument with any of the names in the list above (e.g. `'box'`, `'barh'`).
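
To make the two spellings concrete, here is a minimal sketch that draws one plot through the accessor and one through the `kind` argument. The DataFrame, its column names `a` and `b`, and the `Agg` backend are assumptions for illustration only, not part of the original notes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample data; column names "a" and "b" are placeholders
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=200), "b": rng.normal(size=200)})

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))

# Accessor spelling: df.plot.<kind>(...)
ax_hist = df["a"].plot.hist(bins=20, ax=axes[0], title="df['a'].plot.hist")

# Keyword spelling: df.plot(kind=..., ...)
ax_box = df.plot(kind="box", ax=axes[1], title="df.plot(kind='box')")
```

Passing `ax=` explicitly keeps each plot on its own axes, which avoids surprises about which figure pandas draws on.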
___

#### Plotly and Cufflinks

* scatter
* bar
* box
* spread
* ratio
* heatmap
* surface
* histogram
* bubble

In [None]:
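# MULTIPLE PLOTS ON SAME CANVAS
import numpy as np
import matplotlib.pyplot as plt

# Sample data (assumed; any two equal-length arrays work)
x = np.linspace(0, 5, 11)
y = x ** 2
# plt.subplot(nrows, ncols, plot_number)
plt.subplot(1, 2, 1)
plt.plot(x, y, 'r--')  # red dashed line; more on color options later
plt.subplot(1, 2, 2)
plt.plot(y, x, 'g*-');  # green solid line with star markers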

![image.png](attachment:image.png)

In [None]:
# Creates blank canvas
fig = plt.figure()

axes1 = fig.add_axes([0.1, 0.1, 0.8, 0.8]) # main axes
axes2 = fig.add_axes([0.2, 0.5, 0.4, 0.3]) # inset axes

# Larger Figure Axes 1
axes1.plot(x, y, 'b')
axes1.set_xlabel('X_label_axes1')
axes1.set_ylabel('Y_label_axes1')
axes1.set_title('Axes 1 Title')

# Inset Figure Axes 2
axes2.plot(y, x, 'r')
axes2.set_xlabel('X_label_axes2')
axes2.set_ylabel('Y_label_axes2')
axes2.set_title('Axes 2 Title');

![image.png](attachment:image.png)

In [None]:
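# distplot
# The distplot shows the distribution of a univariate set of observations.
# Note: distplot is deprecated since seaborn 0.11; sns.histplot(tips['total_bill'], bins=30) is the replacement.
import seaborn as sns
tips = sns.load_dataset('tips')  # seaborn's bundled example dataset (assumed source of `tips`)
sns.distplot(tips['total_bill'], kde=False, bins=30)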

![image.png](attachment:image.png)

In [None]:
# jointplot
# jointplot() pairs two univariate distribution plots with a bivariate plot of the same data.
# The kind parameter selects the bivariate plot: "scatter", "reg", "resid", "kde", or "hex"
sns.jointplot(x='total_bill', y='tip', data=tips, kind='hex')

![image.png](attachment:image.png)

In [None]:
sns.pairplot(tips,hue='sex',palette='coolwarm')

![image.png](attachment:image.png)

In [None]:
# PairGrid
# PairGrid is a subplot grid for plotting pairwise relationships in a dataset
# Map to upper, lower, and diagonal
g = sns.PairGrid(iris)
g.map_diag(plt.hist)
g.map_upper(plt.scatter)
g.map_lower(sns.kdeplot)

![image.png](attachment:image.png)

In [None]:
sns.pairplot(iris,hue='species',palette='rainbow')

![image.png](attachment:image.png)

In [None]:
# Facet Grid
# FacetGrid is the general way to create grids of plots based off of a feature
g = sns.FacetGrid(tips, col="time",  row="smoker",hue='sex')
# Notice how the column arguments come after the plt.scatter function itself
g = g.map(plt.scatter, "total_bill", "tip").add_legend()

![image.png](attachment:image.png)

In [None]:
# JointGrid
# JointGrid is the general version for jointplot() type grids
# Note: distplot is deprecated since seaborn 0.11; sns.histplot can take its place here.
g = sns.JointGrid(x="total_bill", y="tip", data=tips)
g = g.plot(sns.regplot, sns.distplot)

![image.png](attachment:image.png)

In [None]:
# kdeplot AND rug plots
# kdeplots are Kernel Density Estimation plots.
# A KDE plot replaces every single observation with a Gaussian (Normal) distribution centered on that value,
# then sums those curves (the basis functions).
# Assumes kernel_list holds one Gaussian curve per observation, each evaluated on x_axis
sum_of_kde = np.sum(kernel_list, axis=0)

# Plot figure
fig = plt.plot(x_axis, sum_of_kde, color='indianred')

# Add the initial rugplot
sns.rugplot(dataset, color='indianred')

# Get rid of y-tick marks
plt.yticks([])

# Set title
plt.suptitle("Sum of the Basis Functions")

![image.png](attachment:image.png)
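
The cell above relies on `dataset`, `x_axis`, and `kernel_list` existing already. A minimal NumPy-only sketch of how they could be built follows; the random sample, the padding of the grid, and the rule-of-thumb bandwidth are all assumptions for illustration, not necessarily what the original notebook used.

```python
import numpy as np

# Hypothetical sample standing in for `dataset`
rng = np.random.default_rng(42)
dataset = rng.normal(size=25)

# Evaluation grid, padded beyond the data so the kernel tails are included
x_axis = np.linspace(dataset.min() - 2, dataset.max() + 2, 200)

# Rule-of-thumb bandwidth (assumed choice): h = (4 * sigma^5 / (3 * n)) ** (1/5)
bandwidth = (4 * dataset.std() ** 5 / (3 * len(dataset))) ** 0.2

def gaussian_pdf(x, mean, sd):
    """Normal density with the given mean and standard deviation."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# One Gaussian curve per observation, each centered on that observation
kernel_list = [gaussian_pdf(x_axis, point, bandwidth) for point in dataset]

# Summing the curves gives the unnormalized KDE
sum_of_kde = np.sum(kernel_list, axis=0)

# Dividing by the number of observations turns the sum into a density
density = sum_of_kde / len(dataset)
area = np.sum(density) * (x_axis[1] - x_axis[0])  # should be close to 1
```

A smaller bandwidth gives a bumpier curve that follows individual points, while a larger one smooths them together.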

In [None]:
# barplot
sns.barplot(x='sex',y='total_bill',data=tips,estimator=np.std)

![image.png](attachment:image.png)

In [None]:
# countplot
sns.countplot(x='sex',data=tips)

![image.png](attachment:image.png)

In [None]:
# boxplot and violinplot
# boxplots and violinplots show how a quantitative variable is distributed across the levels of a categorical variable
# A box plot (or box-and-whisker plot) shows the distribution of quantitative data in a way that facilitates comparisons 
# between variables or across levels of a categorical variable. 
# The box shows the quartiles of the dataset while the whiskers extend to show the rest of the distribution, 
# except for points that are determined to be “outliers” using a method that is a function of the inter-quartile range.
sns.boxplot(x="day", y="total_bill", data=tips,palette='rainbow')

# Can do entire dataframe with orient='h'
sns.boxplot(data=tips,palette='rainbow',orient='h')

# Split each day's box by smoker status with hue
sns.boxplot(x="day", y="total_bill", hue="smoker",data=tips, palette="coolwarm")

![image.png](attachment:image.png)

![image.png](attachment:image.png)
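
The outlier rule mentioned in the boxplot comments can be worked through by hand. The sketch below assumes the common 1.5 × IQR convention (the default `whis=1.5` in matplotlib and seaborn) and uses made-up values.

```python
import numpy as np

# Hypothetical values; 1.0 and 30.0 are deliberately extreme
values = np.array([1.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 30.0])

# The box spans the first to third quartile, with a line at the median
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Points beyond these fences are drawn individually as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
is_outlier = (values < lower_fence) | (values > upper_fence)
outliers = values[is_outlier]

# Whiskers stop at the most extreme points that are still inside the fences
inliers = values[~is_outlier]
whisker_low, whisker_high = inliers.min(), inliers.max()
```

Here the quartiles are 8 and 12, so the fences sit at 2 and 18, the whiskers reach 7 and 13, and the two extreme values are flagged as outliers.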

In [None]:
# violinplot
# A violin plot plays a similar role as a box and whisker plot. 
# It shows the distribution of quantitative data across several levels of one (or more) categorical variables such that 
# those distributions can be compared. Unlike a box plot, in which all of the plot components correspond to actual datapoints, 
# the violin plot features a kernel density estimation of the underlying distribution.
sns.violinplot(x="day", y="total_bill", data=tips,hue='sex',split=True,palette='Set1')

![image.png](attachment:image.png)

In [None]:
# stripplot and swarmplot
# The stripplot will draw a scatterplot where one variable is categorical.
# The swarmplot is similar to stripplot(), but the points are adjusted (only along the categorical axis) so that they don’t overlap.
# Note: newer seaborn releases renamed split to dodge for stripplot and swarmplot
sns.stripplot(x="day", y="total_bill", data=tips, jitter=True, hue='sex', palette='Set1', dodge=True)

![image.png](attachment:image.png)

In [None]:
# Note: newer seaborn releases renamed split to dodge for stripplot and swarmplot
sns.swarmplot(x="day", y="total_bill", hue='sex', data=tips, palette="Set1", dodge=True)

![image.png](attachment:image.png)

In [None]:
sns.violinplot(x="tip", y="day", data=tips,palette='rainbow')
sns.swarmplot(x="tip", y="day", data=tips,color='black',size=3)

![image.png](attachment:image.png)

In [None]:
# factorplot (renamed catplot in seaborn 0.9)
# catplot is the most general form of a categorical plot. It can take in a kind parameter to adjust the plot type
sns.catplot(x='sex', y='total_bill', data=tips, kind='bar')

![image.png](attachment:image.png)

In [None]:
# clustermap
# The clustermap uses hierarchical clustering to produce a clustered version of the heatmap
# Options such as normalization (standard_scale) can make the patterns a little clearer
sns.clustermap(pvflights,cmap='coolwarm',standard_scale=1)

![image.png](attachment:image.png)
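
As I read the seaborn documentation, `standard_scale=1` rescales each column by subtracting its minimum and dividing by its maximum, so every column runs from 0 to 1 before clustering. The sketch below reproduces that rescaling in pandas on a small made-up table; the table is a hypothetical stand-in for `pvflights`, and this is an illustration of the idea, not seaborn's own code.

```python
import pandas as pd

# Hypothetical month-by-year table standing in for pvflights
pv = pd.DataFrame(
    {1949: [112, 118, 132], 1950: [115, 126, 141], 1951: [145, 150, 178]},
    index=["Jan", "Feb", "Mar"],
)

# Per column: subtract the minimum, then divide by the remaining maximum
shifted = pv - pv.min()
scaled = shifted / shifted.max()
```

After scaling, columns with large absolute values no longer dominate the color range, which is why the clustered heatmap becomes easier to read.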

In [None]:
# lmplot allows you to display linear models, but it also conveniently allows you to split up those plots based off of features, 
# as well as coloring the hue based off of features
sns.lmplot(x='total_bill',y='tip',data=tips,hue='sex',palette='coolwarm')

![image.png](attachment:image.png)

In [None]:
# lmplot kwargs get passed through to regplot, the axes-level function that lmplot builds on
# Marker styles: http://matplotlib.org/api/markers_api.html
sns.lmplot(x='total_bill',y='tip',data=tips,hue='sex',palette='coolwarm',
           markers=['o','v'],scatter_kws={'s':100})

![image.png](attachment:image.png)

In [None]:
sns.lmplot(x="total_bill", y="tip", row="sex", col="time",data=tips)

![image.png](attachment:image.png)

In [None]:
# Note: seaborn 0.9 renamed the size parameter to height
sns.lmplot(x='total_bill',y='tip',data=tips,col='day',hue='sex',palette='coolwarm',
          aspect=0.6,height=8)

![image.png](attachment:image.png)

In [None]:
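# Choropleth Maps
import plotly.graph_objs as go
from plotly.offline import plot  # assumed source of the bare plot() call below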
data = dict(type='choropleth',
            colorscale = 'Viridis',
            reversescale = True,
            locations = usdf['State Abv'],
            z = usdf['Voting-Age Population (VAP)'],
            locationmode = 'USA-states',
            text = usdf['State'],
            marker = dict(line = dict(color = 'rgb(255,255,255)',width = 1)),
            colorbar = {'title':"Voting-Age Population (VAP)"}
            ) 

layout = dict(title = '2012 General Election Voting Data',
              geo = dict(scope='usa',
                         showlakes = True,
                         lakecolor = 'rgb(85,173,240)')
             )

choromap = go.Figure(data = [data],layout = layout)
plot(choromap,validate=False)

![image.png](attachment:image.png)

In [None]:
# Count the rows in each accuracy_group value (0, 1, 2, 3)
# game_session is 
train_labels.groupby('accuracy_group')['game_session'].count() \
    .plot(kind='barh', figsize=(15, 5), title='Target (accuracy group)')
plt.show()

# Same count in seaborn, annotating each bar with its percentage of the total
%matplotlib inline
sns.set(style="darkgrid")
ax = sns.countplot(y='accuracy_group',  data=train_labels)

total = len(train_labels['accuracy_group'])
for p in ax.patches:
    # Bar width is the count, so width / total is that group's share
    percentage = '{:.1f}%'.format(100 * p.get_width()/total)
    x = p.get_x() + p.get_width() + 0.02
    y = p.get_y() + p.get_height()/2
    ax.annotate(percentage, (x, y))

plt.show()
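
The percentages annotated on the bars can be checked without plotting. This is a small sketch using `value_counts(normalize=True)`; the `train_labels` frame here is made-up data that only mimics the `accuracy_group` column.

```python
import pandas as pd

# Hypothetical stand-in for train_labels with only the column of interest
train_labels = pd.DataFrame({"accuracy_group": [0, 3, 3, 1, 2, 3, 0, 3]})

# Share of rows in each group, as a percentage, ordered by group label
percentages = (
    train_labels["accuracy_group"]
    .value_counts(normalize=True)
    .sort_index()
    * 100
)
```

Comparing these numbers against the annotated chart is a quick way to confirm the labels landed on the right bars.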

