# Final Project for DTSA-5304
### Ethan Tucker
#### 12/5/2021

Welcome to my final project for DTSA-5304. For my project I chose to use Dash in conjunction with Plotly to improve the UI and ease of use for the client. Before starting, please install all of the required dependencies (every package imported in the chunk below), including statsmodels, which Plotly calls implicitly when it fits regression trendlines. Also, please put the CSV file "WineData.csv", which I have included on my [GitHub](https://github.com/firstrider55/DTSA-5503-Final-Project), in your working directory. Have fun, thanks for your time, and let me know what you think of my project :)
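A quick check like the following can confirm that the dependencies are importable before you run the notebook. This is a minimal sketch and not part of the original project. The module list is assumed from the imports in the first chunk plus statsmodels, and `missing_packages` is a hypothetical helper name. Pip package names can differ from import names; for example, `jupyter_dash` is installed with `pip install jupyter-dash`.

```python
import importlib.util

# Import names assumed from this notebook's first chunk; statsmodels is needed
# by plotly's trendline="ols" even though the notebook never imports it directly
REQUIRED = ["altair", "pandas", "plotly", "dash", "jupyter_dash", "statsmodels"]

def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
print("Missing packages:", ", ".join(missing) if missing else "none")
```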

In [1]:
import altair as alt
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import dash
from dash import dcc, html
from dash.dependencies import Input, Output
from jupyter_dash import JupyterDash

In [2]:
# Change to the folder that contains WineData.csv (adjust this path for your machine)
%cd C:/Users/first/Desktop

C:\Users\first\Desktop


In [3]:
%pwd

'C:\\Users\\first\\Desktop'

In [4]:
# I used R to combine the red and white datasets into one file called "WineData". All I did was add a Type column naming the wine type,
# then call full_join() from dplyr. The R code is included in the last chunk of this notebook.
WineData = pd.read_csv("WineData.csv").sort_values(by = "quality", ascending = False)
list(WineData.columns)

['fixed acidity',
 'volatile acidity',
 'citric acid',
 'residual sugar',
 'chlorides',
 'free sulfur dioxide',
 'total sulfur dioxide',
 'density',
 'pH',
 'sulphates',
 'alcohol',
 'quality',
 'Type']
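Sorting with `sort_values` keeps the original row labels, so after the sort above the index of `WineData` is no longer 0, 1, 2, ... in row order. The toy example below (made-up values, not the real data) shows why positional access on a sorted frame should use `.iloc`, and how `reset_index(drop=True)` restores a positional index if one is wanted. This is a minimal sketch of standard pandas behavior, not code from the project.

```python
import pandas as pd

# Toy frame standing in for WineData (values are made up for illustration)
df = pd.DataFrame({"quality": [5, 9, 6], "Type": ["Red", "White", "Red"]})
ranked = df.sort_values(by="quality", ascending=False)

# sort_values keeps the original row labels, so they are now out of order
print(list(ranked.index))           # [1, 2, 0]

# .iloc selects by position and .loc by label; they disagree after sorting
print(ranked["Type"].iloc[0])       # White: first row in sorted order
print(ranked["Type"].loc[0])        # Red: the row originally labelled 0

# reset_index(drop=True) gives a fresh 0..n-1 index if positional labels are wanted
print(list(ranked.reset_index(drop=True).index))  # [0, 1, 2]
```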

In [6]:
%%capture
#This chunk generates the scatterplot matrix used in Task 2

colors = {
    'background': '#111111',
    'text': '#7FDBFF'
}

names = ["fixed acidity", "volatile acidity", "citric acid", "residual sugar", "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density", "pH", "sulphates", "alcohol"]
dims = [dict(label = name, values = WineData[name]) for name in names]
# Map each wine type to its plot color: white points for White wine, red points for Red wine.
# Series.map avoids writing strings into an integer Series and does not depend on index order.
newVals = WineData['Type'].map({'White': 'white', 'Red': '#d62728'})

splom = go.Figure(data=go.Splom(
                dimensions= dims,
                showupperhalf=False,
                opacity = 0.4,
                marker=dict(color= newVals,
                            showscale=False, # colors encode categorical variables
                            line_color='white', line_width=0.5)
                ))

splom.update_layout(
    width = 1750,
    height = 1750,
    plot_bgcolor=colors['background'],
    paper_bgcolor=colors['background'],
    font_color=colors['text'])
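The scatterplot matrix shows correlations visually. If you want the same relationships as numbers, pandas can compute a Pearson correlation matrix separately for each wine type. The sketch below runs on a small made-up frame with two attributes standing in for `WineData`; with the real data you would pass the `names` list as the column selection. It also shows the count of distinct attribute pairs that the lower half of an 11 x 11 matrix contains. This is an illustrative addition, not part of the original project.

```python
import pandas as pd

# Toy stand-in for WineData with two attributes (values are illustrative only)
df = pd.DataFrame({
    "alcohol": [9.0, 10.0, 11.0, 12.0, 9.5, 10.5, 11.5, 12.5],
    "density": [1.000, 0.998, 0.996, 0.994, 0.999, 0.998, 0.996, 0.993],
    "Type":    ["Red"] * 4 + ["White"] * 4,
})

# Pearson correlation matrix of the selected columns, computed per wine type;
# the result has a (Type, attribute) MultiIndex on the rows
corr_by_type = df.groupby("Type")[["alcohol", "density"]].corr()
print(corr_by_type)

# Distinct off-diagonal pairs shown in the lower half of an n x n scatterplot matrix
n = 11
print(n * (n - 1) // 2)   # 55
```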

In [38]:
# This chunk creates the application, defines the layout for all three tasks, and registers their callbacks. The server is started in the next chunk.

app = JupyterDash(__name__)

app.layout = html.Div(style={'backgroundColor': colors['background']}, children=[
    
    dcc.Tabs([
        
        dcc.Tab(label='Task One', children=[
            
            html.H1(
                children='Task One: Understand Property Distributions',
                style={'textAlign': 'center','color': colors['text']}),
            
            html.Div(
                children='Please select an attribute below to see its distribution. Interaction options are in the upper right of the plot.', 
                style={'textAlign': 'center','color': colors['text']}),

            dcc.Graph(id='Violin_Plot'),
            
            dcc.Dropdown(
                id='Violin_Dropdown',
                options=[
                    {'label': 'Fixed Acidity', 'value': 'fixed acidity'},
                    {'label': 'Volatile Acidity', 'value': 'volatile acidity'},
                    {'label': 'Citric Acid', 'value': 'citric acid'},
                    {'label': 'Residual Sugar', 'value': 'residual sugar'},
                    {'label': 'Chlorides', 'value': 'chlorides'},
                    {'label': 'Free Sulfur Dioxide', 'value': 'free sulfur dioxide'},
                    {'label': 'Total Sulfur Dioxide', 'value': 'total sulfur dioxide'},
                    {'label': 'Density', 'value': 'density'},
                    {'label': 'pH', 'value': 'pH'},
                    {'label': 'Sulphates', 'value': 'sulphates'},
                    {'label': 'Alcohol', 'value': 'alcohol'},
                    {'label': 'Quality', 'value': 'quality'}
                ],
                value = "fixed acidity",
                placeholder = "Select a variable for the x-axis"),
            
            html.Br()
        
        ]),
        
        dcc.Tab(label='Task Two', children=[
            
            html.H1(
                children = "Task Two: Understand Correlation between Physical Attributes",
                style={'textAlign': 'center','color': colors['text']}
            ),
            
            html.Div(
                children = "This scatterplot matrix plots each physical attribute in the data against every other one. Duplicate cells (the upper-right half) are removed for clarity. Interaction options are in the upper right of the plot. Please scroll down to the bottom of the tab to explore individual relationships.",
                style={'textAlign': 'center','color': colors['text']}
            ),
            
            dcc.Graph(figure = splom),
            
            html.Br(),
            
            html.Div(
                children = "The scatterplot below acts as a zoom function: it shows a single pair of physical attributes and fits a linear regression model for each wine type. Choose two physical attributes (and optionally filter by wine type).",
                style={'textAlign': 'center','color': colors['text']}
            ),
            
            dcc.Graph(id = "Task2_Scatterplot"),
            
            html.Br(),
            
            dcc.Dropdown(
                id = "Task2_Scatterplot_xaxis",
                options=[
                    {'label': 'Fixed Acidity', 'value': 'fixed acidity'},
                    {'label': 'Volatile Acidity', 'value': 'volatile acidity'},
                    {'label': 'Citric Acid', 'value': 'citric acid'},
                    {'label': 'Residual Sugar', 'value': 'residual sugar'},
                    {'label': 'Chlorides', 'value': 'chlorides'},
                    {'label': 'Free Sulfur Dioxide', 'value': 'free sulfur dioxide'},
                    {'label': 'Total Sulfur Dioxide', 'value': 'total sulfur dioxide'},
                    {'label': 'Density', 'value': 'density'},
                    {'label': 'pH', 'value': 'pH'},
                    {'label': 'Sulphates', 'value': 'sulphates'},
                    {'label': 'Alcohol', 'value': 'alcohol'},
                ],
                value = "fixed acidity",
                placeholder = "Select a variable for the x-axis"),
            
            html.Br(),
            
            dcc.Dropdown(
                id = "Task2_Scatterplot_yaxis",
                options=[
                    {'label': 'Fixed Acidity', 'value': 'fixed acidity'},
                    {'label': 'Volatile Acidity', 'value': 'volatile acidity'},
                    {'label': 'Citric Acid', 'value': 'citric acid'},
                    {'label': 'Residual Sugar', 'value': 'residual sugar'},
                    {'label': 'Chlorides', 'value': 'chlorides'},
                    {'label': 'Free Sulfur Dioxide', 'value': 'free sulfur dioxide'},
                    {'label': 'Total Sulfur Dioxide', 'value': 'total sulfur dioxide'},
                    {'label': 'Density', 'value': 'density'},
                    {'label': 'pH', 'value': 'pH'},
                    {'label': 'Sulphates', 'value': 'sulphates'},
                    {'label': 'Alcohol', 'value': 'alcohol'},
                ],
                value = "fixed acidity",
                placeholder = "Select a variable for the y-axis"
            ),
            
            html.Br(),
            
            dcc.Dropdown(
                id = "Task2_Scatterplot_filter",
                options=[
                    {'label' : "White", 'value': 'White'},
                    {'label' : "Red", 'value': 'Red'},
                    {'label' : 'Both', 'value': 'both'} 
                ], 
                value = "both",
                placeholder = "Select the wine types you want included on the scatterplot"
            ),
            
            html.Br()
            
        ]),
        
        
        dcc.Tab(label='Task Three', children=[
            
            html.H1(
                children='Task Three: Understand Correlation between Physical Attribute and Quality',
                style={'textAlign': 'center','color': colors['text']}),
            
            html.Div(
                children='Please select an attribute to plot against quality. Interaction options are in the upper right of the plot. Hover over a regression line to see its R^2.', 
                style={'textAlign': 'center','color': colors['text']}),
            
            dcc.Graph(id = 'Quality_Corr_Plot'),
            
            html.Br(),
            
            dcc.Dropdown(
                id = 'Quality_Corr_Dropdown1',
                options=[
                    {'label': 'Fixed Acidity', 'value': 'fixed acidity'},
                    {'label': 'Volatile Acidity', 'value': 'volatile acidity'},
                    {'label': 'Citric Acid', 'value': 'citric acid'},
                    {'label': 'Residual Sugar', 'value': 'residual sugar'},
                    {'label': 'Chlorides', 'value': 'chlorides'},
                    {'label': 'Free Sulfur Dioxide', 'value': 'free sulfur dioxide'},
                    {'label': 'Total Sulfur Dioxide', 'value': 'total sulfur dioxide'},
                    {'label': 'Density', 'value': 'density'},
                    {'label': 'pH', 'value': 'pH'},
                    {'label': 'Sulphates', 'value': 'sulphates'},
                    {'label': 'Alcohol', 'value': 'alcohol'},
                ],
                value = "alcohol",
                placeholder = "Select a variable for the x-axis"),
                
            
            html.Br(),
            
            dcc.Dropdown(
                id = 'Quality_Corr_Dropdown2',
                options=[
                    {'label' : "White", 'value': 'White'},
                    {'label' : "Red", 'value': 'Red'},
                    {'label' : 'Both', 'value': 'both'} 
                ], 
                value = "both",
                placeholder = "Select the wine types you want included on the scatterplot"
            ),
            
            html.Br()
            
        ]),
    ])
])

#This callback updates the x-axis for Task 1, and determines whether to present a violin plot or histogram based on whether quality is selected.
@app.callback(
    Output('Violin_Plot', 'figure'),
    Input('Violin_Dropdown', 'value')

)
def changeViolinAxis(value):
    if value != "quality":
        fig = px.violin(WineData, x = value, color="Type", color_discrete_map={"White": "white", "Red": "#d62728"}, box=True, hover_data=WineData.columns,
                    title="Probability Density")

        fig.update_layout(
            plot_bgcolor=colors['background'],
            paper_bgcolor=colors['background'],
            font_color=colors['text'],
            height = 750
        )
            
    
    else:
        fig = px.histogram(WineData, x = value, color = "Type", color_discrete_map={"White": "white", "Red": "#d62728"}, histnorm='probability density',
                          barmode = "overlay", opacity = 0.75, title = "Quality is represented with a histogram due to its discrete nature")
        
        fig.update_layout(
            plot_bgcolor=colors['background'],
            paper_bgcolor=colors['background'],
            font_color=colors['text'])
    
    return(fig)

# This callback updates the scatterplot (not the SPLOM) in Task 2. It also enables filtering by wine type.
@app.callback(
        Output("Task2_Scatterplot", "figure"),
        Input("Task2_Scatterplot_xaxis", "value"),
        Input("Task2_Scatterplot_yaxis", "value"),
        Input("Task2_Scatterplot_filter", "value"),
)
def changeAttributeVsAttributePlot(xaxis_var, yaxis_var, whichpoints):
    if whichpoints == "Red":
        df = WineData[WineData["Type"] == whichpoints]
        figure = px.scatter(df, x = xaxis_var, y = yaxis_var, color = "Type", color_discrete_sequence=["#d62728"],  trendline = "ols")
        
    elif whichpoints == "White":
        df = WineData[WineData["Type"] == whichpoints]
        figure = px.scatter(df, x = xaxis_var, y = yaxis_var, color = "Type", color_discrete_sequence=["white"],  trendline = "ols")
    
    else:
        figure = px.scatter(WineData, x = xaxis_var, y = yaxis_var, color = "Type", color_discrete_map={"White": "white", "Red": "#d62728"},  trendline = "ols")
        
    figure.update_layout(
        plot_bgcolor=colors['background'],
        paper_bgcolor=colors['background'],
        font_color=colors['text'],
        width = 2000,
        height = 1000
    )
    
    figure.update_traces(
        marker = dict(size=7),
        opacity = 0.8
    )
    
    return figure

# This callback updates the scatterplot in Task 3. It enables filtering by wine type and selection of the x-axis variable.
@app.callback(
        Output('Quality_Corr_Plot', 'figure'),
        Input('Quality_Corr_Dropdown1', 'value'),
        Input('Quality_Corr_Dropdown2', 'value')
)
def changeQualityCorrPlot(xaxis_var, whichpoints):
    
    if whichpoints == "Red":
        df = WineData[WineData["Type"] == whichpoints]
        figure = px.scatter(df, x = xaxis_var, y = "quality", color = "Type", color_discrete_sequence=["#d62728"],  trendline = "ols")
        
    elif whichpoints == "White":
        df = WineData[WineData["Type"] == whichpoints]
        figure = px.scatter(df, x = xaxis_var, y = "quality", color = "Type", color_discrete_sequence=["white"],  trendline = "ols")
        
    else:
        figure = px.scatter(WineData, x = xaxis_var, y = "quality", color = "Type", color_discrete_map={"White": "white", "Red": "#d62728"},  trendline = "ols")

    figure.update_layout(
        plot_bgcolor=colors['background'],
        paper_bgcolor=colors['background'],
        font_color=colors['text'],
        width = 2000,
        height = 1000
    )
    
    figure.update_traces(
        marker = dict(size=7),
        opacity = 0.8
    )
    
    return figure
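The Task Two and Task Three callbacks repeat the same wine-type filtering in three branches each. One possible refactor, sketched below under the assumption that the dropdown values stay `'White'`, `'Red'`, and `'both'`, moves the filter into a single hypothetical helper, `filter_by_type`, and pins each type to its color with an explicit map. A callback could then call `px.scatter(filter_by_type(WineData, whichpoints), x = xaxis_var, y = yaxis_var, color = "Type", color_discrete_map = TYPE_COLORS, trendline = "ols")`. Unlike `color_discrete_sequence`, which assigns colors in order of first appearance, `color_discrete_map` keeps the colors correct regardless of row order or which types survive the filter.

```python
import pandas as pd

# Hypothetical helper: mirrors the 'White' / 'Red' / 'both' dropdown values
def filter_by_type(data, whichpoints):
    """Return only the rows for the selected wine type; any other value keeps all rows."""
    if whichpoints in ("White", "Red"):
        return data[data["Type"] == whichpoints]
    return data

# Explicit type-to-color mapping, independent of the order rows appear in
TYPE_COLORS = {"White": "white", "Red": "#d62728"}

# Toy data standing in for WineData (values are illustrative only)
toy = pd.DataFrame({"alcohol": [9.4, 10.2, 11.0], "Type": ["Red", "White", "White"]})
print(len(filter_by_type(toy, "White")))  # 2
print(len(filter_by_type(toy, "both")))   # 3
```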


In [39]:
#Please click on the link after running this chunk to access the application. 
app.run_server()

Dash app running on http://127.0.0.1:8050/
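In the running app, Plotly's `trendline="ols"` fits an ordinary least squares line with statsmodels, and the R^2 shown when hovering over a trendline is that fit's coefficient of determination. The sketch below reproduces the slope, intercept, and R^2 with NumPy alone on made-up values, to show what is being computed. `ols_fit` is a hypothetical helper, not part of Plotly, and this is not how Plotly computes the fit internally.

```python
import numpy as np

def ols_fit(x, y):
    """Fit y = slope * x + intercept by least squares and return (slope, intercept, R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)   # degree-1 least-squares fit
    predicted = slope * x + intercept
    ss_res = np.sum((y - predicted) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)         # total sum of squares
    return slope, intercept, 1.0 - ss_res / ss_tot

# Toy alcohol/quality pairs (illustrative only)
slope, intercept, r2 = ols_fit([9, 10, 11, 12, 13], [5, 5, 6, 6, 7])
print(round(slope, 3), round(intercept, 3), round(r2, 3))   # 0.5 0.3 0.893
```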


In [None]:
#Here is the R code I used to create WineData. I used the RedWineQualities dataset from 

# library(tidyverse)
# RedWineData <- read_csv("RedWineQualities.csv")
# WhiteWineData <- read.table("winequality-white.txt", nrows = 4899)
# names(WhiteWineData) <- as.character(WhiteWineData[1,])
# WhiteWineData <-  WhiteWineData[2:nrow(WhiteWineData), ]

# parse_column <- function(data, column_number){
  
#   parse_me <- data[ , column_number]
#   return(parse_number(parse_me))
# }

# parse_dataframe <- function(data){
#   n <- ncol(data)
#   for(i in 1:n){
#     data[, i] <- parse_column(data, i)
#   }
#   return(data)
# }

# WhiteWineData <- parse_dataframe(WhiteWineData)

# n.white <- nrow(WhiteWineData)
# n.red <- nrow(RedWineData)
# newwhitecol <- rep("White", n.white)
# newredcol <- rep("Red", n.red)

# WhiteWineData$Type <- newwhitecol
# RedWineData$Type <- newredcol

# WineData <- full_join(WhiteWineData, RedWineData)
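For readers who would rather stay in Python, the same combination can be sketched with pandas. This is an illustration and not the code used to build WineData.csv. The two small tables are made-up stand-ins for the red and white datasets; the real ones have 11 physicochemical columns plus quality. Because the added Type column differs between the two tables, `full_join()` on all shared columns finds no matching rows and simply stacks them, so `pd.concat` yields the same rows in the same order (white first, then red).

```python
import pandas as pd

# Toy stand-ins for the white and red tables (values are illustrative only)
white = pd.DataFrame({"alcohol": [8.8, 9.5, 10.1], "quality": [6, 6, 6]})
red = pd.DataFrame({"alcohol": [9.4, 9.8], "quality": [5, 5]})

# Label each table with its wine type, as the R code does with rep("White", n.white)
white["Type"] = "White"
red["Type"] = "Red"

# No row can match across tables because Type differs, so stacking them is equivalent
wine = pd.concat([white, red], ignore_index=True)
print(wine["Type"].value_counts())
```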