# CS445 Lab 3: 3D Visualizations

## Chosen Dataset: Wine Quality

**Dataset Explanation: The goal of these datasets is to "model wine quality based on physicochemical tests" (Cortez 1). The data is divided into white wines and red wines; in this Jupyter project, only the red wine dataset is visualized. Unfortunately, the UCI website does not provide the units of measurement, so the data visualized in this project, and the interpretations drawn from it, must be taken with a grain of salt.**

## Initial Data Visualization Setup

We begin by importing the necessary libraries and loading the dataset to explore its structure.


In [34]:
# Necessary import statements
import pandas as pd

# Read the red wine data (the file is semicolon-separated)
df = pd.read_csv('winequality-red.csv', sep=';')

# Display a preview of the data
print(df.head())


   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  \
0            7.4              0.70         0.00             1.9      0.076   
1            7.8              0.88         0.00             2.6      0.098   
2            7.8              0.76         0.04             2.3      0.092   
3           11.2              0.28         0.56             1.9      0.075   
4            7.4              0.70         0.00             1.9      0.076   

   free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  \
0                 11.0                  34.0   0.9978  3.51       0.56   
1                 25.0                  67.0   0.9968  3.20       0.68   
2                 15.0                  54.0   0.9970  3.26       0.65   
3                 17.0                  60.0   0.9980  3.16       0.58   
4                 11.0                  34.0   0.9978  3.51       0.56   

   alcohol  quality  
0      9.4        5  
1      9.8        5  
2      9.8        5 

# Visualization 1: Alcohol vs Sulphates vs Quality 3D Scatter Plot

**Purpose and Rationale**  
This scatter plot places alcohol content, sulphate content, and wine quality on the x, y, and z axes respectively. This allows us to analyze how the alcohol and sulphate content of a wine relate to its perceived quality.

## Insights and Interpretations:
1. As wine quality increases, the data points shift toward higher alcohol content.
2. The vast majority of red wines are at quality level 5 or 6.
3. The vast majority of red wines have a sulphates value between 0 and 1.

**Interpretation: In general, most wines are mid-quality and relatively low in sulphate content. However, as alcohol content increases, so does perceived quality.** 
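The two main observations above (most wines are rated 5 or 6, and quality rises with alcohol) can also be checked numerically. The sketch below is one possible way to do that; the helper name `summarize_quality` and the sample rows are hypothetical and used only for illustration. In the notebook, the function would be called on the `df` loaded from `winequality-red.csv`.

```python
import pandas as pd

def summarize_quality(df: pd.DataFrame) -> dict:
    """Share of wines rated 5 or 6, and the alcohol/quality correlation."""
    # Fraction of rows whose quality is 5 or 6
    mid_share = df['quality'].isin([5, 6]).mean()
    # Pearson correlation (the pandas default) between alcohol and quality
    corr = df['alcohol'].corr(df['quality'])
    return {'share_quality_5_or_6': mid_share, 'alcohol_quality_corr': corr}

# Hypothetical rows for illustration only; not taken from the dataset
sample = pd.DataFrame({
    'alcohol': [9.0, 9.5, 10.0, 11.0, 12.0],
    'quality': [5, 5, 6, 6, 7],
})
summary = summarize_quality(sample)
print(summary)
```

A positive correlation supports the upward trend seen in the scatter plot, although it says nothing about why the trend exists.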


In [35]:
import pandas as pd
import plotly.graph_objects as go
import plotly.io as pio

# Read the data
df = pd.read_csv('winequality-red.csv', sep=';')

# Create 3D scatter plot using Plotly
fig = go.Figure(data=[go.Scatter3d(
    x=df['alcohol'],
    y=df['sulphates'],
    z=df['quality'],
    mode='markers',
    marker=dict(
        size=4,
        color=df['quality'],  # Color by quality
        colorscale='Viridis',
        opacity=0.8,
        colorbar=dict(title='Quality')
    )
)])

# Set layout and axis labels, force axes to start at 0
fig.update_layout(
    title='3D Scatter Plot: Alcohol vs Sulphates vs Quality',
    scene=dict(
        xaxis=dict(title='Alcohol', range=[0, df['alcohol'].max()]),
        yaxis=dict(title='Sulphates', range=[0, df['sulphates'].max()]),
        zaxis=dict(title='Quality', range=[0, 10])
    ),
    margin=dict(l=0, r=0, b=0, t=40)
)

# Show the interactive plot
fig.show()

# Save the figure as a PNG image
pio.write_image(fig, 'Fisher_Bachman-Rhodes+scatter.png', format='png', width=1000, height=600, scale=2)


# Visualization 2: Volatile Acidity vs Citric Acid vs Quality 3D Mesh Plot

**Purpose and Rationale**  
This mesh plot places volatile acidity, citric acid content, and wine quality on the x, y, and z axes respectively. This allows us to analyze how the volatile acidity and citric acid content of a wine relate to its perceived quality.

## Insights and Interpretations:
1. The majority of higher quality wines (quality 7 or 8) have a relatively low volatile acidity (between 0.2 and 0.6) and fall between 0.2 and 0.8 on the citric acid scale.
2. Most wines fall between 4 and 7 on the quality scale.

**Interpretation: In general, the higher the quality of wine, the smaller the spread of acid content. By this we could assume that for a wine to be perceived as "high quality", it must fit within a more specific category of acidity than lower quality wines.** 
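The claim about a narrower spread at higher quality can be tabulated directly. The following is a minimal sketch, assuming "spread" means the range (max minus min) of each acidity column within a quality level; the helper name `acidity_spread_by_quality` and the sample rows are hypothetical.

```python
import pandas as pd

def acidity_spread_by_quality(df: pd.DataFrame) -> pd.DataFrame:
    """Range of volatile acidity and citric acid for each quality level."""
    cols = ['volatile acidity', 'citric acid']
    grouped = df.groupby('quality')[cols]
    # Range per quality level: largest value minus smallest value
    spread = grouped.max() - grouped.min()
    # Row count per quality level, for context when comparing ranges
    spread['count'] = df.groupby('quality').size()
    return spread

# Hypothetical rows for illustration only; not taken from the dataset
sample = pd.DataFrame({
    'quality': [5, 5, 5, 7, 7],
    'volatile acidity': [0.3, 0.7, 1.0, 0.3, 0.4],
    'citric acid': [0.0, 0.3, 0.6, 0.4, 0.5],
})
spread = acidity_spread_by_quality(sample)
print(spread)
```

The `count` column is included because a range tends to widen as a group gets more rows, so quality levels with few wines can look narrow for that reason alone.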


In [36]:
import pandas as pd
import plotly.graph_objects as go
import plotly.io as pio

# Read the data
df = pd.read_csv('winequality-red.csv', sep=';')

# Use columns directly as 3D coordinates
x = df['volatile acidity']
y = df['citric acid']
z = df['quality']

# Create the 3D mesh plot
fig = go.Figure(data=[go.Mesh3d(
    x=x,
    y=y,
    z=z,
    intensity=z,               # Color mapping based on quality
    colorscale='Viridis',
    colorbar=dict(title='Quality'),
    opacity=1.0,
    alphahull=5,
    flatshading=True,          # Makes lighting more uniform (clearer faces)
    lighting=dict(
        ambient=0.5,           # More uniform ambient light
        diffuse=0.6,
        specular=0.1,          # Reduce shininess
        roughness=0.9,
        fresnel=0.1
    ),
    lightposition=dict(
        x=100,
        y=200,
        z=0
    ),
    name='Wine Quality Mesh'
)])

# Update layout to start axes at 0
fig.update_layout(
    title='3D Mesh Plot: Volatile Acidity vs Citric Acid vs Quality (Readable View)',
    scene=dict(
        xaxis=dict(title='Volatile Acidity', range=[0, df['volatile acidity'].max()]),
        yaxis=dict(title='Citric Acid', range=[0, df['citric acid'].max()]),
        zaxis=dict(title='Quality', range=[0, df['quality'].max()])
    ),
    margin=dict(l=0, r=0, b=0, t=40)
)

# Show the plot
fig.show()

# Save the figure as a PNG image
pio.write_image(fig, 'Fisher_Bachman-Rhodes+mesh-readable.png', format='png', width=1000, height=600, scale=2)


# Visualization 3: Fixed Acidity vs pH vs Quality 3D Surface Plot

**Purpose and Rationale**  
This surface plot places fixed acidity, pH, and wine quality on the x, y, and z axes respectively. This allows us to analyze how the fixed acidity and pH level of a wine relate to its perceived quality.

## Insights and Interpretations:
1. All wines have a pH level between 2.5 and 4.
2. Nearly all wines fall between quality levels 3 and 8.
3. The spread in quality is largest at a pH level of 3.4.
4. The spread in quality is largest in the fixed acidity range from 10 to 14.
5. The highest positive quality spikes occur at or around 5 and 12 on the fixed acidity scale and between 2.8 and 3.5 on the pH scale.

**Interpretation: In general, higher quality wines exist in a small pH range but can exist at multiple different, but specific, fixed acidity levels. This may imply that the flavour impact (and thus, perceived quality) of pH levels and fixed acidity levels can differ greatly.** 
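Insights 3 and 4 concern the spread of quality, which an interpolated surface shows only indirectly. As a rough cross-check, the sketch below bins one column and reports the quality range in each bin. The helper name `quality_range_by_bin`, the bin count, and the sample rows are all hypothetical choices for illustration.

```python
import pandas as pd

def quality_range_by_bin(df: pd.DataFrame, column: str, bins: int = 5) -> pd.DataFrame:
    """Min, max, count, and range of quality within equal-width bins of `column`."""
    binned = pd.cut(df[column], bins=bins)
    # observed=True drops bins that contain no rows
    stats = df.groupby(binned, observed=True)['quality'].agg(['min', 'max', 'count'])
    stats['range'] = stats['max'] - stats['min']
    return stats

# Hypothetical rows for illustration only; not taken from the dataset
sample = pd.DataFrame({
    'pH': [2.9, 3.0, 3.3, 3.35, 3.6, 3.9],
    'quality': [6, 6, 3, 8, 5, 5],
})
stats = quality_range_by_bin(sample, 'pH', bins=2)
print(stats)
```

The same call with `column='fixed acidity'` would give the corresponding table for insight 4.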


In [37]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from scipy.interpolate import griddata
import plotly.io as pio

# Read the data
df = pd.read_csv('winequality-red.csv', sep=';')

# Prepare grid
x = df['fixed acidity']
y = df['pH']
z = df['quality']

xi = np.linspace(x.min(), x.max(), 50)
yi = np.linspace(y.min(), y.max(), 50)
xi, yi = np.meshgrid(xi, yi)
zi = griddata((x, y), z, (xi, yi), method='linear')

# Create 3D surface plot using Plotly
fig = go.Figure(data=[go.Surface(
    x=xi,
    y=yi,
    z=zi,
    colorscale='RdBu',
    colorbar=dict(title='Quality')
)])

# Set layout and axis labels, ensuring axes start at 0
fig.update_layout(
    title='3D Surface Plot: Fixed Acidity vs pH vs Quality',
    scene=dict(
        xaxis=dict(title='Fixed Acidity', range=[0, df['fixed acidity'].max()]),
        yaxis=dict(title='pH', range=[0, df['pH'].max()]),
        zaxis=dict(title='Quality', range=[0, df['quality'].max()])
    ),
    margin=dict(l=0, r=0, b=0, t=40)
)

# Show interactive plot
fig.show()

# Save the figure as a PNG image
pio.write_image(fig, 'Fisher_Bachman-Rhodes+surface.png', format='png', width=1000, height=600, scale=2)


# Visualization 4: Alcohol vs Quality vs Frequency 3D Histogram Plot

**Purpose and Rationale**  
This histogram places alcohol content, wine quality, and frequency on the x, y, and z axes respectively. Frequency here is the number of wines in the dataset that fall into each alcohol and quality bin, so the plot shows which combinations of alcohol content and quality are most common.

## Insights and Interpretations:
1. In general, the lower the alcohol content, the more wines fall into that bin.
2. The frequency of occurrence is most concentrated between quality levels 5-8.
3. The frequency of occurrence is most concentrated between alcohol content levels 8-11.

**Interpretation: The most common red wines in this dataset are on the lower end of alcohol content and in the midrange of quality. Because the dataset records tested samples rather than sales, this describes which wines were sampled, not how often they are purchased or consumed.** 
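The height of each bar in this histogram is a count of dataset rows. A compact way to see the same counts as a table is sketched below, assuming alcohol is floored to whole numbers as in the plotting code; the helper name `count_alcohol_quality` and the sample rows are hypothetical.

```python
import numpy as np
import pandas as pd

def count_alcohol_quality(df: pd.DataFrame) -> pd.DataFrame:
    """Number of wines for each (floored alcohol, quality) combination."""
    # Floor alcohol to whole numbers so each wine lands in one integer bin
    alcohol_floor = np.floor(df['alcohol']).astype(int)
    # Rows are alcohol bins, columns are quality scores, cells are counts
    return pd.crosstab(alcohol_floor, df['quality'])

# Hypothetical rows for illustration only; not taken from the dataset
sample = pd.DataFrame({
    'alcohol': [9.4, 9.8, 9.8, 10.5, 11.2],
    'quality': [5, 5, 5, 6, 7],
})
counts = count_alcohol_quality(sample)
print(counts)
```

Each cell of the table corresponds to one bar, which makes it easier to read off exact counts than rotating the 3D view.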


In [38]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.io as pio

# Load dataset
df = pd.read_csv('winequality-red.csv', sep=';')

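# Floor alcohol to whole numbers; quality is already an integer score
alcohol_bins = np.floor(df['alcohol']).astype(int)
quality_bins = df['quality']

# Create 2D histogram with one unit-wide bin per integer value,
# so each bar sits at the value it counts
alcohol_edges = np.arange(alcohol_bins.min(), alcohol_bins.max() + 2)
quality_edges = np.arange(quality_bins.min(), quality_bins.max() + 2)
hist, xedges, yedges = np.histogram2d(alcohol_bins, quality_bins, bins=(alcohol_edges, quality_edges))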

# Bin coordinates
xpos, ypos = np.meshgrid(xedges[:-1], yedges[:-1], indexing="ij")
xpos = xpos.ravel()
ypos = ypos.ravel()
zpos = np.zeros_like(xpos)
dz = hist.ravel()

# Bar dimensions
dx = dy = 0.8

bars = []

# Vertex indices for 12 triangles (2 per face) covering all 6 faces of a cuboid:
# bottom, top, front (y0), right (x1), back (y1), left (x0)
faces_i = [0, 0, 4, 4, 0, 0, 1, 1, 2, 2, 3, 3]
faces_j = [1, 2, 5, 6, 1, 5, 2, 6, 3, 7, 0, 4]
faces_k = [2, 3, 6, 7, 5, 4, 6, 5, 7, 6, 4, 7]

# Create 3D bars as cuboids
for x, y, z, h in zip(xpos, ypos, zpos, dz):
    if h > 0:
        x0, y0, z0 = x, y, z
        x1, y1, z1 = x + dx, y + dy, z + h

        # Define 8 vertices of the cuboid
        vertices = [
            [x0, y0, z0],  # 0
            [x1, y0, z0],  # 1
            [x1, y1, z0],  # 2
            [x0, y1, z0],  # 3
            [x0, y0, z1],  # 4
            [x1, y0, z1],  # 5
            [x1, y1, z1],  # 6
            [x0, y1, z1],  # 7
        ]

        x_cuboid, y_cuboid, z_cuboid = zip(*vertices)

        bars.append(go.Mesh3d(
            x=x_cuboid,
            y=y_cuboid,
            z=z_cuboid,
            i=faces_i,
            j=faces_j,
            k=faces_k,
            color='steelblue',
            opacity=1.0,
            flatshading=True,
            showscale=False,
        ))

# Plot
fig = go.Figure(data=bars)
fig.update_layout(
    title='3D Histogram Plot: Alcohol vs Quality vs Frequency',
    scene=dict(
        xaxis=dict(title='Alcohol (Binned)', range=[0, xedges[-1]]),
        yaxis=dict(title='Quality', range=[0, yedges[-1]]),
        zaxis=dict(title='Frequency', range=[0, dz.max() * 1.1])  # Add slight padding
    ),
    margin=dict(l=0, r=0, b=0, t=40)
)

# Show plot
fig.show()

# Save the figure as a PNG image
pio.write_image(fig, 'Fisher_Bachman-Rhodes+histogram.png', format='png', width=1000, height=600, scale=2)


# Visualization 5: Residual Sugar vs Alcohol vs Mean Quality 3D Line Plot

**Purpose and Rationale**  
This line plot places binned residual sugar, binned alcohol content, and the mean quality of each bin on the x, y, and z axes respectively. This allows us to analyze how the perceived quality of a wine is related to its alcohol content and sugar content.

## Insights and Interpretations:
1. In general, the lower the sugar content, the higher the perceived quality.
2. In general, the higher the alcohol content, the higher the perceived quality.

**Interpretation: In general, red wines with higher alcohol contents and lower sugar contents than their counterparts are perceived as having higher average quality.** 
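A mean taken over a bin is only as reliable as the number of wines in that bin. The sketch below tabulates the mean quality for each sugar and alcohol bin together with the row count behind it. It assumes equal-width bins from `pd.cut`; the helper name `mean_quality_by_bins`, the bin count, and the sample rows are hypothetical.

```python
import pandas as pd

def mean_quality_by_bins(df: pd.DataFrame, bins: int = 10) -> pd.DataFrame:
    """Mean quality and row count for each (sugar bin, alcohol bin) pair."""
    binned = df.assign(
        sugar_bin=pd.cut(df['residual sugar'], bins=bins),
        alcohol_bin=pd.cut(df['alcohol'], bins=bins),
    )
    # observed=True keeps only bin pairs that actually contain wines
    return (
        binned.groupby(['sugar_bin', 'alcohol_bin'], observed=True)['quality']
        .agg(['mean', 'count'])
        .reset_index()
    )

# Hypothetical rows for illustration only; not taken from the dataset
sample = pd.DataFrame({
    'residual sugar': [1.0, 1.2, 1.1, 8.0, 8.5],
    'alcohol': [9.0, 9.2, 13.0, 9.1, 9.3],
    'quality': [5, 5, 7, 5, 4],
})
result = mean_quality_by_bins(sample, bins=2)
print(result)
```

Bins with a very small `count` may be worth reading with extra caution, since a single wine can set the mean.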


In [39]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.io as pio

# Read the data
df = pd.read_csv('winequality-red.csv', sep=';')

# Bin alcohol and residual sugar into discrete steps
df['alcohol_bin'] = pd.cut(df['alcohol'], bins=10)
df['sugar_bin'] = pd.cut(df['residual sugar'], bins=10)

# Group by binned alcohol and residual sugar and calculate mean quality
# (observed=True skips bin combinations that contain no wines)
grouped = df.groupby(['sugar_bin', 'alcohol_bin'], observed=True)['quality'].mean().reset_index()

# Convert bin intervals to numeric midpoints for plotting
grouped['sugar_mid'] = grouped['sugar_bin'].apply(lambda x: x.mid).astype(float)
grouped['alcohol_mid'] = grouped['alcohol_bin'].apply(lambda x: x.mid).astype(float)

# Sort values for a continuous line
grouped_sorted = grouped.sort_values(by=['sugar_mid', 'alcohol_mid'])

# Create interactive 3D line plot using Plotly
fig = go.Figure(data=go.Scatter3d(
    x=grouped_sorted['sugar_mid'],
    y=grouped_sorted['alcohol_mid'],
    z=grouped_sorted['quality'],
    mode='lines+markers',
    line=dict(color='darkgreen', width=4),
    marker=dict(size=5)
))

# Set axis labels and force axes to start at zero
fig.update_layout(
    scene=dict(
        xaxis=dict(title='Residual Sugar', range=[0, grouped_sorted['sugar_mid'].max()]),
        yaxis=dict(title='Alcohol', range=[0, grouped_sorted['alcohol_mid'].max()]),
        zaxis=dict(title='Mean Quality', range=[0, grouped_sorted['quality'].max()])
    ),
    title='3D Line Plot: Residual Sugar vs Alcohol vs Quality',
    margin=dict(l=0, r=0, b=0, t=40)
)

# Show the interactive plot
fig.show()

# Save the figure as a PNG image
pio.write_image(fig, 'Fisher_Bachman-Rhodes+line.png', format='png', width=1000, height=600, scale=2)






# References:
Cortez, P., Cerdeira, A., Almeida, F., Matos, T., & Reis, J. (2009). Wine Quality [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C56S3T