**Methods**

Because the response variable (wine quality, based on sensory data) is quantitative, we will use regression for our predictions. All of the other variables in this dataset could plausibly affect the taste and/or smell of wine. Therefore, to begin our analysis, we plot each predictor variable against wine quality as a scatter plot (predictor on the x-axis, wine quality on the y-axis) and compute the Pearson correlation between each predictor and wine quality.

In [63]:
fixed_acidity_plot <- ggplot(wine_data, aes(x = fixed_acidity, y = quality)) +
  geom_point(alpha = 0.4) +
  xlab("Fixed Acidity (g(tartaric acid)/dm^3)") +
  ylab("Wine Quality") +
  theme(text = element_text(size = 14)) +
  ggtitle("Relationship Between Fixed Acidity and Wine Quality")
fixed_acidity_plot

PearsonCorrelation_Fixed_Acidity <- cor(wine_data$fixed_acidity, wine_data$quality, method = c("pearson"))
PearsonCorrelation_Fixed_Acidity

volatile_acidity_plot <- ggplot(wine_data, aes(x = volatile_acidity , y = quality)) +
  geom_point(alpha = 0.4) +
  xlab("Volatile Acidity (g(acetic acid)/dm^3)") +
  ylab("Wine Quality") +
  theme(text = element_text(size = 14)) +
  ggtitle("Relationship Between Volatile Acidity and Wine Quality")
volatile_acidity_plot

PearsonCorrelation_Volatile_Acidity <- cor(wine_data$volatile_acidity, wine_data$quality, method = c("pearson"))
PearsonCorrelation_Volatile_Acidity

residual_sugar_plot <- ggplot(wine_data, aes(x = residual_sugar, y = quality)) +
  geom_point(alpha = 0.4) +
  xlab("Residual Sugar (g/dm^3)") +
  ylab("Wine Quality") +
  theme(text = element_text(size = 14)) +
  ggtitle("Relationship Between Residual Sugar and Wine Quality")
residual_sugar_plot

PearsonCorrelation_Residual_Sugar <- cor(wine_data$residual_sugar, wine_data$quality, method = c("pearson"))
PearsonCorrelation_Residual_Sugar

citric_acid_plot <- ggplot(wine_data, aes(x = citric_acid, y = quality)) +
  geom_point(alpha = 0.4) +
  xlab("Citric Acid (g/dm^3)") +
  ylab("Wine Quality") +
  theme(text = element_text(size = 14)) +
  ggtitle("Relationship Between Citric Acid and Wine Quality")
citric_acid_plot

PearsonCorrelation_Citric_Acid <- cor(wine_data$citric_acid, wine_data$quality, method = c("pearson"))
PearsonCorrelation_Citric_Acid

chlorides_plot <- ggplot(wine_data, aes(x = chlorides, y = quality)) +
  geom_point(alpha = 0.4) +
  xlab("Chlorides (g(sodium chloride)/dm^3)") +
  ylab("Wine Quality") +
  theme(text = element_text(size = 14)) +
  ggtitle("Relationship Between Chlorides and Wine Quality")
chlorides_plot

PearsonCorrelation_Chlorides <- cor(wine_data$chlorides, wine_data$quality, method = c("pearson"))
PearsonCorrelation_Chlorides

free_sulfur_dioxide_plot <- ggplot(wine_data, aes(x = free_sulfur_dioxide, y = quality)) +
  geom_point(alpha = 0.4) +
  xlab("Free Sulfur Dioxide (mg/dm^3)") +
  ylab("Wine Quality") +
  theme(text = element_text(size = 14)) +
  ggtitle("Relationship Between Free Sulfur Dioxide and Wine Quality")
free_sulfur_dioxide_plot

PearsonCorrelation_Free_Sulfur_Dioxide <- cor(wine_data$free_sulfur_dioxide, wine_data$quality, method = c("pearson"))
PearsonCorrelation_Free_Sulfur_Dioxide

# Named with the _plot suffix, like the other plots, so the plot object
# is not confused with the total_sulfur_dioxide column.
total_sulfur_dioxide_plot <- ggplot(wine_data, aes(x = total_sulfur_dioxide, y = quality)) +
  geom_point(alpha = 0.4) +
  xlab("Total Sulfur Dioxide (mg/dm^3)") +
  ylab("Wine Quality") +
  theme(text = element_text(size = 14)) +
  ggtitle("Relationship Between Total Sulfur Dioxide and Wine Quality")
total_sulfur_dioxide_plot

PearsonCorrelation_Total_Sulfur_Dioxide <- cor(wine_data$total_sulfur_dioxide, wine_data$quality, method = c("pearson"))
PearsonCorrelation_Total_Sulfur_Dioxide

density_plot <- ggplot(wine_data, aes(x = density, y = quality)) +
  geom_point(alpha = 0.4) +
  xlab("Density (g/cm^3)") +
  ylab("Wine Quality") +
  theme(text = element_text(size = 14)) +
  ggtitle("Relationship Between Density and Wine Quality")
density_plot

PearsonCorrelation_Density <- cor(wine_data$density, wine_data$quality, method = c("pearson"))
PearsonCorrelation_Density

pH_plot <- ggplot(wine_data, aes(x = pH, y = quality)) +
  geom_point(alpha = 0.4) +
  xlab("pH") +
  ylab("Wine Quality") +
  theme(text = element_text(size = 14)) +
  ggtitle("Relationship Between pH and Wine Quality")
pH_plot

PearsonCorrelation_pH <- cor(wine_data$pH, wine_data$quality, method = c("pearson"))
PearsonCorrelation_pH

sulphates_plot <- ggplot(wine_data, aes(x = sulphates, y = quality)) +
  geom_point(alpha = 0.4) +
  xlab("Sulphates (g(potassium sulfate)/dm^3)") +
  ylab("Wine Quality") +
  theme(text = element_text(size = 14)) +
  ggtitle("Relationship Between Sulphates and Wine Quality")
sulphates_plot

PearsonCorrelation_Sulphates <- cor(wine_data$sulphates, wine_data$quality, method = c("pearson"))
PearsonCorrelation_Sulphates

alcohol_plot <- ggplot(wine_data, aes(x = alcohol, y = quality)) +
  geom_point(alpha = 0.4) +
  xlab("Alcohol (%vol)") +
  ylab("Wine Quality") +
  theme(text = element_text(size = 14)) +
  ggtitle("Relationship Between Alcohol and Wine Quality")
alcohol_plot

PearsonCorrelation_Alcohol <- cor(wine_data$alcohol, wine_data$quality, method = c("pearson"))
PearsonCorrelation_Alcohol


The scatter plots and Pearson correlations suggest that fixed acidity, volatile acidity, residual sugar, free sulfur dioxide, and alcohol are moderately correlated with wine quality. To conduct our data analysis, we will first fit a multivariable K-nearest neighbors (KNN) regression model, using cross-validation on the training data to find the value of K at which the minimum RMSPE occurs. Using this K value, we will re-train our KNN regression model on the full training set and then make predictions on the testing data set. The RMSPE on the testing set will let us evaluate our model's accuracy.

We will also fit a multiple linear regression model to estimate the intercept and slope coefficients of the best-fit plane, and we will evaluate any outliers. We will again compute the RMSPE on the testing set. We will choose the model with the lower test RMSPE, that is, the lower prediction error on unseen data (RMSPE reflects both bias and variance, not bias alone). If the linear regression model is the better fit, we will visualise its predictions as a flat plane; otherwise, we will visualise the KNN predictions as a flexible (wiggly) surface.
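
The comparison described above can be sketched in base R as follows. This is only an illustrative outline, not the project's final analysis: the `toy` data frame is synthetic stand-in data (it is not `wine_data`), the helper names `rmse` and `knn_predict` are hypothetical, and the grid of K values, the 75/25 split, and the 5 folds are assumptions chosen for the example.

```r
# Synthetic stand-in for wine_data (assumption: two predictors, numeric response).
set.seed(2024)
n <- 200
toy <- data.frame(alcohol = runif(n, 8, 14),
                  volatile_acidity = runif(n, 0.1, 1.2))
toy$quality <- 3 + 0.3 * toy$alcohol - 1.5 * toy$volatile_acidity +
  rnorm(n, sd = 0.5)
predictors <- c("alcohol", "volatile_acidity")

# 75% / 25% train/test split.
train_idx <- sample(n, size = 0.75 * n)
train <- toy[train_idx, ]
test <- toy[-train_idx, ]

# Root mean squared error between observed and predicted values.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Hypothetical helper: KNN regression with predictors standardised using
# training-set means and standard deviations; each prediction is the mean
# response of the k nearest training points (Euclidean distance).
knn_predict <- function(train_x, train_y, new_x, k) {
  centre <- colMeans(train_x)
  spread <- apply(train_x, 2, sd)
  train_s <- scale(train_x, center = centre, scale = spread)
  new_s <- scale(new_x, center = centre, scale = spread)
  apply(new_s, 1, function(point) {
    dists <- sqrt(colSums((t(train_s) - point)^2))
    mean(train_y[order(dists)[seq_len(k)]])
  })
}

# 5-fold cross-validation on the training set to choose K.
k_grid <- c(1, 3, 5, 10, 20, 40)
folds <- sample(rep(1:5, length.out = nrow(train)))
cv_rmspe <- sapply(k_grid, function(k) {
  mean(sapply(1:5, function(f) {
    fit <- train[folds != f, ]
    val <- train[folds == f, ]
    rmse(val$quality,
         knn_predict(fit[predictors], fit$quality, val[predictors], k))
  }))
})
best_k <- k_grid[which.min(cv_rmspe)]

# Re-train with the chosen K on all training data and score on the test set.
knn_rmspe <- rmse(test$quality,
                  knn_predict(train[predictors], train$quality,
                              test[predictors], best_k))

# Multiple linear regression on the same split, scored the same way.
lm_fit <- lm(quality ~ alcohol + volatile_acidity, data = train)
lm_rmspe <- rmse(test$quality, predict(lm_fit, newdata = test))

c(best_k = best_k, knn_rmspe = knn_rmspe, lm_rmspe = lm_rmspe)
```

Both models are scored on the same held-out rows, so the two RMSPE values are directly comparable. The test set is used only once, after K has been chosen, so that the reported error is not tuned to the test data.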