# **Wine Quality Prediction**

-------------

## **Objective**

The objective of this project is to predict the quality score of red and white wines from their chemical properties using a machine learning model (a Random Forest classifier).

## **Data Source**

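The project uses the Wine Quality dataset from the UCI Machine Learning Repository. It consists of two files, one for red wines (1,599 samples) and one for white wines (4,898 samples), both variants of Portuguese "Vinho Verde". Each sample lists 11 physicochemical properties together with a quality score between 0 and 10.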

## **Import Libraries**

```python
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
```


## **Import Data**

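```python
# Import Data
# Load both datasets; the UCI files are semicolon-separated, so sep=';' is required
data_red = pd.read_csv('winequality-red.csv', sep=';')
data_white = pd.read_csv('winequality-white.csv', sep=';')

# Label each wine with its type, then stack the two tables
data_red['type'] = 'red'
data_white['type'] = 'white'
data = pd.concat([data_red, data_white], axis=0, ignore_index=True)  # renumber rows 0..n-1
```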

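The `sep=';'` argument matters because the UCI files use semicolons rather than commas. The sketch below illustrates this on hypothetical two-row excerpts in the UCI layout (only two of the 11 feature columns are shown), and shows how `ignore_index=True` gives the stacked table a unique index. The `demo_*` names are placeholders chosen so the notebook's own `data` is not overwritten.

```python
import io
import pandas as pd

# Hypothetical excerpts in the UCI layout: semicolon-separated with a quoted header.
# Only two of the 11 feature columns are shown, plus 'quality'.
red_text = '"fixed acidity";"alcohol";"quality"\n7.4;9.4;5\n7.8;9.8;5\n'
white_text = '"fixed acidity";"alcohol";"quality"\n7.0;8.8;6\n'

# With the default sep=',' every line is read as one single column
print(pd.read_csv(io.StringIO(red_text)).shape)

demo_red = pd.read_csv(io.StringIO(red_text), sep=';')
demo_white = pd.read_csv(io.StringIO(white_text), sep=';')
demo_red['type'] = 'red'
demo_white['type'] = 'white'

# ignore_index=True numbers the stacked rows 0..n-1 instead of repeating each file's index
demo = pd.concat([demo_red, demo_white], axis=0, ignore_index=True)
print(demo)
```

Without `ignore_index=True`, the combined index would contain each row number twice (once per file), which makes label-based lookups with `.loc` ambiguous.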

## **Describe Data**

```python
# Describe Data
data.describe()
```

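`describe()` summarizes each column but does not show how the quality scores are distributed, which matters for a classifier because the UCI description notes that the classes are not balanced (there are many more average wines than excellent or poor ones). A minimal sketch of how to count samples per score, shown on a small synthetic stand-in frame named `demo` (in the notebook, apply the same calls to `data`):

```python
import pandas as pd

# Small synthetic stand-in; the real notebook would use the combined `data` frame
demo = pd.DataFrame({
    'quality': [5, 6, 6, 5, 7, 6, 3, 6],
    'type': ['red', 'red', 'white', 'white', 'white', 'white', 'red', 'white'],
})

# Number of samples per quality score, ordered by score rather than by frequency
counts = demo['quality'].value_counts().sort_index()
print(counts)

# The same counts broken down by wine type
print(pd.crosstab(demo['quality'], demo['type']))
```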

## **Data Visualization**

```python
# Data Visualization

# Correlation heatmap of the numeric columns ('type' is still a string here, so drop it)
colormap = plt.cm.viridis
plt.figure(figsize=(12, 12))
plt.title('Correlation of Features', y=1.05, size=15)
sns.heatmap(data.drop('type', axis=1).astype(float).corr(), cmap=colormap, linewidths=0.1,
            vmax=1.0, square=True, linecolor='white', annot=True, fmt='.2f')
plt.show()
```

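The heatmap shows every pairwise correlation, but for prediction the most relevant row is the one for `quality`. The sketch below extracts and ranks that row. It runs on synthetic stand-in data (`demo`, with `alcohol` deliberately built to rise with quality and `density` to fall), so the printed values say nothing about the real dataset; in the notebook, the same chain can be applied to `data`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: 'alcohol' rises with quality, 'density' falls, 'pH' is unrelated
rng = np.random.default_rng(0)
quality = rng.integers(3, 9, size=200)  # integer scores 3..8
demo = pd.DataFrame({
    'alcohol': 8 + 0.5 * quality + rng.normal(0, 0.3, size=200),
    'density': 1.0 - 0.002 * quality + rng.normal(0, 0.0005, size=200),
    'pH': rng.normal(3.2, 0.15, size=200),
    'quality': quality,
    'type': rng.choice(['red', 'white'], size=200),
})

# Correlation of each numeric feature with the target, strongest positive first
target_corr = (
    demo.drop('type', axis=1)
        .corr()['quality']
        .drop('quality')
        .sort_values(ascending=False)
)
print(target_corr)
```

Pearson correlation only captures linear relationships, so a low value does not rule out a feature being useful to a tree-based model.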

## **Data Preprocessing**

```python
# Data Preprocessing

# Check for missing values (printed, since a cell only displays its last expression)
print(data.isnull().sum())

# Check for duplicate entries (rows identical in every column, including 'type')
duplicates = data[data.duplicated()]
print("Number of duplicate rows:", duplicates.shape[0])

# Drop duplicates
data = data.drop_duplicates()

# Encode 'type' column: red -> 0, white -> 1
data['type'] = data['type'].map({'red': 0, 'white': 1})
```

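Two details of the preprocessing step are easy to miss: `duplicated()` only flags a row when every column matches an earlier row (so a red and a white wine with identical measurements are both kept), and `map()` silently turns any label not in the dictionary into `NaN`. A small sketch on a hypothetical four-row frame `demo` illustrates both and adds a sanity check after encoding:

```python
import pandas as pd

# Hypothetical frame: row 1 duplicates row 0, row 2 differs from row 0 only in 'type'
demo = pd.DataFrame({
    'alcohol': [9.4, 9.4, 9.4, 10.0],
    'quality': [5, 5, 5, 6],
    'type': ['red', 'red', 'white', 'white'],
})

print("Duplicate rows:", demo.duplicated().sum())  # only the exact repeat is counted
demo = demo.drop_duplicates()

# Encode the type, then confirm no label fell outside the mapping (map() would give NaN)
demo['type'] = demo['type'].map({'red': 0, 'white': 1})
assert demo['type'].notna().all(), "unexpected value in 'type'"
print(demo)
```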

## **Define Target Variable (y) and Feature Variables (X)**

```python
# Define Target Variable (y) and Feature Variables (X)

y = data['quality']  # 'quality' is the target
X = data.drop(['quality'], axis=1)  # remaining columns are the features
```


## **Train Test Split**

```python
# Train Test Split

seed = 8  # set seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=seed)
```

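Because some quality scores are rare, a plain random split can leave them under-represented (or absent) in the test set. One option, not used in the notebook, is scikit-learn's `stratify` argument, which keeps each score's share roughly equal in both splits. The sketch below uses a synthetic, deliberately imbalanced target (`demo_y`); note that stratification requires at least two samples per class, and that enabling it would change the notebook's split and therefore its reported accuracy.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic, deliberately imbalanced target: scores 4 and 8 are rare
rng = np.random.default_rng(8)
demo_y = pd.Series(np.repeat([4, 5, 6, 8], [20, 400, 360, 20]))
demo_X = pd.DataFrame({'feature': rng.normal(size=len(demo_y))})

# stratify=demo_y keeps each score's share (almost) identical in train and test
dX_train, dX_test, dy_train, dy_test = train_test_split(
    demo_X, demo_y, test_size=0.2, random_state=8, stratify=demo_y
)

# Share of each score in the full data vs. the test split
print(pd.DataFrame({
    'full': demo_y.value_counts(normalize=True).sort_index(),
    'test': dy_test.value_counts(normalize=True).sort_index(),
}))
```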

## **Modeling**

```python
# Modeling

RF_clf = RandomForestClassifier(random_state=seed)
RF_clf.fit(X_train, y_train)
```

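A fitted random forest exposes impurity-based importances through its `feature_importances_` attribute, which gives a quick view of which chemical properties the model relies on. The sketch below fits a forest on synthetic stand-in data (`demo_X`, where the label depends only on `alcohol` and `noise` is pure noise) and ranks the importances; in the notebook, the same `pd.Series` construction works with `RF_clf` and `X_train.columns`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: the score depends on 'alcohol' only; 'noise' carries no signal
rng = np.random.default_rng(8)
demo_X = pd.DataFrame({
    'alcohol': rng.normal(10.5, 1.0, size=500),
    'noise': rng.normal(0.0, 1.0, size=500),
})
demo_y = (demo_X['alcohol'] > 10.5).astype(int) + 5  # scores 5 or 6

demo_clf = RandomForestClassifier(random_state=8)
demo_clf.fit(demo_X, demo_y)

# One impurity-based importance per column; the values sum to 1
importances = pd.Series(demo_clf.feature_importances_, index=demo_X.columns)
print(importances.sort_values(ascending=False))
```

Impurity-based importances can favour features with many distinct values; `sklearn.inspection.permutation_importance` on held-out data is a common cross-check.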

## **Model Evaluation**

```python
# Model Evaluation

pred_RF = RF_clf.predict(X_test)
accuracy = accuracy_score(y_test, pred_RF)
print("Accuracy:", accuracy)

# Rows are true scores, columns are predicted scores (both in sorted order)
conf_matrix = confusion_matrix(y_test, pred_RF)
print("Confusion Matrix:\n", conf_matrix)
```

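A single train/test split gives one accuracy figure that can shift with the seed, and accuracy alone hides how rare scores are handled. The notebook imports `cross_val_score` without using it; the sketch below shows one way it could be combined with `classification_report` for per-score precision and recall. It runs on synthetic stand-in data (`demo_X`, `demo_y`, scores 5 to 7 driven by a noisy `alcohol` signal), so the printed numbers are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in: scores 5/6/7 follow a noisy 'alcohol' signal; 'noise' is irrelevant
rng = np.random.default_rng(8)
demo_X = pd.DataFrame({
    'alcohol': rng.normal(10.5, 1.0, size=600),
    'noise': rng.normal(0.0, 1.0, size=600),
})
signal = demo_X['alcohol'] + rng.normal(0.0, 0.3, size=600)
demo_y = pd.Series(5 + np.digitize(signal, [10.0, 11.0]))  # 5, 6 or 7

dX_train, dX_test, dy_train, dy_test = train_test_split(
    demo_X, demo_y, test_size=0.2, random_state=8
)
demo_clf = RandomForestClassifier(random_state=8)

# 5-fold cross-validation on the training portion (folds are stratified for classifiers)
cv_scores = cross_val_score(demo_clf, dX_train, dy_train, cv=5, scoring='accuracy')
print(f"CV accuracy: {cv_scores.mean():.3f} ± {cv_scores.std():.3f}")

# Per-score precision, recall and F1 on the held-out test portion
demo_clf.fit(dX_train, dy_train)
print(classification_report(dy_test, demo_clf.predict(dX_test), zero_division=0))
```

In the notebook, the equivalent calls would use `RF_clf`, `X_train`, `y_train`, `y_test` and `pred_RF`.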

## **Prediction**

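```python
def show_quality(inputs):
    """Predict the quality score of a single wine.

    `inputs` must list the feature values in the same order as X.columns
    (11 chemical properties followed by the encoded 'type': 0 = red, 1 = white).
    """
    try:
        # Wrap the sample in a DataFrame so its feature names match those seen during fit
        new = pd.DataFrame([inputs], columns=X.columns)
        return int(RF_clf.predict(new)[0])
    except Exception as e:
        return f"Error: {e}"
```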

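`show_quality` returns only the most likely score. When neighbouring scores are close, the forest's class probabilities (the average of the per-tree probabilities, available through `predict_proba`) can be more informative. The helper below, `show_quality_proba`, is a hypothetical variant, not part of the original notebook; it is demonstrated on a synthetic two-feature stand-in model (`demo_clf`). In the notebook it could be called as `show_quality_proba(values, RF_clf, X.columns)`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def show_quality_proba(inputs, model, columns):
    """Hypothetical helper: map each quality score to the model's predicted probability."""
    sample = pd.DataFrame([inputs], columns=columns)
    proba = model.predict_proba(sample)[0]
    # model.classes_ lists the scores in the same order as the probability columns
    return {int(c): float(p) for c, p in zip(model.classes_, proba)}

# Synthetic stand-in model with two features, for demonstration only
rng = np.random.default_rng(8)
demo_X = pd.DataFrame({
    'alcohol': rng.normal(10.5, 1.0, size=300),
    'pH': rng.normal(3.2, 0.15, size=300),
})
demo_y = pd.Series(np.where(demo_X['alcohol'] > 10.5, 6, 5))
demo_clf = RandomForestClassifier(random_state=8).fit(demo_X, demo_y)

print(show_quality_proba([12.0, 3.3], demo_clf, demo_X.columns))
```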
## **Explanation**
This project predicts wine quality from chemical properties using machine learning. The data covers red and white wines and comes from the UCI Machine Learning Repository. pandas and NumPy handle data manipulation, Matplotlib and seaborn handle visualization, and scikit-learn provides the model and evaluation metrics. After removing duplicate rows, encoding the wine type, and holding out 20% of the data for testing, a Random Forest classifier is trained and evaluated with accuracy and a confusion matrix. The project illustrates how a standard classification workflow applies to quality assessment, and the correlation heatmap shows which chemical properties vary together with the quality score.







