# 🍷 Wine Quality Prediction using Logistic Regression


This project demonstrates a machine learning approach to predicting whether a red wine is "good" (quality score of 7 or higher) using the UCI Wine Quality dataset.

We perform exploratory data analysis (EDA), build a logistic regression model, evaluate it using accuracy and a confusion matrix, and summarize the findings.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load dataset
df = pd.read_csv("winequality-red.csv")
df.head()

In [None]:
# Overview of dataset
df.describe()

In [None]:
# Column dtypes and non-null counts (a count below the row total indicates missing values)
df.info()
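
`df.info()` shows non-null counts, so missing values have to be inferred by comparing each count with the row total. A more direct check is sketched below. The helper name `missing_report` and the tiny `demo` frame are illustrative assumptions, not part of the original notebook; in practice you would pass `df`.

```python
import pandas as pd

def missing_report(frame: pd.DataFrame) -> pd.Series:
    """Return the number of missing values per column, largest first."""
    return frame.isnull().sum().sort_values(ascending=False)

# Tiny illustrative frame (hypothetical values); in the notebook, pass `df` instead.
demo = pd.DataFrame({"pH": [3.5, None, 3.2], "alcohol": [9.4, 9.8, 10.0]})
report = missing_report(demo)
print(report)
```

A report of all zeros means no imputation or row dropping is needed before modeling.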

## 📊 Feature Distributions

In [None]:
df.hist(bins=15, figsize=(15, 10), color='skyblue', edgecolor='black')
plt.suptitle("Feature Distributions", fontsize=16)
plt.tight_layout()
plt.show()

## 🔗 Correlation Heatmap

In [None]:
plt.figure(figsize=(12, 8))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm', fmt='.2f')
plt.title("Correlation between features")
plt.show()

## Create Binary Classification Target

In [None]:
# Wines with quality >= 7 are considered 'good' (1), else 'not good' (0)
df['quality_label'] = (df['quality'] >= 7).astype(int)
df['quality_label'].value_counts()
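
The class counts above also give a useful reference point: the accuracy of a trivial model that always predicts the most frequent class. The sketch below computes that baseline. The function name `majority_baseline` and the ten sample scores are hypothetical; with the real data you would pass `df['quality_label']`.

```python
import pandas as pd

def majority_baseline(labels: pd.Series) -> float:
    """Accuracy obtained by always predicting the most frequent class."""
    return labels.value_counts(normalize=True).max()

# Hypothetical quality scores, thresholded the same way as in the notebook.
demo_quality = pd.Series([5, 6, 5, 7, 6, 5, 8, 6, 5, 6])
demo_label = (demo_quality >= 7).astype(int)

baseline = majority_baseline(demo_label)
print(f"Majority-class baseline accuracy: {baseline:.2%}")
```

Any trained model should be judged against this number rather than against 50%.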

## Model Training - Logistic Regression

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X = df.drop(['quality', 'quality_label'], axis=1)
y = df['quality_label']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
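
Two optional refinements are worth considering here, shown as a sketch rather than as the notebook's method. First, the wine features are measured on very different scales, and logistic regression usually converges more reliably on standardized inputs. Second, passing `stratify` to `train_test_split` keeps the class ratio the same in the train and test sets. The example uses synthetic data from `make_classification` as a stand-in for `X` and `y`; the variable names `X_demo`, `y_demo`, and `scaled_model` are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X and y: 11 features, roughly 85/15 class imbalance.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=11, weights=[0.85, 0.15], random_state=42
)

# stratify=y_demo preserves the class ratio in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Standardize features, then fit logistic regression, as a single estimator.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_tr, y_tr)

test_accuracy = scaled_model.score(X_te, y_te)
print(f"Test accuracy: {test_accuracy:.2%}")
```

Wrapping the scaler in a pipeline means it is fitted on the training split only, which avoids leaking test-set statistics into the model.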

## 📈 Model Evaluation

In [None]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)
print(f"✅ Accuracy: {acc:.2%}\n")

print("📋 Classification Report:")
print(classification_report(y_test, y_pred))

In [None]:
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=['Not Good', 'Good'], yticklabels=['Not Good', 'Good'])
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()

## ✅ Conclusion


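- Logistic Regression achieved an accuracy of over **80%**. Because most wines fall in the "not good" class, this figure should be compared with the accuracy of always predicting the majority class before it is read as strong performance.
- The dataset is imbalanced (wines rated 7 or higher are a minority), so additional metrics such as the F1-score and the confusion matrix are more informative than accuracy alone.
- More advanced models (e.g., Random Forest, XGBoost) may improve results further.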
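
One way to follow up on these points is to compare models by cross-validated F1-score instead of a single accuracy figure. The sketch below does this for logistic regression and a random forest. It runs on synthetic data from `make_classification`, not the wine dataset, so the printed scores say nothing about the real results; the names `candidates` and `f1_scores` are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wine features and binary label (imbalanced).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=11, weights=[0.85, 0.15], random_state=42
)

candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Mean F1-score for the positive class over 5 cross-validation folds.
f1_scores = {
    name: cross_val_score(estimator, X_demo, y_demo, cv=5, scoring="f1").mean()
    for name, estimator in candidates.items()
}

for name, score in f1_scores.items():
    print(f"{name}: mean F1 = {score:.3f}")
```

To apply this to the notebook, replace `X_demo` and `y_demo` with `X` and `y`.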

This notebook demonstrates a basic ML workflow, from data exploration through model training and evaluation.
