# <font color=blueviolet>Multiclass Classification of Wine Quality</font>

<font color=black>Wine quality reflects how a wine tastes, which depends mainly on its chemical composition but also on its color and aroma.<br>This analysis uses two data sets, one of red and one of white wine samples. The aim is to model wine quality, a score based on personal taste, from measured physicochemical values.</font>

<font color=blue>In this third and last part of the project, the best model is used to predict the quality of new data.</font><br><br>
<font color=black>The modules used in this part are imported first.</font>

In [1]:
# Importing Libraries
import pandas as pd
import numpy as np
import pickle

## <font color=green>Cleaning and transforming the data</font>

<font color=blue>The clean_transform_data function replaces NaN values with the medians calculated from the training set, and then applies a log transform to the heavily skewed features.</font>

In [2]:
# Dealing with NaNs and transforming the skewed features

def clean_transform_data(df):
    """Returns cleaned DataFrame.
    
    Args: 
        df (pd.DataFrame) : uncleaned DataFrame
        
    Returns:
        df  (pd.DataFrame) : cleaned DataFrame
    
    """
    
    # medians of the various features (calculated from the training set)
    median={'fixed acidity': 7.0, 'volatile acidity': 0.29, 'citric acid': 0.31, 'residual sugar': 3.0,
            'chlorides': 0.047, 'free sulfur dioxide': 29.0, 'total sulfur dioxide': 118.0,
            'density': 0.99489,'pH': 3.21,'sulphates': 0.51,'alcohol': 10.3,'quality': 6.0}
    
    # replace NaNs with the corresponding median, for every column that has one
    for col_name in df.columns:
        if col_name in median:
            df[col_name] = df[col_name].fillna(median[col_name])
    
    # Applying log function to the heavily skewed features
    skewed_features=['residual sugar', 'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'sulphates']
    for feature in skewed_features:
        df[feature] = np.log(df[feature]) 

    return df
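
To make the two steps concrete, here is a minimal sketch on a toy two-column frame. The data values and the `toy_median` dictionary are made up for illustration and are not the project's training medians; the function above remains the actual implementation.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values), with one missing entry per column
toy = pd.DataFrame({
    "chlorides": [0.05, np.nan, 0.08],
    "alcohol": [9.5, 10.0, np.nan],
})

# Hypothetical medians standing in for the training-set medians
toy_median = {"chlorides": 0.047, "alcohol": 10.3}

# Step 1: fill each NaN with the median given for its column
filled = toy.fillna(toy_median)

# Step 2: log-transform only the column treated as skewed
filled["chlorides"] = np.log(filled["chlorides"])

print(filled)
```

Note that `np.log` returns `-inf` for a value of zero, so this transform presumes the skewed features are strictly positive in the new data.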

### <font color=green>Main code</font><br>
<font color=black>The prediction process involves the following steps:</font><br>
<font color=blue>1) Read the data from a CSV file into a data frame.</font><br>
<font color=blue>2) Clean and transform the data frame.</font><br>
<font color=blue>3) Load the best model from a pickle file.</font><br>
<font color=blue>4) Predict the wine quality.</font><br>
<font color=blue>5) Save the prediction results to an external file.</font>
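
Step 3 depends on the model having been saved with `pickle` in an earlier part of the project. As a rough illustration of that round trip, the sketch below pickles a hypothetical stand-in object (`MajorityModel`, which is not the project's model) to a temporary file and loads it back. The file name `stand_in_model.p` is likewise an assumption.

```python
import os
import pickle
import tempfile


class MajorityModel:
    """Hypothetical stand-in for a trained model: always predicts one class."""

    def __init__(self, label):
        self.label = label

    def predict(self, rows):
        return [self.label for _ in rows]


with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "stand_in_model.p")

    # Save the object; the context manager closes the file on exit
    with open(model_path, "wb") as model_file:
        pickle.dump(MajorityModel(label=6), model_file)

    # Load it back, as step 3 does for the real model
    with open(model_path, "rb") as model_file:
        restored = pickle.load(model_file)

# The restored object behaves like the one that was saved
predictions = restored.predict([[7.0, 0.29], [6.5, 0.31]])
print(predictions)
```

Unpickling needs the class definition to be importable in the loading session. For a scikit-learn model, that means scikit-learn has to be installed where the prediction notebook runs.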

In [3]:
# request the file name (with its path if it is in another directory)
wine_file = input('enter the name of the wine file, with the path if required')

# reading the data from file into dataframe
wine = pd.read_csv(wine_file, sep=";")

# cleaning and transforming the data
wine=clean_transform_data(wine)

# reading the best model
with open("opt_model.p", 'rb') as model_file:
    opt_model = pickle.load(model_file)

# applying the prediction model
predictions = opt_model.predict(wine)

# Save the predictions to a file named after the data file, with "_pred" added before the extension
pred_file=wine_file[:-4]+'_pred.csv'
pd.Series(predictions).to_csv(pred_file, index=False)

enter the name of the wine file, with the path if required aim_red_wine.csv
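
The slice `wine_file[:-4]` assumes the input name ends in a four-character extension such as `.csv`. As a hedged alternative, the sketch below derives the output name with `pathlib`, which handles any extension length. The helper name `prediction_path` is hypothetical and not part of the project.

```python
from pathlib import Path


def prediction_path(data_file):
    """Return the output path: the input's stem plus '_pred.csv', in the same directory."""
    path = Path(data_file.strip())  # drop stray spaces typed at the prompt
    return path.with_name(path.stem + "_pred.csv")


print(prediction_path("aim_red_wine.csv"))  # aim_red_wine_pred.csv
print(prediction_path(" data/new_wines.data"))
```

The result can be passed directly to `to_csv`, which accepts path objects as well as strings.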
