DataSet Name:- WineQuality-white.csv

DataSet Description Link:- https://archive.ics.uci.edu/ml/datasets/Wine+Quality

DataSet Download Link:- https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/

DataSet Description:-

Attribute Description:-
Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
Output variable (based on sensory data):
12 - quality (score between 0 and 10)

Aim:- Implement the following distance measures between two features of the dataset: (a) Euclidean, (b) Minkowski, (c) Manhattan, (d) Jaccard, (e) Cosine, (f) Simple Matching Coefficient, (g) Hamming
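For reference, the seven measures can be written as below for two feature vectors x and y of length n. For the binary measures, n_{ab} counts the positions where x_i = a and y_i = b, matching the n00/n01/n10/n11 counters in the Simple Matching Coefficient cell. Hamming is shown in the normalized form (a fraction of positions) that scipy.spatial.distance.hamming returns, and cosine distance is 1 minus cosine similarity, as in scipy.spatial.distance.cosine.

```latex
\begin{aligned}
d_{\text{Euclidean}}(x,y) &= \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \\
d_{\text{Minkowski}}(x,y) &= \Big(\sum_{i=1}^{n} \lvert x_i - y_i \rvert^{p}\Big)^{1/p} \\
d_{\text{Manhattan}}(x,y) &= \sum_{i=1}^{n} \lvert x_i - y_i \rvert \\
d_{\text{Jaccard}}(x,y) &= \frac{n_{01} + n_{10}}{n_{01} + n_{10} + n_{11}} \\
d_{\text{cosine}}(x,y) &= 1 - \frac{x \cdot y}{\lVert x \rVert_2 \, \lVert y \rVert_2} \\
\text{SMC}(x,y) &= \frac{n_{11} + n_{00}}{n_{00} + n_{01} + n_{10} + n_{11}} \\
d_{\text{Hamming}}(x,y) &= \frac{1}{n} \sum_{i=1}^{n} [\, x_i \neq y_i \,]
\end{aligned}
```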

In [3]:
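# !pip install scipy   # uncomment to install SciPy if it is missing
import pandas as pd
import numpy as np
from scipy.spatial import distance
from sklearn.metrics.pairwise import cosine_similarity

# Load the white-wine data. The 'Unnamed: 0' column in the output comes from an
# unnamed first column in the CSV (most likely a saved row index).
df = pd.read_csv('winequality.csv')

df.head()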

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.0,0.27,0.36,20.7,0.045,45.0,170.0,1.001,3.0,0.45,8.8,6
1,6.3,0.3,0.34,1.6,0.049,14.0,132.0,0.994,3.3,0.49,9.5,6
2,8.1,0.28,0.4,6.9,0.05,30.0,97.0,0.9951,3.26,0.44,10.1,6
3,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6
4,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6


In [4]:
# Convert each column to a NumPy array (one array per attribute)
arrFA = np.array(df['fixed acidity'])
arrVA = np.array(df['volatile acidity'])
arrCA = np.array(df['citric acid'])
arrRS = np.array(df['residual sugar'])
arrCH = np.array(df['chlorides'])
arrFSD = np.array(df['free sulfur dioxide'])
arrTSD = np.array(df['total sulfur dioxide'])
arrDEN = np.array(df['density'])
arrPH = np.array(df['pH'])
arrSUL = np.array(df['sulphates'])
arrAL = np.array(df['alcohol'])
arrQUA = np.array(df['quality'])
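The cells below repeat the same print pattern once per metric. A more compact sketch loops over features and metrics and collects every result in one DataFrame. So that it runs without the CSV, it uses a subset of columns from the five rows shown in the df.head() output above; with the full file, the df loaded in In [3] could be passed instead. The helper name distance_table and the METRICS dictionary are hypothetical, and Jaccard is left out because it expects binary data.

```python
import numpy as np
import pandas as pd
from scipy.spatial import distance

# Sample data: the five rows printed by df.head() above (subset of columns)
sample = pd.DataFrame({
    "fixed acidity": [7.0, 6.3, 8.1, 7.2, 7.2],
    "residual sugar": [20.7, 1.6, 6.9, 8.5, 8.5],
    "pH": [3.0, 3.3, 3.26, 3.19, 3.19],
    "alcohol": [8.8, 9.5, 10.1, 9.9, 9.9],
    "quality": [6, 6, 6, 6, 6],
})

# Hypothetical mapping: metric name -> function of two 1-D arrays
# (Minkowski uses p=3, as in the notebook's Minkowski cell)
METRICS = {
    "euclidean": distance.euclidean,
    "minkowski_p3": lambda u, v: distance.minkowski(u, v, p=3),
    "manhattan": distance.cityblock,
    "cosine": distance.cosine,
    "hamming": distance.hamming,
}


def distance_table(frame, target="quality"):
    """Hypothetical helper: distance from every feature column to the target column."""
    y = frame[target].to_numpy(dtype=float)
    rows = {}
    for col in frame.columns.drop(target):
        x = frame[col].to_numpy(dtype=float)
        rows[col] = {name: fn(x, y) for name, fn in METRICS.items()}
    # Transpose so there is one row per feature and one column per metric
    return pd.DataFrame(rows).T


print(distance_table(sample))
```

Building a table instead of one long string makes it easy to sort or compare features per metric.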

In [5]:
# Euclidean distance from each feature column to the quality column
print("***********Euclidean Distance*********\n")
print("Distance from fixed acidity to quality: " +
      str(distance.euclidean(arrFA, arrQUA)) +
      "\nDistance from Volatile acidity to quality: " +
      str(distance.euclidean(arrVA, arrQUA)) +
      "\nDistance from citric acid to quality: " +
      str(distance.euclidean(arrCA, arrQUA)) +
      "\nDistance from residual sugar to quality: " +
      str(distance.euclidean(arrRS, arrQUA)) +
      "\nDistance from free sulfur dioxide to quality: " +
      str(distance.euclidean(arrFSD, arrQUA)) +
      "\nDistance from total sulfur dioxide to quality: " +
      str(distance.euclidean(arrTSD, arrQUA)) +
      "\nDistance from density to quality: " +
      str(distance.euclidean(arrDEN, arrQUA)) +
      "\nDistance from ph to quality: " +
      str(distance.euclidean(arrPH, arrQUA)) +
      "\nDistance from sulphates to quality: " +
      str(distance.euclidean(arrSUL, arrQUA)) +
      "\nDistance from alcohol to quality: " +
      str(distance.euclidean(arrAL, arrQUA))
      )

***********Euclidean Distance*********

Distance from fixed acidity to quality: 113.28858062488028
Distance from Volatile acidity to quality: 397.04416029580386
Distance from citric acid to quality: 393.0033950235036
Distance from residual sugar to quality: 367.9735010839775
Distance from free sulfur dioxide to quality: 2379.3674579602034
Distance from total sulfur dioxide to quality: 9740.67634972028
Distance from density to quality: 347.3867902768078
Distance from ph to quality: 198.12981527271458
Distance from sulphates to quality: 382.16035495587454
Distance from alcohol to quality: 334.5058433633045
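As a check on what distance.euclidean computes, the formula can be written directly in NumPy: the square root of the summed squared differences. The sketch uses the fixed acidity and quality values from the five df.head() rows so it runs on its own.

```python
import numpy as np
from scipy.spatial import distance

u = np.array([7.0, 6.3, 8.1, 7.2, 7.2])      # fixed acidity (first five rows)
v = np.array([6, 6, 6, 6, 6], dtype=float)   # quality (first five rows)

# Square root of the sum of squared element-wise differences
euclid_manual = np.sqrt(np.sum((u - v) ** 2))
print(euclid_manual, distance.euclidean(u, v))
```

np.linalg.norm(u - v) gives the same value, since the Euclidean distance is the L2 norm of the difference vector.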


In [6]:
# Minkowski distance of order p=3 (p=1 reduces to Manhattan, p=2 to Euclidean)
print("************* Minkowski Distance ***************\n")
print("Distance from fixed acidity to quality: " +
      str(distance.minkowski(arrFA, arrQUA, p=3)) +
      "\nDistance from Volatile acidity to quality: " +
      str(distance.minkowski(arrVA, arrQUA, p=3)) +
      "\nDistance from citric acid to quality: " +
      str(distance.minkowski(arrCA, arrQUA, p=3)) +
      "\nDistance from residual sugar to quality: " +
      str(distance.minkowski(arrRS, arrQUA, p=3)) +
      "\nDistance from free sulfur dioxide to quality: " +
      str(distance.minkowski(arrFSD, arrQUA, p=3)) +
      "\nDistance from total sulfur dioxide to quality: " +
      str(distance.minkowski(arrTSD, arrQUA, p=3)) +
      "\nDistance from density to quality: " +
      str(distance.minkowski(arrDEN, arrQUA, p=3)) +
      "\nDistance from ph to quality: " +
      str(distance.minkowski(arrPH, arrQUA, p=3)) +
      "\nDistance from sulphates to quality: " +
      str(distance.minkowski(arrSUL, arrQUA, p=3)) +
      "\nDistance from alcohol to quality: " +
      str(distance.minkowski(arrAL, arrQUA, p=3))
      )

************* Minkowski Distance ***************

Distance from fixed acidity to quality: 32.60969666353504
Distance from Volatile acidity to quality: 97.55704634512287
Distance from citric acid to quality: 96.55817267537287
Distance from residual sugar to quality: 108.70098681802018
Distance from free sulfur dioxide to quality: 658.552876746563
Distance from total sulfur dioxide to quality: 2470.8469275121524
Distance from density to quality: 85.61290840587668
Distance from ph to quality: 50.21000091893141
Distance from sulphates to quality: 93.93997816720534
Distance from alcohol to quality: 83.46104546933545
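The Minkowski distance generalizes the other two geometric measures: p=1 gives Manhattan and p=2 gives Euclidean. A small sketch of the formula, checked against SciPy on the fixed acidity and quality values from the five df.head() rows (the function name minkowski here is a local helper, not a SciPy API):

```python
import numpy as np
from scipy.spatial import distance


def minkowski(u, v, p):
    """Minkowski distance of order p >= 1, written out from the formula."""
    return np.sum(np.abs(u - v) ** p) ** (1.0 / p)


u = np.array([7.0, 6.3, 8.1, 7.2, 7.2])  # fixed acidity (first five rows)
v = np.full(5, 6.0)                        # quality (first five rows)

for p in (1, 2, 3):
    # Compare the hand-written version with SciPy for each order
    print(p, minkowski(u, v, p), distance.minkowski(u, v, p=p))
```

Larger p gives more weight to the largest single difference, which is why the p=3 values above are smaller than the Euclidean ones.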


In [7]:
# Manhattan (city-block) distance; SciPy implements it as distance.cityblock
print("************* Manhattan Distance ***************\n")
print("Distance from fixed acidity to quality: " +
      str(distance.cityblock(arrFA, arrQUA)) +
      "\nDistance from Volatile acidity to quality: " +
      str(distance.cityblock(arrVA, arrQUA)) +
      "\nDistance from citric acid to quality: " +
      str(distance.cityblock(arrCA, arrQUA)) +
      "\nDistance from residual sugar to quality: " +
      str(distance.cityblock(arrRS, arrQUA)) +
      "\nDistance from free sulfur dioxide to quality: " +
      str(distance.cityblock(arrFSD, arrQUA)) +
      "\nDistance from total sulfur dioxide to quality: " +
      str(distance.cityblock(arrTSD, arrQUA)) +
      "\nDistance from density to quality: " +
      str(distance.cityblock(arrDEN, arrQUA)) +
      "\nDistance from ph to quality: " +
      str(distance.cityblock(arrPH, arrQUA)) +
      "\nDistance from sulphates to quality: " +
      str(distance.cityblock(arrSUL, arrQUA)) +
      "\nDistance from alcohol to quality: " +
      str(distance.cityblock(arrAL, arrQUA))
      )

************* Manhattan Distance ***************

Distance from fixed acidity to quality: 6288.25
Distance from Volatile acidity to quality: 27427.175000000003
Distance from citric acid to quality: 27153.13
Distance from residual sugar to quality: 21189.15
Distance from free sulfur dioxide to quality: 144237.0
Distance from total sulfur dioxide to quality: 648900.5
Distance from density to quality: 23921.25391
Distance from ph to quality: 13182.189999999999
Distance from sulphates to quality: 26390.73
Distance from alcohol to quality: 22708.879999978
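The Manhattan distance is the sum of absolute differences, so it grows with the number of rows. A minimal sketch on the five df.head() rows shows the formula and, as an optional extra, the per-row mean absolute difference, which stays comparable across samples of different sizes:

```python
import numpy as np
from scipy.spatial import distance

u = np.array([7.0, 6.3, 8.1, 7.2, 7.2])  # fixed acidity (first five rows)
v = np.full(5, 6.0)                        # quality (first five rows)

# Sum of absolute element-wise differences
manhattan_manual = np.sum(np.abs(u - v))
print(manhattan_manual, distance.cityblock(u, v))

# Dividing by the number of rows gives the mean absolute difference
mean_abs_diff = manhattan_manual / len(u)
print(mean_abs_diff)
```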


In [8]:
# Jaccard distance. SciPy's jaccard is intended for boolean vectors; on these
# real-valued columns the result reflects how often the two values differ,
# so it comes out at or near 1.0.
print("************* Jaccard Distance ***************\n")
print("Distance from fixed acidity to quality: " +
      str(distance.jaccard(arrFA, arrQUA)) +
      "\nDistance from Volatile acidity to quality: " +
      str(distance.jaccard(arrVA, arrQUA)) +
      "\nDistance from citric acid to quality: " +
      str(distance.jaccard(arrCA, arrQUA)) +
      "\nDistance from residual sugar to quality: " +
      str(distance.jaccard(arrRS, arrQUA)) +
      "\nDistance from free sulfur dioxide to quality: " +
      str(distance.jaccard(arrFSD, arrQUA)) +
      "\nDistance from total sulfur dioxide to quality: " +
      str(distance.jaccard(arrTSD, arrQUA)) +
      "\nDistance from density to quality: " +
      str(distance.jaccard(arrDEN, arrQUA)) +
      "\nDistance from ph to quality: " +
      str(distance.jaccard(arrPH, arrQUA)) +
      "\nDistance from sulphates to quality: " +
      str(distance.jaccard(arrSUL, arrQUA)) +
      "\nDistance from alcohol to quality: " +
      str(distance.jaccard(arrAL, arrQUA))
      )

************* Jaccard Distance ***************

Distance from fixed acidity to quality: 0.972029399755002
Distance from Volatile acidity to quality: 1.0
Distance from citric acid to quality: 1.0
Distance from residual sugar to quality: 0.995304205798285
Distance from free sulfur dioxide to quality: 0.9959167006941608
Distance from total sulfur dioxide to quality: 1.0
Distance from density to quality: 1.0
Distance from ph to quality: 1.0
Distance from sulphates to quality: 1.0
Distance from alcohol to quality: 1.0
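Jaccard is meant for binary (presence/absence) data, which helps explain why most values above are exactly 1.0 on raw continuous columns. One way to apply it to two features is to binarize them first. The sketch below assumes a hypothetical rule, "at or above the column median", and uses fixed acidity and residual sugar from the five df.head() rows; the threshold is an illustrative choice, not part of the original exercise.

```python
import numpy as np
from scipy.spatial import distance

fa = np.array([7.0, 6.3, 8.1, 7.2, 7.2])   # fixed acidity (first five rows)
rs = np.array([20.7, 1.6, 6.9, 8.5, 8.5])  # residual sugar (first five rows)

# Hypothetical binarization: True where the value is at or above the column median
fa_bin = fa >= np.median(fa)  # [F, F, T, T, T]
rs_bin = rs >= np.median(rs)  # [T, F, F, T, T]

n11 = np.sum(fa_bin & rs_bin)
n10 = np.sum(fa_bin & ~rs_bin)
n01 = np.sum(~fa_bin & rs_bin)

# Jaccard distance ignores positions where both are False (n00)
jaccard_manual = (n01 + n10) / (n01 + n10 + n11)
print(jaccard_manual, distance.jaccard(fa_bin, rs_bin))
```

Here two rows are True in both features, one is True only for fixed acidity and one only for residual sugar, giving 2 / 4 = 0.5.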


In [2]:
# Hamming distance: fraction of positions where the two columns differ
# (run this after the In [4] cell that defines the arrays)
print("*************Hamming Distance ***************\n")
print("Distance from fixed acidity to quality: " +
      str(distance.hamming(arrFA, arrQUA)) +
      "\nDistance from Volatile acidity to quality: " +
      str(distance.hamming(arrVA, arrQUA)) +
      "\nDistance from citric acid to quality: " +
      str(distance.hamming(arrCA, arrQUA)) +
      "\nDistance from residual sugar to quality: " +
      str(distance.hamming(arrRS, arrQUA)) +
      "\nDistance from free sulfur dioxide to quality: " +
      str(distance.hamming(arrFSD, arrQUA)) +
      "\nDistance from total sulfur dioxide to quality: " +
      str(distance.hamming(arrTSD, arrQUA)) +
      "\nDistance from density to quality: " +
      str(distance.hamming(arrDEN, arrQUA)) +
      "\nDistance from ph to quality: " +
      str(distance.hamming(arrPH, arrQUA)) +
      "\nDistance from sulphates to quality: " +
      str(distance.hamming(arrSUL, arrQUA)) +
      "\nDistance from alcohol to quality: " +
      str(distance.hamming(arrAL, arrQUA))
      )

*************Hamming Distance ***************



NameError: name 'arrFA' is not defined
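The NameError above appears to come from execution order: this cell ran as In [2], before the In [4] cell that defines arrFA, so re-running it after In [4] should work. For what it computes, scipy's distance.hamming returns the fraction of positions where two vectors differ. The sketch below computes that directly, once on the fixed acidity and quality values from the five df.head() rows and once on the binary vectors F1 and F2 from the Simple Matching Coefficient cell.

```python
import numpy as np
from scipy.spatial import distance

u = np.array([7.0, 6.3, 8.1, 7.2, 7.2])  # fixed acidity (first five rows)
v = np.full(5, 6.0)                        # quality (first five rows)

# Fraction of positions where the values are not equal
hamming_manual = np.mean(u != v)
print(hamming_manual, distance.hamming(u, v))  # every row differs here

F1 = np.array([0, 1, 1, 0, 1, 0, 1, 0])
F2 = np.array([1, 1, 0, 0, 1, 0, 0, 0])
print(np.mean(F1 != F2), distance.hamming(F1, F2))  # 3 of 8 positions differ
```

On continuous columns almost every position differs, so Hamming is mostly informative for binary or categorical features.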

In [27]:
# Cosine distance = 1 - cosine similarity between the two column vectors
print("*************Cosine Distance ***************\n")
print("Distance from fixed acidity to quality: " +
      str(distance.cosine(arrFA, arrQUA)) +
      "\nDistance from Volatile acidity to quality: " +
      str(distance.cosine(arrVA, arrQUA)) +
      "\nDistance from citric acid to quality: " +
      str(distance.cosine(arrCA, arrQUA)) +
      "\nDistance from residual sugar to quality: " +
      str(distance.cosine(arrRS, arrQUA)) +
      "\nDistance from free sulfur dioxide to quality: " +
      str(distance.cosine(arrFSD, arrQUA)) +
      "\nDistance from total sulfur dioxide to quality: " +
      str(distance.cosine(arrTSD, arrQUA)) +
      "\nDistance from density to quality: " +
      str(distance.cosine(arrDEN, arrQUA)) +
      "\nDistance from ph to quality: " +
      str(distance.cosine(arrPH, arrQUA)) +
      "\nDistance from sulphates to quality: " +
      str(distance.cosine(arrSUL, arrQUA)) +
      "\nDistance from alcohol to quality: " +
      str(distance.cosine(arrAL, arrQUA))
      )

*************Cosine Distance ***************

Distance from fixed acidity to quality: 0.020635321994339595
Distance from Volatile acidity to quality: 0.08015090462655694
Distance from citric acid to quality: 0.07070011207714255
Distance from residual sugar to quality: 0.23442792828245507
Distance from free sulfur dioxide to quality: 0.10857678783241331
Distance from total sulfur dioxide to quality: 0.062378101325087365
Distance from density to quality: 0.011301285002584871
Distance from ph to quality: 0.011565430168914204
Distance from sulphates to quality: 0.03513197697102499
Distance from alcohol to quality: 0.01031941337184239
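The notebook imports sklearn's cosine_similarity but never uses it. A minimal sketch shows the cosine formula by hand and confirms it agrees with both SciPy's distance.cosine and 1 minus sklearn's cosine_similarity, on the fixed acidity and quality values from the five df.head() rows:

```python
import numpy as np
from scipy.spatial import distance
from sklearn.metrics.pairwise import cosine_similarity

u = np.array([7.0, 6.3, 8.1, 7.2, 7.2])  # fixed acidity (first five rows)
v = np.full(5, 6.0)                        # quality (first five rows)

# Cosine similarity: dot product divided by the product of the L2 norms
cos_sim = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
cos_dist = 1.0 - cos_sim

# sklearn expects 2-D inputs of shape (n_samples, n_features),
# so each column is reshaped into a single sample row
sk_sim = cosine_similarity(u.reshape(1, -1), v.reshape(1, -1))[0, 0]
print(cos_dist, distance.cosine(u, v), 1.0 - sk_sim)
```

Because cosine distance depends only on direction, multiplying a column by a positive constant leaves it unchanged, unlike the Euclidean and Manhattan results.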


In [32]:
print("*************** Simple Matching Coefficient")
# Counts for two example binary vectors:
# n11 = both 1, n00 = both 0, n10 = F1 is 1 and F2 is 0, n01 = F1 is 0 and F2 is 1
n00 = 0
n10 = 0
n01 = 0
n11 = 0

F1 = (0, 1, 1, 0, 1, 0, 1, 0)
F2 = (1, 1, 0, 0, 1, 0, 0, 0)

for a, b in zip(F1, F2):
    if a == 0 and b == 0:
        n00 += 1
    elif a == 1 and b == 0:
        n10 += 1
    elif a == 0 and b == 1:
        n01 += 1
    elif a == 1 and b == 1:
        n11 += 1
print("n01", n01)
print("n10", n10)
print("n11", n11)
print("n00", n00)

# SMC = matching positions / all positions
smc = (n11 + n00) / (n11 + n00 + n01 + n10)
print("smc : ", smc)

*************** Simple Matching Coefficient
n01 1
n10 2
n11 2
n00 3
smc :  0.625
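The counting loop above can be vectorized: SMC is simply the share of positions where two binary vectors agree. Since Hamming on binary data is the share where they disagree, SMC equals 1 minus scipy's distance.hamming. A minimal sketch (the function name smc is a local helper) reproducing the 0.625 above:

```python
import numpy as np
from scipy.spatial import distance


def smc(a, b):
    """Simple Matching Coefficient: share of positions where two binary vectors agree."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    return np.mean(a == b)  # (n11 + n00) / n


F1 = (0, 1, 1, 0, 1, 0, 1, 0)
F2 = (1, 1, 0, 0, 1, 0, 0, 0)
print(smc(F1, F2))                   # 5 of 8 positions agree
print(1 - distance.hamming(F1, F2))  # same value for binary data
```

The same function applies to two dataset columns once they are binarized, for example with a threshold you choose.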
