# Assignment 2 - Data engineering and data models


### Authors
* Jordi Mellado Romagosa 
* Jordi Adan Domínguez


## Enginnering
Before building a model and testing it, we need to reduce the amount of data that we handle. To achieve this, we use two different methods and we later build a model and test it with the two methods:


- **Reduction by rows:** for each row of readings, we reduce the 256 readings into 10 values (Min, Max, Sd, Mean, Median, IQR, Q(0.025), Q(0.25), Q(0.75) and Q(0.975)).
- **Reduction by columns:** for each experiment and channel, we build the 256 means of each interval of time, converting all the repetitions into a single row with 256 readings.


### Feature Engineering - Row

We calculate the extra values for each row and then delete the 256 readings.

In [1]:
import pandas as pd


def engineering_rows(filename):
    column_names = ["Id", "Alcoholic", "Paradigm", "Replication", "Channel"]
    for reading in range(256):
        column_names.append("Reading " + str(reading+1))
    data = pd.read_csv(filename, sep=" ", header=None, names=column_names)

    reading_columns = []
    for reading in range(256):
        reading_columns.append("Reading " + str(reading + 1))
    data["Min"] = data[reading_columns].min(axis=1)
    data["Max"] = data[reading_columns].max(axis=1)
    data["Std"] = data[reading_columns].std(axis=1)
    data["Mean"] = data[reading_columns].mean(axis=1)
    data["Median"] = data[reading_columns].median(axis=1)
    data["Quantile025"] = data[reading_columns].quantile(0.025, axis=1)
    data["Quantile25"] = data[reading_columns].quantile(0.25, axis=1)
    data["Quantile75"] = data[reading_columns].quantile(0.75, axis=1)
    data["Quantile975"] = data[reading_columns].quantile(0.975, axis=1)
    data["IQR"] = data["Quantile75"] - data["Quantile25"]

    return data.drop(columns=reading_columns)


data_rows = engineering_rows("results/co2a0000364.txt")
data_rows

Unnamed: 0,Id,Alcoholic,Paradigm,Replication,Channel,Min,Max,Std,Mean,Median,Quantile025,Quantile25,Quantile75,Quantile975,IQR
0,co2a0000364,a,S1obj,0,FP1chan0,-13.316,19.887,6.707825,4.113535,4.7510,-8.738000,-0.62100,8.16900,16.958000,8.79000
1,co2a0000364,a,S1obj,0,FP2chan1,-14.303,24.760,7.977130,3.819156,4.7400,-12.471625,-1.60700,8.64700,20.854000,10.25400
2,co2a0000364,a,S1obj,0,F7chan2,-30.589,31.423,9.714502,5.696625,5.7880,-12.827625,-0.31500,12.86800,22.939000,13.18300
3,co2a0000364,a,S1obj,0,F8chan3,-19.684,27.679,10.524552,2.187758,0.8240,-15.594625,-6.50000,10.10100,20.965625,16.60100
4,co2a0000364,a,S1obj,0,AF1chan4,-8.494,15.432,4.672660,3.346762,3.7130,-5.869625,0.17300,6.76500,11.525000,6.59200
5,co2a0000364,a,S1obj,0,AF2chan5,-11.078,14.801,4.875914,3.770840,4.0590,-7.110625,0.64100,6.98900,12.360000,6.34800
6,co2a0000364,a,S1obj,0,FZchan6,-6.419,9.694,3.341052,2.316797,2.3700,-3.977000,-0.07100,4.81200,8.535000,4.88300
7,co2a0000364,a,S1obj,0,F4chan7,-7.823,14.638,4.621874,2.215707,1.9430,-5.870000,-0.98700,5.36100,10.732000,6.34800
8,co2a0000364,a,S1obj,0,F3chan8,-7.416,17.487,4.747063,4.911406,5.5240,-3.510000,0.88500,7.84300,15.045000,6.95800
9,co2a0000364,a,S1obj,0,FC6chan9,-10.305,15.086,5.054547,1.816441,1.4140,-7.375000,-1.51600,5.32000,12.644000,6.83600


Using the reduction by rows, we mantain the same amount of rows but each row now only has 15 columns.

Then we save the data into another file, appending the data instead of overriting.

In [2]:
import os

filepath = "engineered/test_alcoholic_rows.csv"

if not os.path.exists("engineered"):
    os.makedirs("engineered")


data_rows.to_csv(filepath, header=False, index=False, mode="a")

### Feature Engineering - Columns

We group the readings using the "Id", "Alcoholic", "Paradigm" and "Channel" and make the mean of all the readings.


In [3]:
import pandas as pd


def engineering_columns(filename):
    column_names = ["Id", "Alcoholic", "Paradigm", "Replication", "Channel"]
    for reading in range(256):
        column_names.append("Reading " + str(reading+1))
    data = pd.read_csv(filename, sep=" ", header=None, names=column_names)

    reading_columns = []
    for reading in range(256):
        reading_columns.append("Reading " + str(reading + 1))
    mean_table = data.groupby(["Id", "Alcoholic", "Paradigm", "Channel"])[reading_columns].mean()

    return mean_table.reset_index()


data_columns = engineering_columns("results/co2a0000364.txt")
data_columns

Unnamed: 0,Id,Alcoholic,Paradigm,Channel,Reading 1,Reading 2,Reading 3,Reading 4,Reading 5,Reading 6,...,Reading 247,Reading 248,Reading 249,Reading 250,Reading 251,Reading 252,Reading 253,Reading 254,Reading 255,Reading 256
0,co2a0000364,a,S1obj,AF1chan4,-0.671778,-0.611457,-0.557111,-0.569358,-0.484889,-0.448716,...,4.000099,3.867420,3.993975,4.036160,3.692679,3.276568,2.963185,2.981346,3.385173,3.752877
1,co2a0000364,a,S1obj,AF2chan5,-0.216185,-0.258358,-0.427086,-0.372864,-0.077519,0.127494,...,3.111370,2.755704,2.773790,2.990815,3.189790,3.135519,2.954704,2.858247,3.045025,3.412877
2,co2a0000364,a,S1obj,AF7chan32,-2.698914,-1.469185,0.098049,1.110852,1.038506,0.152358,...,7.512741,7.440358,7.440407,7.235444,6.716926,6.035889,5.505346,5.649951,6.144247,6.632617
3,co2a0000364,a,S1obj,AF8chan33,-0.272395,-1.405654,-1.580457,-0.694370,0.444914,0.577556,...,7.847506,6.430901,4.278951,2.693457,2.223210,2.934654,4.122123,5.381988,6.352654,6.804667
4,co2a0000364,a,S1obj,AFZchan47,-0.453519,-0.459543,-0.513815,-0.344914,-0.133988,-0.043519,...,3.531037,3.314049,3.338321,3.422716,3.416543,3.308074,3.103086,2.940346,3.024728,3.229667
5,co2a0000364,a,S1obj,C1chan52,0.751173,0.082037,0.202642,0.292988,0.968136,0.220605,...,-0.701667,-0.376074,-0.400185,-0.466580,-1.828963,-0.008309,-0.496667,-0.321889,-0.520827,0.057864
6,co2a0000364,a,S1obj,C2chan53,0.399321,-0.348173,-0.161247,-0.342136,0.091926,0.091963,...,0.315037,0.236605,0.146136,0.152210,-0.884679,-0.113074,0.327025,-0.058778,0.809210,-0.185370
7,co2a0000364,a,S1obj,C3chan16,0.727037,-0.086716,0.443704,-0.231469,1.010321,-0.894580,...,-1.804778,-0.755914,-0.490630,0.033728,-0.719667,-0.147074,-1.177852,-0.538988,-1.491370,0.311074
8,co2a0000364,a,S1obj,C4chan17,-0.293753,-0.956852,-0.504790,-0.993000,-0.402222,0.230691,...,0.532160,0.688877,-0.046679,-0.360062,1.562901,-0.745827,1.454407,0.134321,1.575012,-0.872457
9,co2a0000364,a,S1obj,C5chan42,0.182074,0.121914,0.224309,0.495654,0.616148,0.242358,...,-2.343679,-1.939864,-1.041580,-0.493086,-0.872802,-1.668531,-2.126691,-1.837321,-1.138136,-0.824580


Using the reduction by columns, we greatly reduce the amount of rows. But we mantain the same number of columns, excluding the one that counted the repetition of the experiment.

Then we save the data into another file, appending the data instead of overriting.

In [4]:
filepath = "engineered/test_alcoholic_columns.csv"

if not os.path.exists("engineered"):
    os.makedirs("engineered")


data_columns.to_csv(filepath, header=False, index=False, mode="a")

## Further data preparation

After the data reduction, we find that we still need to change some things about the files for building the models.
We prepare some functions for this, creating the necessary files.

### General functions

In [6]:
def read_file(filename):
    data = pd.read_csv(filename, header=None)
    return data.drop_duplicates()


def write_file(filename, data):
    data.to_csv("model_files/" + filename + ".csv", index=False, mode="w")

### Column data

In [7]:
def prepare_columns_train():
    data = read_file("engineered/train_alcoholic_columns.csv")
    data = data.append(read_file("engineered/train_non_alcoholic_columns.csv"))

    column_names = ["Id", "Alcoholic", "Paradigm", "Channel"]
    for reading in range(256):
        column_names.append("Reading " + str(reading+1))
    data.columns = column_names
    data["Alcoholic"] = data["Alcoholic"] == "a"

    write_file("train_1_columns", data)
def prepare_columns_test():
    data = read_file("engineered/test_alcoholic_columns.csv")
    data = data.append(read_file("engineered/test_non_alcoholic_columns.csv"))

    column_names = ["Id", "Alcoholic", "Paradigm", "Channel"]
    for reading in range(256):
        column_names.append("Reading " + str(reading+1))
    data.columns = column_names
    data["Alcoholic"] = data["Alcoholic"] == "a"

    write_file("test_1_columns", data)
prepare_columns_test()
prepare_columns_train()

### Row data

In [None]:
def prepare_rows_train():
    data = read_file("engineered/train_alcoholic_rows.csv")
    data = data.append(read_file("engineered/train_non_alcoholic_rows.csv"))

    column_names = ["Id", "Alcoholic", "Paradigm", "Replication", "Channel", 'Min', 'Max', 'Sd', 'Mean', 'Median', 'IQR', 'Q(0.025)',
                    'Q(0.25)', 'Q(0.75)', 'Q(0.975)']
    data.columns = column_names
    data["Alcoholic"] = data["Alcoholic"] == "a"

    write_file("train_1_rows", data)
def prepare_rows_test():
    data = read_file("engineered/test_alcoholic_rows.csv")
    data = data.append(read_file("engineered/test_non_alcoholic_rows.csv"))

    column_names = ["Id", "Alcoholic", "Paradigm", "Replication", "Channel", 'Min', 'Max', 'Sd', 'Mean', 'Median', 'IQR', 'Q(0.025)',
                    'Q(0.25)', 'Q(0.75)', 'Q(0.975)']
    data.columns = column_names
    data["Alcoholic"] = data["Alcoholic"] == "a"

    write_file("test_1_rows", data)
prepare_rows_test()
prepare_rows_train()

## Reading Data

In [None]:
def read_data(filename):
    # read data
    return pd.read_csv(filepath_or_buffer='model_files/' + filename + '.csv',
                       sep=",",
                       low_memory=False)


## PCA

### PCA (with row reduction)

columns = ['Min', 'Max', 'Sd', 'Mean', 'Median', 'IQR', 'Q(0.025)', 'Q(0.25)', 'Q(0.75)', 'Q(0.975)']  
rows = Repetitions

In [54]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

def pca_row_reduction(filename):
    
    # read data
    df = read_data(filename)
    
    # drop String type columns
    df.drop(df.columns[[0, 2, 4]], axis=1, inplace=True)
    
    # split data table into data X and class labels y
    X = df.iloc[:, 1:].values
    y = df.iloc[:, 0].values
    
    # standardizing data
    X_std = StandardScaler().fit_transform(X)
    
    sklearn_pca = PCA(svd_solver='auto')
    Y_sklearn = sklearn_pca.fit_transform(X_std)


### PCA (with column reduction)

columns = Sensor value records(255)  
rows = Average of channels

In [55]:
def pca_column_reduction(filename):
    
    # read data
    df = read_data(filename)
    print(df)
    
    # drop String type columns
    df.drop(df.columns[[0, 2, 3]], axis=1, inplace=True)

    # split data table into data X and class labels y
    X = df.iloc[:, 4:].values
    y = df.iloc[:, 1].values
    
    # standardizing data
    X_std = StandardScaler().fit_transform(X)
    
    sklearn_pca = PCA(svd_solver='auto')
    Y_sklearn = sklearn_pca.fit_transform(X_std)

In [56]:
pca_row_reduction('train_1_rows')

In [57]:
pca_column_reduction('train_1_columns')

               Id  Alcoholic   Paradigm    Channel  Reading 1  Reading 2  \
0     co2a0000364       True      S1obj   AF1chan4  -0.653350  -0.592275   
1     co2a0000364       True      S1obj   AF2chan5  -0.233000  -0.269600   
2     co2a0000364       True      S1obj  AF7chan32  -2.521950  -1.398925   
3     co2a0000364       True      S1obj  AF8chan33  -0.150550  -1.334600   
4     co2a0000364       True      S1obj  AFZchan47  -0.446850  -0.446850   
5     co2a0000364       True      S1obj   C1chan52   0.774675   0.103275   
6     co2a0000364       True      S1obj   C2chan53   0.369975  -0.338025   
7     co2a0000364       True      S1obj   C3chan16   0.770075  -0.047750   
8     co2a0000364       True      S1obj   C4chan17  -0.285850  -0.932825   
9     co2a0000364       True      S1obj   C5chan42   0.291675   0.267375   
10    co2a0000364       True      S1obj   C6chan41   0.527750   0.271375   
11    co2a0000364       True      S1obj  CP1chan20   0.101225   0.247700   
12    co2a00

## Multidimensional Scaling (MDS)

## Training and Testing (rows)

In [58]:
def model_rows():
    # Training set
    train = read_data('train_1_rows')

    #train.drop(train.columns[[0, 2, 4]], axis=1, inplace=True)
    train.drop(train.columns[[0]], axis=1, inplace=True)

    X_train = train.iloc[:, 1:]
    X_train = pd.get_dummies(X_train)
    y_train = train["Alcoholic"]
    
    # Show Regression Analysis Results 
    #implementing_model(X_train, y_train)
    
    # Test set
    test = read_data('test_1_rows')
    
    #test.drop(test.columns[[0, 2, 4]], axis=1, inplace=True)
    test.drop(test.columns[[0]], axis=1, inplace=True)
    
    X_test = test.iloc[:, 1:]
    X_test = pd.get_dummies(X_test)
    y_test = test["Alcoholic"]
    
    #logistic_regression_model_fitting(X_train, X_test, y_train, y_test)
    return X_train, X_test, y_train, y_test

In [59]:
X_train, X_test, y_train, y_test = model_rows()

In [60]:
import statsmodels.api as sm
from scipy import stats
stats.chisqprob = lambda chisq, df: stats.chi2.sf(chisq, df)

def implementing_model_results(X, y):
    logit_model = sm.Logit(y,X)
    result = logit_model.fit(warnings=False)
    print(result.summary())

In [61]:
implementing_model_results(X_train, y_train)

         Current function value: 0.605271
         Iterations: 35




                           Logit Regression Results                           
Dep. Variable:              Alcoholic   No. Observations:               192896
Model:                          Logit   Df Residuals:                   192818
Method:                           MLE   Df Model:                           77
Date:                Wed, 21 Mar 2018   Pseudo R-squ.:                 0.03504
Time:                        19:24:28   Log-Likelihood:            -1.1675e+05
converged:                      False   LL-Null:                   -1.2099e+05
                                        LLR p-value:                     0.000
                            coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------------
Replication              -0.0030      0.000    -20.264      0.000      -0.003      -0.003
Min                       0.0010      0.002      0.425      0.671      -0.004       0.006
Max     

So now we can consider that P-values greater than 0.05 are insignificant. The best labels are:
* Replication
* Min
* Max
* Mean
* IQR
* Q(0.075)

In [62]:
from sklearn import linear_model

def logistic_regression_model_fitting(X_train, X_test, y_train, y_test):
    # X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    logreg = linear_model.LogisticRegression()

    logreg.fit(X_train, y_train)
    
    y_pred = logreg.predict(X_test)
    print('Accuracy of logistic regression classifier on test set: {:.2f}'.format(logreg.score(X_test, y_test)))
    return y_pred;

In [63]:
y_pred = logistic_regression_model_fitting(X_train, X_test, y_train, y_test)

Accuracy of logistic regression classifier on test set: 0.62


### Cross Validation

In [64]:
from sklearn import model_selection
#from sklearn.model_selection import cross_val_score

kfold = model_selection.KFold(n_splits=10, random_state=7)
modelCV = linear_model.LogisticRegression()
scoring = 'accuracy'
results = model_selection.cross_val_score(modelCV, X_train, y_train, cv=kfold, scoring=scoring)
print("10-fold cross validation average accuracy: %.3f" % (results.mean()))

10-fold cross validation average accuracy: 0.648


### Confusion Matrix

In [65]:
from sklearn.metrics import confusion_matrix

confusion_matrix = confusion_matrix(y_test, y_pred)
print(confusion_matrix)

[[13674 39702]
 [ 5483 58581]]


The result is telling us that we have 13651 + 58594 correct predictions and 5470 + 39725 incorrect predictions.

### Compute precision, recall, F-measure and support


In [66]:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))

             precision    recall  f1-score   support

      False       0.71      0.26      0.38     53376
       True       0.60      0.91      0.72     64064

avg / total       0.65      0.62      0.57    117440



## Training and Testing (columns)

In [67]:
def model_columns():
    # Training set
    train = read_data('train_1_columns')

    train.drop(train.columns[[0, 2, 3]], axis=1, inplace=True)
    #train.drop(train.columns[[0]], axis=1, inplace=True)

    X_train = train.iloc[:, 1:]
    #X_train = pd.get_dummies(X_train)
    y_train = train["Alcoholic"]
    
    # Show Regression Analysis Results 
    #implementing_model(X_train, y_train)
    
    # Test set
    test = read_data('test_1_columns')
    
    test.drop(test.columns[[0, 2, 3]], axis=1, inplace=True)
    #test.drop(test.columns[[0]], axis=1, inplace=True)
    
    X_test = test.iloc[:, 1:]
    #X_test = pd.get_dummies(X_test)
    y_test = test["Alcoholic"]
    
    #logistic_regression_model_fitting(X_train, X_test, y_train, y_test)
    return X_train, X_test, y_train, y_test

In [68]:
X_train, X_test, y_train, y_test = model_columns()

In [69]:
print(X_train)

      Reading 1  Reading 2  Reading 3  Reading 4  Reading 5  Reading 6  \
0     -0.653350  -0.592275  -0.543350  -0.567950  -0.519050  -0.519050   
1     -0.233000  -0.269600  -0.428225  -0.367225  -0.086500   0.084450   
2     -2.521950  -1.398925   0.004800   0.883775   0.761700  -0.080600   
3     -0.150550  -1.334600  -1.615350  -0.834150   0.276650   0.447575   
4     -0.446850  -0.446850  -0.507900  -0.349100  -0.166050  -0.104975   
5      0.774675   0.103275   0.200975   0.286350   0.945525   0.139825   
6      0.369975  -0.338025  -0.167075  -0.350225   0.064850   0.125925   
7      0.770075  -0.047750   0.428275  -0.230925   0.965350  -1.109850   
8     -0.285850  -0.932825  -0.542250  -1.042675  -0.481125   0.214625   
9      0.291675   0.267375   0.364950   0.584750   0.609100   0.120775   
10     0.527750   0.271375   0.002825  -0.229100  -0.424475  -0.509950   
11     0.101225   0.247700   0.357500   0.418550   0.333125   0.247650   
12     0.306475   0.294325   0.343075 

In [70]:
print(X_test)

      Reading 1  Reading 2  Reading 3  Reading 4  Reading 5  Reading 6  \
0     -0.291074  -0.526130  -0.381537  -0.083019   0.215259   0.369093   
1     -0.528593  -0.619037  -0.148815   0.339444   0.484185   0.438907   
2      0.989667  -0.086315  -0.538370  -0.267111   0.374852   0.818037   
3     -0.942648  -1.313389  -0.915519  -0.119815   0.449796   0.730167   
4     -0.306519  -0.487333  -0.243241   0.037148   0.190889   0.208907   
5     -0.234407  -0.478500  -0.505630  -0.451315  -0.324741  -0.107722   
6      0.354537   0.463056   0.417852   0.200870   0.047056  -0.052352   
7     -0.065370  -0.508407  -0.725481  -0.698389  -0.680167  -0.463222   
8      0.762556   1.115204   0.988593   0.527444   0.066278  -0.087407   
9     -0.004907  -0.339444  -0.212796  -0.122500  -0.086241  -0.104389   
10     1.534333   2.827444   2.935907   2.076870   1.091352   0.575852   
11    -0.619648  -0.872741  -0.818519  -0.628630  -0.330204  -0.013778   
12     0.281796   0.281815   0.263741 

In [71]:
print(y_train)

0        True
1        True
2        True
3        True
4        True
5        True
6        True
7        True
8        True
9        True
10       True
11       True
12       True
13       True
14       True
15       True
16       True
17       True
18       True
19       True
20       True
21       True
22       True
23       True
24       True
25       True
26       True
27       True
28       True
29       True
        ...  
4514    False
4515    False
4516    False
4517    False
4518    False
4519    False
4520    False
4521    False
4522    False
4523    False
4524    False
4525    False
4526    False
4527    False
4528    False
4529    False
4530    False
4531    False
4532    False
4533    False
4534    False
4535    False
4536    False
4537    False
4538    False
4539    False
4540    False
4541    False
4542    False
4543    False
Name: Alcoholic, Length: 4544, dtype: bool


In [72]:
print(y_test)

0        True
1        True
2        True
3        True
4        True
5        True
6        True
7        True
8        True
9        True
10       True
11       True
12       True
13       True
14       True
15       True
16       True
17       True
18       True
19       True
20       True
21       True
22       True
23       True
24       True
25       True
26       True
27       True
28       True
29       True
        ...  
4962    False
4963    False
4964    False
4965    False
4966    False
4967    False
4968    False
4969    False
4970    False
4971    False
4972    False
4973    False
4974    False
4975    False
4976    False
4977    False
4978    False
4979    False
4980    False
4981    False
4982    False
4983    False
4984    False
4985    False
4986    False
4987    False
4988    False
4989    False
4990    False
4991    False
Name: Alcoholic, Length: 4992, dtype: bool


In [73]:
implementing_model_results(X_train, y_train)

Optimization terminated successfully.
         Current function value: 0.470196
         Iterations 10
                           Logit Regression Results                           
Dep. Variable:              Alcoholic   No. Observations:                 4544
Model:                          Logit   Df Residuals:                     4288
Method:                           MLE   Df Model:                          255
Date:                Wed, 21 Mar 2018   Pseudo R-squ.:                  0.3208
Time:                        19:24:51   Log-Likelihood:                -2136.6
converged:                       True   LL-Null:                       -3145.6
                                        LLR p-value:                2.562e-271
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
Reading 1       0.4185      0.200      2.096      0.036       0.027       0.810
Reading 2      -0.8235   

In [74]:
y_pred = logistic_regression_model_fitting(X_train, X_test, y_train, y_test)

Accuracy of logistic regression classifier on test set: 0.57


### Cross Validation

In [75]:
kfold = model_selection.KFold(n_splits=10, random_state=7)
modelCV = linear_model.LogisticRegression()
scoring = 'accuracy'
results = model_selection.cross_val_score(modelCV, X_train, y_train, cv=kfold, scoring=scoring)
print("10-fold cross validation average accuracy: %.3f" % (results.mean()))

10-fold cross validation average accuracy: 0.426


### Confusion Matrix

In [76]:
from sklearn.metrics import confusion_matrix

confusion_matrix = confusion_matrix(y_test, y_pred)
print(confusion_matrix)

[[1126 1114]
 [1026 1726]]


### Compute precision, recall, F-measure and support

In [77]:
print(classification_report(y_test, y_pred))

             precision    recall  f1-score   support

      False       0.52      0.50      0.51      2240
       True       0.61      0.63      0.62      2752

avg / total       0.57      0.57      0.57      4992



## Conclusions (Both models)

The models are not really good, but they are still better than guessing randomly if the user is alcoholic or it is not.

The row reduction is slightly better than the column reduction, but they are very close (accuracy of 0.62 vs 0.57).