In lecture we saw how to estimate a regression weight for each input variable. In this lab we will apply the same procedure to two examples: predicting wine quality and estimating voxel responses.

# Predicting wine quality

First, we will work with a dataset from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Wine+Quality

Reference:
Paulo Cortez, University of Minho, Guimarães, Portugal, http://www3.dsi.uminho.pt/pcortez
A. Cerdeira, F. Almeida, T. Matos and J. Reis, Viticulture Commission of the Vinho Verde Region (CVRVV), Porto, Portugal
2009

We have 11 attributes (e.g. acidity, alcohol content) for more than 1500 red wines, as well as a quality rating for each wine. We want to build a model that predicts the quality of a wine as a function of its other attributes.


In [None]:
import numpy as np
import cortex
import os
import neurods as nds
from scipy.stats import zscore
import matplotlib.pyplot as plt
from numpy.linalg import inv
%matplotlib inline

The code below loads the dataset and creates two variables: wine_quality, the output we want to predict, and wine_features, the 11 input features of each wine.

In [None]:
import csv
data_name = '/home/shared/cogneuro-connector/data/Week10_MultipleRegression/winequality-red.txt'
with open(data_name) as f:
    reader = csv.reader(f, delimiter="\t")
    d = list(reader)
input_features_names = d[0][:-1]
all_values = np.array(d[1:]).astype(float)
wine_quality = all_values[:,-1]
wine_features = all_values[:,:-1]
print('the input shape is {}'.format(wine_features.shape))
print('the output shape is {}'.format(wine_quality.shape))

The 11 features of a wine are:

In [None]:
print(input_features_names)

This is what they look like for the 1599 wines we have:

In [None]:
plt.plot(wine_features);
plt.xlabel('different wines');
plt.title('wine features')
plt.legend(input_features_names,frameon=False, bbox_to_anchor=(1.5, 1));

The features naturally have different scales. We will therefore normalize (z-score) each feature, and normalize the output as well:

In [None]:
# normalize
X = zscore(wine_features, axis = 0)
Y = zscore(wine_quality)

# replot
plt.plot(X);
plt.xlabel('different wines');
plt.ylabel('normalized scale');
plt.title('normalized wine features')
plt.legend(input_features_names, frameon=False, bbox_to_anchor=(1.5, 1));
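To make the normalization step concrete, here is a minimal sketch on synthetic data (the array `toy_features` is a made-up stand-in, not the wine data). It shows that z-scoring with `axis=0` amounts to subtracting each column's mean and dividing by each column's standard deviation, so every feature ends up with mean 0 and standard deviation 1.

```python
import numpy as np
from scipy.stats import zscore

# Synthetic stand-in for wine_features: 3 columns on very different scales.
rng = np.random.default_rng(0)
toy_features = rng.normal(loc=[10.0, 0.5, 100.0],
                          scale=[2.0, 0.1, 30.0],
                          size=(200, 3))

# Z-scoring by hand: subtract each column's mean, divide by its standard deviation.
manual = (toy_features - toy_features.mean(axis=0)) / toy_features.std(axis=0)

# scipy's zscore does the same thing column by column when axis=0.
auto = zscore(toy_features, axis=0)

print(np.allclose(manual, auto))
print(np.allclose(auto.mean(axis=0), 0), np.allclose(auto.std(axis=0), 1))
```

Putting all features on a common scale is what makes their regression weights comparable to each other later on.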

#### Q1: Using regression, find the name of the feature that seems to most affect the quality of the wine:

In [None]:
### STUDENT ANSWER:
def OLS(X,Y):
    return np.dot(inv(np.dot(X.T,X)), np.dot(X.T,Y))
weights = OLS(X,Y)
plt.plot(weights);
plt.xticks(range(11),input_features_names,rotation=70)
plt.xlabel('wine features')
plt.ylabel('weight magnitude')
# compare absolute values so that a strong negative weight also counts
num_max = np.argmax(np.abs(weights))
print('the feature that is the most predictive is {}'.format(input_features_names[num_max]))
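As a sanity check on the normal-equation estimate, the sketch below runs the same `OLS` function on synthetic data with known weights and compares it to `np.linalg.lstsq`. The names `toy_X`, `toy_Y` and `true_w` are made up for illustration. It also shows why the strongest feature is best picked by absolute value: a large negative weight affects the output as much as a large positive one.

```python
import numpy as np
from numpy.linalg import inv

def OLS(X, Y):
    # Normal equations: w = (X^T X)^{-1} X^T Y
    return np.dot(inv(np.dot(X.T, X)), np.dot(X.T, Y))

# Synthetic data generated from known weights plus a little noise.
rng = np.random.default_rng(0)
n_samples, n_features = 500, 4
toy_X = rng.normal(size=(n_samples, n_features))
true_w = np.array([0.5, -2.0, 0.0, 1.0])
toy_Y = toy_X @ true_w + 0.1 * rng.normal(size=n_samples)

w_normal = OLS(toy_X, toy_Y)
w_lstsq = np.linalg.lstsq(toy_X, toy_Y, rcond=None)[0]

# Both estimators should agree and land close to the true weights.
print(np.allclose(w_normal, w_lstsq))

# The signed maximum and the maximum in absolute value pick different features here.
largest_signed = int(np.argmax(w_normal))
strongest = int(np.argmax(np.abs(w_normal)))
print(largest_signed, strongest)
```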

# Estimating voxel responses

Let's go back to the categories experiments we were discussing in the lecture:

In [None]:
# load subject info
sub, xfm = 'S2', 'S2_category_auto'
mask = cortex.db.get_mask(sub, xfm, type='cortical')
basedir = os.path.join(nds.io.data_list['fmri'],'categories')

# load design
design = np.load(os.path.join(basedir,'experiment_design.npz'))
conditions = design['conditions'].tolist()

# fmri responses:
fname = os.path.join(basedir, 'S2_categories1_{n}.nii.gz') 
Y = np.vstack([zscore(nds.fmri.load_data(fname.format(n=n), mask=mask, 
                                         standardize=True)) for n in [1,2]])

# stimuli:
X = np.vstack([design[run] for run in ['run1','run2']])
n,d = X.shape
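# force a float array: if X has an integer dtype, the convolved values would be truncated
conv_X = np.zeros_like(X, dtype=float)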

# convolve stimuli:
from neurods.fmri import hrf as generate_hrf
t_hrf, hrf_2 = generate_hrf(tr=2)
for i in range(d):
    conv_X[:,i] = np.convolve(X[:,i], hrf_2)[:n]
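The convolution step can be illustrated on a toy example. This is only a sketch: `toy_kernel` is a hypothetical response shape, not the HRF returned by `neurods`. It shows that `np.convolve` returns more samples than the design has, which is why the result is cut back to the first `n` time points, and that the predicted response peaks later than the stimulus onset. It also shows that writing the result into an integer array silently truncates it.

```python
import numpy as np

# Toy on/off design for one condition: 20 time points, stimulus on at t = 2..4.
toy_design = np.zeros(20)
toy_design[2:5] = 1

# Hypothetical response kernel (NOT the neurods HRF): rises, peaks, decays.
toy_kernel = np.array([0.0, 0.2, 0.6, 1.0, 0.6, 0.3, 0.1])

full = np.convolve(toy_design, toy_kernel)   # length 20 + 7 - 1 = 26
trimmed = full[:len(toy_design)]             # keep only the first 20 samples

print(full.shape, trimmed.shape)
print(np.argmax(toy_design), np.argmax(trimmed))  # onset vs. delayed peak

# Storing the result in an integer array drops the fractional part.
int_out = np.zeros(len(toy_design), dtype=int)
int_out[:] = trimmed
print(trimmed.max(), int_out.max())
```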

As we saw in lab, we have 5 different conditions:

In [None]:
print(conditions)

#### Q2: Constructing a contrast map:
- Estimate the magnitude of the brain response to each stimulus condition, as we did in class.
- For each voxel, find the difference between the magnitude of the response to faces and the magnitude of the response to places.
- Make a flatmap of the difference.
- What do regions with high values correspond to? What do the regions with low values correspond to?

Hints:
- You need to subtract the weight vector corresponding to places from the one corresponding to faces.
- You can use the variable conditions to find which dimension is which.
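The second hint can be sketched as follows. The labels in `toy_conditions` and the values in `toy_weights` are hypothetical; the real labels are whatever the conditions variable contains, so check its printed contents before relying on any particular name or index.

```python
import numpy as np

# Hypothetical condition labels; the real ones come from the conditions list.
toy_conditions = ['faces', 'objects', 'places']

# Hypothetical weights: one row per condition, one column per voxel.
toy_weights = np.array([[0.9, 0.1, 0.4],
                        [0.2, 0.2, 0.1],
                        [0.1, 0.8, 0.4]])

# Look up the row of each condition by name instead of hard-coding an index.
faces_idx = toy_conditions.index('faces')
places_idx = toy_conditions.index('places')

# Contrast: positive where faces > places, negative where places > faces.
contrast = toy_weights[faces_idx] - toy_weights[places_idx]
print(contrast)
```

Looking indices up by name keeps the contrast correct even if the order of the conditions changes.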


In [None]:
### STUDENT ANSWER
def OLS(X,Y):
    return np.dot(inv(np.dot(X.T,X)), np.dot(X.T,Y))
weights = OLS(conv_X, Y)
print('shape of weights is {}'.format(weights.shape))
condition1 = 1
condition2 = 3
vol = cortex.Volume(weights[condition1]-weights[condition2], sub, xfm, mask = mask)
__  = cortex.quickflat.make_figure(vol);
plt.title('{0} > {1}'.format(conditions[condition1],conditions[condition2]), fontsize = 30);
print('Dark red regions respond more strongly to faces than to places.')
print('Dark blue regions respond more strongly to places than to faces.')