# LSE ST451: Bayesian Machine Learning
## Author: Kostas Kalogeropoulos

## Week 6: Graphical Models

Topics covered 
 - Image processing
 - Ising model
 - Text classification
 - Document-term matrix
 - Naive Bayes Classifier
 - Working with Pipelines in Python
 - Adding progress bars in Python

In addition to the frequently used libraries, we also need the **Image** module from PIL and several **sklearn** tools for text processing, the Naive Bayes classifier and pipelines. The progress bar is provided by the **tqdm** library.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

#Image processing material
from PIL import Image
from tqdm import tqdm
from scipy.special import expit as sigmoid
from scipy.stats import multivariate_normal

#Text Classification Material
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.pipeline import Pipeline

### Load Image Data

Load the image data from the file bayes.bmp, which should be saved in the same directory as this notebook.

In [None]:
#load image data
data = Image.open('bayes.bmp')
img = np.double(data)
img_mean = np.mean(img)
clean = +1*(img>img_mean) + -1*(img<img_mean)

plt.figure()
plt.imshow(clean,cmap='Greys')
plt.title("clean binary image")
[M,N]=clean.shape

In [None]:
clean

### Add noise to the image to distort it

We distort the image by adding Gaussian noise with mean 0 and standard deviation 0.6.

In [None]:
sigma  = 0.6  #noise level
y = clean + sigma*np.random.randn(M,N) #y_i ~ N(x_i; sigma^2);
plt.figure()
plt.imshow(y, cmap='Greys')
plt.title("observed noisy image")

### Set Variational Bayes hyper-parameters

We assume equal coupling weights $W_{ij}=1$ and a smoothing rate $\lambda=0.5$, and run the algorithm for 15 iterations.
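
For reference, the damped mean-field update applied by the main loop further down can be written as follows. This is read off from the notebook's code rather than taken from a separate derivation; $\mu_i$ denotes the variational mean of pixel $x_i$, $\mathrm{nbr}(i)$ its up/down/left/right neighbours, and $L_i$ the per-pixel log-odds (`logodds` in the code).

$$
L_i = \log \mathcal{N}(y_i \mid +1, \sigma^2) - \log \mathcal{N}(y_i \mid -1, \sigma^2) = \frac{2 y_i}{\sigma^2}
$$

$$
\mu_i \leftarrow (1-\lambda)\,\mu_i + \lambda \tanh\Big( J \sum_{j \in \mathrm{nbr}(i)} \mu_j + \tfrac{1}{2} L_i \Big)
$$

With $\lambda = 1$ the update reduces to the undamped fixed-point iteration; smaller values of $\lambda$ move each $\mu_i$ only part of the way towards its new value.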

In [None]:
J = 1  #coupling strength (w_ij)
rate = 0.5  #update smoothing rate lambda
max_iter = 15
ELBO = np.zeros(max_iter)
Hx_mean = np.zeros(max_iter)

### Run the main loop

In [None]:
#Mean-Field VI
print('running mean-field variational inference')
logodds = multivariate_normal.logpdf(y.flatten(), mean=+1, cov=sigma**2) - \
          multivariate_normal.logpdf(y.flatten(), mean=-1, cov=sigma**2)
# y.flatten converts the y matrix into a vector
logodds = np.reshape(logodds, (M, N))

#init
p1 = sigmoid(logodds)
mu = 2*p1-1  #mu_init

a = mu + 0.5 * logodds
qxp1 = sigmoid(+2*a)  #q_i(x_i=+1)
qxm1 = sigmoid(-2*a)  #q_i(x_i=-1)

logp1 = np.reshape(multivariate_normal.logpdf(y.flatten(), mean=+1, cov=sigma**2), (M, N))
logm1 = np.reshape(multivariate_normal.logpdf(y.flatten(), mean=-1, cov=sigma**2), (M, N))

for i in tqdm(range(max_iter)):
    muNew = mu  # same array as mu (no copy): pixels are updated in place, one after another
    for ix in range(N):
        for iy in range(M):
            pos = iy + M*ix
            #The following code sets up the neighbourhood around the index
            neighborhood = pos + np.array([-1,1,-M,M])            
            boundary_idx = [iy!=0,iy!=M-1,ix!=0,ix!=N-1]
            neighborhood = neighborhood[np.where(boundary_idx)[0]]            
            xx, yy = np.unravel_index(pos, (M,N), order='F')
            nx, ny = np.unravel_index(neighborhood, (M,N), order='F')
            
            Sbar = J*np.sum(mu[nx,ny])       
            muNew[xx,yy] = (1-rate)*muNew[xx,yy] + rate*np.tanh(Sbar + 0.5*logodds[xx,yy])
            ELBO[i] = ELBO[i] + 0.5*(Sbar * muNew[xx,yy])
    mu = muNew
            
    a = mu + 0.5 * logodds
    qxp1 = sigmoid(+2*a) #q_i(x_i=+1)
    qxm1 = sigmoid(-2*a) #q_i(x_i=-1)    
    Hx = -qxm1*np.log(qxm1+1e-10) - qxp1*np.log(qxp1+1e-10) #entropy (1e-10 avoids log(0))
    
    ELBO[i] = ELBO[i] + np.sum(qxp1*logp1 + qxm1*logm1) + np.sum(Hx)
    Hx_mean[i] = np.mean(Hx)
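
The double loop above visits one pixel at a time. As a rough illustration of the same damped update, the sketch below applies it to a whole grid at once with array slicing. Note the assumptions: this is a synchronous variant (every pixel uses the previous sweep's values), so it is not numerically identical to the in-place loop, and the names `neighbour_sum`, `mean_field_sweep` and the `toy_*` variables are made up for this example.

```python
import numpy as np

def neighbour_sum(mu):
    """Sum of the up/down/left/right neighbours of each pixel (zero outside the grid)."""
    padded = np.pad(mu, 1, mode="constant", constant_values=0.0)
    return (padded[:-2, 1:-1] + padded[2:, 1:-1]
            + padded[1:-1, :-2] + padded[1:-1, 2:])

def mean_field_sweep(mu, logodds, J=1.0, rate=0.5):
    """One synchronous damped update of all pixels (hypothetical helper)."""
    s_bar = J * neighbour_sum(mu)
    return (1 - rate) * mu + rate * np.tanh(s_bar + 0.5 * logodds)

# Toy 6x6 image: left half -1, right half +1, plus Gaussian noise
rng = np.random.default_rng(0)
toy_clean = np.ones((6, 6))
toy_clean[:, :3] = -1
toy_sigma = 0.6
toy_y = toy_clean + toy_sigma * rng.standard_normal(toy_clean.shape)

# Log-odds of x_i=+1 versus x_i=-1 under the Gaussian noise model: 2*y/sigma^2
toy_logodds = 2 * toy_y / toy_sigma**2

# Same initialisation as the notebook: 2*sigmoid(logodds) - 1 = tanh(logodds/2)
toy_mu = np.tanh(0.5 * toy_logodds)
for _ in range(15):
    toy_mu = mean_field_sweep(toy_mu, toy_logodds)

toy_restored = np.where(toy_mu > 0, 1, -1)
print(toy_restored)
```

Zero padding gives boundary pixels fewer neighbours, which mirrors the `boundary_idx` filtering in the loop.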


### Plot ELBO to check convergence

In [None]:
plt.figure()
plt.plot(ELBO, color='b', lw=2.0, label='ELBO')
plt.title('Variational Inference for Ising Model')
plt.xlabel('iterations'); plt.ylabel('ELBO objective')
plt.legend(loc='upper left')

### Finally check if the image was restored

In [None]:
img_mean = np.mean(mu)
x = +1*(mu>img_mean) + -1*(mu<img_mean)
plt.figure()
plt.imshow(x,cmap='Greys')
plt.title("after %d mean-field iterations" %max_iter)

### Activity 1

Repeat the previous procedure, changing some of the specifications. For example, you can try
 - adding more noise
 - changing the smoothing rate
 - using a different sigma in the algorithm than the one used to distort the image.

# Text Classification 

Import the 20 newsgroups dataset, which is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups. It was originally collected by Ken Lang. The 20 newsgroups collection has become a popular data set for experiments in text applications of machine learning techniques, such as text classification and text clustering.

In [None]:
from sklearn.datasets import fetch_20newsgroups
twenty_train = fetch_20newsgroups(subset='train', shuffle=True)
twenty_train.target_names #prints all the categories

In [None]:
len(twenty_train.data)  # data is a Python list of strings, so it has no .shape attribute

### Import text into a Document-Term matrix using the Count Vectorizer

The following code will create a huge sparse document-term matrix. Its rows represent documents, its columns represent words (terms), and each entry is the number of times the word appears in that document.

In [None]:
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(twenty_train.data)
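Xnames = count_vect.get_feature_names_out()  # get_feature_names() was removed in scikit-learn 1.2
print(X_train_counts.shape)
print(Xnames[99000:99040])
print(X_train_counts[0, 9000:9100].toarray().ravel())  # densify only the slice, not the whole matrix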

### TF:  
Raw word counts have one drawback: longer documents get larger counts, and hence more weight, than shorter ones. To avoid this, we can use term frequencies (TF), i.e. #count(word) / #total words in each document.

### TF-IDF: 
Finally, we can also reduce the weight of very common words (the, is, an, etc.) that occur in almost all documents. This is called TF-IDF, i.e. term frequency times inverse document frequency.

We can achieve both using the code below:

In [None]:
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
X_train_tfidf.shape
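
As a sanity check on what `TfidfTransformer` computes with its default settings, the sketch below reproduces it by hand on a small made-up count matrix. With the defaults, scikit-learn uses the smoothed weight idf = ln((1 + n) / (1 + df)) + 1 and then rescales each row to unit Euclidean length, instead of dividing by the total word count as in the TF description above. The matrix and the `toy_*` / `manual_*` names are assumptions for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.feature_extraction.text import TfidfTransformer

# Made-up counts: 3 documents (rows) x 4 terms (columns)
toy_counts = np.array([[1, 1, 0, 2],
                       [0, 1, 0, 1],
                       [3, 0, 1, 1]], dtype=float)

n_docs = toy_counts.shape[0]
doc_freq = (toy_counts > 0).sum(axis=0)            # number of documents containing each term
idf = np.log((1 + n_docs) / (1 + doc_freq)) + 1    # smoothed inverse document frequency

weighted = toy_counts * idf                        # term count times idf
manual_tfidf = weighted / np.linalg.norm(weighted, axis=1, keepdims=True)  # L2-normalise rows

sk_tfidf = TfidfTransformer().fit_transform(csr_matrix(toy_counts)).toarray()
print(np.allclose(manual_tfidf, sk_tfidf))
```

The last term appears in every document, so its idf is exactly 1, the smallest possible value; rarer terms get larger weights.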

### Fit the multinomial Naive Bayes classifier

Use `.fit` to fit the multinomial Naive Bayes classifier to a set of data (X matrix) and a categorical target.

In [None]:
clf = MultinomialNB().fit(X_train_tfidf, twenty_train.target)
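
To make the fitted quantities concrete, the following sketch computes the multinomial Naive Bayes parameters by hand on a tiny made-up count matrix and compares them with scikit-learn's. It assumes Laplace smoothing with `alpha = 1` and class priors estimated from class frequencies (the scikit-learn defaults); all `toy_*` and `manual_*` names are hypothetical.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Made-up word counts: 4 documents x 3 terms, two classes
toy_X = np.array([[2, 1, 0],
                  [3, 0, 0],
                  [0, 1, 4],
                  [0, 2, 3]])
toy_y = np.array([0, 0, 1, 1])
alpha = 1.0

classes = np.unique(toy_y)
# Total count of each term within each class
class_counts = np.array([toy_X[toy_y == c].sum(axis=0) for c in classes])
# Smoothed term probabilities: (N_ct + alpha) / (N_c + alpha * vocabulary size)
log_theta = np.log((class_counts + alpha)
                   / (class_counts.sum(axis=1, keepdims=True) + alpha * toy_X.shape[1]))
log_prior = np.log(np.array([(toy_y == c).mean() for c in classes]))

# Score for a new document: log prior + sum over terms of count * log probability
new_doc = np.array([[1, 0, 2]])
manual_scores = log_prior + new_doc @ log_theta.T
manual_pred = classes[np.argmax(manual_scores, axis=1)]

toy_clf = MultinomialNB(alpha=alpha).fit(toy_X, toy_y)
print(manual_pred, toy_clf.predict(new_doc))
```

The "naive" assumption shows up in the score: the contribution of each term is simply added, as if terms occurred independently given the class.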

### Using Pipeline

We can put all the processing steps together in a pipeline, so that we don't have to rewrite them when considering other data.

In [None]:
text_clf = Pipeline([('vect', CountVectorizer()),
                     ('tfidf', TfidfTransformer()),
                     ('clf', MultinomialNB())])
text_clf = text_clf.fit(twenty_train.data, twenty_train.target)
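
As a small check that a pipeline is just the individual steps chained together, the sketch below fits one on four made-up sentences and compares its predictions with those obtained by calling the fitted steps manually. The sentences, labels and `toy_*` names are assumptions for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up training data: class 0 = sport, class 1 = hardware
toy_docs = ["goal scored in the match", "the team won the match",
            "new graphics card released", "the card renders graphics fast"]
toy_labels = [0, 0, 1, 1]
new_docs = ["graphics card review", "match report"]

toy_pipe = Pipeline([('vect', CountVectorizer()),
                     ('tfidf', TfidfTransformer()),
                     ('clf', MultinomialNB())])
toy_pipe.fit(toy_docs, toy_labels)
pipe_pred = toy_pipe.predict(new_docs)   # raw text in, class labels out

# The same prediction, step by step, using the fitted components
vect = toy_pipe.named_steps['vect']
tfidf = toy_pipe.named_steps['tfidf']
clf = toy_pipe.named_steps['clf']
manual_pred = clf.predict(tfidf.transform(vect.transform(new_docs)))

print(pipe_pred, manual_pred)
```

Note that the manual route calls `transform`, not `fit_transform`, on new data: the vocabulary and idf weights learned from the training set are reused.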

### Evaluate the Predictive Performance of the Naive Bayes Classifier

Check against a test (unseen) subset of the data

In [None]:
twenty_test = fetch_20newsgroups(subset='test', shuffle=True)
predicted = text_clf.predict(twenty_test.data) 
print('accuracy rate: ')
print(np.mean(predicted == twenty_test.target))
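
The accuracy above is the fraction of correct predictions. On made-up labels, the sketch below shows that `np.mean(predicted == target)` agrees with scikit-learn's `accuracy_score`, and uses `confusion_matrix` to show which classes get confused. The `toy_*` arrays are assumptions for illustration only.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up true and predicted labels for six documents in three classes
toy_true = np.array([0, 0, 1, 1, 2, 2])
toy_pred = np.array([0, 1, 1, 1, 2, 0])

acc_mean = np.mean(toy_pred == toy_true)         # fraction of matches
acc_sklearn = accuracy_score(toy_true, toy_pred)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(toy_true, toy_pred)

print(acc_mean, acc_sklearn)
print(cm)
```

The diagonal of the confusion matrix counts correct predictions, so the accuracy is its trace divided by the number of documents.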

### Improving performance

We can improve the performance by
 - removing stop words
 - reducing the Laplace smoothing, setting alpha to 0.1 rather than the default of 1.

In [None]:
text_clf = Pipeline([('vect', CountVectorizer(stop_words='english')),
                     ('tfidf', TfidfTransformer()),
                     ('clf', MultinomialNB(alpha=0.1))])
text_clf = text_clf.fit(twenty_train.data, twenty_train.target)
predicted = text_clf.predict(twenty_test.data) 
print('accuracy rate: ')
print(np.mean(predicted == twenty_test.target))

### Activity 2

Experiment with different values of the user-specified parameters.