# CS 584 Assignment 1 -- Text Classification (Machine Learning and NLP Basics)

#### Name: (Zhiyao Wen)

## In this assignment, you are required to follow the steps below:
1. Review the lecture slides.
2. Implement the preprocessing.
3. Implement tokenization.
4. Implement feature extraction.
5. Implement Logistic Regression.
6. Implement Stochastic Gradient Descent and Mini-batch Gradient Descent.
7. Evaluate all the experiments and compare all the results.

*** Please read the code very carefully and install these packages (NumPy, Pandas, sklearn, tqdm, and matplotlib) before you start ***

## 1. Data Processing (30 points)

* Download the dataset from Canvas
* Load data by using Pandas
* Preprocessing
* Tokenization
* Split data
* Feature extraction (TF-IDF)

### 1.1 Load Data

Run the following cells (Please make sure the paths of data files are correct.)

In [1]:
import pandas as pd

train_df = pd.read_csv('./data/train.csv', header=None)
train_df.columns = ['label', 'title', 'text']
train_df.head()

Unnamed: 0,label,title,text
0,3,Wall St. Bears Claw Back Into the Black (Reuters),"Reuters - Short-sellers, Wall Street's dwindli..."
1,3,Carlyle Looks Toward Commercial Aerospace (Reu...,Reuters - Private investment firm Carlyle Grou...
2,3,Oil and Economy Cloud Stocks' Outlook (Reuters),Reuters - Soaring crude prices plus worries\ab...
3,3,Iraq Halts Oil Exports from Main Southern Pipe...,Reuters - Authorities have halted oil export\f...
4,3,"Oil prices soar to all-time record, posing new...","AFP - Tearaway world oil prices, toppling reco..."


In [2]:
train_df.shape

(120000, 3)

In [3]:
test_df = pd.read_csv('./data/test.csv', header=None)
test_df.columns = ['label', 'title', 'text']
test_df.head()

Unnamed: 0,label,title,text
0,3,Fears for T N pension after talks,Unions representing workers at Turner Newall...
1,4,The Race is On: Second Private Team Sets Launc...,"SPACE.com - TORONTO, Canada -- A second\team o..."
2,4,Ky. Company Wins Grant to Study Peptides (AP),AP - A company founded by a chemistry research...
3,4,Prediction Unit Helps Forecast Wildfires (AP),AP - It's barely dawn when Mike Fitzpatrick st...
4,4,Calif. Aims to Limit Farm-Related Smog (AP),AP - Southern California's smog-fighting agenc...


In [4]:
test_df.shape

(7600, 3)

### 1.2 Preprocess (Fill the code: 10 points)
In this section, you need to remove all the unrelated characters, including punctuation, urls, and numbers. Please fill up the functions and test them by running the following cell.

In [5]:
import re
import string

class Preprocesser(object):
    def __init__(self, punctuation=True, url=True, number=True):
        self.punctuation = punctuation
        self.url = url
        self.number = number
    
    def apply(self, text):
        
        text = self._lowercase(text)
        
        if self.url:
            text = self._remove_url(text)
            
        if self.punctuation:
            text = self._remove_punctuation(text)
            
        if self.number:
            text = self._remove_number(text)
        
        text = re.sub(r'\s+', ' ', text)
            
        return text
    
        
    def _remove_punctuation(self, text):
        ''' Please fill this function to remove all the punctuations in the text
        '''
        ### Start your code
        
        text = re.sub(r'[^\w\s]', ' ',text) # remove punctuation with regular expression
        
        ### End
        
        return text
    
    def _remove_url(self, text):
        ''' Please fill this function to remove all the urls in the text
        '''
        ### Start your code
        
        text = re.sub(r'http\S+', '', text) # remove url with regular expression
        
        ### End
        
        return text
    
    def _remove_number(self, text):
        ''' Please fill this function to remove all the numbers in the text
        '''
        
        ### Start your code
        text = re.sub("(\s\d+)","",text) # remove number in text with regular expression
        
        ### End
        
        return text
    
    def _lowercase(self, text):
        ''' Please fill this function to lowercase the text
        '''
        
        ### Start your code
        
        text = text.lower() # change text char to lower case
        
        ### End
        
        return text

##### Test your implementation by running the following cell.

In [6]:
text = "Interest rates are trimmed to 7.5 by the South African central bank (https://www.xxx.xxx), but the lack of warning hits the rand and surprises markets."

processer = Preprocesser()
clean_text = processer.apply(text)

print(f'"{text}"') 
print('===>')
print(f'"{clean_text}"')

===>


### 1.3 Tokenization (Fill the code: 5 points)

In [7]:
from nltk.corpus import stopwords

def tokenize(text):
    ''' Please fill this function to tokenize text.
            1. Tokenize the text.
            2. Remove stop words.
            3. Optional: lemmatize words accordingly.
    '''
    
    ### Start your code
    tokens = text.split()
    
    stop_words = stopwords.words() # to remove stop words
    
    # Remove any nonalphabetic, stopwords,or single letter words
    
    tokens = [token for token in tokens if token.isalpha() and token not in stop_words and len(token) > 1]
    ### End
    
    
    
    return tokens

##### Test your implementation by running the following cell.

In [8]:
text = "Interest rates are trimmed to 7.5 by the South African central bank (https://www.xxx.xxx), but the lack of warning hits the rand and surprises markets."

processer = Preprocesser()
clean_text = processer.apply(text)
tokens = tokenize(clean_text)

print(f'{text} ==> {tokens}')



### 1.4 Data split (Fill the code: 5 points)

In [9]:
from sklearn.model_selection import train_test_split

text_train = train_df['text'].values.astype(str)
label_train = train_df['label'].values.astype(int) - 1 # -1 because labels start from 1

text_test = test_df['text'].values.astype(str)
label_test = test_df['label'].values.astype(int) - 1 # -1 because labels start from 1


### Start your code, split the text_train and label_train into training and validation
### Make sure the names of varables are "text_train", "label_train", "text_valid", and "label_valid"

text_train, text_valid, label_train, label_valid = train_test_split(text_train,label_train, train_size = 0.9, random_state =42)


### End

print('The size of training set:', text_train.shape[0])
print('The size of validation set:', text_valid.shape[0])
print('The size of test set:', text_test.shape[0])

The size of training set: 108000
The size of validation set: 12000
The size of test set: 7600


### 1.5 Feature Extraction (Fill the code: 10 points)

In [10]:
from collections import defaultdict
import numpy as np
from tqdm.notebook import tqdm
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer


class TfIdfExtractor(object):
    
    def __init__(self, vocab_size=None):
        self.vocab_size = vocab_size
        
        self.vocab = defaultdict(lambda: 0)
        self.word2idx = {}
        self.df = defaultdict(lambda: 0)
        self.num_doc = 0
        
        self.processer = Preprocesser()
        
        
    def fit(self, texts):
        ''' In this function, you are required to implement the fitting process.
                1. Construct the vocabulary (self.vocab).
                2. Construct the document frequency dictionary (self.df).
                3. Sort the vocabulary based on the frequency (self.vocab).
            Input:
                texts: a list of text (training set)
            Output:
                None
        '''

        self.num_doc = len(texts)
        
        for text in tqdm(texts, desc='fitting text'):
            clean_text = self.processer.apply(text)
            tokens = tokenize(clean_text)
            
            
            ### Start your code (step 1 & 2)
            
            for token in tokens:
                self.vocab[token]+=1         #construct the vocabulary dictionary with the number of the word appear in the text
                if token not in self.vocab:
                    self.df[token]+=1        # construct the df that docement contains the certain word

            
            ### End
            

        ### Start your code (Step 3)
    
        self.vocab = sorted(self.vocab.items(), key=lambda x:x[1], reverse=True)   #sorted the vocab from large number to small number

        # after sorted, self.vocab becomes a list, so chang it back to dic
        self.vocab = {key:value for key, value in self.vocab}

        
        
        
        
        ### End
        
        if self.vocab_size is not None:
            self.vocab = {key: self.vocab[key] for key in list(self.vocab.keys())[:self.vocab_size]}
        
        self.word2idx = {key: idx for idx, key in enumerate(self.vocab.keys())}


    def transform(self, texts):
        ''' In this function, you need to encode the input text into TF-IDF vector.
            Input:
                texts: a list of text.
            Ouput:
                a N-d matrix (Tf-Idf) 
        '''
        tfidf = np.zeros((len(texts), len(self.vocab)))
        
        for i, text in tqdm(enumerate(texts), desc='transforming', total=len(texts)):
            clean_text = self.processer.apply(text)
            tokens = tokenize(clean_text)
            
            ### Start your code
            for word, j in self.word2idx.items():
                counts = tokens.count(word) # count the word in tokens
                tfidf[i][j] = counts
                
        tf_idf = np.where(tfidf ==0,0,1)  # when tfidf =0, to find 0, else 1
        idf = np.sum(tf_idf,axis = 0)  # sum idf
        
        
        #         print(idf)
        tfidf = tfidf * (1/idf) # fit to TF-IDF vector
        
        #         print(tfidf)
            ### End
        
        return tfidf
                

##### Test your implementation by running the following cell.

In [63]:
extractor = TfIdfExtractor(vocab_size=10)
extractor.fit(text_train[:100])
X = extractor.transform(text_train[:10])

X[:5]

fitting text:   0%|          | 0/100 [00:00<?, ?it/s]

transforming:   0%|          | 0/10 [00:00<?, ?it/s]

  tfidf = tfidf * (1/idf) # fit to TF-IDF vector
  tfidf = tfidf * (1/idf) # fit to TF-IDF vector


array([[nan, 0. , 0. , 0. , 0. , 0. , nan, 0. , 0. , 0. ],
       [nan, 0. , 0. , 0. , 0. , 0. , nan, 0. , 0. , 0. ],
       [nan, 0. , 1. , 0. , 0. , 0. , nan, 0.5, 0. , 0. ],
       [nan, 0. , 0. , 0. , 0. , 0. , nan, 0. , 0. , 0. ],
       [nan, 0. , 0. , 0. , 0. , 1. , nan, 0. , 1. , 0. ]])

#### 1.5.4 Run the following code to obtain the TD-IDF and One-hot labels

In [64]:
# You can change this number to see the difference of the performances. (larger vocab size needs more memory)
vocab_size = 4000 
num_class = 4

extractor = TfIdfExtractor(vocab_size=vocab_size)
extractor.fit(text_train)

x_train = extractor.transform(text_train)
x_valid = extractor.transform(text_valid)
x_test = extractor.transform(text_test)


# convert label to one-hot vector
y_train = np.zeros((label_train.shape[0], num_class))
y_train[np.arange(label_train.shape[0]), label_train] = 1

y_valid = np.zeros((label_valid.shape[0], num_class))
y_valid[np.arange(label_valid.shape[0]), label_valid] = 1

y_test = np.zeros((label_test.shape[0], num_class))
y_test[np.arange(label_test.shape[0]), label_test] = 1


print('The size of training set:', x_train.shape)
print('The size of validation set:', x_valid.shape)
print('The size of test set:', x_test.shape)

fitting text:   0%|          | 0/108000 [00:00<?, ?it/s]

transforming:   0%|          | 0/108000 [00:00<?, ?it/s]

transforming:   0%|          | 0/12000 [00:00<?, ?it/s]

transforming:   0%|          | 0/7600 [00:00<?, ?it/s]

The size of training set: (108000, 4000)
The size of validation set: (12000, 4000)
The size of test set: (7600, 4000)


  tfidf = tfidf * (1/idf) # fit to TF-IDF vector
  tfidf = tfidf * (1/idf) # fit to TF-IDF vector


## 2. Logistic Regression (60 points)
In this section, you are required to implement a Logistic Regression(LR) model with $L_2$ regularization from scratch. 


The objective function of LR:

<center> $J = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K}y_{ik}log\frac{e^{f_k}}{\sum_{c=1}^{K}e^{f_c}} + \lambda \sum_{j=1}^{d}w_{kj}^2$ </center>

**Deliverable 1**: Given the objective function, please show the steps to derive the graident of J with respecty of $w_k$. You can either list the steps in the notebook or submit a pdf with all the steps in the submission. **(10 points)**

### 2.1 LR and softmax function (Fill the code, 20 points)

In [11]:
def softmax(x):
    ''' Compute the softmax function for each row of the input x.
        
        Inputs:
            x: A D dimensional vector or N x D dimensional numpy matrix.
        Outputs:
            x: You are allowed to modify x in-place
    '''
    ### Start your code
    

    x = np.exp(x) / np.sum(np.exp(x),axis = 0)
    ### End

    return x


class LogisticRegression(object):
    
    def __init__(self, vocab_size, num_class, lam):
        self.vocab_size = vocab_size
        self.num_class = num_class
        self.lam = lam
        
        ### Start your code (initialize weight(w) and bias(b))
        ### hint: you could use np.random.rand() to randomly initialize the parameters
        
        self.bias = np.random.rand((x.shape[0],1))  # how to know the x dimension ?
        
        x_bias = np.append(self.bias, x, axis=1)
        
        self.weight = np.random.rand(( 1, x_bias.shape[1]))
        
        
        ### End
        
    def objective(self, x, y):
        ''' Implement the objective function
            Inputs:
                x: N-d matrix
                y: N-K matrix
            Output: 
                the objective value of LR (scalar)
        '''
        loss = 0
        
        ### Start your code
        
        fx = np.dot( self.weight,x) + self.bias
        
        fx = softmax(fx)
        
        n = y.shape
        
        # objective funtion
        loss = -(1/n) * np.sum(y * np.log(fx)) + self.lam * (np.linalg.norm(self.weight) * np.linalg.norm(self.weight)) 

        
        ### End
        
        return loss
        
    
    def gradient(self, x, y):
        ''' Implement the gradient of J with respect to w (in Deliverable 1)
            Inputs:
                x: N-d matrix
                y: N-K matrix
            Output:
                w_grad: the gradient of J w.r.t weight
                b_grad: the gradient of J w.r.t bias (K dimensional vector)
        '''

        n = x.shape[0]
        d = x.shape[1]
        K = y.shape[1]
        
        w_grad = 0. # 0 is just a placeholder, it should be a d-K matrix
        b_grad = 0. # 0 is just a placeholder, it should be a K dimensional matrix
        
        w_grad = np.zeros((d,K))
        b_grad = np.zeros((1,K))
         
        
        # Compute Wx(for softmax denominator/numerator)
        expons = np.exp(np.dot(self.weight,x))
        
        ### Start your code
        for k in range(K):
            
            l2 = 2 * lam * self.weight[k] # for l2
            
            w_grad[k, :] = -self.lam*(((expons[k] / np.sum(expons, axis=0)) - y[:, k]).dot(x.T[:,k]) + l2)
            
            b_grad = -1/n * np.mean(y[:,k] * (np.ones((1,k))-softmax( expons  + self.bias)[:,k]))

            
        ### End
        
        return w_grad, b_grad
    
    
    def gradient_descent(self, w_grad, b_grad, lr):     
        ''' Implement the graident descent. 
            Updating weights and bias based on Equation: w = w - learning_rate * gradient)
            
            Inputs:
                w_grad: a matrix which is the gradient of J w.r.t to weight
                b_grad: a vector wich is the graident of J w.r.t to bias
            Output:
                None
        '''
        
        ### Start your code
        
        K = self.weight.shape
        
        for k in range(K):
            self.weight = self.weight - lr * w_grad # times lr
            self.bias = self.bias - lr * b_grad
        
        ### End
        

    
    def predict(self, x):
        y_hat = softmax(np.dot(x, self.w)).squeeze()
        return np.argmax(y_hat, axis=-1)

### 2.2 Stochastic Gradient Descent (SGD) (Fill the code, 15 points)

In [12]:
def sgd(model, X, y, lr, lam, num_epoch=100):
    ''' Implement SGD
        Inputs:
            X: N-d matrix
            y: N-K matrix
            lr: learning rate
            lam: lambda
            num_epoch: the number of epochs
        Output:
            1. A list of training losses against epoch
            2. A list of validation losses against epoch
    '''
    train_losses = []
    valid_losses = []
    
    n, _ = X.shape
    
    for e in range(num_epoch):
        train_loss = 0.
        
        ### Start your code here (Please implement SGD and obtain the training loss)
        
        X, y = shuffle(X, y) # shuffle data
        
        for i in range(len(X)):
            
            ind = np.random.randint(0,len(X)) # select one random index
            
            X_sample = X[ind:ind+1] # random select one sample for each epoch
            y_sample = y[ind:ind+1]
            
            # Compute x.TW (for softmax denominator/numerator)
            expons = softmax(np.dot( model.weight,X_sample) + model.bias)
            
            
            for k in range(y.shape[1]):
                # Get l2 regularization term 
                l2 = 2 * lam * model.weight[k]
                
                train_loss = (train_loss -1 * np.sum(np.dot(y_sample,np.log(expons))) + l2) /n
                
                  
            # Update w
            w_grad, b_grad = nodel.gradient(X_sample, y_sample)
            model.gradient_descent(w_grad, b_grad,lr)
        
        ### End
        
        valid_loss = 0.
        
        ### Start your code (Using validation set to obtain the validation loss)


        valid_loss = model.objective(x_valid,y_valid)
        
        ### End
        
        
        print(f'At epoch {e+1}, training loss: {train_loss:.4f}, validation loss: {valid_loss:.4f}.')
        train_losses.append(np.mean(losses))
        valid_losses.append(np.mean(losses))
            
    return train_losses, valid_losses

In [None]:
''' Update the hyper-parameters (num_epoch, lr, and lam) according to your observation to achieve better performance.
'''
num_epoch = 20
lr = 0.001
lam = 1E-6

sgd_lr = LogisticRegression(vocab_size, num_class, lam)
sgd_train_losses, sgd_valid_losses = sgd(sgd_lr, x_train, y_train, lr, lam, num_epoch)  


# The class I filled may not be compiled, but it should show some ideas.

Run SGD

### 2.3 Mini-batch Gradient Descent (Fill the code: 15 points)

In [3]:
def mini_batch_gd(model, X, y, batch_size, lr, lam, num_epoch=100):
    ''' Implement SGD
        Inputs:
            X: N-d matrix
            y: N-K matrix
            lr: learning rate
            lam: lambda
            num_epoch: the number of epochs
        Output:
            1. A list of training losses against epoch
            2. A list of validation losses against epoch
    '''
    train_losses = []
    valid_losses = []
    
    n, _ = X.shape
    
    for e in range(num_epoch):
        train_loss = 0.
        
        ### Start your code here (Implement Mini-batch GD)
        
        # the step is the same as SGD, except the select data, which is selecting small batch data for each epoch
        X, y = shuffle(X, y)
        
        for i in range(len(X)):
            
            
            
            X_sample = X[:batch_size] # random select one sample 
            y_sample = y[:batch_size]
            
            # Compute x.TW (for softmax denominator/numerator)
            expons = softmax(np.dot(model.weight,X_sample) + model.bias)
            
            
            for k in range(y.shape[1]):
                # Get l2 regularization term 
                l2 = 2 * lam * model.weight[k]
                
                train_loss = (train_loss -1 * np.sum(np.dot(y_sample,np.log(expons))) + l2) /n
                
                  
            # Update w
            w_grad, b_grad = nodel.gradient(X_sample, y_sample)
            model.gradient_descent(w_grad, b_grad,lr)
        
            train_loss = train_loss /batch_size # divide the btach_size
        ### End
        
        valid_loss = 0.
        
        ### Start your code (Using validation set to obtain the validation loss)

        valid_loss = model.objective(x_valid,y_valid)
        
        ### End
        
        train_losses.append(train_loss)
        valid_losses.append(valid_loss)
        print(f'At epoch {e+1}, training loss: {train_loss:.4f}, validation loss: {valid_loss:.4f}.')
            
    return train_losses, valid_losses

Run Mini-batch GD

In [None]:
''' Update the hyper-parameters (num_epoch, lr, lam, and batch_size) according to your observation 
    to achieve better performance.
'''

num_epoch = 20
lr = 0.01
lam = 1E-6
batch_size = 32

mini_gd_lr = LogisticRegression(vocab_size, num_class, lam)
mini_gd_train_losses, mini_gd_valid_losses = mini_batch_gd(mini_gd_lr, x_train, y_train, batch_size, lr, lam, num_epoch)

#The class I filled may not be compiled, but it should show some ideas.

### 2.4 Evaluation
You are required to report the precision and recall for each category on test set and plot the training loss and validation loss for both SGD and Mini-batch GD

##### Please run the following cell to evaluate your model with SGD

In [None]:
from sklearn.metrics import precision_score, recall_score

y_hat = sgd_lr.predict(x_test)
y_true = np.argmax(y_test, axis=1)

precision = precision_score(y_true, y_hat, average=None)
recall = recall_score(y_true, y_hat, average=None)

print('SGD')
print()
print('  Precision:')
print(f'    class {0}: {precision[0]:.4f}, class {1}: {precision[1]:.4f}, class {2}: {precision[2]:.4f}, class {3}: {precision[3]:.4f}')
print()
print('  Recall:')
print(f'    class {0}: {recall[0]:.4f}, class {1}: {recall[1]:.4f}, class {2}: {recall[2]:.4f}, class {3}: {recall[3]:.4f}')

##### Please run the following cell to plot training loss and validation loss for SGD

In [None]:
import matplotlib.pyplot as plt

%matplotlib inline


plt.plot(range(num_epoch), sgd_train_losses, sgd_valid_losses)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend(["Training loss", "Validation loss"])
plt.title('SGD')
plt.show()

##### Please run the following cell to evaluate your model with Mini-batch GD

In [None]:
y_hat = mini_gd_lr.predict(x_test)
y_true = np.argmax(y_test, axis=1)

precision = precision_score(y_true, y_hat, average=None)
recall = recall_score(y_true, y_hat, average=None)

print('Mini-batch GD')
print()
print('  Precision:')
print(f'    class {0}: {precision[0]:.4f}, class {1}: {precision[1]:.4f}, class {2}: {precision[2]:.4f}, class {3}: {precision[3]:.4f}')
print()
print('  Recall:')
print(f'    class {0}: {recall[0]:.4f}, class {1}: {recall[1]:.4f}, class {2}: {recall[2]:.4f}, class {3}: {recall[3]:.4f}')

##### Please run the following cell to plot training loss and validation loss for Mini-batch GD

In [None]:
plt.plot(range(num_epoch), mini_gd_train_losses, mini_gd_valid_losses)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend(["Training loss", "Validation loss"])
plt.title('Mini-batch GD')
plt.show()

## 3. Cross-Validation (10 points)

You are required to implement cross-validation, and use it to choose the best $\lambda$.


### 3.1 Reload dataset

In [16]:
text_train = train_df['text'].values.astype(str)
label_train = train_df['label'].values.astype(int) - 1 # -1 because labels start from 1

text_test = test_df['text'].values.astype(str)
label_test = test_df['label'].values.astype(int) - 1 # -1 because labels start from 1

### 3.2 Define the range of $\lambda$. (Fill the code)

In [17]:
lambdas = [0.001, 0.01, 0.1] ## Fill the values of lambda you want to evaluate in the list

### 3.3 Imlement cross-validation. (Fill the code)

In [None]:
### Start your code
from sklearn.model_selection import KFold

splits = KFold(n_splits = 5, shuffle = True)

for i in lambdas:
    
    train_loss = []
    val_loss = []
    
    for train_index, valida_index in splits.split(text_train):
        # define training and validation sets
        x_train , x_valida = text_train[train_index,:], text_train[valida_index, :]
        y_train, y_valida = label_train[train_index], label_train[valida_index]
        
        # train model with minibatch 
        mini_gd_train_losses, mini_gd_valid_losses = mini_batch_gd(mini_gd_lr, text_train, label_train, batch_size, lr, i, num_epoch)
        
        train_loss.append(mini_gd_train_losses[-1])
        
        val_loss.append(mini_gd_valid_losses[-1])
        
        print("Mean_trainloss ",mean(train_loss) )
        print("Valid_trainloss ",mean(val_loss) )

### End

In [27]:
import numpy as np
splits = KFold(n_splits = 5, shuffle = True)

x = np.arange(12)
print(x)

for train_index, valida_index in splits.split(x):
    print(train_index,valida_index)

[ 0  1  2  3  4  5  6  7  8  9 10 11]
[ 0  1  2  3  4  6  7  8 10] [ 5  9 11]
[ 1  2  3  5  6  8  9 10 11] [0 4 7]
[ 0  1  3  4  5  6  7  8  9 11] [ 2 10]
[ 0  2  3  4  5  6  7  9 10 11] [1 8]
[ 0  1  2  4  5  7  8  9 10 11] [3 6]


### 3.4 Report the best $lambda$ value, and report the recall and precision for each category on the test set.

In [None]:
'''
my logistic regression class may not work, but show some ideas
if the model work, probably, the best lambda value is 0.01.

to use LG with SGD could be the best, cause selecting the random example may balance the recall and precision for each category.

'''

# 4. Conclusion

provide an analysis for the results