# Shift Activated ADALINE Neuron

**The principle behind this activation function is that a misclassified target value adds noise to previously established reference predictions and therefore weakens known predictions.**

***In the experiment below, this approach is both faster (about 34 s versus about 1 min) and more accurate (0.86 versus 0.77) at tweet classification than the stock scikit-learn MLPClassifier.***

The structure for using this neuron is thus:

***Establish the reference prediction***

neuron.fit(train_set)

reference_prediction = neuron.predict(reference_set)

***Predict unknown tweet***

Classify tweet as belonging to class 0 (call this tweet:0)

neuron.fit(train_set + tweet:0)

tweet_prediction = neuron.predict(reference_set)

difference(tweet_prediction,reference_prediction)

***Calculate***

if difference > 0:

    tweet:0 was correctly a priori classified

else:

    tweet:0 was incorrectly a priori classified

## What is a Reference Prediction?
This neuron works from a baseline prediction on tweets whose classes are known. In the example below, it trains on 200 tweets and then makes a baseline prediction on 500 other tweets whose classifications are also known. This creates a reference prediction that is used later in classification. Overall, this means that the labeled data is split into two parts, the ADALINE *train* set and the ADALINE *reference* set. These sets are disjoint and operate at different stages of the neuron's use.
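
A minimal sketch of such a disjoint split for one class of labeled tweets. The helper name `split_train_reference` and the toy data are assumptions made for illustration; the row offsets mirror the slicing used in the notebook cells further down.

```python
import pandas as pd

def split_train_reference(samples, n_train, ref_start, ref_stop):
    """Return disjoint (train, reference) slices of one class's labeled tweets.

    Hypothetical helper: the arguments are row offsets, not index labels.
    """
    if ref_start < n_train:
        raise ValueError("reference slice overlaps the training slice")
    train = samples.iloc[:n_train]
    reference = samples.iloc[ref_start:ref_stop]
    return train, reference

# Toy stand-in for one author's labeled tweets (made-up data).
tweets = pd.DataFrame({"text": [f"tweet {i}" for i in range(1000)], "identity": 0})
train, reference = split_train_reference(tweets, n_train=100, ref_start=550, ref_stop=800)
print(len(train), len(reference))
```

Doing this once per class and concatenating the results gives a 200-tweet train set and a 500-tweet reference set.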

## So this neuron's predict function makes accurate predictions off of 200 tweets?
Nope. In fact, the baseline neuron does rather poorly. But rather than training on all 700 tweets whose classifications are known at training time, the neuron trains only on the baseline training set, and the baseline-trained neuron then makes a reference prediction on the 500 reference tweets. **It does not matter if the reference predictions are correct!** All that is needed is 500 linear activation values, for tweets whose classification is already known, saved for later use. The prediction comes from measuring the difference between these reference values and the values predicted later, after the training set has been modified with the tweet in question.

## Wait, the training set is modified?
Yes, the baseline training set that is used for the reference predictions has the tweet in question added to it. Then the neuron is retrained on this new dataset and a new set of predictions is made on the reference set.
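
A small sketch of that modification step, assuming the training data lives in a pandas DataFrame with `text` and `identity` columns as in the cells below. The helper `with_provisional_label` is a hypothetical name, not part of the notebook.

```python
import pandas as pd

def with_provisional_label(train, tweet_text, label=0):
    """Return a copy of `train` with one extra row carrying a provisional label."""
    extra = pd.DataFrame({"text": [tweet_text], "identity": [label]})
    # pd.concat returns a new frame, so the baseline training set is left untouched
    return pd.concat([train, extra], ignore_index=True)

# Made-up baseline training set with known labels.
baseline_train = pd.DataFrame(
    {"text": ["a known tweet", "another known tweet"], "identity": [0, 1]}
)
modified_train = with_provisional_label(baseline_train, "the tweet in question")
print(modified_train)
```

Because the baseline set is never mutated, every unknown tweet is tested against the same starting point.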

## I'm confused, you're not using the predict function to make a prediction about the tweet in question?
I am absolutely not using the predict function in the way it is classically used. What makes this work is that the unknown tweet is assigned a value of 0 before retraining and repredicting the reference set. If that assignment is correct, it has a different effect on the reference prediction than if it is incorrect.

## WTF mate? Why does that work?
The theory behind this is rather simple. If a tweet is correctly labeled before being added to the training set, then it strengthens the overall predictive abilities of the neuron, so we see a positive shift in the strength of the overall activation values. If it is incorrectly labeled, that is, it actually belongs to class 1 rather than class 0, it acts as noise and weakens the overall predictive abilities of the neuron.
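
One way to turn that shift into a decision is sketched below. It mirrors the comparison used in the classification cell later in this notebook, under the assumption that the reference set lists all class 1 rows first and all class 0 rows after them. The function names and the activation values are made up for illustration.

```python
import numpy as np

def shift_score(new_activations, reference_activations, n_class1):
    """Shift of the class-1 reference block minus shift of the class-0 block.

    Assumes the first `n_class1` reference rows are class 1 and the rest class 0.
    """
    shift = (np.asarray(new_activations, dtype=float)
             - np.asarray(reference_activations, dtype=float))
    return shift[:n_class1].sum() - shift[n_class1:].sum()

def classify(new_activations, reference_activations, n_class1):
    """Keep the provisional class 0 if the score is positive, otherwise flip to 1."""
    score = shift_score(new_activations, reference_activations, n_class1)
    return 0 if score > 0 else 1

reference = np.array([0.5, 0.5, 0.25, 0.25])  # made-up baseline activations
retrained = np.array([0.75, 0.75, 0.0, 0.0])  # made-up activations after retraining
print(shift_score(retrained, reference, n_class1=2))
print(classify(retrained, reference, n_class1=2))
```

Only the sign of the score matters for the decision, so the reference activations themselves do not need to be correct predictions.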

In [1]:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import SVC
from sklearn.linear_model import Perceptron, PassiveAggressiveClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

import re

In [2]:
class AdalineGD(object):
    """ADAptive LInear NEuron Classifier.
    Parameters
    ----------
    eta: float
        learning rate (between 0.0 and 1.0)
    n_iter : int
        passes over the training dataset
    
    Attributes:
    -----------
    w_ : 1d-array
        Weights after fitting
    errors_ : 1d-array
        Residuals (y - output) from the last epoch
    cost_ : list
        Sum-of-squared-errors cost after each epoch
    """
    
    def __init__(self, eta=0.1, n_iter = 50):
        self.eta = eta
        self.n_iter = n_iter
    
    def fit(self, X, y):
        """Fit Training Data
        Parameters:
        -----------
        X: {Array like} shape = [n_samples, n_features]
            training vectors,
            where n_samples is the number of samples and n_features is the number of features
            
        y: array-like, shape = [n_samples]
            target values
        
        returns: self : object
        """
        self.w_ = np.zeros(1 + X.shape[1])
        self.cost_ = []
        
        self.errors_ = 0
        
        for i in range(self.n_iter):
            output = self.net_input(X)
            self.errors_ = (y - output)
            
            self.w_[1:] += self.eta * X.T.dot(self.errors_)
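            # NOTE: the bias is overwritten each epoch rather than accumulated;
            # textbook ADALINE uses `+=` here. The results below were produced with `=`.
            self.w_[0] = self.eta * self.errors_.sum()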
            cost = (self.errors_**2).sum() / 2.0
            self.cost_.append(cost)
        return self
    
    def net_input(self, X):
        """Calculate net input"""
        #return np.dot(X, self.w_[1:]) + self.w_[0]
        output = X.dot(self.w_[1:]) + self.w_[0]
        return output
    
    def activation(self, X):
        """Calculate the linear activation weights
        later to be used in the redshift determination"""
        return self.net_input(X)
    
    def predict(self, X):
        """Return the class labels after the unit step, along with the raw activation values"""
        activation_weights = self.activation(X)
        return [np.where(activation_weights > 0.0, 1, -1),activation_weights]

In [3]:
df_trump = pd.read_csv("DonaldTrumpTweets.csv")
df_trump = df_trump.drop("Unnamed: 0", axis=1)
df_savage = pd.read_csv("AdamSavageTweets.csv")
df_savage = df_savage.drop("Unnamed: 0",axis = 1)

# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df_all = pd.concat([df_trump, df_savage])

savage_sample = pd.DataFrame(df_savage.sample(2000,random_state=42).text)
savage_sample['identity'] = 1
trump_sample = pd.DataFrame(df_trump.sample(2000,random_state=42).text)
trump_sample['identity'] = 0

In [4]:
def munger(data):
    """Strip @, #, and link fragments from the 'text' column, in place."""
    for index, row in data.iterrows():
        text = row['text']
        # Raw strings keep the regex escapes from being read as string escapes
        text = re.sub(r"@", "", text)
        text = re.sub(r"#", "", text)
        text = re.sub(r"bit\.ly.*\s?", "", text)
        text = re.sub(r"instagr\.am.*\s?", "", text)
        text = re.sub(r"https?:.*\s?", "", text)
        text = re.sub(r"t\.co.*\s?", "", text)
        text = re.sub(r"pic\.twitter\.com\S*\s?", "", text)
        # DataFrame.set_value was removed in pandas 1.0; .at is the scalar setter
        data.at[index, "text"] = text

munger(savage_sample)
munger(trump_sample)
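
To see what these patterns do to a single tweet, here is a standalone sketch that applies the same substitutions in the same order. The `clean_tweet` helper and the sample strings are made up for illustration.

```python
import re

# Same patterns as munger, applied in the same order.
PATTERNS = [
    r"@",
    r"#",
    r"bit\.ly.*\s?",
    r"instagr\.am.*\s?",
    r"https?:.*\s?",
    r"t\.co.*\s?",
    r"pic\.twitter\.com\S*\s?",
]

def clean_tweet(text):
    """Apply each substitution in turn and return the cleaned text."""
    for pattern in PATTERNS:
        text = re.sub(pattern, "", text)
    return text

print(repr(clean_tweet("@user loves #python")))
print(repr(clean_tweet("look https://t.co/xyz more words")))
```

Note that the greedy `.*` in the link patterns removes everything from the link to the end of the line, so any words after a URL are dropped as well. A narrower pattern such as `\S*` would keep them, but it would also change the features behind the scores reported below.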

In [5]:
all_text = pd.concat([savage_sample, trump_sample])
X_train = pd.concat([savage_sample[:100], trump_sample[:100]])
X_reference = pd.concat([savage_sample[550:800], trump_sample[550:800]])
X_test = savage_sample[850:900].copy(deep=True)

# Provisionally label the Savage samples as belonging to Trump (class 0).
# Later, the difference in activation values is used to determine
# whether this was the correct classification.
X_test['identity'] = 0

X_test = pd.concat([X_test, trump_sample[850:900]])

In [6]:
vectorizer = TfidfVectorizer(ngram_range=(1,2))
vectorizer.fit(all_text.text)

vec_X_train = vectorizer.transform(X_train.text)
vec_X_ref = vectorizer.transform(X_reference.text)
vec_X_test = vectorizer.transform(X_test.text)

In [7]:
ada = AdalineGD(n_iter=2000,eta=0.001)

%time ada.fit(vec_X_train,X_train.identity.values)

CPU times: user 344 ms, sys: 0 ns, total: 344 ms
Wall time: 346 ms


<__main__.AdalineGD at 0x7f4a44c3cda0>

In [8]:
# Here I establish a baseline prediction with known classifications
# This serves as a reference for correct and incorrect classifications
%time reference_value = ada.predict(vec_X_ref)[1]

CPU times: user 4 ms, sys: 0 ns, total: 4 ms
Wall time: 282 µs


In [9]:
def test_ada(test_sample):
    """Retrain on X_train plus each test tweet in turn; return the reference activations."""
    predictions = []
    for i in range(len(test_sample)):
        # iloc[[i]] keeps the row as a one-row DataFrame, so column dtypes survive the concat
        modified_train = pd.concat([X_train, test_sample.iloc[[i]]])
        vec_modified_train = vectorizer.transform(modified_train.text)
        ada.fit(vec_modified_train, modified_train.identity.values)
        predictions.append(ada.predict(vec_X_ref)[1])

    return predictions

In [10]:
%time results = test_ada(X_test)
predicted_results = []
reference = np.asarray(reference_value)
for result in results:
    # If the tweet was incorrectly classified, it acts as noise and weakens
    # the predictions, so the difference below is not greater than 0.
    shift = np.asarray(result) - reference
    # Reference rows 0-249 are Savage (class 1); rows 250-499 are Trump (class 0)
    difference = shift[0:250].sum() - shift[250:500].sum()
    if difference > 0:
        predicted_results.append(0)
    else:
        predicted_results.append(1)

CPU times: user 34.3 s, sys: 0 ns, total: 34.3 s
Wall time: 34.3 s


## Comparison with scikit-learn's MLPClassifier

### Total Time for SA-ADALINE: ~ 34s
### Total Time for MLPClassifier: ~ 1min

In [11]:
# Append X_train and X_reference to use the same amount of information for prediction
# As the SA-ADALINE
mpl = MLPClassifier()
mpl_train = X_train.append(X_reference)
vec_mpl_train = vectorizer.transform(mpl_train.text)
%time mpl.fit(vec_mpl_train,mpl_train.identity)

CPU times: user 3min 38s, sys: 21.6 s, total: 4min
Wall time: 1min


MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

### Score for SA-ADALINE: .86
### Score for MLPClassifier: .77

In [12]:
# The first 50 test tweets are Savage (class 1); the last 50 are Trump (class 0)
y_true = np.append(np.ones(50), np.zeros(50))

mpl.score(vec_X_test, y_true)

0.77000000000000002

In [13]:
# The first 50 test tweets are Savage (class 1); the last 50 are Trump (class 0)
y_true = np.append(np.ones(50), np.zeros(50))
accuracy_score(y_true, predicted_results)

0.85999999999999999