<a href="https://colab.research.google.com/github/M-H-Amini/MachineLearning-TMU/blob/master/MLe_TMU_Lec3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# In The Name Of ALLAH
# Machine Learning *elementary* Course
## Tarbiat Modares University
### Mohammad Hossein Amini (mhamini@aut.ac.ir)
# Lecture 3

<img src="https://github.com/M-H-Amini/MachineLearning-AUT/blob/master/stuff/MLAUT.jpg?raw=true" width="400">





# Introduction
In this lecture, we will do sentiment analysis on some tweets using **Logistic Regression**.

After preprocessing each tweet, we'll count how often each of its words appears in positive tweets and in negative tweets. Each tweet is therefore described by 2 features: the summed positive frequency and the summed negative frequency of its words. We'll use these feature vectors to classify the tweets!

We will implement logistic regression using **Keras**.

In [None]:
import nltk
from nltk.corpus import twitter_samples, stopwords
import matplotlib.pyplot as plt
import numpy as np
import keras
from utils import process_tweet, build_freqs

# Dataset Preparation
Let's download the dataset.

In [None]:
nltk.download('twitter_samples')
nltk.download('stopwords')

In [None]:
print(stopwords.words('english'))

Now we create lists containing the tweets as strings.

In [None]:
# select the lists of positive and negative tweets
all_positive_tweets = twitter_samples.strings('positive_tweets.json')
all_negative_tweets = twitter_samples.strings('negative_tweets.json')

# concatenate the lists, 1st part is the positive tweets followed by the negative
tweets = all_positive_tweets + all_negative_tweets

# let's see how many tweets we have
print("Number of tweets: ", len(tweets))

In [None]:
print(all_positive_tweets[10])
print(all_negative_tweets[1543])

In [None]:
labels = np.append(np.ones((len(all_positive_tweets))), np.zeros((len(all_negative_tweets))))

Let's count the frequency of each word in both positive and negative tweets. We'll use the ```build_freqs``` function for this purpose.

In [None]:
freqs = build_freqs(tweets, labels)
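
The `build_freqs` helper comes from the course's `utils` module, whose source is not shown in this notebook. As a rough sketch of the idea, assuming it maps `(word, label)` pairs to counts (which is how `freqs` is queried in the next cell), a simplified version could look like the following. The name `build_freqs_sketch` and the whitespace tokenizer are assumptions made for illustration only.

```python
from collections import defaultdict

def build_freqs_sketch(tweets, labels):
    """Count how often each word appears in tweets of each label.

    Hypothetical stand-in for utils.build_freqs: it tokenizes by
    lowercasing and splitting on whitespace, which is an assumption.
    """
    freqs = defaultdict(int)
    for tweet, label in zip(tweets, labels):
        for word in tweet.lower().split():
            freqs[(word, label)] += 1
    return dict(freqs)

# Made-up toy data: two positive tweets (label 1) and one negative (label 0)
toy_tweets = ["great day", "great movie", "bad day"]
toy_labels = [1, 1, 0]
toy_freqs = build_freqs_sketch(toy_tweets, toy_labels)
print(toy_freqs[("great", 1)], toy_freqs.get(("great", 0), 0))  # 2 0
```

Using `.get(key, 0)` for lookups avoids a `KeyError` for words that never occur with a given label.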

In [None]:
print(freqs[('great', 1)], freqs[('great', 0)])
freqs.get(('excellent', 1), 0)

Time for preprocessing and extracting features! ```extractFeatures``` takes a tweet and returns its 2-element feature vector.

In [None]:
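def extractFeatures(tweet):
  # Clean and tokenize the raw tweet string
  tweet = process_tweet(tweet)
  pos, neg = 0, 0
  for word in tweet:
    # Add up how often each word occurs in positive (1) and negative (0) tweets
    pos += freqs.get((word, 1.), 0)
    neg += freqs.get((word, 0.), 0)
  feature_vec = np.array([pos, neg])
  return feature_vec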
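
To make the two features concrete, here is a small worked example with a hand-written frequency dictionary. The names `toy_freqs` and `toy_features`, the counts, and the pre-tokenized inputs are all made up for illustration; they stand in for `freqs`, `extractFeatures`, and the output of `process_tweet`.

```python
# Hypothetical counts: (word, label) -> number of occurrences
toy_freqs = {
    ("great", 1): 50, ("great", 0): 5,
    ("movie", 1): 20, ("movie", 0): 25,
    ("awful", 0): 30,
}

def toy_features(tokens, freqs):
    """Sum the positive and negative counts over the tokens of one tweet."""
    pos = sum(freqs.get((word, 1), 0) for word in tokens)
    neg = sum(freqs.get((word, 0), 0) for word in tokens)
    return [pos, neg]

print(toy_features(["great", "movie"], toy_freqs))  # [70, 30]
print(toy_features(["awful", "movie"], toy_freqs))  # [20, 55]
```

A tweet whose words are seen mostly in positive tweets ends up with a larger first feature, and vice versa.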

Now let's split the data into training and test sets.

In [None]:
train_tweets = all_positive_tweets[:4000] + all_negative_tweets[:4000]
train_labels = [1. for i in range(4000)] + [0. for i in range(4000)]
test_tweets = all_positive_tweets[4000:] + all_negative_tweets[4000:]
test_labels = [1. for i in range(1000)] + [0. for i in range(1000)]
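
The hard-coded sizes above work because the dataset has 5,000 tweets per class. As a sketch of the same split that derives the label lists from the slices instead, one might write a small helper like the one below. The name `split_by_class` and the toy inputs are hypothetical.

```python
def split_by_class(pos_items, neg_items, n_train):
    """Use the first n_train items of each class for training, the rest for testing."""
    pos_train, pos_test = pos_items[:n_train], pos_items[n_train:]
    neg_train, neg_test = neg_items[:n_train], neg_items[n_train:]
    train = pos_train + neg_train
    test = pos_test + neg_test
    # Labels follow the actual slice lengths, so uneven classes are handled too
    train_labels = [1.0] * len(pos_train) + [0.0] * len(neg_train)
    test_labels = [1.0] * len(pos_test) + [0.0] * len(neg_test)
    return train, train_labels, test, test_labels

# Made-up toy data with uneven class sizes
pos = ["p1", "p2", "p3"]
neg = ["n1", "n2", "n3", "n4"]
train, train_labels, test, test_labels = split_by_class(pos, neg, n_train=2)
print(train, train_labels)  # ['p1', 'p2', 'n1', 'n2'] [1.0, 1.0, 0.0, 0.0]
print(test, test_labels)    # ['p3', 'n3', 'n4'] [1.0, 0.0, 0.0]
```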

We'll store the training set and test set in NumPy arrays.

In [None]:
X_train = np.zeros((len(train_tweets), 2))
y_train = np.array(train_labels)
X_test = np.zeros((len(test_tweets), 2))
y_test = np.array(test_labels)

In [None]:
for i in range(X_train.shape[0]):
  X_train[i] = extractFeatures(train_tweets[i])

for i in range(X_test.shape[0]):
  X_test[i] = extractFeatures(test_tweets[i])
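
As an alternative to filling a preallocated array row by row, the feature matrix can be built in one step from a list comprehension. The sketch below uses a made-up `length_features` function as a stand-in for `extractFeatures`, so that it runs on its own.

```python
import numpy as np

def length_features(text):
    """Hypothetical stand-in for extractFeatures: returns a 2-element vector."""
    return np.array([len(text), text.count(" ")])

texts = ["good day", "bad", "so so day"]
# One row per text, one column per feature
X = np.array([length_features(t) for t in texts], dtype=float)
print(X.shape)  # (3, 2)
```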

# Logistic Regression
Let's do the logistic regression now! The model is a single ```Dense``` unit with a sigmoid activation on the 2 input features.

In [None]:
model = keras.Sequential([keras.layers.Dense(1, activation='sigmoid', input_shape=(2,))])
model.summary()
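
As a sketch of what this one-unit model computes, the NumPy code below evaluates sigmoid(w · x + b) by hand. The weights `w`, the bias `b`, and the feature rows in `X` are arbitrary values chosen for illustration, not the ones Keras learns.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    """Probability of the positive class for each row of X."""
    return sigmoid(X @ w + b)

# Illustrative parameters: reward positive counts, penalize negative counts
w = np.array([0.01, -0.01])
b = 0.0
X = np.array([[300.0, 100.0],   # mostly positive words
              [100.0, 300.0],   # mostly negative words
              [200.0, 200.0]])  # balanced
probs = predict_proba(X, w, b)
print(probs)
```

With these parameters the first row scores above 0.5, the second below 0.5, and the balanced row exactly 0.5.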

In [None]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
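
The binary cross-entropy loss averages -[y log p + (1 - y) log(1 - p)] over the examples, where y is the true label and p the predicted probability. Here is a minimal NumPy sketch of that formula; the clipping constant `eps` is an assumption of this sketch, added to avoid taking log(0), and the probabilities are made up.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # keep log() finite
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))

y_true = np.array([1.0, 0.0, 1.0])
confident = np.array([0.9, 0.1, 0.8])  # mostly correct, confident predictions
unsure = np.array([0.5, 0.5, 0.5])     # uninformative predictions

loss_confident = binary_crossentropy(y_true, confident)
loss_unsure = binary_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

Predicting 0.5 everywhere gives a loss of log(2), about 0.693; confident correct predictions push the loss toward 0.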

In [None]:
model.fit(X_train, y_train, batch_size=32, epochs=10, validation_data=(X_test, y_test))

Now let's test the model on a single tweet from the test set and look at the result.

In [None]:
print(all_negative_tweets[-1])
print(X_test[-1])
print(model.predict(X_test[-1:]))
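
The model outputs a probability, not a class. A common way to turn it into a label is to threshold at 0.5. The sketch below shows this on made-up probabilities shaped like the `(n, 1)` output of `model.predict`; the helper names `to_labels` and `accuracy` are hypothetical.

```python
import numpy as np

def to_labels(probs, threshold=0.5):
    """Convert predicted probabilities to 0/1 labels."""
    return (np.asarray(probs).ravel() >= threshold).astype(float)

def accuracy(y_true, probs, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    return float(np.mean(to_labels(probs, threshold) == np.asarray(y_true)))

# Made-up probabilities with the (n, 1) shape of a one-unit model's output
probs = np.array([[0.91], [0.12], [0.55], [0.40]])
y_true = np.array([1.0, 0.0, 0.0, 1.0])
print(to_labels(probs))         # [1. 0. 1. 0.]
print(accuracy(y_true, probs))  # 0.5
```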

And finally, we'll create a ```classify``` function which takes a string and returns the predicted probability that it is positive!

In [None]:
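def classify(tweet):
  # Turn the raw string into its 2-element feature vector
  features = extractFeatures(tweet)
  # Add a batch dimension: the model expects input of shape (n, 2)
  return model.predict(features[np.newaxis, :])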

In [None]:
neg = 'I hated it. It was such an awful movie!'
pos = 'It was an honor to view this great one!'
print(classify(neg), classify(pos))