# Background

A **virtual environment** is a directory tree that contains Python executable files and other files which indicate that it is a virtual environment [Source: python.org].

**Flask** is a micro web framework written in Python. It keeps its core minimal and relies on an ecosystem of external extensions for features such as form validation, upload handling, and authentication. It is used by companies such as Pinterest and LinkedIn [Source: Wiki].

An **API** (application programming interface) is a computing interface that defines interactions between multiple software intermediaries. It defines the calls or requests that can be made, how to make them, the data formats to use, the conventions to follow, etc. [Source: Wiki].

**AWS** (Amazon Web Services) is an Amazon subsidiary that provides on-demand cloud computing platforms and APIs on a pay-as-you-go basis.

Amazon Elastic Compute Cloud (**Amazon EC2**) provides secure, resizable computing capacity in the cloud, i.e. virtual cloud servers.

**AWS Elastic Beanstalk** is how we will deploy our web application. The service deploys and scales web applications onto AWS cloud servers.

Elastic Beanstalk handles deployment, capacity provisioning, load balancing, and auto scaling, as well as application health monitoring. The following diagram shows the typical architecture of a web server environment.

![](images/beanstalkArch.png)

*source:* https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/concepts-webserver.html

**Postman** is a platform for collaborative API development. 

The **pickle** module lets us serialize Python objects: an object is converted (pickled) into a stream of bytes and can later be unpickled (deserialized) back into an object. It is comparable to Java serialization.
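The following is a minimal sketch of the round trip. The `Message` dataclass is a hypothetical example object and is not part of the project. `pickle.dumps()` turns an object into bytes, and `pickle.loads()` rebuilds an equal object from them:

```python
import pickle
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical example object; any picklable Python object works the same way
    text: str
    label: str


original = Message(text="Free entry in a weekly comp", label="spam")

# Serialize (pickle) the object into a stream of bytes
data = pickle.dumps(original)
print(type(data))  # <class 'bytes'>

# Deserialize (unpickle) the bytes back into an equivalent, separate object
restored = pickle.loads(data)
print(restored == original)  # True
```

Unpickling can execute arbitrary code, so only load pickle files from sources you trust.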

# Objectives:


#### 1. Build a machine learning-based spam detector API 

#### 2. Deploy the machine learning application into AWS virtual servers

#### We will detect spam messages using simple machine learning and deploy the detector as a web API using AWS Elastic Beanstalk.

# Project

## 1. Creating the Flask application

We create a project folder, MyFlask. Working in Visual Studio Code, we need to set the Python interpreter to our Anaconda installation.


In the VS Code settings, configure the terminal:
> Set **Terminal > Integrated > Automation Profile: Windows** to Command Prompt.

Then, in the settings.json file, we set `python.condaPath` to the conda executable:

>  "python.condaPath": "C:\\Users\\jihad\\anaconda3\\Scripts\\conda.exe"

Now we create a virtual environment called "flask" from a new VS Code terminal:
> python -m venv flask

To activate the virtual environment in VS Code, right-click the `activate` script in the `flask\Scripts` folder, select **Copy Relative Path**, and paste it into the terminal:
> flask\Scripts\activate

Now we are working within the virtual environment.


Within this environment we install the modules we need by running:
>pip install flask

Finally we can run  
> python Application.py

This starts the Flask web application so we can see its output.
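The write-up does not show the contents of Application.py. Below is a minimal sketch of what such a file could contain. It assumes a single hypothetical `/` route that returns plain text:

```python
from flask import Flask

# Create the Flask application object
app = Flask(__name__)


@app.route('/')
def index():
    # Hypothetical landing route that returns plain text
    return "Hello from Flask!"
```

To start the development server with `python Application.py`, the file would also end with the usual `if __name__ == "__main__": app.run()` guard. By default, this serves the app on http://127.0.0.1:5000/.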

## 2. Creating the RESTful API: GET/POST Methods

In this section we turn the Flask web application into a RESTful API that handles GET and POST requests.

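We define a `/spamdetect` route that accepts both GET and POST requests:

```python
from flask import Flask, request
app = Flask(__name__)

@app.route('/spamdetect', methods=['GET', 'POST'])
def spamdetect():
    # Read the "message" query-string parameter ("" if it is missing)
    message = request.args.get("message", "")
    return message
```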



Launching **Postman** from the desktop, we create a new request within our workspace, testing GET first. We launch our application again from VS Code to find the base URL:
> http://127.0.0.1:5000/

We add a `message` key with a value to be echoed back (in this case our name works) and send the request to `http://127.0.0.1:5000/spamdetect`. Switching the method to POST and sending again confirms that the RESTful API handles both methods.

![alt text](images/post.png "Creating the post GET request")
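Postman is not the only way to exercise the route. In the minimal sketch below, Flask's built-in test client sends the same GET and POST requests in-process without starting a server. The message value `hello` is just an example:

```python
from flask import Flask, request

app = Flask(__name__)


@app.route('/spamdetect', methods=['GET', 'POST'])
def spamdetect():
    # Echo back the "message" query-string parameter ("" if it is missing)
    return request.args.get("message", "")


# Flask's test client issues requests directly against the app object
client = app.test_client()

get_resp = client.get('/spamdetect', query_string={'message': 'hello'})
post_resp = client.post('/spamdetect', query_string={'message': 'hello'})

print(get_resp.status_code, get_resp.get_data(as_text=True))    # 200 hello
print(post_resp.status_code, post_resp.get_data(as_text=True))  # 200 hello
```

The route reads `request.args`, so the message must travel in the query string (Postman's **Params** tab), even for POST. A value sent in a POST body would instead be read from `request.form` or `request.get_json()`.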


## 3. Building the spam detector ML model

##### Stemming Aside

Stemming is a type of text normalization that reduces words to their root form. It *reduces redundancy in the data* and *variation among forms of the same word*.

Stemming programs are called stemming algorithms or stemmers. One example is the Porter stemmer, available in NLTK as `PorterStemmer`. A simple example is below:

```python
from nltk.stem import PorterStemmer

words = ["running", "runs"]
stemmer = PorterStemmer()
for word in words:
    root_word = stemmer.stem(word)
    print(root_word)
```

```
run
run
```


#### Libraries needed: *Pandas, nltk, joblib, sklearn*.

```python
import pandas as pd
import nltk
nltk.download('stopwords')  # stop-word list, used to filter out common low-information words
from nltk.tokenize import RegexpTokenizer
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
```

```
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\jihad\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
```


##### Messages are categorized as spam or ham.
##### Ham refers to legitimate messages, i.e. messages that are not spam.

```python
# Loading in data: the UCI SMS Spam Collection dataset, downloaded into csv_files/ from
# https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset
df_train = pd.read_csv('csv_files/spam_train.csv', encoding='ISO-8859-1')
df_train.head()
```

| Unnamed: 0 | sms | category |
|---|---|---|
| 0 | Go until jurong point, crazy.. Available only ... | ham |
| 1 | Ok lar... Joking wif u oni... | ham |
| 2 | Free entry in 2 a wkly comp to win FA Cup fina... | spam |
| 3 | U dun say so early hor... U c already then say... | ham |
| 4 | Nah I don't think he goes to usf, he lives aro... | ham |


```python
df_test = pd.read_csv('csv_files/spam_test.csv', encoding='ISO-8859-1')
df_test.head()
```

| Unnamed: 0 | sms | category |
|---|---|---|
| 0 | Well its not like you actually called someone ... | ham |
| 1 | Nope. Since ayo travelled, he has forgotten hi... | ham |
| 2 | You still around? Looking to pick up later | ham |
| 3 | CDs 4u: Congratulations ur awarded å£500 of CD... | spam |
| 4 | There's someone here that has a year &lt;#&gt... | ham |


#### Creating a function cleanSms() to clean data

```python
# General function to clean the data.
# The tokenizer extracts word tokens (runs of letters, digits, and underscores)
tokenizer = RegexpTokenizer(r'\w+')
stopwords_english = set(stopwords.words('english'))
stemmer = PorterStemmer()

# Tokenizing and stemming, getting rid of redundant data (stop words)
def cleanSms(sms):
    sms = sms.replace("<br /><br />", " ")  # replace HTML breaks with a space
    sms = sms.lower()
    sms_tokens = tokenizer.tokenize(sms)  # list of lower-case word tokens

    # Removing stop words
    sms_tokens_without_stopwords = [token for token in sms_tokens if token not in stopwords_english]

    # Stemming
    stemmed_sms_tokens_without_stopwords = [stemmer.stem(token) for token in sms_tokens_without_stopwords]

    cleaned_sms = ' '.join(stemmed_sms_tokens_without_stopwords)
    return cleaned_sms
```
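With its default `gaps=False`, NLTK's `RegexpTokenizer` returns the matches of its pattern, essentially like `re.findall`. The quick standard-library check below shows why the pattern must be the raw string `r'\w+'` rather than `'r/w+'`. The latter only matches a literal `r/` followed by one or more `w` characters, so it finds no tokens in ordinary text:

```python
import re

sms = "free entry in 2 a wkly comp to win fa cup final tkts"

# Corrected pattern: runs of word characters (letters, digits, underscore)
print(re.findall(r'\w+', sms))
# Original pattern: literal "r/" followed by one or more "w", so no matches here
print(re.findall('r/w+', sms))  # []
```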

#### Cleaning the train and test datasets


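```python
# apply() returns a new Series, so the cleaned text must be assigned back
df_train['sms'] = df_train['sms'].apply(cleanSms)
x_train = df_train['sms'].values
y_train = df_train['category'].values
```

```python
df_test['sms'] = df_test['sms'].apply(cleanSms)
x_test = df_test['sms'].values
y_test = df_test['category'].values
```

Since the model is then trained on cleaned text, new messages should ideally also be passed through `cleanSms()` before they are vectorized.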

#### Vectorizing data

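```python
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(sublinear_tf=True, encoding='ISO-8859-1')
# Learn the vocabulary and IDF weights from the training data only,
# then apply the same transformation to both splits
vectorizer.fit(x_train)
x_train = vectorizer.transform(x_train)
x_test = vectorizer.transform(x_test)
```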
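To show what the vectorizer computes, here is a simplified pure-Python re-implementation of the default TF-IDF formulas. It assumes scikit-learn's defaults: a two-or-more-character token pattern, smoothed IDF, and L2 normalization, with `sublinear_tf=True` as above. It is not scikit-learn's actual code and skips details such as sparse matrices:

```python
import math
import re
from collections import Counter

# Same default token pattern as scikit-learn: tokens of two or more word characters
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    return TOKEN_PATTERN.findall(text.lower())


def fit_idf(docs):
    """Smoothed IDF, scikit-learn's default: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def transform(doc, idf):
    """Return an L2-normalized sublinear TF-IDF vector as {term: weight}.

    Terms not seen when fitting are dropped, as with a fitted vectorizer.
    """
    counts = Counter(t for t in tokenize(doc) if t in idf)
    # sublinear_tf=True replaces the raw count c with 1 + ln(c)
    weights = {t: (1 + math.log(c)) * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


train_docs = ["win a free prize now", "are we still meeting now", "free free entry to win"]
idf = fit_idf(train_docs)
vector = transform("free prize, call now!", idf)
print(sorted(vector))  # ['free', 'now', 'prize']; "call" was never seen in training
```

This is why the vectorizer is fit once on the training data and then reused, and why it is saved alongside the model later on. Words outside the training vocabulary simply contribute nothing.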

#### Building the Model

We are predicting categories: given a message, is it spam or not (ham)?

This is a binary classification problem, for which logistic regression is well suited.
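Under the hood, a fitted binary logistic regression computes a weighted sum of the features plus a bias. It then passes that score through the logistic (sigmoid) function to get a probability. Below is a minimal pure-Python sketch; the weights are made up and stand in for the ones `model.fit()` learns:

```python
import math


def sigmoid(z):
    # Logistic function: maps any real-valued score to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(features, weights, bias):
    """Return (label, P(spam)) for one feature vector.

    Mirrors binary logistic regression: predict "spam" when the probability
    exceeds 0.5, i.e. when the linear score is positive.
    """
    score = bias + sum(w * x for w, x in zip(weights, features))
    p_spam = sigmoid(score)
    return ("spam" if p_spam > 0.5 else "ham"), p_spam


# Hypothetical weights for three features ("free", "win", "meeting");
# in the project these are learned by model.fit()
weights = [2.5, 1.8, -2.0]
bias = -1.0

print(predict_label([0.7, 0.7, 0.0], weights, bias))  # label 'spam'
print(predict_label([0.0, 0.0, 1.0], weights, bias))  # label 'ham'
```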

```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(solver='lbfgs')
model.fit(x_train, y_train)
```

```
LogisticRegression()
```

#### Predicting with the logistic regression model

```python
# Predicting spam
# Messages with phone numbers tend to be classified as spam
model.predict(vectorizer.transform(["you won $900 in the new lottery drawing. Call 1234675543"]))
```

```
array(['spam'], dtype=object)
```

```python
# Predicting ham
model.predict(vectorizer.transform(["See that attached resume, thanks!"]))
```

```
array(['ham'], dtype=object)
```

## 4. Converting the Model to a Spam Detector API

Pickling converts Python objects into byte streams that can be saved to disk; unpickling is the opposite. It can be thought of as packing the model up and unpacking it when we need it. Pickled files conventionally use the .pkl extension.

We need to pickle both the model and the fitted vectorizer, because incoming messages must be transformed with the same vocabulary the model was trained on. The joblib library can do this.

### Pickling 

```python
import joblib

joblib.dump(model, 'spam_detect_model.pkl')
joblib.dump(vectorizer, 'vectorizer.pkl')
```

```
['vectorizer.pkl']
```

These new files (vectorizer.pkl, spam_detect_model.pkl) must be copied into our Flask application folder.

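Once the files are present in our workspace, we install scikit-learn and joblib into the virtual environment. The PyPI package is `scikit-learn`; the old `sklearn` name is deprecated. Ideally, install the same scikit-learn version that was used to train the model, since pickled scikit-learn objects are not guaranteed to load correctly across versions:
> pip install scikit-learn joblib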

We then load the vectorizer and model into our Flask application by adding the following code before the Flask instantiation:

```python
import joblib

# Load the fitted vectorizer and model once, when the application starts
vectorizer = joblib.load("vectorizer.pkl")
spam_detect_model = joblib.load("spam_detect_model.pkl")
```



Now we modify the body of the API route. Once it receives the message, we use the vectorizer to transform it. We then pass the transformed version through the model, which returns an array with the prediction at index 0:

```python
@app.route('/spamdetect', methods=['GET', 'POST'])
def spamdetect():
    message = request.args.get("message")
    vectorized_message = vectorizer.transform([message])
    # predict() returns an array of labels; index 0 holds our message's label
    result = spam_detect_model.predict(vectorized_message)[0]
    return result
```
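As written, a request without a `message` parameter passes `None` to the vectorizer, which will most likely raise an exception and produce a 500 error. The sketch below adds a guard that returns 400 instead. So that it runs without the .pkl files, it uses hypothetical `StubVectorizer` and `StubModel` classes in place of the objects loaded with `joblib.load()`:

```python
from flask import Flask, request

app = Flask(__name__)


class StubVectorizer:
    """Hypothetical stand-in for the unpickled TfidfVectorizer."""
    def transform(self, messages):
        return [m.lower() for m in messages]


class StubModel:
    """Hypothetical stand-in for the unpickled LogisticRegression model."""
    def predict(self, rows):
        return ["spam" if "won" in row else "ham" for row in rows]


# In the real application these come from joblib.load(...)
vectorizer = StubVectorizer()
spam_detect_model = StubModel()


@app.route('/spamdetect', methods=['GET', 'POST'])
def spamdetect():
    message = request.args.get("message")
    if not message:
        # Reject requests with no message instead of failing inside transform()
        return "Missing 'message' query parameter", 400
    vectorized_message = vectorizer.transform([message])
    # predict() returns a sequence of labels; [0] is the label for our one message
    return str(spam_detect_model.predict(vectorized_message)[0])
```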

### Testing the API

Within Postman, we can send a message and see how the model classifies it. For the message "Hello, how are you?" the model returns `ham`, i.e. a non-spam message.

![alt text](images/postHam.png)

For the message "You've won $90,000 in lottery please call  +123456789", the spam detector API classifies it as spam.

![alt text](images/postSpam.png)

## 5. Launching an AWS EC2 Virtual Server instance using AWS Elastic Beanstalk

Within the AWS Management Console we create a web application using AWS Elastic Beanstalk with the default configuration.

We name the application spamdetectAPI. Once the setup is complete (5 to 6 minutes), the web app is up and running on Amazon servers.

The web application URL is http://spamdetectapi-env.eba-2pb2wgfz.us-east-2.elasticbeanstalk.com/

With the web application running, we go to the EC2 service to verify that the instance is running.


![alt text](images/webapprunning.png)

## 6. Deploying and testing the AWS web application API
 

##### We will deploy our spam detector Flask API into the AWS EC2 instance.

Our application needs the scikit-learn, joblib, and Flask dependencies to work. But how do the AWS servers know which dependencies our application has? The answer is a text file, requirements.txt, which tells Elastic Beanstalk which project dependencies to install.

Within the VS Code terminal, activate the virtual environment:
> flask\Scripts\activate

Then generate the list of project requirements by running:
> pip freeze > requirements.txt

The generated requirements.txt lists the dependencies installed in the environment, with their exact versions. During deployment, the AWS server installs these dependencies automatically.

Next, we rename the Flask object from `app` to `application`, so `@app.route` becomes `@application.route`. By default, the Elastic Beanstalk Python platform looks for a WSGI callable named `application` in a file named application.py.
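The renamed module header and route might look like the sketch below. For illustration, the route body is shortened to an echo; this is not the project's final code:

```python
from flask import Flask, request

# Elastic Beanstalk's Python platform looks for a WSGI callable named
# "application" in application.py by default, so the Flask object uses that name
application = Flask(__name__)


@application.route('/spamdetect', methods=['GET', 'POST'])
def spamdetect():
    # Body shortened for this sketch; the real route vectorizes the message
    # and returns the model's prediction
    return request.args.get("message", "")
```

For local runs with `python application.py`, keep an `if __name__ == "__main__": application.run()` guard at the bottom of the file. Elastic Beanstalk imports the module and serves `application` itself, so the guard does not run there.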


Next, we create the deployment bundle: a zip file, version_1.zip, containing the pickled files, requirements.txt, and application.py. The files should sit at the root of the zip, not inside a parent folder. In the Elastic Beanstalk environment dashboard, select **Upload and deploy**, choose the zip file from the Flask application project folder, name the version "Version_1", and deploy. Once deployment completes, the dashboard shows the running version with the name given.


![alt text](images/deploysuccess.png)
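The zip can be created with the operating system's file manager. As an alternative, here is a minimal standard-library sketch that writes the four bundle files at the root of the archive. The `build_bundle` helper is hypothetical; the file names are the ones listed above:

```python
import os
import zipfile

# Files that go into the deployment bundle
BUNDLE_FILES = [
    "application.py",
    "requirements.txt",
    "spam_detect_model.pkl",
    "vectorizer.pkl",
]


def build_bundle(project_dir, zip_name="version_1.zip"):
    """Zip the bundle files so they sit at the root of the archive.

    Elastic Beanstalk expects application.py at the top level of the zip,
    not inside a parent folder.
    """
    zip_path = os.path.join(project_dir, zip_name)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in BUNDLE_FILES:
            # arcname=name stores each file without any directory prefix
            zf.write(os.path.join(project_dir, name), arcname=name)
    return zip_path
```

For example, `build_bundle("MyFlask")` would write MyFlask/version_1.zip, ready to upload.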

### Testing the API in Postman

We paste the web application URL into Postman's request URL field. We then add the `message` key with a spam message, "hello you have won $9000 in the lottery! please call +1234574935?". The URL and the response from our spam detection application:


> http://spamdetectapi-env.eba-2pb2wgfz.us-east-2.elasticbeanstalk.com/spamdetect?message=/hello you have won $9000 in the lottery! please call +1234574935?
>> spam

>hello dear friend!
>>ham

![alt text](images/webapp_post_test.png)
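When the message is typed straight into a URL, characters such as spaces, `$`, `!`, `?` and `+` have special meaning in a query string. In particular, Flask (via Werkzeug) typically decodes a literal `+` as a space. The minimal standard-library sketch below builds a correctly encoded request URL for the test message. `BASE_URL` is the environment URL from above:

```python
from urllib.parse import parse_qs, urlencode

BASE_URL = "http://spamdetectapi-env.eba-2pb2wgfz.us-east-2.elasticbeanstalk.com/spamdetect"

message = "hello you have won $9000 in the lottery! please call +1234574935?"

# urlencode escapes reserved characters: spaces become "+", "+" becomes "%2B", "?" becomes "%3F"
query = urlencode({"message": message})
url = f"{BASE_URL}?{query}"
print(url)

# Decoding the query string gives back the original text, "+" included
print(parse_qs(query)["message"][0] == message)  # True
```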

## 7. Performing additional AWS Elastic Beanstalk actions:


#### Application versioning

Within the AWS Elastic Beanstalk console we can manage application versions. From the versions section we can switch versions, download source code, and see when each version was deployed.

#### Server logs

From the left-hand menu we can request and download server logs.

#### Server performance monitoring

The monitoring section displays metrics such as Healthy Host Count, Target Response Time, Sum Requests, CPU Utilization, Max Network In, and Max Network Out.

#### Terminating the Server

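After the 12-month AWS Free Tier period ends, Amazon begins charging for the running resources; usage beyond free-tier limits is charged earlier. The terminate option is in the environment's **Actions** drop-down. As when deleting a GitHub repository, you must type the environment name to confirm termination of the web app.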

## 8. Conclusion

**We have deployed an ML spam classification model into AWS cloud servers and explored AWS Elastic Beanstalk options.**

## 9. Next Steps:


1. Improve the ML model by tuning the logistic regression's hyperparameters (e.g. the regularization strength `C`).   
2. Build the model on different datasets.  
3. Change the ML method of classifying messages.  
4. Incorporate the model into a more sophisticated web application.