# <p style="text-align:center;"> AI for Arabic </p>
![](https://www.ia-challenge.tn/ChallengeAssets/images/heroImageAINC.svg)

# Workflow of Machine Learning project on Android

Solving a machine learning problem on your own machine is only half the job: unless other people can use the model, it remains useful to you alone. Once the model is served to users, you start receiving the feedback that shows what needs to be improved. Implementing a machine learning model in a Jupyter notebook is the easy part, and most of the time a data science practitioner who deploys a project does so in the form of a website. So how can we put the model into an Android app instead?

# Machine Learning (ML) Model on Android

When we deploy machine learning on a website, the basic workflow is to implement the model in any Python IDE, serialize it with the `pickle` module, and use a web framework such as Flask or Streamlit to deploy it as a web app. Here the complete implementation, from frontend to backend, is in Python.

**Deploying machine learning on Android changes this workflow slightly. We still start with a model and pickle it. Java is a popular choice for Android apps and is well supported in Android Studio, so our frontend will be written in Java. In the middle we implement a Flask API that wraps the machine learning model and returns its output as JSON, a language-independent format that virtually any programming language can parse. The Java Android app sends a request to the Flask API, parses the JSON response, and displays the result in the Android frontend.**
![](https://editor.analyticsvidhya.com/uploads/46212ML%20on%20Android%20Workflow%20diagram.png)
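
The "pickle it" step of this workflow can be sketched as a simple save and load round trip. This is a minimal sketch, not the notebook's own code: the helper names `save_model` and `load_model`, the file name `sentiment_pipeline.pkl`, and the dictionary standing in for the model are all assumptions made for illustration. In the notebook, the object to save would be the fitted `pipeline`, which `do_sa` would first have to return.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize any picklable object (e.g. a fitted Pipeline) to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load the object back; only unpickle files you created yourself."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for the fitted pipeline, used only to keep the sketch self-contained.
stand_in_model = {"classifier": "LinearSVC", "ngram_range": (1, 2)}

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "sentiment_pipeline.pkl"  # hypothetical file name
    save_model(stand_in_model, model_path)
    restored = load_model(model_path)
```

The Flask API would call `load_model` once at startup and reuse the loaded object for every request, rather than unpickling on each call.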

# Our workflow is:
* Build a model
* Build a Flask API
* Test the application using Postman
* Create the Android app
* Connect the API to the Android app
* Write the backend logic in Java

# <p style="text-align:center;"> Building a model </p>

In [3]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import lightgbm as lgb
import sklearn 
from sklearn import metrics
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import MultinomialNB, BernoulliNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
import random
import os
print('\n'.join(os.listdir("../input")))

# Any results you write to the current directory are saved as output.

In [4]:
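def read_tsv(data_file):
    """Read a `label<TAB>text` file and return (texts, labels)."""
    text_data = list()
    labels = list()
    # The context manager closes the file even if a line fails to parse.
    with open(data_file, encoding='utf-8') as infile:
        for line in infile:
            if not line.strip():
                continue
            # Split on the first tab only, so tabs inside a tweet are kept.
            label, text = line.rstrip('\n').split('\t', 1)
            text_data.append(text)
            labels.append(label)
    return text_data, labels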

In [5]:
train_pos=read_tsv('../input/train_Arabic_tweets_positive_20190413.tsv')
train_neg=read_tsv('../input/train_Arabic_tweets_negative_20190413.tsv')
test_pos=read_tsv('../input/test_Arabic_tweets_positive_20190413.tsv')
test_neg=read_tsv('../input/test_Arabic_tweets_negative_20190413.tsv')


In [6]:
train_d=pd.DataFrame({'positive':train_pos[0],'value':'1'})
train_d

In [7]:
len(train_pos[0]),len(train_neg[0]),len(test_pos[0]),len(test_neg[0])

# define functions

In [8]:
def load(pos_train_file, neg_train_file, pos_test_file, neg_test_file):
    pos_train_data, pos_train_labels = read_tsv(pos_train_file)
    neg_train_data, neg_train_labels = read_tsv(neg_train_file)

    pos_test_data, pos_test_labels = read_tsv(pos_test_file)
    neg_test_data, neg_test_labels = read_tsv(neg_test_file)
    print('------------------------------------')

    sample_size = 5
    print('{} random train tweets (positive) .... '.format(sample_size))
    print(np.array(random.sample(pos_train_data, sample_size)))
    print('------------------------------------')
    print('{} random train tweets (negative) .... '.format(sample_size))
    print(np.array(random.sample(neg_train_data, sample_size)))
    print('------------------------------------')

    x_train = pos_train_data + neg_train_data
    y_train = pos_train_labels + neg_train_labels

    x_test = pos_test_data + neg_test_data
    y_test = pos_test_labels + neg_test_labels

    print('train data size:{}\ttest data size:{}'.format(len(y_train), len(y_test)))
    print('train data: # of pos:{}\t# of neg:{}\t'.format(y_train.count('pos'), y_train.count('neg')))
    print('test data: # of pos:{}\t# of neg:{}\t'.format(y_test.count('pos'), y_test.count('neg')))
    print('------------------------------------')
    return x_train, y_train, x_test, y_test


In [16]:
###############################################################

def do_sa(n, my_classifier, name, my_data):
    x_train, y_train, x_test, y_test = my_data
    print('parameters')
    print('n grams:', n)
    print('classifier:', my_classifier.__class__.__name__)
    print('------------------------------------')

    pipeline = Pipeline([
        ('vect', TfidfVectorizer(min_df=0.001, max_df=0.95,
                                 #min_df=0.0001, max_df=0.95,
                                 analyzer='word', lowercase=False,
                                 ngram_range=(1, n))),
        ('clf', my_classifier),
    ])

    pipeline.fit(x_train, y_train)
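    # get_feature_names() was removed in scikit-learn 1.2; on versions before 1.0 use it instead.
    feature_names = list(pipeline.named_steps['vect'].get_feature_names_out())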

    y_predicted = pipeline.predict(x_test)

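    # Print the classification report; labels= fixes the row order so that it
    # matches target_names (the default order is sorted, i.e. 'neg' then 'pos')
    print(metrics.classification_report(y_test, y_predicted,
                                        labels=['pos', 'neg'],
                                        target_names=['pos', 'neg']))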

    # Print the confusion matrix
    cm = metrics.confusion_matrix(y_test, y_predicted)
    print(cm)
    print('# of features:', len(feature_names))
    # Cap the sample size so small vocabularies do not raise ValueError
    print('sample of features:', random.sample(feature_names, min(40, len(feature_names))))
    accuracy = accuracy_score(y_test, y_predicted)
    precision = precision_score(y_test, y_predicted, average='weighted')
    recall =  recall_score(y_test, y_predicted, average='weighted')
    return name, n, accuracy, precision, recall


# Setup experiments 

In [17]:
ngrams = (1, 2, 3)
results = []
pos_training = '../input/train_Arabic_tweets_positive_20190413.tsv'
neg_training = '../input/train_Arabic_tweets_negative_20190413.tsv'

pos_testing = '../input/test_Arabic_tweets_positive_20190413.tsv'
neg_testing = '../input/test_Arabic_tweets_negative_20190413.tsv'

classifiers = [LinearSVC(), SVC(), MultinomialNB(),XGBClassifier(),lgb.LGBMClassifier(),
               BernoulliNB(), SGDClassifier(), DecisionTreeClassifier(max_depth=5),
               RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1),
               KNeighborsClassifier(3)
               ]
for g in ngrams:
    dataset = load(pos_training, neg_training, pos_testing, neg_testing)
    for alg in classifiers:
        alg_name = alg.__class__.__name__
        r = do_sa(g, alg, alg_name, dataset)
        results.append(r)
        

# Results Summary

In [18]:
print('{0:25}{1:10}{2:10}{3:10}{4:10}'.format('algorithm', 'ngram', 'accuracy', 'precision', 'recall'))
print('---------------------------------------------------------------------')
for r in results:
    print('{0:25}{1:10}{2:10.3f}{3:10.3f}{4:10.3f}'.format(r[0], r[1], r[2], r[3], r[4]))

# Deep Learning Approach
Given that the Random Forest and LinearSVC classifiers weren't generalizing well to other datasets (possibly because of overfitting), I decided to try a deep learning (DL) approach using a pretrained model, which in effect enlarges the data the model has seen and is one way of overcoming overfitting. For that I chose the Arabic-BERT model by Ali Safaya.
The models were pretrained on ~8.2 billion words from:

* The Arabic version of OSCAR (unshuffled version of the corpus), filtered from Common Crawl
* A recent dump of Arabic Wikipedia

In [26]:
#import torch
#
#if torch.cuda.is_available():
#    device = torch.device("cuda")
#    print(f'There are {torch.cuda.device_count()} GPU(s) available.')
#    print('Device name:', torch.cuda.get_device_name(0))
#
#else:
#    print('No GPU available, using the CPU instead.')
#    device = torch.device("cpu")

In [27]:
#from sklearn.feature_extraction.text import CountVectorizer
#from sklearn.linear_model import LogisticRegression
#from sklearn.pipeline import #

# <p style="text-align:center;"> Building a Flask API </p>
![](https://miro.medium.com/max/1103/1*pu5-oy7xcIJafXim7RR_9w.png)

**The user enters information in a form, and submitting the form sends a POST request to the Flask API. The API accepts the data entered by the user and passes it to the machine learning model, which predicts the output class. The predicted class is then returned to the Android app in the form of JSON.**
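
The request flow described above could look roughly like the following sketch. It is not the author's API: the route `/predict`, the form field `text`, the response key `sentiment`, and the `KeywordModel` stand-in are all assumptions made for illustration. Any object with a scikit-learn style `predict` method, such as the unpickled pipeline, could be passed to `create_app` instead. Flask is imported inside `create_app` so that the prediction helper can be exercised without starting a server.

```python
class KeywordModel:
    """Hypothetical stand-in with the same predict() shape as the pipeline."""

    def predict(self, texts):
        # Toy rule for demonstration only, not a real sentiment model.
        return ["pos" if "جميل" in t else "neg" for t in texts]


def predict_payload(model, form):
    """Turn submitted form data into a (JSON-ready dict, HTTP status) pair."""
    text = (form.get("text") or "").strip()
    if not text:
        return {"error": "field 'text' is required"}, 400
    label = model.predict([text])[0]
    return {"sentiment": str(label)}, 200


def create_app(model):
    """Build the Flask app around an already loaded model."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])  # assumed route name
    def predict():
        body, status = predict_payload(model, request.form)
        return jsonify(body), status

    return app


# To serve locally (assumes Flask is installed):
# create_app(KeywordModel()).run(debug=True)
```

Keeping the prediction logic in `predict_payload` separates validation and model calls from the web layer, which makes the behavior easy to check before any HTTP client is involved.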



# <p style="text-align:center;"> Test Application using Postman</p>
![](https://editor.analyticsvidhya.com/uploads/11389POSTman_results.png)
Postman is an interactive tool used to verify the APIs of your project. It began as a Google Chrome app and is now a standalone application that sends HTTP requests to an API. It lets you check that your API works as required: by providing the URL of your running Flask API and entering data in the key and value fields, you can call the API and get the desired response.
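
The same check can be scripted. The sketch below mirrors what Postman does with the key and value fields: it sends them as a form-encoded POST and parses the JSON reply. The URL, the `/predict` route, the `text` field, and the helper names are assumptions for illustration; adjust them to whatever your Flask API actually exposes.

```python
import json
from urllib import parse, request

API_URL = "http://127.0.0.1:5000/predict"  # assumed local Flask address and route


def build_predict_request(text, url=API_URL):
    """Build a form-encoded POST, like Postman's key/value body."""
    data = parse.urlencode({"text": text}).encode("utf-8")
    return request.Request(url, data=data, method="POST")


def parse_prediction(raw_body):
    """Decode the JSON bytes returned by the API into a dict."""
    return json.loads(raw_body.decode("utf-8"))


def call_api(text, url=API_URL):
    """Send the request; requires the Flask API to be running."""
    with request.urlopen(build_predict_request(text, url), timeout=10) as resp:
        return parse_prediction(resp.read())
```

With the API running, `call_api("يوم جميل")` would return the parsed JSON dictionary. The Android app later performs the same two steps: send the form values, then parse the JSON response.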

# <p style="text-align:center;">Create Android App</p>
![](https://editor.analyticsvidhya.com/uploads/62228Android_new_project.png)

# Create Android UI
**In Android, the UI is typically defined in an XML layout file. Open the XML file named `activity_main.xml` and build the complete frontend UI there. The layout contains the title of the project, three input fields for the respective columns, and one button to submit and get results.**

# Connectivity of API to Android APP

**Now you have to write the backend logic in Java. The app takes the inputs from the user, sends them to the API, and displays the API response back in the Android app. There is one problem: the API we have implemented runs locally on your machine, where the Android app cannot reach it. So we need to deploy our API online, and we will use Heroku for this task.**

# Deploy API to Heroku

**Log in to Heroku and create a new app by giving it a unique name. Deploy your API using the Heroku CLI or GitHub.**
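
One detail matters once the API leaves your machine: Heroku passes the port to listen on through the `PORT` environment variable, so the server should not hard-code it. A minimal sketch, in which the helper name `server_address` and the local fallback port 5000 are assumptions:

```python
import os


def server_address(environ=os.environ):
    """Return the (host, port) the API should bind to."""
    host = "0.0.0.0"  # listen on all interfaces, not only localhost
    port = int(environ.get("PORT", "5000"))  # fall back to 5000 locally
    return host, port


host, port = server_address()
# With a Flask app object named `app` (assumed), you would then call:
# app.run(host=host, port=port)
```

In practice a Heroku deployment usually starts the app through a production WSGI server declared in a `Procfile`, but the same `PORT` rule applies.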

# Connect API to Internet

**To call the API from the app we need a library named Volley. To install it, go to Gradle Scripts in your project directory, open the module-level `build.gradle` file, and add the one-line Volley dependency (`com.android.volley:volley`) shown below to the `dependencies` block. When you click Sync Now at the top right, Gradle starts installing the required libraries in the project.**
![](https://editor.analyticsvidhya.com/uploads/96786android_install_library.png)

# Write backend logic in java
**We have designed our UI; now we need to write the backend logic in a Java file that accepts data from the frontend. The flow is as follows: we read all three values from the Android app, and when the user clicks the predict button we call the API and read the JSON response. Keeping all the imports as they are, you can follow the code below, starting from the class in the `MainActivity.java` file.**