Text Threader is a web application built with Django 2 and Angular 7 that detects the language and sentiment of a given text. It mainly distinguishes Arabic from Tunisian dialect, classifies sentiment as positive or negative, and supports analysing multiple text documents.
- Detects the language of a text written in any character encoding (Arabic / Tunisian / Other)
- Analyses the sentiment of a text written in any character encoding (Negative / Positive / Other)
- Supports streaming multiple files of texts to classify and analyse
To build and run the application, you need:
- Backend: Python 3 and pip
- Frontend: Node.js and npm
This step is optional if you only want to use the application, since it already ships with the required models. If you want to tweak the classification models, install Jupyter Notebook and open the following notebooks:
Language identification
These steps give an overview of the language identification pipeline in the Lang-classifier.ipynb Jupyter notebook:
- Text cleaning
- Construct the training and test dataframes from our labeled data
- Convert the training documents into numeric feature vectors using the bag-of-words (BOW) TF-IDF method with character n-grams
- Create a language classifier using the Naive Bayes method (TF-IDF version)
- Evaluate the classifier's performance on the test corpus: classification accuracy, precision, recall, F1 score, and confusion matrix
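For readers who want to see the moving parts without opening the notebook, here is a minimal, standard-library-only sketch of a pipeline like the one above: character n-grams weighted by TF-IDF feed a multinomial Naive Bayes classifier, and an evaluation helper computes accuracy, per-class precision, recall and F1, and a confusion matrix. This is not the code in Lang-classifier.ipynb, whose implementation may differ (for example by relying on scikit-learn). The names char_ngrams, TfidfNaiveBayes and evaluate, the n-gram range of 1 to 3, and the four training sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Character n-grams of the lowercased text, padded with one space per side."""
    padded = f" {text.lower()} "
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


class TfidfNaiveBayes:
    """Multinomial Naive Bayes trained on TF-IDF weighted character n-grams."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing

    def _tfidf(self, text):
        # Raw n-gram counts weighted by IDF; n-grams unseen in training are ignored
        counts = Counter(char_ngrams(text))
        return {g: tf * self.idf[g] for g, tf in counts.items() if g in self.idf}

    def fit(self, texts, labels):
        n_docs = len(texts)
        df = Counter(g for t in texts for g in set(char_ngrams(t)))
        # Smoothed IDF, the same formula as scikit-learn's TfidfVectorizer default
        self.idf = {g: math.log((1 + n_docs) / (1 + d)) + 1 for g, d in df.items()}
        self.classes = sorted(set(labels))
        prior_counts = Counter(labels)
        self.log_prior = {c: math.log(prior_counts[c] / n_docs) for c in self.classes}
        # Sum the TF-IDF weight of every n-gram per class
        weights = {c: defaultdict(float) for c in self.classes}
        for text, label in zip(texts, labels):
            for g, w in self._tfidf(text).items():
                weights[label][g] += w
        self.log_likelihood = {}
        for c in self.classes:
            total = sum(weights[c].values()) + self.alpha * len(self.idf)
            self.log_likelihood[c] = {
                g: math.log((weights[c].get(g, 0.0) + self.alpha) / total)
                for g in self.idf}
        return self

    def predict(self, text):
        features = self._tfidf(text)
        scores = {c: self.log_prior[c] + sum(w * self.log_likelihood[c][g]
                                             for g, w in features.items())
                  for c in self.classes}
        return max(scores, key=scores.get)


def evaluate(y_true, y_pred):
    """Accuracy, per-class (precision, recall, F1) and a confusion matrix."""
    labels = sorted(set(y_true) | set(y_pred))
    matrix = {t: {p: 0 for p in labels} for t in labels}  # matrix[true][predicted]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    accuracy = sum(matrix[c][c] for c in labels) / len(y_true)
    report = {}
    for c in labels:
        tp = matrix[c][c]
        predicted = sum(matrix[t][c] for t in labels)
        actual = sum(matrix[c].values())
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = (precision, recall, f1)
    return accuracy, report, matrix


# Toy, hypothetical training data: ARA in Arabic script, TUN in Arabizi
texts = ["مرحبا بكم في الموقع", "هذا نص عربي فصيح",
         "chnowa a7walek lyoum", "barcha behi el film"]
labels = ["ARA", "ARA", "TUN", "TUN"]
model = TfidfNaiveBayes().fit(texts, labels)
print(model.predict("ya3tik sa7a barcha"))  # -> TUN
```

Feeding fractional TF-IDF weights into multinomial Naive Bayes is a common practical heuristic rather than the textbook count-based model. The toy sentences use different scripts only so the example is trivially separable; real Arabic versus Tunisian data is much harder.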
Sentiment analysis
These steps give an overview of the sentiment analysis pipeline in the Sentiment-analysis.ipynb Jupyter notebook:
- Text cleaning
- Normalization and tokenization
- Stop word removal
- Stemming
- Extract the vocabulary from the corpus and calculate the IDF value of each word in it
- Tune the BOW configuration parameters (min_df, max_df, etc.)
- Build the classification model: we tested Naive Bayes and Logistic Regression and chose the Naive Bayes classifier because it gave better accuracy and a better confusion matrix.
After changing the pipelines, just run them to dump all the models that the backend uses to predict and analyse the texts.
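A minimal sketch of the dump step, assuming the models are persisted with the standard-library pickle module (the notebooks may use joblib or another format instead). The dump_models and load_models helpers and the lang_model / sentiment_model names are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def dump_models(models, directory):
    """Pickle each fitted model to <directory>/<name>.pkl."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    for name, model in models.items():
        with open(directory / f"{name}.pkl", "wb") as fh:
            pickle.dump(model, fh)


def load_models(directory):
    """Load every <name>.pkl in the directory into a {name: model} dict.

    Only unpickle files you trust: unpickling can execute arbitrary code.
    """
    models = {}
    for path in sorted(Path(directory).glob("*.pkl")):
        with open(path, "rb") as fh:
            models[path.stem] = pickle.load(fh)
    return models


# Round trip with stand-ins for fitted classifiers (hypothetical names)
with tempfile.TemporaryDirectory() as tmp:
    dump_models({"lang_model": {"w": 1.5}, "sentiment_model": ["NEG", "POS"]}, tmp)
    reloaded = load_models(tmp)
print(reloaded)
```

Loading the models once when the backend starts, rather than on every request, avoids re-reading the files for each text.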
For the backend, we recommend setting up and activating a virtual environment, although this is optional: Python 3 Virtual Environment Tutorial
First, cd into the Text-Threader-Backend directory.
Install project dependencies:
$ pip install -r requirements.txt
Then simply apply the migrations:
$ python manage.py migrate
You can now run the development server:
$ python manage.py runserver
First, cd into the Text-Threader-Frontend directory.
The frontend was generated with Angular CLI version 7.1.4.
To get it up and running, first install the dependencies using npm:
$ npm install
Then serve it using:
$ npm start
Now you should be able to access Text Threader at http://localhost:4200/
On the Home page, the user has two options: upload files to be classified and analysed by clicking the upload button, or switch to the second tab and enter the text manually to be classified and analysed.
To classify a list of texts, the user writes them into one or more .txt files and clicks the upload button to open the file upload pop-up. The user can then select any number of text files from the local file system and click Choose.
After choosing the files, the user clicks the Analyse button to start the language and sentiment analysis of every text in the documents. A visual indicator shows the progress of each file.
Once classification and analysis of a document are finished, a CSV result file is downloaded automatically.
The user can then view the results for each input file in the downloaded file, whose name is the input file name followed by "Result". It has two columns holding the language and sentiment predictions for the corresponding text row, for example:
TUN,NEG
ARA,POS
ARA,NEG
...
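A sketch of how such a result file could be produced, assuming one text per line in each uploaded .txt file and a result file named <input stem>Result.csv. The write_results function and the stand-in predictors are hypothetical; they only illustrate the two-column format above, not the backend's actual code.

```python
import csv
import tempfile
from pathlib import Path


def write_results(input_path, predict_language, predict_sentiment):
    """Write one LANGUAGE,SENTIMENT row per non-empty line of a .txt file.

    Assumptions: one text per line, and the result is saved next to the input
    as <input stem>Result.csv.
    """
    input_path = Path(input_path)
    lines = input_path.read_text(encoding="utf-8").splitlines()
    texts = [line.strip() for line in lines if line.strip()]
    output_path = input_path.with_name(f"{input_path.stem}Result.csv")
    with open(output_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        for text in texts:
            writer.writerow([predict_language(text), predict_sentiment(text)])
    return output_path


# Usage with stand-in predictors (hypothetical; the real ones are the dumped models)
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "texts.txt"
    sample.write_text("barcha behi\nهذا سيء\n", encoding="utf-8")
    result = write_results(
        sample,
        predict_language=lambda t: "ARA" if "هذا" in t else "TUN",
        predict_sentiment=lambda t: "NEG" if "سيء" in t else "POS",
    )
    result_name = result.name
    with open(result, newline="", encoding="utf-8") as fh:
        rows = list(csv.reader(fh))
print(result_name, rows)  # textsResult.csv [['TUN', 'POS'], ['ARA', 'NEG']]
```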