Text Threader

Text Threader is a web application built with Django 2 and Angular 7 that detects the language and sentiment of a given text. It distinguishes Arabic from Tunisian dialect, classifies sentiment as Positive or Negative, and supports processing multiple text documents at once.

Features

- Detects the language of a text written in any character encoding (Arabic / Tunisian / Other)
- Analyses the sentiment of a text written in any character encoding (Negative / Positive / Other)
- Supports streaming multiple text files to classify and analyse

Getting Started

Pre-requisites

For building and running the application you need:

Backend:
- Python 3.6
- Django 2.1

Frontend:
- Node.js
- npm (comes with Node.js)

Installation

Classification Model Setup

This step is optional if you only want to use the application, since it already ships with the required models. If you want to tweak the classification models, install Jupyter Notebook and open the following notebooks:

Language identification

These steps give an overview of the language identification pipeline in the Lang-classifier.ipynb Jupyter notebook:
- Text Cleaning
- Construct the training and test dataframes using our labeled data
- Convert the training documents into numeric feature vectors using the BOW-tfidf method with character ngrams
- Create a language classifier using Naive Bayes method (tfidf version)
- Evaluate performance of this classifier based on the test corpus: calculate classification accuracy, precision, recall, F1, and confusion matrix
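
The steps above can be sketched as follows, assuming scikit-learn (the README does not name the library the notebook uses). The toy sentences, the label codes (`OTHER` in particular), and the n-gram range are illustrative assumptions, not the project's data or settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up miniature corpus: two Arabic, two Tunisian, two "other" sentences.
# The real notebook trains on the project's labeled data.
train_texts = [
    "هذا الكتاب جيد جدا",
    "أحب هذا المكان كثيرا",
    "الفيلم هذا باهي برشا",
    "نحب البلاصة هذي برشا",
    "this movie is very good",
    "i love this place a lot",
]
train_labels = ["ARA", "ARA", "TUN", "TUN", "OTHER", "OTHER"]

# analyzer="char" builds TF-IDF features from character n-grams, not words.
lang_clf = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),
    MultinomialNB(),
)
lang_clf.fit(train_texts, train_labels)

# Tiny held-out set, only to show how the evaluation calls fit together.
test_texts = ["this place is good", "الكتاب جيد"]
test_labels = ["OTHER", "ARA"]
predicted = lang_clf.predict(test_texts)

labels = ["ARA", "TUN", "OTHER"]
print("accuracy:", accuracy_score(test_labels, predicted))
print(confusion_matrix(test_labels, predicted, labels=labels))
```

Precision, recall, and F1 can be obtained in the same way with `sklearn.metrics.classification_report`. With a corpus this small the numbers are meaningless; the sketch only shows the shape of the pipeline.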

Sentiment analysis

These steps give an overview of the sentiment analysis pipeline in the Sentiment-analysis.ipynb Jupyter notebook:
- Text Cleaning
- Normalization & tokenization
- Remove stop words
- Stemming
- Extract the vocabulary set from the corpus and calculate IDF values of each word in this set
- Tune the BOW configuration parameters (min_df, max_df, etc.)
- Build the classification model: Naive Bayes and Logistic Regression were tested, and the NB classifier was chosen because it gave the better accuracy and confusion matrix.
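
As a rough illustration of the first steps in this list (cleaning, normalization, tokenization, stop-word removal, and IDF calculation), here is a minimal standard-library sketch. The stop-word list, the normalization rules, and the plain `log(N / df)` formula are assumptions made for the example; stemming is left out because it needs a language-specific stemmer, and the notebook's actual choices may differ.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one is much longer.
STOP_WORDS = {"في", "من", "على", "هذا", "the", "is", "a"}

DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")  # Arabic short vowels + tatweel
NON_LETTERS = re.compile(r"[^\w\s]|\d|_")          # punctuation, digits, underscores


def normalize(text):
    """Clean and normalize one document."""
    text = DIACRITICS.sub("", text)
    text = re.sub("[أإآ]", "ا", text)  # unify alef variants
    text = NON_LETTERS.sub(" ", text)
    return text.lower()


def tokenize(text):
    """Normalize, split on whitespace, and drop stop words."""
    return [tok for tok in normalize(text).split() if tok not in STOP_WORDS]


def idf_table(documents):
    """Map each vocabulary word to log(N / document frequency)."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))  # count each word once per document
    n_docs = len(documents)
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


corpus = ["The food is great!", "The service is bad.", "Great service, great food"]
idf = idf_table(corpus)
for word, value in sorted(idf.items()):
    print(word, round(value, 3))
```

The parameter names min_df and max_df match those of scikit-learn's text vectorizers, which compute a smoothed IDF by default, so values from the notebook would not match this unsmoothed formula exactly.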

After making changes to the pipelines, it is just a matter of running the notebooks to dump the models that the backend uses to classify and analyse texts.
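
Dumping and reloading a model can look like the sketch below. It assumes Python's `pickle` module; the README does not say which serializer the notebooks use (joblib is a common alternative for scikit-learn models), and the file name is hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def dump_model(model, path):
    """Serialize a fitted model so another process can load it later."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Load a previously dumped model (only unpickle files you trust)."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Stand-in for a fitted classifier; any picklable object works the same way.
model = {"name": "language-classifier", "labels": ["ARA", "TUN", "OTHER"]}

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name inside a models-dump folder.
    model_path = Path(tmp) / "models-dump" / "lang_classifier.pkl"
    dump_model(model, model_path)
    restored = load_model(model_path)

print(restored == model)
```

A pickled model should be loaded with the same library versions that produced it, which is one reason to rerun the notebooks after upgrading dependencies.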

Backend

We recommend setting up and activating a virtual environment for this step, although it is optional (see the Python 3 Virtual Environment Tutorial).

First cd into the Text-Threader_Backend directory.

Install project dependencies:

$ pip install -r requirements.txt

Then simply apply the migrations:

$ python manage.py migrate

You can now run the development server:

$ python manage.py runserver

Frontend

First cd into the Text-Threader_Frontend directory.

The frontend for Text Threader was generated with Angular CLI version 7.1.4.

To get it up and running, you need to first install the dependencies using npm:

$ npm install

Then serve it using:

$ npm start

Now you should be able to access Text Threader at http://localhost:4200/

Usage

On the Home page the user has two options: upload files to be classified and analysed by clicking the upload button, or navigate to the second tab and enter the text manually.

Classify Documents

To classify a list of texts, the user can write them into one or more txt files and click the upload button to open the file upload dialog.
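
If the texts already live in a script or dataset, an input file can be generated instead of typed by hand. The sketch below assumes one text per line (inferred from the result file having one row per text) and UTF-8 encoding; the file name is hypothetical.

```python
import tempfile
from pathlib import Path


def write_texts(texts, path):
    """Write one text per line, flattening internal line breaks."""
    lines = [" ".join(text.split()) for text in texts if text.strip()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


# The blank third entry is skipped; the embedded newline is flattened.
texts = ["الخدمة ممتازة", "this phone\nis terrible", "   "]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "reviews.txt"  # hypothetical file name
    written = write_texts(texts, target)
    content = target.read_text(encoding="utf-8")

print(written)
```

Flattening line breaks keeps each text on its own row, so the rows of the result file stay aligned with the input.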

After that, the user can choose any number of text files from their local file system and click choose.

After choosing the appropriate files, the user can click the Analyse button to start the language and sentiment analysis for every text in the documents. A progress indicator shows the progress for each file.

Once the classification and analysis of a document are finished, a csv result file is downloaded automatically.

The user can then review the results for each input file in the downloaded file, which is named input_file_name + Result. It has two columns, the language prediction and the sentiment prediction for the corresponding text row:
```
TUN,NEG
ARA,POS
ARA,NEG
...
```
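
To post-process a downloaded result file, the two columns can be read with Python's `csv` module. This is a minimal sketch that assumes the file has no header row and exactly the two columns shown above; the sample data is the excerpt above, held in memory.

```python
import csv
import io
from collections import Counter


def summarize_results(csv_file):
    """Count language and sentiment labels in a two-column result file."""
    languages, sentiments = Counter(), Counter()
    for row in csv.reader(csv_file):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        language, sentiment = (cell.strip() for cell in row)
        languages[language] += 1
        sentiments[sentiment] += 1
    return languages, sentiments


# In-memory stand-in for a downloaded result file.
sample = io.StringIO("TUN,NEG\nARA,POS\nARA,NEG\n")
languages, sentiments = summarize_results(sample)
print(languages)
print(sentiments)
```

For a real file, pass a handle opened with `open(path, newline="", encoding="utf-8")` instead of the `StringIO` object.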

Classify simple text

If the user just wants to analyse a single text, they can navigate to the second tab and enter the text.

Shortly after, the language and sentiment predictions appear on the screen.
