Text Threader is a web application made with Django and Angular 7 to detect the language and sentiment of a given text. It mainly detects Arabic or Tunisian dialect and a Positive or a Negative sentiment and supports testing multiple text documents.

MontaLabidi/Text-Threader

Text Threader

Text Threader is a web application made with Django 2 and Angular 7 to detect the language and sentiment of a given text. It mainly detects Arabic or Tunisian dialect and a Positive or a Negative sentiment and supports testing multiple text documents.

Features

  • Detects the language of a text written in any character encoding (Arabic / Tunisian / Other)

  • Analyses the sentiment of a text written in any character encoding (Negative / Positive / Other)

  • Supports streaming multiple files with texts to classify and analyse

Getting Started

Pre-requisites

For building and running the application you need:

Installation

Classification Model Setup

This step is optional if you just want to use the application, since it already ships with the needed models. If you want to tweak the classification models, install Jupyter Notebook and open the following notebooks:

  • Language identification

    These steps give an overview of the language identification pipeline in the Lang-classifier.ipynb Jupyter notebook:

    • Text Cleaning
    • Construct the training and test dataframes from our labeled data
    • Convert the training documents into numeric feature vectors using the bag-of-words (BOW) TF-IDF method with character n-grams
    • Create a language classifier using the Naive Bayes method (TF-IDF version)
    • Evaluate performance of this classifier based on the test corpus: calculate classification accuracy, precision, recall, F1, and confusion matrix
  • Sentiment analysis

    These steps give an overview of the sentiment analysis pipeline in the Sentiment-analysis.ipynb Jupyter notebook:

    • Text Cleaning
    • Normalization & tokenization
    • Remove stop words
    • Stemming
    • Extract the vocabulary set from the corpus and calculate IDF values of each word in this set
    • Tune the BOW configuration parameters (min_df, max_df, etc.)
    • Build the classification model: we tested Naive Bayes and Logistic Regression and chose the NB classifier because it gave the better accuracy and confusion matrix.
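
The language identification steps listed above (character n-gram features fed to a Naive Bayes classifier) can be sketched with the standard library alone. This is a minimal illustration, not the notebook's code: it uses raw n-gram counts with Laplace smoothing instead of TF-IDF weighting, and the class name, the toy training sentences, and the OTH label are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of length n_min..n_max."""
    text = f" {text.strip().lower()} "  # pad so word boundaries become features
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-gram counts (Laplace smoothing)."""

    def fit(self, texts, labels):
        self.ngram_counts = defaultdict(Counter)  # label -> n-gram -> count
        self.doc_counts = Counter(labels)         # label -> number of documents
        for text, label in zip(texts, labels):
            self.ngram_counts[label].update(char_ngrams(text))
        self.vocab = {g for c in self.ngram_counts.values() for g in c}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.ngram_counts.items():
            total = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)  # log prior
            for gram in char_ngrams(text):
                if gram in self.vocab:  # ignore n-grams never seen in training
                    score += math.log((counts[gram] + 1) / total)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical toy training set; the real notebook uses the project's labeled data.
train = [
    ("كيف حالك اليوم", "ARA"),
    ("هذا كتاب جميل جدا", "ARA"),
    ("chnowa a7walek lyoum", "TUN"),
    ("barcha 3aslema", "TUN"),
    ("how are you today", "OTH"),
    ("this is a very nice book", "OTH"),
]
model = NgramNaiveBayes().fit([t for t, _ in train], [l for _, l in train])
print(model.predict("كتاب جميل"))
```

Character n-grams work on any script without tokenization, which is presumably why they suit a task that mixes Arabic script with Latin-script Tunisian dialect. A production pipeline would add the TF-IDF weighting and the held-out evaluation described in the steps above.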

After making changes to the pipelines, it's just a matter of running them to dump all the models that the backend will use to classify and analyse the texts.
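
The README does not say which serialization format the notebooks use to dump the models. As a rough sketch of the dump-then-load round trip, assuming Python's standard pickle module, with a hypothetical file name and a plain dictionary standing in for a trained classifier:

```python
import pickle
import tempfile
from pathlib import Path


def dump_model(model, path):
    """Serialize a trained model object to disk."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Load a previously dumped model (only unpickle files you trust)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Stand-in for a trained classifier; any picklable object works the same way.
toy_model = {"labels": ["ARA", "TUN"], "ngram_range": (1, 3)}

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "lang_model.pkl"  # hypothetical file name
    dump_model(toy_model, model_path)
    restored = load_model(model_path)

print(restored == toy_model)
```

In this arrangement the notebook side would call the dump step once after training, and the backend would load the file at startup rather than retraining on every request.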

Backend

For this step, we recommend (optionally) setting up and activating a virtual environment: Python 3 Virtual Environment Tutorial

First cd into the Text-Threader-Backend directory.

Install project dependencies:

$ pip install -r requirements.txt

Then simply apply the migrations:

$ python manage.py migrate

You can now run the development server:

$ python manage.py runserver

Frontend

First cd into the Text-Threader-Frontend directory.

The frontend for Text Threader was generated with Angular CLI version 7.1.4.

To get it up and running, you need to first install the dependencies using npm:

$ npm install

Then simply serve it using:

$ npm start

Now you should be able to access Text Threader at http://localhost:4200/

Usage

  • On the Home page, the user has two options: upload files to be classified and analysed by clicking the upload button:

    main_page

    Or navigate to the second tab, where they can enter text manually to be classified and analysed:

    main_page_second_tab

Classify Documents

  • To classify a list of texts, the user can write them into one or more txt files and click the upload button to open the upload files pop-up:

    upload_files

    The user can then choose any number of text files from their local file system and click choose:

    uploaded_files

    After choosing the appropriate files, the user can click the Analyse button to start the language and sentiment analysis for every text in the documents. A progress indicator is shown for every file:

    analyse_files

    Once the classification and analysis of a document are finished, a CSV result file is downloaded automatically:

    analysed_files

    The user can then view the results for each input file in the downloaded file, whose name is input_file_name + Result. It has two columns, the language and sentiment predictions for the corresponding text row:

    TUN,NEG
    ARA,POS
    ARA,NEG
    ...
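
A downloaded result file in this two-column layout can be summarized with a few lines of Python. This is a sketch under assumptions: it takes the file to have no header row, as in the sample rows above, and the function name is made up for the example.

```python
import csv
import io
from collections import Counter


def summarize_results(csv_file):
    """Count language and sentiment labels in a two-column result file."""
    languages, sentiments = Counter(), Counter()
    for row in csv.reader(csv_file):
        if len(row) != 2:  # skip blank or malformed rows
            continue
        language, sentiment = (cell.strip() for cell in row)
        languages[language] += 1
        sentiments[sentiment] += 1
    return languages, sentiments


# The three sample rows shown above, used here in place of a real download.
sample = io.StringIO("TUN,NEG\nARA,POS\nARA,NEG\n")
languages, sentiments = summarize_results(sample)
print(dict(languages))   # {'TUN': 1, 'ARA': 2}
print(dict(sentiments))  # {'NEG': 2, 'POS': 1}
```

To run it on a real download, pass an open file handle instead of the in-memory sample, for example `open(path, newline="")`.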
    

Classify simple text

  • To analyse a single text, the user can navigate to the second tab and enter the text:

    analyse_text

    Shortly afterwards, the language and sentiment predictions appear on screen:

    analysed_text
