Textimus Prime

This is an ML application that classifies any text dataset into labels. It has several text classifiers built in, compares them on your labelled data, and uses the one with the highest accuracy to predict labels for unlabelled data (the user's case).

Getting started

Clone the whole repository, keeping the HTML templates as they are, and run it with any Python IDE.

Prerequisites

Follow this guide to install the libraries required to run this application.

Windows:

Download the Microsoft Visual Studio Build Tools, which are required to install some of the libraries.

Next, install the libraries by opening cmd with administrator rights in the repository folder and running the following command:

pip install -r requirements.txt

Install the spaCy language model by running this command in administrator mode:

python -m spacy download en

To finish installing NLTK, run these commands one by one in administrator mode:

python

import nltk

nltk.download()

A window will pop up. Download and install all the files, then close the window.

Linux:

Run the following commands to set up the program on your Linux system:

sudo python3 -m pip install -r requirements.txt
python3 -m spacy download en

How to use?

Open a terminal in the local repository and run the fit_dataset.py program.
- For Windows:
```
python fit_dataset.py
```
- For Linux:
```
python3 fit_dataset.py
```
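If it is unclear whether the program started correctly, a quick reachability check can help before opening the browser. The following is a minimal sketch using only the Python standard library; the helper name `is_server_up` is hypothetical and not part of this repository, and it assumes the app listens on port 5000 as stated in this README.

```python
import urllib.error
import urllib.request


def is_server_up(url="http://localhost:5000/", timeout=2.0):
    """Return True if an HTTP server answers at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just with an error status (e.g. 404 or 500).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or name resolution failure.
        return False


if __name__ == "__main__":
    print("App reachable:", is_server_up())
```

A result of `False` usually means the program is not running yet or is listening on a different port.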
Once the program is running, go to http://localhost:5000/ in your preferred browser. You should see the following screen:

Sample Run

Let us consider a sample dataset and use it in our application. The dataset consists of consumer finance complaints in 4 predefined categories: "Credit reporting", "Debt collection", "Mortgage", and "Student loan". (Source: data.gov)

The input data to our application should contain a feature column and a label column. The sample dataset looks like the following:

|     | Consumer_complaint_narrative | Product |
| --- | --- | --- |
| 0 | This company refuses to provide me verification and validation of debt per ... | Debt collection |
| 1 | Started the refinance of home mortgage process with cash out option on XX/ ... | Mortgage |
| 2 | I was dropped from my income based repayment plan by FedLoan servicing for ... | Student loan |
| 3 | The first communication that I received from the debt collector was a court ... | Debt collection |
| ... | ... | ... |
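Before uploading a file, it can be useful to confirm that it really contains the feature and label columns you plan to enter. The check below is a small sketch using Python's `csv` module; the function name `check_columns` is hypothetical, and the in-memory sample only mimics the layout shown above (rows abridged).

```python
import csv
import io


def check_columns(csv_file, feature, label):
    """Return (ok, missing): whether the header has both columns, and which are absent."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [name for name in (feature, label) if name not in header]
    return (not missing, missing)


# Tiny in-memory stand-in for the training file (illustrative rows only).
sample = io.StringIO(
    "Consumer_complaint_narrative,Product\n"
    '"This company refuses to provide me verification ...",Debt collection\n'
    '"Started the refinance of home mortgage process ...",Mortgage\n'
)

ok, missing = check_columns(sample, "Consumer_complaint_narrative", "Product")
print(ok, missing)
```

For a file on disk, pass an open file object instead, for example `open("complaints.csv", newline="")`, where the filename is a placeholder.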

Now assuming the application is running, let us proceed:

Step 1: Upload a labelled dataset in .csv format to train the model. Enter the feature name and the label name (here, "Consumer_complaint_narrative" and "Product" respectively). Enter the name under which your trained model should be saved. Finally, click the Train button. A loading screen is shown until the models are trained.

Step 2: The accuracy table of the classifiers is displayed. The classifier with the best accuracy is chosen automatically and saved as a model under the name given in the previous step.
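The selection in Step 2 can be pictured with the sketch below. It is not the code in fit_dataset.py: the two toy classifiers, the tiny dataset, and the `select_best` helper are all assumptions made for illustration, written in plain Python so the example runs without extra libraries. It only shows the idea of fitting every candidate, building an accuracy table, and keeping the most accurate one.

```python
from collections import Counter, defaultdict


class MajorityClassifier:
    """Baseline: always predicts the most common training label."""

    def fit(self, texts, labels):
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, texts):
        return [self.label_ for _ in texts]


class KeywordClassifier:
    """Predicts the label whose training texts share the most words with the input."""

    def fit(self, texts, labels):
        self.counts_ = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.counts_[label].update(text.lower().split())
        return self

    def predict(self, texts):
        def score(label, words):
            return sum(self.counts_[label][w] for w in words)

        return [
            max(sorted(self.counts_), key=lambda lab: score(lab, t.lower().split()))
            for t in texts
        ]


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    predictions = model.predict(texts)
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def select_best(candidates, train, test):
    """Fit every candidate; return the accuracy table and the best candidate's name."""
    table = {}
    for name, model in candidates.items():
        model.fit(*train)
        table[name] = accuracy(model, *test)
    best = max(table, key=table.get)
    return table, best


# Illustrative data only, loosely modelled on the sample dataset.
train = (
    ["debt collector called me", "mortgage refinance delayed",
     "debt collector sued me", "mortgage payment lost"],
    ["Debt collection", "Mortgage", "Debt collection", "Mortgage"],
)
test = (
    ["collector keeps calling about debt", "refinance of my mortgage"],
    ["Debt collection", "Mortgage"],
)

candidates = {"majority": MajorityClassifier(), "keyword": KeywordClassifier()}
table, best = select_best(candidates, train, test)
print(table, best)
```

Accuracy here is measured on a held-out split, so a classifier is not rewarded for simply memorising its training rows.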

Step 3: Upload an unlabelled dataset in .csv format to predict its labels. Enter the name of the feature column in the unlabelled file whose labels are to be predicted. Next, enter the filename of the saved model. Then enter the name of the file in which the results should be saved. Click OK.
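Conceptually, Step 3 reads the feature column, asks the saved model for a label per row, and writes the rows back out with the predictions attached. The sketch below shows that flow with the standard `csv` module. The names `predict_csv` and `dummy_predict` and the output column `Predicted_label` are hypothetical, and `dummy_predict` is only a stand-in for a trained model; the application's actual output format may differ.

```python
import csv


def predict_csv(in_path, out_path, feature, predict, label="Predicted_label"):
    """Copy the rows of in_path to out_path, adding a column of predicted labels."""
    with open(in_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        if feature not in (reader.fieldnames or []):
            raise ValueError(f"column {feature!r} not found in {in_path}")
        rows = list(reader)
        fieldnames = list(reader.fieldnames) + [label]

    # One prediction per input row, in the same order.
    predictions = predict([row[feature] for row in rows])

    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        for row, predicted in zip(rows, predictions):
            writer.writerow({**row, label: predicted})
    return len(rows)


def dummy_predict(texts):
    # Stand-in for a trained model: flags texts that mention "mortgage".
    return ["Mortgage" if "mortgage" in t.lower() else "Debt collection" for t in texts]
```

A call such as `predict_csv("unlabelled.csv", "results.csv", "Consumer_complaint_narrative", dummy_predict)` would write one labelled row per input row; both filenames are placeholders.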

Step 4: The results are displayed in the following format:

(Note: If you have already trained a model on your dataset and don't want to train it again, you can jump directly to Step 3 by clicking "Go to Predict Page" in Step 1.)

Built With

- Flask - A web microframework for Python
- scikit-learn - Machine learning library in Python
- spaCy - Industrial-strength natural language processing
- NLTK - NLP tasks
- pandas - Data structures and data analysis tools for Python

Authors

Allu Praveen
Deepak Divya Tejaswi
Nimit Kanani
Nitish Kumar Naineni
