Sumeet2807/Text-classifier

Text-classifier

A modular, YAML-based utility to help you classify text.

How to install?

Clone this repo or download the zip, then install the Python packages listed in requirements.txt (for example, with pip install -r requirements.txt).

What can you use this utility for?

You may use this utility to train and save text classification models on a variety of text data.

How to use?

Two entry points cover the complete functionality of this utility.

1.) train.py

Use this Python script to train a text classification model with the appropriate preprocessing. All the information about the corpus sources, the preprocessing steps and the model is supplied to the script through a YAML file.

Example run command - python src/train.py <your_training_config_yml_file>

2.) predict.py

Use this Python script to run a pretrained model on a corpus. All the information about the corpus source and the saved model to use is supplied through a YAML file.

Example run command - python src/predict.py <your_prediction_config_yml_file>
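The two run commands above can also be driven from another Python script. The following is only a minimal sketch: it assumes the working directory is the repository root, and the helper names build_command and run_entrypoint are hypothetical and not part of this utility.

```python
import subprocess
import sys

# Entry points named in this README; anything else is rejected.
ENTRYPOINTS = ("train", "predict")


def build_command(entrypoint, config_path):
    """Build the argument list for one of the two entry-point scripts.

    Assumes the command is run from the repository root, so the scripts
    are reachable as src/train.py and src/predict.py.
    """
    if entrypoint not in ENTRYPOINTS:
        raise ValueError(f"unknown entry point: {entrypoint!r}")
    # sys.executable reuses the interpreter that runs this script.
    return [sys.executable, f"src/{entrypoint}.py", config_path]


def run_entrypoint(entrypoint, config_path):
    """Run the script and raise CalledProcessError if it exits non-zero."""
    return subprocess.run(build_command(entrypoint, config_path), check=True)


# Usage (not executed here; the config file names are placeholders):
#   run_entrypoint("train", "my_train_config.yml")
#   run_entrypoint("predict", "my_predict_config.yml")
```

Passing the arguments as a list avoids shell quoting problems when the config path contains spaces.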

Yaml file structure:-

This utility uses YAML files to supply the training and prediction configurations.

1.) Training Yaml -

The training YAML file consists of three mandatory sections: Data, Preprocessing and Model. Together they specify the data source to use, the preprocessing steps to follow and the type of model to train, along with its relevant hyperparameters.

Please refer to examples/config_train.yml for detailed information on yml parameters.

2.) Prediction Yaml -

The prediction YAML file consists of two mandatory sections: Data and Model. Together they specify the data source to use and the saved model to use for predictions.

Please refer to examples/config_predict.yml for detailed information on yml parameters.
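Before launching a run, it can help to check that a config contains its mandatory sections. The sketch below works on an already parsed config (a plain dict, such as the one PyYAML's yaml.safe_load returns). The section names are taken from the descriptions above, but their exact spelling and casing in the real files is an assumption, and the function missing_sections is hypothetical; the example files under examples/ remain the reference.

```python
# Mandatory top-level sections per config type, as described in this README.
# The exact key spelling/casing is an assumption; compare with the files in
# examples/ before relying on it.
REQUIRED_SECTIONS = {
    "train": ("Data", "Preprocessing", "Model"),
    "predict": ("Data", "Model"),
}


def missing_sections(config, mode):
    """Return the mandatory sections that are absent from a parsed config.

    `config` is the object produced by parsing the YAML file; an empty file
    parses to None, which is treated as "everything is missing".
    """
    required = REQUIRED_SECTIONS[mode]
    if not isinstance(config, dict):
        return list(required)
    return [name for name in required if name not in config]


# Usage with PyYAML (not executed here):
#   import yaml
#   with open("my_train_config.yml") as handle:
#       problems = missing_sections(yaml.safe_load(handle), "train")
#   if problems:
#       raise SystemExit(f"config is missing sections: {problems}")
```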

Supported models out of the box:-

Model Classes -

1.) Bag-of-words linear ensemble - bow.Linear_ensemble

2.) Bag-of-words linear ensemble sgd (for larger datasets) - bow.Linear_ensemble_sgd

3.) Bidirectional LSTM - rnn.BLSTM

4.) Bert with a dense top - bert.Dense_top
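The model classes above are written as dotted names of the form module.Class. How this utility resolves them internally is not documented here. The following is only a sketch of one common way such a name could be turned into a Python class, using the standard importlib module; the functions split_model_name and resolve_model_class, and the package argument, are hypothetical.

```python
import importlib


def split_model_name(dotted):
    """Split a name such as 'bow.Linear_ensemble' into (module, class)."""
    module_name, sep, class_name = dotted.rpartition(".")
    if not sep or not module_name or not class_name:
        raise ValueError(f"expected 'module.Class', got {dotted!r}")
    return module_name, class_name


def resolve_model_class(dotted, package=None):
    """Import the module part and return the attribute named by the class part.

    `package` is an optional parent package to prepend. Which package the
    model modules of this utility live in is an assumption the caller must
    supply; it is not stated in this README.
    """
    module_name, class_name = split_model_name(dotted)
    full_name = f"{package}.{module_name}" if package else module_name
    module = importlib.import_module(full_name)
    return getattr(module, class_name)


# The split itself needs no imports and works on the names listed above:
print(split_model_name("rnn.BLSTM"))
```

A lookup of this kind raises ImportError for an unknown module and AttributeError for an unknown class, which makes typos in a config surface early.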

How to integrate your own custom models and other components?

Coming soon...
