Text Classification with CNNs in PyTorch

The aim of this repository is to show a baseline model for text classification through convolutional neural networks in the PyTorch framework. The architecture implemented in this model was inspired by the one proposed in the paper: Convolutional Neural Networks for Sentence Classification.

If you want to understand the details about how this model was created, take a look at this very clear and detailed explanation: Text Classification with CNNs in PyTorch

1. The model

The model is composed of 4 convolutional layers, each producing 32 filters (output channels). The output of each convolutional layer is passed through a max pooling function, and the pooled outputs are then concatenated. Finally, the concatenation is passed through a fully connected layer. The accompanying image in the repository depicts this architecture.
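To make the description above more concrete, here is a minimal sketch of such an architecture in PyTorch. It is not the code from src/model/model.py: the class name TextCNNSketch, the kernel sizes (2, 3, 4, 5), the ReLU activation, the global max-over-time pooling and the single sigmoid output are all assumptions chosen for illustration. Only the number of convolutional layers (4), the number of filters (32) and the default values of num_words, embedding_size and seq_len come from this README.

```python
import torch
import torch.nn as nn


class TextCNNSketch(nn.Module):
    """Illustrative text CNN: 4 parallel conv layers -> max pool -> concat -> linear."""

    def __init__(self, num_words=2000, embedding_size=64, out_size=32,
                 kernel_sizes=(2, 3, 4, 5)):  # kernel sizes are an assumption
        super().__init__()
        # Index 0 is reserved for padding, hence num_words + 1 rows.
        self.embedding = nn.Embedding(num_words + 1, embedding_size, padding_idx=0)
        # One Conv1d per kernel size, each producing out_size (32) channels.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embedding_size, out_size, k) for k in kernel_sizes]
        )
        # The fully connected layer receives the concatenated pooled features.
        self.fc = nn.Linear(out_size * len(kernel_sizes), 1)

    def forward(self, token_ids):
        x = self.embedding(token_ids)   # (batch, seq_len, embedding_size)
        x = x.permute(0, 2, 1)          # (batch, embedding_size, seq_len)
        # Max over the time dimension for each convolutional branch.
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = torch.cat(pooled, dim=1)   # (batch, 4 * out_size)
        return torch.sigmoid(self.fc(features)).squeeze(1)


model = TextCNNSketch()
batch = torch.randint(0, 2001, (12, 35))  # batch_size=12, seq_len=35
probs = model(batch)                      # one probability per sentence
```

The actual model may pool differently (for example with a strided MaxPool1d, as the stride parameter suggests), so treat the shapes here as a guide rather than a specification.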

2. Files

Pipfile: It lists the dependencies needed to run the model.
main.py: It contains the controller of the pipelines (preprocessing and training).
src: It contains three directories, which are: model, parameters and preprocessing.
src/model: It contains two files, model.py and run.py, which handle the model definition and the training/evaluation phase respectively.
src/parameters: It contains a dataclass which stores the parameters used to preprocess the text, define and train the model.
src/preprocessing: It contains the functions implemented to load, clean and tokenize the text.
data: It contains the data used to train the depicted model.
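As a rough idea of what the load, clean and tokenize steps in src/preprocessing might involve, the sketch below cleans raw text, builds a vocabulary limited to the num_words most frequent tokens, and pads or truncates each sentence to seq_len. The function names (clean, tokenize, build_vocab, encode), the cleaning rule and the sample sentences are hypothetical and do not come from the repository; only the default values of num_words and seq_len are taken from the parameters dataclass.

```python
import re
from collections import Counter


def clean(text):
    """Lowercase the text and replace anything that is not a letter or space."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def tokenize(text):
    """Split cleaned text on whitespace."""
    return clean(text).split()


def build_vocab(texts, num_words=2000):
    """Map the num_words most frequent tokens to ids starting at 1 (0 = padding)."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    return {word: i + 1 for i, (word, _) in enumerate(counts.most_common(num_words))}


def encode(text, vocab, seq_len=35):
    """Convert text to a fixed-length list of ids, dropping unknown tokens."""
    ids = [vocab[tok] for tok in tokenize(text) if tok in vocab][:seq_len]
    return ids + [0] * (seq_len - len(ids))


# Made-up sample sentences, used only to exercise the functions.
texts = ["A great movie!", "Not great, not terrible."]
vocab = build_vocab(texts)
encoded = [encode(text, vocab) for text in texts]
```

Every encoded sentence has exactly seq_len entries, so a batch of them can be stacked directly into a tensor for the embedding layer.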

3. How to use

First you will need to install the dependencies, and then launch the pipenv virtual environment. To install the dependencies, type:

pipenv install

Then launch the virtual environment with:

pipenv shell

Then you can run the preprocessing and training/evaluation pipelines by typing:

python main.py

If you want to modify some of the parameters, you can modify the dataclass located at src/parameters/parameters.py which has the following form:

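from dataclasses import dataclass


@dataclass
class Parameters:

    # Preprocessing parameters
    seq_len: int = 35        # length each sentence is padded or truncated to
    num_words: int = 2000    # vocabulary size

    # Model parameters
    embedding_size: int = 64
    out_size: int = 32       # number of filters per convolutional layer
    stride: int = 2

    # Training parameters
    epochs: int = 10
    batch_size: int = 12
    learning_rate: float = 0.001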
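Because Parameters is a regular dataclass, its defaults can also be overridden without editing the file. The sketch below repeats the class so that it runs on its own and uses dataclasses.replace to derive a modified copy. Whether main.py accepts a custom Parameters instance is an assumption, and the name longer_run and the override values are purely illustrative.

```python
from dataclasses import dataclass, replace


@dataclass
class Parameters:
    # Preprocessing parameters
    seq_len: int = 35
    num_words: int = 2000
    # Model parameters
    embedding_size: int = 64
    out_size: int = 32
    stride: int = 2
    # Training parameters
    epochs: int = 10
    batch_size: int = 12
    learning_rate: float = 0.001


defaults = Parameters()

# Derive a new configuration; the defaults object is left unchanged.
longer_run = replace(defaults, epochs=20, learning_rate=0.0005)

# Overrides can also be passed directly as keyword arguments.
small_vocab = Parameters(num_words=500)
```

Keeping the defaults intact and deriving variants this way makes it easier to compare runs than editing the dataclass in place.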

4. Contributing

Feel free to fork the model and add your own suggestions.

Fork the Project
Create your Feature Branch (git checkout -b feature/YourGreatFeature)
Commit your Changes (git commit -m 'Add some YourGreatFeature')
Push to the Branch (git push origin feature/YourGreatFeature)
Open a Pull Request

5. Contact

If you have any questions, feel free to reach out to me at:

6. License

Distributed under the MIT License. See LICENSE.md for more information.

dunovank/Text-Classification-CNN-PyTorch