NeuroQuestAi/ml-text-classification

Multi-Language Text Classification 🧠

Powered by NeuroQuestAI | Python 3 | Code style: Black | Packaged with Poetry

This project is a simple text classifier 📝 built on multilingual BERT. 🇬🇧 | 🇧🇷

Project ☁️

Details of the BERT model used:

This project was inspired by this source:

Training and validation accuracy:

[Figure: model accuracy]

Note: We reached 98% accuracy.

Requirements 🛠️

It is necessary:

Note: A machine with a GPU is not required, but it is recommended to accelerate training.

Datasets 📊

The BBC dataset was used to classify texts into the following labels:

  • Business 💼
  • Entertainment 🎬
  • Sport ⚽
  • Tech 💻
  • Politics 🏛️

The texts in the dataset were first translated into Brazilian Portuguese, using the Google Translator API. After that, the English 🇬🇧 and Portuguese 🇧🇷 texts were combined to create a multilingual version.
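
The combination step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual preprocessing code: the `build_multilingual` function, the row layout (`text`, `label`, `lang`), and the stand-in `fake_translate` function are all assumptions made for the example. In practice the `translate` callable would wrap a real translation service.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def build_multilingual(rows: List[Row], translate: Callable[[str], str]) -> List[Row]:
    """Return each English row followed by a translated copy with the same label."""
    combined: List[Row] = []
    for row in rows:
        # Keep the original English text.
        combined.append({"text": row["text"], "label": row["label"], "lang": "en"})
        # Add the translated text; the label does not change with the language.
        combined.append({"text": translate(row["text"]), "label": row["label"], "lang": "pt"})
    return combined


def fake_translate(text: str) -> str:
    """Hypothetical stand-in for a translation API call (illustration only)."""
    return f"[pt] {text}"


sample = [
    {"text": "Shares rose after the earnings report.", "label": "business"},
    {"text": "The team won the final.", "label": "sport"},
]
dataset = build_multilingual(sample, fake_translate)
print(len(dataset))
```

Keeping the label attached to both language versions of a text is what lets a single model learn the same five classes in English and Portuguese.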

Building and Running 🚀

Clone the project to your computer using Git and go to the project root folder:

git clone git@github.com:NeuroQuestAi/ml-text-classification.git && \
 cd ml-text-classification

Activate the project's virtual environment with Poetry:

poetry shell

Install all dependencies:

poetry install && poetry update 

Run the model training and evaluation:

./train 

This generates the PyTorch model in the models folder. You can then test the predictions with:

./predictor 

Output example:

| Text | Lang | Prediction |
| --- | --- | --- |
| Os negócios são o tecido vital da economia... | 🇧🇷 | BUSINESS |
| A variedade de formas de entretenimento reflete... | 🇧🇷 | ENTERTAINMENT |
| Os valores como fair play, respeito e camaradagem são... | 🇧🇷 | SPORT |
| Desde a revolução digital até as últimas descobertas... | 🇧🇷 | TECH |
| A política reflete as diferentes visões, valores... | 🇧🇷 | POLITICS |
| Businesses are the lifeblood of the economy, where ideas... | 🇬🇧 | BUSINESS |
| From thrilling movies to engaging games, and soul-touching... | 🇬🇧 | ENTERTAINMENT |
| Sports are a universal passion that brings people... | 🇬🇧 | SPORT |
| Artificial intelligence, cloud computing, the Internet... | 🇬🇧 | TECH |
| Active citizen participation in political life... | 🇬🇧 | POLITICS |
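
Each prediction above is a single label chosen from the five classes. A minimal sketch of how a classifier's raw output scores (logits) are commonly turned into such a label is shown here. It is not the project's predictor code: the `LABELS` order, the `to_label` helper, and the example logits are assumptions for illustration, and the real index-to-label mapping depends on how the labels were encoded during training.

```python
import math
from typing import List, Tuple

# Assumed label order, for illustration only.
LABELS = ["BUSINESS", "ENTERTAINMENT", "SPORT", "TECH", "POLITICS"]


def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_label(logits: List[float]) -> Tuple[str, float]:
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Made-up logits for a single text; the third score is the largest.
label, confidence = to_label([0.2, -1.0, 3.1, 0.4, -0.5])
print(label, round(confidence, 2))
```

The probability returned alongside the label can be used to flag low-confidence predictions, for example texts that mix topics.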

Note: Model settings are in the config.json file.

Authors 👨‍💻