Text Classification Using Embeddings for Luganda Text Corpus

This repository contains code for a text classification experiment using embeddings for the Luganda text corpus. In this experiment, we utilized Support Vector Machine (SVM) as the classification model and performed hyper-parameter tuning to optimize its performance. Due to limitations imposed by the Cohere API for free users, the dataset was sampled to 500 samples for training and evaluation purposes.

Experiment Details

Model: Support Vector Machine (SVM)
Data: Luganda Text Corpus
Data Size: 500 samples (due to API limitations)
Embeddings: Utilized embeddings for text representation
Validation Accuracy: 45.6%

Experiment Process

Data Sampling: Due to limitations in the Cohere API for free users, the Luganda text corpus was sampled to 500 data points for the experiment.
Text Embeddings: Text data was converted into embeddings to enable numerical representation for the SVM model. Embeddings play a crucial role in capturing semantic meanings of words and phrases, enhancing the model's understanding of the text.
Model Selection: SVM was chosen as the classification model due to its effectiveness in text classification tasks and its ability to handle high-dimensional data like embeddings.
Hyper-parameter Tuning: Hyper-parameter tuning was performed to optimize the SVM model's performance. Various hyper-parameters such as C, kernel type, and gamma were tuned to achieve the best results.
Training and Evaluation: The model was trained on the sampled dataset and evaluated using validation data. The validation accuracy obtained was 45.6%, indicating the model's ability to correctly classify Luganda text samples.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
README.md		README.md
Text_Classification_Using_Embeddings.ipynb		Text_Classification_Using_Embeddings.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Text Classification Using Embeddings for Luganda Text Corpus

Experiment Details

Experiment Process

About

Releases

Packages

Languages

EricPeter/Topic-Classification-Luganda

Folders and files

Latest commit

History

Repository files navigation

Text Classification Using Embeddings for Luganda Text Corpus

Experiment Details

Experiment Process

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages