
Language Detection Using N-Grams

https://github.com/melanie-t/twitter-language-detection

Project

This project applies Naive Bayes classification to a natural language processing task: detecting the language of a tweet, chosen from a pre-specified list, using variations of n-gram models. The supported languages are:

  • Basque (eu)
  • Catalan (ca)
  • Galician (gl)
  • Spanish (es)
  • English (en)
  • Portuguese (pt)
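
To illustrate the general idea, here is a minimal sketch of Naive Bayes over character n-grams. It is not the code in this repository: the class name `NgramNaiveBayes`, the helper `char_ngrams`, the bigram size, the add-delta smoothing value, and the three toy training sentences are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n):
    """Return the list of character n-grams in `text` (lowercased)."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Toy Naive Bayes classifier over character n-grams (hypothetical, for illustration)."""

    def __init__(self, n=2, delta=0.5):
        self.n = n                                # n-gram size (assumed default)
        self.delta = delta                        # add-delta smoothing value (assumed default)
        self.ngram_counts = defaultdict(Counter)  # language -> n-gram counts
        self.doc_counts = Counter()               # language -> number of training texts
        self.vocab = set()                        # all n-grams seen in training

    def train(self, samples):
        """`samples` is an iterable of (language, text) pairs."""
        for lang, text in samples:
            grams = char_ngrams(text, self.n)
            self.doc_counts[lang] += 1
            self.ngram_counts[lang].update(grams)
            self.vocab.update(grams)

    def score(self, lang, text):
        """Log10 of prior times smoothed n-gram likelihoods for `lang`."""
        total_docs = sum(self.doc_counts.values())
        log_prob = math.log10(self.doc_counts[lang] / total_docs)
        counts = self.ngram_counts[lang]
        denom = sum(counts.values()) + self.delta * len(self.vocab)
        for gram in char_ngrams(text, self.n):
            # n-grams never seen for this language fall back to the smoothed floor
            log_prob += math.log10((counts[gram] + self.delta) / denom)
        return log_prob

    def predict(self, text):
        """Return the language with the highest score."""
        return max(self.doc_counts, key=lambda lang: self.score(lang, text))


# Invented toy data; a real model would be trained on many labelled tweets.
model = NgramNaiveBayes(n=2, delta=0.5)
model.train([
    ("en", "the weather is nice today and i am happy"),
    ("es", "el tiempo es muy bueno hoy y estoy feliz"),
    ("pt", "o tempo hoje parece muito bom e eu estou feliz"),
])
print(model.predict("the weather is nice"))
```

Summing logarithms instead of multiplying probabilities avoids numeric underflow on longer texts, and the smoothing term keeps an unseen n-gram from forcing a language's probability to zero.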

Requirements

  • Python Version 3.7+
  • Required Python packages
    • numpy
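
A quick way to confirm these requirements before running the project is a short check script. This is only a sketch: the function name `check_environment` and its parameters are hypothetical and not part of this repository.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 7), packages=("numpy",)):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_version:
        wanted = ".".join(str(part) for part in min_version)
        problems.append(f"Python {wanted}+ required")
    for name in packages:
        # find_spec returns None when a top-level package cannot be located
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing package: {name}")
    return problems


for problem in check_environment():
    print(problem)
```

If the script prints nothing, the interpreter version and the numpy installation both satisfy the requirements listed above.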

Setting Up Project

  1. Clone the Git repository, or download it as a ZIP file and extract the folder
  2. Open the folder (twitter-language-detection) as a Python project in the IDE of your choice
    • Ensure that your Python interpreter is set to Python 3.7 or later
    • Set working directory to twitter-language-detection/src

Running the Project

  1. Run Main.py
  2. Enter the absolute path to the test file
  3. The trace and evaluation files will be saved in src/output
