Language Detection Using N-Grams

https://github.com/melanie-t/twitter-language-detection

Project

This project applies Naive Bayes classification, a Natural Language Processing technique, to detect the language of tweets. Each tweet is assigned to one language from a pre-specified list using variations of N-gram models. The supported languages are:

Basque (eu)
Catalan (ca)
Galician (gl)
Spanish (es)
English (en)
Portuguese (pt)
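
As a rough illustration of the approach described above, the following is a minimal sketch of a character N-gram Naive Bayes classifier. It is not the code in src; the names char_ngrams and NGramNaiveBayes, the lowercasing step, the default bigram size, and the smoothing value delta=0.5 are all assumptions made for the example, and it uses only the standard library.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n):
    """Return all overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NGramNaiveBayes:
    """Hypothetical sketch: Naive Bayes over character n-grams with add-delta smoothing."""

    def __init__(self, n=2, delta=0.5):
        self.n = n                                # n-gram size (assumed default)
        self.delta = delta                        # smoothing value (assumed default)
        self.ngram_counts = defaultdict(Counter)  # language -> n-gram counts
        self.doc_counts = Counter()               # language -> number of training texts
        self.vocab = set()                        # all n-grams seen in training

    def train(self, samples):
        """samples is an iterable of (language_code, text) pairs."""
        for lang, text in samples:
            grams = char_ngrams(text, self.n)
            self.ngram_counts[lang].update(grams)
            self.doc_counts[lang] += 1
            self.vocab.update(grams)

    def score(self, text, lang):
        """Log prior of the language plus the smoothed log likelihood of each n-gram."""
        counts = self.ngram_counts[lang]
        denom = sum(counts.values()) + self.delta * len(self.vocab)
        log_prob = math.log(self.doc_counts[lang] / sum(self.doc_counts.values()))
        for gram in char_ngrams(text, self.n):
            log_prob += math.log((counts[gram] + self.delta) / denom)
        return log_prob

    def predict(self, text):
        """Return the language code with the highest score."""
        return max(self.doc_counts, key=lambda lang: self.score(text, lang))
```

Training on (language code, text) pairs and calling predict on a new tweet returns the code with the highest log score. Working in log space avoids numeric underflow when many small probabilities are multiplied together.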

Requirements

Python Version 3.7+
Required Python packages
- numpy

Setting Up Project

Download the project by cloning the Git repository or by downloading the ZIP file and extracting the folder
Open the folder (twitter-language-detection) as a Python project in the IDE of your choice
- Ensure that your Python interpreter is set to Python 3.7 or later
- Set the working directory to twitter-language-detection/src

Running the Project

Run Main.py
Enter the absolute path to the test file
The trace and evaluation files are saved in src/output
