AndyHe021112/math125a

math125a

Math125A_LanguageIdentification

Language Identification with Perceptron and Naive Bayes Models

Project Overview

This project addresses Language Identification, a Natural Language Processing (NLP) task in which the goal is to determine the language a given text sample is written in. It implements two machine learning models, a Perceptron and a Naive Bayes classifier, to identify the language of text samples. Language identification is useful in many NLP applications, such as content categorization, and as a preprocessing step for more complex tasks like machine translation and sentiment analysis.

Data Source

The dataset used in this project comes from Kaggle (Language Identification Dataset). It contains text samples in several languages, divided into training, validation, and test sets.

Models Used

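Naive Bayes: A probabilistic classifier that applies Bayes' theorem under the assumption that features are conditionally independent given the class; it is fast to train and widely used for text classification.
Perceptron: A simple linear classifier that learns its weights by adjusting them whenever it misclassifies a training example.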
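The two models can be illustrated with a minimal, dependency-free sketch. This is not the notebook's code: the README does not say which features the project uses, so the sketch assumes overlapping character trigram counts, and the names `char_ngrams`, `NaiveBayes`, and `Perceptron`, as well as the toy English/French sentences, are hypothetical. The Naive Bayes model is a multinomial variant with add-alpha (Laplace) smoothing, and the Perceptron is the standard multiclass version (one weight vector per language, no bias term, no weight averaging).

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (hypothetical feature choice), padding with spaces."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


class NaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per language (for priors)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(char_ngrams(text))
        self.vocab = set()
        for counts in self.feature_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(cnt.values()) for c, cnt in self.feature_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        feats = char_ngrams(text)
        best, best_score = None, -math.inf
        for c in self.class_counts:
            # log P(c) + sum over n-grams of count * log P(gram | c)
            score = math.log(self.class_counts[c] / n_docs)
            denom = self.totals[c] + self.alpha * len(self.vocab)
            for gram, k in feats.items():
                if gram not in self.vocab:       # ignore n-grams never seen in training
                    continue
                score += k * math.log((self.feature_counts[c][gram] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


class Perceptron:
    """Multiclass perceptron: on a mistake, add features to the true class, subtract from the predicted one."""

    def __init__(self, epochs=5):
        self.epochs = epochs

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.weights = {c: Counter() for c in self.classes}
        data = [(char_ngrams(t), y) for t, y in zip(texts, labels)]
        for _ in range(self.epochs):
            for feats, y in data:
                pred = self._predict_feats(feats)
                if pred != y:
                    for gram, k in feats.items():
                        self.weights[y][gram] += k
                        self.weights[pred][gram] -= k
        return self

    def _predict_feats(self, feats):
        # Score each class by the dot product of its weights with the feature counts.
        def score(c):
            w = self.weights[c]
            return sum(w[g] * k for g, k in feats.items())
        return max(self.classes, key=score)

    def predict(self, text):
        return self._predict_feats(char_ngrams(text))


# Toy usage with made-up sentences (not the Kaggle data).
train_texts = ["the cat is on the table", "where is the station",
               "le chat est sur la table", "où est la gare"]
train_labels = ["en", "en", "fr", "fr"]
nb = NaiveBayes().fit(train_texts, train_labels)
pc = Perceptron().fit(train_texts, train_labels)
print(nb.predict("the station"), pc.predict("the station"))
```

In a real run, the training split would be used to fit both models, the validation split to choose settings such as `alpha` or the number of epochs, and the test split only for the final accuracy comparison.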

Repository Contents

Language_Identification.ipynb: The Jupyter notebook containing the project's code and documentation.
dataset/: Directory containing the dataset used in the project.
requirements.txt: A text file listing the dependencies required to run the notebook.