math125a

Math125A_LanguageIdentification

Language Identification with Perceptron and Naive Bayes Models

Project Overview

This project addresses language identification, a Natural Language Processing (NLP) task: given a text sample, predict the language it is written in. It implements two machine learning models, the Perceptron and Naive Bayes, and applies both to this task. Language identification is useful in NLP applications such as content categorization, and as a preprocessing step for more complex tasks like translation and sentiment analysis.

Data Source

The dataset used in this project comes from Kaggle: Language Identification Dataset. It contains text samples in multiple languages, divided into training, validation, and test sets.
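For readers who need to produce such a three-way split themselves, the following is a minimal sketch using only the Python standard library. The function name `split_dataset`, the fraction parameters, and the toy samples are assumptions made for illustration; they are not taken from the notebook or the Kaggle dataset, which may already ship with its own split.

```python
import random


def split_dataset(samples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle (text, language) pairs and split them into train/val/test lists."""
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


# Toy data for illustration only; not taken from the Kaggle dataset.
toy = [(f"sample text {i}", "English" if i % 2 else "Spanish") for i in range(20)]
train, val, test = split_dataset(toy, val_frac=0.2, test_frac=0.2, seed=42)
print(len(train), len(val), len(test))
```

The validation set is typically used to choose settings such as the number of training epochs, while the test set is held back for the final evaluation.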

Models Used

Naive Bayes: A probabilistic classifier known for its efficiency in text classification tasks.

Perceptron: A simple yet effective linear classifier.
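To make the two model types concrete, here is a minimal pure-Python sketch of each. It is not the notebook's implementation: the character-bigram features, the class names `NaiveBayes` and `Perceptron`, the add-one smoothing, the epoch count, and the toy sentences are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)   # label -> n-gram counts
        self.doc_counts = Counter(labels)    # label -> number of samples
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.counts[label].update(grams)
            self.vocab.update(grams)
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        # N-grams never seen in training are ignored.
        grams = [g for g in char_ngrams(text) if g in self.vocab]

        def log_posterior(label):
            score = math.log(self.doc_counts[label] / n_docs)  # log prior
            for g in grams:
                score += math.log((self.counts[label][g] + 1) / (self.totals[label] + v))
            return score

        return max(self.doc_counts, key=log_posterior)


class Perceptron:
    """Multiclass perceptron: one weight vector per language."""

    def fit(self, texts, labels, epochs=5):
        self.weights = {lab: defaultdict(float) for lab in sorted(set(labels))}
        for _ in range(epochs):
            for text, gold in zip(texts, labels):
                feats = Counter(char_ngrams(text))
                guess = self._predict_feats(feats)
                if guess != gold:
                    # Move the gold class toward the features, the wrong guess away.
                    for g, c in feats.items():
                        self.weights[gold][g] += c
                        self.weights[guess][g] -= c
        return self

    def _predict_feats(self, feats):
        def score(label):
            w = self.weights[label]
            return sum(w.get(g, 0.0) * c for g, c in feats.items())

        return max(self.weights, key=score)

    def predict(self, text):
        return self._predict_feats(Counter(char_ngrams(text)))


# Toy sentences for illustration only; not taken from the Kaggle dataset.
texts = ["the cat sat on the mat", "where is the train station",
         "el gato está en la casa", "dónde está la estación de tren"]
labels = ["English", "English", "Spanish", "Spanish"]

nb = NaiveBayes().fit(texts, labels)
pc = Perceptron().fit(texts, labels)
print(nb.predict("the train"), pc.predict("the train"))
```

Naive Bayes needs a single counting pass over the training data, whereas the perceptron iterates over the data and only updates its weights when it misclassifies a sample.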

Repository Contents

Language_Identification.ipynb: The Jupyter notebook containing the entire project's code and documentation.

dataset/: Directory containing the dataset used in the project.

requirements.txt: A text file listing the dependencies required to run the notebook.