math125a

Math125A_LanguageIdentification

Language Identification with Perceptron and Naive Bayes Models

Project Overview

This project addresses Language Identification, a task in Natural Language Processing (NLP). It implements two machine learning models, the Perceptron and Naive Bayes, to identify the language of a given text sample. Language Identification is used directly in NLP applications such as content categorization, and as a preprocessing step for more complex tasks such as translation and sentiment analysis.

Data Source

The dataset used in this project comes from Kaggle: Language Identification Dataset. It contains text samples in several languages, divided into training, validation, and test sets.

Models Used

Naive Bayes: A probabilistic classifier known for its efficiency in text classification tasks.

Perceptron: A simple yet effective linear classifier.
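
To make the two models concrete, here is a minimal sketch of how each could be applied to language identification. It is not the notebook's code: the character-bigram features, the add-one smoothing, the epoch count, the toy sentences, and the names `char_ngrams`, `NaiveBayes`, and `Perceptron` are all assumptions chosen for illustration, and it uses only the Python standard library.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Count overlapping character n-grams in lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative)."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(char_ngrams(text))
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict(self, text):
        feats = char_ngrams(text)
        total_docs = sum(self.label_counts.values())

        def log_posterior(label):
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.label_counts[label] / total_docs)
            for f, k in feats.items():
                if f in self.vocab:  # ignore n-grams never seen in training
                    score += k * math.log((counts[f] + 1) / denom)
            return score

        return max(self.label_counts, key=log_posterior)


class Perceptron:
    """Multiclass perceptron: one weight vector per language (illustrative)."""

    def fit(self, texts, labels, epochs=10):
        self.labels = sorted(set(labels))
        self.weights = {label: defaultdict(float) for label in self.labels}
        for _ in range(epochs):
            for text, gold in zip(texts, labels):
                guess = self.predict(text)
                if guess != gold:
                    # Move weights toward the gold label, away from the guess.
                    for f, k in char_ngrams(text).items():
                        self.weights[gold][f] += k
                        self.weights[guess][f] -= k
        return self

    def predict(self, text):
        feats = char_ngrams(text)

        def score(label):
            w = self.weights[label]
            return sum(w.get(f, 0.0) * k for f, k in feats.items())

        return max(self.labels, key=score)


if __name__ == "__main__":
    # Toy data invented for this example; not taken from the Kaggle dataset.
    texts = [
        "the cat sat on the mat", "where is the train station",
        "le chat est sur le tapis", "où est la gare",
        "el gato está en la alfombra", "dónde está la estación",
    ]
    labels = ["en", "en", "fr", "fr", "es", "es"]
    for model in (NaiveBayes(), Perceptron()):
        model.fit(texts, labels)
        print(type(model).__name__, model.predict("the station is near"))
```

Both classes share the same feature function, so they can be compared on identical inputs. Naive Bayes needs a single pass over the training data to collect counts, while the perceptron makes repeated passes and updates its weights only when it misclassifies an example.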

Repository Contents

Language_Identification.ipynb: The Jupyter notebook containing the entire project's code and documentation.

dataset/: Directory containing the dataset used in the project.

requirements.txt: A text file listing the dependencies required to run the notebook.
