Language Detection

This repository contains a Python script for training a language detection model using the Multinomial Naive Bayes algorithm. The goal is to predict the language of given text samples.

Notebook

Importing Libraries: The script starts by importing necessary libraries such as pandas, numpy, seaborn, and scikit-learn. These libraries are essential for data manipulation, visualization, and machine learning.
Exploratory Data Analysis (EDA): The dataset is loaded from a CSV file named "Language Detection.csv". The script then performs EDA by checking for null values and visualizing the distribution of data for each language using a pie chart.
Data Cleaning and Transformation: The textclean function is defined to preprocess the textual data. It converts text to lowercase, removes special characters, and strips any leading/trailing spaces. The target variable (language) is encoded using label encoding. The text data is transformed using a CountVectorizer and a TfidfTransformer.
Primary Model Training: The Multinomial Naive Bayes model is trained on the preprocessed data. The training set is split into training and testing subsets. The model is then fit to the training data.
Model Pipeline Creation: In this step we will make a class to create a pipeline with our custome text cleaning. Pipeline will consist of four steps

Text Clean
Count Vectorizer
TfIdf Transformer
Multinomialdb(Naive Bayes)

With the end of project we were able to achive following results

Streamlit application

To run this streamlit application enter the below command

streamlit run app.py

Usage

To use this code, follow these steps:

Install the required libraries pip install requirements.txt
Load your own dataset or modify the existing one.
Run the script to train the language detection model.
Evaluate the model's performance using classification metrics.
You can also tryout the streamlit app

Feel free to customize and extend this code for your specific use case!

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
Dataset		Dataset
resources		resources
.gitignore		.gitignore
README.md		README.md
app.py		app.py
helper_func.py		helper_func.py
notebook.ipynb		notebook.ipynb
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Language Detection

Notebook

Streamlit application

Usage

About

Releases

Packages

Languages

Gopalkholade/Language-Detection

Folders and files

Latest commit

History

Repository files navigation

Language Detection

Notebook

Streamlit application

Usage

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages