This project demonstrates how to build a Language Detection Model using Naive Bayes and a Bag-of-Words (CountVectorizer) approach.
It trains a machine learning model to classify text into different languages and saves the trained model as a pipeline for reuse.
- Loads and preprocesses a dataset of text samples with their corresponding languages.
- Cleans text (removes numbers, special characters, converts to lowercase).
- Converts text into numeric vectors using CountVectorizer (Bag of Words).
- Trains a Naive Bayes classifier for language detection.
- Evaluates performance using Accuracy and F1 Score.
- Creates a Pipeline for streamlined training and prediction.
- Saves the trained model using Pickle for later use.
- Predicts the language of new unseen text.
```
.
├── Language Detection.csv        # Dataset file
├── language_detection.py         # Main script (your code)
├── trained_pipeline-0.1.0.pkl    # Saved trained pipeline
└── README.md                     # Documentation
```
- Clone this repository:

  ```shell
  git clone https://github.com/your-username/language-detection.git
  cd language-detection
  ```
- Install the required Python libraries:

  ```shell
  pip install pandas numpy scikit-learn seaborn matplotlib
  ```
- Load Dataset: load `Language Detection.csv`, containing text samples and their corresponding language labels.
- Preprocessing:
  - Remove special characters, numbers, and extra symbols.
  - Convert text to lowercase.
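A minimal sketch of the cleaning step (the exact regex used in `language_detection.py` may differ):

```python
import re

def clean_text(text: str) -> str:
    """Remove digits and punctuation, collapse whitespace, lowercase."""
    text = re.sub(r"\d+", " ", text)      # strip numbers
    text = re.sub(r"[^\w\s]", " ", text)  # strip special characters
    return re.sub(r"\s+", " ", text).strip().lower()

print(clean_text("Ciao, come stai? 123!"))  # ciao come stai
```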
- Encoding Labels: convert categorical language labels into numerical values using `LabelEncoder`.
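For illustration, `LabelEncoder` maps each language name to an integer index (the labels below are made up):

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
labels = ["English", "Italian", "French", "Italian"]
y = le.fit_transform(labels)

# le.classes_ stores the sorted label names; y holds their integer indices
print(list(le.classes_))  # ['English', 'French', 'Italian']
print(y.tolist())         # [0, 2, 1, 2]
```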
- Train-Test Split: split the dataset into 80% training and 20% testing sets.
- Vectorization (Bag of Words): convert text into numeric vectors using `CountVectorizer`.
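`CountVectorizer` builds a vocabulary over the corpus and turns each document into a vector of word counts:

```python
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer()
docs = ["ciao come stai", "hello how are you", "ciao ciao"]
X = cv.fit_transform(docs)  # sparse document-term count matrix

print(sorted(cv.vocabulary_))  # ['are', 'ciao', 'come', 'hello', 'how', 'stai', 'you']
print(X.shape)                 # (3, 7): 3 documents, 7 vocabulary terms
```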
- Model Training: train a Multinomial Naive Bayes classifier on the training data.
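A minimal training sketch on a toy corpus (the real script trains on the full vectorized dataset):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = ["ciao come stai", "come va oggi",
               "hello how are you", "how is it going"]
train_labels = ["Italian", "Italian", "English", "English"]

cv = CountVectorizer()
X_train = cv.fit_transform(train_texts)

model = MultinomialNB()  # works well with word-count features
model.fit(X_train, train_labels)

print(model.predict(cv.transform(["come stai"])))  # ['Italian']
```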
- Evaluation: compute Accuracy and F1 Score on the test data.
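Both metrics come from `sklearn.metrics` (the labels below are illustrative):

```python
from sklearn.metrics import accuracy_score, f1_score

y_true = ["it", "en", "en", "it", "en"]
y_pred = ["it", "en", "it", "it", "en"]

acc = accuracy_score(y_true, y_pred)
# "weighted" averages per-class F1 by class frequency, which suits
# imbalanced language datasets
f1 = f1_score(y_true, y_pred, average="weighted")
```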
- Pipeline:
  - Create a scikit-learn pipeline with `CountVectorizer` + Naive Bayes.
  - Save the pipeline using `pickle`.
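The pipeline and pickling steps can be sketched as follows (tiny inline training set for illustration):

```python
import pickle
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

pipe = Pipeline([
    ("vectorizer", CountVectorizer()),  # text -> bag-of-words counts
    ("classifier", MultinomialNB()),    # counts -> language label
])
pipe.fit(["ciao come stai", "come va", "hello how are you", "how is it"],
         ["Italian", "Italian", "English", "English"])

# versioning the filename helps track retrained models
with open("trained_pipeline-0.1.0.pkl", "wb") as f:
    pickle.dump(pipe, f)

print(pipe.predict(["come stai"]))  # ['Italian']
```

Because the vectorizer is inside the pipeline, callers pass raw strings and never touch the vocabulary directly.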
- Prediction: load the pipeline and predict the language of new text (e.g., `"Ciao, come stai?"` → `"Italian"`).
Run the Python script:

```shell
python language_detection.py
```

Example prediction inside the script:

```python
text = "Ciao, come stai?"
y = pipe.predict([text])
print("Detected Language:", le.classes_[y[0]])
```

Output:

```
Detected Language: Italian
```
- Accuracy: the fraction of predictions that are correct.
- F1 Score: the harmonic mean of precision and recall (useful for imbalanced datasets).
The trained model pipeline is saved as `trained_pipeline-0.1.0.pkl`.
You can load it later and make predictions without retraining:

```python
import pickle

with open("trained_pipeline-0.1.0.pkl", "rb") as f:
    pipe = pickle.load(f)

print(pipe.predict(["Hello, how are you?"]))  # Output: English
```
The project includes a Dockerfile so it can be containerized.
- Build the Docker image.
- Run the container and map it to a port on your machine.
- Access the API endpoints and documentation through your browser or API client.
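For example (the image name and port mapping below are illustrative; adjust them to your setup):

```shell
# build the image from the project's Dockerfile
docker build -t language-detection .

# run the container, mapping container port 80 to local port 8000
docker run -d -p 8000:80 language-detection
```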
The application can be started with Uvicorn, making the API available on your local machine. It exposes two endpoints:

- Health Check → returns service status and model version.
- Prediction → accepts input text and returns the detected language.

Interactive documentation is automatically available through the FastAPI Swagger UI.
- Connect your GitHub repository to Render.
- Create a new Web Service and choose Docker environment.
- Render automatically builds the image using your Dockerfile.
- Set the start command (for FastAPI in Docker it’s handled by the base image).
- Expose port 80 (or the port your app is running on).
- Once deployed, your API is available at a public Render URL with full /docs support.
- Python 3.7+
- pandas
- numpy
- scikit-learn
- seaborn
- matplotlib
This project is licensed under the MIT License.