LinguaDetectio is a language detection tool that identifies the language of textual data. It is powered by the FastText language identification model and served through the FastAPI framework.
LinguaDetectio is built for fast, accurate language detection, whether you are processing large volumes of text or individual sentences. Identification relies on FastText's pre-trained language identification model.
To install LinguaDetectio and its dependencies, follow these steps:
-
Ensure you have Python installed on your system (version 3.6 or higher).
-
Clone the LinguaDetectio repository from GitHub:
git clone https://github.com/aymenkrifa/LinguaDetectio.git
-
Navigate to the project directory:
cd LinguaDetectio
-
Create and activate a virtual environment with a tool such as virtualenv, pyenv, or Anaconda.
-
Install the required dependencies using pip:
pip install -r requirements.txt
To utilize the language detection functionality, you will need to download the FastText model. Follow these steps to acquire the model:
-
Visit the official FastText website for language identification: https://fasttext.cc/docs/en/language-identification.html
-
Download the language identification model (a pre-trained binary file).
-
Once downloaded, create a folder named 'models' in the LinguaDetectio project directory.
-
Move the downloaded model file into the 'models' folder.
mv path/to/downloaded/model.bin models/
-
The 'models' folder should now contain the FastText language identification model.
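As a quick check that the model is in place, the sketch below loads it with the `fasttext` Python package and predicts the language of one sentence. It is a minimal illustration, not the project's own code: the file name `models/lid.176.bin` is an assumption (it is the name used on the FastText download page; substitute whatever file you placed in 'models'), and `strip_label` and `detect` are hypothetical helper names.

```python
from pathlib import Path
from typing import Tuple

LABEL_PREFIX = "__label__"  # prefix fastText puts in front of each predicted label


def strip_label(label: str) -> str:
    """Turn a fastText label such as '__label__en' into the bare code 'en'."""
    if label.startswith(LABEL_PREFIX):
        return label[len(LABEL_PREFIX):]
    return label


def detect(text: str, model_path: str = "models/lid.176.bin") -> Tuple[str, float]:
    """Return (language code, confidence) for a single piece of text.

    model_path is an assumed location; point it at your downloaded file.
    """
    if not Path(model_path).is_file():
        raise FileNotFoundError(f"FastText model not found at {model_path}")

    # Imported here so the helper above can be used without fasttext installed.
    import fasttext

    model = fasttext.load_model(model_path)
    # predict() handles one line at a time, so newlines are replaced first.
    labels, probabilities = model.predict(text.replace("\n", " "), k=1)
    return strip_label(labels[0]), float(probabilities[0])
```

Calling `detect("Hello, how are you?")` from the project directory should return a language code and a confidence score. Loading the model on every call is slow; a long-running server would typically load it once at startup.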
You can run LinguaDetectio in either of two ways, depending on your preference and requirements: with Docker or directly with Uvicorn.
-
Make sure you have Docker installed on your system.
-
Build the Docker image by running the following command in the project directory:
docker build -t linguadetectio .
-
Run the Docker container with the following command, replacing <port> with the desired port number:
docker run -d -p <port>:80 linguadetectio
The LinguaDetectio server will start, and the endpoint will be accessible at http://127.0.0.1:<port>/detect
-
Make a POST request to the endpoint URL with the text whose language you want to identify. The request body should be in the following JSON format:
{ "text": "Hello, how are you?" }
-
Start the LinguaDetectio server with Uvicorn by running the following command in the project directory:
uvicorn main:app
The server will start, and the endpoint will be accessible at http://127.0.0.1:8000/detect. By default, Uvicorn runs on port 8000, but you can specify a different port with the --port option.
-
Make a POST request to the endpoint URL with the text whose language you want to identify. The request body should be in the following JSON format:
{ "text": "Hello, how are you?" }
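The POST request can also be sent from Python. The sketch below uses only the standard library and works with either setup: pass the port you mapped for Docker, or keep Uvicorn's default of 8000. The function names are hypothetical, and the shape of the response is not documented here, so the code simply returns whatever JSON the server sends back (assuming the response is JSON).

```python
import json
import urllib.request


def build_detect_request(text: str, port: int = 8000,
                         host: str = "127.0.0.1") -> urllib.request.Request:
    """Build the POST request for the /detect endpoint without sending it."""
    url = f"http://{host}:{port}/detect"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def detect_language(text: str, port: int = 8000):
    """Send the text to a running LinguaDetectio server and return the parsed reply."""
    request = build_detect_request(text, port=port)
    with urllib.request.urlopen(request, timeout=10) as response:
        # Assumption: the server answers with a JSON document.
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `detect_language("Hello, how are you?")` sends the example payload shown above; for a Docker container started with `-p 8080:80`, call `detect_language("Hello, how are you?", port=8080)`.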
Contributions are welcome! If you encounter any issues or have suggestions for improvements, please create a new issue or submit a pull request on the GitHub repository.
I would like to express my gratitude to the FastText team for providing the language identification model and to the FastAPI community for their excellent framework.
Please visit FastText's Language Identification page for more information.
For any inquiries or feedback, please contact the project maintainer at aymenkrifa@gmail.com.
Happy language detection with LinguaDetectio!