This is a Telegram Bot that extracts text from images using pytesseract (Tesseract OCR) and the Python Telegram API. Users can send an image to the bot, and it will respond with the extracted text from the image.
- Image-to-text conversion: Uses pytesseract to convert images into text.
- Error handling: Catches and logs errors during image processing.
- Instant response: Quickly processes images and returns extracted text via Telegram.
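The first two features can be sketched roughly as follows. This is a minimal illustration, not the code in bot.py: the function names `extract_text` and `_tesseract_ocr` and the fallback messages are hypothetical, and the only library calls assumed are `PIL.Image.open` and `pytesseract.image_to_string`.

```python
import logging

logger = logging.getLogger(__name__)


def _tesseract_ocr(image_path):
    # Imported lazily so this module still loads where the OCR libraries are missing.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image)


def extract_text(image_path, ocr=_tesseract_ocr):
    """Return the text found in the image, or a fallback message on failure."""
    try:
        text = ocr(image_path).strip()
    except Exception:
        # Log the full traceback, but keep the user-facing reply short.
        logger.exception("OCR failed for %s", image_path)
        return "Sorry, I could not process that image."
    return text or "No text was found in the image."
```

Passing the OCR step in as a parameter is a design choice made here so the error handling can be exercised without Tesseract installed, for example `extract_text("photo.jpg", ocr=lambda path: "hello")`.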
Follow these steps to set up and run the bot locally.
git clone https://github.com/XredaX/Xtract-Text
cd Xtract-Text
Make sure you have Python and the necessary libraries installed. Run:
pip install -r requirements.txt
Ensure that Tesseract is installed on your machine. You can install it via:
Linux:
sudo apt update
sudo apt install tesseract-ocr
Make sure to set TESSDATA_PREFIX to point to your Tesseract data directory, which holds the .traineddata language files.
You need to set two environment variables for the bot to work:
- BOT_TOKEN: Your Telegram bot token.
- TESSDATA_PREFIX: Path to the Tesseract data directory.
You can set these in your terminal or use a .env file.
For terminal:
export BOT_TOKEN="your_telegram_bot_token"
export TESSDATA_PREFIX="/usr/share/tesseract-ocr/5/tessdata"
For .env file (create this in the root directory):
BOT_TOKEN=your_telegram_bot_token
TESSDATA_PREFIX=/usr/share/tesseract-ocr/5/tessdata
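How the bot reads these values is not shown here, so the following is only a sketch of one way to do it with the standard library. The helper names `parse_env_file` and `load_settings` are hypothetical, and the rule that the terminal environment overrides the .env file is an assumption; the repository may use a dedicated .env library instead.

```python
import os

REQUIRED = ("BOT_TOKEN", "TESSDATA_PREFIX")


def parse_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"')
    return values


def load_settings(environ=os.environ, env_path=".env"):
    """Merge .env values with the process environment; the environment wins."""
    settings = parse_env_file(env_path) if os.path.exists(env_path) else {}
    for name in REQUIRED:
        if environ.get(name):
            settings[name] = environ[name]
    missing = [name for name in REQUIRED if not settings.get(name)]
    if missing:
        # Fail early with a clear message instead of crashing later in the bot.
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return settings
```

Calling `load_settings()` at startup returns a dictionary with both values, or raises a `RuntimeError` naming whichever one is missing.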
After setting the environment variables, you can run the bot:
python bot.py
The bot will now be up and running. Send an image to the bot in Telegram, and it will reply with the extracted text.
- /start: Starts the bot and welcomes the user.
- Send an image: The bot will reply with the extracted text from the image.
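The two interactions above could be wired up along these lines. This sketch assumes the python-telegram-bot library, version 20 or later, which may not be the library bot.py actually uses; the handler names, the `WELCOME` text, and the `run_ocr` helper are hypothetical.

```python
import asyncio
import os
import tempfile

WELCOME = "Hi! Send me an image and I will reply with the text I find in it."


def run_ocr(path):
    # Lazy imports: pytesseract and Pillow are only needed when a photo arrives.
    from PIL import Image
    import pytesseract

    with Image.open(path) as image:
        return pytesseract.image_to_string(image).strip()


async def start(update, context):
    # Reply to /start with a short welcome message.
    await update.message.reply_text(WELCOME)


async def handle_photo(update, context):
    # Telegram sends several sizes of each photo; the last entry is the largest.
    photo = update.message.photo[-1]
    tg_file = await photo.get_file()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "image.jpg")
        await tg_file.download_to_drive(path)
        # OCR is blocking, so run it in a worker thread (Python 3.9+).
        text = await asyncio.to_thread(run_ocr, path)
    await update.message.reply_text(text or "No text was found in the image.")


def build_application(token):
    # Assumes python-telegram-bot v20+ is installed.
    from telegram.ext import Application, CommandHandler, MessageHandler, filters

    app = Application.builder().token(token).build()
    app.add_handler(CommandHandler("start", start))
    app.add_handler(MessageHandler(filters.PHOTO, handle_photo))
    return app
```

With this layout, `build_application(os.environ["BOT_TOKEN"]).run_polling()` would start the bot and keep it listening for messages.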
Feel free to open issues or submit pull requests for improvements!