<img src="https://drive.google.com/uc?export=view&id=1wYSMgJtARFdvTt5g7E20mE4NmwUFUuog" width="200">

[![Build Fast with AI](https://img.shields.io/badge/BuildFastWithAI-GenAI%20Bootcamp-blue?style=for-the-badge&logo=artificial-intelligence)](https://www.buildfastwithai.com/genai-course)
[![EduChain GitHub](https://img.shields.io/github/stars/satvik314/educhain?style=for-the-badge&logo=github&color=gold)](https://github.com/satvik314/educhain)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1vDTraJ9FlNOWLHE0hmJ3Zmf_FQ2WDLZi?usp=sharing)

## ⚡ PyTesseract: Powerful OCR Tool for Text Extraction

PyTesseract is an open-source Python wrapper for **Tesseract OCR** that allows easy text extraction from images, enabling seamless document parsing and automation. 🚀

### 🔑 **Key Features**:
- 🧩 Supports a wide variety of image formats, including PNG, JPG, TIFF, and more.
- 🌍 Multi-language support for text extraction, with over 100 languages available.
- ✏️ Performs OCR on scanned documents and photographed text such as recipe images; accuracy on handwritten notes is typically much lower than on printed text.
- 📈 Simple integration with other AI tools and frameworks, like **GenAI**, for automated content generation.

### 🔗 **How to Use with GenAI**:
- Combine **PyTesseract's** OCR capabilities with **GenAI** to extract text from images and generate summaries, translations, or insights using language models in one seamless workflow.


### **Setup and Installation**

In [None]:
!pip install pytesseract Pillow requests
!sudo apt-get install -y tesseract-ocr
!pip install pdf2image
!sudo apt-get install -y poppler-utils
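
Before running OCR, it can help to confirm that the system binaries installed above are actually on the `PATH`, since `pytesseract` only wraps the `tesseract` executable and `pdf2image` relies on Poppler tools such as `pdftoppm`. The following is a minimal sketch using only the standard library; the helper name `missing_binaries` is hypothetical.

```python
import shutil

def missing_binaries(names):
    """Return the subset of `names` that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]

# "tesseract" is the OCR engine; "pdftoppm" ships with poppler-utils.
missing = missing_binaries(["tesseract", "pdftoppm"])
if missing:
    print("Missing system binaries:", ", ".join(missing))
else:
    print("All required binaries were found.")
```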

### **Importing Required Libraries for PyTesseract OCR**


In [None]:
import requests
from PIL import Image
import pytesseract
from pdf2image import convert_from_path


### **Function to Download an Image 🖼️**


In [None]:
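def download_image(url, save_as):
    """Download the file at `url` and save it to `save_as`."""
    # A timeout keeps the cell from hanging on an unresponsive server.
    response = requests.get(url, timeout=30)
    if response.status_code == 200:
        with open(save_as, 'wb') as file:
            file.write(response.content)
        print(f"Image downloaded: {save_as}")
    else:
        print(f"Failed to download image from {url} (status {response.status_code})")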


### **Function to Extract Text from an Image using PyTesseract 📄**


In [None]:
def extract_text_from_image(image_path, lang='eng'):
    image = Image.open(image_path)
    text = pytesseract.image_to_string(image, lang=lang)
    return text
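
The `lang` parameter is passed through to Tesseract, which accepts several language codes joined with `+` (for example `eng+fra`), provided the matching language data is installed. Below is a small sketch of building that argument; `build_lang_arg` is a hypothetical helper, not part of PyTesseract.

```python
def build_lang_arg(codes):
    """Join Tesseract language codes into a single `lang` string, e.g. 'eng+fra'."""
    cleaned = [code.strip() for code in codes if code.strip()]
    if not cleaned:
        raise ValueError("At least one language code is required")
    # dict.fromkeys removes duplicates while preserving the original order.
    return "+".join(dict.fromkeys(cleaned))

print(build_lang_arg(["eng", "fra"]))  # eng+fra
```

With the function defined above, this could be used as `extract_text_from_image(image_name, lang=build_lang_arg(["eng", "fra"]))`. On Debian-based systems such as Colab, extra languages are usually installed through packages like `tesseract-ocr-fra`.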

### **Basic Recipe Extraction 🍲**








In [None]:
image_url = "https://images.saymedia-content.com/.image/t_share/MTc0NjE4NDM3OTk2MzI0ODA5/how-to-write-original-food-recipes-10-tips-for-making-your-recipes-easy-to-follow.gif"
image_name = "recipe_english.jpg"

download_image(image_url, image_name)


Image downloaded: recipe_english.jpg
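
Note that the URL above ends in `.gif` while the file is saved as `recipe_english.jpg`. Pillow identifies the format from the file contents rather than the extension, so OCR still works, but checking the first bytes is a cheap way to confirm what was actually downloaded (for instance, to catch an HTML error page). A minimal sketch follows; `sniff_image_format` is a hypothetical helper that covers only three common formats.

```python
def sniff_image_format(data):
    """Guess an image format from its leading magic bytes, or return None."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    return None

print(sniff_image_format(b"GIF89a" + b"\x00" * 10))   # gif
print(sniff_image_format(b"<html>Not found</html>"))  # None
```

To apply it to the downloaded file, read the first few bytes with `open(image_name, 'rb').read(16)` and pass them to the helper.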


### **Extracting Recipe Text from Image 📜**



In [None]:
recipe_text = extract_text_from_image(image_name)
print("Extracted Recipe:\n", recipe_text)
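
Raw OCR output often contains stray blank lines and uneven spacing. A light cleanup pass can make the text easier to read or to hand to a language model later. The sketch below is one possible approach; `clean_ocr_text` and the sample string are hypothetical and not produced by the notebook.

```python
import re

def clean_ocr_text(text):
    """Collapse repeated spaces and drop empty lines from OCR output."""
    lines = []
    for line in text.splitlines():
        line = re.sub(r"[ \t]+", " ", line).strip()
        if line:
            lines.append(line)
    return "\n".join(lines)

# Hypothetical sample standing in for `recipe_text`.
sample = "  Pancakes \n\n\n 2  cups   flour\n"
print(clean_ocr_text(sample))
```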

### **Preprocessing Image for Better OCR Accuracy 🛠️**

In [None]:
from PIL import ImageEnhance, ImageFilter

def preprocess_image(image_path):
    image = Image.open(image_path).convert('L')  # Convert to grayscale
    image = image.filter(ImageFilter.SHARPEN)   # Sharpen the image
    enhancer = ImageEnhance.Contrast(image)
    image = enhancer.enhance(2)  # Increase contrast
    return image

# Preprocess and Extract Text
preprocessed_image = preprocess_image(image_name)
preprocessed_image.save("processed_recipe.jpg")
text_from_processed = pytesseract.image_to_string(preprocessed_image)
print("Extracted Text from Preprocessed Image:\n", text_from_processed)
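
To make the `enhance(2)` call less of a black box: a contrast factor of 1 leaves the image unchanged, values above 1 push pixels away from the average gray level, and 0 yields a flat gray image. The pure-Python sketch below models that behaviour on a list of grayscale values. It is a simplified illustration under that assumption, not Pillow's actual code, and `enhance_contrast` is a hypothetical name.

```python
def enhance_contrast(pixels, factor):
    """Scale grayscale values away from their mean, clipping to the 0-255 range."""
    mean = int(sum(pixels) / len(pixels) + 0.5)
    result = []
    for value in pixels:
        stretched = mean + factor * (value - mean)
        result.append(max(0, min(255, int(round(stretched)))))
    return result

print(enhance_contrast([100, 150], 2))  # [75, 175]
print(enhance_contrast([10, 240], 2))   # [0, 255]
```

The second call shows why very high factors can hurt OCR: values saturate at pure black or white, and fine strokes may be lost.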


### **Downloading a PDF from a URL 📥**

In [None]:
import requests

url = 'https://www.sldttc.org/allpdf/21583473018.pdf'
response = requests.get(url, timeout=30)
response.raise_for_status()  # Stop here rather than saving an error page as a PDF
with open('sample.pdf', 'wb') as f:
    f.write(response.content)


### **Convert PDF to Images 🖼️**

In [None]:
images = convert_from_path('sample.pdf')


### **Extract Text from PDF Images using OCR 📝**

In [None]:
text = ''
for image in images:
    text += pytesseract.image_to_string(image)
print(text)
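
The loop above concatenates page text directly, so page boundaries are not visible in the result. One option is to collect the text of each page in a list and join the pieces with a page label. A minimal sketch follows; `join_pages` is a hypothetical helper, and the sample strings stand in for real `pytesseract.image_to_string` output.

```python
def join_pages(page_texts):
    """Join per-page OCR text, labelling each page so boundaries stay visible."""
    parts = []
    for number, page_text in enumerate(page_texts, start=1):
        parts.append(f"--- Page {number} ---\n{page_text.strip()}")
    return "\n\n".join(parts)

# Hypothetical page texts standing in for pytesseract output.
print(join_pages(["First page text\n", "Second page text\n"]))
```

In the notebook, the list could be built as `[pytesseract.image_to_string(image) for image in images]`.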

### **Display Image 🖼️**


In [None]:
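# Import under an alias so PIL's `Image`, used by the functions above, is not shadowed.
from IPython.display import Image as IPyImage, display

display(IPyImage(url=image_url))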

### **Summarizing Text using Gemini ✍️**


In [None]:
import google.generativeai as genai
from google.colab import userdata

GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')

genai.configure(api_key=GOOGLE_API_KEY)
model = genai.GenerativeModel("gemini-1.5-flash")

response = model.generate_content(f"Summarize the following content:\n\n{text}")

print("Summary:")
print(response.text)

Summary:
This document summarizes a meta-analysis by Kuncel, Hezlett, and Ones (2001), and subsequent research, on the predictive validity of the GRE General Test.  The analysis, using data from over 80,000 students, found the GRE to be a strong predictor of various criteria for graduate school success, including first-year GPA, overall GPA, comprehensive exam scores, publication citations, faculty ratings, degree attainment, and research productivity.  The GRE's predictive validity surpassed that of undergraduate GPA and letters of recommendation.  While GRE Subject Tests showed even stronger predictive power, the General Test, including its Analytical Writing section (AW), provided unique and valuable information not captured by other measures.  The AW section specifically correlates positively with other writing samples, demonstrating its construct validity and adding to the overall predictive power of the GRE.  The document concludes that the GRE General Test is a valuable tool in 
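
OCR output from a long PDF can be large, and a single prompt may exceed what is practical to send in one request. One option is to split the text into chunks on paragraph boundaries and summarize each piece separately. The following is a minimal sketch; `chunk_text` and the `max_chars` default are assumptions for illustration, not a Gemini requirement.

```python
def chunk_text(text, max_chars=4000):
    """Split text into chunks of at most max_chars, breaking on blank lines where possible."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single oversized paragraph is cut into fixed-size pieces.
        while len(paragraph) > max_chars:
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        current = paragraph
    if current:
        chunks.append(current)
    return chunks

print(chunk_text("one\n\ntwo\n\nthree", max_chars=8))  # ['one\n\ntwo', 'three']
```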

### **Translating Text with Gemini AI 🌍**

In [None]:
response = model.generate_content(f"Summarize the following content:\n\n{text}")
summary = response.text

translation_response = model.generate_content(f"Translate the following text to French:\n\n{summary}")
print("Translated Summary (French):")
print(translation_response.text)

Translated Summary (French):
Ce document résume les recherches sur la validité prédictive du test général du Graduate Record Examinations (GRE). Une méta-analyse de Kuncel, Hezlett et Ones (2001) portant sur plus de 80 000 étudiants répartis sur 1753 échantillons a révélé que le test général du GRE était un puissant prédicteur de divers critères de réussite aux études supérieures, notamment la moyenne pondérée (GPA), le nombre de citations des publications, les évaluations des professeurs et l'obtention du diplôme. Il a surpassé la moyenne pondérée des études de premier cycle et les lettres de recommandation en termes de pouvoir prédictif, bien que les tests spécifiques du GRE se soient avérés des prédicteurs encore plus puissants. De plus, l'analyse a montré que la section d'écriture analytique (AW) fournit des informations uniques et précieuses non captées par les sections verbale et quantitative, corrélant bien avec d'autres échantillons d'écriture. Globalement, le document conclut 
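
The summarize-then-translate chain above can be wrapped in one function. The sketch below takes the text-generation step as a plain callable so the flow can be tried without an API key. `summarize_and_translate` and `fake_generate` are hypothetical names; in the notebook, the callable could be something like `lambda prompt: model.generate_content(prompt).text`.

```python
def summarize_and_translate(text, generate, language="French"):
    """Summarize `text`, then translate the summary; `generate` maps a prompt to a reply."""
    summary = generate(f"Summarize the following content:\n\n{text}")
    translation = generate(f"Translate the following text to {language}:\n\n{summary}")
    return summary, translation

def fake_generate(prompt):
    """Stand-in for a model call: echoes the instruction line of the prompt."""
    return "[reply to: " + prompt.splitlines()[0] + "]"

summary, translation = summarize_and_translate("Some OCR text", fake_generate)
print(summary)
print(translation)
```

Passing the summary from the first call into the second also avoids generating the summary twice.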