**Using PDF Parsing Libraries**

Several Python libraries, such as PyPDF2, pdfplumber, and pdfminer, can extract text from PDFs that contain an embedded text layer. PyPDF2 provides a simple way to extract the text of every page in a PDF.

In [None]:
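import PyPDF2

# PdfReader, .pages and extract_text() are the PyPDF2 3.x names; the older
# PdfFileReader, numPages, getPage and extractText were removed in 3.0.
with open('document.pdf', 'rb') as pdf_file:
    pdf_reader = PyPDF2.PdfReader(pdf_file)

    text = ""
    for page in pdf_reader.pages:
        # extract_text() can return an empty string for image-only pages
        text += page.extract_text() or ""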
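
Text pulled from a PDF text layer often keeps the hard line breaks of the page layout, including words hyphenated across lines. The sketch below is one possible cleanup pass over the extracted `text` string. The helper name `tidy_extracted_text` is hypothetical and not part of PyPDF2, and the rules are assumptions that may need tuning per document.

```python
import re


def tidy_extracted_text(raw):
    """Normalize layout line breaks in text extracted from a PDF (hypothetical helper)."""
    # Rejoin words split by a hyphen at a line end: "extrac-\ntion" -> "extraction".
    # Caveat: this also merges genuinely hyphenated words that wrap at the hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Treat a single newline as a soft wrap; blank lines stay as paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs into one space.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


sample = "Text extrac-\ntion from\nPDF files.\n\nNext  paragraph."
print(tidy_extracted_text(sample))
```

The dehyphenation rule is deliberately naive; a dictionary lookup would be a safer choice where compound words matter.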

**Using pdfplumber Library**
The pdfplumber library works from the position of each character on the page to reconstruct lines, which often yields cleaner text:

In [None]:
import pdfplumber

text = ""
with pdfplumber.open('document.pdf') as pdf:
    for page in pdf.pages:
        # extract_text() may return None or an empty string for a page with no text layer
        text += (page.extract_text() or "") + "\n"
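
Parsing libraries read only the embedded text layer, so image-only (scanned) pages usually come back empty or nearly empty. A minimal sketch of a check that flags such pages as candidates for OCR, assuming the per-page strings have been collected into a list. The function name `pages_needing_ocr` and the `min_chars` threshold are hypothetical choices, not part of any of the libraries above.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return the indices of pages whose extracted text is missing or very short."""
    flagged = []
    for index, page_text in enumerate(page_texts):
        # Treat None the same as an empty string.
        stripped = (page_text or "").strip()
        if len(stripped) < min_chars:
            flagged.append(index)
    return flagged


# Example with made-up page strings: one real page, then three effectively empty ones.
sample_pages = ["Chapter 1. Introduction to text extraction.", "", None, "  \n "]
print(pages_needing_ocr(sample_pages))
```

The threshold is arbitrary; a page holding only a page number would also be flagged, which may or may not be wanted.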

**Using Google Cloud Vision API**
Google Cloud Vision provides advanced OCR capability to extract text from scanned PDFs. First, we need to convert each page of the PDF to an image. Then the Vision API can detect text in each image:

In [None]:
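from google.cloud import vision
import io
from pdf2image import convert_from_bytes  # requires the poppler utilities

client = vision.ImageAnnotatorClient()

with open('scanned.pdf', 'rb') as pdf:
    # Render each PDF page as a PIL image
    pages = convert_from_bytes(pdf.read())

full_text = ""
for page in pages:
    # Vision expects encoded image bytes (e.g. PNG), not raw pixel data
    buffer = io.BytesIO()
    page.save(buffer, format='PNG')
    image = vision.Image(content=buffer.getvalue())
    response = client.document_text_detection(image=image)
    full_text += response.full_text_annotation.text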

**Extracting Text from Images**
We can also extract text embedded in image files such as JPEGs and PNGs using similar OCR techniques.

**Using Google Cloud Vision API**
The Cloud Vision API provides a simple text_detection method to extract text from images:

In [None]:
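# Reuses `client` and `image` from the previous cell
response = client.text_detection(image=image)
# The first annotation holds the full detected text; the rest are individual words
text = response.text_annotations[0].description if response.text_annotations else ""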

**Using OpenCV and Tesseract OCR**
OpenCV can be used to detect text regions in an image. Then Tesseract OCR can extract text from those regions:

In [None]:
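import pytesseract
import cv2

img = cv2.imread('image.jpg')

# Detect text regions. One simple approach, assumed here: binarize, dilate so
# neighbouring characters merge into blocks, then take each block's bounding box.
# The kernel size is a guess and depends on font size and resolution.
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3))
dilated = cv2.dilate(binary, kernel, iterations=2)
# OpenCV 4.x returns (contours, hierarchy)
contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
rects = [cv2.boundingRect(contour) for contour in contours]

# Extract text from regions
text = ""
for rect in rects:
    x, y, w, h = rect
    text += pytesseract.image_to_string(img[y:y+h, x:x+w])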
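
A region detector may not return its boxes in reading order, so concatenating the OCR results directly can scramble the text. The sketch below shows one way to order `(x, y, w, h)` boxes top to bottom and then left to right before the extraction loop. The helper `sort_reading_order` and its `line_tolerance` parameter are hypothetical, and the grouping rule assumes roughly horizontal, single-column text.

```python
def sort_reading_order(rects, line_tolerance=10):
    """Order (x, y, w, h) boxes top-to-bottom, then left-to-right within a line."""
    lines = []  # each entry is a list of boxes judged to share one text line
    for rect in sorted(rects, key=lambda r: r[1]):
        # Join the current line if this box's top edge is close to that of
        # the line's first box; otherwise start a new line.
        if lines and abs(rect[1] - lines[-1][0][1]) <= line_tolerance:
            lines[-1].append(rect)
        else:
            lines.append([rect])
    ordered = []
    for line in lines:
        ordered.extend(sorted(line, key=lambda r: r[0]))
    return ordered


# Made-up boxes: three on one line (given out of order) and one on a second line.
boxes = [(200, 12, 50, 20), (10, 10, 50, 20), (10, 60, 50, 20), (120, 8, 50, 20)]
print(sort_reading_order(boxes))
```

Iterating over `sort_reading_order(rects)` instead of `rects` keeps the concatenated text closer to the order a reader would follow; multi-column layouts would need a more careful grouping.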