# Image to Text with Local Files - LangChain OpenAI

This notebook demonstrates how to analyze **local image files** stored on your computer using LangChain and OpenAI's vision models.

## What You'll Learn:
1. Load and encode local images
2. Use ChatOpenAI with vision capabilities
3. Different methods to handle local images
4. Batch processing multiple images
5. Practical use cases

Let's get started! 🚀

## 1. Setup and Installation

In [None]:
# Install required packages
# Run this in your terminal or uncomment below:
# !pip install langchain-openai langchain-core python-dotenv Pillow

In [None]:
# Import required libraries
import os
import base64
from pathlib import Path
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from PIL import Image
import io

# Load environment variables
load_dotenv()

# Verify API key
if not os.getenv("OPENAI_API_KEY"):
    print("⚠️ Warning: OPENAI_API_KEY not found in environment variables")
    print("Please create a .env file with: OPENAI_API_KEY=your_key_here")
else:
    print("✅ OpenAI API Key loaded successfully!")

## 2. Initialize Vision Model

We'll use an OpenAI chat model that accepts image input. Options include:
- `gpt-4o` - Most capable of the three
- `gpt-4o-mini` - Faster and more cost-effective
- `gpt-4-turbo` - Older, high-performance model

In [None]:
# Initialize the vision model
vision_model = ChatOpenAI(
    model="gpt-4o",  # Use gpt-4o for best vision capabilities
    temperature=0,   # Set to 0 for consistent results
    max_tokens=1000  # Adjust based on your needs
)

print(f"✅ Vision model initialized: {vision_model.model_name}")

## 3. Method 1: Base64 Encoding (Recommended)

This is the most reliable method for local images. We'll encode the image as base64 and include it in the message.
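
To make the encoding step concrete, here is a minimal sketch of the data URL that ends up inside the message. The helper name `build_data_url` is hypothetical (it is not part of LangChain or this notebook), and it uses the standard-library `mimetypes` module to guess the type from the file name, which is one alternative to a hand-written extension table.

```python
import base64
import mimetypes


def build_data_url(image_bytes, filename):
    """Return a data URL for raw image bytes (hypothetical helper)."""
    # Guess the MIME type from the file name; assume JPEG when it is unknown.
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        mime_type = "image/jpeg"
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# The bytes here are a stand-in, not a real image.
url = build_data_url(b"fake image bytes", "photo.png")
print(url[:40])
```

Everything after the comma is the base64 payload, so decoding it gives back the original bytes unchanged.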

In [None]:
def encode_image_to_base64(image_path):
    """
    Encode a local image file to base64 string.
    
    Args:
        image_path: Path to the local image file
        
    Returns:
        Base64 encoded string of the image
    """
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Confirm the helper is defined (this cell does not encode anything yet)
print("✅ Base64 encoding function ready!")
print("Usage: base64_image = encode_image_to_base64('path/to/your/image.jpg')")

### 3.1 Analyze a Local Image

In [None]:
def analyze_local_image(image_path, question="What's in this image?"):
    """
    Analyze a local image file using GPT-4 Vision.
    
    Args:
        image_path: Path to the local image file
        question: Question to ask about the image
        
    Returns:
        AI's response about the image
    """
    # Fail loudly if the file is missing so callers (such as the batch helper) can catch it
    if not os.path.exists(image_path):
        raise FileNotFoundError(f"File not found at {image_path}")
    
    # Get file extension to determine image type
    file_extension = Path(image_path).suffix.lower()
    
    # Map extensions to MIME types
    mime_types = {
        '.jpg': 'image/jpeg',
        '.jpeg': 'image/jpeg',
        '.png': 'image/png',
        '.gif': 'image/gif',
        '.webp': 'image/webp'
    }
    
    mime_type = mime_types.get(file_extension, 'image/jpeg')
    
    # Encode the image
    base64_image = encode_image_to_base64(image_path)
    
    # Create the message with image
    message = HumanMessage(
        content=[
            {"type": "text", "text": question},
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:{mime_type};base64,{base64_image}"
                }
            }
        ]
    )
    
    # Get response from the model
    response = vision_model.invoke([message])
    
    return response.content

print("✅ Image analysis function ready!")
print("Usage: result = analyze_local_image('path/to/image.jpg', 'What do you see?')")

### 3.2 Example: Analyze an Image

**Note:** Replace `"path/to/your/image.jpg"` with the actual path to your image file.

In [None]:
# Example usage - REPLACE WITH YOUR IMAGE PATH
# image_path = "path/to/your/image.jpg"
# result = analyze_local_image(image_path, "Describe this image in detail.")
# print("AI Analysis:")
# print(result)

print("💡 Tip: Uncomment the code above and replace with your image path")

## 4. Method 2: Using PIL/Pillow for Image Processing

This method allows you to resize or process images before sending them to the API.
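
As a rough illustration of what aspect-ratio-preserving resizing does to image dimensions, the sketch below computes the target size for a given bounding box. `fit_within` is a hypothetical helper written for this example; Pillow's own `thumbnail` may round slightly differently, so treat the numbers as approximate.

```python
def fit_within(width, height, max_size=(1024, 1024)):
    """Scale (width, height) down to fit inside max_size, keeping the aspect ratio."""
    max_width, max_height = max_size
    # Never scale up: cap the factor at 1.0 so small images are left alone.
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4000, 3000))  # a 4:3 photo shrinks to fit the 1024x1024 box
print(fit_within(800, 600))    # already small enough, so unchanged
```

Fewer pixels mean a smaller base64 payload, which is the main reason to resize before sending.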

In [None]:
def analyze_image_with_pil(image_path, question="What's in this image?", max_size=(1024, 1024)):
    """
    Analyze a local image with optional resizing using PIL.
    
    Args:
        image_path: Path to the local image file
        question: Question to ask about the image
        max_size: Maximum dimensions (width, height) for resizing
        
    Returns:
        AI's response about the image
    """
    # Open and process the image
    img = Image.open(image_path)
    
    # Resize if needed (maintains aspect ratio)
    img.thumbnail(max_size, Image.Resampling.LANCZOS)
    
    # Convert to RGB if necessary (handles RGBA, grayscale, etc.)
    if img.mode != 'RGB':
        img = img.convert('RGB')
    
    # Save to bytes buffer
    buffer = io.BytesIO()
    img.save(buffer, format='JPEG')
    buffer.seek(0)
    
    # Encode to base64
    base64_image = base64.b64encode(buffer.read()).decode('utf-8')
    
    # Create message
    message = HumanMessage(
        content=[
            {"type": "text", "text": question},
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{base64_image}"
                }
            }
        ]
    )
    
    # Get response
    response = vision_model.invoke([message])
    
    return response.content

print("✅ PIL-based image analysis function ready!")
print("This method automatically resizes large images and handles different formats.")

## 5. Batch Processing Multiple Images

In [None]:
def analyze_multiple_images(image_paths, question="What's in this image?"):
    """
    Analyze multiple images and return results.
    
    Args:
        image_paths: List of paths to image files
        question: Question to ask about each image
        
    Returns:
        Dictionary with image paths as keys and analysis as values
    """
    results = {}
    
    for i, image_path in enumerate(image_paths, 1):
        print(f"\nAnalyzing image {i}/{len(image_paths)}: {Path(image_path).name}")
        
        try:
            result = analyze_local_image(image_path, question)
            results[image_path] = result
            print("✅ Success")
        except Exception as e:
            results[image_path] = f"Error: {e}"
            print(f"❌ Error: {e}")
    
    return results

print("✅ Batch processing function ready!")
print("Usage: results = analyze_multiple_images(['img1.jpg', 'img2.png'])")
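
Because each image is an independent API call, the loop can also be run concurrently. The following is a minimal sketch using the standard-library `ThreadPoolExecutor`; `analyze_concurrently` and the `fake_analyze` stand-in are hypothetical names for this example, and in the notebook you would pass `analyze_local_image` as the `analyze` argument instead.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_concurrently(image_paths, analyze, question="What's in this image?", max_workers=4):
    """Run analyze(path, question) for each path in parallel threads."""
    def safe_analyze(path):
        # Capture failures per image so one bad file does not stop the batch.
        try:
            return analyze(path, question)
        except Exception as exc:
            return f"Error: {exc}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as image_paths.
        return dict(zip(image_paths, pool.map(safe_analyze, image_paths)))


def fake_analyze(path, question):
    """Stand-in for analyze_local_image so the sketch runs without an API key."""
    if path.endswith(".txt"):
        raise ValueError("not an image")
    return f"description of {path}"


results = analyze_concurrently(["a.jpg", "b.png", "notes.txt"], fake_analyze)
for path, analysis in results.items():
    print(path, "->", analysis)
```

Keeping `max_workers` small is a reasonable starting point, since many simultaneous requests can run into API rate limits.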

### 5.1 Example: Batch Analysis

In [None]:
# Example: Analyze multiple images
# image_list = [
#     "path/to/image1.jpg",
#     "path/to/image2.png",
#     "path/to/image3.jpg"
# ]
# 
# results = analyze_multiple_images(image_list, "Describe this image briefly.")
# 
# # Display results
# for image_path, analysis in results.items():
#     print(f"\n{'='*60}")
#     print(f"Image: {Path(image_path).name}")
#     print(f"{'='*60}")
#     print(analysis)

print("💡 Tip: Uncomment and add your image paths to test batch processing")

## 6. Practical Use Cases

Here are some common use cases with ready-to-use functions.

### 6.1 Extract Text from Images (OCR)

In [None]:
def extract_text_from_image(image_path):
    """
    Extract all text from an image (OCR functionality).
    """
    question = "Extract all text from this image. Provide only the text content, maintaining the original formatting."
    return analyze_local_image(image_path, question)

# Example usage:
# text = extract_text_from_image("screenshot.png")
# print("Extracted Text:")
# print(text)

print("✅ OCR function ready!")

### 6.2 Identify Objects in Images

In [None]:
def identify_objects(image_path):
    """
    Identify and list all objects in an image.
    """
    question = "List all objects you can identify in this image. Provide a detailed inventory."
    return analyze_local_image(image_path, question)

# Example usage:
# objects = identify_objects("room_photo.jpg")
# print("Objects Identified:")
# print(objects)

print("✅ Object identification function ready!")

### 6.3 Describe Image for Accessibility

In [None]:
def create_alt_text(image_path):
    """
    Generate accessibility-friendly alt text for an image.
    """
    question = "Create a concise, descriptive alt text for this image suitable for screen readers. Focus on the main subject and important details."
    return analyze_local_image(image_path, question)

# Example usage:
# alt_text = create_alt_text("product_image.jpg")
# print("Alt Text:")
# print(alt_text)

print("✅ Alt text generation function ready!")

### 6.4 Analyze Document/Receipt

In [None]:
def analyze_receipt(image_path):
    """
    Extract structured information from a receipt or invoice.
    """
    question = """Analyze this receipt/invoice and extract:
    1. Vendor/Store name
    2. Date and time
    3. Items purchased with prices
    4. Subtotal, tax, and total amount
    5. Payment method (if visible)
    
    Format the response in a clear, structured way."""
    return analyze_local_image(image_path, question)

# Example usage:
# receipt_data = analyze_receipt("receipt.jpg")
# print("Receipt Analysis:")
# print(receipt_data)

print("✅ Receipt analysis function ready!")

### 6.5 Compare Two Images

In [None]:
def compare_images(image_path1, image_path2):
    """
    Compare two images and describe the differences.
    """
    # Same extension-to-MIME mapping used in analyze_local_image,
    # so PNG, GIF, and WebP files are not mislabeled as JPEG
    mime_types = {
        '.jpg': 'image/jpeg',
        '.jpeg': 'image/jpeg',
        '.png': 'image/png',
        '.gif': 'image/gif',
        '.webp': 'image/webp'
    }

    def to_data_url(image_path):
        # Fall back to JPEG for unknown extensions
        mime_type = mime_types.get(Path(image_path).suffix.lower(), 'image/jpeg')
        return f"data:{mime_type};base64,{encode_image_to_base64(image_path)}"

    # Create message with both images
    message = HumanMessage(
        content=[
            {"type": "text", "text": "Compare these two images and describe the differences and similarities."},
            {"type": "image_url", "image_url": {"url": to_data_url(image_path1)}},
            {"type": "image_url", "image_url": {"url": to_data_url(image_path2)}}
        ]
    )
    
    response = vision_model.invoke([message])
    return response.content

# Example usage:
# comparison = compare_images("before.jpg", "after.jpg")
# print("Image Comparison:")
# print(comparison)

print("✅ Image comparison function ready!")

## 7. Process Images from a Directory

In [None]:
def analyze_directory(directory_path, question="What's in this image?", extensions=None):
    """
    Analyze all images in a directory.
    
    Args:
        directory_path: Path to directory containing images
        question: Question to ask about each image
        extensions: List of file extensions to process (default: common image formats)
        
    Returns:
        Dictionary with results
    """
    if extensions is None:
        extensions = ['.jpg', '.jpeg', '.png', '.gif', '.webp']
    
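    # Get all image files in directory, matching extensions case-insensitively.
    # Globbing both "*.jpg" and "*.JPG" can list a file twice on Windows
    # and still misses mixed-case names such as ".Jpg".
    allowed = {ext.lower() for ext in extensions}
    image_files = sorted(
        f for f in Path(directory_path).iterdir()
        if f.is_file() and f.suffix.lower() in allowed
    )
    
    image_paths = [str(f) for f in image_files]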
    
    print(f"Found {len(image_paths)} images in {directory_path}")
    
    if not image_paths:
        return {"error": "No images found in directory"}
    
    # Analyze all images
    return analyze_multiple_images(image_paths, question)

# Example usage:
# results = analyze_directory("./my_images", "Describe this image briefly.")
# for img_path, analysis in results.items():
#     print(f"\n{Path(img_path).name}: {analysis[:100]}...")

print("✅ Directory analysis function ready!")

## 8. Advanced: Custom Analysis with Structured Output

In [None]:
def analyze_with_json_output(image_path, schema_description):
    """
    Analyze image and request JSON-formatted output.
    
    Args:
        image_path: Path to image
        schema_description: Description of desired JSON structure
    """
    question = f"""{schema_description}
    
    Provide your response in valid JSON format."""
    
    return analyze_local_image(image_path, question)

# Example usage:
# schema = '''Analyze this product image and provide:
# {
#   "product_name": "name of the product",
#   "color": "primary color",
#   "category": "product category",
#   "description": "brief description"
# }'''
# 
# result = analyze_with_json_output("product.jpg", schema)
# print(result)

print("✅ Structured output function ready!")
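
The function above returns the model's reply as plain text, and a reply may wrap the JSON in a Markdown code fence or add a sentence around it. As a minimal sketch, assuming the reply contains a single JSON object, the hypothetical helper `parse_json_reply` below pulls out the outermost braces and parses them with the standard library.

```python
import json
import re


def parse_json_reply(reply_text):
    """Parse the JSON object spanning the first "{" to the last "}" in a reply."""
    # Greedy match so nested objects stay intact.
    match = re.search(r"\{.*\}", reply_text, re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in reply")
    return json.loads(match.group(0))


# A made-up reply, shaped like what a model might return.
sample_reply = (
    "Here is the result:\n"
    '{"product_name": "Desk lamp", "color": "black"}\n'
    "Let me know if you need more."
)
data = parse_json_reply(sample_reply)
print(data["product_name"])
```

`json.loads` still raises `json.JSONDecodeError` when the extracted text is not valid JSON, so callers should be ready to handle that case.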

## 9. Tips and Best Practices

### Image Quality
- ✅ Use clear, well-lit images
- ✅ Ensure text is readable (for OCR tasks)
- ✅ Resize very large images to reduce costs

### Supported Formats
- ✅ JPEG/JPG
- ✅ PNG
- ✅ GIF
- ✅ WebP

### Cost Optimization
- Use `gpt-4o-mini` for simpler tasks (cheaper)
- Resize images before encoding
- Combine related questions about the same image into a single request
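
One way to act on the last tip is to send several questions about the same image in one request, so the image is encoded and sent once rather than once per question. A minimal sketch, where `combine_questions` is a hypothetical helper and the resulting string would be passed as the `question` argument of `analyze_local_image`:

```python
def combine_questions(questions):
    """Merge several questions into one numbered prompt (hypothetical helper)."""
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, 1))
    return (
        "Answer each of the following questions about the image. "
        "Number your answers to match.\n" + numbered
    )


prompt = combine_questions([
    "What is the main subject?",
    "Is there any visible text?",
])
print(prompt)
```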

### Error Handling
- Always check if files exist before processing
- Handle different image formats appropriately
- Use try-except blocks for production code
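
The first two checks can be sketched as a small validation step that runs before any API call. `validate_image_path` is a hypothetical helper, and the extension set simply mirrors the formats listed above.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def validate_image_path(image_path):
    """Return a Path if the file exists and has a supported extension, else raise."""
    path = Path(image_path)
    if not path.is_file():
        raise FileNotFoundError(f"File not found: {path}")
    # Compare in lowercase so "photo.PNG" is accepted as well.
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported image format: {path.suffix or '(none)'}")
    return path


try:
    validate_image_path("does_not_exist.jpg")
except FileNotFoundError as exc:
    print(exc)
```

Raising exceptions, rather than returning an error string, lets a surrounding `try`/`except` tell failures apart from real model output.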

## 10. Complete Example Workflow

In [None]:
# Complete workflow example
def complete_image_analysis_workflow(image_path):
    """
    Perform comprehensive analysis on a single image.
    """
    print(f"Analyzing: {Path(image_path).name}")
    print("="*60)
    
    # 1. General description
    print("\n1. General Description:")
    description = analyze_local_image(image_path, "Provide a detailed description of this image.")
    print(description)
    
    # 2. Object identification
    print("\n2. Objects Identified:")
    objects = identify_objects(image_path)
    print(objects)
    
    # 3. Text extraction (if any)
    print("\n3. Text Content:")
    text = extract_text_from_image(image_path)
    print(text)
    
    # 4. Alt text
    print("\n4. Accessibility Alt Text:")
    alt = create_alt_text(image_path)
    print(alt)
    
    print("\n" + "="*60)
    print("✅ Analysis complete!")

# Example usage:
# complete_image_analysis_workflow("my_image.jpg")

print("✅ Complete workflow function ready!")

## 11. Quick Reference

### Basic Usage
```python
# Analyze single image
result = analyze_local_image("image.jpg", "What do you see?")

# Extract text (OCR)
text = extract_text_from_image("document.png")

# Identify objects
objects = identify_objects("photo.jpg")

# Compare two images
comparison = compare_images("img1.jpg", "img2.jpg")

# Batch process
results = analyze_multiple_images(["img1.jpg", "img2.jpg"])

# Process directory
results = analyze_directory("./images")
```

### Common Questions to Ask
- "What's in this image?"
- "Describe this image in detail."
- "Extract all text from this image."
- "What objects can you identify?"
- "Is there any text? If so, what does it say?"
- "What is the main subject of this image?"
- "Describe the colors and composition."
- "What activity is happening in this image?"

## Conclusion

You now have a complete toolkit for analyzing local images with LangChain and OpenAI! 🎉

### What You Learned:
✅ Encode local images to base64  
✅ Use ChatOpenAI for vision tasks  
✅ Process images with PIL/Pillow  
✅ Batch process multiple images  
✅ Practical use cases (OCR, object detection, etc.)  
✅ Compare images  
✅ Process entire directories  

### Next Steps:
- Try different vision models (gpt-4o vs gpt-4o-mini)
- Experiment with different prompts
- Build a custom image analysis pipeline
- Integrate with your applications

Happy coding! 🚀