<a href="https://colab.research.google.com/github/JadeEmm/ai-image-captioning-app/blob/main/AI_Image_Captioning_App_(Public).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ============================================================================
# CELL 1: Welcome & Instructions (Markdown)
# ============================================================================

"""
# AI Image Captioning App - Try It Now!

Welcome! This notebook lets you run a professional AI image captioning app directly in your browser.

## How to use this notebook:

1. **Click "Copy to Drive"** (top-left) to get your own copy
2. **Run all cells** below in order (Ctrl+Enter or click the play buttons)
3. **Wait for the app to load** (takes 1-2 minutes first time)
4. **Click the public link** that appears to access your app
5. **Upload images and get AI captions!**

## What you'll get:
- A working AI app with public link to share
- State-of-the-art image captioning (Salesforce BLIP model)
- No setup required - everything runs in the cloud
- Free to use and share with friends!

## Real-world use cases this app is great for:
- Content creators needing captions
- Learning about AI and computer vision
- Accessibility (generating alt text)

---

**Ready? Let's build your AI app!**

"""

# ============================================================================
# CELL 2: Setup & Installation
# ============================================================================

print("Setting up your AI Image Captioning App...")
print("Installing required packages...")

# Install packages quietly (pip's progress bar is turned off)
!pip install transformers torch torchvision gradio Pillow --quiet --progress-bar off

print("All packages installed successfully!")
print("Loading libraries...")

# Import everything we need
from transformers import pipeline
import gradio as gr
import torch
from PIL import Image
import warnings
warnings.filterwarnings("ignore")

print("Libraries loaded!")

# Check what device we're using
device = 0 if torch.cuda.is_available() else -1
device_name = "GPU" if device == 0 else "CPU 💻"
print(f"Your app will use: {device_name}")

print("\n" + "="*50)
print("Setup complete! Ready to load AI model...")
print("="*50)

# ============================================================================
# CELL 3: Load AI Model
# ============================================================================

print("Loading AI model...")
print("Downloading Salesforce BLIP model (this takes a moment)...")

# Load the AI model using Hugging Face pipeline
try:
    caption_pipeline = pipeline(
        "image-to-text",
        model="Salesforce/blip-image-captioning-base",
        device=device
    )
    print("AI model loaded successfully!")
    print("Ready to understand and describe images!")

except Exception as e:
    print(f"Error loading model: {e}")
    print("Trying CPU fallback...")

    # Fallback to CPU
    caption_pipeline = pipeline(
        "image-to-text",
        model="Salesforce/blip-image-captioning-base",
        device=-1
    )
    print("Model loaded on CPU!")

print("\nAI is ready to caption your images!")
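
"""
The try/except in this cell could also be written as a small reusable helper. The sketch below
is only an illustration: `load_with_fallback` and its `build` argument are hypothetical names
that are not part of this notebook, and the helper simply tries each device in order.

```python
def load_with_fallback(build, devices=(0, -1)):
    # Try each device in order and return (model, device) for the first one that works.
    # build is a hypothetical one-argument factory, for example:
    #   lambda d: pipeline("image-to-text",
    #                      model="Salesforce/blip-image-captioning-base", device=d)
    last_error = None
    for device in devices:
        try:
            return build(device), device
        except Exception as error:  # broad on purpose, like the try/except in this cell
            print(f"Loading on device {device} failed: {error}")
            last_error = error
    raise RuntimeError("Model could not be loaded on any device") from last_error
```

Returning the device alongside the model makes it easy to report which one was actually used.
"""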

# ============================================================================
# CELL 4: Caption Generation Function
# ============================================================================

def generate_caption(image):
    """
    Generate caption for uploaded image
    This is the main function that powers your app!
    """
    if image is None:
        return "❌ Please upload an image first!"

    try:
        print("AI is analysing your image...")

        # Make sure image is in the right format
        if image.mode != 'RGB':
            image = image.convert('RGB')

        # Generate caption using AI
        result = caption_pipeline(image)

        # Extract the caption
        if result:
            caption = result[0]['generated_text']
            print(f"Caption generated: '{caption}'")
            return caption
        else:
            return "Could not generate caption for this image."

    except Exception as e:
        return f"❌ Error: {str(e)}"

# Test the function
print("Testing the AI with a simple image...")

# Create a test image (blue square)
import numpy as np
test_img = Image.fromarray(np.full((100, 100, 3), [100, 150, 200], dtype=np.uint8))
test_result = generate_caption(test_img)
print(f"Test result: {test_result}")

if not test_result.startswith(("❌", "Could not")):
    print("✅ AI is working perfectly!")
else:
    print("⚠️ There might be an issue - but let's continue...")

# ============================================================================
# CELL 5: Create the App Interface
# ============================================================================

print("Creating your app interface...")

# Create the Gradio interface
app = gr.Interface(
    fn=generate_caption,
    inputs=[
        gr.Image(
            label="Upload Any Image",
            type="pil",
            sources=["upload", "webcam"],
            height=350
        )
    ],
    outputs=[
        gr.Textbox(
            label="AI Generated Caption",
            placeholder="Upload an image and watch AI describe it here!",
            lines=3,
            max_lines=5
        )
    ],

    title="AI Image Captioning App",

    description="""
    ## Upload any image and get an instant AI description!

    **What this does:**
    - Upload a photo from your computer or take one with your webcam
    - Advanced AI analyses the image content
    - Get a natural language description in seconds

    **Real-world use cases this app is great for:**
    - Content creators needing captions
    - Accessibility (alt text generation)
    - Learning how AI "sees" images

    **Powered by:** Salesforce BLIP - a state-of-the-art vision AI model
    """,

    article="""
    ### How it works:

    This app uses **BLIP** (Bootstrapping Language-Image Pre-training), one of the most advanced
    AI models for understanding images. It was trained on millions of image-text pairs to learn
    how to describe visual content in natural language.

    ### Educational Value:

    This demonstrates the incredible progress in **Computer Vision** and **Natural Language Processing**.
    The AI doesn't just recognize objects - it understands relationships, contexts, and can describe
    complex scenes in human-like language.

    ### Technical Details:
    - **Model:** Salesforce BLIP (`Salesforce/blip-image-captioning-base` checkpoint)
    - **Framework:** Hugging Face Transformers
    - **Interface:** Gradio
    - **Runtime:** Google Colab (free)

    ---

    **Enjoy exploring AI image captioning!**

    *Built using Python, Transformers, and Gradio*
    """,

    theme=gr.themes.Soft(),
    allow_flagging="never"
)

print("✅ Interface created successfully!")
print("Your app is ready to launch!")

# ============================================================================
# CELL 6: Launch the App
# ============================================================================

print("LAUNCHING YOUR AI IMAGE CAPTIONING APP!")
print("Creating public link... (this takes a moment)")
print("Your app will be accessible to anyone with the link!")

# Close any existing apps
gr.close_all()

# Launch with public sharing
app.launch(
    share=True,      # Creates public link anyone can access
    debug=False,     # Don't block the cell; set True to print errors in the Colab output
    show_error=True, # Show helpful error messages
    quiet=False      # Show the public link
)

print("SUCCESS! Your AI app is now running!")
print("\nIMPORTANT: Use the public link above to access your app")
print("🔗 Share this link with friends - they can use your app too")
print("Try it with photos, screenshots, drawings - anything!")

print("\n" + "="*60)
print("CONGRATULATIONS!")
print("You've successfully built and deployed an AI application!")
print("="*60)

# ============================================================================
# CELL 7: Bonus Features & Tips (Markdown)
# ============================================================================

"""
## Tips for Best Results:

### Image Quality:
- **Clear, well-lit photos** work best
- **Single main subject** gets more accurate descriptions
- **Common objects/scenes** are described most accurately
- **High resolution** isn't necessary - the model resizes every image before captioning it
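
Because the model resizes its input anyway, shrinking a very large photo first mainly saves upload and decoding time. The sketch below only computes a target size that keeps the aspect ratio; `fit_within` and the 512-pixel limit are illustrative choices, not something this notebook defines.

```python
def fit_within(width, height, max_side=512):
    # Return a (width, height) pair whose longest side is at most max_side,
    # keeping the aspect ratio. Images that are already small are left alone.
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With Pillow, the result could be passed to `image.resize(...)`; calling `image.thumbnail((512, 512))` has a similar effect in place.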

### Experiments to Try:
- Upload artwork and see how AI interprets it
- Try historical photos vs modern ones
- Test with drawings or sketches
- Compare AI descriptions with your own

### Sharing Your App:
- The public link is temporary: Gradio prints its expiry at launch, and it stops working when this Colab session ends
- Anyone can use it without signing up
- The app itself sets no caption limit, though free Colab sessions eventually time out

### Customisation Ideas:
- Try different AI models (change the model name in Cell 3)
- Add multiple caption generation
- Include confidence scores
- Build batch processing for multiple images
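
Two of the ideas above, multiple captions and batch processing, could be sketched roughly as follows. This is a hedged illustration rather than part of the app: `top_captions` and `caption_batch` are hypothetical helper names, `captioner` stands for something like `caption_pipeline` from Cell 3, and the code assumes it accepts a `generate_kwargs` argument and returns a list of dicts with a `generated_text` key.

```python
def top_captions(image, captioner, how_many=3):
    # Ask for several beam-search candidates in a single call.
    # Assumption: captioner forwards generate_kwargs to text generation.
    records = captioner(
        image,
        generate_kwargs={"num_beams": how_many, "num_return_sequences": how_many},
    )
    # Keep the original order, trim whitespace, and drop exact duplicates.
    unique = []
    for record in records:
        text = record["generated_text"].strip()
        if text and text not in unique:
            unique.append(text)
    return unique


def caption_batch(images, captioner):
    # Caption images one at a time; an empty string marks "no caption returned".
    captions = []
    for image in images:
        records = captioner(image)
        captions.append(records[0]["generated_text"].strip() if records else "")
    return captions
```

Passing the captioner in as an argument keeps both helpers easy to try with a stand-in function before the real model is loaded.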

---

## What You've Learned:

✅ How to use Hugging Face Transformers
✅ Computer vision and AI model deployment
✅ Creating web interfaces with Gradio
✅ Running AI models in the cloud
✅ Building shareable AI applications


### Want the source code? Check out the GitHub repository for the full project: