# 📄 Document Insights Generator

## Welcome!

This notebook helps you:
- Upload documents (PDF, Word, Text)
- Ask questions about them
- Get AI-powered insights
- Save everything to Google Drive

**No local installation needed: everything runs in Google Colab, in your browser.**

---

## 🔑 What You Need:
1. A Google account (you already have one if you're using Colab)
2. An OpenAI API key (create one at: https://platform.openai.com/api-keys). Creating a key is free, but API usage is billed to your OpenAI account.

---

## Step 1: Install Required Libraries

**Click the play button (▶️) on the left of this cell to install everything needed.**

This takes about 30 seconds.

In [None]:
# Install required packages (google.colab is preinstalled in Colab, so it is not listed)
!pip install -q openai PyPDF2 python-docx
print('✅ Install step finished. If pip printed errors above, run this cell again.')
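
To confirm the install actually worked, one option is to check that each module can be found before moving on. The sketch below is not part of the original notebook; `missing_modules` is a hypothetical helper name. Note that the import name can differ from the pip name (`python-docx` is imported as `docx`).

```python
import importlib.util

def missing_modules(module_names):
    """Return the names in module_names that cannot be located for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]

# Import names used later in this notebook (python-docx is imported as "docx").
required = ["openai", "PyPDF2", "docx"]
missing = missing_modules(required)

if missing:
    print(f"Missing modules: {missing}. Re-run the install cell above.")
else:
    print("All required modules can be imported.")
```

If anything is reported missing, re-running the install cell is usually the first thing to try.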

## Step 2: Connect to Google Drive

**Run this cell and follow the popup to allow access to your Google Drive.**

This lets the notebook save your files and insights automatically.

In [None]:
from google.colab import drive
import os

# Mount Google Drive
drive.mount('/content/drive')

# Create a folder for document insights if it doesn't exist
insights_folder = '/content/drive/MyDrive/DocumentInsights'
os.makedirs(insights_folder, exist_ok=True)

print('✅ Connected to Google Drive!')
print(f'📁 Your files will be saved to: {insights_folder}')
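
Before relying on the folder, it can help to confirm that files can really be created in it. The following is a minimal sketch under assumptions: `is_writable_folder` is a hypothetical helper, not something the notebook defines. The demo uses a temporary directory so it also runs outside Colab; in the notebook you would pass `insights_folder` instead.

```python
import os
import tempfile

def is_writable_folder(folder):
    """Return True if a small temporary file can be created inside folder."""
    try:
        os.makedirs(folder, exist_ok=True)
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=folder):
            pass
        return True
    except OSError:
        return False

# Demo with a throwaway directory; in Colab, use insights_folder here.
demo_folder = tempfile.mkdtemp()
print(f"Writable: {is_writable_folder(demo_folder)}")
```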

## Step 3: Enter Your OpenAI API Key

**Get your API key:**
1. Go to: https://platform.openai.com/api-keys
2. Click 'Create new secret key'
3. Copy the key
4. Paste it below (it will be hidden)

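**Note:** API usage is billed to your OpenAI account. New accounts may include a small free trial credit, depending on OpenAI's current terms. Each analysis sends at most about 4,000 characters of document text and requests up to 800 output tokens, so individual requests are small.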

In [None]:
import getpass
import openai

# Enter your OpenAI API key securely
openai_api_key = getpass.getpass('Enter your OpenAI API Key: ')
openai.api_key = openai_api_key

print('✅ API Key set successfully!')
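
Pasted keys sometimes pick up stray spaces or a trailing newline, and an empty entry only fails later, in Step 7. A small sketch of a check you could apply to the value returned by `getpass.getpass`, assuming a hypothetical helper named `normalize_api_key`:

```python
def normalize_api_key(raw_key):
    """Strip surrounding whitespace and reject an empty key."""
    key = raw_key.strip()
    if not key:
        raise ValueError("No API key entered. Re-run the cell and paste your key.")
    return key

# Placeholder text only; this is not a real key.
print(normalize_api_key("  example-key-not-real \n"))
```

This only catches obvious paste mistakes. It does not verify that the key is valid or has credit.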

## Step 4: Define Helper Functions

**Just run this cell - no need to understand the code.**

These functions handle document processing and insight generation.

In [None]:
import os
from PyPDF2 import PdfReader
from docx import Document as DocxDocument
from datetime import datetime
import shutil

def extract_text_from_file(file_path):
    """Extract text from PDF, DOCX, or TXT files"""
    try:
        if file_path.lower().endswith('.pdf'):
            reader = PdfReader(file_path)
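            # Pages without a text layer (e.g. scans) may yield None or '',
            # so fall back to an empty string before joining.
            pages = [page.extract_text() or '' for page in reader.pages]
            return '\n'.join(pages).strip()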
        
        elif file_path.lower().endswith('.docx'):
            doc = DocxDocument(file_path)
            text = '\n'.join([para.text for para in doc.paragraphs])
            return text.strip()
        
        elif file_path.lower().endswith('.txt'):
            with open(file_path, 'r', encoding='utf-8') as f:
                return f.read().strip()
        
        else:
            return "Unsupported file format. Please use PDF, DOCX, or TXT."
    
    except Exception as e:
        return f"Error extracting text: {str(e)}"

def generate_insights(document_text, question, max_length=4000):
    """Generate insights using OpenAI"""
    try:
        # Truncate document if too long
        if len(document_text) > max_length:
            document_text = document_text[:max_length] + "..."
        
        prompt = f"""Read the following document and answer this question: {question}

Document:
{document_text}

Please provide a clear, detailed answer based only on the information in the document."""
        
        # openai>=1.0 interface; the older openai.ChatCompletion was removed in 1.0
        response = openai.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful assistant that analyzes documents and provides clear insights."},
                {"role": "user", "content": prompt}
            ],
            max_tokens=800,
            temperature=0.3
        )
        
        return response.choices[0].message.content
    
    except Exception as e:
        return f"Error generating insights: {str(e)}"

def save_to_drive(original_file_path, insights_text, question):
    """Save document and insights to Google Drive"""
    try:
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        base_name = os.path.basename(original_file_path)
        name_without_ext = os.path.splitext(base_name)[0]
        
        # Copy original document to Drive
        drive_doc_path = os.path.join(insights_folder, f"{timestamp}_{base_name}")
        shutil.copy(original_file_path, drive_doc_path)
        
        # Save insights to Drive
        insights_filename = f"{timestamp}_{name_without_ext}_insights.txt"
        insights_path = os.path.join(insights_folder, insights_filename)
        
        with open(insights_path, 'w', encoding='utf-8') as f:
            f.write(f"Document: {base_name}\n")
            f.write(f"Question: {question}\n")
            f.write(f"Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
            f.write("=" * 80 + "\n\n")
            f.write("INSIGHTS:\n\n")
            f.write(insights_text)
        
        return drive_doc_path, insights_path
    
    except Exception as e:
        return None, f"Error saving to Drive: {str(e)}"

print('✅ Helper functions loaded!')
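
`generate_insights` shortens long documents by slicing at `max_length` characters, which can cut the text in the middle of a word. One possible refinement, shown here as a sketch rather than as the notebook's own method, is to back up to the last space before the limit. `truncate_at_word` is a hypothetical helper name.

```python
def truncate_at_word(text, max_length=4000, suffix="..."):
    """Shorten text to at most max_length characters, ending on a whole word."""
    if len(text) <= max_length:
        return text
    cut = text[:max_length]
    last_space = cut.rfind(" ")
    # Only back up if there is a space to back up to; otherwise keep the hard cut.
    if last_space > 0:
        cut = cut[:last_space]
    return cut.rstrip() + suffix

print(truncate_at_word("hello world again", max_length=13))
```

The sketch only looks for plain spaces, so text broken up by newlines or tabs alone would still get a hard cut.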

## Step 5: Upload Your Document

**Run this cell and click 'Choose Files' to upload your document.**

Supported formats: PDF, DOCX, TXT

In [None]:
from google.colab import files

print('📤 Click "Choose Files" below to upload your document...')
uploaded = files.upload()

if uploaded:
    # If several files were selected, only the first one is analyzed
    uploaded_filename = list(uploaded.keys())[0]
    print(f'\n✅ File uploaded: {uploaded_filename}')
    print(f'📊 File size: {len(uploaded[uploaded_filename]) / 1024:.2f} KB')
else:
    print('❌ No file uploaded. Please run this cell again.')
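
An unsupported file is only rejected later, in Step 7. As a sketch, the uploaded names could be checked right away. The `fake_uploaded` dictionary below stands in for the `uploaded` value returned by `files.upload()` (file names mapped to bytes), and `split_by_support` is a hypothetical helper.

```python
import os

SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}

def split_by_support(filenames):
    """Split filenames into (supported, unsupported) lists by extension."""
    supported, unsupported = [], []
    for name in filenames:
        extension = os.path.splitext(name)[1].lower()
        if extension in SUPPORTED_EXTENSIONS:
            supported.append(name)
        else:
            unsupported.append(name)
    return supported, unsupported

# Stand-in for the dictionary returned by files.upload() in Colab.
fake_uploaded = {"report.PDF": b"...", "notes.doc": b"...", "todo.txt": b"..."}
ok, rejected = split_by_support(fake_uploaded.keys())
print("Supported:", ok)
print("Unsupported:", rejected)
```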

## Step 6: Ask Your Question

**Type your question in the prompt below.**

Examples:
- "What are the main points in this document?"
- "Summarize the key findings"
- "What are the risks mentioned?"
- "List the recommendations"

In [None]:
question = input('❓ Enter your question about the document: ')
print(f'\n✅ Question recorded: {question}')
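
If the question is left blank, the model receives an empty question. A minimal sketch of a fallback, assuming a hypothetical helper `clean_question` that tidies whitespace and substitutes the first example question above when nothing was typed:

```python
DEFAULT_QUESTION = "What are the main points in this document?"

def clean_question(raw, default=DEFAULT_QUESTION):
    """Collapse runs of whitespace; fall back to default if nothing remains."""
    question = " ".join(raw.split())
    return question if question else default

print(clean_question("  What   are the risks mentioned? "))
print(clean_question("   "))
```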

## Step 7: Generate Insights and Save to Google Drive

**Run this cell to:**
1. Extract text from your document
2. Generate AI-powered insights
3. Display the results
4. Save everything to Google Drive

This may take 10-30 seconds depending on document size.

In [None]:
print('🔄 Processing your document...\n')

# Extract text
print('📖 Step 1: Extracting text from document...')
document_text = extract_text_from_file(uploaded_filename)

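# The helpers report failures as strings with these exact prefixes. Matching the
# prefix avoids rejecting documents that merely contain the word "Error".
if document_text.startswith(("Error extracting text:", "Unsupported file format.")):
    print(f'❌ {document_text}')
elif not document_text:
    print('❌ No text could be extracted. The file may be empty or a scanned image.')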
else:
    print(f'✅ Text extracted successfully ({len(document_text)} characters)\n')
    
    # Show preview of extracted text
    print('📄 Text Preview (first 500 characters):')
    print('-' * 80)
    print((document_text[:500] + '...') if len(document_text) > 500 else document_text)
    print('-' * 80 + '\n')
    
    # Generate insights
    print('🤖 Step 2: Generating AI insights...')
    insights = generate_insights(document_text, question)
    
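    # Match the helper's error prefix so answers that mention "Error" are not rejected
    if insights.startswith("Error generating insights:"):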
        print(f'❌ {insights}')
    else:
        print('✅ Insights generated!\n')
        
        # Display insights
        print('=' * 80)
        print('🎯 INSIGHTS:')
        print('=' * 80)
        print(insights)
        print('=' * 80 + '\n')
        
        # Save to Google Drive
        print('💾 Step 3: Saving to Google Drive...')
        doc_path, insights_path = save_to_drive(uploaded_filename, insights, question)
        
        if doc_path and insights_path:
            print('✅ Files saved successfully!')
            print(f'📁 Document: {doc_path}')
            print(f'📁 Insights: {insights_path}')
            print('\n🎉 All done! Check your Google Drive folder: DocumentInsights')
        else:
            print(f'⚠️ Could not save to Drive: {insights_path}')
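
The helper functions in Step 4 report failures as strings, so this cell has to inspect the returned text to tell a result from an error. One alternative, sketched here under the assumption that the helpers raise exceptions instead of returning messages, is to return an explicit success flag alongside the value. `run_step` and `read_text` are hypothetical names, not part of the notebook.

```python
def run_step(func, *args):
    """Run func(*args); return (True, result) or (False, error message)."""
    try:
        return True, func(*args)
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"

def read_text(path):
    """Example step that raises if the file does not exist."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

ok, value = run_step(read_text, "does_not_exist.txt")
print("Success" if ok else f"Failed: {value}")

# A result that contains the word "Error" is still treated as a success.
ok, value = run_step(str.upper, "Error budget report")
print(ok, value)
```

With this pattern the content of the document or the answer never affects whether a step counts as failed.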

## 🔄 Process Another Document

**Want to analyze another document?**

Just go back to **Step 5** and run the cells from there again!

---

## 📚 Additional Features

### View All Your Saved Files

In [None]:
import os

print('📂 Your saved documents and insights:\n')
files_list = os.listdir(insights_folder)

if files_list:
    for i, file in enumerate(sorted(files_list), 1):
        file_size = os.path.getsize(os.path.join(insights_folder, file)) / 1024
        print(f'{i}. {file} ({file_size:.2f} KB)')
else:
    print('No files saved yet.')
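
Because `save_to_drive` prefixes both the document copy and its insights file with the same `YYYYMMDD_HHMMSS` timestamp, the listing can be grouped so each analysis appears together. The sketch below uses made-up file names and a hypothetical helper `group_by_run`; in the notebook you would pass `os.listdir(insights_folder)`.

```python
import re
from collections import defaultdict

# Matches the "YYYYMMDD_HHMMSS_" prefix that save_to_drive puts on each file.
TIMESTAMP_PREFIX = re.compile(r"^(\d{8}_\d{6})_(.+)$")

def group_by_run(filenames):
    """Group filenames by timestamp prefix; files without one go under 'other'."""
    runs = defaultdict(list)
    for name in sorted(filenames):
        match = TIMESTAMP_PREFIX.match(name)
        key = match.group(1) if match else "other"
        runs[key].append(name)
    return dict(runs)

# Made-up names for illustration only.
example_files = [
    "20240115_103000_report_insights.txt",
    "20240115_103000_report.pdf",
    "notes.txt",
]
for run, names in group_by_run(example_files).items():
    print(run)
    for name in names:
        print(f"  {name}")
```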

---

## 💡 Tips & Troubleshooting

### Tips:
- **Ask specific questions** for better insights
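- **Larger documents** may take longer to process, and only the first 4,000 characters are sent to the model, so questions about later sections may go unanswered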
- **Check your Google Drive** folder 'DocumentInsights' for all saved files

### Common Issues:

**"API Key Error"**
- Make sure you entered your OpenAI API key correctly in Step 3
- Check you have credit remaining at: https://platform.openai.com/usage

**"File Upload Failed"**
- Make sure the file is PDF, DOCX, or TXT
- Try a smaller file (under 10 MB)

**"Google Drive Connection Error"**
- Re-run Step 2 and authorize access again

---

## 🔗 Share This Notebook

**To share with others:**
1. Click 'Share' in the top-right corner
2. Choose 'Anyone with the link can view'
3. Copy and share the link

**To save to GitHub:**
1. File → Download → Download .ipynb
2. Upload the .ipynb file to your GitHub repository

---

## 📧 Support

If you need help:
- Check the Tips & Troubleshooting section above
- Re-read the instructions in each step
- Ask a technical colleague for assistance

**Enjoy generating insights from your documents! 🎉**