<a href="https://colab.research.google.com/github/aquafire088/invoice-processor/blob/main/Welcome_To_Colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install gradio spaces transformers accelerate numpy requests torch torchvision qwen-vl-utils av ipython reportlab fpdf python-docx pillow huggingface_hub

Collecting spaces
  Downloading spaces-0.37.1-py3-none-any.whl.metadata (1.0 kB)
Collecting qwen-vl-utils
  Downloading qwen_vl_utils-0.0.11-py3-none-any.whl.metadata (6.3 kB)
Collecting av
  Downloading av-15.0.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (4.6 kB)
Collecting reportlab
  Downloading reportlab-4.4.2-py3-none-any.whl.metadata (1.8 kB)
Collecting fpdf
  Downloading fpdf-1.7.2.tar.gz (39 kB)
  Preparing metadata (setup.py) ... [?25l[?25hdone
Collecting python-docx
  Downloading python_docx-1.2.0-py3-none-any.whl.metadata (2.0 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_6

In [1]:
import torch
import gradio as gr
import os
from threading import Thread
from PIL import Image
from transformers import AutoProcessor, TextIteratorStreamer
from huggingface_hub import login
import traceback
from getpass import getpass

# Import sécurisé pour le modèle Qwen2-VL
try:
    from transformers.models.qwen2_vl import Qwen2VLForConditionalGeneration
    print("✅ Qwen2VLForConditionalGeneration importé avec succès")
except ImportError:
    try:
        from transformers import AutoModelForVision2Seq as Qwen2VLForConditionalGeneration
        print("⚠️  Utilisation d'AutoModelForVision2Seq comme fallback")
    except ImportError:
        print("❌ Impossible d'importer le modèle Qwen2-VL")
        raise ImportError("Modèle Qwen2-VL non supporté dans cette version de transformers")

# Configuration du modèle
MODEL_ID = "prithivMLmods/Qwen2-VL-OCR-2B-Instruct"

# Authentification Hugging Face sécurisée
def setup_huggingface_auth():
    """Configure l'authentification Hugging Face de manière sécurisée"""
    try:
        # Priorité aux variables d'environnement
        hf_token = os.getenv('HF_TOKEN') or os.getenv('HUGGINGFACE_TOKEN')

        if not hf_token:
            print("⚠️  Token HF non trouvé dans les variables d'environnement")
            # En mode interactif, demander le token
            if hasattr(__builtins__, '__IPYTHON__'):
                hf_token = getpass("Entrez votre token Hugging Face: ")
            else:
                print("Mode non-interactif: définissez HF_TOKEN dans vos variables d'environnement")
                return False

        if hf_token:
            login(hf_token)
            print("✅ Authentification Hugging Face réussie")
            return True
    except Exception as e:
        print(f"⚠️  Échec de l'authentification HF: {str(e)}")
        return False

# Configuration du device
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"🖥️  Device utilisé: {device}")

# Initialisation des variables globales
MODEL_LOADED = False
model = None
processor = None

def load_model():
    """Charge le modèle et le processeur"""
    global model, processor, MODEL_LOADED

    print("🔄 Chargement du modèle en cours...")
    try:
        # Authentification
        auth_success = setup_huggingface_auth()
        if not auth_success:
            print("⚠️  Tentative de chargement sans authentification")

        # Chargement du modèle
        model = Qwen2VLForConditionalGeneration.from_pretrained(
            MODEL_ID,
            trust_remote_code=True,
            torch_dtype=torch.float16 if device == "cuda" else torch.float32,
            device_map="auto" if device == "cuda" else None,
            low_cpu_mem_usage=True
        ).eval()

        # Placement manuel sur CPU si nécessaire
        if device == "cpu":
            model = model.to(device)

        # Chargement du processeur
        processor = AutoProcessor.from_pretrained(
            MODEL_ID,
            trust_remote_code=True
        )

        print("✅ Modèle chargé avec succès!")
        MODEL_LOADED = True
        return True

    except Exception as e:
        print(f"❌ Erreur lors du chargement du modèle: {str(e)}")
        traceback.print_exc()
        MODEL_LOADED = False
        model = None
        processor = None
        return False

def diagnose_model():
    """Diagnostic du modèle et de l'environnement"""
    print("\n🔍 DIAGNOSTIC DU SYSTÈME")
    print("=" * 50)

    # Vérification PyTorch
    print(f"PyTorch version: {torch.__version__}")
    print(f"CUDA disponible: {torch.cuda.is_available()}")
    if torch.cuda.is_available():
        print(f"GPU: {torch.cuda.get_device_name(0)}")
        print(f"Mémoire GPU: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")

    # Vérification du modèle
    print(f"Modèle chargé: {MODEL_LOADED}")
    if MODEL_LOADED:
        print(f"Device du modèle: {model.device}")
        print(f"Type du modèle: {type(model)}")
        print(f"Processeur: {type(processor)}")

    # Test basique
    if MODEL_LOADED:
        try:
            # Test du tokenizer
            test_text = "Test"
            tokens = processor.tokenizer(test_text, return_tensors="pt")
            print(f"✅ Tokenizer fonctionne: {tokens['input_ids'].shape}")
        except Exception as e:
            print(f"❌ Erreur tokenizer: {e}")

    print("=" * 50)

# Fonction de test simplifiée
def test_model_simple():
    """Test basique du modèle"""
    if not MODEL_LOADED:
        return "❌ Modèle non chargé"

    try:
        # Création d'une image test
        test_image = Image.new('RGB', (100, 100), color='white')
        test_text = "Décrivez cette image"

        # Test des inputs
        inputs = processor(
            text=test_text,
            images=[test_image],
            return_tensors="pt"
        )

        # Test de génération minimal
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=10,
                do_sample=False
            )

        result = processor.tokenizer.decode(outputs[0], skip_special_tokens=True)
        return f"✅ Test réussi: {result[:100]}..."

    except Exception as e:
        return f"❌ Test échoué: {str(e)}"

# Extensions d'images supportées
SUPPORTED_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.bmp', '.tiff', '.webp')

def format_plain_text(output_text):
    """Nettoie le texte de sortie"""
    if not output_text:
        return ""
    return output_text.replace("<|im_end|>", "").strip()

def generate_strict_invoice_prompt(selected_fields):
    """Génère un prompt strict pour l'extraction de factures"""
    if not selected_fields:
        return "⚠️ Veuillez sélectionner au moins un champ à extraire."

    field_mapping = {
        "Type de document": "type_document",
        "Numéro de facture": "numero_facture",
        "Référence client": "reference_client",
        "Date de facture": "date_facture",
        "Date d'échéance": "date_echeance",
        "Nom du client": "nom_client",
        "Client final": "client_final",
        "Adresse": "adresse_client",
        "ICE (client/fournisseur)": "ice_client",
        "Référence article": "reference_article",
        "Désignation": "designation",
        "Quantité": "quantite",
        "P.U. Brut": "pu_brut",
        "Remise": "remise",
        "P.U. Net": "pu_net",
        "Montant HT": "montant_ht",
        "TVA": "montant_tva",
        "Montant TTC": "montant_ttc",
        "Montant en lettres": "montant_lettres",
        "Mode de paiement": "mode_paiement",
        "RIB": "rib",
        "Adresse société": "adresse_societe",
        "R.C.": "rc",
        "C.N.S.S": "cnss",
        "Patente": "patente",
        "I.F": "if"
    }

    valid_fields = [f for f in selected_fields if f in field_mapping]
    if not valid_fields:
        return "❌ Aucun champ valide sélectionné."

    json_structure = "{\n" + ",\n".join([f'  "{field_mapping[f]}": ""' for f in valid_fields]) + "\n}"

    prompt = (
        "**Extraction de facture**\n\n"
        "Analysez précisément l'image de facture et extrayez les champs demandés.\n\n"
        "**Champs requis**:\n" + "\n".join(f"- {f}" for f in valid_fields) +
        "\n\n**Format de sortie**:\n```json\n" + json_structure + "\n```\n\n"
        "**Instructions**:\n"
        "- Retournez uniquement le JSON\n"
        "- Utilisez les valeurs exactes de la facture\n"
        "- Champs manquants = \"\"\n"
        "- Pas de traduction\n"
        "- Formatage numérique exact (ne pas modifier virgules/points)"
    )
    return prompt

def generate_prompt_from_fields(general, client, items, totals, payment, company):
    """Combine les champs sélectionnés"""
    all_fields = general + client + items + totals + payment + company
    return generate_strict_invoice_prompt(all_fields)

def qwen_inference(media_input, text_input):
    """Exécute l'inférence avec Qwen2-VL"""
    if not MODEL_LOADED or model is None or processor is None:
        yield "❌ Erreur: Modèle non chargé correctement. Veuillez redémarrer l'application."
        return

    if not text_input or not text_input.strip():
        yield "❌ Erreur: Aucun prompt fourni"
        return

    try:
        # Gestion de l'image
        if media_input is None:
            yield "❌ Erreur: Aucune image fournie"
            return

        # Traitement de l'image
        image = None
        if isinstance(media_input, str):
            if not media_input.lower().endswith(SUPPORTED_EXTENSIONS):
                yield f"❌ Format non supporté. Formats acceptés: {', '.join(SUPPORTED_EXTENSIONS)}"
                return
            try:
                image = Image.open(media_input)
            except Exception as e:
                yield f"❌ Erreur d'ouverture de l'image: {str(e)}"
                return
        else:
            # Cas où media_input est déjà un objet PIL
            image = media_input

        # Conversion RGB si nécessaire
        if image.mode != 'RGB':
            image = image.convert('RGB')

        yield "🔄 Traitement de l'image en cours..."

        # Préparation des inputs avec gestion d'erreurs
        try:
            # Méthode alternative pour Qwen2-VL
            inputs = processor(
                text=text_input,
                images=[image],
                return_tensors="pt",
                padding=True,
                truncation=True
            )
        except Exception as e:
            # Fallback avec messages individuels
            try:
                inputs = processor(
                    text=[text_input],
                    images=[image],
                    return_tensors="pt"
                )
            except Exception as e2:
                yield f"❌ Erreur lors du traitement des inputs: {str(e2)}"
                return

        # Déplacement vers le bon device
        try:
            inputs = {k: v.to(model.device) if hasattr(v, 'to') else v for k, v in inputs.items()}
        except Exception as e:
            yield f"❌ Erreur lors du déplacement vers le device: {str(e)}"
            return

        # Méthode sans streaming pour plus de stabilité
        try:
            yield "🔄 Génération en cours..."

            # Configuration simplifiée
            generation_kwargs = {
                **inputs,
                "max_new_tokens": 512,
                "temperature": 0.1,
                "do_sample": False,  # Mode déterministe
                "pad_token_id": processor.tokenizer.eos_token_id if hasattr(processor.tokenizer, 'eos_token_id') else None
            }

            # Suppression des paramètres None
            generation_kwargs = {k: v for k, v in generation_kwargs.items() if v is not None}

            # Génération directe
            with torch.no_grad():
                outputs = model.generate(**generation_kwargs)

            # Décodage
            generated_text = processor.tokenizer.decode(
                outputs[0][inputs["input_ids"].shape[1]:],
                skip_special_tokens=True
            )

            # Résultat final
            final_result = format_plain_text(generated_text)
            if final_result:
                yield final_result
            else:
                yield "⚠️ Aucun résultat généré"

        except torch.cuda.OutOfMemoryError:
            yield "❌ Erreur: Mémoire GPU insuffisante. Essayez avec une image plus petite."
            return
        except Exception as e:
            yield f"❌ Erreur lors de la génération: {str(e)}"
            print(f"[DEBUG] Erreur génération: {e}")
            traceback.print_exc()
            return

    except Exception as e:
        print(f"[DEBUG] Exception générale dans qwen_inference: {e}")
        traceback.print_exc()
        yield f"❌ Erreur lors de l'extraction: {str(e)}"

# Interface Gradio
def create_interface():
    """Crée l'interface Gradio"""
    with gr.Blocks(
        title="Extracteur de Factures Qwen2",
        css="""
        .gradio-container {
            max-width: 1200px !important;
        }
        .gr-button-primary {
            background: linear-gradient(45deg, #2196F3, #21CBF3) !important;
        }
        """
    ) as demo:
        gr.Markdown("## 📄 Extraction de Données de Factures - Qwen2-VL")

        # Statut du modèle
        status_msg = "✅ Modèle chargé avec succès" if MODEL_LOADED else "❌ Modèle non chargé"
        gr.Markdown(f"**Statut**: {status_msg}")

        with gr.Row():
            with gr.Column(scale=1):
                gr.Markdown("### 1. Sélection des champs")
                with gr.Group():
                    general = gr.CheckboxGroup(
                        ["Type de document", "Numéro de facture", "Référence client",
                         "Date de facture", "Date d'échéance"],
                        label="📋 Informations Générales"
                    )
                    client = gr.CheckboxGroup(
                        ["Nom du client", "Client final", "Adresse", "ICE (client/fournisseur)"],
                        label="👤 Client"
                    )
                    items = gr.CheckboxGroup(
                        ["Référence article", "Désignation", "Quantité", "P.U. Brut",
                         "Remise", "P.U. Net", "Montant HT"],
                        label="🛍️ Ligne d'articles"
                    )
                    totals = gr.CheckboxGroup(
                        ["Montant HT", "TVA", "Montant TTC", "Montant en lettres"],
                        label="💰 Totaux"
                    )
                    payment = gr.CheckboxGroup(
                        ["Mode de paiement", "RIB"],
                        label="💳 Paiement"
                    )
                    company = gr.CheckboxGroup(
                        ["Adresse société", "R.C.", "C.N.S.S", "Patente", "I.F"],
                        label="🏢 Coordonnées société"
                    )

                generate_btn = gr.Button("🔧 Générer le prompt", variant="primary")

            with gr.Column(scale=1):
                gr.Markdown("### 2. Configuration de l'extraction")
                prompt_box = gr.Textbox(
                    label="📝 Prompt généré",
                    lines=8,
                    placeholder="Le prompt apparaîtra ici...",
                    show_copy_button=True
                )
                image_input = gr.Image(
                    label="📤 Téléverser une facture",
                    type="filepath",
                    sources=["upload"]
                )
                extract_btn = gr.Button(
                    "🚀 Lancer l'extraction",
                    variant="primary",
                    size="lg"
                )
                output_textbox = gr.Textbox(
                    label="📊 Résultats d'extraction",
                    lines=10,
                    interactive=False,
                    show_copy_button=True
                )

        # Gestion des événements
        generate_btn.click(
            generate_prompt_from_fields,
            inputs=[general, client, items, totals, payment, company],
            outputs=prompt_box
        )

        extract_btn.click(
            qwen_inference,
            inputs=[image_input, prompt_box],
            outputs=output_textbox
        )

    return demo

# Lancement de l'application
if __name__ == "__main__":
    demo = create_interface()
    demo.launch(

    )

✅ Qwen2VLForConditionalGeneration importé avec succès
🖥️  Device utilisé: cuda
It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://af343d3b5d3526bb74.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


In [3]:

# Lancement de l'application
if __name__ == "__main__":
    demo = create_interface()
    demo.launch(


    )

It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://85f0fac366d35d221c.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


In [None]:


def identify_and_save_blob(blob_path):
    """Identifies if the blob is an image and saves it."""
    try:
        with open(blob_path, 'rb') as file:
            blob_content = file.read()
            try:
                Image.open(io.BytesIO(blob_content)).verify()  # Check if it's a valid image
                extension = ".png"  # Default to PNG for saving
                media_type = "image"
            except (IOError, SyntaxError):
                raise ValueError("Unsupported media type. Please upload a valid image.")

            filename = f"temp_{uuid.uuid4()}_media{extension}"
            with open(filename, "wb") as f:
                f.write(blob_content)

            return filename, media_type

    except FileNotFoundError:
        raise ValueError(f"The file {blob_path} was not found.")
    except Exception as e:
        raise ValueError(f"An error occurred while processing the file: {e}")

@spaces.GPU
def qwen_inference(model_name, media_input, text_input=None):
    """Handles inference for the selected model."""
    model = models[model_name]
    processor = processors[model_name]

    if isinstance(media_input, str):
        media_path = media_input
        if media_path.endswith(tuple([i for i in image_extensions.keys()])):
            media_type = "image"
        else:
            try:
                media_path, media_type = identify_and_save_blob(media_input)
            except Exception as e:
                raise ValueError("Unsupported media type. Please upload a valid image.")

    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": media_type,
                    media_type: media_path
                },
                {"type": "text", "text": text_input},
            ],
        }
    ]

    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, _ = process_vision_info(messages)
    inputs = processor(
        text=[text],
        images=image_inputs,
        padding=True,
        return_tensors="pt",
    ).to("cuda")

    streamer = TextIteratorStreamer(
        processor.tokenizer, skip_prompt=True, skip_special_tokens=True
    )
    generation_kwargs = dict(inputs, streamer=streamer, max_new_tokens=1024)

    thread = Thread(target=model.generate, kwargs=generation_kwargs)
    thread.start()

    buffer = ""
    for new_text in streamer:
        buffer += new_text
        # Remove <|im_end|> or similar tokens from the output
        buffer = buffer.replace("<|im_end|>", "")
        yield buffer

def format_plain_text(output_text):
    """Formats the output text as plain text without LaTeX delimiters."""
    # Remove LaTeX delimiters and convert to plain text
    plain_text = output_text.replace("\\(", "").replace("\\)", "").replace("\\[", "").replace("\\]", "")
    return plain_text

def generate_document(media_path, output_text, file_format, font_size, line_spacing, alignment, image_size):
    """Generates a document with the input image and plain text output."""
    plain_text = format_plain_text(output_text)
    if file_format == "pdf":
        return generate_pdf(media_path, plain_text, font_size, line_spacing, alignment, image_size)
    elif file_format == "docx":
        return generate_docx(media_path, plain_text, font_size, line_spacing, alignment, image_size)

def generate_pdf(media_path, plain_text, font_size, line_spacing, alignment, image_size):
    """Generates a PDF document."""
    filename = f"output_{uuid.uuid4()}.pdf"
    doc = SimpleDocTemplate(
        filename,
        pagesize=A4,
        rightMargin=inch,
        leftMargin=inch,
        topMargin=inch,
        bottomMargin=inch
    )
    styles = getSampleStyleSheet()
    styles["Normal"].fontSize = int(font_size)
    styles["Normal"].leading = int(font_size) * line_spacing
    styles["Normal"].alignment = {
        "Left": 0,
        "Center": 1,
        "Right": 2,
        "Justified": 4
    }[alignment]

    story = []

    # Add image with size adjustment
    image_sizes = {
        "Small": (200, 200),
        "Medium": (400, 400),
        "Large": (600, 600)
    }
    img = RLImage(media_path, width=image_sizes[image_size][0], height=image_sizes[image_size][1])
    story.append(img)
    story.append(Spacer(1, 12))

    # Add plain text output
    text = Paragraph(plain_text, styles["Normal"])
    story.append(text)

    doc.build(story)
    return filename

def generate_docx(media_path, plain_text, font_size, line_spacing, alignment, image_size):
    """Generates a DOCX document."""
    filename = f"output_{uuid.uuid4()}.docx"
    doc = docx.Document()

    # Add image with size adjustment
    image_sizes = {
        "Small": docx.shared.Inches(2),
        "Medium": docx.shared.Inches(4),
        "Large": docx.shared.Inches(6)
    }
    doc.add_picture(media_path, width=image_sizes[image_size])
    doc.add_paragraph()

    # Add plain text output
    paragraph = doc.add_paragraph()
    paragraph.paragraph_format.line_spacing = line_spacing
    paragraph.paragraph_format.alignment = {
        "Left": WD_ALIGN_PARAGRAPH.LEFT,
        "Center": WD_ALIGN_PARAGRAPH.CENTER,
        "Right": WD_ALIGN_PARAGRAPH.RIGHT,
        "Justified": WD_ALIGN_PARAGRAPH.JUSTIFY
    }[alignment]
    run = paragraph.add_run(plain_text)
    run.font.size = docx.shared.Pt(int(font_size))

    doc.save(filename)
    return filename

# CSS for output styling
css = """
  #output {
    height: 500px;
    overflow: auto;
    border: 1px solid #ccc;
  }
.submit-btn {
    background-color: #cf3434  !important;
    color: white !important;
}
.submit-btn:hover {
    background-color: #ff2323 !important;
}
.download-btn {
    background-color: #35a6d6 !important;
    color: white !important;
}
.download-btn:hover {
    background-color: #22bcff !important;
}
"""

# Gradio app setup
with gr.Blocks(css=css) as demo:
    gr.Markdown("# Qwen2VL Models: Vision and Language Processing")

    with gr.Tab(label="Image Input"):

        with gr.Row():
            with gr.Column():
                model_choice = gr.Dropdown(
                    label="Model Selection",
                    choices=list(MODEL_OPTIONS.keys()),
                    value="OCR-KIE"
                )
                input_media = gr.File(
                    label="Upload Image", type="filepath"
                )
                text_input = gr.Textbox(label="Question", placeholder="Ask a question about the image...")
                submit_btn = gr.Button(value="Submit", elem_classes="submit-btn")

            with gr.Column():
                output_text = gr.Textbox(label="Output Text", lines=10)
                plain_text_output = gr.Textbox(label="Standardized Plain Text", lines=10)

        submit_btn.click(
            qwen_inference, [model_choice, input_media, text_input], [output_text]
        ).then(
            lambda output_text: format_plain_text(output_text), [output_text], [plain_text_output]
        )

        # Add examples directly usable by clicking
        with gr.Row():
            with gr.Column():
                line_spacing = gr.Dropdown(
                    choices=[0.5, 1.0, 1.15, 1.5, 2.0, 2.5, 3.0],
                    value=1.5,
                    label="Line Spacing"
                )
                font_size = gr.Dropdown(
                    choices=["8", "10", "12", "14", "16", "18", "20", "22", "24"],
                    value="18",
                    label="Font Size"
                )
                alignment = gr.Dropdown(
                    choices=["Left", "Center", "Right", "Justified"],
                    value="Justified",
                    label="Text Alignment"
                )
                image_size = gr.Dropdown(
                    choices=["Small", "Medium", "Large"],
                    value="Small",
                    label="Image Size"
                )
                file_format = gr.Radio(["pdf", "docx"], label="File Format", value="pdf")
                get_document_btn = gr.Button(value="Get Document", elem_classes="download-btn")

        get_document_btn.click(
            generate_document, [input_media, output_text, file_format, font_size, line_spacing, alignment, image_size], gr.File(label="Download Document")
        )

demo.launch(debug=True)

Loading OCR-KIE...


config.json: 0.00B [00:00, ?B/s]

model.safetensors:   0%|          | 0.00/4.42G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/252 [00:00<?, ?B/s]

preprocessor_config.json:   0%|          | 0.00/572 [00:00<?, ?B/s]

Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.


vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

added_tokens.json:   0%|          | 0.00/408 [00:00<?, ?B/s]

You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0.


chat_template.json: 0.00B [00:00, ?B/s]

It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://30f5dff2a297203ad5.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://30f5dff2a297203ad5.gradio.live


