#### Tests and Analysis of Jina Generation
This section of the code briefly explains how the Jina model works, specifically the "jina-reranker-m0" version.

The examples are adapted from those provided by Jina's official Hugging Face account.

In [None]:
from transformers import AutoModel
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

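model = AutoModel.from_pretrained(
    '../models/jina-reranker-m0',
    torch_dtype=dtype,
    local_files_only=True,
    # compute_score() comes from the model's own modeling code,
    # which AutoModel only loads when remote code is trusted
    trust_remote_code=True,
).to(device)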


In [None]:
# Example: running Jina with a text query against image documents
query = "slm markdown"
documents = [
    "https://raw.githubusercontent.com/jina-ai/multimodal-reranker-test/main/handelsblatt-preview.png",
    # The page in the next image contains the word "markdown", which matches the query
    "https://raw.githubusercontent.com/jina-ai/multimodal-reranker-test/main/paper-11.png",
    "https://raw.githubusercontent.com/jina-ai/multimodal-reranker-test/main/wired-preview.png",
    "https://jina.ai/blog-banner/using-deepseek-r1-reasoning-model-in-deepsearch.webp"
]

# Build the [query, document] pairs
image_pairs = [[query, doc] for doc in documents]

scores = model.compute_score(image_pairs, max_length=2048, doc_type="image")
# [0.49375027418136597, 0.7889736890792847, 0.47813892364501953, 0.5210812091827393]
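
Assuming the scores are returned in the same order as `image_pairs`, ranking the documents amounts to sorting them by score. The following is a minimal sketch that reuses the scores reported above as literal values, so it runs without the model; `rank_by_score` is a hypothetical helper name and not part of the Jina API.

```python
def rank_by_score(documents, scores, top_k=None):
    """Return (document, score) pairs sorted from most to least relevant."""
    if len(documents) != len(scores):
        raise ValueError("documents and scores must have the same length")
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# File names of the four images scored above, in the same order.
image_names = [
    "handelsblatt-preview.png",
    "paper-11.png",
    "wired-preview.png",
    "using-deepseek-r1-reasoning-model-in-deepsearch.webp",
]
# Scores copied from the run above.
image_scores = [0.49375027418136597, 0.7889736890792847, 0.47813892364501953, 0.5210812091827393]

for name, score in rank_by_score(image_names, image_scores):
    print(f"{score:.4f}  {name}")
```

With these values, `paper-11.png` comes out first, which is the image whose page contains the word "markdown".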

In [None]:
# Example: running Jina on multilingual text documents
query = "slm markdown"
documents = [
    # As can be seen, the first document directly contains the word "Markdown"
    "We present ReaderLM-v2, a compact 1.5 billion parameter language model designed for efficient web content extraction. Our model processes documents up to 512K tokens, transforming messy HTML into clean Markdown or JSON formats with high accuracy -- making it an ideal tool for grounding large language models. The models effectiveness results from two key innovations: (1) a three-stage data synthesis pipeline that generates high quality, diverse training data by iteratively drafting, refining, and critiquing web content extraction; and (2) a unified training framework combining continuous pre-training with multi-objective optimization. Intensive evaluation demonstrates that ReaderLM-v2 outperforms GPT-4o-2024-08-06 and other larger models by 15-20% on carefully curated benchmarks, particularly excelling at documents exceeding 100K tokens, while maintaining significantly lower computational requirements.",
    "数据提取么？为什么不用正则啊，你用正则不就全解决了么？",
    "During the California Gold Rush, some merchants made more money selling supplies to miners than the miners made finding gold.",
    "Die wichtigsten Beiträge unserer Arbeit sind zweifach: Erstens führen wir eine neuartige dreistufige Datensynthese-Pipeline namens Draft-Refine-Critique ein, die durch iterative Verfeinerung hochwertige Trainingsdaten generiert; und zweitens schlagen wir eine umfassende Trainingsstrategie vor, die kontinuierliches Vortraining zur Längenerweiterung, überwachtes Feintuning mit spezialisierten Kontrollpunkten, direkte Präferenzoptimierung (DPO) und iteratives Self-Play-Tuning kombiniert. Um die weitere Forschung und Anwendung der strukturierten Inhaltsextraktion zu erleichtern, ist das Modell auf Hugging Face öffentlich verfügbar.",
]

# Build the [query, document] pairs
text_pairs = [[query, doc] for doc in documents]

scores = model.compute_score(text_pairs, max_length=1024, doc_type="text")
# [0.6839263439178467, 0.4432148039340973, 0.5904013514518738, 0.45481112599372864]
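
One way to read these numbers is to check how far the best document sits from the runner-up. The sketch below uses the scores reported above as literals; `top_margin` is a hypothetical helper introduced only for illustration.

```python
def top_margin(scores):
    """Return (best_index, runner_up_index, gap) for a list of scores."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    # Indices ordered from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best, runner_up = order[0], order[1]
    return best, runner_up, scores[best] - scores[runner_up]


# Scores copied from the text run above.
text_scores = [0.6839263439178467, 0.4432148039340973, 0.5904013514518738, 0.45481112599372864]

best, runner_up, gap = top_margin(text_scores)
print(f"best={best} runner_up={runner_up} gap={gap:.4f}")
```

Here the English abstract (index 0) ranks first, while the unrelated Gold Rush sentence (index 2) comes second, ahead of the Chinese and German documents.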

In [None]:
# Example: running Jina with an image query against image documents
query = "https://raw.githubusercontent.com/jina-ai/multimodal-reranker-test/main/paper-11.png"

documents = [
    "https://raw.githubusercontent.com/jina-ai/multimodal-reranker-test/main/handelsblatt-preview.png",
    # Note that this is the same image as the query
    "https://raw.githubusercontent.com/jina-ai/multimodal-reranker-test/main/paper-11.png",
    "https://raw.githubusercontent.com/jina-ai/multimodal-reranker-test/main/wired-preview.png",
    "https://jina.ai/blog-banner/using-deepseek-r1-reasoning-model-in-deepsearch.webp"
]

image_pairs = [[query, doc] for doc in documents]
scores = model.compute_score(image_pairs, max_length=2048, doc_type="image", query_type='image')
# [0.6275860667228699, 0.9922324419021606, 0.8090347051620483, 0.7941296100616455]
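
Because the same four images were scored once with a text query and once with an image query, the two orderings can be compared directly. This is a small sketch using the scores reported above as literals; `ranking` is a hypothetical helper name.

```python
def ranking(scores):
    """Return document indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Scores copied from the two image runs above (same four documents, same order).
text_query_scores = [0.49375027418136597, 0.7889736890792847, 0.47813892364501953, 0.5210812091827393]
image_query_scores = [0.6275860667228699, 0.9922324419021606, 0.8090347051620483, 0.7941296100616455]

text_order = ranking(text_query_scores)
image_order = ranking(image_query_scores)

print("text query order: ", text_order)
print("image query order:", image_order)
print("same top document:", text_order[0] == image_order[0])
```

Both runs place `paper-11.png` (index 1) first, although the remaining documents are ordered differently.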