# JINA

## Who is Jina AI
Jina AI, founded in 2020 in Berlin, is an AI company focused on search foundation models. It specializes in multimodal AI and aims to help businesses and developers create value and cut costs with multimodal data through an integrated suite of components: embeddings, rerankers, prompt ops, and core infrastructure.
Jina AI's embeddings offer top-tier performance and include a model with an 8192-token context length, which is well suited to representing long documents. They also offer multilingual support and integrate with platforms such as OpenAI, which makes them a good fit for cross-lingual applications.

## Milvus and Jina AI's Embedding
Storing and searching these embeddings efficiently, at speed and at scale, requires infrastructure designed for the purpose. Milvus is a widely used open-source vector database built to handle large-scale vector data. It provides fast and accurate vector (embedding) search under a range of similarity metrics, and its scalability lets it handle very large volumes of vector data while keeping search performance high as datasets grow.
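
As a rough illustration of what a similarity metric is, the sketch below computes three metrics commonly used for dense vectors (Euclidean/L2 distance, inner product, and cosine similarity) in plain Python. The function names and the toy two-dimensional vectors are illustrative assumptions and are not part of Milvus or PyMilvus; a vector database evaluates such metrics internally over indexed data.

```python
import math

def inner_product(a, b):
    """Inner (dot) product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))

def l2_distance(a, b):
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Inner product of the two vectors after length normalization."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)

# Toy vectors (hypothetical): one document points the same way as the query,
# the other is orthogonal to it.
query_vec = [1.0, 0.0]
doc_near = [2.0, 0.0]
doc_far = [0.0, 3.0]

for name, vec in [("near", doc_near), ("far", doc_far)]:
    print(
        name,
        "L2:", l2_distance(query_vec, vec),
        "IP:", inner_product(query_vec, vec),
        "cosine:", cosine_similarity(query_vec, vec),
    )
```

Note that L2 is a distance (lower is better) while inner product and cosine are similarities (higher is better), so the sort order of search results depends on which metric a collection uses.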

## Examples
Jina embeddings have been integrated into the PyMilvus model library. The code examples below show how to use them.

## Prerequisites

Before using Jina AI's embedding model, install pymilvus and pymilvus[model].

In [9]:
!pip install -U pymilvus
!pip install pymilvus[model]

# for zsh
# !pip install "pymilvus[model]" 


## General-Purpose Embedding
Jina AI's core embedding model excels at understanding detailed text, which makes it well suited to semantic search and content classification. It can therefore support advanced sentiment analysis, text summarization, and personalized recommendation systems.

In [14]:
from pymilvus.model.dense import JinaEmbeddingFunction

jina_api_key = "<YOUR_JINA_API_KEY>"
ef = JinaEmbeddingFunction("jina-embeddings-v2-base-en", jina_api_key)

query = "what is information retrieval?"
doc = "Information retrieval is the process of finding relevant information from a large collection of data or documents."

qvecs = ef.encode_queries([query])
dvecs = ef.encode_documents([doc])

[array([-4.22398150e-01, -4.95849600e-01,  2.17690600e-01,  1.25464300e-01,
        9.49532600e-03,  2.99926760e-01,  5.02336800e-01, -3.25339180e-02,
        6.38811400e-01,  4.85525940e-01, -1.78815570e-01,  5.43201980e-02,
       -3.31211630e-01, -2.87676130e-01, -8.84068100e-01,  8.56759200e-01,
        2.50366200e-01,  1.02111820e-01,  3.65391300e-01, -2.61073530e-01,
       -1.66957300e-01, -4.58269400e-01, -5.95354400e-01, -2.24278050e-01,
        3.04818300e-01,  7.31384300e-01,  8.92403700e-01,  2.25847510e-01,
        1.09902630e+00,  3.23155000e-01,  4.72837180e-01, -3.31612740e-01,
       -5.59465700e-01,  3.31787100e-01, -3.87137260e-01, -8.16441100e-01,
        4.43359380e-01, -1.63072850e-01,  9.15876100e-01,  8.21393670e-01,
       -8.97147060e-01,  3.19279250e-01,  8.02219960e-02,  1.17047990e+00,
       -9.94428340e-01, -2.02420370e-01, -3.70282860e-01,  3.27715200e-02,
       -5.76498870e-01, -1.02183320e+00, -5.25948640e-01, -1.03543530e+00,
       -4.51594780e-02, 
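
Once queries and documents are encoded, semantic search comes down to ranking document vectors by their similarity to the query vector. The following is a minimal brute-force sketch of that step in plain Python. The `top_k` helper and the three-dimensional toy vectors are assumptions made for illustration; real Jina embeddings are much higher-dimensional, and in practice Milvus performs this ranking over an index rather than by scanning every vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return (doc_index, score) pairs for the k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical stand-ins for the output of encode_queries / encode_documents.
toy_qvec = [0.9, 0.1, 0.0]
toy_dvecs = [
    [0.0, 1.0, 0.0],  # mostly unrelated to the query
    [1.0, 0.0, 0.0],  # closest to the query
    [0.7, 0.7, 0.0],  # partially related
]

hits = top_k(toy_qvec, toy_dvecs, k=2)
for index, score in hits:
    print(f"doc {index}: cosine similarity {score:.4f}")
```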

## Bilingual Embeddings
Jina AI's bilingual models support multilingual platforms, global customer support, and cross-lingual content discovery. Designed for German-English and Chinese-English text, they allow a query in one language to be matched against documents in the other, as in the example below, where an English query is paired with a German document.

In [15]:
from pymilvus.model.dense import JinaEmbeddingFunction

jina_api_key = "<YOUR_JINA_API_KEY>"
ef = JinaEmbeddingFunction("jina-embeddings-v2-base-de", jina_api_key)

query = "what is information retrieval?"
doc = "Information Retrieval ist der Prozess, relevante Informationen aus einer großen Sammlung von Daten oder Dokumenten zu finden."

qvecs = ef.encode_queries([query])
dvecs = ef.encode_documents([doc])

[array([-2.58712770e-01, -1.29451750e-01,  1.76284790e-01,  1.73278810e-01,
       -5.10482800e-02,  2.26470950e-01, -3.47457900e-01,  2.55798340e-01,
       -7.03315700e-02, -3.64624020e-01, -9.43145750e-02, -1.27883910e-01,
        2.19894410e-01, -1.85930250e-01, -5.41009900e-02,  2.17361450e-01,
        1.58960340e-01,  2.06985470e-02,  2.28973390e-01, -4.24537660e-02,
       -1.04682920e-01, -2.11044310e-01, -2.81257630e-02,  2.45483400e-01,
        1.99539180e-01,  3.64501950e-01, -9.25350200e-02,  1.90734860e-01,
       -3.48724370e-01, -1.15280150e-01, -5.41896820e-02, -2.27371220e-01,
        2.84866330e-01, -5.20248400e-02, -1.40731810e-01, -3.92065050e-02,
        2.55565640e-02,  3.78919470e-02,  1.57066350e-01,  5.22823330e-02,
        9.29183960e-02, -1.81716920e-01, -2.17629430e-01, -2.51922600e-01,
        1.52748110e-01, -6.36508200e-02,  2.99865720e-01, -6.12030030e-02,
        3.47717300e-01, -2.86384670e-02, -7.83438700e-02, -8.23745700e-02,
       -2.28683470e-01, 

## Code Embeddings
Jina AI's code embedding model enables search over code and documentation. It supports English and 30 popular programming languages, and can be used for enhanced code navigation, streamlined code review, and automated documentation assistance.

In [12]:
from pymilvus.model.dense import JinaEmbeddingFunction

jina_api_key = "<YOUR_JINA_API_KEY>"
ef = JinaEmbeddingFunction("jina-embeddings-v2-base-code", jina_api_key)

# Case 1: Enhanced code navigation
# query: text description of the functionality
# document: relevant code snippet
nav_query = "function to calculate average in Python."
nav_doc = '''
def calculate_average(numbers):
    total = sum(numbers)
    count = len(numbers)
    return total / count
'''

# Case 2: Streamlined code review
# query: text description of the change you are looking for
# document: relevant code snippet or pull request
review_query = "pull request related to Collection"
review_doc = "fix:[restful v2] parameters of create collection ..."

# Case 3: Automatic documentation assistance
# query: question about a concept or piece of code
# document: relevant documentation or docstring
docs_query = "What is Collection in Milvus"
docs_doc = '''
In Milvus, you store your vector embeddings in collections. All vector embeddings within a collection share the same dimensionality and distance metric for measuring similarity.
Milvus collections support dynamic fields (i.e., fields not pre-defined in the schema) and automatic incrementation of primary keys.
'''

# Encode all three cases in one batch, so no case overwrites another.
qvecs = ef.encode_queries([nav_query, review_query, docs_query])
dvecs = ef.encode_documents([nav_doc, review_doc, docs_doc])

## Jina Reranker
Jina AI also provides rerankers, which further improve retrieval quality by re-scoring the results of an embedding search.

In [17]:
from pymilvus.model.reranker import JinaRerankFunction

jina_api_key = "<YOUR_JINA_API_KEY>"

rf = JinaRerankFunction("jina-reranker-v1-base-en", jina_api_key)

query = "What event in 1956 marked the official birth of artificial intelligence as a discipline?"

documents = [
    "In 1950, Alan Turing published his seminal paper, 'Computing Machinery and Intelligence,' proposing the Turing Test as a criterion of intelligence, a foundational concept in the philosophy and development of artificial intelligence.",
    "The Dartmouth Conference in 1956 is considered the birthplace of artificial intelligence as a field; here, John McCarthy and others coined the term 'artificial intelligence' and laid out its basic goals.",
    "In 1951, British mathematician and computer scientist Alan Turing also developed the first program designed to play chess, demonstrating an early example of AI in game strategy.",
    "The invention of the Logic Theorist by Allen Newell, Herbert A. Simon, and Cliff Shaw in 1955 marked the creation of the first true AI program, which was capable of solving logic problems, akin to proving mathematical theorems."
]

rf(query, documents)
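
To show where a reranker sits in a retrieval pipeline, here is a self-contained sketch of the two-stage pattern: a cheap first stage selects candidates by embedding similarity, and a second stage re-scores only those candidates against the query text. Everything in it is a stand-in chosen for illustration: the toy vectors are hypothetical, and `toy_rerank_score` is a simple word-overlap heuristic, not Jina's reranker model. In a real pipeline the first stage would be a Milvus search and the second stage a call to a reranker such as `JinaRerankFunction`.

```python
def retrieve(query_vec, doc_vecs, k):
    """First stage: indices of the k documents with the highest inner product."""
    scores = [sum(q * d for q, d in zip(query_vec, vec)) for vec in doc_vecs]
    order = sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
    return order[:k]

def toy_rerank_score(query, text):
    """Hypothetical stand-in for a reranker: fraction of query words found in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words)

def retrieve_then_rerank(query, query_vec, docs, doc_vecs, k_retrieve=3, k_final=2):
    """Retrieve k_retrieve candidates by vector score, then keep the k_final best after reranking."""
    candidates = retrieve(query_vec, doc_vecs, k_retrieve)
    reranked = sorted(candidates, key=lambda i: toy_rerank_score(query, docs[i]), reverse=True)
    return reranked[:k_final]

# Toy corpus with made-up two-dimensional "embeddings".
docs = [
    "turing proposed the imitation game in 1950",
    "the dartmouth conference in 1956 founded artificial intelligence",
    "a chess program was written in 1951",
    "the logic theorist was built in 1955",
]
toy_doc_vecs = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.9], [0.7, 0.2]]
toy_query_vec = [1.0, 0.0]
query = "dartmouth conference 1956"

first_stage = retrieve(toy_query_vec, toy_doc_vecs, k=3)
final = retrieve_then_rerank(query, toy_query_vec, docs, toy_doc_vecs)
print("first stage order:", first_stage)
print("after reranking:", final)
```

In this toy setup the first stage ranks document 0 highest, while the reranking step promotes document 1, which actually mentions the query terms. This is the kind of correction a reranker is meant to provide.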