# RAG With llama-index + Milvus + Qwen - Part 1

References

- https://studio.nebius.com/
- https://docs.llamaindex.ai/en/stable/examples/vector_stores/MilvusIndexDemo/
- https://docs.llamaindex.ai/en/stable/api_reference/storage/vector_store/milvus/?h=milvusvectorstore#llama_index.vector_stores.milvus.MilvusVectorStore

## Step-1: Configuration

- 1) Make sure a `.env` file exists
- 2) Kernel: studio1

In [2]:
import os
from dotenv import load_dotenv
load_dotenv()

if os.getenv('NEBIUS_API_KEY'):
    print ("✅ Found NEBIUS_API_KEY in environment, using it")
else:
    raise ValueError("❌ NEBIUS_API_KEY not found in environment. Please set it in .env file before running this script.")

✅ Found NEBIUS_API_KEY in environment, using it


## Step-2: Read documents

In [4]:
%%time 

from llama_index.core import SimpleDirectoryReader
import pprint

# load documents
documents = SimpleDirectoryReader(
    input_dir = '../data/10k',
).load_data()

print (f"Loaded {len(documents)} chunks")

# print("Document [0].doc_id:", documents[0].doc_id)
# pprint.pprint (documents[0], indent=4)

Loaded 545 chunks
CPU times: user 15 s, sys: 57.3 ms, total: 15 s
Wall time: 15 s


## Step-3: Setup Embedding Model

We can run the embedding model either locally (fast) or in the cloud.

If running locally:
- choose smaller models
- expect lower accuracy but faster embedding

If running in the cloud:
- we can run large models (billions of parameters)
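
The choice of model also fixes the vector dimension that the vector store has to be created with. A minimal sketch of keeping the two in sync, using only the dimensions noted in this notebook's own code comments; `KNOWN_DIMS` and `embedding_dim` are hypothetical names for illustration and are not part of llama-index:

```python
# Hypothetical lookup table: dimensions copied from the comments in this notebook.
KNOWN_DIMS = {
    "BAAI/bge-small-en-v1.5": 384,
    "BAAI/bge-base-en-v1.5": 768,
    "BAAI/bge-large-en-v1.5": 1024,
    "sentence-transformers/all-MiniLM-L6-v2": 384,
    "Qwen/Qwen3-Embedding-8B": 4096,
}


def embedding_dim(model_name: str) -> int:
    """Return the vector size for a known model, or fail loudly."""
    try:
        return KNOWN_DIMS[model_name]
    except KeyError:
        raise ValueError(f"Unknown embedding model: {model_name!r}") from None


print(embedding_dim("Qwen/Qwen3-Embedding-8B"))
```

Deriving the dimension from the model name this way avoids editing two constants by hand when switching between the local and cloud options.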

In [5]:
import os
from llama_index.core import Settings

## Commonly used open embedding models for RAG applications
#"BAAI/bge-small-en-v1.5"        # 384 dimensions  - Fast & efficient
#"BAAI/bge-base-en-v1.5"         # 768 dimensions  - Balanced
#"BAAI/bge-large-en-v1.5"        # 1024 dimensions - High quality
#"sentence-transformers/all-MiniLM-L6-v2"  # 384 dimensions - Very popular


# Option 1: Running embedding models on Nebius cloud
from llama_index.embeddings.nebius import NebiusEmbedding
EMBEDDING_MODEL = 'Qwen/Qwen3-Embedding-8B'  # 8B params, 4096 dim
EMBEDDING_LENGTH = 4096
Settings.embed_model = NebiusEmbedding(
                        model_name=EMBEDDING_MODEL,
                        embed_batch_size=50,  # Batch size for embedding (default is 10)
                        api_key=os.getenv("NEBIUS_API_KEY") # if not specified here, it is taken from the environment variable
                       )

## Option 2: Running embedding models locally
# from llama_index.embeddings.huggingface import HuggingFaceEmbedding
# os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'
# Settings.embed_model = HuggingFaceEmbedding(
#     # model_name = 'sentence-transformers/all-MiniLM-L6-v2' # 23 M params
#     model_name = 'BAAI/bge-small-en-v1.5'  # 33M params
#     # model_name = 'Qwen/Qwen3-Embedding-0.6B'  # 600M params
#     # model_name = 'BAAI/bge-en-icl'  # 7B params
#     #model_name = 'intfloat/multilingual-e5-large-instruct'  # 560M params
# )

## Step-4: Connect to Milvus

- Milvus is the vector store
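
Conceptually, a vector store keeps one embedding per chunk and returns the chunks whose embeddings are closest to a query embedding. A toy in-memory sketch of that idea in plain Python; it is not how Milvus is implemented, and `cosine`, `top_k`, and the sample vectors are made up for illustration:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query, store, k=2):
    """Return the k (id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Three made-up 2-dimensional "embeddings" keyed by chunk id.
store = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
print(top_k([1.0, 0.1], store))
```

A real vector store performs the same kind of nearest-neighbour lookup, but over thousands of high-dimensional vectors and with persistence.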

In [7]:
from pymilvus import MilvusClient

DB_URI = './rag.db'  # For embedded instance
COLLECTION_NAME = 'rag'

milvus_client = MilvusClient(DB_URI)
print ("✅ Connected to Milvus instance: ", DB_URI)

# if we already have a collection, clear it first
if milvus_client.has_collection(collection_name = COLLECTION_NAME):
    milvus_client.drop_collection(collection_name = COLLECTION_NAME)
    print ('✅ Cleared collection :', COLLECTION_NAME)




✅ Connected to Milvus instance:  ./rag.db


In [8]:
%%time 

# connect to vector db
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.milvus import MilvusVectorStore

vector_store = MilvusVectorStore(
    uri = DB_URI ,
    dim = EMBEDDING_LENGTH ,
    collection_name = COLLECTION_NAME,
    overwrite=True
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

print ("✅ Connected Llama-index to Milvus instance: ", DB_URI )

✅ Connected Llama-index to Milvus instance:  ./rag.db
CPU times: user 15.9 ms, sys: 1.06 ms, total: 17 ms
Wall time: 546 ms


## Step-5: Create Index and Save to DB

Note: this step fails without a VPN; with a VPN, the large embeddings take a long time to load.
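
Because indexing sends every chunk to the remote embedding endpoint, it can help to embed one short string first and fail fast if the endpoint is unreachable or returns vectors of an unexpected size. A minimal sketch under that assumption; `preflight` is a hypothetical helper that accepts any callable mapping text to a vector, and `fake_embed` is a stand-in so the block runs offline. In this notebook one would presumably pass `Settings.embed_model.get_text_embedding` and `EMBEDDING_LENGTH` instead:

```python
def preflight(embed_fn, expected_dim, probe="ping"):
    """Embed one short string and verify the vector size before bulk indexing."""
    try:
        vector = embed_fn(probe)
    except Exception as exc:  # e.g. a network or permission error
        raise RuntimeError(f"Embedding endpoint check failed: {exc}") from exc
    if len(vector) != expected_dim:
        raise ValueError(f"Expected {expected_dim} dimensions, got {len(vector)}")
    return True


def fake_embed(text):
    """Offline stand-in for a real embedding call (assumption for this sketch)."""
    return [0.0] * 4096


print(preflight(fake_embed, 4096))
```

Running a check like this before `VectorStoreIndex.from_documents` surfaces access problems in seconds rather than after a long wait.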

In [10]:
%%time

# create an index

from llama_index.core import VectorStoreIndex

print ("⚙️ Creating index from documents...")
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
print ("✅ Created index:", index )
print ("✅ Saved index to db ", DB_URI )

⚙️ Creating index from documents...


2025-10-16 13:47:48,474 - INFO - Retrying request to /embeddings in 0.492942 seconds
2025-10-16 13:47:49,092 - INFO - HTTP Request: POST https://api.studio.nebius.ai/v1/embeddings "HTTP/1.1 403 Forbidden"


CPU times: user 676 ms, sys: 13.2 ms, total: 689 ms
Wall time: 1min 1s


PermissionDeniedError: <html>
<head><title>403 Forbidden</title></head>
<body>
<center><h1>403 Forbidden</h1></center>
<hr><center>nginx</center>
</body>
</html>