### Architecture

The worker node is responsible for loading indexes, running searches, and returning results.<br>
Object storage is used to store the collections and indexes.

 <img src="documents/Milvus arch.PNG" alt="drawing" style="width:500px;"/>

1. Each Milvus instance can manage multiple databases; the default one is named "default".
2. Each database serves as a container that stores collections, partitions, and indexes.
3. A collection is like a table with a schema that defines the fields for data storage. It supports scalar and vector fields, and primary keys that are either user-provided or auto-generated.<br>
   Supported data types:<br>
   <img src="documents/Milvus dtypes.PNG" alt="drawing" style="width:500px;"/>

4. Role-based access control (RBAC) is implemented per database.
- Users can be created and configured at a database level. 
- Roles can also be created for each database with specified permissions and then assigned to users.
 

### Partitions
- A collection can be split into multiple partitions.
- All data in a partition is stored physically together.
- The default partition is called ```_default```.
- Data can be inserted into, or queried from, specific partitions.
- Partitions help optimize storage and retrieval: use frequently filtered fields as partition keys.
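
As a rough illustration of why a partition key helps, the sketch below routes rows to partitions by hashing a key field, so rows that share a key value always land together. The function names and the use of CRC32 are assumptions made for this example; they are not Milvus's actual routing logic.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a partition-key value to a partition index with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def route(rows, key_field, num_partitions):
    """Group rows by the partition that their key value hashes to."""
    partitions = {i: [] for i in range(num_partitions)}
    for row in rows:
        partitions[partition_for(row[key_field], num_partitions)].append(row)
    return partitions

rows = [
    {"id": 1, "category": "books"},
    {"id": 2, "category": "music"},
    {"id": 3, "category": "books"},
]
partitions = route(rows, "category", num_partitions=4)

# A filter on category == "books" only needs to look at one partition.
books_partition = partitions[partition_for("books", 4)]
```

A query that filters on the key field can skip every other partition, which is where the retrieval benefit comes from.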

### Index
Indexes speed up searching.
- In Milvus, indexes can be created on either scalar or vector fields.
- Each field can have only one index. Composite indexes are not supported.
- A vector index organizes vectors for approximate nearest neighbor (ANN) search under a chosen metric type, such as L2 or IP, so that searches using that metric run efficiently.
- A vector index is a <b>prerequisite</b> for ANN searches on a vector field: it must be created before any such search.
- Index types<br>
      <img src="documents/Milvus index type.PNG" alt="drawing" style="width:500px;"/>
    ```
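    from pymilvus import utility

    # IVF_FLAT groups the vectors into nlist clusters; L2 is the distance metric.
    index_params = {
        "metric_type": "L2",
        "index_type": "IVF_FLAT",
        "params": {"nlist": 1024},
    }
    # collection_vr is an existing pymilvus Collection.
    collection_vr.create_index(
        field_name="the embedded field",  # name of the vector field to index
        index_params=index_params,
    )

    # Check how far the index build has progressed.
    utility.index_building_progress(collection_name, using=connection_id)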
    ```
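
To make the role of ```nlist``` more concrete, here is a toy, pure-Python sketch of the idea behind an IVF-style index: vectors are assigned to the cell of their nearest centroid, and a search scans only the closest cells. The centroids are hard-coded here (a real index would learn them by clustering), and ```nprobe``` is the name of the search-time setting for how many cells to scan. This is an illustration of the concept, not how Milvus is implemented.

```python
import math

def l2(a, b):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, centroids):
    """Assign each vector id to the cell of its nearest centroid."""
    cells = {i: [] for i in range(len(centroids))}
    for vec_id, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda i: l2(vec, centroids[i]))
        cells[nearest].append(vec_id)
    return cells

def ivf_search(query, vectors, centroids, cells, nprobe=1, limit=1):
    """Scan only the nprobe cells whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))[:nprobe]
    candidates = [vec_id for i in probe for vec_id in cells[i]]
    ranked = sorted(candidates, key=lambda vec_id: l2(query, vectors[vec_id]))
    return ranked[:limit]

vectors = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
centroids = [(0.0, 0.0), (5.0, 5.0)]  # stands in for nlist=2 cluster centers
cells = build_ivf(vectors, centroids)
nearest_ids = ivf_search((4.9, 5.0), vectors, centroids, cells, nprobe=1, limit=2)
print(nearest_ids)  # [2, 3]
```

Scanning fewer cells is faster but can miss a true neighbor that sits in an unscanned cell, which is why the search is called approximate.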

### Data management
- A row is also called an entity.
- Bulk inserts are recommended:
   1. Define a ```CollectionSchema```.
   2. Create a ```Collection```.
   3. Prepare ```insert_data``` based on the schema. Each field is a list, so the data is built column by column, NOT row by row.<br>
    In general, the embedding field is a FLOAT_VECTOR created by ```embeddings_model.embed_query(the content for embedding)```.<br>
    The final ```insert_data``` is ```[list1, list2, ..., listembedding]```.
   4. Bulk insert with ```Collection.insert(insert_data)```.<br>

   
- Flush operation
    - After inserts are done, a flush operation is needed so that the newly inserted data gets indexed.
    - Milvus flushes automatically once the pending records reach a certain size.
    - If the data must be queryable immediately, trigger the flush manually.

- Create index
    ```
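    from pymilvus import utility

    index_params = {
        "metric_type": "L2",        # distance metric used to compare vectors
        "index_type": "IVF_FLAT",
        "params": {"nlist": 1024},  # number of clusters the vectors are grouped into
    }

    # Create the index on the vector field.
    course_collection.create_index(
        field_name="desc_embedding",
        index_params=index_params,
    )

    # Check how far the index build has progressed.
    utility.index_building_progress(collection_name, using=connection_id)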
    ```

- Querying scalar data or searching a vector field
   - A collection must be loaded into memory before queries can be executed against it.<br>
    ```course_collection.load()```


- Upsert operation
    - Milvus also supports the upsert operation.
    - If a record is upserted with a primary key that already exists, the existing record is updated instead of a new record being created.
- Delete
    - Entities can be deleted by primary key or with a Boolean expression as a filter: ```course_collection.delete()```
    - Drop a collection: ```utility.drop_collection(collection_name,using=connection_id)```
    - Drop a database: ```db.drop_database(db_name, using=connection_id)```
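
The column-by-column data layout, the upsert behavior, and filtered deletes described above can be sketched with a small in-memory stand-in. ```ToyCollection``` and its methods are hypothetical names for this illustration only; it does not use or mimic the pymilvus API, and the Boolean filter is a plain Python function rather than a filter expression string.

```python
class ToyCollection:
    """In-memory stand-in for a collection, keyed by primary key."""

    def __init__(self, field_names, primary_field):
        self.field_names = field_names
        self.primary_field = primary_field
        self.rows = {}

    def _to_rows(self, columns):
        # columns[i] holds every value of field_names[i]; zip turns them into rows.
        if len(columns) != len(self.field_names):
            raise ValueError("one column per schema field is required")
        if len({len(col) for col in columns}) > 1:
            raise ValueError("all columns must have the same length")
        return [dict(zip(self.field_names, values)) for values in zip(*columns)]

    def upsert(self, columns):
        """Add rows; a row whose primary key already exists is replaced."""
        for row in self._to_rows(columns):
            self.rows[row[self.primary_field]] = row

    def delete(self, predicate):
        """Remove every row for which predicate(row) is true; return the count."""
        doomed = [pk for pk, row in self.rows.items() if predicate(row)]
        for pk in doomed:
            del self.rows[pk]
        return len(doomed)

courses = ToyCollection(["course_id", "title"], primary_field="course_id")
courses.upsert([[1, 2], ["Intro to SQL", "Vector search"]])   # two columns, two rows
courses.upsert([[2], ["Vector search, 2nd edition"]])         # same key: row 2 is replaced
removed = courses.delete(lambda row: row["course_id"] == 1)
```

After these calls the toy collection holds a single row, with key 2 and the updated title.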



    



### Query
- Queries are written as SQL-like statements.
- You can specify fields, partitions, a limit, and an offset (the number of rows to skip before returning the remaining data, which helps with pagination-style queries).
- Aggregation: only count is supported.
- Filtering: operators like ```&&```, ```||```, ```==```, ```!=```, ```>=```, ```<=```, ```in```, ```array_contains```, ```json_contains```, ...
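
A minimal sketch of how a filter, output fields, limit, and offset combine into pagination, using plain Python lists. The ```query``` helper and the sample rows are made up for this example, and the filter is a Python function standing in for a filter expression.

```python
def query(rows, predicate, output_fields, limit, offset=0):
    """Filter rows, skip `offset` matches, and return up to `limit` of them."""
    matches = [row for row in rows if predicate(row)]
    page = matches[offset:offset + limit]
    return [{name: row[name] for name in output_fields} for row in page]

rows = [{"id": i, "price": 10 * i, "title": f"course {i}"} for i in range(1, 11)]

# Equivalent in spirit to a filter such as: price <= 60 && id != 2
def cheap(row):
    return row["price"] <= 60 and row["id"] != 2

page_1 = query(rows, cheap, ["id", "title"], limit=2, offset=0)
page_2 = query(rows, cheap, ["id", "title"], limit=2, offset=2)
match_count = sum(1 for row in rows if cheap(row))  # the only aggregation: count
```

Each page is obtained by increasing the offset by the page size while keeping the same filter.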




### Search vector
1. The input string (the search query) must first be converted to a vector using the <b>same embedding model</b> that was used when ingesting the vector field.
2. The metric used for comparison should be the <b>same metric (e.g. IP, L2)</b> that was used when creating the index for the vector field.
   An index is a prerequisite before a search can be performed on the vector field.
3. You can specify a limit on the number of rows returned and an offset from which to return rows.
4. You can specify a radius parameter to filter by similarity (distance). With a distance metric such as L2, the smaller the distance, the higher the similarity.
   Note that the range of distance values varies with the metric type, so the radius must be adjusted accordingly.
5. The computed distance is returned along with the query results.


- <b>L2: the smaller, the closer</b>
- <b>cos: the larger, the closer</b>
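
The difference in ranking direction between metrics can be seen in a small brute-force sketch. The vectors are invented 2-dimensional examples, and the ```search``` helper is a hypothetical stand-in that scans every vector; it only illustrates how the metric decides what "closest" means and that scores are returned with the results.

```python
import math

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    norms = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return inner_product(a, b) / norms

# metric name -> (scoring function, True if a larger score means closer)
METRICS = {
    "L2": (l2_distance, False),
    "IP": (inner_product, True),
    "COSINE": (cosine_similarity, True),
}

def search(query, vectors, metric, limit):
    """Return the top `limit` (vector id, score) pairs under the given metric."""
    score, larger_is_closer = METRICS[metric]
    scored = [(vec_id, score(query, vec)) for vec_id, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=larger_is_closer)
    return scored[:limit]

vectors = [(1.0, 0.0), (0.0, 1.0), (3.0, 0.5)]
query_vec = (1.0, 0.1)

best_l2 = search(query_vec, vectors, "L2", limit=1)   # vector 0 is nearest
best_ip = search(query_vec, vectors, "IP", limit=1)   # vector 2 has the largest product
```

The same query returns different top results under L2 and IP, which is why the search metric has to match the one the index was built with.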

### Setting up an environment to practice Milvus
1. Change to Folder A, where the compose YAML file is.<br>
``` cd LLM_Foundationse_VectorDB_4__Cach_and_RAG\Exercise Files```
2. Create and start the containers that run Milvus.<br>
```docker-compose -f milvus-standalone-docker-compose.yml up -d```  <br>
 <img src="documents/Milvus docker container.PNG" alt="drawing" style="width:600px;"/>
3. List the containers to check that they are ready.<br>
``` docker ps (or docker container ls)```<br>
<img src="documents/Milvus running containers.PNG  " alt="drawing" style="width:600px;"/>
4. Open the Milvus management UI in a browser.<br>
``` localhost:8000/#/connect``` <BR>
<img src="documents/Milvus local management UI .PNG " alt="drawing" style="width:600px;"/>
5. Create a new virtual environment named Mivus.<br>
```conda create --name Mivus   python=3.11.5```
6. Activate the environment under Folder A.<br>
```conda activate Mivus ```
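
As an extra check after steps 2 to 4, a short script can test whether the services accept TCP connections. This is a sketch under assumptions: 19530 is the default Milvus service port and is assumed to be what the compose file exposes, and 8000 is the UI port used in step 4. Adjust both if your compose file maps different ports.

```python
import socket

def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Assumed ports: 19530 for Milvus, 8000 for the management UI.
    for name, port in [("Milvus", 19530), ("management UI", 8000)]:
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

An open port only shows that something is listening; it does not prove that Milvus has finished starting up.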

<br>
 
For steps 5 and 6, I wanted to run the notebook in VS Code, which was installed before Anaconda.<br>
When I ran ```conda create --name Mivus   python=3.11.5``` in PowerShell,<br>
I got this error:<br>
 > "The term 'conda' is not recognized as the name of a cmdlet, function, script file, or operable program."

Then I tried the following:
1. Install Anaconda.
2. Add it to the system path.
3. Run ```conda init powershell``` in the Anaconda PowerShell Prompt to make conda available in the regular PowerShell.<br>
The command above still did not work in VS Code's PowerShell terminal,<br>
but it did work in the Anaconda PowerShell Prompt.<br>
  <img src="documents/conda evn .PNG " alt="drawing" style="width:600px;"/>


4. Add the <b>Python extension</b> for VS Code. The extension works fine with conda: create a conda environment, and the extension lets you select it as your environment/interpreter.<br>
 <img src="documents/change your Python interpreter  VSCode .PNG " alt="drawing" style="width:600px;"/><br>
 After that, ```conda create --name Mivus   python=3.11.5``` works as well.<br>
 You can switch the terminal from PowerShell to Command Prompt to enter the conda (Mivus) environment.<br>
 
https://stackoverflow.com/questions/54828713/working-with-anaconda-in-visual-studio-code
 
- conda create --name Mivus   python=3.11.5
- conda activate Mivus
- conda deactivate

### Vector DB used as LLM Cache

Reason:
LLM calls are expensive (time and cost)
- a cache reduces cost and latency

Cached data: prompt, response, prompt embedding<br>

Workflow:<br>
  <img src="documents/prompt cache workflow.PNG" alt="drawing" style="width:600px;"/>

Best practices
- Track the cache hit rate to evaluate cache efficiency.
- Benchmark the cache.
- Limit the size of the cache: a cache can grow too big over time, hurting the efficiency and relevance of the results. Set a limit for the cache size and manage it over time.
- Add a last-used timestamp to the cache collection and update it every time a cached entry is returned to the user.
  This helps track which entries are used often and which are not.
- To control the cache size, prune entries in the cache. Pruning them based on their age is recommended as well.
- Collect user feedback on whether the answers returned from the cache are correct and relevant.
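
Several of these practices (hit-rate tracking, a last-used timestamp, a size limit, and age-based pruning) can be sketched in a small in-memory cache. ```PromptCache```, the similarity threshold, and the 2-dimensional embeddings are all assumptions for illustration; a real cache would store entries in a vector database collection and use embeddings from the actual model.

```python
import math
import time

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class PromptCache:
    def __init__(self, threshold=0.95, max_entries=1000):
        self.threshold = threshold      # minimum similarity that counts as a hit
        self.max_entries = max_entries  # size limit for the cache
        self.entries = []               # dicts: embedding, prompt, response, last_used
        self.hits = 0
        self.lookups = 0

    def get(self, embedding, now=None):
        """Return the cached response for the most similar prompt, or None."""
        now = time.time() if now is None else now
        self.lookups += 1
        best = max(self.entries, key=lambda e: cosine(embedding, e["embedding"]), default=None)
        if best is None or cosine(embedding, best["embedding"]) < self.threshold:
            return None
        best["last_used"] = now  # refresh the timestamp on every hit
        self.hits += 1
        return best["response"]

    def put(self, embedding, prompt, response, now=None):
        now = time.time() if now is None else now
        self.entries.append({"embedding": embedding, "prompt": prompt,
                             "response": response, "last_used": now})
        if len(self.entries) > self.max_entries:
            # Over the size limit: keep only the most recently used entries.
            self.entries.sort(key=lambda e: e["last_used"], reverse=True)
            del self.entries[self.max_entries:]

    def prune_older_than(self, max_age_seconds, now=None):
        """Drop entries that have not been used within max_age_seconds."""
        now = time.time() if now is None else now
        self.entries = [e for e in self.entries if now - e["last_used"] <= max_age_seconds]

    def hit_rate(self):
        return self.hits / self.lookups if self.lookups else 0.0

cache = PromptCache(threshold=0.9, max_entries=2)
cache.put((1.0, 0.0), "What is Milvus?", "A vector database.", now=100)
hit = cache.get((0.99, 0.05), now=200)   # similar prompt: served from the cache
miss = cache.get((0.0, 1.0), now=210)    # unrelated prompt: must go to the LLM
```

The threshold is the main tuning knob: set too low, the cache returns answers to questions that were not really asked, which is what the user-feedback practice is meant to catch.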

### RAG: Retrieval-augmented generation

#### Shortcomings of LLMs
- LLMs can only answer questions based on the data they were trained on.
- Answers from LLMs may not be current. Their cut-off date is usually the date on which their original training data was extracted.
- LLMs tend to hallucinate: they sometimes give made-up answers that are not factually correct.
- For enterprise use cases, LLMs cannot answer questions about enterprise or confidential data that is not part of the training dataset.
- It is possible to build custom LLMs using only organizational data, but that can be expensive. Keeping such an LLM updated with new data on a daily basis is also expensive.


#### RAG: Knowledge curation process
  <img src="documents/curation.PNG" alt="drawing" style="width:600px;"/><BR>
-  Text data that needs to be converted to vectors must first be chunked. <BR>
  A vector field can only hold a limited amount of data. <BR>
  Also, when a prompt is issued, we only want to retrieve the small part of the original content that is relevant to the prompt. <BR>
  For this, we split the original text into chunks of equal size. <BR>
  The chunk size varies by use case, but 1024 is a common choice. <BR>
  Each chunk is stored as a separate row (entity) in the vector database.<BR>
-  Once the chunks are available, each one is converted into an embedding using an embedding model: ```embeddings_model.embed_query()```
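
A minimal sketch of fixed-size chunking, assuming the chunk size is measured in characters (the notes above do not state the unit) and that each chunk becomes one row. The ```chunk_text``` helper and the field names are hypothetical; the embedding step is left out because it depends on the model in use.

```python
def chunk_text(text: str, chunk_size: int = 1024) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

document = "Milvus stores vectors. " * 100   # 2300 characters of sample text
chunks = chunk_text(document, chunk_size=1024)

# One row (entity) per chunk; an embedding column would be added per chunk.
rows = [{"chunk_id": i, "text": chunk} for i, chunk in enumerate(chunks)]
print([len(chunk) for chunk in chunks])  # [1024, 1024, 252]
```

Only the last chunk can be shorter than the others. Splitting on a fixed character count can cut a sentence in half, so splitting on sentence or paragraph boundaries is a common refinement.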


#### RAG:question-answering process
  <img src="documents/QandA.PNG" alt="drawing" style="width:600px;"/>


#### Application scenarios
- Interactive chatbots that businesses use to communicate with their customers.
- Automated responses to customer queries by email.
- Root cause analysis of technical issues, based on log messages, observed metrics, and information from manuals.
- E-commerce search: RAG can help customers quickly find what they are searching for and provide good narratives about the product or service. It can also customize such information for the customer.
- Enterprise help desks for functions like human resources, legal, or logistics.