<a href="https://colab.research.google.com/github/ankush-003/alerts-simulation-and-remediation/blob/main/queryBot/ASMR_Data_Pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **ASMR Data Pipeline**

In [1]:
%pip install --quiet "pymongo[srv]" sentence_transformers "redis[hiredis]" dnspython langchain langchain-community nest_asyncio motor

## Setting up environment variables

In [2]:
import os
from google.colab import userdata

os.environ["MONGO_URI"] = userdata.get('MONGO_URI')
os.environ["REDIS_HOST"] = userdata.get('REDIS_HOST')
os.environ["REDIS_PWD"] = userdata.get('REDIS_PASSWORD')
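Before the pipeline connects to MongoDB and Redis, it can help to fail fast if any of the variables set above is empty. The following is a minimal sketch of such a check. `require_env` is a hypothetical helper and is not part of the original notebook.

```python
import os

def require_env(*names):
    """Return the named environment variables, raising if any is unset or empty (hypothetical helper)."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}

# Hypothetical usage, after the cell above has populated os.environ:
# config = require_env("MONGO_URI", "REDIS_HOST", "REDIS_PWD")
```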

## **Pipeline**
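The embedding model in the pipeline cell below is configured with `normalize_embeddings: True`, so every vector has unit length. For unit vectors, the dot product equals cosine similarity, which keeps dot-product and cosine scoring consistent. Note that the similarity setting of the `alert_index` Atlas index is not shown in this notebook. The sketch below illustrates the identity in plain Python with toy 2-D vectors, not real BGE embeddings.

```python
import math

def normalize(v):
    # Scale a vector to unit L2 norm
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the norms
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [3.0, 4.0]
b = [4.0, 3.0]
print(cosine(a, b))                     # approximately 0.96
print(dot(normalize(a), normalize(b)))  # same value, up to floating-point rounding
```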

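Redis returns stream entries as dictionaries of bytes, and the pipeline decodes them before rendering a one-sentence description that gets embedded. This step involves no I/O, so it can be checked in isolation. The sketch below is a synchronous, standalone version of that preprocessing. The field names come from the pipeline cell, but the sample entry and its `node` value are made up for illustration.

```python
def describe_alert(entry_data: dict[bytes, bytes]) -> tuple[str, dict[str, str]]:
    # Decode both field names and values from bytes to str
    entry = {k.decode(): v.decode() for k, v in entry_data.items()}
    acknowledged = "Acknowledged" if entry["Acknowledged"] == "1" else "Not Acknowledged"
    description = (
        f"A {entry['Severity']} alert of category {entry['Category']} was raised "
        f"from the {entry['Source']} source for node {entry['node']} at {entry['CreatedAt']}. "
        f"The alert is {acknowledged}. The recommended remedy is: {entry['Remedy']}."
    )
    return description, entry

# Hypothetical sample payload, shaped like one entry from an XREAD reply
sample = {
    b"Category": b"Security",
    b"CreatedAt": b"2024-05-08 02:59:49",
    b"Acknowledged": b"0",
    b"Remedy": b"Strengthen account security",
    b"Severity": b"Critical",
    b"Source": b"Software",
    b"node": b"node-1",
}
description, metadata = describe_alert(sample)
print(description)
```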
In [None]:
from langchain_community.vectorstores import MongoDBAtlasVectorSearch
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.documents import Document
import os
import pymongo
import logging
import nest_asyncio
import redis

# config
nest_asyncio.apply()
logging.basicConfig(level=logging.INFO)
database = "AlertSimAndRemediation"
collection = "alert_embed"
stream_name = "alerts"


# embedding model
embedding_args = {
    "model_name" : "BAAI/bge-large-en-v1.5",
    "model_kwargs" : {"device": "cpu"},
    "encode_kwargs" : {"normalize_embeddings": True}
}
embedding_model = HuggingFaceEmbeddings(**embedding_args)

# Mongo Connection
connection = pymongo.MongoClient(os.environ["MONGO_URI"])
alert_collection = connection[database][collection]

# Redis connection
r = redis.Redis(host=os.environ['REDIS_HOST'], password=os.environ['REDIS_PWD'], port=16652)

# Preprocessing
async def create_textual_description(entry_data):
    entry_dict = {k.decode(): v.decode() for k, v in entry_data.items()}

    category = entry_dict["Category"]
    created_at = entry_dict["CreatedAt"]
    acknowledged = "Acknowledged" if entry_dict["Acknowledged"] == "1" else "Not Acknowledged"
    remedy = entry_dict["Remedy"]
    severity = entry_dict["Severity"]
    source = entry_dict["Source"]
    node = entry_dict["node"]

    description = f"A {severity} alert of category {category} was raised from the {source} source for node {node} at {created_at}. The alert is {acknowledged}. The recommended remedy is: {remedy}."

    return description, entry_dict

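# Vector store backed by the alert collection, created once and reused for every alert
vector_store = MongoDBAtlasVectorSearch(
    collection=alert_collection,
    embedding=embedding_model,
    index_name="alert_index",
)

# Saving alert doc: embed the description and store it with the alert fields as metadata
async def save(entry):
    vector_store.add_documents([
        Document(
            page_content=entry["content"],
            metadata=entry["metadata"],
        )
    ])
    logging.info("Alert stored successfully!")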

# Listening to alert stream
async def listen_to_alerts(r):
    try:
        last_id = '$'

        while True:
            entries = r.xread({stream_name: last_id}, block=0, count=None)

            if entries:
                stream, new_entries = entries[0]

                for entry_id, entry_data in new_entries:
                    description, entry_dict = await create_textual_description(entry_data)
                    await save({
                        "content" : description,
                        "metadata" : entry_dict
                    })
                    print(description)
                    # Update the last ID read
                    last_id = entry_id

    except KeyboardInterrupt:
        print("Exiting...")

await listen_to_alerts(r)

A Critical alert of category Security was raised from the Software source for node a4a18bf2-44fd-4a87-a7f7-89fbce94dcfb at 2024-05-08 02:59:49. The alert is Not Acknowledged. The recommended remedy is: Strengthen account security.
A Critical alert of category Security was raised from the Software source for node a4a18bf2-44fd-4a87-a7f7-89fbce94dcfb at 2024-05-08 03:00:15. The alert is Not Acknowledged. The recommended remedy is: Investigate potential security breach.
A Safe alert of category RuntimeMetrics was raised from the Software source for node a4a18bf2-44fd-4a87-a7f7-89fbce94dcfb at 2024-05-08 03:00:34. The alert is Not Acknowledged. The recommended remedy is: No Alert.
A Critical alert of category Security was raised from the Software source for node a4a18bf2-44fd-4a87-a7f7-89fbce94dcfb at 2024-05-08 03:01:01. The alert is Not Acknowledged. The recommended remedy is: Investigate potential security breach.
A Critical alert of category Applications was raised from the Software so