# CGRAG Demo: Security Threat Detection with Qdrant

This notebook gives an end-to-end overview of how the main features of the CGRAG system work in practice: Malware Detection, Network Anomaly Detection, Threat Intelligence Search, and more.

---

## 1. Introduction

**CGRAG** is an AI-powered security system that uses the Qdrant vector database and retrieval-augmented generation (RAG) pipelines for:
- Malware similarity detection
- Network anomaly detection
- Cyber threat intelligence retrieval
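All three tasks rest on the same idea: represent each item as a vector and rank stored vectors by how close they are to a query. The sketch below shows that ranking in plain Python, assuming cosine similarity (one of the distance metrics Qdrant supports) and toy 3-dimensional vectors with hypothetical names in place of real embeddings. Qdrant does the same job at scale with an approximate HNSW index instead of this brute-force loop.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_k(query: list[float], stored: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Brute-force nearest-neighbour search: score every stored vector, keep the k best."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in stored.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for embeddings of known malware (hypothetical names).
known = {
    "ransomware_a": [0.9, 0.1, 0.0],
    "trojan_b": [0.1, 0.9, 0.1],
    "worm_c": [0.0, 0.2, 0.9],
}
suspicious_sample = [0.8, 0.2, 0.1]
print(top_k(suspicious_sample, known))
```

The suspicious sample points in almost the same direction as `ransomware_a`, so that entry ranks first.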

**Data Used:**
- `sample_malware_hashes.json`: Malware metadata (name, description, hash, etc.)
- `cve_database.json`: Public vulnerability (CVE) metadata
- `network_logs.csv`: Normal (baseline) network traffic logs
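Before the malware records can be stored as vectors, each one has to be turned into text for the encoder. The structure of `sample_malware_hashes.json` is not shown here, so the sketch below assumes records with `name`, `description`, and `hash` keys, and the helpers `malware_to_text` and `malware_to_payload` are hypothetical. It embeds only the descriptive fields: a hex digest carries no semantic meaning for a sentence encoder, so the hash is arguably better kept as payload for exact lookup.

```python
def malware_to_text(record: dict) -> str:
    """Build the string to embed from a malware record (hypothetical field names)."""
    parts = [record.get("name", ""), record.get("description", "")]
    # Skip missing or blank fields so the encoder does not see stray separators.
    return ". ".join(p.strip() for p in parts if p and p.strip())


def malware_to_payload(record: dict) -> dict:
    """Metadata stored next to the vector; the hash is matched exactly, not semantically."""
    return {"name": record.get("name"), "hash": record.get("hash")}


# Placeholder record for illustration only; the hash is a dummy value.
sample = {
    "name": "ExampleLocker",
    "description": "Encrypts user files and demands a ransom",
    "hash": "0" * 32,
}
print(malware_to_text(sample))
print(malware_to_payload(sample))
```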

---

## 2. Qdrant Server Connection & Environment Preparation

### Prerequisites - Install required packages

In [None]:
!pip install qdrant-client sentence-transformers pandas


### Import modules

In [None]:
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer
import pandas as pd
import json

### Connect to local Qdrant and load the embedding model

In [None]:
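# Qdrant's REST API listens on port 6333 by default
qdrant = QdrantClient(host="localhost", port=6333)
# Sentence encoder that maps text to 384-dimensional vectors
encoder = SentenceTransformer("all-MiniLM-L6-v2")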


### Check if expected collections exist

In [None]:
collections = ["malware_signatures", "network_patterns", "threat_intel"]
for c in collections:
    # collection_exists avoids the error get_collection raises for a missing name
    status = "Exists" if qdrant.collection_exists(c) else "Missing"
    print(f"{c}: {status}")


---

## 3. Data Loading & Embedding Preview

### Load sample data files

In [None]:
# Malware metadata (name, description, hash, etc.)
with open("../data/sample_malware_hashes.json", encoding="utf-8") as f:
    malware_db = json.load(f)

# CVE vulnerability metadata
with open("../data/cve_database.json", encoding="utf-8") as f:
    cve_db = json.load(f)

network_df = pd.read_csv("../data/network_logs.csv")
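Because `network_logs.csv` holds normal traffic, it can act as a baseline: an event whose vector lies far from every baseline vector is a candidate anomaly. The sketch below shows that nearest-neighbour scoring with hypothetical normalised feature vectors and an arbitrary threshold of 0.5, both assumptions to tune on real data. In the full system the baseline vectors would presumably live in the `network_patterns` collection, and the nearest neighbour would come from a Qdrant search rather than this loop.

```python
import math


def euclidean(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def anomaly_score(event: list[float], baseline: list[list[float]]) -> float:
    """Distance from the event to its nearest baseline vector (larger = more unusual)."""
    if not baseline:
        raise ValueError("baseline must contain at least one vector")
    return min(euclidean(event, ref) for ref in baseline)


def is_anomalous(event: list[float], baseline: list[list[float]], threshold: float = 0.5) -> bool:
    """Flag the event when even its closest normal neighbour is farther than the threshold."""
    return anomaly_score(event, baseline) > threshold


# Hypothetical normalised features per connection, e.g. [bytes, duration, port bucket].
baseline = [[0.10, 0.20, 0.10], [0.12, 0.18, 0.10], [0.30, 0.25, 0.20]]
normal_event = [0.11, 0.19, 0.12]
odd_event = [0.95, 0.90, 0.80]
print(is_anomalous(normal_event, baseline), is_anomalous(odd_event, baseline))
```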


### Show sample rows from each data source

In [None]:
pd.DataFrame(malware_db).head(3)