
# Enterprise Knowledge Management System  
**M.Tech (AI & DSE) – IIT Patna**  
**Ritik Tiwari (24A03RES160)**


# Install Libraries

In [4]:
!pip install sentence-transformers faiss-cpu nltk scikit-learn


Collecting faiss-cpu
  Downloading faiss_cpu-1.13.2-cp310-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (7.6 kB)
Downloading faiss_cpu-1.13.2-cp310-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (23.8 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.13.2


# Import Libraries

In [5]:
import re
import nltk
import faiss
import numpy as np

from nltk.corpus import stopwords
from sentence_transformers import SentenceTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity




# Download Stopwords

In [6]:
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))


[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


# Sample Enterprise Documents

In [7]:
documents = [
    "Enterprise knowledge management improves organizational efficiency",
    "Semantic search enables context aware information retrieval",
    "Traditional keyword search cannot capture semantic meaning",
    "AI driven knowledge systems enhance enterprise productivity"
]


# Text Preprocessing

In [8]:
def preprocess(text):
    text = text.lower()
    text = re.sub(r'[^a-z ]', '', text)
    tokens = text.split()
    tokens = [t for t in tokens if t not in stop_words]
    return " ".join(tokens)

processed_docs = [preprocess(doc) for doc in documents]
processed_docs


['enterprise knowledge management improves organizational efficiency',
 'semantic search enables context aware information retrieval',
 'traditional keyword search cannot capture semantic meaning',
 'ai driven knowledge systems enhance enterprise productivity']
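The sample documents contain no punctuation, so the output above matches the input apart from lowercasing. On text with hyphens or slashes, deleting every non-letter character fuses neighbouring words. The sketch below contrasts the cell's logic with a variant that replaces non-letter runs with a space. `STOP_WORDS` is a tiny stand-in set assumed here so the example runs without NLTK data, and both function names are hypothetical.

```python
import re

# Assumption: a tiny stand-in stopword set so the sketch runs without NLTK data.
STOP_WORDS = {"a", "an", "the", "in", "of", "how", "is"}

def preprocess_original(text, stop_words=STOP_WORDS):
    """Same logic as the notebook cell: non-letter characters are deleted outright."""
    text = text.lower()
    text = re.sub(r"[^a-z ]", "", text)
    tokens = [t for t in text.split() if t not in stop_words]
    return " ".join(tokens)

def preprocess_keep_boundaries(text, stop_words=STOP_WORDS):
    """Variant: each run of non-letter characters becomes a single space."""
    text = text.lower()
    text = re.sub(r"[^a-z]+", " ", text)
    tokens = [t for t in text.split() if t not in stop_words]
    return " ".join(tokens)

sample = "Context-aware retrieval in the enterprise/intranet"
print(preprocess_original(sample))         # contextaware retrieval enterpriseintranet
print(preprocess_keep_boundaries(sample))  # context aware retrieval enterprise intranet
```

Which behaviour is preferable depends on the corpus; for the four documents in this notebook both versions give identical results.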

# Embedding Generation

In [9]:
model = SentenceTransformer('all-MiniLM-L6-v2')
doc_embeddings = model.encode(processed_docs)
doc_embeddings.shape


(4, 384)

# Vector Database (FAISS)

In [10]:
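# Embedding width (384 for all-MiniLM-L6-v2, per the shape printed above).
dimension = doc_embeddings.shape[1]
# IndexFlatL2 is an exact, brute-force index over L2 distance.
index = faiss.IndexFlatL2(dimension)
# FAISS expects float32 input, so cast explicitly.
index.add(np.asarray(doc_embeddings, dtype="float32"))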

print("Total documents indexed:", index.ntotal)


Total documents indexed: 4
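`IndexFlatL2` compares the query against every stored vector and reports squared Euclidean distances. The NumPy sketch below reproduces that search on hypothetical 3-dimensional unit vectors that stand in for the 384-dimensional embeddings. It also illustrates that, when vectors are unit-normalized (an assumption, not verified in this notebook), squared L2 distance equals `2 - 2 * cosine`, so both measures give the same ranking. The function name `brute_force_l2` is illustrative only.

```python
import numpy as np

def brute_force_l2(vectors, query, top_k=2):
    """Return (squared L2 distances, row indices) of the top_k rows nearest to query."""
    diffs = vectors - query                  # broadcast the query across all rows
    sq_dists = np.sum(diffs ** 2, axis=1)    # squared Euclidean distance per row
    order = np.argsort(sq_dists)[:top_k]     # smallest distance first
    return sq_dists[order], order

# Hypothetical unit vectors standing in for real document embeddings.
toy_vectors = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0],
                        [0.6, 0.8, 0.0]], dtype="float32")
toy_query = np.array([0.8, 0.6, 0.0], dtype="float32")

dists, idx = brute_force_l2(toy_vectors, toy_query)
cosines = toy_vectors @ toy_query            # dot product equals cosine for unit vectors

print("nearest rows:", idx)
print("squared L2:  ", dists)
print("2 - 2*cos:   ", 2 - 2 * cosines[idx])
```

Brute-force search is exact and fine for four documents; its cost grows linearly with the number of indexed vectors.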


# Semantic Search Function

In [11]:
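def semantic_search(query, top_k=2):
    # Never request more neighbours than the index holds; FAISS pads missing hits with -1,
    # which would silently select documents[-1].
    top_k = min(top_k, index.ntotal)
    query_embedding = model.encode([preprocess(query)])
    # index.search returns (squared L2 distances, row ids), each of shape (1, top_k).
    distances, indices = index.search(np.array(query_embedding), top_k)
    return [documents[i] for i in indices[0]]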


# User Query & Results

In [15]:
query = "How AI helps in enterprise knowledge management"
results = semantic_search(query)

for i, res in enumerate(results, 1):
    print(f"Result {i}: {res}")


Result 1: AI driven knowledge systems enhance enterprise productivity
Result 2: Enterprise knowledge management improves organizational efficiency


# Keyword Search (TF-IDF Comparison)

In [13]:
tfidf = TfidfVectorizer()
tfidf_matrix = tfidf.fit_transform(processed_docs)

query_vec = tfidf.transform([preprocess(query)])
scores = cosine_similarity(query_vec, tfidf_matrix)

scores


array([[0.54397842, 0.        , 0.        , 0.49851262]])
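The raw score array is easier to compare with the semantic search results once it is turned into a ranking. The sketch below does that with `np.argsort`, using the scores copied from the output above; `rank_by_score` is a hypothetical helper, not part of scikit-learn.

```python
import numpy as np

documents = [
    "Enterprise knowledge management improves organizational efficiency",
    "Semantic search enables context aware information retrieval",
    "Traditional keyword search cannot capture semantic meaning",
    "AI driven knowledge systems enhance enterprise productivity",
]

# Scores copied from the cosine_similarity output above.
scores = np.array([[0.54397842, 0.0, 0.0, 0.49851262]])

def rank_by_score(scores_row, docs, top_k=2):
    """Return (document, score) pairs for the top_k highest scores."""
    order = np.argsort(scores_row)[::-1][:top_k]   # indices sorted by descending score
    return [(docs[i], float(scores_row[i])) for i in order]

for rank, (doc, score) in enumerate(rank_by_score(scores[0], documents), 1):
    print(f"TF-IDF result {rank} ({score:.3f}): {doc}")
```

With these scores, TF-IDF places the "Enterprise knowledge management" document first and the "AI driven knowledge systems" document second, the reverse of the semantic search order.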

### Conclusion
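In this four-document example, both methods retrieve the same two documents for the query but rank them differently. Semantic search places the AI-focused document first, while TF-IDF places the knowledge-management document first (0.544 vs. 0.499) and assigns a score of exactly 0 to the two documents that share no terms with the query. Embedding-based search ranks by meaning rather than by term overlap alone, which is what makes it more context-aware than traditional keyword-based retrieval.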
