<a href="https://colab.research.google.com/github/BhargavaSimhaR/AgenticAI/blob/main/SimpleRAGbasedBOT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import HuggingFaceHub, HuggingFacePipeline
from langchain.chains import RetrievalQA

In [None]:
import os
import warnings
from transformers import logging

# Suppress warnings
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=UserWarning)

# Reduce verbosity of HuggingFace logs
logging.set_verbosity_error()

# Optional: Environment config
os.environ["TRANSFORMERS_VERBOSITY"] = "error"
os.environ["TOKENIZERS_PARALLELISM"] = "false"

In [None]:
from langchain.schema import Document

# Manual context input instead of PDF
manual_text = """
MODULE-3

Intrusion Detection: Host-Based Intrusion Detection – Network-Based
Intrusion Detection – Distributed or Hybrid Intrusion Detection – Intrusion
Detection Exchange Format – Honeypots – Example System Snort

What is Intrusion?

• Cyber Intrusion is to compromise a computer system by breaking the
security of such a system or causing it to enter into an insecure state
• Some intruders will try to implant code that has been carefully
developed. Others will infiltrate the network, stealthily siphoning out
data on a regular basis or altering public-facing Web sites with varied
messages.
• An attacker can gain access to your system physically (by physically
accessing a restricted computer and its hard drive and/or BIOS),
externally (by attacking your Web servers or breaching your firewall),
or internally (through your own users, customers, or partners).

Broad classes of Intruders
• Cyber criminals: Are either individuals or members of an organized crime group with a goal of financial reward. To achieve this, their activities may include identity theft, theft of financial credentials, corporate espionage, data theft, or data ransoming.
• Activists: Are either individuals, usually working as insiders, or members of a larger group of outsider attackers, who are motivated by social or political causes. They are also known as hacktivists, and their skill level is often quite low.
• State-sponsored organizations: Are groups of hackers sponsored by governments to conduct espionage or sabotage activities. They are also known as Advanced Persistent Threats (APTs), due to the covert nature and persistence over extended periods involved with many attacks in this class.
• Apprentice: Hackers with minimal technical skill who primarily use existing attack toolkits. They likely comprise the largest number of attackers, including many criminal and activist attackers. Given their use of existing known tools, these attackers are the easiest to defend against. They are also known as “script-kiddies” due to their use of existing scripts (tools).
• Journeyman: Hackers with sufficient technical skills to modify and extend attack toolkits to use newly discovered, or purchased, vulnerabilities; or to focus on different target groups. They may also be able to locate new vulnerabilities to exploit that are similar to some already known. A number of hackers with such skills are likely found in all intruder classes listed above, adapting tools for use by others.
• Master: Hackers with high-level technical skills capable of discovering brand new categories of vulnerabilities, or writing new powerful attack toolkits

Examples of Intrusion

Intruder Behaviour
Target Acquisition and Information Gathering: Where the attacker identifies and characterizes the target systems using publicly available information, both technical and non-technical, and uses network exploration tools to map target resources.
Initial Access: The initial access to a target system, typically by exploiting a remote network vulnerability, by guessing weak authentication credentials used in a remote service, or via the installation of malware on the system using some form of social engineering or drive-by-download attack.
Privilege Escalation: Actions taken on the system, typically via a local access vulnerability, to increase the privileges available to the attacker to enable their desired goals on the target system.
Information Gathering or System Exploit: Actions by the attacker to access or modify information or resources on the system, or to navigate to another target system.
Maintaining Access: Actions such as the installation of backdoors or other malicious software as we discuss in Chapter 6, or through the addition of covert authentication credentials or other configuration changes to the system, to enable continued access by the attacker after the initial attack.
Covering Tracks: Where the attacker disables or edits audit logs, to remove evidence of attack activity, and uses rootkits and other measures to hide covertly installed files or code

Intrusion Detection

An IDS comprises three logical components:
• Sensors: Sensors are responsible for collecting data. The input for a sensor
may be any part of a system that could contain evidence of an intrusion.
Types of input to a sensor include network packets, log files, and system call
traces. Sensors collect and forward this information to the analyzer.
• Analyzers: Analyzers receive input from one or more sensors or from other
analyzers. The analyzer is responsible for determining if an intrusion has
occurred. The output of this component is an indication that an intrusion has
occurred. The output may include evidence supporting the conclusion that an
intrusion occurred. The analyzer may provide guidance about what actions to
take as a result of the intrusion. The sensor inputs may also be stored for
future analysis and review in a storage or database component.
• User interface: The user interface to an IDS enables a user to view output
from the system or control the behavior of the system. In some systems, the
user interface may equate to a manager, director, or console component

• Intrusion detection is based on the assumption that the behavior of the
intruder differs from that of a legitimate user in ways that can be quantified. Of course, we cannot expect that there will be a crisp, exact distinction between an attack by an intruder and the normal use of resources by an authorized user. Rather, we must expect that there will be
some overlap.
• Although the typical behavior of an intruder differs from the typical behavior of an authorized user, there is an overlap in these behaviors. Thus, a loose interpretation of intruder behavior, which will catch more
intruders, will also lead to a number of false positives, or false alarms,
where authorized users are identified as intruders. On the other hand, an
attempt to limit false positives by a tight interpretation of intruder behavior will lead to an increase in false negatives, or intruders not
identified as intruders.

• To be of practical use, an IDS should detect a substantial percentage
of intrusions while keeping the false alarm rate at an acceptable
level. If only a modest percentage of actual intrusions are detected,
the system provides a false sense of security. On the other hand, if
the system frequently triggers an alert when there is no intrusion (a
false alarm), then either system managers will begin to ignore the
alarms, or much time will be wasted analyzing the false alarms.
• Unfortunately, because of the nature of the probabilities involved, it
is very difficult to meet the standard of high rate of detections with a
low rate of false alarms. In general, if the actual number of
intrusions is low compared to the number of legitimate uses of a
system, then the false alarm rate will be high unless the test is
extremely discriminating. This is an example of a phenomenon
known as the base-rate fallacy.

Analysis Approaches

IDSs typically use one of the following alternative approaches to analyze
sensor data to detect intrusions:
1. Anomaly detection: Involves the collection of data relating to the behavior
of legitimate users over a period of time. Then current observed behavior is
analyzed to determine with a high level of confidence whether this behavior
is that of a legitimate user or alternatively that of an intruder.
2. Signature or Heuristic detection: Uses a set of known malicious data
patterns (signatures) or attack rules (heuristics) that are compared with
current behavior to decide if it is that of an intruder. It is also known as misuse
detection. This approach can only identify known attacks for which it has
patterns or rules.

• The anomaly detection approach involves first developing a model of legitimate user behavior by collecting and processing sensor data from the normal operation of the monitored system in a training phase. This may occur at distinct times, or
there may be a continuous process of monitoring and evolving the model over
time.
A variety of classification approaches are used, which are broadly categorized as:
• Statistical: Statistical approaches use the captured sensor data to develop a
statistical profile of the observed metrics. The observed behavior is analyzed using univariate, multivariate, or time-series models of these metrics.
• Knowledge based: Knowledge based approaches classify the observed data using a
set of rules. These rules are developed during the training phase, usually manually, to
characterize the observed training data into distinct classes. Formal tools may be used to describe these rules, such as a finite-state machine or a standard description
language. These approaches use an expert system that classifies observed behavior according to a set of rules that model legitimate behavior.
• Machine-learning: Machine-learning approaches use data mining techniques to
automatically develop a model using the labeled normal training data. This model is
then able to classify subsequently observed data as either normal or anomalous. These approaches automatically determine a suitable classification model from the training data.

Signature Detection
• Signature approaches match a large collection of known patterns of
malicious data against data stored on a system or in transit over a
network. The signatures need to be large enough to minimize the
false alarm rate, while still detecting a sufficiently large fraction of
malicious data. This approach is widely used in antivirus products, in
network traffic scanning proxies, and in NIDS.

• The advantages of this approach include the relatively low cost in
time and resource use, and its wide acceptance.
• Disadvantages include the significant effort required to constantly
identify and review new malware to create signatures able to identify
it, and the inability to detect zero-day attacks for which no signatures
exist.

IDSs are often classified based on the source
and type of data analyzed, as:
• Host-based IDS (HIDS): Monitors the characteristics of a single host
and the events occurring within that host, such as process identifiers
and the system calls they make, for evidence of suspicious activity.
• Network-based IDS (NIDS): Monitors network traffic for particular
network segments or devices and analyzes network, transport, and
application protocols to identify suspicious activity.
• Distributed or hybrid IDS: Combines information from a number of
sensors, often both host and network-based, in a central analyzer that
is able to better identify and respond to intrusion activity

Host Based Intrusion Detection

A Host-Based Intrusion Detection System (HIDS) is a security tool that
monitors and analyzes activities on a specific computer or endpoint to
detect and alert on potential security breaches or malicious activities
HIDS systems are so-named because they operate on individual host
systems. In this context, a host could be a server, a PC, or any other
type of device that produces logs, metrics, and other data that can be
monitored for security purposes.

What it Does:
• Monitors Host Activity:
HIDS software, often deployed as an agent on individual hosts, continuously
monitors system events, logs, and network traffic related to that specific host.
• Detects Suspicious Behavior:
It analyzes this activity to identify patterns or events that could indicate an
intrusion, unauthorized access, or malicious activity.
• Examples of Monitored Activities:
  • File access and modifications
  • Registry changes
  • Process execution
  • Network traffic on the host's network interface
  • System logs

• Alerts and Notifications:
When suspicious activity is detected, the HIDS generates alerts or notifications to
inform security personnel or administrators.
• Used in conjunction with other security measures:
HIDS is often used alongside other security tools, such as network intrusion detection
systems (NIDS) and intrusion prevention systems (IPS).

• Signature-based Detection:
HIDS can use a database of known attack signatures to identify malicious patterns.
• Anomaly-based Detection:
It can also monitor baseline behavior and flag deviations from the norm as
potential threats.
The primary benefit of a HIDS is that it can detect both external and internal intrusions, something that
is not possible with either network-based IDSs or firewalls.
As discussed in the previous section, host-based IDSs can use either anomaly or signature and
heuristic approaches to detect unauthorized behavior on the monitored host.

Data Sources and Sensors

A fundamental component of intrusion detection is the sensor that collects
data. Some record of ongoing activity by users must be provided as input to
the analysis component of the IDS. Common data sources include:
• System call traces
• Audit (log file) records
• File integrity checksums
• Registry access
The sensor gathers data from the chosen source, filters the gathered data to
remove any unwanted information and to standardize the information
format, and forwards the result to the IDS analyzer, which may be local or
remote

Distributed HIDS

• Traditionally, work on host-based IDSs focused on single-system stand-alone
operation. The typical organization, however, needs to defend a
distributed collection of hosts supported by a LAN or internetwork.
Although it is possible to mount a defense by using stand-alone IDSs on
each host, a more effective defense can be achieved by coordination
and cooperation among IDSs across the network

Major issues in the design of a distributed IDS
• A distributed IDS may need to deal with different sensor data formats.
• One or more nodes in the network will serve as collection and analysis
points for the data from the systems on the network. Thus, either raw
sensor data or summary data must be transmitted across the network.
Therefore, there is a requirement to assure the integrity and
confidentiality of these data.
• With a centralized architecture, there is a single central point of
collection and analysis of all sensor data. This eases the task of
correlating incoming reports but creates a potential bottleneck and
single point of failure. With a decentralized architecture, there is more
than one analysis center, but these must coordinate their activities and
exchange information.

Architecture for Distributed Intrusion
Detection

Three main components:
1. Host agent module: An audit collection module operating as a
background process on a monitored system. Its purpose is to collect
data on security-related events on the host and transmit these to the
central manager. Figure shows details of the agent module
architecture.
2. LAN monitor agent module: Operates in the same fashion as a host
agent module except that it analyzes LAN traffic and reports the results to
the central manager.
3. Central manager module: Receives reports from LAN monitor and host
agents and processes and correlates these reports to detect intrusion.
Agent Architecture

• The agent captures each audit record produced by the native audit
collection system. A filter is applied that retains only those records that
are of security interest. These records are then reformatted into a
standardized format referred to as the host audit record (HAR).
• Next, a template-driven logic module analyzes the records for
suspicious activity. At the lowest level, the agent scans for notable
events that are of interest independent of any past events. Examples
include failed file accesses, accessing system files, and changing a file’s access
control.
• At the next higher level, the agent looks for sequences of events, such
as known attack patterns (signatures). Finally, the agent looks for
anomalous behavior of an individual user based on a historical profile
of that user, such as number of programs executed, number of files
accessed, and the like.

• When suspicious activity is detected, an alert is sent to the central
manager. The central manager includes an expert system that can
draw inferences from received data. The manager may also query
individual systems for copies of HARs to correlate with those from
other agents.
• The LAN monitor agent also supplies information to the central
manager. The LAN monitor agent audits host-host connections,
services used, and volume of traffic. It searches for significant events,
such as sudden changes in network load, the use of security-related
services, and suspicious network activities.

Network Based Intrusion Detection

• A network-based IDS (NIDS) monitors traffic at selected points on a
network or interconnected set of networks.
• The NIDS examines the traffic packet by packet in real time, or close
to real time, to attempt to detect intrusion patterns.
• The NIDS may examine network-, transport-, and/or application-level
protocol activity
• A typical NIDS facility includes a number of sensors to monitor packet
traffic, one or more servers for NIDS management functions, and one
or more management consoles for the human interface

Types of Network Sensors
• Sensors can be deployed in one of two modes: inline and passive.
• An inline sensor is inserted into a network segment so that the traffic
that it is monitoring must pass through the sensor. One way to achieve
an inline sensor is to combine NIDS sensor logic with another network
device, such as a firewall or a LAN switch. This approach has the
advantage that no additional separate hardware devices are needed; all
that is required is NIDS sensor software.
• An alternative is a stand-alone inline NIDS sensor. The primary
motivation for the use of inline sensors is to enable them to block an
attack when one is detected. In this case the device is performing both
intrusion detection and intrusion prevention functions.

A passive sensor monitors a copy of network traffic; the actual traffic
does not pass through the device. From the point of view of traffic flow,
the passive sensor is more efficient than the inline sensor, because it
does not add an extra handling step that contributes to packet delay.
The sensor connects to the network transmission medium, such as a
fiber optic cable, by a direct physical tap. The tap provides the sensor
with a copy of all network traffic being carried by the medium. The
network interface card (NIC) for this tap usually does not have an IP
address configured for it. All traffic into this NIC is simply collected with
no protocol interaction with the network
"""

# Wrap your text in a Document object
docs = [Document(page_content=manual_text)]
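
The base-rate fallacy mentioned in the notes above is easier to see with numbers. The following is a minimal sketch using Bayes' theorem; the function name `alarm_precision` and all three input figures are hypothetical values chosen for illustration, not measurements from any real IDS.

```python
def alarm_precision(base_rate, detection_rate, false_alarm_rate):
    """Return P(intrusion | alarm) using Bayes' theorem."""
    true_alarms = detection_rate * base_rate
    false_alarms = false_alarm_rate * (1.0 - base_rate)
    return true_alarms / (true_alarms + false_alarms)


# Hypothetical figures: 1 event in 10,000 is an intrusion, the IDS detects
# 99% of intrusions and raises a false alarm on 1% of legitimate events.
precision = alarm_precision(
    base_rate=1e-4, detection_rate=0.99, false_alarm_rate=0.01
)
print(f"P(intrusion | alarm) = {precision:.4f}")
```

Under these assumed figures, fewer than 1 in 100 alarms corresponds to a real intrusion, even though the detector looks accurate on paper. This is why the notes say the test must be extremely discriminating when intrusions are rare.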

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = text_splitter.split_documents(docs)
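
To make the `chunk_size=500, chunk_overlap=100` settings concrete, here is a simplified sliding-window chunker. It is only an illustration of how size and overlap interact: the real `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line boundaries rather than at fixed offsets, and the helper name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size=500, chunk_overlap=100):
    """Split text into fixed-size windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# A 1200-character dummy string stands in for the course notes
sample = "".join(str(i % 10) for i in range(1200))
pieces = sliding_chunks(sample)
print([len(p) for p in pieces])
```

Each window starts 400 characters after the previous one, so the last 100 characters of one chunk reappear at the start of the next. The overlap reduces the chance that a sentence needed to answer a query is cut in half at a chunk boundary.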

In [None]:
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")


In [None]:
vectorstore = FAISS.from_documents(chunks, embedding_model)
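
Conceptually, the vector store answers a query by finding the stored vectors closest to the query's embedding. The sketch below shows that idea with hand-written three-dimensional vectors and cosine similarity. The vectors, the helper names, and the choice of metric are all assumptions made for illustration; the actual index may use a different metric, such as L2 distance, and real embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the texts of the k entries most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy "embeddings": made-up numbers, not output of a real model
toy_index = [
    ("HIDS monitors a single host", [0.9, 0.1, 0.0]),
    ("NIDS monitors network traffic", [0.1, 0.9, 0.0]),
    ("Honeypots lure attackers", [0.0, 0.1, 0.9]),
]
query_vec = [0.8, 0.2, 0.0]
results = top_k(query_vec, toy_index, k=2)
print(results)
```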

In [None]:
from transformers import pipeline

# Load a text2text model from Hugging Face.
# do_sample=True is needed for the temperature setting to take effect.
generator = pipeline("text2text-generation", model="google/flan-t5-base", do_sample=True, temperature=0.7)

In [None]:
llm = HuggingFacePipeline(pipeline=generator)


In [None]:
retriever = vectorstore.as_retriever()
qa_chain = load_qa_chain(llm, chain_type="stuff")
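
The `"stuff"` chain type places all retrieved chunks into a single prompt and calls the model once. A rough sketch of that assembly step follows; the prompt wording and the helper name `build_stuff_prompt` are hypothetical and do not reproduce LangChain's internal template.

```python
def build_stuff_prompt(chunks, question):
    """Concatenate every retrieved chunk into one prompt string."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Two short stand-in chunks, paraphrased from the course notes
retrieved = [
    "A HIDS monitors the events occurring within a single host.",
    "A NIDS monitors network traffic for particular network segments.",
]
prompt = build_stuff_prompt(retrieved, "What does a HIDS monitor?")
print(prompt)
```

Because everything goes into one prompt, the combined length of the retrieved chunks matters: with a small model, too much stuffed context can exceed the input length the model handles well.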


In [None]:
def qa(query):
    """Retrieve the chunks most relevant to `query` and answer from them."""
    docs = retriever.invoke(query)
    result = qa_chain.invoke({"input_documents": docs, "question": query})
    return result["output_text"]

In [None]:
while True:
    query = input("Enter your query: ")
    # Type "exit" (any case, surrounding spaces ignored) to leave the loop
    if query.strip().lower() == "exit":
        break
    answer = qa(query)
    print(answer)

Enter your query: explain intrusion



Examples of Intrusion Detection Intrusion detection is based on the assumption that the behavior of the intruder differs from that of a legitimate user in ways that can be quantified. Of course, we cannot expect that there will be a crisp, exact distinction between an attack by an intruder and the normal use of resources by an authorized user in ways that can be quantified. Of course, we cannot expect that there will be a crisp, exact distinction between an attack by an intruder and the normal use of resources by an authorized user in ways that can be quantified. Of course, we cannot expect that there will be a crisp, exact distinction between an attack by an intruder and the normal use of resources by an authorized user in ways that can be quantified. Of course, we cannot expect that there will be a crisp, exact distinction between an attack by an intruder and the normal use of resources by an authorized user in ways that can be quantified.
Enter your query: exit
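
The answer above repeats the same sentence several times. One possible workaround, sketched here as a post-processing step, is to drop sentences that have already appeared. The helper `drop_repeated_sentences` is hypothetical and only cleans up the text after the fact; it does not address why the model repeats itself.

```python
import re


def drop_repeated_sentences(text):
    """Keep only the first occurrence of each sentence, preserving order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    seen = set()
    kept = []
    for sentence in sentences:
        key = sentence.lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(sentence)
    return " ".join(kept)


raw = "Alarms matter. We expect overlap. We expect overlap. We expect overlap."
cleaned = drop_repeated_sentences(raw)
print(cleaned)
```

It could be applied to the return value of `qa(query)` before printing.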
