# CrewAI sequential workflow of control analysis

We were able to effectively use a chat session (with context) to analyze a PDF and match it against a
control from the NIST 800-53 standard (specifically CA-5). In this notebook, we explore turning
that chat session into a CrewAI Workflow.

In this particular experiment, we will pull in a specific EPA PDF, convert it to text (using
Python's PyMuPDF library) and use a single Agent with a sequential workflow modeled after the chat session.

### Install Notes

To run this you will need:

```pip install PyMuPDF crewai```

In [1]:
import fitz
import textwrap
from crewai import Agent, Task, Crew, Process

## Import an EPA policy document from PDF

Here we'll import the EPA policy document that specifically deals with "CA" controls. We'll convert it from PDF into text.

(Note: the original code cleaned up the document's bullet points and headings, but we can't trust that we'll have that luxury with other policy libraries. So let's leave the text as is and see how well we handle the result.)
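For reference, the kind of cleanup being skipped here might look roughly like the sketch below. This is not the original cleanup code; `clean_policy_text` is a hypothetical helper, and the set of bullet glyphs it handles is an assumption about what PDF extraction tends to produce.

```python
import re

def clean_policy_text(text: str) -> str:
    """Hypothetical helper: normalize bullets and whitespace in extracted PDF text."""
    # Replace common bullet glyphs at the start of a line with a plain "- " marker.
    text = re.sub(r"(?m)^[ \t]*[•▪◦●][ \t]*", "- ", text)
    # Strip trailing spaces and tabs from every line.
    text = re.sub(r"(?m)[ \t]+$", "", text)
    # Collapse runs of three or more newlines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()

# Made-up sample text, not taken from the EPA document.
sample = "  • Develop a plan   \n\n\n\n• Update the plan monthly\n"
print(clean_policy_text(sample))
```

In this notebook the raw text is passed through untouched, so a helper like this would only be needed if the unprocessed text turned out to confuse the agent.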

In [2]:
# The document we're grabbing
policy_doc = "../resources/EPA_Policy_Example/information_security_assessment_authorization_and_monitoring_procedure.pdf"

# Open the doc in fitz, iterate over the pages, and combine the text of
# each page (separating the pages with a new line). The context manager
# closes the file once the text has been extracted.
with fitz.open(policy_doc) as doc:
    policy_text = "\n".join(page.get_text() for page in doc)

# print(policy_text)

In [3]:
# Define the Control we are Assessing

Control = textwrap.dedent("""
  CA-5 PLAN OF ACTION AND MILESTONES Control: 
    a. Develop a plan of action and milestones for the system to document the planned remediation 
       actions of the organization to correct weaknesses or deficiencies noted during the assessment 
       of the controls and to reduce or eliminate known vulnerabilities in the system; and 
    b. Update existing plan of action and milestones [Assignment: organization-defined frequency] 
       based on the findings from control assessments, independent audits or reviews, and continuous
       monitoring activities.
""")

# Define the Discussion Related to the Control

Discussion = textwrap.dedent("""
  Discussion: Plans of action and milestones are useful for any type of organization to track planned
  remedial actions. Plans of action and milestones are required in authorization packages and subject
  to federal reporting requirements established by OMB. Please find evidence (if present) from the IT
  security and policy documents that address this control.
""")

In [4]:
# Create the Investigator agent
Investigator = Agent(
    role='Investigator',
    goal=textwrap.dedent("""
      We are conducting an examination of the IT process and policy documents of a system for adherence
      to NIST SP 800-53 security controls.
      
      We need to do two key things: 
        1. Gather evidence (if available) of the implementation of a security control. 
        2. Identify the values for properties (variables) in the control from the evidence.
           The following are segments of IT process and policy documents found from search
           results based upon the control we are examining.
    """),
    verbose=True,
    memory=True,
    backstory=textwrap.dedent("""
      You are a seasoned Investigator tasked with determining if the provided NIST 800-53 security 
      controls are present in the policy documentation. You are skilled at understanding and analyzing
      software policy. If something is unclear or if the evidence is "weak", you believe it is your duty
      to point out inconsistencies or incomplete policies.
    """),
    tools=[]
)

In [5]:
#
# Task for the Investigator to search the policy documents for evidence of the implementation
# of a particular control
#
Gather_Evidence = Task(
    description=textwrap.dedent(f"""
      The following are segments of IT process and policy documents found from search results based upon
      the control we are examining:
      
      ---
      
      {policy_text}
      
      ---
      
      The control we are assessing is: 
      
      {Control}
      
      Please find evidence (if present) from the IT security and policy documents that address
      this control. For the evidence you found, please provide:
        * The document name
        * A section number or identifier (if available)
        * A quote of the evidence from the document.
    """),
    expected_output="A list of evidence.",
    agent=Investigator,
)
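One subtlety in the task description above: the f-string is interpolated before `textwrap.dedent` runs, so any unindented line inside `{policy_text}` leaves no common leading whitespace and the dedent removes nothing. The prompt still works, it just keeps its indentation. A minimal sketch of the difference, using a made-up two-line stand-in for the policy text, with the alternative of dedenting the template first and filling it in afterwards:

```python
import textwrap

# Hypothetical stand-in for extracted PDF text whose lines start at column 0.
policy_text = "Section 1\nUpdate the plan monthly."

# Interpolating first: the unindented second line of policy_text means
# there is no common leading whitespace, so dedent() changes nothing.
interpolated_first = textwrap.dedent(f"""
      Policy segments:
      {policy_text}
""")

# Dedenting the template first, then filling in the placeholder.
template = textwrap.dedent("""
      Policy segments:
      {policy_text}
""")
dedented_first = template.format(policy_text=policy_text)

print(repr(interpolated_first))
print(repr(dedented_first))
```

Note that `str.format` treats literal `{` and `}` in the template as placeholders, so a template containing braces would need them doubled.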

In [6]:
#
# Task for the Investigator to populate properties from the control that are fulfilled by the
# policy evidence.
#
Evaluate_Control = Task(
    description=textwrap.dedent(f"""
      Here is the Control again:
      
      {Control}

      The bracketed sections (within [ and ] in the control) represent properties that
      should be specified in the evidence if the control is fully satisfied. For each
      property in the control please:
        1. List the property from the control (using the name given by Assignment).
        2. A quote from the supporting evidence that provides the value of that property (if available).
        3. The value of the property extracted from the evidence (if available)
    
      Then please tell us yes or no if you feel that the control is addressed by the policy documents.
    """),
    expected_output='A list of properties, if the control is satisfied, and why.',
    agent=Investigator,
)
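The bracketed properties the task asks about can also be listed deterministically, which gives something to compare the agent's answer against. The sketch below is an illustration only: `find_properties` is a hypothetical helper, and the simple pattern does not handle nested brackets, which some NIST controls use inside selections.

```python
import re

# The CA-5 control text, abbreviated to the part that carries the property.
control_text = """
  b. Update existing plan of action and milestones [Assignment: organization-defined frequency]
     based on the findings from control assessments, independent audits or reviews, and continuous
     monitoring activities.
"""

def find_properties(text: str) -> list[str]:
    """Hypothetical helper: return the contents of each [ ... ] span in a control."""
    matches = re.findall(r"\[([^\[\]]+)\]", text)
    # Collapse internal whitespace so properties that wrap across lines compare cleanly.
    return [" ".join(match.split()) for match in matches]

print(find_properties(control_text))
```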

In [7]:
# 
# Task for the Investigator to combine the output of the previous tasks and format it into an
# XML report
#
Format_Evidence = Task(
    description=textwrap.dedent("""
      Please put the results of your examination into the following XML format:
      
      <CONTROL>
        <CONTROL_ID></CONTROL_ID>
        <CONTROL_TITLE></CONTROL_TITLE>
        <EVIDENCE>
          <DOC_TITLE></DOC_TITLE>
          <DOC_SECTION></DOC_SECTION>
          <DOC_QUOTE></DOC_QUOTE>
        </EVIDENCE>
        <PARAMETER>
          <NAME></NAME>
          <VALUE></VALUE>
        </PARAMETER>
        <CONTROL_IS_SATISFIED></CONTROL_IS_SATISFIED>
        <CONTROL_SATISFACTION_RATIONAL></CONTROL_SATISFACTION_RATIONAL>
      </CONTROL>

      Please output the XML by itself (do not provide any surrounding text nor code block quotes).
    """),
    expected_output='A properly formatted XML file with the evaluation results.',
    context=[Gather_Evidence, Evaluate_Control],
    agent=Investigator,
)
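Because the final task asks for bare XML, the report can be checked with Python's standard `xml.etree.ElementTree` before anything downstream relies on it. The following is a minimal sketch under stated assumptions: `parse_report` is a hypothetical helper, and `sample_report` is a hand-written, shortened report in the format requested above, not actual model output.

```python
import xml.etree.ElementTree as ET

# Hand-written, shortened example in the requested format (not model output).
sample_report = """
<CONTROL>
  <CONTROL_ID>CA-5</CONTROL_ID>
  <CONTROL_TITLE>Plan of Action and Milestones</CONTROL_TITLE>
  <EVIDENCE>
    <DOC_TITLE>Example policy</DOC_TITLE>
    <DOC_SECTION>CA-5</DOC_SECTION>
    <DOC_QUOTE>Update existing plan of action and milestones monthly.</DOC_QUOTE>
  </EVIDENCE>
  <PARAMETER>
    <NAME>Assignment: organization-defined frequency</NAME>
    <VALUE>Monthly</VALUE>
  </PARAMETER>
  <CONTROL_IS_SATISFIED>Yes</CONTROL_IS_SATISFIED>
</CONTROL>
"""

def parse_report(xml_text: str) -> dict:
    """Hypothetical helper: parse a report and pull out the key fields."""
    # ET.fromstring raises xml.etree.ElementTree.ParseError on malformed XML.
    root = ET.fromstring(xml_text.strip())
    if root.tag != "CONTROL":
        raise ValueError(f"unexpected root element: {root.tag}")
    return {
        "control_id": root.findtext("CONTROL_ID", default="").strip(),
        "satisfied": root.findtext("CONTROL_IS_SATISFIED", default="").strip().lower() == "yes",
        "parameters": {
            p.findtext("NAME", default="").strip(): p.findtext("VALUE", default="").strip()
            for p in root.findall("PARAMETER")
        },
        "evidence_count": len(root.findall("EVIDENCE")),
    }

parsed = parse_report(sample_report)
print(parsed)
```

A parse failure here is a useful signal that the model wrapped the XML in extra text or code fences despite the instruction not to.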

In [8]:
# Forming the crew and defining the process
crew = Crew(
    agents=[Investigator],
    tasks=[Gather_Evidence, Evaluate_Control, Format_Evidence],
    process=Process.sequential
)

# Kick off the crew
result = crew.kickoff()

# Print the result
print(result)



> Entering new CrewAgentExecutor chain...
I now can give a great answer.

Final Answer:

* Document Name: Information Security – Assessment, Authorization and Monitoring (CA) Procedure Directive No: CIO 2150-P-04.3
* Section: CA-5 – Plan of Action and Milestones
* Quote: 
  ```
  For All Systems:
  1) Develop a plan of action and milestones for the system to document the planned remediation actions of the organization to correct weaknesses or deficiencies noted during the assessment of the controls and to reduce or eliminate known vulnerabilities in the system; and  
  2) Update existing plan of action and milestones monthly based on the findings from control assessments, independent audits or reviews, and continuous monitoring activities.
  ```

> Finished chain.


> Entering new CrewAgentExecutor chain...
I now can give a great answer.

Final Answer:

- Property: [Assignment: organization-defined frequency]
- Quote: "Update exis

In [9]:
print(Gather_Evidence.output)

* Document Name: Information Security – Assessment, Authorization and Monitoring (CA) Procedure Directive No: CIO 2150-P-04.3
* Section: CA-5 – Plan of Action and Milestones
* Quote: 
  ```
  For All Systems:
  1) Develop a plan of action and milestones for the system to document the planned remediation actions of the organization to correct weaknesses or deficiencies noted during the assessment of the controls and to reduce or eliminate known vulnerabilities in the system; and  
  2) Update existing plan of action and milestones monthly based on the findings from control assessments, independent audits or reviews, and continuous monitoring activities.
  ```


In [10]:
print(Evaluate_Control.output)

- Property: [Assignment: organization-defined frequency]
- Quote: "Update existing plan of action and milestones monthly based on the findings from control assessments, independent audits or reviews, and continuous monitoring activities."
- Value: Monthly

Yes, the control CA-5 is addressed by the policy documents. The provided evidence outlines the development of a plan of action and milestones to document remediation actions and specifies that the plan should be updated monthly based on findings from various assessments and monitoring activities.


In [11]:
print(result)

<CONTROL>
  <CONTROL_ID>CA-5</CONTROL_ID>
  <CONTROL_TITLE>Plan of Action and Milestones</CONTROL_TITLE>
  <EVIDENCE>
    <DOC_TITLE>Information Security – Assessment, Authorization and Monitoring (CA) Procedure Directive No: CIO 2150-P-04.3</DOC_TITLE>
    <DOC_SECTION>CA-5 – Plan of Action and Milestones</DOC_SECTION>
    <DOC_QUOTE>
      For All Systems:
      1) Develop a plan of action and milestones for the system to document the planned remediation actions of the organization to correct weaknesses or deficiencies noted during the assessment of the controls and to reduce or eliminate known vulnerabilities in the system; and  
      2) Update existing plan of action and milestones monthly based on the findings from control assessments, independent audits or reviews, and continuous monitoring activities.
    </DOC_QUOTE>
  </EVIDENCE>
  <PARAMETER>
    <NAME>Assignment: organization-defined frequency</NAME>
    <VALUE>Monthly</VALUE>
  </PARAMETER>
  <CONTROL_IS_SATISFIED>Yes</CON