# GC JADC2 POC

- Manually copy the **Description** of each **Goal** and **Objective** in [Section 2.2](./resources/DOD-DIGITAL-MODERNIZATION-STRATEGY-2019.PDF) into a [spreadsheet](./data/goals-and-objectives.csv)
- Use [`txtai`](https://github.com/neuml/txtai) for this prototype
- Use the answers to **Question R** as queries and inspect the results
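The query step is "rank every stored description by similarity to a query string". The sketch below shows that loop without downloading a model. It is a stand-in, not txtai: it uses plain bag-of-words cosine similarity instead of txtai's embedding-based similarity, and the `goal-*` ids and sample texts are hypothetical. With txtai, the same flow would be building an `Embeddings` index with `index(...)` and then calling `search(query, limit)`.

```python
import math
import re
from collections import Counter


def _vector(text: str) -> Counter:
    """Bag-of-words term counts over lowercased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, documents: dict, limit: int = 3) -> list:
    """Return (doc_id, score) pairs sorted best-first, keeping at most `limit` results."""
    q = _vector(query)
    scored = [(doc_id, cosine(q, _vector(text))) for doc_id, text in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


# Hypothetical sample documents keyed by made-up ids
docs = {
    "goal-1": "Innovation is a key element of future readiness.",
    "goal-3": "Evolve cybersecurity for an agile and resilient defense posture.",
}
results = search("cybersecurity posture", docs, limit=1)
print(results)
```

Here the only document sharing terms with the query is `goal-3`, so it is the single hit returned. Swapping in an embedding model changes how scores are computed, not the shape of the evaluation.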

## Goals & Objectives Format
Each DoD CIO goal is presented with the following component parts: 

- Goal:
  - A **Description** of the goal, including what it encompasses
  - The **Mission Impact** on the Department resulting from achievement of the objectives for that goal 
- Objective:
  - A **Description** that provides a rationale for the work and describes what the objective will accomplish
  - Each objective is further decomposed into the **Strategy Elements** that describe the specific, focused initiatives needed to accomplish that particular objective 
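The component parts listed above map naturally onto two nested records. A minimal sketch, assuming Python dataclasses; the class and field names are hypothetical, and the parsing code below only fills in names and descriptions (it does not extract **Mission Impact** or **Strategy Elements**).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Objective:
    name: str
    description: str  # rationale and what the objective will accomplish
    strategy_elements: List[str] = field(default_factory=list)  # focused initiatives


@dataclass
class Goal:
    name: str
    description: str  # what the goal encompasses
    mission_impact: str = ""  # impact on the Department once the objectives are achieved
    objectives: List[Objective] = field(default_factory=list)


# Example using the goal title from the source; the objective text is a placeholder
goal = Goal(
    name="Innovate for Competitive Advantage",
    description="Innovation is a key element of future readiness.",
    objectives=[Objective(name="Objective 1", description="(objective description)")],
)
```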


In [None]:
# import sys
# import pathlib
# PACKAGE_ROOT = pathlib.Path(__name__).absolute().parent.parent.parent.parent
# REPO_ROOT = PACKAGE_ROOT.parent
# sys.path.insert(0,str(REPO_ROOT))

In [2]:
import pandas as pd
import re
import string

In [3]:
def clean_text(text: str) -> str:
    """Remove links, ASCII punctuation, and newlines from `text`."""
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # drop URLs
    text = re.sub("[%s]" % re.escape(string.punctuation), "", text)  # drop ASCII punctuation
    text = re.sub("\n", "", text)  # drop newlines
    return text
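`clean_text` only strips links, punctuation, and newlines. If fuller normalization is wanted (lowercasing, dropping `[bracketed]` text, dropping words that contain digits), a hypothetical variant could look like the sketch below. The name `clean_text_full` is illustrative; note that links are removed before punctuation so that URLs are still intact when matched.

```python
import re
import string


def clean_text_full(text: str) -> str:
    """Lowercase, then drop [bracketed] text, links, punctuation, and words containing digits."""
    text = text.lower()
    text = re.sub(r"\[.*?\]", "", text)  # drop [bracketed] text
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # drop links before punctuation removal
    text = re.sub("[%s]" % re.escape(string.punctuation), "", text)  # drop ASCII punctuation
    text = re.sub(r"\w*\d\w*", "", text)  # drop words containing digits
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace, including newlines


print(clean_text_full("See [note 1] https://example.com the 2019 DoD\nStrategy, v2!"))
```

Collapsing whitespace to a single space (rather than deleting newlines outright) avoids gluing together words that were split across lines.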

In [4]:
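# The extracted text contains non-ASCII characters (e.g. em-dashes), so read it as UTF-8 explicitly
with open("./data/goals-and-objectives.txt", "r", encoding="utf-8") as f:
    raw_text = f.read()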

'Goal 1: Innovate for Competitive Advantage \n\nDescription: Innovation is a key element of future readiness. It is essential to preserving and \nexpanding US military competitive advantage in the face of near-peer competition and \nasymmetric threats. A theme running through the National Defense Strategy—and subordinate \nstrategies like the Artificial Intelligence Strategy—is that preserving and expanding our military \nadvantage depends on our ability to deliver technology faster than our adversaries and the agility \nof our enterprise to adapt our way of fighting to the potential advantages of innovative \ntechnology. The Department will evaluate opportunities for innovation, pursuing those deemed \nmost suitable to address military problems and including those likely to deliver leap-ahead \ncapabilities. \n\nCloud and cognitive computing will significantly alter warfighting and defense business \noperations. Recognizing this, the Department established the Joint Artificial Intelli

In [15]:
goals = re.findall(r"Goal \d: [\w ]+", raw_text)
goals = [g.strip() for g in goals]
goals

['Goal 1: Innovate for Competitive Advantage',
 'Goal 2: Optimize for Efficiencies and Improved Capability',
 'Goal 3: Evolve Cybersecurity for an Agile and Resilient Defense Posture',
 'Goal 4: Cultivate Talent for a Ready Digital Workforce']
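Once the headings are known, each section is the text between one heading and the next. An alternative to searching for every heading again with `str.index` is to take the match positions straight from `re.finditer`. A minimal sketch; `split_by_headings` and the sample string are hypothetical.

```python
import re


def split_by_headings(text: str, pattern: str) -> list:
    """Return (heading, body) pairs, where body runs up to the next heading or end of text."""
    matches = list(re.finditer(pattern, text))
    sections = []
    for m, nxt in zip(matches, matches[1:] + [None]):
        end = nxt.start() if nxt else len(text)
        sections.append((m.group(0).strip(), text[m.end():end].strip()))
    return sections


sample = "Goal 1: Innovate \nDescription: A. \nGoal 2: Optimize \nDescription: B."
sections = split_by_headings(sample, r"Goal \d+: [\w ]+")
print(sections)
```

Using match spans also avoids a subtle trap of `str.index`: it always returns the first occurrence, so a heading string that also appears earlier in the text would yield the wrong boundary.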

In [26]:
# The goal description runs up to the "Mission Impact:" heading (non-greedy: stop at the first one)
g_desc = re.compile(r"Description: (.*?) Mission Impact:")
# Objective headings such as "Objective 1: ..." (character class kept as in the original run)
o_pattern = re.compile(r"Objective \d+: [\w|( +)|-]+")


def to_ascii(text: str) -> str:
    """Drop non-ASCII characters (e.g. em-dashes) and strip surrounding whitespace."""
    return text.encode("ascii", errors="ignore").strip().decode("ascii")


def split_sections(text: str, headings: list) -> list:
    """Return the text after each heading, up to the next heading or the end of `text`."""
    starts = [text.index(h) for h in headings]
    ends = starts[1:] + [len(text)]
    return [text[s + len(h):e] for h, s, e in zip(headings, starts, ends)]


final_dict = {}
for idx, (goal, goal_text) in enumerate(zip(goals, split_sections(raw_text, goals))):
    temp = re.sub(r"\n+", "", goal_text).strip()
    description = g_desc.search(temp).group(1)

    o_texts = o_pattern.findall(temp)
    objectives = {
        # body[2:] skips the two characters that follow each objective heading
        o_idx: {"name": name, "description": to_ascii(body[2:])}
        for o_idx, (name, body) in enumerate(zip(o_texts, split_sections(temp, o_texts)))
    }

    final_dict[idx] = {
        # Store the goal title as a string (re.findall returned a one-element list)
        "name": re.match(r"Goal \d+: ([\w ]+)", goal).group(1),
        "description": to_ascii(clean_text(description)),
        "objectives": objectives,
    }

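For indexing and querying, the nested `final_dict` is easier to use as a flat list of documents with one entry per goal and per objective. A minimal sketch; the `to_documents` helper and the `goal-0-objective-0` id scheme are hypothetical, and the sample dict only mimics the shape of `final_dict`. The pairs could be adapted to whatever record format the txtai index call expects (for example, `(id, text, None)` tuples).

```python
def to_documents(goals_dict: dict) -> list:
    """Flatten {goal_idx: {"description", "objectives": {...}}} into (doc_id, text) pairs."""
    docs = []
    for g_idx, goal in goals_dict.items():
        docs.append((f"goal-{g_idx}", goal["description"]))
        for o_idx, objective in goal["objectives"].items():
            docs.append((f"goal-{g_idx}-objective-{o_idx}", objective["description"]))
    return docs


# Hypothetical input shaped like final_dict
sample = {0: {"name": "G", "description": "A", "objectives": {0: {"name": "O", "description": "B"}}}}
documents = to_documents(sample)
print(documents)
```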

In [27]:
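import json

# Note: json.dump writes the integer dict keys as strings ("0", "1", ...)
with open("./data/goals-and-objectives.json", "w", encoding="utf-8") as f:
    json.dump(final_dict, f, indent=2)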
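Because JSON object keys are always strings, reading the file back yields `"0"`, `"1"`, ... rather than the integer keys used in `final_dict`. A minimal sketch for restoring them, assuming the goal/objective layout written above; `restore_int_keys` is a hypothetical helper, and the demo round-trips an in-memory dict instead of the file.

```python
import json


def restore_int_keys(data: dict) -> dict:
    """Convert goal and objective keys from JSON strings back to ints."""
    return {
        int(g_idx): {**goal, "objectives": {int(o_idx): obj for o_idx, obj in goal["objectives"].items()}}
        for g_idx, goal in data.items()
    }


original = {0: {"name": "x", "description": "y", "objectives": {0: {"name": "o", "description": "d"}}}}
restored = restore_int_keys(json.loads(json.dumps(original)))
print(restored == original)
```

With the file on disk, the same helper applies to `json.load(f)` on `./data/goals-and-objectives.json`.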