## Requirements

First of all the GF markdown files need to be in the folder GF_FOLDER. They can be downloaded from there GitLab if you have access to it.

### Pip packages

* markdown
* beautifulsoup4

## Workflow

* Scrapes the first few chapters of GF by scraping the structure file
* Iterates over each of these chapters so they get all subdirectory names and directories in a Chapter class in sthe CHAPTER_TREE
* Adds all markdown files under each directory and puts them in File class
* Saves the root Chapter tree to a .pkl file which basically holds the whole structure of the GF
* At the bottom I show how to use it

In [1]:
import os
import markdown
from bs4 import BeautifulSoup
import pickle

from utils import Chapter, File

In [2]:
GF_FOLDER = os.path.join("C:", os.sep, "Users", "ManuV", "Documents", "Bachelorproef", "BP_Info_Support", "docs")
STRUCTURE_FILE = "structure.md"
CHAPTER_TREE = Chapter("root",GF_FOLDER)

SAVE_TREE_DIR = os.path.join("C:", os.sep, "Users", "ManuV", "Documents", "Bachelorproef", "BP_Info_Support", "data")

In [3]:
# Reads structure file from STRUCTURE_FILE in GF_FOLDER
structure_dir = os.path.join(GF_FOLDER,STRUCTURE_FILE)
with open(structure_dir, 'r') as f:
    tempMd= f.read()

# Convert markdown file to HTML
structure_file_html = markdown.markdown(tempMd)

In [4]:
# Makes assumption that chapters names are in h2 element with a links
soup_structure = BeautifulSoup(structure_file_html, 'html.parser')
raw_chapters = soup_structure.find_all('h2')

for chapter in raw_chapters:
    link = chapter.find('a')
    name = os.path.basename(link.get('href').replace('index.md','')[:-1])
    href, text = link.get('href').replace('index.md','')[2:], name
    CHAPTER_TREE.add_child(Chapter(text,os.path.join(CHAPTER_TREE.dir,href)))

In [5]:
CHAPTER_TREE.print_tree()

- root
  - 05-essence-model
  - 01-customer-needs
  - 02-delivery-guidance
  - 03-architecture-guidance
  - 04-technology-guidance


In [6]:
# Adds every directory not in blacklist to the CHAPTER_TREE
blacklist = ["images","documents"]
def visit_directory(chapter):
    dir = os.path.join(chapter.parent.dir,chapter.dir)
    for root, dirs, files in os.walk(dir):
        for dir in dirs:
            if not dir in blacklist:
                child = Chapter(dir, os.path.join(root, dir))
                chapter.add_child(child)
                visit_directory(child)

In [7]:
for chapter in CHAPTER_TREE.children:
    visit_directory(chapter)

In [8]:
CHAPTER_TREE.print_tree()

- root
  - 05-essence-model
    - principles
  - 01-customer-needs
    - 01-business-value-and-scoping
  - 02-delivery-guidance
    - 01-delivery-approaches
    - 02-practices
      - adr-examples
      - specification-by-example
      - team-kickoff
        - 0-preparation
        - 1-meet
        - 2-team-building
        - 3-group-roles
        - 4-professional-roles
        - 5-project-goal
        - 6-personal-goals
        - 7-working-arrangements
      - team-topologies
      - 0-preparation
      - 1-meet
      - 2-team-building
      - 3-group-roles
      - 4-professional-roles
      - 5-project-goal
      - 6-personal-goals
      - 7-working-arrangements
    - 03-activities
      - 01-customer
      - 02-solution
      - 03-delivery
    - 04-phases
    - 05-roles
    - 06-remote-working
      - 02-tools-practices
      - 03-meetings
    - adr-examples
    - specification-by-example
    - team-kickoff
      - 0-preparation
      - 1-meet
      - 2-team-building
      - 3-group

In [9]:
blacklist = []
def get_markdown_files(directory):
    markdown_files = []
    for file_name in os.listdir(directory):
        file_path = os.path.join(directory, file_name)
        if os.path.isfile(file_path) and file_name.endswith(".md") and file_name not in blacklist:
            file = File(os.path.basename(file_path),file_path)
            markdown_files.append(file)
    return markdown_files

In [10]:
def add_markdown_files(chapter):
    chapter.files = get_markdown_files(chapter.dir)
    for child in chapter.children:
        add_markdown_files(child)

In [11]:
# Adds markdown files to each Chapter. Chapter under nodes have all files in their entire structure.
add_markdown_files(CHAPTER_TREE)

In [12]:
# Save structure tree
with open(os.path.join(SAVE_TREE_DIR,"gf_structure.pkl"), "wb") as f:
    pickle.dump(CHAPTER_TREE, f)

# Load structure tree
# with open(os.path.join(SAVE_TREE_DIR,"gf_structure.pkl"), "rb") as f:
#     root = pickle.load(f)

In [13]:
def find_chapter_by_name(root, name):
    if root.name == name:
        return root
    for child in root.children:
        found_chapter = find_chapter_by_name(child, name)
        if found_chapter:
            return found_chapter
    return None

def find_chapter_by_dir(root, dir):
    if root.dir == dir:
        return root
    for child in root.children:
        found_chapter = find_chapter_by_dir(child, dir)
        if found_chapter:
            return found_chapter
    return None

In [14]:
rnd_chapter = CHAPTER_TREE.get_random_chapter()
print(f'Chapter Name: {rnd_chapter.name}')
rnd_file = rnd_chapter.get_random_file()
print(f'File Name: {rnd_file.name}')
print(f'File Dir: {rnd_file.dir}')
print('Structured Text in file')
rnd_file.text

Chapter Name: 02-azure
File Name: 01-azure-kubernetes-service.md
File Dir: C:\Users\ManuV\Documents\Bachelorproef\BP_Info_Support\docs\04-technology-guidance/04b-runtime-infrastructure\05-container-orchestration\02-azure\01-azure-kubernetes-service.md
Structured Text in file


{'Azure Kubernetes Service (AKS)': ['For official product documentation visit theAzure Kubernetes Service (AKS)documentation page.'],
 'Kubernetes version support in AKS': ['Kubernetes releases a new version every 3 months. Versions are supported 9 - 12 months.AKS will only support versions in the "N-2" window.',
  'N is the latest supported AKS version, which is 1 version behind the latest Kubernetes version.As an example, if the latest available Kubernetes version is 1.19, the following table of support status apply:| Kubernetes | Notation | AKS Support | | ---------- | -------- | ----------- | | 1.19       | N+1      | Preview     | | 1.18       | N        | Supported   | | 1.17       | N-1      | Supported   | | 1.16       | N-2      | Supported   | | 1.15       | N-3      | Unsupported |'],
 'Updating a cluster': ['Microsoft does not activly update the Kubernetes cluster. This is left as a task for the customer. This is even true if a cluster falls behind on the supported versions