# md_hier - Markdown Hierarchy Parser

> Parse markdown documents into hierarchical dictionaries for easy navigation and access

In [None]:
from toolslm.md_hier import *

In [None]:
#| hide
from nbdev.showdoc import show_doc

The `md_hier` module provides utilities for parsing markdown documents and converting them into structured hierarchical dictionaries. This is particularly useful for processing documentation, extracting sections, or navigating complex markdown files programmatically.

## Overview

The module contains two main functions:
- `markdown_to_dict`: Creates a flat dictionary with dot-separated keys representing the hierarchy
- `create_heading_dict`: Creates a nested dictionary structure matching the markdown hierarchy

Both functions ignore headings that appear inside fenced code blocks, so a `#` comment in a code sample is never mistaken for a section.

In [None]:
show_doc(markdown_to_dict)

---

### markdown_to_dict

>      markdown_to_dict (markdown_content:str)

*Parse markdown content into a hierarchical dictionary with dot-separated keys.*

|    | **Type** | **Details** |
| -- | -------- | ----------- |
| markdown_content | str | Markdown text including headings |
| **Returns** | **AttrDict** | **Dictionary with dot-separated hierarchical keys and content values** |

`markdown_to_dict` parses markdown content and returns a flat dictionary whose keys encode each heading's path in dot notation. Every heading becomes a key, and its value is the full text of that section: the heading line itself plus everything beneath it, including any subsections.

- **Hierarchical keys**: Uses dot notation (e.g., `"Parent.Child.Grandchild"`)
- **Content preservation**: Each section includes its heading and all content up to the next heading at the same or a shallower level (for example, a `##` section ends at the next `##` or `#`)
- **Code block awareness**: Ignores headings inside fenced code blocks
- **Special character cleaning**: Strips special characters (such as `*`, `&`, `[`, `]`, and `-`) from heading text when building keys

Let's see it in action:

In [None]:
sample_md = """
# Introduction

Welcome to our documentation.

## Getting Started

Follow these steps to begin.

### Installation

Run the following command:

```bash
pip install our-package
```

### Configuration

Set up your config file.

## Advanced Usage

For advanced users only.

# Appendix

Additional resources.
"""

result = markdown_to_dict(sample_md)
print("Available sections:")
for key in result.keys():
    print(f"  {key}")

Available sections:
  Introduction
  Introduction.Getting Started
  Introduction.Getting Started.Installation
  Introduction.Getting Started.Configuration
  Introduction.Advanced Usage
  Appendix
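
The key layout above can be reproduced with a small stack-based sketch. This is an illustration rather than the module's actual code: `dotted_keys` and its `(level, title)` input format are hypothetical, and it assumes headings have already been extracted and cleaned.

```python
def dotted_keys(headings):
    """Build dot-separated paths from (level, title) pairs in document order.

    Hypothetical helper for illustration; not part of md_hier.
    """
    if not headings:
        return []
    first_level = headings[0][0]  # depth is measured relative to the first heading
    stack, keys = [], []
    for level, title in headings:
        depth = max(level - first_level, 0)
        # Keep only the ancestors above this depth, then append the current title.
        stack = stack[:depth] + [title]
        keys.append(".".join(stack))
    return keys


sample_outline = [
    (1, "Introduction"),
    (2, "Getting Started"),
    (3, "Installation"),
    (3, "Configuration"),
    (2, "Advanced Usage"),
    (1, "Appendix"),
]
for key in dotted_keys(sample_outline):
    print(key)
```

Truncating the stack to the current depth is what drops `Getting Started` from the path once `Advanced Usage` is reached.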


You can access any section directly:

In [None]:
print("Installation section:")
print(result['Introduction.Getting Started.Installation'])

Installation section:
### Installation

Run the following command:

```bash
pip install our-package
```


Notice how parent sections contain all their child content:

In [None]:
print(result['Introduction.Getting Started'][:200] + "...")

## Getting Started

Follow these steps to begin.

### Installation

Run the following command:

```bash
pip install our-package
```

### Configuration

Set up your config file....
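
Parents contain their children because of where each section is cut off. Here is a minimal sketch of that rule, assuming headings have already been located as `(level, line_index)` pairs. The helper `section_end` and the toy document are hypothetical and not part of `md_hier`.

```python
def section_end(headings, index, total_lines):
    """Return the exclusive end line of the section that starts at headings[index].

    headings: list of (level, line_index) pairs in document order.
    A section runs until the next heading at the same or a shallower level.
    """
    level = headings[index][0]
    for next_level, next_line in headings[index + 1:]:
        if next_level <= level:
            return next_line
    return total_lines  # no later sibling or ancestor: run to end of document


toy_lines = [
    "# A",      # line 0
    "intro",    # line 1
    "## B",     # line 2
    "b text",   # line 3
    "### C",    # line 4
    "c text",   # line 5
    "## D",     # line 6
    "d text",   # line 7
]
toy_headings = [(1, 0), (2, 2), (3, 4), (2, 6)]

start = toy_headings[1][1]
end = section_end(toy_headings, 1, len(toy_lines))
print("\n".join(toy_lines[start:end]))  # the "## B" section, including "### C"
```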


In [None]:
show_doc(create_heading_dict)

---

### create_heading_dict

>      create_heading_dict (text:str)

*Create a nested dictionary structure from markdown headings.*

|    | **Type** | **Details** |
| -- | -------- | ----------- |
| text | str | The markdown text to parse |
| **Returns** | **AttrDict** | **Nested dictionary structure representing the heading hierarchy** |

`create_heading_dict` creates a nested dictionary structure that mirrors the markdown hierarchy. Unlike `markdown_to_dict`, this returns a tree-like structure where each heading becomes a dictionary key containing its subheadings.

- **Nested structure**: Creates a tree-like dictionary hierarchy
- **Navigation friendly**: Easy to traverse programmatically
- **Code block filtering**: Strips fenced code blocks before scanning for headings
- **AttrDict support**: Returns the result wrapped with `dict2obj` (an `AttrDict`) for attribute-style access
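
One way to build such a tree is to keep a stack of the currently open headings. The sketch below is an assumption about how this could be done, not the module's source: `nested_headings` is a hypothetical name, and it returns plain dicts rather than an `AttrDict`.

```python
def nested_headings(headings):
    """Turn (level, title) pairs into a nested dict of headings.

    Hypothetical helper for illustration; not part of md_hier.
    """
    root = {}
    stack = [(0, root)]  # (level, node) pairs; level 0 is a sentinel for the root
    for level, title in headings:
        # Close every open heading that is at the same level or deeper.
        while stack[-1][0] >= level:
            stack.pop()
        node = {}
        stack[-1][1][title] = node
        stack.append((level, node))
    return root


tree = nested_headings([
    (1, "Introduction"),
    (2, "Getting Started"),
    (3, "Installation"),
    (3, "Configuration"),
    (2, "Advanced Usage"),
    (1, "Appendix"),
])
print(tree)
```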

Let's see the nested structure:

In [None]:
nested_result = create_heading_dict(sample_md)
nested_result

```python
{ 'Appendix': {},
  'Introduction': { 'Advanced Usage': {},
                    'Getting Started': { 'Configuration': {},
                                         'Installation': {}}}}
```

You can navigate the structure with attribute access for keys that are valid identifiers, or with dictionary access for any key, including those that contain spaces:

In [None]:
assert 'Getting Started' in nested_result.Introduction
assert 'Installation' in nested_result.Introduction['Getting Started']
print(list(nested_result.Introduction['Getting Started'].keys()))

['Installation', 'Configuration']
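
Because the result is a dictionary of dictionaries, a short recursive generator is enough to walk it. The sketch below uses a plain dict with the same shape as `nested_result`, and assumes the returned object behaves like a regular mapping. `walk_outline` is a hypothetical helper.

```python
def walk_outline(tree, depth=0):
    """Yield one indented line per heading, depth-first, in insertion order."""
    for title, children in tree.items():
        yield "  " * depth + title
        yield from walk_outline(children, depth + 1)


# Plain-dict stand-in with the same shape as the nested result shown above.
example_tree = {
    "Introduction": {
        "Getting Started": {"Installation": {}, "Configuration": {}},
        "Advanced Usage": {},
    },
    "Appendix": {},
}
outline_lines = list(walk_outline(example_tree))
print("\n".join(outline_lines))
```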


## Comparison: Flat vs Nested

Use `markdown_to_dict` when:

- You need the actual content of sections
- You want to search or extract specific sections by path
- You're building content extraction tools
- You need a simple key-value lookup

Use `create_heading_dict` when:

- You need to understand document structure
- You're building navigation interfaces
- You want to traverse the hierarchy programmatically
- You only need to check whether sections exist, not read their content
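
The two shapes are closely related. As a rough sketch, a flat set of dotted keys can be folded into a nested tree by splitting on `.`. This assumes no cleaned heading text itself contains a dot, and `nest_keys` is a hypothetical helper rather than something the module provides.

```python
def nest_keys(flat_keys):
    """Fold dot-separated keys into a nested dict of empty-dict leaves."""
    tree = {}
    for key in flat_keys:
        node = tree
        for part in key.split("."):
            # Reuse the existing child if present, otherwise create it.
            node = node.setdefault(part, {})
    return tree


flat_keys = [
    "Introduction",
    "Introduction.Getting Started",
    "Introduction.Getting Started.Installation",
    "Introduction.Getting Started.Configuration",
    "Introduction.Advanced Usage",
    "Appendix",
]
print(nest_keys(flat_keys))
```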

## Edge Cases and Special Handling

Both functions handle several edge cases gracefully:

### Code Block Protection

Headings inside code blocks are ignored:

In [None]:
code_md = """
# Real Heading

This is real content.

```python
# This is not a heading - it's code!
print("Hello world")
## Neither is this
```

# Another Real Heading

More content.
"""

code_result = markdown_to_dict(code_md)
print("Parsed headings (code block headings ignored):")
for key in code_result.keys(): print(f"  {key}")

Parsed headings (code block headings ignored):
  Real Heading
  Another Real Heading
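
One simple way to get this behaviour is to scan line by line and flip a flag at each fence. The sketch below illustrates that idea under stated assumptions and is not the module's implementation: `find_headings` and the heading regex are hypothetical. The fence string is built indirectly so that the example can itself sit inside a fenced block.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)")  # assumed heading pattern for this sketch


def find_headings(markdown_text):
    """Return (level, title) pairs, skipping lines inside triple-backtick fences."""
    found, in_fence = [], False
    for line in markdown_text.splitlines():
        if line.strip().startswith("`" * 3):
            in_fence = not in_fence  # entering or leaving a fenced block
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if match:
            found.append((len(match.group(1)), match.group(2).strip()))
    return found


fence = "`" * 3
fenced_sample = "\n".join([
    "# Real Heading",
    f"{fence}python",
    "# a comment, not a heading",
    fence,
    "# Another Real Heading",
])
print(find_headings(fenced_sample))
```

Toggling on every fence line is a simplification: it assumes fences are balanced and does not handle tilde fences.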


### Special Characters in Headings

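Special characters are stripped from keys but preserved in the content. Whitespace is not collapsed afterwards, so a removed character can leave a double space in the key (as in `'API Reference  Guide'`):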

In [None]:
special_md = """
# API *Reference* & Guide!

This heading has special characters.

## Getting [Started] - Quick Guide

Subheading with brackets and dashes.
"""

special_result = markdown_to_dict(special_md)
print("Clean keys:")
for key in special_result.keys(): print(f"  '{key}'")

print("\nOriginal content preserved:")
print(special_result['API Reference  Guide.Getting Started  Quick Guide'][:100] + "...")

Clean keys:
  'API Reference  Guide'
  'API Reference  Guide.Getting Started  Quick Guide'

Original content preserved:
## Getting [Started] - Quick Guide

Subheading with brackets and dashes....
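
The keys printed above are consistent with a cleaning step that keeps only ASCII letters, digits, and spaces. The sketch below shows such a step. The pattern is inferred from the output shown here, so treat it as an assumption; the library's actual rule may differ. `clean_key` is a hypothetical name.

```python
import re


def clean_key(heading_text):
    """Drop everything except ASCII letters, digits, and spaces, then trim the ends."""
    # Interior spaces are left as-is, so removing "&" or "-" leaves two spaces behind.
    return re.sub(r"[^A-Za-z0-9 ]+", "", heading_text).strip()


for heading in ["API *Reference* & Guide!", "Getting [Started] - Quick Guide", "Sub-level"]:
    print(repr(clean_key(heading)))
```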


### Different Starting Levels

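The functions adapt to documents that don't start with `#`: the first heading's level is treated as the top of the hierarchy. (Key cleaning still applies, which is why `Sub-level` appears as `Sublevel` in the output.)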

In [None]:
level3_md = """
### Level 3 Start

Document starting at level 3.

#### Sub-level

Content here.

### Another Level 3

More content.
"""

level3_result = markdown_to_dict(level3_md)
print("Keys (relative to starting level):")
for key in level3_result.keys(): print(f"  {key}")

Keys (relative to starting level):
  Level 3 Start
  Level 3 Start.Sublevel
  Another Level 3
