# Convert Hugo to Quarto

This notebook explores exporting most of the `.mmark` posts from the Hugo blog to Quarto.

# Get the data

In [1]:
from pathlib import Path 

input_dir = Path('../data')
output_dir = Path('../output')

Clean out the input and output directories

In [2]:
!rm -rf {input_dir}
!rm -rf {output_dir}
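The `!rm -rf` escapes assume a Unix-like shell. A portable sketch of the same clean-up in plain Python uses `shutil.rmtree` with `ignore_errors=True`, so a directory that doesn't exist yet is not an error. The helper name `clean_dirs` is only illustrative:

```python
import shutil
from pathlib import Path


def clean_dirs(*dirs):
    """Remove each directory tree if it exists, like `rm -rf`."""
    for d in dirs:
        # ignore_errors=True skips missing directories (and also
        # suppresses any other removal errors, as rm -f would)
        shutil.rmtree(Path(d), ignore_errors=True)


# In this notebook: clean_dirs(input_dir, output_dir)
```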

Clone the latest Hugo version of the repository

In [3]:
!git clone --branch hugo-eol --depth 1 https://github.com/EdwardJRoss/skeptric.git {input_dir}

Cloning into '../data'...
remote: Enumerating objects: 1223, done.
remote: Counting objects: 100% (1223/1223), done.
remote: Compressing objects: 100% (1184/1184), done.
remote: Total 1223 (delta 21), reused 1070 (delta 18), pack-reused 0
Receiving objects: 100% (1223/1223), 48.29 MiB | 6.07 MiB/s, done.
Resolving deltas: 100% (21/21), done.
Note: switching to '396f3a5218deb5ed0a811d232bb22a808d78543e'.




Create a new blank Quarto blog

In [4]:
!quarto create-project {output_dir} --type website:blog

Creating project at /home/eross/src/projects/hugo2quarto/output:
  - Created _quarto.yml
  - Created .gitignore
  - Created index.qmd
  - Created posts/welcome/index.qmd
  - Created posts/post-with-code/index.qmd
  - Created about.qmd
  - Created styles.css
  - Created posts/_metadata.yml


Because we want to store every post in its own folder at the top level (rather than under `posts`), we need to update the listing `contents` on the home page to match.

In [5]:
!sed -i 's|contents: posts|contents: "/*/*.md"|' {output_dir}/index.qmd

In [6]:
!head ../output/index.qmd

---
title: "output"
listing:
  contents: "/*/*.md"
  sort: "date desc"
  type: default
  categories: true
  sort-ui: false
  filter-ui: false
page-layout: full
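`sed -i` with no suffix argument is GNU syntax; BSD/macOS `sed` requires a (possibly empty) suffix, e.g. `-i ''`. The same edit can be done portably in Python. This is a minimal sketch, assuming the template's `index.qmd` contains the literal text `contents: posts` as the `sed` pattern expects; `set_listing_contents` is a hypothetical name:

```python
from pathlib import Path


def set_listing_contents(qmd_path, new_contents='"/*/*.md"'):
    """Rewrite `contents: posts` in a Quarto file, like the sed command above.

    Returns True if the file changed.
    """
    path = Path(qmd_path)
    text = path.read_text()
    old, new = "contents: posts", f"contents: {new_contents}"
    # sed's s||| (without /g) replaces the first match on each line,
    # so do the same line by line, keeping the original line endings.
    updated = "".join(line.replace(old, new, 1)
                      for line in text.splitlines(keepends=True))
    path.write_text(updated)
    return updated != text


# In this notebook: set_listing_contents(output_dir / "index.qmd")
```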


# Get the posts

Most of the posts are individual files in `content/post`.

In [7]:
post_dir = (input_dir / "content") / "post"

In [8]:
post_paths = sorted(post_dir.iterdir())
len(post_paths)

485

## Check the filetypes

In [9]:
post_paths[0].suffix

'.mmark'

*Mostly* `.mmark` files, with a few `.md`, `.Rmd` and `.html` files and a single `.R` file.

In [10]:
from collections import Counter

Counter(p.suffix for p in post_paths)

Counter({'.mmark': 466, '.md': 12, '.Rmd': 3, '.html': 3, '.R': 1})

Let's look at the remaining files.
The html files are actually output from the Rmd files.

In [11]:
for p in post_paths:
    if p.suffix != '.mmark':
        print(p.name)

analytic-worth.md
athena-r.md
bayes_toy_coin.Rmd
bayes_toy_coin.html
blogdown.Rmd
blogdown.html
building-layered-api-with-fashion-mnist.md
calculate-centroid-on-sphere.md
duckworth-lewis.md
fashion-mnist-with-prototype-methods.md
hackernews-dataset-eda.md
jupyter-hugo-blog.md
lower-precision.md
monad-by-example.md
peeling-fastai-layered-api-with-fashion-mnist.md
plotting-bayesian-parameters-tidyverse.Rmd
plotting-bayesian-parameters-tidyverse.html
rule-of-five.R
sentencetransformers-to-tensorflow.md
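One way to confirm that every `.html` file is rendered output of an `.Rmd` file is to match the two by file stem. A small sketch working on bare file names (`rendered_pairs` is an illustrative name; in the notebook you would pass `[p.name for p in post_paths]`):

```python
from pathlib import PurePath


def rendered_pairs(names, source_suffix=".Rmd", output_suffix=".html"):
    """Split output files into those with a same-stem source file and orphans."""
    paths = [PurePath(n) for n in names]
    sources = {p.stem for p in paths if p.suffix == source_suffix}
    paired, orphans = [], []
    for p in paths:
        if p.suffix == output_suffix:
            (paired if p.stem in sources else orphans).append(p.name)
    return paired, orphans


names = ["bayes_toy_coin.Rmd", "bayes_toy_coin.html", "blogdown.Rmd",
         "blogdown.html", "plotting-bayesian-parameters-tidyverse.Rmd",
         "plotting-bayesian-parameters-tidyverse.html", "rule-of-five.R"]
paired, orphans = rendered_pairs(names)
# paired -> all three .html files; orphans -> []
```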


Some (but not all) of the markdown posts are generated from Jupyter notebooks

In [12]:
!ls ../data/notebooks/

building-layered-api-with-fashion-mnist.ipynb
calculate-centroid-on-sphere.ipynb
fashion-mnist-with-prototype-methods.ipynb
hackernews-dataset-eda.ipynb
jupyter-hugo-blog.ipynb
peeling-fastai-layered-api-with-fashion-mnist.ipynb
sentencetransformers-to-tensorflow.ipynb
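Matching the `.md` posts against this folder by stem shows which ones came from a notebook. A sketch (`notebook_backed` is a hypothetical helper that takes plain file names):

```python
from pathlib import PurePath


def notebook_backed(post_names, notebook_names):
    """Return the .md posts that have a same-named .ipynb notebook."""
    notebook_stems = {PurePath(n).stem for n in notebook_names
                      if n.endswith(".ipynb")}
    return sorted(n for n in post_names
                  if n.endswith(".md") and PurePath(n).stem in notebook_stems)
```

Called as `notebook_backed([p.name for p in post_paths], [p.name for p in (input_dir / "notebooks").iterdir()])`, the listings above suggest all seven notebooks have a matching post, leaving five hand-written `.md` posts: analytic-worth, athena-r, duckworth-lewis, lower-precision and monad-by-example.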


In [13]:
!ls ../data/static/

css	     images		notebooks  posts      rmarkdown-libs
favicon.ico  jupyter-hugo-blog	post	   resources


## Parsing frontmatter

In [14]:
path = post_paths[0]
path

PosixPath('../data/content/post/2020-headphones.mmark')

In [15]:
import frontmatter

In [16]:
post = frontmatter.load(path)

type(post)

frontmatter.Post
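For a YAML post, `frontmatter.load` roughly separates the block between the opening and closing `---` lines from the body, then parses that block as YAML. A stdlib-only sketch of just the splitting step, assuming YAML-style `---` delimiters (it does no YAML parsing and ignores Hugo's other frontmatter formats such as TOML between `+++` lines; `split_frontmatter` is hypothetical):

```python
def split_frontmatter(text, delimiter="---"):
    """Split a post into (raw frontmatter, body); frontmatter is None if absent."""
    lines = text.splitlines(keepends=True)
    # The frontmatter must open on the very first line.
    if not lines or lines[0].strip() != delimiter:
        return None, text
    for i in range(1, len(lines)):
        if lines[i].strip() == delimiter:
            raw = "".join(lines[1:i])
            body = "".join(lines[i + 1:])
            return raw, body.lstrip("\n")
    # No closing delimiter: treat the whole file as body.
    return None, text
```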

The metadata behaves like a dictionary; this post has tags, title, date and feature_image.

In [17]:
dict(post)

{'tags': ['tools'],
 'title': 'Bluetooth Headphones in 2020',
 'date': '2020-10-21T21:27:08+11:00',
 'feature_image': '/images/jabra85h.jpg'}

Counting keys across all posts shows these four are by far the most common.

In [18]:
key_counter = Counter()
for path in post_paths:
    key_counter.update(frontmatter.load(path).keys())
key_counter.most_common()

[('title', 484),
 ('date', 484),
 ('feature_image', 450),
 ('tags', 439),
 ('draft', 47),
 ('image', 14),
 ('categories', 6),
 ('description', 2),
 ('author', 2),
 ('output', 2),
 ('featured_image', 1),
 ('feature_image_url', 1),
 ('feature_source', 1)]

The exceptions are mostly drafts or Rmd files, so restrict the count to non-draft `.mmark` posts.

In [19]:
key_counter = Counter()
for path in post_paths:
    meta = dict(frontmatter.load(path))
    if path.suffix == '.mmark' and 'draft' not in meta:
        key_counter.update(meta.keys())
key_counter.most_common()

[('title', 424),
 ('date', 424),
 ('feature_image', 424),
 ('tags', 396),
 ('categories', 2),
 ('feature_image_url', 1),
 ('feature_source', 1),
 ('description', 1)]
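Here 424 posts have a title but only 396 have `tags`, so at least 28 published `.mmark` posts will come out with no Quarto `categories`. A sketch for listing them, given a mapping from file name to metadata dict (for example `{p.name: dict(frontmatter.load(p)) for p in post_paths}`); `posts_missing` is an illustrative name:

```python
def posts_missing(metas, key, suffix=".mmark"):
    """List non-draft posts with the given suffix whose metadata lacks `key`.

    Like the cell above, any post with a `draft` key counts as a draft.
    """
    return sorted(
        name for name, meta in metas.items()
        if name.endswith(suffix) and "draft" not in meta and key not in meta
    )
```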

We can map these onto the corresponding Quarto metadata keys: Hugo's `tags` become Quarto `categories`, and `feature_image` becomes `image`.

In [20]:
meta_map = {
    'title': 'title',
    'date': 'date',
    'feature_image': 'image',
    'tags': 'categories',
    'draft': 'draft',
}

In [21]:
post_meta = {meta_map.get(k, k): v for k, v in dict(post).items()}

post_meta

{'categories': ['tools'],
 'title': 'Bluetooth Headphones in 2020',
 'date': '2020-10-21T21:27:08+11:00',
 'image': '/images/jabra85h.jpg'}
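One thing to watch is that the renaming is not one-to-one across all posts. Some posts already have an `image` or `categories` key (see the counts above), so a post with both `feature_image` and `image`, or both `tags` and `categories`, would have one value silently overwritten by the dict comprehension. A sketch of a check for this (`find_collisions` is a hypothetical helper; it repeats `meta_map` so it runs on its own):

```python
meta_map = {
    'title': 'title',
    'date': 'date',
    'feature_image': 'image',
    'tags': 'categories',
    'draft': 'draft',
}


def find_collisions(meta, mapping=meta_map):
    """Return {new_key: [original keys]} where several keys map to one new key."""
    sources = {}
    for key in meta:
        sources.setdefault(mapping.get(key, key), []).append(key)
    return {new: olds for new, olds in sources.items() if len(olds) > 1}


find_collisions({'title': 'x', 'tags': ['r'], 'categories': ['stats']})
# -> {'categories': ['tags', 'categories']}
```

Running this over every post before exporting would show whether any need their values merged by hand.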

We can use the `frontmatter` library to serialize the renamed metadata back to YAML frontmatter.

In [22]:
print(frontmatter.dumps(frontmatter.Post(post.content, handler=frontmatter.YAMLHandler(), **post_meta))[:200])

---
categories:
- tools
date: '2020-10-21T21:27:08+11:00'
image: /images/jabra85h.jpg
title: Bluetooth Headphones in 2020
---

I've been looking for some bluetooth headphones that I can use both on a 


In [23]:
#export

meta_map = {
    'title': 'title',
    'date': 'date',
    'feature_image': 'image',
    'tags': 'categories',
    'draft': 'draft',
}

def post_hugo2quarto(post):
    post_meta = {meta_map.get(k, k): v for k, v in dict(post).items()}
    return frontmatter.Post(post.content, handler=frontmatter.YAMLHandler(), **post_meta)

In [24]:
print(frontmatter.dumps(post_hugo2quarto(post))[:200])

---
categories:
- tools
date: '2020-10-21T21:27:08+11:00'
image: /images/jabra85h.jpg
title: Bluetooth Headphones in 2020
---

I've been looking for some bluetooth headphones that I can use both on a 


# Exporting

In [25]:
path = post_paths[0]
path

PosixPath('../data/content/post/2020-headphones.mmark')

In [26]:
output_path = output_dir / path.stem

output_path

PosixPath('../output/2020-headphones')

Let's try an example

In [27]:
output_path.mkdir()

post = frontmatter.load(path)
output_post = post_hugo2quarto(post)
frontmatter.dump(output_post, output_path / 'index.md')

In [28]:
!head -n 10 {output_path}/index.md

---
categories:
- tools
date: '2020-10-21T21:27:08+11:00'
image: /images/jabra85h.jpg
title: Bluetooth Headphones in 2020
---

I've been looking for some bluetooth headphones that I can use both on a mobile phone and a computer at the same time.
I want something portable enough to take with me, but comfortable enough to wear all day.


Clean up

In [29]:
(output_path / 'index.md').unlink()
output_path.rmdir()

Put this together into a function

In [30]:
#export
def export_post(path, output_dir=output_dir):
    # Each post becomes <output_dir>/<post file stem>/index.md
    output_path = output_dir / path.stem
    output_path.mkdir()
    post = frontmatter.load(path)
    output_post = post_hugo2quarto(post)
    frontmatter.dump(output_post, output_path / 'index.md')

And export all the `.mmark` files

In [31]:
for path in post_paths:
    if path.suffix == '.mmark':
        export_post(path)
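`Path.mkdir()` raises `FileExistsError` when the folder already exists, so running this loop a second time stops at the first post. A sketch of a rerunnable variant: it creates folders with `exist_ok=True`, overwrites `index.md`, and takes the conversion as an argument. Here `export_all` and its `convert` parameter are hypothetical; `convert` stands in for the `frontmatter` calls so the sketch runs without that package:

```python
from pathlib import Path


def export_all(paths, output_dir, convert, suffix=".mmark"):
    """Write convert(path) to output_dir/<stem>/index.md for each matching path.

    Safe to rerun: existing folders are reused and index.md is overwritten.
    Returns the list of files written.
    """
    written = []
    for path in map(Path, paths):
        if path.suffix != suffix:
            continue
        target_dir = Path(output_dir) / path.stem
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / "index.md"
        target.write_text(convert(path), encoding="utf-8")
        written.append(target)
    return written
```

In this notebook it could be called as `export_all(post_paths, output_dir, lambda p: frontmatter.dumps(post_hugo2quarto(frontmatter.load(p))))`.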

Also copy the static resources

In [32]:
import shutil

for folder in ['images', 'notebooks', 'resources', 'post']:
    shutil.copytree((input_dir / "static") / folder, output_dir / folder)
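Like `mkdir`, `shutil.copytree` fails if the destination already exists; since Python 3.8 it accepts `dirs_exist_ok=True` to merge into an existing folder instead. Since Hugo serves `static/images/x.jpg` at `/images/x.jpg`, and `feature_image` values look like `/images/jabra85h.jpg` in the example post, it is also worth checking that every referenced image resolves to a copied file. A sketch with illustrative helper names, assuming local image references are root-relative paths:

```python
import shutil
from pathlib import Path


def copy_static(static_dir, output_dir, folders):
    """Copy each folder from static_dir into output_dir, merging into existing ones."""
    for folder in folders:
        src = Path(static_dir) / folder
        # Unlike the loop above, skip folders missing from this checkout.
        if src.is_dir():
            shutil.copytree(src, Path(output_dir) / folder, dirs_exist_ok=True)


def missing_images(image_paths, output_dir):
    """Return root-relative image paths (e.g. '/images/x.jpg') with no file under output_dir."""
    missing = []
    for ref in image_paths:
        # External URLs can't be checked against the local output, so skip them.
        if ref.startswith(("http://", "https://")):
            continue
        if not (Path(output_dir) / ref.lstrip("/")).is_file():
            missing.append(ref)
    return missing
```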