# Adding GitHub Pages Header

- 09/20/23

## Goal

- Header that needs to be added to the top of any page to include:
```
---
layout: page
title: "PAGE-TITLE"
permalink: /URL-PATH
---
```


- This notebook will update pre-selected markdown files to ensure they have the correct header added.

### Questions/Thoughts
- Not sure whether it matters if we use the repo-hosted images or the ones from the LP.
- Assuming the local ones (files ending in `_Repo.md`) for now.

In [20]:
import os, glob

pages_files = sorted(glob.glob("../../**/*_Repo.md", recursive=True))
pages_files

['../../docs/_source/Installation Overview-v23_Repo.md',
 '../../docs/_source/Instructions-updating-dojo-env-v23_Repo.md',
 '../../docs/_source/instructions-mac-intel-v23_Repo.md',
 '../../docs/_source/instructions-mac-mchip-v23_Repo.md',
 '../../docs/_source/instructions-windows-v23_Repo.md']

In [21]:
base_url = "https://coding-dojo-data-science.github.io"


> Constructing the header info for each of the found pages. (Note: there is definitely a more programmatic way to do this, but I do not have time to work it out right now.)

In [22]:
page = pages_files[0]
page

'../../docs/_source/Installation Overview-v23_Repo.md'

In [23]:
from pathlib import Path

page_file = Path(page)
page_file.exists()

True

In [24]:
txt = page_file.read_text()
txt

'# Python Installation for Data Science - Overview\n\n___\n\n- [Click here](https://hackmd.io/@jirvingphd/dojo-env-overview) for the web version of these instructions.\n\n___\n\n\n\n<center>\n<img src="images/Data Science Thumbnail.png_raw=true" width=500px></center>\n\n\nSo far in this program, you have worked in Google Colab, which provides a cloud-based coding environment. \n- We will transition to using a Python environment stored on your local machine. \n    - Jupyter Notebook will replace Google Colab. \n    - GitHub Desktop will sync your work.\n___\n\n## Installation Timeline/Deadline\n\n- In the Data Enrichment course, you will need to submit a CORE ASSIGNMENT containing the error-free test notebook that is included within these instructions. This will ensure that you have the tools you will need to be successful.\n- We recommend you begin the step-by-step installation AS SOON AS POSSIBLE to ensure you have time to troubleshoot any difficulties you may encounter.\n- These step

In [25]:
import re

def extract_first_h1(text):
    # Clean up characters that are awkward in a page title before searching:
    # drop asterisks, turn parentheses into underscores, remove double spaces.
    repl_parentheses = "_"
    to_replace = {'*':'',
                  ' (':repl_parentheses,
                  ') ':repl_parentheses,
                  '(':repl_parentheses,
                  ')':repl_parentheses,
                  '  ':'',
                  }
    for char,replacement in to_replace.items():
        text = text.replace(char,replacement)
    
    # The pattern looks for a line that starts with '# ' and captures everything after that until the end of the line.
    pattern = r"^# (.+)$"
    
    # Use re.search to find the first match in the text
    match = re.search(pattern, text, re.MULTILINE)
    
    # If a match is found, return the captured group, otherwise return None
    return match.group(1) if match else None


In [26]:
title = extract_first_h1(txt)
title

'Python Installation for Data Science - Overview'
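
The underscores in titles such as `Mac_Intel_Installation Overview` come from the character replacements that run before the regex search. A minimal sketch of that behavior on a made-up heading (the sample text and the function name `clean_and_extract_h1` are hypothetical, not taken from the source files):

```python
import re

def clean_and_extract_h1(text):
    """Apply the same replacements as extract_first_h1, then return the first H1."""
    replacements = {'*': '', ' (': '_', ') ': '_', '(': '_', ')': '_', '  ': ''}
    for old, new in replacements.items():
        text = text.replace(old, new)
    # '^# ' only matches top-level headings; '## ' lines are skipped.
    match = re.search(r"^# (.+)$", text, re.MULTILINE)
    return match.group(1) if match else None

# Hypothetical sample: an intro line, an H1 with parentheses, then an H2.
sample = "intro\n# Mac (Intel) Installation Overview\n## Details\n"
demo_title = clean_and_extract_h1(sample)
print(demo_title)
```

The replacements run on a local copy of the whole text, so they only affect the derived title and not the page body that is stored separately.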

In [27]:
FILES = {}
for page in pages_files:
    page_file = Path(page)
    txt = page_file.read_text()
    title = extract_first_h1(txt)
    FILES[title] = {'file':page,
                    'text':txt,
                    'title':title}
    print(title)
FILES.keys()

Python Installation for Data Science - Overview
Updating to a new dojo-env
Mac_Intel_Installation Overview
Mac_Apple Chip_Installation Overview
Windows Installation Instructions


dict_keys(['Python Installation for Data Science - Overview', 'Updating to a new dojo-env', 'Mac_Intel_Installation Overview', 'Mac_Apple Chip_Installation Overview', 'Windows Installation Instructions'])

## Next step: construct permalink

In [28]:
links_dict = {'overview':'dojo-env-setup',
              'windows':'dojo-env-windows',
              'apple chip': 'dojo-env-mac-apple-chip',
              'apple intel':'dojo-env-mac-intel'}

In [29]:
test_keys = list(FILES.keys())
key = test_keys[0]
key

'Python Installation for Data Science - Overview'

In [30]:
FILES[key]

{'file': '../../docs/_source/Installation Overview-v23_Repo.md',
 'text': '# Python Installation for Data Science - Overview\n\n___\n\n- [Click here](https://hackmd.io/@jirvingphd/dojo-env-overview) for the web version of these instructions.\n\n___\n\n\n\n<center>\n<img src="images/Data Science Thumbnail.png_raw=true" width=500px></center>\n\n\nSo far in this program, you have worked in Google Colab, which provides a cloud-based coding environment. \n- We will transition to using a Python environment stored on your local machine. \n    - Jupyter Notebook will replace Google Colab. \n    - GitHub Desktop will sync your work.\n___\n\n## Installation Timeline/Deadline\n\n- In the Data Enrichment course, you will need to submit a CORE ASSIGNMENT containing the error-free test notebook that is included within these instructions. This will ensure that you have the tools you will need to be successful.\n- We recommend you begin the step-by-step installation AS SOON AS POSSIBLE to ensure you h

In [31]:
# Determine the URL for each page by matching keywords against its title
for page_title, details in FILES.items():
    title = details['title']
    for page_type, url_end in links_dict.items():
        if page_type in title.lower():
            url = base_url + '/' + url_end
            # A later matching keyword overwrites an earlier one
            details['url'] = url
            print(url)

https://coding-dojo-data-science.github.io/dojo-env-setup
https://coding-dojo-data-science.github.io/dojo-env-setup
https://coding-dojo-data-science.github.io/dojo-env-setup
https://coding-dojo-data-science.github.io/dojo-env-mac-apple-chip
https://coding-dojo-data-science.github.io/dojo-env-windows
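
The printed URLs show two side effects of plain substring matching: `'overview'` matches three different titles, and `'apple intel'` never matches `Mac_Intel_Installation Overview`, so that page ends up with the setup URL. One possible way around this, sketched under assumptions, is an ordered rule list where specific keywords are checked before generic ones and the first match wins. The `RULES` list, the `'intel'` keyword, and `pick_slug` are illustrative names, not part of the notebook:

```python
# Ordered (keyword, slug) pairs: specific keywords first, generic ones last.
# Assumption: 'intel' replaces 'apple intel' so the Mac Intel title can match.
RULES = [
    ("apple chip", "dojo-env-mac-apple-chip"),
    ("intel", "dojo-env-mac-intel"),
    ("windows", "dojo-env-windows"),
    ("overview", "dojo-env-setup"),
]

def pick_slug(title, rules=RULES):
    """Return the slug of the first rule whose keyword occurs in the title, else None."""
    lowered = title.lower()
    for keyword, slug in rules:
        if keyword in lowered:
            return slug
    return None

for demo in ["Python Installation for Data Science - Overview",
             "Mac_Intel_Installation Overview",
             "Mac_Apple Chip_Installation Overview"]:
    print(demo, "->", pick_slug(demo))
```

Returning `None` for an unmatched title makes missing rules easy to detect before any file is written.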


In [32]:
details = FILES[key]
details

{'file': '../../docs/_source/Installation Overview-v23_Repo.md',
 'text': '# Python Installation for Data Science - Overview\n\n___\n\n- [Click here](https://hackmd.io/@jirvingphd/dojo-env-overview) for the web version of these instructions.\n\n___\n\n\n\n<center>\n<img src="images/Data Science Thumbnail.png_raw=true" width=500px></center>\n\n\nSo far in this program, you have worked in Google Colab, which provides a cloud-based coding environment. \n- We will transition to using a Python environment stored on your local machine. \n    - Jupyter Notebook will replace Google Colab. \n    - GitHub Desktop will sync your work.\n___\n\n## Installation Timeline/Deadline\n\n- In the Data Enrichment course, you will need to submit a CORE ASSIGNMENT containing the error-free test notebook that is included within these instructions. This will ensure that you have the tools you will need to be successful.\n- We recommend you begin the step-by-step installation AS SOON AS POSSIBLE to ensure you h

```
---
layout: page
title: "PAGE-TITLE"
permalink: /URL-PATH
---
```

In [33]:
header = f'---\nlayout: page\ntitle: "{details["title"]}"\npermalink: {details["url"]}\n---'
print(header)

---
layout: page
title: "Python Installation for Data Science - Overview"
permalink: https://coding-dojo-data-science.github.io/dojo-env-setup
---
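
The goal template asks for a site-relative permalink (`/URL-PATH`), while the header printed here carries the full URL. Jekyll also only reads front matter placed between triple-dashed lines (`---`) at the very top of the file. A minimal sketch of a builder that follows both points, assuming the slug alone is available (the function name `make_front_matter` and its parameters are hypothetical):

```python
def make_front_matter(title, url_path, layout="page"):
    """Build a Jekyll front matter block with a site-relative permalink."""
    # Normalize so that 'slug', '/slug' and 'slug/' all become '/slug'.
    permalink = "/" + url_path.strip("/")
    lines = [
        "---",
        f"layout: {layout}",
        f'title: "{title}"',
        f"permalink: {permalink}",
        "---",
    ]
    return "\n".join(lines) + "\n"

print(make_front_matter("Python Installation for Data Science - Overview",
                        "dojo-env-setup"))
```

This sketch does not escape double quotes inside the title, which would need handling if a page title ever contained them.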


In [126]:
# def make_header(details):
# 	header = f"""___
# 	layout: page
# 	title: "{details["title"]}"
# 	permalink: {details["url"]}
# 	"""
# 	header = header.replace('\t','')
#     # header = f'___\nlayout: page \ntitle: "{details["title"]}" \npermalink: {details["url"]}\n___'
# 	return header +'\n'

def make_header(details, out_folder):
    """Return a markdown table header for a page (nav_order is fixed at 4 here)."""
    title = details['title']
    header = "| layout | title | parent| nav_order |"
    header += '\n'
    header += "| ------ | -------------------------- | -------------------------------- | --------- |"
    header += f"\n| page   |{title} | {out_folder} | 4         |"
    return header

In [127]:

# header = "| layout | title                      | parent                           | nav_order |"
# header+='\n' 
# header+="| ------ | -------------------------- | -------------------------------- | --------- |"
# header+=f"\n| page   |{title} | {out_folder} | 4         |"
# print(header)

In [130]:
test_header = make_header(FILES['Mac_Apple Chip_Installation Overview'], out_folder)
Markdown(test_header)

| layout | title | parent| nav_order |
| ------ | -------------------------- | -------------------------------- | --------- |
| page   |Mac_Apple Chip_Installation Overview | ../_posts/ | 4         |

In [103]:
from IPython.display import display, Markdown

In [104]:
output_text  = header+"\n" + details['text']
# display(Markdown(output_text[:1000]))

## Saving the final files to _posts

In [105]:
# 
orig_fname = FILES['Mac_Apple Chip_Installation Overview']['file'] 
orig_fname

'../../docs/_source/instructions-mac-mchip-v23_Repo.md'

> CHANGE `out_folder`!

In [106]:
fname_out = out_folder + orig_fname.split('source/')[-1]
fname_out =fname_out.replace('-v23_Repo','-pages')
fname_out

'../_posts/instructions-mac-mchip-pages.md'
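
Splitting on `'source/'` works for these paths but depends on the folder name. A sketch of the same renaming with `pathlib`, which takes the file name directly regardless of the parent folder (the function name `output_name` is hypothetical; `PurePosixPath` is used only so the printed result looks the same on every platform):

```python
from pathlib import PurePosixPath

def output_name(orig_fname, out_folder, old="-v23_Repo", new="-pages"):
    """Return the output path: same file name with the version suffix swapped."""
    name = PurePosixPath(orig_fname).name.replace(old, new)
    return str(PurePosixPath(out_folder) / name)

demo_out = output_name("../../docs/_source/instructions-mac-mchip-v23_Repo.md",
                       "../_posts/")
print(demo_out)
```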

## Full Workflow

In [107]:
import os, glob
import re
from pathlib import Path  # needed for Path(page) in the loop below


In [134]:
# PARAMS TO SET
base_url = "https://coding-dojo-data-science.github.io"
out_folder = "../Instructions/"

links_dict = {'overview':'dojo-env-setup',
              'windows':'dojo-env-windows',
              'apple chip': 'dojo-env-mac-apple-chip',
              'apple intel':'dojo-env-mac-intel',
              'updat':'update-dojo-env'}

order_dict = {'overview':1,
              'windows':2,
              'apple intel':3,
              'apple chip':4,
              'updat':5}

In [135]:

def extract_first_h1(text):
    # Clean up characters that are awkward in a page title before searching:
    # drop asterisks, turn parentheses into underscores, remove double spaces.
    repl_parentheses = "_"
    to_replace = {'*':'',
                  ' (':repl_parentheses,
                  ') ':repl_parentheses,
                  '(':repl_parentheses,
                  ')':repl_parentheses,
                  '  ':'',
                  }
    for char,replacement in to_replace.items():
        text = text.replace(char,replacement)
    
    # The pattern looks for a line that starts with '# ' and captures everything after that until the end of the line.
    pattern = r"^# (.+)$"
    
    # Use re.search to find the first match in the text
    match = re.search(pattern, text, re.MULTILINE)
    
    # If a match is found, return the captured group, otherwise return None
    return match.group(1) if match else None


# def make_header(details):
# 	header = f'___\nlayout: page \ntitle: "{details["title"]}" \npermalink: {details["url"]}\n___'
# 	return header +'\n'


def make_header(details, out_folder):
    """Return the markdown table header for a page, using the page's own nav order."""
    title = details['title']
    order = details['order']
    header = "| layout | title | parent| nav_order |"
    header += '\n'
    header += "| ------ | -------------------------- | -------------------------------- | --------- |"
    # Use the looked-up order instead of a hard-coded 4
    header += f"\n| page   |{title} | {out_folder} | {order}         |"
    return header

In [136]:
# Get list of files
pages_files = sorted(glob.glob("../../**/*_Repo.md", recursive=True))
pages_files

['../../docs/_source/Installation Overview-v23_Repo.md',
 '../../docs/_source/Instructions-updating-dojo-env-v23_Repo.md',
 '../../docs/_source/instructions-mac-intel-v23_Repo.md',
 '../../docs/_source/instructions-mac-mchip-v23_Repo.md',
 '../../docs/_source/instructions-windows-v23_Repo.md']

In [137]:
links_dict

{'overview': 'dojo-env-setup',
 'windows': 'dojo-env-windows',
 'apple chip': 'dojo-env-mac-apple-chip',
 'apple intel': 'dojo-env-mac-intel',
 'updat': 'update-dojo-env'}

In [140]:
FILES = {}
for page in pages_files:
    page_file = Path(page)
    txt = page_file.read_text()
    title = extract_first_h1(txt)
    
    details = {'file':page,
                    'text':txt,
                    'title':title}
    
    for page_type, url_end in links_dict.items():
        if page_type.lower() in title.lower():
            url = base_url +'/' + url_end
            details['url'] = url
            
    
    for page_type, order in order_dict.items():
        if page_type.lower() in title.lower():
            details['order'] = order

    # Stop early if no keyword matched, since make_header needs both fields
    if 'url' not in details or 'order' not in details:
        raise ValueError(f"No URL/order keyword matched the title: {details['title']!r}")

    header = make_header(details, out_folder)

    output_text = header + "\n" + details['text']
    details['output_text'] = output_text
    
    FILES[title] = details
    print(title)
FILES.keys()

Python Installation for Data Science - Overview
Updating to a new dojo-env
Mac_Intel_Installation Overview
Mac_Apple Chip_Installation Overview
Windows Installation Instructions


dict_keys(['Python Installation for Data Science - Overview', 'Updating to a new dojo-env', 'Mac_Intel_Installation Overview', 'Mac_Apple Chip_Installation Overview', 'Windows Installation Instructions'])

In [141]:


for title, details in FILES.items():
    orig_fname = details['file']
    output_text = details['output_text']
    # # Get header
    # header = make_header(details)
    # output_text  = header+"\n" + details['text']
    # Make final file name
    fname_out = out_folder + orig_fname.split('source/')[-1]
    fname_out =fname_out.replace('-v23_Repo','-pages').replace('.md','.markdown')
    

    with open(fname_out,'w') as f:
        f.write(output_text)
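
The write loop assumes `out_folder` already exists and relies on the platform's default text encoding. A hedged sketch of a helper that creates the folder first and writes UTF-8 explicitly (the name `write_page` and the temporary demo folder are assumptions for illustration only):

```python
import tempfile
from pathlib import Path

def write_page(out_dir, filename, text):
    """Create out_dir if needed, write text as UTF-8, and return the written path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / filename
    target.write_text(text, encoding="utf-8")
    return target

# Demo in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    written = write_page(Path(tmp) / "Instructions", "demo-pages.markdown", "# Demo\n")
    demo_text = written.read_text(encoding="utf-8")
    print(written.name, len(demo_text))
```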
     



## Copy images to a sub-folder in posts (I know, not ideal)

In [142]:
import shutil
img_folder = out_folder+'images/'
img_folder


'../_posts/images/'

In [143]:
img_src_folder = "images/"

In [144]:
shutil.copytree(img_src_folder,img_folder,dirs_exist_ok=True)

'../_posts/images/'
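
`dirs_exist_ok=True` (available since Python 3.8) is what lets this cell be re-run without `copytree` failing on the existing destination. A small self-contained sketch of that behavior using temporary folders, with made-up file names:

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "images"
    dst = Path(tmp) / "out" / "images"
    src.mkdir()
    (src / "a.png").write_bytes(b"placeholder")  # stand-in for a real image

    shutil.copytree(src, dst, dirs_exist_ok=True)  # creates dst and its parents
    shutil.copytree(src, dst, dirs_exist_ok=True)  # re-run: overwrites, no error
    copied = sorted(p.name for p in dst.iterdir())
    print(copied)
```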