# Create portable version

<img src="Export_jupiter.webp" style="width:280px; height:280px;">

This page describes, and also implements, how to make the Jupyter notebooks of the whole project portable.

# Recursive HTML exporting and link replacement

This approach is currently in use to create working, interlinked HTML pages.

To achieve this, you can write a Python script that:

1. Recursively searches for `.ipynb` files in a directory.
2. Exports each `.ipynb` file to an HTML file.
3. Replaces links within the HTML files that point to `.ipynb` files with `.html` links (so that the links work correctly in a browser).

You can use the following libraries:
- `os` or `pathlib` for file traversal.
- `nbconvert` for converting `.ipynb` to `.html`.
- `re` for regular expressions to replace links in HTML content.
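
As a minimal sketch of the `pathlib` option, the traversal step could look like the following. The helper names `find_notebooks` and `html_target` are hypothetical and not part of the script further down; skipping `.ipynb_checkpoints` is an assumption about what you want, since Jupyter stores autosave copies there that usually should not be exported.

```python
from pathlib import Path

def find_notebooks(root_dir):
    """Return all .ipynb files below root_dir, skipping Jupyter checkpoint copies."""
    root = Path(root_dir)
    return sorted(
        path for path in root.rglob("*.ipynb")
        if ".ipynb_checkpoints" not in path.parts
    )

def html_target(ipynb_path):
    """Return the .html path that sits next to the given notebook."""
    # with_suffix only swaps the last suffix, so "Name.de.ipynb" becomes "Name.de.html"
    return Path(ipynb_path).with_suffix(".html")
```

Calling `find_notebooks(".")` from the project root would list every notebook that the export has to handle.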

## Prerequisites

### Install the Required Modules

You can install the necessary modules using the following commands in your terminal or command prompt.

1.  **Install  `nbconvert`**: This module is used to convert Jupyter notebooks to HTML.

In [1]:
pip install nbconvert

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


2.  **Install  `nbformat`**: This module is used to read and write Jupyter notebook files.

In [2]:
pip install nbformat

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.



Here's an outline of the script:

### Script: Convert and Modify Links in HTML

In [33]:
import os
import nbformat
from nbconvert import HTMLExporter
import re

def convert_ipynb_to_html(ipynb_file):
    # Load the notebook
    with open(ipynb_file, 'r', encoding='utf-8') as f:
        notebook = nbformat.read(f, as_version=4)
    
    # Convert to HTML
    html_exporter = HTMLExporter()
    (body, resources) = html_exporter.from_notebook_node(notebook)
    
    # Define the HTML filename
    html_filename = os.path.splitext(ipynb_file)[0] + '.html'

    if os.path.isfile(html_filename):
        os.remove(html_filename)
    
    # Write the HTML file in the same directory as the ipynb file
    with open(html_filename, 'w', encoding='utf-8') as f:
        f.write(body)
    
    return html_filename

def replace_ipynb_links_in_html(html_file, root_dir):
    with open(html_file, 'r', encoding='utf-8') as f:
        content = f.read()
    
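    # Replace .ipynb links with relative .html links.
    # [^"\'#]* and [^"\']* keep each match inside a single href value, so a
    # line with several links is not swallowed by one greedy match.
    updated_content = re.sub(
        r'(?<=href=["\'])([^"\'#]*?\.ipynb)(#[^"\']*)?(?=["\'])',
        lambda match: make_relative_html_link(match.group(1), match.group(2), root_dir),
        content
    )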
    
    # Write the updated content back to the file
    with open(html_file, 'w', encoding='utf-8') as f:
        f.write(updated_content)

def make_relative_html_link(ipynb_link, anchor, root_dir):
    # Leave external links (http://, https://, ...) untouched;
    # treating them as file paths would mangle the URL
    if '://' in ipynb_link:
        return ipynb_link + (anchor or '')

    # Convert .ipynb link to corresponding .html link
    html_link = os.path.splitext(ipynb_link)[0] + '.html'
    
    # Get the absolute path of the html link (resolved against the CWD)
    absolute_html_path = os.path.abspath(html_link)
    
    # Remove the root directory (CWD) from the absolute path to make it relative
    relative_html_path = os.path.relpath(absolute_html_path, start=root_dir)

    if anchor:
        relative_html_path += anchor
    
    return relative_html_path

def recursive_convert_and_replace_links(root_dir):
    # Recursively find .ipynb files
    for subdir, dirs, files in os.walk(root_dir):
        for file in files:
            if file.endswith('.ipynb'):
                ipynb_path = os.path.join(subdir, file)
                
                # Convert to HTML and save it in the same directory
                html_file = convert_ipynb_to_html(ipynb_path)
                
                # Replace .ipynb links with relative .html links, removing the cwd from the path
                replace_ipynb_links_in_html(html_file, root_dir)

if __name__ == "__main__":
    # Use the current working directory as the root directory
    root_directory = os.getcwd()
    
    recursive_convert_and_replace_links(root_directory)


### How the Script Works:

1. **Convert `.ipynb` to `.html`**: 
   - The `convert_ipynb_to_html` function uses `nbconvert` to convert a Jupyter Notebook (`.ipynb`) to an HTML file and saves it next to the notebook, in the same directory. An existing HTML file with the same name is replaced.

2. **Replace `.ipynb` Links with `.html` Links**:
   - The `replace_ipynb_links_in_html` function reads the generated HTML file, finds any links to `.ipynb` files, and replaces them with `.html` links.
   - This is done using a regular expression that matches the links in the HTML content.

3. **Recursively Process Files**:
   - The `recursive_convert_and_replace_links` function walks through the directory tree (`os.walk`) and processes all `.ipynb` files. It calls the conversion and replacement functions for each file found.
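
The link replacement can be tried on a plain string before running it over real files. The following is a minimal sketch, not the script itself: the pattern and the helper name `rewrite_links` are assumptions made for illustration. It only swaps the extension and leaves links that contain `://` untouched, on the assumption that those point to external sites.

```python
import re

# Assumed pattern: the character classes stop at quotes, so every match
# stays inside a single href value.
IPYNB_HREF = re.compile(r'(?<=href=["\'])([^"\'#]*?)\.ipynb(#[^"\']*)?(?=["\'])')

def rewrite_links(html):
    """Swap .ipynb for .html in relative href values, keeping any #anchor."""
    def swap(match):
        target, anchor = match.group(1), match.group(2) or ""
        if "://" in target:
            # External link: return the matched text unchanged
            return match.group(0)
        return f"{target}.html{anchor}"
    return IPYNB_HREF.sub(swap, html)

sample = '<a href="basics/Loops.ipynb#while">Loops</a> <a href="index.html">Home</a>'
print(rewrite_links(sample))
```

In the sample, the first link becomes `basics/Loops.html#while`, while the second one is left alone because it does not point to a notebook.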

### Notes:

- **File Paths**: The script uses the current working directory as `root_directory`, so run it from the project root or change that variable. Each HTML file is written next to its notebook; there is no separate output directory.
- **Dependencies**: You may need to install `nbconvert` and `nbformat` via `pip install nbconvert nbformat`.
  
This script will help you convert all `.ipynb` files to HTML and ensure that the links between notebooks are correctly pointing to the corresponding `.html` files.

## Create PDF with headless Chrome

WiP

In [30]:
!/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --headless --print-to-pdf-no-header --no-margins --window-size=1280,1024 --print-to-pdf="output.pdf" programming/Programmieren1.de.html

[76146:259:1027/095612.882640:ERROR:chrome_browser_main.cc(1141)] The use of Rosetta to run the x64 version of Chromium on Arm is neither tested nor maintained, and unexpected behavior will likely result. Please check that all tools that spawn Chromium are Arm-native.
[76146:36099:1027/095616.001720:ERROR:trust_store_mac.cc(821)] Error parsing certificate:
ERROR: Failed parsing extensions

3300299 bytes written to file output.pdf


## Create PDF with wkhtmltopdf

WiP

In [19]:
!brew install wkhtmltopdf

==> Auto-updating Homebrew...
Adjust how often this is run with HOMEBREW_AUTO_UPDATE_SECS or disable with
HOMEBREW_NO_AUTO_UPDATE. Hide these hints with HOMEBREW_NO_ENV_HINTS (see `man brew`).
==> Downloading https://github.com/wkhtmltopdf/packaging/releases/download/0.12.
==> Downloading from https://objects.githubusercontent.com/github-production-rel
######################################################################### 100.0%
==> Installing Cask wkhtmltopdf
==> Running installer for wkhtmltopdf with sudo; the password may be necessary.
Password:



In [22]:
!wkhtmltopdf --enable-local-file-access --no-stop-slow-scripts --print-media-type programming/Programmieren1.de.html output.pdf

Loading pages (1/6)
Counting pages (2/6)
Resolving links (4/6)
Loading headers and footers (5/6)
Printing pages (6/6)


# Create PDF with pdfkit

WiP

In [13]:
pip install pdfkit bs4 requests pypdf2

Defaulting to user installation because normal site-packages is not writeable
Collecting pypdf2
  Downloading pypdf2-3.0.1-py3-none-any.whl.metadata (6.8 kB)
Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
Installing collected packages: pypdf2
Successfully installed pypdf2-3.0.1
Note: you may need to restart the kernel to use updated packages.


In [16]:
import os
from bs4 import BeautifulSoup
import pdfkit
from PyPDF2 import PdfMerger

def get_all_local_links(html_file, base_folder):
    """
    Extracts all local links to other HTML files within the same folder.
    """
    links = set()
    with open(html_file, 'r', encoding='utf-8') as file:
        soup = BeautifulSoup(file, 'html.parser')
        for a_tag in soup.find_all('a', href=True):
            # Drop any #fragment so links like "page.html#intro" are followed too
            href = a_tag['href'].split('#', 1)[0]
            if href.endswith('.html') and os.path.exists(os.path.join(base_folder, href)):
                links.add(os.path.join(base_folder, href))
    return list(links)

def save_as_pdf(input_html, output_pdf):
    """
    Converts a local HTML file to a PDF.
    """
    options = {
        'enable-local-file-access': None,
        'quiet': ''
    }
    pdfkit.from_file(input_html, output_pdf, options=options)

def create_pdf_with_toc(main_html, base_folder, output_pdf):
    """
    Main function to create a merged PDF with a table of contents.
    """
    visited = set()  # Track visited HTML files to avoid duplicates
    links_to_visit = [os.path.join(base_folder, main_html)]
    merger = PdfMerger()  # PdfMerger to combine PDFs

    page_number = 1
    while links_to_visit:
        html_file = links_to_visit.pop(0)
        if html_file in visited:
            continue
        visited.add(html_file)
        
        pdf_path = f"page_{page_number}.pdf"
        print(f"Converting: {html_file}")
        save_as_pdf(html_file, pdf_path)  # Convert each HTML to a PDF
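        # Append the PDF and bookmark its first page in one step.
        # outline_item replaces addBookmark, which PyPDF2 3.0.0 removed
        # (the recorded output below is from the old addBookmark call).
        merger.append(pdf_path, outline_item=f"Page {page_number}: {os.path.basename(html_file)}")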
        
        # Find additional links within the current HTML file
        new_links = get_all_local_links(html_file, base_folder)
        for link in new_links:
            if link not in visited:
                links_to_visit.append(link)

        page_number += 1

    merger.write(output_pdf)
    merger.close()
    print(f"Merged PDF successfully created as '{output_pdf}'")

# Example usage:
# Assume `base_folder` contains the main HTML file and all linked HTML files
base_folder = "./programming"
main_html = "Programmieren1.de.html"  # Main HTML file to start with
output_pdf = "merged_document.pdf"
create_pdf_with_toc(main_html, base_folder, output_pdf)

Converting: ./programming/Programmieren1.de.html


Illegal character in Name Object (b'/\x91i8\xf8\x92_E\x1e\x89\xed\xff\xc5\x06!\x84\xf3\xbd\xcd~\x9e')
Illegal character in Name Object (b'/f\xeaQ\xbd\x15\xb4\x15\x13\x9e\xbc\x1b\xad(y\x98\xfbP\xadeF')
Illegal character in Name Object (b'/7\xf5\xb6n\r\x19\xbe"\x8ai\x9ex\xb9\xcfr|#\x16\xd2\xcf')
Illegal character in Name Object (b'/\x9e\xda#\xde\x0cq\x87\xdd\xef\xd6x\xe6\xef\x18Z\x90\xfaPU\xc2')
Illegal character in Name Object (b"/!\x01\xa8e\xcc\xd0\x82\xa1\x96\x9d\xb1'UT\xa9p\x0b'!L")
Illegal character in Name Object (b"/\xd9\x12\xefl'\xca\xcaF2\xaf<uKs%\xa8\xe0\xde\x93\x1e")
Illegal character in Name Object (b'/#\\\xa7J\xd4\xfc4\xe50\x9e\x18X"#\x1b=\x7f\xcf\x03\x9e')
Illegal character in Name Object (b'/\xc0%\xc5|\x8e\xfd\x15\x05\xda\x0f\xd8\x13\x81\x13\xc4I\x8cOOa')
Illegal character in Name Object (b'/\xee\xc0\xf3\xfdj\xac\xeb\x84\x1f;!\xb4\xf8\x86J\xa9\xe2:0\xa4')


DeprecationError: addBookmark is deprecated and was removed in PyPDF2 3.0.0. Use add_outline_item instead.
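
An outline entry in a merged PDF needs the zero-based index of the page where each source document starts. That index is the running total of the page counts of all documents before it, not the position of the file in the list. The sketch below shows only this arithmetic; `outline_start_pages` is a hypothetical helper, and it assumes the page count of each intermediate PDF is already known.

```python
def outline_start_pages(page_counts):
    """Return the zero-based first page of each document in the merged PDF."""
    starts = []
    total = 0
    for count in page_counts:
        starts.append(total)  # this document begins where the previous ones end
        total += count
    return starts

# Three intermediate PDFs with 3, 1 and 5 pages
print(outline_start_pages([3, 1, 5]))
```

For the three example documents, the entries would point at pages 0, 3 and 4 of the merged file.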

## Create PDF with pandoc

WiP

In [9]:
!pandoc -s programming/Programmieren1.de.html -o output.pdf --toc

The SVG /private/var/folders/29/66ymvpfx7j7dxrkh4vzx5zyc0000gn/T/tex2pdf.-bbc607c786810983/d9088db889f93d7c57c17520ee1d8a39d1bbe06a.svg has no dimensions
The SVG /private/var/folders/29/66ymvpfx7j7dxrkh4vzx5zyc0000gn/T/tex2pdf.-bbc607c786810983/d9088db889f93d7c57c17520ee1d8a39d1bbe06a.svg has no dimensions
Error producing PDF.
! Package svg Error: File `d9088db889f93d7c57c17520ee1d8a39d1bbe06a_svg-tex.pdf
' is missing.

See the svg package documentation for explanation.
Type  H <return>  for immediate help.
 ...                                              
                                                  
l.101 ...db889f93d7c57c17520ee1d8a39d1bbe06a.svg}}



# Create PDF with Beautiful Soup and pdfkit

Does not work so well.

### 1.  **Extract Headlines from HTML Files:**

-   You'll need to parse the HTML files to extract the headline tags (`<h1>`,  `<h2>`, etc.) and use them to generate the table of contents.
-   The  `BeautifulSoup`  library from  `bs4`  is perfect for parsing HTML.

### 2.  **Create the Table of Contents:**

-   Use the extracted headlines to create a TOC in HTML format, with links to the corresponding sections.

### 3.  **Insert the TOC at the Beginning of the Combined HTML File:**

-   Add the generated TOC to the beginning of your combined HTML file before converting it to PDF.

### 4.  **Convert to PDF:**

-   Once you have the HTML with the TOC, convert it to PDF as before.

Here’s a Python script that implements these steps:

#### **Step 1: Install Necessary Libraries**

You’ll need to install the required libraries:

In [6]:
pip install beautifulsoup4 pdfkit

Defaulting to user installation because normal site-packages is not writeable
Collecting bs4
  Downloading bs4-0.0.2-py2.py3-none-any.whl.metadata (411 bytes)
Downloading bs4-0.0.2-py2.py3-none-any.whl (1.2 kB)
Installing collected packages: bs4
Successfully installed bs4-0.0.2
Note: you may need to restart the kernel to use updated packages.


#### Step 2: Python Script to Generate TOC and Convert HTML to PDF

In [5]:
import os
from bs4 import BeautifulSoup
import pdfkit

# Function to extract headlines from HTML
def extract_headlines(html_content, file_index):
    soup = BeautifulSoup(html_content, 'html.parser')
    headlines = []
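    # Passing a list of tag names returns the headings in document order,
    # so the TOC follows the reading order instead of being grouped by level
    for header in soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6']):
        # Create an anchor link for each headline
        anchor = f"section_{file_index}_{len(headlines)}"
        header['id'] = anchor
        headlines.append((header.text.strip(), header.name, anchor))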
    return headlines, str(soup)

# Function to gather all HTML files recursively
def gather_html_files(directory):
    html_files = []
    for root, _, files in os.walk(directory):
        for file in files:
            if file.endswith('.ipynb.html'):
                html_files.append(os.path.join(root, file))
    return sorted(html_files)  # Sorting ensures the order is preserved

# Specify the top-level folder containing the HTML files
top_level_folder = 'csharp'
output_html = 'combined_with_toc.html'

# Gather all HTML files recursively
html_files = gather_html_files(top_level_folder)

toc_entries = []
body_content = ""

# Generate the TOC and combine HTML files
for i, filepath in enumerate(html_files):
    with open(filepath, 'r', encoding='utf-8') as infile:
        content = infile.read()
        headlines, updated_html = extract_headlines(content, i)
        toc_entries.extend(headlines)
        body_content += updated_html
        body_content += '<div style="page-break-after: always;"></div>'

# Create the TOC HTML structure
toc_html = '<h1>Table of Contents</h1><ul>'
for text, tag, anchor in toc_entries:
    toc_html += f'<li><a href="#{anchor}">{text}</a></li>'
toc_html += '</ul><div style="page-break-after: always;"></div>'

# Place the TOC inside <body>, ahead of the combined pages
full_html_content = (
    "<html><head><title>Document with TOC</title></head><body>"
    + toc_html + body_content + "</body></html>"
)

# Write the combined HTML with TOC to a file
with open(output_html, 'w', encoding='utf-8') as outfile:
    outfile.write(full_html_content)

# Convert the combined HTML file with TOC to PDF
pdfkit.from_file(output_html, 'output_with_toc.pdf')

print("PDF with Table of Contents has been generated as 'output_with_toc.pdf'.")

PDF with Table of Contents has been generated as 'output_with_toc.pdf'.



### **Requirements:**

1.  **wkhtmltopdf:**  Make sure  `wkhtmltopdf`  is installed on your system for  `pdfkit`  to work. You can download it from  wkhtmltopdf.org.
2.  **HTML Structure:**  Ensure that your HTML files are well-formed, with proper heading tags for the TOC to be generated accurately.

This script will produce a PDF with a generated Table of Contents at the beginning, linking to all the headlines within your HTML files.
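
The `(text, tag, anchor)` tuples collected by `extract_headlines` carry the heading tag, which the flat TOC does not use. As a minimal sketch, the tag could drive the indentation of each entry. The helper `build_toc_html`, the inline `margin-left` style and the 1.5em step are assumptions for illustration; how the indentation ends up looking in the PDF depends on the renderer.

```python
from html import escape

def build_toc_html(toc_entries):
    """Render (text, tag, anchor) tuples as a TOC list indented by heading level."""
    items = []
    for text, tag, anchor in toc_entries:
        level = int(tag[1])          # 'h2' -> 2
        indent = (level - 1) * 1.5   # em per level, an arbitrary choice
        items.append(
            f'<li style="margin-left: {indent}em;">'
            f'<a href="#{anchor}">{escape(text)}</a></li>'
        )
    return '<h1>Table of Contents</h1><ul>' + ''.join(items) + '</ul>'

entries = [("Intro", "h1", "section_0_0"), ("A < B", "h2", "section_0_1")]
print(build_toc_html(entries))
```

Escaping the heading text keeps characters such as `<` from breaking the generated markup.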