# Create portable version

<img src="Export_jupiter.webp" style="width:280px; height:280px;">

This page describes, and actually implements, how to make the Jupyter notebooks of the whole project portable.

# Recursive HTML exporting and link replacement

This approach is currently in use to create linked, working HTML pages.

To achieve this, you can write a Python script that:

1. Recursively searches for `.ipynb` files in a directory.
2. Exports each `.ipynb` file to an HTML file.
3. Replaces links within the HTML files that point to `.ipynb` files with `.html` links (so that the links work correctly in a browser).

You can use the following libraries:
- `os` or `pathlib` for file traversal.
- `nbconvert` for converting `.ipynb` to `.html`.
- `re` for regular expressions to replace links in HTML content.
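
As a minimal sketch of step 1, the traversal could be written with `pathlib` instead of `os.walk`. The helper name `find_notebooks` is hypothetical, and skipping `.ipynb_checkpoints` folders is an assumption about what you want; the script further down does not do this.

```python
from pathlib import Path


def find_notebooks(root):
    """Return all .ipynb files below root, sorted, skipping Jupyter checkpoint copies."""
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*.ipynb")
        # Jupyter stores autosave copies in .ipynb_checkpoints; exporting them is rarely wanted
        if ".ipynb_checkpoints" not in path.parts
    )
```

Calling `find_notebooks(Path.cwd())` returns `Path` objects that can be passed directly to `open()` or to a conversion function.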

## Prerequisites

### Install the Required Modules

Install the necessary modules with the following commands, either in a notebook cell (as shown here) or in your terminal or command prompt.

1. **Install `nbconvert`**: This module is used to convert Jupyter notebooks to HTML.

In [5]:
pip install nbconvert

Collecting nbconvert
  Downloading nbconvert-7.16.4-py3-none-any.whl (257 kB)
Collecting nbclient>=0.5.0
  Downloading nbclient-0.10.0-py3-none-any.whl (25 kB)
Collecting tinycss2
  Downloading tinycss2-1.3.0-py3-none-any.whl (22 kB)
Collecting bleach!=5.0.0
  Downloading bleach-6.1.0-py3-none-any.whl (162 kB)
Collecting nbformat>=5.7
  Downloading nbformat-5.10.4-py3-none-any.whl (78 kB)
Collecting markupsafe>=2.0
  Downloading MarkupSafe-2.1.5-cp310-cp310-macosx_10_9_x86_64.whl (14 kB)
Collecting jinja2>=3.0
  Downloading jinja2-3.1.4-py3-none-any.whl (133 kB)

2. **Install `nbformat`**: This module is used to read and write Jupyter notebook files.

In [6]:
pip install nbformat

You should consider upgrading via the '/usr/local/bin/python3.10 -m pip install --upgrade pip' command.
Note: you may need to restart the kernel to use updated packages.



Here's an outline of the script:

### Script: Convert and Modify Links in HTML

In [5]:
import os
import nbformat
from nbconvert import HTMLExporter
import re

def convert_ipynb_to_html(ipynb_file):
    # Load the notebook
    with open(ipynb_file, 'r', encoding='utf-8') as f:
        notebook = nbformat.read(f, as_version=4)
    
    # Convert to HTML
    html_exporter = HTMLExporter()
    (body, resources) = html_exporter.from_notebook_node(notebook)
    
    # Define the HTML filename
    html_filename = os.path.splitext(ipynb_file)[0] + '.html'
    
    # Write the HTML file in the same directory as the ipynb file
    with open(html_filename, 'w', encoding='utf-8') as f:
        f.write(body)
    
    return html_filename

def replace_ipynb_links_in_html(html_file, root_dir):
    with open(html_file, 'r', encoding='utf-8') as f:
        content = f.read()
    
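    # Replace .ipynb links with relative .html links.
    # [^"\']*? keeps each match inside a single href value, so a match
    # cannot start at one link and end at a later one on the same line.
    updated_content = re.sub(
        r'(?<=href=["\'])([^"\']*?\.ipynb)(?=["\'])',
        lambda match: make_relative_html_link(match.group(1), root_dir),
        content
    )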
    
    # Write the updated content back to the file
    with open(html_file, 'w', encoding='utf-8') as f:
        f.write(updated_content)

def make_relative_html_link(ipynb_link, root_dir):
    # Leave external links (http://, https://, ...) untouched
    if '://' in ipynb_link:
        return ipynb_link

    # Convert .ipynb link to corresponding .html link
    html_link = os.path.splitext(ipynb_link)[0] + '.html'

    # Resolve the link against the CWD to get an absolute path
    absolute_html_path = os.path.abspath(html_link)

    # Remove the root directory (CWD) from the absolute path to make it relative
    relative_html_path = os.path.relpath(absolute_html_path, start=root_dir)

    # Browsers expect forward slashes in links, even on Windows
    return relative_html_path.replace(os.sep, '/')

def recursive_convert_and_replace_links(root_dir):
    # Recursively find .ipynb files
    for subdir, dirs, files in os.walk(root_dir):
        for file in files:
            if file.endswith('.ipynb'):
                ipynb_path = os.path.join(subdir, file)
                
                # Convert to HTML and save it in the same directory
                html_file = convert_ipynb_to_html(ipynb_path)
                
                # Replace .ipynb links with relative .html links, removing the cwd from the path
                replace_ipynb_links_in_html(html_file, root_dir)

if __name__ == "__main__":
    # Use the current working directory as the root directory
    root_directory = os.getcwd()
    
    recursive_convert_and_replace_links(root_directory)


### How the Script Works:

1. **Convert `.ipynb` to `.html`**: 
   - The `convert_ipynb_to_html` function uses `nbconvert` to convert a Jupyter Notebook (`.ipynb`) to an HTML file and saves it next to the notebook, in the same directory.

2. **Replace `.ipynb` Links with `.html` Links**:
   - The `replace_ipynb_links_in_html` function reads the generated HTML file, finds any links to `.ipynb` files, and replaces them with `.html` links.
   - This is done using a regular expression that matches the links in the HTML content.

3. **Recursively Process Files**:
   - The `recursive_convert_and_replace_links` function walks through the directory tree (`os.walk`) and processes all `.ipynb` files. It calls the conversion and replacement functions for each file found.
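
To see the link replacement from item 2 in isolation, here is a small sketch that runs on an in-memory string. It is a variant of the script's pattern, not the script itself: the names `IPYNB_HREF` and `rewrite_links` are hypothetical, and keeping `#fragment` parts and skipping external URLs are assumptions about the desired behaviour.

```python
import re

# An href value ending in .ipynb, optionally followed by a #fragment.
# [^"'#?]+ cannot cross a quote, so each match stays inside one attribute.
IPYNB_HREF = re.compile(r'href=(["\'])([^"\'#?]+)\.ipynb(#[^"\']*)?\1')


def rewrite_links(html):
    """Point local .ipynb links at the exported .html files."""
    def swap(match):
        quote, target = match.group(1), match.group(2)
        fragment = match.group(3) or ""
        if "://" in target:
            # External URL: leave the link exactly as it was
            return match.group(0)
        return f"href={quote}{target}.html{fragment}{quote}"

    return IPYNB_HREF.sub(swap, html)


sample = (
    '<a href="intro.ipynb">Intro</a> '
    '<a href="chapter/loops.ipynb#for">Loops</a> '
    '<a href="https://example.com/demo.ipynb">External</a> '
    '<a href="style.css">CSS</a>'
)
print(rewrite_links(sample))
```

Only the first two links in `sample` are rewritten; the external notebook URL and the stylesheet link are left alone.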

### Notes:

- **File Paths**: The script uses the current working directory as `root_directory`, so run it from the project root or change that variable. Each HTML file is written next to its notebook; there is no separate output directory.
- **Dependencies**: You may need to install `nbconvert` and `nbformat` via `pip install nbconvert nbformat`.
  
This script will help you convert all `.ipynb` files to HTML and ensure that the links between notebooks are correctly pointing to the corresponding `.html` files.

# Create PDF with Beautiful Soup and pdfkit

This approach does not work reliably yet: the run below ends with a `wkhtmltopdf` error (`ProtocolUnknownError`).

### 1.  **Extract Headlines from HTML Files:**

-   You'll need to parse the HTML files to extract the headline tags (`<h1>`,  `<h2>`, etc.) and use them to generate the table of contents.
-   The  `BeautifulSoup`  library from  `bs4`  is perfect for parsing HTML.

### 2.  **Create the Table of Contents:**

-   Use the extracted headlines to create a TOC in HTML format, with links to the corresponding sections.

### 3.  **Insert the TOC at the Beginning of the Combined HTML File:**

-   Add the generated TOC to the beginning of your combined HTML file before converting it to PDF.

### 4.  **Convert to PDF:**

-   Once you have the HTML with the TOC, convert it to PDF with `pdfkit`.

Here’s a Python script that implements these steps:

#### **Step 1: Install Necessary Libraries**

You’ll need to install the required libraries:

In [1]:
pip install beautifulsoup4 pdfkit

Collecting beautifulsoup4
  Downloading beautifulsoup4-4.12.3-py3-none-any.whl (147 kB)
Collecting pdfkit
  Downloading pdfkit-1.0.0-py3-none-any.whl (12 kB)
Collecting soupsieve>1.2
  Downloading soupsieve-2.6-py3-none-any.whl (36 kB)
Installing collected packages: pdfkit, soupsieve, beautifulsoup4
Successfully installed beautifulsoup4-4.12.3 pdfkit-1.0.0 soupsieve-2.6
You should consider upgrading via the '/usr/local/bin/python3.10 -m pip install --upgrade pip' command.
Note: you may need to restart the kernel to use updated packages.


#### Step 2: Python Script to Generate TOC and Convert HTML to PDF

In [3]:
import os
from bs4 import BeautifulSoup
import pdfkit

# Function to extract headlines from HTML
def extract_headlines(html_content, file_index):
    soup = BeautifulSoup(html_content, 'html.parser')
    headlines = []
    # Passing a list of tag names returns the headings in document order,
    # so the TOC follows the reading order instead of grouping by level
    for header in soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6']):
        # Create an anchor link for each headline
        anchor = f"section_{file_index}_{len(headlines)}"
        header['id'] = anchor
        headlines.append((header.text.strip(), header.name, anchor))
    return headlines, str(soup)

# Function to gather all HTML files recursively
def gather_html_files(directory):
    html_files = []
    for root, _, files in os.walk(directory):
        for file in files:
            if file.endswith('.ipynb.html'):
                html_files.append(os.path.join(root, file))
    return sorted(html_files)  # Sorting ensures the order is preserved

# Specify the top-level folder containing the HTML files
top_level_folder = 'csharp'
output_html = 'combined_with_toc.html'

# Gather all HTML files recursively
html_files = gather_html_files(top_level_folder)

toc_entries = []
full_html_content = "<html><head><title>Document with TOC</title></head><body>"

# Generate the TOC and combine HTML files
for i, filepath in enumerate(html_files):
    with open(filepath, 'r', encoding='utf-8') as infile:
        content = infile.read()
        headlines, updated_html = extract_headlines(content, i)
        toc_entries.extend(headlines)
        full_html_content += updated_html
        full_html_content += '<div style="page-break-after: always;"></div>'

# Create the TOC HTML structure
toc_html = '<h1>Table of Contents</h1><ul>'
for text, tag, anchor in toc_entries:
    toc_html += f'<li><a href="#{anchor}">{text}</a></li>'
toc_html += '</ul><div style="page-break-after: always;"></div>'

# Add the TOC right after the opening <body> tag (not before <html>), then close the document
full_html_content = full_html_content.replace("<body>", "<body>" + toc_html, 1) + "</body></html>"

# Write the combined HTML with TOC to a file
with open(output_html, 'w', encoding='utf-8') as outfile:
    outfile.write(full_html_content)

# Convert the combined HTML file with TOC to PDF
pdfkit.from_file(output_html, 'output_with_toc.pdf')

print("PDF with Table of Contents has been generated as 'output_with_toc.pdf'.")

OSError: wkhtmltopdf reported an error:
Exit with code 1 due to network error: ProtocolUnknownError
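
One possible cause of this error is that newer `wkhtmltopdf` releases block access to local files by default, or that the combined HTML references a resource with a scheme `wkhtmltopdf` cannot load. Whether that applies here is an assumption, not something verified on this project. The sketch below passes options through `pdfkit` to try exactly that; the helper names `build_pdf_options` and `html_to_pdf` are hypothetical.

```python
def build_pdf_options():
    """Options that pdfkit forwards to wkhtmltopdf as command-line flags."""
    return {
        # Flag without a value: allow the page to load files from the local disk
        "enable-local-file-access": None,
        # Keep going when a referenced resource cannot be loaded
        "load-error-handling": "ignore",
        "encoding": "UTF-8",
    }


def html_to_pdf(html_path, pdf_path):
    # Imported here so the options can be inspected without pdfkit installed
    import pdfkit

    pdfkit.from_file(html_path, pdf_path, options=build_pdf_options())
```

With these helpers, the final conversion step would become `html_to_pdf('combined_with_toc.html', 'output_with_toc.pdf')`. If the error persists, the remaining candidates are links or embedded resources in the exported notebooks themselves.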



### **Requirements:**

1.  **wkhtmltopdf:**  Make sure  `wkhtmltopdf`  is installed on your system for  `pdfkit`  to work. You can download it from  wkhtmltopdf.org.
2.  **HTML Structure:**  Ensure that your HTML files are well-formed, with proper heading tags for the TOC to be generated accurately.

Once `wkhtmltopdf` runs without errors, this script produces a PDF that starts with a generated table of contents linking to every headline in your HTML files.
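
The script stores the heading tag of each TOC entry but renders a flat list. As a sketch of how the level could be shown, the function below indents each entry by its heading level and escapes the headline text. The name `build_indented_toc` and the 1.5 em step are assumptions for illustration; the input format matches the `(text, tag, anchor)` tuples collected in `toc_entries`.

```python
from html import escape


def build_indented_toc(entries):
    """Render (text, tag, anchor) tuples as an HTML list indented by heading level."""
    items = []
    for text, tag, anchor in entries:
        level = int(tag[1])          # 'h2' -> 2
        indent = (level - 1) * 1.5   # left margin in em per heading level
        items.append(
            f'<li style="margin-left: {indent}em;">'
            f'<a href="#{anchor}">{escape(text)}</a></li>'
        )
    return "<h1>Table of Contents</h1><ul>" + "".join(items) + "</ul>"
```

Escaping matters because headline text is inserted into HTML again; a title such as `Intro & Setup` would otherwise produce malformed markup.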