
https://youtu.be/k1jjDnBzgGk

# **Base64 Tutorial in Python**

Base64 is a way to represent *binary data* using text characters.
Computers store everything as binary (0s and 1s), but often we want to send or store that data in systems that only reliably handle text (like JSON, XML, or email bodies).

Base64 solves this by mapping binary data into 64 printable characters:
 - A-Z, a-z, 0-9, +, and /  (with '=' used for padding).

# Key points:
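- It makes binary data safe to transmit in text form.
- Every 3 input bytes become 4 output characters, so the encoded data is about 33% larger than the original.
- With standard padding, the output length is always a multiple of 4 characters.
- Padding '=' is added when the input length is not a multiple of 3 bytes.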
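
The length rule in the key points can be checked directly. The sketch below is only an illustration: it encodes inputs of 0 to 6 zero bytes (arbitrary placeholder data) and records the length of each result.

```python
import base64
import math

# Encode inputs of increasing size and record the encoded length.
lengths = {}
for n in range(7):
    data = bytes(n)                      # n zero bytes as placeholder input
    encoded = base64.b64encode(data)
    lengths[n] = len(encoded)
    # Every started group of 3 input bytes yields 4 output characters.
    assert len(encoded) == 4 * math.ceil(n / 3)
    print(n, "bytes ->", len(encoded), "characters:", encoded)
```

Inputs of 1, 2, or 3 bytes all produce 4 characters; only the amount of `=` padding differs.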

# Example 1: ASCII string

In [None]:
import base64

text = "Hi"
print(type(text))  # <class 'str'>

# Now, let us convert this text to base64

In [None]:
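# Passing a str directly fails, so we catch the error to keep the cell running:
try:
    base64.b64encode(text)
except TypeError as e:
    print("TypeError:", e)  # a bytes-like object is required, not 'str'

# Base64 conversion operates on binary data (not on strings or NumPy arrays),
# so we first need to convert our string to bytes.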

In [None]:
text_bytes = text.encode("utf-8")
print(type(text_bytes))  # <class 'bytes'>
encoded_text = base64.b64encode(text_bytes)

print("Base64 of 'Hi':", encoded_text)
print("Note that the b in b'SGk=' indicates a bytes object - as opposed to a string")
print(" ")

In [None]:
decoded_text = base64.b64decode(encoded_text).decode("utf-8")
print("Decoded back:", decoded_text)

# Explanation of padding:
# "Hi" -> SGk=
# Why '='? Base64 turns every 3 input bytes into 4 output characters.
# "Hi" is only 2 bytes, so the last group is incomplete and one '=' pads the
# output to a multiple of 4. '=' carries no data; it only marks the padding.
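
To see where `SGk=` comes from, the encoding can be reproduced by hand. This is a minimal sketch for illustration only, not how the `base64` module is implemented; `ALPHABET`, `bits`, `groups`, and `manual` are names introduced here.

```python
import base64
import string

# The standard Base64 alphabet: index 0-63 maps to one character.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

data = "Hi".encode("utf-8")                       # 2 bytes
bits = "".join(f"{byte:08b}" for byte in data)    # 16 bits as a string of 0s and 1s
bits += "0" * (-len(bits) % 6)                    # add zero bits up to a multiple of 6
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]

# Each 6-bit group is an index into the alphabet.
chars = "".join(ALPHABET[int(group, 2)] for group in groups)
manual = chars + "=" * (-len(chars) % 4)          # pad the text to a multiple of 4

print(groups)
print(manual)
assert manual == base64.b64encode(data).decode("ascii")
```

Two bytes give 16 bits, which need three 6-bit groups (with two filler zero bits), so three characters are produced and one `=` completes the block of four.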

# Example 2: A longer string with spaces

In [None]:

longer_text = "Hello, how are you?"
encoded_longer = base64.b64encode(longer_text.encode("utf-8"))
print("\nOriginal:", longer_text)
print("Base64 encoded:", encoded_longer)
print(" ")

decoded_longer = base64.b64decode(encoded_longer).decode("utf-8")
print("Decoded back:", decoded_longer)

# Example 3: Non-ASCII characters (Unicode)

In [None]:

# Let's try with an emoji and a non-Latin character
unicode_text = "Python 🐍 is fun! 你好"

print("\nOriginal (Unicode):", unicode_text)

# If we try to encode with ASCII, it will fail because 🐍 and 你好 are not part of ASCII
try:
    ascii_bytes = unicode_text.encode("ascii")
except UnicodeEncodeError as e:
    print("\nError when trying ASCII encoding:")
    print(e)

### Continuation of Example 3: Non-ASCII characters (Unicode)
#### Correct way: encode with UTF-8

In [None]:

encoded_unicode = base64.b64encode(unicode_text.encode("utf-8"))
print("\nBase64 encoded (UTF-8):", encoded_unicode)

# Decode it back
decoded_unicode = base64.b64decode(encoded_unicode).decode("utf-8")
print("Decoded back:", decoded_unicode)

# Notice:
# - Base64 itself doesn't care what characters you use;
#   it only works on bytes.
# - The key is how you first encode your text into bytes (ASCII fails, UTF-8 works).
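
Because Base64 only sees bytes, the same text encoded with two different character encodings gives two different Base64 strings. The sketch below illustrates this with the word "café" (an example chosen here, not taken from the notebook) in UTF-8 and Latin-1.

```python
import base64

word = "café"

# The same text becomes different bytes under different character encodings...
utf8_b64 = base64.b64encode(word.encode("utf-8"))
latin1_b64 = base64.b64encode(word.encode("latin-1"))
print("UTF-8   :", utf8_b64)
print("Latin-1 :", latin1_b64)

# ...so the receiver must decode the bytes with the encoding the sender used.
right = base64.b64decode(utf8_b64).decode("utf-8")
wrong = base64.b64decode(utf8_b64).decode("latin-1")
print("Decoded with UTF-8  :", right)
print("Decoded with Latin-1:", wrong)
```

Base64 decoding succeeds in both cases; only the final bytes-to-text step goes wrong when the encodings do not match.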


# Practical Example 1: Encode and Decode an Image (or a numpy array)

In [None]:
# Let's use skimage to load a real image
# and matplotlib to display it.

from skimage import io
import matplotlib.pyplot as plt
import numpy as np
import base64

# Load the image
image = io.imread("/content/drive/MyDrive/ColabNotebooks/data/Ki-67/Ki-67.jpg")
plt.imshow(image)
plt.axis("off")
plt.title("Original Image")
plt.show()




In [None]:
# --- Convert NumPy array directly to bytes and then to Base64 ---
# Note that Base64 conversion works with bytes, not NumPy arrays
image_bytes = image.tobytes()
image_base64 = base64.b64encode(image_bytes)
print("\nFirst 200 characters of Base64 encoded image:\n", image_base64[:200])

In [None]:
# --- Decode Base64 back to bytes and reconstruct the NumPy array ---
# The raw bytes carry no shape or dtype, so both are taken from the original array.
decoded_bytes = base64.b64decode(image_base64)
decoded_image = np.frombuffer(decoded_bytes, dtype=image.dtype).reshape(image.shape)

# Display the decoded image
plt.imshow(decoded_image)
plt.axis("off")
plt.title("Decoded Image from Base64 (NumPy bytes)")
plt.show()
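
The reconstruction above works only because `image.dtype` and `image.shape` are still available in the notebook. If the Base64 text is sent elsewhere, that metadata has to travel with it. The following is a minimal sketch of one way to do this with JSON; the helper names `array_to_json` and `array_from_json` and the payload keys are hypothetical, and a small synthetic array stands in for the image.

```python
import base64
import json

import numpy as np


def array_to_json(arr):
    """Pack an array's bytes (as Base64 text) together with its shape and dtype."""
    payload = {
        "shape": arr.shape,                # tuple, stored as a JSON list
        "dtype": str(arr.dtype),           # e.g. "uint8"
        "data": base64.b64encode(arr.tobytes()).decode("ascii"),
    }
    return json.dumps(payload)


def array_from_json(text):
    """Rebuild the array from the JSON text produced by array_to_json."""
    payload = json.loads(text)
    raw = base64.b64decode(payload["data"])
    return np.frombuffer(raw, dtype=payload["dtype"]).reshape(payload["shape"])


# Synthetic stand-in for an RGB image: 2 x 4 pixels, 3 channels.
demo = np.arange(24, dtype=np.uint8).reshape(2, 4, 3)
restored = array_from_json(array_to_json(demo))
print(restored.shape, restored.dtype, np.array_equal(demo, restored))
```

Note that `np.frombuffer` returns a read-only view of the decoded bytes; call `.copy()` on the result if the array needs to be modified.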

# Practical Example 2: Encode and Decode a PDF document


## How PDF Base64 works

- PDFs are **structured binary documents**. They are not simple arrays of numbers like a NumPy image.
- When creating a PDF with **ReportLab**, it generates all the content (text, images, tables, etc.) in a binary format.
- To encode this PDF into Base64, we need the **actual bytes** of the PDF.
- We can't just use `.tobytes()` like with NumPy arrays because PDFs have a specific internal structure (headers, objects, streams).
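
For any file format, the "actual bytes" are simply what is read from the file in binary mode. The sketch below illustrates this with a temporary file; the placeholder content and the file name are assumptions made for the demo and do not form a valid PDF.

```python
import base64
import tempfile
from pathlib import Path

# Placeholder bytes standing in for a real document (not a valid PDF).
placeholder = b"%PDF-1.4\n% placeholder bytes for the demo\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "placeholder.pdf"   # hypothetical file name
    path.write_bytes(placeholder)
    file_bytes = path.read_bytes()         # the file's bytes, exactly as stored

file_b64 = base64.b64encode(file_bytes)
print(file_b64)
assert base64.b64decode(file_b64) == placeholder
```

Base64 does not interpret the file's structure in any way; it encodes whatever bytes it is given.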

### Using a buffer
- We use an **in-memory buffer** (`io.BytesIO`) to capture the PDF bytes:
  ```python
  pdf_buffer = io.BytesIO()
  doc = SimpleDocTemplate(pdf_buffer, pagesize=letter)
  # ... add content ...
  doc.build(story)
  pdf_bytes = pdf_buffer.getvalue()  # raw PDF bytes, ready for Base64 encoding
  ```



## 1. First, let us create a PDF with ReportLab
- **Step 1: Create a blank PDF document**
  - `SimpleDocTemplate` sets up the document; nothing is written (in memory or on disk) until `doc.build(...)` runs.
  - We pass it an in-memory buffer (`io.BytesIO`) that will receive the PDF bytes.
  - Optionally, we can supply a sample style sheet for text formatting (`getSampleStyleSheet()`).
  - At this point, the document is empty.

- **Step 2: Build the content (story)**
  - We create a `story` list that contains the elements we want in the PDF:
    - `Paragraph` for text
    - `Spacer` for vertical spacing
    - `Image` for pictures
    - `Table` for tables
  - Example:
    ```python
    story.append(Paragraph("Title", styles["Title"]))
    story.append(RLImage(img_buffer, width=400, height=400))
    story.append(Paragraph("Caption text", styles["Normal"]))
    ```

- **Step 3: Generate the PDF bytes**
  - `doc.build(story)` takes the story content and writes a fully structured PDF into the buffer.
  - The PDF **already exists in binary form** in the buffer at this point.
  - This is **not Base64** yet; it's the raw PDF bytes.

Now we have a PDF that can be encoded to Base64 and decoded back.

- **Step 4: Encode and decode the PDF bytes with Base64**
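
Before running the full example, it may help to see how an in-memory buffer behaves on its own. This is a minimal sketch using only the standard library, with placeholder bytes in place of a built PDF; `BytesIO` is imported by name so that it does not clash with `skimage.io` used earlier.

```python
import base64
from io import BytesIO

buffer = BytesIO()
buffer.write(b"placeholder bytes standing in for a built PDF")

# getvalue() returns the whole contents, wherever the cursor is.
all_bytes = buffer.getvalue()

# read() starts at the cursor, which sits at the end after writing.
at_cursor = buffer.read()

# Rewinding with seek(0) makes read() return everything again.
buffer.seek(0)
after_seek = buffer.read()

encoded = base64.b64encode(all_bytes)
print(len(all_bytes), len(at_cursor), len(after_seek))
print(encoded)
```

This is why the code below calls `seek(0)` on the image buffer before handing it to another reader, and uses `getvalue()` when it needs the complete PDF bytes.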


In [None]:
!pip install reportlab

In [None]:

# --- Practical Example 2: Create a PDF with an Image, Encode to Base64, Save and Decode ---
from reportlab.platypus import SimpleDocTemplate, Paragraph, Image as RLImage, Spacer
from reportlab.lib.pagesizes import letter
from reportlab.lib.styles import getSampleStyleSheet
import io as sysio

# Step 1: Create a PDF and save to memory buffer
pdf_buffer = sysio.BytesIO()
doc = SimpleDocTemplate(pdf_buffer, pagesize=letter)
styles = getSampleStyleSheet()
story = []

# Add a title
story.append(Paragraph("Example histo image", styles["Title"]))
story.append(Spacer(1, 20))

# Save matplotlib image to buffer
img_buffer = sysio.BytesIO()
plt.imsave(img_buffer, image)  # save as PNG into memory
img_buffer.seek(0)

# Get original image dimensions
h, w = image.shape[:2]

# Define maximum display width (in points; 1 point = 1/72 inch)
max_display_width = 400
scale = max_display_width / w
display_width = w * scale
display_height = h * scale

# Add image to PDF with preserved aspect ratio
story.append(RLImage(img_buffer, width=display_width, height=display_height))
story.append(Spacer(1, 20))

# Add explanatory text
caption_text = (
    "The sample has been stained with Ki-67 IHC stain, "
    "a nuclear protein that is expressed in all actively dividing cells. "
    "It is a marker of cell proliferation and is used in cancer diagnosis and prognosis."
)
story.append(Paragraph(caption_text, styles["Normal"]))

# Build PDF
doc.build(story)   # produces the actual PDF bytes in the buffer. It's not Base64.

# Step 2: Save PDF to disk so we can actually open it
with open("example_with_image.pdf", "wb") as f:
    f.write(pdf_buffer.getvalue())
print("\nPDF saved as 'example_with_image.pdf'")

################# Demonstration of base64 encoding and decoding ###################
# Step 3: Encode PDF to Base64
# Note that this Base64 encoding is just an optional extra step for transmitting the PDF as text.
# Just for demo purposes for this tutorial.
pdf_base64 = base64.b64encode(pdf_buffer.getvalue())
print("\nFirst 200 characters of Base64 encoded PDF:\n", pdf_base64[:200])


# Step 4: Decode base64 back and save as another PDF
decoded_pdf_bytes = base64.b64decode(pdf_base64)
with open("decoded_example_with_image.pdf", "wb") as f:
    f.write(decoded_pdf_bytes)
print("Decoded PDF saved as 'decoded_example_with_image.pdf'")
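
A typical reason to Base64-encode a file is to place it inside a text format such as JSON. The sketch below shows one possible round trip with a check that nothing changed along the way. The payload bytes, the `filename` and `content` keys, and the file name are all assumptions made for the demo.

```python
import base64
import hashlib
import json

# Placeholder binary payload: every possible byte value once.
original = bytes(range(256))

# b64encode returns bytes; decode to str so it can go into JSON.
b64_text = base64.b64encode(original).decode("ascii")
message = json.dumps({"filename": "example.bin", "content": b64_text})

# Receiving side: parse the JSON and decode the Base64 text.
received = json.loads(message)
restored = base64.b64decode(received["content"])

# Compare checksums to confirm the round trip was lossless.
same = hashlib.sha256(restored).digest() == hashlib.sha256(original).digest()
print("Original size:", len(original), "bytes")
print("Base64 size  :", len(b64_text), "characters")
print("Round trip OK:", same)
```

The encoded text is larger than the original bytes, which is the price of making binary data safe for text-only channels.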
