<center>
<img src="https://sparklewerk.com/collections/bganpunks/bganpunks_poster_2_by_1.png" alt="demo hypermap" />
</center>




## Introduction

This is a Jupyter notebook. You can run this MIT licensed open source code for free on Google's Colab.

This notebook hypermaps [the BASTARD GAN PUNKS V2 NFT collection](https://bastardganpunks.club/). For more information on hypermaps, please visit [sparklewerk.com](https://sparklewerk.com/projects/hypermaps).

This is free code running on free compute. If you are dissatisfied, we will have customer service get you a full refund. If, on the other hand, you have constructive feedback, please join the conversation on GitHub: [hypermap issues](https://github.com/ManyHands/hypermap/issues).

## License

Copyright (c) 2022, Many Hands SPC. All Rights Reserved.

Licensed under the MIT License (the "License").


In [1]:
# Licensed under the MIT License (the "License");
#
# MIT License
#
# Copyright (c) 2022, Many Hands SPC
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

## Set up

### Constants

All the configuration constants are defined first. Constants are control switches for configuring this machine before it runs.


#### 2D plots

Computing the 2D renderings takes a few minutes. If you are only interested in the 3D, there is a switch to disable all the 2D compute.

In [2]:
# To save time the entire 2D rendering part of this can be disabled:
render_2d_plots = False

# How big to make the final Image the NFTs will be plotted into
# Twitter allows as big as (4096, 4096)
canvas_dimensions = (4096, 4096) # Maximum allowed on Twitter images 
                  # (4096, 2304) # 16:9 is for Twitter card large images
                  # (4096, 2048) # 2:1

# Similar to CSS padding:
canvas_padding_percentage = 0.075

# How big to plot the bastards on that canvas:
bastards_into_canvas_width = bastards_into_canvas_height = 96
bastards_into_canvas_size = (bastards_into_canvas_width, bastards_into_canvas_height)

#### UMAP input size

Two constants control how small the bastards are resized for processing. The images start as (1024, 1024) squares and end as (X, X) squares:
- `bastards_into_umap_size`: (int, int) for UMAP processing
- `bastards_into_projector_size`: (int, int) for the Projector viz sprite sheet


In [3]:
bastards_into_umap_width = bastards_into_umap_height = 24
bastards_into_umap_size = (bastards_into_umap_width, bastards_into_umap_height)

#### Projector sprite size
By experimentation, it seems that for `bastards_into_projector_size`, (96, 96) would be a decent downsampling size, except that Projector rejects it:

- (24, 24) is too small to see much detail. It does work but meh…
- (48, 48) kinda works but meh
- (64, 64) is a round size (1024px/16 = 64px)
- (96, 96) is too big for the Projector sprite sheet

Although (96, 96) is a nice size (details show well), Projector refuses to accept a sprite sheet that large. The largest it will accept is (8192, 8192), and a 100 x 100 grid of (96, 96) sprites for 10K bastards would be (9600, 9600). [TODO: could subset punks before UMAPing]
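
The size limit above comes down to simple arithmetic. Here is a minimal sketch, assuming a square grid and taking the (8192, 8192) ceiling from the observation above rather than from any documented limit. The names `sheet_side` and `MAX_SHEET_SIDE` are made up for this illustration:

```python
import math

# Assumption: the largest sprite sheet side Projector accepted in our experiments.
MAX_SHEET_SIDE = 8192

def sheet_side(num_sprites: int, sprite_px: int) -> int:
    """Side length in pixels of the smallest square grid holding num_sprites."""
    sprites_per_side = math.isqrt(num_sprites - 1) + 1  # ceiling of the square root
    return sprites_per_side * sprite_px

for sprite_px in (24, 48, 64, 96):
    side = sheet_side(10_000, sprite_px)
    verdict = "fits" if side <= MAX_SHEET_SIDE else "too big"
    print(f"({sprite_px}, {sprite_px}) sprites -> ({side}, {side}) sheet: {verdict}")
```

The same helper can be used to check whether a grid big enough for all of the static images would still fit.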


In [4]:
bastards_into_projector_witdth = bastards_into_projector_height = 24
bastards_into_projector_size = (bastards_into_projector_witdth, bastards_into_projector_height)

### Installs


In [5]:
# TODO: there are ways to detect if umap-learn has already been installed. 
# Would make this a wee faster on repeat runs.
!pip install umap-learn

Collecting umap-learn
  Downloading umap-learn-0.5.2.tar.gz (86 kB)
     |████████████████████████████████| 86 kB 3.1 MB/s 
Collecting pynndescent>=0.5
  Downloading pynndescent-0.5.6.tar.gz (1.1 MB)
     |████████████████████████████████| 1.1 MB 27.9 MB/s 
Building wheels for collected packages: umap-learn, pynndescent
  Building wheel for umap-learn (setup.py) ... done
  Created wheel for umap-learn: filename=umap_learn-0.5.2-py3-none-any.whl size=82708 sha256=eb04ef8e5cbf58cf5d87967ba10692d5f4a9da2d1169fb315f221eefb5d00d55
  Stored in directory: /root/.cache/pip/wheels/84/1b/c6/aaf68a748122632967cef4dffef68224eb16798b6793257d82
  Building wheel for pynndescent (setup.py) ... done
  Created wheel for pynndescent: filename=pynndescent-0.5.6-py3-none-any.whl size=53943 sha256=9a615ea59920163c523d58c476f130a64d1fee86e5dd45a104a8b3fcbce1e4d7
  Stored in directory: /root/.cache/pip/wheels/03/f1/56/f80d72741e400345b5a5b50ec3d929aca581bf45e0225d5c50
Successfull
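
The TODO in the install cell can be handled with the standard library. A minimal sketch, assuming the import name is `umap` (as used in the Imports cell); `is_installed` is a helper invented for this illustration:

```python
import importlib.util

def is_installed(module_name: str) -> bool:
    """True if a top-level module can be found, without actually importing it."""
    return importlib.util.find_spec(module_name) is not None

if is_installed("umap"):
    print("umap-learn already installed, skipping pip")
else:
    print("umap-learn missing; in a notebook cell, run: !pip install umap-learn")
```

On a fresh Colab runtime the check would report the package as missing, so the install would still run once per runtime.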

### Imports

In [6]:
%matplotlib inline
import datetime
import numpy as np
import os
import pandas as pd
import umap

from IPython.display import display
from PIL import Image, ImageDraw
from ipywidgets import IntProgress
from math import trunc
from matplotlib import pyplot as plt
from numpy import asarray
from packaging import version
from skimage.color import rgb2gray

### TensorFlow set up

#### GPU detection

At least one of the hypermapping algorithms, [UMAP](https://umap-learn.readthedocs.io/en/latest/), knows how to use server-side GPUs, and Colab provides such toys for free.

In the menubar, select `Runtime→Change Runtime Type`, then
select GPU from the Hardware Accelerator drop-down.


First, let's see if we have an Nvidia GPU.

In the following code cell, if the result is:
```
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.
```
then the Runtime is not set to GPU.

In [7]:
!nvidia-smi

Thu Feb  3 11:24:01 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   71C    P8    31W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [8]:
try:
  # This tensorflow_version is a Colab-only thing
  %tensorflow_version 2.x
except Exception:
  pass

import tensorflow as tf
print("TensorFlow version: ", tf.__version__)

assert version.parse(tf.__version__).release[0] >= 2, \
    "This notebook requires TensorFlow 2.0 or above."

from tensorboard.plugins import projector
%load_ext tensorboard

# TODO: print tensorboard version?

tensorboard_data_dump_dir = '/content/tensorboard_data'
if not os.path.exists(tensorboard_data_dump_dir):
  os.makedirs(tensorboard_data_dump_dir)

TensorFlow version:  2.7.0


Not all GPUs on Colab are Nvidia models. So the check above may not have found an Nvidia GPU, yet there may still be another manufacturer's GPU. Here is how to check:

In [9]:
%tensorflow_version 2.x
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Found GPU at: /device:GPU:0


Hey, if you really want to continue without a GPU, you _could_, but it will be slow. Just comment out the above `raise SystemError()` and rerun all.

## Data wrangling

The full bastards image collection can be found in the allbastards.com repo on GitHub.


### Inspect data

First let's make sure we're parsing the data correctly.

In [10]:
# If repo has already been cloned, doing so again will error so let's not
if not os.path.isdir('allbastards.com'):
  !git clone https://github.com/rkalis/allbastards.com.git

Cloning into 'allbastards.com'...
remote: Enumerating objects: 86718, done.[K
remote: Counting objects: 100% (68380/68380), done.[K
remote: Compressing objects: 100% (11489/11489), done.[K
remote: Total 86718 (delta 57667), reused 67365 (delta 56887), pack-reused 18338[K
Receiving objects: 100% (86718/86718), 195.29 MiB | 17.29 MiB/s, done.
Resolving deltas: 100% (60838/60838), done.
Checking out files: 100% (22680/22680), done.


In [11]:
!ls allbastards.com/public/img/full | wc -l

11306


In [12]:
path, dirs, files = next(os.walk("allbastards.com/public/img/full"))
file_count = len(files)
print(file_count)
print(f'files[0] = "{files[0]}"')
print(f'Trimmed  = "{files[0][:-5]}"')

11306
files[0] = "4333.webp"
Trimmed  = "4333"
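
The `[:-5]` slice works because every file name ends in the five characters `.webp`. A slightly more defensive sketch uses `os.path.splitext`, so the extension length does not matter. `id_from_filename` is a hypothetical helper, not part of this notebook:

```python
import os

def id_from_filename(filename: str) -> int:
    """Turn a file name such as '4333.webp' into the integer ID 4333."""
    stem, _extension = os.path.splitext(os.path.basename(filename))
    return int(stem)

print(id_from_filename("4333.webp"))
print(id_from_filename("allbastards.com/public/img/full/42.webp"))
```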


In [13]:
def get_bastard_by_id(an_id: int):
  a_bastard = Image.open(os.path.join("allbastards.com/public/img/full",f'{an_id}.webp'))
  return a_bastard

The calmAF bastards (read: static webp files) are all (1024, 1024).

In [None]:
%%time 
# this can take about 15 seconds

calm_ids = []
hyped_ids = []

# TODO: Need datastructure that has IDs and such not just image. PyTink? Derp: https://github.com/ManyHands/hypermap/issues/25
for file in files:
    a_bastard = Image.open(os.path.join("allbastards.com/public/img/full",file))
    if a_bastard.is_animated:
      hyped_ids.append(int(file[:-5]))
    else:
      calm_ids.append(int(file[:-5]))
                
print(f'calms: {len(calm_ids)}')
print(f'hypes: {len(hyped_ids)}')

Let's use Punk #42 as our poster child. Nice

In [None]:
a_bastard_filename = os.path.join('allbastards.com/public/img/full', '42.webp')
a_bastard = Image.open(a_bastard_filename) 

width, height = a_bastard.size
print('size = (', width, ',', height, ')')
print('format = ', a_bastard.format)
print('mode = ', a_bastard.mode)

plt.imshow(np.asarray(a_bastard))

An image of size (1024, 1024) is 2 to the 10th pixels on each side, so about a megapixel in total. We need to downsample that before presenting the data to UMAP. Also, the bastards are color images (R, G, B), but UMAP wants a simple scalar per feature, so the color needs to be grayscaled. (Notice how the axis numbering changes.)

In [None]:
print(f'Downsampling to {bastards_into_umap_size}')
a_bastard_downsized = a_bastard.resize(bastards_into_umap_size)
a_bastard_downsized_grayed = rgb2gray(np.asarray(a_bastard_downsized))

plt.imshow(a_bastard_downsized_grayed, interpolation='nearest', cmap='gray')
plt.show()
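
For intuition about what the grayscaling does to each pixel, here is a pure-Python sketch. It assumes the luminance weights given in the scikit-image documentation for `rgb2gray` (0.2125, 0.7154, 0.0721) and 8-bit input. `gray_value` is a name made up for this illustration, and the real function works on whole arrays at once:

```python
# Assumed weights, as documented for skimage.color.rgb2gray.
RED_WEIGHT, GREEN_WEIGHT, BLUE_WEIGHT = 0.2125, 0.7154, 0.0721

def gray_value(red: int, green: int, blue: int) -> float:
    """Collapse one 8-bit (R, G, B) pixel to a single luminance in [0, 1]."""
    weighted = RED_WEIGHT * red + GREEN_WEIGHT * green + BLUE_WEIGHT * blue
    return weighted / 255.0

for name, pixel in [("black", (0, 0, 0)), ("white", (255, 255, 255)), ("pink", (255, 0, 162))]:
    print(f"{name}: {gray_value(*pixel):.4f}")
```

Green dominates the result, so two bastards that differ mostly in red or blue will look more alike to UMAP than they do to the eye.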

And then we flatten() the images to make each one a row of the 2D array to be presented to UMAP.

In [None]:
print(a_bastard_downsized_grayed.flatten()[0])
print(a_bastard_downsized_grayed.flatten()[254])
print(type(a_bastard_downsized_grayed.flatten()[254]))

Yup, that's the correct data type.

## UMAP

First we need to do the learning part. Then, once we have a trained model, we embed and visualize the data.

### Vectorize data

Next we manipulate the data in prep for feeding it to UMAP.

We need to present all the bastards to Projector in the structure it wants, which is a 2D array. That array is a list of all the bastards to be projected. Each bastard has to be recast as a 1D feature vector, each feature a single number. So, each 2D image gets reshaped to a 1D array, and each color pixel (R,G,B) gets grayscaled to a single value.

In [None]:
%%time
# This seems to take on Colab: ~2.5m

def load_calm_bastards_from_repo():
  # TODO: Surely there is some elegant pythonic way of doing this.

  # Just create the column names first, one for each pixel
  number_of_pixels = bastards_into_umap_width * bastards_into_umap_height
  feat_cols = [ 'pixel'+str(i) for i in range(number_of_pixels) ]

  calms_images = np.zeros((len(calm_ids), number_of_pixels))
  print(f'Shape of calm images: {calms_images.shape}')

  idx = 0
  for file in files:
    a_bastard = Image.open(os.path.join("allbastards.com/public/img/full",file))
    # TODO: let's get a progress bar going here. We do know how many files to process.
    if not a_bastard.is_animated:
      a_smaller_bastard = a_bastard.resize(bastards_into_umap_size)
      a_bastard_grayed = rgb2gray(asarray(a_smaller_bastard)) # This normalizes to [0..1]
      calms_images[idx] = a_bastard_grayed.flatten()
      idx = idx + 1
        
  return pd.DataFrame(calms_images,columns=feat_cols)

calms = load_calm_bastards_from_repo()

In [None]:
# Optionally, peek inside the DataFrame
calms
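
To connect a column name back to a pixel: flattening is row-major, so `pixelN` is the pixel at row `N // width`, column `N % width` of the downsized image. A small sketch, assuming the (24, 24) UMAP input size set earlier and NumPy's default row-major `flatten()` order. The helper names are invented for this illustration:

```python
UMAP_WIDTH = UMAP_HEIGHT = 24  # mirrors bastards_into_umap_size

def flat_index(row: int, column: int, width: int = UMAP_WIDTH) -> int:
    """Position of pixel (row, column) inside the flattened feature vector."""
    return row * width + column

def pixel_position(index: int, width: int = UMAP_WIDTH) -> tuple:
    """Inverse of flat_index: the (row, column) behind column 'pixel<index>'."""
    return divmod(index, width)

print(UMAP_WIDTH * UMAP_HEIGHT)  # number of feature columns per bastard
print(pixel_position(254))
print(flat_index(10, 14))
```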

### Visualize embedding

All the above was data wrangling; now it's time to crunch some numbers.

Note: we are not setting a random seed (see the docs for [random_state](https://umap-learn.readthedocs.io/en/latest/reproducibility.html)). This way is faster, but the plots will look different between runs. We are not aiming for reproducible science papers.

This next one-liner function is where the UMAP happens for the server-side number crunch. TensorBoard Projector does its own UMAP client-side, on the browser's GPU.


In [None]:
def embed_data():
  return umap.UMAP(n_neighbors=20, min_dist=0.1, n_components=2).fit_transform(calms)

#### 2D hypermap the calmAFs

First let us plot some 2D static images before getting into 3D and/or interactive implementations.

In [None]:
def show_simple_scatterplot(): 
  """
  Plots all bastards in 2D space as blue dots, no images
  """
  subset_of_embedding = embedding #[0:100] TODO: if want subsetting, need IDs for Projector
  fig = plt.figure(figsize=(8, 8))
  plt.scatter(subset_of_embedding[:,0], subset_of_embedding[:,1], s=1)
  plt.show()

The min to max ranges of the X and Y values are the bounding box of the plot.

In [None]:
%%time
if render_2d_plots:
  embedding = embed_data()

  print('({}, {})'.format(np.min(embedding[:,0]), np.max(embedding[:,0])))
  print('({}, {})'.format(np.min(embedding[:,1]), np.max(embedding[:,1])))
  
  show_simple_scatterplot()

In [None]:
def generate_canvas():
  """
  Plot a vertical gradient between two colors. Do it first and it
  is an analogy to a CSS gradient background.
  Via https://stackoverflow.com/a/63138452
  """
  # color_top = '#66023c'
  # color_bottom = '#0a0006'
  color_top = color_bottom = '#ff00a2'

  canvas = Image.new('RGB', canvas_dimensions, color_top)
  #base = Image.new('RGB', (width, height), colour1)
  top_coat = Image.new('RGB', canvas_dimensions, color_bottom)
  mask = Image.new('L', canvas_dimensions)
  mask_data = []
  for y in range(canvas_dimensions[1]):
    mask_data.extend([int(255 * (y / canvas_dimensions[1]))] * canvas_dimensions[0])

  # print(f'mask.size: {mask.size} a.k.a {mask.size[0] * mask.size[1]}, len(mask_data): {len(mask_data)}')

  mask.putdata(mask_data)
  canvas.paste(top_coat, (0, 0), mask)
  return canvas

def scatter_bastards_in_2d(invert_vertically: bool = False):
  #cv2_anvas_dimensions = ((1024+512), (1024+512))
  canvas_width = canvas_dimensions[0]
  canvas_height = canvas_dimensions[1]

  x_min = np.min(embedding[:,0])
  x_max = np.max(embedding[:,0])
  x_delta = x_max - x_min
  x_factor = canvas_width / x_delta

  y_min = np.min(embedding[:,1]) 
  y_max = np.max(embedding[:,1])
  y_delta = y_max - y_min
  y_factor = canvas_height / y_delta

  print(f'X: ({x_min}, {x_max})')
  print(f'Y: ({y_min}, {y_max})')

  canvas = generate_canvas()

  def translate_to_canvas(x, y):
    pad_percentage = canvas_padding_percentage
    x_dest = trunc((x - x_min) * x_factor * (1 - (2*pad_percentage))) + trunc((pad_percentage * canvas_width))

    if invert_vertically:
      y_dest = canvas_height - ( trunc((y - y_min) * y_factor * (1 - (2*pad_percentage))) + trunc((pad_percentage * canvas_height)) )
    else:
      y_dest = trunc((y - y_min) * y_factor * (1 - (2*pad_percentage))) + trunc((pad_percentage * canvas_height))

    # Need to center image on the (X,Y), not upper left on the (X,Y)
    x_dest -= bastards_into_canvas_width // 2
    y_dest -= bastards_into_canvas_height // 2

    # if idx < 20:
    #  print(f'x: {x}, y: {y}, x_dest: {x_dest}, y_dest: {y_dest}')
    return (x_dest, y_dest)

  idx = 0
  # TODO: nasty, shouldn't be reading file again. weak
  for file in files:
    a_bastard = Image.open(os.path.join("allbastards.com/public/img/full",file))
    if not a_bastard.is_animated:
      a_smaller_bastard = a_bastard.resize(bastards_into_canvas_size)
      location = translate_to_canvas(embedding[idx,0], embedding[idx,1])
      # if idx < 20:
      #   print(location)
      canvas.paste(a_smaller_bastard, location) #, mask=a_smaller_bastard)
      idx = idx + 1

  print(f'number of files plotted: {idx}')
  return canvas
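
The coordinate translation inside `scatter_bastards_in_2d` is easier to follow on one axis at a time. A minimal sketch of the same arithmetic, with made-up embedding bounds of 0.0 to 10.0; `to_canvas_pixel` is a name introduced here for illustration:

```python
from math import trunc

def to_canvas_pixel(value, value_min, value_max, canvas_px, pad_percentage):
    """Map an embedding coordinate onto one canvas axis, leaving padding at both ends."""
    scale = canvas_px / (value_max - value_min)
    usable_fraction = 1 - 2 * pad_percentage
    return trunc((value - value_min) * scale * usable_fraction) + trunc(pad_percentage * canvas_px)

# Smallest, middle, and largest embedding values on a 4096 px axis with 7.5% padding
for value in (0.0, 5.0, 10.0):
    print(value, to_canvas_pixel(value, 0.0, 10.0, 4096, 0.075))
```

The notebook then subtracts half the sprite size from the result so that each image is centered on its point rather than hung from its upper-left corner.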


In [None]:
# TODO: implement progress bars for slow cells

# from ipywidgets import IntProgress
# from IPython.display import display
# import time

#max_count = 100

#progress_bar = IntProgress(min=0, max=max_count) # instantiate the bar
#display(progress_bar) # display the bar

#count = 0
#while count <= max_count:
#    progress_bar.value += 1 # signal to increment the progress bar
#    # time.sleep(.1)
#    count += 1

In [None]:
%%time
# This has been seen to take:
# NFTs plotted at 
# -   (64, 64): ~2.0m -- ~2.5m
# - (128, 128): ~2.5m

if render_2d_plots:
  print(f'start scattering at {datetime.datetime.now()}')
  scattered_bastards = scatter_bastards_in_2d(invert_vertically = False)
  print(f'done scattering at {datetime.datetime.now()}')
  display(scattered_bastards)
  print(f'done displaying at {datetime.datetime.now()}')


#### 3D hypermap the calmAFs

Next, feed the data into TensorBoard's Embedding Projector (or simply, Projector). 



##### Sprite sheet

To actually show the images floating in a 2D or 3D space, TensorBoard Projector requires a sprite sheet which contains a sprite for each image to be projected.

For now we're just using the calmAFs (static images), not the hypedAFs (animated GIFs). There are 10459 calms and 847 hypeds. The sprite sheet needs to be square, so let's just use the first 10000, for a 100 x 100 sprite sheet. [TODO: plot all 10459 calms.]

The sprite sheet can be a PNG or a JPEG. (Not sure if an animated GIF will work in Projector.) So, for just-the-calms we'll go PNG.

In [None]:
def create_sprite_sheet():
  spritesheet_square_length = 100 # 10,000 = 100 x 100 in spritesheet
  master_width = bastards_into_projector_witdth * spritesheet_square_length
  master_height = bastards_into_projector_height * spritesheet_square_length
  spriteimage = Image.new(
    mode='RGBA',
    size=(master_width, master_height),
    color=(0,0,0,0) # fully transparent
  )

  # This CUT_OFF_LIMIT is a vile hack. Spritesheet must be square. Padding needed, but not now
  CUT_OFF_LIMIT = 10000 # TODO: remove this hack, sprite up ENTIRE collection

  # Fill the sheet row by row: sprite N lands in row N // 100, column N % 100
  for punk_index in range(CUT_OFF_LIMIT):
    a_punk = get_bastard_by_id(calm_ids[punk_index]).resize(bastards_into_projector_size)
    row, column = divmod(punk_index, spritesheet_square_length)
    h_loc = bastards_into_projector_height * row
    w_loc = bastards_into_projector_witdth * column
    spriteimage.paste(a_punk, (w_loc, h_loc))

  return spriteimage
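
The `divmod` call is what turns a running sprite index into a grid cell. A standalone sketch of that placement rule, assuming the 100-per-row sheet and (24, 24) sprites configured in this notebook; `sprite_offset` is a hypothetical helper:

```python
def sprite_offset(index: int, sprites_per_row: int = 100, sprite_px: int = 24) -> tuple:
    """Upper-left (x, y) pixel of sprite number `index`, filling rows left to right."""
    row, column = divmod(index, sprites_per_row)
    return (column * sprite_px, row * sprite_px)

# First sprite, end of the first row, start of the second row, last sprite
for index in (0, 99, 100, 9999):
    print(index, sprite_offset(index))
```

Projector reads the sheet in the same row-by-row order, which is why the sprite order has to match the row order of the embedding data.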

In [None]:
%%time
# This cell has been seen to take ~2min for (128, 128)

def write_files_for_tensorboard():
  # First, generate spritesheet for Projector to use downsampled sprites
  sprite_sheet = create_sprite_sheet()
  sprite_filename = os.path.join(tensorboard_data_dump_dir, 'embeddings/sprite.png')
  if not os.path.exists(os.path.dirname(sprite_filename)):
    os.makedirs(os.path.dirname(sprite_filename))
  sprite_sheet.save(sprite_filename)

  # Next the data for the dimensionality reducers (UMAP, t-SNE, PCA) to crunch on.
  # Take exactly 10000 rows so there is one vector per sprite in the 100 x 100 sheet.
  vectorized_punks = tf.Variable(calms[0:10000])
  checkpoint = tf.train.Checkpoint(embedding=vectorized_punks)
  checkpoint.save(os.path.join(tensorboard_data_dump_dir, 'embedding.ckpt'))


  config = projector.ProjectorConfig()
  embedder = config.embeddings.add()

  embedder.tensor_name = 'embedding/.ATTRIBUTES/VARIABLE_VALUE'

  embedder.sprite.image_path = sprite_filename
  embedder.sprite.single_image_dim.extend(bastards_into_projector_size)

  projector.visualize_embeddings(tensorboard_data_dump_dir, config)

write_files_for_tensorboard()
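
Projector can also label each point if the config names a metadata file, which is one way to carry the bastard IDs along. A hedged sketch, assuming one ID per sprite in the same order as the rows of the embedding. It writes to a temporary directory, uses stand-in IDs, and `write_id_metadata` is a name invented here. Wiring it in would take one more line in the function above, `embedder.metadata_path = 'metadata.tsv'`, with the file placed in the log directory; this notebook does not currently do that:

```python
import os
import tempfile

def write_id_metadata(ids, directory, filename="metadata.tsv"):
    """Write one ID per line; a single-column metadata file carries no header row."""
    path = os.path.join(directory, filename)
    with open(path, "w") as metadata_file:
        for an_id in ids:
            metadata_file.write(f"{an_id}\n")
    return path

example_ids = [4333, 42, 7]  # stand-ins for calm_ids[0:10000]
with tempfile.TemporaryDirectory() as scratch_dir:
    metadata_path = write_id_metadata(example_ids, scratch_dir)
    with open(metadata_path) as metadata_file:
        written_lines = metadata_file.read().splitlines()
print(written_lines)
```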

##### Projector 

**Bug in TensorBoard launch**

**NOTE:** TensorBoard regularly fails to find the data just written to the file system. If so just rerun the following cell; that usually gets it to wake up and get to work.

Also note that:
- "Fetching tensor values…" normally takes a minute or two
- "Fetching sprite image…" normally takes a minute
- Then PCA will run automatically
- When PCA is done, click on UMAP or tSNE for other hypermap algorithms that will each provide a different view of the collection.

☝ ↑ ☝ ↑ ☝ ↑ ☝ ↑ ☝ ↑ ☝ 

In [None]:
# If this results in "No datasets found" then simply rerun this cell

%reload_ext tensorboard
%tensorboard --logdir={tensorboard_data_dump_dir}

# If this results in "No datasets found" then simply rerun this cell

# TODO: errors seen that need to be worked out
# - "Error parsing tensor bytes" 
# - "Error parsing tensor bytes RangeError: byte length of Float32Array should be a multiple of 4"

## History

- v1.1.1 (2022-01-30, John Tigue):
  - Cleaned up and made public for first time
- v1.1.0 (2022-01-29, John Tigue):
  - 2D canvas of any size via constants
  - 2D canvas gets background gradient
  - 2D canvas padding as constant
- v1.0.3 (2022-01-21, John Tigue):
  - UMAP to 2D at (128,128) on (4096, 4096) for Twitter
  - Add gradient background
  - Add labeling
- v1.0.2 (2022-01-20, John Tigue): 
  - UMAP'ed at (48px, 48px)
  - Projected at (64px, 64px) 
- v1.0.1 (2022-01-11, John Tigue): 
  - Added Intro text
