<h1 style="text-align: center;">
<div style="color: #DD3403; font-size: 60%">Data Science DISCOVERY Project #1</div>
<span style="">Project #1: Mosaic Project</span>
<div style="font-size: 60%;"><a href="https://discovery.cs.illinois.edu/guides/project-mosaic/">https://discovery.cs.illinois.edu/guides/project-mosaic/</a></div>
</h1>

<hr style="color: #DD3403;">

In [13]:
#imports
import pandas as pd
import DISCOVERY
from numpy import sqrt
from numpy import square 

# Section 1: Find the Average Color of Image

Building off Section 3, we now want to find the average color of **ANY** image.  To do that, we need to create a function that is given a DataFrame `image` and returns the average color of that image.

Write this in the function `findAverageImageColor` below.  You must return the **average color as a dictionary** with the values:
- `avg_r`, for the average red color,
- `avg_g`, for the average green color,
- `avg_b`, for the average blue color

A **dictionary** is a data structure that stores multiple values, each under a named key. Compute `avg_r`, `avg_g`, and `avg_b`, then collect them in a dictionary just like you would in the simulation:

---
```py
def findAverageImageColor(image):
  ...

  # Return a dictionary of average color:
  d = {"avg_r" : avg_r, "avg_g": avg_g, "avg_b": avg_b}
  return d
```
---

Write the entire `findAverageImageColor` function to find the average color of the `image` passed into the function:

In [14]:
def findAverageImageColor(image):
    avg_r = image.r.mean()
    avg_g = image.g.mean()
    avg_b = image.b.mean()
    d = {"avg_r": avg_r, "avg_g": avg_g, "avg_b": avg_b}
    return d
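
As a quick sanity check, the sketch below applies the same column-mean approach to a hypothetical two-pixel image. It assumes the DataFrame has `r`, `g`, and `b` columns like the ones `DISCOVERY.df_image` produces; the pixel values are made up for illustration.

```python
import pandas as pd

def findAverageImageColor(image):
    # Same approach as above: the mean of each color column.
    return {"avg_r": image.r.mean(), "avg_g": image.g.mean(), "avg_b": image.b.mean()}

# Hypothetical 2-pixel image: one pure red pixel and one pure blue pixel.
sample = pd.DataFrame([
    {"x": 0, "y": 0, "r": 255, "g": 0, "b": 0},
    {"x": 1, "y": 0, "r": 0, "g": 0, "b": 255},
])

avg = findAverageImageColor(sample)
print(avg["avg_r"], avg["avg_g"], avg["avg_b"])  # 127.5 0.0 127.5
```

Averaging a red and a blue pixel gives a color halfway between them, which is what the dictionary should report.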

<hr style="color: #DD3403;">

# Section 2: Splitting Up Your Base Image

To create a mosaic from an image, we must split the base image into small regions to be replaced with the tile images.  To accomplish this, we need a function that will **find the subset of pixels found in a region of an image**.

- Thinking about the 3x3 pixel image `sample.png` (from Section 1), we might need a 2x2 square (or 1x3 rectangle) of pixels instead of using all 3x3 pixels.

### Your `findImageSubset` function

Create a function `findImageSubset` that finds the subset of the image starting at (`x`, `y`), spanning `width` pixels wide and `height` pixels tall. Your function should return the **subset of all the pixels in that region of the image**.

- Example: `findImageSubset(image, x=0, y=0, width=3, height=3)` -- returns subset of all the pixels in the square defined by: x=0...2 and y=0...2 (9 total pixels)

- Example: `findImageSubset(image, x=5, y=5, width=5, height=5)` -- returns subset of all the pixels in the square defined by: x=5...9 and y=5...9 (25 total pixels)

- Example: `findImageSubset(image, x=5, y=0, width=5, height=5)` -- returns subset of all the pixels in the square defined by: x=5...9 and y=0...4 (25 total pixels)

In [15]:
def findImageSubset(image, x, y, width, height):
    return image[(image.x >= x) & (image.y >= y) & (image.x <= (width + x-1)) & (image.y <= (height + y-1))]
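
One way to check the boundary logic is to run it on a small synthetic image. The sketch below builds a hypothetical 10x10 DataFrame (all pixel values are made up) and reproduces the third example above. It writes the upper bounds as `< x + width`, which is equivalent to `<= x + width - 1` for integer pixel coordinates.

```python
import pandas as pd

def findImageSubset(image, x, y, width, height):
    # Keep pixels with x in [x, x + width - 1] and y in [y, y + height - 1].
    in_x = (image.x >= x) & (image.x < x + width)
    in_y = (image.y >= y) & (image.y < y + height)
    return image[in_x & in_y]

# Hypothetical 10x10 image where every pixel is the same gray.
pixels = [{"x": px, "y": py, "r": 128, "g": 128, "b": 128}
          for px in range(10) for py in range(10)]
image = pd.DataFrame(pixels)

subset = findImageSubset(image, x=5, y=0, width=5, height=5)
print(len(subset))                     # 25
print(subset.x.min(), subset.x.max())  # 5 9
print(subset.y.min(), subset.y.max())  # 0 4
```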

<hr style="color: #DD3403;">

# Section 3: Finding the Average Color of a Subset

You have created two functions:

1. A function that finds the average color of a DataFrame of pixels (`findAverageImageColor`), **AND**
2. A function that finds a subset of pixels of an image (`findImageSubset`)

Create a new function, `findAverageImageSubsetColor` that combines both of them and returns the average color of a subset of the image:


In [16]:
def findAverageImageSubsetColor(image, x, y, width, height):
  # Find the subset:
  subset = findImageSubset(image, x, y, width, height)

  # Find and return the average color of the subset:
  average_color = findAverageImageColor(subset)
  return average_color
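
To illustrate why the subset matters, here is a minimal sketch on a hypothetical 4x2 image whose left half is red and right half is blue. The helper functions are restated in compact form so the example runs on its own; the image data is invented.

```python
import pandas as pd

def findAverageImageColor(image):
    return {"avg_r": image.r.mean(), "avg_g": image.g.mean(), "avg_b": image.b.mean()}

def findImageSubset(image, x, y, width, height):
    return image[(image.x >= x) & (image.x < x + width) &
                 (image.y >= y) & (image.y < y + height)]

def findAverageImageSubsetColor(image, x, y, width, height):
    return findAverageImageColor(findImageSubset(image, x, y, width, height))

# Hypothetical 4x2 image: columns x=0..1 are red, columns x=2..3 are blue.
rows = []
for px in range(4):
    for py in range(2):
        r, g, b = (255, 0, 0) if px < 2 else (0, 0, 255)
        rows.append({"x": px, "y": py, "r": r, "g": g, "b": b})
image = pd.DataFrame(rows)

left = findAverageImageSubsetColor(image, x=0, y=0, width=2, height=2)
whole = findAverageImageSubsetColor(image, x=0, y=0, width=4, height=2)
print(left["avg_r"], left["avg_b"])    # 255.0 0.0
print(whole["avg_r"], whole["avg_b"])  # 127.5 127.5
```

The left region averages to pure red, while the whole image averages to a red-blue blend, so each region of a base image can be matched to a different tile.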

<hr style="color: #DD3403;">

# Section 4: Finding the Average Color of Your Tile Images

To create an image mosaic, we need to find the average pixel color of every one of our tile images so that you can find the BEST tile image to use when you're creating your mosaic.  The code below is already complete and does the following:

- Goes through each image file in your `tiles` directory,
- Finds the average pixel color of each image using your `findAverageImageColor` function from Section 1,
- Finally, creates a new DataFrame `df_tiles` with the average color of each image and returns that DataFrame.

In [17]:
def createTilesDataFrame(path):
  data = []

  # Loop through all images in the `path` directory:
  for tileImageFileName in DISCOVERY.listTileImagesInPath(path):
    # Load the image as a DataFrame and find the average color:
    image = DISCOVERY.df_image(tileImageFileName)
    averageColor = findAverageImageColor(image)

    # Store the fileName and average colors in a dictionary:
    d = { "fileName": tileImageFileName, "r": averageColor["avg_r"], "g": averageColor["avg_g"], "b": averageColor["avg_b"] }
    data.append(d)

  # Create the `df_tiles` DataFrame:
  df_tiles = pd.DataFrame(data)
  return df_tiles
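
The last step relies on pandas turning a list of dictionaries into a DataFrame, with one row per dictionary and one column per key. A minimal sketch with hypothetical file names and made-up average colors:

```python
import pandas as pd

# Hypothetical entries standing in for real tile images and their average colors.
data = [
    {"fileName": "tiles/red.jpg", "r": 250.0, "g": 10.0, "b": 5.0},
    {"fileName": "tiles/sky.jpg", "r": 120.0, "g": 180.0, "b": 240.0},
]

df_example = pd.DataFrame(data)
print(list(df_example.columns))  # ['fileName', 'r', 'g', 'b']
print(len(df_example))           # 2
```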


<hr style="color: #DD3403;">

# Section 5: Finding the Best Match

This function needs to find the best tile for a given average color.

To do this, you will use two pieces of data:

1. You will use the DataFrame of all of your tile images that is generated in the previous section.  This will be passed into your function as `df_tiles`.
2. You will use the average color of a subset of your image.  This is passed into your function as `r_avg`, `g_avg`, and `b_avg`.

Using this data, `findBestTile` must **find the best tile image from `df_tiles` for a given average color**.  You should do this by finding the single row in `df_tiles` that has the smallest distance from the average color.

## Example 

Imagine you just have three tiles, so your `df_tiles` DataFrame is:

| fileName | r | g | b |
| -------- | - | - | - |
| red.jpg | 255 | 0 | 0 |
| green.jpg | 0 | 255 | 0 |
| blue.jpg | 0 | 0 | 255 |


If your image subset has `avg_r` = 10, `avg_g` = 200, and `avg_b` = 20, we can use the distance formula (the Pythagorean theorem) to find how "far away" the average color is from each of the tile images:

1. For the red tile (255, 0, 0), the distance away is $d = \sqrt{(255 - 10)^2 + (0 - 200)^2  + (0 - 20)^2} = 316.8990375$
2. For the green tile (0, 255, 0), the distance away is $d = \sqrt{(0 - 10)^2 + (255 - 200)^2  + (0 - 20)^2} = 59.37171044$
3. For the blue tile (0, 0, 255), the distance away is $d = \sqrt{(0 - 10)^2 + (0 - 200)^2  + (255 - 20)^2} = 308.7474696$

We find that the green tile is the closest since it has the minimum distance.  The green tile should be returned.
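
The three distances above can be reproduced with a few lines of pandas and NumPy. This is a sketch of the arithmetic only, using the three-tile table from the example. Selecting the winner with `idxmin` is one possible approach, not necessarily the one your `findBestTile` has to use.

```python
import pandas as pd
from numpy import sqrt, square

# The three example tiles from the table above.
df_example = pd.DataFrame([
    {"fileName": "red.jpg", "r": 255, "g": 0, "b": 0},
    {"fileName": "green.jpg", "r": 0, "g": 255, "b": 0},
    {"fileName": "blue.jpg", "r": 0, "g": 0, "b": 255},
])
avg_r, avg_g, avg_b = 10, 200, 20

# Euclidean distance in (r, g, b) space from each tile to the average color.
distance = sqrt(square(df_example.r - avg_r)
                + square(df_example.g - avg_g)
                + square(df_example.b - avg_b))

best = df_example.loc[distance.idxmin(), "fileName"]
print(distance.round(4).tolist())  # [316.899, 59.3717, 308.7475]
print(best)                        # green.jpg
```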

In [18]:
def findBestTile(df_tiles, r_avg, g_avg, b_avg):
    # Euclidean distance from each tile's average color to the target color:
    distance = sqrt(square(df_tiles.r - r_avg) + square(df_tiles.g - g_avg) + square(df_tiles.b - b_avg))
    # Return the closest tile as a one-row DataFrame, leaving `df_tiles` unmodified:
    return df_tiles.loc[[distance.idxmin()]]

<hr style="color: #DD3403;">

# Section 6: The Mosaic!

Time to put everything together!

First, let's define some variables that you can configure to make your mosaic uniquely yours:

In [19]:
# What is your base image file name?
baseImageFileName = "base_pic.png"

# What folder contains your tile images?
# - You can change this so you can have multiple different folders of tile images.
tileImageFolder = "tiles-small"

# What is the maximum number of tiles your mosaic should use across?
# - More tiles across will increase the quality of the final image.
# - More tiles across will cause your program to run slower.
# ...if you have bugs, start this value low (it won't look great, but it will make it run fast!)
# ...a value around 200 usually looks quite good, but play around with this number!
maximumTilesX = 400

# What height should your tiles be in your mosaic?
# - A larger tile image will result in a larger output file.
# - A larger tile image will result in your program running slower.
# - A larger tile image will result in more detail in the output file.
tileHeight = 32


## Now create your mosaic!

Run the code to create your mosaic.

- This will run fastest if your laptop is plugged in (when it's unplugged, your laptop will try to save power and may not run at full speed).

## Part 1: Generate the `df_tiles` DataFrame from your tile images

In [20]:
print(f"Creating `df_tiles` from tile images in folder `{tileImageFolder}`...")
df_tiles = createTilesDataFrame(tileImageFolder)
print(f"...found {len(df_tiles)} tile images!")
df_tiles

Creating `df_tiles` from tile images in folder `tiles-small`...
...found 1842 tile images!


|  | fileName | r | g | b |
| - | -------- | - | - | - |
| 0 | tiles-small/278650051_169229558786664_23136402... | 152.429688 | 122.986328 | 101.108398 |
| 1 | tiles-small/22159225_138521260106208_672594900... | 67.987305 | 73.602539 | 81.616211 |
| 2 | tiles-small/10535127_833547633333738_386499366... | 85.172852 | 84.816406 | 77.621094 |
| 3 | tiles-small/359357347_781723997080208_13769219... | 123.908203 | 146.911133 | 147.132812 |
| 4 | tiles-small/22582587_578540435821648_361648968... | 68.535156 | 79.321289 | 63.847656 |
| ... | ... | ... | ... | ... |
| 1837 | tiles-small/237069581_824918014831557_66084702... | 89.825195 | 66.640625 | 54.106445 |
| 1838 | tiles-small/11201717_815219745233834_107517816... | 151.489258 | 105.729492 | 81.719727 |
| 1839 | tiles-small/14240953_1797618713789836_89786530... | 114.096680 | 89.266602 | 78.627930 |
| 1840 | tiles-small/47271387_785542371785089_127948009... | 74.506836 | 69.931641 | 64.718750 |


## Part 2: Loading your `baseImage`

⚠️: If you are using **an extremely large image**, this may take a while to run (particularly on older laptops or if you're not plugged in).  You may want to resize your base image if this is taking a long time to run.  An image ~1000px across may take anywhere from 10 seconds to 5 minutes.

In [21]:
print(f"Loading your base image `{baseImageFileName}`...")
baseImage = DISCOVERY.df_image(baseImageFileName)
width = baseImage.x.max()
height = baseImage.y.max()

baseImage

Loading your base image `base_pic.png`...


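|  | x | y | r | g | b |
| - | - | - | - | - | - |
| 0 | 0 | 0 | 199 | 211 | 233 |
| 1 | 0 | 1 | 199 | 211 | 233 |
| 2 | 0 | 2 | 199 | 211 | 233 |
| 3 | 0 | 3 | 199 | 211 | 233 |
| 4 | 0 | 4 | 199 | 211 | 233 |
| ... | ... | ... | ... | ... | ... |
| 12192763 | 3023 | 4027 | 100 | 105 | 72 |
| 12192764 | 3023 | 4028 | 102 | 106 | 70 |
| 12192765 | 3023 | 4029 | 95 | 99 | 62 |
| 12192766 | 3023 | 4030 | 92 | 96 | 60 |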


## Part 3: Create a mosaic by finding the best match

⚠️: If you are using a large value for `maximumTilesX` (set at the beginning of this section), this may take a long time.
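
To get a feel for how `maximumTilesX` drives the amount of work, here is a rough sketch of the grid arithmetic that the cell below performs. The `width` and `height` values are assumptions inferred from this notebook's outputs: `x` tops out at 3023, and `y` is taken to top out at 4031, which is consistent with the progress total printed during the run.

```python
import math

# Assumed values of baseImage.x.max() and baseImage.y.max() for this base image.
width, height = 3023, 4031
maximumTilesX = 400

# Each mosaic tile covers a pixelsPerTile x pixelsPerTile square of the base image.
pixelsPerTile = math.ceil(width / maximumTilesX)
tilesX = math.floor(width / pixelsPerTile)
tilesY = math.floor(height / pixelsPerTile)

print(pixelsPerTile)    # 8
print(tilesX, tilesY)   # 377 503
print(tilesX * tilesY)  # 189631
```

Every one of those tiles requires a subset lookup over the full base image plus a best-tile search, so halving `maximumTilesX` cuts the number of tiles to roughly a quarter.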

In [23]:
import sys

print(f"Finding best replacement image for each tile...")
# Find the pixelsPerTile to know the pixels used in the base image per mosaic tile:
import math

pixelsPerTile = int(math.ceil(width / maximumTilesX))
width = int(math.floor(width / pixelsPerTile) * pixelsPerTile)
height = int(math.floor(height / pixelsPerTile) * pixelsPerTile)
tilesX = int(width / pixelsPerTile)
tilesY = int(height / pixelsPerTile)

# Create the mosaic:
from PIL import Image
mosaic = Image.new('RGB', (int(tilesX * tileHeight), int(tilesY * tileHeight)))
for x in range(0, width, pixelsPerTile):
  for y in range(0, height, pixelsPerTile):
    avg_color = findAverageImageSubsetColor(baseImage, x, y, pixelsPerTile, pixelsPerTile)
    replacement = findBestTile(df_tiles, avg_color["avg_r"], avg_color["avg_g"], avg_color["avg_b"])

    tile = DISCOVERY.getTileImage(replacement["fileName"].values[0], tileHeight)
    mosaic.paste(tile, (int(x / pixelsPerTile) * tileHeight, int(y / pixelsPerTile) * tileHeight))

  # Print out a progress message:
  curRow = int((x / pixelsPerTile) + 1)
  pct = (curRow / tilesX) * 100
  sys.stdout.write(f'\r  ...progress: {curRow * tilesY} / {tilesX * tilesY} ({pct:.2f}%)')

# Save it
mosaic.save('mosaic-hd.jpg')

# Save a smaller one (for posting):
import PIL
d = max(width, height)
factor = d / 4000
if factor <= 1: factor = 1

small_w = width / factor
small_h = height / factor    
# Use a separate name so `baseImage` is not overwritten (this cell can then be re-run):
webMosaic = mosaic.resize( (int(small_w), int(small_h)), resample=PIL.Image.LANCZOS )
webMosaic.save('mosaic-web.jpg')

# Print a message:
tada = "\N{PARTY POPPER}"
print("")
print("")
print(f"{tada} MOSAIC COMPLETE! {tada}")
print("- See `mosaic-hd.jpg` to see your HD mosaic!")
print("- See `mosaic-web.jpg` to see a mosaic best suited for the web!")

Finding best replacement image for each tile...
  ...progress: 9054 / 189631 (4.77%)

KeyboardInterrupt: 

<hr style="color: #DD3403;">

# Section 10: Extra Design - Adding a Frame

In [185]:
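# Adding a frame
from PIL import ImageOps

def frame(image, frame_size=200, frame_color=(153, 204, 255)):
  # Pad `image` with a solid border of `frame_size` pixels on every side:
  return ImageOps.expand(image, border=frame_size, fill=frame_color)

# Keep the framed result so it can be displayed or saved:
bordered_mosaic = frame(mosaic)
bordered_mosaic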
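
For a sense of what `ImageOps.expand` does to the image dimensions, the sketch below applies it to a tiny stand-in image rather than the real mosaic. The 10x6 white image and the 2-pixel border are hypothetical values chosen to keep the numbers easy to read.

```python
from PIL import Image, ImageOps

# Hypothetical stand-in for the mosaic: a small solid-white image.
small = Image.new("RGB", (10, 6), (255, 255, 255))

# Add a 2-pixel border in the same light-blue frame color used above.
framed = ImageOps.expand(small, border=2, fill=(153, 204, 255))

print(framed.size)              # (14, 10)
print(framed.getpixel((0, 0)))  # (153, 204, 255)
print(framed.getpixel((2, 2)))  # (255, 255, 255)
```

The border is added on all four sides, so both dimensions grow by twice the border width.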

<hr style="color: #DD3403;">