# Cloud of emojis
This notebook renders every emoji found in the comments as a cloud of emojis: each emoji is pasted at a random position on a single canvas.

This notebook works much like the cloud of words, but with emojis only, and it turns out to be more involved than it looks.
We will not use `WordCloud` to generate the cloud, because the library handles emojis poorly: it shows them rotated, with odd spacing, and so on.

Instead, we will build the image manually, and the library `Pillow` is well suited to this task.

**Note**: Like the cloud of words, this notebook uses information from the enriched files. Here we read the `emojis` column, whereas the cloud of words reads the `tokens_wo_stopwords` column generated in the notebook `02_2_enriched_columns.ipynb`.
If the enriched files don't exist, nothing will be displayed, or the notebook may fail.

We start with our standard imports.

In [None]:
from collections import Counter
from tqdm import tqdm
from PIL import Image
import polars as pl
import cairosvg
import random
import os
import sys
import io

sys.path.append(os.path.abspath(os.path.join(os.getcwd(), "..")))
import config
from paths import Paths
from src.emoji_sampler import Sampler

channel_paths = Paths(channel_handle=config.channel_handle)

## Emoji frequencies
In the cloud of words notebook we made use of the lazy API from polars. Here we do the same, but for the `emojis` column of the enriched dataset.

In [2]:
def build_emoji_counter(files: list[str], column: str = "emojis") -> Counter:
    counter = Counter()

    for path in tqdm(files, desc="Building word frequencies"):
        # Scan the emoji column lazily (nothing is read until .collect())
        df = pl.scan_parquet(path).select(column)

        # Explode the list column (each emoji gets its own row)
        df_exploded = df.explode(column)

        # Drop nulls, just in case
        df_filtered = df_exploded.drop_nulls(column)

        # Count emoji frequencies
        token_counts = (
            df_filtered
            .group_by(column)
            .len()
            .rename({column: "emoji", "len": "count"})
            .collect()
        )

        # Update the global counter
        counter.update(dict(zip(token_counts["emoji"], token_counts["count"])))

    return counter

In [3]:
files = channel_paths.list_enriched_files()
counter = build_emoji_counter(files)

Building word frequencies: 100%|██████████| 8/8 [00:00<00:00, 38.36it/s]


In [4]:
print(f'There are a total of {len(counter):_} distinct emojis, and a total of {sum(counter.values()):_} total emojis.')

There are a total of 1_390 distinct emojis, and a total of 1_116_387 total emojis.


## Cloud of emojis, SVG to Binary
Building the cloud of emojis takes several steps:
1) Build the frequencies of emojis
2) Sample a random emoji from the frequency table
3) Get the SVG for the corresponding emoji
4) Create the Image for this SVG, to place into the canvas
5) Repeat from step 2) until all emojis have been sampled

### SVG library
The emojis used here come from Twemoji, the Twitter SVG library ([https://github.com/twitter/twemoji](https://github.com/twitter/twemoji)). It is very complete, although some emojis may be missing.

In [None]:
EMOJI_SVG_FOLDER = os.path.join("..", "assets", "svg")

### SVG to Image
The advantage of this method is that, having only the SVG, we can **generate an emoji of any size**. A PNG file, by contrast, would get pixelated when enlarged.

In [None]:
# Identify the codepoint of the emoji, so that it matches the assets/svg structure
def emoji_to_codepoint(emoji):
    return '-'.join(f"{ord(char):x}" for char in emoji)

# Create image from the identified emoji
def render_svg_to_image(emoji, size=96):

    # Get the codepoint from the emoji
    codepoint = emoji_to_codepoint(emoji)

    # Path from codepoint
    svg_path = os.path.join(EMOJI_SVG_FOLDER, f"{codepoint}.svg")
    
    # Emoji doesn't exist in the library
    if not os.path.exists(svg_path):
        print(f"SVG not found for {emoji} at {svg_path}")
        return None
    
    # Bytes to Image
    png_bytes = cairosvg.svg2png(url=svg_path, output_width=size, output_height=size)
    return Image.open(io.BytesIO(png_bytes)).convert('RGBA')
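
A lookup can also miss when the file name differs from the raw codepoint sequence. To the best of our knowledge, Twemoji drops the variation selector U+FE0F from its file names unless the sequence contains a zero-width joiner (U+200D). The sketch below applies that rule. It assumes `assets/svg` mirrors the upstream Twemoji file names, and `twemoji_codepoint` is a hypothetical name, not part of this project.

```python
ZWJ = "\u200d"                    # zero-width joiner
VARIATION_SELECTOR_16 = "\ufe0f"  # emoji presentation selector


def twemoji_codepoint(emoji: str) -> str:
    """Hypothetical variant of `emoji_to_codepoint` following Twemoji's naming rule."""
    # Drop U+FE0F unless the sequence is joined with ZWJ (assumed upstream convention)
    if ZWJ not in emoji:
        emoji = emoji.replace(VARIATION_SELECTOR_16, "")
    return "-".join(f"{ord(char):x}" for char in emoji)


print(twemoji_codepoint("\u2764\ufe0f"))  # red heart with selector -> 2764
print(twemoji_codepoint("\U0001f600"))    # grinning face -> 1f600
```

If the emojis in the enriched files already come without U+FE0F, this function returns the same result as `emoji_to_codepoint`.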

### Build image with table of frequencies

In [None]:
def draw_emoji_wordcloud(emoji_freq, image_size=(15360, 8640), output_file="emoji_svg_wordcloud.png"):
    canvas = Image.new('RGB', image_size, 'white')
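    seen_svgs = {}  # Cache of rendered emojis (None when the SVG is missing)
    emoji_sampler = Sampler(emoji_freq)

    for emoji in emoji_sampler:
        # Render each distinct emoji only once
        if emoji not in seen_svgs:
            seen_svgs[emoji] = render_svg_to_image(emoji, size=72)
        img = seen_svgs[emoji]

        # Skip emojis without an SVG
        if img is None:
            continue

        # Random position that keeps the emoji fully inside the canvas
        x = random.randint(0, image_size[0] - img.width)
        y = random.randint(0, image_size[1] - img.height)
        canvas.paste(img, (x, y), img)  # Use the emoji's alpha channel as mask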

    canvas.save(output_file)
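
The placement rule used above can be checked in isolation, without Pillow. This is a minimal sketch: `random_top_left` is a hypothetical helper that mirrors the two `random.randint` calls, and the canvas and sprite sizes are the ones used in this notebook.

```python
import random


def random_top_left(canvas_size, sprite_size, rng=random):
    """Pick a top-left corner so the sprite lies fully inside the canvas."""
    canvas_w, canvas_h = canvas_size
    sprite_w, sprite_h = sprite_size
    if sprite_w > canvas_w or sprite_h > canvas_h:
        raise ValueError("sprite is larger than the canvas")
    # randint is inclusive on both ends, so the largest valid offset
    # is exactly canvas - sprite
    x = rng.randint(0, canvas_w - sprite_w)
    y = rng.randint(0, canvas_h - sprite_h)
    return x, y


rng = random.Random(42)
positions = [random_top_left((15360, 8640), (72, 72), rng) for _ in range(1000)]
```

Subtracting the sprite size from the upper bound is what prevents emojis from being clipped at the right and bottom edges.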

### Running the function

In [None]:
output_path = os.path.join(channel_paths.results_dir, f"{config.channel_handle}_emoji_svg_wordcloud.png")
size = (15360, 8640)  # 16K (four times the width and height of 4K)
draw_emoji_wordcloud(counter, image_size=size, output_file=output_path)

SVG not found for 🩵 at ..\assets\svg\1fa75.svg
SVG not found for 🫩 at ..\assets\svg\1fae9.svg
SVG not found for 🩷 at ..\assets\svg\1fa77.svg
SVG not found for 🪾 at ..\assets\svg\1fabe.svg
SVG not found for 🫷 at ..\assets\svg\1faf7.svg
SVG not found for 🫨 at ..\assets\svg\1fae8.svg
SVG not found for 🛜 at ..\assets\svg\1f6dc.svg
SVG not found for 🫚 at ..\assets\svg\1fada.svg
SVG not found for 🫏 at ..\assets\svg\1facf.svg
SVG not found for 🪯 at ..\assets\svg\1faaf.svg
SVG not found for 🩶 at ..\assets\svg\1fa76.svg
SVG not found for 🫜 at ..\assets\svg\1fadc.svg
SVG not found for 🪼 at ..\assets\svg\1fabc.svg
SVG not found for 🫸 at ..\assets\svg\1faf8.svg
SVG not found for 🪽 at ..\assets\svg\1fabd.svg
SVG not found for 🪿 at ..\assets\svg\1fabf.svg
SVG not found for 🫛 at ..\assets\svg\1fadb.svg
SVG not found for 🫆 at ..\assets\svg\1fac6.svg
SVG not found for 🪭 at ..\assets\svg\1faad.svg
SVG not found for 🪇 at ..\assets\svg\1fa87.svg
SVG not found for 🪏 at ..\assets\svg\1fa8f.svg
SVG not found