# Extract Unicode Character from Google Doc

**Author**: <u>Angelo Turri</u>
<br>
**Date**: <u>08/13/2024</u>
<br><br>


The functions in this notebook take the URL of a published Google Doc and extract a series of Unicode characters together with their x- and y-coordinates. Using these coordinates, they then arrange the characters into the shape of the intended uppercase letters.

In [151]:
url = 'https://docs.google.com/document/d/e/2PACX-1vSHesOf9hv2sPOntssYrEdubmMQm8lwjfwv6NPjjmIRYs_FOYXtqrYgjh85jBUebK9swPXh_a5TJ5Kl/pub'

In [152]:
import urllib.request  
from bs4 import BeautifulSoup

In [153]:
def get_raw_coords(url):
    
    """
    This function downloads the HTML of a published Google Doc.
    It parses the DOM of this HTML and extracts each Unicode
    character together with its x- and y-coordinates.
    
    It also converts the x- and y-coordinates from strings
    into integers.
    """
    
    # Closes the network connection once the page has been parsed
    with urllib.request.urlopen(url) as response:
        htmlParse = BeautifulSoup(response, 'html.parser')
    raw_coords = []
    
    # All characters and coordinates are located inside span
    # tags, which are in turn located in a table.
    # Each row holds [x-coordinate, character, y-coordinate].
    for item in htmlParse.find('table').find_all("tr"):
        coords = [i.get_text() for i in item.find_all("span")]
        try:
            # Turns string coordinates into integers for later processing
            coords = [int(coords[0]), coords[1], int(coords[2])]
        except (ValueError, IndexError):
            # The header row holds text labels rather than numbers,
            # so it is left unconverted here and dropped below
            pass
        raw_coords.append(coords)
        
    # The first table row is a header and gets removed
    raw_coords = raw_coords[1:]
    return raw_coords
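Fetching the document requires network access and the third-party BeautifulSoup package. To try the table-walking step offline, the same logic can be sketched with the standard library's `html.parser` instead. This is a swapped-in technique, not the notebook's method, and it assumes the table layout described above: one `<tr>` per character, the x-coordinate, character, and y-coordinate in the row's first three `<span>` elements, and a header row first. The names `SpanTableParser` and `parse_coords` are hypothetical.

```python
from html.parser import HTMLParser


class SpanTableParser(HTMLParser):
    """Collect the text of every <span> in each <tr> of the first <table>."""

    def __init__(self):
        super().__init__()
        self.rows = []            # one list of span texts per table row
        self._in_table = False
        self._table_done = False  # only the first table is read
        self._in_span = False
        self._current_row = None

    def handle_starttag(self, tag, attrs):
        if tag == 'table' and not self._table_done:
            self._in_table = True
        elif self._in_table and tag == 'tr':
            self._current_row = []
        elif self._in_table and tag == 'span' and self._current_row is not None:
            # Start a new text slot for this span
            self._in_span = True
            self._current_row.append('')

    def handle_endtag(self, tag):
        if tag == 'span':
            self._in_span = False
        elif tag == 'tr' and self._current_row is not None:
            self.rows.append(self._current_row)
            self._current_row = None
        elif tag == 'table' and self._in_table:
            self._in_table = False
            self._table_done = True

    def handle_data(self, data):
        # Text outside a span (e.g. whitespace between cells) is ignored
        if self._in_span:
            self._current_row[-1] += data


def parse_coords(html_text):
    """Return [x, character, y] lists, skipping the header row."""
    parser = SpanTableParser()
    parser.feed(html_text)
    parser.close()
    coords = []
    for row in parser.rows[1:]:
        x, char, y = row[0], row[1], row[2]
        coords.append([int(x), char, int(y)])
    return coords


sample = (
    '<table>'
    '<tr><td><span>x</span></td><td><span>char</span></td><td><span>y</span></td></tr>'
    '<tr><td><span>87</span></td><td><span>█</span></td><td><span>3</span></td></tr>'
    '</table>'
)
print(parse_coords(sample))  # [[87, '█', 3]]
```

Unlike the notebook's `try`/`except`, this sketch skips the header by position and lets a malformed data row raise an error instead of passing it through silently.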

In [154]:
raw_coords = get_raw_coords(url)
raw_coords

[[87, '█', 3],
 [23, '░', 2],
 [61, '█', 4],
 [2, '░', 1],
 [65, '█', 4],
 [31, '░', 5],
 [30, '░', 6],
 [20, '█', 4],
 [35, '█', 1],
 [2, '█', 6],
 [52, '█', 1],
 [12, '█', 0],
 [66, '█', 6],
 [69, '█', 4],
 [85, '█', 0],
 [55, '█', 6],
 [77, '█', 3],
 [9, '█', 0],
 [53, '█', 6],
 [5, '█', 3],
 [44, '░', 3],
 [40, '█', 3],
 [14, '█', 0],
 [0, '█', 5],
 [60, '░', 0],
 [14, '█', 3],
 [22, '█', 1],
 [26, '█', 6],
 [66, '█', 2],
 [29, '█', 5],
 [8, '█', 0],
 [80, '░', 5],
 [34, '█', 5],
 [8, '░', 3],
 [87, '█', 1],
 [47, '█', 6],
 [6, '█', 0],
 [45, '█', 1],
 [45, '█', 2],
 [29, '█', 1],
 [65, '█', 6],
 [52, '█', 2],
 [6, '█', 3],
 [2, '░', 4],
 [58, '█', 0],
 [66, '█', 1],
 [16, '█', 0],
 [54, '░', 3],
 [24, '█', 6],
 [67, '░', 1],
 [9, '█', 6],
 [62, '█', 2],
 [44, '█', 0],
 [21, '█', 4],
 [62, '░', 5],
 [15, '█', 6],
 [0, '█', 0],
 [4, '█', 0],
 [70, '█', 2],
 [63, '░', 4],
 [78, '█', 1],
 [81, '█', 0],
 [87, '░', 0],
 [2, '░', 2],
 [16, '░', 4],
 [36, '░', 4],
 [43, '█', 3],
 [68, '█'

In [155]:
def get_symbols(coords):
    
    """
    This function takes the Unicode characters and coordinates
    extracted by the get_raw_coords() function.
    
    It places these characters on a 2D grid and returns the grid
    as a list of strings, one string per row.
    """
    
    # Builds the range of y-coordinates, reversed so that the
    # highest y-coordinate becomes the top row.
    y_coords = [lst[2] for lst in coords]
    y_range = list(range(min(y_coords), max(y_coords)+1))[::-1]

    # Builds the range of x-coordinates from left to right.
    x_coords = [lst[0] for lst in coords]
    x_range = list(range(min(x_coords), max(x_coords)+1))

    # Uses the x- and y-ranges to construct a 2D grid in which
    # every cell starts out as a space.
    dct = {a: {b: ' ' for b in x_range} for a in y_range}

    # If a Unicode character occupies a cell on this 2D grid, that
    # cell in the dictionary gets populated with that character.
    # Cells with no character keep their space.
    for lst in coords:
        dct[lst[2]][lst[0]] = lst[1]
        
    # Joins each row of the dictionary into a single string so the
    # grid can be printed line by line. Dictionaries keep insertion
    # order, so the rows stay in the y_range order above.
    symbols = [''.join(row.values()) for row in dct.values()]
    
    return symbols
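To make the grid construction concrete, here is a compact sketch of the same idea that uses a list of lists indexed by offsets instead of a nested dictionary. The function name `build_grid` and the three-character sample are hypothetical. The key step is `max_y - y`, which flips the y-axis so that the highest y-coordinate is printed first, matching the reversed `y_range` in `get_symbols()`.

```python
def build_grid(coords, blank=' '):
    """Place [x, character, y] triples on a grid; the highest y is the first row."""
    xs = [x for x, _, _ in coords]
    ys = [y for _, _, y in coords]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)

    # One row per y-value, each pre-filled with blanks
    grid = [[blank] * (max_x - min_x + 1) for _ in range(max_y - min_y + 1)]

    for x, char, y in coords:
        # max_y - y flips the axis so larger y-values print higher up
        grid[max_y - y][x - min_x] = char

    return [''.join(row) for row in grid]


sample = [[0, 'A', 1], [2, 'B', 0], [1, 'C', 1]]
for row in build_grid(sample):
    print(repr(row))
# 'AC '
# '  B'
```

As in `get_symbols()`, if two entries share the same coordinates, the later one overwrites the earlier one.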

In [156]:
symbols = get_symbols(raw_coords)
symbols

['██████████░ ██████░    ███████░  ██░           ███░ ████████░    ██░    ███░   ████████░  ',
 '██░           ██░    ███░    ██░ ███░   ███░   ██░  ██░     ██░  ██░  ███░   ███░     ███░',
 '██░           ██░   ███░          ██░  █████░ ███░  ██░      ██░ ██░███░     ██░       ██░',
 '████████░     ██░   ██░           ███░ ██░██░ ██░   ██░      ██░ ████░       ██░       ██░',
 '██░           ██░   ███░           ██░██░ ██░██░    ██░      ██░ ██░███░     ██░       ██░',
 '██░           ██░    ███░    ██░   ████░   ████░    ██░     ██░  ██░  ███░   ███░     ███░',
 '██████████░ ██████░    ███████░     ██░     ██░     ████████░    ██░    ███░   ████████░  ']

In [157]:
for symbol in symbols:
    print(symbol)

██████████░ ██████░    ███████░  ██░           ███░ ████████░    ██░    ███░   ████████░  
██░           ██░    ███░    ██░ ███░   ███░   ██░  ██░     ██░  ██░  ███░   ███░     ███░
██░           ██░   ███░          ██░  █████░ ███░  ██░      ██░ ██░███░     ██░       ██░
████████░     ██░   ██░           ███░ ██░██░ ██░   ██░      ██░ ████░       ██░       ██░
██░           ██░   ███░           ██░██░ ██░██░    ██░      ██░ ██░███░     ██░       ██░
██░           ██░    ███░    ██░   ████░   ████░    ██░     ██░  ██░  ███░   ███░     ███░
██████████░ ██████░    ███████░     ██░     ██░     ████████░    ██░    ███░   ████████░  


In [158]:
def unicode_from_url(url):
    
    """
    This function uses two helper functions to extract
    Unicode characters from a Google Doc and arrange them into
    the originally intended uppercase letters.
    
    The first function, get_raw_coords(), extracts the characters
    and their coordinates from the URL.
    
    The second function, get_symbols(), arranges the characters
    on a 2D grid, which is then printed row by row.
    """
    
    raw_coords = get_raw_coords(url)
    symbols = get_symbols(raw_coords)
    
    for symbol in symbols:
        print(symbol)
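Because `unicode_from_url()` prints each row directly, its result cannot easily be reused or checked in a test. One possible refinement, sketched here under the assumption that returning a string is preferable, is a hypothetical `render()` helper that joins the rows produced by `get_symbols()` into a single string; `unicode_from_url()` could then call `print(render(symbols))`. Stripping trailing spaces is an optional choice, since the padding at the end of a row carries no information.

```python
def render(symbols):
    """Join grid rows into one string, dropping trailing spaces on each row."""
    # rstrip(' ') removes only spaces, so the block characters are never touched
    return '\n'.join(row.rstrip(' ') for row in symbols)


picture = render(['██░  ', '  ██░'])
print(picture)
```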

In [159]:
unicode_from_url(url)

██████████░ ██████░    ███████░  ██░           ███░ ████████░    ██░    ███░   ████████░  
██░           ██░    ███░    ██░ ███░   ███░   ██░  ██░     ██░  ██░  ███░   ███░     ███░
██░           ██░   ███░          ██░  █████░ ███░  ██░      ██░ ██░███░     ██░       ██░
████████░     ██░   ██░           ███░ ██░██░ ██░   ██░      ██░ ████░       ██░       ██░
██░           ██░   ███░           ██░██░ ██░██░    ██░      ██░ ██░███░     ██░       ██░
██░           ██░    ███░    ██░   ████░   ████░    ██░     ██░  ██░  ███░   ███░     ███░
██████████░ ██████░    ███████░     ██░     ██░     ████████░    ██░    ███░   ████████░  
