# Super Characters Method Implementation on Quora Sincerity Kaggle Competition 2018
Based on the paper by Sun, Yang, Dong, Zhang, and Young (October 15th, 2018), "Super Characters: A Conversion from Sentiment Classification to Image Classification", I experiment with the different parameters for constructing Super Characters as described in the paper, along with my own implementation of CNN training on the constructed Super Character images.

## Imports

In [14]:
from PIL import Image, ImageDraw, ImageFont
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import tensorflow as tf
import numpy as np
import textwrap
import os

## Constructing An Example Super Character

In [11]:
#Construct a sample Super Character: white text drawn on a black background
#Note: the paper suggests a Unicode MS font; Arial Unicode MS is used here
img = Image.new('RGB', (224,224), color = 'black')
text_overlay = ImageDraw.Draw(img)
fnt = ImageFont.truetype('/Library/Fonts/arialunicodems.ttf')
text_overlay.text((8,8), "This is where the sample text will be inserted", font=fnt, fill=(255,255,255))
img.save("sample_1.png")

## Specs of Super Character Algorithmic Implementation
The paper describes a two-step method. Let us walk through both steps of the Super Characters algorithm.

1. Sentences or paragraphs are drawn onto blank images character by character. Each generated "Super Character" image retains the same sentiment as the original text.
2. Feed the generated Super Character images, together with their labels, to a computer vision CNN model for training.
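
Step 1 above can be sketched as a small function that draws one character per grid cell. This is only an illustration under stated assumptions: the function name `draw_super_character`, the 16-pixel cell size, and the use of Pillow's built-in default font are hypothetical choices, not the paper's settings. The cells later in this notebook draw word-wrapped lines of text rather than individual characters.

```python
from PIL import Image, ImageDraw, ImageFont

def draw_super_character(text, image_size=224, cell=16):
    """Draw text one character per grid cell, left to right, top to bottom.

    Hypothetical helper: cell size and font are illustrative assumptions.
    """
    img = Image.new('RGB', (image_size, image_size), color='black')
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()  # assumption: stand-in for a Unicode font
    per_row = image_size // cell     # number of cells along each side
    capacity = per_row * per_row     # total characters that fit on the image
    for i, ch in enumerate(text[:capacity]):  # text beyond capacity is dropped
        row, col = divmod(i, per_row)
        draw.text((col * cell, row * cell), ch, font=font, fill=(255, 255, 255))
    return img

sketch_img = draw_super_character("Is this question sincere?")
```

Truncating at `capacity` is one possible policy for long inputs; the image size stays fixed regardless of text length, which is what lets a standard image classifier consume the result.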

### Step 1: Create Mapping of Super Characters to Sentiment

In [15]:
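#Global parameters controlling how text is rendered onto each image
RIGHT_PAD = 4          # horizontal offset (pixels) of each line from the left edge
VERTICAL_PAD = 15      # vertical distance (pixels) between successive lines
FONT_SIZE = 12
FONT = ImageFont.truetype('/Library/Fonts/arialunicodems.ttf', FONT_SIZE)
TEXT_FILL = (255,255,255)  # white text
WRAP_TEXT_BORDER = 36  # maximum number of characters per wrapped line
IMAGE_W = 224
IMAGE_H = 224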

In [3]:
#Reading Training Data from CSV File
train_df = pd.read_csv("./train.csv", engine='python')
target = train_df.target
questions_text = train_df.question_text.str.split()

In [16]:
#Reading Testing Data from CSV File
test_df = pd.read_csv("./test.csv", engine='python')
qid = test_df.qid
questions_text_test = test_df.question_text.str.split()

In [5]:
#Wrap the question text into lines of at most WRAP_TEXT_BORDER characters, breaking at word boundaries
def create_text_wrap(text):
    new_text = textwrap.wrap(text, width=WRAP_TEXT_BORDER)
    return new_text
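
To see what the wrapping step produces, here is a short standalone example. The sample sentence is made up for illustration and is not taken from the dataset; `WRAP_TEXT_BORDER` is redefined locally with the same value as the notebook's global.

```python
import textwrap

WRAP_TEXT_BORDER = 36  # same value as the notebook's global

# Hypothetical sample question, used only to illustrate the wrapping
sample = "How do I read and find my YouTube comments and replies to them?"
lines = textwrap.wrap(sample, width=WRAP_TEXT_BORDER)

# Each element is one line of at most WRAP_TEXT_BORDER characters
for line in lines:
    print(repr(line), len(line))
```

Because `textwrap.wrap` breaks at whitespace, most lines end up somewhat shorter than the 36-character limit.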

In [6]:
#Draw each wrapped line onto the image, moving down VERTICAL_PAD pixels per line
def overlay_text(d, question):
    delta_v_pad = 0
    for phrase in question:
        d.text((RIGHT_PAD, delta_v_pad), phrase, font = FONT, fill = TEXT_FILL)
        delta_v_pad += VERTICAL_PAD
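
Since `overlay_text` keeps drawing downward without checking the image height, it is worth estimating how much text fits. The sketch below is a rough upper bound under the notebook's current settings; the helper name `visible_line_count` is hypothetical, and the estimate ignores font metrics, so the last line may still be partially clipped.

```python
import math

# Same values as the notebook's globals
IMAGE_H = 224
VERTICAL_PAD = 15
WRAP_TEXT_BORDER = 36

def visible_line_count(image_h=IMAGE_H, vertical_pad=VERTICAL_PAD):
    # Lines start at y = 0, vertical_pad, 2 * vertical_pad, ...
    # Count the lines whose starting y coordinate is still inside the image.
    return math.ceil(image_h / vertical_pad)

max_lines = visible_line_count()
# Upper bound only: word wrapping usually leaves lines shorter than the limit
max_chars = max_lines * WRAP_TEXT_BORDER
```

Questions whose wrapped text needs more lines than `max_lines` would have their trailing lines drawn outside the image, so that part of the text would not appear in the saved file.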

### Generating Super Characters from the Training Data

In [13]:
#Create each super character image, to be matched later with its label
sc_to_target = {} #maps each super character file name to its target label
index = 0

count = 0
num_accounted = 0
#For testing, skip the first 10,001 questions, then render only the next 250 questions with target == 0 (sincere)
for question in questions_text:
    if count > 10000 and num_accounted == 250:
        break
    if target[index] == 0 and count > 10000:
        sc_question = ' '.join(question)
        formated_sc_q = create_text_wrap(sc_question)
        img_new = Image.new('RGB', (IMAGE_W, IMAGE_H), color='black')
        d = ImageDraw.Draw(img_new)
        overlay_text(d, formated_sc_q)
        file_name = "example_" + str(index) + ".png"
        sc_to_target[file_name] = target[index] #creates mapping of sc to sentiment
        img_new.save(file_name)
        num_accounted += 1
    count += 1
    index += 1
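
The `sc_to_target` dictionary exists only in memory, so one option is to write it to disk next to the images for use in the CNN training step. The following is a minimal sketch, assuming a plain CSV layout; the helper names and the column names `file_name` and `target` are hypothetical.

```python
import csv

def save_label_mapping(mapping, csv_path):
    """Write a {file_name: label} mapping to a two-column CSV file."""
    with open(csv_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['file_name', 'target'])
        for file_name, label in sorted(mapping.items()):
            # int() also converts NumPy integer labels to plain Python ints
            writer.writerow([file_name, int(label)])

def load_label_mapping(csv_path):
    """Read the CSV written by save_label_mapping back into a dict."""
    with open(csv_path, newline='') as f:
        return {row['file_name']: int(row['target']) for row in csv.DictReader(f)}
```

In this notebook it could be called as `save_label_mapping(sc_to_target, "sc_labels.csv")`, where the output file name is again only an example.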

### Generating Super Characters from the Testing Data

In [18]:
#Create a super character image for every test question (the test set has no labels)
index_test = 0

#Unlike the training cell above, every test question is rendered
count_test = 0
for question in questions_text_test:
    sc_question = ' '.join(question)
    formated_sc_q = create_text_wrap(sc_question)
    img_new = Image.new('RGB', (IMAGE_W, IMAGE_H), color='black')
    d = ImageDraw.Draw(img_new)
    overlay_text(d, formated_sc_q)
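    #Use a distinct prefix so test images do not overwrite the training images saved as "example_<index>.png"
    file_name = "test_example_" + str(index_test) + ".png"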
    img_new.save(file_name)
    count_test += 1 #comment out after testing has been created
    index_test += 1