# GEF Project - Kapakki Lo

Recognizing Chinese Handwritten Characters

---


In this notebook, we implement a neural network model that recognizes handwritten Chinese characters from the dataset found here:

https://www.kaggle.com/datasets/pascalbliem/handwritten-chinese-character-hanzi-datasets?resource=download

In [None]:
import os

import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
from matplotlib import pyplot as plt

# Apply the seaborn plot style ('seaborn' is not a valid Matplotlib style name in recent releases)
sns.set_theme()

print("Num GPUs Available:", len(tf.config.list_physical_devices('GPU')))
print('TensorFlow version:', tf.__version__)
print('Keras version:', tf.keras.__version__)

---
## Preparing the Dataset

Recognizing 7000+ Chinese characters directly would not be a good idea: a classifier with that many output classes would cause the computer to go up in flames.<br>
This is why we will train the neural network to find **features** (components) of a character instead, so as to obtain the Cangjie input code for that character.
<br><br>

This is the page I used to find suitable datasets:<br>
https://en.wikipedia.org/wiki/Cangjie_input_method
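
To make the idea concrete, here is a minimal sketch of how recognized components could be turned into a Cangjie code. The names `KEY_TO_RADICAL` and `components_to_code` are hypothetical and only cover three keys; the component lists are written by hand here, whereas in the notebook they would come from the network's predictions.

```python
# Illustrative subset of the Cangjie key table (key letter -> primary radical).
KEY_TO_RADICAL = {"A": "日", "B": "月", "D": "木"}

# Reverse lookup: radical -> key letter.
RADICAL_TO_KEY = {radical: key for key, radical in KEY_TO_RADICAL.items()}


def components_to_code(components):
    """Join the key letters of the given components into a Cangjie code."""
    return "".join(RADICAL_TO_KEY[component] for component in components)


print(components_to_code(["日", "月"]))  # components of 明 -> AB
print(components_to_code(["木", "木"]))  # components of 林 -> DD
```

With this approach the model only has to tell apart a few dozen component classes rather than thousands of whole characters.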


In [None]:
# Insert the path where the training data is located
ROOT_PATH = ""
ORGANIZED_DATA = ROOT_PATH + ""
TRAINING_DATA = ROOT_PATH + ""


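import shutil


class character:
    """One Cangjie key together with the components (radicals) it covers."""

    def __init__(self, name, data):
        self.name = name  # Cangjie key letter, e.g. "A"
        self.data = data  # components that map to this key

    def find_characters(self):
        # Assumed layout: TRAINING_DATA holds one folder per character, and the
        # images are gathered into ORGANIZED_DATA/<key letter>.
        target_dir = os.path.join(ORGANIZED_DATA, self.name)
        os.makedirs(target_dir, exist_ok=True)
        for datum in self.data:
            # Check whether each component has a folder in the dataset.
            # If it does not, the data has to be added by hand.
            source_dir = os.path.join(TRAINING_DATA, datum)
            try:
                # 1. Try opening the folder.
                items = os.listdir(source_dir)
            except FileNotFoundError:
                print(f"{datum} is not found in the database. ({self.name})")
                continue
            # 2. The folder exists, so copy each image into the directory for this key.
            for item in items:
                # Prefix with the component so equal file names do not collide.
                shutil.copy(os.path.join(source_dir, item),
                            os.path.join(target_dir, f"{datum}_{item}"))

    def train_data(self, iterations=1):
        # Not implemented yet. `iterations` defaults to 1 (a placeholder value)
        # so that the bare train_data() call further down runs.
        pass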



A = character("A", ["日", "曰"]) # 90° rotated 日 (as in 巴)
B = character("B", ["月", "冂", "爫", "冖"]) # the top four strokes of 目; the top and top-left part of 炙, 然, and 祭; the top-left four strokes of 豹 and 貓; and the top four strokes of 骨;
C = character("C", ["金", "丷", "八"]) # the penultimate two strokes of 四 and 匹
D = character("D", ["木", "寸", "才"]) # the first two strokes of 寸 and 才; the first two strokes of 也 and 皮
E = character("E", ["水", "氵", "又"]) # the last five strokes of 暴 and 康
F = character("F", ["火", "小", "灬"]) # the first three strokes in 當 and 光
G = character("G", ["土", "士"])

H = character("H", ["竹", "⺮", "㇀", "㇒"])
I = character("I", ["戈", "广", "厶", "㇔"])
J = character("J", ["十", "宀"])
K = character("K", ["大", "乂", "疒"]) #  first two strokes of 右
L = character("L", ["中", "衤"]) # Vertical stroke; first four strokes of 書 and 盡
M = character("M", ["一", "厂", "工"])
N = character("N", ["弓"]) # Crossbow and the hook

O = character("O", ["人", "亻"]) # The dismemberment; the first two strokes of 丘 and 乓; the first two strokes of 知, 攻, and 氣; and the final two strokes of 兆
P = character("P", ["心", "忄", "勹", "㇃", "⺗", "匕", "七"])
Q = character("Q", ["手", "扌"])
R = character("R", ["口"])

S = character("S", ["尸", "匚", "㇕", "㇆", "㇁"]) # the first four strokes of 長 and 髟
T = character("T", ["廿", "艹"]) # ....
U = character("U", ["山"]) # Three-sided enclosure with an opening on the top
V = character("V", ["女", "𧘇"])
W = character("W", ["田"]) # as well as any four-sided enclosure with something inside it, including the first two strokes in 母 and 毋
X = character("X", ["難"])
Y = character("Y", ["卜", "辶"]) # The 卜 shape and rotated forms, the first two strokes in 斗
Z = character("Z", ["Z: N/A"])


In [None]:
characters = [A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z]

for char in characters:
    char.train_data()

---

## Loading the Dataset

In [None]:
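# Placeholder: this loads Fashion-MNIST (10 clothing classes), not the hanzi images organized above.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()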

In [None]:
print(f'Shape of an image: {x_train[0].shape}')
print(f'Max pixel value: {x_train.max()}')
print(f'Min pixel value: {x_train.min()}')
print(f'Classes: {np.unique(y_train)}')

In [None]:
# Text labels for the 26 Cangjie keys, A to Z in order

text_labels = ["日",
             "月",
             "金",
             "木",
             "水",
             "火", 
             "土",
             "竹",
             "戈",
             "十",
             "大",
             "中",
             "一",
             "弓",
             "人",
             "心",
             "手",
             "口",
             "尸",
             "廿",
             "山",
             "女",
             "田",
             "難",
             "卜",
             "Z"
             ]
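
Since the labels are listed in key order, a predicted class index can be mapped back to both its Cangjie key letter and its label. The sketch below assumes the model outputs class indices 0 to 25 in that same order; `describe_class` is a hypothetical helper, and the list is repeated so the snippet runs on its own.

```python
import string

text_labels = ["日", "月", "金", "木", "水", "火", "土",
               "竹", "戈", "十", "大", "中", "一", "弓",
               "人", "心", "手", "口",
               "尸", "廿", "山", "女", "田", "難", "卜", "Z"]


def describe_class(index):
    """Return (Cangjie key letter, label) for a class index between 0 and 25."""
    return string.ascii_uppercase[index], text_labels[index]


print(describe_class(0))   # ('A', '日')
print(describe_class(23))  # ('X', '難')
```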