Arabic Font Classification

Acknowledgement

The structure and some fundamental parts of this code are adapted from Full Stack Deep Learning (FSDL).

Demo and Notebook

You can see this project in action in the accompanying demo and post, or run the code in this notebook.

Project Structure

The /cloud folder imitates cloud storage: in a real-world setting, the dataset would be stored on a cloud storage service such as Amazon S3. The actual code lives in the /codebase folder. There is a clear separation between the training code (under /codebase/training) and everything else, including models, networks, datasets, and other utilities (under /codebase/font_classifier). This separation makes deployment easier and cleaner, since the training code does not need to ship with the deployed model.

As presented in the FSDL course, the data is version-controlled without checking the actual images into git. Instead, a JSON file is created with one entry per data instance. Each entry consists of the instance's URL (in cloud storage), its label, and any other relevant metadata. This JSON file is what git tracks, so the data at any required version can be retrieved by checking out the corresponding git commit. As the dataset grows, so does the JSON file, at which point git-lfs can be used. This way of handling data has the following benefits:

- Reproducibility: since the index is tracked by git, we can retrieve the exact data used a week or a year ago.
- Extensibility: the dataset can be extended with new data while making sure that previous test set instances are never used as training instances, and vice versa.
- Portability: the project needs less disk space, which makes it easy to move around via git or any other means.
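A minimal sketch of such a JSON index, assuming a hypothetical schema in which each entry has `url`, `label`, and `split` fields (the repository's actual field names may differ). The helper `extend_index` (also hypothetical) illustrates the extensibility guarantee: new entries are appended, but an instance that is already tracked can never move to a different split.

```python
import json
from pathlib import Path


def load_index(path):
    """Load a JSON index: a list of {"url", "label", "split", ...} entries (hypothetical schema)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def save_index(entries, path):
    """Write the index back to disk; this file, not the images, is what git tracks."""
    Path(path).write_text(json.dumps(entries, indent=2, ensure_ascii=False), encoding="utf-8")


def extend_index(old_entries, new_entries):
    """Append new entries without ever moving an existing instance to another split."""
    old_splits = {e["url"]: e["split"] for e in old_entries}
    merged = list(old_entries)
    for entry in new_entries:
        url = entry["url"]
        if url in old_splits:
            # A test instance must never become a training instance, and vice versa.
            if old_splits[url] != entry["split"]:
                raise ValueError(f"{url} would move from {old_splits[url]!r} to {entry['split']!r}")
            continue  # already tracked, keep the original entry
        merged.append(entry)
    return merged
```

Committing the saved index after each `extend_index` call gives one git commit per dataset version, which is what makes the reproducibility claim hold.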

Running the Code

To run the code locally:

Install requirements:
```
$ pip install -r requirements.txt
```

Fetch the data from the GitHub release and extract it into the /cloud folder:
```
$ wget 'https://github.com/mhmoodlan/arabic-font-classification/releases/download/v0.1.0/rufa.tar.gz' -O ./cloud/rufa.tar.gz
$ cd ./cloud && tar -xzf 'rufa.tar.gz'
```
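Where `wget` or `tar` is not available (for example on Windows), the same two steps can be sketched with the Python standard library alone. This is an illustrative alternative, not part of the repository; `download` and `extract` are hypothetical helper names.

```python
import tarfile
import urllib.request
from pathlib import Path

# Release asset URL copied from the wget command above.
RELEASE_URL = (
    "https://github.com/mhmoodlan/arabic-font-classification/"
    "releases/download/v0.1.0/rufa.tar.gz"
)


def download(url, dest):
    """Download `url` to the local path `dest`, creating parent folders if needed."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    return dest


def extract(archive, dest_dir):
    """Extract a .tar.gz archive into `dest_dir` (the equivalent of `tar -xzf`)."""
    with tarfile.open(archive, "r:gz") as tar:
        # Newer Python versions provide extraction filters that reject unsafe
        # members (absolute paths, links escaping dest_dir); use one when available.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
```

Run from the repository root, `extract(download(RELEASE_URL, "cloud/rufa.tar.gz"), "cloud")` mirrors the two shell commands.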

Start a simple HTTP server in the /cloud folder, serving at http://0.0.0.0:8000/:
```
$ cd ./cloud && python -m http.server
```
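With the server running, an entry in the JSON index can be resolved to a file under /cloud over HTTP, just as it would be fetched from real cloud storage. A minimal sketch, assuming the index stores paths relative to the server root; `instance_url` and `fetch_bytes` are hypothetical helpers, and decoding the image bytes (e.g. with Pillow) is left out.

```python
import urllib.request
from urllib.parse import urljoin


def instance_url(base_url, relative_path):
    """Join the storage root (e.g. the local server) with a path taken from the index."""
    return urljoin(base_url.rstrip("/") + "/", relative_path.lstrip("/"))


def fetch_bytes(url, timeout=10.0):
    """Download one data instance and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

Swapping the base URL for an S3 or other cloud endpoint is then the only change needed to move from the imitated storage to real storage.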

From the repository root, with the server still running in a separate terminal, run an experiment:
```
$ cd ./codebase/code && export PYTHONPATH=. && python training/run_experiment.py --save \
    '{"dataset": "RuFaDataset", "model": "FontModel", "network": "cnn", "train_args": {"epochs": 6, "mode": "test", "validate_mismatch": "False"}}'
```
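The experiment configuration is passed as a single JSON string. Note that `"validate_mismatch": "False"` is a JSON string, not a boolean, and in Python `bool("False")` is `True`, so whatever parses this config has to convert it explicitly. A minimal sketch of such parsing; `parse_experiment_config`, `to_bool`, and the defaults are hypothetical and not taken from `run_experiment.py`.

```python
import json

# Hypothetical defaults; the real run_experiment.py may use different ones.
DEFAULT_TRAIN_ARGS = {"mode": "val", "validate_mismatch": False}


def to_bool(value):
    """Accept JSON booleans as well as strings such as "True"/"False"."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip().lower()
        if text in {"true", "1", "yes"}:
            return True
        if text in {"false", "0", "no"}:
            return False
    raise ValueError(f"not a boolean: {value!r}")


def parse_experiment_config(raw):
    """Parse the JSON config string and normalize its train_args."""
    config = json.loads(raw)
    train_args = {**DEFAULT_TRAIN_ARGS, **config.get("train_args", {})}
    train_args["validate_mismatch"] = to_bool(train_args["validate_mismatch"])
    if train_args["mode"] not in ("val", "test"):
        raise ValueError(f"mode must be 'val' or 'test', got {train_args['mode']!r}")
    config["train_args"] = train_args
    return config
```

Validating `mode` up front makes a typo fail immediately instead of after the dataset has been loaded.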

The 'mode' config in 'train_args' takes one of two values: 'val' or 'test'.

In 'val' mode, the model is trained and validated on synthetic data only. If 'validate_mismatch' is set to True, an additional data-mismatch validation is performed on a subset of the real data.

In 'test' mode, the model is trained on all of the synthetic data plus the portion of the real data that 'val' mode uses for data-mismatch validation. After training, the final generalization error is reported on the remaining real data.
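The two modes can be summarized as a data plan. A minimal sketch, assuming a hypothetical `build_splits` helper and hypothetical split fractions (the repository's actual proportions are not stated here):

```python
import random


def build_splits(synthetic, real, mode, validate_mismatch=False,
                 val_fraction=0.2, mismatch_fraction=0.5, seed=0):
    """Hypothetical sketch of the 'val'/'test' data plan described above."""
    rng = random.Random(seed)
    # Partition the real data first, with a fixed seed, so both modes agree
    # on which real instances are mismatch data and which are held out.
    real = list(real)
    rng.shuffle(real)
    n_mismatch = int(len(real) * mismatch_fraction)
    mismatch, real_test = real[:n_mismatch], real[n_mismatch:]

    if mode == "val":
        # Train and validate on synthetic data only.
        synthetic = list(synthetic)
        rng.shuffle(synthetic)
        n_val = int(len(synthetic) * val_fraction)
        splits = {"train": synthetic[n_val:], "val": synthetic[:n_val]}
        if validate_mismatch:
            splits["mismatch"] = mismatch  # optional data-mismatch check
        return splits
    if mode == "test":
        # All synthetic data plus the mismatch subset; the rest of the real data is the test set.
        return {"train": list(synthetic) + mismatch, "test": real_test}
    raise ValueError(f"unknown mode: {mode!r}")
```

Fixing the real-data partition before anything else is what guarantees that the held-out real test data is never seen during training in either mode.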

This command should output something similar to the following:

```
Epoch 1/6
1254/1254 [==============================] - 119s 95ms/step - loss: 0.3185 - accuracy: 0.8751
Epoch 2/6
1254/1254 [==============================] - 40s 32ms/step - loss: 0.0539 - accuracy: 0.9918
Epoch 3/6
1254/1254 [==============================] - 40s 32ms/step - loss: 0.0386 - accuracy: 0.9953
Epoch 4/6
1254/1254 [==============================] - 40s 32ms/step - loss: 0.0270 - accuracy: 0.9976
Epoch 5/6
1254/1254 [==============================] - 40s 32ms/step - loss: 0.0264 - accuracy: 0.9973
Epoch 6/6
1254/1254 [==============================] - 40s 32ms/step - loss: 0.0246 - accuracy: 0.9979
Training took 323.854642 s
In test mode, mismatch data isn't validated since it's used during training.

14/14 [==============================] - 0s 10ms/step - loss: 0.2316 - accuracy: 0.9712
Test score: [0.2316255271434784, 0.971222996711731]
```
