Deep Learning Projects

Project 1: Hand-written character classification (including pen-digits and Chinese characters)

Pen-digits data is downloaded from pendigits.

Chinese character data is downloaded from the CASIA Online and Offline Chinese Handwriting Databases. This dataset is difficult to preprocess, so I used the code from this blog to transform the original data and save each hand-written Chinese character as a .npy file.

Main idea: Apply the signature transform to preprocess hand-written characters and then apply logistic regression. Here are some reference papers for the signature transform:

  • "A Primer on the Signature Method in Machine Learning" by Ilya Chevyreva and Andrey Kormilitzina.
  • "Calculation of Iterated-Integral Signatures and Log Signatures" by Jeremy Reizenstein.
  • "Signature moments to characterize laws of stochastic processes" by Ilya Chevyrev and Harald Oberhauser.
  • "Characteristic functions of measures on geometric rough paths" by Ilya Chevyreva and Terry Lyons.
  • "Differential Equations Driven by Rough Paths" by Terry J. Lyons, Michael J. Caruana, and Thierry Lévy.
  • "System Control and Rough Paths" by Terry J. Lyons and Zhongmin Qian.

The signature transform is implemented in the signatory package, which can be accelerated on GPU and is the fastest among existing signature packages. Read more in the tutorial here.
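
Below is a minimal sketch of how the signature transform turns pen trajectories of different lengths into fixed-size feature vectors. The toy data, batch shapes, and truncation depth are illustrative assumptions rather than the project's actual settings; only the call signatory.signature(path, depth) comes from the package itself.

```python
# Minimal sketch: signature features from variable-length pen trajectories.
# Assumes signatory is installed together with a compatible PyTorch build.
import torch
import signatory

# Two toy "pen trajectories" of different lengths, each point = (x, y).
# signatory expects tensors of shape (batch, stream, channels).
short_path = torch.rand(1, 8, 2)    # 8 sampled points
long_path = torch.rand(1, 120, 2)   # 120 sampled points

depth = 4  # truncation level of the signature

sig_short = signatory.signature(short_path, depth)
sig_long = signatory.signature(long_path, depth)

# Both signatures have the same size regardless of how many points were sampled,
# which is what makes them convenient fixed-size features for logistic regression.
print(sig_short.shape, sig_long.shape)
# torch.Size([1, 30]) torch.Size([1, 30])   (2 + 4 + 8 + 16 = 30 channels)
```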

I chose this transform because

  • there is a one-to-one map between the original time series and its (infinite) signature, and since the norms of the signature terms decay factorially as the level increases, a truncated signature loses only a little information;
  • a long time series can be transformed into a much shorter vector, reducing computation cost (see the dimension count after this list);
  • it can be applied to irregularly-sampled time series, so we do not need to delete or add points to make all observations the same length before training models;
  • it is shift-invariant and invariant under time reparametrizations;
  • it is robust to outliers.
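
To make the "much shorter vector" point concrete, here is a rough dimension count using signatory.signature_channels; the 200-point path length is only an illustrative assumption.

```python
# Rough arithmetic for how much a truncated signature compresses a long path.
import signatory

channels = 3   # e.g. (x, y, stroke index) after the augmentation described below
depth = 4      # truncation level

print(signatory.signature_channels(channels, depth))  # 3 + 9 + 27 + 81 = 120
print(200 * channels)                                 # a 200-point raw path holds 600 numbers
```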

Besides, I augmented the raw characters with an extra dimension indicating which stroke each point belongs to.
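
The following sketch shows one way this augmentation and the signature-plus-logistic-regression pipeline could be wired together. The helper names, the random toy data, the truncation depth, and the use of scikit-learn's LogisticRegression are all my assumptions for illustration, not the repository's code.

```python
# Hedged sketch: stroke-index augmentation -> signature features -> logistic regression.
import numpy as np
import torch
import signatory
from sklearn.linear_model import LogisticRegression

def augment_with_stroke_index(strokes):
    """strokes: list of (n_i, 2) arrays of (x, y) points, one array per pen stroke.
    Returns an (N, 3) array whose third channel is the stroke index."""
    pieces = []
    for idx, stroke in enumerate(strokes):
        index_col = np.full((len(stroke), 1), idx, dtype=np.float32)
        pieces.append(np.hstack([stroke.astype(np.float32), index_col]))
    return np.vstack(pieces)

def signature_features(strokes, depth=4):
    path = augment_with_stroke_index(strokes)        # (N, 3)
    tensor = torch.from_numpy(path).unsqueeze(0)     # (1, N, 3) as signatory expects
    return signatory.signature(tensor, depth).squeeze(0).numpy()

# Toy training set: random "characters" with 1-3 strokes each and labels 0-9.
rng = np.random.default_rng(0)
X, y = [], []
for _ in range(100):
    strokes = [rng.random((int(rng.integers(5, 20)), 2)) for _ in range(int(rng.integers(1, 4)))]
    X.append(signature_features(strokes))
    y.append(int(rng.integers(0, 10)))

clf = LogisticRegression(max_iter=1000).fit(np.array(X), np.array(y))
```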

Project 2: Sequential data generation

After analyzing sequential data, I also tried to generate new sequential data. One idea is to train a GAN based on the Wasserstein distance between signature vectors of different sequences. So far this has failed for the pen-digit data and keeps producing random x-y coordinates; I am still working in this direction.
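
For reference, here is a heavily hedged sketch of one way such a model could be set up: a WGAN-style critic that scores the signature vectors of real and generated sequences, so that its objective approximates a Wasserstein distance between signature distributions. Every architectural choice below (layer sizes, noise dimension, sequence length, weight clipping, learning rates) is an illustrative assumption, not the experiment that was actually run.

```python
# Hedged sketch: WGAN critic on signature vectors of real vs. generated sequences.
import torch
import torch.nn as nn
import signatory

DEPTH, CHANNELS, SEQ_LEN, NOISE_DIM, BATCH = 3, 2, 16, 8, 32
SIG_DIM = signatory.signature_channels(CHANNELS, DEPTH)   # 2 + 4 + 8 = 14

generator = nn.Sequential(            # noise -> flattened (x, y) sequence
    nn.Linear(NOISE_DIM, 64), nn.ReLU(),
    nn.Linear(64, SEQ_LEN * CHANNELS),
)
critic = nn.Sequential(               # signature vector -> scalar score
    nn.Linear(SIG_DIM, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

def to_signature(seqs):
    # signatory.signature is differentiable, so gradients flow back to the generator.
    return signatory.signature(seqs, DEPTH)

g_opt = torch.optim.RMSprop(generator.parameters(), lr=5e-5)
c_opt = torch.optim.RMSprop(critic.parameters(), lr=5e-5)

real = torch.rand(BATCH, SEQ_LEN, CHANNELS)   # stand-in for real pen-digit trajectories

for step in range(5):
    # Critic step: raise scores of real signatures, lower scores of fake ones.
    fake = generator(torch.randn(BATCH, NOISE_DIM)).view(BATCH, SEQ_LEN, CHANNELS).detach()
    c_loss = critic(to_signature(fake)).mean() - critic(to_signature(real)).mean()
    c_opt.zero_grad(); c_loss.backward(); c_opt.step()
    for p in critic.parameters():             # crude Lipschitz constraint via weight clipping
        p.data.clamp_(-0.01, 0.01)

    # Generator step: make generated signatures score highly under the critic.
    fake = generator(torch.randn(BATCH, NOISE_DIM)).view(BATCH, SEQ_LEN, CHANNELS)
    g_loss = -critic(to_signature(fake)).mean()
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```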
