Humans do not perceive the world as a grid of pixels.
For this reason, several methods for generating vector images have been proposed.
The seminal work is Ha et al. 2017, a.k.a. sketch-rnn.
As a follow-up study, Song et al. (CVPR 2018), a.k.a. sketch-photo2seq, was proposed to address the weak supervision problem, and its code was released.
Unfortunately, that code was written in TensorFlow v1.
To take advantage of recent versions of PyTorch, I reimplemented sketch-photo2seq in PyTorch.
torch==1.10.1
torchvision==0.11.2
numpy==1.22.3
matplotlib
Pillow (imported as PIL)
svgwrite
Two datasets are required.
- QuickDraw
  - Download the Numpy .npz files from this link and place them into `datasets/QuickDraw/shoes/npz`.
- QMUL-Shoes
  - Download the `train_svg_spa_png.h5` and `test_svg_spa_png.h5` files into `datasets/QMUL/shoes`.
  - Download the Fine-Grained SBIR Datasets (shoes and chairs).
  - Unzip `ShoeV2`, which is contained in `Fine_Grained SBIR Datasets`.
  - Move all of the real images (not the sketches) into `datasets/QMUL/shoes/photos`.
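
Before training, it can help to confirm that the files ended up where the steps above say they should. The following is a minimal sketch, not part of this repository: the helper name `find_missing_paths` is hypothetical, and the paths are copied from the dataset steps above.

```python
from pathlib import Path

# Paths copied from the dataset setup steps; adjust if your layout differs.
EXPECTED_PATHS = [
    "datasets/QuickDraw/shoes/npz",
    "datasets/QMUL/shoes/train_svg_spa_png.h5",
    "datasets/QMUL/shoes/test_svg_spa_png.h5",
    "datasets/QMUL/shoes/photos",
]


def find_missing_paths(root="."):
    """Return the expected dataset paths that do not exist under `root`.

    Hypothetical helper for illustration; the repository does not provide it.
    """
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = find_missing_paths()
    if missing:
        print("Missing dataset paths:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected dataset paths were found.")
```

Run it from the repository root. An empty result only means the paths exist; it does not validate the file contents.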
- Write a config file, following `configs/default_config.yml` as a template, and place it in `configs`.
- Run the command below.
  - `python train.py --config="configs/your-config.yml"`
  - e.g. `python train.py --config="configs/default_config.yml"`
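
For readers unfamiliar with the `--config` flag used in the commands above, here is a minimal sketch of how a script can accept it with the standard library's `argparse`. This is an assumption for illustration only: the actual `train.py` is not shown here and may parse its arguments differently, and the function name `parse_args` is hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse a --config flag pointing at a YAML config file.

    Illustrative only; the real train.py may handle arguments differently.
    """
    parser = argparse.ArgumentParser(description="Illustrative config flag parser.")
    parser.add_argument(
        "--config",
        required=True,
        help="Path to a config file, e.g. configs/default_config.yml",
    )
    return parser.parse_args(argv)
```

Calling `parse_args(["--config=configs/default_config.yml"])` returns a namespace whose `config` attribute holds the path as a string. The shell removes the surrounding double quotes before the script sees the value.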
- You can follow `sample.ipynb`.
sketch-photo2seq directly borrows sketch-rnn's encoder and decoder.
Therefore, I needed a PyTorch version of sketch-rnn, and I directly borrowed it from the repo below.
To build the encoder and decoder for raster images, I directly borrowed from the repo below.