# Personal Project: **Manga Translator**

This project uses recent object detection models to automate the translation of manga. We tried it with YOLOv3 and YOLOv4.
It relies on Roboflow's tutorial and tools to create a custom dataset of about 400 pages.

# YOLOv4

## Configuring cuDNN on Colab for YOLOv4



In [None]:
# CUDA: check that the NVIDIA CUDA drivers are pre-installed and which version is installed.
!/usr/local/cuda/bin/nvcc --version
# We need to install the correct cuDNN according to this output

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243


In [None]:
!nvidia-smi

Fri Jan 29 15:07:37 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   42C    P0    27W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                 ERR! |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
# Change the number depending on what GPU is listed above, under NVIDIA-SMI > Name.
# Tesla K80: 37
# Tesla P100: 60
# Tesla T4: 75
%env compute_capability=60

env: compute_capability=60
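Reading the GPU name off the `nvidia-smi` table and looking up the value by hand is easy to get wrong. The sketch below does the lookup in Python. The table is an assumption covering only a few GPUs Colab has offered (values follow NVIDIA's compute capability list; the Tesla K80 is 3.7), and the helper name `compute_capability_for` is hypothetical.

```python
# Compute capability (major*10 + minor) of a few GPUs found on Colab (assumed list)
COMPUTE_CAPABILITY = {
    "Tesla K80": 37,
    "Tesla P4": 61,
    "Tesla P100": 60,
    "Tesla V100": 70,
    "Tesla T4": 75,
}

def compute_capability_for(gpu_name):
    """Return the value for %env compute_capability given a GPU name such as
    'Tesla P100-PCIE-16GB', or None if the GPU is not in the table."""
    for prefix, capability in COMPUTE_CAPABILITY.items():
        if gpu_name.startswith(prefix):
            return capability
    return None

# In a notebook cell, the name can be obtained with:
#   !nvidia-smi --query-gpu=name --format=csv,noheader
print(compute_capability_for("Tesla P100-PCIE-16GB"))  # 60
```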


## Installing Darknet for YOLOv4 on Colab




In [None]:
from google.colab import drive 
drive.mount('/content/drive')


Mounted at /content/drive


In [None]:
%cd /content/drive/MyDrive/
# %rm -rf darknet

/content/drive/MyDrive


In [None]:
# We clone the fork of Darknet maintained by Roboflow,
# which includes small changes that configure Darknet for training
# !git clone https://github.com/roboflow-ai/darknet.git

Cloning into 'darknet'...
remote: Enumerating objects: 13289, done.
remote: Total 13289 (delta 0), reused 0 (delta 0), pack-reused 13289
Receiving objects: 100% (13289/13289), 12.13 MiB | 8.16 MiB/s, done.
Resolving deltas: 100% (9107/9107), done.
Checking out files: 100% (2002/2002), done.


In [None]:
# Download the pre-trained YOLOv4-tiny weights and the yolov4-tiny.conv.29 backbone used to start training
%cd /content/drive/MyDrive/darknet
!wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.weights
!wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.conv.29

/content/drive/MyDrive/darknet
--2021-01-18 15:18:33--  https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.weights
Resolving github.com (github.com)... 192.30.255.112
Connecting to github.com (github.com)|192.30.255.112|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github-production-release-asset-2e65be.s3.amazonaws.com/75388965/228a9c00-3ea4-11eb-8e80-28d71569f56c?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20210118%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210118T151833Z&X-Amz-Expires=300&X-Amz-Signature=ea8e7f5ac032708f54c761c8a3e4032469ecd85e1104214b6a9cb0bef7bc1ebf&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=75388965&response-content-disposition=attachment%3B%20filename%3Dyolov4-tiny.weights&response-content-type=application%2Foctet-stream [following]
--2021-01-18 15:18:33--  https://github-production-release-a

In [None]:
%cd /content/drive/MyDrive/darknet/
# Remove the repository's Makefile; the next cell writes a Colab-specific one
%rm Makefile

/content/drive/MyDrive/darknet


In [None]:
# Colab occasionally shifts dependencies around; at the time of writing, this Makefile builds Darknet on Colab

%%writefile Makefile
GPU=1
CUDNN=1
CUDNN_HALF=0
OPENCV=1
AVX=0
OPENMP=0
LIBSO=1
ZED_CAMERA=0
ZED_CAMERA_v2_8=0

# set GPU=1 and CUDNN=1 to speedup on GPU
# set CUDNN_HALF=1 to further speedup 3 x times (Mixed-precision on Tensor Cores) GPU: Volta, Xavier, Turing and higher
# set AVX=1 and OPENMP=1 to speedup on CPU (if error occurs then set AVX=0)
# set ZED_CAMERA=1 to enable ZED SDK 3.0 and above
# set ZED_CAMERA_v2_8=1 to enable ZED SDK 2.X

USE_CPP=0
DEBUG=0

ARCH= -gencode arch=compute_30,code=sm_30 \
      -gencode arch=compute_35,code=sm_35 \
      -gencode arch=compute_50,code=[sm_50,compute_50] \
      -gencode arch=compute_52,code=[sm_52,compute_52] \
	    -gencode arch=compute_61,code=[sm_61,compute_61]

OS := $(shell uname)

# Tesla V100
# ARCH= -gencode arch=compute_70,code=[sm_70,compute_70]

# GeForce RTX 2080 Ti, RTX 2080, RTX 2070, Quadro RTX 8000, Quadro RTX 6000, Quadro RTX 5000, Tesla T4, XNOR Tensor Cores
# ARCH= -gencode arch=compute_75,code=[sm_75,compute_75]

# Jetson XAVIER
# ARCH= -gencode arch=compute_72,code=[sm_72,compute_72]

# GTX 1080, GTX 1070, GTX 1060, GTX 1050, GTX 1030, Titan Xp, Tesla P40, Tesla P4
# ARCH= -gencode arch=compute_61,code=sm_61 -gencode arch=compute_61,code=compute_61

# GP100/Tesla P100 - DGX-1
# ARCH= -gencode arch=compute_60,code=sm_60

# For Jetson TX1, Tegra X1, DRIVE CX, DRIVE PX - uncomment:
# ARCH= -gencode arch=compute_53,code=[sm_53,compute_53]

# For Jetson Tx2 or Drive-PX2 uncomment:
# ARCH= -gencode arch=compute_62,code=[sm_62,compute_62]


VPATH=./src/
EXEC=darknet
OBJDIR=./obj/

ifeq ($(LIBSO), 1)
LIBNAMESO=libdarknet.so
APPNAMESO=uselib
endif

ifeq ($(USE_CPP), 1)
CC=g++
else
CC=gcc
endif

CPP=g++ -std=c++11
NVCC=nvcc
OPTS=-Ofast
LDFLAGS= -lm -pthread
COMMON= -Iinclude/ -I3rdparty/stb/include
CFLAGS=-Wall -Wfatal-errors -Wno-unused-result -Wno-unknown-pragmas -fPIC

ifeq ($(DEBUG), 1)
#OPTS= -O0 -g
#OPTS= -Og -g
COMMON+= -DDEBUG
CFLAGS+= -DDEBUG
else
ifeq ($(AVX), 1)
CFLAGS+= -ffp-contract=fast -mavx -mavx2 -msse3 -msse4.1 -msse4.2 -msse4a
endif
endif

CFLAGS+=$(OPTS)

ifneq (,$(findstring MSYS_NT,$(OS)))
LDFLAGS+=-lws2_32
endif

ifeq ($(OPENCV), 1)
COMMON+= -DOPENCV
CFLAGS+= -DOPENCV
LDFLAGS+= `pkg-config --libs opencv4 2> /dev/null || pkg-config --libs opencv`
COMMON+= `pkg-config --cflags opencv4 2> /dev/null || pkg-config --cflags opencv`
endif

ifeq ($(OPENMP), 1)
CFLAGS+= -fopenmp
LDFLAGS+= -lgomp
endif

ifeq ($(GPU), 1)
COMMON+= -DGPU -I/usr/local/cuda/include/
CFLAGS+= -DGPU
ifeq ($(OS),Darwin) #MAC
LDFLAGS+= -L/usr/local/cuda/lib -lcuda -lcudart -lcublas -lcurand
else
LDFLAGS+= -L/usr/local/cuda/lib64 -lcuda -lcudart -lcublas -lcurand
endif
endif

ifeq ($(CUDNN), 1)
COMMON+= -DCUDNN
ifeq ($(OS),Darwin) #MAC
CFLAGS+= -DCUDNN -I/usr/local/cuda/include
LDFLAGS+= -L/usr/local/cuda/lib -lcudnn
else
CFLAGS+= -DCUDNN -I/usr/local/cudnn/include
LDFLAGS+= -L/usr/local/cudnn/lib64 -lcudnn
endif
endif

ifeq ($(CUDNN_HALF), 1)
COMMON+= -DCUDNN_HALF
CFLAGS+= -DCUDNN_HALF
ARCH+= -gencode arch=compute_70,code=[sm_70,compute_70]
endif

ifeq ($(ZED_CAMERA), 1)
CFLAGS+= -DZED_STEREO -I/usr/local/zed/include
ifeq ($(ZED_CAMERA_v2_8), 1)
LDFLAGS+= -L/usr/local/zed/lib -lsl_core -lsl_input -lsl_zed
#-lstdc++ -D_GLIBCXX_USE_CXX11_ABI=0
else
LDFLAGS+= -L/usr/local/zed/lib -lsl_zed
#-lstdc++ -D_GLIBCXX_USE_CXX11_ABI=0
endif
endif

OBJ=image_opencv.o http_stream.o gemm.o utils.o dark_cuda.o convolutional_layer.o list.o image.o activations.o im2col.o col2im.o blas.o crop_layer.o dropout_layer.o maxpool_layer.o softmax_layer.o data.o matrix.o network.o connected_layer.o cost_layer.o parser.o option_list.o darknet.o detection_layer.o captcha.o route_layer.o writing.o box.o nightmare.o normalization_layer.o avgpool_layer.o coco.o dice.o yolo.o detector.o layer.o compare.o classifier.o local_layer.o swag.o shortcut_layer.o activation_layer.o rnn_layer.o gru_layer.o rnn.o rnn_vid.o crnn_layer.o demo.o tag.o cifar.o go.o batchnorm_layer.o art.o region_layer.o reorg_layer.o reorg_old_layer.o super.o voxel.o tree.o yolo_layer.o gaussian_yolo_layer.o upsample_layer.o lstm_layer.o conv_lstm_layer.o scale_channels_layer.o sam_layer.o
ifeq ($(GPU), 1)
LDFLAGS+= -lstdc++
OBJ+=convolutional_kernels.o activation_kernels.o im2col_kernels.o col2im_kernels.o blas_kernels.o crop_layer_kernels.o dropout_layer_kernels.o maxpool_layer_kernels.o network_kernels.o avgpool_layer_kernels.o
endif

OBJS = $(addprefix $(OBJDIR), $(OBJ))
DEPS = $(wildcard src/*.h) Makefile include/darknet.h

all: $(OBJDIR) backup results setchmod $(EXEC) $(LIBNAMESO) $(APPNAMESO)

ifeq ($(LIBSO), 1)
CFLAGS+= -fPIC

$(LIBNAMESO): $(OBJDIR) $(OBJS) include/yolo_v2_class.hpp src/yolo_v2_class.cpp
	$(CPP) -shared -std=c++11 -fvisibility=hidden -DLIB_EXPORTS $(COMMON) $(CFLAGS) $(OBJS) src/yolo_v2_class.cpp -o $@ $(LDFLAGS)

$(APPNAMESO): $(LIBNAMESO) include/yolo_v2_class.hpp src/yolo_console_dll.cpp
	$(CPP) -std=c++11 $(COMMON) $(CFLAGS) -o $@ src/yolo_console_dll.cpp $(LDFLAGS) -L ./ -l:$(LIBNAMESO)
endif

$(EXEC): $(OBJS)
	$(CPP) -std=c++11 $(COMMON) $(CFLAGS) $^ -o $@ $(LDFLAGS)

$(OBJDIR)%.o: %.c $(DEPS)
	$(CC) $(COMMON) $(CFLAGS) -c $< -o $@

$(OBJDIR)%.o: %.cpp $(DEPS)
	$(CPP) -std=c++11 $(COMMON) $(CFLAGS) -c $< -o $@

$(OBJDIR)%.o: %.cu $(DEPS)
	$(NVCC) $(ARCH) $(COMMON) --compiler-options "$(CFLAGS)" -c $< -o $@

$(OBJDIR):
	mkdir -p $(OBJDIR)
backup:
	mkdir -p backup
results:
	mkdir -p results
setchmod:
	chmod +x *.sh

.PHONY: clean

clean:
	rm -rf $(OBJS) $(EXEC) $(LIBNAMESO) $(APPNAMESO)

Writing Makefile


In [None]:
# Build Darknet with the Makefile above.
# On Colab Pro this works on a P100 GPU; on free Colab you may need to edit the Makefile for a K80.
# In general, the Makefile must tell Darknet which GPU architecture you are running on.
# If the Makefile above needs tweaking, try the commands below:
# %cd darknet/
# !sed -i 's/OPENCV=0/OPENCV=1/g' Makefile
# !sed -i 's/GPU=0/GPU=1/g' Makefile
# !sed -i 's/CUDNN=0/CUDNN=1/g' Makefile
# !sed -i "s/ARCH= -gencode arch=compute_60,code=sm_60/ARCH= -gencode arch=compute_${compute_capability},code=sm_${compute_capability}/g" Makefile
!make

chmod +x *.sh
g++ -std=c++11 -std=c++11 -Iinclude/ -I3rdparty/stb/include -DOPENCV `pkg-config --cflags opencv4 2> /dev/null || pkg-config --cflags opencv` -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wfatal-errors -Wno-unused-result -Wno-unknown-pragmas -fPIC -Ofast -DOPENCV -DGPU -DCUDNN -I/usr/local/cudnn/include -fPIC -c ./src/image_opencv.cpp -o obj/image_opencv.o
./src/image_opencv.cpp: In function ‘void draw_detections_cv_v3(void**, detection*, int, float, char**, image**, int, int)’:
                 float rgb[3];
                       ^~~
./src/image_opencv.cpp: In function ‘void cv_draw_object(image, float*, int, int, int*, float*, int*, int, char**)’:
         char buff[100];
              ^~~~
     int it_tb_res = cv::createTrackbar(it_trackbar_name, window_name, &it_trackbar_value, 1000);
         ^~~~~~~~~
     i

In [None]:
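# (Optional) download the reference YOLOv4-tiny custom config.
# Use the raw URL: a github.com/.../blob/... URL returns the HTML page, not the .cfg file
%cd /content/drive/MyDrive/darknet
# !wget https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4-tiny-custom.cfg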

/content/drive/MyDrive/darknet


## Set up Custom Dataset for YOLOv4

We used Roboflow to convert our dataset to the YOLO Darknet format.
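In the YOLO Darknet format, every image has a `.txt` file with the same name, and each line describes one box as `class_id x_center y_center width height`, with the four coordinates normalized by the image width and height. As an illustration, the hypothetical helper below converts one such line back to pixel corners.

```python
def yolo_line_to_box(line, img_w, img_h):
    """Convert one YOLO Darknet label line to (class_id, x_min, y_min, x_max, y_max) in pixels."""
    class_id, xc, yc, w, h = line.split()
    # Undo the normalization by the image size
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    return (int(class_id),
            round(xc - w / 2), round(yc - h / 2),
            round(xc + w / 2), round(yc + h / 2))

# A box centred on an 800x1200 page, 25% of its width and 10% of its height
print(yolo_line_to_box("0 0.5 0.5 0.25 0.1", 800, 1200))  # (0, 300, 540, 500, 660)
```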


In [None]:

#if you already have YOLO darknet format, you can skip this step
%cd /content/drive/MyDrive/darknet 
!curl -L "https://app.roboflow.com/ds/oX5KAe6whg?key=IVDN8mHOsy"  > roboflow.zip; unzip roboflow.zip; rm roboflow.zip

/content/drive/MyDrive/darknet


In [None]:
# Set up the training file directories for the custom dataset
import os

%cd /content/drive/MyDrive/darknet/
%cp train/_darknet.labels data/obj.names
%mkdir data/obj
# Copy the images and their YOLO .txt labels into data/obj
%cp train/*.jpg data/obj/
%cp valid/*.jpg data/obj/

%cp train/*.txt data/obj/
%cp valid/*.txt data/obj/

# obj.data tells Darknet the number of classes, the image lists, the class names and where to save weights
with open('data/obj.data', 'w') as out:
  out.write('classes = 1\n')
  out.write('train = data/train.txt\n')
  out.write('valid = data/valid.txt\n')
  out.write('names = /content/drive/MyDrive/darknet/data/obj.names\n')
  out.write('backup = backup/')

# Write the train file (just the image list)
with open('data/train.txt', 'w') as out:
  for img in [f for f in os.listdir('train') if f.endswith('jpg')]:
    out.write('data/obj/' + img + '\n')

# Write the valid file (just the image list)
with open('data/valid.txt', 'w') as out:
  for img in [f for f in os.listdir('valid') if f.endswith('jpg')]:
    out.write('data/obj/' + img + '\n')

/content/drive/MyDrive/darknet
mkdir: cannot create directory ‘data/obj’: File exists


KeyboardInterrupt: ignored
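Since the copy above was interrupted (`KeyboardInterrupt`), it is worth checking that every image listed in `data/train.txt` and `data/valid.txt` has its label next to it. In this layout, images and labels share `data/obj/`, so the label path is the image path with `.txt` instead of `.jpg`. A minimal sketch, with the hypothetical helper `missing_labels`:

```python
import os

def missing_labels(image_list_path, root="."):
    """Return the image paths listed in a Darknet image list whose .txt label is missing."""
    missing = []
    with open(image_list_path) as f:
        for line in f:
            img = line.strip()
            if not img:
                continue
            # data/obj/page.jpg -> data/obj/page.txt
            label = os.path.splitext(img)[0] + ".txt"
            if not os.path.exists(os.path.join(root, label)):
                missing.append(img)
    return missing

# In the notebook (run from the darknet folder):
# print(missing_labels('data/train.txt'), missing_labels('data/valid.txt'))
```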

## Write Custom Training Config for YOLOv4

In [None]:
# We build the config dynamically based on the number of classes,
# assembling it from base config files. The result has the same shape as cfg/yolo-obj.cfg
def file_len(fname):
  # Number of lines in the file (0 for an empty file)
  with open(fname) as f:
    return sum(1 for _ in f)

num_classes = file_len('train/_darknet.labels')
print("writing config for a custom YOLOv4 detector detecting number of classes: " + str(num_classes))

# Instructions from the darknet repo:
# change max_batches to classes*2000, but not less than the number of training images and not less than 6000 (e.g. max_batches=6000 if you train for 3 classes)
# change steps to 80% and 90% of max_batches (e.g. steps=4800,5400)
if os.path.exists('./cfg/custom-yolov4-detector.cfg'): os.remove('./cfg/custom-yolov4-detector.cfg')


with open('./cfg/custom-yolov4-detector.cfg', 'a') as f:
  f.write('[net]' + '\n')
  f.write('batch=64' + '\n')
  # Fewer subdivisions train faster but need more GPU memory; 12 is suggested as optimal, but you might need 24, 36 or 64
  f.write('subdivisions=24' + '\n')
  f.write('width=416' + '\n')
  f.write('height=416' + '\n')
  f.write('channels=3' + '\n')
  f.write('momentum=0.949' + '\n')
  f.write('decay=0.0005' + '\n')
  f.write('angle=0' + '\n')
  f.write('saturation = 1.5' + '\n')
  f.write('exposure = 1.5' + '\n')
  f.write('hue = .1' + '\n')
  f.write('\n')
  f.write('learning_rate=0.001' + '\n')
  f.write('burn_in=1000' + '\n')
  # Adjust max_batches up or down to change the training time.
  # Darknet counts iterations over batches, not epochs.
  max_batches = num_classes*3000
  #max_batches = 2000
  f.write('max_batches=' + str(max_batches) + '\n')
  f.write('policy=steps' + '\n')
  steps1 = .8 * max_batches
  steps2 = .9 * max_batches
  f.write('steps='+str(steps1)+','+str(steps2) + '\n')

  # Instructions from the darknet repo:
  # change classes=80 to your number of classes in each of the 3 [yolo] layers, and
  # change filters=255 to filters=(classes + 5)x3 in the last [convolutional] before each [yolo] layer
  num_filters = (num_classes + 5) * 3
  anchors = '12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401'
  # Each base file ends inside the last [convolutional] block before a [yolo] layer,
  # so we append that block's filters/activation, then the [yolo] layer with its mask
  for part, mask in [('cfg/yolov4-custom2.cfg', '0,1,2'),
                     ('cfg/yolov4-custom3.cfg', '3,4,5'),
                     ('cfg/yolov4-custom4.cfg', '6,7,8')]:
    with open(part, 'r') as base:
      f.write(base.read())
    f.write('filters=' + str(num_filters) + '\n')
    f.write('activation=linear\n\n')
    f.write('[yolo]\n')
    f.write('mask = ' + mask + '\n')
    f.write('anchors = ' + anchors + '\n')
    f.write('classes=' + str(num_classes) + '\n')

  # Append the last base file, which finishes the config
  with open('cfg/yolov4-custom5.cfg', 'r') as base:
    f.write(base.read())

print("file is written!")    




writing config for a custom YOLOv4 detector detecting number of classes: 1
file is written!
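The darknet rule quoted in the comments above can be written as a small function, which also gives the `filters` value for the last `[convolutional]` layer before each `[yolo]` layer. This is a sketch for comparison (the helper name is hypothetical): the cells in this notebook use `num_classes*3000` and `num_classes*5000`, i.e. a shorter schedule than the rule gives for a single class.

```python
def darknet_schedule(num_classes, num_train_images):
    """Recommended (max_batches, (step1, step2), filters) following the darknet README rule."""
    # max_batches = classes*2000, but not less than the number of training images and not less than 6000
    max_batches = max(num_classes * 2000, num_train_images, 6000)
    # steps at 80% and 90% of max_batches (integer arithmetic avoids "4800.0"-style values)
    steps = (max_batches * 8 // 10, max_batches * 9 // 10)
    # filters of the last [convolutional] layer before each [yolo] layer
    filters = (num_classes + 5) * 3
    return max_batches, steps, filters

print(darknet_schedule(1, 400))  # (6000, (4800, 5400), 18)
```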


In [None]:
# Compute the values that depend on the number of classes, then write the YOLOv4-tiny config from a template
import os
def file_len(fname):
  # Number of lines in the file (0 for an empty file)
  with open(fname) as f:
    return sum(1 for _ in f)

num_classes = file_len('/content/drive/MyDrive/darknet/train/_darknet.labels')
max_batches = num_classes*5000
steps1 = .8 * max_batches
steps2 = .9 * max_batches
steps_str = str(steps1)+','+str(steps2)
num_filters = (num_classes + 5) * 3


print("writing config for a custom YOLOv4 detector detecting number of classes: " + str(num_classes))

# Instructions from the darknet repo:
# change max_batches to classes*2000, but not less than the number of training images and not less than 6000 (e.g. max_batches=6000 if you train for 3 classes)
# change steps to 80% and 90% of max_batches (e.g. steps=4800,5400)
if os.path.exists('./cfg/custom-yolov4-tiny-detector.cfg'): os.remove('./cfg/custom-yolov4-tiny-detector.cfg')


# Customize IPython's %%writefile so that {placeholders} in the cell are filled from global variables
from IPython.core.magic import register_line_cell_magic

@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, 'w') as f:
        f.write(cell.format(**globals()))

writing config for a custom YOLOv4 detector detecting number of classes: 1
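`%%writetemplate` simply runs `str.format` on the cell text with the notebook's globals, so every `{name}` placeholder is replaced by the variable of that name. The sketch below reproduces this outside IPython with an explicit dictionary (hypothetical values matching the single-class run). One consequence: a literal `{` or `}` in the cell would have to be doubled.

```python
def render_template(cell, variables):
    # Same substitution as writetemplate, but with an explicit dict instead of globals()
    return cell.format(**variables)

template = "max_batches = {max_batches}\nsteps={steps_str}\nfilters={num_filters}\n"
variables = {"max_batches": 5000, "steps_str": "4000.0,4500.0", "num_filters": 18}
print(render_template(template, variables))

# Doubled braces come out as single literal braces
print(render_template("literal braces: {{ }}", {}))  # literal braces: { }
```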


In [None]:
%%writetemplate ./cfg/yolov4-tiny.cfg
[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=24
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = {max_batches}
policy=steps
steps={steps_str}
scales=.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[route]
layers = -1,-2

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[route]
layers = -6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[route]
layers = -1,-2

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[route]
layers = -6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[route]
layers = -1,-2

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[route]
layers = -6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

##################################

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters={num_filters}
activation=linear



[yolo]
mask = 3,4,5
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes={num_classes}
num=6
jitter=.3
scale_x_y = 1.05
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
ignore_thresh = .7
truth_thresh = 1
random=0
nms_kind=greedynms
beta_nms=0.6

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 23

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters={num_filters}
activation=linear

[yolo]
mask = 1,2,3
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes={num_classes}
num=6
jitter=.3
scale_x_y = 1.05
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
ignore_thresh = .7
truth_thresh = 1
random=0
nms_kind=greedynms
beta_nms=0.6
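The template defines two `[yolo]` heads. As a worked example, the sketch below derives their grid sizes and output channels from the config: with a 416x416 input, the two stride-2 convolutions and three stride-2 max-pools give a total stride of 32 (a 13x13 grid), and the upsampled branch works at stride 16 (26x26). With the masks `3,4,5` and `1,2,3`, the first anchor (10,14) is not used by either head. The function name and its defaults are illustrative.

```python
ANCHORS = [(10, 14), (23, 27), (37, 58), (81, 82), (135, 169), (344, 319)]

def tiny_heads(width=416, height=416, num_classes=1):
    """Grid size, anchors, output channels and box count of each [yolo] head."""
    heads = []
    # (total stride, mask) of each [yolo] layer, in the order they appear in the cfg
    for stride, mask in [(32, [3, 4, 5]), (16, [1, 2, 3])]:
        grid_w, grid_h = width // stride, height // stride
        heads.append({
            "grid": (grid_w, grid_h),
            "anchors": [ANCHORS[i] for i in mask],
            # matches filters={num_filters} = (classes + 5) * 3 in the preceding conv
            "channels": len(mask) * (num_classes + 5),
            "boxes": grid_w * grid_h * len(mask),
        })
    return heads

for head in tiny_heads():
    print(head)
print(sum(h["boxes"] for h in tiny_heads()))  # 2535 candidate boxes per 416x416 image
```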

In [None]:
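# Here is the file that was just written. You may want to adjust some settings,
# like the number of subdivisions: fewer subdivisions train faster but need more GPU memory,
# so if you run out of CUDA memory on Colab, increase subdivisions (e.g. to 32 or 64).
# The training and inference commands below load this config as custom-yolov4-tiny-detector.cfg,
# so we copy it under that name as well.
%cp cfg/yolov4-tiny.cfg cfg/custom-yolov4-tiny-detector.cfg
# %cat cfg/custom-yolov4-detector.cfg
%cat cfg/yolov4-tiny.cfg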

[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=24
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 5000
policy=steps
steps=4000.0,4500.0
scales=.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[route]
layers = -1,-2

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[route]
layers = -6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[route]


## Train Custom YOLOv4 Detector

In [None]:
# !mkdir darknet
# %cd darknet
!ls

3rdparty			  darknet.py		 README.roboflow.txt
appveyor.yml			  darknet_video.py	 results
backup				  data			 result.txt
bad.list			  image_yolov2.sh	 result.xml
build				  image_yolov3.sh	 scripts
build.ps1			  include		 src
build.sh			  json_mjpeg_streams.sh  test
cfg				  libdarknet.so		 train
chart_custom-yolov4-detector.png  LICENSE		 uselib
chart.png			  Makefile		 valid
cmake				  net_cam_v3.sh		 video_v2.sh
CMakeLists.txt			  obj			 video_yolov3.sh
darknet				  predictions.jpg	 yolov4.conv.137
DarknetConfig.cmake.in		  README.md


In [None]:
# !cp -r /content/drive/MyDrive/darknet/ ./darknet
# Use the %cd magic: `!cd` only changes directory in a throwaway subshell
%cd /content/drive/MyDrive/darknet
# Make sure the compiled binary is executable
!chmod +x ./darknet


/content/drive/MyDrive/darknet


In [None]:
# If you get "CUDA out of memory", increase subdivisions in the config above.
# Lower max_batches above for a shorter training run.
# Full YOLOv4 alternative:
# !./darknet detector train data/obj.data cfg/custom-yolov4-detector.cfg yolov4.conv.137 -dont_show -map
# This run continues from previously saved weights; a first run would typically start from yolov4-tiny.conv.29.
# -map periodically computes mAP@0.5 on the validation set.
!./darknet detector train data/obj.data cfg/custom-yolov4-tiny-detector.cfg backup/custom-yolov4-tiny-detector_best.weights -dont_show -map


Output was truncated to the last 5000 lines.
 4041: 0.361114, 0.334212 avg loss, 0.000100 rate, 0.648503 seconds, 193968 images, 0.191005 hours left
Loaded: 0.000033 seconds

 (next mAP calculation at 4100 iterations) 
 Last accuracy mAP@0.5 = 86.99 %, best = 88.29 % 
 4042: 0.299803, 0.330771 avg loss, 0.000100 rate, 0.677213 seconds, 194016 images, 0.190822 hours left
Loaded: 0.000035 seconds

 (next mAP calculation at 4100 iterations) 
 Last accuracy mAP@0.5 = 86.99 %, best = 88.29 % 
 4043: 0.351662, 0.332860 avg loss, 0.000100 rate, 0.642199 seconds, 194064 images, 0.190716 hours left
Loaded: 0.000033 seconds

 (next mAP calculation at 4100 iterations) 
 Last accuracy mAP@0.5 = 86.99 %, best = 88.29 % 
 4044: 0.450524, 0.344627 avg loss, 0.000100 rate, 0.628923 seconds, 194112 images, 0.190516 hours left
Loaded: 0.000051 seconds

 (next mAP calculation at 4100 iterations) 
 Last accuracy mAP@0.5 = 86.99 %, best = 88.29 % 
 4045: 0.334
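Each iteration line of the training log reads `iteration: loss, average loss avg loss, learning rate rate, ...`, and the mAP lines report the last and best mAP@0.5. To plot the curves, the log can be parsed with regular expressions. The sketch below assumes the log text has been saved to a file (the name `train.log` is hypothetical); the patterns are based only on the lines shown above.

```python
import re

# Patterns based on the log lines shown above
ITER_RE = re.compile(r"^\s*(\d+): ([\d.]+), ([\d.]+) avg loss, ([\d.]+) rate")
MAP_RE = re.compile(r"Last accuracy mAP@0\.5 = ([\d.]+) %, best = ([\d.]+) %")

def parse_darknet_log(lines):
    """Return (iterations, maps): iterations is a list of (iteration, loss, avg_loss, rate),
    maps a list of (last mAP@0.5, best mAP@0.5) in percent."""
    iterations, maps = [], []
    for line in lines:
        m = ITER_RE.match(line)
        if m:
            it, loss, avg, rate = m.groups()
            iterations.append((int(it), float(loss), float(avg), float(rate)))
            continue
        m = MAP_RE.search(line)
        if m:
            maps.append((float(m.group(1)), float(m.group(2))))
    return iterations, maps

sample = [
    " 4041: 0.361114, 0.334212 avg loss, 0.000100 rate, 0.648503 seconds, 193968 images, 0.191005 hours left",
    "Loaded: 0.000033 seconds",
    " Last accuracy mAP@0.5 = 86.99 %, best = 88.29 % ",
]
print(parse_darknet_log(sample))  # ([(4041, 0.361114, 0.334212, 0.0001)], [(86.99, 88.29)])

# With the full log saved to a file (hypothetical name):
# with open('train.log') as f:
#     iterations, maps = parse_darknet_log(f)
```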

# Run Inference on Pages with the Saved YOLO Weights



In this part, we use the previously trained model to translate a chapter. For each zone found by the detector, we:

1.   Detect the text inside it and translate it
2.   Inpaint the zone to erase the original text
3.   Put the translated text inside the zone as neatly as possible

We used EasyOCR, Google Translate and OpenCV inpainting (`cv2.inpaint`) to do so.
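Step 3 requires shrinking the translation until it fits in the detected zone. A minimal sketch, using Python's `textwrap` and a fixed per-character width and line height at font scale 1.0 (both are assumptions; with OpenCV, `cv2.getTextSize` would give the real rendered size). The helper name `fit_text` is hypothetical.

```python
import textwrap

def fit_text(text, box_w, box_h, char_w=10.0, line_h=20.0, min_scale=0.3):
    """Return (scale, lines): the largest font scale, from 1.0 down to min_scale in
    0.05 steps, at which `text` wrapped to the box width fits in box_h pixels.
    char_w and line_h are assumed glyph width and line height at scale 1.0."""
    scale = 1.0
    while True:
        chars_per_line = max(1, int(box_w // (char_w * scale)))
        lines = textwrap.wrap(text, width=chars_per_line)
        # Stop when the text fits, or when the smallest allowed scale is reached
        if len(lines) * line_h * scale <= box_h or scale <= min_scale:
            return scale, lines
        scale = max(min_scale, round(scale - 0.05, 2))

print(fit_text("Where do you think you are going?", 120, 60))
# (1.0, ['Where do you', 'think you', 'are going?'])
```

If nothing fits, the function returns the smallest scale and lets the caller clip or overflow the zone.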


In [None]:
#define utility function
import os

def imShow(path):
  import cv2
  import matplotlib.pyplot as plt
  %matplotlib inline

  image = cv2.imread(path)
  height, width = image.shape[:2]
  resized_image = cv2.resize(image,(3*width, 3*height), interpolation = cv2.INTER_CUBIC)

  fig = plt.gcf()
  fig.set_size_inches(18, 10)
  plt.axis("off")
  #plt.rcParams['figure.figsize'] = [10, 5]
  plt.imshow(cv2.cvtColor(resized_image, cv2.COLOR_BGR2RGB))
  plt.show()

In [None]:
# Check whether weights have been saved yet.
# backup/ holds the weights for our detector: Darknet saves <cfg name>_last.weights every 100 iterations,
# <cfg name>_xxxx.weights every 1000 iterations, and <cfg name>_final.weights once training is complete
!ls backup
# If it is empty, you haven't trained for long enough yet: you need to train for at least 100 iterations

 custom-yolov4-detector_best.weights
 custom-yolov4-tiny-detector_1000.weights
 custom-yolov4-tiny-detector_2000.weights
 custom-yolov4-tiny-detector_3000.weights
 custom-yolov4-tiny-detector_4000.weights
 custom-yolov4-tiny-detector_5000.weights
 custom-yolov4-tiny-detector_best.weights
 custom-yolov4-tiny-detector_final.weights
 custom-yolov4-tiny-detector_last.weights
'last 89%'
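Darknet's naming makes it easy to pick a checkpoint automatically: `_last` is refreshed every 100 iterations, `_NNNN` files are written every 1000 iterations, `_final` at the end, and `_best` (when training with `-map`) keeps the weights with the best validation mAP. The preference order in this hypothetical helper (best, then final, then last, then the highest-numbered checkpoint) is our assumption.

```python
import re

def pick_weights(filenames, prefix="custom-yolov4-tiny-detector"):
    """Return the preferred weights file name from a directory listing, or None."""
    for suffix in ("_best", "_final", "_last"):
        name = prefix + suffix + ".weights"
        if name in filenames:
            return name
    # Otherwise fall back to the highest-numbered checkpoint, e.g. <prefix>_5000.weights
    pattern = re.compile(re.escape(prefix) + r"_(\d+)\.weights")
    numbered = []
    for name in filenames:
        m = pattern.fullmatch(name)
        if m:
            numbered.append((int(m.group(1)), name))
    return max(numbered)[1] if numbered else None

# In the notebook: pick_weights(os.listdir('backup'))
```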


In [None]:
# `./darknet detect` reads class names from data/coco.names (through cfg/coco.data), so we copy our class names there
%cp data/obj.names data/coco.names


In [None]:

# test/ has images that we can try our detector on
test_images = [f for f in os.listdir('test') if f.endswith('.jpg')]
print(len(test_images))
for path in test_images:
  img_path = "test/" + path
  # Test our detector! (-dont_show disables the display window; the result is written to predictions.jpg)
  !./darknet detect cfg/custom-yolov4-tiny-detector.cfg backup/custom-yolov4-tiny-detector_best.weights {img_path} -dont_show -thresh 0.3
  imShow('predictions.jpg')

Here are the utility functions that build a translated PDF chapter from the website the manga is displayed on.

In [None]:

!pip install Pillow
!pip install fpdf
!pip install googletrans==3.1.0a0
!pip install pytesseract
import requests
import os
from fpdf import FPDF
from PIL import Image
import matplotlib.image as mpimg

import numpy as np
import matplotlib.pyplot as plt
import cv2
import argparse
from pytesseract import Output
from googletrans import Translator
translator = Translator()
# FUTURE IMPROVEMENTS: FIND THE NUMBER OF PAGES AUTOMATICALLY, ADAPT THE TEXT SIZE TO THE PAGE SIZE,
# SPLIT LARGE IMAGES INTO SEVERAL PARTS FOR BETTER PRECISION ON THE TEXT,
# IMPORT THE RAW CHAPTER PAGES
def pasVide(text):
    # "pas vide" = "not empty": True if text contains something other than whitespace
    return text.strip() != ""

# NOW WE NEED TO LOAD THE IMAGE,
# FIND THE PLACES WITH BLACK-ON-WHITE WRITING,
# TRANSLATE THEM,
# WRITE THEM

def translate_jpg_page(numpage, langue, langue_google):
    """OCR one page, translate each text block to English and write the translations
    in a white margin added to the right of the page.
    Relies on the globals nom_manga and num_chap (manga name and chapter number)."""
    print(numpage)
    filename = nom_manga + "/chapters/" + str(num_chap) + "/" + numpage + ".jpg"
    img2 = mpimg.imread(filename)  # RGB uint8 array for a JPEG
    width = img2.shape[1]
    # Add a 400 px white margin on the right, where the translations are written
    margin = np.full((img2.shape[0], 400, 3), 255, dtype=np.uint8)
    img = np.concatenate([img2.astype('uint8'), margin], axis=1)

    import pytesseract
    # On Colab (Linux), install Tesseract with `!apt install tesseract-ocr` plus the language pack
    # for `langue`; pytesseract then finds the binary on PATH. A Windows .exe such as the one
    # below cannot run there:
    # pytesseract.pytesseract.tesseract_cmd = r'/content/drive/MyDrive/Tesseract-OCR/tesseract.exe'

    # Grayscale, smooth while keeping edges, then set pixels at or below 127 to black before OCR
    grayImage = cv2.cvtColor(img2, cv2.COLOR_RGB2GRAY)
    tried = cv2.bilateralFilter(grayImage, 9, 75, 75)
    (thresh, blackAndWhiteImage) = cv2.threshold(tried, 127, 255, cv2.THRESH_TOZERO)
    results = pytesseract.image_to_data(blackAndWhiteImage, output_type=Output.DICT, lang=langue)
    j = 0
    numtext = 1
    Y = 30
    X2 = 0
    Y2 = 0
    X = width + 10  # translations start 10 px into the margin
    text2 = ""  # text block being accumulated
    text3 = ""  # numbered translations, one per line
    for i in range(0, len(results["text"])):
        # Bounding box of the current OCR result
        x = results["left"][i]
        y = results["top"][i]
        w = results["width"][i]
        h = results["height"][i]
        # OCR text and its confidence (recent pytesseract versions may return it as a float or string)
        text = results["text"][i]
        conf = int(float(results["conf"][i]))

        # Filter out weak-confidence text localizations
        if conf > 0:
            # A word far from the previous one (more than 100 px below or 200 px to the right)
            # ends the current block: translate it and start a new one
            if j != 0 and (y - Y2 > 100 or x - X2 > 200) and pasVide(text) and pasVide(text2):
                text3 += "[" + str(numtext) + "] " + translator.translate(text2, src=langue_google, dest='en').text + "\n"
                text2 = ""
                numtext += 1
            if text != "":
                text2 += text
            Y2 = y
            X2 = x
            j += 1
    if pasVide(text2):
        text3 += "[" + str(numtext) + "] " + translator.translate(text2, src=langue_google, dest='en').text

    # Write the translations in the margin, at most 7 words per line
    dy = 15
    yf = Y

    for line in text3.split('\n'):
        yf += dy
        mots = line.split()
        text4 = ""
        j = 0
        for mot in mots:
            j += 1
            text4 += mot + " "
            if j == 7:
                cv2.putText(img, text4, (X, yf), cv2.FONT_HERSHEY_SIMPLEX, 0.35, (0, 0, 0))
                text4 = ""
                yf += dy
                j = 0
        cv2.putText(img, text4, (X, yf), cv2.FONT_HERSHEY_SIMPLEX, 0.4, (0, 0, 0))

    # Save the annotated page over the original; OpenCV expects BGR channel order
    cv2.imwrite(filename, cv2.cvtColor(img, cv2.COLOR_RGB2BGR))
def makePdf(pdfFileName, listPages):

    cover = Image.open(str(listPages[0]) + ".jpg")
    width, height = cover.size

    pdf = FPDF(unit = "pt", format = [width, height])

    for page in listPages:
        pdf.add_page()
        pdf.image(str(page) + ".jpg", 0, 0)

    pdf.output(pdfFileName, "F")
    
    
    
    
    
def create_chapter_pdf(nom_manga, num_chap, nombre_page, lang, lang_google):
    # Download every page of the chapter into <nom_manga>/chapters/<num_chap>/
    # (the translation and PDF steps below are commented out)
    url = 'https://image..../comic/'+ nom_manga +'/chapters/'+ str(num_chap)+'/001.jpg'
    filename = url.split('comic/')[-1]
    if not os.path.exists(os.path.dirname(filename)):
        os.makedirs(os.path.dirname(filename))

    listPages = []
    for k in range(1, nombre_page + 1):
        page = f"{k:03d}"  # zero-padded page number: 001, 002, ...
        url = 'https://image..../comic/'+ nom_manga +'/chapters/'+ str(num_chap)+'/'+ page +'.jpg'
        r = requests.get(url, allow_redirects=True)
        filename = url.split('comic/')[-1]
        with open(filename, 'wb') as f:
            f.write(r.content)
    filename = os.path.dirname(url.split('comic/')[-1])
#     for k in range(1,nombre_page + 1):
#         if k <10:
#             page = "00" + str(k)
#         elif k <100:
#             page = "0"+str(k)
#         else : 
#             page = str(k)
#         translate_jpg_page(page, lang, lang_google)
#         listPages.append(filename +"/"+ page)
#     filename = url.split('comic/')[-1]
#     name_pdf = os.path.dirname(filename)+ ".pdf"
#     makePdf(name_pdf, listPages)
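The cells above build page names with an if/elif chain and break each translated line every 7 words before drawing it with `cv2.putText`. As a sketch, both string steps can be written as small pure functions; `page_name` and `chunk_words` are hypothetical helpers, not part of the notebook, and they only reproduce the string logic, not the drawing.

```python
def page_name(k: int) -> str:
    """Zero-pad a page number to three digits (1 -> '001', 123 -> '123')."""
    return f"{k:03d}"


def chunk_words(line: str, words_per_line: int = 7) -> list:
    """Split `line` into strings of at most `words_per_line` words each."""
    words = line.split()
    return [" ".join(words[i:i + words_per_line])
            for i in range(0, len(words), words_per_line)]


print([page_name(k) for k in (1, 12, 123)])  # ['001', '012', '123']
print(chunk_words("one two three four five six seven eight nine"))
# ['one two three four five six seven', 'eight nine']
```

Each string returned by `chunk_words` would then be passed to one `cv2.putText` call, moving down by `dy` pixels between calls.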

## Translate pages

In [None]:
import os
import requests
from tqdm import tqdm
from bs4 import BeautifulSoup as bs
from urllib.parse import urljoin, urlparse


def is_valid(url):
    """
    Checks whether `url` is a valid URL.
    """
    parsed = urlparse(url)
    return bool(parsed.netloc) and bool(parsed.scheme)


def get_all_images(url):
    """
    Returns the URLs of all the .jpg images on the page at `url`.
    """
    soup = bs(requests.get(url).content, "html.parser")
    urls = []

    for img in tqdm(soup.find_all("img"), "Extracting images"):
        # Lazy-loaded images keep their real URL in `data-src`; fall back to `src`
        img_url = img.attrs.get("data-src") or img.attrs.get("src")
        if not img_url:
            # The <img> tag has neither attribute: skip it
            continue
        img_url = urljoin(url, img_url)

        # Drop the query string, if any (".../001.jpg?v=2" -> ".../001.jpg")
        img_url = img_url.split("?", 1)[0]

        # Keep only .jpg images (also accept ".png" here if the site uses PNG pages)
        name = urlparse(img_url).path.rstrip('/').split('/')[-1]
        if os.path.splitext(name)[1] != ".jpg":
            continue
        if is_valid(img_url):
            urls.append(img_url)
    return urls


def download(url, pathname):
    """
    Downloads a file given a URL and puts it in the folder `pathname`.
    """
    # Create the target folder if it doesn't exist yet
    os.makedirs(pathname, exist_ok=True)
    # Stream the response body in chunks instead of downloading it all at once
    response = requests.get(url, stream=True)
    # Total file size in bytes (0 if the server sends no Content-Length header)
    file_size = int(response.headers.get("Content-Length", 0))
    # Local file name: last component of the URL path
    filename = os.path.join(pathname, urlparse(url).path.rstrip('/').split('/')[-1])
    # Byte-based progress bar, advanced manually by the size of each chunk
    progress = tqdm(total=file_size or None, desc=f"Downloading {filename}",
                    unit="B", unit_scale=True, unit_divisor=1024)
    with open(filename, "wb") as f:
        for data in response.iter_content(1024):
            f.write(data)
            progress.update(len(data))
    progress.close()
    return filename


def getImages(url):
  urls = get_all_images(url)
  new_urls = []
  for link in urls:
    new_url = download(link, '/content/drive/MyDrive/manga/')
    new_urls.append(new_url)
  return new_urls  
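The image filter in `get_all_images` resolves each `src`, strips the query string and keeps only `.jpg` files. A compact sketch of the same check, assuming standard-library URL parsing is enough for the target site; `jpg_url_or_none` is a hypothetical name, `example.com` is a placeholder, and the extension test is made case-insensitive, which is a small deviation from the notebook.

```python
import posixpath
from typing import Optional
from urllib.parse import urljoin, urlparse


def jpg_url_or_none(page_url: str, src: str) -> Optional[str]:
    """Resolve an <img> `src` against the page URL and keep it only if it points to a .jpg."""
    parsed = urlparse(urljoin(page_url, src))
    # posixpath.splitext only looks at the last extension of the URL path
    ext = posixpath.splitext(parsed.path)[1]
    if ext.lower() != ".jpg" or not (parsed.scheme and parsed.netloc):
        return None
    # Drop the query string and fragment, like the "?" handling in get_all_images
    return parsed._replace(query="", fragment="").geturl()


print(jpg_url_or_none("https://example.com/manga/ch1", "/img/001.jpg?v=2"))
# https://example.com/img/001.jpg
```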

In [None]:
link = "https://....com/manga/chapter-01"
paths = getImages(link)

## Apply YOLO


In [None]:
print(paths)
# Quick test on a single page; remove this line to process every downloaded page
paths = ['/content/drive/MyDrive/manga/003.jpg']

['/content/drive/MyDrive/manga/001.jpg', '/content/drive/MyDrive/manga/002.jpg', '/content/drive/MyDrive/manga/003.jpg', '/content/drive/MyDrive/manga/004.jpg', '/content/drive/MyDrive/manga/005.jpg', '/content/drive/MyDrive/manga/006.jpg', '/content/drive/MyDrive/manga/007.jpg', '/content/drive/MyDrive/manga/008.jpg', '/content/drive/MyDrive/manga/009.jpg']


In [None]:
from google_trans_new import google_translator  
translator = google_translator()  
for img_path in paths:
    image = plt.imread(img_path)
    (H, W) = image.shape[:2]
    print(H,W)
    # Darknet is run on a 416x416 copy of the page; the boxes it prints are
    # scaled back to the original size with h_factor / w_factor below
    blob = cv2.resize(image, (416,416))/255.0
    plt.imsave(img_path, blob)

    !./darknet detect cfg/custom-yolov4-detector.cfg backup/custom-yolov4-detector_best.weights {img_path} -dont_show -ext_output > result.txt
    imShow('predictions.jpg')
    f = open("result.txt", "r")
    h_factor, w_factor = H/416, W/416
    plt.imsave(img_path, image)
    for line in f.readlines():
      if line[0:4]=="Text":
        x1,y1,h,w = 0,0,0,0
        for i in range(len(line)-6):
          if line[i:i+7] == "left_x:":
            print("x1 =" + str(int(line[i+7:i+12])) )
            x1=int(max(int(line[i+7:i+12]),0)*w_factor)
          elif line[i:i+6] == "top_y:":
            print("y1 =" + str(int(line[i+6:i+11])) )
            y1=int(max(int(line[i+6:i+11]),0)*h_factor)
          elif line[i:i+6] == "width:":
            print("w =" + str(int(line[i+6:i+11])) )
            w= int(max(int(line[i+6:i+11]),0)*w_factor)
          elif line[i:i+7] == "height:":
            print("h =" + str(int(line[i+7:i+12])) )
            h= int(max(int(line[i+7:i+12]),0)*h_factor)
        # plt.imread returns an RGB array: convert the crop to grayscale before filtering
        grayImage = cv2.cvtColor(image[y1:y1+h,x1:x1+w], cv2.COLOR_RGB2GRAY)
        tried = cv2.bilateralFilter(grayImage,9,75,75)
        (thresh, blackAndWhiteImage) = cv2.threshold(tried, 127, 255, cv2.THRESH_TOZERO)
        text = find_text(blackAndWhiteImage, 'kor') # or 'chi_sim', 'chi_tra', 'jpn'
        
        if pasVide(text):
          print("text : " +  translator.translate(text, lang_tgt='en'))


    f.close()
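The loop above reads each coordinate with fixed-width slices such as `line[i+7:i+12]`, which silently breaks if a number is wider than the slice. A regular expression is less fragile. This is a sketch that assumes the `-ext_output` line layout shown in the comment (inferred from the fields the notebook searches for); `parse_box` and `BOX_RE` are hypothetical names.

```python
import re
from typing import Optional, Tuple

# Assumed layout of one detection line printed by darknet with -ext_output, e.g.
# "Text: 97%\t(left_x:  120   top_y:   55   width:  200   height:   80)"
BOX_RE = re.compile(
    r"^(?P<label>[^:]+):\s*(?P<conf>\d+)%.*?"
    r"left_x:\s*(?P<x>-?\d+)\s+top_y:\s*(?P<y>-?\d+)\s+"
    r"width:\s*(?P<w>-?\d+)\s+height:\s*(?P<h>-?\d+)"
)


def parse_box(line: str, orig_w: int, orig_h: int,
              net_size: int = 416) -> Optional[Tuple[str, int, int, int, int, int]]:
    """Return (label, confidence %, x1, y1, w, h) in original-page pixels, or None."""
    m = BOX_RE.search(line)
    if m is None:
        return None
    # The page was resized to net_size x net_size before detection: scale back
    sx, sy = orig_w / net_size, orig_h / net_size
    # Negative coordinates (boxes touching the border) are clamped to 0
    x, y, w, h = (max(int(m.group(k)), 0) for k in ("x", "y", "w", "h"))
    return (m.group("label"), int(m.group("conf")),
            int(x * sx), int(y * sy), int(w * sx), int(h * sy))


print(parse_box("Text: 97%\t(left_x:  120   top_y:   55   width:  200   height:   80)", 832, 1248))
# ('Text', 97, 240, 165, 400, 240)
```

Lines that are not detections, such as the "Enter Image Path: ..." prompt, return `None` and can simply be skipped.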
    
    


In [None]:
!pip install git+https://github.com/JaidedAI/EasyOCR.git
import easyocr
# Loading the model into memory is slow, so run this only once.
# Other languages tried: 'ch_sim', 'ja' (EasyOCR only accepts each of them combined with 'en')
reader = easyocr.Reader(['ko', 'en'])

In [None]:
import os
%cd /content/drive/MyDrive/
# Work on copies of the pages, resized to the 416x416 network input
!mkdir -p manga/resized
%cp manga/*.jpg manga/resized/

%cd /content/drive/MyDrive/darknet/
# Darknet's batch mode reads the images to process from a list, one path per line
with open('/content/drive/MyDrive/manga/manga.txt', 'w') as out:
  for img in [f for f in os.listdir('/content/drive/MyDrive/manga') if f.endswith('jpg')]:
    out.write('/content/drive/MyDrive/manga/resized/' + img + '\n')
for f in os.listdir('/content/drive/MyDrive/manga/resized'):
  image = plt.imread('/content/drive/MyDrive/manga/resized/'+f )
  blob = cv2.resize(image, (416,416))/255.0
  plt.imsave('/content/drive/MyDrive/manga/resized/'+f , blob)
# Detect the text boxes on every listed page and save the detections to result.txt
!./darknet detector test cfg/voc.data cfg/custom-yolov4-detector.cfg backup/custom-yolov4-detector_best.weights -dont_show -ext_output < ../manga/manga.txt > result.txt




In [None]:
import textwrap
import cv2
import numpy as np
import matplotlib.pyplot as plt

f = open("result.txt", "r")

for line in f.readlines():
  # "Enter Image Path: <path>.jpg: Predicted in ..." starts the results of a new page
  if line[0:5]=="Enter":
    for i in range(len(line)-3):
      if line[i:i+4] == ".jpg":
        filename = line[0:i+4].split('/')[-1]
        img_path = '/content/drive/MyDrive/manga/'+filename
    image= plt.imread(img_path)
    H,W = image.shape[:2]
    # Darknet ran on the 416x416 copies: scale the boxes back to the original page size
    h_factor, w_factor = H/416, W/416
  # cfg/voc.data points to the VOC class names, so the single custom class is
  # reported under the first VOC name, "aeroplane"
  if line[0:4]=="aero":
    x1,y1,h,w = 0,0,0,0
    for i in range(len(line)-6):
      if line[i:i+7] == "left_x:":
        print("x1 =" + str(int(line[i+7:i+12])) )
        x1=int(max(int(line[i+7:i+12]),0)*w_factor)
      elif line[i:i+6] == "top_y:":
        print("y1 =" + str(int(line[i+6:i+11])) )
        y1=int(max(int(line[i+6:i+11]),0)*h_factor)
      elif line[i:i+6] == "width:":
        print("w =" + str(int(line[i+6:i+11])) )
        w= int(max(int(line[i+6:i+11]),0)*w_factor)
      elif line[i:i+7] == "height:":
        print("h =" + str(int(line[i+7:i+12])) )
        h= int(max(int(line[i+7:i+12]),0)*h_factor)
    # Run OCR on the crop only: overwrite the page with the crop, then restore the full page
    plt.imsave(img_path, image[y1:y1+h,x1:x1+w])
    text = find_text_easy_ocr(img_path, reader)
    plt.imsave(img_path, image)
    if pasVide(text):
      # Build the inpainting mask: an H x W uint8 array that is 1 inside the text box
      mask = np.zeros((H, W), dtype=np.uint8)
      mask[y1+1:min(y1+h, H), x1+1:min(x1+w, W)] = 1
      # Erase the original text
      image = cv2.inpaint(image, mask, 3, cv2.INPAINT_TELEA)

      text = translator.translate(text, lang_tgt='en')
      # Split the translation into lines that fit in the box (assuming ~15 px per character)
      lines = textwrap.wrap(text, width=max(1, w // 15))
      y2 = y1+15
      for text_line in lines:
        cv2.putText(
          image,                     # numpy array on which the text is written
          text_line,                 # text
          (x1, y2),                  # bottom-left corner of the text
          cv2.FONT_HERSHEY_SIMPLEX,  # font family
          1,                         # font scale
          (130, 80, 150, 255),       # font color
          3)                         # line thickness
        y2 += 30
      plt.imshow(image)
      plt.show()
      plt.imsave(img_path, image)

f.close()
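The cell above walks `result.txt` line by line, switching page on each "Enter Image Path:" line. The same bookkeeping can be done first, producing one list of boxes per image. This is a sketch that assumes the line layouts shown in the comment; `group_detections` is a hypothetical helper, the boxes stay in 416x416 coordinates, and scaling them back is left to the caller.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed line layouts in result.txt (batch mode, -ext_output):
# "Enter Image Path: /path/001.jpg: Predicted in 21.5 milli-seconds."
# "aeroplane: 91%\t(left_x:   10   top_y:   20   width:   30   height:   40)"
IMAGE_RE = re.compile(r"Enter Image Path:\s*(?P<path>\S+?\.jpg)")
BOX_RE = re.compile(r"left_x:\s*(-?\d+)\s+top_y:\s*(-?\d+)\s+"
                    r"width:\s*(-?\d+)\s+height:\s*(-?\d+)")


def group_detections(lines: Iterable[str],
                     label_prefix: str = "aero") -> Dict[str, List[Tuple[int, ...]]]:
    """Map each image file name to its (left_x, top_y, width, height) boxes, clamped at 0."""
    groups = defaultdict(list)
    current = None
    for line in lines:
        m = IMAGE_RE.search(line)
        if m:
            # A new page starts: remember its file name
            current = m.group("path").split("/")[-1]
            continue
        if current is not None and line.startswith(label_prefix):
            b = BOX_RE.search(line)
            if b:
                groups[current].append(tuple(max(int(v), 0) for v in b.groups()))
    return dict(groups)
```

Usage: `with open("result.txt") as f: boxes = group_detections(f)`. Pages without any detection do not appear in the result.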

In [None]:
from PIL import Image
from PIL import ImageFont
from PIL import ImageDraw
import textwrap
import numpy as np

# Inpainting mask for the last box (x1, y1, w, h): 1 inside the box, 0 elsewhere
mask = np.zeros((H, W), dtype=np.uint8)
mask[y1:y1+h, x1:x1+w] = 1

image_inpaint = cv2.inpaint(image, mask, 3, cv2.INPAINT_TELEA)
plt.imshow(image_inpaint)
plt.show()
text=  translator.translate(text, lang_tgt='en')
# Split the text into lines that fit in the box (assuming ~15 px per character)
lines = textwrap.wrap(text, width=max(1, w // 15))
y2 = y1+15
for line in lines:
      cv2.putText(
        image_inpaint, # draw on the inpainted image, which is the one displayed below
        line, #text
        (x1,y2), # bottom-left corner of the text
        cv2.FONT_HERSHEY_SIMPLEX, #font family
        1, #font scale
        (130, 80, 150, 255), #font color
        3) # line thickness
      y2 += 30
plt.imshow(image_inpaint)
plt.show()
print(lines)
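A rectangular inpainting mask can be built directly with NumPy slicing, which is much faster than filling a nested Python list pixel by pixel and cannot mix up the row and column sizes. This is a minimal sketch; `box_mask` is a hypothetical helper, and it relies on `cv2.inpaint` taking an 8-bit single-channel mask whose non-zero pixels mark the area to reconstruct.

```python
import numpy as np


def box_mask(height: int, width: int, x1: int, y1: int, w: int, h: int) -> np.ndarray:
    """Return a (height, width) uint8 mask: 1 inside the box, 0 elsewhere."""
    mask = np.zeros((height, width), dtype=np.uint8)
    # Clamp the box to the image borders
    x_start, y_start = max(x1, 0), max(y1, 0)
    x_end, y_end = min(x1 + w, width), min(y1 + h, height)
    mask[y_start:y_end, x_start:x_end] = 1
    return mask


m = box_mask(4, 6, x1=1, y1=1, w=3, h=2)
print(m.shape, int(m.sum()))  # (4, 6) 6
```

In the notebook this would be used as `image = cv2.inpaint(image, box_mask(H, W, x1, y1, w, h), 3, cv2.INPAINT_TELEA)`.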

In [None]:

from google_trans_new import google_translator  
translator = google_translator()  
for img_path in paths:
    image = plt.imread(img_path)
    (H, W) = image.shape[:2]
    print(H,W)
    # Darknet is run on a 416x416 copy of the page; the boxes it prints are
    # scaled back to the original size with h_factor / w_factor below
    blob = cv2.resize(image, (416,416))/255.0
    plt.imsave(img_path, blob)

    !./darknet detect cfg/custom-yolov4-detector.cfg backup/custom-yolov4-detector_best.weights {img_path} -dont_show -ext_output > result.txt
    imShow('predictions.jpg')
    f = open("result.txt", "r")
    h_factor, w_factor = H/416, W/416
    plt.imsave(img_path, image)
    for line in f.readlines():
      if line[0:4]=="Text":
        x1,y1,h,w = 0,0,0,0
        for i in range(len(line)-6):
          if line[i:i+7] == "left_x:":
            print("x1 =" + str(int(line[i+7:i+12])) )
            x1=int(max(int(line[i+7:i+12]),0)*w_factor)
          elif line[i:i+6] == "top_y:":
            print("y1 =" + str(int(line[i+6:i+11])) )
            y1=int(max(int(line[i+6:i+11]),0)*h_factor)
          elif line[i:i+6] == "width:":
            print("w =" + str(int(line[i+6:i+11])) )
            w= int(max(int(line[i+6:i+11]),0)*w_factor)
          elif line[i:i+7] == "height:":
            print("h =" + str(int(line[i+7:i+12])) )
            h= int(max(int(line[i+7:i+12]),0)*h_factor)
        # grayImage = cv2.cvtColor(image[y1:y1+h,x1:x1+w], cv2.COLOR_BGR2RGB)
        # tried = cv2.bilateralFilter(grayImage,9,75,75)
        # (thresh, blackAndWhiteImage) = cv2.threshold(tried, 127, 255, cv2.THRESH_TOZERO)
        plt.imsave(img_path, image[y1:y1+h,x1:x1+w])
        text = find_text_easy_ocr(img_path, reader) 
        plt.imsave(img_path, image)
        plt.imshow(image[y1:y1+h,x1:x1+w])
        plt.show()
        if pasVide(text):
          print("text : " +  translator.translate(text, lang_tgt='en'))


    f.close()
    
    

In [None]:
!pip install Pillow
!pip install fpdf
!pip install google_trans_new
!pip install pytesseract
!sudo apt install tesseract-ocr
!sudo apt-get install tesseract-ocr-chi-sim
!sudo apt-get install tesseract-ocr-chi-tra

!sudo apt-get install tesseract-ocr-kor
!sudo apt-get install tesseract-ocr-jpn
!apt install libtesseract-dev

In [None]:
!which tesseract
# !export TESSDATA_PREFIX=/usr/share/tesseract-ocr/4.00/
!tesseract --list-langs

/usr/bin/tesseract
List of available languages (3):
osd
chi_sim
eng


In [None]:
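import pytesseract
from pytesseract import Output

pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract'


def find_text(image, langue):
  """Run Tesseract on `image` with the language `langue` (e.g. 'kor', 'chi_sim')
  and return the recognized words concatenated."""
  results = pytesseract.image_to_data(image, output_type=Output.DICT, lang=langue)
  text = ""
  for i in range(0, len(results["text"])):
    # Keep words with a positive confidence; "conf" may be an int, a float or a
    # numeric string depending on the pytesseract version, hence float()
    if float(results["conf"][i]) > 0:
      text += results["text"][i]
  print(results)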
  return text
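The confidence filter in `find_text` can be isolated in a pure function, which makes it easy to test without Tesseract. This is a sketch; `join_confident_words` is a hypothetical name, it only assumes the `"text"` and `"conf"` keys of the `image_to_data(..., output_type=Output.DICT)` result, and the `sep` parameter is an addition (a space may suit Korean, whose words are space-separated).

```python
from typing import Any, Dict, List


def join_confident_words(data: Dict[str, List[Any]], min_conf: float = 0.0, sep: str = "") -> str:
    """Join the words of a pytesseract image_to_data DICT result.

    Words whose confidence is not above `min_conf` (Tesseract reports -1 for
    non-word rows) and empty strings are dropped. float() accepts int, float
    and numeric-string confidences alike.
    """
    words = [word for word, conf in zip(data["text"], data["conf"])
             if float(conf) > min_conf and word.strip()]
    return sep.join(words)


fake = {"text": ["", "hello", "world", "?"], "conf": ["-1", "91.5", 88, -1]}
print(join_confident_words(fake, sep=" "))  # hello world
```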

In [None]:
def find_text_easy_ocr(image_path, reader):
  """Return all the text EasyOCR finds in the image at `image_path`, concatenated."""
  # detail=0 makes readtext return only the recognized strings, without boxes or confidences
  results = reader.readtext(image_path, detail=0)
  text = "".join(results)
  return text
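With the default `detail=1`, EasyOCR's `readtext` returns `(bounding box, text, confidence)` triples instead of bare strings, which allows dropping low-confidence fragments before translation. This is a sketch under that assumption; `join_easyocr_results` is a hypothetical helper and the 0.3 threshold is an arbitrary default, not a tuned value.

```python
from typing import List, Sequence, Tuple

# One EasyOCR result with detail=1: (bounding-box corner points, text, confidence)
Result = Tuple[Sequence[Sequence[float]], str, float]


def join_easyocr_results(results: List[Result], min_conf: float = 0.3, sep: str = " ") -> str:
    """Join the recognized strings, skipping fragments whose confidence is below `min_conf`."""
    return sep.join(text for _bbox, text, conf in results if conf >= min_conf)


box = [[0, 0], [10, 0], [10, 10], [0, 10]]
print(join_easyocr_results([(box, "hello", 0.9), (box, "zz", 0.1), (box, "world", 0.5)]))
# hello world
```

In the notebook it would be called as `join_easyocr_results(reader.readtext(img_path))`.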

Convert to a Keras model:

In [None]:
# %cd ..
!git clone https://github.com/allanzelener/yad2k.git


Cloning into 'yad2k'...
remote: Enumerating objects: 243, done.
remote: Total 243 (delta 0), reused 0 (delta 0), pack-reused 243
Receiving objects: 100% (243/243), 2.35 MiB | 16.16 MiB/s, done.
Resolving deltas: 100% (106/106), done.


In [None]:
%cd yad2k

/content/drive/MyDrive/YOLOv4-Keras-Converter/yad2k


In [None]:
%ls
!chmod +x ./yad2k.py

environment.yml  LICENSE          test_yolo.py             yad2k.py*
etc/             model_data/      train_overfit.py
font/            README.md        voc_conversion_scripts/
images/          retrain_yolo.py  yad2k/


In [None]:
!./yad2k.py -p "/content/drive/MyDrive/darknet/cfg/yolov4-tiny.cfg" /content/drive/MyDrive/YOLOv4-Keras-Converter/yolov4-tiny.weights /content/drive/MyDrive/YOLOv4-Keras-Converter/model/yolov4tiny.h5

In [None]:
%cd /content/drive/MyDrive/YOLOv4-Keras-Converter/


/content/drive/MyDrive/YOLOv4-Keras-Converter


In [None]:
!chmod +x ./convert.py

In [None]:
!./convert.py /content/drive/MyDrive/darknet/cfg/custom-yolov4-tiny-detector.cfg yolov4.weights model/yolov4.mlmodel

# Training with YOLOv3

In [None]:
!git clone https://github.com/roboflow-ai/keras-yolo3

Cloning into 'keras-yolo3'...
remote: Enumerating objects: 169, done.
remote: Total 169 (delta 0), reused 0 (delta 0), pack-reused 169
Receiving objects: 100% (169/169), 172.74 KiB | 217.00 KiB/s, done.
Resolving deltas: 100% (80/80), done.


In [None]:
%cd keras-yolo3/

/content/keras-yolo3


In [None]:
!curl -L "https://app.roboflow.com/ds/.........." > roboflow.zip; unzip roboflow.zip; rm roboflow.zip

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   892  100   892    0     0   1037      0 --:--:-- --:--:-- --:--:--  1036
100 23.3M  100 23.3M    0     0  10.6M      0  0:00:02  0:00:02 --:--:-- 27.8M
Archive:  roboflow.zip
 extracting: README.roboflow.txt     
   creating: test/
 extracting: test/0003_jpg.rf.0845a6b934d8ea4df1ff607610c980d1.jpg  
 extracting: test/0015_jpg.rf.1b3fe77ab35de5a59702a2512f720b2f.jpg  
 extracting: test/0017_jpg.rf.11cec208052fd0f5a8bcde0813b55fea.jpg  
 extracting: test/001_jpg.rf.7ee053c4598b0e64f2b9698c9c4cb541.jpg  
 extracting: test/0021_jpg.rf.50197f6f5b077cf5ea6f7a0e5b950973.jpg  
 extracting: test/002_jpg.rf.8b30b7044d7cb3402deb1e85cbbb8389.jpg  
 extracting: test/002_jpg.rf.a5cc54f827916a2903d2249470dc366a.jpg  
 extracting: test/003_jpg.rf.0413d1eadf47755135bb1fc971ecab4f.jpg  
 extracting: test/0046_jpg.rf.2c95e3f1f3ad88f946d3706caf

In [None]:
%cd train

/content/keras-yolo3/train


In [None]:
%mv * ../

In [None]:
%cd ..

/content/keras-yolo3


In [None]:
%ls

0001_jpg.rf.1951d588f8633574d5ff5d6e116d2ef4.jpg
0001_jpg.rf.1bcf29843cb242de9936837ecfb9a0bb.jpg
0001_jpg.rf.3459453a4acb34608ca8aa5294076da0.jpg
0001_jpg.rf.6560f765574a26bdba33a07fa964276e.jpg
0001_jpg.rf.f6505d8be7de8d8c656261b6642328ea.jpg
0001_jpg.rf.f9963114a969131a6a31bf4a7c71de15.jpg
0002--1-_jpg.rf.28fd85ac4529b366961f302dff88a425.jpg
0002--1-_jpg.rf.8bc723b2021e5805813234a44ce176a6.jpg
0002--1-_jpg.rf.c170c304fe7bda63d8e5fef8a017e015.jpg
0002_jpg.rf.4327d91e50fdf07300850a620d613a96.jpg
0002_jpg.rf.9b09e54acf524ed11878f391cdcb416a.jpg
0002_jpg.rf.f137f608d5942a41caf71290551191df.jpg
0003_jpg.rf.3609a013eeef4f1c86208631bdb72ca9.jpg
0003_jpg.rf.82abf506b5eb046b9456f92d124bd50e.jpg
0003_jpg.rf.ba3b9106b53fede4c09185bb0c24a3a3.jpg
0004_jpg.rf.13ad731c70354cac2fb9dea4bae9863a.jpg
0004_jpg.rf.5d94e0bab58b7f3c90a226728b691a14.jpg
0004_jpg.rf.7648c1193facd36701c3d3ab1bc8a439.jpg
0004_jpg.rf.98463c5a9c6771640bfae2bba629914d.jpg
0004_jpg.rf.d88e49bd9ac094432f399f09feccb69e.jpg
0004_jpg

In [None]:
!wget https://pjreddie.com/media/files/yolov3-tiny.weights

--2021-01-18 22:54:34--  https://pjreddie.com/media/files/yolov3-tiny.weights
Resolving pjreddie.com (pjreddie.com)... 128.208.4.108
Connecting to pjreddie.com (pjreddie.com)|128.208.4.108|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 35434956 (34M) [application/octet-stream]
Saving to: ‘yolov3-tiny.weights’


2021-01-18 22:54:39 (7.57 MB/s) - ‘yolov3-tiny.weights’ saved [35434956/35434956]



In [None]:
# %tensorflow_version 1.x

from google.colab import drive
%matplotlib inline
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
!python convert.py yolov3-tiny.cfg yolov3-tiny.weights /content/drive/MyDrive/YOLOv3/yolo.h5

2021-01-18 22:58:11.380322: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
Loading weights.
Weights Header:  0 2 0 [32013312]
Parsing Darknet config.
Creating Keras model.
Parsing section net_0
Parsing section convolutional_0
conv2d bn leaky (3, 3, 3, 16)
2021-01-18 22:58:13.060290: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-01-18 22:58:13.061562: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-01-18 22:58:13.117111: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-18 22:58:13.117703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:00:04.0 name: Tesla T4 computeCapability: 7

In [None]:
!pip install keras==2.2.4
# !pip install keras.applications==1.0.8
# !pip install gast==0.2.2
# !pip install keras-preprocessing==1.0.5

Collecting keras==2.2.4
  Using cached https://files.pythonhosted.org/packages/5e/10/aa32dad071ce52b5502266b5c659451cfd6ffcbf14e6c8c4f16c0ff5aaab/Keras-2.2.4-py2.py3-none-any.whl
Installing collected packages: keras
  Found existing installation: Keras 2.4.3
    Uninstalling Keras-2.4.3:
      Successfully uninstalled Keras-2.4.3
Successfully installed keras-2.2.4


In [None]:
"""
Self-contained Python script to train YOLOv3 on your own dataset
"""
import numpy as np
import keras.backend as K
from keras.layers import Input, Lambda
from keras.models import Model
from keras.optimizers import Adam
from keras.callbacks import TensorBoard, ModelCheckpoint, ReduceLROnPlateau, EarlyStopping

from yolo3.model import preprocess_true_boxes, yolo_body, tiny_yolo_body, yolo_loss
from yolo3.utils import get_random_data


def _main():
    annotation_path = '_annotations.txt'  # path to Roboflow data annotations
    log_dir = '/content/drive/MyDrive/YOLOv3/logs/000/'                 # where we're storing our logs
    classes_path = '_classes.txt'         # path to Roboflow class names
    anchors_path = 'model_data/tiny_yolo_anchors.txt'
    class_names = get_classes(classes_path)
    print("-------------------CLASS NAMES-------------------")
    print(class_names)
    print("-------------------CLASS NAMES-------------------")
    num_classes = len(class_names)
    anchors = get_anchors(anchors_path)

    input_shape = (416,416) # multiple of 32, hw

    is_tiny_version = len(anchors)==6 # default setting
    if is_tiny_version:
        model = create_tiny_model(input_shape, anchors, num_classes,
            freeze_body=2, weights_path='/content/drive/MyDrive/YOLOv3/logs/000/ep483-loss27.896-val_loss1373.584.h5')
    else:
        model = create_model(input_shape, anchors, num_classes,
            freeze_body=2, weights_path='/content/drive/MyDrive/YOLOv3/logs/000/ep483-loss27.896-val_loss1373.584.h5') # make sure you know what you freeze

    logging = TensorBoard(log_dir=log_dir)
    checkpoint = ModelCheckpoint(log_dir + 'ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5',
        monitor='val_loss', save_weights_only=True, save_best_only=True, period=3)
    reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, verbose=1)
    early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1)

    val_split = 0.2 # set the size of the validation set
    with open(annotation_path) as f:
        lines = f.readlines()
    np.random.seed(10101)
    np.random.shuffle(lines)
    np.random.seed(None)
    num_val = int(len(lines)*val_split)
    num_train = len(lines) - num_val

    # Train with frozen layers first, to get a stable loss.
    # Adjust the number of epochs to your dataset. This step alone is enough to obtain a reasonable model.
    if True:
        model.compile(optimizer=Adam(lr=1e-3), loss={
            # use custom yolo_loss Lambda layer.
            'yolo_loss': lambda y_true, y_pred: y_pred})

        batch_size = 32
        print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size))
        model.fit_generator(data_generator_wrapper(lines[:num_train], batch_size, input_shape, anchors, num_classes),
                steps_per_epoch=max(1, num_train//batch_size),
                validation_data=data_generator_wrapper(lines[num_train:], batch_size, input_shape, anchors, num_classes),
                validation_steps=max(1, num_val//batch_size),
                epochs=500,
                initial_epoch=483,
                callbacks=[logging, checkpoint])
        model.save_weights(log_dir + 'trained_weights_stage_1.h5')

    # Unfreeze and continue training, to fine-tune.
    # Train longer if the result is not good.
    if True:
        for i in range(len(model.layers)):
            model.layers[i].trainable = True
        model.compile(optimizer=Adam(lr=1e-4), loss={'yolo_loss': lambda y_true, y_pred: y_pred}) # recompile to apply the change
        print('Unfreeze all of the layers.')

        batch_size = 32 # note that more GPU memory is required after unfreezing the body
        print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size))
        model.fit_generator(data_generator_wrapper(lines[:num_train], batch_size, input_shape, anchors, num_classes),
            steps_per_epoch=max(1, num_train//batch_size),
            validation_data=data_generator_wrapper(lines[num_train:], batch_size, input_shape, anchors, num_classes),
            validation_steps=max(1, num_val//batch_size),
            epochs=100,
            initial_epoch=50,
            callbacks=[logging, checkpoint, reduce_lr, early_stopping])
        model.save_weights(log_dir + 'trained_weights_final.h5')

    # Further training if needed.


def get_classes(classes_path):
    '''loads the classes'''
    with open(classes_path) as f:
        class_names = f.readlines()
    class_names = [c.strip() for c in class_names]
    return class_names

def get_anchors(anchors_path):
    '''loads the anchors from a file'''
    with open(anchors_path) as f:
        anchors = f.readline()
    anchors = [float(x) for x in anchors.split(',')]
    return np.array(anchors).reshape(-1, 2)


def create_model(input_shape, anchors, num_classes, load_pretrained=True, freeze_body=2,
            weights_path='/content/drive/MyDrive/YOLOv3/logs/000/ep483-loss27.896-val_loss1373.584.h5'):
    '''create the training model'''
    K.clear_session() # get a new session
    image_input = Input(shape=(None, None, 3))
    h, w = input_shape
    num_anchors = len(anchors)

    y_true = [Input(shape=(h//{0:32, 1:16, 2:8}[l], w//{0:32, 1:16, 2:8}[l], \
        num_anchors//3, num_classes+5)) for l in range(3)]

    model_body = yolo_body(image_input, num_anchors//3, num_classes)
    print('Create YOLOv3 model with {} anchors and {} classes.'.format(num_anchors, num_classes))

    if load_pretrained:
        model_body.load_weights(weights_path, by_name=True, skip_mismatch=True)
        print('Load weights {}.'.format(weights_path))
        if freeze_body in [1, 2]:
            # Freeze darknet53 body or freeze all but 3 output layers.
            num = (185, len(model_body.layers)-3)[freeze_body-1]
            for i in range(num): model_body.layers[i].trainable = False
            print('Freeze the first {} layers of total {} layers.'.format(num, len(model_body.layers)))

    model_loss = Lambda(yolo_loss, output_shape=(1,), name='yolo_loss',
        arguments={'anchors': anchors, 'num_classes': num_classes, 'ignore_thresh': 0.5})(
        [*model_body.output, *y_true])
    model = Model([model_body.input, *y_true], model_loss)

    return model

def create_tiny_model(input_shape, anchors, num_classes, load_pretrained=True, freeze_body=2,
            weights_path='/content/drive/MyDrive/YOLOv3/logs/000/ep483-loss27.896-val_loss1373.584.h5'):
    '''create the training model, for Tiny YOLOv3'''
    K.clear_session() # get a new session
    image_input = Input(shape=(None, None, 3))
    h, w = input_shape
    num_anchors = len(anchors)

    y_true = [Input(shape=(h//{0:32, 1:16}[l], w//{0:32, 1:16}[l], \
        num_anchors//2, num_classes+5)) for l in range(2)]

    model_body = tiny_yolo_body(image_input, num_anchors//2, num_classes)
    print('Create Tiny YOLOv3 model with {} anchors and {} classes.'.format(num_anchors, num_classes))

    if load_pretrained:
        model_body.load_weights(weights_path, by_name=True, skip_mismatch=True)
        print('Load weights {}.'.format(weights_path))
        if freeze_body in [1, 2]:
            # Freeze the darknet body or freeze all but 2 output layers.
            num = (20, len(model_body.layers)-2)[freeze_body-1]
            for i in range(num): model_body.layers[i].trainable = False
            print('Freeze the first {} layers of total {} layers.'.format(num, len(model_body.layers)))

    model_loss = Lambda(yolo_loss, output_shape=(1,), name='yolo_loss',
        arguments={'anchors': anchors, 'num_classes': num_classes, 'ignore_thresh': 0.7})(
        [*model_body.output, *y_true])
    model = Model([model_body.input, *y_true], model_loss)

    return model

def data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes):
    '''data generator for fit_generator'''
    n = len(annotation_lines)
    i = 0
    while True:
        image_data = []
        box_data = []
        for b in range(batch_size):
            if i==0:
                np.random.shuffle(annotation_lines)
            image, box = get_random_data(annotation_lines[i], input_shape, random=True)
            image_data.append(image)
            box_data.append(box)
            i = (i+1) % n
        image_data = np.array(image_data)
        box_data = np.array(box_data)
        y_true = preprocess_true_boxes(box_data, input_shape, anchors, num_classes)
        yield [image_data, *y_true], np.zeros(batch_size)

def data_generator_wrapper(annotation_lines, batch_size, input_shape, anchors, num_classes):
    n = len(annotation_lines)
    if n==0 or batch_size<=0: return None
    return data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes)

if __name__ == '__main__':
    _main()

Using TensorFlow backend.


-------------------CLASS NAMES-------------------
['Text']
-------------------CLASS NAMES-------------------
Create Tiny YOLOv3 model with 6 anchors and 1 classes.
Load weights /content/drive/MyDrive/YOLOv3/logs/000/ep483-loss27.896-val_loss1373.584.h5.
Freeze the first 42 layers of total 44 layers.

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where

Train on 512 samples, val on 127 samples, with batch size 32.
Epoch 484/500

Epoch 485/500
Epoch 486/500
Epoch 487/500
Epoch 488/500
Epoch 489/500
Epoch 490/500
Epoch 491/500
Epoch 492/500
Epoch 493/500
Epoch 494/500
Epoch 495/500
Epoch 496/500
Epoch 497/500
Epoch 498/500
Epoch 499/500
Epoch 500/500
Unfreeze all of the layers.
Train on 512 samples, val on 127 samples, with batch size 32.
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100

Epoch 00062: ReduceLROnPlateau reduci
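The "Train on 512 samples, val on 127 samples" line follows from `val_split = 0.2` in `_main()`. The sketch below reproduces that arithmetic and the resulting step counts; `split_counts` is a hypothetical helper, and the total of 639 annotation lines is inferred from the log (512 + 127), not stated elsewhere in the notebook.

```python
def split_counts(num_lines: int, val_split: float = 0.2, batch_size: int = 32):
    """Reproduce the train/validation split and the step counts computed in _main()."""
    num_val = int(num_lines * val_split)
    num_train = num_lines - num_val
    steps_per_epoch = max(1, num_train // batch_size)
    validation_steps = max(1, num_val // batch_size)
    return num_train, num_val, steps_per_epoch, validation_steps


# 512 + 127 = 639 annotation lines, inferred from the log above
print(split_counts(639))  # (512, 127, 16, 3)
```

So each epoch only runs 16 training batches and 3 validation batches of 32 images, which is worth keeping in mind when reading the epoch counts.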

In [None]:
!python yolo.py --model=/content/drive/MyDrive/YOLOv3/logs/000/trained_weights_final.h5 --anchors='model_data/tiny_yolo_anchors.txt' --classes="_classes.txt" --input="./valid"

Using TensorFlow backend.


In [None]:
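# Note: yolo_video.py has no --folder option (see the usage message below); it accepts --image, --input and --output
!python yolo_video.py --model /content/drive/MyDrive/YOLOv3/logs/000/trained_weights_final.h5 --anchors='./model_data/tiny_yolo_anchors.txt' --classes _classes.txt --folder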

Using TensorFlow backend.
usage: yolo_video.py [-h] [--model MODEL_PATH] [--anchors ANCHORS_PATH]
                     [--classes CLASSES_PATH] [--gpu_num GPU_NUM] [--image]
                     [--input [INPUT]] [--output [OUTPUT]]
yolo_video.py: error: unrecognized arguments: --folder


In [None]:
# Create an empty __init__.py (%%writefile cannot be used with an empty cell body)
!touch __init__.py
