Simple Image Crawling and File System Mapping

Have fun

Introduction

This is a simple tool mapping images and sub links in a single html page to file system directory structure.

When we run image_tool a simple file system server will be running background. The file system server is based on PathFileSystem provided by go-fuse. File system operation such as ls, cd, cat will trigger interface defined in go-fuse, so we implement some useful interface in order to update file system structure dynamicly. Currently the file system information including dir entry list, file attributes and file data is all stored in memory.

Build

$ export build_path=/path/to/build
$ mkdir -p $build_path/src/github.com/amyangfei && cd $build_path/src/github.com/amyangfei
$ git clone https://github.com/amyangfei/image_viewer
$ export GOPATH=$GOPATH:$build_path
$ cd image_viewer && make

Headless Crawling

Javascript executing is turned off by default. If we want to execute js, turn on --headless option and chrome headless will be used. Chrome and chrome driver is needed in headless mode. Dependencies installation instructions in Ubuntu/Debian is following:

$ curl -sSL https://dl.google.com/linux/linux_signing_key.pub | apt-key add -
$ echo "deb [arch=amd64] https://dl.google.com/linux/chrome/deb/ stable main" > /etc/apt/sources.list.d/google.list
$ apt-get update && apt-get install -y google-chrome-stable
$ wget -N https://chromedriver.storage.googleapis.com/2.42/chromedriver_linux64.zip && unzip chromedriver_linux64.zip
$ mv -f chromedriver /usr/local/bin/chromedriver

TODO

Add test case
Dependency management
Javascript simulator, eg chrome headless
Better filename against urlencode
Image type detection, used for filename without extension
Image pre load acceleration for dir list
CI support
Duplicate url optimization
Better url and img src extract strategy

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
apps/image_tool		apps/image_tool
scripts		scripts
vendor		vendor
viewer		viewer
.gitignore		.gitignore
.travis.yml		.travis.yml
Gopkg.lock		Gopkg.lock
Gopkg.toml		Gopkg.toml
Makefile		Makefile
README.md		README.md
test.sh		test.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

apps/image_tool

apps/image_tool

scripts

scripts

vendor

vendor

viewer

viewer

.gitignore

.gitignore

.travis.yml

.travis.yml

Gopkg.lock

Gopkg.lock

Gopkg.toml

Gopkg.toml

Makefile

Makefile

README.md

README.md

test.sh

test.sh

Repository files navigation

Simple Image Crawling and File System Mapping

Introduction

Build

Headless Crawling

TODO

About

Releases

Packages

Languages

amyangfei/image_viewer

Folders and files

Latest commit

History

Repository files navigation

Simple Image Crawling and File System Mapping

Introduction

Build

Headless Crawling

TODO

About

Resources

Stars

Watchers

Forks

Languages