GitHub - 256hax/rururu: Simple web scraping tool by Ruby

My pet project. rururu (るるる) is a simple scraping tool.

Features

Extract text by specifying URLs and search elements (XPath, for example).
CSV output. Extraction results are written in CSV format.
Sleep after each scrape. A sleep time is defined per scrape so that the scraping is not mistaken for an attack.
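
The three features above can be sketched together as follows. This is only an illustration under stated assumptions, not the code in app.rb: the helper name `scrape_to_csv`, its parameters, and the row layout (URL first, then one value per element) are all hypothetical. The extraction step is passed in as a callable so the sketch runs without network access.

```ruby
require 'csv'

# Hypothetical helper (not taken from app.rb).
# urls:          array of page URLs
# selectors:     array of XPath or "tag.class" strings
# sleep_seconds: pause between two page fetches
# extract:       callable taking (url, selectors), returning one string per selector
def scrape_to_csv(urls, selectors, sleep_seconds: 1, extract:)
  CSV.generate do |csv|
    urls.each_with_index do |url, index|
      # Wait before every fetch except the first one.
      sleep(sleep_seconds) if index.positive?
      csv << [url, *extract.call(url, selectors)]
    end
  end
end

# Stand-in extractor so the sketch runs offline; it fetches nothing.
fake_extract = ->(url, selectors) { selectors.map { |s| "#{s}@#{url}" } }

puts scrape_to_csv(%w[https://example.com/a https://example.com/b],
                   %w[h1 dt.name],
                   sleep_seconds: 0, extract: fake_extract)
```

With the Mechanize gem installed, a real extractor could fetch the page with `Mechanize.new.get(url)` and read each value with `page.search(selector).first&.text`. Whether app.rb does it this way is an assumption.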

Technologies

Frontend Layer

Application Layer

Ruby 💎
Sinatra 🎩
RubyGem Mechanize

Development

The app is a single file built on Sinatra, with the images included in that file.

Install & Run

Install

Install Ruby
Download and unarchive the rururu (るるる) zip from GitHub
$ cd [unarchived rururu folder]
$ gem install bundler:2.0.1
$ gem update bundler
$ bundle install --path vendor/bundle

Run

$ cd [unarchived rururu folder]
$ bundle exec ruby app.rb
Open http://localhost:4567/ in browser
[Enjoy scraping!]
To stop it, press Control + C in the terminal

HowTo Run

1.Form

Enter one URL per line

Example: scraping the Wikipedia pages for Japan and Thailand

https://ja.wikipedia.org/wiki/%E6%97%A5%E6%9C%AC
https://ja.wikipedia.org/wiki/%E3%82%BF%E3%82%A4%E7%8E%8B%E5%9B%BD
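
For illustration, one-URL-per-line input like the example above could be turned into a list as sketched below. The helper names `http_url?` and `parse_url_lines` are hypothetical and not taken from app.rb; dropping blank lines and non-HTTP entries is an assumption about reasonable behavior, not a description of what the app does.

```ruby
require 'uri'

# Hypothetical helper: true when the line parses as an http(s) URL with a host.
def http_url?(line)
  uri = URI.parse(line)
  # URI::HTTPS is a subclass of URI::HTTP, so both schemes pass this check.
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

# Hypothetical helper: splits textarea text into trimmed, non-empty, valid URLs.
def parse_url_lines(text)
  text.each_line.map(&:strip).reject(&:empty?).select { |line| http_url?(line) }
end

input = <<~TEXT
  https://ja.wikipedia.org/wiki/%E6%97%A5%E6%9C%AC

  https://ja.wikipedia.org/wiki/%E3%82%BF%E3%82%A4%E7%8E%8B%E5%9B%BD
  not a url
TEXT

puts parse_url_lines(input)
```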

Enter one element/XPath per line

Example: getting the capital and the population with XPath

//*[@id="infoboxCountry"]/dd[5]/table/tbody/tr[2]/td/a
//*[@id="infoboxCountry"]/dd[5]/table/tbody/tr[4]/td/dl/dd[3]/table/tbody/tr[1]/td/a[1]

An XPath is easy to obtain with Chrome's inspect tool. Search for "Chrome XPath" to find out how.

Alternatively, to select by tag and class name, specify "[tag name].[class name]".

dt.infoboxCountryNameJa
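
The relationship between the two input forms can be sketched as below. Both helper names are hypothetical and not taken from app.rb. Mechanize searches pages through Nokogiri, whose `search` method accepts XPath and CSS selectors alike, so the app may not need to tell them apart at all. The sketch only shows how a "[tag name].[class name]" entry corresponds to an XPath expression.

```ruby
# Hypothetical helper: guesses whether an input line is an XPath expression
# or a "tag.class" (CSS-style) selector, based on its leading characters.
def selector_kind(selector)
  selector.strip.start_with?('/', './', '(') ? :xpath : :css
end

# Hypothetical helper: rewrites "tag.class" as an XPath expression that
# matches elements whose class attribute contains that class name.
def tag_class_to_xpath(selector)
  tag, css_class = selector.strip.split('.', 2)
  "//#{tag}[contains(concat(' ', normalize-space(@class), ' '), ' #{css_class} ')]"
end

puts selector_kind('//*[@id="infoboxCountry"]/dd[5]/table/tbody/tr[2]/td/a')
puts selector_kind('dt.infoboxCountryNameJa')
puts tag_class_to_xpath('dt.infoboxCountryNameJa')
```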

2.Result

Copy and paste the result (textarea) values into a text editor such as Notepad.
Save the file in CSV format under any name, for example somename.csv.
