This project is a scraper that lets you search coding articles on the Hackernoon website by keyword. It writes the results that match the keywords to an HTML file, which is saved in the searches folder.
I built this scraper to help people spend less time looking for articles on a specific topic on the Hackernoon website.
To get started, get a local copy of the project by downloading it or by running:
git clone https://github.com/Salvador-ON/Build-Your-Own-Scraper
Before you start using the scraper, make sure Ruby is installed on your computer by running:
ruby -v
It should return something like:
####### ruby 2.6.##### (20##-##-## revision 6####) [########]
If Ruby is not installed on your system, follow this guide to install it.
Open your terminal, go to the downloaded folder, and run the following command to install the gems:
bundle install
Then run the following command to start the program:
ruby bin/main.rb
When the program starts, it asks you to type keywords separated by spaces. Enter only keywords to keep the search focused, for example:
ruby rails
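The keyword input described above could be handled roughly as follows. This is a minimal sketch, not the project's actual code: the helper names `parse_keywords` and `matches?` are hypothetical, and matching a title when it contains any one of the keywords (case-insensitively) is an assumption about how the search behaves.

```ruby
# Hypothetical helpers for illustration; the project's real matching logic may differ.

# Split the user's input on whitespace and normalize case.
def parse_keywords(input)
  input.downcase.split
end

# Assumption: a title matches when it contains at least one keyword.
def matches?(title, keywords)
  normalized = title.downcase
  keywords.any? { |keyword| normalized.include?(keyword) }
end

keywords = parse_keywords('ruby rails')
titles = ['Getting Started with Rails', 'A Tour of Go', 'Ruby Blocks Explained']

# Keep only the titles that mention one of the keywords.
puts titles.select { |title| matches?(title, keywords) }
```

Requiring every keyword instead of any keyword would only need `all?` in place of `any?`.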
By default, the scraper targets the Ruby section, but you can point it at any other Hackernoon section. You only need to change the URL in the initialize method of the Browser class:
@browser.goto 'https://hackernoon.com/tagged/ruby'
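If you switch sections often, the URL could be built from a tag name instead of being edited by hand. The sketch below is only an illustration: `BASE_URL` and `section_url` are hypothetical names that do not exist in the project, and it assumes other sections follow the same `/tagged/<name>` pattern as the Ruby one.

```ruby
# Hypothetical helper; the project hard-codes the URL in Browser#initialize.
# Assumption: every section lives under /tagged/<name>, like the Ruby section.
BASE_URL = 'https://hackernoon.com/tagged/'

# Build the section URL from a tag name, ignoring case and surrounding spaces.
def section_url(tag)
  "#{BASE_URL}#{tag.strip.downcase}"
end

puts section_url('ruby')
puts section_url(' JavaScript ')
```

The result could then be passed to `@browser.goto` in place of the literal string.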
The code includes unit tests written with RSpec. To run the tests, enter the following command in your terminal:
rspec
If you have a slow internet connection, you can increase the wait time in the parsed_wait method, located in lib/browser.rb, by changing the number of seconds passed to sleep. This gives the scraper enough time to load each page before moving on to the next one:
sleep(1)
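One way to avoid editing the source each time is to read the wait time from an environment variable. This is a sketch under stated assumptions, not the contents of lib/browser.rb: the `SCRAPER_WAIT` variable and the `wait_seconds` and `wait_for_page` helpers are hypothetical, and the default of 1 second mirrors the `sleep(1)` shown above.

```ruby
# Hypothetical sketch; the real parsed_wait method in lib/browser.rb may differ.
DEFAULT_WAIT_SECONDS = 1

# Read the wait time from SCRAPER_WAIT (a made-up variable name),
# falling back to the default when it is not set.
def wait_seconds(env = ENV)
  Float(env.fetch('SCRAPER_WAIT', DEFAULT_WAIT_SECONDS))
end

# Pause so the page has time to load before it is parsed.
def wait_for_page(seconds = wait_seconds)
  sleep(seconds)
end

puts "Waiting #{wait_seconds} second(s) between pages"
```

With this in place, a slower connection could run the program as `SCRAPER_WAIT=3 ruby bin/main.rb`.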
Salvador Olvera
- LinkedIn: Salvador Olvera
- Github: @Salvador-ON
- Twitter: @Salvador Olvera_ON
Contributions, issues, and feature requests are welcome!
Feel free to check the issues page.
Give a star if you like this project!
This project is MIT licensed.