Kyle-Law/web_scraper

Web Scraper

This is the Ruby capstone project for Microverse, in which students must complete a real-world-like project within 72 hours according to these project specifications.

It's a 3-in-1 web scraper that lets users scrape all courses from udacity.com and jobs from indeed.com and remote.io into CSV files.

[Image: jobs scraped from Remote.io]

[Image: jobs scraped from indeed.com]

[Image: courses scraped from udacity.com]

Built With

  • Ruby
  • Nokogiri gem
  • HTTParty gem

Project Structure

├── README.md
├── bin
│   └── main.rb
├── lib
│   ├── scraper.rb
│   ├── udacity_scraper.rb
│   ├── indeed_scraper.rb
│   └── remoteio_scraper.rb
└── rspec
    ├── scraper_spec.rb
    └── spec_helper.rb

Video Presentation

Feel free to check out this link for a 3min video walkthrough :)

Deployment

  1. Git clone this repo and cd to the web_scraper directory.
  2. Run bundle install on the command line to install the Nokogiri and HTTParty gems.
  3. Run ruby bin/main.rb.
  4. Input either 'udacity', 'indeed', or 'remote.io' and follow the respective prompts.
  5. Tada! 'udacity_courses.csv', 'indeed_jobs.csv', or 'remote_io.csv' will be created in the root directory, respectively :)
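
For readers curious about the last step, here is a minimal sketch of how scraped records could be written to a CSV file with Ruby's standard csv library. It is not taken from this repo's code: the helper name write_rows_to_csv, the column names, and the sample rows are all hypothetical placeholders.

```ruby
require 'csv'

# Hypothetical helper (not from this repo): writes an array of hashes
# to a CSV file, using the keys of the first row as the header line.
# Returns the number of data rows written.
def write_rows_to_csv(path, rows)
  return 0 if rows.empty?

  keys = rows.first.keys
  CSV.open(path, 'w') do |csv|
    csv << keys.map(&:to_s)                        # header row
    rows.each { |row| csv << row.values_at(*keys) } # one line per record
  end
  rows.size
end

# Placeholder data standing in for scraped results (not real listings).
sample_jobs = [
  { title: 'Example Developer', company: 'Example Co', location: 'Remote' },
  { title: 'Example Engineer', company: 'Sample, Inc.', location: 'Remote' }
]

# Usage (writes to the current directory):
# write_rows_to_csv('example_jobs.csv', sample_jobs)
```

The csv library quotes fields that contain commas, so a value like 'Sample, Inc.' stays in a single column when the file is read back.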

Run tests

  1. Git clone this repo and cd to the web_scraper directory.
  2. Install rspec with gem install rspec.
  3. Run rspec on the command line.
  4. You will see failures because the three scraped files have not been created yet.
  5. To fix this, run ruby bin/main.rb three times, entering 'udacity', 'indeed', and 'remote.io' in turn.
  6. Run rspec again. The test cases will pass once each file has been created :)
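
Since the steps above suggest the specs depend on the three CSV files already existing, a quick pre-check can show which files are still missing before running rspec. This is only a sketch: the file names come from the Deployment section, while the missing_outputs helper is hypothetical and not part of the repo.

```ruby
# File names taken from the Deployment steps; the helper itself is hypothetical.
EXPECTED_OUTPUTS = %w[udacity_courses.csv indeed_jobs.csv remote_io.csv].freeze

# Returns the expected CSV files that are missing or empty in the given directory.
def missing_outputs(dir = Dir.pwd)
  EXPECTED_OUTPUTS.reject do |name|
    path = File.join(dir, name)
    File.file?(path) && !File.zero?(path)
  end
end

missing = missing_outputs
if missing.empty?
  puts 'All three scraped files are present.'
else
  puts "Missing or empty: #{missing.join(', ')}. Run ruby bin/main.rb for each first."
end
```

Run it from the repository root so that Dir.pwd points at the directory where the CSV files are created.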

Authors

👤 Kyle Law

🤝 Contributing

Contributions, issues and feature requests are welcome!

Feel free to check the issues page.

Show your support

Give a ⭐️ if you like this project!

Acknowledgments

  • Microverse
  • Nokogiri gem
  • HTTParty Parser
  • Udacity.com
  • Indeed.com
  • Remote.io

📝 License

This project is MIT licensed.

About

A 3-in-1 web scraping program in Ruby, where the user can scrape jobs from indeed.com and remote.io into a well-structured CSV file. :)
