Console-based web crawlers for hotel and medical sites, developed in Java with Selenium and JSoup. Data structures such as HashMap and ArrayList are used to store the crawled data.

Rudra1402/WebCrawlers

WebCrawlers

All crawlers run in headless mode.

  1. Crawler.java crawls "Trivago" and prints all hotels listed on the main page.
  2. HotelsCrawler.java crawls "Hotels" and "Booking" and stores the data in 3 JSON files, one for each site.
  3. MedicalSiteCrawler.java crawls "Clevelandclinic" and stores each disease along with its symptoms, diagnosis, and treatments in a CSV file.
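MedicalSiteCrawler.java writes its results to a CSV file. Symptom and treatment text scraped from a page often contains commas, quotes, or line breaks, so those fields need quoting to keep the columns aligned. The following is a minimal sketch, not the repository's code: the `Disease` record, the `DiseaseCsvWriter` class, and the column names are hypothetical, and quoting follows RFC 4180 conventions (wrap the field in double quotes and double any embedded quotes).

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

public class DiseaseCsvWriter {
    // Hypothetical row type mirroring the fields described above.
    public record Disease(String name, String symptoms, String diagnosis, String treatments) {}

    // Quote a field if it contains a comma, quote, or line break; embedded quotes are doubled.
    public static String escape(String field) {
        if (field == null) return "";
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        String doubled = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + doubled + "\"" : doubled;
    }

    // Write a header row followed by one row per disease.
    public static void write(List<Disease> diseases, Writer out) throws IOException {
        out.write("Disease,Symptoms,Diagnosis,Treatments\n");
        for (Disease d : diseases) {
            out.write(String.join(",",
                    escape(d.name()), escape(d.symptoms()),
                    escape(d.diagnosis()), escape(d.treatments())));
            out.write("\n");
        }
    }

    public static void main(String[] args) throws IOException {
        // Crawled rows are collected in an ArrayList before being written out.
        List<Disease> rows = new ArrayList<>();
        rows.add(new Disease("Example disease", "fever, cough", "blood test", "rest"));
        StringWriter sw = new StringWriter();
        write(rows, sw);
        System.out.print(sw);
        // prints:
        // Disease,Symptoms,Diagnosis,Treatments
        // Example disease,"fever, cough",blood test,rest
    }
}
```

To write to disk instead of a string, the same `write` method could be given a `java.nio.file.Files.newBufferedWriter(...)` writer for a file name of your choosing.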

Note: To run these crawlers, you need to add two JAR files to the project: (1) selenium-server and (2) gson.
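Since gson is used to produce the JSON output of HotelsCrawler.java, the crawled hotels need to be held in memory first. One plausible layout, assumed here rather than taken from the repository, is a HashMap keyed by site name whose values are ArrayLists of listings. The `HotelStore` class and its `Hotel` record (name and price fields) are hypothetical. Each per-site list could then be handed to gson's `toJson` and written to its own file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HotelStore {
    // Hypothetical record for one crawled listing; the real crawler may store other fields.
    public record Hotel(String name, String price) {}

    // Crawled hotels grouped by the site they came from (e.g. "Hotels", "Booking").
    private final Map<String, List<Hotel>> hotelsBySite = new HashMap<>();

    // Append a hotel to its site's list, creating the list on first use.
    public void add(String site, Hotel hotel) {
        hotelsBySite.computeIfAbsent(site, k -> new ArrayList<>()).add(hotel);
    }

    // Return the hotels collected for a site, or an empty list if none were crawled.
    public List<Hotel> forSite(String site) {
        return hotelsBySite.getOrDefault(site, List.of());
    }

    public static void main(String[] args) {
        HotelStore store = new HotelStore();
        store.add("Booking", new Hotel("Example Inn", "$120"));
        store.add("Booking", new Hotel("Sample Suites", "$95"));
        store.add("Hotels", new Hotel("Demo Hotel", "$150"));
        System.out.println(store.forSite("Booking").size()); // prints 2
    }
}
```

Keying by site keeps each site's results separate, which matches writing one output file per site.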