WebCrawlers

All the crawlers run in headless mode, so no browser window is opened while they work.

  1. Crawler.java crawls "Trivago" and prints all the hotels listed on its main page.
  2. HotelsCrawler.java crawls "Hotels" and "Booking" and stores the scraped data in 3 JSON files, one for each site.
  3. MedicalSiteCrawler.java crawls "Clevelandclinic" and stores each disease, along with its symptoms, diagnosis, and treatments, in a CSV file.
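
Scraped medical text (symptom lists, treatment descriptions) routinely contains commas, quotes, and line breaks, so writing it to CSV needs field quoting or the columns shift. The sketch below shows one way the CSV step could be done with only the JDK. It is not taken from MedicalSiteCrawler.java: the class DiseaseCsvWriter, the Disease type, and the column names are hypothetical, and quoting follows the RFC 4180 convention (wrap fields that need it in double quotes, double any embedded quotes).

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;

// Hypothetical helper (not from the repository) that writes disease records as CSV.
final class DiseaseCsvWriter {

    // Hypothetical record for one scraped disease page; field names are assumptions.
    static final class Disease {
        final String name;
        final String symptoms;
        final String diagnosis;
        final String treatments;

        Disease(String name, String symptoms, String diagnosis, String treatments) {
            this.name = name;
            this.symptoms = symptoms;
            this.diagnosis = diagnosis;
            this.treatments = treatments;
        }
    }

    // Quotes a field if it contains a comma, quote, or line break (RFC 4180 style).
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuoting = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuoting) {
            return field;
        }
        // Double every embedded quote, then wrap the whole field in quotes.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Writes a header row, then one row per disease, each terminated by '\n'.
    static void write(List<Disease> diseases, Writer out) throws IOException {
        out.write("Disease,Symptoms,Diagnosis,Treatments\n");
        for (Disease d : diseases) {
            out.write(String.join(",",
                    escape(d.name), escape(d.symptoms),
                    escape(d.diagnosis), escape(d.treatments)));
            out.write("\n");
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        write(List.of(new Disease("Flu", "Fever, cough", "Swab test", "Rest")), sw);
        System.out.print(sw);
    }
}
```

In a real crawler the StringWriter would typically be replaced by a FileWriter (or Files.newBufferedWriter) pointing at the output CSV file.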

Note: To run these crawlers, you need to add two JAR files to the classpath: selenium-server and gson.