info-retrieval

Information Retrieval project

Websites in use:

  • ticinotopten.ch
    • good: many categories, local specialties
    • bad: sparse metadata for each activity
  • myswitzerland.ch
    • good: many categories, rich metadata
    • bad: detailed info is only available on each activity's own page
  • zermatt.ch
    • good: many categories, rich metadata, local specialties
    • bad: poorly structured

Elements scraped:

  • Activity name
  • Activity type:
    • Hike
    • Cycling
    • Adventure
    • Other (water, snow, parks, etc.)
  • Region
  • Distance (km)
  • Duration (h)

If possible, also scrape:

  • Ascent
  • Description
  • Accessibility info
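
The fields listed above could be collected into one record per activity. The following is only a sketch under stated assumptions: the names `ActivityType` and `Activity`, the field names, and the example values are hypothetical and not taken from the crawlers. The unit of the ascent is not stated here, so the field is left unit-free.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActivityType(Enum):
    """Activity types listed under "Elements scraped"."""
    HIKE = "Hike"
    CYCLING = "Cycling"
    ADVENTURE = "Adventure"
    OTHER = "Other"  # water, snow, parks etc.


@dataclass
class Activity:
    """One scraped activity (hypothetical record layout)."""
    # Always scraped
    name: str
    activity_type: ActivityType
    region: str
    distance_km: float
    duration_h: float
    # Scraped only if the site provides them
    ascent: Optional[float] = None  # unit is an open assumption
    description: Optional[str] = None
    accessibility_info: Optional[str] = None


# Made-up example values, for illustration only
example = Activity(
    name="Example trail",
    activity_type=ActivityType.HIKE,
    region="Example region",
    distance_km=12.5,
    duration_h=4.0,
)
```

Defaulting the optional fields to `None` lets a crawler for a site with sparse metadata emit the same record type as one for a site with rich metadata.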

Files:

  • hiking_ti
    • crawls the hiking pages of ticinotopten.ch
  • activities_ti
    • crawls the other activity categories of ticinotopten.ch
  • hiking_ch1
    • crawls all activities of zermatt.ch
  • hiking_ch2
    • crawls the hiking section of myswitzerland.ch

TODO:

  • User evaluation