Treehouse Techdegree Project #6 - Website Content Scraper

Website Scraping Application

Requirements

  • Create a scraper.js file. This is the file that will be run every day (a sketch of the overall flow follows this list).

  • The scraper should generate a folder called data if it doesn’t exist.

  • The information from the site should be stored in a CSV file named with today’s date, e.g. 2016-01-29.csv.

  • Use a third party npm package to scrape content from the site. You should be able to explain why you chose that package.

  • The scraper should be able to visit the website http://shirts4mike.com and follow links to all t-shirts.

  • The scraper should get the price, title, url and image url from the product page and save it in the CSV.

  • Use a third party npm package to create a CSV file. You should be able to explain why you chose that package.

  • The column headers should be in this order: Title, Price, ImageURL, URL, and Time. Time should be the current date and time of when the scrape happened. If they aren’t in this order, they can’t be entered into the database of the price comparison site.

  • If the site is down, an error message describing the issue should appear in the console. This is to be tested by disabling wifi on your device.

  • If the data file for today already exists it should overwrite the file.

  • Code should be well documented.
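
A minimal sketch of how these requirements might fit together, assuming axios and cheerio for fetching and parsing and json2csv for CSV output (the shirts4mike selectors below are illustrative guesses, not confirmed):

```js
// scraper.js — a sketch of the required flow, not a finished implementation.
const fs = require('fs');
const axios = require('axios');
const cheerio = require('cheerio');
const { parse } = require('json2csv');

const SITE = 'http://shirts4mike.com';
const DATA_DIR = 'data';

async function scrape() {
  // Generate the data folder if it doesn't exist.
  if (!fs.existsSync(DATA_DIR)) {
    fs.mkdirSync(DATA_DIR);
  }

  try {
    // Visit the site and collect links to all t-shirt product pages.
    const { data: home } = await axios.get(`${SITE}/shirts.php`);
    const $ = cheerio.load(home);
    const links = $('ul.products a') // selector is an assumption
      .map((i, el) => `${SITE}/${$(el).attr('href')}`)
      .get();

    // Visit each product page and pull out the required fields.
    const shirts = [];
    for (const url of links) {
      const { data: page } = await axios.get(url);
      const $$ = cheerio.load(page);
      shirts.push({
        Title: $$('.shirt-details h1').text(),                        // selector is an assumption
        Price: $$('.price').text(),                                   // selector is an assumption
        ImageURL: `${SITE}/${$$('.shirt-picture img').attr('src')}`,  // selector is an assumption
        URL: url,
        Time: new Date().toISOString(),
      });
    }

    // Column order must be Title, Price, ImageURL, URL, Time.
    const csv = parse(shirts, { fields: ['Title', 'Price', 'ImageURL', 'URL', 'Time'] });

    // File name is today's date; writeFileSync overwrites an existing file.
    const today = new Date().toISOString().slice(0, 10); // e.g. 2016-01-29
    fs.writeFileSync(`${DATA_DIR}/${today}.csv`, csv);
  } catch (err) {
    // If the site is down, describe the problem in the console.
    console.error(`There was an error connecting to ${SITE}: ${err.message}`);
  }
}

scrape();
```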

Extra Credit

  • Use a linting tool like ESLint to check your code for syntax errors and to ensure general code quality. You should be able to run npm run lint to check your code.

  • When an error occurs, log it to a file named scraper-error.log. It should append to the bottom of the file with a timestamp and the error message, e.g. [Tue Feb 16 2016 10:02:12 GMT-0800 (PST)] (see the sketch below).
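
One possible shape for the error-logging helper is shown below; `npm run lint` can then be wired up by adding something like `"lint": "eslint ."` to the scripts section of package.json (assuming ESLint is installed as a dev dependency).

```js
// logError.js — a sketch of the extra-credit logging helper.
// Appends a timestamped entry to scraper-error.log, creating the file if needed.
const fs = require('fs');

function logError(err) {
  // Date#toString() produces output close to the example format in the requirement,
  // e.g. "Tue Feb 16 2016 10:02:12 GMT-0800 (Pacific Standard Time)".
  const entry = `[${new Date().toString()}] ${err.message}\n`;
  fs.appendFileSync('scraper-error.log', entry);
}

module.exports = logError;
```

The scraper's catch block could then call logError(err) in addition to printing the console message.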
