build initial crawler script #1

Closed · corb1999 opened this issue Nov 15, 2021 · 0 comments

corb1999 (Owner) commented:
finish building the script that does the following:

  • make a list of URLs to crawl, one URL per results page; use purrr to generate the list
  • do an initial page read, then calculate how many total pages need to be read
  • write a function that reads the HTML and parses it into a tidy dataframe, including a Sys.sleep() so we don't spam the site
  • do minor cleaning on each page, then compile them all into one dataframe
  • append a timestamp to the result, then export it to a dropzone (a rough sketch of the whole pipeline is below)
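A minimal sketch of that pipeline, assuming rvest for the HTML work. The base URL, the CSS selectors (`.page-count`, `.product-name`, `.product-price`), and the dropzone path are all placeholders, since the target site isn't named in this issue:

```r
library(rvest)
library(purrr)
library(dplyr)
library(readr)

# Hypothetical paginated site; swap in the real base URL and selectors
base_url <- "https://example.com/products?page="

# Initial page read: work out how many total pages need to be crawled.
# ".page-count" is an assumed selector for the pagination element.
first_page <- read_html(paste0(base_url, 1))
total_pages <- first_page %>%
  html_element(".page-count") %>%
  html_text2() %>%
  as.integer()

# Use purrr to generate the list of URLs, one per page
urls <- map_chr(seq_len(total_pages), ~ paste0(base_url, .x))

# Read one page's HTML and parse it into a tidy dataframe;
# Sys.sleep() throttles requests so we don't spam the site
read_page <- function(url) {
  Sys.sleep(2)
  page <- read_html(url)
  tibble(
    product = page %>% html_elements(".product-name") %>% html_text2(),
    price   = page %>% html_elements(".product-price") %>% html_text2()
  )
}

# Crawl every page and compile the results into one dataframe,
# with minor cleaning (trim whitespace, parse prices to numeric)
result <- map_dfr(urls, read_page) %>%
  mutate(
    product = trimws(product),
    price   = parse_number(price)
  )

# Append a timestamp to the result, then export to the dropzone
result <- result %>% mutate(crawl_ts = Sys.time())
write_csv(result, file.path(
  "dropzone",
  paste0("crawl_", format(Sys.time(), "%Y%m%d_%H%M%S"), ".csv")
))
```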
corb1999 self-assigned this Nov 15, 2021
corb1999 added a commit that referenced this issue Nov 20, 2021
…gets the products and prices. partially solves issue #1 but now need to do some initial cleaning and then write out the final dataframe
corb1999 added a commit that referenced this issue Nov 20, 2021
…bject then write to csv. it works, this solves issue #1 and will in the future run this crawler periodically and then later work to compile outputs
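The second commit mentions running the crawler periodically and compiling the outputs later. A short sketch of what that compilation step could look like, assuming the timestamped-filename convention from the pipeline sketch above:

```r
library(purrr)
library(readr)

# Gather every timestamped crawl export sitting in the dropzone
# and stack them into one dataframe for later analysis
crawl_files <- list.files("dropzone", pattern = "^crawl_.*\\.csv$", full.names = TRUE)
compiled <- map_dfr(crawl_files, read_csv)
```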