Store Info Web Crawler

This crawler fetches data from the websites of various organizations (e.g. clubs, companies) to collect information about their store locations, clubs, or other company details, such as store name, location, coordinates, phone number, and operating hours. See the results folder for the crawler output.

General Notes

Either crawler 1 or 2 was not working because robots.txt was being misread. Although the website's robots.txt allowed crawlers to access the specific URL, Scrapy did not interpret it correctly.
- Workaround: set ROBOTSTXT_OBEY to False in settings.py
- Further investigation needed.
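
As a starting point for that investigation, the rules can be checked independently of Scrapy with Python's standard urllib.robotparser. This is only a minimal sketch: the robots.txt content, the user agent name, and the URLs below are made up for illustration, and Scrapy's own robots.txt parser may interpret some rules differently from the standard library.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; replace with the text of the real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /locations/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "storeinfo-bot" and the example.com URLs are placeholders.
    for url in ("https://example.com/locations/", "https://example.com/admin/login"):
        print(url, is_allowed(ROBOTS_TXT, "storeinfo-bot", url))
```

If this check allows a URL that Scrapy refuses, the difference is likely in how the two parsers read the file rather than in the file itself.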

Running the crawlers

Use the following commands to run the crawlers.

Output as JSON file:

scrapy crawl <name> -o results/<name>.json

Output as CSV file:

scrapy crawl <name> -o results/<name>.csv -t csv
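
The same commands can also be launched from a small Python script, for example to run several crawlers in a row. The sketch below simply mirrors the two commands above; the function names build_crawl_command and run_crawler are hypothetical helpers, not part of this repository, and the script assumes the scrapy executable is on the PATH and that it is run from the project root.

```python
import subprocess
from typing import List


def build_crawl_command(name: str, fmt: str = "json") -> List[str]:
    """Build the argument list for one of the two commands shown above."""
    if fmt not in ("json", "csv"):
        raise ValueError(f"unsupported output format: {fmt}")
    command = ["scrapy", "crawl", name, "-o", f"results/{name}.{fmt}"]
    if fmt == "csv":
        # Mirrors the CSV command above, which passes the format explicitly.
        command += ["-t", "csv"]
    return command


def run_crawler(name: str, fmt: str = "json") -> int:
    """Run one crawler and return the exit code of the scrapy process."""
    return subprocess.run(build_crawl_command(name, fmt), check=False).returncode


if __name__ == "__main__":
    print(" ".join(build_crawl_command("jockey", "csv")))
```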

Crawlers

The crawlers need to be tested and updated regularly to make sure they still work.

Name	Last Ran
towncaredental	2020-07-15
rickysalldaygrillcanada	2020-07-15
jockey	2020-07-15
rentking	2020-07-15
uae_free	2020-07-18
marketwatch_ipo	2020-07-15
maac	2020-07-15
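
To help keep track of which crawlers are due for a re-test, the dates in the table above could be checked with a short script. This is a sketch only: the 30-day threshold and the function name stale_crawlers are assumptions for illustration, and the dates are copied by hand from the table.

```python
from datetime import date, timedelta
from typing import Dict, List

# Dates copied from the table above.
LAST_RAN: Dict[str, date] = {
    "towncaredental": date(2020, 7, 15),
    "rickysalldaygrillcanada": date(2020, 7, 15),
    "jockey": date(2020, 7, 15),
    "rentking": date(2020, 7, 15),
    "uae_free": date(2020, 7, 18),
    "marketwatch_ipo": date(2020, 7, 15),
    "maac": date(2020, 7, 15),
}


def stale_crawlers(last_ran: Dict[str, date], today: date, max_age_days: int = 30) -> List[str]:
    """Return the names of crawlers last run more than max_age_days before today."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, ran in last_ran.items() if ran < cutoff)


if __name__ == "__main__":
    for name in stale_crawlers(LAST_RAN, date.today()):
        print(f"{name} is due for a re-test")
```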

Pipelines

XlsxWriterPipeline takes the items from a spider and writes them to an Excel spreadsheet. If the spider yields multiple items, they are placed in separate sheets in the Excel file.
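
The general idea can be sketched as follows. This is not the repository's XlsxWriterPipeline, only a hypothetical illustration: it assumes that "multiple items" means multiple item types, that one sheet per item type is wanted, and that the third-party xlsxwriter package is installed. The class name and the output path are made up.

```python
from collections import defaultdict


class XlsxSheetsPipelineSketch:
    """Hypothetical sketch: collect items per type, then write one sheet per type."""

    def open_spider(self, spider):
        # Maps an item class name to the list of rows collected for it.
        self.rows_by_type = defaultdict(list)

    def process_item(self, item, spider):
        self.rows_by_type[type(item).__name__].append(dict(item))
        return item

    def close_spider(self, spider):
        # Imported here so the rest of the class works without the package.
        import xlsxwriter

        # Assumed output location; adjust to the project's conventions.
        workbook = xlsxwriter.Workbook(f"results/{spider.name}.xlsx")
        for type_name, rows in self.rows_by_type.items():
            # Excel limits sheet names to 31 characters.
            sheet = workbook.add_worksheet(type_name[:31])
            headers = list(rows[0].keys())
            sheet.write_row(0, 0, headers)
            for row_index, row in enumerate(rows, start=1):
                # Values are written as strings to avoid unsupported cell types.
                sheet.write_row(row_index, 0, [str(row.get(h, "")) for h in headers])
        workbook.close()
```

Collecting rows in memory keeps the sketch short; a spider that yields a very large number of items would be better served by writing rows as they arrive.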

Notes

Crawler 5 "uae_free"

This crawler was created specifically to answer the StackOverflow.com question "Crawl table data without 'next button' with Scrapy".
For help, I used the StackOverflow.com answer to the question "Crawling through pages with PostBack data javascript Python Scrapy".
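
For readers unfamiliar with the technique, pages like this are often paginated through ASP.NET-style postbacks, where clicking a page link resubmits the form with hidden fields instead of loading a new URL. Assuming that is the case here, the form data for the next page can be built roughly as below. The helper name and the example control name are hypothetical; the field names __EVENTTARGET, __EVENTARGUMENT, and __VIEWSTATE are the standard ASP.NET ones.

```python
from typing import Dict


def build_postback_formdata(
    hidden_fields: Dict[str, str],
    event_target: str,
    event_argument: str = "",
) -> Dict[str, str]:
    """Combine the page's hidden form fields with the postback target and argument."""
    formdata = dict(hidden_fields)  # e.g. __VIEWSTATE scraped from the current page
    formdata["__EVENTTARGET"] = event_target
    formdata["__EVENTARGUMENT"] = event_argument
    return formdata


if __name__ == "__main__":
    # Placeholder values; real ones come from the page's hidden <input> fields.
    hidden = {"__VIEWSTATE": "abc123", "__EVENTVALIDATION": "xyz789"}
    print(build_postback_formdata(hidden, "ctl00$GridView1", "Page$2"))
```

In a Scrapy spider, a dictionary like this would typically be passed as the formdata argument of scrapy.FormRequest, with the hidden fields re-read from each response before requesting the next page.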

Resources

Scrapy module for Python: https://docs.scrapy.org/en/latest/. Quick start-to-finish example: https://www.codementor.io/andy995/writing-a-simple-web-scraper-using-scrapy-myb7vrmgx
XPath syntax: https://devhints.io/xpath. Use Google Chrome Inspector (Dev tools) to test XPath to access HTML nodes of a website; example: https://yizeng.me/2014/03/23/evaluate-and-validate-xpath-css-selectors-in-chrome-developer-tools/
Network Log details/demo: https://developers.google.com/web/tools/chrome-devtools/network/

mikeym88/Store-Information-Crawler
