Price scraper for a Colombian technology vendor

Python scraper using Beautiful Soup

Required packages

import yaml
import requests
import bs4
import urllib.request
import argparse
import logging
import csv
import datetime

Usage

Four files are provided:

config.yaml: Holds the URLs to be scraped. The file is organized by retail site, category and queries.
common.py: A small Python module that loads and parses config.yaml.
item_page_objects.py: A Python class that reads a page and provides a method to extract all the articles on that page. The constructor requires the base URL (without the page number), the product category to extract and the total number of pages.
prices_scraper.py: The main script. It takes three arguments:
1. Retail site to scrape (only alkosto is supported at the moment).
2. Product category (twelve categories are implemented at the moment).
3. Number of pages to scrape in the selected category.
Example:

python3 prices_scraper.py alkosto televisores 3
python3 prices_scraper.py alkosto computadores-tablets 6
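
The three positional arguments shown above could be parsed with argparse roughly as follows. This is only a sketch: the argument names (site, category, pages), the SUPPORTED_SITES list and the build_parser helper are assumptions made for illustration and may not match the names used in prices_scraper.py.

```python
import argparse

# Assumption: only "alkosto" is accepted, as stated in the Usage section.
SUPPORTED_SITES = ["alkosto"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three positional arguments (hypothetical names)."""
    parser = argparse.ArgumentParser(description="Scrape product prices by category.")
    parser.add_argument("site", choices=SUPPORTED_SITES, help="retail site to scrape")
    parser.add_argument("category", help="product category, e.g. televisores")
    parser.add_argument("pages", type=int, help="number of pages to scrape")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(["alkosto", "televisores", "3"])
print(args.site, args.category, args.pages)
```

Restricting the site with choices makes argparse reject unsupported retailers before any request is sent, and type=int converts the page count so it can be used directly in a loop.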

The script uses the Homepage class inside a for loop to collect all the data for the selected category and saves it to a CSV file.

Logging messages report progress to the user and, at the end, the total number of articles found.
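
The loop, CSV export and logging described above can be sketched as follows. This is not the repository's code: StubCategoryPage is a hypothetical stand-in for the real page object that returns canned rows instead of fetching anything, and the URL pattern, the column names (name, price) and the scrape_category helper are all assumptions.

```python
import csv
import io
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class StubCategoryPage:
    """Hypothetical stand-in for the page object; performs no network access."""

    def __init__(self, base_url, category, page_number):
        # Assumption: the page number is passed as a query parameter.
        self.url = f"{base_url}/{category}?p={page_number}"

    def articles(self):
        # Canned data in place of real parsing with Beautiful Soup.
        return [{"name": f"item from {self.url}", "price": "0"}]


def scrape_category(base_url, category, total_pages, out_file):
    """Visit each page, write its articles as CSV rows, return the row count."""
    writer = csv.DictWriter(out_file, fieldnames=["name", "price"])
    writer.writeheader()
    total = 0
    for page_number in range(1, total_pages + 1):
        logger.info("Scraping page %d of %d", page_number, total_pages)
        page = StubCategoryPage(base_url, category, page_number)
        for article in page.articles():
            writer.writerow(article)
            total += 1
    logger.info("Total articles found: %d", total)
    return total


# Write to an in-memory buffer; a real run would open a .csv file instead.
buffer = io.StringIO()
count = scrape_category("https://example.com", "televisores", 3, buffer)
```

Passing an open file object to scrape_category keeps the loop independent of where the rows end up, so the same function can write to a file on disk or to a buffer.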

Finally, an example of the exported data is provided in telefonos-report.csv.

Next Steps

Improve the use of config.yaml so that it is the only file that needs to change if the scraped page changes.

The cleaning and reporting code will be available soon.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

MIT
