This script calls three other Python projects to automate an ETL (Extract, Transform, Load) pipeline for data scraped from a sales page.
_extract() calls the free repository named "prices_scraper" to extract data using BeautifulSoup.
_transform() calls the free repository named "prices_cleaner" to convert numeric values and clean the scraped titles.
_load() calls the free repository named "prices_load" to merge the CSV files created in the transform stage.
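
As a rough illustration of how the three stages could be chained, here is a minimal sketch. It is not the actual pipeline_prices.py: the script names (scraper.py, cleaner.py, load.py), the STAGES table, and the skip and runner parameters are assumptions made for this example. Only the three folder names come from the description above.

```python
import subprocess
import sys

# Hypothetical mapping: (stage name, folder, script). The folder names come
# from this README; the script names are assumptions for illustration only.
STAGES = [
    ("extract", "prices_scraper", "scraper.py"),
    ("transform", "prices_cleaner", "cleaner.py"),
    ("load", "prices_load", "load.py"),
]


def run_pipeline(skip=(), runner=subprocess.run):
    """Run each stage in order inside its own folder; return the stages run."""
    executed = []
    for name, folder, script in STAGES:
        if name in skip:
            continue
        # check=True stops the pipeline if a stage exits with an error.
        runner([sys.executable, script], cwd=folder, check=True)
        executed.append(name)
    return executed
```

With this sketch, run_pipeline(skip=("load",)) would run only the extract and transform stages.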
You need to clone the three repositories listed above into this repository. For example:
- Clone this repository
- Move into the prices_pipeline folder
- Clone prices_scraper
- Clone prices_cleaner
- Clone prices_load
Next, create a folder named "raw" inside the prices_scraper folder.
Also create two folders, one named "prices_db" and another named "backup", inside the prices_pipeline folder.
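
If you prefer to script this step, the folders can be created with a few lines of Python. This is a minimal sketch, assuming it is run after prices_scraper has been cloned; the create_folders helper and its base parameter are hypothetical names, not part of the pipeline.

```python
from pathlib import Path


def create_folders(base="."):
    """Create the folders the pipeline expects, relative to prices_pipeline."""
    base = Path(base)
    folders = [
        base / "prices_scraper" / "raw",
        base / "prices_db",
        base / "backup",
    ]
    for folder in folders:
        # exist_ok=True makes the call safe to repeat. Parents are not
        # created, so this fails if prices_scraper has not been cloned yet.
        folder.mkdir(exist_ok=True)
    return folders
```

Call create_folders() from inside the prices_pipeline folder, or pass the path to it as base.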
Run pipeline_prices.py without running the **load** function, then rename the cleaned file to:
+'_db.csv'
This is necessary because the load stage merges two different files: a db file and today's data.
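
To make the reason concrete, here is a minimal sketch of a two-file merge. It is an assumption about the general idea, not the actual prices_load code: the merge_csv function, its parameters, and the rule that both files must share the same header are all hypothetical.

```python
import csv


def merge_csv(db_path, today_path, out_path):
    """Append today's rows to the db rows and write the result to out_path."""
    rows = []
    fieldnames = None
    for path in (db_path, today_path):
        # open() raises FileNotFoundError if either file is missing, which
        # is why a db file has to exist before the load stage runs.
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            if fieldnames is None:
                fieldnames = reader.fieldnames
            elif reader.fieldnames != fieldnames:
                raise ValueError(f"{path} has different columns than {db_path}")
            rows.extend(reader)
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

In this sketch the merge cannot start without an existing db file, which matches the need to create one by hand before the first full run.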
Simply run:
python3 pipeline_prices.py
and see the magic!!
If you want to contribute, let me know or open a pull request. I try to check GitHub daily to see what is happening.
Enjoy!