prices_pipeline

This script calls three other Python projects to automate an ETL (Extract, Transform, Load) process for data scraped from a sales page.

Functions to call

_extract() calls the free repository named "prices_scraper" to extract data using BeautifulSoup.

_transform() calls the free repository named "prices_cleaner" to transform numeric values and clean the scraped titles.

_load() calls the free repository named "prices_load" to merge the CSV files created in the transform stage.
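
The three stages can be pictured as a small driver that runs each sub-repository in turn. The sketch below is only an illustration, not the actual contents of pipeline_prices.py: the folder names come from this README, while the script names (scraper.py, cleaner.py, load.py), the STAGES table and the run_pipeline() helper are assumptions made for the example.

```python
import subprocess
import sys

# (folder, script) pairs. The folder names come from this README;
# the script names are hypothetical placeholders.
STAGES = [
    ("prices_scraper", "scraper.py"),  # extract stage
    ("prices_cleaner", "cleaner.py"),  # transform stage
    ("prices_load", "load.py"),        # load stage
]


def run_pipeline(stages=STAGES, runner=subprocess.run):
    """Run each stage in order inside its own folder, stopping on the first failure."""
    for folder, script in stages:
        # check=True makes subprocess.run raise CalledProcessError
        # when a stage exits with a non-zero status.
        runner([sys.executable, script], cwd=folder, check=True)
```

Calling run_pipeline() from the prices_pipeline folder would run the stages in sequence. Passing a different runner makes it possible to dry-run the order without executing anything.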

Requirements

You need to clone the three repositories listed above into this repository. For example:

Clone this repository
Move into the prices_pipeline folder
Clone prices_scraper
Clone prices_cleaner
Clone prices_load

Next, create a folder named "raw" inside the prices_scraper folder.

Also create a folder named "prices_db" and another named "backup" in the prices_pipeline folder.
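
If you prefer to script the folder setup, something like the following sketch should work. The create_folders() helper is hypothetical and not part of this repository. It assumes prices_scraper has already been cloned into the prices_pipeline folder, and it fails with FileNotFoundError if it has not.

```python
from pathlib import Path


def create_folders(pipeline_root):
    """Create the folders this README asks for, skipping any that already exist."""
    root = Path(pipeline_root)
    folders = [
        root / "prices_scraper" / "raw",  # requires prices_scraper to be cloned first
        root / "prices_db",
        root / "backup",
    ]
    for folder in folders:
        # exist_ok=True makes the call safe to repeat.
        folder.mkdir(exist_ok=True)
    return folders
```

For example, create_folders(".") can be run from inside the prices_pipeline folder.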

For first use

Run pipeline_prices.py without running the _load() function, then rename the cleaned file to:

+'_db.csv'

This is necessary because the load stage merges two different files: a db file and today's data.
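
The first-run step can also be scripted. The sketch below is a hypothetical helper, not code from this repository: seed_db_file() and its parameters are made up for illustration, the prefix that goes before '_db.csv' is not spelled out in this README and must be supplied by you, and the helper copies the cleaned file rather than renaming it so the original stays in place.

```python
import shutil
from pathlib import Path


def seed_db_file(cleaned_file, db_folder, prefix):
    """Copy the first cleaned file to <prefix>_db.csv unless a db file already exists."""
    target = Path(db_folder) / f"{prefix}_db.csv"
    if not target.exists():
        # Only seed once: later runs must merge into the existing db file.
        shutil.copyfile(cleaned_file, target)
    return target
```

Because an existing db file is never overwritten, running the helper again after the first day does not discard merged data.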

Next step

Simply run:

python3 pipeline_prices.py

and see the magic!!

Contribute

If you want to contribute, let me know or open a pull request. I try to check GitHub daily to see what is happening.

Enjoy!

datacloudgui/prices_pipeline
