Data Pipeline for Fetching, Processing, Adding Geospatial Data and Saving to S3 Bucket

This repository contains code for a data pipeline that fetches data from a website, processes it, adds geospatial data to it, and then saves it to an S3 bucket.

Code Description

The fetch_data function in fetch_data.py uses the requests library to fetch data from a website and the BeautifulSoup library to parse the HTML. It then extracts data from the website and stores it in a pandas data frame.
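The fetch step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: it assumes the page exposes its data as a single HTML table, and the helper name parse_table, the url parameter, and the timeout value are all hypothetical.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def parse_table(html: str) -> pd.DataFrame:
    """Turn the first HTML table in `html` into a data frame (assumed page layout)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    rows = table.find_all("tr") if table is not None else []
    if not rows:
        # No table (or an empty one): return an empty frame instead of failing.
        return pd.DataFrame()

    # The first row is treated as the header row.
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    records = []
    for row in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        # Skip rows whose cell count does not match the header.
        if len(cells) == len(headers):
            records.append(cells)
    return pd.DataFrame(records, columns=headers)


def fetch_data(url: str, timeout: float = 10.0) -> pd.DataFrame:
    """Download the page at `url` and parse its table."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # surface HTTP errors early
    return parse_table(response.text)
```

Keeping the parsing separate from the HTTP call makes it possible to check the parsing logic against a saved HTML snippet without touching the network.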

The process_data function in process_data.py takes the data frame created in fetch_data and processes it to keep only the relevant columns.
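As a rough sketch of the column selection described above: the column names below are placeholders, since the README does not say which columns count as relevant, and the error handling is an assumption rather than the repository's behavior.

```python
import pandas as pd

# Hypothetical column names; the real list depends on the scraped site.
RELEVANT_COLUMNS = ["city", "address", "price"]


def process_data(df: pd.DataFrame, columns=RELEVANT_COLUMNS) -> pd.DataFrame:
    """Return a copy of `df` restricted to the relevant columns."""
    missing = [name for name in columns if name not in df.columns]
    if missing:
        # Fail loudly if the site layout changed and a column disappeared.
        raise KeyError(f"Missing expected columns: {missing}")
    return df.loc[:, list(columns)].copy()
```

Returning a copy keeps later steps from modifying the raw data frame by accident.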

The add_coordinates function in add_coordinates.py adds latitude and longitude coordinates to the data frame by using the geopy library and the get_coordinates function from get_coordinates.py.
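One way the geocoding step might look is sketched below. It assumes geopy's Nominatim geocoder with a rate limiter, an "address" column, and output columns named "latitude" and "longitude"; none of these details are confirmed by the README, and build_geocoder is a hypothetical helper.

```python
import pandas as pd


def build_geocoder(user_agent: str = "price_data_example"):
    """Create a rate-limited geopy geocoding callable (Nominatim is an assumption)."""
    # Imported here so the rest of the sketch works with any geocoding callable.
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    # Nominatim's usage policy asks for at most one request per second.
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)


def get_coordinates(address: str, geocode):
    """Return (latitude, longitude) for `address`, or (None, None) if not found."""
    location = geocode(address)
    if location is None:
        return (None, None)
    return (location.latitude, location.longitude)


def add_coordinates(df: pd.DataFrame, geocode, address_column: str = "address") -> pd.DataFrame:
    """Return a copy of `df` with latitude and longitude columns added."""
    result = df.copy()
    coords = [get_coordinates(address, geocode) for address in result[address_column]]
    result["latitude"] = [lat for lat, _ in coords]
    result["longitude"] = [lon for _, lon in coords]
    return result
```

A typical call would be add_coordinates(df, build_geocoder()). Passing the geocoder in as an argument also allows a stub to be used in place of the live service.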

Then, the save_data function in save_data.py converts the processed data frame to Parquet format using the pyarrow library and uploads it to an S3 bucket using the boto3 library.

Finally, the main function in main.py ties everything together and runs the pipeline by calling the functions in the correct order.
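The order of calls can be illustrated with a small sketch. The actual main.py presumably imports the four functions from their modules; here they are passed in as parameters so the example stays self-contained, and run_pipeline is a hypothetical name.

```python
def run_pipeline(fetch, process, add_coords, save):
    """Run the four pipeline stages in order and return the enriched data."""
    raw = fetch()                      # 1. scrape the website
    processed = process(raw)           # 2. keep only the relevant columns
    enriched = add_coords(processed)   # 3. add latitude and longitude
    save(enriched)                     # 4. write the result to S3
    return enriched
```

With the repository's modules available, main could call run_pipeline with fetch_data, process_data, add_coordinates and save_data, each wrapped to supply its own arguments such as the URL or bucket name.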

Valentine-pl/price_data
