This Python package is designed to scrape historical space launch data from NextSpaceFlight.com and store it in Google Cloud Storage. It complements the Space-App project by providing the data backbone for various visualizations and analyses.
- Source: Scrapes comprehensive historical data from NextSpaceFlight.com.
- Historical Data: Gathers detailed information on past space launches.
- Data Transformation: Transforms the scraped data into a CSV format for easy consumption.
- Google Cloud Storage: Automatically uploads the scraped data to Google Cloud Storage.
- Data Update: Checks for existing data in Google Cloud Storage and appends new data.
- Error Handling: Robust error handling to ensure data integrity.
- Logging: Detailed logging for debugging and monitoring.
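The "Data Update" step above (check for existing data, then append new data) could be sketched as follows. This is a minimal illustration using Pandas, not the package's actual implementation: the function name `merge_launches`, the `launch_id` key column, and the sample columns are hypothetical, and the existing CSV is passed in as plain text rather than read from Google Cloud Storage.

```python
import io

import pandas as pd


def merge_launches(existing_csv: str, new_rows: pd.DataFrame, key: str = "launch_id") -> pd.DataFrame:
    """Append new_rows to previously stored CSV text, keeping one row per key.

    `key` is a hypothetical unique identifier column; the real dataset may use
    a different column (or combination of columns) to identify a launch.
    """
    frames = [new_rows]
    if existing_csv.strip():
        # Parse the CSV text that was stored earlier and put it first,
        # so that freshly scraped rows win when a key appears twice.
        frames.insert(0, pd.read_csv(io.StringIO(existing_csv)))
    combined = pd.concat(frames, ignore_index=True)
    return combined.drop_duplicates(subset=key, keep="last").reset_index(drop=True)


# Hypothetical sample data for illustration only.
existing_csv = (
    "launch_id,mission,date\n"
    "1,Alpha,2020-01-01\n"
    "2,Beta,2020-02-01\n"
)
new_rows = pd.DataFrame(
    {
        "launch_id": [2, 3],
        "mission": ["Beta", "Gamma"],
        "date": ["2020-02-01", "2020-03-01"],
    }
)

merged = merge_launches(existing_csv, new_rows)
updated_csv = merged.to_csv(index=False)  # text ready to be uploaded again
```

Deduplicating on a key column keeps repeated runs from storing the same launch twice, which is one plausible way to keep the stored CSV consistent across updates.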
To install this package, run:
pip install git+https://github.com/Tanguy9862/NextSpaceFlight-Scrapper.git
After installation, you can import the package and use the scrape_past_launches_data() function to scrape and update the data.
from next_spaceflight_scrapper import scraper
# Scrape and update historical launch data
scraper.scrape_past_launches_data()
- Python 3.x
- BeautifulSoup
- Requests
- Pandas
- Google Cloud Storage
To access Google Cloud Storage, you'll need a JSON file containing your GCS authentication keys. Place this file in the past_launches_scrapper directory and name it spacexploration-keys.json.
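To check that the key file is where it is expected before a scrape starts, something like the sketch below may help. The helper names `find_key_file` and `make_storage_client` are hypothetical and not part of this package. The file name comes from the instructions above, and `storage.Client.from_service_account_json` is provided by the google-cloud-storage library, which is imported lazily so the path check works even without it installed.

```python
from pathlib import Path

KEY_FILENAME = "spacexploration-keys.json"  # file name given in this README


def find_key_file(package_dir) -> Path:
    """Return the path to the key file inside package_dir, or raise if it is missing."""
    key_path = Path(package_dir) / KEY_FILENAME
    if not key_path.is_file():
        raise FileNotFoundError(f"Expected GCS key file at {key_path}")
    return key_path


def make_storage_client(package_dir):
    """Build a GCS client from the key file (requires google-cloud-storage)."""
    from google.cloud import storage  # imported here so find_key_file has no extra dependency

    return storage.Client.from_service_account_json(str(find_key_file(package_dir)))
```

For example, `make_storage_client("past_launches_scrapper")` would fail early with a clear `FileNotFoundError` if the JSON file has not been placed in that directory.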
This project is licensed under the MIT License - see the LICENSE file for details.