Check out the web app for this project!
https://charlesphil-mars-scrape.herokuapp.com/
The purpose of this site is to demonstrate scraping, processing, and storing many types of content from websites related to Mars and Mars exploration. To access elements in the HTML, I used the popular BeautifulSoup Python library. To automate the clicking required to reach high-resolution images, I used the Splinter Python library to interact with elements on these pages.
The "Latest News" card scrapes the NASA news page for the latest article's headline and blurb, along with a high-resolution image link for the article.
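The headline-and-blurb extraction can be sketched with BeautifulSoup as below. The markup and class names (`slide`, `content_title`, `article_teaser_body`) are stand-ins for illustration; the live NASA page's structure may differ.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the NASA news page markup; the class names
# here are assumptions for illustration, not the live site's markup.
SAMPLE_HTML = """
<ul class="item_list">
  <li class="slide">
    <div class="content_title"><a href="/news/123/">Perseverance Collects a New Sample</a></div>
    <div class="article_teaser_body">The rover sealed its latest rock core.</div>
  </li>
</ul>
"""

def parse_latest_news(html):
    """Extract the headline and teaser blurb from the first news slide."""
    soup = BeautifulSoup(html, "html.parser")
    slide = soup.find("li", class_="slide")
    return {
        "headline": slide.find("div", class_="content_title").get_text(strip=True),
        "blurb": slide.find("div", class_="article_teaser_body").get_text(strip=True),
    }

news = parse_latest_news(SAMPLE_HTML)
```

In the app itself, the HTML would come from the page Splinter has navigated to rather than a hard-coded string.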
The "Image of the Week" card scrapes NASA's home page, pulling the link and name from the featured image element.
The "Mars Facts" card scrapes the table element on the NASA Mars Facts page, processes it with the Pandas Python library, and exports it as a string of HTML code.
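The table-to-HTML step looks roughly like the sketch below. On the live site the table would be pulled with `pd.read_html(url)`; here a small DataFrame with illustrative values stands in so the rendering step is self-contained.

```python
import pandas as pd

# Illustrative facts table; on the live site this DataFrame would come
# from pd.read_html(facts_url)[0] instead of being built by hand.
facts = pd.DataFrame(
    {
        "Description": ["Equatorial Diameter", "Polar Diameter", "Moons"],
        "Mars": ["6,792 km", "6,752 km", "2 (Phobos & Deimos)"],
    }
).set_index("Description")

# Render the DataFrame to the HTML string that the card embeds.
facts_html = facts.to_html(classes="table table-striped", border=0)
```

The resulting string can be dropped straight into the page template, with the `classes` argument hooking the table into the site's CSS.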
Lastly, the "Mars Hemisphere" card gets the high resolution images and names from the United States Geological Survey Astropedia page on Mars.
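Collecting each hemisphere's title and image URL can be sketched as below. The markup is a stand-in for the Astropedia results page (the real site's structure and class names are assumptions), and on the live site Splinter clicks through to each hemisphere's page to reach the full-resolution link.

```python
from bs4 import BeautifulSoup

# Stand-in for the Astropedia search-results markup; the "item" class
# and link layout are assumptions for illustration.
SAMPLE_RESULTS = """
<div class="item"><h3>Cerberus Hemisphere Enhanced</h3>
  <a href="/full/cerberus.jpg">link</a></div>
<div class="item"><h3>Schiaparelli Hemisphere Enhanced</h3>
  <a href="/full/schiaparelli.jpg">link</a></div>
"""

def parse_hemispheres(html, base_url="https://astrogeology.usgs.gov"):
    """Collect each hemisphere's title and full-resolution image URL."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {
            "title": item.h3.get_text(strip=True),
            "img_url": base_url + item.a["href"],
        }
        for item in soup.find_all("div", class_="item")
    ]

hemispheres = parse_hemispheres(SAMPLE_RESULTS)
```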
For storing the data, I opted for MongoDB, a NoSQL database, since this project mainly reads stored data that changes infrequently. I do not need to write large amounts of data; I am focused solely on easy content management of a few documents.
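A minimal sketch of that storage pattern is below: everything from one scrape run is bundled into a single document and upserted, so reads always see the latest run. The field names here are illustrative, not the app's actual schema, and the pymongo import is deferred so the document-assembly part works without a running database.

```python
def build_mars_document(news, featured_image_url, facts_html, hemispheres):
    """Assemble one document holding everything a scrape run produced.
    Field names are illustrative, not the app's actual schema."""
    return {
        "news": news,
        "featured_image_url": featured_image_url,
        "facts_html": facts_html,
        "hemispheres": hemispheres,
    }

def save_to_mongo(doc):
    """Upsert the single scrape document into a local MongoDB instance.
    Database/collection names are assumptions for illustration."""
    from pymongo import MongoClient  # third-party driver, needs MongoDB running
    client = MongoClient("mongodb://localhost:27017")
    client.mars_app.mars.update_one({}, {"$set": doc}, upsert=True)

doc = build_mars_document(
    news={"headline": "Sample headline", "blurb": "Sample blurb"},
    featured_image_url="https://www.nasa.gov/sample.jpg",
    facts_html="<table></table>",
    hemispheres=[],
)
```

Using `update_one(..., upsert=True)` against a single document keeps the collection to one record per scrape target, which fits a read-mostly site better than appending a new document on every run.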
This project uses Anaconda environments to manage dependencies. To install the dependencies required to run the Flask app and Jupyter Notebook, clone the repository, then open a console with Anaconda available, navigate to the project repository, and run conda env create -f environment.yml to set up the Conda environment.
Once the environment is created, activate it with conda activate mars_scrape. You will then be able to run the Python app inside Missions_to_Mars.
Your console will look different depending on your setup.
This project requires MongoDB, a NoSQL database. If MongoDB is not installed on your device, please refer to https://www.mongodb.com/try/download/community for installation.
Please follow these instructions to install and start the service on your platform:
Linux (Red Hat, Ubuntu, Debian, SUSE, Amazon)
Once the environment is set up, navigate your console to Missions_to_Mars/ and run the command python app.py.
Open either Google Chrome or Mozilla Firefox to the localhost address listed in your console (most commonly http://127.0.0.1:5000/). This project requires Chrome or Firefox to be installed on your system because the web-automation library uses the Chrome or Gecko web driver to run the scrape.