ETL/Flask project that scrapes data, loads it into MongoDB, and displays the information. A web scraping application that retrieves and presents summary information, the latest news, and images of Mars from NASA.

PetraLee2019/NASA-Web-Scraping-Project

Mission to Mars

Background

A web scraping application that retrieves and presents summary information, the latest news, and images of Mars. The project loads the scraped data into MongoDB and displays it on a single HTML page.

Process:

Data is scraped from several websites with Mars content. The scraped data includes images of Mars, tweets about the current Mars weather, a table of Mars facts, and headlines from the latest Mars news. After scraping, the data is stored in MongoDB and then rendered by a Flask app through an HTML template styled with Bootstrap.
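
The scrape-then-store step could look roughly like the sketch below. It is a minimal illustration, not the code in this repo: the function names `scrape_all` and `store` are hypothetical, and `collection` is assumed to be a PyMongo collection (or anything with a compatible `update_one` method).

```python
from typing import Any, Callable, Dict


def scrape_all(scrapers: Dict[str, Callable[[], Any]]) -> Dict[str, Any]:
    """Run each scraper and collect the results into one dictionary."""
    return {name: scraper() for name, scraper in scrapers.items()}


def store(collection: Any, mars_data: Dict[str, Any]) -> None:
    """Keep a single document that always holds the latest scrape.

    `collection` is assumed to behave like a PyMongo collection.
    With upsert=True, update_one inserts the document on the first run
    and overwrites its fields on later runs.
    """
    collection.update_one({}, {"$set": mars_data}, upsert=True)
```

Keeping one document and overwriting it on each scrape means the page only ever has to read a single record. Storing one document per scrape would also work if a history is wanted.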

Steps:

  • Scrape various websites for data related to the Mission to Mars and display the output in a Jupyter Notebook [Scraping_mission_to_mars.ipynb]
  • Create a Python script [scrape_mars.py] that runs all of the scraping code and returns one Python dictionary containing all of the scraped data
  • Create a Flask app [app.py] with two routes (index and scrape). The root route / queries the Mongo database and passes the Mars data into an HTML template for display
  • Create an HTML file [index.html] that takes the Mars data dictionary and displays all of the data in the appropriate HTML elements
  • Create a Mongo database and collection to store the scraped data. PyMongo is used to set up the Mongo connection and to define the database and collection
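
As a rough sketch of how the two routes might fit together (not the repo's actual app.py): the `create_app` factory, the inline `PAGE` template, and the `news_title` key are assumptions made so the example is self-contained. A real app would more likely use `render_template("index.html", ...)` and a collection obtained through flask_pymongo.

```python
from flask import Flask, redirect, render_template_string

# Stand-in for templates/index.html so the sketch needs no extra files.
PAGE = "<h1>Mission to Mars</h1><p>{{ mars.news_title }}</p>"


def create_app(collection, scrape):
    """Build the app around a PyMongo-like collection and a scrape() callable."""
    app = Flask(__name__)

    @app.route("/")
    def index():
        # Query the stored document and pass it to the template.
        mars = collection.find_one() or {}
        return render_template_string(PAGE, mars=mars)

    @app.route("/scrape")
    def run_scrape():
        # Refresh the stored document, then send the browser back to /.
        collection.update_one({}, {"$set": scrape()}, upsert=True)
        return redirect("/")

    return app
```

Passing the collection and the scrape function in as arguments keeps the routes easy to try out with a stand-in collection before MongoDB is running.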

Prerequisites

  • The Python libraries flask, flask_pymongo, BeautifulSoup, and splinter must be installed for the code to run. The initial data scraping can be run either in a Jupyter Notebook or as a Python script

Technology Stack

  • HTML, CSS, Bootstrap, Jupyter, Python
  • Python Libraries - Pandas, Beautiful Soup, Splinter, PyMongo
  • Database - MongoDB
  • App Server - Flask

Sources:

  • NASA Mars News: scrape the latest NASA Mars news using BeautifulSoup, Splinter, and pandas in a Jupyter notebook.
  • JPL Space Images: use Splinter to navigate the site and scrape the full-resolution JPL featured image of Mars.
  • Mars Weather Twitter: visit the Mars Weather Twitter account and scrape the latest Mars weather data.
  • Space Facts: scrape the Mars facts table from Space-Facts.
  • USGS: visit the United States Geological Survey Astrogeology site to obtain high-resolution images of each of Mars's hemispheres.
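
To illustrate the kind of parsing these scrapes involve, here is a minimal BeautifulSoup sketch that pulls a headline and teaser out of a page. The HTML sample, the class names (`slide`, `content_title`, `article_teaser_body`), and the function name `parse_latest_news` are assumptions for illustration. The real pages may use different markup, and in the project the HTML would come from a Splinter browser session rather than a string.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded news page.
SAMPLE_HTML = """
<ul>
  <li class="slide">
    <div class="content_title">Sample headline</div>
    <div class="article_teaser_body">Sample teaser paragraph.</div>
  </li>
</ul>
"""


def parse_latest_news(html):
    """Return the first headline and teaser found, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    item = soup.find("li", class_="slide")
    if item is None:
        return None
    title = item.find("div", class_="content_title")
    teaser = item.find("div", class_="article_teaser_body")
    return {
        "news_title": title.get_text(strip=True) if title else None,
        "news_p": teaser.get_text(strip=True) if teaser else None,
    }


print(parse_latest_news(SAMPLE_HTML))
```

Returning None when the expected element is missing lets the caller decide how to handle a page whose layout has changed, instead of failing with an AttributeError.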

Run without Heroku:

  • Copy the files in this repo (the .gitignore is not needed)
  • Start a MongoDB daemon in a terminal, then start a mongo instance
  • Run app.py in the terminal, then copy the local URL into your web browser
