## Step 1 - Scraping

Complete your initial scraping using Jupyter Notebook, BeautifulSoup, Pandas, and Requests/Splinter.

* Create a Jupyter Notebook file called `mission_to_mars.ipynb` and use this to complete all of your scraping and analysis tasks. The following outlines what you need to scrape.

In [1]:
#Import the BeautifulSoup class from the bs4 package.
#Import Browser from the splinter package.

In [2]:
import time
from splinter import Browser
from bs4 import BeautifulSoup as bs

In [3]:
#Function to init browser to Chrome

def init_browser():
    executable_path = {"executable_path": "/Users/rck/chrome_driver/chromedriver"}
    return Browser("chrome", **executable_path, headless=False)

### NASA Mars News

* Scrape the [NASA Mars News Site](https://mars.nasa.gov/news/) and collect the latest News Title and Paragraph Text. Assign the text to variables that you can reference later.

In [4]:
#Scrape the latest news article from the NASA Mars News site.
#Only the first (most recent) article on the page is used.
#This is the 1st scrape function.

def get_news():
    url = 'https://mars.nasa.gov/news/?page=0&per_page=40&order=publish_date+desc%2Ccreated_at+desc&search=&category=19%2C165%2C184%2C204&blank_scope=Latest'
    
    #Default to None so the return statement is safe even if the scrape fails.
    title = None
    description = None
    try:
        browser.visit(url)
        html_string = browser.html
        soup = bs(html_string, 'html.parser')

        div = soup.find('div', attrs={'class': 'list_text'})
        title = div.find_next('div', {'class': 'content_title'}).text
        description = div.find_next('div', {'class': 'article_teaser_body'}).text
    except Exception as error:
        #Report the failure instead of silently swallowing it.
        print(f"get_news failed: {error}")
    return {"news_title": title, "news_p": description}

### JPL Mars Space Images - Featured Image

* Visit the url for JPL's Featured Space Image [here](https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars).

* Use splinter to navigate the site and find the image url for the current Featured Mars Image and assign the url string to a variable called `featured_image_url`.

* Make sure to find the image url to the full size `.jpg` image.

* Make sure to save a complete url string for this image.
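
One way to build the complete URL string, sketched under the assumption that the scraped `src` attribute is a site-relative path: `urljoin` from the standard library combines it with the site's base URL and leaves an already absolute URL unchanged. The helper name `build_full_url` is hypothetical, and the sample path is copied from the notebook output further down.

```python
from urllib.parse import urljoin

# Base URL of the JPL site, as used in the scraping cell of this notebook.
JPL_BASE_URL = "https://www.jpl.nasa.gov"


def build_full_url(src, base_url=JPL_BASE_URL):
    """Return an absolute URL for a scraped src attribute (hypothetical helper)."""
    return urljoin(base_url, src)


# Sample relative path taken from the notebook's printed output.
relative_src = "/spaceimages/images/mediumsize/PIA17440_ip.jpg"
featured_image_url = build_full_url(relative_src)
print(featured_image_url)
```

Compared with plain string concatenation, `urljoin` avoids a doubled or missing slash between the base URL and the path.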

In [5]:
#Scrape the featured image URL from the JPL Space Images page
def get_featured_image():
    url = 'https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars'
    
    #Default to None so the return statement is safe even if the scrape fails.
    featured_image_url = None
    try:
        browser.visit(url)
        button = browser.find_by_id("full_image")
        button.click()
        time.sleep(2) #pause for 2 seconds so the image overlay can load

        html_string = browser.html
        
        #'html.parser' selects Python's built-in HTML parser.
        soup = bs(html_string, 'html.parser')
        anchor = soup.find('a', 'ready')
        if anchor and anchor.img:
            featured_image_url = "https://www.jpl.nasa.gov" + anchor.img['src']
    except Exception as error:
        #Report the failure instead of silently swallowing it.
        print(f"get_featured_image failed: {error}")
    return featured_image_url

### Mars Weather

* Visit the Mars Weather Twitter account [here](https://twitter.com/marswxreport?lang=en) and scrape the latest Mars weather tweet from the page. Save the tweet text for the weather report as a variable called `mars_weather`.

In [6]:
#Get the latest weather update from Twitter
def get_latest_weather():
    url = 'https://twitter.com/marswxreport?lang=en'
    
    #Default to None so the return statement is safe even if the scrape fails.
    latest_weather = None
    try:
        browser.visit(url)
        html_string = browser.html
        soup = bs(html_string, 'lxml')
        
        latest_weather = soup.find('div', 'js-tweet-text-container').text.strip()
    except Exception as error:
        #Report the failure instead of silently swallowing it.
        print(f"get_latest_weather failed: {error}")
    return latest_weather

### Mars Facts

* Visit the Mars Facts webpage [here](http://space-facts.com/mars/) and use Pandas to scrape the table containing facts about the planet including Diameter, Mass, etc.
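
The Pandas route mentioned above can be sketched with `pandas.read_html`, which parses every `<table>` it finds into a DataFrame. This is a minimal sketch, not the notebook's own method: it runs against a small inline HTML sample (rows copied from the output shown later) rather than the live page, the column names `description` and `value` are assumptions, and `read_html` needs an HTML parser such as `lxml` installed.

```python
from io import StringIO

import pandas as pd

# Inline stand-in for the facts table; the real page may be structured differently.
sample_html = """
<table>
  <tr><td>Equatorial Diameter:</td><td>6,792 km</td></tr>
  <tr><td>Polar Diameter:</td><td>6,752 km</td></tr>
  <tr><td>Moons:</td><td>2 (Phobos &amp; Deimos)</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per table found.
tables = pd.read_html(StringIO(sample_html))
facts_df = tables[0]
facts_df.columns = ["description", "value"]  # assumed column names

# Convert to the same key-value shape that the notebook stores.
facts = dict(zip(facts_df["description"], facts_df["value"]))
print(facts)
```

Passing a URL string to `read_html` instead of the `StringIO` object makes it fetch the page first. Which entry of the returned list holds the facts table on the live site would need to be checked.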

In [7]:
#Get the Mars facts table from the Mars Facts webpage
def get_facts(): 
    url = 'https://space-facts.com/mars/'
    
    #Default to an empty dictionary so the return statement is safe even if the scrape fails.
    facts = {}
    try:
        browser.visit(url)
        html_string = browser.html
        soup = bs(html_string, 'lxml')

        keys = []
        values = []
        table = soup.find('table', 'tablepress tablepress-id-mars')
        for row in table.find_all('tr'):
            columns = row.find_all('td')
            if len(columns) < 2:
                continue #skip rows that do not hold a label and a value
            keys.append(columns[0].text)
            values.append(columns[1].text)
        facts = dict(zip(keys, values)) #facts stored as key-value pairs
    except Exception as error:
        #Report the failure instead of silently swallowing it.
        print(f"get_facts failed: {error}")
    return facts

### Mars Hemispheres

* Visit the USGS Astrogeology site [here](https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars) to obtain high resolution images for each of Mars's hemispheres.

* You will need to click each of the links to the hemispheres in order to find the image url to the full resolution image.

* Save both the image url string for the full resolution hemisphere image, and the Hemisphere title containing the hemisphere name. Use a Python dictionary to store the data using the keys `img_url` and `title`.

* Append the dictionary with the image url string and the hemisphere title to a list. This list will contain one dictionary for each hemisphere.

In [8]:
#Get the high resolution image URL and title for each of Mars's hemispheres
def get_hemispheres():
    hemisphere_image_urls = []
    url = 'https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars' 
    
    #On any error, report it and return whatever was collected so far.
    try:
        browser.visit(url)     
        html_string = browser.html
        soup = bs(html_string, 'lxml')

        for header in soup.find_all("h3"):
            title = header.text
            link = header.find_previous("a")
            page_url = 'https://astrogeology.usgs.gov' + link['href']
            browser.visit(page_url) #open the detail page for this hemisphere

            sub_html_string = browser.html
            sub_soup = bs(sub_html_string, 'lxml')
            image_url = 'https://astrogeology.usgs.gov' + str(sub_soup.find('img', 'wide-image')['src'])
            hemisphere_image_urls.append({"title": title, "img_url": image_url})
            browser.back()
    except Exception as error:
        print(f"get_hemispheres failed: {error}")
    return hemisphere_image_urls

In [9]:
#Call each of the scrape functions defined above and consolidate
#their results into a single output dictionary.
def scrape():
    #1st scrape function: latest news article
    news = get_news()
    
    #2nd scrape function: featured image URL
    featured_image_url = get_featured_image()
    
    #3rd scrape function: latest weather update from Twitter
    latest_weather = get_latest_weather()
    
    #4th scrape function: Mars facts from the Mars Facts webpage
    facts = get_facts()
    
    #5th scrape function: high resolution images for each of Mars's hemispheres
    hemisphere_image_urls = get_hemispheres()
    
    #Collect all of the scraped information into one dictionary of key-value pairs
    output = {
        "news": news,
        "featured_image_url": featured_image_url,
        "weather": latest_weather,
        "facts": facts,
        "hemisphere_image_urls": hemisphere_image_urls,
    }
    return output

In [10]:
#call function to initialize Chrome Browser using Splinter
browser = init_browser()

#call scrape function to call the other scraping methods and return the consolidated information into output object
output = scrape()

In [11]:
#Import json to pretty-print the output dictionary returned by scrape()
#Print the output to view the information gathered by the 5 scrape functions created earlier

import json
print(json.dumps(output,indent=4))

{
    "news": {
        "news_title": "Mars Helicopter to Fly on NASA\u2019s Next Red Planet Rover Mission",
        "news_p": "NASA is adding a Mars helicopter to the agency\u2019s next mission to the Red Planet, Mars 2020."
    },
    "featured_image_url": "https://www.jpl.nasa.gov/spaceimages/images/mediumsize/PIA17440_ip.jpg",
    "weather": "Sol 2047 (May 10, 2018), Sunny, high 3C/37F, low -71C/-95F, pressure at 7.33 hPa, daylight 05:22-17:20",
    "facts": {
        "Equatorial Diameter:": "6,792 km\n",
        "Polar Diameter:": "6,752 km\n",
        "Mass:": "6.42 x 10^23 kg (10.7% Earth)",
        "Moons:": "2 (Phobos & Deimos)",
        "Orbit Distance:": "227,943,824 km (1.52 AU)",
        "Orbit Period:": "687 days (1.9 years)\n",
        "Surface Temperature: ": "-153 to 20 \u00b0C",
        "First Record:": "2nd millennium BC",
        "Recorded By:": "Egyptian astronomers"
    },
    "hemisphere_image_urls": [
        {
            "title": "Cerberus Hemisphere Enhanced"

## Step 2 - MongoDB and Flask Application

Use MongoDB with Flask templating to create a new HTML page that displays all of the information that was scraped from the URLs above.

* Start by converting your Jupyter notebook into a Python script called `scrape_mars.py` with a function called `scrape` that will execute all of your scraping code from above and return one Python dictionary containing all of the scraped data.

* Next, create a route called `/scrape` that will import your `scrape_mars.py` script and call your `scrape` function.

  * Store the return value in Mongo as a Python dictionary.

* Create a root route `/` that will query your Mongo database and pass the mars data into an HTML template to display the data.

* Create a template HTML file called `index.html` that will take the mars data dictionary and display all of the data in the appropriate HTML elements. Use the following as a guide for what the final product should look like, but feel free to create your own design.
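
The two routes described above could be wired up roughly as follows. This is a hedged sketch rather than a reference solution: the factory function `create_app` and its parameters `collection` and `scrape_fn` are hypothetical names, and passing them in (instead of importing them at module level) is a design choice that keeps the example independent of a running Mongo server. It assumes an `index.html` template exists in a `templates` folder.

```python
from flask import Flask, redirect, render_template


def create_app(collection, scrape_fn):
    """Build the Flask app around a Mongo-style collection and a scrape function."""
    app = Flask(__name__)

    @app.route("/")
    def index():
        # Fetch the single stored document and hand it to the template.
        mars = collection.find_one()
        return render_template("index.html", mars=mars)

    @app.route("/scrape")
    def scrape_route():
        # Run the scraper, then overwrite the stored document
        # (upsert=True inserts it if the collection is still empty).
        data = scrape_fn()
        collection.update_one({}, {"$set": data}, upsert=True)
        return redirect("/")

    return app
```

In the actual project, `collection` would be something like `MongoClient("mongodb://localhost:27017").mission_to_mars.mars_info` and `scrape_fn` would be the `scrape` function imported from `scrape_mars.py`. The app is then started with `create_app(collection, scrape_fn).run()`.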

In [14]:
#Import the dependency needed to store the scraped information in a Mongo database
from pymongo import MongoClient

In [15]:
#Connect to the mongodb server and use mission_to_mars database
client = MongoClient("mongodb://localhost:27017")
db = client.mission_to_mars

In [19]:
#Use the mars_info collection in the mission_to_mars database
#Query one document from the collection
#Display the scraped information that was saved to the Mongo database
new_info = db.mars_info.find_one()
new_info

{'_id': ObjectId('5af6774b3c64973a4c8a03eb'),
 'facts': {'Equatorial Diameter:': '6,792 km\n',
  'First Record:': '2nd millennium BC',
  'Mass:': '6.42 x 10^23 kg (10.7% Earth)',
  'Moons:': '2 (Phobos & Deimos)',
  'Orbit Distance:': '227,943,824 km (1.52 AU)',
  'Orbit Period:': '687 days (1.9 years)\n',
  'Polar Diameter:': '6,752 km\n',
  'Recorded By:': 'Egyptian astronomers',
  'Surface Temperature: ': '-153 to 20 °C'},
 'featured_image_url': 'https://www.jpl.nasa.gov/spaceimages/images/mediumsize/PIA17009_ip.jpg',
 'hemisphere_image_urls': [{'img_url': 'https://astrogeology.usgs.gov/cache/images/cfa62af2557222a02478f1fcd781d445_cerberus_enhanced.tif_full.jpg',
   'title': 'Cerberus Hemisphere Enhanced'},
  {'img_url': 'https://astrogeology.usgs.gov/cache/images/3cdd1cbf5e0813bba925c9030d13b62e_schiaparelli_enhanced.tif_full.jpg',
   'title': 'Schiaparelli Hemisphere Enhanced'},
  {'img_url': 'https://astrogeology.usgs.gov/cache/images/ae209b4e408bb6c3e67b6af38168cf28_syrtis_majo

In [20]:
#Display the consolidated information queried from the Mongo database
#.get() returns None for any key that is missing from the document
news = new_info.get("news")
featured_image_url = new_info.get("featured_image_url")
weather = new_info.get("weather")
facts = new_info.get("facts")
hemisphere_image_urls = new_info.get("hemisphere_image_urls")

print(news)
print(featured_image_url)
print(weather)
print(facts)
print(hemisphere_image_urls)

{'news_title': 'Mars Helicopter to Fly on NASA’s Next Red Planet Rover Mission', 'news_p': 'NASA is adding a Mars helicopter to the agency’s next mission to the Red Planet, Mars 2020.'}
https://www.jpl.nasa.gov/spaceimages/images/mediumsize/PIA17009_ip.jpg
Sol 2047 (May 10, 2018), Sunny, high 3C/37F, low -71C/-95F, pressure at 7.33 hPa, daylight 05:22-17:20
{'Equatorial Diameter:': '6,792 km\n', 'Polar Diameter:': '6,752 km\n', 'Mass:': '6.42 x 10^23 kg (10.7% Earth)', 'Moons:': '2 (Phobos & Deimos)', 'Orbit Distance:': '227,943,824 km (1.52 AU)', 'Orbit Period:': '687 days (1.9 years)\n', 'Surface Temperature: ': '-153 to 20 °C', 'First Record:': '2nd millennium BC', 'Recorded By:': 'Egyptian astronomers'}
[{'title': 'Cerberus Hemisphere Enhanced', 'img_url': 'https://astrogeology.usgs.gov/cache/images/cfa62af2557222a02478f1fcd781d445_cerberus_enhanced.tif_full.jpg'}, {'title': 'Schiaparelli Hemisphere Enhanced', 'img_url': 'https://astrogeology.usgs.gov/cache/images/3cdd1cbf5e0813bba