# Web Scraping Homework - Mission to Mars

In [1]:
# Import dependencies
import requests
import pandas as pd
import pymongo
import time
from splinter import Browser
from bs4 import BeautifulSoup as bs
from webdriver_manager.chrome import ChromeDriverManager

In [2]:
# Setup splinter
executable_path = {'executable_path': ChromeDriverManager().install()}
browser = Browser('chrome', **executable_path, headless=False)



Current google-chrome version is 99.0.4844
Get LATEST chromedriver version for 99.0.4844 google-chrome
Driver [/Users/siddharth/.wdm/drivers/chromedriver/mac64/99.0.4844.51/chromedriver] found in cache


### NASA Mars News
* Scrape the [Mars News Site](https://redplanetscience.com/) and collect the latest News Title and Paragraph Text. Assign the text to variables that you can reference later.

In [4]:
# Set up the Mars News Site to be scraped
url = 'https://redplanetscience.com/'
browser.visit(url)

# Create BeautifulSoup object; parse with 'html.parser'
html = browser.html
soup = bs(html, 'html.parser')

In [5]:
# Scrape news title and news paragraph text
newstitle = soup.find('div', class_='content_title').text
paragraphtext = soup.find('div', class_='article_teaser_body').text

print(f"Title: {newstitle}")
print(f"Text: {paragraphtext}")

Title: NASA Perseverance Mars Rover Scientists Train in the Nevada Desert
Text: Team members searched for signs of ancient microscopic life there, just as NASA's latest rover will on the Red Planet next year.


### JPL Mars Space Images - Featured Image
* Visit the url for the Featured Space Image site [here](https://spaceimages-mars.com).

* Use splinter to navigate the site and find the image url for the current Featured Mars Image and assign the url string to a variable called `featured_image_url`.

* Make sure to find the image url to the full size `.jpg` image.

* Make sure to save a complete url string for this image.

In [6]:
# Set up the Mars Space image to be scraped
url2 = 'https://spaceimages-mars.com/'
browser.visit(url2)

# Click on button to see full size image
browser.links.find_by_partial_text('FULL IMAGE').click()

# Create BeautifulSoup object; parse with 'html.parser'
html2 = browser.html
soup2 = bs(html2, 'html.parser')

In [7]:
# Scrape full size image url
imageurl = soup2.find('img', class_='fancybox-image')['src']

# Concatenate complete url and store it under the name the assignment asks for
featured_image_url = url2 + imageurl

print(featured_image_url)

https://spaceimages-mars.com/image/featured/mars3.jpg
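
The concatenation above works because the base URL ends with a slash and the scraped `src` is a relative path without a leading one. As a hedged alternative, `urllib.parse.urljoin` from the standard library copes with either form. This is a minimal sketch; `build_full_url` is a hypothetical helper name, not something the assignment requires.

```python
from urllib.parse import urljoin


def build_full_url(base_url: str, relative_src: str) -> str:
    """Join a site's base URL with a scraped relative path."""
    # urljoin resolves the path against the base, so a leading slash
    # on relative_src does not produce a doubled '//' in the result
    return urljoin(base_url, relative_src)


# Both calls print the same complete URL
print(build_full_url('https://spaceimages-mars.com/', 'image/featured/mars3.jpg'))
print(build_full_url('https://spaceimages-mars.com/', '/image/featured/mars3.jpg'))
```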


### Mars Facts
* Visit the Mars Facts webpage [here](https://galaxyfacts-mars.com) and use Pandas to scrape the table containing facts about the planet including Diameter, Mass, etc.

* Use Pandas to convert the data to an HTML table string.

In [8]:
# URL for Mars Facts table
url3 = 'https://galaxyfacts-mars.com/'

# Use pandas to read html file
factstables = pd.read_html(url3)

factstables

[                         0                1                2
 0  Mars - Earth Comparison             Mars            Earth
 1                Diameter:         6,779 km        12,742 km
 2                    Mass:  6.39 × 10^23 kg  5.97 × 10^24 kg
 3                   Moons:                2                1
 4       Distance from Sun:   227,943,824 km   149,598,262 km
 5          Length of Year:   687 Earth days      365.24 days
 6             Temperature:     -87 to -5 °C      -88 to 58°C,
                       0                              1
 0  Equatorial Diameter:                       6,792 km
 1       Polar Diameter:                       6,752 km
 2                 Mass:  6.39 × 10^23 kg (0.11 Earths)
 3                Moons:          2 ( Phobos & Deimos )
 4       Orbit Distance:       227,943,824 km (1.38 AU)
 5         Orbit Period:           687 days (1.9 years)
 6  Surface Temperature:                   -87 to -5 °C
 7         First Record:              2nd millennium BC
 8          Recorded By:           Egyptian astronomers]

In [9]:
# Table with Mars planet facts
marstable = factstables[1]

marstable

                      0                              1
0  Equatorial Diameter:                       6,792 km
1       Polar Diameter:                       6,752 km
2                 Mass:  6.39 × 10^23 kg (0.11 Earths)
3                Moons:          2 ( Phobos & Deimos )
4       Orbit Distance:       227,943,824 km (1.38 AU)
5         Orbit Period:           687 days (1.9 years)
6  Surface Temperature:                   -87 to -5 °C
7         First Record:              2nd millennium BC
8          Recorded By:           Egyptian astronomers


In [10]:
# Use pandas to convert the data to an HTML table string
htmltable = marstable.to_html()

htmltable

'<table border="1" class="dataframe">\n  <thead>\n    <tr style="text-align: right;">\n      <th></th>\n      <th>0</th>\n      <th>1</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>Equatorial Diameter:</td>\n      <td>6,792 km</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>Polar Diameter:</td>\n      <td>6,752 km</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>Mass:</td>\n      <td>6.39 × 10^23 kg (0.11 Earths)</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>Moons:</td>\n      <td>2 ( Phobos &amp; Deimos )</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>Orbit Distance:</td>\n      <td>227,943,824 km (1.38 AU)</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>Orbit Period:</td>\n      <td>687 days (1.9 years)</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>Surface Temperature:</td>\n      <td>-87 to -5 °C</td>\n    </tr>\n    <tr>\n      <th>7</th>\n      <td>First Record:</td>\n      <td>2nd millennium BC</td>\n   

### Mars Hemispheres
* Visit the astrogeology site [here](https://marshemispheres.com/) to obtain high-resolution images for each of Mars's hemispheres.

* You will need to click each of the links to the hemispheres in order to find the image url to the full resolution image.

* Save both the image url string for the full resolution hemisphere image, and the Hemisphere title containing the hemisphere name. Use a Python dictionary to store the data using the keys `img_url` and `title`.

* Append the dictionary with the image url string and the hemisphere title to a list. This list will contain one dictionary for each hemisphere.

In [11]:
# Set up the Mars Hemispheres images to be scraped
url4 = 'https://marshemispheres.com/'
browser.visit(url4)

# Create BeautifulSoup object; parse with 'html.parser'
html4 = browser.html
soup4 = bs(html4, 'html.parser')

In [42]:
# Find the 'item' class for each hemisphere
classitems = soup4.find_all('div', class_='item')

# Lists for HTML scrape
hemisphere_url_list = []
title_list = []

# Loop through the 4 'item' classes
for hemisphere in classitems:

    # Scrape the link to each hemisphere's detail page (not yet the image itself)
    page_href = hemisphere.find('a')['href']
    hemisphere_url_list.append(url4 + page_href)
    
    # Scrape the title for each hemisphere
    title_list.append(hemisphere.find('h3').text)
    
title_list

['Cerberus Hemisphere Enhanced',
 'Schiaparelli Hemisphere Enhanced',
 'Syrtis Major Hemisphere Enhanced',
 'Valles Marineris Hemisphere Enhanced']

In [65]:
# List for fullsize image urls
fullsize_img_urls = []

# Loop through hemisphere urls list
for url in hemisphere_url_list:
    
    # Visit each website
    browser.visit(url)
    
    # Create BeautifulSoup object; parse with 'html.parser'
    html5 = browser.html
    soup5 = bs(html5, 'html.parser')
    
    # Find the 'wide-image' class for each hemisphere
    imagelinks = soup5.find('img', class_='wide-image')['src']
    fullsize_img_urls.append(url4 + imagelinks)
    
fullsize_img_urls

['https://marshemispheres.com/images/f5e372a36edfa389625da6d0cc25d905_cerberus_enhanced.tif_full.jpg',
 'https://marshemispheres.com/images/3778f7b43bbbc89d6e3cfabb3613ba93_schiaparelli_enhanced.tif_full.jpg',
 'https://marshemispheres.com/images/555e6403a6ddd7ba16ddb0e471cadcf7_syrtis_major_enhanced.tif_full.jpg',
 'https://marshemispheres.com/images/b3c7c6c9138f57b4756be9b9c43e3a48_valles_marineris_enhanced.tif_full.jpg']
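
The assignment asks for one dictionary per hemisphere with the keys `img_url` and `title`, collected in a list. A minimal sketch of that final step follows, assuming the titles and image URLs are in the same order (which holds when both loops follow the order of the links on the page). `build_hemisphere_dicts` is a hypothetical helper name, and the sample values are copied from the first two entries of the outputs above.

```python
from typing import Dict, List


def build_hemisphere_dicts(titles: List[str], img_urls: List[str]) -> List[Dict[str, str]]:
    """Pair each hemisphere title with its full-resolution image URL."""
    # Guard against a partial scrape, where zip() would silently drop items
    if len(titles) != len(img_urls):
        raise ValueError("titles and img_urls must have the same length")
    return [{"title": title, "img_url": img_url}
            for title, img_url in zip(titles, img_urls)]


# Sample values copied from the notebook outputs (first two hemispheres only)
sample_titles = [
    'Cerberus Hemisphere Enhanced',
    'Schiaparelli Hemisphere Enhanced',
]
sample_urls = [
    'https://marshemispheres.com/images/f5e372a36edfa389625da6d0cc25d905_cerberus_enhanced.tif_full.jpg',
    'https://marshemispheres.com/images/3778f7b43bbbc89d6e3cfabb3613ba93_schiaparelli_enhanced.tif_full.jpg',
]

hemisphere_image_urls = build_hemisphere_dicts(sample_titles, sample_urls)
print(hemisphere_image_urls)
```

In the notebook itself, the same call would take `title_list` and `fullsize_img_urls` as arguments.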

In [None]:
browser.quit()

## MongoDB and Flask Application
* Use MongoDB with Flask templating to create a new HTML page that displays all of the information that was scraped from the URLs above.

* Start by converting your Jupyter notebook into a Python script called `scrape_mars.py` with a function called `scrape` that will execute all of your scraping code from above and return one Python dictionary containing all of the scraped data.

* Next, create a route called `/scrape` that will import your `scrape_mars.py` script and call your `scrape` function.

* Store the return value in Mongo as a Python dictionary.

* Create a root route `/` that will query your Mongo database and pass the mars data into an HTML template to display the data.

* Create a template HTML file called `index.html` that will take the mars data dictionary and display all of the data in the appropriate HTML elements. Use the following as a guide for what the final product should look like, but feel free to create your own design.
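
The "store the return value in Mongo" step above can be sketched without committing to a particular app layout. The sketch assumes a collection holding a single document that is overwritten on every scrape. `build_mars_data` and `store_mars_data` are hypothetical helper names, the dictionary keys are placeholders chosen for illustration, and `collection` is assumed to be a PyMongo collection object. The only PyMongo call used is `Collection.update_one` with `upsert=True`.

```python
from typing import Any, Dict, List


def build_mars_data(news_title: str,
                    news_paragraph: str,
                    featured_image_url: str,
                    facts_html: str,
                    hemispheres: List[Dict[str, str]]) -> Dict[str, Any]:
    """Collect every scraped value into the one dictionary that scrape() returns."""
    # Key names are assumptions; the template only needs to use the same ones
    return {
        "news_title": news_title,
        "news_paragraph": news_paragraph,
        "featured_image_url": featured_image_url,
        "facts_html": facts_html,
        "hemispheres": hemispheres,
    }


def store_mars_data(collection: Any, mars_data: Dict[str, Any]) -> None:
    """Insert the scraped dictionary, or overwrite the one stored earlier."""
    # An empty filter matches any existing document; upsert=True creates
    # a document when the collection is still empty
    collection.update_one({}, {"$set": mars_data}, upsert=True)
```

A `/scrape` route would then call the scraping function, pass its result to `store_mars_data`, and redirect to `/`, where the stored document is read back and handed to the `index.html` template.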