# Web Scraping: Mission to Mars 

Objective: Build a web application that scrapes various websites for data related to the Mission to Mars and displays the information in a single HTML page.

### Dependencies used in this Jupyter Notebook

In [1]:
# Dependencies
from bs4 import BeautifulSoup as bs
import requests
import pymongo
from splinter import Browser
import time

In [2]:
import pandas as pd

# Step 1 - Scraping

Created a Jupyter Notebook file called mission_to_mars.ipynb and used it to complete all of the scraping and analysis tasks:
- NASA Mars News
- JPL Mars Space Images - Featured Image
- Mars Weather
- Mars Facts
- Mars Hemispheres

## 1. NASA Mars News
* Scrape the NASA Mars News Site and collect the latest News Title and Paragraph Text. Assign the text to variables that you can reference later.

In [3]:
# https://splinter.readthedocs.io/en/latest/drivers/chrome.html
chromedriver_full_path = !which chromedriver

In [4]:
chromedriver_path = chromedriver_full_path.nlstr
chromedriver_path

'/usr/local/bin/chromedriver'
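
The `!which chromedriver` line relies on IPython shell syntax, so it only works inside a notebook. A minimal sketch of a plain-Python alternative, assuming chromedriver is installed somewhere on the `PATH`, uses the standard library's `shutil.which`. The helper name `find_executable` is hypothetical and not part of the notebook:

```python
import shutil


def find_executable(name="chromedriver"):
    """Return the absolute path of `name` found on PATH, or raise if it is missing."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(f"{name} was not found on PATH")
    return path
```

Calling `find_executable()` would then yield the same kind of string as `chromedriver_path`, and it fails loudly when the driver is not installed.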

In [5]:
# Links
nasa_mars = 'https://mars.nasa.gov/news'

In [6]:
executable_path = {'executable_path': chromedriver_path}
browser = Browser('chrome', **executable_path, headless=False)

In [7]:
browser.visit(nasa_mars)

In [8]:
# FIND LATEST NEWS TITLE AND PARAGRAPH
html = browser.html
soup = bs(html, 'html.parser')

slides = soup.find('li', class_='slide')
# GET latest news title.
nasa_title = slides.find('div', class_='content_title')
latest_title = nasa_title.text
# GET latest news paragraph
description = slides.find('div', class_='rollover_description_inner')
latest_description = description.text
print(latest_title)
print(latest_description)

NASA's Mars 2020 Gets a Dose of Space Here on Earth
NASA's Mars 2020 spacecraft has completed tests that are the best Earthly approximations of what the spacecraft will endure during launch and interplanetary cruise.


Scraped data variables to pass:
* latest_title
* latest_description
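
The cell above raises an `AttributeError` if the page has not finished rendering, because `find` returns `None` when no element matches. A hedged sketch of the same lookup with guards is shown below; the function name `latest_news` and the sample markup are hypothetical and only mimic the class names used in the notebook:

```python
from bs4 import BeautifulSoup


def latest_news(html):
    """Return (title, description) of the first news slide, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    slide = soup.find("li", class_="slide")
    if slide is None:
        return None
    title = slide.find("div", class_="content_title")
    body = slide.find("div", class_="rollover_description_inner")
    if title is None or body is None:
        return None
    # get_text(strip=True) trims the whitespace around each text fragment
    return title.get_text(strip=True), body.get_text(strip=True)


# Hypothetical markup that mimics the structure the selectors above expect
sample = """
<ul>
  <li class="slide">
    <div class="rollover_description_inner"> First summary. </div>
    <div class="content_title"><a href="#">First headline</a></div>
  </li>
</ul>
"""
print(latest_news(sample))
```

In the notebook, `latest_news(browser.html)` could stand in for the manual lookups, with a `None` result signalling that the page should be reloaded or given more time.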

In [9]:
# EXTRA: FIND ALL LATEST TITLES AND PARAGRAPHS
html = browser.html
soup = bs(html, 'html.parser')

slides = soup.find_all('li', class_='slide')
#nasa_titles = soup.find_all('div', class_='content_title')

#for title in nasa_titles:  
for title in slides:
    print("----------")
    nasa_title = title.find('div', class_='content_title').text
    nasa_title = nasa_title.strip()        
    print(nasa_title)
    description = title.find('div', class_='rollover_description_inner').text
    description = description.strip()
    print(description)    

----------
NASA's Mars 2020 Gets a Dose of Space Here on Earth
NASA's Mars 2020 spacecraft has completed tests that are the best Earthly approximations of what the spacecraft will endure during launch and interplanetary cruise.
----------
NASA Invites Public to Submit Names to Fly Aboard Next Mars Rover
From now till Sept. 30, the public can submit names to be stenciled on chips that will fly on the Mars 2020 rover and receive a souvenir boarding pass.
----------
NASA's MRO Completes 60,000 Trips Around Mars
The orbiting spacecraft is also about to set a record for data relayed from the Martian surface.
----------
Video: Fly over Mount Sharp on Mars
A new animation shows the region NASA's Curiosity rover is currently exploring and where its team wants to go in the future.
----------
Why This Martian Full Moon Looks Like Candy
For the first time, NASA's Mars Odyssey orbiter has caught the Martian moon Phobos during a full moon phase. Each color in this new image represents a temperature
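
Printing the slides is useful for inspection, but later steps need the values themselves. As a sketch under the same assumptions about the page's class names, the loop can collect each slide into a dictionary instead. The function name `all_news` and the dictionary keys are hypothetical:

```python
from bs4 import BeautifulSoup


def all_news(html):
    """Return one {'title', 'description'} dict per complete news slide."""
    soup = BeautifulSoup(html, "html.parser")
    articles = []
    for slide in soup.select("li.slide"):
        title = slide.select_one("div.content_title")
        body = slide.select_one("div.rollover_description_inner")
        if title is None or body is None:
            continue  # skip slides that lack either field
        articles.append({
            "title": title.get_text(strip=True),
            "description": body.get_text(strip=True),
        })
    return articles
```

The first element of the returned list corresponds to the latest article, so `all_news(browser.html)[0]` would carry the same text as `latest_title` and `latest_description`.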

---

## 2. JPL Mars Space Images - Featured Image
* Visit the URL for the JPL Featured Space Image page (https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars).
* Use splinter to navigate the site and find the image url for the current Featured Mars Image and assign the url string to a variable called featured_image_url.
* Make sure to find the image url to the full size .jpg image.
* Make sure to save a complete url string for this image.

In [10]:
# Open Browser with splinter
executable_path = {'executable_path': chromedriver_path}
browser = Browser('chrome', **executable_path, headless=False)

In [11]:
# Define and visit URL
jpl_mars_image = 'https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars'
base_jpl_url = 'https://www.jpl.nasa.gov'
browser.visit(jpl_mars_image)

In [12]:
# Create BS4 Object
html = browser.html
soup = bs(html, 'html.parser')

url = soup.find('div', class_='carousel_container')
featured_image_url = url.find('a')['data-fancybox-href']
featured_image_url

'/spaceimages/images/mediumsize/PIA17254_ip.jpg'

In [13]:
featured_image_url = base_jpl_url + featured_image_url
featured_image_url

'https://www.jpl.nasa.gov/spaceimages/images/mediumsize/PIA17254_ip.jpg'

Scraped data variable to pass:
* featured_image_url
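
String concatenation works here because the scraped path starts with a slash and the base URL has none at the end. A slightly more defensive sketch uses `urllib.parse.urljoin` from the standard library, which also copes with an `href` that is already absolute. The values below are the ones shown in the cells above:

```python
from urllib.parse import urljoin

base_jpl_url = "https://www.jpl.nasa.gov"
relative_url = "/spaceimages/images/mediumsize/PIA17254_ip.jpg"

# urljoin resolves the relative path against the base and leaves
# already-absolute URLs unchanged
featured_image_url = urljoin(base_jpl_url, relative_url)
print(featured_image_url)
```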

---

## 3. Mars Weather
* Visit the Mars Weather Twitter account (https://twitter.com/marswxreport?lang=en) and scrape the latest Mars weather tweet from the page. Save the tweet text for the weather report as a variable called mars_weather.

In [14]:
# Open Browser with splinter
executable_path = {'executable_path': chromedriver_path}
browser = Browser('chrome', **executable_path, headless=False)

In [15]:
# Define and visit URL
mars_weather_url = 'https://twitter.com/marswxreport?lang=en'
browser.visit(mars_weather_url)

In [16]:
# Create BS4 Object
html = browser.html
soup = bs(html, 'html.parser')

# Print html section for the latest tweet
tweet_container = soup.find('div', class_='js-tweet-text-container')
tweet_container

<div class="js-tweet-text-container">
<p class="TweetTextSize TweetTextSize--normal js-tweet-text tweet-text" data-aria-label-part="0" lang="en">InSight sol 176 (2019-05-26) low -100.3ºC (-148.6ºF) high -19.9ºC (-3.9ºF)
winds from the W at 4.2 m/s (9.5 mph) gusting to 15.9 m/s (35.6 mph)
pressure at 7.50 hPa<a class="twitter-timeline-link u-hidden" data-pre-embedded="true" dir="ltr" href="https://t.co/rI1XSUC5yf">pic.twitter.com/rI1XSUC5yf</a></p>
</div>

In [17]:
# Get the text from the latest tweet and strip leading/trailing whitespace
tweet_container_text = tweet_container.p.text.strip()
tweet_container_text

'InSight sol 176 (2019-05-26) low -100.3ºC (-148.6ºF) high -19.9ºC (-3.9ºF)\nwinds from the W at 4.2 m/s (9.5 mph) gusting to 15.9 m/s (35.6 mph)\npressure at 7.50 hPapic.twitter.com/rI1XSUC5yf'

In [18]:
# Remove the link text (pic.twitter.com/...) from the latest tweet to keep only the weather report
latest_tweet = tweet_container_text.replace(tweet_container.a.text,'')
latest_tweet

'InSight sol 176 (2019-05-26) low -100.3ºC (-148.6ºF) high -19.9ºC (-3.9ºF)\nwinds from the W at 4.2 m/s (9.5 mph) gusting to 15.9 m/s (35.6 mph)\npressure at 7.50 hPa'

In [19]:
# Replace the newline characters in the middle of the tweet with spaces
latest_tweet = latest_tweet.replace('\n',' ')
latest_tweet

'InSight sol 176 (2019-05-26) low -100.3ºC (-148.6ºF) high -19.9ºC (-3.9ºF) winds from the W at 4.2 m/s (9.5 mph) gusting to 15.9 m/s (35.6 mph) pressure at 7.50 hPa'

Scraped data variable to pass:
* latest_tweet
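
The three cleanup cells above can be folded into one step. The sketch below removes the embedded link element before reading the text and then collapses all whitespace, so it does not depend on the link text appearing exactly once in the tweet. The function name `tweet_text` is hypothetical, and the selector assumes the markup shown in the output above:

```python
from bs4 import BeautifulSoup


def tweet_text(html):
    """Return the tweet text without embedded links, on a single line."""
    soup = BeautifulSoup(html, "html.parser")
    paragraph = soup.select_one("div.js-tweet-text-container p")
    if paragraph is None:
        return None
    for link in paragraph.find_all("a"):
        link.decompose()  # drop the pic.twitter.com anchor together with its text
    # split() with no arguments breaks on any whitespace, including newlines
    return " ".join(paragraph.get_text().split())
```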

---

## 4. Mars Facts
* Visit the Mars Facts webpage (https://space-facts.com/mars/) and use Pandas to scrape the table containing facts about the planet, including Diameter, Mass, etc.
* Use Pandas to convert the data to an HTML table string.

In [20]:
# Open Browser with splinter
executable_path = {'executable_path': chromedriver_path}
browser = Browser('chrome', **executable_path, headless=False)

In [21]:
# Define and visit URL
mars_fact_url = 'https://space-facts.com/mars/'
browser.visit(mars_fact_url)

In [22]:
# Extract the rows of the Mars Facts table
html = browser.html
soup = bs(html, 'html.parser')

table = soup.find_all('tr')
table

[<tr class="row-1 odd">
 <td class="column-1"><strong>Equatorial Diameter:</strong></td><td class="column-2">6,792 km<br/>
 </td>
 </tr>, <tr class="row-2 even">
 <td class="column-1"><strong>Polar Diameter:</strong></td><td class="column-2">6,752 km<br/>
 </td>
 </tr>, <tr class="row-3 odd">
 <td class="column-1"><strong>Mass:</strong></td><td class="column-2">6.42 x 10^23 kg (10.7% Earth)</td>
 </tr>, <tr class="row-4 even">
 <td class="column-1"><strong>Moons:</strong></td><td class="column-2">2 (<a href="https://space-facts.com/phobos/">Phobos</a> &amp; <a href="https://space-facts.com/deimos/">Deimos</a>)</td>
 </tr>, <tr class="row-5 odd">
 <td class="column-1"><strong>Orbit Distance:</strong></td><td class="column-2">227,943,824 km (1.52 AU)</td>
 </tr>, <tr class="row-6 even">
 <td class="column-1"><strong>Orbit Period:</strong></td><td class="column-2">687 days (1.9 years)<br/>
 </td>
 </tr>, <tr class="row-7 odd">
 <td class="column-1"><strong>Surface Temperature: </strong></

In [23]:
# Get the text from the Mars Facts Table
for row in table:
    fact_name = row.find('td', class_='column-1')
    fact_value = row.find('td', class_='column-2')
    print(fact_name.text.strip())
    print(fact_value.text.strip())

Equatorial Diameter:
6,792 km
Polar Diameter:
6,752 km
Mass:
6.42 x 10^23 kg (10.7% Earth)
Moons:
2 (Phobos & Deimos)
Orbit Distance:
227,943,824 km (1.52 AU)
Orbit Period:
687 days (1.9 years)
Surface Temperature:
-153 to 20 °C
First Record:
2nd millennium BC
Recorded By:
Egyptian astronomers


In [24]:
# Get the elements of the Website Table in 2 lists
mars_facts = []
mars_facts_values = []
for row in table:
    fact_name = row.find('td', class_='column-1').text.strip()
    fact_value = row.find('td', class_='column-2').text.strip()

    mars_facts.append(fact_name)
    mars_facts_values.append(fact_value) 

print(mars_facts)
print(mars_facts_values)

['Equatorial Diameter:', 'Polar Diameter:', 'Mass:', 'Moons:', 'Orbit Distance:', 'Orbit Period:', 'Surface Temperature:', 'First Record:', 'Recorded By:']
['6,792 km', '6,752 km', '6.42 x 10^23 kg (10.7% Earth)', '2 (Phobos & Deimos)', '227,943,824 km (1.52 AU)', '687 days (1.9 years)', '-153 to 20 °C', '2nd millennium BC', 'Egyptian astronomers']


In [25]:
# Create Mars Facts DataFrame
data = {'Description':mars_facts, 'Value':mars_facts_values}
mars_facts_df = pd.DataFrame(data)
mars_facts_df

| | Description | Value |
|---|---|---|
| 0 | Equatorial Diameter: | 6,792 km |
| 1 | Polar Diameter: | 6,752 km |
| 2 | Mass: | 6.42 x 10^23 kg (10.7% Earth) |
| 3 | Moons: | 2 (Phobos & Deimos) |
| 4 | Orbit Distance: | 227,943,824 km (1.52 AU) |
| 5 | Orbit Period: | 687 days (1.9 years) |
| 6 | Surface Temperature: | -153 to 20 °C |
| 7 | First Record: | 2nd millennium BC |
| 8 | Recorded By: | Egyptian astronomers |


In [26]:
# Set index to Description
mars_facts_df = mars_facts_df.set_index('Description')

In [27]:
mars_facts_df

| Description | Value |
|---|---|
| Equatorial Diameter: | 6,792 km |
| Polar Diameter: | 6,752 km |
| Mass: | 6.42 x 10^23 kg (10.7% Earth) |
| Moons: | 2 (Phobos & Deimos) |
| Orbit Distance: | 227,943,824 km (1.52 AU) |
| Orbit Period: | 687 days (1.9 years) |
| Surface Temperature: | -153 to 20 °C |
| First Record: | 2nd millennium BC |
| Recorded By: | Egyptian astronomers |


In [28]:
# DataFrame.to_html(https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.DataFrame.to_html.html)
mars_facts_df.to_html()

'<table border="1" class="dataframe">\n  <thead>\n    <tr style="text-align: right;">\n      <th></th>\n      <th>Value</th>\n    </tr>\n    <tr>\n      <th>Description</th>\n      <th></th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>Equatorial Diameter:</th>\n      <td>6,792 km</td>\n    </tr>\n    <tr>\n      <th>Polar Diameter:</th>\n      <td>6,752 km</td>\n    </tr>\n    <tr>\n      <th>Mass:</th>\n      <td>6.42 x 10^23 kg (10.7% Earth)</td>\n    </tr>\n    <tr>\n      <th>Moons:</th>\n      <td>2 (Phobos &amp; Deimos)</td>\n    </tr>\n    <tr>\n      <th>Orbit Distance:</th>\n      <td>227,943,824 km (1.52 AU)</td>\n    </tr>\n    <tr>\n      <th>Orbit Period:</th>\n      <td>687 days (1.9 years)</td>\n    </tr>\n    <tr>\n      <th>Surface Temperature:</th>\n      <td>-153 to 20 °C</td>\n    </tr>\n    <tr>\n      <th>First Record:</th>\n      <td>2nd millennium BC</td>\n    </tr>\n    <tr>\n      <th>Recorded By:</th>\n      <td>Egyptian astronomers</td>\n    </tr>

Scraped data to pass:
* mars_facts_df.to_html()
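
Since the HTML string is destined for a web page, it may help to tidy the labels and attach CSS classes when exporting. The sketch below rebuilds a two-row version of the DataFrame from values in the output above. The Bootstrap class names passed to `classes` are an assumption about how the page will be styled:

```python
import pandas as pd

# Two rows copied from the scraped output above; the rest are omitted for brevity
facts = [
    ("Equatorial Diameter:", "6,792 km"),
    ("Polar Diameter:", "6,752 km"),
]

facts_df = pd.DataFrame(facts, columns=["Description", "Value"])
# Drop the trailing colon so the labels read cleanly in the rendered table
facts_df["Description"] = facts_df["Description"].str.rstrip(":")
facts_df = facts_df.set_index("Description")

# `classes` appends CSS classes to the <table> tag (assumed Bootstrap styling)
facts_html = facts_df.to_html(classes="table table-striped")
```

Pandas can also parse tables straight from a page with `pd.read_html`, provided an HTML parser such as lxml is installed, which would avoid the manual row loop.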

---

## 5. Mars Hemispheres
* Visit the USGS Astrogeology site (https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars) to obtain high-resolution images for each of Mars's hemispheres.
* You will need to click each of the links to the hemispheres in order to find the image url to the full resolution image.
* Save both the image url string for the full resolution hemisphere image, and the Hemisphere title containing the hemisphere name. Use a Python dictionary to store the data using the keys img_url and title.
* Append the dictionary with the image url string and the hemisphere title to a list. This list will contain one dictionary for each hemisphere.
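
The steps above can be sketched without clicking links by text: collect every hemisphere page URL from the results page first, then visit each one directly. This is only one possible approach, not necessarily the one used in the cells that follow. The function name `collect_hemispheres` is hypothetical, `browser` is assumed to be any object exposing `visit()` and `html` (as a splinter `Browser` does), and the `item` and `downloads` class names are taken from this notebook:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def collect_hemispheres(browser, results_url):
    """Visit each hemisphere page linked from results_url and return a list
    of {'title', 'img_url'} dictionaries."""
    browser.visit(results_url)
    soup = BeautifulSoup(browser.html, "html.parser")

    # Gather titles and absolute page URLs first, so that navigating away
    # from the results page cannot invalidate anything still needed
    pages = []
    for item in soup.find_all("div", class_="item"):
        link = item.find("a", href=True)
        heading = item.find("h3")
        if link is not None and heading is not None:
            page_url = urljoin(results_url, link["href"])
            pages.append((heading.get_text(strip=True), page_url))

    hemispheres = []
    for title, page_url in pages:
        browser.visit(page_url)
        detail = BeautifulSoup(browser.html, "html.parser")
        downloads = detail.find("div", class_="downloads")
        anchor = downloads.find("a", href=True) if downloads is not None else None
        if anchor is None:
            continue  # no download link found on this page
        hemispheres.append({
            "title": title,
            "img_url": urljoin(page_url, anchor["href"]),
        })
    return hemispheres
```

Because every page is opened by URL, the loop never depends on the results page still being displayed after the first hemisphere has been visited.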

In [29]:
# Open Browser with splinter
executable_path = {'executable_path': chromedriver_path}
browser = Browser('chrome', **executable_path, headless=False)

In [30]:
chromedriver_path

'/usr/local/bin/chromedriver'

In [31]:
# Base USGS Astrogeology site Link
base_usgs_link='https://astrogeology.usgs.gov'

In [32]:
# Define and visit URL
mars_hemispheres_url = 'https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars'
browser.visit(mars_hemispheres_url)

In [33]:
# Extract the Mars Hemispheres links
html = browser.html
soup = bs(html, 'html.parser')

hem_links = soup.find_all('div', class_='item')

hemispheres_dict = {}
for link in hem_links:
    # Get the link to the hemisphere
    hemisphere_url = link.find('a')['href']
    title = link.find('h3').text
    
    print(hemisphere_url)
    

/search/map/Mars/Viking/cerberus_enhanced
/search/map/Mars/Viking/schiaparelli_enhanced
/search/map/Mars/Viking/syrtis_major_enhanced
/search/map/Mars/Viking/valles_marineris_enhanced


In [34]:
# OK
# Parse HTML
html = browser.html
soup = bs(html, 'html.parser')

hem_links = soup.find_all('div', class_='item')

# List to save the image data.
hemisphere_image_urls = []

for link in hem_links:
    title = link.find('h3').text
    # Get the link to the hemisphere
    hemisphere_url = link.find('a')['href']
    #print(title)
    key_word = title.split(' ')
    
    try:
        
        browser.click_link_by_partial_text(key_word[0])
        
        # Reload html and soup for the new window
        html = browser.html
        soup = bs(html, 'html.parser')
        
        # Extract the image URL
        image = soup.find('div', class_='downloads')        
        
        image_url = image.find('a')['href']
        
        # Save data to a Dictionary
        hemispheres_dict = {
            'Title':title,
            'Img_URL':image_url
        } 
        # Append new dictionary to list
        hemisphere_image_urls.append(hemispheres_dict)
        
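    except Exception as exc:
        # Report which hemisphere failed instead of hiding the cause of the error
        print(f"Can't click {title}: {exc}")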


In [35]:
# OK - One of the links is extracted twice.
hemisphere_image_urls

[{'Title': 'Cerberus Hemisphere Enhanced',
  'Img_URL': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif/full.jpg'},
 {'Title': 'Schiaparelli Hemisphere Enhanced',
  'Img_URL': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/schiaparelli_unenhanced.tif/full.jpg'},
 {'Title': 'Syrtis Major Hemisphere Enhanced',
  'Img_URL': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/syrtis_major_unenhanced.tif/full.jpg'},
 {'Title': 'Valles Marineris Hemisphere Enhanced',
  'Img_URL': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/valles_marineris_unenhanced.tif/full.jpg'}]

Scraped data to pass (Dictionary):
* hemisphere_image_urls
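
Each section ends with values marked as data to pass. One way to hand them to the web application is to bundle them into a single dictionary, which is also a shape a document store such as MongoDB accepts. This is a sketch only: the function name `build_mars_data` and every key name are assumptions, not part of the notebook.

```python
def build_mars_data(latest_title, latest_description, featured_image_url,
                    latest_tweet, facts_html, hemisphere_image_urls):
    """Bundle the scraped values into one dictionary (key names are illustrative)."""
    return {
        "news_title": latest_title,
        "news_paragraph": latest_description,
        "featured_image_url": featured_image_url,
        "weather": latest_tweet,
        "facts_html": facts_html,
        # Copy the list so later changes to the original do not leak in
        "hemispheres": list(hemisphere_image_urls),
    }
```

In the notebook this could be called as `build_mars_data(latest_title, latest_description, featured_image_url, latest_tweet, mars_facts_df.to_html(), hemisphere_image_urls)`.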