# Mission to Mars

![mission_to_mars](Images/mission_to_mars.jpg)


In this assignment, you will build a web application that scrapes various websites for data related to the Mission to Mars and displays the information in a single HTML page. The following outlines what you need to do.

## Step 1 - Scraping

Complete your initial scraping using Jupyter Notebook, BeautifulSoup, Pandas, and Requests/Splinter.

* Create a Jupyter Notebook file called `mission_to_mars.ipynb` and use this to complete all of your scraping and analysis tasks. The following outlines what you need to scrape.

### NASA Mars News

Scrape the [NASA Mars News Site](https://mars.nasa.gov/news/) and collect the latest News Title and Paragraph Text. Assign the text to variables that you can reference later.
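
The notebook cells below request the news page twice, once for titles (`find_latest_news_title`) and once for teasers (`find_latest_news_description`). Both values can also come from a single parsed page. The following is a minimal sketch that assumes the `content_title` and `rollover_description_inner` class names the notebook uses. `parse_latest_news` is a hypothetical helper, and the HTML string is an offline stand-in for `response.text`.

```python
from bs4 import BeautifulSoup


def parse_latest_news(html):
    """Return (title, paragraph) for the first article on the page, or None.

    Hypothetical helper: assumes titles sit in div.content_title and teasers
    in div.rollover_description_inner, the class names used in the notebook.
    """
    soup = BeautifulSoup(html, 'html.parser')
    title = soup.find('div', class_='content_title')
    paragraph = soup.find('div', class_='rollover_description_inner')
    if title is None or paragraph is None:
        # No matching articles, e.g. the page layout changed
        return None
    return title.text.strip(), paragraph.text.strip()


# Offline stand-in for response.text (two articles, newest first)
sample_html = """
<ul>
  <li>
    <div class="content_title"> Opportunity Hunkers Down During Dust Storm </div>
    <div class="rollover_description_inner">NASA engineers attempted to contact the Opportunity rover today but did not hear back from the nearly 15-year old rover.</div>
  </li>
  <li>
    <div class="content_title">NASA is Ready to Study the Heart of Mars</div>
    <div class="rollover_description_inner">NASA is about to go on a journey to study the center of Mars.</div>
  </li>
</ul>
"""

news_title, news_p = parse_latest_news(sample_html)
```

Using `find` rather than `find_all(...)[0]` returns `None` for a page with no matching articles instead of raising `IndexError`.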

```python
# Example:
# news_title = "NASA's Next Mars Mission to Investigate Interior of Red Planet"

# news_p = "Preparation of NASA's next spacecraft to Mars, InSight, has ramped up this summer, on course for launch next May from Vandenberg Air Force Base in central California -- the first interplanetary launch in history from America's West Coast."

import requests
import pandas as pd
from splinter import Browser
from bs4 import BeautifulSoup
from difflib import SequenceMatcher

article_url = 'https://mars.nasa.gov/news/?page=0&per_page=40&order=publish_date+desc%2Ccreated_at+desc&search=&category=19%2C165%2C184%2C204&blank_scope=Latest'

response = requests.get(article_url)

soup = BeautifulSoup(response.text, 'html.parser')

title_results = soup.find_all('div', class_="content_title")

news_titles = []

for result in title_results:
    title_text = result.text.strip()
    news_titles.append(title_text)

news_titles
# Output:
# ['Opportunity Hunkers Down During Dust Storm',
#  'NASA Finds Ancient Organic Material, Mysterious Methane on Mars',
#  'NASA Invests in Visionary Technology',
#  'NASA is Ready to Study the Heart of Mars',
#  'NASA Briefing on First Mission to Study Mars Interior',
#  "New 'AR' Mobile App Features 3-D NASA Spacecraft"]

news_title = news_titles[0]
news_title
# Output: 'Opportunity Hunkers Down During Dust Storm'

def find_latest_news_title(article_url):
    """Returns the title of the latest news article at the given URL"""
    response = requests.get(article_url)
    soup = BeautifulSoup(response.text, 'html.parser')

    # The URL sorts by publish_date descending, so the first title is the latest
    title_results = soup.find_all('div', class_="content_title")
    news_titles = [result.text.strip() for result in title_results]

    return news_titles[0]

find_latest_news_title(article_url)
# Output: 'Opportunity Hunkers Down During Dust Storm'

description_results = soup.find_all('div', class_="rollover_description_inner")

news_descriptions = []

for result in description_results:
    description_text = result.text.strip()
    news_descriptions.append(description_text)

news_descriptions
# Output:
# ['NASA engineers attempted to contact the Opportunity rover today but did not hear back from the nearly 15-year old rover.',
#  'NASA’s Curiosity rover has found evidence on Mars with implications for NASA’s search for life.',
#  'NASA is investing in technology concepts, including several from JPL, that may one day be used for future space exploration missions.',
#  'NASA is about to go on a journey to study the center of Mars.',
#  'NASA’s next mission to Mars will be the topic of a media briefing Thursday, March 29, at JPL. The briefing will air live on NASA Television and the agency’s website.',
#  "NASA spacecraft travel to far-off destinations in space, but a new mobile app produced by NASA's Jet Propulsion Laboratory, Pasadena, California, brings spacecraft to users."]

news_description = news_descriptions[0]
news_description
# Output: 'NASA engineers attempted to contact the Opportunity rover today but did not hear back from the nearly 15-year old rover.'

def find_latest_news_description(article_url):
    """Returns the teaser paragraph of the latest news article at the given URL"""
    response = requests.get(article_url)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Teasers appear in the same newest-first order as the titles
    description_results = soup.find_all('div', class_="rollover_description_inner")
    news_descriptions = [result.text.strip() for result in description_results]

    return news_descriptions[0]

find_latest_news_description(article_url)
# Output: 'NASA engineers attempted to contact the Opportunity rover today but did not hear back from the nearly 15-year old rover.'

image_url = "https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars"

!which chromedriver
# Output: /usr/local/bin/chromedriver
executable_path = {'executable_path': '/usr/local/bin/chromedriver'}
browser = Browser('chrome', **executable_path, headless=False)
browser.visit(image_url)


def find_feature_image():
    """Returns the featured image URL from JPL's Mars space images page"""
    image_url = "https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars"

    executable_path = {'executable_path': '/usr/local/bin/chromedriver'}
    browser = Browser('chrome', **executable_path, headless=False)
    try:
        browser.visit(image_url)
        soup = BeautifulSoup(browser.html, 'html.parser')
    finally:
        # Close the browser even if the page fails to load
        browser.quit()

    # The first carousel item is the featured image; its <a> tag stores the
    # relative image path in the data-fancybox-href attribute
    feature_image = soup.find('article', class_='carousel_item')
    image_path = feature_image.a['data-fancybox-href']

    beg_url = 'https://www.jpl.nasa.gov'
    featured_image_url = beg_url + image_path

    return featured_image_url


find_feature_image()
# Output: 'https://www.jpl.nasa.gov/spaceimages/images/mediumsize/PIA19046_ip.jpg'
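
# Offline check of the featured-image parsing (a sketch, not the notebook's
# own code): the image path is read from the data-fancybox-href attribute of
# the first carousel_item's <a>, and urljoin resolves it against the page
# URL. feature_image_from_html is a hypothetical helper, and the one-line
# HTML string is a minimal stand-in for browser.html, not JPL's full markup.
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def feature_image_from_html(html, page_url):
    """Return the absolute URL of the first carousel image, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    item = soup.find('article', class_='carousel_item')
    if item is None or item.a is None:
        return None
    href = item.a.get('data-fancybox-href')
    # A path starting with "/" replaces the page URL's path and drops its query
    return urljoin(page_url, href) if href else None


sample_carousel = ('<article class="carousel_item">'
                   '<a data-fancybox-href="/spaceimages/images/mediumsize/PIA19046_ip.jpg"></a>'
                   '</article>')
featured_image_url = feature_image_from_html(
    sample_carousel, "https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars")
# featured_image_url -> 'https://www.jpl.nasa.gov/spaceimages/images/mediumsize/PIA19046_ip.jpg'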

twitter_url = "https://twitter.com/marswxreport?lang=en"
response = requests.get(twitter_url)
soup = BeautifulSoup(response.text, 'html.parser')
twitter_results = soup.body.find_all('div', class_="js-tweet-text-container")

recent_tweets = []

for tweet in twitter_results:
    recent = tweet.find_all('p', class_="TweetTextSize TweetTextSize--normal js-tweet-text tweet-text")
    
    for tweet_text in recent:
        recent_tweets.append(tweet_text.text.strip())
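
# The class_ filter above passes a string with spaces, so BeautifulSoup
# matches the tag's class attribute as that exact string, in that order.
# A CSS selector matches any <p> carrying the tweet-text class, whatever
# other classes it has. The HTML below is a hypothetical stand-in with the
# same four classes in a different order, not Twitter's actual markup.
from bs4 import BeautifulSoup

sample_tweet_html = ('<div class="js-tweet-text-container">'
                     '<p class="tweet-text js-tweet-text TweetTextSize TweetTextSize--normal">'
                     'Example tweet text</p></div>')
sample_soup = BeautifulSoup(sample_tweet_html, 'html.parser')

# Exact-string class match: finds nothing because the class order differs
exact_matches = sample_soup.find_all(
    'p', class_="TweetTextSize TweetTextSize--normal js-tweet-text tweet-text")

# CSS selector: matches on the single class name
selector_matches = [p.text.strip() for p in
                    sample_soup.select('div.js-tweet-text-container p.tweet-text')]
# exact_matches -> [], selector_matches -> ['Example tweet text']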

most_recent_weather_tweet = recent_tweets[0]
most_recent_weather_tweet
# Output: 'Puerto Rico has NEXRAD Doppler radar again, just in time for hurricane season. Well done @NWSSanJuanhttps://twitter.com/adamonzon/status/1008092827815903232\xa0…'

def find_most_recent_weather_tweet():
    """Returns the most recent tweet on the marswxreport Twitter timeline"""
    twitter_url = "https://twitter.com/marswxreport?lang=en"

    response = requests.get(twitter_url)
    soup = BeautifulSoup(response.text, 'html.parser')
    twitter_results = soup.body.find_all('div', class_="js-tweet-text-container")

    recent_tweets = []
    for tweet in twitter_results:
        recent = tweet.find_all('p', class_="TweetTextSize TweetTextSize--normal js-tweet-text tweet-text")
        for tweet_text in recent:
            recent_tweets.append(tweet_text.text.strip())

    # The first tweet on the page; it may be a retweet rather than a weather report
    return recent_tweets[0]

find_most_recent_weather_tweet()
# Output: 'Puerto Rico has NEXRAD Doppler radar again, just in time for hurricane season. Well done @NWSSanJuanhttps://twitter.com/adamonzon/status/1008092827815903232\xa0…'
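
# The output above is a retweet about radar on Earth rather than a Mars
# weather report: the function returns whichever tweet comes first. Sketch
# of a filter, ASSUMING (not confirmed by this notebook) that the account's
# weather reports begin with the word "Sol". first_weather_report is a
# hypothetical helper (e.g. first_weather_report(recent_tweets) on the list
# built above), and the example tweets are placeholders.
def first_weather_report(tweets, prefix='Sol'):
    """Return the first tweet that starts with prefix, or None."""
    return next((text for text in tweets if text.startswith(prefix)), None)


example_tweets = [
    'Puerto Rico has NEXRAD Doppler radar again, just in time for hurricane season.',
    'Sol 1234 placeholder weather report',
]
weather_tweet = first_weather_report(example_tweets)
# weather_tweet -> 'Sol 1234 placeholder weather report'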

mars_url = "https://space-facts.com/mars/"

# read_html returns a list of DataFrames, one per <table> on the page
mars_tables = pd.read_html(mars_url)
initial_mars_df = mars_tables[0]
renamed_mars_df = initial_mars_df.rename(columns={0: 'Scientific Measures', 1: 'Values'})
mars_df = renamed_mars_df.set_index('Scientific Measures')
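
# Offline sketch of the rename/set_index steps above. The rename assumes the
# facts table comes back from read_html with integer columns 0 and 1; the
# hand-built frame stands in for the first table read_html returns, using
# three rows from the table output below. classes= adds extra CSS classes to
# the <table> tag (the Bootstrap class names are an assumption about the web
# page's styling).
import pandas as pd

facts_raw = pd.DataFrame([
    ['Equatorial Diameter:', '6,792 km'],
    ['Polar Diameter:', '6,752 km'],
    ['Moons:', '2 (Phobos & Deimos)'],
])
facts_df = (facts_raw
            .rename(columns={0: 'Scientific Measures', 1: 'Values'})
            .set_index('Scientific Measures'))

# Single-line HTML, ready to drop into a page template
facts_html = facts_df.to_html(classes='table table-striped').replace('\n', '')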

mars_html_table = mars_df.to_html()
mars_html_table
# Output: '<table border="1" class="dataframe">\n  <thead>\n    <tr style="text-align: right;">\n      <th></th>\n      <th>Values</th>\n    </tr>\n    <tr>\n      <th>Scientific Measures</th>\n      <th></th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>Equatorial Diameter:</th>\n      <td>6,792 km</td>\n    </tr>\n    <tr>\n      <th>Polar Diameter:</th>\n      <td>6,752 km</td>\n    </tr>\n    <tr>\n      <th>Mass:</th>\n      <td>6.42 x 10^23 kg (10.7% Earth)</td>\n    </tr>\n    <tr>\n      <th>Moons:</th>\n      <td>2 (Phobos &amp; Deimos)</td>\n    </tr>\n    <tr>\n      <th>Orbit Distance:</th>\n      <td>227,943,824 km (1.52 AU)</td>\n    </tr>\n    <tr>\n      <th>Orbit Period:</th>\n      <td>687 days (1.9 years)</td>\n    </tr>\n    <tr>\n      <th>Surface Temperature:</th>\n      <td>-153 to 20 °C</td>\n    </tr>\n    <tr>\n      <th>First Record:</th>\n      <td>2nd millennium BC</td>\n    </tr>\n    <tr>\n      <th>Recorded By:</th>\n      <td>Egyptian astronomers</td>\n

# str.replace returns a new string, so assign it to keep the newline-free table
mars_html_table = mars_html_table.replace('\n', '')
mars_html_table
# Output: '<table border="1" class="dataframe">  <thead>    <tr style="text-align: right;">      <th></th>      <th>Values</th>    </tr>    <tr>      <th>Scientific Measures</th>      <th></th>    </tr>  </thead>  <tbody>    <tr>      <th>Equatorial Diameter:</th>      <td>6,792 km</td>    </tr>    <tr>      <th>Polar Diameter:</th>      <td>6,752 km</td>    </tr>    <tr>      <th>Mass:</th>      <td>6.42 x 10^23 kg (10.7% Earth)</td>    </tr>    <tr>      <th>Moons:</th>      <td>2 (Phobos &amp; Deimos)</td>    </tr>    <tr>      <th>Orbit Distance:</th>      <td>227,943,824 km (1.52 AU)</td>    </tr>    <tr>      <th>Orbit Period:</th>      <td>687 days (1.9 years)</td>    </tr>    <tr>      <th>Surface Temperature:</th>      <td>-153 to 20 °C</td>    </tr>    <tr>      <th>First Record:</th>      <td>2nd millennium BC</td>    </tr>    <tr>      <th>Recorded By:</th>      <td>Egyptian astronomers</td>    </tr>  </tbody></table>'

mars_df.to_html('mars_table.html')
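
# Sketch of the pairing step the hemisphere scraper below performs, run on
# offline stand-ins for the search-results and detail pages. It assumes
# titles sit in <div class="description"><h3> and that each detail page
# links the full-size image through an <a> whose text is "Sample" (the link
# the scraper looks for). build_hemisphere_list, the placeholder titles and
# the example.org URLs are hypothetical. Each title is kept next to its own
# URL rather than matching two lists by position.
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def build_hemisphere_list(results_html, detail_pages, base_url):
    """Return [{"title", "img_url"}] in results-page order.

    detail_pages maps each title to the HTML of its detail page.
    """
    results = BeautifulSoup(results_html, 'html.parser')
    titles = [div.find('h3').text
              for div in results.find_all('div', class_='description')]
    hemispheres = []
    for title in titles:
        detail = BeautifulSoup(detail_pages[title], 'html.parser')
        sample = detail.find('a', string='Sample')
        # Resolve relative hrefs against the page they came from
        hemispheres.append({"title": title,
                            "img_url": urljoin(base_url, sample['href'])})
    return hemispheres


results_html = ('<div class="description"><h3>Alpha Hemisphere</h3></div>'
                '<div class="description"><h3>Beta Hemisphere</h3></div>')
detail_pages = {
    'Alpha Hemisphere': '<a href="/images/alpha_full.jpg">Sample</a>',
    'Beta Hemisphere': '<a href="/images/beta_full.jpg">Sample</a>',
}
hemisphere_image_urls = build_hemisphere_list(
    results_html, detail_pages, 'https://example.org/search/results')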

from urllib.parse import urljoin


def find_mars_hemisphere_images():
    """Returns a list of {"title", "img_url"} dicts for the Mars hemispheres"""
    executable_path = {'executable_path': '/usr/local/bin/chromedriver'}
    browser = Browser('chrome', **executable_path, headless=False)
    hemispheres_url = 'https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars'

    try:
        browser.visit(hemispheres_url)
        soup = BeautifulSoup(browser.html, 'html.parser')

        # Each search result has its title in <div class="description"><h3>
        hemisphere_names = [div.find('h3').text for div in soup.find_all('div', class_='description')]

        full_hemisphere_image_urls = []
        for name in hemisphere_names:
            # Open the detail page for this hemisphere
            browser.click_link_by_partial_text(name)
            # Parse the detail page; the results-page soup is stale after navigating
            detail_soup = BeautifulSoup(browser.html, 'html.parser')
            # Read the href of the "Sample" link instead of clicking it, so no
            # extra browser windows open and each title stays paired with its URL
            sample_link = detail_soup.find('a', string='Sample')
            full_hemisphere_image_urls.append(
                {"title": name, "img_url": urljoin(browser.url, sample_link['href'])})
            browser.visit(hemispheres_url)
    finally:
        browser.quit()

    return full_hemisphere_image_urls
```