# Project Goal
Build a web application that scrapes several websites for data about the Mission to Mars and displays the results on a single HTML page.
***
## Step 1 - Scraping

Initial scraping and analysis will be completed using Jupyter Notebook, BeautifulSoup, Pandas, and Requests/Splinter.
***

In [1]:
# Dependencies
import requests
import pandas as pd
import time
from splinter import Browser
from bs4 import BeautifulSoup as bs

executable_path = {'executable_path': '/usr/local/bin/chromedriver'}
browser = Browser('chrome', **executable_path, headless=False)

### NASA Mars News
Scrape the [NASA Mars News Site](https://mars.nasa.gov/news/?page=0&per_page=40&order=publish_date+desc%2Ccreated_at+desc&search=&category=19%2C165%2C184%2C204&blank_scope=Latest) and collect the latest News Title and Paragraph Text. Assign the text to variables for reference later.

###### Example:
`news_title = "NASA's Next Mars Mission to Investigate Interior of Red Planet"`

`news_p = "Preparation of NASA's next spacecraft to Mars, InSight, has ramped up this summer, on course for launch next May from Vandenberg Air Force Base in central California -- the first interplanetary launch in history from America's West Coast."`
***

In [2]:
# URL of page to be scraped
news_url = 'https://mars.nasa.gov/news/'

# Retrieve page with the splinter/browser module
browser.visit(news_url)

# Create BeautifulSoup object; parse with 'lxml'
news_html = browser.html
news_soup = bs(news_html, 'lxml')

# Check to make sure news_soup is type BeautifulSoup
# type(news_soup)

# Examine the results, then determine element that contains sought info
# print(news_soup.prettify())

# Extract latest news article title and description, and save both into variables
news_title = news_soup.find('div', class_="content_title").text
news_p = news_soup.find('div', class_="rollover_description_inner").text

# Check results
# print(news_title)
# print(news_p)
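
Text pulled from nested elements with `.text` can carry leading or trailing newlines and repeated spaces. The following is a minimal sketch of a cleanup step, assuming that is the case for this page; `clean_text` and the sample string are hypothetical and not part of the original notebook.

```python
def clean_text(raw):
    """Collapse runs of whitespace and trim both ends of a scraped string."""
    # str.split() with no arguments splits on any whitespace run
    return " ".join(raw.split())


# Hypothetical raw value, shaped like text pulled from a nested <div>
sample_title = "\n\n  NASA's Next Mars Mission to Investigate Interior of Red Planet \n"
cleaned_title = clean_text(sample_title)
```

The same helper could be applied to both `news_title` and `news_p` before they are stored.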

### JPL Mars Space Images - Featured Image
Visit the url for JPL Featured Space Image [here](https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars).
Use Splinter to navigate the site, find the image URL for the current Featured Mars Image, and assign the URL string to a variable called `featured_image_url`.
Make sure the URL points to the full-size .jpg image.
Make sure to save the complete URL string for this image.

###### Example:
`featured_image_url = 'https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA16225_hires.jpg'`
***

In [3]:
# URL of page to be scraped
image_url = 'https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars'

# Retrieve page with the splinter/browser module
browser.visit(image_url)
time.sleep(2)
browser.click_link_by_partial_text("FULL IMAGE")
time.sleep(2)
browser.click_link_by_partial_text("more info")
time.sleep(2)

# Create BeautifulSoup object; parse with 'lxml'
image_html = browser.html
image_soup = bs(image_html, 'lxml')

# Check to make sure image_soup is type BeautifulSoup
# type(image_soup)

# Examine the results, then determine element that contains sought info
# print(image_soup.prettify())


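# Extract featured image url and save into variable
# Default to None so the variable exists even if no JPG download link is found
featured_image_url = None
buttons = image_soup.find_all('div', class_='download_tiff')
for button in buttons:
    if 'JPG' in button.text:
        # Prepend the scheme so the saved string is a complete url
        featured_image_url = 'https:' + button.a['href']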

# Check results
# print(featured_image_url)
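
The cell above builds the complete URL by prepending `https:` to the link's `href`. As an alternative sketch, the standard library's `urljoin` resolves relative, protocol-relative, and already absolute links against the page URL. The helper name `to_absolute_url` and the sample `href` values are hypothetical illustrations, not values taken from the live page.

```python
from urllib.parse import urljoin


def to_absolute_url(page_url, href):
    """Resolve an href found on page_url into a complete URL string."""
    return urljoin(page_url, href)


page_url = "https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars"

# Hypothetical href values covering the three common shapes
from_relative = to_absolute_url(
    page_url, "/spaceimages/images/largesize/PIA16225_hires.jpg"
)
from_protocol_relative = to_absolute_url(
    page_url, "//photojournal.jpl.nasa.gov/jpeg/PIA16225.jpg"
)
from_absolute = to_absolute_url(
    page_url, "https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA16225_hires.jpg"
)
```

This avoids depending on the exact form of the `href`, which may change if the site is restructured.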

### Mars Weather
Visit the Mars Weather Twitter account [here](https://twitter.com/marswxreport?lang=en) and scrape the latest Mars weather tweet from the page. Save the tweet text for the weather report as a variable called `mars_weather`.

###### Example:
`mars_weather = 'Sol 1801 (Aug 30, 2017), Sunny, high -21C/-5F, low -80C/-112F, pressure at 8.82 hPa, daylight 06:09-17:55'`
***

In [4]:
# URL of page to be scraped
twitter_url = 'https://twitter.com/marswxreport?lang=en'

# Retrieve page with the splinter/browser module
browser.visit(twitter_url)

# Create BeautifulSoup object; parse with 'lxml'
twitter_html = browser.html
twitter_soup = bs(twitter_html, 'lxml')

# Check to make sure twitter_soup is type BeautifulSoup
# type(twitter_soup)

# Examine the results, then determine element that contains sought info
# print(twitter_soup.prettify())

# Extract latest Mars weather tweet text, and save it into a variable
# Default to None so the variable exists even if no weather tweet is found
mars_weather = None
tweets = twitter_soup.find_all('p', class_='tweet-text')
# Some tweets are not weather updates, or contain more than one text section
for tweet in tweets:
    if tweet.text.startswith('InSight'):
        # Drop the trailing image link text, if the tweet has one
        link = tweet.find('a', class_="twitter-timeline-link")
        mars_weather = tweet.text[:-len(link.text)] if link and link.text else tweet.text
        break

# Check results
# print(mars_weather)
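
The cell above removes the trailing image link by slicing off as many characters as the link text contains. A regular expression is another way to do this. The sketch below assumes the link text looks like `pic.twitter.com/...` and sits at the very end of the tweet. The helper name and the link suffix in the sample are hypothetical.

```python
import re

# Optional whitespace, then a pic.twitter.com link, anchored at the end of the text
TRAILING_LINK = re.compile(r"\s*(?:https?://)?pic\.twitter\.com/\S+\s*$")


def strip_trailing_link(tweet_text):
    """Remove a trailing pic.twitter.com link; leave other text unchanged."""
    return TRAILING_LINK.sub("", tweet_text)


# Example weather text from this notebook, with a made-up link suffix appended
sample_tweet = (
    "Sol 1801 (Aug 30, 2017), Sunny, high -21C/-5F, low -80C/-112F, "
    "pressure at 8.82 hPa, daylight 06:09-17:55pic.twitter.com/EXAMPLE"
)
weather_only = strip_trailing_link(sample_tweet)
```

Text without a trailing link passes through unchanged, so the helper is safe to call on every tweet.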

### Mars Facts
Visit the Mars Facts webpage [here](https://space-facts.com/mars/) and use Pandas to scrape the table containing facts about the planet including Diameter, Mass, etc.
Use Pandas to convert the data to an HTML table string.
***

In [5]:
# URL of page to be scraped
facts_url = 'https://space-facts.com/mars/'

# Retrieve table with pandas 'read_html'
facts_table = pd.read_html(facts_url)
mars_table = facts_table[0]

# View table
# mars_table

# Convert table to an HTML table string
mars_table_html = mars_table.to_html(header=False, index=False)

# Check results
# print(mars_table_html)
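
The string returned by `to_html` contains newline characters, which can be inconvenient when embedding the table in a page template. The following sketch, assuming pandas is installed, shows the conversion on a small hand-built table. The row values are placeholders, not scraped facts, and the column names are hypothetical.

```python
import pandas as pd

# Placeholder rows standing in for the scraped facts table
facts = pd.DataFrame(
    [["Diameter", "placeholder value"], ["Mass", "placeholder value"]],
    columns=["description", "value"],
)

# header=False and index=False omit the column names and the row index
table_html = facts.to_html(header=False, index=False)

# Strip newlines so the table is a single-line string
compact_html = table_html.replace("\n", "")
```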

### Mars Hemispheres
Visit the USGS Astrogeology site [here](https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars) to obtain high-resolution images for each of Mars's hemispheres.
You will need to click each hemisphere link to find the image URL for the full-resolution image.
Save both the image URL string for the full-resolution hemisphere image and the hemisphere title containing the hemisphere name. Use a Python dictionary to store the data using the keys `img_url` and `title`.
Append the dictionary with the image URL string and the hemisphere title to a list. This list will contain one dictionary for each hemisphere.


###### Example:
`hemisphere_image_urls = [
    {"title": "Valles Marineris Hemisphere", "img_url": "..."},
    {"title": "Cerberus Hemisphere", "img_url": "..."},
    {"title": "Schiaparelli Hemisphere", "img_url": "..."},
    {"title": "Syrtis Major Hemisphere", "img_url": "..."},
]`
***

In [6]:
# URL of page to be scraped
hemis_url = 'https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars'

# Retrieve page with the splinter/browser module
browser.visit(hemis_url)

# Create BeautifulSoup object; parse with 'lxml'
hemis_html = browser.html
hemis_soup = bs(hemis_html, 'lxml')

# Check to make sure hemis_soup is type BeautifulSoup
# type(hemis_soup)

# Examine the results, then determine element that contains sought info
# print(hemis_soup.prettify())

# Extract titles and image urls of Mars hemispheres, and save them into list of dictionaries
# Create empty list
hemisphere_image_urls = []
hemispheres = hemis_soup.find_all('div', class_="item")

# Iterate through hemispheres
for hemis in hemispheres:
    # Create empty dictionary
    hemis_info = {}
    # Find title 
    hemis_title = hemis.div.a.h3.text
    hemis_info['title'] = hemis_title
    
    # Find image url by clicking on the title and then finding linked text to sample jpeg image
    browser.click_link_by_partial_text(hemis_title)
    time.sleep(1)
    hemis_img_html = browser.html
    hemis_img_soup = bs(hemis_img_html, 'lxml')
    hemis_img = hemis_img_soup.find('div', class_='downloads').ul.li.a['href']
    hemis_info['img_url'] = hemis_img
    
    # Add title and url dictionary to list of dictionaries
    hemisphere_image_urls.append(hemis_info)
    
    # Go back to main page in preparation for next img search
    browser.back()
    time.sleep(1)
    
# Check results
# print(hemisphere_image_urls)
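
The cells above pause with fixed `time.sleep` calls after each click, which can be too short on a slow connection and longer than needed on a fast one. One possible alternative is to poll for a condition until it holds or a timeout expires. The sketch below is a generic helper under that assumption. `wait_until` and the demo condition are hypothetical and not part of the original notebook.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.25):
    """Call condition() repeatedly until it returns True or timeout seconds pass.

    Returns True if the condition held, False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Demo condition that becomes true on its third call
calls = {"count": 0}


def ready_on_third_call():
    calls["count"] += 1
    return calls["count"] >= 3


result = wait_until(ready_on_third_call, timeout=2.0, interval=0.01)
```

In the notebook, the condition could be a small function that checks `browser.html` for the element being waited on, for example the `downloads` block on a hemisphere page.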