# Mission 2 Mars
**This notebook outlines scraping various websites related to NASA's information on Mars.**  
Several blocks of example code are included that do not scrape Mars data but illustrate coding concepts relevant to the task. Some examples show approaches that would be preferable but were not fully developed.

In [1]:
from bs4 import BeautifulSoup as bs
#import splinter
from splinter import Browser

In [2]:
## To check modules in VirtEnv
!pip freeze | grep selenium
## To check Chrome Path
path = !which chromedriver
path

selenium==3.141.0


['/usr/local/bin/chromedriver']

In [3]:
import os
import time
import requests
import pandas as pd
import datetime

### Function for getting path to 'chromedriver'

In [4]:
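def get_path(target_file):
    """Return the first executable named target_file found on PATH, or None."""
    path = os.getenv('PATH', '')
    for directory in path.split(os.path.pathsep):
        file_path = os.path.join(directory, target_file)
        # Accept only existing files that the current user may execute
        if os.path.exists(file_path) and os.access(file_path, os.X_OK):
            return file_path
    return None

get_path('chromedriver')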

'/usr/local/bin/chromedriver'
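
The PATH search above can also be done with the standard library. The following is a minimal sketch, assuming only that `shutil.which` is acceptable in place of a hand-written loop; the wrapper name `find_executable` is hypothetical and not part of the notebook.

```python
import shutil


def find_executable(name):
    """Return the full path of an executable found on PATH, or None if absent."""
    return shutil.which(name)


# The result depends on the machine; None means the driver is not on PATH.
driver_path = find_executable("chromedriver")
if driver_path is None:
    print("chromedriver was not found on PATH")
else:
    print(f"chromedriver found at {driver_path}")
```

Checking for `None` before starting the browser gives a clearer message than a failure inside the driver start-up.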

### Commented-out code for hard-coding the path to chromedriver

In [5]:
# Set the executable path and initialize the chrome browser in splinter
# chromedriver path is '/usr/local/bin/chromedriver'
# executable_path = {'executable_path': '/usr/local/bin/chromedriver' }

# browser = Browser('chrome', **executable_path)

### Using get_path function on chromedriver 

In [6]:
# Set the executable path and initialize the chrome browser in splinter
path = get_path('chromedriver')
executable_path = {'executable_path': path}
browser = Browser('chrome', **executable_path)

## Better functionality
### Web Requests without Chromedriver
The sample code below scrapes a list of country codes with Requests and BeautifulSoup.


In [7]:
url = "https://www.nationsonline.org/oneworld/country_code_list.htm"

response = requests.get(url)

# Parse HTML with Beautiful Soup
soup = bs(response.text, 'html.parser')

html_data = soup.find_all('tr', class_="border1")

result = []
for i in html_data:
    result.append(i.text)

dict_= {"Country":[],
    "Alpha_2":[],
    "Alpha_3_Code":[],
    "UN_Code":[]}

for i in result:
    split_list = i.split("\n")
    if len(split_list)>6:
        dict_["Country"].append(split_list[2])
        dict_["Alpha_2"].append(split_list[3])
        dict_["Alpha_3_Code"].append(split_list[4])
        dict_["UN_Code"].append(split_list[5])

countryISO_df = pd.DataFrame(dict_)

In [8]:
countryISO_df.head()

| | Country | Alpha_2 | Alpha_3_Code | UN_Code |
|---|---|---|---|---|
| 0 | Aland Islands | AX | ALA | 248 |
| 1 | Albania | AL | ALB | 8 |
| 2 | Algeria | DZ | DZA | 12 |
| 3 | American Samoa | AS | ASM | 16 |
| 4 | Andorra | AD | AND | 20 |
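
The row-splitting step in the country-code cell relies on fixed list positions. As a rough sketch of an alternative, the function below drops empty pieces and maps the remaining cells to field names. The name `row_to_record` and the sample strings are hypothetical, and the sketch assumes every wanted row has exactly four non-empty cells.

```python
FIELDS = ("Country", "Alpha_2", "Alpha_3_Code", "UN_Code")


def row_to_record(row_text, fields=FIELDS):
    """Turn newline-separated row text into a dict, or None if the cell count differs."""
    cells = [cell.strip() for cell in row_text.split("\n") if cell.strip()]
    if len(cells) != len(fields):
        return None
    return dict(zip(fields, cells))


# Hypothetical row text, shaped like the `.text` of one table row
sample_rows = ["\n\nAlbania\nAL\nALB\n008\n", "\nheader only\n"]
records = [rec for rec in map(row_to_record, sample_rows) if rec is not None]
print(records)
```

A list of dictionaries like `records` can be passed directly to `pd.DataFrame`. Note that a row with a genuinely empty cell would be skipped by this sketch, so it is not a drop-in replacement for every table.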


### NASA Mars News

* Scrape the [NASA Mars News Site](https://mars.nasa.gov/news/) and collect the latest News Title and Paragraph Text. Assign the text to variables that you can reference later.

#### Code to use `requests` to get the HTML instead of `splinter`
**The response yields a `NoneType` object.**  
**The data types involved need further investigation.**
```python
# Set URL for Nasa news
nasa_url = 'https://mars.nasa.gov/news/'
# browser.visit(nasa_url)
response = requests.get(nasa_url)

#Create BeautifulSoup object out of webpage html
soup = bs(response.text, 'html.parser')
```
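
A `NoneType` result typically appears when a lookup such as `soup.find(...)` matches nothing and the code then reads `.text` from it. One possible cause, not verified here, is that the story list is filled in by JavaScript and is therefore missing from the raw HTML that `requests` receives. The sketch below shows a small guard for that case. `text_or_default` is a hypothetical helper, and `FakeTag` is a stand-in for a parsed tag so the example runs without a web page.

```python
def text_or_default(node, default=None):
    """Return the stripped text of a parsed node, or default when node is None."""
    if node is None:
        return default
    return node.get_text().strip()


class FakeTag:
    """Stand-in for a parsed tag, used only to make this sketch runnable."""

    def __init__(self, text):
        self._text = text

    def get_text(self):
        return self._text


print(text_or_default(FakeTag("  A Rover Pit Stop at JPL  ")))
print(text_or_default(None, default="(not found)"))
```

With BeautifulSoup, the same helper could wrap a call like `soup.find('div', class_='article_teaser_body')`, so a missing element produces the default value instead of an `AttributeError`.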

In [9]:
# Set URL for Nasa news
nasa_url = 'https://mars.nasa.gov/news/'
browser.visit(nasa_url)

#Assign the HTML string from splinter to a local variable
nasa_html= browser.html

#Create BeautifulSoup object out of webpage html
soup = bs(nasa_html, 'html.parser')

In [10]:
# for link in soup.find_all('li'):
#     print(link)

#Isolate unordered list containing news stories.
nasa_news = soup.find('ul', class_='item_list')

#Isolate the list items from the 'ul' container
news_stories = nasa_news.find_all('li')
# news_stories = nasa_news.find('li')
#news_stories

first_story = news_stories[0]
#first_story
# Identify and return title
news_title = first_story.find('div', class_ = "bottom_gradient").h3.text
# Identify and return story summary
news_p = first_story.find('div', class_ = "article_teaser_body").text


# Print results for title and paragraph
print('-------------')
print(f'Title: {news_title}:')
print(f'Description: >>> {news_p} <<<')
print('-------------')

-------------
Title: A Rover Pit Stop at JPL:
Description: >>> Working like a finely honed machine, a team of engineers in this time-lapse video clip install test wheels on another finely honed machine: NASA's Mars 2020 rover. <<<
-------------


### Getting the full text for the article
#### Not Required for assignment

In [11]:
# Get link for full Article (not required)
news_link_div = first_story.find('div', class_='image_and_description_container')


news_link = (news_link_div.find('a')['href'])
#news_link

# Navigate in the browser to the full article (not required)
browser.click_link_by_href(news_link)

#Get text for full article (not required)
splinter_new_html = browser.html

new_soup_html = bs(splinter_new_html, "html.parser")
new_soup_html.find('div', class_='wysiwyg_content').get_text()


'\n\n\n\nPit Crew for Mars: NASA\'s Mars 2020 Rover Gets Some Wheels (time lapse): A team of engineers at NASA\'s Jet Propulsion Laboratory in Pasadena, California, install the legs and wheels — otherwise known as the mobility suspension — on the Mars 2020 rover. The imagery for this accelerated time-lapse was taken on June 13, 2019, from a camera above the Spacecraft Assembly Facility\'s High Bay 1 clean room. Credit: NASA/JPL-Caltech. Video download ›\n\n\nConstructing an exquisitely complex vehicle like the Mars 2020 rover takes serious teamwork. On June 13, 2019, more than a dozen "bunny suit"-clad engineers rolled past another milestone in the clean room of the Spacecraft Assembly Facility at NASA\'s Jet Propulsion Laboratory in Pasadena, California, when they integrated the rover\'s legs and wheels.\nThe Mars 2020 team could pass for a pit crew in this video clip, which has been sped up by 300% and focuses on the major activities that took place the day the wheels were installed.

## Looping through all stories on the first page of the NASA website
#### Not required for assignment

In [12]:
# Loop through news story list items and return printable results.
for story in news_stories:
    # Error handling
    try:
        # Identify and return title
        news_title = story.find('div', class_ = "bottom_gradient").h3.text
        # Identify and return story summary
        news_p = story.find('div', class_ = "article_teaser_body").text

        # Print results only if both title and summary are available
        if (news_title and news_p):
            print('-------------')
            print(f'Title: {news_title}:')
            print(f'Description: >>> {news_p} <<<')
    except AttributeError as e:
        print(e)

-------------
Title: A Rover Pit Stop at JPL:
Description: >>> Working like a finely honed machine, a team of engineers in this time-lapse video clip install test wheels on another finely honed machine: NASA's Mars 2020 rover. <<<
-------------
Title: Mars 2020 Rover Gets a Super Instrument:
Description: >>> With its rock-zapping laser, the SuperCam will enable the science team to identify the chemical and mineral makeup of its targets on the Red Planet. <<<
-------------
Title: A Neil Armstrong for Mars: Landing the Mars 2020 Rover:
Description: >>> NASA's newest rover will have an autopilot called Terrain-Relative Navigation. <<<
-------------
Title: NASA's InSight Uncovers the 'Mole' :
Description: >>> The lander's robotic arm has successfully removed a piece of hardware blocking the view of its digging device in order to help with recovery efforts. <<<
-------------
Title: Mars 2020 Rover's 7-Foot-Long Robotic Arm Installed:
Description: >>> The main robotic arm has been insta

## JPL Mars Space Images - Featured Image

* Visit the url for JPL Featured Space Image [here](https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars).

* Use splinter to navigate the site and find the image url for the current Featured Mars Image and assign the url string to a variable called `featured_image_url`.

* Make sure to find the image url to the full size `.jpg` image.

* Make sure to save a complete url string for this image.

```python
# Example:
featured_image_url = 'https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA16225_hires.jpg'
```

### Set up JPL url with base_url to be used later on

**jpl_url** set to: https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars

In [13]:
# Set base url to be used later.
base_url = 'https://www.jpl.nasa.gov'
# Go to search results for Mars
jpl_url = f'{base_url}/spaceimages/?search=&category=Mars'
browser.visit(jpl_url)  # visit() returns None, so the page HTML is read from browser.html later


### Primary Method for getting JPL Mars Space Images

In [14]:
base_url = 'https://www.jpl.nasa.gov'
jpl_url = f'{base_url}/spaceimages/?search=&category=Mars'

browser.visit(jpl_url)
#image_html= browser.visit(jpl_url)

jpl_a_html = browser.html


parsed_a_html = bs(jpl_a_html, 'html.parser')


description_page = parsed_a_html.find_all('a', class_='button fancybox')[0]['data-link']

#Concatenate route and base url
description_url = base_url + description_page

#Open new url using splinter
browser.visit(description_url)

#Create HTML object
image = browser.html

#Parse HTML object with BeautifulSoup
soup = bs(image, 'html.parser')

#Retrieve route to full-size image 
img = soup.find('img', class_="main_image")['src']

#Concatenate route with base url
featured_image_url = base_url + img
print(featured_image_url)


https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA19674_hires.jpg
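
The cell above builds URLs by string concatenation, which works as long as every route starts with `/` and is never already absolute. A slightly more defensive sketch uses `urljoin` from the standard library. The route is taken from the output above, while the `example.com` address is a placeholder used only for illustration.

```python
from urllib.parse import urljoin

base_url = "https://www.jpl.nasa.gov"

# Root-relative route, as returned by the 'src' attribute in the cell above
relative_route = "/spaceimages/images/largesize/PIA19674_hires.jpg"
print(urljoin(base_url, relative_route))

# An already absolute URL is returned unchanged instead of being prefixed again.
# example.com is a placeholder host used only for illustration.
absolute_url = "https://example.com/picture.jpg"
print(urljoin(base_url, absolute_url))
```

This keeps `featured_image_url` valid even if the site starts returning full URLs in the `src` attribute.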


### Navigating to image to get url (not recommended method)
(This section of code is commented out for the reasons described below.)

In [15]:
## The Python environment for running this section requires specifications
## that were not recorded and still need to be determined for proper configuration.

# # Navigating to image and then getting url.
# image_link = browser.find_link_by_partial_text('FULL IMAGE')
# print(image_link)
# image_link.click()
# #Getting html from page
# jpl_2_html = browser.html
# parsed_html = bs(jpl_2_html, 'html.parser')
# #Finding div container with URL
# grab_div = parsed_html.find('div', class_ = "fancybox-inner")
# src_url = (grab_div.find('img')['src'])
# print(src_url)
# jpl_img_url = f'{base_url}{src_url}'
# browser.visit(jpl_img_url)
# print(jpl_img_url)

### Example Code
**Sample Code** for understanding Soup parsing

In [16]:
html = '''
<img src="smiley.gif" alt="Smiley face" height="42" width="42">'''
soup = bs(html, 'html.parser')
images = soup.find('img')
print(images['src']) #smiley.gif
#############
## Example code
# url = browser.visit('http://www.python.org')

# python_html = browser.html
# soup = bs(python_html, 'html.parser')
# srcs = [img['src'] for img in soup.find_all('img')]
# srcs
#############
## Getting image paths from webpage. Combine with base URL for image url
# url = browser.visit('https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars')

# test_html = browser.html
# soup = bs(test_html, 'html.parser')
# srcs = [img['src'] for img in soup.find_all('img')]
# srcs
##############

smiley.gif


## Mars Weather

In [17]:
#Set Mars twitter URL
mars_twitter_url = "https://twitter.com/marswxreport?lang=en"

#Navigate to Mars twitter URL.
browser.visit(mars_twitter_url)

#Pull HTML from twitter page.
mars_twitter_html = browser.html

#Use Beautiful Soup to parse html from browser.
mars_twitter_soup = bs(mars_twitter_html, "html.parser")

#Find most recent tweet of weather.
mars_tweet= mars_twitter_soup.find('div', class_='js-tweet-text-container').find('p').get_text()

mars_tweet

'The #Mars2020 rover gets her wheels from the pit crew with PhDs\nhttps://mars.nasa.gov/mars2020/mission/where-is-the-rover/\xa0…pic.twitter.com/DRHHZpRI3f'
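
The tweet returned above is about the rover rather than the weather, because the first tweet on the page is not always a weather report. A minimal sketch of one way to handle this is to collect the text of several tweets and keep the first one that looks like a report. The function name and the keywords `sol` and `pressure` are assumptions about what a weather tweet contains, and the second sample string is invented for illustration.

```python
def first_weather_tweet(tweets, keywords=("sol", "pressure")):
    """Return the first tweet containing every keyword (case-insensitive), else None."""
    for tweet in tweets:
        lowered = tweet.lower()
        if all(keyword in lowered for keyword in keywords):
            return tweet
    return None


# Hypothetical sample data: the first entry mimics the rover tweet above,
# the second is an invented weather-style report used only for illustration.
sample_tweets = [
    "The #Mars2020 rover gets her wheels from the pit crew with PhDs",
    "InSight sol 200 low -100.0C high -25.0C pressure at 7.50 hPa",
]
print(first_weather_tweet(sample_tweets))
```

In the notebook, the list of tweet texts could come from calling `get_text()` on each result of `mars_twitter_soup.find_all('div', class_='js-tweet-text-container')`. Plain substring matching is crude (for example, `sol` also matches `solar`), so the keywords may need tuning.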

### Mars Hemispheres

* Visit the USGS Astrogeology site [here](https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars) to obtain high-resolution images for each of Mars's hemispheres.

* You will need to click each of the links to the hemispheres in order to find the image url to the full resolution image.

* Save both the image url string for the full resolution hemisphere image, and the Hemisphere title containing the hemisphere name. Use a Python dictionary to store the data using the keys `img_url` and `title`.

* Append the dictionary with the image url string and the hemisphere title to a list. This list will contain one dictionary for each hemisphere.

```python
# Example:
hemisphere_image_urls = [
    {"title": "Valles Marineris Hemisphere", "img_url": "..."},
    {"title": "Cerberus Hemisphere", "img_url": "..."},
    {"title": "Schiaparelli Hemisphere", "img_url": "..."},
    {"title": "Syrtis Major Hemisphere", "img_url": "..."},
]
```

- - -

In [18]:
# Set url for USGS
usgs_start_url= 'https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars'

#Go to initial page
browser.visit(usgs_start_url)
time.sleep(.01)

#Set list variable
hemisphere_image_urls = []

print("------\nOpening Browser\n------")

#List of WebDriverElement objects
usgs_items = browser.find_by_css("a.itemLink h3")

for i in range(len(usgs_items)):
    print(f"------\nImage Loop {i+1}\n------")
    usgs_items = browser.find_by_css("a.itemLink h3")
    usgs_items[i].click()
    time.sleep(.1)

    #From image page get URL for image.
    li_item = browser.find_by_css("div.downloads li").first
    to_soup = bs(li_item.html, "html.parser")
    usgs_img_url = to_soup.find('a')['href']

    #From image page get Title 
    title_item = browser.find_by_css("div.content h2").text
    
    #Store the title and image url in a dictionary
    usgs_dict = { 'title' : title_item, 'img_url': usgs_img_url}
    hemisphere_image_urls.append(usgs_dict)
    
    #Go to initial page
    browser.visit(usgs_start_url)
    time.sleep(.1)

print('Program Complete\n\nFound following images:')
for i in range(len(hemisphere_image_urls)):
    print(f'{i+1}.) {hemisphere_image_urls[i]["title"]}')

------
Opening Browser
------
------
Image Loop 1
------
------
Image Loop 2
------
------
Image Loop 3
------
------
Image Loop 4
------
Program Complete

Found following images:
1.) Cerberus Hemisphere Enhanced
2.) Schiaparelli Hemisphere Enhanced
3.) Syrtis Major Hemisphere Enhanced
4.) Valles Marineris Hemisphere Enhanced
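
The loop above pauses for fixed, very short intervals (`time.sleep(.1)`), which may not be enough on a slow connection. A hedged alternative is to poll for a condition until it holds or a timeout expires. The helper `wait_until` below is a hypothetical sketch, not part of splinter, and the demo predicate only simulates a page that becomes ready on the third check.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.25):
    """Poll predicate() until it is truthy or the timeout elapses; return the last result."""
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            return result
        time.sleep(interval)


# Simulated condition that becomes true on the third check
attempts = {"count": 0}


def ready_on_third_call():
    attempts["count"] += 1
    return attempts["count"] >= 3


print(wait_until(ready_on_third_call, timeout=2.0, interval=0.01))
```

In the hemisphere loop, the predicate could be something like `lambda: browser.is_element_present_by_css("div.downloads li")`, so the code proceeds as soon as the download list exists rather than after a guessed delay.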


In [19]:
current_dt = datetime.datetime.now().strftime("%Y-%m-%d %H:%M")
current_dt

'2019-07-13 16:50'

In [20]:
## Dialing in the print out.

# print('Program Complete\n\nFound following images:')
# for i in range(len(hemisphere_image_urls)):
#     print(f'{i+1}.) {hemisphere_image_urls[i]["title"]}')