## Step 1: Scraping

This task uses BeautifulSoup, Pandas, Requests, and Splinter to scrape Mars-related information.

In [1]:
# modules
from bs4 import BeautifulSoup as bs
import requests
import pandas as pd
from splinter import Browser
import time

#### NASA Mars News

We will scrape the latest news title and paragraph text from the NASA Mars News site (https://mars.nasa.gov/news/).

In [3]:
# scraped page url
url_one = 'https://mars.nasa.gov/news/?page=0&per_page=40&order=publish_date+desc%2Ccreated_at+desc&search=&category=19%2C165%2C184%2C204&blank_scope=Latest'
# Retrieve data with the requests module
res = requests.get(url_one)

In [4]:
# Creating a Beautiful Soup object
soup_one = bs(res.text, "html5lib")
type(soup_one)

bs4.BeautifulSoup

In [5]:
# Extract the link text from the first class="content_title" element and strip surrounding whitespace
title_of_news = soup_one.find_all('div', class_='content_title')[0].find('a').text.strip()

# Print the title to verify
print(title_of_news)

NASA Invests in Visionary Technology


In [6]:
# Extract the paragraph from the first class="rollover_description_inner" element and strip surrounding whitespace
news_para = soup_one.find_all('div', class_='rollover_description_inner')[0].text.strip()

# Print the paragraph to verify
print(news_para)

NASA is investing in technology concepts, including several from JPL, that may one day be used for future space exploration missions.
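
The two extraction cells above index `[0]` directly, which raises an `IndexError` if the page layout changes or the content is loaded later by JavaScript. The sketch below shows one way to guard against that. It runs on a small made-up HTML snippet rather than the live site; `SAMPLE_HTML` and `first_news_item` are hypothetical names introduced only for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup that only mimics the two classes used above.
SAMPLE_HTML = """
<ul>
  <li>
    <div class="content_title"><a href="/news/1/"> First headline </a></div>
    <div class="rollover_description_inner"> First teaser. </div>
  </li>
  <li>
    <div class="content_title"><a href="/news/2/">Second headline</a></div>
    <div class="rollover_description_inner">Second teaser.</div>
  </li>
</ul>
"""


def first_news_item(html):
    """Return (title, paragraph) for the first news item, or (None, None)."""
    soup = BeautifulSoup(html, "html.parser")
    title_div = soup.find("div", class_="content_title")
    para_div = soup.find("div", class_="rollover_description_inner")
    # find() returns None when nothing matches, so check before using the result
    if title_div is None or para_div is None:
        return None, None
    link = title_div.find("a")
    title_tag = link if link is not None else title_div
    return title_tag.get_text(strip=True), para_div.get_text(strip=True)


title, paragraph = first_news_item(SAMPLE_HTML)
print(title)
print(paragraph)
```

Passing `res.text` from the request above instead of `SAMPLE_HTML` would apply the same guard to the real page.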


#### JPL Mars Space Images - Featured Image

Use Splinter to navigate to JPL's Featured Space Image page and scrape the URL of the current featured Mars image (https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars).

In [7]:
# Executing Chromedriver
exe_path = {'executable_path': 'chromedriver.exe'}
browser = Browser('chrome', **exe_path, headless=False)

In [8]:
# scraped page url
url_two = 'https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars'

#Visiting the page
browser.visit(url_two)

In [9]:
# assigning html data
html = browser.html
# Creating a Beautiful Soup object
soup_two = bs(html, "html5lib")

In [10]:
#incomplete path of the url
incomplete_address = soup_two.find_all('a', class_='fancybox')[0].get('data-fancybox-href').strip()

In [11]:
# Build the full URL by prepending the site root
image_url = "https://www.jpl.nasa.gov" + incomplete_address

# Print the full URL to verify
print(image_url)

# Open the URL in the browser to confirm the image loads
browser.visit(image_url)

https://www.jpl.nasa.gov/spaceimages/images/mediumsize/PIA00063_ip.jpg
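
String concatenation works here because the scraped path starts with `/`. As a more forgiving alternative, the standard library's `urllib.parse.urljoin` also handles the case where the attribute already holds an absolute URL. This is a small sketch; `BASE_URL` and `absolute_url` are names introduced for the example, and the relative path is taken from the output above.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.jpl.nasa.gov"


def absolute_url(base, href):
    """Resolve a scraped href against the site root."""
    return urljoin(base, href)


# Relative path, as scraped from the data-fancybox-href attribute
print(absolute_url(BASE_URL, "/spaceimages/images/mediumsize/PIA00063_ip.jpg"))

# An href that is already absolute is returned unchanged
print(absolute_url(BASE_URL, "https://example.com/picture.jpg"))
```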


#### Mars Weather

Use Splinter to scrape the latest Mars weather tweet from the Mars Weather Twitter account (https://twitter.com/marswxreport?lang=en).

In [12]:
# Close the previous browser window, then start a new Chromedriver session
browser.quit()
path = {'executable_path': 'chromedriver.exe'}
browser = Browser('chrome', **path, headless=False)

In [13]:
# scraped page url
url_three = 'https://twitter.com/marswxreport?lang=en'

#Visiting the page
browser.visit(url_three)

In [14]:
# assigning html data
html = browser.html
# Creating a Beautiful Soup object
soup_three = bs(html, "html5lib")

In [15]:
#latest Mars weather tweet
weather = soup_three.find_all('p', class_='TweetTextSize TweetTextSize--normal js-tweet-text tweet-text')[0].text

#printing to verify tweet
print(weather)

#InSight rising above the California fog on liftoff.https://twitter.com/birdsnspace/status/993603886106660864 …
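
Note that the most recent tweet printed above is about the InSight launch, not a weather report, so taking element `[0]` does not always give weather data. One possible workaround is to scan the tweet texts and keep the first one that looks like a report. The sketch below assumes, as a heuristic that is not confirmed by the source, that weather reports mention "Sol" followed by a number. The second sample tweet is invented for the example.

```python
import re

# Assumption: a weather report mentions "Sol <number>" somewhere in its text.
WEATHER_PATTERN = re.compile(r"\bsol\s+\d+", re.IGNORECASE)


def first_weather_tweet(tweet_texts):
    """Return the first tweet that matches the weather heuristic, else None."""
    for text in tweet_texts:
        if WEATHER_PATTERN.search(text):
            return text.strip()
    return None


sample_tweets = [
    "#InSight rising above the California fog on liftoff.",
    # Hypothetical weather tweet, made up for this illustration
    "Sol 2040 (sample), high -5C, low -70C, pressure at 7.40 hPa",
]
print(first_weather_tweet(sample_tweets))
```

With the notebook's objects, the input list could be built as `[p.text for p in soup_three.find_all('p', class_='TweetTextSize TweetTextSize--normal js-tweet-text tweet-text')]`.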


#### Mars Facts

Use Pandas to scrape the table from the Mars Facts webpage and convert the data to an HTML table.

In [16]:
# scraped page url
url_four = 'https://space-facts.com/mars/'

In [17]:
# Read every table on the page into a list of DataFrames
table = pd.read_html(url_four)
table

[                      0                              1
 0  Equatorial Diameter:                       6,792 km
 1       Polar Diameter:                       6,752 km
 2                 Mass:  6.42 x 10^23 kg (10.7% Earth)
 3                Moons:            2 (Phobos & Deimos)
 4       Orbit Distance:       227,943,824 km (1.52 AU)
 5         Orbit Period:           687 days (1.9 years)
 6  Surface Temperature:                  -153 to 20 °C
 7         First Record:              2nd millennium BC
 8          Recorded By:           Egyptian astronomers]

In [18]:
# Take the first DataFrame from the list
dataframe = table[0]

# modifying column name
dataframe.columns=['description','value']

# viewing dataframe
dataframe

            description                          value
0  Equatorial Diameter:                       6,792 km
1       Polar Diameter:                       6,752 km
2                 Mass:  6.42 x 10^23 kg (10.7% Earth)
3                Moons:            2 (Phobos & Deimos)
4       Orbit Distance:       227,943,824 km (1.52 AU)
5         Orbit Period:           687 days (1.9 years)
6  Surface Temperature:                  -153 to 20 °C
7         First Record:              2nd millennium BC
8          Recorded By:           Egyptian astronomers


In [19]:
# Use the description column as the index

dataframe.set_index('description', inplace=True)
dataframe

                                              value
description
Equatorial Diameter:                       6,792 km
Polar Diameter:                            6,752 km
Mass:                 6.42 x 10^23 kg (10.7% Earth)
Moons:                          2 (Phobos & Deimos)
Orbit Distance:            227,943,824 km (1.52 AU)
Orbit Period:                  687 days (1.9 years)
Surface Temperature:                  -153 to 20 °C
First Record:                     2nd millennium BC
Recorded By:                   Egyptian astronomers


In [20]:
# saving dataframe as html file
dataframe.to_html('table.html')
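
The cell above writes the table to a file. When an HTML string is wanted instead, for example to embed in a page template, `DataFrame.to_html()` returns a `str` if no path is passed. The sketch below uses a two-row stand-in for the scraped table; `facts` and `html_table` are names introduced only for this example.

```python
import pandas as pd

# Two-row stand-in for the scraped Mars facts table
facts = pd.DataFrame(
    {
        "description": ["Equatorial Diameter:", "Polar Diameter:"],
        "value": ["6,792 km", "6,752 km"],
    }
).set_index("description")

# With no path argument, to_html() returns the markup as a string
html_table = facts.to_html()

# Optional: drop newlines so the string is easier to store or embed
html_table = html_table.replace("\n", "")

print(type(html_table).__name__)
```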


#### Mars Hemispheres

Visit the USGS Astrogeology site to obtain high-resolution images of each of Mars's hemispheres.

In [21]:
# Close the previous browser window, then start a new Chromedriver session
browser.quit()
path = {'executable_path': 'chromedriver.exe'}
browser = Browser('chrome', **path, headless=False)

In [22]:
# scraped page url
url_five = 'https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars'

#Visiting the page
browser.visit(url_five)

In [23]:
# assigning html data
html = browser.html
# Creating a Beautiful Soup object
soup_five = bs(html,"html5lib")

In [24]:
# List to store one dictionary per hemisphere
images_of_hemisphere = []

In [25]:
# creating empty dictionary
dictionary = {}

In [26]:
# getting all the titles
findings = soup_five.find_all('h3')

In [27]:
# Loop through each finding
for finding in findings:
    # Get the hemisphere title from the heading text
    item_alpha = finding.text
    time.sleep(1)
    # Open the hemisphere's detail page
    browser.click_link_by_partial_text(item_alpha)
    time.sleep(1)
    # Parse the detail page
    html_alpha = browser.html
    soup_alpha = bs(html_alpha, "html5lib")
    time.sleep(1)
    # Image link: first anchor inside the "downloads" block
    link_alpha = soup_alpha.find_all('div', class_="downloads")[0].find_all('a')[0].get("href")
    time.sleep(1)
    # Store the title and image URL in the dictionary
    dictionary["title"] = item_alpha
    dictionary["img_url"] = link_alpha
    # Append the dictionary to the list
    images_of_hemisphere.append(dictionary)
    # Start a fresh dictionary for the next hemisphere
    dictionary = {}
    # Return to the search results page
    browser.click_link_by_partial_text('Back')
    time.sleep(1)

In [28]:
# viewing List
images_of_hemisphere

[{'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif/full.jpg',
  'title': 'Cerberus Hemisphere Enhanced'},
 {'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/schiaparelli_enhanced.tif/full.jpg',
  'title': 'Schiaparelli Hemisphere Enhanced'},
 {'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/syrtis_major_enhanced.tif/full.jpg',
  'title': 'Syrtis Major Hemisphere Enhanced'},
 {'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/valles_marineris_enhanced.tif/full.jpg',
  'title': 'Valles Marineris Hemisphere Enhanced'}]
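
The loop above rebinds `dictionary` to a new `{}` after each append. That step matters: a list stores references, so reusing a single dict would leave every entry pointing at the same object. The sketch below illustrates the difference with two of the titles from the output; `shared`, `aliased`, and `independent` are names introduced for the example.

```python
titles = ["Cerberus Hemisphere Enhanced", "Schiaparelli Hemisphere Enhanced"]

# Reusing one dict: every list entry refers to the same object,
# so the last assignment overwrites what earlier entries appear to hold.
shared = {}
aliased = []
for title in titles:
    shared["title"] = title
    aliased.append(shared)

# Creating a new dict on each pass keeps the entries independent.
independent = [{"title": title} for title in titles]

print([entry["title"] for entry in aliased])
print([entry["title"] for entry in independent])
```

The first line printed repeats the last title twice, while the second lists both titles as intended.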