# NASA Mission to Mars
## JupyterLab notebook
#### This notebook was created to facilitate the web scraping portion of the assignment
#### This notebook was developed by Jorge Daniel Atuesta, May 2021

## Instructions

1. Scrape the [NASA Mars News Site](https://mars.nasa.gov/news/) and collect the latest News Title and Paragraph Text. Assign the text to variables that you can reference later.

2. JPL Mars Space Images - Featured Image
* Visit the url for JPL Featured Space Image [here](https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars).
* Use splinter to navigate the site and find the image url for the current Featured Mars Image and assign the url string to a variable called `featured_image_url`.
* Make sure to find the image url to the full size `.jpg` image.
* Make sure to save a complete url string for this image.

3. Mars Facts
* Visit the Mars Facts webpage [here](https://space-facts.com/mars/) and use Pandas to scrape the table containing facts about the planet including Diameter, Mass, etc.
* Use Pandas to convert the data to an HTML table string.

4. Mars Hemispheres
* Visit the USGS Astrogeology site [here](https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars) to obtain high-resolution images for each of Mars's hemispheres.
* You will need to click each of the links to the hemispheres in order to find the image url to the full resolution image.
* Save both the image url string for the full resolution hemisphere image, and the Hemisphere title containing the hemisphere name. Use a Python dictionary to store the data using the keys `img_url` and `title`.
* Append the dictionary with the image url string and the hemisphere title to a list. This list will contain one dictionary for each hemisphere.

### Step 1
1. Scrape the [NASA Mars News Site](https://mars.nasa.gov/news/) and collect the latest News Title and Paragraph Text. Assign the text to variables that you can reference later.

In [25]:
# Import the necessary dependencies
from splinter import Browser
from bs4 import BeautifulSoup as bs
import time
import pandas as pd
import urllib.request
import requests
from webdriver_manager.chrome import ChromeDriverManager

In [5]:
# First step: initialize the browser
def init_browser():
    # Set up Splinter
    executable_path = {'executable_path': ChromeDriverManager().install()}
    browser = Browser('chrome', **executable_path, headless=False)
    return browser
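
Each scraping cell below opens a browser and closes it with `browser.quit()` on its last line, so an exception raised mid-cell would leave a Chrome window running. One way to guard against that, as a minimal sketch, is to wrap the factory in a context manager. `managed_browser` is a hypothetical helper name, not part of Splinter, and `FakeBrowser` exists only so the sketch runs without Chrome.

```python
from contextlib import contextmanager


@contextmanager
def managed_browser(factory):
    """Yield a browser built by `factory` and always quit it afterwards."""
    browser = factory()
    try:
        yield browser
    finally:
        # Runs on normal exit and on exceptions alike
        browser.quit()


class FakeBrowser:
    """Stand-in for a Splinter browser so the sketch runs without Chrome."""

    def __init__(self):
        self.closed = False

    def quit(self):
        self.closed = True


fake = FakeBrowser()
try:
    with managed_browser(lambda: fake) as browser:
        raise RuntimeError("simulated scraping failure")
except RuntimeError:
    pass

print(fake.closed)  # True: quit() ran despite the exception
```

In this notebook the call would presumably read `with managed_browser(init_browser) as browser:`.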

In [9]:
# Scrape the latest article title and teaser paragraph
# For my Flask application, I will wrap this code in a function called scrape, in the following form:
# def scrape():
browser = init_browser()

# Visit https://mars.nasa.gov/news/
url = "https://mars.nasa.gov/news/?page=0&per_page=40&order=publish_date+desc%2Ccreated_at+desc&search=&category=19%2C165%2C184%2C204&blank_scope=Latest"
browser.visit(url)

# Pause briefly so the page can finish rendering before its HTML is read
time.sleep(1)

# Scrape page into Soup
html = browser.html
soup = bs(html, "html.parser")

# News title
# Note: find() returns only the first matching element on the page
news_title = soup.find('div', class_="content_title").text

# News paragraph
news_p = soup.find('div', class_="article_teaser_body").text

print(f"News_Title: {news_title}")
print(f"News_Paragraph: {news_p}")

# Close the browser after scraping
browser.quit()



Current google-chrome version is 90.0.4430
Get LATEST driver version for 90.0.4430
Driver [C:\Users\danie\.wdm\drivers\chromedriver\win32\90.0.4430.24\chromedriver.exe] found in cache


News_Title: Mars Now
News_Paragraph: The Red Planet rotorcraft will shift focus from proving flight is possible on Mars to demonstrating flight operations that future aerial craft could utilize.
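
The recorded title above is "Mars Now", which does not read like a headline for the teaser about the rotorcraft. Because `find()` returns the first match, it seems the first `content_title` div on the page belongs to some other page element rather than to the latest article. A possible fix is to locate the first article container and search inside it. The sketch below runs against a small hand-written HTML snippet: the `li.slide` container and the snippet's structure are assumptions about the page layout, not something shown in this notebook, and the headline and teaser are placeholders.

```python
from bs4 import BeautifulSoup as bs

# Hand-written stand-in for the news page (structure is an assumption)
sample_html = """
<div class="content_title">Mars Now</div>
<ul class="item_list">
  <li class="slide">
    <div class="content_title"><a href="#">Example headline</a></div>
    <div class="article_teaser_body">Example teaser text.</div>
  </li>
</ul>
"""

soup = bs(sample_html, "html.parser")

# Narrow the search to the first article container
first_article = soup.select_one("li.slide")

# Search inside that container only, and strip surrounding whitespace
news_title = first_article.find("div", class_="content_title").get_text(strip=True)
news_p = first_article.find("div", class_="article_teaser_body").get_text(strip=True)

print(news_title)
print(news_p)
```

If the real page uses a different container, only the `select_one` selector should need to change.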


### Step 2
2. JPL Mars Space Images - Featured Image
* Visit the url for JPL Featured Space Image [here](https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars).
* Use splinter to navigate the site and find the image url for the current Featured Mars Image and assign the url string to a variable called `featured_image_url`.
* Make sure to find the image url to the full size `.jpg` image.
* Make sure to save a complete url string for this image.

In [22]:
# Step 2
# Scrape the site for the featured photo and build its complete URL
# In my Flask application, this code will be merged into the previous cell, above the browser.quit() call
# Scrape the image path
browser = init_browser()
url = 'https://spaceimages-mars.com/'
browser.visit(url)

# Pause briefly so the page can finish rendering before its HTML is read
time.sleep(1)

# Scrape page into Soup
html = browser.html
soup = bs(html, "html.parser")

# Scrape the image path from the header image's src attribute
# image_name = soup.find('div', class_='floating_text_area')

image_name = soup.find('img', class_= 'headerimage fade-in')['src']
print(image_name)

# Close the browser after scraping
browser.quit()



image/featured/mars1.jpg
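
The cells in this notebook pause with a fixed `time.sleep(1)` before reading `browser.html`. A fixed delay can be too short on a slow connection and wasteful on a fast one. A minimal alternative is to poll for a condition until a timeout expires. `wait_until` is a hypothetical helper, and the condition used here is a stand-in; with a real browser the predicate might call something like Splinter's `browser.is_element_present_by_css(...)`, which also accepts its own `wait_time` argument.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.1):
    """Poll `predicate` until it returns a truthy value or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    # One last check once the deadline has passed
    return bool(predicate())


# Stand-in condition: becomes true on the third check
calls = {"count": 0}


def page_ready():
    calls["count"] += 1
    return calls["count"] >= 3


ready = wait_until(page_ready, timeout=2.0, interval=0.01)
print(ready)
```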


In [23]:
featured_image_url = url + image_name
print(f"The image complete url for the featured image is: {featured_image_url}")

The image complete url for the featured image is: https://spaceimages-mars.com/image/featured/mars1.jpg
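
Concatenating `url + image_name` works here because `url` ends with a slash and `image_name` does not begin with one. As a sketch of a more forgiving alternative, the standard library's `urllib.parse.urljoin` resolves a relative path against a base URL in either case. The values are copied from the cells above; `base_url` and `same_url` are names introduced for this example.

```python
from urllib.parse import urljoin

base_url = "https://spaceimages-mars.com/"
image_name = "image/featured/mars1.jpg"

# Resolve the relative image path against the site root
featured_image_url = urljoin(base_url, image_name)
print(featured_image_url)

# A leading slash on the path resolves to the same URL
same_url = urljoin(base_url, "/" + image_name)
print(same_url == featured_image_url)
```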


### Step 3
3. Mars Facts
* Visit the Mars Facts webpage [here](https://space-facts.com/mars/) and use Pandas to scrape the table containing facts about the planet including Diameter, Mass, etc.
* Use Pandas to convert the data to an HTML table string.

In [26]:
# Step 3
# Scrape the facts table with pandas and convert it to HTML
# Visit https://space-facts.com/mars/

# URL
url = 'https://space-facts.com/mars/'

# Use pandas to read every table on the page into a list of DataFrames
tables = pd.read_html(url)
tables


[                      0                              1
 0  Equatorial Diameter:                       6,792 km
 1       Polar Diameter:                       6,752 km
 2                 Mass:  6.39 × 10^23 kg (0.11 Earths)
 3                Moons:            2 (Phobos & Deimos)
 4       Orbit Distance:       227,943,824 km (1.38 AU)
 5         Orbit Period:           687 days (1.9 years)
 6  Surface Temperature:                   -87 to -5 °C
 7         First Record:              2nd millennium BC
 8          Recorded By:           Egyptian astronomers,
   Mars - Earth Comparison             Mars            Earth
 0               Diameter:         6,779 km        12,742 km
 1                   Mass:  6.39 × 10^23 kg  5.97 × 10^24 kg
 2                  Moons:                2                1
 3      Distance from Sun:   227,943,824 km   149,598,262 km
 4         Length of Year:   687 Earth days      365.24 days
 5            Temperature:     -87 to -5 °C      -88 to 58°C,
           

In [27]:
# Select the first table (index 0), which holds the Mars facts
mars_df = tables[0]
mars_df

| | 0 | 1 |
|---|---|---|
| 0 | Equatorial Diameter: | 6,792 km |
| 1 | Polar Diameter: | 6,752 km |
| 2 | Mass: | 6.39 × 10^23 kg (0.11 Earths) |
| 3 | Moons: | 2 (Phobos & Deimos) |
| 4 | Orbit Distance: | 227,943,824 km (1.38 AU) |
| 5 | Orbit Period: | 687 days (1.9 years) |
| 6 | Surface Temperature: | -87 to -5 °C |
| 7 | First Record: | 2nd millennium BC |
| 8 | Recorded By: | Egyptian astronomers |


In [29]:
# Set the column names for mars_df
mars_df.columns = ['Mars_Planet_Profile', 'Values']
mars_df

| | Mars_Planet_Profile | Values |
|---|---|---|
| 0 | Equatorial Diameter: | 6,792 km |
| 1 | Polar Diameter: | 6,752 km |
| 2 | Mass: | 6.39 × 10^23 kg (0.11 Earths) |
| 3 | Moons: | 2 (Phobos & Deimos) |
| 4 | Orbit Distance: | 227,943,824 km (1.38 AU) |
| 5 | Orbit Period: | 687 days (1.9 years) |
| 6 | Surface Temperature: | -87 to -5 °C |
| 7 | First Record: | 2nd millennium BC |
| 8 | Recorded By: | Egyptian astronomers |


In [30]:
# Convert the DataFrame to an HTML table string, without the header row or the index
html_table = mars_df.to_html(header=False, index=False)
# str.replace returns a new string, so assign the result back to keep the change
html_table = html_table.replace('\n', '')
html_table

'<table border="1" class="dataframe">  <tbody>    <tr>      <td>Equatorial Diameter:</td>      <td>6,792 km</td>    </tr>    <tr>      <td>Polar Diameter:</td>      <td>6,752 km</td>    </tr>    <tr>      <td>Mass:</td>      <td>6.39 × 10^23 kg (0.11 Earths)</td>    </tr>    <tr>      <td>Moons:</td>      <td>2 (Phobos &amp; Deimos)</td>    </tr>    <tr>      <td>Orbit Distance:</td>      <td>227,943,824 km (1.38 AU)</td>    </tr>    <tr>      <td>Orbit Period:</td>      <td>687 days (1.9 years)</td>    </tr>    <tr>      <td>Surface Temperature:</td>      <td>-87 to -5 °C</td>    </tr>    <tr>      <td>First Record:</td>      <td>2nd millennium BC</td>    </tr>    <tr>      <td>Recorded By:</td>      <td>Egyptian astronomers</td>    </tr>  </tbody></table>'

In [31]:
# Write the table to a table.html file in the repository
mars_df.to_html('table.html')
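
`DataFrame.to_html` returns a string when called without a path and writes a file when given one, so the saved `table.html` uses the default settings (header and index included) rather than the options passed in the earlier cell. As a sketch, making the label column the index yields a table without the numeric row index. The rows are the first two from the output above; `indexed_df` is a name introduced here, and the `classes` value is an assumed CSS class name used only for illustration.

```python
import pandas as pd

# First two rows of the facts table, copied from the output above
mars_df = pd.DataFrame(
    {
        "Mars_Planet_Profile": ["Equatorial Diameter:", "Polar Diameter:"],
        "Values": ["6,792 km", "6,752 km"],
    }
)

# Use the labels as the index so the 0..n row numbers are not rendered
indexed_df = mars_df.set_index("Mars_Planet_Profile")

# "table table-striped" is an assumed class name, e.g. for Bootstrap styling
html_table = indexed_df.to_html(classes="table table-striped")

# Remove newlines; str.replace returns a new string, so reassign it
html_table = html_table.replace("\n", "")
print(html_table)
```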

### Step 4
4. Mars Hemispheres
* Visit the USGS Astrogeology site [here](https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars) to obtain high-resolution images for each of Mars's hemispheres.
* You will need to click each of the links to the hemispheres in order to find the image url to the full resolution image.
* Save both the image url string for the full resolution hemisphere image, and the Hemisphere title containing the hemisphere name. Use a Python dictionary to store the data using the keys `img_url` and `title`.
* Append the dictionary with the image url string and the hemisphere title to a list. This list will contain one dictionary for each hemisphere.

In [39]:
#Step 4
browser = init_browser()
url = 'https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars'
browser.visit(url)

# Pause briefly so the page can finish rendering before its HTML is read
time.sleep(1)

# Scrape page into Soup
html = browser.html
soup = bs(html, "html.parser")

# Collect the search-result items, one per hemisphere

product = soup.find('div', class_='collapsible results')
hemisphere_items = product.find_all('div', class_='item')
hemisphere_img_urls = []

# Visit each hemisphere's page and record its title and image URL

for item in hemisphere_items:
    # Wrap the lookups so one failing item does not stop the loop
    try:
        hemisphere_title = item.find('h3').text
        hemisphere_href = item.find('a')['href']
        hemisphere_url = "https://astrogeology.usgs.gov/" + hemisphere_href
        hemisphere_page = requests.get(hemisphere_url).text
        # Parse into a separate variable so the results-page soup is not overwritten
        hemisphere_soup = bs(hemisphere_page, 'html.parser')

        # Take the first link in the #wide-image download list
        hemisphere_page_img = hemisphere_soup.select('#wide-image > div > ul > li:nth-child(1) > a')
        hemisphere_img_url = hemisphere_page_img[0]['href']
        hemisphere_img_dict = {"title": hemisphere_title, "img_url": hemisphere_img_url}
        hemisphere_img_urls.append(hemisphere_img_dict)

    # Print the error and continue with the next item
    except Exception as e:
        print(e)

# Close the browser after scraping
browser.quit()

# Display each hemisphere's title and image URL
hemisphere_img_urls






[{'title': 'Cerberus Hemisphere Enhanced',
  'img_url': 'https://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif/full.jpg'},
 {'title': 'Schiaparelli Hemisphere Enhanced',
  'img_url': 'https://astropedia.astrogeology.usgs.gov/download/Mars/Viking/schiaparelli_enhanced.tif/full.jpg'},
 {'title': 'Syrtis Major Hemisphere Enhanced',
  'img_url': 'https://astropedia.astrogeology.usgs.gov/download/Mars/Viking/syrtis_major_enhanced.tif/full.jpg'},
 {'title': 'Valles Marineris Hemisphere Enhanced',
  'img_url': 'https://astropedia.astrogeology.usgs.gov/download/Mars/Viking/valles_marineris_enhanced.tif/full.jpg'}]
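
The comments in Step 1 mention wrapping this code in a `scrape()` function for a Flask application. One plausible shape for its return value is a single dictionary that gathers the results of all four steps. The function name `build_mars_data` and the key names are assumptions, not taken from the notebook. The sample values are copied from the outputs above, except for the table string, which is an empty placeholder.

```python
def build_mars_data(news_title, news_p, featured_image_url, html_table, hemispheres):
    """Bundle the results of the four scraping steps into one dictionary."""
    return {
        "news_title": news_title,
        "news_p": news_p,
        "featured_image_url": featured_image_url,
        "mars_facts_html": html_table,
        "hemispheres": hemispheres,
    }


mars_data = build_mars_data(
    news_title="Mars Now",
    news_p=(
        "The Red Planet rotorcraft will shift focus from proving flight is "
        "possible on Mars to demonstrating flight operations that future "
        "aerial craft could utilize."
    ),
    featured_image_url="https://spaceimages-mars.com/image/featured/mars1.jpg",
    # Placeholder: the real value would be the string built in Step 3
    html_table='<table border="1" class="dataframe"></table>',
    hemispheres=[
        {
            "title": "Cerberus Hemisphere Enhanced",
            "img_url": "https://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif/full.jpg",
        },
    ],
)

print(sorted(mars_data))
```

Keeping the hemispheres as a list of `title`/`img_url` dictionaries matches the structure the instructions ask for, so a template could presumably loop over it directly.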