# Web Scraping Challenge

## Part 1 - Scraping
### NASA Mars News

* Scrape the [NASA Mars News Site](https://mars.nasa.gov/news/) and collect the latest News Title and Paragraph Text. Assign the text to variables that you can reference later.


In [17]:
# Import the dependencies used throughout the notebook
import requests
from pprint import pprint
from bs4 import BeautifulSoup as bs
from datetime import datetime as dt
import pandas as pd
from splinter import Browser
import os
import re

In [2]:
# Establishing the executable path of the chromedriver
executable_path = {'executable_path': 'chromedriver.exe'}
browser = Browser('chrome', **executable_path, headless=False)

In [6]:
# Fetch the page two ways: the raw response via requests and the rendered HTML via the browser
url = 'https://mars.nasa.gov/news/'
response = requests.get(url)
browser.visit(url)
html = browser.html

In [7]:
soup_title = bs(response.text, 'html.parser')
soup_para = bs(html, 'html.parser')

title_not_clean = soup_title.find('div', class_='content_title').a.text
paragraph = soup_para.find('div', class_='article_teaser_body').text

print(title_not_clean)
print(paragraph)


Mars Helicopter Attached to NASA's Perseverance Rover

The team has learned to meet new challenges as they work remotely on the Mars mission.


In [8]:
title_clean = title_not_clean.replace("\n","")
browser.quit()

print(title_clean)
print(paragraph)

Mars Helicopter Attached to NASA's Perseverance Rover
The team has learned to meet new challenges as they work remotely on the Mars mission.
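
The `replace("\n","")` call above removes newlines but leaves any other stray whitespace (tabs, doubled spaces) in place. A minimal sketch of a more general cleanup follows; the helper name `clean_text` and the sample string are illustrative assumptions, not part of the original notebook.

```python
def clean_text(raw):
    """Collapse every run of whitespace (newlines, tabs, spaces) into one space."""
    # str.split() with no arguments splits on any whitespace and drops empty pieces
    return " ".join(raw.split())


# Hypothetical raw value, shaped like the text returned by the scrape above
sample_title = "\n\nMars Helicopter Attached to NASA's   Perseverance Rover\n\n"
print(clean_text(sample_title))
```

The same helper could be applied to the teaser paragraph so both variables are stored in a consistent form.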


### JPL Mars Space Images - Featured Image

In [9]:
from splinter import Browser
from bs4 import BeautifulSoup as bs
executable_path = {'executable_path': 'chromedriver.exe'}
browser = Browser('chrome', **executable_path, headless=False)

In [10]:
url = 'https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars'
browser.visit(url)
html = browser.html
soup = bs(html, 'html.parser')
print(soup.prettify())

<!DOCTYPE html>
<!--[if IE 9]> <html class="no-js ie ie9" lang="en"> <![endif]-->
<!--[if IE 8]> <html class="no-js ie ie8" lang="en"> <![endif]-->
<html class="js flexbox canvas canvastext webgl no-touch geolocation postmessage websqldatabase indexeddb hashchange history draganddrop websockets rgba hsla multiplebgs backgroundsize borderimage borderradius boxshadow textshadow opacity cssanimations csscolumns cssgradients cssreflections csstransforms csstransforms3d csstransitions fontface generatedcontent video audio localstorage sessionstorage webworkers applicationcache svg inlinesvg smil svgclippaths -webkit-" style="" xmlns="http://www.w3.org/1999/xhtml">
 <!-- START HEADER: "DEFAULT" -->
 <!-- Google Tag Manager -->
 <head>
  <script async="" src="https://www.google-analytics.com/analytics.js" type="text/javascript">
  </script>
  <script src="https://m.addthis.com/live/red_lojson/300lo.json?si=5e967526042d1de1&amp;bkl=0&amp;bl=1&amp;pdt=1417&amp;sid=5e967526042d1de1&amp;pub=&amp;

In [11]:
article = soup.find('article', class_='carousel_item')
print(article)

<article alt="Galaxy in Different Lights" class="carousel_item" style="background-image: url('/spaceimages/images/wallpaper/PIA18840-1920x1200.jpg');">
<div class="default floating_text_area ms-layer">
<h2 class="category_title">
</h2>
<h2 class="brand_title">
				  FEATURED IMAGE
				</h2>
<h1 class="media_feature_title">
				  Galaxy in Different Lights				</h1>
<div class="description">
</div>
<footer>
<a class="button fancybox" data-description="The comparison from NASA's Hubble telescope and Chandra X-ray Observatory highlights how different the universe can look when viewed in other wavelengths of light. M82 is located 12 million light-years away in the Ursa Major constellation." data-fancybox-group="images" data-fancybox-href="/spaceimages/images/mediumsize/PIA18840_ip.jpg" data-link="/spaceimages/details.php?id=PIA18840" data-title="Galaxy in Different Lights" id="full_image">
					FULL IMAGE
				  </a>
</footer>
</div>
<div class="gradient_container_top"></div>
<div class="gra

In [12]:
image_extension  = article['style'].split("('", 1)[1].split("')")[0]
print(image_extension)

/spaceimages/images/wallpaper/PIA18840-1920x1200.jpg
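
The chained `split` calls above raise an `IndexError` if the `style` attribute ever lacks a `url('...')` value. One possible alternative, sketched here with the standard library only, uses a regular expression and `urllib.parse.urljoin` to build an absolute URL. The function name `background_image_url` and the constant `BASE_URL` are assumptions introduced for illustration.

```python
import re
from urllib.parse import urljoin

# Assumed site root, taken from the URL visited earlier in this section
BASE_URL = 'https://www.jpl.nasa.gov'


def background_image_url(style, base=BASE_URL):
    """Return the absolute URL inside a CSS url(...) value, or None if absent."""
    # Accept url('x'), url("x"), and url(x)
    match = re.search(r"url\(\s*['\"]?([^'\")]+)['\"]?\s*\)", style)
    if match is None:
        return None
    # urljoin resolves a site-relative path against the base URL
    return urljoin(base, match.group(1).strip())


style = "background-image: url('/spaceimages/images/wallpaper/PIA18840-1920x1200.jpg');"
print(background_image_url(style))
```

Returning `None` instead of raising lets the caller decide what to do when the page layout changes.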


In [13]:
featured_img_url = f'https://www.jpl.nasa.gov{image_extension}'
browser.quit()
print(featured_img_url)

https://www.jpl.nasa.gov/spaceimages/images/wallpaper/PIA18840-1920x1200.jpg


### Mars Weather

In [14]:
url = 'https://twitter.com/marswxreport'
response = requests.get(url)

In [15]:
soup = bs(response.text, 'html.parser')
print(soup.prettify())

<!DOCTYPE html>
<html data-scribe-reduced-action-queue="true" lang="en">
 <head>
  <meta charset="utf-8"/>
  <script nonce="gL4wOJsByI/mb5idA0hvig==">
   !function(){window.initErrorstack||(window.initErrorstack=[]),window.onerror=function(r,i,n,o,t){r.indexOf("Script error.")>-1||window.initErrorstack.push({errorMsg:r,url:i,lineNumber:n,column:o,errorObj:t})}}();
  </script>
  <script id="bouncer_terminate_iframe" nonce="gL4wOJsByI/mb5idA0hvig==">
   if (window.top != window) {
  window.top.postMessage({'bouncer': true, 'event': 'complete'}, '*');
}
  </script>
  <script id="swift_action_queue" nonce="gL4wOJsByI/mb5idA0hvig==">
   !function(){function e(e){if(e||(e=window.event),!e)return!1;if(e.timestamp=(new Date).getTime(),!e.target&&e.srcElement&&(e.target=e.srcElement),document.documentElement.getAttribute("data-scribe-reduced-action-queue"))for(var t=e.target;t&&t!=document.body;){if("A"==t.tagName)return;t=t.parentNode}return i("all",o(e)),a(e)?(document.addEventListener||(e=o(

In [22]:
weather = soup.find('p', class_='tweet-text').text
weather_update = re.sub(r'\w+:\/{2}[\d\w-]+(\.[\d\w-]+)*(?:(?:\/[^\s/]*))*', '', weather)
print(weather_update)

“It's classic, textbook NASA, We're presented with a problem and we figure out how to make things work. Mars isn't standing still for us; we're still exploring." MER Science Operations Team Chief. …
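
The pattern above only matches links that start with a scheme such as `https://`, so a link written without one (for example a `pic.twitter.com/...` media link) would stay in the text. A simpler sketch that handles both cases is shown below; `URL_PATTERN`, `strip_links`, and the sample string are made up for illustration and are not real tweet data.

```python
import re

# Matches http(s) links and scheme-less pic.twitter.com links up to the next whitespace
URL_PATTERN = re.compile(r'(?:https?://|pic\.twitter\.com/)\S+')


def strip_links(text):
    """Remove links from the text and tidy the whitespace left behind."""
    without_links = URL_PATTERN.sub('', text)
    return " ".join(without_links.split())


# Made-up sample text, only to exercise the pattern
sample = "Example tweet text https://t.co/abc123 more text pic.twitter.com/xyz789"
print(strip_links(sample))
```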


### Mars Facts

In [23]:
executable_path = {'executable_path': 'chromedriver.exe'}
browser = Browser('chrome', **executable_path, headless=False)

In [24]:
url = 'https://space-facts.com/mars/'
browser.visit(url)
html = browser.html

In [25]:
# pd.read_html parses every <table> on the page into a DataFrame; index 2 is the table used here
mars_facts = pd.read_html(html)[2]
mars_facts.columns = ['', 'Mars Values']
mars_facts.set_index('', inplace=True)
browser.quit()
mars_facts

|  | Mars Values |
| --- | --- |
| Equatorial Diameter: | 6,792 km |
| Polar Diameter: | 6,752 km |
| Mass: | 6.39 × 10^23 kg (0.11 Earths) |
| Moons: | 2 (Phobos & Deimos) |
| Orbit Distance: | 227,943,824 km (1.38 AU) |
| Orbit Period: | 687 days (1.9 years) |
| Surface Temperature: | -87 to -5 °C |
| First Record: | 2nd millennium BC |
| Recorded By: | Egyptian astronomers |


In [26]:
mars_facts.to_html(os.path.join('templates','mars_facts.html'))
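
Writing to `templates/mars_facts.html` fails with `FileNotFoundError` if the `templates` folder does not exist yet. A small sketch of a guard is given below, using only the standard library; `ensure_parent_dir` is a hypothetical helper, and the placeholder HTML string stands in for the real table so the example runs without pandas.

```python
import os
import tempfile


def ensure_parent_dir(path):
    """Create the parent directory of `path` if it is missing, then return `path`."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return path


# Demonstrated inside a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    out_path = ensure_parent_dir(os.path.join(tmp, 'templates', 'mars_facts.html'))
    with open(out_path, 'w', encoding='utf-8') as fh:
        fh.write('<table></table>')  # placeholder content, not the real table
    wrote_file = os.path.isfile(out_path)

print(wrote_file)
```

In the notebook, the returned path could be passed straight to `mars_facts.to_html(...)`.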

### Mars Hemispheres


In [27]:
executable_path = {'executable_path': 'chromedriver.exe'}
browser = Browser('chrome', **executable_path, headless=False)

In [30]:
url = 'https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars'
browser.visit(url)
html = browser.html

In [33]:
soup = bs(html, 'html.parser')

In [34]:
results_item_list = soup.find('div', class_='collapsible results').find_all('div', class_='item')
print(results_item_list)

[<div class="item"><a class="itemLink product-item" href="/search/map/Mars/Viking/cerberus_enhanced"><img alt="Cerberus Hemisphere Enhanced thumbnail" class="thumb" src="/cache/images/dfaf3849e74bf973b59eb50dab52b583_cerberus_enhanced.tif_thumb.png"/></a><div class="description"><a class="itemLink product-item" href="/search/map/Mars/Viking/cerberus_enhanced"><h3>Cerberus Hemisphere Enhanced</h3></a><span class="subtitle" style="float:left">image/tiff 21 MB</span><span class="pubDate" style="float:right"></span><br/><p>Mosaic of the Cerberus hemisphere of Mars projected into point perspective, a view similar to that which one would see from a spacecraft. This mosaic is composed of 104 Viking Orbiter images acquired…</p></div> <!-- end description --></div>, <div class="item"><a class="itemLink product-item" href="/search/map/Mars/Viking/schiaparelli_enhanced"><img alt="Schiaparelli Hemisphere Enhanced thumbnail" class="thumb" src="/cache/images/7677c0a006b83871b5a2f66985ab5857_schiapar

In [35]:
hemisphere_image_urls = []

for h in results_item_list:
    # Title comes from the item's <h3>, with the trailing " Enhanced" removed
    title = h.find('h3', class_=None).text
    title = title.replace(" Enhanced", "")
    # Follow the item's link to its detail page
    url = h.find('a')['href']
    url_1 = f'https://astrogeology.usgs.gov{url}'
    browser.visit(url_1)
    html = browser.html
    soup_image = bs(html, 'html5lib')
    # The first link inside the "downloads" block points to the full-size JPG
    img_url = soup_image.find("div", class_="downloads").a['href']
    hemisphere_image_urls.append({"title": title, "img_url": img_url})

browser.quit()

In [36]:
hemisphere_image_urls

[{'title': 'Cerberus Hemisphere',
  'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif/full.jpg'},
 {'title': 'Schiaparelli Hemisphere',
  'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/schiaparelli_enhanced.tif/full.jpg'},
 {'title': 'Syrtis Major Hemisphere',
  'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/syrtis_major_enhanced.tif/full.jpg'},
 {'title': 'Valles Marineris Hemisphere',
  'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/valles_marineris_enhanced.tif/full.jpg'}]
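
The per-item cleanup in the loop can also be expressed as a small pure function, which is easier to check without opening a browser. This is only a sketch: `hemisphere_record` and `USGS_BASE` are names assumed for illustration. It trims the suffix only when it appears at the end of the title and passes the link through `urljoin`, which leaves absolute URLs unchanged and resolves relative ones against the base.

```python
from urllib.parse import urljoin

# Assumed site root, matching the search URL used above
USGS_BASE = 'https://astrogeology.usgs.gov'


def hemisphere_record(raw_title, img_href, base=USGS_BASE):
    """Build one {'title', 'img_url'} dictionary from scraped values."""
    title = raw_title.strip()
    suffix = ' Enhanced'
    # Remove the suffix only when it is at the end of the title
    if title.endswith(suffix):
        title = title[:-len(suffix)]
    return {'title': title, 'img_url': urljoin(base, img_href)}


record = hemisphere_record(
    'Cerberus Hemisphere Enhanced',
    'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif/full.jpg',
)
print(record)
```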