# Mission to Mars


In this assignment, you will build a web application that scrapes various websites for data related to the Mission to Mars and displays the information in a single HTML page. The following outlines what you need to do.

## Step 1 - Scraping

Complete your initial scraping using Jupyter Notebook, BeautifulSoup, Pandas, and Requests/Splinter.

* Create a Jupyter Notebook file called `mission_to_mars.ipynb` and use this to complete all of your scraping and analysis tasks. The following outlines what you need to scrape.

### NASA Mars News

* Scrape the [NASA Mars News Site](https://mars.nasa.gov/news/) and collect the latest News Title and Paragraph Text. Assign the text to variables that you can reference later.


In [1]:
# Dependencies
import pymongo
from bs4 import BeautifulSoup as bs
import requests
from splinter import Browser
from splinter.exceptions import ElementDoesNotExist
import time
import pandas as pd
import scrape_mars

# Open and execute path
executable_path = {'executable_path': 'chromedriver'}
browser = Browser('chrome', **executable_path, headless=True)


In [None]:
mars_data = {}

In [None]:
# URL of page to be scraped
url = 'https://mars.nasa.gov/news/?page=0&per_page=40&order=publish_date+desc%2Ccreated_at+desc&search=&category=19%2C165%2C184%2C204&blank_scope=Latest'


In [None]:
# Browser visit
browser.visit(url)


In [None]:
# Create BeautifulSoup object; parse with 'html.parser'
html = browser.html
soup = bs(html, 'html.parser')


In [None]:
# Examine the results, then determine element that contains sought info
# print(soup.prettify())


In [None]:
# Collect the latest News Title and Paragraph Text
# Assign the text to variables
news_title = soup.find("div", class_='content_title').text
mars_data['news_title'] = news_title
# print(news_title)

In [None]:
mars_data

In [None]:
# news_p = soup.find("div", class_='rollover_description_inner').text
news_p = soup.find("div", class_='article_teaser_body').text
print(news_p)

### JPL Mars Space Images - Featured Image

* Visit the url for JPL Featured Space Image [here](https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars).

* Use Splinter to navigate the site, find the image URL for the current Featured Mars Image, and assign the URL string to a variable called `featured_image_url`.

* Make sure to find the image url to the full size `.jpg` image.

* Make sure to save a complete url string for this image.

```python
# Example:
featured_image_url = 'https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA16225_hires.jpg'
```
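
If the `src` attribute you scrape turns out to be a relative path, one way to build the complete URL is `urllib.parse.urljoin` from the standard library. This is a minimal sketch: `base_url` and `relative_src` are illustrative values shaped like the example above, not values read from the live page.

```python
from urllib.parse import urljoin

# Assumed site root for the JPL pages (illustrative)
base_url = "https://www.jpl.nasa.gov"

# Hypothetical relative path, as it might appear in an <img src="..."> attribute
relative_src = "/spaceimages/images/largesize/PIA16225_hires.jpg"

# urljoin resolves the relative path against the base URL
featured_image_url = urljoin(base_url, relative_src)
print(featured_image_url)

# A src that is already absolute is returned unchanged
already_absolute = urljoin(base_url, "https://example.com/image.jpg")
print(already_absolute)
```

Compared with plain string concatenation, `urljoin` avoids doubled or missing slashes and leaves absolute URLs untouched.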

In [None]:
# URL of page to be scraped
url2 = 'https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars'


In [None]:
# Browser visit
browser.visit(url2)


In [None]:
# Click lead image
browser.click_link_by_partial_text('FULL IMAGE')
time.sleep(2)
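
A fixed `time.sleep(2)` can be too short on a slow connection and wasteful on a fast one. As an alternative, the sketch below polls a condition until it holds or a timeout expires. `wait_until` is a hypothetical helper, not part of Splinter, and the demo predicate only simulates a page that becomes ready on the third check.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate() repeatedly until it returns True or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Demo: a stand-in condition that succeeds on the third check
attempts = {"count": 0}


def ready_on_third_try():
    attempts["count"] += 1
    return attempts["count"] >= 3


result = wait_until(ready_on_third_try, timeout=2.0, interval=0.01)
print(result, attempts["count"])

# In the notebook, the predicate could wrap a browser check, for example:
# wait_until(lambda: browser.is_text_present('more info'))
```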


In [None]:
# Click again for full image
browser.click_link_by_partial_text('more info')

In [None]:
# Design an XPATH selector to grab the featured image
xpath = '//figure//a'

In [None]:
# Use splinter to click the featured image and bring up the full resolution image
results = browser.find_by_xpath(xpath)
img = results[0]
img.click()

In [None]:
# Retrieve final image URL
html = browser.html
soup = bs(html, 'html.parser')
featured_image_url = soup.find("img")["src"]
featured_image_url

### Mars Weather

* Visit the Mars Weather Twitter account [here](https://twitter.com/marswxreport?lang=en) and scrape the latest Mars weather tweet from the page. Save the tweet text for the weather report as a variable called `mars_weather`.

```python
# Example:
mars_weather = 'Sol 1801 (Aug 30, 2017), Sunny, high -21C/-5F, low -80C/-112F, pressure at 8.82 hPa, daylight 06:09-17:55'
```

In [None]:
# URL of page to be scraped
url3 = 'https://twitter.com/marswxreport?lang=en'

In [None]:
# Browser visit
browser.visit(url3)
html = browser.html
# Create soup object
soups = bs(html, 'html.parser')


In [None]:
# mars_weather = soups.find("li", class_="js-stream-item").find("p", class_="TweetTextSize").text

In [None]:
# Scrape first tweet
mars_weather = soups.find("li", class_="js-stream-item").find("p", class_="tweet-text").text
print(mars_weather)
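
The scraped tweet text can include line breaks and a trailing `pic.twitter.com/...` link fused onto the last word. The sketch below shows one way to tidy it with the standard `re` module. `clean_tweet_text` is a hypothetical helper, and the sample string is a weather tweet captured in April 2019.

```python
import re


def clean_tweet_text(text):
    """Drop any pic.twitter.com link and collapse all whitespace to single spaces."""
    without_link = re.sub(r"pic\.twitter\.com/\S+", "", text)
    return " ".join(without_link.split())


raw = (
    "InSight sol 144 (2019-04-23) low -98.7ºC (-145.7ºF) high -17.6ºC (0.4ºF)\n"
    "winds from the SW at 4.2 m/s (9.5 mph) gusting to 11.1 m/s (24.8 mph)\n"
    "pressure at 7.40 hPapic.twitter.com/ZbFNWx1Eq6"
)

cleaned = clean_tweet_text(raw)
print(cleaned)
```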

### Mars Facts

* Visit the Mars Facts webpage [here](http://space-facts.com/mars/) and use Pandas to scrape the table containing facts about the planet, including Diameter, Mass, etc.

* Use Pandas to convert the data to an HTML table string.

In [None]:
# URL to scrape
url4='http://space-facts.com/mars/'


In [None]:
# Use read_html to read the data
tables = pd.read_html(url4)
tables


In [None]:
type(tables)


In [None]:
# Slice off DataFrame using normal indexing
df = tables[0]
df.columns = ['0', '1']
df.head()


In [None]:
# Rename columns so that they make sense
# df.rename(index=str, columns={"0": "Description", "1": "Mars Facts"})
df.columns = ['Metric', 'Value']
df

In [None]:
# Set the Metric column as the index
# (with inplace=True, set_index returns None, so do not assign the result)
df.set_index('Metric', inplace=True)


In [None]:
table_html = df
table_html

In [None]:
# Generate HTML tables from Pandas using to_html method
table_html = df.to_html(classes="table table-striped")


In [None]:
# Clean up table
table_html = table_html.replace('\n', '')
table_html

In [None]:
# Save to a file
df.to_html('tables.html')
print(table_html)


In [None]:
# macOS users can run this to open the file in a browser,
# or you can manually find the file and open it in the browser
!open tables.html

### Mars Hemispheres

* Visit the USGS Astrogeology site [here](https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars) to obtain high resolution images for each of Mars' hemispheres.

* You will need to click each of the links to the hemispheres in order to find the image url to the full resolution image.

* Save both the image url string for the full resolution hemisphere image, and the Hemisphere title containing the hemisphere name. Use a Python dictionary to store the data using the keys `img_url` and `title`.

* Append the dictionary with the image url string and the hemisphere title to a list. This list will contain one dictionary for each hemisphere.


In [None]:
# URL of page to be scraped
url5 = 'https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars'

# Browser visit
browser.visit(url5)

# Create a Beautiful Soup object
html = browser.html
soup = bs(html, 'html.parser')

# Child website links for each hemisphere
base_url = "https://astrogeology.usgs.gov"
links = [base_url + item.find(class_="description").a["href"] for item in soup.find_all("div", class_="item")]


In [None]:
# Extract hemisphere title and web URL for each image
hemisphere_image_urls = []

for url in links:
    
    # from url to soup
    browser.visit(url)
    html = browser.html
    soup = bs(html, 'html.parser')
    
    # Extract data
    title = soup.find("div", class_="content").find("h2", class_="title").text.replace(" Enhanced", "")
    img_url = base_url + soup.find("img", class_="wide-image")["src"]
    
    # Store in list
    hemisphere_image_urls.append({"title": title, "img_url": img_url})


In [None]:
hemisphere_image_urls

In [None]:
# Quit browser
browser.quit()


## Step 2 - MongoDB and Flask Application

Use MongoDB with Flask templating to create a new HTML page that displays all of the information that was scraped from the URLs above.

* Start by converting your Jupyter notebook into a Python script called `scrape_mars.py` with a function called `scrape` that will execute all of your scraping code from above and return one Python dictionary containing all of the scraped data.

* Next, create a route called `/scrape` that will import your `scrape_mars.py` script and call your `scrape` function.

  * Store the return value in Mongo as a Python dictionary.

* Create a root route `/` that will query your Mongo database and pass the mars data into an HTML template to display the data.

* Create a template HTML file called `index.html` that will take the mars data dictionary and display all of the data in the appropriate HTML elements. Use the following as a guide for what the final product should look like, but feel free to create your own design.

<strong> Please refer to scrape_mars.py, app.py, and index.html for this part </strong>


![final_app_part1.png](Images/final_app_part1.png)
![final_app_part2.png](Images/final_app_part2.png)

- - -

## Step 3 - Submission

To submit your work to BootCampSpot, create a new GitHub repository and upload the following:

1. The Jupyter Notebook containing the scraping code used.

2. Screenshots of your final application.

3. Submit the link to your new repository to BootCampSpot.

## Hints

* Use Splinter to navigate the sites when needed and BeautifulSoup to help find and parse out the necessary data.

* Use PyMongo for CRUD operations on your database. For this homework, you can simply overwrite the existing document each time the `/scrape` URL is visited and new data is obtained.

* Use Bootstrap to structure your HTML template.

In [2]:
print(scrape_mars.scrape())

visiting https://mars.nasa.gov/news/?page=0&per_page=40&order=publish_date+desc%2Ccreated_at+desc&search=&category=19%2C165%2C184%2C204&blank_scope=Latest
NASA's social media presence, the InSight mission social media accounts, NASA.gov and SolarSystem.NASA.gov will be honored at the 2019 Webby Awards - "the Oscars of the Internet."
visiting: https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars
visiting https://twitter.com/marswxreport?lang=en
InSight sol 144 (2019-04-23) low -98.7ºC (-145.7ºF) high -17.6ºC (0.4ºF)
winds from the SW at 4.2 m/s (9.5 mph) gusting to 11.1 m/s (24.8 mph)
pressure at 7.40 hPapic.twitter.com/ZbFNWx1Eq6
visiting https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars
getting mars data...
{'featured_image_url': 'https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA19083_hires.jpg',
 'hemisphere_image_urls': [{'img_url': 'https://astrogeology.usgs.gov/cache/images/cfa62af2557222a02478f1fcd781d445_cerberus_enhanced.tif_ful