# Step 1 - Scraping

Complete your initial scraping using Jupyter Notebook, BeautifulSoup, Pandas, and Requests/Splinter.

* Create a Jupyter Notebook file called `mission_to_mars.ipynb` and use this to complete all of your scraping and analysis tasks. The following outlines what you need to scrape.

In [2]:
import pandas as pd
from bs4 import BeautifulSoup as bs
import requests
import os
import pymongo
import time
from webdriver_manager.chrome import ChromeDriverManager
from splinter import Browser


In [3]:
executable_path = {'executable_path': ChromeDriverManager().install()}
browser = Browser('chrome', **executable_path, headless=False)




[WDM] - Current google-chrome version is 103.0.5060
[WDM] - Get LATEST chromedriver version for 103.0.5060 google-chrome
[WDM] - There is no [win32] chromedriver for browser 103.0.5060 in cache
[WDM] - About to download new driver from https://chromedriver.storage.googleapis.com/103.0.5060.134/chromedriver_win32.zip
[WDM] - Driver has been saved in cache [C:\Users\carly\.wdm\drivers\chromedriver\win32\103.0.5060.134]


# NASA Mars News

* Scrape the [Mars News Site](https://redplanetscience.com) and collect the latest News Title and Paragraph Text. Assign the text to variables that you can reference later.

```python
# Example:
news_title = "NASA's Next Mars Mission to Investigate Interior of Red Planet"

news_p = "Preparation of NASA's next spacecraft to Mars, InSight, has ramped up this summer, on course for launch next May from Vandenberg Air Force Base in central California -- the first interplanetary launch in history from America's West Coast."
```

In [113]:
# URL of page to be scraped
news_url = 'https://redplanetscience.com/'
browser.visit(news_url)

# Create BeautifulSoup object; parse with 'html.parser'
news_soup = bs(browser.html, 'html.parser')

# Select the first `div.list_text`, the parent division of the latest article
parent_division_text = news_soup.select_one('div.list_text')

In [114]:
# Scrape the article title (the first `content_title` div inside the parent)
content_title_results = parent_division_text.find('div', class_="content_title")
content_title_results

<div class="content_title">MOXIE Could Help Future Rockets Launch Off Mars</div>

In [115]:
# Scrape the article teaser paragraph (the `article_teaser_body` div)
article_teaser_results = parent_division_text.find('div', class_="article_teaser_body")
article_teaser_results

<div class="article_teaser_body">NASA's Perseverance rover carries a device to convert Martian air into oxygen that, if produced on a larger scale, could be used not just for breathing, but also for fuel.</div>
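
The two results above are still BeautifulSoup `Tag` objects, while the task asks for plain text in `news_title` and `news_p`. A minimal sketch of that last step follows. It assumes the markup matches the outputs shown above; the `sample_html` string is assembled by hand from those outputs rather than fetched from the site.

```python
from bs4 import BeautifulSoup as bs

# Sample markup rebuilt by hand from the cell outputs above (an assumption,
# not a live download). The wrapping div mirrors the `div.list_text` selector.
sample_html = """
<div class="list_text">
  <div class="content_title">MOXIE Could Help Future Rockets Launch Off Mars</div>
  <div class="article_teaser_body">NASA's Perseverance rover carries a device to convert Martian air into oxygen that, if produced on a larger scale, could be used not just for breathing, but also for fuel.</div>
</div>
"""

sample_soup = bs(sample_html, 'html.parser')
parent = sample_soup.select_one('div.list_text')

# get_text(strip=True) returns the text content without surrounding whitespace
news_title = parent.find('div', class_='content_title').get_text(strip=True)
news_p = parent.find('div', class_='article_teaser_body').get_text(strip=True)

print(news_title)
print(news_p)
```

In the notebook itself, the same `.get_text(strip=True)` call could be applied to `content_title_results` and `article_teaser_results`.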

In [116]:
browser.quit()

# JPL Mars Space Images - Featured Image

* Visit the url for the Featured Space Image page [here](https://spaceimages-mars.com).

* Use splinter to navigate the site and find the image url for the current Featured Mars Image and assign the url string to a variable called `featured_image_url`.

* Make sure to find the image url to the full size `.jpg` image.

* Make sure to save a complete url string for this image.

```python
# Example:
featured_image_url = 'https://spaceimages-mars.com/image/featured/mars2.jpg'
```

In [4]:
executable_path = {'executable_path': ChromeDriverManager().install()}
browser = Browser('chrome', **executable_path, headless=False)




[WDM] - Current google-chrome version is 103.0.5060
[WDM] - Get LATEST chromedriver version for 103.0.5060 google-chrome
[WDM] - Driver [C:\Users\carly\.wdm\drivers\chromedriver\win32\103.0.5060.134\chromedriver.exe] found in cache


In [5]:
# URL of page to be scraped
url_image = 'https://spaceimages-mars.com/'
browser.visit(url_image)

In [6]:
# Click the 'FULL IMAGE' link to reveal the full-size image
browser.links.find_by_partial_text('FULL IMAGE').click()

In [7]:
# Parse the updated page and read the `src` of the full-size image
html_image = browser.html
image_soup = bs(html_image, 'html.parser')

featured_image = image_soup.find('img', class_='fancybox-image')['src']

In [8]:
featured_image_url = url_image + featured_image
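
Plain concatenation works here only because `url_image` ends with a slash. As a hedged alternative, the standard library's `urljoin` builds the same URL whether or not the base has a trailing slash. The relative path below is a hypothetical value shaped like the example URL in the task, not a value captured from the notebook.

```python
from urllib.parse import urljoin

# Hypothetical relative path, shaped like the scraped `src` value
featured_image = 'image/featured/mars2.jpg'

# urljoin gives the same result with or without a trailing slash on the base
with_slash = urljoin('https://spaceimages-mars.com/', featured_image)
without_slash = urljoin('https://spaceimages-mars.com', featured_image)

# Plain concatenation silently breaks when the trailing slash is missing
naive = 'https://spaceimages-mars.com' + featured_image

print(with_slash)
print(without_slash)
print(naive)
```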

In [9]:
browser.quit()

# Mars Facts

* Visit the Mars Facts webpage [here](https://galaxyfacts-mars.com) and use Pandas to scrape the table containing facts about the planet including Diameter, Mass, etc.

* Use Pandas to convert the data to a HTML table string.



In [52]:
facts_url = 'https://galaxyfacts-mars.com/'

In [53]:
# Use pandas' `read_html` to parse every table on the page into a list of DataFrames
tables = pd.read_html(facts_url)

In [67]:
# Select the relevant DataFrame: the second table on the page (index 1)
mars_facts_df = pd.DataFrame(tables[1])

In [70]:
# Change column headers
mars_facts_df.columns = ['Measures', 'Mars Planet Profile']

In [72]:
# Drop the first row and set the index to the `Measures` column
mars_facts_df = mars_facts_df.iloc[1:]
mars_facts_df.set_index('Measures', inplace=True)
mars_facts_df

| Measures | Mars Planet Profile |
| --- | --- |
| Polar Diameter: | 6,752 km |
| Mass: | 6.39 × 10^23 kg (0.11 Earths) |
| Moons: | 2 ( Phobos & Deimos ) |
| Orbit Distance: | 227,943,824 km (1.38 AU) |
| Orbit Period: | 687 days (1.9 years) |
| Surface Temperature: | -87 to -5 °C |
| First Record: | 2nd millennium BC |
| Recorded By: | Egyptian astronomers |


In [73]:
mars_facts_html = mars_facts_df.to_html()

In [76]:
print(mars_facts_html)

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Mars Planet Profile</th>
    </tr>
    <tr>
      <th>Measures</th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Polar Diameter:</th>
      <td>6,752 km</td>
    </tr>
    <tr>
      <th>Mass:</th>
      <td>6.39 × 10^23 kg (0.11 Earths)</td>
    </tr>
    <tr>
      <th>Moons:</th>
      <td>2 ( Phobos &amp; Deimos )</td>
    </tr>
    <tr>
      <th>Orbit Distance:</th>
      <td>227,943,824 km (1.38 AU)</td>
    </tr>
    <tr>
      <th>Orbit Period:</th>
      <td>687 days (1.9 years)</td>
    </tr>
    <tr>
      <th>Surface Temperature:</th>
      <td>-87 to -5 °C</td>
    </tr>
    <tr>
      <th>First Record:</th>
      <td>2nd millennium BC</td>
    </tr>
    <tr>
      <th>Recorded By:</th>
      <td>Egyptian astronomers</td>
    </tr>
  </tbody>
</table>
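
If the HTML string is later embedded in a styled page, `to_html` accepts a `classes` argument that appends CSS classes to the default `dataframe` class. The sketch below assumes a hand-built two-row frame that mirrors the output above. The Bootstrap-style class names are an assumption about the target page, not something the assignment requires.

```python
import pandas as pd

# Two rows copied from the table above, rebuilt by hand for illustration
sample_df = pd.DataFrame(
    {'Mars Planet Profile': ['6,752 km', '2 ( Phobos & Deimos )']},
    index=pd.Index(['Polar Diameter:', 'Moons:'], name='Measures'),
)

# `classes` adds CSS classes alongside the default "dataframe" class
styled_html = sample_df.to_html(classes='table table-striped')

# Optional: strip newlines so the string embeds cleanly in a template
compact_html = styled_html.replace('\n', '')

print(compact_html)
```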


# Mars Hemispheres

* Visit the Astrogeology site [here](https://marshemispheres.com) to obtain high resolution images for each of Mars's hemispheres.

* You will need to click each of the links to the hemispheres in order to find the image url to the full resolution image.

* Save both the image url string for the full resolution hemisphere image, and the Hemisphere title containing the hemisphere name. Use a Python dictionary to store the data using the keys `img_url` and `title`.

* Append the dictionary with the image url string and the hemisphere title to a list. This list will contain one dictionary for each hemisphere.

```python
# Example:
hemisphere_image_urls = [
    {"title": "Valles Marineris Hemisphere", "img_url": "..."},
    {"title": "Cerberus Hemisphere", "img_url": "..."},
    {"title": "Schiaparelli Hemisphere", "img_url": "..."},
    {"title": "Syrtis Major Hemisphere", "img_url": "..."},
]
```

In [11]:
executable_path = {'executable_path': ChromeDriverManager().install()}
browser = Browser('chrome', **executable_path, headless=False)

hemisphere_url = 'https://marshemispheres.com/'
browser.visit(hemisphere_url)

hemisphere_image_urls = []




[WDM] - Current google-chrome version is 103.0.5060
[WDM] - Get LATEST chromedriver version for 103.0.5060 google-chrome
[WDM] - Driver [C:\Users\carly\.wdm\drivers\chromedriver\win32\103.0.5060.134\chromedriver.exe] found in cache


In [12]:

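# Visit each of the four hemisphere pages in turn
for x in range(0, 4):
    # Look up the thumbnails fresh on every pass, then click the x-th one
    browser.find_by_css('img.thumb')[x].click()

    # Parse the hemisphere detail page
    hemisphere_soup = bs(browser.html, 'html.parser')

    # Build the absolute URL of the full-resolution image
    hemisphere_image = hemisphere_soup.find('img', class_='wide-image')['src']
    hemisphere_image_url = hemisphere_url + hemisphere_image

    # Grab the hemisphere title
    hemisphere_title = hemisphere_soup.find('h2', class_='title').text

    # Store the pair under the required keys and add it to the list
    title_and_url = {"title": hemisphere_title,
                     "img_url": hemisphere_image_url}

    hemisphere_image_urls.append(title_and_url)

    # Return to the index page for the next thumbnail
    browser.links.find_by_partial_text('Back').click()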

In [13]:
hemisphere_image_urls

[{'title': 'Cerberus Hemisphere Enhanced',
  'img_url': 'https://marshemispheres.com/images/f5e372a36edfa389625da6d0cc25d905_cerberus_enhanced.tif_full.jpg'},
 {'title': 'Schiaparelli Hemisphere Enhanced',
  'img_url': 'https://marshemispheres.com/images/3778f7b43bbbc89d6e3cfabb3613ba93_schiaparelli_enhanced.tif_full.jpg'},
 {'title': 'Syrtis Major Hemisphere Enhanced',
  'img_url': 'https://marshemispheres.com/images/555e6403a6ddd7ba16ddb0e471cadcf7_syrtis_major_enhanced.tif_full.jpg'},
 {'title': 'Valles Marineris Hemisphere Enhanced',
  'img_url': 'https://marshemispheres.com/images/b3c7c6c9138f57b4756be9b9c43e3a48_valles_marineris_enhanced.tif_full.jpg'}]
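
The scraped titles end in "Enhanced", whereas the example in the task lists plain hemisphere names. If the shorter form is wanted, one possible cleanup is sketched below. It assumes every title uses the exact suffix " Enhanced"; the helper `strip_enhanced` is a hypothetical name introduced for illustration, and only two of the four entries are repeated to keep the sample short.

```python
# Two entries copied from the output above
hemisphere_image_urls = [
    {'title': 'Cerberus Hemisphere Enhanced',
     'img_url': 'https://marshemispheres.com/images/f5e372a36edfa389625da6d0cc25d905_cerberus_enhanced.tif_full.jpg'},
    {'title': 'Schiaparelli Hemisphere Enhanced',
     'img_url': 'https://marshemispheres.com/images/3778f7b43bbbc89d6e3cfabb3613ba93_schiaparelli_enhanced.tif_full.jpg'},
]


def strip_enhanced(title):
    """Remove a trailing ' Enhanced' from a title, if present (hypothetical helper)."""
    suffix = ' Enhanced'
    return title[:-len(suffix)] if title.endswith(suffix) else title


# Build new dictionaries so the scraped list stays untouched
cleaned_hemispheres = [
    {**hemisphere, 'title': strip_enhanced(hemisphere['title'])}
    for hemisphere in hemisphere_image_urls
]

for hemisphere in cleaned_hemispheres:
    print(hemisphere['title'])
```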

In [14]:
browser.quit()
