# Module 12 Challenge
## Deliverable 1: Scrape Titles and Preview Text from Mars News

In [1]:
# Import Splinter and BeautifulSoup
from splinter import Browser
from bs4 import BeautifulSoup as soup
from webdriver_manager.chrome import ChromeDriverManager
import os
import json

In [2]:
executable_path = {'executable_path': ChromeDriverManager().install()}
browser = Browser('chrome', **executable_path, headless=False)

### Step 1: Visit the Website

1. Use automated browsing to visit the [Mars NASA news site](https://redplanetscience.com). Inspect the page to identify which elements to scrape.

      > **Hint** To identify which elements to scrape, you might want to inspect the page by using Chrome DevTools.

In [3]:
# Visit the Mars NASA news site: https://redplanetscience.com
url = 'https://redplanetscience.com'
browser.visit(url)

### Step 2: Scrape the Website

Create a Beautiful Soup object and use it to extract text elements from the website.
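
As a minimal sketch of this step, the snippet below parses a small hand-written HTML string instead of the live page. The HTML and the `sample_*` names are assumptions made for illustration; only the class names `content_title` and `article_teaser_body` come from the cells that follow.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML that mimics the class names used on the news page.
sample_html = """
<div>
  <div class="content_title">First sample title</div>
  <div class="article_teaser_body">First sample preview.</div>
</div>
<div>
  <div class="content_title">Second sample title</div>
  <div class="article_teaser_body">Second sample preview.</div>
</div>
"""

# Parse the string with Python's built-in HTML parser.
sample_soup = BeautifulSoup(sample_html, "html.parser")

# find_all returns every matching <div>; get_text extracts its text content.
sample_titles = [
    div.get_text(strip=True)
    for div in sample_soup.find_all("div", class_="content_title")
]
sample_previews = [
    div.get_text(strip=True)
    for div in sample_soup.find_all("div", class_="article_teaser_body")
]

print(sample_titles)
print(sample_previews)
```

On the real site the same `find_all` calls would run against `browser.html` rather than a literal string.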

In [4]:
# Create a Beautiful Soup object

# The article data is injected into the page by JavaScript, so it is not present in the
# raw HTML that a plain requests.get(url) would return. We therefore read the HTML from
# the automated browser, after the page has rendered, and parse that instead.

# Note: the next assignment rebinds `soup` from the imported BeautifulSoup class to the
# parsed page, so re-run the import cell before re-running this cell.
html = browser.html
soup = soup(html, 'html.parser')

### Step 3: Store the Results

Extract the titles and preview text of the news articles that you scraped. Store the scraping results in Python data structures as follows:

* Store each title-and-preview pair in a Python dictionary with two keys: `title` and `preview`. For example:

  ```python
  {'title': "Mars Rover Begins Mission!",
   'preview': "NASA's Mars Rover begins a multiyear mission to collect data about the little-explored planet."}
  ```

* Store all the dictionaries in a Python list.

* Print the list in your notebook.
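
The storage format described above can be sketched as follows. The sample strings and the `sample_*` names are placeholders chosen for illustration, not scraped data, and `zip` is only one possible way to pair the two lists.

```python
# Hypothetical sample data standing in for the scraped text.
sample_titles = ["First sample title", "Second sample title"]
sample_previews = ["First sample preview.", "Second sample preview."]

# zip() pairs the i-th title with the i-th preview; each pair becomes one dictionary.
sample_stories = [
    {"title": title, "preview": preview}
    for title, preview in zip(sample_titles, sample_previews)
]

# Print the list of dictionaries.
print(sample_stories)
```

Note that `zip` stops at the shorter input, so a mismatch between the number of titles and previews would silently drop the extra items rather than raise an error.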

In [5]:
# get news story titles

store_nasa_news_titles = []

# Get the titles.
# Each title is stored in a <div class="content_title"> element.
nasa_news_titles = soup.find_all('div', class_='content_title')

for title in nasa_news_titles:
    print('-------------')
    print(title.text)
    store_nasa_news_titles.append(title.text)

-------------
NASA's Curiosity Keeps Rolling As Team Operates Rover From Home
-------------
Hear Audio From NASA's Perseverance As It Travels Through Deep Space
-------------
NASA Establishes Board to Initially Review Mars Sample Return Plans
-------------
MAVEN Maps Electric Currents around Mars that are Fundamental to Atmospheric Loss
-------------
NASA's Mars 2020 Rover Closer to Getting Its Name
-------------
8 Martian Postcards to Celebrate Curiosity's Landing Anniversary
-------------
NASA's Push to Save the Mars InSight Lander's Heat Probe
-------------
Meet the People Behind NASA's Perseverance Rover
-------------
A Martian Roundtrip: NASA's Perseverance Rover Sample Tubes
-------------
NASA's Perseverance Rover Will Peer Beneath Mars' Surface 
-------------
MOXIE Could Help Future Rockets Launch Off Mars
-------------
NASA Administrator Statement on Moon to Mars Initiative, FY 2021 Budget
-------------
Three New Views of Mars' Moon Phobos
-------------
NASA's MAVEN Explores Ma

In [6]:
# get news story previews

store_nasa_news_previews = []

# Get the article previews.
# Each preview is stored in a <div class="article_teaser_body"> element.
nasa_news_previews = soup.find_all('div', class_='article_teaser_body')

for preview in nasa_news_previews:
    print('-------------')
    print(preview.text)
    store_nasa_news_previews.append(preview.text)

-------------
The team has learned to meet new challenges as they work remotely on the Mars mission.
-------------
The first to be rigged with microphones, the agency's latest Mars rover picked up the subtle sounds of its own inner workings during interplanetary flight.
-------------
The board will assist with analysis of current plans and goals for one of the most difficult missions humanity has ever undertaken.
-------------
Five years after NASA’s MAVEN spacecraft entered into orbit around Mars, data from the mission has led to the creation of a map of electric current systems in the Martian atmosphere.
-------------
155 students from across the U.S. have been chosen as semifinalists in NASA's essay contest to name the Mars 2020 rover, and see it launch from Cape Canaveral this July.
-------------
The NASA rover touched down eight years ago, on Aug. 5, 2012, and will soon be joined by a second rover, Perseverance.
-------------
The scoop on the end of the spacecraft's robotic arm wi

In [7]:
# get news story dates

store_nasa_news_dates = []

# Get the article dates.
# Each date is stored in a <div class="list_date"> element.
nasa_news_dates = soup.find_all('div', class_='list_date')

for date in nasa_news_dates:
    store_nasa_news_dates.append(date.text)

In [8]:
# Create an empty list to store the dictionaries
list_of_dic = []

In [9]:
# Loop through the text elements
# Pair each title with its preview text and date
# Store each title, preview, and date in a dictionary
# Add the dictionary to the list
for story_title, story_preview, story_date in zip(
        store_nasa_news_titles, store_nasa_news_previews, store_nasa_news_dates):
    list_of_dic.append({'title': story_title, 'preview': story_preview, 'date': story_date})

In [10]:
# Print the list to confirm success
list_of_dic

[{'title': "NASA's Curiosity Keeps Rolling As Team Operates Rover From Home",
  'preview': 'The team has learned to meet new challenges as they work remotely on the Mars mission.',
  'date': 'December 21, 2022'},
 {'title': "Hear Audio From NASA's Perseverance As It Travels Through Deep Space",
  'preview': "The first to be rigged with microphones, the agency's latest Mars rover picked up the subtle sounds of its own inner workings during interplanetary flight.",
  'date': 'December 20, 2022'},
 {'title': 'NASA Establishes Board to Initially Review Mars Sample Return Plans',
  'preview': 'The board will assist with analysis of current plans and goals for one of the most difficult missions humanity has ever undertaken.',
  'date': 'December 19, 2022'},
 {'title': 'MAVEN Maps Electric Currents around Mars that are Fundamental to Atmospheric Loss',
  'preview': 'Five years after NASA’s MAVEN spacecraft entered into orbit around Mars, data from the mission has led to the creation of a map 

In [11]:
browser.quit()

### (Optional) Step 4: Export the Data

Optionally, store the scraped data in a file or database (to ease sharing the data with others). To do so, export the scraped data to either a JSON file or a MongoDB database.
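
A minimal sketch of the JSON option, assuming placeholder data and a temporary folder so that nothing is written to the project directory. The names `sample_stories` and `sample_news.json` are hypothetical. The file is read back afterwards to confirm that the export round-trips.

```python
import json
import os
import tempfile

# Hypothetical sample data in the same shape as the scraped results.
sample_stories = [
    {"title": "First sample title", "preview": "First sample preview."},
    {"title": "Second sample title", "preview": "Second sample preview."},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create the output folder first; open() does not create missing folders.
    out_dir = os.path.join(tmp_dir, "Output")
    os.makedirs(out_dir, exist_ok=True)
    out_path = os.path.join(out_dir, "sample_news.json")

    # Write the list of dictionaries as indented JSON.
    with open(out_path, "w", encoding="utf-8") as outfile:
        json.dump(sample_stories, outfile, indent=4)

    # Read the file back to check that nothing was lost.
    with open(out_path, encoding="utf-8") as infile:
        loaded_stories = json.load(infile)

print(loaded_stories == sample_stories)
```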

In [12]:
# Export data to JSON
nasa_news_json = json.dumps(list_of_dic, indent=4)

# Set the path for saving the JSON file and make sure the folder exists
file = os.path.join('.', 'Output', 'nasa_news.json')
os.makedirs(os.path.dirname(file), exist_ok=True)

# Write the JSON to nasa_news.json
with open(file, "w") as outfile:
    outfile.write(nasa_news_json)

# Print the JSON
print(nasa_news_json)

[
    {
        "title": "NASA's Curiosity Keeps Rolling As Team Operates Rover From Home",
        "preview": "The team has learned to meet new challenges as they work remotely on the Mars mission.",
        "date": "December 21, 2022"
    },
    {
        "title": "Hear Audio From NASA's Perseverance As It Travels Through Deep Space",
        "preview": "The first to be rigged with microphones, the agency's latest Mars rover picked up the subtle sounds of its own inner workings during interplanetary flight.",
        "date": "December 20, 2022"
    },
    {
        "title": "NASA Establishes Board to Initially Review Mars Sample Return Plans",
        "preview": "The board will assist with analysis of current plans and goals for one of the most difficult missions humanity has ever undertaken.",
        "date": "December 19, 2022"
    },
    {
        "title": "MAVEN Maps Electric Currents around Mars that are Fundamental to Atmospheric Loss",
        "preview": "Five years after NASA

In [38]:
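# Export data to MongoDB
import pymongo

# Connect to a MongoDB server running locally on the default port
conn = 'mongodb://localhost:27017/'
client = pymongo.MongoClient(conn)

# Select the database (MongoDB creates it on first write)
db = client.nasa_news_db

# Drop the collection so that re-running this cell does not duplicate stories
db.news_stories.drop()

# Insert one document per story (this also adds an '_id' key to each dictionary in list_of_dic)
db.news_stories.insert_many(list_of_dic)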

<pymongo.results.InsertManyResult at 0x1aa6a3344f0>

In [39]:
# Read the documents back from the collection to confirm the insert
collection = db.news_stories

for doc in collection.find():
    print(doc)

{'_id': ObjectId('63a355cecaded8e3de5ec737'), 'title': "NASA's Curiosity Keeps Rolling As Team Operates Rover From Home", 'preview': 'The team has learned to meet new challenges as they work remotely on the Mars mission.', 'date': 'December 21, 2022'}
{'_id': ObjectId('63a355cecaded8e3de5ec738'), 'title': "Hear Audio From NASA's Perseverance As It Travels Through Deep Space", 'preview': "The first to be rigged with microphones, the agency's latest Mars rover picked up the subtle sounds of its own inner workings during interplanetary flight.", 'date': 'December 20, 2022'}
{'_id': ObjectId('63a355cecaded8e3de5ec739'), 'title': 'NASA Establishes Board to Initially Review Mars Sample Return Plans', 'preview': 'The board will assist with analysis of current plans and goals for one of the most difficult missions humanity has ever undertaken.', 'date': 'December 19, 2022'}
{'_id': ObjectId('63a355cecaded8e3de5ec73a'), 'title': 'MAVEN Maps Electric Currents around Mars that are Fundamental to 