# Module 12 Challenge
## Deliverable 1: Scrape Titles and Preview Text from Mars News

In [1]:
# Import Splinter, BeautifulSoup, and the supporting libraries
from splinter import Browser
from bs4 import BeautifulSoup as bs
from webdriver_manager.chrome import ChromeDriverManager
import requests
import pymongo
import json

In [2]:
# Initialize PyMongo to work with MongoDB
conn = 'mongodb://localhost:27017'
client = pymongo.MongoClient(conn)

In [3]:
# Define database and collection
db = client.mars_news_db
collection = db.headlines

In [4]:
# Set up Splinter with a Chrome driver installed by webdriver_manager
executable_path = {'executable_path': ChromeDriverManager().install()}
browser = Browser('chrome', **executable_path, headless=False)



### Step 1: Visit the Website

1. Use automated browsing to visit the [Mars NASA news site](https://redplanetscience.com). Inspect the page to identify which elements to scrape.

      > **Hint** To identify which elements to scrape, you might want to inspect the page by using Chrome DevTools.

In [5]:
# Visit the Mars NASA news site: https://redplanetscience.com
url = 'https://redplanetscience.com'
browser.visit(url)

### Step 2: Scrape the Website

Create a Beautiful Soup object and use it to extract text elements from the website.
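
As a minimal sketch of this step that runs without a browser, the snippet below parses a static HTML fragment copied from the cell output in this section. It assumes the built-in `html.parser` instead of the `lxml` parser used in the notebook, and the `sample_` names are illustrative only.

```python
from bs4 import BeautifulSoup

# One article's markup, copied from the page output shown in this notebook
sample_html = """
<div class="list_text">
  <div class="list_date">December 4, 2022</div>
  <div class="content_title">NASA's InSight 'Hears' Peculiar Sounds on Mars</div>
  <div class="article_teaser_body">Listen to the marsquakes and other, less-expected sounds that the Mars lander has been detecting.</div>
</div>
"""

# Assumption: 'html.parser' ships with Python; the notebook itself uses 'lxml'
sample_soup = BeautifulSoup(sample_html, 'html.parser')

# find_all returns every matching element; find returns only the first match
sample_articles = sample_soup.find_all('div', class_='list_text')
sample_title = sample_articles[0].find('div', class_='content_title').text.strip()
print(sample_title)
```

Searching inside each `list_text` parent, rather than across the whole page, keeps every title paired with the preview from the same article.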

In [6]:
# Create a Beautiful Soup object from the HTML rendered in the browser
html = browser.html
soup = bs(html, 'lxml')

In [7]:
# Inspect the parent div of each article
results = soup.find_all('div', class_='list_text')
for result in results:
    print(result)

<div class="list_text">
<div class="list_date">December 4, 2022</div>
<div class="content_title">NASA's InSight 'Hears' Peculiar Sounds on Mars</div>
<div class="article_teaser_body">Listen to the marsquakes and other, less-expected sounds that the Mars lander has been detecting.</div>
</div>
<div class="list_text">
<div class="list_date">December 3, 2022</div>
<div class="content_title">NASA to Reveal Name of Its Next Mars Rover</div>
<div class="article_teaser_body">After a months-long contest among students to name NASA's newest Mars rover, the agency will reveal the winning name — and the winning student — this Thursday. </div>
</div>
<div class="list_text">
<div class="list_date">December 1, 2022</div>
<div class="content_title">All About the Laser (and Microphone) Atop Mars 2020, NASA's Next Rover</div>
<div class="article_teaser_body">SuperCam is a rock-vaporizing instrument that will help scientists hunt for Mars fossils.</div>
</div>
<div class="list_text">
<div class="list_da

In [8]:
# Inspect only the preview-text divs
results = soup.find_all('div', class_='article_teaser_body')
for result in results:
    print(result)

<div class="article_teaser_body">Listen to the marsquakes and other, less-expected sounds that the Mars lander has been detecting.</div>
<div class="article_teaser_body">After a months-long contest among students to name NASA's newest Mars rover, the agency will reveal the winning name — and the winning student — this Thursday. </div>
<div class="article_teaser_body">SuperCam is a rock-vaporizing instrument that will help scientists hunt for Mars fossils.</div>
<div class="article_teaser_body">A NASA Wallops Flight Facility cargo plane transported more than two tons of equipment — including the rover's sample collection tubes — to Florida for this summer's liftoff.</div>
<div class="article_teaser_body">The Red Planet's surface has been visited by eight NASA spacecraft. The ninth will be the first that includes a roundtrip ticket in its flight plan. </div>
<div class="article_teaser_body">The system will be collecting and storing Martian rock and soil. Its installation marks another mi

### Step 3: Store the Results

Extract the titles and preview text of the news articles that you scraped. Store the scraping results in Python data structures as follows:

* Store each title-and-preview pair in a Python dictionary with two keys: `title` and `preview`. For example:

  ```python
  {'title': "Mars Rover Begins Mission!",
   'preview': "NASA's Mars Rover begins a multiyear mission to collect data about the little-explored planet."}
  ```

* Store all the dictionaries in a Python list.

* Print the list in your notebook.
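
The three requirements above can be sketched as follows. The titles and previews here are placeholder strings (the first pair is the example given above, the second is hypothetical), and `example_titles`, `example_previews`, and `example_headlines` are illustrative names, not part of the assignment.

```python
# Placeholder data standing in for text scraped from the page
example_titles = ["Mars Rover Begins Mission!", "Second Example Title"]
example_previews = [
    "NASA's Mars Rover begins a multiyear mission to collect data about the little-explored planet.",
    "Second example preview.",
]

# Pair each title with its preview and store every pair as a dictionary
example_headlines = [
    {'title': title, 'preview': preview}
    for title, preview in zip(example_titles, example_previews)
]

# Print the list
print(example_headlines)
```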

### (Optional) Step 4: Export the Data

To make the scraped data easier to share with others, export it to either a JSON file or a MongoDB database.
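
The notebook below takes the MongoDB route. For the JSON-file route, one possible sketch is shown here; the file name `mars_news.json` and the sample record are assumptions, and the file is written to a temporary directory so the example leaves nothing behind.

```python
import json
import tempfile
from pathlib import Path

# Placeholder record standing in for the scraped list of dictionaries
sample_records = [
    {'title': "Mars Rover Begins Mission!",
     'preview': "NASA's Mars Rover begins a multiyear mission to collect data about the little-explored planet."}
]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / 'mars_news.json'  # hypothetical file name

    # Write the list to disk as indented JSON
    with open(out_path, 'w', encoding='utf-8') as f:
        json.dump(sample_records, f, indent=4)

    # Read the file back to confirm the export round-trips
    with open(out_path, encoding='utf-8') as f:
        reloaded = json.load(f)

print(reloaded == sample_records)
```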

In [9]:
# Reuse the Beautiful Soup object (soup) created in an earlier cell

# Retrieve the parent divs for all articles
results = soup.find_all('div', class_='list_text')

# Create an empty list to store the dictionaries
headlines = []

# Loop through the article elements
for result in results:
    # Extract the article title
    title = result.find('div', class_='content_title').text.strip()

    # Extract the preview text
    preview = result.find('div', class_='article_teaser_body').text

    # Store the title-and-preview pair in a dictionary
    # and add the dictionary to the list
    headlines.append({'title': title,
                      'preview': preview})

    # Export data to MongoDB
    # -----------------------------------------------------------------
    # Build a separate dictionary for MongoDB: insert_one() adds an
    # '_id' key to the dictionary it receives, which would otherwise
    # end up in the headlines list
    post = {
        'title': title,
        'preview': preview,
    }

    # Insert the dictionary into MongoDB as a document
    collection.insert_one(post)

In [10]:
# Print the list to confirm success
print(json.dumps(headlines, sort_keys=False, indent=4))

[
    {
        "title": "NASA's InSight 'Hears' Peculiar Sounds on Mars",
        "preview": "Listen to the marsquakes and other, less-expected sounds that the Mars lander has been detecting."
    },
    {
        "title": "NASA to Reveal Name of Its Next Mars Rover",
        "preview": "After a months-long contest among students to name NASA's newest Mars rover, the agency will reveal the winning name \u2014 and the winning student \u2014 this Thursday. "
    },
    {
        "title": "All About the Laser (and Microphone) Atop Mars 2020, NASA's Next Rover",
        "preview": "SuperCam is a rock-vaporizing instrument that will help scientists hunt for Mars fossils."
    },
    {
        "title": "Air Deliveries Bring NASA's Perseverance Mars Rover Closer to Launch",
        "preview": "A NASA Wallops Flight Facility cargo plane transported more than two tons of equipment \u2014 including the rover's sample collection tubes \u2014 to Florida for this summer's liftoff."
    },
    {
  

In [11]:
browser.quit()
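
The `\u2014` sequences in the list printed by cell 10 are JSON escapes for the dash character in the preview text, not scraping errors: `json.dumps` escapes non-ASCII characters by default. A small sketch of the difference, using a shortened sample string:

```python
import json

# Shortened sample of a preview that contains a non-ASCII dash (U+2014)
sample = {'preview': 'the winning name \u2014 and the winning student'}

escaped = json.dumps(sample)                       # default: ensure_ascii=True
readable = json.dumps(sample, ensure_ascii=False)  # keeps the character as-is

print(escaped)
print(readable)
```

Both strings decode to the same data with `json.loads`, so the choice only affects how the output reads.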