# Module 12 Challenge
## Deliverable 1: Scrape Titles and Preview Text from Mars News

In [1]:
# Import Splinter and BeautifulSoup
from splinter import Browser
from bs4 import BeautifulSoup as soup
from webdriver_manager.chrome import ChromeDriverManager

In [2]:
# Set up Splinter with a Chrome driver installed by webdriver_manager
executable_path = {'executable_path': ChromeDriverManager().install()}
browser = Browser('chrome', **executable_path, headless=False)

### Step 1: Visit the Website

1. Use automated browsing to visit the [Mars NASA news site](https://redplanetscience.com). Inspect the page to identify which elements to scrape.

      > **Hint** To identify which elements to scrape, you might want to inspect the page by using Chrome DevTools.

In [3]:
# Visit the Mars NASA news site: https://redplanetscience.com
url = "https://redplanetscience.com"
browser.visit(url)
html = browser.html


### Step 2: Scrape the Website

Create a Beautiful Soup object and use it to extract text elements from the website.
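
As a minimal sketch of this step, the snippet below parses a small inline HTML sample instead of the live page. The class names `content_title` and `article_teaser_body` are the ones this notebook searches for; the sample markup, its text, and the `sample_*` names are assumptions made for illustration only.

```python
from bs4 import BeautifulSoup

# Inline sample standing in for browser.html (hypothetical content).
sample_html = """
<div class="content_title">First headline</div>
<div class="article_teaser_body">First preview.</div>
<div class="content_title">Second headline</div>
<div class="article_teaser_body">Second preview.</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# find_all returns every matching tag in document order.
sample_titles = [
    tag.get_text(strip=True)
    for tag in sample_soup.find_all("div", class_="content_title")
]
sample_previews = [
    tag.get_text(strip=True)
    for tag in sample_soup.find_all("div", class_="article_teaser_body")
]

print(sample_titles)
print(sample_previews)
```

Calling `get_text(strip=True)` trims surrounding whitespace, which the plain `.text` attribute leaves in place.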

In [4]:
# Create a Beautiful Soup object
html_soup = soup(html, 'html.parser')
print(html_soup.prettify())


<html>
 <head>
  <meta charset="utf-8"/>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <link crossorigin="anonymous" href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.0-beta1/dist/css/bootstrap.min.css" integrity="sha384-giJF6kkoqNQ00vy+HMDP7azOuL0xtbfIcaT9wjKHr8RbDVddVHyTfAAsrekwKmP1" rel="stylesheet"/>
  <link href="css/font.css" rel="stylesheet" type="text/css"/>
  <link href="css/app.css" rel="stylesheet" type="text/css"/>
  <link crossorigin="anonymous" href="https://pro.fontawesome.com/releases/v5.10.0/css/all.css" integrity="sha384-AYmEC3Yw5cVb3ZcuHtOA93w35dYTsvhLPVnYs9eStHfGJvOvKxVfELGroGkvsg+p" rel="stylesheet"/>
  <title>
   News - Mars Exploration Program
  </title>
 </head>
 <body>
  <div class="col-md-12">
   <div class="row">
    <nav class="navbar navbar-expand-lg navbar-light fixed-top">
     <div class="container-fluid">
      <a class="navbar-brand" href="#">
       <img src="image/nasa.png" width="80"/>
       <span class="logo">
        MA

In [5]:
# Extract all the text elements
NASA_titles = html_soup.find_all("div", class_="content_title")
NASA_text = html_soup.find_all("div", class_="article_teaser_body")
# Print each title to confirm the scrape
for title in NASA_titles:
    print(title.text)


Media Get a Close-Up of NASA's Mars 2020 Rover
NASA Moves Forward With Campaign to Return Mars Samples to Earth
8 Martian Postcards to Celebrate Curiosity's Landing Anniversary
NASA InSight's 'Mole' Is Out of Sight
NASA Engineers Checking InSight's Weather Sensors
Naming a NASA Mars Rover Can Change Your Life
With Mars Methane Mystery Unsolved, Curiosity Serves Scientists a New One: Oxygen
NASA's Mars Rover Drivers Need Your Help
NASA's Perseverance Rover 100 Days Out
NASA Prepares for Moon and Mars With New Addition to Its Deep Space Network
NASA's Perseverance Rover Will Peer Beneath Mars' Surface 
Global Storms on Mars Launch Dust Towers Into the Sky
AI Is Helping Scientists Discover Fresh Craters on Mars
NASA's Mars 2020 Rover Closer to Getting Its Name
InSight's 'Mole' Team Peers into the Pit


### Step 3: Store the Results

Extract the titles and preview text of the news articles that you scraped. Store the scraping results in Python data structures as follows:

* Store each title-and-preview pair in a Python dictionary, and give each dictionary two keys: `title` and `preview`. For example:

  ```python
  {'title': "Mars Rover Begins Mission!",
   'preview': "NASA's Mars Rover begins a multiyear mission to collect data about the little-explored planet."}
  ```

* Store all the dictionaries in a Python list.

* Print the list in your notebook.
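
The structure described above can be sketched with plain strings in place of scraped tags. The sample titles and previews below are placeholders, not scraped data, and the `sample_*` names are hypothetical.

```python
# Placeholder strings standing in for scraped text (not real results).
sample_titles = ["Mars Rover Begins Mission!", "Second Headline"]
sample_previews = [
    "NASA's Mars Rover begins a multiyear mission to collect data "
    "about the little-explored planet.",
    "Second preview text.",
]

# zip pairs items by position and stops at the shorter input,
# so lists of unequal length would silently lose items.
# Checking the lengths first makes that failure visible.
if len(sample_titles) != len(sample_previews):
    raise ValueError("titles and previews differ in length")

# Build one dictionary per article and collect them in a list.
sample_news = [
    {"title": title, "preview": preview}
    for title, preview in zip(sample_titles, sample_previews)
]

print(sample_news)
```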

In [6]:
# Create an empty list to store the dictionaries
NASA_news = []


In [7]:
# Loop through the text elements
# Extract the title and preview text from the elements
# Store each title and preview pair in a dictionary
# Add the dictionary to the list
for title, text in zip(NASA_titles, NASA_text):
    news_block = {"title": title.text,
                  "preview": text.text}
    NASA_news.append(news_block)

In [8]:
# Print the list to confirm success
print(NASA_news)

[{'title': "Media Get a Close-Up of NASA's Mars 2020 Rover", 'preview': "The clean room at NASA's Jet Propulsion Laboratory was open to the media to see NASA's next Mars explorer before it leaves for Florida in preparation for a summertime launch."}, {'title': 'NASA Moves Forward With Campaign to Return Mars Samples to Earth', 'preview': 'During this next phase, the program will mature critical technologies and make critical design decisions as well as assess industry partnerships.'}, {'title': "8 Martian Postcards to Celebrate Curiosity's Landing Anniversary", 'preview': 'The NASA rover touched down eight years ago, on Aug. 5, 2012, and will soon be joined by a second rover, Perseverance.'}, {'title': "NASA InSight's 'Mole' Is Out of Sight", 'preview': "Now that the heat probe is just below the Martian surface, InSight's arm will scoop some additional soil on top to help it keep digging so it can take Mars' temperature."}, {'title': "NASA Engineers Checking InSight's Weather Sensors", 'preview': 'An

In [9]:
browser.quit()

### (Optional) Step 4: Export the Data

Optionally, store the scraped data in a file or database so that it is easier to share with others. To do so, export the scraped data to either a JSON file or a MongoDB database.

In [12]:
# Export data to JSON
import json
with open("NASA_news.json", "w") as outfile:
    json.dump(NASA_news, outfile)
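
To confirm that an export like this worked, the file can be read back and compared with the original list. The sketch below is illustrative: it uses a small placeholder list with the `title` and `preview` keys from Step 3 and writes to a temporary directory so that it does not overwrite `NASA_news.json`. The names `sample_news`, `loaded_news`, and `sample_news.json` are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Placeholder records (not scraped data).
sample_news = [
    {"title": "First headline", "preview": "First preview."},
    {"title": "Second headline", "preview": "Second preview."},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "sample_news.json"

    # indent=2 makes the file easier to read when shared.
    with open(out_path, "w", encoding="utf-8") as outfile:
        json.dump(sample_news, outfile, indent=2)

    # Read the file back to verify the round trip.
    with open(out_path, encoding="utf-8") as infile:
        loaded_news = json.load(infile)

# The comparison is True when the file holds exactly what was written.
print(loaded_news == sample_news)
```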

In [14]:
# Export data to MongoDB
from pymongo import MongoClient

# Connect to a local MongoDB instance on the default port
conn = "mongodb://localhost:27017"
client = MongoClient(conn)


In [15]:
# Select the database (MongoDB creates it on first write)
mars_db = client["mars_db"]

In [18]:
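# Insert each article dictionary as a document in the Mars_news collection.
# Note: insert_one adds an `_id` field to each dictionary in place.
class_collection = mars_db["Mars_news"]
for d in NASA_news:
    class_collection.insert_one(d)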

In [19]:
client.list_database_names()

['admin', 'config', 'local', 'mars_db']