# Module 12 Challenge
## Deliverable 1: Scrape Titles and Preview Text from Mars News

In [None]:
pip install webdriver_manager

In [2]:
# Import Splinter and BeautifulSoup
from splinter import Browser
from bs4 import BeautifulSoup as soup
from webdriver_manager.chrome import ChromeDriverManager

In [109]:
import requests
import pymongo
import pandas as pd
from datetime import datetime

In [30]:
# Initialize PyMongo to work with MongoDBs
conn = 'mongodb://localhost:27017'
client = pymongo.MongoClient(conn)

In [31]:
# Define database and collection
db = client.news_db
collection = db.articles

In [3]:
executable_path = {'executable_path': ChromeDriverManager().install()}
browser = Browser('chrome', **executable_path, headless=False)

### Step 1: Visit the Website

1. Use automated browsing to visit the [Mars NASA news site](https://redplanetscience.com). Inspect the page to identify which elements to scrape.

      > **Hint** To identify which elements to scrape, you might want to inspect the page by using Chrome DevTools.

In [5]:
# Visit the Mars NASA news site: https://redplanetscience.com
mars_url = 'https://redplanetscience.com'
browser.visit(mars_url)
#create HTMl Object
html = browser.html

### Step 2: Scrape the Website

Create a Beautiful Soup object and use it to extract text elements from the website.

In [7]:
# Create a Beautiful Soup object
soup = soup(html, 'html.parser')

In [8]:
# Extract all the text elements (title)
title = soup.title.text
print(title)

News - Mars Exploration Program


In [18]:
for x in range(10):

    titles = soup.find_all('div', class_='content_title')

    for title in titles:
        print('-------------')
        print(title.text)
    
    # button to show next title
    #browser.links.find_by_partial_text('Next').click()

-------------
How NASA's Mars Helicopter Will Reach the Red Planet's Surface
-------------
NASA's Perseverance Rover Is Midway to Mars 
-------------
Common Questions about InSight's 'Mole'
-------------
NASA's New Mars Rover Is Ready for Space Lasers
-------------
Meet the People Behind NASA's Perseverance Rover
-------------
NASA Administrator Statement on Moon to Mars Initiative, FY 2021 Budget
-------------
NASA's Mars Reconnaissance Orbiter Undergoes Memory Update
-------------
NASA's Curiosity Rover Finds an Ancient Oasis on Mars
-------------
How NASA's Perseverance Mars Team Adjusted to Work in the Time of Coronavirus 
-------------
NASA's Perseverance Rover Bringing 3D-Printed Metal Parts to Mars
-------------
Mars Is Getting a New Robotic Meteorologist
-------------
NASA Moves Forward With Campaign to Return Mars Samples to Earth
-------------
Naming a NASA Mars Rover Can Change Your Life
-------------
Mars 2020 Unwrapped and Ready for More Testing
-------------
NASA's Mars P

In [26]:
# Extract all the text elements (paragraph)
for x in range(10):

    articles = soup.find_all('div', class_='article_teaser_body')

    for article in articles:
        print('-------------')
        print(article.text)
        
    
    # button to show next title
    #browser.links.find_by_partial_text('Next').click()

-------------
The small craft will seek to prove that powered, controlled flight is possible on another planet. But just getting it onto the surface of Mars will take a whole lot of ingenuity.
-------------
Sometimes half measures can be a good thing – especially on a journey this long. The agency's latest rover only has about 146 million miles left to reach its destination.
-------------
The following Q&As with members of the team answer some of the most common questions about the burrowing device, part of a science instrument called the Heat Flow and Physical Properties Package (HP3).
-------------
Perseverance is one of a few Mars spacecraft carrying laser retroreflectors. The devices could provide new science and safer Mars landings in the future.
-------------
These are the scientists and engineers who built NASA's next Mars rover and who will guide it to a safe landing in Jezero Crater. 
-------------
Jim Bridenstine addresses NASA's ambitious plans for the coming years, includin

### Step 3: Store the Results

Extract the titles and preview text of the news articles that you scraped. Store the scraping results in Python data structures as follows:

* Store each title-and-preview pair in a Python dictionary. And, give each dictionary two keys: `title` and `preview`. An example is the following:

  ```python
  {'title': "Mars Rover Begins Mission!", 
        'preview': "NASA's Mars Rover begins a multiyear mission to collect data about the little-explored planet."}
  ```

* Store all the dictionaries in a Python list.

* Print the list in your notebook.

In [112]:
# Retrieve the parent divs for all articles
results = soup.find_all('div', {'id': 'news', 'class': 'container'})
# counter =1
# loop over results to get article data
for result in results:
    # scrape the article title 
    title = result.find('div', class_='content_title').text.strip()
    
    # scrape the article paragraph
    paragraph = result.find('div', class_='article_teaser_body').text
    
    # scrape the date
    date = result.find('div', class_='list_date')
    
    # print article data
    print('-----------------')
    print(title)
    print(paragraph)
    print(date)

    # Dictionary to be inserted into MongoDB
    post = {
        'title': title,
        'preview': paragraph,
        'date': date
    }

    # Insert dictionary into MongoDB as a document
    collection.insert_one(post)

-----------------
How NASA's Mars Helicopter Will Reach the Red Planet's Surface
The small craft will seek to prove that powered, controlled flight is possible on another planet. But just getting it onto the surface of Mars will take a whole lot of ingenuity.
<div class="list_date">December 4, 2022</div>


InvalidDocument: cannot encode object: <div class="list_date">December 4, 2022</div>, of type: <class 'bs4.element.Tag'>

In [111]:
# Print the list to confirm success
print(post)

{'title': "How NASA's Mars Helicopter Will Reach the Red Planet's Surface", 'preview': 'The small craft will seek to prove that powered, controlled flight is possible on another planet. But just getting it onto the surface of Mars will take a whole lot of ingenuity.', 'date': <div class="list_date">December 4, 2022</div>, '_id': ObjectId('638d22f963dc05a372701aaa')}


In [None]:
browser.quit()

### (Optional) Step 4: Export the Data

Optionally, store the scraped data in a file or database (to ease sharing the data with others). To do so, export the scraped data to either a JSON file or a MongoDB database.

In [60]:
df = pd.DataFrame(results)

In [65]:
# Export data to JSON
df.to_json(r'c:\users\annadejesus\results_news.json')

In [66]:
# Export data to MongoDB
# Display the MongoDB records created above
articles = db.articles.find()
for article in articles:
    print(article)

{'_id': ObjectId('638d0be963dc05a372701aa7'), 'title': "How NASA's Mars Helicopter Will Reach the Red Planet's Surface", 'preview': 'The small craft will seek to prove that powered, controlled flight is possible on another planet. But just getting it onto the surface of Mars will take a whole lot of ingenuity.'}
{'_id': ObjectId('638d0c0c63dc05a372701aa8'), 'title': "How NASA's Mars Helicopter Will Reach the Red Planet's Surface", 'preview': 'The small craft will seek to prove that powered, controlled flight is possible on another planet. But just getting it onto the surface of Mars will take a whole lot of ingenuity.'}
