# Deliverable 1: Scrape Titles and Preview Text from Mars News

## Importing the dependencies

In [1]:
# Importing the dependencies
from bs4 import BeautifulSoup
from splinter import Browser
from webdriver_manager.chrome import ChromeDriverManager
import requests
import json

In [2]:
executable_path = {'executable_path': ChromeDriverManager().install()}
browser = Browser('chrome', **executable_path, headless=False)

## Loading the webpage and creating BeautifulSoup object

In [3]:
# URL of page to be scraped
url = 'https://redplanetscience.com/'

# Retrieve page with the browser
browser.visit(url)
html = browser.html

# Create BeautifulSoup object; parse with 'html.parser'
soup = BeautifulSoup(html, 'html.parser')

In [4]:
# Extract title text
title = soup.title.text
print(title)

News - Mars Exploration Program


## 1) Scraping the title and preview text / summary text of each article on the landing page

Inspecting the page with Chrome DevTools shows which elements hold each piece of information:
* **Article:** identified under `<div class="list_text">`
* **Article Date:** identified under `<div class="list_date">`
* **Article Title:** identified under `<div class="content_title">`
* **Article Summary:** identified under `<div class="article_teaser_body">`
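
To make the structure above concrete, here is a minimal sketch that applies the same class names to a small, made-up HTML snippet. The `SAMPLE_HTML` markup, its placeholder text, and the `child_text` helper are assumptions for illustration only and are not taken from the real page. The helper returns `None` when a child element is missing, so an incomplete article does not raise an `AttributeError`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that mirrors the class names listed above;
# the real page's markup may differ in detail.
SAMPLE_HTML = """
<div class="list_text">
  <div class="list_date">January 1, 2023</div>
  <div class="content_title">Sample Title One</div>
  <div class="article_teaser_body">Sample preview one.</div>
</div>
<div class="list_text">
  <div class="list_date">January 2, 2023</div>
  <div class="content_title">Sample Title Two</div>
</div>
"""


def child_text(parent, class_name):
    """Return the stripped text of the first matching child div, or None."""
    node = parent.find('div', class_=class_name)
    return node.get_text(strip=True) if node is not None else None


sample_soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# One dictionary per article container, as in the notebook's own approach
sample_rows = [
    {
        'title': child_text(article, 'content_title'),
        'preview': child_text(article, 'article_teaser_body'),
    }
    for article in sample_soup.find_all('div', class_='list_text')
]
print(sample_rows)
```

The second sample article has no teaser, so its `'preview'` value is `None` rather than an error.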

In [5]:
# Retrieve the parent divs for all articles
articles = soup.find_all('div', class_='list_text')

In [6]:
# Empty list to store all the dictionaries extracted from the page
title_and_preview = []

# Loop over all the article elements
for article in articles:
    # Dictionary to keep the required information from each article: title and preview
    title_and_preview_dict = {
        'title': article.find('div', class_='content_title').text,
        'preview': article.find('div', class_='article_teaser_body').text
    }
    # Adding the dictionary with the article elements into the main list
    title_and_preview.append(title_and_preview_dict)

In [7]:
# Printing the list with all the dictionaries
title_and_preview

[]
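
The empty list above means that no `div.list_text` elements were present in the HTML captured in cell 3. One possible cause, which the notebook itself does not confirm, is that the page fills in its articles with JavaScript after `browser.html` was read. Under that assumption, a minimal sketch of a polling helper is shown below. `wait_for` and `articles_loaded` are hypothetical names, and the stand-in predicate only simulates a page that becomes ready on the third check.

```python
import time


def wait_for(predicate, timeout=10.0, interval=0.5):
    """Poll predicate() until it is truthy or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for a page whose articles appear on the third check
attempts = {'count': 0}


def articles_loaded():
    attempts['count'] += 1
    return attempts['count'] >= 3


ready = wait_for(articles_loaded, timeout=2.0, interval=0.01)
print(ready, attempts['count'])
```

With the notebook's browser, the predicate could be `lambda: browser.is_element_present_by_css('div.list_text')`, with `browser.html` read only after the helper returns `True`. Splinter's `is_element_present_by_css` also accepts a `wait_time` argument, which may make a separate helper unnecessary.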

In [8]:
# Closing the browser session with Splinter
browser.quit()

# Bonus

## 2) Store the scraped data in a file

With the list of dictionaries built (one per article, holding its title and preview), store the scraped data in a JSON file.

In [9]:
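import os

# Path and filename for the JSON file
output_file = "output/title_and_preview.json"

# Create the output directory if it does not exist yet; open() would fail otherwise
os.makedirs(os.path.dirname(output_file), exist_ok=True)

# Open the file with the "write" option and save the data in JSON format
with open(output_file, "w") as outfile:
    json.dump(title_and_preview, outfile, indent=2)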
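
A quick way to check the saved file is to load it back and compare it with the in-memory list. The sketch below does this in a temporary directory with made-up records, so it does not touch the notebook's `output` folder. `sample_records`, `sample_path`, and `loaded_records` are hypothetical names used only for this illustration.

```python
import json
import tempfile
from pathlib import Path

# Made-up records in the same shape as the notebook's list of dictionaries
sample_records = [
    {'title': 'Sample Title One', 'preview': 'Sample preview one.'},
    {'title': 'Sample Title Two', 'preview': 'Sample preview two.'},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / 'title_and_preview.json'

    # Write the records in the same way as the cell above
    with open(sample_path, 'w', encoding='utf-8') as outfile:
        json.dump(sample_records, outfile, indent=2)

    # Read the file back to confirm the round trip preserves the data
    with open(sample_path, encoding='utf-8') as infile:
        loaded_records = json.load(infile)

print(loaded_records == sample_records)
```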