## `spacenews_stories`
A program to scrape news from the [Space News](https://www.space.com/news) web page.

The notebook retrieves the space.com news page, extracts the news stories, and prints the headline, author, synopsis, and date and time of each story.

In [0]:
import requests
from bs4 import BeautifulSoup

In [0]:
url = 'https://www.space.com/news'
response = requests.get(url)

# make sure we got a valid response
if response.ok:
  # get the full data from the response
  data = response.text
  soup = BeautifulSoup(data, 'html.parser')

  # find all elements with class *listingResults mixed*
  summary = soup.find_all(class_='listingResults mixed')
  print(summary)

That still returns a lot of extraneous data.

Let's look for elements with `class="content"` instead; each one holds all the fields we are looking for.

In [0]:
summaries = soup.find_all(class_='content')
print(summaries)
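The class lookup above can be tried offline on a small inline snippet. This is a minimal sketch: the markup is hypothetical and only loosely modelled on the listing page, so the class names other than `content` and `listingResults mixed` are assumptions. It shows that a single class name matches any element carrying that class, while a multi-class string matches only the exact attribute value.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on the listing page; not copied from space.com.
html = """
<div class="listingResults mixed">
  <div class="listingResult small result1">
    <div class="content"><h3>First headline</h3></div>
  </div>
  <div class="listingResult small result2">
    <div class="content"><h3>Second headline</h3></div>
  </div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# a single class name matches every element that carries that class
by_class = soup.find_all(class_='content')

# the equivalent CSS selector
by_selector = soup.select('.content')

# a multi-class string matches only the exact attribute value, in that order
exact = soup.find_all(class_='listingResults mixed')
swapped = soup.find_all(class_='mixed listingResults')

print(len(by_class), len(by_selector), len(exact), len(swapped))
```

When the order of class names cannot be relied on, a selector such as `soup.select('.listingResults.mixed')` is the more forgiving choice.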

Alternatively, use CSS selectors and dig deeper to pick out the specific elements of interest:

- `h3`: article name
- `class="by-author"`: author
- `time`: date and time
- `class="synopsis"`: synopsis
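The selectors listed above can be checked on a single hand-written story before running them against the live page. This is a sketch under assumptions: the markup is hypothetical (the `byline` wrapper and the sample values are invented for illustration), and it reads the timestamp from the `datetime` attribute of the `time` element rather than keeping the whole tag.

```python
from datetime import datetime, timezone
from bs4 import BeautifulSoup

# Hypothetical story markup shaped like the fields listed above.
html = """
<div class="content">
  <h3>Example headline</h3>
  <p class="byline">
    <span class="by-author">By <span>
Jane Doe </span></span>
    <time class="published-date" datetime="2020-04-26T14:28:29Z"></time>
  </p>
  <p class="synopsis">
    Example synopsis.
  </p>
</div>
"""

story = BeautifulSoup(html, 'html.parser').select_one('.content')

# strip=True trims the surrounding whitespace and newlines
headline = story.select_one('h3').get_text(strip=True)
author = story.select_one('.by-author > span').get_text(strip=True)
synopsis = story.select_one('.synopsis').get_text(strip=True)

# Tag objects support dictionary-style access to their attributes
time_tag = story.select_one('time[datetime]')
timestamp = time_tag['datetime']

# optional: turn the ISO 8601 string into a timezone-aware datetime
published = datetime.strptime(timestamp, '%Y-%m-%dT%H:%M:%SZ').replace(tzinfo=timezone.utc)

print({'headline': headline, 'author': author, 'date_n_time': timestamp, 'synopsis': synopsis})
```

Storing the attribute string instead of the tag keeps the resulting dictionaries plain data, which is easier to print or serialize.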

In [6]:
# iterate over all stories and collect the fields of each one
raw_stories = soup.select('.content')
stories = []
for story in raw_stories:
  headline = story.select_one('h3').get_text()  # extract the headline
  author = story.select_one('.by-author > span').get_text()  # extract the author
  # extract the date and time; this keeps the whole <time> element
  # (tried [attr=datetime] and others...)
  date_time = story.select_one('time')
  synopsis = story.select_one('.synopsis').get_text().strip()  # extract the story's synopsis
  # construct a dictionary for this story and add it to the list
  new_story = {'headline': headline, 'author': author, 'date_n_time': date_time, 'synopsis': synopsis}
  stories.append(new_story)
print(stories)

[{'headline': "James Gunn doesn't think 'Guardians of the Galaxy Vol. 3' or 'The Suicide Squad' will be delayed", 'author': '\nChris Arrant ', 'date_n_time': <time class="published-date relative-date" data-published-date="2020-04-26T14:28:29Z" datetime="2020-04-26T14:28:29Z"></time>, 'synopsis': 'Writer/director James Gunn doesn\'t think there will be a need to postpone "Guardians of the Galaxy Vol. 3" despite cascading delays in film release due to the coronavirus pandemic.'}, {'headline': 'Tomanowos, the meteorite that survived mega-floods and human folly', 'author': '\nDaniel Garcia-Castellanos ', 'date_n_time': <time class="published-date relative-date" data-published-date="2020-04-26T14:10:53Z" datetime="2020-04-26T14:10:53Z"></time>, 'synopsis': 'The rock with arguably the most fascinating story on Earth has an ancient name.'}, {'headline': "See the bright 'evening star' Venus swing by the crescent moon tonight", 'author': '\nHanneke Weitering ', 'date_n_time': <time class="publi