### 0. Load: 
https://www.tigerdirect.com/applications/SearchTools/item-details.asp?EdpNo=1501390 

### 1. Use the browser's development tools to find a unique way to access its list price and its current price. 

In [1]:
# Install the lxml parser and the regex library (uncomment to run)
#!pip install lxml
#!pip install regex

In [2]:
from bs4 import BeautifulSoup
import requests
import regex as re

In [3]:
# Specify the url of the website we are going to scrape
url = "https://www.tigerdirect.com/applications/SearchTools/item-details.asp?EdpNo=1501390"

# Send a browser-like User-Agent header (your own can be looked up at http://httpbin.org/get)
headers = {"User-Agent": "Mozilla/5.0"}
page = requests.get(url, headers = headers)

In [4]:
# Create a beautifulsoup object
soup = BeautifulSoup(page.text, 'lxml')

In [5]:
# Retrieve the list price and the current price
list_price = soup.select("p.list-price > span.sr-only")
for i in list_price:
    print(i)
    
current_price = soup.select("p.final-price > span.sale-price > span.sr-only")
for i in current_price:
    print(i)

<span class="sr-only">$1,399
          and 99 cents
        </span>
<span class="sr-only">$1,029
          and 99 cents
        </span>


In [6]:
# Another way to get the two prices both at once
prices = soup.select("div.pdp-price span.sr-only")
for i in prices:
    print(i)

<span class="sr-only">$1,399
          and 99 cents
        </span>
<span class="sr-only">$1,029
          and 99 cents
        </span>
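The printed spans carry the line breaks and indentation of the page source. As a minimal sketch, assuming the text of one span has already been pulled out as a plain string (the `raw` value below is copied from the output above, and `clean` is just an illustrative name), the whitespace can be collapsed with a single substitution:

```python
import re

# Text of one price span as it appears in the page source (copied from the output above)
raw = "$1,399\r\n          and 99 cents\r\n        "

# Collapse every run of whitespace (spaces, \r, \n) into one space, then trim the ends
clean = re.sub(r"\s+", " ", raw).strip()
print(clean)
```

With BeautifulSoup, calling `get_text()` on each selected tag instead of printing the tag itself would give a string of this kind without the surrounding `<span>` markup.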


### 2. Store the prices to strings.

In [7]:
# Convert the retrieved result lists into strings
list_price = str(list_price)
current_price = str(current_price)

In [8]:
# View the resulting string
list_price

'[<span class="sr-only">$1,399\r\n          and 99 cents\r\n        </span>]'

### 3.  Use Python's (or Java's) regex (!!) functionality to convert the prices to "1234.56" (no dollar sign, comma, just a "." separator for cents)

In [9]:
# Extract the digit groups: dollars (with thousands comma) and cents
list_price_extracted = re.findall(r'([0-9,]+)', list_price)
current_price_extracted = re.findall(r'([0-9,]+)', current_price)

# Strip the comma and join dollars and cents with a "." separator
list_price_converted = list_price_extracted[0].replace(',','') + "." + list_price_extracted[1]
current_price_converted = current_price_extracted[0].replace(',','') + "." + current_price_extracted[1]
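Since the task asks for regex specifically, the conversion can also be done without `str.replace`. The following is a sketch only: `price_to_decimal` is a hypothetical helper name, and it assumes the price text always follows the "$D,DDD and CC cents" wording seen in the string above.

```python
import re

def price_to_decimal(text):
    """Convert text like '$1,399 and 99 cents' to '1399.99' (hypothetical helper)."""
    # Capture the dollar amount (digits and commas) and the cents that follow "and"
    match = re.search(r"\$([\d,]+)\s+and\s+(\d{1,2})\s+cents", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    # Remove the thousands separators with a regex substitution
    dollars = re.sub(r",", "", match.group(1))
    # Pad single-digit cents so that 5 cents becomes "05"
    cents = match.group(2).zfill(2)
    return f"{dollars}.{cents}"

# Sample string copied from the notebook output above
sample = '[<span class="sr-only">$1,399\r\n          and 99 cents\r\n        </span>]'
print(price_to_decimal(sample))
```

Anchoring the pattern on the `$` sign and the word "cents" makes it less likely that unrelated digits in the markup are picked up by accident.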

### 4.  Print both, the list price and the current price to screen / terminal

In [10]:
# Print the prices
print("List Price:\t", list_price_converted)
print("Current Price:\t", current_price_converted)

List Price:	 1399.99
Current Price:	 1029.99


### 5.   Write code that loads: https://www.usnews.com/

In [11]:
url_1 = "https://www.usnews.com/"
page_1 = requests.get(url_1, headers = headers)

In [12]:
soup_1 = BeautifulSoup(page_1.text, 'lxml')

### 6. "finds" its current "Top Stories" 

In [13]:
# Specify the anchor tag of "Top Stories"
a = soup_1.select("div.Box-w0dun1-0.ArmRestTopStories__Part-s0vo7p-1.erkdnc.biVKSR h3 a")

# Create the dictionary for storing the information of top stories
Top_Stories = {'Headings': [], 'Links': []}
for i in a:
    Top_Stories['Headings'].append(re.findall(">(.*)<", str(i))[0])
    Top_Stories['Links'].append(i['href'])
    
Top_Stories

{'Headings': ['McCarthy, Biden to Talk Amid Debt Threat',
  'Existing Homes Fall 1.5% in December'],
 'Links': ['https://www.usnews.com/news/politics/articles/2023-01-20/mccarthy-biden-agree-to-sit-down-over-debt-ceiling',
  'https://www.usnews.com/news/economy/articles/2023-01-20/existing-homes-fall-1-5-in-december-marking-11th-month-of-declines']}
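Running a regex over `str(tag)` works here because each anchor contains plain text only. As a hedged alternative, the sketch below reads the heading with `get_text()`, which also copes with nested markup. The HTML snippet, the `top-stories` class, and the example.com links are made up for illustration, and the built-in `html.parser` is used instead of `lxml` so that the snippet has no extra dependency.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the "Top Stories" block; not the real usnews.com markup
html = """
<div class="top-stories">
  <h3><a href="https://example.com/story-1">First <em>headline</em></a></h3>
  <h3><a href="https://example.com/story-2">Second headline</a></h3>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# One dict per story: visible text of the anchor plus its href attribute
stories = [
    {"heading": a.get_text(" ", strip=True), "link": a["href"]}
    for a in soup.select("div.top-stories h3 a")
]

for story in stories:
    print(story["heading"], "->", story["link"])
```

Passing `" "` as the separator keeps a space between text fragments that sit in different child tags.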

### 7. Read + print the URL of the _second_ current top story to the screen (terminal)

In [14]:
url_second_story = Top_Stories['Links'][1]
url_second_story

'https://www.usnews.com/news/economy/articles/2023-01-20/existing-homes-fall-1-5-in-december-marking-11th-month-of-declines'

### 8. Load that page 

In [15]:
page_2 = requests.get(url_second_story, headers = headers)
soup_2 = BeautifulSoup(page_2.text, 'lxml')

### 9. Read + print the header as well as the first 3 sentences of the main body to the screen

In [16]:
# Find the header and the main body
header = soup_2.select("h1.Heading-sc-1w5xk2o-0.iQhOvV")
body = soup_2.select("div.Raw-slyvem-0.bCYKCn p")

In [17]:
# Convert elements in body into strings and store in a list
raw = []
for i in body: 
    raw.append(str(i))

# Keep the leading text of each paragraph, skipping paragraphs without any
para = []
for i in raw:
    kk = re.findall('>(.*?)<', i)
    if kk and kk[0]:
        para.append(kk[0])

In [18]:
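# Get the first 3 sentences of the main body
sentences = []

for paragraph in para:
    # A sentence is taken to end at a lowercase letter followed by a period
    for sentence in re.findall(r"(.*?[a-z]\.)", paragraph):
        sentences.append(sentence)

    # Stop once enough sentences are collected, even if a paragraph overshoots
    if len(sentences) >= 3:
        break

# Trim any extra sentences from the last paragraph read
sentences = sentences[:3]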

sentences

['Sales of existing homes slid 1.5% in December, somewhat better than expected but the 11th straight month of decline, the National Association of Realtors said on Friday.',
 'The number was better than estimates of a 3.4% drop and brings the annual rate of home sales just a hair above 4 million.',
 ' Sales are now down 34% from year-ago levels.']

In [19]:
# Print out
print("Header: " + re.findall(">(.*)<", str(header))[0])

for i in range(len(sentences)):
    print("Sentence " + str(i+1) + ": " + sentences[i].strip())

Header: Existing Homes Fall 1.5% in December, Marking 11th Month of Declines
Sentence 1: Sales of existing homes slid 1.5% in December, somewhat better than expected but the 11th straight month of decline, the National Association of Realtors said on Friday.
Sentence 2: The number was better than estimates of a 3.4% drop and brings the annual rate of home sales just a hair above 4 million.
Sentence 3: Sales are now down 34% from year-ago levels.
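The "lowercase letter followed by a period" rule drops sentences that end in a digit, a capital letter, or a question mark. One possible variation, sketched here with a hypothetical `first_sentences` helper and made-up sample paragraphs, splits after `.`, `!` or `?` only when whitespace and a capital letter follow, so decimals such as "1.5%" stay intact:

```python
import re

def first_sentences(paragraphs, limit=3):
    """Return up to `limit` sentences from a list of paragraph strings (hypothetical helper)."""
    sentences = []
    for paragraph in paragraphs:
        # Split where ., ! or ? is followed by whitespace and then a capital letter
        for part in re.split(r"(?<=[.!?])\s+(?=[A-Z])", paragraph.strip()):
            if part:
                sentences.append(part)
            if len(sentences) == limit:
                return sentences
    return sentences

# Made-up paragraphs in the style of the article text
sample_paragraphs = [
    "Sales slid 1.5% in December. The number beat estimates.",
    "Sales are down 34%. Prices rose.",
]

for number, sentence in enumerate(first_sentences(sample_paragraphs), start=1):
    print(f"Sentence {number}: {sentence}")
```

This is still a heuristic: abbreviations such as "U.S. Census" would be split incorrectly, so a dedicated sentence tokenizer may be preferable for less regular text.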
