# Reddit Scraper

## Instructions

* In this activity, you will scrape the [Python subreddit](https://www.reddit.com/r/Python/).

* Use Beautiful Soup to scrape only threads that have one or more comments, then print the thread's title, number of comments, and the URL to the thread.

* Your output should look something like this:

  ![reddit.png](Images/reddit.png)


## Bonus

* If you finish early, try to display each thread's top comment in your output!


In [1]:
# Dependencies
from bs4 import BeautifulSoup
import requests

# URL of page to be scraped
url = 'https://www.reddit.com/r/Python/'

In [3]:
# Retrieve page with the requests module
response = requests.get(url)
response

<Response [429]>

In [7]:
# The bare request above returned 429 (Too Many Requests); retry with a custom User-agent header
response = requests.get(url, headers={'User-agent': 'totally not a bot'})
response

<Response [200]>
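
Here a custom User-agent was enough to turn the 429 into a 200. If a 429 still shows up now and then, one complementary option is to wait and try again. The following is a minimal sketch only: `fetch_with_retry`, its parameters, and the delay values are hypothetical names chosen for illustration, not part of `requests`.

```python
import time


def fetch_with_retry(fetch, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a status other than 429 or attempts run out.

    `fetch` is any zero-argument callable that returns an object with a
    `status_code` attribute (hypothetical helper, for illustration only).
    """
    response = None
    for attempt in range(max_attempts):
        response = fetch()
        if response.status_code != 429:
            return response
        # Wait longer after each throttled attempt, but not after the last one
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return response
```

In the notebook this could be called as `fetch_with_retry(lambda: requests.get(url, headers={'User-agent': 'totally not a bot'}))`. Passing the request in as a callable keeps the helper independent of the network, so it can be tried out with a stand-in object first.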

In [10]:
# Create BeautifulSoup object; parse with 'html.parser'
soup = BeautifulSoup(response.text, 'html.parser')

# Examine the results, then determine element that contains sought info
print(soup.prettify())


<!DOCTYPE doctype html>
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
 <head>
  <title>
   Python
  </title>
  <meta content=" reddit, reddit.com, vote, comment, submit " name="keywords"/>
  <meta content="reddit: the front page of the internet" name="description"/>
  <meta content="always" name="referrer"/>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
   <link href="/static/opensearch.xml" rel="search" type="application/opensearchdescription+xml"/>
   <link href="https://www.reddit.com/r/Python/" rel="canonical"/>
   <meta content="width=1024" name="viewport"/>
   <link href="//out.reddit.com" rel="dns-prefetch"/>
   <link href="//out.reddit.com" rel="preconnect"/>
   <meta content="https://www.redditstatic.com/icon.png" property="og:image"/>
   <meta content="reddit" property="og:site_name"/>
   <meta content="news about the dynamic, interpreted, interactive, object-oriented, extensible programming language Python" property="og:descripti

In [12]:
# find_all returns a list-like ResultSet of every matching tag
results = soup.find_all('div', class_="top-matter")
results

[<div class="top-matter"><p class="title"><a class="title may-blank " data-event-action="title" data-href-url="/r/Python/comments/75jklc/rpython_official_job_board/" data-inbound-url="/r/Python/comments/75jklc/rpython_official_job_board/?utm_content=title&amp;utm_medium=hot&amp;utm_source=reddit&amp;utm_name=Python" href="/r/Python/comments/75jklc/rpython_official_job_board/" rel="" tabindex="1">/r/Python official Job Board!</a> <span class="domain">(<a href="/r/Python/">self.Python</a>)</span></p><div class="expando-button collapsed hide-when-pinned selftext"></div><p class="tagline ">submitted <time class="live-timestamp" datetime="2017-10-10T19:40:06+00:00" title="Tue Oct 10 19:40:06 2017 UTC">5 months ago</time> by <a class="author may-blank id-t2_628u" href="https://www.reddit.com/user/aphoenix">aphoenix</a><span class="flair " title="reticulated">reticulated</span><span class="userattrs"></span> - <span class="stickied-tagline" title="selected by this subreddit's moderators">anno
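
Each listing's comment link carries a label such as `152 comments` or `1 comment`, and on this page a thread with no replies appears to show a bare `comment`. A minimal sketch of turning that label into an integer, so the "one or more comments" check becomes a plain numeric comparison; `parse_comment_count` is a hypothetical helper name, and the label formats are assumed from the output on this page:

```python
import re


def parse_comment_count(label):
    """Return the comment count from a label such as '152 comments'.

    A label with no leading number (for example a bare 'comment')
    is treated as zero comments.
    """
    match = re.match(r'\s*(\d+)', label)
    return int(match.group(1)) if match else 0


# Try the helper on the three label shapes assumed above
for label in ['152 comments', '1 comment', 'comment']:
    print(label, '->', parse_comment_count(label))
```

A thread then qualifies for printing when `parse_comment_count(label) >= 1`.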

In [18]:
for result in results:
    # The thread title is the link text inside <p class="title">
    title = result.find('p', class_="title")
    title_text = title.a.text
    # The first <li> holds the comments link: "152 comments", "1 comment",
    # or a bare "comment" when the thread has no comments yet
    thread = result.find('li', class_='first')
    comments = thread.text.strip()
    # Keep the leading token, which is the count whenever one is present
    comments_num = comments.split(' ')[0]
    link = thread.a['href']

    # Print only threads with one or more comments
    if comments_num.isdigit() and int(comments_num) > 0:
        print('---------')
        print(title_text)
        print('comments: ', comments_num)
        print(link)

---------
/r/Python official Job Board!
comments:  152
https://www.reddit.com/r/Python/comments/75jklc/rpython_official_job_board/
---------
What's everyone working on this week?
comments:  89
https://www.reddit.com/r/Python/comments/82fb73/whats_everyone_working_on_this_week/
---------
After writing two books in RStudio using RMarkdown and Knitr, I finally duplicated the features through Python and Pandoc.
comments:  13
https://www.reddit.com/r/Python/comments/83d92n/after_writing_two_books_in_rstudio_using/
---------
[Video] XSS - Vulnerability which we deserve (English subs) with Flask, pytest and Selenium
comments:  6
https://www.reddit.com/r/Python/comments/83d1ge/video_xss_vulnerability_which_we_deserve_english/
---------
Snips.ai open-sources its Natural Language Understanding Python lib
comments:  19
https://www.reddit.com/r/P