# Reddit Scraper

## Instructions

* In this activity, you will scrape the [Python subreddit](https://www.reddit.com/r/Python/).

* Use Beautiful Soup to scrape only threads that have one or more comments, then print each thread's title, number of comments, and URL.

* Your output should look something like this:

  ![reddit.png](Images/reddit.png)


## Bonus

* If you finish early, try to display each thread's top comment in your output!


In [4]:
# Dependencies
from bs4 import BeautifulSoup
import requests

# URL of Python reddit
url = 'https://www.reddit.com/r/Python/'

# Retrieve page with the requests module
html = requests.get(url)

# Create BeautifulSoup object; parse with 'html.parser'
soup = BeautifulSoup(html.text, 'html.parser')

In [5]:
print(html)
# If you get a 429, the server is receiving too many requests right now. Wait a minute and try again.

<Response [429]>
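The "wait a minute and try again" advice can be automated. The following is a minimal sketch, not part of the original activity: `get_with_retry`, its parameters, and the 60-second default are all hypothetical names and values chosen for illustration. It takes any zero-argument callable that returns an object with a `status_code` attribute, so it could wrap `requests.get` without depending on it.

```python
import time


def get_with_retry(fetch, max_attempts=3, wait_seconds=60, sleep=time.sleep):
    """Call fetch() until the response is not a 429 or attempts run out.

    `fetch` is any zero-argument callable returning an object with a
    `status_code` attribute (hypothetical helper, for illustration only).
    """
    response = fetch()
    for _ in range(max_attempts - 1):
        if response.status_code != 429:
            break
        sleep(wait_seconds)  # give the server time before asking again
        response = fetch()
    return response
```

With the variables from this notebook, a call might look like `html = get_with_retry(lambda: requests.get(url))`. Passing `sleep` as a parameter is a design choice that lets the waiting be skipped when trying the function out.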


* When you don't set a user agent, the requests library sends its default user agent.

* If many people use the default user agent at the same time, the server treats them all as a single bot sending too many requests.
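To see what the two points above mean in practice, the sketch below prints the default user agent and then sets a custom one. It makes no network request. Using a `requests.Session` here is an assumption made for illustration (the notebook itself passes `headers` directly to `requests.get`), and `'your bot 1.0'` is only a placeholder value.

```python
import requests

# The User-Agent that requests sends when none is supplied,
# something like "python-requests/<version>"
default_ua = requests.utils.default_user_agent()
print(default_ua)

# A session sends its headers with every request made through it,
# so the custom User-Agent only has to be set once
session = requests.Session()
session.headers.update({'User-Agent': 'your bot 1.0'})
print(session.headers['User-Agent'])
```

A descriptive, unique user agent makes it less likely that your requests are counted together with everyone else's default-agent traffic.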

In [22]:
# Send a custom User-Agent header instead of the requests default
html = requests.get(url, headers={'User-agent': 'your bot 1.0'})
soup = BeautifulSoup(html.text, 'html.parser')

In [23]:
print(html)

<Response [200]>


In [24]:
# Examine the results, then determine which element contains the info we want
# find_all returns a list-like ResultSet that can be iterated over
results = soup.find_all('div', class_='top-matter')
results

[<div class="top-matter"><p class="title"><a class="title may-blank " data-event-action="title" data-href-url="/r/Python/comments/75jklc/rpython_official_job_board/" data-inbound-url="/r/Python/comments/75jklc/rpython_official_job_board/?utm_content=title&amp;utm_medium=hot&amp;utm_source=reddit&amp;utm_name=Python" href="/r/Python/comments/75jklc/rpython_official_job_board/" rel="" tabindex="1">/r/Python official Job Board!</a> <span class="domain">(<a href="/r/Python/">self.Python</a>)</span></p><div class="expando-button collapsed hide-when-pinned selftext"></div><p class="tagline ">submitted <time class="live-timestamp" datetime="2017-10-10T19:40:06+00:00" title="Tue Oct 10 19:40:06 2017 UTC">4 months ago</time> by <a class="author may-blank id-t2_628u" href="https://www.reddit.com/user/aphoenix">aphoenix</a><span class="flair " title="reticulated">reticulated</span><span class="userattrs"></span> - <span class="stickied-tagline" title="selected by this subreddit's moderators">anno
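Each result carries a comments link whose text has to be turned into a number before threads can be filtered. The sketch below shows one way to do that with a regular expression. `parse_comment_count` is a hypothetical helper, and the sample labels are assumptions about what the link text looks like (a count followed by "comments" or "comment", or a bare "comment" when there is no count).

```python
import re


def parse_comment_count(label):
    """Return the leading integer in a label such as '145 comments', else 0."""
    match = re.match(r'\s*(\d+)', label)
    return int(match.group(1)) if match else 0


# Assumed sample labels; real link text may differ
for label in ['145 comments', '1 comment', 'comment']:
    print(repr(label), '->', parse_comment_count(label))
```

Returning `0` when no number is present means the result can be used directly in an `if` test to skip threads without comments.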

In [25]:
for result in results:
    # Thread title is the link text inside <p class="title">
    title = result.find('p', class_='title')
    title_text = title.a.text
    # The first <li> holds the comments link, e.g. "145 comments"
    thread = result.find('li', class_='first')
    if thread is None or thread.a is None:
        continue  # no comments link found for this result
    # Take the leading number; a label with no number counts as zero
    words = thread.text.split()
    comments_num = int(words[0]) if words and words[0].isdigit() else 0
    link = thread.a['href']
    # Print only threads with one or more comments
    if comments_num:
        print('-----------------')
        print(title_text)
        print('Comments:', comments_num)
        print(link)

-----------------
/r/Python official Job Board!
Comments: 145
https://www.reddit.com/r/Python/comments/75jklc/rpython_official_job_board/
-----------------
What's everyone working on this week?
Comments: 34
https://www.reddit.com/r/Python/comments/82fb73/whats_everyone_working_on_this_week/
-----------------
Library to draw simple 2d objects and move them on a grid.
Comments: 10
https://www.reddit.com/r/Python/comments/82o34s/library_to_draw_simple_2d_objects_and_move_them/
-----------------
What can I do with python that's not data analysis? Would I be able to be a software engineer with only python?
Comments: 138
https://www.reddit.com/r/Python/comments/82jkl6/what_can_i_do_with_python_thats_not_data_analysis/
-----------------
None is None is None -> True
Comments: 8
https://www.reddit.com/r/Python/comments/82pbtc/none_is_none_is_none_true/
-----------------
[Flask] Best way to create a reusable app?
Comments: 3
https://www.reddit.com/r/Python/comments/82lvhb/flask_best_way_to_creat