<font color='DimGrey'>_**Cryptocurrency Reddit Webscraping - Edward Blair**_</font>
======

<p style="font-size:21px">
    <i>
        <span style="color:DimGrey">Please read and complete</span>
        <a href="#importantcell"> this cell </a>
        <span style="color:DimGrey">before running anything. Once complete, the cells can be run in order from the top to produce the result.</span>
    </i>
</p>

## Required installs:
***
- **PRAW** = **P**ython **R**eddit **A**PI **W**rapper
- A powerful Python package, allowing for simple access to Reddit's API

In [None]:
%pip install praw

## Necessary imports:
***

In [None]:
import itertools
import praw
import collections
from operator import itemgetter
import seaborn as sns
import matplotlib.pyplot as plt

<a name='importantcell'></a>
## *Important* - What you need to do:
***
- To use PRAW, you need a **Reddit instance** configured with a ```client_id```, ```client_secret``` and ```user_agent```
- First, go to [this page](https://www.reddit.com/prefs/apps) and click **create app** or **create another app**:

<img src="ScraperTutorial1.png" width="850" height="350">

- Once clicked, a form will open. Fill in a name, description and redirect URI. For the redirect URI, choose ```http://localhost:8080```:

<img src="ScraperTutorial2.png" width="850" height="350">

- Once you press **create app**, the new application appears in the list. Its entry shows the information needed to fill in the ```praw.Reddit``` instance:

<img src="ScraperTutorial3.png" width="850" height="350">

## Edit _the cell below_ to contain your own client_id, client_secret and user_agent:

In [None]:
reddit = praw.Reddit(client_id = '[YOUR_CLIENT_ID_HERE]', client_secret = '[YOUR_CLIENT_SECRET_HERE]', user_agent = '[YOUR_USER_AGENT_HERE]')
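
As an alternative to pasting credentials straight into the notebook, the three values can be read from environment variables. The following is only a minimal sketch: the helper `load_reddit_credentials` and the variable names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET` and `REDDIT_USER_AGENT` are hypothetical choices for illustration, not part of PRAW or of the original notebook.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
ENV_KEYS = {
    "client_id": "REDDIT_CLIENT_ID",
    "client_secret": "REDDIT_CLIENT_SECRET",
    "user_agent": "REDDIT_USER_AGENT",
}


def load_reddit_credentials(env=os.environ):
    """Return the three praw.Reddit keyword arguments, or raise if any is missing."""
    missing = [var for var in ENV_KEYS.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {kwarg: env[var] for kwarg, var in ENV_KEYS.items()}


# Usage in the notebook (assumes the variables were set before starting Jupyter):
# reddit = praw.Reddit(**load_reddit_credentials())
```

With this approach the notebook can be shared without exposing the `client_secret`, since the values live outside the file.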

## After running the cells above, run the cell below to produce the graph:
***

In [None]:
#Choose number of hot posts to trawl through
numHotPosts = 350
hot_posts = reddit.subreddit('CryptoMoonShots').hot(limit=numHotPosts)

#Titles of hot_posts are of type <class 'str'>

#split() each title string into a list - each word is a list item
title_list = []
for post in hot_posts:
    title_list.append(post.title.split())
    
#Flatten list fully
final_list = []
for row in title_list:
    for item in row:
        final_list.append(item)

#occurrences - a count of how many times each item appears in the list
occurrences = collections.Counter(final_list)

#dict: k = word from title, v = number of occurrences of word
my_dict = dict(occurrences)

#Separate wheat from chaff: keep only the words that begin with '$'
#Post titles precede ticker symbols of cryptocurrencies with '$' e.g. '$BTC'
#The keys are strings, so startswith() checks the 0th character
my_dict = {word: count for word, count in my_dict.items() if word.startswith("$")}

'''
To print the current dictionary in sorted order:
print(sorted(my_dict.items(), key = itemgetter(1)))
'''

#Take the 10 most frequently occurring items of the dict, put them into a new dict
top_ten_dict = dict(sorted(my_dict.items(), key=itemgetter(1), reverse=True)[:10])

'''
Graphing the top_ten_dict:
x = name of crypto
y = number of times it is mentioned
'''
#Resizing of the graph's x and y
plt.rcParams['figure.figsize'] = [14, 6]

keys = list(top_ten_dict.keys())
#Acquire values in same order as the keys, parse the values
vals = [float(top_ten_dict[k]) for k in keys]
sns.barplot(x=keys, y=vals)
plt.title("Top ten most frequently mentioned crypto names within the " + str(numHotPosts) + " current hottest posts on r/CryptoMoonShots")
plt.xlabel("Crypto name")
plt.ylabel("Number of mentions in post title")
plt.show()
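
The cell above counts each word exactly as it appears in a title, so `$BTC`, `$btc` and `$BTC,` are treated as three different names, and a price such as `$100` is counted as a ticker. The sketch below shows one possible way to normalise the words before counting. The function `count_tickers` and the sample titles are hypothetical and are not taken from the original notebook or from Reddit data.

```python
import collections
import string


def count_tickers(titles, top_n=10):
    """Count '$'-prefixed words across titles, ignoring case and edge punctuation."""
    counts = collections.Counter()
    for title in titles:
        for word in title.split():
            if not word.startswith("$"):
                continue
            # Strip punctuation from both ends (this also removes the leading '$').
            symbol = word.strip(string.punctuation).upper()
            # Skip bare '$' signs and prices such as '$100'.
            if symbol and not symbol[0].isdigit():
                counts["$" + symbol] += 1
    # most_common() returns (ticker, count) pairs, highest count first.
    return counts.most_common(top_n)


# Made-up titles for illustration only.
sample_titles = [
    "$BTC to the moon!",
    "Why $btc, and not $ETH?",
    "Bought $100 of $eth today",
]
print(count_tickers(sample_titles))
```

In the notebook, the same function could be fed `[post.title for post in hot_posts]`, and the resulting pairs converted with `dict(...)` before plotting.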

## Sources
***
- Inspiration for this webscraper as well as the tutorial material was drawn from: https://towardsdatascience.com/scraping-reddit-data-1c0af3040768