<h1 style="color:dimgray">
    <i>    
    Cryptocurrency webscraping - Edward Blair
    </i>
</h1>

Please read and complete [this cell](#important-cell) before running anything. Once that is done, run the cells in order from the top to produce the graph.

## Necessary imports:
***

In [None]:
import praw
import collections
from operator import itemgetter
import seaborn as sns
import matplotlib.pyplot as plt

<a id='important-cell'></a>
## *Important* - What you need to do:
***
- To use PRAW (the Python Reddit API Wrapper), you need a **Reddit instance** created with a `client_id`, `client_secret` and `user_agent`
- First, go to [this page](https://www.reddit.com/prefs/apps) and click **create app** (or **create another app** if you already have one):

<img src="ScraperTutorial1.png" width="850" height="350">

- A form will open. Fill in a name, a description and a redirect URI; for the redirect URI, use `http://localhost:8080`:

<img src="ScraperTutorial2.png" width="850" height="350">

- After you press **create app**, the new application appears on the page. It shows the client ID and secret needed to fill in the `praw.Reddit` instance:

<img src="ScraperTutorial3.png" width="850" height="350">

## Edit _the cell below_ to contain your own `client_id`, `client_secret` and `user_agent`:

In [None]:
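# With only these three credentials, PRAW creates a read-only instance,
# which is enough for fetching public posts
reddit = praw.Reddit(
    client_id='[YOUR_CLIENT_ID_HERE]',
    client_secret='[YOUR_CLIENT_SECRET_HERE]',
    user_agent='[YOUR_USER_AGENT_HERE]',
)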
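Hard-coding credentials in a notebook makes them easy to leak when the notebook is shared. Below is a minimal sketch of one alternative, assuming you export the values as environment variables. The variable names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET` and `REDDIT_USER_AGENT` and the helper `load_reddit_credentials` are hypothetical choices for illustration, not something PRAW requires:

```python
import os

# Hypothetical environment variable names; rename them to suit your setup.
# Keys are the keyword arguments praw.Reddit expects.
ENV_VARS = {
    "client_id": "REDDIT_CLIENT_ID",
    "client_secret": "REDDIT_CLIENT_SECRET",
    "user_agent": "REDDIT_USER_AGENT",
}


def load_reddit_credentials(environ=os.environ):
    """Build keyword arguments for praw.Reddit from environment variables.

    Raises KeyError naming every variable that is missing or empty,
    so all problems are reported at once.
    """
    missing = [name for name in ENV_VARS.values() if not environ.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {kwarg: environ[name] for kwarg, name in ENV_VARS.items()}
```

The returned dict can then be unpacked into the constructor, e.g. `reddit = praw.Reddit(**load_reddit_credentials())`. Passing a plain dict as `environ` makes the helper easy to check without touching the real environment.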

## Once the cells above have been run, run the cells below to produce the graph:
***

In [None]:
def get_top_dollar_posts(reddit, subreddit, numHotPosts=350, returnNum=10):
    """Return a dict mapping the returnNum most frequent '$'-prefixed words
    in the titles of a subreddit's hot posts to their mention counts."""
    hot_posts = reddit.subreddit(subreddit).hot(limit=numHotPosts)
    
    # Post titles are plain strings; split() each title into a list of words
    title_list = [post.title.split() for post in hot_posts]
    
    # Flatten the list of lists into a single list of words
    final_list = [word for words in title_list for word in words]
    
    # Count how many times each word appears across all titles
    occurrences = collections.Counter(final_list)
    
    # Keep only words beginning with '$': posters conventionally prefix
    # cryptocurrency ticker symbols with '$', e.g. '$BTC'
    filtered_occurrences = {k: v for k, v in occurrences.items() if k.startswith('$')}
    
    # Sort by count (highest first) and keep the top returnNum entries
    ordered_occurrences = sorted(filtered_occurrences.items(), key=itemgetter(1), reverse=True)
    top_n_occurrences = dict(ordered_occurrences[:returnNum])
    
    return top_n_occurrences
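The counting logic above treats `$BTC`, `$btc` and `$BTC,` as three different words, and it also counts dollar amounts such as `$100`. Below is a hedged sketch of a stricter variant that swaps the `startswith('$')` filter for a regular expression. The pattern and the helper name `count_tickers` are assumptions for illustration, and because it works on plain strings it can be tried without a Reddit connection:

```python
import collections
import re

# Hypothetical pattern: '$' followed by a letter, then letters or digits.
# This skips dollar amounts such as '$100' and leaves out trailing
# punctuation such as the comma in '$BTC,'.
TICKER_PATTERN = re.compile(r"\$[A-Za-z][A-Za-z0-9]*")


def count_tickers(titles, return_num=10):
    """Count '$'-prefixed tickers in an iterable of title strings.

    Tickers are upper-cased so '$btc' and '$BTC' are counted together.
    Returns a dict of the return_num most common tickers, highest count
    first; tied tickers keep the order in which they were first seen.
    """
    counts = collections.Counter(
        match.upper()
        for title in titles
        for match in TICKER_PATTERN.findall(title)
    )
    return dict(counts.most_common(return_num))


if __name__ == "__main__":
    sample_titles = [
        "$BTC to the moon!",
        "Is $btc, or $ETH, better?",
        "Bought $100 of $DOGE",
        "$ETH gas fees",
    ]
    print(count_tickers(sample_titles, 2))  # prints {'$BTC': 2, '$ETH': 2}
```

To use it with PRAW, the titles could be passed straight from the listing, e.g. `count_tickers(post.title for post in reddit.subreddit('CryptoMoonShots').hot(limit=350))`. `Counter.most_common` also replaces the manual `sorted(..., key=itemgetter(1), reverse=True)` step.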

In [None]:
# Number of hot posts to scan
numHotPosts = 350

top_ten_dict = get_top_dollar_posts(reddit, 'CryptoMoonShots', numHotPosts, 10)

# Graph top_ten_dict as a bar chart:
#   x = ticker symbol (e.g. '$BTC')
#   y = number of times it is mentioned in post titles

# Set the figure size (width, height) in inches
plt.rcParams['figure.figsize'] = [14, 6]

# Dicts preserve insertion order, so keys and counts stay aligned
# and remain sorted from most to least mentioned
keys = list(top_ten_dict.keys())
vals = list(top_ten_dict.values())

sns.barplot(x=keys, y=vals)
plt.title(f"Top ten most frequently mentioned crypto tickers in the {numHotPosts} hottest posts on r/CryptoMoonShots")
plt.xlabel("Crypto ticker")
plt.ylabel("Number of mentions in post titles")
plt.show()

## Sources
***
- The inspiration for this web scraper, as well as the tutorial material, was drawn from: https://towardsdatascience.com/scraping-reddit-data-1c0af3040768