Created by [SmirkyGraphs](https://smirkygraphs.github.io/). Code: [Github](https://github.com/SmirkyGraphs/Python-Notebooks). Source: [/u/sodogetip](https://www.reddit.com/u/sodogetip).
<hr>

# /u/SoDogeTip - Dogecoin Tipping Bot

The subreddit /r/dogecoin has a tipping bot ([/u/sodogetip](https://www.reddit.com/user/sodogetip), created by [/u/just-a-dev](https://www.reddit.com/user/just-an-dev)). Users send it a command to tip someone a specific amount of dogecoin. If the recipient doesn't claim the tip within a set number of days, the coins go back to the original sender. This notebook cleans and analyzes all of the bot's comments, which were collected using the [PRAW](https://praw.readthedocs.io/en/stable/index.html) and [Pushshift.io](https://pushshift.io/) APIs.

The tip bot's comments follow a fixed format: `/u/sender -> /u/receiver 0.0 doge ($0.00)`
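
Because the format is fixed, the plain-text version of a comment could in principle be parsed with a single regular expression. The sketch below is an alternative to the HTML parsing the notebook actually uses. It assumes the text matches the template above exactly (usernames made of letters, digits, `_` and `-`, and an optional comma thousands separator); `parse_tip_line` and `TIP_PATTERN` are hypothetical names.

```python
import re

# Hypothetical pattern for "/u/sender -> /u/receiver 0.0 doge ($0.00)".
# Assumes Reddit-style usernames and an optional "," thousands separator.
TIP_PATTERN = re.compile(
    r"/u/(?P<tipper>[\w-]+)\s*->\s*/u/(?P<receiver>[\w-]+)\s+"
    r"(?P<num_coins>[\d,]+(?:\.\d+)?)\s+doge\s+"
    r"\(\$(?P<value_usd>[\d,]+(?:\.\d+)?)\)",
    re.IGNORECASE,
)


def parse_tip_line(text):
    """Return the tip fields as a dict, or None if the text doesn't match."""
    match = TIP_PATTERN.search(text)
    if match is None:
        return None
    return {
        "tipper": match["tipper"].lower(),
        "receiver": match["receiver"].lower(),
        "num_coins": float(match["num_coins"].replace(",", "")),
        "value_usd": float(match["value_usd"].replace(",", "")),
    }


print(parse_tip_line("/u/Alice -> /u/bob 100 doge ($0.25)"))
```

Comments that don't match, such as register/help messages, simply return `None`, which plays the same role as the notebook's `errors` list.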

Pushshift was used because PRAW limits the comments available for a single user to only 1,000 (new or top). In total, the script collected ~14,000 comment IDs between Pushshift and PRAW. A few comments had been removed, and 62 didn't follow the usual format and weren't parsed correctly (some of these were register/help messages from the bot).
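
The collection step itself isn't shown in this notebook. Because the two sources overlap, the ID lists need to be merged without double-counting. A minimal sketch of that merge, assuming each source yields a plain list of comment ID strings (`combine_comment_ids` is a hypothetical helper, not part of PRAW or Pushshift):

```python
def combine_comment_ids(praw_ids, pushshift_ids):
    """Merge two lists of comment IDs, dropping duplicates but keeping first-seen order."""
    seen = set()
    combined = []
    for comment_id in list(praw_ids) + list(pushshift_ids):
        key = comment_id.lower()  # normalise case before comparing
        if key not in seen:
            seen.add(key)
            combined.append(key)
    return combined


print(combine_comment_ids(["abc1", "abc2"], ["ABC2", "abc3"]))
```

A `set` gives constant-time membership checks, and the separate list preserves the order in which IDs were first seen.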

I have no affiliation with, and have never used, the bot (sodogetip), its dev (just-a-dev), or the /r/dogecoin community. I was looking for a project to build with PRAW and figured this might be an interesting one, given how much news coverage the cryptocurrency has gotten in the past year.

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

In [2]:
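errors = []  # comment ids that couldn't be parsed

def parse_comment(comment):
    """Pull tipper, receiver, amounts and tx id out of a bot comment's HTML."""
    if comment['body_string'] == '[removed]':
        return 'n/a'

    soup = BeautifulSoup(comment['body_html'], 'lxml')
    # positional extraction: relies on the bot's fixed comment layout
    links = [x['href'] for x in soup.find_all(href=True)]
    value = [x.get_text() for x in soup.find_all('strong')]

    try:
        data = {
            "comment_id": comment['comment_id'].lower(),
            "tipper": links[0].lower(),
            "reciever": links[1].lower(),
            "num_coins": value[1].replace('Ð', ''),  # strip the Ð symbol
            "value_usd": value[3],
            "transaction_id": links[3]
        }
        return data
    except IndexError:
        # comment doesn't follow the usual layout (e.g. register/help messages)
        errors.append(comment['comment_id'])
        return None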
        
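# Both helpers query the chain.so v2 API and return None if the request fails.
def is_tx_spent(tx_id):
    api_url = 'https://chain.so/api/v2/is_tx_spent/DOGE'
    # the trailing /0 checks output index 0 of the transaction
    r = requests.get(f'{api_url}/{tx_id}/0', timeout=10)

    if r.status_code == 200:
        return r.json()['data']['is_spent']
    return None

def is_tx_confirmed(tx_id):
    api_url = 'https://chain.so/api/v2/is_tx_confirmed/DOGE'
    r = requests.get(f'{api_url}/{tx_id}/', timeout=10)

    if r.status_code == 200:
        return r.json()['data']['is_confirmed']
    return None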
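
`parse_comment` depends entirely on the position of each link and `<strong>` tag: `links[0]`, `links[1]` and `links[3]` are taken as tipper, receiver and transaction, and `value[1]`/`value[3]` as coin amount and USD value. The sketch below reproduces that positional logic with the standard library's `html.parser` so the indexing can be tried without BeautifulSoup or lxml. The sample HTML is invented to match those indices. It is not the bot's real markup, and `TipCommentParser`/`extract_tip` are hypothetical names.

```python
from html.parser import HTMLParser


class TipCommentParser(HTMLParser):
    """Collect every href and the text of each <strong> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.strong_text = []
        self._in_strong = False

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if href:
            self.links.append(href)
        if tag == "strong":
            self._in_strong = True
            self.strong_text.append("")

    def handle_endtag(self, tag):
        if tag == "strong":
            self._in_strong = False

    def handle_data(self, data):
        if self._in_strong:
            self.strong_text[-1] += data


def extract_tip(html):
    """Apply the notebook's positional indices; return None if any are missing."""
    parser = TipCommentParser()
    parser.feed(html)
    parser.close()
    links, values = parser.links, parser.strong_text
    try:
        return {
            "tipper": links[0].lower(),
            "reciever": links[1].lower(),  # spelling mirrors the notebook's column
            "num_coins": values[1].replace("Ð", ""),
            "value_usd": values[3],
            "transaction_id": links[3],
        }
    except IndexError:
        return None


# Invented sample laid out to match the indices above (not real bot output).
sample = (
    '<p><a href="/u/alice">/u/alice</a> -&gt; <a href="/u/bob">/u/bob</a> '
    "<strong>sent</strong> <strong>Ð100</strong> <strong>doge</strong> "
    "<strong>($0.25)</strong> "
    '<a href="https://example.com/help">help</a> '
    '<a href="https://example.com/tx/abc123">tx</a></p>'
)
print(extract_tip(sample))
```

A comment with fewer than four links or four `<strong>` tags returns `None`. This mirrors how the notebook appends such comments to `errors`.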

In [3]:
df = pd.read_csv('./data/raw/comments.csv', parse_dates=['comment_datetime'])

# filter out removed comments
df = df[df['body_string'] != '[removed]']

# parse body_html for values & names
df['parsed'] = df.apply(parse_comment, axis=1)
parsed = df['parsed'].apply(pd.Series)

# merge parsed to the regular dataframe
df = df.merge(parsed, how='left', on='comment_id')
df = df.drop(columns=['parsed']).dropna(subset=['value_usd'], axis=0)

# clean transaction_id (remove web address)
df['transaction_id'] = df['transaction_id'].apply(lambda x: x.split('/')[-1])

# strip "$", "(" and ")" from the USD value, drop rows that still contain
# letters (non-numeric values), then convert both numeric columns to float
df['value_usd'] = df['value_usd'].str.replace('[$()]', '', regex=True)
df = df[~df['value_usd'].str.contains('[a-z]', regex=True)]
df[['value_usd', 'num_coins']] = df[['value_usd', 'num_coins']].astype(float)
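
Cell 3's two string clean-ups can be illustrated on single values. This is a minimal sketch, assuming raw USD strings look like `($0.42)` and transaction links end in the hash. Unlike the notebook, it also strips commas and checks for letters case-insensitively. Both differences are assumptions about inputs the notebook may never have seen. `clean_usd` and `clean_transaction_id` are hypothetical helpers.

```python
import re


def clean_transaction_id(url):
    """Keep only the last path segment of a transaction URL (the tx hash)."""
    return url.rstrip("/").split("/")[-1]


def clean_usd(raw):
    """Convert text like '($1,234.56)' to 1234.56; return None if it isn't numeric."""
    stripped = re.sub(r"[$(),]", "", raw).strip()
    # Like the notebook's filter, drop anything containing letters; this also
    # stops float() from accepting strings such as "nan" or "inf".
    if re.search(r"[a-z]", stripped, re.IGNORECASE):
        return None
    try:
        return float(stripped)
    except ValueError:
        return None


print(clean_usd("($0.42)"), clean_usd("(n/a)"))
print(clean_transaction_id("https://example.com/tx/DOGE/abc123"))
```

Returning `None` instead of raising keeps one bad value from aborting the whole conversion. The notebook gets the same effect by filtering rows before calling `astype(float)`.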

In [4]:
# top tippers by USD value (the username is the groupby index, so keep the index when saving)
top_value = df.groupby('tipper')['value_usd'].agg(['sum', 'size']).sort_values(by='sum', ascending=False).head()
top_value.to_csv('./data/clean/top_tippers_usd.csv')

# top tippers by coins
top_coin = df.groupby('tipper')['num_coins'].agg(['sum', 'size']).sort_values(by='sum', ascending=False).head()
top_coin.to_csv('./data/clean/top_tippers_coins.csv')

# top receivers by USD value
top_value_rec = df.groupby('reciever')['value_usd'].agg(['sum', 'size']).sort_values(by='sum', ascending=False).head()
top_value_rec.to_csv('./data/clean/top_reciever_usd.csv')

# top receivers by coins
top_coins_rec = df.groupby('reciever')['num_coins'].agg(['sum', 'size']).sort_values(by='sum', ascending=False).head()
top_coins_rec.to_csv('./data/clean/top_reciever_coins.csv')

# largest coin transactions
cols = ['submission_id', 'comment_id', 'comment_datetime', 'tipper', 'reciever', 'num_coins', 'value_usd']
single_most_coins = df[cols].sort_values(by='num_coins', ascending=False).head()
single_most_coins.to_csv('./data/clean/single_most_coins.csv', index=False)

# largest USD-valued transactions (reuses cols from above)
single_most_values = df[cols].sort_values(by='value_usd', ascending=False).head()
single_most_values.to_csv('./data/clean/single_most_values.csv', index=False)
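
The rankings in cell 4 come down to two operations. The first sums a column per user and counts rows, which is what `groupby(...).agg(['sum', 'size'])` does. The second takes the largest individual rows. The sketch below does the same with the standard library on a few invented rows, in case pandas isn't at hand. `top_by_sum` and `largest_single` are hypothetical names, and the `"reciever"` key copies the notebook's column spelling.

```python
import heapq
from collections import defaultdict


def top_by_sum(rows, key, value, n=5):
    """Sum `value` per `key`; return the n largest as (name, total, count)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
        counts[row[key]] += 1
    ranked = sorted(totals, key=totals.get, reverse=True)[:n]
    return [(name, totals[name], counts[name]) for name in ranked]


def largest_single(rows, value, n=5):
    """Return the n individual rows with the largest `value`."""
    return heapq.nlargest(n, rows, key=lambda row: row[value])


# Invented example rows, not data from the bot.
rows = [
    {"tipper": "alice", "reciever": "bob", "num_coins": 100.0, "value_usd": 0.25},
    {"tipper": "alice", "reciever": "carol", "num_coins": 50.0, "value_usd": 0.10},
    {"tipper": "bob", "reciever": "alice", "num_coins": 500.0, "value_usd": 1.00},
]
print(top_by_sum(rows, "tipper", "num_coins"))
print(largest_single(rows, "value_usd", n=1))
```

`heapq.nlargest` avoids sorting every row when only the top few are needed. For a table of ~14,000 comments, a full sort as in the notebook is equally practical.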