# Mapping controversies tutorial x: harvest submissions and comments from Subreddits


## Step 1: Installing the right libraries 
Libraries for Python can be understood as pre-written pieces of code. This means that instead of writing many lines of code in order to, e.g., connect to Reddit, you can do it in one command.
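
The import, install and retry pattern used in the installation cell can also be written as one small helper. This is a minimal sketch and not part of the original notebook: the name `ensure_library` is hypothetical, and it assumes that `pip` is available for the Python interpreter running the notebook.

```python
import importlib
import subprocess
import sys


def ensure_library(name):
    """Import `name`, installing it with pip first if it is missing.

    Hypothetical helper for illustration; returns the imported module.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        print(name + " library not found. Installing...")
        # Install into the same interpreter that runs this notebook.
        subprocess.check_call([sys.executable, "-m", "pip", "install", name])
        importlib.invalidate_caches()
        return importlib.import_module(name)


# A standard-library module is used here so the example runs without a download.
json_module = ensure_library("json")
print(json_module.__name__ + " library has been imported")
```

Calling `ensure_library("praw")` would follow the same path; if the installation fails, the error from `pip` is raised instead of being hidden.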

In order to run the installation, click on the cell below and press "Run" in the menu. 


In [None]:
# In this cell Jupyter checks whether you have the right libraries installed to carry out the harvest of data from Reddit

try: #First, Jupyter tries to import a library
    import praw
    print("praw library has been imported")
except ImportError: #If it fails, it will try to install the library
    print("praw library not found. Installing...")
    !pip install praw
    try:#... and try to import it again
        import praw
    except ImportError: #If the import fails again, a message is printed.
        print("Something went wrong in the installation of the praw library. Please check your internet connection and consult output from the installation below")
try:
    import pandas
    print("pandas library has been imported")
except ImportError:
    print("pandas library not found. Installing...")
    !pip install pandas
    
    try:
        import pandas
    except ImportError:
        print("Something went wrong in the installation of the pandas library. Please check your internet connection and consult output from the installation below")

try:
    import psaw
    print("psaw library has been imported")
except ImportError:
    print("psaw library not found. Installing...")
    !pip install psaw
    
    try:
        import psaw
    except ImportError:
        print("Something went wrong in the installation of the psaw library. Please check your internet connection and consult output from the installation below")
        

## Step 2: Generate Reddit app

Next, create an app on Reddit. This is done in order to get access to the API. You can do so by following the first step of [this tutorial](http://www.storybench.org/how-to-scrape-reddit-with-python/). 

### When you have the _14-character personal use script_ and the _27-character secret key_, run the cell below and input the information.

<img src="https://res.cloudinary.com/dra3btd6p/image/upload/v1550562228/Mapping%20controversies%202019/reddit_app.jpg" title="Reddit app page" style="width: 900px;" /> 
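
Because the two keys are easy to swap when typing them in, it can help to check their lengths before using them. The following is a minimal sketch: the function name `check_credentials` is hypothetical, and the lengths (14 and 27 characters) are the ones stated in this tutorial; Reddit may issue keys of other lengths, so treat a warning as a hint rather than an error.

```python
def check_credentials(personal_use_script, secret_key):
    """Return a list of warnings about key lengths (empty if none).

    Hypothetical helper; the expected lengths are assumptions taken
    from this tutorial, not from Reddit's documentation.
    """
    warnings = []
    if len(personal_use_script.strip()) != 14:
        warnings.append("personal use script is not 14 characters long")
    if len(secret_key.strip()) != 27:
        warnings.append("secret key is not 27 characters long")
    return warnings


# Made-up placeholder values: the secret key is one character short.
example_warnings = check_credentials("a" * 14, "b" * 26)
print(example_warnings)
```

In the notebook, this could be called as `check_credentials(pus, secret)` after the input cell has been run.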


### You need to run the cell below and input the information before you go to step 3

In [None]:
##### RUN THIS CELL FIRST!


print("Enter the 27 character secret key from the app page: ")
secret=input()

print("Enter the 14 character personal use script key from the app page: ")
pus=input()

print("Enter your app-name :")
app_name=input()

print("Enter your reddit user name: ")
user_name=input()

print("Enter your reddit password: ")
pw=input()

Enter the 27 character secret key from the app page: 


## Step 3: Harvest the data from Reddit

The Reddit API does not allow for a complete harvest of all data. You can harvest a maximum of 1000 submissions from a subreddit (e.g. www.reddit.com/r/askreddit), but from those submissions you can harvest all comments and replies. The submissions can be selected based on five different measures:

- Hottest
- Controversial
- Newest
- Top
- Start date (harvested through Pushshift with the psaw library)

(https://praw.readthedocs.io/en/latest/code_overview/models/subreddit.html?highlight=submissions#praw.models.Subreddit.submissions)
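
The first four measures correspond to the listing methods `hot`, `controversial`, `new` and `top` of a praw subreddit object. A minimal sketch of how a menu choice can be mapped to one of these methods is shown here. The function `select_listing` and the class `FakeSubreddit` are hypothetical; the fake class only stands in for a real `reddit.subreddit(...)` object so that the example runs without a Reddit connection.

```python
def select_listing(subreddit, measure, limit):
    """Return the submissions for the menu choices '1' to '4'."""
    listing_names = {"1": "hot", "2": "controversial", "3": "new", "4": "top"}
    if measure not in listing_names:
        raise ValueError("measure must be one of '1', '2', '3' or '4'")
    # Look up the listing method by name and call it with the limit.
    listing_method = getattr(subreddit, listing_names[measure])
    return listing_method(limit=limit)


class FakeSubreddit:
    """Offline stand-in for a praw Subreddit, for illustration only."""

    def hot(self, limit=None):
        return ["hot_" + str(i) for i in range(limit)]

    def controversial(self, limit=None):
        return ["controversial_" + str(i) for i in range(limit)]

    def new(self, limit=None):
        return ["new_" + str(i) for i in range(limit)]

    def top(self, limit=None):
        return ["top_" + str(i) for i in range(limit)]


chosen = select_listing(FakeSubreddit(), "2", 3)
print(chosen)
```

With a real praw subreddit object the call returns a generator of submissions rather than a list of strings.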

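The script outputs entries (submissions and comments) to a CSV file (one entry per line), plus a second CSV file listing the users who took part in each submission. You can then use Table2Net or other tools to make sense of the data. Each comment refers to its parent (a submission or another comment) in the column "parent_id". 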
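
As an illustration of how the "parent_id" column can be used, the sketch below groups comments under their parent. The three sample rows are made up and use only a subset of the script's columns; with a real harvest you would open the CSV file instead of the in-memory sample.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample with a subset of the columns written by the script.
sample_csv = (
    "Type,id,author,body,parent_id,submission_id\n"
    "Submission,abc123,alice,First post,askreddit,abc123\n"
    "comment,c1,bob,A reply,abc123,abc123\n"
    "comment,c2,carol,A reply to bob,c1,abc123\n"
)

# Map each parent id to the ids of the comments that reply to it.
children = defaultdict(list)
for row in csv.DictReader(io.StringIO(sample_csv)):
    if row["Type"] == "comment":
        children[row["parent_id"]].append(row["id"])

print(dict(children))
```

Here `c1` replies directly to the submission `abc123`, while `c2` replies to the comment `c1`, which is how a reply tree can be rebuilt from the flat file.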



In [6]:
import pandas as pd
import datetime as dt
import praw
import psaw
import csv


    
reddit = praw.Reddit(client_id=pus, \
                     client_secret=secret, \
                     user_agent=app_name, \
                     username=user_name, \
                     password=pw)
api=psaw.PushshiftAPI(reddit)

print("What would you like to call the file?")
input_filename=input()
measure="0"

csv_headers=["Type","id","author","body","created","up_votes","down_votes","likes","depth", "parent_id", "url", "reports", "subreddit","submission_id"]
com_count=0
sub_count=0
csv_headers_2=["submission_id", "submission_url", "users", "subreddit"]


blacklisted_users=[]
print("Enter user names you want to blacklist. If you want to blacklist multiple users, use comma separation")
raw_blacklist=input()
if "," in raw_blacklist:
    for each in raw_blacklist.split(","):
        blacklisted_users.append(each.strip().lower())
else: 
    blacklisted_users.append(raw_blacklist.strip().lower())

print("Would you like to enter the submission IDs manually (y/n)? Max 1000 submissions. Use this feature if you have manually identified submissions to focus on.")
manual_sub=input()
if manual_sub.lower()=="y":
    submissions=[]
    print("Enter the id of the submission(s). If more, separate by comma.")
    manual_list=input()
    for each in manual_list.split(","):
        each=each.replace(" ","")
        submissions.append(each)
    print("Harvesting data... This might take a while...")

    if ".csv" in input_filename:
        filename="submissions_"+input_filename
        filename_2="user_aggregates_submissions_"+input_filename
    else:
        filename="submissions_"+input_filename+".csv"
        filename_2="user_aggregates_submissions_"+input_filename+".csv"

    with open(filename,"w", newline='',encoding='utf-8') as f:
        wr = csv.writer(f, delimiter=",")
        wr.writerow(csv_headers)
    with open(filename_2,"w", newline='',encoding='utf-8') as q:
        wr2 = csv.writer(q, delimiter=",")
        wr2.writerow(csv_headers_2)
    for each in submissions:
        sub=reddit.submission(str(each))
        sub_author=sub.author
        sub_downs=sub.downs
        sub_ups=sub.ups
        sub_likes=sub.likes
        sub_title=sub.title
        sub_created=sub.created_utc
        sub_created=dt.datetime.utcfromtimestamp(sub_created).strftime('%Y-%m-%d %H:%M:%S')
        subreddit_name=str(sub.subreddit)
        sub_id=sub.id
        sub_reports=sub.num_reports
        sub_text=sub.selftext
        sub_users=[]
        sub_users.append(str(sub_author))
        sub_url="https://www.reddit.com/r/"+subreddit_name+"/comments/"+sub_id
        sub.comments.replace_more(limit=None)
        sub_list=sub.comments.list()
        sub_count=sub_count+1
        csv_list=["Submission",sub_id,sub_author, sub_text,sub_created,sub_ups,sub_downs, sub_likes, "N/A", subreddit_name, sub_url, sub_reports,subreddit_name, sub_id]
        with open(filename,"a", newline='',encoding='utf-8') as f:
            wr = csv.writer(f, delimiter=",")
            wr.writerow(csv_list)
        for comment in sub_list:
            comment_depth=comment.depth
            comment_parent_id=comment.parent_id
            comment_parent_id=comment_parent_id.split("_")[1]
            comment_reports=comment.num_reports
            comment_author=str(comment.author)
            if comment_author.lower() in blacklisted_users:
                continue
            comment_body=comment.body
            comment_created=comment.created_utc
            comment_created=dt.datetime.utcfromtimestamp(comment_created).strftime('%Y-%m-%d %H:%M:%S')
            comment_downs=comment.downs
            comment_ups=comment.ups
            comment_likes=comment.likes
            comment_id=comment.id
            sub_users.append(str(comment_author))
            comment_url="https://www.reddit.com/r/"+subreddit_name+"/comments/"+sub_id+"/"+sub_title+"/"+comment_id
            csv_list=["comment",comment_id, comment_author, comment_body,comment_created, comment_ups,comment_downs,comment_likes,comment_depth,comment_parent_id,comment_url,comment_reports,subreddit_name, sub_id]
            with open(filename,"a", newline='',encoding='utf-8') as f:
                wr = csv.writer(f, delimiter=",")
                wr.writerow(csv_list)
            com_count=com_count+1
        # Join the non-empty user names with ";" (no trailing separator)
        user_entry=";".join(user for user in sub_users if user)
        csv_list_2=[str(sub_id), str(sub_url), user_entry,str(subreddit_name)]
        with open(filename_2,"a", newline='',encoding='utf-8') as q:
            wr2 = csv.writer(q, delimiter=",")
            wr2.writerow(csv_list_2)
        if sub_count % 50 == 0:
            print("Data harvested from "+str(sub_count)+" submissions out of maximum "+str(len(submissions))+". Continuing harvest...")

else:
    
    print("Enter the name of the subreddit you would like to harvest: (e.g. askreddit)")
    subreddit_name=input()
    print("How many submissions would you like to harvest? (Be careful! It will take time to harvest 1000 submissions with a lot of comments!)")
    sub_limit=input()
    sub_limit=int(sub_limit)
    print("What type of measure would you like to use to define how the submissions are selected?")
    max_response_cache=sub_limit
    print("Press '1' for the "+str(sub_limit)+" 'hottest' submissions")
    print("Press '2' for the "+str(sub_limit)+" most 'controversial' submissions (be aware that what counts as controversial for reddit might not be in line with our definition)")
    print("Press '3' for the "+str(sub_limit)+" 'newest' submissions")
    print("Press '4' for the "+str(sub_limit)+" 'top' submissions")
    print("Press '5' to set a start date")
    measure=input()
    if measure=="5":
        print("Set the start date: (Format: yyyy-mm-dd. The script will harvest the first "+str(sub_limit)+ " submissions made from the start date )")
        start_date=input()
        start_year=int(start_date.split("-")[0])
        start_month=int(start_date.split("-")[1])
        start_day=int(start_date.split("-")[2])
        start_epoch=int(dt.datetime(start_year, start_month, start_day).timestamp())

        submissions=list(api.search_submissions(after=start_epoch,
                            subreddit=subreddit_name,
                            filter=['url','author', 'title', 'subreddit'],
                            limit=sub_limit))

    print("Harvesting data... This might take a while...")

if measure=="1":
    subreddit = reddit.subreddit(subreddit_name)

    if ".csv" in input_filename:
        filename="hot_submissions_"+input_filename
        filename_2="user_aggregates_hot_submissions_"+input_filename

    else:
        filename="hot_submissions_"+input_filename+".csv"
        filename_2="user_aggregates_hot_submissions_"+input_filename+".csv"
    with open(filename,"w", newline='',encoding='utf-8') as f:
        wr = csv.writer(f, delimiter=",")
        wr.writerow(csv_headers)
    with open(filename_2,"w", newline='',encoding='utf-8') as q:
        wr2 = csv.writer(q, delimiter=",")
        wr2.writerow(csv_headers_2)
    for sub in subreddit.hot(limit=sub_limit):
        sub_author=sub.author
        sub_downs=sub.downs
        sub_ups=sub.ups
        sub_likes=sub.likes
        sub_title=sub.title
        sub_created=sub.created_utc
        sub_created=dt.datetime.utcfromtimestamp(sub_created).strftime('%Y-%m-%d %H:%M:%S')
        sub_id=sub.id
        sub_reports=sub.num_reports
        sub_text=sub.selftext
        sub_url="https://www.reddit.com/r/"+subreddit_name+"/comments/"+sub_id
        sub.comments.replace_more(limit=None)
        sub_list=sub.comments.list()
        sub_count=sub_count+1
        sub_users=[]
        sub_users.append(str(sub_author))
        csv_list=["Submission",sub_id,sub_author, sub_text,sub_created,sub_ups,sub_downs, sub_likes, "N/A", subreddit_name, sub_url, sub_reports, subreddit_name,sub_id]
        with open(filename,"a", newline='',encoding='utf-8') as f:
            wr = csv.writer(f, delimiter=",")
            wr.writerow(csv_list)
        for comment in sub_list:
            comment_depth=comment.depth
            comment_parent_id=comment.parent_id
            comment_parent_id=comment_parent_id.split("_")[1]
            comment_reports=comment.num_reports
            comment_author=str(comment.author)
            if comment_author.lower() in blacklisted_users:
                continue
            comment_body=comment.body
            comment_created=comment.created_utc
            comment_created=dt.datetime.utcfromtimestamp(comment_created).strftime('%Y-%m-%d %H:%M:%S')

            comment_downs=comment.downs
            comment_ups=comment.ups
            comment_likes=comment.likes
            comment_id=comment.id
            sub_users.append(str(comment_author))
            comment_url="https://www.reddit.com/r/"+subreddit_name+"/comments/"+sub_id+"/"+sub_title+"/"+comment_id
            csv_list=["comment",comment_id, comment_author, comment_body,comment_created, comment_ups,comment_downs,comment_likes,comment_depth,comment_parent_id,comment_url,comment_reports,subreddit_name, sub_id]
            with open(filename,"a", newline='',encoding='utf-8') as f:
                wr = csv.writer(f, delimiter=",")
                wr.writerow(csv_list)
            com_count=com_count+1  
        # Join the non-empty user names with ";" (no trailing separator)
        user_entry=";".join(user for user in sub_users if user)
        csv_list_2=[str(sub_id), str(sub_url), user_entry,str(subreddit_name)]
        with open(filename_2,"a", newline='',encoding='utf-8') as q:
            wr2 = csv.writer(q, delimiter=",")
            wr2.writerow(csv_list_2)
        if sub_count % 50 == 0:
            print("Data harvested from "+str(sub_count)+" submissions out of maximum "+str(max_response_cache)+". Continuing harvest...")

if measure=="2":
    subreddit = reddit.subreddit(subreddit_name)

    if ".csv" in input_filename:
        filename="controversial_submissions_"+input_filename
        filename_2="user_aggregates_controversial_submissions_"+input_filename
    else:
        filename="controversial_submissions_"+input_filename+".csv"
        filename_2="user_aggregates_controversial_submissions_"+input_filename+".csv"

    with open(filename,"w", newline='',encoding='utf-8') as f:
        wr = csv.writer(f, delimiter=",")
        wr.writerow(csv_headers)
    with open(filename_2,"w", newline='',encoding='utf-8') as q:
        wr2 = csv.writer(q, delimiter=",")
        wr2.writerow(csv_headers_2)
    for sub in subreddit.controversial(limit=sub_limit):
        sub_comments=sub.num_comments
        sub_author=sub.author
        sub_downs=sub.downs
        sub_ups=sub.ups
        sub_likes=sub.likes
        sub_title=sub.title
        sub_created=sub.created_utc
        sub_created=dt.datetime.utcfromtimestamp(sub_created).strftime('%Y-%m-%d %H:%M:%S')

        sub_id=sub.id
        sub_reports=sub.num_reports
        sub_text=sub.selftext
        sub_url="https://www.reddit.com/r/"+subreddit_name+"/comments/"+sub_id
        sub.comments.replace_more(limit=None)
        sub_list=sub.comments.list()
        sub_count=sub_count+1
        sub_users=[]
        sub_users.append(str(sub_author))
        csv_list=["Submission",sub_id,sub_author, sub_text,sub_created,sub_ups,sub_downs, sub_likes, "N/A", subreddit_name, sub_url, sub_reports,subreddit_name, sub_id]
        with open(filename,"a", newline='',encoding='utf-8') as f:
            wr = csv.writer(f, delimiter=",")
            wr.writerow(csv_list)
        for comment in sub_list:
            comment_depth=comment.depth
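            comment_parent_id=comment.parent_id
            # Strip the type prefix (e.g. "t1_" or "t3_") so the id matches the "id" column
            comment_parent_id=comment_parent_id.split("_")[1]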
            comment_reports=comment.num_reports
            comment_author=str(comment.author)
            if comment_author.lower() in blacklisted_users:
                continue
            comment_body=comment.body
            comment_created=comment.created_utc
            comment_created=dt.datetime.utcfromtimestamp(comment_created).strftime('%Y-%m-%d %H:%M:%S')
            comment_downs=comment.downs
            comment_ups=comment.ups
            comment_likes=comment.likes
            comment_id=comment.id
            sub_users.append(str(comment_author))
            comment_url="https://www.reddit.com/r/"+subreddit_name+"/comments/"+sub_id+"/"+sub_title+"/"+comment_id
            csv_list=["comment",comment_id, comment_author, comment_body,comment_created, comment_ups,comment_downs,comment_likes,comment_depth,comment_parent_id,comment_url,comment_reports,subreddit_name, sub_id]
            with open(filename,"a", newline='',encoding='utf-8') as f:
                wr = csv.writer(f, delimiter=",")
                wr.writerow(csv_list)
            com_count=com_count+1    
        # Join the non-empty user names with ";" (no trailing separator)
        user_entry=";".join(user for user in sub_users if user)
        csv_list_2=[str(sub_id), str(sub_url), user_entry,str(subreddit_name)]
        with open(filename_2,"a", newline='',encoding='utf-8') as q:
            wr2 = csv.writer(q, delimiter=",")
            wr2.writerow(csv_list_2)
        if sub_count % 50 == 0:
            print("Data harvested from "+str(sub_count)+" submissions out of maximum "+str(max_response_cache)+". Continuing harvest...")

if measure=="3":
    subreddit = reddit.subreddit(subreddit_name)

    if ".csv" in input_filename:
        filename="newest_submissions_"+input_filename
        filename_2="user_aggregates_newest_submissions_"+input_filename
    else:
        filename="newest_submissions_"+input_filename+".csv"
        filename_2="user_aggregates_newest_submissions_"+input_filename+".csv"
    with open(filename,"w", newline='',encoding='utf-8') as f:
        wr = csv.writer(f, delimiter=",")
        wr.writerow(csv_headers)
    with open(filename_2,"w", newline='',encoding='utf-8') as q:
        wr2 = csv.writer(q, delimiter=",")
        wr2.writerow(csv_headers_2)
    for sub in subreddit.new(limit=sub_limit):
        sub_comments=sub.num_comments
        sub_author=sub.author
        sub_downs=sub.downs
        sub_ups=sub.ups
        sub_likes=sub.likes
        sub_title=sub.title
        sub_created=sub.created_utc
        sub_created=dt.datetime.utcfromtimestamp(sub_created).strftime('%Y-%m-%d %H:%M:%S')

        sub_id=sub.id
        sub_reports=sub.num_reports
        sub_text=sub.selftext
        sub_url="https://www.reddit.com/r/"+subreddit_name+"/comments/"+sub_id
        sub.comments.replace_more(limit=None)
        sub_list=sub.comments.list()
        sub_count=sub_count+1
        sub_users=[]
        sub_users.append(str(sub_author))
        csv_list=["Submission",sub_id,sub_author, sub_text,sub_created,sub_ups,sub_downs, sub_likes, "N/A", subreddit_name, sub_url, sub_reports,subreddit_name, sub_id]
        with open(filename,"a", newline='',encoding='utf-8') as f:
            wr = csv.writer(f, delimiter=",")
            wr.writerow(csv_list)
        for comment in sub_list:
            comment_depth=comment.depth
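            comment_parent_id=comment.parent_id
            # Strip the type prefix (e.g. "t1_" or "t3_") so the id matches the "id" column
            comment_parent_id=comment_parent_id.split("_")[1]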
            comment_reports=comment.num_reports
            comment_author=str(comment.author)
            if comment_author.lower() in blacklisted_users:
                continue
            comment_body=comment.body
            comment_created=comment.created_utc
            comment_created=dt.datetime.utcfromtimestamp(comment_created).strftime('%Y-%m-%d %H:%M:%S')
            comment_downs=comment.downs
            comment_ups=comment.ups
            comment_likes=comment.likes
            comment_id=comment.id
            sub_users.append(str(comment_author))
            comment_url="https://www.reddit.com/r/"+subreddit_name+"/comments/"+sub_id+"/"+sub_title+"/"+comment_id
            csv_list=["comment",comment_id, comment_author, comment_body,comment_created, comment_ups,comment_downs,comment_likes,comment_depth,comment_parent_id,comment_url,comment_reports,subreddit_name, sub_id]
            with open(filename,"a", newline='',encoding='utf-8') as f:
                wr = csv.writer(f, delimiter=",")
                wr.writerow(csv_list)
            com_count=com_count+1
        # Join the non-empty user names with ";" (no trailing separator)
        user_entry=";".join(user for user in sub_users if user)
        csv_list_2=[str(sub_id), str(sub_url), user_entry,str(subreddit_name)]
        with open(filename_2,"a", newline='',encoding='utf-8') as q:
            wr2 = csv.writer(q, delimiter=",")
            wr2.writerow(csv_list_2)
        if sub_count % 50 == 0:
            print("Data harvested from "+str(sub_count)+" submissions out of maximum "+str(max_response_cache)+". Continuing harvest...")

if measure=="4":
    subreddit = reddit.subreddit(subreddit_name)

    if ".csv" in input_filename:
        filename="top_submissions_"+input_filename
        filename_2="user_aggregates_top_submissions_"+input_filename
    else:
        filename="top_submissions_"+input_filename+".csv"
        filename_2="user_aggregates_top_submissions_"+input_filename+".csv"
    with open(filename,"w", newline='',encoding='utf-8') as f:
        wr = csv.writer(f, delimiter=",")
        wr.writerow(csv_headers)
    with open(filename_2,"w", newline='',encoding='utf-8') as q:
        wr2 = csv.writer(q, delimiter=",")
        wr2.writerow(csv_headers_2)
    for sub in subreddit.top(limit=sub_limit):
        sub_comments=sub.num_comments
        sub_author=sub.author
        sub_downs=sub.downs
        sub_ups=sub.ups
        sub_likes=sub.likes
        sub_title=sub.title
        sub_created=sub.created_utc
        sub_created=dt.datetime.utcfromtimestamp(sub_created).strftime('%Y-%m-%d %H:%M:%S')

        sub_id=sub.id
        sub_reports=sub.num_reports
        sub_text=sub.selftext
        sub_url="https://www.reddit.com/r/"+subreddit_name+"/comments/"+sub_id
        sub.comments.replace_more(limit=None)
        sub_list=sub.comments.list()
        sub_count=sub_count+1
        sub_users=[]
        sub_users.append(str(sub_author))
        csv_list=["Submission",sub_id,sub_author, sub_text,sub_created,sub_ups,sub_downs, sub_likes, "N/A", subreddit_name, sub_url, sub_reports,subreddit_name, sub_id]
        with open(filename,"a", newline='',encoding='utf-8') as f:
            wr = csv.writer(f, delimiter=",")
            wr.writerow(csv_list)
        for comment in sub_list:
            comment_depth=comment.depth
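            comment_parent_id=comment.parent_id
            # Strip the type prefix (e.g. "t1_" or "t3_") so the id matches the "id" column
            comment_parent_id=comment_parent_id.split("_")[1]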
            comment_reports=comment.num_reports
            comment_author=str(comment.author)
            if comment_author.lower() in blacklisted_users:
                continue
            comment_body=comment.body
            comment_created=comment.created_utc
            comment_created=dt.datetime.utcfromtimestamp(comment_created).strftime('%Y-%m-%d %H:%M:%S')
            comment_downs=comment.downs
            comment_ups=comment.ups
            comment_likes=comment.likes
            comment_id=comment.id
            sub_users.append(str(comment_author))
            comment_url="https://www.reddit.com/r/"+subreddit_name+"/comments/"+sub_id+"/"+sub_title+"/"+comment_id
            csv_list=["comment",comment_id, comment_author, comment_body,comment_created, comment_ups,comment_downs,comment_likes,comment_depth,comment_parent_id,comment_url,comment_reports,subreddit_name, sub_id]
            with open(filename,"a", newline='',encoding='utf-8') as f:
                wr = csv.writer(f, delimiter=",")
                wr.writerow(csv_list)
            com_count=com_count+1
        # Join the non-empty user names with ";" (no trailing separator)
        user_entry=";".join(user for user in sub_users if user)
        csv_list_2=[str(sub_id), str(sub_url), user_entry,str(subreddit_name)]
        with open(filename_2,"a", newline='',encoding='utf-8') as q:
            wr2 = csv.writer(q, delimiter=",")
            wr2.writerow(csv_list_2)
        if sub_count % 50 == 0:
            print("Data harvested from "+str(sub_count)+" submissions out of maximum "+str(max_response_cache)+". Continuing harvest...")

if measure=="5":
    subreddit = reddit.subreddit(subreddit_name)
    if ".csv" in input_filename:
        filename="submissions_"+start_date+"_"+input_filename
        filename_2="user_aggregates_submissions_"+start_date+"_"+input_filename
    else:
        filename="submissions_"+start_date+"_"+input_filename+".csv"
        filename_2="user_aggregates_submissions_"+start_date+"_"+input_filename+".csv"
    with open(filename,"w", newline='',encoding='utf-8') as f:
        wr = csv.writer(f, delimiter=",")
        wr.writerow(csv_headers)
    with open(filename_2,"w", newline='',encoding='utf-8') as q:
        wr2 = csv.writer(q, delimiter=",")
        wr2.writerow(csv_headers_2)
        
    for each in submissions:
        sub=reddit.submission(str(each))
        sub_author=sub.author
        sub_downs=sub.downs
        sub_ups=sub.ups
        sub_likes=sub.likes
        sub_title=sub.title
        sub_created=sub.created_utc
        sub_created=dt.datetime.utcfromtimestamp(sub_created).strftime('%Y-%m-%d %H:%M:%S')
        subreddit_name=str(sub.subreddit)
        sub_id=sub.id
        sub_reports=sub.num_reports
        sub_text=sub.selftext
        sub_url="https://www.reddit.com/r/"+subreddit_name+"/comments/"+sub_id
        sub.comments.replace_more(limit=None)
        sub_list=sub.comments.list()
        sub_count=sub_count+1
        sub_users=[]
        sub_users.append(str(sub_author))
        csv_list=["Submission",sub_id,sub_author, sub_text,sub_created,sub_ups,sub_downs, sub_likes, "N/A", subreddit_name, sub_url, sub_reports,subreddit_name, sub_id]
        with open(filename,"a", newline='',encoding='utf-8') as f:
            wr = csv.writer(f, delimiter=",")
            wr.writerow(csv_list)
        for comment in sub_list:
            comment_depth=comment.depth
            comment_parent_id=comment.parent_id
            comment_parent_id=comment_parent_id.split("_")[1]
            comment_reports=comment.num_reports
            comment_author=str(comment.author)
            if comment_author.lower() in blacklisted_users:
                continue
            comment_body=comment.body
            comment_created=comment.created_utc
            comment_created=dt.datetime.utcfromtimestamp(comment_created).strftime('%Y-%m-%d %H:%M:%S')
            comment_downs=comment.downs
            comment_ups=comment.ups
            comment_likes=comment.likes
            comment_id=comment.id
            sub_users.append(str(comment_author))
            comment_url="https://www.reddit.com/r/"+subreddit_name+"/comments/"+sub_id+"/"+sub_title+"/"+comment_id
            csv_list=["comment",comment_id, comment_author, comment_body,comment_created, comment_ups,comment_downs,comment_likes,comment_depth,comment_parent_id,comment_url,comment_reports,subreddit_name, sub_id]
            with open(filename,"a", newline='',encoding='utf-8') as f:
                wr = csv.writer(f, delimiter=",")
                wr.writerow(csv_list)
            com_count=com_count+1
        
        # Join the non-empty user names with ";" (no trailing separator)
        user_entry=";".join(user for user in sub_users if user)
        csv_list_2=[str(sub_id), str(sub_url), user_entry,str(subreddit_name)]
        with open(filename_2,"a", newline='',encoding='utf-8') as q:
            wr2 = csv.writer(q, delimiter=",")
            wr2.writerow(csv_list_2)
    
        if sub_count % 50 == 0:
            print("Data harvested from "+str(sub_count)+" submissions out of maximum "+str(max_response_cache)+". Continuing harvest...")

print("The script is done! We have harvested "+str(sub_count)+" submissions with "+ str(com_count)+" comments in total.")


What would you like to call the file?

Enter user names you want to blacklist. If you want to blacklist multiple users, use comma separation

Would you like to enter the submission IDs manually (y/n)? Max 1000 submissions. Use this feature if you have manually identified submissions to focus on.
n
Enter the name of the subreddit you would like to harvest: (e.g. askreddit)
geek
How many submissions would you like to harvest? (Be careful! It will take time to harvest 1000 submissions with a lot of comments!)
10
What type of measure would you like to use to define how the submissions are selected?
Press '1' for the 10 'hottest' submissions
Press '2' for the 10 most 'controversial' submissions (be aware that what counts as controversial for reddit might not be in line with our definition)
Press '3' for the 10 'newest' submissions
Press '4' for the 10 'top' submissions
Press '5' to set a start date
2
Harvesting data... This might take a while...
The script is done! We have harvested 10 subm