# Attracting comments to your Hacker News posts

Hacker News is a site run by the startup incubator Y Combinator, where user-submitted stories (known as posts) are voted and commented on, much like on Reddit. It is extremely popular in technology and startup circles, and posts that reach the top of its listings can attract hundreds of thousands of visitors. To get the most out of a post, it helps to know which types of post are most likely to draw attention, measured here by the number of comments they receive.

## 1. Exploring the dataset

In [1]:
# Load the Hacker News data
from csv import reader

opened_file = open("HN_posts_year_to_Sep_26_2016.csv")
read_file = reader(opened_file)

# Assign the data to the list hn, then close the file
hn = list(read_file)
opened_file.close()

In [2]:
# Display the first five rows of the dataset

print(hn[:5])

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', '1', '0', 'altstar', '9/26/2016 3:26'], ['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24'], ['12578997', 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', '1', '0', 'pavel_lishin', '9/26/2016 3:19'], ['12578989', 'algorithmic music', 'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext', '1', '0', 'poindontcare', '9/26/2016 3:16']]


In [4]:
# Display the header row of the dataset

headers = hn[:1]
print(headers)

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']]


The dataset has seven columns:

- ID = The unique identifier from Hacker News for the post
- Title = The title of the post
- URL = The URL that the post links to, if the post has one
- Num_points = The number of points the post acquired, calculated as the total number of upvotes minus the total number of downvotes
- Num_comments = The number of comments that were made on the post
- Author = The username of the person who submitted the post
- Created_at = The date and time at which the post was submitted
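
As a quick illustration of how these columns line up with a row, the sketch below pairs the header with the first data row shown above using `zip`. The names `header`, `first_row`, and `post` are chosen for this example only and are not used elsewhere in the notebook.

```python
# Header and first data row copied from the output above
header = ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
first_row = ['12579008',
             'You have two days to comment if you want stem cells to be classified as your own',
             'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018',
             '1', '0', 'altstar', '9/26/2016 3:26']

# Pair each column name with its value so fields can be looked up by name
post = dict(zip(header, first_row))

print(post['author'])
print(post['created_at'])

# Every value is still a string; numeric columns need int() before arithmetic
print(int(post['num_comments']) + int(post['num_points']))
```

Looking fields up by name like this makes it harder to mix up positions such as index 5 (`author`) and index 6 (`created_at`).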

In [5]:
# Remove the header row from the dataset

hn = hn[1:]
print(hn[:5])

[['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', '1', '0', 'altstar', '9/26/2016 3:26'], ['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24'], ['12578997', 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', '1', '0', 'pavel_lishin', '9/26/2016 3:19'], ['12578989', 'algorithmic music', 'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext', '1', '0', 'poindontcare', '9/26/2016 3:16'], ['12578979', 'How the Data Vault Enables the Next-Gen Data Warehouse and Data Lake', 'https://www.talend.com/blog/2016/05/12/talend-and-Â\x93the-data-vaultÂ\x94', '1', '0', 'markgainor1', '9/26/2016 3:14']]
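
An alternative to slicing is to read the header with `next()` before collecting the remaining rows. The sketch below does not touch the real file: it assumes a small in-memory stand-in (`sample_csv`, built from two rows shown above) so that it runs on its own.

```python
import io
from csv import reader

# Small in-memory stand-in for the CSV file, using two rows from the dataset
sample_csv = io.StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12579005,SQLAR  the SQLite Archiver,"
    "https://www.sqlite.org/sqlar/doc/trunk/README.md,1,0,blacksqr,9/26/2016 3:24\n"
    "12578989,algorithmic music,"
    "http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext,"
    "1,0,poindontcare,9/26/2016 3:16\n"
)

rows = reader(sample_csv)
header = next(rows)   # the first row holds the column names
posts = list(rows)    # everything after the header is data

print(header)
print(len(posts))
```

This keeps the header and the data in separate variables from the start, so the header cannot be counted as a post by accident.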


## 2. Post types

In [146]:
# Create three empty lists: ask_post, show_post, and other_post
ask_post = []
show_post = []
other_post = []

In [151]:
# Loop through each row in hn and sort it into one of the three lists based on how its title starts

for row in hn:
    title = row[1]
    if title.lower().startswith('ask hn'):
        ask_post.append(row)
    elif title.lower().startswith('show hn'):
        show_post.append(row)
    else: 
        other_post.append(row)

In [152]:
# Check the number of posts in the ask_post, show_post and other_post lists
print(len(ask_post))
print(len(show_post))
print(len(other_post))

9139
10158
273822
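
The same split can be written as a small helper that returns a category name. This is a sketch rather than the notebook's own code: `classify_title` is a hypothetical function name, two of the sample titles come from this notebook's output, and the "Show HN" title is made up for illustration.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on how the title starts."""
    lowered = title.lower()
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'


titles = [
    'Ask HN: What TLD do you use for local development?',  # from the dataset
    'Show HN: An example project',                         # made-up title
    'algorithmic music',                                   # from the dataset
]

# Count how many sample titles fall into each category
counts = {'ask': 0, 'show': 0, 'other': 0}
for title in titles:
    counts[classify_title(title)] += 1

print(counts)
```

Lowercasing the title first means that variants such as "ASK HN" or "ask hn" end up in the same category.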


In [169]:
# Display the first five Ask HN posts
print(ask_post[:5])

[['12578908', 'Ask HN: What TLD do you use for local development?', '', '4', '7', 'Sevrene', '9/26/2016 2:53'], ['12578522', 'Ask HN: How do you pass on your work when you die?', '', '6', '3', 'PascLeRasc', '9/26/2016 1:17'], ['12577908', 'Ask HN: How a DNS problem can be limited to a geographic region?', '', '1', '0', 'kuon', '9/25/2016 22:57'], ['12577870', 'Ask HN: Why join a fund when you can be an angel?', '', '1', '3', 'anthony_james', '9/25/2016 22:48'], ['12577647', 'Ask HN: Someone uses stock trading as passive income?', '', '5', '2', '00taffe', '9/25/2016 21:50']]


## 3. Number of comments per post type

In [154]:
# Define a function that returns the average number of comments per post in a list of posts

def avg_comments(data):
    total_comments = 0
    for row in data:
        total_comments += int(row[4])  # num_comments is the fifth column
    return round(total_comments / len(data), 2)

In [157]:
print('The average number of ask comments is:', avg_comments(ask_post))

The average number of ask comments is: 10.41


In [158]:
print('The average number of show comments is:', avg_comments(show_post))

The average number of show comments is: 4.89


In [159]:
print('The average number of other comments is:', avg_comments(other_post))

The average number of other comments is: 6.47
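
For comparison, the same kind of average can be computed with the standard library's `statistics.mean`. The sketch below uses only the comment counts of the five Ask HN rows printed in section 2, so its result describes that tiny sample and not the full dataset; `sample_comment_counts` and `sample_avg` are names chosen for this example.

```python
from statistics import mean

# num_comments values (as strings) of the five Ask HN rows printed in section 2
sample_comment_counts = ['7', '3', '0', '3', '2']

# Convert to integers before averaging, then round as avg_comments does
sample_avg = round(mean([int(n) for n in sample_comment_counts]), 2)
print(sample_avg)
```

Note that `mean` raises `statistics.StatisticsError` on an empty list, just as dividing by `len(data)` fails when `data` is empty.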


## 4. Number of comments by hour in the day

In [160]:
# Import the datetime module as dt
import datetime as dt

In [161]:
# Create an empty list and assign it to result_list
result_list = []

In [162]:
# Iterate over the ask posts and append each post's creation time and number of comments to result_list
for row in ask_post:
    date = row[6]  # created_at is the seventh column (index 5 is the author)
    nr_comments = int(row[4])
    result_list.append([date, nr_comments])

In [104]:
# Create two empty dictionaries: counts_by_hour and comments_by_hour
counts_by_hour = {}
comments_by_hour = {}

In [168]:
# Loop through result_list, extract the hour from each creation time, and tally posts and comments per hour
for row in result_list:
    extract_date = row[0]
    extract_comment = row[1]
    date_time = dt.datetime.strptime(extract_date, "%m/%d/%Y %H:%M")
    extract_hour = date_time.strftime("%H")

    if extract_hour not in counts_by_hour:
        counts_by_hour[extract_hour] = 1
        comments_by_hour[extract_hour] = extract_comment
    else:
        counts_by_hour[extract_hour] += 1
        comments_by_hour[extract_hour] += extract_comment

In [167]:
print("Number of comments on ask posts by hour:")
comments_by_hour

Number of comments on ask posts by hour:


In [110]:
# Calculate the average number of comments per post for each hour
avg_by_hour = []

for date_hour in comments_by_hour:
    avg_by_hour.append([date_hour, comments_by_hour[date_hour] / counts_by_hour[date_hour]])
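
The hourly tally and average can be sketched end to end on a handful of rows. The example below is an illustration under assumptions: it uses `collections.defaultdict` instead of an explicit `if`/`else` check, the names `sample`, `totals`, and `averages` are hypothetical, and the five creation times and comment counts come from the Ask HN rows printed in section 2, so the resulting averages say nothing about the full dataset.

```python
import datetime as dt
from collections import defaultdict

# [created_at, num_comments] for the five Ask HN rows printed in section 2
sample = [
    ['9/26/2016 2:53', 7],
    ['9/26/2016 1:17', 3],
    ['9/25/2016 22:57', 0],
    ['9/25/2016 22:48', 3],
    ['9/25/2016 21:50', 2],
]

# totals[hour] = [number of posts, number of comments]
totals = defaultdict(lambda: [0, 0])
for created_at, n_comments in sample:
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").strftime("%H")
    totals[hour][0] += 1
    totals[hour][1] += n_comments

# Average comments per post for each hour, highest first
averages = sorted(
    ((hour, comments / posts) for hour, (posts, comments) in totals.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for hour, avg in averages:
    print(f"{hour}:00  {avg:.2f} average comments per post")
```

Keeping the comment totals as integers throughout is what makes the final division work; accumulating anything else (such as title strings) leads to a `TypeError` at that step.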