Here, I load a Hacker News posts dataset from Kaggle and explore it.
[Link](https://www.kaggle.com/hacker-news/hacker-news-posts/downloads/hacker-news-posts.zip/1)

I will explore how posts are distributed across the 'Ask HN' and 'Show HN' categories, and whether the number of comments on these posts varies with the time of day at which they were created.

In [39]:
import csv
# Read the whole CSV into a list of rows; the with-block closes the file afterwards.
with open("HN_posts_year_to_Sep_26_2016.csv", "r", encoding="utf8") as fhand:
    hn = list(csv.reader(fhand))

print("Sample Data:\n", hn[:5])

Sample Data:
 [['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', '1', '0', 'altstar', '9/26/2016 3:26'], ['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24'], ['12578997', 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', '1', '0', 'pavel_lishin', '9/26/2016 3:19'], ['12578989', 'algorithmic music', 'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext', '1', '0', 'poindontcare', '9/26/2016 3:16']]


In [40]:
headers = hn[0]
hn = hn[1:]
print("Headers:\n", headers)
print("Data:\n", hn[:5])

Headers:
 ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
Data:
 [['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', '1', '0', 'altstar', '9/26/2016 3:26'], ['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24'], ['12578997', 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', '1', '0', 'pavel_lishin', '9/26/2016 3:19'], ['12578989', 'algorithmic music', 'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext', '1', '0', 'poindontcare', '9/26/2016 3:16'], ['12578979', 'How the Data Vault Enables the Next-Gen Data Warehouse and Data Lake', 'https://www.talend.com/blog/2016/05/12/talend-and-Â\x93the-data-vaultÂ\x94', '1', '0', 'markgainor1', '9/26/2016 3:14']]

Here I separate the posts into three categories: Ask HN posts, Show HN posts, and others.

In [41]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    if title.lower().startswith('ask hn'):
        ask_posts.append(row)
    elif title.lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
        
print("Post counts -")
print("Ask Posts -", len(ask_posts))
print("Show Posts -", len(show_posts))
print("Other Posts -", len(other_posts))
    

Post counts -
Ask Posts - 9139
Show Posts - 10158
Other Posts - 273822
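
The prefix check used for the split can also be wrapped in a small helper, which makes the rule easy to test on a few titles. This is only a minimal sketch: `categorize_title` is a hypothetical name that does not appear in the notebook, and the first two sample titles are made up for illustration.

```python
def categorize_title(title):
    """Return 'ask', 'show', or 'other' based on the title prefix (case-insensitive)."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"


# Illustrative titles; only the last one comes from the sample data above.
sample_titles = ["Ask HN: How do you learn?", "SHOW HN: My project", "algorithmic music"]
categories = [categorize_title(t) for t in sample_titles]
print(categories)
```

Lowercasing the title once, before both checks, keeps the comparison case-insensitive in the same way as the loop above.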


In [42]:
ask_post_comments = 0
for row in ask_posts:
    comm_cnt = int(row[4])
    ask_post_comments += comm_cnt
avg_ask_post_comments = ask_post_comments/len(ask_posts)
print("Average number of comments on Ask Posts - ", round(avg_ask_post_comments,2))

show_post_comments = 0
for row in show_posts:
    comm_cnt = int(row[4])
    show_post_comments += comm_cnt
avg_show_post_comments = show_post_comments/len(show_posts)
print("Average number of comments on Show Posts - ", round(avg_show_post_comments,2))

Average number of comments on Ask Posts -  10.39
Average number of comments on Show Posts -  4.89
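
Since the same averaging loop runs twice, it could be factored into one function. The sketch below assumes the column layout shown in the headers (`num_comments` at index 4); `average_comments` is a hypothetical helper name, and the two sample rows are made up for illustration rather than taken from the dataset.

```python
def average_comments(rows, comments_index=4):
    """Return the mean number of comments over rows, or 0.0 for an empty list."""
    if not rows:
        return 0.0
    total = sum(int(row[comments_index]) for row in rows)
    return total / len(rows)


# Illustrative rows in the dataset's column order:
# id, title, url, num_points, num_comments, author, created_at
sample_rows = [
    ["1", "Ask HN: First example", "", "1", "7", "user_a", "9/26/2016 2:53"],
    ["2", "Ask HN: Second example", "", "1", "3", "user_b", "9/26/2016 1:17"],
]
sample_avg = average_comments(sample_rows)
print(round(sample_avg, 2))
```

With the notebook's own lists, the calls would be `average_comments(ask_posts)` and `average_comments(show_posts)`.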


Ask HN posts receive more than twice as many comments on average as Show HN posts (10.39 vs. 4.89), even though they are slightly less numerous, so I will explore them further by extracting the creation time of each post and the number of comments it received.

In [43]:
import datetime as dt
results_list = []

for row in ask_posts:
    create_dt = row[6]
    comm = int(row[4])
    results_list.append([create_dt, comm])

print(results_list[:5])

[['9/26/2016 2:53', 7], ['9/26/2016 1:17', 3], ['9/25/2016 22:57', 0], ['9/25/2016 22:48', 3], ['9/25/2016 21:50', 2]]


Next, I will extract the hour of post creation and calculate the average number of comments received by posts created in that hour. This is done with two dictionaries keyed by the hour of the day: one counting posts, the other summing comments.

In [44]:
counts_by_hour = {}
comments_by_hour = {}

for row in results_list:
    comm = row[1]
    date = dt.datetime.strptime(row[0], "%m/%d/%Y %H:%M")
    hour = date.strftime("%H")
    if hour in counts_by_hour:
        counts_by_hour[hour] += 1
    else:
        counts_by_hour[hour] = 1
        
    if hour in comments_by_hour:
        comments_by_hour[hour] += comm
    else:
        comments_by_hour[hour] = comm

print("Counts by hour - ", counts_by_hour)
print("Comments by hour - ", comments_by_hour)
        
    

Counts by hour -  {'02': 269, '01': 282, '22': 383, '21': 518, '19': 552, '17': 587, '15': 646, '14': 513, '13': 444, '11': 312, '10': 282, '09': 222, '07': 226, '03': 271, '23': 343, '20': 510, '16': 579, '08': 257, '00': 301, '18': 614, '12': 342, '04': 243, '06': 234, '05': 209}
Comments by hour -  {'02': 2996, '01': 2089, '22': 3372, '21': 4500, '19': 3954, '17': 5547, '15': 18525, '14': 4972, '13': 7245, '11': 2797, '10': 3013, '09': 1477, '07': 1585, '03': 2154, '23': 2297, '20': 4462, '16': 4466, '08': 2362, '00': 2277, '18': 4877, '12': 4234, '04': 2360, '06': 1587, '05': 1838}
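
The same per-hour tallies can be built without the `if`/`else` branches by using `collections.defaultdict`, which starts every missing key at zero. This is an alternative sketch rather than the notebook's method; it runs on the five rows printed from `results_list` earlier, and the names `sample_results`, `counts`, and `comments` are introduced here for illustration.

```python
import datetime as dt
from collections import defaultdict

# The first five [created_at, num_comments] pairs printed from results_list.
sample_results = [
    ["9/26/2016 2:53", 7],
    ["9/26/2016 1:17", 3],
    ["9/25/2016 22:57", 0],
    ["9/25/2016 22:48", 3],
    ["9/25/2016 21:50", 2],
]

counts = defaultdict(int)    # posts created in each hour
comments = defaultdict(int)  # total comments on posts created in each hour

for created_at, num_comments in sample_results:
    # Parse the timestamp, then keep only the zero-padded hour as the key.
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").strftime("%H")
    counts[hour] += 1
    comments[hour] += num_comments

print(dict(counts))
print(dict(comments))
```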


In [45]:
avg_by_hour = []

for hour in counts_by_hour:
    avg_by_hour.append([hour, round(comments_by_hour[hour] / counts_by_hour[hour],2)])

print(avg_by_hour)

[['02', 11.14], ['01', 7.41], ['22', 8.8], ['21', 8.69], ['19', 7.16], ['17', 9.45], ['15', 28.68], ['14', 9.69], ['13', 16.32], ['11', 8.96], ['10', 10.68], ['09', 6.65], ['07', 7.01], ['03', 7.95], ['23', 6.7], ['20', 8.75], ['16', 7.71], ['08', 9.19], ['00', 7.56], ['18', 7.94], ['12', 12.38], ['04', 9.71], ['06', 6.78], ['05', 8.79]]
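
As a quick check of the arithmetic, the averages for three of the hours can be recomputed directly from the two dictionaries printed earlier. The sketch uses a dict comprehension instead of a list of pairs; the variable names are assumptions made for this example.

```python
# Values copied from the counts_by_hour and comments_by_hour output for three hours.
sample_counts = {"15": 646, "13": 444, "12": 342}
sample_comments = {"15": 18525, "13": 7245, "12": 4234}

# Average comments per post = total comments in the hour / posts in the hour.
sample_avg_by_hour = {
    hour: round(sample_comments[hour] / sample_counts[hour], 2)
    for hour in sample_counts
}
print(sample_avg_by_hour)
```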


Here, I swap the two columns so that the average comes first, sort the list by the average number of comments in descending order, and then print the 5 most 'active' hours of the day.

In [46]:
swap_avg_by_hour = []

for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

print(swap_avg_by_hour)

[[11.14, '02'], [7.41, '01'], [8.8, '22'], [8.69, '21'], [7.16, '19'], [9.45, '17'], [28.68, '15'], [9.69, '14'], [16.32, '13'], [8.96, '11'], [10.68, '10'], [6.65, '09'], [7.01, '07'], [7.95, '03'], [6.7, '23'], [8.75, '20'], [7.71, '16'], [9.19, '08'], [7.56, '00'], [7.94, '18'], [12.38, '12'], [9.71, '04'], [6.78, '06'], [8.79, '05']]


In [47]:
sorted_swap = sorted(swap_avg_by_hour,reverse = True)

In [48]:
print("Top 5 Hours for Ask Posts Comments :")

for row in sorted_swap[:5]:
    print(row[1],": ", row[0]," average comments per post")

Top 5 Hours for Ask Posts Comments :
15 :  28.68  average comments per post
13 :  16.32  average comments per post
12 :  12.38  average comments per post
02 :  11.14  average comments per post
10 :  10.68  average comments per post
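
The column swap is one way to sort by the average; another is to pass a `key` function to `sorted`, which leaves each pair in its original `[hour, average]` order. The sketch below is an alternative under that assumption, using six of the pairs from `avg_by_hour`; `sample_pairs` and `top_hours` are names introduced for illustration.

```python
# Six [hour, average] pairs copied from the avg_by_hour output.
sample_pairs = [
    ["02", 11.14], ["15", 28.68], ["13", 16.32],
    ["12", 12.38], ["10", 10.68], ["09", 6.65],
]

# Sort by the second element (the average), highest first, and keep the top 5.
top_hours = sorted(sample_pairs, key=lambda pair: pair[1], reverse=True)[:5]

for hour, avg in top_hours:
    print(f"{hour}: {avg:.2f} average comments per post")
```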


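From this analysis, we can draw the following conclusions:
1. Ask HN posts receive more comments on average than Show HN posts (10.39 vs. 4.89), although Show HN posts are slightly more numerous (10,158 vs. 9,139).
2. Ask HN posts created around midday and in the early afternoon (3 PM, 1 PM, and 12 noon) tend to gather the most comments on average.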