# Exploring Hacker News Posts

That is another guided project from the DataQuest Data Analyst career path. This time is though, is more independent and I was just given hints. The aim of this project is to practice my work with strings and dates and times. **I also extended the task and added my non-guided part**


## In this project
I am going to work with a dataset of submissions to popular technology site Hacker News. It is a site started by the startup incubator Y Combinator, where user-submitted stories (posts) receive votes and comments, similar to reddit. Hacker News is extremely popular in technology and startup circles, and posts that make it to the top of the Hacker News listings can get hundereds of thousands of visitors as a result. The dataset is reduced from almost 300,000 rows to approximately 20,000 rows by removing all submissions that didn't receive any comments.

Below are descriptions of the columns:
**id**: the unique identifier from Hacker News for posts
**title**: the title of the post
**url**: the URL that the posts links to, if the post has a URL
**num_points**: the number of points the post acquired, calculated as the total number of upvotes minus the total number of downvotes
**num_comments**: the number of comments on the post
**author**: the username of the person who submitted the post
**created_at**: the date ad time of the post's submission

Here I am goint to open and list the dataset and show the header and the first 4 rows of the dataset.

In [1]:
from csv import reader
opened_file = open("hacker_news.csv")
read_file = reader(opened_file)
hacker_news = list(read_file)
headers = hacker_news[0]
hacker_news = hacker_news[1:]
print(headers)
for row in hacker_news[:4]:
    print(row)

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']
['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']
['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']
['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']


We're specifically interested in posts with titles that begin with either **Ask HN** or **Show HN**. User submit **Ask HN** posts to ask the Hacker News community a specific question. Below are a few examples:
Ask HN: How to improve my personal website?
Ask HN: Am I the only one outraged by Twitter shutting down share counts?
Ask HN: Any recent changes to CSS that broke mobile?

Likewise, users submit **Show HN** posts to show the Hacker News community a project, product, or just something interesting. Below are a few examples:
Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform'
Show HN: Something pointless I made
Show HN: Shanhu.io, a programming playground powered by e8vm

We'll compare these two types of posts to determine the following:
- Do **Ask HN** or **Show HN** receive more comments on average?
- Do posts created at a certain time receive more comments on average?

To find postrs that begin with either **Ask HN** or **Show HN** I am going to use the string method **startswith**. Given a string object we can check if it starts with a specific word or just a sequence of letters. The method is case sensitive so I'll use **lower** method to lowercase the title first.

In [2]:
ask_posts = []
show_posts = []
other_posts = []

for row in hacker_news:
    title = row[1].lower()
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)

print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

1744
1162
17194


Now we separated "ask posts" and "show posts" into two lists of lists named **ask_posts** and **show_posts** and the other posts to, well, **other_posts**. we can see that there are 1744 ""ask posts", 1162 "show posts" and 17194 other kind of posts.
Below are few examples of **ask_posts** list of lists:

In [3]:
for post in ask_posts[:4]:
    print(post)

['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55']
['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43']
['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14']
['12210105', 'Ask HN: Looking for Employee #3 How do I do it?', '', '1', '3', 'sph130', '8/2/2016 14:20']


Below are few examples of the **show_posts** list of lists:

In [4]:
for post in show_posts[:4]:
    print(post)

['10627194', 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03']
['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46']
['11590768', 'Show HN: Shanhu.io, a programming playground powered by e8vm', 'https://shanhu.io', '1', '1', 'h8liu', '4/28/2016 18:05']
['12178806', 'Show HN: Webscope  Easy way for web developers to communicate with Clients', 'http://webscopeapp.com', '3', '3', 'fastbrick', '7/28/2016 7:11']


Now, I will determine whether ask posts or show posts receive more comments on average. I will create a **total_ask_comments** variable and set it with a value of 0, loop through **ask_posts** list and add the number of comments on a post to the variable. Then just divide the variable by the number of posts. I will apply the same scheme for the **show_posts** list.

In [5]:
total_ask_comments = 0
for post in ask_posts:
    num_comments = post[4]
    num_comments = int(num_comments)
    total_ask_comments += int(num_comments)
    
avg_ask_comments = total_ask_comments/len(ask_posts)
print(round(avg_ask_comments,2))


14.04


In [6]:
total_show_comments = 0
for post in show_posts:
    num_comments = post[4]
    num_comments = int(num_comments)
    total_show_comments += int(num_comments)
    
avg_show_comments = total_show_comments/len(show_posts)
print(round(avg_show_comments,2))


10.32


We can see that, on average, ask posts receive more comments than show posts- 14.04 on ask posts and 10.32 on show posts.

We can assume it is because **Ask HN** post are reffered directly to other users and encourage them to start a discussion on given topic and therefore generate more comments.

Since ask posts are more likely to receive comments, I'll focus the remaining analysis just on these posts.
Next thing I am going to do is to determine if ask posts createed at a certain time are more likely to attract comments. I will use the following scheme to perform the analysis:
- Calculate the number of ask posts created in each hour of the day, along with the number of comments received.
- Calculate the average number of comments ask posts receive by hour created
Working on the first step, I will use the **datetime** module to work with the data in the **created_at** column.

In [7]:
import datetime as dt

result_list = []

for post in ask_posts:
    creation_hour = post[6]
    num_comments = int(post[4])
    result_list.append([creation_hour,num_comments])

Below are few examples of **result_list** which stores date and time, and number of comments on a post:

In [8]:
print(result_list[:4])

[['8/16/2016 9:55', 6], ['11/22/2015 13:43', 29], ['5/2/2016 10:14', 1], ['8/2/2016 14:20', 3]]


Now, I am going to create 2 dictionaires that will store:
- the number of posts created by hour
- the number of comments sent by hour

In [9]:
counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    date = dt.datetime.strptime(row[0], "%m/%d/%Y %H:%M")
    hour = dt.datetime.strftime(date, "%H")
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = row[1]
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += row[1]

For now the count_by_hour dictionary looks like this:

In [10]:
for value in sorted(counts_by_hour):
    print(value,counts_by_hour[value],sep=':')

00:55
01:60
02:58
03:54
04:47
05:46
06:44
07:34
08:48
09:45
10:59
11:58
12:73
13:85
14:107
15:116
16:108
17:100
18:109
19:110
20:80
21:109
22:71
23:68


In [11]:
avg_by_hour = []

for value in comments_by_hour:
    avg_by_hour.append([value, comments_by_hour[value]/counts_by_hour[value]])

Below I will show the **avg_by_hour** list of lists containing an hour and average of comments on a single post:

In [12]:
for row in avg_by_hour:
    print(row)

['09', 5.5777777777777775]
['13', 14.741176470588234]
['10', 13.440677966101696]
['14', 13.233644859813085]
['16', 16.796296296296298]
['23', 7.985294117647059]
['12', 9.41095890410959]
['17', 11.46]
['15', 38.5948275862069]
['21', 16.009174311926607]
['20', 21.525]
['02', 23.810344827586206]
['18', 13.20183486238532]
['03', 7.796296296296297]
['05', 10.08695652173913]
['19', 10.8]
['01', 11.383333333333333]
['22', 6.746478873239437]
['08', 10.25]
['04', 7.170212765957447]
['00', 8.127272727272727]
['06', 9.022727272727273]
['07', 7.852941176470588]
['11', 11.051724137931034]


We can see that the first column of the list is the hour, so I will swap the columns and assign them to an empty **swap_avg_by_hour** list, so then I can use **sorted** function to sort the list by the average number of comments.

In [13]:
swap_avg_by_hour = []

for row in avg_by_hour:
    swap_avg_by_hour.append([row[1],row[0]])

And the sorted results look like:

In [14]:
sorted_swap = sorted(swap_avg_by_hour, reverse = True)

print("Top 5 Hours for Ask Posts Comments")

for avg,hr in sorted_swap[:5]: 
    hour = dt.datetime.strptime(hr,"%H")
    hour = dt.datetime.strftime(hour,"%H:%M")
    print("{} {:.2f} average comments per post".format(hour,avg))

Top 5 Hours for Ask Posts Comments
15:00 38.59 average comments per post
02:00 23.81 average comments per post
20:00 21.52 average comments per post
16:00 16.80 average comments per post
21:00 16.01 average comments per post


As we can see, the top 5 hours to make an Ask HN post, in order to get the most comments on them, are: 15, 2, 20, 16 and 21 which is kinda surprising, especially the 2am. I think it may be caused due to timezone differences for example while there is 2am in the New York, there is 9pm in London where most users, at that time, may be from. But I do not have any data about users localisation to verify that, it is just my assumption.

# Non-guided part

In the non-guided part i would like to explore the dataset a little bit more and check few thesis:
- determine if show or ask postrs receive more points on average
- determine if posts created at a cerain time are more likely to receive more points
- compare the results to the average number of comments and points other posts receive

The same procedure I did with **total_ask_comments**, **avg_ask_comments**, **result_list**, **counts_by_hour**, **comments_by_hour**, **avg_by_hour** I will do now with ask and show posts' points values

I will start with ask posts' points.

In [15]:
total_ask_points = 0
for post in ask_posts:
    num_points = int(post[3])
    total_ask_points += num_points
print(total_ask_points)

26268


In [16]:
avg_ask_points = (total_ask_points/len(ask_posts))
print(avg_ask_points)

15.061926605504587


In [17]:
total_show_points = 0
for post in show_posts:
    num_points = int(post[3])
    total_show_points += num_points
print(total_show_points)

32019


In [18]:
avg_show_points = (total_show_points/len(show_posts))
print(avg_show_points)

27.555077452667813


In [19]:
ask_points_list = []

for post in ask_posts:
    num_points = int(post[3])
    creation_hour = post[6]
    ask_points_list.append([creation_hour,num_points])

Checking if everything is ok

In [20]:
print(ask_points_list[:10])

[['8/16/2016 9:55', 2], ['11/22/2015 13:43', 28], ['5/2/2016 10:14', 1], ['8/2/2016 14:20', 1], ['10/15/2015 16:38', 28], ['9/26/2015 23:23', 2], ['4/22/2016 12:24', 4], ['11/16/2015 9:22', 2], ['2/24/2016 17:57', 2], ['6/4/2016 17:17', 3]]


In [21]:
ask_points_by_hour = {}

for row in ask_points_list:
    date = dt.datetime.strptime(row[0], "%m/%d/%Y %H:%M")
    hour = dt.datetime.strftime(date, "%H")
    if hour not in ask_points_by_hour:
        ask_points_by_hour[hour] = row[1]
    else:
        ask_points_by_hour[hour] += row[1]
print(ask_points_by_hour)

{'09': 329, '13': 2062, '10': 1102, '14': 1282, '16': 2522, '23': 581, '12': 782, '17': 1941, '15': 3479, '21': 1721, '20': 1151, '02': 793, '18': 1741, '03': 374, '05': 552, '19': 1513, '01': 700, '22': 511, '08': 515, '04': 389, '00': 451, '06': 591, '07': 361, '11': 825}


Checking

In [22]:
for value in sorted(ask_points_by_hour):
    print(value,ask_points_by_hour[value],sep=':')

00:451
01:700
02:793
03:374
04:389
05:552
06:591
07:361
08:515
09:329
10:1102
11:825
12:782
13:2062
14:1282
15:3479
16:2522
17:1941
18:1741
19:1513
20:1151
21:1721
22:511
23:581


In [23]:
avg_ask_points_by_hour = []

for value in ask_points_by_hour:
    avg_ask_points_by_hour.append([value, ask_points_by_hour[value]/counts_by_hour[value]])
print(avg_ask_points_by_hour)

[['09', 7.311111111111111], ['13', 24.258823529411764], ['10', 18.677966101694917], ['14', 11.981308411214954], ['16', 23.35185185185185], ['23', 8.544117647058824], ['12', 10.712328767123287], ['17', 19.41], ['15', 29.99137931034483], ['21', 15.788990825688073], ['20', 14.3875], ['02', 13.672413793103448], ['18', 15.972477064220184], ['03', 6.925925925925926], ['05', 12.0], ['19', 13.754545454545454], ['01', 11.666666666666666], ['22', 7.197183098591549], ['08', 10.729166666666666], ['04', 8.27659574468085], ['00', 8.2], ['06', 13.431818181818182], ['07', 10.617647058823529], ['11', 14.224137931034482]]


In [24]:
swap_avg_ask_points_by_hour = []

for row in avg_ask_points_by_hour:
    swap_avg_ask_points_by_hour.append([row[1],row[0]])

print(swap_avg_ask_points_by_hour)

[[7.311111111111111, '09'], [24.258823529411764, '13'], [18.677966101694917, '10'], [11.981308411214954, '14'], [23.35185185185185, '16'], [8.544117647058824, '23'], [10.712328767123287, '12'], [19.41, '17'], [29.99137931034483, '15'], [15.788990825688073, '21'], [14.3875, '20'], [13.672413793103448, '02'], [15.972477064220184, '18'], [6.925925925925926, '03'], [12.0, '05'], [13.754545454545454, '19'], [11.666666666666666, '01'], [7.197183098591549, '22'], [10.729166666666666, '08'], [8.27659574468085, '04'], [8.2, '00'], [13.431818181818182, '06'], [10.617647058823529, '07'], [14.224137931034482, '11']]


Now I repeat the procedure for shows' posts points values

In [25]:
show_points_list = []

for post in show_posts:
    num_points = int(post[3])
    creation_hour = post[6]
    show_points_list.append([creation_hour,num_points])

In [55]:
print(show_points_list[:10])

[['11/25/2015 14:03', 26], ['11/29/2015 22:46', 747], ['4/28/2016 18:05', 1], ['7/28/2016 7:11', 3], ['1/9/2016 20:45', 1], ['3/7/2016 5:17', 3], ['11/20/2015 20:23', 4], ['3/27/2016 16:19', 8], ['9/26/2015 19:02', 6], ['8/9/2016 16:11', 2]]


In [27]:
show_points_by_hour = {}
show_counts_by_hour = {}
for row in show_points_list:
    date = dt.datetime.strptime(row[0], "%m/%d/%Y %H:%M")
    hour = dt.datetime.strftime(date, "%H")
    if hour not in show_points_by_hour:
        show_points_by_hour[hour] = row[1]
        show_counts_by_hour[hour] = 1
    else:
        show_points_by_hour[hour] += row[1]
        show_counts_by_hour[hour] += 1
print(show_points_by_hour)

{'14': 2187, '22': 1856, '18': 2215, '07': 494, '20': 1819, '05': 104, '16': 2634, '19': 1702, '15': 2228, '03': 679, '17': 2521, '06': 375, '02': 340, '13': 2438, '08': 519, '21': 866, '04': 386, '11': 1480, '12': 2543, '23': 1526, '09': 553, '01': 700, '10': 681, '00': 1173}


In [28]:
for value in sorted(show_points_by_hour):
    print(value,show_points_by_hour[value],sep=':')

00:1173
01:700
02:340
03:679
04:386
05:104
06:375
07:494
08:519
09:553
10:681
11:1480
12:2543
13:2438
14:2187
15:2228
16:2634
17:2521
18:2215
19:1702
20:1819
21:866
22:1856
23:1526


In [29]:
avg_show_points_by_hour = []

for value in show_points_by_hour:
    avg_show_points_by_hour.append([value, show_points_by_hour[value]/show_counts_by_hour[value]])
print(avg_ask_points_by_hour)

[['09', 7.311111111111111], ['13', 24.258823529411764], ['10', 18.677966101694917], ['14', 11.981308411214954], ['16', 23.35185185185185], ['23', 8.544117647058824], ['12', 10.712328767123287], ['17', 19.41], ['15', 29.99137931034483], ['21', 15.788990825688073], ['20', 14.3875], ['02', 13.672413793103448], ['18', 15.972477064220184], ['03', 6.925925925925926], ['05', 12.0], ['19', 13.754545454545454], ['01', 11.666666666666666], ['22', 7.197183098591549], ['08', 10.729166666666666], ['04', 8.27659574468085], ['00', 8.2], ['06', 13.431818181818182], ['07', 10.617647058823529], ['11', 14.224137931034482]]


In [30]:
swap_avg_show_points_by_hour = []

for row in avg_show_points_by_hour:
    swap_avg_show_points_by_hour.append([row[1],row[0]])

print(swap_avg_show_points_by_hour)

[[25.430232558139537, '14'], [40.34782608695652, '22'], [36.31147540983606, '18'], [19.0, '07'], [30.316666666666666, '20'], [5.473684210526316, '05'], [28.322580645161292, '16'], [30.945454545454545, '19'], [28.564102564102566, '15'], [25.14814814814815, '03'], [27.107526881720432, '17'], [23.4375, '06'], [11.333333333333334, '02'], [24.626262626262626, '13'], [15.264705882352942, '08'], [18.425531914893618, '21'], [14.846153846153847, '04'], [33.63636363636363, '11'], [41.68852459016394, '12'], [42.388888888888886, '23'], [18.433333333333334, '09'], [25.0, '01'], [18.916666666666668, '10'], [37.83870967741935, '00']]


In [31]:
sorted_swap_ask_points = sorted(swap_avg_ask_points_by_hour, reverse = True)
sorted_swap_show_points = sorted(swap_avg_show_points_by_hour, reverse = True)

In [32]:
print("Top 5 Hours for Ask Post Points")

for avg,hr in sorted_swap_ask_points[:5]: 
    hour = dt.datetime.strptime(hr,"%H")
    hour = dt.datetime.strftime(hour,"%H:%M")
    print("{} {:.2f} average points per ask post".format(hour,avg))

print("\n\n")

print("Top 5 Hours for show Post Points")

for avg,hr in sorted_swap_show_points[:5]: 
    hour = dt.datetime.strptime(hr,"%H")
    hour = dt.datetime.strftime(hour,"%H:%M")
    print("{} {:.2f} average points per ask post".format(hour,avg))

Top 5 Hours for Ask Post Points
15:00 29.99 average points per ask post
13:00 24.26 average points per ask post
16:00 23.35 average points per ask post
17:00 19.41 average points per ask post
10:00 18.68 average points per ask post



Top 5 Hours for show Post Points
23:00 42.39 average points per ask post
12:00 41.69 average points per ask post
22:00 40.35 average points per ask post
00:00 37.84 average points per ask post
18:00 36.31 average points per ask post


I will repeat the same procedure for other posts here:

In [33]:
total_other_comments = 0
for post in other_posts:
    num_comments = post[4]
    num_comments = int(num_comments)
    total_other_comments += int(num_comments)
    
avg_other_comments = total_other_comments/len(other_posts)
print(round(avg_other_comments,2))

26.87


In [34]:
other_comments_list = []

for post in other_posts:
    creation_hour = post[6]
    num_comments = int(post[4])
    other_comments_list.append([creation_hour,num_comments])


In [35]:
print(other_comments_list[:4])

[['8/4/2016 11:52', 52], ['1/26/2016 19:30', 10], ['6/23/2016 22:20', 1], ['6/17/2016 0:01', 1]]


In [36]:
other_counts_by_hour = {}
other_comments_by_hour = {}

In [37]:
for row in result_list:
    date = dt.datetime.strptime(row[0], "%m/%d/%Y %H:%M")
    hour = dt.datetime.strftime(date, "%H")
    if hour not in other_counts_by_hour:
        other_counts_by_hour[hour] = 1
        other_comments_by_hour[hour] = row[1]
    else:
        other_counts_by_hour[hour] += 1
        other_comments_by_hour[hour] += row[1]

In [38]:
for value in sorted(other_counts_by_hour):
    print(value,counts_by_hour[value],sep=':')

00:55
01:60
02:58
03:54
04:47
05:46
06:44
07:34
08:48
09:45
10:59
11:58
12:73
13:85
14:107
15:116
16:108
17:100
18:109
19:110
20:80
21:109
22:71
23:68


In [39]:
avg_other_comments_by_hour = []

for value in other_comments_by_hour:
    avg_other_comments_by_hour.append([value, other_comments_by_hour[value]/other_counts_by_hour[value]])

In [40]:
for row in avg_other_comments_by_hour:
    print(row)

['09', 5.5777777777777775]
['13', 14.741176470588234]
['10', 13.440677966101696]
['14', 13.233644859813085]
['16', 16.796296296296298]
['23', 7.985294117647059]
['12', 9.41095890410959]
['17', 11.46]
['15', 38.5948275862069]
['21', 16.009174311926607]
['20', 21.525]
['02', 23.810344827586206]
['18', 13.20183486238532]
['03', 7.796296296296297]
['05', 10.08695652173913]
['19', 10.8]
['01', 11.383333333333333]
['22', 6.746478873239437]
['08', 10.25]
['04', 7.170212765957447]
['00', 8.127272727272727]
['06', 9.022727272727273]
['07', 7.852941176470588]
['11', 11.051724137931034]


In [41]:
swap_avg_other_comments_by_hour = []

for row in avg_other_comments_by_hour:
    swap_avg_other_comments_by_hour.append([row[1],row[0]])

In [42]:
sorted_other_comments_swap = sorted(swap_avg_other_comments_by_hour, reverse = True)

In [43]:
print("Top 5 Hours for Other Posts Comments")

for avg,hr in sorted_swap[:5]: 
    hour = dt.datetime.strptime(hr,"%H")
    hour = dt.datetime.strftime(hour,"%H:%M")
    print("{} {:.2f} average comments per post".format(hour,avg))

Top 5 Hours for Other Posts Comments
15:00 38.59 average comments per post
02:00 23.81 average comments per post
20:00 21.52 average comments per post
16:00 16.80 average comments per post
21:00 16.01 average comments per post


In [44]:
total_other_points = 0
for post in other_posts:
    num_points = int(post[3])
    total_other_points += num_points
print(total_other_points)

952664


In [45]:
avg_total_other_points = (total_other_points/len(other_posts))
print(avg_total_other_points)

55.4067698034198


In [46]:
other_points_list = []

for post in other_posts:
    num_points = int(post[3])
    creation_hour = post[6]
    other_points_list.append([creation_hour,num_points])

In [56]:
print(other_points_list[:10])

[['8/4/2016 11:52', 386], ['1/26/2016 19:30', 39], ['6/23/2016 22:20', 2], ['6/17/2016 0:01', 3], ['9/30/2015 4:12', 8], ['10/31/2015 9:48', 53], ['11/13/2015 0:45', 3], ['3/22/2016 16:18', 34], ['10/13/2015 9:30', 91], ['3/27/2016 18:08', 3]]


In [48]:
other_points_by_hour = {}
other_counts_by_hour = {}

for row in other_points_list:
    date = dt.datetime.strptime(row[0], "%m/%d/%Y %H:%M")
    hour = dt.datetime.strftime(date, "%H")
    if hour not in other_points_by_hour:
        other_points_by_hour[hour] = row[1]
        other_counts_by_hour[hour] = 1
    else:
        other_points_by_hour[hour] += row[1]
        other_counts_by_hour[hour] += 1
print(other_points_by_hour)
print(other_counts_by_hour)

{'11': 37995, '19': 58811, '22': 38079, '00': 35718, '04': 22549, '09': 28802, '16': 59655, '18': 58459, '10': 35746, '12': 45287, '20': 41218, '03': 23167, '17': 67777, '14': 59191, '13': 57398, '01': 25303, '23': 35068, '08': 26830, '02': 25786, '21': 43149, '15': 62964, '06': 18864, '05': 19387, '07': 25461}
{'11': 660, '19': 980, '22': 758, '00': 611, '04': 454, '09': 534, '16': 1101, '18': 1084, '10': 591, '12': 789, '20': 911, '03': 407, '17': 1169, '14': 958, '13': 918, '01': 500, '23': 674, '08': 496, '02': 441, '21': 874, '15': 1040, '06': 408, '05': 388, '07': 448}


In [49]:
for value in sorted(other_points_by_hour):
    print(value,other_points_by_hour[value],sep=':')

00:35718
01:25303
02:25786
03:23167
04:22549
05:19387
06:18864
07:25461
08:26830
09:28802
10:35746
11:37995
12:45287
13:57398
14:59191
15:62964
16:59655
17:67777
18:58459
19:58811
20:41218
21:43149
22:38079
23:35068


In [50]:
avg_other_points_by_hour = []

for value in other_points_by_hour:
    avg_other_points_by_hour.append([value, other_points_by_hour[value]/other_counts_by_hour[value]])
print(avg_other_points_by_hour)

[['11', 57.56818181818182], ['19', 60.01122448979592], ['22', 50.236147757255935], ['00', 58.4582651391162], ['04', 49.66740088105727], ['09', 53.93632958801498], ['16', 54.182561307901906], ['18', 53.928966789667896], ['10', 60.4839255499154], ['12', 57.3979721166033], ['20', 45.24478594950604], ['03', 56.92137592137592], ['17', 57.97861420017109], ['14', 61.78601252609603], ['13', 62.525054466230934], ['01', 50.606], ['23', 52.02967359050445], ['08', 54.09274193548387], ['02', 58.471655328798185], ['21', 49.369565217391305], ['15', 60.542307692307695], ['06', 46.23529411764706], ['05', 49.96649484536083], ['07', 56.832589285714285]]


In [51]:
swap_avg_other_points_by_hour = []

for row in avg_other_points_by_hour:
    swap_avg_other_points_by_hour.append([row[1],row[0]])

print(swap_avg_other_points_by_hour)

[[57.56818181818182, '11'], [60.01122448979592, '19'], [50.236147757255935, '22'], [58.4582651391162, '00'], [49.66740088105727, '04'], [53.93632958801498, '09'], [54.182561307901906, '16'], [53.928966789667896, '18'], [60.4839255499154, '10'], [57.3979721166033, '12'], [45.24478594950604, '20'], [56.92137592137592, '03'], [57.97861420017109, '17'], [61.78601252609603, '14'], [62.525054466230934, '13'], [50.606, '01'], [52.02967359050445, '23'], [54.09274193548387, '08'], [58.471655328798185, '02'], [49.369565217391305, '21'], [60.542307692307695, '15'], [46.23529411764706, '06'], [49.96649484536083, '05'], [56.832589285714285, '07']]


In [52]:
sorted_swap_other_points = sorted(swap_avg_other_points_by_hour, reverse = True)

In [53]:
print("Top 5 Hours for Other Posts Comments")

for avg,hr in sorted_swap[:5]: 
    hour = dt.datetime.strptime(hr,"%H")
    hour = dt.datetime.strftime(hour,"%H:%M")
    print("{} {:.2f} average comments per post".format(hour,avg))

print("\n\nTop 5 Hours for Other Post Points")

for avg,hr in sorted_swap_other_points[:5]: 
    hour = dt.datetime.strptime(hr,"%H")
    hour = dt.datetime.strftime(hour,"%H:%M")
    print("{} {:.2f} average points per other post".format(hour,avg))

Top 5 Hours for Other Posts Comments
15:00 38.59 average comments per post
02:00 23.81 average comments per post
20:00 21.52 average comments per post
16:00 16.80 average comments per post
21:00 16.01 average comments per post


Top 5 Hours for Other Post Points
13:00 62.53 average points per other post
14:00 61.79 average points per other post
15:00 60.54 average points per other post
10:00 60.48 average points per other post
19:00 60.01 average points per other post


In [54]:
print(total_ask_comments,total_show_comments,total_other_comments,len(ask_posts),len(show_posts),len(other_posts))

24483 11988 462055 1744 1162 17194


So,now, we can compare these 3 types of posts regarding the number of comments and points per a single post and the hours of when to post in order to receive the highest number of comments and posts

Let's start with comparing the numbers of points and posts in total for **Ask HN**, **Show HN** and **other** posts:

- total number of comments:
    - **Ask HN**: 24483
    - **Show HN**: 11988
    - **Other**: 462055
    
Now, let's move to the average number of comments:

- average number of comments:
    - **Ask HN**: 14.04
    - **Show HN**: 10.32
    - **Other**: 26.87

The hours of when to post in order to get the highest number of comments:


Top 5 Hours for Comments and their average values:
- **Ask HN**: 
    - 15:00 38.59 
    - 02:00 23.81 
    - 20:00 21.52 
    - 16:00 16.80 
    - 21:00 16.01 
- **Show HN**: 
    - 18:00 15.77 
    - 00:00 15.71 
    - 14:00 13.44 
    - 23:00 12.42 
    - 22:00 12.39 
- **Other**: 
    - 15:00 38.59 
    - 02:00 23.81 
    - 20:00 21.52 
    - 16:00 16.80 
    - 21:00 16.01

Top 5 Hours for Points and their average values:
- **Ask HN**: 
    - 15:00 29.99 
    - 13:00 24.26 
    - 16:00 23.35 
    - 17:00 19.41 
    - 10:00 18.68  
- **Show HN**: 
    - 23:00 42.39 
    - 12:00 41.69 
    - 22:00 40.35 
    - 00:00 37.84 
    - 18:00 36.31 
- **Other**: 
    - 13:00 62.53 
    - 14:00 61.79 
    - 15:00 60.54 
    - 10:00 60.48 
    - 19:00 60.01