# <center>Hacker News posts: Show vs. Ask</center>
# <center>Which gets the most traction?</center>

## <center>Introduction</center>

The purpose of this project is to take an adapted [dataset](https://www.kaggle.com/hacker-news/hacker-news-posts) of posts from Hacker News, a site run by the startup incubator Y Combinator, and determine whether 'Ask HN' or 'Show HN' posts receive more comments on average. The final step is to find the hours of the day at which posts of the more-commented type receive the most comments on average.

## <center>Summary of Results</center>

'Ask HN' posts are more numerous, receive more comments in total, and receive more comments on average per post than 'Show HN' posts.

| Post type | Posts | Total comments | Average comments per post |
|-----------|-------|----------------|---------------------------|
| Ask HN    | 1,744 | 24,483         | 14.04                     |
| Show HN   | 1,162 | 11,988         | 10.32                     |


    
Focusing on 'Ask HN', 15:00 is the hour with the highest average number of comments per post (38.59), well ahead of the second-ranked hour, 02:00 (23.81). On this dataset, 15:00 is therefore the best time to ask something on HN.
    
| Hour of day        | Average comments per post |
|--------------------|---------------------------|
| 15:00 (03:00 p.m.) | 38.59                     |
| 02:00 (02:00 a.m.) | 23.81                     |
| 20:00 (08:00 p.m.) | 21.52                     |
| 16:00 (04:00 p.m.) | 16.80                     |
| 21:00 (09:00 p.m.) | 16.01                     |

### <center>Reading the data as a list</center>

In [1]:
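from csv import reader

# Open the file in a with-block so it is closed once the rows are read
with open('hacker_news.csv') as opened_file:
    hn = list(reader(opened_file))

header = hn[0]  # first row holds the column names
hn = hn[1:]     # remaining rows are the posts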

In [15]:
print(header)
print('\n')

for row in hn[:5]:
    print(row)
    print('\n')

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']


['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']


['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']


['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']


['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']




### <center>Classifying the types of posts</center>

In [16]:
ask_posts = []

show_posts = []

other_posts = []

for row in hn:
    
    title = row[1]
    
    # Lowercase the title so the prefix match is case-insensitive
    title = title.lower()
    
    if title.startswith('ask hn'):
        ask_posts.append(row)
    
    elif title.startswith('show hn'):
        show_posts.append(row)
    
    else:
        other_posts.append(row)
    

print('Number of Posts in ask_posts:',len(ask_posts))
print('\n')
print('Number of Posts in show_posts:',len(show_posts))
print('\n')
print('Number of Posts in other_posts:',len(other_posts))
print('\n')

Number of Posts in ask_posts: 1744


Number of Posts in show_posts: 1162


Number of Posts in other_posts: 17194




| Types of posts | Number of posts |
|----------------|-----------------|
| Ask posts      | 1,744           |
| Show posts     | 1,162           |
| Other posts    | 17,194          |
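
The prefix test used in the loop above can also be written as a small helper, which makes it easy to check on individual titles. This is only a sketch: `classify_title`, `sample_titles` and `labels` are illustrative names that do not appear in the notebook, and the sample titles are copied from rows printed in this notebook.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title prefix."""
    # Compare in lowercase so 'ASK HN' and 'Ask HN' are treated alike.
    lowered = title.lower()
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'


# Titles copied from rows printed in this notebook.
sample_titles = [
    'Ask HN: How to improve my personal website?',
    'Show HN: Something pointless I made',
    'Interactive Dynamic Video',
]

labels = [classify_title(title) for title in sample_titles]
print(labels)
```

Because the test is a prefix match, a title that merely mentions 'Ask HN' somewhere in the middle falls into the 'other' group, which matches the behavior of the loop above.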


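Ask HN posts outnumber Show HN posts (1,744 vs. 1,162), although both groups are small next to the 17,194 other posts. Before moving on to the comment counts and posting times, a few rows from each list are printed to confirm that the classification worked.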


#### Let's see a few rows in ask_posts.

In [17]:
for row in ask_posts[:5]:
    
    print(row)
    print('\n')

['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55']


['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43']


['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14']


['12210105', 'Ask HN: Looking for Employee #3 How do I do it?', '', '1', '3', 'sph130', '8/2/2016 14:20']


['10394168', 'Ask HN: Someone offered to buy my browser extension from me. What now?', '', '28', '17', 'roykolak', '10/15/2015 16:38']




#### Let's see a few rows in show_posts.

In [18]:
for row in show_posts[:5]:
    
    print(row)
    print('\n')

['10627194', 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03']


['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46']


['11590768', 'Show HN: Shanhu.io, a programming playground powered by e8vm', 'https://shanhu.io', '1', '1', 'h8liu', '4/28/2016 18:05']


['12178806', 'Show HN: Webscope  Easy way for web developers to communicate with Clients', 'http://webscopeapp.com', '3', '3', 'fastbrick', '7/28/2016 7:11']


['10872799', 'Show HN: GeoScreenshot  Easily test Geo-IP based web pages', 'https://www.geoscreenshot.com/', '1', '9', 'kpsychwave', '1/9/2016 20:45']




### <center>Show HN vs. Ask HN: Which gets the higher average number of comments per post?</center>

In [19]:
total_ask_comments = 0 

for row in ask_posts:
    
    ask_comment = int(row[4])
    
    total_ask_comments += ask_comment

avg_ask_comments = total_ask_comments/len(ask_posts)

print('Total ask_comments:',total_ask_comments)
print('Average ask_comments:',avg_ask_comments)

Total ask_comments: 24483
Average ask_comments: 14.038417431192661


In [20]:
total_show_comments = 0 

for row in show_posts:
    
    show_comment = int(row[4])
    
    total_show_comments += show_comment

avg_show_comments = total_show_comments/len(show_posts)

print('Total show_comments:',total_show_comments)
print('Average show_comments:',avg_show_comments)

Total show_comments: 11988
Average show_comments: 10.31669535283993


Ask HN posts receive more comments both in total and on average. One possible explanation is that a question invites people from many different professional backgrounds to add their perspective on a specific problem or doubt.


| Post type | Total number of comments | Average number of comments per post |
|-----------|--------------------------|-------------------------------------|
| Show HN   | 11,988                   | 10.32                               |
| Ask HN    | 24,483                   | 14.04                               |
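
The two near-identical loops above differ only in the list they read, so the calculation can be sketched as one reusable function. The names `average_comments` and `sample_rows` are assumptions made for this illustration; the sample rows are the first three Ask HN rows printed earlier, and column 4 is `num_comments` as shown in the header.

```python
def average_comments(rows, comments_index=4):
    """Average the comment counts stored as strings in each row."""
    if not rows:
        return 0.0  # avoid dividing by zero on an empty list
    total = sum(int(row[comments_index]) for row in rows)
    return total / len(rows)


# First three Ask HN rows printed earlier in this notebook.
sample_rows = [
    ['12296411', 'Ask HN: How to improve my personal website?', '',
     '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting '
     'down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'],
    ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '',
     '1', '1', 'polskibus', '5/2/2016 10:14'],
]

print(average_comments(sample_rows))

# Rounding the notebook's totals to two decimals gives the table values.
print(round(24483 / 1744, 2), round(11988 / 1162, 2))
```

In the notebook itself the same function could be called on `ask_posts` and `show_posts` in place of the two loops.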


### <center>Parsing the dates</center>

In [21]:
import datetime as dt

result_list = []

for row in ask_posts: #Preparation for the parsing
    
    create_date = row[6]
    
    comment_num = int(row[4])
    
    result_list.append([create_date,comment_num]) 

counts_by_hour = {}  # number of Ask HN posts created in each hour

comments_by_hour = {}  # total number of comments on those posts, per hour

for row in result_list:
    
    create_date = row[0]
    
    parse_date = dt.datetime.strptime(create_date,'%m/%d/%Y %H:%M')
    
    hour = parse_date.strftime('%H')
    
    comment_num = int(row[1])
    
    # Both dictionaries always share the same keys, so one membership test is enough
    if hour in counts_by_hour:
        
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += comment_num
        
    else:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = comment_num
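
To make the parsing step concrete, the sketch below runs it on three `created_at` values taken from the Ask HN rows printed earlier. It uses `collections.defaultdict` instead of the `if`/`else` branch, which is a swapped-in alternative and not what the cell above does. The names `sample`, `sample_counts` and `sample_comments` are illustrative and chosen so they do not overwrite the notebook's own dictionaries.

```python
import datetime as dt
from collections import defaultdict

# [created_at, num_comments] pairs from Ask HN rows printed earlier.
sample = [
    ['8/16/2016 9:55', 6],
    ['11/22/2015 13:43', 29],
    ['5/2/2016 10:14', 1],
]

# defaultdict(int) starts every missing hour at 0, so no branch is needed.
sample_counts = defaultdict(int)
sample_comments = defaultdict(int)

for created_at, num_comments in sample:
    parsed = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M')
    hour_key = parsed.strftime('%H')  # zero-padded hour, e.g. '09'
    sample_counts[hour_key] += 1
    sample_comments[hour_key] += num_comments

print(dict(sample_counts))
print(dict(sample_comments))
```

Note that `strftime('%H')` returns a zero-padded string, so the dictionary keys are strings such as `'09'` and not integers.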



### <center>Finding the averages by hour</center>

In [22]:
#For the average number of comments per ask hn by hour

avg_by_hour = []

for hour in comments_by_hour:
    
    comment_number = int(comments_by_hour[hour])
    
    posts = int(counts_by_hour[hour])
    
    avg_by_hour.append([hour, (comment_number/posts)])

In [23]:
#For easier printing and sorting

swap_avg_by_hour = []

for hour, comments in avg_by_hour:
    
    swap_avg_by_hour.append([comments, hour])

### <center>Which hour gets the most Ask HN comments?</center>

In [24]:
sorted_swap = sorted(swap_avg_by_hour, reverse = True)

print("Top 5 Hours for Ask Posts Comments:", '\n')

for row in sorted_swap[:5]:
    
    hour = row[1]
    
    hour = dt.datetime.strptime(hour,'%H')
    
    hour = hour.strftime('%H:%M')
    
    comments = float(row[0])
    
    result_string = '{0}: {1:.2f} average comments per post.'
    
    format_string = result_string.format(hour, comments)
    
    print(format_string)
    
    print('\n')

Top 5 Hours for Ask Posts Comments: 

15:00: 38.59 average comments per post.


02:00: 23.81 average comments per post.


20:00: 21.52 average comments per post.


16:00: 16.80 average comments per post.


21:00: 16.01 average comments per post.
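
The column swap above exists only so that `sorted` orders the rows by the average. A sketch of an alternative, assuming the same `[hour, average]` layout as `avg_by_hour`, is to pass a `key` function to `sorted` and leave the columns in place. The list `sample_avg_by_hour` is an illustrative name holding the five values printed above, deliberately shuffled.

```python
# The five [hour, average] pairs reported above, in arbitrary order.
sample_avg_by_hour = [
    ['16', 16.80],
    ['15', 38.59],
    ['21', 16.01],
    ['02', 23.81],
    ['20', 21.52],
]

# Sort on the second element (the average) without swapping columns.
ranked = sorted(sample_avg_by_hour, key=lambda pair: pair[1], reverse=True)

for hour_label, average in ranked:
    print('{0}:00: {1:.2f} average comments per post.'.format(hour_label, average))
```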




### <center>Conclusion</center>

  
| Hour of day        | Average comments per post |
|--------------------|---------------------------|
| 15:00 (03:00 p.m.) | 38.59                     |
| 02:00 (02:00 a.m.) | 23.81                     |
| 20:00 (08:00 p.m.) | 21.52                     |
| 16:00 (04:00 p.m.) | 16.80                     |
| 21:00 (09:00 p.m.) | 16.01                     |

The table summarizes the five hours at which Ask HN posts receive the most comments on average. Based on the hours recorded in the `created_at` column, 3:00 p.m. appears to be the best time to get many responses to a question: its average of 38.59 comments per post is well above the 23.81 of the second-ranked hour. This concludes this short study. Thank you for reading.
   
