# Exploring Hacker News Posts

In this project I worked with the Hacker News Posts dataset. The dataset contains posts submitted to the Hacker News site. I focused on posts whose titles begin with "Ask HN" or "Show HN", because these posts are addressed to the Hacker News community. To read more about the dataset, follow this [link](https://www.kaggle.com/datasets/hacker-news/hacker-news-posts). Below are descriptions of the columns:
* `id` -  the unique identifier from Hacker News for the post
* `title` - the title of the post
* `url` - the URL that the post links to, if the post has a URL
* `num_points` - the number of points the post acquired, calculated as the total number of upvotes minus the total number of downvotes
* `num_comments` - the number of comments on the post
* `author` - the username of the person who submitted the post
* `created_at` - the date and time of the post's submission

## Importing the libraries and reading the dataset

In [1]:
from csv import reader
import datetime as dt

# Open the file in a context manager so it is closed once the rows are read
with open('hacker_news.csv') as file:
    hn = list(reader(file))  # hn = hacker news

headers = hn[0]
hn = hn[1:]

print(headers)
print(hn[:5])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]
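
The later cells refer to columns by position (for example, index 1 for `title`). As a hedged alternative, the sketch below reads rows with `csv.DictReader` so that columns can be addressed by name. It uses an in-memory sample built from the header and the first row printed above instead of the real file; `sample_csv`, `rows` and `first` are names introduced only for this illustration.

```python
import csv
import io

# In-memory stand-in for hacker_news.csv (header plus the first row shown above)
sample_csv = io.StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12224879,Interactive Dynamic Video,"
    "http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52\n"
)

# DictReader uses the first line as the keys for every following row
rows = list(csv.DictReader(sample_csv))

first = rows[0]
print(first["title"])              # Interactive Dynamic Video
print(int(first["num_comments"]))  # 52
```

Values are still strings, so numeric columns need an explicit `int(...)` just as in the index-based version.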


## Extracting Ask HN and Show HN Posts

As I said in the introduction, I want to extract the posts whose titles start with "Ask HN" or "Show HN". The code below does this, matching the prefix case-insensitively.

In [2]:
def extract_posts(dataset, index_of_title):
    """Split rows into ask, show and other posts by case-insensitive title prefix."""
    ask_posts = []
    show_posts = []
    other_posts = []
    
    for row in dataset:
        title = row[index_of_title]
        title = title.lower()
        
        if title.startswith("ask hn"):
            ask_posts.append(row)
        elif title.startswith("show hn"):
            show_posts.append(row)
        else:
            other_posts.append(row)
            
    return ask_posts, show_posts, other_posts

In [3]:
ask_posts, show_posts, other_posts = extract_posts(hn, 1)

print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

1744
1162
17194


As we see above, the dataset contains far more other posts (17194) than ask or show posts (1744 and 1162, respectively).
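
To put these counts in proportion, here is a small sketch that turns them into percentage shares. The three numbers are copied from the output above; the names `counts`, `total` and `shares` are introduced only for this illustration.

```python
# Counts taken from the output above
counts = {"ask": 1744, "show": 1162, "other": 17194}
total = sum(counts.values())

# Percentage share of each category, rounded to two decimals
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}

print(total)   # 20100
print(shares)  # {'ask': 8.68, 'show': 5.78, 'other': 85.54}
```

Ask and show posts together make up under 15% of the rows, so the rest of the analysis works on a small slice of the dataset.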

## Calculating the Average Number of Comments for Ask HN and Show HN Posts

In [4]:
def average_of_comments(dataset, index_of_column):
    """Return the mean of an integer column, rounded to 4 decimals."""
    total_comments = 0
    
    for row in dataset:
        number_of_comments = int(row[index_of_column])
        total_comments += number_of_comments
    
    avg_comments = round(total_comments / len(dataset), 4)
    
    return avg_comments

In [5]:
avg_ask_comments = average_of_comments(ask_posts, 4)
print(f' Average of ask posts comments: {avg_ask_comments}')

avg_show_comments = average_of_comments(show_posts, 4)
print(f' Average of show posts comments: {avg_show_comments}')

 Average of ask posts comments: 14.0384
 Average of show posts comments: 10.3167


The averages tell us that ask posts receive more comments than show posts: about 14.04 versus 10.32 per post, a difference of roughly 4 comments.
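
The size of that gap can be checked directly from the two printed averages. This is a minimal sketch; `avg_ask`, `avg_show`, `difference` and `ratio` are names introduced here, and the two input values are copied from the output above.

```python
# Averages copied from the output above
avg_ask = 14.0384
avg_show = 10.3167

difference = round(avg_ask - avg_show, 4)  # absolute gap in comments per post
ratio = round(avg_ask / avg_show, 2)       # relative gap

print(difference)  # 3.7217
print(ratio)       # 1.36
```

In other words, an ask post gets about 3.7 more comments on average, or roughly 36% more than a show post.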

## Ask Posts

### Finding the Number of Ask Posts and Comments by Hour Created

In [6]:
def number_of_posts_and_comments_by_hour(dataset, 
                                         index_of_date_col,
                                         index_of_comment_col):
    """Return two dicts keyed by zero-padded hour: total comments and post counts."""
    result_list = []
    
    for row in dataset:
        date = row[index_of_date_col]
        comments = int(row[index_of_comment_col])
        result_list.append([date, comments])
        
    counts_by_hour = {}
    comments_by_hour = {}
    
    for row in result_list:
        date = dt.datetime.strptime(row[0], "%m/%d/%Y %H:%M")
        hour = date.strftime("%H")
        if hour not in counts_by_hour:
            counts_by_hour[hour] = 1
            comments_by_hour[hour] = row[1]
        else:
            counts_by_hour[hour] += 1
            comments_by_hour[hour] += row[1]
            
    return comments_by_hour, counts_by_hour

In [7]:
comments_by_hour, counts_by_hour = number_of_posts_and_comments_by_hour(ask_posts, 6, 4)
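
The hour extraction inside the function above can be looked at in isolation. The sketch below parses two `created_at` values that appear in the rows printed at the start of the notebook, using the same format string; `raw`, `parsed`, `hour_key` and `early_key` are names introduced only for this illustration.

```python
import datetime as dt

# Timestamp of the first row printed earlier in the notebook
raw = "8/4/2016 11:52"

# %m, %d and %H also accept values without a leading zero, such as "8" and "4"
parsed = dt.datetime.strptime(raw, "%m/%d/%Y %H:%M")

hour_key = parsed.strftime("%H")  # zero-padded string, used as the dict key
print(hour_key)     # 11
print(parsed.hour)  # 11 (the same hour as an int)

# A post created just after midnight gets the key "00", not "0"
early_key = dt.datetime.strptime("6/17/2016 0:01", "%m/%d/%Y %H:%M").strftime("%H")
print(early_key)    # 00
```

Because the keys are zero-padded strings, they can later be parsed again with `"%H"` when the results are printed.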

### Calculating the Average Number of Comments for Ask HN Posts by Hour

In [8]:
def average_in_lists(comment_dict, count_dict):
    """Return a list of [average comments per post, hour] pairs, one per hour."""
    result = []
    
    for hour in count_dict:
        result.append([round(comment_dict[hour] / count_dict[hour], 4), hour])
        
    return result

In [9]:
avg_by_hour = average_in_lists(comments_by_hour, counts_by_hour)
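
To make the shape of `avg_by_hour` concrete, here is the same calculation on a tiny hand-made input. The numbers are hypothetical and not taken from the dataset; `comments`, `counts` and `avg` are names used only in this sketch.

```python
# Hypothetical totals for three hours (not taken from the dataset)
comments = {"09": 30, "15": 120, "21": 45}
counts = {"09": 6, "15": 4, "21": 9}

# One [average, hour] pair per hour, with the average first
avg = [[round(comments[h] / counts[h], 4), h] for h in counts]

print(avg)  # [[5.0, '09'], [30.0, '15'], [5.0, '21']]
```

Putting the average first in each pair matters for the next step, where the list is sorted.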

### Sorting and Printing Values from a List of Lists

In [10]:
def top5(list_of_avg):
    """Print the five [average, hour] pairs with the highest averages."""
    sorted_results = sorted(list_of_avg, reverse = True)

    print("Top 5 Hours for Ask Posts Comments:")

    counter = 1

    for hour in sorted_results[:5]:
        date = dt.datetime.strptime(hour[1], "%H").strftime("%H:%M")
        avg = hour[0]
        print(f'{counter}. {date} - {avg} average comments per post')
        counter += 1

In [11]:
top5(avg_by_hour)

Top 5 Hours for Ask Posts Comments:
1. 15:00 - 38.5948 average comments per post
2. 02:00 - 23.8103 average comments per post
3. 20:00 - 21.525 average comments per post
4. 16:00 - 16.7963 average comments per post
5. 21:00 - 16.0092 average comments per post


As we see above, 15:00 (3:00 pm) is the hour with the most comments per ask post by a wide margin (38.5948 on average). Second is 02:00 (2:00 am), and close behind in third place is 20:00 (8:00 pm). Fourth and fifth places are almost identical: 16:00 (4:00 pm) and 21:00 (9:00 pm).
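
The ranking produced by `top5` relies on how Python compares lists. The sketch below shows that behavior on hypothetical `[average, hour]` pairs (not taken from the dataset) and, as one possible variation, uses `enumerate` instead of a manual counter; `pairs` and `ranked` are names introduced only here.

```python
# Hypothetical [average, hour] pairs (not taken from the dataset)
pairs = [[5.0, "09"], [30.0, "15"], [5.0, "21"]]

# Lists compare element by element, so the average decides the order
# and the hour string only breaks ties
ranked = sorted(pairs, reverse=True)
print(ranked)  # [[30.0, '15'], [5.0, '21'], [5.0, '09']]

# enumerate(..., start=1) replaces a manual counter
for place, (avg, hour) in enumerate(ranked, start=1):
    print(f"{place}. {hour}:00 - {avg} average comments per post")
```

If the pairs were stored as `[hour, average]` instead, the same `sorted` call would rank by hour rather than by average.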

## Show Posts

### Finding the Number of Show Posts and Comments by Hour Created

In [12]:
comments_by_hour, counts_by_hour = number_of_posts_and_comments_by_hour(show_posts, 6, 4)

### Calculating the Average Number of Comments for Show HN Posts by Hour

In [13]:
avg_by_hour = average_in_lists(comments_by_hour, counts_by_hour)

### Sorting and Printing Values from a List of Lists

In [15]:
top5(avg_by_hour)

Top 5 Hours for Ask Posts Comments:
1. 18:00 - 15.7705 average comments per post
2. 00:00 - 15.7097 average comments per post
3. 14:00 - 13.4419 average comments per post
4. 23:00 - 12.4167 average comments per post
5. 22:00 - 12.3913 average comments per post


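We can't state unequivocally which hour is best for publishing a show post, because the top averages are close together (12.39 to 15.77). Still, four of the top five hours fall between 18:00 and 00:00, so the evening looks like the best time. Note that the heading in the output above still reads "Ask Posts" because `top5` prints a fixed label; the values are for show posts.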

## Conclusion

In conclusion, ask posts receive the most comments: on average about 4 more than show posts. Looking deeper, the best times to publish an ask post are the afternoon (15:00 and 16:00) and the evening (20:00 and 21:00). If you want to ask something after midnight, choose 02:00.