# Hacker News Project Overview
Hacker News is a site started by the startup incubator Y Combinator, where user-submitted stories (known as "posts") are voted on and commented on, similar to Reddit. Hacker News is extremely popular in technology and startup circles, and posts that make it to the top of the Hacker News listings can get hundreds of thousands of visitors as a result.

**For this project, we're interested in figuring out which type of post (`Ask HN` or `Show HN`) to create in order to generate the most comments, pushing our posts to the top of the stack.**
    
`Ask HN` posts pose specific questions to the Hacker News community. Below are a few examples:
```
Ask HN: How to improve my personal website?
Ask HN: Am I the only one outraged by Twitter shutting down share counts?
Ask HN: Aby recent changes to CSS that broke mobile?
```
Likewise, users submit `Show HN` posts to show the Hacker News community a project, product, or something interesting. Below are a few examples:

```
Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform
Show HN: Something pointless I made
Show HN: Shanhu.io, a programming playground powered by e8vm
```

## Summary of Results
After analyzing the data, we concluded that users should create **`Ask HN` posts** during the **3 p.m. hour (15:00)**. Posts created in that hour received the most comments per post on average in this dataset, which could help users increase post visibility on Hacker News. The top five hours to post follow:

|Rank|Hour to Post|Avg Comments per Post|
|----|------------|---------------------|
|1   |3pm         |38.59                |
|2   |2am         |23.81                |
|3   |8pm         |21.52                |
|4   |4pm         |16.80                |
|5   |9pm         |16.01                |

For more details, please refer to the full analysis below.

# Environment Setup

## Loading Dependencies

In [1]:
import os
import git
import datetime as dt


from csv import reader
from pathlib import Path

## Importing our Hacker News Data
The data is publicly available on [Kaggle](https://www.kaggle.com/hacker-news/hacker-news-posts), but we've already downloaded it and added it to our repository. Below, we'll do a quick exploration of the *hacker_news.csv* file stored in the `data` directory at the root of our repository.

In [2]:
# Read in the data
repo_root = Path(git.Repo(os.getcwd(), search_parent_directories=True).git.rev_parse("--show-toplevel"))
file_name = 'hacker_news.csv'
file_path = f'{repo_root}/data/{file_name}'
# Read the file once and close it when we're done
with open(file_path, newline='') as f:
    rows = list(reader(f))
headers = rows[0]
hn = rows[1:]

# Quick exploration of the data
print(headers)
print()
print(hn[:5])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]
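The cells that follow index each row by position, for example `row[1]` for the title and `row[4]` for the comment count. As an illustrative alternative, not the approach this notebook uses, `csv.DictReader` keys each row by its header name. The sketch below reads a two-row in-memory sample copied from the output above instead of the real file.

```python
import csv
import io

# In-memory stand-in for data/hacker_news.csv (rows copied from the output above)
sample_csv = io.StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52\n"
    "11919867,Technology ventures: From Idea to Enterprise,https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429,3,1,hswarna,6/17/2016 0:01\n"
)

# Each row becomes a dict keyed by the header names
rows = list(csv.DictReader(sample_csv))
first = rows[0]
print(first["title"], int(first["num_comments"]))
```

Named access makes the later column lookups self-describing, at the cost of slightly more memory per row.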


# Filtering by Article Title
Since we're only concerned with post titles beginning with `Ask HN` or `Show HN`, we'll create new lists of lists containing the data for those titles.

In [3]:
def title_starts_with(article_list, words):
    """Takes a list of articles and a string of words and 
        returns a list of articles whose title starts with those words
    
    Keyword arguments:
    article_list -- list of articles
    words -- string of words the title should start with (e.g. 'apple pie')
    """
    word_cases = (words, words.lower(), words.upper(), words.title())
    return [row for row in article_list if row[1].startswith(word_cases)]


def title_does_not_start_with(article_list, words):
    """Takes a list of articles and a list of words and 
        returns a list of articles whose title does not start with those words
    
    Keyword arguments:
    article_list -- list of articles
    words -- a list of words the title should not start with (e.g. ['apple', 'pie', 'tarti'])
    """    
    word_cases = ()
    for word in words:
        word_cases += (word, word.lower(), word.upper(), word.title())
    return [row for row in article_list if not row[1].startswith(word_cases)]

In [4]:
def get_posts_by_title(article_list, words, starts_with=True):
    """Takes a list of articles, a string or list of words, and a flag, and
        returns a list of articles whose title does or does not start with those words
    
    Keyword arguments:
    article_list -- list of articles
    words -- a string of words when starts_with is True (e.g. 'apple pie'),
             or a list of such strings when starts_with is False
    starts_with -- True to return posts whose title starts with the words, False to return posts whose title does not
    """
    return title_starts_with(article_list, words) if starts_with else title_does_not_start_with(article_list, words)
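The helpers above rely on `str.startswith` accepting a tuple of prefixes, and they cover four casings of the prefix (as written, lower, upper, and title case). The sketch below shows which variants that produces and how a title with some other mix of casing slips through. The helper `starts_with_any_case` and the sample title are hypothetical and only illustrate a case-insensitive alternative; switching to it could change the post counts reported in this notebook.

```python
def starts_with_any_case(title, prefix):
    """Hypothetical helper: case-insensitive prefix check."""
    return title.casefold().startswith(prefix.casefold())


prefix = "Ask HN"
# The same four casings the notebook's helpers build
variants = (prefix, prefix.lower(), prefix.upper(), prefix.title())
print(variants)

title = "ask HN: Is anyone there?"  # made-up title with mixed casing
matches_variants = title.startswith(variants)           # misses this casing
matches_casefold = starts_with_any_case(title, prefix)  # catches it
print(matches_variants, matches_casefold)
```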

In [5]:
# Storing our list of articles by post type
posts = {
    'ask_posts': get_posts_by_title(hn, 'Ask HN'),
    'show_posts': get_posts_by_title(hn, 'Show HN'),
    'other_posts': get_posts_by_title(hn, ['Ask HN', 'Show HN'], starts_with=False)
}


# Quick exploration of each Post type
print("ask_posts posts: {:,}".format(len(posts['ask_posts'])))
print("show_posts posts: {:,}".format(len(posts['show_posts'])))
print("other_posts posts: {:,}".format(len(posts['other_posts'])))

ask_posts posts: 1,744
show_posts posts: 1,162
other_posts posts: 17,194


To confirm that we split out the titles correctly, we can compare the length of **hn** to the combined length of the lists within **posts**. If the two are equal, then every article landed in exactly one list and we can proceed. If they aren't equal, then the assertion below will raise an `AssertionError`, telling us that something went wrong during our filtering process.

In [6]:
# Quick sanity check
total_articles = len(hn)
total_articles_from_posts = len(posts['ask_posts']) + len(posts['show_posts']) + len(posts['other_posts'])
assert total_articles == total_articles_from_posts
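The length check works because the three lists are meant to form a partition of **hn**. One way to get that property by construction is to assign each title to exactly one bucket in a single pass. The sketch below is an assumption-labeled alternative rather than the notebook's method: the `classify` helper is hypothetical, and the sample titles are taken from earlier in this document.

```python
def classify(title):
    """Hypothetical helper: return the bucket name for one title."""
    ask = ('Ask HN', 'ask hn', 'ASK HN', 'Ask Hn')
    show = ('Show HN', 'show hn', 'SHOW HN', 'Show Hn')
    if title.startswith(ask):
        return 'ask_posts'
    if title.startswith(show):
        return 'show_posts'
    return 'other_posts'


sample_titles = [
    'Ask HN: How to improve my personal website?',
    'Show HN: Something pointless I made',
    'Interactive Dynamic Video',
]

# Every title is appended to exactly one bucket, so the lengths must add up
buckets = {'ask_posts': [], 'show_posts': [], 'other_posts': []}
for title in sample_titles:
    buckets[classify(title)].append(title)

assert sum(len(group) for group in buckets.values()) == len(sample_titles)
```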

# Our Questions
Now that we have the data in the format to get started, let's go ahead and answer our questions.

## Which Post Type Receives More Comments on Average?

In [7]:
def total_comments(post_type):
    """Takes a list of articles
        returns the total comments
    
    Keyword arguments:
    post_type -- list of articles by post type
    """
    # Sum the num_comments column (index 4) across all rows
    total = 0
    for row in post_type:
        total += int(row[4])
    return total


def avg_comments(total_comments, post_type):
    """Takes the total comments and list of articles
        returns the average comments per post
    
    Keyword arguments:
    total_comments -- total comments by post type
    post_type -- list of articles by post type
    """    
    return round(total_comments / len(post_type))

In [8]:
# Calculate stats for ask posts
total_ask_comments = total_comments(posts['ask_posts'])
avg_ask_comments = avg_comments(total_ask_comments, posts['ask_posts'])


print("Total Comments (Ask Posts): {:,}".format(total_ask_comments))
print("Average Comments (per Ask Post): {:,}".format(avg_ask_comments))

Total Comments (Ask Posts): 24,483
Average Comments (per Ask Post): 14


In [9]:
# Calculate stats for show posts
total_show_comments = total_comments(posts['show_posts'])
avg_show_comments = avg_comments(total_show_comments, posts['show_posts'])


print("Total Comments (Show Posts): {:,}".format(total_show_comments))
print("Average Comments (per Show Post): {:,}".format(avg_show_comments))

Total Comments (Show Posts): 11,988
Average Comments (per Show Post): 10


### Conclusion
As we can see from the above, `Ask HN` posts receive more comments on average than `Show HN` posts (about 14 versus 10 per post). For this reason, we'll focus exclusively on `Ask HN` posts from here on. Onto the next question!
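Because `avg_comments` rounds to a whole number, the averages printed above hide the decimals. As a quick worked check, the sketch below recomputes both averages to two decimal places from the totals and post counts printed in the earlier cell outputs.

```python
# Totals and post counts copied from the cell outputs above
ask_total, ask_count = 24_483, 1_744
show_total, show_count = 11_988, 1_162

ask_avg = ask_total / ask_count
show_avg = show_total / show_count

print(f"Ask HN:  {ask_avg:.2f} comments per post")   # 14.04
print(f"Show HN: {show_avg:.2f} comments per post")  # 10.32
```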

## Do posts created at a certain hour receive more comments?
Now let's determine whether the hour at which an `Ask HN` post is created is associated with the number of comments it receives.

In [10]:
def append_hour_to_posts(article_list):
    """Takes a list of articles, parses each created_at string into a datetime in place,
        appends the hour to each article, and returns the list
    
    Keyword arguments:
    article_list -- list of articles
    """
    for post in article_list:
        post[6] = dt.datetime.strptime(post[6], "%m/%d/%Y %H:%M")
        post.append(post[6].hour)
    return article_list
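To make the parsing step concrete, the sketch below applies the same format string to the `created_at` value of the first row printed earlier. It also shows one consequence of modifying the rows in place: once a value has been converted to a `datetime`, passing it to `strptime` again raises a `TypeError`, so the function should be run only once per list.

```python
import datetime as dt

raw = '8/4/2016 11:52'  # created_at value from the first row printed earlier
created = dt.datetime.strptime(raw, "%m/%d/%Y %H:%M")
print(created.month, created.day, created.hour)  # 8 4 11

# strptime expects a string, so re-parsing an already converted value fails
try:
    dt.datetime.strptime(created, "%m/%d/%Y %H:%M")
    reparse_failed = False
except TypeError:
    reparse_failed = True
print(reparse_failed)
```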

In [11]:
def get_post_by_hour(article_list):
    """Takes a list of articles
        returns a dictionary of hours:posts per hour
    
    Keyword arguments:
    article_list -- list of articles
    """    
    posts_by = {}
    for post in article_list:
        posts_by[post[7]] = 1 if post[7] not in posts_by else posts_by[post[7]] + 1
    return posts_by


def get_posts_by_comments(article_list):
    """Takes a list of articles
        returns a dictionary of hours:comments per hour
    
    Keyword arguments:
    article_list -- list of articles
    """    
    posts_by = {}
    for post in article_list:
        posts_by[post[7]] = int(post[4]) if post[7] not in posts_by else posts_by[post[7]] + int(post[4])
    return posts_by


def get_posts_by(article_list, get_posts_by='hour'):
    """Takes a list of articles, and a get_posts_by string
        returns a dictionary of hours:posts per hour or comments per hour
    
    Keyword arguments:
    article_list -- list of articles
    get_posts_by -- a flag to indicate whether to return posts per hour or comments per hour
    """    
    if get_posts_by == 'hour':
        posts_by = get_post_by_hour(article_list)
    elif get_posts_by == 'comments':
        posts_by = get_posts_by_comments(article_list)
    else:
        raise ValueError("You can either get posts by 'hour' or by 'comments'")
    return posts_by

In [12]:
# Appending the hour of post creation to the ask posts list
posts['ask_posts'] = append_hour_to_posts(posts['ask_posts'])

# Storing our posts by hour and comments in a dictionary
posts_by = {
    'hour': get_posts_by(posts['ask_posts']),
    'comments': get_posts_by(posts['ask_posts'], 'comments'),
}

In [13]:
def get_avg_comment_per_post_by_hour(posts_by_comments, posts_by_hour):
    """Takes the comments-per-hour and posts-per-hour dictionaries
        returns a list of [hour, average comments per post], sorted from highest to lowest average
    
    Keyword arguments:
    posts_by_comments -- dictionary of hour: total comments on posts created in that hour
    posts_by_hour -- dictionary of hour: number of posts created in that hour
    """    
    avg_comments_per_post_by_hour = []
    for hour, comment_count in posts_by_comments.items():
        avg_comments_per_post_by_hour.append(
            [hour, comment_count / posts_by_hour[hour]]
        )
    return sorted(avg_comments_per_post_by_hour, key=lambda x: x[1], reverse=True)

In [14]:
# Getting the average comments per post per hour
avg_comment_per_post_by_hour = get_avg_comment_per_post_by_hour(
    posts_by['comments'], posts_by['hour']
)

# Getting the top 5 times to post
top_five_avg_comment_per_post_by_hour = avg_comment_per_post_by_hour[:5]
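The cells above build two dictionaries (posts per hour and comments per hour) and then divide one by the other. The same calculation can be sketched in a single pass with `collections.defaultdict`. This is an alternative formulation for illustration, not the notebook's code, and the `(hour, num_comments)` pairs are made-up sample data.

```python
from collections import defaultdict

# Hypothetical (hour, num_comments) pairs standing in for the ask posts
sample = [(15, 10), (15, 30), (2, 6), (2, 2), (20, 5)]

# hour -> [comment total, post count]
totals = defaultdict(lambda: [0, 0])
for hour, comments in sample:
    totals[hour][0] += comments
    totals[hour][1] += 1

# Average comments per post for each hour, highest first
averages = sorted(
    ((hour, total / count) for hour, (total, count) in totals.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
print(averages)  # [(15, 20.0), (20, 5.0), (2, 4.0)]
```

Note that an hour with only one or two posts can rank highly on a single popular post, so the per-hour post counts are worth checking alongside the averages.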

### Conclusion

In [15]:
# Printing our conclusion
print("Top 5 Best Times to Post on Hacker News:\n")
for hour, avg_com in top_five_avg_comment_per_post_by_hour:
    print("Time\n{h}:00 ({a:,.2f} comments per post)\n".format(h=hour,a=avg_com))

Top 5 Best Times to Post on Hacker News:

Time
15:00 (38.59 comments per post)

Time
2:00 (23.81 comments per post)

Time
20:00 (21.52 comments per post)

Time
16:00 (16.80 comments per post)

Time
21:00 (16.01 comments per post)



# Future Considerations
Here are some next steps for you to consider:

- Determine if `show` or `ask` posts receive more points on average.

- Determine if `posts` created at a certain time are more likely to receive more points.

- Compare your results to the average number of comments and points other `posts` receive.
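
As a starting point for the first item, the sketch below averages the `num_points` column (index 3 in the header row printed earlier) for each post type. The `avg_points` helper and all sample rows are hypothetical; with the real data you would pass in the lists stored in `posts` instead.

```python
def avg_points(rows):
    """Hypothetical helper: mean of num_points (column index 3)."""
    return sum(int(row[3]) for row in rows) / len(rows)


# Made-up rows in the notebook's column order:
# id, title, url, num_points, num_comments, author, created_at
sample_ask = [
    ['1', 'Ask HN: Example question?', '', '10', '4', 'user_a', '1/1/2016 9:00'],
    ['2', 'Ask HN: Another question?', '', '20', '6', 'user_b', '1/2/2016 15:30'],
]
sample_show = [
    ['3', 'Show HN: Example project', 'http://example.com', '40', '2', 'user_c', '1/3/2016 12:00'],
]

print(avg_points(sample_ask))   # 15.0
print(avg_points(sample_show))  # 40.0
```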