# Exploring Hacker News Posts
In this project, we analyze a data set of roughly `293,000` Hacker News posts.

We want to know whether 'Ask HN' posts or 'Show HN' posts get more comments on Hacker News. 

We are also analyzing whether posts created at a certain time get more comments on average than others. 

# Opening Our Data Set
To access the data, we first open our CSV file.

1. Import `reader` from `csv` by using `from csv import reader`.
2. Use `open('HN_posts.csv')` to open the file and save it to the variable `opened_csv`.
3. Use `reader(opened_csv)` to read the file and save it to the variable `read_csv`.
4. Use `list(read_csv)` to create a list of the data and save it to the variable `hn`.

We can combine steps 2-4 into a single line of code instead of three, as shown below:

In [1]:
from csv import reader

hn = list(reader(open('HN_posts.csv')))

# We print the first few rows of the data to analyze the columns
for row in hn[:5]:
    print(row)
    print('\n')

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', '1', '0', 'altstar', '9/26/2016 3:26']


['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24']


['12578997', 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', '1', '0', 'pavel_lishin', '9/26/2016 3:19']


['12578989', 'algorithmic music', 'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext', '1', '0', 'poindontcare', '9/26/2016 3:16']
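
The one-line version above never closes the file it opens. As a minimal sketch of an alternative, the loading step can be wrapped in a small helper and called inside a `with` block, which closes the file automatically. The helper name `load_rows`, the `utf-8` encoding, and the in-memory sample are assumptions made for illustration; they are not part of the original notebook.

```python
from csv import reader
import io


def load_rows(file_obj):
    """Return every row of an already-open CSV file as a list of lists."""
    return list(reader(file_obj))


# With the real file, the call could look like this
# (the encoding is an assumption about how the file was saved):
#     with open('HN_posts.csv', encoding='utf-8', newline='') as f:
#         hn = load_rows(f)

# A tiny in-memory stand-in so the sketch runs without the data file
sample_csv = io.StringIO(
    'id,title,num_comments\n'
    '1,"Ask HN: Example, with a comma",3\n'
)
sample_rows = load_rows(sample_csv)
print(sample_rows)
```

Passing `newline=''` to `open` is what the `csv` module documentation recommends, so that newlines embedded in quoted fields are handled correctly.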




# Removing the Header from the Data Set
To analyze our data, we must first remove the `header` row containing the column information.

We then display the first few rows of the new data set to confirm that the `header` row was removed.

In [2]:
headers = hn[0]

hn = hn[1:]

print(headers)
print('\n')
print('END HEADER')
print('\n')

for row in hn[:3]:
    print(row)
    print('\n')

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


END HEADER


['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', '1', '0', 'altstar', '9/26/2016 3:26']


['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24']


['12578997', 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', '1', '0', 'pavel_lishin', '9/26/2016 3:19']
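
The same split can also be written with extended iterable unpacking. The sketch below uses a short sample built from the rows shown above; the names `sample` and `rows` are illustrative only and do not appear in the notebook.

```python
# A miniature copy of the data set: the header plus two of the rows printed above
sample = [
    ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'],
    ['12579005', 'SQLAR  the SQLite Archiver',
     'https://www.sqlite.org/sqlar/doc/trunk/README.md',
     '1', '0', 'blacksqr', '9/26/2016 3:24'],
    ['12578989', 'algorithmic music',
     'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext',
     '1', '0', 'poindontcare', '9/26/2016 3:16'],
]

# The first element goes to headers; the starred name collects the rest as a list
headers, *rows = sample

print(headers)
print(len(rows))
```

Unlike reassigning `hn = hn[1:]`, this form leaves the original list untouched, so running the cell twice cannot drop a data row by accident.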




# Extracting 'Ask HN' and 'Show HN' Posts
Now that we have removed the `header` row, we can filter the data set to find posts whose titles begin with either `Ask HN` or `Show HN`.

We will be using the `startswith` method to find these posts and sort them into separate lists.

We first loop through our data set:

1. We save the title at index 1 to the variable `title`.
2. We convert `title` to lowercase using `title.lower()` so that the comparison is case-insensitive.
3. We check whether `title` starts with `ask hn` or `show hn` and append the row to `ask_posts` or `show_posts` accordingly. Rows that match neither prefix go to `other_posts`.
4. After the loop, we print the length of each list.

In [3]:
# We create three lists: ask_posts, show_posts, other_posts
ask_posts = []
show_posts = []
other_posts = []

# We loop through the data set to filter our posts
for row in hn:
    title = row[1]
    title = title.lower()
    
    # If title starts with 'ask hn', append to ask_posts
    if title.startswith('ask hn'):
        ask_posts.append(row)
    # elif title starts with 'show hn', append to show_posts
    elif title.startswith('show hn'):
        show_posts.append(row)
    # Otherwise, append the row to other_posts
    else:
        other_posts.append(row)
        
print('Total Ask HN Posts: ', len(ask_posts))
print('Total Show HN Posts: ', len(show_posts))
print('Total Other Posts: ', len(other_posts))

Total Ask HN Posts:  9139
Total Show HN Posts:  10158
Total Other Posts:  273822
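
The prefix check can also be isolated in a small function, which makes it easy to try on a few titles before running it over the whole data set. This is a sketch only: the function name `classify_title`, its return labels, and the sample titles are hypothetical.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title's prefix."""
    lowered = title.lower()  # lowercase so 'ASK HN' and 'Ask HN' both match
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'


# Made-up titles used only to exercise the three branches
sample_titles = [
    'Ask HN: How do you learn a new language?',
    'SHOW HN: My weekend project',
    'algorithmic music',
]
labels = [classify_title(t) for t in sample_titles]
print(labels)
```

Only the start of the title is tested, so a post that mentions "Ask HN" later in its title is labeled `other`, which matches the behavior of the loop above.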


# Calculating Average Comments for 'Ask HN' and 'Show HN' Posts
Now that we have our posts sorted by `Ask HN` and `Show HN`, we can calculate the average number of comments for each type of post.

In [4]:
# Running totals of comments for each post type
total_ask_comments = 0
total_show_comments = 0

# Sum the comment counts (index 4) over ask_posts
for row in ask_posts:
    num_comments = int(row[4])
    total_ask_comments += num_comments

# Sum the comment counts (index 4) over show_posts
for row in show_posts:
    num_comments = int(row[4])
    total_show_comments += num_comments
    
# Calculate average comments for ask_posts
avg_ask_comments = total_ask_comments / len(ask_posts)

# Calculate average comments for show_posts
avg_show_comments = total_show_comments / len(show_posts)

# Print both averages
print('Average Number of Comments for ask_posts: ', round(avg_ask_comments, 2))
print('Average Number of Comments for show_posts: ', round(avg_show_comments, 2))

Average Number of Comments for ask_posts:  10.39
Average Number of Comments for show_posts:  4.89
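
Both loops above follow the same pattern, so the calculation can be sketched as one reusable function. The name `average_comments`, the `comments_index` parameter, and the sample rows are assumptions for illustration; the column position 4 comes from the header row shown earlier.

```python
def average_comments(posts, comments_index=4):
    """Return the mean of the comment counts stored as strings in each row."""
    if not posts:
        # Avoid dividing by zero when a category has no posts
        return 0.0
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)


# Two made-up rows with 4 and 7 comments
sample_posts = [
    ['1', 'Ask HN: first example', '', '1', '4', 'user_a', '9/26/2016 3:26'],
    ['2', 'Ask HN: second example', '', '1', '7', 'user_b', '9/26/2016 3:24'],
]
sample_average = average_comments(sample_posts)
print(round(sample_average, 2))  # prints 5.5
```

With the notebook's lists, the calls would be `average_comments(ask_posts)` and `average_comments(show_posts)`.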


The average number of comments for `ask_posts` is `10.39`.

The average number of comments for `show_posts` is `4.89`.

On average, posts starting with `Ask HN` receive about twice as many comments as posts starting with `Show HN`.