![Y](https://s3.amazonaws.com/dq-content/354/hacker_news.jpg)

#  Exploring Hacker News Posts

## 1. Introduction

**Hacker News** (sometimes abbreviated as HN) is a [social news website](https://news.ycombinator.com/) focusing on computer science and entrepreneurship.

It is run by Paul Graham's investment fund and startup incubator, Y Combinator. In general, content that can be submitted is defined as "anything that gratifies one's intellectual curiosity."

The word hacker in "Hacker News" is used in its original meaning and refers to the hacker culture which consists of people who enjoy tinkering with technology.

The intention was to recreate a community similar to the early days of Reddit. 

However, unlike Reddit where new users can immediately both upvote and downvote content, Hacker News does not allow users to downvote content until they have accumulated 501 "karma" points...

Source: https://en.wikipedia.org/wiki/Hacker_News

We're specifically interested in posts with titles that begin with either `Ask HN` or `Show HN`. 

Users submit `Ask HN` posts to ask the Hacker News community a specific question, and `Show HN` posts to show the community a project, product, or just something interesting.

We'll compare these two types of posts to determine the following:

- Do `Ask HN` or `Show HN` receive more comments on average?

- Do posts created at a certain time receive more comments on average?


In other words, we will compare these two types of posts (`Ask HN` and `Show HN`) to find out:


- Which type of post receives more comments on average.

- Whether there is a relationship between the time at which posts are created and the number of comments they receive.

### Data dictionary:

- `id`: the unique identifier from Hacker News for the post

- `title`: the title of the post

- `url`: the URL that the post links to, if the post has a URL

- `num_points`: the number of points the post acquired, calculated as the total number of upvotes minus the total number of downvotes

- `num_comments`: the number of comments on the post

- `author`: the username of the person who submitted the post

- `created_at`: the date and time of the post's submission

In [1]:
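from csv import reader

# Read the whole file into a list of lists (one inner list per row);
# the `with` block closes the file once it has been read.
with open('hacker_news.csv') as opened_file:
    hn = list(reader(opened_file))
header = hn[0]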

This is the header we are going to work with, which corresponds to row 0.

In [2]:
header

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

These are the first five rows of the dataset (the header plus four data rows), which give us an idea of what the data looks like.

In [3]:
for filas in hn[0:5]:
    print(filas)

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']
['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']
['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']
['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']


## 2. Managing headers on a list of lists

#### - 2.1 Extract the first row of data, and assign it to the variable headers.

In [4]:
headers = hn[0]

#### - 2.2 Removing the first row from hn.

In [5]:
del hn[0]

#### - 2.3 Display headers.

In [6]:
print(headers)

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


#### - 2.4 Display the first four rows of `hn` to verify that the header row was removed properly.

In [7]:
print(hn[0:4])

[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]


## 3. Extracting Ask HN and Show HN Posts

Now that we have removed the `hn` headers, we are ready to filter our data.

Since we are only interested in posts whose titles begin with **`Ask HN`** or **`Show HN`**, we will create new lists of lists containing only the data for those two kinds of posts.

To find posts starting with  **`Ask HN`** or **`Show HN`**, we will use the string method `startswith`. 

Example:

Given a string object, say `string1`, we can check whether it starts with a given prefix, for example `'data'`, by calling:

- `string1.startswith('data')`

If `string1` starts with `data`, the call returns `True`; otherwise it returns `False`.

        print('dataquest'.startswith('Data'))
        False

        print('dataquest'.startswith('data'))
        True

In the example above, the first call prints `False` because `'dataquest'` does not start with `'Data'` (capital `D`), whereas the second call prints `True` because `'dataquest'` **does** start with `'data'`.

**The comparison is case-sensitive**, so **if we want to ignore case, we can first call the `lower` method, which returns a lowercase copy of the original string**.

Here is an example:

`print('DataQuest'.lower())`

`dataquest`
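Combining the two methods gives a case-insensitive prefix check. The following is a minimal sketch; the helper name `starts_with_ignore_case` and the sample titles are illustrative and are not part of the dataset or the original notebook.

```python
def starts_with_ignore_case(text, prefix):
    """Return True if `text` begins with `prefix`, ignoring letter case."""
    return text.lower().startswith(prefix.lower())


# Illustrative titles (made up for this example)
print(starts_with_ignore_case('ASK HN: Any tips?', 'ask hn'))
print(starts_with_ignore_case('Show HN: My project', 'ask hn'))
```

Lowercasing both sides means the caller does not have to remember which form the prefix was written in.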

### Filter strategy


To filter the data, we will first create three empty lists called:

- **`ask_posts`**

- **`show_posts`** 

- **`other_posts`**.


First we convert each title to lowercase with the `.lower()` method; then we filter by the prefix we want with `object.startswith('string')`.

We will loop through `hn` and assign the lowercase title of each row to a variable called `title`.

- If the lowercase version of the title starts with `ask hn`, add the row to `ask_posts`.

- If the lowercase version of the title starts with `show hn`, add the row to `show_posts`.

- Otherwise, add the row to `other_posts`.

In [8]:
ask_posts = []
show_posts = []
other_posts = []

In [9]:
header

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

The title of each post is stored at index 1 of the row (`fila[1]`); the loop below reads it for a small sample of rows.

In [10]:
for fila in hn[7:15]:
    title = fila[1]

When the content of `fila[1]` (the **title**), converted to lowercase, starts with `ask hn`, the row is appended to `ask_posts`.

Else, if the lowercase title starts with `show hn`, the row is appended to `show_posts`.

Otherwise, the row is appended to `other_posts`.

In [11]:
for fila in hn:
    title = fila[1]
    title = title.lower()   # all in lower_case
    
    if title.startswith('ask hn'):
        ask_posts.append(fila)
        
    elif title.startswith('show hn'):
        show_posts.append(fila)
        
    else:
        other_posts.append(fila)
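The same filter can be tried on a handful of rows without loading the CSV file. This is a small sketch of the loop above wrapped in a function; the name `split_posts` is hypothetical, and the three sample rows are copied from the outputs shown in this notebook.

```python
def split_posts(rows):
    """Split rows into (ask, show, other) lists based on the lowercase title prefix."""
    ask, show, other = [], [], []
    for row in rows:
        title = row[1].lower()          # index 1 holds the title
        if title.startswith('ask hn'):
            ask.append(row)
        elif title.startswith('show hn'):
            show.append(row)
        else:
            other.append(row)
    return ask, show, other


sample_rows = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46'],
    ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'],
]

ask, show, other = split_posts(sample_rows)
print(len(ask), len(show), len(other))  # 1 1 1
```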

Check the number of posts in:

- `ask_posts`

- `show_posts`

- `other_posts`

In [12]:
len(ask_posts)

1744

In [13]:
len(show_posts)

1162

In [14]:
len(other_posts)

17194

## 4. Calculating the Average Number of Comments for `Ask HN` and `Show HN` Posts

Samples of the contents of the `ask_posts` and `show_posts` lists:

In [15]:
print(ask_posts[0:3])

[['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'], ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'], ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14']]


In [16]:
print(show_posts[0:3])

[['10627194', 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03'], ['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46'], ['11590768', 'Show HN: Shanhu.io, a programming playground powered by e8vm', 'https://shanhu.io', '1', '1', 'h8liu', '4/28/2016 18:05']]


The total number of comments on **ask** posts will be stored in `total_ask_comments`.

Remember to initialize `total_ask_comments` to 0.

In [17]:
total_ask_comments = 0

- Use a for loop to iterate over the `ask_posts` entries.

- The `num_comments` column is the fifth column of `ask_posts`, so we need the element at index 4 in each row.

- Convert the value to an integer in order to calculate the sum of all comments.
    - Add this value to `total_ask_comments`.

- Calculate the average number of comments on ask posts and assign it to `avg_ask_comments`.
    
**Header looks like:**

    ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
       0      1       2          3             4             5           6

### Computing:

- #### Average number of comments on `Ask HN` posts
- #### Maximum `num_comments` on `Ask HN` posts

In [18]:
maxi = 0

for comment in ask_posts:
    total_ask_comments += int(comment[4])  # convert string to int ['num_comments']
    if maxi <= int(comment[4]):
        maxi = int(comment[4])
      
    
avg_ask_comments = total_ask_comments / len(ask_posts) # average = total number of comments / number of ask posts

print('avg number of "Ask_posts" comments:', round(avg_ask_comments,2))
print('max number of "Ask_posts" comment:', maxi)

avg number of "Ask_posts" comments: 14.04
max number of "Ask_posts" comment: 947
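The total, average and maximum can also be obtained with the built-ins `sum` and `max`. The sketch below is only an alternative formulation of the loop above; it uses the three `ask_posts` rows printed earlier as a sample, so its numbers differ from the full-dataset results.

```python
# Three rows copied from the ask_posts sample shown earlier in this notebook
sample_ask_posts = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'],
    ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'],
]

# num_comments is at index 4 and is stored as a string
comment_counts = [int(row[4]) for row in sample_ask_posts]

total_comments = sum(comment_counts)
avg_comments = total_comments / len(comment_counts)
max_comments = max(comment_counts)

print(round(avg_comments, 2), max_comments)  # 12.0 29
```

Building the list of integers once avoids calling `int()` several times on the same value.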


- ### Titles of the most-commented `Ask HN` posts

**Header looks like:**

    ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
       0      1       2          3             4             5           6

In [19]:
# ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

most_important_questions = {}
maxi = 0

for comment in ask_posts:
    if maxi < int(comment[4]):  # only posts that set a new running maximum are recorded
        maxi = int(comment[4])  # update the running maximum
        # save into dictionary: key = num_comments, value = title
        most_important_questions[int(comment[4])] = comment[1]  # ['title']

# keys (comment counts) sorted from highest to lowest
most_voted = sorted(most_important_questions.keys(), reverse=True)

print("-- Score --            -- Title -- ")

for score in most_voted:
    texto = "    {points}     {relevance}".format(points = score,relevance = most_important_questions[score])
    print(texto)

-- Score --            -- Title -- 
    947     Ask HN: Who is hiring? (August 2016)
    910     Ask HN: Who is hiring? (September 2016)
    266     Ask HN: What are the must-read books about economics/finance?
    250     Ask HN: Who wants to be hired? (June 2016)
    234     Ask HN: What are you currently building?
    182     Ask HN: What is your go-to example for a good REST API?
    37     Ask HN: Things you created in 2015?
    33     Ask HN: teaching basic coding and web design offline, solely via iOS devices?
    29     Ask HN: Am I the only one outraged by Twitter shutting down share counts?
    6     Ask HN: How to improve my personal website?
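Note that the loop above records a post only when it sets a new running maximum, so the listing shows the successive record holders in file order rather than a complete ranking. A full top-N ranking could be sketched as follows; `top_by_comments` is a hypothetical helper, and the sample rows are the three `ask_posts` rows printed earlier.

```python
def top_by_comments(posts, n=5):
    """Return the n (num_comments, title) pairs with the most comments."""
    ranked = sorted(posts, key=lambda row: int(row[4]), reverse=True)
    return [(int(row[4]), row[1]) for row in ranked[:n]]


sample_ask_posts = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'],
    ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'],
]

for n_comments, title in top_by_comments(sample_ask_posts, n=2):
    print(n_comments, title)
```

Sorting the rows themselves also keeps posts that happen to share the same comment count, which a dictionary keyed by comment count would overwrite.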


### Computing:

- #### Average number of comments on `Show HN` posts

- #### Maximum `num_comments` on `Show HN` posts

In [20]:
show_posts[:2] # Sample of content.

[['10627194',
  'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform',
  'https://iot.seeed.cc',
  '26',
  '22',
  'kfihihc',
  '11/25/2015 14:03'],
 ['10646440',
  'Show HN: Something pointless I made',
  'http://dn.ht/picklecat/',
  '747',
  '102',
  'dhotson',
  '11/29/2015 22:46']]

In [21]:
# ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

max_show = 0
total_show_comments = 0

for row in show_posts:
    total_show_comments += int(row[4]) ## Accumulator value
    if max_show <= int(row[4]):
        max_show = int(row[4])
    
avg_show_comments = total_show_comments / len(show_posts)
print('avg number of comments: "Show posts" ', round(avg_show_comments,2))

print('max number of "Show posts" comments:', max_show)

avg number of comments: "Show posts"  10.32
max number of "Show posts" comments: 306


- ### Titles of the most-commented `Show HN` posts

In [22]:
most_important_show = {}
max_show = 0

for comment in show_posts:
     if max_show < int(comment[4]):
            max_show = int(comment[4])
            most_important_show[int(comment[4])] = comment[1]
            
most_voted_show = sorted(most_important_show.keys(), reverse=True)  # comment counts, highest first

print("-- Score --    -- Title -- ")

for score in most_voted_show:
    texto = "   {points}           {relevance}".format(points = score,relevance = most_important_show[score])
    print(texto)

-- Score --    -- Title -- 
   306           Show HN: BitKeeper  Enterprise-ready version control, now open-source
   168           Show HN: Nodal. Next-Generation Node.js Server and Framework
   134           Show HN: Download any song without knowing its name
   102           Show HN: Something pointless I made
   22           Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform


In [23]:
%%html
<style>
table {float:left}
</style>

### Observations

- #### avg number of "Ask_posts" comments: 14.04
- #### max number of "Ask_posts" comment: 947


| Comments | Title |
|---|---|
| 947 | Ask HN: Who is hiring? (August 2016) |
| 910 | Ask HN: Who is hiring? (September 2016) |
| 266 | Ask HN: What are the must-read books about economics/finance? |
| 250 | Ask HN: Who wants to be hired? (June 2016) |
| 234 | Ask HN: What are you currently building? |
| 182 | Ask HN: What is your go-to example for a good REST API? |
| 37 | Ask HN: Things you created in 2015? |
| 33 | Ask HN: teaching basic coding and web design offline, solely via iOS devices? |
| 29 | Ask HN: Am I the only one outraged by Twitter shutting down share counts? |
| 6 | Ask HN: How to improve my personal website? |

- #### avg number of `"Show posts"` comments: 10.32
- #### max number of `"Show posts"` comments: 306


| Comments | Title |
|---|---|
| 306 | Show HN: BitKeeper  Enterprise-ready version control, now open-source |
| 168 | Show HN: Nodal. Next-Generation Node.js Server and Framework |
| 134 | Show HN: Download any song without knowing its name |
| 102 | Show HN: Something pointless I made |
| 22 | Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform |
   


`Ask HN` posts clearly receive more comments on average (14.04) than `Show HN` posts (10.32), and the most-commented `Ask HN` posts also have far more comments than the most-commented `Show HN` posts.

The highest comment count among `Show HN` posts (306) would rank only third in the `Ask HN` list above, between 910 and 266.

Judging by the number of comments, what matters most to the users of this forum is hiring: the two most-commented `Ask HN` posts are "Who is hiring?" threads, and a "Who wants to be hired?" thread is in fourth place. Hacker News is therefore also a place to find out who is hiring.

We can also see that the August 2016 "Who is hiring?" thread (947 comments) drew slightly more comments than the September 2016 one (910), while the June 2016 "Who wants to be hired?" thread drew far fewer (250).


On the other hand, the most-commented `Show HN` post announces that BitKeeper, a fast, enterprise-ready distributed SCM that scales up to very large projects and down to tiny ones, is now available as open source under the Apache 2.0 License.

## 5. Finding the Number of Ask Posts and Comments by Hour Created


Since ask posts receive more comments on average, **we'll focus our analysis just on these posts.**

**We'll determine if `ask posts` created at a certain time are more likely to attract comments**. We'll use the following steps to perform this analysis:

- Calculate the number of **ask posts created in each hour of the day**, along with the number of comments received.

- Calculate the **average number of comments ask posts receive by hour created**.

**NOTE**: we can use the `datetime.strptime()` class method to **parse dates stored as strings and return datetime objects**, for example:

   - `date_1_str = "December 24, 1984"`
   - `date_1_dt = dt.datetime.strptime(date_1_str, "%B %d, %Y")`

Let's use this technique to calculate the number of ask posts created per hour, along with the total number of comments.
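For the `created_at` values in this dataset, the matching format string is `"%m/%d/%Y %H:%M"`. A short sketch, using one timestamp taken from the rows shown above:

```python
import datetime as dt

created_at = "8/16/2016 9:55"  # value copied from an ask_posts row shown earlier

# %m = month, %d = day, %Y = four-digit year, %H = hour (24h), %M = minute
parsed = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M")

print(parsed.hour)            # hour as an integer: 9
print(parsed.strftime("%H"))  # hour as a zero-padded string: 09
```

The `hour` attribute is convenient as a dictionary key because it is already an integer.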

In [24]:
import datetime as dt

Create an empty list, and assign it to `result_list`. This will be a list of tuples.

In [25]:
result_list = []

Loop over `ask_posts`, and append to `result_list` a tuple with two elements:

   - The first element is the value of the `created_at` column. Because `created_at` is the seventh column in `ask_posts`, we need the element at index 6 in each row.

   - The second element is the number of comments of the post (index 4). It is stored here as a string and converted to an integer in the next step.
   
**Header looks like:**

    ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
       0      1       2          3             4             5           6

In [26]:
for row in ask_posts:
    time_stamp = row[6]
    num_comment = row[4]
    tupla = (time_stamp, num_comment)
    result_list.append(tupla)

result_list

[('8/16/2016 9:55', '6'),
 ('11/22/2015 13:43', '29'),
 ('5/2/2016 10:14', '1'),
 ('8/2/2016 14:20', '3'),
 ('10/15/2015 16:38', '17'),
 ('9/26/2015 23:23', '1'),
 ('4/22/2016 12:24', '4'),
 ('11/16/2015 9:22', '1'),
 ('2/24/2016 17:57', '1'),
 ('6/4/2016 17:17', '2'),
 ('9/19/2015 17:04', '7'),
 ('9/22/2015 13:16', '1'),
 ('6/21/2016 15:45', '1'),
 ('1/13/2016 21:17', '4'),
 ('10/4/2015 21:27', '4'),
 ('1/25/2016 20:27', '2'),
 ('10/27/2015 2:47', '3'),
 ('1/19/2016 12:01', '1'),
 ('3/22/2016 2:05', '22'),
 ('9/8/2015 14:04', '2'),
 ('8/28/2016 18:06', '2'),
 ('7/20/2016 13:44', '7'),
 ('9/12/2016 16:52', '7'),
 ('2/29/2016 17:52', '3'),
 ('4/18/2016 15:28', '6'),
 ('12/28/2015 14:38', '2'),
 ('4/4/2016 3:34', '1'),
 ('1/15/2016 21:47', '3'),
 ('11/19/2015 5:33', '29'),
 ('12/20/2015 3:59', '2'),
 ('10/15/2015 21:34', '20'),
 ('2/26/2016 19:20', '3'),
 ('8/2/2016 18:00', '3'),
 ('2/28/2016 1:24', '33'),
 ('1/13/2016 9:12', '5'),
 ('5/6/2016 1:14', '4'),
 ('6/23/2016 13:59', '7'),
 ...

Create two empty dictionaries called `counts_by_hour` and `comments_by_hour`.

Loop through each row of `result_list`.

- Extract the **hour from the date**, which is the first element of the row.

- Use the **datetime.strptime()** method to parse the date and create a datetime object.

- Pass the string we want to parse as the first argument and a string that specifies the format as the second argument.

- Read the `hour` attribute of the datetime object to get just the hour as an integer (the **datetime.strftime()** method with `"%H"` would return it as a string instead).


If the hour isn't a key in `counts_by_hour`:

- Create the key in `counts_by_hour`, and set it equal to 1.

- Create the key in `comments_by_hour`, and set it equal to the number of comments.

If the hour is already a key in `counts_by_hour`:

- Increment the value in `counts_by_hour` by 1.

- Increment the value in `comments_by_hour` by the number of comments.

In [27]:
# ('8/28/2016 18:06', '2') <- date and number of comments (tuple)

counts_by_hour   = {}
comments_by_hour = {}

for row in result_list: # result_list content ('8/28/2016 18:06', '2')
    time_stamp = row[0]
    time_obj = dt.datetime.strptime(time_stamp, "%m/%d/%Y %H:%M")
    num_comments = int(row[1])
    
    hour = time_obj.hour
        
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = num_comments
    
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += num_comments
        
print(f"Counts by hour: {counts_by_hour}")
print("---")
print(f"Comments by hour: {comments_by_hour}")

Counts by hour: {9: 45, 13: 85, 10: 59, 14: 107, 16: 108, 23: 68, 12: 73, 17: 100, 15: 116, 21: 109, 20: 80, 2: 58, 18: 109, 3: 54, 5: 46, 19: 110, 1: 60, 22: 71, 8: 48, 4: 47, 0: 55, 6: 44, 7: 34, 11: 58}
---
Comments by hour: {9: 251, 13: 1253, 10: 793, 14: 1416, 16: 1814, 23: 543, 12: 687, 17: 1146, 15: 4477, 21: 1745, 20: 1722, 2: 1381, 18: 1439, 3: 421, 5: 464, 19: 1188, 1: 683, 22: 479, 8: 492, 4: 337, 0: 447, 6: 397, 7: 267, 11: 641}



`counts_by_hour`: contains the number of `ask posts` created during each hour of the day.

`comments_by_hour`: contains the total number of comments received by the `ask posts` created during each hour.
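The "create the key if missing, otherwise increment" pattern can be shortened with `collections.defaultdict`, which supplies 0 for a missing key. This is an alternative sketch, not the approach used in the cell above; the three tuples are copied from the `result_list` output.

```python
from collections import defaultdict
import datetime as dt

# (created_at, num_comments) tuples copied from the result_list output above
sample_results = [
    ('8/16/2016 9:55', '6'),
    ('11/16/2015 9:22', '1'),
    ('11/22/2015 13:43', '29'),
]

counts = defaultdict(int)    # hour -> number of posts
comments = defaultdict(int)  # hour -> total number of comments

for created_at, n_comments in sample_results:
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").hour
    counts[hour] += 1
    comments[hour] += int(n_comments)

print(dict(counts))    # {9: 2, 13: 1}
print(dict(comments))  # {9: 7, 13: 29}
```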

## 6. Average number of comments on `Ask HN Posts` by Hour

Next, we'll use the two dictionaries `counts_by_hour` and `comments_by_hour` to calculate the average number of comments per post for posts created during each hour of the day.

In [28]:
avg_by_hour = []

for horas in counts_by_hour: # {9: 45, 13: 85,...}
    avg_by_hour.append([horas, comments_by_hour[horas] / counts_by_hour[horas]])
    
avg_by_hour # list of [hour, average] pairs

[[9, 5.5777777777777775],
 [13, 14.741176470588234],
 [10, 13.440677966101696],
 [14, 13.233644859813085],
 [16, 16.796296296296298],
 [23, 7.985294117647059],
 [12, 9.41095890410959],
 [17, 11.46],
 [15, 38.5948275862069],
 [21, 16.009174311926607],
 [20, 21.525],
 [2, 23.810344827586206],
 [18, 13.20183486238532],
 [3, 7.796296296296297],
 [5, 10.08695652173913],
 [19, 10.8],
 [1, 11.383333333333333],
 [22, 6.746478873239437],
 [8, 10.25],
 [4, 7.170212765957447],
 [0, 8.127272727272727],
 [6, 9.022727272727273],
 [7, 7.852941176470588],
 [11, 11.051724137931034]]

**avg_by_hour:** a list of lists that stores, for each hour of the day, the average number of comments per post created during that hour.
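The same calculation can be written as a list comprehension. The sketch below uses only two hours, with the counts and totals taken from the dictionaries printed above, so the arithmetic is easy to check by hand (for example 4477 / 116 ≈ 38.59).

```python
# Two entries copied from the counts_by_hour / comments_by_hour output above
counts_sample = {9: 45, 15: 116}
comments_sample = {9: 251, 15: 4477}

# One [hour, average comments per post] pair for each hour
avg_sample = [
    [hour, comments_sample[hour] / counts_sample[hour]]
    for hour in counts_sample
]

for hour, avg in avg_sample:
    print(hour, round(avg, 2))
```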

## 7. Sorting and printing values from `avg_by_hour`

This format makes it difficult to identify the hours with the highest values:

        [[9, 5.5777777777777775],
         [13, 14.741176470588234],
         [10, 13.440677966101696],
         [14, 13.233644859813085],
         ...
         ]

Let's finish by sorting the list of lists `avg_by_hour` and printing the highest values in a format that's easier to read.

Calling `sorted` on `avg_by_hour` directly orders the rows by hour (the first element of each pair), not by average:

In [29]:
sorted_avg_by_hour = sorted(avg_by_hour, reverse=True)
sorted_avg_by_hour

[[23, 7.985294117647059],
 [22, 6.746478873239437],
 [21, 16.009174311926607],
 [20, 21.525],
 [19, 10.8],
 [18, 13.20183486238532],
 [17, 11.46],
 [16, 16.796296296296298],
 [15, 38.5948275862069],
 [14, 13.233644859813085],
 [13, 14.741176470588234],
 [12, 9.41095890410959],
 [11, 11.051724137931034],
 [10, 13.440677966101696],
 [9, 5.5777777777777775],
 [8, 10.25],
 [7, 7.852941176470588],
 [6, 9.022727272727273],
 [5, 10.08695652173913],
 [4, 7.170212765957447],
 [3, 7.796296296296297],
 [2, 23.810344827586206],
 [1, 11.383333333333333],
 [0, 8.127272727272727]]

### Formatting hours properly.

In [30]:
# Example line: "At 15:00 the average comments per post is 38.59"

for row in sorted_avg_by_hour:
    horas = row[0]        # hour of the day as an integer (0-23)
    avg_comment = row[1]  # average comments per post for that hour

    template = "At {h}:00 the average comments per post is {c:.2f}".format(h = horas, c = avg_comment )
    
    print(template)


At 23:00 the average comments per post is 7.99
At 22:00 the average comments per post is 6.75
At 21:00 the average comments per post is 16.01
At 20:00 the average comments per post is 21.52
At 19:00 the average comments per post is 10.80
At 18:00 the average comments per post is 13.20
At 17:00 the average comments per post is 11.46
At 16:00 the average comments per post is 16.80
At 15:00 the average comments per post is 38.59
At 14:00 the average comments per post is 13.23
At 13:00 the average comments per post is 14.74
At 12:00 the average comments per post is 9.41
At 11:00 the average comments per post is 11.05
At 10:00 the average comments per post is 13.44
At 9:00 the average comments per post is 5.58
At 8:00 the average comments per post is 10.25
At 7:00 the average comments per post is 7.85
At 6:00 the average comments per post is 9.02
At 5:00 the average comments per post is 10.09
At 4:00 the average comments per post is 7.17
At 3:00 the average comments per post is 7.80
...

In [31]:
date_format = "%H" 

swap_avg_by_hour = []
#sorted_avg_by_hour
for row in sorted_avg_by_hour: #avg_by_hour
    puntos = row[1]
    b = str(row[0])
    objeto_datetime =  dt.datetime.strptime(b, date_format)
    hora = objeto_datetime.hour
    tupla = (puntos,hora)
    swap_avg_by_hour.append(tupla)

swap_avg_by_hour

[(7.985294117647059, 23),
 (6.746478873239437, 22),
 (16.009174311926607, 21),
 (21.525, 20),
 (10.8, 19),
 (13.20183486238532, 18),
 (11.46, 17),
 (16.796296296296298, 16),
 (38.5948275862069, 15),
 (13.233644859813085, 14),
 (14.741176470588234, 13),
 (9.41095890410959, 12),
 (11.051724137931034, 11),
 (13.440677966101696, 10),
 (5.5777777777777775, 9),
 (10.25, 8),
 (7.852941176470588, 7),
 (9.022727272727273, 6),
 (10.08695652173913, 5),
 (7.170212765957447, 4),
 (7.796296296296297, 3),
 (23.810344827586206, 2),
 (11.383333333333333, 1),
 (8.127272727272727, 0)]

In [32]:
sorted_swap = sorted(swap_avg_by_hour, reverse=True)
sorted_swap

[(38.5948275862069, 15),
 (23.810344827586206, 2),
 (21.525, 20),
 (16.796296296296298, 16),
 (16.009174311926607, 21),
 (14.741176470588234, 13),
 (13.440677966101696, 10),
 (13.233644859813085, 14),
 (13.20183486238532, 18),
 (11.46, 17),
 (11.383333333333333, 1),
 (11.051724137931034, 11),
 (10.8, 19),
 (10.25, 8),
 (10.08695652173913, 5),
 (9.41095890410959, 12),
 (9.022727272727273, 6),
 (8.127272727272727, 0),
 (7.985294117647059, 23),
 (7.852941176470588, 7),
 (7.796296296296297, 3),
 (7.170212765957447, 4),
 (6.746478873239437, 22),
 (5.5777777777777775, 9)]

In [33]:
date_format = "%H"

#15:00: 38.59 average comments per post.

for row in sorted_swap[0:4]:
    horas = row[1]
    avg_comment = row[0]
    
    plantilla = "At {h}:00 hours the average comments per post is {c:.2f}".format(h = horas, c = avg_comment )
    
    print(plantilla)

At 15:00 hours the average comments per post is 38.59
At 2:00 hours the average comments per post is 23.81
At 20:00 hours the average comments per post is 21.52
At 16:00 hours the average comments per post is 16.80
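Swapping the columns works because tuples are compared element by element, starting with the first one. An alternative that avoids the swap is to pass a `key` function to `sorted`. The following is a minimal sketch using four of the `[hour, average]` pairs listed above:

```python
# Four [hour, average] pairs copied from the avg_by_hour output above
avg_sample = [
    [9, 5.5777777777777775],
    [15, 38.5948275862069],
    [2, 23.810344827586206],
    [20, 21.525],
]

# Sort by the second element (the average), highest first
by_average = sorted(avg_sample, key=lambda pair: pair[1], reverse=True)

for hour, avg in by_average[:3]:
    print("At {h}:00 hours the average comments per post is {c:.2f}".format(h=hour, c=avg))
```

With a `key` function the original `[hour, average]` layout is kept, so no second list is needed.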


**Header looks like:**

    ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

## A Step Further

### Who are the most active authors in `ask_posts` on the forum?

Within a community there is always a group of people who have written a larger number of posts. 

Let's see who has written the most `Ask HN` posts, and who has written the most `Show HN` posts.

In [34]:
def most_relevant_authors(kind_of_post):
    author = {}
    sort_dict = []
    
    for row in kind_of_post:
        who = row[5]
        
        if who in author:
            author[who] +=1
        else:
            author[who] = 1
            
    for (key,value) in author.items():
        sort_dict.append((value, key))
    
    return sorted(sort_dict, reverse = True)
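The standard library's `collections.Counter` offers a shorter way to do the same counting. This is an alternative sketch, not the function used in the rest of this notebook: `count_posts_by_author` is a hypothetical name, the sample rows are made up for illustration, and it returns `(author, count)` pairs, with ties kept in first-seen order rather than sorted by name.

```python
from collections import Counter


def count_posts_by_author(posts):
    """Return (author, number_of_posts) pairs, most prolific author first."""
    # index 5 holds the author's username
    return Counter(row[5] for row in posts).most_common()


# Made-up rows with the same column layout as the dataset
sample_posts = [
    ['1', 'Ask HN: First question?', '', '1', '0', 'alice', '1/1/2016 0:00'],
    ['2', 'Ask HN: Second question?', '', '1', '0', 'bob', '1/2/2016 0:00'],
    ['3', 'Ask HN: Third question?', '', '1', '0', 'alice', '1/3/2016 0:00'],
]

print(count_posts_by_author(sample_posts))  # [('alice', 2), ('bob', 1)]
```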

### Most relevant authors in `Ask_posts` by number of times posted on the forum.

In [35]:
most_relevant_authors(ask_posts)

[(16, 'hoodoof'),
 (14, 'tmaly'),
 (9, 'whoishiring'),
 (7, 'prmph'),
 (7, 'hanniabu'),
 (6, 'tixocloud'),
 (5, 'vijayr'),
 (5, 'soulbadguy'),
 (5, 'sharemywin'),
 (5, 'rayalez'),
 (5, 'chirau'),
 (5, 'a_lifters_life'),
 (4, 'zuck9'),
 (4, 'wkoszek'),
 (4, 'selmat'),
 (4, 'probinso'),
 (4, 'nullundefined'),
 (4, 'neilsharma'),
 (4, 'max_'),
 (4, 'kevindeasis'),
 (4, 'holaboyperu'),
 (4, 'forgottenacc56'),
 (4, 'curiousgal'),
 (4, 'codegeek'),
 (4, 'cdvonstinkpot'),
 (4, 'basicscholar'),
 (4, 'baccheion'),
 (4, 'J-dawg'),
 (3, 'yeukhon'),
 (3, 'vinnyglennon'),
 (3, 'tuyguntn'),
 (3, 'tsaprailis'),
 (3, 'tonym9428'),
 (3, 'thrwawy20160421'),
 (3, 'thirstysusrando'),
 (3, 'techaddict009'),
 (3, 'simonebrunozzi'),
 (3, 'shubhamjain'),
 (3, 'shade23'),
 (3, 'sanosuke'),
 (3, 'philippnagel'),
 (3, 'networked'),
 (3, 'marktangotango'),
 (3, 'mangeletti'),
 (3, 'karimdag'),
 (3, 'jklein11'),
 (3, 'jason_slack'),
 (3, 'jacquesm'),
 (3, 'hellofunk'),
 (3, 'fratlas'),
 (3, 'ffggvv'),
 (3, 'enitih

### Most relevant authors in `Show_posts`

In [36]:
most_relevant_authors(show_posts)

[(4, 'vipul4vb'),
 (4, 'soheil'),
 (4, 'max0563'),
 (4, 'iisbum'),
 (4, 'emeth'),
 (4, 'chinchang'),
 (4, 'alexellisuk'),
 (3, 'stockkid'),
 (3, 'mojoe'),
 (3, 'gk_brown'),
 (3, 'fiatjaf'),
 (3, 'awwstn'),
 (2, 'zaytoun'),
 (2, 'vishaldpatel'),
 (2, 'viebel'),
 (2, 'traviswingo'),
 (2, 'tinjam'),
 (2, 'tgoldberg'),
 (2, 'syrusakbary'),
 (2, 'sunnynagra'),
 (2, 'stagename'),
 (2, 'solusipse'),
 (2, 'snehesht'),
 (2, 'shaharsol'),
 (2, 'ruiramos'),
 (2, 'rooviz'),
 (2, 'robot'),
 (2, 'rezashirazian'),
 (2, 'rayalez'),
 (2, 'qrv3w'),
 (2, 'powturbo'),
 (2, 'pezza3434'),
 (2, 'natsu90'),
 (2, 'mrzool'),
 (2, 'morninj'),
 (2, 'maxpert'),
 (2, 'marco1'),
 (2, 'm52go'),
 (2, 'licobo'),
 (2, 'laarc'),
 (2, 'ksowocki'),
 (2, 'knutmartin'),
 (2, 'keithwhor'),
 (2, 'jointhebox'),
 (2, 'jjets718'),
 (2, 'jcuga'),
 (2, 'impostervt'),
 (2, 'imakesnowflakes'),
 (2, 'harrisreynolds'),
 (2, 'gioscarab'),
 (2, 'fredrivett'),
 (2, 'fibo'),
 (2, 'fatiherikli'),
 (2, 'eon01'),
 (2, 'ejcx'),
 (2, 'dkoplovic

To keep things short, we will look only at the top three authors in `ask_posts`; the gap between the first (16 posts) and the third (9 posts) is already 7 posts.

    - hoodoof
    - tmaly
    - whoishiring

We will do the same for `show_posts`, although here the counts are much closer: the top seven authors each have 4 posts, so the three listed below are simply the first three in the sorted output.

    - vipul4vb
    - soheil
    - max0563

The authors are ordered so that the first one is the one who has made the most posts.

Now that we know who writes the most, let's see what score each of their posts received, when it was written, and what its title is.

###  Number of posts by user.

In [37]:
# ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

def all_activity_post(name, posts):
    """Print the score, date and title of every post by `name`, then the post count."""
    datum_format = "%m/%d/%Y %H:%M"  # format of the 'created_at' column
    time_format = '%Y/%m/%d'         # format used for display
    n_posts = 0

    print("Score Date           Post ")
    for row in posts:
        if row[5] == name:           # ['author']
            n_posts += 1
            ask_date = dt.datetime.strptime(row[6], datum_format)  # 'created_at'
            date_object = ask_date.strftime(time_format)

            post_published = "{points} -- {date}  {tittle},".format(tittle = row[1],
                                                                   date = date_object, points = row[3] )
            print(post_published)
    
    total_number_post = "Total number of posts = {n_post}".format(n_post = n_posts)
    print("\n")
    print(total_number_post)

### What is the most significant activity by user in `ask_posts`?

- What has been each user's **activity** in the forum?

- What **score** has each post they created received?

- On what **date** was each post created?

- What is the title of each post?

- What is the total **number of posts** by user?

In [38]:
all_activity_post('hoodoof', ask_posts)

Score Date           Post 
3 -- 2016/07/10  Ask HN: Imagine it's 1993  what would you put in an MVP web browser?,
3 -- 2016/01/11  Ask HN: Can anyone suggest a good RSS newsreader with a set of tech news feeds?,
3 -- 2016/02/26  Ask HN: Someone is stealing things from my car. What security camera would help?,
2 -- 2016/09/04  Ask HN: Why should open source support be free? I don't think it should.,
9 -- 2016/04/12  Ask HN: What's it like working at a cannabis tech startup company?,
2 -- 2016/04/08  Ask HN: What is the most money a bootstrapped, one-person company has sold for?,
1 -- 2016/05/20  Ask HN: Are you building something?  How long for? How much longer to go?,
4 -- 2016/04/07  Ask HN: Why is it still not possible to search an S3 bucket?,
1 -- 2016/05/04  Ask HN: Why does Etsy have so many items titled DO Not PURCHASE?,
11 -- 2016/06/03  Ask HN: Is there an up-to-date global index of conferences?,
2 -- 2016/04/23  Ask HN: What would TV cop drama be about if drugs were legal?,
4 

In [39]:
all_activity_post('tmaly', ask_posts)

Score Date           Post 
12 -- 2016/03/21  Ask HN: How do you find unused CSS?,
3 -- 2015/09/23  Ask HN: Laptop bag for 15 Macbook Pro for inclement weather,
1 -- 2016/06/22  Ask HN: Operational complexity of micro services?,
8 -- 2016/01/12  Ask HN: Online learning,
2 -- 2015/10/20  Ask HN: Wireless HDMI for Macbook to TV,
3 -- 2015/12/28  Ask HN: Arduino or Raspberry PI for teenager?,
2 -- 2016/01/15  Ask HN: Website Obesity Crisis,
1 -- 2015/10/26  Ask HN: Algorithm for Scheduling appointments by location,
3 -- 2016/08/27  Ask HN: Accuracy of ip to geo location?,
2 -- 2015/10/30  Ask HN: Feedback on design?,
2 -- 2016/05/17  Ask HN: What is your favorite podcast episode?,
2 -- 2016/03/29  Ask HN: Best use case writeups of SOA?,
1 -- 2016/06/06  Ask HN: Collaborative RSS?,
2 -- 2016/02/14  Ask HN: Weather site negative temps,


Total number of post = 14


In [40]:
all_activity_post('tixocloud', ask_posts)

Score Date           Post 
5 -- 2016/01/07  Ask HN: Anyone interested in starting an IT consultancy?,
6 -- 2015/11/09  Ask HN: How do I start an analytics consulting company?,
1 -- 2016/02/02  Ask HN: Anyone from Malaysia on HN?,
1 -- 2016/02/03  Ask HN: How do you find early adopters?,
8 -- 2016/06/08  Ask HN: Would you use an app to keep track of all your relationships?,
1 -- 2016/06/15  Ask HN: Protecting database information?,


Total number of post = 6


### What is the most significant activity by user in `show_posts`?

In [41]:
all_activity_post('vipul4vb', show_posts)

Score Date           Post 
2 -- 2016/04/04  Show HN: Quantitative user research reveals useful UX observation on LinkedIn,
1 -- 2016/06/06  Show HN: Import Balsamiq Mockups into CanvasFlip using this simple interface,
1 -- 2016/04/25  Show HN: Donald Trump V/s Hillary Clinton Better On-Site UX?,
2 -- 2016/04/06  Show HN: UX Insights on largest e-commerce app  Amazon,


Total number of post = 4


In [42]:
all_activity_post('soheil', show_posts)

Score Date           Post 
1 -- 2016/03/28  Show HN: Get hired as a team: Work with people you know,
3 -- 2016/02/04  Show HN: Demo: most accurate speech recognition,
7 -- 2016/03/06  Show HN: Lightweight Twitter for Mac client,
2 -- 2016/08/22  Show HN: 1) Build Team 2) Interview 3) Offer,


Total number of post = 4


In [43]:
all_activity_post('brakmic', show_posts) 

Score Date           Post 
21 -- 2016/02/27  Show HN: PouchDB Bindings for PureScript,
3 -- 2016/01/27  Show HN: JSON.stringify (without circular deps) for AngularJS 1.x,


Total number of post = 2


### Which post received the most votes, when was it published and what was its title?

Not all posts have the same value for the community, and this is reflected in the votes they receive. It is therefore interesting to find out which of our users' posts received the most votes, and so which topics mattered most to the community.

In [44]:
# Column order: ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
import datetime as dt

def author_most_valuable_post(name, posts):
    """Print the posts by `name` that set a new top score, highest score first.

    A post is recorded only if its score beats every earlier post by the same
    author, so the first line printed is the author's highest-scoring post.
    """
    input_format = "%m/%d/%Y %H:%M"   # format of 'created_at' in the dataset
    output_format = "%Y/%m/%d"        # format used when printing
    best_score = 0
    records = []

    print("Score Date           Post ")
    for row in posts:
        if row[5] == name and int(row[3]) > best_score:   # 'author', 'num_points'
            best_score = int(row[3])
            created = dt.datetime.strptime(row[6], input_format)  # 'created_at'
            records.append([best_score, created.strftime(output_format), row[1]])

    # Records were added in increasing score order, so reverse them for printing.
    for record in reversed(records):
        print(record)
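Note that the loop above keeps a post only when it beats every score seen before it, so the lines printed after the first one depend on the order of the rows. As a possible alternative, the sketch below sorts an author's posts by score and returns the top `n`. The helper name `top_posts_by_author` and the sample rows are hypothetical, and the column order is assumed to match the comment at the top of the cell.

```python
import datetime as dt

def top_posts_by_author(name, posts, n=3):
    """Return up to `n` [score, date, title] lists for `name`, highest score first."""
    input_format = "%m/%d/%Y %H:%M"   # assumed format of 'created_at'
    records = []
    for row in posts:
        if row[5] == name:            # 'author'
            created = dt.datetime.strptime(row[6], input_format)
            records.append([int(row[3]), created.strftime("%Y/%m/%d"), row[1]])
    # Sort by score only, highest first; ties keep their original order.
    records.sort(key=lambda record: record[0], reverse=True)
    return records[:n]

# Hypothetical rows in the dataset's column order
sample_posts = [
    ["1", "Ask HN: first", "", "3", "0", "alice", "7/10/2016 9:30"],
    ["2", "Ask HN: second", "", "11", "4", "alice", "6/3/2016 14:05"],
    ["3", "Ask HN: third", "", "5", "1", "alice", "4/12/2016 8:00"],
    ["4", "Ask HN: other", "", "40", "9", "bob", "1/1/2016 0:00"],
]

for record in top_posts_by_author("alice", sample_posts, n=2):
    print(record)
```

Returning the records instead of printing them also makes the result easier to reuse, for example when filling in a summary table.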

### `Ask_post`

In [45]:
author_most_valuable_post('hoodoof', ask_posts)

Score Date           Post 
[11, '2016/06/03', 'Ask HN: Is there an up-to-date global index of conferences?']
[9, '2016/04/12', "Ask HN: What's it like working at a cannabis tech startup company?"]
[3, '2016/07/10', "Ask HN: Imagine it's 1993  what would you put in an MVP web browser?"]


In [46]:
author_most_valuable_post('tmaly', ask_posts)

Score Date           Post 
[12, '2016/03/21', 'Ask HN: How do you find unused CSS?']


In [47]:
author_most_valuable_post('tixocloud', ask_posts)

Score Date           Post 
[8, '2016/06/08', 'Ask HN: Would you use an app to keep track of all your relationships?']
[6, '2015/11/09', 'Ask HN: How do I start an analytics consulting company?']
[5, '2016/01/07', 'Ask HN: Anyone interested in starting an IT consultancy?']


### `Show_post`

In [48]:
author_most_valuable_post('vipul4vb', show_posts)

Score Date           Post 
[2, '2016/04/04', 'Show HN: Quantitative user research reveals useful UX observation on LinkedIn']


In [49]:
author_most_valuable_post('soheil', show_posts)

Score Date           Post 
[7, '2016/03/06', 'Show HN: Lightweight Twitter for Mac client']
[3, '2016/02/04', 'Show HN: Demo: most accurate speech recognition']
[1, '2016/03/28', 'Show HN: Get hired as a team: Work with people you know']


In [50]:
author_most_valuable_post('brakmic', show_posts)

Score Date           Post 
[21, '2016/02/27', 'Show HN: PouchDB Bindings for PureScript']



Ranking of authors by post type, with their number of posts and the title, score and date of their most valuable post:

| Post type | Ranking | Author | # of posts | Title of most valuable post | Score | Date |
|:---|:---|:---|:---|:---|:--:|:--:|
| Ask post | 1 | hoodoof | 16 | Ask HN: Is there an up-to-date global index of conferences? | 11 | 2016/06/03 |
| Ask post | 2 | tmaly | 14 | Ask HN: How do you find unused CSS? | 12 | 2016/03/21 |
| Ask post | 3 | tixocloud | 6 | Ask HN: Would you use an app to keep track of all your relationships? | 8 | 2016/06/08 |
| Show post | 1 | vipul4vb | 4 | Show HN: Quantitative user research reveals useful UX observation on LinkedIn | 2 | 2016/04/04 |
| Show post | 2 | soheil | 4 | Show HN: Lightweight Twitter for Mac client | 7 | 2016/03/06 |
| Show post | 3 | brakmic | 2 | Show HN: PouchDB Bindings for PureScript | 21 | 2016/02/27 |

## Some conclusions.

The users are separated into those who asked questions in `ask_posts` and those who shared something in `show_posts`.

Strikingly, among these top authors, writing more posts does not go hand in hand with a better score: the score an author receives does not correlate with the number of posts they have written. This is clearest in `ask_posts`, and it roughly holds in `show_posts` as well.
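To put a number on this observation, one option is to compute a Pearson correlation between the number of posts and the best score of the six authors in the ranking table. This is only a sketch: the `pearson` helper is hypothetical, and six data points are far too few for a reliable estimate.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# (number of posts, best score) taken from the ranking table above
authors = {
    "hoodoof": (16, 11),
    "tmaly": (14, 12),
    "tixocloud": (6, 8),
    "vipul4vb": (4, 2),
    "soheil": (4, 7),
    "brakmic": (2, 21),
}

post_counts = [count for count, _ in authors.values()]
best_scores = [score for _, score in authors.values()]

r = pearson(post_counts, best_scores)
print("Pearson r = {:.2f}".format(r))
```

A value of `r` close to 0 would be in line with the observation that more posts do not mean a better score, while a value near 1 or -1 would contradict it.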

We see that users write more questions than show posts; however, the scores received by those who wrote show posts are higher than the scores received by those who asked questions.

This suggests that the community values posts that share useful information, as well as the kinds of questions that help its members.