# How to Hack Hacker News

<img src="https://s3.amazonaws.com/codecademy-content/courses/sql-intensive/hackernews.gif">

<a href="https://news.ycombinator.com">Hacker News</a> is a popular website run by Y Combinator. It’s widely known in the tech industry as a community site for sharing news, showing off projects, and asking questions, among other things.

In this project, you will be working with a table named `hacker_news` that contains stories from Hacker News since its launch in 2007. It has the following columns:

- `title`: the title of the story
- `user`: the user who submitted the story
- `score`: the score of the story
- `timestamp`: the time the story was posted
- `url`: the link of the story

This data was kindly made publicly available under the <a href="https://opensource.org/licenses/MIT/">MIT license</a>.
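To experiment without the real dataset, the following minimal sketch builds an in-memory stand-in for `hacker_news` with Python's built-in `sqlite3` module. The column names come from the list above. The column types (`TEXT`/`INTEGER`) and the two sample rows are assumptions made only for illustration.

```python
import sqlite3

# Hypothetical stand-in for the hacker_news table: the column names match the
# list above, but the types and sample rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE hacker_news (
           title     TEXT,
           user      TEXT,
           score     INTEGER,
           timestamp TEXT,
           url       TEXT
       )"""
)
conn.executemany(
    "INSERT INTO hacker_news VALUES (?, ?, ?, ?, ?)",
    [
        ("Show HN: My project", "alice", 120, "2018-05-01 09:15:00", "https://github.com/alice/proj"),
        ("A long read", "bob", 45, "2018-05-02 14:40:00", "https://medium.com/@bob/post"),
    ],
)
rows = conn.execute("SELECT title, user, score FROM hacker_news").fetchall()
print(rows)
```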

## Pre-Gaming for Aggregates

```sql
-- Start by getting a feel for the hacker_news table!
-- Let’s find the most popular Hacker News stories
SELECT title, score
FROM hacker_news
ORDER BY score DESC
LIMIT 5;
```
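`ORDER BY score DESC` sorts from highest to lowest score, and `LIMIT 5` keeps only the first five rows of that ordering. The sketch below runs the same query through Python's `sqlite3` module against a made-up six-row table (the titles and scores are hypothetical) so the effect is easy to check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hacker_news (title TEXT, score INTEGER)")
# Hypothetical rows; only the five highest scores should survive LIMIT 5.
conn.executemany(
    "INSERT INTO hacker_news VALUES (?, ?)",
    [("a", 10), ("b", 300), ("c", 55), ("d", 7), ("e", 120), ("f", 90)],
)
top = conn.execute(
    "SELECT title, score FROM hacker_news ORDER BY score DESC LIMIT 5"
).fetchall()
print(top)
```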

## Hacker News Moderating

Recent studies have found that online forums tend to be dominated by a small percentage of their users (<a href="https://en.wikipedia.org/wiki/1%25_rule_(Internet_culture)">1-9-90 Rule</a>).

Is this true of Hacker News?

Is a small percentage of Hacker News submitters taking the majority of the points?

First, find the total score of all the stories.

```sql
SELECT SUM(score) AS total_scores
FROM hacker_news;
```
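Two SQLite details affect this total. `SUM()` skips `NULL` scores, and it returns `NULL` (not 0) when there are no non-`NULL` values at all. The sketch below demonstrates both using Python's `sqlite3` module and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hacker_news (user TEXT, score INTEGER)")
# Hypothetical rows; carol's story has a NULL score, which SUM() ignores.
conn.executemany(
    "INSERT INTO hacker_news VALUES (?, ?)",
    [("alice", 100), ("bob", 50), ("carol", None)],
)
(total,) = conn.execute("SELECT SUM(score) AS total_scores FROM hacker_news").fetchone()
print(total)

# On a table with no rows, SUM() yields NULL (None in Python).
conn.execute("CREATE TABLE empty_news (score INTEGER)")
empty = conn.execute("SELECT SUM(score) FROM empty_news").fetchone()
print(empty)
```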

Next, we need to pinpoint the users who have accumulated a lot of points across their stories.

Find the users whose stories have a combined score of more than 200, along with those combined scores.

```sql
SELECT user, SUM(score) AS combined_scores
FROM hacker_news
GROUP BY 1
HAVING combined_scores > 200
ORDER BY 2 DESC;
```
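`HAVING` filters whole groups after `GROUP BY` has aggregated them, which is why it can test `SUM(score)` where a `WHERE` clause could not. The sketch below uses Python's `sqlite3` module with hypothetical users and scores: alice's two stories add up to 240, bob's to 190, and carol's single story scores 250.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hacker_news (user TEXT, score INTEGER)")
# Made-up rows: alice 150 + 90 = 240, bob 180 + 10 = 190, carol 250.
conn.executemany(
    "INSERT INTO hacker_news VALUES (?, ?)",
    [("alice", 150), ("alice", 90), ("bob", 180), ("carol", 250), ("bob", 10)],
)
# HAVING runs after aggregation, so only groups whose total exceeds 200 remain.
power_users = conn.execute(
    """SELECT user, SUM(score) AS combined_scores
       FROM hacker_news
       GROUP BY user
       HAVING combined_scores > 200
       ORDER BY combined_scores DESC"""
).fetchall()
print(power_users)
```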


Then, add these users’ combined scores together and divide by the total score of all stories to get the percentage of points held by these power users.

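```sql
-- Share of all points held by users whose combined score exceeds 200.
-- The total comes from a subquery instead of a hard-coded number, and
-- multiplying by 100.0 forces floating-point division and yields a percentage.
SELECT 100.0 * SUM(combined_scores) / (SELECT SUM(score) FROM hacker_news) AS percent_score
FROM (SELECT user, SUM(score) AS combined_scores
      FROM hacker_news
      GROUP BY 1
      HAVING combined_scores > 200);
```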
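In SQLite, dividing one integer by another truncates the result, so a ratio of two integer sums below 1 comes out as 0. Introducing a real number (such as `100.0` or the hard-coded `6366.0`) switches the whole expression to floating-point arithmetic. The sketch below shows both outcomes with Python's `sqlite3` module and hypothetical scores: the power users hold 490 of 700 points.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hacker_news (user TEXT, score INTEGER)")
# Made-up data: total 700 points, alice (240) and carol (250) are above 200.
conn.executemany(
    "INSERT INTO hacker_news VALUES (?, ?)",
    [("alice", 240), ("carol", 250), ("bob", 190), ("dave", 20)],
)
inner = """SELECT user, SUM(score) AS combined_scores
           FROM hacker_news GROUP BY user HAVING combined_scores > 200"""

# Integer / integer: 490 / 700 truncates to 0.
truncated = conn.execute(
    f"SELECT SUM(combined_scores) / (SELECT SUM(score) FROM hacker_news) FROM ({inner})"
).fetchone()[0]

# Multiplying by 100.0 first gives real arithmetic and a percentage.
percent = conn.execute(
    f"SELECT 100.0 * SUM(combined_scores) / (SELECT SUM(score) FROM hacker_news) FROM ({inner})"
).fetchone()[0]
print(truncated, percent)
```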

Oh no! While we were looking at the power users, we noticed that some users are rickrolling: tricking readers into clicking a link to a funny video by claiming that it links to information about coding.

The URL of the video is:

https://www.youtube.com/watch?v=dQw4w9WgXcQ

How many times has each offending user posted this link?

```sql
SELECT user, COUNT(*) AS number_of_offending_links
FROM hacker_news
WHERE url LIKE '%youtube.com/watch?v=dQw4w9WgXcQ%'
GROUP BY 1
ORDER BY 2 DESC;
```
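The surrounding `%` wildcards let the pattern match the video link anywhere in the URL, for example with or without `www.` or with extra query parameters. By default, SQLite's `LIKE` is case-insensitive for ASCII letters, while `GLOB` is case-sensitive. That difference matters because YouTube video IDs are case-sensitive. The sketch below, using Python's `sqlite3` module and made-up users and URLs, shows the count per user and the case behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hacker_news (user TEXT, url TEXT)")
# Hypothetical submissions; alice's link points to a different video.
conn.executemany(
    "INSERT INTO hacker_news VALUES (?, ?)",
    [
        ("eve", "https://www.youtube.com/watch?v=dQw4w9WgXcQ"),
        ("eve", "https://youtube.com/watch?v=dQw4w9WgXcQ&t=1s"),
        ("mallory", "https://www.youtube.com/watch?v=dQw4w9WgXcQ"),
        ("alice", "https://www.youtube.com/watch?v=abc123"),
    ],
)
offenders = conn.execute(
    """SELECT user, COUNT(*) AS number_of_offending_links
       FROM hacker_news
       WHERE url LIKE '%youtube.com/watch?v=dQw4w9WgXcQ%'
       GROUP BY user
       ORDER BY number_of_offending_links DESC"""
).fetchall()
print(offenders)

# LIKE ignores ASCII case by default; GLOB does not.
case_insensitive = conn.execute("SELECT 'DQW4W9WGXCQ' LIKE 'dQw4w9WgXcQ'").fetchone()[0]
case_sensitive = conn.execute("SELECT 'DQW4W9WGXCQ' GLOB 'dQw4w9WgXcQ'").fetchone()[0]
print(case_insensitive, case_sensitive)
```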

## Which sites feed Hacker News?


Hacker News stories are essentially links that take users to other websites.

Which of these sites feed Hacker News the most:

*GitHub, Medium, or New York Times?*

First, we want to categorize each story based on its source.

We can do this using a CASE expression:

```sql
SELECT CASE
    WHEN url LIKE '%github.com%' THEN 'GitHub'
    WHEN url LIKE '%medium.com%' THEN 'Medium'
    WHEN url LIKE '%nytimes.com%' THEN 'New York Times'
    ELSE 'Other'
    END AS source
FROM hacker_news;
```
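`CASE` checks its `WHEN` branches in order and returns the first one that matches. A Medium URL that happens to contain `github.com` is therefore labeled GitHub. A `NULL` URL makes every `LIKE` test `NULL`, so it falls through to `ELSE`. The sketch below demonstrates both cases with Python's `sqlite3` module and hypothetical URLs, ordering by `rowid` so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hacker_news (url TEXT)")
# Made-up URLs: the Medium one also contains "github.com", and one URL is NULL.
conn.executemany(
    "INSERT INTO hacker_news VALUES (?)",
    [
        ("https://github.com/torvalds/linux",),
        ("https://medium.com/@someone/why-i-left-github.com",),
        ("https://www.nytimes.com/2018/01/01/tech.html",),
        (None,),
    ],
)
cursor = conn.execute(
    """SELECT CASE
           WHEN url LIKE '%github.com%' THEN 'GitHub'
           WHEN url LIKE '%medium.com%' THEN 'Medium'
           WHEN url LIKE '%nytimes.com%' THEN 'New York Times'
           ELSE 'Other'
       END AS source
       FROM hacker_news
       ORDER BY rowid"""
)
sources = [row[0] for row in cursor]
print(sources)
```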

Next, group the stories by source and add a column that counts the stories from each source using COUNT().

```sql
SELECT CASE
    WHEN url LIKE '%github.com%' THEN 'GitHub'
    WHEN url LIKE '%medium.com%' THEN 'Medium'
    WHEN url LIKE '%nytimes.com%' THEN 'New York Times'
    ELSE 'Other'
    END AS source,
  COUNT(*) AS num_stories
FROM hacker_news
GROUP BY 1;
```
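`GROUP BY 1` groups on the first result column, which here is the `CASE` label, so `COUNT(*)` counts the rows in each bucket. Rows with a `NULL` URL are also counted, under Other. The following sketch runs the query through Python's `sqlite3` module against six made-up URLs and collects the result into a dictionary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hacker_news (url TEXT)")
# Hypothetical URLs, including one with no URL at all.
urls = [
    "https://github.com/a/b",
    "https://github.com/c/d",
    "https://medium.com/@x/y",
    "https://www.nytimes.com/z.html",
    "https://example.org/",
    None,
]
conn.executemany("INSERT INTO hacker_news VALUES (?)", [(u,) for u in urls])

counts = dict(conn.execute(
    """SELECT CASE
           WHEN url LIKE '%github.com%' THEN 'GitHub'
           WHEN url LIKE '%medium.com%' THEN 'Medium'
           WHEN url LIKE '%nytimes.com%' THEN 'New York Times'
           ELSE 'Other'
       END AS source,
       COUNT(*) AS num_stories
       FROM hacker_news
       GROUP BY 1"""
).fetchall())
print(counts)
```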

## What's the best time to post a story?

Every submitter wants their story to get a high score so that the story makes it to the front page, but…

What’s the best time of the day to post a story on Hacker News?

Before we get started, let’s run this query and take a look at the timestamp column:

```sql
SELECT timestamp
FROM hacker_news
LIMIT 10;
```

SQLite comes with a strftime() function, which returns a date or time formatted according to a format string.

It takes a format string followed by a time value (optionally followed by modifiers):

`strftime(format, column)`

Let’s test this function out:

```sql
-- Show each timestamp next to its hour (00-23)
SELECT timestamp,
   strftime('%H', timestamp)
FROM hacker_news
LIMIT 20;
```
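A few properties of `strftime()` are worth knowing before grouping by its output. `%H` returns a zero-padded hour as text, not a number. Other format codes such as `%Y-%m-%d` extract other parts of the date. An input that SQLite cannot parse as a time value produces `NULL`. The sketch below uses Python's `sqlite3` module with literal, made-up timestamps in ISO-8601 form, one of the formats SQLite's date functions accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical timestamps; both the space and the "T" separator are accepted.
rows = conn.execute(
    """SELECT strftime('%H', '2018-05-01 09:15:00'),
              strftime('%Y-%m-%d', '2018-05-01 09:15:00'),
              CAST(strftime('%H', '2018-05-01T21:05:00') AS INTEGER),
              strftime('%H', 'not a date')"""
).fetchone()
# Text hour, text date, integer hour after CAST, and NULL (None) for bad input.
print(rows)
```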

Okay, now we understand how strftime() works. Let’s write a query that returns three columns:

1. The hour of the timestamp
2. The average score for each hour
3. The count of stories for each hour

```sql
SELECT strftime('%H', timestamp) AS hours,
   AVG(score) AS avg_score,
   COUNT(*) AS count
FROM hacker_news
GROUP BY 1
ORDER BY 1;
```
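This query has two side effects that are easy to miss. `AVG()` always returns a floating-point value. Stories whose `timestamp` is `NULL` all land in one group whose hour is `NULL`, and that group sorts first under `ORDER BY 1` because SQLite treats `NULL` as smaller than any value. The sketch below reproduces this with Python's `sqlite3` module and four made-up stories.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hacker_news (score INTEGER, timestamp TEXT)")
# Hypothetical stories: two at 09:xx, one at 13:xx, one with no timestamp.
conn.executemany(
    "INSERT INTO hacker_news VALUES (?, ?)",
    [
        (10, "2018-05-01 09:15:00"),
        (20, "2018-05-02 09:45:00"),
        (5, "2018-05-01 13:00:00"),
        (7, None),
    ],
)
by_hour = conn.execute(
    """SELECT strftime('%H', timestamp) AS hours,
              AVG(score) AS avg_score,
              COUNT(*) AS count
       FROM hacker_news
       GROUP BY 1
       ORDER BY 1"""
).fetchall()
print(by_hour)
```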

```sql
-- The best hours to post a story on Hacker News, ranked by average score.
-- Filtering out missing timestamps removes the NULL "hour" group.
SELECT strftime('%H', timestamp) AS hours,
   ROUND(AVG(score), 2) AS avg_score,
   COUNT(*) AS count
FROM hacker_news
WHERE timestamp IS NOT NULL
GROUP BY 1
ORDER BY AVG(score) DESC;
```
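Stories without a timestamp form a single group whose hour is `NULL`. That group usually still has a non-`NULL` average score, so a filter such as `avg_score IS NOT NULL` leaves it in the ranking. Filtering with `WHERE timestamp IS NOT NULL` before grouping removes it. The sketch below compares the two filters using Python's `sqlite3` module and hypothetical data in which the untimestamped story has the highest score.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hacker_news (score INTEGER, timestamp TEXT)")
# Made-up stories; the last one has no timestamp but a high score.
conn.executemany(
    "INSERT INTO hacker_news VALUES (?, ?)",
    [
        (10, "2018-05-01 09:15:00"),
        (20, "2018-05-02 09:45:00"),
        (40, "2018-05-01 13:00:00"),
        (99, None),
    ],
)
# Filtering rows before grouping drops the NULL-hour bucket.
ranking = conn.execute(
    """SELECT strftime('%H', timestamp) AS hours,
              ROUND(AVG(score), 2) AS avg_score,
              COUNT(*) AS count
       FROM hacker_news
       WHERE timestamp IS NOT NULL
       GROUP BY 1
       ORDER BY AVG(score) DESC"""
).fetchall()

# Filtering on the average instead keeps the NULL-hour bucket, here at the top.
with_null = conn.execute(
    """SELECT hours
       FROM (SELECT strftime('%H', timestamp) AS hours, AVG(score) AS avg_score
             FROM hacker_news GROUP BY 1)
       WHERE avg_score IS NOT NULL
       ORDER BY avg_score DESC"""
).fetchall()
print(ranking, with_null)
```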