# Median Google Search Frequency

Google's marketing team is making a Super Bowl commercial & needs a simple statistic to put in their TV ad: the median number of searches a person made last year.

However, at Google scale, querying the 2 trillion searches is too costly. Luckily, you have access to a summary table which tells you each number of searches made last year & how many Google users fall into that bucket.

Write a query to report the median number of searches made by a user. Round the median to one decimal place.

# Answer

I'll be using the `search_frequency` table:

```
CREATE TABLE search_frequency (
	searches smallint,
	num_users smallint
);

INSERT INTO search_frequency
VALUES (1, 2),
	   (4, 1),
	   (2, 2),
	   (3, 3),
	   (6, 1),
	   (5, 3),
	   (7, 2);

SELECT * FROM search_frequency;
```

<img src = "search_frequency Table.png" width = "600" style = "margin:auto"/>


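Let's work this out manually first, so we can check whether our query is correct. Expanding each bucket by its number of users gives the array {1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 5, 6, 7, 7}. With 14 values, the median is the average of the 7th & 8th values (3 & 4), which is 3.5.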
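As a quick cross-check of the manual calculation outside the database, here is a minimal Python sketch. The rows are copied from the `INSERT` statement above; the `expand` helper is a hypothetical name used only for this illustration, not part of the SQL solution.

```python
from statistics import median

# (searches, num_users) rows copied from the search_frequency table above.
search_frequency = [(1, 2), (4, 1), (2, 2), (3, 3), (6, 1), (5, 3), (7, 2)]


def expand(freq_rows):
    """Repeat each value by its frequency and return the sorted list."""
    values = []
    for value, count in freq_rows:
        values.extend([value] * count)
    return sorted(values)


values = expand(search_frequency)
manual_median = median(values)

print(values)         # the expanded, sorted array
print(manual_median)  # the median of that array
```

Expanding the table like this is only practical for a toy example; it is exactly what the frequency-based query is meant to avoid at scale.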

```
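-- sum_freq: running total of users up to & including each row
-- total_freq: total number of users across all rows
SELECT *,
	   sum(num_users) OVER (ORDER BY searches)
		   AS sum_freq,
	   sum(num_users) OVER () AS total_freq
FROM search_frequency
ORDER BY searches;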
```

<img src = "Frequency Table.png" width = "600" style = "margin:auto"/>

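Now I just need some mathematical way to filter the result down to just searches = 3 & 4, so I can take their average. A row holds the median when its running total `sum_freq` covers at least half of `total_freq`, while the running total before that row (`sum_freq - num_users`) covers at most half.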

```
SELECT round(avg(searches), 1) AS median
FROM (
	SELECT *,
		   sum(num_users) OVER (ORDER BY searches)
			   AS sum_freq,
		   sum(num_users) OVER () AS total_freq
	FROM search_frequency
) AS freq  -- alias required by PostgreSQL before version 16
WHERE total_freq <= 2 * sum_freq
	AND total_freq >= 2 * (sum_freq - num_users);
```

<img src = "Median Search Frequency.png" width = "600" style = "margin:auto"/>

The median number of searches each person made last year was 3.5.
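To see why the two `WHERE` conditions pick out the middle value(s), the same logic can be sketched in Python without expanding the table. This is only an illustration under the assumption that the rows are `(value, count)` pairs; `median_from_frequencies` is a hypothetical helper name, not something from the query above.

```python
def median_from_frequencies(freq_rows):
    """Median of a frequency table, using running totals only."""
    rows = sorted(freq_rows)                 # like ORDER BY searches
    total = sum(count for _, count in rows)  # like sum(...) OVER ()
    running = 0
    picked = []
    for value, count in rows:
        running += count                     # like sum(...) OVER (ORDER BY searches)
        # Same two conditions as the WHERE clause:
        # the running total reaches at least half of the total, and
        # the running total before this row is at most half of the total.
        if total <= 2 * running and total >= 2 * (running - count):
            picked.append(value)
    # One value is picked for an odd middle, two when the middle
    # falls between two buckets; average them like avg() does.
    return round(sum(picked) / len(picked), 1)


# (searches, num_users) rows copied from the search_frequency table.
search_frequency = [(1, 2), (4, 1), (2, 2), (3, 3), (6, 1), (5, 3), (7, 2)]
print(median_from_frequencies(search_frequency))
```

For `search_frequency`, only the rows for 3 & 4 searches pass both conditions, matching the filter in the query.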

To make sure this works with other frequency tables, I'll create a dummy table to test on.

```
CREATE TABLE freq_table (
	number smallserial,
	freq smallint
);

INSERT INTO freq_table (freq)
VALUES (5),
	   (3),
	   (2),
	   (7),
	   (6),
	   (4),
	   (5),
	   (1);

SELECT * FROM freq_table;
```

<img src = "freq_table Table.png" width = "600" style = "margin:auto"/>

The array for this example is {1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 7, 8}. With 33 values, the median is the 17th value, which is 4.

```
SELECT round(avg(number), 1) AS median
FROM (
	SELECT *,
		   sum(freq) OVER (ORDER BY number)
			   AS sum_freq,
		   sum(freq) OVER () AS total_freq
	FROM freq_table
) AS freq_totals  -- alias required by PostgreSQL before version 16
WHERE total_freq <= 2 * sum_freq
	AND total_freq >= 2 * (sum_freq - freq);
```

<img src = "Median of freq_table Example.png" width = "600" style = "margin:auto"/>

Tremendous.