# Ordered Set Statistics Window Functions

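Ordered-set statistics are similar to regular window functions, but the ordering is given in a `WITHIN GROUP (ORDER BY ...)` clause and the grouping comes from the query's `GROUP BY` columns, that is, the selected columns that are not part of the ordering.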


In [1]:
%load_ext sql
%sql postgres://dsa_ro_user:readonly@pgsql.dsa.lan/dsa_ro

'Connected: dsa_ro_user@dsa_ro'

## Ordered-Set Aggregate Functions

All the aggregates listed below ignore `null` values in their sorted input.
For those that take a fraction parameter, the fraction value must be between 0 and 1;
otherwise an error is thrown.
However, a `null` fraction value simply produces a `null` result.


<table class="CALSTABLE" border="1">
<colgroup><col>
<col>
<col>
<col>
<col>

</colgroup><thead>
<tr>
<th>Function</th>

<th>Direct Argument Type(s)</th>

<th>Aggregated Argument Type(s)</th>

<th>Return Type</th>

<th>Description</th>
</tr>
</thead>

<tbody>
<tr>
<td><code class="FUNCTION">mode() WITHIN GROUP (ORDER BY
<tt class="REPLACEABLE c3">sort_expression</tt>)</code></td>

<td>&nbsp;</td>

<td>any sortable type</td>

<td>same as sort expression</td>

<td>returns the most frequent input value (arbitrarily
choosing the first one if there are multiple
equally-frequent results)</td>
</tr>

<tr>
<td><code class="FUNCTION">percentile_cont(<tt class="REPLACEABLE c3">fraction</tt>) WITHIN GROUP (ORDER BY
<tt class="REPLACEABLE c3">sort_expression</tt>)</code></td>

<td><tt class="TYPE">double precision</tt></td>

<td><tt class="TYPE">double precision</tt> or <tt class="TYPE">interval</tt></td>

<td>same as sort expression</td>

<td>continuous percentile: returns a value corresponding
to the specified fraction in the ordering, interpolating
between adjacent input items if needed</td>
</tr>

<tr>
<td><code class="FUNCTION">percentile_cont(<tt class="REPLACEABLE c3">fractions</tt>) WITHIN GROUP (ORDER BY
<tt class="REPLACEABLE c3">sort_expression</tt>)</code></td>

<td><tt class="TYPE">double precision[]</tt></td>

<td><tt class="TYPE">double precision</tt> or <tt class="TYPE">interval</tt></td>

<td>array of sort expression's type</td>

<td>multiple continuous percentile: returns an array of
results matching the shape of the <tt class="LITERAL">fractions</tt> parameter, with each non-null
element replaced by the value corresponding to that
percentile</td>
</tr>

<tr>
<td><code class="FUNCTION">percentile_disc(<tt class="REPLACEABLE c3">fraction</tt>) WITHIN GROUP (ORDER BY
<tt class="REPLACEABLE c3">sort_expression</tt>)</code></td>

<td><tt class="TYPE">double precision</tt></td>

<td>any sortable type</td>

<td>same as sort expression</td>

<td>discrete percentile: returns the first input value
whose position in the ordering equals or exceeds the
specified fraction</td>
</tr>

<tr>
<td><code class="FUNCTION">percentile_disc(<tt class="REPLACEABLE c3">fractions</tt>) WITHIN GROUP (ORDER BY
<tt class="REPLACEABLE c3">sort_expression</tt>)</code></td>

<td><tt class="TYPE">double precision[]</tt></td>

<td>any sortable type</td>

<td>array of sort expression's type</td>

<td>multiple discrete percentile: returns an array of
results matching the shape of the <tt class="LITERAL">fractions</tt> parameter, with each non-null
element replaced by the input value corresponding to that
percentile</td>
</tr>
</tbody>
</table>
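
The difference between the continuous and discrete percentiles in the table can be illustrated outside the database. The following is a minimal plain-Python sketch of the two definitions, not PostgreSQL's actual implementation. The helper names are hypothetical, `None` stands in for `null`, and returning `None` for an empty group is an assumption.

```python
import math


def _sorted_non_null(values):
    """Drop None (standing in for SQL null) and sort ascending."""
    return sorted(v for v in values if v is not None)


def _check_fraction(fraction):
    """Mirror the documented rule: the fraction must lie between 0 and 1."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")


def percentile_cont(fraction, values):
    """Continuous percentile: interpolate between adjacent sorted values."""
    if fraction is None:  # a null fraction yields a null result
        return None
    _check_fraction(fraction)
    data = _sorted_non_null(values)
    if not data:  # assumption: an empty group yields null
        return None
    position = fraction * (len(data) - 1)  # zero-based, may be fractional
    lower = math.floor(position)
    upper = math.ceil(position)
    return data[lower] + (data[upper] - data[lower]) * (position - lower)


def percentile_disc(fraction, values):
    """Discrete percentile: first value whose position reaches the fraction."""
    if fraction is None:  # a null fraction yields a null result
        return None
    _check_fraction(fraction)
    data = _sorted_non_null(values)
    if not data:  # assumption: an empty group yields null
        return None
    rank = max(1, math.ceil(fraction * len(data)))  # one-based rank
    return data[rank - 1]


sample = [1, 2, None, 3, 4]  # made-up values; the None is ignored
print(percentile_cont(0.5, sample))  # 2.5, halfway between 2 and 3
print(percentile_disc(0.5, sample))  # 2, an actual input value
```

The continuous form can return a value that never occurs in the input, while the discrete form always returns one of the input values, which is why it accepts any sortable type.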




Previously, we looked at `percent_rank()`, a regular window function:

```sql
SELECT grade
 , percent_rank() OVER (PARTITION BY grade ORDER BY price)
 , id, price
FROM houses
WHERE grade IN (1,3,4,13)
```

Now, let us look at the more advanced ordered-set functions, which are called with `WITHIN GROUP (ORDER BY ...)` and `GROUP BY` instead of `OVER (...)`.

In [2]:
%%sql
SELECT grade
  , mode() WITHIN GROUP (ORDER BY price)
  , AVG(price)
FROM houses
WHERE grade IN (1,3,4,13)
GROUP BY grade;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
4 rows affected.


grade,mode,avg
1,142000.0,142000.0
3,75000.0,205666.666666667
4,355000.0,214381.034482759
13,3800000.0,3709615.38461538
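
The `mode` values above follow the rule from the table: the most frequent value wins, and ties go to the first candidate in the ordering. As a rough illustration, here is a minimal Python sketch of that rule. `mode_within_group` is a hypothetical helper rather than a PostgreSQL or library function, `None` stands in for `null`, and the input values are made up.

```python
from itertools import groupby


def mode_within_group(values):
    """Most frequent non-null value; ties go to the first in sort order."""
    data = sorted(v for v in values if v is not None)
    best_value, best_count = None, 0
    # groupby yields each run of equal values in ascending order
    for value, run in groupby(data):
        count = sum(1 for _ in run)
        # strict comparison keeps the earliest value when counts tie
        if count > best_count:
            best_value, best_count = value, count
    return best_value


# 1 and 3 both appear twice; 1 sorts first, so it is returned
print(mode_within_group([3, 1, 3, 1, 2]))  # 1
```

With an ascending `ORDER BY`, this means a tie resolves to the smallest of the equally frequent values.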


These ordered set functions allow us to do slightly more advanced statistical analytics.

In [3]:
%%sql
SELECT grade
  , percentile_cont(0.25) WITHIN GROUP (ORDER BY price) as first_quartile
  , percentile_cont(0.5) WITHIN GROUP (ORDER BY price) as median
  , percentile_cont(0.75) WITHIN GROUP (ORDER BY price) as third_quartile
  , AVG(price)
FROM houses
WHERE grade IN (1,3,4,13)
GROUP BY grade;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
4 rows affected.


grade,first_quartile,median,third_quartile,avg
1,142000.0,142000.0,142000.0,142000.0
3,168500.0,262000.0,271000.0,205666.666666667
4,145000.0,205000.0,265000.0,214381.034482759
13,2415000.0,2983000.0,3800000.0,3709615.38461538
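
The three quartiles above use three separate calls. The array form from the table, for example `percentile_cont(ARRAY[0.25, 0.5, 0.75]) WITHIN GROUP (ORDER BY price)`, returns all of them in one array. The Python sketch below illustrates the "same shape as `fractions`" behaviour under some assumptions: `percentile_cont_multi` is a hypothetical helper, `None` stands in for `null`, an empty group is assumed to yield `None`, and the sample values are made up rather than taken from `houses`.

```python
import math


def percentile_cont_multi(fractions, values):
    """Return one interpolated percentile per entry in fractions."""
    data = sorted(v for v in values if v is not None)
    if not data:  # assumption: an empty group yields null
        return None

    def one(fraction):
        if fraction is None:  # null elements stay null in the result
            return None
        if not 0 <= fraction <= 1:
            raise ValueError("fraction must be between 0 and 1")
        position = fraction * (len(data) - 1)  # zero-based position
        lower, upper = math.floor(position), math.ceil(position)
        return data[lower] + (data[upper] - data[lower]) * (position - lower)

    return [one(f) for f in fractions]


sample = [100, 200, 300, 400, 500]  # made-up values
print(percentile_cont_multi([0.25, 0.5, 0.75], sample))  # [200.0, 300.0, 400.0]
```

Each result sits at the same index as the fraction that produced it, so a `null` entry in `fractions` leaves a `null` at that position.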


## <span style="background:yellow">Your Turn</span>

Find the average, median, and mode of the prices in grades 2, 5, and 6.



In [4]:
%%sql
SELECT  avg(price)
        ,percentile_cont(0.5) WITHIN GROUP (ORDER BY price) as median
        ,mode() WITHIN GROUP (ORDER BY price) as mode
FROM    houses
WHERE   grade IN (2, 5, 6)
GROUP BY grade;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
2 rows affected.


avg,median,mode
248523.97107438,228700.0,275000.0
301919.637389598,275276.5,250000.0


Find the first and third quartiles for the number of bedrooms in grades 4 and 5.

In [5]:
%%sql
SELECT  percentile_cont(0.25) WITHIN GROUP (ORDER BY bedrooms) as first_quartile
        ,percentile_cont(0.75) WITHIN GROUP (ORDER BY bedrooms) as third_quartile
FROM    houses
WHERE   grade IN (4, 5);

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
1 rows affected.


first_quartile,third_quartile
2.0,3.0


# Save your notebook, then `File > Close and Halt`