Permalink
Browse files

add tutorial on using cross-tab in postgresql

  • Loading branch information...
1 parent 949a626 commit 4bbbdee8a93cc272f0e010a0bd4d1d4f3d53a89c @alistairewj alistairewj committed May 25, 2017
Showing with 99 additions and 0 deletions.
  1. +99 −0 tutorials/sql-crosstab.md
View
@@ -0,0 +1,99 @@
+# Cross Tabulation
+
+Cross tabulation, shortened to `crosstab`, is a very common operation when working with data.
+The goal is to count the number of observations stratified by two groups.
+
+PostgreSQL is packaged with an extension which allows cross tabulation.
+First, enable the extension for the database (requires PostgreSQL v9.1 or higher):
+
+```sql
+CREATE EXTENSION tablefunc;
+```
+
+You only need to run this operation *once* per database.
+
+Let's try cross tabulating the admission and discharge locations for all hospital admissions.
+
+First, we need to find out the unique values for the two columns:
+
+```sql
+select distinct admission_location from admissions order by admission_location;
+```
+
+... returns:
+
+```
+ admission_location
+---------------------------
+ CLINIC REFERRAL/PREMATURE
+ EMERGENCY ROOM ADMIT
+ HMO REFERRAL/SICK
+ ** INFO NOT AVAILABLE **
+ PHYS REFERRAL/NORMAL DELI
+ TRANSFER FROM HOSP/EXTRAM
+ TRANSFER FROM OTHER HEALT
+ TRANSFER FROM SKILLED NUR
+ TRSF WITHIN THIS FACILITY
+(9 rows)
+```
+
+
+```sql
+select distinct admission_type from admissions order by admission_type;
+```
+
+... returns:
+
+```
+ admission_type
+----------------
+ ELECTIVE
+ EMERGENCY
+ NEWBORN
+ URGENT
+(4 rows)
+```
+
+Now, with knowledge of the unique values, we can cross tabulate data:
+
+```sql
+SELECT *
+FROM crosstab(
+ 'SELECT admission_location, admission_type, count(*) as ct
+ FROM admissions
+ GROUP BY admission_location, admission_type
+ ORDER BY 1,2'
+ -- below, we list all the unique values in admission_type
+ -- these will become the columns
+ ,'SELECT DISTINCT admission_type FROM admissions ORDER BY 1'
+ )
+AS ct (
+ -- first column has each unique value for the rows
+ "Admission Location" text,
+ -- now we list the columns
+ "ELECTIVE" text, "EMERGENCY" text,
+ "NEWBORN" text, "URGENT" text
+);
+```
+
+... returns:
+
+```
+ Admission Location | ELECTIVE | EMERGENCY | NEWBORN | URGENT
+---------------------------+----------+-----------+---------+--------
+ CLINIC REFERRAL/PREMATURE | 25 | 10002 | 1987 | 18
+ EMERGENCY ROOM ADMIT | | 22754 | |
+ HMO REFERRAL/SICK | | 1 | 101 |
+ ** INFO NOT AVAILABLE ** | | 5 | 199 |
+ PHYS REFERRAL/NORMAL DELI | 7646 | 1432 | 5553 | 448
+ TRANSFER FROM HOSP/EXTRAM | 19 | 7565 | 23 | 849
+ TRANSFER FROM OTHER HEALT | 3 | 61 | | 7
+ TRANSFER FROM SKILLED NUR | 13 | 246 | | 14
+ TRSF WITHIN THIS FACILITY | | 5 | |
+(9 rows)
+```
+
+Which is an interesting view of the data! In particular, we can see some sensible counts.
+Elective patients and newborn patients are never admitted from the emergency room.
+Interestingly, urgent patients are never admitted from the emergency room either,
+which helps highlight the subtle differences between `EMERGENCY` and `URGENT`.

0 comments on commit 4bbbdee

Please sign in to comment.