# First Analysis for Munich Traffic DWH

## Connect

In [1]:
import configparser
%load_ext sql

In [2]:
config = configparser.ConfigParser()
config.read("dwh.cfg")

['dwh.cfg']

In [3]:
connstring = config['Red']['DWH_CS']

In [5]:
%sql $connstring

'Connected: dwhuser@sbmd1'
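
The cells above only show that `dwh.cfg` has a section `Red` with a key `DWH_CS`. As a minimal sketch of how such a file could look and be read, the example below parses a hypothetical config from a string. The URL layout, `PASSWORD`, `HOST`, and `DBNAME` are placeholders and assumptions, not values from the real file.

```python
import configparser

# Hypothetical contents of dwh.cfg; password, host, and database are placeholders.
EXAMPLE_CFG = """
[Red]
DWH_CS = postgresql://dwhuser:PASSWORD@HOST:5439/DBNAME
"""

config = configparser.ConfigParser()
config.read_string(EXAMPLE_CFG)  # the notebook uses config.read("dwh.cfg") instead

# Section names are case-sensitive, option names are not.
connstring = config["Red"]["DWH_CS"]
print(connstring)
```

Keeping the connection string in a config file keeps credentials out of the notebook itself.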

## Data Size
The following SQL queries return the row count of each table in the data model.

In [6]:
%sql SELECT COUNT(*) FROM t01_delay_fact;

 * postgresql://dwhuser:***@redshift-cluster-1sbmd.cg1lpo6kzhm9.eu-central-1.redshift.amazonaws.com:5439/sbmd1
1 rows affected.


count
314014


In [13]:
%sql SELECT COUNT(*) FROM t02_dim_time;

 * postgresql://dwhuser:***@redshift-cluster-1sbmd.cg1lpo6kzhm9.eu-central-1.redshift.amazonaws.com:5439/sbmd1
1 rows affected.


count
1143


In [14]:
%sql SELECT COUNT(*) FROM t03_dim_conn;

 * postgresql://dwhuser:***@redshift-cluster-1sbmd.cg1lpo6kzhm9.eu-central-1.redshift.amazonaws.com:5439/sbmd1
1 rows affected.


count
22797


In [15]:
%sql SELECT COUNT(*) FROM t04_dim_part;

 * postgresql://dwhuser:***@redshift-cluster-1sbmd.cg1lpo6kzhm9.eu-central-1.redshift.amazonaws.com:5439/sbmd1
1 rows affected.


count
2


In [16]:
%sql SELECT COUNT(*) FROM t05_dim_weather;

 * postgresql://dwhuser:***@redshift-cluster-1sbmd.cg1lpo6kzhm9.eu-central-1.redshift.amazonaws.com:5439/sbmd1
1 rows affected.


count
81267
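
The five counts above could also be collected in a single round trip. The sketch below builds one `UNION ALL` query from the table names and runs it against an in-memory SQLite database that stands in for the warehouse. The helper `build_count_query` and the dummy single-column tables are assumptions for illustration and do not reflect the real schema.

```python
import sqlite3

TABLES = [
    "t01_delay_fact",
    "t02_dim_time",
    "t03_dim_conn",
    "t04_dim_part",
    "t05_dim_weather",
]


def build_count_query(tables):
    """Return one UNION ALL query yielding (table_name, row_count) per table."""
    parts = [
        f"SELECT '{name}' AS table_name, COUNT(*) AS row_count FROM {name}"
        for name in tables
    ]
    return "\nUNION ALL\n".join(parts)


# Stand-in database: dummy tables with one column each (not the real schema).
conn = sqlite3.connect(":memory:")
for name in TABLES:
    conn.execute(f"CREATE TABLE {name} (id INTEGER)")
conn.executemany("INSERT INTO t04_dim_part (id) VALUES (?)", [(1,), (2,)])

query = build_count_query(TABLES)
counts = dict(conn.execute(query).fetchall())
print(counts)
```

The generated query text should also be usable in a `%%sql` cell, since it only relies on `COUNT(*)` and `UNION ALL`.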


## Examples for basic questions

#### What is the average delay in % per transport method?
On average, trains (part_id 1) are delayed by about 0.53% of their average trip duration, whereas part_id 2 shows a negative average delay of about -2.09%.

In [28]:
%%sql
SELECT
    part_id,
    AVG(delay_sec) AS Avg_Delay_Sec,
    AVG(duration_sec) AS Avg_Duration_Sec,
    AVG(delay_sec) / AVG(duration_sec) * 100 AS Avg_Delay_Perc
FROM
    t01_delay_fact
GROUP BY
    part_id

 * postgresql://dwhuser:***@redshift-cluster-1sbmd.cg1lpo6kzhm9.eu-central-1.redshift.amazonaws.com:5439/sbmd1
2 rows affected.


part_id,avg_delay_sec,avg_duration_sec,avg_delay_perc
2,-88.372477859947,4220.44777686479,-2.09391236504283
1,33.9061579963575,6398.35319470881,0.529920074190287
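
Note that `AVG(delay_sec) / AVG(duration_sec)` is a ratio of averages, which weights long trips more heavily than an average of per-trip percentages would. The sketch below contrasts the two on a few made-up trips; the numbers are hypothetical and are not taken from the warehouse.

```python
# Hypothetical trips as (delay_sec, duration_sec); not real warehouse data.
trips = [(60, 600), (0, 3600), (120, 1200)]


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# What the query computes: AVG(delay_sec) / AVG(duration_sec) * 100
ratio_of_means = mean(d for d, _ in trips) / mean(t for _, t in trips) * 100

# Alternative: AVG(delay_sec / duration_sec) * 100, the mean per-trip percentage
mean_of_ratios = mean(d / t for d, t in trips) * 100

print(round(ratio_of_means, 2))  # 3.33
print(round(mean_of_ratios, 2))  # 6.67
```

Which of the two is more appropriate depends on whether every trip or every second of travel time should count equally.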


#### What is the average delay in % per weekday for trains?
At first glance, the delay varies considerably between weekdays, ranging from about -0.08% (weekday 4) to about 0.85% (weekday 1).

In [35]:
%%sql
SELECT
    t.weekday,
    AVG(delay_sec) / AVG(duration_sec) * 100 AS Avg_Delay_Perc
FROM
    t01_delay_fact f
    LEFT JOIN t02_dim_time t
        ON t.time_id = f.time_id
WHERE f.part_id = 1
GROUP BY
    t.weekday
ORDER BY 1

 * postgresql://dwhuser:***@redshift-cluster-1sbmd.cg1lpo6kzhm9.eu-central-1.redshift.amazonaws.com:5439/sbmd1
7 rows affected.


weekday,avg_delay_perc
0,0.390186740325889
1,0.852500428660007
2,0.712473580246334
3,0.846352723273282
4,-0.0840668843770511
5,0.25278087663306
6,0.570494855400964
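
The notebook does not say which day weekday 0 stands for. Two common conventions are Monday = 0 (Python's `date.weekday()`) and Sunday = 0 (for example the `dow` date part in Redshift). The sketch below translates a weekday number under either assumption; the helper names are hypothetical, and which convention applies depends on how `t02_dim_time` was loaded.

```python
import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def name_if_monday_is_zero(n):
    """Day name assuming 0 = Monday (Python's date.weekday() convention)."""
    return DAY_NAMES[n]


def name_if_sunday_is_zero(n):
    """Day name assuming 0 = Sunday."""
    return DAY_NAMES[(n - 1) % 7]


# Sanity check of the Python convention: 2024-01-01 was a Monday.
assert datetime.date(2024, 1, 1).weekday() == 0

# Weekday 4 has the lowest delay in the result above.
print(name_if_monday_is_zero(4))  # Friday
print(name_if_sunday_is_zero(4))  # Thursday
```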


#### What is the average delay in % per hour for trains?
At first glance, the delay varies considerably with the hour of the day. There also seems to be a pattern of delays increasing over the course of the day: in the rows shown below, the delay rises from about 0.30% at hour 4 to about 1.04% at hour 7.

In [34]:
%%sql
SELECT
    t.hour,
    AVG(delay_sec) / AVG(duration_sec) * 100 AS Avg_Delay_Perc
FROM
    t01_delay_fact f
    LEFT JOIN t02_dim_time t
        ON t.time_id = f.time_id
WHERE f.part_id = 1
GROUP BY
    t.hour
ORDER BY 1

 * postgresql://dwhuser:***@redshift-cluster-1sbmd.cg1lpo6kzhm9.eu-central-1.redshift.amazonaws.com:5439/sbmd1
24 rows affected.


hour,avg_delay_perc
0,0.554179984405692
1,0.615531758637769
2,0.428381458966565
3,0.319559717722249
4,0.296706737087606
5,0.423960189070874
6,0.642011116668265
7,1.03773465497242
8,0.894398764110785
9,0.729050581189273


#### What is the average delay per weather state for trains?
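At first glance, clear weather goes along with the lowest delay, and thunderstorms with the highest. Note that, unlike the weekday and hour queries, this query does not multiply the ratio by 100, so the values are fractions of the average trip duration rather than percentages (for example, 0.0067 for Clouds corresponds to about 0.67%).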

In [37]:
%%sql
SELECT
    w.weather_status,
    AVG(delay_sec) / AVG(duration_sec) AS Avg_Delay_Perc
FROM
    t01_delay_fact f
    LEFT JOIN t05_dim_weather w
        ON w.w_id = f.w_id
WHERE f.part_id = 1
GROUP BY
    w.weather_status
ORDER BY 1

 * postgresql://dwhuser:***@redshift-cluster-1sbmd.cg1lpo6kzhm9.eu-central-1.redshift.amazonaws.com:5439/sbmd1
10 rows affected.


weather_status,avg_delay_perc
Clear,-8.72413929178074e-05
Clouds,0.0066858927275063
Drizzle,0.0077923008513474
Fog,0.0017921006690128
Mist,0.0048587014728483
Rain,0.007881918249623
Snow,0.0096397448685328
Squall,0.0017975050629725
Thunderstorm,0.0185595275756617
,0.0079785248784355
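
The last row has an empty weather status. One possible cause is that some fact rows have no matching row in `t05_dim_weather`, in which case the `LEFT JOIN` yields `NULL` for `weather_status`. The notebook does not confirm this, and an empty or `NULL` status stored in the dimension itself would look the same. The sketch below reproduces the unmatched-row effect on a tiny in-memory SQLite stand-in with made-up rows and a reduced, assumed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t05_dim_weather (w_id INTEGER, weather_status TEXT);
CREATE TABLE t01_delay_fact (w_id INTEGER, delay_sec REAL, duration_sec REAL);
INSERT INTO t05_dim_weather VALUES (1, 'Clear'), (2, 'Rain');
-- w_id 99 and NULL have no match in the dimension table
INSERT INTO t01_delay_fact VALUES
    (1, 0, 1000), (2, 20, 1000), (99, 10, 1000), (NULL, 30, 1000);
""")

# Unmatched fact rows end up in a group whose label is NULL (None in Python).
groups = dict(conn.execute("""
    SELECT w.weather_status, COUNT(*) AS n_rows
    FROM t01_delay_fact f
    LEFT JOIN t05_dim_weather w ON w.w_id = f.w_id
    GROUP BY w.weather_status
""").fetchall())

# Diagnostic: count fact rows without a weather match.
unmatched = conn.execute("""
    SELECT COUNT(*)
    FROM t01_delay_fact f
    LEFT JOIN t05_dim_weather w ON w.w_id = f.w_id
    WHERE w.w_id IS NULL
""").fetchone()[0]

print(groups)
print(unmatched)
```

Running the diagnostic query against the real tables would show whether unmatched rows explain the empty label.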
