## Analyzing train disruptions in the Netherlands

From https://www.rijdendetreinen.nl/en/open-data/disruptions#downloads

In [29]:
%use dataframe

In [30]:
// reading the csv by dragAndDrop
// from https://www.rijdendetreinen.nl/en/open-data/disruptions#downloads
val disruptions2023 = DataFrame.readCSV("data/disruptions/disruptions-2023.csv", delimiter = ',')
disruptions2023

rdt_id,ns_lines,rdt_lines,rdt_lines_id,rdt_station_names,rdt_station_codes,cause_nl,cause_en,statistical_cause_nl,statistical_cause_en,cause_group,start_time,end_time,duration_minutes
45999,Amsterdam-Rotterdam-Brussel (HSL),Amsterdam Centraal - Schiphol Airport...,2432.0,"Amsterdam Centraal,Amsterdam Lelylaan...","ASD, ASDL, ASS, RTD, SHL",wisselstoring,points failure,wisselstoring,points failure,infrastructure,2023-01-01T08:19:26,2023-01-01T22:43:08,864
46000,Zwolle-Leeuwarden,Leeuwarden - Zwolle,160.0,"Heerenveen,Wolvega,Heerenveen IJsstadion","HR, WV, HRY",dier op het spoor,an animal on the railway track,dier op het spoor,an animal on the railway track,external,2023-01-01T10:31:49,2023-01-01T10:56:17,24
46001,Heerlen-Aachen Hbf,Aachen Hbf - Heerlen,130.0,"Aachen Hbf,Eygelshoven Markt,Heerlen,...","AHBF, EGHM, HRL, HRLK, HZ, LG, AW",beperkingen in de materieelinzet,problems with the rolling stock,beperkingen in de materieelinzet,problems with the rolling stock,rolling stock,2023-01-01T13:19:24,2023-01-02T00:02:39,643
46002,Zutphen-Winterswijk,Winterswijk - Zutphen,83.0,"Vorden,Zutphen","VD, ZP",aanrijding,collision,aanrijding,collision,accidents,2023-01-01T17:15:22,2023-01-01T20:14:23,179
46003,Heerlen-Aachen Hbf,Aachen Hbf - Heerlen,130.0,"Aachen Hbf,Eygelshoven Markt,Heerlen,...","AHBF, EGHM, HRL, HRLK, HZ, LG, AW",beperkingen in de materieelinzet,problems with the rolling stock,beperkingen in de materieelinzet,problems with the rolling stock,rolling stock,2023-01-02T05:57:27,2023-01-03T02:07:13,1210
46004,Amersfoort-Ede-Wageningen,Amersfoort - Ede-Wageningen,47.0,"Amersfoort Centraal,Barneveld Centrum...","AMF, BNC, BNN, ED, EDC, LTN, HVL, BNZ",defecte trein,broken down train,defecte trein,broken down train,rolling stock,2023-01-02T06:36:39,2023-01-02T07:28:16,52
46005,Dordrecht-Breda; Dordrecht-Roosendaal,"Breda - Dordrecht, Dordrecht - Roosen...",170171.0,"Dordrecht,Dordrecht Zuid,Lage Zwaluwe","DDR, DDZD, ZLW",defecte trein,broken down train,defecte trein,broken down train,rolling stock,2023-01-02T07:31:33,2023-01-02T08:09:37,38
46006,'s-Hertogenbosch-Tilburg,'s-Hertogenbosch - Tilburg,69.0,"'s-Hertogenbosch,Tilburg","HT, TB",defecte trein,broken down train,defecte trein,broken down train,rolling stock,2023-01-02T11:33:15,2023-01-02T11:44:27,11
46007,Rotterdam-Breda (HSL),Breda - Rotterdam Centraal (HSL),15.0,"Breda,Rotterdam Centraal","BD, RTD",gestrande trein,stranded train,gestrande trein,stranded train,rolling stock,2023-01-02T11:50:11,2023-01-02T12:25:39,35
46008,Amsterdam-Schiphol-Rotterdam (HSL),Amsterdam Centraal - Schiphol Airport...,2432.0,"Amsterdam Centraal,Amsterdam Lelylaan...","ASD, ASDL, ASS, RTD, SHL",defecte trein,broken down train,defecte trein,broken down train,rolling stock,2023-01-02T12:40:11,2023-01-02T13:08:08,28


In [31]:
disruptions2023.schema()

rdt_id: Int
ns_lines: String
rdt_lines: String?
rdt_lines_id: Double?
rdt_station_names: String?
rdt_station_codes: String?
cause_nl: String
cause_en: String
statistical_cause_nl: String
statistical_cause_en: String
cause_group: String
start_time: kotlinx.datetime.LocalDateTime
end_time: kotlinx.datetime.LocalDateTime?
duration_minutes: Int?

Looking at the schema, we can see that most columns were parsed correctly.
`rdt_lines_id: Double?` is a mistake, though.

From the website: "These are the IDs of the lines linked to a disruption by Rijden de Treinen, separated by a comma."
Understandably, a value like `"24,32"` is parsed as a `Double` (it shows up as `2432.0` above) instead of a `String`. Let's nudge the parser in the right direction
by supplying a manual type for this column when reading the data.

Let's also rename the columns to camel case while we're at it.
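
As for why `"24,32"` can end up as a number at all: the sketch below uses `java.text.NumberFormat` with a US locale as a stand-in for a lenient, locale-aware numeric parser. Whether DataFrame's CSV reader uses this exact mechanism is an assumption, and the helper name `parseLenient` is hypothetical. It only illustrates how a comma can be swallowed as a grouping separator.

```kotlin
import java.text.NumberFormat
import java.text.ParsePosition
import java.util.Locale

// Hypothetical helper: parse text the way a lenient US-locale number parser would.
// Returns null unless the whole string was consumed as a number.
fun parseLenient(text: String): Double? {
    val position = ParsePosition(0)
    val number = NumberFormat.getInstance(Locale.US).parse(text, position)
    return if (number != null && position.index == text.length) number.toDouble() else null
}

fun main() {
    // The comma is treated as a grouping separator and silently dropped.
    println(parseLenient("24,32"))    // 2432.0
    println(parseLenient("170,171"))  // 170171.0
    // Kept as a String, the value can still be split into separate IDs later.
    println("24,32".split(","))       // [24, 32]
}
```

Reading the column as a `String` avoids this lossy step, which is what the `colTypes` argument in the next cell is for.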

In [32]:
val disruptions2023 = DataFrame.readCSV(
    fileOrUrl = "data/disruptions/disruptions-2023.csv",
    delimiter = ',',
    colTypes = mapOf("rdt_lines_id" to ColType.String),
).renameToCamelCase()

disruptions2023

rdtId,nsLines,rdtLines,rdtLinesId,rdtStationNames,rdtStationCodes,causeNl,causeEn,statisticalCauseNl,statisticalCauseEn,causeGroup,startTime,endTime,durationMinutes
45999,Amsterdam-Rotterdam-Brussel (HSL),Amsterdam Centraal - Schiphol Airport...,2432,"Amsterdam Centraal,Amsterdam Lelylaan...","ASD, ASDL, ASS, RTD, SHL",wisselstoring,points failure,wisselstoring,points failure,infrastructure,2023-01-01T08:19:26,2023-01-01T22:43:08,864
46000,Zwolle-Leeuwarden,Leeuwarden - Zwolle,160,"Heerenveen,Wolvega,Heerenveen IJsstadion","HR, WV, HRY",dier op het spoor,an animal on the railway track,dier op het spoor,an animal on the railway track,external,2023-01-01T10:31:49,2023-01-01T10:56:17,24
46001,Heerlen-Aachen Hbf,Aachen Hbf - Heerlen,130,"Aachen Hbf,Eygelshoven Markt,Heerlen,...","AHBF, EGHM, HRL, HRLK, HZ, LG, AW",beperkingen in de materieelinzet,problems with the rolling stock,beperkingen in de materieelinzet,problems with the rolling stock,rolling stock,2023-01-01T13:19:24,2023-01-02T00:02:39,643
46002,Zutphen-Winterswijk,Winterswijk - Zutphen,83,"Vorden,Zutphen","VD, ZP",aanrijding,collision,aanrijding,collision,accidents,2023-01-01T17:15:22,2023-01-01T20:14:23,179
46003,Heerlen-Aachen Hbf,Aachen Hbf - Heerlen,130,"Aachen Hbf,Eygelshoven Markt,Heerlen,...","AHBF, EGHM, HRL, HRLK, HZ, LG, AW",beperkingen in de materieelinzet,problems with the rolling stock,beperkingen in de materieelinzet,problems with the rolling stock,rolling stock,2023-01-02T05:57:27,2023-01-03T02:07:13,1210
46004,Amersfoort-Ede-Wageningen,Amersfoort - Ede-Wageningen,47,"Amersfoort Centraal,Barneveld Centrum...","AMF, BNC, BNN, ED, EDC, LTN, HVL, BNZ",defecte trein,broken down train,defecte trein,broken down train,rolling stock,2023-01-02T06:36:39,2023-01-02T07:28:16,52
46005,Dordrecht-Breda; Dordrecht-Roosendaal,"Breda - Dordrecht, Dordrecht - Roosen...",170171,"Dordrecht,Dordrecht Zuid,Lage Zwaluwe","DDR, DDZD, ZLW",defecte trein,broken down train,defecte trein,broken down train,rolling stock,2023-01-02T07:31:33,2023-01-02T08:09:37,38
46006,'s-Hertogenbosch-Tilburg,'s-Hertogenbosch - Tilburg,69,"'s-Hertogenbosch,Tilburg","HT, TB",defecte trein,broken down train,defecte trein,broken down train,rolling stock,2023-01-02T11:33:15,2023-01-02T11:44:27,11
46007,Rotterdam-Breda (HSL),Breda - Rotterdam Centraal (HSL),15,"Breda,Rotterdam Centraal","BD, RTD",gestrande trein,stranded train,gestrande trein,stranded train,rolling stock,2023-01-02T11:50:11,2023-01-02T12:25:39,35
46008,Amsterdam-Schiphol-Rotterdam (HSL),Amsterdam Centraal - Schiphol Airport...,2432,"Amsterdam Centraal,Amsterdam Lelylaan...","ASD, ASDL, ASS, RTD, SHL",defecte trein,broken down train,defecte trein,broken down train,rolling stock,2023-01-02T12:40:11,2023-01-02T13:08:08,28


In [33]:
disruptions2023.schema()

rdtId: Int
nsLines: String
rdtLines: String?
rdtLinesId: String?
rdtStationNames: String?
rdtStationCodes: String?
causeNl: String
causeEn: String
statisticalCauseNl: String
statisticalCauseEn: String
causeGroup: String
startTime: kotlinx.datetime.LocalDateTime
endTime: kotlinx.datetime.LocalDateTime?
durationMinutes: Int?

Now the schema looks better! One of the best things about using DataFrame in notebooks
is that type-safe accessors are generated for you between cell executions!

In [34]:
disruptions2023.rdtLinesId

rdtLinesId
2432
160
130
83
130
47
170171
69
15
2432
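
Conceptually, such an accessor can be pictured as a typed extension property. The sketch below is a simplified illustration with a hypothetical `ToyFrame` class; it is not the code the library actually generates.

```kotlin
// A toy stand-in for a data frame: column name -> column values.
class ToyFrame(private val columns: Map<String, List<Any?>>) {
    fun column(name: String): List<Any?> =
        columns[name] ?: error("No column named $name")
}

// Roughly what an accessor provides: a typed extension property
// that hides the string lookup and the cast.
@Suppress("UNCHECKED_CAST")
val ToyFrame.rdtLinesId: List<String?>
    get() = column("rdtLinesId") as List<String?>

fun main() {
    val frame = ToyFrame(mapOf("rdtLinesId" to listOf("24,32", "160", "130")))
    println(frame.rdtLinesId) // [24,32, 160, 130]
}
```

Because the property is declared for one specific column, a typo in the column name becomes a compile error instead of a runtime lookup failure.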


We can actually make this hidden process visible by tracking all code that's executed under the hood.

As you can see, libraries for the Kotlin Jupyter kernel and notebooks can be very powerful!

In [35]:
%trackExecution
val dataFrame = dataFrameOf("a", "b")(1, 2)

Executing:

val dataFrame = dataFrameOf("a", "b")(1, 2)

Executing:
(dataFrame as org.jetbrains.kotlinx.dataframe.DataFrame<*>).cast<Line_21_jupyter._DataFrameType2>()
Executing:
val dataFrame = res96


In [36]:
%trackExecution off

In [37]:
val a = dataFrame.a
val b = dataFrame.b

a

a
1


Anyway, let's get back to our data!

Let's remove the columns we don't need, and convert and rename some of the others.

In [38]:
// before
disruptions2023

rdtId,nsLines,rdtLines,rdtLinesId,rdtStationNames,rdtStationCodes,causeNl,causeEn,statisticalCauseNl,statisticalCauseEn,causeGroup,startTime,endTime,durationMinutes
45999,Amsterdam-Rotterdam-Brussel (HSL),Amsterdam Centraal - Schiphol Airport...,2432,"Amsterdam Centraal,Amsterdam Lelylaan...","ASD, ASDL, ASS, RTD, SHL",wisselstoring,points failure,wisselstoring,points failure,infrastructure,2023-01-01T08:19:26,2023-01-01T22:43:08,864
46000,Zwolle-Leeuwarden,Leeuwarden - Zwolle,160,"Heerenveen,Wolvega,Heerenveen IJsstadion","HR, WV, HRY",dier op het spoor,an animal on the railway track,dier op het spoor,an animal on the railway track,external,2023-01-01T10:31:49,2023-01-01T10:56:17,24
46001,Heerlen-Aachen Hbf,Aachen Hbf - Heerlen,130,"Aachen Hbf,Eygelshoven Markt,Heerlen,...","AHBF, EGHM, HRL, HRLK, HZ, LG, AW",beperkingen in de materieelinzet,problems with the rolling stock,beperkingen in de materieelinzet,problems with the rolling stock,rolling stock,2023-01-01T13:19:24,2023-01-02T00:02:39,643
46002,Zutphen-Winterswijk,Winterswijk - Zutphen,83,"Vorden,Zutphen","VD, ZP",aanrijding,collision,aanrijding,collision,accidents,2023-01-01T17:15:22,2023-01-01T20:14:23,179
46003,Heerlen-Aachen Hbf,Aachen Hbf - Heerlen,130,"Aachen Hbf,Eygelshoven Markt,Heerlen,...","AHBF, EGHM, HRL, HRLK, HZ, LG, AW",beperkingen in de materieelinzet,problems with the rolling stock,beperkingen in de materieelinzet,problems with the rolling stock,rolling stock,2023-01-02T05:57:27,2023-01-03T02:07:13,1210
46004,Amersfoort-Ede-Wageningen,Amersfoort - Ede-Wageningen,47,"Amersfoort Centraal,Barneveld Centrum...","AMF, BNC, BNN, ED, EDC, LTN, HVL, BNZ",defecte trein,broken down train,defecte trein,broken down train,rolling stock,2023-01-02T06:36:39,2023-01-02T07:28:16,52
46005,Dordrecht-Breda; Dordrecht-Roosendaal,"Breda - Dordrecht, Dordrecht - Roosen...",170171,"Dordrecht,Dordrecht Zuid,Lage Zwaluwe","DDR, DDZD, ZLW",defecte trein,broken down train,defecte trein,broken down train,rolling stock,2023-01-02T07:31:33,2023-01-02T08:09:37,38
46006,'s-Hertogenbosch-Tilburg,'s-Hertogenbosch - Tilburg,69,"'s-Hertogenbosch,Tilburg","HT, TB",defecte trein,broken down train,defecte trein,broken down train,rolling stock,2023-01-02T11:33:15,2023-01-02T11:44:27,11
46007,Rotterdam-Breda (HSL),Breda - Rotterdam Centraal (HSL),15,"Breda,Rotterdam Centraal","BD, RTD",gestrande trein,stranded train,gestrande trein,stranded train,rolling stock,2023-01-02T11:50:11,2023-01-02T12:25:39,35
46008,Amsterdam-Schiphol-Rotterdam (HSL),Amsterdam Centraal - Schiphol Airport...,2432,"Amsterdam Centraal,Amsterdam Lelylaan...","ASD, ASDL, ASS, RTD, SHL",defecte trein,broken down train,defecte trein,broken down train,rolling stock,2023-01-02T12:40:11,2023-01-02T13:08:08,28


In [39]:
import kotlin.time.Duration.Companion.minutes

val df1 = disruptions2023

    // remove nsLines, the Dutch columns, and causeEn (statisticalCauseEn is the better choice according to the docs)
    .remove { nsLines and nameEndsWith("Nl") and causeEn }

    // also drop the rows where durationMinutes is null
    .dropNulls { durationMinutes }
    
    // Parsing minutes into kotlin.time.Duration and creating an extra date column
    .add {
        "duration" from { durationMinutes!!.minutes }
        "date" from { startTime.date }
    }

    // renaming columns to remove "rdt" and "En" from the beginning and end
    .rename { all() }.into {
        it.name
            .removePrefix("rdt")
            .replaceFirstChar { it.lowercase() }
            .removeSuffix("En")
    }

df1

id,lines,linesId,stationNames,stationCodes,statisticalCause,causeGroup,startTime,endTime,durationMinutes,duration,date
45999,Amsterdam Centraal - Schiphol Airport...,2432,"Amsterdam Centraal,Amsterdam Lelylaan...","ASD, ASDL, ASS, RTD, SHL",points failure,infrastructure,2023-01-01T08:19:26,2023-01-01T22:43:08,864,14h 24m,2023-01-01
46000,Leeuwarden - Zwolle,160,"Heerenveen,Wolvega,Heerenveen IJsstadion","HR, WV, HRY",an animal on the railway track,external,2023-01-01T10:31:49,2023-01-01T10:56:17,24,24m,2023-01-01
46001,Aachen Hbf - Heerlen,130,"Aachen Hbf,Eygelshoven Markt,Heerlen,...","AHBF, EGHM, HRL, HRLK, HZ, LG, AW",problems with the rolling stock,rolling stock,2023-01-01T13:19:24,2023-01-02T00:02:39,643,10h 43m,2023-01-01
46002,Winterswijk - Zutphen,83,"Vorden,Zutphen","VD, ZP",collision,accidents,2023-01-01T17:15:22,2023-01-01T20:14:23,179,2h 59m,2023-01-01
46003,Aachen Hbf - Heerlen,130,"Aachen Hbf,Eygelshoven Markt,Heerlen,...","AHBF, EGHM, HRL, HRLK, HZ, LG, AW",problems with the rolling stock,rolling stock,2023-01-02T05:57:27,2023-01-03T02:07:13,1210,20h 10m,2023-01-02
46004,Amersfoort - Ede-Wageningen,47,"Amersfoort Centraal,Barneveld Centrum...","AMF, BNC, BNN, ED, EDC, LTN, HVL, BNZ",broken down train,rolling stock,2023-01-02T06:36:39,2023-01-02T07:28:16,52,52m,2023-01-02
46005,"Breda - Dordrecht, Dordrecht - Roosen...",170171,"Dordrecht,Dordrecht Zuid,Lage Zwaluwe","DDR, DDZD, ZLW",broken down train,rolling stock,2023-01-02T07:31:33,2023-01-02T08:09:37,38,38m,2023-01-02
46006,'s-Hertogenbosch - Tilburg,69,"'s-Hertogenbosch,Tilburg","HT, TB",broken down train,rolling stock,2023-01-02T11:33:15,2023-01-02T11:44:27,11,11m,2023-01-02
46007,Breda - Rotterdam Centraal (HSL),15,"Breda,Rotterdam Centraal","BD, RTD",stranded train,rolling stock,2023-01-02T11:50:11,2023-01-02T12:25:39,35,35m,2023-01-02
46008,Amsterdam Centraal - Schiphol Airport...,2432,"Amsterdam Centraal,Amsterdam Lelylaan...","ASD, ASDL, ASS, RTD, SHL",broken down train,rolling stock,2023-01-02T12:40:11,2023-01-02T13:08:08,28,28m,2023-01-02
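
The renaming and the duration conversion in the cell above are plain Kotlin standard-library calls, so they can be tried out without a DataFrame. A minimal sketch (the function name `simplifyName` is hypothetical):

```kotlin
import kotlin.time.Duration.Companion.minutes

// Same string transformation as in the rename step, applied to a plain column name.
fun simplifyName(name: String): String =
    name.removePrefix("rdt")
        .replaceFirstChar { it.lowercase() }
        .removeSuffix("En")

fun main() {
    println(simplifyName("rdtLinesId"))          // linesId
    println(simplifyName("statisticalCauseEn"))  // statisticalCause
    println(simplifyName("startTime"))           // startTime

    // Int.minutes builds a kotlin.time.Duration; its default string form matches the table.
    println(864.minutes)   // 14h 24m
    println(1210.minutes)  // 20h 10m
}
```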


Almost perfect! However, some columns still hold comma-separated values as plain strings. We can split those into actual lists to make them more manageable.

In [40]:
val df2 = df1
    .split {
        cols(lines, linesId, stationNames, stationCodes)
    }.by(",").inplace()
    .convert { linesId.cast<List<String>>() }.with { it.map { it.toInt() } }

df2

id,lines,linesId,stationNames,stationCodes,statisticalCause,causeGroup,startTime,endTime,durationMinutes,duration,date
45999,[Amsterdam Centraal - Schiphol Airpor...,"[24, 32]","[Amsterdam Centraal, Amsterdam Lelyla...","[ASD, ASDL, ASS, RTD, SHL]",points failure,infrastructure,2023-01-01T08:19:26,2023-01-01T22:43:08,864,14h 24m,2023-01-01
46000,[Leeuwarden - Zwolle],[160],"[Heerenveen, Wolvega, Heerenveen IJss...","[HR, WV, HRY]",an animal on the railway track,external,2023-01-01T10:31:49,2023-01-01T10:56:17,24,24m,2023-01-01
46001,[Aachen Hbf - Heerlen],[130],"[Aachen Hbf, Eygelshoven Markt, Heerl...","[AHBF, EGHM, HRL, HRLK, HZ, LG, AW]",problems with the rolling stock,rolling stock,2023-01-01T13:19:24,2023-01-02T00:02:39,643,10h 43m,2023-01-01
46002,[Winterswijk - Zutphen],[83],"[Vorden, Zutphen]","[VD, ZP]",collision,accidents,2023-01-01T17:15:22,2023-01-01T20:14:23,179,2h 59m,2023-01-01
46003,[Aachen Hbf - Heerlen],[130],"[Aachen Hbf, Eygelshoven Markt, Heerl...","[AHBF, EGHM, HRL, HRLK, HZ, LG, AW]",problems with the rolling stock,rolling stock,2023-01-02T05:57:27,2023-01-03T02:07:13,1210,20h 10m,2023-01-02
46004,[Amersfoort - Ede-Wageningen],[47],"[Amersfoort Centraal, Barneveld Centr...","[AMF, BNC, BNN, ED, EDC, LTN, HVL, BNZ]",broken down train,rolling stock,2023-01-02T06:36:39,2023-01-02T07:28:16,52,52m,2023-01-02
46005,"[Breda - Dordrecht, Dordrecht - Roose...","[170, 171]","[Dordrecht, Dordrecht Zuid, Lage Zwal...","[DDR, DDZD, ZLW]",broken down train,rolling stock,2023-01-02T07:31:33,2023-01-02T08:09:37,38,38m,2023-01-02
46006,['s-Hertogenbosch - Tilburg],[69],"['s-Hertogenbosch, Tilburg]","[HT, TB]",broken down train,rolling stock,2023-01-02T11:33:15,2023-01-02T11:44:27,11,11m,2023-01-02
46007,[Breda - Rotterdam Centraal (HSL)],[15],"[Breda, Rotterdam Centraal]","[BD, RTD]",stranded train,rolling stock,2023-01-02T11:50:11,2023-01-02T12:25:39,35,35m,2023-01-02
46008,[Amsterdam Centraal - Schiphol Airpor...,"[24, 32]","[Amsterdam Centraal, Amsterdam Lelyla...","[ASD, ASDL, ASS, RTD, SHL]",broken down train,rolling stock,2023-01-02T12:40:11,2023-01-02T13:08:08,28,28m,2023-01-02
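
Per cell, the split-and-convert step boils down to something like the following standard-library sketch. The helper names are hypothetical, and the explicit `trim()` is an assumption made here so that values such as `"ASD, ASDL"` come out without leading spaces.

```kotlin
// Split a comma-separated cell into trimmed parts.
fun splitCell(cell: String): List<String> = cell.split(",").map { it.trim() }

// Line IDs are additionally converted to integers.
fun parseLineIds(cell: String): List<Int> = splitCell(cell).map { it.toInt() }

fun main() {
    println(splitCell("ASD, ASDL, ASS, RTD, SHL")) // [ASD, ASDL, ASS, RTD, SHL]
    println(parseLineIds("170,171"))               // [170, 171]
}
```

One caveat of splitting on a bare comma: a name that itself contains a comma would be cut in two.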


In [41]:
df2.schema()

id: Int
lines: List<String>
linesId: List<Int>
stationNames: List<String>
stationCodes: List<String>
statisticalCause: String
causeGroup: String
startTime: kotlinx.datetime.LocalDateTime
endTime: kotlinx.datetime.LocalDateTime
durationMinutes: Int
duration: time.Duration
date: kotlinx.datetime.LocalDate

Done! Now let's get to work! We can find all sorts of interesting stuff:

  - What's the longest delay duration in 2023? (clicking in the table)
  - Which line had the most delays in 2023?
  - What causes delays?
  - Do I have the right to complain about Dutch trains in demos?

## Cause groups

I'm actually quite interested in these causes and what makes up a "cause group".
Let's find all groups and see what causes are inside :)

Note the nested DataFrames :)

In [42]:
df2
    .groupBy { causeGroup }.aggregate {
        statisticalCause.valueCounts() into "statisticalCauses"
    }
    .sortByDesc { 
        expr { getFrameColumn("statisticalCauses").count() } 
    }

causeGroup,statisticalCauses
infrastructure,DataFrame [13 x 2] (showing only top 5 of 13 rows)
external,DataFrame [18 x 2] (showing only top 5 of 18 rows)

statisticalCause,count
signalling and points failure,289
points failure,229
signal failure,163
damaged overhead wires,104
defective railway track,87

statisticalCause,count
an emergency call,151
person on the railway track,146
people on the railway track,72
police action,46
fire alarm,35

statisticalCause,count
broken down train,1704
stranded train,134
problems with the rolling stock,81
defective trains,2
the use of alternative train units,1

statisticalCause,count
collision,493
damaged railway bridge,37
damaged level crossing,7

statisticalCause,count
logistical limitations,99
disruption elsewhere,78
railway problems abroad,39
an earlier disruption,14
excessive delays,1

statisticalCause,count
repair works,146
over-running engineering works,47
engineering works,9

statisticalCause,count
staffing problems,160
strike of Arriva staff,76
strike of Keolis staff,36
staff strikes abroad,24
strike of Connexxion staff,11

statisticalCause,count
technical investigation,70
multiple disruptions,7

statisticalCause,count
weather circumstances,25
overhead wires covered with frost,11
lightning strike,9
slippery railway tracks,6
an amended timetable,3
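
The grouping above can be mirrored with plain collections: group by cause group, then count each cause inside the group. This is only a sketch on hand-picked sample rows, with a hypothetical `Disruption` class, not the DataFrame implementation.

```kotlin
data class Disruption(val causeGroup: String, val statisticalCause: String)

// Count occurrences of each cause within each cause group, most frequent first.
fun causesByGroup(rows: List<Disruption>): Map<String, List<Pair<String, Int>>> =
    rows.groupBy { it.causeGroup }
        .mapValues { (_, group) ->
            group.groupingBy { it.statisticalCause }
                .eachCount()
                .toList()
                .sortedByDescending { it.second }
        }

fun main() {
    // Hand-picked sample rows, not the real dataset.
    val sample = listOf(
        Disruption("rolling stock", "broken down train"),
        Disruption("rolling stock", "broken down train"),
        Disruption("rolling stock", "stranded train"),
        Disruption("infrastructure", "points failure"),
    )
    causesByGroup(sample).forEach { (group, causes) -> println("$group: $causes") }
}
```

The inner list of pairs plays the role of the nested DataFrame in the output above.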


## Which line had the most delays?

In [43]:
val byLines = df2
    .explode { lines }
    .groupBy { lines }

byLines.count().sortByDesc("count")

lines,count
Amsterdam Centraal - Schiphol Airport,258
Rotterdam Centraal - Schiphol Airport...,248
Amersfoort - Schiphol Airport,209
Leiden Centraal - Schiphol Airport,200
Lelystad Centrum - Schiphol Airport,194
Breda - Rotterdam Centraal (HSL),165
Den Haag HS - Rotterdam Centraal,165
Schiphol Airport - Utrecht Centraal,160
Amsterdam Centraal - Utrecht Centraal,149
Dordrecht - Rotterdam Centraal,147
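
What `explode` followed by `groupBy` and `count` does can be sketched with `flatMap` and `groupingBy` on plain lists. The class `LineDisruption` and the line names below are made up for illustration.

```kotlin
// Each disruption lists the lines it affects.
data class LineDisruption(val id: Int, val lines: List<String>)

// "Explode" the list column into one (line, disruption id) pair per element,
// then count disruptions per line, highest first.
fun countPerLine(rows: List<LineDisruption>): List<Pair<String, Int>> =
    rows.flatMap { row -> row.lines.map { line -> line to row.id } }
        .groupingBy { it.first }
        .eachCount()
        .toList()
        .sortedByDescending { it.second }

fun main() {
    // Made-up sample: disruption 1 affects two lines, disruption 2 only one.
    val sample = listOf(
        LineDisruption(1, listOf("Line A", "Line B")),
        LineDisruption(2, listOf("Line A")),
    )
    println(countPerLine(sample)) // [(Line A, 2), (Line B, 1)]
}
```

Note that a disruption affecting several lines is counted once for each of them, so the per-line counts add up to more than the number of disruptions.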


Well, what a surprise that was!

That was per line; what about per station? The data also lists the affected stations for each disruption:

In [44]:
val byStation = df2
    .explode { stationNames }
    .groupBy { stationNames }

byStation.count().sortByDesc("count")

stationNames,count
Rotterdam Centraal,612
Schiphol Airport,493
Amsterdam Centraal,392
Utrecht Centraal,324
Amsterdam Sloterdijk,308
Breda,289
Arnhem Centraal,264
Zwolle,260
Leiden Centraal,248
Amersfoort Centraal,221


Let's get some more information about the duration of the delay, because just a count doesn't tell the whole story.

In [45]:
byStation.aggregate {
    duration.describe().first() into "duration"
}

stationNames,duration.name,duration.type,duration.count,duration.unique,duration.nulls,duration.top,duration.freq,duration.min,duration.median,duration.max
Amsterdam Centraal,duration,Duration,392,215,0,37m,7,0s,1h 10m,5d 1h 54m
Amsterdam Lelylaan,duration,Duration,198,140,0,17m,5,1m,1h 6m,2d 4h 33m
Amsterdam Sloterdijk,duration,Duration,308,176,0,31m,8,0s,1h 3m,2d 4h 33m
Rotterdam Centraal,duration,Duration,612,226,0,6m,15,0s,42m,13d 7h 1m
Schiphol Airport,duration,Duration,493,220,0,6m,14,0s,48m,13d 7h 1m
Heerenveen,duration,Duration,33,32,0,24m,2,2m,51m,14h 44m
Wolvega,duration,Duration,31,29,0,21m,2,2m,51m,14h 44m
Heerenveen IJsstadion,duration,Duration,26,26,0,24m,1,2m,50m,14h 44m
Aachen Hbf,duration,Duration,33,31,0,9m,2,3m,10h 43m,7d 16h 26m
Eygelshoven Markt,duration,Duration,73,64,0,6m,3,2m,5h 29m,9d 13h 51m
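
The `min`, `median` and `max` columns can be reproduced for a single station with the standard library, since `kotlin.time.Duration` is comparable. This is a sketch: `DurationSummary` and `summarize` are hypothetical names, and averaging the two middle values for an even count is one common convention that may differ from what `describe()` does.

```kotlin
import kotlin.time.Duration
import kotlin.time.Duration.Companion.minutes

data class DurationSummary(val count: Int, val min: Duration, val median: Duration, val max: Duration)

// Summarise a non-empty list of durations. For an even number of values
// the median is taken as the mean of the two middle values.
fun summarize(durations: List<Duration>): DurationSummary {
    require(durations.isNotEmpty()) { "Need at least one duration" }
    val sorted = durations.sorted()
    val mid = sorted.size / 2
    val median = if (sorted.size % 2 == 1) sorted[mid] else (sorted[mid - 1] + sorted[mid]) / 2
    return DurationSummary(sorted.size, sorted.first(), median, sorted.last())
}

fun main() {
    // Three durations taken from the first rows of the table.
    println(summarize(listOf(24.minutes, 864.minutes, 35.minutes)))
}
```

The gap between median and maximum is what makes the counts alone misleading: a single multi-day disruption dominates the maximum while the median stays around an hour.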


Interesting! We have another 'winner'.

I don't know about you, but this requires some visualization, doesn't it?

Let's use Kandy, as it has excellent integration with notebooks and DataFrame.

Let's take a look at the examples: https://kotlin.github.io/kandy/examples.html

Looks like a boxplot can best show the results of a top-10 of "worst" stations.
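
Which stations count as "worst" depends on the metric. The sketch below ranks made-up stations by a pluggable score, first by number of disruptions and then by number of disruptions times median duration. All names and values are hypothetical.

```kotlin
// Median of a non-empty list of durations in minutes.
fun median(values: List<Int>): Double {
    require(values.isNotEmpty()) { "Need at least one value" }
    val sorted = values.sorted()
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) sorted[mid].toDouble() else (sorted[mid - 1] + sorted[mid]) / 2.0
}

// Rank stations by a caller-supplied score and keep the top n names.
fun topStations(data: Map<String, List<Int>>, n: Int, score: (List<Int>) -> Double): List<String> =
    data.entries
        .sortedByDescending { score(it.value) }
        .take(n)
        .map { it.key }

fun main() {
    // Made-up durations in minutes per station.
    val sample = mapOf(
        "Station A" to listOf(5, 6, 7, 8),  // many short disruptions
        "Station B" to listOf(300, 600),    // few long disruptions
    )
    println(topStations(sample, 1) { it.size.toDouble() })    // [Station A]
    println(topStations(sample, 1) { it.size * median(it) })  // [Station B]
}
```

Switching the score changes the winner, which is a good reason to look at the full distribution per station rather than a single number.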

In [46]:
%use kandy

In [47]:
val top10 = byStation.sortByGroupDesc {
    count()
//    durationMinutes.mean()
//    count() * durationMinutes.median()
//    count() * durationMinutes.mean()
}.filter { it.index() < 10 }

top10

stationNames,group
Rotterdam Centraal,DataFrame [612 x 12] (showing only top 5 of 612 rows)
Schiphol Airport,DataFrame [493 x 12] (showing only top 5 of 493 rows)

id,lines,linesId,stationNames,stationCodes,statisticalCause,causeGroup,startTime,endTime,durationMinutes,duration,date
45999,[Amsterdam Centraal - Schiphol Airpor...,"[24, 32]",Rotterdam Centraal,"[ASD, ASDL, ASS, RTD, SHL]",points failure,infrastructure,2023-01-01T08:19:26,2023-01-01T22:43:08,864,14h 24m,2023-01-01
46007,[Breda - Rotterdam Centraal (HSL)],[15],Rotterdam Centraal,"[BD, RTD]",stranded train,rolling stock,2023-01-02T11:50:11,2023-01-02T12:25:39,35,35m,2023-01-02
46008,[Amsterdam Centraal - Schiphol Airpor...,"[24, 32]",Rotterdam Centraal,"[ASD, ASDL, ASS, RTD, SHL]",broken down train,rolling stock,2023-01-02T12:40:11,2023-01-02T13:08:08,28,28m,2023-01-02
46022,[Rotterdam Centraal - Schiphol Airpor...,[24],Rotterdam Centraal,"[RTD, SHL]",broken down train,rolling stock,2023-01-03T14:54:42,2023-01-03T15:02:54,8,8m,2023-01-03
46030,[Breda - Rotterdam Centraal (HSL)],[15],Rotterdam Centraal,"[BD, RTD]",person on the railway track,external,2023-01-04T09:15:07,2023-01-04T12:37:35,202,3h 22m,2023-01-04

id,lines,linesId,stationNames,stationCodes,statisticalCause,causeGroup,startTime,endTime,durationMinutes,duration,date
45999,[Amsterdam Centraal - Schiphol Airpor...,"[24, 32]",Schiphol Airport,"[ASD, ASDL, ASS, RTD, SHL]",points failure,infrastructure,2023-01-01T08:19:26,2023-01-01T22:43:08,864,14h 24m,2023-01-01
46008,[Amsterdam Centraal - Schiphol Airpor...,"[24, 32]",Schiphol Airport,"[ASD, ASDL, ASS, RTD, SHL]",broken down train,rolling stock,2023-01-02T12:40:11,2023-01-02T13:08:08,28,28m,2023-01-02
46022,[Rotterdam Centraal - Schiphol Airpor...,[24],Schiphol Airport,"[RTD, SHL]",broken down train,rolling stock,2023-01-03T14:54:42,2023-01-03T15:02:54,8,8m,2023-01-03
46040,[Rotterdam Centraal - Schiphol Airpor...,[24],Schiphol Airport,"[RTD, SHL]",disruption elsewhere,logistical,2023-01-05T16:42:56,2023-01-05T17:17:38,35,35m,2023-01-05
46071,[Amsterdam Centraal - Schiphol Airpor...,"[24, 32]",Schiphol Airport,"[ASD, ASDL, ASS, RTD, SHL]",broken down train,rolling stock,2023-01-09T18:37:38,2023-01-09T19:08:17,31,31m,2023-01-09

id,lines,linesId,stationNames,stationCodes,statisticalCause,causeGroup,startTime,endTime,durationMinutes,duration,date
45999,[Amsterdam Centraal - Schiphol Airpor...,"[24, 32]",Amsterdam Centraal,"[ASD, ASDL, ASS, RTD, SHL]",points failure,infrastructure,2023-01-01T08:19:26,2023-01-01T22:43:08,864,14h 24m,2023-01-01
46008,[Amsterdam Centraal - Schiphol Airpor...,"[24, 32]",Amsterdam Centraal,"[ASD, ASDL, ASS, RTD, SHL]",broken down train,rolling stock,2023-01-02T12:40:11,2023-01-02T13:08:08,28,28m,2023-01-02
46011,[Amsterdam Centraal - Utrecht Centraal],[136],Amsterdam Centraal,"[AC, ASA, ASB, ASD, ASDM, ASHD, BKL, ...",an animal on the railway track,external,2023-01-02T16:05:42,2023-01-02T17:29:03,83,1h 23m,2023-01-02
46018,"[Amersfoort - Amsterdam Centraal, Ams...","[135, 145]",Amsterdam Centraal,"[ASD, ASDM, ASSP, DMN, WP]",damaged overhead wires,infrastructure,2023-01-03T05:57:26,2023-01-03T06:15:27,18,18m,2023-01-03
46061,[Amsterdam Centraal - Lelystad Centrum],[145],Amsterdam Centraal,"[ALM, ALMM, ASD, ASDM, ASSP, DMN, WP,...",broken down train,rolling stock,2023-01-07T14:22:30,2023-01-07T15:28:09,66,1h 6m,2023-01-07

id,lines,linesId,stationNames,stationCodes,statisticalCause,causeGroup,startTime,endTime,durationMinutes,duration,date
46011,[Amsterdam Centraal - Utrecht Centraal],[136],Utrecht Centraal,"[AC, ASA, ASB, ASD, ASDM, ASHD, BKL, ...",an animal on the railway track,external,2023-01-02T16:05:42,2023-01-02T17:29:03,83,1h 23m,2023-01-02
46043,[Den Haag Centraal - Utrecht Centraal...,"[142, 143, 147]",Utrecht Centraal,"[UT, UTT, VTN, WD, UTLR]",points failure,infrastructure,2023-01-05T18:25:42,2023-01-05T18:55:42,30,30m,2023-01-05
46046,[Almere Oostvaarders - Utrecht Centra...,"[40, 149]",Utrecht Centraal,"[HOR, HVS, HVSP, UT, UTO]",person on the railway track,external,2023-01-05T20:33:52,2023-01-05T21:27:54,54,54m,2023-01-05
46054,[Den Haag Centraal - Utrecht Centraal...,"[142, 143]",Utrecht Centraal,"[GD, GDG, GVC, UT, UTT, VB, VTN, WD, ...",broken down train,rolling stock,2023-01-06T13:11:54,2023-01-06T16:04:51,173,2h 53m,2023-01-06
46067,[Amsterdam Centraal - Utrecht Centraal],[136],Utrecht Centraal,"[AC, ASA, ASB, ASD, ASDM, ASHD, BKL, ...",copper theft,external,2023-01-09T05:53:36,2023-01-09T09:40:58,227,3h 47m,2023-01-09

id,lines,linesId,stationNames,stationCodes,statisticalCause,causeGroup,startTime,endTime,durationMinutes,duration,date
45999,[Amsterdam Centraal - Schiphol Airpor...,"[24, 32]",Amsterdam Sloterdijk,"[ASD, ASDL, ASS, RTD, SHL]",points failure,infrastructure,2023-01-01T08:19:26,2023-01-01T22:43:08,864,14h 24m,2023-01-01
46008,[Amsterdam Centraal - Schiphol Airpor...,"[24, 32]",Amsterdam Sloterdijk,"[ASD, ASDL, ASS, RTD, SHL]",broken down train,rolling stock,2023-01-02T12:40:11,2023-01-02T13:08:08,28,28m,2023-01-02
46071,[Amsterdam Centraal - Schiphol Airpor...,"[24, 32]",Amsterdam Sloterdijk,"[ASD, ASDL, ASS, RTD, SHL]",broken down train,rolling stock,2023-01-09T18:37:38,2023-01-09T19:08:17,31,31m,2023-01-09
46111,[Amsterdam Centraal - Schiphol Airpor...,"[24, 32]",Amsterdam Sloterdijk,"[ASD, ASDL, ASS, RTD, SHL]",broken down train,rolling stock,2023-01-12T11:30:01,2023-01-12T12:04:43,35,35m,2023-01-12
46117,[Amsterdam Centraal - Schiphol Airpor...,"[24, 32]",Amsterdam Sloterdijk,"[ASD, ASDL, ASS, RTD, SHL]",broken down train,rolling stock,2023-01-12T13:21:48,2023-01-12T16:26:20,185,3h 5m,2023-01-12

id,lines,linesId,stationNames,stationCodes,statisticalCause,causeGroup,startTime,endTime,durationMinutes,duration,date
46007,[Breda - Rotterdam Centraal (HSL)],[15],Breda,"[BD, RTD]",stranded train,rolling stock,2023-01-02T11:50:11,2023-01-02T12:25:39,35,35m,2023-01-02
46030,[Breda - Rotterdam Centraal (HSL)],[15],Breda,"[BD, RTD]",person on the railway track,external,2023-01-04T09:15:07,2023-01-04T12:37:35,202,3h 22m,2023-01-04
46037,[Breda - Tilburg],[68],Breda,"[BD, GZ, TB, TBR, TBU]",collision,accidents,2023-01-04T23:18:57,2023-01-05T00:35:02,76,1h 16m,2023-01-04
46074,[Breda - Rotterdam Centraal (HSL)],[15],Breda,"[BD, RTD]",broken down train,rolling stock,2023-01-09T22:02:46,2023-01-09T22:02:59,0,0s,2023-01-09
46168,[Breda - Tilburg],[68],Breda,"[BD, GZ, TB, TBR, TBU]",broken down train,rolling stock,2023-01-16T15:08:50,2023-01-16T15:15:52,7,7m,2023-01-16

id,lines,linesId,stationNames,stationCodes,statisticalCause,causeGroup,startTime,endTime,durationMinutes,duration,date
46013,[Arnhem Centraal - Nijmegen],[59],Arnhem Centraal,"[AH, AHZ, EST, NM, NML]",broken down train,rolling stock,2023-01-02T18:25:24,2023-01-02T18:33:15,8,8m,2023-01-02
46038,[Arnhem Centraal - Zutphen],[132],Arnhem Centraal,"[AH, AHP, AHPR, BMN, DR, RH, VP, ZP]",collision,accidents,2023-01-05T00:50:35,2023-01-05T04:38:45,228,3h 48m,2023-01-05
46055,[Arnhem Centraal - Zutphen],[132],Arnhem Centraal,"[AH, AHP, AHPR, BMN, DR, RH, VP, ZP]",broken down train,rolling stock,2023-01-06T13:21:58,2023-01-06T13:28:34,7,7m,2023-01-06
46090,[Arnhem Centraal - Nijmegen],[59],Arnhem Centraal,"[AH, AHZ, EST, NM, NML]",signal failure,infrastructure,2023-01-10T22:03:40,2023-01-11T01:07:47,184,3h 4m,2023-01-10
46092,"[Arnhem Centraal - Düsseldorf Hbf, Ar...","[129, 178]",Arnhem Centraal,"[AH, AHP, DVN, EM, ZV, WTV, EL]",railway problems abroad,logistical,2023-01-11T04:50:46,2023-01-11T05:04:57,14,14m,2023-01-11

id,lines,linesId,stationNames,stationCodes,statisticalCause,causeGroup,startTime,endTime,durationMinutes,duration,date
46060,[Kampen - Zwolle],[49],Zwolle,"[KPN, ZL, ZLSH]",staffing problems,staff,2023-01-07T09:15:50,2023-01-07T10:31:07,75,1h 15m,2023-01-07
46064,"[Groningen - Zwolle, Leeuwarden - Zwo...","[146, 160]",Zwolle,"[MP, ZL]",collision,accidents,2023-01-07T20:35:58,2023-01-08T01:51:56,316,5h 16m,2023-01-07
46069,"[Groningen - Zwolle, Leeuwarden - Zwo...","[146, 160]",Zwolle,"[MP, ZL]",people on the railway track,external,2023-01-09T09:49:42,2023-01-09T10:08:09,18,18m,2023-01-09
46094,[Kampen - Zwolle],[49],Zwolle,"[KPN, ZL, ZLSH]",signal failure,infrastructure,2023-01-11T10:48,2023-01-11T11:32:39,45,45m,2023-01-11
46110,[Deventer - Zwolle],[94],Zwolle,"[DV, OST, WH, ZL]",broken down train,rolling stock,2023-01-12T11:06:03,2023-01-12T11:24:44,19,19m,2023-01-12

id,lines,linesId,stationNames,stationCodes,statisticalCause,causeGroup,startTime,endTime,durationMinutes,duration,date
46027,[Haarlem - Leiden Centraal],[23],Leiden Centraal,"[HAD, HIL, HLM, LEDN, VH]",broken down train,rolling stock,2023-01-04T08:21:58,2023-01-04T09:22:26,60,1h,2023-01-04
46041,[Leiden Centraal - Schiphol Airport],[22],Leiden Centraal,"[HFD, LEDN, NVP, SSH]",person on the railway track,external,2023-01-05T17:24:27,2023-01-05T18:17:43,53,53m,2023-01-05
46072,[Haarlem - Leiden Centraal],[23],Leiden Centraal,"[HAD, HIL, HLM, LEDN, VH]",collision,accidents,2023-01-09T19:33:45,2023-01-09T19:34:54,1,1m,2023-01-09
46079,[Den Haag Centraal - Leiden Centraal],[169],Leiden Centraal,"[DVNK, GVC, GVM, LAA, LEDN, VST]",people on the railway track,external,2023-01-10T12:06:01,2023-01-10T12:08:17,2,2m,2023-01-10
46086,[Leiden Centraal - Utrecht Centraal],[147],Leiden Centraal,"[APN, LDL, LEDN]",an emergency call,external,2023-01-10T17:59:52,2023-01-10T18:06:33,7,7m,2023-01-10

id,lines,linesId,stationNames,stationCodes,statisticalCause,causeGroup,startTime,endTime,durationMinutes,duration,date
46004,[Amersfoort - Ede-Wageningen],[47],Amersfoort Centraal,"[AMF, BNC, BNN, ED, EDC, LTN, HVL, BNZ]",broken down train,rolling stock,2023-01-02T06:36:39,2023-01-02T07:28:16,52,52m,2023-01-02
46056,[Amersfoort - Apeldoorn],[50],Amersfoort Centraal,"[AMF, APD]",broken down train,rolling stock,2023-01-06T17:03:42,2023-01-06T18:15:46,72,1h 12m,2023-01-06
46057,[Amersfoort - Ede-Wageningen],[47],Amersfoort Centraal,"[AMF, BNC, BNN, HVL, BNZ]",broken down train,rolling stock,2023-01-06T17:13:34,2023-01-06T18:26:51,73,1h 13m,2023-01-06
46065,[Amersfoort - Apeldoorn],[50],Amersfoort Centraal,"[AMF, APD]",collision,accidents,2023-01-07T22:29:52,2023-01-08T01:52:01,202,3h 22m,2023-01-07
46083,[Amersfoort - Utrecht Centraal],[134],Amersfoort Centraal,"[AMF, BHV, DLD, UT, UTO]",collision,accidents,2023-01-10T14:29:32,2023-01-10T16:54:18,145,2h 25m,2023-01-10


In [48]:
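// Box plot of disruption duration (in minutes) for each station
top10.boxplot {
    x(stationNames named "name")
    y(durationMinutes)
}.configure {
    // Log scale: durations range from about a minute to many hours.
    // Note that a 0-minute duration has no finite log10 value.
    y { scale = continuous(transform = Transformation.LOG10) }

    layout {
        size = 1000 to 500
    }
}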

## Do I have the right to complain about Dutch trains in a demo?