## Do I have the right to complain about Dutch trains in a demo?
As a student, I had to take the train from Breda to Eindhoven and back multiple times a week.
I experienced a lot of delays, but can we visualize them?

I collected all disruption data from 2011 to 2023 from rijdendetreinen.nl into an SQLite database, which we can also read with DataFrame!

In [None]:
// Add the SQLite JDBC driver before importing DataFrame
USE { dependencies("org.xerial:sqlite-jdbc:3.45.1.0") }

In [None]:
%use dataframe, kandy

First we configure a connection to the database, then we read the table we need.

In [None]:
val connection = DatabaseConfiguration(
    url = "jdbc:sqlite:data/disruptions/disruptions-history.sqlite",
)

In [None]:
val df1 = DataFrame.readSqlTable(
    dbConfig = connection,
    tableName = "disruptions",
    limit = Int.MAX_VALUE,
)

df1

The table has 50k rows, which is not too large for DataFrame. If your table has millions of rows, though, it may be better to load only the result of a query into the DataFrame. DataFrame holds its data in memory, which is what makes it so flexible and lets it infer the type-safe accessors.

Let's try this with weather-related disruptions and see in which months most of them occur.

In [None]:
val weatherDisruptions = DataFrame.readSqlQuery(
    dbConfig = connection,
    sqlQuery = """SELECT * FROM disruptions WHERE cause_group = 'weather'""",
    limit = Int.MAX_VALUE,
).parse().renameToCamelCase()

weatherDisruptions

In [None]:
import kotlinx.datetime.Month

val month by column<Month>()

val weatherDisruptionsCounts = weatherDisruptions
    .groupBy { expr { startTime.month } into month }
    .count()
    .sortBy(month)

weatherDisruptionsCounts

In [None]:
weatherDisruptionsCounts.countPlot { x(month); weight(count) }

But back to the entire table! Let's parse it similarly to how we did in the other notebook, just a bit more quickly (and unsafely).

In [None]:
import kotlin.time.Duration.Companion.minutes

/**
 * A line consists of two stations where firstStation
 * is always alphabetically first.
 */
data class Line private constructor(
    val firstStation: String,
    val secondStation: String,
) {
    companion object {
        operator fun invoke(station: String, otherStation: String): Line {
            val (a, b) = listOf(station, otherStation).sorted()
            return Line(a, b)
        }

        fun parseOrNull(rdtString: String): Line? {
            val stations = rdtString.split(" - ")
            return invoke(
                stations.getOrNull(0) ?: return null,
                stations.getOrNull(1) ?: return null,
            )
        }
    }

    override fun toString(): String = "$firstStation <-> $secondStation"
}

val allDisruptions = df1
    .parse() // parse string columns
    .renameToCamelCase()
    .remove { "nsLines" and nameEndsWith("Nl") and "causeEn" } // remove unnecessary columns
    .update { "durationMinutes"<String?>() }.where { it.isNullOrBlank() }.withZero() // imputing blank durations
    .add { // adding helper columns
        "duration" from { "durationMinutes"<String>().toInt().minutes }
        "date" from { "startTime"<LocalDateTime>().date }
    }
    .rename { all() }.into { // renaming
        it.name
            .removePrefix("rdt")
            .replaceFirstChar { it.lowercase() }
            .removeSuffix("En")
    }
    .split("lines", "linesId", "stationNames", "stationCodes").by { // splitting list-like string columns into lists
        (it as String?)
            .takeUnless { it.isNullOrBlank() }
            ?.let { it.split(",") }
            ?: emptyList()
    }.inplace()
    .convert { "linesId"<List<String>>() }.with { it.map { it.toInt() } } // converting linesId to List<Int>
    .convert { "lines"<List<String>>() }.with { it.mapNotNull { Line.parseOrNull(it) } } // converting lines to List<Line>
    .sortBy("startTime") // sort :)

allDisruptions
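
Because `Line` sorts its two stations on creation, the direction of travel does not matter when comparing lines. A minimal sketch of that idea follows. `StationPair` and its `of` factory are hypothetical names, used here only so the sketch does not clash with the notebook's own `Line` class:

```kotlin
// Hypothetical stand-in for the notebook's Line class (illustration only).
data class StationPair private constructor(val first: String, val second: String) {
    companion object {
        // Sort the two names so that both travel directions produce equal instances.
        fun of(station: String, otherStation: String): StationPair {
            val (a, b) = listOf(station, otherStation).sorted()
            return StationPair(a, b)
        }
    }
}

val outbound = StationPair.of("Tilburg", "Eindhoven")
val inbound = StationPair.of("Eindhoven", "Tilburg")

println(outbound == inbound)         // true: data-class equality compares the sorted fields
println(outbound in listOf(inbound)) // true: `in` on a List relies on equals()
```

This is why a membership check against a list of lines should work no matter in which order the two stations were written.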

Now that we've got all disruptions, let's find the ones related to me.
I used to take the InterCity from Breda to Eindhoven and back. It stopped in Tilburg too.
I'll count a disruption if it affected either Breda <-> Tilburg or Tilburg <-> Eindhoven.


In [None]:
val bredaTilburg = Line("Breda", "Tilburg")
val tilburgEindhoven = Line("Tilburg", "Eindhoven")

val relatedToMe = allDisruptions.filter {
    bredaTilburg in lines || tilburgEindhoven in lines
}

relatedToMe

We've got 11k results! That's a lot.
Let's plot them over time and see what types of disruptions were most common.

In [None]:
val monthAndYear by column<LocalDate>()
val count by column<Int>()

// helper DataFrame with a single column containing all months from 2011 to 2023 as LocalDate(year, month, 1)
val dates = buildList {
    for (year in 2011..2023) {
        for (month in 1..12) {
            add(LocalDate(year = year, monthNumber = month, dayOfMonth = 1))
        }
    }
}.toColumn(monthAndYear)
    .toDataFrame()

val relatedGrouped = relatedToMe

    // group by causeGroup and expression (temp) column monthAndYear to count the disruptions
    .groupBy {
        causeGroup and expr {
            LocalDate(year = date.year, month = date.month, dayOfMonth = 1)
        }.into(monthAndYear)
    }.count(count.name())

    // group just by causeGroup and make sure each group has a count value for each date
    .groupBy { causeGroup }.updateGroups {
        val causeGroupName = causeGroup.first()
        rightJoin(dates) // matches monthAndYear
            .fillNulls { causeGroup }.with { causeGroupName }
            .fillNulls(count).with { 0 }
            .sortBy { monthAndYear }
    }

relatedGrouped
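
The right join with `dates` followed by `fillNulls` gives every cause group a row for every month, with a count of 0 where nothing happened. A stacked area chart generally wants a value for each series at every x position, which is presumably why this step is here. The same zero-filling idea can be sketched with plain Kotlin collections. The sample counts and the shortened six-month range below are made up for illustration:

```kotlin
import java.time.YearMonth

// Hypothetical sparse counts: only months with at least one disruption appear.
val sparseCounts = mapOf(
    YearMonth.of(2011, 1) to 3,
    YearMonth.of(2011, 4) to 1,
)

// Every month from January to June 2011 (a shortened range, for illustration).
val allMonths = (1..6).map { YearMonth.of(2011, it) }

// Keep every month, falling back to 0 where no count was recorded.
val denseCounts = allMonths.associateWith { sparseCounts[it] ?: 0 }

println(denseCounts.values.toList()) // [3, 0, 0, 1, 0, 0]
```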

In [None]:
relatedGrouped.plot {
    x.axis.breaks(format = "%B %Y")
    
    area {
        x(monthAndYear)
        y(count)
        fillColor(causeGroup)
        position = Position.stack()
        borderLine.width = 0.5
    }

    layout.size = 1000 to 700
}

## So... yes? 

## Now, will you get to complain about Dutch trains this evening too?