## 0.15: New Features

- Experimental new CSV parser based on [Deephaven-CSV](https://github.com/deephaven/deephaven-csv)
- Experimental new `GeoDataFrame` class for working with geographical data (from GeoJson/Shapefile) and plotting it with [Kandy](https://github.com/Kotlin/kandy)
- Custom SQL Database support by passing the `dbType` parameter to read functions
- Full `BigInteger` support
- Improved parsing

In [1]:
// loading dependencies for the SQL examples
// this needs to be called before importing dataframe itself
USE {
    dependencies {
        implementation("com.h2database:h2:2.3.232")
        implementation("com.mysql:mysql-connector-j:9.1.0")
    }
}

In [2]:
%useLatestDescriptors

// you can enable the new experimental modules (in notebooks) in the following way:
%use dataframe(v=0.15.0, enableExperimentalCsv=true, enableExperimentalGeo=true)

Enabling experimental CSV module: dataframe-csv
Enabling experimental Geo module: dataframe-geo


### Experimental new CSV parser based on Deephaven-CSV

DataFrame's CSV parsing has been based on [Apache Commons CSV](https://commons.apache.org/proper/commons-csv/) from the beginning. While this has been sufficient for most applications, it had some issues: it could run out of memory on large files, performance was limited, and our API lacked clarity, documentation, and completeness.
([Related issue #827](https://github.com/Kotlin/dataframe/issues/827))

For DataFrame 0.15, we introduce a new separate package [`org.jetbrains.kotlinx:dataframe-csv`](https://central.sonatype.com/artifact/org.jetbrains.kotlinx/dataframe-csv) which tries to solve all these issues at once. It's based on [Deephaven-CSV](https://github.com/deephaven/deephaven-csv) which makes it faster and more memory efficient. And since we built it from the ground up, we made sure the API was complete, predictable, and documented carefully.

To try it yourself, explicitly add the dependency [`org.jetbrains.kotlinx:dataframe-csv`](https://central.sonatype.com/artifact/org.jetbrains.kotlinx/dataframe-csv) to your project. In notebooks you can add `enableExperimentalCsv=true` to the %use-magic, as seen above.
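
For a Gradle project, adding the dependency might look like the following sketch. It assumes the Gradle Kotlin DSL and version `0.15.0` (the version used in the `%use` line of this notebook); adjust both to your setup.

```kotlin
// build.gradle.kts
dependencies {
    // assumption: 0.15.0, matching the version used in this notebook
    implementation("org.jetbrains.kotlinx:dataframe-csv:0.15.0")
}
```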

Given a large CSV file, such as the one below, running out of memory is now less likely (though still possible):

In [6]:
// Old csv function:
DataFrame.readCSV(
    "../../../../dataframe-csv/src/test/resources/largeCsv.csv.gz",
)

java.lang.OutOfMemoryError: Ran out of memory reading this CSV-like file. You can try our new experimental CSV reader by adding the dependency "org.jetbrains.kotlinx:dataframe-csv:{VERSION}" and using `DataFrame.readCsv()` instead of `DataFrame.readCSV()`.

In [3]:
// New csv function:
DataFrame.readCsv(
    "../../../../dataframe-csv/src/test/resources/largeCsv.csv.gz",
)

Year,Age,Ethnic,Sex,Area,count
2018,0,1,1,1,795
2018,0,1,1,2,5067
2018,0,1,1,3,2229
2018,0,1,1,4,1356
2018,0,1,1,5,180
2018,0,1,1,6,738
2018,0,1,1,7,630
2018,0,1,1,8,1188
2018,0,1,1,9,2157
2018,0,1,1,12,177


40 million rows! Not bad, right?

We can now read this file because Deephaven-CSV parses columns directly to the target type, like `Int` or `Double`, instead of first reading and storing everything as a `String` and parsing it afterwards. This saves both memory and running time. Deephaven published a [blog post](https://deephaven.io/blog/2022/02/23/csv-reader/) if you're curious about the specifics.

DataFrame still reads everything into (boxed) memory, so there are limits to the size of the file you can read, but now the CSV reader is not a limiting factor anymore.
(Check the "Max heap size" setting if you're running this notebook in IntelliJ and still run out of memory for large files.)
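
To illustrate why parsing directly to the target type helps, here is a simplified sketch in plain Kotlin. The function names are hypothetical and this is not how Deephaven-CSV is implemented internally (a real parser also avoids the intermediate substrings created by `split`); it only contrasts keeping a whole column of `String`s around with converting each cell as soon as it is read.

```kotlin
// Hypothetical helper names; a simplified illustration, not Deephaven-CSV's actual implementation.

/** String-first approach: keep every cell as a String, then convert in a second pass. */
fun parseIntColumnViaStrings(lines: Sequence<String>, columnIndex: Int): List<Int> {
    // the whole column is held in memory as Strings first
    val cells: List<String> = lines.map { it.split(',')[columnIndex] }.toList()
    // the second pass creates a list of boxed Ints
    return cells.map { it.toInt() }
}

/** Direct approach: convert each cell as it is read and store it in a primitive array. */
fun parseIntColumnDirectly(lines: Sequence<String>, columnIndex: Int, rowCount: Int): IntArray {
    val result = IntArray(rowCount)
    lines.forEachIndexed { row, line -> result[row] = line.split(',')[columnIndex].toInt() }
    return result
}

val sampleRows = listOf("2018,0,1,1,1,795", "2018,0,1,1,2,5067", "2018,0,1,1,3,2229")
val viaStrings = parseIntColumnViaStrings(sampleRows.asSequence(), columnIndex = 5)
val direct = parseIntColumnDirectly(sampleRows.asSequence(), columnIndex = 5, rowCount = sampleRows.size)
```

Both functions return the same numbers; the difference is how much temporary data is alive while the column is being built.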

Switching to the new API, in most cases, is as easy as swapping `readCSV` with `readCsv` (and `readTSV` with `readTsv`, etc.). However, there are a few differences in the API, so be sure to check the KDocs of the new functions.

Here's a small demonstration of the new API:

In [3]:
import java.util.Locale

DataFrame.readCsv(
    "../../../../dataframe-csv/src/test/resources/irisDataset.csv",
    delimiter = ',',

    // overwriting the given header
    header = listOf("sepalLength", "sepalWidth", "petalLength", "petalWidth", "species"),

    // skipping the first line in the file with old header
    skipLines = 1,

    // reading only 50 lines
    readLines = 50,

    // manually specifying the types of the columns, will be inferred otherwise
    colTypes = mapOf(
        "species" to ColType.String, // setting the type of the species column to String
        ColType.DEFAULT to ColType.Double, // setting type of all other columns to Double
    ),

    // manually specifying some parser options
    // Will be read from the global parser options `DataFrame.parser` otherwise
    parserOptions = ParserOptions(
        // setting the locale to US, uses `DataFrame.parser.locale` or `Locale.getDefault()` otherwise
        locale = Locale.US,
        // overriding null strings
        nullStrings = DEFAULT_DELIM_NULL_STRINGS + "nothing",
        // using the new faster double parser, true by default for readCsv
        useFastDoubleParser = true,
    ),

    // new! specifying the quote character
    quote = '\"',

    // specifying whether to ignore empty lines in between rows in the file, and plenty more options...
    ignoreEmptyLines = false,
    allowMissingColumns = true,
    ignoreSurroundingSpaces = true,
    trimInsideQuoted = false,
    parseParallel = true,
)

sepalLength,sepalWidth,petalLength,petalWidth,species
5.1,3.5,1.4,0.2,Setosa
4.9,3.0,1.4,0.2,Setosa
4.7,3.2,1.3,0.2,Setosa
4.6,3.1,1.5,0.2,Setosa
5.0,3.6,1.4,0.2,Setosa
5.4,3.9,1.7,0.4,Setosa
4.6,3.4,1.4,0.3,Setosa
5.0,3.4,1.5,0.2,Setosa
4.4,2.9,1.4,0.2,Setosa
4.9,3.1,1.5,0.1,Setosa


Since Deephaven-CSV supports it, we can now also read multi-space separated files, like logs ([Relevant issue #746](https://github.com/Kotlin/dataframe/issues/746)):

In [4]:
DataFrame.readDelimStr(
    """
    NAME                     STATUS   AGE      LABELS
    argo-events              Active   2y77d    app.kubernetes.io/instance=argo-events,kubernetes.io/metadata.name=argo-events
    argo-workflows           Active   2y77d    app.kubernetes.io/instance=argo-workflows,kubernetes.io/metadata.name=argo-workflows
    argocd                   Active   5y18d    kubernetes.io/metadata.name=argocd
    beta                     Active   4y235d   kubernetes.io/metadata.name=beta
    """.trimIndent(),
    hasFixedWidthColumns = true,
)

NAME,STATUS,AGE,LABELS
argo-events,Active,2y77d,app.kubernetes.io/instance=argo-event...
argo-workflows,Active,2y77d,app.kubernetes.io/instance=argo-workf...
argocd,Active,5y18d,kubernetes.io/metadata.name=argocd
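
One plausible way to split such space-padded text is to derive the column start positions from the header row and slice every other row at those positions. The sketch below does this in plain Kotlin; the function names are hypothetical and it is not a description of how Deephaven-CSV or `hasFixedWidthColumns` work internally. It also assumes header names contain no spaces.

```kotlin
/** Finds the start index of every column: each position where a non-space follows a space (or the line start). */
fun columnStarts(header: String): List<Int> =
    header.indices.filter { i -> header[i] != ' ' && (i == 0 || header[i - 1] == ' ') }

/** Slices a row at the given start positions and trims the padding. */
fun splitFixedWidth(row: String, starts: List<Int>): List<String> =
    starts.mapIndexed { i, start ->
        val end = if (i + 1 < starts.size) starts[i + 1] else row.length
        if (start >= row.length) "" else row.substring(start, minOf(end, row.length)).trim()
    }

// padEnd keeps the column widths explicit: 11 characters for NAME, 9 for STATUS
val headerLine = "NAME".padEnd(11) + "STATUS".padEnd(9) + "AGE"
val dataLine = "argocd".padEnd(11) + "Active".padEnd(9) + "5y18d"

val starts = columnStarts(headerLine)
val cells = splitFixedWidth(dataLine, starts)
```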


We provide a single overload (with `InputStream`) that exposes the underlying implementation for when ours is not sufficient for your needs ([Relevant issue #787](https://github.com/Kotlin/dataframe/issues/787)).

In [4]:
import io.deephaven.csv.containers.ByteSlice
import io.deephaven.csv.tokenization.Tokenizer
import java.io.InputStream

DataFrame.readCsv(
    inputStream = File("../../../../dataframe-csv/src/test/resources/irisDataset.csv").inputStream(),
    adjustCsvSpecs = {
        this
            .headerLegalizer {
                it.map { it.lowercase().replace('.', '_') }.toTypedArray()
            }
            .customDoubleParser(object : Tokenizer.CustomDoubleParser {
                override fun parse(bs: ByteSlice?): Double = TODO("Not yet implemented")
                override fun parse(cs: CharSequence?): Double = TODO("Not yet implemented")
            })
            // etc..
    },
)

sepal_length,sepal_width,petal_length,petal_width,variety
5.1,3.5,1.4,0.2,Setosa
4.9,3.0,1.4,0.2,Setosa
4.7,3.2,1.3,0.2,Setosa
4.6,3.1,1.5,0.2,Setosa
5.0,3.6,1.4,0.2,Setosa
5.4,3.9,1.7,0.4,Setosa
4.6,3.4,1.4,0.3,Setosa
5.0,3.4,1.5,0.2,Setosa
4.4,2.9,1.4,0.2,Setosa
4.9,3.1,1.5,0.1,Setosa


Finally, we now support reading from ZIP files directly, along with GZIP (already demonstrated above) and custom compression formats ([Relevant issue #469](https://github.com/Kotlin/dataframe/issues/469)):

In [6]:
DataFrame.readCsv(
    "../../../../dataframe-csv/src/test/resources/testCSV.zip",
    // this can be manually specified, but is inferred automatically from the file extension
    // compression = Compression.Zip,
)

untitled,user_id,name,duplicate,username,duplicate1,duplicate11,double,number,time,empty
0,4,George,,abc,a,,1203.0,599.213,2021-01-07T15:12:32,
1,5,Paul,,paul,,,,214.211,2021-01-14T14:36:19,
2,8,Johnny,,qwerty,b,,20.0,412.214,2021-02-23T19:47,
3,10,Jack,,buk,,,2414.0,1.01,2021-03-08T23:38:52,
4,12,Samuel,,qwerty,,,inf,0.0,2021-04-01T02:30:22,


In [9]:
USE { dependencies("org.tukaani:xz:1.10", "org.apache.commons:commons-compress:1.27.1") }

In [10]:
import org.apache.commons.compress.archivers.tar.TarFile
import org.apache.commons.io.IOUtils
import org.apache.commons.compress.utils.SeekableInMemoryByteChannel

// custom compression format by specifying how to convert a compressed InputStream to a normal one
val tarCompression = Compression.Custom({ tarInputStream ->
    val tar = TarFile(SeekableInMemoryByteChannel(IOUtils.toByteArray(tarInputStream)))
    tar.getInputStream(tar.entries.first())
})

DataFrame.readCsv("irisDataset.tar", compression = tarCompression)

sepal.length,sepal.width,petal.length,petal.width,variety
5.1,3.5,1.4,0.2,Setosa
4.9,3.0,1.4,0.2,Setosa
4.7,3.2,1.3,0.2,Setosa
4.6,3.1,1.5,0.2,Setosa
5.0,3.6,1.4,0.2,Setosa
5.4,3.9,1.7,0.4,Setosa
4.6,3.4,1.4,0.3,Setosa
5.0,3.4,1.5,0.2,Setosa
4.4,2.9,1.4,0.2,Setosa
4.9,3.1,1.5,0.1,Setosa
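
The function passed to `Compression.Custom` only has to turn a compressed `InputStream` into a plain one. The following sketch shows that shape using only the JDK's GZIP classes, without DataFrame; the names `gunzip` and `gzip` are made up for this illustration.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.InputStream
import java.util.zip.GZIPInputStream
import java.util.zip.GZIPOutputStream

// The shape a custom compression needs: compressed stream in, plain stream out.
val gunzip: (InputStream) -> InputStream = { compressed -> GZIPInputStream(compressed) }

// Builds a small gzip-compressed payload in memory to try the function on.
fun gzip(text: String): ByteArray {
    val buffer = ByteArrayOutputStream()
    GZIPOutputStream(buffer).use { it.write(text.toByteArray(Charsets.UTF_8)) }
    return buffer.toByteArray()
}

val csvText = "a,b\n1,2\n"
val roundTripped = gunzip(ByteArrayInputStream(gzip(csvText)))
    .bufferedReader(Charsets.UTF_8)
    .use { it.readText() }
```

Any function with this signature, such as the TAR example above, can be plugged in the same way.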


Writing is also supported; it still uses Apache Commons CSV under the hood.
The API is similar to the reading API:

In [9]:
val irisDf = DataFrame.readCsv("../../../../dataframe-csv/src/test/resources/irisDataset.csv")

irisDf.writeCsv("irisDataset.csv")

Some options can be specified:

In [10]:
irisDf.writeDelim(
    path = "irisDataset.csv",
    delimiter = ';',
    includeHeader = false,
    quoteMode = QuoteMode.ALL,
    escapeChar = '\\',
    commentChar = '#',
    headerComments = listOf("This is a comment", "This is another comment"),
    recordSeparator = "\n",
)

Similarly, we have a single overload that exposes the underlying implementation:

In [11]:
irisDf.writeCsv(
    writer = File("irisDataset.csv").writer(),
    adjustCsvFormat = {
        this
            .setSkipHeaderRecord(true)
            .setHeader("sepalLength", "sepalWidth", "petalLength", "petalWidth", "species")
            .setTrailingData(true)
            .setNullString("null")
            // etc..
    },
)

### Experimental new GeoDataFrame with Kandy

[Kandy](https://github.com/Kotlin/kandy) v0.8 introduces geo-plotting, which allows you to visualize geospatial/geographical data using the awesome Kandy DSL. To make working with this geographical data (from GeoJson/Shapefile) easier, we happily accepted the [GeoDataFrame PR](https://github.com/Kotlin/dataframe/pull/909) from the Kandy team ([Relevant issue #875](https://github.com/Kotlin/dataframe/issues/875)).

To try it yourself, explicitly add the dependency [`org.jetbrains.kotlinx:dataframe-geo`](https://central.sonatype.com/artifact/org.jetbrains.kotlinx/dataframe-geo) to your project (with the repository `maven("https://repo.osgeo.org/repository/release")`) or add `enableExperimentalGeo=true` to the %use-magic, as seen at the start of the notebook.
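
In a Gradle project, this could look like the sketch below. It assumes the Gradle Kotlin DSL and version `0.15.0` (the version used in the `%use` line of this notebook); adjust both to your setup.

```kotlin
// build.gradle.kts
repositories {
    mavenCentral()
    // extra repository mentioned above, needed for the geo dependencies
    maven("https://repo.osgeo.org/repository/release")
}

dependencies {
    // assumption: 0.15.0, matching the version used in this notebook
    implementation("org.jetbrains.kotlinx:dataframe-geo:0.15.0")
}
```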

Then use `GeoDataFrame.readGeoJson()` or `GeoDataFrame.readShapeFile()` to get started!

In [11]:
USE {
    repositories("https://repo.osgeo.org/repository/release")
    dependencies("org.jetbrains.kotlinx:kandy-geo:0.8.0-dev-57")
}

Here's a small demonstration of the new API, reading and plotting a GeoJson file:

In [12]:
val usaGeo = GeoDataFrame.readGeoJson("https://echarts.apache.org/examples/data/asset/geo/USA.json")
usaGeo.df

name,geometry
Alabama,"POLYGON ((-87.359296 35.00118, -85.60..."
Alaska,MULTIPOLYGON (((-131.602021 55.117982...
Arizona,"POLYGON ((-109.042503 37.000263, -109..."
Arkansas,"POLYGON ((-94.473842 36.501861, -90.1..."
California,"POLYGON ((-123.233256 42.006186, -122..."
Colorado,"POLYGON ((-107.919731 41.003906, -105..."
Connecticut,"POLYGON ((-73.053528 42.039048, -71.7..."
Delaware,"POLYGON ((-75.414089 39.804456, -75.5..."
District of Columbia,"POLYGON ((-77.035264 38.993869, -76.9..."
Florida,"POLYGON ((-85.497137 30.997536, -85.0..."


In [13]:
usaGeo.plot { geoMap() }

Let's modify the `GeoDataFrame` a bit by adding some population data and plotting that too:

In [14]:
val usPopByState = DataFrame.readCsv("us_pop_by_state.csv")
usPopByState

rank,state,state_code,2020_census,percent_of_total
1.0,California,CA,39538223,0.1191
2.0,Texas,TX,29145505,0.0874
3.0,Florida,FL,21538187,0.0647
4.0,New York,NY,20201249,0.0586
5.0,Pennsylvania,PA,13002700,0.0386
6.0,Illinois,IL,12801989,0.0382
7.0,Ohio,OH,11799448,0.0352
8.0,Georgia,GA,10711908,0.032
9.0,North Carolina,NC,10439388,0.0316
10.0,Michigan,MI,10077331,0.0301


In [15]:
val usaGeoPopulation = usaGeo.modify {
    this.join(
        usPopByState.select { state and `2020_census`.named("population") },
    ) { name match right.state }
}
usaGeoPopulation.df

name,geometry,population
Alabama,"POLYGON ((-87.359296 35.00118, -85.60...",5024279
Alaska,MULTIPOLYGON (((-131.602021 55.117982...,733391
Arizona,"POLYGON ((-109.042503 37.000263, -109...",7151502
Arkansas,"POLYGON ((-94.473842 36.501861, -90.1...",3011524
California,"POLYGON ((-123.233256 42.006186, -122...",39538223
Colorado,"POLYGON ((-107.919731 41.003906, -105...",5773714
Connecticut,"POLYGON ((-73.053528 42.039048, -71.7...",3605944
Delaware,"POLYGON ((-75.414089 39.804456, -75.5...",989948
Florida,"POLYGON ((-85.497137 30.997536, -85.0...",21538187
Georgia,"POLYGON ((-83.109191 35.00118, -83.32...",10711908


In [16]:
usaGeoPopulation.plot {
    // crop out alaska and hawaii
    x.axis.limits = -130..-65
    y.axis.limits = 25..50

    geoMap {
        tooltips(name, population)
        fillColor(population) {
            scale = continuousColorViridis()
        }
        borderLine {
            width = 0.1
            color = Color.BLACK
        }
        alpha = 0.5
        layout.style(Style.Void)
    }
}

### Custom SQL Database support

Our JDBC-based SQL integration for DataFrame has become extensible!

This means that if you have an SQL database that we currently don't support, you can
create your own `DbType` instance and read from your database to a dataframe.
(Remember that we already support quite a few databases: MariaDB, PostgreSQL, MySQL, SQLite, MS SQL, and H2 (with dialects).)

To get started, we need a custom `DbType`.

For the sake of example, we'll create a custom DbType based on the `H2` Database. Ordinarily, you'd extend `DbType("jdbc name of your database")`.

In [6]:
import org.jetbrains.kotlinx.dataframe.io.db.*
import org.jetbrains.kotlinx.dataframe.schema.ColumnSchema
import java.sql.ResultSet
import kotlin.reflect.KType

object CustomDbType : H2(MySql) {

    /**
     * Represents the JDBC driver class name for a given database type.
     * Something like "org.h2.Driver".
     */
    override val driverClassName: String
        get() = super.driverClassName

    /**
     * Here you define which KType you expect the column to be based on [tableColumnMetadata].
     * This is mostly for special cases, as DataFrame can already infer most types
     * from the database automatically.
     *
     * Return `null` to let DataFrame figure out the type.
     */
    override fun convertSqlTypeToKType(tableColumnMetadata: TableColumnMetadata): KType? {
        return super.convertSqlTypeToKType(tableColumnMetadata)
    }

    /**
     * Similar to [convertSqlTypeToKType] but here you'll need to define a [ColumnSchema] for the column
     * based on [tableColumnMetadata].
     *
     * Return `null` to let DataFrame figure out the schema.
     */
    override fun convertSqlTypeToColumnSchemaValue(tableColumnMetadata: TableColumnMetadata): ColumnSchema? {
        return super.convertSqlTypeToColumnSchemaValue(tableColumnMetadata)
    }

    /**
     * Here you define where to get the table metadata for information about the database table,
     * including its name, schema name, and catalogue name.
     */
    override fun buildTableMetadata(tables: ResultSet): TableMetadata {
        return super.buildTableMetadata(tables)
    }

    /**
     * Return whether the table with metadata [tableMetadata] should be considered
     * a system table or not.
     *
     * System tables are skipped when reading.
     */
    override fun isSystemTable(tableMetadata: TableMetadata): Boolean {
        return super.isSystemTable(tableMetadata)
    }

    /**
     * Can be overridden to change how DataFrame limits queries for your specific database type.
     *
     * By default it executes: `"$sqlQuery LIMIT $limit"`
     */
    override fun sqlQueryLimit(sqlQuery: String, limit: Int): String {
        return super.sqlQueryLimit(sqlQuery, limit)
    }
}
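
As an illustration of why `sqlQueryLimit` might need overriding: not every SQL dialect understands `LIMIT`. The sketch below uses hypothetical standalone functions (not part of DataFrame) with the same `(sqlQuery, limit)` shape, showing the default form next to the SQL:2008 `FETCH FIRST` form that some databases expect instead.

```kotlin
// Hypothetical standalone functions mirroring the shape of `sqlQueryLimit(sqlQuery, limit)`.

/** The default behaviour described in the KDoc above. */
fun limitWithLimitClause(sqlQuery: String, limit: Int): String = "$sqlQuery LIMIT $limit"

/** SQL:2008 style, for dialects without a LIMIT clause. */
fun limitWithFetchFirst(sqlQuery: String, limit: Int): String = "$sqlQuery FETCH FIRST $limit ROWS ONLY"

val limitedQuery = limitWithLimitClause("SELECT * FROM Customer", 10)
val fetchFirstQuery = limitWithFetchFirst("SELECT * FROM Customer", 10)
```

In a custom `DbType`, the body of the second function is what you would return from the overridden `sqlQueryLimit`.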

Now that we have a custom `DbType` we can connect to our database (add some demo data) and retrieve it in a dataframe!

In [7]:
import org.intellij.lang.annotations.Language
import java.sql.DriverManager

val URL = "jdbc:h2:mem:test5;DB_CLOSE_DELAY=-1;MODE=MySQL;DATABASE_TO_UPPER=false"
val connection = DriverManager.getConnection(URL)

// insert some demo data
val statements = listOf(
    """
    CREATE TABLE Customer (
        id INT PRIMARY KEY,
        name VARCHAR(50),
        age INT
    )
    """.trimIndent(),
    """
    CREATE TABLE Sale (
        id INT PRIMARY KEY,
        customerId INT,
        amount DECIMAL(10, 2) NOT NULL
    )
    """.trimIndent(),
    "INSERT INTO Customer (id, name, age) VALUES (1, 'John', 40)",
    "INSERT INTO Customer (id, name, age) VALUES (2, 'Alice', 25)",
    "INSERT INTO Customer (id, name, age) VALUES (3, 'Bob', 47)",
    "INSERT INTO Customer (id, name, age) VALUES (4, NULL, NULL)",
    "INSERT INTO Sale (id, customerId, amount) VALUES (1, 1, 100.50)",
    "INSERT INTO Sale (id, customerId, amount) VALUES (2, 2, 50.00)",
    "INSERT INTO Sale (id, customerId, amount) VALUES (3, 1, 75.25)",
    "INSERT INTO Sale (id, customerId, amount) VALUES (4, 3, 35.15)",
)
statements.forEach { connection.createStatement().execute(it) }

In [8]:
// and read it :)
DataFrame.readSqlQuery(connection, "SELECT * FROM Customer", dbType = CustomDbType)

id,name,age
1,John,40.0
2,Alice,25.0
3,Bob,47.0
4,,


On the documentation website, you can find another [example](https://kotlin.github.io/dataframe/readSqlFromCustomDatabase.html) to support custom databases.
This time, it uses HSQLDB.

### `BigInteger` support

Java has support for arbitrarily large decimal and integer values: `BigDecimal` and `BigInteger`.
This is very helpful when working with huge numbers for which `Double` and `Long` are not big enough.
Maybe Kotlin will even gain its own representation [in the future](https://youtrack.jetbrains.com/issue/KT-20912/BigDecimal-BigInteger-types-in-Kotlin-stdlib)!

DataFrame has supported `BigDecimal` for a while, but it lacked `BigInteger` support. DataFrame 0.15 fixes that.
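
As a quick reminder of why `Long` is sometimes not enough, the following plain-Kotlin sketch (no DataFrame involved) shows `Long` arithmetic silently wrapping around where `BigInteger` keeps the exact value.

```kotlin
import java.math.BigInteger

val max: Long = Long.MAX_VALUE

// Long arithmetic silently wraps around on overflow...
val wrapped: Long = max + 1

// ...while BigInteger simply keeps growing.
val exact: BigInteger = max.toBigInteger() + BigInteger.ONE
```

Here `wrapped` ends up equal to `Long.MIN_VALUE`, while `exact` holds 9223372036854775808.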

Let's make a column with numbers so large that they can only be represented as `String`:

In [17]:
import java.math.BigInteger
import kotlin.math.abs
import kotlin.random.Random
import kotlin.random.nextLong
import kotlin.random.nextUInt

val largestLong = Long.MAX_VALUE.toString()
val giantNumberCol: DataColumn<String> by List(10) {
    largestLong + abs(Random.nextLong()).toString()
}.toColumn()

giantNumberCol

giantNumberCol
92233720368547758075410656510493956830
9223372036854775807217965090998628954
92233720368547758078683340303582155930
92233720368547758071113169877721439960
92233720368547758074721285846526496147
92233720368547758077540148037531502473
92233720368547758071050228182348021871
92233720368547758074780588281726437941
92233720368547758073855912892018637566
92233720368547758075753933533340159539


We now have overloads to convert/parse this column to `BigInteger`, just like for the other conversions.
This also allows us to perform mathematical operations with it!

In [18]:
val bigIntCol = giantNumberCol.convertToBigInteger()

DISPLAY(bigIntCol.type)

bigIntCol * -1.toBigInteger()

java.math.BigInteger

giantNumberCol
-92233720368547758075410656510493956830
-9223372036854775807217965090998628954
-92233720368547758078683340303582155930
-92233720368547758071113169877721439960
-92233720368547758074721285846526496147
-92233720368547758077540148037531502473
-92233720368547758071050228182348021871
-92233720368547758074780588281726437941
-92233720368547758073855912892018637566
-92233720368547758075753933533340159539


We also support conversions from/to `BigInteger`, both on the column itself, and when the column is inside a dataframe:

In [19]:
val df = bigIntCol.toDataFrame()
    .convert { bigIntCol }.toBigDecimal()

DISPLAY(df.schema())
df

giantNumberCol: java.math.BigDecimal

giantNumberCol
9223372036854775807541065651049395683...
9223372036854775807217965090998628954...
9223372036854775807868334030358215593...
9223372036854775807111316987772143996...
9223372036854775807472128584652649614...
9223372036854775807754014803753150247...
9223372036854775807105022818234802187...
9223372036854775807478058828172643794...
9223372036854775807385591289201863756...
9223372036854775807575393353334015953...


Finally, statistics also support `BigInteger`, as well as all other number types.

(`.describe()` now also works a bit better, supporting both `BigInteger` and columns with mixed number types ([Relevant issue #558](https://github.com/Kotlin/dataframe/issues/558)).
We'll continue to improve the statistics functions in the next releases.)

In [20]:
import java.math.BigDecimal

val bigDecimalCol: DataColumn<BigDecimal> by bigIntCol.convertTo()
val mixedNumberCol: DataColumn<Number> by bigIntCol.map {
    if (it % 2.toBigInteger() == 0.toBigInteger()) Random.nextDouble() else Random.nextInt()
}

dataFrameOf(
    bigIntCol named "bigIntCol",
    bigDecimalCol,
    mixedNumberCol,
).describe()

name,type,count,unique,nulls,top,freq,mean,std,min,median,max
bigIntCol,java.math.BigInteger,10,10,0,92233720368547758075410656510493956830,1,8393268553537846000000000000000000000...,2625017700921082500000000000000000000...,9223372036854775807217965090998628954,92233720368547758074750937064126467044,92233720368547758078683340303582155930
bigDecimalCol,java.math.BigDecimal,10,10,0,9223372036854775807541065651049395683...,1,8393268553537846000000000000000000000...,2625017700921082000000000000000000000...,9223372036854775807217965090998628954...,9223372036854775807475093706412646704...,9223372036854775807868334030358215593...
mixedNumberCol,Number,10,10,0,0.747226,1,-406581921.620887,834642739.757573,-1636052740.000000,0.146311,815386305.000000


### Improved Parsing

[Parsing](https://kotlin.github.io/dataframe/parse.html), in DataFrame, is a special case of [`convert`](https://kotlin.github.io/dataframe/convert.html).
It can convert `String` columns to any other supported type by guessing.
This can be done manually, by calling `.parse()` on a dataframe, but it also happens automatically when reading from textual data, like CSV.
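
To give a feel for what "guessing" means, here is a minimal plain-Kotlin sketch that tries a few target types in order and falls back to the original `String`. The function name is hypothetical and this is not DataFrame's implementation: it works per value, whereas a real parser settles on one type for the whole column and supports many more types.

```kotlin
/** Tries increasingly general types and falls back to the original String. */
fun guessValue(text: String): Any =
    text.toIntOrNull()
        ?: text.toLongOrNull()
        ?: text.toDoubleOrNull()
        ?: text.toBooleanStrictOrNull()
        ?: text

val guessed = listOf("42", "9999999999", "3.14", "true", "hello").map { guessValue(it) }
```

The order matters: trying `Double` before `Int` would turn every whole number into a `Double`.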

In DataFrame 0.15:
- The speed of parsing and guessing types has improved
- We gained support for parsing strings to `Char`
- We have a new experimental double parser

The new double parser is based on [FastDoubleParser](https://github.com/wrandelshofer/FastDoubleParser) and
can be enabled by setting `useFastDoubleParser = true` in the parser options.

In [21]:
// enabling the fast double parser globally can be done like
DataFrame.parser.apply {
    useFastDoubleParser = true
    // you can also set other global parsing options here
}

// or you can choose to enable it per call
// Each function that parses strings should have the `parserOptions` argument:
DataFrame.readDelimStr(
    text = """
           numbers
           0,12
           100.456,23
           1,00
           """.trimIndent(),
    delimiter = ';',
    parserOptions = ParserOptions(
        locale = Locale.GERMAN,
        useFastDoubleParser = true,
    )
)

numbers
0.12
100456.23
1.0


Our implementation of `FastDoubleParser` is configured to be very forgiving, depending on the locale.

For instance, in French, numbers are often formatted like "100 512,123", which contains a non-breaking space character (" ").
Many double parsers would fail on files that use normal spaces (" ") instead.

The same holds for the Estonian minus ("−") which is expected if your locale is set to Estonian ([Relevant issue #607](https://github.com/Kotlin/dataframe/issues/607)).

We now try to catch these cases and save you some headaches :).
Do not hesitate to provide feedback if you have a case that fails, and you think it should work!

In [22]:
val estonianNumbers by listOf(
    "12,45",
    "−13,35", // note the different minus sign '−' vs '-'
    "−204 235,23", // note the different minus sign '−' vs '-'
    "100 123,35", // space instead of NBSP
    "1,234e3",
    "-345,122", // 'ordinary' minus sign
).toColumn()

estonianNumbers.parse(ParserOptions(locale = Locale.forLanguageTag("et-EE"), useFastDoubleParser = true))

estonianNumbers
12.45
-13.35
-204235.23
100123.35
1234.0
-345.122
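
The idea behind this forgiving behaviour can be sketched in plain Kotlin by normalizing look-alike characters before parsing. The function below is a hypothetical illustration, hard-coded for locales that use a decimal comma; it is not how DataFrame or FastDoubleParser handle locales internally.

```kotlin
/** Replaces look-alike characters so the text can be parsed with the standard `toDouble()`. */
fun parseLenientDouble(text: String): Double =
    text
        .replace('\u2212', '-') // Unicode minus sign -> ASCII hyphen-minus
        .filterNot { it == ' ' || it == '\u00A0' || it == '\u202F' } // drop space, NBSP, narrow NBSP
        .replace(',', '.') // decimal comma -> decimal point
        .toDouble()

val lenientlyParsed = listOf(
    "12,45",
    "\u221213,35", // Unicode minus
    "\u2212204 235,23", // Unicode minus and a normal space as grouping separator
    "100 123,35",
    "1,234e3",
    "-345,122",
).map { parseLenientDouble(it) }
```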
