115 changes: 65 additions & 50 deletions README.md
@@ -1,4 +1,5 @@
# Kotlin DataFrame: typesafe in-memory structured data processing for JVM

[![JetBrains incubator project](https://jb.gg/badges/incubator.svg)](https://confluence.jetbrains.com/display/ALL/JetBrains+on+GitHub)
[![Kotlin component beta stability](https://img.shields.io/badge/project-beta-kotlin.svg?colorA=555555&colorB=DB3683&label=&logo=kotlin&logoColor=ffffff&logoWidth=10)](https://kotlinlang.org/docs/components-stability.html)
[![Kotlin](https://img.shields.io/badge/kotlin-2.0.20-blue.svg?logo=kotlin)](http://kotlinlang.org)
@@ -8,29 +9,37 @@
[![GitHub License](https://img.shields.io/badge/license-Apache%20License%202.0-blue.svg?style=flat)](http://www.apache.org/licenses/LICENSE-2.0)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/Kotlin/dataframe/HEAD)

Kotlin DataFrame aims to reconcile Kotlin's static typing with the dynamic nature of data by utilizing both the full power of the Kotlin language and the opportunities provided by intermittent code execution in Jupyter notebooks and REPL.
Kotlin DataFrame aims to reconcile Kotlin's static typing with the dynamic nature of data by utilizing both the full
power of the Kotlin language and the opportunities provided by intermittent code execution in Jupyter notebooks and
REPL.

* **Hierarchical** — represents hierarchical data structures, such as JSON or a tree of JVM objects.
* **Functional** — the data processing pipeline is organized in a chain of `DataFrame` transformation operations.
* **Immutable** — every operation returns a new instance of `DataFrame` reusing underlying storage wherever it's possible.
* **Immutable** — every operation returns a new instance of `DataFrame` reusing underlying storage wherever it's
possible.
* **Readable** — data transformation operations are defined in DSL close to natural language.
* **Practical** — provides simple solutions for common problems and the ability to perform complex tasks.
* **Minimalistic** — simple, yet powerful data model of three column kinds.
* **Interoperable** — convertible with Kotlin data classes and collections. This also means conversion to/from other libraries' data structures is usually quite straightforward!
* **Interoperable** — convertible with Kotlin data classes and collections. This also means conversion to/from other
libraries' data structures is usually quite straightforward!
* **Generic** — can store objects of any type, not only numbers or strings.
* **Typesafe** — on-the-fly [generation of extension properties](https://kotlin.github.io/dataframe/extensionpropertiesapi.html) for type safe data access with Kotlin-style care for null safety.
* **Polymorphic** — type compatibility derives from column schema compatibility. You can define a function that requires a special subset of columns in a dataframe but doesn't care about other columns.
* **Typesafe** —
on-the-fly [generation of extension properties](https://kotlin.github.io/dataframe/extensionpropertiesapi.html) for
type safe data access with Kotlin-style care for null safety.
* **Polymorphic** — type compatibility derives from column schema compatibility. You can define a function that requires
a special subset of columns in a dataframe but doesn't care about other columns.
In notebooks this works out-of-the-box. In ordinary projects this requires casting (for now).
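
The polymorphism described above can be sketched roughly as follows. This is a minimal illustration, not code from the README: the `Named` schema and the `upperCaseNames` function are hypothetical names, and the sketch assumes a plain project without the compiler plugin, so it uses string-based column access and an explicit `cast`.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical schema: the function below only requires a "name" column.
@DataSchema
interface Named {
    val name: String
}

// Accepts any dataframe typed as Named, regardless of its other columns.
fun DataFrame<Named>.upperCaseNames(): List<String> =
    this["name"].values().map { it.toString().uppercase() }

fun main() {
    // The frame has an extra "age" column that the function does not care about.
    val people = dataFrameOf("name", "age")("Alice", 30, "Bob", 25)

    // In an ordinary project the schema is applied with an explicit cast.
    println(people.cast<Named>().upperCaseNames())
}
```

In a notebook the cast would typically be unnecessary, since the schema is inferred after each cell runs.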

Integrates with [Kotlin Notebook](https://kotlinlang.org/docs/kotlin-notebook-overview.html).
Inspired by [krangl](https://github.com/holgerbrandl/krangl), Kotlin Collections and [pandas](https://pandas.pydata.org/)
Integrates with [Kotlin Notebook](https://kotlinlang.org/docs/kotlin-notebook-overview.html).
Inspired by [krangl](https://github.com/holgerbrandl/krangl), Kotlin Collections
and [pandas](https://pandas.pydata.org/)

## 🚀 Quickstart

Looking for a fast and simple way to learn the basics?
Get started in minutes with our [Quickstart Guide](https://kotlin.github.io/dataframe/quickstart.html).

It walks you through the core features of Kotlin DataFrame with minimal setup and clear examples
It walks you through the core features of Kotlin DataFrame with minimal setup and clear examples
— perfect for getting up to speed in just a few minutes.

[![quickstart_preview](docs/StardustDocs/images/guides/quickstart_preview.png)](https://kotlin.github.io/dataframe/quickstart.html)
@@ -54,7 +63,7 @@ You could find the following articles there:

### What's new

1.0.0-Beta2: [Release notes](https://github.com/Kotlin/dataframe/releases/tag/v1.0.0-Beta2)
1.0.0-Beta3: [Release notes](https://github.com/Kotlin/dataframe/releases/tag/v1.0.0-Beta3)

Check out this [notebook with new features](examples/notebooks/feature_overviews/0.15/new_features.ipynb) in v0.15.

@@ -66,7 +75,7 @@ Check out this [notebook with new features](examples/notebooks/feature_overviews
### Kotlin Notebook

You can use Kotlin DataFrame in [Kotlin Notebook](https://kotlinlang.org/docs/kotlin-notebook-overview.html),
or other interactive environment with [Kotlin Jupyter Kernel](https://github.com/Kotlin/kotlin-jupyter) support,
or other interactive environment with [Kotlin Jupyter Kernel](https://github.com/Kotlin/kotlin-jupyter) support,
such as [Datalore](https://datalore.jetbrains.com/),
and [Jupyter Notebook](https://jupyter.org/).

@@ -90,7 +99,7 @@ Or manually specify the version:
%use dataframe($dataframe_version)
```

Refer to the
Refer to the
[Get started with Kotlin DataFrame in Kotlin Notebook](https://kotlin.github.io/dataframe/gettingstartedkotlinnotebook.html)
for details.

@@ -100,7 +109,7 @@ Add dependencies in the build.gradle.kts script:

```kotlin
dependencies {
implementation("org.jetbrains.kotlinx:dataframe:1.0.0-Beta2")
implementation("org.jetbrains.kotlinx:dataframe:1.0.0-Beta3")
}
```

@@ -115,7 +124,7 @@ repositories {
Refer to the
[Get started with Kotlin DataFrame on Gradle](https://kotlin.github.io/dataframe/gettingstartedgradle.html)
for details.
Also, check out the [custom setup page](https://kotlin.github.io/dataframe/gettingstartedgradleadvanced.html)
Also, check out the [custom setup page](https://kotlin.github.io/dataframe/gettingstartedgradleadvanced.html)
if you don't need some formats as dependencies,
for Groovy, and for configurations specific to Android projects.

@@ -124,32 +133,32 @@ for Groovy, and for configurations specific to Android projects.
This is an example of Kotlin DataFrame code with
the [Compiler Plugin](https://kotlin.github.io/dataframe/compiler-plugin.html) enabled.
See [the full project](https://github.com/Kotlin/dataframe/tree/master/examples/kotlin-dataframe-plugin-example).
See also
See also
[this example in Kotlin Notebook](https://github.com/Kotlin/dataframe/tree/master/examples/notebooks/readme_example.ipynb).

```kotlin
val repos = DataFrame
// Read DataFrame from the CSV file.
.readCsv("https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv")
// And convert it to match the `Repositories` schema.
.convertTo<Repositories>()
// Read DataFrame from the CSV file.
.readCsv("https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv")
// And convert it to match the `Repositories` schema.
.convertTo<Repositories>()

// Update the DataFrame.
val reposUpdated = repos
// Rename columns to CamelCase.
.renameToCamelCase()
// Rename "stargazersCount" column to "stars".
.rename { stargazersCount }.into("stars")
// Filter by the number of stars:
.filter { stars > 50 }
// Convert values in the "topic" column (which were `String` initially)
// to the list of topics.
.convert { topics }.with {
val inner = it.removeSurrounding("[", "]")
// Rename columns to CamelCase.
.renameToCamelCase()
// Rename "stargazersCount" column to "stars".
.rename { stargazersCount }.into("stars")
// Filter by the number of stars:
.filter { stars > 50 }
// Convert values in the "topic" column (which were `String` initially)
// to the list of topics.
.convert { topics }.with {
val inner = it.removeSurrounding("[", "]")
if (inner.isEmpty()) emptyList() else inner.split(',').map(String::trim)
}
// Add a new column with the number of topics.
.add("topicCount") { topics.size }
}
// Add a new column with the number of topics.
.add("topicCount") { topics.size }

// Write the updated DataFrame to a CSV file.
reposUpdated.writeCsv("jetbrains_repositories_new.csv")
@@ -158,39 +167,45 @@ reposUpdated.writeCsv("jetbrains_repositories_new.csv")
Explore [**more examples here**](https://kotlin.github.io/dataframe/guides-and-examples.html).

## Data model

* `DataFrame` is a list of columns with equal sizes and distinct names.
* `DataColumn` is a named list of values. Can be one of three kinds:
* `ValueColumn` — contains data
* `ColumnGroup` — contains columns
* `FrameColumn` — contains dataframes
* `ValueColumn` — contains data
* `ColumnGroup` — contains columns
* `FrameColumn` — contains dataframes
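
As a rough sketch of how the three column kinds can arise in practice, assuming the `dataFrameOf`, `group`, `groupBy`, `toDataFrame` and `kind` calls from the library's `api` package (the column names and values are invented for illustration):

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    // Three plain value columns.
    val df = dataFrameOf("name", "city", "zip")(
        "Alice", "London", "E1",
        "Bob", "Paris", "75001",
    )

    // Grouping columns under a new parent produces a column group.
    val grouped = df.group("city", "zip").into("address")
    println(grouped["name"].kind())    // kind of a plain column
    println(grouped["address"].kind()) // kind of the grouped column

    // Materializing a groupBy produces a column whose cells are dataframes
    // (assumed to be named "group" by default).
    val nested = df.groupBy("city").toDataFrame()
    println(nested["group"].kind())
}
```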

## Visualizations

[Kandy](https://kotlin.github.io/kandy/welcome.html) plotting library provides seamless visualizations
[Kandy](https://kotlin.github.io/kandy/welcome.html) plotting library provides seamless visualizations
for your dataframes.

![kandy_preview](docs/StardustDocs/images/guides/kandy_gallery_preview.png)

## Kotlin, Kotlin Jupyter, Arrow, and JDK versions

This table shows the mapping between main library component versions and minimum supported Java versions.

| Kotlin DataFrame Version | Minimum Java Version | Kotlin Version | Kotlin Jupyter Version | Apache Arrow version |
|--------------------------|----------------------|----------------|------------------------|----------------------|
| 0.10.0 | 8 | 1.8.20 | 0.11.0-358 | 11.0.0 |
| 0.10.1 | 8 | 1.8.20 | 0.11.0-358 | 11.0.0 |
| 0.11.0 | 8 | 1.8.20 | 0.11.0-358 | 11.0.0 |
| 0.11.1 | 8 | 1.8.20 | 0.11.0-358 | 11.0.0 |
| 0.12.0 | 8 | 1.9.0 | 0.11.0-358 | 11.0.0 |
| 0.12.1 | 8 | 1.9.0 | 0.11.0-358 | 11.0.0 |
| 0.13.1 | 8 | 1.9.22 | 0.12.0-139 | 15.0.0 |
| 0.14.1 | 8 | 2.0.20 | 0.12.0-139 | 17.0.0 |
| 0.15.0 | 8 | 2.0.20 | 0.12.0-139 | 18.1.0 |
| 1.0.0-Beta2 | 8 / 11 | 2.0.20 | 0.12.0-383 | 18.1.0 |
This table shows the mapping between main library component versions and minimum supported Java versions, along with
other recommended versions.

| Kotlin DataFrame Version | Minimum Java Version | Kotlin Version | Kotlin Jupyter Version | Apache Arrow Version | Compiler Plugin Version | Compatible Kandy version |
|--------------------------|----------------------|----------------|------------------------|----------------------|-------------------------|--------------------------|
| 0.10.0 | 8 | 1.8.20 | 0.11.0-358 | 11.0.0 | | |
| 0.10.1 | 8 | 1.8.20 | 0.11.0-358 | 11.0.0 | | |
| 0.11.0 | 8 | 1.8.20 | 0.11.0-358 | 11.0.0 | | |
| 0.11.1 | 8 | 1.8.20 | 0.11.0-358 | 11.0.0 | | |
| 0.12.0 | 8 | 1.9.0 | 0.11.0-358 | 11.0.0 | | |
| 0.12.1 | 8 | 1.9.0 | 0.11.0-358 | 11.0.0 | | |
| 0.13.1 | 8 | 1.9.22 | 0.12.0-139 | 15.0.0 | | |
| 0.14.1 | 8 | 2.0.20 | 0.12.0-139 | 17.0.0 | | |
| 0.15.0 | 8 | 2.0.20 | 0.12.0-139 | 18.1.0 | | 0.8.0 |
| 1.0.0-Beta2 | 8 / 11 | 2.0.20 | 0.12.0-383 | 18.1.0 | 2.2.20-dev-3524 | 0.8.1-dev-66 |
| 1.0.0-Beta3n (notebooks) | 8 / 11 | 2.2.20 | 0.15.0-587 (K1 only) | 18.3.0 | - | 0.8.1n |
| 1.0.0-Beta3 | 8 / 11 | 2.2.20 | 0.15.0-587 | 18.3.0 | 2.2.20 / IDEA 2025.2+ | 0.8.1 |

Review comment (Collaborator): Great idea for the time being (it's probably also worth adding the same information to our main documentation).

## Code of Conduct

This project and the corresponding community are governed by the [JetBrains Open Source and Community Code of Conduct](https://confluence.jetbrains.com/display/ALL/JetBrains+Open+Source+and+Community+Code+of+Conduct). Please make sure you read it.
This project and the corresponding community are governed by
the [JetBrains Open Source and Community Code of Conduct](https://confluence.jetbrains.com/display/ALL/JetBrains+Open+Source+and+Community+Code+of+Conduct).
Please make sure you read it.

## License

3 changes: 3 additions & 0 deletions dataframe-jupyter/build.gradle.kts
@@ -16,6 +16,7 @@ group = "org.jetbrains.kotlinx"
repositories {
// geo repository should come before Maven Central
maven("https://repo.osgeo.org/repository/release")
maven("https://packages.jetbrains.team/maven/p/kds/kotlin-ds-maven")
mavenCentral()
}

@@ -35,6 +36,8 @@ dependencies {

testImplementation(projects.dataframeJupyter)
testImplementation(projects.dataframeGeoJupyter)
testImplementation(libs.kandy.notebook)
testImplementation(libs.kandy.stats)

testImplementation(libs.kotestAssertions) {
exclude("org.jetbrains.kotlin", "kotlin-stdlib-jdk8")
@@ -7,7 +7,7 @@ import org.jetbrains.kotlinx.jupyter.testkit.ReplProvider
abstract class DataFrameJupyterTest :
JupyterReplTestCase(
ReplProvider.forLibrariesTesting(
libraries = setOf("dataframe", "kandy-geo"),
libraries = setOf("dataframe", "kandy-geo", "kandy"),
extraCompilerArguments = listOf(
"-Xopt-in=kotlin.time.ExperimentalTime",
"-Xopt-in=kotlin.uuid.ExperimentalUuidApi",
2 changes: 1 addition & 1 deletion docs/StardustDocs/resources/guides/quickstart.ipynb
@@ -50,7 +50,7 @@
},
"cell_type": "code",
"source": [
"val df = DataFrame.readCSV(\n",
"val df = DataFrame.readCsv(\n",
" \"https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv\"\n",
")"
],
4 changes: 2 additions & 2 deletions docs/StardustDocs/topics/setup/Modules.md
@@ -512,7 +512,7 @@ To enable the plugin in your Gradle project, add it to the `plugins` section:

```kotlin
plugins {
kotlin("plugin.dataframe") version "2.2.20-Beta1"
kotlin("plugin.dataframe") version "%compilerPluginKotlinVersion%"
}
```

@@ -522,7 +522,7 @@ plugins {

```groovy
plugins {
id 'org.jetbrains.kotlin.plugin.dataframe' version '2.2.20-Beta1'
id 'org.jetbrains.kotlin.plugin.dataframe' version '%compilerPluginKotlinVersion%'
}
```

4 changes: 2 additions & 2 deletions docs/StardustDocs/topics/setup/SetupAndroid.md
@@ -109,7 +109,7 @@ To enable the plugin in your Gradle project, add it to the `plugins` section:

```kotlin
plugins {
kotlin("plugin.dataframe") version "2.2.20-Beta1"
kotlin("plugin.dataframe") version "%compilerPluginKotlinVersion%"
}
```

@@ -119,7 +119,7 @@ plugins {

```groovy
plugins {
id 'org.jetbrains.kotlin.plugin.dataframe' version '2.2.20-Beta1'
id 'org.jetbrains.kotlin.plugin.dataframe' version '%compilerPluginKotlinVersion%'
}
```

4 changes: 2 additions & 2 deletions docs/StardustDocs/topics/setup/SetupGradle.md
@@ -103,7 +103,7 @@ To enable the plugin in your Gradle project, add it to the `plugins` section:

```kotlin
plugins {
kotlin("plugin.dataframe") version "2.2.20-Beta1"
kotlin("plugin.dataframe") version "%compilerPluginKotlinVersion%"
}
```

@@ -113,7 +113,7 @@ plugins {

```groovy
plugins {
id 'org.jetbrains.kotlin.plugin.dataframe' version '2.2.20-Beta1'
id 'org.jetbrains.kotlin.plugin.dataframe' version '%compilerPluginKotlinVersion%'
}
```

7 changes: 7 additions & 0 deletions docs/StardustDocs/topics/setup/SetupKotlinNotebook.md
@@ -71,6 +71,13 @@ You can explicitly define the version you want:
Or use the latest stable version of Kotlin DataFrame
(specified in [Kotlin Jupyter descriptors](https://github.com/Kotlin/kotlin-jupyter-libraries)):

<warning>
For version `1.0.0-Beta3`, in notebooks use version `1.0.0-Beta3n` instead.
This uses the patch of [#1435](https://github.com/Kotlin/dataframe/pull/1435) for issue
[#1116](https://github.com/Kotlin/dataframe/issues/1116), avoiding `DefinitelyNotNullable` errors.

When using `%use dataframe` this version is applied automatically.
</warning>

Review comment on the `1.0.0-Beta3n` line (Collaborator): Is it possible to make it a template with `dfVersion`?

Reply (Collaborator Author): Hopefully we don't need that here. This is only a temporary thing that hopefully only applies to Beta3.

```
%useLatestDescriptors
4 changes: 2 additions & 2 deletions docs/StardustDocs/v.list
@@ -1,6 +1,6 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE vars SYSTEM "https://resources.jetbrains.com/writerside/1.0/vars.dtd">
<vars>
<var name="dataFrameVersion" value="1.0.0-Beta2" type="string"/>
<var name="compilerPluginKotlinVersion" value="2.2.20-Beta1" type="string"/>
<var name="dataFrameVersion" value="1.0.0-Beta3" type="string"/>
<var name="compilerPluginKotlinVersion" value="2.2.20" type="string"/>
</vars>
10 changes: 6 additions & 4 deletions examples/android-example/app/build.gradle.kts
@@ -4,7 +4,9 @@ plugins {
alias(libs.plugins.android.application)
alias(libs.plugins.kotlin.android)
alias(libs.plugins.kotlin.compose)
kotlin("plugin.dataframe") version "2.2.20-Beta2"

// DataFrame Compiler plugin, matching the Kotlin version
alias(libs.plugins.dataframe)
}

android {
@@ -66,9 +68,9 @@ dependencies {
// Core Kotlin DataFrame API, JSON and CSV IO.
// See custom Gradle setup:
// https://kotlin.github.io/dataframe/setupcustomgradle.html
implementation("org.jetbrains.kotlinx:dataframe-core:1.0.0-dev-8314")
implementation("org.jetbrains.kotlinx:dataframe-json:1.0.0-dev-8314")
implementation("org.jetbrains.kotlinx:dataframe-csv:1.0.0-dev-8314")
implementation("org.jetbrains.kotlinx:dataframe-core:1.0.0-Beta3")
implementation("org.jetbrains.kotlinx:dataframe-json:1.0.0-Beta3")
implementation("org.jetbrains.kotlinx:dataframe-csv:1.0.0-Beta3")
// You can add any additional IO modules you like, except for 'dataframe-arrow'.
// Apache Arrow is not supported well on Android.
}
3 changes: 2 additions & 1 deletion examples/android-example/gradle/libs.versions.toml
@@ -1,6 +1,6 @@
[versions]
agp = "8.11.1"
kotlin = "2.2.20-Beta2"
kotlin = "2.2.20"
coreKtx = "1.10.1"
junit = "4.13.2"
junitVersion = "1.1.5"
@@ -29,4 +29,5 @@ androidx-material3 = { group = "androidx.compose.material3", name = "material3"
android-application = { id = "com.android.application", version.ref = "agp" }
kotlin-android = { id = "org.jetbrains.kotlin.android", version.ref = "kotlin" }
kotlin-compose = { id = "org.jetbrains.kotlin.plugin.compose", version.ref = "kotlin" }
dataframe = { id = "org.jetbrains.kotlin.plugin.dataframe", version.ref = "kotlin" }

8 changes: 5 additions & 3 deletions examples/kotlin-dataframe-plugin-example/build.gradle.kts
@@ -2,8 +2,10 @@ import org.jlleitschuh.gradle.ktlint.KtlintExtension

plugins {
id("org.jlleitschuh.gradle.ktlint") version "12.3.0"
kotlin("jvm") version "2.2.20-Beta2"
kotlin("plugin.dataframe") version "2.2.20-Beta2"

val kotlinVersion = "2.2.20"
kotlin("jvm") version kotlinVersion
kotlin("plugin.dataframe") version kotlinVersion
}

group = "org.example"
@@ -15,7 +17,7 @@ repositories {
}

dependencies {
implementation("org.jetbrains.kotlinx:dataframe:1.0.0-Beta2")
implementation("org.jetbrains.kotlinx:dataframe:1.0.0-Beta3")
testImplementation(kotlin("test"))
}
