DataKit is a Swift package for loading, manipulating, and analysing datasets in memory. It is built around the DataSet and Column classes: a DataSet holds a collection of Column instances that together represent tabular data.
The DataSet class takes inspiration from the data frame types in R, Python's pandas, and Julia's DataFrames.jl.
DataKit is developed to explore machine learning algorithms in Swift. See TreeKit for decision tree classification built on DataKit.
Some guiding principles for the design of the package API are:
- Explicitly deal with data types throughout. Don't make any assumptions about what type is intended.
- Handle missing data explicitly as Swift optionals.
- Use Swift generics for common functionality, for example through generic Column methods.
- Use Swift subscripts to simplify accessing data.
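
The principles above can be illustrated with a minimal sketch. The `SketchColumn` type and its members are hypothetical names invented for this example and are not DataKit's actual API; the sketch only shows how optionals, generics, and subscripts can combine in a typed column.

```swift
// Hypothetical illustration only: SketchColumn is NOT part of DataKit.
struct SketchColumn<Element> {
    let name: String
    var values: [Element?]   // nil marks a missing value

    // Subscript access by row index; a missing value is returned as nil.
    subscript(row: Int) -> Element? {
        get { values[row] }
        set { values[row] = newValue }
    }

    // Number of missing (nil) entries in the column.
    var missingCount: Int {
        values.filter { $0 == nil }.count
    }

    // Generic transform that keeps missing values missing.
    func map<Output>(_ transform: (Element) -> Output) -> SketchColumn<Output> {
        SketchColumn<Output>(name: name, values: values.map { $0.map(transform) })
    }
}

extension SketchColumn where Element == Double {
    // Mean over the present values; nil when every value is missing.
    var mean: Double? {
        let present = values.compactMap { $0 }
        guard !present.isEmpty else { return nil }
        return present.reduce(0, +) / Double(present.count)
    }
}

let ages = SketchColumn<Double>(name: "age", values: [31, nil, 45])
let labels = ages.map { "\(Int($0)) years" }
```

Here `ages[1]` is `nil` rather than a sentinel such as `0` or `NaN`, so the caller has to decide how to treat the gap, and `ages.mean` is computed over the two present values only. The element type is fixed at construction, so nothing is inferred from the data.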
Install with the Swift Package Manager by adding DataKit as a dependency in your project's Package.swift file:
```swift
...
dependencies: [
    // Dependencies declare other packages that this package depends on.
    // .package(url: /* package url */, from: "1.0.0"),
    .package(url: "https://github.com/PeetV/DataKit.git", from: "0.3.0"),
],
targets: [
    // Targets are the basic building blocks of a package. A target can define a
    // module or a test suite.
    // Targets can depend on other targets in this package, and on products in
    // packages which this package depends on.
    .target(
        name: ...,
        dependencies: ["DataKit"]),
...
```
The docs folder contains interactive example playgrounds that can be run in Xcode. To read an example without Xcode, open the Contents.swift file inside the playground folder.
| Playground | Description |
| --- | --- |
| Overview.playground | General overview of the DataKit API. |
| Reshaping.playground | Examples of reshaping data, for example reformatting column content. |
- 0.1 Integer, Double, String and Bool column type datasets
- 0.2 Subscripts
- 0.3 Date column type
- 0.4 Column functions, including maths and string regex
- 0.5 Group by and join datasets