Commit

add more documentation
Lilith Hafner authored and Lilith Hafner committed Jun 3, 2023
1 parent 0e1d923 commit fdc5253
Showing 2 changed files with 46 additions and 2 deletions.
40 changes: 38 additions & 2 deletions docs/src/examples.md
@@ -542,6 +542,42 @@ col1;col2;col3
file = CSV.File(IOBuffer(data); delim=';', decimal=',')
```

## [Thousands separator](@id thousands_example)

```julia
using CSV

# In many places in the world, digits to the left of the decimal place are broken into
# groups by a thousands separator. We can ignore those separators by passing the `groupmark`
# keyword argument.
data = """
x y
1 2
2 1,729
3 87,539,319
"""

file = CSV.File(IOBuffer(data); groupmark=',')
```
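As a rough sketch of how the parsed result might be inspected, the snippet below reads the same data and looks at the `y` column. It spells out `delim=' '` as an assumption, so that the comma can only be interpreted as a groupmark; the variable names `y` and `total` are illustrative only.

```julia
using CSV

data = """
x y
1 2
2 1,729
3 87,539,319
"""

# Assumption: the delimiter is given explicitly instead of relying on detection.
file = CSV.File(IOBuffer(data); delim=' ', groupmark=',')

y = file.y        # the `y` column, with groupmarks dropped during parsing
eltype(y)         # expected to be an integer type rather than a string type
total = sum(y)    # arithmetic works because the values were parsed as numbers
```

Columns of a `CSV.File` can be accessed as properties, so `file.y` is enough to check that the separators did not leave the column as strings.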
## [Custom groupmarks](@id groupmark_example)
```julia
using CSV
# In some contexts, separators other than thousands separators group digits in a number.
# `groupmark` supports ignoring them as long as the separator character is ASCII.
data = """
name;ssn;credit card number
Ayodele Beren;597-21-8366;5538-6111-0574-2633
Trinidad Shiori;387-35-5126;3017-9300-0776-5301
Ori Cherokee;731-12-4606;4682-5416-0636-3877
"""
file = CSV.File(IOBuffer(data); groupmark='-')
```

## [Custom bool strings](@id truestrings_example)

```julia
@@ -577,11 +613,11 @@ data = """
file = CSV.File(IOBuffer(data); header=false)
file = CSV.File(IOBuffer(data); header=false, delim=' ', types=Float64)

# as a last step, if you want to convert this to a Matrix, you can first read the file in as a DataFrame and then
# chain it into a Matrix
using DataFrames
A = file|>DataFrame|>Matrix

# another alternative is to simply use CSV.Tables.matrix and say
B = file|>CSV.Tables.matrix # does not require DataFrames
```
8 changes: 8 additions & 0 deletions docs/src/reading.md
@@ -162,6 +162,14 @@ An ASCII `Char` argument that is used when parsing float values that indicates w
### Examples
* [Custom decimal separator](@ref decimal_example)

## [`groupmark` / thousands separator](@id groupmark)

A "groupmark" is a symbol that separates groups of digits so that a number is easier for humans to read. Thousands separators are a common example of groupmarks. The argument `groupmark`, if provided, must be an ASCII `Char`, which will be ignored during parsing when it occurs between two digits on the left-hand side of the decimal separator. For example, the groupmark in the integer `1,729` is `','` and the groupmark in the US social security number `875-39-3196` is `'-'`. By default, `groupmark=nothing`, which indicates that there are no stray characters separating digits.

### Examples
* [Thousands separator](@ref thousands_example)
* [Custom groupmarks](@ref groupmark_example)
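The "between two digits, left of the decimal" rule described above can be sketched in plain Julia. The helper `strip_groupmarks` below is hypothetical and exists only for illustration; it is not part of CSV.jl and is not how the parser is implemented, since the parser skips groupmarks while reading rather than rewriting strings.

```julia
# Hypothetical helper, for illustration only; CSV.jl does not provide this function.
function strip_groupmarks(s::AbstractString; groupmark::Char=',', decimal::Char='.')
    chars = collect(s)
    out = Char[]
    seen_decimal = false
    for (i, c) in enumerate(chars)
        seen_decimal |= (c == decimal)
        # a groupmark only counts when it has a digit on both sides
        between_digits = 1 < i < length(chars) &&
                         isdigit(chars[i - 1]) && isdigit(chars[i + 1])
        if c == groupmark && !seen_decimal && between_digits
            continue  # drop the groupmark
        end
        push!(out, c)
    end
    return String(out)
end

strip_groupmarks("87,539,319")                   # "87539319"
parse(Int, strip_groupmarks("1,729"))            # 1729
strip_groupmarks("875-39-3196"; groupmark='-')   # "875393196"
strip_groupmarks("1,234.5,6")                    # "1234.5,6" (comma after the decimal is kept)
```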

## [`truestrings` / `falsestrings`](@id truestrings)

These arguments can be provided as `Vector{String}` to specify custom values that should be treated as the `Bool` `true`/`false` values for all the columns of a data input. By default, `["true", "True", "TRUE", "T", "1"]` string values are used to detect `true` values, and `["false", "False", "FALSE", "F", "0"]` string values are used to detect `false` values. Note that even though `"1"` and `"0"` _can_ be used to parse `true`/`false` values, in terms of _auto_ detecting column types, those values will be parsed as `Int64` first, instead of `Bool`. To instead parse those values as `Bool`s for a column, you can manually provide that column's type as `Bool` (see the [type](@ref types) argument).
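A minimal sketch of the `"1"`/`"0"` behavior described above, assuming a small made-up input with the column names `id` and `flag`: the first read relies on automatic type detection, and the second supplies the column type by hand through `types`.

```julia
using CSV

# Made-up data for illustration; `id` and `flag` are hypothetical column names.
data = """
id,flag
1,1
2,0
"""

# With type detection, the text above says 1/0 are tried as integers first.
detected = CSV.File(IOBuffer(data))
eltype(detected.flag)

# Supplying the column type manually asks for the same values to be parsed as Bool.
forced = CSV.File(IOBuffer(data); types=Dict(:flag => Bool))
eltype(forced.flag)
```

Passing a `Dict` keyed by column name keeps the override limited to one column, so the `id` column is still detected automatically.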