add more documentation

JuliaData · Jun 3, 2023 · fdc5253
1 parent 0e1d923

Showing 2 changed files with 46 additions and 2 deletions.
diff --git a/docs/src/examples.md b/docs/src/examples.md
@@ -542,6 +542,42 @@ col1;col2;col3
 file = CSV.File(IOBuffer(data); delim=';', decimal=',')
 ```
 
+## [Thousands separator](@id thousands_example)
+
+```julia
+using CSV
+
+# In many places in the world, digits to the left of the decimal point are broken into
+# groups by a thousands separator. We can ignore those separators by passing the `groupmark`
+# keyword argument.
+data = """
+x y
+1 2
+2 1,729
+3 87,539,319
+"""
+
+file = CSV.File(IOBuffer(data); delim=' ', groupmark=',')
+```
+
+
+## [Custom groupmarks](@id groupmark_example)
+
+```julia
+using CSV
+
+# In some contexts, separators other than thousands separators group digits in a number.
+# `groupmark` supports ignoring them, as long as the separator character is ASCII.
+data = """
+name;ssn;credit card number
+Ayodele Beren;597-21-8366;5538-6111-0574-2633
+Trinidad Shiori;387-35-5126;3017-9300-0776-5301
+Ori Cherokee;731-12-4606;4682-5416-0636-3877
+"""
+
+file = CSV.File(IOBuffer(data); groupmark='-')
+```
+
 ## [Custom bool strings](@id truestrings_example)
 
 ```julia
@@ -577,11 +613,11 @@ data = """
 file = CSV.File(IOBuffer(data); header=false)
 file = CSV.File(IOBuffer(data); header=false, delim=' ', types=Float64)
 
-# as a last step if you want to convert this to a Matrix, this can be done by reading in first as a DataFrame and then 
+# as a last step if you want to convert this to a Matrix, this can be done by reading in first as a DataFrame and then
 # function chaining to a Matrix
 using DataFrames
 A = file|>DataFrame|>Matrix
- 
+
 # another alternative is to simply use CSV.Tables.matrix and say
 B = file|>CSV.Tables.matrix # does not require DataFrames
 ```
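The examples in this file use `decimal` and `groupmark` separately. A minimal sketch of combining them for European-style numbers follows, assuming a CSV.jl release in which `groupmark` is available and may be used alongside `decimal` as long as the two characters differ. The column names and values are made up for illustration.

```julia
using CSV

# Hypothetical European-style data: '.' groups thousands and ',' marks the decimal.
# Assumes a CSV.jl release that supports `groupmark` together with `decimal`.
data = """
id;amount
1;1.729,5
2;87.539.319,25
"""

# `delim` is passed explicitly so the ';' separator is not left to auto-detection.
file = CSV.File(IOBuffer(data); delim=';', decimal=',', groupmark='.')

# The `amount` column should come back as Float64 values with the groupmarks ignored.
println(file.amount)
```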

diff --git a/docs/src/reading.md b/docs/src/reading.md
@@ -162,6 +162,14 @@ An ASCII `Char` argument that is used when parsing float values that indicates w
 ### Examples
   * [Custom decimal separator](@ref decimal_example)
 
+## [`groupmark` / thousands separator](@id groupmark)
+
+A "groupmark" is a symbol that separates groups of digits so that it is easier for humans to read a number. Thousands separators are a common example of groupmarks. The argument `groupmark`, if provided, must be an ASCII `Char`, which will be ignored during parsing when it occurs between two digits on the left-hand side of the decimal mark. For example, the groupmark in the integer `1,729` is `','`, and the groupmark in the US social security number `875-39-3196` is `'-'`. By default, `groupmark=nothing`, which indicates that there are no stray characters separating digits.
+
+### Examples
+  * [Thousands separator](@ref thousands_example)
+  * [Custom groupmarks](@ref groupmark_example)
+
 ## [`truestrings` / `falsestrings`](@id truestrings)
 
 These arguments can be provided as `Vector{String}` to specify custom values that should be treated as the `Bool` `true`/`false` values for all the columns of a data input. By default, `["true", "True", "TRUE", "T", "1"]` string values are used to detect `true` values, and `["false", "False", "FALSE", "F", "0"]` string values are used to detect `false` values. Note that even though `"1"` and `"0"` _can_ be used to parse `true`/`false` values, in terms of _auto_ detecting column types, those values will be parsed as `Int64` first, instead of `Bool`. To instead parse those values as `Bool`s for a column, you can manually provide that column's type as `Bool` (see the [type](@ref types) argument).
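The rule described in the `groupmark` section above (a groupmark is ignored only when it sits between two digits, and only to the left of the decimal mark) can be sketched in plain Julia. `drop_groupmarks` is a hypothetical helper written purely to illustrate that rule; it is not part of CSV.jl, and CSV.jl's own parser is not implemented this way.

```julia
# Hypothetical helper for illustration only; it is not part of CSV.jl.
# It drops `mark` when that character sits between two ASCII digits and
# no `decimal` character has been seen yet (i.e. left of the decimal mark).
function drop_groupmarks(s::AbstractString, mark::Char; decimal::Char='.')
    chars = collect(s)
    out = Char[]
    seen_decimal = false
    for (i, c) in enumerate(chars)
        c == decimal && (seen_decimal = true)
        # Bounds are checked first so chars[i-1] and chars[i+1] are always valid.
        between_digits = 1 < i < length(chars) &&
                         isdigit(chars[i-1]) && isdigit(chars[i+1])
        if c == mark && !seen_decimal && between_digits
            continue  # skip the groupmark
        end
        push!(out, c)
    end
    return String(out)
end

println(drop_groupmarks("1,729", ','))        # 1729
println(drop_groupmarks("875-39-3196", '-'))  # 875393196
println(drop_groupmarks("1,234.5,6", ','))    # 1234.5,6 (the ',' right of '.' is kept)
println(parse(Int, drop_groupmarks("87,539,319", ',')))  # 87539319
```

The sketch scans characters one at a time rather than using a regular expression so that the "between two digits" and "left of the decimal mark" conditions stay explicit.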