Adds multi-file read support to all delimited-file readers and Parquet

TidierOrg · Jul 12, 2024 · 1606686
1 parent 7e66727
commit 1606686
Showing 8 changed files with 231 additions and 161 deletions.
diff --git a/Project.toml b/Project.toml
@@ -1,7 +1,7 @@
 name = "TidierFiles"
 uuid = "8ae5e7a9-bdd3-4c93-9cc3-9df4d5d947db"
 authors = ["Daniel Rizk <rizk.daniel.12@gmail.com> and contributors"]
-version = "0.1.2"
+version = "0.1.3"
 
 [deps]
 Arrow = "69666777-d1a9-59fb-9406-91d4454c9d45"

diff --git a/README.md b/README.md
@@ -71,7 +71,6 @@ The path can be a file available either locally or on the web.
 ```julia
 read_csv("https://raw.githubusercontent.com/TidierOrg/TidierFiles.jl/main/testing_files/csvtest.csv", skip = 2, n_max = 3, col_select = ["ID", "Score"], missingstring = ["4"])
 ```
-
 ```
 3×2 DataFrame
  Row │ ID       Score 
@@ -80,4 +79,20 @@ read_csv("https://raw.githubusercontent.com/TidierOrg/TidierFiles.jl/main/testin
    1 │       3     77
    2 │ missing     85
    3 │       5     95
+```
+
+Read multiple files by passing paths as a vector. 
+```
+path = "https://raw.githubusercontent.com/TidierOrg/TidierFiles.jl/main/testing_files/csvtest.csv"
+read_csv([path, path], skip=3)
+```
+```
+4×3 DataFrame
+ Row │ ID     Name     Score 
+     │ Int64  String7  Int64 
+─────┼───────────────────────
+   1 │     4  David       85
+   2 │     5  Eva         95
+   3 │     4  David       85
+   4 │     5  Eva         95
 ```
diff --git a/docs/examples/UserGuide/delim.jl b/docs/examples/UserGuide/delim.jl
@@ -16,7 +16,7 @@ read_csv("https://raw.githubusercontent.com/TidierOrg/TidierFiles.jl/main/testin
 
 #These functions read a delimited file (CSV, TSV, or custom delimiter) into a DataFrame. The arguments are:
 
-# - `file`: Path to the file or a URL.
+# - `file`: Path or URL to the file, or a vector of paths or URLs.
 # - `delim`: Field delimiter. Default is ',' for `read_csv`, '\t' for `read_tsv` and `read_delim`.
 # - `col_names`: Use first row as column names. Can be `true`, `false`, or an array of strings. Default is `true`.
 # - `skip`: Number of lines to skip before reading data. Default is 0.

diff --git a/docs/examples/UserGuide/parquet.jl b/docs/examples/UserGuide/parquet.jl
@@ -4,7 +4,7 @@
 
 # This function reads a Parquet (.parquet) file into a DataFrame. The arguments are:
 
-# - `path`: The path to the .parquet file.
+# - `path`: Path or URL to the .parquet file, or a vector of paths or URLs.
 # - `col_names`: Indicates if the first row of the file is used as column names. Default is `true`.
 # - `skip`: Number of initial rows to skip before reading data. Default is 0.
 # - `n_max`: Maximum number of rows to read. Default is `Inf` (read all rows).

diff --git a/docs/src/index.md b/docs/src/index.md
@@ -68,7 +68,6 @@ The path can be a file available either locally or on the web.
 ```julia
 read_csv("https://raw.githubusercontent.com/TidierOrg/TidierFiles.jl/main/testing_files/csvtest.csv", skip = 2, n_max = 3, col_select = ["ID", "Score"], missingstring = ["4"])
 ```
-
 ```
 3×2 DataFrame
  Row │ ID       Score 
@@ -77,4 +76,20 @@ read_csv("https://raw.githubusercontent.com/TidierOrg/TidierFiles.jl/main/testin
    1 │       3     77
    2 │ missing     85
    3 │       5     95
+```
+
+Read multiple files by passing paths as a vector. 
+```
+path = "https://raw.githubusercontent.com/TidierOrg/TidierFiles.jl/main/testing_files/csvtest.csv"
+read_csv([path, path], skip=3)
+```
+```
+4×3 DataFrame
+ Row │ ID     Name     Score 
+     │ Int64  String7  Int64 
+─────┼───────────────────────
+   1 │     4  David       85
+   2 │     5  Eva         95
+   3 │     4  David       85
+   4 │     5  Eva         95
 ```
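
The README example above shows that passing a vector of paths returns one DataFrame with the rows of each file stacked in order. The source changes are not part of this excerpt, so the following is only a minimal sketch of that behaviour, assuming each file is read independently and the results are concatenated vertically. `read_many_csv` is a hypothetical helper, not a TidierFiles function, and it uses CSV.jl and DataFrames.jl directly rather than `read_csv`.

```julia
using CSV, DataFrames

# Hypothetical helper (not part of TidierFiles): read every path into its own
# DataFrame, then stack the rows in the order the paths were given.
function read_many_csv(paths::AbstractVector{<:AbstractString}; kwargs...)
    frames = [CSV.read(p, DataFrame; kwargs...) for p in paths]
    return reduce(vcat, frames)
end

# Write two small files to a temporary directory so the sketch runs offline.
dir = mktempdir()
paths = [joinpath(dir, "a.csv"), joinpath(dir, "b.csv")]
for p in paths
    write(p, "ID,Name,Score\n4,David,85\n5,Eva,95\n")
end

combined = read_many_csv(paths)
println(size(combined))  # rows from both files, one set of columns
```

With its default settings, `vcat` on DataFrames expects every input to have the same set of column names, so this sketch only suits files that share a schema.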