Commit
adds read mult file support to all delims, parquet
drizk1 committed Jul 12, 2024
1 parent 7e66727 commit 1606686
Showing 8 changed files with 231 additions and 161 deletions.
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "TidierFiles"
uuid = "8ae5e7a9-bdd3-4c93-9cc3-9df4d5d947db"
authors = ["Daniel Rizk <rizk.daniel.12@gmail.com> and contributors"]
version = "0.1.2"
version = "0.1.3"

[deps]
Arrow = "69666777-d1a9-59fb-9406-91d4454c9d45"
17 changes: 16 additions & 1 deletion README.md
@@ -71,7 +71,6 @@ The path can be a file available either locally or on the web.
```julia
read_csv("https://raw.githubusercontent.com/TidierOrg/TidierFiles.jl/main/testing_files/csvtest.csv", skip = 2, n_max = 3, col_select = ["ID", "Score"], missingstring = ["4"])
```

```
3×2 DataFrame
Row │ ID Score
@@ -80,4 +79,20 @@ read_csv("https://raw.githubusercontent.com/TidierOrg/TidierFiles.jl/main/testin
1 │ 3 77
2 │ missing 85
3 │ 5 95
```

Read multiple files by passing paths as a vector.
```
path = "https://raw.githubusercontent.com/TidierOrg/TidierFiles.jl/main/testing_files/csvtest.csv"
read_csv([path, path], skip=3)
```
```
4×3 DataFrame
Row │ ID Name Score
│ Int64 String7 Int64
─────┼───────────────────────
1 │ 4 David 85
2 │ 5 Eva 95
3 │ 4 David 85
4 │ 5 Eva 95
```
2 changes: 1 addition & 1 deletion docs/examples/UserGuide/delim.jl
@@ -16,7 +16,7 @@ read_csv("https://raw.githubusercontent.com/TidierOrg/TidierFiles.jl/main/testin

#These functions read a delimited file (CSV, TSV, or custom delimiter) into a DataFrame. The arguments are:

# - `file`: Path to the file or a URL.
# - `file`: Path or vector of paths to the file(s) or a URL(s).
# - `delim`: Field delimiter. Default is ',' for `read_csv`, '\t' for `read_tsv` and `read_delim`.
# - `col_names`: Use first row as column names. Can be `true`, `false`, or an array of strings. Default is `true`.
# - `skip`: Number of lines to skip before reading data. Default is 0.
2 changes: 1 addition & 1 deletion docs/examples/UserGuide/parquet.jl
@@ -4,7 +4,7 @@

# This function reads a Parquet (.parquet) file into a DataFrame. The arguments are:

# - `path`: The path to the .parquet file.
# - `path`: The path or vector of paths or URLs to the .parquet file.
# - `col_names`: Indicates if the first row of the file is used as column names. Default is `true`.
# - `skip`: Number of initial rows to skip before reading data. Default is 0.
# - `n_max`: Maximum number of rows to read. Default is `Inf` (read all rows).
17 changes: 16 additions & 1 deletion docs/src/index.md
@@ -68,7 +68,6 @@ The path can be a file available either locally or on the web.
```julia
read_csv("https://raw.githubusercontent.com/TidierOrg/TidierFiles.jl/main/testing_files/csvtest.csv", skip = 2, n_max = 3, col_select = ["ID", "Score"], missingstring = ["4"])
```

```
3×2 DataFrame
Row │ ID Score
@@ -77,4 +76,20 @@ read_csv("https://raw.githubusercontent.com/TidierOrg/TidierFiles.jl/main/testin
1 │ 3 77
2 │ missing 85
3 │ 5 95
```

Read multiple files by passing paths as a vector.
```
path = "https://raw.githubusercontent.com/TidierOrg/TidierFiles.jl/main/testing_files/csvtest.csv"
read_csv([path, path], skip=3)
```
```
4×3 DataFrame
Row │ ID Name Score
│ Int64 String7 Int64
─────┼───────────────────────
1 │ 4 David 85
2 │ 5 Eva 95
3 │ 4 David 85
4 │ 5 Eva 95
```