Add character array interface to loadtxt #919

chuckyvt · 2025-01-17T14:49:34Z

This PR adds functionality to the loadtxt function to load a text file into a 1-D allocatable character array. The len of the array is set to the length of the longest line found in the file, and the size of the array equals the number of lines in the file. It also adds a 'skip_blank_lines' option, which is useful for matching list-directed read behavior, since a list-directed read typically skips blank lines.

The PR functionality and code are inspired by this post and thread, though the code was heavily modified from Beliavsky's initial version.

Initial commit with src additions, added a test and example and updated docs.
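
For reference, the output shape described above (len set by the longest line, size set by the number of lines, optional blank-line skipping) can be sketched on an in-memory string. This is a minimal illustration under stated assumptions, not the PR's implementation: `split_lines` and its argument names are hypothetical, the text is built in memory rather than read from a file, and only LF line endings are handled.

```fortran
program char_lines_sketch
    implicit none
    character(len=:), allocatable :: text
    character(len=:), allocatable :: lines(:)
    integer :: i

    ! Sample text containing one blank line (LF line endings only).
    text = 'alpha' // new_line('a') // new_line('a') // 'a longer line' // new_line('a') // 'xy'

    call split_lines(text, lines, skip_blank_lines=.true.)

    print '(a,i0,a,i0)', 'size = ', size(lines), ', len = ', len(lines)
    do i = 1, size(lines)
        print '(3a)', '[', lines(i), ']'
    end do

contains

    ! Hypothetical helper: splits text at LF into a character array whose
    ! len is the longest line and whose size is the number of lines kept.
    subroutine split_lines(text, lines, skip_blank_lines)
        character(len=*), intent(in) :: text
        character(len=:), allocatable, intent(out) :: lines(:)
        logical, intent(in) :: skip_blank_lines
        integer :: pass, start_pos, end_pos, nl, n, max_len

        ! Pass 1 counts lines and finds the longest; pass 2 stores them.
        max_len = 0
        do pass = 1, 2
            if (pass == 2) allocate(character(len=max_len) :: lines(n))
            n = 0
            start_pos = 1
            do while (start_pos <= len(text))
                nl = index(text(start_pos:), new_line('a'))
                if (nl == 0) then
                    end_pos = len(text)          ! last line, no trailing LF
                else
                    end_pos = start_pos + nl - 2 ! exclude the LF itself
                end if
                if (end_pos >= start_pos .or. .not. skip_blank_lines) then
                    n = n + 1
                    if (pass == 1) then
                        max_len = max(max_len, end_pos - start_pos + 1)
                    else
                        lines(n) = text(start_pos:end_pos) ! blank-padded to max_len
                    end if
                end if
                if (nl == 0) exit
                start_pos = end_pos + 2
            end do
        end do
    end subroutine split_lines

end program char_lines_sketch
```

With the sample text and `skip_blank_lines=.true.`, the array holds three elements of length 13; shorter lines are padded with trailing blanks, which is inherent to a fixed-len character array.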

perazz · 2025-01-19T11:42:33Z

@chuckyvt loadtxt is meant for reading 2D tabular data, so I'm not sure that an interface with the same name would fit a generic text file read.

Could we instead extend the (getfile) in #904 to also return individual lines?

chuckyvt · 2025-01-23T02:21:53Z

My thought was "loadtxt" is a generic name and shouldn't necessarily be limited to numeric arrays. Using "loadtxt" for numeric arrays and "getfile" for character and string arrays isn't obvious at first glance. Compare that to extending loadtxt to interface with numeric, character and string arrays, which I think is an easier approach for someone new to Fortran.

All that said, I'm open to putting this functionality under getfile; however, that procedure being a function rather than a subroutine complicates things a bit. We would need to add an argument to getfile to define the type to return. Alternatively, we could convert getfile to a subroutine.

jvdp1 · 2025-02-08T19:06:25Z

I think that overloading loadtxt is a good idea. However, the shape of the output might be different. A new set of subroutines (called getfile or another name) could also be an option. And if it has the same properties as loadtxt, then loadtxt could be deprecated.

perazz

Thank you for the PR @chuckyvt, I've added some comments in light of the recent get_line and get_file addition.

perazz · 2025-03-05T09:49:38Z

doc/specs/stdlib_io.md


 ### Arguments

 `filename`: Shall be  a character expression containing the file name from which to load the rank-2 `array`.

-`array`: Shall be an allocatable rank-2 array of type `real`, `complex` or `integer`.
+`array`: Shall be an allocatable rank-2 array of type `real`, `complex` or `integer` or a allocatable rank-1 `character` array.


Suggested change

-`array`: Shall be an allocatable rank-2 array of type `real`, `complex` or `integer` or a allocatable rank-1 `character` array.
+`array`: Shall be an allocatable rank-2 array of type `real`, `complex` or `integer` or a deferred-length rank-1 `character` array.

perazz · 2025-03-05T09:50:27Z

doc/specs/stdlib_io.md


 `skiprows` (optional): Skip the first `skiprows` lines. If skipping more rows than present, a 0-sized array will be returned. The default is 0.

 `max_rows` (optional): Read `max_rows` lines of content after `skiprows` lines. A negative value results in reading all lines. A value of zero results in no lines to be read. The default value is -1.

-`fmt` (optional): Fortran format specifier for the text read.  Defaults to the write format for the data type.  Setting fmt='*' will specify list directed read.   
+`fmt` (optional): Fortran format specifier for the text read.  Defaults to the write format for the data type.  Setting fmt='*' will specify list directed read.  Valid only for `real`, `complex` and `integer`.    


Suggested change

-`fmt` (optional): Fortran format specifier for the text read. Defaults to the write format for the data type. Setting fmt='*' will specify list directed read. Valid only for `real`, `complex` and `integer`.
+`fmt` (optional): Fortran format specifier for the text read. Defaults to the write format for the data type. Setting fmt='*' will specify list directed read. Valid only for the `real`, `complex` and `integer` interfaces.

perazz · 2025-03-05T09:50:43Z

doc/specs/stdlib_io.md


+`skip_blank_lines` (optional): Will ignore blank lines in the text file.  Valid only for `character` array.  


Suggested change

-`skip_blank_lines` (optional): Will ignore blank lines in the text file. Valid only for `character` array.
+`skip_blank_lines` (optional): Will ignore blank lines in the text file. Valid only for the `character` array interface.

perazz · 2025-03-05T09:51:23Z

doc/specs/stdlib_io.md



 ### Return value

-Returns an allocated rank-2 `array` with the content of `filename`.
+Returns an allocated rank-2 `array` with the content of `filename`, or a rank-1 `character` array where the length is the longest line of the file.


Suggested change

-Returns an allocated rank-2 `array` with the content of `filename`, or a rank-1 `character` array where the length is the longest line of the file.
+Returns an allocated rank-2 `array` with the content of `filename`, or a rank-1 `character` array with length equal to the longest line length in the file.

perazz · 2025-03-05T09:52:07Z

src/stdlib_io.fypp

+        skip_blank_lines_ = optval(skip_blank_lines, .false.)
+
+        !! Open and store all of file contents.  
+        open (newunit=u, file=filename, action='read', form='unformatted', access='stream')


Could this use the new get_file interface?

perazz · 2025-03-05T09:56:41Z

src/stdlib_io.fypp

+
+                start_pos = next_line_pos
+
+                ! Search text starting at start_pos for end of line.  end_pos will exclude CRLR or LR characters.


Suggested change

-! Search text starting at start_pos for end of line. end_pos will exclude CRLR or LR characters.
+! Search text starting at start_pos for end of line. end_pos will exclude CRLF or LF characters.

perazz · 2025-03-05T09:59:15Z

src/stdlib_io.fypp

+                    ascii_idx = iachar(text(idx:idx))
+
+                    if (ascii_idx == 13) then
+                        ! Found CR return.  Check for LR


Suggested change

-! Found CR return. Check for LR
+! Found CR return. Check for LF

perazz · 2025-03-05T09:59:24Z

src/stdlib_io.fypp

+
+                do while (idx <= len(text))
+                    !! Find line end
+                    ! Look for either CR or LR


Suggested change

-! Look for either CR or LR
+! Look for either CR or LF

perazz · 2025-03-05T09:59:45Z

src/stdlib_io.fypp

+                            return
+                        endif
+
+                    ! Check for standalone LR


Suggested change

-! Check for standalone LR
+! Check for standalone LF

perazz · 2025-03-05T10:01:01Z

src/stdlib_io.fypp

+
+                    if (ascii_idx == 13) then
+                        ! Found CR return.  Check for LR
+                        if (iachar(text(idx+1:idx+1)) == 10) then


Two questions:

1. Since idx <= len(text), should we check that idx+1 does not go out of bounds?
2. Just a style comment: would it be better to refer to the line feed using the intrinsic new_line? i.e.:

Suggested change

-if (iachar(text(idx+1:idx+1)) == 10) then
+if (text(idx+1:idx+1) == new_line('a')) then
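
Both points can be combined in a small self-contained sketch. It is an illustration only, not the code under review: `find_line_end`, `next_pos`, and the sample text are hypothetical, and treating a lone CR as a line terminator is an assumption. The nested `if` matters because Fortran does not guarantee short-circuit evaluation of `.and.`, so `idx < len(text) .and. text(idx+1:idx+1) == lf` could still reference past the end of the string.

```fortran
program eol_sketch
    implicit none
    character(len=*), parameter :: cr = achar(13), lf = new_line('a')
    character(len=:), allocatable :: text
    integer :: start_pos, end_pos, next_pos

    ! Mixed endings: CRLF, LF, and a CR as the very last character.
    text = 'one' // cr // lf // 'two' // lf // 'three' // cr

    start_pos = 1
    do while (start_pos <= len(text))
        call find_line_end(text, start_pos, end_pos, next_pos)
        print '(3a)', '[', text(start_pos:end_pos), ']'
        start_pos = next_pos
    end do

contains

    ! Hypothetical helper: end_pos excludes CRLF, CR, or LF;
    ! next_pos is the first character of the following line.
    subroutine find_line_end(text, start_pos, end_pos, next_pos)
        character(len=*), intent(in) :: text
        integer, intent(in) :: start_pos
        integer, intent(out) :: end_pos, next_pos
        integer :: idx

        do idx = start_pos, len(text)
            if (text(idx:idx) == cr) then
                end_pos = idx - 1
                next_pos = idx + 1
                ! Only inspect idx+1 when it is still inside the string.
                if (idx < len(text)) then
                    if (text(idx+1:idx+1) == lf) next_pos = idx + 2
                end if
                return
            else if (text(idx:idx) == lf) then
                end_pos = idx - 1
                next_pos = idx + 1
                return
            end if
        end do

        ! No terminator found: the line runs to the end of the text.
        end_pos = len(text)
        next_pos = len(text) + 1
    end subroutine find_line_end

end program eol_sketch
```

The trailing CR in the sample text exercises the boundary case raised in the first question: at `idx == len(text)` the sketch never evaluates `text(idx+1:idx+1)`.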

chuckyvt · 2025-03-22T17:12:35Z

My thought is that this functionality makes more sense in the updated get_file subroutine?

perazz · 2025-03-23T08:33:02Z

I think it’s a good idea @chuckyvt. Here are a couple of thoughts for discussion:

- if the numpy equivalent loadtxt can also return strings, I think this PR could be moved forward
- the skip_lines flag would also be useful in get_file
- because get_file is a subroutine, to add the line versions we may just add two procedures to the interface that return either an allocatable character array or a stringlist_type?

RC 1

a8da52a

Initial commit with src additions, added a test and example and updated docs.

perazz mentioned this pull request Feb 21, 2025

io: getfile #939

Merged

perazz reviewed Mar 5, 2025
