Add character array interface to loadtxt #919

Open
wants to merge 1 commit into
base: master

chuckyvt
Contributor

This PR adds functionality to the loadtxt function to load a text file into a 1-D allocatable character array. The len of the array is set to the length of the longest line found in the file, and the size of the array equals the number of lines in the file. It also adds a 'skip_blank_lines' option, which is useful for matching list-directed read behavior, since a list-directed read typically skips blank lines.
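The behavior described above can be illustrated with a minimal, self-contained sketch. This is not the code from the PR: the module name `read_lines_sketch`, the subroutine `read_lines`, and the file name `example_lines.txt` are hypothetical, and the sketch assumes that a "blank" line is one that is empty or contains only spaces.

```fortran
module read_lines_sketch
    implicit none
    private
    public :: read_lines

    character(len=1), parameter :: LF = achar(10), CR = achar(13)

contains

    !> Hypothetical helper (not the PR's implementation): load a text file
    !> into a rank-1 character array, one element per line.
    subroutine read_lines(filename, lines, skip_blank_lines)
        character(len=*), intent(in) :: filename
        character(len=:), allocatable, intent(out) :: lines(:)
        logical, intent(in), optional :: skip_blank_lines

        character(len=:), allocatable :: text
        logical :: skip
        integer :: u, fsize, pass, pos, nxt, last, eol, n, maxlen

        skip = .false.
        if (present(skip_blank_lines)) skip = skip_blank_lines

        ! Read the whole file into one string using stream access.
        open(newunit=u, file=filename, status='old', action='read', &
             form='unformatted', access='stream')
        inquire(unit=u, size=fsize)
        allocate(character(len=fsize) :: text)
        if (fsize > 0) read(u) text
        close(u)

        ! Pass 1 counts lines and finds the longest one; pass 2 stores them.
        maxlen = 0
        do pass = 1, 2
            n = 0
            pos = 1
            do while (pos <= len(text))
                eol = index(text(pos:), LF)
                if (eol == 0) then
                    last = len(text)          ! final line without a trailing LF
                    nxt = len(text) + 1
                else
                    last = pos + eol - 2      ! character just before the LF
                    nxt = pos + eol
                end if
                ! Drop a CR that precedes the LF (CRLF line endings).
                if (last >= pos) then
                    if (text(last:last) == CR) last = last - 1
                end if
                if (.not. (skip .and. len_trim(text(pos:last)) == 0)) then
                    n = n + 1
                    if (pass == 1) then
                        maxlen = max(maxlen, last - pos + 1)
                    else
                        lines(n) = text(pos:last)   ! shorter lines are blank-padded
                    end if
                end if
                pos = nxt
            end do
            if (pass == 1) allocate(character(len=maxlen) :: lines(n))
        end do
    end subroutine read_lines

end module read_lines_sketch

program demo_read_lines
    use read_lines_sketch, only: read_lines
    implicit none
    character(len=:), allocatable :: lines(:)
    integer :: u, i

    ! Write a small three-line file whose second line is blank.
    open(newunit=u, file='example_lines.txt', status='replace', action='write')
    write(u, '(a)') 'first', '', 'a longer third line'
    close(u)

    call read_lines('example_lines.txt', lines, skip_blank_lines=.true.)
    print '(a,i0,a,i0)', 'size = ', size(lines), ', len = ', len(lines)
    do i = 1, size(lines)
        print '(3a)', '[', lines(i), ']'
    end do
end program demo_read_lines
```

With `skip_blank_lines=.true.` the resulting array has two elements, each padded with blanks to the length of the longest line. The two-pass approach is one design choice among several; it avoids repeated reallocation at the cost of scanning the text twice.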

The PR functionality and code are inspired by this post and thread, though the code was heavily modified from Beliavsky's initial version.

Initial commit with src additions, added a test and example and updated docs.
@perazz
Member

perazz commented Jan 19, 2025

@chuckyvt loadtxt is meant for reading 2D tabular data, so I'm not sure that an interface with the same name would fit a generic text file read.

Could we instead extend getfile in #904 to also return individual lines?

@chuckyvt
Contributor Author

My thought was "loadtxt" is a generic name and shouldn't necessarily be limited to numeric arrays. Using "loadtxt" for numeric arrays and "getfile" for character and string arrays isn't obvious at first glance. Compare that to extending loadtxt to interface with numeric, character and string arrays, which I think is an easier approach for someone new to Fortran.

All that said, I'm open to putting this functionality under getfile; however, that procedure being a function instead of a subroutine complicates it a bit. We would need to add an argument to getfile to define the type to return. Alternatively, we could convert getfile to a subroutine.

@jvdp1
Member

jvdp1 commented Feb 8, 2025

I think that overloading loadtxt is a good idea. However, the shape of the output might be different. A new set of subroutines (called getfile or another name) could also be an option. And if it has the same properties as loadtxt, then loadtxt could be deprecated.

@perazz perazz mentioned this pull request Feb 21, 2025
Member

@perazz perazz left a comment

Thank you for the PR @chuckyvt, I've added some comments in light of the recent get_line and get_file addition.


### Arguments

`filename`: Shall be a character expression containing the file name from which to load the rank-2 `array`.

`array`: Shall be an allocatable rank-2 array of type `real`, `complex` or `integer`.
`array`: Shall be an allocatable rank-2 array of type `real`, `complex` or `integer` or a allocatable rank-1 `character` array.
Member

Suggested change
`array`: Shall be an allocatable rank-2 array of type `real`, `complex` or `integer` or a allocatable rank-1 `character` array.
`array`: Shall be an allocatable rank-2 array of type `real`, `complex` or `integer` or a deferred-length rank-1 `character` array.


`skiprows` (optional): Skip the first `skiprows` lines. If skipping more rows than present, a 0-sized array will be returned. The default is 0.

`max_rows` (optional): Read `max_rows` lines of content after `skiprows` lines. A negative value results in reading all lines. A value of zero results in no lines to be read. The default value is -1.

`fmt` (optional): Fortran format specifier for the text read. Defaults to the write format for the data type. Setting fmt='*' will specify list directed read.
`fmt` (optional): Fortran format specifier for the text read. Defaults to the write format for the data type. Setting fmt='*' will specify list directed read. Valid only for `real`, `complex` and `integer`.
Member

Suggested change
`fmt` (optional): Fortran format specifier for the text read. Defaults to the write format for the data type. Setting fmt='*' will specify list directed read. Valid only for `real`, `complex` and `integer`.
`fmt` (optional): Fortran format specifier for the text read. Defaults to the write format for the data type. Setting fmt='*' will specify list directed read. Valid only for the `real`, `complex` and `integer` interfaces.


`skip_blank_lines` (optional): Will ignore blank lines in the text file. Valid only for `character` array.
Member

Suggested change
`skip_blank_lines` (optional): Will ignore blank lines in the text file. Valid only for `character` array.
`skip_blank_lines` (optional): Will ignore blank lines in the text file. Valid only for the `character` array interface.



### Return value

Returns an allocated rank-2 `array` with the content of `filename`.
Returns an allocated rank-2 `array` with the content of `filename`, or a rank-1 `character` array where the length is the longest line of the file.
Member

Suggested change
Returns an allocated rank-2 `array` with the content of `filename`, or a rank-1 `character` array where the length is the longest line of the file.
Returns an allocated rank-2 `array` with the content of `filename`, or a rank-1 `character` array with length equal to the longest line length in the file.

skip_blank_lines_ = optval(skip_blank_lines, .false.)

!! Open and store all of file contents.
open (newunit=u, file=filename, action='read', form='unformatted', access='stream')
Member

Could this use the new get_file interface?


start_pos = next_line_pos

! Search text starting at start_pos for end of line. end_pos will exclude CRLR or LR characters.
Member

Suggested change
! Search text starting at start_pos for end of line. end_pos will exclude CRLR or LR characters.
! Search text starting at start_pos for end of line. end_pos will exclude CRLF or LF characters.

ascii_idx = iachar(text(idx:idx))

if (ascii_idx == 13) then
! Found CR return. Check for LR
Member

Suggested change
! Found CR return. Check for LR
! Found CR return. Check for LF


do while (idx <= len(text))
!! Find line end
! Look for either CR or LR
Member

Suggested change
! Look for either CR or LR
! Look for either CR or LF

return
endif

! Check for standalone LR
Member

Suggested change
! Check for standalone LR
! Check for standalone LF


if (ascii_idx == 13) then
! Found CR return. Check for LR
if (iachar(text(idx+1:idx+1)) == 10) then
Member

Two questions:

  1. The loop only guarantees idx <= len(text), so idx+1 can exceed len(text) when the CR is the last character. Should we check that this does not go out of bounds?
  2. Just a style comment: would it be better to refer to the line feed using the intrinsic new_line? i.e.:
Suggested change
if (iachar(text(idx+1:idx+1)) == 10) then
if (text(idx+1:idx+1) == new_line('a')) then
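One way to address the out-of-bounds concern is sketched below. The module `eol_sketch` and the function `eol_width` are hypothetical names, not part of the PR, and treating a lone CR as a line terminator is an assumption of this sketch. Because Fortran does not guarantee short-circuit evaluation of `.and.`, the bounds test and the substring access sit in nested `if` statements rather than in a single condition. The sketch compares against `achar(10)` directly since the text is read as raw bytes; `new_line('a')` would be the equivalent intrinsic on processors where the newline character is LF.

```fortran
module eol_sketch
    implicit none
    private
    public :: eol_width

    character(len=1), parameter :: CR = achar(13), LF = achar(10)

contains

    !> Hypothetical helper: width of the line terminator that starts at
    !> text(idx:idx). Returns 2 for CRLF, 1 for a lone CR or LF, 0 otherwise.
    pure integer function eol_width(text, idx) result(width)
        character(len=*), intent(in) :: text
        integer, intent(in) :: idx

        width = 0
        if (idx < 1 .or. idx > len(text)) return

        if (text(idx:idx) == CR) then
            width = 1
            ! Only inspect idx+1 when it lies inside the string.
            if (idx < len(text)) then
                if (text(idx+1:idx+1) == LF) width = 2
            end if
        else if (text(idx:idx) == LF) then
            width = 1
        end if
    end function eol_width

end module eol_sketch

program demo_eol
    use eol_sketch, only: eol_width
    implicit none

    print '(i0)', eol_width('ab' // achar(13), 3)               ! lone CR at the very end: prints 1
    print '(i0)', eol_width('ab' // achar(13) // achar(10), 3)  ! CRLF: prints 2
    print '(i0)', eol_width('ab' // achar(10), 3)               ! LF: prints 1
end program demo_eol
```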

@chuckyvt
Contributor Author

My thought is that this functionality makes more sense in the updated get_file subroutine?

@perazz
Member

perazz commented Mar 23, 2025

I think it’s a good idea @chuckyvt. Here are a couple of thoughts for discussion:

  • if the numpy equivalent loadtxt can also return strings, I think this PR could be moved forward
  • the skip_lines flag would also be useful in get_file
  • because get_file is a subroutine, to add the line versions we may just add two procedures to the interface, which return either an allocatable character array or a stringlist_type?
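The last point relies on generic resolution by argument type: a generic subroutine can select a specific procedure from the type and rank of its output argument, whereas a function cannot be overloaded on its result alone. A minimal sketch of that idea follows. All names here (`get_lines_sketch`, `get_lines`, `line_item`, `line_list`) are hypothetical; `line_list` is a simple stand-in and not stdlib's `stringlist_type`. The sketch splits an in-memory string on LF only and ignores CR handling for brevity.

```fortran
module get_lines_sketch
    implicit none
    private
    public :: get_lines, line_item, line_list

    character(len=1), parameter :: LF = achar(10)

    !> Stand-in container for variable-length lines (NOT stdlib's stringlist_type).
    type :: line_item
        character(len=:), allocatable :: s
    end type line_item

    type :: line_list
        type(line_item), allocatable :: items(:)
    end type line_list

    !> One generic name; the specific procedure is chosen from the type
    !> and rank of the output argument.
    interface get_lines
        module procedure get_lines_char
        module procedure get_lines_list
    end interface get_lines

contains

    subroutine get_lines_list(text, lines)
        character(len=*), intent(in) :: text
        type(line_list), intent(out) :: lines
        integer :: i, n, pos, eol

        ! Count lines: one per LF, plus a final line lacking a trailing LF.
        n = 0
        do i = 1, len(text)
            if (text(i:i) == LF) n = n + 1
        end do
        if (len(text) > 0) then
            if (text(len(text):len(text)) /= LF) n = n + 1
        end if

        allocate(lines%items(n))
        pos = 1
        do i = 1, n
            eol = index(text(pos:), LF)
            if (eol == 0) eol = len(text) - pos + 2   ! no LF left: take the rest
            lines%items(i)%s = text(pos:pos+eol-2)
            pos = pos + eol
        end do
    end subroutine get_lines_list

    subroutine get_lines_char(text, lines)
        character(len=*), intent(in) :: text
        character(len=:), allocatable, intent(out) :: lines(:)
        type(line_list) :: list
        integer :: i, maxlen

        call get_lines_list(text, list)
        maxlen = 0
        do i = 1, size(list%items)
            maxlen = max(maxlen, len(list%items(i)%s))
        end do
        allocate(character(len=maxlen) :: lines(size(list%items)))
        do i = 1, size(lines)
            lines(i) = list%items(i)%s   ! blank-padded to maxlen
        end do
    end subroutine get_lines_char

end module get_lines_sketch

program demo_get_lines
    use get_lines_sketch, only: get_lines, line_list
    implicit none
    character(len=*), parameter :: text = 'one' // achar(10) // 'three'
    character(len=:), allocatable :: as_array(:)
    type(line_list) :: as_list

    call get_lines(text, as_array)   ! resolves to get_lines_char
    call get_lines(text, as_list)    ! resolves to get_lines_list

    print '(a,i0,a,i0)', 'array: size = ', size(as_array), ', len = ', len(as_array)
    print '(a,i0)', 'list: len of first item = ', len(as_list%items(1)%s)
end program demo_get_lines
```

The character-array version pads every line to a common length, while the list version keeps each line at its own length; which of the two a caller wants is decided purely by the declared type of the variable passed in.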
