Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add File module #71

Closed
wants to merge 5 commits into from
Closed

Add File module #71

wants to merge 5 commits into from

Conversation

cevans87
Copy link
Contributor

@cevans87 cevans87 commented Apr 14, 2020

ref #52

@cevans87 cevans87 requested a review from jasone April 14, 2020 01:19
@cevans87
Copy link
Contributor Author

The File module as is uses a bit less exception handling than would be needed for an actual robust implementation build on top of the Unix module. I actually can't get most of the kinds of errors I actually expect in the Hemlock implementation. I'm wondering if I should write small wrappers around the Unix module calls so that they give back the c-like error codes I'd expect in Hemlock.

I think there are a few nitpicky problems with function implementations as well. Go ahead and point them out if you see them, but right now I'm more worried about making this look translatable into Hemlock.

Comment on lines 43 to 52
let of_path ?(mode=Mode.RW) path =
let open Unix in
let unix_flags = Mode.to_unix_flags mode in
try
Ok ({fd=Some (openfile path unix_flags 0o640)})
with
| Unix_error (errno, fn_name, fn_param) -> begin
Error (String.concat ~sep:" -> " [Unix.error_message errno; fn_name; fn_param])
end
| _ -> not_reached ()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this is a good example of where I'd like to improve things. The contents of this function look nothing like what Hemlock will look like. I'd like to hide the Unix.openfile call and exception handler behind a wrapper that looks more Hemlock-like, but not really sure what the wrapper function signature should look like. I also wonder if it'd be easier to wrap native c function calls than to wrap Unix calls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing native wrappers around the system calls sounds like a pretty good idea, actually. The result would be quite close to how the Hemlock code will look.

Comment on lines 208 to 243
let%expect_test "of_path_hlt" =
let open Format in
printf "@[<h>";
let t = of_path_hlt ~mode:Mode.R_O "/home/cevans/test.txt" in
let bytes = read_hlt t in
let s = Bytes.to_string_hlt bytes in
let _ = close_hlt t in
printf "t = %a\n" String.pp s;
printf "@]";

[%expect{|
t = "The quick brown fox jumped over the red fence.\n"
|}]

let%expect_test "Stream.of_path_hlt" =
let open Format in
printf "@[<h>";
Stream.of_path_hlt "/home/cevans/test.txt"
|> Array.of_stream
|> Bytes.to_string_hlt
|> printf "%a\n" String.pp;
printf "@]";

[%expect{|
"The quick brown fox jumped over the red fence.\n"
|}]

let%expect_test "Stream.of_path_hlt" =
let open Format in
printf "@[<h>";
let _ = Stream.of_path_hlt "/home/cevans/test.txt"
|> Stream.write_hlt (of_path_hlt ~mode:Mode.W_A "/home/cevans/test2.txt")
in
printf "@]";

[%expect{| |}]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll get rid of this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that it's possible to use the stand-alone test infrastructure similar to what I did for the entropy tests (in bootstrap/test/basis/) for the cases where you need a test file to operate on.

@cevans87 cevans87 marked this pull request as draft April 14, 2020 01:57
Copy link
Contributor

@jasone jasone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the overall way this is coming together.

[0; 1; 2];
[0; 1; 2; 3];
] in
List.iter streams ~f:test;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
List.iter streams ~f:test;
List.iter streams ~f:test;

| [] -> None
| hd :: tl -> Some (hd, tl)
end in
let streams = List.map ~f:(Stream.init_indef ~f) [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😀

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was tempted to add Stream.of_list, but I'll leave that alone for now and take care of it when I try to put Stream everywhere that it belongs.

Comment on lines 6 to 29
type t =
| R_O (** open existing file for read, failing if it does not exist *)
| W (** create new or truncate existing file and open for write *)
| W_A (** create new or append to existing file and open for write *)
| W_AO (** append to existing file and open for write, failing if it does not
exist *)
| W_C (** create new file and open for write, failing if file exists *)
| W_O (** truncate existing file and open for write, failing if file does
not exist *)
| RW (** create new or truncate existing file and open for read and write *)
| RW_A (** create new append to existing file and open for read and write *)
| RW_AO (** append to existing file and open for read and write, failing if
file does not exist *)
| RW_C (** create new file and open for read and write, failing if file
exists *)
| RW_O (** truncate existing file and open for read and write, failing if
file does not exist *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned earlier today, I really like this approach, since the number of legal combinations isn't overwhelming, and this prevents weird/bad combinations.


module Mode : sig
type t =
| R_O (** open existing file for read, failing if it does not exist *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general comment on, ahem, comments, I try hard to use capitalization and periods whenever it's feasible to write sentence-like comments (and often even for phrases vaguely approximating sentences, but that's not always a good fit). These doc strings are all complete sentences (yay), so they're almost great. 😉

| W_O (** truncate existing file and open for write, failing if file does
not exist *)
| RW (** create new or truncate existing file and open for read and write *)
| RW_A (** create new append to existing file and open for read and write *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| RW_A (** create new append to existing file and open for read and write *)
| RW_A (** create new or append to existing file and open for read and write *)

Comment on lines 55 to 56
let t_res = of_path ?mode path in
match t_res with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let t_res = of_path ?mode path in
match t_res with
match of_path ?mode path with

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same basic idea in numerous other places. As a general rule, if you have to make up a temporary name that doesn't aid understanding, inline the expression.

let rec fn i n chars fd = begin
let n' = Unix.write fd chars i n in
match n' = -1 with
| true -> Some "Error reading file"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| true -> Some "Error reading file"
| true -> Some "Error writing file"

(** [read_hlt n t] reads up to [n] bytes from [t] and returns the read bytes or
halts if bytes could not be read. *)

val write: Bytes.t -> t -> e option
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we need to support partial writes. Ordinarily that just causes a bunch of trouble for files (yay interruptible syscalls), but what about for broken pipes, where we did write some bytes before the pipe was closed? I think this comes up for sockets too.

val write: Bytes.t -> t -> (usize, e) Result.t

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. Maybe partial writes should return an error that contain the number of lines written, and None still indicates success? Regardless, the unix standard write operation returns the number of bytes written, so we should as well.

match write file t with
| Some e -> halt e
| None -> ()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Comment on lines 191 to 192
let t', t'' = Stream.split buflen t in
let bytes = Array.of_stream t' in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So nice.

@cevans87
Copy link
Contributor Author

Cool, I agree about dropping the Unix module and try to do this with native c wrappers. File.ml will look quite different after I revise, but I think it'll work much better as a bootstrap implementation.

I'll fix the doc comments and add real file I/O tests in bootstrap/test/basis/.

@jasone
Copy link
Contributor

jasone commented Apr 14, 2020

Oh, one thing we're missing here is a way to access files which have names that aren't UTF-8. We talked about having a bottom-level open function which takes a Bytes.t rather thanstring for the path.

@cevans87
Copy link
Contributor Author

Oh, one thing we're missing here is a way to access files which have names that aren't UTF-8. We talked about having a bottom-level open function which takes a Bytes.t rather thanstring for the path.

Ah, right. I'll fix that. I was hasty here and just plugged in the type that OCaml didn't complain about. Iirc, most filesystems just care about the byte content of the filename, but shells often interpret the bytes as utf8 depending on the locale environment variables?

@jasone
Copy link
Contributor

jasone commented Apr 14, 2020

Iirc, most filesystems just care about the byte content of the filename, but shells often interpret the bytes as utf8 depending on the locale environment variables?

Yep, that's the way I understand it. Dealing with non-UTF-8 in the shells is difficult these days, which means non-UTF-8 file names almost never come up in practice. But we can't completely ignore them, unfortunately.

@cevans87 cevans87 force-pushed the file_io branch 2 times, most recently from e01cb67 to 82350d3 Compare May 14, 2020 03:53
@cevans87 cevans87 force-pushed the file_io branch 2 times, most recently from e9f58e9 to 27c2c17 Compare May 22, 2020 04:40
Copy link
Contributor

@jasone jasone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot has changed since my first review. Once you have my feedback incorporated I'll do another full pass.

type file = t
type t = Buffer.t Stream.t

(* val read: ?file !-> file * (Buffer.t, Error.t) result Stream.t *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(* val read: ?file !-> file * (Buffer.t, Error.t) result Stream.t *)
(* val of_file: file -> (Buffer.t, Error.t) result Stream.t *)

I'm uneasy about requiring isolated file references here (?file), because it requires some care to keep a reference isolated until it is consumed. Additionally, there's no benefit here, since we return a reference to the file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intent here was to consume a unique reference to a file. The stream would own it. The file that we return here would only be useful for the File.close function, basically the interruptable type we mentioned before. Every other function in the API that takes a file input except File.close requires a the file to be a unique reference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unique references are rather precious, and they require meticulous care. I've been calling them "isolated" rather than "unique", because the only thing that keeps them unique is isolation. We should definitely chat about this in more detail and figure out whether they're appropriate for this use case.

(* val read_hlt: ?file !-> file * t *)
val of_file_hlt: file -> file * t

(* val write: ?file -> t !-> Error.t option * ?file *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(* val write: ?file -> t !-> Error.t option * ?file *)
(* val write: file -> t !-> Error.t option *)

(** [of_path mode path] opens file in [mode] at [path] and returns the resulting
file or halts if file could not be opened. *)

(* val of_stdin: unit !-> Bytes.t !-> ?t *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(* val of_stdin: unit !-> Bytes.t !-> ?t *)
(* val of_stdin: unit !-> Bytes.t !-> t *)

The file isn't necessarily isolated; it could be a copy of a value in the environment. Same for stdout and stderr. In fact, these should probably be much simpler (values rather than functions).

Suggested change
(* val of_stdin: unit !-> Bytes.t !-> ?t *)
(* val stdin: t *)


CAMLprim value
hemlock_file_of_stdin_inner(value a_unit) {
return hemlock_file_finalize_result(dup(STDIN_FILENO));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't dup the terminal file descriptors.

(** [of_path mode path] opens file in [mode] at [path] and returns the resulting
file or an error if file could not be opened. *)

(* val of_path_hlt: ?flag:Flag.t -> ?mode:usize -> ^Buffer.t !-> t *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(* val of_path_hlt: ?flag:Flag.t -> ?mode:usize -> ^Buffer.t !-> t *)
(* val of_path_hlt: ?flag:Flag.t -> ?mode:usize -> ^Buffer.t !$-> t *)

(* val read_into: !^Buffer.t -> t $-> Error.t option *)
val read_into: Buffer.t -> t -> (Buffer.t, Error.t) result

(* val read_into_hlt: !^Buffer.t -> t !-> Buffer.t *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(* val read_into_hlt: !^Buffer.t -> t !-> Buffer.t *)
(* val read_into_hlt: !^Buffer.t -> t !$-> Buffer.t *)

(** [read n t] reads up to [n] bytes from [t] and returns the read bytes or an
error if bytes could not be read. *)

(* val read_hlt: t !-> @Buffer.t *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(* val read_hlt: t !-> @Buffer.t *)
(* val read_hlt: t !$-> @Buffer.t *)

(** [write bytes t] writes [bytes] to [t] and returns an error if bytes could
not be written. *)

(* val write_hlt: ^Buffer.t -> t !-> usize *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(* val write_hlt: ^Buffer.t -> t !-> usize *)
(* val write_hlt: ^Buffer.t -> t !$-> usize *)

(** [close t] closes file [t] and returns an error if it could not be closed.
*)

(* val close_hlt: t !-> unit *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(* val close_hlt: t !-> unit *)
(* val close_hlt: t !$-> unit *)

index relatve to the beginning of the file or an error if it could not be
changed. *)

(* val seek_hlt: isize -> t !-> usize *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(* val seek_hlt: isize -> t !-> usize *)
(* val seek_hlt: isize -> t !$-> usize *)

after to the beginning of the file. Returns the new byte index relatve to
the beginning of the file or an error if it could not be changed. *)

(* val seek_hd_hlt: isize -> t !-> usize *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(* val seek_hd_hlt: isize -> t !-> usize *)
(* val seek_hd_hlt: isize -> t !$-> usize *)

before to the end of the file. Returns the new byte index relatve to the
beginning of the file or an error if it could not be changed. *)

(* val seek_tl_hlt: ?i:isize -> t !-> usize *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(* val seek_tl_hlt: ?i:isize -> t !-> usize *)
(* val seek_tl_hlt: ?i:isize -> t !$-> usize *)

writes contents of [byte_stream] to it. Returns an error if not all bytes
could be written. *)

(* val write_hlt: file -> t !-> unit *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(* val write_hlt: file -> t !-> unit *)
(* val write_hlt: file -> t !$-> unit *)

@@ -9,6 +9,7 @@ open Rudiments

(** {1 Type and derivations} *)

(* (`a, >e) t *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(* (`a, >e) t *)
(* ('a, >e) t *)

@@ -34,63 +38,76 @@ val init_indef: f:('state -> ('a * 'state) option) -> 'state -> 'a t

(** {1 Length} *)

(* val length: 'a >e t >e-> usize *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(* val length: 'a >e t >e-> usize *)
(* val length: ('a, >e) t >e-> usize *)

Same thing in several other places.

bootstrap/src/basis/stream.mli Outdated Show resolved Hide resolved
bootstrap/src/basis/stream.mli Outdated Show resolved Hide resolved
@cevans87 cevans87 force-pushed the file_io branch 2 times, most recently from 8cc7539 to 6e0143b Compare August 3, 2020 06:22
@cevans87 cevans87 marked this pull request as ready for review August 3, 2020 06:26
@jasone
Copy link
Contributor

jasone commented Aug 6, 2020

Merged via #84.

@jasone jasone closed this Aug 6, 2020
@cevans87 cevans87 deleted the file_io branch September 20, 2021 21:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants