From bdfb6c64470b7e378ff107c9d71128b66e99e81d Mon Sep 17 00:00:00 2001
From: Thibaut Mattio <thibaut.mattio@gmail.com>
Date: Wed, 29 Nov 2023 15:38:58 +0100
Subject: [PATCH] Add Practical OCaml to the Planet (#1806)

---
 data/planet-sources.yml                       |   3 +
 .../re-developing-tcp-from-the-grounds-up.md  |  49 ++++++++
 data/planet/practicalocaml/-hello--world-.md  |  16 +++
 ...-gadts-and-why-you-aint-gonna-need-them.md | 102 +++++++++++++++
 ...d-fast-with-ocaml-modeling-a-web-router.md |  16 +++
 ...pe-safe-state-machines-using-type-state.md | 117 ++++++++++++++++++
 .../unix-module-considered-harmful.md         |  26 ++++
 7 files changed, 329 insertions(+)
 create mode 100644 data/planet/hannes/re-developing-tcp-from-the-grounds-up.md
 create mode 100644 data/planet/practicalocaml/-hello--world-.md
 create mode 100644 data/planet/practicalocaml/a-quick-guide-to-gadts-and-why-you-aint-gonna-need-them.md
 create mode 100644 data/planet/practicalocaml/how-i-explore-domain-problems-cheaply-and-fast-with-ocaml-modeling-a-web-router.md
 create mode 100644 data/planet/practicalocaml/how-to-build-type-safe-state-machines-using-type-state.md
 create mode 100644 data/planet/practicalocaml/unix-module-considered-harmful.md
diff --git a/data/planet-sources.yml b/data/planet-sources.yml
index f968480ccd..df40f009d4 100644
--- a/data/planet-sources.yml
+++ b/data/planet-sources.yml
@@ -1,3 +1,6 @@
+- id: practicalocaml
+  name: Practical OCaml
+  url: https://practicalocaml.com/rss/
 - id: ocamlpro
   name: OCamlPro
   url: https://ocamlpro.com/blog/feed
diff --git a/data/planet/hannes/re-developing-tcp-from-the-grounds-up.md b/data/planet/hannes/re-developing-tcp-from-the-grounds-up.md
new file mode 100644
index 0000000000..90783fac99
--- /dev/null
+++ b/data/planet/hannes/re-developing-tcp-from-the-grounds-up.md
@@ -0,0 +1,49 @@
+---
+title: Redeveloping TCP From the Ground Up
+description:
+url: https://hannes.robur.coop/Posts/TCP-ns
+date: 2023-11-28T21:17:01-00:00
+preview_image:
+featured:
+authors:
+- hannes
+source:
+---
+
+<p>The <a href="https://en.wikipedia.org/wiki/Transmission_Control_Protocol">Transmission Control Protocol (TCP)</a> is one of the main Internet protocols. Usually spoken on top of the Internet Protocol (legacy version 4 or version 6), it provides a reliable, ordered, and error-checked stream of octets. When an application uses TCP, it gets these properties for free (in contrast to UDP).</p>
+<p>As is common for Internet protocols, TCP is specified in a series of so-called requests for comments (RFCs). The latest revised version from August 2022 is <a href="https://datatracker.ietf.org/doc/html/rfc9293">RFC 9293</a>; the initial one was <a href="https://datatracker.ietf.org/doc/html/rfc793">RFC 793</a> from September 1981.</p>
+<h1>My Brief Personal TCP Story</h1>
+<p>My interest in TCP started back in 2006 when we worked on a <a href="https://github.com/dylan-hackers/network-night-vision">network stack in Dylan</a> (these days abandoned). Ever since, I wanted to understand the implementation tradeoffs in more detail, including attacks and how to prevent a TCP stack from being vulnerable.</p>
+<p>In 2012, I attended ICFP in Copenhagen while a PhD student at ITU Copenhagen. There, <a href="https://www.cl.cam.ac.uk/~pes20/">Peter Sewell</a> gave an invited talk &quot;Tales From the Jungle&quot; about rigorous methods for real-world infrastructure (C semantics, hardware (concurrency) behaviour of CPUs, TCP/IP, and likely more). Working on formal specifications myself (in <a href="https://en.itu.dk/-/media/EN/Research/PhD-Programme/PhD-defences/2013/130731-Hannes-Mehnert-PhD-dissertation-finalpdf.pdf">my dissertation</a>) and having a strong interest in real systems, I was immediately hooked by his perspective.</p>
+<p>To dive a bit more into <a href="https://www.cl.cam.ac.uk/~pes20/Netsem/">network semantics</a>, the work done on TCP by Peter Sewell et al. is a formal specification (or a model) of TCP/IP and the Unix sockets API developed in HOL4. It is a labelled transition system with nondeterministic choices, and the model itself is executable. It has been validated against the real world by collecting thousands of traces on Linux, Windows, and FreeBSD, which have been checked by the model for validity. This copes with the different implementations of the English prose of the RFCs. The network semantics research found several issues in existing TCP stacks and reported them upstream to have them fixed (though there still is some special treatment, e.g., for the &quot;BSD listen bug&quot;).</p>
+<p>In 2014, I joined Peter's research group in Cambridge to continue their work on the model: updating to more recent versions of HOL4 and PolyML, revising the test system to use DTrace, updating to a more recent FreeBSD network stack (from FreeBSD 4.6 to FreeBSD 10), and finally getting the <a href="https://dl.acm.org/doi/10.1145/3243650">journal paper</a> (<a href="http://www.cl.cam.ac.uk/~pes20/Netsem/paper3.pdf">author's copy</a>) published. At the same time, the <a href="https://mirage.io">MirageOS</a> melting pot was happening at the University of Cambridge, where I contributed OCaml-TLS, etc., with David.</p>
+<p>My intention was to understand TCP better and use the specification as a basis for a TCP stack for MirageOS. The <a href="https://github.com/mirage/mirage-tcpip">existing one</a> (which is still used) has technical debt: a high ratio of open issues to lines of code. The Lwt monad is ubiquitous, which makes testing and debugging pretty hard, and utilising multiple cores with OCaml Multicore won't be easy. Plus it has various resource leaks, and there is no active maintainer. But honestly, it works fine on a local network, and with well-behaved traffic. It doesn't work that well on the wild Internet with a variety of broken implementations. Apart from resource leakage, which made me implement things such as restart-on-failure in albatross, there are certain connection states which will never be exited.</p>
+<h1>The Rise of <a href="https://github.com/robur-coop/utcp">&micro;TCP</a></h1>
+<p>Back in Cambridge, I didn't manage to write a TCP stack based on the model, but in 2019, I restarted that work and got &micro;TCP (the formal model manually translated to OCaml) to compile and do TCP-session setup and teardown. Since it was a model that uses nondeterminism, this couldn't be translated one-to-one into an executable program, but there are places where decisions have to be made. Due to other projects, I worked only briefly in 2021 and 2022 on &micro;TCP, but finally in the Summer of 2023, I motivated myself to push &micro;TCP into a usable state. So far I've spent 25 days in 2023 on &micro;TCP. Thanks to <a href="https://tarides.com">Tarides</a> for supporting my work.</p>
+<p>Since late August, we have been running some unikernels using &micro;TCP, e.g., the <a href="https://retreat.mirage.io">retreat</a> website. This allows us to observe &micro;TCP and find and solve issues that occur in the real world. It turned out that the model is not always correct (e.g., there is no retransmit timer in the close wait state, which prevents proper session teardowns). We report statistics about how many TCP connections are in which state to an Influx time series database and view graphs rendered by Grafana. If there are connections that are stuck for multiple hours, this indicates a resource leak that should be addressed. Grafana was tremendously helpful to find out where to look for resource leaks. Still, there's work to understand the behaviour, look at what the model does, what &micro;TCP does, what the RFC says, and eventually what existing deployed TCP stacks do.</p>
+<h1>The Secondary Nameserver Issue</h1>
+<p>One of our secondary nameservers attempts to receive zones (via AXFR using TCP) from another nameserver that is currently not running. Thus, the remote host replies to each SYN packet with a corresponding RST. Below I graphed, for our nameserver, the network utilisation on the left (sent data/packets on the positive y-axis, received on the negative, over time on the x-axis) and the memory usage on the right (bytes on the y-axis, over time on the x-axis). You can observe that both increase over time, and roughly every 3 hours, the unikernel hits its configured memory limit (64 MB), crashes with *out of memory*, and is restarted. The graph below is using the `mirage-tcpip` stack.</p>
+<p><a href="https://hannes.robur.coop/static/img/a.ns.mtcp.png"><img src="https://hannes.robur.coop/static/img/a.ns.mtcp.png" width="750"/></a></p>
+<p>Now, after switching over to &micro;TCP, graphed below, there's much less network utilisation and the memory limit is only reached after 36 hours, which is a great result. Still, it is not very satisfying that the unikernel leaks memory. On their left side, both graphs contain a few hours of `mirage-tcpip`, and shortly after 20:00 on Nov 23rd, &micro;TCP got deployed.</p>
+<p><a href="https://hannes.robur.coop/static/img/a.ns.mtcp-utcp.png"><img src="https://hannes.robur.coop/static/img/a.ns.mtcp-utcp.png" width="750"/></a></p>
+<p>Investigating the involved parts showed that a TCP connection that was never established has been registered at the MirageOS layer, but the pure core does not expose an event from the received RST that the connection has been cancelled. This means the MirageOS layer piles up all the connection attempts, and it doesn't inform the application that the connection couldn't be established. Once this was well understood, developing the <a href="https://github.com/robur-coop/utcp/commit/67fc49468e6b75b96a481ebe44dd11ce4bb76e6c">required code changes</a> was straightforward. The graph shows that the fix was deployed at 15:25. The memory usage is constant afterwards, but the network utilisation increased enormously.</p>
+<p><a href="https://hannes.robur.coop/static/img/a.ns.utcp-ev.png"><img src="https://hannes.robur.coop/static/img/a.ns.utcp-ev.png" width="750"/></a></p>
+<p>Now, the network utilisation is unwanted. This was hidden by the application waiting forever for the TCP connection to be established. Our bug fix uncovered another issue--a tight loop:</p>
+<ul>
+<li>The nameserver attempts to connect to the other nameserver (<code>request</code>);
+</li>
+<li>This results in a <code>TCP.create_connection</code> which errors after one roundtrip;
+</li>
+<li>This leads to a <code>close</code>, which attempts a <code>request</code> again.
+</li>
+</ul>
+<p>This is unnecessary since the DNS server code has a timer to attempt to connect to the remote nameserver periodically (but takes a break between attempts). After understanding this behaviour, we worked on <a href="https://github.com/mirage/ocaml-dns/pull/347">the fix</a> and redeployed the nameserver. On the left edge, the graph has the tight loop (so you have a comparison), and at 16:05, we deployed the fix. Since then it looks pretty smooth, both in memory usage and in network utilisation.</p>
+<p><a href="https://hannes.robur.coop/static/img/a.ns.utcp-fixed.png"><img src="https://hannes.robur.coop/static/img/a.ns.utcp-fixed.png" width="750"/></a></p>
+<p>To give you the entire picture, below is the graph where you can spot the `mirage-tcpip` stack (lots of network, restarting every 3 hours), &micro;TCP-without-informing-application (run for 3 * ~36 hours), DNS-server-high-network-utilisation (which only lasted for a brief period, thus it is more a point in the graph), and finally the unikernel with both fixes applied.</p>
+<p><a href="https://hannes.robur.coop/static/img/a.ns.all.png"><img src="https://hannes.robur.coop/static/img/a.ns.all.png" width="750"/></a></p>
+<h1>Conclusion</h1>
+<p>What can we learn from that? Choosing convenient tooling is crucial for effective debugging. Also, fixing one issue may uncover other issues. And of course, the `mirage-tcpip` stack was running with the DNS server that had a tight reconnect loop. But, bottom line: should such an application lead to memory leaks? I don't think so. My approach is that all core network libraries should work in a non-resource-leaky way with any kind of application on top of them. When one TCP connection returns an error (and thus is destroyed), the TCP stack should have no more resources used for that connection.</p>
+<p>We'll take more time to investigate issues of &micro;TCP in production, plan to write further documentation and blog posts, and hopefully soon will be ready for an initial public release. In the meantime, you can follow our development repository.</p>
+<p>We at <a href="https://robur.coop">Robur</a> have been working as a collective since 2018 on public funding, commercial contracts, and donations. Our mission is to get sustainable, robust, and secure MirageOS unikernels developed and deployed. Running your own digital communication infrastructure should be easy, including trustworthy binaries and smooth upgrades. You can help us continue our work by <a href="https://aenderwerk.de/donate/">donating</a> (select Robur from the drop-down or put &quot;donation Robur&quot; in the purpose of the bank transfer).</p>
+<p>If you have any questions, the best way to reach us is via email to team AT Robur DOT coop.</p>
+
diff --git a/data/planet/practicalocaml/-hello--world-.md b/data/planet/practicalocaml/-hello--world-.md
new file mode 100644
index 0000000000..9eccb2030a
--- /dev/null
+++ b/data/planet/practicalocaml/-hello--world-.md
@@ -0,0 +1,16 @@
+---
+title: '{ hello = `world; }'
+description: And here we are! After writing my share of ReScript at Practical ReScript,
+  I figured I'd start a blog for OCaml that would help consolidate my experiences
+  with it into some Practical advice, that you can use to start new projects, to contribute
+  to existing ones, hell even get a
+url: https://practicalocaml.com/hello-world/
+date: 2023-08-23T00:30:16-00:00
+preview_image:
+featured:
+authors:
+- Practical OCaml
+source:
+---
+
+<p>And here we are! After writing my share of ReScript at Practical ReScript, I figured I'd start a blog for OCaml that would help consolidate my experiences with it into some Practical advice that you can use to start new projects, to contribute to existing ones, hell, even get a job with OCaml!</p><p>Since I don't write OCaml <em>every day</em>, sometimes I'll tag in other people to help out.</p><p>But today is just about saying hello &#128075; </p><p>Some of the subjects you can expect to read about here are:</p><ul><li>How to set up projects with shared libraries between melange and dune</li><li>How to structure your GraphQL server/client</li><li>How to build web apps with HTMX, React</li><li>Maybe even how to run OCaml on some runtimes like AWS Lambda</li></ul><p>And definitely on topics like:</p><ul><li>Modeling, and how to cheaply explore ideas </li><li>How to structure your projects so refactoring is easy</li><li>How to build command line applications</li><li>and more!</li></ul><p>So if you've got any ideas, feel free to post at me on X: <a href="https://x.com/leostera">@leostera</a></p>
diff --git a/data/planet/practicalocaml/a-quick-guide-to-gadts-and-why-you-aint-gonna-need-them.md b/data/planet/practicalocaml/a-quick-guide-to-gadts-and-why-you-aint-gonna-need-them.md
new file mode 100644
index 0000000000..8af17e8e83
--- /dev/null
+++ b/data/planet/practicalocaml/a-quick-guide-to-gadts-and-why-you-aint-gonna-need-them.md
@@ -0,0 +1,102 @@
+---
+title: A quick guide to GADTs and why you ain't gonna need them
+description: Ever wanted to use a GADT but did not know if you really needed them?
+  You probably don't. And here's why.
+url: https://practicalocaml.com/a-quick-guide-to-gadts-and-why-you-aint-gonna-need-them/
+date: 2023-08-28T13:46:43-00:00
+preview_image:
+featured:
+authors:
+- Practical OCaml
+source:
+---
+
+<p>I've been seeing some more posts about how to use and when to use GADTs. GADTs (Generalized Algebraic Data Types, I pronounce them &quot;gats&quot; like in &quot;cats&quot; but with a &quot;g&quot;) are an extension to regular ADTs that is usually reserved for very specific scenarios, but it's not always clear which those are and why.</p><p>So I figured I'd give this a shot, and write a super small primer on them, by example. We're gonna write a library that didn't need to use GADTs, and along the way we're gonna feel the pains that come with GADTs, and what specific things they are amazing for.</p><p>Buckle on to your camels! &#128043;</p><h2>GADTs by Example</h2><p>Alright, usually you'd write your variant as:</p><pre><code class="language-ocaml">type role = 
+  | Guest of guest
+  | User of user</code></pre><p>and use it as <code>User(user)</code> or <code>Guest(guest)</code>. You can think of these constructors as little functions that go from some arguments to a value of type <code>role</code>. So really <code>Guest</code> &quot;has type&quot; <code>guest -&gt; role</code>. And <code>User</code> has type <code>user -&gt; role</code>.</p><p>A GADT will let you change (a little) the <em>return type</em> of these constructors. So let's see what our <code>role</code> example looks like with GADT syntax.</p><pre><code class="language-ocaml">type role =
+  | Guest: guest -&gt; role
+  | User: user -&gt; role</code></pre><p>Neat, right? I really like this syntax.</p><p>But in this case, our <code>role</code> type can't be <em>segmented</em> or <em>parametrized. </em>I mean you can just have <code>role</code>, it's not like an <code>option</code> or a <code>list</code> where you can have an <code>int option</code> or <code>bool list</code> and those are more specific types of <code>option</code> or <code>list</code>. This means we can really only return <code>role</code> in any of our constructors, so <strong>you probably don't need GADTs.</strong> </p><p>Let's look at an example that <em>does not need GADTs </em>but uses them anyway.</p><h3>A Validation Library</h3><p>We'll make a validation type that we can use to mark things as validated throughout our app:</p><pre><code class="language-ocaml">type _ validation =
+  | Valid: 'thing -&gt; 'thing validation
+  | Invalid: 'thing * string -&gt; 'thing validation
+  | Pending: 'thing -&gt; 'thing validation</code></pre><p>Note how using our constructors returns different types. If you use <code>Valid(&quot;hello&quot;)</code> you get a <code>string validation</code>, if you use <code>Invalid(2113, &quot;not cool number&quot;)</code> you get an <code>int validation</code>, and so on.</p><p>If we try to make some helper functions they may look like this:</p><pre><code class="language-ocaml">let make : 'thing -&gt; 'thing validation =
+  fun x -&gt; Pending x
+ 
+let get : 'x validation -&gt; 'x =
+  function
+  | Pending x -&gt; x
+  | Valid x -&gt; x
+  | Invalid (x, _) -&gt; x
+  
+let errors : 'x validation -&gt; string option =
+  function
+  | Invalid (_, err) -&gt; Some err
+  | _ -&gt; None 
+  
+let is_valid : 'x validation -&gt; bool =
+  function
+  | Valid _ -&gt; true
+  | _ -&gt; false</code></pre><p>You can imagine how we'll have to check every time we want to open up a validation, to see if it's pending, valid, or invalid. This means other devs will have to remember to check if the validation passed before using the value.</p><p>I don't know about you but I'm 100% certain I will forget to do that at least once.</p><p><strong>GADTs can help us here to reduce the space of all the possible types that our variant constructors create. </strong>Right now, we can essentially create any <code>&lt;type&gt; validation</code> by just passing the right value to the constructors. But we could change that, so that you can statically check a value has passed, is pending, or has failed validation. Our new GADT could look like this:</p><pre><code class="language-ocaml">(* we'll make some abstract types to use for differentiating
+   the validation states *)
+type pending
+type failed
+type valid
+
+(* our new validation GADT *)
+type _ validation =
+  | Pending: 'thing -&gt; pending validation
+  | Valid: 'thing -&gt; valid validation
+  | Failed: 'thing * string -&gt; failed validation</code></pre><p>This means that our functions can have restricted type signatures and smaller implementations. The compiler now knows that only the <code>Valid</code> constructor can produce a <code>valid validation</code>, so a function that takes one doesn't have to consider the other constructors.</p><pre><code class="language-ocaml">let make x = Pending x
+
+let get (Valid x) = x
+
+let errors (Failed (_, err)) = err</code></pre><p>Great! The downside is that this code doesn't actually type-check. &nbsp;You can't pattern-match and get a value out of <code>Valid x</code> because the compiler &quot;forgot&quot; what type that value had. Let me show you what I mean. This function:</p><pre><code class="language-ocaml">let get (Valid x) = x</code></pre><p>fails to type-check with this error:</p><pre><code class="language-ocaml">File &quot;lib/gadt.ml&quot;, line 18, characters 20-21:
+18 | let get (Valid x) = x
+                         ^
+Error: This expression has type $Valid_'thing
+       but an expression was expected of type 'a
+       The type constructor $Valid_'thing would escape its scope</code></pre><p>And the type of x in the error is <code>$Valid_'thing</code>, which is a weird type. The compiler yells that it would escape its scope. That's how OCaml tells us <em>&quot;Hey I know there should be a type here but I...erhm...don't know anymore what that type *actually* was. So, yeah, can't let you use it. Sorry&quot;</em>.</p><p>So how does one get this value out?</p><p>Turns out that while the compiler won't let this type <em>escape</em>, if you put many things together inside the same constructor, <em>it will remember if the type was the same across all of them</em>. For example, the compiler is completely happy here:</p><pre><code class="language-ocaml">(* we introduce a function in our Valid arguments *)
+type 'validity validation = 
+  | Valid: 'thing * ('thing -&gt; bool) -&gt; valid validation
+  | ...
+  
+let get_that_bool (Valid (x, fn)) = fn x </code></pre><p>because it understands that once you pack together a <code>'thing</code> and a <code>'thing -&gt; bool</code>, then it's the same <code>'thing</code>. &nbsp;So this will work for any type. &nbsp;And that's both quite powerful and also super non-obvious at first. Like, what is this useful for? </p><p><strong>GADTs can help us hide type information, but still be able to use it later. </strong>In <a href="https://practicalocaml.com/how-i-explore-domain-problems-faster-and-cheaply-in-ocaml/">the last issue of Practical OCaml</a> we created a <code>route</code> type for our web router that uses this pattern to hide the types that different route-handler functions use as inputs, and it lets us put together into a single list, a bunch of routes that have type-safe parameters/body parsing. Pretty cool. </p><p>Anyways, back to our question, to get our Valid value out, we will need to either:</p><ul><li>let the value <em>escape</em></li><li>or turn it into a type that is known outside the GADT, like include a <code>'thing -&gt; string</code> function so we can always just call that function to turn our <code>'thing</code> into a string and return a string</li></ul><p>We want to preserve the type of our value, so we're gonna do the first. Letting our value escape means actually putting <code>'thing</code> in the return type of our constructors. Like this:</p><pre><code class="language-ocaml">type _ validation =
+  | Valid: 'thing -&gt; (valid * 'thing) validation
+  | ...</code></pre><p>That's actually all we need. Now the compiler can infer the signature of our <code>get</code> function is <code>(valid * 'thing) validation -&gt; 'thing</code>, and it lets us take that <code>'thing</code> out. BAM. Done.</p><p>And yet our validation solution still doesn't help us actually validate anything. We don't have a function that goes from pending to valid, or from pending to invalid. Since our <code>Pending</code> variant doesn't know what a <code>'thing</code> is, it also doesn't know how to validate it. &nbsp;We'll start there:</p><pre><code class="language-ocaml">(* we added a `fn` to our Pending constructor *)
+type _ validation =
+  | Pending: 'thing * ('thing -&gt; bool) -&gt; pending validation
+  | ...
+ 
+let validate (Pending (x, fn)) =
+  if fn x
+  then Valid x
+  else Invalid (x, &quot;invalid value!&quot;)</code></pre><p>We run the compiler, and see the same issue as before: <code>Valid x</code> would have type <code>$Pending_'thing</code>, because our Pending variant doesn't really expose its internal <code>'thing</code> type... yadda yadda...we can fix that too by letting <code>'thing</code> escape:</p><pre><code class="language-ocaml">type _ validation =
+  | Pending: 'thing * ('thing -&gt; bool) -&gt; (pending * 'thing) validation
+  | ...</code></pre><p>Aaaand now we run into another issue. Oof. <code>Valid x</code> and <code>Invalid (x, &quot;invalid value!&quot;)</code> have different types &#128584; &ndash; this is a very common &quot;dead end&quot;.</p><p>For now, we are going to wrap 'em up in a <code>result</code> and it will push the problem to future you and me:</p><pre><code class="language-ocaml">let validate (Pending (x, fn)) =
+   if fn x
+   then Ok (Valid x)
+   else Error (Invalid (x, &quot;invalid value!&quot;))</code></pre><p>So now we can use our Extremely Type-Safe Validation Lib:</p><pre><code class="language-ocaml">let _ =
+  let user_value = make 2113 (fun x -&gt; x &gt; 0) in
+  match validate user_value with
+  | Ok value -&gt;
+      let age = get value in
+      print_int age;
+  | Error err -&gt;
+      let err = errors err in
+      print_string err;
+</code></pre><p>Hopefully implementing this has shown some of the reasons why GADTs, while powerful, are rather painful to work with. </p><p>For completeness's sake, here's the full code:</p><pre><code class="language-ocaml">type valid
+type invalid
+type pending
+
+type _ validation =
+  | Pending : 'thing * ('thing -&gt; bool) -&gt; (pending * 'thing) validation
+  | Valid : 'thing -&gt; (valid * 'thing) validation
+  | Invalid : 'thing * string -&gt; invalid validation
+
+let make x fn = Pending (x, fn)
+let get (Valid x) = x
+let errors (Invalid (_, err)) = err
+
+let validate (Pending (x, fn)) =
+  if fn x then Ok (Valid x) else Error (Invalid (x, &quot;invalid value!&quot;))</code></pre><h2>Conclusion: You Don't Need GADTs</h2><p>Truth is that unless you are Jane Street and <a href="https://blog.janestreet.com/why-gadts-matter-for-performance/?ref=practicalocaml.com">need to optimize the hell out of your compact arrays</a>, or <a href="https://v2.ocaml.org/manual/gadts-tutorial.html?ref=practicalocaml.com">are writing a toy &lambda;-calculus interpreter</a>, you're probably better off without them.</p><p>GADTs can be super useful if you need to:</p><ol><li>hide type information</li><li>restrict the kind of types that can be instantiated</li><li>have more control over the relation between the input and return type of a function</li></ol><p>But GADTs are not only hard to pronounce, they also come with a host of problems. The ones we've seen, and some more that we haven't that need solutions with wizardly names like <em>locally abstract types </em>or <em>polymorphic recursion.</em> Learning about this is fun and great, but sometimes can get in the way of shipping without substantially improving the quality of your product or developer experience.</p><p>So stick to simpler types until you run into one of those 3 problems and I promise you you'll be a productive, happy camelid &#128640;</p><p>Have you implemented typed state machines in some other ways? Have anything to add or challenge? I'd love to hear it! <a href="https://twitter.com/leostera/status/1696157662122065975?ref=practicalocaml.com">Join the x.com thread</a>.</p><p>Happy Cameling! &#128043;</p><hr/><p>Thanks to <a href="https://twitter.com/patricoferris?ref=practicalocaml.com">@patricoferris</a> for pointing out that if you do implement the above pattern, but move outside the current module (for ex. have a submodule for your <code>valid</code>, <code>invalid</code>, <code>pending</code> types), you may run into some undecidability problems that make pattern-matching non-exhaustive. 
Oof, many words. The gist is, if you see a &quot;This pattern is non-exhaustive&quot; error, try adding a private constructor to your type-tags:</p><pre><code class="language-ocaml">type valid = private | Valid_tag
+type invalid = private | Invalid_tag
+type pending = private | Pending_tag</code></pre><p>For a more detailed answer, check out this <a href="https://discuss.ocaml.org/t/gadt-pattern-matching-exhaustiveness/7195?ref=practicalocaml.com">OCaml forum post</a>.</p><p></p>
diff --git a/data/planet/practicalocaml/how-i-explore-domain-problems-cheaply-and-fast-with-ocaml-modeling-a-web-router.md b/data/planet/practicalocaml/how-i-explore-domain-problems-cheaply-and-fast-with-ocaml-modeling-a-web-router.md
new file mode 100644
index 0000000000..9cc961925b
--- /dev/null
+++ b/data/planet/practicalocaml/how-i-explore-domain-problems-cheaply-and-fast-with-ocaml-modeling-a-web-router.md
@@ -0,0 +1,16 @@
+---
+title: 'How I explore domain problems cheaply and fast with OCaml: modeling a web
+  router'
+description: "You've heard of Domain-Driven Design, now buckle up for Type-Driven
+  Domain..wait. Typed Domains Driving...nevermind. We're gonna use Only Types to Understand
+  our Domain Problems Very Fast! \U0001F680"
+url: https://practicalocaml.com/how-i-explore-domain-problems-faster-and-cheaply-in-ocaml/
+date: 2023-08-24T13:29:21-00:00
+preview_image:
+featured:
+authors:
+- Practical OCaml
+source:
+---
+
+<p>Hello folks! Starting out the blog with a topic that I love: domain modeling.</p><p>Domain modeling is the art and science of figuring out how to map some messy, fuzzy, real-life ideas and things, into a computer program that we can execute.</p><p>It is usually easier to say than it is to do, so I figured I'd give you an example of how I've done domain modeling in the past and how I like to explore domain problems through it.</p><h2>Shapes of Things</h2><p>There's only so many kinds of data we can have in our programs. You have single things, you have collections of the same things. You have collections of different things that also happen to be a thing, and they can either be one of many things that are the same thing, or many things together making a thing!</p><p>Where I'm going with this is that the shapes of data that you normally work with come in a few packages.</p><p>We have many things that belong together but are distinct on their own. In OCaml we call these <em>variants</em>.</p><p>We have many things that belong together and form a single unit, where every piece has its own place, and the place doesn't have a name. In OCaml we call these <em>tuples. </em>But when these pieces don't have a place and instead have a name, in OCaml we call these <em>records</em>.</p><p>We have things that exist on their own and we don't know much about them. In OCaml we call these <em>abstract types</em>.</p><p>We have things that don't tell us everything that they are, but that can hide information from us. In OCaml we call these <em>generalized algebraic data types</em> (GADTs, and I pronounce it like &quot;cats&quot; but with a G, try it out loud, it's cool, no-one can hear you).</p><p>We have things that are collections of other things. Lists, arrays, hash tables, sets, queues, heaps, tuples. </p><p>And in fact, we can create most of these different shapes of data with some of the simpler ones. Records can be made with tuples. Lists can be made with variants. 
And so on.</p><p>Okay, enough of this. Let's start using these shapes for some specific domain problems.</p><h2>Modeling a Web Router</h2><p>We'll start with one that most of you will likely be familiar with: a web router. That is a router that helps a web framework figure out where to send each request. There are many out there, and I'm not pitching you to write your own (but you should, because it's a good way to learn!).</p><p>But what really <em>is</em> a web router? It's a collection of patterns and handler functions. Not unlike a <code>match</code> expression really. It matches the <code>pattern</code> against a web request object, and if it succeeds it will execute the function that corresponds to it.</p><p>So we can start by defining what we know:</p><ul><li>there are patterns, and</li><li>there are handler functions,</li><li>a route is a pair of a pattern and a handler</li><li>a router has routes</li></ul><figure class="kg-card kg-image-card"><img src="https://practicalocaml.com/content/images/2023/08/image-1.png" class="kg-image" alt="alt" loading="lazy" width="1040" height="386" srcset="https://practicalocaml.com/content/images/size/w600/2023/08/image-1.png 600w, https://practicalocaml.com/content/images/size/w1000/2023/08/image-1.png 1000w, https://practicalocaml.com/content/images/2023/08/image-1.png 1040w" sizes="(min-width: 720px) 720px"/></figure><p>Brilliant! Now we have our types in place, we can start exploring how they interact with each other.</p><p>A router typically will receive some form of <code>request</code> and turn it into a <code>response</code>. After all, we expect to reply to our users with something. </p><p>So then, to make a response, we need &nbsp;to find a <code>handler</code>. We can do that by matching against every <code>route</code> until a <code>pattern</code> matches. 
When that happens, we call the handler and expect to get a <code>response</code> back.</p><p>Now slowly we are building up the right <em>vocabulary</em> not just around the problem, but also in the code that deals with it.</p><figure class="kg-card kg-image-card kg-width-wide"><img src="https://practicalocaml.com/content/images/2023/08/image-3.png" class="kg-image" alt="alt" loading="lazy" width="1040" height="746" srcset="https://practicalocaml.com/content/images/size/w600/2023/08/image-3.png 600w, https://practicalocaml.com/content/images/size/w1000/2023/08/image-3.png 1000w, https://practicalocaml.com/content/images/2023/08/image-3.png 1040w"/></figure><p>Notice how we create a few new types for <code>request</code> and <code>response</code>, which are new Things we are working with.</p><p>We also created two new functions, one for matching a <code>pattern</code> against a <code>request</code> called <code>matches</code>; and a second one called <code>route</code> to create a <code>response</code> from a <code>router</code> and <code>request</code>.</p><p>And that's it. We have our first model for a router. We have a clearer understanding of what the moving pieces are, and how they connect together.</p><p>From here we can take it in many directions, but what I like to do is to do a second pass and <em>challenge the model</em>.</p><h3>Challenging the model</h3><p>In the process of challenging, we want to grab individual pieces and ask what's important about them, how they differ from other things, and why they are really needed.</p><p>For example, why is a <code>pattern</code> a separate entity and not just a behavior of a <code>handler</code>? 
A handler could well <em>ignore</em> a request and just let the next handler handle it.</p><p>This would lead to a slightly different model, where a handler either tells us it has handled or ignored something, or the handler itself calls the next thing.</p><p>In the first case, we can model it by making a new type that can be either an ignored handler result, or a handled handler result:</p><figure class="kg-card kg-image-card"><img src="https://practicalocaml.com/content/images/2023/08/image-4.png" class="kg-image" alt="alt" loading="lazy" width="1040" height="746" srcset="https://practicalocaml.com/content/images/size/w600/2023/08/image-4.png 600w, https://practicalocaml.com/content/images/size/w1000/2023/08/image-4.png 1000w, https://practicalocaml.com/content/images/2023/08/image-4.png 1040w" sizes="(min-width: 720px) 720px"/></figure><p>This leads naturally to some implementations, such as folding over the list of routes, and bailing as soon as we find a route that returned <code>Handled(res)</code>. This is super flexible when it comes to letting the route itself decide how or if it will process a request.</p><p>But in the second case, we can see we have an even more powerful model. In this one we are making sure every handler receives <em>the next handler</em>, which it can call at any point, any number of times. 
This is what this second model looks like:</p><figure class="kg-card kg-image-card"><img src="https://practicalocaml.com/content/images/2023/08/image-5.png" class="kg-image" alt="alt" loading="lazy" width="1056" height="656" srcset="https://practicalocaml.com/content/images/size/w600/2023/08/image-5.png 600w, https://practicalocaml.com/content/images/size/w1000/2023/08/image-5.png 1000w, https://practicalocaml.com/content/images/2023/08/image-5.png 1056w" sizes="(min-width: 720px) 720px"/></figure><p>This model leads to a recursive implementation, where we have to build our handlers in advance, so that the calls to <code>next</code> go in the right order. This can be much trickier than the prior models we saw.</p><h3>Refining the Model</h3><p>Once we find a model that we like, and in this case we'd like to stick with the first one, we can start doing some <em>refinements</em>.</p><p>Refining is the process of adding detail to the model, and it helps us see how it can materialize as a working application.</p><p>For example, we can take our <code>pattern</code> type and start looking into what shapes it can actually take. Usually, a domain expert here is the best person to ask: &quot;what really is a pattern?&quot;</p><p>In our case, we want to be able to match on the route URL or <em>path</em>; the kind of HTTP method they are using, or <em>verb</em>; and we'd like to know what is in the body. </p><p>To do this we expand our <code>pattern</code> type once to include some data, and in the process we define a new type for the HTTP method, since we knew we needed that and we roughly understand upfront the shape it has: it can be one of some options. 
The new pattern now looks like this: </p><figure class="kg-card kg-image-card"><img src="https://practicalocaml.com/content/images/2023/08/image-6.png" class="kg-image" alt="alt" loading="lazy" width="1040" height="522" srcset="https://practicalocaml.com/content/images/size/w600/2023/08/image-6.png 600w, https://practicalocaml.com/content/images/size/w1000/2023/08/image-6.png 1000w, https://practicalocaml.com/content/images/2023/08/image-6.png 1040w" sizes="(min-width: 720px) 720px"/></figure><p>Excellent, but now what exactly happens in the body? </p><p>As it is above, it can either be present and be a string, or be not present. If it is <code>Some(string)</code> then both an empty string <code>&quot;&quot;</code> and the entire works of Shakespeare in Korean would be valid bodies. Is this really what we mean to say?</p><p>Here's where our refining doubles down, and asks if there's anything special about the body in this specific pattern, or in this specific route. So we're relating the current refining learnings with our past learnings.</p><p>Turns out the body should actually be something that the handler can in fact handle, so we need somehow to make it fit into what the handler expects.</p><p>So does the handler really expect a <code>request</code>? Or does it expect a specific kind of request? Let's see if we can be more specific in a few steps:</p><ol><li>Let our <code>pattern</code> be more specific about a request payload</li><li>Make our <code>handler</code> be specific about the request payload</li><li>Make our <code>router</code> work with our new <code>handler</code></li></ol><p>We will start by making our <code>pattern</code> take a <em>type parameter</em>. 
At this point a <em>type parameter</em> usually means &quot;here's a kind of data that is really a much larger group of data, where there are subgroups of it that can't be mixed&quot;.</p><figure class="kg-card kg-image-card"><img src="https://practicalocaml.com/content/images/2023/08/image-9.png" class="kg-image" alt="alt" loading="lazy" width="1040" height="432" srcset="https://practicalocaml.com/content/images/size/w600/2023/08/image-9.png 600w, https://practicalocaml.com/content/images/size/w1000/2023/08/image-9.png 1000w, https://practicalocaml.com/content/images/2023/08/image-9.png 1040w" sizes="(min-width: 720px) 720px"/></figure><p>BUT there's a big problem here. We can't really create the <em>pattern</em> ahead of time, knowing what the payload will be like. </p><p>A pattern really is a <em>specification</em> for how to match against requests. So instead, we need to provide a <em>way</em> of reading the body into the <code>'payload</code> type. Thankfully, we have first-class functions in OCaml, so this is an easy fix.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://practicalocaml.com/content/images/2023/08/image-10.png" class="kg-image" alt="alt" loading="lazy" width="1040" height="432" srcset="https://practicalocaml.com/content/images/size/w600/2023/08/image-10.png 600w, https://practicalocaml.com/content/images/size/w1000/2023/08/image-10.png 1000w, https://practicalocaml.com/content/images/2023/08/image-10.png 1040w" sizes="(min-width: 720px) 720px"/><figcaption>Notice how our <code>body</code> now becomes a function that will receive a <code>string</code> and try to return a <code>'payload</code>. If it can't, it can always return <code>None</code>, which is why the result is an <code>option</code>. 
In practice we would probably use a <code>result</code> type here, but for the sake of this post an <code>option</code> is good enough.</figcaption></figure><p>Next up, we have our handler, which now should receive a type parameter for our payload and look a bit more like this:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://practicalocaml.com/content/images/2023/08/image-19.png" class="kg-image" alt="alt" loading="lazy" width="1130" height="304" srcset="https://practicalocaml.com/content/images/size/w600/2023/08/image-19.png 600w, https://practicalocaml.com/content/images/size/w1000/2023/08/image-19.png 1000w, https://practicalocaml.com/content/images/2023/08/image-19.png 1130w" sizes="(min-width: 720px) 720px"/><figcaption>Our handler function type now receives a payload before a request.</figcaption></figure><p>And finally, it seems that our <code>route</code> and <code>router</code> types don't need much amending. Because we really just need a list of patterns and handlers, and that's exactly what they are, right? </p><p>Right?! &#128584;</p><figure class="kg-card kg-image-card kg-width-wide"><img src="https://practicalocaml.com/content/images/2023/08/image-17.png" class="kg-image" alt="alt" loading="lazy" width="2000" height="394" srcset="https://practicalocaml.com/content/images/size/w600/2023/08/image-17.png 600w, https://practicalocaml.com/content/images/size/w1000/2023/08/image-17.png 1000w, https://practicalocaml.com/content/images/size/w1600/2023/08/image-17.png 1600w, https://practicalocaml.com/content/images/size/w2400/2023/08/image-17.png 2400w" sizes="(min-width: 1200px) 1200px"/></figure><p>Oh no. If we follow this current refinement and thread in a <code>'payload</code> parameter to our <code>route</code> type, we will end up with a single type of payload in the list of routes. 
This is because the list type can only hold one type of elements, and every <code>'payload route</code> is essentially a new type!</p><ul><li><code>unit route</code> is a type of routes that have no payloads</li><li><code>user route</code> is a type of routes that have payloads of type <code>user</code></li><li>and every one of these is not mixable with the rest :(</li></ul><p>So we can either backtrack, and <em>move</em> this body parsing function inside the handler, to let the handlers figure out how to work with it. Or we can find another way of putting all these handlers together in a list, while keeping the model as close to the domain as possible.</p><p>For this, we can use a special type of type OCaml has, that lets us <em>hide information</em>. The gist is this:</p><ul><li>We will refactor our <code>route</code> type to include a constructor named <code>Route</code></li><li>this constructor will take a <code>'payload pattern</code> and a <code>'payload handler</code></li><li>and it will return a <code>route</code> that hides the <code>'payload</code> information</li><li>so we can put all our routes in a single list!</li></ul><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://practicalocaml.com/content/images/2023/08/image-20.png" class="kg-image" alt="alt" loading="lazy" width="1330" height="432" srcset="https://practicalocaml.com/content/images/size/w600/2023/08/image-20.png 600w, https://practicalocaml.com/content/images/size/w1000/2023/08/image-20.png 1000w, https://practicalocaml.com/content/images/2023/08/image-20.png 1330w" sizes="(min-width: 1200px) 1200px"/><figcaption>Using a GADT to capture a route with a pattern and a handler, but hiding their payload parameter.</figcaption></figure><p>Damn, there we go! 
&#128293; Now we have all the information we want and it seems to be nicely encapsulated in this <code>route</code> type.</p><p>This model actually leads to a rather complex implementation, because every time we unpack the <code>Route</code>, we must make use of both the pattern and the handler at the same time. That's the only requirement for using this information-hiding pattern: you can peek, but you can't leak the information.</p><p>For completeness' sake, here's a small implementation that follows our model:</p><figure class="kg-card kg-image-card kg-width-wide"><img src="https://practicalocaml.com/content/images/2023/08/image-21.png" class="kg-image" alt="alt" loading="lazy" width="1348" height="1466" srcset="https://practicalocaml.com/content/images/size/w600/2023/08/image-21.png 600w, https://practicalocaml.com/content/images/size/w1000/2023/08/image-21.png 1000w, https://practicalocaml.com/content/images/2023/08/image-21.png 1348w" sizes="(min-width: 1200px) 1200px"/></figure><h2>Conclusions from Modeling a Router</h2><p>Like this, we've quickly gone through several iterations of our model, tried to understand better what problem we are trying to solve, what are some of the constraints it has, and how our model leads to different implementations.</p><p>It is very important to understand that this first implementation is meant to be <em>correct</em>, and not necessarily optimal. 
But it can make a great first implementation to test things against, and eventually help you optimize while making sure you are not breaking good behavior!</p><p>I'd love to go on with some more examples, like:</p><ol><li>modeling regulatory compliance for betting companies</li><li>modeling the publishability window of content in the music industry following geographic restrictions</li><li>modeling the optimal publishing of photography content to a social network</li><li>modeling an offline-first graph database for the edge</li><li>and more!</li></ol><p>But we're already over 2,000 words and I'd like you to get a glass of water and maybe stretch your legs. So let me know which modeling example you'd like to see in a next post.</p><p>If you liked this, please subscribe so you get the next issue of Practical OCaml right in your inbox, and share it with your camel friends on lobste.rs, hackernews, x.com, and so on.</p><p>Have you implemented typed state machines in some other ways? Have anything to add or challenge? I'd love to hear it! <a href="https://twitter.com/leostera/status/1695703044409676029?ref=practicalocaml.com">Join the x.com thread</a>.</p><p>Happy Cameling! &#128043;</p>
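+
+<p>The models in this post are shown as images, so here is a minimal text sketch of the final one. Every name in it (<code>http_method</code>, <code>raw_body</code>, <code>parse_body</code>, <code>not_found</code>, and so on) is an assumption made for illustration and may not match the screenshots; treat it as one possible shape rather than the definitive implementation.</p>
+
+```ocaml
+(* Hypothetical types: the post only shows these as images. *)
+type http_method = Get | Post
+
+type request = { meth : http_method; path : string; raw_body : string }
+type response = { status : int; text : string }
+
+(* A pattern carries a way to parse the raw body into a typed payload. *)
+type 'payload pattern = {
+  verb : http_method;
+  route_path : string;
+  parse_body : string -> 'payload option;
+}
+
+(* A handler receives the parsed payload before the request. *)
+type 'payload handler = 'payload -> request -> response
+
+(* The GADT constructor hides the payload type, so routes with
+   different payloads can live in the same list. *)
+type route = Route : 'payload pattern * 'payload handler -> route
+
+type router = route list
+
+(* Returns the parsed payload when verb, path, and body all match. *)
+let matches (p : 'payload pattern) (req : request) : 'payload option =
+  if p.verb = req.meth && String.equal p.route_path req.path then
+    p.parse_body req.raw_body
+  else None
+
+let not_found = { status = 404; text = "not found" }
+
+(* Try the routes in order. The payload is produced and consumed inside
+   one match arm, so its hidden type never escapes. *)
+let rec route (router : router) (req : request) : response =
+  match router with
+  | [] -> not_found
+  | Route (pattern, handler) :: rest -> (
+      match matches pattern req with
+      | Some payload -> handler payload req
+      | None -> route rest req)
+
+(* Two routes with different payload types (unit and string). *)
+let router : router =
+  [
+    Route
+      ( { verb = Get; route_path = "/ping"; parse_body = (fun _ -> Some ()) },
+        (fun () _req -> { status = 200; text = "pong" }) );
+    Route
+      ( {
+          verb = Post;
+          route_path = "/echo";
+          parse_body = (fun s -> if s = "" then None else Some s);
+        },
+        (fun payload _req -> { status = 200; text = payload }) );
+  ]
+```
+
+<p>In this sketch a request whose body fails to parse simply falls through to the next route, which is the same "peek, but don't leak" constraint described above: <code>matches</code> and the handler are used together inside the same match arm.</p>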
diff --git a/data/planet/practicalocaml/how-to-build-type-safe-state-machines-using-type-state.md b/data/planet/practicalocaml/how-to-build-type-safe-state-machines-using-type-state.md
new file mode 100644
index 0000000000..26ae4a0b78
--- /dev/null
+++ b/data/planet/practicalocaml/how-to-build-type-safe-state-machines-using-type-state.md
@@ -0,0 +1,117 @@
+---
+title: How to build type-safe State Machines using type-state
+description: Tired of writing state machines full of invalid transitions? Type-state
+  may be what you're looking for. In this issue of Practical OCaml we show you how
+  to use it to build type-safe state machines.
+url: https://practicalocaml.com/how-to-build-type-safe-state-machines-using-type-state/
+date: 2023-08-29T10:29:59-00:00
+preview_image:
+featured:
+authors:
+- Practical OCaml
+source:
+---
+
+<p>This will be a short one, but I hope it'll make you want to go refactor everything.</p><p>There are a lot of ways of writing state machines in typed languages, all varying in the degree of type-safety. Sure you can put everything in a record full of <code>this option</code> or have one variant per state and a bunch of repeated things...but today I wanna show you a way of doing state machines with something called <em>type-state.</em></p><p>What this buys you is that you can offer a nice uniform API for your state machines while keeping the internal states cleanly separated.</p><p>Let's start from the beginning.</p><h2>What is type-state?</h2><p>Type state is kind of a weird term. You don't really have <em>state</em> in your types, right? Like you don't have a counter that keeps incrementing.</p><figure class="kg-card kg-code-card"><pre><code class="language-ocaml">type counter = int
+let inc : counter -&gt; counter = fun x -&gt; x + 1
+
+let one: counter = inc 0
+let two: counter (* counter + 1? there is no such type *) = inc one
+</code></pre><figcaption>Look ma! I made a counter that counts, but you can't see how much it counts on its type. But it's counting, I promise.</figcaption></figure><p>You can bend over backwards to do this with some <em>type schemes</em> (which is a fancy word for &quot;how to do something&quot; or &quot;pattern&quot; that you may read around). A common way of doing this is to use type parameters to keep adding types to something. For example:</p><pre><code class="language-ocaml">type zero = Zero
+type 'a inc = Inc of 'a
+
+(* our `inc` function wraps whatever in an `inc` type *)
+let inc : 'a -&gt; 'a inc = fun x -&gt; Inc x
+
+let one: zero inc = inc Zero
+let two: zero inc inc = inc one
+let three: zero inc inc inc = inc two</code></pre><p>And this sort of thing can be suuuper useful if you want to statically validate something. For example, let's say you have a <code>fuel</code> type and you can only accelerate your car if you have <em>enough fuel</em>. And <em>enough fuel</em> is defined as &quot;one unit of fuel&quot; or in types as &quot;it's wrapped by one <code>fuel</code> type&quot;.</p><p>So we can start with a car without fuel, fuel it up, and then consume the fuel. And in the next example, you can see how the type annotation matches the amount of fuel a car has.</p><figure class="kg-card kg-code-card"><pre><code class="language-ocaml">type no_fuel = No_fuel
+type 'a fuel = Fuel of 'a
+
+let fuel_up x = (Fuel (Fuel x))
+
+let run (Fuel x) = x
+
+let empty_car = No_fuel
+let full_car: no_fuel fuel fuel = empty_car |&gt; fuel_up
+let full_car: no_fuel fuel = run full_car
+let full_car: no_fuel = run full_car
+
+(* this last one will be a type error! *)
+let full_car: no_fuel = run full_car</code></pre><figcaption>In this example we statically track the amount of fuel by using a <code>fuel</code> type that is essentially a counter. When we <em>fuel up</em> the counter goes up, when we <em>run</em> the counter goes down. We can only call <em>run</em> on a car with at least one <code>fuel</code>.</figcaption></figure><p>This fails because the last <code>full_car</code> that worked becomes a car with no fuel, and our <code>run</code> function requires a car <em>with</em> fuel.</p><p>But alas, this can get super complex quickly because:</p><ol><li>We're not used to this kind of programming in the types (unlike in programming languages like Idris or TypeScript)</li><li>We don't have good debugging tools for this, and type errors can become strange quickly, especially if we use polymorphic variants, objects, or <a href="https://practicalocaml.com/a-quick-guide-to-gadts-and-why-you-aint-gonna-need-them/">GADTs</a>. </li></ol><p>Okay, there's more to say about type-state, but this should be enough for now: <strong>type-state lets you rule out certain behaviors by putting information about the values into your types.</strong></p><p>Moving on...</p><h2>Type-state state machines</h2><p>Alright, we're ready to go. The pattern is pretty straightforward:</p><ol><li>We need a type for our <em>thing</em>, which will be a state machine</li><li>We need a type parameter for the <em>state</em> the machine is in</li><li>We want to save that state as a discrete value in our state machine type</li></ol><pre><code class="language-ocaml">type 'state fsm = { state: 'state }</code></pre><p>That's it. That's the pattern. If it seems small it's because it is, but there's so much power to this pattern! Let's explore with an example.</p><h3>Requesting Permissions as a Type-state machine</h3><p>We'll build a tiny Permissions module that will help us check if a User has the right permissions to access a resource. 
There will be 3 states: Requested, Granted, and Denied. We'll always start on Requested, and we can move to Granted or Denied. If we are on Granted we should have access to the resource itself; if we are on Denied we should have access to some reason for why we were denied access.</p><p>So we start with our basics, the pattern:</p><pre><code class="language-ocaml">type 'state t = { state: 'state }</code></pre><p>Next, we're going to add the 3 states as distinct types:</p><pre><code class="language-ocaml">type id
+type scope
+
+type 'resource granted = { resource : 'resource }
+type denied = { reason : string }
+type requested = { scopes : scope list }</code></pre><p>Let's put this together with our <code>t</code> type. We will need to introduce a new type parameter for the <code>'resource</code>, so we know what <code>'resource</code> to use when we create our <code>'resource granted</code> state. We'll also add some more metadata that is specific to a permission request but is shared across all states.</p><pre><code class="language-ocaml">type ('state, 'resource) t = {
+  (* the state of the permission request *)
+  state : 'state;
+  (* other shared metadata *)
+  resource_id : id;
+  user_id : id;
+}</code></pre><p>And now we can build our API on top of this, using our <code>'state</code> type to guide the user to the functions they can use:</p><ol><li>We want to create a new Permissions Request that will be a <code>requested t</code></li><li>We want to call a function that returns either a <code>granted t</code> with our resource, or a <code>denied t</code> with a reason</li><li>If we get a <code>denied t</code> we want to be able to see the reasons</li><li>If we get a <code>granted t</code> we want to be able to <em>use</em> the resource.</li></ol><p>Let's get to work!</p><p>We'll start with a constructor function, and a function to transition to our final states:</p><pre><code class="language-ocaml">let make ~resource_id ~user_id ~scopes =
+  { state = { scopes }; resource_id; user_id }
+
+let request_access t =
+  match run_request t with
+  | Ok resource -&gt; Ok { t with state = { resource } }
+  | Error reason -&gt; Error { t with state = { reason } }</code></pre><p>The basic machinery is done. We create a request, and the type system will infer correctly that the state should be <code>requested</code>, because a <code>requested</code> record includes <code>scopes</code>.</p><p>Then, our <code>request_access</code> function will return either a <code>('resource granted, 'resource) t</code> or a <code>(denied, 'resource) t</code> wrapped in a result. We can use any variant for this, even a Future, but a result is already present so we'll go with that here.</p><p>Great, the next step is implementing our operations:</p><pre><code class="language-ocaml">(* extract the reason from a `denied` state *)
+let reason { state = { reason }; _ } = reason
+
+(* do something with the resource in a `granted` state *)
+let with_resource { state = {resource}; _ } fn = fn resource
+
+(* get the resource out of a granted permission *)
+let get { state = {resource}; _ } = resource</code></pre><p>And done. The types get inferred nicely here as well because the records are all disjoint. Now we can use our little state machine:</p><pre><code class="language-ocaml">(* this would be our resource *)
+module Album =  struct
+  type t = string
+  let print = print_string
+end
+
+let _ =
+  let req: (_, Album.t) Permission_request.t =
+    Permission_request.make
+      ~resource_id:&quot;spotify:album:5SYItU4P7NIwiI6Swug4GE&quot;
+      ~user_id:&quot;user:2pVEM9qgPvPeMslgnGDDOr&quot;
+      ~scopes:[ &quot;listen&quot;; &quot;star&quot;; &quot;playlist/add&quot;; &quot;share&quot; ]
+  in
+
+  match Permission_request.request_access req with
+  | Ok res -&gt;
+      Permission_request.with_resource res Album.print;
+      let album = Permission_request.get res in
+      Album.print album;
+  | Error res -&gt;
+      print_string (Permission_request.reason res)</code></pre><p>Pretty neat, right?</p><p>For completeness' sake, here's our <code>Permission_request</code> module:</p><pre><code class="language-ocaml">module Permission_request = struct
+  type id = string
+  type scope = string
+  type 'resource granted = { resource : 'resource }
+  type denied = { reason : string }
+  type requested = { scopes : scope list }
+
+  type ('state, 'resource) t = {
+    (* the state of the permission request *)
+    state : 'state;
+    (* other shared metadata *)
+    resource_id : id;
+    user_id : id;
+  }
+
+  let make ~resource_id ~user_id ~scopes =
+    { state = { scopes }; resource_id; user_id }
+
+  (* TODO(@you): you can implement here the logic for checking
+   * if you actually have permissions for this request :)
+   *)
+  let run_request : (requested, 'resource) t -&gt; ('resource, string) result =
+   fun _t -&gt; Error &quot;unimplemented!&quot;
+
+  let request_access t =
+    match run_request t with
+    | Ok resource -&gt; Ok { t with state = { resource } }
+    | Error reason -&gt; Error { t with state = { reason } }
+
+  let reason { state = { reason }; _ } = reason
+  let with_resource { state = { resource }; _ } fn = fn resource
+  let get { state = {resource}; _ } = resource
+end</code></pre><h2>Conclusions: Type-state is Great</h2><p>Type-state is just another tool in your bat-belt to build great APIs with type-safety, and good ergonomics.</p><p>It has a cost, like everything else, but <strong>it enables you to grow your states</strong> and maintain them separately by using submodules, in a very natural way. After all they are different types!</p><p>I picked up this pattern from Rust libraries that model the state of sockets, database connections, and many other things, by doing exactly the same thing. Of course, having <code>traits</code> in Rust means we can extend the methods for a specific type-state, which means narrowing down even further the information you have to process when you go type <code>req.</code> and get the autocompletion list.</p><p>But at least in OCaml we can implement the pattern safely and maybe with time our autocompletion will get smarter and do the same filtering for us!</p><p>That's it. That's type-states for OCaml.</p><p>Have you implemented typed state machines in some other ways? Have anything to add or challenge? I'd love to hear it! <a href="https://twitter.com/leostera/status/1696470989582864872?ref=practicalocaml.com">Join the x.com thread</a>.</p><p>Happy Cameling! &#128043;</p>
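+
+<p>One thing the module above leaves open: its record types are public, so a caller could build a granted value by hand and skip <code>request_access</code>. Below is a minimal sketch of one way to close that gap, assuming we seal the module with a signature that keeps the states abstract. It is simplified (a single type parameter, and a hypothetical <code>~check</code> argument standing in for the real permission logic), so read it as an illustration rather than a drop-in replacement.</p>
+
+```ocaml
+module Permission_request : sig
+  (* The states are abstract: only this module can construct them. *)
+  type requested
+  type 'resource granted
+  type denied
+  type 'state t
+
+  val make : user_id:string -> scopes:string list -> requested t
+
+  (* [check] is a hypothetical stand-in for the real permission logic. *)
+  val request_access :
+    check:(string -> string list -> ('resource, string) result) ->
+    requested t ->
+    ('resource granted t, denied t) result
+
+  val reason : denied t -> string
+  val get : 'resource granted t -> 'resource
+end = struct
+  type requested = { scopes : string list }
+  type 'resource granted = { resource : 'resource }
+  type denied = { reason : string }
+  type 'state t = { state : 'state; user_id : string }
+
+  let make ~user_id ~scopes = { state = { scopes }; user_id }
+
+  let request_access ~check t =
+    match check t.user_id t.state.scopes with
+    | Ok resource -> Ok { t with state = { resource } }
+    | Error reason -> Error { t with state = { reason } }
+
+  let reason { state = { reason }; _ } = reason
+  let get { state = { resource }; _ } = resource
+end
+
+(* Hypothetical policy: only "alice" may access the album. *)
+let check user_id _scopes =
+  if user_id = "alice" then Ok "Kind of Blue" else Error "unknown user"
+
+let describe user_id =
+  let req = Permission_request.make ~user_id ~scopes:[ "listen" ] in
+  match Permission_request.request_access ~check req with
+  | Ok granted -> "granted: " ^ Permission_request.get granted
+  | Error denied -> "denied: " ^ Permission_request.reason denied
+
+(* Neither of these would compile inside [describe]:
+     Permission_request.get req         (req is a requested t)
+     Permission_request.reason granted  (granted is not a denied t) *)
+```
+
+<p>With the signature in place, the only way to obtain a <code>granted t</code> is to go through <code>request_access</code>, and calling <code>get</code> or <code>reason</code> on the wrong state is rejected by the type checker.</p>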
diff --git a/data/planet/practicalocaml/unix-module-considered-harmful.md b/data/planet/practicalocaml/unix-module-considered-harmful.md
new file mode 100644
index 0000000000..c37d878004
--- /dev/null
+++ b/data/planet/practicalocaml/unix-module-considered-harmful.md
@@ -0,0 +1,26 @@
+---
+title: Unix Module Considered Harmful
+description: 'Recently I was working on a socket pool for a new scheduler for OCaml
+  5 (multicore baby!) and I ran into a strange issue.
+
+
+  This new socket pool works by spinning up a series of lightweight processes to accept
+  connections. Every one of those will wait for a client to'
+url: https://practicalocaml.com/unix-module-considered-harmful/
+date: 2023-11-29T07:19:43-00:00
+preview_image:
+featured:
+authors:
+- Practical OCaml
+source:
+---
+
+<p>Recently I was working on a socket pool for a <a href="https://github.com/leostera/riot?ref=practicalocaml.com">new scheduler</a> for OCaml 5 (multicore baby!) and I ran into a strange issue.</p><p>This new socket pool works by spinning up a series of lightweight processes to <em>accept connections</em>. Every one of those will wait for a client to connect, and create a new lightweight process to <em>handle a connection</em>. Eventually, the client will terminate the connection and the relevant processes are terminated.</p><p>All good so far.</p><p>All of this accepting and connecting is done via file descriptors (a <code>Unix.file_descr</code>). In some cases they correspond to <em>listening sockets</em>, when used to accept new connections, and when connected to a client they become <em>streaming sockets</em> (so a socket used to send/receive data). But really all you have is an integer that's behind the <code>Unix.file_descr</code> type: the Unix file descriptor.</p><p>Okay, so what went wrong?</p><p>In one of my load tests, I consistently could reproduce that <em>the entire application would just exit</em>. No error messages, no prints, no stack traces. It was running and then at some point, it just wasn't.</p><p>I can't emphasize enough how much I dug through the entire runtime, adding more logging, and more safety nets, just to see if I was doing something wrong. 
A good day of work was lost to this.</p><p>Then, after exploring all the options I could think of, I asked on the #multicore channel of OCaml Labs, and I got an answer from Stephen Dolan.</p><p>Turns out that:</p><ul><li>if you have a streaming socket</li><li>and you <strong>write to it</strong></li><li>but the client <strong>has closed it</strong></li><li>your program will <strong>receive a Unix signal:</strong> <code>SIGPIPE</code> </li><li>which if you didn't know about, and didn't specifically set to ignore, will <strong>TERMINATE YOUR PROGRAM</strong>.</li></ul><p>&#129318;&zwj;&#9794;&#65039;</p><p>No return value, no exception, nothing of the sort. You have this entirely out-of-band input to your program that even in an impure functional language like OCaml feels like a sucker punch.</p><p>Why does this happen? Let's see.</p><h2>The Unix module</h2><p><a href="https://v2.ocaml.org/api/Unix.html?ref=practicalocaml.com" rel="noreferrer">The Unix module </a>is the default way to interact with your operating system in OCaml. You've probably used <code>Lwt_unix</code> before or the <code>-unix</code> flavor of your favorite lib if you aren't using promises yet. All of those rely on <code>Unix</code>.</p><p>But really, this module is just super low-level bindings to <em>syscalls</em>. </p><div class="kg-card kg-callout-card kg-callout-card-red"><div class="kg-callout-emoji">&#128009;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Here be Dragons</strong></b>. We're digging deeper than usual here, so here's a little sword in case we find something dangerous: &#128481;&#65039;</div></div><p><em>Syscalls,</em> or &quot;system calls&quot;, are little bridges between the boundaries of <em>User space </em>and <em>Kernel space</em> in your operating system:</p><ul><li>Kernel space is where your operating system implements all sorts of things to make your computer run, like how to write to disks, or read from the network. 
If something is buggy here it will BRICK your computer.</li><li>User space is where you and I write our buggy software. Buggy is cool here. It <s>keeps us employed</s> won't brick the computer.</li></ul><p>And our Unix module is full of bindings to syscalls like <code>write(2)</code> that lets User space programs actually write files by asking Kernel space code to do the writing. Neat, right?</p><p>The fact that we are using these syscalls isn't obvious, but as you can see here in this <a href="https://github.com/ocaml/ocaml/blob/trunk/otherlibs/unix/unix_unix.ml?ref=practicalocaml.com#L264-L283">snippet</a> for <code>Unix.write</code> we are making an <em>external</em> call to a function called <code>caml_unix_write</code>:</p><pre><code class="language-ocaml">(* lowest-level binding, directly calling C code *)
+external unsafe_write : file_descr -&gt; bytes -&gt; int -&gt; int -&gt; int
+                      = &quot;caml_unix_write&quot;
+
+(* slightly-higher level binding, that checks the buffer offset is ok *)
+let write fd buf ofs len =
+  if ofs &lt; 0 || len &lt; 0 || ofs &gt; Bytes.length buf - len
+  then invalid_arg &quot;Unix.write&quot;
+  else unsafe_write fd buf ofs len</code></pre><p><code>caml_unix_write</code> will in turn call a C function called <code>write</code> which comes from <code>libc</code> on Unix-like operating systems that follow the POSIX standard, and will call <code>WriteFile</code> from the Windows APIs when compiling on Windows.</p><p><code>write</code> from libc, and <code>WriteFile</code> from the Windows APIs. Those are the syscalls.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">&#128584;</div><div class="kg-callout-text">If I've lost you already because you wanted to learn about OCaml and now we're talking about C, then you will understand why I'm frustrated about this whole thing.</div></div><p>The important thing to know is that when you are using this module, many of the functions you will call there are not OCaml code. They are C code, and they reach into the depths of your operating system to do dangerous, wonderful, weird things.</p><p>On Unix systems, one of those is <em>signals</em>.</p><h2>Unix Signals</h2><p>Unix has a way of <em>interrupting</em> a process with a mechanism called <em>signals</em>. A process in turn can tell Unix how it's going to react to those signals, by setting a <em>signal handler.</em></p><p>It's essentially a configurable, OS-triggered callback.</p><p>Some of these signals are very common. Like when you press <code>Ctrl+C</code> to exit a long-running program, you're really sending a <code>SIGINT</code> signal, also known as an <em>interrupt signal</em>.</p><p>You can of course override this, and you see many REPLs do it, so that if you accidentally press <code>Ctrl+C</code> you get a chance to confirm this and exit or return to the program.</p><p>Signals, however, are not a part of the Unix module. 
If we want to configure them (and their handlers) we need to use the <code>Sys</code> module.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#8265;&#65039;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">What is the Sys module?</strong></b> It's a bag of sort of random stuff, and a few things that probably deserve to be in a module called Sys like what OS you're on. If you ask me, I'd rather we didn't have a Sys/Unix module at all, and just had proper abstractions for File, Socket, OS, Env, Process, etc. But c'est la vie. In the meantime, we have </div></div><p>In particular, we need to use <code>Sys.set_signal</code>. This function lets you set the behavior for a particular signal, which can be one of:</p><ul><li>Default &ndash; whatever POSIX decides is the default</li><li>Ignore &ndash; just do nothing with it</li><li>Set a custom handler &ndash; use this handler to do something that fits your program</li></ul><h2>Fixing The SIGPIPEs</h2><p>The error we had described before is fixed with a single line of OCaml at the top of our program:</p><figure class="kg-card kg-code-card"><pre><code class="language-ocaml">Sys.(set_signal sigpipe Signal_ignore);;</code></pre><figcaption><p dir="ltr"><span style="white-space: pre-wrap;">Mark the SIGPIPE signal to be ignored.</span></p></figcaption></figure><p>But the knowledge required to put that line there isn't trivial.</p><p>You need to know that the Unix module is just a wrapper around OS syscalls. And here you'll want to know exactly which one, which may involve <a href="https://github.com/ocaml/ocaml/blob/20336d050ca9787c347f78608ed4decf3b5a21d9/otherlibs/unix/write_unix.c?ref=practicalocaml.com#L31-L56">digging</a> through some of the OCaml C libraries.</p><p>You need to know where to find the right doc for that syscall (<a href="https://man.openbsd.org/write.2?ref=practicalocaml.com">is it BSD</a> since macOS inherited a lot from it? 
That doesn't mention anything about SIGPIPEs, maybe <a href="https://www.man7.org/linux/man-pages/man2/write.2.html?ref=practicalocaml.com">the Linux syscall manual</a> is relevant here?)</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://practicalocaml.com/content/images/2023/11/Screenshot-2023-11-29-at-07.30.12.png" class="kg-image" alt="alt" loading="lazy" width="1378" height="256" srcset="https://practicalocaml.com/content/images/size/w600/2023/11/Screenshot-2023-11-29-at-07.30.12.png 600w, https://practicalocaml.com/content/images/size/w1000/2023/11/Screenshot-2023-11-29-at-07.30.12.png 1000w, https://practicalocaml.com/content/images/2023/11/Screenshot-2023-11-29-at-07.30.12.png 1378w" sizes="(min-width: 720px) 720px"/><figcaption><span style="white-space: pre-wrap;">Linux system manual saving the day</span></figcaption></figure><p>And then you have to learn about signals, how to catch them, and how to use the <code>Sys</code> module to do that. Granted, this last part is the easiest since it's more actionable, but that second step?! Not as easy a leap to make.</p><h2>Conclusion</h2><p>This is most definitely not the kind of surprise you want to find when writing in a type-safe, high-level functional programming language like OCaml.</p><p>Hell, I think <em>Python does this better</em> by throwing an <code>IOError</code> instead. That would've saved me hours of self-doubt.</p><p><strong>If you really need this level of control, </strong>you may find it useful to mentally frame it as writing <em>garbage-collected C</em>, and behave accordingly. And please shield your users from all the gory details.</p><p>Otherwise<strong>,</strong> <strong>stay happy and away from the Unix module</strong> and look for alternatives. 
Use <a href="https://ocaml.org/p/bos/latest?ref=practicalocaml.com">Bos</a> for your OS interactions, stick to a higher-level library for sockets, and if it comes to it, isolate that part of your system.</p><p>I hope this gotcha won't get you the next time you're writing network code, and if you have any stories like this one, I'd be happy to share them on Practical OCaml too.</p>