While certainly many details on this site will be relevant to any variety of Apple 16, some may not, so it’s important to highlight the boat I’m building:

- The 4-plank-per-side “Apple AHL”, as it’s described in some of Tom’s pages, rather than the 5-plank “Swedish Apple” (or the gaff cutter – which doesn’t describe the hull, but rather what goes above!). The 5-plank Apple requires 6 sheets of ply for planks, rather than 4 sheets for the 4-plank Apple: given that the marine ply I’m using (okoume) is tropical hardwood, minimizing it is a no-brainer (I’m sure I would use off-cuts in the 5-plank version). I was also just amazed by the way that Tom nested the planks into the sheets, and how that turned into such a beautiful 3D shape – often the planks would be less than an inch from each other in multiple places.

- A single (large) rig, with a small mizzen. This is shown in the picture above. The plans account for a larger-mizzen “light-air” rig, and correspondingly multiple daggerboard positions – something I did not want to deal with!

- A pivoting centerboard, rather than a daggerboard. The design in the plans calls for a long daggerboard case to accommodate the two rig options, so the centerboard case doesn’t actually take up more room in the boat. While it isn’t in the plan set (or at least, wasn’t when I bought them), when I asked Tom about a centerboard he sent me CAD drawings he had made based on old sketches (for two different options), so I didn’t actually have to do any designing: his design, conveniently, had the same slot size in the hull.

- An enclosed rear tank seat (technically, two rear tanks, with an open channel for the mizzen step to drain), an enclosed bow tank, open rear side seats and a mid-ship thwart, but no seats forward of the middle of the boat. Related to this:

- Floorboards! Sitting on the floor of the boat seems nice for children and dogs, both of which I have. Especially forward of the middle thwart, the hull starts to get steep, so sitting without thwarts requires floorboards. Also, that way you don’t have to sit in bilge water!

- The mast partner is a slight deviation: I’m using the mast gate used by Iain Oughtred, as it allows the mast to be stepped by first placing the butt in the step and then lifting up the mast (the back of the mast partner is open). As a result of this change, I did deviate a bit in the bow, as I made the mast partner part of one continuous king plank, rather than one of the plywood options that are in the plans.

- For the tiller, rather than the curved one that goes around the mizzen mast, I’m going to put in a Norwegian-style push-pull tiller. This seems easier, and should allow more room for seating further back in the boat.

Materials

- Plywood: 6 sheets 2500mm x 1225mm (“metric” 8ft x 4ft), 6mm thick.

- Solid wood: 16ft (should have been 17ft!) for gunwales, in and out – 6 strips 20mm x 20mm. Mahogany for the quarter knees, breasthook, and king plank. 25mm x 25mm Douglas fir for the centercase support, and lots of 20mm x 20mm stringers (I had a disastrous attempt to use way-too-brittle Douglas fir for the gunwales, so had plenty of this stuff). White oak for the floors, ipe for the floorboards.

- Epoxy (to be updated as I go along): 6G (not all used yet)
- Fiberglass: XX 3” tape, XX 2” tape, XX yards of 50” cloth
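As an aside on the plywood dimensions above, a quick conversion (just arithmetic, nothing from the plans) shows why 2500mm x 1225mm gets called a “metric” 8ft x 4ft sheet:

```python
# Convert the "metric 8x4" sheet dimensions to feet for comparison
# with a true imperial 8ft x 4ft sheet.
MM_PER_FOOT = 304.8

length_ft = 2500 / MM_PER_FOOT
width_ft = 1225 / MM_PER_FOOT

print(f"{length_ft:.2f} ft x {width_ft:.2f} ft")  # 8.20 ft x 4.02 ft
```

So the metric sheet is actually a couple of inches longer, and a hair wider, than its imperial namesake.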
Thus far, my build has taken the following approximate amounts of time for the various sections described above. As I complete more parts of the boat, I’ll add more sections.

- Hull stitched and taped: 62 hours
- Gunwales & Quarter Knees: 27.75 hours
- Outer Stem & Front of Keel: 12.75 hours
- Bow Tank & Mast Partner: 27 hours
- Stern Tank & Mizzen Step/Partner: 33 hours
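Tallying the list above (numbers exactly as listed) gives the running total:

```python
# Approximate hours logged per section, as listed above.
hours = {
    "Hull stitched and taped": 62,
    "Gunwales & Quarter Knees": 27.75,
    "Outer Stem & Front of Keel": 12.75,
    "Bow Tank & Mast Partner": 27,
    "Stern Tank & Mizzen Step/Partner": 33,
}

total = sum(hours.values())
print(f"Total so far: {total} hours")  # Total so far: 162.5 hours
```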
Taking Care

Tools

There are countless tools that you might use in a project like this, but a few that I don’t think are avoidable (i.e., if you don’t have access to one, you should borrow or buy it):
Table Saw

For the gunwales and stringers, there really isn’t any other good option (cutting 20mm x 20mm strips from a 17ft piece is well beyond my ability with a circular saw). Perhaps a nice bandsaw could do it, but I’d be surprised if you had a bandsaw and not a table saw! Also, if you are going to do a birdsmouth hollow mast, while you can cut the birdsmouth with a router table (assuming you can get ahold of the strips without a table saw…), the table saw makes it really easy (assuming you are doing an 8-sided, 45-degree one).
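As a back-of-envelope aside on that 8-sided birdsmouth (my own geometry, not from the plans, and the 70mm diameter is just a made-up example): before rounding, the staves form a regular octagon, so the stave face width follows from the target across-the-flats diameter:

```python
import math

def stave_face_width(diameter_mm: float) -> float:
    """Side length of a regular octagon whose across-the-flats
    diameter matches the target (pre-rounding) spar section."""
    # For a regular octagon, side = diameter * tan(22.5 degrees).
    return diameter_mm * math.tan(math.pi / 8)

# Hypothetical 70mm spar blank, purely for illustration.
print(f"{stave_face_width(70):.1f} mm")  # 29.0 mm
```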
Curved rasp

For cutting the rake into the steps and mizzen partner, if nothing else, I don’t think any other tool really can work. It’s also useful for trimming things like the breasthook, quarter knees, hatch holes, etc. In theory, sandpaper and a dowel could substitute, but only if you are very patient.
Jig Saw

I did not use this to cut out the panels (I think a small circular saw worked better, cutting smoother curves), but rather for cutting out the bulkhead profiles, cutting out hatch holes, and any number of other places where cutting curves was necessary.
Keyhole Saw

For when you are cutting something out with the jig saw, but end up in a place where the body of the jig saw prevents it from cutting further. Duckworks sells a nice one. Obviously, the smaller your jig saw and the further in advance you plan (i.e., the fewer things you are cutting once they are epoxied onto the boat), the less you will need this, but I’d be surprised if it never comes up!
Carbide Scraper

If you can avoid the need for this, bravo; but for the rest of us, who miss epoxy drips, this – plus possibly a heat gun (not necessary, but it makes the heavy-duty drips easier) – is really helpful.
Hand plane

A small block plane is all that’s needed, but it should be a good one. You might be able to get away without one (I built a previous boat using power sanders where planes were called for), but it’ll be a pain.
Random orbital sander

There is not only a huge amount of sanding to do to clean up epoxy, but also between coats of paint, etc.
Clamps. Lots!

You’ll need the most when doing the gunwales, where cheap spring clamps (the bigger 2" ones) will mostly work; though once you are doing the last layer they won’t quite fit, so having at least 15 or more regular clamps will be critical. If you were starting from scratch, 6" or 8" F-clamps would probably be the most useful for the build.
And then there are the tools that certainly aren’t irreplaceable, but that I use all the time.
Shinto Rasp

Probably the most common thing I used this for was knocking off epoxy drips, using it almost as a sander; at the right angle, it doesn’t scrape the wood at all. I also used the tip for cutting chamfers where the router couldn’t reach (holding the tip with one hand, the body with the other, and running it along at the right angle). And of course if you need to take off a lot of wood, it’s pretty effective, but can do some damage.
Bandsaw

Unsurprising, given the place this typically holds in boat building shops. It’s totally unnecessary – I think I resawed exactly one piece of wood for the boat, and thus every cut I made could have been made with the table saw, jig saw, or hand saw – but at the same time, I used it more than all of those combined. Given that nothing in the boat is square, being able to cut at arbitrary angles, into corners, quickly, and (relatively) safely is incredibly useful. I would often cut freehand: either it was in places where it didn’t matter (the joint would be filleted, so small gaps would disappear), or I would cut outside the line and fine-tune it with a plane anyway.
Oscillating multi-tool

I mostly use this as a small power sander that can get into places a normal 5" random orbital can’t, but the surprising use was actually the flush cutting, which I’ve used exclusively to un-epoxy things that I accidentally glued together. Unlike using heat, this doesn’t harm any epoxy underneath (as when I didn’t anticipate epoxy running down the centerline and gluing down a random panel that was lying on it, and I was able to cut it off without damaging the glass tape). A more careful craftsperson may never need this, but I am not one.
Build Thread

As I went along, I documented what I was doing and the questions I had. It’s a lot less organized than this page, but in case you are curious: Build Thread with Photos
My version of the bow tank deviated pretty significantly from the plans, as I used a different style of mast partner and as a result (and also to save on plywood) used a lot more solid timber inside the tank. I have the tank top ending at the bulkhead, partly because the square of plywood that I had set aside for this based on a suggestion from Tom (500mm x 1100mm) was (I think) for just up until the bulkhead. I’m not actually sure if this is what the short/long foredeck refers to, but given that my mast gate is a single piece of hardwood supported at multiple places going all the way to the stem, I don’t think that losing the couple inches of ply will have any impact! Obviously in the case of the partner from the plans, it might be more important to have the ply support of the longer foredeck.
The gunwales are pretty easy – my time estimate should be high, as I had a bit of a disaster where the first wood I tried to use kept breaking.

Important Notes (!!! READ THESE EVEN IF YOU IGNORE THE REST !!!)

- Be sure the wood you have bends well (or just steam it), as there is a compound curve near bulkhead 10 that kept breaking the first wood I tried to use.
Building the hull is pretty straightforward – while it certainly takes some time, once you get the strakes cut out, it goes together very quickly.

Important Notes (!!! READ THESE EVEN IF YOU IGNORE THE REST !!!)

- The offsets where you are marking the points for the strakes are the station marks, which will be used to key the bulkheads, etc. That’s why they are sometimes at odd, not-quite-even numbers.

- It is mentioned in the plans, but it is super important (and I missed it!): mark vertical lines connecting each pair of station points. If, like me, you marked on one set of scarfed sheets and then cut through both at once, this means you need to flip one set of strakes over (once you’ve marked the edge on the bottom strakes) and draw the lines on the opposite side from where you were marking points on the top (otherwise you’ll have the inside on one side and the outside on the other). This will be critical when you are wiring in the bulkheads, but it is also useful just in aligning the panels together.

- If you don’t need to move the hull while you are doing the glass between the stitches, there is no need to tack with fast epoxy or hot glue. That advice is for the case where you are building in a single-car garage (amazing that it can be done), where work can only be done on one side. Otherwise, if you can level and stabilize the boat once it is wired, and then do all of the 50mm tape between the wires without disturbing the boat, you can skip that.
Photos of building sequence, with notes
At this point, sequentially, you will do the Gunwales before flipping the boat, but I organized the time for the Hull to include all of the glass taping, which happens over the course of two more flips.
Again, while working on the outer seams, it makes sense to do the Outer Stem. Technically, it could probably be deferred until later, but this is the order Tom suggests, and I don’t see any reason not to. This is also when he suggested installing the centercase (cutting the slot from the top and bringing the case up from underneath), glassing the hull, and putting on the skeg, but I deferred all of those things.
At this point, the hull is quite strong – it’ll obviously get stronger from knees, thwarts, tanks, floors, etc., and I haven’t felt confident getting into it (but I have also been able to reach everything I’ve needed from the side). Also, as additional (unintentional) confirmation: before building support cradles onto my sawhorses, I accidentally dropped the back of the boat onto the concrete floor (about an 18" drop). There was no damage anywhere, which is good!
+ Thus far, my build has taken the following approximate amounts of time for the sections described above. As I complete more parts of the boat, I’ll add more entries.
+
+
+
+
+
+ Hull stitched and taped: 62 hours
+
+ Gunwales & Quarter Knees: 27.75 hours
+
+ Outer Stem & Front of Keel: 12.75 hours
+
+ Bow Tank & Mast Partner: 27 hours
+
+ Stern Tank & Mizzen Step/Partner: 33 hours
+
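For a running total, the estimates above can be tallied with a quick script (the figures are just the ones listed above):

```python
# Tally of the build-time estimates listed above, in hours.
sections = {
    "Hull stitched and taped": 62.0,
    "Gunwales & Quarter Knees": 27.75,
    "Outer Stem & Front of Keel": 12.75,
    "Bow Tank & Mast Partner": 27.0,
    "Stern Tank & Mizzen Step/Partner": 33.0,
}
total = sum(sections.values())
print(f"{total} hours so far")  # → 162.5 hours so far
```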
+
+
+
+
+
+
+ Taking Care
+
+
+
+ Tools
+
+
+ There are countless tools that you might use in a project like this, but a few are, I think, unavoidable (i.e., if you don’t have access to one, you should borrow or buy it):
+
+
+
+
+
+
+
+
+
+ Table Saw
+
+
+ For the gunwales and stringers, there really isn’t any other good option (cutting 20mm x 20mm in a 17ft piece is well beyond my ability with a circular saw). Perhaps a nice bandsaw could do it, but I’d be surprised if you had a band saw and not a table saw! Also, if you are going to do a birdsmouth hollow mast, while you can cut the birdsmouth with a router table (assuming you can get ahold of the strips without a table saw…), the table saw makes it really easy (assuming you are doing an 8 sided, 45 degree angle one).
+
+
+
+
+ Curved rasp
+
+
+ For cutting the rake into the steps and mizzen partner, if nothing else, I don’t think any other tool really works. It’s also useful in trimming things like the breasthook, quarter knees, hatch holes, etc. In theory, sandpaper and a dowel could substitute, but only if you are very patient.
+
+
+
+
+ Jig Saw
+
+
+ I did not use this to cut out the panels (I think a small circular saw worked better, cutting smoother curves), but I used it for cutting out the bulkhead profiles, cutting out hatch holes, and any number of other places where cutting curves was necessary.
+
+
+
+
+ Keyhole Saw
+
+
+ For when you are cutting something out with the Jig Saw, but you end up in a place where the body of the Jig Saw prevents it from cutting further. Duckworks sells a nice one. Obviously, the smaller your Jig Saw and the further in advance you plan (i.e., the fewer things you are cutting once they are epoxied onto the boat), the less you will need this, but I’d be surprised if it never comes up!
+
+
+
+
+ Carbide Scraper
+
+
+ If you can avoid the need for this, bravo; but for the rest of us, who miss epoxy drips, this, plus possibly a heat gun (not necessary, but it makes the heavy-duty drips easier), is really helpful.
+
+
+
+
+ Hand plane
+
+
+ A small block plane is all that’s needed, but it should be a good one. You might be able to get away without one (I built a previous boat using power sanders where planes were called for), but it’ll be a pain.
+
+
+
+
+ Random orbital sander
+
+
+ Not only is there a huge amount of sanding to clean up epoxy, there is also sanding between coats of paint, etc.
+
+
+
+
+ Clamps. Lots!
+
+
+ You’ll need the most when doing the gunwales, where cheap spring clamps (the bigger 2” ones) will mostly work, though once you are on the last layer they won’t quite fit, so having at least 15 regular clamps will be critical. If you were starting from scratch, 6” or 8” F-clamps would probably be the most useful for the build.
+
+
+
+
+
+
+
+ And then there are the tools that certainly aren’t essential, but that I used all the time.
+
+
+
+
+
+
+
+
+
+ Shinto Rasp
+
+
+ Probably the most common thing I used this for was knocking off epoxy drips, by using it almost as a sander; at the right angle, it doesn’t scrape the wood at all. I also used the tip for cutting chamfers where the router couldn’t reach (holding the tip with one hand, the body with another, and running it along at the right angle). And then of course if you need to take off a lot of wood, it’s pretty effective, but can do some damage.
+
+
+
+
+ Bandsaw
+
+
+ Unsurprising, given the place this typically holds in boat building shops. It’s totally unnecessary: I think I resawed exactly one piece of wood for the boat, and thus all of the cuts I made could have been made with the table saw, jig saw, or hand saw. At the same time, I used it more than all the rest of those combined. Given that nothing in the boat is square, being able to cut at arbitrary angles, into corners, quickly, and (relatively) safely is incredibly useful. I would often cut freehand, and either it was in a place where it didn’t matter (the joint would be filleted, so small gaps disappear), or I would cut outside the line and fine-tune it with a plane anyway.
+
+
+
+
+ Oscillating multi-tool
+
+
+ I mostly use this as a small power sander that can get into places a normal 5” random orbital can’t, but the surprising use was actually flush cutting, which I’ve used exclusively to un-epoxy things that I accidentally glued together. Unlike using heat, this doesn’t harm the epoxy underneath (for example, when I didn’t anticipate epoxy running down the centerline and gluing down a random panel that was laying on it, I was able to cut the panel off without damaging the glass tape). A more careful craftsperson may never need this, but I am not one.
+
+
+
+
+
+
+
+ Build Thread
+
+
+ As I went along, I documented what I was doing and questions I had. It’s a lot less organized than this page, but in case you are curious: Build Thread with Photos
+
+ This is a pretty short part of the process, but the breasthook is one of the most visible parts of the boat!
+
+
+ Important Notes (!!! READ THESE EVEN IF YOU IGNORE THE REST !!!)
+
+
+
+ If you want to have a chamfered lower section of the stem (to cut through the water better), figure out that shape before you glue the wood onto the boat: I tried a chamfer and it looked quite bad, so I ended up laminating strips of khaya back on to build it back up to something big enough that I could trim to a square stem.
+
+
+ Important Notes (!!! READ THESE EVEN IF YOU IGNORE THE REST !!!)
+
+
+
+ You (probably) need to install the rudder post before gluing in the mizzen partner, assuming you want to screw in from the inside in addition to gluing on the outside (as is suggested by the designer). The design for the rear tank I used keeps this area accessible, in theory, but it is a lot harder to get to once the partner (and then tank top) is on.
+
+
+ There are many plywood boats suitable for home builders in the ~15-16ft range: roughly 5ft beam, a couple hundred pounds, good performance as sailboats, decent as rowboats. It’s a particularly compelling size, as pretty much any plywood sailboat that can fit more than one person will be longer than 8 feet, and once you are scarfing sheets of plywood (and have gone beyond what could go on the roof of a car or in the bed of a pickup), you might as well push to the edge of what two sheets of plywood gets you. The similar light weight of all these boats reflects their overall cost and complexity: a 200lb boat and a 500lb boat, even if they are the same length, will be very different in difficulty of building and in cost, because at this weight almost no boats have built-in ballast, and thus all of the weight is actual structural material. That means if you want to compare similar boats, length and beam will give you an idea of carrying capacity (to a first approximation), and weight will give you an idea of cost/complexity (to a first approximation).
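As a rough illustration of that two-sheet limit, here is a quick length check. The 8:1 scarf slope and 6mm thickness are my assumptions for the sketch, not figures from any particular plan set:

```python
# Rough length check for a hull panel made of two scarfed 8ft sheets.
# Assumptions (mine, not from any plan set): 8:1 scarf slope on 6mm ply.
sheet_mm = 2440            # nominal 8ft (2440mm) sheet length
scarf_mm = 8 * 6           # an 8:1 scarf on 6mm ply overlaps ~48mm
total_ft = (2 * sheet_mm - scarf_mm) / 304.8
print(round(total_ft, 1))  # → 15.9
```

Which is why so many of these designs land just under 16 feet.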
+
+
+ All of the boats on this page are in roughly the same design space as the Apple 16: 15ft-16ft long, about 5ft wide (beam), and somewhere between 130lbs and 250lbs (actual weights will vary wildly, as the choice of materials makes a huge difference; I found this out when I built a supposedly 65lb boat that weighs 120lbs, by replacing the 4mm ply with 6mm and using 29lb-per-sheet lumberyard ply instead of 12lb 4mm occume and 18lb 6mm occume).
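To make the materials point concrete, a hypothetical back-of-the-envelope: the sheet counts below are invented for illustration; only the per-sheet weights come from my experience above.

```python
# Back-of-the-envelope hull ply weight for two plywood choices.
# Sheet counts are hypothetical; per-sheet weights are from the text above.
def ply_weight(n_4mm, n_6mm, w_4mm, w_6mm):
    return n_4mm * w_4mm + n_6mm * w_6mm

spec  = ply_weight(3, 2, w_4mm=12, w_6mm=18)  # 12lb 4mm / 18lb 6mm occume
heavy = ply_weight(0, 5, w_4mm=12, w_6mm=29)  # everything in 29lb 6mm ply
print(spec, heavy)  # → 72 145
```

The ply weight roughly doubles before you account for extra epoxy and framing, which is how a “65lb” design becomes a 120lb boat.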
+
+
+ A few are particularly popular, and this page compares the Apple 16 with them, highlighting their advantages and, in particular, why I ended up building the Apple 16!
+
+
+ Goat Island Skiff by Michael Storer
+
+
+ Storer’s most popular boat, the GIS is 15’6”, 5’ beam, 125+lbs, with a 105sqft lug sail, though Clint Chase has a lug-yawl option for it.
+
+
+ In a lot of ways, this boat seems incredibly similar, though it is certainly easier to build (the hull is a flat bottom panel, two sides, and a transom), and it’s very highly recommended. Also, Storer’s plans are generally very good, and the GIS is probably the best of them, given that the plans have had the most eyes on them! The downside is that it seems to be a boat designed for dinghy sailors, in that it is a bit unstable! The light weight, combined with the narrow front, means that capsizing it is certainly a risk, and there doesn’t seem to be much to do about it aside from being an experienced sailor!
+
+
+ While obviously any light dinghy will capsize, since the Apple 16 doesn’t have a flat bottom, it can have ballast put in (water tanks, or more easily, heavy material like metal or sand), which will make it much more stable. This might make it less exciting to sail, but much better when out with family, or in heavier weather – and with removable ballast, it’s easy to switch. Ballasting isn’t really possible in a flat bottom boat like the GIS, because it would need to switch from side to side on each tack (obviously, the sailors serve as ballast, but that requires skill). Aside from the easier build and more straightforward plans (there is essentially one GIS – modifications do exist, but are pretty rare and minor), another advantage the GIS has is that the flat transom makes mounting a motor more straightforward – no complicated mount needed to account for the 30 degree raked transom on the Apple 16.
+
+
+ Phoenix III by Ross Lillistone
+
+
+ A narrower boat at 4’9”, and a little shorter at 15’1.5”, and in general, it is quite a bit smaller, given the area ahead of the mast is inaccessible and the side decks, while making it more seaworthy, also cut into space. At least one sailor experienced with it (and indeed, who thinks it is great) said that it is really comfortably a two person boat. On the flip side, it is probably a much better row boat! It is reported to be quite stable, but is still quite light (spec’ed at 132lbs), and with 104sqft (for the sloop), it moves along!
+
+
+ Building wise, it is more complex, given the glued lapstrake, but that’s easily avoided by building the similarly designed stitch and glue First Mate.
+
+
+ In terms of comparison with the Apple, I think the main difference is that for a similar (though, 8.5” isn’t nothing!) length, and probably a similar building process, the Apple 16 is a lot more boat: whether that is actually relevant is, of course, a personal decision, but being able to take out three other people comfortably was important to me, and all the other constraints (total cost, space it would take up in the garage, difficulty to transport) seemed pretty similar to me. But, I say that as someone who considers rowing a necessary way to get around when there isn’t enough wind!
+
+
+ Argie 15 by Dudley Dix
+
+
+ This is perhaps the closest boat to the Apple 16 – partly because the design brief was very similar. While with the Apple 16, Tom was trying to figure out the biggest boat that could be built in a single car garage, he ended up determining that it should come out of 6 sheets of plywood. With the Argie 15, Dudley wanted to build the biggest boat possible out of 6 sheets of plywood. The Argie 15 is a little shorter (15’5”) and a little wider (6’0”). The pluses are that it is designed as a 3-in-1, so fitting a motor is straightforward (indeed, it can be used solely as a motor boat), and it’s also designed to make it easy to sleep on the floor (if that is important to you). And it is clearly a well-tested design by a good designer. Finally, it can be gotten in kit form in many places (in the US, the kits are cut by CLC, though sold by Dudley). So there is a lot going for it! What tips me over to the Apple 16 is partly the rig: while people have put different rigs on the Argie 15, a stayed bermuda sloop is what it is intended to be, and I prefer unstayed lug rigs and like the idea (though have not sailed!) of lug yawls. The spars in the Apple 16 fit in the hull without needing to be in multiple pieces. Finally, there are pure aesthetics: the Argie 15 is certainly a pretty boat, but the plumb bow of the Apple 16 grabbed me.
+
+
+ Calendar Islands Yawl by Clint Chase
+
+
+ This is a much newer design than any of the other boats on this list (which are all, as far as I know, from the early-to-mid ’90s or before). At 15’6” and 5’2” beam, 235lbs, it is very similar in specs to the Apple 16. Appearance wise, the lapstrake upper strakes certainly give a different look, and the fact that it can be built from a kit may be appealing to some (but the cost was prohibitive to me; it would have increased the total cost of the build by at least a factor of two). The downside, of course, is that it is quite a new design, and while Clint is certainly a designer who has put a ton of time into this boat, it hasn’t been tested in the way that the others on this list have.
+
+
+ Have another boat you have compared with? Share it with dbp@dbpmail.net
+
+
+ The Apple 16 is a 15ft 10in lug-yawl designed by Tom Dunderdale of Campion Boats. The official webpage has quite a bit of information about the Apple 16, but it’s sometimes hard to follow (particularly, there are many links that are not styled as links! And other links that are only images…).
+
+
+ To buy plans, email Tom: td@campionboats.co.uk; you can find the current plan prices here: http://www.campionboats.co.uk/prices.html. Tom is very responsive over email, so getting the plans is very easy (note the three options: the plain Apple 16, as described on this website, is the Apple AHL there; the Apple 16 plan set is the regular Apple and includes plans for the gaff cutter; and the Swedish Apple has more planks).
+
+
+ Why This Site
+
+
+ The Apple 16 is an amazing design, but for various reasons it seems to be less popular than other boats in the same category (general length, weight, etc.), even though it seems to have many advantages (described in the Comparison section). One problem with building a less popular boat is that there are fewer community resources, and it’s harder to understand, before building, what is going to be involved. Hopefully this site, which includes a detailed log of my own process with at least rough time estimates (everyone works at different paces and has different standards, tools, etc.), will help!
+
+
Also, while Tom is incredibly responsive, the plans themselves can be somewhat overwhelming once you get past the hull (where the step-by-step instructions stop) – the issue is that the boat has had tons of different options added over time (water ballast, 3/4 decking, floorboards, side tanks, etc), and these all overlap in various ways in the plan sheets. This does mean that if you want to customize things, you can often find guidance in the plans (and if not, Tom is incredibly helpful), but it can be intimidating. So another goal of this site is to show how to interpret the plans into a single boat: the one that I’m building, so that if you want to build a similar one, hopefully you have to do less pondering than I did. I will also include all of the communication that I have had with Tom (clarifying details, etc), but of course the plans themselves you will get from him.
We present FunTAL, the first multi-language system to formalize safe interoperability between a high-level functional language and low-level assembly code while supporting compositional reasoning about the mix. A central
We present a type checker and stepper for the FunTAL machine semantics. We include well-typed, runnable examples from the paper,
Note: there are some syntactic differences from the presentation in the paper, which nonetheless we expect will be the primary reference for the language. These changes were made to eliminate the necessity of unicode, reduce ambiguity in the grammar, and make the type checking algorithm syntax-directed. We summarize these changes below.

TAL components use brackets around instructions and the heap fragment.

Paper: τFT e
Here: FT[t,s] e
The FT (Fun outside, TAL inside) boundary specifies the type s that the stack has after running e.

Paper: τFT e
Here: FT[t,?] e
FT boundaries can use ? to indicate that running e will not modify the type of the stack (though values may be modified), allowing s to be inferred.

Paper: import r,σTFτ e
Here: import r1, s as z, t TF{e};
import binds the stack s on return as z with Fun expression e of type t.

Paper: α, ζ, ε
Here: a1, z21, e5
TAL type variables must begin with a, stack variables with z, and return marker variables with e.

Paper: · | τ :: ...
Here: :: | t :: ... ::
Empty stack prefixes (in protect, stack modifying lambdas) are written as ::, and stack prefixes end with ::.

Paper: ∀, μ, λ, ∃, •
Here: forall, mu, lam, exists, *
Greek letters and quantifiers are replaced by English keywords.

Paper: •
Here: *
The concrete stack symbol • is written *.

Paper: u[ω]
Here: u[ω, ω...]
TAL instantiation is n-ary. (This was mentioned as syntactic sugar.)

Paper: {χ; σ}q
Here: {χ; σ} q
The return marker superscript is just written in line.

Paper: λφφ(x:τ...).t
Here: lam[φ][φ](x:τ...).e, (τ...) [φ] -> [φ] τ
The stack prefixes of stack-modifying functions are bracketed, in line.

Paper: unpack <α, rd> u
Here: unpack <α, rd>, u
The TAL instruction unpack has a comma-separated argument, for consistency with other instructions.

Paper: l -> <1, 2>, l' -> (code[δ]...)
Here: [l -> ref <1, 2>, l' -> box (code[δ]...)]
Heap values are preceded by an explicit mutability marker box or ref.
Four and a half years ago I wrote a very popular guide titled A Hacker’s Replacement for GMail about my system for email based on notmuch, emacs, my own mail server, etc (it’s still the only thing I’ve written that’s gotten any amount of traffic). I ran that system for several years, but eventually one thing killed it: spam. Perhaps I never got the various services set up properly (not just learning from prior messages, but talking to services that report IP addresses that were deemed to be spammers, etc), but I still would get at least several spam messages per day. I even tried using paid anti-spam services (all mail filters through them, they forward on to your mail server, and your mail server only accepts messages from their servers). Contrary to what I was led to believe, I never seemed to have trouble with deliverability (maybe I got lucky and the IP address my server was assigned had never spammed, but with DKIM, SPF, etc, everything worked!).
I went back to GMail for a little while, still behind my own domain (note to the reader: if you take nothing else from this, seriously consider registering a domain and paying Google to host your email. The domain is ~$10-15/year, the email is ~$4/month, and the transition to switch addresses is certainly painful, but once you’ve done it Google doesn’t own your identity anymore. You can, in the future, without anyone noticing, switch to another company, or self-host. It’s worth it!), but I missed writing email in emacs.
So, I started trying to figure out a better system. As usual, I started with requirements:
1. Be able to read, write, search email in Emacs.
2. Push notifications on the computer.
3. Be able to read, write, search email on iPhone (push notifications too).
4. Have a single repository where email is stored, to make backup simpler.
5. Keep mail organized into an Inbox and an everything else (Archive).
Points 4&5 led me to decide that, for my purposes, Maildir/IMAP is a perfectly fine authoritative source for my email. In my previous system, I was always a little worried about having to rely on the notmuch database, as it was an undocumented format (that changed with new versions) with a single client program. Realistically, aside from tags that can be applied automatically (sent, unread, mailing list ones), the only tag that I care about is inbox, and it seemed like I should be able to synchronize that to match where messages are in the Maildir folders.
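That synchronization idea can be sketched in a few lines: make the Maildir location of a message determine its tag, so the Maildir stays authoritative. This is my own illustration, not notmuch code, and the folder names are assumptions about the layout:

```haskell
import Data.List (isPrefixOf)

-- Sketch: derive tags from a message's Maildir path, so the Maildir
-- (not a separate tag database) is the authoritative source.
-- The folder names "INBOX/" and "Archive/" are assumed for illustration.
tagsFor :: FilePath -> [String]
tagsFor path
  | "INBOX/" `isPrefixOf` path = ["inbox"]
  | otherwise                  = []

main :: IO ()
main = do
  print (tagsFor "INBOX/cur/msg1")    -- ["inbox"]
  print (tagsFor "Archive/cur/msg2")  -- []
```

Moving a message between folders (with any IMAP client) then implicitly retags it, which is exactly the property that makes the Maildir a safe single repository.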
https://www.fastmail.com/?STKI=17129600
dbp.io :: How to organize modules in a Haskell Web App
A note: I don’t write single-page apps. Perhaps some of this translates to people who do, but I don’t know. When I say “web app”, I mean server-rendered html pages that have forms and buttons and store their state on the server.
Different people have different preferences for how to organize code in their applications. One of the really cool things about most Haskell web frameworks is they let you organize your code however you want.
The upside is that a tiny project can be a single file and that understanding projects comes just from understanding the language, not framework-specific magic that makes particular paths special.
The downside, of course, is that people are on their own to figure out best practices. I’ve tried a lot of different things (over the past ~11 years building web stuff in Haskell), and this system is the result of that experience, primarily using the Snap web framework and then more recently the Fn framework that I co-wrote (I’ve also used Scotty and Servant and I think the advice would work equally well for them).
1. Pure type modules
For each record (which, in a database-backed application, will usually correspond to a table), define a separate module. I would use Types.Person if Person were the name of the type. This should contain the record, which, contrary to many examples, should not have prefixed field names (the prefix, where necessary, is already present in the module name!): just give the fields the most natural names, e.g.:
data Person = Person { id :: Int, firstName :: Text } deriving (Eq, Show)
This module should also include any type class instances for Person (e.g., serialization), and related types. For example, if there is a data type that is a field within the record (e.g., you might have a role field that has a fixed number of options; in the database it is represented textually, but it shouldn’t be in your application), define it within the Types.Person module rather than giving it its own module, unless it’s useful to other modules.
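Putting that together, here is a minimal runnable sketch of the Types.Person idea (the Role type and its constructors are hypothetical; in a real project these declarations would live in their own Types/Person.hs with a `module Types.Person where` header):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Prelude hiding (id)  -- let the record use the natural name `id`
import Data.Text (Text)

-- A type used by a field, kept in the same module as the record:
-- stored textually in the database, but a real sum type in the app.
data Role = Admin | Member deriving (Eq, Show)

-- Unprefixed field names: the module name already carries the prefix.
data Person = Person
  { id        :: Int
  , firstName :: Text
  , role      :: Role
  } deriving (Eq, Show)

main :: IO ()
main = print (firstName (Person { id = 1, firstName = "Ada", role = Member }))
```

Field access then reads naturally at use sites: firstName, not personFirstName.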
A note about casing: I don’t think this is controversial, but match what is most natural in whatever domain the name appears in. So field names in Haskell should be camelCase, in the database should be snake_case, and in frontend templates I hyphenate-them. Transforming between these can be automated.
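Since those transformations are mechanical, they can be automated; a minimal sketch of one direction (the helper name camelToSnake is mine, not from any particular library):

```haskell
import Data.Char (isUpper, toLower)

-- firstName -> first_name. Haskell field names start lowercase, so it
-- suffices to replace each uppercase letter with an underscore plus
-- its lowercase form.
camelToSnake :: String -> String
camelToSnake = concatMap go
  where
    go c | isUpper c = ['_', toLower c]
         | otherwise = [c]

main :: IO ()
main = putStrLn (camelToSnake "firstName")  -- prints first_name
```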
Having pure modules to define types is really helpful to avoid module circularity; most of the time the issue is that you’ll end up needing to allow more core application types to refer to specific data for the application (e.g., in Fn, web handlers pass around a “context” that contains database connections, request information, etc. It is necessarily used in many places, but you may also want it to be able to contain information about a logged in user. By having the types on their own, it’s much easier to pull those types into the definition of core data types like the “context”).
2. State modules for manipulating state with consistent names
There are tons of different libraries for dealing with databases, but from the perspective of module organization, each module Types.Person should be matched with State.Person, and just like the field names in the Person record shouldn’t have any prefix or suffix, neither should functions in the State.Person module. So, for example, I’ll usually have get, create, and delete as functions, and perhaps getByFoo or deleteByBar. The reason for this is the State.Person module is expected to always be imported qualified (it ends up looking more uniform anyway).
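As a runnable sketch of that naming convention (a Data.Map stands in for the database here; in a real project these functions would live in State/Person.hs and be imported as qualified State):

```haskell
import Prelude hiding (id)
import qualified Data.Map as Map

-- Stand-in for Types.Person (normally imported unqualified).
data Person = Person { id :: Int, name :: String } deriving (Eq, Show)

-- Stand-in for the database: a Map keyed by id.
type DB = Map.Map Int Person

-- In State.Person these unprefixed names read naturally at call
-- sites as State.get, State.create, State.delete.
get :: Int -> DB -> Maybe Person
get = Map.lookup

create :: Person -> DB -> DB
create p = Map.insert (id p) p

delete :: Int -> DB -> DB
delete = Map.delete

main :: IO ()
main = do
  let db = create (Person 1 "Ada") Map.empty
  print (get 1 db)  -- Just (Person {id = 1, name = "Ada"})
```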
3. Qualify modules that are for a different part of the application
In general, organizing the application around the records (i.e., database tables) works pretty well. It won’t be 100% (and it doesn’t matter, because Haskell doesn’t care), but usually I’ll have a Handler.Person module to go along with the Types.Person, which would contain web code to handle routing, form parsing and various high level glue, and State.Person which has state manipulation (database queries, business logic, etc).
Within State.Person, import Types.Person unqualified. There should be no conflicts. From Handler.Person, Types.Person should be imported unqualified as well. That way you can use the type Person unqualified. State.Person should be imported qualified as State. Thus to look up a person by id we might invoke State.get.
If we needed to access a Document record, we import Types.Document qualified as Document and State.Document fully qualified. There is a little redundancy in the type/constructor name (if you have to write Document.Document and it bothers you, you can import Document(Document) separately), but the former means you can have Document.createdAt as the record field for when the document was created and createdAt for when a Person was created. Similarly, State.Document.get would look up a Document by id. This of course is done symmetrically when you are working within Handler.Document (assuming it existed).
Other Handler modules should also be imported fully qualified. It’s less common to need this, but it comes up, and the clarity of the full qualification is great. If you end up splitting modules in more fine-grained ways than these three (and sometimes I have, e.g., splitting out form validation, or the code that is used in templates), the same general principles apply: within the Category1.X module, any Category2.X is imported qualified as Category2 (unless Category2 is Types, in which case it’s imported unqualified), and Category3.Y is imported fully qualified.
Summary
Although it wasn’t the original intention (just making code more understandable was), I’ve realized this naming scheme really matches the mantra that name length should match name locality (i.e., the further from definition, the longer the name should be), writ large. Functions that are highly relevant to a particular module have short names (since they are unqualified or minimally qualified), whereas ones from very different parts of the application have longer names that tell more about what they are for. It also helps to serve as a reminder when things start to get tangled, as you end up using more fully qualified functions (and that’s a sign that maybe some refactoring is needed).
Remembering when to do things is, for me, a big strain on my short-term attention / memory, and it’s particularly stressful to wonder if I’ve already forgotten something. I have no idea how anything inside my brain actually works, but my mental model is that I have a limited amount of short-term memory where these deadlines are stored. In order to avoid getting shifted into long-term memory (and thus get forgotten until something triggers them), I have to periodically scan through this memory.
This seems like exactly the type of thing that should be solvable, or at least improvable, by modern technologies: in particular, smart phones, which are perfectly capable of capturing things at any time, notifying at precise times (& locations, to a point), and filtering/sorting in sophisticated ways.
I want to argue that we are maybe 75% of the way to a system that is complete enough to significantly reduce this mental strain, and that there is no fundamental limitation to getting the rest of the way there (just a matter of incremental improvements). Note that, like many things, the difference between almost there and all the way there is massive: once it is perfect, you no longer have to think about it at all, whereas even if it is 90% perfect, you still have to think about it frequently, as that 10% still matters. The system that I’m using (which I’ll talk about in more detail) is the app GoodTask on iOS which relies on the built-in Calendar and Reminders (note: the app is not free – though it has a 2 week trial). There may be better tools, but either they require hardware I don’t have or I haven’t found them yet (not for lack of trying)…
Calendar events vs Tasks
First, I want to talk about “calendar” events and “tasks”, both to unify them and draw a distinction between them. Unify them because whatever notification system, display, etc, needs to show them together. Fundamentally, the display should answer the question “what do I need to do now” (or, tomorrow, next week, etc), and any tool that doesn’t put these things together is broken (which is nearly all of them).
But, there is still a critical distinction between these: calendar events will be scheduled, but tasks in general will not be (as a side note, the applications that insist that every task be scheduled are absurd – many tasks, in particular the ones that are easy to forget, take very little absolute time; the trouble is actually remembering to do them, and being in the right place to do them…).
Calendar events
The critical thing about calendar events is that if you don’t do them in their scheduled time, that’s it. If there was a meeting you were supposed to go to but you didn’t go to it, too bad, it’s done. Calendar events are never marked as done, they don’t become overdue, they simply become in the past.
Calendar events are also much simpler (and that’s probably why they are much better supported by software). Since they have a concrete time, it’s clear where they should show up in the “what should I do now” (or tomorrow, next week) displays, and provided they have a location, notifications are pretty easy too, as they can be given based on travel time to get there. There are some subtleties there (what is the mode of transit, etc), but in general this is pretty well developed and getting better. On iOS (and maybe Android), recurring events will even learn locations if you don’t input them, which is great. You can, of course, just hard-code notification times on events (which is pretty much what you have to do now). As a rule, all calendar events should have notification times, as otherwise, why is the event in your calendar?
Tasks
Tasks, or todos, are more complicated and subtle, and haven’t gotten nearly the same treatment as calendars (those facts are probably related). Another explanation of this is that calendar events can be seen as a special case of a task that has a particular duration and that gets automatically marked done at the point when it is scheduled. In this sense, a particularly useful and common type of task is well supported, but not more general varieties.
While there are dozens (or hundreds?) of task apps (as well as the ones built in to phones), most of them treat tasks either as pretty shopping lists (i.e., add a bunch of things to the list, remove them from the list), or as complicated hierarchies of notes, or as ticketing systems, possibly with various notification structures (note: I’ve looked into a few dozen, and some are better than this, but this is the general story: even very popular ones seem to just be pretty variations on these themes…)
From the perspective of remembering things, the most important thing about a system is that you can get absolutely everything that you need to remember into the system, and the immediate consequence is that the primary thing that the application needs to do is not show you things that aren’t yet relevant. For calendar events, it’s obvious when something isn’t yet relevant: it isn’t happening today (or tomorrow, or next week, depending on the view you are looking at), and there is a built-in pressure valve: you can’t do more than one thing at a time, so your calendar can’t be too overwhelming.
Tasks, when treated naively (as almost all applications do), do not have similar structure, so you end up having a massive list of things of varying importance, from things that need to happen today (grab groceries, put out recycling, send an email to X) to things you want to do in the next few weeks (read Y paper, contact Z about research they are doing, buy train tickets) to things you need to do in the next month or two (etc, you get the point), and you can imagine that if you start piling up all of these together you would have an unmanageable list. There are also further complications: some tasks are repeating but have deadlines (medicines, bills, etc), others repeat but without clear deadlines (e.g., vacuuming should happen maybe weekly, but it’s not particularly urgent if it doesn’t), and some only make sense to do in certain places (i.e., even if it is the day when the recycling is put out, if I’m not at home, there isn’t much point in telling me that).
Ideally, when adding tasks you could put down specific or vague times when they should happen, where they make sense to happen (or where they don’t make sense to happen), repeating patterns (either specific, like the 1st of each month, or periodic, like a week after the last time you did it), and possibly some sense of how important and how hard the task is. I’m a little hesitant about including the latter because I feel like trying to estimate those things becomes really hard (and that means that capturing the tasks becomes more difficult, which is counter-productive), and also, I’ve never used a system that actually does anything useful with it, but maybe.
What is presented in a “now” view should be a combination of things that are specifically due soon or are overdue, combined with (provided there aren’t too many of the former) things that are vaguely due in the near future. What’s really important is that this view should allow tasks to be addressed quickly: ideally, there are three extremely quick actions – Mark done, Remind me soon, Remind me later. The first is obvious, but the distinction between the other two is where these systems could get smart. “Remind me soon” might mean tonight, or maybe tomorrow, or the next day. “Remind me later” is more complicated. It is essentially a deprioritization. For tasks that have clear deadlines, there probably isn’t much that should happen, and likely it won’t get clicked. But for something that was entered several months ago as vaguely due around now, it bumps it out by a week or so. If it has already been deprioritized, maybe it pushes it further out. There are probably other ways this could get more sophisticated, and it would probably be worth it! The point is, figuring out what is relevant to show (and notify about) is perhaps subtle, but if done well potentially has a high payoff!
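A toy sketch of that “Remind me later” behavior (the one-week bump and the growing deferral are my assumptions for illustration, not an existing app’s algorithm):

```haskell
-- Days are plain Ints here to keep the sketch self-contained.
data Task = Task
  { due       :: Int  -- day number when the task should resurface
  , deferrals :: Int  -- how many times "Remind me later" was pressed
  } deriving (Eq, Show)

-- Each deferral pushes the task out by another week, so repeatedly
-- deferred tasks drift further into the future instead of nagging daily.
remindLater :: Int -> Task -> Task
remindLater today t = Task (today + 7 * n) n
  where n = deferrals t + 1

main :: IO ()
main = print (remindLater 100 (Task 100 0))  -- Task {due = 107, deferrals = 1}
```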
Current systems
The best system I’ve found for iOS (if you have suggestions for something better, let me know!) is the app GoodTask, though it’s certainly not perfect. In terms of what it does do: you can schedule specific deadlines and repeating patterns, and it does a great job of integrating the calendar (you have to go to settings->preferences and uncheck “separate calendar events”; the default keeps them separate, which is particularly broken for the week view).
The single day view shows overdue tasks, tasks that are due today, and calendar events. It uses the built-in Reminders for data storage (though, unlike Reminders, it doesn’t show you everything, thankfully – but using this data store has upsides: it means, for example, you can input reminders by voice) and Calendar (which is great). The location feature is limited to what the Reminders app does: you should be able to get notifications when you enter or leave a given location (though it’s been unreliable for me, so I don’t use it). This isn’t exactly what I want (as I’d rather have the tasks be filtered by location, like they are filtered by date). It has a nice subtask feature (but, it’s minimal – no sub-subtasks), which I’ve ended up using more than I would have thought (as I might have a list of things I need to do before leaving home and I can more compactly keep them organized this way).
The main flaw is that it doesn’t have a notion of vague deadlines (I don’t know of any app that does, so this isn’t an attack on it specifically), which means the most annoying part of it is moving tasks between days. For example, there is no way of having 10 tasks that should happen this week and have only a few show up at a time, as they are done. I could put them all on Monday, but then Monday is an overwhelming mess, so more realistically I’ll scatter them throughout the first couple days of the week. And then on Monday if I decide not to do a task, I’ll bump it a few days forward. It works okay. And then if I want something to be hidden for a while, I need to put it as due the date when I want it to first reappear (as it will be totally invisible until that point).
Because of the lack of location filtering, I don’t actually find the notifications all that useful, as trying to figure out when to put notifications on tasks is difficult. The notifications are done via the built-in Reminders, which means your delay option is “delay 1 hr” or “delay 1 day”, which isn’t terrible, but isn’t great (if a reminder hits in the morning and I want to do it at night, I’ll be bouncing it every hour throughout the day). My work hours vary by day, and whether I’m working at home or commuting an hour to my office varies, and getting pointless notifications is much worse than getting minimal notifications. As a result, I primarily rely on the app badge number, which is the number of overdue tasks, and I open up the app periodically throughout the day. Having to do that is another reason why re-scheduling tasks (and making sure tasks that are not going to be done today are not there) is so important. By the end of the day, there should be nothing that hasn’t been done. Even if that means, towards the end of the day, bumping things I thought I’d get done to the next morning.
GoodTask has a mechanism to filter tasks by various lists, but I’ve never used it. It’s actually a pretty misleading aspect of their screenshots, as it makes it seem like there are features to support “@Home” and other seemingly sophisticated workflows, but they are just lists (that are detected by tags). Manual filtering is just a way for me to lose track of things, as I would forget I’m looking at a particular list.
Summary
I’ve been using this system for maybe six months and it works pretty well – certainly better than not using it! There are some lingering flaws in GoodTask, but overall, I think it is working well enough that I’ve been spending less time worrying about whether I’m forgetting things (and, I’m pretty sure, actually getting things done more quickly). In general, I think this space has had surprisingly little attention paid to it by big tech companies, given that it seems to play so well into their “personal assistant” marketing and the technical aspects don’t actually seem terribly hard (less difficult than voice recognition, anyway!). Each of them can handle basic “remind me to do X at Y” (i.e., create the basic reminders they support), but they seem to have spent little energy figuring out when and how to present these tasks to the person that created them. Which makes them come off as cute technical demos: working when you create 5 reminders, not so much when you create 500. If they put a lot more effort into this, maybe calling them “personal assistants” might not be so silly after all (though since they are intended primarily as advertising devices, maybe I shouldn’t hold out hope).
There wasn’t much information out there about this build, even though it seems like a lot of people have made them. In particular, I had no idea how long I should expect it to take (and I underestimated how long it would take – the only number I saw was 10 days, which perhaps a professional, or someone with a helper, could manage, but not me!). So this page is an attempt to give more information that might be useful to someone taking on this project. In the end, counting every step – including a bunch of time spent stripping off poorly adhered paint and repainting (probably around 15hrs lost) – it took 150.75 hours to get to a rigged boat ready to launch. As a total amateur.
Also, the plans are pretty detailed, but they aren’t perfect – there are places where they are incomplete or misleading (e.g., they suggest gluing in the mast step at two different points in the plans – I chose the first one arbitrarily! Even worse, the rudder box instructions say, counter to the pictures, that you should glue the framing on after gluing the box together. But if you do that, it’s impossible to drill a countersunk hole on the inside of the rudder box, so the proposed assembly order is essentially impossible). So read everything and try to understand how things fit together – it’s not a matter of just following the instructions in order. However, I am confident that they include enough information that a complete novice (like me) can end up with a boat, but it probably won’t match what is described exactly, because I don’t think what is described is actually consistent. Somewhat frustrating, especially given how much love people give to the plan author (as an engineering spec, at least the type I would expect from my entirely different background as a software engineer, I would give it a C-).
Another unclear part is when you should be epoxying things! In the appendix, the author says that they strongly prefer coating surfaces at the point that they are getting glued to other things, but then the instructions (and images) don’t seem to do that. Some surfaces really should be done that way, because they will end up in enclosed internal spaces, but for others it may be better to wait (I realize now that perhaps the insides of sealed compartments need not have been coated at all! Assuming no leaks, they should never see water, and it would have saved time and weight to not coat them). I ended up coating most everything (on both sides) before assembly, somewhat ignoring the advice that places where pieces will later get attached should get masked off (confusingly, one of the main sources of information, aside from the plans, is this site: http://www.bitingmidge.com/boats/ozracer/building/oneminute.html, which suggests pre-coating the entire plywood panels. This might have actually been a much better plan, but it clearly contradicts the idea that you shouldn’t pre-coat areas that will get glued. It’s also probably somewhat wasteful, as even scrap parts get coated, and it may make things harder to cut, as epoxy makes the wood a lot stronger). I then sanded the panels before they got glued to other parts, and hopefully it’s all strong enough!
Tools Used (in order of frequency):

Drill
Jig Saw
Belt Sander
Orbital Sander
Router
Table saw – to rip lumber to the right dimensions (done all at once the first day, as I didn’t own one when I started this)
Hand plane
Thickness planer (to get the foil blanks to the right thickness)
Electric plane
Note: I ended up not using the hand plane for any of the initial steps, even though the plans had you using it all the time. The one I bought was crappy, and the sharpening guide I bought didn’t fit it, so sharpening was hard. So I used the jig saw / belt sander for most things (cutting close with the saw, using the sander for the rest), and for narrowing the mast pieces I used the router. For making the foils, however, I ended up using the hand plane for the leading edge (for the trailing edge, I used an electric hand plane): to get the curve on the leading edge, even the not-great hand plane worked much better, with a sander used to clean it up. And then for rounding the yard and shaping the oars, I used it, crappy as it was (lots of tear-out cleaned up by sanding), a LOT.
Cost

Amount    Source            Description
$20       Duckworks         Plans
$368.65   Duckworks         Fiberglass tape, cloth, wood flour, sail tape, thread, rigging hardware, 3 gallons of marinepoxy, two 6” deck plates (and over $50 shipping!)
$216.74   Local lumberyard  Plywood (four 1/4” plywood sheets) and all lumber

Plastic drop cloth, disposable gloves, sand paper (belt & discs), rollers (paint & adhesive), brushes, tack cloth, silicone glue, etc. Some of these weren’t used up.
This was quite a bit more than I was expecting (based on things I had read, I thought it’d be about $500). Part of that is probably that I got marine paint, which is expensive, vs. regular outdoor house paint (which probably would have been fine, and would have saved ~$80), but also not thinking about all the disposable stuff (brushes, rollers, drop cloths, gloves), which, on its own, isn’t expensive, but adds up. Finally, the rigging stuff, deck hatches, and rudder hardware are not cheap ($51 for rope & blocks, $35 for rudder stuff, $20 for two 6” hatches)! Those things account for about $300, which, combined with probably paying more for lumber than strictly necessary (I got high quality 5/4ths stuff for the foils, which was not cheap, and I probably could have just gotten more 2x4s and ripped them; the plywood was just under $100, and there really wasn’t that much other lumber necessary) and a lot of miscellaneous things (like $8.65 for seam tape for the sail, $5 for high quality thread, etc), accounts for the “extra”.
Extra cost
The initial paint job I did didn’t come out well (I think it was way too hot when I painted it, especially the top, which I did outside in the sun!), and I realized that the color on top really didn’t look right. So I decided to scrape/sand off the paint and repaint. Since this was turning into a much longer-term project, I decided to spend more time researching, and settled on old-school marine paint from the local shop Kirby Paints (which has been operating continuously since 1846). It’s a bit more expensive than the original stuff I got, but the colors are a lot nicer. Getting a couple quarts, plus thinner, and some brushes was about another $100.
+ 2018/5/25 (1.75hr) 8:30am-10:15am, 2pm-5pm cut leeboard holes, sand whole boat, sand next to leeboard, prep for glassing, three coats of epoxy on leeboard edge glass, epoxy on various wood, finish scraping and redo fillets that were bad, measure and cut sail
+
+
+ 2018/5/26 (6hr) 10am-12pm, 1pm-2pm, 3pm-6pm another coat on leeboard edge, fill gap between case and hull, cut patches for sail, tape on sail patches,
+
+
+ 2018/5/27 (3.75hr) 9:30am-12pm, 12:45pm-2pm trim overhanging side tank benches, sewing sewing sewing finally finishing entire sail, adding grommets to sail
+
+
+ 2018/7/18 (1hr) 7pm-8pm plane foil blank to 19mm (as my centerboard case is only 22mm! Oops!)
+
+ 2018/8/28 (2.75hr) 8am-8:45am, 5:45pm-7:00pm, 7:30-8:15pm finish shaping oats and first coat of 50% dilute varnish, glue rudder together, drill holes in spars, fit mast to boat
+
+
+ 2018/8/30 (3.25hr) 8:45am-10am, 6:45pm-8:45pm install padlocks, deckeye, glue together rudderbox w framing and tiller, tie sail to spars, cut bottom of mast, epoxy rudder&box, mast partner, mast bottom, test hull for leaks, build rowing seat
+
+
+ 2018/8/31 (2hr) 8:45am-9am, 7pm-8:45pm epoxy mast partner, mast bottom, rudder box, patch leak on outside, work on installing rudder hardware
+
+
+ 2018/9/1 (1.75hr) 2:15pm-3:00pm, 4:30pm-5:30pm finish rudder hardware, work on rigging
+
+ Getting all the lumber, post stripping it with a table saw my dad had. Without access to a table saw and someone who knows how to use it, I don’t think this project would be possible – the dimensions of lumber needed are not things that are available standard. And, even if they were, they would probably be so much more expensive. For example, I got 12” wide pine planks that were then ripped to get the 3/4” by 3/4” strips, the 3/4” by 1 3/4”, etc.
+
+
+ Marking side panels (first mistake is present in this photo – third and possibly fourth clamp are not in the right place, and the flexible wood batten was so flexible that I was able to make the shape. The bottom is supposed to be a smooth curve! All other panels marked to match this one, replicating the mistake.)
+
+
+
+ Making more side panels.
+
+
+
+ Sanding all side panels to be identical. This was supposed to be done with a plane, but I couldn’t get that to work. The sander worked though!
+
+
+
+ Cutting forward bulkhead.
+
+
+
+ Epoxying rear transom & forward bulkhead.
+
+
+
+ Epoxying side panels, attaching framing.
+
+
+
+ Sanding panels, transoms, bulkhead.
+
+
+
+ Boat is 3D! No bottom yet though.
+
+
+
+ After long and hard day, added mast step / partner, epoxied bottom and with help of dad (this would not have been possible alone, as we had to wrench it into shape and then put in screws to hold it), attached bottom (involving a bit of work to get the boat square, and then a lot more work to get the bottom to attach to the sides due to the mistake shaping the bottom; had to add more trim and screw through the bottom to get it to pull up to side panels.)
+
+
+
+ p Dry fitting deck.
+
+
+
+ Epoxying bottom of deck.
+
+
+
+ Adding fiberglass tape to seams, epoxying bottom of boat.
+
+
+
+ Scarfing mast pieces together and cutting to length. (Though, not pictured here I narrowed them with a router, and messed up the narrowest part, so I think I’m going to switch to a lug sail which requires shorter masts, and thus I can cut off the mistake!)
+
+
+
+ Priming bottom of boat.
+
+
+
+ Painted & flipped the boat (it’s gotten heavy! Problem with getting plywood from a local lumberyard is that thinnest they had was 1/4”, and I’m sure it’s not the lightest stuff!)
+
+
+
+ Moved outside to paint top.
+
+
+
+ Primed top and inside.
+
+
+
+ Painted top and inside.
+
+
+
+ Some really crappy paddles made out of scrap ply, extra 3/4” square with some epoxy added and fiberglass tape wrapped around handle (though I was lazy and only did a single coat of epoxy, so fiberglass tape is still very present. It also hadn’t fully dried when we took it out the next day, so I wrapped duct tape around the handle and halfway down the shaft, so it wouldn’t feel sticky!)
+
+
+
+ Testing the hull out. Found a minor leak in the left side tank (I hadn’t cut the leeboard slot yet, so it wasn’t that). It ended up being at the end.
+
+
+
+ Cut top of leeboard slot, and added fiberglass along edge.
+
+
+
+ Cut bottom of leeboard slot. I noticed what looked like slight gaps at some points around the leeboard case. I’m not sure if they go all the way through to the tank, but I added some thickened epoxy. I’ll have to test the watertightness by dumping water into the tank and see if any comes out!
+
+
+
+ Sewing the sail. This took a long time and a lot of sewing! But the result seemed to come out pretty well, modulo some mistakes (i.e., the tall reinforcing panel that runs most of the way through this photo should have gone higher so the curve at the top hit the top edge of the sail a bit higher than it did).
+
+
+
+ Planing foils. This would have been much easier if my hand plane was better, but I made do by using it on the leading edge (just for the actual curve, which had very narrow shavings that it did better on) and the electric hand planer on the rest, and on the trailing edge. They probably aren’t perfect, but they look pretty good!
+
There wasn’t much information out there about this build, even though it seems like a lot of people have built one. In particular, I had no idea how long to expect it to take (and I underestimated it – the only number I saw was 10 days, which perhaps a professional, or someone with a helper, could manage, but not me!). So this page is an attempt to give more information that might be useful to someone taking on this project. In the end, counting every step – including a bunch of time (probably around 15hrs) lost stripping off poorly adhered paint and repainting – it took 150.75 hours to get to a rigged boat ready to launch. As a total amateur.
-
Also, the plans are pretty detailed, but they aren’t perfect – there are places where they are incomplete or misleading (e.g., they suggest gluing in the mast step at two different places in the plans – I chose the first one arbitrarily! Even worse, the rudder box instructions say, counter to the pictures, that you should glue the framing on after gluing the box together. But if you do that, it’s impossible to drill a countersunk hole on the inside of the rudder box, so the proposed assembly order is essentially impossible). So read everything and try to understand how things fit together – it’s not a matter of just following the instructions in order. That said, I am confident the plans include enough information that a complete novice (like me) can end up with a boat, but it probably won’t match what is described exactly, because I don’t think what is described is actually consistent. Somewhat frustrating, especially given how much love people give the plan author (as an engineering spec, at least of the type I would expect from my entirely different background as a software engineer, I would give it a C-).
-
Another unclear part is when you should be epoxying things! In the appendix, the author says they strongly prefer coating surfaces at the point they are being glued to other things, but then the instructions (and images) don’t seem to follow that. Some surfaces really should be done that way, because they will end up in inaccessible internal places, but for others it may be better to wait (I realize now that perhaps the insides of sealed compartments need not have been coated at all! Assuming no leaks, they should never see water, and skipping them would have saved time and weight). I ended up coating most everything (on both sides) before assembly, somewhat ignoring the advice that places where pieces will later get attached should be masked off (confusingly, one of the main sources of information aside from the plans, http://www.bitingmidge.com/boats/ozracer/building/oneminute.html, suggests pre-coating the entire plywood panels. This might actually have been a much better plan, but it clearly contradicts the idea that you shouldn’t pre-coat areas that will get glued. It’s also probably somewhat wasteful, as even scrap parts get coated, and it may make things harder to cut, as epoxy makes the wood a lot stronger). I then sanded the panels before they got glued to other parts, and hopefully it’s all strong enough!
-
Tools Used (in order of frequency):
-
-
Drill
-
Jig Saw
-
Belt Sander
-
Orbital Sander
-
Router
-
Table saw – to rip lumber to the right dimensions (done all at once the first day, as I didn’t own one when I started this)
-
Hand plane
-
Thickness planer (to get the foil blanks to the right thickness)
-
Electric plane
-
-
Note: I ended up not using the hand plane for any of the initial steps, even though the plans have you using it all the time. The one I bought was crappy, and the sharpening guide I bought didn’t fit it, so sharpening was hard. So I used the jig saw and belt sander for most things (cutting close to the line, then sanding the rest), and the router for narrowing the mast pieces. For making the foils, however, I did use the hand plane on the leading edge: for the actual curve, taking very narrow shavings, even the not-great hand plane worked much better than the electric hand planer, which I used on the rest and on the trailing edge, with a sander to clean it up. And then for rounding the yard and shaping the oars, I used it, crappy as it was (lots of tear-out cleaned up by sanding), a LOT.
-
Cost
-
-
-
-
Amount
-
-
Description
-
-
-
-
-
$20
-
Duckworks
-
Plans
-
-
-
$368.65
-
Duckworks
-
Fiberglass tape, cloth, wood flour, sail tape, thread, rigging hardware, 3 gallons of MarinEpoxy, two 6" deck plates (and over $50 shipping!)
-
-
-
$216.74
-
Local lumberyard
-
Plywood (four 1/4" plywood sheets) and all lumber
-
-
-
$97.34
-
Amazon
-
Topside marine primer, paint (white & red), non-stick additive
Plastic drop cloth, disposable gloves, sandpaper (belt & discs), rollers (paint & adhesive), brushes, tack cloth, silicone glue, etc. Some of these weren’t used up.
This was quite a bit more than I was expecting (based on things I had read, I thought it’d be about $500). Part of that is probably that I got marine paint, which is expensive, instead of regular outdoor house paint (which probably would have been fine, and would have saved ~$80), but I also hadn’t thought about all the disposable stuff (brushes, rollers, drop cloths, gloves), which isn’t expensive on its own but adds up. Finally, the rigging stuff, deck hatches, and rudder hardware are not cheap ($51 for rope & blocks, $35 for rudder hardware, $20 for two 6" hatches)! Those things account for about $300, which – combined with probably paying more for lumber than strictly necessary (I got high-quality 5/4 stock for the foils, which was not cheap, and I probably could have just gotten more 2x4s and ripped them; the plywood was just under $100, and there really wasn’t much other lumber needed) and a lot of miscellaneous things (like $8.65 for seam tape for the sail, $5 for high-quality thread, etc.) – accounts for the “extra”.
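To put numbers on that “more than expected”: here is a quick sanity check in Python (my sketch, not from the original notes – the labels paraphrase the cost table above):

```python
# Line items paraphrased from the cost table above.
costs = {
    "Plans (Duckworks)": 20.00,
    "Epoxy, glass, sail & rigging supplies (Duckworks)": 368.65,
    "Plywood and lumber (local lumberyard)": 216.74,
    "Primer, paint, painting supplies (Amazon)": 97.34,
}

total = round(sum(costs.values()), 2)
print(total)                  # 702.73
print(round(total - 500, 2))  # 202.73 over the rough $500 expectation
```

So the build came in roughly $200 over the $500 ballpark, before the ~$100 repaint described below.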
-
Extra cost
-
The initial paint job both didn’t come out well (I think it was way too hot when I painted it, especially the top, which I did outside in the sun!) and made me realize that the color on top really didn’t look right. So I decided to scrape/sand off the paint and repaint it. Since this was turning into a much longer-term project, I spent more time researching and settled on old-school marine paint from the local shop Kirby Paints (which has been operating continuously since 1846). It’s a bit more expensive than the original stuff I got, but the colors are a lot nicer. Getting a couple quarts, plus thinner and some brushes, was about another $100.
2018/5/25 (1.75hr) 8:30am-10:15am, 2pm-5pm cut leeboard holes, sand whole boat, sand next to leeboard, prep for glassing, three coats of epoxy on leeboard edge glass, epoxy on various wood, finish scraping and redo fillets that were bad, measure and cut sail
-
2018/5/26 (6hr) 10am-12pm, 1pm-2pm, 3pm-6pm another coat on leeboard edge, fill gap between case and hull, cut patches for sail, tape on sail patches
-
2018/5/27 (3.75hr) 9:30am-12pm, 12:45pm-2pm trim overhanging side tank benches, sewing sewing sewing finally finishing entire sail, adding grommets to sail
-
2018/7/18 (1hr) 7pm-8pm plane foil blank to 19mm (as my centerboard case is only 22mm! Oops!)
2018/8/28 (2.75hr) 8am-8:45am, 5:45pm-7:00pm, 7:30pm-8:15pm finish shaping oars and first coat of 50% dilute varnish, glue rudder together, drill holes in spars, fit mast to boat
-
2018/8/30 (3.25hr) 8:45am-10am, 6:45pm-8:45pm install oarlocks, deck eye, glue together rudder box with framing and tiller, tie sail to spars, cut bottom of mast, epoxy rudder & box, mast partner, mast bottom, test hull for leaks, build rowing seat
-
2018/8/31 (2hr) 8:45am-9am, 7pm-8:45pm epoxy mast partner, mast bottom, rudder box, patch leak on outside, work on installing rudder hardware
-
2018/9/1 (1.75hr) 2:15pm-3:00pm, 4:30pm-5:30pm finish rudder hardware, work on rigging
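Tallying the per-day “(Xhr)” tags in a log like the one above is easy to mechanize. Here is a small Python sketch (mine, not the original author’s; the entries are abbreviated from the log above, which is itself only an excerpt of the full 150.75 hours):

```python
import re

# Abbreviated entries from the build log above (an excerpt, not the full log).
log = [
    "2018/5/25 (1.75hr) cut leeboard holes, sand whole boat",
    "2018/5/26 (6hr) another coat on leeboard edge",
    "2018/5/27 (3.75hr) trim overhanging side tank benches",
    "2018/7/18 (1hr) plane foil blank to 19mm",
    "2018/8/28 (2.75hr) finish shaping foils, varnish",
    "2018/8/30 (3.25hr) rudder box, rowing seat",
    "2018/8/31 (2hr) epoxy mast partner, patch leak",
    "2018/9/1 (1.75hr) finish rudder hardware, rigging",
]

def total_hours(entries):
    # Pull the "(Xhr)" duration tag out of each line and sum the values.
    return sum(float(re.search(r"\(([\d.]+)hr\)", e).group(1)) for e in entries)

print(total_hours(log))  # 22.25 for this excerpt
```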
-
Getting all the lumber, after ripping it down with a table saw my dad had. Without access to a table saw and someone who knows how to use it, I don’t think this project would be possible – the lumber dimensions needed are not available as standard sizes. And even if they were, they would probably be much more expensive. For example, I got 12" wide pine planks that were then ripped to get the 3/4" by 3/4" strips, the 3/4" by 1 3/4", etc.
-
-
-
-
Marking side panels (the first mistake is present in this photo – the third and possibly fourth clamps are not in the right place, and the wood batten was flexible enough to take on that incorrect shape. The bottom is supposed to be a smooth curve! All the other panels were marked to match this one, replicating the mistake.)
-
-
-
-
Making more side panels.
-
-
-
-
Sanding all side panels to be identical. This was supposed to be done with a plane, but I couldn’t get that to work. The sander worked though!
-
-
-
-
Cutting forward bulkhead.
-
-
-
-
Epoxying rear transom & forward bulkhead.
-
-
-
-
Epoxying side panels, attaching framing.
-
-
-
-
Sanding panels, transoms, bulkhead.
-
-
-
-
Boat is 3D! No bottom yet though.
-
-
-
-
After a long and hard day: added the mast step / partner, epoxied the bottom, and, with the help of my dad, attached the bottom (this would not have been possible alone, as we had to wrench it into shape and then put in screws to hold it). It took a bit of work to get the boat square, and a lot more work to get the bottom to attach to the sides due to the mistake shaping the panels; we had to add more trim and screw through the bottom to pull it up to the side panels.
-
-
-
-
Dry fitting deck.
-
-
-
-
Epoxying bottom of deck.
-
-
-
-
Adding fiberglass tape to seams, epoxying bottom of boat.
-
-
-
-
Scarfing mast pieces together and cutting to length. (Though not pictured here, I narrowed them with a router and messed up the narrowest part, so I think I’m going to switch to a lug sail, which requires shorter masts – that way I can cut off the mistake!)
-
-
-
-
Priming bottom of boat.
-
-
-
-
Painted & flipped the boat (it’s gotten heavy! The problem with getting plywood from a local lumberyard is that the thinnest they had was 1/4", and I’m sure it’s not the lightest stuff!)
-
-
-
-
Moved outside to paint top.
-
-
-
-
Primed top and inside.
-
-
-
-
Painted top and inside.
-
-
-
-
Some really crappy paddles made out of scrap ply and extra 3/4" square stock, with some epoxy added and fiberglass tape wrapped around the handles (though I was lazy and only did a single coat of epoxy, so the fiberglass tape texture is still very present. It also hadn’t fully dried when we took them out the next day, so I wrapped duct tape around the handle and halfway down the shaft so it wouldn’t feel sticky!)
-
-
-
-
Testing the hull out. Found a minor leak in the left side tank (I hadn’t cut the leeboard slot yet, so it wasn’t that). It ended up being at the end.
-
-
-
-
Cut top of leeboard slot, and added fiberglass along edge.
-
-
-
-
Cut bottom of leeboard slot. I noticed what looked like slight gaps at some points around the leeboard case. I’m not sure if they go all the way through to the tank, but I added some thickened epoxy. I’ll have to test the watertightness by dumping water into the tank and seeing if any comes out!
-
-
-
-
Sewing the sail. This took a long time and a lot of sewing! But the result seems to have come out pretty well, modulo some mistakes (e.g., the tall reinforcing panel that runs most of the way through this photo should have gone higher, so that the curve at the top hit the top edge of the sail a bit higher than it did).
-
-
-
-
Planing foils. This would have been much easier if my hand plane were better, but I made do by using it on the leading edge (just for the actual curve, where the very narrow shavings it took worked better) and the electric hand planer on the rest, including the trailing edge. They probably aren’t perfect, but they look pretty good!
I’ve recently started working with Snap, the Haskell web framework (http://snapframework.com), and one reason (among many) for my switch from Ocsigen, a web framework written in OCaml (which I’ve written posts about before), was the desire to more flexibly handle ajax-based websites. While it seems good in some ways, I eventually decided that Ocsigen’s emphasis on declaring services as having certain types (i.e., a fragment of a page, a whole page, a redirect, etc.) is in some ways at odds with the way the web works.
-
After starting to work in Haskell again, and with Heist, the Snap-team-authored templating system, I immediately began looking for ways to work with ajax content more flexibly than I had before. I was inspired by Facebook’s work on Primer (provided to the world at https://gist.github.com/376039), their baseline system for dynamic content – basically, event listeners waiting for onclick events on links that have a special attribute saying the click should perform an ajax request, and event listeners for onsubmit events on forms with a special attribute indicating the form should be serialized and submitted asynchronously. But even more interesting to me was the other half of their system (not, I believe, public, and regardless, written in PHP): the server-side response decides which client-side divs it should replace.
-
At first that sounds a little dirty – it basically entails mixing (conceptually) server code and client code. But it allows a different sort of methodology: even with client-side modifications, it is the server that ultimately has all the control – including over what to replace on the client. This is a fascinating idea, because client-side code is notoriously limited by being written in javascript (or with javascript libraries), and having to maintain separate client-side and server-side logic seems a much dirtier solution than having the server, in short, control the client.
-
Taking this idea and bringing it into the world of Heist, which is (in my opinion) a fantastic templating system (more info at http://snapframework.com/docs/tutorials/heist), ended up being quite straightforward, as Heist lends itself to extending the syntax of html, much like the Facebook Primer system did.
-
At first I thought there should be haskell code specifying things like “replaceDivsWithSplices …”, where divs would be identified and corresponding splices (things that can be inserted into heist templates) would replace them, and then “replaceDivsWithTemplates”, etc., but the whole solution seemed a little off.
-
And then I realized that the entire idea could be summed up with a single tag: “div-async”. The idea is that this is a special div that could foreseeably be replaced by an asynchronous response. A template would have many divs marked this way; in a non-async response they do nothing special, but when an async response comes back, all its div-asyncs replace the corresponding tags on the page.
-
The only things that remained were the two tags to start the async requests, which I named “a-async” and “form-async”, and a little javascript to make the moving parts work together. And so, heist-async was born. (For the impatient, the code exists at https://github.com/dbp/heist-async , and while I am using this code currently and it seems to work, it could change significantly as things are worked out.)
-
The basics of how this works should be obvious, but I can illustrate a basic example. On a page you have an announcements box. You want the user to be able to click a button and have the announcements box reload without reloading the whole page (new announcements may have occurred). So you have a page template that looks like this:
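The original template markup did not survive here, so what follows is a minimal sketch of what such a pair of templates might look like - the splice name latest-announcements and the template file names are illustrative assumptions, not from the original post:

```html
<!-- announcements.tpl: the replaceable part, wrapped in a div-async -->
<div-async name="announcements">
  <latest-announcements/>
</div-async>

<!-- page.tpl: the full page; it applies the announcements template and
     has an a-async link that triggers the asynchronous reload -->
<html>
  <body>
    <apply template="announcements"/>
    <a-async href="/recent_announcements">Check for new announcements</a-async>
  </body>
</html>
```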
Now to glue this together, all you need to do is serve the original page (with the proper splice bound so that the custom tag actually works), and, at the /recent_announcements url, you just serve the announcements template. Since it is the exact same template, it obviously has the same identifier for the div-async (which is just the attribute “name”), and will therefore replace the current announcements box with the freshly loaded one.
-
Now that is pretty cool - what it means is that you can have one set of templating code, and the only change you need to make is to separate any parts you want to be able to load asynchronously into separate templates, and make sure there is a div-async wrapper around each one. (NOTE: since I didn’t mention it before, it might be helpful now - div-async is just a regular div, so you can set all the regular things, like id, class, etc. Also feel free to take existing divs and just add -async and set a name.)
-
At this point, I was pretty happy with this, and thought it was working pretty well. But of course the real world is much more complicated, and not everything is so simple - sometimes a single asynchronous request should mean that a lot of different things on a page change. In that case, the simple template inheritance may not work, but with the addition of a template that exists just for the response, and includes all the templates that should be updated, it seems to work pretty well. An example of one of these could be:
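A sketch of such a response-only template, assuming the page has an announcements box and a sidebar that both need updating (the template names are illustrative):

```html
<!-- update-all.tpl: served only as the async response; each applied
     template contains its own div-async, so every one gets swapped in -->
<apply template="announcements"/>
<apply template="sidebar"/>
```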
In this case, there is still no duplication of formatting code, all that exists now is an explicit list of all the parts of the page that should be replaced by a given request.
-
Other common things: to hide an element, sending back:
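An empty div-async with the matching name, for instance (the name here is illustrative):

```html
<div-async name="notification"></div-async>
```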
Should work. You could also put some empty placeholder divs like that on a page, and later replace them with ones with actual content.
-
What I noticed about this is that it makes dynamic page changes very explicit in the templates, which I think is a very good thing - and certainly makes it easier to reason about page changes.
-
Getting to this point, I started using this to implement a bunch of parts of a new site I’m working on, and I was happily impressed by how well it all seemed to be working. Using this, ajax can be thought of as just an aspect of the templating system - describe what should be replaced, and it will be, without ever having to worry about the clientside code (which is 12k of lightweight libraries and 60 significant lines of custom javascript; the 60 lines could easily be translated to depend on common javascript libraries like jQuery - I just didn’t want to make that a requirement).
-
I’m interested in feedback on the library, and ways that it can be improved. It is still very early software (a week ago, it did not exist), but it is something that I’ve found very powerful, and I’m kind of interested in where it can be taken / what people think about it.
-
Mercury tidbits - dependent types and file io
-
Note: this was originally posted as two separate parts, 1 week apart, and has been compressed for posterity
-
I just started learning a functional/logic language called Mercury, which has features that make it feel (at least to my initial impressions) like a mix between Prolog and Haskell. It has all the features that make it a viable Prolog, but it also adds static typing (with full type inference) and purity (all side effects are dealt with by passing around the state of the world). Since I was recently interested in learning Prolog, but had no desire to give up static typing or purity, Mercury seemed like a neat thing to learn.
-
While it is not very well known, the language has been around for over 15 years, and has a high quality self-hosting compiler.
-
Getting to play around with logic/declarative programming is interesting (and indeed the main reason why I’m interested in learning it), but what seems even more interesting with Mercury is how they have incorporated typing into the logic programming (which, unless I’m mistaken, is a new thing). As a tiny code example:
-
:- pred head(list(T), T).
:- mode head(in, out) is semidet.
:- mode head(in(non_empty_list), out) is det.
head(Xs, X) :- Xs = [X | _].
-
The first line says that this is a predicate (logic statement) with two parts: the first is a list of some type T (it is polymorphic), and the second is an item of type T.
-
The fourth line should be familiar to a prolog programmer, but briefly, the right side says that Xs is defined as X cons’d to an unnamed element. head can be seen as defining a relationship between Xs and X, where the specifics are that Xs is a list that has X as its first element.
-
Now with regular prolog, only the fourth line would be necessary, and that definition allows some interesting generalization. head([1,2,3],Y) will bind Y to 1, while head([1,2,3],1) will be true (or some truthy value), and if head(X,Y) were used in a set of other statements, together they would only yield a result if X (wherever it was bound, or unified, to a value) had as its first value Y, whatever Y was.
-
Since Mercury is statically typed, it adds what it calls modes to predicates, which specify whether a certain argument (that’s probably not the right word!) is going to be given, or whether it is going to be figured out by the predicate. The other thing it has is specifications about whether the predicate is deterministic. There are a couple options, but the two that are relevant to this example are det, which means fully deterministic, for every input there is exactly one output, and semidet, which means for some inputs there is an output, for others there is not (ie, the unification fails). These allow the compiler to do really interesting things, like tell you if you are not covering all of the possible cases if you declare something as det (whereas the same code, as semidet, would not cause any errors).
-
What is fascinating about this predicate head is that it has two modes defined, one being the obvious head that we know from Haskell etc:
-
:- mode head(in, out) is semidet.
-
Which states that the first argument is the input (the list) and the second is the output (the element), and it is semidet because for an empty list it will fail. The next is more interesting:
-
:- mode head(in(non_empty_list), out) is det.
-
This says for an input that is a non_empty_list (defined in the standard libraries, and included below), the second argument is the output, and this is det, ie fully deterministic. What this basically means is that failure is incorporated into the type system, because something that is semidet can fail, but something that is det cannot (neat!). Now the standard modes are defined (something like):
-
:- mode in == (ground >> ground).
:- mode out == (free >> ground).
-
Ground is something that is bound, and the >> shows what is happening before and after the unification (the analog to function application). So something of mode in will be bound before and after, whereas something of mode out will not be bound before (that’s what free means) and will be bound afterwards. That’s pretty straightforward.
-
What gets more interesting is something like non_empty_list, where inst stands for instantiation state, ie one of the various states that a variable can be in (with ground and free being the most obvious ones):
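Its definition in the standard library’s list module is something like the following (a reconstruction, so treat the exact syntax as approximate):

```mercury
:- inst non_empty_list == bound([ground | ground]).
```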
What this means is that a non_empty_list is defined as one that has a ground element cons’d to another ground element. (I don’t know the syntax well enough to explain what bound means in this context, but it seems straightforward). What this should allow you to do is write programs that operate on things like non-empty-lists, and have the compiler check to make sure you are never using an empty list where you shouldn’t. Pretty cool!
-
Obviously you can write data types in Haskell that also do not allow an empty list, like:
-
data NonEmptyList a = NonEmptyList a [a]
-
And could build functions to convert between them and normal lists, but the fact that it is so easy to build this kind of type checking on top of existing types with Mercury is really fascinating.
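A sketch of what such conversions might look like (the function names are illustrative, not from any particular library):

```haskell
-- A list guaranteed to have at least one element
data NonEmptyList a = NonEmptyList a [a]

-- Forgetting the non-emptiness guarantee is total
toList :: NonEmptyList a -> [a]
toList (NonEmptyList x xs) = x : xs

-- Going the other way can fail, so it returns a Maybe
fromList :: [a] -> Maybe (NonEmptyList a)
fromList []       = Nothing
fromList (x : xs) = Just (NonEmptyList x xs)
```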
-
This is (obviously) just scratching the surface of Mercury (and the reason all of this stuff actually works is probably more due to the theoretical underpinnings of logic programming than anything else), but seeing the definition of head gave me enough of an ‘aha!’ moment that it seemed worth sharing.
-
If any of this piqued your interest, all of it comes out of the (wonderful) tutorial provided at the Mercury Project Documentation page. If there are any inaccuracies (which there probably are!) send them to daniel@dbpatterson.com.
-
-
Note: this is the beginning of the second post
-
The language that I’ve been learning recently is a pure (ie, side-effect free) logic/functional language named Mercury. There is a wonderful tutorial (PDF) available, which explains the basics, but beyond that, the primary documentation is the language reference (which is well written, but reasonably dense) and Mercury’s standard library reference (which is autogenerated and includes types and source comments, nothing else).
-
Doing I/O in a pure language is a bit of a conundrum - Haskell solved this by forcing all I/O into a special monad that keeps track of sequencing (and has a mythical state of the world that it changes each time it does something, so as not to violate referential transparency). Mercury has a simpler (though equivalent) approach - every predicate that does IO must take a world state and must give back a new world state. Old world states can not be re-used (Mercury’s mode system keeps track of that), and so the state of the world is manually threaded throughout the program. A simple example would be:
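A sketch of such explicit threading (reconstructed to match the description of IO_0, IO_1, and IO_final; the surrounding pred/mode declarations for main are omitted):

```mercury
main(IO_0, IO_final) :-
    io.write_string("Hello World!", IO_0, IO_1),
    io.nl(IO_1, IO_final).
```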
Where the first function consumes the IO_0 state and produces IO_1 (while printing “Hello World!”) and the second function consumes IO_1 and produces IO_final (while printing a newline character).
-
Of course, manually threading those could become pretty tedious, so they have a shorthand, where the same code above could be written as:
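With state variable notation, !IO stands for a fresh before/after pair of world-state arguments at each call, so the same program can be sketched as:

```mercury
main(!IO) :-
    io.write_string("Hello World!", !IO),
    io.nl(!IO).
```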
This is just syntax sugar, and can work with any parameters that are dealt with in the same way (and naming it IO for io state is just convention). It definitely makes dealing with I/O more pleasant.
-
The task that I set myself was to figure out how to read in a file. This is not covered in the tutorial, and I thought it would be a simple matter of looking through the library reference for the io library. One of the first predicates looks promising:
-
:- pred io.read_file(io.maybe_partial_res(list(char))::out,
+
+ This is just syntax sugar, and can work with any parameters that are dealt with in the same way (and naming it IO for io state is just convention). It definitely makes dealing with I/O more pleasant.
+
+
+ The task that I set was to figure out how to read in a file. This is not covered in the tutorial, and I thought it would be a simple matter of looking through the library reference for the io library. One of the first predicates looks promising:
+
+
:- pred io.read_file(io.maybe_partial_res(list(char))::out,
io::di,
io::uo) is det.
-
But on second thought, something seems to be missing. The second and third parameters are the world states (the type is io, the mode di stands for destructive-input, meaning the variable cannot be used again, uo means unique output, which means that no other variable in the program can have that value), and the first one is going to be the contents of the file itself. But where is the file name?
-
The comment provides the necessary pointer:
-
% Reads all the characters from the current input stream until
+
+ But on second thought, something seems to be missing. The second and third parameters are the world states (the type is io, the mode di stands for destructive-input, meaning the variable cannot be used again, uo means unique output, which means that no other variable in the program can have that value), and the first one is going to be the contents of the file itself. But where is the file name?
+
+
+ The comment provides the necessary pointer:
+
+
% Reads all the characters from the current input stream until
% eof or error.
-
Hmm. So all of these functions operate on whatever the current input stream is. How do we set that? io.set_input_stream looks pretty good:
+ Hmm. So all of these functions operate on whatever the current input stream is. How do we set that? io.set_input_stream looks pretty good:
+
+
% io.set_input_stream(NewStream, OldStream, !IO):
% Changes the current input stream to the stream specified.
% Returns the previous stream.
%
:- pred io.set_input_stream(io.input_stream::in,
io.input_stream::out,
io::di, io::uo) is det.
-
But even better is io.see, which will try to open a file and if successful, will set it to the current stream (the alternative is to use io.open_input and then io.set_input_stream):
-
% io.see(File, Result, !IO).
+
+ But even better is io.see, which will try to open a file and if successful, will set it to the current stream (the alternative is to use io.open_input and then io.set_input_stream):
+
+
% io.see(File, Result, !IO).
% Attempts to open a file for input, and if successful,
% sets the current input stream to the newly opened stream.
% Result is either 'ok' or 'error(ErrorCode)'.
%
:- pred io.see(string::in, io.res::out, io::di, io::uo) is det.
-
With that in mind, let’s go ahead and implement a predicate to read files (much like I was expecting to find in the standard library, and what I put into a module of similar utilities I’ve started, titled, in tribute to Haskell, prelude):
-
:- pred prelude.read_file(string::in,
+
+ With that in mind, let’s go ahead and implement a predicate to read files (much like I was expecting to find in the standard library, and what I put into a module of similar utilities I’ve started, titled, in tribute to Haskell, prelude):
+
+
:- pred prelude.read_file(string::in,
maybe(string)::out,
io::di,io::uo) is det.
prelude.read_file(Path,Contents,!IO) :-
@@ -110,10 +172,18 @@
Mercury tidbits - dependent types and file io
Result = error(_),
Contents = no
).
-
To walk through what this code is doing, the type says that this is a predicate that does I/O (that’s what the last two arguments are for), that it takes in a string (the path) and give out a maybe(string), and that this whole thing is deterministic (ie, it always succeeds, which is accomplished by wrapping the failure into the return type: either yes(value) or no).
-
The first line tries to open the file at the path and bind it as the current input stream. I then pattern match on the results of that - if it failed, just bind Contents (the return value) to no. Otherwise, we try to read the contents out of the file and then close the file and set the input stream to the default one again (that is what the predicate io.seen does). Similarly we handle (well, really don’t handle, at least not well) reading the file failing. If it succeeds, we set the return type to the contents of the file.
-
What is interesting about this code is that while it is written in the form of logical statements, it feels very much like the way one does I/O in Haskell - probably a bit of that is my own bias (as a Haskell programmer, I am likely to write everything like I would write Haskell code, kind of how my python code always ends up with lambda’s and maps in it), but it also is probably a function of the fact that doing I/O in a statically type pure language is going to always be pretty similar - lots of dealing with error conditions, and not much else!
-
Anyhow, this was just a tiny bit of code, but it is a predicate that is immediately useful, especially when trying to use Mercury for random scripting tasks (what I often do with new languages, regardless of their reputed ability for scripting).
-
-
+
+ To walk through what this code is doing, the type says that this is a predicate that does I/O (that’s what the last two arguments are for), that it takes in a string (the path) and give out a maybe(string), and that this whole thing is deterministic (ie, it always succeeds, which is accomplished by wrapping the failure into the return type: either yes(value) or no).
+
+
+ The first line tries to open the file at the path and bind it as the current input stream. I then pattern match on the results of that - if it failed, just bind Contents (the return value) to no. Otherwise, we try to read the contents out of the file and then close the file and set the input stream to the default one again (that is what the predicate io.seen does). Similarly we handle (well, really don’t handle, at least not well) reading the file failing. If it succeeds, we set the return type to the contents of the file.
+
+
+ What is interesting about this code is that while it is written in the form of logical statements, it feels very much like the way one does I/O in Haskell - probably a bit of that is my own bias (as a Haskell programmer, I am likely to write everything like I would write Haskell code, kind of how my python code always ends up with lambda’s and maps in it), but it also is probably a function of the fact that doing I/O in a statically type pure language is going to always be pretty similar - lots of dealing with error conditions, and not much else!
+
+
+ Anyhow, this was just a tiny bit of code, but it is a predicate that is immediately useful, especially when trying to use Mercury for random scripting tasks (what I often do with new languages, regardless of their reputed ability for scripting).
+
+ Note: this was originally posted as two separate parts, 1 week apart, and has been compressed for posterity
+
+
I just started learning a functional/logic language called Mercury, which has features that make it feel (at least to my initial impressions) like a mix between Prolog and Haskell. It has all the features that make it a viable Prolog, but it also adds static typing (with full type inference) and purity (all side effects are dealt with by passing around the state of the world). Since I had recently been interested in learning Prolog, but had no desire to give up static typing or purity, Mercury seemed like a neat thing to learn.

While it is not very well known, the language has been around for over 15 years, and has a high-quality self-hosting compiler.

Getting to play around with logic/declarative programming is interesting (and indeed the main reason why I'm interested in learning it), but what seems even more interesting with Mercury is how it incorporates typing into logic programming (which, unless I'm mistaken, is a new thing). As a tiny code example:

:- pred head(list(T), T).
:- mode head(in, out) is semidet.
:- mode head(in(non_empty_list), out) is det.
head(Xs, X) :- Xs = [X | _].

The first line says that this is a predicate (logic statement) that has two arguments: the first is a list of some type T (it is polymorphic), and the second is an item of type T.

The fourth line should be familiar to a Prolog programmer, but briefly, the right side says that Xs is defined as X cons'd onto an unnamed rest of the list. head can be seen as defining a relationship between Xs and X, where the specifics are that Xs is a list that has X as its first element.

Now with regular Prolog, only the fourth line would be necessary, and that definition allows some interesting generalization: head([1,2,3],Y) will bind Y to 1, while head([1,2,3],1) will be true (or some truthy value), and if head(X,Y) were used in a set of other statements, together they would only yield a result if X (wherever it was bound, or unified, to a value) had as its first value Y, whatever Y was.

Since Mercury is statically typed, it adds what it calls modes to predicates, which specify whether a certain argument (that's probably not the right word!) is going to be given, or whether it is going to be figured out by the predicate. It also has declarations about whether the predicate is deterministic. There are a couple of options, but the two relevant to this example are det, which means fully deterministic (for every input there is exactly one output), and semidet, which means that for some inputs there is an output and for others there is not (ie, the unification fails). These allow the compiler to do really interesting things, like tell you if you are not covering all of the possible cases when you declare something as det (whereas the same code, as semidet, would not cause any errors).

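To make the distinction concrete, here is a small sketch (my own, not from the tutorial) of a det predicate built on top of the semidet mode of head, using an if-then-else to supply a default when unification fails:

```mercury
% head_or is a hypothetical helper: it is det even though head is
% semidet, because the if-then-else covers the failure case.
:- pred head_or(list(T)::in, T::in, T::out) is det.
head_or(Xs, Default, Y) :-
    ( if head(Xs, X) then
        Y = X
    else
        Y = Default
    ).
```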
What is fascinating about this predicate head is that it has two modes defined, one being the obvious head that we know from Haskell etc:

:- mode head(in, out) is semidet.

This states that the first argument is the input (the list) and the second is the output (the element), and it is semidet because for an empty list it will fail. The next is more interesting:

:- mode head(in(non_empty_list), out) is det.

This says that for an input that is a non_empty_list (defined in the standard library, and included below), the second argument is the output, and this is det, ie fully deterministic. What this basically means is that failure is incorporated into the type system, because something that is semidet can fail, but something that is det cannot (neat!). Now, the standard modes are defined something like:

:- mode in == (ground >> ground).
:- mode out == (free >> ground).

Ground is something that is bound, and the >> shows what is happening before and after the unification (the analog to function application). So something of mode in will be bound both before and after, whereas something of mode out will not be bound before (that's what free means) and will be bound afterwards. That's pretty straightforward.

What gets more interesting is something like non_empty_list, where inst stands for instantiation state, ie one of the various states that a variable can be in (with ground and free being the most obvious ones). Its definition is something like:

:- inst non_empty_list == bound([ground | ground]).

What this means is that a non_empty_list is defined as one that has a ground element cons'd to another ground element. (I don't know the syntax well enough to explain exactly what bound means in this context, but it seems straightforward.) What this should allow you to do is write programs that operate on things like non-empty lists, and have the compiler check to make sure you are never using an empty list where you shouldn't. Pretty cool!


Obviously you can write data types in Haskell that also do not allow an empty list, like:

data NonEmptyList a = NonEmptyList a [a]

And you could build functions to convert between them and normal lists, but the fact that it is so easy to build this kind of checking on top of existing types with Mercury is really fascinating.

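Those conversion functions might look something like this (toList and fromList are names I'm choosing here, not from any particular library):

```haskell
-- Dropping back down to an ordinary list always succeeds.
toList :: NonEmptyList a -> [a]
toList (NonEmptyList x xs) = x : xs

-- Going the other way can fail, so the result is wrapped in Maybe.
fromList :: [a] -> Maybe (NonEmptyList a)
fromList []       = Nothing
fromList (x : xs) = Just (NonEmptyList x xs)
```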
This is (obviously) just scratching the surface of Mercury (and the reason all of this stuff actually works is probably more due to the theoretical underpinnings of logic programming than anything else), but seeing the definition of head gave me enough of an 'aha!' moment that it seemed worth sharing.

If any of this piqued your interest, all of it comes out of the (wonderful) tutorial provided at the Mercury Project Documentation page. If there are any inaccuracies (which there probably are!), send them to daniel@dbpatterson.com.


Note: this is the beginning of the second post.

The language that I've been learning recently is a pure (ie, side-effect-free) logic/functional language named Mercury. There is a wonderful tutorial (PDF) available, which explains the basics, but beyond that the primary documentation is the language reference (which is well written, but reasonably dense) and Mercury's standard library reference (which is autogenerated and includes types and source comments, nothing else).

Doing I/O in a pure language is a bit of a conundrum. Haskell solved this by forcing all I/O into a special monad that keeps track of sequencing (and has a mythical state of the world that it changes each time it does something, so as not to violate referential transparency). Mercury has a simpler (though equivalent) approach: every predicate that does I/O must take a world state and give back a new world state. Old world states cannot be re-used (Mercury's mode system keeps track of that), and so the state of the world is manually threaded throughout the program. A simple example would be something like:

main(IO_0, IO_final) :-
    io.write_string("Hello World!", IO_0, IO_1),
    io.nl(IO_1, IO_final).
Here the first call consumes the IO_0 state and produces IO_1 (while printing "Hello World!"), and the second call consumes IO_1 and produces IO_final (while printing a newline character).

Of course, manually threading those states could become pretty tedious, so there is a shorthand, where the same code could be written something like:

main(!IO) :-
    io.write_string("Hello World!", !IO),
    io.nl(!IO).

This is just syntax sugar, and it can work with any parameters that are dealt with in the same way (naming the state variable IO is just convention). It definitely makes dealing with I/O more pleasant.

The task that I set myself was to figure out how to read in a file. This is not covered in the tutorial, and I thought it would be a simple matter of looking through the library reference for the io library. One of the first predicates looks promising:

:- pred io.read_file(io.maybe_partial_res(list(char))::out,
    io::di,
    io::uo) is det.

But on second thought, something seems to be missing. The second and third parameters are the world states (the type is io; the mode di stands for destructive input, meaning the variable cannot be used again, and uo means unique output, meaning that no other variable in the program can have that value), and the first one is going to be the contents of the file itself. But where is the file name?

The comment provides the necessary pointer:

% Reads all the characters from the current input stream until
% eof or error.

Hmm. So all of these functions operate on whatever the current input stream is. How do we set that? io.set_input_stream looks pretty good:

% io.set_input_stream(NewStream, OldStream, !IO):
% Changes the current input stream to the stream specified.
% Returns the previous stream.
%
:- pred io.set_input_stream(io.input_stream::in,
    io.input_stream::out,
    io::di, io::uo) is det.

But even better is io.see, which will try to open a file and, if successful, set it as the current input stream (the alternative is to use io.open_input and then io.set_input_stream):

% io.see(File, Result, !IO).
% Attempts to open a file for input, and if successful,
% sets the current input stream to the newly opened stream.
% Result is either 'ok' or 'error(ErrorCode)'.
%
:- pred io.see(string::in, io.res::out, io::di, io::uo) is det.

With that in mind, let's go ahead and implement a predicate to read files (much like the one I was expecting to find in the standard library), and put it into a module of similar utilities I've started, titled, in tribute to Haskell, prelude:

:- pred prelude.read_file(string::in,
    maybe(string)::out,
    io::di, io::uo) is det.
prelude.read_file(Path, Contents, !IO) :-
    io.see(Path, Result, !IO),
    (
        Result = ok,
        io.read_file_as_string(File, !IO),
        io.seen(!IO),
        (
            File = ok(String),
            Contents = yes(String)
        ;
            File = error(_, _),
            Contents = no
        )
    ;
        Result = error(_),
        Contents = no
    ).

To walk through what this code is doing: the type says that this is a predicate that does I/O (that's what the last two arguments are for), that it takes in a string (the path) and gives out a maybe(string), and that this whole thing is deterministic (ie, it always succeeds, which is accomplished by wrapping the failure into the return type: either yes(Value) or no).

The first line tries to open the file at the path and bind it as the current input stream. I then pattern match on the result of that: if it failed, just bind Contents (the return value) to no. Otherwise, we try to read the contents out of the file, then close it and set the input stream back to the default one (that is what the predicate io.seen does). Similarly, we handle (well, really don't handle, at least not well) the case of reading the file failing. If it succeeds, we set the return value to the contents of the file.

What is interesting about this code is that while it is written in the form of logical statements, it feels very much like the way one does I/O in Haskell. Probably a bit of that is my own bias (as a Haskell programmer, I am likely to write everything like I would write Haskell code, kind of how my Python code always ends up with lambdas and maps in it), but it is probably also a function of the fact that doing I/O in a statically typed pure language is always going to be pretty similar: lots of dealing with error conditions, and not much else!

Anyhow, this was just a tiny bit of code, but it is a predicate that is immediately useful, especially when trying to use Mercury for random scripting tasks (which is what I often do with new languages, regardless of their reputed suitability for scripting).

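As a quick sketch of how this might be used (the file name and the fallback message here are hypothetical, assuming the usual main/2 entry point):

```mercury
main(!IO) :-
    prelude.read_file("input.txt", Contents, !IO),
    (
        Contents = yes(String),
        io.write_string(String, !IO)
    ;
        Contents = no,
        io.write_string("could not read input.txt\n", !IO)
    ).
```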

iOS is anti-UNIX and anti-programmer.

When I was first learning about UNIX, and learning to use Linux, the most immediately powerful tool that I found was the shell's pipe operator, '|'. Using the command line (because at that point Linux GUIs were not so well developed, and the few distros that tried to allow strictly graphical operation usually failed miserably) was at times difficult, and at times rewarding, but it was the pipe that opened up a whole world for me.

I can remember looking through an online student directory in high school that had names, email addresses, etc. For student government elections it had become popular (if incredibly time-consuming) to copy and paste the hundreds of email addresses to send a message to every student. For me, with my newfound skills, it amounted to something like:

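The exact one-liner is lost to time, but it would have been along these lines (the file name directory.html and the regular expression are illustrative, standing in for a saved copy of the directory page):

```shell
# a stand-in for a saved copy of the student directory page
printf '%s\n' \
  '<li><a href="mailto:ann@school.edu">Ann</a></li>' \
  '<li><a href="mailto:bob@school.edu">Bob</a></li>' \
  '<li><a href="mailto:ann@school.edu">Ann (again)</a></li>' > directory.html

# pull out every email address, drop duplicates, and join them
# with commas, ready to paste into a mail client's To: field
grep -E -o '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+' directory.html \
  | sort -u \
  | paste -s -d, -
# prints: ann@school.edu,bob@school.edu
```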
It seemed like magic at the time, and in some ways it still does. What the shell (and UNIX in general) offered was composability: it gave you simple (but powerful) tools, and a standard way of linking them together through text streams. By combining those, it offered immeasurable power, much more than any single tool. The mathematics of combinations guarantees this.

The more I use graphical interfaces (or anything that does not operate on text streams, command-line curses programs included), the more I am struck by how profound the loss of composability is. Each program has to try to implement all the standard things (searching, sorting, transforming) that you might want to do with the information it has, and in that repetition lie inconsistencies and usually a plain lack of power. The better ones share common libraries, and gain common functionality, but this only amounts to their least common denominator: two separate programs cannot (easily) expose their higher functionality to each other (at least not in compiled languages) in the way that command-line stream-processing programs can.

What I realized the other day is that iOS is the extreme example of that lack of flexibility, taken almost to the point of caricature: the only interaction that is possible is through single applications that, for the most part, can have no connection to other applications. People rejoiced when copy and paste was added, but that celebration hides a sad loss of the true power that computers have. The existence of files, the only real way that composability is achieved in GUI systems (ie, do one thing, save the file, open it with another program, etc), has been essentially eliminated, and applications must therefore do everything that a user might want to do with whatever data they have or will get from the user.

I'd noticed before how frustrating it was for me to use iOS, but I wasn't sure until recently exactly why that was, until I realized that it had effectively taken away the one thing that is so fundamental about computers, and the reason why I am a programmer: the ability to compose. Every day I live and breathe abstraction, and building things out of different levels of it, and the idea of not being able to combine various parts to make new things is so antithetical to that type of thinking that I almost can't imagine that iOS was created by programmers. I remember looking at the technical specifications of the most recent iPhone and thinking: that is a full computer, and it's small enough to fit in a pocket; that is a profound change in the way the world works. But it's not a computer, it's just a glorified palm pilot with a few bells and whistles.

Math/Science integrated with Scheme

I had an idea today of an interactive homework assignment for a Chemistry class: a prompt where you could type in queries and it would give responses. The basics would be:

# (questions)
=> To Do: (2,3,4,5,6,7)
Complete: (1)
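The kind of numeric calculation such a prompt would handle (an illustrative example of my own, not a line from the original session) looks like:

```scheme
# (+ (* 2 3) 4)
=> 10
```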

Now if you don't know Scheme syntax, the line with the numeric calculation might be a little confusing, but once you realize that it is just pure prefix notation (the operator always comes first, and every set of parentheses wraps an operation) it should start making sense. I'm pretty sure I could explain Scheme to anyone taking high-school science in an hour, but the three-sentence explanation is: Every expression is wrapped in parentheses. The first word inside the parentheses is a function; the rest (there don't have to be any) are arguments to the function, which can be other expressions or basic items like numbers or strings. Arithmetic follows this same pattern, which may seem a little unnatural at first, but this consistency means that you now know almost all there is to know about Scheme.

But what I've described here isn't actually much better than the web question-and-answer system that I saw today, which gave me this idea. It's basically just an interactive text-based version of the same thing. What I started thinking of is having the capability to add things like this:

# (assignment-equations)

Which would provide both references and ways to do some of the more boring rote work quickly. Descriptions of the equations could also exist, making it even more of an interactive learning project. But what would be even better would be to allow students to define new functions (or redefine old ones) on the fly. Let’s say there are a bunch of different calculations that require the same involved steps. I saw today I student working through two laborious calculations, which differed only in that the value for the activation energy. What would be amazing is if a student could do something like:
-
# (define (my-arrh-eq act-energy)
+
+ Which would provide both references and ways to do some of the more boring rote work quickly. Descriptions of the equations could also exist, making it even more of an interactive learning project. But what would be even better would be to allow students to define new functions (or redefine old ones) on the fly. Let’s say there are a bunch of different calculations that require the same involved steps. Today I saw a student working through two laborious calculations, which differed only in the value of the activation energy. What would be amazing is if a student could do something like:
+
+
# (define (my-arrh-eq act-energy)
(arrhenius-k (arrhenius-a 2.75e-2 act-energy 293) act-energy 333))
=> Defined new function my-arrh-eq!
# (my-arrh-eq 14500)
=> 1.01
-
I don’t remember if that was the answer or even the value for the activation energy (it probably isn’t), but that was the general solution. Now the problem was that a rate coefficient (2.75e-2) was given for 20 degrees Celsius, and the problem asked what the rate coefficient was for 60 degrees Celsius (same reaction). The problem was posed with two different activation energies, and identical and reasonably involved calculations resulted: using the given 20-degree setup to solve for the frequency factor, and then plugging that into the same equation, this time using 60 degrees.
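For concreteness, the two-step solution can be sketched numerically. This assumes the standard Arrhenius form k = A·exp(-Ea/(R·T)) with Ea in J/mol; since the actual numbers and units from the problem aren’t recorded, the output is illustrative rather than the original answer:

```python
from math import exp

R = 8.314  # gas constant, J/(mol K); assumes act_energy is given in J/mol

def arrhenius_k(a, act_energy, temp):
    """Rate coefficient k = A * exp(-Ea / (R * T))."""
    return a * exp(-act_energy / (R * temp))

def arrhenius_a(k, act_energy, temp):
    """The Arrhenius equation solved for the frequency factor A."""
    return k * exp(act_energy / (R * temp))

def my_arrh_eq(act_energy):
    # Step 1: use the 20 C (293 K) rate coefficient to recover A.
    a = arrhenius_a(2.75e-2, act_energy, 293)
    # Step 2: plug A back in at 60 C (333 K) for the new rate coefficient.
    return arrhenius_k(a, act_energy, 333)
```

With `act_energy = 14500` J/mol this gives about 0.056, not the half-remembered 1.01; different units or values would change the number, but the solve-then-substitute structure is the point.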
-
But what was interesting about this problem was the technique of solving one equation and using a part of that in the other - not actually doing out the arithmetic. It would be amazing if a student could build things like the function above, which clearly demonstrate an understanding of the technique, but also reveal a capacity to organize their thoughts and string together the pieces into higher level abstractions - a critical part of the type of thinking that underlies computer programming, and something that is going to become more and more important as time goes on.
-
I think there is amazing potential to systems like this - where programming is built into the fabric of math and science work, because it will both teach students to program (which is a very helpful thing), but it will also focus their attention and mental efforts on understanding how to string together concepts and actually solve problems, not just how to do calculations. I think it could also have a motivating effect because when you start writing programs like this, you feel like you are somehow getting out of doing boring work (which you are), and that you must be cheating somehow (and that feels good!). Little do you know that you are actually learning the material better than the person who did the calculations out by hand, because you focused on what was really important and had to figure out the general solution.
-
Now some of this is already happening - probably mostly using TI-BASIC on graphing calculators - but that system is reasonably unnatural (and no one is teaching students how to use it) and so removed from basic work that I don’t think it is very widespread. I think a system that students would interact with, that would allow them to build functions and use existing ones in the course of doing work, would be a really amazing thing, both for their understanding of the subject itself and also for learning computer programming (or, more generally, “algorithmic thinking”).
-
-
+
+ I don’t remember if that was the answer or even the value for the activation energy (it probably isn’t), but that was the general solution. Now the problem was that a rate coefficient (2.75e-2) was given for 20 degrees Celsius, and the problem asked what the rate coefficient was for 60 degrees Celsius (same reaction). The problem was posed with two different activation energies, and identical and reasonably involved calculations resulted: using the given 20-degree setup to solve for the frequency factor, and then plugging that into the same equation, this time using 60 degrees.
+
+
+ But what was interesting about this problem was the technique of solving one equation and using a part of that in the other - not actually doing out the arithmetic. It would be amazing if a student could build things like the function above, which clearly demonstrate an understanding of the technique, but also reveal a capacity to organize their thoughts and string together the pieces into higher level abstractions - a critical part of the type of thinking that underlies computer programming, and something that is going to become more and more important as time goes on.
+
+
+ I think there is amazing potential to systems like this - where programming is built into the fabric of math and science work, because it will both teach students to program (which is a very helpful thing), but it will also focus their attention and mental efforts on understanding how to string together concepts and actually solve problems, not just how to do calculations. I think it could also have a motivating effect because when you start writing programs like this, you feel like you are somehow getting out of doing boring work (which you are), and that you must be cheating somehow (and that feels good!). Little do you know that you are actually learning the material better than the person who did the calculations out by hand, because you focused on what was really important and had to figure out the general solution.
+
+
+ Now some of this is already happening - probably mostly using TI-BASIC on graphing calculators - but that system is reasonably unnatural (and no one is teaching students how to use it) and so removed from basic work that I don’t think it is very widespread. I think a system that students would interact with, that would allow them to build functions and use existing ones in the course of doing work, would be a really amazing thing, both for their understanding of the subject itself and also for learning computer programming (or, more generally, “algorithmic thinking”).
+
+ I had an idea today for an interactive homework assignment for a Chemistry class. It was a prompt: you could type in queries and it would give responses. The basics would be:
+
+
# (questions)
+=> To Do: (1,2,3,4,5,6,7)
+ Complete: ()
+# (question-1)
+=> 1. How many grams of Na are needed to make 28 grams of NaCl?
+# (periodic-table 'Na)
+=> Sodium - Atomic Number 11 - Weight 22.98976928
+# (periodic-table 'Cl)
+=> Chlorine - Atomic Number 17 - Weight 35.453
+# (* 22.98976928 (/ 28 (+ 22.98976928 35.453)))
+=> 11.01
+# (answer-1 11.01)
+=> Correct! Great job. 1/7 Questions completed.
+# (questions)
+=> To Do: (2,3,4,5,6,7)
+ Complete: (1)
+
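The bookkeeping behind such a prompt - questions, completion tracking, answer checking with a tolerance - could be sketched as follows. This is an illustrative Python stand-in for the Scheme session, and every name in it is invented:

```python
# Hypothetical bookkeeping for the interactive homework prompt.
# One question is enough to show the shape; answers are checked numerically.

QUESTIONS = {
    1: ("How many grams of Na are needed to make 28 grams of NaCl?", 11.01),
}
completed = set()

def questions():
    """Mirror of the (questions) query: what is to do and what is done."""
    return {"todo": sorted(q for q in QUESTIONS if q not in completed),
            "complete": sorted(completed)}

def question(n):
    """Mirror of (question-1) etc.: show the prompt text."""
    return QUESTIONS[n][0]

def answer(n, value, tolerance=0.01):
    """Mirror of (answer-1 ...): accept a numeric answer within tolerance."""
    expected = QUESTIONS[n][1]
    if abs(value - expected) <= tolerance:
        completed.add(n)
        return f"Correct! Great job. {len(completed)}/{len(QUESTIONS)} questions completed."
    return "Not quite - check your calculation and try again."
```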
+ Now if you don’t know Scheme syntax, the line with the numeric calculation might be a little confusing, but once you realize that it is just pure prefix notation (the operator always comes first, and every set of parentheses wraps an operation) it should start making sense. I’m pretty sure I could explain Scheme to anyone who is taking high-school science in an hour, but the three-sentence explanation is: Every expression is wrapped inside parentheses. The first word inside the parentheses is a function; the rest (there don’t have to be any) are arguments to the function, which can be other expressions or basic items like numbers or strings. Arithmetic follows this pattern, which may seem a little unnatural at first, but this consistency means that you now know almost all there is to know about Scheme.
+
+
+ But what I’ve described here isn’t actually much better than the web question-and-answer system I saw today, which gave me this idea. It’s basically just an interactive text-based version of the same thing. What I started thinking of is having the capability to add things like this:
+
+ Which would provide both references and ways to do some of the more boring rote work quickly. Descriptions of the equations could also exist, making it even more of an interactive learning project. But what would be even better would be to allow students to define new functions (or redefine old ones) on the fly. Let’s say there are a bunch of different calculations that require the same involved steps. Today I saw a student working through two laborious calculations, which differed only in the value of the activation energy. What would be amazing is if a student could do something like:
+
+
# (define (my-arrh-eq act-energy)
+ (arrhenius-k (arrhenius-a 2.75e-2 act-energy 293) act-energy 333))
+=> Defined new function my-arrh-eq!
+# (my-arrh-eq 14500)
+=> 1.01
+
+ I don’t remember if that was the answer or even the value for the activation energy (it probably isn’t), but that was the general solution. Now the problem was that a rate coefficient (2.75e-2) was given for 20 degrees Celsius, and the problem asked what the rate coefficient was for 60 degrees Celsius (same reaction). The problem was posed with two different activation energies, and identical and reasonably involved calculations resulted: using the given 20-degree setup to solve for the frequency factor, and then plugging that into the same equation, this time using 60 degrees.
+
+
+ But what was interesting about this problem was the technique of solving one equation and using a part of that in the other - not actually doing out the arithmetic. It would be amazing if a student could build things like the function above, which clearly demonstrate an understanding of the technique, but also reveal a capacity to organize their thoughts and string together the pieces into higher level abstractions - a critical part of the type of thinking that underlies computer programming, and something that is going to become more and more important as time goes on.
+
+
+ I think there is amazing potential to systems like this - where programming is built into the fabric of math and science work, because it will both teach students to program (which is a very helpful thing), but it will also focus their attention and mental efforts on understanding how to string together concepts and actually solve problems, not just how to do calculations. I think it could also have a motivating effect because when you start writing programs like this, you feel like you are somehow getting out of doing boring work (which you are), and that you must be cheating somehow (and that feels good!). Little do you know that you are actually learning the material better than the person who did the calculations out by hand, because you focused on what was really important and had to figure out the general solution.
+
+
+ Now some of this is already happening - probably mostly using TI-BASIC on graphing calculators - but that system is reasonably unnatural (and no one is teaching students how to use it) and so removed from basic work that I don’t think it is very widespread. I think a system that students would interact with, that would allow them to build functions and use existing ones in the course of doing work, would be a really amazing thing, both for their understanding of the subject itself and also for learning computer programming (or, more generally, “algorithmic thinking”).
+
+
+
+
diff --git a/_site/essays/2012-04-26-haskell-snap-productive.html b/_site/essays/2012-04-26-haskell-snap-productive.html
index 428b544..fc20e70 100644
--- a/_site/essays/2012-04-26-haskell-snap-productive.html
+++ b/_site/essays/2012-04-26-haskell-snap-productive.html
@@ -1,15 +1,17 @@
-
-
-
-
- dbp.io :: Haskell / Snap ecosystem is as productive as Ruby/Rails.
-
-
-
-
-
- Daniel Patterson
+
+
+
+
+
+ dbp.io :: Haskell / Snap ecosystem is as productive as Ruby/Rails.
+
+
+
+
+
+
Haskell / Snap ecosystem is as productive as Ruby/Rails.
-
-
by Daniel Patterson on April 26, 2012
-
-
This may be controversial, and all of the usual disclaimers apply - this is based on my own experience using both of the languages/frameworks to do real work on real projects. Your mileage may vary. Because this is something that has the potential to spiral into vague comparisons, I am going to try to compare points directly, based on things that I’ve experienced. I am not going to say “I like Haskell better” or anything like that, because the point of this is not so much to convince people about the various merits of the languages involved, just to point out that I’ve found that they both are as productive (or that Snap feels more so). For Haskell programmers, this could be an indication to try out the web tools that you have available, especially if you are usually a Rails developer.
-
As a note - some of this could also apply to other haskell web frameworks (in particular, most of this pertains to happstack, and some pertains to yesod), but since Snap is what I use, I want to keep it based on my own personal experience.
-
1. The number one productivity improvement is a smart, strong type system. This is less of an issue for small projects, but as soon as you have at least a few thousand lines of code, adding new features or refactoring inevitably involves changes to multiple parts of the codebase. Having a compiler that will tell you all the places that you need to change things is an amazing productivity booster. This can be approximated in some ways with good test coverage, but it is really a different beast - tests often need to be changed as well, and if you aren’t very careful about this it is easy to change them in ways that don’t catch new bugs. Additionally, it is hard (or very tedious, if you do it wrong) to achieve high enough coverage to actually catch all of the bugs introduced in refactoring. Compare this to a compiler that is completely automated and will always be aware of all of the code you have and the ways that it interacts (at least to the extent that you actually use the type system - but if you are a good Haskell programmer, you will).
-
This alone wouldn’t be enough to suggest using Haskell/Snap over Ruby/Rails, as a type system isn’t worth much without supporting libraries, but as I switch between the ecosystems, this is the place where I notice the most drastic improvements in productivity, so I put it first.
-
2. Form libraries. There are many different libraries for dealing with forms in Rails, and there is the built-in one as well. The general idea is that you define some validations on your models, and then use the DSLs from the form libraries to define forms, do validations, etc. In Haskell (in my opinion), the best form library is Digestive-Functors (thanks Jasper!), and the productivity difference is staggering in more complex use-cases. In the sort of vanilla examples that Rails has, the validation system works quite well, and dynamic introspection allows you to write really short forms. This begins to break down when you start getting forms that don’t correspond in a simple way to models. I have forms that are sometimes a mix of two models, or forms that are a partial view into a data structure, or any number of other variations.
-
With Digestive-Functors, I can define the forms that I need and re-use components between multiple forms (forms are composable), and these validations are on the form, not on the underlying model. It is obviously useful to have database-level data integrity checks, but I find that having them be the main / only way of doing validations is really limiting - because sometimes there are special cases when you want the validation done one way and other times another.
-
More generally, it is possible that the business logic of a specific form may have requirements that do not always have to hold for the datastore, and thus should not reside in the integrity checks. Having written a lot of forms (who hasn’t?), I find that getting the first form out is much faster with Rails, but inevitably when I need to change something it starts to become difficult fast. Every time I am doing it I keep picturing an exponential curve - sure, it starts out really small, but it gets really big really fast! It isn’t that I run into things that are not possible with Rails, but they end up being more difficult, more error prone, and generally reduce my productivity. With Digestive-Functors, I spend a little more time building the forms in the beginning, but I’ve never had requirements for a form that weren’t easily implemented (almost without thinking).
-
3. Routing is the next big one. This may be more of an opinion than the previous ones, but I have always thought that great care should be involved in designing the URL structure of a site. In this sense, I guess I disagree with the idea of universally using REST - I think it is very useful when writing APIs, but when designing applications for people, I believe the URLs should be meaningful to the people, not to machines. Usually, right after modeling the data of an application, I make a site-map - a high-level view of what the site should look like. Instead, with Rails, I spend time thinking of how I can adapt what I want to the REST paradigm, and usually end up with something that is an incomplete/counterintuitive representation.
-
More broadly, I think the idea of hierarchical routing - the idea that you match routes by pieces - is brilliant. What this allows you to do is easily abstract out work that should be done for many different related requests. In Rails, this is approximated by :before_filters (i.e., in a controller for a specific model, you might fetch the item from the id for many different handlers), but it is a poor substitute. For example, I often have an “/admin” hierarchy, and to restrict it, all I have to do is have one place (the adminRouter, or something like it) that does the required work to ensure only administrators can access it; it can also fetch any data that is needed, and then it can pass back into the route-parsing mode. Or if I want to do the Rails-style pre-fetching, then I design the routes as “/item/id/action” and have a handler that matches “/item/id”, fetches the item, and then matches against the various actions. If I have nested pieces of data, this is just as easy. I could have “item/id/something/add”, which adds a new “something” to the item with id “id”. This would all be in the same hierarchy, so the code to fetch the item would still only exist once.
-
Not only is this very natural to program, it keeps the flow easy to follow when you are looking back at it, and allows backtracking in a great way: if, in a handler, you reach something that indicates that this cannot be matched, like if the path was “/item/id” but the id did not correspond to an actual item, you can simply “pass” and the route parser continues looking for things that will handle the request. If it finds nothing, it gives a 404.
-
An example of how you could exploit things in a really clean way - if you are building a wiki-like site, then you first have a route that matches “/page/name” and looks up the page with name “name”. If it doesn’t find it, it passes, and the next handler can be the “new page” handler, that prompts the user to create the page. As with everything else, I’m not saying this cannot be done with Rails, simply that it is much more natural and easy to understand with Snap (and Happstack, where this routing system originated, at least in the Haskell world).
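The match-by-pieces-and-pass idea is language-agnostic, so it can be sketched as a concept demonstration in Python (this is not Snap’s API; all names here are invented). A handler that declines lets routing backtrack to the next candidate, ending in the wiki-style “new page” fallback:

```python
# Concept sketch of hierarchical routing with backtracking.
# A handler raises Pass to decline; the router then tries the next one.

class Pass(Exception):
    """Signals 'this handler cannot match the request'."""

PAGES = {"home": "welcome!"}

def existing_page(name):
    if name not in PAGES:
        raise Pass()  # no such page: backtrack
    return f"page {name}: {PAGES[name]}"

def new_page(name):
    return f"create page {name}?"  # the wiki 'new page' prompt

def route(piece, handlers):
    for handler in handlers:
        try:
            return handler(piece)
        except Pass:
            continue  # keep looking for a handler that matches
    return "404"

def serve(path):
    # The "/page" prefix is matched once; the rest is handed down the hierarchy.
    prefix, _, name = path.lstrip("/").partition("/")
    if prefix != "page":
        return "404"
    return route(name, [existing_page, new_page])
```

Here `serve("/page/home")` finds the page, while `serve("/page/unknown")` falls through to the create-page prompt.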
-
4. Quality of external libraries. Point 2 was a special case of this, since dealing with forms comes up so much, but I think the general quality of libraries in Haskell is superb. One example that I came up against was wanting to parse some semi-free-form CSV data into dates and times. Haskell has the very mature parsing library Parsec (which has ports into many languages, including Ruby) that makes it really easy to write parsers. I ported an ad-hoc parser to it, and found that not only was I able to write the code in a fraction of the time, but it was a lot more robust and easy to understand.
-
For testing of algorithmic code, the QuickCheck library is pretty amazing - in it, you tell it how to construct domain data and state certain invariants that should hold over function applications, and it will fuzz-test with random/pathological data. The first time you write some of these tests (and catch bugs!) you will wonder why you haven’t been testing like that before! I don’t really want to go into it here, but the other point is that many of these libraries are very, very fast - there has been, over the last couple of years, a massive push for very performant libraries, with a lot of success. The Haskell web frameworks’ webservers regularly trounce most other webservers, and there are very performant JSON, text-processing, and parsing libraries (attoparsec is a version of Parsec that is very fast).
-
5. Templating. In this, I want to directly compare the experience of using Heist (a templating system made by the Snap team) and Erb/Haml (I mostly use the latter, but in some things, like with javascript, I have to use the former). The first big difference is the idea of layouts/templates/partials in rails. I never really understood why there was this distinction when I first used it, and when comparing it to Heist (which has no distinction - any template can be applied to another, to achieve a layout like functionality, and any template can be included within another, to achieve a partial like functionality) it feels very limited.
-
The other major difference is that the two templating languages in Ruby allow dynamic elements by embedding raw Ruby code, whereas Heist allows dynamic content by letting you define new XML tags (called splices) that you can then use in the templates. I have found this to be an extremely powerful idea, as it allows you not only to do all the regular stuff (insert values, iterate over lists of values and spit out html), but even to build custom vocabularies of elements designed to go with javascript (so for example, I built an asynchronous framework on top of this, where I had a “<form-async>” tag and “<div-async>”s that would be replaced asynchronously by the responses from the form posts).
-
It also adapts to being used with (trusted) user generated input - I’ve used it in multiple CMS systems so that, for example, all links to external sites are set to open in new tabs/windows (by overriding the “<a>” tag and adding the appropriate “target”) or allowing the users to gain certain dynamic stuff for their pages. Compared to this, the situation with Haml always seems hopelessly tied up with ruby spaghetti code - not that it always is (you can always be careful), but the split with Heist both feels like a cleaner separation AND more powerful, which is not something you get often, and I think is a sign that the metaphor that Heist created (which is based on a couple really simple primitives) is really something special.
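The splice idea - tags bound to functions - can be approximated outside Heist. Here is a rough Python sketch (Heist works on a parsed XML tree, not regexes, and handles attributes; this toy version deliberately does neither):

```python
import re

# Toy approximation of Heist-style splices: a dict from tag name to a
# function of the tag's (already-expanded) body. Attribute handling and
# self-closing tags are omitted for brevity.

TAG = re.compile(r"<(\w[\w-]*)>(.*?)</\1>", re.S)

def apply_splices(template, splices):
    def expand(match):
        tag, body = match.group(1), match.group(2)
        inner = apply_splices(body, splices)  # expand children first
        if tag in splices:
            return splices[tag](inner)        # a splice replaces its element
        return f"<{tag}>{inner}</{tag}>"      # unknown tags pass through
    return TAG.sub(expand, template)
```

For example, `apply_splices("<p>Hello <user-name></user-name>!</p>", {"user-name": lambda body: "Daniel"})` yields `"<p>Hello Daniel!</p>"`.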
-
6. This is sort of an extension of the first point, and I’m putting it towards the end because it is the most subjective part of this already quite subjective comparison - I think that web applications built with Haskell/Snap are much easier to edit / add to than corresponding applications in Ruby/Rails. One of the biggest reasons for this is that there is much more boilerplate/code spread in Ruby - some of it is auto-generated, other bits are manually generated, but there ends up being code scattered around. It is pretty easy to add new code, but when you want to edit / refactor existing code, it starts to get hard to figure out where everything is. A bit of this relies on conventions to a degree (which you learn), but there is simply less code in Snap, and usually everything pertaining to a specific function is in one place. This has a lot to do with the functional paradigm - there is no hidden state, so generally all the transformations that occur are very transparent, whereas with Rails it is possible for stuff from the ApplicationController to be applied, or various filters to come into play, or stuff from the model, etc. There is no obvious “starting point” if you want to see how a request travels through your application (candidates include the routes file, the controllers, etc.), whereas with a Snap application, the code to start the web server is in one of the files you write! You can trace exactly what it is doing from there!
-
In addition, there is also very little “convention” with Snap. It enforces nothing, which has the consequence (in addition to allowing you to make a mess!) of having the whole application conform to exactly how you think it should be organized. I’ve found that this actually makes it much easier to add new things or modify existing functionality (fix bugs!), because the entire structure of the application, from how the requests are routed to how responses are generated, is based on code I wrote. This means that making a change anywhere in this process is usually very easy - it feels in some ways like the difference between making a change to an application you wrote from scratch and one that you picked up from someone else. There is also a potential downside to this - the first couple of applications I built had drastically different organizational systems.
-
(Side note for anyone reading this who is curious: I’ve converged on the following method: all types for the application live in a Types module or hierarchy, all code that pertains to the datastore lives in a State hierarchy (or module, in a small application), code for splices lives in a Splices hierarchy, forms live in a Forms hierarchy, and the web handlers live in a Handlers hierarchy. I also usually have a Utils module that collects various things that are used in all sorts of different places. Everything depends on Types and Utils. Splices, Forms, and State are all independent of one another, and Handlers depends on everything. And then of course there is an Application module and Main, according to the generated code from Snap.)
-
This is a major difference in how Snap even differs from some other Haskell web frameworks, that it seems more like a library with which to build a web application instead of a true framework, but in my experience this is actually a really powerful thing, and makes the whole process a lot more enjoyable, because I never feel like I’m trying to conform to how someone else thinks I should organize things.
-
7. I’m bundling the performance, security, etc all at once. Rails is a very stable framework, so lots of work has gone into this. But I think the recent vulnerabilities exposed on a lot of major sites (like GitHub) based on the common paradigm of mass-assignment sort of point out the negative side. Snap is much newer, but it was built with security in mind from the beginning, as far as I can tell, and most libraries that I have used have also mentioned ways that it comes up - the entire development community seems a lot more aware / concerned with it.
-
I think part of this probably has to do with the host languages - Ruby is a very dynamic language that has a history of experimentation (so generally, flexibility is preferred over correctness), whereas Haskell is a language where static guarantees are valued, and security is usually lumped in with correctness. For performance, there is no question that Haskell will win hands down in any performance comparison (and on multithreading). Granted, a lot of web code is disk/database bound, so this isn’t a huge deal, but it is nice to know that you aren’t needlessly wasting cycles (and can afford to run on smaller servers).
-
8. Now, as a counterpoint, I want to articulate what Rails really has over Snap. Number one, and this is huge, is the size of the community. There are a massive number of developers who know how to use Rails (how many are good at it is another question), and this also means that if you are trying to do something it is much more likely that a prebuilt solution exists. It also means that it will be easier to hire people to work on it, and easier to sell it as a platform to clients/bosses.
-
The Haskell community is surprisingly productive given its size (and some of the tools it has produced are amazing - examples mentioned in this comparison are Parsec, QuickCheck, Digestive-Functors, etc.), but there is some sense in which it will always be at a disadvantage. This means that if you are doing any sort of common task with Rails, there will probably be a Gem that does it. The unfortunate part is that sometimes the Gem will be unmaintained, partially broken, or incompatible, as the quality varies widely. This is a place where a lot of subjectivity comes in - I have found that most of what I need exists in the Haskell ecosystem, and if something doesn’t, it isn’t hard to write libraries, but this could be a big dealbreaker for some people.
-
Cheers, and happy web programming.
-
-
+
+
+
+ Haskell / Snap ecosystem is as productive as Ruby/Rails.
+
+
+ This may be controversial, and all of the usual disclaimers apply - this is based on my own experience using both of the languages/frameworks to do real work on real projects. Your mileage may vary. Because this is something that has the potential to spiral into vague comparisons, I am going to try to compare points directly, based on things that I’ve experienced. I am not going to say “I like Haskell better” or anything like that, because the point of this is not so much to convince people about the various merits of the languages involved, just to point out that I’ve found that they both are as productive (or that Snap feels more so). For Haskell programmers, this could be an indication to try out the web tools that you have available, especially if you are usually a Rails developer.
+
+
+ As a note - some of this could also apply to other haskell web frameworks (in particular, most of this pertains to happstack, and some pertains to yesod), but since Snap is what I use, I want to keep it based on my own personal experience.
+
+
+ 1. The number one productivity improvement is a smart, strong type system. This is less of an issue for small projects, but as soon as you have at least a few thousand lines of code, adding new features or refactoring inevitably involves changes to multiple parts of the codebase. Having a compiler that will tell you all the places that you need to change things is an amazing productivity booster. This can be approximated in some ways with good test coverage, but it is really a different beast - tests often need to be changed as well, and if you aren’t very careful about this it is easy to change them in ways that don’t catch new bugs. Additionally, it is hard (or very tedious, if you do it wrong) to achieve high enough coverage to actually catch all of the bugs introduced in refactoring. Compare this to a compiler that is completely automated and will always be aware of all of the code you have and the ways that it interacts (at least to the extent that you actually use the type system - but if you are a good Haskell programmer, you will).
+
+
+ This alone wouldn’t be enough to suggest using Haskell/Snap over Ruby/Rails, as a type system isn’t worth much without supporting libraries, but as I switch between the ecosystems, this is the place where I notice the most drastic improvements in productivity, so I put it first.
+
+
+ 2. Form libraries. There are many different libraries for dealing with forms in Rails, and there is the built-in one as well. The general idea is that you define some validations on your models, and then use the DSLs from the form libraries to define forms, do validations, etc. In Haskell (in my opinion), the best form library is Digestive-Functors (thanks Jasper!), and the productivity difference is staggering in more complex use-cases. In the sort of vanilla examples that Rails has, the validation system works quite well, and dynamic introspection allows you to write really short forms. This begins to break down when you start getting forms that don’t correspond in a simple way to models. I have forms that are sometimes a mix of two models, or forms that are a partial view into a data structure, or any number of other variations.
+
+
+ With Digestive-Functors, I can define the forms that I need, and re-use components between multiple forms (forms are composable), and these validations are on the form, not on the underlying model. It is obviously useful to database level data integrity checks, but I find that having them being the main / only way of doing validations is really limiting - because sometimes there are special cases when you want the validation done one way and other times another.
+
+
+ More generally, it is possible that the business logic of a specific form may have requirements that do not always have to hold for the datastore, and thus should not reside in the integrity checks. Having written a lot of forms (who hasn’t?), I find that getting the first form out is much faster with Rails, but inevitably when I need to change something it starts become difficult fast. Every time I am doing it I keep picturing an exponential curve - sure it starts out really small, but it gets really big really fast! It isn’t that I run into things that are not possible with Rails, but they end up being more difficult, more error prone, and generally reduce my productivity. With Digestive-Functors, I spend a little more time building the forms in the beginning, but I’ve never had requirements for a form that weren’t easily implemented (almost without thinking).
+
+
+ 3. Routing is the next big one. This may be more of an opinion that the previous ones, but I have always thought that great care should be involved in designing the url structure of a site. In this sense, I guess I disagree with the idea of universally using REST - I think it is very useful when writing APIs, but when designing applications for people, I believe the urls should be meaningful to the people, not to machines. Usually, right after modeling the data of an application, I make a site-map - this is a high level view of what the site should look like. Instead, with Rails, I spend time thinking of how I can adapt what I want to the REST paradigm, and usually end up with something that is an incomplete/counterintuitive representation.
+
+
+ More broadly, I think the idea of hierarchical routing is brilliant - the idea that you match routes by pieces. What this allows you to do is easily abstract out work that should be done for many different related requests. In Rails, this is approximated by :before_filters (ie, it a controller for a specific model, you might fetch the item from the id for many different handlers), but it is a poor substitute. For example I often have an “/admin” hierarchy, and to limit this, all I have to do is have one place (the adminRouter or something) that does the required work to ensure only administrators can access, and it can also fetch any data that is needed, and then it can pass back into the route parsing mode. Or if I want to do the rails-style pre-fetching, then I design the routes as “/item/id/action” and have a handler that matches “/item/id”, fetches the item, and then matches against the various actions. If I have nested pieces of data, this is just as easy. I could have “item/id/something/add” which adds a new “something” to the item with id “id”, This would all be in the same hierarchy, so the code to fetch the item would still only exist once.
+
+
+ Not only is this very natural to program, it keeps the flow easy to follow when you are looking back at it, and allows backtracking in a great way: if, in a handler, you reach something that indicates that this cannot be matched, like if the path was “/item/id” but the id did not correspond to an actual item, you can simply “pass” and the route parser continues looking for things that will handle the request. If it finds nothing, it gives a 404.
+
+
+ An example of how you could exploit things in a really clean way - if you are building a wiki-like site, then you first have a route that matches “/page/name” and looks up the page with name “name”. If it doesn’t find it, it passes, and the next handler can be the “new page” handler, that prompts the user to create the page. As with everything else, I’m not saying this cannot be done with Rails, simply that it is much more natural and easy to understand with Snap (and Happstack, where this routing system originated, at least in the Haskell world).
+
+
+ 4. Quality of external libraries. Point 2 was a special case of this, since dealing with forms comes up so much, but I think the general quality of libraries in Haskell is superb. One example that I came up against was wanting to parse some semi-free-form CSV data into dates and times. Haskell has the very mature parsing library Parsec (which has ports into many languages, including Ruby) that makes it really easy to write parsers. I ported an ad-hoc parser to it, and found that not only was I able to write the code in a fraction of the time, but it was a lot more robust and easy to understand.
+
+
+ For testing of algorithmic code, the QuickCheck library is pretty amazing - in it, you tell it how to construct domain data, and then certain invariants that should hold over function applications, and it will fuzz-test with random/pathological data. The first time you write some of these tests (and catch bugs!) you will wonder why you haven’t been testing like that before! I don’t really want to go into it here, but the other point is that many of these libraries are very very fast - there has been, over the last couple years, a massive push to have very performant libraries, with a lot of success. The Haskell web frameworks webservers regularly trounce most other webservers, and there are very high performant json, text processing, and parsing libraries (attoparsec is a version of parsec that is very fast).
+
+
+ 5. Templating. In this, I want to directly compare the experience of using Heist (a templating system made by the Snap team) and Erb/Haml (I mostly use the latter, but in some things, like with javascript, I have to use the former). The first big difference is the idea of layouts/templates/partials in rails. I never really understood why there was this distinction when I first used it, and when comparing it to Heist (which has no distinction - any template can be applied to another, to achieve a layout like functionality, and any template can be included within another, to achieve a partial like functionality) it feels very limited.
+
+
+ The other major difference is that the two templating languages in Ruby allow dynamic elements by embedding raw ruby code, whereas the former allows dynamic stuff by allowing you to define new xml tags (called splices) that you can then use in the templates. I have found this to be an extremely powerful idea, as it allows you to not only do all the regular stuff (insert values, iterate over lists of values and spit out html), but can even allow you to build custom vocabularies of elements that you want to use that are designed to go with javascript (so for example, I built an asynchronous framework on top of this, where I had a “<form-async>” tag and “<div-async>”s that would be replaced asynchronously by the responses from the form posts).
+
+
+ It also adapts to being used with (trusted) user generated input - I’ve used it in multiple CMS systems so that, for example, all links to external sites are set to open in new tabs/windows (by overriding the “<a>” tag and adding the appropriate “target”) or allowing the users to gain certain dynamic stuff for their pages. Compared to this, the situation with Haml always seems hopelessly tied up with ruby spaghetti code - not that it always is (you can always be careful), but the split with Heist both feels like a cleaner separation AND more powerful, which is not something you get often, and I think is a sign that the metaphor that Heist created (which is based on a couple really simple primitives) is really something special.
+
+
+ 6. This is sort of an extension of the first point, and I’m putting it towards the end because it is the most subjective of this already quite subjective comparison - I think that web applications built with Haskell/Snap are much easier to edit / add to than corresponding applications in Ruby/Rails. One of the biggest reasons for this is that there is much more boilerplate/code spread in ruby - some of it is auto-generated, other bits is manually generated, but there ends up being code scattered around. It is pretty easy to add new code, but when you want to edit / refactor existing code, it starts to get hard to figure out where everything is. A bit of this relies on conventions to a degree (which you learn), but there is simply less code in Snap, and usually everything pertaining to a specific function is in one place. This has a lot to do with the functional paradigm - there is no hidden state, so generally all the transformations that occur are very transparent, whereas with Rails it is possible for stuff from the ApplicationController being applied, or just various filters coming into play, or stuff from the model, etc. There is no obvious “starting point” if you want to see how a request travels through your application (candidates include the routes file, the controllers, etc), in the same way where with a Snap application, the code to start the web server is in one of the files you write! You can trace exactly what it is doing from there!
+
+
+ In addition, there is also very little “convention” with Snap. It enforces nothing, which has the consequence (in addition to allowing you to make a mess!) of having the whole application conforming to exactly how you think it should be organized. I’ve found that this actually makes it much easier to add new things or modify existing functionality (fix bugs!), because the entire structure of the application, from how the requests are routed to how responses are generated, is based on code I wrote. This means that making a change anywhere in this process is usually very easy - it feels in some ways like the difference in making a change to an application you wrote from scratch and one that you picked up from someone else. There is also a potential downside to this - the first couple applications I built had drastically different organizational systems
+
+
+ (Side note for anyone reading this who is curious: I’ve converged to the following method: all types for the application lives in a Types module or hierarchy, all code that pertains to the datastore lives in a State hierarchy or module in a small application, code for splices lives in a Splices hierarchy, forms live is a Forms hierarchy, and the web handlers live in a Handlers hierarchy. I also usually have a Utils module that collects some various things that are used in all sort of different places. Everything depends on Types and Utils. Splices, Forms, and State are all independent of one another, and Handlers depends on everything. And then of course there is an Application module and Main, according to the generated code from Snap).
+
+
+ This is a major difference in how Snap even differs from some other Haskell web frameworks, that it seems more like a library with which to build a web application instead of a true framework, but in my experience this is actually a really powerful thing, and makes the whole process a lot more enjoyable, because I never feel like I’m trying to conform to how someone else thinks I should organize things.
+
+
+ 7. I’m bundling the performance, security, etc all at once. Rails is a very stable framework, so lots of work has gone into this. But I think the recent vulnerabilities exposed on a lot of major sites (like GitHub) based on the common paradigm of mass-assignment sort of point out the negative side. Snap is much newer, but it was built with security in mind from the beginning, as far as I can tell, and most libraries that I have used have also mentioned ways that it comes up - the entire development community seems a lot more aware / concerned with it.
+
+
+ I think part of this probably has to do with the host languages - ruby is a very dynamic language that has a history of experimentation (so generally, flexibility is preferred of correctness), whereas Haskell is a language where lots of static guarantees are valued, and security is usually lumped in with correctness. For performance, there is no question that Haskell will win hands down on any performance comparison (and on multithreading). Granted, a lot of web code is disk/database bound so this isn’t a huge deal, but it is nice to know that you aren’t needlessly wasting cycles (and can afford to run on smaller servers).
+
+
+ 8. Now, as a counterpoint, I want to articulate what Rails really has over Snap. Number one, and this is huge, is the size of the community. There are a massive number of developers who know how to use Rails (how many are good at it is another question), and this also means that if you are trying to do something it is much more likely that a prebuilt solution exists. It also means that it will be easier to hire people to work on it, and easier to sell it as a platform to clients/bosses.
+
+
+ The Haskell community is surprisingly productive given its size (and some of the tools it has produced are amazing - examples mentioned in this comparison are Parsec, QuickCheck, Digestive-Functors, etc), but there is some sense where they will always be at a disadvantage. This means that if you are doing any sort of common task with Rails, there will probably be a Gem that does it. The unfortunate part is that sometimes the Gem will be unmaintained, partially broken, incompatible, as the quality varies widely. This is a place where a lot of subjectivity comes in - I have found that most of what I need exists in the haskell ecosystem, and if stuff doesn’t it isn’t hard to write libraries, but this could be a big dealbreaker for some people.
+
This may be controversial, and all of the usual disclaimers apply - this is based on my own experience using both languages/frameworks to do real work on real projects. Your mileage may vary. Because a comparison like this can easily spiral into vague generalities, I am going to compare specific points directly, based on things I have experienced. I am not going to say “I like Haskell better” or anything like that, because the point of this is not so much to convince people of the various merits of the languages involved, just to point out that I’ve found both to be productive (and Snap, if anything, more so). For Haskell programmers, this could be an indication to try out the web tools you have available, even if you are usually a Rails developer.


As a note - some of this could also apply to other Haskell web frameworks (in particular, most of it pertains to Happstack, and some to Yesod), but since Snap is what I use, I want to keep this based on my own personal experience.


1. The number one productivity improvement is a smart, strong type system. This is less of an issue for small projects, but as soon as you have at least a few thousand lines of code, adding new features or refactoring inevitably involves changes to multiple parts of the codebase. Having a compiler that will tell you all the places you need to change things is an amazing productivity booster. This can be approximated in some ways with good test coverage, but it is really a different beast - tests often need to be changed as well, and if you aren’t very careful about this it is easy to change them in ways that don’t catch new bugs. Additionally, it is hard (or very tedious, if you do it wrong) to achieve high enough coverage to actually catch all of the bugs introduced in refactoring. Compare this to a compiler, which is completely automated and always aware of all of the code you have and the ways it interacts (at least to the extent that you actually use the type system - but if you are a good Haskell programmer, you will).


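To make the refactoring point concrete, here is a minimal sketch (the `Status` type is hypothetical, not from any real codebase): with warnings enabled, adding a new constructor makes GHC flag every pattern match that doesn’t handle it, across the whole program.

```haskell
{-# OPTIONS_GHC -Wall #-}

-- Hypothetical domain type. If you later add a constructor (say,
-- Archived), GHC warns at every incomplete pattern match on Status,
-- pointing you at each place the codebase needs updating.
data Status = Draft | Published

describe :: Status -> String
describe Draft     = "a draft"
describe Published = "published"

main :: IO ()
main = putStrLn (describe Published)
```

This is the “compiler tells you all the places to change” workflow: make the type change first, then fix each warning it produces.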
This alone wouldn’t be enough to suggest using Haskell/Snap over Ruby/Rails, as a type system isn’t worth much without supporting libraries, but as I switch between the ecosystems, this is the place where I notice the most drastic improvements in productivity, so I put it first.


2. Form libraries. There are many different libraries for dealing with forms in Rails, plus the built-in one. The general idea is that you define validations on your models, and then use the DSLs from the form libraries to define forms, run validations, etc. In Haskell, the best form library (in my opinion) is Digestive-Functors (thanks Jasper!), and the productivity difference is staggering in more complex use cases. In the sort of vanilla examples that Rails has, the validation system works quite well, and dynamic introspection allows you to write really short forms. This begins to break down when you start getting forms that don’t correspond in a simple way to models. I have forms that are a mix of two models, or forms that are a partial view into a data structure, or any number of other variations.


With Digestive-Functors, I can define exactly the forms I need and re-use components between multiple forms (forms are composable), and the validations live on the form, not on the underlying model. Database-level data integrity checks are obviously useful, but I find that having them be the main or only way of doing validations is really limiting - sometimes you want a validation done one way in one context and another way elsewhere.


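The composability is easier to see in code. This is not Digestive-Functors itself - its real API lives in Text.Digestive - but a stripped-down sketch of the same idea: forms are Applicative values, so fields compose, and validation attaches to the form rather than to the model.

```haskell
-- Sketch of the Digestive-Functors idea, not the library's actual API:
-- a Form parses a bag of input strings into either errors or a value.
import Data.Char (isDigit)

type Error = String

newtype Form a = Form { runForm :: [(String, String)] -> Either [Error] a }

instance Functor Form where
  fmap f (Form g) = Form (fmap f . g)

instance Applicative Form where
  pure x = Form (const (Right x))
  Form f <*> Form x = Form $ \inp -> case (f inp, x inp) of
    (Right g, Right v) -> Right (g v)
    (Left e1, Left e2) -> Left (e1 ++ e2)  -- collect errors from every field
    (Left e,  _      ) -> Left e
    (_,       Left e ) -> Left e

-- A named field carrying its own validation/parsing.
field :: String -> (String -> Either Error a) -> Form a
field name parse = Form $ \inp -> case lookup name inp of
  Nothing -> Left [name ++ ": missing"]
  Just v  -> either (Left . (: [])) Right (parse v)

data User = User { userName :: String, userAge :: Int }
  deriving (Show, Eq)

-- Fields compose; this form could itself be embedded in a larger one.
userForm :: Form User
userForm = User
  <$> field "name" nonEmpty
  <*> field "age"  number
  where
    nonEmpty s | null s    = Left "must not be empty"
               | otherwise = Right s
    number s | not (null s) && all isDigit s = Right (read s)
             | otherwise                     = Left "must be a number"

main :: IO ()
main = do
  print (runForm userForm [("name", "Ada"), ("age", "36")])
  print (runForm userForm [("name", ""), ("age", "x")])
```

Because a form is an ordinary value, a “mix of two models” is just two sub-forms combined with `<*>`, with no model involved at all.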
More generally, the business logic of a specific form may have requirements that do not always have to hold for the datastore, and thus should not reside in the integrity checks. Having written a lot of forms (who hasn’t?), I find that getting the first form out is much faster with Rails, but inevitably, when I need to change something, it becomes difficult fast. Every time, I keep picturing an exponential curve - sure, it starts out really small, but it gets really big really fast! It isn’t that I run into things that are not possible with Rails, but they end up being more difficult, more error prone, and generally reduce my productivity. With Digestive-Functors, I spend a little more time building the forms in the beginning, but I’ve never had requirements for a form that weren’t easily implemented (almost without thinking).


3. Routing is the next big one. This may be more of an opinion than the previous ones, but I have always thought that great care should go into designing the URL structure of a site. In this sense, I guess I disagree with the idea of universally using REST - I think it is very useful when writing APIs, but when designing applications for people, I believe the URLs should be meaningful to the people, not to machines. Usually, right after modeling the data of an application, I make a site map - a high-level view of what the site should look like. Instead, with Rails, I spend time thinking about how to adapt what I want to the REST paradigm, and usually end up with something that is an incomplete or counterintuitive representation.


More broadly, I think the idea of hierarchical routing - matching routes piece by piece - is brilliant. It allows you to easily factor out work that should be done for many different related requests. In Rails, this is approximated by :before_filters (i.e., in a controller for a specific model, you might fetch the item by id for many different handlers), but that is a poor substitute. For example, I often have an “/admin” hierarchy, and to restrict it, all I have to do is have one place (an adminRouter or similar) that does the required work to ensure only administrators can access it; it can also fetch any data that is needed, and then pass back into route parsing. Or, if I want Rails-style pre-fetching, I design the routes as “/item/id/action” and have a handler that matches “/item/id”, fetches the item, and then matches against the various actions. If I have nested pieces of data, this is just as easy: I could have “/item/id/something/add”, which adds a new “something” to the item with id “id”. This would all be in the same hierarchy, so the code to fetch the item would still exist only once.


Not only is this very natural to program, it keeps the flow easy to follow when you look back at it, and it allows backtracking in a great way: if, in a handler, you reach something that indicates the route cannot be matched - say the path was “/item/id” but the id did not correspond to an actual item - you can simply “pass”, and the route parser continues looking for something that will handle the request. If it finds nothing, it gives a 404.


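A toy model of this pass-and-backtrack behavior, written as plain functions over Maybe rather than Snap’s actual Handler monad (the routes and data here are made up for illustration): a handler either produces a response or returns Nothing, i.e. “passes”, and `<|>` tries the next alternative.

```haskell
import Control.Applicative ((<|>))

-- A handler either produces a response or "passes" (Nothing).
type Path    = [String]
type Handler = Path -> Maybe String

-- Match one path component, then hand the rest to a sub-handler.
dir :: String -> Handler -> Handler
dir name h (p : ps) | p == name = h ps
dir _    _ _                    = Nothing

items :: [(String, String)]
items = [("1", "first item")]

-- Fetch-once-at-the-top: the lookup happens here, and every action
-- underneath gets the fetched item for free.
itemRoutes :: Handler
itemRoutes (itemId : rest) = do
  item <- lookup itemId items      -- unknown id? the whole handler passes
  case rest of
    ["show"] -> Just ("showing " ++ item)
    ["edit"] -> Just ("editing " ++ item)
    _        -> Nothing
itemRoutes _ = Nothing

site :: Handler
site path = dir "item" itemRoutes path
        <|> Just "404 not found"     -- nothing matched: fall through to 404

main :: IO ()
main = mapM_ (print . site) [["item", "1", "show"], ["item", "2", "show"]]
```

The “wiki” trick below is exactly this shape: the page-lookup handler passes on a missing page, and the “new page” handler is the next alternative after `<|>`.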
An example of how you could exploit this in a really clean way: if you are building a wiki-like site, you first have a route that matches “/page/name” and looks up the page with name “name”. If it doesn’t find it, it passes, and the next handler can be the “new page” handler, which prompts the user to create the page. As with everything else, I’m not saying this cannot be done with Rails, simply that it is much more natural and easier to understand with Snap (and Happstack, where this routing system originated, at least in the Haskell world).


4. Quality of external libraries. Point 2 was a special case of this, since dealing with forms comes up so often, but I think the general quality of libraries in Haskell is superb. One example I ran into was wanting to parse some semi-free-form CSV data into dates and times. Haskell has the very mature parsing library Parsec (which has ports in many languages, including Ruby) that makes it really easy to write parsers. I ported an ad-hoc parser to it, and found that not only was I able to write the code in a fraction of the time, it was also a lot more robust and easier to understand.


For testing algorithmic code, the QuickCheck library is pretty amazing: you tell it how to construct domain data and state invariants that should hold over function applications, and it will fuzz-test with random/pathological data. The first time you write some of these tests (and catch bugs!) you will wonder why you haven’t been testing like this all along. I don’t really want to go into it here, but the other point is that many of these libraries are very, very fast - there has been, over the last couple of years, a massive push for performant libraries, with a lot of success. The Haskell web frameworks’ servers regularly trounce most other web servers, and there are very high-performance JSON, text-processing, and parsing libraries (attoparsec is a version of Parsec that is very fast).


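The shape of a QuickCheck property is just a Boolean function over generated inputs; with the real library you would write `quickCheck prop_reverse` and let it generate random cases. To keep this sketch dependency-free, the property is checked here over a few hand-picked inputs instead.

```haskell
-- A QuickCheck-style property: an invariant that should hold for all
-- inputs. The real library generates random/pathological lists; here
-- we just check a handful of samples by hand to stay standalone.
prop_reverse :: [Int] -> Bool
prop_reverse xs = reverse (reverse xs) == xs

main :: IO ()
main = print (all prop_reverse [[], [1], [1, 2, 3], [3, 1, 2, 1]])
```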
5. Templating. Here I want to directly compare the experience of using Heist (a templating system made by the Snap team) with Erb/Haml (I mostly use the latter, but for some things, like JavaScript, I have to use the former). The first big difference is the idea of layouts/templates/partials in Rails. I never really understood why this distinction existed when I first used it, and comparing it to Heist - which has no such distinction: any template can be applied to another to achieve layout-like functionality, and any template can be included within another to achieve partial-like functionality - it feels very limited.


The other major difference is that the two templating languages in Ruby allow dynamic elements by embedding raw Ruby code, whereas Heist lets you define new XML tags (called splices) that you can then use in the templates. I have found this to be an extremely powerful idea: it allows you not only to do all the regular stuff (insert values, iterate over lists of values and spit out HTML), but even to build custom vocabularies of elements designed to go with JavaScript (for example, I built an asynchronous framework on top of this, with a “<form-async>” tag and “<div-async>”s that would be replaced asynchronously by the responses from form posts).


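A sketch of the splice idea in miniature - not Heist’s real API, which operates on XML trees inside its own monad - just the core metaphor: a splice maps a custom tag name to replacement nodes, and rendering substitutes as it walks the template.

```haskell
-- Toy model of splices: templates are trees of nodes, and a splice
-- maps a custom tag name to replacement nodes. Real Heist is far
-- richer; this just shows the shape of the substitution.
data Node = Text String | Element String [Node]

type Splice = [Node]

render :: [(String, Splice)] -> Node -> [Node]
render splices (Element name children) =
  case lookup name splices of
    Just replacement -> replacement
    Nothing          -> [Element name (concatMap (render splices) children)]
render _ t = [t]

toHtml :: Node -> String
toHtml (Text s)       = s
toHtml (Element n cs) = "<" ++ n ++ ">" ++ concatMap toHtml cs ++ "</" ++ n ++ ">"

main :: IO ()
main = putStrLn $ concatMap toHtml $
  render [("user-name", [Text "Ada"])]
         (Element "p" [Text "Hello, ", Element "user-name" []])
```

The two primitives - look up a tag, substitute its nodes - are all you need to get value insertion, iteration, and custom vocabularies like the async tags described above.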
It also adapts well to (trusted) user-generated input - I’ve used it in multiple CMS systems so that, for example, all links to external sites open in new tabs/windows (by overriding the “<a>” tag and adding the appropriate “target”), or so that users gain certain dynamic capabilities on their pages. Compared to this, the situation with Haml always seems hopelessly tied up with Ruby spaghetti code - not that it always is (you can always be careful), but the split with Heist feels both like a cleaner separation AND more powerful, which is not something you get often, and I think it is a sign that the metaphor Heist created (built on a couple of really simple primitives) is really something special.


6. This is sort of an extension of the first point, and I’m putting it towards the end because it is the most subjective part of this already quite subjective comparison - I think that web applications built with Haskell/Snap are much easier to edit and add to than corresponding applications in Ruby/Rails. One of the biggest reasons is that there is much more boilerplate and code spread in Ruby - some of it auto-generated, some written by hand, but code ends up scattered around. It is pretty easy to add new code, but when you want to edit or refactor existing code, it gets hard to figure out where everything is. Some of this is mitigated by conventions (which you learn), but there is simply less code in Snap, and usually everything pertaining to a specific function is in one place. This has a lot to do with the functional paradigm - there is no hidden state, so generally all the transformations that occur are transparent, whereas with Rails there may be stuff from the ApplicationController being applied, or various filters coming into play, or stuff from the model, etc. There is no obvious “starting point” if you want to see how a request travels through your application (candidates include the routes file, the controllers, etc.), whereas with a Snap application, the code to start the web server is in one of the files you write! You can trace exactly what it is doing from there.


In addition, there is very little “convention” with Snap. It enforces nothing, which has the consequence (in addition to allowing you to make a mess!) that the whole application conforms to exactly how you think it should be organized. I’ve found that this actually makes it much easier to add new things or modify existing functionality (fix bugs!), because the entire structure of the application, from how requests are routed to how responses are generated, is based on code I wrote. Making a change anywhere in this process is usually very easy - it feels in some ways like the difference between changing an application you wrote from scratch and one you picked up from someone else. There is also a potential downside to this: the first couple of applications I built had drastically different organizational systems.


(Side note for anyone reading this who is curious: I’ve converged on the following method: all types for the application live in a Types module or hierarchy, all code pertaining to the datastore lives in a State hierarchy (or module, in a small application), code for splices lives in a Splices hierarchy, forms live in a Forms hierarchy, and the web handlers live in a Handlers hierarchy. I also usually have a Utils module that collects various things used in all sorts of different places. Everything depends on Types and Utils. Splices, Forms, and State are all independent of one another, and Handlers depends on everything. And then of course there is an Application module and Main, following the code generated by Snap.)


This is a major way Snap differs even from some other Haskell web frameworks: it seems more like a library with which to build a web application than a true framework. In my experience this is actually a really powerful thing, and it makes the whole process a lot more enjoyable, because I never feel like I’m trying to conform to how someone else thinks I should organize things.


7. I’m bundling performance, security, etc. all at once. Rails is a very stable framework, so lots of work has gone into these. But I think the recent mass-assignment vulnerabilities exposed on a lot of major sites (like GitHub) point out the negative side. Snap is much newer, but it was built with security in mind from the beginning, as far as I can tell, and most libraries I have used also mention where security comes up - the entire development community seems a lot more aware of and concerned with it.


I think part of this probably has to do with the host languages - Ruby is a very dynamic language with a history of experimentation (so, generally, flexibility is preferred over correctness), whereas Haskell is a language where static guarantees are valued, and security is usually lumped in with correctness. As for performance, there is no question that Haskell will win hands down in any comparison (and in multithreading). Granted, a lot of web code is disk/database bound, so this isn’t a huge deal, but it is nice to know you aren’t needlessly wasting cycles (and can afford to run on smaller servers).


8. Now, as a counterpoint, I want to articulate what Rails really has over Snap. Number one, and this is huge, is the size of the community. There is a massive number of developers who know how to use Rails (how many are good at it is another question), which means that if you are trying to do something, it is much more likely that a prebuilt solution exists. It also means it will be easier to hire people to work on it, and easier to sell it as a platform to clients and bosses.


The Haskell community is surprisingly productive given its size (and some of the tools it has produced are amazing - examples mentioned in this comparison include Parsec, QuickCheck, and Digestive-Functors), but there is a sense in which it will always be at a disadvantage. If you are doing any sort of common task with Rails, there will probably be a Gem that does it. The unfortunate part is that sometimes the Gem will be unmaintained, partially broken, or incompatible, as the quality varies widely. This is where a lot of subjectivity comes in - I have found that most of what I need exists in the Haskell ecosystem, and if something doesn’t, it isn’t hard to write the library, but this could be a big dealbreaker for some people.

+
+
+
+ Programming as Literature
+
+
+ Sometimes I’m not sure how to explain what I study or why I study it. I tell people that I study theoretical computer science, or algorithms and programming languages, or math and computer science, and if they ask why? Let’s come back to that. First I want to talk about literacy.
+
+
+ Literacy is about being able to understand the recorded thoughts of other people, and being able to share your own in a permanent medium. There are beautiful oral traditions, but most stories, and much of human knowledge, are written down. Literacy allows one to tap into that sea of knowledge. In many ways, libraries are one of humanity’s greatest achievements: one can walk into a building that contains the thoughts and discoveries of thousands of people, stretching back hundreds or thousands of years (and as long as you aren’t at an exclusive university, you can often access that information for free). Some knowledge is certainly more accessible than other knowledge, and languages of course complicate things, but the essential elements of literacy are the perception of the world around you and the ability to describe it and share that with others. We must be able to understand the thoughts of others and formulate our own so that others can understand them.
+
+
+ The broader and perhaps more important aspect of literacy is that it allows you to contextualize your own life and perceptions in relation to others. In writing, you turn your own lived experience into something you can share. In reading, you realize that others have lived experiences that are in some ways similar and in others different from your own. In this sense, literacy is broader than reading and writing; it is about developing perspective on your own life and understanding of the lives of others. I can remember as a small child looking up at an airplane and realizing for the first time that there were people inside of it, in the middle of their own lives, with their own thoughts, hopes, dreams. For the first time I had an empathetic sense that I was not the center of the world (Descartes be damned).
+
+
+ Now, you may be asking, with good reason, what does this have to do with computer science? I want to argue that one of the primary media of our lives is now something that most of us are not literate in. We communicate with one another with email, websites, cell phones, etc. We learn information by pushing a button on a piece of electronics that displays pictures to us that change as we touch them or use devices attached to it. Traffic lights and airline schedules are planned with computers; cars run with them, as do watches and microwave ovens. Most things we plug in or that run on batteries have computers in them. Much of our lives are carried out using computers that we don’t have more than a surface empirical understanding of. Now, there have always been things that individuals don’t understand: tax codes, foreign languages, specifics of geography, etc.
+
+
+ But there are a couple of interesting things about computers that distinguish them. The first is that they are all essentially the same. There is an underlying similarity between all computers, and indeed among all possible devices that can compute. This means that it actually is possible to learn about all of these things.
+
+
+ The second is that they are primarily designed as a way for humans to express their thoughts. We don’t think about computers in this sense very much, but it is what distinguishes them from most other machines - they are used so that one person can express how to do something and share it with others. They are a medium for talking about solving problems. The breadth of problems they can express is visible in all the places they are used now - and imagine, this is with only a small minority of the population thinking up ways to use them!
+
+
+ There is a third dimension that is similarly interesting, and talked about more, which is that they are a way to expand our own mental capacities - if I am confronted with a task of sorting a few hundred (or thousand) documents, I can do it by hand, or, if I know how, I can write a program to do it and get a computer to carry out the work of sorting (and if I wanted the computer to do this sorting every day for the next year, I wouldn’t have to do any more work). What this means is that not only are they a way for me to share my ideas of how to solve a problem, they are also a way to automate that very problem solving.
+
+
+ What is interesting and sad is that while the possession of computers is expanding rapidly, the knowledge of how to truly use them is not. People are sold devices that allow them to perform a set number of functions (all of which are simply repetitions of thoughts by the people working at the company who sold them the device), but they are not given the tools to express their own thoughts, to expand their own mental capacity in any way other than that already thought of by someone else. We have expanded the medium without expanding literacy. And indeed, there is a financial explanation for this. It’s hard to sell knowledge when people can create it themselves. Many technological “innovations” these days are trivial combinations of earlier ideas which would be unnecessary if people were able to carry out those kinds of compositions themselves.
+
+
+ So why am I interested in computer science? I’m interested in it because I am interested in human thought. I am interested in how people solve problems, and in seeing problems that others have solved. I am interested in teaching people how to express themselves in this medium, and in learning it myself. I study programming as literature, to read, to write, to share. I study it to figure out the world we live in, and imagine how else it could be.
+
+
+
+
+
+ A Literate Ur/Web Adventure
+
+
+ Ur/Web is a language / framework for web programming that both makes it really hard to write code with bugs / vulnerabilities and also makes it really easy to write reactive, client-side code, all from a single, simple codebase. But it is built on some pretty deep type theory, and while it is an incredibly practical research project, some corners of it still show - like error messages that scroll pages off the screen. I’ve experimented with it before, and have written a small application that is beyond a demo, but still small enough to be digestible.
+
+
+ For completeness and clarity, I present it here in complete literate style - all the files, interspersed with comments, are presented. They are split into sections by file, which are named in headings. All the text between the file name and the next file name that is not actual code is within comments (that is what the #, (* and *) are for), so you can copy the whole thing to the files and build the project. All the files should go into a single directory. It builds with the current version of Ur/Web. You can try out the application, as it currently exists (which might have been changed since writing this), at lab.dbpmail.net/dn. The full source, with history, is available at github.com/dbp/dnplayer.
+
+
+ The application is a video player for the daily news program Democracy Now!. The main point of it is to remember where in the show you are, so you can stop and resume playback across devices. It should work in desktop and mobile browsers - I have targeted Chrome on Android, Chrome on computers, and Safari on iPhones/iPads. The main reason for not supporting Firefox is that it does not support the (proprietary) video/audio codecs that are the only formats Democracy Now! provides.
+
+
+ dn.urp
+
+
# .urp files are project files, which describe various meta-data about
# Ur/Web applications. They declare libraries (like random, which we'll
# see later), information about the database (both what it is named and
# where to generate the sql for the tables that the application is using).
@@ -99,14 +109,22 @@
dn.urp
$/option
sourceL
dn
+
+ dn.urs
+
+
(*
+
+ .urs files are header files (signature files), which declare all the public functions in the module (in this case, the Dn module). We only explicitly export our main function here, but any function whose url we generate within the application is also implicitly exported.
+
+
+ The type of main, unit -> transaction page, means that it takes no input (unit is a type with a single, uninformative value - the standard placeholder for argumentless functions), and it produces a page (a collection of xml) within a transaction. transaction, like Haskell’s IO monad, is the way that Ur/Web handles IO in a safe way. If you aren’t familiar with IO in Haskell, you should go there and then come back.
+
+
*)
val main : unit -> transaction page
+
+ random.urp
+
+
# Random is a simple wrapper around librandom to provide us with random
# strings, that we use for tokens. We included it above with the line
# `library random`. Libraries are declared with separate package files,
# and here we link against librandom.a, include the random header, and declare
@@ -129,30 +147,42 @@
random.urp
ffi random
include random.h
link librandom.a
+
+ random.urs
+
+
(*
+
+ Like with main, we see that the signatures of these functions are ‘transaction unit’ and int -> transaction string: the former takes no arguments, and the latter two take integers (lengths) and produce strings, within transactions. They are within transaction because they have side effects (i.e., if you run them twice, you will likely not get the same result), and thus we want the compiler to treat them with care (as described earlier). init seeds the random number generator, so it should be called before the other two are used.
+
+
*)
val init: transaction unit
val str : int -> transaction string
val lower_str : int -> transaction string
+
+ random.h
+
+
/*
+
+ Here we have the header file for the C library, which declares the same signatures as above, but using the structs that Ur/Web uses, and the naming convention that it expects (uw_Module_name).
+
*/
#include "types.h"
+
+ random.c
+
+
/*
+
+ And finally the C code to generate random strings.
+
+
*/
+#include "random.h"
#include <stdlib.h>
#include <time.h>
-#include "urweb.h"
+#include "urweb.h"
/* Note: This is not cryptographically secure (bad PRNG) - do not
use in places where knowledge of the strings is a security issue.
@@ -189,50 +219,68 @@
random.c
return s;
}
+
+ dn.ur
+
+
(*
+
+ We’ll now jump into the main web application, having seen a little bit about how the various files are combined together. The first thing we have is the data that we will be using - one database table, for our users, and one cookie. The tables are declared with Ur/Web’s record syntax, where Token, Date, and Offset are the names of fields, and string, string, and float are the types.
+
+
+ All tables that are going to be used have to be declared, and Ur/Web will generate SQL to create them. This is, in my opinion, one weakness: it means that Ur/Web doesn’t play well with other tools (it needs the tables to be named uw_Module_name), and, even worse, if you rename modules or refactor where the tables are stored, the names of the tables have to change. If you are just creating a toy, you can wipe out the database and re-initialize it, but obviously this isn’t an option for something that matters, and you then have to migrate the tables manually, based on the newly generated database schemas. Luckily the tables / columns are predictably named, but it still isn’t great.
+
+
*)
(* Note: Date is the date string used in the urls, as the most
convenient serialization, Offset is seconds into the show *)
table u : {Token : string, Date : string, Offset : float} PRIMARY KEY Token
cookie c : string
(*
+
+ Ur/Web provides a mechanism, called tasks, to run code at times other than requests. There are a couple of categories, the simplest being an initialization task, which is run once when the application starts up. We use this to initialize our random library.
+
+
*)
task initialize = fn () => Random.init
(*
+
+ Part of being a research project is that the standard libraries are pretty minimal, and one thing that is absent is date handling. You can format dates, add and subtract, and that’s about it. Since a bit of this application has to do with tracking what show is the current one, and whether you’ve already started watching it, I wrote a few functions to answer the couple date / time questions that I needed. These are all pure functions, and all the types are inferred.
+
+
*)
+val date_format = "%Y-%m%d"
fun before_nine t =
+ case read (timef "%H" t) of
None => error <xml>Could not read Hour</xml>
| Some h => h < 9
fun recent_show t =
let val seconds_day = 24*60*60
val nt = (if before_nine t then (addSeconds t (-seconds_day)) else t)
+ val wd = timef "%u" nt in
case wd of
+ "6" => addSeconds nt (-seconds_day)
+ | "7" => addSeconds nt (-(2*seconds_day))
| _ => nt
end
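
For readers who would rather not parse the Ur/Web, the same “most recent show” date logic can be sketched in Python (an illustrative translation, not part of the project files; the function name is mine):

```python
from datetime import datetime, timedelta

def recent_show(t: datetime) -> datetime:
    # Before 9am there is no show yet today, so fall back a day.
    if t.hour < 9:
        t -= timedelta(days=1)
    # isoweekday(): Monday=1 .. Sunday=7, matching strftime's %u.
    # The show only airs on weekdays, so weekends resolve to Friday.
    if t.isoweekday() == 6:        # Saturday -> Friday
        t -= timedelta(days=1)
    elif t.isoweekday() == 7:      # Sunday -> Friday
        t -= timedelta(days=2)
    return t
```

So a Sunday-afternoon request resolves to the previous Friday’s show, exactly as the addSeconds arithmetic above does.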
(*
+
+ The server that I have this application hosted on is in a different timezone than the one the show is broadcast in (EST), so we have to adjust the current time so that we can tell whether it is late enough in the day to get the current day’s broadcast. Depending on what timezone your computer is in, this may need to be changed.
+
+
*)
fun est_now () =
n <- now;
return (addSeconds n (-(4*60*60)))
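
The hard-coded 4-hour shift is timezone arithmetic done by hand. As a point of comparison (not part of the project), a Python version using only the standard library would pin the offset the same way:

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-4 offset, mirroring the addSeconds (-(4*60*60)) above.
# Note this is really EDT; a proper tz-database zone would also
# handle daylight saving transitions, which a fixed offset does not.
EASTERN = timezone(timedelta(hours=-4))

def est_now() -> datetime:
    return datetime.now(timezone.utc).astimezone(EASTERN)
```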
(*
+
+ We track users by tokens - these are short random strings generated with our random library. The mechanism for syncing devices is to visit the url (with the token) on every device, so the tokens will need to be typed in. For that reason, I didn’t want to make the tokens very long, which means that collisions are a real possibility. To deal with this, I set the length to be 6 characters plus log_26 of the number of existing tokens (since tokens are encoded with lower case letters, n tokens can be distinguished with log_26(n) characters, so we use this as a baseline and add several more so that the collision probability stays low).
+
+
+ In this, we see how SQL queries work. You can embed SQL (a subset of SQL, defined in the manual), and this is translated into a query datatype, and there are many functions in the standard library to run those queries. We see here two: oneRowE1, which expects to get back just one row, and will extract 1 value from it. E means that it computes a single output expression. Note that it will error if there is no result, but since we are selecting the count, this should be fine. hasRows is an even simpler function; it simply runs the query and returns true iff there are rows.
+
+
+ Also note that we refer to the table by name as declared above, and we refer to columns as record members of the table. To embed regular Ur/Web values within SQL queries, we use {[value]}. These queries will not type check if you try to select columns that don’t exist, and of course does escaping etc.
+
+
*)
(* linking to cmath would be better, but since I only
need an approximation, this is fine *)
fun log26_approx n c : int =
if used then new_token () else return token
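The token-sizing rule described above (a baseline of log base 26 of the token count, plus six extra characters) and the retry-on-collision loop can be sketched in Python. This is an illustrative analogy, not part of the Ur/Web source; the function names are hypothetical.

```python
import math
import random
import string

def token_length(n_tokens: int) -> int:
    """Length rule described above: log_26(n) characters are enough to
    distinguish n tokens, so use that as a baseline and add 6 more to
    keep the collision probability low."""
    baseline = math.ceil(math.log(n_tokens, 26)) if n_tokens > 1 else 1
    return baseline + 6

def new_token(existing: set) -> str:
    """Keep generating lower-case tokens until we find an unused one,
    mirroring `if used then new_token () else return token` above."""
    length = token_length(len(existing) + 1)
    while True:
        token = "".join(random.choice(string.ascii_lowercase)
                        for _ in range(length))
        if token not in existing:
            return token

existing = {"abcdefg"}
t = new_token(existing)
assert t.islower() and len(t) == 7
```

The retry loop terminates quickly in practice because the extra six characters make the space of tokens vastly larger than the number in use.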
(*

We write small functions to set and clear the token cookie. We do this so that after a user has visited their unique player url at least once on each device, they only have to remember the application url, not their unique url. now is a value of type transaction time, which gives the current time, and setCookie/clearCookie should be self-explanatory.

*)
fun set_token token =
t <- now;
setCookie c {Value = token,
clearCookie c
(*

The next thing is a bunch of HTML fragments. Ur/Web doesn’t have a “templating” system, but it is perfectly possible to create one by defining functions that take the values to insert. I’ve opted for a simpler option, and just defined common pieces. HTML is written in normal XML format, within <xml> tags, and like the SQL, these are typechecked - attributes that shouldn’t exist, tags nested where they don’t belong, or unclosed tags all cause the code not to compile.

There are a couple of rough edges - some tags are not defined (but you can define new ones in FFI modules), and some attributes can’t be used because they are keywords (hence typ instead of type) - but overall it is a neat system, and works very well.

*)
fun heading () =
<xml>
    <meta name="viewport" content="width=device-width"/>
    <link rel="stylesheet" typ="text/css" href="http://dbpmail.net/css/default.css"/>
    <link rel="stylesheet" typ="text/css" href="http://lab.dbpmail.net/dn/main.css"/>
</xml>
fun about () =
<xml>
<p>
This is a player for the news program
      <a href="http://democracynow.org">Democracy Now!</a>
that remembers how much you have watched.
</p>
</xml>
fun footer () =
<xml>
    <p>Created by <a href="http://dbpmail.net">Daniel Patterson</a>.
<br/>
    View the <a href="http://hub.darcs.net/dbp/dnplayer">Source</a>.</p>
</xml>
(*

We now get to the web handlers. These are all url/form entry points, and do the bulk of the work. The first one, main, which we rewrote in dn.urp to be the root handler, is mostly HTML - the only catch being that if you have a cookie set, we just redirect you to the player.

getCookie returns an option CookieType, where CookieType is the type of the cookie (in our case, a string). redirect takes a url, and urls can be created from handlers (ie, values of type transaction page) with the url function. So we apply player, a handler we’ll define later, to the token value (as a token is the parameter that player expects), and grab a url for that.

One catch is that Ur/Web doesn’t know that player isn’t going to cause side effects, which would mean that it shouldn’t have a url created for it (side-effecting things should only be POSTed to) - which is why we had to declare player as safeGet in dn.urp.

We also see a form that submits to create_player, which is another handler that we will define. One thing to note is that create_player is a unit -> transaction page function - and the action for the submit is just create_player, not create_player () - the act of submitting passes that parameter.

*)
fun main () =
mc <- getCookie c;
case mc of
{heading ()}
</head>
<body>
      <h2><a href="http://democracynow.org">Democracy Now!</a> Player</h2>
{about ()}
<p>
You can listen to headlines on your way to work on your phone,
<li>
<form>
To start, if you've not created a player on any device:
            <submit action={create_player} value="Create Player"/>
</form>
</li>
<li>Otherwise, visit the url for the player you created (it should look like
</xml>
(*

create_player is pretty straightforward, but it shows a different part of Ur/Web’s SQL support: dml supports INSERT, UPDATE, and DELETE in the normal ways, with the same embedding as SQL queries (that {[value]} puts a normal Ur/Web value into SQL). We create a token, create a “user” - recording that they are on the current day’s show and at the beginning of it (offset 0.0) - store the token in a cookie, and then redirect to the player.

*)
and create_player () =
n <- est_now ();
token <- new_token ();
redirect (url (player token))
(*

The next two functions encompass most of the player, which is the core of the application. The way it is structured is a little odd, but with justification: Chrome on Android caches extremely aggressively, and doesn’t seem to pay attention to headers that say not to, which means that if you visited the application and then open Chrome again a few days later, it will seem to be loading the page, but it is loading the cached HTML, not getting it from the server. This is really bad for us, because it means it will have an old offset (in case you watched some of the show from another device) and, worse, on subsequent days it will be trying to play the wrong day’s show! You can manually reload the page, but that is silly, so what we do is initially load a blank page, and then immediately make a remote call to actually load the page. So what is cached is a little bit of HTML and some javascript that loads the page for real.

We do all of this in functional reactive style: we declare a source, which is a place where values will be put, and it will cause the parts of the page that signal it to update their values. Then we set an onload handler for the body, which first makes an rpc call to a server-side function (which is just another function, like all of these handlers), and then sets the source that we defined to the result of rendering the player. render is a client-side function that just creates the appropriate forms / html.

Finally, we call a client-side function init, which will do some setup and then call through the javascript ffi to the ffi init function, which handles the HTML5 audio/video APIs (which Ur/Web doesn’t support, and which are very browser-specific anyway).

One incredibly special thing that is going on is the SourceL.set os that is passed to javascript. If you remember from our .urp file, we imported sourceL. It is a special reactive construct that allows you to set up handlers that cause side effects (are transactions) when the value inside the SourceL changes. So what is happening is: we create one of these on the server, in player_remote, and send it back to the client. The client then curries the set function with that source, producing a single-argument function that just takes the value to be set. We hand this function to javascript, so that in our FFI code we can just set values into it, and it can reactively cause things to happen in our server-side code.

The reactive component on the page is the <dyn> tag, which is a special construct that allows side-effect-free operations on sources. signal s grabs the current value from the source s; in this case we just return it, but we could do various things to it first. The result of the block is the value of the <dyn> tag. In this case, we have just made a place where we can stick HTML by calling set s some_html.

*)
and player token =
s <- source <xml/>;
return <xml>
</body>
</xml>
(*

The remote component is where most of the logic of the player resides. By now, you should be able to read most of what’s going on. Some points to highlight: this is where we create the SourceL that we pass back, setting its initial value to offset. Also, fresh generates unique identifiers to use within html; our render function will use this identifier for the player element, which the javascript FFI needs in order to find it. Finally, bless turns strings into urls, checking them against the policy outlined in the application’s .urp file.

*)
and player_remote token =
n <- est_now ();
op <- oneOrNoRows1 (SELECT * FROM u WHERE (u.Token = {[token]}));
else
return ());
let val offset = (if fmtted_date = pi.Date then pi.Offset else 0.0)
        val video_url = bless (strcat "http://dncdn.dvlabs.com/ipod/dn"
                                      (strcat fmtted_date ".mp4"))
        val audio_url = bless (strcat "http://traffic.libsyn.com/democracynow/dn"
                                      (strcat fmtted_date "-1.mp3")) in
os <- SourceL.create offset;
player_id <- fresh;
(*

The next three functions are simple - the first just renders the actual player. Note that we use the player_id we generated in player_remote. Then we provide a way to forget the player (if you want to unlink two devices, forget the player on one and create a new one). And, due to some imperfections in how we keep the time in sync (mostly down to weirdness in different browsers’ implementations of the HTML5 video/audio APIs), to seek backwards or start the show over we need to tell the server explicitly, so we provide a handler for that.

Now we get to the last web handlers. The first one is a client-side initializer. The main thing it sets up is a handler that rpcs to the server whenever the offset SourceL changes. The call is to update (which we’ll define in a moment), and it optionally returns a new time to set the client to.

This may sound a little odd, but the basic situation is: you play part of the way through the show on one device, then pause, watch some on another device, and now hit play on the first device. It will POST a new time, but the server will tell it that it should actually be at a later time, and so we use the javascript FFI function set_offset to move the player forward.

Finally, we make the client silently ignore connection failures (this is bad behavior, but simple), and call the javascript FFI initialization function, which sets up the player and any HTML5 API related stuff.

*)
and init token player_id os set_offset video_url audio_url =
SourceL.onChange os (fn offset => newt <- rpc (update token offset);
case newt of
The last function is the simple handler called when the offset SourceL changes. It updates the stored time if the new time is greater than the recorded offset (this is why we need the start_over handler), and otherwise returns the recorded offset for the client to catch up to.

*)
and update token offset =
op <- oneOrNoRows1 (SELECT * FROM u WHERE (u.Token = {[token]}));
case op of
WHERE Token = {[token]} AND {[offset]} > Offset);
return None
else return (Some r.Offset))
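The forward-only sync rule just described can be sketched in Python. This is an illustrative analogy with hypothetical names, standing in for the SQL-backed update handler above.

```python
from typing import Optional

def update(store: dict, token: str, offset: float) -> Optional[float]:
    """Sketch of the update handler described above: if the client's
    offset is ahead of the recorded one, record it and return None;
    otherwise return the recorded offset so the client can seek to it."""
    recorded = store.get(token)
    if recorded is None:
        return None            # unknown token: nothing to do
    if offset > recorded:
        store[token] = offset  # like UPDATE ... WHERE {[offset]} > Offset
        return None
    return recorded            # like return (Some r.Offset)

store = {"abc": 10.0}
assert update(store, "abc", 15.0) is None and store["abc"] == 15.0
assert update(store, "abc", 5.0) == 15.0
```

Because the stored offset only ever moves forward, seeking backwards has to go through the explicit start_over handler, exactly as the text explains.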

sourceL.urs

(*

This came from a supplemental standard library, and, as explained earlier, allows you to create source-like containers that call side-effecting handlers when their values change.

*)
(* Reactive sources that accept change listeners *)
con t :: Type -> Type
val set : a ::: Type -> t a -> a -> transaction {}
val get : a ::: Type -> t a -> transaction a
val value : a ::: Type -> t a -> signal a

sourceL.ur

(*

The sourceLs are built on top of normal sources, and just call the OnSet function whenever you call set.

*)
con t a = {Source : source a,
OnSet : source (a -> transaction {})}
fun get [a] (t : t a) = Basis.get t.Source
fun value [a] (t : t a) = signal t.Source
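The SourceL idea - a mutable cell that also fires a registered handler on every set - can be mimicked in plain Python. This is an analogy only, not Ur/Web's semantics (in particular it ignores transactions entirely); the class and method names are hypothetical.

```python
class SourceL:
    """Analogy for the sourceL module above: a value cell (Source)
    plus an on-change handler (OnSet) that runs on every set."""
    def __init__(self, value):
        self.value = value
        self.on_set = lambda v: None   # like OnSet, initially a no-op

    def on_change(self, handler):
        """Register the side-effecting handler, like SourceL.onChange."""
        self.on_set = handler

    def set(self, value):
        self.value = value
        self.on_set(value)             # handler fires on every set

seen = []
os = SourceL(0.0)
os.on_change(seen.append)   # like onChange os (fn offset => rpc (update ...))
os.set(12.5)
assert os.value == 12.5 and seen == [12.5]
```

In the application, the registered handler is the rpc to update, so every set on the client pushes the new offset to the server.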

dnjs.urs

(*

This is the signature file for our javascript FFI. It declares which functions are exported to be accessible within Ur/Web, and what types they have.

*)
val init : id -> (* id for player container *)
float -> (* offset value *)
(float -> transaction unit) -> (* set function *)
transaction unit
val set_offset : float -> transaction unit

dn.js

/*

Since this is an adventure in Ur/Web, not Javascript, and there are plenty of places to learn about the quirks and features of the HTML5 media APIs (and I don’t claim to be an expert), I’m just going to paste the code in without detailed commentary. The main point worth looking at is how we use setter, which you will remember is a curried function that updates a SourceL, causing rpcs that update the time. To call functions passed through the FFI, you use execF, and to force a transaction to actually occur, you have to apply it (to anything), so we end up with double applications.

Other than that, all that is here is some browser detection (as different browsers have different media behavior) and a media-type preference kept in localStorage.

*/
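The double application can be mimicked in Python with closures: model a transaction as a zero-argument function, so applying the curried setter to a value only builds the suspended effect, and a second application runs it. This is an analogy with hypothetical names, not Ur/Web's actual runtime representation.

```python
def make_setter(cell):
    """Build a curried setter, analogous to SourceL.set os handed to JS."""
    def with_value(value):       # first application: execF(setter, time)
        def transaction():       # the suspended effect, not yet run
            cell["offset"] = value
        return transaction
    return with_value

cell = {"offset": 0.0}
setter = make_setter(cell)
txn = setter(42.0)               # builds the transaction...
assert cell["offset"] == 0.0     # ...but nothing has happened yet
txn()                            # second application actually runs it,
                                 # like execF(execF(setter, t), null)
assert cell["offset"] == 42.0
```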
function init(player, offset, setter, video_url, audio_url) {
// set up toggle functionality
  $("#"+player).after("<button id='toggle'>Switch to " +
                      (prefersVideo() ? "audio" : "video") + "</button>");
  $("#toggle").click(function () {
    window.localStorage["dn-prefers-video"] = !prefersVideo();
location.reload();
});
// put player on the page
if (canPlayVideo() && prefersVideo()) {
    $("#"+player).html("<video id='player' width='320' height='180' controls src='" +
                       video_url + "'></video>");
} else {
    $("#"+player).html("<audio id='player' width='320' controls src='" +
                       audio_url + "'></audio>");
}
// seek / start the player, if applicable
if (isDesktopChrome()) {
    $("#player").one("canplay", function () {
var player = this;
if (offset != 0) {
player.currentTime = offset;
} else if (isiOS() || isAndroidChrome()) {
// iOS doesn't let you seek till much later... and won't let you start automatically,
// so calling play() is pointless
    $("#player").one("canplaythrough", function () {
      $("#player").one("progress", function () {
        if (offset != 0) {
          $("#player")[0].currentTime = offset;
        }
        window.setInterval(update_time(setter), 1000);
      });
    });
} else {
    $("#player").after("<h3>As of now, the player does not support your browser.</h3>");
}
}
function set_offset(time) {
  var player = $("#player")[0];
if (time > player.currentTime) {
player.currentTime = time;
}
// the function that grabs the time and updates it, if needed
function update_time(setter) {
return function () {
    var player = $("#player")[0];
if (!player.paused) {
// a transaction is a function from unit to value, hence the extra call
execF(execF(setter, player.currentTime), null)
var ua = navigator.userAgent.toLowerCase();
return (ua.match(/chrome/) !== null) && (ua.match(/android/) !== null);
}

Makefile

To actually build our application, we first have to build our C library. Then we build the app itself, using the sqlite backend. To get it running, run sqlite3 dn.db < dn.sql (you only need to do this once) and then start the server with ./dn.exe. You can then visit the application at http://localhost:8080. This has been tested on current Debian Linux and Mac OSX.

all: app

Ur/Web is a language / framework for web programming that both makes it really hard to write code with bugs / vulnerabilities and makes it really easy to write reactive, client-side code, all from a single, simple codebase. But it is built on some pretty deep type theory, and while it is an incredibly practical research project, some corners of it still show - like error messages that scroll pages off the screen. I’ve experimented with it before, and have written a small application that is beyond a demo, but still small enough to be digestible.

For completeness and clarity, I present it here in complete literate style - all the files are presented, interspersed with comments. They are split into sections by file, which are named in headings. All the text between a file name and the next file name that is not actual code is within comments (that is what the #, (* and *) are for), so you can copy the whole thing into the files and build the project. All the files should go into a single directory. It builds with the current version of Ur/Web. You can try out the application as it currently exists (it may have changed since this was written) at lab.dbpmail.net/dn. The full source, with history, is available at github.com/dbp/dnplayer.

The application is a video player for the daily news program Democracy Now!. The main point of it is to remember where in the show you are, so you can stop and resume it across devices. It should work in desktop and mobile browsers - I have targeted Chrome on Android, Chrome on computers, and Safari on iPhones/iPads. The main reason for not supporting Firefox is that it does not support the (proprietary) video/audio codecs that are the only formats Democracy Now! provides.

dn.urp

# .urp files are project files, which describe various meta-data about
# Ur/Web applications. They declare libraries (like random, which we'll
# see later) and information about the database (what it is named, and
# where to generate the sql for the tables that the application uses).
# They separate meta-data declarations from the modules in the project by
# a single blank line, which is why we have comments on all blank lines
# prior to the end.
library random
database dbname=dn
sql dn.sql
#
# They also allow you to rewrite urls. By default, urls are generated
# consistently as Module/function_name, which means that the main
# function inside Dn, our main module, is our root url. We can rewrite
# one url to another, but if we leave off the second, that rewrites to
# root. We can also strip prefixes from urls with a rewrite with a *.
#
rewrite url Dn/main
rewrite url Dn/*
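The two rewrite rules just described - a bare target rewrites that url to root, and a trailing * strips the prefix - might be illustrated, purely hypothetically, in Python (this is an illustration of the rules as described, not Ur/Web's implementation):

```python
def apply_rewrites(path: str) -> str:
    """Illustration of the two rewrite rules above (hypothetical code):
    `rewrite url Dn/main` maps the main handler to the root url, and
    `rewrite url Dn/*` strips the module prefix from everything else."""
    if path == "Dn/main":
        return "/"                       # rewrite url Dn/main -> root
    if path.startswith("Dn/"):
        return "/" + path[len("Dn/"):]   # rewrite url Dn/* -> strip prefix
    return "/" + path

assert apply_rewrites("Dn/main") == "/"
assert apply_rewrites("Dn/player/abc") == "/player/abc"
```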
#
# safeGet allows us to declare that a function is safe to generate urls
# to, ie that it won't cause side effects. Along the same safety lines,
# we declare the external urls that we will generate and scripts we will
# include - making it hard to inject resources hosted elsewhere (as Ur/Web
# won't allow you to create urls to anything not declared here).
#
#
safeGet player
allow url http://dncdn.dvlabs.com/ipod/*
allow url http://traffic.libsyn.com/democracynow/*
allow url http://dbpmail.net/css/default.css
allow url http://dbpmail.net
allow url http://hub.darcs.net/dbp/dnplayer
allow url http://democracynow.org
allow url http://lab.dbpmail.net/dn/main.css
script http://lab.dbpmail.net/static/jquery-1.9.1.min.js
# One odd thing - Ur/Web doesn't have a static file server of its own, so
# you need to host any FFI javascript elsewhere. Here's where the javascript for
# this application, presented later, is hosted. For trying it out, leaving
# this the same is fine, though if you want to change the javascript, or
# not depend on my copy being up, you should change this and the reference in
# the application.
script http://lab.dbpmail.net/dn/dn.js
#
# Next, we declare that we have foreign functions in a module called dnjs. This
# refers to a header file (.urs), and we furthermore declare which functions within
# it we are using. We declare them as effectful so that they aren't called multiple
# times (like Haskell, Ur/Web is purely functional, so normal, non-effectful functions are not
# guaranteed to be called exactly once - they could be optimized away if the compiler
# did not see you use the result of the function, and could be inlined (and thus
# duplicated) if that would be more efficient).
#
ffi dnjs
jsFunc Dnjs.init=init
effectful Dnjs.init
jsFunc Dnjs.set_offset=set_offset
effectful Dnjs.set_offset

# The last thing we declare is the modules in our project. $/ is a prefix that means to
# look in the standard library, as we are using the option type (Some/None in OCaml/ML,
# Just/Nothing in Haskell, and very roughly a safe null in other languages). sourceL is
# a helper for reactive programming (to be discussed later). And finally, our main module,
# which should be last.
#
$/option
sourceL
dn

dn.urs

(*

.urs files are header files (signature files), which declare all the public functions in a module (in this case, the Dn module). We only export our main function here, but all functions that have urls generated within the application are also implicitly exported.

The type of main, unit -> transaction page, means that it takes no input (unit is a value-less value, a placeholder for argumentless functions) and produces a page (a collection of xml), within a transaction. transaction, like Haskell’s IO monad, is the way that Ur/Web handles IO safely. If you aren’t familiar with IO in Haskell, you should go read about that and then come back.

*)
val main : unit -> transaction page
+
+ random.urp
+
+
# Random is a simple wrapper around librandom to provide us with random
+# strings, which we use for tokens. We included it above with the line
+# `library random`. Libraries are declared with separate package files,
+# and here we link against librandom.a, include the random header, and declare
+# that we are using functions declared in random.urs (that is the ffi line).
+# We also declare that all three functions are effectful, because they have
+# side effects.
+#
+# NOTE: It has been pointed out that instead of doing this, we could either:
+# A. use Ur/Web's builtin `rand` function, and construct the strings
+# without using the FFI, or even easier:
+#    B. just use the integers that `rand` generates as tokens.
+#
+# I didn't realize that `rand` existed when I wrote this, but I'm leaving
+# it in because it is a (concise) introduction to the FFI, which, given
+# the relatively small body of Ur/Web libraries, is probably something
+# you'll end up using if you build any large applications.
+effectful Random.init
+effectful Random.str
+effectful Random.lower_str
+ffi random
+include random.h
+link librandom.a
+
+ random.urs
+
+
(*
+
+   Like with main, we see that the signatures of these functions are transaction unit and int -> transaction string, which means the former takes no arguments, and the latter two take integers (lengths) and produce strings, within transactions. They are within transaction because they have side effects (ie, if you run them twice, you will likely not get the same result), and thus we want the compiler to treat them with care (as described earlier). init seeds the random number generator, so it should be called before the other two are.
+
+
*)
+val init: transaction unit
+val str : int -> transaction string
+val lower_str : int -> transaction string
+
+ random.h
+
+
/*
+
+ Here we have the header file for the C library, which declares the same signatures as above, but using the structs that Ur/Web uses, and the naming convention that it expects (uw_Module_name).
+
+ And finally the C code to generate random strings.
+
+
*/
+#include "random.h"
+#include <stdlib.h>
+#include <time.h>
+#include "urweb.h"
+
+/* Note: This is not cryptographically secure (bad PRNG) - do not
+ use in places where knowledge of the strings is a security issue.
+*/
+
+uw_Basis_unit uw_Random_init(uw_context ctx) {
+  srand((unsigned int)time(0));
+  return uw_unit_v;
+}
+
+uw_Basis_string uw_Random_str(uw_context ctx, uw_Basis_int len) {
+ uw_Basis_string s;
+ int i;
+
+ s = uw_malloc(ctx, len + 1);
+
+ for (i = 0; i < len; i++) {
+    s[i] = rand() % 93 + 33; /* printable ASCII characters 33 to 125 */
+ }
+ s[i] = 0;
+
+ return s;
+}
+
+uw_Basis_string uw_Random_lower_str(uw_context ctx, uw_Basis_int len) {
+ uw_Basis_string s;
+ int i;
+
+ s = uw_malloc(ctx, len + 1);
+
+ for (i = 0; i < len; i++) {
+ s[i] = rand() % 26 + 97; /* ASCII lowercase letters */
+ }
+ s[i] = 0;
+
+ return s;
+}
+
+ dn.ur
+
+
(*
+
+ We’ll now jump into the main web application, having seen a little bit about how the various files are combined together. The first thing we have is the data that we will be using - one database table, for our users, and one cookie. The tables are declared with Ur/Web’s record syntax, where Token, Date, and Offset are the names of fields, and string, string, and float are the types.
+
+
+ All tables that are going to be used have to be declared, and Ur/Web will generate SQL to create them. This is, in my opinion, one weakness, as it means that Ur/Web doesn’t play well with others (as it needs the tables to be named uw_Module_name), and, even worse, if you rename modules, or refactor where the tables are stored, the names of the tables need to change - if you are just creating a toy, you can wipe out the database and re-initialize it, but obviously this isn’t an option for something that matters, and you just have to manually migrate the tables, based on the newly generated database schemas. Luckily the tables / columns are predictably named, but it still isn’t great.
+
+
*)
+(* Note: Date is the date string used in the urls, as the most
+ convenient serialization, Offset is seconds into the show *)
+table u : {Token : string, Date : string, Offset : float} PRIMARY KEY Token
+cookie c : string
+(*
+
+   Ur/Web provides a mechanism to run certain code at times other than requests, called tasks. There are a couple of categories, the simplest being an initialization task, which is run once when the application starts up. We use this to initialize our random library.
+
+
*)
+task initialize = fn () => Random.init
+(*
+
+ Part of being a research project is that the standard libraries are pretty minimal, and one thing that is absent is date handling. You can format dates, add and subtract, and that’s about it. Since a bit of this application has to do with tracking what show is the current one, and whether you’ve already started watching it, I wrote a few functions to answer the couple date / time questions that I needed. These are all pure functions, and all the types are inferred.
+
+
*)
+val date_format = "%Y-%m%d"
+
+fun before_nine t =
+ case read (timef "%H" t) of
+ None => error <xml>Could not read Hour</xml>
+ | Some h => h < 9
+
+fun recent_show t =
+ let val seconds_day = 24*60*60
+ val nt = (if before_nine t then (addSeconds t (-seconds_day)) else t)
+ val wd = timef "%u" nt in
+ case wd of
+ "6" => addSeconds nt (-seconds_day)
+ | "7" => addSeconds nt (-(2*seconds_day))
+ | _ => nt
+ end
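To make the weekday rule concrete, here is a small C restatement of the same logic (purely illustrative - recent_show_c, the constant, and the UTC treatment are my own, not part of the project): fall back a day if it is before 9am, then step off the weekend, since there is no weekend show.

```c
#include <time.h>

#define SECONDS_DAY (24 * 60 * 60)

/* Step back a day if it is before 9am, then step off the weekend:
   Saturday falls back one day and Sunday two, landing on Friday,
   since the show only airs on weekdays. */
time_t recent_show_c(time_t t) {
    struct tm parts;
    gmtime_r(&t, &parts);
    if (parts.tm_hour < 9)
        t -= SECONDS_DAY;
    gmtime_r(&t, &parts);
    if (parts.tm_wday == 6)        /* Saturday -> Friday */
        t -= SECONDS_DAY;
    else if (parts.tm_wday == 0)   /* Sunday -> Friday */
        t -= 2 * SECONDS_DAY;
    return t;
}
```

For example, Saturday 2013-06-15 at noon maps to Friday the 14th, and so does Monday the 17th at 6am (before 9, so it first falls back to Sunday, then off the weekend to Friday).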
+(*
+
+   The server that I have this application hosted on is in a different timezone than the one the show is broadcast in (EST), so we have to adjust the current time so that we can tell if it is late enough in the day to get the current day’s broadcast. Depending on what timezone your server is in, this may need to be changed.
+
+
*)
+fun est_now () =
+ n <- now;
+ return (addSeconds n (-(4*60*60)))
+
+(*
+
+   We track users by tokens - these are short random strings generated with our random library. The mechanism for syncing devices is to visit the url (with the token) on every device, so the tokens will need to be typed in. For that reason, I didn’t want to make the tokens very long, which means that collisions are a real possibility. To deal with this, I set the length to be 6 characters plus roughly log_26 of the number of existing tokens (since tokens are drawn from the 26 lower-case letters, n users need about log_26(n) characters to distinguish them, so we use that as a baseline and add six more so that the collision probability stays low).
+
+
+   In this, we see how SQL queries work. You can embed SQL (a subset of SQL, defined in the manual), and this is translated into a query datatype; there are many functions in the standard library to run those queries. We see two here: oneRowE1, which expects to get back exactly one row and extracts a single expression’s value from it (that is what the E and the 1 mean). Note that it will error if there is no result, but since we are selecting the count, this should be fine. hasRows is an even simpler function; it simply runs the query and returns true iff there are any rows.
+
+
+   Also note that we refer to the table by the name declared above, and we refer to columns as record members of the table. To embed regular Ur/Web values within SQL queries, we use {[value]}. These queries will not type check if you try to select columns that don’t exist, and escaping is of course handled for you.
+
+
*)
+(* linking to cmath would be better, but since I only
+ need an approximation, this is fine *)
+fun log26_approx n c : int =
+ if c < 26 then n else
+ log26_approx (n+1) (c / 26)
+
+
+(* Handlers for creating and persisting token *)
+fun new_token () : transaction string =
+ count <- oneRowE1 (SELECT COUNT( * ) FROM u);
+ token <- Random.lower_str (6 + (log26_approx 0 count));
+ used <- hasRows (SELECT * FROM u WHERE u.Token = {[token]});
+ if used then new_token () else return token
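As a quick sanity check on that arithmetic, here is the same length rule transcribed into C (a hypothetical illustration - log26_approx_c and token_length are my names, not part of the project):

```c
/* Same approximation as log26_approx above: count how many times the
   current user count can be divided by 26 before it drops below 26. */
static int log26_approx_c(int n, int c) {
    if (c < 26)
        return n;
    return log26_approx_c(n + 1, c / 26);
}

/* The length used in new_token: a 6-character baseline plus the log. */
int token_length(int user_count) {
    return 6 + log26_approx_c(0, user_count);
}
```

So with no users tokens are 6 letters, at 100 users they are 7, and at 100,000 they are 9 - the length grows slowly while keeping collisions unlikely relative to the population.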
+
+(*
+
+ We write small functions to set and clear the tokens. We do this so that after a user has visited the unique player url at least once on each device, they will only have to remember the application url, not their unique url. now is a value of type transaction time, which gives the current time, and setCookie/clearCookie should be self explanatory.
+
+
*)
+fun set_token token =
+ t <- now;
+ setCookie c {Value = token,
+ Expires = Some (addSeconds t (365*24*60*60)),
+ Secure = False}
+
+fun clear_token () =
+ clearCookie c
+
+(*
+
+   The next thing is a bunch of html fragments. Ur/Web doesn’t have a “templating” system, but it is perfectly possible to create one by defining functions that take the values to be inserted. I’ve opted for a simpler option, and just defined common pieces. HTML is written in normal XML format, within <xml> tags, and like the SQL tags, these are typechecked - having attributes that shouldn’t exist, nesting tags that don’t belong, or not closing tags all cause the code not to compile.
+
+
+ There are a couple rough edges - some tags are not defined (but you can define new ones in FFI modules), and some attributes can’t be used because they are keywords (hence typ instead of type), but overall it is a neat system, and works very well.
+
+
*)
+fun heading () =
+ <xml>
+ <meta name="viewport" content="width=device-width"/>
+ <link rel="stylesheet" typ="text/css" href="http://dbpmail.net/css/default.css"/>
+ <link rel="stylesheet" typ="text/css" href="http://lab.dbpmail.net/dn/main.css"/>
+ </xml>
+
+fun about () =
+ <xml>
+ <p>
+ This is a player for the news program
+ <a href="http://democracynow.org">Democracy Now!</a>
+ that remembers how much you have watched.
+ </p>
+ </xml>
+
+fun footer () =
+ <xml>
+ <p>Created by <a href="http://dbpmail.net">Daniel Patterson</a>.
+ <br/>
+ View the <a href="http://hub.darcs.net/dbp/dnplayer">Source</a>.</p>
+ </xml>
+
+(*
+
+ We now get to the web handlers. These are all url/form entry points, and do the bulk of the work. The first one, main, which we rewrote in dn.urp to be the root handler, is mostly HTML - the only catch being that if you have a cookie set, we just redirect you to the player.
+
+
+ getCookie returns an option CookieType where CookieType is the type of the cookie (in our case, it is a string). redirect takes a url, and urls can be created from handlers (ie, values of type transaction page) with the url function. So we apply player which is a handler we’ll define later, to the token value (as a token is the parameter that player expects), and grab a url for that.
+
+
+   One catch to this is that Ur/Web doesn’t know that player isn’t going to cause side effects, which would mean that it shouldn’t have a url created for it (side-effecting things should only be POSTed to), which is why we had to declare player as safeGet in dn.urp.
+
+
+   We also see a form that submits to create_player, which is another handler that we will define. One thing to note is that create_player is a unit -> transaction page function - and the action for the submit is just create_player, not create_player () - the act of submitting supplies that argument.
+
+
*)
+fun main () =
+ mc <- getCookie c;
+ case mc of
+ Some cv => redirect (url (player cv))
+ | None =>
+ return <xml>
+ <head>
+ {heading ()}
+ </head>
+ <body>
+ <h2><a href="http://democracynow.org">Democracy Now!</a> Player</h2>
+ {about ()}
+ <p>
+ You can listen to headlines on your way to work on your phone,
+ pick up the first segment during lunch on your computer at work, and
+ finish the show in the evening, without worrying what device you are
+ on or whether you have time to watch the whole thing.
+ </p>
+ <h3>How it works</h3>
+ <ol>
+ <li>
+ <form>
+ To start, if you've not created a player on any device:
+ <submit action={create_player} value="Create Player"/>
+ </form>
+ </li>
+    <li>Otherwise, visit the url for the player you created (it should look
+      something like <code>http://.../player/hcegaoe</code>) on this device
+ to synchronize your devices. You only need to do this once per device, after that
+ just visit the home page and we'll load your player.
+ </li>
+ </ol>
+
+ <h3>Compatibility</h3>
+ <p>This currently works with Chrome (on computers and Android) and iPhones/iPads.</p>
+ {footer ()}
+ </body>
+ </xml>
+
+(*
+
+ create_player is pretty straightforward, but it shows a different part of Ur/Web’s SQL support: dml supports INSERT, UPDATE, and DELETE, in the normal ways, with the same embedding as SQL queries (that {[value]} puts a normal Ur/Web value into SQL). We create a token, create a “user”, setting that they are on the current day’s show and at the beginning of it (offset 0.0), store the token, and then redirect to the player.
+
+   The next two functions encompass most of the player, which is the core of the application. The way it is structured is a little odd, but with justification: Chrome on Android caches extremely aggressively, and doesn’t seem to pay attention to headers that say not to. That means that if you visited the application and then open Chrome again a few days later, it will seem to load the page, but it is actually showing the cached HTML, not getting it from the server. This is really bad for us: not only will it have a stale offset (in case you watched some of the show from another device), but worse, on subsequent days it will be trying to play the wrong day’s show! You can manually reload the page, but this is silly, so what we do instead is initially load a nearly blank page, and then immediately make a remote call to actually load the page. So what is cached is a little bit of HTML and some javascript that loads the page for real.
+
+
+   We do all of this in functional reactive style: we declare a source, which is a place where values will be put, and which will cause parts of the page (that are signaled) to update their values. Then we set an onload handler for the body, which first makes an rpc call to a server-side function (which is just another function, like all of these handlers), and then sets the source that we defined to the result of rendering the player. render is a client-side function that just creates the appropriate forms / html.
+
+
+   Finally, we call a client-side function init, which does some setup and then calls through the javascript FFI to its init function, which handles the HTML5 audio/video APIs (which Ur/Web doesn’t support, and which are very browser specific anyway).
+
+
+ One incredibly special thing that is going on is the SourceL.set os that is passed to javascript. If you remember from our .urp file, we imported sourceL. It is a special reactive construct that allows you to set up handlers that cause side effects (are transactions) when the value inside the SourceL changes. So what is happening is we have created one of these on the server, in player_remote, and sent it back to the client. The client then curries the set function with that source, producing a single argument function that just takes the value to be updated. We hand this function to javascript, so that in our FFI code, we can just set values into this, and it can reactively cause stuff to happen in our server-side code.
+
+
+ The reactive component on the page is the <dyn> tag, which is a special construct that allows side-effect free operations on sources. signal s grabs the current value from the source s, and in this case we just return this, but we could do various things to it. The result of the block is what the value of the <dyn> tag is. In this case, we have just made a place where we can stick HTML, by calling set s some_html.
+
+   The remote component is where most of the logic of the player resides. By now, you should be able to read most of what’s going on. Some points to highlight: this is where we create the SourceL that we will pass back, setting its initial value to offset. Also, fresh is a way of generating identifiers to use within html. Our render function will use this identifier for the player, which is necessary for the javascript FFI to know where it is. Finally, bless is a function that turns strings into urls, by checking them against the policy outlined in the .urp file for the application.
+
+
*)
+and player_remote token =
+ n <- est_now ();
+ op <- oneOrNoRows1 (SELECT * FROM u WHERE (u.Token = {[token]}));
+ case op of
+ None =>
+ clear_token ();
+ redirect (url (main ()))
+ | Some pi =>
+ set_token token;
+ let val show = recent_show n
+ val fmtted_date = (timef date_format show) in
+ (if fmtted_date <> pi.Date then
+ (* Need to switch to new day *)
+ dml (UPDATE u SET Date = {[fmtted_date]}, Offset = 0.0 WHERE Token = {[token]})
+ else
+ return ());
+ let val offset = (if fmtted_date = pi.Date then pi.Offset else 0.0)
+ val video_url = bless (strcat "http://dncdn.dvlabs.com/ipod/dn"
+ (strcat fmtted_date ".mp4"))
+ val audio_url = bless (strcat "http://traffic.libsyn.com/democracynow/dn"
+ (strcat fmtted_date "-1.mp3")) in
+ os <- SourceL.create offset;
+ player_id <- fresh;
+
+ return {Player = player_id, Show = show, Offset = offset,
+ Source = os, Video = video_url, Audio = audio_url}
+ end
+ end
+
+
+(*
+
+   The next three functions are simple - the first just renders the actual player. Note that we use the player_id we generated in player_remote. Then we provide a way to forget the player (if you want to unlink two devices, forget the player on one and create a new one). Also, due to some imperfections in how we keep the time in sync (mostly down to quirks in different browsers’ implementations of the HTML5 video/audio APIs), to seek backwards or start the show over we need to tell the server explicitly, so we provide a handler to do that.
+
+
*)
+and render token player_id date =
+ <xml><h2>
+ <a href="http://democracynow.org">Democracy Now!</a> Player</h2>
+ {about ()}
+ <h3>{[timef "%A, %B %e, %Y" date]}</h3>
+ <div id={player_id}></div>
+ <br/><br/><br/>
+ <form>
+ <submit action={start_over token} value="Start Show Over"/>
+ </form>
+ <form>
+ <submit action={forget} value="Forget This Device"/>
+ </form>
+ {footer ()}
+ </xml>
+
+(* Drop the cookie, so that client will not auto-redirect to player *)
+and forget () =
+ clear_token ();
+ redirect (url (main ()))
+
+(* Because of browser quirks, this is the only way to get to an earlier time, synchronized *)
+and start_over token () =
+ dml (UPDATE u SET Offset = 0.0 WHERE Token = {[token]});
+ redirect (url (player token))
+
+(*
+
+ Now we get to the last web handlers. The first one is a client side initializer. The main thing it sets up is a handler to rpc to the server whenever the offset SourceL changes. The call is to update (which we’ll define in a moment), and it optionally returns a new time to set the client to.
+
+
+ This may sound a little odd, but the basic situation is that you play part of the way through the show on one device, then pause, watch some on another device, and now hit play on the first device. It will POST a new time, but the server will tell it that it should actually be at a later time, and so we use the javascript FFI function set_offset to set the offset.
+
+
+ Finally we make it so that the client silently fails if the connection fails (this is bad behavior, but simple), and call the javascript FFI initialization function, which will set up the player and any HTML5 API related stuff.
+
+
*)
+and init token player_id os set_offset video_url audio_url =
+ SourceL.onChange os (fn offset => newt <- rpc (update token offset);
+ case newt of
+ None => return ()
+ | Some time => Dnjs.set_offset time);
+ offset <- SourceL.get os;
+ onConnectFail (return ());
+ Dnjs.init player_id offset set_offset video_url audio_url
+
+(*
+
+   The last function is the simple handler that is called when the offset SourceL changes. It updates the stored time if the posted offset is greater than the recorded one (this is why we need the start_over handler), and otherwise returns the recorded offset for the client to jump to.
+
+
*)
+and update token offset =
+ op <- oneOrNoRows1 (SELECT * FROM u WHERE (u.Token = {[token]}));
+ case op of
+ None => return None
+ | Some r => (if offset > r.Offset then
+ dml (UPDATE u SET Offset = {[offset]}
+ WHERE Token = {[token]} AND {[offset]} > Offset);
+ return None
+ else return (Some r.Offset))
+
+ sourceL.urs
+
+
(*
+
+ This came from a supplemental standard library, and, as explained earlier, allows you to create source-like containers that call side-effecting handlers when their values change.
+
+
*)
+(* Reactive sources that accept change listeners *)
+
+con t :: Type -> Type
+
+val create : a ::: Type -> a -> transaction (t a)
+
+val onChange : a ::: Type -> t a -> (a -> transaction {}) -> transaction {}
+
+val set : a ::: Type -> t a -> a -> transaction {}
+val get : a ::: Type -> t a -> transaction a
+val value : a ::: Type -> t a -> signal a
+
+ sourceL.ur
+
+
(*
+
+ The sourceLs are built on top of normal sources, and just call the OnSet function when you call set.
+
+
*)
+
+con t a = {Source : source a,
+ OnSet : source (a -> transaction {})}
+
+fun create [a] (i : a) =
+ s <- source i;
+ f <- source (fn _ => return ());
+
+ return {Source = s,
+ OnSet = f}
+
+fun onChange [a] (t : t a) f =
+ old <- get t.OnSet;
+ set t.OnSet (fn x => (old x; f x))
+
+fun set [a] (t : t a) (v : a) =
+ Basis.set t.Source v;
+ f <- get t.OnSet;
+ f v
+
+fun get [a] (t : t a) = Basis.get t.Source
+
+fun value [a] (t : t a) = signal t.Source
+
+ dnjs.urs
+
+
(*
+
+ This is the signature file for our javascript FFI. It declares what functions will be exported to be accessible within Ur/Web, and what types they have.
+
+
*)
+val init : id -> (* id for player container *)
+ float -> (* offset value *)
+ (float -> transaction unit) -> (* set function *)
+ url -> (* video url *)
+ url -> (* audio url *)
+ transaction unit
+
+val set_offset : float -> transaction unit
+
+ dn.js
+
+
/*
+
+   Since this is an adventure in Ur/Web, not Javascript, and there are plenty of places to learn about the quirks and features of HTML5 media APIs (and I don’t claim to be an expert), I’m just going to paste the code in without detailed commentary. The only point worth looking at is how we use setter, which you will remember is a curried function that will be updating a SourceL, causing rpcs to update the time. To call functions from the FFI, you use execF, and to force a transaction to actually occur, you have to apply the function (to anything), so we end up with double applications.
+
+
+ Other than that, all that is here is some browser detection (as different browsers have different media behavior) and preferences about media type in localstorage.
+
+
*/
+function init(player, offset, setter, video_url, audio_url) {
+ // set up toggle functionality
+ $("#"+player).after("<button id='toggle'>Switch to " +
+ (prefersVideo() ? "audio" : "video") + "</button>");
+ $("#toggle").click(function () {
+ window.localStorage["dn-prefers-video"] = !prefersVideo();
+ location.reload();
+ });
+
+ // put player on the page
+ if (canPlayVideo() && prefersVideo()) {
+ $("#"+player).html("<video id='player' width='320' height='180' controls src='" +
+ video_url + "'></video>");
+ } else {
+ $("#"+player).html("<audio id='player' width='320' controls src='" +
+ audio_url + "'></audio>");
+ }
+
+ // seek / start the player, if applicable
+ if (isDesktopChrome()) {
+ $("#player").one("canplay", function () {
+ var player = this;
+ if (offset != 0) {
+ player.currentTime = offset;
+ }
+ player.play();
+ window.setInterval(update_time(setter), 1000);
+ });
+ } else if (isiOS() || isAndroidChrome()) {
+ // iOS doesn't let you seek till much later... and won't let you start automatically,
+ // so calling play() is pointless
+ $("#player").one("canplaythrough",function () {
+ $("#player").one("progress", function () {
+ if (offset != 0) {
+ $("#player")[0].currentTime = offset;
+ }
+ window.setInterval(update_time(setter), 1000);
+ });
+ });
+ } else {
+ $("#player").after("<h3>As of now, the player does not support your browser.</h3>");
+ }
+}
+
+function set_offset(time) {
+ var player = $("#player")[0];
+ if (time > player.currentTime) {
+ player.currentTime = time;
+ }
+
+}
+
+// the function that grabs the time and updates it, if needed
+function update_time(setter) {
+ return function () {
+ var player = $("#player")[0];
+ if (!player.paused) {
+ // a transaction is a function from unit to value, hence the extra call
+ execF(execF(setter, player.currentTime), null)
+ }
+ };
+}
+
+// browser detection / preference storage
+
+function canPlayVideo() {
+ var v = document.createElement('video');
+ return (v.canPlayType && v.canPlayType('video/mp4').replace(/no/, ''));
+}
+
+function prefersVideo() {
+ return (!window.localStorage["dn-prefers-video"] || window.localStorage["dn-prefers-video"] == "true");
+}
+
+function isiOS() {
+ var ua = navigator.userAgent.toLowerCase();
+ return (ua.match(/(ipad|iphone|ipod)/) !== null);
+}
+
+function isDesktopChrome () {
+ var ua = navigator.userAgent.toLowerCase();
+ return (ua.match(/chrome/) !== null) && (ua.match(/mobile/) == null);
+}
+
+function isAndroidChrome () {
+ var ua = navigator.userAgent.toLowerCase();
+ return (ua.match(/chrome/) !== null) && (ua.match(/android/) !== null);
+}
+
+ Makefile
+
+
+   To actually build our application, we first have to build our C library. Then we build the app, using the sqlite backend. To get it running, we then need to do sqlite3 dn.db < dn.sql (note: you only need to do this once) and then start the server with ./dn.exe. You can then visit the application at http://localhost:8080. This has been tested on current Debian Linux and Mac OSX.
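The Makefile itself isn’t reproduced here, so as a rough sketch of what those steps look like (the include path, target names, and compiler flags are assumptions about a typical Ur/Web installation, not the author’s actual build file):

```make
# Hypothetical sketch - adjust URWEB_INC to wherever urweb.h is installed.
URWEB_INC = /usr/local/include/urweb

all: dn.exe

# Build the C FFI library first.
random.o: random.c random.h
	gcc -c -I$(URWEB_INC) -o random.o random.c

librandom.a: random.o
	ar rcs librandom.a random.o

# Then build the app against the sqlite backend; dn.urp drives the rest.
dn.exe: librandom.a dn.urp dn.ur dn.urs
	urweb -dbms sqlite dn

clean:
	rm -f random.o librandom.a dn.exe
```

After make, initialize the database once with sqlite3 dn.db < dn.sql and start the server with ./dn.exe.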
+
+
+
+
+ A Hacker’s Replacement for GMail
+
+
+ Note: Since writing this I’ve replaced Exim with Postfix and Courier with Dovecot. This is outlined in the Addendum, but the main text is unchanged. Please read the whole guide before starting, as you can skip some of the steps and go straight to the final system.
+
+
+ Motivation
+
+
+ I reluctantly switched to GMail about six months ago, after using many so-called “replacements for GMail” (the last of which was Fastmail). All of them were missing one or more features that I require of email:
+
+
+
+ Access to the same email on multiple machines (but, these can all be machines I control).
+
+
+ Access to important email on my phone (Android). Sophisticated access not important - just a high-tech pager.
+
+
+ Ability to organize messages by threads.
+
+
+ Ability to categorize messages by tags (folders are not sufficient).
+
+
+ Good search functionality.
+
+
+
+   But, while GMail has all of these things, there were nagging reasons why I still wanted an alternative: handing an advertising company most of my personal and professional correspondence seems like a bad idea, having no (meaningful) way to either sign or encrypt email is unfortunate, and while it isn’t a true deal-breaker, having lightweight programmatic access to my email is a really nice thing (you can get a really rough approximation of this with the RSS feeds GMail provides). Furthermore, I’d be happy if I only got important email on my phone (ie, I want a whitelist on the phone - unexpected email is not something that I need to respond to all the time, and this allows me to elevate the notification for these messages, as they truly are important).
+
+
+   Over the past several months, I gradually put together a mail system that provides all the required features, as well as the three bonuses (encryption, easy programmatic access, and phone whitelisting). I’m describing it as a “Hacker’s Replacement for GMail” as opposed to just a “Replacement for GMail” because it involves a good deal of familiarity with Unix (or at least, it did for me to set up and debug the whole system; perhaps just following along is easier). But the end result is powerful enough that, for me, it is worth it. I recently switched over to using it as my primary mail, confirming that it all works as expected. I wanted to share the instructions in case they prove useful to someone else setting up a similar system.
+
+
+ This is somewhere between an outline and a HOWTO. I’ve organized it roughly in the order I set things up, but some of the parts are more sketches than detailed instructions - supplement them with the normal documentation. Most parts are based on notes taken as I did things; only a few were reconstructed afterwards. In general, I try to highlight the parts that were difficult or undocumented, and gloss over stuff that should be easy (and/or point to detailed docs). Without further ado:
+
+
+ Overall Design
+
+
+
+ Debian GNU/Linux as mail server operating system (both Linux and Mac as clients, though Windows should be doable)
+
+
+ Exim4 as the mail server
+
+
+ Courier-IMAP for mobile usage
+
+
+ Spamassassin (with Pyzor) for spam
+
+
+ notmuch to manage the email database+tags+search
+
+
+ afew for managing notmuch tagging/email moving
+
+ Mail is received by the mail server and put in an Archive subdirectory, which is not configured for push in K9-Mail. The mail is processed and tagged by afew, and any messages with the tag “important” are moved into the Important subdirectory. That directory is set up for push in K9-Mail, so I get all important email right away. No further tagging can be done through the mobile device, but that wasn’t a requirement. Read/unread status is synced both ways with notmuch, which is important.
+
+
+ Step By Step Instructions
+
+
+
+
+ The first and most important part is having a server. I’ve been really happy with VPSes I have from Digital Ocean (warning: that’s a referral link. Here’s one without.) - they provide big-enough VPSes for email and a simple website for $5/month. There are also many other providers. The important thing is to get a server, if you don’t already have one.
+
+
+
+
+ The next thing you’ll need is a domain name. You can use a subdomain of one you already have, but the simplest thing is to just get a new one. This is $10-15/year. Once you have it, you want to set a few records (these are set in the “Zone File”, and should be easy to set up through the online control panel of whatever registrar you used):
+
+
+
+
A mydomain.com. IP.ADDR.OF.SERVER (mydomain.com. might be written @)
MX 10 mydomain.com.
+
+ This sets the domain to point to your server, and sets the mail record to point to that domain name. You will also need to set up a PTR record, or reverse DNS. If you got the server through Digital Ocean, you can set up the DNS records through them, and they allow you to set the PTR record for each server easily. Wherever you set it up, it should point at mydomain.com. (Note the trailing period - otherwise it will resolve to mydomain.com.mydomain.com, which is not what you want!)
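+ You can check all three records from any machine with dig (the domain and IP are placeholders, as above):

dig +short A  mydomain.com       # should print IP.ADDR.OF.SERVER
dig +short MX mydomain.com       # should print "10 mydomain.com."
dig +short -x IP.ADDR.OF.SERVER  # the PTR: should print "mydomain.com."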
+
+
+
+ Now set up the mail server itself. I use Debian, but it shouldn’t be terribly different with other distributions (though you should follow their instructions, not the ones I link to here, because I’m sure there are specifics that depend on how Debian sets things up). Since Debian uses Exim4 by default, I used that, and set up Courier as an IMAP server. I followed these instructions: blog.edseek.com/~jasonb/articles/exim4_courier/ (sections 2, 3, and 4). The only important thing I had to change was to force the hostname, by finding the line in /etc/exim4/exim4.conf.template that looks like:
+
+
+
.ifdef MAIN_HARDCODE_PRIMARY_HOSTNAME
+
+ And adding above it, MAIN_HARDCODE_PRIMARY_HOSTNAME = mydomain.com (no trailing period). This is so that the hostname the mail server announces matches the domain; if it doesn’t, some mail servers won’t deliver messages. At this point, you can test the mail server by sending yourself emails, by using the swaks tool, or by running it through an online testing tool like MX Toolbox.
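+ With swaks, a test delivery looks something like this (addresses are placeholders; --server, --to, and --from are standard swaks options):

# send a test message through the new server and watch the SMTP dialogue
swaks --server mydomain.com --to you@mydomain.com --from someone@example.org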
+
+
+ The last important thing is to set up spam filtering. When using a big email provider that spends a lot of effort filtering spam (and has huge data sets to do it), it’s easy to forget how much spam is actually sent. But, fortunately open source software is also capable of eliminating it. To set Spamassassin up, I generally followed the documentation on the debian wiki. I changed the last part of the configuration so that instead of changing the subject for spam messages to have “***SPAM***”, it adds the following header:
+
+
add_header = X-Spam-Flag: YES
+
+ This is the header that the default spam filter from afew will look for and tag messages as spam with. Once messages are tagged as spam, they won’t show up in searches, won’t ever end up in your inbox, etc. On the other hand, they aren’t ever deleted, so if something does end up there, you can always find it (you just have to use notmuch search with the --exclude=false parameter).
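+ For example, to list everything that is normally excluded as spam:

notmuch search --exclude=false tag:spam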
+
+
+ That sets up basic Spamassassin, which works quite well. To make it work even better, we’ll install Pyzor, which is a service for collaborative spam filtering (sort of an open source system that gets you similar behavior to what GMail can do by having access to so many people’s email). It works by constructing a digest of the message and hashing it, and then sending that hash to a server to see if anyone has marked it as spam.
+
+
+ Install pyzor with aptitude install pyzor, then run pyzor discover (as root), and at least on my system, I needed to run chmod a+r /etc/mail/spamassassin/servers (as root) in order to have it work (the following test command would report permission denied on that file if I didn’t). Now restart spamassassin (/etc/init.d/spamassassin restart) and test that it’s working, by running:
+
+
echo "test" | spamassassin -D pyzor 2>&1 | less
+
+ This should print (among other things):
+
+
Jun 29 16:31:53.026 [24982] dbg: pyzor: network tests on, attempting Pyzor
Jun 29 16:31:54.640 [24982] dbg: pyzor: pyzor is available: /usr/bin/pyzor
Jun 29 16:31:54.641 [24982] dbg: pyzor: opening pipe: /usr/bin/pyzor --homedir ...
Jun 29 16:31:54.674 [24982] dbg: pyzor: [25043] finished: exit 1
Jun 29 16:31:54.674 [24982] dbg: pyzor: check failed: no response
+
+ According to the documentation, this is expected, because “test” is not a valid message.
+
+
+
+ Now we want to set up our delivery. Create a .forward file in the home directory of the account on the server that is going to receive mail. It should contain:
+
+
+
# Exim filter
save Maildir/.Archive/
+
+ What this does is put all mail that is received into the Archive subdirectory (the leading dots are a convention of the Maildir layout that Courier-IMAP uses).
+
+
+
+ Next, we want to set up notmuch. You can install it and the python bindings (needed by afew) with:
+
+
+
aptitude install notmuch python-notmuch
+
+
+
+ Run notmuch setup and put in your name, email, and make sure that the directory to your email archive is “/home/YOURUSER/Maildir”. Run notmuch new to have it create the directories and, if you tested the mail server by sending yourself messages, import those initial messages.
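+ Concretely (notmuch stores the answers in ~/.notmuch-config; the path must match your Maildir):

notmuch setup   # interactive: name, email, database path = /home/YOURUSER/Maildir
notmuch new     # index whatever is already in the Maildir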
+
+
+
+
+ Install afew from github.com/teythoon/afew. You can start with the default configuration, and then add filters that will add the tag ‘important’, as well as any other automatic tagging you want. I commented out the ClassifyingFilter because it wasn’t working - and since I wasn’t convinced I wanted it, I didn’t bother to figure out how to get it to work.
+
+ Some simple filters look like:


[Filter.0]
message = messages I don't care about
query = subject:Deal
tags = -unread +deals
+
+ For the [MailMover] section, you want the configuration to look like:
+
+
[MailMover]
folders = Archive Important
max_age = 15
# rules
Archive = 'tag:important AND NOT tag:spam':.Important
Important = 'NOT tag:important':.Archive 'tag:spam':.Archive
+
+ This says to take anything in Archive with the important tag and put it in Important (but never spam). Note that the destination folders are prefixed with a dot, but the names in the folders line aren’t. Now we need to set everything up to run automatically.
+
+
+
+ We are going to use inotify, and specifically the tool incron, to watch for changes in our .Archive inbox and add files to the database, tag them, and move those that should be moved to .Important. On Debian, you can obtain incron with:
+
+
+
aptitude install incron
+
+ Now edit your incrontab (similar to crontab) with incrontab -e and put an entry like:
+
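+ Based on the description that follows (watching the new/ subdirectory of .Archive for IN_MOVED_TO, with IN_NO_LOOP, and calling the processing script), the entry looks something like this - the paths are examples, adjust them for your user:

/home/YOURUSER/Maildir/.Archive/new IN_MOVED_TO,IN_NO_LOOP /usr/local/bin/my-notmuch-new.sh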
+ This says that we want to watch for IN_MOVED_TO events, and that we don’t want to listen for new events while the script is running (if something goes wrong in your importing script, you can cause infinite spawning of processes, which will take down the server). If a message is delivered while the script is running, it might not get picked up until the next run, but for me that was fine (you may want to eliminate the IN_NO_LOOP option and see whether it actually causes loops; in previous configurations I crashed my server twice through process-spawning loops, and didn’t want to do it again while debugging). When IN_MOVED_TO occurs, we call a script we’ve written. You can put this script anywhere; just make it executable:
+
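+ A minimal sketch of that script (the afew flags are assumptions based on the version of the era; the redirects keep it silent):

#!/bin/bash
# /usr/local/bin/my-notmuch-new.sh
# Index new mail, run the afew tagging filters, then move tagged mail
# between folders. All output is silenced: output from incron/cron jobs
# becomes email, which could retrigger this script in a loop.
notmuch new        > /dev/null 2>&1
afew --tag --new   > /dev/null 2>&1
afew --move-mails  > /dev/null 2>&1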
It is intentionally being very quiet because output from cron jobs will trigger emails… and thus, if there were a mistake, we could be in infinite-loop land again. This means you should make sure the commands are working (i.e., that there aren’t mistakes in your config files), because you won’t see any debug output from them when they are run through this script.
+
+
+
+ Now let’s set up the mobile client. I’m not sure of a good way to do this on iOS (aside from just manually checking the Important folder), but perhaps a motivated person could figure it out. Since I have an Android phone, it wasn’t an issue. On Android, install K9-Mail, and set up your account with the incoming / outgoing mail server to be just ‘mydomain.com’. Click on the account, and it will show just Inbox (not helpful). Hit the menu button, then click folders, and check “display all folders”. Now hit the menu again and click folders and hit “refresh folders”.
+
+
+
+ Provided at least one message has been put into Important and Archive, those should both show up now. Open the folder ‘Important’ and use the settings to enable push for it. Also add it to the Unified Inbox. Similarly, disable push on the Inbox (this latter doesn’t really matter, because we never deliver messages to the inbox). If you have trouble finding these settings (which I did for a while), note that the settings that are available are contingent upon the screen you are on. The folders settings only exist when you are looking at the list of folders (not the unified inbox / list of accounts, and not the contents of a folder).
+
+
+
+ Finally, the desktop client. I’m using the emacs client, because I spend most of my time inside emacs, but there are several other clients - one for vim, a curses-based one called ‘bower’ (which I’ve used before, but it is less featureful than the emacs one), and a few others. alot, a python client, won’t work, because it assumes that the notmuch database is local (which is a really stupid assumption). The rest just assume that notmuch is in the path. This means that you can follow the instructions at notmuchmail.org/remoteusage to have the desktop use the mail database on the server. To test, run notmuch count on your local machine; it should return the same thing (the total number of messages) as it does on the mail server.
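+ The remoteusage approach boils down to a local wrapper script that runs notmuch on the server over ssh - roughly this (assuming passwordless ssh keys; see the page for a more robust version):

#!/bin/sh
# ~/bin/notmuch -- run the real notmuch on the mail server
exec ssh mydomain.com notmuch "$@"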
+
+
+
+ Once this is working, install notmuch locally, so that you get the emacs bindings (or, just download the source and put the contents of the emacs folder somewhere and include it in your .emacs). You should now be able to run M-x notmuch in emacs and get to your inbox. Setting up mail sending is a little trickier - most of the documentation I found didn’t work!
+
+
+ The first thing to do, in case your ISP is like mine and blocks port 25, is to change the default listening port for the server. Open up /etc/default/exim4 and set SMTPLISTENEROPTIONS equal to -oX 25:587 -oP /var/run/exim4/exim.pid. This will have it listen on both 25 and 587.
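+ In /etc/default/exim4 the line looks like:

SMTPLISTENEROPTIONS='-oX 25:587 -oP /var/run/exim4/exim.pid'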
+
+
+ Next, set up emacs to use your mail server to send mail, and to load notmuch. This incantation in your .emacs should do the trick:
+
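+ A sketch of that incantation (standard Emacs 24 smtpmail variables; substitute your domain, and the load-path lines only apply if you installed the elisp files by hand):

;; If you opted to just stick the elisp files somewhere, add that path here:
;; (add-to-list 'load-path "~/path/folder/with/emacs-notmuch")
(require 'notmuch)
;; send outgoing mail through the server's submission port with STARTTLS
(setq message-send-mail-function 'smtpmail-send-it
      smtpmail-smtp-server "mydomain.com"
      smtpmail-smtp-service 587
      smtpmail-stream-type 'starttls)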
+ Now eval your .emacs (or restart emacs), and you are almost ready to send mail.
+
+
+ You just need to put a line like this into ~/.authinfo:
+
+
machine mydomain.com login MYUSERNAME password MYPASSWORD port 587
+
+ With appropriate permissions (chmod 600 ~/.authinfo).
+
+
+ Now you can test this by typing C-x m, or M-x notmuch and then hitting the ‘m’ key - both of these open the composition window. Type a message and its recipient, then type C-c C-c to send it. It should take a second and then say at the bottom of the window that it was sent.
+
+
+ This should work as-is on Linux. Another machine I sometimes use is a Mac, where things are a little more complicated. The main problem is that to send mail, we need starttls. You can install gnutls through Homebrew, Fink, or MacPorts, but the next problem is that if you are using Emacs installed from emacsformacosx.com (and thus a graphical application), it is not started from a shell, which means it doesn’t have the same path, and thus doesn’t know how to find gnutls. To fix this problem (which is a more general one), you can install a tiny Emacs package called exec-path-from-shell (this requires Emacs 24, which you should use anyway - install it with M-x package-install) that interrogates a shell about what the path should be. Then we just have to tell Emacs to use gnutls, and all should work. We can do this in a platform-specific way (so it won’t run on other platforms):
+
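+ A platform-guarded sketch (exec-path-from-shell plus the starttls variables; gnutls-cli comes from the gnutls install mentioned above):

(when (memq window-system '(mac ns))
  ;; pull PATH from a login shell so gnutls-cli can be found
  (exec-path-from-shell-initialize)
  ;; use the command-line gnutls client for STARTTLS
  (setq starttls-use-gnutls t
        starttls-gnutls-program "gnutls-cli"))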
+
+
+ Address lookup. It’s really nice to have an address book based on messages in your mailbox. An easy way to do this is to install addrlookup: get the source from http://github.com/spaetz/vala-notmuch/raw/static-sources/src/addrlookup.c, build with
+
+
+
cc -o addrlookup addrlookup.c `pkg-config --cflags --libs gobject-2.0` -lnotmuch
+
+ and move the resulting binary into your path (all of this on your server), and then create a similar wrapper as for notmuch:
+
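+ ~/bin/addrlookup, following the same ssh pattern as the notmuch wrapper (again assuming passwordless ssh keys):

#!/bin/sh
# ~/bin/addrlookup -- run addrlookup on the mail server
exec ssh mydomain.com addrlookup "$@"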
+
+ Now if you hit “TAB” after you start typing in an address, it will prompt you with completions (use up/down arrow to move between, hit enter to select).
+
+
+ Conclusion
+
+
+ Congratulations! You now have a mail system that is more powerful than GMail and completely controlled by you. And there is a lot more you can do. For example, to enable encryption (to start, just signing emails), install gnupg, create a key and associate it with your email address, and add the following line to your .emacs and all messages will be signed by default (it adds a line in the message that when you send it causes emacs to sign the email. Note that this line must be the first line, so add your message below it):
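+ The line in question is presumably the standard mml hook below - it makes each new message start with a <#secure ...> tag, which is what triggers the signing when you send:

;; sign all outgoing messages by default (inserts a <#secure ...> tag
;; as the first line of the message body)
(add-hook 'message-setup-hook 'mml-secure-message-sign-pgpmime)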
+
+ An unfortunate current limitation is that signatures are checked by the notmuch command line on the server, so you need to install public keys there. This is fine, except that the emacs client installs them locally when you click on an unknown key (hit $ when viewing a message to see the signatures). So, at least for now, you have to manually add keys on the server with gpg --recv-key KEYID before they will show up as verified in the client (signing/encrypting still works, because that is done locally). Hopefully this will be fixed soon.
+
+
+ Added July 9th, 2013:
+
+
+ Addendum
+
+
+ Among the large amount of feedback I received on this post, many people recommended that I use Postfix and Dovecot instead of Exim and Courier - Postfix because of security (Exim has a less-than-stellar history), and Dovecot because it is simpler and faster than Courier (and, more importantly, frequently combined with Postfix). Security is really important to me (I want this system to be easy to maintain), so I decided to switch. Since I’m not doing anything particularly complicated with the mail server / IMAP, the conversion was relatively straightforward. If you are reading this before setting anything up, I’d suggest just doing this from the start (substituting it for the parts setting up Exim / Courier); but if you’ve already followed the instructions (as I had), here is what you should do to change over. Note that I got much of this information from guides at syslog.tv, modified as needed.
+
+
+
+ Install postfix and dovecot with (accept the replacement policy):
+
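+ Presumably something along these lines (Debian package names of the era; dovecot-imapd pulls in the IMAP server):

aptitude install postfix dovecot-imapd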
+ Add this to the end of /etc/postfix/main.cf, to tell Postfix to use Maildir (the sasl settings also live in this file):


home_mailbox = Maildir/

+ And add this to the end of /etc/postfix/master.cf:
+
+
spamassassin unix - n n - - pipe
user=spamd argv=/usr/bin/spamc -f -e
/usr/sbin/sendmail -oi -f ${sender} ${recipient}
+
+ NOTE: It’s been pointed out to me that you may not have a spamd user on your system; if you don’t, this won’t work. So check that, and add the user if it’s missing.


+ And add this at the beginning, right after the line smtp inet n ...:


-o content_filter=spamassassin

+ And uncomment the line starting with ‘submission’ and put the following after it:


-o syslog_name=postfix/submission

+ Then create a ~/.procmailrc so that delivered mail is handled by procmail (this assumes procmail is installed and set as the delivery agent, e.g. mailbox_command = procmail in main.cf). It should contain:
:0 c
.Archive/
:0
| /usr/local/bin/my-notmuch-new.sh
+
+ This says to copy the message to the archive and then run my-notmuch-new.sh (the shell script that used to be called by incron). Technically it pipes the message to the script, but the script ignores standard in, so it is equivalent to just calling the script. Now fix the permissions:
+
+
chmod 600 .procmailrc
+
+ Remove incron, which we aren’t using anymore.
+
+
sudo aptitude remove incron
+
+
+ Fix up spamassassin.
+
+
+
+ Get the top of /etc/spamassassin/local.cf to look like:
+
+
rewrite_header Subject
# just add good headers
add_header spam Flag _YESNOCAPS_
add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_
-
This adds the proper headers so that afew recognizes and tags as spam accordingly. And that should be it!
-
-
I’m not sure of a way to tell K9Mail that the certificate on the IMAP server has changed, so I just deleted the account and recreated it.
-
-
Note: if you find any mistakes in this, or parts that needed additional steps, let me know and I’ll correct/add to this.
-
-
+
+ This adds the proper headers so that afew recognizes and tags as spam accordingly. And that should be it!
+
+
+
+ I’m not sure of a way to tell K9Mail that the certificate on the IMAP server has changed, so I just deleted the account and recreated it.
+
+
+
+ Note: if you find any mistakes in this, or parts that needed additional steps, let me know and I’ll correct/add to this.
+
Note: Since writing this I’ve replaced Exim with Postfix and Courier with Dovecot. This is outlined in the Addendum, but the main text is unchanged. Please read the whole guide before starting, as you can skip some of the steps and go straight to the final system.

Motivation

I reluctantly switched to GMail about six months ago, after using many so-called “replacements for GMail” (the last of which was Fastmail). All of them were missing one or more features that I require of email:
- Access to the same email on multiple machines (but, these can all be machines I control).
- Access to important email on my phone (Android). Sophisticated access not important - just a high-tech pager.
- Ability to organize messages by threads.
- Ability to categorize messages by tags (folders are not sufficient).
- Good search functionality.
But, while GMail has all of these things, there were nagging reasons why I still wanted an alternative: handing an advertising company most of my personal and professional correspondence seems like a bad idea, having no (meaningful) way to either sign or encrypt email is unfortunate, and while it isn’t a true deal-breaker, having lightweight programmatic access to my email is a really nice thing (you can get a really rough approximation of this with the RSS feeds GMail provides). Furthermore, I’d be happy if I only got important email on my phone (ie, I want a whitelist on the phone - unexpected email is not something that I need to respond to all the time, and this allows me to elevate the notification for these messages, as they truly are important).

Over the past several months, I gradually put together a mail system that provides all the required features, as well as the three bonuses (encryption, easy programmatic access, and phone whitelisting). I’m describing it as a “Hacker’s Replacement for GMail” as opposed to just a “Replacement for GMail” because it involves a good deal of familiarity with Unix (or at least, setting up and debugging the whole system did - perhaps following along is easier). But the end result is powerful enough that, for me, it is worth it. I recently switched over to using it as my primary system, confirming that it all works as expected. I wanted to share the instructions in case they prove useful to someone else setting up a similar system.

This is somewhere between an outline and a HOWTO. I’ve organized it roughly in the order I set things up, but some of the parts are more sketches than detailed instructions - supplement them with the normal documentation. Most parts are based on notes taken as I did them; only a few were reconstructed after the fact. In general, I try to highlight the parts that were difficult or undocumented, and gloss over the stuff that should be easy (and/or point to detailed docs). Without further ado:
Overall Design

- Debian GNU/Linux as the mail server operating system (both Linux and Mac as clients, though Windows should be doable)
- Exim4 as the mail server
- Courier-IMAP for mobile usage
- Spamassassin (with Pyzor) for spam
- notmuch to manage the email database+tags+search
- afew for managing notmuch tagging/email moving

Mail is received by the mail server and put in an Archive subdirectory, which is not configured for push in K9-Mail. The mail is processed and tagged by afew, and any messages with the tag “important” are moved into the Important subdirectory. This directory is set up for push in K9-Mail, so I get all important email right away. No further tagging can be done through the mobile device, but that wasn’t a requirement. Read/unread status is synced two-way to notmuch, which is important.
Step By Step Instructions

The first and most important part is having a server. I’ve been really happy with the VPSes I have from Digital Ocean (warning: that’s a referral link. Here’s one without.) - they provide big-enough VPSes for email and a simple website for $5/month. There are also many other providers. The important thing is to get a server, if you don’t already have one.
The next thing you’ll need is a domain name. You can use a subdomain of one you already have, but the simplest thing is to just get a new one. This is $10-15/year. Once you have it, you want to set a few records (these are set in the “Zone File”, and should be easy to set up through the online control panel of whatever registrar you used):

A mydomain.com. IP.ADDR.OF.SERVER (mydomain.com. might be written @)
MX 10 mydomain.com.

This sets the domain to point to your server, and sets the mail record to point to that domain name. You will also need to set up a PTR record, or reverse DNS. If you got the server through Digital Ocean, you can set up the DNS records through them, and they allow you to set the PTR record for each server easily. Wherever you set it up, it should point at mydomain.com. (note the trailing period - otherwise it will resolve to mydomain.com.mydomain.com, which is not what you want!).
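Once the records have propagated, you can sanity-check them with dig (mydomain.com and IP.ADDR.OF.SERVER are the placeholders from above - substitute your real values):

```shell
dig +short A mydomain.com        # should print IP.ADDR.OF.SERVER
dig +short MX mydomain.com       # should print "10 mydomain.com."
dig +short -x IP.ADDR.OF.SERVER  # the PTR record: should print "mydomain.com."
```

If the PTR query prints anything else, sending mail servers may refuse your messages.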
Now set up the mail server itself. I use Debian, but it shouldn’t be terribly different with other distributions (but you should follow their instructions, not the ones I link to here, because I’m sure there are specifics that depend on how Debian sets things up). Since Debian uses Exim4 by default, I used that, and set up Courier as an IMAP server. I followed these instructions: blog.edseek.com/~jasonb/articles/exim4_courier/ (sections 2, 3, and 4). The only important thing I had to change was to force the hostname, by finding the line in /etc/exim4/exim4.conf.template that looks like:

.ifdef MAIN_HARDCODE_PRIMARY_HOSTNAME

And adding above it: MAIN_HARDCODE_PRIMARY_HOSTNAME = mydomain.com (no trailing period). This is so that the hostname the mail server displays matches the domain. If this isn’t the case, some mail servers won’t deliver messages. At this point, you can test the mail server by sending yourself emails, using the swaks tool, or running it through an online testing tool like MX Toolbox.
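For example, a quick smoke test with swaks might look like this (MYUSER and mydomain.com are placeholders for your account and domain):

```shell
# Deliver a test message through the new server and print the SMTP dialogue:
swaks --to MYUSER@mydomain.com --server mydomain.com
```

A successful run ends with the server accepting the message (a 250 response after the message body).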
The last important thing is to set up spam filtering. When using a big email provider that spends a lot of effort filtering spam (and has huge data sets to do it), it’s easy to forget how much spam is actually sent. Fortunately, open source software is also capable of eliminating it. To set Spamassassin up, I generally followed the documentation on the Debian wiki. I changed the last part of the configuration so that instead of changing the subject of spam messages to include “***SPAM***”, it adds the following header:

add_header = X-Spam-Flag: YES

This is the header that the default spam filter from afew looks for when tagging messages as spam. Once messages are tagged as spam, they won’t show up in searches, won’t ever end up in your inbox, etc. On the other hand, they aren’t ever deleted, so if something does end up there, you can always find it (you just have to use notmuch search with the --exclude=false parameter).

That sets up basic Spamassassin, which works quite well. To make it work even better, we’ll install Pyzor, which is a service for collaborative spam filtering (sort of an open source system that gets you behavior similar to what GMail can do by having access to so many people’s email). It works by constructing a digest of the message, hashing it, and sending that hash to a server to see if anyone has marked it as spam.

Install pyzor with aptitude install pyzor, then run pyzor discover (as root). At least on my system, I also needed to run chmod a+r /etc/mail/spamassassin/servers (as root) in order to have it work (the following test command would report permission denied on that file if I didn’t). Now restart spamassassin (/etc/init.d/spamassassin restart) and test that it’s working, by running:
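A command along these lines would match the note below (the exact incantation is an assumption, based on the Debian wiki’s Pyzor instructions):

```shell
# Run a trivial message through spamassassin with pyzor debugging enabled,
# and look for the pyzor lines in the debug output:
echo "test" | spamassassin -D pyzor 2>&1 | grep -i pyzor
```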
According to the documentation, this is expected, because “test” is not a valid message.
Now we want to set up delivery. Create a .forward file in the home directory of the account on the server that is going to receive mail. It should contain:

# Exim filter
save Maildir/.Archive/

What this does is put all mail that is received into the Archive subdirectory (the dots are a convention of the version of the Maildir format that Courier-IMAP uses).
Next, we want to set up notmuch. You can install it and the python bindings (needed by afew) with:

aptitude install notmuch python-notmuch

Run notmuch setup and enter your name and email, and make sure that the directory for your email archive is “/home/YOURUSER/Maildir”. Run notmuch new to have it create the directories and, if you tested the mail server by sending yourself messages, import those initial messages.
Install afew from github.com/teythoon/afew. You can start with the default configuration, and then add filters that will add the tag ‘important’, as well as any other automatic tagging you want to have. I commented out the ClassifyingFilter because it wasn’t working - and I wasn’t convinced I wanted it, so I didn’t bother to figure out how to get it to work.

Some simple filters look like:

[Filter.0]
message = messages from someone
query = from:someone.important@email.com
tags = +important

[Filter.1]
message = messages I don't care about
query = subject:Deal
tags = -unread +deals
For the [MailMover] section, you want the configuration to look like:

[MailMover]
folders = Archive Important
max_age = 15

# rules
Archive = 'tag:important AND NOT tag:spam':.Important
Important = 'NOT tag:important':.Archive 'tag:spam':.Archive

This says to take anything in Archive with the important tag (but never spam) and put it in Important, and to move anything in Important that has lost the important tag, or gained the spam tag, back to Archive. Note that the folders we are moving to are prefixed with a dot, but the names of the folders we are moving from aren’t. Now we need to set everything up to run automatically.
We are going to use inotify, and specifically the tool incron, to watch for changes in our .Archive inbox and add files to the database, tag them, and move those that should be moved to .Important. On Debian, you can obtain incron with:

aptitude install incron

Now edit your incrontab (similar to crontab) with incrontab -e and put an entry like:
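An entry along the following lines would match the description below (the Maildir path and the script location are assumptions taken from the surrounding text):

```
/home/MYUSER/Maildir/.Archive/new IN_MOVED_TO,IN_NO_LOOP /usr/local/bin/my-notmuch-new.sh
```

The format is path, event mask, then the command to run.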
This says that we want to watch for IN_MOVED_TO events, and that we don’t want to listen while the script is running (if something goes wrong with your importing script, you could cause an infinite spawning of processes, which will take down the server). If a message is delivered while the script is running, it might not get picked up until the next run, but for me that was fine (you may want to eliminate the IN_NO_LOOP option and see if it actually causes loops - in previous configurations, I crashed my server twice through process spawning loops, and didn’t want to do it again while debugging). When IN_MOVED_TO occurs, we call a script we’ve written. You can put this anywhere, just make it executable:
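A minimal sketch of such a script, assuming (per the description below) it just runs notmuch new and then afew’s tagging and moving, silently:

```shell
#!/bin/sh
# /usr/local/bin/my-notmuch-new.sh (reconstructed sketch)
# Discard all output: anything printed from a cron/incron job can generate
# mail, which could retrigger this script.
notmuch new >/dev/null 2>&1
afew --tag --new >/dev/null 2>&1
afew --move-mails >/dev/null 2>&1
```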
The script is intentionally very quiet, because output from cron jobs will trigger emails… and thus if there were a mistake, we could be in infinite loop land again. This means you should make sure the commands are working (ie, there aren’t mistakes in your config files), because you won’t see any debug output from them when they are run through this script.
Now let’s set up the mobile client. I’m not sure of a good way to do this on iOS (aside from just manually checking the Important folder), but perhaps a motivated person could figure it out. Since I have an Android phone, it wasn’t an issue. On Android, install K9-Mail, and set up your account with the incoming / outgoing mail server as just ‘mydomain.com’. Click on the account, and it will show just Inbox (not helpful). Hit the menu button, then click folders, and check “display all folders”. Now hit the menu again, click folders, and hit “refresh folders”.
Provided at least one message has been put into Important and Archive, those should both show up now. Open the folder ‘Important’ and use the settings to enable push for it. Also add it to the Unified Inbox. Similarly, disable push on the Inbox (this latter doesn’t really matter, because we never deliver messages to the inbox). If you have trouble finding these settings (which I did for a while), note that the settings that are available are contingent upon the screen you are on. The folder settings only exist when you are looking at the list of folders (not the unified inbox / list of accounts, and not the contents of a folder).
Finally, the desktop client. I’m using the emacs client, because I spend most of my time inside emacs, but there are several other clients - one for vim, one called ‘bower’ that is curses based (which I’ve used before, but is less featureful than the emacs one), and a few others. alot, a python client, won’t work, because it assumes that the notmuch database is local (which is a really stupid assumption). The rest just assume that notmuch is in the path. This means that you can follow the instructions here: notmuchmail.org/remoteusage to have the desktop use the mail database on the server. To test, run notmuch count on your local machine, and it should return the same thing (the total number of messages) as it does on the mail server.
Once this is working, install notmuch locally, so that you get the emacs bindings (or just download the source, put the contents of the emacs folder somewhere, and include it in your .emacs). You should now be able to run M-x notmuch in emacs and get to your inbox. Setting up mail sending is a little trickier - most of the documentation I found didn’t work!

The first thing to do, in case your ISP is like mine and blocks port 25, is to change the default listening port for the server. Open up /etc/default/exim4 and set SMTPLISTENEROPTIONS to -oX 25:587 -oP /var/run/exim4/exim.pid. This will have it listen on both 25 and 587.
Next, set up emacs to use your mail server to send mail, and to load notmuch. This incantation in your .emacs should do the trick:

;; If you opted to just stick the elisp files somewhere, add that path here:
;; (add-to-list 'load-path "~/path/folder/with/emacs-notmuch")
(require 'notmuch)
(setq smtpmail-starttls-credentials '(("mydomain.com" 587 nil nil))
      smtpmail-auth-credentials (expand-file-name "~/.authinfo")
      smtpmail-default-smtp-server "mydomain.com"
      smtpmail-smtp-server "mydomain.com"
      smtpmail-smtp-service 587)
(require 'smtpmail)
(setq message-send-mail-function 'smtpmail-send-it)
(require 'starttls)

Now eval your .emacs (or restart emacs), and you are almost ready to send mail. You just need to put a line like this into ~/.authinfo:

machine mydomain.com login MYUSERNAME password MYPASSWORD port 587

with appropriate permissions (chmod 600 ~/.authinfo).
Now you can test this by typing C-x m, or M-x notmuch and then hitting the ‘m’ key - both of these open the composition window. Type a message and the address it is to, and then type C-c C-c to send it. It should take a second and then say at the bottom of the window that it was sent.

This should work as-is on Linux. Another machine I sometimes use is a Mac, where things are a little more complicated. The main problem is that to send mail, we need starttls. You can install gnutls through Homebrew, Fink, or Macports, but the next problem is that if you are using Emacs installed from emacsformacosx.com (and thus it is a graphical application), it is not started from a shell, which means it doesn’t have the same path, and thus doesn’t know how to find gnutls. To fix this problem (which is more general), you can install a tiny Emacs package called exec-path-from-shell (this requires Emacs 24, which you should use - then M-x package-install) that interrogates a shell about what the path should be. Then we just have to tell it to use gnutls and all should work. We can do this all in a platform specific way (so it won’t run on other platforms):
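A sketch of that platform-specific setup (the gnutls-cli program name assumes gnutls was installed as described above):

```elisp
;; Only on OS X: pull $PATH from the shell, then point starttls at gnutls-cli.
(when (eq system-type 'darwin)
  (exec-path-from-shell-initialize)
  (setq starttls-use-gnutls t
        starttls-gnutls-program "gnutls-cli"))
```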
Address lookup. It’s really nice to have an address book based on the messages in your mailbox. An easy way to do this is to install addrlookup: get the source from http://github.com/spaetz/vala-notmuch/raw/static-sources/src/addrlookup.c and build it with:

cc -o addrlookup addrlookup.c `pkg-config --cflags --libs gobject-2.0` -lnotmuch

Then move the resulting binary into your path (all of this on your server), and create a similar wrapper as for notmuch:
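Following the remote-usage setup for notmuch, the wrapper is presumably a one-line ssh stand-in placed on your local $PATH as addrlookup (the user and hostname are placeholders):

```shell
#!/bin/sh
# Run addrlookup on the mail server, passing all arguments through.
ssh -q MYUSER@mydomain.com addrlookup "$@"
```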
Now if you hit “TAB” after you start typing an address, it will prompt you with completions (use the up/down arrows to move between them, hit enter to select).
Conclusion

Congratulations! You now have a mail system that is more powerful than GMail and completely controlled by you. And there is a lot more you can do. For example, to enable encryption (to start, just signing emails), install gnupg, create a key and associate it with your email address, and add the following line to your .emacs, and all messages will be signed by default (it adds a line in the message that, when you send it, causes emacs to sign the email. Note that this line must be the first line, so add your message below it):
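The standard message-mode way to get this behavior (a sketch - the original line may have differed):

```elisp
;; Insert an mml "sign" tag at the top of every new message.
(add-hook 'message-setup-hook 'mml-secure-message-sign-pgpmime)
```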
An unfortunate current limitation is that signatures are checked by the notmuch command line, so you need to install public keys on the server. This is fine, except that the emacs client installs them locally when you click on an unknown key (hit $ when viewing a message to see the signatures). So, at least for now, you have to manually add keys to the server with gpg --recv-key KEYID before they will show up as verified on the client (signing/encrypting still works, because that is done locally). Hopefully this will be fixed soon.
Added July 9th, 2013:

Addendum

Among the large amount of feedback I received on this post, many people recommended that I use Postfix and Dovecot over Exim and Courier - Postfix because of security (Exim has a less than stellar history), and Dovecot because it is simpler and faster than Courier (and, more importantly, is frequently combined with Postfix). Security is really important to me (as I want this system to be easy to maintain), so I decided to switch. Since I’m not doing anything particularly complicated with the mail server / IMAP, the conversion was relatively straightforward. For people reading this, I’d suggest just doing this from the start (substituting it for the parts setting up Exim / Courier), but if you’ve already followed the instructions (as I had), here is what you should do to change over. Note that I have gotten much of this information from guides at syslog.tv, modified as needed.

Install postfix and dovecot with (accept the replacement policy):
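Presumably something along the lines of (the exact package set is an assumption; procmail is included because delivery below goes through a .procmailrc):

```shell
aptitude install postfix dovecot-imapd procmail
```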
Add this to the end of /etc/postfix/master.cf:

spamassassin unix - n n - - pipe
  user=spamd argv=/usr/bin/spamc -f -e
  /usr/sbin/sendmail -oi -f ${sender} ${recipient}

NOTE: It’s been pointed out to me that if you don’t have a spamd user on your system, this won’t work. So check that, and add the user if it’s missing.

And add this near the beginning, right after the line starting with smtp inet n ...:

  -o content_filter=spamassassin

Then uncomment the line starting with ‘submission’ and put the following after it:

  -o syslog_name=postfix/submission

Now create a .procmailrc in your home directory containing:

:0 c
.Archive/

:0
| /usr/local/bin/my-notmuch-new.sh
This says to copy the message to the archive and then run my-notmuch-new.sh (the shell script that used to be called by incron). Technically it pipes the message to the script, but the script ignores standard input, so it is equivalent to just calling the script. Now fix the permissions:

chmod 600 .procmailrc

Remove incron, which we aren’t using anymore:

sudo aptitude remove incron
Fix up spamassassin. Get the top of /etc/spamassassin/local.cf to look like:

rewrite_header Subject
# just add good headers
add_header spam Flag _YESNOCAPS_
add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_

This adds the proper headers so that afew recognizes spam and tags it accordingly. And that should be it!
I’m not sure of a way to tell K9Mail that the certificate on the IMAP server has changed, so I just deleted the account and recreated it.

Note: if you find any mistakes in this, or parts that need additional steps, let me know and I’ll correct/add to this.
Why test in Haskell?

Every so often, the question comes up: should you test in Haskell, and if so, how should you do it?

Most people agree that you should test pure, especially complicated, algorithmic code. Quickcheck1 is a great way to do this, and most Haskellers have internalized it (Quickcheck was invented here, so it must provide value!). What’s less clear (or at least, more debated!) is whether you should be testing monadic code, glue code, and code that just isn’t all that complicated.
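As a reminder of what that looks like when it goes well, here is the quintessential property test over a pure function (using Test.QuickCheck; this is the classic example mentioned again below):

```haskell
import Test.QuickCheck

-- Reversing a list twice gets you back the original list.
prop_reverseReverse :: [Int] -> Bool
prop_reverseReverse xs = reverse (reverse xs) == xs

main :: IO ()
main = quickCheck prop_reverseReverse
```

QuickCheck generates a hundred random lists (biased toward edge cases like the empty list) and checks the property on each.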
Quickcheck?

A lot of the Haskell I’m writing these days is with the web framework Snap, and web handlers often have the type Handler App App () - where Handler is a web monad (giving access to request information, and the ability to write response data), and App indicates access to application specific state (like database connections, templates, etc).

So the inputs (ie, how to run this action) include any HTTP request and any application state, and the only outputs are side effects (as all it returns is unit). Using Quickcheck here is… challenging. You could restrict the generated requests to have the right URL, and even the right query parameters, but since the query parameters are just text, if they were supposed to be more structured (like an integer), the chance of actually generating text that was just a number is pretty low… And then if the number were supposed to be the id of an element in the database….

But assume that we restrict it so that it only generates ids of elements in the database - what are the properties we are asserting? Let’s say that the handler looked up the element and rendered it on the page. Then we want to assert something about the content of the response (which is wrapped up in the Handler monad). But maybe it should also increment a view count in the database. And assuming that we wrote all these into properties, what are the elements in the database that it is choosing among? In some senses we’ve now restricted too much, because we may want to see what the behavior is like for slightly invalid inputs - say, integer ids that don’t correspond to elements in the database. This is all certainly possible, and may be worth doing, but it seems pretty difficult. Which is totally different from the experience of testing nice pure functions!
Let’s try to tease out a little bit of why testing this kind of code with Quickcheck is hard. One problem is that the input space, as determined by the type, is massive, and for most of the possible inputs, the result should be some version of a no-op. Another problem is the dependence on state, as the possible inputs are contingent on external state, and the outputs are primarily changes to state - each of which, again, is a massive space.

But having massive input and output spaces is not necessarily a reason not to use randomized testing. Indeed, this is exactly the kind of thing that fuzz-testing of web browsers, for example, has done with great effect2. The problem in this case is that the size of the input and output space is not at all in proportion to the complexity of the code. If we were writing an HTTP server, we might indeed want to generate random requests, throw them at the web server, and make sure it was generating well-formed responses (404s being perfectly fine).

Not that complicated…

But we’re just writing a little bit of glue code. Which isn’t that complicated. And can be tested manually pretty easily. And may change rapidly.

Which means spending a lot of time setting up property based tests (which in these sorts of cases are necessarily going to be quite a bit more complicated than quintessential Quickcheck examples like showing that reverse . reverse = id).

But you’re still writing code whose types massively underspecify its behavior. Which should make you nervous, at least a little. Now granted, you should keep that underspecified code as thin as possible - validate the query parameters, the URL, etc, and then call a function with a type that much more clearly specifies what it is supposed to do. For example (this is coming from Snap code, with some details elided, but it should be reasonably easy to understand):
f :: Handler App App ()
f = route [("/foo/:id", do i <- read <$> getParam "id"
                           res <- lookupAndRenderFoo (FooId i)
                           writeText res)]

lookupAndRenderFoo :: FooId -> Handler App App Text
lookupAndRenderFoo = undefined
+
+ And certainly, this is a good pattern to use. We went from a function that had as input space any HTTP request (and any application specific state), and as output any HTTP response (as well as any side effects in the Handler monad) and split it into two functions. One still has the same input and output as before, but is very short, and the other is a function with input the id of a specific element, and as output Text, but still can perform any side effects in and read any data from within the Handler monad.
+
+
+ Increasing complexity?
+
+
+ We could split that further, and write a function with type Foo -> Text, but we would start getting in our own way, as if we wanted to render with a template, the templates exist within the context of the Handler monad, so we would have to look up a template first, and we would have ended up creating many new functions, as well as a bit of extra complexity, all for the sake of splitting our code up into layers, where the last one is pure and easy to test (the rest still have all the same problems).
+
+
+ Depending on how complex that last layer is, this may totally be worth it. If your code is dealing with human lives or livelihoods, by all means, isolate that code into as small a portion as possible and test the hell out of it. But it makes coding harder, and makes you move slower. And if you want to change the logic, you may now have to change many different functions, instead of just one.
+
+
+ Which is where we come to the argument that testing slows things down, and that for rapidly changing code, it just doesn’t matter.
+
+
+ What about just not sampling?
+
+
+ But if we step back a bit, we realize that what Quickcheck is trying to do is to sample representatively (well, with a bias towards edge cases) over the type of the input. And it’s easy to see why that’s appealing, as it gives you reasonable confidence that any use of the function behaves as desired. But if we forget about that, as we already know that our types completely underspecify the behavior, we realize all that we really care about is that the code does what we think it should do on a few example cases. That’s what we were going to manually verify after writing the code anyway.
+
+
+ Which is easy to test. With Snap, I’d write some tests for the above snippet like3:
+
+
do f <- create ()
let i = show . unFooId . fooId $ f
- get ("/foo/" ++ i) >>= should200
- get ("/foo/" ++ i) >>= shouldHaveText (fooDescription f)
- get ("/foo/" ++ show (1 + i)) >>= should404
-
And call it a day. This misses vasts swaths of inputs, and asserts very little about the outputs, but it also tells you a huge amount more about the correctness of the code than the fact that it typechecked did. And as you iterate and refactor your application, you get the assurance that this handler:
-
-
still exists.
-
still looks up the element from the database.
-
still puts the description somewhere on the page.
-
doesn’t work for ids that don’t correspond to elements in the database.
-
-
Which seems like a lot of assurance for a very small amount of work. And if your application is fast moving, this benefits you even more, as the faster you move, the more likely you are to break things (at least, that’s always been my experience!). If you do decide to rewrite this handler, fixing these tests is going to take a tiny amount of time (probably less time than you spend manually confirming that the change worked).
-
Why this should be expected to work.
-
To take it a little further, and perhaps justify from a somewhat theoretical point of view why these sorts of tests are so valuable, consider all possible implementations of any function (or monadic action). The possible implementations with the given type are a subset of all the possible implementations, but still potentially a pretty large one (our example of a web handler certainly has this property).
-
This perspective gives us some intuition on why it is much easier to test simple, pure functions. There are only four possible implementations of a Bool -> Bool function, so testing not via sampling seems pretty tractable. To go even further, we get into the territory of “Theorems for Free”4, where there is only one implementation for an (a,b) -> a function, so testing fst is pointless.
-
But returning to our case of massive spaces of well-typed implementations: A single test, like one of the above, corresponds to another subset of all the possible implementations. For example, the first test corresponds to the subset that return success when passed the given url via GET request. Since we’re in Haskell, we also get a guarantee that the set of implementations that fulfill the test is a (non)strict subset of the set of implementations that fulfill the type, as if this were not the case, our test case wouldn’t type check. The problem with the first test, of course, is that there are all sorts of bogus implementations that fulfill it. For example, the handler that always returns success would match that test.
-
But even still, it is a strict subset of the implementations that fulfill the type (for example, the handler that always returns 404 is not in this set), so we’re guaranteed to have improved the chance that our code is correct, even with such a weak test (granted, it actually may not be that weak of a test - in one project, I have a menu generated from a data structure in code, and a test that iterates through all elements of the menu, checking that hitting each url results in a 200. And this has caught many refactoring problems!).
-
Where we really start to benefit is as we add a few more tests. The second test shows that the handler must somehow get an element out of the database (provided our create () test function is creating relatively unique field names), which is another (strict) subset of the set of implementations that fulfill the type. And we now know that our implementation must be somewhere in the intersection of these two subsets.
-
It shouldn’t be hard to convince yourself that through the process of just writing a few (well chosen) tests you can vastly reduce the possibility of writing incorrect implementations. Which, when we are writing relatively straightforward code, will probably be good enough to ensure that the code is actually correct. And will continue to verify that as the code evolves. Pretty good for a couple lines of code.
-
-
-
-
For those who haven’t used Quickcheck, it allows you to specify properties that a function should satisfy, and possibly a way to generate random values of the input type (if your input is a standard type, it already knows how to do this), and it will generate some number of inputs and verify that the property holds for all of them.↩︎
This syntax is based on the hspec-snap package, which I chose because I’m familiar with it (and wrote it). The create line is from some not-yet-integrated-or-released, at least at time of publishing, work to add factory support to the library (sorry!). With that said, the advice should hold no matter what you’re doing.↩︎
-
-
-
-
+ get ("/foo/" ++ i) >>= should200
+ get ("/foo/" ++ i) >>= shouldHaveText (fooDescription f)
+ get ("/foo/" ++ show (1 + i)) >>= should404
+
+ And call it a day. This misses vasts swaths of inputs, and asserts very little about the outputs, but it also tells you a huge amount more about the correctness of the code than the fact that it typechecked did. And as you iterate and refactor your application, you get the assurance that this handler:
+
+
+
+ still exists.
+
+
+ still looks up the element from the database.
+
+
+ still puts the description somewhere on the page.
+
+
+ doesn’t work for ids that don’t correspond to elements in the database.
+
+
+
+ Which seems like a lot of assurance for a very small amount of work. And if your application is fast moving, this benefits you even more, as the faster you move, the more likely you are to break things (at least, that’s always been my experience!). If you do decide to rewrite this handler, fixing these tests is going to take a tiny amount of time (probably less time than you spend manually confirming that the change worked).
+
+
+ Why this should be expected to work.
+
+
+ To take it a little further, and perhaps justify from a somewhat theoretical point of view why these sorts of tests are so valuable, consider all possible implementations of any function (or monadic action). The possible implementations with the given type are a subset of all the possible implementations, but still potentially a pretty large one (our example of a web handler certainly has this property).
+
+
+ This perspective gives us some intuition on why it is much easier to test simple, pure functions. There are only four possible implementations of a Bool -> Bool function, so testing not via sampling seems pretty tractable. To go even further, we get into the territory of “Theorems for Free”4, where there is only one implementation for an (a,b) -> a function, so testing fst is pointless.
+
+
+ But returning to our case of massive spaces of well-typed implementations: A single test, like one of the above, corresponds to another subset of all the possible implementations. For example, the first test corresponds to the subset that return success when passed the given url via GET request. Since we’re in Haskell, we also get a guarantee that the set of implementations that fulfill the test is a (non)strict subset of the set of implementations that fulfill the type, as if this were not the case, our test case wouldn’t type check. The problem with the first test, of course, is that there are all sorts of bogus implementations that fulfill it. For example, the handler that always returns success would match that test.
+
+
+ But even still, it is a strict subset of the implementations that fulfill the type (for example, the handler that always returns 404 is not in this set), so we’re guaranteed to have improved the chance that our code is correct, even with such a weak test (granted, it actually may not be that weak of a test - in one project, I have a menu generated from a data structure in code, and a test that iterates through all elements of the menu, checking that hitting each url results in a 200. And this has caught many refactoring problems!).
+
+
+ Where we really start to benefit is as we add a few more tests. The second test shows that the handler must somehow get an element out of the database (provided our create () test function is creating relatively unique field names), which is another (strict) subset of the set of implementations that fulfill the type. And we now know that our implementation must be somewhere in the intersection of these two subsets.
+
+
+ It shouldn’t be hard to convince yourself that through the process of just writing a few (well chosen) tests you can vastly reduce the possibility of writing incorrect implementations. Which, when we are writing relatively straightforward code, will probably be good enough to ensure that the code is actually correct. And will continue to verify that as the code evolves. Pretty good for a couple lines of code.
+
+
+
+
+
+
+ For those who haven’t used Quickcheck, it allows you to specify properties that a function should satisfy, and possibly a way to generate random values of the input type (if your input is a standard type, it already knows how to do this), and it will generate some number of inputs and verify that the property holds for all of them.↩︎
+
+ This syntax is based on the hspec-snap package, which I chose because I’m familiar with it (and wrote it). The create line is from some not-yet-integrated-or-released, at least at time of publishing, work to add factory support to the library (sorry!). With that said, the advice should hold no matter what you’re doing.↩︎
+
Every so often, the question comes up: should you test in Haskell, and if so, how should you do it?
-
Most people agree that you should test pure, especially complicated, algorithmic code. Quickcheck1 is a great way to do this, and most Haskellers have internalized this (Quickcheck was invented here, so it must provide value!). What’s less clear (or at least, more debated!) is whether you should be testing monadic code, glue code, and code that just isn’t all that complicated.
-
Quickcheck?
-
A lot of the Haskell I’m writing these days is with the web framework Snap, and web handlers often have the type Handler App App () – where Handler is a web monad (giving access to request information, and the ability to write response data), and App indicates access to application-specific state (like database connections, templates, etc).
-
So the inputs (i.e., how to run this action) include any HTTP request and any application state, and the only outputs are side effects (as all it returns is unit). Using Quickcheck here is… challenging. You could restrict the generated requests to have the right URL, and even the right query parameters, but since query parameters are just text, if they were supposed to be more structured (like an integer), the chance of actually generating text that is just a number is pretty low… And then if the number were supposed to be the id of an element in the database….
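-
To make the “text that happens to be a number” point concrete, here is a tiny base-only sketch (parsesAsInt and the two-character alphabet are my illustrative inventions, not anything from Snap or Quickcheck): even over a small mixed alphabet, only a sliver of strings parse cleanly as an Int, and the ratio only gets worse as strings get longer.

```haskell
-- A sketch of why randomly generated text rarely parses as a number.
-- parsesAsInt is a hypothetical helper, not part of any library.
parsesAsInt :: String -> Bool
parsesAsInt s = case reads s :: [(Int, String)] of
  [(_, "")] -> True
  _         -> False

main :: IO ()
main = do
  let alphabet = ['a' .. 'z'] ++ ['0' .. '9']   -- stand-in for "arbitrary text"
      inputs   = [[c1, c2] | c1 <- alphabet, c2 <- alphabet]
  -- 36 * 36 = 1296 two-character strings; only "00".."99" parse as an Int
  print (length (filter parsesAsInt inputs), length inputs)
```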
-
But assume that we restrict it so that it’s only generating ids for elements in the database: what are the properties we are asserting? Let’s say that the handler looks up the element and renders it on the page. Then we want to assert something about the content of the response (which is wrapped up in the Handler monad). But maybe it should also increment a view count in the database. And assuming that we wrote all these into properties, what are the elements in the database that it is choosing among? And in some senses we’ve now restricted too much, because we may want to see what the behavior is like for slightly invalid inputs – say, integer ids that don’t correspond to elements in the database. This is all certainly possible, and may be worth doing, but it seems pretty difficult. Which is totally different from the experience of testing nice pure functions!
-
Let’s try to tease out why testing this kind of code with Quickcheck is hard. One problem is that the input space, as determined by the type, is massive, and for most of the possible inputs, the result should be some version of a no-op. Another problem is the dependence on state: the possible inputs are contingent on external state, and the outputs are primarily changes to state, each of which, again, is a massive space.
-
But having massive input and output spaces is not necessarily a reason to avoid randomized testing. Indeed, this is exactly the kind of thing that fuzz-testing of web browsers, for example, has done with great effect2. The problem in this case is that the size of the input and output space is not at all in proportion to the complexity of the code. If we were writing an HTTP server, we might indeed want to generate random requests, throw them at the server, and make sure it was generating well-formed responses (404s being perfectly fine).
-
Not that complicated…
-
But we’re just writing a little bit of glue code. Which isn’t that complicated. And can be tested manually pretty easily. And may change rapidly.
-
Which means that spending a lot of time setting up property-based tests (which in these sorts of cases are necessarily going to be quite a bit more complicated than quintessential Quickcheck examples like showing that reverse . reverse = id) is hard to justify.
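-
For reference, that quintessential property, checked here exhaustively over short lists rather than with Quickcheck’s random sampling (a base-only sketch, not the real library):

```haskell
import Control.Monad (replicateM)

-- The classic round-trip property from the Quickcheck literature.
prop_reverseReverse :: [Int] -> Bool
prop_reverseReverse xs = reverse (reverse xs) == xs

main :: IO ()
main =
  -- Check every list of length 0..4 drawn from [0,1,2], standing in
  -- for Quickcheck's randomly sampled inputs.
  print (all prop_reverseReverse [xs | n <- [0 .. 4], xs <- replicateM n [0, 1, 2]])
```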
-
But you’re still writing code whose types massively underspecify its behavior. Which should make you nervous, at least a little. Now granted, you should keep that underspecified code as thin as possible – validate the query parameters, the URL, etc, and then call a function with a type that much more clearly specifies what it is supposed to do. For example (this is coming from Snap code, with some details elided, but should be reasonably easy to understand):
-
f :: Handler App App ()
f = route [("/foo/:id", do i <- read <$> getParam "id"
                           res <- lookupAndRenderFoo (FooId i)
                           writeText res)]

lookupAndRenderFoo :: FooId -> Handler App App Text
lookupAndRenderFoo = undefined
-
And certainly, this is a good pattern to use. We went from a function that had as input space any HTTP request (and any application-specific state), and as output any HTTP response (as well as any side effects in the Handler monad), and split it into two functions. One still has the same input and output as before, but is very short; the other takes as input the id of a specific element and produces Text as output, but can still perform any side effects in, and read any data from within, the Handler monad.
-
Increasing complexity?
-
We could split that further, and write a function with type Foo -> Text, but we would start getting in our own way: if we wanted to render with a template, the templates exist within the context of the Handler monad, so we would have to look up a template first, and we would have ended up creating many new functions, as well as a bit of extra complexity, all for the sake of splitting our code up into layers, where the last one is pure and easy to test (the rest still have all the same problems).
-
Depending on how complex that last layer is, this may totally be worth it. If your code is dealing with human lives or livelihoods, by all means, isolate that code into as small a portion as possible and test the hell out of it. But it makes coding harder, and makes you move slower. And if you want to change the logic, you may now have to change many different functions, instead of just one.
-
Which is where we come to the argument that testing slows things down, and that for rapidly changing code, it just doesn’t matter.
-
What about just not sampling?
-
But if we step back a bit, we realize that what Quickcheck is trying to do is to sample representatively (well, with a bias towards edge cases) over the type of the input. And it’s easy to see why that’s appealing, as it gives you reasonable confidence that any use of the function behaves as desired. But if we forget about that – as we already know that our types completely underspecify the behavior – we realize that all we really care about is that the code does what we think it should do on a few example cases. That’s what we were going to manually verify after writing the code anyway.
-
Which is easy to test. With Snap, I’d write some tests for the above snippet like3:
-
do f <- create ()
   let i = unFooId . fooId $ f
   get ("/foo/" ++ show i) >>= should200
   get ("/foo/" ++ show i) >>= shouldHaveText (fooDescription f)
   get ("/foo/" ++ show (i + 1)) >>= should404
-
And call it a day. This misses vast swaths of inputs, and asserts very little about the outputs, but it also tells you a huge amount more about the correctness of the code than the fact that it typechecked did. And as you iterate and refactor your application, you get the assurance that this handler:
-
still exists.
-
still looks up the element from the database.
-
still puts the description somewhere on the page.
-
doesn’t work for ids that don’t correspond to elements in the database.
-
Which seems like a lot of assurance for a very small amount of work. And if your application is fast moving, this benefits you even more, as the faster you move, the more likely you are to break things (at least, that’s always been my experience!). If you do decide to rewrite this handler, fixing these tests is going to take a tiny amount of time (probably less time than you would spend manually confirming that the change worked).
-
Why this should be expected to work
-
To take it a little further, and perhaps justify from a somewhat theoretical point of view why these sorts of tests are so valuable, consider all possible implementations of any function (or monadic action). The possible implementations with the given type are a subset of all the possible implementations, but still potentially a pretty large one (our example of a web handler certainly has this property).
-
This perspective gives us some intuition on why it is much easier to test simple, pure functions. There are only four possible implementations of a Bool -> Bool function, so testing a function like not via sampling seems pretty tractable. To go even further, we get into the territory of “Theorems for Free”4, where there is only one implementation of an (a,b) -> a function, so testing fst is pointless.
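-
Those four implementations can be written out in full, and “testing” not then needs only two inputs, so the check is exhaustive and sampling never comes into it (the names below are mine):

```haskell
-- All four total implementations of Bool -> Bool.
implementations :: [(String, Bool -> Bool)]
implementations =
  [ ("id",          \b -> b)
  , ("not",         \b -> not b)
  , ("const True",  \_ -> True)
  , ("const False", \_ -> False)
  ]

main :: IO ()
main = do
  -- Two inputs cover the whole domain, so exactly one candidate
  -- can agree with not on both of them.
  let behavesLikeNot (_, f) = all (\b -> f b == not b) [False, True]
  print (map fst (filter behavesLikeNot implementations))
```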
-
But returning to our case of massive spaces of well-typed implementations: a single test, like one of the above, corresponds to another subset of all the possible implementations. For example, the first test corresponds to the subset that returns success when passed the given URL via GET request. Since we’re in Haskell, we also get a guarantee that the set of implementations that fulfill the test is a (possibly non-strict) subset of the set of implementations that fulfill the type – if this were not the case, our test case wouldn’t type check. The problem with the first test, of course, is that there are all sorts of bogus implementations that fulfill it. For example, the handler that always returns success would match that test.
-
But even still, it is a strict subset of the implementations that fulfill the type (for example, the handler that always returns 404 is not in this set), so we’re guaranteed to have improved the chance that our code is correct, even with such a weak test. (Granted, it actually may not be that weak of a test – in one project, I have a menu generated from a data structure in code, and a test that iterates through all elements of the menu, checking that hitting each URL results in a 200. And this has caught many refactoring problems!)
-
Where we really start to benefit is as we add a few more tests. The second test shows that the handler must somehow get an element out of the database (provided our create () test function is creating relatively unique field names), which is another (strict) subset of the set of implementations that fulfill the type. And we now know that our implementation must be somewhere in the intersection of these two subsets.
-
It shouldn’t be hard to convince yourself that just by writing a few well-chosen tests you can vastly reduce the possibility of writing incorrect implementations. Which, when we are writing relatively straightforward code, will probably be good enough to ensure that the code is actually correct. And will continue to verify that as the code evolves. Pretty good for a couple lines of code.
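-
The subset-intersection picture can even be simulated directly. In this toy model (entirely my invention, not from the post), candidate “handlers” are functions Int -> Maybe String over a fixed two-row database, and each test is a predicate that shrinks the surviving set:

```haskell
import Data.Maybe (isJust, isNothing)

type ToyHandler = Int -> Maybe String

db :: [(Int, String)]
db = [(1, "first foo"), (2, "second foo")]

-- A few well-typed implementations, including the bogus ones from the text.
candidates :: [(String, ToyHandler)]
candidates =
  [ ("correct",        \i -> lookup i db)
  , ("always success", \_ -> Just "ok")
  , ("always 404",     \_ -> Nothing)
  , ("wrong body",     \i -> "oops" <$ lookup i db)
  ]

-- Each test carves out a subset of implementations; running several
-- tests means taking the intersection of those subsets.
test200, testBody, test404 :: ToyHandler -> Bool
test200  h = isJust (h 1)
testBody h = h 1 == Just "first foo"
test404  h = isNothing (h 3)

surviving :: [ToyHandler -> Bool] -> [String]
surviving tests = [name | (name, h) <- candidates, all ($ h) tests]

main :: IO ()
main = do
  print (surviving [test200])                    -- the weak test: three survive
  print (surviving [test200, testBody, test404]) -- only "correct" survives
```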
-
-
For those who haven’t used Quickcheck: it allows you to specify properties that a function should satisfy, and possibly a way to generate random values of the input type (if your input is a standard type, it already knows how to do this), and it will generate some number of inputs and verify that the property holds for all of them.↩︎
-
This syntax is based on the hspec-snap package, which I chose because I’m familiar with it (and wrote it). The create line is from some not-yet-integrated-or-released (at least at time of publishing) work to add factory support to the library (sorry!). With that said, the advice should hold no matter what you’re doing.↩︎
-
(Cheap) home backups
-
Backing things up is important. Some stuff, like code that lives in repositories, may naturally end up in many places, so it is perhaps less important to explicitly back up. Other files, like photos or personal documents, generally don’t have a natural redundant home, so they need some backup story, and relying on various online services is risky (what if they go out of business, “pivot”, etc), potentially time-consuming to keep track of (services for photos may not allow videos, or at least not full-resolution ones, etc), and limited in various ways (max file sizes, storage allotments, etc), not to mention bringing up serious privacy concerns. Different people need different things, but what I need, and have built (hence this post describing the system), fulfills the following requirements:
-
-
(Home) scalable – i.e., any reasonable amount of data that I could generate personally I should be able to dump in one place, and be confident that it won’t go away. What makes up the bulk is photos, some music, and some audio and video files. For me, this is currently about 1TB (±0.5TB).
-
Cheap. I’m willing to pay about $100-200/year total (including hardware).
-
Simple. There has to be one place where I can dump files, and it has to be simple enough to recover from complete failure of any given piece of hardware even if I haven’t touched it in a long time (because if it is working, I won’t have had to tweak it in months or years). Adding & organizing files should be doable without command-line familiarity, so it can serve my whole home.
-
Safe. Anything that’s not in my physical control should be encrypted.
-
Reasonably reliable. Redundancy across hardware, geographic locations, etc. This is obviously balanced with other concerns (in particular, 2 and 3)!
-
-
I’ve tried various solutions, but what I’ve ended up with seems to be working pretty well (most of it has been running for about a year; some parts are more recent, and a few have been running for much longer). It’s a combination of some cheap hardware, inexpensive cloud storage, and decent backup software.
-
Why not an off-the-shelf NAS?
-
In the past, I tried one (a Buffalo model). I wasn’t impressed by the software (which was hard to upgrade, install other stuff on, maintain, etc) or the power consumption (this was several years ago, but idle, the two-drive system used over 30 watts, which is the same power my similarly aged quad-core workstation uses when idle!). Also, a critical element of this system for me is that there is an off-site component, so getting that software on it is extremely important, and I’d rather deal with a well-supported Linux computer than something esoteric. Obviously this depends on the particular NAS you get, but the system below is perfect for me. In particular, setting up and experimenting with it was much cheaper than dropping hundreds more dollars on a new NAS that might not have worked any better than the old one, and once I had it working, there was certainly no point in going back!
-
Hardware
-
-
$70 - Raspberry Pi 3. This consumes very little power (a little over 1W without the disks, probably around 10W with them spinning, more like 3W when they are idling), takes up very little space, but seems plenty fast enough to act as a file server. That price includes a case, heat-sink, SD card, power adapter, etc. If you already had any of these things, you could probably get a cheaper kit (the single board itself is around $35). Note that you really want a heat-sink on the processor: I ran without one for a while (I forgot to install it) and it would overheat and hard-lock. It’s a tradeoff that they put a much faster processor in these than in prior generations – I think it’s worth it (it’s an amazingly capable computer for the size/price).
-
$75 - Three external USB SATA hard-drive enclosures. You might be able to find these cheaper – the ones I got were metal, which seemed good for heat dissipation, and they have been running for a little over a year straight without a problem. (Note: this is one more enclosure than I’m using at any given time, to make it easier to rotate in new drives. BTRFS, which I’m using, allows you to just physically remove a drive and add a new one, but the preferred method is to have both attached and issue a replace command. I’m not sure how much this matters, but for $25, I went with the extra enclosure.)
-
$170 - Two 2TB WD Red SATA drives. These are actually recent upgrades – the server had been running on older 1TB Green drives (four and five years old respectively), but one of them started reporting failures (I would guess the older of the two, but I didn’t check), so I replaced both. The cheaper Blue drives probably would have been fine (the Greens they replaced certainly lasted well enough, running nearly 24/7 for years), but the “intended to run 24/7” Red ones were only $20 more each, so I thought I might as well spring for them.
-
-
Cloud
-
-
Backblaze B2. This seems to be the cheapest storage that scales down to storing nothing. At my usage (0.5–2TB) it costs about $3–10/month, which is a reasonable amount, and given that it is one of three copies (the other two being on the two hard drives attached to the Pi), I’m not worried about the lower durability compared to, for example, Amazon S3 (B2 quotes 8 nines of durability vs S3’s 11 nines, but to get that, S3 charges you 3–4x as much).
-
-
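As a sanity check on that range, here is the arithmetic, assuming B2’s storage price at the time of writing was roughly $0.005 per GB per month (an assumption – verify against current pricing):

```haskell
-- Rough monthly storage cost in dollars, assuming $0.005/GB-month
-- (an assumed B2 price; check current pricing before relying on it).
monthlyCostUSD :: Double -> Double
monthlyCostUSD gigabytes = gigabytes * 0.005

main :: IO ()
main = mapM_ (print . monthlyCostUSD) [500, 1000, 2000]
-- 500GB, 1TB, and 2TB come out to $2.5, $5, and $10 a month,
-- consistent with the $3-10/month figure above.
```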
Software
-
-
The Raspberry Pi is running Raspbian (Debian built for the Raspberry Pi). This seems to be the best-supported Linux distribution for it, and I’ve used Debian on servers & desktops for maybe 10 years now, so it’s a no-brainer. The external hard drives are in a RAID1 with BTRFS. If I were starting from scratch, I would look into ZFS, but I’ve been migrating this same data across different drives and home servers (on the same file system) since ZFS was essentially experimental on Linux, and on Linux, for RAID1, BTRFS seems totally stable (people do not say the same thing about RAID5/6).
-
The point is, you should use an advanced file system in RAID1 (with ZFS you could go higher, but I prefer the simplicity and power consumption of having just two drives, and can afford to pay for the wasted drive space) that can detect & correct errors, lets you swap in new drives and migrate out old ones, migrate to larger drives, etc. This is essentially the feature-set that both ZFS and BTRFS have; the former is considered more stable, and the latter has been in Linux for longer.
-
For backups, I’m using Duplicacy, which is annoyingly similarly named to a much older backup tool called Duplicity (there also seems to be another tool called Duplicati, which I haven’t tried – couldn’t backup tools get more creative with names? How about calling a tool “albatross”?). It’s also annoyingly not free software, but for personal use the command-line version (which is the only version I would be using) is free-as-in-beer. I actually settled on this after trying, and failing, to use (actually open-source) competitors:
-
First, I tried the aforementioned Duplicity (using its friendly frontend duply). I actually was able to make some full backups (the full size of the archive was around 600GB), but then it started erroring out because it would run out of memory when trying to unpack the file lists. The backup format of Duplicity is not super efficient, but it is very simple (which was appealing – just tar files and various indexes with lists of files). Unfortunately, some operations need memory that seems to scale with the size of the currently backed-up archive, which is a non-starter for my little server with 1GB of RAM (and in general shouldn’t be acceptable for backup software, but…).
-
I next tried a newer option, restic. This has a more efficient backup format, but it had the same problem of running out of memory – it wasn’t even able to make a backup (though that was probably a good thing, as I wasted less time!). They are aware of it (see, e.g., this issue), so maybe at some point it’ll be an option, but that issue is almost two years old, so ho hum…
-
So finally I went with the bizarrely sort-of-but-not-really open-source option, Duplicacy. I found other people talking about running it on a Raspberry Pi, and it seemed like the main way memory consumption could become a problem was via the number of threads used to upload, which thankfully is an argument. I settled on 16 and it seems to work fine (i.e., duplicacy backup -stats -threads 16) – memory consumption seems to hover below 60%, which leaves a very healthy buffer for anything else that’s going on (or periodic little jumps), and regardless, more threads don’t seem to make it any faster.
-
The documentation on how to use the command-line version is a little sparse (there is a GUI version that costs money), but once I figured out that to configure it to connect automatically to my B2 account I needed a file .duplicacy/preferences that looked like (see keys section; the rest will probably be written out for you if you run duplicacy first; alternatively, just put this file in place and everything will be set up):
-
[
+
+
+
(Cheap) home backups

Backing things up is important. Some things, like code that lives in repositories, naturally end up in many places, so they are less important to back up explicitly. Other files, like photos or personal documents, generally don’t have a natural redundant home, so they need some backup story. Relying on various online services is risky (what if they go out of business, “pivot”, etc.?), potentially time-consuming to keep track of (services for photos may not allow videos, or at least not full-resolution ones), limited in various ways (maximum file sizes, storage allotments), and raises serious privacy concerns. Different people need different things, but what I need, and have built (hence this post describing the system), fulfills the following requirements:

1. (Home) scalable – i.e., any reasonable amount of data that I could generate personally, I should be able to dump in one place and be confident that it won’t go away. The bulk is photos, some music, and some audio and video files. For me, this is currently about 1TB (±0.5TB).

2. Cheap. I’m willing to pay about $100-200/year total (including hardware).

3. Simple. There has to be one place where I can dump files, and it has to be simple enough that I can recover from the complete failure of any given piece of hardware even if I haven’t touched the system in a long time (because if it is working, I won’t have had to tweak it in months or years). Adding and organizing files should be doable without command-line familiarity, so it can serve my whole home.

4. Safe. Anything that’s not in my physical control should be encrypted.

5. Reasonably reliable. Redundancy across hardware, geographic locations, etc. This is obviously balanced against the other concerns (in particular, 2 and 3)!

I’ve tried various solutions, but what I’ve ended up with seems to be working pretty well (most of it has been running for about a year; some parts are more recent, and a few have been running much longer). It’s a combination of cheap hardware, inexpensive cloud storage, and decent backup software.
Why not an off-the-shelf NAS?

In the past, I tried one (a Buffalo model). I wasn’t impressed by the software (which was hard to upgrade, install other things on, or maintain) or by the power consumption (this was several years ago, but idle, the two-drive system drew over 30 watts – the same as my similarly aged quad-core workstation uses when idle!). Also, a critical element of this system for me is the off-site component, so being able to get that software onto the device is extremely important, and I’d rather deal with a well-supported Linux computer than something esoteric. Obviously this depends on the particular NAS you get, but the system below is perfect for me. In particular, setting up and experimenting with it was much cheaper than dropping hundreds more dollars on a new NAS that might not have worked any better than the old one, and once I had it working, there was certainly no point in going back!
Hardware

- $70 - Raspberry Pi 3. This consumes very little power (a little over 1W without the disks, probably around 10W with them spinning, more like 3W when they are idling), takes up very little space, and seems plenty fast enough to act as a file server. That price includes a case, heat-sink, SD card, power adapter, etc. If you already have any of these things, you can probably get a cheaper kit (the board itself is around $35). Note that you really want a heat-sink on the processor: I ran without one for a while (I forgot to install it) and it would overheat and hard-lock. It’s a tradeoff that they put a much faster processor in these than in prior generations – I think it’s worth it (it’s an amazingly capable computer for the size/price).

- $75 - Three external USB SATA hard drive enclosures. You might be able to find these cheaper – the ones I got were metal, which seemed good for heat dissipation, and they have been running for a little over a year straight without a problem. (Note: this is one more enclosure than I’m using at any given time, to make it easier to rotate in new drives. BTRFS, which I’m using, allows you to just physically remove a drive and add a new one, but the preferred method is to have both attached and issue a replace command. I’m not sure how much this matters, but for $25, I went with the extra enclosure.)

- $170 - Two 2TB WD Red SATA drives. These are recent upgrades – the server had been running on older 1TB Green drives (four and five years old, respectively), but one of them started reporting failures (I would guess the older of the two, but I didn’t check), so I replaced both. The cheaper Blue drives probably would have been fine (the Greens that the Blues replaced certainly lasted well enough, running nearly 24/7 for years), but the “intended to run 24/7” Red ones were only $20 more each, so I figured I might as well spring for them.
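For reference, the ~10W figure pencils out to about a dollar a month of electricity. A quick sketch – the $0.14/kWh rate is an assumption (substitute your local rate), not a number from this post:

```shell
# Monthly cost of an always-on ~10W device.
# 10W * 24h * 30d = 7.2 kWh/month; $0.14/kWh is an assumed rate.
watts=10
rate=0.14
awk -v w="$watts" -v r="$rate" \
  'BEGIN { printf "$%.2f/month\n", w * 24 * 30 / 1000 * r }'
```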
Cloud

Backblaze B2. This seems to be the cheapest storage that scales down to storing nothing. At my usage (0.5-2TB) it costs about $3-10/month, which is reasonable, and given that it is one of three copies (the other two being on the two hard drives attached to the Pi), I’m not worried about the lower reliability compared to, say, Amazon S3 (B2 claims 8 nines of durability vs. S3’s 11 nines, but S3 charges 3-4x as much for that).
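Those figures are consistent with a flat per-GB rate of roughly $0.005/GB-month (an assumption implied by the $3-10 range for 0.5-2TB – check current pricing):

```shell
# Storage cost at a few usage levels, assuming ~$0.005/GB-month.
for tb in 0.5 1 2; do
  awk -v tb="$tb" \
    'BEGIN { printf "%g TB -> $%.2f/month\n", tb, tb * 1000 * 0.005 }'
done
```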
Software

The Raspberry Pi is running Raspbian (a Debian derivative built for the Raspberry Pi). This seems to be the best-supported Linux distribution for it, and I’ve used Debian on servers and desktops for maybe 10 years now, so it’s a no-brainer. The external hard drives are in RAID1 with BTRFS. If I were starting from scratch, I would look into ZFS, but I’ve been migrating this same data across different drives and home servers (on the same file system) since ZFS was essentially experimental on Linux, and on Linux, for RAID1, BTRFS seems totally stable (people do not say the same about RAID5/6).

The point is, you should use an advanced file system in RAID1 (on ZFS you could go higher, but I prefer the simplicity and power consumption of just two drives, and can afford the wasted drive space) that can detect and correct errors, lets you swap in new drives and migrate out old ones, migrate to larger drives, etc. This is essentially the feature set that both ZFS and BTRFS have; the former is considered more stable, and the latter has been in Linux for longer.
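The post doesn’t show the initial array creation; for anyone reproducing the setup, creating a two-drive BTRFS RAID1 looks roughly like this. It’s a sketch: the device names and mount point are placeholders (verify yours with lsblk first, since mkfs erases the drives):

```shell
# Create a BTRFS filesystem mirroring both data (-d) and metadata (-m)
# across two drives, then mount it. /dev/sda, /dev/sdb, and /mntpoint
# are placeholders -- double-check device names before running!
sudo mkfs.btrfs -d raid1 -m raid1 /dev/sda /dev/sdb
sudo mkdir -p /mntpoint
sudo mount /dev/sda /mntpoint   # mounting either member mounts the array
```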
For backups, I’m using Duplicacy, which is annoyingly similarly named to a much older backup tool called Duplicity (there also seems to be another tool called Duplicati, which I haven’t tried – couldn’t backup tools get more creative with names? How about calling one “albatross”?). It’s also annoyingly not free software, but for personal use, the command-line version (the only version I would be using) is free-as-in-beer. I settled on it after trying, and failing, to use (actually open-source) competitors:

- First, I tried the aforementioned Duplicity (via its friendly frontend, duply). I was able to make some full backups (the full archive was around 600GB), but then it started erroring out: it would run out of memory while unpacking the file lists. Duplicity’s backup format is not very efficient, but it is very simple (which was appealing – just tar files and various indexes listing files). Unfortunately, some operations seem to need memory that scales with the size of the backed-up archive, which is a non-starter for my little server with 1GB of RAM (and in general shouldn’t be acceptable for backup software, but…).

- Next I tried a newer option, restic. This has a more efficient backup format, but it had the same problem of running out of memory – it wasn’t even able to complete a backup (probably a good thing, as I wasted less time!). The developers are aware of the issue, so maybe at some point it’ll be an option, but that issue is almost two years old, so ho hum…

- So finally I went with the bizarrely sort-of-but-not-really open-source option, Duplicacy. I found other people describing running it on a Raspberry Pi, and it seemed like the main place memory consumption could become a problem was the number of threads used for uploading, which thankfully is an argument. I settled on 16 and it works fine (i.e., duplicacy backup -stats -threads 16) – memory consumption hovers below 60%, which leaves a healthy buffer for anything else going on (or periodic little jumps), and in any case, more threads didn’t make it any faster.

The documentation for the command-line version is a little sparse (there is a GUI version, which costs money), but the key was figuring out that, to connect automatically to my B2 account, I needed a .duplicacy/preferences file containing the storage settings and keys (see the keys section of the documentation; the rest will probably be written out for you if you run duplicacy first).
Everything else was pretty much smooth sailing (though, as usual, the initial backup is quite slow: the Raspberry Pi 3’s processor is much faster than previous Pis’, and fast enough for this purpose, but it still has to work hard, and my residential cable upstream is not all that impressive. After a couple of days, though, the initial backup completes!).

Periodic backups run with the same command, and intermediate ones can be pruned away. I use duplicacy prune -keep 30:180 -keep 7:30 -keep 1:1, run after my daily backup, to keep monthly backups beyond 6 months, weekly beyond 1 month, and daily below that. (A cron job runs the backup daily, so the last rule is not strictly necessary, but if I do manual backups it’ll clean them up over time. Since I almost never delete files from this archive, pruning isn’t really about saving space – barring some error on the server, the latest backup should contain every file – but it is nice to keep the list of snapshots manageable.)
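The daily backup-then-prune job can be wired up with cron; this is a sketch, where the schedule and the repository path are assumptions (the post only specifies the two commands):

```shell
# Hypothetical crontab entry (edit with `crontab -e` as the user that
# owns the Duplicacy repository): back up at 3am daily, then prune.
0 3 * * * cd /mntpoint && duplicacy backup -stats -threads 16 && duplicacy prune -keep 30:180 -keep 7:30 -keep 1:1
```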
To restore after total loss of the Pi, you just put the .duplicacy/preferences file described above into place (relative to the current directory) on any machine and run duplicacy restore. You can also grab individual files (which I tested on a different machine; I haven’t tested restoring a full backup) by creating that file, running duplicacy list -files -r N (where N is the snapshot you want the file from; run duplicacy list to find which one you want), and then fetching the file with duplicacy cat -r N path/to/file > where/to/put/it.

I’m still working out how to detect hard drive errors automatically. I can see them manually by running sudo btrfs device stats /mntpoint (which I do periodically). When this shows that a drive is failing (i.e., read/write errors), add a new drive in the spare enclosure, format it, and run sudo btrfs replace start -f N /dev/sdX /mntpoint, where N is the device id of the failing drive (shown by sudo btrfs fi show /mntpoint) and /dev/sdX is the new drive. To check for and correct errors in the file system (not the underlying drive), run sudo btrfs scrub start /mntpoint. This runs in the background; you can check its progress with sudo btrfs scrub status /mntpoint. Based on recommendations, I have the scrub run monthly via a cron job.
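The monthly scrub can likewise be a cron entry; a sketch, with the schedule, binary path, and mount point as assumptions (confirm the path with `command -v btrfs`):

```shell
# Hypothetical root crontab entry (`sudo crontab -e`): start a scrub
# at 4am on the first of every month.
0 4 1 * * /bin/btrfs scrub start /mntpoint
```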
To expand the capacity of the disks, replace the drives as if they had failed (see the previous paragraph) and then run sudo btrfs fi resize N:max /mntpoint for each N (run sudo btrfs fi show to see your device ids). Replaced drives stay at the old capacity – the resize expands the filesystem to fill each device. As I mentioned earlier, I did this to swap 1TB WD Green drives for 2TB WD Reds (I replaced one, then the other, then ran the resize on both).

For tech people (i.e., anyone comfortable with scp), this setup is enough – just get files onto the server, into the right directory, and you’re all set. For less tech-savvy users, you can install samba on the Raspberry Pi and set up a share like the following (put this at the bottom of /etc/samba/smb.conf):
[sharename]
comment = Descriptive name
path = /mntpoint
browseable = yes
writeable = yes
read only = no
only guest = no
create mask = 0777
directory mask = 0777
public = yes
guest ok = no
Then set the pi user’s password with sudo smbpasswd -a pi. Restart the service with sudo /etc/init.d/samba restart, and then from a Mac (and probably Windows, though I’m not sure how, as I don’t have any in my house) you can connect to the Pi through the “Connect to Server” interface: connect as the pi user with the password you set, and you’ll see the share. Note that to make changes, /mntpoint (and its contents) needs to be writable by the pi user. You can also use a different user, set up samba differently, etc.
Summary

The system described above runs 24/7 in my home. It cost $325 in hardware (if you skip the extra USB enclosure to start and use WD Blue drives rather than Reds, you can cut $65 – i.e., $260 total), about $1/month in electricity (I haven’t measured this carefully, but that’s what 10W costs where I live), and currently about $3/month in cloud storage, though that will grow over time, so call it $5/month. Assuming no hardware replacements for three years (the warranty period on my hard drives, so a decent estimate), the total cost over that time is roughly $325 + $36 + $180 = $541, or around $180 per year – squarely in the range I wanted.
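The three-year arithmetic, spelled out using the rounded $1 and $5 monthly figures:

```shell
# Three-year cost-of-ownership sketch from the figures above.
hardware=325            # Pi, enclosures, and drives
electricity=$((1 * 36)) # ~$1/month for 36 months
cloud=$((5 * 36))       # ~$5/month (averaged) for 36 months
total=$((hardware + electricity + cloud))
echo "total over 3 years: \$$total (~\$$((total / 3))/year)"
```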
-
-
+
+ Then set pis password with sudo smbpasswd -i pi. Now restart the service with sudo /etc/init.d/sambda restart and then from a mac (and probably windows; not sure how as I don’t have any in my house) you can connect to the pi with the “Connect to Server” interface, connect as the pi user with the password you set, and see the share. Note that to be able to make changes, the /mntpoint (and what’s in it) needs to be writeable by the pi user. You can also use a different user, set up samba differently, etc.
+
+
+
+
+ Summary
+
+
+ The system described above runs 24/7 in my home. It cost $325 in hardware (which, if you want to skip the extra USB enclosure to start and use WD Blue drives rather than Red ones you can cut $65 – i.e., $260 total), $1/month in electricity (I haven’t measured this carefully, but that’s what 10W costs where I live) and currently costs about $3/month in cloud storage, though that will go up over time, so to be more fair let’s say $5/month. Assuming no hardware replacements for three years (which is the warrantee on the hard drives I have, so a decent estimate), the total cost over that time is $325 + $54 + $170 = $549, or around $180 per year, which is squarely in the range that I wanted.
+
+ Backing things up is important. Some stuff, like code that lives in repositories, may naturally end up in many places, so it perhaps is less important to explicitly back up. Other files, like photos, or personal documents, generally don’t have a natural redundant home, so they need some backup story, and relying on various online services is risky (what if they go out of business, “pivot”, etc), potentially time-consuming to keep track of (services for photos may not allow videos, or at least not full resolution ones, etc), limited in various ways (max file sizes, storage allotments, etc), not to mention bringing up serious privacy concerns. Different people need different things, but what I need, and have built (hence this post describing the system), fulfills the following requirements:
+
+
+
+ (Home) scalable – i.e., any reasonable amount of data that I could generate personally I should be able to dump in one place, and be confident that it won’t go away. What makes up the bulk is photos, some music, and some audio and video files. For me, this is currently about 1TB (+-0.5TB).
+
+
+ Cheap. I’m willing to pay about $100-200/year total (including hardware).
+
+
+ Simple. There has to be one place where I can dump files, it has to be simple enough to recover from complete failure of any given piece of hardware even if I haven’t touched it in a long time (because if it is working, I won’t have had to tweak it in months / years). Adding & organizing files should be doable without commandline familiarity, so it can serve my whole home.
+
+
+ Safe. Anything that’s not in my physical control should be encrypted.
+
+
+ Reasonably reliable. Redundancy across hardware, geographic locations, etc. This is obviously balanced with other concerns (in particular, 2 and 3)!
+
+
+
+ I’ve tried various solutions, but what I’ve ended up with seems to be working pretty well (most of it has been running for about a year; some parts are more recent, and a few have been running for much longer). It’s a combination of some cheap hardware, inexpensive cloud storage, and decent backup software.
+
+
+ Why not an off-the-shelf NAS?
+
+
+ In the past, I tried one (it was a Buffalo model). I wasn’t impressed by the software (which was hard to upgrade, install other stuff on it, maintain, etc), the power consumption (this was several years ago, but idle the two-drive system used over 30watts, which is the same power that my similarly aged quad core workstation uses when idle!). Also, a critical element of this system for me is that there is an off-site component, so getting that software on it is extremely important, and I’d rather have a well-supported linux computer to deal with rather than something esoteric. Obviously this depends in the particular NAS you get, but the system below is perfect for me. In particular, setting up and experimenting with the below was much cheaper than dropping hundreds more dollars on a new NAS that may not have worked any better than the old one, and once I had it working, there was certainly no point in going back!
+
+
+ Hardware
+
+
+
+
+ $70 - Raspberry Pi 3. This consumes very little power (a little over 1W without the disks, probably around 10W with them spinning, more like 3W when they are idling), takes up very little space, but seems plenty fast enough to act as a file server. That price includes a case, heat-sink, SD card, power adaptor, etc. If you had any of these things, you can probably get a cheaper kit (the single board itself is around $35). Note that you really want a heat-sink on the processor. I ran without it for a while (forgot to install it) and it would overheat and hard lock. It’s a tradeoff that they put a much faster processor in these than in prior generations – I think it’s worth it (it’s an amazingly capably computer for the size/price).
+
+
+
+
+ $75 - Three external USB SATA hard drive enclosures. You might be able to find these cheaper – the ones I got were metal, which seemed good in terms of heat dissipation, and have been running for a little over a year straight without a problem (note: this is actually one more than I’m using at any given time, to make it easier to rotate in new drives; BTRFS, which I’m using, allows you to just physically remove a drive and add a new one, but the preferred method is to have both attached, and issue a replace command. I’m not sure how much this matters, but for $25, I went with the extra enclosure).
+
+
+
+
+ $170 - Two 2TB WD Red SATA drives. These are actually recent upgrades – the server was been running on older 1TB Green drives (four and five years old respectively), but one of them started reporting failures (I would speculate the older of the two, but I didn’t check) so I replaced both. The cheaper blue drives probably would have been fine (the Greens that the Blues have replaced certainly have lasted well enough, running nearly 24/7 for years), but the “intended to run 24/7” Red ones were only $20 more each so I thought I might as well spring for them.
+
+
+
+
+ Cloud
+
+
+
+ Backblaze B2. This seems to be the cheapest storage that scales down to storing nothing. At my usage (0.5-2TB) it costs about $3-10/month, which is a good amount, and given that it is one of three copies (the other two being on the two hard drives I have attached to the Pi) I’m not worried about the missing reliability vs for example Amazon S3 (B2 gives 8 9s of durability vs S3 at 11 9s, but to get that S3 charges you 3-4x as much).
+
+
+
+ Software
+
+
+
+
+ The Raspberry Pi is running Raspbian (Debian distributed for the Raspberry Pi). This seems to be the best supported Linux distribution, and I’ve used Debian on servers & desktops for maybe 10 years now, so it’s a no-brainer. The external hard drives are a RAID1 with BTRFS. If I were doing it from scratch, I would look into ZFS, but I’ve been migrating this same data over different drives and home servers (on the same file system) since ZFS was essentially totally experimental on Linux, and on Linux, for RAID1, BTRFS seems totally stable (people do not say the same thing about RAID5/6).
+
+
+ The point is, you should use an advanced file system in RAID1 (on ZFS you could go higher, but I prefer simplicity and the power consumption of having just two drives, and can afford to pay for the wasted drive space) that can detect&correct errors, lets you swap in new drives and migrate out old ones, migrate to larger drives, etc. This is essentially the feature-set that both ZFS and BTRFS have, but the former is considered to be more stable and the latter has been in linux for longer.
+
+
+
+
+ For backups, I’m using Duplicacy, which is annoyingly similarly named to a much older backup tool called Duplicity (there also seems to be another tool called Duplicati, which I haven’t tried. Couldn’t backup tools get more creative with names? How about calling a tool “albatross”?). It’s also annoyingly not free software, but for personal use, the command-line version (which is the only version that I would be using) is free-as-in-beer. I actually settled on this after trying and failing to use (actually open-source) competitors:
+
+
+ First, I tried the aforementioned Duplicity (using its friendly frontend duply). I actually was able to make some full backups (the full size of the archive was around 600GB), but then it started erroring out because it would out-of-memory when trying to unpack the file lists. The backup format of Duplicity is not super efficient, but it is very simple (which was appealing – just tar files and various indexes with lists of files). Unfortunately, some operations need memory that seems to scale with the size of the currently backed up archive, which is a non-starter for my little server with 1GB of ram (and in general shouldn’t be acceptable for backup software, but…)
+
+
+ I next tried a newer option, restic. This has a more efficient backup format, but also had the same problem of running out of memory, though it wasn’t even able to make a backup (though that was probably a good thing, as I wasted less time!). They are aware of it (see, e.g., this issue, so maybe at some point it’ll be an option, but that issue is almost two years old so ho hum…).
+
+
+ So finally I went with the bizarrely sort-of-but-not-really open-source option, Duplicacy. I found other people talking about running it on a Raspberry Pi, and it seemed like the primary place where memory consumption could become a problem was the number of threads used to upload, which thankfully is an argument. I settled on 16 and it seems to work fine (i.e., duplicacy backup -stats -threads 16) – the memory consumption seems to hover below 60%, which leaves a very healthy buffer for anything else that’s going on (or periodic little jumps), and regardless, more threads don’t seem to get it to work faster.
+
+
+ The documentation on how to use the command-line version is a little sparse (there is a GUI version that costs money), but once I figured out that to configure it to connect automatically to my B2 account I needed a file .duplicacy/preferences that looked like (see keys section; the rest will probably be written out for you if you run duplicacy first; alternatively, just put this file in place and everything will be set up):
+
+ Everything else was pretty much smooth sailing (though, as per usual, the initial backup is quite slow. The Raspberry Pi 3 processor is certainly much faster than previous Raspberry Pis, and fast enough for this purpose, but it definitely still has to work hard! And my residential cable upstream is not all that impressive. After a couple days though, the initial backup will complete!).
+
+
+ Periodic backups run with the same command, and intermediate ones can be pruned away as well (I use duplicacy prune -keep 30:180 -keep 7:30 -keep 1:1, run after my daily backup, to keep monthly backups beyond 6 months, weekly beyond 1 month, and daily below that. I have a cron job that runs the backup daily, so the last is not strictly necessary, but if I do manual backups it’ll clean them up over time. Since I pretty much never delete files that are put into this archive, pruning isn’t really about saving space, as barring some error on the server the latest backup should contain every file, but it is nice to have the list of snapshots be more manageable).
+
+
+ To restore from total loss of the Pi, you just need to put the config file above into .duplicacy/preferences relative to the current directory on any machine and you can run duplicacy restore. You can also grab individual files (which I tested on a different machine; I haven’t tested restoring a full backup) by creating the above mentioned file and then running duplicacy list -files -r N (where N is the snapshot you want to get the file from; run duplicacy list to find which one you want) and then to get a file duplicacy cat -r N path/to/file > where/to/put/it.
+
+
+
+
+ I’m still working out how to detect errors in the hard drives automatically. I can see them manually by running sudo btrfs device stats /mntpoint (which I do periodically). When this shows that a drive is failing (i.e., read/write errors), add a new drive to the spare enclosure, format it, and then run sudo btrfs replace start -f N /dev/sdX /mntpoint where N is the number of the device that is failing (when you run sudo btrfs fi show /mntpoint) and /dev/sdX is the new drive. To check for and correct errors in the file system (not the underlying drive), run sudo btrfs scrub start /mntpoint. This will run in the background; if you care you can check the status with sudo btrfs scrub status /mntpoint. Based on recommendations, I have the scrub process run monthly via a cron job.
+
+
+
+
+ If you want to expand the capacity of the disks, replace the drives as if they failed (see previous bullet) and then run sudo btrfs fi resize N:max /mntpoint for each N (run sudo btrfs fi show to see what your dev ids are). When you replace them, they stay at the same capacity – this resize expands the filesystem to the full device. As I mentioned earlier, I did this to replace 1TB WD Green drives with 2TB WD Red drives (so I replaced one, then the next, then did the resize on both).
+
+
+
+
+ For tech people (i.e., who are comfortable with scp), this setup is enough – just get files onto the server, into the right directory, and it’ll be all set. For less tech-savvy people, you can install samba on the raspberry pi and then set up a share like the following (put this at the bottom of /etc/samba/smb.conf):
+
+
[sharename]
+comment = Descriptive name
+path = /mntpoint
+browseable = yes
+writeable = yes
+read only = no
+only guest = no
+create mask = 0777
+directory mask = 0777
+public = yes
+guest ok = no
+
+ Then set pis password with sudo smbpasswd -i pi. Now restart the service with sudo /etc/init.d/sambda restart and then from a mac (and probably windows; not sure how as I don’t have any in my house) you can connect to the pi with the “Connect to Server” interface, connect as the pi user with the password you set, and see the share. Note that to be able to make changes, the /mntpoint (and what’s in it) needs to be writeable by the pi user. You can also use a different user, set up samba differently, etc.
+
+
+
+
+ Summary
+
+
+ The system described above runs 24/7 in my home. It cost $325 in hardware ($260 if you skip the extra USB enclosure to start and use WD Blue drives rather than Red ones, saving $65), about $1/month in electricity (I haven’t measured this carefully, but that’s what 10W costs where I live), and currently about $3/month in cloud storage, though that will go up over time, so to be fair let’s say $5/month. Assuming no hardware replacements for three years (the warranty period on my hard drives, so a decent estimate), the total cost over that time is $325 + $54 + $170 = $549, or around $180 per year, which is squarely in the range that I wanted.
+
+
+
+
diff --git a/_site/essays/2018-01-16-how-to-prove-a-compiler-correct.html b/_site/essays/2018-01-16-how-to-prove-a-compiler-correct.html
index 40d42ea..67c1dbc 100644
--- a/_site/essays/2018-01-16-how-to-prove-a-compiler-correct.html
+++ b/_site/essays/2018-01-16-how-to-prove-a-compiler-correct.html
@@ -1,15 +1,17 @@
-
-
-
-
- dbp.io :: How to prove a compiler correct
-
-
-
-
-
- Daniel Patterson
+
+
+
+
+
+ dbp.io :: How to prove a compiler correct
+
+
+
+
+
+
At POPL’18 (Principles of Programming Languages) last week, I ended up talking to Annie Cherkaev about her really cool DSL (domain specific language) SweetPea (which she presented at Off the Beaten Track 18, a workshop colocated with POPL), which is a “SAT-Sampler aided language for experimental design, targeted for Psychology & Neuroscience”. In particular, we were talking about software engineering, and the work that Annie was doing to test SweetPea and increase her confidence that the implementation is correct!
-
The topic of how exactly one goes about proving a compiler correct came up, and I realized that I couldn’t think of a high-level (but concrete) overview of what that might look like. Also, like many compilers, hers is implemented in Haskell, so it seemed like a good opportunity to try out the really cool work presented at the colocated conference CPP’18 (Certified Programs and Proofs) titled “Total Haskell is Reasonable Coq” by Spector-Zabusky, Breitner, Rizkallah, and Weirich. They have a tool (hs-to-coq) that extracts Coq definitions from (certain) terminating Haskell programs (of which at least small compilers hopefully qualify). There are certainly limitations to this approach (see Addendum at the bottom of the page for some discussion), but it seems very promising from an engineering perspective.
-
The intention of this post is twofold:
-
-
Show how to take a compiler (albeit a tiny one) that was built with no intention of verifying it and prove it correct after the fact. Part of the ability to do this in such a seamless way is the wonderful hs-to-coq tool mentioned above, though there is no reason in principle you couldn’t carry out this translation manually (in practice maintenance becomes an issue, which is why realistic verified compilers have written their implementations within theorem provers like Coq and then extracted executable versions automatically, at least in the past – possibly hs-to-coq could change this workflow).
-
Give a concrete example of proving compiler correctness. By necessity, this is a very simplified scenario without a lot of the subtleties that appear in real verification efforts (e.g., undefined behavior, multiple compiler passes, linking with code after compilation, etc). On the other hand, even this simplified scenario could cover many cases of DSLs, and understanding the subtleties that come up should be much easier once you understand the basic case!
-
-
The intended audience is: people who know what compilers are (and may have implemented them!) but aren’t sure what it means to prove one correct!
-
-
All the code for this post, along with instructions to get it running, is in the repository https://github.com/dbp/howtoproveacompiler. If you have any trouble getting it going, open an issue on that repository.
-
-
DSL & Compiler
-
To make this simple, my source language is arithmetic expressions with adding, subtraction, and multiplication. I represent this as an explicit data structure in Haskell:
And a program is an Arith. For example, the source expression “1 + (2 * 4)” is represented as Plus 1 (Times 2 4). The target of this is a sequence of instructions for a stack machine. The idea of the stack machine is that there is a stack of values that can be used by instructions. The target language expressions are:
-
data StackOp = SNum Int
-             | SPlus
-             | SMinus
-             | STimes
-
And a program is a [StackOp]. For example, the previous example “1 + (2 * 4)” could be represented as [SNum 1, SNum 2, SNum 4, STimes, SPlus]. The idea is that a number evaluates by pushing itself onto the stack, and plus/minus/times evaluate by popping two numbers off the stack and pushing the sum/difference/product respectively back on. We can make this concrete by writing an eval function that takes an initial stack (which will probably be empty) and a program, and either produces an integer (the top of the stack after all the instructions run) or an error (which, for debugging’s sake, is the state of the stack and the rest of the program when it got stuck).
Now that we have our source and target language, and know how the target works, we can implement our compiler. Part of why this is a good small example is that the compiler is very simple!
The cases for plus/minus/times are the ones that are slightly non-obvious, because they can contain further recursive expressions, but if you think about what the eval function is doing, once the stack machine finishes evaluating everything that a2 compiled to, the number that the left branch evaluated to should be on the top of the stack. Then once it finishes evaluating what a1 compiles to, the number that the right branch evaluated to should be on the top of the stack (the reversal is so that they are in the right order when popped off). This means that evaluating e.g. SPlus will put the sum on the top of the stack, as expected. That’s a pretty informal argument about correctness, but we’ll have a chance to get more formal later.
-
Formalizing
-
Now that we have a Haskell compiler, we want to prove it correct! So what do we do? First, we want to convert this to Coq using the hs-to-coq tool. There are full instructions at https://github.com/dbp/howtoproveacompiler, but the main command that will convert src/Compiler.hs to src/Compiler.v:
And open up src/Proofs.v using a Coq interactive mode (I use Proof General within Emacs; with Spacemacs, this is particularly easy: use the coq layer!).
-
Proving things
-
We now have a Coq version of our compiler, complete with our evaluation function. So we should be able to write down a theorem that we would like to prove. What should the theorem say? Well, there are various things you could prove, but the most basic theorem in compiler correctness says essentially that running the source program and the target program “does the same thing”. This is often stated as “semantics preservation”, and is typically proven formally by way of a backwards simulation: whatever the target program does, the source program also should do (for a much more thorough discussion of this, check out William Bowman’s blog post, What even is compiler correctness?). In languages with ambiguity (nondeterminism, undefined behavior), this becomes much more complicated, but in our setting, we would state it as:
-
Theorem (informal). For all source arith expressions A, if eval [] (compile A) produces integer N then evaluating A should produce the same number N.
-
The issue that’s immediately apparent is that we don’t actually have a way of directly evaluating the source expression. The only thing we can do with our source expression is compile it, but if we do that, any statement we get has the behavior of the compiler baked into it (so if the compiler is wrong, we will just be proving stuff about our wrong compiler).
-
More philosophically, what does it even mean that the compiler is wrong? For it to be wrong, there has to be some external specification (likely, just in our head at this point) about what it was supposed to do, or in this case, about the behavior of the source language that the compiler was supposed to faithfully preserve. To prove things formally, we need to write that behavior down.
-
So we should add this function to our Haskell source. In a non-trivial DSL, this may be a significant part of the formalization process, but it is also incredibly important, because this is the part where you are actually specifying exactly what the source DSL means (otherwise, the only “meaning” it has is whatever the compiler happens to do, bugs and all). In this example, we can write this function as:
And we can re-run hs-to-coq to get it added to our Coq development. We can now formally state the theorem we want to prove as:
-
Theorem compiler_correctness : forall a : Arith,
+
+
+
+ How to prove a compiler correct
+
+
+ At POPL’18 (Principles of Programming Languages) last week, I ended up talking to Annie Cherkaev about her really cool DSL (domain specific language) SweetPea (which she presented at Off the Beaten Track 18, a workshop colocated with POPL), which is a “SAT-Sampler aided language for experimental design, targeted for Psychology & Neuroscience”. In particular, we were talking about software engineering, and the work that Annie was doing to test SweetPea and increase her confidence that the implementation is correct!
+
+
+ The topic of how exactly one goes about proving a compiler correct came up, and I realized that I couldn’t think of a high-level (but concrete) overview of what that might look like. Also, like many compilers, hers is implemented in Haskell, so it seemed like a good opportunity to try out the really cool work presented at the colocated conference CPP’18 (Certified Programs and Proofs) titled “Total Haskell is Reasonable Coq” by Spector-Zabusky, Breitner, Rizkallah, and Weirich. They have a tool (hs-to-coq) that extracts Coq definitions from (certain) terminating Haskell programs (of which at least small compilers hopefully qualify). There are certainly limitations to this approach (see Addendum at the bottom of the page for some discussion), but it seems very promising from an engineering perspective.
+
+
+ The intention of this post is twofold:
+
+
+
+ Show how to take a compiler (albeit a tiny one) that was built with no intention of verifying it and prove it correct after the fact. Part of the ability to do this in such a seamless way is the wonderful hs-to-coq tool mentioned above, though there is no reason in principle you couldn’t carry out this translation manually (in practice maintenance becomes an issue, which is why realistic verified compilers have written their implementations within theorem provers like Coq and then extracted executable versions automatically, at least in the past – possibly hs-to-coq could change this workflow).
+
+
+ Give a concrete example of proving compiler correctness. By necessity, this is a very simplified scenario without a lot of the subtleties that appear in real verification efforts (e.g., undefined behavior, multiple compiler passes, linking with code after compilation, etc). On the other hand, even this simplified scenario could cover many cases of DSLs, and understanding the subtleties that come up should be much easier once you understand the basic case!
+
+
+
+ The intended audience is: people who know what compilers are (and may have implemented them!) but aren’t sure what it means to prove one correct!
+
+
+
+ All the code for this post, along with instructions to get it running, is in the repository https://github.com/dbp/howtoproveacompiler. If you have any trouble getting it going, open an issue on that repository.
+
+
+
+ DSL & Compiler
+
+
+ To make this simple, my source language is arithmetic expressions with adding, subtraction, and multiplication. I represent this as an explicit data structure in Haskell:
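The post’s code listing is missing from this page, so the following is my reconstruction from the surrounding prose — the constructor names are assumptions. (Writing bare literals like Plus 1 (Times 2 4) would additionally need a Num instance with fromInteger = Num, which I omit and use explicit Num constructors instead.)

```haskell
-- Reconstructed source language: arithmetic expressions with addition,
-- subtraction, and multiplication.
data Arith = Num Int
           | Plus Arith Arith
           | Minus Arith Arith
           | Times Arith Arith
  deriving (Show, Eq)

-- "1 + (2 * 4)" with explicit Num constructors:
example :: Arith
example = Plus (Num 1) (Times (Num 2) (Num 4))
```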
+
+ And a program is an Arith. For example, the source expression “1 + (2 * 4)” is represented as Plus 1 (Times 2 4). The target of this is a sequence of instructions for a stack machine. The idea of the stack machine is that there is a stack of values that can be used by instructions. The target language expressions are:
+
+
+
data StackOp = SNum Int
+             | SPlus
+             | SMinus
+             | STimes
+
+
+ And a program is a [StackOp]. For example, the previous example “1 + (2 * 4)” could be represented as [SNum 1, SNum 2, SNum 4, STimes, SPlus]. The idea is that a number evaluates by pushing itself onto the stack, and plus/minus/times evaluate by popping two numbers off the stack and pushing the sum/difference/product respectively back on. We can make this concrete by writing an eval function that takes an initial stack (which will probably be empty) and a program, and either produces an integer (the top of the stack after all the instructions run) or an error (which, for debugging’s sake, is the state of the stack and the rest of the program when it got stuck).
+
+ Now that we have our source and target language, and know how the target works, we can implement our compiler. Part of why this is a good small example is that the compiler is very simple!
+
+ The cases for plus/minus/times are the ones that are slightly non-obvious, because they can contain further recursive expressions, but if you think about what the eval function is doing, once the stack machine finishes evaluating everything that a2 compiled to, the number that the left branch evaluated to should be on the top of the stack. Then once it finishes evaluating what a1 compiles to, the number that the right branch evaluated to should be on the top of the stack (the reversal is so that they are in the right order when popped off). This means that evaluating e.g. SPlus will put the sum on the top of the stack, as expected. That’s a pretty informal argument about correctness, but we’ll have a chance to get more formal later.
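Since this page drops the post’s code listings, here is a self-contained reconstruction of the stack machine and compiler, consistent with the example instruction sequence [SNum 1, SNum 2, SNum 4, STimes, SPlus] for “1 + (2 * 4)”. The Arith type is repeated so the snippet stands alone; the pop order and the Left error payload are my assumptions, not the post’s exact definitions:

```haskell
data Arith = Num Int | Plus Arith Arith | Minus Arith Arith | Times Arith Arith
  deriving (Show, Eq)

data StackOp = SNum Int | SPlus | SMinus | STimes
  deriving (Show, Eq)

-- eval takes an initial stack and a program; on success it returns the top
-- of the stack, and on failure the stuck state (stack, remaining program).
eval :: [Int] -> [StackOp] -> Either ([Int], [StackOp]) Int
eval (x : _)     []             = Right x
eval s           (SNum n : ops) = eval (n : s) ops
eval (x : y : s) (SPlus  : ops) = eval (y + x : s) ops
eval (x : y : s) (SMinus : ops) = eval (y - x : s) ops
eval (x : y : s) (STimes : ops) = eval (y * x : s) ops
eval s           ops            = Left (s, ops)  -- stuck: stack too short

-- Compile the left branch, then the right, then the operator. The right
-- operand ends up on top of the stack, so eval above computes
-- second-popped `op` first-popped (this is the "reversal" mentioned in
-- the text, and it matters for subtraction).
compile :: Arith -> [StackOp]
compile (Num n)       = [SNum n]
compile (Plus a1 a2)  = compile a1 ++ compile a2 ++ [SPlus]
compile (Minus a1 a2) = compile a1 ++ compile a2 ++ [SMinus]
compile (Times a1 a2) = compile a1 ++ compile a2 ++ [STimes]
```

With these definitions, eval [] (compile (Plus (Num 1) (Times (Num 2) (Num 4)))) gives Right 9.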
+
+
+ Formalizing
+
+
+ Now that we have a Haskell compiler, we want to prove it correct! So what do we do? First, we want to convert this to Coq using the hs-to-coq tool. There are full instructions at https://github.com/dbp/howtoproveacompiler, but the main command that will convert src/Compiler.hs to src/Compiler.v:
+
+ And open up src/Proofs.v using a Coq interactive mode (I use Proof General within Emacs; with Spacemacs, this is particularly easy: use the coq layer!).
+
+
+ Proving things
+
+
+ We now have a Coq version of our compiler, complete with our evaluation function. So we should be able to write down a theorem that we would like to prove. What should the theorem say? Well, there are various things you could prove, but the most basic theorem in compiler correctness says essentially that running the source program and the target program “does the same thing”. This is often stated as “semantics preservation”, and is typically proven formally by way of a backwards simulation: whatever the target program does, the source program also should do (for a much more thorough discussion of this, check out William Bowman’s blog post, What even is compiler correctness?). In languages with ambiguity (nondeterminism, undefined behavior), this becomes much more complicated, but in our setting, we would state it as:
+
+
+ Theorem (informal). For all source arith expressions A, if eval [] (compile A) produces integer N then evaluating A should produce the same number N.
+
+
+ The issue that’s immediately apparent is that we don’t actually have a way of directly evaluating the source expression. The only thing we can do with our source expression is compile it, but if we do that, any statement we get has the behavior of the compiler baked into it (so if the compiler is wrong, we will just be proving stuff about our wrong compiler).
+
+
+ More philosophically, what does it even mean that the compiler is wrong? For it to be wrong, there has to be some external specification (likely, just in our head at this point) about what it was supposed to do, or in this case, about the behavior of the source language that the compiler was supposed to faithfully preserve. To prove things formally, we need to write that behavior down.
+
+
+ So we should add this function to our Haskell source. In a non-trivial DSL, this may be a significant part of the formalization process, but it is also incredibly important, because this is the part where you are actually specifying exactly what the source DSL means (otherwise, the only “meaning” it has is whatever the compiler happens to do, bugs and all). In this example, we can write this function as:
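The listing is again missing from this page; a plausible reconstruction of the source-level evaluator (the name eval' comes from the surrounding text; the Arith constructors, repeated here so the snippet stands alone, are my assumption) is:

```haskell
data Arith = Num Int | Plus Arith Arith | Minus Arith Arith | Times Arith Arith

-- The direct meaning of a source expression: ordinary integer arithmetic.
eval' :: Arith -> Int
eval' (Num n)       = n
eval' (Plus a1 a2)  = eval' a1 + eval' a2
eval' (Minus a1 a2) = eval' a1 - eval' a2
eval' (Times a1 a2) = eval' a1 * eval' a2
```

For example, eval' (Plus (Num 1) (Times (Num 2) (Num 4))) is 9.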
+
+ And we can re-run hs-to-coq to get it added to our Coq development. We can now formally state the theorem we want to prove as:
+
+
Theorem compiler_correctness : forall a : Arith,
eval nil (compile a) = Data.Either.Right (eval' a).
-
I’m going to sketch out how this proof went. Proving stuff can be complex, but this maybe gives a sense of some of the thinking that goes into it. To go further, you probably want to take a course if you can find one, or follow a book like:
If you were to prove this on paper, you would proceed by induction on the structure of the arithmetic expression, so let’s start that way. The base case goes away trivially and we can expand the case for plus using:
-
induction a; iauto; simpl.
-
We see (above the line is assumptions, below what you need to prove):
+ I’m going to sketch out how this proof went. Proving stuff can be complex, but this maybe gives a sense of some of the thinking that goes into it. To go further, you probably want to take a course if you can find one, or follow a book like:
+
+ If you were to prove this on paper, you would proceed by induction on the structure of the arithmetic expression, so let’s start that way. The base case goes away trivially and we can expand the case for plus using:
+
+
induction a; iauto; simpl.
+
+ We see (above the line is assumptions, below what you need to prove):
+
If we look at it for a little while, we realize two things:
-
-
Our induction hypotheses really aren’t going to work, intuitively because of the Either — our program won’t produce Right results for the subtrees, so there probably won’t be a way to rely on these hypotheses.
-
On the other hand, what does look like a Lemma we should be able to prove has to do with evaluating a partial program. Rather than trying to induct on the entire statement, we instead try to prove that evaling a compiled term will result in the eval'd term on the top of the stack. This is an instance of a more general pattern – that often the toplevel statement that you want has too much specificity, and you need to instead prove something that is more general and then use it for the specific case. So here’s (a first attempt) at a Lemma we want to prove:
-
-
Lemma eval_step : forall a : Arith, forall xs : list StackOp,
+
+ If we look at it for a little while, we realize two things:
+
+
+
+ Our induction hypotheses really aren’t going to work, intuitively because of the Either — our program won’t produce Right results for the subtrees, so there probably won’t be a way to rely on these hypotheses.
+
+
+ On the other hand, what does look like a Lemma we should be able to prove has to do with evaluating a partial program. Rather than trying to induct on the entire statement, we instead try to prove that evaling a compiled term will result in the eval'd term on the top of the stack. This is an instance of a more general pattern – that often the toplevel statement that you want has too much specificity, and you need to instead prove something that is more general and then use it for the specific case. So here’s (a first attempt) at a Lemma we want to prove:
+
+
+
Lemma eval_step : forall a : Arith, forall xs : list StackOp,
eval nil (compile a ++ xs) = eval (eval' a :: nil) xs.
-
This is more general, and again we start by inducting on a, expanding and eliminating the base case:
We need to reshuffle the list associativity and then we can rewrite using the first hypothesis:
-
rewrite List.app_assoc_reverse. rewrite IHa1.
-
But now there is a problem (this is common, hence going over it!). We want to use our second hypothesis. Once we do that, we can reduce based on the definition of eval and we’ll be done (with this case, but multiplication is the same). The issue is that IHa2 needs the stack to be empty, and the stack we now have (since we used IHa1) is eval' a1 :: nil, so it can’t be used:
+ We need to reshuffle the list associativity and then we can rewrite using the first hypothesis:
+
+
rewrite List.app_assoc_reverse. rewrite IHa1.
+
+ But now there is a problem (this is common, hence going over it!). We want to use our second hypothesis. Once we do that, we can reduce based on the definition of eval and we’ll be done (with this case, but multiplication is the same). The issue is that IHa2 needs the stack to be empty, and the stack we now have (since we used IHa1) is eval' a1 :: nil, so it can’t be used:
+
The solution is to go back to what our Lemma statement said and generalize it now to arbitrary stacks (so in this process we’ve now generalized twice!), so that the inductive hypotheses are correspondingly stronger:
-
Lemma eval_step : forall a : Arith, forall s : list Num.Int, forall xs : list StackOp,
+
+ The solution is to go back to what our Lemma statement said and generalize it now to arbitrary stacks (so in this process we’ve now generalized twice!), so that the inductive hypotheses are correspondingly stronger:
+
+
Lemma eval_step : forall a : Arith, forall s : list Num.Int, forall xs : list StackOp,
eval s (compile a ++ xs) = eval (eval' a :: s) xs.
-
Now if we start the proof in the same way:
-
induction a; intros; simpl; iauto.
-
We run into an odd problem. We have a silly obligation:
-
match s with
+
+ Now if we start the proof in the same way:
+
+
induction a; intros; simpl; iauto.
+
+ We run into an odd problem. We have a silly obligation:
+
+
match s with
| nil => eval (i :: s) xs
| (_ :: nil)%list => eval (i :: s) xs
| (_ :: _ :: _)%list => eval (i :: s) xs
end = eval (i :: s) xs
-
Which will go away once we break apart the list s and simplify (if you look carefully, it has the same thing in all three branches of the match). There are (at least) a couple approaches to this:
-
-
We could just do it manually: destruct s; simpl; eauto; destruct s; simpl; eauto. But it shows up multiple times in the proof, and that’s a mess and someone reading the proof script may be confused what is going on.
-
We could write a tactic for the same thing:
-
try match goal with
+
+ Which will go away once we break apart the list s and simplify (if you look carefully, it has the same thing in all three branches of the match). There are (at least) a couple approaches to this:
+
+
+
+
+ We could just do it manually: destruct s; simpl; eauto; destruct s; simpl; eauto. But it shows up multiple times in the proof, and that’s a mess and someone reading the proof script may be confused what is going on.
+
+
+
+
+ We could write a tactic for the same thing:
+
+
try match goal with
|[l : list _ |- _ ] => solve [destruct l; simpl; eauto; destruct l; simpl; eauto]
end.
-
This has the advantage that it doesn’t depend on the name, you can call it whenever (it won’t do anything if it isn’t able to discharge the goal), but where to call it is still somewhat messy (as it’ll be in the middle of the proofs). We could hint using this tactic (using Hint Extern) to have it handled automatically, but I generally dislike adding global hints for tactics (unless there is a very good reason!), as it can slow things down and make understanding why proofs worked more difficult.
-
We can also write lemmas for these. There are actually two cases that come up, and both are solved easily:
-
Lemma list_pointless_split : forall A B:Type, forall l : list A, forall x : B,
+
+ This has the advantage that it doesn’t depend on the name, you can call it whenever (it won’t do anything if it isn’t able to discharge the goal), but where to call it is still somewhat messy (as it’ll be in the middle of the proofs). We could hint using this tactic (using Hint Extern) to have it handled automatically, but I generally dislike adding global hints for tactics (unless there is a very good reason!), as it can slow things down and make understanding why proofs worked more difficult.
+
+
+
+
+ We can also write lemmas for these. There are actually two cases that come up, and both are solved easily:
+
+
Lemma list_pointless_split : forall A B:Type, forall l : list A, forall x : B,
match l with | nil => x | (_ :: _)%list => x end = x.
Proof.
destruct l; eauto.
@@ -151,10 +258,15 @@
In this style, we can then hint using these lemmas locally to where they are needed.
-
-
Now we know the proof should follow from list associativity, this pointless list splitting, and the inductive hypotheses. We can write this down formally (this relies on the literatecoq library, which is just a few tactics at this point) as:
-
Lemma eval_step : forall a : Arith, forall s : list Num.Int, forall xs : list StackOp,
+
+ In this style, we can then hint using these lemmas locally to where they are needed.
+
+
+
+
+ Now we know the proof should follow from list associativity, this pointless list splitting, and the inductive hypotheses. We can write this down formally (this relies on the literatecoq library, which is just a few tactics at this point) as:
+
+
Lemma eval_step : forall a : Arith, forall s : list Num.Int, forall xs : list StackOp,
eval s (compile a ++ xs) = eval (eval' a :: s) xs.
Proof.
hint_rewrite List.app_assoc_reverse.
@@ -163,55 +275,99 @@
Which says that we know that we will need the associativity lemma and these list splitting lemmas somewhere. Then we proceed by induction, handle the base case, and then use the inductive hypotheses to handle the rest.
-
We can then go back to our main theorem, and proceed in a similar style. We prove by induction, relying on the eval_step lemma, and in various places needing to simplify (for the observant reader, iauto and iauto' only differ in that iauto' does a deeper proof search).
-
Theorem compiler_correctness : forall a : Arith,
+
+ Which says that we know that we will need the associativity lemma and these list splitting lemmas somewhere. Then we proceed by induction, handle the base case, and then use the inductive hypotheses to handle the rest.
+
+
+ We can then go back to our main theorem, and proceed in a similar style. We prove by induction, relying on the eval_step lemma, and in various places needing to simplify (for the observant reader, iauto and iauto' only differ in that iauto' does a deeper proof search).
+
We now have a proof that the compiler that we wrote in Haskell is correct, insofar as it preserves the meaning expressed in the source-level eval' function to the meaning in the eval function in the target. This isn’t, of course, the only theorem you could prove! Another one that would be interesting would be that no compiled program ever got stuck (i.e., never produces a Left error).
Suppose that instead of taking a single Arith, we instead wanted to take [Arith]. This would still work, and would result in the list of results stored on the stack (so probably you would want to change eval to print everything that was on the stack at the end, not just the top). If you wrote this compile:
You would get an error when you try to compile the output of hs-to-coq! Coq says that the compile function is not terminating!
-
This is a good introduction to a (major) difference between Haskell and Coq: in Haskell, any term can run forever. For a programming language, this is an inconvenience, as you can end up with code that runs forever when you didn’t want it to, which can be difficult to debug (it’s also useful if you happen to be writing a server that is supposed to run forever!). For a language intended to be used to prove things, this feature would be a non-starter, as it would make the logic unsound. The issue is that in Coq (at a high level), a type is a theorem and the term that inhabits the type is a proof of that theorem. But in Haskell, you can write:
-
anything :: a
-anything = anything
-
i.e., for any type, you can provide a term with that type — that is, the term that simply never returns. If that were possible in Coq, you could prove any theorem, and the entire logic would be useless (or unsound, which technically means you can prove logical falsehood, but since falsehood allows you to prove anything, it’s the same thing).
-
Returning to this (only slightly contrived) program, it isn’t actually that our program runs forever (and if you do want to prove things about programs that do, you’ll need to do much more work!), just that Coq can’t tell that it doesn’t. In general, it’s not possible to tell this for sufficiently powerful languages (this is what the Halting problem says for Turing machines, and thus holds for anything with similar expressivity). What Coq relies on is that some argument is inductively defined (which we have: both lists and Arith expressions) and that all recursive calls are to structurally smaller parts of the arguments. If that holds, we are guaranteed to terminate, as inductive types cannot be infinite (note: unlike Haskell, Coq is not lazy, which is another difference, but we’ll ignore that). If we look at our recursive call, we called compile with [a1]. While a1 is structurally smaller, we put that inside a list and used that instead, which thus violates what Coq was expecting.
-
There are various ways around this (like adding another argument whose purpose is to track termination, or adding more sophisticated measurements), but there is another option: adding a helper function compile' that does what our original compile did: compiles a single Arith. The intuition that leads to trying this is that in this new compile we are decreasing on both the length of the list and the structure of the Arith, but we are trying to do both at the same time. By separating things out, we can eliminate the issue:
+ At POPL’18 (Principles of Programming Languages) last week, I ended up talking to Annie Cherkaev about her really cool DSL (domain specific language) SweetPea (which she presented at Off the Beaten Track 18, a workshop colocated with POPL), which is a “SAT-Sampler aided language for experimental design, targeted for Psychology & Neuroscience”. In particular, we were talking about software engineering, and the work that Annie was doing to test SweetPea and increase her confidence that the implementation is correct!
+
+
+ The topic of how exactly one goes about proving a compiler correct came up, and I realized that I couldn’t think of a high-level (but concrete) overview of what that might look like. Also, like many compilers, hers is implemented in Haskell, so it seemed like a good opportunity to try out the really cool work presented at the colocated conference CPP’18 (Certified Programs and Proofs) titled “Total Haskell is Reasonable Coq” by Spector-Zabusky, Breitner, Rizkallah, and Weirich. They have a tool (hs-to-coq) that extracts Coq definitions from (certain) terminating Haskell programs (of which at least small compilers hopefully qualify). There are certainly limitations to this approach (see Addendum at the bottom of the page for some discussion), but it seems very promising from an engineering perspective.
+
+
+ The intention of this post is twofold:
+
+
+
+ Show how to take a compiler (albeit a tiny one) that was built with no intention of verifying it and, after the fact, prove it correct. Part of the ability to do this in such a seamless way is the wonderful hs-to-coq tool mentioned above, though there is no reason in principle you couldn’t carry out this translation manually (in practice, maintenance becomes an issue, which is why realistic verified compilers have relied on writing their implementations within theorem provers like Coq and then extracting executable versions automatically – possibly hs-to-coq could change this workflow).
+
+
+ Give a concrete example of proving compiler correctness. By necessity, this is a very simplified scenario without a lot of the subtleties that appear in real verification efforts (e.g., undefined behavior, multiple compiler passes, linking with code after compilation, etc). On the other hand, even this simplified scenario could cover many cases of DSLs, and understanding the subtleties that come up should be much easier once you understand the basic case!
+
+
+
+ The intended audience is: people who know what compilers are (and may have implemented them!) but aren’t sure what it means to prove one correct!
+
+
+
+ All the code for this post, along with instructions to get it running, is in the repository https://github.com/dbp/howtoproveacompiler. If you have any trouble getting it going, open an issue on that repository.
+
+
+
+ DSL & Compiler
+
+
+ To make this simple, my source language is arithmetic expressions with addition, subtraction, and multiplication. I represent this as an explicit data structure in Haskell:
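The data type itself was elided from this extract; given the operations discussed, it plausibly looked something like the following sketch. The constructor names, and in particular the Num wrapper for integer literals, are my assumptions (with this definition, the example below would be written out fully as Plus (Num 1) (Times (Num 2) (Num 4))):

```haskell
-- A sketch of the source language: arithmetic expressions, with a
-- constructor for integer literals. Constructor names are assumed.
data Arith = Num Int
           | Plus Arith Arith
           | Minus Arith Arith
           | Times Arith Arith
           deriving (Eq, Show)
```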
+
+ And a program is an Arith. For example, the source expression “1 + (2 * 4)” is represented as Plus 1 (Times 2 4). The target of this is a sequence of instructions for a stack machine. The idea of the stack machine is that there is a stack of values that can be used by instructions. The target language expressions are:
+
+
+
data StackOp = SNum Int
             | SPlus
             | SMinus
             | STimes
+
+ And a program is a [StackOp]. For example, the previous example “1 + (2 * 4)” could be represented as [SNum 1, SNum 2, SNum 4, STimes, SPlus]. The idea is that a number evaluates to pushing it onto the stack, and plus/times evaluate by popping two numbers off the stack and pushing the sum/product respectively back on. But we can make this concrete by writing an eval function that takes an initial stack (which will probably be empty) and a program, and either produces an integer (the top of the stack after all the instructions run) or an error (which, for debugging’s sake, is the state of the stack and the rest of the program when it got stuck).
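The eval function itself was elided from this extract; following that description, one plausible version is the sketch below (the exact error representation is an assumption):

```haskell
-- Assumed target-language definition (elided from this extract).
data StackOp = SNum Int | SPlus | SMinus | STimes
  deriving (Eq, Show)

-- Run a stack program. Numbers push themselves; operators pop two
-- values and push the result. On success, return the top of the
-- final stack; when stuck, return the stack and remaining program.
eval :: [Int] -> [StackOp] -> Either ([Int], [StackOp]) Int
eval (x : _)     []             = Right x
eval s           (SNum n : ops) = eval (n : s) ops
eval (x : y : s) (SPlus  : ops) = eval ((x + y) : s) ops
eval (x : y : s) (SMinus : ops) = eval ((x - y) : s) ops
eval (x : y : s) (STimes : ops) = eval ((x * y) : s) ops
eval s           ops            = Left (s, ops)  -- stuck
```

With this sketch, eval [] [SNum 1, SNum 2, SNum 4, STimes, SPlus] steps through to Right 9, matching the example above.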
+
+ Now that we have our source and target language, and know how the target works, we can implement our compiler. Part of why this is a good small example is that the compiler is very simple!
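The compiler itself was also elided from this extract; based on the discussion of evaluation order around it, it plausibly looked like the sketch below (the Arith constructor names are assumptions):

```haskell
-- Assumed definitions (elided from this extract).
data Arith = Num Int | Plus Arith Arith | Minus Arith Arith | Times Arith Arith
  deriving (Eq, Show)
data StackOp = SNum Int | SPlus | SMinus | STimes
  deriving (Eq, Show)

-- Compile an expression to stack code. The right operand's code is
-- emitted first, so the left operand's value is on top of the stack
-- when the operator pops its arguments (this matters for SMinus).
compile :: Arith -> [StackOp]
compile (Num n)       = [SNum n]
compile (Plus  a1 a2) = compile a2 ++ compile a1 ++ [SPlus]
compile (Minus a1 a2) = compile a2 ++ compile a1 ++ [SMinus]
compile (Times a1 a2) = compile a2 ++ compile a1 ++ [STimes]
```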
+
+ The cases for plus/minus/times are the cases that are slightly non-obvious, because they can contain further recursive expressions, but think about what the eval function is doing: once the stack machine finishes evaluating everything that a2 compiled to, the number that the right branch evaluated to should be on the top of the stack. Then once it finishes evaluating what a1 compiles to, the number that the left branch evaluated to should be on the top of the stack (the reversal is so that they are in the right order when popped off). This means that evaluating e.g. SPlus will put the sum on the top of the stack, as expected. That’s a pretty informal argument about correctness, but we’ll have a chance to get more formal later.
+
+
+ Formalizing
+
+
+ Now that we have a Haskell compiler, we want to prove it correct! So what do we do? First, we want to convert this to Coq using the hs-to-coq tool. There are full instructions at https://github.com/dbp/howtoproveacompiler, but the main command that will convert src/Compiler.hs to src/Compiler.v:
+
+ Then open up src/Proofs.v using a Coq interactive mode (I use Proof General within Emacs; with Spacemacs, this is particularly easy: use the coq layer!).
+
+
+ Proving things
+
+
+ We now have a Coq version of our compiler, complete with our evaluation function. So we should be able to write down a theorem that we would like to prove. What should the theorem say? Well, there are various things you could prove, but the most basic theorem in compiler correctness says essentially that running the source program and the target program “does the same thing”. This is often stated as “semantics preservation” and is typically proven by way of a backwards simulation: whatever the target program does, the source program also should do (for a much more thorough discussion of this, check out William Bowman’s blog post, What even is compiler correctness?). In languages with ambiguity (nondeterminism, undefined behavior), this becomes much more complicated, but in our setting, we would state it as:
+
+
+ Theorem (informal). For all source arith expressions A, if eval [] (compile A) produces integer N then evaluating A should produce the same number N.
+
+
+ The issue that’s immediately apparent is that we don’t actually have a way of directly evaluating the source expression. The only thing we can do with our source expression is compile it, but if we do that, any statement we get has the behavior of the compiler baked into it (so if the compiler is wrong, we will just be proving stuff about our wrong compiler).
+
+
+ More philosophically, what does it even mean that the compiler is wrong? For it to be wrong, there has to be some external specification (likely, just in our head at this point) about what it was supposed to do, or in this case, about the behavior of the source language that the compiler was supposed to faithfully preserve. To prove things formally, we need to write that behavior down.
+
+
+ So we should add this function to our Haskell source. In a non-trivial DSL, this may be a significant part of the formalization process, but it is also incredibly important, because this is the part where you are actually specifying exactly what the source DSL means (otherwise, the only “meaning” it has is whatever the compiler happens to do, bugs and all). In this example, we can write this function as:
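The eval' function was elided from this extract; given the theorem stated below (eval nil (compile a) = Right (eval' a)), it plausibly looked like this sketch (Arith constructor names are assumptions):

```haskell
-- Assumed source definition (elided from this extract).
data Arith = Num Int | Plus Arith Arith | Minus Arith Arith | Times Arith Arith
  deriving (Eq, Show)

-- The meaning of a source expression: ordinary arithmetic. This is
-- the specification the compiler must preserve.
eval' :: Arith -> Int
eval' (Num n)       = n
eval' (Plus  a1 a2) = eval' a1 + eval' a2
eval' (Minus a1 a2) = eval' a1 - eval' a2
eval' (Times a1 a2) = eval' a1 * eval' a2
```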
+
+ And we can re-run hs-to-coq to get it added to our Coq development. We can now formally state the theorem we want to prove as:
+
+
Theorem compiler_correctness : forall a : Arith,
    eval nil (compile a) = Data.Either.Right (eval' a).
+
+ I’m going to sketch out how this proof went. Proving stuff can be complex, but this maybe gives a sense of some of the thinking that goes into it. To go further, you probably want to take a course if you can find one, or follow a book like:
+
+ If you were to prove this on paper, you would proceed by induction on the structure of the arithmetic expression, so let’s start that way. The base case goes away trivially and we can expand the case for plus using:
+
+
induction a; iauto; simpl.
+
+ We see (above the line are the assumptions, below is what we need to prove):
+
+ Which, if we look at it for a little while, we realize two things:
+
+
+
+ Our induction hypotheses really aren’t going to work, intuitively because of the Either — our program won’t produce Right results for the subtrees, so there probably won’t be a way to rely on these hypotheses.
+
+
+ On the other hand, what does look like a Lemma we should be able to prove has to do with evaluating a partial program. Rather than trying to induct on the entire statement, we instead try to prove that evaluating a compiled term will result in the eval'd term on the top of the stack. This is an instance of a more general pattern – often the toplevel statement that you want has too much specificity, and you need to instead prove something more general and then use it for the specific case. So here’s a first attempt at the Lemma we want to prove:
+
+
+
Lemma eval_step : forall a : Arith, forall xs : list StackOp,
    eval nil (compile a ++ xs) = eval (eval' a :: nil) xs.
+
+ This is more general, and again we start by inducting on a, expanding and eliminating the base case:
+
+
induction a; intros; simpl; iauto.
+
+ We now end up with better inductive hypotheses:
+
+ We need to reshuffle the list associativity and then we can rewrite using the first hypothesis:
+
+
rewrite List.app_assoc_reverse. rewrite IHa1.
+
+ But now there is a problem (this is common, hence going over it!). We want to use our second hypothesis. Once we do that, we can reduce based on the definition of eval and we’ll be done (with this case, but multiplication is the same). The issue is that IHa2 needs the stack to be empty, and the stack we now have (since we used IHa1) is eval' a1 :: nil, so it can’t be used:
+
+ The solution is to go back to what our Lemma statement said and generalize it now to arbitrary stacks (so in this process we’ve now generalized twice!), so that the inductive hypotheses are correspondingly stronger:
+
+
Lemma eval_step : forall a : Arith, forall s : list Num.Int, forall xs : list StackOp,
    eval s (compile a ++ xs) = eval (eval' a :: s) xs.
+
+ Now if we start the proof in the same way:
+
+
induction a; intros; simpl; iauto.
+
+ We run into an odd problem. We have a silly obligation:
+
+
match s with
| nil => eval (i :: s) xs
| (_ :: nil)%list => eval (i :: s) xs
| (_ :: _ :: _)%list => eval (i :: s) xs
end = eval (i :: s) xs
+
+ Which will go away once we break apart the list s and simplify (if you look carefully, it has the same thing in all three branches of the match). There are (at least) a couple approaches to this:
+
+
+
+
+ We could just do it manually: destruct s; simpl; eauto; destruct s; simpl; eauto. But it shows up multiple times in the proof, that’s a mess, and someone reading the proof script may be confused about what is going on.
+
+
+
+
+ We could write a tactic for the same thing:
+
+
try match goal with
    | [l : list _ |- _ ] => solve [destruct l; simpl; eauto; destruct l; simpl; eauto]
    end.
+
+ This has the advantage that it doesn’t depend on the name, you can call it whenever (it won’t do anything if it isn’t able to discharge the goal), but where to call it is still somewhat messy (as it’ll be in the middle of the proofs). We could hint using this tactic (using Hint Extern) to have it handled automatically, but I generally dislike adding global hints for tactics (unless there is a very good reason!), as it can slow things down and make understanding why proofs worked more difficult.
+
+
+
+
+ We can also write lemmas for these. There are actually two cases that come up, and both are solved easily:
+
+
Lemma list_pointless_split : forall A B : Type, forall l : list A, forall x : B,
    match l with | nil => x | (_ :: _)%list => x end = x.
Proof.
  destruct l; eauto.
Qed.

Lemma list_pointless_split' : forall A B : Type, forall l : list A, forall x : B,
    match l with | nil => x | (_ :: nil)%list => x | (_ :: _ :: _)%list => x end = x.
Proof.
  destruct l; intros; eauto. destruct l; eauto.
Qed.
+
+ In this style, we can then hint using these lemmas locally to where they are needed.
+
+
+
+
+ Now we know the proof should follow from list associativity, this pointless list splitting, and the inductive hypotheses. We can write this down formally (this relies on the literatecoq library, which is just a few tactics at this point) as:
+
+
Lemma eval_step : forall a : Arith, forall s : list Num.Int, forall xs : list StackOp,
    eval s (compile a ++ xs) = eval (eval' a :: s) xs.
Proof.
  hint_rewrite List.app_assoc_reverse.
  hint_rewrite list_pointless_split, list_pointless_split'.

  induction a; intros; simpl; iauto;
    hint_rewrite IHa1, IHa2; iauto'.
Qed.
+
+ Which says that we know that we will need the associativity lemma and these list splitting lemmas somewhere. Then we proceed by induction, handle the base case, and then use the inductive hypotheses to handle the rest.
+
+
+ We can then go back to our main theorem, and proceed in a similar style. We prove by induction, relying on the eval_step lemma, and in various places needing to simplify (for the observant reader, iauto and iauto' only differ in that iauto' does a deeper proof search).
+
+ We now have a proof that the compiler that we wrote in Haskell is correct, insofar as it preserves the meaning expressed in the source-level eval' function to the meaning in the eval function in the target. This isn’t, of course, the only theorem you could prove! Another one that would be interesting would be that no compiled program ever got stuck (i.e., never produces a Left error).
+
+ As an addendum: what if, instead of taking a single Arith, our compiler instead wanted to take an [Arith]? This would still work, and would result in the list of results stored on the stack (so probably you would want to change eval to print everything that was on the stack at the end, not just the top). If you wrote this compile:
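The compile in question was elided from this extract; a version consistent with the later description (“we called compile with [a1]”) might look like this sketch (definitions and constructor names are assumptions). It is fine Haskell, but it is the version Coq will reject:

```haskell
-- Assumed definitions (elided from this extract).
data Arith = Num Int | Plus Arith Arith | Minus Arith Arith | Times Arith Arith
  deriving (Eq, Show)
data StackOp = SNum Int | SPlus | SMinus | STimes
  deriving (Eq, Show)

-- A compiler over lists of expressions. Note the recursive calls
-- compile [a2] and compile [a1]: a1 and a2 are structurally smaller,
-- but wrapping them back into singleton lists is what Coq rejects.
compile :: [Arith] -> [StackOp]
compile []                   = []
compile (Num n       : rest) = SNum n : compile rest
compile (Plus  a1 a2 : rest) = compile [a2] ++ compile [a1] ++ [SPlus]  ++ compile rest
compile (Minus a1 a2 : rest) = compile [a2] ++ compile [a1] ++ [SMinus] ++ compile rest
compile (Times a1 a2 : rest) = compile [a2] ++ compile [a1] ++ [STimes] ++ compile rest
```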
+
+ You would get an error when you try to compile the output of hs-to-coq! Coq says that the compile function is not terminating!
+
+
+ This is a good introduction to a (major) difference between Haskell and Coq: in Haskell, any term can run forever. For a programming language, this is an inconvenience, as you can end up with code that runs forever when you didn’t want it to, which can be difficult to debug (it’s also useful, if you happen to be writing a server that is supposed to run forever!). For a language intended to be used to prove things, this feature would be a non-starter, as it would make the logic unsound. The issue is that in Coq, at a high level, a type is a theorem and a term that inhabits the type is a proof of that theorem. But in Haskell, you can write:
+
+
+
anything :: a
anything = anything
+
+
+ i.e., for any type, you can provide a term with that type — that is, the term that simply never returns. If that were possible in Coq, you could prove any theorem, and the entire logic would be useless (or unsound, which technically means you can prove logical falsehood, but since falsehood allows you to prove anything, it’s the same thing).
+
+
+ Returning to this (only slightly contrived) program, the problem isn’t that our program actually runs forever (if you do want to prove things about programs that do, you’ll need to do much more work!), just that Coq can’t tell that it doesn’t. In general, it’s not possible to tell this for sufficiently powerful languages (this is what the halting problem says for Turing machines, and it thus holds for anything with similar expressivity). What Coq relies on is that some argument is inductively defined (which we have: both lists and Arith expressions are) and that all recursive calls are on structurally smaller parts of that argument. If that holds, we are guaranteed to terminate, as inductive types cannot be infinite (note: unlike Haskell, Coq is not lazy, which is another difference, but we’ll ignore that). If we look at our recursive call, we called compile with [a1]. While a1 is structurally smaller, we put it inside a list and used that instead, which violates what Coq was expecting.
+
+
+ There are various ways around this (like adding another argument whose purpose is to track termination, or using a more sophisticated termination measure), but there is another option: adding a helper function compile' that does what our original compile did: compile a single Arith. The intuition that leads to trying this is that in the new compile we are decreasing on both the length of the list and the structure of the Arith, but we are trying to do both at the same time. By separating things out, we can eliminate the issue:
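The fixed code was elided from this extract; a sketch of the split (with the same assumed definitions as above) might be:

```haskell
-- Assumed definitions (elided from this extract).
data Arith = Num Int | Plus Arith Arith | Minus Arith Arith | Times Arith Arith
  deriving (Eq, Show)
data StackOp = SNum Int | SPlus | SMinus | STimes
  deriving (Eq, Show)

-- compile' compiles a single Arith, recursing on its structure;
-- compile walks the list, recursing on its length. Each function
-- is now structurally recursive on exactly one argument.
compile' :: Arith -> [StackOp]
compile' (Num n)       = [SNum n]
compile' (Plus  a1 a2) = compile' a2 ++ compile' a1 ++ [SPlus]
compile' (Minus a1 a2) = compile' a2 ++ compile' a1 ++ [SMinus]
compile' (Times a1 a2) = compile' a2 ++ compile' a1 ++ [STimes]

compile :: [Arith] -> [StackOp]
compile []         = []
compile (a : rest) = compile' a ++ compile rest
```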
+
+ There are limitations to the approach outlined in this post. In particular, what hs-to-coq does is syntactically translate similar constructs from Haskell to Coq, but constructs that have similar syntax don’t necessarily have similar semantics. For example, data types in Haskell are lazy and thus infinite, whereas inductive types in Coq are definitely not infinite. This means that the proofs that you have made are about the version of the program as represented in Coq, not the original program. There are ways to make proofs about the precise semantics of a language (e.g., Arthur Charguéraud’s CFML), but on the other hand, program extraction (which is a core part of verified compilers like CompCert) has the same issue that the program being run has been converted via a similar process as hs-to-coq (from Coq to OCaml the distance is less than from Coq to Haskell, but in principle there are similar issues).
+
+
+ And yet, I think that hs-to-coq has a real practical use, in particular when you have an existing Haskell codebase that you want to verify. You likely will need to refactor it to have hs-to-coq work, but that refactoring can be done within Haskell, while the program continues to work (and your existing tests continue to pass, etc). Eventually, once you finish conversion, you may decide that it makes more sense to take the converted version as ground truth (thus, you run hs-to-coq and throw out the original, relying on extraction after that point for an executable), but being able to do this gradual migration (from full Haskell to essentially a Gallina-like dialect of Haskell) seems incredibly valuable.
+
+
+
+
diff --git a/_site/essays/2018-04-19-how-to-prove-a-compiler-fully-abstract.html b/_site/essays/2018-04-19-how-to-prove-a-compiler-fully-abstract.html
index 40aa20f..0b96591 100644
--- a/_site/essays/2018-04-19-how-to-prove-a-compiler-fully-abstract.html
+++ b/_site/essays/2018-04-19-how-to-prove-a-compiler-fully-abstract.html
@@ -1,15 +1,17 @@
dbp.io :: How to prove a compiler fully abstract
Daniel Patterson
A compiler that preserves and reflects equivalences is called a fully abstract compiler. This is a powerful property for a compiler that is different from (but complementary to) the more common notion of compiler correctness. So what does it mean, and how do we prove it?
-
-
All the code for this post, along with instructions to get it running, is in the repository https://github.com/dbp/howtoprovefullabstraction. If you have any trouble getting it going, please open an issue on that repository and I’ll help figure it out with you.
-
-
Both equivalence preservation and equivalence reflection (what make a compiler fully abstract) relate to how the compiler treats program equivalences, which in this case I’m considering observational equivalence. Two programs p1 and p2 are observationally equivalent if you cannot tell any difference between the result of running them, including any side effects.
-
For example, if the only observable behavior of programs in your language is what output they print, then two programs that print the same output are equivalent, even if they are implemented in completely different ways. Observational equivalence is extremely useful, especially for compilers, which when optimizing may change how a particular program is implemented but should not change the observable behavior. But it is also useful for programmers, who commonly refactor code: they change how the code is implemented (to make it easier to maintain, or extend, or better support some future addition), without changing any functionality. Refactoring is an equivalence-preserving transformation. We write observational equivalence on programs formally as:
-
p1 ≈ p2
-
Contextual equivalence
-
But we often also want to compile not just whole programs, but particular modules, expressions, or in the general sense, components, and in that case, we want an analogous notion of equivalence. Two components are contextually equivalent if in all program contexts they produce the same observable behavior. In other words, if you have two modules, but any way you combine those modules with the rest of a program (so the rest is syntactically identical, but the modules differ), the results are observationally equivalent, then those two modules are contextually equivalent. We will write this, overloading the ≈ for both observational and contextual equivalence, as:
-
e1 ≈ e2
-
As an example, if we consider a simple functional language and consider our components to be individual expressions, it should be clear that these two expressions are contextually equivalent:
-
λx. x * 2 ≈ λx. x + x
-
While they are implemented differently, no matter how they are used, the result will always be the same (as the only thing we can do with these functions is call them on an argument, and when we do, each will double its argument, even though in a different way). It’s important to note that contextual equivalence always depends on what is observable within the language. For example, in Javascript, you can reflect over the syntax of functions, and so the above two functions, written as:
-
function(x){ return x * 2; } ≈ function(x){ return x + x; }
-
Would not be contextually equivalent, because there exists a program context that can distinguish them. What is that context? Well, if we imagine plugging in the functions above into the “hole” written as [·] below, the result will be different for the two functions! This is because the toString() method on functions in Javascript returns the source code of the function.
-
([·]).toString()
-
From the perspective of optimizations, this is troublesome, as you can’t be sure that a transformation between the above programs was safe (assuming one was much faster than the other), as there could be code that relied upon the particular way that the source code had been written. There are more complicated things you can do (like optimizing speculatively and falling back to unoptimized versions when reflection was needed). In general though, languages with that kind of reflection are both harder to write fast compilers for and harder to write secure compilers for, and while it’s not the topic of this post, it’s always important to know what you mean by contextual equivalence, which usually means: what can program contexts determine about components.
-
Part 1. Equivalence reflection
-
With that in mind, what does equivalence reflection and equivalence preservation for a compiler mean? Let’s start with equivalence reflection, as that’s the property that all your correct compilers already have. Equivalence reflection means that if two components, when compiled, are equivalent, then the source components must have been equivalent. We can write this more formally as (where we write s ↠ t to mean a component s is compiled to t):

s1 ↠ t1 ∧ s2 ↠ t2 ∧ t1 ≈ t2 ⟹ s1 ≈ s2

What are the consequences of this definition? And why do correct compilers have this property? Well, the contrapositive is actually easier to understand: it says that if the source components weren’t equivalent then the target components would have to be different, or more formally:

s1 ↠ t1 ∧ s2 ↠ t2 ∧ s1 ≉ s2 ⟹ t1 ≉ t2

If this didn’t hold, then the compiler could take different source components and compile them to the same target component! Which means you could have different source programs you wrote, which have observationally different behavior, and your compiler would produce the same target program! Any correct compiler has to preserve observational behavior, and it couldn’t do that in this case, as the target program only has one behavior, so it can’t have both the behavior of s1 and s2 (for pedants, not considering non-deterministic targets).
-
So equivalence reflection should be thought of as related to compiler correctness. Note, however, that equivalence reflection is not the same as compiler correctness: as long as your compiler produced different target programs for different source programs, all would be fine – your compiler could hash the source program and produce target programs that just printed the hash to the screen, and it would be equivalence reflecting, since it would produce different target programs not only for source programs that were observationally different, but even syntactically different! That would be a pretty bad compiler, and certainly not correct, but it would be equivalence reflecting.
-
Part 2. Equivalence preservation
-
Equivalence preservation, on the other hand, is the hallmark of fully abstract compilers, and it is a property that even most correct compilers do not have, though it would certainly be great if they did. It says that if two source components are equivalent, then the compiled versions must still be equivalent. Or, more formally:

s1 ↠ t1 ∧ s2 ↠ t2 ∧ s1 ≈ s2 ⟹ t1 ≈ t2

(See, I just reversed the implication. Neat trick! But now it means something totally different). One place where this has been studied extensively is by security researchers, because what it tells you is that observers in the target can’t make observations that aren’t possible to distinguish in the source. Let’s make that a lot more concrete, where we will also see why it’s not frequently true, even of proven correct compilers.
-
Say your language has some information hiding feature, like a private field, and you have two source components that are identical except they have different values stored in the private field. If the compiler does not preserve the fact that it is private (because, for example, it translates the higher level object structure into a C struct or just a pile of memory accessed by assembly), then other target code could read the private values, and these two components will no longer be equivalent.
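To make that concrete, here is a small Haskell sketch (my own illustration, not from the post): a value with a "private" field that the exported interface never reveals. Two implementations differing only in that field are contextually equivalent at the source level, but a compiler that exposed the underlying representation would let target-level code tell them apart.

```haskell
-- A handle with a "private" field. If `secret` is not exported from
-- the defining module, no source context can observe it, so values
-- differing only in `secret` are contextually equivalent in the
-- source language. A compiler that lays this out as raw memory
-- would let target code read `secret` and distinguish them.
data Handle = Handle { secret :: Int, value :: Int }

-- Two implementations that differ only in the hidden field.
new1, new2 :: Int -> Handle
new1 v = Handle { secret = 0,  value = v }
new2 v = Handle { secret = 42, value = v }

-- The only exported observation.
get :: Handle -> Int
get = value
```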
-
This also has implications for programmer refactoring and compiler optimizations: I (or my compiler) might think that it is safe to replace one version of the program with another, because I know that in my language these are equivalent, but what I don’t know is that the compiler reveals distinguishing characteristics, and perhaps some target-level library that I’m linking with relies upon details (that were supposed to be hidden) of how the old code worked. If that’s the case, I can have a working program, and make a change that does not change the meaning of the component in my language, but the whole program can no longer work.
-
Proving a compiler fully abstract, therefore, is all about proving equivalence preservation. So how do we do it?
-
How to prove equivalence preservation
-
Looking at what we have to prove, we see that given contextually equivalent source components s1 and s2, we need to show that t1 and t2 are contextually equivalent. We can expand this to explicitly quantify over the contexts that combine with the components to make whole programs:
Noting that as mentioned above, I am overloading ≈ to now mean whole-program observational equivalence (so, running the program produces the same observations).
-
First I’ll outline how the proof will go in general, and then we’ll consider an actual example compiler and do the proof for the concrete example.
-
We can see that in order to prove this, we need to consider an arbitrary target context Ct and show that Ct[t1] and Ct[t2] are observationally equivalent. We do this by showing that Ct[t1] is observationally equivalent to Cs'[s1] – that is, we produce a source context Cs' that we claim is equivalent to Ct. We do this by way of a “back-translation”, which will be a sort of compiler in reverse. Assuming that we can produce such a Cs' and that Cs'[s1] and Ct[t1] (and correspondingly Cs'[s2] and Ct[t2]) are indeed observationally equivalent (noting that this relies upon a cross-language notion of observations), we can prove that Ct[t1] and Ct[t2] are observationally equivalent by instantiating our hypothesis ∀Cs. Cs[s1] ≈ Cs[s2] with Cs'. This tells us that Cs'[s1] ≈ Cs'[s2], and by transitivity, Ct[t1] ≈ Ct[t2].
-
It can be helpful to see it in a diagram, where the top line is given by the hypothesis (once instantiated with the source context we come up with by way of backtranslation) and coming up with the back-translation and showing that Ct and Cs' are equivalent is the hard part of the proof.
-
Cs'[s1] ≈ Cs'[s2]
-
How to prove a compiler fully abstract
-
A compiler that preserves and reflects equivalences is called a fully abstract compiler. This is a powerful property for a compiler that is different from (but complementary to) the more common notion of compiler correctness. So what does it mean, and how do we prove it?
-
All the code for this post, along with instructions to get it running, is in the repository https://github.com/dbp/howtoprovefullabstraction. If you have any trouble getting it going, please open an issue on that repository and I’ll help figure it out with you.
-
Both equivalence preservation and equivalence reflection (the properties that make a compiler fully abstract) relate to how the compiler treats program equivalences, where the notion of equivalence I consider here is observational equivalence. Two programs p1 and p2 are observationally equivalent if you cannot tell any difference between the results of running them, including any side effects.
-
For example, if the only observation you can make about a program in your language is what output it prints, then any two programs that print the same output are equivalent, even if they are implemented in completely different ways. Observational equivalence is extremely useful, especially for compilers, which when optimizing may change how a particular program is implemented but should not change its observable behavior. But it is also useful for programmers, who commonly refactor code: they change how the code is implemented (to make it easier to maintain, or extend, or better support some future addition) without changing any functionality. Refactoring is an equivalence-preserving transformation. We write observational equivalence on programs formally as:
-
p1 ≈ p2
-
Contextual equivalence
-
But we often want to compile not just whole programs but particular modules, expressions, or, in the general sense, components, and in that case we want an analogous notion of equivalence. Two components are contextually equivalent if they produce the same observable behavior in all program contexts. In other words, if you have two modules, and every way of combining them with the rest of a program (the rest being syntactically identical; only the modules differ) yields observationally equivalent results, then those two modules are contextually equivalent. We write this, overloading ≈ for both observational and contextual equivalence, as:
-
e1 ≈ e2
-
As an example, if we consider a simple functional language and take our components to be individual expressions, it should be clear that these two expressions are contextually equivalent:
-
λx. x * 2 ≈ λx. x + x
-
While they are implemented differently, no matter how they are used the result will always be the same (the only thing we can do with these functions is call them on an argument, and when we do, each will double its argument, even though in a different way). It’s important to note that contextual equivalence always depends on what is observable within the language. For example, in JavaScript you can reflect over the syntax of functions, and so the above two functions, written as:
-
function(x){ return x * 2; } ≈ function(x){ return x + x; }
-
would not be contextually equivalent, because there exists a program context that can distinguish them. What is that context? Well, if we imagine plugging the functions above into the “hole” written as [·] below, the result will be different for the two functions! This is because the toString() method on functions in JavaScript returns the source code of the function.
-
([·]).toString()
-
From the perspective of optimizations this is troublesome, as you cannot be sure that a transformation between the above programs is safe (assuming one is much faster than the other), since some code might rely upon the particular way the source code was written. There are more complicated things you can do (like optimizing speculatively and falling back to unoptimized versions when reflection is needed). In general, though, languages with that kind of reflection are both harder to write fast compilers for and harder to write secure compilers for. While that is not the topic of this post, it is always important to know what you mean by contextual equivalence, which usually comes down to: what can program contexts determine about components?
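A similar experiment can be run in Python (an analogue I am adding here, not part of the post's JavaScript discussion): two functions with identical input/output behavior can still be told apart by a context that reflects on their implementation.

```python
# Two observationally equivalent ways to double a number.
f = lambda x: x * 2
g = lambda x: x + x

# On every input we try, they agree...
assert all(f(x) == g(x) for x in range(-100, 100))

# ...but a context that inspects the implementation can tell them apart:
# the compiled bytecode for `x * 2` differs from that for `x + x`.
assert f.__code__.co_code != g.__code__.co_code
```

As with toString() in JavaScript, reflection like this shrinks the set of contextual equivalences the language offers.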
-
Part 1. Equivalence reflection
-
With that in mind, what do equivalence reflection and equivalence preservation mean for a compiler? Let’s start with equivalence reflection, as that’s the property that all correct compilers already have. Equivalence reflection means that if two components, when compiled, are equivalent, then the source components must have been equivalent. We can write this more formally as (where we write s ↠ t to mean that a component s is compiled to t):
-
s1 ↠ t1 ∧ s2 ↠ t2 ∧ t1 ≈ t2 ⇒ s1 ≈ s2
-
What are the consequences of this definition? And why do correct compilers have this property? Well, the contrapositive is actually easier to understand: it says that if the source components weren’t equivalent, then the target components would have to be different, or more formally:
-
s1 ↠ t1 ∧ s2 ↠ t2 ∧ s1 ≉ s2 ⇒ t1 ≉ t2
-
If this didn’t hold, then the compiler could take different source components and compile them to the same target component! Which means you could have written two source programs with observationally different behavior, and your compiler would produce the same target program for both! Any correct compiler has to preserve observational behavior, and it couldn’t do that in this case, as the single target program has only one behavior, so it can’t have both the behavior of s1 and the behavior of s2 (for pedants: not considering non-deterministic targets).
So equivalence reflection should be thought of as related to compiler correctness. Note, however, that equivalence reflection is not the same as compiler correctness: as long as your compiler produced different target programs for different source programs, all would be fine – your compiler could hash the source program and produce a target program that just printed the hash to the screen, and it would be equivalence reflecting, since it would produce different target programs not only for source programs that were observationally different, but even for ones that were merely syntactically different! That would be a pretty bad compiler, and certainly not correct, but it would be equivalence reflecting.
-
Part 2. Equivalence preservation
-
Equivalence preservation, on the other hand, is the hallmark of fully abstract compilers, and it is a property that even most correct compilers do not have, though it would certainly be great if they did. It says that if two source components are equivalent, then the compiled versions must still be equivalent. Or, more formally:
-
s1 ↠ t1 ∧ s2 ↠ t2 ∧ s1 ≈ s2 ⇒ t1 ≈ t2
-
(See, I just reversed the implication. Neat trick! But now it means something totally different.) One place where this has been studied extensively is by security researchers, because it tells you that observers in the target can’t make observations that aren’t possible in the source. Let’s make that a lot more concrete, and in doing so see why it’s frequently not true, even of proven-correct compilers.
-
Say your language has some information-hiding feature, like a private field, and you have two source components that are identical except that they store different values in the private field. If the compiler does not preserve the fact that the field is private (because, for example, it translates the higher-level object structure into a C struct, or just a pile of memory accessed by assembly), then other target code could read the private values, and the two compiled components will no longer be equivalent.
-
This also has implications for programmer refactoring and compiler optimizations: I (or my compiler) might think that it is safe to replace one version of the program with another, because I know that in my language they are equivalent. What I don’t know is that the compiler reveals distinguishing characteristics, and perhaps some target-level library that I’m linking with relies upon details (that were supposed to be hidden) of how the old code worked. If that’s the case, I can have a working program, make a change that does not change the meaning of the component in my language, and yet the whole program no longer works.
-
Proving a compiler fully abstract, therefore, is all about proving equivalence preservation. So how do we do it?
-
How to prove equivalence preservation
-
Looking at what we have to prove, we see that given contextually equivalent source components s1 and s2, we need to show that their compilations t1 and t2 are contextually equivalent. We can expand this to explicitly quantify over the contexts that combine with the components to make whole programs:
-
(∀Cs. Cs[s1] ≈ Cs[s2]) ⇒ (∀Ct. Ct[t1] ≈ Ct[t2])
-
Note that, as mentioned above, I am overloading ≈ to also mean whole-program observational equivalence (running the two programs produces the same observations).
-
First I’ll outline how the proof goes in general, and then we’ll consider an actual example compiler and do the proof for the concrete example.
-
We can see that in order to prove this, we need to consider an arbitrary target context Ct and show that Ct[t1] and Ct[t2] are observationally equivalent. We do this by showing that Ct[t1] is observationally equivalent to Cs'[s1] – that is, we produce a source context Cs' that we claim is equivalent to Ct. We do this by way of a “back-translation”, which is a sort of compiler in reverse. Assuming that we can produce such a Cs' and that Cs'[s1] and Ct[t1] (and correspondingly Cs'[s2] and Ct[t2]) are indeed observationally equivalent (noting that this relies upon a cross-language notion of observations), we can prove that Ct[t1] and Ct[t2] are observationally equivalent by instantiating our hypothesis ∀Cs. Cs[s1] ≈ Cs[s2] with Cs'. This tells us that Cs'[s1] ≈ Cs'[s2], and by transitivity, Ct[t1] ≈ Ct[t2].
-
It can be helpful to see this as a diagram: the top line is given by the hypothesis (once instantiated with the source context we obtain by back-translation), and coming up with the back-translation and showing that the vertical sides hold is the hard part of the proof.
-
Cs'[s1] ≈ Cs'[s2]
≈ ≈
Ct[t1] ? Ct[t2]
-
Concrete example of languages, compiler, & proof of full abstraction
-
Let’s make this concrete with an example. This will be presented partly in English and partly in the proof assistant Coq. This post isn’t an introduction to Coq; for that, see e.g., Bertot and Castéran’s Coq’Art, Chlipala’s CPDT, or Pierce et al.’s Software Foundations.
-
Our source language is arithmetic expressions over integers with addition and subtraction:
-
e ::= n
| e + e
| e - e
-
This is written down in Coq as:
-
Inductive Expr : Set :=
| Num : Z -> Expr
| Plus : Expr -> Expr -> Expr
| Minus : Expr -> Expr -> Expr.
-
Evaluation is standard (if you wanted to parse this, you would need to deal with left/right associativity and probably add parentheses to disambiguate, but we start from the point where you already have a tree structure, so it is unambiguous). We can write the evaluation function as:
-
Fixpoint eval_Expr (e : Expr) : Z :=
match e with
| Num n => n
| Plus e1 e2 => eval_Expr e1 + eval_Expr e2
| Minus e1 e2 => eval_Expr e1 - eval_Expr e2
end.
-
Our target language is a stack machine which uses a stack of integers to evaluate the sequence of instructions. In addition to having instructions to add and subtract, our stack machine has an extra instruction: OpCount. This instruction returns how many operations remain on the stack machine, and it puts that integer on the top of the stack. This is the simplest abstraction I could think of that will provide an interesting case study for problems of full abstraction, and is a stand-in for both reflection (as it allows the program to inspect other parts of the program), and also somewhat of a proxy for execution time (remaining). Our stack machine requires that the stack be empty at the end of execution.
-
Inductive Op : Set :=
| Push : Z -> Op
| Add : Op
| Sub : Op
| OpCount : Op.
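The post uses an eval_Op function without showing its definition; here is a small Python model of what such a stack-machine evaluator plausibly looks like (a sketch: the tuple encoding of instructions, and the exact OpCount semantics of "push the number of instructions remaining after this one", are my assumptions, not the Coq development's definitions).

```python
def eval_ops(ops):
    """Run a stack-machine program: ops like ('Push', 3), ('Add',), ('Sub',), ('OpCount',)."""
    stack = []
    for i, op in enumerate(ops):
        if op[0] == 'Push':
            stack.append(op[1])
        elif op[0] == 'OpCount':
            # assumed semantics: push the count of instructions remaining
            stack.append(len(ops) - i - 1)
        else:
            # pop order is reversed from push order, as the post notes below
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op[0] == 'Add' else a - b)
    assert len(stack) == 1, "a well-formed program leaves exactly the result"
    return stack[0]

# 5 - 3, with operands pushed left to right
assert eval_ops([('Push', 5), ('Push', 3), ('Sub',)]) == 2
```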
-
Let’s see the compiler and the evaluation function (note that we reverse the order when we pop values off the stack from when we put them on in the compiler).
-
Fixpoint compile_Expr (e : Expr) : list Op :=
match e with
| Num n => [Push n]
| Plus e1 e2 => compile_Expr e1 ++ compile_Expr e2 ++ [Add]
| Minus e1 e2 => compile_Expr e1 ++ compile_Expr e2 ++ [Sub]
end.
We can prove a basic (whole program) compiler correctness result for this (for more detail on this type of result, see this post), where first we prove a general eval_step lemma and then use that to prove correctness (note: the hint and hint_rewrite tactics are from an experimental literatecoq library that adds support for proof-local hinting, which some might think is a hack but I think makes the proofs much more readable/maintainable).
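Before the Coq proof, the property being proved can be sanity-checked outside Coq. This Python sketch mirrors eval_Expr and compile_Expr (the tuple encoding of expressions and the simplified evaluator are my own assumptions) and tests whole-program correctness on a few samples:

```python
def eval_expr(e):
    # an expression is an int literal or ('+'|'-', e1, e2)
    if isinstance(e, int):
        return e
    op, e1, e2 = e
    v1, v2 = eval_expr(e1), eval_expr(e2)
    return v1 + v2 if op == '+' else v1 - v2

def compile_expr(e):
    # mirrors compile_Expr: compile both operands, then emit the operator
    if isinstance(e, int):
        return [('Push', e)]
    op, e1, e2 = e
    return compile_expr(e1) + compile_expr(e2) + [('Add',) if op == '+' else ('Sub',)]

def eval_ops(ops):
    # simplified stack machine (OpCount omitted; compiled code never emits it)
    stack = []
    for op in ops:
        if op[0] == 'Push':
            stack.append(op[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op[0] == 'Add' else a - b)
    return stack[0]

# correctness, checked on samples: running the compiled code gives eval_expr
for e in [7, ('+', 1, 1), ('-', ('+', 2, 3), 4), ('-', 0, ('-', 1, 2))]:
    assert eval_ops(compile_expr(e)) == eval_expr(e)
```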
-
Lemma eval_step : forall a : Expr, forall s : list Z, forall xs : list Op,
eval_Op s (compile_Expr a ++ xs) = eval_Op (eval_Expr a :: s) xs.
Proof.
hint_rewrite List.app_assoc_reverse.
hint_simpl.
induction a; iauto'.
Qed.
-
Now, before we can state properties about equivalences, we need to define what we mean by equivalence for our source and target languages. Both produce no side effects, so the only observation is the end result. Thus, observational equivalence is pretty straightforward; it follows from evaluation:
But, we want to talk not just about whole programs, but about partial programs that can get linked with other parts to create whole programs. In order to do that, we create a new type of “evaluation context” for our Expr, that has a hole (typically written on paper as [·]). This is a program that is missing an expression, which must be filled into the hole. Given how simple our language is, any expression can be filled in to the hole and that will produce a valid program. We only want to have one hole per partial program, so in the cases for + and -, one branch must be a normal Expr (so it contains no hole), and the other can contain one hole. Our link_Expr function takes a context and an expression and fills in the hole.
-
Inductive ExprCtxt : Set :=
| Hole : ExprCtxt
| Plus1 : ExprCtxt -> Expr -> ExprCtxt
| Plus2 : Expr -> ExprCtxt -> ExprCtxt
| Minus1 : ExprCtxt -> Expr -> ExprCtxt
| Minus2 : Expr -> ExprCtxt -> ExprCtxt.

Fixpoint link_Expr (c : ExprCtxt) (e : Expr) : Expr :=
match c with
| Hole => e
| Plus1 c' e' => Plus (link_Expr c' e) e'
| Plus2 e' c' => Plus e' (link_Expr c' e)
| Minus1 c' e' => Minus (link_Expr c' e) e'
| Minus2 e' c' => Minus e' (link_Expr c' e)
end.
-
For our stack machine, partial programs are much easier, since a program is just a list of Op, which means that any program can be extended by adding new Ops on either end (or inserting in the middle).
-
With ExprCtxt, we can now define “contextual equivalence” for our source language:
Definition ctxtequiv_Expr (e1 e2 : Expr) : Prop :=
forall c : ExprCtxt, eval_Expr (link_Expr c e1) = eval_Expr (link_Expr c e2).
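To get a feel for this definition, here is a Python sketch (with my own encodings) that checks contextual equivalence up to a bounded enumeration of contexts, which of course gives evidence rather than a proof:

```python
def eval_expr(e):
    # an expression is an int literal or ('+'|'-', e1, e2)
    if isinstance(e, int):
        return e
    op, e1, e2 = e
    v1, v2 = eval_expr(e1), eval_expr(e2)
    return v1 + v2 if op == '+' else v1 - v2

def contexts(depth):
    # contexts as functions from the hole-filling expression to an expression,
    # mirroring ExprCtxt: the hole itself, or +/- with the hole on one side
    if depth == 0:
        return [lambda e: e]
    smaller = contexts(depth - 1)
    out = list(smaller)
    for c in smaller:
        for n in (0, 1, 5):          # a small sample of literal operands
            for op in ('+', '-'):
                out.append(lambda e, c=c, n=n, op=op: (op, c(e), n))
                out.append(lambda e, c=c, n=n, op=op: (op, n, c(e)))
    return out

def ctxtequiv_expr(e1, e2, depth=2):
    return all(eval_expr(c(e1)) == eval_expr(c(e2)) for c in contexts(depth))

assert ctxtequiv_expr(('+', 1, 1), 2)        # 1 + 1 and 2 agree in every sampled context
assert not ctxtequiv_expr(('+', 1, 1), 3)    # distinguished already by the hole itself
```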
-
We can do the same with our target, simplifying slightly and saying that we will allow adding arbitrary Ops before and after, but not in the middle, of an existing sequence of Ops.
To prove our compiler fully abstract, remember we need to prove that it preserves and reflects equivalences. Since we already proved that it is correct, proving that it reflects equivalences should be relatively straightforward, so let’s start there. The lemma we want is:
-
Lemma equivalence_reflection :
This lemma is a little more involved, but not by much; we proceed by induction on the structure of the evaluation contexts, and in all but the case for Hole, the induction hypothesis gives us exactly what we need. In the base case, we need to appeal to the compiler_correctness lemma we proved earlier, but otherwise it follows easily.
-
So what about equivalence preservation? We can state the lemma quite easily:
-
Lemma equivalence_preservation :
But proving it is another matter. In fact, it’s not provable, because it’s not true. We can come up with a counter-example using that OpCount instruction we (surreptitiously) added to our target language. These two expressions are contextually equivalent in our source language (it should be obvious, but here is a proof):
But they are not contextually equivalent in the target; in particular, if we put the OpCount instruction before and then the Add instruction afterwards, the result will be the value plus the number of instructions it took to compute it:
-
Example target_not_equiv :
The former evaluates to 6, while the latter evaluates to 4. This means there is no way we will be able to prove equivalence preservation (we have a counter-example!).
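This failure can be run in a small Python model (self-contained here; the particular pair of expressions 1 + 1 and 2, and the OpCount semantics of "instructions remaining", are my assumed stand-ins for the Coq definitions, which the post does not show in full):

```python
def compile_expr(e):
    # an expression is an int literal or ('+'|'-', e1, e2)
    if isinstance(e, int):
        return [('Push', e)]
    op, e1, e2 = e
    return compile_expr(e1) + compile_expr(e2) + [('Add',) if op == '+' else ('Sub',)]

def eval_ops(ops):
    stack = []
    for i, op in enumerate(ops):
        if op[0] == 'Push':
            stack.append(op[1])
        elif op[0] == 'OpCount':
            stack.append(len(ops) - i - 1)  # assumed: instructions remaining
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op[0] == 'Add' else a - b)
    return stack[0]

t1, t2 = compile_expr(('+', 1, 1)), compile_expr(2)

# alone, the compiled programs behave the same...
assert eval_ops(t1) == eval_ops(t2) == 2

# ...but the context [OpCount] ++ [.] ++ [Add] tells them apart
ctx = lambda p: [('OpCount',)] + p + [('Add',)]
assert eval_ops(ctx(t1)) == 6
assert eval_ops(ctx(t2)) == 4
```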
-
So what do we do? Well, this scenario is not uncommon, and it’s the reason why many, even correct, compilers are not fully abstract. It’s also related to why many of these compilers may still have security problems! The solution is to somehow protect the compiled code from having the equivalences disrupted. If this were a real machine, we might want to have some flag on instructions that meant that they should not be counted, and OpCount would just not return anything if it saw any of those (or would count them as 0). Alternately, we might give our target language a type system that is able to rule out linking with code that uses the OpCount instruction, or perhaps restricts how it can be used.
-
Because this is a blog-post sized example, I wanted to keep the proofs as short as possible, and the unstructured, untyped nature of our target (which is much less structured than our source language; that structure is exactly why our whole-program correctness result was so easy!) would make the proofs relatively complex or require various auxiliary definitions. So the solution I’m going to take is somewhat extreme. Rather than, say, restricting how OpCount is used, or even ruling out linking with OpCount, we’re going to highly restrict what we can link with. This is very artificial, and done entirely so that the proofs fit into a few lines. In this case, rather than a list, we allow one Op before and one Op after our compiled program, neither of which can be OpCount. Further, we still want the resulting program to be well-formed (no errors, exactly one number on the stack at the end), so either there is nothing before and after, or there is a Push n before and either an Add or a Sub after. (You should be able to verify that no other combination of Ops before and after fulfills our requirement.)
-
We can define these possible linking contexts and a helper to combine them with programs as the following:
-
Inductive OpCtxt : Set :=
| PushAdd : Z -> OpCtxt
| PushSub : Z -> OpCtxt
| Empty : OpCtxt.
Definition link_Op (c : OpCtxt) (p : list Op) : list Op :=
match c with
| PushAdd n => Push n :: p ++ [Add]
| PushSub n => Push n :: p ++ [Sub]
| Empty => p
end.
-
Using that, we can redefine contextual equivalence for our target language, only permitting these contexts:
Definition ctxtequiv_Op (p1 p2 : list Op) : Prop :=
forall c : OpCtxt, eval_Op [] (link_Op c p1) = eval_Op [] (link_Op c p2).
-
The only change to our proof of equivalence reflection is on one line, to change our specialization of the target contexts, now to the Empty context:
-
specialize (eqtarget Empty) (* Empty rather than [] [] *)
-
With that change, we now believe that our compiler, when linked against these restricted contexts, is indeed fully abstract. So let’s prove it. If you recall from earlier in this post, proving equivalence preservation means proving that the top line implies the bottom, in the following diagram:
-
Cs'[s1] ≈ Cs'[s2]
≈ ≈
Ct[t1] ? Ct[t2]
-
In order to do that, we rely upon a backtranslation to get from Ct to Cs', where Ct is a target context, in this tiny example our restricted OpCtxt. We can write that backtranslation as:
Definition backtranslate (c : OpCtxt) : ExprCtxt :=
match c with
| PushAdd n => Plus2 (Num n) Hole
| PushSub n => Minus2 (Num n) Hole
| Empty => Hole
end.
-
The second part of the proof is showing that the vertical equivalences in the diagram hold — that is, that if s1 is compiled to t1 and Ct is backtranslated to Cs' then Ct[t1] is equivalent to Cs'[s1]. We can state and prove that as the following lemma, which follows from straightforward case analysis on the structure of our target context and backtranslation (using our eval_step lemmas):
-
Lemma back_translation_equiv :
forall c : OpCtxt,
forall p : list Op,
forall e : Expr,
| [ H : backtranslate _ = _ |- _] => invert H
end; simpl; iauto.
Qed.
-
Once we have that lemma, we can prove equivalence preservation directly. We do this by doing case analysis on the target context we are given, backtranslating it and then using the lemma we just proved to get the equivalence that we need.
-
Lemma equivalence_preservation :
erewrite back_translation_equiv with (e := e2) (c' := c'); iauto;
specialize (eqsource c'); simpl in *; congruence.
Qed.
-
This was obviously a very tiny language, with a very restrictive notion of linking that allowed only a handful of contexts, but the general shape of the proof is the same as that used for more realistic languages published at research conferences today!
-
So next time you see a result about a correct (or even hoped to be correct) compiler, ask if it is fully abstract! And if it’s not, are the violations of equivalences something that could be exploited? Or something that would invalidate optimizations?
As stated at the top of the post, all the code in this post is available at https://github.com/dbp/howtoprovefullabstraction. If you have any trouble getting it going, please open an issue on that repository and I’ll help figure it out with you.
-
-
-
+
+ This was obviously a very tiny language and a very restrictive linker that only allowed very restrictive contexts, but the general shape of the proof is the same as that used in more realistic languages published in research conferences today!
+
+
+ So next time you see a result about a correct (or even hoped to be correct) compiler, ask if it is fully abstract! And if it’s not, are the violations of equivalences something that could be exploited? Or something that would invalidate optimizations?
+
+ As stated at the top of the post, all the code in this post is available at https://github.com/dbp/howtoprovefullabstraction. If you have any trouble getting it going, please open an issue on that repository and I’ll help figure it out with you.
+
A compiler that preserves and reflects equivalences is called a fully abstract compiler. This is a powerful property for a compiler that is different from (but complementary to) the more common notion of compiler correctness. So what does it mean, and how do we prove it?

All the code for this post, along with instructions to get it running, is in the repository https://github.com/dbp/howtoprovefullabstraction. If you have any trouble getting it going, please open an issue on that repository and I’ll help figure it out with you.

Both equivalence preservation and equivalence reflection (which together make a compiler fully abstract) relate to how the compiler treats program equivalences, which in this case I take to be observational equivalence. Two programs p1 and p2 are observationally equivalent if you cannot tell any difference between the results of running them, including any side effects.

For example, if the only observation you can make about programs in your language is what output they print, then two programs that print the same output are equivalent, even if they are implemented in completely different ways. Observational equivalence is extremely useful, especially for compilers, which when optimizing may change how a particular program is implemented but should not change its observable behavior. But it is also useful for programmers, who commonly refactor code: they change how the code is implemented (to make it easier to maintain, or extend, or better support some future addition) without changing any functionality. Refactoring is an equivalence-preserving transformation. We write observational equivalence on programs formally as:

p1 ≈ p2

Contextual equivalence

But we often want to compile not just whole programs, but particular modules, expressions, or, in the general sense, components, and in that case we want an analogous notion of equivalence. Two components are contextually equivalent if in all program contexts they produce the same observable behavior. In other words, if you have two modules, and any way you combine either of them with the rest of a program (so the rest is syntactically identical, but the modules differ) yields observationally equivalent results, then those two modules are contextually equivalent. We will write this, overloading ≈ for both observational and contextual equivalence, as:

e1 ≈ e2

As an example, if we consider a simple functional language and take our components to be individual expressions, it should be clear that these two expressions are contextually equivalent:

λx. x * 2 ≈ λx. x + x

While they are implemented differently, no matter how they are used the result will always be the same (the only thing we can do with these functions is call them on an argument, and when we do, each will double its argument, though in a different way). It’s important to note that contextual equivalence always depends on what is observable within the language. For example, in Javascript you can reflect over the syntax of functions, and so the above two functions, written as:

function(x){ return x * 2; } ≈ function(x){ return x + x; }

would not be contextually equivalent, because there exists a program context that can distinguish them. What is that context? Well, if we imagine plugging the functions above into the “hole” written as [·] below, the result will be different for the two functions! This is because the toString() method on functions in Javascript returns the source code of the function.

([·]).toString()

From the perspective of optimizations this is troublesome: you can’t be sure that a transformation between the above programs is safe (assuming one is much faster than the other), as there could be code that relies upon the particular way the source code was written. There are more complicated things you can do (like optimizing speculatively and falling back to unoptimized versions when reflection is needed). In general, though, languages with that kind of reflection are both harder to write fast compilers for and harder to write secure compilers for, and while it’s not the topic of this post, it’s always important to know what you mean by contextual equivalence, which usually means: what can program contexts determine about components?

Part 1. Equivalence reflection

With that in mind, what do equivalence reflection and equivalence preservation for a compiler mean? Let’s start with equivalence reflection, as that’s the property that all correct compilers already have. Equivalence reflection means that if two components, when compiled, are equivalent, then the source components must have been equivalent. We can write this more formally as (where we write s ↠ t to mean that a component s is compiled to t):

s1 ↠ t1 ∧ s2 ↠ t2 ∧ t1 ≈ t2 ⟹ s1 ≈ s2

What are the consequences of this definition? And why do correct compilers have this property? Well, the contrapositive is actually easier to understand: it says that if the source components weren’t equivalent, then the target components can’t be equivalent either, or more formally:

s1 ↠ t1 ∧ s2 ↠ t2 ∧ s1 ≉ s2 ⟹ t1 ≉ t2

If this didn’t hold, then the compiler could take inequivalent source components and compile them to the same target component! Which means you could have two source programs with observationally different behavior, and your compiler would produce the same target program for both! Any correct compiler has to preserve observational behavior, and it couldn’t do that in this case, as the single target program only has one behavior, so it can’t have both the behavior of s1 and the behavior of s2 (for pedants: not considering non-deterministic targets).

So equivalence reflection should be thought of as related to compiler correctness. Note, however, that equivalence reflection is not the same as compiler correctness: as long as your compiler produced different target programs for different source programs, all would be fine. Your compiler could hash the source program and produce a target program that just printed the hash to the screen, and it would be equivalence-reflecting, since it would produce different target programs not only for source programs that were observationally different, but even for ones that were merely syntactically different! That would be a pretty bad compiler, and certainly not correct, but it would be equivalence-reflecting.

Part 2. Equivalence preservation

Equivalence preservation, on the other hand, is the hallmark of fully abstract compilers, and it is a property that even most correct compilers do not have, though it would certainly be great if they did. It says that if two source components are equivalent, then the compiled versions must still be equivalent. Or, more formally:

s1 ↠ t1 ∧ s2 ↠ t2 ∧ s1 ≈ s2 ⟹ t1 ≈ t2

(See, I just reversed the implication. Neat trick! But now it means something totally different.) One place where this has been studied extensively is security research, because what it tells you is that observers in the target can’t make observations that aren’t possible in the source. Let’s make that a lot more concrete, where we will also see why it’s frequently untrue, even of proven-correct compilers.

Say your language has some information-hiding feature, like a private field, and you have two source components that are identical except that they have different values stored in the private field. If the compiler does not preserve the fact that the field is private (because, for example, it translates the higher-level object structure into a C struct, or just a pile of memory accessed by assembly), then other target code could read the private values, and these two components will no longer be equivalent.

This also has implications for programmer refactoring and compiler optimizations: I (or my compiler) might think that it is safe to replace one version of the program with another, because I know that in my language these are equivalent, but what I don’t know is that the compiler reveals distinguishing characteristics, and perhaps some target-level library that I’m linking with relies upon details (that were supposed to be hidden) of how the old code worked. If that’s the case, I can have a working program, make a change that does not alter the meaning of the component in my language, and yet the whole program no longer works.

Proving a compiler fully abstract, therefore, is all about proving equivalence preservation. So how do we do it?

How to prove equivalence preservation

Looking at what we have to prove, we see that given contextually equivalent source components s1 and s2, we need to show that t1 and t2 are contextually equivalent. We can expand this to explicitly quantify over the contexts that combine with the components to make whole programs:

∀Cs. Cs[s1] ≈ Cs[s2] ⟹ ∀Ct. Ct[t1] ≈ Ct[t2]

Note that, as mentioned above, I am overloading ≈ to now mean whole-program observational equivalence (so running the programs produces the same observations).

First I’ll outline how the proof goes in general, and then we’ll consider an actual example compiler and do the proof for the concrete example.

We can see that in order to prove this, we need to consider an arbitrary target context Ct and show that Ct[t1] and Ct[t2] are observationally equivalent. We do this by showing that Ct[t1] is observationally equivalent to Cs'[s1] – that is, we produce a source context Cs' that we claim is equivalent to Ct. We do this by way of a “back-translation”, which is a sort of compiler in reverse. Assuming that we can produce such a Cs', and that Cs'[s1] and Ct[t1] (and correspondingly Cs'[s2] and Ct[t2]) are indeed observationally equivalent (noting that this relies upon a cross-language notion of observations), we can prove that Ct[t1] and Ct[t2] are observationally equivalent by instantiating our hypothesis ∀Cs. Cs[s1] ≈ Cs[s2] with Cs'. This tells us that Cs'[s1] ≈ Cs'[s2], and by transitivity, Ct[t1] ≈ Ct[t2].

It can be helpful to see this in a diagram, where the top line is given by the hypothesis (once instantiated with the source context we come up with by way of back-translation); coming up with the back-translation and showing that Ct and Cs' are equivalent is the hard part of the proof.

Cs'[s1] ≈ Cs'[s2]
   ≈         ≈
Ct[t1]   ?   Ct[t2]

Concrete example of languages, compiler, & proof of full abstraction

Let’s make this concrete with an example. This will be presented partly in English and partly in the proof assistant Coq. This post isn’t an introduction to Coq; for that, see e.g. Bertot and Castéran’s Coq’Art, Chlipala’s CPDT, or Pierce et al.’s Software Foundations.

Our source language is arithmetic expressions over integers with addition and subtraction:

e ::= n
  | e + e
  | e - e

This is written down in Coq as:

Inductive Expr : Set :=
| Num : Z -> Expr
| Plus : Expr -> Expr -> Expr
| Minus : Expr -> Expr -> Expr.

Evaluation is standard. (If you wanted to parse this, you would need to deal with left/right associativity, and probably add parentheses to disambiguate, but we start from the point where you already have a tree structure, so it is unambiguous.) We can write the evaluation function as:

Fixpoint eval_Expr (e : Expr) : Z :=
  match e with
  | Num n => n
  | Plus e1 e2 => eval_Expr e1 + eval_Expr e2
  | Minus e1 e2 => eval_Expr e1 - eval_Expr e2
  end.

Our target language is a stack machine, which uses a stack of integers to evaluate a sequence of instructions. In addition to instructions to add and subtract, our stack machine has an extra instruction: OpCount. This instruction counts how many operations remain in the program and pushes that integer onto the top of the stack. This is the simplest abstraction I could think of that provides an interesting case study for problems of full abstraction; it is a stand-in for reflection (as it allows the program to inspect other parts of the program), and also somewhat of a proxy for (remaining) execution time. Our stack machine requires that exactly one value, the result, remain on the stack at the end of execution.

Inductive Op : Set :=
| Push : Z -> Op
| Add : Op
| Sub : Op
| OpCount : Op.

Let’s see the compiler and the evaluation function (note that we pop values off the stack in the reverse of the order the compiler pushed them on).

Fixpoint compile_Expr (e : Expr) : list Op :=
  match e with
  | Num n => [Push n]
  | Plus e1 e2 => compile_Expr e1 ++ compile_Expr e2 ++ [Add]
  | Minus e1 e2 => compile_Expr e1 ++ compile_Expr e2 ++ [Sub]
  end.

Fixpoint eval_Op (s : list Z) (ops : list Op) : option Z :=
  match (ops, s) with
  | ([], [n]) => Some n
  | (Push z :: rest, _) => eval_Op (z :: s) rest
  | (Add :: rest, n2 :: n1 :: ns) => eval_Op (n1 + n2 :: ns)%Z rest
  | (Sub :: rest, n2 :: n1 :: ns) => eval_Op (n1 - n2 :: ns)%Z rest
  | (OpCount :: rest, _) => eval_Op (Z.of_nat (length rest) :: s) rest
  | _ => None
  end.

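As a quick sanity check (my own example, not from the original post), we can compile and run a small expression:

```coq
(* 5 - 2 compiles to a three-instruction program... *)
Eval compute in (compile_Expr (Minus (Num 5) (Num 2))).
(* = [Push 5; Push 2; Sub] *)

(* ...and running it on an empty stack agrees with eval_Expr. *)
Eval compute in (eval_Op [] [Push 5; Push 2; Sub]).
(* = Some 3 *)
```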
We can prove a basic (whole-program) compiler correctness result for this (for more detail on this type of result, see this post). First we prove a general eval_step lemma, and then use it to prove correctness. (Note: the hint and hint_rewrite tactics are from an experimental literatecoq library that adds support for proof-local hinting, which some might think is a hack but I think makes the proofs much more readable/maintainable.)

Lemma eval_step : forall a : Expr, forall s : list Z, forall xs : list Op,
  eval_Op s (compile_Expr a ++ xs) = eval_Op (eval_Expr a :: s) xs.
Proof.
  hint_rewrite List.app_assoc_reverse.
  induction a; intros; iauto; simpl;
  hint_rewrite IHa2, IHa1;
  iauto'.
Qed.

Theorem compiler_correctness : forall a : Expr,
  eval_Op [] (compile_Expr a) = Some (eval_Expr a).
Proof.
  hint_rewrite eval_step.
  hint_simpl.
  induction a; iauto'.
Qed.

Now, before we can state properties about equivalences, we need to define what we mean by equivalence for our source and target languages. Neither produces side effects, so the only observation is the end result. Thus, observational equivalence is pretty straightforward; it follows directly from evaluation:

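The definitions themselves are elided here; a sketch of what they plausibly look like follows (the names obsequiv_Expr and obsequiv_Op are my guesses, not necessarily those used in the repository):

```coq
(* Hypothetical names: two programs are observationally equivalent
   when evaluation produces the same result. *)
Definition obsequiv_Expr (e1 e2 : Expr) : Prop :=
  eval_Expr e1 = eval_Expr e2.

Definition obsequiv_Op (p1 p2 : list Op) : Prop :=
  eval_Op [] p1 = eval_Op [] p2.
```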
But we want to talk not just about whole programs, but about partial programs that can be linked with other parts to create whole programs. In order to do that, we create a new type of “evaluation context” for our Expr, which has a hole (typically written on paper as [·]). This is a program that is missing an expression, which must be filled into the hole. Given how simple our language is, any expression can be filled into the hole to produce a valid program. We only want one hole per partial program, so in the cases for + and -, one branch must be a normal Expr (containing no hole), and the other contains the single hole. Our link_Expr function takes a context and an expression and fills in the hole.

Inductive ExprCtxt : Set :=
| Hole : ExprCtxt
| Plus1 : ExprCtxt -> Expr -> ExprCtxt
| Plus2 : Expr -> ExprCtxt -> ExprCtxt
| Minus1 : ExprCtxt -> Expr -> ExprCtxt
| Minus2 : Expr -> ExprCtxt -> ExprCtxt.

Fixpoint link_Expr (c : ExprCtxt) (e : Expr) : Expr :=
  match c with
  | Hole => e
  | Plus1 c' e' => Plus (link_Expr c' e) e'
  | Plus2 e' c' => Plus e' (link_Expr c' e)
  | Minus1 c' e' => Minus (link_Expr c' e) e'
  | Minus2 e' c' => Minus e' (link_Expr c' e)
  end.

For our stack machine, partial programs are much simpler: since a program is just a list of Op, any program can be extended by adding new Ops at either end (or inserting them in the middle).

With ExprCtxt, we can now define contextual equivalence for our source language:

Definition ctxtequiv_Expr (e1 e2 : Expr) : Prop :=
  forall c : ExprCtxt, eval_Expr (link_Expr c e1) = eval_Expr (link_Expr c e2).

We can do the same for our target, simplifying slightly by allowing arbitrary Ops to be added before and after, but not in the middle of, an existing sequence of Ops:

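The definition is elided here; given that the equivalence-reflection proof below specializes this hypothesis with two empty lists (specialize (eqtarget [] [])), it plausibly reads:

```coq
(* Target contextual equivalence: the programs agree under any Ops
   added before and after (quantifying over those two lists). *)
Definition ctxtequiv_Op (p1 p2 : list Op) : Prop :=
  forall before after : list Op,
    eval_Op [] (before ++ p1 ++ after) = eval_Op [] (before ++ p2 ++ after).
```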
To prove our compiler fully abstract, remember that we need to prove that it both preserves and reflects equivalences. Since we already proved that it is correct, proving that it reflects equivalences should be relatively straightforward, so let’s start there. The lemma we want is:

Lemma equivalence_reflection :
  forall e1 e2 : Expr,
  forall p1 p2 : list Op,
  forall comp1 : compile_Expr e1 = p1,
  forall comp2 : compile_Expr e2 = p2,
  forall eqtarget : ctxtequiv_Op p1 p2,
  ctxtequiv_Expr e1 e2.
Proof.
  unfold ctxtequiv_Expr, ctxtequiv_Op in *.
  intros.
  induction c; simpl; try solve [hint_rewrite IHc; iauto];
  (* NOTE(dbp 2018-04-16): Only the base case, for Hole, remains *)
  [idtac].
  (* NOTE(dbp 2018-04-16): In the hole case, specialize the target ctxt equiv hypothesis to empty *)
  specialize (eqtarget [] []); simpl in eqtarget; repeat rewrite app_nil_r in eqtarget.

  (* NOTE(dbp 2018-04-16): At this point, we know e1 -> p1, e2 -> p2, & p1 ≈ p2,
     and want e1 ≈ e2, which follows from compiler correctness *)
  rewrite <- comp1 in eqtarget. rewrite <- comp2 in eqtarget.
  repeat rewrite compiler_correctness in eqtarget.
  inversion eqtarget.
  reflexivity.
Qed.

This lemma is a little more involved, but not by much; we proceed by induction on the structure of the evaluation contexts, and in all but the case for Hole, the induction hypothesis gives us exactly what we need. In the base case, we need to appeal to the compiler_correctness lemma we proved earlier, but otherwise it follows easily.

So what about equivalence preservation? We can state the lemma quite easily:

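The statement itself is elided here; mirroring equivalence_reflection with the implication reversed, it presumably reads (the hypothesis name eqsource is my reconstruction):

```coq
Lemma equivalence_preservation :
  forall e1 e2 : Expr,
  forall p1 p2 : list Op,
  forall comp1 : compile_Expr e1 = p1,
  forall comp2 : compile_Expr e2 = p2,
  forall eqsource : ctxtequiv_Expr e1 e2,
  ctxtequiv_Op p1 p2.
```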
But proving it is another matter. In fact, it is not provable, because it is not true. We can come up with a counter-example, using that OpCount instruction we (surreptitiously) added to our target language. These two expressions are contextually equivalent in our source language (this should be obvious, but here is a proof):

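The expressions are elided here; a pair consistent with the evaluation results described just below is 1 + 1 and 2. A sketch of the proof (the example name is mine, and the exact tactics are a guess) is a short induction over contexts:

```coq
Example source_equiv : ctxtequiv_Expr (Plus (Num 1) (Num 1)) (Num 2).
Proof.
  unfold ctxtequiv_Expr.
  (* In every context, the two fillings evaluate identically. *)
  induction c; simpl; congruence.
Qed.
```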
But they are not contextually equivalent in the target; in particular, if we put the OpCount instruction before and the Add instruction after, the result will be the value plus the number of instructions it took to compute it:

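The programs are elided here; with the compiled forms of 1 + 1 and 2 plugged into that context, we can compute the differing results (these Eval checks are my own):

```coq
(* compile_Expr (Plus (Num 1) (Num 1)) = [Push 1; Push 1; Add] *)
Eval compute in (eval_Op [] ([OpCount] ++ [Push 1; Push 1; Add] ++ [Add])).
(* = Some 6 : OpCount pushes 4 (ops remaining), then 1 + 1 = 2, then 2 + 4 = 6 *)

(* compile_Expr (Num 2) = [Push 2] *)
Eval compute in (eval_Op [] ([OpCount] ++ [Push 2] ++ [Add])).
(* = Some 4 : OpCount pushes 2, then 2 + 2 = 4 *)
```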
The former evaluates to 6, while the latter evaluates to 4. This means there is no way we are going to be able to prove equivalence preservation (we have a counter-example!).

So what do we do? Well, this scenario is not uncommon, and it’s the reason why many compilers, even correct ones, are not fully abstract. It’s also related to why many of these compilers may still have security problems! The solution is to somehow protect the compiled code from having its equivalences disrupted. If this were a real machine, we might want some flag on instructions meaning that they should not be counted, with OpCount returning nothing if it saw any such instruction (or counting them as 0). Alternately, we might give our target language a type system that rules out linking with code that uses the OpCount instruction, or that restricts how it can be used.

Because this is a blog-post-sized example, I wanted to keep the proofs as short as possible, and the unstructured and untyped nature of our target would make them relatively complex (or require various auxiliary definitions). (Indeed, the target is much less structured than our source language; the fact that the source is so well-structured is why our whole-program correctness result was so easy!) So the solution I’m going to take is somewhat extreme: rather than restricting how OpCount is used, or even just ruling out linking with OpCount, we’re going to highly restrict what we can link with. This is very artificial, and done entirely so that the proofs can fit into a few lines. Rather than a list, we are going to allow one Op before and one Op after our compiled program, neither of which can be OpCount; further, we still want the resulting program to be well-formed (i.e., no errors, and exactly one number on the stack at the end), so either there is nothing before and after, or there is a Push n before and either an Add or a Sub after. (You should be able to verify that no other combination of Ops before and after fulfills our requirements.)

We can define these possible linking contexts, and a helper to combine them with programs, as follows:

Inductive OpCtxt : Set :=
| PushAdd : Z -> OpCtxt
| PushSub : Z -> OpCtxt
| Empty : OpCtxt.

Definition link_Op (c : OpCtxt) (p : list Op) : list Op :=
  match c with
  | PushAdd n => Push n :: p ++ [Add]
  | PushSub n => Push n :: p ++ [Sub]
  | Empty => p
  end.

Using that, we can redefine contextual equivalence for our target language, permitting only these contexts:

Definition ctxtequiv_Op (p1 p2 : list Op) : Prop :=
  forall c : OpCtxt, eval_Op [] (link_Op c p1) = eval_Op [] (link_Op c p2).

The only change needed in our proof of equivalence reflection is on one line: we change the specialization of the target context, now to the Empty context:

specialize (eqtarget Empty) (* Empty rather than [] [] *)

With that change in place, we believe that our compiler, when linked against these restricted contexts, is indeed fully abstract. So let’s prove it. Recall from earlier in the post that proving equivalence preservation means proving that the top line implies the bottom in the following diagram:

Cs'[s1] ≈ Cs'[s2]
   ≈         ≈
Ct[t1]   ?   Ct[t2]

In order to do that, we rely upon a back-translation to get from Ct to Cs', where Ct is a target context (in this tiny example, our restricted OpCtxt). We can write that back-translation as:

Definition backtranslate (c : OpCtxt) : ExprCtxt :=
  match c with
  | PushAdd n => Plus2 (Num n) Hole
  | PushSub n => Minus2 (Num n) Hole
  | Empty => Hole
  end.

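To see the back-translation in action (my own illustrative check): linking the back-translated context in the source mirrors linking the original context in the target.

```coq
Eval compute in (link_Expr (backtranslate (PushAdd 3)) (Num 4)).
(* = Plus (Num 3) (Num 4), which evaluates to 7 *)

Eval compute in (eval_Op [] (link_Op (PushAdd 3) (compile_Expr (Num 4)))).
(* = Some 7 *)
```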
The second part of the proof is showing that the vertical equivalences in the diagram hold — that is, that if s1 is compiled to t1 and Ct is backtranslated to Cs', then Ct[t1] is equivalent to Cs'[s1]. We can state and prove that as the following lemma, which follows from straightforward case analysis on the structure of our target context and backtranslation (using our eval_step lemmas):

Lemma back_translation_equiv :
  forall c : OpCtxt,
  forall p : list Op,
  forall e : Expr,
  forall c' : ExprCtxt,
    compile_Expr e = p ->
    backtranslate c = c' ->
    eval_Op [] (link_Op c p) = Some (eval_Expr (link_Expr c' e)).
Proof.
  hint_rewrite eval_step, eval_step'.
  intros.
  match goal with
  | [ c : OpCtxt |- _] => destruct c
  end;
  match goal with
  | [ H : backtranslate _ = _ |- _] => invert H
  end; simpl; iauto.
Qed.

Once we have that lemma, we can prove equivalence preservation directly, by case analysis on the target context we are given: we backtranslate it and then use the lemma we just proved to get the equivalence we need.

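The proof script is elided here; based on the surviving fragment (erewrite back_translation_equiv ...; specialize (eqsource c'); simpl in *; congruence), a reconstruction consistent with it might look like the following (the exact tactic sequence is my best guess):

```coq
Lemma equivalence_preservation :
  forall e1 e2 : Expr,
  forall p1 p2 : list Op,
  forall comp1 : compile_Expr e1 = p1,
  forall comp2 : compile_Expr e2 = p2,
  forall eqsource : ctxtequiv_Expr e1 e2,
  ctxtequiv_Op p1 p2.
Proof.
  unfold ctxtequiv_Op; intros.
  (* Back-translate the target context to a source context c'. *)
  remember (backtranslate c) as c'.
  (* Rewrite both sides using the vertical equivalences... *)
  erewrite back_translation_equiv with (e := e1) (c' := c'); iauto;
  erewrite back_translation_equiv with (e := e2) (c' := c'); iauto;
  (* ...then the source-level equivalence hypothesis closes the goal. *)
  specialize (eqsource c'); simpl in *; congruence.
Qed.
```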
This was obviously a very tiny language, and a very restrictive linker that only allowed very restrictive contexts, but the general shape of the proof is the same as that used for more realistic languages published at research conferences today!

So next time you see a result about a correct (or even hoped-to-be-correct) compiler, ask if it is fully abstract! And if it’s not: are the violations of equivalences something that could be exploited? Or something that would invalidate optimizations?

As stated at the top of the post, all the code in this post is available at https://github.com/dbp/howtoprovefullabstraction. If you have any trouble getting it going, please open an issue on that repository and I’ll help figure it out with you.


I’m currently a graduate student at Northeastern University studying programming languages, type systems, and language interoperability with Amal Ahmed. If you want to learn more about what we’re working on, Amal gave a keynote on our research at StrangeLoop 2018. Previously, I helped start and run a worker-owned software contracting company, Position Development. I studied math and computer science as an undergrad at Brown University, where I worked with Shriram Krishnamurthi. I’m interested in programming languages, education, and leftist politics. This is my personal website.

Contact

The easiest way to get in touch is via email: dbp@dbpmail.net. Note that if you have another email address for me, it will probably work as well. I’m currently located in Boston, MA.
Fn is a web framework written in the functional language Haskell, with the explicit goal of writing code in a more functional style: handlers for web requests are normal functions with arguments and return types (rather than monadic actions), and there are no monad transformers. In many ways it makes writing web code more like writing plain Haskell. github.com/positiondev/fn.

Hworker - A reliable at-least-once job processor.

Hworker is a Redis-backed background job processor written in Haskell. It handles running jobs that, for various reasons (usually that they take a while to complete), have to run in the background (sending email is a common example). It has a strong focus on reliability: once a job has been queued, it is guaranteed to run at least once. github.com/dbp/hworker.

Periodic - A reliable scheduled job runner.

Periodic is a Redis-backed scheduled job runner (i.e., like cron) written in Haskell. It handles running jobs at particular intervals. github.com/positiondev/periodic.

Rivet - A migration tool.

Rivet is the beginning of a database migration tool for Haskell that allows you to write both plain SQL migrations and migrations that run arbitrary Haskell code. It’s pretty early, but is used in various projects. github.com/dbp/rivet.

Pyret Programming Language

A research / teaching language being developed by a small team at Brown University. It’s a scripting language that tries to learn from the best features of scripting languages, but it also has serious functional roots and optional types. pyret.org.

Compiler correctness is an old problem, with results stretching back
beyond the last half-century. Founding the field, John McCarthy and James
Painter set out to build a "completely trustworthy compiler". And yet,
until quite recently, even despite truly impressive verification efforts,
This is a chronological list of some things I’ve been reading, with brief notes, reflections, impressions, etc. The main criteria for inclusion are: non-triviality (usually this means some length, but not always), and interest (so an uninteresting paper or book doesn’t merit inclusion).
-
Often, clusters of papers are related. This is due to the wonderful experience of reading research papers, where one paper references another, so I start reading that instead, but the second paper is actually based on a third, etc. Finding papers like this (vs. searching online) is also great because it naturally selects papers that were well written and/or influential.
-
May 2014
-
-
Interactive Theorem Proving and Program Development: Coq’Art: The Calculus of Inductive Constructions. Yves Bertot. 2004. Link (no freely available version).
-
-
Another book (~500pgs), this one a pretty concrete tutorial / reference to the Coq theorem prover. Having taken a course based on the other main reference to learning Coq, “Software Foundations” (which IS freely available), I can say definitively that this book is a much better introduction. It starts from the basics, and explains a lot of the mechanics of the system while explaining the theory and method of proving theorems. It becomes pretty reference-like as you move through (indeed I only read closely the first half), but is extremely clear throughout. For getting to know Coq (which is still, at this point, the best supported automated theorem prover), this seems to be the best reference. I had heard that before, but had not been excited about the cover price (I’m so spoiled by CS people giving things away for free!). Let it be stated again - it’s worth the price.
-
-
-
April 2014
-
-
FizzBuzz in Haskell by Embedding a Domain-Specific Language. Maciej Pirog. 2014. PDF.
-
-
_A fun short paper about solving the classic interview question using languages / interpreters as a way of approaching the problem. The “trick” to the result is continuations, in order to implement control flow in the DSL.
-
-
Type Theory and Functional Programming. Simon Thompson. 1991. PDF.
-
-
This book (~400pgs) introduces type theory and then applies it to functional programming. I read the first half throughout April, (through Chapter 5). It’s a very formal (perhaps obviously) presentation of first the typed lambda calculus, and then additions that get it more into dependently typed land. It became dense enough that I put it down for the time being, as I didn’t feel that I was getting much (concrete) out of the presentation.
-
-
-
March 2014
-
-
Scrap Your Boilerplate: A Practical Design Pattern for Generic Programming. Ralf Lammel, Simon Peyton Jones. 2003. PDF.
-
-
Presenting patterns for generic traversal and modifications of data structures where you only have to write the cases you care about. Written for an audience of normal Haskell programmers.
-
-
Finally Tagless, Partially Evaluated: Tagless Staged Interpreters for Simpler Typed Languages. J Carette, O Kiselyov, CC Shan. 2007. PDF.
-
-
Writing embedded typed languages inside typed languages without tags or interpretation (the primary idea is to use functions instead of data to represent terms of the language). Really neat presentation, as it uses both OCaml and Haskell with real, working examples. The language syntaxes are module signatures in the former, and type class instances in the latter. Then the semantics are modules or class instances respectively.
-
-
Fun with Type Functions. Oleg Kiselyov, Ken Shan, and Simon Peyton Jones. 2010. PDF
-
-
A tutorial-style introduction to type-level functions, which are implemented in terms of type synonym families. One of the main motivations for this is parametrizing type classes over types in a more straightforwardly functional way (as contrasted with functional dependencies). Very readable.
-
-
Computing at school in the UK: from guerrilla to gorilla. Simon Peyton Jones, Simon Humphreys, Bill Mitchell. 2013. PDF
-
-
This was an interesting mix of motivation for why teaching computer science as a core science (along with Physics, Chemistry, and Biology) starting in primary school is important, and also how the group “Computing at School” in the UK has essentially succeeded at doing this.
-
-
Towards a Declarative Web. (a.k.a. Haste Report). Anton Ekblad. 2012. PDF
-
-
Compilation of Haskell to Javascript. Not a super new idea, but seems to get pretty good performance out of a relatively minimal implementation. And, people seem to be using it for real work, which is definitely neat!
-
-
-
+
+
+
+ Things I’ve read
+
+
+ This is a chronological list of some things I’ve been reading, with brief notes, reflections, impressions, etc. The main criteria for inclusion are: non-triviality (usually this means some length, but not always), and interest (so an uninteresting paper or book doesn’t merit inclusion).
+
+
+ Often, clusters of papers are related. This is due to the wonderful experience of reading research papers, where one paper references another, so I start reading that instead, but the second paper is actually based on a third, etc. Finding papers like this (vs. searching online) is also great because it naturally selects papers that were well written and/or influential.
+
+
+ May 2014
+
+
+
+
+ Interactive Theorem Proving and Program Development: Coq’Art: The Calculus of Inductive Constructions. Yves Bertot and Pierre Castéran. 2004. Link (no freely available version).
+
+
+
+ Another book (~500pgs), this one a pretty concrete tutorial / reference to the Coq theorem prover. Having taken a course based on the other main reference for learning Coq, “Software Foundations” (which IS freely available), I can say definitively that this book is a much better introduction. It starts from the basics, and explains a lot of the mechanics of the system while explaining the theory and method of proving theorems. It becomes pretty reference-like as you move through (indeed I only read closely the first half), but is extremely clear throughout. For getting to know Coq (which is still, at this point, the best-supported interactive theorem prover), this seems to be the best reference. I had heard that before, but had not been excited about the cover price (I’m so spoiled by CS people giving things away for free!). Let it be stated again - it’s worth the price.
+
+
+
+
+
+ April 2014
+
+
+
+
+ FizzBuzz in Haskell by Embedding a Domain-Specific Language. Maciej Pirog. 2014. PDF.
+
+
+
+ A fun short paper about solving the classic interview question using languages / interpreters as a way of approaching the problem. The “trick” to the result is continuations, in order to implement control flow in the DSL.
+
+
+
+
+
+ Type Theory and Functional Programming. Simon Thompson. 1991. PDF.
+
+
+
+ This book (~400pgs) introduces type theory and then applies it to functional programming. I read the first half throughout April (through Chapter 5). It’s a very formal (perhaps obviously) presentation of first the typed lambda calculus, and then additions that get it more into dependently typed land. It became dense enough that I put it down for the time being, as I didn’t feel that I was getting much (concrete) out of the presentation.
+
+
+
+
+
+ March 2014
+
+
+
+
+ Scrap Your Boilerplate: A Practical Design Pattern for Generic Programming. Ralf Lammel, Simon Peyton Jones. 2003. PDF.
+
+
+
+ Presenting patterns for generic traversal and modifications of data structures where you only have to write the cases you care about. Written for an audience of normal Haskell programmers.
+
+
+
+
+
+ Finally Tagless, Partially Evaluated: Tagless Staged Interpreters for Simpler Typed Languages. J Carette, O Kiselyov, CC Shan. 2007. PDF.
+
+
+
+ Writing embedded typed languages inside typed languages without tags or interpretation (the primary idea is to use functions instead of data to represent terms of the language). Really neat presentation, as it uses both OCaml and Haskell with real, working examples. The language syntaxes are module signatures in the former, and type class instances in the latter. Then the semantics are modules or class instances respectively.
+
+
+
+
+
+ Fun with Type Functions. Oleg Kiselyov, Ken Shan, and Simon Peyton Jones. 2010. PDF
+
+
+
+ A tutorial-style introduction to type-level functions, which are implemented in terms of type synonym families. One of the main motivations for this is parametrizing type classes over types in a more straightforwardly functional way (as contrasted with functional dependencies). Very readable.
+
+
+
+
+
+ Computing at school in the UK: from guerrilla to gorilla. Simon Peyton Jones, Simon Humphreys, Bill Mitchell. 2013. PDF
+
+
+
+ This was an interesting mix of motivation for why teaching computer science as a core science (along with Physics, Chemistry, and Biology) starting in primary school is important, and also how the group “Computing at School” in the UK has essentially succeeded at doing this.
+
+
+
+
+
+ Towards a Declarative Web. (a.k.a. Haste Report). Anton Ekblad. 2012. PDF
+
+
+
+ Compilation of Haskell to Javascript. Not a super new idea, but seems to get pretty good performance out of a relatively minimal implementation. And, people seem to be using it for real work, which is definitely neat!
+
+
+
+
diff --git a/_site/rss.xml b/_site/rss.xml
deleted file mode 100644
index d18f414..0000000
--- a/_site/rss.xml
+++ /dev/null
@@ -1,1589 +0,0 @@
-
-
-
- dbp.io :: essays
- http://dbp.io
-
-
- Thu, 19 Apr 2018 00:00:00 UT
-
- How to prove a compiler fully abstract
- http://dbp.io/essays/2018-04-19-how-to-prove-a-compiler-fully-abstract.html
- How to prove a compiler fully abstract
-
-
by Daniel Patterson on April 19, 2018
-
-
A compiler that preserves and reflects equivalences is called a fully abstract compiler. This is a powerful property for a compiler that is different from (but complementary to) the more common notion of compiler correctness. So what does it mean, and how do we prove it?
-
-
All the code for this post, along with instructions to get it running, is in the repository https://github.com/dbp/howtoprovefullabstraction. If you have any trouble getting it going, please open an issue on that repository and I’ll help figure it out with you.
-
-
Both equivalence preservation and equivalence reflection (what make a compiler fully abstract) relate to how the compiler treats program equivalences, which in this case I’m considering observational equivalence. Two programs p1 and p2 are observationally equivalent if you cannot tell any difference between the result of running them, including any side effects.
-
For example, if the only observation you can make about a program in your language is what output it prints, then two programs that print the same output are equivalent, even if they are implemented in completely different ways. Observational equivalence is extremely useful, especially for compilers, which when optimizing may change how a particular program is implemented but should not change its observable behavior. But it is also useful for programmers, who commonly refactor code: they change how the code is implemented (to make it easier to maintain, or extend, or better support some future addition) without changing any functionality. Refactoring is an equivalence-preserving transformation. We write observational equivalence on programs formally as:
-
p1 ≈ p2
-
Contextual equivalence
-
But we often also want to compile not just whole programs, but particular modules, expressions, or in the general sense, components, and in that case, we want an analogous notion of equivalence. Two components are contextually equivalent if in all program contexts they produce the same observable behavior. In other words, if you have two modules, but any way you combine those modules with the rest of a program (so the rest is syntactically identical, but the modules differ), the results are observationally equivalent, then those two modules are contextually equivalent. We will write this, overloading the ≈ for both observational and contextual equivalence, as:
-
e1 ≈ e2
-
As an example, if we consider a simple functional language and consider our components to be individual expressions, it should be clear that these two expressions are contextually equivalent:
-
λx. x * 2 ≈ λx. x + x
-
While they are implemented differently, no matter how they are used, the result will always be the same (as the only thing we can do with these functions is call them on an argument, and when we do, each will double its argument, even though in a different way). It’s important to note that contextual equivalence always depends on what is observable within the language. For example, in Javascript, you can reflect over the syntax of functions, and so the above two functions, written as:
-
function(x){ return x * 2; } ≈ function(x){ return x + x; }
-
Would not be contextually equivalent, because there exists a program context that can distinguish them. What is that context? Well, if we imagine plugging the functions above into the “hole” written as [·] below, the result will be different for the two functions! This is because the toString() method on functions in Javascript returns the source code of the function.
-
([·]).toString()
-
From the perspective of optimizations, this is troublesome, as you can’t be sure that a transformation between the above programs was safe (assuming one was much faster than the other), as there could be code that relied upon the particular way that the source code had been written. There are more complicated things you can do (like optimizing speculatively and falling back to unoptimized versions when reflection was needed). In general though, languages with that kind of reflection are both harder to write fast compilers for and harder to write secure compilers for, and while it’s not the topic of this post, it’s always important to know what you mean by contextual equivalence, which usually means: what can program contexts determine about components.
-
Part 1. Equivalence reflection
-
With that in mind, what do equivalence reflection and equivalence preservation for a compiler mean? Let’s start with equivalence reflection, as that’s the property that all your correct compilers already have. Equivalence reflection means that if two components, when compiled, are equivalent, then the source components must have been equivalent. We can write this more formally as (where we write s ↠ t to mean a component s is compiled to t):
-
s1 ↠ t1 ∧ s2 ↠ t2 ∧ t1 ≈ t2 ⇒ s1 ≈ s2
-
What are the consequences of this definition? And why do correct compilers have this property? Well, the contrapositive is actually easier to understand: it says that if the source components weren’t equivalent then the target components would have to be different, or more formally:
-
s1 ↠ t1 ∧ s2 ↠ t2 ∧ s1 ≉ s2 ⇒ t1 ≉ t2
-
If this didn’t hold, then the compiler could take different source components and compile them to the same target component! Which means you could have different source programs you wrote, which have observationally different behavior, and your compiler would produce the same target program! Any correct compiler has to preserve observational behavior, and it couldn’t do that in this case, as the target program only has one behavior, so it can’t have both the behavior of s1 and s2 (for pedants, not considering non-deterministic targets).
-
So equivalence reflection should be thought of as related to compiler correctness. Note, however, that equivalence reflection is not the same as compiler correctness: as long as your compiler produced different target programs for different source programs, all would be fine – your compiler could hash the source program and produce target programs that just printed the hash to the screen, and it would be equivalence reflecting, since it would produce different target programs not only for source programs that were observationally different, but even syntactically different! That would be a pretty bad compiler, and certainly not correct, but it would be equivalence reflecting.
-
Part 2. Equivalence preservation
-
Equivalence preservation, on the other hand, is the hallmark of fully abstract compilers, and it is a property that even most correct compilers do not have, though it would certainly be great if they did. It says that if two source components are equivalent, then the compiled versions must still be equivalent. Or, more formally:
-
s1 ↠ t1 ∧ s2 ↠ t2 ∧ s1 ≈ s2 ⇒ t1 ≈ t2
-
(See, I just reversed the implication. Neat trick! But now it means something totally different). One place where this has been studied extensively is by security researchers, because what it tells you is that observers in the target can’t make observations that aren’t possible to distinguish in the source. Let’s make that a lot more concrete, where we will also see why it’s not frequently true, even of proven correct compilers.
-
Say your language has some information hiding feature, like a private field, and you have two source components that are identical except they have different values stored in the private field. If the compiler does not preserve the fact that it is private (because, for example, it translates the higher level object structure into a C struct or just a pile of memory accessed by assembly), then other target code could read the private values, and these two components will no longer be equivalent.
-
This also has implications for programmer refactoring and compiler optimizations: I (or my compiler) might think that it is safe to replace one version of the program with another, because I know that in my language these are equivalent, but what I don’t know is that the compiler reveals distinguishing characteristics, and perhaps some target-level library that I’m linking with relies upon details (that were supposed to be hidden) of how the old code worked. If that’s the case, I can have a working program, and make a change that does not change the meaning of the component in my language, but the whole program can no longer work.
-
Proving a compiler fully abstract, therefore, is all about proving equivalence preservation. So how do we do it?
-
How to prove equivalence preservation
-
Looking at what we have to prove, we see that given contextually equivalent source components s1 and s2, we need to show that t1 and t2 are contextually equivalent. We can expand this to explicitly quantify over the contexts that combine with the components to make whole programs:
-
∀Cs. Cs[s1] ≈ Cs[s2] ⇒ ∀Ct. Ct[t1] ≈ Ct[t2]
-
Noting that as mentioned above, I am overloading ≈ to now mean whole-program observational equivalence (so, running the program produces the same observations).
-
First I’ll outline how the proof will go in general, and then we’ll consider an actual example compiler and do the proof for the concrete example.
-
We can see that in order to prove this, we need to consider an arbitrary target context Ct and show that Ct[t1] and Ct[t2] are observationally equivalent. We do this by showing that Ct[t1] is observationally equivalent to Cs'[s1] – that is, we produce a source context Cs' that we claim is equivalent to Ct. We do this by way of a “back-translation”, which will be a sort of compiler in reverse. Assuming that we can produce such a Cs' and that Cs'[s1] and Ct[t1] (and correspondingly Cs'[s2] and Ct[t2]) are indeed observationally equivalent (noting that this relies upon a cross-language notion of observations), we can prove that Ct[t1] and Ct[t2] are observationally equivalent by instantiating our hypothesis ∀Cs. Cs[s1] ≈ Cs[s2] with Cs'. This tells us that Cs'[s1] ≈ Cs'[s2], and by transitivity, Ct[t1] ≈ Ct[t2].
-
It can be helpful to see it in a diagram, where the top line is given by the hypothesis (once instantiated with the source context we come up with by way of backtranslation) and coming up with the back-translation and showing that Ct and Cs' are equivalent is the hard part of the proof.
-
Cs'[s1] ≈ Cs'[s2]
- ≈ ≈
-Ct[t1] ? Ct[t2]
-
Concrete example of languages, compiler, & proof of full abstraction
-
Let’s make this concrete with an example. This will be presented some in English and some in the proof assistant Coq. This post isn’t an introduction to Coq; for that, see e.g., Bertot and Castéran’s Coq’Art, Chlipala’s CPDT, or Pierce et al.’s Software Foundations.
-
Our source language is arithmetic expressions over integers with addition and subtraction:
-
e ::= n
- | e + e
- | e - e
-
This is written down in Coq as:
-
Inductive Expr : Set :=
- | Num : Z -> Expr
- | Plus : Expr -> Expr -> Expr
- | Minus : Expr -> Expr -> Expr.
-
Evaluation is standard (if you wanted to parse this, you would need to deal with left/right associativity, and probably add parentheses to disambiguate, but we consider the point where you already have a tree structure, so it is unambiguous). We can write the evaluation function as:
-
Fixpoint eval_Expr (e : Expr) : Z :=
- match e with
- | Num n => n
- | Plus e1 e2 => eval_Expr e1 + eval_Expr e2
- | Minus e1 e2 => eval_Expr e1 - eval_Expr e2
- end.
-
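As a quick sanity check (my addition, not from the original post), the evaluator can be exercised on a small expression; assuming the Expr and eval_Expr definitions above (with ZArith and list notations in scope), (1 + 2) - 3 evaluates to 0:

```coq
(* Illustrative example: evaluating (1 + 2) - 3 with the evaluator above. *)
Example eval_Expr_example :
  eval_Expr (Minus (Plus (Num 1) (Num 2)) (Num 3)) = 0%Z.
Proof. reflexivity. Qed.
```
-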
Our target language is a stack machine which uses a stack of integers to evaluate the sequence of instructions. In addition to having instructions to add and subtract, our stack machine has an extra instruction: OpCount. This instruction computes how many instructions remain to be executed, and pushes that count onto the top of the stack. This is the simplest abstraction I could think of that will provide an interesting case study for problems of full abstraction, and is a stand-in for both reflection (as it allows the program to inspect other parts of the program), and also somewhat of a proxy for remaining execution time. Our stack machine requires that exactly one integer be left on the stack at the end of execution.
-
Inductive Op : Set :=
-| Push : Z -> Op
-| Add : Op
-| Sub : Op
-| OpCount : Op.
-
Let’s see the compiler and the evaluation function (note that we reverse the order when we pop values off the stack from when we put them on in the compiler).
-
Fixpoint compile_Expr (e : Expr) : list Op :=
- match e with
- | Num n => [Push n]
- | Plus e1 e2 => compile_Expr e1 ++ compile_Expr e2 ++ [Add]
- | Minus e1 e2 => compile_Expr e1 ++ compile_Expr e2 ++ [Sub]
- end.
-
-Fixpoint eval_Op (s : list Z) (ops : list Op) : option Z :=
- match (ops, s) with
- | ([], [n]) => Some n
- | (Push z :: rest, _) => eval_Op (z :: s) rest
- | (Add :: rest, n2 :: n1 :: ns) => eval_Op (n1 + n2 :: ns)%Z rest
- | (Sub :: rest, n2 :: n1 :: ns) => eval_Op (n1 - n2 :: ns)%Z rest
- | (OpCount :: rest, _) => eval_Op (Z.of_nat (length rest) :: s) rest
- | _ => None
- end.
-
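To make OpCount concrete, here is a small illustrative check (my addition, assuming the definitions above): placed at the front of a program, OpCount pushes the number of remaining instructions, which then participates in the arithmetic:

```coq
(* OpCount sees 2 remaining instructions ([Push 1; Add]) and pushes 2;
   Push 1 then Add yields 2 + 1 = 3. *)
Example eval_Op_opcount :
  eval_Op [] [OpCount; Push 1; Add] = Some 3%Z.
Proof. reflexivity. Qed.
```
-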
We can prove a basic (whole program) compiler correctness result for this (for more detail on this type of result, see this post), where first we prove a general eval_step lemma and then use that to prove correctness (note: the hint and hint_rewrite tactics are from an experimental literatecoq library that adds support for proof-local hinting, which some might think is a hack but I think makes the proofs much more readable/maintainable).
-
Lemma eval_step : forall a : Expr, forall s : list Z, forall xs : list Op,
- eval_Op s (compile_Expr a ++ xs) = eval_Op (eval_Expr a :: s) xs.
-Proof.
- hint_rewrite List.app_assoc_reverse.
- induction a; intros; iauto; simpl;
- hint_rewrite IHa2, IHa1;
- iauto'.
-Qed.
-
-Theorem compiler_correctness : forall a : Expr,
- eval_Op [] (compile_Expr a) = Some (eval_Expr a).
-Proof.
- hint_rewrite eval_step.
- hint_simpl.
- induction a; iauto'.
-Qed.
-
Now, before we can state properties about equivalences, we need to define what we mean by equivalence for our source and target languages. Both produce no side effects, so the only observation is the end result. Thus, observational equivalence is pretty straightforward; it follows from evaluation: two expressions are equivalent if eval_Expr gives the same integer, and two target programs are equivalent if eval_Op, started from an empty stack, gives the same result.
But, we want to talk not just about whole programs, but about partial programs that can get linked with other parts to create whole programs. In order to do that, we create a new type of “evaluation context” for our Expr, that has a hole (typically written on paper as [·]). This is a program that is missing an expression, which must be filled into the hole. Given how simple our language is, any expression can be filled in to the hole and that will produce a valid program. We only want to have one hole per partial program, so in the cases for + and -, one branch must be a normal Expr (so it contains no hole), and the other can contain one hole. Our link_Expr function takes a context and an expression and fills in the hole.
-
Inductive ExprCtxt : Set :=
-| Hole : ExprCtxt
-| Plus1 : ExprCtxt -> Expr -> ExprCtxt
-| Plus2 : Expr -> ExprCtxt -> ExprCtxt
-| Minus1 : ExprCtxt -> Expr -> ExprCtxt
-| Minus2 : Expr -> ExprCtxt -> ExprCtxt.
-
-Fixpoint link_Expr (c : ExprCtxt) (e : Expr) : Expr :=
- match c with
- | Hole => e
- | Plus1 c' e' => Plus (link_Expr c' e) e'
- | Plus2 e' c' => Plus e' (link_Expr c' e)
- | Minus1 c' e' => Minus (link_Expr c' e) e'
- | Minus2 e' c' => Minus e' (link_Expr c' e)
- end.
-
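For instance (an illustrative check of my own, assuming the definitions above), filling the hole of the context 1 + [·] with the expression 2 yields 1 + 2:

```coq
(* Plus2 (Num 1) Hole represents the context 1 + [·]; linking fills the hole. *)
Example link_Expr_example :
  link_Expr (Plus2 (Num 1) Hole) (Num 2) = Plus (Num 1) (Num 2).
Proof. reflexivity. Qed.
```
-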
For our stack machine, partial programs are much easier, since a program is just a list of Op, which means that any program can be extended by adding new Ops on either end (or inserting in the middle).
-
With ExprCtxt, we can now define “contextual equivalence” for our source language:
-
Definition ctxtequiv_Expr (e1 e2 : Expr) : Prop :=
- forall c : ExprCtxt, eval_Expr (link_Expr c e1) = eval_Expr (link_Expr c e2).
-
We can do the same with our target, simplifying slightly and saying that we will allow adding arbitrary Ops before and after, but not in the middle, of an existing sequence of Ops:
-
Definition ctxtequiv_Op (p1 p2 : list Op) : Prop :=
  forall pre post : list Op,
    eval_Op [] (pre ++ p1 ++ post) = eval_Op [] (pre ++ p2 ++ post).
-
To prove our compiler fully abstract, remember we need to prove that it preserves and reflects equivalences. Since we already proved that it is correct, proving that it reflects equivalences should be relatively straightforward, so let’s start there. The lemma we want is:
-
Lemma equivalence_reflection :
- forall e1 e2 : Expr,
- forall p1 p2 : list Op,
- forall comp1 : compile_Expr e1 = p1,
- forall comp2 : compile_Expr e2 = p2,
- forall eqtarget : ctxtequiv_Op p1 p2,
- ctxtequiv_Expr e1 e2.
-Proof.
- unfold ctxtequiv_Expr, ctxtequiv_Op in *.
- intros.
- induction c; simpl; try solve [hint_rewrite IHc; iauto];
- (* NOTE(dbp 2018-04-16): Only the base case, for Hole, remains *)
- [idtac].
- (* NOTE(dbp 2018-04-16): In the hole case, specialize the target ctxt equiv hypothesis to empty *)
- specialize (eqtarget [] []); simpl in eqtarget; repeat rewrite app_nil_r in eqtarget.
-
- (* NOTE(dbp 2018-04-16): At this point, we know e1 -> p1, e2 -> p2, & p1 ≈ p2,
- and want e1 ≈ e2, which follows from compiler correctness *)
- rewrite <- comp1 in eqtarget. rewrite <- comp2 in eqtarget.
- repeat rewrite compiler_correctness in eqtarget.
- inversion eqtarget.
- reflexivity.
-Qed.
-
This lemma is a little more involved, but not by much; we proceed by induction on the structure of the evaluation contexts, and in all but the case for Hole, the induction hypothesis gives us exactly what we need. In the base case, we need to appeal to the compiler_correctness lemma we proved earlier, but otherwise it follows easily.
-
So what about equivalence preservation? We can state the lemma quite easily, by swapping the roles of the source and target equivalences in the statement of equivalence reflection:
-
Lemma equivalence_preservation :
  forall e1 e2 : Expr,
  forall p1 p2 : list Op,
  forall comp1 : compile_Expr e1 = p1,
  forall comp2 : compile_Expr e2 = p2,
  forall eqsource : ctxtequiv_Expr e1 e2,
  ctxtequiv_Op p1 p2.
-
But proving it is another matter. In fact, it’s not provable, because it’s not true. We can come up with a counter-example, using that OpCount instruction we (surreptitiously) added to our target language. These two expressions are contextually equivalent in our source language (this should be obvious, but here is a proof):
But they are not contextually equivalent in the target; in particular, if we put the OpCount instruction before and then the Add instruction afterwards, the result will be the value plus the number of instructions it took to compute it:
The former evaluates to 6, while the latter evaluates to 4. This means that there is no way we are going to be able to prove equivalence preservation (as we have a counter-example!).
-
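The concrete pair of expressions isn’t shown above, but one pair consistent with the stated results (6 and 4) is 1 + 1 and 2: they agree in the source, yet wrapping their compilations in OpCount … Add distinguishes them, because the two compiled programs have different lengths. A hypothetical reconstruction, assuming the definitions above:

```coq
(* Hypothetical reconstruction of the counter-example: OpCount sees 4
   remaining instructions for 1 + 1, but only 2 for 2, so the same target
   context produces 6 and 4 respectively. *)
Example distinguished_1 :
  eval_Op [] ([OpCount] ++ compile_Expr (Plus (Num 1) (Num 1)) ++ [Add]) = Some 6%Z.
Proof. reflexivity. Qed.

Example distinguished_2 :
  eval_Op [] ([OpCount] ++ compile_Expr (Num 2) ++ [Add]) = Some 4%Z.
Proof. reflexivity. Qed.
```
-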
So what do we do? Well, this scenario is not uncommon, and it’s the reason why many, even correct, compilers are not fully abstract. It’s also related to why many of these compilers may still have security problems! The solution is to somehow protect the compiled code from having the equivalences disrupted. If this were a real machine, we might want to have some flag on instructions that meant that they should not be counted, and OpCount would just not return anything if it saw any of those (or would count them as 0). Alternately, we might give our target language a type system that is able to rule out linking with code that uses the OpCount instruction, or perhaps restricts how it can be used.
-
Because this is a blog-post sized example, I wanted to keep the proofs as short as possible, and the unstructured and untyped nature of our target (which, indeed, is much less structured than our source language; the fact that the source is so well-structured is why our whole-program correctness result was so easy!) would make the proofs relatively complex (or require us to add various auxiliary definitions). So the solution I’m going to take is somewhat extreme. Rather than, say, restricting how OpCount is used, or even ruling out linking with OpCount, we’re going to highly restrict what we can link with. This is very artificial, and done entirely so that the proofs can fit into a few lines. In this case, rather than a list, we are going to allow one Op before and one Op after our compiled program, neither of which can be OpCount; further, we still want the resulting program to be well-formed (i.e., no errors, only one number on the stack at the end), so either there should be nothing before and after, or there is a Push n before and either Add or Sub after. (You should be able to verify that no other combination of Op before or after will fulfill our requirement.)
-
We can define these possible linking contexts and a helper to combine them with programs as the following:
-
Inductive OpCtxt : Set :=
| PushAdd : Z -> OpCtxt
| PushSub : Z -> OpCtxt
| Empty : OpCtxt.

Definition link_Op (c : OpCtxt) (p : list Op) : list Op :=
  match c with
  | PushAdd n => Push n :: p ++ [Add]
  | PushSub n => Push n :: p ++ [Sub]
  | Empty => p
  end.
-
Using that, we can redefine contextual equivalence for our target language, only permitting these contexts:
-
Definition ctxtequiv_Op (p1 p2 : list Op) : Prop :=
  forall c : OpCtxt, eval_Op [] (link_Op c p1) = eval_Op [] (link_Op c p2).
-
-
The only change to our proof of equivalence reflection is on one line, to change our specialization of the target contexts, now to the Empty context:
-
specialize (eqtarget Empty) (* Empty rather than [] [] *)
-
With that change, we now believe that our compiler, when linked against these restricted contexts, is indeed fully abstract. So let’s prove it. If you recall from earlier in this post, proving equivalence preservation means proving that the top line implies the bottom, in the following diagram:
-
Cs'[s1] ≈ Cs'[s2]
  ≈         ≈
Ct[t1]  ?  Ct[t2]
-
In order to do that, we rely upon a backtranslation to get from Ct to Cs', where Ct is a target context, in this tiny example our restricted OpCtxt. We can write that backtranslation as:
-
Definition backtranslate (c : OpCtxt) : ExprCtxt :=
  match c with
  | PushAdd n => Plus2 (Num n) Hole
  | PushSub n => Minus2 (Num n) Hole
  | Empty => Hole
  end.
-
The second part of the proof is showing that the vertical equivalences in the diagram hold — that is, that if s1 is compiled to t1 and Ct is backtranslated to Cs' then Ct[t1] is equivalent to Cs'[s1]. We can state and prove that as the following lemma, which follows from straightforward case analysis on the structure of our target context and backtranslation (using our eval_step lemmas):
-
Lemma back_translation_equiv :
  forall c : OpCtxt,
  forall p : list Op,
  forall e : Expr,
  forall c' : ExprCtxt,
    compile_Expr e = p ->
    backtranslate c = c' ->
    eval_Op [] (link_Op c p) = Some (eval_Expr (link_Expr c' e)).
Proof.
  hint_rewrite eval_step, eval_step'.
  intros.
  match goal with
  | [ c : OpCtxt |- _ ] => destruct c
  end;
  match goal with
  | [ H : backtranslate _ = _ |- _ ] => invert H
  end; simpl; iauto.
Qed.
-
Once we have that lemma, we can prove equivalence preservation directly: we do case analysis on the target context we are given, backtranslate it, and then use the lemma we just proved to get the equivalence that we need.
-
This was obviously a very tiny language and a very restrictive linker that allowed only a handful of contexts, but the general shape of the proof is the same as that used in more realistic languages published in research conferences today!
-
So next time you see a result about a correct (or even hoped to be correct) compiler, ask if it is fully abstract! And if it’s not, are the violations of equivalences something that could be exploited? Or something that would invalidate optimizations?
As stated at the top of the post, all the code in this post is available at https://github.com/dbp/howtoprovefullabstraction. If you have any trouble getting it going, please open an issue on that repository and I’ll help figure it out with you.
-
Thu, 19 Apr 2018 00:00:00 UT
http://dbp.io/essays/2018-04-19-how-to-prove-a-compiler-fully-abstract.html
Daniel Patterson
-
How to prove a compiler correct
http://dbp.io/essays/2018-01-16-how-to-prove-a-compiler-correct.html
-
by Daniel Patterson on January 16, 2018
-
-
At POPL’18 (Principles of Programming Languages) last week, I ended up talking to Annie Cherkaev about her really cool DSL (domain specific language) SweetPea (which she presented at Off the Beaten Track 18, a workshop colocated with POPL), which is a “SAT-Sampler aided language for experimental design, targeted for Psychology & Neuroscience”. In particular, we were talking about software engineering, and the work that Annie was doing to test SweetPea and increase her confidence that the implementation is correct!
-
The topic of how exactly one goes about proving a compiler correct came up, and I realized that I couldn’t think of a high-level (but concrete) overview of what that might look like. Also, like many compilers, hers is implemented in Haskell, so it seemed like a good opportunity to try out the really cool work presented at the colocated conference CPP’18 (Certified Programs and Proofs) titled “Total Haskell is Reasonable Coq” by Spector-Zabusky, Breitner, Rizkallah, and Weirich. They have a tool (hs-to-coq) that extracts Coq definitions from (certain) terminating Haskell programs (of which at least small compilers hopefully qualify). There are certainly limitations to this approach (see Addendum at the bottom of the page for some discussion), but it seems very promising from an engineering perspective.
-
The intention of this post is twofold:
-
-
Show how to take a compiler (albeit a tiny one) that was built with no intention of verifying it and after the fact prove it correct. Part of the ability to do this in such a seamless way is the wonderful hs-to-coq tool mentioned above, though there is no reason in principle you couldn’t carry out this translation manually. In practice, maintenance becomes an issue, which is why realistic verified compilers have relied on writing their implementations within theorem provers like Coq and then extracting executable versions automatically; possibly hs-to-coq could change this workflow.
-
Give a concrete example of proving compiler correctness. By necessity, this is a very simplified scenario without a lot of the subtleties that appear in real verification efforts (e.g., undefined behavior, multiple compiler passes, linking with code after compilation, etc). On the other hand, even this simplified scenario could cover many cases of DSLs, and understanding the subtleties that come up should be much easier once you understand the basic case!
-
-
The intended audience is: people who know what compilers are (and may have implemented them!) but aren’t sure what it means to prove one correct!
-
-
All the code for this post, along with instructions to get it running, is in the repository https://github.com/dbp/howtoproveacompiler. If you have any trouble getting it going, open an issue on that repository.
-
-
DSL & Compiler
-
To make this simple, my source language is arithmetic expressions with addition, subtraction, and multiplication. I represent this as an explicit data structure in Haskell:
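-
The data type itself was elided from this copy of the post; a minimal sketch consistent with the surrounding text (the constructor names and the deriving clause are assumptions) is:

```haskell
-- A sketch of the source language: arithmetic expressions with
-- addition, subtraction, and multiplication. Constructor names are
-- assumed from the surrounding text, not taken from the original post.
data Arith = Num Int
           | Plus Arith Arith
           | Minus Arith Arith
           | Times Arith Arith
  deriving (Show, Eq)
```

Under this sketch, the shorthand Plus 1 (Times 2 4) below corresponds to Plus (Num 1) (Times (Num 2) (Num 4)).
-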
And a program is an Arith. For example, the source expression “1 + (2 * 4)” is represented as Plus 1 (Times 2 4).
-
The target of this is a sequence of instructions for a stack machine. The idea of the stack machine is that there is a stack of values that can be used by instructions. The target language expressions are:
-
data StackOp = SNum Int
             | SPlus
             | SMinus
             | STimes
-
And a program is a [StackOp]. For example, the previous example “1 + (2 * 4)” could be represented as [SNum 1, SNum 2, SNum 4, STimes, SPlus]. The idea is that a number evaluates to pushing it onto the stack, and plus/minus/times evaluate by popping two numbers off the stack and pushing the sum/difference/product respectively back on. But we can make this concrete by writing an eval function that takes an initial stack (which will probably be empty) and a program, and either produces an integer (the top of the stack after all the instructions run) or an error (which, for debugging’s sake, is the state of the stack and the rest of the program when it got stuck).
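-
The eval function was elided from this copy; the following sketch matches the description above (the exact shape of the error payload is an assumption):

```haskell
-- StackOp as defined in the post.
data StackOp = SNum Int | SPlus | SMinus | STimes
  deriving (Show, Eq)

-- A sketch of the stack-machine evaluator: on success, return the
-- number on top of the stack once all instructions have run; on
-- failure, return the stack and the rest of the program at the point
-- where evaluation got stuck.
eval :: [Int] -> [StackOp] -> Either ([Int], [StackOp]) Int
eval (n : _)         []             = Right n
eval stack           (SNum n : ops) = eval (n : stack) ops
eval (x : y : stack) (SPlus  : ops) = eval (x + y : stack) ops
eval (x : y : stack) (SMinus : ops) = eval (x - y : stack) ops
eval (x : y : stack) (STimes : ops) = eval (x * y : stack) ops
eval stack           ops            = Left (stack, ops)
```

For the example above, eval [] [SNum 1, SNum 2, SNum 4, STimes, SPlus] steps through stacks [1], [2,1], [4,2,1], [8,1], and finally [9], returning Right 9.
-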
Now that we have our source and target language, and know how the target works, we can implement our compiler. Part of why this is a good small example is that the compiler is very simple!
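-
The compile function was also elided from this copy; a sketch consistent with the discussion of operand order below (the right subtree compiled first, per the “reversal” remark) is:

```haskell
-- Source and target types as sketched earlier.
data Arith   = Num Int | Plus Arith Arith | Minus Arith Arith | Times Arith Arith
data StackOp = SNum Int | SPlus | SMinus | STimes
  deriving (Show, Eq)

-- A sketch of the compiler: the right subtree is compiled first so
-- that when SPlus/SMinus/STimes pops its two arguments, the left
-- operand comes off the stack first.
compile :: Arith -> [StackOp]
compile (Num n)       = [SNum n]
compile (Plus  a1 a2) = compile a2 ++ compile a1 ++ [SPlus]
compile (Minus a1 a2) = compile a2 ++ compile a1 ++ [SMinus]
compile (Times a1 a2) = compile a2 ++ compile a1 ++ [STimes]
```
-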
The cases for plus/minus/times are the ones that are slightly non-obvious, because they can contain further recursive expressions. But think about what the eval function is doing: once the stack machine finishes evaluating everything that a2 compiled to, the number that the right branch evaluated to should be on the top of the stack; then once it finishes evaluating what a1 compiles to, the number that the left branch evaluated to should be on the top of the stack (the reversal is so that they are in the right order when popped off). This means that evaluating e.g. SPlus will put the sum on the top of the stack, as expected. That’s a pretty informal argument about correctness, but we’ll have a chance to get more formal later.
-
Formalizing
-
Now that we have a Haskell compiler, we want to prove it correct! So what do we do? First, we want to convert this to Coq using the hs-to-coq tool. There are full instructions at https://github.com/dbp/howtoproveacompiler, but the main command that will convert src/Compiler.hs to src/Compiler.v:
And open up src/Proofs.v using a Coq interactive mode (I use Proof General within Emacs; with Spacemacs, this is particularly easy: use the coq layer!).
-
Proving things
-
We now have a Coq version of our compiler, complete with our evaluation function. So we should be able to write down a theorem that we would like to prove. What should the theorem say? Well, there are various things you could prove, but the most basic theorem in compiler correctness says essentially that running the source program and the target program “does the same thing”. This is often stated as “semantics preservation” and is often formally proven by way of a backwards simulation: whatever the target program does, the source program also should do (for a much more thorough discussion of this, check out William Bowman’s blog post, What even is compiler correctness?). In languages with ambiguity (nondeterminism, undefined behavior), this becomes much more complicated, but in our setting, we would state it as:
-
Theorem (informal). For all source arith expressions A, if eval [] (compile A) produces integer N then evaluating A should produce the same number N.
-
The issue that’s immediately apparent is that we don’t actually have a way of directly evaluating the source expression. The only thing we can do with our source expression is compile it, but if we do that, any statement we get has the behavior of the compiler baked into it (so if the compiler is wrong, we will just be proving stuff about our wrong compiler).
-
More philosophically, what does it even mean that the compiler is wrong? For it to be wrong, there has to be some external specification (likely, just in our head at this point) about what it was supposed to do, or in this case, about the behavior of the source language that the compiler was supposed to faithfully preserve. To prove things formally, we need to write that behavior down.
-
So we should add this function to our Haskell source. In a non-trivial DSL, this may be a significant part of the formalization process, but it is also incredibly important, because this is the part where you are actually specifying exactly what the source DSL means (otherwise, the only “meaning” it has is whatever the compiler happens to do, bugs and all). In this example, we can write this function as:
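-
The elided definition is presumably a direct recursive evaluator along these lines (a sketch, not the post’s exact code):

```haskell
-- Source type as sketched earlier.
data Arith = Num Int | Plus Arith Arith | Minus Arith Arith | Times Arith Arith

-- A sketch of the source-level evaluator: the specification of what
-- an Arith means, independent of the compiler.
eval' :: Arith -> Int
eval' (Num n)       = n
eval' (Plus  a1 a2) = eval' a1 + eval' a2
eval' (Minus a1 a2) = eval' a1 - eval' a2
eval' (Times a1 a2) = eval' a1 * eval' a2
```
-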
And we can re-run hs-to-coq to get it added to our Coq development. We can now formally state the theorem we want to prove as:
-
Theorem compiler_correctness : forall a : Arith,
  eval nil (compile a) = Data.Either.Right (eval' a).
-
I’m going to sketch out how this proof went. Proving stuff can be complex, but this maybe gives a sense of some of the thinking that goes into it. To go further, you probably want to take a course if you can find one, or follow a book like:
If you were to prove this on paper, you would proceed by induction on the structure of the arithmetic expression, so let’s start that way. The base case goes away trivially and we can expand the case for plus using:
-
induction a; iauto; simpl.
-
We see (above the line are the assumptions, below is what we need to prove):
Which, if we look at it for a little while, we realize two things:
-
-
Our induction hypotheses really aren’t going to work, intuitively because of the Either — our program won’t produce Right results for the subtrees, so there probably won’t be a way to rely on these hypotheses.
-
On the other hand, what does look like a Lemma we should be able to prove has to do with evaluating a partial program. Rather than trying to induct on the entire statement, we instead try to prove that evaling a compiled term will result in the eval'd term on the top of the stack. This is an instance of a more general pattern – that often the toplevel statement that you want has too much specificity, and you need to instead prove something that is more general and then use it for the specific case. So here’s (a first attempt) at a Lemma we want to prove:
-
-
Lemma eval_step : forall a : Arith, forall xs : list StackOp,
  eval nil (compile a ++ xs) = eval (eval' a :: nil) xs.
-
This is more general, and again we start by inducting on a, expanding and eliminating the base case:
We need to reshuffle the list associativity and then we can rewrite using the first hypotheses:
-
rewrite List.app_assoc_reverse. rewrite IHa1.
-
But now there is a problem (this is common, hence going over it!). We want to use our second hypothesis. Once we do that, we can reduce based on the definition of eval and we’ll be done (with this case, but multiplication is the same). The issue is that IHa2 needs the stack to be empty, and the stack we now have (since we used IHa1) is eval' a1 :: nil, so it can’t be used:
The solution is to go back to what our Lemma statement said and generalize it now to arbitrary stacks (so in this process we’ve now generalized twice!), so that the inductive hypotheses are correspondingly stronger:
-
Lemma eval_step : forall a : Arith, forall s : list Num.Int, forall xs : list StackOp,
  eval s (compile a ++ xs) = eval (eval' a :: s) xs.
-
Now if we start the proof in the same way:
-
induction a; intros; simpl; iauto.
-
We run into an odd problem. We have a silly obligation:
-
match s with
| nil => eval (i :: s) xs
| (_ :: nil)%list => eval (i :: s) xs
| (_ :: _ :: _)%list => eval (i :: s) xs
end = eval (i :: s) xs
-
Which will go away once we break apart the list s and simplify (if you look carefully, it has the same thing in all three branches of the match). There are (at least) a couple approaches to this:
-
-
We could just do it manually: destruct s; simpl; eauto; destruct s; simpl; eauto. But it shows up multiple times in the proof, and that’s a mess, and someone reading the proof script may be confused about what is going on.
-
We could write a tactic for the same thing:
-
try match goal with
    | [ l : list _ |- _ ] => solve [destruct l; simpl; eauto; destruct l; simpl; eauto]
    end.
-
This has the advantage that it doesn’t depend on the name, you can call it whenever (it won’t do anything if it isn’t able to discharge the goal), but where to call it is still somewhat messy (as it’ll be in the middle of the proofs). We could hint using this tactic (using Hint Extern) to have it handled automatically, but I generally dislike adding global hints for tactics (unless there is a very good reason!), as it can slow things down and make understanding why proofs worked more difficult.
-
We can also write lemmas for these. There are actually two cases that come up, and both are solved easily:
-
Lemma list_pointless_split : forall A B : Type, forall l : list A, forall x : B,
  match l with | nil => x | (_ :: _)%list => x end = x.
Proof.
  destruct l; eauto.
Qed.

Lemma list_pointless_split' : forall A B : Type, forall l : list A, forall x : B,
  match l with | nil => x | (_ :: nil)%list => x | (_ :: _ :: _)%list => x end = x.
Proof.
  destruct l; intros; eauto. destruct l; eauto.
Qed.
-
In this style, we can then hint using these lemmas locally to where they are needed.
-
-
Now we know the proof should follow from list associativity, this pointless list splitting, and the inductive hypotheses. We can write this down formally (this relies on the literatecoq library, which is just a few tactics at this point) as:
-
Lemma eval_step : forall a : Arith, forall s : list Num.Int, forall xs : list StackOp,
  eval s (compile a ++ xs) = eval (eval' a :: s) xs.
Proof.
  hint_rewrite List.app_assoc_reverse.
  hint_rewrite list_pointless_split, list_pointless_split'.

  induction a; intros; simpl; iauto;
  hint_rewrite IHa1, IHa2; iauto'.
Qed.
-
Which says that we know that we will need the associativity lemma and these list splitting lemmas somewhere. Then we proceed by induction, handle the base case, and then use the inductive hypotheses to handle the rest.
-
We can then go back to our main theorem, and proceed in a similar style. We prove by induction, relying on the eval_step lemma, and in various places needing to simplify (for the observant reader, iauto and iauto' only differ in that iauto' does a deeper proof search).
We now have a proof that the compiler that we wrote in Haskell is correct, insofar as it preserves the meaning expressed in the source-level eval' function to the meaning in the eval function in the target. This isn’t, of course, the only theorem you could prove! Another one that would be interesting would be that no compiled program ever got stuck (i.e., never produces a Left error).
Suppose that instead of taking a single Arith, we instead wanted our compiler to take an [Arith]. This would still work, and would result in the list of results stored on the stack (so probably you would want to change eval to print everything that was on the stack at the end, not just the top). If you wrote this compile:
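-
The code was elided here; based on the discussion below of the call compile [a1], it was presumably shaped like this sketch:

```haskell
-- Source and target types as sketched earlier.
data Arith   = Num Int | Plus Arith Arith | Minus Arith Arith | Times Arith Arith
data StackOp = SNum Int | SPlus | SMinus | STimes
  deriving (Show, Eq)

-- A sketch of the list-taking compile. Note the recursive calls
-- compile [a2] / compile [a1]: a1 and a2 are structurally smaller,
-- but wrapping them back into a list means the argument is not a
-- structural subterm, so Coq rejects this even though Haskell
-- accepts it (and it does, in fact, terminate).
compile :: [Arith] -> [StackOp]
compile []                   = []
compile (Num n : rest)       = SNum n : compile rest
compile (Plus  a1 a2 : rest) = compile [a2] ++ compile [a1] ++ [SPlus]  ++ compile rest
compile (Minus a1 a2 : rest) = compile [a2] ++ compile [a1] ++ [SMinus] ++ compile rest
compile (Times a1 a2 : rest) = compile [a2] ++ compile [a1] ++ [STimes] ++ compile rest
```
-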
You would get an error when you try to compile the output of hs-to-coq! Coq says that the compile function is not terminating!
-
This is a good introduction to a (major) difference between Haskell and Coq: in Haskell, any term can run forever. For a programming language, this is an inconvenience, as you can end up with code that runs forever when you didn’t intend it to, which can be difficult to debug (it’s also useful if you happen to be writing a server that is supposed to run forever!). For a language intended to be used to prove things, this feature would be a non-starter, as it would make the logic unsound. The issue is that in Coq (at a high level), a type is a theorem and the term that inhabits the type is a proof of that theorem. But in Haskell, you can write:
-
anything :: a
anything = anything
-
i.e., for any type, you can provide a term with that type — that is, the term that simply never returns. If that were possible in Coq, you could prove any theorem, and the entire logic would be useless (or unsound, which technically means you can prove logical falsehood, but since falsehood allows you to prove anything, it’s the same thing).
-
Returning to this (only slightly contrived) program, it isn’t actually that our program runs forever (and if you do want to prove things about programs that do, you’ll need to do much more work!), just that Coq can’t tell that it doesn’t. In general, it’s not possible to tell this for sufficiently powerful languages (this is what the Halting problem says for Turing machines, and thus holds for anything with similar expressivity). What Coq relies on is that some argument is inductively defined (which we have: both lists and Arith expressions) and that all recursive calls are to structurally smaller parts of the arguments. If that holds, we are guaranteed to terminate, as inductive types cannot be infinite (note: unlike Haskell, Coq is not lazy, which is another difference, but we’ll ignore that). If we look at our recursive call, we called compile with [a1]. While a1 is structurally smaller, we put that inside a list and used that instead, which thus violates what Coq was expecting.
-
There are various ways around this (like adding another argument whose purpose is to track termination, or adding more sophisticated measurements), but there is another option: adding a helper function compile' that does what our original compile did: compiles a single Arith. The intuition that leads to trying this is that in this new compile we are decreasing on both the length of the list and the structure of the Arith, but we are trying to do both at the same time. By separating things out, we can eliminate the issue:
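-
The fixed version was elided here; a sketch of that separation (again assuming the constructor names from earlier):

```haskell
-- Source and target types as sketched earlier.
data Arith   = Num Int | Plus Arith Arith | Minus Arith Arith | Times Arith Arith
data StackOp = SNum Int | SPlus | SMinus | STimes
  deriving (Show, Eq)

-- compile' compiles a single Arith, recursing only on structurally
-- smaller Ariths; compile walks the list, recursing only on the
-- structurally smaller tail. Each function is now obviously
-- structurally decreasing, so Coq's termination checker is happy.
compile' :: Arith -> [StackOp]
compile' (Num n)       = [SNum n]
compile' (Plus  a1 a2) = compile' a2 ++ compile' a1 ++ [SPlus]
compile' (Minus a1 a2) = compile' a2 ++ compile' a1 ++ [SMinus]
compile' (Times a1 a2) = compile' a2 ++ compile' a1 ++ [STimes]

compile :: [Arith] -> [StackOp]
compile []         = []
compile (a : rest) = compile' a ++ compile rest
```
-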
There are limitations to the approach outlined in this post. In particular, what hs-to-coq does is syntactically translate similar constructs from Haskell to Coq, but constructs that have similar syntax don’t necessarily have similar semantics. For example, data types in Haskell are lazy and thus infinite, whereas inductive types in Coq are definitely not infinite. This means that the proofs that you have made are about the version of the program as represented in Coq, not the original program. There are ways to make proofs about the precise semantics of a language (e.g., Arthur Charguéraud’s CFML), but on the other hand, program extraction (which is a core part of verified compilers like CompCert) has the same issue that the program being run has been converted via a similar process as hs-to-coq (from Coq to OCaml the distance is less than from Coq to Haskell, but in principle there are similar issues).
-
And yet, I think that hs-to-coq has a real practical use, in particular when you have an existing Haskell codebase that you want to verify. You likely will need to refactor it to have hs-to-coq work, but that refactoring can be done within Haskell, while the program continues to work (and your existing tests continue to pass, etc). Eventually, once you finish conversion, you may decide that it makes more sense to take the converted version as ground truth (thus, you run hs-to-coq and throw out the original, relying on extraction after that point for an executable), but being able to do this gradual migration (from full Haskell to essentially a Gallina-like dialect of Haskell) seems incredibly valuable.
Tue, 16 Jan 2018 00:00:00 UT
http://dbp.io/essays/2018-01-16-how-to-prove-a-compiler-correct.html
Daniel Patterson
-
(Cheap) home backups
http://dbp.io/essays/2018-01-01-home-backups.html
-
by Daniel Patterson on January 1, 2018
-
-
Backing things up is important. Some stuff, like code that lives in repositories, may naturally end up in many places, so it perhaps is less important to explicitly back up. Other files, like photos, or personal documents, generally don’t have a natural redundant home, so they need some backup story, and relying on various online services is risky (what if they go out of business, “pivot”, etc), potentially time-consuming to keep track of (services for photos may not allow videos, or at least not full resolution ones, etc), limited in various ways (max file sizes, storage allotments, etc), not to mention bringing up serious privacy concerns. Different people need different things, but what I need, and have built (hence this post describing the system), fulfills the following requirements:
-
-
(Home) scalable – i.e., any reasonable amount of data that I could generate personally I should be able to dump in one place, and be confident that it won’t go away. What makes up the bulk is photos, some music, and some audio and video files. For me, this is currently about 1TB (+-0.5TB).
-
Cheap. I’m willing to pay about $100-200/year total (including hardware).
-
Simple. There has to be one place where I can dump files, and it has to be simple enough to recover from complete failure of any given piece of hardware even if I haven’t touched it in a long time (because if it is working, I won’t have had to tweak it in months / years). Adding & organizing files should be doable without command-line familiarity, so it can serve my whole home.
-
Safe. Anything that’s not in my physical control should be encrypted.
-
Reasonably reliable. Redundancy across hardware, geographic locations, etc. This is obviously balanced with other concerns (in particular, 2 and 3)!
-
-
I’ve tried various solutions, but what I’ve ended up with seems to be working pretty well (most of it has been running for about a year; some parts are more recent, and a few have been running for much longer). It’s a combination of some cheap hardware, inexpensive cloud storage, and decent backup software.
-
Why not an off-the-shelf NAS?
-
In the past, I tried one (it was a Buffalo model). I wasn’t impressed by the software (which was hard to upgrade, install other stuff on, or maintain) or the power consumption (this was several years ago, but when idle the two-drive system used over 30 watts, the same power my similarly aged quad-core workstation uses when idle!). Also, a critical element of this system for me is that there is an off-site component, so getting that software onto it is extremely important, and I’d rather have a well-supported Linux computer to deal with than something esoteric. Obviously this depends on the particular NAS you get, but the system below is perfect for me. In particular, setting up and experimenting with the below was much cheaper than dropping hundreds more dollars on a new NAS that may not have worked any better than the old one, and once I had it working, there was certainly no point in going back!
-
Hardware
-
-
$70 - Raspberry Pi 3. This consumes very little power (a little over 1W without the disks, probably around 10W with them spinning, more like 3W when they are idling), takes up very little space, but seems plenty fast enough to act as a file server. That price includes a case, heat-sink, SD card, power adaptor, etc. If you have any of these things, you can probably get a cheaper kit (the single board itself is around $35). Note that you really want a heat-sink on the processor. I ran without it for a while (forgot to install it) and it would overheat and hard lock. It’s a tradeoff that they put a much faster processor in these than in prior generations – I think it’s worth it (it’s an amazingly capable computer for the size/price).
-
$75 - Three external USB SATA hard drive enclosures. You might be able to find these cheaper – the ones I got were metal, which seemed good in terms of heat dissipation, and have been running for a little over a year straight without a problem (note: this is actually one more than I’m using at any given time, to make it easier to rotate in new drives; BTRFS, which I’m using, allows you to just physically remove a drive and add a new one, but the preferred method is to have both attached, and issue a replace command. I’m not sure how much this matters, but for $25, I went with the extra enclosure).
-
$170 - Two 2TB WD Red SATA drives. These are actually recent upgrades – the server had been running on older 1TB Green drives (four and five years old respectively), but one of them started reporting failures (I would guess the older of the two, but I didn’t check) so I replaced both. The cheaper Blue drives probably would have been fine (the old Greens certainly lasted well enough, running nearly 24/7 for years), but the “intended to run 24/7” Red ones were only $20 more each so I thought I might as well spring for them.
-
-
Cloud
-
-
Backblaze B2. This seems to be the cheapest storage that scales down to storing nothing. At my usage (0.5-2TB) it costs about $3-10/month, which is a reasonable amount, and given that it is one of three copies (the other two being on the two hard drives attached to the Pi) I’m not worried about the lower durability compared to, for example, Amazon S3 (B2 quotes 8 9s of durability vs. S3’s 11 9s, but for that S3 charges 3-4x as much).
-
-
Software
-
-
The Raspberry Pi is running Raspbian (a Debian derivative for the Raspberry Pi). This seems to be the best-supported Linux distribution for it, and I’ve used Debian on servers & desktops for maybe 10 years now, so it’s a no-brainer. The external hard drives are in RAID1 with BTRFS. If I were doing it from scratch, I would look into ZFS, but I’ve been migrating this same data across different drives and home servers (on the same file system) since ZFS was essentially totally experimental on Linux, and on Linux, for RAID1, BTRFS seems totally stable (people do not say the same thing about RAID5/6).
-
The point is, you should use an advanced file system in RAID1 (with ZFS you could go higher, but I prefer the simplicity and power consumption of having just two drives, and can afford to pay for the wasted drive space) that can detect & correct errors, lets you swap in new drives and migrate out old ones, migrate to larger drives, etc. This is essentially the feature-set that both ZFS and BTRFS have; the former is considered to be more stable, and the latter has been in Linux for longer.
-
For backups, I’m using Duplicacy, which is annoyingly similarly named to a much older backup tool called Duplicity (there also seems to be another tool called Duplicati, which I haven’t tried. Couldn’t backup tools get more creative with names? How about calling a tool “albatross”?). It’s also annoyingly not free software, but for personal use, the command-line version (which is the only version that I would be using) is free-as-in-beer. I actually settled on this after trying and failing to use (actually open-source) competitors:
-
First, I tried the aforementioned Duplicity (using its friendly frontend duply). I actually was able to make some full backups (the full size of the archive was around 600GB), but then it started erroring out because it would run out of memory when trying to unpack the file lists. The backup format of Duplicity is not super efficient, but it is very simple (which was appealing – just tar files and various indexes with lists of files). Unfortunately, some operations need memory that seems to scale with the size of the currently backed-up archive, which is a non-starter for my little server with 1GB of RAM (and in general shouldn’t be acceptable for backup software, but…)
-
I next tried a newer option, restic. This has a more efficient backup format, but also had the same problem of running out of memory, though it wasn’t even able to make a backup (though that was probably a good thing, as I wasted less time!). They are aware of it (see, e.g., this issue), so maybe at some point it’ll be an option, but that issue is almost two years old so ho hum…
-
So finally I went with the bizarrely sort-of-but-not-really open-source option, Duplicacy. I found other people talking about running it on a Raspberry Pi, and it seemed like the primary place where memory consumption could become a problem was the number of threads used to upload, which thankfully is an argument. I settled on 16 and it seems to work fine (i.e., duplicacy backup -stats -threads 16) – the memory consumption seems to hover below 60%, which leaves a very healthy buffer for anything else that’s going on (or periodic little jumps), and regardless, more threads don’t seem to get it to work faster.
-
The documentation on how to use the command-line version is a little sparse (there is a GUI version that costs money), but once I figured out that to configure it to connect automatically to my B2 account I needed a file .duplicacy/preferences that looked like (see keys section; the rest will probably be written out for you if you run duplicacy first; alternatively, just put this file in place and everything will be set up):
Everything else was pretty much smooth sailing (though, as per usual, the initial backup is quite slow. The Raspberry Pi 3 processor is certainly much faster than previous Raspberry Pis, and fast enough for this purpose, but it definitely still has to work hard! And my residential cable upstream is not all that impressive. After a couple days though, the initial backup will complete!).
-
Periodic backups run with the same command, and intermediate ones can be pruned away as well (I use duplicacy prune -keep 30:180 -keep 7:30 -keep 1:1, run after my daily backup, to keep monthly backups beyond 6 months, weekly beyond 1 month, and daily below that. I have a cron job that runs the backup daily, so the last is not strictly necessary, but if I do manual backups it’ll clean them up over time. Since I pretty much never delete files that are put into this archive, pruning isn’t really about saving space, as barring some error on the server the latest backup should contain every file, but it is nice to have the list of snapshots be more manageable).
-
To restore from total loss of the Pi, you just need to put the config file above into .duplicacy/preferences relative to the current directory on any machine and you can run duplicacy restore. You can also grab individual files (which I tested on a different machine; I haven’t tested restoring a full backup) by creating the above mentioned file and then running duplicacy list -files -r N (where N is the snapshot you want to get the file from; run duplicacy list to find which one you want) and then to get a file duplicacy cat -r N path/to/file > where/to/put/it.
-
I’m still working out how to detect errors in the hard drives automatically. I can see them manually by running sudo btrfs device stats /mntpoint (which I do periodically). When this shows that a drive is failing (i.e., read/write errors), add a new drive to the spare enclosure, format it, and then run sudo btrfs replace start -f N /dev/sdX /mntpoint where N is the number of the device that is failing (when you run sudo btrfs fi show /mntpoint) and /dev/sdX is the new drive. To check for and correct errors in the file system (not the underlying drive), run sudo btrfs scrub start /mntpoint. This will run in the background; if you care you can check the status with sudo btrfs scrub status /mntpoint. Based on recommendations, I have the scrub process run monthly via a cron job.
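One way to automate that manual check is to flag nonzero counters in the stats output; here is a sketch of that idea (the mount point, and the notion of wiring it to mail, are my assumptions). btrfs device stats prints one counter per line, so any nonzero second column means trouble.

```shell
#!/bin/sh
# `btrfs device stats` prints lines like: [/dev/sda].write_io_errs   0
# Count how many counters are nonzero; anything > 0 deserves attention.
check_stats() {
    awk '$2 != 0 { bad++ } END { print bad + 0 }'
}

# Real use: sudo btrfs device stats /mntpoint | check_stats
# Demonstration with canned output (one failing counter):
printf '%s\n' \
  '[/dev/sda].write_io_errs   0' \
  '[/dev/sda].read_io_errs    3' \
  '[/dev/sdb].write_io_errs   0' | check_stats
```

A cron job could run this against the live stats and mail you whenever the count comes back nonzero.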
-
If you want to expand the capacity of the disks, replace the drives as if they failed (see previous bullet) and then run sudo btrfs fi resize N:max /mntpoint for each N (run sudo btrfs fi show to see what your dev ids are). When you replace them, they stay at the same capacity – this resize expands the filesystem to the full device. As I mentioned earlier, I did this to replace 1TB WD Green drives with 2TB WD Red drives (so I replaced one, then the next, then did the resize on both).
-
For tech people (i.e., those comfortable with scp), this setup is enough – just get files onto the server, into the right directory, and they’ll be backed up. For less tech-savvy people, you can install samba on the Raspberry Pi and then set up a share like the following (put this at the bottom of /etc/samba/smb.conf):
-
[sharename]
comment = Descriptive name
path = /mntpoint
browseable = yes
writeable = yes
read only = no
only guest = no
create mask = 0777
directory mask = 0777
guest ok = no
-
Then set the pi user’s password with sudo smbpasswd -a pi. Now restart the service with sudo /etc/init.d/samba restart, and then from a Mac (and probably Windows; I’m not sure how, as I don’t have any in my house) you can connect to the Pi with the “Connect to Server” interface, connect as the pi user with the password you set, and see the share. Note that to be able to make changes, /mntpoint (and what’s in it) needs to be writeable by the pi user. You can also use a different user, set up samba differently, etc.
-
-
Summary
-
The system described above runs 24/7 in my home. It cost $325 in hardware (if you skip the extra USB enclosure to start and use WD Blue drives rather than Red ones, you can cut $65 – i.e., $260 total), $1/month in electricity (I haven’t measured this carefully, but that’s what 10W costs where I live) and currently costs about $3/month in cloud storage, though that will go up over time, so to be more fair let’s say $5/month. Assuming no hardware replacements for three years (which is the warranty on the hard drives I have, so a decent estimate), the total cost over that time is $325 + $36 + $180 = $541, or around $180 per year, which is squarely in the range that I wanted.
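As a quick sanity check of that arithmetic, using the monthly figures above:

```shell
# Three-year total: hardware up front, plus monthly electricity and cloud storage.
hardware=325
electricity=$((1 * 36))   # $1/month over three years
cloud=$((5 * 36))         # generous $5/month over three years
total=$((hardware + electricity + cloud))
per_year=$((total / 3))
echo "$total $per_year"
```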
-
- Mon, 01 Jan 2018 00:00:00 UT
- http://dbp.io/essays/2018-01-01-home-backups.html
- Daniel Patterson
-
-
- Why test in Haskell?
- http://dbp.io/essays/2014-10-05-why-test-in-haskell.html
-
-
by Daniel Patterson on October 5, 2014
-
-
Every so often, the question comes up, should you test in Haskell, and if so, how should you do it?
-
Most people agree that you should test pure, especially complicated, algorithmic code. Quickcheck1 is a great way to do this, and most Haskellers have internalized this (Quickcheck was invented here, so it must provide value!). What’s less clear (or at least, more debated!) is whether you should be testing monadic code, glue code, and code that just isn’t all that complicated.
-
Quickcheck?
-
A lot of the Haskell I’m writing these days is with the web framework Snap, and web handlers often have the type Handler App App () - where Handler is a web monad (giving access to request information, and the ability to write response data), and App indicates access to application-specific state (like database connections, templates, etc).
-
So the inputs (i.e., how to run this action) include any HTTP request and any application state, and the only outputs are side effects (as all it returns is unit). Using Quickcheck here is… challenging. You could restrict the generated requests to have the right URL, and even have the right query parameters, but since the query parameters are just text, if they were supposed to be more structured (like an integer), the chance of actually generating text that is just a number is pretty low… And then if the number were supposed to be the id of an element in the database….
-
But assume that we restrict it so that it’s only generating ids for elements in the database, what are the properties we are asserting? Let’s say that the handler looked up the element, and rendered it on the page. So then we want to assert something about the content of the response (which is wrapped up in the Handler monad). But maybe it should also increment a view count in the database. And assuming that we wrote all these into properties, what are the elements in the database that it is choosing among? And in some senses we’ve now restricted too much, because we may want to see what the behavior is like for slightly invalid inputs. Say, integer id’s that don’t correspond to elements in the database. This is all certainly possible, and may be worth doing, but it seems pretty difficult. Which is totally different from the experience of testing nice pure functions!
-
Let’s try to tease out a little bit of why testing this kind of code with Quickcheck is hard. One problem is that the input space, as determined by the type, is massive. And for most of the possible inputs, the result should be some version of a no-op. Another problem is the dependence on state, as the possible inputs are contingent on external state, and the outputs are primarily changes to state, each of which, again, is a massive space.
-
But having massive input and output spaces is not necessarily a reason not to be using randomized testing. Indeed, this is exactly the kind of thing that fuzz-testing of web browsers, for example, has done with great effect2. The problem in this case is that the size of the input and output space is not at all in proportion to the complexity of the code. If we were writing an HTTP server, we may indeed want to be generating random requests, throwing them at the web server, and making sure it was generating well-formed responses (404s being perfectly fine).
-
Not that complicated…
-
But we’re just writing a little bit of glue code. Which isn’t that complicated. And can be tested manually pretty easily. And may change rapidly.
-
Which means spending a lot of time setting up property based tests (which in these sorts of cases are necessarily going to be quite a bit more complicated than quintessential Quickcheck examples like showing that reverse . reverse = id).
-
But you’re still writing code that has types that massively underspecify its behavior. Which should make you nervous, at least a little. Now granted, you should keep that underspecified code as thin as possible - validate the query parameters, the URL, etc, and then call a function with a type that much more clearly specifies what it is supposed to do. For example (this is coming from Snap code, with some details elided, but should be reasonably easy to understand):
-
f :: Handler App App ()
f = route [("/foo/:id", do i <- read <$> getParam "id"
                           res <- lookupAndRenderFoo (FooId i)
                           writeText res)]

lookupAndRenderFoo :: FooId -> Handler App App Text
lookupAndRenderFoo = undefined
-
And certainly, this is a good pattern to use. We went from a function that had as input space any HTTP request (and any application specific state), and as output any HTTP response (as well as any side effects in the Handler monad) and split it into two functions. One still has the same input and output as before, but is very short, and the other is a function with input the id of a specific element, and as output Text, but still can perform any side effects in and read any data from within the Handler monad.
-
Increasing complexity?
-
We could split that further, and write a function with type Foo -> Text, but we would start getting in our own way, as if we wanted to render with a template, the templates exist within the context of the Handler monad, so we would have to look up a template first, and we would have ended up creating many new functions, as well as a bit of extra complexity, all for the sake of splitting our code up into layers, where the last one is pure and easy to test (the rest still have all the same problems).
-
Depending on how complex that last layer is, this may totally be worth it. If your code is dealing with human lives or livelihoods, by all means, isolate that code into as small a portion as possible and test the hell out of it. But it makes coding harder, and makes you move slower. And if you want to change the logic, you may now have to change many different functions, instead of just one.
-
Which is where we come to the argument that testing slows things down, and that for rapidly changing code, it just doesn’t matter.
-
What about just not sampling?
-
But if we step back a bit, we realize that what Quickcheck is trying to do is to sample representatively (well, with a bias towards edge cases) over the type of the input. And it’s easy to see why that’s appealing, as it gives you reasonable confidence that any use of the function behaves as desired. But if we forget about that, as we already know that our types completely underspecify the behavior, we realize all that we really care about is that the code does what we think it should do on a few example cases. That’s what we were going to manually verify after writing the code anyway.
-
Which is easy to test. With Snap, I’d write some tests for the above snippet like3:
-
do f <- create ()
   let i = unFooId . fooId $ f
   get ("/foo/" ++ show i) >>= should200
   get ("/foo/" ++ show i) >>= shouldHaveText (fooDescription f)
   get ("/foo/" ++ show (i + 1)) >>= should404
-
And call it a day. This misses vast swaths of inputs, and asserts very little about the outputs, but it also tells you a huge amount more about the correctness of the code than the fact that it typechecked did. And as you iterate and refactor your application, you get the assurance that this handler:
-
-
still exists.
-
still looks up the element from the database.
-
still puts the description somewhere on the page.
-
doesn’t work for ids that don’t correspond to elements in the database.
-
-
Which seems like a lot of assurance for a very small amount of work. And if your application is fast moving, this benefits you even more, as the faster you move, the more likely you are to break things (at least, that’s always been my experience!). If you do decide to rewrite this handler, fixing these tests is going to take a tiny amount of time (probably less time than you spend manually confirming that the change worked).
-
Why this should be expected to work.
-
To take it a little further, and perhaps justify from a somewhat theoretical point of view why these sorts of tests are so valuable, consider all possible implementations of any function (or monadic action). The possible implementations with the given type are a subset of all the possible implementations, but still potentially a pretty large one (our example of a web handler certainly has this property).
-
This perspective gives us some intuition on why it is much easier to test simple, pure functions. There are only four possible implementations of a Bool -> Bool function (the identity, negation, and the two constant functions), so testing one of them, like not, via sampling seems pretty tractable. To go even further, we get into the territory of “Theorems for Free”4, where there is only one implementation of an (a,b) -> a function, so testing fst is pointless.
-
But returning to our case of massive spaces of well-typed implementations: A single test, like one of the above, corresponds to another subset of all the possible implementations. For example, the first test corresponds to the subset that return success when passed the given url via GET request. Since we’re in Haskell, we also get a guarantee that the set of implementations that fulfill the test is a (not necessarily strict) subset of the set of implementations that fulfill the type, as if this were not the case, our test case wouldn’t type check. The problem with the first test, of course, is that there are all sorts of bogus implementations that fulfill it. For example, the handler that always returns success would match that test.
-
But even still, it is a strict subset of the implementations that fulfill the type (for example, the handler that always returns 404 is not in this set), so we’re guaranteed to have improved the chance that our code is correct, even with such a weak test (granted, it actually may not be that weak of a test - in one project, I have a menu generated from a data structure in code, and a test that iterates through all elements of the menu, checking that hitting each url results in a 200. And this has caught many refactoring problems!).
-
Where we really start to benefit is as we add a few more tests. The second test shows that the handler must somehow get an element out of the database (provided our create () test function is creating relatively unique field names), which is another (strict) subset of the set of implementations that fulfill the type. And we now know that our implementation must be somewhere in the intersection of these two subsets.
-
It shouldn’t be hard to convince yourself that through the process of just writing a few (well chosen) tests you can vastly reduce the possibility of writing incorrect implementations. Which, when we are writing relatively straightforward code, will probably be good enough to ensure that the code is actually correct. And will continue to verify that as the code evolves. Pretty good for a couple lines of code.
-
-
-
-
For those who haven’t used Quickcheck, it allows you to specify properties that a function should satisfy, and possibly a way to generate random values of the input type (if your input is a standard type, it already knows how to do this), and it will generate some number of inputs and verify that the property holds for all of them.↩︎
This syntax is based on the hspec-snap package, which I chose because I’m familiar with it (and wrote it). The create line is from some not-yet-integrated-or-released, at least at time of publishing, work to add factory support to the library (sorry!). With that said, the advice should hold no matter what you’re doing.↩︎
-
-
-
- Sun, 05 Oct 2014 00:00:00 UT
- http://dbp.io/essays/2014-10-05-why-test-in-haskell.html
- Daniel Patterson
-
-
- A Hacker's Replacement for GMail
- http://dbp.io/essays/2013-06-29-hackers-replacement-for-gmail.html
-
-
by Daniel Patterson on June 29, 2013
-
-
Note: Since writing this I’ve replaced Exim with Postfix and Courier with Dovecot. This is outlined in the Addendum, but the main text is unchanged. Please read the whole guide before starting, as you can skip some of the steps and go straight to the final system.
-
Motivation
-
I reluctantly switched to GMail about six months ago, after using many so-called “replacements for GMail” (the last of which was Fastmail). All of them were missing one or more features that I require of email:
-
-
Access to the same email on multiple machines (but, these can all be machines I control).
-
Access to important email on my phone (Android). Sophisticated access not important - just a high-tech pager.
-
Ability to organize messages by threads.
-
Ability to categorize messages by tags (folders are not sufficient).
-
Good search functionality.
-
-
But, while GMail has all of these things, there were nagging reasons why I still wanted an alternative: handing an advertising company most of my personal and professional correspondence seems like a bad idea, having no (meaningful) way to either sign or encrypt email is unfortunate, and while it isn’t a true deal-breaker, having lightweight programmatic access to my email is a really nice thing (you can get a really rough approximation of this with the RSS feeds GMail provides). Furthermore, I’d be happy if I only get important email on my phone (i.e., I want a whitelist on the phone - unexpected email is not something that I need to respond to all the time, and this allows me to elevate the notification for these messages, as they truly are important).
-
Over the past several months, I gradually put together a mail system that provides all the required features, as well as the three bonuses (encryption, easy programmatic access, and phone whitelisting). I’m describing it as a “Hacker’s Replacement for GMail” as opposed to just a “Replacement for GMail” because it involves a good deal of familiarity with Unix (or at least, setting up and debugging the whole system did. Perhaps following along is easier). But the end result is powerful enough that, for me, it is worth it. I recently switched over to using it as my primary system, confirming that everything works as expected. I wanted to share the instructions in case they prove useful to someone else setting up a similar system.
-
This is somewhere between an outline and a HOWTO. I’ve organized it roughly in order of how I set things up, but some of the parts are more sketches than detailed instructions - supplement it with normal documentation. Most are based on notes from things as I did them, only a few parts were reconstructed. In general, I try to highlight the parts that were difficult / undocumented, and gloss over stuff that should be easy (and/or point to detailed docs). Without further ado:
-
Overall Design
-
-
Debian GNU/Linux as mail server operating system (both Linux and Mac as clients, though Windows should be doable)
Mail is received by the mail server and put in an Archive subdirectory, which is not configured for push in K9-Mail. The mail is processed and tagged by afew, and any messages with the tag “important” are moved into the Important subdirectory. This directory is set up for push in K9-Mail, so I get all important email right away. No further tagging can be done through the mobile device, but that wasn’t a requirement. Read/unread status is synced two-way to notmuch, which is important.
-
Step By Step Instructions
-
-
The first and most important part is having a server. I’ve been really happy with VPSes I have from Digital Ocean (warning: that’s a referral link. Here’s one without.) - they provide big-enough VPSes for email and a simple website for $5/month. There are also many other providers. The important thing is to get a server, if you don’t already have one.
-
The next thing you’ll need is a domain name. You can use a subdomain of one you already have, but the simplest thing is to just get a new one. This is $10-15/year. Once you have it, you want to set a few records (these are set in the “Zone File”, and should be easy to set up through the online control panel of whatever registrar you used):
-
-
A mydomain.com. IP.ADDR.OF.SERVER (mydomain.com. might be written @)
MX 10 mydomain.com.
-
This sets the domain to point to your server, and sets the mail record to point to that domain name. You will also need to set up a PTR record, or reverse DNS. If you got the server through Digital Ocean, you can set up the DNS records through them, and they allow you to set the PTR record for each server easily. Wherever you set it up, it should point at mydomain.com. (Note the trailing period. Otherwise it will resolve to mydomain.com.mydomain.com - not what you want!).
-
-
Now set up the mail server itself. I use Debian, but it shouldn’t be terribly different with other distributions (but you should follow their instructions, not the ones I link to here, because I’m sure there are specifics that depend on how Debian sets things up). Since Debian uses Exim4 by default, I used that, and set up Courier as an IMAP server. I followed these instructions: blog.edseek.com/~jasonb/articles/exim4_courier/ (sections 2, 3, and 4). The only important thing I had to change was to force the hostname, by finding the line in /etc/exim4/exim4.conf.template that looks like:
-
-
.ifdef MAIN_HARDCODE_PRIMARY_HOSTNAME
-
And adding above it, MAIN_HARDCODE_PRIMARY_HOSTNAME = mydomain.com (no trailing period). This is so that the header that the mail server displays matches the domain. If this isn’t the case, some mail servers won’t deliver messages. At this point, you can test the mail server by sending yourself emails, using the swaks tool, or running it through an online testing tool like MX Toolbox
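For example, a minimal swaks invocation for that test might be (addresses are placeholders):

```shell
# send yourself a test message through the new server
swaks --to you@mydomain.com --server mydomain.com
```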
-
The last important thing is to set up spam filtering. When using a big email provider that spends a lot of effort filtering spam (and has huge data sets to do it), it’s easy to forget how much spam is actually sent. But, fortunately open source software is also capable of eliminating it. To set Spamassassin up, I generally followed the documentation on the debian wiki. I changed the last part of the configuration so that instead of changing the subject for spam messages to have “***SPAM***”, it adds the following header:
-
add_header = X-Spam-Flag: YES
-
This is the header that the default spam filter from afew will look for and tag messages as spam with. Once messages are tagged as spam, they won’t show up in searches, won’t ever end up in your inbox, etc. On the other hand, they aren’t ever deleted, so if something does end up there, you can always find it (you just have to use notmuch search with the --exclude=false parameter).
-
That sets up basic Spamassassin, which works quite well. To make it work even better, we’ll install Pyzor, which is a service for collaborative spam filtering (sort of an open source system that gets you similar behavior to what GMail can do by having access to so many people’s email). It works by constructing a digest of the message and hashing it, and then sending that hash to a server to see if anyone has marked it as spam.
-
Install pyzor with aptitude install pyzor, then run pyzor discover (as root), and at least on my system, I needed to run chmod a+r /etc/mail/spamassassin/servers (as root) in order to have it work (the following test command would report permission denied on that file if I didn’t). Now restart spamassassin (/etc/init.d/spamassassin restart) and test that it’s working, by running:
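The test command itself is missing from the text here; judging from the note that follows, it was presumably the standard SpamAssassin debug check, something along these lines:

```shell
# watch for pyzor lines in spamassassin's debug output
echo "test" | spamassassin -D pyzor 2>&1 | less
```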
According to the documentation, this is expected, because “test” is not a valid message.
-
-
Now we want to set up our delivery. Create a .forward file in the home directory of the account on the server that is going to receive mail. It should contain
-
-
# Exim filter
-
save Maildir/.Archive/
-
What this does is put all mail that is received into the Archive subdirectory (the dots are a convention of the version of the Maildir format that Courier-IMAP uses).
-
-
Next, we want to set up notmuch. You can install it and the python bindings (needed by afew) with:
-
-
aptitude install notmuch python-notmuch
-
-
Run notmuch setup and put in your name, email, and make sure that the directory to your email archive is “/home/YOURUSER/Maildir”. Run notmuch new to have it create the directories and, if you tested the mail server by sending yourself messages, import those initial messages.
-
Install afew from github.com/teythoon/afew. You can start with the default configuration, and then add filters that will add the tag ‘important’, as well as any other automatic tagging you want to have. I commented out the ClassifyingFilter because it wasn’t working - and I wasn’t convinced I wanted it, so didn’t bother to figure out how to get it to work.
-
-
Some simple filters look like:
-
[Filter.0]
message = messages from someone
query = from:someone.important@email.com
tags = +important
[Filter.1]
message = messages I don't care about
query = subject:Deal
tags = -unread +deals
-
For the [MailMover] section, you want the configuration to look like:
-
[MailMover]
folders = Archive Important
max_age = 15

# rules
Archive = 'tag:important AND NOT tag:spam':.Important
Important = 'NOT tag:important':.Archive 'tag:spam':.Archive
-
This says to take anything in Archive with the important tag and put it in Important (but never spam). Note that the folders we are moving to are prefixed with a dot, but the names of the folders aren’t. Now we need to set everything up to run automatically.
-
-
We are going to use inotify, and specifically the tool incron, to watch for changes in our .Archive inbox and add files to the database, tag them, and move those that should be moved to .Important. On Debian, you can obtain incron with:
-
-
aptitude install incron
-
Now edit your incrontab (similar to crontab) with incrontab -e and put an entry like:
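The entry itself appears to have been lost here; based on the description that follows (watch the .Archive inbox for IN_MOVED_TO with IN_NO_LOOP, and call the script), it presumably looked something like:

```shell
/home/YOURUSER/Maildir/.Archive/new IN_MOVED_TO,IN_NO_LOOP /usr/local/bin/my-notmuch-new.sh
```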
This says that we want to watch for IN_MOVED_TO events, and that we don’t want to listen while the script is running (if something goes wrong with your importing script, you could cause an infinite spawning of processes, which will take down the server). If a message is delivered while the script is running, it might not get picked up until the next run, but for me that was fine (you may want to eliminate the IN_NO_LOOP option and see if it actually causes loops. In previous configurations, I crashed my server twice through process-spawning loops, and didn’t want to do it again while debugging). When IN_MOVED_TO occurs, we call a script we’ve written. You can obviously put this anywhere, just make it executable:
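The script itself is also missing from the text; a minimal reconstruction consistent with the description below (import, tag, and move mail, discarding all output) might look like this – note that afew’s flag names have varied between versions:

```shell
#!/bin/sh
# /usr/local/bin/my-notmuch-new.sh (reconstruction)
# Import new mail, tag it, and move tagged messages between folders.
# All output is discarded so cron never generates email.
notmuch new >/dev/null 2>&1
afew --tag --new >/dev/null 2>&1
afew --move-mails >/dev/null 2>&1
```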
It is intentionally being very quiet because output from cron jobs will trigger emails… and thus if there were a mistake, we could be in infinite loop land again. This means you should make sure the commands are working (ie, there aren’t mistakes in your config files), because you won’t see any debug output from them when they are run through this script.
-
-
Now let’s set up the mobile client. I’m not sure of a good way to do this on iOS (aside from just manually checking the Important folder), but perhaps a motivated person could figure it out. Since I have an Android phone, it wasn’t an issue. On Android, install K9-Mail, and set up your account with the incoming / outgoing mail server to be just ‘mydomain.com’. Click on the account, and it will show just Inbox (not helpful). Hit the menu button, then click folders, and check “display all folders”. Now hit the menu again and click folders and hit “refresh folders”.
-
-
Provided at least one message has been put into Important and Archive, those should both show up now. Open the folder ‘Important’ and use the settings to enable push for it. Also add it to the Unified Inbox. Similarly, disable push on the Inbox (this latter doesn’t really matter, because we never deliver messages to the inbox). If you have trouble finding these settings (which I did for a while), note that the settings that are available are contingent upon the screen you are on. The folders settings only exist when you are looking at the list of folders (not the unified inbox / list of accounts, and not the contents of a folder).
-
-
Finally, the desktop client. I’m using the emacs client, because I spend most of my time inside emacs, but there are several other clients - one for vim, one called ‘bower’ that is curses based (that I’ve used before, but is less featureful than the emacs one), and a few others. alot, a python client, won’t work, because it assumes that the notmuch database is local (which is a really stupid assumption). The rest just assume that notmuch is in the path. This means that you can follow the instructions here: notmuchmail.org/remoteusage to have the desktop use the mail database on the server. To test, run notmuch count on your local machine, and it should return the same thing (the total number of messages) as it does on the mail server.
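The remote-usage setup from that link boils down to a small wrapper script named notmuch on your local machine, earlier in your PATH than any local install, that forwards every invocation over ssh – roughly (user and host are placeholders):

```shell
#!/bin/sh
# run every notmuch command on the mail server instead of locally
ssh -q user@mydomain.com notmuch "$@"
```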
-
-
Once this is working, install notmuch locally, so that you get the emacs bindings (or, just download the source and put the contents of the emacs folder somewhere and include it in your .emacs). You should now be able to run M-x notmuch in emacs and get to your inbox. Setting up mail sending is a little trickier - most of the documentation I found didn’t work!
-
The first thing to do, in case your ISP is like mine and blocks port 25, is to change the default listening port for the server. Open up /etc/default/exim4 and set SMTPLISTENEROPTIONS equal to -oX 25:587 -oP /var/run/exim4/exim.pid. This will have it listen on both 25 and 587.
-
Next, set up emacs to use your mail server to send mail, and to load notmuch. This incantation in your .emacs should do the trick:
-
;; If you opted to just stick the elisp files somewhere, add that path here:
;; (add-to-list 'load-path "~/path/folder/with/emacs-notmuch")
(require 'notmuch)
(setq smtpmail-starttls-credentials '(("mydomain.com" 587 nil nil))
      smtpmail-auth-credentials (expand-file-name "~/.authinfo")
      smtpmail-default-smtp-server "mydomain.com"
      smtpmail-smtp-server "mydomain.com"
      smtpmail-smtp-service 587)
(require 'smtpmail)
(setq message-send-mail-function 'smtpmail-send-it)
(require 'starttls)
-
Now eval your .emacs (or restart emacs), and you are almost ready to send mail.
-
You just need to put a line like this into ~/.authinfo:
-
machine mydomain.com login MYUSERNAME password MYPASSWORD port 587
-
With appropriate permissions (chmod 600 ~/.authinfo).
-
Now you can test this by typing C-x m or M-x notmuch and then from there, hit the ‘m’ key - both of these open the composition window. Type a message and who it is to, and then type C-c C-c to send it. It should take a second and then say it was sent at the bottom of the window.
-
This should work as-is on Linux. Another machine I sometimes use is a mac, and things are a little more complicated. The main problem is that to send mail, we need starttls. You can install gnutls through Homebrew, Fink, or Macports, but the next problem is that if you are using Emacs installed from emacsformacosx.com (and thus it is a graphical application), it is not started from a shell, which means it doesn’t have the same path, and thus doesn’t know how to find gnutls. To fix this problem (which is more general), you can install a tiny Emacs package called exec-path-from-shell (this requires Emacs 24, which you should use - then M-x package-install) that interrogates a shell about what the path should be. Then, we just have to tell it to use gnutls and all should work. We can do this all in a platform specific way (so it won’t run on other platforms):
Address lookup. It’s really nice to have an address book based on messages in your mailbox. An easy way to do this is to install addrlookup: get the source from http://github.com/spaetz/vala-notmuch/raw/static-sources/src/addrlookup.c, build with
-
-
cc -o addrlookup addrlookup.c `pkg-config --cflags --libs gobject-2.0` -lnotmuch
-
and move the resulting binary into your path (all of this on your server), and then create a similar wrapper as for notmuch:
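That wrapper would be the same shape as the notmuch one (user and host are placeholders):

```shell
#!/bin/sh
# run addrlookup on the server, where the notmuch database lives
ssh -q user@mydomain.com addrlookup "$@"
```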
Now if you hit “TAB” after you start typing in an address, it will prompt you with completions (use up/down arrow to move between, hit enter to select).
-
Conclusion
-
Congratulations! You now have a mail system that is more powerful than GMail and completely controlled by you. And there is a lot more you can do. For example, to enable encryption (to start, just signing emails), install gnupg, create a key and associate it with your email address, and add the following line to your .emacs and all messages will be signed by default (it adds a line in the message that when you send it causes emacs to sign the email. Note that this line must be the first line, so add your message below it):
An unfortunate current limitation is that signature keys are checked by the notmuch command line, so you need to install public keys on the server. This would be fine, except that the Emacs client installs them locally when you click on an unknown key (hit $ when viewing a message to see the signatures). So, at least for now, you have to manually add keys to the server with gpg --recv-key KEYID before they will show up as verified on the client (signing/encrypting still works, because that is done locally). Hopefully this will be fixed soon.
-
Added July 9th, 2013:
-
Addendum
-
Among the large amount of feedback I received on this post, many people recommended using Postfix and Dovecot over Exim and Courier: Postfix because of security (Exim has a less than stellar history), and Dovecot because it is simpler and faster than Courier (and, more importantly, frequently combined with Postfix). Security is really important to me (I want this system to be easy to maintain), so I decided to switch. Since I'm not doing anything particularly complicated with the mail server / IMAP, the conversion was relatively straightforward. For people reading this, I'd suggest just doing this from the start (substituting it for the parts setting up Exim / Courier), but if you've already followed the instructions (as I had), here is what you should do to change over. Note that I got much of this information from guides at syslog.tv, modified as needed.
-
-
Install postfix and dovecot with (accept the replacement policy):
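The command itself is elided above; on Debian it would be something along these lines (the package names are the standard Debian ones, but check your release):

```shell
# Installing postfix conflicts with exim; aptitude will offer to
# remove it - that is the "replacement policy" to accept.
sudo aptitude install postfix dovecot-imapd
```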
:0 c
.Archive/

:0
| /usr/local/bin/my-notmuch-new.sh

This says to copy the message to the archive and then run my-notmuch-new.sh (the shell script that used to be called by incron). Technically it pipes the message to the script, but the script ignores standard input, so it is equivalent to just calling the script. Now fix the permissions:
-
chmod 600 .procmailrc
-
Remove incron, which we aren’t using anymore.
-
sudo aptitude remove incron
-
-
Fix up spamassassin.
-
-
Get the top of /etc/spamassassin/local.cf to look like:
-
rewrite_header Subject
# just add good headers
add_header spam Flag _YESNOCAPS_
add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_
-
This adds the proper headers so that afew recognizes spam and tags it accordingly. And that should be it!
-
-
I’m not sure of a way to tell K9Mail that the certificate on the IMAP server has changed, so I just deleted the account and recreated it.
-
-
Note: if you find any mistakes in this, or parts that needed additional steps, let me know and I’ll correct/add to this.
Sat, 29 Jun 2013 - http://dbp.io/essays/2013-06-29-hackers-replacement-for-gmail.html - Daniel Patterson


A Literate Ur/Web Adventure
http://dbp.io/essays/2013-05-21-literate-urweb-adventure.html
by Daniel Patterson on May 21, 2013
-
-
Ur/Web is a language / framework for web programming that both makes it really hard to write code with bugs / vulnerabilities and makes it really easy to write reactive, client-side code, all from a single, simple codebase. But it is built on some pretty deep type theory, and while it is an incredibly practical research project, some research-project corners still show - like error messages that scroll pages off the screen. I've experimented with it before, and have written a small application that is beyond a demo, but still small enough to be digestible.
-
For completeness and clarity, I present it here in complete literate style - all the files, interspersed with comments, are presented. They are split into sections by file, with the file names as headings. All the text between one file name and the next that is not actual code is within comments (that is what the #, (* and *) are for), so you can copy the whole thing into the files and build the project. All the files should go into a single directory. It builds with the current version of Ur/Web. You can try out the application as it currently exists (which may have changed since this was written) at lab.dbpmail.net/dn. The full source, with history, is available at github.com/dbp/dnplayer.
-
The application is a video player for the daily news program Democracy Now!. The main point of it is to remember where in the show you are, so you can stop and resume it, across devices. It should work in desktop and mobile browsers - I have targeted Chrome on Android, Chrome on computers, and Safari on iPhones/iPads. The main reason for not supporting Firefox is that it does not support the (proprietary) video/audio codecs that are the only formats Democracy Now! provides.
-
dn.urp
-
# .urp files are project files, which describe various meta-data about
# Ur/Web applications. They declare libraries (like random, which we'll
# see later), information about the database (both what it is named and
# where to generate the sql for the tables that the application is using).
# They separate meta-data declarations from the modules in the project by
# a single blank line, which is why we have comments on all blank lines
# prior to the end.
library random
database dbname=dn
sql dn.sql
#
# They also allow you to rewrite urls. By default, urls are generated
# consistently as Module/function_name, which means that the main
# function inside Dn, our main module, is our root url. We can rewrite
# one url to another, but if we leave off the second, that rewrites to
# root. We can also strip prefixes from urls with a rewrite with a *.
#
rewrite url Dn/main
rewrite url Dn/*
#
# safeGet allows us to declare that a function is safe to generate urls
# to, ie that it won't cause side effects. Along the same safety lines,
# we declare the external urls that we will generate and scripts we will
# include - making injecting resources hosted elsewhere hard (as Ur/Web
# won't allow you to create urls to anything not declared here).
#
safeGet player
allow url http://dncdn.dvlabs.com/ipod/*
allow url http://traffic.libsyn.com/democracynow/*
allow url http://dbpmail.net/css/default.css
allow url http://dbpmail.net
allow url http://hub.darcs.net/dbp/dnplayer
allow url http://democracynow.org
allow url http://lab.dbpmail.net/dn/main.css
script http://lab.dbpmail.net/static/jquery-1.9.1.min.js
# One odd thing - Ur/Web doesn't have a static file server of its own, so
# you need to host any FFI javascript elsewhere. Here's where the javascript
# for this application, presented later, is hosted. For trying it out,
# leaving this the same is fine, though if you want to change the javascript,
# or not depend on my copy being up, you should change this and the reference
# in the application.
script http://lab.dbpmail.net/dn/dn.js
#
# Next, we declare that we have foreign functions in a module called dnjs.
# This refers to a header file (.urs), and we furthermore declare what
# functions within it we are using. We declare them as effectful so that they
# aren't called multiple times (like Haskell, Ur/Web is purely functional, so
# normal, non-effectful functions are not guaranteed to be called exactly
# once - they could be optimized away if the compiler did not see you use the
# result of the function, and could be inlined (and thus duplicated) if it
# would be more efficient).
#
ffi dnjs
jsFunc Dnjs.init=init
effectful Dnjs.init
jsFunc Dnjs.set_offset=set_offset
effectful Dnjs.set_offset

# The last thing we declare is the modules in our project. $/ is a prefix
# that means to look in the standard library, as we are using the option type
# (Some/None in OCaml/ML, Just/Nothing in Haskell, and very roughly a safe
# null in other languages). sourceL is a helper for reactive programming (to
# be discussed later). And finally, our main module, which should be last.
#
$/option
sourceL
dn
-
dn.urs
-
(*

.urs files are header files (signature files), which declare all the public functions in a module (in this case, the Dn module). We only export our main function here, but all functions that have urls generated within the application are also implicitly exported.

The type of main, unit -> transaction page, means that it takes no input (unit is a value-less value, a placeholder for argumentless functions), and it produces a page (a collection of xml), within a transaction. transaction, like Haskell's IO monad, is the way that Ur/Web handles IO safely. If you aren't familiar with IO in Haskell, you should go learn about that and then come back.

*)
val main : unit -> transaction page
-
random.urp
-
# Random is a simple wrapper around librandom to provide us with random
# strings, which we use for tokens. We included it above with the line
# `library random`. Libraries are declared with separate package files,
# and here we link against librandom.a, include the random header, and
# declare that we are using functions declared in random.urs (that is the
# ffi line). We also declare that all three functions are effectful,
# because they have side effects.
#
# NOTE: It has been pointed out that instead of doing this, we could either:
# A. use Ur/Web's builtin `rand` function, and construct the strings
#    without using the FFI, or even easier:
# B. just use the integers that `rand` generates as tokens.
#
# I didn't realize that `rand` existed when I wrote this, but I'm leaving
# it in because it is a (concise) introduction to the FFI, which, given
# the relatively small body of Ur/Web libraries, is probably something
# you'll end up using if you build any large applications.
effectful Random.init
effectful Random.str
effectful Random.lower_str
ffi random
include random.h
link librandom.a
-
random.urs
-
(*

Like with main, we see that the signatures of these functions are transaction unit and int -> transaction string, which means the former takes no arguments, and the latter two take integers (lengths) and produce strings, within transactions. They are within transaction because they have side effects (ie, if you run them twice, you will likely not get the same result), and thus we want the compiler to treat them with care (as described earlier). init seeds the random number generator, so it should be called before the other two.

*)
val init : transaction unit
val str : int -> transaction string
val lower_str : int -> transaction string
-
random.h
-
/*

Here we have the header file for the C library, which declares the same signatures as above, but using the structs that Ur/Web uses and the naming convention it expects (uw_Module_name). And finally, the C code to generate random strings.

*/
#include "random.h"
#include <stdlib.h>
#include <time.h>
#include "urweb.h"

/* Note: This is not cryptographically secure (bad PRNG) - do not
   use in places where knowledge of the strings is a security issue.
*/

uw_Basis_unit uw_Random_init(uw_context ctx) {
  srand((unsigned int)time(0));
  return uw_unit_v;
}

uw_Basis_string uw_Random_str(uw_context ctx, uw_Basis_int len) {
  uw_Basis_string s;
  int i;

  s = uw_malloc(ctx, len + 1);

  for (i = 0; i < len; i++) {
    s[i] = rand() % 94 + 33; /* ASCII characters 33 to 126 */
  }
  s[i] = 0;

  return s;
}

uw_Basis_string uw_Random_lower_str(uw_context ctx, uw_Basis_int len) {
  uw_Basis_string s;
  int i;

  s = uw_malloc(ctx, len + 1);

  for (i = 0; i < len; i++) {
    s[i] = rand() % 26 + 97; /* ASCII lowercase letters */
  }
  s[i] = 0;

  return s;
}
-
dn.ur
-
(*

We'll now jump into the main web application, having seen a little bit about how the various files are combined. The first thing we have is the data that we will be using - one database table, for our users, and one cookie. The table is declared with Ur/Web's record syntax, where Token, Date, and Offset are the names of fields, and string, string, and float are their types.

All tables that are going to be used have to be declared, and Ur/Web will generate SQL to create them. This is, in my opinion, one weakness, as it means that Ur/Web doesn't play well with others (it needs the tables to be named uw_Module_name), and, even worse, if you rename modules, or refactor where the tables are stored, the names of the tables need to change - if you are just creating a toy, you can wipe out the database and re-initialize it, but obviously this isn't an option for something that matters, and you just have to manually migrate the tables, based on the newly generated database schemas. Luckily the tables / columns are predictably named, but it still isn't great.

*)
(* Note: Date is the date string used in the urls, as the most
   convenient serialization; Offset is seconds into the show *)
table u : {Token : string, Date : string, Offset : float} PRIMARY KEY Token
cookie c : string
(*

Ur/Web provides a mechanism, called tasks, to run certain code at times other than requests. There are a couple of categories, the simplest being an initialization task, which runs once when the application starts up. We use this to initialize our random library.

*)
task initialize = fn () => Random.init
(*

Part of being a research project is that the standard libraries are pretty minimal, and one thing that is absent is date handling. You can format dates, add and subtract, and that's about it. Since a bit of this application has to do with tracking which show is the current one, and whether you've already started watching it, I wrote a few functions to answer the couple of date / time questions I needed. These are all pure functions, and all the types are inferred.

*)
val date_format = "%Y-%m%d"

fun before_nine t =
    case read (timef "%H" t) of
        None => error <xml>Could not read Hour</xml>
      | Some h => h < 9

fun recent_show t =
    let val seconds_day = 24*60*60
        val nt = (if before_nine t then (addSeconds t (-seconds_day)) else t)
        val wd = timef "%u" nt in
        case wd of
            "6" => addSeconds nt (-seconds_day)
          | "7" => addSeconds nt (-(2*seconds_day))
          | _ => nt
    end
(*

The server this application is hosted on is in a different timezone than the one the show is broadcast in (EST), so we adjust the current time so that we can tell whether it is late enough in the day to get the current day's broadcast. Depending on what timezone your server is in, this may need to be changed.

*)
fun est_now () =
    n <- now;
    return (addSeconds n (-(4*60*60)))

(*

We track users by tokens - short random strings generated with our random library. The mechanism for syncing devices is to visit the url (with the token) on every device, so the tokens will need to be typed in. For that reason, I didn't want to make the tokens very long, which means that collisions are a real possibility. To deal with this, I set the length to 6 characters plus the log_26 of the number of tokens (since tokens are encoded with lowercase letters, n tokens can be encoded in log_26 n characters, so we use this as a baseline and add several more so that the collision probability stays low).

Here we see how SQL queries work. You can embed SQL (a subset of SQL, defined in the manual), which is translated into a query datatype, and there are many functions in the standard library to run those queries. We see two here: oneRowE1, which expects to get back exactly one row and extracts a single value from it (the E means it computes a single output expression). Note that it will error if there is no result, but since we are selecting a count, this is fine. hasRows is even simpler; it runs the query and returns true iff there are rows.

Also note that we refer to the table by the name declared above, and we refer to columns as record members of the table. To embed regular Ur/Web values within SQL queries, we use {[value]}. These queries will not type check if you try to select columns that don't exist, and escaping etc is of course handled.

*)
(* linking to cmath would be better, but since I only
   need an approximation, this is fine *)
fun log26_approx n c : int =
    if c < 26 then n else
    log26_approx (n+1) (c / 26)


(* Handlers for creating and persisting tokens *)
fun new_token () : transaction string =
    count <- oneRowE1 (SELECT COUNT( * ) FROM u);
    token <- Random.lower_str (6 + (log26_approx 0 count));
    used <- hasRows (SELECT * FROM u WHERE u.Token = {[token]});
    if used then new_token () else return token

(*

We write small functions to set and clear the token. We do this so that after a user has visited their unique player url at least once on each device, they will only have to remember the application url, not their unique url. now is a value of type transaction time, which gives the current time, and setCookie / clearCookie should be self-explanatory.

*)
fun set_token token =
    t <- now;
    setCookie c {Value = token,
                 Expires = Some (addSeconds t (365*24*60*60)),
                 Secure = False}

fun clear_token () =
    clearCookie c
-
(*

The next thing is a bunch of html fragments. Ur/Web doesn't have a "templating" system, but it is perfectly possible to create one by defining functions that take the values to insert. I've opted for a simpler option and just defined common pieces. HTML is written in normal XML format, within <xml> tags, and like the SQL fragments, these are typechecked - attributes that shouldn't exist, tags nested where they don't belong, or unclosed tags all cause the code not to compile.

There are a couple of rough edges - some tags are not defined (though you can define new ones in FFI modules), and some attributes can't be used because they are keywords (hence typ instead of type) - but overall it is a neat system, and works very well.

*)
fun heading () =
    <xml>
      <meta name="viewport" content="width=device-width"/>
      <link rel="stylesheet" typ="text/css" href="http://dbpmail.net/css/default.css"/>
      <link rel="stylesheet" typ="text/css" href="http://lab.dbpmail.net/dn/main.css"/>
    </xml>

fun about () =
    <xml>
      <p>
        This is a player for the news program
        <a href="http://democracynow.org">Democracy Now!</a>
        that remembers how much you have watched.
      </p>
    </xml>

fun footer () =
    <xml>
      <p>Created by <a href="http://dbpmail.net">Daniel Patterson</a>.
      <br/>
      View the <a href="http://hub.darcs.net/dbp/dnplayer">Source</a>.</p>
    </xml>
-
(*

We now get to the web handlers. These are all url/form entry points, and do the bulk of the work. The first one, main, which we rewrote in dn.urp to be the root handler, is mostly HTML - the only catch being that if you have a cookie set, we just redirect you to the player.

getCookie returns an option CookieType, where CookieType is the type of the cookie (in our case, a string). redirect takes a url, and urls can be created from handlers (ie, values of type transaction page) with the url function. So we apply player, a handler we'll define later, to the token value (a token is the parameter that player expects), and grab a url for that.

One catch is that Ur/Web doesn't know that player isn't going to cause side effects, which would otherwise mean that it shouldn't have a url created for it (side-effecting things should only be POSTed to) - this is why we had to declare player as safeGet in dn.urp.

We also see a form that submits to create_player, another handler we will define. One thing to note is that create_player is a unit -> transaction page function - and the action for the submit is just create_player, not create_player () - the act of submitting passes that parameter.

*)
fun main () =
    mc <- getCookie c;
    case mc of
        Some cv => redirect (url (player cv))
      | None =>
        return <xml>
          <head>
            {heading ()}
          </head>
          <body>
            <h2><a href="http://democracynow.org">Democracy Now!</a> Player</h2>
            {about ()}
            <p>
              You can listen to headlines on your way to work on your phone,
              pick up the first segment during lunch on your computer at work, and
              finish the show in the evening, without worrying what device you are
              on or whether you have time to watch the whole thing.
            </p>
            <h3>How it works</h3>
            <ol>
              <li>
                <form>
                  To start, if you've not created a player on any device:
                  <submit action={create_player} value="Create Player"/>
                </form>
              </li>
              <li>Otherwise, visit the url for the player you created (it should look
                something like <code>http://.../player/hcegaoe</code>) on this device
                to synchronize your devices. You only need to do this once per device; after that,
                just visit the home page and we'll load your player.
              </li>
            </ol>

            <h3>Compatibility</h3>
            <p>This currently works with Chrome (on computers and Android) and iPhones/iPads.</p>
            {footer ()}
          </body>
        </xml>
-
(*

create_player is pretty straightforward, but it shows a different part of Ur/Web's SQL support: dml supports INSERT, UPDATE, and DELETE in the normal way, with the same embedding as SQL queries ({[value]} puts a normal Ur/Web value into SQL). We create a token, create a "user" - setting that they are on the current day's show and at the beginning of it (offset 0.0) - store the token, and then redirect to the player.

The next two functions encompass most of the player, which is the core of the application. The way it is structured is a little odd, but with justification: Chrome on Android caches extremely aggressively and doesn't seem to pay attention to headers that say not to, which means that if you visited the application and then open up Chrome again a few days later, it will seem like it is loading the page, but it is loading the cached HTML, not getting it from the server. This is really bad for us, because it means it will have an old offset (in case you watched some of the show from another device) and, worse, on subsequent days it will try to play the wrong day's show! You can manually reload the page, but that is silly, so what we do instead is initially load a blank page and then immediately make a remote call to actually load the content. What is cached, then, is just a little HTML and some javascript that loads the page for real.

We do all of this in functional reactive style: we declare a source, which is a place where values will be put and which causes parts of the page (that are signaled) to update. Then we set an onload handler for the body, which first makes an rpc call to a server-side function (just another function, like all of these handlers), and then sets the source we defined to the result of rendering the player. render is a client-side function that just creates the appropriate forms / html.

Finally, we call a client-side function init, which does some setup and then calls through the javascript ffi to the ffi init function, which handles the HTML5 audio/video APIs (which Ur/Web doesn't support, and which are very browser specific anyway).

One incredibly special thing going on is the SourceL.set os that is passed to javascript. If you remember from our .urp file, we imported sourceL. It is a special reactive construct that allows you to set up handlers that cause side effects (are transactions) when the value inside the SourceL changes. What happens is that we create one of these on the server, in player_remote, and send it back to the client. The client then curries the set function with that source, producing a single-argument function that just takes the value to be updated. We hand this function to javascript, so that in our FFI code we can simply set values into it, and it can reactively cause stuff to happen in our server-side code.

The reactive component on the page is the <dyn> tag, a special construct that allows side-effect-free operations on sources. signal s grabs the current value from the source s; in this case we just return it, but we could transform it in various ways. The result of the block is the value of the <dyn> tag. In this case, we have just made a place where we can stick HTML by calling set s some_html.

The remote component is where most of the logic of the player resides. By now, you should be able to read most of what's going on. Some points to highlight: this is where we create the SourceL that we pass back, setting its initial value to offset. Also, fresh is a way of generating identifiers to use within html - our render function will use this identifier for the player, which is necessary for the javascript FFI to know where it is. Finally, bless is a function that turns strings into urls by checking them against the policy outlined in the application's .urp file.

*)
and player_remote token =
    n <- est_now ();
    op <- oneOrNoRows1 (SELECT * FROM u WHERE (u.Token = {[token]}));
    case op of
        None =>
        clear_token ();
        redirect (url (main ()))
      | Some pi =>
        set_token token;
        let val show = recent_show n
            val fmtted_date = (timef date_format show) in
            (if fmtted_date <> pi.Date then
                 (* Need to switch to new day *)
                 dml (UPDATE u SET Date = {[fmtted_date]}, Offset = 0.0 WHERE Token = {[token]})
             else
                 return ());
            let val offset = (if fmtted_date = pi.Date then pi.Offset else 0.0)
                val video_url = bless (strcat "http://dncdn.dvlabs.com/ipod/dn"
                                              (strcat fmtted_date ".mp4"))
                val audio_url = bless (strcat "http://traffic.libsyn.com/democracynow/dn"
                                              (strcat fmtted_date "-1.mp3")) in
                os <- SourceL.create offset;
                player_id <- fresh;

                return {Player = player_id, Show = show, Offset = offset,
                        Source = os, Video = video_url, Audio = audio_url}
            end
        end
-
-
(*

The next three functions are simple - the first just renders the actual player; note that we use the player_id we generated in player_remote. Then we provide a way to forget the player (if you want to unlink two devices, forget the player on one and create a new one). And, due to some imperfections in how we keep the time in sync (mostly down to weirdness in different browsers' implementations of the HTML5 video/audio APIs), seeking backwards or starting the show over requires telling the server explicitly, so we provide a handler for that.

*)
and render token player_id date =
    <xml><h2>
      <a href="http://democracynow.org">Democracy Now!</a> Player</h2>
      {about ()}
      <h3>{[timef "%A, %B %e, %Y" date]}</h3>
      <div id={player_id}></div>
      <br/><br/><br/>
      <form>
        <submit action={start_over token} value="Start Show Over"/>
      </form>
      <form>
        <submit action={forget} value="Forget This Device"/>
      </form>
      {footer ()}
    </xml>

(* Drop the cookie, so that the client will not auto-redirect to the player *)
and forget () =
    clear_token ();
    redirect (url (main ()))

(* Because of browser quirks, this is the only way to get to an earlier time, synchronized *)
and start_over token () =
    dml (UPDATE u SET Offset = 0.0 WHERE Token = {[token]});
    redirect (url (player token))
-
(*

Now we get to the last web handlers. The first is a client-side initializer. The main thing it sets up is a handler that rpcs to the server whenever the offset SourceL changes. The call is to update (which we'll define in a moment), which optionally returns a new time to set the client to.

This may sound a little odd, but the basic situation is: you play part way through the show on one device, pause, watch some on another device, and then hit play on the first device again. It will POST its time, but the server will tell it that it should actually be at a later time, and so we use the javascript FFI function set_offset to move the player forward.

Finally, we make the client silently ignore connection failures (bad behavior, but simple), and call the javascript FFI initialization function, which sets up the player and any HTML5 API related stuff.

*)
and init token player_id os set_offset video_url audio_url =
    SourceL.onChange os (fn offset => newt <- rpc (update token offset);
                            case newt of
                                None => return ()
                              | Some time => Dnjs.set_offset time);
    offset <- SourceL.get os;
    onConnectFail (return ());
    Dnjs.init player_id offset set_offset video_url audio_url

(*

The last function is the simple handler called when the offset SourceL changes. It updates the stored time if the new time is greater than the recorded offset (this is why we need the start_over handler); otherwise it returns the recorded offset for the client to jump to.

*)
and update token offset =
    op <- oneOrNoRows1 (SELECT * FROM u WHERE (u.Token = {[token]}));
    case op of
        None => return None
      | Some r => (if offset > r.Offset then
                       dml (UPDATE u SET Offset = {[offset]}
                            WHERE Token = {[token]} AND {[offset]} > Offset);
                       return None
                   else return (Some r.Offset))
-
sourceL.urs
-
(*

This came from a supplemental standard library and, as explained earlier, allows you to create source-like containers that call side-effecting handlers when their values change.

*)
(* Reactive sources that accept change listeners *)

con t :: Type -> Type

val create : a ::: Type -> a -> transaction (t a)

val onChange : a ::: Type -> t a -> (a -> transaction {}) -> transaction {}

val set : a ::: Type -> t a -> a -> transaction {}
val get : a ::: Type -> t a -> transaction a
val value : a ::: Type -> t a -> signal a
-
sourceL.ur
-
(*

SourceLs are built on top of normal sources, and just call the OnSet function when you call set.

*)

con t a = {Source : source a,
           OnSet : source (a -> transaction {})}

fun create [a] (i : a) =
    s <- source i;
    f <- source (fn _ => return ());

    return {Source = s,
            OnSet = f}

fun onChange [a] (t : t a) f =
    old <- get t.OnSet;
    set t.OnSet (fn x => (old x; f x))

fun set [a] (t : t a) (v : a) =
    Basis.set t.Source v;
    f <- get t.OnSet;
    f v

fun get [a] (t : t a) = Basis.get t.Source

fun value [a] (t : t a) = signal t.Source
-
dnjs.urs
-
(*

This is the signature file for our javascript FFI. It declares what functions will be exported to be accessible within Ur/Web, and what types they have.

*)
val init : id ->                          (* id for player container *)
           float ->                       (* offset value *)
           (float -> transaction unit) -> (* set function *)
           url ->                         (* video url *)
           url ->                         (* audio url *)
           transaction unit

val set_offset : float -> transaction unit
-
dn.js
-
/*
-
Since this is a adventure in Ur/Web, not Javascript, and there are plenty of places to learn about the quirks and features of HTML5 media APIs (and I don’t claim to be an expert), I’m just going to paste the code in without detailed commentary. The only points that are worth looking at are how we use setter, which you will remember is a curried function that will be updating a SourceL, causing rpcs to update the time. To call functions from the FFI, you use execF, and to force a transaction to actually occur, you have to apply the function (to anything), so we end up with double applications.
-
Other than that, all that is here is some browser detection (as different browsers have different media behavior) and preferences about media type in localstorage.
-
*/
function init(player, offset, setter, video_url, audio_url) {
    // set up toggle functionality
    $("#"+player).after("<button id='toggle'>Switch to " +
                        (prefersVideo() ? "audio" : "video") + "</button>");
    $("#toggle").click(function () {
        window.localStorage["dn-prefers-video"] = !prefersVideo();
        location.reload();
    });

    // put player on the page
    if (canPlayVideo() && prefersVideo()) {
        $("#"+player).html("<video id='player' width='320' height='180' controls src='" +
                           video_url + "'></video>");
    } else {
        $("#"+player).html("<audio id='player' width='320' controls src='" +
                           audio_url + "'></audio>");
    }

    // seek / start the player, if applicable
    if (isDesktopChrome()) {
        $("#player").one("canplay", function () {
            var player = this;
            if (offset != 0) {
                player.currentTime = offset;
            }
            player.play();
            window.setInterval(update_time(setter), 1000);
        });
    } else if (isiOS() || isAndroidChrome()) {
        // iOS doesn't let you seek till much later... and won't let you start
        // automatically, so calling play() is pointless
        $("#player").one("canplaythrough", function () {
            $("#player").one("progress", function () {
                if (offset != 0) {
                    $("#player")[0].currentTime = offset;
                }
                window.setInterval(update_time(setter), 1000);
            });
        });
    } else {
        $("#player").after("<h3>As of now, the player does not support your browser.</h3>");
    }
}

function set_offset(time) {
    var player = $("#player")[0];
    if (time > player.currentTime) {
        player.currentTime = time;
    }
}

// the function that grabs the time and updates it, if needed
function update_time(setter) {
    return function () {
        var player = $("#player")[0];
        if (!player.paused) {
            // a transaction is a function from unit to value, hence the extra call
            execF(execF(setter, player.currentTime), null);
        }
    };
}

// browser detection / preference storage

function canPlayVideo() {
    var v = document.createElement('video');
    return (v.canPlayType && v.canPlayType('video/mp4').replace(/no/, ''));
}

function prefersVideo() {
    return (!window.localStorage["dn-prefers-video"] ||
            window.localStorage["dn-prefers-video"] == "true");
}

function isiOS() {
    var ua = navigator.userAgent.toLowerCase();
    return (ua.match(/(ipad|iphone|ipod)/) !== null);
}

function isDesktopChrome() {
    var ua = navigator.userAgent.toLowerCase();
    return (ua.match(/chrome/) !== null) && (ua.match(/mobile/) == null);
}

function isAndroidChrome() {
    var ua = navigator.userAgent.toLowerCase();
    return (ua.match(/chrome/) !== null) && (ua.match(/android/) !== null);
}

Makefile

To actually build our application, we first build our C library, and then build the app itself against the SQLite backend. To get it running, initialize the database with sqlite3 dn.db < dn.sql (you only need to do this once) and then start the server with ./dn.exe. You can then visit the application at http://localhost:8080. This has been tested on current Debian Linux and Mac OS X.
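The Makefile itself isn’t reproduced here; a minimal sketch of the shape it might take follows. The target names, the file names beyond those mentioned above, and the exact urweb invocation are assumptions, and the C FFI library step is omitted because its sources are not shown.

```makefile
# Sketch only; file and target names are guesses.

all: dn.exe

# compile the application against the SQLite backend;
# dn.urp is the Ur/Web project file, listing dn.ur, dn.urs, and dn.js
dn.exe: dn.ur dn.urs dn.js
	urweb -dbms sqlite dn

# create the database (only needs to run once)
dn.db: dn.sql
	sqlite3 dn.db < dn.sql

clean:
	rm -f dn.exe dn.db
```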
Tue, 21 May 2013 00:00:00 UT
http://dbp.io/essays/2013-05-21-literate-urweb-adventure.html
Daniel Patterson


Programming as Literature
http://dbp.io/essays/2012-10-24-programming-literature.html

by Daniel Patterson on October 24, 2012

Sometimes I’m not sure how to explain what I study or why I study it. I tell people that I study theoretical computer science, or algorithms and programming languages, or math and computer science, and if they ask why? Let’s come back to that. First I want to talk about literacy.

Literacy is about being able to understand the recorded thoughts of other people, and being able to share your own in a permanent medium. There are beautiful oral traditions, but most stories, and much of human knowledge, are written down. Literacy allows one to tap into that sea of knowledge. In many ways, libraries are one of humanity’s greatest achievements: one can walk into a building that contains the thoughts and discoveries of thousands of people, stretching back hundreds or thousands of years (and as long as you aren’t at an exclusive university, you can often access that information for free). Some knowledge is certainly more accessible than other knowledge, and languages of course complicate things, but the essential element of literacy is both the perception of the world around you and the ability to describe it and share it with others. We must be able to understand the thoughts of others and formulate our own so that others can understand them.

The broader and perhaps more important aspect of literacy is that it allows you to contextualize your own life and perceptions in relation to others. In writing, you turn your own lived experience into something you can share. In reading, you realize that others have lived experiences that are in some ways similar to and in others different from your own. In many ways, literacy is broader than reading and writing; it is about developing perspective on your own life and understanding of the lives of others. I can remember as a small child looking up at an airplane and realizing for the first time that there were people inside of it, in the middle of their own lives, with their own thoughts, hopes, dreams. For the first time I had an empathetic sense that I was not the center of the world (Descartes be damned).

Now, you may be asking, with good reason, what does this have to do with computer science? I want to argue that one of the primary mediums of our lives is now something that most of us do not have literacy in. We communicate with one another with email, websites, cell phones, etc. We learn information by pushing a button on a piece of electronics that displays pictures to us that change as we touch them or use devices attached to it. Traffic lights and airline schedules are planned with computers; cars run with them, as do watches and microwave ovens. Most things that we plug in or that take batteries have computers in them. Much of our lives is carried out using computers that we don’t have more than a surface, empirical understanding of. Now, there have always been things that individuals don’t understand: tax codes, foreign languages, the specifics of geography, etc.

But there are a couple of interesting things about computers that distinguish them. The first is that they are all essentially the same. There is an underlying similarity between all computers, and indeed even among all possible devices that can compute. This means that it actually is possible to learn about all of these things.

The second is that they are primarily designed as a way for humans to express their thoughts. We don’t think about computers in this sense very much, but it is what distinguishes them from most other machines - they are used so that one person can express how to do something and share it with others. They are a medium for talking about solving problems. The breadth of problems they can express is visible in all the places they are used now - and imagine, this is with only a small minority of the population thinking up ways to use them!

There is a third dimension that is similarly interesting, and talked about more, which is that they are a way to expand our own mental capacities - if I am confronted with the task of sorting a few hundred (or thousand) documents, I can do it by hand, or, if I know how, I can write a program to do it and get a computer to carry out the work of sorting (and if I wanted the computer to do this sorting every day for the next year, I wouldn’t have to do any more work). What this means is that not only are computers a way for me to share my ideas of how to solve a problem, they are also a way to automate that very problem solving.

What is interesting and sad is that while the possession of computers is expanding rapidly, the knowledge of how to truly use them is not. People are sold devices that allow them to perform a set number of functions (all of which are simply repetitions of thoughts by the people working at the company who sold them the device), but they are not given the tools to express their own thoughts, to expand their own mental capacity in any way other than those already thought of by someone else. We have expanded the medium without expanding literacy. And indeed, there is a financial explanation for this: it’s hard to sell knowledge when people can create it themselves. Many technological “innovations” these days are trivial combinations of earlier ideas which would be unnecessary if people were able to carry out those kinds of compositions themselves.

So why am I interested in computer science? I’m interested in it because I am interested in human thought. I am interested in how people solve problems, and in seeing problems that others have solved. I am interested in teaching people how to express themselves in this medium, and in learning it myself. I study programming as literature, to read, to write, to share. I study it to figure out the world we live in, and imagine how else it could be.
Wed, 24 Oct 2012 00:00:00 UT
http://dbp.io/essays/2012-10-24-programming-literature.html
Daniel Patterson


Haskell / Snap ecosystem is as productive as Ruby/Rails.
http://dbp.io/essays/2012-04-26-haskell-snap-productive.html

by Daniel Patterson on April 26, 2012

This may be controversial, and all of the usual disclaimers apply - this is based on my own experience using both of these languages/frameworks to do real work on real projects. Your mileage may vary. Because this is something that has the potential to spiral into vague comparisons, I am going to try to compare points directly, based on things that I’ve experienced. I am not going to say “I like Haskell better” or anything like that, because the point of this is not so much to convince people about the various merits of the languages involved as to point out that I’ve found the two to be equally productive (or that Snap feels more so). For Haskell programmers who usually reach for Rails, this could be an indication to try out the web tools you have available in Haskell.

As a note - some of this could also apply to other Haskell web frameworks (in particular, most of this pertains to Happstack, and some pertains to Yesod), but since Snap is what I use, I want to keep it based on my own personal experience.

1. The number one productivity improvement is a smart, strong type system. This is less of an issue for small projects, but as soon as you have at least a few thousand lines of code, adding new features or refactoring inevitably involves changes to multiple parts of the codebase. Having a compiler that will tell you all the places you need to change things is an amazing productivity booster. This can be approximated in some ways with good test coverage, but it is really a different beast - tests often need to be changed as well, and if you aren’t very careful about this it is easy to change them in ways that don’t catch new bugs. Additionally, it is hard (or very tedious, if you do it wrong) to achieve high enough coverage to actually catch all of the bugs introduced in refactoring. Compare this to a compiler that is completely automated and will always be aware of all of the code you have and the ways that it interacts (at least to the extent that you actually use the type system - but if you are a good Haskell programmer, you will).

This alone wouldn’t be enough to suggest using Haskell/Snap over Ruby/Rails, as a type system isn’t worth much without supporting libraries, but as I switch between the ecosystems, this is the place where I notice the most drastic improvements in productivity, so I put it first.

2. Form libraries. There are many different libraries for dealing with forms in Rails, as well as the built-in one. The general idea is that you define some validations on your models and then use the DSLs from the form libraries to define forms, do validations, etc. In Haskell (in my opinion), the best form library is Digestive-Functors (thanks Jasper!), and the productivity difference is staggering in more complex use cases. In the sort of vanilla examples that Rails has, the validation system works quite well, and dynamic introspection allows you to write really short forms. This begins to break down when you start getting forms that don’t correspond in a simple way to models. I have forms that are sometimes a mix of two models, or forms that are a partial view into a data structure, or any number of other variations.

With Digestive-Functors, I can define the forms that I need and re-use components between multiple forms (forms are composable), and the validations are on the form, not on the underlying model. It is obviously useful to have database-level data integrity checks, but I find that having them be the main (or only) way of doing validations is really limiting - because sometimes there are special cases where you want the validation done one way, and other times another.

More generally, it is possible that the business logic of a specific form may have requirements that do not always have to hold for the datastore, and thus should not reside in the integrity checks. Having written a lot of forms (who hasn’t?), I find that getting the first form out is much faster with Rails, but inevitably when I need to change something it starts to become difficult fast. Every time I am doing it I keep picturing an exponential curve - sure, it starts out really small, but it gets really big really fast! It isn’t that I run into things that are not possible with Rails, but they end up being more difficult, more error prone, and generally reduce my productivity. With Digestive-Functors, I spend a little more time building the forms in the beginning, but I’ve never had requirements for a form that weren’t easily implemented (almost without thinking).

3. Routing is the next big one. This may be more of an opinion than the previous ones, but I have always thought that great care should go into designing the url structure of a site. In this sense, I guess I disagree with the idea of universally using REST - I think it is very useful when writing APIs, but when designing applications for people, I believe the urls should be meaningful to the people, not to machines. Usually, right after modeling the data of an application, I make a site-map - a high level view of what the site should look like. Instead, with Rails, I spend time thinking of how I can adapt what I want to the REST paradigm, and usually end up with something that is an incomplete/counterintuitive representation.

More broadly, I think the idea of hierarchical routing - that you match routes by pieces - is brilliant. What this allows you to do is easily abstract out work that should be done for many different related requests. In Rails, this is approximated by :before_filters (ie, in a controller for a specific model, you might fetch the item from the id for many different handlers), but it is a poor substitute. For example, I often have an “/admin” hierarchy, and to restrict it, all I have to do is have one place (the adminRouter or something) that does the work required to ensure only administrators can access it; it can also fetch any data that is needed, and then pass back into route parsing. Or if I want to do the Rails-style pre-fetching, then I design the routes as “/item/id/action” and have a handler that matches “/item/id”, fetches the item, and then matches against the various actions. If I have nested pieces of data, this is just as easy. I could have “item/id/something/add”, which adds a new “something” to the item with id “id”. This would all be in the same hierarchy, so the code to fetch the item would still only exist once.

Not only is this very natural to program, it keeps the flow easy to follow when you are looking back at it, and allows backtracking in a great way: if, in a handler, you reach something that indicates that this cannot be matched - say the path was “/item/id” but the id did not correspond to an actual item - you can simply “pass” and the route parser continues looking for things that will handle the request. If it finds nothing, it gives a 404.

An example of how you could exploit this in a really clean way - if you are building a wiki-like site, then you first have a route that matches “/page/name” and looks up the page with name “name”. If it doesn’t find it, it passes, and the next handler can be the “new page” handler, which prompts the user to create the page. As with everything else, I’m not saying this cannot be done with Rails, simply that it is much more natural and easy to understand with Snap (and Happstack, where this routing system originated, at least in the Haskell world).

4. Quality of external libraries. Point 2 was a special case of this, since dealing with forms comes up so much, but I think the general quality of libraries in Haskell is superb. One example that I came up against was wanting to parse some semi-free-form CSV data into dates and times. Haskell has the very mature parsing library Parsec (which has ports to many languages, including Ruby) that makes it really easy to write parsers. I ported an ad-hoc parser to it, and found that not only was I able to write the code in a fraction of the time, but it was a lot more robust and easy to understand.

For testing of algorithmic code, the QuickCheck library is pretty amazing - you tell it how to construct domain data and state invariants that should hold over function applications, and it will fuzz-test with random/pathological data. The first time you write some of these tests (and catch bugs!) you will wonder why you haven’t been testing like that before! I don’t really want to go into it here, but the other point is that many of these libraries are very, very fast - there has been, over the last couple of years, a massive push for performant libraries, with a lot of success. The Haskell web frameworks’ webservers regularly trounce most other webservers, and there are high-performance json, text-processing, and parsing libraries (attoparsec is a version of parsec that is very fast).

5. Templating. Here I want to directly compare the experience of using Heist (a templating system made by the Snap team) and Erb/Haml (I mostly use the latter, but in some things, like with javascript, I have to use the former). The first big difference is the idea of layouts/templates/partials in Rails. I never really understood why there was this distinction when I first used it, and when comparing it to Heist - which has no distinction: any template can be applied to another, to achieve layout-like functionality, and any template can be included within another, to achieve partial-like functionality - it feels very limited.

The other major difference is that the two templating languages in Ruby allow dynamic elements by embedding raw ruby code, whereas Heist allows dynamic content by letting you define new xml tags (called splices) that you can then use in the templates. I have found this to be an extremely powerful idea, as it allows you to not only do all the regular stuff (insert values, iterate over lists of values and spit out html), but even to build custom vocabularies of elements designed to go with javascript (so for example, I built an asynchronous framework on top of this, where I had a “<form-async>” tag and “<div-async>”s that would be replaced asynchronously by the responses from the form posts).

It also adapts to being used with (trusted) user generated input - I’ve used it in multiple CMS systems so that, for example, all links to external sites are set to open in new tabs/windows (by overriding the “<a>” tag and adding the appropriate “target”), or to allow users to gain certain dynamic features for their pages. Compared to this, the situation with Haml always seems hopelessly tied up with ruby spaghetti code - not that it always is (you can always be careful), but the split with Heist both feels like a cleaner separation and is more powerful, which is not something you get often, and I think is a sign that the metaphor that Heist created (which is based on a couple of really simple primitives) is really something special.

6. This is sort of an extension of the first point, and I’m putting it towards the end because it is the most subjective part of this already quite subjective comparison - I think that web applications built with Haskell/Snap are much easier to edit and add to than corresponding applications in Ruby/Rails. One of the biggest reasons for this is that there is much more boilerplate/code spread in ruby - some of it is auto-generated, other bits are written by hand, but there ends up being code scattered around. It is pretty easy to add new code, but when you want to edit or refactor existing code, it starts to get hard to figure out where everything is. This relies on conventions to a degree (which you learn), but there is simply less code in Snap, and usually everything pertaining to a specific function is in one place. This has a lot to do with the functional paradigm - there is no hidden state, so generally all the transformations that occur are very transparent, whereas with Rails it is possible for stuff from the ApplicationController to be applied, or various filters to come into play, or stuff from the model, etc. There is no obvious “starting point” if you want to see how a request travels through your application (candidates include the routes file, the controllers, etc), whereas with a Snap application, the code to start the web server is in one of the files you write! You can trace exactly what it is doing from there!

In addition, there is also very little “convention” with Snap. It enforces nothing, which has the consequence (in addition to allowing you to make a mess!) that the whole application conforms to exactly how you think it should be organized. I’ve found that this actually makes it much easier to add new things or modify existing functionality (fix bugs!), because the entire structure of the application, from how the requests are routed to how responses are generated, is based on code I wrote. This means that making a change anywhere in this process is usually very easy - it feels in some ways like the difference between making a change to an application you wrote from scratch and one that you picked up from someone else. There is also a potential downside to this - the first couple of applications I built had drastically different organizational systems.

(Side note for anyone reading this who is curious: I’ve converged to the following method: all types for the application live in a Types module or hierarchy, all code that pertains to the datastore lives in a State hierarchy (or module, in a small application), code for splices lives in a Splices hierarchy, forms live in a Forms hierarchy, and the web handlers live in a Handlers hierarchy. I also usually have a Utils module that collects various things that are used in all sorts of different places. Everything depends on Types and Utils. Splices, Forms, and State are all independent of one another, and Handlers depends on everything. And then of course there is an Application module and Main, according to the generated code from Snap.)

This is a major way in which Snap differs even from some other Haskell web frameworks: it seems more like a library with which to build a web application than a true framework. But in my experience this is actually a really powerful thing, and it makes the whole process a lot more enjoyable, because I never feel like I’m trying to conform to how someone else thinks I should organize things.

7. I’m bundling performance, security, etc all at once. Rails is a very stable framework, so lots of work has gone into this. But I think the recent vulnerabilities exposed on a lot of major sites (like GitHub), based on the common paradigm of mass-assignment, sort of point out the negative side. Snap is much newer, but it was built with security in mind from the beginning, as far as I can tell, and most libraries that I have used have also mentioned ways that it comes up - the entire development community seems a lot more aware of and concerned with it.

I think part of this probably has to do with the host languages - ruby is a very dynamic language with a history of experimentation (so generally, flexibility is preferred over correctness), whereas Haskell is a language where static guarantees are valued, and security is usually lumped in with correctness. For performance, there is no question that Haskell will win hands down in any comparison (and on multithreading). Granted, a lot of web code is disk/database bound, so this isn’t a huge deal, but it is nice to know that you aren’t needlessly wasting cycles (and can afford to run on smaller servers).

8. Now, as a counterpoint, I want to articulate what Rails really has over Snap. Number one, and this is huge, is the size of the community. There are a massive number of developers who know how to use Rails (how many are good at it is another question), which means that if you are trying to do something, it is much more likely that a prebuilt solution exists. It also means that it will be easier to hire people to work on it, and easier to sell it as a platform to clients/bosses.

The Haskell community is surprisingly productive given its size (and some of the tools it has produced are amazing - examples mentioned in this comparison are Parsec, QuickCheck, Digestive-Functors, etc), but there is some sense in which it will always be at a disadvantage. If you are doing any sort of common task with Rails, there will probably be a Gem that does it. The unfortunate part is that sometimes the Gem will be unmaintained, partially broken, or incompatible, as the quality varies widely. This is a place where a lot of subjectivity comes in - I have found that most of what I need exists in the Haskell ecosystem, and if something doesn’t, it isn’t hard to write libraries, but this could be a big dealbreaker for some people.

Cheers, and happy web programming.
Thu, 26 Apr 2012 00:00:00 UT
http://dbp.io/essays/2012-04-26-haskell-snap-productive.html
Daniel Patterson


Math/Science integrated with Scheme
http://dbp.io/essays/2011-12-06-science-scheme.html

by Daniel Patterson on December 6, 2011

I had an idea today for an interactive homework assignment for a chemistry class. It would be a prompt where you could type in queries and it would give responses. The basics would be:

# (questions)
=> To Do: (1,2,3,4,5,6,7)
   Complete: ()
# (question-1)
=> 1. How many grams of Na are needed to make 28 grams of NaCl?
# (periodic-table 'Na)
=> Sodium - Atomic Number 11 - Weight 22.98976928
# (periodic-table 'Cl)
=> Chlorine - Atomic Number 17 - Weight 35.453
# (* 22.98976928 (/ 28 (+ 22.98976928 35.453)))
=> 11.01
# (answer-1 11.01)
=> Correct! Great job. 1/7 Questions completed.
# (questions)
=> To Do: (2,3,4,5,6,7)
   Complete: (1)
Now if you don’t know Scheme syntax, the line with the numeric calculation might be a little confusing, but once you realize that it is just pure prefix notation (the operator always comes first, and every set of parentheses wraps an operation) it should start making sense. I’m pretty sure I could explain Scheme to anyone taking high-school science in an hour, but the three-sentence explanation is: Every expression is wrapped inside parentheses. The first word inside the parentheses is a function, and the rest (there don’t have to be any) are arguments to the function, which can be other expressions or basic items like numbers or strings. Arithmetic follows this pattern, which may seem a little unnatural at first, but this consistency means that you now know almost all there is to know about Scheme.

But what I’ve described here isn’t actually much better than the web question and answer system that I saw today, which gave me this idea. It’s basically just an interactive text-based version of the same thing. What I started thinking of is adding built-in reference functions and equation helpers, things like this:
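These built-ins are imagined, not implemented - the names and outputs below are invented for illustration, in the same transcript style as above:

```
# (equation 'ideal-gas)
=> PV = nRT
# (molar-mass 'NaCl)
=> 58.44 g/mol
# (celsius->kelvin 20)
=> 293.15
```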
Functions like these would provide both references and ways to do some of the more boring rote work quickly. Descriptions of the equations could also exist, making it even more of an interactive learning project. But what would be even better would be to allow students to define new functions (or redefine old ones) on the fly. Let’s say there are a bunch of different calculations that require the same involved steps. Today I saw a student working through two laborious calculations that differed only in the value of the activation energy. What would be amazing is if a student could do something like:

# (define (my-arrh-eq act-energy)
    (arrhenius-k (arrhenius-a 2.75e-2 act-energy 293) act-energy 333))
=> Defined new function my-arrh-eq!
# (my-arrh-eq 14500)
=> 1.01

I don’t remember if that was the answer or even the value for the activation energy (it probably isn’t), but that was the general shape of the solution. The problem was that a rate coefficient (2.75e-2) was given for 20 degrees Celsius, and the question asked for the rate coefficient of the same reaction at 60 degrees Celsius. The problem was posed with two different activation energies, and identical, reasonably involved calculations resulted - using the given 20-degree setup to solve for the frequency factor, and then plugging that into the same equation, this time using 60 degrees.
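For reference, the formula underneath is the Arrhenius equation, k = A·e^(-Ea/(RT)). As a quick sketch of the same two-step technique outside the imagined Scheme system, here it is as a pair of shell functions (awk does the floating point; the function names are mine, and R = 8.314 J/(mol·K) with Ea in J/mol is assumed):

```shell
# k = A * exp(-Ea / (R * T)); A: frequency factor, T in kelvin
arrhenius_k() {
    awk -v A="$1" -v Ea="$2" -v T="$3" \
        'BEGIN { printf "%.6g\n", A * exp(-Ea / (8.314 * T)) }'
}

# inverse: recover the frequency factor A from a known k at temperature T
arrhenius_a() {
    awk -v k="$1" -v Ea="$2" -v T="$3" \
        'BEGIN { printf "%.6g\n", k * exp(Ea / (8.314 * T)) }'
}

# the student's calculation: solve for A at 293 K, then evaluate k at 333 K
A=$(arrhenius_a 2.75e-2 14500 293)
arrhenius_k "$A" 14500 333
```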

But what was interesting about this problem was the technique of solving one equation and using part of the result in the other - not actually doing out the arithmetic. It would be amazing if a student could build things like the function above, which clearly demonstrates an understanding of the technique, but also reveals a capacity to organize their thoughts and string the pieces together into higher level abstractions - a critical part of the type of thinking that underlies computer programming, and something that is going to become more and more important as time goes on.

I think there is amazing potential in systems like this - where programming is built into the fabric of math and science work - because they will both teach students to program (which is a very helpful thing) and focus their attention and mental efforts on understanding how to string together concepts and actually solve problems, not just how to do calculations. I think it could also have a motivating effect, because when you start writing programs like this, you feel like you are somehow getting out of doing boring work (which you are), and that you must be cheating somehow (and that feels good!). Little do you know that you are actually learning the material better than the person who did the calculations out by hand, because you focused on what was really important and had to figure out the general solution.

Now some of this is already happening - probably mostly using TI-BASIC on graphing calculators - but that system is reasonably unnatural (and no one is teaching students how to use it), and so removed from day-to-day work that I don’t think it is very widespread. I think a system that students interact with while doing their work, that lets them build functions and use existing ones, would be a really amazing thing, both for their understanding of the subject itself and for learning computer programming (or, more generally, “algorithmic thinking”).
Tue, 06 Dec 2011 00:00:00 UT
http://dbp.io/essays/2011-12-06-science-scheme.html
Daniel Patterson


iOS is anti-UNIX and anti-programmer.
http://dbp.io/essays/2011-09-15-ios-anti-unix.html

by Daniel Patterson on September 15, 2011

When I was first learning about UNIX, and learning to use Linux, the most immediately powerful tool that I found was the shell’s pipe operator, ‘|’. Using the command line (because at that point, Linux GUIs were not so well developed, and the few distros that tried to allow strictly graphical operation usually failed miserably) was at times difficult and at times rewarding, but it was the pipe that opened up a whole world for me.

I can remember looking through an online student directory in high school that had names, email addresses, etc. For student government elections it had become popular (if incredibly time consuming) to copy and paste the hundreds of email addresses and send a message to every student. For me, with my newfound skills, it amounted to something like:
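A hypothetical reconstruction of that kind of one-liner (the file name, markup, and addresses are invented): save the directory page locally, then pull out every address, de-duplicate, and join with commas, ready to paste into a mail client.

```shell
# fake sample of the saved directory page
printf '%s\n' \
  '<td><a href="mailto:alice@school.edu">alice@school.edu</a></td>' \
  '<td><a href="mailto:bob@school.edu">bob@school.edu</a></td>' \
  > directory.html

# extract the addresses, de-duplicate, and join with commas
grep -Eo '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+' directory.html | sort -u | paste -sd, -
```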
It seemed like magic at the time, and in some ways, it still does. What the shell (and UNIX in general) offered was composability - it gave you simple (but powerful) tools, and a standard way of linking them together - text streams. By combining those together, it offered immeasurable power, much more than any single tool. The mathematics of combinations guarantees this.
-
The more I use graphical interfaces (or anything that does not operate on text streams - commandline curses programs included), the more I am struck by how profound the loss of composability is - each program has to try to implement all the standard things (searching, sorting, transforming) that you might want to do with the information it has, and in that repetition lie inconsistencies and usually a plain lack of power. The better ones share common libraries, and gain common functionality, but this only amounts to their least common denominator - two separate programs cannot (easily) expose their higher functionality to each other (at least not in compiled languages) in the way that commandline stream processing programs can.
-
What I realized the other day is that iOS is the extreme example of that lack of flexibility, taken almost to the point of caricature - the only interaction that is possible is through single applications that for the most part can have no connection to other applications. People rejoiced when copy and paste was added, but that celebration hides a sad loss of the true power that computers have. The existence of files - the only real way that composability is achieved in GUI systems (i.e., do one thing, save the file, open with another program, etc) - has been essentially eliminated, and applications must therefore do everything that a user might want to do with whatever data they have or will get from the user.
-
I’d noticed before how frustrating it was for me to use iOS, but I wasn’t sure until recently exactly why that was, until I realized that it had effectively taken away the one thing that is so fundamental about computers, and why I am a programmer - the ability to compose. Every day I live and breathe abstraction, and building things out of different levels of it, and the idea of not being able to combine various parts to make new things is so antithetical to that type of thinking that I almost can’t imagine that iOS was created by programmers. I remember looking at the technical specifications of the most recent iPhone and thinking - that is a full computer, and it’s small enough to fit in a pocket - that is a profound change in the way the world works. But it’s not a computer, it’s just a glorified palm pilot with a few bells and whistles.
-]]>
- Thu, 15 Sep 2011 00:00:00 UT
- http://dbp.io/essays/2011-09-15-ios-anti-unix.html
- Daniel Patterson
-
-
-
-
diff --git a/drafts/email-simplified.markdown b/drafts/email-simplified.markdown
deleted file mode 100644
index 46e719f..0000000
--- a/drafts/email-simplified.markdown
+++ /dev/null
@@ -1,49 +0,0 @@
----
-title: "Email for Hackers: Simplified"
-author: Daniel Patterson
----
-
-Four and a half years ago I wrote a very popular guide titled [A Hacker's
-Replacement for GMail](/essays/2013-06-29-hackers-replacement-for-gmail.html)
-about my system for email based on `notmuch`, `emacs`, my own mail server, etc
-(it's still the only thing I've written that's gotten any amount of traffic). I
-ran that system for several years, but eventually one thing killed it: **spam**.
-Perhaps I never got the various services set up properly (not just learning
-from prior messages, but talking to services that told me about IP addresses
-deemed to be spammers, etc), but I would still get at least several spam messages
-per day. I even tried using paid anti-spam services (so all mail filters through
-them, they forward on to your mail server, and your mail server only accepts
-messages from their servers). Contrary to what I was led to believe, I never seemed
-to have trouble with deliverability (maybe I got lucky and the IP address my
-server was assigned had never spammed, but with DKIM, SPF, etc, everything
-worked!).
-
-I went back to GMail for a little while, still behind my own domain (note to the
-reader: if you take nothing else from this, seriously consider registering a
-domain and paying Google to host your email. The domain is ~$10-15/year, the
-email is ~$4/month, and the transition to switch addresses is certainly painful,
-but once you've done it Google doesn't own your identity anymore. You can, in
-the future, without anyone noticing, switch to another company, or self-host.
-It's worth it!), but I missed writing email in emacs.
-
-So, I started trying to figure out a better system. As usual, I started with requirements:
-
-1. Be able to read, write, search email in Emacs.
-2. Push notifications on the computer.
-3. Be able to read, write, search email on iPhone (push notifications too).
-4. Have a single repository where email is stored, to make backup simpler.
-5. Keep mail organized into an Inbox and an everything else (Archive).
-
-Points 4&5 led me to decide that, for my purposes, Maildir/IMAP is a perfectly
-fine authoritative source for my email. In my previous system, I was always a
-little worried about having to rely on the notmuch database, as it was an
-undocumented format (that changed with new versions) with a single client
-program. Realistically, aside from tags that can be automatically applied
-(`sent`, `unread`, mailing list ones), the only tag that I care about is
-`inbox`, and it seemed like I should be able to synchronize that to match where
-messages are in the Maildir folders.
-
-
-
-
-https://www.fastmail.com/?STKI=17129600
diff --git a/drafts/haskell-module-names.markdown b/drafts/haskell-module-names.markdown
deleted file mode 100644
index 24afe1c..0000000
--- a/drafts/haskell-module-names.markdown
+++ /dev/null
@@ -1,114 +0,0 @@
----
-title: "How to organize modules in a Haskell Web App"
-author: Daniel Patterson
----
-
-> A note: I don't write single-page apps. Perhaps some of this translates to
-> people who do, but I don't know. When I say "web app", I mean server-rendered
-> html pages that have forms and buttons and store their state on the server.
-
-Different people have different preferences for how to organize code in their
-applications. One of the really cool things about most Haskell web frameworks is
-they let you organize your code however you want.
-
-The upside is that a tiny project can be a single file and that understanding
-projects comes just from understanding the language, not framework-specific
-magic that makes particular paths special.
-
-The downside, of course, is that people are on their own to figure out best
-practices. I've tried a lot of different things (over the past ~11 years
-building web stuff in Haskell), and this system is the result of that
-experience, primarily using the [Snap](http://snapframework.com/) web framework
-and then more recently the [Fn framework](http://fnhaskell.com/) that I
-co-wrote (I've also used
-[Scotty](https://hackage.haskell.org/package/scotty) and
-[Servant](https://haskell-servant.github.io/) and I think the advice would work
-equally well for them).
-
-### 1. Pure type modules
-
-For each record, which in a database-backed application will usually correspond
-to a table, define a separate module. I would use `Types.Person` if `Person`
-were the name of the type. This should contain the record, which, contrary to
-many examples, should _not_ have prefixed field names (the prefix, where
-necessary, is already present in the module name!): just name the fields the
-most natural names, e.g.:
-
- data Person = Person { id :: Int, firstName :: Text } deriving (Eq, Show)
-
-This module should also include any type class instances for `Person` (e.g.,
-serialization), and related types. For example, if there is a data type that is
-a field within the record (e.g., you might have a `role` field that has a fixed
-number of options; in the database it is represented textually, but it shouldn't
-be in your application), define it within the `Types.Person` module rather
-than giving it its own module, unless it's useful to other modules.
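A sketch of what such a module body might look like, following the advice above (the `Role` type, its constructors, and the extra field are invented examples; I use `String` instead of `Text` here to keep the sketch dependency-free, and a real file would start with `module Types.Person where`):

```haskell
-- Sketch of a Types.Person module body: unprefixed field names, plus a
-- related Role type living in the same module rather than its own.
data Role = Admin | Member | Guest
  deriving (Eq, Show)

data Person = Person
  { id        :: Int     -- no "person" prefix; the module name carries it
  , firstName :: String
  , role      :: Role
  } deriving (Eq, Show)
```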
-
-> A note about casing: I don't think this is controversial, but match what is most
-> natural in whatever domain the name appears in. So field names in Haskell should
-> be camelCase, in the database should be snake_case, and in frontend templates I
-> hyphenate-them. Transforming between these can be automated.
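That automation is easy to sketch. For instance (a hypothetical helper, not from any particular library):

```haskell
import Data.Char (isUpper, toLower)

-- Hypothetical helpers for the casing transforms described: Haskell's
-- camelCase to the database's snake_case and the templates' hyphen-case.
camelTo :: Char -> String -> String
camelTo sep = concatMap go
  where
    go c | isUpper c = [sep, toLower c]
         | otherwise = [c]

toSnake, toHyphen :: String -> String
toSnake  = camelTo '_'
toHyphen = camelTo '-'
```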
-
-Having pure modules to define types is really helpful to avoid module
-circularity; most of the time the issue is that you'll end up needing to allow
-more core application types refer to specific data for the application (e.g., in
-[Fn](http://fnhaskell.com/) web handlers pass around a "context" that contains
-database connections, request information, etc. It necessarily is used many
-places, but you may also want it to be able to contain information about a
-logged in user. By having the types on their own, it's much easier to pull those
-types into the definition of core data types like the "context").
-
-### 2. State modules for manipulating state with consistent names
-
-There are tons of different libraries for dealing with databases, but from the
-perspective of module organization, each module `Types.Person` should be matched
-with `State.Person`, and just like the field names in the `Person` record
-shouldn't have any prefix or suffix, neither should functions in the
-`State.Person` module. So, for example, I'll usually have `get`, `create`, and
-`delete` as functions, and perhaps `getByFoo` or `deleteByBar`. The reason for
-this is that the `State.Person` module is expected to always be imported qualified
-(it ends up looking more uniform anyway).
-
-### 3. Qualify modules that are for a different part of the application
-
-In general, organizing the application around the records (i.e., database
-tables) works pretty well. It won't be 100% (and it doesn't matter, because
-Haskell doesn't care), but usually I'll have a `Handler.Person` module to go
-along with the `Types.Person`, which would contain web code to handle routing,
-form parsing and various high level glue, and `State.Person` which has state
-manipulation (database queries, business logic, etc).
-
-Within `State.Person`, import `Types.Person` unqualified. There should be no
-conflicts. From `Handler.Person`, `Types.Person` should be imported unqualified
-as well. That way you can use the type `Person` unqualified. `State.Person`
-should be imported qualified as `State`. Thus to look up a person by id we might
-invoke `State.get`.
-
-If we needed to access a `Document` record, we import `Types.Document` qualified
-as `Document` and `State.Document` fully qualified. There is a little redundancy
-in the type/constructor name (if you have to write `Document.Document` and it
-bothers you, you can import `Document(Document)` separately), but the former
-means you can have `Document.createdAt` as the record field for when the
-document was created and `createdAt` for when a `Person` was created. Similarly,
-`State.Document.get` would look up a `Document` by id. This of course is done
-symmetrically when you are working within `Handler.Document` (assuming it existed).
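Put together, the import header of a hypothetical `Handler.Person` under this scheme would look something like this (a layout sketch only; module bodies elided):

```haskell
module Handler.Person where

import Types.Person                          -- unqualified: Person and its fields
import qualified State.Person as State       -- State.get, State.create, ...
import qualified Types.Document as Document  -- Document.Document, Document.createdAt
import qualified State.Document              -- State.Document.get
import qualified Handler.Document            -- other handlers: fully qualified
```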
-
-Other `Handler` modules should also be imported fully qualified. It's less
-common to need this, but it comes up, and the clarity of the full qualification
-is great. If you end up splitting modules in more fine-grained ways than these
-three (and sometimes I have, e.g., splitting out form validation, or the code
-that is used in templates), the same general principles apply: within the
-`Category1.X` module, any `Category2.X` is imported qualified as `Category2`
-(unless `Category2` is `Types` in which case it's imported unqualified), and
-`Category3.Y` is imported fully qualified.
-
-### Summary
-
-Although it wasn't the original intention (just making code more understandable
-was), I've realized this naming scheme really matches the mantra that name
-length should match name locality (i.e., the further from definition, the longer
-the name should be), writ large. Functions that are highly relevant to a
-particular module have short names (since they are unqualified or minimally
-qualified), whereas ones from very different parts of the application have
-longer names that tell more about what they are for. It also helps to serve as a
-reminder when things start to get tangled, as you end up using more fully
-qualified functions (and that's a sign that maybe some refactoring is needed).
diff --git a/drafts/remembering-things.markdown b/drafts/remembering-things.markdown
deleted file mode 100644
index 1a9d2e1..0000000
--- a/drafts/remembering-things.markdown
+++ /dev/null
@@ -1,206 +0,0 @@
----
-title: "Remembering Things"
-author: Daniel Patterson
----
-
-Remembering when to do things is, for me, a big strain on my short-term
-attention / memory, and it's particularly stressful to wonder if I've already
-forgotten something. I have no idea how anything inside my brain actually works,
-but my mental model is that I have a limited amount of short-term memory where
-these deadlines are stored. In order to avoid getting shifted into long-term
-memory (and thus get forgotten until something triggers them), I have to
-periodically scan through this memory.
-
-This seems like exactly the type of thing that should be solvable, or at least
-improvable, by modern technologies: in particular, smart phones, which are
-perfectly capable of capturing things at any time, notifying at precise times (&
-locations, to a point), and filtering/sorting in sophisticated ways.
-
-I want to argue that we are maybe 75% of the way to a system that is complete
-enough to significantly reduce this mental strain, and that there is no
-fundamental limitation to getting the rest of the way there (just a matter of
-incremental improvements). Note that, like many things, the difference between
-almost there and all the way there is massive: once it is perfect, you no longer
-have to think about it at all, whereas even if it is 90% perfect, you still have
-to think about it frequently, as that 10% still matters. The system that I'm
-using (which I'll talk about in more detail) is the app GoodTask on iOS which
-relies on the built-in Calendar and Reminders (note: the app is not free --
-though it has a 2 week trial). There may be better tools, but either they
-require hardware I don't have or I haven't found them yet (not for lack of
-trying)...
-
-### Calendar events vs Tasks
-
-First, I want to talk about "calendar" events and "tasks", both to unify them
-and draw a distinction between them. Unify them because whatever notification
-system, display, etc, needs to show them together. Fundamentally, the display
-should answer the question "what do I need to do now" (or, tomorrow, next week,
-etc), and any tool that doesn't put these things together is broken (which is
-nearly all of them).
-
-But, there is still a critical distinction between these: calendar events will
-be scheduled, but tasks in general will not be (as a side note, the applications
-that insist that every task be scheduled are absurd -- many tasks, in particular
-the ones that are easy to forget, take very little absolute time; the trouble is
-actually remembering to do them, and being in the right place to do them...).
-
-### Calendar events
-
-The critical thing about calendar events is that if you don't do them in their
-scheduled time, that's it. If there was a meeting you were supposed to go to but
-you didn't go to it, too bad, it's done. Calendar events are never marked as
-done, they don't become overdue, they simply become in the past.
-
-Calendar events are also much simpler (and that's probably why they are much
-better supported by software). Since they have a concrete time, it's clear where
-they should show up in the "what should I do now" (or tomorrow, next week)
-displays, and provided they have a location, notifications are pretty easy too,
-as they can be given based on travel time to get there. There are some
-subtleties there (what is the mode of transit, etc), but in general this is
-pretty well developed and getting better. On iOS (and maybe Android), recurring
-events will even learn locations if you don't input them, which is great. You
-can, of course, just hard-code notification times on events (which is pretty
-much what you have to do now). As a rule, all calendar events should have
-notification times, as otherwise, why is the event in your calendar?
-
-### Tasks
-
-Tasks, or todos, are more complicated and subtle, and haven't gotten nearly the
-same treatment as calendars (those facts are probably related). Another
-explanation of this is that calendar events can be seen as a special case of a
-task that has a particular duration and that gets automatically marked done at
-the point when it is scheduled. In this sense, a particularly useful and common
-type of task is well supported, but not more general varieties.
-
-While there are dozens (or hundreds?) of task apps (as well as the ones built in
-to phones), most of them treat tasks alternately as pretty shopping lists (i.e.,
-add a bunch of things to the list, remove them from the list), or complicated
-hierarchies of notes, or ticketing systems, possibly with various notification
-structures (note: I've looked into a few dozen, and some are better than this,
-but this is the general story: even very popular ones seem to just be
-pretty variations on these themes...)
-
-From the perspective of _remembering things_, the most important thing about a
-system is that you can get absolutely everything that you need to remember into
-the system, and the immediate consequence is that the primary thing that the
-application needs to do is _not show you things that aren't yet relevant_. For
-calendar events, it's obvious when something isn't yet relevant: it isn't
-happening today (or tomorrow, or next week, depending on the view you are
-looking at), and there is a built-in pressure valve: you can't do more than one
-thing at a time, so your calendar can't be too overwhelming.
-
-Tasks, when treated naively (as almost all applications do), do not have similar
-structure, so you end up having a massive list of things of varying importance,
-from things that need to happen today (grab groceries, put out recycling, send
-an email to X) to things you want to do in the next few weeks (read Y paper,
-contact Z about research they are doing, buy train tickets) to things you need
-to do in the next month or two (etc, you get the point), and you can imagine
-that if you start piling up all of these together you would have an unmanageable
-list. There are also further complications: some tasks are repeating but have
-deadlines (medicines, bills, etc), others repeat but without clear deadlines
-(e.g., vacuuming should happen maybe weekly, but it's not particularly urgent if
-it doesn't), and some only make sense to do in certain places (i.e., even if it
-is the day when the recycling is put out, if I'm not at home, there isn't much
-point in telling me that).
-
-Ideally, when adding tasks you could put down specific _or vague_ times when
-they should happen, where they make sense to happen (or where they don't make
-sense to happen), repeating patterns (either specific, like the 1st of each
-month, or periodic, like a week after the last time you did it), and possibly
-some sense of how important and how hard the task is. I'm a little hesitant
-about including the latter because I feel like trying to estimate those things
-becomes really hard (and that means that capturing the tasks becomes more
-difficult, which is counter-productive), and also, I've never used a system that
-actually does anything useful with it, but maybe.
-
-What is presented in a "now" view should be a combination of things that are
-specifically due soon or are overdue combined with (provided there aren't too
-many of the former) things that are vaguely due in the near future. What's
-really important is that this view should allow tasks to be addressed quickly:
-ideally, there are three extremely quick actions -- Mark done, Remind me soon,
-Remind me later. The former is obvious, but the distinction between the two
-others is where these systems could get smart. "Remind me soon" might mean
-tonight, or maybe tomorrow, or the next day. "Remind me later" is more
-complicated. It essentially is a deprioritization. For tasks that have clear
-deadlines, there probably isn't much the system should do (and likely it won't
-get clicked anyway). But for something that was entered several months ago as
-vaguely due around now, it bumps it out by a week or so. If it has already been
-deprioritized, maybe it pushes it further out. There are probably other ways
-this could get more sophisticated, and it would probably be worth it! The point
-is, figuring out what is relevant to show (and notify about) is perhaps subtle,
-but if done well potentially has a high payoff!
-
-### Current systems
-
-The best system I've found for iOS (if you have suggestions for something
-better, let me know!) is the app GoodTask, though it's certainly not perfect. In
-terms of what it does do: you can schedule specific deadlines, repeating
-patterns, and it does a great job of integrating the calendar (you have to go
-to the settings->preferences and uncheck "separate calendar events"; the default
-keeps them separate, which is particularly broken for the week view).
-
-The single day view shows overdue tasks, tasks that are due today, and calendar
-events. It uses the built-in Reminders for data storage (though, unlike
-Reminders, it doesn't show you everything, thankfully -- but using this data
-store has upsides: it means, for example, you can input reminders by voice) and
-Calendar (which is great). The location feature is limited to what the Reminders
-app does: you should be able to get notifications when you enter or leave a
-given location (though it's been unreliable for me, so I don't use it). This
-isn't exactly what I want (as I'd rather have the tasks be _filtered_ by
-location, like they are filtered by date). It has a nice subtask feature (but,
-it's minimal -- no sub-subtasks), which I've ended up using more than I would
-have thought (as I might have a list of things I need to do before leaving home
-and I can more compactly keep them organized this way).
-
-The main flaw is that it doesn't have a notion of vague deadlines (I don't know
-of any app that does, so this isn't an attack on it specifically), which means
-the most annoying part of it is moving tasks between days. For example, there is
-no way of having 10 tasks that should happen this week and having only a few
-show up at a time as earlier ones get done. I could put them all on Monday, but then Monday
-is an overwhelming mess, so more realistically I'll scatter them throughout the
-first couple days of the week. And then on Monday if I decide not to do a task,
-I'll bump it a few days forward. It works okay. And then if I want something to
-be hidden for a while, I need to put it as due the date when I want it to first
-reappear (as it will be totally invisible until that point).
-
-Because of the lack of location filtering, I don't actually find the
-notifications all that useful, as trying to figure out when to put notifications
-on tasks is difficult. The notifications are done via the built-in Reminders,
-which means your delay option is "delay 1 hr" or "delay 1 day", which isn't
-terrible, but isn't great (if a reminder hits in the morning and I want to do it
-at night, I'll be bouncing it every hour throughout the day). My work hours vary
-by day, and whether I'm working at home or commuting an hour to my office
-varies, and getting pointless notifications is _much_ worse than getting minimal
-notifications. As a result, I primarily rely on the app badge number, which is
-the number of overdue tasks, and I open up the app periodically throughout the
-day. Having to do that is another reason why re-scheduling tasks (and making
-sure tasks that are not going to be done today are not there) is so important.
-By the end of the day, there should be nothing that hasn't been done. Even if
-that means, towards the end of the day, bumping things I thought I'd get done to
-the next morning.
-
-GoodTask has a mechanism to filter tasks by various lists, but I've never used
-it. It's actually a pretty misleading aspect of their screenshots, as it makes
-it seem like there are features to support "@Home" and other seemingly
-sophisticated features, but they are just lists (that are detected by tags).
-Manually filtering is just a way for me to lose track of things, as I would
-forget I'm looking at a particular list.
-
-### Summary
-
-I've been using this system for maybe six months and it works pretty well --
-certainly better than not using it! There are some lingering flaws in GoodTask,
-but overall, I think it is working well enough that I've been spending less time
-worrying about whether I'm forgetting things (and, I'm pretty sure I'm actually
-getting the things done more quickly). In general, I think this space has had
-surprisingly little attention paid to it by big tech companies, given that it
-seems to play so well into their "personal assistant" marketing and the
-technical aspects don't actually seem terribly hard (less difficult than voice
-recognition, anyway!). Each of them can handle basic "remind me to do X at Y"
-(i.e., create the basic reminders they support), but seemingly have spent little
-energy figuring out when and how to present these tasks to the person that
-created them. Which makes them come off as cute technical demos: working when
-you create 5 reminders, not so much when you create 500. If they put a lot more
-effort into this, calling them "personal assistants" might not be so silly
-after all (though since they are intended primarily as advertising devices,
-maybe I shouldn't hold out hope).
diff --git a/site b/site
deleted file mode 100755
index ad96f9c..0000000
--- a/site
+++ /dev/null
@@ -1,168 +0,0 @@
-#!/usr/bin/env stack
--- stack --resolver lts-17.4 --install-ghc runghc -j1 --package hakyll
-{-# LANGUAGE OverloadedStrings #-}
--- lts-12.26
-module Main where
-
-import Control.Category (id)
-import Control.Monad (forM_)
-import qualified Data.ByteString.Lazy as LBS
-import Data.Monoid (mappend, mconcat, mempty, (<>))
-import qualified Data.Set as Set
-import qualified Data.Text as T
-import qualified Data.Text.Encoding as TE
-import Prelude hiding (id)
-import Text.Pandoc.Extensions (Extension (Ext_simple_tables),
- extensionsFromList)
-import Text.Pandoc.Options (readerExtensions)
-
-import Hakyll
-
-essayCtx :: Context String
-essayCtx = mconcat [modificationTimeField "modified" "%B %e, %Y",
- dateField "date" "%B %e, %Y",
- defaultContext]
-
-pageCtx :: String -> Context String
-pageCtx title = mconcat [constField "title" title,
- constField "modified" "unknown",
- defaultContext]
-
-gzip :: Item String -> Compiler (Item LBS.ByteString)
-gzip = withItemBody
- (unixFilterLBS "gzip" ["--best"]
- . LBS.fromStrict
- . TE.encodeUtf8
- . T.pack)
-
-pandoc = pandocCompilerWith defaultHakyllReaderOptions {readerExtensions = extensionsFromList [Ext_simple_tables]} defaultHakyllWriterOptions
-
-main :: IO ()
-main = hakyllWith config $ do
- -- Compress CSS
- match "css/*" $ do
- route idRoute
- compile compressCssCompiler
-
- -- Copy static files
- match "static/**" $ do
- route idRoute
- compile copyFileCompiler
- match "talks/**" $ do
- route idRoute
- compile copyFileCompiler
- match "pubs/**" $ do
- route idRoute
- compile copyFileCompiler
- match "posters/**" $ do
- route idRoute
- compile copyFileCompiler
- match "artifacts/**/*.js" $ do
- route idRoute
- compile $ getResourceBody >>= gzip
- match "artifacts/**" $ do
- route idRoute
- compile copyFileCompiler
-
- -- Render posts
- match "essays/*" $ do
- route $ setExtension ".html"
- compile $ pandocCompiler
- >>= loadAndApplyTemplate "templates/essay.html" essayCtx
- >>= saveSnapshot "content"
- >>= loadAndApplyTemplate "templates/default.html" essayCtx
- >>= relativizeUrls
-
- match "drafts/*" $ do
- route $ setExtension ".html"
- compile $ pandocCompiler
- >>= loadAndApplyTemplate "templates/essay.html" (essayCtx <> constField "date" "DRAFT")
- >>= saveSnapshot "content"
- >>= loadAndApplyTemplate "templates/default.html" essayCtx
- >>= relativizeUrls
-
- match "courses/*.markdown" $ do
- route (customRoute $ \ident -> let url = reverse (drop (length (".markdown" :: String)) (reverse (toFilePath ident))) in
- url <> "/index.html")
- compile $ pandocCompiler
- >>= loadAndApplyTemplate "templates/course.html" defaultContext
- >>= relativizeUrls
-
- -- Render essays list
- create ["essays.html"] $ do
- route idRoute
- compile $ do
- list <- essayList
- let ctx = pageCtx "essays" `mappend` constField "essays" list
- makeItem ""
- >>= loadAndApplyTemplate "templates/essays.html" ctx
- >>= loadAndApplyTemplate "templates/default.html" ctx
- >>= relativizeUrls
-
-
- -- Render static pages
- match "index.markdown" $ do
- route $ setExtension ".html"
- compile $ pandocCompiler
- >>= loadAndApplyTemplate "templates/default.html" (pageCtx "about")
- >>= relativizeUrls
- match "reading.markdown" $ do
- route $ setExtension ".html"
- compile $ pandocCompiler
- >>= loadAndApplyTemplate "templates/default.html" (pageCtx "reading")
- >>= relativizeUrls
- match "duck.markdown" $ do
- route $ constRoute "duck/index.html"
- compile $ pandocCompiler
- >>= loadAndApplyTemplate "templates/default.html" (pageCtx "")
- >>= relativizeUrls
-
- match "**/*.markdown" $ do
- route $ setExtension ".html"
- compile $ pandocCompiler
- >>= loadAndApplyTemplate "templates/default.html" (pageCtx "Apple 16")
- >>= relativizeUrls
-
- match "404.markdown" $ do
- route $ setExtension ".html"
- compile $ pandocCompiler
- >>= loadAndApplyTemplate "templates/default.html" (pageCtx "not-found")
- >>= relativizeUrls
-
- create ["rss.xml"] $ do
- route idRoute
- compile $ do
- let feedCtx = essayCtx `mappend` bodyField "description"
- posts <- fmap (take 10) . recentFirst =<<
- loadAllSnapshots "essays/*" "content"
- renderRss myFeedConfiguration feedCtx posts
-
-
- -- Read templates
- match "templates/*" $ compile templateCompiler
-
-myFeedConfiguration :: FeedConfiguration
-myFeedConfiguration = FeedConfiguration
- { feedTitle = "dbp.io :: essays"
- , feedDescription = "writing on programming etc by daniel patterson"
- , feedAuthorName = "Daniel Patterson"
- , feedAuthorEmail = "dbp@dbpmail.net"
- , feedRoot = "http://dbp.io"
- }
-
-config = defaultConfiguration
- { deployCommand = "aws s3 sync --exclude \"artifacts/*/*.js\" _site/ s3://dbp.io && aws s3 sync --exclude \"*\" --include \"artifacts/*/*.js\" --content-type \"application/javascript\" --content-encoding \"gzip\" _site/ s3://dbp.io"
- }
-
-essayList :: Compiler String
-essayList = do
- essays <- recentFirst =<< loadAll "essays/*"
- itemTpl <- loadBody "templates/essayitem.html"
- applyTemplateList itemTpl essayCtx essays
-
--- Local Variables:
--- mode: haskell
--- End:
diff --git a/soupault.toml b/soupault.toml
new file mode 100644
index 0000000..d13246c
--- /dev/null
+++ b/soupault.toml
@@ -0,0 +1,69 @@
+[settings]
+ # Stop on page processing errors?
+ strict = true
+
+ # Display progress?
+ verbose = true
+
+ # Display detailed debug output?
+ debug = false
+
+ # Where input files (pages and assets) are stored.
+ site_dir = "src"
+
+ # Where the output goes
+ build_dir = "_site"
+
+ # Pages with other extensions are considered static assets
+ # and copied to build/ unchanged
+ page_file_extensions = ["html", "md"]
+
+ # Files with these extensions are ignored.
+ ignore_extensions = ["draft"]
+
+ # Treat files as content to insert in the template,
+ # unless they have an <html> element in them.
+ generator_mode = true
+ complete_page_selector = "html"
+
+ # Use templates/main.html file for the page template.
+ default_template_file = "templates/main.html"
+
+ # The content will be inserted into its element,
+ # after its last already existing child.
+ default_content_selector = "main"
+ default_content_action = "append_child"
+
+ # Set the document type to HTML5, unless the page already has
+ # a doctype declaration.
+ doctype = "<!DOCTYPE html>"
+ keep_doctype = true
+
+ # Indent HTML tags for readability
+ pretty_print_html = true
+
+ # Translate site/about.html to build/about/index.html
+ # If set to false, then site/about.html will become build/about.html
+ clean_urls = false
+
+ # Look for plugin files in plugins/
+ plugin_discovery = true
+ plugin_dirs = ["plugins"]
+
+
+[preprocessors]
+ md = 'pandoc -f markdown -t html'
+
+
+[widgets.page-title]
+ widget = "title"
+ selector = "h1"
+ default = "dbp.io"
+ append = ""
+ prepend = "dbp.io :: "
+
+ # Insert a <title> element if a page doesn't have one
+ force = false
+
+ # Keep the existing <title> if it exists and isn't empty
+ keep = false
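The `[widgets.page-title]` section above controls how soupault builds each page's `<title>`: it takes the text of the first `h1`, prepends `"dbp.io :: "`, and falls back to the default `"dbp.io"` when no `h1` is found. As a rough sketch of that behavior (a hypothetical illustration, not soupault's actual implementation):

```python
def page_title(h1_text, default="dbp.io", prepend="dbp.io :: ", append=""):
    """Simplified sketch of the [widgets.page-title] settings above."""
    # No <h1> on the page: fall back to the default title.
    if not h1_text or not h1_text.strip():
        return default
    # Otherwise build the title from the first <h1>'s text.
    return prepend + h1_text.strip() + append

print(page_title("Page Not Found"))  # dbp.io :: Page Not Found
print(page_title(None))              # dbp.io
```

So the migrated `# Page Not Found` heading in `src/404.md` below would yield the browser title `dbp.io :: Page Not Found`.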
diff --git a/404.markdown b/src/404.md
similarity index 73%
rename from 404.markdown
rename to src/404.md
index b04dd73..bea66c6 100644
--- a/404.markdown
+++ b/src/404.md
@@ -1,4 +1,4 @@
-## Page Not Found
+# Page Not Found
The specified URL could not be found. I'm sorry!
diff --git a/apple/boat.markdown b/src/apple/boat.md
similarity index 100%
rename from apple/boat.markdown
rename to src/apple/boat.md
diff --git a/apple/building.markdown b/src/apple/building.md
similarity index 100%
rename from apple/building.markdown
rename to src/apple/building.md
diff --git a/apple/building/bowtank.markdown b/src/apple/building/bowtank.md
similarity index 100%
rename from apple/building/bowtank.markdown
rename to src/apple/building/bowtank.md
diff --git a/apple/building/gunwales.markdown b/src/apple/building/gunwales.md
similarity index 100%
rename from apple/building/gunwales.markdown
rename to src/apple/building/gunwales.md
diff --git a/apple/building/hull.markdown b/src/apple/building/hull.md
similarity index 100%
rename from apple/building/hull.markdown
rename to src/apple/building/hull.md
diff --git a/apple/building/stem.markdown b/src/apple/building/stem.md
similarity index 100%
rename from apple/building/stem.markdown
rename to src/apple/building/stem.md
diff --git a/apple/building/sterntank.markdown b/src/apple/building/sterntank.md
similarity index 100%
rename from apple/building/sterntank.markdown
rename to src/apple/building/sterntank.md
diff --git a/apple/comparison.markdown b/src/apple/comparison.md
similarity index 100%
rename from apple/comparison.markdown
rename to src/apple/comparison.md
diff --git a/apple/index.markdown b/src/apple/index.md
similarity index 100%
rename from apple/index.markdown
rename to src/apple/index.md
diff --git a/apple/others.markdown b/src/apple/others.md
similarity index 100%
rename from apple/others.markdown
rename to src/apple/others.md
diff --git a/artifacts/funtal/codemirror.css b/src/artifacts/funtal/codemirror.css
similarity index 100%
rename from artifacts/funtal/codemirror.css
rename to src/artifacts/funtal/codemirror.css
diff --git a/artifacts/funtal/codemirror.js b/src/artifacts/funtal/codemirror.js
similarity index 100%
rename from artifacts/funtal/codemirror.js
rename to src/artifacts/funtal/codemirror.js
diff --git a/artifacts/funtal/index.html b/src/artifacts/funtal/index.html
similarity index 100%
rename from artifacts/funtal/index.html
rename to src/artifacts/funtal/index.html
diff --git a/artifacts/funtal/matchbrackets.js b/src/artifacts/funtal/matchbrackets.js
similarity index 100%
rename from artifacts/funtal/matchbrackets.js
rename to src/artifacts/funtal/matchbrackets.js
diff --git a/artifacts/funtal/runmode.js b/src/artifacts/funtal/runmode.js
similarity index 100%
rename from artifacts/funtal/runmode.js
rename to src/artifacts/funtal/runmode.js
diff --git a/artifacts/funtal/simple.js b/src/artifacts/funtal/simple.js
similarity index 100%
rename from artifacts/funtal/simple.js
rename to src/artifacts/funtal/simple.js
diff --git a/artifacts/funtal/style.css b/src/artifacts/funtal/style.css
similarity index 100%
rename from artifacts/funtal/style.css
rename to src/artifacts/funtal/style.css
diff --git a/artifacts/funtal/web.js b/src/artifacts/funtal/web.js
similarity index 100%
rename from artifacts/funtal/web.js
rename to src/artifacts/funtal/web.js
diff --git a/css/default.css b/src/css/default.css
similarity index 100%
rename from css/default.css
rename to src/css/default.css
diff --git a/css/syntax.css b/src/css/syntax.css
similarity index 100%
rename from css/syntax.css
rename to src/css/syntax.css
diff --git a/duck.markdown b/src/duck.md
similarity index 100%
rename from duck.markdown
rename to src/duck.md
diff --git a/essays/2011-06-08-heist-async.markdown b/src/essays/2011-06-08-heist-async.md
similarity index 98%
rename from essays/2011-06-08-heist-async.markdown
rename to src/essays/2011-06-08-heist-async.md
index 458085f..1361451 100644
--- a/essays/2011-06-08-heist-async.markdown
+++ b/src/essays/2011-06-08-heist-async.md
@@ -1,8 +1,4 @@
----
-title: Declarative ajax - imagining Heist-Async
-author: Daniel Patterson
-date: June 8th, 2011
----
+# Declarative ajax - imagining Heist-Async
I've recently started working with Snap, the Haskell web framework (http://snapframework.com), and one reason (among many) for my switch from Ocsigen, a web framework written in OCaml (which I've written posts about before), was the desire to more flexibly handle ajax-based websites. While it seems good in some ways, I eventually decided that Ocsigen's emphasis on declaring services as having certain types (ie, a fragment of a page, a whole page, a redirect, etc) is in some ways at odds with the way the web works.
diff --git a/essays/2011-09-03-mercury-tidbits.markdown b/src/essays/2011-09-03-mercury-tidbits.md
similarity index 98%
rename from essays/2011-09-03-mercury-tidbits.markdown
rename to src/essays/2011-09-03-mercury-tidbits.md
index 8cedd7a..72316d3 100644
--- a/essays/2011-09-03-mercury-tidbits.markdown
+++ b/src/essays/2011-09-03-mercury-tidbits.md
@@ -1,8 +1,4 @@
----
-title: Mercury tidbits - dependent types and file io
-author: Daniel Patterson
-date: September 3rd and 11th, 2011
----
+# Mercury tidbits - dependent types and file io
Note: this was originally posted as two separate parts, 1 week apart, and has been compressed for posterity
@@ -138,4 +134,4 @@ The first line tries to open the file at the path and bind it as the current inp
What is interesting about this code is that while it is written in the form of logical statements, it feels very much like the way one does I/O in Haskell - probably a bit of that is my own bias (as a Haskell programmer, I am likely to write everything like I would write Haskell code, kind of like how my python code always ends up with lambdas and maps in it), but it also is probably a function of the fact that doing I/O in a statically typed pure language is going to always be pretty similar - lots of dealing with error conditions, and not much else!
-Anyhow, this was just a tiny bit of code, but it is a predicate that is immediately useful, especially when trying to use Mercury for random scripting tasks (what I often do with new languages, regardless of their reputed ability for scripting).
\ No newline at end of file
+Anyhow, this was just a tiny bit of code, but it is a predicate that is immediately useful, especially when trying to use Mercury for random scripting tasks (what I often do with new languages, regardless of their reputed ability for scripting).
diff --git a/essays/2011-09-15-ios-anti-unix.markdown b/src/essays/2011-09-15-ios-anti-unix.md
similarity index 96%
rename from essays/2011-09-15-ios-anti-unix.markdown
rename to src/essays/2011-09-15-ios-anti-unix.md
index 93c3c5c..9a7bd57 100644
--- a/essays/2011-09-15-ios-anti-unix.markdown
+++ b/src/essays/2011-09-15-ios-anti-unix.md
@@ -1,8 +1,5 @@
----
-title: iOS is anti-UNIX and anti-programmer.
-author: Daniel Patterson
-date: September 15th, 2011
----
+# iOS is anti-UNIX and anti-programmer.
+
When I was first learning about UNIX, and learning to use Linux, the most immediately powerful tool that I found was the shell's pipe operator, '|'. Using the command line (because at that point, Linux GUIs were not so well developed, and the few distros that tried to allow strictly graphical operation usually failed miserably) was at times difficult, and at times rewarding, but it was the pipe that opened up a whole world for me.
I can remember looking through an online student directory in high school that had names, email addresses, etc. For student government elections it had become popular (if incredibly time consuming) to copy and paste the hundreds of email addresses and send a message to every student. For me, with my newfound skills, it amounted to something like:
@@ -15,4 +12,4 @@ The more I use graphical interfaces (or anything that does not operate on text s
What I realized the other day, is that iOS is the extreme example of that lack of flexibility, taken almost to the point of caricature - the only interaction that is possible is through single applications that for the most part can have no connection to other applications. People rejoiced when copy and paste was added, but that celebration hides a sad loss of the true power that computers have. The existence of files - the only real way that composability is achieved in GUI systems (ie, do one thing, save the file, open with another program, etc) - has been essentially eliminated, and applications must therefore do everything that a user might want to do with whatever data they have or will get from the user.
-I'd noticed before how frustrating it was for me to use iOS, but I wasn't sure exactly why that was, until I realized that it had effectively taken away the one thing that is so fundamental about computers, and why I am a programmer - the ability to compose. Every day I live and breathe abstraction, and building things out of different levels of it, and the idea of not being able to combine various parts to make new things is so antithetical to that type of thinking that I almost can't imagine that iOS was created by programmers. I remember looking at the technical specifications of the most recent iPhone and thinking - that is a full computer, and it's small enough to fit in a pocket - that is a profound change in the way the world works. But it's not a computer, it's just a glorified palm pilot with a few bells and whistles.
\ No newline at end of file
+I'd noticed before how frustrating it was for me to use iOS, but I wasn't sure exactly why that was, until I realized that it had effectively taken away the one thing that is so fundamental about computers, and why I am a programmer - the ability to compose. Every day I live and breathe abstraction, and building things out of different levels of it, and the idea of not being able to combine various parts to make new things is so antithetical to that type of thinking that I almost can't imagine that iOS was created by programmers. I remember looking at the technical specifications of the most recent iPhone and thinking - that is a full computer, and it's small enough to fit in a pocket - that is a profound change in the way the world works. But it's not a computer, it's just a glorified palm pilot with a few bells and whistles.
diff --git a/essays/2011-12-06-science-scheme.markdown b/src/essays/2011-12-06-science-scheme.md
similarity index 98%
rename from essays/2011-12-06-science-scheme.markdown
rename to src/essays/2011-12-06-science-scheme.md
index a04de82..b0fe335 100644
--- a/essays/2011-12-06-science-scheme.markdown
+++ b/src/essays/2011-12-06-science-scheme.md
@@ -1,8 +1,4 @@
----
-title: Math/Science integrated with Scheme
-author: Daniel Patterson
-date: December 6th, 2011
----
+# Math/Science integrated with Scheme
I had an idea today, of an interactive homework assignment for a Chemistry class. It was a prompt, and you could type in queries and it would give responses. The basics would be:
diff --git a/essays/2012-04-26-haskell-snap-productive.markdown b/src/essays/2012-04-26-haskell-snap-productive.md
similarity index 99%
rename from essays/2012-04-26-haskell-snap-productive.markdown
rename to src/essays/2012-04-26-haskell-snap-productive.md
index cf39b5d..e14a27a 100644
--- a/essays/2012-04-26-haskell-snap-productive.markdown
+++ b/src/essays/2012-04-26-haskell-snap-productive.md
@@ -1,8 +1,5 @@
----
-title: Haskell / Snap ecosystem is as productive as Ruby/Rails.
-author: Daniel Patterson
-date: April 26th, 2012
----
+# Haskell / Snap ecosystem is as productive as Ruby/Rails.
+
This may be controversial, and all of the usual disclaimers apply - this is based on my own experience using both of the languages/frameworks to do real work on real projects. Your mileage may vary. Because this is something that has the potential to spiral into vague comparisons, I am going to try to compare points directly, based on things that I’ve experienced. I am not going to say “I like Haskell better” or anything like that, because the point of this is not so much to convince people about the various merits of the languages involved, just to point out that I’ve found that they both are as productive (or that Snap feels more so). For Haskell programmers, this could be an indication to try out the web tools that you have available, especially if you are usually a Rails developer.
As a note - some of this could also apply to other haskell web frameworks (in particular, most of this pertains to happstack, and some pertains to yesod), but since Snap is what I use, I want to keep it based on my own personal experience.
@@ -51,4 +48,4 @@ I think part of this probably has to do with the host languages - ruby is a very
The Haskell community is surprisingly productive given its size (and some of the tools it has produced are amazing - examples mentioned in this comparison are Parsec, QuickCheck, Digestive-Functors, etc), but there is some sense where they will always be at a disadvantage. This means that if you are doing any sort of common task with Rails, there will probably be a Gem that does it. The unfortunate part is that sometimes the Gem will be unmaintained, partially broken, or incompatible, as the quality varies widely. This is a place where a lot of subjectivity comes in - I have found that most of what I need exists in the haskell ecosystem, and if something doesn't, it isn't hard to write libraries, but this could be a big dealbreaker for some people.
-Cheers, and happy web programming.
\ No newline at end of file
+Cheers, and happy web programming.
diff --git a/essays/2012-10-24-programming-literature.markdown b/src/essays/2012-10-24-programming-literature.md
similarity index 97%
rename from essays/2012-10-24-programming-literature.markdown
rename to src/essays/2012-10-24-programming-literature.md
index 8fee8e0..071aab1 100644
--- a/essays/2012-10-24-programming-literature.markdown
+++ b/src/essays/2012-10-24-programming-literature.md
@@ -1,8 +1,4 @@
----
-title: Programming as Literature
-author: Daniel Patterson
-date: October 24th, 2012
----
+# Programming as Literature
Sometimes I'm not sure how to explain what I study or why I study it. I tell people that I study theoretical computer science, or algorithms and programming languages, or math and computer science, and if they ask why? Let's come back to that. First I want to talk about literacy.
@@ -20,4 +16,4 @@ There is a third dimension that is similarly interesting, and talked about more,
What is interesting and sad is that while the possession of computers is expanding rapidly, the knowledge of how to truly use them is not. People are sold devices that allow them to perform a set number of functions (all of which are simply repetitions of thoughts by the people working at the company who sold them the device), but they are not given the tools to express their own thoughts, to expand their own mental capacity in any way other than that already thought of by someone else. We have expanded the medium without expanding literacy. And indeed, there is a financial explanation for this. It's hard to sell knowledge when people can create it themselves. Many technological "innovations" these days are trivial combinations of earlier ideas which would be unnecessary if people were able to carry out those kinds of compositions themselves.
-So why am I interested in computer science? I'm interested in it because I am interested in human thought. I am interested in how people solve problems, and seeing problems that others have solved. I am interested in teaching people how to express themselves in this medium, and learning it myself. I study programming as literature, to read, to write, to share. I study it to figure out the world we live in, and imagine how else it could be.
\ No newline at end of file
+So why am I interested in computer science? I'm interested in it because I am interested in human thought. I am interested in how people solve problems, and seeing problems that others have solved. I am interested in teaching people how to express themselves in this medium, and learning it myself. I study programming as literature, to read, to write, to share. I study it to figure out the world we live in, and imagine how else it could be.
diff --git a/essays/2013-05-21-literate-urweb-adventure.markdown b/src/essays/2013-05-21-literate-urweb-adventure.md
similarity index 99%
rename from essays/2013-05-21-literate-urweb-adventure.markdown
rename to src/essays/2013-05-21-literate-urweb-adventure.md
index 56b056d..8945bd3 100644
--- a/essays/2013-05-21-literate-urweb-adventure.markdown
+++ b/src/essays/2013-05-21-literate-urweb-adventure.md
@@ -1,8 +1,4 @@
----
-title: A Literate Ur/Web Adventure
-author: Daniel Patterson
-date: May 21th, 2013
----
+# A Literate Ur/Web Adventure
[Ur/Web](http://www.impredicative.com/ur/) is a language / framework
for web programming that both makes it really hard to write code with
diff --git a/essays/2013-06-29-hackers-replacement-for-gmail.markdown b/src/essays/2013-06-29-hackers-replacement-for-gmail.md
similarity index 99%
rename from essays/2013-06-29-hackers-replacement-for-gmail.markdown
rename to src/essays/2013-06-29-hackers-replacement-for-gmail.md
index b8b5bb2..75f5969 100644
--- a/essays/2013-06-29-hackers-replacement-for-gmail.markdown
+++ b/src/essays/2013-06-29-hackers-replacement-for-gmail.md
@@ -1,7 +1,4 @@
----
-title: A Hacker's Replacement for GMail
-author: Daniel Patterson
----
+# A Hacker's Replacement for GMail
_Note: Since writing this I've replaced Exim with Postfix and Courier with Dovecot. This is outlined in the Addendum, but the main text is unchanged. Please read the whole guide before starting, as you can skip some of the steps and go straight to the final system._
diff --git a/essays/2014-10-05-why-test-in-haskell.markdown b/src/essays/2014-10-05-why-test-in-haskell.md
similarity index 99%
rename from essays/2014-10-05-why-test-in-haskell.markdown
rename to src/essays/2014-10-05-why-test-in-haskell.md
index d259fcd..13e6ddb 100644
--- a/essays/2014-10-05-why-test-in-haskell.markdown
+++ b/src/essays/2014-10-05-why-test-in-haskell.md
@@ -1,8 +1,4 @@
----
-title: Why test in Haskell?
-author: Daniel Patterson
-date: October 5th, 2014
----
+# Why test in Haskell?
Every so often, the question comes up, should you test in Haskell, and
if so, how should you do it?
diff --git a/essays/2018-01-01-home-backups.markdown b/src/essays/2018-01-01-home-backups.md
similarity index 99%
rename from essays/2018-01-01-home-backups.markdown
rename to src/essays/2018-01-01-home-backups.md
index e8530dc..ba729f7 100644
--- a/essays/2018-01-01-home-backups.markdown
+++ b/src/essays/2018-01-01-home-backups.md
@@ -1,8 +1,4 @@
----
-title: (Cheap) home backups
-author: Daniel Patterson
-date: January 1st, 2017
----
+# (Cheap) home backups
Backing things up is important. Some stuff, like code that lives in
repositories, may naturally end up in many places, so it perhaps is less
diff --git a/essays/2018-01-16-how-to-prove-a-compiler-correct.markdown b/src/essays/2018-01-16-how-to-prove-a-compiler-correct.md
similarity index 99%
rename from essays/2018-01-16-how-to-prove-a-compiler-correct.markdown
rename to src/essays/2018-01-16-how-to-prove-a-compiler-correct.md
index 228b98a..acb0e8d 100644
--- a/essays/2018-01-16-how-to-prove-a-compiler-correct.markdown
+++ b/src/essays/2018-01-16-how-to-prove-a-compiler-correct.md
@@ -1,7 +1,4 @@
----
-title: How to prove a compiler correct
-author: Daniel Patterson
----
+# How to prove a compiler correct
At POPL'18 (Principles of Programming Languages) last week, I ended up talking
to [Annie Cherkaev](https://anniecherkaev.com) about her really cool DSL (domain
diff --git a/essays/2018-04-19-how-to-prove-a-compiler-fully-abstract.md b/src/essays/2018-04-19-how-to-prove-a-compiler-fully-abstract.md
similarity index 99%
rename from essays/2018-04-19-how-to-prove-a-compiler-fully-abstract.md
rename to src/essays/2018-04-19-how-to-prove-a-compiler-fully-abstract.md
index 94c4540..02c21b2 100644
--- a/essays/2018-04-19-how-to-prove-a-compiler-fully-abstract.md
+++ b/src/essays/2018-04-19-how-to-prove-a-compiler-fully-abstract.md
@@ -1,7 +1,4 @@
----
-title: How to prove a compiler fully abstract
-author: Daniel Patterson
----
+# How to prove a compiler fully abstract
A compiler that preserves and reflects equivalences is called a **fully
abstract** compiler. This is a powerful property for a compiler that is
diff --git a/index.markdown b/src/index.md
similarity index 100%
rename from index.markdown
rename to src/index.md
diff --git a/posters/linking-types-popl2017-src.pdf b/src/posters/linking-types-popl2017-src.pdf
similarity index 100%
rename from posters/linking-types-popl2017-src.pdf
rename to src/posters/linking-types-popl2017-src.pdf
diff --git a/posters/phantom-contracts-popl2019-src.pdf b/src/posters/phantom-contracts-popl2019-src.pdf
similarity index 100%
rename from posters/phantom-contracts-popl2019-src.pdf
rename to src/posters/phantom-contracts-popl2019-src.pdf
diff --git a/projects.markdown b/src/projects.md
similarity index 100%
rename from projects.markdown
rename to src/projects.md
diff --git a/pubs/2013/lambda-py-appendix-oopsla.pdf b/src/pubs/2013/lambda-py-appendix-oopsla.pdf
similarity index 100%
rename from pubs/2013/lambda-py-appendix-oopsla.pdf
rename to src/pubs/2013/lambda-py-appendix-oopsla.pdf
diff --git a/pubs/2013/lambda-py-oopsla.pdf b/src/pubs/2013/lambda-py-oopsla.pdf
similarity index 100%
rename from pubs/2013/lambda-py-oopsla.pdf
rename to src/pubs/2013/lambda-py-oopsla.pdf
diff --git a/pubs/2014/captainteach-iticse.pdf b/src/pubs/2014/captainteach-iticse.pdf
similarity index 100%
rename from pubs/2014/captainteach-iticse.pdf
rename to src/pubs/2014/captainteach-iticse.pdf
diff --git a/pubs/2016/linking-types-poplsrc2017-proposal.pdf b/src/pubs/2016/linking-types-poplsrc2017-proposal.pdf
similarity index 100%
rename from pubs/2016/linking-types-poplsrc2017-proposal.pdf
rename to src/pubs/2016/linking-types-poplsrc2017-proposal.pdf
diff --git a/pubs/2017/funtal-tr.pdf b/src/pubs/2017/funtal-tr.pdf
similarity index 100%
rename from pubs/2017/funtal-tr.pdf
rename to src/pubs/2017/funtal-tr.pdf
diff --git a/pubs/2017/funtal.pdf b/src/pubs/2017/funtal.pdf
similarity index 100%
rename from pubs/2017/funtal.pdf
rename to src/pubs/2017/funtal.pdf
diff --git a/pubs/2017/linking-types-snapl-submission.pdf b/src/pubs/2017/linking-types-snapl-submission.pdf
similarity index 100%
rename from pubs/2017/linking-types-snapl-submission.pdf
rename to src/pubs/2017/linking-types-snapl-submission.pdf
diff --git a/pubs/2017/linking-types-snapl.pdf b/src/pubs/2017/linking-types-snapl.pdf
similarity index 100%
rename from pubs/2017/linking-types-snapl.pdf
rename to src/pubs/2017/linking-types-snapl.pdf
diff --git a/pubs/2017/linking-types.pdf b/src/pubs/2017/linking-types.pdf
similarity index 100%
rename from pubs/2017/linking-types.pdf
rename to src/pubs/2017/linking-types.pdf
diff --git a/pubs/2018/phantom-contracts-src.pdf b/src/pubs/2018/phantom-contracts-src.pdf
similarity index 100%
rename from pubs/2018/phantom-contracts-src.pdf
rename to src/pubs/2018/phantom-contracts-src.pdf
diff --git a/pubs/2018/rust-distilled.pdf b/src/pubs/2018/rust-distilled.pdf
similarity index 100%
rename from pubs/2018/rust-distilled.pdf
rename to src/pubs/2018/rust-distilled.pdf
diff --git a/pubs/2019/ccc.pdf b/src/pubs/2019/ccc.pdf
similarity index 100%
rename from pubs/2019/ccc.pdf
rename to src/pubs/2019/ccc.pdf
diff --git a/pubs/2019/ccc/index.html b/src/pubs/2019/ccc/index.html
similarity index 100%
rename from pubs/2019/ccc/index.html
rename to src/pubs/2019/ccc/index.html
diff --git a/pubs/2019/ccc/proofs.zip b/src/pubs/2019/ccc/proofs.zip
similarity index 100%
rename from pubs/2019/ccc/proofs.zip
rename to src/pubs/2019/ccc/proofs.zip
diff --git a/pubs/2020/soundffi-draft.pdf b/src/pubs/2020/soundffi-draft.pdf
similarity index 100%
rename from pubs/2020/soundffi-draft.pdf
rename to src/pubs/2020/soundffi-draft.pdf
diff --git a/pubs/2020/soundffi-wgt.pdf b/src/pubs/2020/soundffi-wgt.pdf
similarity index 100%
rename from pubs/2020/soundffi-wgt.pdf
rename to src/pubs/2020/soundffi-wgt.pdf
diff --git a/pubs/2021/semanticinterop-draft.pdf b/src/pubs/2021/semanticinterop-draft.pdf
similarity index 100%
rename from pubs/2021/semanticinterop-draft.pdf
rename to src/pubs/2021/semanticinterop-draft.pdf
diff --git a/reading.markdown b/src/reading.md
similarity index 100%
rename from reading.markdown
rename to src/reading.md
diff --git a/static/apple/IMG_0100.jpg b/src/static/apple/IMG_0100.jpg
similarity index 100%
rename from static/apple/IMG_0100.jpg
rename to src/static/apple/IMG_0100.jpg
diff --git a/static/apple/IMG_0103.jpg b/src/static/apple/IMG_0103.jpg
similarity index 100%
rename from static/apple/IMG_0103.jpg
rename to src/static/apple/IMG_0103.jpg
diff --git a/static/apple/IMG_0106.jpg b/src/static/apple/IMG_0106.jpg
similarity index 100%
rename from static/apple/IMG_0106.jpg
rename to src/static/apple/IMG_0106.jpg
diff --git a/static/apple/IMG_0107.jpg b/src/static/apple/IMG_0107.jpg
similarity index 100%
rename from static/apple/IMG_0107.jpg
rename to src/static/apple/IMG_0107.jpg
diff --git a/static/apple/IMG_0110.jpg b/src/static/apple/IMG_0110.jpg
similarity index 100%
rename from static/apple/IMG_0110.jpg
rename to src/static/apple/IMG_0110.jpg
diff --git a/static/apple/IMG_0117.jpg b/src/static/apple/IMG_0117.jpg
similarity index 100%
rename from static/apple/IMG_0117.jpg
rename to src/static/apple/IMG_0117.jpg
diff --git a/static/apple/IMG_0118.jpg b/src/static/apple/IMG_0118.jpg
similarity index 100%
rename from static/apple/IMG_0118.jpg
rename to src/static/apple/IMG_0118.jpg
diff --git a/static/apple/IMG_0124.jpeg b/src/static/apple/IMG_0124.jpeg
similarity index 100%
rename from static/apple/IMG_0124.jpeg
rename to src/static/apple/IMG_0124.jpeg
diff --git a/static/apple/IMG_0128.jpg b/src/static/apple/IMG_0128.jpg
similarity index 100%
rename from static/apple/IMG_0128.jpg
rename to src/static/apple/IMG_0128.jpg
diff --git a/static/apple/IMG_0130.jpg b/src/static/apple/IMG_0130.jpg
similarity index 100%
rename from static/apple/IMG_0130.jpg
rename to src/static/apple/IMG_0130.jpg
diff --git a/static/apple/IMG_0154.jpeg b/src/static/apple/IMG_0154.jpeg
similarity index 100%
rename from static/apple/IMG_0154.jpeg
rename to src/static/apple/IMG_0154.jpeg
diff --git a/static/apple/IMG_0159.jpeg b/src/static/apple/IMG_0159.jpeg
similarity index 100%
rename from static/apple/IMG_0159.jpeg
rename to src/static/apple/IMG_0159.jpeg
diff --git a/static/apple/IMG_0160.jpeg b/src/static/apple/IMG_0160.jpeg
similarity index 100%
rename from static/apple/IMG_0160.jpeg
rename to src/static/apple/IMG_0160.jpeg
diff --git a/static/apple/IMG_0164.jpeg b/src/static/apple/IMG_0164.jpeg
similarity index 100%
rename from static/apple/IMG_0164.jpeg
rename to src/static/apple/IMG_0164.jpeg
diff --git a/static/apple/IMG_0165.jpeg b/src/static/apple/IMG_0165.jpeg
similarity index 100%
rename from static/apple/IMG_0165.jpeg
rename to src/static/apple/IMG_0165.jpeg
diff --git a/static/apple/IMG_0185.jpeg b/src/static/apple/IMG_0185.jpeg
similarity index 100%
rename from static/apple/IMG_0185.jpeg
rename to src/static/apple/IMG_0185.jpeg
diff --git a/static/apple/IMG_0190.jpeg b/src/static/apple/IMG_0190.jpeg
similarity index 100%
rename from static/apple/IMG_0190.jpeg
rename to src/static/apple/IMG_0190.jpeg
diff --git a/static/apple/IMG_0195.jpeg b/src/static/apple/IMG_0195.jpeg
similarity index 100%
rename from static/apple/IMG_0195.jpeg
rename to src/static/apple/IMG_0195.jpeg
diff --git a/static/apple/IMG_0232.jpeg b/src/static/apple/IMG_0232.jpeg
similarity index 100%
rename from static/apple/IMG_0232.jpeg
rename to src/static/apple/IMG_0232.jpeg
diff --git a/static/apple/IMG_0235.jpeg b/src/static/apple/IMG_0235.jpeg
similarity index 100%
rename from static/apple/IMG_0235.jpeg
rename to src/static/apple/IMG_0235.jpeg
diff --git a/static/apple/IMG_0237.jpeg b/src/static/apple/IMG_0237.jpeg
similarity index 100%
rename from static/apple/IMG_0237.jpeg
rename to src/static/apple/IMG_0237.jpeg
diff --git a/static/apple/IMG_0239.jpeg b/src/static/apple/IMG_0239.jpeg
similarity index 100%
rename from static/apple/IMG_0239.jpeg
rename to src/static/apple/IMG_0239.jpeg
diff --git a/static/apple/IMG_0254.jpeg b/src/static/apple/IMG_0254.jpeg
similarity index 100%
rename from static/apple/IMG_0254.jpeg
rename to src/static/apple/IMG_0254.jpeg
diff --git a/static/apple/IMG_0261.jpeg b/src/static/apple/IMG_0261.jpeg
similarity index 100%
rename from static/apple/IMG_0261.jpeg
rename to src/static/apple/IMG_0261.jpeg
diff --git a/static/apple/IMG_0262.jpeg b/src/static/apple/IMG_0262.jpeg
similarity index 100%
rename from static/apple/IMG_0262.jpeg
rename to src/static/apple/IMG_0262.jpeg
diff --git a/static/apple/IMG_0271.jpeg b/src/static/apple/IMG_0271.jpeg
similarity index 100%
rename from static/apple/IMG_0271.jpeg
rename to src/static/apple/IMG_0271.jpeg
diff --git a/static/apple/IMG_0274.jpeg b/src/static/apple/IMG_0274.jpeg
similarity index 100%
rename from static/apple/IMG_0274.jpeg
rename to src/static/apple/IMG_0274.jpeg
diff --git a/static/apple/IMG_0277.jpeg b/src/static/apple/IMG_0277.jpeg
similarity index 100%
rename from static/apple/IMG_0277.jpeg
rename to src/static/apple/IMG_0277.jpeg
diff --git a/static/apple/IMG_0281.jpeg b/src/static/apple/IMG_0281.jpeg
similarity index 100%
rename from static/apple/IMG_0281.jpeg
rename to src/static/apple/IMG_0281.jpeg
diff --git a/static/apple/IMG_0282.jpeg b/src/static/apple/IMG_0282.jpeg
similarity index 100%
rename from static/apple/IMG_0282.jpeg
rename to src/static/apple/IMG_0282.jpeg
diff --git a/static/apple/IMG_0285.jpeg b/src/static/apple/IMG_0285.jpeg
similarity index 100%
rename from static/apple/IMG_0285.jpeg
rename to src/static/apple/IMG_0285.jpeg
diff --git a/static/apple/IMG_0286.jpeg b/src/static/apple/IMG_0286.jpeg
similarity index 100%
rename from static/apple/IMG_0286.jpeg
rename to src/static/apple/IMG_0286.jpeg
diff --git a/static/apple/IMG_0290.jpeg b/src/static/apple/IMG_0290.jpeg
similarity index 100%
rename from static/apple/IMG_0290.jpeg
rename to src/static/apple/IMG_0290.jpeg
diff --git a/static/apple/IMG_0293.jpeg b/src/static/apple/IMG_0293.jpeg
similarity index 100%
rename from static/apple/IMG_0293.jpeg
rename to src/static/apple/IMG_0293.jpeg
diff --git a/static/apple/IMG_0294.jpeg b/src/static/apple/IMG_0294.jpeg
similarity index 100%
rename from static/apple/IMG_0294.jpeg
rename to src/static/apple/IMG_0294.jpeg
diff --git a/static/apple/IMG_0316.jpeg b/src/static/apple/IMG_0316.jpeg
similarity index 100%
rename from static/apple/IMG_0316.jpeg
rename to src/static/apple/IMG_0316.jpeg
diff --git a/static/apple/IMG_0328.jpeg b/src/static/apple/IMG_0328.jpeg
similarity index 100%
rename from static/apple/IMG_0328.jpeg
rename to src/static/apple/IMG_0328.jpeg
diff --git a/static/apple/IMG_0333.jpeg b/src/static/apple/IMG_0333.jpeg
similarity index 100%
rename from static/apple/IMG_0333.jpeg
rename to src/static/apple/IMG_0333.jpeg
diff --git a/static/apple/IMG_0349.jpeg b/src/static/apple/IMG_0349.jpeg
similarity index 100%
rename from static/apple/IMG_0349.jpeg
rename to src/static/apple/IMG_0349.jpeg
diff --git a/static/apple/IMG_0354.jpeg b/src/static/apple/IMG_0354.jpeg
similarity index 100%
rename from static/apple/IMG_0354.jpeg
rename to src/static/apple/IMG_0354.jpeg
diff --git a/static/apple/IMG_0355.jpeg b/src/static/apple/IMG_0355.jpeg
similarity index 100%
rename from static/apple/IMG_0355.jpeg
rename to src/static/apple/IMG_0355.jpeg
diff --git a/static/apple/IMG_0365.jpeg b/src/static/apple/IMG_0365.jpeg
similarity index 100%
rename from static/apple/IMG_0365.jpeg
rename to src/static/apple/IMG_0365.jpeg
diff --git a/static/apple/IMG_0367.jpg b/src/static/apple/IMG_0367.jpg
similarity index 100%
rename from static/apple/IMG_0367.jpg
rename to src/static/apple/IMG_0367.jpg
diff --git a/static/apple/IMG_0477.jpeg b/src/static/apple/IMG_0477.jpeg
similarity index 100%
rename from static/apple/IMG_0477.jpeg
rename to src/static/apple/IMG_0477.jpeg
diff --git a/static/apple/IMG_0485.jpeg b/src/static/apple/IMG_0485.jpeg
similarity index 100%
rename from static/apple/IMG_0485.jpeg
rename to src/static/apple/IMG_0485.jpeg
diff --git a/static/apple/IMG_0491.jpeg b/src/static/apple/IMG_0491.jpeg
similarity index 100%
rename from static/apple/IMG_0491.jpeg
rename to src/static/apple/IMG_0491.jpeg
diff --git a/static/apple/IMG_0516.jpeg b/src/static/apple/IMG_0516.jpeg
similarity index 100%
rename from static/apple/IMG_0516.jpeg
rename to src/static/apple/IMG_0516.jpeg
diff --git a/static/apple/IMG_0524.jpeg b/src/static/apple/IMG_0524.jpeg
similarity index 100%
rename from static/apple/IMG_0524.jpeg
rename to src/static/apple/IMG_0524.jpeg
diff --git a/static/apple/IMG_0531.jpeg b/src/static/apple/IMG_0531.jpeg
similarity index 100%
rename from static/apple/IMG_0531.jpeg
rename to src/static/apple/IMG_0531.jpeg
diff --git a/static/apple/IMG_0534.jpeg b/src/static/apple/IMG_0534.jpeg
similarity index 100%
rename from static/apple/IMG_0534.jpeg
rename to src/static/apple/IMG_0534.jpeg
diff --git a/static/apple/IMG_0537.jpeg b/src/static/apple/IMG_0537.jpeg
similarity index 100%
rename from static/apple/IMG_0537.jpeg
rename to src/static/apple/IMG_0537.jpeg
diff --git a/static/apple/IMG_0538.jpeg b/src/static/apple/IMG_0538.jpeg
similarity index 100%
rename from static/apple/IMG_0538.jpeg
rename to src/static/apple/IMG_0538.jpeg
diff --git a/static/apple/IMG_0553.jpeg b/src/static/apple/IMG_0553.jpeg
similarity index 100%
rename from static/apple/IMG_0553.jpeg
rename to src/static/apple/IMG_0553.jpeg
diff --git a/static/apple/IMG_0560.jpeg b/src/static/apple/IMG_0560.jpeg
similarity index 100%
rename from static/apple/IMG_0560.jpeg
rename to src/static/apple/IMG_0560.jpeg
diff --git a/static/apple/IMG_0561-CROP.jpeg b/src/static/apple/IMG_0561-CROP.jpeg
similarity index 100%
rename from static/apple/IMG_0561-CROP.jpeg
rename to src/static/apple/IMG_0561-CROP.jpeg
diff --git a/static/apple/IMG_0561.jpeg b/src/static/apple/IMG_0561.jpeg
similarity index 100%
rename from static/apple/IMG_0561.jpeg
rename to src/static/apple/IMG_0561.jpeg
diff --git a/static/apple/IMG_0589.jpeg b/src/static/apple/IMG_0589.jpeg
similarity index 100%
rename from static/apple/IMG_0589.jpeg
rename to src/static/apple/IMG_0589.jpeg
diff --git a/static/apple/IMG_0594.jpeg b/src/static/apple/IMG_0594.jpeg
similarity index 100%
rename from static/apple/IMG_0594.jpeg
rename to src/static/apple/IMG_0594.jpeg
diff --git a/static/apple/IMG_0612.jpeg b/src/static/apple/IMG_0612.jpeg
similarity index 100%
rename from static/apple/IMG_0612.jpeg
rename to src/static/apple/IMG_0612.jpeg
diff --git a/static/apple/IMG_0617.jpeg b/src/static/apple/IMG_0617.jpeg
similarity index 100%
rename from static/apple/IMG_0617.jpeg
rename to src/static/apple/IMG_0617.jpeg
diff --git a/static/apple/IMG_0618.jpeg b/src/static/apple/IMG_0618.jpeg
similarity index 100%
rename from static/apple/IMG_0618.jpeg
rename to src/static/apple/IMG_0618.jpeg
diff --git a/static/apple/IMG_0624.jpeg b/src/static/apple/IMG_0624.jpeg
similarity index 100%
rename from static/apple/IMG_0624.jpeg
rename to src/static/apple/IMG_0624.jpeg
diff --git a/static/apple/IMG_0625.jpeg b/src/static/apple/IMG_0625.jpeg
similarity index 100%
rename from static/apple/IMG_0625.jpeg
rename to src/static/apple/IMG_0625.jpeg
diff --git a/static/apple/IMG_0631.jpeg b/src/static/apple/IMG_0631.jpeg
similarity index 100%
rename from static/apple/IMG_0631.jpeg
rename to src/static/apple/IMG_0631.jpeg
diff --git a/static/apple/IMG_0637.jpeg b/src/static/apple/IMG_0637.jpeg
similarity index 100%
rename from static/apple/IMG_0637.jpeg
rename to src/static/apple/IMG_0637.jpeg
diff --git a/static/apple/IMG_0649.jpeg b/src/static/apple/IMG_0649.jpeg
similarity index 100%
rename from static/apple/IMG_0649.jpeg
rename to src/static/apple/IMG_0649.jpeg
diff --git a/static/apple/IMG_0650.jpeg b/src/static/apple/IMG_0650.jpeg
similarity index 100%
rename from static/apple/IMG_0650.jpeg
rename to src/static/apple/IMG_0650.jpeg
diff --git a/static/apple/IMG_0664.jpeg b/src/static/apple/IMG_0664.jpeg
similarity index 100%
rename from static/apple/IMG_0664.jpeg
rename to src/static/apple/IMG_0664.jpeg
diff --git a/static/apple/IMG_0685.jpeg b/src/static/apple/IMG_0685.jpeg
similarity index 100%
rename from static/apple/IMG_0685.jpeg
rename to src/static/apple/IMG_0685.jpeg
diff --git a/static/apple/IMG_1194.jpeg b/src/static/apple/IMG_1194.jpeg
similarity index 100%
rename from static/apple/IMG_1194.jpeg
rename to src/static/apple/IMG_1194.jpeg
diff --git a/static/apple/IMG_1435.jpeg b/src/static/apple/IMG_1435.jpeg
similarity index 100%
rename from static/apple/IMG_1435.jpeg
rename to src/static/apple/IMG_1435.jpeg
diff --git a/static/apple/IMG_1437.jpeg b/src/static/apple/IMG_1437.jpeg
similarity index 100%
rename from static/apple/IMG_1437.jpeg
rename to src/static/apple/IMG_1437.jpeg
diff --git a/static/apple/IMG_1449.jpeg b/src/static/apple/IMG_1449.jpeg
similarity index 100%
rename from static/apple/IMG_1449.jpeg
rename to src/static/apple/IMG_1449.jpeg
diff --git a/static/apple/IMG_1474.jpeg b/src/static/apple/IMG_1474.jpeg
similarity index 100%
rename from static/apple/IMG_1474.jpeg
rename to src/static/apple/IMG_1474.jpeg
diff --git a/static/apple/IMG_1487.jpeg b/src/static/apple/IMG_1487.jpeg
similarity index 100%
rename from static/apple/IMG_1487.jpeg
rename to src/static/apple/IMG_1487.jpeg
diff --git a/static/apple/IMG_1516.jpeg b/src/static/apple/IMG_1516.jpeg
similarity index 100%
rename from static/apple/IMG_1516.jpeg
rename to src/static/apple/IMG_1516.jpeg
diff --git a/static/apple/IMG_1518.jpeg b/src/static/apple/IMG_1518.jpeg
similarity index 100%
rename from static/apple/IMG_1518.jpeg
rename to src/static/apple/IMG_1518.jpeg
diff --git a/static/apple/IMG_1598.jpeg b/src/static/apple/IMG_1598.jpeg
similarity index 100%
rename from static/apple/IMG_1598.jpeg
rename to src/static/apple/IMG_1598.jpeg
diff --git a/static/apple/IMG_1654.jpeg b/src/static/apple/IMG_1654.jpeg
similarity index 100%
rename from static/apple/IMG_1654.jpeg
rename to src/static/apple/IMG_1654.jpeg
diff --git a/static/apple/IMG_1665.jpeg b/src/static/apple/IMG_1665.jpeg
similarity index 100%
rename from static/apple/IMG_1665.jpeg
rename to src/static/apple/IMG_1665.jpeg
diff --git a/static/apple/IMG_1670.jpeg b/src/static/apple/IMG_1670.jpeg
similarity index 100%
rename from static/apple/IMG_1670.jpeg
rename to src/static/apple/IMG_1670.jpeg
diff --git a/static/apple/IMG_1682.jpeg b/src/static/apple/IMG_1682.jpeg
similarity index 100%
rename from static/apple/IMG_1682.jpeg
rename to src/static/apple/IMG_1682.jpeg
diff --git a/static/apple/IMG_1721.jpeg b/src/static/apple/IMG_1721.jpeg
similarity index 100%
rename from static/apple/IMG_1721.jpeg
rename to src/static/apple/IMG_1721.jpeg
diff --git a/static/apple/IMG_1854.jpeg b/src/static/apple/IMG_1854.jpeg
similarity index 100%
rename from static/apple/IMG_1854.jpeg
rename to src/static/apple/IMG_1854.jpeg
diff --git a/static/apple/IMG_1882.jpeg b/src/static/apple/IMG_1882.jpeg
similarity index 100%
rename from static/apple/IMG_1882.jpeg
rename to src/static/apple/IMG_1882.jpeg
diff --git a/static/apple/IMG_1923.jpeg b/src/static/apple/IMG_1923.jpeg
similarity index 100%
rename from static/apple/IMG_1923.jpeg
rename to src/static/apple/IMG_1923.jpeg
diff --git a/static/apple/IMG_1924.jpeg b/src/static/apple/IMG_1924.jpeg
similarity index 100%
rename from static/apple/IMG_1924.jpeg
rename to src/static/apple/IMG_1924.jpeg
diff --git a/static/apple/IMG_1955.jpeg b/src/static/apple/IMG_1955.jpeg
similarity index 100%
rename from static/apple/IMG_1955.jpeg
rename to src/static/apple/IMG_1955.jpeg
diff --git a/static/apple/IMG_2054.jpeg b/src/static/apple/IMG_2054.jpeg
similarity index 100%
rename from static/apple/IMG_2054.jpeg
rename to src/static/apple/IMG_2054.jpeg
diff --git a/static/apple/IMG_2100.jpeg b/src/static/apple/IMG_2100.jpeg
similarity index 100%
rename from static/apple/IMG_2100.jpeg
rename to src/static/apple/IMG_2100.jpeg
diff --git a/static/apple/IMG_2101.jpeg b/src/static/apple/IMG_2101.jpeg
similarity index 100%
rename from static/apple/IMG_2101.jpeg
rename to src/static/apple/IMG_2101.jpeg
diff --git a/static/apple/IMG_2136.jpeg b/src/static/apple/IMG_2136.jpeg
similarity index 100%
rename from static/apple/IMG_2136.jpeg
rename to src/static/apple/IMG_2136.jpeg
diff --git a/static/apple/IMG_2137.jpeg b/src/static/apple/IMG_2137.jpeg
similarity index 100%
rename from static/apple/IMG_2137.jpeg
rename to src/static/apple/IMG_2137.jpeg
diff --git a/static/apple/IMG_2156.jpeg b/src/static/apple/IMG_2156.jpeg
similarity index 100%
rename from static/apple/IMG_2156.jpeg
rename to src/static/apple/IMG_2156.jpeg
diff --git a/static/apple/IMG_2296.jpeg b/src/static/apple/IMG_2296.jpeg
similarity index 100%
rename from static/apple/IMG_2296.jpeg
rename to src/static/apple/IMG_2296.jpeg
diff --git a/static/apple/IMG_2297.jpeg b/src/static/apple/IMG_2297.jpeg
similarity index 100%
rename from static/apple/IMG_2297.jpeg
rename to src/static/apple/IMG_2297.jpeg
diff --git a/static/apple/IMG_2330.jpeg b/src/static/apple/IMG_2330.jpeg
similarity index 100%
rename from static/apple/IMG_2330.jpeg
rename to src/static/apple/IMG_2330.jpeg
diff --git a/static/apple/IMG_2339.jpeg b/src/static/apple/IMG_2339.jpeg
similarity index 100%
rename from static/apple/IMG_2339.jpeg
rename to src/static/apple/IMG_2339.jpeg
diff --git a/static/apple/IMG_2383.jpeg b/src/static/apple/IMG_2383.jpeg
similarity index 100%
rename from static/apple/IMG_2383.jpeg
rename to src/static/apple/IMG_2383.jpeg
diff --git a/static/apple/IMG_2400.jpeg b/src/static/apple/IMG_2400.jpeg
similarity index 100%
rename from static/apple/IMG_2400.jpeg
rename to src/static/apple/IMG_2400.jpeg
diff --git a/static/apple/IMG_2409.jpeg b/src/static/apple/IMG_2409.jpeg
similarity index 100%
rename from static/apple/IMG_2409.jpeg
rename to src/static/apple/IMG_2409.jpeg
diff --git a/static/apple/IMG_2419.jpeg b/src/static/apple/IMG_2419.jpeg
similarity index 100%
rename from static/apple/IMG_2419.jpeg
rename to src/static/apple/IMG_2419.jpeg
diff --git a/static/apple/IMG_2452.jpeg b/src/static/apple/IMG_2452.jpeg
similarity index 100%
rename from static/apple/IMG_2452.jpeg
rename to src/static/apple/IMG_2452.jpeg
diff --git a/static/apple/IMG_2543.jpeg b/src/static/apple/IMG_2543.jpeg
similarity index 100%
rename from static/apple/IMG_2543.jpeg
rename to src/static/apple/IMG_2543.jpeg
diff --git a/static/apple/IMG_2587.jpeg b/src/static/apple/IMG_2587.jpeg
similarity index 100%
rename from static/apple/IMG_2587.jpeg
rename to src/static/apple/IMG_2587.jpeg
diff --git a/static/apple/IMG_2698.jpeg b/src/static/apple/IMG_2698.jpeg
similarity index 100%
rename from static/apple/IMG_2698.jpeg
rename to src/static/apple/IMG_2698.jpeg
diff --git a/static/apple/IMG_2746.jpeg b/src/static/apple/IMG_2746.jpeg
similarity index 100%
rename from static/apple/IMG_2746.jpeg
rename to src/static/apple/IMG_2746.jpeg
diff --git a/static/apple/IMG_2788.jpeg b/src/static/apple/IMG_2788.jpeg
similarity index 100%
rename from static/apple/IMG_2788.jpeg
rename to src/static/apple/IMG_2788.jpeg
diff --git a/static/apple/IMG_2817.jpeg b/src/static/apple/IMG_2817.jpeg
similarity index 100%
rename from static/apple/IMG_2817.jpeg
rename to src/static/apple/IMG_2817.jpeg
diff --git a/static/apple/IMG_3254.jpeg b/src/static/apple/IMG_3254.jpeg
similarity index 100%
rename from static/apple/IMG_3254.jpeg
rename to src/static/apple/IMG_3254.jpeg
diff --git a/static/apple/IMG_4656.jpeg b/src/static/apple/IMG_4656.jpeg
similarity index 100%
rename from static/apple/IMG_4656.jpeg
rename to src/static/apple/IMG_4656.jpeg
diff --git a/static/apple/IMG_6723.jpeg b/src/static/apple/IMG_6723.jpeg
similarity index 100%
rename from static/apple/IMG_6723.jpeg
rename to src/static/apple/IMG_6723.jpeg
diff --git a/static/apple/IMG_6844.jpeg b/src/static/apple/IMG_6844.jpeg
similarity index 100%
rename from static/apple/IMG_6844.jpeg
rename to src/static/apple/IMG_6844.jpeg
diff --git a/static/apple/IMG_6883.jpeg b/src/static/apple/IMG_6883.jpeg
similarity index 100%
rename from static/apple/IMG_6883.jpeg
rename to src/static/apple/IMG_6883.jpeg
diff --git a/static/apple/IMG_6919.jpeg b/src/static/apple/IMG_6919.jpeg
similarity index 100%
rename from static/apple/IMG_6919.jpeg
rename to src/static/apple/IMG_6919.jpeg
diff --git a/static/apple/IMG_7006.jpeg b/src/static/apple/IMG_7006.jpeg
similarity index 100%
rename from static/apple/IMG_7006.jpeg
rename to src/static/apple/IMG_7006.jpeg
diff --git a/static/apple/IMG_7029.jpeg b/src/static/apple/IMG_7029.jpeg
similarity index 100%
rename from static/apple/IMG_7029.jpeg
rename to src/static/apple/IMG_7029.jpeg
diff --git a/static/apple/IMG_7047.jpeg b/src/static/apple/IMG_7047.jpeg
similarity index 100%
rename from static/apple/IMG_7047.jpeg
rename to src/static/apple/IMG_7047.jpeg
diff --git a/static/apple/IMG_7054.jpeg b/src/static/apple/IMG_7054.jpeg
similarity index 100%
rename from static/apple/IMG_7054.jpeg
rename to src/static/apple/IMG_7054.jpeg
diff --git a/static/apple/IMG_7063.jpeg b/src/static/apple/IMG_7063.jpeg
similarity index 100%
rename from static/apple/IMG_7063.jpeg
rename to src/static/apple/IMG_7063.jpeg
diff --git a/static/apple/IMG_7081.jpeg b/src/static/apple/IMG_7081.jpeg
similarity index 100%
rename from static/apple/IMG_7081.jpeg
rename to src/static/apple/IMG_7081.jpeg
diff --git a/static/apple/IMG_7112.jpeg b/src/static/apple/IMG_7112.jpeg
similarity index 100%
rename from static/apple/IMG_7112.jpeg
rename to src/static/apple/IMG_7112.jpeg
diff --git a/static/apple/IMG_7222.jpeg b/src/static/apple/IMG_7222.jpeg
similarity index 100%
rename from static/apple/IMG_7222.jpeg
rename to src/static/apple/IMG_7222.jpeg
diff --git a/static/apple/IMG_7416.jpeg b/src/static/apple/IMG_7416.jpeg
similarity index 100%
rename from static/apple/IMG_7416.jpeg
rename to src/static/apple/IMG_7416.jpeg
diff --git a/static/apple/IMG_7417.jpeg b/src/static/apple/IMG_7417.jpeg
similarity index 100%
rename from static/apple/IMG_7417.jpeg
rename to src/static/apple/IMG_7417.jpeg
diff --git a/static/apple/IMG_7418.jpeg b/src/static/apple/IMG_7418.jpeg
similarity index 100%
rename from static/apple/IMG_7418.jpeg
rename to src/static/apple/IMG_7418.jpeg
diff --git a/static/apple/IMG_7419.jpeg b/src/static/apple/IMG_7419.jpeg
similarity index 100%
rename from static/apple/IMG_7419.jpeg
rename to src/static/apple/IMG_7419.jpeg
diff --git a/static/apple/IMG_7474.jpeg b/src/static/apple/IMG_7474.jpeg
similarity index 100%
rename from static/apple/IMG_7474.jpeg
rename to src/static/apple/IMG_7474.jpeg
diff --git a/static/apple/IMG_7514.jpeg b/src/static/apple/IMG_7514.jpeg
similarity index 100%
rename from static/apple/IMG_7514.jpeg
rename to src/static/apple/IMG_7514.jpeg
diff --git a/static/apple/IMG_7515.jpeg b/src/static/apple/IMG_7515.jpeg
similarity index 100%
rename from static/apple/IMG_7515.jpeg
rename to src/static/apple/IMG_7515.jpeg
diff --git a/static/apple/IMG_7516.jpeg b/src/static/apple/IMG_7516.jpeg
similarity index 100%
rename from static/apple/IMG_7516.jpeg
rename to src/static/apple/IMG_7516.jpeg
diff --git a/static/apple/IMG_7576.jpeg b/src/static/apple/IMG_7576.jpeg
similarity index 100%
rename from static/apple/IMG_7576.jpeg
rename to src/static/apple/IMG_7576.jpeg
diff --git a/static/apple/IMG_7628.jpeg b/src/static/apple/IMG_7628.jpeg
similarity index 100%
rename from static/apple/IMG_7628.jpeg
rename to src/static/apple/IMG_7628.jpeg
diff --git a/static/apple/IMG_7629.jpeg b/src/static/apple/IMG_7629.jpeg
similarity index 100%
rename from static/apple/IMG_7629.jpeg
rename to src/static/apple/IMG_7629.jpeg
diff --git a/static/apple/IMG_7631.jpeg b/src/static/apple/IMG_7631.jpeg
similarity index 100%
rename from static/apple/IMG_7631.jpeg
rename to src/static/apple/IMG_7631.jpeg
diff --git a/static/apple/IMG_7633.jpeg b/src/static/apple/IMG_7633.jpeg
similarity index 100%
rename from static/apple/IMG_7633.jpeg
rename to src/static/apple/IMG_7633.jpeg
diff --git a/static/apple/IMG_7634.jpeg b/src/static/apple/IMG_7634.jpeg
similarity index 100%
rename from static/apple/IMG_7634.jpeg
rename to src/static/apple/IMG_7634.jpeg
diff --git a/static/apple/IMG_7636.jpeg b/src/static/apple/IMG_7636.jpeg
similarity index 100%
rename from static/apple/IMG_7636.jpeg
rename to src/static/apple/IMG_7636.jpeg
diff --git a/static/apple/IMG_7652.jpeg b/src/static/apple/IMG_7652.jpeg
similarity index 100%
rename from static/apple/IMG_7652.jpeg
rename to src/static/apple/IMG_7652.jpeg
diff --git a/static/apple/IMG_7721.jpeg b/src/static/apple/IMG_7721.jpeg
similarity index 100%
rename from static/apple/IMG_7721.jpeg
rename to src/static/apple/IMG_7721.jpeg
diff --git a/static/apple/IMG_7723.jpeg b/src/static/apple/IMG_7723.jpeg
similarity index 100%
rename from static/apple/IMG_7723.jpeg
rename to src/static/apple/IMG_7723.jpeg
diff --git a/static/apple/IMG_7762.jpeg b/src/static/apple/IMG_7762.jpeg
similarity index 100%
rename from static/apple/IMG_7762.jpeg
rename to src/static/apple/IMG_7762.jpeg
diff --git a/static/apple/IMG_7763.jpeg b/src/static/apple/IMG_7763.jpeg
similarity index 100%
rename from static/apple/IMG_7763.jpeg
rename to src/static/apple/IMG_7763.jpeg
diff --git a/static/apple/IMG_7768.jpeg b/src/static/apple/IMG_7768.jpeg
similarity index 100%
rename from static/apple/IMG_7768.jpeg
rename to src/static/apple/IMG_7768.jpeg
diff --git a/static/apple/IMG_7782.jpeg b/src/static/apple/IMG_7782.jpeg
similarity index 100%
rename from static/apple/IMG_7782.jpeg
rename to src/static/apple/IMG_7782.jpeg
diff --git a/static/apple/IMG_7815.jpeg b/src/static/apple/IMG_7815.jpeg
similarity index 100%
rename from static/apple/IMG_7815.jpeg
rename to src/static/apple/IMG_7815.jpeg
diff --git a/static/apple/IMG_7912.jpeg b/src/static/apple/IMG_7912.jpeg
similarity index 100%
rename from static/apple/IMG_7912.jpeg
rename to src/static/apple/IMG_7912.jpeg
diff --git a/static/apple/IMG_7913.jpeg b/src/static/apple/IMG_7913.jpeg
similarity index 100%
rename from static/apple/IMG_7913.jpeg
rename to src/static/apple/IMG_7913.jpeg
diff --git a/static/apple/IMG_7950.jpeg b/src/static/apple/IMG_7950.jpeg
similarity index 100%
rename from static/apple/IMG_7950.jpeg
rename to src/static/apple/IMG_7950.jpeg
diff --git a/static/apple/IMG_7997.jpeg b/src/static/apple/IMG_7997.jpeg
similarity index 100%
rename from static/apple/IMG_7997.jpeg
rename to src/static/apple/IMG_7997.jpeg
diff --git a/static/apple/IMG_8043.jpeg b/src/static/apple/IMG_8043.jpeg
similarity index 100%
rename from static/apple/IMG_8043.jpeg
rename to src/static/apple/IMG_8043.jpeg
diff --git a/static/apple/IMG_8100.jpeg b/src/static/apple/IMG_8100.jpeg
similarity index 100%
rename from static/apple/IMG_8100.jpeg
rename to src/static/apple/IMG_8100.jpeg
diff --git a/static/apple/IMG_8123.jpeg b/src/static/apple/IMG_8123.jpeg
similarity index 100%
rename from static/apple/IMG_8123.jpeg
rename to src/static/apple/IMG_8123.jpeg
diff --git a/static/apple/IMG_8128.jpeg b/src/static/apple/IMG_8128.jpeg
similarity index 100%
rename from static/apple/IMG_8128.jpeg
rename to src/static/apple/IMG_8128.jpeg
diff --git a/static/apple/IMG_8148.jpeg b/src/static/apple/IMG_8148.jpeg
similarity index 100%
rename from static/apple/IMG_8148.jpeg
rename to src/static/apple/IMG_8148.jpeg
diff --git a/static/apple/IMG_8149.jpeg b/src/static/apple/IMG_8149.jpeg
similarity index 100%
rename from static/apple/IMG_8149.jpeg
rename to src/static/apple/IMG_8149.jpeg
diff --git a/static/apple/IMG_8202.jpeg b/src/static/apple/IMG_8202.jpeg
similarity index 100%
rename from static/apple/IMG_8202.jpeg
rename to src/static/apple/IMG_8202.jpeg
diff --git a/static/apple/IMG_8213.jpeg b/src/static/apple/IMG_8213.jpeg
similarity index 100%
rename from static/apple/IMG_8213.jpeg
rename to src/static/apple/IMG_8213.jpeg
diff --git a/static/apple/IMG_8229.jpeg b/src/static/apple/IMG_8229.jpeg
similarity index 100%
rename from static/apple/IMG_8229.jpeg
rename to src/static/apple/IMG_8229.jpeg
diff --git a/static/apple/IMG_8247.jpeg b/src/static/apple/IMG_8247.jpeg
similarity index 100%
rename from static/apple/IMG_8247.jpeg
rename to src/static/apple/IMG_8247.jpeg
diff --git a/static/apple/IMG_8248.jpeg b/src/static/apple/IMG_8248.jpeg
similarity index 100%
rename from static/apple/IMG_8248.jpeg
rename to src/static/apple/IMG_8248.jpeg
diff --git a/static/apple/IMG_8256.jpeg b/src/static/apple/IMG_8256.jpeg
similarity index 100%
rename from static/apple/IMG_8256.jpeg
rename to src/static/apple/IMG_8256.jpeg
diff --git a/static/apple/IMG_8257.jpeg b/src/static/apple/IMG_8257.jpeg
similarity index 100%
rename from static/apple/IMG_8257.jpeg
rename to src/static/apple/IMG_8257.jpeg
diff --git a/static/apple/IMG_8290.jpeg b/src/static/apple/IMG_8290.jpeg
similarity index 100%
rename from static/apple/IMG_8290.jpeg
rename to src/static/apple/IMG_8290.jpeg
diff --git a/static/apple/IMG_8294.jpeg b/src/static/apple/IMG_8294.jpeg
similarity index 100%
rename from static/apple/IMG_8294.jpeg
rename to src/static/apple/IMG_8294.jpeg
diff --git a/static/apple/IMG_8303.jpeg b/src/static/apple/IMG_8303.jpeg
similarity index 100%
rename from static/apple/IMG_8303.jpeg
rename to src/static/apple/IMG_8303.jpeg
diff --git a/static/apple/IMG_8309.jpeg b/src/static/apple/IMG_8309.jpeg
similarity index 100%
rename from static/apple/IMG_8309.jpeg
rename to src/static/apple/IMG_8309.jpeg
diff --git a/static/apple/IMG_8326.jpeg b/src/static/apple/IMG_8326.jpeg
similarity index 100%
rename from static/apple/IMG_8326.jpeg
rename to src/static/apple/IMG_8326.jpeg
diff --git a/static/apple/IMG_8384.jpeg b/src/static/apple/IMG_8384.jpeg
similarity index 100%
rename from static/apple/IMG_8384.jpeg
rename to src/static/apple/IMG_8384.jpeg
diff --git a/static/apple/IMG_8389.jpeg b/src/static/apple/IMG_8389.jpeg
similarity index 100%
rename from static/apple/IMG_8389.jpeg
rename to src/static/apple/IMG_8389.jpeg
diff --git a/static/apple/IMG_8463.jpeg b/src/static/apple/IMG_8463.jpeg
similarity index 100%
rename from static/apple/IMG_8463.jpeg
rename to src/static/apple/IMG_8463.jpeg
diff --git a/static/apple/IMG_8464.jpeg b/src/static/apple/IMG_8464.jpeg
similarity index 100%
rename from static/apple/IMG_8464.jpeg
rename to src/static/apple/IMG_8464.jpeg
diff --git a/static/apple/IMG_8491.jpeg b/src/static/apple/IMG_8491.jpeg
similarity index 100%
rename from static/apple/IMG_8491.jpeg
rename to src/static/apple/IMG_8491.jpeg
diff --git a/static/apple/IMG_8506.jpeg b/src/static/apple/IMG_8506.jpeg
similarity index 100%
rename from static/apple/IMG_8506.jpeg
rename to src/static/apple/IMG_8506.jpeg
diff --git a/static/apple/IMG_8533.jpeg b/src/static/apple/IMG_8533.jpeg
similarity index 100%
rename from static/apple/IMG_8533.jpeg
rename to src/static/apple/IMG_8533.jpeg
diff --git a/static/apple/IMG_8549.jpeg b/src/static/apple/IMG_8549.jpeg
similarity index 100%
rename from static/apple/IMG_8549.jpeg
rename to src/static/apple/IMG_8549.jpeg
diff --git a/static/apple/IMG_8574.jpeg b/src/static/apple/IMG_8574.jpeg
similarity index 100%
rename from static/apple/IMG_8574.jpeg
rename to src/static/apple/IMG_8574.jpeg
diff --git a/static/apple/IMG_8575.jpeg b/src/static/apple/IMG_8575.jpeg
similarity index 100%
rename from static/apple/IMG_8575.jpeg
rename to src/static/apple/IMG_8575.jpeg
diff --git a/static/apple/IMG_8620.jpeg b/src/static/apple/IMG_8620.jpeg
similarity index 100%
rename from static/apple/IMG_8620.jpeg
rename to src/static/apple/IMG_8620.jpeg
diff --git a/static/apple/IMG_8641.jpeg b/src/static/apple/IMG_8641.jpeg
similarity index 100%
rename from static/apple/IMG_8641.jpeg
rename to src/static/apple/IMG_8641.jpeg
diff --git a/static/apple/IMG_8656.jpeg b/src/static/apple/IMG_8656.jpeg
similarity index 100%
rename from static/apple/IMG_8656.jpeg
rename to src/static/apple/IMG_8656.jpeg
diff --git a/static/apple/IMG_8755.jpeg b/src/static/apple/IMG_8755.jpeg
similarity index 100%
rename from static/apple/IMG_8755.jpeg
rename to src/static/apple/IMG_8755.jpeg
diff --git a/static/apple/IMG_8757.jpeg b/src/static/apple/IMG_8757.jpeg
similarity index 100%
rename from static/apple/IMG_8757.jpeg
rename to src/static/apple/IMG_8757.jpeg
diff --git a/static/apple/IMG_8889.jpeg b/src/static/apple/IMG_8889.jpeg
similarity index 100%
rename from static/apple/IMG_8889.jpeg
rename to src/static/apple/IMG_8889.jpeg
diff --git a/static/apple/IMG_8891.jpeg b/src/static/apple/IMG_8891.jpeg
similarity index 100%
rename from static/apple/IMG_8891.jpeg
rename to src/static/apple/IMG_8891.jpeg
diff --git a/static/apple/designer.jpg b/src/static/apple/designer.jpg
similarity index 100%
rename from static/apple/designer.jpg
rename to src/static/apple/designer.jpg
diff --git a/static/apple/profile.jpg b/src/static/apple/profile.jpg
similarity index 100%
rename from static/apple/profile.jpg
rename to src/static/apple/profile.jpg
diff --git a/static/boat/boat-1.jpg b/src/static/boat/boat-1.jpg
similarity index 100%
rename from static/boat/boat-1.jpg
rename to src/static/boat/boat-1.jpg
diff --git a/static/boat/boat-10.jpg b/src/static/boat/boat-10.jpg
similarity index 100%
rename from static/boat/boat-10.jpg
rename to src/static/boat/boat-10.jpg
diff --git a/static/boat/boat-11.jpg b/src/static/boat/boat-11.jpg
similarity index 100%
rename from static/boat/boat-11.jpg
rename to src/static/boat/boat-11.jpg
diff --git a/static/boat/boat-12.jpg b/src/static/boat/boat-12.jpg
similarity index 100%
rename from static/boat/boat-12.jpg
rename to src/static/boat/boat-12.jpg
diff --git a/static/boat/boat-13.jpg b/src/static/boat/boat-13.jpg
similarity index 100%
rename from static/boat/boat-13.jpg
rename to src/static/boat/boat-13.jpg
diff --git a/static/boat/boat-14.jpg b/src/static/boat/boat-14.jpg
similarity index 100%
rename from static/boat/boat-14.jpg
rename to src/static/boat/boat-14.jpg
diff --git a/static/boat/boat-15.jpg b/src/static/boat/boat-15.jpg
similarity index 100%
rename from static/boat/boat-15.jpg
rename to src/static/boat/boat-15.jpg
diff --git a/static/boat/boat-16.jpg b/src/static/boat/boat-16.jpg
similarity index 100%
rename from static/boat/boat-16.jpg
rename to src/static/boat/boat-16.jpg
diff --git a/static/boat/boat-17.jpg b/src/static/boat/boat-17.jpg
similarity index 100%
rename from static/boat/boat-17.jpg
rename to src/static/boat/boat-17.jpg
diff --git a/static/boat/boat-18.jpg b/src/static/boat/boat-18.jpg
similarity index 100%
rename from static/boat/boat-18.jpg
rename to src/static/boat/boat-18.jpg
diff --git a/static/boat/boat-19.jpg b/src/static/boat/boat-19.jpg
similarity index 100%
rename from static/boat/boat-19.jpg
rename to src/static/boat/boat-19.jpg
diff --git a/static/boat/boat-2.jpg b/src/static/boat/boat-2.jpg
similarity index 100%
rename from static/boat/boat-2.jpg
rename to src/static/boat/boat-2.jpg
diff --git a/static/boat/boat-20.jpg b/src/static/boat/boat-20.jpg
similarity index 100%
rename from static/boat/boat-20.jpg
rename to src/static/boat/boat-20.jpg
diff --git a/static/boat/boat-21.jpg b/src/static/boat/boat-21.jpg
similarity index 100%
rename from static/boat/boat-21.jpg
rename to src/static/boat/boat-21.jpg
diff --git a/static/boat/boat-22.jpg b/src/static/boat/boat-22.jpg
similarity index 100%
rename from static/boat/boat-22.jpg
rename to src/static/boat/boat-22.jpg
diff --git a/static/boat/boat-23.jpg b/src/static/boat/boat-23.jpg
similarity index 100%
rename from static/boat/boat-23.jpg
rename to src/static/boat/boat-23.jpg
diff --git a/static/boat/boat-24.jpg b/src/static/boat/boat-24.jpg
similarity index 100%
rename from static/boat/boat-24.jpg
rename to src/static/boat/boat-24.jpg
diff --git a/static/boat/boat-25.jpg b/src/static/boat/boat-25.jpg
similarity index 100%
rename from static/boat/boat-25.jpg
rename to src/static/boat/boat-25.jpg
diff --git a/static/boat/boat-26.jpg b/src/static/boat/boat-26.jpg
similarity index 100%
rename from static/boat/boat-26.jpg
rename to src/static/boat/boat-26.jpg
diff --git a/static/boat/boat-27.jpg b/src/static/boat/boat-27.jpg
similarity index 100%
rename from static/boat/boat-27.jpg
rename to src/static/boat/boat-27.jpg
diff --git a/static/boat/boat-3.jpg b/src/static/boat/boat-3.jpg
similarity index 100%
rename from static/boat/boat-3.jpg
rename to src/static/boat/boat-3.jpg
diff --git a/static/boat/boat-4.jpg b/src/static/boat/boat-4.jpg
similarity index 100%
rename from static/boat/boat-4.jpg
rename to src/static/boat/boat-4.jpg
diff --git a/static/boat/boat-5.jpg b/src/static/boat/boat-5.jpg
similarity index 100%
rename from static/boat/boat-5.jpg
rename to src/static/boat/boat-5.jpg
diff --git a/static/boat/boat-6.jpg b/src/static/boat/boat-6.jpg
similarity index 100%
rename from static/boat/boat-6.jpg
rename to src/static/boat/boat-6.jpg
diff --git a/static/boat/boat-7.jpg b/src/static/boat/boat-7.jpg
similarity index 100%
rename from static/boat/boat-7.jpg
rename to src/static/boat/boat-7.jpg
diff --git a/static/boat/boat-8.jpg b/src/static/boat/boat-8.jpg
similarity index 100%
rename from static/boat/boat-8.jpg
rename to src/static/boat/boat-8.jpg
diff --git a/static/boat/boat-9.jpg b/src/static/boat/boat-9.jpg
similarity index 100%
rename from static/boat/boat-9.jpg
rename to src/static/boat/boat-9.jpg
diff --git a/static/dbp-old-1.gpg b/src/static/dbp-old-1.gpg
similarity index 100%
rename from static/dbp-old-1.gpg
rename to src/static/dbp-old-1.gpg
diff --git a/static/dbp-old-2.gpg b/src/static/dbp-old-2.gpg
similarity index 100%
rename from static/dbp-old-2.gpg
rename to src/static/dbp-old-2.gpg
diff --git a/static/dbp.gpg b/src/static/dbp.gpg
similarity index 100%
rename from static/dbp.gpg
rename to src/static/dbp.gpg
diff --git a/static/dbp.jpg b/src/static/dbp.jpg
similarity index 100%
rename from static/dbp.jpg
rename to src/static/dbp.jpg
diff --git a/static/resume.pdf b/src/static/resume.pdf
similarity index 100%
rename from static/resume.pdf
rename to src/static/resume.pdf
diff --git a/static/ssh_key.pub b/src/static/ssh_key.pub
similarity index 100%
rename from static/ssh_key.pub
rename to src/static/ssh_key.pub
diff --git a/talks/2014/types-testing-haskell-meetup.pdf b/src/talks/2014/types-testing-haskell-meetup.pdf
similarity index 100%
rename from talks/2014/types-testing-haskell-meetup.pdf
rename to src/talks/2014/types-testing-haskell-meetup.pdf
diff --git a/talks/2016/fn-continuations-haskell-meetup.pdf b/src/talks/2016/fn-continuations-haskell-meetup.pdf
similarity index 100%
rename from talks/2016/fn-continuations-haskell-meetup.pdf
rename to src/talks/2016/fn-continuations-haskell-meetup.pdf
diff --git a/talks/2016/fn-continuations-transitions-haskell-meetup.pdf b/src/talks/2016/fn-continuations-transitions-haskell-meetup.pdf
similarity index 100%
rename from talks/2016/fn-continuations-transitions-haskell-meetup.pdf
rename to src/talks/2016/fn-continuations-transitions-haskell-meetup.pdf
diff --git a/talks/2017/artifacts-nepls.pdf b/src/talks/2017/artifacts-nepls.pdf
similarity index 100%
rename from talks/2017/artifacts-nepls.pdf
rename to src/talks/2017/artifacts-nepls.pdf
diff --git a/talks/2017/funtal-pldi.pdf b/src/talks/2017/funtal-pldi.pdf
similarity index 100%
rename from talks/2017/funtal-pldi.pdf
rename to src/talks/2017/funtal-pldi.pdf
diff --git a/talks/2017/linking-types-scm.pdf b/src/talks/2017/linking-types-scm.pdf
similarity index 100%
rename from talks/2017/linking-types-scm.pdf
rename to src/talks/2017/linking-types-scm.pdf
diff --git a/talks/2017/linking-types-snapl.pdf b/src/talks/2017/linking-types-snapl.pdf
similarity index 100%
rename from talks/2017/linking-types-snapl.pdf
rename to src/talks/2017/linking-types-snapl.pdf
diff --git a/talks/2018/ccc-prisc.pdf b/src/talks/2018/ccc-prisc.pdf
similarity index 100%
rename from talks/2018/ccc-prisc.pdf
rename to src/talks/2018/ccc-prisc.pdf
diff --git a/templates/default.html b/templates/main.html
similarity index 97%
rename from templates/default.html
rename to templates/main.html
index ebb614f..4119aad 100644
--- a/templates/default.html
+++ b/templates/main.html
@@ -24,6 +24,8 @@
work: prl.ccs.neu.edu