2c18510 @bjpop Split documentation notes into separate files rename todo to todo.txt
authored
Implementing modules.

Syntax:

   import mod
   import mod as alias
   from mod import item_list
   from mod import *

Issues:

The first time a module is imported into a running application it is "executed",
which may have side effects (e.g. top-level statements). Subsequent imports do not
execute those effects again, so the loaded module must be cached. A corollary is
that modules are identified uniquely by their name.

Python supports "packages" which make the story slightly more complex. To
simplify things it might make sense to ignore packages in the first pass.

An import of a module binds one or more variables in the local scope of the
importing statement. This is problematic for the "from mod import *" syntax,
because it does not name the variables that it binds - they are simply the
variables exported by the imported module. So far our scheme has been to compile
Python variable names into Haskell variable names (where the Haskell variable
is bound to an IORef). For performance reasons we are trying to avoid using
strings (and an explicit environment) to handle variable names - although such
an approach would make it relatively easy to handle this tricky kind of import.

Fortunately this tricky kind of import is only allowed at the top level of
a module. Given this limitation, a compromise might be possible. The idea
is to maintain a global environment, which maps (string) names to object
references (IORef Object, aka ObjectRef):

   type GlobalEnv = Map String ObjectRef

All variables referred to in a module must be declared first in the
compiled Haskell code. Top-level declarations can check the global environment.
If a variable of the same name was imported then the corresponding ObjectRef
should be retrieved from the table. Otherwise a new ObjectRef should be
allocated and the table should be updated. Reads from variables remain unchanged.

For instance, the Python code (at the top level)

   x = 12
   print(x)

is compiled to:

   = do _s_x <- var "x"
        _s_x =: 12
        _t_0 <- read _s_print
        _t_1 <- read _s_x
        _t_0 @@ [_t_1]

where the 'var' primitive does the variable declaration, and has type:

   var :: Ident -> Eval ObjectRef

which would be something like this in pseudo-code:

   var s = do
      maybeGlobal <- lookupGlobalEnv s
      case maybeGlobal of
         Nothing -> do
            ref <- newIORef (error $ "undefined variable: " ++ s)
            updateGlobalEnv s ref
            return ref
         Just ref -> return ref

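As a rough, self-contained illustration of the 'var' idea, here is a minimal
sketch that runs in plain IO rather than the Eval monad. The Object type, the
Undefined placeholder and the choice to keep the environment in an IORef are
all assumptions made for this sketch, not the actual berp definitions.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef, modifyIORef')
import qualified Data.Map as Map

-- Stand-in for berp's Object type (an assumption for this sketch).
data Object = IntObj Integer | Undefined String
  deriving (Eq, Show)

type ObjectRef = IORef Object
type GlobalEnv = Map.Map String ObjectRef

-- Declare a variable. If the name is already in the global environment
-- (for example because it was imported) reuse its reference, otherwise
-- allocate a fresh reference and record it in the environment.
var :: IORef GlobalEnv -> String -> IO ObjectRef
var envRef name = do
  env <- readIORef envRef
  case Map.lookup name env of
    Just ref -> return ref
    Nothing -> do
      -- Placeholder contents until the first assignment happens.
      ref <- newIORef (Undefined name)
      modifyIORef' envRef (Map.insert name ref)
      return ref
```

Declaring the same name twice yields the same reference, so a value written
through the first reference is visible through the second.
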
An imported module should update the global environment with the variables
that are imported from it. In the case of "from Foo import *", all variables are
imported, whereas the other kinds of imports are more restrictive. For instance,
"import Foo" just introduces the "Foo" variable into scope (which will be
bound to a Module object).

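The three kinds of update could look roughly like the sketch below. The
function names (importStar, importNames, importPlain) and the simplified
Object type are hypothetical, and the code runs in plain IO instead of Eval.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')
import qualified Data.Map as Map

-- Simplified stand-in for berp's Object type (an assumption).
data Object = IntObj Integer | Module (Map.Map String ObjectRef)

type ObjectRef = IORef Object
type GlobalEnv = Map.Map String ObjectRef

-- "from Foo import *": every member of the module becomes visible.
-- Map.union is left-biased, so imported names rebind existing ones.
importStar :: IORef GlobalEnv -> Map.Map String ObjectRef -> IO ()
importStar envRef members = modifyIORef' envRef (Map.union members)

-- "from Foo import x, y": only the listed members become visible.
importNames :: IORef GlobalEnv -> [String] -> Map.Map String ObjectRef -> IO ()
importNames envRef names members =
  modifyIORef' envRef (Map.union wanted)
  where
    wanted = Map.filterWithKey (\name _ -> name `elem` names) members

-- "import Foo": only the module name itself is bound, to a Module object.
importPlain :: IORef GlobalEnv -> String -> Map.Map String ObjectRef -> IO ()
importPlain envRef name members = do
  ref <- newIORef (Module members)
  modifyIORef' envRef (Map.insert name ref)
```

Because the environment stores references rather than values, a name brought
in by importStar shares its ObjectRef with the module it came from.
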
Side note: if we ever wanted to we could presumably extend this scheme to support
'from Foo import *' at other nesting levels, by including a suitably nested
variable environment.

One question is what to do with threads. Does each thread have its own
"thread global" environment? Seems plausible.

Another issue to consider is that module imports have dynamic behaviour.
They can be evaluated in nested scopes and under conditional statements:

   if test:
      import Foo
   else:
      import Bar

So it is undesirable to require them to be statically known and linked.
That means we can't use Haskell's import facility to implement Python's
import facility (in its full glory). (Though a static import facility
could be supported and may be a useful extension.) A promising workaround is to use
dynamic loading via something like the plugins library. The idea would
be that each compiled module exports a single entity called, say,
init, which would have a type like:

   init :: Eval Object

The resulting object would be a Module that contains a
dictionary mapping all its members to objects.

So a Python statement like:

   import Foo

would be compiled to:

   obj <- importModule "Foo"
   _s_Foo <- var "Foo"
   _s_Foo =: obj

or maybe (this is probably better):

   _s_Foo <- importModuleRef "Foo"

where

   importModule :: String -> Eval Object

and/or

   importModuleRef :: String -> Eval ObjectRef

with pseudo-code:

   -- this just handles the simple case of: import Foo
   -- (compileModule raises on failure, so there is no Either to unpack here)
   importModule name = do
      maybeImported <- lookupModule name
      case maybeImported of
         Just obj -> return obj
         Nothing -> compileModule name

   compileModule :: String -> Eval Object
   compileModule name = do
      maybePath <- findModulePath name
      case maybePath of
         Nothing -> raise ("could not find module: " ++ name)
         Just path -> do
            compiled <- isCompiled path
            if compiled
               then liftIO $ load path "init"
               else do
                  compileResult <- compileToObj path
                  case compileResult of
                     Nothing -> liftIO $ load path "init"
                     Just err -> raise err

assuming:

   load :: FilePath -> String -> IO a

or something like that.

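The pseudo-code above looks a module up but never records a freshly loaded
one. A minimal sketch of the caching step follows, in plain IO. The names
ModuleCache and importModuleWith, and the idea of passing the loader in as an
argument, are assumptions for illustration only.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')
import qualified Data.Map as Map

-- Stand-in for a loaded module object (an assumption for this sketch).
newtype Object = Module String
  deriving (Eq, Show)

-- Modules are identified by name, so the cache is keyed on the name.
type ModuleCache = IORef (Map.Map String Object)

-- Import a module by name. The loader (which would compile and dynamically
-- load the module, running its top-level effects) is only called the first
-- time a given name is imported.
importModuleWith :: ModuleCache -> (String -> IO Object) -> String -> IO Object
importModuleWith cache loader name = do
  loaded <- readIORef cache
  case Map.lookup name loaded of
    Just obj -> return obj
    Nothing -> do
      obj <- loader name
      modifyIORef' cache (Map.insert name obj)
      return obj
```

Importing the same name twice returns the cached object and runs the loader
once, which matches the "executed only the first time" behaviour described at
the top of these notes.
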
Each Python module should be compiled to a Haskell binding defining
an init function which is the only variable exported from the
Haskell module:

   init :: Eval Object
   init = do
      ... compiled stuff ...
      -- should be hashed strings below
      mkModule [("x", _s_x), ..., ("z", _s_z)]

where mkModule builds the object for the module from
the top-level variables defined in it:

   mkModule :: [(Hashed String, ObjectRef)] -> Eval Object

We have to compile the Python code to object code and
then dynamically load the object code. This raises the question:

Should we call the compiler (from the running program) as a shell call,
or should we compile the compiler into the runtime library?

A shell call keeps the runtime and the compiler separate, but to what advantage?
It will make the resulting executable smaller. But we could in theory
dynamically link the compiler to the executable. The space saving is not so
compelling because we still need to have the compiler around anyway. Nonetheless
statically linking the compiler to the runtime would be undesirable. Does GHC support
dynamic linking everywhere?

It might be (slightly?) faster for the compiler to be called directly rather
than from a shell call. It might also be more portable.

Plan: see if we can compile the compiler into the runtime. See if it works and
see if the size of executables is okay. Hope for dynamic linking to work.

This will require us to build a berp (compiler) library from the cabal file.
Both the command line front end and the runtime will link to the library.

What about the main function?

Simple solution: the berp executable just dynamically loads the module that
was mentioned on the command line, something like:

   main :: IO ()
   main = do
      args <- getArgs
      let pySrc = getPySrc args
      init <- importModule pySrc
      runStmt init

How should the interpreter work? Maybe we can also use dynamic linking. The idea
is to compile each new statement into a temporary module and then dynamically
load it into the running program. The main issue to solve is how to bind the
free variables in the statement to their values from the running program. One
possible solution is to compile each statement into a function (closure) that
binds all the free variables, something akin to the way we propose to handle
'from Foo import *'. It seems appealing to try to implement the interpreter this
way. Perhaps there is a performance issue due to loading times? Maybe we can
avoid touching the file system?

We could use the dynamic import facility to link the compiled program to the
base library. That is, we compile a special module in the base library and
dynamically load it at runtime. Or maybe it is just better to statically import
it? Again the availability of dynamic linking makes a difference to the size of
the resulting executable.

It looks like dynamic loading might invalidate any global state in the program.
Though it might be a bug, currently the standard IO devices do not seem to
persist as expected across a dynamic load (this is evident when stdout is
redirected on the shell: after a dynamic load it seems the redirection is
lost and the output disappears). There are a few cases of global state in the
current implementation which could do with a revision in light of this
discovery. We use unsafePerformIO to make some otherwise effectful operations into
globals. This is safe because we are careful to ensure that the effects are
benign, such as allocating IORefs, but even so, the use of unsafe operations
seems less than ideal. A few observations:
   - If these things are truly constant, then we should never need the IO monad.
     Hopefully global constant bindings (immutable data structures) should suffice.
   - For most/all such global constants, the point of making them global is to simplify
     scoping issues (they are in scope everywhere). An alternative approach is to bring them
     into scope by importing them, just as the import mechanism will bring other things into scope.
     We could pretend that every Python module has an implicit:

        from builtins import *

     at the top, where builtins is a special module which is part of the base library
     implementation. This might ultimately be a cleaner solution to the problem. Then everything
     will be in the Eval monad, and there will be no unsafePerformIO. However, a more static
     approach might be more efficient as it seems that dynamic linking is a little bit slow.
     Obviously berp will have to know how to find builtins.o in order to link it in.

Module naming strategy. Given a module called Foo.py, what is the name of the resulting Haskell module?
Haskell modules have an internal name and the filename is somewhat ancillary. However, it tends to make
things easy for GHC if the module name is the same as the file name. This can cause trouble because of
capitalisation (and maybe there are other issues). The proposal is to prepend "Berp_" onto the front of every
name. For example, Foo.py will become (Berp_Foo.hs, module Berp_Foo), whereas foo.py will
become (Berp_foo.hs, module Berp_foo). Even more name mangling might be needed if Python allows characters in
its module names that are not allowed in Haskell module names. I'm not aware of any other issues at the moment,
but maybe there are unusual things like unicode issues to consider.

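The basic "Berp_" prefixing scheme could be sketched as follows. The function
names are hypothetical, only bare file names (no directories) are handled, and
no extra mangling of unusual characters is attempted.

```haskell
import Data.List (stripPrefix)

-- Drop a trailing ".py" if there is one, otherwise leave the name alone.
dropPyExtension :: FilePath -> String
dropPyExtension path =
  case stripPrefix (reverse ".py") (reverse path) of
    Just rest -> reverse rest
    Nothing   -> path

-- Name of the Haskell module generated for a Python source file.
-- The "Berp_" prefix guarantees a leading capital letter.
mangleModuleName :: FilePath -> String
mangleModuleName path = "Berp_" ++ dropPyExtension path

-- Name of the Haskell source file, kept identical to the module name.
mangleFileName :: FilePath -> FilePath
mangleFileName path = mangleModuleName path ++ ".hs"
```

With this sketch Foo.py and foo.py map to the distinct modules Berp_Foo and
Berp_foo, so the capitalisation of the Python name is preserved.
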
Small problem found: we don't currently do a proper job of bound methods. This causes trouble with imports
like so:
   import Foo
   Foo.f()
We treat this as if it were a method call of f on Foo, which means we add Foo as the first argument,
which is not correct. We need to distinguish between method lookups and module attribute lookups.

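One way to picture the distinction is the toy model below. It is a
deliberately simplified sketch: the Object type is made up for the example,
and an Instance carries its own method table, which is not how Python classes
(or berp) really work.

```haskell
import qualified Data.Map as Map

-- Toy object model (an assumption for this sketch).
data Object
  = IntObj Integer
  | Function ([Object] -> Object)
  | Module (Map.Map String Object)
  | Instance (Map.Map String Object)

-- Call the attribute `name` of `receiver` with the given arguments.
-- A module attribute is called with the arguments as written, whereas a
-- method found on an instance gets the receiver prepended as first argument.
callAttr :: Object -> String -> [Object] -> Maybe Object
callAttr receiver name args =
  case receiver of
    Module members   -> apply (Map.lookup name members) args
    Instance methods -> apply (Map.lookup name methods) (receiver : args)
    _                -> Nothing
  where
    apply (Just (Function f)) actuals = Just (f actuals)
    apply _ _ = Nothing
```

So Foo.f() on a module passes no arguments to f, while the same call on an
instance passes the instance itself.
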
Question: what is considered to be defined at the top-level of a module? Obviously all top-bound variables,
but what about things imported from other modules? Seems like the answer is yes, they are included too.
The semantics of which top-level variables are bound to the module object is probably a dynamic property
of the program, so our static mkModule technique is probably wrong. For example, consider this top-level code:

   if cond:
      x = 5
   else:
      y = 12

The value of cond will determine whether x or y is bound in the module. Currently we compile this code so that
both x and y are considered in scope, which is clearly wrong. The solution is to rejig the way the variables in
top scope are declared. First we need to push the declarations into the inner scopes where they are defined.
Second we need to make the variable declaration operator modify some state value in the Eval monad. Third we
need to collect all the dynamically declared variables at the end of the evaluation of a module and bind
them to the resulting module object. This change could break things, so we need to think about it carefully.

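The three steps could be sketched like this, using an IORef in plain IO as a
stand-in for state in the Eval monad. The names declare, Declared and
runTopLevel, and the simplified Object type, are assumptions for the example.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef, modifyIORef')
import qualified Data.Map as Map

-- Stand-in for berp's Object type (an assumption for this sketch).
data Object = IntObj Integer | Undefined String
  deriving (Eq, Show)

type ObjectRef = IORef Object

-- The names declared so far while a module's top level is being evaluated.
type Declared = IORef (Map.Map String ObjectRef)

-- Declaration operator: records the name in the state the first time it runs.
declare :: Declared -> String -> IO ObjectRef
declare declared name = do
  known <- readIORef declared
  case Map.lookup name known of
    Just ref -> return ref
    Nothing -> do
      ref <- newIORef (Undefined name)
      modifyIORef' declared (Map.insert name ref)
      return ref

-- Hand-written "compiled" form of:  if cond: x = 5  else: y = 12
-- Each declaration sits inside the branch that performs the assignment.
runTopLevel :: Bool -> IO (Map.Map String Object)
runTopLevel cond = do
  declared <- newIORef Map.empty
  if cond
    then declare declared "x" >>= \ref -> writeIORef ref (IntObj 5)
    else declare declared "y" >>= \ref -> writeIORef ref (IntObj 12)
  -- Collect only what was actually declared, to bind to the module object.
  refs <- readIORef declared
  traverse readIORef refs
```

Here the resulting module dictionary contains x or y depending on cond, but
never both.
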
Implementation strategy:

   1. Add a module data constructor to the Object type. (Done)
   2. Modify compilation of a Python module to use the init binding (Done),
      and implement mkModule (Done). Move compiler code into library (Done).
      Implement a basic importModule (Done).
      Need to handle the main function to evaluate the whole program, this will
      basically just call importModule (Done).
   3. Get module name mangling to work. (Done)
   4. Get the simple case of "import Foo" to work. Don't worry about
      caching imports. Don't worry about the search path. Just find
      modules in the current directory. (Done)
   5. Avoid recompiling Haskell modules that are already compiled. (Done)
   6. Add import caching. (Done)
   7. Get top-level variable declarations working dynamically, and thus
      fix the way they are bound to the module object, see notes above. (Pending)
   8. Implement the more difficult case of "from Foo import (x,y,z)".
   9. Implement the hardest case of "from Foo import *".
   10. Add search path for files.
   11. Get it working in the interpreter. (Maybe skip if tricky).
   --- stop here and release the code ---
   12. Consider what's needed for packages.