Commit 2c18510 (@bjpop): split documentation notes into separate files; rename todo to todo.txt
Implementing modules.
Syntax:

    import mod
    import mod as alias
    from mod import item_list
    from mod import *
Issues:

The first time a module is imported into a running application it is
"executed", which may have side effects (e.g. top-level statements).
Subsequent imports do not execute those effects again; the loaded module is
cached. A corollary is that modules are identified uniquely by their name.
Python supports "packages", which make the story slightly more complex. To
simplify things it might make sense to ignore packages in the first pass.
An import of a module binds one or more variables in the scope enclosing the
import statement. This is problematic for the "from mod import *" syntax,
because it does not name the variables that it binds: they are simply the
variables exported by the imported module. So far our scheme has been to
compile Python variable names into Haskell variable names (where the Haskell
variable is bound to an IORef). For performance reasons we are trying to
avoid using strings (and an explicit environment) to handle variable names,
although such an approach would make it relatively easy to handle this
tricky kind of import.
Fortunately this tricky kind of import is only allowed at the top level of
a module. Given this limitation, a compromise might be possible. The idea
is to maintain a global environment, which maps (string) names to object
references (IORef Object, aka ObjectRef):

    type GlobalEnv = Map String ObjectRef
All variables referred to in a module must be declared first in the
compiled Haskell code. Top-level declarations can check the global
environment: if a variable of the same name was imported, the corresponding
ObjectRef should be retrieved from the table; otherwise a new ObjectRef
should be allocated and the table updated. Reads from variables remain
unchanged.
For instance, the Python code (at the top level)

    x = 12
    print(x)

is compiled to:

    = do _s_x <- var "x"
         _s_x =: 12
         _t_0 <- read _s_print
         _t_1 <- read _s_x
         _t_0 @@ [_t_1]
where the 'var' primitive does the variable declaration, and has type:

    var :: Ident -> Eval ObjectRef
which would be something like this in pseudo-code:

    var s = do
        maybeGlobal <- lookupGlobalEnv s
        case maybeGlobal of
            Nothing -> do
                ref <- newIORef (error $ "undefined variable: " ++ s)
                updateGlobalEnv s ref
                return ref
            Just ref -> return ref
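The pseudo-code above can be fleshed out into a small runnable sketch. This
is an illustration under stated assumptions, not the berp implementation:
the environment is passed explicitly in IO (the real code would thread it
through the Eval monad), and the Object type here is a stand-in.

```haskell
import Data.IORef
import qualified Data.Map as Map

data Object = None | IntObject Integer   -- stand-in for the real Object type

type ObjectRef = IORef Object
type GlobalEnv = Map.Map String ObjectRef

-- Reuse an imported binding if one exists, otherwise allocate a fresh
-- reference (initially an "undefined variable" error thunk) and record it.
var :: IORef GlobalEnv -> String -> IO ObjectRef
var envRef name = do
    env <- readIORef envRef
    case Map.lookup name env of
        Just ref -> return ref
        Nothing  -> do
            ref <- newIORef (error ("undefined variable: " ++ name))
            modifyIORef envRef (Map.insert name ref)
            return ref
```

Two calls to var with the same name yield the same ObjectRef, which is
exactly what would let a prior "from mod import *" pre-populate bindings
that the module body then picks up.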
An imported module should update the global environment with the variables
that are imported from it. In the case of "from mod import *", all variables
are imported, whereas the other kinds of import are more restrictive. For
instance, "import Foo" just introduces the "Foo" variable into scope (which
will be bound to a Module object).
Side note: if we ever wanted to, we could presumably extend this scheme to
support 'from Foo import *' at other nesting levels, by including a suitably
nested variable environment.
One question is what to do with threads. Does each thread have its own
"thread-global" environment? That seems plausible.
Another issue to consider is that module imports have dynamic behaviour.
They can be evaluated in nested scopes and under conditional statements:

    if test:
        import Foo
    else:
        import Bar
So it is undesirable to require them to be statically known and linked.
That means we can't use Haskell's import facility to implement Python's
import facility (in its full glory), though a static import facility
could be supported and may be a useful extension. A promising workaround
is to use dynamic loading via something like the plugins library. The idea
would be that each compiled module exports a single entity called, say,
init, which would have a type like:

    init :: Eval Object
The resulting object would be a Module that contains a dictionary mapping
all its members to objects.
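As a sketch of what that could look like (the constructor and helper names
here are illustrative, not the actual berp definitions):

```haskell
import Data.IORef
import qualified Data.Map as Map

-- A cut-down Object type with a Module variant holding the member dictionary.
data Object
    = None
    | IntObject Integer
    | Module (Map.Map String (IORef Object))

-- Attribute access on a module is then just a dictionary lookup followed
-- by a dereference.
moduleAttribute :: Object -> String -> IO (Maybe Object)
moduleAttribute (Module dict) name =
    case Map.lookup name dict of
        Nothing  -> return Nothing
        Just ref -> Just <$> readIORef ref
moduleAttribute _ _ = return Nothing
```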
So a Python statement like:

    import Foo

would be compiled to:

    obj <- importModule "Foo"
    _s_Foo <- var "Foo"
    _s_Foo =: obj

or maybe (this is probably better):

    _s_Foo <- importModuleRef "Foo"

where

    importModule :: String -> Eval Object

and/or

    importModuleRef :: String -> Eval ObjectRef
with pseudo-code (compileModule itself raises on failure, so importModule
can simply defer to it):

    -- this just handles the simple case of: import Foo
    importModule name = do
        maybeImported <- lookupModule name
        case maybeImported of
            Just obj -> return obj
            Nothing  -> compileModule name
    compileModule :: String -> Eval Object
    compileModule name = do
        maybePath <- findModulePath name
        case maybePath of
            Nothing -> raise ("could not find module: " ++ name)
            Just path -> do
                compiled <- isCompiled path
                if compiled
                    then liftIO $ load path "init"
                    else do
                        compileResult <- compileToObj path
                        case compileResult of
                            Nothing  -> liftIO $ load path "init"
                            Just err -> raise err

assuming:

    load :: FilePath -> String -> IO a

or something like that.
Each Python module should be compiled to a Haskell binding defining an
init function, which is the only variable exported from the Haskell module:

    init :: Eval Object
    init = do
        ... compiled stuff ...
        -- the strings below should be hashed
        mkModule [("x", _s_x), ..., ("z", _s_z)]
where mkModule builds the object for the module from the top-level
variables defined in it:

    mkModule :: [(Hashed String, ObjectRef)] -> Eval Object
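A minimal sketch of mkModule, assuming plain String keys rather than the
hashed strings suggested above, and IO in place of the Eval monad:

```haskell
import Data.IORef
import qualified Data.Map as Map

data Object
    = None
    | IntObject Integer
    | Module (Map.Map String (IORef Object))

-- Build a Module object from the top-level variable references of a module.
mkModule :: [(String, IORef Object)] -> IO Object
mkModule bindings = return (Module (Map.fromList bindings))
```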
We have to compile the Python code to object code and then dynamically load
the object code. This raises the question: should we call the compiler
(from the running program) as a shell call, or should we compile the
compiler into the runtime library?
A shell call keeps the runtime and the compiler separate, but to what
advantage? It will make the resulting executable smaller, but we could in
theory dynamically link the compiler to the executable. The space saving is
not so compelling because we still need to have the compiler around anyway.
Nonetheless, statically linking the compiler to the runtime would be
undesirable. Does GHC support dynamic linking everywhere?
It might be (slightly?) faster for the compiler to be called directly
rather than via a shell call. It might also be more portable.
Plan: see if we can compile the compiler into the runtime. See if it works
and see if the size of the executables is okay. Hope for dynamic linking to
work.

This will require us to build a berp (compiler) library from the cabal
file. Both the command-line front end and the runtime will link to the
library.
What about the main function?

Simple solution: the berp executable just dynamically loads the module that
was mentioned on the command line, something like:

    main :: IO ()
    main = do
        args <- getArgs
        let pySrc = getPySrc args
        init <- importModule pySrc
        runStmt init
How should the interpreter work? Maybe we can also use dynamic linking. The
idea is to compile each new statement into a temporary module and then
dynamically load it into the running program. The main issue to solve is
how to bind the free variables in the statement to their values from the
running program. One possible solution is to compile each statement into a
function (closure) that binds all the free variables, akin to the way we
propose to handle 'from Foo import *'. It seems appealing to try to
implement the interpreter this way. Perhaps there is a performance issue
due to loading times? Maybe we can avoid touching the file system?
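One way to picture the closure idea (purely illustrative: the Env and
CompiledStmt names are assumptions, and Int stands in for Object):

```haskell
import Data.IORef
import qualified Data.Map as Map

type Env = Map.Map String (IORef Int)   -- Int stands in for Object

-- A dynamically loaded statement could export a single entry point of
-- this shape: given the caller's environment, look up its free variables
-- by name and perform its effect.
type CompiledStmt = Env -> IO ()

-- What the interpreter might generate for the statement "x = x + 1",
-- whose only free variable is x:
stmtIncrX :: CompiledStmt
stmtIncrX env =
    case Map.lookup "x" env of
        Nothing  -> ioError (userError "undefined variable: x")
        Just ref -> modifyIORef ref (+ 1)
```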
We could use the dynamic import facility to link the compiled program to
the base library. That is, we compile a special module in the base library
and dynamically load it at runtime. Or maybe it is just better to
statically import it? Again, the availability of dynamic linking makes a
difference to the size of the resulting executable.
It looks like dynamic loading might invalidate any global state in the
program. Though it might be a bug, currently the standard IO devices do not
seem to persist as expected across a dynamic load (this is evident when
stdout is redirected on the shell: after a dynamic load it seems the
redirection is lost and the output disappears). There are a few cases of
global state in the current implementation which could do with a revision
in light of this discovery. We use unsafePerformIO to make some otherwise
effectful operations into globals. This is safe because we are careful to
ensure that the effects are benign, such as allocating IORefs, but even so,
the use of unsafe operations seems less than ideal. A few observations:
- If these things are truly constant, then we should never need the IO
  monad. Hopefully global constant bindings to immutable data structures
  should suffice.
- For most/all such global constants, the point of making them global is
  to simplify scoping issues (they are in scope everywhere). An alternative
  approach is to bring them into scope by importing them, just as the
  import mechanism will bring other things into scope. We could pretend
  that every Python module has an implicit:

    from builtins import *
at the top, where builtins is a special module which is part of the base
library implementation. This might ultimately be a cleaner solution to the
problem. Then everything will be in the Eval monad, and there will be no
unsafePerformIO. However, a more static approach might be more efficient,
as it seems that dynamic linking is a little bit slow. Obviously berp will
have to know how to find builtins.o in order to link it in.
Module naming strategy: given a Python module, what is the name of the
resulting Haskell module? Haskell modules have an internal name and the
filename is somewhat ancillary. However, it tends to make things easy for
GHC if the module name is the same as the file name. This can cause trouble
because of capitalisation (and maybe there are other issues). The proposal
is to prepend "Berp_" onto the front of every name. For example, Foo.py
will become (Berp_Foo.hs, module Berp_Foo), whereas foo.py will become
(Berp_foo.hs, module Berp_foo). Even more name mangling might be needed if
Python allows characters in its module names that are not allowed in
Haskell module names. I'm not aware of any other issues at the moment, but
maybe there are unusual things like Unicode issues to consider.
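The proposed mangling is straightforward to sketch. This only covers the
prefixing described above; escaping characters that are illegal in Haskell
module names is left open, and the function names are illustrative:

```haskell
-- Prepend "Berp_" to the Python module name, e.g. "Foo" becomes "Berp_Foo"
-- and "foo" becomes "Berp_foo". The prefix guarantees the result starts
-- with an upper-case letter, as Haskell module names require.
mangleModuleName :: String -> String
mangleModuleName pyName = "Berp_" ++ pyName

-- The corresponding Haskell source file name.
mangledFileName :: String -> FilePath
mangledFileName pyName = mangleModuleName pyName ++ ".hs"
```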
Small problem found: we don't currently do a proper job of bound methods.
This causes trouble with imports like so:

    import Foo
    Foo.f()

We treat this as if it were a method call of f on Foo, which means we add
Foo as the first argument, which is not correct. We need to distinguish
between method lookups and module attribute lookups.
Question: what is considered to be defined at the top level of a module?
Obviously all top-bound variables, but what about things imported from
other modules? It seems the answer is yes.

The semantics of which top-level variables are bound to the module object
is probably a dynamic property of the program, so our static mkModule
technique is probably wrong. For example, consider this top-level code:

    if cond:
        x = 5
    else:
        y = 12

The value of cond will determine whether x or y is bound in the module.
Currently we compile this code so that both x and y are considered in
scope, which is clearly wrong. The solution is to rejig the way the
variables in the top scope are declared. First, we need to push the
declarations into the inner scopes where they are defined. Second, we need
to make the variable declaration operator modify some state value in the
Eval monad. Third, we need to collect all the dynamically declared
variables at the end of the evaluation of a module and bind them to the
resulting module object. This change could break things, so we need to
think about it carefully.
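The three steps above can be sketched as follows (illustrative names only;
Int stands in for Object, and plain IO for the Eval monad whose state would
record the declarations):

```haskell
import Data.IORef
import qualified Data.Map as Map

-- The state value that the variable declaration operator updates.
type Declared = IORef (Map.Map String (IORef Int))

-- Step 2: declaring a variable records it dynamically.
declareVar :: Declared -> String -> Int -> IO ()
declareVar declared name val = do
    ref <- newIORef val
    modifyIORef declared (Map.insert name ref)

-- Steps 1 and 3: declarations are pushed into the branches that define
-- them, and whatever was actually declared is collected at the end to
-- become the module object's bindings.
runModuleBody :: Bool -> IO [String]
runModuleBody cond = do
    declared <- newIORef Map.empty
    if cond
        then declareVar declared "x" 5
        else declareVar declared "y" 12
    Map.keys <$> readIORef declared
```

Only the branch that actually executed contributes a binding, which is the
behaviour the static scheme gets wrong.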
Implementation strategy:

1. Add a Module data constructor to the Object type. (Done)
2. Modify compilation of a Python module to use the init binding (Done),
   and implement mkModule (Done). Move compiler code into the library
   (Done). Implement a basic importModule (Done). Handle the main function
   to evaluate the whole program; this basically just calls importModule
   (Done).
3. Get module name mangling to work. (Done)
4. Get the simple case of "import Foo" to work. Don't worry about caching
   imports. Don't worry about the search path. Just find modules in the
   current directory. (Done)
5. Avoid recompiling Haskell modules that are already compiled. (Done)
6. Add import caching. (Done)
7. Get top-level variable declarations working dynamically, and thus fix
   the way they are bound to the module object; see notes above. (Pending)
8. Implement the more difficult case of "from Foo import (x, y, z)".
9. Implement the hardest case of "from Foo import *".
10. Add a search path for files.
11. Get it working in the interpreter. (Maybe skip if tricky.)
--- stop here and release the code ---
12. Consider what's needed for packages.