Initial import

0 parents commit 526aec890a1712c5e607597c361370ff4fb43d66 @bashu bashu committed Sep 30, 2007
Showing with 1,486 additions and 0 deletions.
  1. +2 −0 AUTHORS
  2. +340 −0 COPYING
  3. +35 −0 INSTALL
  4. +11 −0 NEWS
  5. +88 −0 README
  6. +4 −0 TODO
  7. +48 −0 arch.conf
  8. +147 −0 archmage
  9. BIN archmage.1.gz
  10. +414 −0 archmod/CHM.py
  11. +15 −0 archmod/__init__.py
  12. +16 −0 archmod/htmltotext.py
  13. +44 −0 archmod/mod_chm.py
  14. +72 −0 setup.py
  15. +171 −0 templates/arch_contents.html
  16. +2 −0 templates/arch_css.css
  17. +26 −0 templates/arch_frameset.html
  18. +12 −0 templates/arch_header.html
  19. BIN templates/icons/0.gif
  20. BIN templates/icons/1.gif
  21. BIN templates/icons/10.gif
  22. BIN templates/icons/11.gif
  23. BIN templates/icons/12.gif
  24. BIN templates/icons/13.gif
  25. BIN templates/icons/14.gif
  26. BIN templates/icons/15.gif
  27. BIN templates/icons/16.gif
  28. BIN templates/icons/17.gif
  29. BIN templates/icons/18.gif
  30. BIN templates/icons/19.gif
  31. BIN templates/icons/2.gif
  32. BIN templates/icons/20.gif
  33. BIN templates/icons/21.gif
  34. BIN templates/icons/22.gif
  35. BIN templates/icons/23.gif
  36. BIN templates/icons/24.gif
  37. BIN templates/icons/25.gif
  38. BIN templates/icons/26.gif
  39. BIN templates/icons/27.gif
  40. BIN templates/icons/3.gif
  41. BIN templates/icons/35.gif
  42. BIN templates/icons/37.gif
  43. BIN templates/icons/39.gif
  44. BIN templates/icons/4.gif
  45. BIN templates/icons/5.gif
  46. BIN templates/icons/6.gif
  47. BIN templates/icons/7.gif
  48. BIN templates/icons/8.gif
  49. BIN templates/icons/9.gif
  50. BIN templates/icons/90.gif
  51. BIN templates/icons/91.gif
  52. BIN templates/icons/92.gif
  53. BIN templates/icons/93.gif
  54. BIN templates/icons/94.gif
  55. BIN templates/icons/95.gif
  56. BIN templates/icons/96.gif
  57. BIN templates/icons/97.gif
  58. BIN templates/icons/98.gif
  59. BIN templates/icons/99.gif
  60. BIN templates/icons/next.gif
  61. BIN templates/icons/prev.gif
  62. +39 −0 templates/index.html
@@ -0,0 +1,2 @@
+Copyright (c) 2003 Eugeny Korekin <az@ftc.ru>
+Copyright (c) 2005-2007 Basil Shubin <basil.shubin@gmail.com>
@@ -0,0 +1,35 @@
+Source Tarball
+==============
+
+First unpack the source tarball:
+
+ # tar xzvf archmage-0.1.9.tar.gz
+
+change directory:
+
+ # cd archmage-0.1.9
+
+to install arCHMage, run the following command:
+
+ # python setup.py install
+
+
+Debian / Ubuntu
+===============
+
+You can use the prepackaged version from the archive:
+
+ # apt-get install archmage
+
+if you want to use it with apache:
+
+ # apt-get install libapache-mod-python
+
+or to use it with apache2:
+
+ # apt-get install libapache2-mod-python
+
+To be able to dump HTML data from a CHM file as plain text:
+
+ # apt-get install lynx
+
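The dump feature depends on an external text browser, so it can be useful to check for one before relying on it. The following is a minimal sketch, not part of this commit: `find_text_browser` is a hypothetical helper, and it is written for Python 3 (the sources in this commit target Python 2).

```python
import shutil


def find_text_browser(candidates=("lynx", "elinks"), which=shutil.which):
    """Return the first candidate program found on PATH, or None.

    `which` is injectable so the lookup can be exercised without
    depending on what is installed on the current machine.
    """
    for name in candidates:
        if which(name):
            return name
    return None


if __name__ == "__main__":
    browser = find_text_browser()
    if browser is None:
        print("no text browser found; install lynx or elinks")
    else:
        print("dump support available via", browser)
```

The candidate names come from the INSTALL and README text; everything else here is an assumption made for illustration.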
@@ -0,0 +1,11 @@
+arCHMage 0.2
+============
+Bug fixes:
+
+* [SF #1767529] IOError exception using CHM dump option
+
+arCHMage 0.1.9
+==============
+Changes/New features:
+
+* New option - 'dump' for dumping HTML data from CHM file as Plain Text
@@ -0,0 +1,88 @@
+About arCHMage
+==============
+
+arCHMage is an extensible reader and decompiler for files in the CHM
+format. This is the format used by Microsoft HTML Help, and is also known as
+Compiled HTML.
+
+arCHMage is written in the Python programming language and uses PyCHM, the
+Python bindings for CHMLib from the GnoCHM project.
+
+Originally this utility was written by Eugeny Korekin, but since 2005 it has
+been maintained and developed by Basil Shubin.
+
+
+Features List
+=============
+
+ * Extracting CHM content
+ * Dumping HTML data from CHM file as plain text (using external tools)
+ * Running as standalone http-server
+ * Extension for Apache Web Server - mod_chm
+
+
+System Requirements
+===================
+
+arCHMage requires the following libraries:
+
+ * Python 2.3 or later
+ * PyCHM
+ * CHMLib
+
+Other (optional) dependencies:
+
+ * Lynx or ELinks - dumping HTML as plain text
+ * mod_python - Apache/Python Integration
+
+
+Installation
+============
+
+See INSTALL file for more details.
+
+
+Simple Usage HOWTO
+==================
+
+There are four ways to use the arCHMage package:
+
+1) Extract the content of a CHM file into a directory (the directory will be created):
+
+ archmage -x <chmfile> <directory>
+
+ Note: Decompilation will fail if the destination directory already exists.
+
+2) Dump HTML data from CHM file as plain text:
+
+ archmage -d <chmfile>
+
+ Note: All data is dumped to standard output. To use this feature you must
+ have the lynx or elinks text browser installed. See arch.conf for details.
+
+3) Run as an HTTP server, which publishes the CHM file contents on the specified port:
+
+ archmage -p <port> <chmfile>
+
+ Note: You can first extract the CHM file into a directory and use this
+ directory instead of the CHM file, i.e.: archmage -p <port> <chmdir>
+
+4) Configure Apache to publish CHM file contents when a request for that
+ file ends with a trailing slash (you will need a working mod_python for that):
+
+ Add these lines to your httpd.conf:
+
+ AddHandler python-program .chm
+ PythonHandler archmod.mod_chm
+
+ Restart Apache.
+
+ Suppose you have the file sample.chm in the DocumentRoot of your Apache server.
+ With this configuration, you receive the raw CHM file if you point your
+ browser to
+
+ http://yourserver/sample.chm
+
+ or you can view the CHM file on the fly if you point your browser to
+
+ http://yourserver/sample.chm/ (note trailing slash)
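The trailing-slash rule described in item 4 can be sketched as a small path check. This is only an illustration of the documented behaviour, not the code of archmod.mod_chm; `classify_request` is a hypothetical name, and the sketch is written for Python 3.

```python
def classify_request(path):
    """Classify a request path according to the README's rule.

    '.chm/' (trailing slash) -> 'view'  (render the CHM on the fly)
    '.chm'  (no slash)       -> 'raw'   (serve the file unchanged)
    anything else            -> 'other'
    """
    lowered = path.lower()
    if lowered.endswith(".chm/"):
        return "view"
    if lowered.endswith(".chm"):
        return "raw"
    return "other"


if __name__ == "__main__":
    for path in ("/sample.chm", "/sample.chm/", "/index.html"):
        print(path, "->", classify_request(path))
```

How the real handler treats paths below the trailing slash is not stated in the README, so the sketch leaves them as 'other'.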
@@ -0,0 +1,4 @@
+* Update manpage
+* CHM to PDF or ODT converter !!!
+* PDF or ODT to CHM converter ???
+* Any other ideas?
@@ -0,0 +1,48 @@
+# Directory for templates. All files in that directory will be parsed
+# and <%.+%> occurrences will be replaced with values from this
+# file. For example, <%title%> will be substituted by the value of the title
+# variable.
+# There are also some special variables, which have default values:
+# contents - a list which represents the chm file contents, and deftopic -
+# the name of the default page.
+
+templates_dir='/usr/share/archmage/templates/'
+
+# Directory for icons
+icons_dir='/usr/share/archmage/templates/icons/'
+
+# List of auxiliary files stored inside chm.
+# These files will not be extracted.
+
+auxes=('/#IDXHDR', '/#ITBITS', '/#STRINGS', '/#SYSTEM', '/#TOPICS',
+ '/#URLSTR', '/#URLTBL', '/#WINDOWS', '/$FIftiMain', '/$OBJINST',
+ '/$WWAssociativeLinks', '/$WWKeywordLinks', ':')
+
+# Title. This is the value you want to see in the browser title.
+# 'sourcename' is the name of the source file.
+from os.path import basename
+title=basename(sourcename)
+
+# Background and foreground colors for header.
+bcolor='#63baff'
+fcolor='white'
+
+# Filenames inside chm are stored in utf-8, but links can be in some
+# national codepage. If you set fs_encoding, such links will be
+# converted to it.
+fs_encoding='utf-8'
+
+# If your filesystem is case-sensitive, links in the html can point to
+# files whose names differ only in case. Set filename_case to 1
+# in that case :-)
+filename_case=1
+
+# If you want to add javascript code that restores framing to every
+# page, set restore_framing.
+restore_framing=1
+
+# CHM2TEXT conversion. The following command is used to convert CHM content
+# to plain text and dump the results to stdout. Make sure the program below
+# is installed on your PC and is accessible through $PATH
+#htmltotext='lynx -dump -stdin'
+htmltotext='elinks -dump'
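The placeholder substitution described at the top of arch.conf can be sketched as follows. This is a minimal illustration written for Python 3, not the code from archmod/CHM.py: `render` is a hypothetical name, the pattern is non-greedy (unlike the `<%.+%>` shown in the comment) so that several placeholders can share a line, and leaving unknown placeholders untouched is an assumption.

```python
import re

# Matches <%name%>; non-greedy so two placeholders on one line stay separate.
PLACEHOLDER = re.compile(r"<%(.+?)%>")


def render(template, values):
    """Replace each <%name%> in `template` with str(values[name]).

    Placeholders whose name is not in `values` are left as they are.
    """
    def substitute(match):
        name = match.group(1).strip()
        if name in values:
            return str(values[name])
        return match.group(0)

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    values = {"title": "sample.chm", "bcolor": "#63baff", "fcolor": "white"}
    print(render('<title><%title%></title><body bgcolor="<%bcolor%>">', values))
```

The variable names in `values` are taken from arch.conf; the template string is made up for the example.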
@@ -0,0 +1,147 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+# arCHMage -- extensible reader and decompiler for files in the CHM format.
+#
+# Originally written by Eugeny Korekin <aaaz@users.sourceforge.net>
+# Significantly modified by Basil Shubin <bashu@users.sourceforge.net>
+#
+# Copyright (c) 2003 Eugeny Korekin <aaaz@users.sourceforge.net>
+# Copyright (c) 2005-2007 Basil Shubin <bashu@users.sourceforge.net>
+
+
+"""arCHMage -- extensible reader and decompiler for files in the CHM format.
+
+Usage: %(program)s [options] <chmfile> [destdir]
+Where:
+
+    -x / --extract
+        Extracts CHM file into specified directory. If destination
+        directory is omitted, then a new one will be created based
+        on the name of the CHM file. This option is the default.
+
+    -p number
+    --port=number
+        Acts as HTTP server on specified port number, so you can read
+        CHM file with your favourite browser. You can specify a directory
+        with decompressed content.
+
+    -d / --dump
+        Dump HTML data as plain text into standard output.
+
+    -V / --version
+        Print version number and exit.
+
+    -h / --help
+        Print this text and exit.
+"""
+
+import os
+import sys
+import getopt
+
+from archmod import __version__, message, error_msg
+
+from archmod.CHM import *
+
+program = sys.argv[0]
+
+EXTRACT = 1
+HTTPSERVER = 2
+DUMPHTML = 3
+
+COMMASPACE = ', '
+
+def usage(code=0, msg=''):
+    message(code, __doc__ % globals())
+    message(code, msg)
+    sys.exit(code)
+
+def file2dir(filename):
+    """ Convert filename.chm to filename_html """
+    dirname = filename.rsplit('.', 1)[0] + '_' + 'html'
+    return dirname
+
+def parseargs():
+    try:
+        opts, args = getopt.getopt(sys.argv[1:], 'xdp:Vh',
+                                   ['extract', 'dump', 'port=', 'version', 'help'])
+    except getopt.error, msg:
+        usage(1, msg)
+
+    class Options:
+        mode = None        # EXTRACT or HTTPSERVER or other
+        port = None        # HTTP port number
+        chmfile = None     # CHM File to view/extract
+        dirname = None     # Destination directory
+
+    options = Options()
+
+    for opt, arg in opts:
+        if opt in ('-h', '--help'):
+            usage()
+        elif opt in ('-V', '--version'):
+            message(0, __version__)
+            sys.exit(0)
+        elif opt in ('-p', '--port'):
+            if options.mode is not None:
+                usage(1, '-x and -p are mutually exclusive')
+            options.mode = HTTPSERVER
+            try:
+                options.port = int(arg)
+            except ValueError, msg:
+                usage(1, 'Invalid port number: %s' % msg)
+        elif opt in ('-x', '--extract'):
+            if options.mode is not None:
+                usage(1, '-x and -p are mutually exclusive')
+            options.mode = EXTRACT
+        elif opt in ('-d', '--dump'):
+            if options.mode is not None:
+                usage(1, '-d should be used without any other options')
+            options.mode = DUMPHTML
+        else:
+            assert False, (opt, arg)
+
+    # Sanity checks
+    if options.mode is None:
+        options.mode = EXTRACT
+
+    if not args:
+        usage(1, 'No CHM file was specified!')
+    else:
+        options.chmfile = args.pop(0)
+
+    # CHM content should be extracted
+    if options.mode == EXTRACT:
+        if not args:
+            options.dirname = file2dir(options.chmfile)
+        else:
+            options.dirname = args.pop(0)
+
+    # Any other arguments are invalid
+    if args:
+        usage(1, 'Invalid arguments: ' + COMMASPACE.join(args))
+
+    return options
+
+
+def main():
+    options = parseargs()
+    if not os.path.exists(options.chmfile):
+        error_msg('No such file: %s' % options.chmfile)
+
+    # Check whether the argument is a CHM file or a directory with
+    # decompressed content. Depending on the result, make 'source' an
+    # instance of the CHMFile or CHMDir class.
+    source = os.path.isfile(options.chmfile) and \
+             CHMFile(options.chmfile) or CHMDir(options.chmfile)
+
+    if options.mode == HTTPSERVER:
+        CHMServer(source, port=options.port).run()
+    elif options.mode == DUMPHTML:
+        source.dump_html()
+    else:
+        source.extract(options.dirname)
+
+if __name__ == '__main__':
+    main()
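The default destination directory is derived by `file2dir`, which splits on the last dot anywhere in the path. The sketch below, written for Python 3, repeats that expression and compares it with a variant based on `os.path.splitext`; `file2dir_splitext` is a hypothetical alternative and is not part of this commit.

```python
import os.path


def file2dir(filename):
    # Same expression as file2dir in the archmage script.
    return filename.rsplit('.', 1)[0] + '_' + 'html'


def file2dir_splitext(filename):
    # Hypothetical variant: splitext only looks for a dot in the last
    # path component, so dots in directory names are left alone.
    return os.path.splitext(filename)[0] + '_html'


if __name__ == "__main__":
    print(file2dir('manual.chm'))               # manual_html
    print(file2dir('docs.v2/manual'))           # docs_html
    print(file2dir_splitext('docs.v2/manual'))  # docs.v2/manual_html
```

For ordinary names such as manual.chm both versions agree; they differ only when the file has no extension and a parent directory name contains a dot.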