santhoshtr/wiki2cd

Introduction

This is a set of Python programs and scripts for downloading a selected list of topics from Wikipedia to a local machine. It can be used to create an offline repository, such as a CD/DVD distribution of selected Wikipedia articles. The program was originally written to release 500 selected articles from the Malayalam Wikipedia on CD. It is written so that it can be reused with any wiki project for the same kind of work.

“The Malayalam wikipedia: selected 500 articles”, generated by this program, is available here.

Usage

The program ships with a sample set of topics from the English Wikipedia and is pre-configured to run on that sample. All you need to do is download the program and run wiki2cd.sh. A folder named samplewiki will be generated in the parent folder, and you can open its index.html in your browser. With the pre-configured topics, you will get a repository like this.



$ cd wiki2cd
$ ./wiki2cd.sh


How to Customize?

   1. Prepare a topic list. The input to the program is a plain text file in which each line contains the title of a page. See the sample topicslist.txt supplied with the program. If you want to categorize the topics, put == before the title. This is used to create a navigation tree in the offline version. The number of = signs prefixed to the title determines its position in the tree.
   2. Open wiki2cd.sh in any text editor. Change outputfolder="../samplewiki" to your preferred output folder. The extracted content is written to that directory; if the directory does not exist, the program creates it. Change baseurl="http://en.wikipedia.org" to your wiki's base URL. Change topics="topicslist.txt" to the topic list you prepared.



      #!/bin/bash

      # Change the following properties as per your requirement
      outputfolder="../samplewiki"
      baseurl="http://en.wikipedia.org"
      topics="topicslist.txt"

   3. You also need to edit some of the pages, such as the banner, titles and credits, as per your requirement. You may also want to replace the banner images, the main page image and so on to fit your preference.
   4. The ISO9660 file system has many limitations when it comes to Unicode file names and directory depth. The first part of the shell script creates a repository in which file names match the article titles and image names match the original wiki image names. This usually causes problems when you try to burn the repository to a CD/DVD, so the shell script then renames all files to numbers and moves all images to the wikiimages folder to reduce the directory depth. By default, the script makes the repository suitable for writing to a CD/DVD. If you prefer to keep the original file and image names and do not plan to write to a CD, you can comment out the section of the script that does the renaming. More details are available as comments in wiki2cd.sh.
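
The renaming described in step 4 can be illustrated with a small Python sketch. This is not the code from wiki2cd.sh; the function name numbered_names and the numbering scheme (sorted order, starting from 1, extension kept) are assumptions made for illustration only.

```python
import os

def numbered_names(filenames):
    """Map each original file name to a numeric name, keeping the extension.

    Hypothetical helper: the real script may number files differently.
    """
    mapping = {}
    for index, name in enumerate(sorted(filenames), start=1):
        _, ext = os.path.splitext(name)
        mapping[name] = f"{index}{ext}"
    return mapping

if __name__ == "__main__":
    # Unicode and space-containing names are the ones that cause trouble on ISO9660.
    originals = ["മലയാളം.html", "India.html", "Kerala map.png"]
    for old, new in numbered_names(originals).items():
        print(f"{old} -> {new}")
```

A mapping like this would also have to be applied to the links inside the generated HTML pages, otherwise the renamed files would no longer be reachable from each other.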

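The topic list format from step 1 can be read with a few lines of Python. The sketch below is an assumption about how such a file might be interpreted, not the parser used by the program: it treats the number of leading = signs as the level in the navigation tree and treats lines without a prefix as level 0. The function name parse_topics is hypothetical.

```python
def parse_topics(lines):
    """Return (level, title) pairs; level is the number of leading '=' signs."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        level = len(line) - len(line.lstrip("="))
        # Assumption: any trailing '=' signs are decoration and are dropped too.
        title = line.strip("=").strip()
        entries.append((level, title))
    return entries

if __name__ == "__main__":
    sample = ["==Science", "Physics", "===Biology", "Cell (biology)", ""]
    for level, title in parse_topics(sample):
        print(level, title)
```

Check the sample topic list supplied with the program for the exact format before relying on this interpretation.
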
Contact

For any assistance, contact the author: santhosh dot thottingal at gmail dot com

Thanks


    * Hiran Venugopal for the artworks

    * Shiju Alex for testing, feature suggestions

License

The program is licensed under GPLv3+

About

[ABANDONED] Tool to create an offline repository or CD from a selected list of topics from wikipedia
