A nice UI for OCLC's EZProxy based on jQuery.mmenu (and optionally data from https://rena.mpdl.mpg.de/rena/ )

bibliocoll/ezmenu

EZMENU

use EZProxy with style

This repository contains our custom UI for OCLC's EZProxy.

Links to proxied resources are presented in a MMenu, which is injected into all pages visited via the proxy server.

Version History

  • 0.2.0: TypeScript/webpack 5 rewrite, no more jQuery, EZJump (Q4 2021)
  • 0.1.3: ReNa-Backend uses JSON instead of JSONP (Q2 2019)
  • 0.1.2: demo server (Q3 2016)
  • 0.1.1: dependency updates, some css changes (Q3 2016)
  • 0.1.0: initial release (Q2 2016)

Planned changes

  • none currently

To Do, Help Wanted:

  • Refactor grabrena.py until the salient parts can be used stand-alone
  • Sanitize network interactions for XSS vectors (somewhat done?)
  • Loading fonts from the EZProxy webserver fails due to lack of CORS headers
  • Document the build process better
  • Better error handling/messages
  • LocalStorage for JS modules? (currently we rely on browser caching for our 160 kB blob)

New: EZJump

We've built a little demo of the EZJump form for those who just want to use it without the menu; see ezjump/.

Concept

EZProxy has a Find/Replace directive system that allows manipulation of proxied content. We use that to insert a single <script>-tag into every proxied page, which loads our menu code. The menu structure is generated on the fly based on (locally cached) JSON files stored on the EZProxy web server. Said JSON is generated by querying MPG ReNa through the proxy, so that all URLs to proxied resources are transformed to proxy-by-hostname URLs, if they match a host EZProxy is configured for.

Note: The frontend side (the menu-injection part) of this project should be useful for anyone running an EZProxy installation, but we assume a specific source for the menu content: We (only) proxy resources contained in the MPG ReNa database, which provides us with a JSON API for resource collections. In the likely case that you are not a Max Planck Institute, you will have to come up with your own way of getting your menu entries into a usable (JSON-)format or adjust our code accordingly. We're happy to include your pull requests and move the MPG-related stuff to a branch.

Setting up the development environment

To get this running and deployed on your EZProxy setup, quite a bit of (configuration) work is required. We'll get you started on that path with a little demo of the UI, which you can play with right away.

Prerequisites

On your (Linux/macOS/Cygwin/WSL) development box:

  • Git
  • Node.js with npm

Later, on your EZProxy box:

  • Netfilter (or any other firewall you know how to use)
  • Python3
  • Python3 Requests

The client part of this project is written in TypeScript and SASS. Neither language is natively supported by web browsers (yet), so both need to be compiled into (ES6) JavaScript and CSS, respectively. We have tooling in place to do this for you, but please keep this in mind while working with the code.

Getting all the parts together

Git clone this repository, and then open a shell in the newly created directory (if in doubt: the 'ezmenu' dir with the package.json file in it) and run:

npm install

This will pull in all the project dependencies from npm. We're using dart-sass to compile SASS to CSS.

Once that is complete, you can run:

npm start

in the project directory to start a local web server with a little demo page (navigate to http://localhost:8080 to see it). It will be really slow, but you should see a blue "GO"-button to the lower left of your screen.

While npm start is running, webpack5 is watching the TypeScript and SASS files in frontend/src and frontend/sass/, and any changes made will be reflected in the demo webserver. Note that webpack-dev-server does not write anything to disk. The files in the demo directory are just scaffolding, best not edit them (but feel free to look at them, of course).

We'll get you started with an overview of the frontend/menu code next, then explain the configuration options and deployment.

Frontend

Preface: It is not required to understand the code to get this running on your installation, but in case you're curious, take a look at the files in frontend/src/ while you read this chapter (start with frontend/src/index.ts). If you're familiar with JavaScript but new to Promises, JavaScript Promises should have you covered. If you do not care for the code, please do skim this chapter regardless, as it contains explanations that you will likely need to take into consideration later.

The interface is based on MMenu, and we populate the menu with content loaded from JSON files. The expected format of these files is documented at the end of this document under Data Formats, and codified in the class definitions in SetlistItem.ts and SetlistCollectionItem.ts in frontend/src/common/, which also hold the code that transforms each data item into the DOM elements that form a menu entry.

The JSON files are expected to be placed in the loggedin/ directory of the EZProxy internal web server, and they are queried by our script as follows: First, the file setlist.json is fetched. It contains an Array of SetlistItems with titles, IDs/filenames and timestamps for each category to be added to the menu. The Setlist should be less than 1 kB in size and is fetched each time the menu is generated.

Then the script checks localStorage for entries holding the content of each collection listed in the Setlist and, if present, compares their timestamps with the freshly downloaded Setlist. If the local data is still up to date, it is used; otherwise the corresponding JSON file is downloaded from the EZProxy web server (and later stored in localStorage). Either way, once all files are available or the download attempts have timed out, an HTML <nav> element containing a two-level tree of list items representing the menu content is generated, and MMenu is initialized on that structure.
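
The freshness check can be sketched independently of the browser. The frontend itself is written in TypeScript; the Python below only illustrates the decision logic. The function name collections_to_download, the shape of the cached mapping, the second Setlist entry, and the rule "any timestamp mismatch means stale" are assumptions made for this sketch, not taken from the code base.

```python
def collections_to_download(setlist, cached):
    """Return the ids whose cached copy is missing or outdated.

    setlist: list of dicts with "id" and "timestamp", as in setlist.json
    cached:  hypothetical mapping of id -> timestamp of the stored copy
    """
    stale = []
    for item in setlist:
        local_timestamp = cached.get(item["id"])
        # Assumption: a missing entry or any timestamp mismatch triggers a download.
        if local_timestamp != item["timestamp"]:
            stale.append(item["id"])
    return stale


setlist = [
    {"id": "000007897", "timestamp": "1449151638"},
    {"id": "000007898", "timestamp": "1449151700"},  # made-up second entry
]
cached = {"000007897": "1449151638"}
print(collections_to_download(setlist, cached))  # ['000007898']
```

Only the second collection is fetched here, because the first one has a cached copy whose timestamp matches the Setlist.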

We currently do not support sub-sub-menus, simply because no-one has asked for that feature. There's probably only some type-system magic to be performed to allow this.

Since this menu is to be injected into proxied-by-hostname pages (which reside at subdomains like http://journal.domain.name.YOUR-EZPROXY.TLD) and localStorage is governed by browser same-origin rules (which mandate fully matching hostnames), the localStorage part of this script resides inside an iframe. This iframe always loads the same URL and thus has access to the same localStorage location, no matter what page/subdomain the main script is injected into. frontend/src/implant.ts holds the code that runs inside the iframe. Both scripts communicate via the Channel Messaging API to ferry menu content from the iframe to the UI building parts of the script.

Please note that privacy enhancing browser plugins or settings can prevent this setup from executing. You should educate users to make sure that all proxy-by-hostname subdomains allow opening an iframe to https://YOUR-EZPROXY.TLD with running JavaScript inside. Also, disallowing localStorage (or using a browser without a localStorage implementation) will result in more network load and thus a slower UI experience.

Frontend Configuration

At the very least, you will have to tell this script the hostname of your EZProxy installation, but you might also want to swap the search form for your own, and so on. You could also take a moment to browse the mmenu examples to get a feel for what else is possible with the menu. If your list of resources is sorted alphabetically, there is some visual sugar you might want to use (look for lines commented out in frontend/src/menucfg.ts).

We do not require you to keep the copyright notice at the bottom of the menu, we mainly put it there to prevent users from triggering bottom-of-screen interactions while using the menu (this does not mean you are not bound by the (A)GPL when using our code).

Open frontend/src/menucfg.ts in your editor and search for 'XXX' to find the three spots you will have to edit to make the script ready for deployment on your EZProxy installation. Please refer to the comment fields in that file for additional explanations.

Frontend Deployment

Attention:

  • This will only give you an empty menu unless you have matching JSON files in place as well
  • This will not inject the menu into any proxied pages yet, you might want to read the rest of this document before deploying anything anywhere ;)

Stop the npm start process (CTRL-C in the console), make sure the 'XXX'-marked spots in frontend/src/menucfg.ts point to your EZProxy installation, and then run:

npm run build

This will run webpack in production mode, which creates a dist/ folder for you, the contents of which you can drop into the 'docs/' directory of your EZProxy server.

Frontend Injection

To get our menu into a proxied page, we need to add a <script> tag to the HTML of that page. We're using the EZProxy Find/Replace directive to that end. EZproxy resources are configured as config.txt Database Stanzas, and we need to append our Find/Replace code to all of them.

During testing, we've found some websites that contain '</head>' as part of a string inside a <script> tag. To prevent our Find/Replace code from being triggered by that, we make use of states to ensure we only inject our <script> tag before the closing </head> tag of the HTML actually being rendered, and not into some JavaScript string. This leads to a rather long-ish addendum to each Stanza:

Find <head
Replace -AddState=inHtml+notInScript <head
Find <script
Replace -RemoveState=notInScript <script
Find </script
Replace -AddState=notInScript </script
Find -State=inHtml+notInScript </head>
Replace <script type="text/javascript" src="https://YOUR-EZPROXY.TLD/loggedin/injectmenu.js" defer="defer"></script></head>

You could add this to each Stanza by hand, but backend/grabrena.py contains code to handle this for you. However, since Database Stanzas are rather loosely structured, certain conventions have to be followed to make them readable by the script:

  • We require empty lines between Stanzas, and only between Stanzas
  • Only a MimeFilter line may be placed between Title and URL lines
  • Only comment lines (preceded by '#') or the keywords Option, ProxyHostnameEdit, MimeFilter, NeverProxy, AnonymousUrl, HTTPHeader, and Cookie may precede a Title or URL line
  • Everything between the Title/URL pair and the next empty line is considered part of the Stanza

The regular expression that tries to identify a Stanza can be found in backend/grabrena.py.
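
To illustrate the idea, here is a minimal sketch of appending the Find/Replace lines to every Stanza in a file that follows the conventions above. It is not the code from backend/grabrena.py: the function name add_injection is hypothetical, the sketch simply treats every blank-line-separated block containing both a Title and a URL line as a Stanza, and it only recognizes the spelled-out keywords.

```python
import re

INJECTION_URL = "https://YOUR-EZPROXY.TLD/loggedin/injectmenu.js"

FIND_REPLACE_LINES = [
    "Find <head",
    "Replace -AddState=inHtml+notInScript <head",
    "Find <script",
    "Replace -RemoveState=notInScript <script",
    "Find </script",
    "Replace -AddState=notInScript </script",
    "Find -State=inHtml+notInScript </head>",
    'Replace <script type="text/javascript" src="{url}" defer="defer"></script></head>',
]


def add_injection(stanza_text, url=INJECTION_URL):
    """Append the Find/Replace lines to each block with Title and URL lines."""
    # Stanzas are separated by empty lines (first convention above).
    blocks = re.split(r"\n\s*\n", stanza_text.strip())
    result = []
    for block in blocks:
        # First word of every non-empty line, e.g. "Title", "URL", "Domain".
        keywords = {line.split(None, 1)[0] for line in block.splitlines() if line.strip()}
        if {"Title", "URL"} <= keywords:
            block += "\n" + "\n".join(FIND_REPLACE_LINES).format(url=url)
        result.append(block)
    return "\n\n".join(result) + "\n"


example = """Title Example A
URL https://a.example.org

Title Example B
URL https://b.example.org
Domain b.example.org
"""
print(add_injection(example))
```

Because the lines are appended at the end of each block, they fall inside the "everything up to the next empty line" part of the Stanza.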

Backend

To populate the menu with links, JSON files need to be placed in the loggedin/ directory of the EZProxy web server. The file for the first level of the menu must be named setlist.json, and the file for each submenu must be named after that submenu's alphanumeric id plus .json. Please see Data Formats below for details and the demo/loggedin/ folder for examples.

This is probably the part where you will have to put in the most work yourself, unless you are a library of a Max Planck institute and you have configured resource collections on the ReNa VuFind installation ("predefined sets"). If you are one of the lucky few that fit the description, backend/grabrena.py can get all "folders"/collections you have configured on ReNa and create the required JSON files from these.

The reasoning behind using ReNa data for the menu structure is based on the fact that the EZproxy configuration "thinks" in webservers, while users are more likely to think in journal collections, databases or search engines, which usually are not quite congruent with each other (some websites hold spades of journal collections). A menu based on the EZproxy Stanzas would be accurate, but less user-friendly. ReNa also sports descriptions and indexing fields we can use.

The highway: ReNa

Find backend/grabrena.py, and run it once. It will do nothing but create a grabrena.ini in the same directory, which contains all configurable values with more-or-less sensible defaults and explanations.

This script is a horrible monolithic mess, and we would like to apologize in advance to anyone who needs to break it open and salvage it for usable parts. Pull Requests with a modular structure and a proper separation of concerns are very welcome.

With that said, here's what the script does:

  • (optionally) Run svn up on a pre-existing svn repository tracking the eResources.txt repo the MPDL provides
  • (optionally) Add our Find/Replace code to each Stanza in a copy of that file and move it over into the EZProxy configuration directory
  • (optionally) Restart EZProxy if there were changes (see below)
  • Query ReNa through the (freshly restarted) EZProxy for the list of predefined sets for your MPI
  • Query ReNa again for the content of each set (and for one collection aptly named "Everything")
  • Filter/reformat the received data and place it in JSON files on the EZProxy internal web server

The script is meant to be run via cron in the early morning, after the nightly update to ReNa has been completed.

Since the JSON(P) data provided by ReNa is polled through the proxy, and you will have added a HostJavascript entry for rena.mpdl.mpg.de to your EZProxy configuration (see below), any URLs inside the JSON that the proxy recognizes as resources-to-be-proxied are rewritten in transit and now point to the appropriate proxy-by-hostname subdomains.
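
For illustration, the effect of that rewriting on a single URL can be mimicked as follows. EZProxy does this itself while the response passes through; the function name to_proxy_by_hostname is hypothetical, and the sketch only covers the simple case shown under Data Formats (it ignores ports, credentials, and any other hostname transformations EZProxy may apply).

```python
from urllib.parse import urlsplit, urlunsplit


def to_proxy_by_hostname(url, proxy_host):
    """Append the proxy hostname to the host part of a resource URL."""
    parts = urlsplit(url)
    # Only the host changes; scheme, path and query stay as they are.
    return urlunsplit(parts._replace(netloc=parts.netloc + "." + proxy_host))


original = "http://search.ebscohost.com/login.asp?profile=ehost&defaultdb=aph"
print(to_proxy_by_hostname(original, "go.coll.mpg.de"))
# http://search.ebscohost.com.go.coll.mpg.de/login.asp?profile=ehost&defaultdb=aph
```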

Adding the grabrena user

In the user.txt file of your EZProxy installation, add something like the following above your other user definitions:

::group=RenaOnly
grabrena:12345

::group=Default
#your user definitions go here, ie: ::LDAP ...

This creates a user group named "RenaOnly", adds the user "grabrena" with password "12345" to it, and then starts defining the group "Default", which will contain all the user definitions that follow further down in the file. Obviously, you want to pick a proper password.

In config.txt, you can now reference the RenaOnly group and allow it to access ReNa, and prepend the Default group declaration to the definition of your remaining Stanzas, which will limit access to users in that group (which should be everyone but the grabrena user, effectively locking out the latter from anything but ReNa).

Group RenaOnly
Title ReNa
MimeFilter application/json .* javascript
URL https://rena.mpdl.mpg.de
HJ rena.mpdl.mpg.de
HJ https://rena.mpdl.mpg.de

Group Default
#your resource definitions go here, ie: IncludeFile config/eResources.txt

If you have a multi-tiered user setup, you should be able to adapt this to your needs.

Restarting EZProxy on changes

To make this work without running grabrena.py as root (which we discourage), EZProxy needs to run on non-privileged ports, so it can be (re)started by a non-root user. Change your config.txt to something akin to this:

RunAs someuser:someuser

LoginPort 80 -Virtual
LoginPort 8080

LoginPortSSL 443 -Virtual
LoginPortSSL 8443

In whatever you're using to configure the firewall on your machine, do the equivalent of these iptables port redirection rules (repeat for additional interfaces as needed if the machine is multi-homed):

iptables -t nat -A PREROUTING -p tcp -i eth0 --dport 80 \
-j REDIRECT --to-port 8080
iptables -t nat -A PREROUTING -p tcp -i eth0 --dport 443 \
-j REDIRECT --to-port 8443

You will also want to allow access to the ports you just redirected the traffic to, with something like this:

iptables -A INPUT -p tcp --dport 8080 -j ACCEPT
iptables -A INPUT -p tcp --dport 8443 -j ACCEPT

If you are running backend/grabrena.py via cron on your EZProxy machine, you can set proxy_login_port in the ini-file to the port you redirect HTTPS traffic to. The script will use that port for all URLs it queries at/via the proxy, so connections from localhost that are not governed by firewall redirects will not fail.

The goat path: Rolling your own

Read the above chapter to see how we did it in our case, and then decide how to proceed. The following is a loose work-in-progress collection of hints we hope are helpful (please send PRs to add your own):

eResources.txt

Our setup assumes that there exists a file that contains only Stanzas and that is imported into the main config.txt file via an IncludeFile directive. If you adhere to this, you can use backend/grabrena.py to append the Find/Replace lines to each Stanza in your file. The script will complain a lot, but it will do the job. Note however, that it is not equipped to properly deal with anything but Stanzas.

Make sure to read the Frontend Injection chapter again. You will have to put the following into your grabrena.ini:

svn_up_path: /path/to/your/stanza-file.txt
eres_path: /path/to/the/resulting/stanza-with-findreplace-file.txt
force_eRes_update: Yes
injection_url: https://YOUR-EZPROXY.TLD/loggedin/injectmenu.js

Security

We have attempted to mitigate the most obvious XSS vectors in our JavaScript, but we would love some extra sets of eyes on that front. At the time of writing, snyk did not report any issues in our code or dependencies.

Data Formats

The main menu structure is defined by Array elements in a ReNa PredefinedSet JSON answer, where each entry / submenu heading is defined like this:

getSets([{
  "id":"000007897",
  "name":"Collective Goods' Selection",
  "fullName":"MBRG:Collective Goods' Selection",
  "url":"https://rena.mpdl.mpg.de\/rena\/Search\/Results?filter%5B%5D=inst_txtF_mv%3A%22MBRG%22&filter%5B%5D=predef_txtF_mv%3A%22MBRG%3ACollective+Goods%27+Selection%22"
}, ... ]);

which gets transformed by grabrena.py and stored in an Array element in the setlist.json file (and parsed into a SetlistItem by lslib.js):

// filename: setlist.json
[{
  "id": "000007897",
  "timestamp": "1449151638",
  "logo": "one",
  "name": "Collective Goods' Selection"
}, ... ]
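
A minimal sketch of that reduction, assuming the ReNa answer has already been parsed into a list of dictionaries. The function name build_setlist is hypothetical, and because this document does not say how grabrena.py derives the timestamp and logo values, the sketch takes both as parameters with placeholder defaults.

```python
import json
import time


def build_setlist(rena_sets, timestamp=None, logo="one"):
    """Reduce ReNa set descriptions to the fields used in setlist.json."""
    if timestamp is None:
        # Placeholder: use the current time as a string of epoch seconds.
        timestamp = str(int(time.time()))
    return [
        {"id": s["id"], "timestamp": timestamp, "logo": logo, "name": s["name"]}
        for s in rena_sets
    ]


rena_sets = [{
    "id": "000007897",
    "name": "Collective Goods' Selection",
    "fullName": "MBRG:Collective Goods' Selection",
}]
print(json.dumps(build_setlist(rena_sets, timestamp="1449151638"), indent=2))
```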

The submenus are created from entries in ReNa collection lists (Dictionary entries):

fromRemote000007897({ "ERS000000002":{
  "title":"Academic Search Premier (EBSCO)",
  "description":"Academic Search Premier contains full text for nearly 4,500 journals, including more than 3,600 peer-reviewed titles. In addition to the full text, this database offers indexing and abstracts for all 8,144 journals in the collection.",
  "genre":["Fulltext Database"],
  "topic":["Multidisciplinary"],
  "language":["English"],
  "title_short":"Academic Search Premier (EBSCO)",
  "prov_txt_mv":["EBSCO"],
  "subject_txt_mv":["Multidisciplinary"],
  "keyword_txt_mv":["Ethnic studies","Medical sciences","Arts and literature","Language and linguistics","Chemistry","Physics","Engineering","Computer sciences"],
  "scope_txtF_mv":["MPG"],
  "naturl_str_mv":["http://search.ebscohost.com\/login.asp?profile=ehost&defaultdb=aph"],
  "access_txtF":"SUBSCRIPTION"
}, ... });

converted to Array elements in a Collection JSON file by grabrena.py (and parsed into SetlistDataCollectionItems by lslib.js):

// filename: 000007897.json
// there needs to be one json file for each id in setlist.json
// note that this is an object that contains an array
{
  "name": "Collective Goods' Selection",
  "id": "000007897",
  "data": [{
    "title": "Academic Search Premier (EBSCO)",
    "url": "http://search.ebscohost.com.go.coll.mpg.de/login.asp?profile=ehost&defaultdb=aph",
    "proxied": true,
    "free": false,
    "desc": "Academic Search Premier contains full text for nearly 4,500 journals, including more than 3,600 peer-reviewed titles. In addition to the full text, this database offers indexing and abstracts for all 8,144 journals in the collection."
  }, ... ]
}
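
The conversion from ReNa entries to a Collection file can be sketched like this. It is a guess at the mapping, not the logic of grabrena.py: the function name build_collection is hypothetical, "proxied" is assumed to mean "the URL host ends with the proxy hostname", and "free" is assumed to mean "access_txtF is not SUBSCRIPTION".

```python
from urllib.parse import urlsplit


def build_collection(set_id, name, rena_entries, proxy_host):
    """Turn a dict of ReNa entries into the Collection structure shown above."""
    data = []
    for entry in rena_entries.values():
        url = entry["naturl_str_mv"][0]
        host = urlsplit(url).hostname or ""
        data.append({
            "title": entry["title"],
            "url": url,
            # Assumption: the proxy has already rewritten proxied URLs.
            "proxied": host.endswith("." + proxy_host),
            # Assumption: anything that is not a subscription counts as free.
            "free": entry.get("access_txtF") != "SUBSCRIPTION",
            "desc": entry.get("description", ""),
        })
    return {"name": name, "id": set_id, "data": data}


entries = {"ERS000000002": {
    "title": "Academic Search Premier (EBSCO)",
    "description": "Academic Search Premier contains full text ...",
    "naturl_str_mv": ["http://search.ebscohost.com.go.coll.mpg.de/login.asp?profile=ehost&defaultdb=aph"],
    "access_txtF": "SUBSCRIPTION",
}}
collection = build_collection("000007897", "Collective Goods' Selection", entries, "go.coll.mpg.de")
print(collection["data"][0]["proxied"], collection["data"][0]["free"])  # True False
```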

backend/grabrena.py uses OrderedDict instead of dict to represent JSON objects/dictionaries in Python to preserve the sequence of items received from ReNa.
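
As a small illustration of that technique (not an excerpt from grabrena.py), the standard json module can be told to build OrderedDicts while parsing. The second entry id below is made up for the example.

```python
import json
from collections import OrderedDict

raw = '{"ERS000000002": {"title": "B"}, "ERS000000001": {"title": "A"}}'

# object_pairs_hook receives the key/value pairs in document order.
entries = json.loads(raw, object_pairs_hook=OrderedDict)
print(list(entries))  # ['ERS000000002', 'ERS000000001']
```

Since Python 3.7, plain dicts also preserve insertion order, so the explicit OrderedDict mainly matters on older interpreters or when the ordering intent should be obvious to readers.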