A DH box for Miriam Posner and Ben Schmidt's 2016 workshops in Bethesda
FreshInstallationScript
docs
images
puppet
test
texts
.gitignore
README.md
Vagrantfile
r-packages.R
reprovision.sh


How to Set Up This Virtual Machine

In our workshop at the National Library of Medicine, we'll learn how to use special software for mining images and texts. The problem is that this software has lots of dependencies -- things that need to be installed in order for it to work -- and the process for installing all of this stuff is time-consuming and varies across different computers.

To simplify things, Ben and Miriam have set up a virtual machine, meaning a computer you install on your computer, that's set up with everything you need. In order to install the virtual machine on your computer, you'll first download some necessary software and files, and then download the VM itself.

In order to install this software, you'll need:

  • a fair amount of space on your computer (about 5GB) and a fair amount of RAM (about 4GB). Find out how much RAM your computer has: Mac | PC. Find out how much disk space your computer has: Mac | PC. If your computer doesn't have enough space or RAM, and you can't borrow one that does, you'll need to let the workshop organizers know so they can make other arrangements.
  • a stable Internet connection. If your Internet connection gets interrupted, it could cause some hiccups with the installation process. So it's best if you can count on a stable Internet connection for at least a few hours.
  • some time. Depending on various factors, this process could take anywhere from 45 minutes to a few hours.
  1. Install Vagrant

Vagrant is the software we'll use to set up the virtual machine. You can download it just like you download any piece of software. In your browser, go to https://www.vagrantup.com/downloads.html and download the macOS version. Once you've finished downloading Vagrant, install the software by double-clicking the package you downloaded, double-clicking Vagrant.pkg, and following the steps in the Vagrant Installer that launches.

If you're on a PC, download the version for Windows and install the software as you normally do.

  2. Install VirtualBox

VirtualBox is the second piece of software you'll use to set up your virtual machine. Point your browser to https://www.virtualbox.org/wiki/Downloads and download the version for Mac. Once you've finished downloading VirtualBox, install the software by double-clicking the package you downloaded, double-clicking VirtualBox.pkg, and following the steps in the VirtualBox Installer that launches.

If you're on a PC, download the version for Windows and install the software as you normally do.
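
If you'd like to confirm that both programs installed correctly, one option is to ask each of them for its version number from the Terminal (introduced a couple of steps below). This is only a sketch for a Mac or other Unix-like shell. The helper name `check_tool` is made up for this example, and `VBoxManage` is the command-line program that VirtualBox installs.

```shell
# check_tool is a hypothetical helper: it reports whether a program
# can be found on your PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
        return 1
    fi
}

# Print the version number of each tool that is installed.
for tool in vagrant VBoxManage; do
    if check_tool "$tool"; then
        "$tool" --version
    fi
done
```

If either tool shows up as "not found," try closing and reopening the Terminal before reinstalling, since a window opened before the installation may not know about the new programs yet.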

  3. Download the zipped folder containing the setup instructions

Go to https://github.com/bmschmidt/medicalHeritageVM and click on the Download Zip button to download a zipped version of the setup instructions for the virtual machine we're using. Unzip the folder after you've downloaded it.

Make a note of where you saved the folder, which is called medicalHeritageVM-master. On many computers, things get saved to the Downloads folder by default. That's OK, but I'm going to drag the folder to my Desktop, just to make it a little easier to find.

  4. Open your Terminal

Terminal is an application on your Mac that allows you to interact with your computer using written commands. It lives inside your Utilities folder (inside Applications), or, to simplify things, you can just search for it using your Mac's Spotlight search tool.

In the image you see here, your terminal is waiting for you to give it instructions. You can tell it's waiting for you to tell it what to do because of the dollar sign ($) after the username.

On PCs, the terminal is called the command prompt. For most versions of Windows, you can access the command prompt by clicking on the Start menu and entering command in the Search box. Here's how to get to the command prompt in Windows 10.

  5. Navigate to the folder you downloaded in Step Three

This is the hardest thing you'll do in this installation process! We need to use the Terminal to navigate into the folder you downloaded in Step Three.

Luckily, Macs make this pretty easy to do. Just after the dollar sign in your terminal, type the letters cd followed by a space. ("CD" stands for "change directory.") Then, remember where you saved the folder you downloaded in Step Three. (As you'll recall, the folder is called medicalHeritageVM-master.) Drag the icon for that folder into the terminal and drop it there.

Windows computers don't allow you to do the cool folder-dragging trick, so we have to navigate into the medicalHeritageVM-master folder the old-fashioned way. If you think about your files on your computer, they form a sort of upside-down tree: the root is the C: drive, and the branches extend all the way through multiple folders into all the individual documents you have on your computer.

To move one level up the tree (toward its root) on the command line, type cd .. (that's the letters c and d followed by a space and two periods). To move down the tree, type cd (which stands for "change directory"), followed by a space and the name of the folder you want to move into. (Tip: If you type the first few letters of the folder, followed by the tab key, the command prompt can auto-complete the name of the folder.) Using these two commands, navigate through your files until you're just inside the medicalHeritageVM-master folder. You can tell you're in the right place if, when you type dir (which means "show me what's in this directory"), you can see the files shown in the image in Step Seven. Once you see those files, you can skip to Step Eight.

  6. This is what your terminal should look like after you drop the folder

As you can see, your Terminal has helpfully supplied the path to the folder you just dropped into it. (A path works like a URL; it tells the computer where to go.) If your terminal looks something like mine, go ahead and press return. You've completed the hardest part!

  7. Check to see what's in the folder

After you've pressed return, notice that the command prompt changes a little bit to show that you're inside the medicalHeritageVM-master folder. Let's see what's inside the folder. To do that, enter the letters ls, followed by return. ("LS" means list the files inside whatever folder you're in.) You should see a list of files that looks like the list pictured in the image below.
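
If you'd rather type the whole thing than drag the folder, the navigation and the check can be written out as below. This is a sketch for the Mac Terminal, and it assumes you moved the folder to your Desktop as described in Step Three; change `VM_DIR` if you saved it somewhere else. The helper `in_vm_folder` is a made-up name. It relies on the fact that the folder contains a file called Vagrantfile.

```shell
# in_vm_folder is a hypothetical helper: it succeeds if the given folder
# (or the current folder, if none is given) contains a Vagrantfile.
in_vm_folder() {
    [ -f "${1:-.}/Vagrantfile" ]
}

# Assumption: the unzipped folder was moved to the Desktop.
VM_DIR="$HOME/Desktop/medicalHeritageVM-master"

if in_vm_folder "$VM_DIR"; then
    # Move into the folder and list what's inside it.
    cd "$VM_DIR" && ls
else
    echo "No Vagrantfile in $VM_DIR; set VM_DIR to wherever you saved the folder."
fi
```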

  8. Tell Vagrant to download our computing environment

Now that we've downloaded the right software and made our way into the right folder, let's download the computing environment -- that is, the operating system plus the specific software and files -- we need so that everyone's on the same page at the workshop. Luckily, the folder you downloaded in Step Three contains instructions that tell your computer how to get everything set up.

Getting this started is simple: just enter vagrant up and press return. Then it will take a long time for everything to download. You'll probably want to let this run and come back to it later. The process can take anywhere from half an hour to three hours.

While this is happening, your terminal screen will fill up with many messages. Some of them look alarming and say "Error," but that's OK. You shouldn't need to worry about them, but if you want, you can pretend you're typing them and impress onlookers by looking like a hacker.

When this process is complete, you'll see the command prompt again (your name followed by the dollar sign).
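
Because so many messages scroll past, you may want to keep a copy of them in a file in case you need to show someone what happened. Here's one possible way to do that on a Mac, as a sketch rather than a required step. The helper name `run_logged` and the file name vagrant-up.log are made up for this example.

```shell
# run_logged is a hypothetical helper: it runs a command, shows its
# messages on screen, and saves a copy of them in a log file.
# Usage: run_logged LOGFILE COMMAND [ARGUMENTS...]
run_logged() {
    log_file=$1
    shift
    "$@" 2>&1 | tee "$log_file"
}

# Only start the download if Vagrant is installed and we're in the
# folder that holds the Vagrantfile.
if command -v vagrant >/dev/null 2>&1 && [ -f Vagrantfile ]; then
    run_logged vagrant-up.log vagrant up
fi
```

The log file lands in the same folder as the Vagrantfile, so it's easy to find afterward.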

  9. Check to make sure Vagrant downloaded everything you need

Within the medicalHeritageVM-master folder you downloaded in Step Three, you'll find an images folder and a texts folder. Open the texts folder. You should see a file called sample_journals.zip. (In the image below, I've unzipped it.) That contains the journals you'll use during Ben's portion of the workshop.

In the images folder, you'll see a zipped file called xray, which will unzip into a folder called jpeg. That folder should contain many images of journal pages, which you'll use during Miriam's portion of the workshop.
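
You can also do this check from the Terminal instead of opening the folders by hand. The sketch below assumes you're still inside the medicalHeritageVM-master folder; `check_paths` is a made-up helper name. It only looks for the texts/sample_journals.zip file and the images folder, and then lists whatever is inside images so you can look for the xray file yourself.

```shell
# check_paths is a hypothetical helper: it prints OK or MISSING for each
# file or folder you name, and fails if anything is missing.
check_paths() {
    status=0
    for p in "$@"; do
        if [ -e "$p" ]; then
            echo "OK       $p"
        else
            echo "MISSING  $p"
            status=1
        fi
    done
    return $status
}

if check_paths texts/sample_journals.zip images; then
    # Show what ended up in the images folder.
    ls images
else
    echo "Something is missing; check that you're in the right folder."
fi
```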

  10. Put your VM to sleep 'til you need it again

From here on out, it'll be much faster to get your virtual machine up and running (which you'll do by typing vagrant up) because everything you need is already downloaded. Let's put our virtual machine to sleep for now, to save memory. To do that, type vagrant halt into your Terminal.

Now you're all set! Leave the medicalHeritageVM-master folder where it is on your computer, because we'll be using it again during the workshop.

More details

For those curious about some more of the features, read on. But as long as the downloads have finished, you can rest easy for the session.

Testing

  1. Open your web browser and visit http://localhost:8007/D3. You should see a bar graph giving the names of the authors of the Federalist papers: type "upon" into the box and see if the bars move.
  2. Open your web browser and visit http://localhost:8787. You should see an RStudio login screen. Enter username vagrant and password vagrant and log in. You should now see a three-paned RStudio window.
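
The two checks above can also be run from the Terminal while the virtual machine is running. This is a rough sketch that assumes the curl program is available (it comes with macOS). The helper names `http_status` and `describe_status` are made up, and the sketch only tells you whether something answers at each address, not whether the page looks right.

```shell
# http_status is a hypothetical helper: it prints the HTTP status code
# for a URL, or 000 if nothing answers within five seconds.
http_status() {
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1") || code=000
    echo "$code"
}

# describe_status is a hypothetical helper: it turns a status code into
# a short message.
describe_status() {
    case "$1" in
        2??|3??) echo "responding" ;;
        000)     echo "not reachable (is the virtual machine running?)" ;;
        *)       echo "unexpected status $1" ;;
    esac
}

for url in http://localhost:8007/D3 http://localhost:8787; do
    echo "$url: $(describe_status "$(http_status "$url")")"
done
```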

Starting and stopping the virtual machine

Before you can use RStudio in your web browser, you have to start the virtual machine. That is what vagrant up does. (It's much faster after the first time, because there's no new software to install.) Once you are done working, you will want to reclaim the (large) amount of RAM required to run all this software locally. That is the purpose of the command vagrant halt.
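
Vagrant commands only work from the folder that holds the Vagrantfile, so a small shortcut can save you some navigating. The sketch below is one way to set that up in the Mac Terminal. The function name `vm` is made up, and `VM_DIR` assumes the folder is on your Desktop.

```shell
# Assumption: the folder lives on the Desktop; change this if yours doesn't.
VM_DIR="${VM_DIR:-$HOME/Desktop/medicalHeritageVM-master}"

# vm is a hypothetical shortcut: it runs a vagrant command from inside
# the VM folder without changing the folder you're currently in.
vm() {
    ( cd "$VM_DIR" && vagrant "$@" )
}

# Typical session:
#   vm up       # start the virtual machine
#   vm status   # check whether it's running
#   vm halt     # shut it down and give the RAM back
```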

Saving your work

When you are working in RStudio Server, your files live on the virtual machine's virtual hard drive. How do you get those files off the virtual machine and back to your regular hard drive so you can print them, e-mail them, back them up, etc.? The answer is that a special folder is shared between the virtual machine and your real hard drive. This is /vagrant. Any file you save there on the virtual machine will appear in the folder where you saved the Vagrantfile. The same process works in reverse.

Because /vagrant itself is cluttered with the files for running the virtual machine (Vagrantfile, etc.), you'll find it convenient to create a subfolder of this directory and use that as your usual working directory.
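
As a sketch, here's how you might create that subfolder from a shell inside the virtual machine (for example, after running vagrant ssh from the folder with the Vagrantfile). The helper `make_workdir` and the folder name workshop-work are made up for this example; pick any name you like.

```shell
# make_workdir is a hypothetical helper: it creates a subfolder inside a
# parent folder (if it isn't there already) and prints the full path.
make_workdir() {
    mkdir -p "$1/$2" && echo "$1/$2"
}

# /vagrant only exists inside the virtual machine, so do nothing elsewhere.
if [ -d /vagrant ]; then
    make_workdir /vagrant workshop-work
fi
```

Anything saved in that subfolder on the virtual machine should then show up in a folder of the same name next to the Vagrantfile on your regular hard drive.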

Attributions

This configuration is based on a repository by Dieter Menne and then heavily modified by Andrew Goldstone, who also made use of work by Lincoln Mullen.

What's installed and how to modify it

In case anyone wants to fork this repository for their own courses or other purposes, here's a little more detail about what's installed:

The virtual machine

The machine is the ubuntu/wily64 box on Atlas, i.e. Ubuntu 15.10.

The machine is configured with 2GB of RAM, which is fine for most pedagogical purposes. Some students will need to reduce this allocation before the VM can fit in their machine's physical RAM. Conversely, the matrices and arrays required for topic-modeling with MALLET consume a lot of RAM and may require a larger allocation. Edit the line in Vagrantfile reading

      v.memory = 2048

to change the allocation. The number is in megabytes. Use vagrant reload for the configuration to take effect.
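
If you'd rather not open the Vagrantfile in a text editor, the same change can be made from the command line. This is only a sketch: `set_vm_memory` is a made-up helper, and it assumes the file contains a line of the form shown above. It prints an edited copy instead of changing the file directly, so you can look at the result first.

```shell
# set_vm_memory is a hypothetical helper: it prints a copy of the given
# Vagrantfile with the number on the v.memory line replaced.
# Usage: set_vm_memory FILE MEGABYTES
set_vm_memory() {
    case "$2" in
        ''|*[!0-9]*)
            echo "memory must be a whole number of megabytes" >&2
            return 1
            ;;
    esac
    sed "s/\(v\.memory *= *\)[0-9][0-9]*/\1$2/" "$1"
}

# Example, run from the folder that holds the Vagrantfile:
#   set_vm_memory Vagrantfile 4096 > Vagrantfile.new
#   mv Vagrantfile.new Vagrantfile
#   vagrant reload
```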

User accounts

The machine configuration is governed by a Puppet manifest, rstudio-server.pp. The Puppet script creates a single user, vagrant, which is also the RStudio Server user. Don't deploy this image to the cloud (or to unsecured lab machines) without some better security configuration, since the username and password are here in the clear.

Software

It installs (not exactly in this order):

  1. The latest available R

  2. RStudio Server. The version is hardcoded, but you can change it by editing the line in the manifest that sets $rstudioserver (or change the full download URL by also changing $urlrstudio).

  3. Various supporting libraries, languages, and tools: Java, Python, libxml2, Make, and so on.

  4. Some sample data files.

  5. R packages. Finally, the Puppet manifest causes a set of R packages to be installed. This process is governed by an R script, r-packages.R. There's nothing sophisticated here, just a list of packages to be installed from CRAN (in the variable packages). In principle, vagrant provision will cause these to be upgraded if more recent versions are available than those that are installed.