Encrypt and decrypt data frame or tibble columns using strong RSA public/private keys
Latest commit e0aae79 Mar 21, 2019

encryptr

Easily encrypt and decrypt data frame or tibble columns using RSA public/private keys

The encryptr package provides simple functions to encrypt and decrypt columns of data. The motivation is sensitive healthcare data, but the applications are wide. A number of packages provide similar functions; however, they tend to be complex and are not designed with tidyverse functions in mind. The package wraps openssl and is intended to be safe and straightforward for non-experts. It uses strong RSA (2048-bit) encryption with a public/private key pair.

It is designed to work within tidyverse pipelines.

Installation

You can install encryptr from GitHub:

devtools::install_github("SurgicalInformatics/encryptr")

Documentation

Documentation is maintained at encrypt-r.org.

Getting started

RSA encryption is based on a public/private key pair and is the method used by many modern encryption applications. The public key can be shared and is used to encrypt the information; only the matching private key can decrypt it.

The private key is sensitive and should not be shared. The private key requires a password to be set. This password should follow modern rules on password complexity. You know what you should do. If lost, it cannot be recovered.
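The key-pair principle can be sketched with the openssl package, which encryptr wraps. This is a minimal in-memory illustration, assuming openssl is installed; it is not encryptr's own code, and the postcode is just an example string.

```r
library(openssl)

# Generate a 2048-bit RSA private key; the public key is derived from it
private_key <- rsa_keygen(bits = 2048)
public_key <- private_key$pubkey

# Anyone holding the public key can encrypt...
ciphertext <- rsa_encrypt(charToRaw("DD2 5NH"), public_key)

# ...but decryption needs the private key
plaintext <- rawToChar(rsa_decrypt(ciphertext, private_key))
plaintext  # "DD2 5NH"
```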

Generate keys

The genkeys() function generates a public and private key pair. A password for the private key must be set in the dialogue box. Two files are written to the active directory.

The default name for the private key is:

  • id_rsa

And the default name for the public key is:

  • id_rsa.pub

If the private key file is lost, nothing encrypted with the public key can be recovered. Keep this safe and secure. Do not share it without a lot of thought on the implications.

library(encryptr)
genkeys()   # asks for a private key password, then writes id_rsa and id_rsa.pub
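To show what a password-protected key pair on disk looks like, the sketch below uses openssl directly to write a private key encrypted with a password and a shareable public key. It assumes openssl is installed, writes to a temporary directory rather than the active one, and hard-codes a hypothetical password only so that it runs unattended. genkeys() asks for the password interactively instead, and its exact file formats may differ from this sketch.

```r
library(openssl)

# Write to a temporary directory so no real keys are overwritten
key_dir <- tempdir()
private_key_path <- file.path(key_dir, "id_rsa")
public_key_path  <- file.path(key_dir, "id_rsa.pub")

# Hypothetical password, hard-coded for illustration only
password <- "example-password-do-not-reuse"

key <- rsa_keygen(bits = 2048)

# Private key in PEM format, encrypted with the password
write_pem(key, private_key_path, password = password)

# Public key in OpenSSH format: safe to share
write_ssh(key$pubkey, public_key_path)

# Loading the private key again requires the same password
private_key <- read_key(private_key_path, password = password)
public_key  <- read_pubkey(public_key_path)
```

If the password is lost, read_key() cannot unlock the file, which is why the password and the private key must both be kept safe.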

Encrypt

An example dataset containing the addresses of general practitioners (family doctors) in Scotland is included in the package.

data(gp)
gp

# A tibble: 1,212 x 12
   organisation_code name    address1 address2 address3 city  county postcode opendate   closedate  telephone practice_type
   <chr>             <chr>   <chr>    <chr>    <chr>    <chr> <chr>  <chr>    <date>     <date>     <chr>             <dbl>
 1 S10002            MUIRHE… LIFF RO… MUIRHEAD NA       DUND… ANGUS  DD2 5NH  1995-05-01 NA         01382 58…             4
 2 S10017            THE BL… CRIEFF…  KING ST… NA       CRIE… PERTH… PH7 3SA  1996-04-06 NA         01764 65…             4

Encrypting columns to ciphertext is straightforward. An important principle is to drop sensitive data that will never be required.

library(dplyr)
gp_encrypt = gp %>% 
  select(-c(name, address1, address2, address3)) %>% 
  encrypt(postcode, telephone)

gp_encrypt 

# A tibble: 1,212 x 8
   organisation_code city   county     postcode      opendate   closedate  telephone      practice_type
   <chr>             <chr>  <chr>      <chr>         <date>     <date>     <chr>                  <dbl>
 1 S10002            DUNDEE ANGUS      796284eb46ca… 1995-05-01 NA         5fcc30b04e260…             4
 2 S10017            CRIEFF PERTHSHIRE 639dfc076ae3… 1996-04-06 NA         715909615a6ae…             4
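To see roughly what happens to each value, the sketch below encrypts two columns with openssl and stores each ciphertext as a hex string, which is consistent with the hexadecimal text in the encrypted output. It is an illustration under assumptions, not encryptr's implementation: encrypt_vec() is a hypothetical helper, the two-row tibble stands in for gp, and openssl and dplyr are assumed to be installed.

```r
library(openssl)
library(dplyr)

private_key <- rsa_keygen(bits = 2048)
pubkey <- private_key$pubkey

# Hypothetical helper: RSA-encrypt each value and store the ciphertext
# as a hex string so it fits in a character column (NAs are kept)
encrypt_vec <- function(x, pubkey) {
  vapply(x, function(value) {
    if (is.na(value)) return(NA_character_)
    ciphertext <- rsa_encrypt(charToRaw(value), pubkey)
    paste(as.character(ciphertext), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

# Two-row stand-in for the gp data
df <- tibble(
  organisation_code = c("S10002", "S10017"),
  postcode = c("DD2 5NH", "PH7 3SA"),
  telephone = c("01382 580264", "01764 652283")
)

df_encrypt <- df %>%
  mutate(
    postcode = encrypt_vec(postcode, pubkey),
    telephone = encrypt_vec(telephone, pubkey)
  )

df_encrypt
```

Each value is encrypted separately, which suits short fields such as postcodes and telephone numbers: a 2048-bit RSA key can only encrypt inputs a little shorter than 256 bytes in one operation.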

Decrypt

Decryption requires the private key generated using genkeys() and the password set at that time. Neither the password nor the key file can be replaced, so both need to be kept safe and secure.

gp_encrypt %>%
  decrypt(postcode, telephone)   # asks for the private key password
  
# A tibble: 1,212 x 8
   organisation_code city        county     postcode opendate   closedate  telephone    practice_type
   <chr>             <chr>       <chr>      <chr>    <date>     <date>     <chr>                <dbl>
 1 S10002            DUNDEE      ANGUS      DD2 5NH  1995-05-01 NA         01382 580264             4
 2 S10017            CRIEFF      PERTHSHIRE PH7 3SA  1996-04-06 NA         01764 652283             4
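Assuming the ciphertext is stored as a hex string, as the encrypted output suggests, decryption reverses the steps: convert the hex back to raw bytes, then decrypt with the private key. This is a minimal sketch, assuming openssl is installed; to_hex(), from_hex() and decrypt_vec() are hypothetical helpers, not encryptr functions. A key loaded from a password-protected file with openssl::read_key(path, password = ...) could be passed in place of the in-memory key.

```r
library(openssl)

private_key <- rsa_keygen(bits = 2048)

# Hypothetical helpers: raw bytes <-> hex string
to_hex <- function(bytes) paste(as.character(bytes), collapse = "")
from_hex <- function(hex) {
  starts <- seq(1, nchar(hex), by = 2)
  as.raw(strtoi(substring(hex, starts, starts + 1), base = 16L))
}

# Hypothetical helper: decrypt a vector of hex ciphertexts, keeping NAs
decrypt_vec <- function(x, key) {
  vapply(x, function(value) {
    if (is.na(value)) return(NA_character_)
    rawToChar(rsa_decrypt(from_hex(value), key))
  }, character(1), USE.NAMES = FALSE)
}

# Encrypt one value with the public key, then decrypt it again
ciphertext <- to_hex(rsa_encrypt(charToRaw("01382 580264"), private_key$pubkey))
decrypt_vec(c(ciphertext, NA), private_key)  # returns c("01382 580264", NA)
```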

Using a lookup table

Rather than storing the ciphertext in the working dataframe, a lookup table can be used as an alternative. Using lookup = TRUE has the following effects:

  • returns the dataframe / tibble with encrypted columns removed and a key column included;
  • returns the lookup table as an object in the R environment;
  • writes the lookup table to a .csv file in the active directory.

gp_encrypt = gp %>% 
  select(-c(name, address1, address2, address3)) %>% 
  encrypt(postcode, telephone, lookup = TRUE)
  
Lookup table object created with name 'lookup'
Lookup table written to file with name 'lookup.csv'

gp_encrypt

# A tibble: 1,212 x 7
     key organisation_code city        county     opendate   closedate  practice_type
   <int> <chr>             <chr>       <chr>      <date>     <date>             <dbl>
 1     1 S10002            DUNDEE      ANGUS      1995-05-01 NA                     4
 2     2 S10017            CRIEFF      PERTHSHIRE 1996-04-06 NA                     4

The file creation can be turned off with write_lookup = FALSE and the name of the lookup can be changed with lookup_name = "anyNameHere".

Decryption is performed by passing the lookup object or file to the decrypt() function.

gp_encrypt %>%  
  decrypt(postcode, telephone, lookup_object = lookup)

# A tibble: 1,212 x 8
   postcode telephone    organisation_code city        county     opendate   closedate  practice_type
   <chr>    <chr>        <chr>             <chr>       <chr>      <date>     <date>             <dbl>
 1 DD2 5NH  01382 580264 S10002            DUNDEE      ANGUS      1995-05-01 NA                     4
 2 PH7 3SA  01764 652283 S10017            CRIEFF      PERTHSHIRE 1996-04-06 NA                     4

Alternatively, the lookup file can be passed by its path:

gp_encrypt %>%
  decrypt(postcode, telephone, lookup_path = "lookup.csv")

# A tibble: 1,212 x 8
   postcode telephone    organisation_code city        county     opendate   closedate  practice_type
   <chr>    <chr>        <chr>             <chr>       <chr>      <date>     <date>             <dbl>
 1 DD2 5NH  01382 580264 S10002            DUNDEE      ANGUS      1995-05-01 NA                     4
 2 PH7 3SA  01764 652283 S10017            CRIEFF      PERTHSHIRE 1996-04-06 NA                     4
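The lookup-table approach can be sketched as a split on an integer key column followed by a join on that key. This is a hypothetical illustration using openssl and dplyr (both assumed installed), not encryptr's implementation: it keeps the lookup in memory instead of writing lookup.csv, and encrypt_vec() and decrypt_vec() are made-up helpers.

```r
library(openssl)
library(dplyr)

private_key <- rsa_keygen(bits = 2048)
pubkey <- private_key$pubkey

# Hypothetical helpers: hex-encoded RSA ciphertext in a character column
encrypt_vec <- function(x) {
  vapply(x, function(value) {
    paste(as.character(rsa_encrypt(charToRaw(value), pubkey)), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}
decrypt_vec <- function(x) {
  vapply(x, function(value) {
    starts <- seq(1, nchar(value), by = 2)
    bytes <- as.raw(strtoi(substring(value, starts, starts + 1), base = 16L))
    rawToChar(rsa_decrypt(bytes, private_key))
  }, character(1), USE.NAMES = FALSE)
}

df <- tibble(
  organisation_code = c("S10002", "S10017"),
  postcode = c("DD2 5NH", "PH7 3SA")
)

# Encrypt: add an integer key and move the ciphertext into a separate lookup table
df_keyed <- df %>% mutate(key = row_number())
lookup <- df_keyed %>% transmute(key, postcode = encrypt_vec(postcode))
df_encrypt <- df_keyed %>% select(key, organisation_code)

# Decrypt: join the lookup back on the key, then decrypt the column
df_decrypt <- df_encrypt %>%
  left_join(lookup, by = "key") %>%
  mutate(postcode = decrypt_vec(postcode)) %>%
  select(-key)
```

The working data frame then carries only the key, so the ciphertext can be stored, shared or deleted separately from the analysis data.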
 

Providing a public key

In collaborative projects where data may be pooled, you can make a public key available via a link so that collaborators can encrypt sensitive data, e.g.:

gp_encrypt = gp %>% 
  select(-c(name, address1, address2, address3)) %>% 
  encrypt(postcode, telephone, public_key_path = "https://argonaut.is.ed.ac.uk/public/id_rsa.pub")
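The roles in such a collaboration can be sketched locally with openssl (assumed installed): the data owner publishes only the public key file, collaborators encrypt with it, and only the owner can decrypt the pooled ciphertext. A temporary file stands in for the shared link; this illustrates the principle and is not how encryptr fetches a public key from a URL.

```r
library(openssl)

# Data owner: generate a key pair and publish only the public key
private_key <- rsa_keygen(bits = 2048)
shared_pubkey_file <- tempfile(fileext = ".pub")  # stands in for the shared link
write_ssh(private_key$pubkey, shared_pubkey_file)

# Collaborator: read the shared public key and encrypt locally
collaborator_pubkey <- read_pubkey(shared_pubkey_file)
ciphertext <- rsa_encrypt(charToRaw("PH7 3SA"), collaborator_pubkey)

# Data owner: only the private key can decrypt the pooled ciphertext
decrypted <- rawToChar(rsa_decrypt(ciphertext, private_key))
decrypted
```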

Not a hash

The ciphertext produced for a given input changes with each encryption. This is a feature of RSA as used here: random padding is added to each value before it is encrypted. Ciphertexts should therefore not be matched between datasets, even when both were encrypted using the same public key. This is a conscious decision: deterministic matching would require sharing the necessary details (a salt), which carries its own risks.
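This behaviour is easy to check with openssl directly. In this minimal sketch (assuming openssl is installed), encrypting the same postcode twice with the same public key gives two different ciphertexts, yet both decrypt to the same value.

```r
library(openssl)

private_key <- rsa_keygen(bits = 2048)
pubkey <- private_key$pubkey

# Encrypt the same postcode twice with the same public key
c1 <- rsa_encrypt(charToRaw("DD2 5NH"), pubkey)
c2 <- rsa_encrypt(charToRaw("DD2 5NH"), pubkey)

identical(c1, c2)  # FALSE: random padding makes each ciphertext different

# Both still decrypt to the same plaintext
rawToChar(rsa_decrypt(c1, private_key))
rawToChar(rsa_decrypt(c2, private_key))
```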

Caution

All confidential information must be treated with the utmost care. Data should never be carried on removable devices or portable computers. Data should never be sent by open email. Encrypting data provides some protection against disclosure. However, particularly in healthcare, data often remains potentially disclosive (or only pseudonymised) even after identifiable variables have been encrypted. Treat it with great care and respect.
