# Text Editors

## Why Do I Need to Learn How to Use a Text Editor?

We are all used to working in graphical computing environments where we have multiple windows that we can interact with using a mouse or a track pad, in addition to our keyboard.

However, in scientific computing, we often need to connect remotely to another computer. Frequently, these connections do not provide a graphical environment that recognizes our mouse (or equivalent), yet we still need to edit files.

The bash terminals that we use in this Jupyter environment are an example of a non-graphical connection.

## What Text Editors Are Available to Use?

Unix/Linux comes with a number of non-graphical text editors, including:
* [``nano``](https://en.wikipedia.org/wiki/GNU_nano)
* [``emacs``](https://en.wikipedia.org/wiki/Emacs)
* [``vi`` (actually ``vim``, "**vi** i**m**proved")](https://en.wikipedia.org/wiki/Vi)

``nano`` is the newest of the three editors. ``emacs`` and ``vi`` are among the oldest programs still in use. There is almost a religious divide between people who use ``emacs`` and people who use ``vi``. We will spend a little bit of time learning how to use ``vim``. Our choice of ``vim`` is motivated by two facts:
1. it is the default editor on many Linux systems (and thus will likely come up when we forget to supply a commit message with git, covered in a future lesson)
1. most of the people preparing these lectures prefer ``vim`` to ``emacs``.

We've included a number of cheat sheets for ``vi`` in this repository (in the Resources directory):

* [vi/vim cheat sheet and tutorial (pdf)](../Resources/vi-vim-cheat-sheet-and-tutorial.pdf)
* [vi reference card (pdf)](../Resources/virefcard.pdf)


## Learning Objectives

1. We will be able to open a text document in vim.
1. We will be able to navigate the document using basic keyboard commands.
1. We will be able to do basic search and replace in vim.

### Opening Documents

To open a file in ``vim``, we type the following in the terminal:

```bash
vim FILENAME
```

If ``FILENAME`` exists, it will be opened for editing. If ``FILENAME`` does not exist, ``vim`` creates a buffer associated with that name, but no file with that name is created until the buffer is saved.
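
The difference between a buffer and a file can be checked from the shell. The following is a minimal sketch rather than part of the lesson itself: it assumes ``vim`` is on the ``PATH``, uses a hypothetical file name ``notes.txt`` in a temporary directory, defines a hypothetical helper ``file_state``, and runs ``vim`` in its non-interactive Ex mode (``-es``) so that ``:q!`` and ``:wq`` can be passed with ``-c`` instead of being typed.

```bash
# Hypothetical scratch location; any writable directory would do.
workdir=$(mktemp -d)
target="$workdir/notes.txt"

# Hypothetical helper: report whether a path currently exists on disk.
file_state() {
    if [ -e "$1" ]; then echo "file exists"; else echo "no file on disk"; fi
}

if command -v vim >/dev/null 2>&1; then
    # Open a buffer for the new name, then quit without saving (:q!).
    vim -es -c 'q!' "$target" </dev/null || true
    state_after_quit=$(file_state "$target")
    echo "after :q!  $state_after_quit"

    # Open the same name again, but write the buffer before quitting (:wq).
    vim -es -c 'wq' "$target" </dev/null || true
    state_after_write=$(file_state "$target")
    echo "after :wq  $state_after_write"
fi
```

In an interactive session the same thing happens: typing ``vim notes.txt`` and then ``:q!`` leaves no file behind, while ``:w`` or ``:wq`` creates it.
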

### Command Mode/Edit Mode

Because ``vim`` uses the same keys for navigating the document and editing (typing) the document, it has two modes:

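1. command mode (called *Normal mode* in ``vim``'s own documentation)
1. edit mode (called *Insert mode* in ``vim``'s own documentation)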

In **command mode**, for example, ``j`` takes you to the next line of the document, while in **edit mode** ``j`` types a j.

#### Important Command Mode Keys
* ``j``: move to next line
* ``k``: move to previous line
* ``l``: move right on the current line
* ``h``: move left on the current line
* ``w``: move forward to the next word
* ``b``: move backwards to the previous word
* ``i``: enter edit mode before the current character
* ``a``: enter edit mode after the current character
* ``I``: enter edit mode at the beginning of the current line
* ``A``: enter edit mode at the end of the current line
* ``:w``: save (write) the file
* ``:x``: save the file and exit
* ``:q``: quit (``vim`` refuses if there are unsaved changes; use ``:q!`` to quit without saving)
* ``:set``: set a vim behavior
    * ``:set number``: number the lines (short form ``:set nu``)
    * ``:set ai``: set auto indent
    * ``:set wrap``: wrap lines wider than the terminal width
* ``/``: search for a pattern
    * ``n``: go to the next match of pattern
    * ``N``: go to the previous match of pattern
    
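The learning objectives list search and replace, which builds on the ``/`` search above. In command mode, ``:s/old/new/`` replaces the first match on the current line, and ``:%s/old/new/g`` replaces every match in the file. The following is a minimal sketch, assuming ``vim`` is on the ``PATH``; the file name ``draft.txt`` and its contents are hypothetical, and ``vim`` is run in its non-interactive Ex mode (``-es``) so the command can be passed with ``-c`` instead of being typed.

```bash
# Hypothetical sample file with several matches, some on the same line.
workdir=$(mktemp -d)
draft="$workdir/draft.txt"
printf 'the colour red\ncolour and more colour\n' > "$draft"

if command -v vim >/dev/null 2>&1; then
    # % applies the substitution to every line of the file;
    # the g flag replaces every match on a line, not just the first one.
    vim -es -c '%s/colour/color/g' -c 'wq' "$draft" </dev/null || true
fi

# Show the file after the substitution.
cat "$draft"
```

Typed interactively, the same command is ``:%s/colour/color/g`` followed by ``:w`` to save.
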
#### Important Edit Mode Key

* ``esc``: switches from edit mode to command mode

#### Read-only mode

If we want to make sure we do not modify a file, we can open it in **read-only** mode by providing a ``-R`` flag (note ``-r`` tries to **recover** the file).
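
A minimal sketch of what ``-R`` protects against, assuming ``vim`` is on the ``PATH``. The file name ``report.txt`` and its contents are hypothetical, and ``vim`` is run in its non-interactive Ex mode (``-es``) so that the commands can be passed with ``-c`` instead of being typed.

```bash
# Hypothetical sample file that we do not want to change.
workdir=$(mktemp -d)
report="$workdir/report.txt"
printf 'original text\n' > "$report"

if command -v vim >/dev/null 2>&1; then
    # -R sets the 'readonly' option, so the plain :w below is refused and
    # the change made by :s never reaches the file; :q! then discards it.
    vim -R -es -c 's/original/changed/' -c 'w' -c 'q!' "$report" </dev/null || true
fi

# The file on disk still holds its original contents.
cat "$report"
```

Interactively, ``vim -R FILENAME`` behaves the same way: a plain ``:w`` is refused, although ``:w!`` can still force the write if the file permissions allow it.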

### Exercise


In the terminal, type:

```bash
vim /home/jovyan/DATA/Misc/doid.obo
``` 

Since ``jovyan`` does not have write privileges for this file, ``vim`` will open it in read-only mode.

Navigate and search through the document to answer the following questions:

1. On what line is the ``DOID`` for a renal Wilms' tumor defined?
1. What is the name of the term defined two terms after "generalized anxiety disorder" is defined?
1. What is the name of the parent term for a melanoma (DOID:1909)?

## Exercise

The Health Insurance Portability and Accountability Act (HIPAA) is a law that was passed in 1996. It includes a [privacy rule](https://www.hhs.gov/hipaa/for-professionals/privacy/index.html) that governs how health information must be treated to ensure the privacy of the individuals from whom the data were generated. Under the rule's Safe Harbor de-identification method, the following identifiers must be removed:

```
(A) Names

(B) All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and their equivalent geocodes, except for the initial three digits of the ZIP code if, according to the current publicly available data from the Bureau of the Census:
    (1) The geographic unit formed by combining all ZIP codes with the same three initial digits contains more than 20,000 people; and
    (2) The initial three digits of a ZIP code for all such geographic units containing 20,000 or fewer people is changed to 000

(C) All elements of dates (except year) for dates that are directly related to an individual, including birth date, admission date, discharge date, death date, and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older

(D) Telephone numbers

(E) Fax numbers

(F) Email addresses

(G) Social security numbers

(H) Medical record numbers

(I) Health plan beneficiary numbers

(J) Account numbers

(K) Certificate/license numbers

(L) Vehicle identifiers and serial numbers, including license plate numbers

(M) Device identifiers and serial numbers

(N) Web Universal Resource Locators (URLs)

(O) Internet Protocol (IP) addresses

(P) Biometric identifiers, including finger and voice prints

(Q) Full-face photographs and any comparable images

(R) Any other unique identifying number, characteristic, or code, except as permitted by paragraph (c) of this section [Paragraph (c) is presented below in the section “Re-identification”]; and
``` 
([HHS](https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html))

Biomedical data come in a variety of forms, including structured data such as lab reports, which are relatively easy to de-identify, and unstructured data such as dictated reports, which can be difficult to de-identify. The vast majority of biomedical information about a patient is captured in some form of textual data.

Edit de-identified report.