<img src="./intro_images/MIE.PNG" width="100%" align="left" />

<table style="float:right;">
    <tr>
        <td>                      
            <div style="text-align: right"><a href="https://alandavies.netlify.com" target="_blank">Dr Alan Davies</a></div>
            <div style="text-align: right">Senior Lecturer Health Data Science</div>
            <div style="text-align: right">University of Manchester</div>
         </td>
         <td>
             <img src="./intro_images/alan.PNG" width="30%" />
         </td>
     </tr>
</table>

# 11.0 File handling
****

#### About this Notebook
This notebook introduces working with common files in R. This includes how you can save, load and append to files.

<div class="alert alert-block alert-warning"><b>Learning Objectives:</b> 
<br/> At the end of this notebook you will be able to:
    
- Investigate core concepts of file handling with R

- Practice basic file handling tasks using R

</div> 

<a id="top"></a>

<b>Table of contents</b><br>

11.1 [Basic file handling](#basic)

11.2 [Working with other types of file](#other)

11.3 [Navigating the file system](#nav)

The ability of a programming language to access and manipulate files and folders in a given operating system (e.g. Windows, Linux, macOS etc.) is a powerful tool that programmers can use to automate many processes. This is also a key part of many web-based applications. 

Files are stored in folders (directories) on a given operating system. You may be familiar with selecting files and folders using software like the file explorer in Windows as seen in the image below where the <code>R</code> folder is inside another folder called <code>Documents</code>. 

<img src="./intro_images/files.PNG" alt="File path image" width="40%" align="left" />

The exact location of a given file on an operating system is described by something called a <code>file path</code>. For example, an image of the author called <code>alan.PNG</code> is located at the following path: 

<code>C:\Users\Alan_Davies\NLP\alan.PNG</code>

This tells us several things. The first letter <code>C</code> is the hard drive the file is stored on (C is the default drive on a Windows machine). Then we have a number of folders separated by slashes. Windows uses backslashes (<code>\</code>), whereas Linux and macOS use forward slashes (<code>/</code>). In this example, on the hard drive <code>C</code>, there is a folder called <code>Users</code>, within this, another folder called <code>Alan_Davies</code>, then a folder called <code>NLP</code> which contains the file <code>alan.PNG</code>. The other point to note is the letters that come after the dot (period) in a file name. This is referred to as the <code>file extension</code> and indicates what type of file it is. In this case it's an image, specifically a Portable Network Graphic (PNG). You may be familiar with other types of file such as a Word document <code>.docx</code> or a Portable Document Format (PDF) file <code>.pdf</code> and so on. This example is depicted graphically below.

<img src="./intro_images/filesfolders.PNG" width="40%" align="left" />

<div class=accessibility>
<b>Accessibility:</b> The cell above shows the file path for an image called alan.PNG. The root of the path is the C drive. Inside the C drive, there is a folder named Users. Inside that folder there is a folder named Alan_Davies, which contains another folder named NLP. The file named alan.PNG resides in the NLP folder. The file path is C:\Users\Alan_Davies\NLP\alan.PNG
</div>
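
As a rough illustration of the parts of a path described above, base R can pull a path apart. This sketch assumes the example path is written with forward slashes, which R also accepts on Windows, so that it behaves the same on any operating system. The variable names are illustrative only.

```r
# Example path from the text, written with forward slashes
# (an assumption made so the code runs the same way on any operating system).
example_path <- "C:/Users/Alan_Davies/NLP/alan.PNG"

folder <- dirname(example_path)             # everything before the file name
file_name <- basename(example_path)         # the file name on its own
extension <- tools::file_ext(example_path)  # the letters after the final dot

print(folder)
print(file_name)
print(extension)
```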

<a id="basic"></a>
#### 11.1 Basic file handling

Let's try to open a file. Here we will create a variable that stores the path and filename of the required file. This file is a text file (<code>*.txt</code>) and contains the lyrics for the Nirvana song "Something in the Way", which is also the name of the file. The <code>./</code> marks a <code>relative</code> path (relative to this notebook). This means the file can be found wherever we put the notebook, as long as there is a folder called <code>file_handling</code> in the same folder the notebook is in. This saves us having to hard code the specific path. For example, when I wrote this notebook, the file was here: <code>C:\Users\Alan_Davies\Intro to programming (Python)\file_handling\Something in the way.txt</code>. I send these notebooks to someone who uploads them onto a server, where the path could be quite different. It might, for example, be: <code>E:\data_files\teaching\FBMH\SHS\intro_programming\file_handling\Something in the way.txt</code>. This is where the power of relative paths comes in. As long as the folder <code>file_handling</code> sits in the same folder as this notebook file, we don't have to spell out the full file path to work with the file. 

In [15]:
file_path <- "./file_handling/Something in the way.txt"
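
A small sketch of two related helpers, in case they are useful here: <code>file.path</code> joins the pieces of a path with a forward slash, and <code>normalizePath</code> reports the absolute location that a relative path currently resolves to. The variable name <code>built_path</code> is illustrative only.

```r
# Build the relative path from its parts rather than typing the separators by hand.
built_path <- file.path(".", "file_handling", "Something in the way.txt")
print(built_path)

# Show the absolute location that the relative path "." currently points to.
# This depends on where the notebook is running, so the output will vary.
print(normalizePath("."))
```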

Next we can use <code>read.delim</code> to read the file into a data frame, passing in the file path variable. We set <code>header</code> to <code>FALSE</code> because the first line of the file is not a row of column names.

In [28]:
f <- read.delim(file_path, header = FALSE)

In [29]:
print(f)

                                    V1
1                 Something In The Way
2                Underneath the bridge
3               Tarp has sprung a leak
4         And the animals I've trapped
5              Have all become my pets
6          And I'm living off of grass
7   And the drippings from the ceiling
8                It's okay to eat fish
9  'Cause they don't have any feelings
10           Something in the way, mmm
11     Something in the way, yeah, mmm
12           Something in the way, mmm
13     Something in the way, yeah, mmm
14           Something in the way, mmm
15     Something in the way, yeah, mmm
16               Underneath the bridge
17              Tarp has sprung a leak
18        And the animals I've trapped
19             Have all become my pets
20         And I'm living off of grass
21  And the drippings from the ceiling
22               It's okay to eat fish
23 'Cause they don't have any feelings
24           Something in the way, mmm
25     Something in the w

<div class="alert alert-block alert-info">
<b>Task 1:</b>
<br> 
1. Open the file in the location: <code>./file_handling/A/American Pie.txt</code> for reading.<br>
2. Output its contents.
</div>

In [4]:
file_path <- "./file_handling/A/American Pie.txt"
f <- read.delim(file_path, header = FALSE)
print(f)

"EOF within quoted string"

                                                                                    V1
1                                                                         American Pie
2                                                             ...A long, long time ago
3                            I can still remember how that music used to make me smile
4                   And I knew if I had my chance that I could make those people dance
5                                                And maybe they'd be happy for a while
6                                                      ... But February made me shiver
7                                                         With every paper I'd deliver
8                                                             Bad news on the doorstep
9                                                        I couldn't take one more step
10                                                     ... I can't remember if I cried
11                                         
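
The warning <code>EOF within quoted string</code> usually means that <code>read.delim</code> met a double quote character in the file and treated it as the start of a quoted field that was never closed, so some lines may have been merged or lost. Two common ways around this are to switch off quote handling with <code>quote = ""</code>, or to read the file as plain lines with <code>readLines</code>. The sketch below uses a small temporary file with made-up content rather than the lyrics file, and the variable names are illustrative only.

```r
# Hypothetical demo file containing an unmatched double quote.
demo_path <- tempfile(fileext = ".txt")
writeLines(c("Title line", "He said \"hello", "Last line"), demo_path)

# quote = "" turns off quote handling, so each line becomes its own row.
demo_df <- read.delim(demo_path, header = FALSE, quote = "")
print(nrow(demo_df))

# readLines returns a character vector with one element per line.
demo_lines <- readLines(demo_path)
print(demo_lines)
```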

When working with files you will also want to add some error handling to deal with issues such as a file not being found, so that your program doesn't simply crash but instead reports a meaningful message. Let's consider opening a file to read.

In [1]:
file_path <- "./file_handling/A/American Pie.txt"
if(file.exists(file_path)){
    cat("\nThe file exists, opening file....")
}else{
    cat("\nFile with name ", file_path, " not found. Unable to open.")
}


The file exists, opening file....

Here we checked first to see if the file exists using <code>file.exists()</code> before loading it. We can then control the error and inform the user rather than just crashing the program.
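
<code>file.exists()</code> covers the missing-file case, but a read can still fail for other reasons (for example, permissions). A minimal sketch of combining the check with <code>tryCatch</code> is shown below. Here <code>safe_read</code> is a hypothetical helper name, not a function provided by R, and the file name is made up.

```r
# Hypothetical helper: returns the lines of a file, or NULL if it cannot be read.
safe_read <- function(path) {
  if (!file.exists(path)) {
    cat("File", path, "not found. Unable to open.\n")
    return(NULL)
  }
  tryCatch(
    readLines(path),
    error = function(e) {
      # conditionMessage() extracts the text of the error that was raised.
      cat("Could not read", path, ":", conditionMessage(e), "\n")
      NULL
    }
  )
}

# Try a file that does not exist; the helper reports the problem and returns NULL.
missing_result <- safe_read(file.path(tempdir(), "no_such_file.txt"))
print(is.null(missing_result))
```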

We can also write to files. In the example below we use <code>paste0</code> to concatenate the string <code>"./file_handling/A/"</code> with <code>"output_file.txt"</code> to produce <code>./file_handling/A/output_file.txt</code>. This is where we will save the file and what it will be called. Next we create a connection to the file and use <code>writeLines</code> to write a couple of lines of text to the file before closing the connection.

In [20]:
new_file_path <- "./file_handling/A/"
file_name <- paste0(new_file_path, "output_file.txt")

file_connection <- file(file_name)
writeLines(c("to be", "or not to be."), file_connection)
close(file_connection)

<div class="alert alert-success">
<b>Note:</b> R provides several other ways to write files. One example is the <code>sink</code> function. You can also write to files using <code>cat</code>. For example: <code>cat("hello","world",file="output_file.txt",sep="\n",append=TRUE)</code>.
</div>
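
Appending adds new text to the end of an existing file instead of overwriting it. A minimal sketch, using a temporary file so that nothing in <code>file_handling</code> is changed: opening the connection with <code>open = "a"</code> puts it in append mode. The variable names are illustrative only.

```r
# Create a temporary file with two lines, as in the example above.
demo_file <- tempfile(fileext = ".txt")
writeLines(c("to be", "or not to be."), demo_file)

# Re-open the same file in append mode and add one more line.
append_connection <- file(demo_file, open = "a")
writeLines("that is the question.", append_connection)
close(append_connection)

# Read the file back to confirm the original lines are still there.
print(readLines(demo_file))
```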

<a id="other"></a>
#### 11.2 Working with other types of file

Let's create some data and store it in a list. Here we have some details for a patient. We may want to save this data to a file or transmit it over a network. This could represent many things, such as settings options for a program or app.

In [2]:
my_data <- list(name="Paul Smith",
                id="1342",
                age=45,
                diagnosis="NIDDM",
                PMH=c("Hypertension", "IBS", "Bowel CA"))

If you want to make your programs more <code>interoperable</code> (able to work with other systems and languages), you should consider storing data in a format such as JSON (JavaScript Object Notation). JSON is widely supported and is not limited to any specific programming language despite originating from JavaScript. For this we need to use the <code>rjson</code> library.

In [3]:
require(rjson)

Loading required package: rjson

Attaching package: 'rjson'

The following objects are masked from 'package:jsonlite':

    fromJSON, toJSON



In [4]:
json_data <- toJSON(my_data)

In [5]:
json_data

Once we have serialized the data into JSON format we can write the data to a file which we will call <code>health_record.json</code>.

In [7]:
file_path <- "./file_handling/B/health_record.json"
write(json_data, file_path)

We can then load the data back into a new variable called <code>health_data</code> and output its contents.

In [9]:
health_data <- fromJSON(file=file_path)
print(health_data)

$name
[1] "Paul Smith"

$id
[1] "1342"

$age
[1] 45

$diagnosis
[1] "NIDDM"

$PMH
[1] "Hypertension" "IBS"          "Bowel CA"    
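
One way to check that nothing was lost on the way to disk and back is to compare the restored list with the original. The sketch below assumes the <code>rjson</code> package is installed (the code is skipped otherwise), uses a cut-down made-up record, and writes to a temporary file rather than to <code>./file_handling/B</code>.

```r
# Only run the demonstration if rjson is available (an assumption about the setup).
if (requireNamespace("rjson", quietly = TRUE)) {
  record <- list(name = "Paul Smith",
                 PMH = c("Hypertension", "IBS", "Bowel CA"))

  # Serialize to JSON, save to a temporary file, then load it back.
  tmp_json <- tempfile(fileext = ".json")
  write(rjson::toJSON(record), tmp_json)
  restored <- rjson::fromJSON(file = tmp_json)

  # Compare the field names and one of the stored values.
  print(identical(names(record), names(restored)))
  print(identical(record$name, restored$name))
}
```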



<div class="alert alert-block alert-info">
<b>Task 2:</b>
<br> 
1. Using the method above, create some data in a list.<br>
2. Use <code>toJSON</code> to serialize the data and <code>write</code> to save it to a file in the folder <code>./file_handling/B</code>.<br>
3. Load the file into a new variable and output its contents.<br><br>
<strong>Note:</strong> We don't provide a solution here as the data you choose to store will be decided by each individual. 
</div>

<div class="alert alert-success">
<b>Note:</b> Additional functionality exists through libraries to work with various file types such as Word documents and PDF files, to name a few. For example, the <code>docxtractr</code> library can be used to extract comments and data tables from Word documents.
</div>

<a id="nav"></a>
#### 11.3 Navigating the file system

As you can see, to use files we also have to be comfortable with navigating through files and folders. Below is an example of listing the file/folder structure from a root folder. In this case we start at the <code>file_handling</code> folder relative to this notebook.

In [24]:
list.files(path="./file_handling", pattern=NULL, all.files=TRUE, full.names=FALSE)
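
By default <code>list.files</code> only looks inside the folder it is given. A small sketch, assuming you also want to see what sits inside sub-folders: <code>recursive = TRUE</code> walks the whole tree, and <code>list.dirs</code> returns just the folders. The temporary folder and file names below are made up for the demonstration.

```r
# Build a small throwaway folder tree to list.
demo_root <- file.path(tempdir(), "demo_tree")
dir.create(file.path(demo_root, "A"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(demo_root, "top.txt")))
invisible(file.create(file.path(demo_root, "A", "inner.txt")))

# List every file below demo_root, including those in sub-folders.
all_files <- list.files(path = demo_root, recursive = TRUE)
print(all_files)

# List only the folders directly inside demo_root.
sub_folders <- list.dirs(path = demo_root, full.names = FALSE, recursive = FALSE)
print(sub_folders)
```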

Another useful function in R is <code>getwd()</code>, which returns the current working directory. We can change this with <code>setwd()</code> if required:

In [21]:
getwd()
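
When changing the working directory, it can help to remember where you started so that you can switch back afterwards. A minimal sketch, using R's temporary folder as a stand-in destination (an assumption for the demonstration):

```r
# Remember the current working directory.
old_wd <- getwd()

# Move to another folder; tempdir() is used here only as a safe example.
setwd(tempdir())
print(getwd())

# Return to the original working directory.
setwd(old_wd)
print(getwd())
```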

<div class="alert alert-block alert-info">
<b>Task 3:</b>
<br> 
1. Using <code>list.files</code> as shown above, modify the code to display only text files.<br>
    2. List the files and folders in <code>./file_handling/B</code>.<br><br>
    <strong>Hint:</strong> The file extension for a text file is <code>txt</code>.
</div>

In [25]:
# pattern is a regular expression: "\\.txt$" matches names that end in .txt
list.files(path="./file_handling", pattern="\\.txt$", all.files=TRUE, full.names=FALSE)

In [30]:
list.files(path="./file_handling/B", pattern=NULL, all.files=TRUE, full.names=FALSE)
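
The <code>pattern</code> argument is a regular expression, so <code>"txt"</code> on its own matches any name containing those three letters, not just names ending in <code>.txt</code>. The sketch below shows the difference using a temporary folder and made-up file names.

```r
# Create a throwaway folder with one text file and one file that merely contains "txt".
demo_dir <- file.path(tempdir(), "demo_pattern")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("notes.txt", "txt_backup.csv"))))

# A loose pattern matches both names.
loose <- list.files(path = demo_dir, pattern = "txt")
print(loose)

# Escaping the dot and anchoring with $ matches only names ending in .txt.
strict <- list.files(path = demo_dir, pattern = "\\.txt$")
print(strict)
```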

### Notebook details
<br>
<i>Notebook created by <strong>Dr. Alan Davies</strong>.
<br>
&copy; Alan Davies 2022</i>
