# Working with Files

To work with your filesystem from inside a notebook, you can use a few standard UNIX-like commands that run directly on your system. 

**NB**: These may differ depending on your system. Windows, for instance, tends not to have `ls`; try `!dir` there instead. 

Here are some commands that work on Linux and macOS, at least: 

In [42]:
%pwd # Print working directory

'/home/jon/Code/course-computational-literary-analysis/Notes'

In [63]:
%ls # List files

01-intro-python.ipynb  02-files.ipynb
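
If the `%pwd` and `%ls` magics are unavailable on your system, Python's standard library offers equivalents that behave the same on Linux, macOS, and Windows. The following is a minimal sketch using `pathlib`; the variable names are illustrative and not part of the course code.

```python
from pathlib import Path

# Equivalent of %pwd: the directory Python is currently working from.
working_directory = Path.cwd()
print(working_directory)

# Equivalent of %ls: the names of the entries in that directory, sorted.
file_names = sorted(entry.name for entry in working_directory.iterdir())
print(file_names)
```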


You can use `wget` to download a copy of _The Moonstone_, if `wget` is installed on your computer. On macOS, `curl -O` might work instead. 

In [6]:
!wget "https://raw.githubusercontent.com/JonathanReeve/course-computational-literary-analysis/master/Texts/moonstone.md"

--2019-07-15 13:26:40--  https://raw.githubusercontent.com/JonathanReeve/course-computational-literary-analysis/master/Texts/moonstone.md
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.188.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.188.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1095173 (1.0M) [text/plain]
Saving to: ‘moonstone.md’


2019-07-15 13:26:41 (19.6 MB/s) - ‘moonstone.md’ saved [1095173/1095173]



You can also download it manually, [here on the course repository](https://github.com/JonathanReeve/course-computational-literary-analysis/blob/master/Texts/moonstone.md). Or download all the course materials at once by clicking the green "download" button on [the course repo homepage](https://github.com/JonathanReeve/course-computational-literary-analysis). Then move the file into your working directory, that is, the directory Jupyter is working from. 

Alternatively, you can use the `requests` library: 

## Download a file with the Requests library

In [4]:
url = "https://raw.githubusercontent.com/JonathanReeve/course-computational-literary-analysis/master/Texts/moonstone.md"

In [64]:
import requests
moonstone = requests.get(url).text

In [65]:
len(moonstone)

1072272
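
At this point the novel lives only in the variable `moonstone`, not on disk. A minimal sketch, assuming you want to keep a local copy so that a later `open('moonstone.md')` works: the helper name `save_text` is hypothetical, and the short sample string stands in for the downloaded text so that the example runs without a network connection.

```python
from pathlib import Path
import tempfile

def save_text(text, path):
    """Write text to path as UTF-8 and return the path (hypothetical helper)."""
    path = Path(path)
    path.write_text(text, encoding="utf-8")
    return path

# Stand-in for the downloaded text, written to a temporary directory here.
sample = "## Prologue\n\nThe Storming of Seringapatam (1799)\n"
saved = save_text(sample, Path(tempfile.mkdtemp()) / "sample.md")

# Reading the file back gives the same text.
print(saved.read_text(encoding="utf-8") == sample)
```

In the notebook, you might instead keep the response object (`response = requests.get(url)`), call `response.raise_for_status()` so that an HTTP error stops the cell rather than saving an error page, and then call `save_text(response.text, 'moonstone.md')`.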

Or, in JupyterLab, you can press the "upload" button on the Files tab of the sidebar and select the file you want to work with.

# To Open and Read a File

First, make sure the file is in your current working directory. `%ls` lists all the files in your current directory. 

In [66]:
%ls

01-intro-python.ipynb  02-files.ipynb


Note: you could try `!dir` on Windows, if the above command doesn't work. 

In [44]:
# Using the full path to your file: 
fullPath = '/home/jon/Code/course-computational-literary-analysis/Texts/moonstone.md'
moonstone = open(fullPath).read()

# Or use a relative path: 
moonstone = open('moonstone.md').read()

# On Windows, you sometimes have to specify the encoding: 
moonstone = open('moonstone.md', encoding='utf-8').read()

# Or this, if you still get read errors (undecodable bytes are silently dropped): 
moonstone = open('moonstone.md', encoding='utf-8', errors="ignore").read()

In [15]:
len(moonstone)

1072272
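
The `open(...).read()` calls above leave the file open until Python cleans it up. A common alternative is a `with` block, which closes the file as soon as reading is done. The sketch below is not the course's code: it first writes a small stand-in file to a temporary directory so that it runs anywhere, and the name `read_text_file` is made up for illustration.

```python
import os
import tempfile

def read_text_file(path, encoding="utf-8"):
    """Return the contents of a text file, closing the file afterwards."""
    with open(path, encoding=encoding) as f:
        return f.read()

# Create a small stand-in file so the example runs without moonstone.md.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.md")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("I address these lines to my relatives in England.\n")

sample_text = read_text_file(sample_path)
print(len(sample_text))
```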

Let's peek inside it, looking at characters 200 through 499 (the slice `200:500` stops just before index 500): 

In [24]:
moonstone[200:500]

'XV\n- Second Period\n- Second Narrative\n- Third Narrative\n- Fourth Narrative\n- Sixth Narrative\n---\n\n## Prologue\n\nThe Storming of Seringapatam (1799)\n\nExtracted from a Family Paper\n\nI address these lines–written in India–to my relatives in England.\n\nMy object is to explain the motive which has induced '

The `\n`s are line breaks, so if we were to `print` the string above, it would look more recognizable. Those line breaks, together with the Markdown headings (`## ` for parts, `### ` for chapters), allow us to split the file into parts and chapters: 

In [67]:
moonstoneParts = moonstone.split('\n## ')
firstPeriod = moonstoneParts[2]
firstPeriodParts = firstPeriod.split('\n### ')
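
To see what `split` does here, and to find out which index holds which part rather than guessing at numbers like `2`, it can help to try it on a small string first. The text below is an invented miniature with the same heading style, not the novel itself.

```python
# An invented miniature document using the same Markdown heading levels.
miniature = (
    "Front matter\n"
    "## Prologue\nSome text.\n"
    "## First Period\nIntro.\n"
    "### Chapter I\nText one.\n"
    "### Chapter II\nText two.\n"
)

# Split at second-level headings; the '## ' marker itself is removed.
parts = miniature.split('\n## ')

# Show the first line of each part next to its index.
for index, part in enumerate(parts):
    print(index, part.split('\n')[0])

# Split one part further at third-level headings.
chapters = parts[2].split('\n### ')
chapter_headings = [chapter.split('\n')[0] for chapter in chapters]
print(chapter_headings)
```

Running the same `enumerate` loop on `moonstoneParts` lists the real headings with their indices. Note that the first "chapter" of a part is whatever precedes the first `### ` heading, which is why it carries the part's own title.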

## A simple word count comparison

In [232]:
wordList = ['sand', "moonstone", 'diamond']
headings = "{:15} " + ("{:35}-" * len(wordList))
print(headings.format("", *wordList))

for part in firstPeriodParts:
    # The chapter heading is everything before the first line break.
    firstLineBreakLocation = part.find('\n')
    chapterHeading = part[:firstLineBreakLocation]
    print("{:15}".format(chapterHeading), end=' ')
    for word in wordList:
        # Count occurrences in the lowercased chapter, one asterisk each.
        wordCount = part.lower().count(word)
        print("{:35}".format('*' * wordCount), end='-')
    print()

                sand                               -moonstone                          -diamond                            -
First Period                                       -                                   -*                                  -
Chapter I                                          -*                                  -******                             -
Chapter II                                         -                                   -***                                -
Chapter III                                        -*                                  -**                                 -
Chapter IV      *************************          -                                   -*                                  -
Chapter V       ***                                -**                                 -*************                      -
Chapter VI      ****                               -********                           -***********************************-


## Comparison across narratives

In [129]:
clack = moonstoneParts[4]   # Miss Clack's narrative
betteredge = firstPeriod    # the First Period, narrated by Betteredge

In [222]:
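def compare(wordList):
    # Print how often each string occurs per 1,000 characters of each narrative.
    print("{:11} {:6} {:6}".format("","Bet","Clack"))
    for word in wordList:
        print("{:10}".format(word), end=": ")
        for text in [betteredge, clack]: 
            # Divide by text length so narratives of different sizes are comparable.
            wordProportion = (text.lower().count(word)/len(text))*1000
            print("{:.4f}".format(wordProportion), end= ' ')
        print()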

In [223]:
compare(['diamond', 'godfrey', '!', '(', 'lady', 'unlady', 'christian'])

            Bet    Clack 
diamond   : 0.3899 0.0767 
godfrey   : 0.1207 0.7615 
!         : 0.7774 1.4640 
(         : 0.4943 0.4368 
lady      : 0.7588 0.3365 
unlady    : 0.0000 0.0118 
christian : 0.0255 0.1535 
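
The numbers above are occurrences per 1,000 characters: the count of each string divided by the length of the text, times 1,000. Below is a minimal sketch of the same arithmetic as a function that returns the value instead of printing it, which makes the result easier to reuse or check. The name `rate_per_thousand` and the sample texts are invented for illustration.

```python
def rate_per_thousand(text, word):
    """Occurrences of word per 1,000 characters of text (case-insensitive)."""
    return text.lower().count(word) / len(text) * 1000

# Two invented texts of different lengths, standing in for two narratives.
# Each contains 'diamond' ten times, but the second is much longer.
short_text = "The Diamond! " * 10
long_text = "The diamond is gone, and nobody knows where. " * 10

for name, text in [("short", short_text), ("long", long_text)]:
    print(name, len(text), "{:.4f}".format(rate_per_thousand(text, "diamond")))
```

Both texts have the same raw count, yet the shorter one has the higher rate, which is the reason for normalizing by length before comparing narrators.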
