# File handling

```py
with open(path, option) as name:
    statements
    ...
```

Options:
 - "r" - read (error if the file doesn't exist)
 - "a" - append to the end of a file (creates the file if it doesn't exist)
 - "w" - write (creates the file if it doesn't exist, overwrites its contents if it does)
 - "x" - create a file, error if it already exists
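The modes listed above can be tried out without touching the real data by working in a temporary directory. This is only a minimal sketch: the file name `demo.txt` and the variable names are made up for illustration and are not part of the notebook.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo.txt"  # hypothetical file, not part of ../Data

    with open(demo_path, "w") as f:  # "w" creates the file (or empties an existing one)
        f.write("first line\n")

    with open(demo_path, "a") as f:  # "a" adds to the end of the file
        f.write("second line\n")

    with open(demo_path, "r") as f:  # "r" reads the content back
        content = f.read()

    try:
        with open(demo_path, "x") as f:  # "x" refuses to touch an existing file
            f.write("never written\n")
        x_failed = False
    except FileExistsError:
        x_failed = True

print(repr(content))
print(x_failed)
```

Catching `FileExistsError` is one way to handle the "x" case; whether to catch it or let it stop the program depends on what the script should do when the file is already there.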

In [138]:
with open("../Data/quotes.txt", "r") as f: # can instead use a variable (path = "../Data/quotes.txt")
    text = f.read()

text

'  If     we     knew what it was      we were doing, it would not be called research,          would it?     - Albert Einstein\n\nTime is a drug. Too       much of it kills you.  -  Terry Pratchett\n\n\n An expert is a person who       has made all the mistakes that           can be made in a          very narrow field - Niels Bohr\n\n   Everything must be made as simple as possible. But not simpler. - Albert Einstein     \n\n\n  Nothing in life                is to be feared, it is only to be understood. Now is the time to understand more, so that we may fear less. - Marie  Curie  \n\nIf I have seen further     it is by standing on the shoulders of Giants. - Isaac Newton'

In [139]:
print(text) # print() renders each \n as a line break (the cell above showed the raw string representation)

  If     we     knew what it was      we were doing, it would not be called research,          would it?     - Albert Einstein

Time is a drug. Too       much of it kills you.  -  Terry Pratchett


 An expert is a person who       has made all the mistakes that           can be made in a          very narrow field - Niels Bohr

   Everything must be made as simple as possible. But not simpler. - Albert Einstein     


  Nothing in life                is to be feared, it is only to be understood. Now is the time to understand more, so that we may fear less. - Marie  Curie  

If I have seen further     it is by standing on the shoulders of Giants. - Isaac Newton


## Cleaning up quotes.txt

 - inspect the txt file manually (some prankster has added noise in the form of extra whitespace and newlines)
 - remove leading and trailing whitespace
 - remove excessive whitespace between words
 - add quote numbers
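The steps above can be sketched on an in-memory list of lines before any file is involved. Note that this sketch swaps in `" ".join(line.split())` for the whitespace cleanup, which is a different technique from the regular expression used in the cells that follow; the function name `clean_lines` and the sample list are hypothetical.

```python
def clean_lines(lines):
    """Return numbered, whitespace-normalised copies of the non-empty lines."""
    cleaned = []
    for line in lines:
        # split() with no arguments drops leading/trailing whitespace
        # and treats any run of whitespace as a single separator
        line = " ".join(line.split())
        if line:  # skip lines that contained only whitespace
            cleaned.append(f"{len(cleaned) + 1}. {line}")
    return cleaned


# a few noisy lines in the style of quotes.txt
sample = [
    "Time is a drug. Too       much of it kills you.  -  Terry Pratchett\n",
    "\n",
    "\n",
    "   Everything must be made as simple as possible. But not simpler. - Albert Einstein     \n",
]
result = clean_lines(sample)
for line in result:
    print(line)
```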

In [140]:
path = "../Data/quotes.txt"
with open(path, "r") as f_read, open("../Data/quotes_clean.txt", "w") as f_write:

    for quote in f_read: # iterating over a file gives one line at a time, each still ending with "\n"
        print(quote) # print() adds its own newline, so the output gets double line breaks


  If     we     knew what it was      we were doing, it would not be called research,          would it?     - Albert Einstein



Time is a drug. Too       much of it kills you.  -  Terry Pratchett





 An expert is a person who       has made all the mistakes that           can be made in a          very narrow field - Niels Bohr



   Everything must be made as simple as possible. But not simpler. - Albert Einstein     





  Nothing in life                is to be feared, it is only to be understood. Now is the time to understand more, so that we may fear less. - Marie  Curie  



If I have seen further     it is by standing on the shoulders of Giants. - Isaac Newton


In [141]:
with open(path, "r") as f_read, open("../Data/quotes_clean.txt", "w") as f_write:
    for quote in f_read: # iterating over a file gives one line at a time, each still ending with "\n"
        print(quote, end="") # end="" stops print() from adding a second newline

  If     we     knew what it was      we were doing, it would not be called research,          would it?     - Albert Einstein

Time is a drug. Too       much of it kills you.  -  Terry Pratchett


 An expert is a person who       has made all the mistakes that           can be made in a          very narrow field - Niels Bohr

   Everything must be made as simple as possible. But not simpler. - Albert Einstein     


  Nothing in life                is to be feared, it is only to be understood. Now is the time to understand more, so that we may fear less. - Marie  Curie  

If I have seen further     it is by standing on the shoulders of Giants. - Isaac Newton
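Why the first loop printed double line breaks becomes visible when the lines are collected in a list instead of printed. A small sketch, assuming an `io.StringIO` object as a stand-in for the opened file (the name `fake_file` and its content are made up):

```python
import io

fake_file = io.StringIO("first\n\nsecond")  # behaves like an opened text file

# iterating keeps the trailing "\n" on every line that has one
lines = [line for line in fake_file]
print(lines)
```

Every element except the last still carries its `"\n"`, and the empty line in the middle is just `"\n"` on its own.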

In [142]:
import re

with open(path, "r") as f_read, open("../Data/quotes_clean.txt", "w") as f_write:

    i = 1 # running quote number

    for quote in f_read: # each iteration gives one line of the file as a string
        quote = quote.strip() # remove whitespace (including "\n") at the beginning and end of the string
        quote = re.sub(" +", " ", quote) # regex: replace runs of one or more spaces with a single space (read up on regular expressions!)
        quote = re.sub("\n+", "\n", quote) # has no effect: strip() has already removed the newline from each line

        if quote != "": # lines that contained only whitespace are now "" and are skipped
            f_write.write(f"{i}. {quote}\n")
            i += 1
            print(repr(quote)) # repr shows the string with its quotes and escape characters

with open("../Data/quotes_clean.txt", "r") as g: # read the cleaned file back to check the result
    text2 = g.read()
print(text2)

'If we knew what it was we were doing, it would not be called research, would it? - Albert Einstein'
'Time is a drug. Too much of it kills you. - Terry Pratchett'
'An expert is a person who has made all the mistakes that can be made in a very narrow field - Niels Bohr'
'Everything must be made as simple as possible. But not simpler. - Albert Einstein'
'Nothing in life is to be feared, it is only to be understood. Now is the time to understand more, so that we may fear less. - Marie Curie'
'If I have seen further it is by standing on the shoulders of Giants. - Isaac Newton'
1. If we knew what it was we were doing, it would not be called research, would it? - Albert Einstein
2. Time is a drug. Too much of it kills you. - Terry Pratchett
3. An expert is a person who has made all the mistakes that can be made in a very narrow field - Niels Bohr
4. Everything must be made as simple as possible. But not simpler. - Albert Einstein
5. Nothing in life is to be feared, it is only to be understoo

## Pick out the authors

 - find digit to find quote
 - extract first name and last name
 - join into full name
 - get unique values
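The steps above can be sketched as one small function working on a list of already cleaned lines. This is only an illustration under the assumption that every author name consists of exactly two words; the function name `extract_authors` and the sample list are hypothetical.

```python
def extract_authors(lines):
    """Return the unique author names found in numbered quote lines, sorted."""
    authors = set()  # a set keeps each name only once
    for line in lines:
        if not line[:1].isdigit():  # numbered quotes start with a digit, skip anything else
            continue
        first_name, last_name = line.split()[-2:]  # the last two words of the line
        authors.add(" ".join([first_name, last_name]))
    return sorted(authors)


sample = [
    "1. Time is a drug. Too much of it kills you. - Terry Pratchett",
    "2. Everything must be made as simple as possible. But not simpler. - Albert Einstein",
    "",
    "3. If I have seen further it is by standing on the shoulders of Giants. - Isaac Newton",
    "4. If we knew what it was we were doing, it would not be called research, would it? - Albert Einstein",
]
unique_authors = extract_authors(sample)
print(unique_authors)
```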

In [143]:
with open("../Data/quotes_clean.txt", "r") as h_read, open("../Data/quotes_authors.txt", "w") as h_write:
    
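    # readlines() returns a list with one string per line
    # strip("\n") removes the trailing newline from each line
    quotes = [quote.strip("\n") for quote in h_read.readlines()]

    authors = [quote.split()[-2:] for quote in quotes] # split() splits on whitespace by default, [-2:] keeps the last two words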

    # print(quotes)
    print(authors)

[['Albert', 'Einstein'], ['Terry', 'Pratchett'], ['Niels', 'Bohr'], ['Albert', 'Einstein'], ['Marie', 'Curie'], ['Isaac', 'Newton']]


In [144]:
name = [["Thomas", "Sjöstrand", "ko"]]

# join() is a string method: it joins the strings in name[0], separated by " "
" ".join(name[0])

'Thomas Sjöstrand ko'

In [145]:
with open("../Data/quotes_clean.txt", "r") as h_read, open("../Data/quotes_authors.txt", "w") as h_write, open("../Data/quotes_clean.txt", "a") as h_append:
    
    # readlines() returns a list with one string per line
    # strip("\n") removes the trailing newline from each line
    quotes = [quote.strip("\n") for quote in h_read.readlines()]

    # split() splits on whitespace by default
    # [-2:] keeps the last two elements (first name and last name)
    authors = [quote.split()[-2:] for quote in quotes]
    print(authors)

    # joins each [first name, last name] pair into one string
    # set() keeps only the unique elements
    authors = set([" ".join(i) for i in authors])
    print(authors)
    print(authors)

    h_write.write("\nAuthors: ")
    for i in authors:
        h_write.write(f"{i}, ")

    h_append.write("\nAuthors: ")
    for i in authors:
        h_append.write(f"{i}, ")

[['Albert', 'Einstein'], ['Terry', 'Pratchett'], ['Niels', 'Bohr'], ['Albert', 'Einstein'], ['Marie', 'Curie'], ['Isaac', 'Newton']]
{'Niels Bohr', 'Isaac Newton', 'Albert Einstein', 'Marie Curie', 'Terry Pratchett'}
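The write loops in the cell above leave a trailing ", " after the last name, and a set has no guaranteed order, so the names can come out differently between runs. One possible alternative, assuming alphabetical order is acceptable, is to sort the names and let `join()` place the separators (the variable name `authors_line` is made up for this sketch):

```python
authors = {"Niels Bohr", "Isaac Newton", "Albert Einstein", "Marie Curie", "Terry Pratchett"}

# sorted() turns the set into an ordered list,
# join() puts ", " only between the names
authors_line = "Authors: " + ", ".join(sorted(authors))
print(authors_line)
```

The resulting string could then be passed to a single `write()` call instead of writing one name per loop iteration.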
