# file handling

```py
f = open(path, option) # not good practice: you have to remember to close the file yourself with f.close()

# use this instead (the file is closed automatically when the block ends):
with open(path, option) as name:
    statements
    ...
```
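A minimal sketch of the difference between the two forms. The scratch folder and the file name `demo.txt` are assumptions made for illustration, so the example does not touch any real data.

```python
import os
import tempfile

# Hypothetical scratch file, created in a temporary folder
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.txt")

# Manual handling: close() has to be called explicitly,
# ideally in a finally block so it also runs when an error occurs
f_manual = open(demo_path, "w")
try:
    f_manual.write("hello\n")
finally:
    f_manual.close()

# Context manager: the file is closed as soon as the block ends
with open(demo_path, "r") as f_auto:
    content = f_auto.read()

print(f_manual.closed, f_auto.closed)  # both files are closed at this point
print(content)
```

The `with` form does the same work as the `try`/`finally` version in fewer lines, which is why it is the preferred pattern.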

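options (the mode argument of open()):
- "r" - read (default, error if the file doesn't exist)
- "a" - append to the end of the file (if it doesn't exist, it will create the file)
- "w" - write (if it doesn't exist, it will create the file; if it exists, its content is overwritten)
- "x" - create a new file, error if it already exists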

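The four options can be tried out on a throwaway file. This is only a sketch: the temporary folder and the name `modes_demo.txt` are assumptions, not part of the notebook's assets.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary folder
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "modes_demo.txt")

with open(path, "x") as f:  # "x": creates the file, it must not exist yet
    f.write("first\n")

with open(path, "a") as f:  # "a": adds to the end, keeps existing content
    f.write("second\n")

with open(path, "r") as f:  # "r": read only
    after_append = f.read()

with open(path, "w") as f:  # "w": empties the file before writing
    f.write("third\n")

with open(path, "r") as f:
    after_write = f.read()

# "x" on a file that already exists raises FileExistsError
try:
    with open(path, "x") as f:
        pass
    x_failed = False
except FileExistsError:
    x_failed = True

print(repr(after_append), repr(after_write), x_failed)
```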
In [3]:
# file in parent folder: "../folder/name"
# file in subfolder: "folder/name"
# file in same folder: "name"

file_path = "assets/quotes.txt"

with open(file_path, "r") as f:
    text = f.read()

print(text)

  If     we     knew what it was      we were doing, it would not be called research,          would it?     - Albert Einstein

Time is a drug. Too       much of it kills you.  -  Terry Pratchett


 An expert is a person who       has made all the mistakes that           can be made in a          very narrow field - Niels Bohr

   Everything must be made as simple as possible. But not simpler. - Albert Einstein     


  Nothing in life                is to be feared, it is only to be understood. Now is the time to understand more, so that we may fear less. - Marie  Curie  

If I have seen further     it is by standing on the shoulders of Giants. - Isaac Newton

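The relative paths described in the comments of the cell above can also be built with `pathlib` from the standard library. This is a sketch only: the base folder `project` is a made-up name, and nothing is read from disk.

```python
from pathlib import Path

base = Path("project")  # hypothetical working folder

same_folder = base / "quotes.txt"               # file in the same folder
subfolder = base / "assets" / "quotes.txt"      # file in a subfolder
parent_folder = base / ".." / "quotes.txt"      # file in the parent folder

# as_posix() shows the path with forward slashes on every operating system
print(same_folder.as_posix())
print(subfolder.as_posix())
print(parent_folder.as_posix())

# name and parent split a path into file name and containing folder
print(subfolder.name, subfolder.parent.as_posix())
```

A `Path` object can be passed directly to `open()`, so `with open(subfolder, "r") as f:` works the same way as with a string.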

## cleaning up quotes.txt

- inspect the txt file manually (note the random noise: extra whitespace and blank lines)
- remove leading and trailing whitespace
- remove excessive whitespace between words
- add quote numbers

In [40]:
import re

file_path = "assets/quotes.txt"

with open(file_path, "r") as f_read, open("assets/quotes_clean.txt", "w") as f_write:

    row_count = 1
    for quote in f_read: # iterating over a file object gives one line at a time
        quote = quote.strip(" \n") # removes leading and trailing spaces and newlines
        quote = re.sub(" +", " ", quote) # regex: each run of one or more spaces is replaced with a single space

        # quote = quote.replace("\n", "") # removes only the newline, so leading and trailing spaces would remain

        if quote != "": # skip rows that are empty after cleaning
            f_write.write(f"{row_count}. {quote}\n")
            print(f"{row_count}. {quote}\n")
            row_count += 1

1. If we knew what it was we were doing, it would not be called research, would it? - Albert Einstein

2. Time is a drug. Too much of it kills you. - Terry Pratchett

3. An expert is a person who has made all the mistakes that can be made in a very narrow field - Niels Bohr

4. Everything must be made as simple as possible. But not simpler. - Albert Einstein

5. Nothing in life is to be feared, it is only to be understood. Now is the time to understand more, so that we may fear less. - Marie Curie

6. If I have seen further it is by standing on the shoulders of Giants. - Isaac Newton

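The same clean-up can be done without a regex. The sketch below swaps in `str.split()` and `str.join()`; the helper name `clean_line` is hypothetical. Note one difference from the cell above: `split()` without arguments treats tabs and other whitespace as separators too, not only spaces.

```python
def clean_line(line):
    """Trim the ends and collapse every run of whitespace to one space."""
    # split() without arguments splits on any whitespace and drops empty parts
    return " ".join(line.split())


raw = "  Time is a drug. Too       much of it kills you.  -  Terry Pratchett\n"
cleaned = clean_line(raw)
blank = clean_line("   \n")  # a row with only whitespace becomes ""

print(cleaned)
print(repr(blank))
```

Because blank rows come back as an empty string, the `if quote != ""` check from the cell above still works unchanged with this helper.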


## pick out the quote authors

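now that the file is cleaned, a pattern is visible: every row starts with a number and ends with the author's first and last name. once you see a pattern like that, you can pick a strategy:

- find the leading digit to identify a quote row
- extract first and last name
- join into full name
- get unique values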

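The four steps can be sketched on a few in-memory rows before touching the file. The rows are copied from the cleaned output, plus one non-quote row to show the digit check. The sketch assumes every author has exactly two names, as in this data set, and `quote_pattern` is a name made up for illustration.

```python
import re

rows = [
    "2. Time is a drug. Too much of it kills you. - Terry Pratchett",
    "4. Everything must be made as simple as possible. But not simpler. - Albert Einstein",
    "6. If I have seen further it is by standing on the shoulders of Giants. - Isaac Newton",
    "1. If we knew what it was we were doing, it would not be called research, would it? - Albert Einstein",
    "Authors: not a quote row",
]

# step 1: a quote row starts with one or more digits followed by ". "
quote_pattern = re.compile(r"^\d+\.\s")

authors = []
for row in rows:
    if not quote_pattern.match(row):
        continue  # skip rows that are not numbered quotes
    first_and_last = row.split()[-2:]          # step 2: last two words
    authors.append(" ".join(first_and_last))   # step 3: join into full name

# step 4: set removes duplicates, sorted gives a stable order
unique_authors = sorted(set(authors))
print(unique_authors)
```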
In [41]:
with open("assets/quotes_clean.txt", "r") as f_quotes:

    # readlines() -> list with one string per row
    # strip("\n") removes the trailing newline from each row
    quotes = [quote.strip("\n") for quote in f_quotes.readlines()]

# authors is a list of lists, where each sub-list belongs to one row of quotes
# quote.split() splits the row on whitespace, so each word in that row becomes an element of the sub-list
#authors = [quote.split() for quote in quotes]

# [-2:] -> the last 2 elements (first and last name)
authors = [quote.split()[-2:] for quote in quotes]

print(authors)

# " ".join() -> joins first and last name into one string, separated by space
# set -> keeps only unique elements
authors = set(" ".join(author) for author in authors)
print(authors) # no longer a nested list

# open the file for appending only after reading has finished
with open("assets/quotes_clean.txt", "a") as f_append:
    f_append.write("\nAuthors: ")
    f_append.write(", ".join(authors)) # join avoids a trailing ", " after the last author

[['Albert', 'Einstein'], ['Terry', 'Pratchett'], ['Niels', 'Bohr'], ['Albert', 'Einstein'], ['Marie', 'Curie'], ['Isaac', 'Newton']]
{'Albert Einstein', 'Isaac Newton', 'Niels Bohr', 'Marie Curie', 'Terry Pratchett'}


In [33]:
name = [["Andreas", "Svensson"]] # list in a list: the sub-list holds the first name at [0] and the last name at [1]
" ".join(name[0]) # the string method join() concatenates the elements of the first sub-list into one string, separated by " "

'Andreas Svensson'