## Write to Files

To write to a file, we first open it with the `open` function, passing the file name followed by a mode.
The mode 'w' tells Python to open the file for writing.
Note that we can **only write strings to files**, so any other type (such as a number) must first be converted to a string.
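
As a minimal sketch of that type conversion, the cell below converts an integer and a float with `str()` before writing them. The file name `numbers_demo.txt` and the variable names are made up for illustration.

```python
# Hypothetical example: write numbers to a file by converting them to strings
count = 42
price = 3.5

demo = open("numbers_demo.txt", "w")
# demo.write(count) would raise a TypeError because count is an int
demo.write(str(count) + "\n")
demo.write(str(price) + "\n")
demo.close()

# Read the file back to check what was written
demo = open("numbers_demo.txt", "r")
content = demo.read()
demo.close()
print(repr(content))
```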

In [8]:
#Open for text output: create/empty
myfile = open("hello.txt", 'w')

myfile

#Write a line of text: string
myfile.write("Hello, world!\n")
myfile.write("I'm done writing.\n")
myfile.close()

In [5]:
myfile = open("child_folder/newfile.txt","w")
myfile.write("testing")
myfile.close()

In [7]:
myfile=open("hello.txt","a")
myfile.write("write here again again\n")
myfile.close()

- The file 'hello.txt' did not exist before. After `open`, it is created in the *same folder as the notebook file*.
- *(Tip)*: you can specify a directory using "/". For example, if there is a folder called "files" in the same folder as the notebook, we can create a file in it using "files/hello.txt". The folder must already exist; `open` does not create folders.
- `.write()` is a method that writes a string to the opened file.
- **Important**: the "w" mode will overwrite the existing file if it already exists.
- The two characters "\n" together represent a line break. Why do we need it?
- In general, "\\" followed by a character has a special meaning.
- `.close()` the file after writing.
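
To illustrate the point about backslashes, here is a small sketch with a few common escape sequences. The variable names and sample strings are chosen only for this example.

```python
# A few common escape sequences inside ordinary string literals
two_lines = "first\nsecond"      # \n starts a new line
tabbed = "name\tnumber"          # \t inserts a tab
backslash = "C:\\temp"           # \\ is one literal backslash

print(two_lines)
print(tabbed)
print(backslash)
print(len("\n"))  # the two typed characters form a single character
```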

In [9]:
myfile = open("hello.txt", "a")
myfile.write("Now I can keep working on it.\n")
myfile.close()

Changing the mode to "a" allows one to append to the file.
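
The difference between the two modes can be sketched as follows. The file name `mode_demo.txt` is hypothetical and used only for this demonstration.

```python
# Hypothetical file name used only for this demonstration
path = "mode_demo.txt"

f = open(path, "w")      # create the file (or empty it if it exists)
f.write("first\n")
f.close()

f = open(path, "a")      # append: existing content is kept
f.write("second\n")
f.close()

f = open(path, "r")
after_append = f.read()
f.close()

f = open(path, "w")      # write: existing content is discarded
f.write("third\n")
f.close()

f = open(path, "r")
after_write = f.read()
f.close()

print(repr(after_append))
print(repr(after_write))
```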

*Exercise*: create a file 'intro.txt' and write a short self-introduction. Then download it to your local laptop and open it using notepad.

## Read Files

For existing files, we can read them and manipulate the resulting string.

In [10]:
myfile = open("hello.txt", "r")
entire_file = myfile.read()
myfile.close()
entire_file


"Hello, world!\nI'm done writing.\nNow I can keep working on it.\n"

In [11]:
myfile = open("sample_data/README.md", "r")
output = myfile.read()
myfile.close()
print(output)

This directory includes a few sample datasets to get you started.

*   `california_housing_data*.csv` is California housing data from the 1990 US
    Census; more information is available at:
    https://docs.google.com/document/d/e/2PACX-1vRhYtsvc5eOR2FWNCwaBiKL6suIOrxJig8LcSBbmCbyYsayia_DvPOOBlXZ4CAlQ5nlDD8kTaIDRwrN/pub

*   `mnist_*.csv` is a small sample of the
    [MNIST database](https://en.wikipedia.org/wiki/MNIST_database), which is
    described at: http://yann.lecun.com/exdb/mnist/

*   `anscombe.json` contains a copy of
    [Anscombe's quartet](https://en.wikipedia.org/wiki/Anscombe%27s_quartet); it
    was originally described in

    Anscombe, F. J. (1973). 'Graphs in Statistical Analysis'. American
    Statistician. 27 (1): 17-21. JSTOR 2682899.

    and our copy was prepared by the
    [vega_datasets library](https://github.com/altair-viz/vega_datasets/blob/4f67bdaad10f45e3549984e17e1b3088c731503d/vega_datasets/_data/anscombe.json).



In [None]:
print(entire_file)

Hello, world!
I'm done writing.
Now I can keep working on it.



In [12]:
output

"This directory includes a few sample datasets to get you started.\n\n*   `california_housing_data*.csv` is California housing data from the 1990 US\n    Census; more information is available at:\n    https://docs.google.com/document/d/e/2PACX-1vRhYtsvc5eOR2FWNCwaBiKL6suIOrxJig8LcSBbmCbyYsayia_DvPOOBlXZ4CAlQ5nlDD8kTaIDRwrN/pub\n\n*   `mnist_*.csv` is a small sample of the\n    [MNIST database](https://en.wikipedia.org/wiki/MNIST_database), which is\n    described at: http://yann.lecun.com/exdb/mnist/\n\n*   `anscombe.json` contains a copy of\n    [Anscombe's quartet](https://en.wikipedia.org/wiki/Anscombe%27s_quartet); it\n    was originally described in\n\n    Anscombe, F. J. (1973). 'Graphs in Statistical Analysis'. American\n    Statistician. 27 (1): 17-21. JSTOR 2682899.\n\n    and our copy was prepared by the\n    [vega_datasets library](https://github.com/altair-viz/vega_datasets/blob/4f67bdaad10f45e3549984e17e1b3088c731503d/vega_datasets/_data/anscombe.json).\n"

- If the file doesn't exist, what would happen? (Try it)
- `.read()` reads in the whole file as a single string.
- `.readlines()` reads the whole file and returns a list of strings, one per line.
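
If you would rather not have the program stop when a file is missing, one possible approach is to catch the error. This is only a sketch: the file name below is assumed not to exist, and the `try`/`except` statement is not otherwise covered in this section.

```python
# Hypothetical file name that is assumed not to exist
missing_name = "no_such_file_12345.txt"

try:
    f = open(missing_name, "r")
    text = f.read()
    f.close()
except FileNotFoundError as err:
    # Opening a missing file for reading raises FileNotFoundError
    text = None
    print("Could not open the file:", err)
```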

In [13]:
myfile = open("hello.txt", "r")
lines = myfile.readlines()
myfile.close()
lines

['Hello, world!\n', "I'm done writing.\n", 'Now I can keep working on it.\n']

We can print the file in a nice form using the following.

In [15]:
#We can also accomplish this with the readlines method
myfile = open('hello.txt', 'r')
#Puts each line as an element in a list
lines = myfile.readlines()
for line in lines:
    print(line, end = '')
myfile.close()

Hello, world!
I'm done writing.
Now I can keep working on it.


*(Question)*: what does `end=''` mean? What if we remove this?

### Read with String Methods

You may notice that the strings we read are not in a clean format. We sometimes need to process the string and extract the data.
Next we create a new text file 'data.txt' in the same folder in order to do a more complicated task.
- We'll enter three lines, each containing a name and an extension number, separated by commas.
- We'll then have a program read this file, go through each line, and get the name and the number. This is done by splitting the line where there's a comma to form a list.
- We then use the contents of this list to populate the 'employees' dictionary.

In [16]:
myfile = open('data.txt', 'w')
myfile.write("Mike, 1234\n")
myfile.write("Bob, 4567\n")
myfile.write("Steve, 8910\n")
myfile.close()

Then we read the file into a dictionary.

In [19]:
employees = {}
myfile = open('data.txt', 'r') # open data.txt file in read-only mode

for text_line in myfile.readlines(): # Go through each line in the file
    # e.g. text_line = "Mike, 1234\n"
    mylist = text_line.split(',') # Create list from comma-separated items in the line
    # print(mylist)
    employees[mylist[0]] = int(mylist[1].strip(' \n')) # Add items - name and number to the dictionary
    # employees['Mike']
myfile.close() # Close the file once every line has been processed
employees

{'Mike': 1234, 'Bob': 4567, 'Steve': 8910}

- Here we created a dictionary and used a `for` loop to go through the lines. `split(',')` splits the contents of 'text_line' into multiple parts, based on where the commas are, and returns the results as a list.
- The 'split' operation puts 'Mike' into 'mylist\[0]' and ' 1234\n' into 'mylist\[1]'. Here we read the data and parse it into a more structured format than just plain text. It can also be used for tab-separated files. Instead of using ',', we can use '\t' (i.e. backslash-t).
- To add items to a dictionary we don't need to call any insertion function; we just provide the key and value.
- The `strip(' \n')` operation removes excess characters (the leading space and the trailing newline character) from the number. This allows us to clean up the data before `int` converts it.
- When the `for` loop is completed, all lines have been processed and the dictionary is populated. The file is then closed and the contents of the dictionary are displayed.
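
The tab-separated variant mentioned above could look like the following sketch. The file name `data_tab.txt` and the dictionary name `extensions` are hypothetical; the data mirrors the example above.

```python
# Hypothetical tab-separated file with the same kind of data
tsv = open("data_tab.txt", "w")
tsv.write("Mike\t1234\n")
tsv.write("Bob\t4567\n")
tsv.close()

extensions = {}
tsv = open("data_tab.txt", "r")
for text_line in tsv.readlines():
    # Remove the trailing newline, then split on the tab character
    fields = text_line.strip("\n").split("\t")   # e.g. ['Mike', '1234']
    extensions[fields[0]] = int(fields[1])
tsv.close()

print(extensions)
```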


## String Formatting with f-string

Python provides several methods to format strings. While older methods such as using the % operator are still valid, the newer f-string approach is more concise, readable, and versatile, especially for Python versions 3.6 and later.


In [21]:
#Example with single integer

age = 150
sentence = f'I am {age} years old'
sentence

'I am 150 years old'

In [22]:
pi

NameError: name 'pi' is not defined

In [27]:
#String formatting with float
import math

print(f"The value of pi is {math.pi}")
print(f"The value of pi is {math.pi:.1f}")

The value of pi is 3.141592653589793
The value of pi is 3.1


The `.1f` after the `:` means that the value is printed as a `f`loat with `1` decimal place.
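
Other format specifications follow the same pattern. The sketch below shows a few variations; the variable names are chosen only for illustration.

```python
import math

value = math.pi
two_places = f"{value:.2f}"          # two decimal places
no_places = f"{value:.0f}"           # rounded to a whole number
padded = f"{value:8.3f}"             # total width 8, three decimal places
with_commas = f"{1234567.891:,.1f}"  # thousands separators, one decimal place

print(two_places, no_places, padded, with_commas)
```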

We can also use several placeholders in one f-string:

In [28]:
name = "Joe"
GPA=3.99
num_years = 2
sentence = f"{name} has a GPA of {GPA:.1f} after {num_years} years."

sentence

'Joe has a GPA of 4.0 after 2 years.'

Now we can export our data to a CSV (comma-separated) file.

In [33]:
midterm_grade = {'Sam': 90, 'Katy': 95, 'Ben': 85}
myfile = open('grades.csv', 'w')
myfile.write("Student,Grade\n")
for (name,score) in midterm_grade.items():
    myfile.write(f"{name},{score}\n")
myfile.close()


In [32]:
midterm_grade.items()

dict_items([('Sam', 90), ('Katy', 95), ('Ben', 85)])
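
To check an exported file, we can read it back and rebuild the dictionary. This is a sketch under the assumption that the file has one header row and no spaces around the commas; the file name `grades_check.csv` and the names `grades_out` and `grades_in` are hypothetical.

```python
# Write a small CSV file, then read it back into a dictionary
grades_out = {'Sam': 90, 'Katy': 95, 'Ben': 85}
f = open("grades_check.csv", "w")   # hypothetical file name
f.write("Student,Grade\n")
for name, score in grades_out.items():
    f.write(f"{name},{score}\n")
f.close()

grades_in = {}
f = open("grades_check.csv", "r")
rows = f.readlines()
f.close()
for row in rows[1:]:                 # skip the header row
    name, score = row.strip().split(",")
    grades_in[name] = int(score)

print(grades_in)
```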

*(Exercise)*:

1. Write the following speech to a file called 'speech.txt'.
```python
"""I have, myself, full confidence that if all do their duty, if nothing is neglected, and if the best arrangements are made, as they are being made,
we shall prove ourselves once again able to defend our Island home, to ride out the storm of war, and to outlive the menace of tyranny, if necessary for years, if necessary alone. At any rate, that is what we are going to try to do. That is the resolve of His Majesty's Government-every man of them.
That is the will of Parliament and the nation. The British Empire and the French Republic, linked together in their cause and in their need, will defend to the death their native soil, aiding each other
like good comrades to the utmost of their strength. Even though large tracts of Europe and many old and famous States have fallen or may fall into the grip of the Gestapo and all the odious apparatus of Nazi rule, we shall not flag or fail. We shall go on to the end, we shall fight in France, we shall fight on the seas and oceans, we shall fight with growing confidence and growing strength in the air,
we shall defend our Island, whatever the cost may be, we shall fight on the beaches, we shall fight on the landing grounds, we shall fight in the fields and in the streets, we shall fight in the hills; we shall never surrender, and even if, which I do not for a moment believe, this Island or a large part of it were subjugated and starving, then our Empire beyond the seas, armed and guarded by the British Fleet, would carry on the struggle, until, in God's good time, the New World,
with all its power and might, steps forth to the rescue and the liberation of the old."""
```
2. Read the `speech.txt` and count the number of words, sentences and lines in the file.

In [35]:
speech = """I have, myself, full confidence that if all do their duty, if nothing is neglected, and if the best arrangements are made, as they are being made,
we shall prove ourselves once again able to defend our Island home, to ride out the storm of war, and to outlive the menace of tyranny, if necessary for years, if necessary alone. At any rate, that is what we are going to try to do. That is the resolve of His Majesty's Government-every man of them.
That is the will of Parliament and the nation. The British Empire and the French Republic, linked together in their cause and in their need, will defend to the death their native soil, aiding each other
like good comrades to the utmost of their strength. Even though large tracts of Europe and many old and famous States have fallen or may fall into the grip of the Gestapo and all the odious apparatus of Nazi rule, we shall not flag or fail. We shall go on to the end, we shall fight in France, we shall fight on the seas and oceans, we shall fight with growing confidence and growing strength in the air,
we shall defend our Island, whatever the cost may be, we shall fight on the beaches, we shall fight on the landing grounds, we shall fight in the fields and in the streets, we shall fight in the hills; we shall never surrender, and even if, which I do not for a moment believe, this Island or a large part of it were subjugated and starving, then our Empire beyond the seas, armed and guarded by the British Fleet, would carry on the struggle, until, in God's good time, the New World,
with all its power and might, steps forth to the rescue and the liberation of the old."""
myfile = open("speech.txt","w")
myfile.write(speech)
myfile.close()

In [36]:
#lines
myfile = open("speech.txt" ,"r")
lines = myfile.readlines()
myfile.close()
len(lines)

6

In [38]:
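#sentences
myfile = open("speech.txt" ,"r")
text_sen = myfile.read()
myfile.close()
# Splitting on '.' leaves an empty piece after the final period, so drop blank pieces
sentences = [s for s in text_sen.split('.') if s.strip()]
print(len(sentences))

7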
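
Words can be counted in a similar way with `split()`. The sketch below uses a short sample text and a hypothetical file name `sample_speech.txt` in place of the full speech, and it treats every run of non-whitespace characters as one word.

```python
# Short sample text standing in for the contents of speech.txt
sample = "We shall go on to the end.\nWe shall never surrender."

f = open("sample_speech.txt", "w")   # hypothetical file name
f.write(sample)
f.close()

f = open("sample_speech.txt", "r")
text = f.read()
f.close()

words = text.split()   # split() with no argument splits on any whitespace
print(len(words))
```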
