Beata Sirowy
# __Files and exceptions__
Based on Matthes, E. (2023) _Python: Crash Course_

To work with the contents of a file, we need to tell Python the path to
the file. A path is the exact location of a file or folder on a system. Python
provides a module called pathlib that makes it easier to work with files and
directories.

In [None]:
from pathlib import Path

path = Path("pi_digits.txt")
contents = path.read_text()
print(contents)

3.1415926535
  8979323846
  2643383279



Python’s rstrip() method removes, or strips,
any whitespace characters from the right side of a string.

In [None]:
from pathlib import Path

path = Path("pi_digits.txt")
contents = path.read_text()
contents = contents.rstrip()
print(contents)


3.1415926535
  8979323846
  2643383279


We can strip the trailing newline character when we read the con-
tents of the file, by applying the rstrip() method immediately after calling

read_text():

In [None]:
contents = path.read_text().rstrip()
print(contents)


3.1415926535
  8979323846
  2643383279


To get Python to open files from a
directory other than the one where your program file is stored, you need
to provide the correct path.

In [None]:
path = Path(
    r"C:\Users\Beata\Documents\Python Scripts\Files\pcc_3e-main\chapter_10\reading_from_a_file\pi_digits.txt"
)
contents = path.read_text().rstrip()
print(contents)

3.1415926535
  8979323846
  2643383279


In [None]:
from pathlib import Path

path = Path("pi_digits.txt")
contents = path.read_text()
lines = contents.splitlines()

for line in lines:
    print(line)

3.1415926535
  8979323846
  2643383279


In [None]:
from pathlib import Path

path = Path("pi_digits.txt")
contents = path.read_text()
lines = contents.splitlines()

pi_string = ""
for line in lines:
    pi_string += line.lstrip()

print(pi_string)
print(len(pi_string))

3.141592653589793238462643383279
32


__Large file: one million digits__

In [None]:
from pathlib import Path

path = Path("pi_million_digits.txt")
contents = path.read_text()
lines = contents.splitlines()

pi_string = ""
for line in lines:
    pi_string += line.lstrip()
print(f"{pi_string[:52]}...")
print(len(pi_string))

3.14159265358979323846264338327950288419716939937510...
1000002


__Is Your Birthday Contained in Pi?__

In [None]:
for line in lines:
    pi_string += line.strip()
    birthday = input("Enter your birthday, in the form mmddyy: ")
if birthday in pi_string:
    print("Your birthday appears in the first million digits of pi!")
else:
    print("Your birthday does not appear in the first million digits of pi.")

Your birthday appears in the first million digits of pi!


### Writing to a file

Once you have a path defined, you can write to a file using the write_text()
method. If the file does not exist, Python will create a new one.

In [None]:
from pathlib import Path

path = Path(
    r"C:\Users\Beata\Documents\Python Scripts\Files\pcc_3e-main\chapter_10\writing_to_a_file\programming01.txt"
)
path.write_text("I love programming.")

19

In [None]:
content = path.read_text()
lines = content.splitlines()

for n in lines:
    print(n)

I love programming.


The write_text() method takes a single argument: the string that you
want to write to the file. This program has no terminal output, but if you
open the file programming.txt, you’ll see one line:

To write more than one line to a file, you need to build a string contain-
ing the entire contents of the file, and then call write_text() with that string.

In [None]:
contents = "I like programming.\n"
contents += "I like Python and R.\n"
contents += "I also enjoy working with data.\n"

path = Path(
    r"C:\Users\Beata\Documents\Python Scripts\Files\pcc_3e-main\chapter_10\writing_to_a_file\programming01.txt"
)
path.write_text(contents)


73

In [None]:
content = path.read_text()
lines = content.splitlines()

for n in lines:
    print(n)


I like programming.
I like Python and R.
I also enjoy working with data.


![image.png](attachment:image.png)

We can specify encoding if necessary.

In [None]:
from pathlib import Path

path = Path('alice.txt')
contents = path.read_text(encoding='utf-8')

## Exceptions

Python uses special objects called exceptions to manage errors that arise during a program’s execution. Whenever an error occurs that makes Python unsure of what to do next, it creates an exception object. If you write code that handles the exception, the program will continue running. If you don’t
handle the exception, the program will halt and show a traceback, which
includes a report of the exception that was raised.

- Exceptions are handled with try-except blocks.

The only code that should go in a try block is code that might cause an
exception to be raised. Sometimes you’ll have additional code that should
run only if the try block was successful; this code goes in the else block.

In [111]:
try:
    print(5/0)
except:
    print("You can't divide by zero!")

You can't divide by zero!


In [7]:
print("Give me two numbers, and I'll divide them.")
print("Enter 'q' to quit.")

while True:
    first_number = input("\nFirst number: ")
    if first_number == 'q':
        break
    second_number = input("Second number: ")
    if second_number == 'q':
        break
    try:
        answer = int(first_number) / int(second_number)
        print(answer)
    except:
        print("It's not a valid operation!")

Give me two numbers, and I'll divide them.
Enter 'q' to quit.
1.3235294117647058
4.583333333333333
0.5111111111111111
It's not a valid operation!
It's not a valid operation!


Handling the FileNotFoundError Exception

In [10]:
from pathlib import Path
path = Path('alice.txt')

try:
    contents = path.read_text(encoding='utf-8')
except:
    print(f"Sorry, the file {path} does not exist.")

Sorry, the file alice.txt does not exist.


## Analyzing text

You can analyze text files containing entire books. Many classic works of literature are available as simple text files because they are in the public domain. The texts used in this section come from Project Gutenberg

(https://gutenberg.org).

Let’s pull in the text of Alice in Wonderland and try to count the number
of words in the text. To do this, we’ll use the string method split(), which
by default splits a string wherever it finds any whitespace:

In [54]:
from pathlib import Path
path = Path(r"C:\Users\Beata\Documents\Books\alice.txt")
contents = path.read_text(encoding='utf-8').rstrip()
lines = contents.splitlines()
words = contents.split()

for line in lines[50:61]:
    print(line)
print("\n")
    
num_lines = (len(lines))
num_words=(len(words))
print("\n")

print(f"The document has about {num_lines} lines.")
print(f"The document has about {num_words} words.")
    



CHAPTER I.
Down the Rabbit-Hole


Alice was beginning to get very tired of sitting by her sister on the
bank, and of having nothing to do: once or twice she had peeped into
the book her sister was reading, but it had no pictures or
conversations in it, “and what is the use of a book,” thought Alice
“without pictures or conversations?”




The document has about 3755 lines.
The document has about 29564 words.


__Working with Multiple Files__

We can move the bulk of this program to a function called count_words(). This will make it easier to
run the analysis for multiple books:

In [95]:
from pathlib import Path

path2 = Path(input("Please provide a txt file path"))



def count_words(path2):
    try:
        contents = path2.read_text(encoding='utf-8')
    except :
        print(f"Sorry, the file {path2} does not exist.")
    else: # Count the approximate number of words in the file:
        words = contents.split()
        num_words = len(words)
        print(f"The file {path2} has about {num_words} words.")




count_words(path2)



The file C:\Users\Beata\Documents\Books\alice.txt has about 29564 words.


In [92]:
from pathlib import Path

path1 = Path(input("Please, provide the file path"))

def count_words(path1):
    # Count the approximate number of words in the file:
    contents = path1.read_text(encoding='utf-8')
    words = contents.split()
    num_words = len(words)
    print(f"The file {path1} has about {num_words} words.")




count_words(path1)



The file C:\Users\Beata\Documents\Books\bible.txt has about 824036 words.


We can modify the program to return the last elementt in the file path:

In [None]:
from pathlib import Path

path1 = Path(input("Please, provide the file path"))

def get_last_element(path): # Return the last element of the file path 
    return path.name

def count_words(path):
    # Count the approximate number of words in the file:
    contents = path.read_text(encoding='utf-8')
    words = contents.split()
    num_words = len(words)
    print(f"The file {get_last_element(path)} has about {num_words} words.")




count_words(path1)

print(get_last_element(Path(input("Please, provide the file path"))))

The file bible.txt has about 824036 words.


__Finding a random line in the text__

randint() function from random module takes two integer arguments and returns a randomly selected inte-
ger between (and including) those numbers.

In [113]:
from random import randint
randint(1, 6)

5

Another useful function is choice(). This function takes in a list or tuple
and returns a randomly chosen element:

In [114]:
from random import choice
players = ['charles', 'martina', 'michael', 'florence', 'eli']
first_up = choice(players)

first_up


'martina'

We can use it to select a random line from a text - in this case, _Alice in the Wonderland_

In [18]:
from random import choice
from pathlib import Path

path = Path(r"C:\Users\Beata\Documents\Books\alice.txt")
contents = path.read_text(encoding='utf-8')
lines = contents.splitlines()

random_line = choice(lines)

random_line



'either question, it didn’t much matter which way she put it. She felt'

__This version returns a full sentence__

In [19]:
from random import choice
from pathlib import Path
import re

path = Path(r"C:\Users\Beata\Documents\Books\alice.txt")
contents = path.read_text(encoding='utf-8')

# Split the text into sentences using a regular expression
sentences = re.split(r'(?<=[.!?])\s+', contents) 

# Randomly select a sentence and find its index
random_sentence = choice(sentences) 

print(random_sentence)


“Do you
know why it’s called a whiting?”

“I never thought about it,” said Alice.


__This version allows a user input - file path__

In [164]:
from random import choice
from pathlib import Path
import re

# Read the file path from user input
path = Path(input("Please, provide the file path"))
contents = path.read_text(encoding='utf-8')

# Split the text into sentences using a regular expression
sentences = re.split(r'(?<=[.!?])\s+', contents) 

# Randomly select a sentence and find its index
random_sentence = choice(sentences) 

print(random_sentence)

And what is meant by saying that honour and great calamity
are to be (similarly) regarded as personal conditions?


We can modify the program to run in a loop, allowing the user to request another random sentence by typing "+" and to quit by typing "q". Here we use _Tao Te Ching_

In [8]:
from random import choice
from pathlib import Path
import re

def get_random_sentence(sentences): 
    return choice(sentences)

# Read the file path from user input
path = Path(r"C:\Users\Beata\Documents\Books\tao.txt")
contents = path.read_text(encoding='utf-8')

# Split the text into sentences using a regular expression
sentences = re.split(r'(?<=[.!?])\s+', contents) 

active = True
while active:
    print(choice(sentences))
    user_input = input("Enter '+' for another sentence or 'q' to quit: ").strip()
    
    if user_input == 'q':
        active = False
print("Program terminated.")

    

Clay is fashioned into vessels; but it is on their empty hollowness,
that their use depends.
If I were suddenly to become known, and (put into a position to)
conduct (a government) according to the Great Tao, what I should be
most afraid of would be a boastful display.
Program terminated.


We can modify the program to include n lines before and after the randomly selected line.

In [6]:
from random import choice
from pathlib import Path

# Function get_surrounding_lines: 
# This function takes the list of lines, the index of the randomly selected line, 
# and the number of lines to include before and after. 
# It calculates the start and end indices, ensuring they stay within the bounds of the list.

def get_surrounding_lines(lines, random_index, n):
    start_index = max(0, random_index - n)
    end_index = min(len(lines), random_index + n + 1)
    return lines[start_index:end_index]

path = Path(r"C:\Users\Beata\Documents\Books\alice.txt")
contents = path.read_text(encoding='utf-8')
lines = contents.splitlines()

# The script selects a random line and finds its index in the list.
random_index = lines.index(choice(lines))
n = 2 # Number of lines before and after to include 

surrounding_lines = get_surrounding_lines(lines, random_index, n) 

for line in surrounding_lines: 
    print(line)



Presently the Rabbit came up to the door, and tried to open it; but, as
the door opened inwards, and Alice’s elbow was pressed hard against it,
that attempt proved a failure. Alice heard it say to itself “Then I’ll
go round and get in at the window.”


We can retrieve surrounding text based on sentences instead of lines 

In [129]:
from random import choice
from pathlib import Path
import re

# Get the sentence before, the random sentence, and the sentence after
def get_surrounding_sentences(sentences, random_index):
    start_index = max(0, random_index - 1) 
    end_index = min(len(sentences), random_index + 1) 
    return sentences[start_index:end_index]

# Split the text into sentences using a regular expression
path = Path(r"C:\Users\Beata\Documents\Books\alice.txt") 
contents = path.read_text(encoding='utf-8')
sentences = re.split(r'(?<=[.!?]) +', contents) 

# Randomly select a sentence and find its index
random_sentence = choice(sentences) 
random_index = sentences.index(random_sentence)

# Retrieve the surrounding sentences 
surrounding_sentences = get_surrounding_sentences(sentences, random_index)

for sentence in surrounding_sentences: 
    print(sentence)




“If I eat one of these cakes,” she thought, “it’s sure to make
_some_ change in my size; and as it can’t possibly make me larger, it
must make me smaller, I suppose.”

So she swallowed one of the cakes, and was delighted to find that she
began shrinking directly.
As soon as she was small enough to get
through the door, she ran out of the house, and found quite a crowd of
little animals and birds waiting outside.


__Failing silently__

Sometimes, you’ll want the program to fail silently when an exception
occurs and continue on as if nothing happened. To make a program fail
silently, you write a try block as usual, but you explicitly tell Python to do nothing in the except block using "pass"

In [112]:
try:
    print(5/0)
except:
    pass

- The pass statement also acts as a placeholder. It’s a reminder that you’re choosing to do nothing at a specific point in your program’s execution and that you might want to do something there later. 

- For example, in our count_words(path) function we may want to write any missing filenames to a file
called missing_files.txt. Our users wouldn’t see this file, but we’d be able to
read the file and deal with any missing texts.

## Storing data