# Problem Set 6 Pointers
Problem set 6 may look confusing, but the task itself is simple: you will convert PDFs to text and redact (that is, replace) the values specified in a dictionary. Once the text is redacted, you'll save it as a .txt file and create a report of your findings.

The goal of the problem set is to get you more comfortable with creating functions, working with directories, and importing modules. This file includes some helpful tools you may need during your problem set.

**How the problem set works**:

* We provide you with a series of function signatures and docstrings that tell you what the function is supposed to do. 
* You need to complete the functions and properly import the library you create in your main.ipynb file.

### Sorted Function vs Sort Method

In [1]:
#original list
my_list = [5, 2, 9, 1, 7]

#using .sort() method (sorts the list in place and returns None)
sorted_in_place = my_list.sort()

#print the result of .sort() method
print("Result of .sort() method:", sorted_in_place)  #this will print None
print("Original list after .sort():", my_list)      #this will print the sorted list


Result of .sort() method: None
Original list after .sort(): [1, 2, 5, 7, 9]


In [2]:
#reset the list for comparison
my_list = [5, 2, 9, 1, 7]

#using sorted() function (returns a new sorted list)
sorted_new_list = sorted(my_list)

#print the result of sorted() function
print("Result of sorted() function:", sorted_new_list)  #this will print the sorted list
print("Original list after sorted():", my_list)         #this will print the original unsorted list

Result of sorted() function: [1, 2, 5, 7, 9]
Original list after sorted(): [5, 2, 9, 1, 7]
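
Both `sorted()` and `.sort()` also accept the optional `reverse` and `key` arguments. The sketch below is only an illustration; the file names are made up and are not files from the problem set.

```python
# hypothetical file names, for illustration only
file_names = ["brief.pdf", "Contract.pdf", "appeal.pdf"]

# by default, uppercase letters sort before lowercase letters
default_order = sorted(file_names)

# key=str.lower compares the lowercased names, so case is ignored
case_insensitive = sorted(file_names, key=str.lower)

# reverse=True sorts in descending order
descending = sorted(file_names, reverse=True)

print(default_order)
print(case_insensitive)
print(descending)
```

Because `sorted()` returns a new list each time, `file_names` itself is left unchanged.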


### Count method
The `count` method returns the number of non-overlapping occurrences of a specified substring within a given string. The match is case-sensitive.

In [3]:
#example legal text
legal_text = "The defendant shall remain in custody. The court shall hear the case on Monday."

#count occurrences of the word "shall"
shall_count = legal_text.count("shall")

#print the result
print(f"The word 'shall' appears {shall_count} times in the legal text.")

The word 'shall' appears 2 times in the legal text.
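
Because `count` matches substrings and is case-sensitive, it can both miss and over-count words. Here is a small sketch of the difference, using a made-up sentence (`sample_text` is not part of the problem set):

```python
# made-up sentence for illustration
sample_text = "The court heard the case. Therefore, the other party rests."

# case-sensitive substring count: two standalone "the" plus the "the" inside "other"
substring_count = sample_text.count("the")

# lowercasing first also catches "The" and the start of "Therefore"
lowercase_count = sample_text.lower().count("the")

# splitting into words first counts only whole words equal to "the"
word_count = sample_text.lower().split().count("the")

print(substring_count, lowercase_count, word_count)
```

Which of these counts you want depends on what your report is supposed to measure.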


### Replacing Items Using a Dictionary

There are numerous valid approaches to replacing items based on a dictionary's key-value pairs.

In [None]:
#legal text
legal_text = "The plaintiff claims the defendant was negligent in the accident."


#dictionary with words to replace (key: word to replace, value: replacement word)
replacements = {
    "plaintiff": "claimant",
    "defendant": "respondent",
    "negligent": "at fault"
}

In [7]:
replacements.items()

dict_items([('plaintiff', 'claimant'), ('defendant', 'respondent'), ('negligent', 'at fault')])

In [7]:
### APPROACH ONE

#start from the original text (strings are immutable, so legal_text itself is never changed)
legal_text_ex_one = legal_text

#replace items in the string based on dictionary key-value pairs
for word, replacement in replacements.items():
    legal_text_ex_one = legal_text_ex_one.replace(word, replacement)

#print the modified legal text
print(legal_text_ex_one)

The claimant claims the respondent was at fault in the accident.


In [9]:
### APPROACH TWO

#start from the original text (strings are immutable, so legal_text itself is never changed)
legal_text_ex_two = legal_text

#replace items in the string based on dictionary key-value pairs
for item in replacements:
    # print(item)
    legal_text_ex_two = legal_text_ex_two.replace(item, replacements[item])

#print the modified legal text
print(legal_text_ex_two)

The claimant claims the respondent was at fault in the accident.


Those are two approaches, but there are more! Anyone have ideas of how else you could use a for loop to replace items as specified in the dictionary?
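
One further possibility, sketched here under assumptions, is to wrap the loop in a function that also records how often each key was found. The name `replace_and_count` is hypothetical and is not one of the function signatures provided in the problem set.

```python
def replace_and_count(text, replacements):
    """Hypothetical helper: return the updated text and a dict of match counts per key."""
    counts = {}
    for word in replacements.keys():
        # count before replacing, otherwise the word is already gone
        counts[word] = text.count(word)
        text = text.replace(word, replacements[word])
    return text, counts


legal_text = "The plaintiff claims the defendant was negligent in the accident."
replacements = {
    "plaintiff": "claimant",
    "defendant": "respondent",
    "negligent": "at fault"
}

new_text, counts = replace_and_count(legal_text, replacements)
print(new_text)
print(counts)
```

Note that each count is taken on the text as it stands at that point in the loop, so a replacement value that happens to contain a later key would affect that key's count.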

### `os` library
The os library provides functions to interact with the operating system. It allows you to handle file and directory operations, environment variables, and process management in a platform-independent way (i.e., it works on Windows, macOS, and Linux).

For **Codespaces**: The os library allows you to interact with the filesystem and environment of your running Codespace. Since Codespaces is essentially a containerized environment, the os library helps you navigate and manage files, directories, and environment variables within this virtualized space.

In [10]:
# to use the os library, import os at the top of your script
import os

#### Note on Directories
The word "directory" may throw you off. A directory is simply a folder: it can contain files and other folders. When you open a Codespace, your default directory is the root of the project you're working on. This means that when you open a terminal in Codespaces, you're automatically placed in the directory (folder) that contains all the files for your project.

To access an item inside a folder, you have to include the folder in the path (for example, `exercises/helper.py`).
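
As a rough illustration, `os.getcwd()` reports the directory that relative paths start from, and `os.path.abspath()` shows the full path a relative path resolves to. The `exercises/helper.py` path is only an example and does not need to exist for this code to run.

```python
import os

# the current working directory is where relative paths start from
current_dir = os.getcwd()
print(current_dir)

# a relative path names the folder first, then the item inside it
relative_path = os.path.join("exercises", "helper.py")
print(relative_path)

# the full path that the relative path resolves to
full_path = os.path.abspath(relative_path)
print(full_path)
```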

#### Helpful `os` functions

Use `os.listdir(<path>)` to list files in a given directory.

In [12]:
#simply input the folder name to get contents of a folder in your current working directory
os.listdir("exercises")

['helper.py', '__pycache__', 'copyright.ipynb']

In [13]:
#use slashes / to indicate subfolders
os.listdir("exercises/another_subfolder")

[]

In [18]:
#you can go up a level to access a folder from PARENT directory (../)
print(os.listdir("../"))

print(os.listdir("../../"))

print(os.listdir("../../labs/week-1/"))


['week-5', 'week-3', 'week-2', 'week-4', 'week-1', 'week-6']
['lecture', 'getting-started-codespaces.md', 'labs', 'remote-vs-local-infographic.png', 'images', 'README.md', '.git', 'recordings.md', 'style-guide.md', 'cold_call.ipynb']
['Lab_1.ipynb', 'IMG_2468.jpg', 'warmup.md']
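
Since `os.listdir` returns plain strings, you can filter the result with ordinary string methods. The sketch below builds a temporary folder with made-up file names so it runs anywhere; in your own work you would pass the path of a real folder instead.

```python
import os
import tempfile

# create a throwaway folder with a few empty, made-up files
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["memo.pdf", "notes.txt", "brief.pdf"]:
        with open(os.path.join(demo_dir, name), "w") as f:
            pass

    # keep only the names that end in .pdf; sort them because
    # os.listdir does not guarantee any particular order
    pdf_files = sorted(name for name in os.listdir(demo_dir) if name.endswith(".pdf"))

print(pdf_files)
```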


Use `os.path.exists(<path>)` to check if a particular directory or file exists.

In [19]:
#check if the "../../labs/week-1/" path exists
os.path.exists("../../labs/week-1/")

True

In [21]:
os.path.exists("exercises_redacted")

False

Use `os.makedirs(<path>)` to create a new directory.

In [24]:
#check if the "exercises_redacted" folder exists; if it doesn't, create it
if not os.path.exists("exercises_redacted"):
    os.makedirs("exercises_redacted")
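
As an alternative to checking first, `os.makedirs` accepts an `exist_ok=True` argument that skips the error when the folder is already there. This is a minimal sketch that works inside a temporary folder so it does not touch your project.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as demo_dir:
    target = os.path.join(demo_dir, "exercises_redacted")

    # first call creates the folder
    os.makedirs(target, exist_ok=True)

    # second call would raise FileExistsError without exist_ok=True
    os.makedirs(target, exist_ok=True)

    created = os.path.isdir(target)

print(created)
```

Either style is fine; use whichever reads more clearly to you.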

Use `os.path.join(<path component>, <path component>)` to join two or more file path components into a single string, using the correct separator for your operating system.

In [28]:
# demo with accessing different weeks in lab folder

#create a list of target folders
target_folders = ["week-1", "week-2", "week-3", "week-4", "week-5", "week-10"]

for target in target_folders:
    path = os.path.join("../../labs", target)
    print(path)
    print(os.path.exists(path))

../../labs/week-1
True
../../labs/week-2
True
../../labs/week-3
True
../../labs/week-4
True
../../labs/week-5
True
../../labs/week-10
False
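
`os.path.join` pairs well with `os.path.splitext`, which splits a file name into its base name and extension. The sketch below shows one way to build the path for a .txt version of a PDF; the file name `contract.pdf` is made up for illustration.

```python
import os

# hypothetical PDF file name
pdf_name = "contract.pdf"

# split "contract.pdf" into ("contract", ".pdf")
base_name, extension = os.path.splitext(pdf_name)

# build the path for a text file with the same base name in another folder
txt_path = os.path.join("exercises_redacted", base_name + ".txt")

print(base_name, extension)
print(txt_path)
```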


### `pypdf` library
The pypdf library is a Python library used to work with PDF files. It allows you to read, manipulate, and extract information from PDF documents. You can perform tasks like merging PDFs, extracting text, rotating pages, and splitting documents without needing additional tools. We've provided you with specific instructions in the problem set.