# 🛠 IFQ718 Module 04 Exercises-04

## 🔍  Context: Interacting with the file system

There are a few more modules from the Python Standard Library that we would like to introduce you to.

In this notebook, they are `os.path` and `glob`, which are used for interacting with the operating system and filesystem.


**Assumption: you still have `books.csv` and `afl-2022.json` in the `data` directory from the previous notebook.**

### First, the `os.path` module

`path` is a submodule of `os`. It is for manipulating file paths.

> Since different operating systems have different path name conventions, there are several versions of this module in the standard library. The os.path module is always the path module suitable for the operating system Python is running on, and therefore usable for local paths. 
>
> Source: [Python reference manual](https://docs.python.org/3/library/os.path.html#module-os.path)
    
    

In [None]:
import os

**Relative versus absolute paths**

* An **absolute path** is one that specifies where the file or directory is from the *root* of the filesystem. 
   * That is, on 
      * Windows, the root may be `C:\` or `D:\`
      * Mac or Linux, it is `/`
      
* A **relative path** is one that does not start from the *root* of the filesystem, but from the current working directory.
   * For example, on
      * Windows, a relative path may look like `My Documents\Resume.docx`, assuming you are within the directory `C:\Users\Admin` already
      * Mac or Linux, a relative path may look like `documents/resume.docx`, assuming you are within the directory `/home/admin` already
   * Periods have a special meaning in a path and can change the directory context
      * `.` means the current working directory (essentially does nothing)
      * `..` means the parent of the current directory
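
As a small sketch of the distinction above, `os.path.isabs` reports whether a path is absolute, and `os.path.normpath` tidies up `.` and `..` components. The example paths are hypothetical and chosen only for illustration; none of them needs to exist on disk.

```python
import os

# Hypothetical example paths, chosen only for illustration
for p in ['.', '..', os.path.join('data', 'books.csv'), os.path.abspath('.')]:
    kind = 'absolute' if os.path.isabs(p) else 'relative'
    print(f'{p!r} is {kind}')

# normpath collapses redundant '.' and '..' components without touching the disk
tidy = os.path.normpath(os.path.join('data', '..', 'data', '.', 'books.csv'))
print(tidy)
```

Only the last path in the loop is absolute, because `os.path.abspath` always returns a path that starts from the root.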

**Using `os.path.abspath(path)` to get the absolute path**

In [None]:
os.path.abspath('.')

In [None]:
os.path.abspath('..')

**Joining paths**

Using the built-in join function is important. Did you notice earlier that Windows and Mac/Linux use different path separators (i.e., `\` vs. `/`)? `os.path.join` inserts the correct separator for the operating system Python is running on.

In [None]:
print(os.path.abspath(os.path.join('..', '..')))
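
To see why joining matters, the sketch below compares `os.path.join` with a path built by hand. The components `data`, `raw` and `books.csv` are hypothetical and used only for illustration.

```python
import os

# os.sep is the separator for the platform Python is running on
print(repr(os.sep))

# Hypothetical path components, used only for illustration
joined = os.path.join('data', 'raw', 'books.csv')
print(joined)

# Building the same path by hand only matches on platforms that use '/'
by_hand = 'data' + '/' + 'raw' + '/' + 'books.csv'
print(joined == by_hand)
```

The final line prints `True` on Mac or Linux and `False` on Windows, which is the reason to prefer `os.path.join` over string concatenation.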

In [None]:
cwd = os.path.abspath('.')
while True:
    print(f"{'Currently at: ': <20}{cwd}")

    parent = os.path.abspath(os.path.join(cwd, '..'))
    print(f"{'Parent is: ': <20}{parent}")
    print('\n')

    if cwd == parent:
        break

    cwd = parent
    

**Checking a file exists**

In [None]:
for fp in ['data/books.csv', 'data/afl-2022.json']:
    if not os.path.exists(fp):
        print(f'Hey! Please go back to notebook 3 for this module and download the file {fp}')
    else:
        print(f'Nice... you still have the file {fp}')

**Is the path a file or directory?**

In [None]:
for fp in ['data/books.csv', 'data/afl-2022.json']:
    if os.path.isfile(fp):
        print(f'{fp} is a file')    

In [None]:
# Start at the books.csv file inside the data directory, then walk up to the root
fp = os.path.join(os.path.abspath('.'), 'data', 'books.csv')

while True:
    print(fp)

    if os.path.isfile(fp):
        print('... is a file.\n')
    elif os.path.isdir(fp):
        print('... is a dir.\n')
        
    parent = os.path.abspath(os.path.join(fp, '..'))
    
    if fp == parent:
        break

    fp = parent

**Get the final file/directory name from a path**

In [None]:
books = os.path.abspath('data/books.csv')
print(books)
print(os.path.basename(books))
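
`os.path.basename` has two close relatives that may be useful: `os.path.dirname` returns everything before the final component, and `os.path.splitext` separates the extension. A minimal sketch, assuming the same relative location `data/books.csv` used above (the file does not need to exist for these calls):

```python
import os

# These functions only manipulate the string; they do not check the disk
fp = os.path.join('data', 'books.csv')

print(os.path.basename(fp))   # final component of the path
print(os.path.dirname(fp))    # everything before the final component

# Split the file name into its stem and its extension
stem, ext = os.path.splitext(os.path.basename(fp))
print(stem, ext)
```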

### Second, the `glob` module

> The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell
>
> Source: [Python reference manual](https://docs.python.org/3/library/glob.html)


In [None]:
from glob import glob

**Print the contents of the current working directory**

In [None]:
for fp in glob("*"):
    print(fp)

**Only print the Jupyter Notebook files**

In [None]:
for fp in glob('*.ipynb'):
    print(fp)

**Only print .ipynb files starting with `I`**

In [None]:
for fp in glob('I*.ipynb'):
    print(fp)
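
The patterns above only use `*`. The Unix shell rules that `glob` follows also include `?` (exactly one character) and `[...]` (one character from a set). The sketch below demonstrates them in a throwaway directory; the file names and the helper function `match` are hypothetical and exist only for this illustration.

```python
import os
import tempfile
from glob import glob, escape

# Build a throwaway directory with hypothetical file names for illustration
with tempfile.TemporaryDirectory() as tmp:
    for name in ['a1.txt', 'a2.txt', 'b1.txt', 'a10.txt', 'notes.md']:
        open(os.path.join(tmp, name), 'w').close()

    def match(pattern):
        # escape() protects any special characters in the directory name.
        # Results are sorted because glob gives no ordering guarantee.
        full_pattern = os.path.join(escape(tmp), pattern)
        return sorted(os.path.basename(p) for p in glob(full_pattern))

    star = match('a*.txt')        # * matches any run of characters
    question = match('a?.txt')    # ? matches exactly one character
    bracket = match('[ab]1.txt')  # [ab] matches one character from the set

print(star)
print(question)
print(bracket)
```

Note that `a?.txt` does not match `a10.txt`, because `?` stands for a single character only.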

**Print the contents of the cwd's parent**

In [None]:
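# Pattern matching everything inside the parent of the current working directory
parent_pattern = os.path.join('..', '*')
for fp in glob(parent_pattern):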
    print(f'{fp: <30} i.e., {os.path.abspath(fp)}')
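
The examples so far only look inside a single directory. As a hedged aside, `glob` also accepts `recursive=True`, in which case the pattern `**` matches any number of nested directories (including none). The directory layout below is hypothetical and is created in a temporary location so that nothing in your workspace is touched.

```python
import os
import tempfile
from glob import glob, escape

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: one file at the top level and one in a nested folder
    nested = os.path.join(tmp, 'sub', 'deeper')
    os.makedirs(nested)
    for fp in [os.path.join(tmp, 'top.txt'), os.path.join(nested, 'low.txt')]:
        open(fp, 'w').close()

    base = escape(tmp)

    # Without '**', only the top level of the directory is searched
    flat = sorted(os.path.relpath(p, tmp)
                  for p in glob(os.path.join(base, '*.txt')))

    # With '**' and recursive=True, nested directories are searched too
    deep = sorted(os.path.relpath(p, tmp)
                  for p in glob(os.path.join(base, '**', '*.txt'), recursive=True))

print(flat)
print(deep)
```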

### ✍ Activity 1: Reverse the files

The cell below generates bulk scam letters. Study the code carefully. It uses many features of Python that we have taught you. There may be one or two features that are new, but they should be reasonably straightforward to understand.

In [None]:
# Do not change this cell. Make sure you run it.

import random, csv

# Make a directory to write the files
dir_name = 'exercises-04-activity-01-files'
if not os.path.exists(dir_name):
    os.mkdir(dir_name)
else:
    # empty the directory if it exists
    for fp in glob(os.path.join(dir_name, '*')):
        os.remove(fp)

# How many files?
number_of_files = random.randint(50, 100)
print(f'Generating {number_of_files} files')

# Get some author/recipient names
import urllib.request
urllib.request.urlretrieve(
    'https://raw.githubusercontent.com/hadley/data-baby-names/master/baby-names.csv', 
    'names.csv'
)

names_by_sex = {'boy' : [], 'girl' : []}
with open('names.csv', 'r') as fp:
    names = csv.DictReader(fp)
    for row in names:
        names_by_sex[row['sex']].append(row['name'])
        
names = []
for sex in names_by_sex:
    names += random.sample(names_by_sex[sex], 10)
 
# Some messages
messages = [
    'Thank you for your friendship. Let me send you some money.',
    'I have inherited millions of dollars and I want to share it. What are your bank details?',
    'It has been a long time. What is your date of birth? I will send a gift.',
    'I need to update your identification. Please send a picture of your passport.',
    'I got locked out of my emails. Can I use yours? What is your password?',
    'Your family history is interesting. What is your mothers maiden name?',
    'I am about to send you an email that will verify your account. Please click the link asap.'
]

# Generate messages
for idx in range(number_of_files):
    person_to = random.choice(names)
    person_from = random.choice(names)
    message = random.choice(messages)

    with open(os.path.join(dir_name, f'message-{idx:0>3}.txt'), 'w') as fp:
        fp.write(f'''Dear {person_to},

{message}

Kind regards,
{person_from}
''')

**Use `glob` to extract a list of file names to process**

In [None]:
# Write your code here

**How much mail did each person send?**

Construct a dictionary, where the key is the person's name and the value is a count of the mail they sent.

In [None]:
# Write your code here

**How much mail did each person receive?**

Construct a dictionary, where the key is the person's name and the value is a count of the mail they received.

In [None]:
# Write your code here

**What was the most commonly used scam message?**

In [None]:
# Write your code here