## Exercise 18: Read the Last Line of a File

Video: https://youtu.be/GC5yQg2odqI

Skills:
- use open() with mode 'r' to open the file for reading
    - remember to close the file after reading
- use readlines() to read all lines from the file
    - this returns a list of lines
    - the whole file is held in memory at once, so it costs more memory on large files
    - alternatively, use readline() to read one line at a time
        - while line := f.readline(): last_line = line
- the file object returned by open() supports the iterator protocol, so a for loop can read the file
    - for line in f:
        - pass
    - this reads the file line by line
    - only one line is held in memory at a time
    - after the loop, the loop variable still holds the last line
    - once the loop finishes the file is at EOF, so a later f.readline() returns an empty string
- with open() as f:
    - this is a context manager; it closes the file automatically when the block exits, even if an exception is raised
    - no need to call f.close()
    - this is more elegant and safer
    



In [None]:
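def read_final_line(filename):
    line = ''  # returned unchanged if the file is empty
    # the with block closes the file even if reading raises an exception
    with open(filename, 'r') as f:
        for line in f:
            pass
    return line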

print(read_final_line(r'.\data\login.log'))
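
The three reading strategies from the skills list can be compared side by side. This is a minimal sketch, not the exercise's reference solution: the function names and the sample file are hypothetical, and the `:=` form assumes Python 3.8 or later.

```python
import os
import tempfile

def last_line_readlines(filename):
    # readlines() loads every line into a list, so the whole file sits in memory
    with open(filename) as f:
        lines = f.readlines()
    return lines[-1] if lines else ''

def last_line_readline(filename):
    # readline() returns '' at EOF, which ends the while loop
    last_line = ''
    with open(filename) as f:
        while line := f.readline():
            last_line = line
    return last_line

def last_line_iter(filename):
    # iterating over the file object keeps only one line in memory at a time
    last_line = ''
    with open(filename) as f:
        for last_line in f:
            pass
    return last_line

# hypothetical sample file, created only for this demonstration
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, 'sample.log')
    with open(sample, 'w') as f:
        f.write('first\nsecond\nthird\n')
    results = [fn(sample) for fn in
               (last_line_readlines, last_line_readline, last_line_iter)]

print(results)
```

All three return the same line, including its trailing newline; they differ only in how much of the file is held in memory at once.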

## Exercise 19: Extract Login Account Information

Video: https://youtu.be/zktvhCOJPX0

Skills:
- import pprint to print a dictionary in a more readable layout
    - pprint.pprint(users, sort_dicts=False)
    - sort_dicts=False keeps the dictionary's insertion order (the parameter is available since Python 3.8)
    - sort_dicts=True (the default) sorts the output by key


In [None]:
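def passwd_to_dict(filename):
    users = {}
    with open(filename) as f:
        for line in f:
            # skip blank lines and comments, which lack the colon-separated fields
            if not line.strip() or line.startswith('#'):
                continue
            user_info = line.split(':')
            # map field 0 to field 2 (user name and user ID in the standard passwd layout)
            users[user_info[0]] = user_info[2]
    return users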

print(passwd_to_dict(r'.\data\passwd.cfg'))
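
The effect of `sort_dicts` is easy to see on a small dictionary. The sketch below uses made-up sample data and `pprint.pformat()`, which returns the same text that `pprint.pprint()` would print; the narrow `width` is only there to force one entry per line.

```python
import pprint

# hypothetical sample data, in the order the entries were inserted
users = {'root': '0', 'daemon': '1', 'alice': '1000'}

# sort_dicts=False keeps insertion order; sort_dicts=True sorts by key
insertion_order = pprint.pformat(users, sort_dicts=False, width=20)
sorted_order = pprint.pformat(users, sort_dicts=True, width=20)

print(insertion_order)
print(sorted_order)
```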

## Exercise 20: Count Characters, Words and Lines in a File

Video: https://youtu.be/YCrV2_wO4Cc

In [None]:
def wordcount(filename):
    result = {
        'Characters': 0,
        'Words': 0,
        'Unique words': 0,
        'Lines': 0,
        }
    unique_words = set()
 
    with open(filename, 'r') as f:
        for line in f:
            words = line.split()
            result['Lines'] += 1
            result['Characters'] += len(line)
            result['Words'] += len(words)
            unique_words.update(words)
 
        result['Unique words'] = len(unique_words)
 
    for key, value in result.items():
        print(f'{key}: {value}')

wordcount(r'.\data\text.txt')

## Exercise 21: Find the Longest Word in a File

Skills:
- you can also use set()
    - collect all the words in it
    - sort it by word length and return the last one
        - sorted(set(words), key=len)[-1]
        - without key=len, sorted() orders the words alphabetically, not by length

In [None]:
def find_longest_word(filename):
    longest = ''
    with open(filename, 'r') as f:
        for line in f:
            for word in line.replace('.', '').split():
                if len(word) > len(longest):
                    longest = word
    return longest

print(find_longest_word(r'.\data\text2.txt'))
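
For the set-based alternative, the sort key matters: `sorted()` without a key compares strings alphabetically, so its last element is not necessarily the longest word. A small sketch with a hypothetical word list shows the difference, along with `max(..., key=len)`, which avoids sorting altogether.

```python
# hypothetical word list; 'banana' is the only longest word
words = ['pear', 'fig', 'banana', 'kiwi']

alphabetical_last = sorted(set(words))[-1]        # last word in alphabetical order
longest_sorted = sorted(set(words), key=len)[-1]  # last word when ordered by length
longest_max = max(set(words), key=len)            # same result without sorting

print(alphabetical_last, longest_sorted, longest_max)
# → pear banana banana
```

When several words share the maximum length, which one is returned depends on the set's iteration order, so the loop in `find_longest_word` is the more predictable choice.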

## Exercise 22: Reading and Writing CSV Files

Video: https://youtu.be/bSA1llDhX1I

Skills:
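- delimiter: ":", "\t", ","
- csv.writer() creates a writer object
- csv.reader() creates a reader object that yields each row as a list of strings
- writer.writerow() writes one row to the csv file
- writer.writerows() writes multiple rows to the csv file
- newline characters:
    - Unix and modern macOS: \n
    - classic Mac OS (before OS X): \r
    - Windows: \r\n
- lineterminator defaults to "\r\n" in csv.writer; it can be set to "\n"
- in open(), the newline parameter controls newline translation
    - newline=None (the default): on reading, every line ending is translated to "\n"; on writing, "\n" is translated to the system default (os.linesep)
    - newline="": no translation; line endings are kept as they are, which is what the csv module recommends
    - newline="\n": on writing, "\n" is written as is, without translation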


In [None]:
import csv

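def passwd_to_csv(passwd_filename, csv_filename):
    # newline='' lets the csv module handle line endings itself
    with open(passwd_filename, 'r', newline='') as f_read, \
            open(csv_filename, 'w', newline='') as f_write:
        csv_reader = csv.reader(f_read, delimiter=':')
        csv_writer = csv.writer(f_write, delimiter='\t', lineterminator='\n')
        for line in csv_reader:
            # skip blank lines, comments, and rows with fewer than three fields
            if len(line) > 2 and not line[0].startswith('#'):
                csv_writer.writerow([line[0], line[2]])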

passwd_to_csv(r'.\data\passwd.cfg', r'.\data\passwd.csv')
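
To see what `lineterminator` changes, the writer can target an in-memory buffer instead of a file. This is a sketch with hypothetical sample rows; `io.StringIO` stands in for the output file and performs no newline translation by default, so the buffer shows exactly what the writer emits.

```python
import csv
import io

# hypothetical sample rows
rows = [['root', '0'], ['alice', '1000']]

# default dialect: each row ends with '\r\n'
default_buf = io.StringIO()
csv.writer(default_buf, delimiter='\t').writerows(rows)

# explicit lineterminator: each row ends with '\n'
unix_buf = io.StringIO()
csv.writer(unix_buf, delimiter='\t', lineterminator='\n').writerows(rows)

print(repr(default_buf.getvalue()))
print(repr(unix_buf.getvalue()))
```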

## Exercise 23: Reading JSON Files

Video: https://youtu.be/0jHBF9-V9G0

Skills:
- json.load() parses JSON from an open file object
- json.loads() parses JSON from a string
- defaultdict(list) creates a dictionary whose missing keys default to an empty list

In [None]:
import json
from collections import defaultdict

def print_scores(filename):
    with open(filename) as json_file:
        record = json.load(json_file)
        result = defaultdict(list)
 
        print('Class:', record['class'])
        # use a separate loop name so the loaded record is not overwritten
        for entry in record['score']:
            for subject, score in entry.items():
                result[subject].append(score)
 
        for subject, scores in result.items():
            print('Subject:', subject)
            print('\tHighest score:', max(scores))
            print('\tLowest score:', min(scores))
            print('\tAverage:', sum(scores) / len(scores))

print_scores(r'.\data\score.json')
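
The difference between `json.load()` and `json.loads()`, and the grouping done by `defaultdict(list)`, can be tried without a data file. The JSON text below is hypothetical; its `class` and `score` keys only mirror the shape that `print_scores` reads, and `io.StringIO` plays the role of an open file.

```python
import io
import json
from collections import defaultdict

# hypothetical JSON text shaped like the data print_scores() expects
text = '{"class": "A", "score": [{"math": 90, "art": 70}, {"math": 80, "art": 100}]}'

from_string = json.loads(text)            # parse from a str
from_file = json.load(io.StringIO(text))  # parse from a file-like object

# a missing subject key starts as an empty list, so append() always works
by_subject = defaultdict(list)
for entry in from_string['score']:
    for subject, score in entry.items():
        by_subject[subject].append(score)

print(dict(by_subject))
```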

## Exercise 24: Batch File Reading

Skills:
- os.listdir(): returns the names of all entries in a folder as a list (names only, without the folder path)
- str.endswith() checks the ending of a string
- os.path.join() combines the folder path and the filename
- pathlib can also produce the full path of each file
    - p = pathlib.Path(r'C:\data\scores') creates a WindowsPath object on Windows (a PosixPath object on Unix-like systems)
    - iterating over the entries of the Path object yields full paths, folder included
        - p.iterdir()
- use glob:
    - glob.glob(r'C:\data\scores\*.json') returns the list of matching json file paths

In [None]:
import json
import os
from collections import defaultdict

def print_scores(filename):
    with open(filename) as json_file:
        record = json.load(json_file)
        result = defaultdict(list)
 
        print('Class:', record['class'])
        # use a separate loop name so the loaded record is not overwritten
        for entry in record['score']:
            for subject, score in entry.items():
                result[subject].append(score)
 
        for subject, scores in result.items():
            print('Subject:', subject)
            print('\tHighest score:', max(scores))
            print('\tLowest score:', min(scores))
            print('\tAverage:', sum(scores) / len(scores))

def print_dir_scores(dirname):
    for filename in os.listdir(dirname):
        if filename.endswith('.json'):
            print('Reading file: ', filename)
            print_scores(os.path.join(dirname, filename))

print_dir_scores(r'.\data\scores')
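
The three ways of listing the json files named in the skills list can be sketched together. The helper name `list_json_files` and the temporary folder are hypothetical and exist only for this demonstration.

```python
import glob
import os
import tempfile
from pathlib import Path

def list_json_files(dirname):
    """Return the .json paths in dirname, collected three different ways."""
    # os.listdir() gives bare names, so the folder has to be joined back on
    via_listdir = sorted(
        os.path.join(dirname, name)
        for name in os.listdir(dirname)
        if name.endswith('.json')
    )
    # Path.glob() yields Path objects that already include the folder
    via_pathlib = sorted(str(p) for p in Path(dirname).glob('*.json'))
    # glob.glob() returns matching paths as plain strings
    via_glob = sorted(glob.glob(os.path.join(dirname, '*.json')))
    return via_listdir, via_pathlib, via_glob

# hypothetical folder with two json files and one unrelated file
with tempfile.TemporaryDirectory() as tmp:
    for name in ('a.json', 'b.json', 'notes.txt'):
        Path(tmp, name).write_text('{}')
    via_listdir, via_pathlib, via_glob = list_json_files(tmp)

for paths in (via_listdir, via_pathlib, via_glob):
    print([os.path.basename(p) for p in paths])
```

`os.listdir()` needs the explicit `endswith()` filter and `os.path.join()`, while both glob variants filter by pattern and return paths that can be passed straight to `open()`.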