## `pathlib` module

In [1]:
import pathlib

In [3]:
here = pathlib.Path('.')
here

PosixPath('.')

In [4]:
here.absolute()

PosixPath('/Users/alejandrosanz/Downloads')

In [5]:
here.resolve()

PosixPath('/Users/alejandrosanz/Downloads')
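`absolute()` and `resolve()` print the same path here, but they are not interchangeable. `absolute()` only makes a path absolute and performs no normalization or symlink resolution. `resolve()` also collapses `..` segments and follows symlinks. The sketch below shows the difference inside a throwaway temporary directory. The `sub` folder and `file.txt` name are made up for illustration.

```python
import pathlib
import tempfile

# Work inside a temporary directory so the example does not depend on your machine.
with tempfile.TemporaryDirectory() as tmp:
    # resolve() the base too, since temp dirs can sit behind symlinks (e.g. on macOS).
    base = pathlib.Path(tmp).resolve()
    (base / 'sub').mkdir()

    p = base / 'sub' / '..' / 'file.txt'   # hypothetical path containing '..'

    # absolute(): no normalization, so the '..' segment is kept.
    unresolved = p.absolute()
    # resolve(): '..' is collapsed (and symlinks would be followed).
    resolved = p.resolve()

    print(unresolved)  # ends with sub/../file.txt
    print(resolved)    # ends with /file.txt
```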

In [7]:
here.resolve().parent

PosixPath('/Users/alejandrosanz')

In [12]:
here.resolve() / 'projects_on_Github' / 'POC'

PosixPath('/Users/alejandrosanz/Downloads/projects_on_Github/POC')
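Besides joining with `/`, a `Path` exposes its components as attributes. The following sketch uses a hypothetical path and `PurePosixPath`, which never touches the filesystem. That keeps the printed results the same on any operating system.

```python
import pathlib

# A hypothetical path, used only to show how pathlib splits a path into parts.
p = pathlib.PurePosixPath('/Users/someone/projects/POC') / 'data' / 'queries.txt'

print(p)                      # /Users/someone/projects/POC/data/queries.txt
print(p.name)                 # queries.txt  (final component)
print(p.stem)                 # queries      (name without suffix)
print(p.suffix)               # .txt         (file extension)
print(p.parent)               # /Users/someone/projects/POC/data
print(p.with_suffix('.csv'))  # /Users/someone/projects/POC/data/queries.csv
```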

## Plain Text

In [19]:
import os
os.chdir('projects_on_GitHub/POC/python_basics_and_intermediates/FIle_IO')

In [30]:
with open('queries.txt', 'r') as f:
    contents = f.read()

# splitlines() avoids an empty last element when the file ends with a newline
queries = contents.splitlines()
norm = [i.strip().lower() for i in queries]
norm

['python programmer', 'udacity', 'web developer']

In [25]:
with open('normalized_queries.txt', 'w') as outfile:
    for query in norm:
        outfile.write(query + '\n')

In [33]:
# writelines() takes an iterable of strings and does not add newlines itself
with open('normalized_queries1.txt', 'w') as outfile:
    outfile.writelines(query + '\n' for query in norm)
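The same read, normalize, and write round trip can also be written with `pathlib`'s `read_text()` and `write_text()`. The sketch below runs in a temporary directory. The input text is made up and only mimics the kind of messy queries shown above.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / 'queries.txt'
    dst = pathlib.Path(tmp) / 'normalized_queries.txt'

    # Hypothetical input: mixed casing, stray spaces, and a trailing newline.
    src.write_text('  Python Programmer\nUdacity \nWeb Developer\n')

    # splitlines() yields no empty string for the trailing newline.
    norm = [line.strip().lower() for line in src.read_text().splitlines()]

    # Join with '\n' and add a final newline so the file ends cleanly.
    dst.write_text('\n'.join(norm) + '\n')
    written = dst.read_text()

print(norm)  # ['python programmer', 'udacity', 'web developer']
```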

### Exercise: Count unique words

In this exercise, write a function `count_unique_words` that prints the ten most common unique words in a text file, along with their counts.
```
def count_unique_words(filename):
    ...
```
Concretely, we'll be using `hamlet.txt`, a text file containing the full text of "The Tragedy of Hamlet, Prince of Denmark" released by Project Gutenberg under their license.

We won't worry too much about punctuation, capitalization, or other nuances of language. For this exercise, it's safe to say that, given a line of text from a text file, the "words" within that line are the elements that result when you split the line on any whitespace.

**Hint:** This will be significantly easier if you use a data type from Python's built-in `collections` module - `collections.Counter`. You can read more about `collections.Counter` [in the Python documentation](https://docs.python.org/3/library/collections.html#collections.Counter).
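Before working through the whole file, it helps to see how `Counter` behaves on a couple of short, made-up lines. `update()` adds one count per element of the iterable. Missing keys read as zero. `most_common(n)` returns the `n` highest counts, with ties kept in the order first encountered.

```python
from collections import Counter

words = Counter()
# update() with an iterable of words adds 1 per occurrence.
words.update('to be or not to be'.split())
words.update('that is the question'.split())

print(words['to'])           # 2
print(words['hamlet'])       # 0 (missing keys count as zero, no KeyError)
print(words.most_common(2))  # [('to', 2), ('be', 2)]
```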

In [54]:
from collections import Counter

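def count_unique_words(filename):
    words = Counter()
    with open(filename, 'r') as f:
        for line in f:
            # split() with no argument splits on any run of whitespace
            words.update(line.split())

    # print the ten most frequent words with their counts
    for word, count in words.most_common(10):
        print(word, count)

    return words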

result = count_unique_words('hamlet.txt')

the 1109
and 763
of 735
to 673
I 514
a 499
in 455
my 443
you 423
HAMLET. 359


## `.json` File

In [81]:
import json

In [60]:
# demo #1 -- how to read in-memory, JSON-shaped data (a list of dicts)
tmp = [
  {
    "class": "Iris-setosa",
    "petallength": 1.4,
    "petalwidth": 0.2,
    "sepallength": 5.1,
    "sepalwidth": 3.5
  },
  {
    "class": "Iris-versicolor",
    "petallength": 4.7,
    "petalwidth": 1.4,
    "sepallength": 7,
    "sepalwidth": 3.2
  },
  {
    "class": "Iris-virginica",
    "petallength": 6,
    "petalwidth": 2.5,
    "sepallength": 6.3,
    "sepalwidth": 3.3
  }
]

In [61]:
import pandas as pd

# load the list of dicts above into a DataFrame (one row per dict)
pd.DataFrame(tmp)

|   | class | petallength | petalwidth | sepallength | sepalwidth |
|---|---|---|---|---|---|
| 0 | Iris-setosa | 1.4 | 0.2 | 5.1 | 3.5 |
| 1 | Iris-versicolor | 4.7 | 1.4 | 7.0 | 3.2 |
| 2 | Iris-virginica | 6.0 | 2.5 | 6.3 | 3.3 |
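Strictly speaking, `tmp` is already a Python list of dicts, not JSON text. The `json` module converts between the two: `json.dumps()` turns Python objects into a JSON string, and `json.loads()` parses it back. The single record below is a made-up example shaped like one entry of `tmp`. The extra `available` and `notes` keys are there only to show how `True` and `None` are serialized.

```python
import json

# A small in-memory record; 'available' and 'notes' are illustrative additions.
record = {"class": "Iris-setosa", "petallength": 1.4, "available": True, "notes": None}

# dumps(): Python object -> JSON-formatted string (True -> true, None -> null).
text = json.dumps(record)
print(text)  # {"class": "Iris-setosa", "petallength": 1.4, "available": true, "notes": null}

# loads(): JSON string -> Python object.
back = json.loads(text)
print(back == record)  # True
```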


In [85]:
# demo #2 -- how to read a JSON file
with open('top.json') as f:
    content = json.load(f)

In [91]:
content['data']['children'][1]['data']['title']

'Thanks, Obama.'

In [90]:
titles = [i['data']['title'] for i in content['data']['children']]
titles

['Guardians of the Front Page',
 'Thanks, Obama.',
 'Take your time, you got this',
 'Blizzard Employees Staged a Walkout After the Company Banned a Gamer for Pro-Hong Kong Views',
 'This is Shelia Fredrick, a flight attendant. She noticed a terrified girl accompanied by an older man. She left a note in the bathroom on which the victim wrote that she needed help. The police was alerted &amp; the girl was saved from a human trafficker. We should honor our heroes.',
 'DEMOCRACY NOW',
 'I got a cease and desist for making the Crocs Gloves',
 'Printers',
 'I drew all the boys together and i did it for the internet',
 '1 dad reflex 2 children',
 'The dog is supposed to run up in front of her and sit.',
 'Guy Naruto Runs Past News Anchor for Storm Area 51',
 'Tear gas canisters filmed raining in Hong Kong - against all regulations, while police deny firing from height',
 'Protestor in Hong Kong today',
 'This image of Xi Jiping as Winnie the Pooh is illegal in mainland China',
 'Irish man le
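Chained indexing such as `content['data']['children'][1]['data']['title']` raises a `KeyError` as soon as one level is missing. A more defensive variant uses `dict.get()` with defaults. The sketch below runs on a tiny hand-written stand-in for `top.json`. It keeps only the nesting used above, and the titles and the `<untitled>` placeholder are hypothetical.

```python
# Hypothetical stand-in for top.json with the same nesting:
# data -> children -> [i] -> data -> title
sample = {
    "data": {
        "children": [
            {"data": {"title": "First post"}},
            {"data": {}},                       # a child with no title
            {"data": {"title": "Third post"}},
        ]
    }
}

# .get() with a default avoids a KeyError when a key is missing.
sample_titles = [
    child.get("data", {}).get("title", "<untitled>")
    for child in sample.get("data", {}).get("children", [])
]
print(sample_titles)  # ['First post', '<untitled>', 'Third post']
```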

### Warm-up
Suppose that we have the file `listings.json` of job listings:
```json
[
    {
        "name": "Udacity",
        "role": 100,
        "description": "A stellar Python instructor is needed for a new course!",
        "available": true
    },
    {
        "name": "Udacity",
        "role": 404,
        "description": "A quality assistance engineer who can start immediately.",
        "available": false
    }
]
```
Write a program that keeps only the available jobs and writes them to a new file.

In [99]:
# Extract data into Python
with open('listings.json', 'r') as infile:
    contents = json.load(infile)  # Parse JSON data into a Python object.

# Filter out all unavailable job listings.
available = [job for job in contents if job["available"]]
# Write available listings to an output file.
with open('filter-listings.json', 'w') as outfile:
    json.dump(available, outfile, indent=2)
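The same extract, filter, and write steps can be wrapped in a function and checked with a round trip: write the two warm-up listings to a temporary file, filter them, then read the output back. `filter_available` is a hypothetical helper name, not part of any library. It uses `job.get("available")` so that a listing without the flag is treated as unavailable rather than raising a `KeyError`.

```python
import json
import pathlib
import tempfile


def filter_available(in_path, out_path):
    """Hypothetical helper: keep only listings whose "available" flag is true."""
    with open(in_path, 'r') as infile:
        listings = json.load(infile)             # JSON array -> list of dicts
    available = [job for job in listings if job.get("available")]
    with open(out_path, 'w') as outfile:
        json.dump(available, outfile, indent=2)  # list of dicts -> JSON array
    return available


with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / 'listings.json'
    dst = pathlib.Path(tmp) / 'filter-listings.json'
    # The same two listings as in the warm-up.
    src.write_text(json.dumps([
        {"name": "Udacity", "role": 100,
         "description": "A stellar Python instructor is needed for a new course!",
         "available": True},
        {"name": "Udacity", "role": 404,
         "description": "A quality assistance engineer who can start immediately.",
         "available": False},
    ]))
    kept = filter_available(src, dst)
    reread = json.loads(dst.read_text())

print([job["role"] for job in kept])  # [100]
```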

### Exercise: Nobel Prizes

## `.csv` File

In [100]:
import csv

In [101]:
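high_wages = []
wage_filter = 40000
# the csv module docs recommend newline='' for files used with csv.reader/csv.writer
with open('wage.csv', 'r', newline='') as infile:
    wages = csv.reader(infile)
    next(wages)  # skip the header row (it is not written back out)
    for wage in wages:
        if int(wage[2]) >= wage_filter:  # column 2 is annual_wage
            high_wages.append(wage)

with open('wage_filtered.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerows(high_wages)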
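With a plain `csv.reader` the header can still be kept: save the row returned by `next()` and write it out first. The sketch below uses `io.StringIO` in place of `wage.csv` and `wage_filtered.csv`. The rows are made up, with columns assumed to be `id, title, annual_wage` as in the output further down.

```python
import csv
import io

# Made-up CSV text standing in for wage.csv.
infile = io.StringIO(
    "id,title,annual_wage\n"
    "100,Intern,20000\n"
    "200,Salesperson,40000\n"
    "500,Backend Engineer,50000\n"
)
outfile = io.StringIO()

reader = csv.reader(infile)
writer = csv.writer(outfile)

header = next(reader)    # keep the header row instead of discarding it
writer.writerow(header)  # ...and write it back out first
writer.writerows(row for row in reader if int(row[2]) >= 40000)

print(outfile.getvalue())
```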

### Use `csv.DictReader()` and `csv.DictWriter()` to keep the header

The general pattern is the same as before. `csv.DictReader` reads the header row automatically and returns each row as a dictionary keyed by column name. `csv.DictWriter` needs the field names up front, and the one extra step is calling `writeheader()` before writing the rows.

1. Extract data from a CSV file into Python
    - Open a file-like object `f`
    - Create a `csv.reader` (or a `csv.DictReader`, which also uses the header row)
    - Iterate over the rows of the reader

2. Do something with the data, now within Python

3. Write data from Python to a file
    - Open a file-like object `f`
    - Create a `csv.writer` (or a `csv.DictWriter`, which also takes the header field names)
    - Write each row with the writer

<img src='working_with_csv.png'>
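The three steps can be sketched as one small function. `filter_rows` is a hypothetical helper, and `io.StringIO` stands in for real files with made-up rows. One design choice here is taking the field names from `reader.fieldnames` rather than from the first kept row, so the header is still written when no row passes the filter.

```python
import csv
import io


def filter_rows(infile, outfile, keep):
    """Hypothetical helper following the three steps above."""
    # 1. Extract: DictReader uses the first row as field names.
    reader = csv.DictReader(infile)
    rows = list(reader)
    # 2. Transform: keep only the rows the predicate accepts.
    kept = [row for row in rows if keep(row)]
    # 3. Load: reuse the reader's field names so the header is always written.
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(kept)
    return kept


src = io.StringIO("id,title,annual_wage\n100,Intern,20000\n500,Backend Engineer,50000\n")
dst = io.StringIO()
kept_rows = filter_rows(src, dst, lambda row: int(row['annual_wage']) >= 40000)
print(dst.getvalue())
```

With real files, pass `open('wage.csv', newline='')` and `open('wage_filtered2.csv', 'w', newline='')` in place of the `StringIO` objects.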

In [108]:
high_wages = []
wage_filter = 40000

with open('wage.csv', 'r', newline='') as infile:
    wages = csv.DictReader(infile)  # the header row becomes the dict keys
    for wage in wages:
        # values are read as strings, so convert before comparing
        if int(wage['annual_wage']) >= wage_filter:
            high_wages.append(wage)

high_wages

[OrderedDict([('id', '200'),
              ('title', 'Salesperson'),
              ('annual_wage', '40000')]),
 OrderedDict([('id', '500'),
              ('title', 'Backend Engineer'),
              ('annual_wage', '50000')]),
 OrderedDict([('id', '512'),
              ('title', 'Product Lead, Eng'),
              ('annual_wage', '80000')]),
 OrderedDict([('id', '999'),
              ('title', 'Accountant'),
              ('annual_wage', '60000')])]

In [111]:
# write out, including the header row
with open('wage_filtered2.csv', 'w', newline='') as outfile:
    writer = csv.DictWriter(outfile, fieldnames=high_wages[0].keys())
    writer.writeheader()
    writer.writerows(high_wages)