## COMP SCI 1MD3 Introduction to Programming, Winter 2018
### Douglas Stebila (Instructor), Joey Legere, Karl Knopf, Natalie Chin, Victor Chen (TAs)
### Lab 8 Assigned Saturday, March 10, Due Friday March 16, 5pm
### Maximum grade: 20 / 16

The purpose of this lab is to:
* Work with external files
* Learn about csv and JSON data
* Work with network / remote files

#### Practice Question 1: Reading and Writing Plaintext Files

One of the most common ways to represent data is as *plain text*. This is different from rich text, as used by programs like Word or your web browser, which can be styled in some way, such as **bold**, *italicized*, or <font color='red'>coloured</font>. Plain text data is just that: plain text, with no styling.

Write a function `copyfile(infile, outfile)` that, given two file names, `infile` and `outfile`, reads the contents of `infile`, prints them on the screen, and then writes them into `outfile`.
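As a reminder of the basic file idiom, here is a minimal sketch of writing a text file and reading it back. The file name `demo.txt` and the temporary directory are made up for the demo and are not part of the lab files.

```python
import os
import tempfile

# Hypothetical file, created in a throwaway directory just for this demo.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.txt")

# "w" mode creates the file (or overwrites it) and accepts str data.
with open(demo_path, "w") as f:
    f.write("first line\nsecond line\n")

# "r" mode reads the text back; the with-block closes the file for us.
with open(demo_path, "r") as f:
    contents = f.read()

print(contents)
```

Using `with` means the file is closed even if an error occurs partway through.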

In [None]:
# Your code here


The cell below should print out the contents of the file if you've read it correctly:

In [None]:
copyfile("practice1.txt", "practice1_out.txt")

The cell below is a Jupyter magic command that tries to open `practice1_out.txt` and display it. If you wrote your file correctly, this should display the same as above:

In [None]:
%cat practice1_out.txt

#### Practice Question 2: Working with CSV files

A comma-separated value (csv) file is a very common way to represent a spreadsheet in a plain text file. In a csv file, each value is separated by a comma, and each row is separated by a newline character. For instance, the following table:

|name|age|salary|
|-|-|-|
|Odie|23|19000|
|Garfield|25|22000|
|John|22|21000|

is not meant to be a commentary on the economy at all. It is simply an example of a table that could be represented in a csv file like so:
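Assuming the file mirrors the table row for row (this listing is reconstructed from the table, not copied from `practice2.csv`), its plain-text contents would be:

```
name,age,salary
Odie,23,19000
Garfield,25,22000
John,22,21000
```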

Write a function `getrow(infile, n)` that reads a .csv file, `infile`, and prints out the column labels and the `n`th row of `infile`. For instance, in the above example, calling `getrow(infile, 1)` would print out:
```
['name', 'age', 'salary']
['Odie', '23', '19000']
```

You may wish to use Python's built-in <a href="https://docs.python.org/3/library/csv.html">csv library</a> for this task.
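To see what the csv library hands back, here is a small sketch that parses csv text held in memory. The string `csv_text` is an assumption that mirrors the table above; `io.StringIO` stands in for an open file so the example does not depend on any lab file.

```python
import csv
import io

# In-memory stand-in for an open file; the text mirrors the table above.
csv_text = "name,age,salary\nOdie,23,19000\nGarfield,25,22000\nJohn,22,21000\n"

# csv.reader yields one list of strings per line of input.
rows = list(csv.reader(io.StringIO(csv_text)))

header = rows[0]          # the column labels
first_data_row = rows[1]  # the first row after the labels

print(header)
print(first_data_row)
```

Note that every value comes back as a string, including the numbers. When reading from a real file, the csv documentation recommends opening it with `newline=''`.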

In [None]:
# Your code here.


In [None]:
getrow('practice2.csv', 1)

This should produce the output
```
['name', 'age', 'salary']
['Odie', '23', '19000']
```

#### Practice Question 3: Network Files and JSON

JavaScript Object Notation (JSON) is another common way of representing more structured data. In most cases, you can treat JSON data as a list of Python dictionaries.
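A minimal sketch of that correspondence, using a made-up JSON string (the names and values are illustrative only): `json.loads` turns a JSON array of objects into a Python list of dictionaries.

```python
import json

# Hypothetical JSON text: an array containing two objects.
json_text = '[{"name": "Odie", "age": 23}, {"name": "Garfield", "age": 25}]'

# json.loads parses a str; arrays become lists and objects become dicts.
people = json.loads(json_text)

print(type(people).__name__, type(people[0]).__name__)
print(people[1]["name"])
```

Unlike csv values, JSON numbers arrive as Python numbers rather than strings.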

Often a JSON object will have many fields for each entry. Consider the example below, which organizes a few books on JavaScript:

In [None]:
BOOKS = \
[
    {
        "title": "Professional JavaScript",
        "authors": [
            "Nicholas C. Zakas"
        ],
        "edition": 3,
        "year": 2011
    },
    {
        "title": "Professional JavaScript",
        "authors": [
            "Nicholas C. Zakas"
        ],
        "edition": 2,
        "year": 2009
    },
    {
        "title": "Professional Ajax",
        "authors": [
            "Nicholas C. Zakas",
            "Jeremy McPeak",
            "Joe Fawcett"
        ],
        "edition": 2,
        "year": 2008
    }
]

Getting the first author of a book in the list isn't too challenging:

In [None]:
# Each element of BOOKS is a dictionary; its "authors" value is a list of names.
for i in range(len(BOOKS)):
    print(BOOKS[i]["authors"][0])

Write a function `remotejson(url, fields)` that reads a JSON object from a remote URL and prints the value of each of the given fields for every entry in it. Assume we are only interested in the top-level fields: you only need to index `OBJ[i][field]`, not any subfields it might contain. If more than one field is given, print their values side by side.

You may wish to separate your output with an extra newline between outputs in order to make it easier to read.

Note: Use the built-in `urllib` library to read the file from the remote server, and `json` to parse it.
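One way the two libraries can fit together is sketched below. The helper name `fetch_json` and the sample payload are assumptions for illustration; the demo uses a `data:` URL, which carries its payload inside the URL itself, so it runs without a network connection. An `http://` address would be passed to `urlopen` in the same way.

```python
import json
import urllib.parse
import urllib.request

def fetch_json(url):
    """Download `url` and parse the response body as JSON (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        # The response body is bytes, so decode it before parsing.
        return json.loads(response.read().decode("utf-8"))

# Made-up payload, percent-encoded and embedded in a data: URL for an offline demo.
payload = json.dumps([{"Title": "Example Song", "Artist": "Example Artist"}])
demo_url = "data:application/json," + urllib.parse.quote(payload)

records = fetch_json(demo_url)
print(records[0]["Title"])
```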

In [None]:
# Your code here.


The cell below, when run correctly, should print a list of some of Dr. Stebila's favourite songs, and where to listen to them.

In [None]:
remotejson("http://brain.mcmaster.ca/joey/dr_stebilas_songs.json", ['Title', "Artist", "URL"])

*[Comment from Dr. Stebila: Joey only guessed 2 of my favourite songs.]*

#### Question 1a: I know words [3 points]

During his presidential campaign in 2016, Donald Trump said “I know words. I have the best words.”

Let's explore this idea in an objective, scientific manner.

Write a function `tweets_per_day(jsonurl)` that, given the website address (URL) of a JSON file, counts the number of tweets that are sent on every day of the week. Return your result as a dictionary, mapping the day of the week to the total number of tweets sent on that day.

For this question, you may want to open the file `http://brain.mcmaster.ca/joey/trump_tweets.json` and manually inspect the fields to help parse the data.
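If the timestamps turn out to be stored as text, `datetime.strptime` can convert them into a day name. The sketch below assumes Twitter's classic `created_at` layout; that format, the helper name `count_by_weekday`, and the sample timestamps are all assumptions, so check the real file before relying on any of them.

```python
from datetime import datetime

# ASSUMPTION: timestamps look like "Mon Jan 01 12:00:00 +0000 2018".
# Inspect the real file to confirm its field names and date format.
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def count_by_weekday(timestamps):
    """Map each weekday name to how many of the timestamps fall on it."""
    counts = {}
    for stamp in timestamps:
        # "%A" formats the parsed date as a full weekday name.
        day = datetime.strptime(stamp, TWITTER_FORMAT).strftime("%A")
        counts[day] = counts.get(day, 0) + 1
    return counts

# Made-up timestamps for the demo.
sample = [
    "Mon Jan 01 12:00:00 +0000 2018",
    "Tue Jan 02 08:30:00 +0000 2018",
    "Mon Jan 08 23:59:59 +0000 2018",
]
print(count_by_weekday(sample))
```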
    

In [None]:
import json
import urllib.request
# Write your answer here.


In [None]:
tweets_per_day('http://brain.mcmaster.ca/joey/trump_tweets.json') == {
 'Friday': 33,
 'Monday': 19,
 'Saturday': 35,
 'Sunday': 26,
 'Thursday': 30,
 'Tuesday': 28,
 'Wednesday': 29
}

In [None]:
# Hidden test for Q1a

In [None]:
# Hidden test for Q1a

In [None]:
# Hidden test for Q1a

#### Question 1b: The Best Words [4 points]

Write a function `count_occurences(jsonurl, word)` that counts the number of times that `word` occurs across all of the given tweets. Matching should be case-insensitive.

Match whole words only, not substrings. For example, if `word = "dont"`, then "dont", "DONT", and "dOnT" would all count, but "perioDONTal" should not. Remove all punctuation from the text of each tweet before matching, so that "don't" is correctly counted for `word = "dont"`.
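The matching rules can be tried out on a single string before touching the tweet data. This is only a sketch under the assumption that words are separated by whitespace; the helper name `count_word` and the sample sentence are made up.

```python
from string import punctuation

# With three arguments, str.maketrans builds a table that deletes every
# character listed in the third argument.
STRIP_PUNCTUATION = str.maketrans("", "", punctuation)

def count_word(text, word):
    """Count whole-word, case-insensitive matches of `word` in `text`."""
    words = text.translate(STRIP_PUNCTUATION).lower().split()
    return words.count(word.lower())

sample = "Don't stop. DONT! A perioDONTal exam, dOnT you think?"
print(count_word(sample, "dont"))
```

Because the text is split into a list of words before counting, "periodontal" is compared as a whole and does not match. Note that `string.punctuation` covers ASCII symbols only, so curly quotes would survive this step.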

In [None]:
from string import punctuation
print("Remove all of these symbols from the tweets: ", punctuation)

In [None]:
import json
import urllib.request
from string import punctuation
# Write your answer here


In [None]:
count_occurences('http://brain.mcmaster.ca/joey/trump_tweets.json', 'crooked') == 5

In [None]:
count_occurences('http://brain.mcmaster.ca/joey/trump_tweets.json', 'fake') == 9

In [None]:
count_occurences('http://brain.mcmaster.ca/joey/trump_tweets.json', 'obama') == 18

In [None]:
count_occurences('http://brain.mcmaster.ca/joey/trump_tweets.json', 'wall') == 8

In [None]:
# Hidden test for Q1b

In [None]:
# Hidden test for Q1b

In [None]:
# Hidden test for Q1b

In [None]:
# Hidden test for Q1b

#### Question 1c: The Most Words. The Best Words. [3 points]

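Write a function `words_per_day(jsonurl, word)` that counts the number of times `word` is tweeted on each day of the week. Return the result as a dictionary mapping each day of the week to the count for that day; as the examples below show, days on which `word` never appears map to 0.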
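One way to make sure that every day appears in the result, even with a count of zero, is to start from a dictionary that already contains all seven days. A minimal sketch, with `empty_week` as a hypothetical helper name:

```python
import calendar

def empty_week():
    """Return a dict with every weekday name mapped to 0."""
    # calendar.day_name holds the weekday names, Monday through Sunday.
    return dict.fromkeys(calendar.day_name, 0)

counts = empty_week()
counts["Sunday"] += 1  # record one occurrence on a Sunday
print(counts)
```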

In [None]:
import json
import urllib.request
from string import punctuation

# Write your answer here.

In [None]:
words_per_day('http://brain.mcmaster.ca/joey/trump_tweets.json', 'america') == {
 'Friday': 1,
 'Monday': 3,
 'Saturday': 0,
 'Sunday': 2,
 'Thursday': 2,
 'Tuesday': 1,
 'Wednesday': 2
}

In [None]:
words_per_day('http://brain.mcmaster.ca/joey/trump_tweets.json', 'china') == {
 'Friday': 0,
 'Monday': 0,
 'Saturday': 0,
 'Sunday': 1,
 'Thursday': 1,
 'Tuesday': 0,
 'Wednesday': 0
}

In [None]:
words_per_day('http://brain.mcmaster.ca/joey/trump_tweets.json', 'great') == {
 'Friday': 8,
 'Monday': 4,
 'Saturday': 1,
 'Sunday': 8,
 'Thursday': 8,
 'Tuesday': 8,
 'Wednesday': 9
}

In [None]:
words_per_day('http://brain.mcmaster.ca/joey/trump_tweets.json', 'hillary') == {
 'Friday': 2,
 'Monday': 0,
 'Saturday': 0,
 'Sunday': 1,
 'Thursday': 0,
 'Tuesday': 1,
 'Wednesday': 0
}

In [None]:
# Hidden test for Q1c

In [None]:
# Hidden test for Q1c

In [None]:
# Hidden test for Q1c

#### Question 2: Stocks (Again) [6 points]

Recall the `stocks` question from lab 4:

> Let's assume I'm playing the stock market - buy low, sell high. I'm a day trader, so I want to get in and out of a stock before the day is done, and I want to time my trades so that I make the biggest gain possible. And obviously I can't buy in the future and sell in the past.
>
> Write a function stocks(s) that emits the two trades in chronological order - what you think I should buy at and sell at - in order to maximize my profit.

Repeat this question, but load your data from a .csv file on a remote server. The .csv file will be formatted like so:
```
Date,Open,High,Low,Close,Adj Close,Volume
2013-12-24,16295.700195,16360.599609,16295.700195,16357.549805,16357.549805,33640000
2013-12-26,16370.969727,16483.000000,16370.969727,16479.880859,16479.880859,50160000
2013-12-27,16486.369141,16529.009766,16461.230469,16478.410156,16478.410156,47230000
2013-12-30,16484.509766,16504.349609,16476.869141,16504.289063,16504.289063,54220000
```

Assume that you buy at the value in the "Low" column and sell at the value in the "High" column. You may not buy and sell on the same day. Your function, `stocks(s)`, should take a string containing the filename of a CSV file that holds the data. You may assume the file is in chronological order.
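The buy and sell rules above can be sketched on the four sample rows from the listing. This is one possible approach, not the required one: the helper names `read_lows_highs` and `best_trade` are made up, the data is read from an in-memory string rather than a file, and the search is a single pass that remembers the lowest "Low" seen on any earlier day.

```python
import csv
import io

# Sample rows copied from the listing above; a real solution would open a file.
SAMPLE = """Date,Open,High,Low,Close,Adj Close,Volume
2013-12-24,16295.700195,16360.599609,16295.700195,16357.549805,16357.549805,33640000
2013-12-26,16370.969727,16483.000000,16370.969727,16479.880859,16479.880859,50160000
2013-12-27,16486.369141,16529.009766,16461.230469,16478.410156,16478.410156,47230000
2013-12-30,16484.509766,16504.349609,16476.869141,16504.289063,16504.289063,54220000
"""

def read_lows_highs(handle):
    """Return the Low and High columns as two lists of floats."""
    lows, highs = [], []
    for row in csv.DictReader(handle):  # each row is a dict keyed by column label
        lows.append(float(row["Low"]))
        highs.append(float(row["High"]))
    return lows, highs

def best_trade(lows, highs):
    """Return [buy, sell] with the largest gain, selling on a strictly later day.

    Returns [None, None] if there are fewer than two days of data.
    """
    best = [None, None]
    best_profit = float("-inf")
    if not lows:
        return best
    min_low = lows[0]  # cheapest buy price among the earlier days
    for day in range(1, len(lows)):
        profit = highs[day] - min_low
        if profit > best_profit:
            best_profit = profit
            best = [min_low, highs[day]]
        # Only now may today's low be used as a buy price for later days.
        min_low = min(min_low, lows[day])
    return best

lows, highs = read_lows_highs(io.StringIO(SAMPLE))
trade = best_trade(lows, highs)
print(trade)
```

Updating `min_low` only after the sale on that day has been considered is what enforces the rule against buying and selling on the same day.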

In [None]:
# Write your answer here.


This cell should print the values of the trades, which should be `[15340.69043, 26616.710938]`.

In [None]:
trades = stocks("DJI.csv")
print(trades)

The cell below is how the autograder will be evaluating your answer. It only needs to be correct to two decimal places. Make sure you return a list of floating point numbers.

In [None]:
assert round(trades[0], 2) == 15340.69
assert round(trades[1], 2) == 26616.71

In [None]:
# Hidden test for Q2

In [None]:
# Hidden test for Q2

#### Bonus: Working with Pillow (4 points)

The Python Imaging Library (PIL or Pillow) is a library for loading and manipulating images in Python. The example below creates an image that is black in the top left, becomes more red as you move to the right, and more green as you move towards the bottom.

In [None]:
from PIL import Image

# PIL accesses images in Cartesian coordinates, so indexing is pixels[column, row]
img = Image.new('RGB', (250, 250), "black")  # create a new black image
pixels = img.load()  # create the pixel map

for i in range(img.size[0]):    # for every column
    for j in range(img.size[1]):    # for every row
        pixels[i,j] = (i, j, 0) # set the colour accordingly

img.show()

`img.show()` will try to show you the image, but it does not work on JupyterHub.  You will have to run this code on your local installation of Jupyter to work on your solution to this problem.  

Even on your local installation of Jupyter, you may still have to install Pillow first. You can install it on your computer by running one of the following commands in your command-line terminal:

- `pip3 install pillow`
- `python3 -m pip install pillow`
- `sudo pip3 install pillow`

Each pixel is represented using an RGB value--that is, how much red, green, and blue there is in each pixel. The line:

`pixels[i,j] = (i, j, 0) # set the colour accordingly`

accomplishes this by setting the pixel at coordinates (i, j) to have a red value of i (its x coordinate), a green value of j (its y coordinate), and a blue value of 0. Indexing works such that (0, 0) is at the top left corner.

This question will be manually graded. You may choose one of the following ideas, or implement your own idea using Pillow (provided it is sufficiently technically challenging and, most importantly, not copied from Stack Overflow).

* Write a function that takes in the path to an image, and darkens or lightens it by a set amount.
* Write a function that generates an image of your choosing (not the one from the example).
* Write a function that generates your absolute best sketch of Donald Trump, programmatically.

If you use any outside sources for this question, add links to them in your comments.

In [None]:
# Write your answer here.