# Files

There are several libraries in Python that can be used for transferring files across the Internet: `urllib` from the standard library, `urllib2` (Python 2 only; in Python 3 its functionality was merged into `urllib.request`), and the third-party `urllib3`. Having all these libraries can be [confusing](https://stackoverflow.com/questions/2018026/what-are-the-differences-between-the-urllib-urllib2-urllib3-and-requests-modul). The library that is probably the easiest to use is the third-party [`requests`](https://2.python-requests.org//en/latest/), which is built on top of `urllib3`. Common to all these libraries is that they use HTTP as the protocol for transferring files.

If you need to parse or build URLs, `urllib.parse` contains some useful functions.
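As a small sketch of `urllib.parse`, the cell below splits the URL of one of the files used in this notebook into its components with `urlparse`, and shows how `urljoin` resolves a file name against a base URL. The `example.com` URLs are placeholders chosen for illustration.

```python
from urllib.parse import urljoin, urlparse

url = 'https://raw.githubusercontent.com/ehmatthes/pcc_2e/master/chapter_10/pi_digits.txt'

# urlparse splits a URL into named components (scheme, netloc, path, ...).
parts = urlparse(url)
print(parts.scheme)  # https
print(parts.netloc)  # raw.githubusercontent.com
print(parts.path)    # /ehmatthes/pcc_2e/master/chapter_10/pi_digits.txt

# urljoin resolves a relative reference against a base URL.
# With a trailing slash on the base, the file name is appended to its path...
with_slash = urljoin('https://example.com/files/', 'pi_digits.txt')
# ...without it, the last path segment of the base is replaced.
without_slash = urljoin('https://example.com/files', 'pi_digits.txt')
print(with_slash)     # https://example.com/files/pi_digits.txt
print(without_slash)  # https://example.com/pi_digits.txt
```

The trailing-slash behaviour is why `urljoin` is easy to misuse for building download URLs from a directory URL.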

In [None]:
import requests

In [None]:
from urllib.parse import urlparse, quote, quote_plus, unquote, unquote_plus

In [None]:
quote_plus('My books: Python Crash Course, 2nd Ed.')

In [None]:
unquote_plus(_)
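`quote` and `quote_plus` differ in how they treat spaces and slashes: `quote` is meant for URL path segments and keeps `/` by default, while `quote_plus` is meant for query-string values (HTML form encoding) and turns spaces into `+`. The sample string below is made up for illustration.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

s = 'chapter 10/pi digits'

path_part = quote(s)        # default safe='/': slash kept, space -> %20
query_part = quote_plus(s)  # default safe='': slash -> %2F, space -> +
print(path_part)   # chapter%2010/pi%20digits
print(query_part)  # chapter+10%2Fpi+digits

# Each function has a matching inverse; unquote leaves '+' untouched.
print(unquote(path_part) == s)        # True
print(unquote_plus(query_part) == s)  # True
print(unquote('a+b'))                 # a+b
```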

In [None]:
import os

assets_url = 'https://raw.githubusercontent.com/ehmatthes/pcc_2e/master/chapter_10'

# Join URL parts with '/' explicitly: os.path.join uses the OS path separator, which is '\' on Windows.
pi_digits_url = f'{assets_url}/pi_digits.txt'
response = requests.get(pi_digits_url)

The `response` object has several attributes, among others `status_code` and `content`. The content is a byte string (`bytes`). If the content is plain text, the decoded string can be accessed through the attribute `text`; the bytes are decoded with the encoding in `response.encoding`, which `requests` determines from the response headers when possible. When the content is a JSON document, the method `json()` parses it into Python objects (for example a `dict` or a `list`).
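To see what `text` and `json()` do without making a network request, the sketch below starts from a made-up byte string standing in for `response.content`. Decoding the bytes gives roughly what `text` returns (UTF-8 is assumed here, whereas `requests` takes the encoding from the response), and `json.loads` on that text gives roughly what `json()` returns.

```python
import json

# Hypothetical response body, as bytes (what response.content holds).
content = b'{"title": "Python Crash Course", "edition": 2, "chapters": [10, 11]}'

# Roughly response.text: the bytes decoded to str (UTF-8 assumed).
text = content.decode('utf-8')

# Roughly response.json(): the JSON text parsed into Python objects.
data = json.loads(text)

print(type(content).__name__, type(text).__name__, type(data).__name__)  # bytes str dict
print(data['edition'])   # 2
print(data['chapters'])  # [10, 11]
```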

In [None]:
response.status_code, response.content

In [None]:
response.text == response.content.decode()

In [None]:
len(response.text)

In [None]:
for line in response.text.split('\n'):
    print(line)

In [None]:
response = requests.get(f'{assets_url}/pi_million_digits.txt')

In [None]:
len(response.content)

We can save the response to a local file:

In [None]:
os.makedirs('assets', exist_ok=True)  # make sure the target directory exists
with open('assets/pi_million_digits.txt', mode='w') as f:
    f.write(response.text)

Then check that the contents of the local file are as expected.

In [None]:
with open('assets/pi_million_digits.txt', mode='r') as f:
    pi_txt = f.read()
    
pi_txt == response.text
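The same save-and-check round trip can be written more compactly with `pathlib.Path.write_text` and `read_text`. This sketch writes to a temporary directory and uses a short made-up string instead of `response.text`, so it runs without the download.

```python
import tempfile
from pathlib import Path

sample_text = '3.1415926535\n8979323846\n'  # made-up stand-in for response.text

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / 'assets' / 'pi_digits.txt'
    path.parent.mkdir(parents=True, exist_ok=True)  # create assets/ if needed
    path.write_text(sample_text, encoding='utf-8')  # open, write and close in one call
    round_trip_ok = path.read_text(encoding='utf-8') == sample_text

print(round_trip_ok)  # True
```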

In [None]:
# Count the frequencies of digits.
from collections import Counter

Counter(pi_txt).most_common()
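`Counter(pi_txt)` counts every character, so the decimal point (and any newline characters in the file) show up among the counts as well. A minimal sketch of counting digits only, on a short stand-in string rather than the full `pi_txt`:

```python
from collections import Counter

pi_sample = '3.141592653589793\n'  # short stand-in for pi_txt

all_chars = Counter(pi_sample)
# Keep only characters that are digits, dropping '.' and '\n'.
digits_only = Counter(ch for ch in pi_sample if ch.isdigit())

print(all_chars['.'], all_chars['\n'])  # 1 1
print(sum(digits_only.values()))        # 16
print(digits_only['3'])                 # 3
```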

Let's create a small multi-line file by taking the first 32 characters of the decimal expansion of π and writing them on four lines with eight characters on each line. To do this, we could take four slices of `pi_txt`, but for illustration purposes, we will create an in-memory file with `pi_txt` as contents using `io.StringIO`.
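A minimal sketch of this approach: the string is wrapped in an `io.StringIO` object and consumed eight characters at a time with `read(8)`. Here `pi_sample` is a short stand-in for `pi_txt` introduced for this example, and writing the lines to a second `io.StringIO` instead of a file on disk is a choice made for illustration.

```python
import io

pi_sample = '3.14159265358979323846264338327950288'  # stand-in for pi_txt

# Treat the string as a file, so it can be read eight characters at a time.
source = io.StringIO(pi_sample)
lines = [source.read(8) for _ in range(4)]  # 4 x 8 = the first 32 characters

# Write the four chunks to a second in-memory file, one chunk per line.
small_file = io.StringIO()
for line in lines:
    small_file.write(line + '\n')

print(small_file.getvalue(), end='')
# 3.141592
# 65358979
# 32384626
# 43383279
small_file.seek(0)  # rewind so the "file" can be read from the start
```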

Let's look at different ways we can read this file into a string with no newlines.

In [None]:
!head -2 assets/pi_million_digits.txt

In [3]:
with open('assets/pi_digits.txt', mode='r') as f:
    pi_txt = ''
    line = f.readline()
    while line:  # readline() returns '' at the end of the file
        pi_txt += line.strip()
        line = f.readline()
pi_txt

'3.141592653589793238462643383279'

The following cell has a syntax error on line 3. In Python (unlike Java), an assignment is a statement, *not* an expression. Python 3.8 introduced *assignment expressions*: an assignment expression uses the 'walrus operator' `:=` to assign a value to a variable, and the assignment is then an expression that evaluates to the assigned value.

In [1]:
with open('assets/pi_million_digits.txt', mode='r') as f:
    pi_txt = ''
    while line = f.readline(): # This is invalid syntax in Python!
        pi_txt += line.strip()
pi_txt[:10] + '...' + pi_txt[-10:]

SyntaxError: invalid syntax (<ipython-input-1-c2f30f2274f3>, line 3)
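From Python 3.8 on, the loop can be written as intended with an assignment expression. In this sketch an `io.StringIO` object holding four lines of eight characters stands in for the file on disk.

```python
import io

# Stand-in for a file opened with open(..., mode='r').
f = io.StringIO('3.141592\n65358979\n32384626\n43383279\n')

pi_txt = ''
# readline() returns '' at the end of the file, which is falsy and ends the loop.
while line := f.readline():
    pi_txt += line.strip()

print(pi_txt)  # 3.141592653589793238462643383279
```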

`f.readlines()` reads all the lines into a list. 

In [None]:
with open('assets/pi_million_digits.txt', mode='r') as f:
    pi_txt = ''.join(line.strip() for line in f.readlines())
'...'.join((pi_txt[:12], pi_txt[-10:]))

The file object `f` is itself an iterator over the lines of the file. Iterating over it directly has the advantage over `f.readlines()` that it does not store all the lines in a list; instead, the lines are read from the file as the iteration progresses.

In [1]:
with open('assets/pi_million_digits.txt', mode='r') as f:
    pi_txt = ''.join(line.strip() for line in f)
'...'.join((pi_txt[:12], pi_txt[-10:]))

'3.1415926535...5779458151'
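Because a file object is its own iterator, `iter(f)` returns `f` itself, and `next(f)` reads just the next line; a later loop then continues where `next` stopped. A short sketch with an in-memory file whose contents are made up:

```python
import io

f = io.StringIO('first\nsecond\nthird\n')

print(iter(f) is f)   # True
first = next(f)       # reads one line
rest = list(f)        # iteration continues after the line read by next()
print(repr(first))    # 'first\n'
print(rest)           # ['second\n', 'third\n']
print(next(f, None))  # None: the iterator is exhausted
```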

In [2]:
with open('assets/pi_million_digits.txt', mode='r') as f:
    head = next(f).strip()
    for line in f: pass
    tail = line.strip()
'...'.join((head[:12], tail[-10:]))

'3.1415926535...5779458151'
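The cell above reads the first line with `next(f)` and then runs through the remaining lines only to keep the last one. `collections.deque` with `maxlen=1` does the same bookkeeping: it consumes the iterator and retains only the final item (both approaches still read the whole file). A sketch with an in-memory stand-in file; falling back to the first line when there is no other line is a choice made here, since the loop above would leave `line` undefined in that case.

```python
import io
from collections import deque

f = io.StringIO('3.1415926535\n8979323846\n2643383279\n')  # stand-in file

head = next(f).strip()
# deque(iterable, maxlen=1) keeps only the last line seen; it is empty if no lines remain.
last = deque(f, maxlen=1)
tail = last[0].strip() if last else head  # single-line file: head is also the tail
print('...'.join((head[:12], tail[-10:])))  # 3.1415926535...2643383279
```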

In [16]:
from datetime import date

today_str = date.today().strftime('%y%m%d')  # today's date as YYMMDD
today_str = '590330'  # fixed YYMMDD value used for the output shown below
print(today_str)
try:
    res = f'{today_str} found at position {pi_txt.index(today_str)}'
except ValueError:
    res = f'{today_str} not found among the first 1000000 digits of pi'
res

590330


'590330 found at position 293014'
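`str.find` is an alternative to `str.index` that returns `-1` instead of raising `ValueError`. Note also that the position reported above is an index into the string, which starts with `'3.'`, not the number of the digit after the decimal point. The sketch below uses a hypothetical helper, `digit_position`, that converts the index into the 1-based digit position, assuming the string starts with `'3.'` and contains no whitespace, as `pi_txt` does after the lines have been stripped and joined.

```python
from typing import Optional


def digit_position(pattern: str, pi_str: str) -> Optional[int]:
    """Hypothetical helper: 1-based position after the decimal point where
    pattern first starts, or None if it does not occur. Assumes pi_str
    starts with '3.' and contains no whitespace."""
    # Search from index 2, so only digits after the decimal point are considered.
    index = pi_str.find(pattern, 2)  # -1 when not found, unlike str.index
    if index == -1:
        return None
    # '3.' occupies indices 0 and 1, so digit n sits at index n + 1.
    return index - 1


pi_sample = '3.141592653589793238462643383279'  # stand-in for pi_txt
print(digit_position('2653', pi_sample))  # 6
print(digit_position('000', pi_sample))   # None
```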