# Last Time
* Our last session dealt almost exclusively with data-frames
* Make sure you understand how to build a data-frame for part c of the 2nd homework!


# This Time
* Before anything else, we'll take a quick look at loading a text file into list format in Python
* We'll run through the exercises provided on Monday
* We'll look at formatting strings using various data types.
* We'll look at string methods and functionality in more detail


## Reading text files in Python
* There are numerous ways to read files into Python that address a variety of formats and distinct purposes
* We'll look at several during the course of the class. 
* But to begin with, we consider the basic process of reading from an unformatted text file
    * The standard way to open a file is a **`with open`** statement followed by an indented suite; the file is closed automatically when the suite ends
    * Pass the file name as an argument to `open`, followed by `as` _object name_ to assign the opened file to a variable
    * File objects are iterable; by default, iterating yields one line at a time, split at new-line characters (each line keeps its trailing new-line)

In [4]:
wordList = [] #Start with an empty list
with open("Corleone.txt") as fp: #fp is the file object we'll work with
    for line in fp: #iterating over a file object yields one line at a time
        wordList.extend(line.replace('.','').split()) #make sure to use extend here, and not append!
print(wordList)

FileNotFoundError: [Errno 2] No such file or directory: 'Corleone.txt'
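
The cell above fails with `FileNotFoundError` whenever `Corleone.txt` is not in the working directory. A minimal self-contained sketch of the same reading pattern follows; the file name `sample.txt` and its contents are made up for illustration and are not part of the course materials.

```python
import os
import tempfile

# Hypothetical sample text standing in for a file such as Corleone.txt
sample = "I will make him an offer.\nHe cannot refuse.\n"

# Write the sample into a temporary directory so the example runs anywhere
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.txt")
with open(path, "w") as fp:
    fp.write(sample)

word_list = []  # start with an empty list
with open(path) as fp:
    for line in fp:  # each line still ends with its new-line character
        # split() with no arguments discards the new-line and other whitespace
        word_list.extend(line.replace(".", "").split())

print(word_list)
```

Using `extend` adds each word individually; `append` would instead add one nested list per line.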

# Exercise 6.2
Write code to display the unique words in the string `text` (in sorted order) and the number of occurrences of each word. 

In [2]:
#This won't work, but it's close: iterating over a Counter yields only its keys
from collections import Counter
text = ('to be or not to be that is the question')
counter = Counter(text.split())
for word, count in sorted(counter): 
    print(f'{word:<12}{count}')

b           e
i           s


ValueError: too many values to unpack (expected 2)
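
The partial output and the error both come from the same cause: `sorted(counter)` produces a list of words only, so `word, count` tries to unpack each word character by character. Two-letter words happen to unpack, and the first longer word raises `ValueError`. A short sketch on a shortened, made-up sentence:

```python
from collections import Counter

counter = Counter('to be or not to be'.split())

keys_only = sorted(counter)      # a list of words, with no counts
pairs = sorted(counter.items())  # a list of (word, count) tuples

print(keys_only)
print(pairs)

# A two-character string unpacks into its two characters
first, second = 'be'
print(first, second)
```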

In [5]:
#This handles the first issue.
from collections import Counter
text = ('to be or not to be that is the question')
counter = Counter(text.split())
for word, count in sorted(counter.items()): 
    print(f'{word:<12}{count}')

be          2
is          1
not         1
or          1
question    1
that        1
the         1
to          2


# Exercise 6.5
Write a script that uses a dictionary to determine and print the number of duplicate words in a sentence. Treat uppercase and lowercase letters the same and assume there is no punctuation in the sentence. Words with counts larger than 1 have duplicates.

In [9]:
"""Tokenizing a string and counting duplicate words."""

text = ('This is sample text with several words ' + 
        'this is more sample text with some different Words')

word_counts = {}

# count occurrences of each unique word
for word in text.split():
    if word.lower() in word_counts: 
        word_counts[word.lower()] += 1  # update existing key-value pair
    else:
        word_counts[word.lower()] = 1  # insert new key-value pair

print(word_counts)
print(f'{"WORD":<12}COUNT')

for word, count in sorted(word_counts.items()):
    if count > 1:
        print(f'{word:<12}{count}')


{'this': 2, 'is': 2, 'sample': 2, 'text': 2, 'with': 2, 'several': 1, 'words': 2, 'more': 1, 'some': 1, 'different': 1}
WORD        COUNT
is          2
sample      2
text        2
this        2
with        2
words       2


# Exercise 6.6
Write a function that receives a list of words, then determines and displays in alphabetical order only the unique words. Treat uppercase and lowercase letters the same. The function should use a set to get the unique words in the list.

In [14]:
def unique_words(words):
    uniques = {word.lower() for word in words} #set comprehension removes duplicates
    return sorted(uniques)

In [15]:
text = ('this is sample text with several words ' 
        'this is more sample text with some different words')

print(unique_words(text.split()))

['different', 'is', 'more', 'sample', 'several', 'some', 'text', 'this', 'with', 'words']


# Exercise 6.10
Use the following sets:
```python
{'red','green','blue'}
{'cyan','green','blue','magenta','red'}
```
display the results of:
1. Comparing the sets using each of the comparison operators.
2. Combining the sets using each of the mathematical operators.

In [16]:
set1 = {'red', 'green', 'blue'}
set2 = {'cyan', 'green', 'blue', 'magenta', 'red'}

In [17]:
#Comparison operators: proper subset, subset, superset, proper superset, equality, inequality
print(set1 < set2)
print(set1 <= set2)
print(set1 >= set2)
print(set1 > set2)
print(set1 == set2)
print(set1 != set2)

True
True
False
False
False
True


In [18]:
print(set1 | set2)
print(set1 & set2)
print(set1 - set2)
print(set2 - set1)
print(set1 ^ set2)

{'green', 'cyan', 'red', 'blue', 'magenta'}
{'green', 'red', 'blue'}
set()
{'cyan', 'magenta'}
{'cyan', 'magenta'}
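
Each operator used above also has a method counterpart on `set`. The sketch below is one way to check the equivalence; it is not part of the exercise. Because sets are unordered, the results are sorted before printing so the display is stable from run to run.

```python
set1 = {'red', 'green', 'blue'}
set2 = {'cyan', 'green', 'blue', 'magenta', 'red'}

# Method counterparts of two comparison operators
is_subset = set1.issubset(set2)      # same test as set1 <= set2
is_superset = set1.issuperset(set2)  # same test as set1 >= set2
print(is_subset, is_superset)

# Method counterparts of the mathematical operators
union = set1.union(set2)                     # set1 | set2
intersection = set1.intersection(set2)       # set1 & set2
difference = set2.difference(set1)           # set2 - set1
symmetric = set1.symmetric_difference(set2)  # set1 ^ set2

# Sort before printing, since set display order is arbitrary
print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
print(sorted(symmetric))
```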


# Exercise 7.21
In chapter 7, we discussed shallow vs. deep copies of arrays. Python's built-in list and dictionary types have copy methods that perform shallow copies. Using the following dictionary
```python
dictionary = {'Sophia':[97,88]}
```
demonstrate that a dictionary's copy method indeed performs a shallow copy. To do so, call ```copy``` to make the shallow copy, modify the list stored in the original dictionary, then display both dictionaries to see that they have the same contents. 
Next, use the copy module's ```deepcopy``` function to create a *deep* copy of the dictionary. Modify the list stored in the original dictionary, then display both dictionaries to prove that each has its own data. 

In [19]:
dictionary = {'Sophia': [97, 88]}
print(dictionary)


{'Sophia': [97, 88]}


In [20]:
dictionary_copy = dictionary.copy()
dictionary['Sophia'][0] = 100
print(dictionary)
print(dictionary_copy)

{'Sophia': [100, 88]}
{'Sophia': [100, 88]}


In [21]:
from copy import deepcopy
dictionary_deepcopy = deepcopy(dictionary)
dictionary['Sophia'][1] = 100
print(dictionary)
print(dictionary_deepcopy)

{'Sophia': [100, 100]}
{'Sophia': [100, 88]}
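
Another way to see the difference, sketched here with the `is` operator rather than by modifying the data: a shallow copy is a new dictionary whose values are the very same objects, whereas a deep copy gets its own list. The variable names are chosen for this illustration only.

```python
from copy import deepcopy

original = {'Sophia': [97, 88]}
shallow = original.copy()
deep = deepcopy(original)

# The dictionaries themselves are distinct objects in both cases
print(shallow is original)

# The shallow copy shares the inner list; the deep copy does not
shares_list_shallow = shallow['Sophia'] is original['Sophia']
shares_list_deep = deep['Sophia'] is original['Sophia']
print(shares_list_shallow)
print(shares_list_deep)
```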


## Miscellaneous Presentation Types with Formatted Strings
* We previously introduced formatted strings (f-strings, written `f'...'`) as a way to keep code and output clean and coherent
* We'll briefly look at a few additional ways to present varying data types using formatting.

### Decimals
* Decimals can be presented with a particular precision using `.x`, where _x_ is the desired precision; its exact meaning depends on the presentation type that follows
* Use `.xf` for standard floating point representation, where _x_ defines the number of digits after the decimal point.
* Use `.xe` for scientific notation, where _x_ again defines the number of digits after the decimal point (so _x_ + 1 significant digits in total).

In [22]:
myNum = 173.47359
print(f'{myNum:.3f}') #floating point representation
print(f'{myNum:.3e}') #scientific notation

173.474
1.735e+02
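
To see how the precision behaves under each presentation type, the following sketch formats the same value with a few different precisions. In both `f` and `e` formats the precision counts digits after the decimal point.

```python
my_num = 173.47359

fixed_1 = f'{my_num:.1f}'  # one digit after the decimal point
sci_1 = f'{my_num:.1e}'    # one digit after the point, two significant digits
sci_5 = f'{my_num:.5e}'    # five digits after the point, six significant digits

print(fixed_1)
print(sci_1)
print(sci_5)
```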


### Integers
* The default style of presentation for integers is decimal format (`d`)
* Integers can alternatively be presented in binary (`b`), octal (`o`), or hexadecimal (`x`) format. 
* The `c` presentation type formats an integer as the character with that Unicode code point

In [23]:
print(f'{255:d}') #decimal (default) format
print(f'{255:b}') #binary format
print(f'{255:o}') #octal format
print(f'{255:x}') #hexadecimal format


255
11111111
377
ff


In [24]:
print(f'{65:c}') #Capital A 
print(f'{97:c}') #Lowercase a
print(f'{937:c}') #Capital Omega

A
a
Ω
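
The `c` presentation type gives the same result as the built-in `chr` function, and `ord` goes the other way. A brief sketch of that relationship:

```python
code_point = 937

as_char = f'{code_point:c}'  # format the integer as a character
print(as_char == chr(code_point))

# ord() recovers the code point from the character
print(ord(as_char))

# The same code point in hexadecimal, as Unicode charts usually list it
as_hex = f'{code_point:x}'
print(as_hex)
```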


### Strings
* If `s` is specified explicitly, the value to format **must be a string**, an **expression that produces a string** or a **string literal**. 
* If you do not specify a presentation type, non-string values are converted to basic string representations.

In [29]:
print(f'{"hello":s} {7}')

hello 7


In [None]:
print(f'{7:s}') #raises ValueError because 7 is an int, not a string
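
Applying `s` to an integer raises a `ValueError`. The sketch below catches the error so the cell runs to completion, then shows two ways to convert the value to a string first.

```python
try:
    result = f'{7:s}'  # 's' is not a valid presentation type for an int
except ValueError as err:
    result = f'ValueError: {err}'
print(result)

# Convert explicitly with str(), or with the !s conversion flag
converted = f'{str(7):s}'
flagged = f'{7!s:s}'
print(converted, flagged)
```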

## Field Widths and Alignment for Numbers (Default Behavior)
* Python **right-aligns numbers** and **left-aligns other values**.
* With the `f` presentation type, Python formats `float` values with **six digits after the decimal point** by default.

In [30]:
print(f'[{27:20d}]')

[                  27]


In [31]:
print(f'[{3.5:20f}]')

[            3.500000]


In [32]:
print(f'[{"hello":20}]')

[hello               ]


### Explicitly Specifying Left and Right Alignment in a Field 
* Can specify left alignment for any formatted string type using `<` 
* Similarly, can specify right alignment for any type using `>`

In [33]:
print(f'[{27:<15d}]')

[27             ]


In [34]:
print(f'[{3.5:<15f}]')

[3.500000       ]


In [35]:
print(f'[{"hello":>15}]')

[          hello]
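
Field widths and alignment are most useful when several values are lined up in columns. A small sketch, using made-up data, that left-aligns a name column and right-aligns two numeric columns:

```python
# Hypothetical rows: (name, quantity, unit price)
rows = [('apples', 3, 0.5), ('kiwis', 12, 0.25)]

lines = []
for name, qty, price in rows:
    # name: left-aligned in 10 columns; numbers: right-aligned
    lines.append(f'{name:<10}{qty:>5d}{price:>8.2f}')

for line in lines:
    print(line)
```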


### Centering a Value in a Field 
* Formatted strings can be centered using `^` in lieu of `<` or `>`
* Centering attempts to spread the remaining unoccupied character positions equally to the left and right of the formatted value
* Python places the extra space to the right if an odd number of character positions remain

In [36]:
print(f'[{27:^7d}]')

[  27   ]


In [38]:
print(f'[{3.5:^10f}]')

[ 3.500000 ]


In [None]:
print(f'[{"hello":^9}]')
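
The sketch below compares the two centering cases directly: an even number of leftover positions splits equally, and an odd number leaves the extra space on the right.

```python
even_left_over = f'[{"hello":^9}]'   # 4 spare positions: 2 left, 2 right
odd_left_over = f'[{"hello":^10}]'   # 5 spare positions: 2 left, 3 right

print(even_left_over)
print(odd_left_over)
```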

### Other Numeric Formatting
* A `+` before the field width specifies that a positive number should be preceded by a `+`
* To fill the remaining characters of the field with `0`s rather than spaces, place a `0` before the field width (and **after** the `+` if there is one)
* Use a space in place of the `+` to put a space before positive numbers, so that they occupy the same width as their negative counterparts
* Format numbers with **thousands separators** by using a **comma (`,`)**

In [39]:
print(f'[{27:+10d}]')

[       +27]


In [40]:
print(f'[{27:+010d}]')

[+000000027]


In [41]:
print(f'[{-27:+010d}]')

[-000000027]


In [42]:
print(f'[{27: 010d}]')

[ 000000027]


In [43]:
print(f'{12345678:,d}')

12,345,678
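
These options can be combined in a single format specifier. A short sketch with illustrative values:

```python
# A space flag keeps positive and negative values the same width
positive = f'[{27: d}]'
negative = f'[{-27: d}]'
print(positive)
print(negative)

# Sign and thousands separator together
signed_grouped = f'{12345678:+,d}'
print(signed_grouped)

# Thousands separator with a fixed number of decimal places
grouped_float = f'{1234567.891:,.2f}'
print(grouped_float)
```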
