# Important Things About Python
* basic types (scalars) vs. container types
  * scalars: int, float, bool, (string)
  * containers: string, list, dict
* mutable vs. immutable objects
  * immutable: string
  * mutable: list, dict
* everything is an object
  * it lives in memory at some address
* Guido van Rossum
  * raison d'être for Python: string/text/file manipulation
* "batteries included"
* built-in functions do not change objects
  * if you want to change an object, you must call a method on that object
  * not all methods change the objects they are applied to
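
A small sketch of the last few bullets, using made-up sample data (`nums` and `word` are just illustrative names). It is meant only to show the difference between a built-in function, a method that changes its object, and a method that does not:

```python
nums = [3, 1, 2]

# A built-in function returns a new object and leaves its argument alone.
ordered = sorted(nums)
print(nums)     # [3, 1, 2]
print(ordered)  # [1, 2, 3]

# A list method can change the list in place (lists are mutable).
nums.append(0)
print(nums)     # [3, 1, 2, 0]

# A string method cannot change the string (strings are immutable),
# so it returns a new string instead.
word = 'python'
shout = word.upper()
print(word)     # python
print(shout)    # PYTHON
```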

# Important things about learning
* know when to zoom in / zoom out

# Pythonic
* using the Python idioms and programming in a way that is natural to Python programmers
  * converting from one type to another is very Pythonic
  * using negative indexing to access last few items in a container
  * "chaining" function calls
  * the "in" operator
  * [:n] means "the first n items"
  * [-n:] means "the last n items"
  * prefer "for thing in container" to indexing
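
Here is one way the idioms in this list might look together. The sample strings are invented for illustration:

```python
letters = list('Python')          # type conversion: str -> list

print(letters[-1])                # last item: n
print(letters[:2])                # the first 2 items: ['P', 'y']
print(letters[-2:])               # the last 2 items: ['o', 'n']
print('y' in letters)             # membership test: True

# Chaining: each call works on the result of the previous one.
words = '  Fig APPLE pear '.strip().lower().split()
print(words)                      # ['fig', 'apple', 'pear']

# Loop over the items themselves rather than over index numbers.
for word in words:
    print(word)
```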
 

# Programming Thoughts
* DRY = Don't Repeat Yourself
* Hal Abelson: "Programs are written for others to read, and only incidentally for computers to execute."
* Eagleson's Law: "Code that was written more than 6 months ago might as well have been written by someone else."
* DWS says "Efficiency doesn't matter until it matters, and it rarely matters."
* Donald Knuth: "Premature optimization is the root of all evil (or at least most of it)..."
* you read code 10x as often as you write code

In [1]:
string = 'Python'

In [2]:
string

'Python'

In [3]:
string

'Python'

In [4]:
id(string)

140169614559280

In [11]:
x = 4
y = 6
print(x + y) # this is executed by Python
print('ok') # this is executed in interactive mode

10
ok


In [13]:
import keyword
print(keyword.kwlist)

['False', 'None', 'True', '__peg_parser__', 'and', 'as', 'assert', 'async', 'await', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'nonlocal', 'not', 'or', 'pass', 'raise', 'return', 'try', 'while', 'with', 'yield']


In [14]:
str(True)

'True'

In [16]:
int(True)

1
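
Since `int(True)` is 1 (and `int(False)` is 0), booleans can be counted by adding them up. A quick sketch with an invented `answers` list, plus the conversion in the other direction:

```python
answers = [True, False, True, True]

# True converts to 1 and False to 0, so sum() counts the True values.
print(sum(answers))         # 3

# Converting the other way: zero and empty containers are False.
print(bool(0), bool(7))     # False True
print(bool(''), bool('a'))  # False True
print(bool([]), bool([0]))  # False True
```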

In [17]:
id(y)

140169613339088

In [18]:
id(True)

4309390640

In [21]:
print(1, 2, 3, end='...')
print(4, 5, 6)

1 2 3...4 5 6


In [22]:
name = 'Dave'

In [23]:
print('Hi, my name is', name)

Hi, my name is Dave


In [24]:
name

'Dave'

In [25]:
x, y = 1, -2

In [1]:
import random

In [2]:
id(random)

140192700113680

In [3]:
id(len)

140192967532848

In [4]:
id(str)

4333744928

In [5]:
type(random)

module

In [6]:
dir(random)

['BPF',
 'LOG4',
 'NV_MAGICCONST',
 'RECIP_BPF',
 'Random',
 'SG_MAGICCONST',
 'SystemRandom',
 'TWOPI',
 '_Sequence',
 '_Set',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_accumulate',
 '_acos',
 '_bisect',
 '_ceil',
 '_cos',
 '_e',
 '_exp',
 '_floor',
 '_inst',
 '_log',
 '_os',
 '_pi',
 '_random',
 '_repeat',
 '_sha512',
 '_sin',
 '_sqrt',
 '_test',
 '_test_generator',
 '_urandom',
 '_warn',
 'betavariate',
 'choice',
 'choices',
 'expovariate',
 'gammavariate',
 'gauss',
 'getrandbits',
 'getstate',
 'lognormvariate',
 'normalvariate',
 'paretovariate',
 'randbytes',
 'randint',
 'random',
 'randrange',
 'sample',
 'seed',
 'setstate',
 'shuffle',
 'triangular',
 'uniform',
 'vonmisesvariate',
 'weibullvariate']

In [7]:
help(random.randint)

Help on method randint in module random:

randint(a, b) method of random.Random instance
    Return random integer in range [a, b], including both end points.
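
The "including both end points" part is worth checking, because `range()` and slicing both exclude the stop value. A rough experiment follows; the seed and the number of rolls are arbitrary choices, and `randrange()` is included only for contrast:

```python
import random

random.seed(42)  # arbitrary seed, only so that reruns give the same numbers

# randint(1, 6) can return 1, 6, and everything in between.
rolls = []
for _ in range(1000):
    rolls.append(random.randint(1, 6))
print(min(rolls), max(rolls))   # smallest and largest roll: 1 6

# randrange(1, 6) follows the range() convention and never returns 6.
picks = []
for _ in range(1000):
    picks.append(random.randrange(1, 6))
print(min(picks), max(picks))   # 1 5
```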



In [8]:
help(random)

Help on module random:

NAME
    random - Random variable generators.

MODULE REFERENCE
    https://docs.python.org/3.9/library/random
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
        bytes
        -----
               uniform bytes (values between 0 and 255)
    
        integers
        --------
               uniform within range
    
        sequences
        ---------
               pick random element
               pick random sample
               pick weighted random sample
               generate random permutation
    
        distributions on the real line:
        ------------------------------
               uniform
               triangular
               normal (Gaussian)
      

In [10]:
random?

Type:        module
String form: <module 'random' from '/Users/dave-wadestein/opt/anaconda3/lib/python3.9/random.py'>
File:        ~/opt/anaconda3/lib/python3.9/random.py
Docstring:  
Random variable generators.

    bytes
    -----
           uniform bytes (values between 0 and 255)

    integers
    --------
           uniform within range

    sequences
    ---------
           pick random element
           pick random sample
           pick weighted random sample
           generate random permutation

    distributions on the real line:
    ------------------------------
           uniform
           triangular
           normal (Gaussian)
           lognormal
           negative exponential
           gamma
           beta
           pareto
           Weibull

    distributions on the circle (angles 0 to 2pi)
    ---------------------------------------------
           circular uniform
           von Mises

General notes on the underly

In [11]:
import math

In [12]:
math?

Type:        module
String form: <module 'math' from '/Users/dave-wadestein/opt/anaconda3/lib/python3.9/lib-dynload/math.cpython-39-darwin.so'>
File:        ~/opt/anaconda3/lib/python3.9/lib-dynload/math.cpython-39-darwin.so
Docstring:  
This module provides access to the mathematical functions
defined by the C standard.


In [13]:
help(math)

Help on module math:

NAME
    math

MODULE REFERENCE
    https://docs.python.org/3.9/library/math
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    This module provides access to the mathematical functions
    defined by the C standard.

FUNCTIONS
    acos(x, /)
        Return the arc cosine (measured in radians) of x.
        
        The result is between 0 and pi.
    
    acosh(x, /)
        Return the inverse hyperbolic cosine of x.
    
    asin(x, /)
        Return the arc sine (measured in radians) of x.
        
        The result is between -pi/2 and pi/2.
    
    asinh(x, /)
        Return the inverse hyperbolic sine of x.
    
    atan(x, /)
        Return the arc tangent (measured in 

## Quick Lab: Loops/Strings
* have the user enter a string, then loop through the string to generate (or print) a new string in which every character is duplicated, e.g., "Python" => "PPyytthhoonn"

In [17]:
string = input('Enter something: ')
for letter in string:
    print(letter * 2, end='')

Enter something:  Python


PPyytthhoonn

In [18]:
# or...
string2 = ''
for letter in string:
    string2 += letter * 2
print(string2)

PPyytthhoonn


In [19]:
# or ...
for letter in string:
    print(letter + letter, end='')

PPyytthhoonn

In [20]:
for letter in string:
    print(letter, letter, sep='', end='')

PPyytthhoonn

## Lab: Loops
* Loop through the numbers from 2 to 25 and print which ones are prime; print each number that is not prime as a product of two factors
* Remember that prime = no divisors other than 1 and itself
* Don't worry about efficiency, but if you're interested, check out math.sqrt()
* example output:
<pre>
2 is a prime number
3 is a prime number
4 equals 2 * 2
5 is a prime number
6 equals 2 * 3
7 is a prime number
8 equals 2 * 4
9 equals 3 * 3
10 equals 2 * 5
11 is a prime number
12 equals 2 * 6
13 is a prime number
14 equals 2 * 7
15 equals 3 * 5
16 equals 2 * 8
17 is a prime number
18 equals 2 * 9
19 is a prime number
20 equals 2 * 10
21 equals 3 * 7
22 equals 2 * 11
23 is a prime number
24 equals 2 * 12
25 equals 5 * 5
</pre>

In [None]:
# logic test first
# are we checking the right numbers?
for num in range(2, 26): # 2..25
    print(num, end=': ')
    for divisor in range(2, num): # no need to subtract 1 here
        print(divisor, end=' ')
    print()

In [None]:
# looks good, now do the work
for num in range(2, 26): # 2..25, as in the logic test
    print(num, end=' ')
    for divisor in range(2, num):
        if num % divisor == 0: # divides in evenly
            print('equals', divisor, '*', num // divisor)
            break # no need to check further divisors
    else: # good use of else on for loop...
        print('is a prime number')
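
For anyone curious about the square root hint in the lab: if a number has a factor pair, one of the two factors is no bigger than its square root, so the inner loop can stop there. The sketch below uses `math.isqrt()` (the integer square root, available since Python 3.8) in place of `math.sqrt()`, and collects the primes in a list named `primes` purely so the result is easy to inspect. Both are my own choices, not part of the lab:

```python
import math

primes = []

for num in range(2, 26):
    # Only try divisors up to the integer square root of num.
    for divisor in range(2, math.isqrt(num) + 1):
        if num % divisor == 0:
            print(num, 'equals', divisor, '*', num // divisor)
            break
    else:  # the inner loop never hit break, so no divisor was found
        primes.append(num)
        print(num, 'is a prime number')

print(primes)  # [2, 3, 5, 7, 11, 13, 17, 19, 23]
```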

In [15]:
for num in range(2, 26):
    print(num)

2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25


In [None]:
# The key to programming
# 1. break down the problem into a series of steps that you do in your head to solve the problem
# for each number from 2 to 25
#.   for each possible divisor from 2 up to that number
#.        if possible divisor divides in with no remainder, then it's a factor (not prime)
#.        stop looking for more factors
#.   if no possible divisor divides in evenly, then ... it's PRIME

# 2. and then convert each step to Python/Java/etc.

In [16]:
13 % 4

1

In [21]:
print(1)

1


In [22]:
print('string')

string


In [23]:
print(True)

True


In [30]:

print(1)
print(2)
print(3)

1
2
3


In [28]:
s

'\nprint(1)\nprint(2)\n'

In [31]:
'string'.upper()

'STRING'

In [32]:
s = 'python'

In [33]:
s[0] = 'P'

TypeError: 'str' object does not support item assignment

In [34]:
s = 'Python'
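
Rebinding the name to a whole new string works, but usually we want to keep most of the old string. Because strings are immutable, the way to "change" one is to build a new string from pieces of the old one. A few possible ways, shown as a sketch:

```python
s = 'python'

# Slicing and concatenation build a brand-new string...
capitalized = 'P' + s[1:]
print(capitalized)              # Python

# ...and so do string methods such as replace() and capitalize().
print(s.replace('p', 'P', 1))   # Python
print(s.capitalize())           # Python

print(s)                        # python (the original is untouched)
```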

## Lab: String Functions
* write a Python program which prompts the user for a string and a stride (increment), and alternately makes the string upper case and lower case, stride characters at a time, e.g.,
![alt-text](images/uplow.png "uplow")


In [58]:
string = 'abcdefghijklmnopqrstuvwxyz'
stride = 4

In [37]:
# for letter in string: # hold this thought

In [39]:
len(string) // stride + 1 # Daniel's idea, break the string into chunks of the appropriate size

7

In [40]:
string[0:4]

'abcd'

In [41]:
string[4:8]

'efgh'

In [43]:
string[8:12]

'ijkl'

In [44]:
string[20:24]

'uvwx'

In [50]:
string[24:28]

'yz'

In [46]:
string[10:2]

''
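
The last two cells show something handy: a slice never raises an error, even when it runs past the end of the string or its start is beyond its stop. A single index is stricter. A short sketch of the difference:

```python
string = 'abcdefghijklmnopqrstuvwxyz'   # 26 letters

print(string[24:28])     # yz (the slice just stops at the end)
print(string[100:200])   # empty string, but no error
print(string[10:2])      # start is past stop, so empty again

try:
    string[100]          # a single index, unlike a slice, must exist
except IndexError as err:
    print('IndexError:', err)
```

This is why the chunking loop that follows can use `index + stride` without checking whether it falls off the end.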

In [52]:
for index in range(0, len(string), stride):
    print(index, ':', index + stride)

0 : 4
4 : 8
8 : 12
12 : 16
16 : 20
20 : 24
24 : 28


In [61]:
times = 0

for index in range(0, len(string), stride):
    if times % 2 == 0:
        print(string[index:index + stride].upper(), end='')
    else:
        print(string[index:index + stride].lower(), end='')
    times += 1 # no ++

ABCDefghIJKLmnopQRSTuvwxYZ

In [59]:
'abcd'.upper()

'ABCD'

In [67]:
do_upper = True

for index in range(0, len(string), stride):
    if do_upper: # should we convert this slice to upper case?
        print(string[index:index + stride].upper(), end='')
        do_upper = False
    else:
        print(string[index:index + stride].lower(), end='')
        do_upper = True

ABCDefghIJKLmnopQRSTuvwxYZ

In [63]:
do_upper = True

for index in range(0, len(string), stride):
    if do_upper: # should we convert this slice to upper case?
        print(string[index:index + stride].upper(), end='')
    else:
        print(string[index:index + stride].lower(), end='')
    do_upper = not do_upper

ABCDefghIJKLmnopQRSTuvwxYZ
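
If we wanted to reuse this logic, one option is to wrap it in a function that returns the new string instead of printing it. The name `uplow()` is made up here (borrowed from the image name in the lab), and collecting the pieces in a list and joining them at the end is just one reasonable approach:

```python
def uplow(text, stride):
    """Alternate upper and lower case, stride characters at a time."""
    pieces = []
    do_upper = True
    for index in range(0, len(text), stride):
        chunk = text[index:index + stride]
        if do_upper:
            pieces.append(chunk.upper())
        else:
            pieces.append(chunk.lower())
        do_upper = not do_upper
    return ''.join(pieces)

print(uplow('abcdefghijklmnopqrstuvwxyz', 4))  # ABCDefghIJKLmnopQRSTuvwxYZ
print(uplow('Python', 1))                      # PyThOn
```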

In [68]:
words = 'apple fig pear'.split()

In [69]:
words

['apple', 'fig', 'pear']

In [70]:
['apple', 'fig', 'pear'].join(' ')

AttributeError: 'list' object has no attribute 'join'
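
The error above is a common surprise: `join()` is a *string* method, not a list method. We call it on the separator and pass it the list. A quick sketch:

```python
words = ['apple', 'fig', 'pear']

print(' '.join(words))    # apple fig pear
print(', '.join(words))   # apple, fig, pear
print(''.join(words))     # applefigpear

# split() and join() undo each other for text separated by single spaces.
print(' '.join('apple fig pear'.split()) == 'apple fig pear')  # True
```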

In [71]:
'hello'.split()

['hello']

In [72]:
list('hello')

['h', 'e', 'l', 'l', 'o']

In [73]:
list(range(1, 10))

[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [74]:
list(range(1, 10, 2))

[1, 3, 5, 7, 9]

In [75]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [76]:
nums = list(range(10))

In [77]:
nums

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [78]:
nums[1]

1

In [79]:
nums

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [80]:
nums.pop(1)

1

In [81]:
nums

[0, 2, 3, 4, 5, 6, 7, 8, 9]

In [83]:
for times in range(10): # either this is 1) counting from 0..9, or 2) do something 10 times
    print(times)

0
1
2
3
4
5
6
7
8
9


In [85]:
for _ in range(10): # can only mean "do this 10 times"
    print('hi!')

hi!
hi!
hi!
hi!
hi!
hi!
hi!
hi!
hi!
hi!


In [86]:
fruits = 'fig apple pear banana'.split() # instead of [ 'fig', .... ]

In [87]:
fruits

['fig', 'apple', 'pear', 'banana']

In [89]:
fruits = sorted(fruits) # mistake no. 1

In [90]:
fruits # works, but we did extra work (and moved the list in memory) ... we should have used fruits.sort()

['apple', 'banana', 'fig', 'pear']

In [91]:
fruits = fruits.sort() # mistake no. 2

In [92]:
help(list.sort)

Help on method_descriptor:

sort(self, /, *, key=None, reverse=False)
    Sort the list in ascending order and return None.
    
    The sort is in-place (i.e. the list itself is modified) and stable (i.e. the
    order of two equal elements is maintained).
    
    If a key function is given, apply it once to each list item and sort them,
    ascending or descending, according to their function values.
    
    The reverse flag can be set to sort in descending order.



In [93]:
print(fruits)

None
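
To recap the two mistakes side by side, here is a sketch that starts over with a fresh list. The extra name `same_list` is there only so we can see whether we still have the original list object:

```python
fruits = 'fig apple pear banana'.split()
same_list = fruits          # a second name for the same list object

result = fruits.sort()      # sorts in place...
print(result)               # None (...and returns nothing useful)
print(fruits)               # ['apple', 'banana', 'fig', 'pear']
print(same_list is fruits)  # True: still the one original list

fruits = sorted(fruits, reverse=True)  # builds a new list, then rebinds the name
print(fruits)               # ['pear', 'fig', 'banana', 'apple']
print(same_list is fruits)  # False: fruits now names a different list
```

So the rule of thumb is: call `fruits.sort()` on a line by itself, or assign the result of `sorted(fruits)`, but don't mix the two.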


## Quick Lab: Lists
* Write a Python program to read in a list of items possibly containing duplicates, and then construct a new list which contains the elements from the original list, with the order preserved, but the duplicates removed
![alt-text](images/list2.png "list2")

In [115]:
words = input('Enter a list of items: ').lower().split() # chained method calls == Pythonic
new_words = [] # start w/an empty list

for word in words: # for each word in the original list...
    if word not in new_words: # if word is not already in the new list... (we mentioned in, but didn't mention 'not in')
        new_words.append(word) # append it to the new list

#print(' '.join(new_words))
print(*new_words)

Enter a list of items:  apple fig apple fig lemon


apple fig lemon
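
A shorter alternative, offered as a sketch rather than as the "right" answer: dict keys are unique, and since Python 3.7 a dict remembers the order in which keys were added, so `dict.fromkeys()` can do the de-duplication for us. (A `set` would also drop duplicates, but it would not preserve the order.) The input is hard-coded here instead of read with `input()`:

```python
words = 'apple fig apple fig lemon'.split()

# Build a dict whose keys are the words (values default to None),
# then turn the keys back into a list.
new_words = list(dict.fromkeys(words))
print(*new_words)    # apple fig lemon
```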


In [97]:
# read until the user enters quit

In [102]:
prompt = 'What? ' 
response = input(prompt)
responses = []

while response != 'quit':
    print('process', response)
    responses.append(response)
    response = input(prompt)
    
print(responses)

What?  quit


[]


In [104]:
response = ''
while response != 'quit':
    response = input(prompt)
    if response == 'quit':
        break
    print('process', response)

What?  one


process one


What?  two


process two


What?  quit


In [105]:
while True:
    response = input(prompt)
    if response == 'quit':
        break
    print('process', response)

What?  one


process one


What?  two


process two


What?  quit


In [106]:
import sys
sys.version

'3.9.12 (main, Apr  5 2022, 01:53:17) \n[Clang 12.0.0 ]'

In [110]:
# in Python 3.8, we got "assignment expressions" (the "walrus" operator)
while (response := input(prompt)) != 'quit':
    print('process', response)

What?  one


process one


What?  to


process to


What?  quit


In [116]:
sorted('autozone')

['a', 'e', 'n', 'o', 'o', 't', 'u', 'z']

In [117]:
list('autozone')

['a', 'u', 't', 'o', 'z', 'o', 'n', 'e']

In [118]:
sorted([1, -3, 4, 2, 9, -5])

[-5, -3, 1, 2, 4, 9]

In [119]:
sorted({'three': 3, 'four': 4, 'five': 8 })

['five', 'four', 'three']

In [120]:
d = {}

In [121]:
d['Python'] = 1991

In [122]:
d['Java'] = 1995

In [123]:
d['Golang'] = 2009

In [124]:
d

{'Python': 1991, 'Java': 1995, 'Golang': 2009}

## Lab: dictionary
* use a dict to translate Roman numerals into their Arabic equivalents
1. load the dict with Roman numerals M (1000), D (500), C (100), L (50), X (10), V (5), I (1)
2. read in a Roman numeral
3. print Arabic equivalent
4. try it with MCLX = 1000 + 100 + 50 + 10 = 1160
4. __If you have time, deal with the case where a smaller number precedes a larger number, e.g., XC = 100 - 10 = 90, or MCM = 1000 + (1000-100) = 1900__
4. __MCMXCIX = 1999__

In [125]:
roman_to_arabic = {
    'M': 1000,
    'D': 500,
    'C': 100,
    'L': 50,
    'X': 10,
    'V': 5,
    'I': 1,
}

In [126]:
roman_to_arabic['C']

100

In [144]:
roman = input('Enter a Roman numeral: ')

Enter a Roman numeral:  MCMXCIX


In [145]:
total = 0

for digit in roman:
    total += roman_to_arabic[digit]
    
print(total)

2221


In [146]:
# one way to solve the general problem
# pass 1: put the Arabic values in a list
# MCMXCIX
# [ 1000, 100, 1000, 10, 100, 1, 10 ]
# pass 2:
# for each Arabic value in the list:
#.   if value is < its neighbor (to the right), then
#.      make it negative
#
# [ 1000, -100, 1000, -10, 100, -1, 10 ]
# sum up the list = 1999

In [147]:
arabic_vals = []

# pass 1
for digit in roman:
    arabic_vals.append(roman_to_arabic[digit])
    
print(arabic_vals)

[1000, 100, 1000, 10, 100, 1, 10]


In [148]:
# pass 2
for index in range(len(roman) - 1): # don't fall off the end!
    if arabic_vals[index] < arabic_vals[index + 1]:
        arabic_vals[index] = -arabic_vals[index]
        
print(arabic_vals)

[1000, -100, 1000, -10, 100, -1, 10]


In [149]:
# final summation
sum(arabic_vals)

1999
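
The two passes can also be folded into one loop that looks one place to the right as it goes. The sketch below wraps that in a function; `roman_to_int()` and `ROMAN_TO_ARABIC` are names invented for this example, and there is no checking for invalid numerals:

```python
ROMAN_TO_ARABIC = {'M': 1000, 'D': 500, 'C': 100, 'L': 50, 'X': 10, 'V': 5, 'I': 1}

def roman_to_int(roman):
    """Convert a Roman numeral such as 'MCMXCIX' to an int (no validation)."""
    total = 0
    for index, digit in enumerate(roman):
        value = ROMAN_TO_ARABIC[digit]
        # A larger neighbor to the right means this value is subtracted.
        if index + 1 < len(roman) and value < ROMAN_TO_ARABIC[roman[index + 1]]:
            total -= value
        else:
            total += value
    return total

print(roman_to_int('MCLX'))     # 1160
print(roman_to_int('XC'))       # 90
print(roman_to_int('MCM'))      # 1900
print(roman_to_int('MCMXCIX'))  # 1999
```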

In [134]:
d = { 'foo': 'bar' }

In [135]:
d['foo']

'bar'

In [136]:
d['foot']

KeyError: 'foot'

In [137]:
if 'foot' in d:
    d['foot']

In [138]:
'foo' in d

True

In [139]:
'bar' in d

False

In [140]:
'bar' in d.values()

True
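
Besides testing with `in` before indexing, the dict method `get()` gives us a lookup that never raises `KeyError`. A brief sketch (the `'n/a'` default is an arbitrary choice):

```python
d = {'foo': 'bar'}

print(d.get('foo'))            # bar
print(d.get('foot'))           # None (no KeyError)
print(d.get('foot', 'n/a'))    # n/a (a default of our choosing)

# "in" checks the keys; to check the values we have to ask for them.
print('bar' in d)              # False
print('bar' in d.values())     # True
```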

## Quick Lab: File I/O
* write a Python program which prompts the user for a filename, then opens that file and writes the contents of the file to a new file, in reverse order, i.e.,

<pre><b>
    Original file       Reversed file
    Line 1              Line 4
    Line 2              Line 3
    Line 3              Line 2
    Line 4              Line 1
</b></pre>

In [142]:
# for the above, I would recommend using .readlines()
# which reads the entire file at once

In [150]:
filename = input('Reverse what file? ')

with open(filename) as infile:
    lines = infile.readlines() # read all lines of the file into lines list

with open(filename + '.rev', 'w') as outfile:
    # write the lines out in reverse order; each line already ends with
    # its own '\n', and writelines() does not add an extra one at the end
    outfile.writelines(lines[::-1])

Reverse what file?  poem.txt
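
To try the idea without needing poem.txt, here is a self-contained sketch that first creates a small throwaway file in a temporary folder. The file name `sample.txt` and its contents are made up, and `writelines()` with `reversed()` is just one possible way to write the lines back out:

```python
import os
import tempfile

# Build a small throwaway input file (a stand-in for poem.txt).
folder = tempfile.mkdtemp()
filename = os.path.join(folder, 'sample.txt')
with open(filename, 'w') as outfile:
    outfile.write('Line 1\nLine 2\nLine 3\nLine 4\n')

with open(filename) as infile:
    lines = infile.readlines()           # each item keeps its trailing '\n'

with open(filename + '.rev', 'w') as outfile:
    outfile.writelines(reversed(lines))  # writelines() adds no newlines itself

with open(filename + '.rev') as infile:
    print(infile.read(), end='')
```

One caveat: if the last line of the original file has no trailing newline, it will run together with the line that follows it in the reversed file.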


## Lab: File I/O + dicts
* write a Python program to read a file and count the number of occurrences of each word in the file
* use a __`dict`__, indexed by word, to count the occurrences
* remember __`d.get(key)`__ will return __`None`__ if there is no such key in the dict (vs. __`d[key]`__, which will raise a __`KeyError`__); the __`in`__ operator is also useful here
  * or use a __`collections.defaultdict`__ if we've covered it
* treat __The__ and __the__ as the same word when counting
* print out words and counts, from most common to least common
* EXTRA: remove punctuation, so __Hamlet,__ == __Hamlet__ # refer back to "import this"
* Road Not Taken and Hamlet are in your materials
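
Before tackling a whole file, it may help to see the two counting hints above on a single made-up sentence. This is only a sketch of the counting step, not the full lab solution:

```python
from collections import defaultdict

text = 'The cat saw the other cat'

# Option 1: get() with a default of 0 means we never hit a missing key.
counts = {}
for word in text.lower().split():
    counts[word] = counts.get(word, 0) + 1
print(counts)    # {'the': 2, 'cat': 2, 'saw': 1, 'other': 1}

# Option 2: a defaultdict(int) creates missing entries as 0 on first use.
counts2 = defaultdict(int)
for word in text.lower().split():
    counts2[word] += 1
print(dict(counts2) == counts)    # True
```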

In [152]:
filename = input('Count words in which file? ')
wordcounts = {}

with open(filename) as infile:
    for line in infile: # for each line...
        for word in line.lower().split(): # for each word (after making line lower case)
            if word in wordcounts: # we've seen it before
                wordcounts[word] += 1
            else: # new word
                wordcounts[word] = 1
                
# we're done, but the dict will be in order of the words found in the file...not useful
# we could sort the dict, which will sort the keys (words)
for word in sorted(wordcounts):
    print(word, wordcounts[word])           

Count words in which file?  poem.txt


a 3
about 1
ages 2
all 1
and 9
another 1
as 5
back. 1
be 2
because 1
bent 1
better 1
black. 1
both 2
by, 1
claim, 1
come 1
could 2
day! 1
difference. 1
diverged 2
doubted 1
down 1
equally 1
ever 1
fair, 1
far 1
first 1
for 2
grassy 1
had 2
has 1
having 1
hence: 1
how 1
i 8
if 1
in 4
it 2
i— 1
just 1
kept 1
knowing 1
lay 1
leads 1
leaves 1
less 1
long 1
looked 1
made 1
morning 1
no 1
not 1
oh, 1
on 1
one 3
other, 1
passing 1
perhaps 1
really 1
roads 2
same, 1
shall 1
should 1
sigh 1
somewhere 1
sorry 1
step 1
stood 1
telling 1
that 3
the 8
them 1
then 1
there 1
this 1
though 1
to 2
took 2
travel 1
traveled 1
traveler, 1
trodden 1
two 2
undergrowth; 1
wanted 1
was 1
way 1
way, 1
wear; 1
where 1
with 1
wood, 2
worn 1
yellow 1
yet 1


### It's better to sort by the counts of the words, i.e., the *value* of the dict
* In order to do that we need to know two new things:
1. The __`sorted()`__ function takes an optional _key_ argument. It has nothing to do
with dictionary keys; instead, it describes *how* `sorted()` does its job. The sorting
key is a function which takes 1 argument, and what it returns dictates the sorting
order. 
2. Every dictionary has a method __`get()`__ which, when passed a key, returns its
corresponding value (or `None` if there is no such key). It's basically a function version of using the []s to index the dict.

In [156]:
# Let's try an example. Suppose we want to sort a list by the length of the words in it,
# rather than alphabetically. So the list ['pear', 'fig', 'apple'] should be sorted as
# ['fig', 'pear', 'apple'] because _fig_ is the shortest word, then _pear_, and then _apple_.
#
# A regular sorted() call will sort alphabetically...
sorted(['pear', 'fig', 'apple'])

['apple', 'fig', 'pear']

In [158]:
# Now let's try it using the built-in len() function as the sorting key...
# Note that we just pass the *name* of the len function. That tells sorted()
# which function to call on each item to get its sorting value. So sorted()
# calls len('pear'), len('fig'), and len('apple'), and the order is dictated
# by the numbers 4, 3, and 5 that are returned.
sorted(['pear', 'fig', 'apple'], key=len)

['fig', 'pear', 'apple']

In [159]:
# Let's use the __`.get()`__ method as the sorting key for a dictionary. That will
# cause the dict's keys to be sorted by their _values_ rather than alphabetically

In [160]:
# First, we'll sort the normal way...
sbux_dict = { 'grande': 16, 'venti': 20, 'tall': 12 }
sorted(sbux_dict)

['grande', 'tall', 'venti']

In [161]:
# ...but the above isn't very useful because we don't want the cup *names* to be
# sorted, but rather, the cup *sizes* (the values in the dict). By setting the
# sorting key to the get() method, we get what we want, namely, that the keys
# are sorted by their corresponding values...12, 16, 20.
sorted(sbux_dict, key=sbux_dict.get)

['tall', 'grande', 'venti']

In [164]:
filename = input('Count words in which file? ')
wordcounts = {}

with open(filename) as infile:
    for line in infile: # for each line...
        for word in line.lower().split(): # for each word (after making line lower case)
            if word in wordcounts: # we've seen it before
                wordcounts[word] += 1
            else: # new word
                wordcounts[word] = 1
                
# Sort the dict keys by their values, and in reverse order (largest to smallest).
# We can use slicing to restrict the output to the top 10, 20, or however many we
# want to see...

top = int(input('Top 10/20/etc.? '))
for word in sorted(wordcounts, key=wordcounts.get, reverse=True)[:top]:
    print(word, wordcounts[word])           

Count words in which file?  hamlet.txt
Top 10/20/etc.?  20


the 1137
and 936
to 728
of 664
a 527
i 513
my 513
in 423
you 405
hamlet 401
that 345
it 325
is 318
his 294
not 274
with 263
this 249
your 242
but 229
for 228
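
In the spirit of "batteries included", the standard library also has `collections.Counter`, a dict subclass built for exactly this kind of tallying. We didn't use it in class, so treat this as an optional sketch on a made-up sentence rather than on hamlet.txt:

```python
from collections import Counter

text = 'the cat and the hat and the bat'
wordcounts = Counter(text.lower().split())

# most_common(n) returns the n highest counts as (word, count) pairs.
for word, count in wordcounts.most_common(2):
    print(word, count)
```

With this input the loop prints `the 3` and then `and 2`.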


In [171]:
# To remove punctuation we have several options. There is a string method named
# translate(), which takes a dict of chars to convert to other chars. We could use
# translate() to convert all punctuation to empty strings (thereby removing them),
# but it's a bit clunky since the keys have to be Unicode values, which you can get
# by calling the built-in function ord().
#
# A more Pythonic solution is to list-ify the string, dropping out the punctuation
# characters, then join-ing the list items back into a string. Normally this would 
# be done in one line with a *list comprehension*, something we didn't have time to
# cover in our one-day class. So instead I will do it "longhand"...

import string # we can get a string containing all the punctuation characters here

line = 'Hi Hamlet, how are you?'
edited_line = ''

for char in line.lower():
    if char not in string.punctuation: # if it's not punctuation, append it
        edited_line += char
        
print(edited_line)

hi hamlet how are you
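
For reference, here is roughly what the two approaches mentioned in the comments might look like. `str.maketrans()` builds the translation table for us, so we don't have to call `ord()` by hand; its third argument lists the characters to delete. The second half is the list comprehension form of the longhand loop:

```python
import string

line = 'Hi Hamlet, how are you?'

# str.maketrans('', '', chars) builds a table that deletes every char in chars.
table = str.maketrans('', '', string.punctuation)
print(line.lower().translate(table))    # hi hamlet how are you

# The comprehension version of the longhand loop.
kept = [char for char in line.lower() if char not in string.punctuation]
print(''.join(kept))                    # hi hamlet how are you
```

Note that `string.punctuation` covers only ASCII punctuation, so characters such as a typographic dash would survive either version.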


In [173]:
# Now we put the above into our solution...
import string

filename = input('Count words in which file? ')
wordcounts = {}

with open(filename) as infile:
    for line in infile: # for each line...
        new_line = ''
        for char in line.lower():
            if char not in string.punctuation: # if it's not punctuation, append it
                new_line += char
        for word in new_line.split(): # for each word (after making line lower case)
            if word in wordcounts: # we've seen it before
                wordcounts[word] += 1
            else: # new word
                wordcounts[word] = 1
                
# Sort the dict keys by their values, and in reverse order (largest to smallest).
# We can use slicing to restrict the output to the top 10, 20, or however many we
# want to see...

top = int(input('Top 10/20/etc.? '))
for word in sorted(wordcounts, key=wordcounts.get, reverse=True)[:top]:
    print(word, wordcounts[word])  

Count words in which file?  hamlet.txt
Top 10/20/etc.?  20


the 1142
and 964
to 737
of 669
i 567
you 546
a 531
my 513
hamlet 463
in 436
it 416
that 389
is 340
not 313
lord 310
his 296
this 296
but 270
with 267
for 248
