# Python Student Notebook for Intermediate Topics

A compendium of intermediate-level topics, illustrative examples, best practices, tips and tricks.

## Table of Contents

+ [Default Dictionary](#DefaultDict)
+ [Ordered Dictionary](#OrderedDict)
+ String Formatting
+ Logging
+ Web Crawling
+ [Appendix](#Appendix)


## Default Dictionary
<a id="DefaultDict"></a>

Default dictionaries are a subclass of the built-in dict class.  They have all the characteristics of regular dictionaries, except that looking up a missing key with square brackets does not raise KeyError.  Instead, the default dictionary calls its factory function, stores the result under the missing key, and returns it.  Other methods, such as get() and pop(), behave exactly as they do on a regular dictionary.
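
A minimal sketch of that difference, using a made-up fruit inventory (the names `plain` and `counts` are illustrative and do not appear elsewhere in this notebook):

```python
from collections import defaultdict

plain = {"apples": 3}

# A regular dict raises KeyError when a missing key is looked up.
try:
    plain["pears"]
except KeyError as err:
    print("plain dict raised KeyError for", err)

# A defaultdict calls its factory (int() returns 0), stores the result, and returns it.
counts = defaultdict(int, plain)
pears = counts["pears"]
print(pears)
print(dict(counts))
```

The second argument to defaultdict simply seeds it with existing entries, the same way dict() would.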

Default dictionaries are one of several specialized container types available from the collections module.


https://docs.python.org/3/library/collections.html

https://www.accelebrate.com/blog/using-defaultdict-python/

https://pymotw.com/2/collections/defaultdict.html

https://alexlouden.com/posts/2015-defaultdict-in-python.html

### Getting Started
We will need a "factory function" to populate the default values.  Some use cases will indicate the need for a factory that generates a "blank" or "empty" value.

In [1]:
from collections import defaultdict

# Useful "factory" functions to provide "empty" starting values.  
print (int())
print (list())
print (str())
print (float())
print (set())
print (dict())

# Some favorite strings for organizing output
bar_string = "#" + 65*'='   # The multiplied string may be more than one character
line_string = "#" + 65*'-'  # The multiplied string may be more than one character

0
[]

0.0
set()
{}


### Default Dictionary Generation

In [2]:
# Generating a normal dictionary
a = {"Alabama":"Pine", "Texas":"Pecan", "Alaska":"Spruce"}
print ( dir(a))
print ( type(a))

print (line_string)
# A default dictionary for state trees, with blank returned for undefined states.
state_tree = defaultdict(str)   # Note str is passed as the "factory" function, without parentheses

print ( dir(state_tree))     # A few new initial methods
print ( type(state_tree))

state_tree["Alabama"] = "Pine"
print (len(state_tree))
state_tree["Texas"] = "Pecan"
state_tree["Alaska"] = "Spruce"
print (state_tree["Alaska"])
print (len(state_tree))
print (state_tree)
print (line_string)

print (state_tree["Vermont"])    # Vermont hasn't been defined
print (len(state_tree))          # Note that length has increased just from the query
print (state_tree)               # Note that contents have been expanded just from the query

['__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'items', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']
<class 'dict'>
#-----------------------------------------------------------------
['__class__', '__contains__', '__copy__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__missing__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'default_factory', 'fromkeys', 'get
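
Because a square-bracket lookup adds the missing key, it can help to know which operations leave the dictionary unchanged. The sketch below is an illustration with its own small `state_tree`; the variable names for the intermediate results are made up for this example.

```python
from collections import defaultdict

state_tree = defaultdict(str, {"Alabama": "Pine"})

# Membership tests and get() do not call the factory, so nothing is added.
has_vermont = "Vermont" in state_tree
vermont_tree = state_tree.get("Vermont", "unknown")
size_after_get = len(state_tree)

# Square-bracket access does call the factory and stores the result.
state_tree["Vermont"]
size_after_index = len(state_tree)

print(has_vermont, vermont_tree, size_after_get, size_after_index)
```

Using `in` or get() is one way to query a default dictionary without growing it as a side effect.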

### Factory Generating Real and Variable Content for Default Value

In [3]:
# A default dictionary for state trees, with "Oak" always returned for undefined states.

state_tree = defaultdict(lambda: "Oak")   # Note lambda as "factory" function called to generate constant default

state_tree["Alabama"] = "Pine"
print (len(state_tree))
state_tree["Texas"] = "Pecan"
state_tree["Alaska"] = "Spruce"
print (state_tree["Alaska"])
print (len(state_tree))
print (state_tree)
print (line_string)

print (state_tree["Georgia"])    # Georgia hasn't been defined
print (len(state_tree))          # Note that length has increased just from the query
print (state_tree)               # Note that contents have been expanded just from the query

print (line_string)

# A default dictionary for state trees, with today's day of week returned for undefined states.
# A default dictionary with a separate custom factory function with logic 

import datetime
def day_factory():
    day_string = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
    index = datetime.datetime.today().weekday()
    return day_string[index]

state_tree = defaultdict(day_factory)   # Note defined function as "factory" function called to generate custom default

state_tree["Alabama"] = "Pine"
print (len(state_tree))
state_tree["Texas"] = "Pecan"
state_tree["Alaska"] = "Spruce"
print (state_tree["Alaska"])
print (len(state_tree))
print (state_tree)
print (line_string)

print (state_tree["Georgia"])    # Georgia hasn't been defined
print (len(state_tree))          # Note that length has increased just from the query
print (state_tree)               # Note that contents have been expanded just from the query
                                    # and note that the new value is today's day of the week.


1
Spruce
3
defaultdict(<function <lambda> at 0x0000000004C2ED90>, {'Texas': 'Pecan', 'Alaska': 'Spruce', 'Alabama': 'Pine'})
#-----------------------------------------------------------------
Oak
4
defaultdict(<function <lambda> at 0x0000000004C2ED90>, {'Texas': 'Pecan', 'Alaska': 'Spruce', 'Alabama': 'Pine', 'Georgia': 'Oak'})
#-----------------------------------------------------------------
1
Spruce
3
defaultdict(<function day_factory at 0x0000000004AD5510>, {'Texas': 'Pecan', 'Alaska': 'Spruce', 'Alabama': 'Pine'})
#-----------------------------------------------------------------
Thursday
4
defaultdict(<function day_factory at 0x0000000004AD5510>, {'Texas': 'Pecan', 'Alaska': 'Spruce', 'Alabama': 'Pine', 'Georgia': 'Thursday'})
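
The factory is called again for every new key, so it can also build a fresh container each time. As a hedged sketch, the lambda below returns a new inner defaultdict(int) per state, giving a two-level counter; the `sightings` data and the name `tree_counts` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: (state, tree) sightings.
sightings = [("Texas", "Pecan"), ("Texas", "Oak"), ("Texas", "Pecan"), ("Alaska", "Spruce")]

# The lambda builds a fresh inner defaultdict(int) for every new state.
tree_counts = defaultdict(lambda: defaultdict(int))
for state, tree in sightings:
    tree_counts[state][tree] += 1

print(tree_counts["Texas"]["Pecan"])
# Each state received its own inner dictionary, not a shared one.
print(tree_counts["Texas"] is tree_counts["Alaska"])
```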


### Default Dictionary Techniques

In [4]:
# Using a default dictionary to count things when the identity of the participating things is not known in advance.
# This example counts the words that start with each letter of the alphabet.
#      Note that the factory generates zero, which is then incremented to give the count.
words= ["Houston", "Austin", "Huston", "Walnut Creek", "Kansas City", "Atlanta"]
wordcount = defaultdict(int)
for word in words:
    firstletter = word[0].lower()
    wordcount[firstletter] += 1                # Note increment of the default zero
print (wordcount)

print (line_string)
city_list = [('Texas','Austin'), ('Texas','Houston'), ('Texas', 'Abilene'), ('New York','Albany'), ('New York', 'Syracuse'), 
             ('New York', 'Buffalo'), ('New York', 'Rochester'), ('Texas', 'Dallas'), ('California','Sacramento'), 
             ('Kansas', 'Lawrence'), ('California', 'Palo Alto'), ('California', 'Atlanta')]
cities_in_state = defaultdict(list)
for state, city in city_list:
    cities_in_state[state].append(city)
print (cities_in_state)    

for state in cities_in_state:
    print (state)
    for city in cities_in_state[state]:
        print (city)





defaultdict(<class 'int'>, {'a': 2, 'h': 2, 'w': 1, 'k': 1})
#-----------------------------------------------------------------
defaultdict(<class 'list'>, {'Texas': ['Austin', 'Houston', 'Abilene', 'Dallas'], 'Kansas': ['Lawrence'], 'New York': ['Albany', 'Syracuse', 'Buffalo', 'Rochester'], 'California': ['Sacramento', 'Palo Alto', 'Atlanta']})
Texas
Austin
Houston
Abilene
Dallas
Kansas
Lawrence
New York
Albany
Syracuse
Buffalo
Rochester
California
Sacramento
Palo Alto
Atlanta
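
For comparison, the same counting and grouping can be done with other standard tools. This is only a sketch of alternatives, not a recommendation: collections.Counter handles the counting case, and dict.setdefault handles the grouping case with a plain dictionary. The shortened `city_list` and the result names are assumptions made for this example.

```python
from collections import Counter, defaultdict

words = ["Houston", "Austin", "Huston", "Walnut Creek", "Kansas City", "Atlanta"]

# Counting with defaultdict(int), as in the cell above.
with_defaultdict = defaultdict(int)
for word in words:
    with_defaultdict[word[0].lower()] += 1

# Counting with Counter, which is built for exactly this job.
with_counter = Counter(word[0].lower() for word in words)
print(dict(with_counter) == dict(with_defaultdict))

# Grouping with a plain dict: setdefault stores an empty list the first time a key is seen.
city_list = [("Texas", "Austin"), ("Kansas", "Lawrence"), ("Texas", "Dallas")]
grouped = {}
for state, city in city_list:
    grouped.setdefault(state, []).append(city)
print(grouped)
```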


### Default Dictionary Mysteries
To the apprentice, some notation does not apply to the object it first appears to.  In this case, the append method looks as if it is called on the dictionary or default dictionary, but it is actually called on the list stored under the given key.

In [1]:
# Why does append work when it doesn't seem to be documented?
wordclean = ["a", "am", "all", "as", "at", "ma", "saw", "was", "bat", "tab"]
empty_dict = dict()
print (type(empty_dict))
print (dir(empty_dict))       # Note that there is no append method.
print ("#" + 65*'-')

# A regular dictionary whose values are lists
test_list_in_dict = {"Texas":["Austin", "Houston", "Abilene"],
    "Kansas":["Riley", "Leavenworth"]
    }

# test_list_in_dict["Kansas"] returns the stored list; append is then called on that list, not on the dict.
test_list_in_dict["Kansas"].append("Olathe")

def signature(word):
    return ''.join(sorted(word))
import collections
words_in_sig_ddict = collections.defaultdict(list)
print (type(words_in_sig_ddict))
print (dir(words_in_sig_ddict))       # Note that there is no append method.
for word in wordclean:
    words_in_sig_ddict[signature(word)].append(word)



<class 'dict'>
['__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'items', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']
#-----------------------------------------------------------------
<class 'collections.defaultdict'>
['__class__', '__contains__', '__copy__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__missing__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', '
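
The one-line form can be split into its two steps to show where append really comes from. The sketch below reuses the signature function from the cell above; the names `words_in_sig` and `bucket` are made up for this illustration.

```python
from collections import defaultdict

def signature(word):
    """Return the word's letters in sorted order, so anagrams share a signature."""
    return "".join(sorted(word))

words_in_sig = defaultdict(list)

# Long form of words_in_sig[signature("saw")].append("saw"):
bucket = words_in_sig[signature("saw")]   # the lookup creates and returns an empty list
print(type(bucket))                        # the object in hand is a list, not a dictionary
bucket.append("saw")                       # list.append, called on that list

# The same two steps written on one line.
words_in_sig[signature("was")].append("was")

print(dict(words_in_sig))
print(hasattr(words_in_sig, "append"))     # the dictionary itself has no append method
```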

## Ordered Dictionary
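<a id="OrderedDict"></a>

Ordered dictionaries are just like regular dictionaries, but they remember the order in which items were inserted.  When iterating over an ordered dictionary, the items are returned in the order their keys were first added.  In Python 3.7 and later, regular dictionaries also preserve insertion order; the outputs shown in this notebook were produced on an older version, where they did not.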
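
Two behaviors that set OrderedDict apart from a regular dictionary are order-sensitive equality and the move_to_end method. A small sketch, with `first` and `second` as made-up names:

```python
from collections import OrderedDict

first = OrderedDict([("banana", 3), ("apple", 4)])
second = OrderedDict([("apple", 4), ("banana", 3)])

# Two OrderedDicts are equal only if their items are also in the same order.
print(first == second)

# Converted to regular dicts, the same contents compare equal regardless of order.
print(dict(first) == dict(second))

# move_to_end repositions an existing key without changing its value.
first.move_to_end("banana")
print(list(first))
print(first == second)
```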

### Getting Started


In [7]:
import datetime
import string
import csv
from collections import OrderedDict

### Ordered Dictionary Generation

In [11]:
# regular unsorted dictionary
d = {'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}
print (type(d))

# dictionary sorted by key
ds1 = OrderedDict(sorted(d.items(), key=lambda t: t[0]))
print (type(ds1))
print (ds1)

# dictionary sorted by value
ds2 = OrderedDict(sorted(d.items(), key=lambda t: t[1]))
print (type(ds2))
print (ds2)

# dictionary sorted by length of the key string
ds3 =  OrderedDict(sorted(d.items(), key=lambda t: len(t[0])))
print (type(ds3))
print (ds3)

<class 'dict'>
<class 'collections.OrderedDict'>
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])
<class 'collections.OrderedDict'>
OrderedDict([('pear', 1), ('orange', 2), ('banana', 3), ('apple', 4)])
<class 'collections.OrderedDict'>
OrderedDict([('pear', 1), ('apple', 4), ('orange', 2), ('banana', 3)])
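
The same pattern extends to other sort keys. In the sketch below, reverse=True gives descending order, and a tuple key breaks ties explicitly; without a tie-breaker, keys of equal length ('orange' and 'banana') keep whatever relative order the source dictionary happened to have, because sorted() is stable. The names `by_value_desc` and `by_len_then_name` are illustrative.

```python
from collections import OrderedDict

d = {"banana": 3, "apple": 4, "pear": 1, "orange": 2}

# Sorted by value, largest first.
by_value_desc = OrderedDict(sorted(d.items(), key=lambda t: t[1], reverse=True))
print(list(by_value_desc))

# Sorted by key length, then alphabetically, so ties are resolved predictably.
by_len_then_name = OrderedDict(sorted(d.items(), key=lambda t: (len(t[0]), t[0])))
print(list(by_len_then_name))
```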


### Test Reading Lines of Shortened File Into Ordered Dict
Using syntax provided from:

https://docs.python.org/3/library/csv.html#csv.DictReader

In [4]:
# Create an empty dict that will hold one row dict per date
temp_dood = dict()

# Read the CSV file row by row and store each row under its DATE value
with open('test_data_1.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print ('type of row: ', type(row))
        print(row['DATE'], row['TMAX'], row['TMIN'])
        key = row['DATE']
        temp_dood[key] = row
    print (type(reader))
    print (type(row))        # row still refers to the last line read
    print (type(temp_dood))
    print (row)
    # print (temp_dood)
    print (len(temp_dood))

type of row:  <class 'dict'>
19500501 69 61
type of row:  <class 'dict'>
19500502 84 64
type of row:  <class 'dict'>
19500503 87 72
type of row:  <class 'dict'>
19500504 89 72
type of row:  <class 'dict'>
19500505 76 68
<class 'csv.DictReader'>
<class 'dict'>
<class 'dict'>
{'TMIN': '68', 'DATE': '19500505', 'STATION': 'GHCND:USW00013958', 'TMAX': '76', 'STATION_NAME': 'AUSTIN CAMP MABRY TX US'}
5
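
To keep the rows in file order on any Python version, the outer container can be an OrderedDict. The following is a sketch only: the in-memory `sample_csv` is a hypothetical stand-in for test_data_1.csv (its column order is an assumption), and the name `temps_by_date` is made up. It also converts the temperature strings to integers so they can be used in arithmetic.

```python
import csv
import io
from collections import OrderedDict

# Hypothetical stand-in for test_data_1.csv; the real file's column order may differ.
sample_csv = io.StringIO(
    "STATION,STATION_NAME,DATE,TMAX,TMIN\n"
    "GHCND:USW00013958,AUSTIN CAMP MABRY TX US,19500501,69,61\n"
    "GHCND:USW00013958,AUSTIN CAMP MABRY TX US,19500502,84,64\n"
)

# Key each row by its DATE, keeping insertion (file) order.
temps_by_date = OrderedDict()
for row in csv.DictReader(sample_csv):
    # DictReader returns every field as a string, so convert the temperatures.
    temps_by_date[row["DATE"]] = {"TMAX": int(row["TMAX"]), "TMIN": int(row["TMIN"])}

print(list(temps_by_date))
day = temps_by_date["19500502"]
print(day["TMAX"] - day["TMIN"])
```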


## String Formatting

https://docs.python.org/3/library/string.html

https://pyformat.info/

https://www.digitalocean.com/community/tutorials/how-to-use-string-formatters-in-python-3




## Logging



## Web Crawling



### FTP

In [None]:
import urllib.request

# Download the GHCN daily station list over FTP and save it as a local text file
urllib.request.urlretrieve('ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt','ghcnd_stations.txt')

## Appendix
<a id="Appendix"></a>

Welcome!  This notebook (and its sisters) was developed for me to practice some Python and data science fundamentals, and for me to explore and notate some interesting tricks, quirks, and lessons learned the hard way.

Because I'm a naval history buff, I have occasionally used US naval ship information as practice data.  US naval ships each have a unique identifying "hull number," making it easy to build many common Python data structures around ship characteristics.  More information about US "hull numbers" is available from:

http://www.navweaps.com/index_tech/index_ships_list.php

### Tell Me I'm an Idiot!
I welcome coaching, constructive criticism, and insight into more efficient, effective, or Pythonic ways of accomplishing results!

Sincerely,

*Carl Gusler*

Austin, Texas

carl.gusler@gmail.com