# Python Introduction

This is a far from thorough introduction to Python, but it should give you enough information to finish the assignment. Python is an interpreted (rather than a compiled) language: your code is read and executed at runtime rather than being translated to optimized machine code beforehand. Variables are not declared with a type; a variable takes on the type of the value assigned to it, such as one of the standard types (integers, floating point numbers, strings, and booleans).

In [None]:
a = 34 #integer
b = 3.2 #floating point
c = False #boolean
d = "this" #string
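Because types belong to values rather than to names, the built-in `type` function can be used to check what a name currently refers to. The following is a minimal sketch; the names `value` and `total` are made up for illustration.

```python
# Inspect the type of the value a name currently refers to.
value = 34
print(type(value).__name__)   # int

# The same name can later be bound to a value of a different type.
value = "this"
print(type(value).__name__)   # str

# Mixing an integer and a float in arithmetic produces a float.
total = 34 + 3.2
print(type(total).__name__)   # float
```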

In Python, indentation is significant. All statements in a block of code must have the same level of indentation. For example, we can use the keyword `def` to define a function `alpha` that takes no arguments. The body of the function must be indented relative to its declaration.


In [None]:
def alpha():
    print("Hello world")

We can run a function by calling it with parentheses:

In [None]:
alpha()

We can use functions from other modules via the `import` statement. Here is an example of an if/else clause that uses the `math` module.

In [None]:
import math

def lessMorePi(num):
    pi = math.atan(1.) * 4
    if num < pi:
        print("less than pi")
    else:
        print("more than pi")

lessMorePi(3.14)
lessMorePi(3.142)

A list is created using the `[]` syntax. We can then add elements to the list using its `append` method.

In [None]:
kids = []
kids.append("Sammy")
kids.append("Sydney")
kids.append("Fiona")
print(kids)

We can access individual members of a list by their index. Indexes start at 0, so `kids[1]` is the second element.

In [None]:
print(kids[1])
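A few other common ways of indexing a list are sketched below. The list is rebuilt here so the cell runs on its own; the names `first`, `last`, and `count` are illustrative.

```python
kids = ["Sammy", "Sydney", "Fiona"]

# Indexes start at 0, so the valid indexes here are 0, 1, and 2.
first = kids[0]
# Negative indexes count back from the end of the list.
last = kids[-1]
# len() returns the number of elements.
count = len(kids)

print(first, last, count)  # Sammy Fiona 3
```

Asking for an index outside the valid range, such as `kids[3]` here, raises an `IndexError`.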

A slightly more complicated data structure is a dictionary. A dictionary, like a list, is a container, but its elements are not limited to being accessed through integer indexes. A list can be thought of as a collection of (key, value) pairs where each key must be an integer and the keys run in order from 0 to n-1, where n is the size of the list.
Dictionaries are a more abstract container class where the keys can be of any hashable type, such as strings, numbers, or tuples. A dictionary is created using `{}`. Each key maps to exactly one value, so assigning to an existing key replaces its old value.

In [None]:
goat = {}
goat["football"] = "Jerry Rice"
goat["basketball"] = "Michael Jordan"
goat["hockey"] = "Wayne Gretzky"
print(goat)
goat["football"] = "Tom Brady"
print(goat)
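Some other dictionary operations that tend to be useful are sketched below: testing for a key, looking up with a default, and looping over the contents. The names `has_hockey`, `hockey_goat`, and `mixed` are illustrative only.

```python
goat = {"football": "Tom Brady", "basketball": "Michael Jordan"}

# Check whether a key exists before using it.
has_hockey = "hockey" in goat

# get() returns a default instead of raising KeyError for a missing key.
hockey_goat = goat.get("hockey", "unknown")

# Iterate over (key, value) pairs; insertion order is preserved (Python 3.7+).
for sport, player in goat.items():
    print(sport, "->", player)

# Keys do not have to be strings; any hashable value works.
mixed = {2: 4, (1, 2): "a tuple key"}
print(mixed[(1, 2)])
```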

All of the data types we have talked about are different types of Python objects/classes. Using classes is an example of object-oriented (rather than functional) programming. We can declare our own classes using the `class` keyword. Instances of a class are initialized by its `__init__` method. Every method that accesses instance members takes the instance as its first parameter, which by convention is named `self`.

In [None]:
class cluster:
    """A class defining CEES clusters"""
    def __init__(self, name, nodes, cores):
        """Initializer for the cluster class
            name - name of cluster
            nodes - number of nodes
            cores - number of cores per node"""
        self.name = name
        self.nodes = nodes
        self.cores = cores
        
    def numberOfCores(self):
        """Return number of cores"""
        return self.cores * self.nodes
    
ceesClusters = {}
ceesClusters["rcf"] = cluster("rcf", 170, 16)
ceesClusters["cees"] = cluster("cees", 150, 24)
ceesClusters["cees"].numberOfCores()
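Objects stored in a dictionary can be looped over like any other values. The sketch below repeats a compact copy of the `cluster` class so that it runs on its own, then totals the cores across all clusters; the name `totalCores` is illustrative.

```python
class cluster:
    """Compact copy of the cluster class above so this cell runs on its own."""
    def __init__(self, name, nodes, cores):
        self.name = name
        self.nodes = nodes
        self.cores = cores

    def numberOfCores(self):
        return self.cores * self.nodes

ceesClusters = {
    "rcf": cluster("rcf", 170, 16),
    "cees": cluster("cees", 150, 24),
}

# Calling a method on an instance passes that instance as `self` automatically.
for name, c in ceesClusters.items():
    print(name, c.numberOfCores())

# Sum the core counts over every cluster in the dictionary.
totalCores = sum(c.numberOfCores() for c in ceesClusters.values())
print(totalCores)  # 170*16 + 150*24 = 6320
```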

# Assignment: Parsing a log file

In this section you will modify a working Python code to be more memory efficient.
Currently the code reads in a log file of web requests and stores the access information for every request in a list.
Each element of the list is an instance of the `webEntry` class, which records the web address of the page requested,
the computer the request came from, and the date of the request.
A keyword is then used to search the list of entries.
All entries whose web address matches the search string are printed.


The vanilla version of our code (which you will need to modify) defines a class called `webEntry` with an initializer and a print method.

In [None]:
class webEntry:
    """Class for storing web entries"""

    def __init__(self, page, frm, date):
        """Initialize a webEntry object with page name, computer from which it was accessed, date accessed"""
        self.page = page
        self.frm = frm
        self.date = date

    def printIt(self):
        """Print the page, the requesting computer, and the date on one line"""
        print("Page:%s from=%s date=%s" % (self.page, self.frm, self.date))

Next we will define a function that loads a logfile. It reads in the file, looping through all of its lines. Each line that matches a given pattern is added to a list of web entries. The function returns the list of web entries.

In [None]:
import re

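def buildTree(logFile):
    """Opens a logfile and builds a list containing all page entries
        logFile - file to read"""

    # Raw string so that \S and \[ reach the regex engine unchanged.
    # Groups: 1 = requesting computer, 2 = date, 3 = page requested
    parse = re.compile(r'^(\S+) - - \[(.+)\] "GET (\S+)')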
    array = []

    with open(logFile) as f: 
        for line in f.readlines():
            x = parse.search(line)
            if x:
                array.append(webEntry(x.group(3), x.group(1), x.group(2)))
    return array
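To see what the three groups of the pattern capture, it can help to try the regular expression on a single line. The log line below is a made-up example in the style of a typical web server log, not a line taken from the actual logfile.

```python
import re

# Same pattern as in buildTree, written as a raw string.
parse = re.compile(r'^(\S+) - - \[(.+)\] "GET (\S+)')

# Hypothetical log line, made up for illustration.
line = 'host.example.com - - [01/Jan/2020:12:00:00 -0800] "GET /index.html HTTP/1.1" 200 1234'

match = parse.search(line)
if match:
    # group(1) is the requesting computer, group(2) the date, group(3) the page.
    frm, date, page = match.group(1), match.group(2), match.group(3)
    print(frm)   # host.example.com
    print(date)  # 01/Jan/2020:12:00:00 -0800
    print(page)  # /index.html
```

Lines that are not `GET` requests, or that do not follow this layout, make `search` return `None`, which is why `buildTree` checks the result before using it.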

This function searches each item in a list for a given search term, which is treated as a regular expression. Any entry whose page matches the search term is printed out.

In [None]:
def searchList(database, searchTerm):
    """Search database for searchTerm and print it out"""

    myRegEx = re.compile(searchTerm)
    for item in database:
        if myRegEx.search(item.page):
            item.printIt()

Let's download the logfile.

In [None]:
!touch logFile; rm -rf logFile; wget http://sep.stanford.edu/sep/bob/download/logFile

Now let's run our code.

In [None]:
myList = buildTree("./logFile")
searchList(myList, "biondo")

The memory requirements of this approach can be significant.
Every request for every page is stored in memory.
For large logfiles this often becomes cost-prohibitive.
Your assignment is to modify the script so that only web addresses that match the user's search string are stored, in a dictionary keyed by web address rather than in a list of every request.
In addition, we want you to organize the search results so that all of the lines associated with a given web page are printed out together.
