# Lab 2

## Lab purpose
This lab takes a step back from working with and within ArcGIS to explore Python as a language in and of itself. In this lab, you’ll be asked to solve a variety of problems using Python. Some of these problems will have immediate applied uses, while others will simply be asking you to think computationally – about what you can and cannot solve using Python and how it might be used in a variety of generalized tasks. You will also gain familiarity with the specific syntax of the Python language.

There are a wide variety of articles, guides, tutorials, and reference materials available on the Python language. You’re encouraged to read through many of these and refer to them when you run into difficulty. Make sure you understand why any solution works or you will run into significant difficulties later.

You will turn in a series of Python scripts that you create to solve each problem. You will host them [in the repository here](https://github.com/UWTMGIS/TGIS501_W18/tree/master/lab2). If you are using a notebook, enter all of your code here and upload a single .ipynb file named in the format LastName.ipynb. If you are uploading individual scripts, name them in the format LastName_ProblemNumber.py.

**It is strongly recommended that you work through the entirety of Exercise 4 from the Scripting… book before attempting these problems. This lab may be completed using either Python 2 or 3.**

## Problem 1: The Trouble With Turtles

Depending on your age, you may remember playing with a Turtle drawing program in elementary school. The history behind the Turtle is a bit longer (dating back to the 1960s) and “Turtle Graphics” generally refers to a means of drawing vector graphics on a Cartesian plane. You draw by moving an imaginary turtle around the screen. Each turtle (and there can be more than one) has a location, a pen, and an orientation. With simple commands you can draw truly complex shapes.

Follow [this link](http://openbookproject.net/thinkcs/python/english3e/hello_little_turtles.html) and work up through section 3.6 (there are challenge problems below you can do for fun, but perhaps come back to them later).
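The paragraph above describes each turtle as a location, a heading, and a pen. The sketch below does not use the `turtle` module at all. The `TinyTurtle` class and `polygon_path` function are hypothetical, written only for illustration. The code tracks position and heading to show why n equal sides, with a left turn of 360/n degrees after each one, bring the turtle back to where it started:

```python
import math


class TinyTurtle(object):
    """Hypothetical stand-in for turtle.Turtle that only tracks position and heading."""

    def __init__(self):
        self.x, self.y = 0.0, 0.0
        self.heading = 0.0  # degrees counterclockwise from east, like turtle's "standard" mode
        self.path = [(0.0, 0.0)]  # every point the pen has visited

    def fd(self, distance):
        # Move forward along the current heading
        radians = math.radians(self.heading)
        self.x += distance * math.cos(radians)
        self.y += distance * math.sin(radians)
        self.path.append((self.x, self.y))

    def left(self, angle):
        # Turn counterclockwise by angle degrees
        self.heading = (self.heading + angle) % 360


def polygon_path(sides, length):
    """Walk a regular polygon: n equal sides, turning 360/n degrees after each one."""
    t = TinyTurtle()
    for _ in range(sides):
        t.fd(length)
        t.left(360.0 / sides)  # 360.0 avoids integer division in Python 2
    return t.path
```

This version needs no drawing window, so you can check it without a display. The real `turtle.Turtle` exposes the same `fd` and `left` calls.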


### Question 1

**Write a script that _asks the user to input a number and then draws a shape with that number of sides_**

If you need help getting keyboard input in Python, take a look [here](http://www.python-course.eu/input.php).
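Here is a minimal sketch of reading the number of sides and validating it before any drawing starts. `parse_sides` and `read_line` are hypothetical names. The sketch assumes only that `raw_input` exists in Python 2 and that `input` plays the same role in Python 3:

```python
def parse_sides(text, lowest=3, highest=20):
    """Convert the user's text to a whole number of sides, or return None if it is invalid."""
    try:
        sides = int(text.strip())
    except ValueError:
        return None  # not a whole number, e.g. "five" or "4.5"
    if lowest <= sides <= highest:
        return sides
    return None  # a whole number, but outside the allowed range


# Pick whichever keyboard-input function this Python version provides
try:
    read_line = raw_input  # Python 2
except NameError:
    read_line = input  # Python 3
```

In a script you might call `sides = parse_sides(read_line("Number of sides: "))` and draw only when the result is not `None`. This keeps the error handling separate from the turtle code.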

If you are using a notebook for this assignment, you can enter your code in the cell below, otherwise write your own script.

In [1]:
import turtle

# user input: number of sides (raw_input is Python 2; use input() in Python 3)
sides = int(raw_input("Input a whole number between 3 and 20, then press 'enter' to draw a polygon with that many sides: "))

# handle user input error conditions
if sides < 3:
    print("Squirtle can't draw a polygon with that few sides :(")
elif sides > 20:
    print("Sorry, that's too many.")
else:
    # user input is in range, so start to draw
    # open screen canvas, set background color and window title
    wn = turtle.Screen()
    wn.bgcolor("violet")
    wn.title("Run, turtle, run")

    # name, position, and stylize the turtle
    squirtle = turtle.Turtle()
    squirtle.pu()
    squirtle.shape("turtle")
    squirtle.color("black")
    squirtle.pensize(2)
    squirtle.goto(-50, -120)
    squirtle.pd()

    # draw every side, turning by the exterior angle; stamp the vertices to make counting sides easier
    # (360.0 forces float division, since 360/7 is truncated to 51 in Python 2)
    for i in range(sides):
        squirtle.fd(100 - 2 * sides)
        squirtle.left(360.0 / sides)
        squirtle.stamp()

    # close the window on click; otherwise the kernel hangs (spinning beach ball)
    wn.exitonclick()


Input a whole number between 3 and 20, then press 'enter' to draw a polygon with that many sides: 24
Sorry, that's too many.


## Problem 2: Is GIS the best, and the looming horrors of New England

Problem 2 will have you working with text files to manipulate and analyze them. Here, we'll be working with plain text files and doing some very rudimentary forms of analysis. We'll return to these ideas later in the course using a more sophisticated approach (natural language processing), but for now we're focusing on the basics of opening files, manipulating data, and using flow control to iterate across datasets. 



### Question 3

You have written a long, beautiful ode to GIS and called it GIS_is_the_best.txt. Wow, you're very proud of yourself.
Find this file, which you have written [in this repository](https://github.com/UWTMGIS/TGIS501_Files).

You can either read it directly from the web (which we'll cover in a later lab, but you can likely figure out now if you wish) or just clone/download it somewhere local. 

A couple of hints before we go on:
1. Check out the methods .upper() and .lower()
2. The assigned readings for this week covered opening files. Exercise 4 (recommended above) discusses counting words. You can also check out a quick tutorial on text files [here](http://opentechschool.github.io/python-data-intro/core/text-files.html).
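Here is a minimal sketch of the word count: open the file, read it, split it on whitespace, and count the pieces. The function names and the UTF-8 `encoding` default are assumptions. `io.open` is used so the same code reads text the same way under Python 2 and 3:

```python
import io


def count_words(text):
    """Count whitespace-separated words; split() with no argument ignores repeated and leading spaces."""
    return len(text.split())


def count_words_in_file(path, encoding="utf-8"):
    """Open a local text file and count its words; 'with' closes the file afterwards."""
    with io.open(path, encoding=encoding) as handle:
        return count_words(handle.read())
```

With a local copy of the ode, `count_words_in_file("GIS_is_the_best.txt")` would return the total.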


In the next cell (or in your own script), write a script that prints the total number of words in the document.


In [47]:
# I use the 'open' function in the next exercise; here I load the file from its GitHub raw URL instead
# *only works with a valid token, which must be removed before pushing to GitHub*

import requests
url = 'https://raw.githubusercontent.com/UWTMGIS/TGIS501_Files/master/GIS_is_the_best.txt?token=[InsertGitHubTokenHere]'
page = requests.get(url)

# Stop with an error if the request failed (e.g. a missing or invalid token)
page.raise_for_status()

# Read the response body as a text string
text = page.text

# Convert to lowercase (not needed for a plain word count, but useful later)
lowc = text.lower()

# Split on any whitespace to get a list of words; split() also ignores leading/trailing whitespace
wordlist = lowc.split()

# Count the length of the list
ct = len(wordlist)

# Print the resulting count
print 'The file contains', ct, 'words.'


The file contains 28177 words.


**That was pretty fun!**

But don't worry, we have something even more exciting in store. It turns out that you've recently become enamored with the unspeakable horrors and non-Euclidean geometries found in H.P. Lovecraft's prose. You've decided you want to investigate his work _The Shunned House_ to find two things: how many unique words he uses and how many times he uses the word "uncle".

You can find the text file for _The Shunned House_ in the same directory as your ode to GIS, [here](https://github.com/UWTMGIS/TGIS501_Files).

As a hint, you may want to look into the [collections module](https://docs.python.org/2/library/collections.html#collections).
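The sketch below shows how `collections.Counter` (available in Python 2.7 and 3) can answer both questions at once. It runs on a made-up sentence, not text from the story. In the lab, the words would come from the cleaned word list instead:

```python
from collections import Counter

# Made-up sample words standing in for the cleaned word list
words = "my uncle and the house and my uncle".split()

counts = Counter(words)        # dict subclass mapping word -> occurrences
unique_words = len(counts)     # one key per distinct word
uncle_count = counts["uncle"]  # missing keys return 0 instead of raising KeyError
top_two = counts.most_common(2)
```

Ties in `most_common` are not ordered alphabetically, so don't rely on which of several equally frequent words comes first.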


### Question 4

How many unique words does Lovecraft use in _The Shunned House_?

Like before, case does not matter, so "whisker" and "Whisker" would be the same. Make sure you strip out punctuation, by the by; otherwise you might end up with "whisker.", "whisker?", and "whisker" as separate words!
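The notebook cells below use the Python 2 form `str.translate(table, deletechars)`. Under Python 3, which the lab also allows, the characters to delete go into the table built by `str.maketrans` instead. Here is a minimal Python 3 sketch of a cleanup helper (the name `clean_words` is hypothetical). It lowercases the text, turns hyphens into spaces so hyphenated words split, drops punctuation, and splits into words:

```python
import string


def clean_words(text):
    """Lowercase text, strip ASCII punctuation, and return the list of words (Python 3)."""
    text = text.lower().replace("-", " ")  # "ghost-like" -> "ghost like" rather than "ghostlike"
    table = str.maketrans("", "", string.punctuation)  # third argument: characters to delete
    return text.translate(table).split()
```

`string.punctuation` covers ASCII punctuation only. If the text file uses curly quotes or other non-ASCII punctuation, those characters would survive and need to be added to the deletion string.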

In [2]:

# Import string module to strip punctuation when translating the text
import string

# Open the txt file (I use a local file here to demonstrate 'open' as well as the 'get' request above);
# 'with' closes the file automatically
with open("shunned_house.txt") as doc:
    # Read file as a text string
    textstr = doc.read()

# Convert to all lowercase
lowcase = textstr.lower()

# Replace dashes with spaces first ("-" is in string.punctuation, so deleting it would join hyphenated words)
nodash = lowcase.replace("-", " ")

# TODO: the 'ae' in 'Athanaeum' still shows up as a single unique word in the set; ignored for now

# Remove punctuation by translating the text with the string module (Python 2 str.translate signature)
remove_punct = nodash.translate(string.maketrans("",""), string.punctuation)

# Split string on whitespace to get a list of all words used
textlist = remove_punct.split()

# Drop tokens that are not purely alphabetic (e.g. numbers)
textlist2 = [i for i in textlist if i.isalpha()]

# Create a set from the word list to remove duplicates
unilist = set(textlist2)

# Use len to count the elements of the set (i.e. unique words)
ct_uni = len(unilist)

print "There are", ct_uni, "unique words in The Shunned House."
print "Note: this calculation counts possessive forms of a noun as unique from the noun itself (Joe vs. Joe's)."




There are 2943 unique words in The Shunned House.
Note: this calculation counts possessive forms of a noun as unique from the noun itself (Joe vs. Joe's).


### Question 5

How many times does Lovecraft use the word "uncle"? Again, case does not matter, and make sure you strip punctuation.
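As an alternative to the split-and-count approach in the cell below, the `re` module can count whole-word matches directly. This is a swapped-in technique, not the method used in the cell. A word-boundary pattern matches "uncle" alone or followed by an apostrophe, but not inside a longer word such as "uncles" or "carbuncle". The function names and the sample sentence are hypothetical:

```python
import re

# \b marks a word boundary; an apostrophe is not a word character, so "uncle's" matches too
UNCLE = re.compile(r"\buncle\b", re.IGNORECASE)


def count_uncle(text):
    """Count 'uncle' as a whole word, ignoring case; possessives like "uncle's" are included."""
    return len(UNCLE.findall(text))


def count_bare_uncle(text):
    """Count 'uncle' only when not followed by an ASCII apostrophe (i.e. excluding "uncle's")."""
    return len(re.findall(r"\buncle\b(?!')", text, re.IGNORECASE))
```

On the made-up line `"My Uncle's house; my uncle, and uncles; a carbuncle."` the first function finds two matches and the second finds one. If the story uses curly apostrophes, the lookahead would need to exclude that character as well.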

In [130]:

# Import string module to strip punctuation when translating the text
import string

# Open the txt file (local file, to demonstrate 'open' as well as the 'get' request above)
with open("shunned_house.txt") as doc:
    # Read file as a text string
    textstr = doc.read()

# Convert to all lowercase
lowcase = textstr.lower()

# Replace apostrophes with spaces to capture "uncle's" - CAREFUL, this splits it into 2 words ("uncle" + "s")
noapos = lowcase.replace("'", " ")

# Replace dashes with spaces first ("-" is in string.punctuation, so deleting it would join hyphenated words)
nodash = noapos.replace("-", " ")
# Alternate version that keeps apostrophes, so "uncle's" becomes "uncles" and is not counted
nodash_alt = lowcase.replace("-", " ")

# Remove punctuation by translating the text with the string module (Python 2 str.translate signature)
remove_punct = nodash.translate(string.maketrans("",""), string.punctuation)
remove_punct_alt = nodash_alt.translate(string.maketrans("",""), string.punctuation)

# Split string on whitespace to get a list of all words used
textlist = remove_punct.split()
textlist_alt = remove_punct_alt.split()

# Count exact matches of "uncle" in each list
uncle = textlist_alt.count("uncle")   # bare word only
uncles = textlist.count("uncle")      # bare word plus possessive

print "Lovecraft uses the word 'uncle'", uncles, "times, if you include the possessive (uncle's)."
print "Otherwise, just the word 'uncle' is used", uncle, "times."


Lovecraft uses the word 'uncle' 39 times, if you include the possessive (uncle's).
Otherwise, just the word 'uncle' is used 26 times.


### Bonus Questions (+1 pt each)

These questions are _meant_ to be hard. I will only give you limited help with these. You will find some questions harder than others, you will find some questions more interesting than others. All of the questions are possible.

#### Bonus Question 1
Excluding prepositions and articles ("from", "the", "an", "with", etc.), what are the five most frequently used words in _The Shunned House_?

In [119]:

# Import string module to strip punctuation when translating the text
import string

# Import Counter to tally how often each word occurs
from collections import Counter

# Open the txt file (local file, to demonstrate 'open' as well as the 'get' request above)
with open("shunned_house.txt") as doc:
    # Read file as a text string
    textstr = doc.read()

# Convert to all lowercase
lowcase = textstr.lower()

# Replace dashes with spaces first ("-" is in string.punctuation, so deleting it would join hyphenated words)
nodash = lowcase.replace("-", " ")

# Remove punctuation by translating the text with the string module (Python 2 str.translate signature)
remove_punct = nodash.translate(string.maketrans("",""), string.punctuation)

# Split string on whitespace to get a list of all words used
textlist = remove_punct.split()

# Raw count used first to assemble the filter list below:
# totals = Counter(textlist)
# print totals

# Prepositions and articles to exclude, plus a few short conjunctions and pronouns (a set makes lookups fast)
filterwords = set(['the', 'a', 'an', 'and', 'of', 'in', 'to', 'that', 'it', 'which', 'with', 'as',
                   'from', 'at', 'for', 'by', 'on', 'or', 'but', 'not', 'all'])

# Keep only the words that are not in the filter set
filteredlist = [w for w in textlist if w not in filterwords]

# Use Counter() to count occurrences (a dict subclass of word: count pairs)
counted = Counter(filteredlist)

# Print the 5 most common remaining words
print "5 most common words, along with # of occurrences:", counted.most_common(5)
print "(Excludes prepositions, articles, and a few short conjunctions and pronouns)"


5 most common words, along with # of occurrences: [('i', 177), ('was', 155), ('had', 134), ('my', 101), ('house', 58)]
(Excludes prepositions, articles, and a few short conjunctions and pronouns)


#### Bonus Question 2
Write a script that asks for a number, checks to make sure that it is in fact a number, and then finds that number’s square root (within an error of .0001). __Do not use any built-in commands that find square roots (such as sqrt() or x**(1/2)). You must build the script using only multiplication, division, addition, and subtraction (you may also use absolute value).__

Pay attention to how many iterations it takes to solve, and try to minimize that number. (By the way, there is a 'best' solution here; try not to look it up.)

In [136]:
num = raw_input("Enter a number, and I will guess its square root using inefficient iterations:")
print num


Enter a number, and I will guess its square root using inefficient iterations:298372958
298372958
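The cell above only echoes the input. Below is a minimal sketch of the full task using bisection, which repeatedly halves an interval known to contain the root. Bisection is a swapped-in technique chosen for clarity. It is not claimed to be the 'best' solution the question hints at, which would need fewer iterations. `parse_number` and `bisect_sqrt` are hypothetical names, and the code runs under Python 2 or 3:

```python
def parse_number(text):
    """Return text as a float, or None if it is not a valid non-negative number."""
    try:
        value = float(text)
    except ValueError:
        return None  # not a number at all, e.g. "abc"
    # Reject negatives, NaN (NaN is never equal to itself) and infinity
    if value < 0 or value != value or value == float("inf"):
        return None
    return value


def bisect_sqrt(x, tol=0.0001, max_iterations=200):
    """Approximate the square root of x by bisection, using only +, -, * and /.

    Returns (estimate, iterations). The true root always lies in [low, high],
    so once the interval is narrower than tol the midpoint is within tol of it.
    max_iterations guards against huge x, where floats cannot resolve tol.
    """
    low = 0.0
    high = x if x > 1 else 1.0  # the root is at most x when x >= 1, and at most 1 otherwise
    iterations = 0
    while high - low > tol and iterations < max_iterations:
        mid = (low + high) / 2.0
        if mid * mid > x:
            high = mid  # mid is too big: the root is in the lower half
        else:
            low = mid   # mid is too small (or exact): the root is in the upper half
        iterations += 1
    return (low + high) / 2.0, iterations
```

For example, `bisect_sqrt(parse_number("298372958"))` needs about 40 halvings, because each step only halves the error bound. Counting those steps is a useful baseline when you try to minimize iterations as the question asks.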


Once your lab is done, either as a series of separate scripts named in the form LastName_QuestionNumber.py or within this document itself (a .ipynb file), upload it to the [lab2](https://github.com/UWTMGIS/TGIS501_W18/tree/master/lab2) area of the repository.