In [1]:
print('Hello Class!')

Hello Class!


# Lab 2
## Solving Problems with Python

This lab is meant as a first step into the wonderful world of scripting. We're going to be focusing on solving problems using a fundamental element of flow control: the if/then statement.

The second half of this lab and next week's class will go over iteration and flow control (loops, etc.), but for the first half, you just need to think about the statement like this: "*If* something is true, *then* do something; *if not* do something else."

The best way to understand that is with some hands-on examples, so let's get started.

# But, first some housekeeping.

This lab should appear to you as a series of markdown and code boxes with which you can interact. **If** it appears as a statically rendered page (try clicking on a box), then you are not loading it within jupyter or arcgis. Let's talk about how you set up environments for a moment.


## The following is how to set up your lab using anaconda. If you are using arcgis, please skip to the next section.
I like to create a new environment for each lab. This keeps everything nice and tidy and makes sure that any conflicts between versions of libraries are handled 'automagically' (*usually*). But, I've also had students create a single environment for each class. Feel free to do either. For now, let's talk about how you create an environment.

**I am assuming that you have anaconda installed and are able to get to the anaconda prompt**. If that is not the case, [go here and do so](https://www.anaconda.com/download/).

With your prompt staring at you, you can create an environment like so:
`conda create -n lab1 python=3.7`

That tells conda to create an environment named (`-n`) `lab1` and to install python version 3.7 with it. You could just as easily give it another name or a different version of python. Anaconda will go out and find all the packages necessary to do this and ask you to confirm that you want to install them; you do. **Note: If you are using the ESRI python prompt, you may get an error if you try to use a version of python other than 3.8**

Ok, now once that's completed, it's time to activate your environment. You do so like so:

**In windows:** `activate lab1`

**In mac/linux:** `source activate lab1`

You should now have `(lab1)` to the left of your prompt. This means that you are in the environment you created. 

Now it's time to install the packages we'll need for this lab. In this case, we only need jupyter notebooks, so go ahead and type:
`conda install jupyter`

This tells Anaconda to go out and find all the packages needed to run jupyter notebooks. You'll be asked to confirm that you want to install them; say yes again.

Once ***all that*** is done, type in `jupyter notebook` and a notebook application should launch in your web browser. At that point, simply navigate to this file and open it again. **Boom**, you're ready to go.

#### This may seem like a lot, but with time it will become like second nature. I recommend taking some time now to play with Anaconda and Jupyter; they are both extremely useful tools that, along with github, will help you organize and create amazing scripts.


---


## Here are the instructions if you are choosing to remain within Arcgis (and its version of anaconda and jupyter).

ESRI, in their vast wisdom, both installs a version of anaconda to manage packages within Arcgis Pro's python environment(s) **and** now allows you to open notebooks directly within ArcGIS Pro.

If you want to use the command prompt, look for what ESRI calls the "Python Command Prompt" in your Windows Menu. It will load into what amounts to an anaconda prompt, but places you in a default `(arcgispro-py3)` environment. This means that anything you install (unless you change environments) will be included in the environment used by Arcgis Pro. This has benefits and dangers! You can both extend the functionality of python with ArcGIS Pro *and* potentially install something that doesn't work with core features of ArcGIS Pro. Fun!

Once you have the command prompt open, go to the directions above for anaconda. They are identical because... ESRI is just using a version of anaconda! Sneaky!

If you prefer to stick entirely to the GUI (not recommended, but you should find the path for coding that works best for you), **pages 229 to 250** of your textbook cover this topic. You can also read the ESRI documentation [here](https://pro.arcgis.com/en/pro-app/latest/arcpy/get-started/pro-notebooks.htm). 

---

#### Of course, I'll be showing you **all** methods in class, so pay attention, review the videos, and read the documentation! 

#### Finally, think about your workflow. How will you *pull* things down from our class repository and *push* them up when you are done? **Set up your workflow now!**

# Still with me? Great. Let's look at `if` statements


In [3]:
x = 20

print("Checking your number...")
if x > 4:
    print('The number is large!')


Checking your number...
The number is large!


### Ok, ok, ok.
That's pretty simple. True, but throughout this course you're going to learn how to use *relatively* simple concepts to break apart *seemingly* difficult problems to save time and energy and to accomplish some really neat things.

As you can see, the format for an if/then statement in python looks like:
```python
if [something]:
    [do something]
```
This seems trivial, but it's actually fairly fundamental - `if` allows you to tell a script to evaluate `[something]` and to perform a task **conditional upon that evaluation**.

Think about it for a moment: If a set of data isn't in the right projection, reproject it. If files aren't in a directory, go get them. Etc.
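As a minimal sketch of the "if files aren't in a directory, go get them" idea, the snippet below checks whether a file exists and creates a placeholder when it doesn't. The file name `roads.txt` and the helper `ensure_file` are hypothetical, and writing a placeholder is only a stand-in for whatever "go get them" would mean in a real project.

```python
import tempfile
from pathlib import Path


def ensure_file(path):
    """Return 'found' if path exists; otherwise create a placeholder and return 'created'."""
    path = Path(path)
    if path.exists():
        return 'found'
    # Stand-in for "go get them": a real script might download or copy the data here.
    path.write_text('placeholder data\n')
    return 'created'


# Work in a throwaway directory so nothing on your machine is touched.
with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / 'roads.txt'  # hypothetical file name
    first_check = ensure_file(target)    # the file is missing, so it gets created
    second_check = ensure_file(target)   # now the file is there
    print(first_check, second_check)
```

The whole decision hangs on one `if`: the script only does the extra work when the condition says it is needed.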

### Of course, what if we want it to do something `else`?


In [1]:
x = 10

print('Checking your number...')
if x > 20:
    print('What a big number!')
else:
    print("Not quite big enough, I'm afraid")

Checking your number...
Not quite big enough, I'm afraid


### Simple enough.

Now, let's introduce `elif` which stands for "else if..."

Basically, this allows you to chain a bunch of `if` statements together. Bear in mind: **ONLY THE FIRST STATEMENT THAT EVALUATES TO TRUE WILL BE PERFORMED**.

Let's look at this:

In [7]:
x = 16

print('Checking your number...')
if x > 20:
    print('What a big number!')
elif x > 15:
    print('Pretty big!')
elif x > 10:
    print('This is a fine number')
else:
    print('This number is too small')

Checking your number...
Pretty big!


#### Note how *only* the first `elif` (the second condition checked) was performed.

In other words, even though x **is** greater than 10 as well, only the code after `x > 15` was performed.

**This is important, so I'll say it again in big letters:**
### Only the first statement that evaluates to true will be performed.
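To see why this matters, here is a small side-by-side sketch. The two function names are made up for illustration: one uses the `if`/`elif`/`else` chain from the cell above, and the other uses separate `if` statements, where every true condition runs.

```python
def describe_with_elif(x):
    """Chained if/elif/else: at most one branch runs."""
    if x > 20:
        return ['What a big number!']
    elif x > 15:
        return ['Pretty big!']
    elif x > 10:
        return ['This is a fine number']
    else:
        return ['This number is too small']


def describe_with_separate_ifs(x):
    """Independent if statements: every true condition runs."""
    messages = []
    if x > 20:
        messages.append('What a big number!')
    if x > 15:
        messages.append('Pretty big!')
    if x > 10:
        messages.append('This is a fine number')
    return messages


print(describe_with_elif(16))          # one message
print(describe_with_separate_ifs(16))  # two messages, because 16 > 15 and 16 > 10
```

If you ever get more output than you expected from a chain of conditions, check whether you wrote `if` where you meant `elif`.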

And with that, we're ready to start our lab. *No, seriously,* this is about learning to break apart difficult problems using relatively simple computational tools. We're ready to start!

### Finally, remember to review your readings for this week (found on Canvas). They cover all of this material directly.

## Problem 1: Can I sleep?

In this fairly classic problem (so, *yes*, if you feel like it you can easily look up the answer), I'm going to give you three lists. Each list contains two values. The first value is whether it is a weekday or not. The second value is if I am on vacation or not. Your goal is to output whether or not I can sleep in.

An example will help.

`[True, False]` would mean that it is a weekday and I am not on vacation. **Uh oh, better get up!**

`[False, False]` would mean I am not on vacation, but it also isn't a weekday. **Sleep in!**

Now, of course, all of this is a bit contrived as with kids there is never any sleeping in, but it's an exercise in evaluating boolean logic.
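When a problem boils down to a couple of True/False values, it can help to print every combination before writing the real solution. The sketch below assumes the usual rule for this puzzle (sleep in unless it is a weekday and I am not on vacation); the function name `can_sleep_in` is made up for illustration.

```python
from itertools import product


def can_sleep_in(weekday, vacation):
    """Sleep in unless it is a weekday and I am not on vacation (assumed rule)."""
    return (not weekday) or vacation


# product([True, False], repeat=2) yields every (weekday, vacation) pair.
for weekday, vacation in product([True, False], repeat=2):
    print(weekday, vacation, can_sleep_in(weekday, vacation))
```

With only two inputs there are just four rows to check, which makes it easy to confirm the logic by eye.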

In [60]:
x = [False, True]
y = [True, True]
z = [True, False]

# Loop over each [weekday, vacation] pair directly instead of indexing.
for weekday, vacation in [x, y, z]:
    # The only time I have to get up is a weekday with no vacation.
    if weekday and not vacation:
        print('Uh oh, better get up!')
    else:
        print('Sleep in!')

Sleep in!
Sleep in!
Uh oh, better get up!


## Problem 2: Odd addition.

Another classic with a twist. Here, I'm going to give you three lists of two numbers. What I want you to do is write a script that checks each pair of numbers and *if both numbers are odd* multiplies them and prints the result. Otherwise, it prints the result of adding both numbers.

Example: `[3, 5]` would return `15`.

While: `[3, 6]` would return `9`.

Hint: Look up the modulo operator in python (it's in your lecture as well).
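For reference, here is a quick sketch of what the modulo operator (`%`) does. The helper name `is_odd` is just for illustration.

```python
# The % operator returns the remainder after division.
print(7 % 2)   # 7 = 3 * 2 + 1, so the remainder is 1
print(10 % 2)  # 10 = 5 * 2 + 0, so the remainder is 0
print(17 % 5)  # 17 = 3 * 5 + 2, so the remainder is 2


def is_odd(n):
    """An integer is odd when dividing by 2 leaves a remainder of 1."""
    return n % 2 == 1


print(is_odd(3), is_odd(6))
```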

In [62]:
a = [7, 9]
b = [10, 3]
c = [2, 5]

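# Unpack each pair directly instead of indexing into a list of lists.
for first, second in [a, b, c]:
    # A number is odd when dividing by 2 leaves a remainder of 1.
    if first % 2 == 1 and second % 2 == 1:
        print(first * second)
    else:
        print(first + second)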


63
13
7
a = 63
b = 13
c = 7


## Problem 3: Twenty One.

Here's a nice relaxing final problem.
I'll give you two numbers. Return `True` if they add up to 21 and `False` if they don't; however, if the numbers are identical, return the string "Split".

In [63]:
p = [19, 2]
q = [7, 7]
r = [4, 6]

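for first, second in [p, q, r]:
    # Check for identical numbers first so a pair like [7, 7] prints 'Split'.
    if first == second:
        print('Split')
    else:
        # The comparison itself evaluates to True or False.
        print(first + second == 21)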


True
Split
False


## Problem 4: Is it even?

Write a script that asks the user for a number and then prints out whether the number is even or not. Hint: Think about if/then statements and how you check for the remainder of a division.

Note: For this problem, assume that the user inputs an actual number.

**Bonus Possible:** Don't assume your user gives a valid input. Instead, build some form of error catching that continues to ask for an input *until* the user gives a number. **+1 pt**
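One possible approach to the bonus, sketched under a few assumptions: the helper name `ask_for_int` and its `read` parameter are made up here, and `read` exists only so the example can be run with simulated answers instead of a live prompt. The idea is to try the conversion and ask again whenever it fails.

```python
def ask_for_int(prompt, read=input):
    """Keep asking until the reply converts cleanly to an integer."""
    while True:
        reply = read(prompt)
        try:
            return int(reply)
        except ValueError:
            # int() raises ValueError for text such as 'seven' or '4.5'.
            print('That was not a whole number, please try again.')


# Simulate a user who types two bad answers before a good one.
fake_replies = iter(['seven', '4.5', '-12'])
number = ask_for_int('Please enter a number! ', read=lambda prompt: next(fake_replies))
print('Even' if number % 2 == 0 else 'Odd')
```

In a notebook you would simply call `ask_for_int('Please enter a number! ')` and type your answers. Unlike a digits-only check, this version also accepts negative numbers.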

In [1]:
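number = input('Please enter a number! ')

# Keep asking until the input is made up only of digits.
# The reply must be stored back in `number`, or the loop never ends.
while not number.isdigit():
    number = input('Please enter a number! ')
number = int(number)
if number % 2 == 0:
    print('Even')
else:
    print('Odd')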


Please enter a number! 5
Odd


### Bonus Problem 1: Find the square root (+2 pts possible)

Write a script that asks the user to input a number and then finds the square root of said number within .001. You may assume that the input is a valid number; however, you may not use the built in commands to find square roots (such as `sqrt()` or `x**(1/2)`). **Instead, you must use only multiplication, division, addition, and subtraction (you may also use absolute value)**.

Pay attention to how many iterations your solution takes. There is an optimal solution here; however, don't look it up, instead spend some time thinking about how you might 'guess' a number. Use nested loops if you need to. Partial credit is not possible.

In [70]:
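number = float(input('Please enter a number! '))

# Babylonian method (assumes a non-negative input): repeatedly average the
# guess with number / guess until guess * guess is within 0.001 of number.
guess = number if number > 1 else 1.0
while abs(guess * guess - number) >= 0.001:
    guess = (guess + number / guess) / 2
print(round(guess, 3))

Please enter a number! 4
2.0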


# Checkpoint 1
#### You should have everything up to this point completed by the start of the second class (1/21/21). It's ok if some things aren't quite working yet, but you really should be at this point if at all possible. 
#### You will receive 5 points (out of a total of 15 for the lab) for being at this checkpoint.

---

# Lab 2 - Week 2
## Iteration, data parsing, and the like

This week, we're not focusing on spatial data (or arcgis) quite yet, but instead really diving into some of the core concepts of coding and automation - namely the ability to parse and manipulate data through iteration and flow control.

As always, spend some time reading through the lab and thinking about how to break apart each 'chunk' of the problem into something small. Some of these problems will have immediate applied uses, while others will simply be asking you to think computationally – about what you can and cannot solve using Python and how it might be used in a variety of generalized tasks. You will also gain familiarity with the specific syntax of the Python language.

There are a wide variety of articles, guides, tutorials, and reference materials available on the Python language. You’re encouraged to read through many of these and refer to them when you run into difficulty. Make sure you understand why any solution works or you will run into significant difficulties later. **But, never forget your assigned readings and lectures. They will be extremely useful as references**.

As always, you'll turn your lab in [here](https://github.com/UWTMGIS/TGIS501_w21_Students/tree/master/lab2). File format remains `[lastname]_lab2.ipynb`

### Ready, then let's goooooo.

## Problem 5: Is GIS really *the best*?

### Part 1
In the files repository, you'll find a file called `GIS_is_the_best.txt`. You're going to open that file and count the **total** number of words in it.

There are a few approaches to this. You can load the file directly from the web (which we'll cover in a later lab, but if you feel like diving in, check out this [stack overflow discussion](https://stackoverflow.com/questions/1393324/in-python-given-a-url-to-a-text-file-what-is-the-simplest-way-to-read-the-cont) or [this module](http://docs.python-requests.org/en/master/). I find the latter easier to work with, but your mileage may vary). **Alternatively**, simply move the file to a local directory and open it like so:

```python
with open('file location', 'r') as file:
    #do some stuff
```

Then, you need to count the total number of words used.
A few hints for this:
1. Here's a nice [tutorial](https://www.pythonforbeginners.com/files/reading-and-writing-files-in-python) on dealing with text files.
2. Remember list comprehension (**check your readings!**): how are you going to count words?
3. the .upper(), .lower(), and .split() methods all might be useful.
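To make the hints concrete without giving away the file itself, here is a small sketch that counts words in a made-up piece of text. It uses `io.StringIO` as a stand-in for an open file and a generator expression (a close cousin of the list comprehension mentioned above); the sample text is invented.

```python
import io

# Stand-in for an open text file; io.StringIO can be looped over line by line.
sample = io.StringIO('GIS is the best\nGIS is great\n\nReally\n')

# .split() with no arguments splits on any run of whitespace,
# so blank lines contribute zero words.
total_words = sum(len(line.split()) for line in sample)
print(total_words)
```

Swapping `sample` for a real file object opened with `with open(...)` should work the same way, since both can be iterated line by line.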

#### In summary, your next cell should load up the text file and print out the total number of words in it (it should be 28,177).

In [12]:
with open('GIS_is_the_best.txt', 'r') as file:
    x = 0
    for line in file:
        words = line.split()
        x = x + len(words)
    print(x)


28177


### Part 2

Now it's time to find out what the most common word in that file is. This is a fairly fundamental task of parsing data, so it's good practice (also, I'll ask you to do it again in a slightly more complicated way below). 

Open the file again and, *this time*, return the two most common words **and** how many times they have been used.

There's one additional thing to note: capitalization doesn't matter. So, GIS, gis, and Gis should all be counted the same. Similarly, Geospatial and geospatial would be the same. Here, the .upper() or .lower() method can help you.
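Here is a small sketch of why lowercasing first matters, using `collections.Counter` from the standard library and a made-up sample string. This is one possible approach, not the only one; a plain dictionary works just as well.

```python
from collections import Counter

text = 'GIS gis Gis geospatial Geospatial maps'  # made-up sample text

# Lowercase first so that different capitalizations land on the same key.
counts = Counter(text.lower().split())
top_two = counts.most_common(2)
print(top_two)
```

Without `.lower()`, 'GIS', 'gis', and 'Gis' would each be counted once as three separate words.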

#### In summary, your next cell should output the two most common words in the text file and how many times each appears.

In [9]:
with open('GIS_is_the_best.txt', 'r') as file:
    freq = {}
    for line in file:
        words = line.lower().split()
        for word in words:
            if word in freq:
                freq[word] += 1
            else:
                freq[word] = 1
reverse = sorted(freq.items(), key = lambda x: x[1], reverse = True)
top_2 = reverse[:2]
print(top_2)

[('geographic', 7545), ('information', 7545)]


## Problem 6: A 'love' of Lovecraft

Ok, first, let's get it out of the way: H.P. Lovecraft was a disgusting racist. We're using this text because it's in the public domain and uses a lot of unique words, but let's not take our eyes off the prize; this is not an endorsement of the author.

With that out of the way, you'll find a .txt copy of *The Shunned House* in the same files repository where you found this lab. Just like before, you can pull the files down locally or (try to) access them through the web (which will be covered in detail in a later lab).

### Part 1

Here, I want you to count the **number of unique words** in the text. 
A few directions:
1. Case **does not** matter - so 'Whisker' should be the same as 'whisker.'
2. Make sure you strip out punctuation - otherwise 'whisker?' and 'whisker!' will come back as different words.
3. Plurals are different words in this exercise - 'whiskers' and 'whisker' count separately.

A few hints:
1. Check out the [collections](https://docs.python.org/2/library/collections.html#collections) module.
2. One way to do this would be to create a dictionary of each word and then check the dictionary's length.
3. You should get around 3,000 words depending on what assumptions you bake into this. There's no austere 'right' answer as the directions are a bit opaque intentionally (design decisions matter!).
4. If you want to get *real* fancy, students in the past have made use of NLTK to do this. If that seems overwhelming, *don't worry!* Remember, different people will have different familiarities with python. Right now, **you have all the tools you need to do this**, there's just always 'more' possible!
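As a hedged illustration of the punctuation direction, the sketch below strips punctuation from a made-up sentence with `str.translate` and counts unique words with a `set`. Note the design decision baked in: removing apostrophes turns "cat's" into "cats", which may or may not be what you want.

```python
import string

sample = "Whisker? whisker! The cat's whiskers -- a whisker."  # made-up sample

# str.maketrans('', '', chars) builds a table that deletes every character in chars.
remove_punctuation = str.maketrans('', '', string.punctuation)

cleaned = sample.lower().translate(remove_punctuation)
unique_words = set(cleaned.split())
print(len(unique_words), sorted(unique_words))
```

A set keeps only one copy of each item, so its length is the number of unique words; a dictionary's keys behave the same way.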

#### In summary, your next cell should output how many unique words Lovecraft uses in the copy of the Shunned House I have provided. 

In [10]:
import re

freq = {}
with open('shunned_house.txt', 'r') as file:
    for line in file:
        # Strip punctuation so 'whisker?' and 'whisker!' count as the same word.
        line = re.sub(r"[-()\"#/@;:<>{}`+=~|.!?,]", "", line)
        words = line.lower().split()
        for word in words:
            if word in freq:
                freq[word] += 1
            else:
                freq[word] = 1
print(len(freq))

3063


### Part 2

Excluding prepositions and articles ('from,' 'the,' 'an', 'with', 'a', etc.), what are the five most frequently used words in *The Shunned House* and how many times does each appear?

#### In summary, your next cell should output the five most common words in the text provided, leaving out prepositions and articles.
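One way to skip unwanted words is to keep them in a `set` and filter before counting. The sketch below uses a made-up sentence and a deliberately incomplete exclusion list; which words belong on that list is a design decision you will need to make yourself.

```python
from collections import Counter

words = 'the house and the cellar of the house with a door'.split()  # made-up sample

# A set makes the "is this word excluded?" check fast and easy to read.
excluded = {'a', 'an', 'and', 'from', 'of', 'the', 'with'}  # deliberately incomplete

kept = [word for word in words if word not in excluded]
print(Counter(kept).most_common(2))
```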

In [6]:
import re

freq = {}
with open('shunned_house.txt', 'r') as file:
    for line in file:
        # Strip punctuation, then lowercase so case does not matter.
        line = re.sub(r"[-()\"#/@;:<>{}`+=~|.!?,]", "", line)
        words = line.lower().split()
        for word in words:
            if word in freq:
                freq[word] += 1
            else:
                freq[word] = 1

# Words to leave out of the ranking (a short, hand-picked list).
keys = ['a', 'the', 'and', 'an', 'with', 'to', 'of', 'in']
freq = {key: freq[key] for key in freq if key not in keys}

# Sort by count, largest first, and keep the top five.
reverse = sorted(freq.items(), key=lambda x: x[1], reverse=True)
top_5 = reverse[:5]
print(top_5)

[('i', 175), ('was', 153), ('that', 134), ('had', 132), ('it', 122)]


### Part 3

Ok, last Lovecraft; I promise.

Last time, we counted the words he loved to use and how he used them. But, Lovecraft is also known for his lugubrious sentences, the long-winding means by which he tells us his terrible tales. I’m interested in how many characters his average sentence is. I’d also like a printed copy of the longest sentence and the shortest sentence in The Shunned House.

Think about how you opened the file, read it, and parsed it into a list last time… *What means do you have to count the character length of objects? What format do those objects have to be?* Etc.

This question will help you think about ways to parse through and iterate over different types of data sets (in this case a text file).

Hint: In the past, some students have used [nltk](https://pythonspot.com/category/nltk/) to solve this problem; **it is not necessary**.
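As a rough sketch of the parsing idea, the example below splits a made-up paragraph into sentences with a regular expression and measures each one in characters with `len()`. The splitting rule is an assumption and is far from perfect: abbreviations such as "a.m." or "Dr." would be split incorrectly.

```python
import re

sample = 'It was dark. Was the house empty? No! Something stirred in the cellar.'  # made-up sample

# Split after ., ?, or ! when followed by whitespace. This is a rough rule:
# abbreviations such as "a.m." or "Dr." would be split incorrectly.
sentences = [s.strip() for s in re.split(r'(?<=[.?!])\s+', sample) if s.strip()]

lengths = [len(s) for s in sentences]  # len() of a string counts characters
average = sum(lengths) / len(lengths)

print(average)
print(max(sentences, key=len))
print(min(sentences, key=len))
```

The `key=len` argument tells `max` and `min` to compare sentences by their length rather than alphabetically.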

In [7]:
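with open('shunned_house.txt', 'r') as file:
    # Split the text into rough "sentences" at every period.
    sentences = file.read().split('.')

# Longest and shortest sentence by character count.
longest = max(sentences, key=len)
shortest = min(sentences, key=len)

# Total number of words across all sentences.
total_words = 0
for sentence in sentences:
    total_words += len(sentence.split())

# Note: this is the average number of words per sentence;
# sum len(sentence) instead to average characters.
average = total_words / len(sentences)

print(average)
print(longest)
print(shortest)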

31.218023255813954
 A weak, 
filtered glow from the rain-harassed street-lamps outside, and a feeble 
phosphorescence from the detestable fungi within, showed the dripping 
stone of the walls, from which all traces of whitewash had vanished; 
the dank, fetid and mildew-tainted hard earth floor with its obscene 
fungi; the rotting remains of what had been stools, chairs, and tables, 
and other more shapeless furniture; the heavy planks and massive beams 
of the ground floor overhead; the decrepit plank door leading to bins 
and chambers beneath other parts of the house; the crumbling stone 
staircase with ruined wooden hand-rail; and the crude and cavernous 
fireplace of blackened brick where rusted iron fragments revealed the 
past presence of hooks, andirons, spit, crane, and a door to the Dutch 
oven--these things, and our austere cot and camp chairs, and the heavy 
and intricate destructive machinery we had brought
 m


### Bonus Problem 2: Vowel Squares (+2 pts possible)
This is a bit of a 'classic' problem. I'm going to give you a two-dimensional matrix of letters. You need to analyze said matrix to see if within it there exists a 2 by 2 grid of all vowels (hint: there is). If so, you print out the members of that grid; if not, you print out "No match found."

In other words, I'm going to give you something like this:

a b c d e f

g h i j k l

**a a** k e v o

**i o** e r p z

And, you'd need to return a a i o.

**THIS IS HARD**. Work together! Don't look up a solution, do think about how you might parse through this. There are many solutions; one I might consider is creating a new matrix that simply records if something is a vowel or not (i.e. a 1 value for a vowel, a 0 for not, etc.). *Partial credit is possible*.

In [3]:
letters = [['a', 'j', 'k', 'e', 'i'], ['b', 'o', 'e', 'n', 'a'], ['u', 'i', 'a', 'z', 'i'] ]
import re
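The cell above stops after defining the matrix, so here is one possible sketch of a solution. It does not build the 1/0 matrix suggested earlier; instead it checks every 2 by 2 block directly. The function name `find_vowel_square` is made up, and the code assumes a rectangular grid of lowercase letters.

```python
VOWELS = set('aeiou')


def find_vowel_square(grid):
    """Return the first 2 by 2 block of vowels (row-major order), or None."""
    for r in range(len(grid) - 1):
        for c in range(len(grid[r]) - 1):
            block = [grid[r][c], grid[r][c + 1],
                     grid[r + 1][c], grid[r + 1][c + 1]]
            if all(letter in VOWELS for letter in block):
                return block
    return None


letters = [['a', 'j', 'k', 'e', 'i'],
           ['b', 'o', 'e', 'n', 'a'],
           ['u', 'i', 'a', 'z', 'i']]

square = find_vowel_square(letters)
print(' '.join(square) if square else 'No match found.')
```

The loops stop one row and one column early because each block reaches one step down and one step to the right of its top-left corner.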



---

# No-credit, fun times
## Want to build a chatbot?

This isn't really related to anything we're doing in class per se, but it's a fun example of how powerful libraries can be in python.

In this case, I'm going to show you how to use the library [markovify](https://github.com/jsvine/markovify) to generate 10 short sentences from any lengthy piece of text you want to (I'm always partial to Capital Vol 1, but you can use the Lovecraft selection you already have, or anything you want - it just has to be pretty long). I'll be using the Lovecraft here, just because it's on hand and you'll be able to follow note for note.

Markovify has directions at the link above which are pretty straightforward. But, I'll also give you example code below. Poke at it, change some things up! Did you know you can combine texts (and even weight them differently) to produce different 'voices'? This is an incredibly simple model, but it's a lot of fun. If you'd like to see where the state of the art is right now, check out [GPT-3](https://github.com/openai/gpt-3) or the API for it [here](https://openai.com/blog/openai-api/).

Once you get the hang of it (changing texts, sentence length, etc.), go look at the documentation linked above. You'll find lots of ways to tweak the output - from changing the depth of the search (roughly how many words must be strung together in the model) to weighting different texts differently.
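If you're curious what "how many words must be strung together" means, here is a toy sketch of the general idea behind a Markov text model. To be clear, this is not markovify's actual implementation; the function names, the `state_size` parameter as used here, and the sample text are all made up for illustration.

```python
import random
from collections import defaultdict


def build_chain(text, state_size=1):
    """Map each run of state_size words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - state_size):
        state = tuple(words[i:i + state_size])
        chain[state].append(words[i + state_size])
    return chain


def generate(chain, length=8, seed=0):
    """Walk the chain from a random starting state for up to length extra words."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    output = list(state)
    for _ in range(length):
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state only appeared at the very end of the text
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return ' '.join(output)


sample = 'the house was old and the house was dark and the cellar was damp'
print(generate(build_chain(sample, state_size=1)))
print(generate(build_chain(sample, state_size=2)))
```

A larger state size makes the output stick more closely to the original text, because the model needs a longer exact match before it can choose the next word.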

In [1]:
import markovify


In [3]:
#note I'm assuming that the text file is in the same directory as the notebook; this need not be the case.
with open('shunned_house.txt') as f:
    text = f.read()
    
model = markovify.Text(text)

for i in range(10):
    print(model.make_short_sentence(240))
    print()

The space south of the house was repulsively damp even in dry weather, and in time I yielded to my eyes.

It was of this sinister vegetation, but at the window.

In 1780, as a symbol of all dreamable hideousness which the family data my uncle in his sleep attracted my notice.

There are horrors beyond horrors, and this was one of those vital processes, worn as they were by eighty- one years of deposit had shrouded and festooned into monstrous and hellish shapes.

The Roulets, it seemed, had come in 1686, after the revocation of the wild ancient tales of the fungus-ridden earth steamed up a precipitous lawn from the placid sidewalk outside.

He died the next morning at the cellar directly upon Benefit Street, preferring to have a more immediate access to the fungous floor where a pool of greenish grease was spreading, it seemed to hear of the house proper.

As I turned my electric flashlight on him and his wife Rhoby Dexter, with their children, Elkanah, born in 1755, Abigail, born in 1

---

The next step, for example, might be to set up a listener to a twitter account that monitors for either someone mentioning the author or someone tweeting at the account or... whatever. Then your little bot could respond. We'll get to how you might do that later in the course. For now, just have some fun.