# Assignment 1 – variables, types, booleans, conditionals

**Questions? Drop 'em in the Slack channel #questions**

## Introduction: Operationalizing Linguistic Research

Often in the humanities we deal with subjective concepts, things which are not directly measurable. Linguistics, though, sits somewhere on the border of subjective and objective. In quantitative linguistics, we try to use objective data to clarify issues once thought to be only subjective. In doing so, we have to learn how to "operationalize" our research questions. 

To operationalize means to condense a complex question into a simpler one that can be addressed with empirical data (Stefanowitsch 2010). It's possible that we lose some nuance in the process. But we also enable progress to be made within a specific niche. An accumulation of this progress eventually allows us to re-evaluate our theoretical starting point. 

Consider, for instance, the following two research questions:

> (1) In world languages, does the verb serve as the syntactic head of the sentence?

> (2) In world languages, how predictive is a verb for other arguments in a sentence?

The first question is largely subjective, and linguists continue to disagree on it (e.g. Croft, *Radical Construction Grammar*, 2001). There is no way to test that hypothesis with empirical data alone. 

The second question, on the other hand, can be answered with some Python, linguistic annotations, and a basic knowledge of statistics. And even though the answer to that question is simpler, it still has broader implications for question 1.

<img src="../images/research_operationalization.png" height="500px" width="500px"><br>
*From question, to Python, to data, to theories*
    
**Further reading**<br>
[Anatol Stefanowitsch. 2010. "Empirical Cognitive Semantics"](https://pdfs.semanticscholar.org/5237/c136ec1f1c09c42b7fb9ffc7b36a5f217489.pdf#page=364)

<hr>

## Exercise Brief: From Strings to Integers

In the following exercises, we will practice miniature operationalizations, converting a small linguistic question into Python code that produces counts from which we can form theories. In Pythonic terms, we move from strings to integers.

Besides strings and integers, the following exercises will also test your knowledge of booleans and conditionals.
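As a tiny illustration of that move from strings to integers, here is a sketch on a made-up sentence (the names `sentence` and `n_the` are hypothetical): counting a substring takes a `str` in and gives an `int` out.

```python
# A made-up example sentence (hypothetical)
sentence = 'the cat saw the dog'

# Counting turns text (a str) into a number (an int)
n_the = sentence.count('the')
print(type(sentence).__name__, '->', type(n_the).__name__, n_the)  # str -> int 2
```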

## Warm-up

Write code that prints the type of object each of these items represents.

In [None]:
'meaning of life'
42
True

Combine these two strings using variables.

In [None]:
'μῆνιν ἄειδε θεὰ Πηληϊάδεω ' 
'Ἀχιλῆος οὐλομένην'

Write code that shows the number of times the letter "e" appears in the text below.

In [None]:
virgil = 'Arma virumque canō, Trōiae quī prīmus ab ōrīs'

Observe the code and [text](https://www.sacred-texts.com/hin/rvsan/rv01001.htm) below. What does the `\n` do (look closely for it)?

In [None]:
print('अग्निमीळे पुरोहितं यज्ञस्य देवं रत्वीजम \nहोतारं रत्नधातमम')

Change the code below so that it prints the statement after the colon (:) on a separate line. 

In [None]:
print('Fair is foul, and foul is fair: Hover through the fog and filthy air.')

Complete the code below to make it run without error.

In [None]:
if
    print 'scrumdiddlyumptious

Does the following line of code evaluate to `True` or `False`?

In [None]:
not not True

Below is the [longest word in the English language](https://en.wikipedia.org/wiki/Pneumonoultramicroscopicsilicovolcanoconiosis) (among words listed in major dictionaries). Write code that shows how many characters this word contains.

In [None]:
'Pneumonoultramicroscopicsilicovolcanoconiosis'

## Exercise 1. Basic Statistics from Strings

Below is an excerpt from the first lines of *Beowulf*. Examine the various features of the text. For instance, we can see punctuation and newlines, as well as Old English characters such as þ, ð and æ.

### Part A

Write code that calculates the following statistics from the text:

1. Store the number of sentences in a variable and use an f-string to display the answer within a statement (e.g. "The number of sentences is ..."). *hint: think about punctuation*
2. Store the number of lines in a variable and use an f-string to print the answer within a statement.
3. Use an f-string to print, within a statement, how many more lines there are than sentences.
4. What is the [ratio](https://www.mathsisfun.com/numbers/ratio.html) of lines to sentences? Print it in a statement as a decimal.

NB: the Beowulf text would be easier to work with as a variable!

[Beowulf source](https://www.poetryfoundation.org/poems/43521/beowulf-old-english-version)

In [None]:
'''\
Hwæt. We Gardena in geardagum, 
þeodcyninga, þrym gefrunon, 
hu ða æþelingas ellen fremedon. 
Oft Scyld Scefing sceaþena þreatum, 
monegum mægþum, meodosetla ofteah, 
egsode eorlas. Syððan ærest wearð 
feasceaft funden, he þæs frofre gebad, 
weox under wolcnum, weorðmyndum þah, 
oðþæt him æghwylc þara ymbsittendra 
ofer hronrade hyran scolde, 
gomban gyldan. þæt wæs god cyning. 
ðæm eafera wæs æfter cenned, 
geong in geardum, þone god sende 
folce to frofre; fyrenðearfe ongeat 
þe hie ær drugon aldorlease 
lange hwile. Him þæs liffrea, 
wuldres wealdend, woroldare forgeaf; 
Beowulf wæs breme blæd wide sprang, 
Scyldes eafera Scedelandum in. 
Swa sceal geong guma gode gewyrcean, 
fromum feohgiftum on fæder bearme, 
þæt hine on ylde eft gewunigen 
wilgesiþas, þonne wig cume, 
leode gelæsten; lofdædum sceal 
in mægþa gehwære man geþeon. 
Him ða Scyld gewat to gescæphwile 
felahror feran on frean wære. 
Hi hyne þa ætbæron to brimes faroðe, 
swæse gesiþas, swa he selfa bæd, 
þenden wordum weold wine Scyldinga; 
leof landfruma lange ahte. 
þær æt hyðe stod hringedstefna, 
isig ond utfus, æþelinges fær. 
Aledon þa leofne þeoden, 
beaga bryttan, on bearm scipes, 
mærne be mæste. þær wæs madma fela 
of feorwegum, frætwa, gelæded; 
ne hyrde ic cymlicor ceol gegyrwan 
hildewæpnum ond heaðowædum, 
billum ond byrnum; him on bearme læg 
madma mænigo, þa him mid scoldon 
on flodes æht feor gewitan. 
Nalæs hi hine læssan lacum teodan, 
þeodgestreonum, þon þa dydon 
þe hine æt frumsceafte forð onsendon 
ænne ofer yðe umborwesende. 
þa gyt hie him asetton segen geldenne 
heah ofer heafod, leton holm beran, 
geafon on garsecg; him wæs geomor sefa, 
murnende mod. Men ne cunnon 
secgan to soðe, selerædende, 
hæleð under heofenum, hwa þæm hlæste onfeng.\
'''
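One way to approach Part A is sketched below on a short made-up English text rather than *Beowulf*, so the exercise itself is still yours. It assumes a full stop is the only sentence-ending mark, which is a simplification; you will need to decide which punctuation counts for *Beowulf*. The names `toy_text`, `n_sentences` and `n_lines` are hypothetical.

```python
# A made-up toy text (not Beowulf), written the same way as the cell above:
# the backslashes mean there is no newline at the very start or end
toy_text = '''\
The cat sat on the mat
and looked at the door.
The dog ran off
into the night.\
'''

# Assumption: '.' is the only sentence-ending mark in this toy text
n_sentences = toy_text.count('.')
# Every newline separates two lines, so there is one more line than newlines
n_lines = toy_text.count('\n') + 1

print(f'The number of sentences is {n_sentences}')
print(f'The number of lines is {n_lines}')
print(f'There are {n_lines - n_sentences} more lines than sentences')
print(f'The ratio of lines to sentences is {n_lines / n_sentences}')
```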

### Part B

We are also interested in the kinds of words that occur in this text. Write an analysis program that does the following:

1. Counts any given word in the text (surround the word with spaces so that it is not also counted inside longer runs of characters).
2. If the word count is > 3, prints the word and the count together (using "+"; hint: you will need something else as well).
3. Otherwise, prints "Try again!"
4. Use your program to find at least one word with a count > 3.
5. [Write any answers you find in a markdown cell](https://www.earthdatascience.org/courses/intro-to-earth-data-science/open-reproducible-science/jupyter-python/code-markdown-cells-in-jupyter-notebook/)

In [None]:
word = ' lacum '

# fill in the rest below
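Why surround the word with spaces? The sketch below, on a made-up sentence, shows that without them `count` also matches the word inside longer words. Note the trade-off: a word at the very start of the text, or right after a newline or before punctuation, has no space on one side and is missed.

```python
# A made-up sentence (hypothetical)
toy = 'the other theme is the best'

# Without spaces, 'the' also matches inside 'other' and 'theme'
print(toy.count('the'))    # 4
# With a space on both sides only free-standing words match,
# but the 'the' at the very start is now missed (no space before it)
print(toy.count(' the '))  # 1
```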

## Midsommer Text Style

Recall some of the facts we've learned about strings ([table source](https://github.com/cltl/python-for-text-analysis/blob/master/Assignments/ASSIGNMENT-1.ipynb)):

| Topic | Explanation |
|-----------|--------|
| `quotes` | A string is delimited by single quotes ('...') or double quotes ("...") |
| `special characters` | Certain special characters can be written with a backslash, such as "\n" (for newline) and "\t" (for a tab) |
| `printing special characters` | To print the backslash sequence itself rather than its effect, precede it with another backslash: `'\\n'` prints `\n` |
| `continue on next line` | A backslash (\\) at the end of a line is used to continue a string on the next line |
| `multi-line strings` | A multi-line string is enclosed in three double or three single quotes ("""...""" or '''...''') |
<br>
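A short sketch of the rows in the table above (all strings are made-up examples; the names `literal`, `joined` and `verse` are hypothetical):

```python
# \n starts a new line and \t inserts a tab
print('first line\n\tsecond line, indented with a tab')

# An extra backslash escapes the escape: this prints \n instead of a line break
literal = 'a literal \\n'
print(literal)

# A backslash at the very end of a line continues the string on the next line
joined = 'one string \
split over two source lines'
print(joined)

# Triple quotes keep the line breaks exactly as typed
verse = '''line one
line two'''
print(verse)
```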

Now have a look at the excerpt below of Shakespeare's Midsommer Night's Dream from [a 1619 printing](https://archive.org/details/midsommernightsd1619shak/page/n43/mode/2up).

<img src="../images/midsommer.png" height="500px" width="500px">

Using a single- or double-quoted string, type in the text shown above, reproducing the visual layout of the printing. Print the result.

Now reproduce the first half of the printing using a triple-quoted string and print it.

### Part 2

Have a look at the code below and think about what it does.

In [None]:
tab = 2
line = '-' * tab
print(line, 'a...line')

Take the text you entered above for *Midsommer Night's Dream* and write code that lets you change its indentation arbitrarily. Use `*`.
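One possible approach, sketched with made-up lines rather than the Shakespeare text: build the indentation as a string of spaces with `*`, so changing one number changes the indent everywhere it is used. The names `indent`, `pad` and `text` are hypothetical.

```python
# Change this one number to change the indentation everywhere
indent = 4
pad = ' ' * indent  # a string made of `indent` spaces

text = 'A made-up first line,\n' + pad + 'and an indented second line.'
print(text)
```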

## String methods

We've already used at least one string method to count words and punctuation in texts. You can find information about string methods in the [Python documentation](https://docs.python.org/3/library/stdtypes.html#string-methods). 

But the fastest way to see what you can do with something like a string is `dir`. Try running the code below. Ignore the names surrounded by double underscores.

In [None]:
dir('ترادف')

Want to see what one of the items does? You can also use `help` to instantly access the documentation for that method.

In [None]:
help('ترادف'.strip)

You can also list the string methods by passing the built-in type `str` to `dir`:

In [None]:
dir(str)
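Since `dir` also lists the double-underscore names, you may want to hide them. A minimal sketch using a `for` loop and the string method `startswith` (the list name `public_methods` is hypothetical):

```python
# Keep only the names that do not start with an underscore,
# which hides the double-underscore special methods
public_methods = []
for name in dir(str):
    if not name.startswith('_'):
        public_methods.append(name)

print(public_methods)
```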

## Text Annotation with `input()`

*Transitive* verbs occur with a direct object, whereas *intransitive* verbs do not require one. This kind of interpretation is hard to automate, but Python can make it easier to annotate such data by hand. 

For the following sentences, write a program that:

1. stores each sentence in a variable
2. for each sentence, uses `input` to show the sentence as a prompt and asks the user to tag it with "T" or "I" (transitive/intransitive)
3. checks the input so that only "T" or "I" is accepted (e.g. "tran" is rejected), and tells the user when their input is not valid
4. keeps a count of how many sentences are tagged "T" and how many "I", and prints the results at the end of the session
5. if a form of *to be* appears in the sentence, adds ".copula" to the user's tag

```
Sally kicked the ball.
The kitty laid in the sun.
My computer shut down.
He threw the keyboard.
It is a small world after all!
The fish slept all night.
```
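For step 3, one way to check a tag is sketched below. To keep it runnable without typing, the user's answer is hard-coded in a hypothetical variable `raw`; in your program it would come from `input(sentence)`. Stripping spaces and upper-casing is a design choice that accepts " t " as "T"; drop `.upper()` if you want to accept only capital letters.

```python
# Hypothetical stand-in for: raw = input('Sally kicked the ball. ')
raw = ' t '

# Remove stray spaces and upper-case the answer (a design choice)
tag = raw.strip().upper()

if tag == 'T' or tag == 'I':
    print(f'Accepted tag: {tag}')
else:
    print(f'"{raw}" is not a valid tag. Please enter T or I.')
```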

<hr>

## Fun detour from text analysis

### Alarm clock

*The following exercise is borrowed from the [Python for Text Analysis](https://github.com/cltl/python-for-text-analysis/blob/master/Assignments/ASSIGNMENT-1.ipynb) course*

Write code to set your alarm clock! Given the day of the week and information about whether you are currently on vacation or not, your code should print the time you want to be woken up following these constraints: 

On weekdays the alarm should be "7:00", and on weekends "10:00". If you are on vacation, however, it should be "10:00" on weekdays and "off" on weekends.


Encode the days of the week as ints in the following way: 0=Sun, 1=Mon, 2=Tue, ..., 6=Sat. Encode the vacation information as a boolean. Your code should assign the correct time to a variable as a string (in the format "7:00") and print it.

Note: encoding the days as integers makes it easier to define conditions. You can check whether the weekday falls within a certain interval instead of writing code for every single day. 

In [None]:
# your code here

### Debugging

*The following debugging exercise is borrowed from the [Python for Text Analysis](https://github.com/cltl/python-for-text-analysis/blob/master/Assignments/ASSIGNMENT-1.ipynb) course*

Debugging skills come in really handy, especially when you work with bigger programs. It is therefore good to practice this skill early on!

The following cell does not run; it throws an error instead. Can you figure out what is wrong? There is more than one problem with the code. Solve the problems one by one and rerun the code after each step to check what is going on.

**Attention: There is one bug in the code which is NOT going to give you an error message. If you solve all the other bugs, the code runs, but it is not doing what it is supposed to do.**

In [None]:
n_apples = input('How many apples would you like? ')
n_oranges = input('How many oranges would you like? ')
price_apple = 0.20
price_orange = 0.30
limit = 3

print(f'I would like to by {n_apples} apples and {n_oranges} oranges.')
price = n_apples * price_apple + n_oranges * price_orange
print(f'The total price is {price} euros.')

if price > limit: 
    print(f'This is too expensive! I only want to spend {limit} euros.')
    n_apples_new = input('Choose fewer apples. ')
    price_new = n_apples_new * (price_apple + n_oranges) * price_orange
    print(f'The new price is {price_new} euros.')
    if price_new > limit:
        print('Still too expensive! I'm going home.')
    else:
        print('Ok, could you wrap them for me?')
else:
    print('Ok, could you wrap them for me?')

<hr>

## BYOT

We will now begin applying what you've learned to explore and describe the text you've brought. You probably already know something about this text, but we will use Python to discover quantitative and qualitative facts about it that you probably don't know. These skills will also help you start imagining the research questions you might ask.  

Below we'll load the text you've selected for the first time. Be sure you've placed a [valid](../BYOT/README.md) text in the `BYOT` folder or the code will complain!

Have a look at the code below. It uses an `import` statement, which we haven't learned yet. You can open [get_byot.py](get_byot.py) to see what's happening. Don't worry if you can't understand it yet! We haven't gotten there!

In [None]:
from get_byot import your_text

### The Stages of Text Exploration

Any time we sit down with a new dataset or text, we should start by exploring it a little bit at a time. Some texts can be enormous, and we have to take care not to accidentally print something so big that the Jupyter kernel (the process that runs your code) crashes.

When exploring a text for the first time, start off with very basic information.

Note that in the code above, we've imported a variable called `your_text`. 

Use a Python function to find out what kind of object `your_text` is.

How many characters long is `your_text`?

Use string slices to peek at the first 1000 characters in the text. 

Now peek further (if necessary) to find where the boundary between the metadata and body of your text is.

Some texts also have metadata at the end. Does yours? Use negative indices to peek at the last several hundred characters of the string. **But be careful: if you accidentally print the whole text, you might overload your browser.** Store the result in a variable first, and print the length of that variable before you print the slice itself.
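The steps so far can be sketched on a small stand-in string (the name `sample` and its contents are hypothetical; your real text comes from `get_byot`). Only slices are printed, and the length of the tail is checked before printing it.

```python
# Hypothetical stand-in for your_text: metadata, a long body, more metadata
sample = 'TITLE: Demo\n' + 'Body text. ' * 200 + '\nEND OF FILE'

print(type(sample))   # <class 'str'>
print(len(sample))    # total number of characters

# Peek at the beginning only
print(sample[:100])

# Store the end first, check how long it is, then print it
tail = sample[-50:]
print(len(tail))      # 50
print(tail)
```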

Now that you know the indices of the front matter and back matter, isolate the body of the text and store it in a variable you can refer back to.
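If your text uses recognizable marker lines, `str.find` can locate them instead of counting indices by eye. The markers and the `sample` text below are hypothetical; inspect your own front and back matter to see what they look like. Note that `find` returns `-1` when a marker is missing, which would silently give a wrong slice, so check the indices before slicing.

```python
# Hypothetical text with made-up start and end markers
sample = 'TITLE: Demo\n*** START ***\nThe body of the text.\n*** END ***\nLICENSE: none'

start_marker = '*** START ***'
end_marker = '*** END ***'

# find() returns the index where a marker begins, or -1 if it is absent
start_index = sample.find(start_marker)
end = sample.find(end_marker)
print(start_index, end)  # both should be >= 0 before you slice

# The body begins right after the start marker and stops at the end marker
start = start_index + len(start_marker)
body = sample[start:end].strip()
print(body)  # The body of the text.
```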

Using the information gleaned in this assignment, discern the following information from the body of your text. Store the results in variables.

* Roughly how many words (determined graphically) does your text contain? (NB: this will be a very rough estimate)
* If your text contains punctuation, how many punctuated units (e.g. sentence, verse) does it contain? You can copy and paste a character if you need to.
* On average, how many words are there per punctuated unit? (this will be rough)
* Take the average you just calculated and create two variants. In one, convert the average to an integer. In the other, call `round()` on the number. Can you determine the difference between these two approaches? Use dummy numbers if you need to in order to answer this.
* Create a 1000-character slice of your body text. Use `split` and a `for` loop to print each individual word in the slice. Look back at the end of `Chapter 03 - Strings.ipynb` to remind yourself how to do this.
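Several of the steps above can be sketched on a made-up body text (the name `body` and its contents are hypothetical), assuming '.' ends every punctuated unit. Words here are simply runs of characters between whitespace, so "rose." and "rose" would count as different forms.

```python
# Hypothetical stand-in for your body text
body = 'The wind rose. The sea was grey and cold. We sailed on.'

# Rough word count: split() breaks the text on whitespace
n_words = len(body.split())
# Rough count of punctuated units, assuming '.' ends each one
n_units = body.count('.')
average = n_words / n_units
print(n_words, n_units, average)  # 12 3 4.0

# Print every word in a slice, one per line
for word in body[:30].split():
    print(word)
```

Notice that the 30-character slice cuts the last word in half (it prints "gre"); the same can happen at the edge of your 1000-character slice.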