# Homework 1: Causality and Expressions

Please complete this notebook by filling in the cells provided. When you’re done:

1. Select `Run All` from the `Cell` menu to ensure that you have executed all cells.
2. Rename your file to `LastnameFirstname_hw01` using `Rename` from the `File` menu.

This assignment is due Thursday, January 24 at 11:59PM.  Directly sharing answers is not okay, but discussing problems with course staff or with other students is encouraged.

Reading:
- Textbook chapters [1](http://www.inferentialthinking.com/chapters/01/what-is-data-science.html), [2](http://www.inferentialthinking.com/chapters/02/causality-and-experiments.html), and [3](http://www.inferentialthinking.com/chapters/03/programming-in-python.html)

##### Tests
Automated tests are provided for many questions.  If a question has automated tests, there will be a runnable code cell after the question's answer cell.  Run your answer cell, then run the test cell to check your answer.  **Passing the automatic tests does not guarantee full credit on any question.** The tests are provided to help catch some common errors, but it is *your* responsibility to answer the questions correctly.

Run the cell below to prepare the notebook and automated tests.

In [1]:
# Run this cell to set up the notebook, but please don't change it.
from client.api.assignment import load_assignment
tests = load_assignment('hw01.ok')



Assignment: Homework 1: Causality and Expressions
OK, version v1.13.11



## 1. Can We Say That?


The Urban Institute in Washington D.C. reports that most adult welfare recipients are single mothers in their 20’s and 30’s with one or two children. Some take advantage of job training programs to improve their skills, and many of those are able to increase their job earnings and leave the welfare system.

**Question 1.1:** Can we say from this study that participation in the job training programs causes increased job earnings and independence from welfare?  Explain why or why not.

<font color='green'> *I do* </font>
<font color='red'> *not* </font>
<font color='green'> *think that participation in job training programs is a direct cause of increased job earnings and independence. The study did not use a specific target group with a well-chosen sample; it was more generalized, which leaves more room for lurking variables.* </font>

**Question 1.2:** After seeing the data, the Urban Institute randomly chooses two groups of mothers when they apply for welfare. One group is required to participate in a job training program, but this program is not offered to the other. At the end of the study, the mothers who participated in the job training program showed a higher increase in earnings and a larger percentage of individuals who left welfare than those who did not receive job training.

Can we say from this study that participation in the job training program causes increased job earnings and independence from welfare?  Explain why or why not.

<font color='green'> *I think that although this study did establish an association between the job training program and increased job earnings, it still does not account for lurking variables, so it cannot prove that the job training program is a direct cause of the increase.* </font>

## 2. Breaking Down Expressions


The most important idea in Computer Science is that complicated, useful things can be built by putting together simple parts according to simple rules.  Python code is an important example of this principle.  Once you understand the basic rules, you can code with confidence.  These exercises are designed to give you some practice with those rules.

First, a brief review of subexpressions.

You can take any Python expression that has a value and combine it with other expressions.  For example, you can combine two number-valued expressions by putting a `+` between them to add their values together.  This forms a new, larger expression called a *compound expression*.  The expressions that were combined together are called *subexpressions*.

You can tell if something is a subexpression by checking whether it would make sense to write it in a line by itself.  For example, in the expression `2 * 3`, `2` is a subexpression, but `2 *` isn't, because `2 *` isn't a valid expression.  (Try executing it!)
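The check described above can also be sketched in code. This is only an illustration, not part of the assignment: the helper name `is_valid_expression` is made up for this example, and it relies on Python's built-in `compile` in `"eval"` mode, which accepts a single expression and raises `SyntaxError` otherwise.

```python
def is_valid_expression(source):
    """Return True if `source` parses as a single Python expression."""
    try:
        # "eval" mode accepts exactly one expression and rejects anything else.
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# Check the whole expression, one subexpression, and one fragment that is not valid.
checks = {text: is_valid_expression(text) for text in ["2 * 3", "2", "2 *"]}
print(checks)
```

Only the fragment `2 *` should be reported as invalid, which matches the rule of thumb: a subexpression must make sense on a line by itself.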

**Question 2.1:** List all the subexpressions of the following expression:

    2 + 3
    
Put each subexpression on its own line in the next cell.

*Hint:* There are two of them.

In [19]:
2
3

3

**Question 2.2:** Consider the following expression:

    (1 + 2) * ((3 / 4) ** 5)

Here is a list of almost all the subexpressions of that expression.  One is missing.

1. `1`
2. `2`
3. `3`
4. `4`
5. `5`
6. `(1 + 2)`
7. `((3 / 4) ** 5)`

In the next cell, write the missing expression.

In [20]:
(3 / 4)


0.75

**Question 2.3:** List all the subexpressions of the following expression:

    (((2**3) / 4) / 5) - 6

Put each subexpression on its own line in the next cell.

In [21]:
2
3
4
5
6
2**3
((2**3) / 4)
((2**3) / 4) / 5

0.4
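One way to double-check a list of subexpressions is to let Python's own parser enumerate them. The sketch below is an illustration under a few assumptions: the helper name `list_subexpressions` is hypothetical, `ast.unparse` requires Python 3.9 or later, and the text it regenerates may omit redundant parentheses, so the printed forms can look slightly different from the original.

```python
import ast


def list_subexpressions(source):
    """Return the text of every subexpression of `source`, excluding the whole expression."""
    tree = ast.parse(source, mode="eval")
    whole = tree.body  # the node for the entire expression
    return [
        ast.unparse(node)  # regenerate source text (Python 3.9+)
        for node in ast.walk(whole)
        # Keep expression nodes only; operator nodes such as / or - are skipped.
        if isinstance(node, ast.expr) and node is not whole
    ]


subexpressions = list_subexpressions("(((2**3) / 4) / 5) - 6")
for text in subexpressions:
    print(text)
```

Each number is counted by position, so a literal that appears twice in an expression would be listed twice.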

## 3. Errors in Naming


**Question 3.1:** When you run the following cell, Python will produce a slightly cryptic error message.  Explain in the text cell below, in your own words, what's wrong with the code.  (Remember, double-click the cell to edit it, and then click the Run button when you're done.)

In [22]:
4 = 2 + 2

SyntaxError: can't assign to literal (<ipython-input-22-4c8b769209ad>, line 1)

<font color='green'> *The left side of the = sign must be a name. Here it is a 'literal', the number 4, and Python cannot assign a value to a number.* </font>
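The difference between a name and a literal on the left of `=` can be inspected with the standard `ast` module. This is a minimal sketch: the helper `assignment_target` is hypothetical, it assumes the input is a single simple assignment, and it reports only the error type because the wording of the message varies between Python versions.

```python
import ast


def assignment_target(source):
    """Return the kind of node on the left of =, or 'SyntaxError' if parsing fails."""
    try:
        statement = ast.parse(source).body[0]
    except SyntaxError:
        return "SyntaxError"
    # For a simple assignment, `targets` holds whatever is being assigned to.
    return type(statement.targets[0]).__name__


name_case = assignment_target("four = 2 + 2")   # a name on the left of =
literal_case = assignment_target("4 = 2 + 2")   # a number on the left of =
print(name_case, literal_case)
```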

**Question 3.2:** When you run the following cell, Python will produce a slightly cryptic error message.  **Fix the error,** and then **explain below** in your own words what was wrong with the code.

In [17]:
two = 2
four = two + two

In [18]:
_ = tests.grade('q3_2')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



<font color='green'> *Python performs arithmetic such as addition, subtraction, and multiplication with operator symbols such as + or -. The word plus is not an operator, and it is not assigned to any function or command, so writing it between the two variables does not add them, and that's why the syntax error pops up.* </font>
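For comparison, the sketch below shows addition written with the `+` symbol and as a function call. It uses `operator.add` from the standard library. Binding it to the name `plus` is purely illustrative, and even then `plus` has to be called like a function rather than written between the two values.

```python
import operator

two = 2

# The + symbol is Python's addition operator.
four = two + two

# The same addition is available as a function in the standard `operator` module.
four_from_function = operator.add(two, two)

# A word such as `plus` only means something once a name is bound to it,
# and it must then be called with parentheses.
plus = operator.add
four_from_plus = plus(two, two)

print(four, four_from_function, four_from_plus)
```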

## 4. Job Opportunity Education in Rural India


A [study](http://www.nber.org/papers/w16021.pdf) at UCLA investigated factors that might result in greater attention to the health and education of girls in rural India. One such factor is information about job opportunities for women. The idea is that if people know that educated women can get good jobs, they might take more care of the health and education of girls in their families, as an investment in the girls’ future potential as earners.

The study focused on 160 villages outside the capital of India, all with little access to information about call centers and similar organizations that offer job opportunities to women. In 80 of the villages chosen at random, recruiters visited the village, described the opportunities, recruited women who had some English language proficiency and experience with computers, and provided ongoing support free of charge for three years. In the other 80 villages, no recruiters visited and no other intervention was made.

At the end of the study period, the researchers recorded data about the school attendance and health of the children in the villages.
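The random choice of 80 villages out of 160 can be sketched as follows. The village IDs and the seed are assumptions made for illustration; the study does not describe the mechanism the researchers actually used.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable; the value is arbitrary

villages = list(range(160))  # hypothetical village IDs 0 through 159

# Choose 80 villages at random to receive recruiter visits.
treatment = set(rng.sample(villages, 80))

# Every village that was not chosen receives no intervention.
control = set(villages) - treatment

print(len(treatment), len(control))
```

Because chance alone decides which villages are visited, the two groups should be similar on average in every other respect.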

**Question 4.1:** Did this analysis have a treatment group and a control group? If so, describe the two groups.

<font color='green'> *Yes, the analysis had a treatment and a control group. The control group was the one that had no recruiters or any intervention through the observational period, while the treatment group was given support for three years.*  </font>

**Question 4.2:** Was this an observational study or a randomized comparative experiment?

<font color='green'> *This was a randomized comparative experiment, since the villages were chosen at random and there was both a group that received the intervention and a control group used to compare changes, if any.* </font>

**Question 4.3:** The study reported, “Girls aged 5-15 in villages that received the recruiting services were 3 to 5 percentage points more likely to be in school and experienced an increase in Body Mass Index, reflecting greater nutrition and/or medical care. However, there was no net gain in height. For boys, there was no change in any of these measures.” Why do you think the author points out the lack of change in the boys?

<font color='green'> *I think the author mentioned the lack of change in the boys because the study was meant to measure changes for girls after the recruiting services, so it provides extra evidence that the services helped girls specifically.* </font>

## 5. Differences between Universities


**Question 5.1:** Suppose you're choosing a university to attend, and you'd like to *quantify* how *dissimilar* any two universities are.  You rate each university you're considering on 3 traits, using a 0 to 10 scale for each trait:

1. Cost to attend (0 for the cheapest)
2. Graduation rate
3. How cool its mascot is

You decide that the dissimilarity between two universities is:

* the maximum of
* the absolute values of
* the 3 differences in their trait values.

Using this method, compute the dissimilarity between Stanford (whose traits are 8, 9, and 0, respectively) and Berkeley (whose traits are 7, 9, and 10, respectively).  Call your answer `dissimilarity`.  Use a single line of code to compute the answer.  Use Python to do all the steps, including arithmetic (like subtracting 8 from 7).

In [33]:
dissimilarity = max(abs(8-7),abs(9-9),abs(0-10))
dissimilarity

10

In [28]:
_ = tests.grade('q5_1')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed
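The same rule can be written once as a reusable function, which is handy when comparing more than one pair of universities. This is a sketch rather than the required single-line answer, and the names `dissimilarity_between`, `stanford`, and `berkeley` are illustrative.

```python
def dissimilarity_between(traits_a, traits_b):
    """Return the largest absolute difference between corresponding trait values."""
    return max(abs(a - b) for a, b in zip(traits_a, traits_b))


stanford = (8, 9, 0)   # cost, graduation rate, mascot coolness
berkeley = (7, 9, 10)

result = dissimilarity_between(stanford, berkeley)
print(result)
```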



**Question 5.2:** Identify all the subexpressions in your answer to the previous question.  Write each on its own line.

*Hint:* If your answer to the previous question was as straightforward as possible, there should be 12 subexpressions, not including the whole expression itself.

In [34]:
8
7
9
9
0
10
(8-7)
(9-9)
(0-10)
abs(8-7)
abs(9-9)
abs(0-10)

10

## 6. More Children Living at Home?


A USA Today [article](http://usatoday30.usatoday.com/news/nation/2006-03-16-failure_x.htm) from 2006 includes this sentence: “Since 1970, the percentage of people ages 18 to 34 [in the United States] who live at home with their family increased 48%, from 12.5 million to 18.6 million, the Census Bureau says.”

**Question 6.1:** The word “percentage” isn’t used correctly in the context of the rest of the sentence. What word should replace it?

<font color='green'> *Percentage should be replaced with the word 'number'.* </font>
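A quick arithmetic check, using only the two counts quoted in the sentence, suggests that the 48% describes growth in the number of people. The variable names are illustrative.

```python
count_start = 12_500_000  # people ages 18 to 34 living at home in 1970
count_end = 18_600_000    # the later count quoted in the article

# Relative change in the count, expressed as a percentage.
percent_increase = (count_end - count_start) / count_start * 100
print(round(percent_increase, 1))
```

The result is close to the 48% quoted, which is what one would expect if the figure refers to the count rather than to a percentage of the age group.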

**Question 6.2:** Can you give a simple explanation for these data? Feel free to include other sources of data to support your explanation, but please keep your answer to 1-3 sentences.

<font color='green'> *The data show that since 1970, the number of people ages 18 to 34 living at their family's home has increased steadily over the years, up by 48%.* </font>