# Dealing with Data Spring 2020 – Class 2

---

# Variables


Last class we calculated a meal tip. Let's do that again, this time with a 100 USD bill, a sales tax of 8.875%, and a pre-tax tip of 20%:
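A minimal sketch of that calculation with the raw numbers typed directly into the formula. The name `total` is only an illustrative choice:

```python
# Meal cost with raw numbers: food + sales tax on the food + pre-tax tip
total = 100 + 100 * 0.08875 + 100 * 0.20
print(total)  # about 128.875
```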

But what if, instead of 100 USD, the cost of the meal is 75 USD?

Notice that we would need to update the number in multiple places. Fortunately, we can use a `variable` to store a value under a name and reuse it.

Let's use variables for the cost of the food, the tax, and the tip:
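One way this could look is sketched below. The names `food` and `cost` are the ones used later in these notes, while `tax` and `tip` are assumed names for the two rates:

```python
food = 100       # cost of the food in USD
tax = 0.08875    # sales tax rate (8.875%)
tip = 0.20       # tip rate (20%), applied to the pre-tax amount

# Total cost: food, plus tax on the food, plus tip on the food
cost = food + food * tax + food * tip
print(cost)  # about 128.875
```

To price a 75 USD meal, only the first line needs to change.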

---

# ⭕ **QUESTIONS?**

---

# Variable names

In Python, a variable can be named almost anything (kind of – more on that later). 

Variables are declared by stating the variable name and assigning to it using the "=" operator. At any time, you can reassign a value to a variable.

Some rules for naming variables:

* they can be one word only
* they can contain both letters and numbers, but **cannot** start with a number 
* the underscore character `_` can appear in a name. _It is often used in variable names with multiple words, such as `my_name`_
* they can also start with an underscore character `_`, but we generally avoid doing this unless we are writing library code for others to use

---

# Updating Variable Values

Let's revisit our meal cost calculator:

Now let's update the cost of the food:

And let's see the cost of the meal:

Note that the value of the `cost` variable remained the same. 

Unlike using formulas with cells in Excel, it was not automatically updated when we updated the `food` variable.

To update the `cost` variable, we need to execute the calculation again:

Notice that the value of `food` persisted from the prior cell. 
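The whole sequence can be sketched in a single block. The names `tax`, `tip`, and `stale_cost` are assumptions made for this illustration:

```python
food = 100
tax = 0.08875
tip = 0.20
cost = food + food * tax + food * tip   # computed with food = 100

food = 75            # update the cost of the food
stale_cost = cost    # cost has NOT changed: it is still about 128.875
print(stale_cost)

# Re-run the calculation so that cost reflects the new value of food
cost = food + food * tax + food * tip
print(cost)  # about 96.65625
```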

---

# ⭕ **QUESTIONS?**

---

# Example: Using and Updating Variables

After taking this class, imagine you are recruited by a hot new startup. They offer you a starting salary of 150K USD, a promised 25\% bonus, and an equity package currently worth 400K USD, vesting over a period of 4 years. 

You want to examine the true value of this package, so you write the following program:
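A sketch of such a program follows. Apart from `yearly_value`, the variable names are assumptions, and the equity is assumed to vest evenly across the 4 years:

```python
salary = 150000        # starting salary in USD
bonus_rate = 0.25      # promised bonus, as a fraction of salary
equity = 400000        # current value of the equity package in USD
vesting_years = 4      # equity vests evenly over this many years

# Yearly value: salary, plus bonus, plus one year's share of the equity
yearly_value = salary + salary * bonus_rate + equity / vesting_years
print(yearly_value)  # 287500.0
```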

Now, we'll write the same program but will do the calculation in three steps. Notice that we have to update `yearly_value` at each stage.
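A possible three-step version, using the same assumed variable names and the same even-vesting assumption:

```python
salary = 150000
bonus_rate = 0.25
equity = 400000
vesting_years = 4

# Step 1: start with the salary
yearly_value = salary
# Step 2: add the bonus
yearly_value = yearly_value + salary * bonus_rate
# Step 3: add one year's share of the equity
yearly_value = yearly_value + equity / vesting_years
print(yearly_value)  # 287500.0
```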

As expected, the outcome is the same as before.

Let's see one more variation of the same program, but now we will introduce the `+=` operator. The command `x += y` means  _"add the value of variable `y` to the variable `x`"_.
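A sketch of that variation, again with assumed variable names. Each `+=` line adds the right-hand side to the current value of `yearly_value`:

```python
salary = 150000
bonus_rate = 0.25
equity = 400000
vesting_years = 4

yearly_value = salary
yearly_value += salary * bonus_rate        # same as yearly_value = yearly_value + ...
yearly_value += equity / vesting_years
print(yearly_value)  # 287500.0
```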

---

# Exercise 1

Imagine you not only want to know the value with the current company valuation (\$400K equity) but also want to see what would happen if your equity is worth nothing at the end. 

Furthermore, what happens if the company does really well and grows by a factor of 10?

Write the code for the two scenarios below:

# Solution 

---

# Selecting Good Variable Names

Notice the importance of selecting good variable names, to make the program readable. 

The program below is identical to the one above, but with less explanatory names for the variables. Instead, we have  comments to document the code.
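As an illustration, here is the startup-offer calculation with single-letter names, where the comments carry all of the meaning. The specific letters are an assumption made for this sketch:

```python
s = 150000   # salary in USD
b = 0.25     # bonus, as a fraction of salary
e = 400000   # equity package in USD
y = 4        # vesting period in years

# yearly value = salary + bonus + one year's share of the equity
v = s + s * b + e / y
print(v)  # 287500.0
```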

Now, let's take the comments away. The program will run in exactly the same manner; comments are purely for humans to read, not for the computer. Still, notice how cryptic the program becomes.

And here is the same program again, but with a really bad selection of variable names. (You may find it artificial, but this is an example of variable names in a real student project submitted a few years ago.)

Remember, the person who will most often read your code is your future self. Treat your future self nicely!

# Improper Variable Names, Reserved Keywords, and Dangerous Choices for Variable Names

If you give a variable an illegal name, you will get a syntax error:
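An illegal name stops the whole cell from running, so the sketch below checks candidate names without assigning to them. It relies on the built-in string method `isidentifier()`, which is not otherwise covered in these notes, and the candidate names are made up:

```python
# True means the text follows the naming rules for variables
legal = "my_name".isidentifier()
starts_with_digit = "2nd_place".isidentifier()
has_space = "my name".isidentifier()

print(legal)              # True
print(starts_with_digit)  # False: a name cannot start with a number
print(has_space)          # False: a name must be a single word
```

Writing `2nd_place = 1` directly in a cell would produce a `SyntaxError`.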

# Reserved keywords

Python 3 has a set of "reserved keywords" (33 in early Python 3 releases, 35 since Python 3.7 added `async` and `await`), which are listed below and are also [available in the Python manual](https://docs.python.org/3/reference/lexical_analysis.html#keywords). It is a very common mistake for beginners to use some of the reserved keywords below as their variable names, so please try to be vigilant about such mistakes. The editor will help you, as reserved keywords are often colored differently.

Notice that the code below will not run because `raise` is a reserved keyword:
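One way to see the list for the Python version you are running is the standard-library `keyword` module. This is a sketch, and the variable name `is_reserved` is only illustrative:

```python
import keyword

# The full list of reserved keywords for this Python version
print(keyword.kwlist)
print(len(keyword.kwlist))

# Check a single word, for example "raise"
is_reserved = keyword.iskeyword("raise")
print(is_reserved)  # True
```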

# Key points to remember

* We use variables to store values.

* We can use variables to make our calculations readable and easier to update.

* Variable values persist between cells. Variable values keep the value from the last _executed_ cell that updated them. (Remember that you can execute cells in any order.)

* Using good variable names makes our programs easier to read and understand.

* Variable names contain letters, numbers, and underscores; variable names cannot start with numbers.

* There are a few _reserved keywords_ that cannot be used. It is also a good idea not to use common function names as variables (e.g., `print`, `min`, `max`, etc.; they are all functions and are colored green in the editor).





---

# ⭕ **QUESTIONS?**

---

# Exercise 2

Repeat the exercise from earlier on, about the daily return of a stock, but use variables instead of the raw numbers. 

You can initialize the variables to use \$550 for the closing price of the stock on Monday, and then use \$560 as the closing price on Tuesday. 

Then, try to change these numbers and see the results. (Hint: notice what would happen if you use the name `return` for the variable.) 

# Solution

---

# Exercise 3



Write a Python program to convert centimeters to feet and inches. Remember that one foot is 30.48 centimeters, and one inch is 2.54 centimeters.

# Solution

---

# Exercise 4

Imagine we want to compute the "[wind chill index](https://en.wikipedia.org/wiki/Wind_chill)" as described by Wikipedia

$T_\mathrm{wc}=35.74+0.6215  *  T_\mathrm{a}-35.75 * v^{0.16}+0.4275 * T_\mathrm{a} * v^{0.16}$

where $T_\mathrm{wc}$  is the wind chill index, based on the Fahrenheit scale; $T_\mathrm{a}$ is the air temperature in degrees Fahrenheit, and $v$ is the wind speed in miles per hour. (Note: Windchill temperature is defined only for temperatures at or below 50 °F and wind speeds above 3.0 miles per hour.)

Right now, it is 32 degrees Fahrenheit outside and the wind is blowing at 5 mph. 

# Solution

---

# Exercise 5

Imagine you have a loan with a given principal amount $p$, to be repaid over a period of $n$ years. At the end of each year, you are charged interest at rate $r$ on the total amount that you still owe. Each year you pay an installment equal to $1/n$-th of the initial principal, plus interest on the *remaining* principal. 

Assume that the principal is \$10K, the interest rate is 3%, and repayment is over 10 years.

Write code that computes how much you owe at the end of year 1. Then, write code for calculating your payment at the end of year 2.

# Solution

---

# ⭕ **QUESTIONS?**

---

# Casing, Indexing and Slicing: Changing and accessing  parts of the string

**Note: The concepts of indexing and slicing are general. We will encounter them in many other contexts, beyond strings.**

# Capitalization

Now let's see how we can change the capitalization of the text. This will also be our first, informal, introduction to a new type of function. 

* `str.upper()`: returns an uppercase version of a string
* `str.lower()`: returns a lowercase version of a string

In [None]:
news = """According to a Wall Street Journal investigation, the site today offers a steady stream of clothing from 
dozens of Bangladeshi factories that most leading retailers have said are too dangerous to allow into their supply 
chains."""
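A short sketch of the two functions. To keep the block self-contained, a hypothetical short string `headline` stands in for the longer `news` text:

```python
headline = "Wall Street Journal"

upper_version = headline.upper()
lower_version = headline.lower()

print(upper_version)  # WALL STREET JOURNAL
print(lower_version)  # wall street journal
print(headline)       # Wall Street Journal (the original string is unchanged)
```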

# Small Deviation

`.lower()` is a function that applies only to strings. (A function that is attached to a value with a dot, like this one, is called a _method_.) 

Notice the different way that we apply the function, compared to, say, the `len(news)` above. 

In this case, we have a string variable (`news`), then we have a dot (`.`), and then we put the name of the function: (`lower()`). Notice that we have the two parentheses at the end, which indicate that the function does not have any arguments, except for the `news` variable. 

Notice that if we omit the parentheses, we will get a strange result:
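A sketch of the difference, using a hypothetical short string `headline`. Without the parentheses, Python hands back the function itself instead of calling it:

```python
headline = "Wall Street Journal"

without_parens = headline.lower    # the function itself, not its result
with_parens = headline.lower()     # the function is called and returns a string

print(without_parens)  # something like <built-in method lower of str object at 0x...>
print(with_parens)     # wall street journal
```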

---

# ⭕ **QUESTIONS?**

---

# Indexing

Strings can be indexed (subscripted), with the first character having index 0.
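For instance, with the string `"Python"` (the variable names are illustrative):

```python
word = "Python"

first = word[0]   # index 0 is the first character
third = word[2]
last = word[5]    # the string has 6 characters, so the last index is 5

print(first, third, last)  # P t n
```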

# Negative indexing

Indices may also be negative numbers, to start counting from the right at -1:

In [None]:
##### Illustration of indexing and negative indexing

#               | P| y| t| h| o| n|
#Index          | 0| 1| 2| 3| 4| 5|
#Negative Index |-6|-5|-4|-3|-2|-1|
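The table above can be checked directly. A small sketch with illustrative variable names:

```python
word = "Python"

last = word[-1]          # -1 is the last character
second_to_last = word[-2]
first = word[-6]         # -6 points at the same character as index 0

print(last, second_to_last, first)  # n o P
```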

# Slicing

In addition to indexing, slicing is also supported. While indexing is used to obtain individual characters, slicing allows you to obtain a substring:
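A few slices of `"Python"` as a sketch. In `word[start:end]` the character at `start` is included and the character at `end` is not. Leaving out `start` or `end` means "from the beginning" or "to the end":

```python
word = "Python"

print(word[0:2])   # Py    (characters at index 0 and 1)
print(word[2:6])   # thon  (characters at index 2 through 5)
print(word[:2])    # Py    (from the beginning up to, not including, index 2)
print(word[2:])    # thon  (from index 2 to the end)
print(word[-2:])   # on    (the last two characters)
```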

---

# ⭕ **QUESTIONS?**

---

# Exercise 6

* Assign the string 'Dealing with Data' to a Python variable. 
* Print the word 'Dealing' by using the indexing/slicing approach.
* Print the word 'Data' by using the negative indexing/slicing approach.

# Solution

---

# String Comparisons

We will examine now the concept of comparisons among strings and introduce a few comparison operators. 

# Equality comparison

Let's first examine how we can check if two strings are identical. For this comparison, we need the equality operator `==`.
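A minimal sketch with made-up strings. The result of `==` is either `True` or `False`:

```python
city = "New York"

same = city == "New York"
different_case = city == "new york"

print(same)            # True
print(different_case)  # False
```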

Notice that **capitalization matters** when comparing strings in Python. If we want to make the comparison case-insensitive, we typically first convert both sides of the equality to the same case:
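A sketch of the case-insensitive comparison, with made-up strings:

```python
typed = "NEW york"
expected = "New York"

exact_match = typed == expected
case_insensitive_match = typed.lower() == expected.lower()

print(exact_match)             # False
print(case_insensitive_match)  # True
```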

The opposite operator for equality is the inequality operator: `!=`. For example:
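A short illustration with made-up strings:

```python
city = "New York"

print(city != "Boston")    # True: the two strings differ
print(city != "New York")  # False: the two strings are identical
```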

# Ordering Strings

Strings also allow for inequality comparisons. When we compare strings, the string that is "smaller" is the one that comes first in the dictionary. Let's see an example: 
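A sketch using two names. `Zack` is a hypothetical second name chosen for this illustration:

```python
print("Bill" < "Zack")   # True: "Bill" comes before "Zack"
print("Zack" < "Bill")   # False
```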

Notice though the following, where the capitalization of `Bill` changes:
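With a lowercase `bill` (and the hypothetical name `Zack` for comparison), the result may be surprising:

```python
print("bill" < "Zack")   # False: lowercase "b" is ordered after uppercase "Z"
print("Bill" < "Zack")   # True
```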

What causes this is the fact that the order is not simply the order in which we would encounter words in the dictionary. Technically, strings are ordered based on the order of the characters in the ASCII (or Unicode) table.

For example, if we have the list of strings below, and we try to sort them, look what happens:
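A sketch with a made-up list of words. All uppercase letters are ordered before all lowercase letters, so the capitalized words come first. Passing `key=str.lower` to `sorted` is one way to get the ordering a dictionary would use:

```python
words = ["banana", "Cherry", "apple", "Date"]

default_order = sorted(words)
ignoring_case = sorted(words, key=str.lower)

print(default_order)   # ['Cherry', 'Date', 'apple', 'banana']
print(ignoring_case)   # ['apple', 'banana', 'Cherry', 'Date']
```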

---

# ⭕ **QUESTIONS?**

---

# Finding text within string variables

####  `in` operator


+ The `in` operator, `needle in haystack`: reports if the string `needle` appears in the string `haystack`


For example, string "New York" appears within "New York University", so the following operator returns `True`:

But "New York University" is not in "New York":
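Both checks can be sketched as follows. The variable names are illustrative:

```python
short_in_long = "New York" in "New York University"
long_in_short = "New York University" in "New York"

print(short_in_long)   # True
print(long_in_short)   # False
```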

# `find` function

* The `find` function, `haystack.find(needle)`: searches `haystack` for `needle` and returns the position of the first occurrence, indexed from 0; returns -1 if not found.

For example:
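A minimal sketch with a made-up string:

```python
haystack = "New York University"

found_at = haystack.find("York")
missing = haystack.find("Boston")

print(found_at)  # 4: "York" starts at index 4
print(missing)   # -1: "Boston" does not appear
```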

If we are looking to find additional appearances of the string, then we can add a second parameter in the `find` function, specifying that we are only interested in matches at or after the position specified by the parameter.
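A sketch with a made-up string in which the needle appears twice. Starting the second search one position after the first match skips over it:

```python
text = "New York, New York"

first = text.find("New")
second = text.find("New", first + 1)   # start searching just after the first match

print(first)   # 0
print(second)  # 10
```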

---

# Exercise 7

Consider the string `billgates@microsoft.com`. 

Write code that finds the username of the email address and the domain of the email address. You will need to use the .find() command, and also use your knowledge of indexing and slicing for this exercise. 

Hint: You will need to search for the `@` character using find, and then use the result to get the parts of the string before and after the `@` character. (Do not worry if this seems tedious, this is mainly for practice; later on, we will see how to do this in an easier way.)


# Solution

---

# `count` function

+ `str_1.count(str_2)`: counts the number of occurrences of one string in another.

Of course, notice that if capitalization is different, the matches will not "count".
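A sketch with a made-up string, including one way to count while ignoring capitalization:

```python
text = "New York, New York"

exact = text.count("New")
wrong_case = text.count("new")
ignoring_case = text.lower().count("new")

print(exact)          # 2
print(wrong_case)     # 0: "new" never appears in lowercase
print(ignoring_case)  # 2
```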

---

# Exercise 8

Consider the news article from [NYT](https://www.nytimes.com/2022/02/15/sports/olympics/alex-hall-nick-goepper-gold-silver-slopestyle-skiing.html?), which is given below, and stored in the string variable `article`.

* Count how many times the word `Hall` appears in the article.
* Count how many times the word `Goepper` appears in the article. 
* Now sum up the occurrences of `Hall` and `Goepper` and display the percentage of coverage for each of the two strings. (For example, if Hall appears 2 times and Goepper 3 times, then Hall is 40% and Goepper is 60%.)

In [None]:
article = """
Alex Hall let out a whoop when he landed his last trick on the slopestyle course, and that was before the judges awarded him with what would be the winning score. It was, he said later, the best run of his life.

“Oh, I was stoked,” he said. “I couldn’t believe I just landed that.”

Hall was one of three Americans looking to crowd the medal stand at the men’s freestyle skiing slopestyle event, hoping that a European-centric field would not disrupt those plans.

Two of them did it: Hall won gold and Nick Goepper took silver on another sunny, below-zero day at Genting Snow Park. Jesper Tjader of Sweden won bronze.

In a competition where only a skier’s best score counted, Hall set the standard early with a 90.01 score on the first of three runs. Everyone else spent the frigid morning trying to match it, but no one did. Goepper came closest, on his second run, scoring 86.48.

“All right,” he said when the score popped up. “I’ll take it.”

Each of the Americans in the final arrived with high hopes and a stirring story. Goepper, 27, was looking to complete a full rainbow of medals, having won a bronze in 2014 and a silver in 2018. He has battled alcohol abuse and depression, opening up about his struggles after his 2018 performance in Pyeongchang, South Korea.

In an interview last month, Goepper said that he was glad that other Olympians seem increasingly willing to discuss their mental health.

Colby Stevenson, 24, was in a near-fatal car accident in 2016, late at night on a rural road in Idaho. He spent days in a coma, but recovered to return to the global circuit and win major events. At these Olympics, he won a silver medal in big air and was a contender for another medal in slopestyle.

Instead, he finished seventh, unable to cleanly land the run he imagined.

“Gave it everything I had,” he said after his last chance.

The day belonged to Hall. The 23-year-old was born in Alaska but grew up mostly in Switzerland, the son of professors at the University of Zurich. He did not have coaching until he was 16, when he was invited by the U.S. freeski team to train in Utah. For a time, he considered competing for Italy, where his mother is from.

That background, free from the constraints of coaching and youth competitions, imbibed him with a bit of a free spirit.

He was 16th in slopestyle at the 2018 Pyeongchang Winter Olympics, just as his career was taking off. He won a World Cup event that year and the X Games in 2019. He was third at last year’s world championships.

He is tall, well over six feet, but stands out on the slopes mostly for his originality.

“You’ll see him doing a whole bunch of taps and nose butters and creative ways to utilize the course,” U.S. freeski coach Dave Euler said of Hall in December. “He’s a very creative course user.”

The Olympic contest was the final showing for the slopestyle course, a standout venue — but a temporary one, made of snow — designed to look like a section of the nearby Great Wall. Its combination of rails, obstacles and jumps created a plethora of possibilities, but vexed some of the world’s best snowboarders and freeskiers. Hall and Goepper loved it.

“As soon as you standardize this sport, you’re going to kill it,” Goepper said. “So if you can leave the creativity and the artistry up to us, that is going to keep this sport fresh.”

It is why Hall was deemed the worthiest of Olympic champions. He has won big contests with dizzying spins, a ceaseless spin-to-win trend in both freeskiing and snowboarding that worries purists.

But on Wednesday, Hall brought a bag of technical tricks, hoping the judges would reward him for originality rather than rotations.

His last jump was one that he had landed only a couple of times before, even though it is really only a 900-degree rotation — half of what many other tricks are these days. As Hall described it, he spun one way in the air, stopped and spun the other way before landing.

That led to the whoop.

“I’ve always told myself, if I’m not having fun doing it, then there’s really no reason to do it,” Hall said. “So I might as well do what brings me all this joy.”

His smile was concealed by a mask, a hallmark of an Olympics held during a pandemic. But his eyes lit up below his ice-frosted eyebrows. He wore an American flag over his shoulders, and soon, a gold medal around his neck.


"""

# Solution

---

# ⭕ **QUESTIONS?**

---

# `startswith` and `endswith` functions

Finally, we can also check if a particular string starts or ends with another substring.

+ `haystack.startswith(needle)`: does the haystack string start with the needle string?
+ `haystack.endswith(needle)`: does the haystack string end with the needle string?
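A short sketch with a made-up string:

```python
haystack = "New York University"

print(haystack.startswith("New"))         # True
print(haystack.endswith("University"))    # True
print(haystack.startswith("York"))        # False: "York" appears, but not at the start
```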


# Special characters

When we work with strings, we often want to use some "_special characters_". These special characters consist of the backslash character (`\`) followed by another character.

**Tab character** `\t`: For example, if we want to create an output of multiple columns, with each column separated from the next by a tab, we can use the tab character, which is represented as `\t`. 

**New line character** `\n`: This is a special character that we use to represent a new line.

**Backslash character** `\\`: In general, backslash (`\`) is used to introduce special characters. If we want to type the backslash character itself, we do it by typing backslash twice in a row: `\\`.

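**Quotes** `\'` and `\"`: If we want to include a quote character inside a string that is delimited by the same kind of quote, we put a backslash in front of it.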
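The special characters above can be sketched as follows. The strings are made up, and the last two lines show escaped quotes written as `\"` and `\'`:

```python
columns = "Name\tCity"            # \t is a single tab character
two_lines = "line one\nline two"  # \n starts a new line
path = "C:\\temp"                 # \\ is a single backslash
double_quoted = "She said \"hi\""
single_quoted = 'It\'s fine'

print(columns)
print(two_lines)
print(path)            # C:\temp
print(double_quoted)   # She said "hi"
print(single_quoted)   # It's fine
```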

---

# ⭕ **QUESTIONS?**

---

# Splitting Strings: `split` and `join`

Since we talked about special characters, let’s talk now about splitting strings, using the `split()` function.


+ `longstring.split(separator)`: splits the first string (`longstring`) at every occurrence of the second string (`separator`) and returns a list (see below).
+ `connector.join(list)`: join is the "reverse" of split; it joins all the elements of the list, placing the `connector` string between consecutive elements.
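A minimal sketch of `split`, using a made-up comma-separated string:

```python
longstring = "apple,banana,cherry"

parts = longstring.split(",")
print(parts)  # ['apple', 'banana', 'cherry']
```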

Or let's take the example from above:

Notice that when we split a string and the delimiter character
does not appear, then we get back the string itself, but converted
into a list with a single element. (We will talk about lists in a future class.)
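For instance, splitting a made-up string on a comma that it does not contain:

```python
no_comma = "apple"

parts = no_comma.split(",")
print(parts)  # ['apple']
```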

Join is the "reverse" of split: it joins all the elements of the list, placing the string that it is called on between them. The command below joins the elements of the `line` variable using the characters `###` as the connecting element.
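A sketch of that command. The contents of `line` here are a hypothetical list, assumed only for illustration:

```python
line = ["apple", "banana", "cherry"]

joined = "###".join(line)
print(joined)  # apple###banana###cherry
```

Splitting `joined` on `###` gives back the original list.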

---

# Exercise 9 

Consider the string `billgates@microsoft.com`.

Write code that finds the username of the email address and the domain of the email address, using the `split()` command.

# Solution

---

# Booleans

Booleans represent the truth or success of a statement, and are commonly used for branching and checking status in code.

They can take two values: `True` or `False`.

If you remember from our strings session, we could execute a command that checks if a string appears within another. For example:
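A sketch of such a check. The result can be stored in a variable like any other value:

```python
found = "New York" in "New York University"

print(found)        # True
print(type(found))  # <class 'bool'>
```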


# Boolean Operations:

Frequently, one wants to combine or modify boolean values. Python has several operations for just this purpose:

+ `not a`: returns the opposite value of `a`.
+ `a and b`: returns true if and only if both `a` and `b` are true.
+ `a or b`: returns true if either `a` or `b` is true, or both.

Like mathematical expressions, boolean expressions can be nested using parentheses. 

Consider the outcomes of the following examples:
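A few combinations, sketched with two illustrative variables:

```python
a = True
b = False

print(not a)               # False
print(a and b)             # False: both sides would need to be True
print(a or b)              # True: at least one side is True
print((a or b) and not b)  # True: (True) and (True)
print(not (a and b))       # True: the parentheses are evaluated first
```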

---

# ⭕ **QUESTIONS?**

---

# Control Structures: if statements

Traversing over data and making decisions based upon data are a common aspect of every programming language, known as control flow. Python provides a rich control flow, with a lot of conveniences for the power users. Here, we're just going to talk about the basics.

A common theme throughout this discussion of control structures is the notion of a "block of code." Blocks of code are **demarcated by a specific level of indentation**, typically separated from the surrounding code by some control structure element, and immediately preceded by a colon, `:`. We'll see examples below. 

# if Statements:

If statements are perhaps the most widely used of all control structures. An if statement consists of a code block and an argument. The if statement evaluates the boolean value of its argument, executing the code block if that argument is true. 
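A minimal sketch with a made-up temperature value. The indented lines form the code block, and they run only when the condition after `if` is true:

```python
temperature = 30

if temperature < 32:
    # These two indented lines are the block controlled by the if
    print("It is below freezing.")
    print("Wear a coat.")

# This line is not indented, so it always runs
print("Done checking the temperature.")
```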

And here is an `if` statement paired with an `else`.

In [None]:
text = "The day belonged to Hall. The 23-year-old was born in Alaska but grew up mostly in Switzerland, the son of professors at the University of Zurich. He did not have coaching until he was 16, when he was invited by the U.S. freeski team to train in Utah. For a time, he considered competing for Italy, where his mother is from."
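One possible `if`/`else` on such a text is sketched below. The condition that was checked in class is not shown in these notes, so the test for `"Alaska"` is an assumption, and `text` is shortened here so that the block runs on its own:

```python
# Shortened stand-in for the longer text above
text = "The day belonged to Hall. The 23-year-old was born in Alaska but grew up mostly in Switzerland."

if "Alaska" in text:
    result = "The text mentions Alaska."
else:
    result = "The text does not mention Alaska."

print(result)  # The text mentions Alaska.
```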




Each argument in the above if statements is a boolean expression. Often you want to have alternatives, blocks of code that get evaluated in the event that the argument to an if statement is false. This is where **`elif`** (else if) and **`else`** come in. 

An **`elif`** is evaluated if all preceding `if` or `elif` arguments have evaluated to false. The `else` statement is the last resort, assigning the code that gets executed if no `if` or `elif` above it is true. These statements are optional: any `elif` blocks must come after the `if`, and the `else`, if present, must come last. At most one code block is executed. An `else` will always have its code executed if nothing above it is true.
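A sketch of an `if`/`elif`/`else` chain with made-up temperature thresholds. Python checks the conditions from top to bottom and runs only the first block whose condition is true:

```python
temperature = 45

if temperature <= 32:
    label = "freezing"
elif temperature <= 50:
    label = "cold"      # reached only because the first condition was false
else:
    label = "mild"      # runs only when nothing above was true

print(label)  # cold
```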

---

# Exercise 10

You need to be 21 years old or older to drink alcohol. Write a conditional expression that checks the age, and prints out whether the person is allowed to drink alcohol.

# Solution

---

# A Hint for Your Homework

If you would like to request input from a user, you can simply use `input()`. For instance...
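A sketch of how this might look. So that the block runs without waiting for the keyboard, the real `input()` call is shown in a comment and a stand-in string is used instead. The prompt text and the variable names are assumptions:

```python
# In your notebook you would write:
#     answer = input("How old are you? ")
answer = "21"   # stand-in for what the user typed

# input() always returns a string, so convert it before doing arithmetic
age = int(answer)
print(age + 1)  # 22
```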

---