# First steps with Python

# Expressions

Programming languages are much simpler than human languages. Nonetheless, there are some rules of grammar to learn in any language, and that is where we will begin. In this text, we will use the [Python](https://www.python.org/) programming language. Learning the grammar rules is essential, and the same rules used in the most basic programs are also central to more sophisticated programs.

Programs are made up of *expressions*, which describe to the computer how to combine pieces of data. For example, a multiplication expression consists of a `*` symbol between two numerical expressions. Expressions, such as `3 * 4`, are *evaluated* by the computer. The value (the result of *evaluation*) of the last expression in each cell, `12` in this case, is displayed below the cell.

In [1]:
3 * 4

12

The grammar rules of a programming language are rigid. In Python, the `*` symbol cannot appear twice in a row with a space in between. The computer will not try to interpret an expression that differs from its prescribed expression structures. Instead, it will report a `SyntaxError`. The *syntax* of a language is its set of grammar rules, and a `SyntaxError` indicates that an expression structure doesn't match any of the rules of the language.

In [2]:
3 * * 4

SyntaxError: invalid syntax (<ipython-input-2-012ea60b41dd>, line 1)

Small changes to an expression can change its meaning entirely. Below, the space between the `*`'s has been removed. Because `**` appears between two numerical expressions, the expression is a well-formed *exponentiation* expression (the first number raised to the power of the second: 3 times 3 times 3 times 3). The symbols `*` and `**` are called *operators*, and the values they combine are called *operands*.

In [3]:
3 ** 4

81

**Common Operators.** Data science often involves combining numerical values, and the set of operators in a programming language is designed so that expressions can be used to express any sort of arithmetic. In Python, the following operators are essential.

| Expression Type | Operator | Example    | Value     |
|-----------------|----------|------------|-----------|
| Addition        | `+`      | `2 + 3`    | `5`       |
| Subtraction     | `-`      | `2 - 3`    | `-1`      |
| Multiplication  | `*`      | `2 * 3`    | `6`       |
| Division        | `/`      | `7 / 3`    | `2.33333` |
| Exponentiation  | `**`     | `2 ** 0.5` | `1.41421` |

Python expressions obey the same familiar rules of *precedence* as in algebra: exponentiation occurs first, then multiplication and division, and finally addition and subtraction. Parentheses can be used to group together smaller expressions within a larger expression.

In [4]:
1 + 2 * 3 * 4 * 5 / 6 ** 3 + 7 + 8 - 9 + 10

17.555555555555557

In [5]:
1 + 2 * (3 * 4 * 5 / 6) ** 3 + 7 + 8 - 9 + 10

2017.0
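
The precedence rules above can be checked on smaller expressions. The following is a minimal sketch; the names on the left of each `=` are only illustrative labels for the results (names are covered in the next section).

```python
# Multiplication is applied before addition.
without_parens = 2 + 3 * 4      # 2 + 12 = 14
with_parens = (2 + 3) * 4       # 5 * 4 = 20

# Exponentiation is applied before multiplication.
power_first = 2 * 3 ** 2        # 2 * 9 = 18
product_first = (2 * 3) ** 2    # 6 ** 2 = 36

print(without_parens, with_parens, power_first, product_first)
```

Adding parentheses never hurts: when the order of evaluation is in doubt, grouping makes the intent explicit.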

This chapter introduces many types of expressions. Learning to program involves trying out everything you learn in combination, investigating the behavior of the computer. What happens if you divide by zero? What happens if you divide twice in a row? You don't always need to ask an expert (or the Internet); many of these details can be discovered by trying them out yourself. 

In [14]:
12//3

4
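
The questions above can be answered by experiment. The sketch below is one way to try them; it uses `try` and `except`, which are not covered in this chapter, only so that the division by zero does not stop the rest of the code from running. The names are illustrative.

```python
# The / operator always produces a float, even for a whole result.
true_quotient = 12 / 3          # 4.0

# The // operator (floor division) rounds the quotient down.
floor_quotient = 13 // 3        # 4
negative_floor = -7 // 2        # -4, because -3.5 rounds down to -4

# Dividing twice in a row is evaluated from left to right.
divided_twice = 12 / 3 / 2      # (12 / 3) / 2 = 2.0

# Dividing by zero produces an error rather than a value.
try:
    1 / 0
    zero_division_failed = False
except ZeroDivisionError:
    zero_division_failed = True

print(true_quotient, floor_quotient, negative_floor, divided_twice)
print(zero_division_failed)
```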

# Names

Names are given to values in Python using an *assignment* statement. In an assignment, a name is followed by `=`, which is followed by any expression. The value of the expression to the right of `=` is *assigned* to the name. Once a name has a value assigned to it, the value will be substituted for that name in future expressions.

In [1]:
a = 10
b = 20
a + b

30

A previously assigned name can be used in the expression to the right of `=`. 

In [2]:
quarter = 1/4
half = 2 * quarter
half

0.5

However, only the current value of an expression is assigned to a name. If that value changes later, names that were defined in terms of that value will not change automatically.

In [3]:
quarter = 4
half

0.5
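
The steps above can be traced in a single cell. In this sketch, `stale_half` and `fresh_half` are hypothetical names introduced only to record the value of `half` at two different moments.

```python
quarter = 1 / 4
half = 2 * quarter      # half is assigned the value 0.5, not the formula

quarter = 4             # reassigning quarter does not affect half
stale_half = half       # still 0.5

half = 2 * quarter      # repeating the assignment uses the new value
fresh_half = half       # now 8

print(stale_half, fresh_half)
```

To update a name that depends on another, the assignment statement has to be executed again.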

Names must start with a letter or an underscore, and can contain letters, numbers, and underscores. A name cannot contain a space; instead, it is common to use an underscore character `_` to replace each space. Names are only as useful as you make them; it's up to the programmer to choose names that are easy to interpret. Typically, more meaningful names can be invented than `a` and `b`. For example, to describe the sales tax on a $5 purchase in Berkeley, CA, the following names clarify the meaning of the various quantities involved.

In [4]:
purchase_price = 5
state_tax_rate = 0.075
county_tax_rate = 0.02
city_tax_rate = 0
sales_tax_rate = state_tax_rate + county_tax_rate + city_tax_rate
sales_tax = purchase_price * sales_tax_rate
sales_tax

0.475

# Call Expressions

*Call expressions* invoke functions, which are named operations. The name of the function appears first, followed by expressions in parentheses. 

In [1]:
abs(-12)

12

In [2]:
round(5 - 1.3)

4

In [3]:
max(2, 2 + 3, 4)

5

In this last example, the `max` function is *called* on three *arguments*: 2, 5, and 4. The value of each expression within parentheses is passed to the function, and the function *returns* the final value of the full call expression. The `max` function can take any number of arguments and returns the maximum.
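
Because each argument is itself an expression, call expressions can be nested inside one another. The sketch below illustrates this with built-in functions; `min` and the second argument to `round` are not discussed in the text above, and the names are illustrative.

```python
# Each argument is evaluated before the function is called.
largest = max(2, 2 + 3, 4)              # arguments are 2, 5, 4
smallest = min(2, 2 + 3, 4)             # min works like max

# A call expression can be an argument to another call.
nested = abs(min(-7, 3) + max(1, 2))    # abs(-7 + 2) = 5

# round accepts an optional second argument: the number of decimal places.
rounded = round(3.14159, 2)             # 3.14

print(largest, smallest, nested, rounded)
```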

A few functions are available by default, such as `abs` and `round`, but most functions that are built into the Python language are stored in a collection of functions called a *module*. An *import statement* is used to provide access to a module, such as `math`.

In [15]:
import math

math.sqrt(5+4)

3.0

The same value can be computed using the `+` and `**` operators instead.

In [5]:
(4 + 5) ** 0.5

3.0

Operators and call expressions can be used together in an expression. The *percent difference* between two values is used to compare values for which neither one is obviously the initial or the changed value. For example, in 2014 Florida farms produced 2.72 billion eggs while Iowa farms produced 16.25 billion eggs (http://quickstats.nass.usda.gov/). The percent difference is 100 times the absolute value of the difference between the values, divided by their average. In this case, the difference is larger than the average, and so the percent difference is greater than 100.

In [6]:
florida = 2.72
iowa = 16.25
100*abs(florida-iowa)/((florida+iowa)/2)

142.6462836056932
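
One reason to use the percent difference is that it does not depend on the order of the two values. The sketch below checks this and, for contrast, computes the ordinary percent change in each direction; the names other than `florida` and `iowa` are illustrative.

```python
florida = 2.72
iowa = 16.25
average = (florida + iowa) / 2

# Swapping the two values gives the same percent difference.
forward = 100 * abs(florida - iowa) / average
backward = 100 * abs(iowa - florida) / average

# Percent change depends on which value is treated as the starting point.
change_from_florida = 100 * (iowa - florida) / florida   # a large increase
change_from_iowa = 100 * (florida - iowa) / iowa         # a decrease

print(forward, backward)
print(change_from_florida, change_from_iowa)
```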

Learning how different functions behave is an important part of learning a programming language. A Jupyter notebook can assist in remembering the names and effects of different functions. When editing a code cell, press the *tab* key after typing the beginning of a name to bring up a list of ways to complete that name. For example, press *tab* after `math.` to see all of the functions available in the `math` module. Typing will narrow down the list of options. To learn more about a function, place a `?` after its name. For example, typing `math.log?` will bring up a description of the `log` function in the `math` module.

In [7]:
math.log?

    log(x[, base])

    Return the logarithm of x to the given base.
    If the base not specified, returns the natural logarithm (base e) of x.

The square brackets in the example call indicate that an argument is optional. That is, `log` can be called with either one or two arguments.

In [8]:
math.log(16, 2)

4.0

In [9]:
math.log(16)/math.log(2)

4.0

# Data Types

Every value has a type, and the built-in `type` function returns the type of the result of any expression.

One type we have encountered already is a built-in function. Python indicates that the type is a `builtin_function_or_method`; the distinction between a *function* and a *method* is not important at this stage.

In [1]:
type(abs)

builtin_function_or_method

This chapter will explore many useful types of data.

# Numbers

Computers are designed to perform numerical calculations, but there are some important details about working with numbers that every programmer working with quantitative data should know. Python (and most other programming languages) distinguishes between two different types of numbers:

* Integers are called `int` values in the Python language. They can only represent whole numbers (negative, zero, or positive) that don't have a fractional component.
* Real numbers are called `float` values (or *floating point values*) in the Python language. They can represent whole or fractional numbers but have some limitations.

The type of a number is evident from the way it is displayed: `int` values have no decimal point and `float` values always have a decimal point. 

In [1]:
# Some int values
2

2

In [2]:
1 + 3

4

In [3]:
-1234567890000000000

-1234567890000000000

In [4]:
# Some float values
1.2

1.2

In [5]:
3.0

3.0

When a `float` value is combined with an `int` value using some arithmetic operator, then the result is always a `float` value. In most cases, two integers combine to form another integer, but any number (`int` or `float`) divided by another will be a `float` value. Very large or very small `float` values are displayed using scientific notation.

In [6]:
1.5 + 2

3.5

In [7]:
3 / 1

3.0

In [8]:
-12345678900000000000.0

-1.23456789e+19

The `type` function can be used to find the type of any number.

In [9]:
type(3)

int

In [10]:
type(3 / 1)

float

The `type` of an expression is the type of its final value. So, the `type` function will never indicate that the type of an expression is a name, because names are always evaluated to their assigned values.

In [11]:
x = 3
type(x) # The type of x is an int, not a name

int

In [12]:
type(x + 2.5)

float
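
The rules for combining `int` and `float` values can be verified with the `type` function. This is a small sketch with illustrative names; note that `print` displays a type as `<class 'int'>` rather than the shorter `int` shown in notebook output.

```python
int_sum = 2 + 3            # int combined with int gives an int
mixed_product = 2 * 1.5    # int combined with float gives a float
quotient = 4 / 2           # division gives a float, even for a whole result
whole_power = 2 ** 3       # int raised to a positive int gives an int

print(type(int_sum), type(mixed_product), type(quotient), type(whole_power))
```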

## More About Float Values

Float values are very flexible, but they do have limits. 

1. A `float` can represent extremely large and extremely small numbers. There are limits, but you will rarely encounter them.
2. A `float` only represents 15 or 16 significant digits for any number; the remaining precision is lost. This limited precision is enough for the vast majority of applications.
3. After combining `float` values with arithmetic, the last few digits may be incorrect. Small rounding errors are often confusing when first encountered.

The first limit can be observed in two ways. If the result of a computation is a very large number, then it is represented as infinite. If the result is a very small number, then it is represented as zero.

In [13]:
2e306 * 10

2e+307

In [14]:
2e306 * 100

inf

In [15]:
2e-322 / 10

2e-323

In [16]:
2e-322 / 100

0.0

The second limit can be observed by an expression that involves numbers with more than 15 significant digits. These extra digits are discarded before any arithmetic is carried out.

In [17]:
0.6666666666666666 - 0.6666666666666666123456789

0.0

The third limit can be observed when taking the difference between two expressions that should be equivalent. For example, the expression `2 ** 0.5` computes the square root of 2, but squaring this value does not exactly recover 2.

In [18]:
2 ** 0.5

1.4142135623730951

In [19]:
(2 ** 0.5) * (2 ** 0.5)

2.0000000000000004

In [20]:
(2 ** 0.5) * (2 ** 0.5) - 2

4.440892098500626e-16

The final result above is `0.0000000000000004440892098500626`, a number that is very close to zero. The correct answer to this arithmetic expression is 0, but a small error in the final significant digit appears very different in scientific notation. This behavior appears in almost all programming languages because it is the result of the standard way that arithmetic is carried out on computers. 

Although `float` values are not always exact, they are reliable and work the same way across nearly all computers and programming languages.
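
Because of these rounding errors, testing whether two `float` values are exactly equal can give surprising results. One common approach, sketched below, is to check whether the values are close instead, using `math.isclose` from the standard library or by rounding first. The `==` operator used here is introduced in the section on comparisons, and the names are illustrative.

```python
import math

squared = (2 ** 0.5) * (2 ** 0.5)           # 2.0000000000000004

exactly_equal = squared == 2                # False: the last digit differs
close_enough = math.isclose(squared, 2)     # True: equal within a small tolerance
rounded_equal = round(squared, 10) == 2     # True: the error is rounded away

tenths = 0.1 + 0.2                          # 0.30000000000000004, not 0.3

print(exactly_equal, close_enough, rounded_equal, tenths)
```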

# Strings

Much of the world's data is text, and a piece of text represented in a computer is called a *string*. A string can represent a word, a sentence, or even the contents of every book in a library. Since text can include numbers (like this: 5) or truth values (True), a string can also describe those things.

The meaning of an expression depends both upon its structure and the types of values that are being combined. So, for instance, adding two strings together produces another string. This expression is still an addition expression, but it is combining a different type of value.

In [1]:
"data" + "science"

'datascience'

Addition is completely literal; it combines these two strings together without regard for their contents. It doesn't add a space just because these are different words; that's up to the programmer (you) to specify.

In [2]:
"data" + " " + "science"

'data science'

Single and double quotes can both be used to create strings: `'hi'` and `"hi"` are identical expressions. Double quotes are often preferred because they allow you to include apostrophes inside of strings.

In [3]:
"This won't work with a single-quoted string!"

"This won't work with a single-quoted string!"

Why not? Try it out.

The `str` function returns a string representation of any value. Using this function, strings can be constructed that have embedded values.

In [4]:
"That's " + str(1 + 1) + ' ' + str(True)

"That's 2 True"

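The need for `str` becomes clear when a string is added to a number directly. In the sketch below, `try` and `except` (not covered in this chapter) are used only to record whether the addition fails; `count`, `label`, and `mixing_allowed` are illustrative names.

```python
count = 3

# Converting the number to a string first makes the addition well defined.
label = "There are " + str(count) + " items"

# Adding a string and a number without conversion is an error.
try:
    "There are " + count
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

print(label)
print(mixing_allowed)
```
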
# Comparisons

Boolean values most often arise from comparison operators. Python includes a variety of operators that compare values. For example, `3` is larger than `1 + 1`.

In [1]:
3 > 1 + 1

True

The value `True` indicates that the comparison is valid; Python has confirmed this simple fact about the relationship between `3` and `1+1`. The full set of common comparison operators is listed below.

| Comparison            | Operator | True Example | False Example |
|-----------------------|----------|--------------|---------------|
| Less than             | `<`      | `2 < 3`      | `2 < 2`       |
| Greater than          | `>`      | `3 > 2`      | `3 > 3`       |
| Less than or equal    | `<=`     | `2 <= 2`     | `3 <= 2`      |
| Greater than or equal | `>=`     | `3 >= 3`     | `2 >= 3`      |
| Equal                 | `==`     | `3 == 3`     | `3 == 2`      |
| Not equal             | `!=`     | `3 != 2`     | `2 != 2`      |

An expression can contain multiple comparisons, and they all must hold in order for the whole expression to be `True`. For example, we can express that `1+1` is between `1` and `3` using the following expression.

In [19]:
1 < 1 + 1 < 3

True
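
A chain of comparisons behaves like the individual comparisons joined together. The sketch below makes that explicit using the `and` operator, which is not introduced in this chapter; the names are illustrative.

```python
x = 1 + 1

between = 1 < x < 3                 # True: both 1 < 2 and 2 < 3 hold
spelled_out = (1 < x) and (x < 3)   # the same test written as two comparisons

not_between = 1 < x < 2             # False: 1 < 2 holds, but 2 < 2 does not

# An int and a float with the same numerical value compare as equal.
same_value = 2 == 2.0               # True

print(between, spelled_out, not_between, same_value)
```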

# Sequences

Values can be grouped together into collections, which allows programmers to organize those values and refer to all of them with a single name. By grouping values together, we can write code that performs a computation on many pieces of data at once.

Calling the function `array` on a list of values, written inside square brackets, places them into an *array*, which is a kind of sequential collection. Later in this section, we collect four different temperatures into an array called `highs`. These are the [estimated average daily high temperatures](http://berkeleyearth.lbl.gov/regions/global-land) over all land on Earth (in degrees Celsius) for the decades surrounding 1850, 1900, 1950, and 2000, respectively, expressed as deviations from the average absolute high temperature between 1951 and 1980, which was 14.48 degrees.

# Arrays

While there are many kinds of collections in Python, we will work primarily with arrays in this class. The `array` function, which is imported from the `numpy` module, creates arrays of numbers.

Arrays can also contain strings or other types of values, but a single array can only contain a single kind of data. (It usually doesn't make sense to group together unlike data anyway.)  For example:

In [20]:
from numpy import array

In [21]:
english_parts_of_speech = array(["noun", "pronoun", "verb", "adverb", "adjective", "conjunction", "preposition", "interjection"])
english_parts_of_speech

array(['noun', 'pronoun', 'verb', 'adverb', 'adjective', 'conjunction',
       'preposition', 'interjection'], dtype='<U12')
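
The rule that an array holds a single kind of data can be observed by passing mixed values to `array`. In this sketch, NumPy converts the values to a common type rather than reporting an error; `dtype` is the attribute that records the type of an array's elements, and the names are illustrative.

```python
from numpy import array

# Mixing ints and floats: every element becomes a float.
mixed_numbers = array([1, 2, 3.5])

# Mixing numbers and strings: every element becomes a string.
numbers_and_text = array([1, "two"])

print(mixed_numbers, mixed_numbers.dtype)
print(numbers_and_text, numbers_and_text.dtype)
```

Silent conversion like this is one more reason to avoid grouping unlike data in one array.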

Returning to the temperature data, we create arrays of average daily [high temperatures](http://berkeleyearth.lbl.gov/auto/Regional/TMAX/Text/global-land-TMAX-Trend.txt) for the decades surrounding 1850, 1900, 1950, and 2000.

In [22]:
baseline_high = 14.48
highs = array([baseline_high - 0.880,
               baseline_high - 0.093,
               baseline_high + 0.105,
               baseline_high + 0.684])
highs

array([13.6  , 14.387, 14.585, 15.164])

Arrays can be used in arithmetic expressions to compute over their contents. When an array is combined with a single number, that number is combined with each element of the array. Therefore, we can convert all of these temperatures to Fahrenheit by writing the familiar conversion formula.

In [28]:
(9/5) * highs + 32

array([56.48  , 57.8966, 58.253 , 59.2952])

![array arithmetic](../../../images/array_arithmetic.png)
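
The same elementwise behavior applies to any arithmetic between an array and a number. As a sketch, the conversion can be reversed to recover the Celsius values, and the original deviations can be recovered by subtracting the baseline. The names `fahrenheit`, `back_to_celsius`, and `deviations` are illustrative, and the results may differ from the exact values in their last few digits because of `float` rounding.

```python
from numpy import array

baseline_high = 14.48
highs = array([baseline_high - 0.880,
               baseline_high - 0.093,
               baseline_high + 0.105,
               baseline_high + 0.684])

# Each operation is applied to every element of the array.
fahrenheit = (9/5) * highs + 32
back_to_celsius = (fahrenheit - 32) * (5/9)

# Subtracting a single number subtracts it from each element.
deviations = highs - baseline_high

print(back_to_celsius)
print(deviations)
```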

Arrays also have *methods*, which are functions that operate on the array values. The `mean` of a collection of numbers is its average value: the sum divided by the length. Each pair of parentheses in the examples below is part of a call expression; it's calling a function with no arguments to perform a computation on the array called `highs`. The first example, `highs.size`, has no parentheses because `size` is not a method but a stored value: the number of elements in the array.

In [24]:
highs.size

4

In [25]:
highs.sum()

57.736000000000004

In [26]:
highs.mean()

14.434000000000001
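
The definition of the mean can be checked directly by dividing the sum by the size. This sketch rebuilds `highs` from the displayed values so that it runs on its own; `manual_mean` is an illustrative name, and `max` and `min` are further array methods not shown in the text above.

```python
from numpy import array

highs = array([13.6, 14.387, 14.585, 15.164])

# The mean is the sum of the elements divided by the number of elements.
manual_mean = highs.sum() / highs.size

# Other methods follow the same pattern of a call with no arguments.
warmest = highs.max()
coolest = highs.min()

print(manual_mean, highs.mean())
print(warmest, coolest)
```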

# Ranges

A *range* is an array of numbers in increasing or decreasing order, each separated by a regular interval. 
Ranges are useful in a surprisingly large number of situations, so it's worthwhile to learn about them.

Ranges are defined using the `np.arange` function, which takes either one, two, or three arguments: a start, an end, and a 'step'.

If you pass one argument to `np.arange`, this becomes the `end` value, with `start=0`, `step=1` assumed.  Two arguments give the `start` and `end` with `step=1` assumed.  Three arguments give the `start`, `end` and `step` explicitly.

A range always includes its `start` value, but does not include its `end` value.  It counts up by `step`, and it stops before it gets to the `end`.

    np.arange(end): An array starting with 0 of increasing consecutive integers, stopping before end.

In [2]:
import numpy as np
np.arange(5)

array([0, 1, 2, 3, 4])

Notice how the array starts at 0 and goes only up to 4, not to the end value of 5.


    np.arange(start, end): An array of consecutive increasing integers from start, stopping before end.

In [3]:
np.arange(3, 9)

array([3, 4, 5, 6, 7, 8])


    np.arange(start, end, step): A range with a difference of step between each pair of consecutive values, starting from start and stopping before end.

In [4]:
np.arange(3, 30, 5)

array([ 3,  8, 13, 18, 23, 28])

This array starts at 3, then takes a step of 5 to get to 8, then another step of 5 to get to 13, and so on.

When you specify a step, the start, end, and step can all be either positive or negative and may be whole numbers or fractions. 

In [5]:
np.arange(1.5, -2, -0.5)

array([ 1.5,  1. ,  0.5,  0. , -0.5, -1. , -1.5])
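
A few more ranges illustrate the rules above, including what happens when there is nothing between `start` and `end`. This is a sketch with illustrative names; the last line also shows that a range, being an array, can be used in arithmetic.

```python
import numpy as np

quarters = np.arange(0, 1, 0.25)     # 0, 0.25, 0.5, 0.75 (stops before 1)
countdown = np.arange(10, 0, -2)     # 10, 8, 6, 4, 2 (stops before 0)
nothing = np.arange(5, 5)            # empty: start is already at end

# Arithmetic applies to each element of a range.
squares = np.arange(1, 6) ** 2       # 1, 4, 9, 16, 25

print(quarters)
print(countdown)
print(nothing.size)
print(squares)
```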