# Getting Started in Python

Python is an interpreted language. When a Code cell in a Jupyter notebook is evaluated, the Python code in it is executed, and any output or error messages appear in an output portion of the cell, just after the input portion that contains the Python code.

At the bottom of the `Intro.ipynb` notebook, there is a blank cell where you can start entering Python code. If there is not already a blank cell there, click on the last cell and press `alt-Enter`.

## Hello World!

We start with the canonical example that appears in almost every programming book, which is to write a program that sends "Hello World!" to the screen/output.

To print output in Python, use the `print` function, which is a built-in function in Python 3. Here a *function* is used to denote a *named* set of Python instructions that can be called on demand. Python functions are called by using the function name with parentheses after it. Functions can accept *arguments*, which are variables and values that the function can act on. The arguments for a function are put inside the parentheses, with commas separating different items. 

To output text using `print`, put it as an argument inside single or double quotation marks:

In [1]:
print("Hello World!")

Hello World!


The `print()` function in Python knows how to deal with most standard data types. Unlike `printf` in C, explicit formatting instructions do not have to be given -- just pass the thing to be printed as an argument to `print`:

In [2]:
print(32)

32


In [3]:
print([1, 4, 9, 16])

[1, 4, 9, 16]


To print multiple items, separate them with commas inside the `print()` function:

In [4]:
print("Hello", "world!", 10 + 5, [1, 4, 9, 16])

Hello world! 15 [1, 4, 9, 16]


Code cells may contain multiple Python statements. All statements in a cell will be run sequentially when the cell is run. The results of every `print` call will be shown after the *input* part of the cell:

In [5]:
print("Hello ")
print("World!")

Hello 
World!
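
Each call to `print` ends its output with a newline by default, which is why the two words above land on separate lines. As a small sketch, the built-in `print` also accepts the keyword arguments `sep` and `end` (not otherwise used in this section) to change the separator between items and the text written at the end:

```python
# By default, print separates items with a space and ends with a newline.
print("Hello", "World!")        # Hello World!

# end="" suppresses the newline, so the next print continues on the same line.
print("Hello ", end="")
print("World!")                 # Hello World!

# sep controls what is placed between the items.
print(2021, 12, 31, sep="-")    # 2021-12-31
```

Passing arguments by name, as done here with `sep` and `end`, is described in more detail in the section on functions.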


Some Python statements return results, and these will appear in a special *output* part of the cell:

In [6]:
5 + 10

15

However, if a cell contains multiple statements, the output will be the result of the **last** statement:

In [7]:
5 + 10
5 * 10

50

The last statement in a cell may produce no output, in which case there is no output from the cell. Note that printed items are not outputs:

In [8]:
5 + 10
print("hello")

hello


There are several different ways to get the output of multiple statements. One easy way is to just print all the results you want to see:

In [9]:
print(5 + 10)
print(5 * 10)

15
50


## Comments

Comments are text in your Python code that is ignored by the Python interpreter.
Comments are usually used to document what you intend
for your code to be doing, which is not always what it is actually doing! 
This documentation is both for others who may need to read your code and for your future self, who
may not remember what you meant for a block of code to do. 

Anything that follows a # (hash) symbol is a comment:

In [10]:
# It is important to use comments to document your thinking on big assignments

There is no true multi-line comment syntax in Python (like /* */ in C). One way to make a multi-line comment is to write a multi-line string that is not assigned to any variable. Multi-line strings are delimited by triple quotes (`'''` or `"""`):

In [11]:
""" This is a
multi-line comment,
okay?"""

' This is a\nmulti-line comment,\nokay?'

When making custom functions, a multi-line string right after the function definition serves as the documentation string  (docstring) for that function.

## Python Variables and Types

Variables are named entities that store values for future reference. In Python, variable names consist of alphanumeric characters (a-z, A-Z, 0-9) and underscores, but they cannot start with a digit. Variable names are case sensitive. Detailed descriptions of naming conventions for Python are given in [PEP 8 -- Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/). Generally, we will follow these conventions:
* names should be descriptive, except when using as a simple index
* variable and function names should be lower case, with underscores separating different words
* variables should not duplicate names of modules or Python functions/methods

Variables are created by assigning a value to a valid variable name:

In [12]:
x = 10
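
The naming conventions listed above can be sketched with a few assignments. All of the variable names below are hypothetical and chosen only for illustration:

```python
# Descriptive, lower-case names with underscores separating words.
sample_mean = 4.2
num_trials = 1000

# Names are case sensitive: these are two different variables.
total = 10
Total = 20
print(total, Total)  # 10 20

# Avoid reusing names of built-in functions. After `sum = 5`, the name
# `sum` would no longer refer to the built-in function, so the
# assignment is left commented out:
# sum = 5

# A name cannot start with a digit; this line would be a SyntaxError:
# 2nd_value = 3
```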

Python uses implicit typing, which means that you do not have to define a variable's type (i.e., what type of data it holds). The type is determined by the interpreter at the time of assignment. You can determine a variable's type by passing the variable as the argument of the `type()` function:

In [13]:
type(x)

int

The type of a variable can change whenever new data is assigned to it:

In [14]:
x = 10.5

In [15]:
type(x)

float

In [16]:
x = "Hello World!"

In [17]:
type(x)

str

When a variable is passed as an argument to the `print()` function, the value of the variable is printed:

In [18]:
print(x)

Hello World!


Python knows how to perform many different operations on different data types and will usually do the *right thing* based on the type of the variable:

In [19]:
a = 3
b = 4
a + b

7

In [20]:
print(a * b)

12


```{warning}
Exponentiation is done using two asterisks between the number to be operated on and the exponent. The caret operator (^) is reserved for bitwise exclusive or.
```

In [21]:
2 ** 3

8
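
To see why the distinction matters, the following short sketch compares the two operators. The caret does not raise an error on integers; it quietly computes a bitwise exclusive or, so the mistake can be easy to miss:

```python
print(2 ** 3)   # exponentiation: 8
print(2 ^ 3)    # bitwise XOR of 0b10 and 0b11: 1
print(10 ** 2)  # exponentiation: 100
print(10 ^ 2)   # bitwise XOR of 0b1010 and 0b0010: 8
```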

We will often need to add to an existing value, and Python provides a shortcut notation for this (similar to C). For instance, here is the usual way to write incrementing a counter:

In [22]:
counter = 0
counter += 1
print(counter)

1


In [23]:
a = "Hello "
b = "World"
print(a + b)

Hello World


In [24]:
a = 3
b = 4.1
print(a + b, type(a + b))

7.1 <class 'float'>


However, Python cannot read minds, and its rules may create unexpected results:

In [25]:
a = "3"
b = "4"
print(a + b)
print(int(a) + int(b))

34
7


In [26]:
c = 5
print(a * c)
print(int(a) * c)

33333
15


### Basic Data Types

Python has many data types built-in, and we have already seen a few of these: *int*, *float*, and *str* (string). In this book, we will also use a few other of Python's built-in types. 

Boolean values are stored in *bool* variables that take on either `True` or `False`. Note that exact capitalization is required for these values.

In [27]:
a = True
print(a)

True


We can check if two variables have the same value using `==`:

In [28]:
b = 2
c = 3

print(b == c)

False


In [29]:
d = b == c
type(d)

bool

Not equals is written as `!=`. Other comparisons are written in the usual ways (<, >, <=, >=).

In [30]:
print(b != c, b <= c)

True True



Python has a variety of *sequence* types for containing an ordered collection of values. Values can be retrieved by index from a sequence-type variable by putting the index number in square brackets after the variable name.

```{warning}
Like C, C++, and JavaScript, indexing in Python starts at 0. This means that in a sequence of $n$ items, the first item is at index 0, and the last item is at index $n-1$. MATLAB and many textbooks use indexing starting at 1.
```

The *list* type is a **mutable** container of values. *Mutable* means the values in the list can be changed. Lists are delimited by square brackets, and items in lists are separated by commas. 

In [31]:
my_list = ["dogs", "cats", 3, 7.0]
print(my_list)

['dogs', 'cats', 3, 7.0]


In [32]:
print(my_list[1])

cats


Because lists are mutable, values can be updated:

In [33]:
my_list[0] = "puppies"
print(my_list)

['puppies', 'cats', 3, 7.0]


Lists can also be sorted in place. An example is given later in this section, after we introduce *methods*.

A *tuple* is an **immutable** sequence type. The values in a tuple cannot be changed. Tuples are often used to contain multiple values returned from a function. Tuples are delimited by parentheses: (). To create a tuple with only one value in it, include a comma after the value. As with lists, tuples can contain a variety of values:

In [34]:
tuple1 = (1, 4, "nine")
tuple2 = (16,)

In [35]:
print(tuple1[2])

nine


Trying to change a value in an immutable type results in an error:

In [36]:
tuple1[0] = "one"

TypeError: 'tuple' object does not support item assignment

The *range* type is an **immutable** sequence of integers that is usually used in looping (especially `for` loops). A range object can be created by calling the built-in `range()` in the following ways:
* `range(stop)` creates a sequence of *stop* values starting at 0. Thus the values are 0, 1, 2, $\ldots,$ *stop*-1. Note that the Python convention is that the *stop* value is not included in the range. 
* `range(start, stop)` creates a sequence of values that starts at *start* and ends at *stop*-1. 
* `range(start, stop, step)` creates a sequence of values that starts at *start*, increments by *step*, and ends before reaching *stop*.

You can iterate over ranges using a `for`...`in` statement, which is discussed in the next section.
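
Before turning to loops, the three forms of `range` described above can be inspected directly. In this sketch, the built-in `list()` is used only to display the values that each range contains:

```python
print(list(range(5)))         # [0, 1, 2, 3, 4]
print(list(range(2, 6)))      # [2, 3, 4, 5]
print(list(range(1, 10, 3)))  # [1, 4, 7]

# The stop value is never included, so range(5) has exactly 5 values.
print(len(range(5)))          # 5
```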

Python also contains one *mapping* type:
* *dict* is a dictionary object that provides a map between key-value pairs. 
As with lists and tuples, values can be of any data type. Keys must be unique and must be *hashable*, which in practice means an immutable type such as a number or a string.
Dictionaries are delimited by curly braces: { }. Each entry in a dictionary is written in the form `key:value`, and different key-value pairs are separated by commas. The value for a particular key can be retrieved by putting the key in square brackets after the dictionary variable's name.

In [37]:
squares = {1: 1, 2: 4, 3: 9, 4: 16}
print(squares[3])

9


In [38]:
misc = {"cats": 0, "dogs": 1, 3: "test"}
print(misc["cats"])
print(misc[3])

0
test


Python variables are much more powerful than variables in many languages because they are actually *objects*. Python is an object-oriented programming (OOP) language. To make the content of this book more accessible to people with a wide range of programming experience, this book does not generally use an object-oriented approach. However, we do need to know some fundamentals about **objects** and **classes**:
* *Objects* are special data types that have *methods* associated with them to work on those objects. Methods are similar to functions, except they are specialized to the *objects* to which they belong. Methods are called by giving the variable/object name, adding a period, specifying the method name, and then adding parentheses, with any arguments provided in parentheses. 
* A *class* is a template for an object that defines how an object stores information and defines its methods.

For instance, since *list* is a mutable type, a list can be sorted in place. The *list* object defines a sorting method to achieve this:

In [39]:
new_list = [5, 7, 1, 3, 13]
new_list.sort()
print(new_list)

[1, 3, 5, 7, 13]


For dictionaries, we often need to retrieve the keys or values. We can do this using methods provided by the dictionary type:

In [40]:
misc = {"cats": 0, "dogs": 1, 3: "test"}
print(misc.keys())

dict_keys(['cats', 'dogs', 3])


In [41]:
print(misc.values())

dict_values([0, 1, 'test'])


The `keys` and `values` methods return special objects of type `dict_keys` and `dict_values`, respectively, but for our purposes, we can treat these like lists. An example is shown in the Section {ref}`python-intro:loops`.
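
One caveat is that these objects cannot be indexed with square brackets the way a list can. If an actual list is needed, a minimal approach is to pass the result to the built-in `list()`, as sketched here:

```python
misc = {"cats": 0, "dogs": 1, 3: "test"}

# Convert the special dict_keys and dict_values objects to ordinary lists.
key_list = list(misc.keys())
value_list = list(misc.values())

print(key_list)     # ['cats', 'dogs', 3]
print(key_list[0])  # cats
print(value_list)   # [0, 1, 'test']

# Membership tests work directly, without converting to a list.
print("dogs" in misc.keys())  # True
```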

## Copying Variables

You have to be careful when copying variables in Python or you may get some unexpected results. Suppose you have a variable c that contains a list, and you want to copy it to a variable d:

In [42]:
c = [0, 1]
d = c
print(d)

[0, 1]


Everything works as expected. Now, let's change the value in position 1 in `d` and print out both `c` and `d`:

In [43]:
d[1] = 2
print("d=", d)
print("c=", c)

d= [0, 2]
c= [0, 2]


Changing the value of `d[1]` also changed the value of `c[1]`! This is because when `c` is made, `c` is a variable that points to the list `[0, 1]`. When we set `d = c`, Python sets the variable `d` to point to the **same** list `[0, 1]`.

To make a copy of the list `c`, we can use the list's `copy` method to create a new list with the same contents as `c`:

In [44]:
e = c.copy()
e[1] = 3
print("e=", e)
print("c=", c)

e= [0, 3]
c= [0, 2]


We can check if two variables point to the same data using Python's `is` operator:

In [45]:
c is d

True

If two variables point to the same data, then they will have the same values:

In [46]:
c == d

True

However, the converse is not true. If we make a new copy of `c`, it will point to a different list object, but the contents will be the same:

In [47]:
f = c.copy()
print(f is c, f == c)

False True



(python-intro:indentation)=
## Indentation and Line Breaks in Python

Line breaks in Python are usually used to distinguish different Python statements, and we will often use this convention in this book. Exceptions to this include:
* When a statement includes arguments in parentheses, line breaks can be used within the parentheses to improve readability. The statement will not terminate until the closing parenthesis, which should be followed by a line break.
* As previously introduced, strings that include line breaks can be created using triple quotes.
* If a line ends in a backslash (\), it will be interpreted as continuing onto the next line.
* Multiple statements can be put onto a single line by separating them by semi-colons (;), but this is not encouraged and will not be used in this book.


In [48]:
print("Hello", "Amelia!")

Hello Amelia!


In [49]:
a = 2 + 3
print(a)

5


In [50]:
print(
    """Goodbye, 
Amelia!"""
)

Goodbye, 
Amelia!
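
The other exceptions listed above (line breaks inside parentheses, backslash continuation, and semicolons) can be sketched in the same way. The variable names here are hypothetical:

```python
# Line breaks inside parentheses (or brackets) do not end the statement.
total = sum(
    [1, 2, 3,
     4, 5, 6]
)
print(total)  # 21

# A trailing backslash continues the statement on the next line.
value = 1 + 2 + \
    3 + 4
print(value)  # 10

# Semicolons separate statements on one line (legal, but discouraged).
a = 1; b = 2
print(a + b)  # 3
```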


Programming languages usually have some convention to indicate which statements belong together. For instance, if a statement starts a loop (such as `for` or `while`), there must be a way to indicate which of the following statements should be executed in each iteration of the loop and which should be executed after the loop finishes. In many languages, such as C, C++, Java, and JavaScript, code blocks are surrounded by curly braces: { }.

In Python, indentation is used to delimit code blocks. The languages mentioned above use indentation as a *convention* that indicates meaning to humans working on code. Python uses indentation to *define* code blocks and convey meaning to the Python interpreter.

Either tabs or spaces can be used to denote code blocks, but the PEP 8 standard is to use 4 spaces per indent level. Jupyter inserts 4 spaces when `Tab` is pressed in a code cell. Some text editors can be set up to insert a prescribed number of spaces when `Tab` is pressed or to convert tabs to spaces upon saving.



(python-intro:loops)=

## Loops and Conditionals

### `for`... `in` Statements
In data science, we often need to either iterate over data or carry out iterations of a simulation. For both purposes, we will usually rely on Python's `for`...`in` statement. This is a type of *compound* statement. Compound statements consist of a *header* and a *suite* (Python's terminology) or *body*. The header always ends in a colon, and the *suite* is one or more statements that are run consecutively according to conditions in the header. For our purposes, we will always create the suite as a set of statements that are indented one more level than the corresponding header, with each statement on a new line. 

The `for`...`in` compound statement takes a variable after the `for` keyword. This variable will hold the current iteration value. It takes an *iterable* object, which is any object that can be iterated over. The typical one we will use for this is the `range` object that we have just introduced:

In [51]:
for i in range(4):
    print(i)

0
1
2
3


In [52]:
for j in range(2, 4):
    print(j)

2
3


In [53]:
for k in range(2, 10, 2):
    print(k)

2
4
6
8


Another example of an iterable object is a list:

In [54]:
fruit = ["apples", "bananas", "cherries"]
for kind in fruit:
    print(kind)

apples
bananas
cherries


When iterating over objects like a list, it is often helpful to keep track of the iteration index. One easy way to do this is to use the `enumerate()` function, which returns a tuple of the current iteration index and the item:

In [55]:
for i, kind in enumerate(fruit):
    print(i)
    print(kind)
    print()

0
apples

1
bananas

2
cherries



In the previous example, I purposefully split the printing into different lines to show a multi-statement suite.

For statements can be *nested*, which means that one `for` statement is inside of another for statement. For each iteration of the outer loop, the inner loop will run through all its iterations:

In [56]:
for i in range(3):
    for j in range(2):
        print(i, j)
    print()

0 0
0 1

1 0
1 1

2 0
2 1



### `if` Statements

The `if` statement is used to run code conditionally. If statements will often be inside other loop or conditional statements. For instance, to print out only the even numbers from 1 to 10, we can use an `if` inside a `for` statement:

In [57]:
for i in range(1, 11):
    if i % 2 == 0:
        print(i)

2
4
6
8
10


More generally, an `if` statement may also have `elif` and `else` clauses. `elif` is short for "else if", and these headers act like `if` headers but will only be evaluated if the preceding `if` or `elif` headers did not have their conditions satisfied. There can only be one `else` clause, and its suite will be executed if none of the `if` and `elif` clauses had their conditions satisfied. Note that the `elif` and `else` headers must be at the same indentation level as the corresponding `if` header, and these headers must also end with a colon (:). We will put the statements that belong to the suites for these headers on separate lines below the header and indented to one greater level.
 
As a simple example, suppose we want to identify the even numbers from 1 to 10, but if a number is NOT even, we wish to determine if it is divisible by 3. Otherwise, we just want to print an asterisk:

In [58]:
for i in range(1, 11):
    if i % 2 == 0:
        print(i, " is even")
    elif i % 3 == 0:
        print(i, " is not even but divides by 3")
    else:
        print("*")

*
2  is even
3  is not even but divides by 3
4  is even
*
6  is even
*
8  is even
9  is not even but divides by 3
10  is even


Note how indentation changes meaning in the following two examples, which differ only by one Tab:

In [59]:
a = 2
b = 3
if a == 2:
    print("a=2")
if b == 2:
    print("b=3")
    print(a, b)

a=2


In [60]:
a = 2
b = 3
if a == 2:
    print("a=2")
if b == 2:
    print("b=3")
print(a, b)

a=2
2 3


We will generally iterate over dictionaries by iterating over their keys:

In [61]:
squares = {1: 1, 2: 4, 3: 9, 4: 16}

for key in squares.keys():
    print(key, "** 2 = ", squares[key])

1 ** 2 =  1
2 ** 2 =  4
3 ** 2 =  9
4 ** 2 =  16
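
Two common variations on this loop are sketched below. Iterating over the dictionary itself visits its keys, and the dictionary's `items()` method (not otherwise used in this section) yields each key together with its value:

```python
squares = {1: 1, 2: 4, 3: 9, 4: 16}

# Iterating over the dictionary itself visits its keys,
# so this is equivalent to looping over squares.keys().
for key in squares:
    print(key, "** 2 = ", squares[key])

# items() yields (key, value) tuples, which can be unpacked in the header.
for key, value in squares.items():
    print(key, "** 2 = ", value)
```

Both loops print the same four lines as the example above.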


### `while` statements

While loops combine looping and conditionals and operate similarly to while loops in other languages. The loop will continue as long as the condition specified in the `while` statement is satisfied:

In [62]:
i = 1
while i < 10:
    if i % 2 == 0:
        print(i)
    i = i + 1

2
4
6
8


(python-intro:functions)=
## Functions

We have already used built-in functions, like `print()` and `type()`. 
In this book, we will also often create new functions. As with the built-in functions, 
user-defined functions can take arguments and can return values. Functions are declared 
using the Python keyword `def` followed by the function name and then parentheses. 
Any parameters that receive arguments are listed inside the parentheses, and the header ends with a colon. Here is a simple
example:

In [63]:
def say_hello(name):
    print("Hello,", name, "!")

The function name and parameter list are called the *function signature*. User-defined functions are called in the same way as built-in functions:

In [64]:
say_hello("Amelia")

Hello, Amelia !
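
The space before the exclamation mark appears because `print` inserts a space between its arguments. One possible variation builds a single string instead. The function name `say_hello_tidy` is hypothetical, and the formatted string literal (f-string) it uses is not otherwise covered in this section:

```python
def say_hello_tidy(name):
    # An f-string substitutes the value of name into the text,
    # so print receives one argument and adds no extra spaces.
    print(f"Hello, {name}!")


say_hello_tidy("Amelia")  # Hello, Amelia!
```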


Functions can also return values by placing them after the Python `return` keyword. If multiple values are to be returned, they should be separated by commas. If more than one value is returned, it will be returned as a tuple:

In [65]:
def square_and_cube(x):
    return x ** 2, x ** 3

In [66]:
square_and_cube(4)

(16, 64)

When storing returned values from a function into multiple variables, you do not have to explicitly use the parentheses around the tuple of variables. You can just separate the variables by commas, as shown below:

In [67]:
four2, four3 = square_and_cube(4)

In [68]:
print(four2, four3)

16 64


As mentioned when we introduced comments, you can provide a docstring for a function as 
a string that directly follows the function header. For example, let's define a function
that returns the squared error between its inputs:

In [69]:
def squared_error(x, y):
    """
    Returns the squared error (the squared difference) of the arguments
    """

    return (x - y) ** 2

In [70]:
squared_error(3, 2)

1

In [71]:
squared_error(2, 3)

1

Python also allows you to specify default values for function arguments that will be used if the user does not pass a value for the argument. When defining a function, specify the default value for an argument by writing the parameter name, followed by an equal sign, followed by the default value in the function signature.

Let's make a new version of the `squared_error` function that sets the default value of `y` to 0:

In [72]:
def squared_error2(x, y=0):
    """
    Returns the squared error between the two arguments, with a default value of 0
    for the second argument
    """

    return (x - y) ** 2

In [73]:
squared_error2(2)

4

If the user passes a value, the default value is not used:

In [74]:
squared_error2(2, 3)

1

Note that we can use the names of the parameters (instead of parameter order) to pass values to those parameters. This is very commonly used in functions that have lots of parameters that are optional.

In [75]:
squared_error2(y=2, x=3)

1

Parameters can be passed using a mix of order and parameter name. However, any parameters passed by order must come before those passed by name.
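
As a brief sketch of this rule, the example below repeats the definition of `squared_error2` so that it can be run on its own, and then passes `x` by order and `y` by name:

```python
def squared_error2(x, y=0):
    """Returns the squared error between x and y (y defaults to 0)."""
    return (x - y) ** 2


# x is passed by order and y is passed by name.
print(squared_error2(5, y=3))  # 4

# Passing a value by order after one passed by name is a SyntaxError,
# so the following line is left commented out:
# squared_error2(x=5, 3)
```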

We will see in the next section how to get help on a function.

## Getting Help and Completion

Python has built-in help for almost every function and object. This help can be retrieved in several ways. For instance, consider the built-in `sum` function. Here are several ways to get help in Jupyter:

In [76]:
help(sum)

Help on built-in function sum in module builtins:

sum(iterable, start=0, /)
    Return the sum of a 'start' value (default: 0) plus an iterable of numbers
    
    When the iterable is empty, return the start value.
    This function is intended specifically for use with numeric values and may
    reject non-numeric types.



In [77]:
?sum

Signature: sum(iterable, start=0, /)
Docstring:
Return the sum of a 'start' value (default: 0) plus an iterable of numbers

When the iterable is empty, return the start value.
This function is intended specifically for use with numeric values and may
reject non-numeric types.
Type:      builtin_function_or_method


For user-defined functions, the help will display the docstring you wrote. The following assumes that you have defined the function `squared_error` from Section {ref}`python-intro:functions`.

In [78]:
?squared_error

Signature: squared_error(x, y)
Docstring: Returns the squared error (the squared difference) of the arguments
File:      ~/Dropbox (UFL)/teaching/stats/book/idse/01-intro/<ipython-input-69-e551966a0156>
Type:      function


If a function's Python source code is available, it can be retrieved in Jupyter using `??`:

In [79]:
??squared_error

Signature: squared_error(x, y)
Source:   
def squared_error(x,y):
    '''
    Returns the squared error (the squared difference) of the arguments
    '''
    
    return (x-y)**2
File:      ~/Dropbox (UFL)/teaching/stats/book/idse/01-intro/<ipython-input-69-e551966a0156>
Type:      function


Now, let's look at the help for a variable. 

In [80]:
squares = {1: 1, 2: 4, 3: 9, 4: 16}

In [81]:
?squares

Type:        dict
String form: {1: 1, 2: 4, 3: 9, 4: 16}
Length:      4
Docstring:  
dict() -> new empty dictionary
dict(mapping) -> new dictionary initialized from a mapping object's
    (key, value) pairs
dict(iterable) -> new dictionary initialized as if via:
    d = {}
    for k, v in iterable:
        d[k] = v
dict(**kwargs) -> new dictionary initialized with the name=value pairs
    in the keyword argument list.  For example:  dict(one=1, two=2)


You should also try `help(squares)`, but I have omitted the output here because it includes help for every method that a dictionary object has available, which results in a lot of output.

You can also try `help()` with no argument to get an interactive help session.

Jupyter also provides many features to help you during programming. Assuming you have run the command defining `squares` above, try the following in a new Jupyter notebook cell:
1. Type `sum(`. When you type the open parenthesis, Jupyter should automatically insert a pair of parentheses. 
2. Press `shift-Tab`. You should see the call signature and doc string for the `sum` function in a pop-over box. You can press the `Esc` key to close the pop-over box. 
3. Type `sq` and press `Tab`. You should see a list of variables and functions that begin with `sq`. Use the cursor keys to scroll to `squares` and press `Enter` to insert it without having to type the full name of the squares dictionary.
4. Let's sum the values in the `squares` dictionary. Type a period and then press `Tab` again to see a list of methods for a `dict` object. Select `values` using the keyboard or mouse.
5. Don't forget that we need parentheses to call the `values` method. Press `(` and a pair of parentheses should appear.
6. Press `shift-Enter` to run the cell.

In [82]:
sum(squares.values())

30

## Python Modules and Namespaces

Python has many useful modules that extend Python's basic functionality. Some of these are included with the base Python distribution, and many others are included in the Anaconda distribution. Many more can be installed over the Internet. 

To use a module, you must import it into your Python working environment. The most basic way to do this is to type `import` followed by the name of the module to be imported:

In [83]:
import numpy

Here we have imported NumPy, which is one of the most important Python modules for working with numerical functions and arrays. When a library is imported, the functions and classes from the library will be available in Python, but they are imported into their own *namespace*. To access something that exists in a different namespace, type the name of the namespace, followed by a period, followed by the name of the thing you are trying to access.

For instance, the value of $\pi$ is a constant object named `pi` in NumPy. Now that we have imported `numpy`, we can access that value:

In [84]:
print(numpy.pi)

3.141592653589793


NumPy has many typical mathematical functions, which we can call using the `numpy` namespace:

In [85]:
numpy.sin(numpy.pi / 4)

0.7071067811865475

In [86]:
numpy.sqrt(2) / 2

0.7071067811865476

We can control the namespace into which the contents of a module are imported. Because many modules, like NumPy, are commonly used, the community often uses standardized namespaces that are shorter than the full module name. To do this, type `import`, followed by the module name, followed by the `as` keyword, followed by the desired namespace. 

For NumPy, the user community typically uses `np`, so the import statement is as follows:

In [87]:
import numpy as np

In [88]:
print(np.pi)

3.141592653589793


```{warning}
It is possible to import the contents of a module into the *global* namespace, which means that the namespace does not have to be specified before each function, class, or object. However, this practice is strongly discouraged because it will often result in conflicts. For instance, both Matplotlib (a plotting module) and SymPy (a symbolic algebra module) have a `plot` function. If you were to import both `matplotlib` and `sympy` into the global namespace, you could not be sure which `plot` you were calling, unless you kept track of which module was imported last.
```


Importing into namespaces is such good practice that we will not give an example of how to import an entire module into the global namespace. However, on occasion, it may be helpful to import just a single function from a module, and in this case, it is reasonable to import into the global namespace if we can be confident that there will not be any collisions. An example follows:

In [89]:
from scipy.special import factorial

In [90]:
factorial(10)

3628800.0

Why is factorial returning a float? Let's check the docstring:

In [91]:
?factorial

Signature: factorial(n, exact=False)
Docstring:
The factorial of a number or array of numbers.

The factorial of non-negative integer `n` is the product of all
positive integers less than or equal to `n`::

    n! = n * (n - 1) * (n - 2) * ... * 1

Parameters
----------
n : int or array_like of ints
    Input values.  If ``n < 0``, the return value is 0.
exact : bool, optional
    If True, calculate the answer exactly using long integer arithmetic.
    If False, result is approximated in floating point rapidly using the
    `gamma` function.
    Default is False.

Returns
-------
nf : float or int or ndarray
    Factorial of `n`, as integer or float depending on `exact`.

Notes
-----
For arrays with ``exact=True``, the factorial is computed only once, for
the largest input, with each other result computed in the process.
The output dtype is increased to ``int64

By inspecting the docstring, we can see that if we want the exact value, we need to set the parameter `exact` to True. This is typically done using the parameter name because anyone reading the function call will understand what the value of `True` is being used for:

In [92]:
factorial(10, exact=True)

3628800
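
To make the earlier warning about name collisions concrete, here is a small sketch that uses two standard-library modules, `math` and `cmath` (chosen here only for illustration; they are not otherwise used in this section). Both define a function named `sqrt`, and when each is imported into the global namespace, whichever is imported last wins:

```python
from math import sqrt
print(sqrt(4))  # 2.0 (math.sqrt returns a float)

from cmath import sqrt  # silently replaces the earlier sqrt
print(sqrt(4))  # (2+0j) (cmath.sqrt returns a complex number)

# Keeping each module in its own namespace avoids the ambiguity.
import math
import cmath
print(math.sqrt(4), cmath.sqrt(4))  # 2.0 (2+0j)
```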

### NumPy Arrays

NumPy provides a `numpy.ndarray` container for holding one-dimensional or multi-dimensional collections of numbers. Most engineers will have some familiarity with vectors or matrices, and one-dimensional arrays are equivalent to vectors, while two-dimensional arrays are equivalent to matrices. The majority of our work on arrays and their mathematics (linear algebra) will be deferred to **CHAPTER ON LINEAR ALGEBRA**. However, arrays are very helpful for doing basic data manipulation and storing numerical values for simulations, so we review some basics here.

Arrays can be created using NumPy's `array` function. For instance, to create a one-dimensional array (i.e., a vector), pass a list of numbers to `np.array`:

In [93]:
A = np.array([1, 2, 3, 4])
A

array([1, 2, 3, 4])

Two-dimensional arrays can be created by passing a list of lists of numbers. Each of the lists of numbers must have the same length. This will be most clear with an example:

In [94]:
B = np.array([[1, 2, 3, 4], [8, 7, 6, 5]])

B

array([[1, 2, 3, 4],
       [8, 7, 6, 5]])

We will also be using methods that return NumPy arrays of random values.  
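
The text does not say at this point which random functions will be used, so the following is only a sketch of one possibility: NumPy's `default_rng` generator, with an arbitrary seed and illustrative variable names (`rng`, `U`, `rolls`) that are assumptions made here.

```python
import numpy as np

# Seeded generator so that the results are repeatable (the seed is arbitrary)
rng = np.random.default_rng(seed=2024)

# Five floats drawn uniformly from the interval [0, 1)
U = rng.random(5)

# Ten integers from 1 to 6 inclusive (the upper bound 7 is excluded)
rolls = rng.integers(1, 7, size=10)

print(U)
print(rolls)
```

Both calls return NumPy arrays, so the results can be used with the array operations described in the rest of this section.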


NumPy arrays offer many advantages over lists. Two primary ones are:
1. It is easy to perform numerical operations on the elements of NumPy arrays.
2. NumPy arrays offer a variety of built-in methods that we will find useful.

To illustrate these, consider the array $A$ defined above and a Python `list` $L$ containing the same numbers:

In [95]:
L = [1, 2, 3, 4]

The NumPy array makes it easy to multiply all the elements in the array by a value:

In [96]:
A * 5

array([ 5, 10, 15, 20])

Compare the result with the effect of the multiplication operator on a list:

In [97]:
L * 5

[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
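
For comparison, multiplying each element of a list takes an explicit loop or a list comprehension. A minimal sketch (the names `scaled` and `repeated` are just illustrative):

```python
L = [1, 2, 3, 4]

# Multiply each element by 5 using a list comprehension
scaled = [5 * x for x in L]
print(scaled)

# The * operator on a list repeats the list instead
repeated = L * 2
print(repeated)
```

The comprehension works, but the array expression `A * 5` is shorter and states the intent more directly.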

An example of a built-in method is `sum()`:

In [98]:
A.sum()

10

Lists do not have a `sum()` method, so calling `L.sum()` raises an error, but Python does offer a built-in `sum` function that works on lists:

In [99]:
L.sum()

AttributeError: 'list' object has no attribute 'sum'

In [100]:
sum(L)

10
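
As a brief preview, here is a sketch of a few other array methods that follow the same pattern as `sum()`. The selection is an assumption about what is likely to be useful, not a list taken from later chapters:

```python
import numpy as np

A = np.array([1, 2, 3, 4])

# Average of the elements
print(A.mean())

# Largest and smallest elements
print(A.max())
print(A.min())

# Running total of the elements
print(A.cumsum())
```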

We will introduce other methods of the `ndarray` type and other NumPy functions that work on and/or return arrays as we introduce more data science techniques.

## Writing Big Numbers in Python

We will be building simulations in Python that require looping thousands or millions of times. Thus, we will often be writing a `range` that has an argument with many zeros. The `range` function will not take a `float` value, and numbers written in scientific notation (like `1e6`) are treated as floats. That leaves us writing integers that are very hard to read, like 10000000. We can't use commas in the numbers because that would create a tuple:


In [101]:
10, 000, 000

(10, 0, 0)

Fortunately, Python provides a simple way to make large numbers like these more readable. Instead of using commas as a delimiter between every third digit, use underscores (\_). Doing this makes it much easier to interpret large numbers, like ten million:

In [102]:
10_000_000

10000000
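
The points above can be checked directly. In this sketch, `num_sims` is a hypothetical name for a simulation count:

```python
# range() rejects floats, including values written in scientific notation
try:
    range(1e6)
except TypeError as err:
    print("TypeError:", err)

# An integer literal with underscores works and is easy to read
num_sims = 1_000_000
print(len(range(num_sims)))

# Python ignores the underscores, so both forms are the same integer
print(1_000_000 == 1000000)

# Converting the float explicitly is another option
print(int(1e6) == num_sims)
```

The underscores affect only how the number is written in the source code; the stored value and its printed form are unchanged.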

## Summary and Other Resources

Do not worry too much about absorbing all these details about Python now. This book contains many examples to get you started, and you can return to this section as a reference. Some additional features of the Python programming language will be introduced as needed.

For users who want to learn more about Python, the following resources are recommended:
* [*A Whirlwind Tour of Python* (https://jakevdp.github.io/WhirlwindTourOfPython/)](https://jakevdp.github.io/WhirlwindTourOfPython/), by Jake VanderPlas, is a free eBook that covers all the major syntax and features of Python
* [Learn Python for free (https://scrimba.com/learn/python)](https://scrimba.com/learn/python) is a free 5-hour online introduction to Python (signup required)
* The Python documentation includes a [Python Tutorial (https://docs.python.org/3/tutorial/)](https://docs.python.org/3/tutorial/)

## Jupyter Magics

Code cells can also contain special instructions intended for JupyterLab itself, rather than the
Python kernel. These are called *magics*. For instance, to display your current working directory, you can use the
`%pwd` magic:

In [103]:
%pwd

'/Users/jshea/Dropbox (UFL)/teaching/stats/book/idse/01-intro'

You can use the `%cd` magic to change your directory (recall that `~` is a shortcut for your home directory). When automagic is enabled, which is the IPython default, the leading `%` can be omitted:

In [104]:
cd ~

/Users/jshea


Changing directories is often useful to switch to a directory where data is stored.
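
Magics only work inside Jupyter or IPython. In an ordinary Python script, similar results can be obtained with the standard library's `os` module. The sketch below is one way to do this; the temporary directory stands in for a hypothetical data directory, and the variable names are illustrative:

```python
import os
import tempfile

# Counterpart of %pwd: report the current working directory
start_dir = os.getcwd()
print(start_dir)

# "~" is not expanded automatically in plain Python; expanduser does it
home_dir = os.path.expanduser("~")
print(home_dir)

# Counterpart of %cd: change into a directory, then change back
with tempfile.TemporaryDirectory() as data_dir:
    os.chdir(data_dir)
    inside_dir = os.getcwd()
    print(inside_dir)
    os.chdir(start_dir)
```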

You can get a list of magics and other information about Jupyter using the `%quickref` magic.

In [105]:
%quickref


IPython -- An enhanced Interactive Python - Quick Reference Card

obj?, obj??      : Get help, or more help for object (also works as
                   ?obj, ??obj).
?foo.*abc*       : List names in 'foo' containing 'abc' in them.
%magic           : Information about IPython's 'magic' % functions.

Magic functions are prefixed by % or %%, and typically take their arguments
without parentheses, quotes or even commas for convenience.  Line magics take a
single % and cell magics are prefixed with two %%.

Example magic function calls:

%alias d ls -F   : 'd' is now an alias for 'ls -F'
alias d ls -F    : Works if 'alias' not a python name
alist = %alias   : Get list of aliases to 'alist'
cd /usr/share    : Obvious. cd -<tab> to choose from visited dirs.
%cd??            : See help AND source for magic %cd
%timeit x=10     : time the 'x=10' statement with high precision.
%%timeit x=2**100
x**100           : time 'x**100' with a setup of 'x=2**100'; setup code is not
                   co