This Jupyter notebook is meant to introduce core Python ideas and the mechanisms of working with Jupyter notebooks.

Everything you see in a Jupyter notebook is contained in a notebook cell. There are a few main types of cells:

* **Code cells:** These are cells that contain Python code.
* **Output cells:** These are cells that contain any output that a Python code cell produces.
* **Text/Markdown cells:** These are cells that contain text, formatted in Markdown notation. You can use them to explain the code or to take notes.

Below is a code cell:

In [1]:
x = 10
y = 11

# Lines that start with hash marks are comments and are ignored by Python.

# Add x + y
z = x + y

# Print the sum
print(z)

21


The 21 above is the output of the code, which in this case is just the output of the print statement.

You can also make your print statements more descriptive:

In [2]:
print("The sum is = " + str(z))

The sum is = 21


Here, the `+` sign means concatenate two strings together: the string `"The sum is = "` and the string representation of the number `z`: `str(z)`. Without the `str` around `z`, you would get an error:

In [3]:
print("The sum is = " + z)

TypeError: can only concatenate str (not "int") to str

**It is important that you learn how to read an error message.** The easiest thing you can do is just read the bottom of the output, as this usually has the most immediate cause of the error. In this case, Python is telling us we cannot concatenate a string to anything other than a string.

You can also concatenate several strings together:

In [5]:
print("The sum of " + str(x) + " and " + str(y) + " = " + str(z))

The sum of 10 and 11 = 21


That's a bit long. You can use Python's `%` string-formatting operator to shorten it:

In [6]:
print("The sum of %d and %d = %d"%(x, y, z))

The sum of 10 and 11 = 21


Here each `%d` is a placeholder for an integer; the placeholders are filled in, left to right, by the values in `(x, y, z)`. For a detailed description of how to use `%` to format strings, see:
* https://realpython.com/python-input-output/#the-string-modulo-operator
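As a small sketch of a few other common placeholders: `%s` inserts a string and `%.2f` inserts a float rounded to 2 decimal places. The f-string syntax (Python 3.6 and later) is a newer alternative that produces the same text. The variable `ratio` is just for illustration.

```python
x, y = 10, 11
ratio = x / y   # 0.9090...

# %s inserts a string, %d an integer, %.2f a float rounded to 2 decimals
old_style = "%s: %d / %d = %.2f" % ("ratio", x, y, ratio)

# The same message written as an f-string (Python 3.6+)
new_style = f"ratio: {x} / {y} = {ratio:.2f}"

print(old_style)   # ratio: 10 / 11 = 0.91
print(new_style)   # ratio: 10 / 11 = 0.91
```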

Other operations you can do with numbers:

In [7]:
# Add
print(x + y)

# Subtract
print(x - y)

# Multiply
print(x * y)

# Divide
print(x / y)

# Exponentiate
print(x**y)

21
-1
110
0.9090909090909091
100000000000
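Two more arithmetic operators come up often: `//` (floor division, which rounds the quotient down) and `%` applied to numbers (modulo, the remainder after division). Note also that `/` always returns a float, even when the division is exact. A quick sketch with illustrative values:

```python
a, b = 17, 5

quotient = a // b     # floor division: 3
remainder = a % b     # modulo (remainder): 2
print(quotient, remainder)

# The quotient and remainder reconstruct the original number
print(quotient * b + remainder)   # 17

# True division always produces a float, even for exact results
print(10 / 5)                     # 2.0
```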


# Other data types
---

Besides numbers and strings, you can work with other data types, such as dictionaries and lists.

## Dictionaries

A dictionary lets you store values by key:

In [8]:
my_dict = {
    'fname': 'Kris',
    'lname': 'Reyes',
    'email': 'kreyes3@buffalo.edu'
}

your_name = my_dict['fname']
print(your_name)

Kris


Here, we create a dictionary with 3 entries. For example, the first entry has key `fname` and value `Kris`. We then access the dictionary using square brackets and a key: `my_dict['fname']`, which results in `Kris`.
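Looking up a key that isn't in the dictionary raises a `KeyError`. As a sketch, the dictionary method `get` lets you supply a fallback value instead, and `items()` lets you walk over the key/value pairs (loops are covered in more detail below). The key `'phone'` is a hypothetical missing key.

```python
my_dict = {
    'fname': 'Kris',
    'lname': 'Reyes',
    'email': 'kreyes3@buffalo.edu'
}

# my_dict['phone'] would raise KeyError: 'phone'
phone = my_dict.get('phone', 'not listed')   # falls back to the default
print(phone)                                 # not listed

# Iterate over key/value pairs (dictionaries keep insertion order)
for key, value in my_dict.items():
    print(key, '->', value)
```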

Dictionaries can start out empty, you can add entries to an existing dictionary, and the values in a dictionary don't all have to be the same type.

In [9]:
# Create an empty dictionary
my_class = {}

# Add some entries to the dictionary

# add a string value
my_class['title'] = 'Multivariate Statistics for Materials Informatics'

# add a number value
my_class['number'] = 504

# add a dictionary value
my_class['prof']  = my_dict


# you can print dictionaries
print(my_class)

{'title': 'Multivariate Statistics for Materials Informatics', 'number': 504, 'prof': {'fname': 'Kris', 'lname': 'Reyes', 'email': 'kreyes3@buffalo.edu'}}


The printed out dictionary isn't nice to look at. We can "prettify" it, but we need another module called `pprint`, which we'll have to **import**. (Documentation: https://docs.python.org/3/library/pprint.html)

In [10]:
import pprint

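Once a module has been imported in a Jupyter notebook, you can use it in any cell you run afterwards, even a cell that sits above the import cell. What matters is the order in which cells are executed, not their position in the notebook.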

Once imported, we can use it to print the dictionary nicely (after consulting the documentation on how to use it).


In [11]:
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(my_class)

{   'number': 504,
    'prof': {'email': 'kreyes3@buffalo.edu', 'fname': 'Kris', 'lname': 'Reyes'},
    'title': 'Multivariate Statistics for Materials Informatics'}


It's a bit better.
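Notice that `pprint` sorted the keys alphabetically (`number`, `prof`, `title`), which is its default behavior. In Python 3.8 and later, passing `sort_dicts=False` keeps the insertion order instead, and `pprint.pformat` returns the formatted text as a string rather than printing it. A minimal sketch:

```python
import pprint

my_class = {
    'title': 'Multivariate Statistics for Materials Informatics',
    'number': 504,
    'prof': {'fname': 'Kris', 'lname': 'Reyes', 'email': 'kreyes3@buffalo.edu'},
}

# sort_dicts=False (Python 3.8+) keeps keys in the order they were inserted
text = pprint.pformat(my_class, indent=4, sort_dicts=False)
print(text)
```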

## Lists

Lists are an important part of Python. You define a list with square brackets around a comma-separated sequence of values, and you access its elements with an integer index. Lists are zero-indexed, meaning that the first element of the list has index 0.

In [12]:
my_list = [1, 2, 4, 8, 16, 32]

print(my_list[0])
print(my_list[3])

1
8


You'll get an error if you try to index into a list past the end.

In [13]:
print(my_list[6])

IndexError: list index out of range

The error says that index 6 is out of range. Even though the list has 6 elements, zero-indexing means that the largest valid index is 5, not 6.

So far we've been counting from the left, but you can also index from the right using negative values. The rightmost element of the list has index -1, the second element from the right has index -2, and so on.

In [16]:
print(my_list[-1])

32


In [17]:
print(my_list[-2])

16
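One way to think about negative indices: for a list of length `n`, index `-k` refers to the same element as index `n - k`. A quick sketch using the built-in `len` function (introduced further below):

```python
my_list = [1, 2, 4, 8, 16, 32]
n = len(my_list)   # 6

# -1 is the last element, i.e. index n - 1
print(my_list[-1], my_list[n - 1])   # 32 32

# -2 is the second element from the right, i.e. index n - 2
print(my_list[-2], my_list[n - 2])   # 16 16

# -n is the first element; -(n + 1) would raise IndexError
print(my_list[-n], my_list[0])       # 1 1
```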


You can have a list of any type.

In [18]:
str_list = ["hello", "how are you", "good_bye"]
print(str_list[1])

how are you


You can even have a list of lists:

In [19]:
list_list = [my_list, str_list]
print(list_list[0])

[1, 2, 4, 8, 16, 32]


In [20]:
print(list_list[1])

['hello', 'how are you', 'good_bye']
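To reach an element inside one of the inner lists, chain the indices: the first index picks the inner list, and the second picks an element of it. A sketch that rebuilds the same lists:

```python
my_list = [1, 2, 4, 8, 16, 32]
str_list = ["hello", "how are you", "good_bye"]
list_list = [my_list, str_list]

# list_list[0] is my_list, so list_list[0][3] is my_list[3]
print(list_list[0][3])      # 8

# list_list[1] is str_list; [-1] picks its last string
print(list_list[1][-1])     # good_bye

# A third index goes into the string itself: character 0 of "hello"
print(list_list[1][0][0])   # h
```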


A single list can also mix values of different types:

In [21]:
mixed_list = [0, "hello"]
print(mixed_list[0])
print(mixed_list[1])

0
hello


A string behaves much like a list of characters, so you can index into it the same way:

In [22]:
my_str = "file_name.csv"

# The dot in the filename
print("The character fourth from the right is = " + my_str[-4])

The character fourth from the right is = .
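The analogy has a limit: unlike lists, strings are *immutable*, so you cannot change a character in place. Assigning to a string index raises a `TypeError`; instead, you build a new string, for example with slicing and `+` (both covered below). A minimal sketch:

```python
my_list = [1, 2, 4, 8, 16, 32]
my_list[0] = 100            # lists can be modified in place
print(my_list)              # [100, 2, 4, 8, 16, 32]

my_str = "file_name.csv"
# my_str[0] = "F" would raise TypeError, because strings are immutable

# Build a new string instead; my_str itself is unchanged
new_str = "F" + my_str[1:]
print(new_str)              # File_name.csv
print(my_str)               # file_name.csv
```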


## Slice notation
In addition to getting individual elements of a list, you can also get certain subsets of the list. This can be done using the slice notation.

Slice notation extracts a contiguous part of a list. For example, to get the characters of `my_str` starting from the character at index 3 (`e`) up to, but not including, the character at index 7 (`m`), you would do:

In [23]:
my_substr = my_str[3:7]
print(my_substr)

e_na


It's useful to remember that slicing a list as `my_list[a:b]` gives a sublist with `b-a` entries. For example, above we sliced from 3 to 7, so we should expect 4 entries: `e`, `_`, `n`, and `a`.

To get the first 3 entries, we can do:

In [24]:
print(my_str[0:3])

fil


As a shortcut, you can omit the `0`:

In [25]:
print(my_str[:3])

fil


Again, this works for any list, not just a string. Recall `my_list = [1, 2, 4, 8, 16, 32]`. We can extract the first 3 entries as well.

In [26]:
print(my_list[:3])

[1, 2, 4]
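Recall that slicing `my_list[a:b]` gives `b - a` entries. That count assumes both indices are within the list. Unlike indexing, slicing does not raise an `IndexError` for out-of-range bounds; it clips them to the ends of the list, so the result can be shorter than `b - a`. A quick sketch:

```python
my_list = [1, 2, 4, 8, 16, 32]

# Both bounds in range: 5 - 2 = 3 entries
print(my_list[2:5])         # [4, 8, 16]

# Stop bound past the end is clipped to len(my_list): 4 entries, not 98
print(my_list[2:100])       # [4, 8, 16, 32]

# Start past the end gives an empty list rather than an error
print(my_list[10:20])       # []
```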


We can use negative numbers to slice from the right. For example:

In [27]:
print(my_str[-4:-1])

.cs


The same logic holds here. It returns the slice starting at the character with index -4 (`.`), and all the characters up to (but not including) the character with index -1 (`v`). 

It's useful to understand that every character has two ways to index:

```
        string:    f    i    l    e    _    n    a    m    e    .    c    s    v
positive index:    0    1    2    3    4    5    6    7    8    9   10   11   12
negative index:  -13  -12  -11  -10   -9   -8   -7   -6   -5   -4   -3   -2   -1
```


To get the last 4 characters, omit the number on the right:

In [28]:
print(my_str[-4:])

.csv


You can also slice up to a negative index:

In [29]:
print(my_str[:-4])

file_name


This is particularly useful if you have an input CSV file and want to save the output to a file with the same name but a different extension:

In [30]:
outfile = my_str[:-4] + '.out'
print(outfile)

file_name.out
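Note that `my_str[:-4]` assumes the extension is exactly 4 characters long (`.csv`); a file such as the hypothetical `data.xlsx` would be cut in the wrong place. As an alternative sketch, the standard library's `os.path.splitext` and `pathlib.Path.with_suffix` split off the extension whatever its length:

```python
import os
from pathlib import Path

my_str = "file_name.csv"
other = "data.xlsx"        # hypothetical file with a 5-character extension

# Slicing off 4 characters leaves a stray dot behind
print(other[:-4] + '.out')                     # data..out

# splitext returns (root, extension)
root, ext = os.path.splitext(other)            # ('data', '.xlsx')
print(root + '.out')                           # data.out

# pathlib swaps the suffix directly
print(Path(my_str).with_suffix('.out').name)   # file_name.out
```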


### List operations

You can use operators such as `+` and `*` on lists.

In [31]:
list1 = [1, 2, 3, 4, 5]
list2 = [-10, -20, -30, -40, -50]



The `+` operator concatenates lists:

In [32]:
print(list1 + list2)

[1, 2, 3, 4, 5, -10, -20, -30, -40, -50]


The `*` operator repeats a list:

In [33]:
print(list1*2)

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]


Strings are sequences like lists, so these operators apply to strings as well. We've already seen `+` used to concatenate strings, and `*` repeats a string as you would expect:

In [34]:
print("around the world "*10)

around the world around the world around the world around the world around the world around the world around the world around the world around the world around the world 


The built-in `len` function returns the number of elements in a list:

In [35]:
how_many = len(list1)
print("There are %d elements in list 1"%(how_many))

There are 5 elements in list 1


In [36]:
how_many = len(list1*2)
print("There are %d elements in two copies of list 1"%(how_many))

There are 10 elements in two copies of list 1
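`len` is not limited to lists. As a quick sketch, it also counts the characters in a string and the entries (keys) in a dictionary:

```python
list1 = [1, 2, 3, 4, 5]
my_str = "file_name.csv"
my_dict = {'fname': 'Kris', 'lname': 'Reyes', 'email': 'kreyes3@buffalo.edu'}

print(len(list1))    # 5
print(len(my_str))   # 13 characters
print(len(my_dict))  # 3 entries
print(len([]))       # 0 for an empty list
```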


# Control Flow
---

So far we've been executing small **serial** programs, where each line runs once, in order. It would be nice to also have:

* Conditional execution: run different blocks of code depending on certain conditions.
* Loops: repeat the same block of code, perhaps with different inputs.
* Functions: define and call reusable blocks of code.

Refer to the following tutorial for additional details:
* https://docs.python.org/3/tutorial/controlflow.html

Below we review how to do each of these in Python.

## Conditional execution
The main way to execute code conditionally in Python is with an **`if` statement**, which looks like the following:

In [37]:
grade = 63

if grade > 60:
    print('You passed!')

You passed!


Here we constructed a condition, `grade > 60`, and defined a block of code that executes if and only if the condition is true. Since `grade` was set to `63`, the condition is true, so the block is executed. If instead the grade is `40`, the block won't execute:

In [38]:
grade = 40

if grade > 60:
    print('You passed!')

Nothing is printed this time. There are 2 important things to highlight:
* There is a colon `:` at the end of the `if` statement.
* The conditional code block is indented.

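Indentation is important: it is how Python knows which lines belong to the block. The exact amount doesn't matter as long as it is consistent within a block, but mixing tabs and spaces can cause a `TabError`, so the standard convention (PEP 8) is to indent with 4 spaces.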
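A small sketch of how indentation decides block membership: below, both indented lines belong to the `if` block and are skipped when the condition is false, while the unindented line after the block always runs. The variable `status` is just for illustration.

```python
grade = 40
status = 'not graded'

if grade > 60:
    status = 'passed'       # indented: runs only when the condition is true
    print('You passed!')    # also indented, so also skipped here
print('Grading complete.')  # not indented: runs regardless of the condition
print(status)               # not graded
```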

You can add an `else:` block following the `if` block to provide an alternative block that is executed when the condition is false:

In [39]:
if grade > 60:
    print('You passed!')
else:
    print('You failed :(')

You failed :(


You can check additional conditions, each with its own block, using the `elif` (short for "else if") statement. An `elif` block runs only if all the conditions above it are false and its own condition is true.

In [40]:
grade = 73

if grade > 90:
    print('You got an A')
elif grade > 80:
    print('You got a B')
elif grade > 70:
    print('You got a C')
elif grade > 60:
    print('You got a D')
else:
    print('You failed :(')

You got a C


The condition can be anything that evaluates to a Python `bool` value. This includes:
* The basic `bool` values: `True` and `False`.
* Combinations of conditions, built with `and`, `or`, and `not`.
* Functions and operators (such as `in`) that return `True` or `False`.

In [41]:
if True:
    print("This will always execute")
    
if False:
    print("This will never execute")

# You can combine conditions using the logical operators and, or, not
if grade > 70 and grade <= 80:
    print('You got a C')

# With inequalities, you can chain the comparisons as a shortcut
if 70 < grade <= 80:
    print('You got a C')

if not grade > 60:
    print('You failed')

# The `in` operator checks whether a key is in a dictionary
if 'prof' in my_class:
    print(my_class['prof'])

This will always execute
You got a C
You got a C
{'fname': 'Kris', 'lname': 'Reyes', 'email': 'kreyes3@buffalo.edu'}
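Strictly speaking, a condition does not have to be a `bool`: Python converts other values using the same rules as the built-in `bool` function. `0`, empty strings, empty lists, empty dictionaries, and `None` count as false, while most other values count as true. A minimal sketch (the list `shopping` is just for illustration):

```python
print(bool(0), bool(""), bool([]), bool({}), bool(None))   # False False False False False
print(bool(63), bool("hello"), bool([0]))                  # True True True

shopping = []
if shopping:                 # same as: if len(shopping) > 0
    print('Something to buy')
else:
    print('List is empty')   # this branch runs
```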


A common special case is checking whether a variable is `None`, a special value in Python that represents "no value". This is often used when a variable is optional in an analysis, or has not been set yet. The preferred way to check for it is with `is None`:

In [42]:
data = None

if data is None:
    print('No data yet!')

No data yet!


## Loops

There are lots of ways to loop, or iterate, in Python. The most basic is the `for` loop:

In [43]:
for list_element in my_list:
    print(list_element)

1
2
4
8
16
32


This is code that loops over every entry in the list `my_list`. Here, `list_element` will represent some element in `my_list`. The indented code is executed on that element, and then `list_element` gets set to the next element of the list, and this process is repeated until every element has been iterated over.

In the example, we simply print out that element, but you can imagine doing something more complicated for every element. Let's calculate the sum.

In [44]:
# Initialize the list sum to be 0
list_sum = 0

# iterate over every element in the list
for list_element in my_list:
    # add this element to the sum
    list_sum = list_sum + list_element

In [45]:
print(list_sum)

63


**N.B.** You can use the `+=` operator to shorten the line inside the loop from

```
    list_sum = list_sum + list_element
```
to

```
    list_sum += list_element
```
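Summing a list is common enough that Python provides a built-in `sum` function, which gives the same result as the loop above. A quick sketch comparing the two:

```python
my_list = [1, 2, 4, 8, 16, 32]

list_sum = 0
for list_element in my_list:
    list_sum += list_element   # same as list_sum = list_sum + list_element

print(list_sum)       # 63
print(sum(my_list))   # 63, using the built-in
```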

Often, you may want to iterate over the numbers `0, 1, 2, ..., N-1`, for example, to loop over the indices of a list. Instead of building a list with those elements, you can use the built-in function `range(N)` to generate them for you.

In [46]:
for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9
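Combining `range` with `len` loops over the indices of a list. The built-in `enumerate` is a common alternative that gives you each index and element together. A sketch of both:

```python
str_list = ["hello", "how are you", "good_bye"]

# Loop over indices 0, 1, 2 and look each element up
for i in range(len(str_list)):
    print(i, str_list[i])

# enumerate yields (index, element) pairs directly
for i, word in enumerate(str_list):
    print(i, word)

print(list(range(len(str_list))))   # [0, 1, 2]
```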


Of course, you can do something else inside the `for-loop`. Below we'll use the modulo operator `%`, which returns the remainder after division. We use it here to check the remainder after dividing by 2. If it is 0 (i.e. if the number is even), then we'll print it out.

In [47]:
for i in range(10):
    # check if even
    if i % 2 == 0:
        print(i)

0
2
4
6
8


You can get the same result by counting from 0 up to (but not including) 10 in steps of 2, using the three-argument form `range(start, stop, step)`:

In [48]:
for i in range(0, 10, 2):
    print(i)

0
2
4
6
8
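Wrapping a `range` in `list` is a handy way to see which numbers it produces. As a sketch, a negative step counts down; in every case the stop value is excluded:

```python
print(list(range(0, 10, 2)))    # [0, 2, 4, 6, 8]
print(list(range(1, 10, 3)))    # [1, 4, 7]
print(list(range(10, 0, -3)))   # [10, 7, 4, 1]
```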


## List comprehensions

Consider the following loop that squares every element in a list.

In [49]:
my_list = [1, 2, 3, 4, 5]

squared = []
for list_element in my_list:
    squared_element = list_element**2
    squared.append(squared_element)
    
print(squared)

[1, 4, 9, 16, 25]


While simple, this example highlights a very common pattern we often encounter when working with and manipulating lists. That is, we do some operation to every element of some list, and save the result to another list. In this way the elements of the resulting list correspond to the elements of the original one. This is such a common pattern that Python has a shortcut to do this in one line:

In [50]:
squared = [ list_element**2 for list_element in my_list]
print(squared)

[1, 4, 9, 16, 25]


This is called a **list comprehension**, and it's very useful for writing succinct code. However, it's considered **syntactic sugar**: nothing magical or extra happens in a list comprehension that wouldn't happen in a similarly simple `for` loop.

The comprehension above adds one element to `squared` for every element of `my_list`. You can also add a condition to the list comprehension so that only some elements are included:

In [51]:
large_squared = [list_element**2 for list_element in my_list if list_element > 3]
print(large_squared)

[16, 25]


The above code only adds a new element to the resulting `large_squared` list if `list_element` is larger than 3.
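To make the "syntactic sugar" point concrete, a filtered comprehension like the one above corresponds to the following loop, with the `if` nested inside the `for`. The names `comp_result` and `loop_result` are just for illustration.

```python
my_list = [1, 2, 3, 4, 5]

# Comprehension form
comp_result = [list_element**2 for list_element in my_list if list_element > 3]

# Equivalent explicit loop
loop_result = []
for list_element in my_list:
    if list_element > 3:
        loop_result.append(list_element**2)

print(comp_result == loop_result)   # True
```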

List comprehensions can get fairly complex, so use them judiciously: short, succinct, but hard-to-understand code is just as bad as long code. As with most things in life, your goal as a programmer is to strike a balance. Later, we'll explore a different way of applying an operation to every element of a list, using the `numpy` library.

## Functions

Functions allow you to modularize your programs by moving common code blocks to a single location that you can use throughout the rest of your code. Functions therefore promote abstraction, code reuse, and maintainability: all good things for programming.

To define a function, you will use the `def` keyword.

In [52]:
def add_four_numbers(a, b, c=0, d=1):
    return a + b + c + d

The above function definition:

1. Starts with the `def` keyword.
2. Specifies a function name: `add_four_numbers`.
3. States that the function has 2 required arguments and 2 optional arguments.
4. Names the 2 required arguments `a` and `b`.
5. Names the optional arguments `c` and `d`, and gives them the default values 0 and 1, which are used when one or both of them are not specified.
6. Does some calculations and returns the result, which in this case is the sum of `a`, `b`, `c`, and `d`.

To call the function, we use the function name and pass at least 2 arguments:

In [53]:
result = add_four_numbers(3, 4, 1, 3)
print(result)

11


If we don't specify any of the optional arguments, the function will just use their default values in the function:

In [54]:
result = add_four_numbers(1, 2)
print(result)

4


If you want to specify a value for just one of the optional arguments, you can pass it by name (as a keyword argument) in your function call:

In [55]:
result = add_four_numbers(1, 2, d=10)
print(result)

13
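A few more calling patterns, as a sketch: every argument can be passed by name, in which case the order no longer matters; positional arguments must come before keyword arguments; and leaving out a required argument raises a `TypeError`.

```python
def add_four_numbers(a, b, c=0, d=1):
    return a + b + c + d

# Keyword arguments can appear in any order
print(add_four_numbers(b=2, a=1, c=5))   # 1 + 2 + 5 + 1 = 9

# Positional arguments first, then keywords
print(add_four_numbers(1, 2, c=3))       # 1 + 2 + 3 + 1 = 7

# add_four_numbers(1) would raise a TypeError, because b is required
```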


#### Exercise: Flipping a biased coin
---

Write a function `count_heads` that counts how many heads are obtained by flipping a biased coin several times. The function should take 1 required argument, `n`, the number of times to flip the coin, and an optional argument `p`, the probability of getting heads on a single flip, with default value `p = 0.5`. The function should return the number of flips that resulted in heads.

To simulate the flipping of a biased coin, you can use the `random()` function from the `random` module (you'll need to `import random` first).
`random.random()` returns a float uniformly distributed in the interval [0, 1), so if `random.random() < p`, you can count the flip as heads.

---

In [56]:
# code goes here.



#### Exercise: Fizz Buzz
---

Print the numbers 1 through 100, except:
1. print "Fizz" if the number is divisible by 3,
2. print "Buzz" if the number is divisible by 5,
3. print "FizzBuzz" (instead of "Fizz" or "Buzz") if the number is divisible by both 3 and 5.

You may want to use the `%` operator: for 2 numbers `x` and `y`, `x%y` is the remainder of the division of `x` by `y`. In particular if `x` is divisible by `y`, then `x%y == 0`.


You *may* want to know that, to stop `print` from starting a new line after its output, you can pass `end=""`:
```
print("hello", end="")
```
---

In [57]:
# Code goes here



# Extra Reading
---

## Python 
* https://www.python.org/about/gettingstarted/
* https://wiki.python.org/moin/IntroductoryBooks
* https://diveintopython3.net/

## Numpy/Scipy/Scikit-learn
* https://numpy.org/devdocs/user/quickstart.html


## Jupyter
* https://realpython.com/jupyter-notebook-introduction/
* https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/index.html

    
## Markdown
* https://www.datacamp.com/community/tutorials/markdown-in-jupyter-notebook
* https://commonmark.org/help/tutorial/
* https://www.markdownguide.org/getting-started
