# Python 

* About the Python language
* Running Python code
* Python data structures

Some slides copied and adapted from [Software Carpentry](http://swcarpentry.github.io/python-novice-gapminder/02-variables/index.html)

# Running Python 

* Python is an interpreted language: the `python` program runs Python code
* Run `python` with no arguments to get a Python prompt
    * enter Python expressions and see the results (demo)
* Save your program in a file `code.py` and run it with a command line:
    * `python code.py`
* Use the Jupyter Notebook environment to run your code
    * enter code in blocks and click __Run__, see the result below
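
As a minimal sketch, a file `code.py` might contain something like the following. The variable names and the message are made up for illustration.

```python
# Contents of code.py: run it from the command line with `python code.py`
language = "Python"
message = "Hello from " + language

# Unlike the interactive prompt, a script only shows values that you print
print(message)
```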
    

# Variables and Values

* Variables are names for values.
* In Python the `=` symbol assigns the value on the right to the name on the left.
* The variable is created when a value is assigned to it.
* Variable names
    * can only contain letters, digits, and underscore _ (typically used to separate words in long variable names)
    * cannot start with a digit
* Here, Python assigns an age to a variable `age` and a name in quotes to a variable `first_name`.

In [1]:
age = 23
first_name = 'Steve'
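
One way to try out the naming rules is the string method `isidentifier`, sketched below. The candidate names are invented for illustration. Note that `isidentifier` also accepts non-ASCII letters and reserved words such as `class`, so it is only a rough check.

```python
# Candidate variable names (made up for this example)
candidates = ["age", "first_name", "name2", "2nd_name", "first-name"]

# Keep only the names that follow the rules for variable names
valid = []
for name in candidates:
    if name.isidentifier():
        valid.append(name)

print(valid)  # ['age', 'first_name', 'name2']
```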

# Use print to display values.

* Python has a built-in function called `print` that prints things as text.
* Call the function (i.e., tell Python to run it) by using its name.
* Provide values to the function (i.e., the things to print) in parentheses.
* To add a string to the printout, wrap the string in single or double quotes.
* The values passed to the function are called ‘arguments’

In [2]:
first_name


'Steve'

In [3]:
print(first_name)

Steve
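
A short sketch of passing several arguments to `print`. The `sep` keyword is a standard option of `print` that the slides do not cover. The values are the ones assigned earlier.

```python
first_name = 'Steve'
age = 23

# Several arguments are printed on one line, separated by a space
print("Name:", first_name, "Age:", age)   # Name: Steve Age: 23

# sep changes what is placed between the arguments
print(first_name, age, sep=", ")          # Steve, 23

# Strings can also be joined with + before printing
line = "Name: " + first_name
print(line)                               # Name: Steve
```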


# Variables must be created before they are used.

* If a variable doesn’t exist yet, or if the name has been misspelled, Python reports an error.
* Some other languages instead “guess” a default value; Python does not.

In [4]:
last_name = 'Smith'

In [5]:
print(other_name)

NameError: name 'other_name' is not defined
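
If you want the program to keep running after such an error, one option is to catch it with `try`/`except`. This is a sketch of that technique, which the slides have not introduced.

```python
last_name = 'Smith'

try:
    print(other_name)            # this name was never assigned
except NameError as err:
    # err holds the message that Python would have reported
    message = str(err)
    print("Python reported:", message)
```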

# Order of execution

* In a notebook, the order of execution is the order in which you run the cells, not their order on the page
* A variable defined in a cell early in the notebook does not exist until that cell has been run
* To prevent confusion, it can be helpful to use the `Kernel` -> `Restart & Run All` option which clears the interpreter and runs everything from a clean slate going top to bottom.

# Variables can be used in calculations.

* We can use variables in calculations just as if they were values.
    * Remember, we assigned 23 to `age` a few lines ago.

In [6]:
age = age + 2
print("Age in two years is:", age)

Age in two years is: 25
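
A small sketch of how such an assignment is evaluated: the right-hand side is calculated first, using the current value, and the result is then stored under the name. The name `age_in_months` is made up for illustration.

```python
age = 23

# The right-hand side uses the current value of age (23)
age_in_months = age * 12

# age + 2 is calculated first, then the result replaces the old value
age = age + 2

print(age_in_months, age)  # 276 25
```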


# Use an index to get a single character from a string

* Square brackets after a variable name access parts of the string
* Each character position is given a number, or index
* Indices start from 0
* So the first character is at index 0, the second at index 1, etc.

In [10]:
city = "Wuhan"
print(city[3])
city + ' here'

a


'Wuhan here'
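
Because indices start from 0, the last valid index is one less than the length of the string. A minimal sketch using the built-in `len` function and the same `city` value:

```python
city = "Wuhan"

first = city[0]                # 'W'
last = city[len(city) - 1]     # 'n': the last valid index is len(city) - 1

try:
    city[len(city)]            # one past the end of the string
except IndexError as err:
    print("Out of range:", err)

print(first, last)             # W n
```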

# Use a slice to get a substring

* If we want to get part of a string, use a slice
* Same square bracket notation
* This time two indices: `[start:end]`
* Value is every character from `start` up to but not including `end`
* If you omit `start` or `end`, the start or end of the string is assumed

In [11]:
print(city[0:3])
print(city[2:4])
print(city[:3])
print(city[2:])

Wuh
ha
Wuh
han
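
One consequence of excluding `end` is that a slice `[start:end]` has `end - start` characters, and two slices that share an index fit together without overlap. A short sketch, with `start` and `end` chosen arbitrarily:

```python
city = "Wuhan"
start, end = 1, 4

part = city[start:end]            # characters at indices 1, 2 and 3
print(part, len(part))            # uha 3

# Splitting at the same index and rejoining gives back the original string
rejoined = city[:2] + city[2:]
print(rejoined == city)           # True
```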


# Python Data Structures

* Understand the different data structures available in Python
* Strings, Lists, Tuples, Dictionaries
* Methods defined on each type 
* What to use for different tasks
* Packages and data structures for numerical data analysis:
    * NumPy vectors, arrays, matrices

## Strings

* Sequence of characters representing text
* In Python 3, strings are Unicode, so they can hold characters from any writing system
* Can be written with single quotes `'`, double quotes `"` or triple quotes `"""`
* Strings are objects: you call methods on them
* Check out [the documentation](https://docs.python.org/3/library/string.html) for more

In [12]:
s1 = 'single quoted string might have "quotes in it"'
s2 = "double quoted string could contain it's or that's"
s3 = """triple quoted string
can contain 
newlines"""
s4 = "String containing 中文"
print(s3)

triple quoted string
can contain 
newlines


In [13]:
print(s4)

String containing 中文


# Operations on Strings

In [14]:
# convert s4 to uppercase and store as s5
s5 = s4.upper()
s5

'STRING CONTAINING 中文'

In [15]:
# find the first occurrence of 'g' in s4
firstg = s4.find('g')
# get all characters after that
s4[firstg:]

'g containing 中文'
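
A few more things worth knowing about string methods, sketched with the same `s4` value: they return new strings rather than changing the original, and `find` returns -1 when nothing is found. The `replace` call is an extra example not used elsewhere in these slides.

```python
s4 = "String containing 中文"

s5 = s4.upper()
print(s4)                     # the original string is unchanged

missing = s4.find('z')        # -1 means 'z' does not occur in s4
print(missing)

# replace returns a new string with the substitution made
replaced = s4.replace("containing", "with")
print(replaced)               # String with 中文
```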

# Lists and Tuples

* Lists are sequences of values
* Written inside square brackets, separated by commas
    * ['this', 'is', 'a', 'list', 'of', 'strings'], [3, 4, 6]
    * ['strings', 'and', 3, 9, 2]
    * ['lists', 'containing', ['another', 'list']]
* Lists can be modified, elements added, removed, replaced
* Tuples are like lists but can't be modified after they are created
    * ('this', 'is', 'a', 'tuple')
    * sometimes it's more efficient to use a tuple
* Check out the [documentation](https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range)

In [16]:
list1 = ['this', 'is', 'a', 'list', 'of', 'strings']
list2 = ['embedded', 'list', list1]
list(tuple(list2))

['embedded', 'list', ['this', 'is', 'a', 'list', 'of', 'strings']]
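
The ways a list can be modified (adding, removing and replacing elements) can be sketched as follows, starting from the same `list1`. The new values are made up for illustration.

```python
list1 = ['this', 'is', 'a', 'list', 'of', 'strings']

list1.append('too')     # add an element at the end
list1.remove('a')       # remove the first element equal to 'a'
list1[0] = 'that'       # replace the element at index 0

print(list1)            # ['that', 'is', 'list', 'of', 'strings', 'too']
```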

In [18]:
# the 'in' operator checks whether something is in a list
'a' in list1

True

In [19]:
if 'a' in list1:
    print("found it")
else:
    print("not found")

found it


# Lists and Loops

In [20]:
text = "To be or not to be that is the question"
# split the string at every space character, generate a list
words = text.split()
print(words)

# create a new empty list
wordlengths = []
for word in words:
    # append the length of this word to our list
    wordlengths.append(len(word))
    print(word, len(word))
print(wordlengths)


['To', 'be', 'or', 'not', 'to', 'be', 'that', 'is', 'the', 'question']
To 2
be 2
or 2
not 3
to 2
be 2
that 4
is 2
the 3
question 8
[2, 2, 2, 3, 2, 2, 4, 2, 3, 8]
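
The same list of word lengths can also be built in one line with a list comprehension, a different technique from the loop with `append`. This is only a sketch of the alternative, and the name `longest` is introduced for illustration.

```python
text = "To be or not to be that is the question"
words = text.split()

# One expression instead of a loop with append
wordlengths = [len(word) for word in words]
print(wordlengths)      # [2, 2, 2, 3, 2, 2, 4, 2, 3, 8]

# Built-in functions such as max work on the resulting list
longest = max(wordlengths)
print(longest)          # 8
```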


# Tuples vs Lists

In [None]:
# could use a tuple to represent a data record
record = ('steve', 'cassidy', 39) 
# can use some of the same operations as lists
print("Length: ", len(record))
print("Second element: ", record[1])

In [None]:
# but a tuple can't be modified: tuples have no append method,
# so this line raises an AttributeError
record.append(21)
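
Since a tuple cannot be changed in place, the usual approach is to build a new tuple from the old one. A minimal sketch follows. The new age value and the names `updated`, `first`, `last` and `age` are made up for illustration.

```python
record = ('steve', 'cassidy', 39)

try:
    record[2] = 40                 # item assignment is not allowed on a tuple
except TypeError as err:
    print("Cannot modify:", err)

# Build a new tuple from a slice of the old one plus a one-element tuple
updated = record[:2] + (40,)
print(updated)                     # ('steve', 'cassidy', 40)

# A tuple can be unpacked into separate names
first, last, age = updated
print(first, age)                  # steve 40
```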

# Tuples, Lists and Strings

These are all _sequence_ types and share some common operations.

In [None]:
s = ['a', 'list', 'of', 'stuff']
t = [1, 2, 3]

'a' in s         # boolean test
'x' not in s     # boolean test
s + t            # concatenate s and t
t * 3            # t repeated 3 times
s[1]             # second element
s[:2]            # the first two elements (indices 0 and 1)
s[2:4]           # the third and fourth elements (indices 2 and 3)
min(s)           # smallest element
max(s)           # largest element
s.count('a')     # how many times does 'a' occur
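
The same operations can be tried on a string and a tuple, as in this sketch. The values are made up for illustration. One difference worth noting is that `in` on a string tests for a substring, not just a single element.

```python
text = "banana"          # a string
point = (3, 1, 2)        # a tuple

print('nan' in text)                     # True: substring test for strings
print(2 in point)                        # True
print(text + "s")                        # bananas
print(point + (0,))                      # (3, 1, 2, 0)
print(text[1], point[1])                 # a 1
print(text[:2], point[:2])               # ba (3, 1)
print(min(text), max(point))             # a 3
print(text.count('a'), point.count(1))   # 3 1
```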

# Dictionaries

* Dictionaries are associative arrays
* Associate a key with a value
* Key can be any immutable (hashable) type: string, number, tuple
* Value can be anything
* O(1) average-time lookup by key (hash table)
* Compare with searching a list for a value, which is O(n)
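
A short sketch of the rule about keys: strings and tuples work as keys, while a list, being mutable, is rejected. The data here is hypothetical.

```python
# Hypothetical example data
capitals = {'Australia': 'Canberra', 'China': 'Beijing'}
grid = {(0, 0): 'origin', (1, 2): 'point A'}    # tuples can be keys

try:
    bad = {['a', 'list']: 1}                    # a list cannot be a key
except TypeError as err:
    print("Not allowed:", err)

# Lookup by key goes straight to the value, with no search through the items
print(capitals['China'])     # Beijing
print(grid[(1, 2)])          # point A
```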


# Dictionaries

In [21]:
info = dict()
info['name'] = 'Steve Cassidy'
info['age'] = 53
info['weight'] = 80
info

{'name': 'Steve Cassidy', 'age': 53, 'weight': 80}

In [22]:
info = {
    'name': 'Steve Cassidy', 
    'age': 53, 
    'weight': 80
}
info['age']

53

In [26]:
if 'age' in info:
    print("Age: ", info['age'])

Age:  53
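
A few further dictionary operations, sketched on the same `info` dictionary: `get` for keys that may be missing, updating and deleting entries, and looping over key/value pairs. The `'height'` key and the `'unknown'` default are made up for illustration.

```python
info = {'name': 'Steve Cassidy', 'age': 53, 'weight': 80}

# get returns None (or a default you supply) instead of raising an error
height = info.get('height')
height_default = info.get('height', 'unknown')
print(height, height_default)      # None unknown

info['age'] = info['age'] + 1      # update an existing entry
del info['weight']                 # remove an entry

# items() gives each key together with its value
for key, value in info.items():
    print(key, value)
```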
