# Python Introduction

Python is a powerful tool for automating tasks that would otherwise be time-consuming or impossible to do by hand or with other conventional tools. Here, we'll go over basic ways to use Python for data analysis to introduce you to a slice of its potential.

## Markdown

We are working in a **Jupyter Notebook**. This lets us write descriptive text in Markdown as well as run analysis in Python code. You can highlight text in **bold** or *italics*.

For more on Markdown, check out [this cheatsheet](https://www.markdownguide.org/cheat-sheet/).


## Code Block

Jupyter notebooks allow code to be run in blocks (also called chunks). Lines are run top to bottom. You can always edit a code block and re-run it to make changes.

In [1]:
#  INSTRUCTIONS:   Write a message in the quotes.
#  Press Shift+Enter to run the cell.
message = ''

print(message)




## Python as a calculator

An incredibly simple yet pivotal role of Python is to perform math calculations (addition, subtraction, multiplication, etc.). We show how to do basic math below.

You'll see the symbol `#` used often. It marks a comment, which is used to write descriptions. Any characters following `#` on the same line are not run or executed.

In [2]:
3 + 4 * 5  # addition and multiplication 

23

In [3]:
12 / (6 - 4) # division and subtraction

6.0

In [4]:
2 ** 3 # exponentiation

8
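
As a small sketch of how these operators combine (the variable names are made up for this illustration and are not part of the lesson), Python follows the usual order of operations: multiplication happens before addition unless parentheses say otherwise, and `/` returns a float even when the result is a whole number.

```python
# Hypothetical names, used only for this illustration.
without_parens = 3 + 4 * 5      # multiplication first: 3 + 20
with_parens = (3 + 4) * 5       # parentheses first: 7 * 5
quotient = 12 / (6 - 4)         # "/" always produces a float

print(without_parens)  # 23
print(with_parens)     # 35
print(quotient)        # 6.0
```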

#### Question 1: Math
Calculate the following value in Python: $ \frac{25}{(35 - 3)^3} $

In [5]:
### Put your code below here:


## Assigning Variables
A foundational tool in Python is assigning values to variables. We do this with the `=` operator.

In [6]:
x = 50 # x is 50

This sets the variable `x` to be 50, an **integer**, or `int`. This value of x is now stored in our notebook, and we can access this value in other cells until the notebook is reset. For instance, subtracting 20 from `x` prints out a value of 30.

In [7]:
# What if I use x again in a different cell?
x - 20

30

**Variables persist between cells once they have been run (executed).**

If we ever want to check the value of any variable, we can use the built-in `print()` command to display the value. 

In [8]:
y = 35
print(y)

35


We can also assign the value of one variable to another variable. If we execute `x = y`, Python takes the current value of `y` and assigns it to `x`.

*Note: `y` will be unaffected by this assignment. `x = y` should be interpreted as "let x take the current value of y".*

In [9]:
x = y
print(x)
print(y)

35
35


If we change `y` to be a different value, `x` will be unaffected.

In [10]:
y = 3.8
print(x) # will not always be the same value as y
print(y)

35
3.8


**Basic variables only change value when something is assigned to them.**
They are **not** like spreadsheets where a cell can depend on another and update automatically.
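
The contrast with a spreadsheet can be sketched as follows. The names `price` and `total` are hypothetical and chosen only for this example: `total` is calculated once from the value `price` has at that moment, and it stays put until we assign to it again.

```python
# Hypothetical names for illustration only.
price = 10
total = price * 2   # total is computed once, using the current price (10)

price = 50          # changing price later does not touch total
print(total)        # 20

total = price * 2   # total only changes when we assign to it again
print(total)        # 100
```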

Variables can be integers, floats (numbers with decimals), and strings (sets of characters). Strings must be specified with double quotes or single quotes.

In [11]:
a = 52 # integer
b = 3.14 # float
c = 'Inigo Montoya' # string
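
If you are ever unsure which of these types a variable holds, the built-in `type()` function will report it. This is a minimal sketch with made-up variable names, so it does not overwrite `a`, `b`, or `c` from the cell above.

```python
# Hypothetical names; type() is a built-in that reports a value's data type.
count = 52
ratio = 3.14
name = 'Inigo Montoya'

print(type(count))  # <class 'int'>
print(type(ratio))  # <class 'float'>
print(type(name))   # <class 'str'>
```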

#### Question 2: Swapping Values
Given the code below, what is the value of the variable `swap` by the end of the block?

In [12]:
x = 1.0
y = 3.0
swap = x
x = y
y = swap 

**What's in a name?** _Variable name conventions_
- Use only letters, digits, and underscores (`_`)
- Start with a letter (typically lower case)
- Variable names are case sensitive
- Use meaningful names!

**Variables must be created before they are used.** Otherwise, Python will raise an error (a `NameError`).
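
Here is a sketch of what that error looks like, and of how case sensitivity can cause it. The variable name is hypothetical, and the `try`/`except` wrapper is not covered in this lesson; it is used here only so the cell finishes instead of stopping at the error.

```python
total_count = 3      # hypothetical variable name

try:
    print(Total_Count)   # different capitalization, so this name was never created
except NameError as error:
    error_message = str(error)
    print('NameError:', error_message)
```

Running this should print `NameError: name 'Total_Count' is not defined`, because `Total_Count` and `total_count` are two different names to Python.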

## Python Data Structures

Going over the fundamentals, we learned how to store numbers and sets of characters as variables. Python also has built-in ways to store multiple strings and numbers, two of which are **lists** and **dictionaries**. These allow you to group ints, floats, and strings in different arrangements. Here, we create a `list` of 5 ints called `x`.

In [13]:
x = [ 1, 2, 3, 4, 5 ] # integers 1 - 5

Here are brief descriptions of the two structures.

- Lists: store data in a **specific order**
- Dictionaries: store pairs of **keys** and **values** that are looked up by key rather than by position

We will dive into both in more detail below.

### Lists

Lists are very common tools in Python. They allow us to store large amounts of data with an order. They come with very handy tools to reference different objects stored in them. We can also easily add items to them.

#### Initialize a list
There are two ways of initializing an empty list: `list()` and ` [] `. 

In [14]:
# these do the same thing
my_list = list() 
your_list = []

To make a list pre-populated with items, we can fill the brackets with comma-separated values.

In [15]:
number_list = [ 0.1, 0.2, 0.3, 0.4 ] # lists can hold numbers
string_list = [ 'cat', 'dog', 'rabbit' ] # can also hold strings

#### Referencing items in a list

Each item in a list has an **index**. If your list has 3 items, it has 3 indexes (or indices, depending on who you ask). 

In Python, indexes start at 0. To reference the first item in `number_list`, we use `number_list[0]`.

In [16]:
number_list[0] # get the first item from number_list

0.1

We can reference the rest of the numbers in the list with indexes 1 through 3. Note that even though there are 4 items in the list, the index goes from 0 to 3. 

We also can use the items in a list the same way we can use variables to do math or other operations.

In [17]:
print( number_list[1] - number_list[3] ) # 0.2 - 0.4 
print( 'my favorite kind of animal is', string_list[2] ) # prints rabbit

-0.2
my favorite kind of animal is rabbit


If we try to reference an index that does not exist in a list, we get an error.

In [18]:
number_list[4]

IndexError: list index out of range
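
One way to avoid this error is to ask the list how long it is. The sketch below uses a made-up list so that it does not change `number_list`. It relies on the built-in `len()` function and on negative indexes, neither of which is covered elsewhere in this lesson.

```python
# Hypothetical list, separate from number_list above.
temperatures = [ 0.1, 0.2, 0.3, 0.4 ]

n_items = len(temperatures)      # number of items: 4
last_index = n_items - 1         # last valid index: 3

print(n_items)                   # 4
print(temperatures[last_index])  # 0.4
print(temperatures[-1])          # negative indexes count from the end: also 0.4
```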

#### Appending to a list
If we want to add an item to the end of `string_list`, we can use `string_list.append()`.

In [20]:
string_list.append('bear')
print(string_list)

['cat', 'dog', 'rabbit', 'bear']


#### Reassigning an item in a list
We can also alter any item currently in a list.

In [21]:
number_list[0] = 2096
print(number_list)

[2096, 0.2, 0.3, 0.4]


#### Question 3: Lists
Create a list of numbers. Add the first number in the list to the last number of the list. Append this value to the list.

In [None]:
### Your code here:


### Dictionaries
Like lists, dictionaries are powerful ways to store items. However, the two structures are quite different from each other. Instead of storing items in a specific order, like a list, dictionaries store them as **keys** and **values**. For example, you might have a key `giraffes` paired with the value `25`, and the key `kangaroos` paired with the value `32`. We can do this using curly braces (`{ }`) and colons (`:`) with the format of `{ KEY1: VALUE1, KEY2: VALUE2, ... }`.

In [22]:
animals_dict = { 'giraffes': 25, 'kangaroos': 32 }

Notice that our keys are strings and our values here are ints. Values can be any data type. Keys must be immutable types such as strings, numbers, or tuples, though it tends to be best practice for keys to be strings. 

We can also write this vertically, putting key-value pairs on their own lines for visual clarity. You will still need to separate entries with a comma, however.

In [23]:
animals_dict = { 
    'giraffes': 25, 
    'kangaroos': 32 
}

Once we create a dictionary with keys and values, we can use the key to return the corresponding value. We do this by using `DICT[KEY]`:

In [24]:
animals_dict['giraffes']

25

Similarly to lists, if we try to reference a key that is not present in the dictionary, we will get an error.

In [25]:
animals_dict['beaver']

KeyError: 'beaver'

However, dictionaries have a way around this. We can use the `.get()` method to return either the value for a key or, if the key is missing, a default value that we specify. We run it with `DICT.get(KEY, DEFAULT_VALUE)`.

In [26]:
animals_dict.get('beaver', 0) # will return 0 because beaver is not a key in this dictionary

0
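
As a sketch of the options for handling a missing key, the example below uses a made-up dictionary (`zoo_counts`) so that `animals_dict` is left untouched. It compares the `in` keyword, `.get()` with a default, and `.get()` without one.

```python
# Hypothetical dictionary, separate from animals_dict above.
zoo_counts = { 'giraffes': 25, 'kangaroos': 32 }

has_beaver = 'beaver' in zoo_counts          # False: "in" checks the keys
beaver_count = zoo_counts.get('beaver', 0)   # 0: the default we supplied
no_default = zoo_counts.get('beaver')        # None when no default is given

print(has_beaver, beaver_count, no_default)  # False 0 None
```

Note that `.get()` only returns a value. It does not add the missing key to the dictionary.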

Making an empty dictionary is similar to making an empty list. We can either use `dict()` or `{}`.

In [27]:
# these do the same thing
my_dict = dict()
your_dict = {}

#### Adding to a dictionary
It is very simple to add a new item to a dictionary. Instead of using the colon notation, we can simply run `DICT[KEY] = VALUE`. 

In [30]:
animals_dict['moose'] = 43 # new key-value pair - moose: 43
print(animals_dict)

{'giraffes': 25, 'kangaroos': 32, 'moose': 43}


#### Give a key a new value
Giving keys new values works just like reassigning in lists. Note that this means that you cannot have two identical keys in the same dictionary.

In [31]:
animals_dict['giraffes'] = 85 # key giraffes assigned the value of 85
print(animals_dict)

{'giraffes': 85, 'kangaroos': 32, 'moose': 43}
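
To see why two identical keys cannot coexist, here is a small sketch with a made-up dictionary. When the same key is written twice, Python keeps a single entry and the later value wins.

```python
# Hypothetical dictionary for illustration.
snack_counts = { 'apples': 1, 'apples': 7 }   # same key written twice

print(snack_counts)        # {'apples': 7}
print(len(snack_counts))   # 1
```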


#### Question 4: Dictionaries
Assign the value of `giraffes` in `animals_dict` to a new key `rabbit` in the same dictionary.

In [None]:
### your code here: 

## Booleans and conditionals

Often in code, we want to take different actions based on the current state of our program (e.g., do we have more or less than 100 samples in our data?). We can ask yes-or-no questions about this state, called **boolean expressions**. These questions are answered as **true** or **false**. We can design the program to perform an action based on the response, which is called a **conditional**. 

### Boolean

`True` and `False` are keywords in Python. They are a unique data type called **booleans**. 

Capitalization is critical. Booleans in Python have their first letter capitalized and the rest lower-case.

In [32]:
f = False
print(f)

t = True
print(t)

False
True


### Boolean expressions
Boolean expressions essentially ask questions that evaluate as `True` or `False`. These can examine whether two values are equal, if one is larger than another, or similar questions. To ask these questions we need to use special boolean operators that you'll see below.

Boolean expressions are best used between the same data types. You can easily get unexpected results when comparing strings and ints, for instance. 

In [33]:
"bad" == "bad"

True

In [34]:
"bad" != "BAD" # capitalization matters!

True

In [35]:
5.1 > 5.0

True

In [36]:
3 >= 3

True

In [37]:
my_list = [ 'apple', 'pear', 'grape' ]
'apple' in my_list

True

In [38]:
print('i' in 'team')
print('i' in 'win')

False
True


In [39]:
not 20 < 40

False
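
The earlier warning about mixing data types can be sketched like this. The variable names are hypothetical, and `int()` is a built-in that converts a string of digits to an integer. Equality between a string and an int is simply `False`, while an ordering comparison such as `'5' < 5` would stop with a `TypeError`.

```python
# Hypothetical values for illustration.
text_five = '5'
number_five = 5

same_as_text = text_five == number_five              # False: a string never equals an int
same_after_convert = int(text_five) == number_five   # True once both sides are ints
int_equals_float = 5 == 5.0                          # True: ints and floats compare by value

print(same_as_text, same_after_convert, int_equals_float)  # False True True
```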

### Conditionals

#### If statements

The true power of boolean expressions is in making decisions based on whether they are true or false. We do this with `if` statements. The general syntax follows this format:

In [40]:
if 'a' != 'b':
    
    print('hello!')

hello!


To break this down:
- The `if` keyword is the first word in the line
- A boolean expression (`'a' != 'b'`) follows, ending with a colon
- The code below that is indented
- If the expression is true, the indented code below is run
- If the expression is false, nothing happens
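
The last point in the list above can be sketched with a condition that is false. The names here are hypothetical: the indented line is skipped, and the program carries on with the first line that is not indented.

```python
# Hypothetical variables for illustration.
sample_count = 42
size_label = 'small'            # starting value

if sample_count > 100:          # False, so the indented line is skipped
    size_label = 'large'

print(size_label)               # small
```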

#### if-else statement

Often in coding, we want one thing to happen if an expression is true, and another to happen if it is false. To accomplish this, we can add an `else` statement below the `if` statement. The `else` code always runs if the expression after `if` is `False`; otherwise it does not run. The `if` and `else` are mutually exclusive.

In [41]:
x = 23

if x < 20: # if x less than 20; False
    
    print('Less than 20')

else: # x greater than or equal to 20 
    
    print('Greater than or equal to 20')

Greater than or equal to 20


#### `elif`

What if you want to differentiate between more than 2 conditions? We can use the `elif` keyword, which stands for `else if`. This goes between the `if` and the `else` statements, and must include a new boolean expression.

Again, these options are all mutually exclusive. If the `elif` code is run, that means the `if` and `else` code do not run.

In [42]:
y = 101

if y < 100: # y less than 100
    
    print('y is less than 100')

elif y < 200: # y is 100-200 (excluding 200)
    
    print('y is between 100 and 200')

else: # y is 200 or larger
    
    print('y is a big number')


y is between 100 and 200


If we use `elif`, an `else` statement is not required. However, this may result in neither the code associated with `if` nor the code associated with `elif` running.
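
A minimal sketch of that situation, using made-up names: with no `else`, a value that fails both tests passes through untouched.

```python
# Hypothetical variables for illustration.
reading = 250
result = 'no branch ran'        # starting value

if reading < 100:
    result = 'less than 100'
elif reading < 200:
    result = 'between 100 and 200'
# no else: when both tests are False, result keeps its starting value

print(result)                   # no branch ran
```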

Whether or not there is an `else` statement, we can include as many `elif` conditions as we want.

In [25]:
favorite_movie = 'Indiana Jones'

if favorite_movie == 'Batman':
    print("I'm Batman.")

elif favorite_movie == 'Lord of the Rings':
    print("And my axe!!")

elif favorite_movie == 'Indiana Jones':
    print('That belongs in a museum!!')

elif favorite_movie == 'The Matrix':
    print('whoa')

else:
    print('No quotes available :(')

That belongs in a museum!!


#### Question 5: Conditionals

Write code that prints the square root of `x` if `x` is larger than 20 and `0` if `x` is less than `0`.

Hint: you can use `x ** 0.5` to calculate a square root.


In [43]:
### your code below:




## Loops

Often, programs need to do the same task several times repeatedly. You may need to run a task just a handful of times, or maybe thousands of times.

For example, say you want to print out all of the numbers 0 - 5. You could write `print()` 6 times:

In [26]:
print(0)
print(1)
print(2)
print(3)
print(4)
print(5)

However, if you want to expand this even a few numbers further, it gets very tedious very quickly.

To save us from having to have the same code duplicated over and over, we have **loops**. They are incredibly powerful tools for examining large amounts of information. Here we will be looking at **`for` loops** specifically.

### for loops

`for` loops are one of the most powerful tools that base Python has to offer. `for` loops take **iterables** (lists, dictionaries, sets, tuples, even strings) and perform the same actions to each item contained within them.  

In the code below, each number in a list gets added to 20, and then the sum is printed. We call this **iterating** over the items in the list. Note the keywords `for` and `in`.

In [44]:
num_list = [0, 1, 2, 3, 4, 5] # list of numbers

for n in num_list: # one at a time, make each of those numbers n
    
    print(n + 20) # print that number + 20

20
21
22
23
24
25


Let's break down this code:
- `num_list = [0, 1, 2, 3, 4, 5]`: Makes a list of integers 0-5.
- `for n in num_list:`: Take the first item in num_list and assign its value to `n`.
- `print(n + 20)`: Add n and 20 and print the sum.
- We then go back to the start of the loop, take the next item, assign it to `n`, and start all over again.

For ordered iterables, like lists, tuples, and strings, `for` loops iterate over these groups in order.
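
As a quick sketch of iterating in order, a string is visited one character at a time, from left to right. The list `letters_seen` is a hypothetical helper used only to record what the loop saw.

```python
letters_seen = []            # hypothetical list for illustration

for letter in 'cat':         # a string is iterated one character at a time
    letters_seen.append(letter)

print(letters_seen)          # ['c', 'a', 't']
```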

Just like normal variable names, the variable name we use after `for` is arbitrary, though short and descriptive is best. 

In [45]:
for triangle in num_list: # 0 is not a triangle
    
    print(triangle)


0
1
2
3
4
5
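
When the numbers follow a regular pattern, typing out the whole list is not necessary. As a sketch, the built-in `range()` function (not covered elsewhere in this lesson) can stand in for `num_list`: `range(6)` produces the integers 0 through 5, stopping before 6. The list `collected` is a made-up helper that records what the loop saw.

```python
collected = []               # hypothetical list used to record what the loop saw

for n in range(6):           # 0, 1, 2, 3, 4, 5 (stops before 6)
    print(n)
    collected.append(n)
```

This prints the same six numbers as the loop over `num_list`, and it scales to much longer sequences without any extra typing.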


We can start to use for loops to do tasks with strings, as well.

In [None]:
my_breakfast = ['eggs', 'cereal', 'oatmeal', 'toast'] 

for food in my_breakfast: # for each string in the list of strings
    
    sentence = 'I like to eat ' + food + '.'
    print(sentence)

`for` loops can become quite powerful when you include conditionals that change behavior based on the item in the current iteration.

In [None]:
for food in my_breakfast:
    
    if food == 'eggs': # if food is currently 'eggs'
        
        sentence = 'I do not like to eat ' + food + '.'
    
    else: # for all other values of food
    
        sentence = 'I like to eat ' + food + '.'
        
    print(sentence)
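
A loop with a conditional can also keep a running count. This sketch uses a made-up list and counter: the counter starts at zero before the loop and goes up by one each time the condition is true.

```python
# Hypothetical list and counter for illustration.
words = ['eggs', 'cereal', 'oatmeal', 'toast']
long_word_count = 0              # start the counter at zero

for word in words:
    if len(word) > 5:            # 'cereal' (6 letters) and 'oatmeal' (7 letters) qualify
        long_word_count = long_word_count + 1

print(long_word_count)           # 2
```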

#### Question 6: `for` loops -> Challenge

Iterate over all integers from 0 to 1000 and print all multiples of 41 (numbers that can be divided by 41 with no remainder). How many multiples are there?

Hint: use `%` to get a remainder: `5 % 2` will give you `1`. 

In [None]:
### put your code below:


## Resources

- [Official Python documentation](https://www.python.org/doc/)
- Take time with tutorials at [Kaggle.com](https://www.kaggle.com/learn)
- [Brandeis LinkedIn Learning portal](https://www.brandeis.edu/its/support/linkedin-learning/index.html)
- [Stack Overflow](https://stackoverflow.com/)
- [Getting started with Pandas](https://pandas.pydata.org/docs/getting_started/index.html#getting-started)
- [Pandas cheatsheet](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)
- Data Visualization: [Python Graph Gallery](https://www.python-graph-gallery.com/)
- Other visualization libraries: [Seaborn](https://seaborn.pydata.org/tutorial.html), [Plotly](https://plotly.com/python/)
- Install Python: [Anaconda](https://docs.anaconda.com/anaconda/install/)

This lesson is adapted from 
<a href='http://swcarpentry.github.io/python-novice-gapminder/design/'>Software Carpentry.</a>

## Contact
Ford Fishman<br>
Data Analysis Specialist for Science<br>
Brandeis Library<br>
[fordfishman@brandeis.edu](mailto:fordfishman@brandeis.edu)<br>
[dataservices@brandeis.edu](mailto:dataservices@brandeis.edu)<br>
[Set up an appointment](https://calendar.library.brandeis.edu/appointments/fordfishman)