# Session II

---
### Session I Challenge

In [None]:
print('Hello, World')
partner = 'P. Diddy'
print('Hello, ' + partner)

---

## Variables, Data Structures, and I/O

This session will be a lot more interactive. We will be going over some of the 'anatomy' of a script. Again, keep the following in mind:

| Coding Counterpart | Paper Component | 
| :-- | :-- |
| Variables | Nouns | 
| Operators | Verbs |
| Lines | Sentences |
| Functions | Paragraphs | 
| Modules | Sections | 

## The Basics

### Variables: The 'nouns' of scripts

Just like math, a variable in programming is a symbol (or ***reference***) that represents something else.

In [None]:
# Describe a straight line


But, in Python, a *variable* can be a *reference* to ***anything***

In [None]:
# Changing objects


In [None]:
# Strings


In [None]:
# Containers


In [None]:
# Reference to a Reference


The most important part to remember is that **variables are just references** to other objects
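
To make the idea of a reference concrete, here is a minimal sketch (the names `original` and `alias` are made up for illustration). It shows two names pointing at the same list, and what happens when one of them is re-assigned.

```python
# Two names can refer to the same list object
original = [1, 2, 3]
alias = original          # no copy is made; both names point at one list

alias.append(4)           # change the list through one name...
print(original)           # ...and the other name sees the change
print(alias is original)  # True: both names refer to the same object

# Re-assigning a name points it at a new object; the old object is untouched
alias = [9, 9, 9]
print(original)
print(alias is original)  # False: the names now refer to different objects
```

Notice that `=` never copies anything. It only attaches a name to an object.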

### Naming conventions

As the 'Zen of Python' suggests, your code should be written to be as readable as possible. Even so, it is still common to see variables named in obscure ways.

In [None]:
# BAD variable naming examples
a = 'something important' # If this variable is seen later, the user wouldn't know what it is
i = 42                    # Not informative...what is it for?
aMuddledName=4.2          # The name, while 'camelCase', is not the preferred convention. It is hard to read

Python suggests that the coder use the concept of 'Self-Documenting Code'. That is just a fancy way of saying your variable names should indicate what the variable is to the reader.

In [None]:
# GOOD variable naming examples
important_string = 'something important'
current_index = 42
a_long_name = 4.2

---

### Operators: The 'verbs' of programs

An operator is just something that performs an operation, ... 

In [None]:
# PEMDAS


... but this isn't just math operations

In [None]:
# Comparison


In [None]:
# Bitwise


In [None]:
# Logical


In [None]:
# Identity & Membership


In [None]:
# Assignment 
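
If you want something to refer back to later, the sketch below gives one small example for each operator family listed above. The variable names and values are made up for illustration.

```python
# Arithmetic follows PEMDAS: exponent first, then multiplication, then addition
print(2 + 3 * 4 ** 2)        # 2 + 3 * 16 = 50

# Comparison operators return booleans
print(3 < 5, 3 == 5)         # True False

# Bitwise operators act on the binary digits of integers
print(6 & 3, 6 | 3, 1 << 4)  # 2 7 16

# Logical operators combine booleans
print(True and False, True or False, not True)  # False True False

# Identity (is) asks "same object?"; membership (in) asks "is it inside?"
numbers = [1, 2, 3]
same = numbers
print(same is numbers, 2 in numbers)  # True True

# Augmented assignment is shorthand for total = total + 5
total = 10
total += 5
print(total)                 # 15
```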


### Syntax Convention

The only real convention regarding operators is to put a space between the operands and the operator for readability.

In [None]:
# BAD
some_logic = (62//3)*42**0.5==12<<16

In [None]:
# GOOD
some_logic = (62 // 3) * 42 ** 0.5 == 12 << 16

---

## Exercise

In the cell below:
1. Store your name and your partner's name in variables called `driver` and `navigator`
2. Store both of your favorite foods in variables called `d_food` and `n_food`
3. If both of you could have super powers, store them as `d_power` and `n_power`

---

## Data Structures

This is just a scary way to say, "Things that hold things" or *containers*

### Mutable vs Immutable

If you mutate something, what have you done to it?

Therefore, what does 'mutable' mean? What about immutable?

#### Mutable Data Structures

In [None]:
# Lists


In [None]:
# Dictionaries


In [None]:
# Sets


In [None]:
# Byte arrays


#### Immutable Data Structures

In [None]:
# Booleans


In [None]:
# Integers


In [None]:
# Floating-point numbers


In [None]:
# Tuples


In [None]:
# Strings (and bytes)


#### Mutability Comparison

In [None]:
# Mutable
list_of_numbers = [1, 2, 3, 4, 5]
print(f'First element of the list: {list_of_numbers[0]}')

list_of_numbers[0] = 42
print(f'That same element has been changed: {list_of_numbers[0]}')

In [None]:
# Immutable
tup_of_numbers = (1, 2, 3, 4, 5)
print(f'First element of the tuple: {tup_of_numbers[0]}')

# Tuples cannot be changed, so this assignment raises a TypeError
tup_of_numbers[0] = 42
print(f'That same element has been changed: {tup_of_numbers[0]}')

---

### Getting Help

Python was built for ease-of-use. If you are ever confused or curious about how something works, you have a couple of options for exploring it.

In [None]:
# To get help on something
a_list = [1, 2, 3, 4, 5]

In [None]:
# The ? prefix only works in Jupyter Lab and notebooks
?a_list

In [None]:
help(a_list)

The easiest way (in Jupyter Lab) is to click on the item you want to look at and press 'SHIFT'+'TAB'. A balloon will pop up showing you the help text for that item.

In [None]:
list()

---

#### Exploring Objects

One of the most difficult things in any programming language is figuring out what everything can do. When you are stuck, always inspect the object itself to figure out what it can do using the `dir()` function.

**Note**: Just ignore anything that has an underscore ('_') in front of it (for now).

In [None]:
dir(a_list)
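
Since the note above says to ignore the underscore names for now, here is one possible way to hide them. This is only a sketch; `public_names` is a made-up variable name.

```python
a_list = [1, 2, 3, 4, 5]

# Keep only the names that do not start with an underscore
public_names = [name for name in dir(a_list) if not name.startswith('_')]
print(public_names)
```

What is left is the list of things you would normally call on a list, such as `append` and `sort`.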

---

### The most powerful tool in Python: Dictionaries

If any of you have ever programmed before, or listened to some talk that involved computer science, you may have heard the term 'hash table'.

A dictionary is a hash table. By extension, a set is just a 'one-sided' dictionary (a dictionary of just keys), and is therefore a hash table as well.
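
A quick sketch of that relationship, using made-up food ratings: building a set from a dictionary keeps only its keys, and both answer the question "is this key here?" in the same way.

```python
# Hypothetical data for illustration
ratings = {'ramen': 5, 'udon': 4}

# Building a set from a dictionary keeps only the keys
keys_only = set(ratings)
print(keys_only == {'ramen', 'udon'})  # True

# Membership tests work the same way on both
print('ramen' in ratings, 'ramen' in keys_only)  # True True
print('pizza' in ratings, 'pizza' in keys_only)  # False False
```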

#### What is a hash table?

Think of a ***really big list***. Now, if I asked you to find something in that list, how would you do it? How much time would it take?

---
### Example

In [None]:
# Make a sufficiently large list
biggish_list = list(range(1, int(1e6) + 1))

In [None]:
%%timeit
biggish_list.index(int(1e2))

In [None]:
%%timeit
biggish_list.index(int(1e4))

In [None]:
%%timeit
biggish_list.index(int(1e6))

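This is called $O(n)$ complexity. The search checks the elements one by one from the start, so in the worst case the time it takes grows in proportion to the length of the list.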
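
To see where the time goes, here is a rough sketch of a search that walks the list from the front and counts how many items it had to check. The function `linear_search` is written for illustration only; it is not how `.index()` is implemented internally, although the idea is the same.

```python
def linear_search(items, target):
    """Return (position, steps): where target was found and how many items were checked."""
    steps = 0
    for position, item in enumerate(items):
        steps += 1
        if item == target:
            return position, steps
    return -1, steps  # target is not in the list


small_list = list(range(1, 1001))
print(linear_search(small_list, 10))    # (9, 10)
print(linear_search(small_list, 1000))  # (999, 1000)
```

The further back the target sits, the more steps the search needs.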

---
Let's try dictionary lookups

In [None]:
# Make a sufficiently large dictionary
biggish_dict = {number: index for index, number in enumerate(range(1, int(1e6) + 1))}

In [None]:
%%timeit
biggish_dict[int(1e2)]

In [None]:
%%timeit
biggish_dict[int(1e4)]

In [None]:
%%timeit
biggish_dict[int(1e6)]

This is called $O(1)$ complexity. Lookup time stays roughly constant, no matter how large the dictionary is or which key we ask for.
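
Why is the lookup so fast? The toy class below is a teaching sketch only (the name `ToyHashTable` and the stored values are made up, and Python's real `dict` is far more sophisticated). The key idea is that `hash()` turns a key into a number, and that number tells us which bucket to look in, so we never have to scan everything.

```python
class ToyHashTable:
    """A teaching sketch of a hash table; not how dict is really built."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket_for(self, key):
        # hash() turns the key into an integer; % picks one of the buckets
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket_for(key)
        for pair in bucket:
            if pair[0] == key:
                pair[1] = value  # key already stored: overwrite its value
                return
        bucket.append([key, value])

    def get(self, key):
        # Only one bucket is searched, not the whole table
        for stored_key, value in self._bucket_for(key):
            if stored_key == key:
                return value
        raise KeyError(key)


table = ToyHashTable()
table.put('pikachu', 35)
table.put('ramen', 4.5)
print(table.get('pikachu'))  # 35
```

This sketch keeps a fixed number of buckets, so it would slow down as it fills up. A real hash table grows its bucket list as items are added to keep each bucket short.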

---

### Regarding Lines

Line count (or the number of lines of code your program has) is usually a metric for how big a project is. However, I want to add a *personal* distinction to what I think lines represent.

As shown in the table above, a line of code is analogous to a sentence in a paper. That is, a line of code should represent a complete thought. If you can do a whole process sequence on one concise line, do it. Don't break it up into little lines that 'interrupt the thought'.

Enough of that though. Let's get into the grit of data science.

---

## I/O: input/output and file handling

First, let's look at some of the data we will be playing around with.

* [Weather](./datasets/weather.tsv) data from the past year in the Ann Arbor area
* [Ramen Ratings](./datasets/ramen-ratings.csv) from around the world
* [Pokemon](./datasets/pokemon.csv) base stats and data

File handling lies at the root of almost every research project. It is how we get data into our programs and how we get it out. Therefore, it is really important to know how to use the file handling protocols in Python.

Everything you need to open a file is found in the `open()` function

In [None]:
?open

A really important take-away here is that when you 'open' a file, you are not actually loading the entire file into memory. All you are doing is creating a reference to where the file is located. That means you won't burn your machine down when you `open('wheat_genome.fa')`.
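
Here is a small sketch of that behavior. It first creates a throwaway file (the name `example.txt` and its contents are made up so the example can run anywhere), then opens it and reads only what is asked for.

```python
import os
import tempfile

# Set-up: create a small throwaway file so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), 'example.txt')
handle = open(path, 'w')
handle.write('line 1\nline 2\nline 3\n')
handle.close()

# Opening gives us a file object (a handle), not the file's contents
handle = open(path)
print(type(handle))

# Data is only read when we ask for it
first_line = handle.readline()
print(repr(first_line))

# Looping over the handle continues from where we left off, one line at a time
remaining = [line for line in handle]
print(remaining)
handle.close()
```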

#### Some Vocabulary

* **'separator'** or **'delimiter'**: the character that separates the data values within a line of the file. Commonly a comma (',') or a tab character (represented as '\t')
* **'eol'**: end-of-line character(s). Windows uses '\r\n'. Linux and modern macOS use '\n'. Classic Mac OS (version 9 and earlier) used '\r', so older Mac files may still contain it
* **'\n'**: newline character
* **'\r'**: carriage return
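
To tie the vocabulary together, here is a sketch that takes one raw line, strips the eol characters, and splits it on the delimiter. The line itself is made up and is not taken from the workshop datasets.

```python
# A made-up line as it might appear in a tab-separated file saved on Windows
raw_line = '2019-01-01\t-3.5\tsnow\r\n'

clean_line = raw_line.rstrip('\r\n')  # remove the end-of-line characters
fields = clean_line.split('\t')       # break the line apart at each tab
print(fields)                         # ['2019-01-01', '-3.5', 'snow']
```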

In [None]:
# Let's open a file
open('./datasets/weather.tsv')

What happened?

In [None]:
# Trying again


The default `mode` that files are opened in is 'r'. Can we write to the file? 

In [None]:
# Writing to file


In [None]:
# Writing the right way
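
For reference, here is one possible version of the cells above. It is only a sketch and it writes to a throwaway file (`notes.txt` is a made-up name) rather than to the workshop data. It shows that the default 'r' mode refuses to write, while 'w' and 'a' allow it.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'notes.txt')

# 'w' creates the file (or empties an existing one) and allows writing
out_file = open(path, 'w')
out_file.write('first line\n')
out_file.close()

# The default mode 'r' is read-only, so write() raises an error
in_file = open(path)
try:
    in_file.write('second line\n')
except io.UnsupportedOperation as error:
    print(f'Cannot write: {error}')
in_file.close()

# 'a' appends to the end of the file instead of overwriting it
out_file = open(path, 'a')
out_file.write('second line\n')
out_file.close()

check = open(path)
contents = check.read()
check.close()
print(contents)
```

Be careful with 'w': it empties an existing file the moment you open it.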


Since files can take any string data, we can also leverage our favorite little `print()` function to help us out.

In [None]:
# Using print to save to file
cupcake = "You can't be sad when you're holding a cupcake"
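
One way the cell above could be finished is sketched below. It uses the `file` argument of `print()` and saves to a throwaway location (`cupcake.txt` is a made-up file name).

```python
import os
import tempfile

cupcake = "You can't be sad when you're holding a cupcake"
path = os.path.join(tempfile.mkdtemp(), 'cupcake.txt')

out_file = open(path, 'w')
print(cupcake, file=out_file)  # print() adds the newline for us
out_file.close()

# Read it back to confirm what was saved
check = open(path)
saved = check.read()
check.close()
print(repr(saved))
```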

### Going a little further

Sometimes, messing around with all the delimiters and eol can be tedious. Thankfully, Python already comes with a handy library for reading and writing data files.

In [11]:
import csv

In [13]:
# Reading file
pokemon = open('./datasets/pokemon.csv', newline='')  # newline='' lets the csv module handle line endings
csv_reader = csv.reader(pokemon)
file_header = next(csv_reader)

In [12]:
# Writing to file
funny_quote = 'The more you weigh, the harder you are to kidnap. Stay safe. Eat cake.'

new_file = open('./datasets/testing.tsv', 'w', newline='')  # newline='' lets the csv module handle line endings
tsv_writer = csv.writer(new_file, delimiter='\t', lineterminator='\n')
tsv_writer.writerow(funny_quote.split())

Files stay open until they are closed. In the next section, we will cover how to elegantly close a file. For now though, you can always just use the `.close()` method.
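
As a small sketch of `.close()` in action, the example below writes one row to a throwaway file (`testing.tsv` in a temporary folder, so nothing in `./datasets` is touched), closes it, and then reads the row back. The `.closed` attribute tells you whether a file is still open.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'testing.tsv')

new_file = open(path, 'w', newline='')
tsv_writer = csv.writer(new_file, delimiter='\t', lineterminator='\n')
tsv_writer.writerow(['Stay', 'safe.', 'Eat', 'cake.'])

print(new_file.closed)  # False: the file is still open
new_file.close()        # closing makes sure everything is saved to disk
print(new_file.closed)  # True

# Read the row back, closing the file again when done
check = open(path, newline='')
rows = list(csv.reader(check, delimiter='\t'))
check.close()
print(rows)             # [['Stay', 'safe.', 'Eat', 'cake.']]
```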

---

### Lastly: System Arguments

When a program is run from the command line, often it needs information like what files it will be working on or where it should save things. We already saw a version of this before.

This is a command line entry. 

```bash
python basic_script.py
```
1. `python` is the command
2. `basic_script.py` is the argument


Within Python, we can capture those arguments and bring them into our programs

In [None]:
import sys

In [None]:
args = sys.argv

In [None]:
type(args)

In [None]:
print(args)

In [None]:
args[0]
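
In a notebook, `sys.argv` holds whatever Jupyter was started with, which is not very instructive. The sketch below therefore uses a simulated argument list. The helper `describe_arguments` and the file names are made up for illustration.

```python
import sys


def describe_arguments(argv):
    """Split an argv-style list into the script name and the remaining arguments."""
    script_name = argv[0]      # the first item is always the script itself
    user_arguments = argv[1:]  # everything after it was typed by the user
    return script_name, user_arguments


# Simulates: python basic_script.py weather.tsv results.tsv
example_argv = ['basic_script.py', 'weather.tsv', 'results.tsv']
print(describe_arguments(example_argv))

# The real arguments of whatever is running this code
print(sys.argv[1:])
```

In a real script you would pass `sys.argv` to the helper instead of `example_argv`.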

---

## Challenge

This is meant to be a challenge, and if you don't get it, don't worry. The purpose is to make you think about how you would solve it.

Given 2 strings of parentheses `str1` & `str2`:
1. Determine if either concatenation order (`str1 + str2` or `str2 + str1`) is balanced
2. If any combination is balanced, print out the string 'Balanced'
3. If not, print out the string 'Unbalanced'

Example:

```python
>>> str1_pass = '((()()('
>>> str2_pass = '))())'
# some code
Balanced

>>> str1_fail = '((())()(()'
>>> str2_fail = '))()(())(('
# some code
Unbalanced

```


In [None]:
str1 = ')()(())))'
str2 = '(()(()('