# Lab 1 - Python Basics

## Background

### R & RStudio
R is a programming language that is well-suited for statistical analysis and working with
large datasets. The best way to use R is in an integrated development environment (IDE)
called RStudio, where you can simultaneously view and control the code, console,
variables, directory, and figures.

### Python
Python is an all-purpose programming language that is suited for a wide variety of tasks,
from machine learning to applications with graphical user interfaces. It is
known for its intuitive syntax, simplicity, and strong open-source support. There are many
environments and methods for running Python code, and we will first learn how to
execute Python code by getting familiar with the command line. A decent integrated
development environment for Python is Spyder, among many others. It provides an
IPython interactive console, meaning that you can execute Python code line by line
through the console, analogous to RStudio.

### JupyterHub
Nowadays, most data scientists write Python code in a notebook, meaning that the code is split into
chunks (cells) that are executed in order, with each chunk's output shown right below it.
The advantage of notebooks is that debugging becomes much easier and they
make it easier to dissect and understand one's workflow. These are typically called
Jupyter notebooks because the format was originally developed by an organization called Project
Jupyter, and the code runs on a server that is either started locally or located
remotely. For an example, see this link.

### Terminal 
The terminal is an interface in which you can type and execute text-based commands. It
can be much faster to complete some tasks using a terminal than with graphical
applications and menus. Another benefit is access to many more commands and
scripts. Recall some of the basic commands that were shown at the beginning of lab: 'cd',
'ls', and 'cat'. 'cd', short for 'change directory', allows you to navigate your local or remote
computer. You can change the directory one folder at a time, or list several subdirectories,
separating them with a '/'. 'ls', short for 'list', shows you all files, folders, and
applications that exist within your current directory. If you want to look at the contents
of a particular directory without changing into it, you can use this command followed by
the name of the directory. Finally, 'cat', short for 'concatenate', prints the contents of
text files to the terminal without opening them in an editor.
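For comparison, Python's standard `os` module can do the same three jobs from inside a script. This is only a minimal sketch: the folder and the file `notes.txt` are hypothetical, created in a temporary directory just so there is something to list and read.

```python
import os
import tempfile

# Hypothetical example data: a throwaway folder containing one text file
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "notes.txt"), "w") as fh:
    fh.write("hello from notes.txt\n")

original_dir = os.getcwd()      # remember where we started

os.chdir(demo_dir)              # like 'cd': move into the folder
entries = os.listdir(".")       # like 'ls': names of everything in the current folder
print(entries)

with open("notes.txt") as fh:   # like 'cat': read the file and print its text
    contents = fh.read()
print(contents)

os.chdir(original_dir)          # like 'cd' back to the starting folder
```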

# Python Basics I

## Printing and Commenting

In [None]:
#Your first python program!

"""
this is your first python program!
it doesn't do much... 
"""
print?
print("Hello World!")

In [None]:
## code can be commented either by prefixing a line with a #

"""
or by placing multiple lines of text
between triple quotation marks like this
(strictly speaking, this is a string that Python creates and then ignores)
"""

print("this code is run") #the code before this comment is run

#print("this code is not run")

print("a # inside quotes is just fine though")

In [None]:
## printing multiple things
"""
The print function can take in multiple arguments and will
print them all on one line, separated by spaces
"""

print("we can print out numbers too like", 51, "or", 42)

## Simple math, variables, and logic

In [None]:
# 1 simple math and variables
"""
Math is pretty straightforward in python3

we can use all the normal symbols

+: plus
-: minus
/: divide
*: times
%: modulo
<: less than
>: greater than
<=: less than or equal to
>=: greater than or equal to

==: equivalence
!=: non-equivalence

https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex

"""
print("13*2 =", 13*2)
print("13+2 =", 13+2)
print("30/2 =", 30/2)
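The docstring above also lists subtraction and the comparison operators, which this cell does not demonstrate. Here is a quick sketch of those (the numbers are arbitrary examples); each comparison produces a boolean, `True` or `False`.

```python
difference = 13 - 2          # subtraction -> 11
is_less = 13 < 2             # False
is_greater = 13 > 2          # True
is_at_most = 13 <= 13        # True: "or equal to" includes equality
is_at_least = 13 >= 14       # False
is_equal = 13 == 13.0        # True: an int and a float with the same value compare equal
is_different = 13 != 2       # True

print("13-2 =", difference)
print("13 < 2 :", is_less)
print("13 > 2 :", is_greater)
print("13 <= 13 :", is_at_most)
print("13 >= 14 :", is_at_least)
print("13 == 13.0 :", is_equal)
print("13 != 2 :", is_different)
```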

In [None]:
## floats vs integers

print(type(13+2)) 
print(type(30/2))
print((30/2)==(13+2)) # fortunately python is smart enough to compare integers and floats
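As the cell shows, `/` always produces a float, even when the division is exact. If you need an integer back, `int()` converts a float by dropping the fractional part (it truncates; it does not round). A small sketch:

```python
exact = 30 / 2               # 15.0: '/' returns a float even though 2 divides 30 evenly
as_int = int(exact)          # 15: int() converts the float back to an integer
truncated = int(7 / 2)       # 7 / 2 is 3.5, and int() drops the .5 -> 3 (no rounding up)

print(exact, type(exact))
print(as_int, type(as_int))
print(truncated)
```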


In [None]:
## the modulo operator
"""
# the '%' modulo operator gives the remainder of a division
e.g. 10 % 4 = 2, because 4 goes into 10 twice with remainder 2
useful for finding even / odd numbers or counting by "k"
"""

print(1%2, 2%2, 3%2, 4%2) 
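Following the docstring's hint, here is a minimal sketch of using `%` to test whether a number is even or odd, and whether it is a multiple of some `k` (the values of `n` and `k` are arbitrary examples):

```python
n = 14
k = 7

is_even = n % 2 == 0          # even numbers leave remainder 0 when divided by 2
is_odd = n % 2 == 1           # odd numbers leave remainder 1
is_multiple = n % k == 0      # "counting by k": n is a multiple of k when the remainder is 0

print(n, "is even:", is_even)                       # True
print(n, "is odd:", is_odd)                         # False
print(n, "is a multiple of", k, ":", is_multiple)   # True
print("10 % 4 =", 10 % 4)                           # 2
```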


In [None]:
## The += operator

"""
The operation of incrementing a number by some fixed value
e.g. by 1, is so common that there is a shorthand for it

x += n : increment x by n (same as x = x + n)
"""

x=2
print(x)
x=x+1
print(x)
x+=1
print(x)
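`+=` is one member of a family of shorthand operators: `-=`, `*=`, and `/=` work the same way. `+=` also works on strings, where it appends text. A short sketch:

```python
x = 10
x -= 3                 # same as x = x - 3  -> 7
x *= 2                 # same as x = x * 2  -> 14
x /= 4                 # same as x = x / 4  -> 3.5 (division always gives a float)
print(x)

greeting = "Hello"
greeting += " World"   # for strings, += appends text -> "Hello World"
print(greeting)
```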

In [None]:
## Variables
"""
variables can be assigned with the "=" operator
"""

x=5
y=3
z=x*y/2
print(z)

In [None]:
## Assignment versus equivalence
"""
it is CRITICAL to understand the difference between "=" and "=="
"""

x=10
print(x==5)

#print(x=5) #This will fail! Inside a function call, x=5 is read as a keyword argument, not a comparison

x=5
print(x==5)
print(x!=5)
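To see the failure without stopping the notebook, the sketch below wraps `print(x=5)` in a `try`/`except` block (a construct this lab has not covered yet). Python reads `x=5` as a keyword argument named `x`, which `print()` does not accept, so it raises a `TypeError`; the exact wording of the message may vary between Python versions.

```python
x = 10
try:
    print(x=5)                 # '=' here passes a keyword argument called x; it does not compare
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

print(x == 5)                  # '==' is the comparison; x is still 10, so this prints False
```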

In [None]:
## Boolean logic
"""
simple boolean logic can be performed using english words
"""

t = True
f = False

print(type(t)) # "<class 'bool'>"
print(t and f) # Logical AND; prints "False"
print(t or f)  # Logical OR; prints "True"
print(not t)   # Logical NOT; prints "False"
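In practice, `and`, `or`, and `not` are most often used to combine comparisons. A minimal sketch (the value of `x` is arbitrary):

```python
x = 7
in_range = x > 0 and x < 10     # True only when both comparisons are True
outside = x < 0 or x > 10       # True when at least one comparison is True
chained = 0 < x < 10            # Python also allows chained comparisons; same as the 'and' above

print(in_range)       # True
print(outside)        # False
print(not in_range)   # False
print(chained)        # True
```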


## Strings (PAUSE!)

In [None]:
## Strings

x = "IB120"
print(x)
print(len(x))  # String length; prints "5"

In [None]:
## String formatting

x="IB120"
IB120_ranking = 1

complex_string = "{v1} is my number {v2} class!".format(v1=x, v2=IB120_ranking)

print(complex_string)
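Since Python 3.6, f-strings offer a shorter alternative to `.format()`: put an `f` before the opening quote and write the variable names directly inside the braces. This sketch builds the same sentence both ways:

```python
x = "IB120"
IB120_ranking = 1

with_format = "{v1} is my number {v2} class!".format(v1=x, v2=IB120_ranking)
with_fstring = f"{x} is my number {IB120_ranking} class!"

print(with_fstring)
print(with_format == with_fstring)   # True: both produce the same text
```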



In [None]:
## String operations
"""
there are all kinds of operations that can be 
run on strings that are super helpful

https://docs.python.org/3/library/stdtypes.html#string-methods
"""

s = "hello"
print(s.upper())  # Convert a string to uppercase; prints "HELLO"
print(s.replace('l', '(l)'))  # Replace all instances of one substring with another; prints "he(l)(l)o"
print('--TEXT--'.rstrip("-"))  # Strip trailing characters
print('--TEXT--'.lstrip("-"))  # Strip leading characters
print('--TEXT--'.strip("-"))  # Strip leading and trailing characters
print(len('--TEXT--'))  # what is the length of this string?
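A few more string methods from the linked documentation that come up often; the example string is arbitrary:

```python
s = "hello world"
n_l = s.count("l")             # how many times "l" appears -> 3
where = s.find("world")        # index where "world" starts -> 6 (-1 if not found)
starts = s.startswith("he")    # True
titled = s.title()             # capitalize each word -> "Hello World"

print(n_l, where, starts, titled)
```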

## Containers: lists and dictionaries

In [None]:
#1.4 Containers - lists
"""
lists are ordered, changeable collections of elements
in many other languages the closest equivalent is called an array
"""

xs = [9, 5, 12] # Create a list
print(xs)

xs.append(201)  # add an element to the end of the list
print(xs)

element = xs.pop() # remove the last element of the list and return it
print(element)
print(xs)
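Besides `append()` and `pop()`, lists have methods for changing elements at other positions. A short sketch using `insert()`, `remove()`, and `pop()` with an index:

```python
xs = [9, 5, 12]
xs.insert(0, 1)        # insert the value 1 at index 0 -> [1, 9, 5, 12]
xs.remove(5)           # remove the first element equal to 5 -> [1, 9, 12]
first = xs.pop(0)      # pop(index) removes and returns the element at that index -> 1
print(first)           # 1
print(xs)              # [9, 12]
```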


In [None]:
## list subsetting

xs = [5,7,1,9]
print(xs[1])   # indexing starts at 0! (UNLIKE R!!!)

print(xs[-1])  # Negative indices count back from the end of a list
print(xs[3])
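Because indexing starts at 0, the last valid index of a list is `len(xs) - 1`, and `-1` is a shorthand for exactly that position. A quick check:

```python
xs = [5, 7, 1, 9]
print(len(xs))                     # 4 elements, so valid indices are 0, 1, 2, 3
last_by_length = xs[len(xs) - 1]
last_by_negative = xs[-1]          # -1 is shorthand for len(xs) - 1
second_to_last = xs[-2]
print(last_by_length, last_by_negative, second_to_last)   # 9 9 1
```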


In [None]:
## list slicing

xs = [5, 7, 1, 9, 201]

print(xs[2:4])        # Get a slice from index 2 up to, but not including, index 4: elements 2 and 3
print(xs[2:])         # Get a slice from index 2 to the end
print(xs[:2])         # Get a slice from the start up to index 2, not including element 2

print(xs[:-1])        # negative slices, all but last element
print(xs[:-2])        # negative slices, all but last two elements
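Because the stop index is excluded, a slice `xs[a:b]` has `b - a` elements (when both indices are in range), and cutting a list at any index and gluing the pieces back together with `+` (covered further below) rebuilds the original:

```python
xs = [5, 7, 1, 9, 201]
middle = xs[1:4]               # indices 1, 2 and 3 -> [7, 1, 9]
print(middle)
print(len(middle))             # 3, i.e. stop - start = 4 - 1
print(xs[:2] + xs[2:] == xs)   # True: cutting at index 2 and rejoining rebuilds the list
```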


In [None]:
## super slicing
"""
    array[start:end:step] <-- formatting
"""

xs = ["a", "b", "c", "d", "e"]
x = "abcde"

print(xs)
print(xs[::])     #the whole list
print(xs[::2])    #every second element
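print(xs[2::2])   #every second element, starting at index 2
print(xs[::-1])   #reverse the list
print(xs[:1:-1])  #reverse the list, stopping before index 1 (prints ['e', 'd', 'c'])
print(x[::-1])    #strings can be sliced the same way as lists (prints edcba)

# a trailing '?' shows the built-in help for a method (Jupyter/IPython only)
x.split?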

In [None]:
## list operations

xs = [5, 7, 1, 9, 201]

print(19 in xs)       #is 19 in the list?
print(9 in xs)        #is 9 in the  list?  
print(xs.index(9))    #where is 9?

print(min(xs))        #the minimum element
print(max(xs))        #the maximum element
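A few more built-in functions and list methods that are handy for quick summaries. This sketch adds a second `9` to the list so that `count()` has something to count:

```python
xs = [5, 7, 1, 9, 201, 9]
print(len(xs))         # 6 elements
print(xs.count(9))     # 9 appears twice -> 2
print(xs.index(9))     # index() returns the position of the FIRST match -> 3
print(sum(xs))         # 5 + 7 + 1 + 9 + 201 + 9 = 232
print(sorted(xs))      # a new list in ascending order: [1, 5, 7, 9, 9, 201]
print(xs)              # the original list is unchanged
```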


In [None]:
## list concatenation

x1 = [1,2,3]
x2 = [4, 5, 6]

joined_list = x1 + x2

print(joined_list)

x3 = list(range(20)) ## range() generates a sequence; range(start, stop, step)
print(x3)
#print(range(20)) ##notice what happens if you print range(20) directly, without converting it with list()
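The comment above mentions `range(start, stop, step)`; here is a sketch of the optional arguments, plus `*`, which repeats a list much as `+` joins two lists:

```python
evens = list(range(0, 10, 2))       # start=0, stop=10 (not included), step=2
countdown = list(range(5, 0, -1))   # a negative step counts down; stop=0 is not included
zeros = [0] * 3                     # '*' repeats a list: [0, 0, 0]

print(evens)       # [0, 2, 4, 6, 8]
print(countdown)   # [5, 4, 3, 2, 1]
print(zeros)
```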


In [None]:
#CONTINUE ON YOUR OWN HERE
## splitting strings into lists

student_names = "Debora,Miguel,Stacy,Xu"
list_of_names = student_names.split(",")
print(list_of_names)
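The string method `join()` goes the other way, gluing a list of strings back into one string with a separator of your choice. Also, `split()` with no argument splits on any run of whitespace. A short sketch:

```python
student_names = "Debora,Miguel,Stacy,Xu"
list_of_names = student_names.split(",")

rejoined = ";".join(list_of_names)    # join() glues list elements together with the given separator
print(rejoined)                        # Debora;Miguel;Stacy;Xu

words = "  several   spaces here ".split()   # no argument: split on any run of whitespace
print(words)                                  # ['several', 'spaces', 'here']
```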

In [None]:
## converting strings to numbers

"""
often when we are reading files numbers are encoded as
strings. To use these numbers as numbers we need to
convert them to floats or integers.
"""

string_number = "5.3"

#print(string_number + 3) # this will fail - you cannot add a string to an integer

#print(int(string_number) + 3) # this will fail - 5.3 is not an integer!

print(float(string_number) + 3)
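If you really do need an integer from a string like `"5.3"`, one option is to convert to a float first and then call `int()`, which drops the decimal part. `str()` converts in the opposite direction, from a number back to a string:

```python
string_number = "5.3"

as_float = float(string_number)        # 5.3
as_int = int(float(string_number))     # convert to float first, then int() drops the .3 -> 5
print(as_int + 3)                      # 8

label = "value: " + str(as_float)      # str() converts the other way, number -> string
print(label)                           # value: 5.3
```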


In [None]:
#####SECRET CODE - make sure to run this cell, but you don't need to understand it...yet#####
"""
In this block I am activating a new function to print out dictionaries 
in a pretty fashion - it's not part of the lesson
"""
import json

def dprint(d):
    print(json.dumps(d,
                     sort_keys=True,
                     indent=4))



In [None]:
#1.5 Containers - Dictionaries

"""
Dictionaries hold key:value pairs

in some languages these are called "hash tables"

*** NOTE I HAVE DEFINED A FUNCTION CALLED dprint above which 
allows us to print pretty dictionaries that are easier to look at
the print() function works just fine too - we will learn more about
functions at a later date so you can ignore that code for now***

"""

#dictionaries map keys to values, and the values can be a mixture of different "types"
#keys do not all have to be the same type, but each key must be hashable (in practice, an immutable type such as a string, int, or tuple)

student_info_dict = {"student_ID": 120120, 
                     "first_name": "Jane",
                     "last_name": "dough"} 

##note also that a dictionary can be written across several lines; Python keeps reading until the closing brace

dprint(student_info_dict)
print(student_info_dict["student_ID"]) ##extract the value associated with the key 'student_ID'

student_info_dict["grade"] = "A+" ##add an extra key - 'grade'
dprint(student_info_dict)
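A few more dictionary operations worth knowing: checking whether a key exists, looking a key up with a fallback value, and listing the keys. The last lines illustrate keys of different types; that dictionary is printed with `print()` rather than `dprint()`, because JSON cannot represent tuple keys.

```python
student_info_dict = {"student_ID": 120120,
                     "first_name": "Jane",
                     "last_name": "dough"}

has_name = "first_name" in student_info_dict            # 'in' checks the keys -> True
grade = student_info_dict.get("grade", "no grade yet")  # .get() returns a default instead of raising KeyError
keys = list(student_info_dict.keys())                   # all keys, in the order they were added

print(has_name, grade, keys)

# keys of different (hashable) types can live in one dictionary
mixed = {1: "an int key", "two": "a string key", (3, 4): "a tuple key"}
print(mixed[(3, 4)])
```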

In [None]:
## combining lists and dictionaries

s1 = {"student_ID": 120120, 
      "first_name": "Jane",
      "last_name": "dough"}

s2 = {"student_ID": 2, 
      "first_name": "Jake",
      "last_name": "Dunn"}

s3 = {"student_ID": 3, 
      "first_name": "Mary",
      "last_name": "Claire"}

students = [s1,s2,s3]

dprint(students)


In [None]:
## dictionaries of dictionaries

s1 = {"student_ID": 120120, 
      "first_name": "Jane",
      "last_name": "dough"}

s2 = {"student_ID": 2, 
      "first_name": "Jake",
      "last_name": "Dunn"}

s3 = {"student_ID": 3, 
      "first_name": "Mary",
      "last_name": "Claire"}

students_by_id = {"student1":s1,    ##assigns dictionaries to keys
                  "student2":s2,
                  "student3":s3}

dprint(students_by_id)

In [None]:
## accessing nested dictionaries
"""
dictionaries are very useful ways to store data
"""
dprint(students_by_id['student3'])
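To pull a single field out of these nested structures, chain the lookups, using one set of brackets per level. This sketch redefines a cut-down version of the student data (two students only) so that it runs on its own:

```python
s1 = {"student_ID": 120120, "first_name": "Jane", "last_name": "dough"}
s2 = {"student_ID": 2, "first_name": "Jake", "last_name": "Dunn"}

students = [s1, s2]                                   # list of dictionaries
students_by_id = {"student1": s1, "student2": s2}     # dictionary of dictionaries

from_list = students[1]["first_name"]                 # list index first, then key -> "Jake"
from_dict = students_by_id["student1"]["last_name"]   # outer key first, then inner key -> "dough"
print(from_list, from_dict)

students_by_id["student2"]["grade"] = "A"             # chained brackets can also add a value
print(s2["grade"])                                    # "A": s2 and students_by_id["student2"] are the same dictionary
```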

## PAUSE for terminal basics
These can also be found in the file 'termFun.txt'

### Assignment

For questions 1-3, submit a screenshot of the terminal showing "Hello World" printed by running the Python script. Download this notebook (in 'Notebook (.ipynb)' format) with your answers to the remaining questions in the cells below each question, then upload the notebook, along with the screenshot, to bCourses. 

1. Open a new text file, write "print('Hello World')", and save as "first_script.py".

2. Open a terminal window.

3. Navigate your command prompt to the directory containing the file and run "python3 first_script.py".

4. Consider the following lists: a = [1,2,3,4,5], b = [5,6,7,8,9]. How could you create the list [1,2,3,4,5,6,7,8,9]? (Notice the '5' is not repeated).

5. You have a list of gene names and their IDs: <br>
"FOS1", 92231 <br>
"JUN", 2313 <br>
"BERP", 5641 <br>
Given a set of gene IDs, how could you easily pull out which gene names correspond to which IDs?