# Short introduction to Python for R users



## Jupyter Notebooks

Jupyter notebooks are a way of writing code, running code, and displaying output in one convenient place. You can write code in code blocks, or write markdown/text in blocks like this. It's often useful to explain what you're doing and finding so when you or someone else picks up the notebook in the future they'll know what's going on.

You can execute a code chunk by clicking its cell and hitting the "Run" button on the top bar, or by pressing Shift+Enter. You can always return to a previous code chunk, modify it, and re-run it.

You can write math in notebooks, just like in Rmarkdown: $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$

Jupyter is great for prototyping, but for more heavy-duty use (replication, running on a server, etc.), I recommend rewriting the code into a `.py` file that can be called from the command line with `python mycode.py`. This process also gives you a chance to refactor your code, making it more efficient, readable, and dependable. You can convert a notebook to (messy) Python code by going to `File > Download as > Python (.py)`.
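
As a rough sketch of what such a file might look like, here is a hypothetical `mycode.py` that computes the mean from the formula above. The function names `mean` and `main` and the sample data are illustrative assumptions, not a required layout.

```python
# mycode.py: a hypothetical script layout (all names here are illustrative)

def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def main():
    # Made-up sample data, standing in for whatever the notebook loaded
    data = [2.0, 4.0, 9.0]
    print(mean(data))


# This block runs only when the file is executed directly
# (python mycode.py), not when it is imported from another file.
if __name__ == "__main__":
    main()
```

Keeping the work inside functions makes it easier to import and reuse pieces of the script from other files later.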

## Python vs. R

Python and R are fairly similar. This is a quick overview of the differences to help you get up to speed.

### Installing and importing packages

- In Python, packages are installed from the command line, NOT from inside Python itself. Jupyter can run command-line commands if you prepend `!` to them. To install the `mypackage` package, run `!pip install mypackage` in a notebook cell, or `pip install mypackage` in a terminal.
- Importing packages is similar: instead of `[R] library(mypackage)`, do `[Python] import mypackage`.
- Python also lets you import specific functions from a package: `from mypackage import cool_function`
- You can also rename packages if they're too long: `import numpy as np`
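
The three import styles above can be tried with modules from Python's standard library, so nothing needs to be installed first. This is a small sketch; the variable names are arbitrary.

```python
import math                 # plain import: names stay behind the "math." prefix
from math import sqrt       # import one specific function by name
import statistics as stats  # rename a module while importing it

root_a = math.sqrt(16)              # called through the package name
root_b = sqrt(16)                   # called directly, no prefix needed
average = stats.mean([1, 2, 3, 4])  # called through the shorter alias

print(root_a, root_b, average)
```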

In [1]:
# Practice by installing tqdm and importing the tqdm function from it
!pip install tqdm
from tqdm import tqdm



### What are all those dots for? (Or, methods, attributes, and namespaces)

- Dots have special meaning in Python. It's not like R, where people put dots in all sorts of names.
- Python is much more careful about keeping packages' functions attached to their packages. If the `requests` library has a function called `get`, you call it like this: `requests.get()`. This reminds you where the `get` function came from and prevents you from overwriting some other package's `get` function.
- Python is also more "object oriented" than R. Objects often have built-in or attached functions, called methods.
- These methods are called with a dot notation. Compare: 
```
[R] strsplit("Andy Halterman", " ")
```
and

```
[Python] "Andy Halterman".split(" ")
```

- Objects can also have attributes, which are pieces of data attached to an object. Example: `andy.subfields = ['methods', 'security']`
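
To make the difference between attributes and methods concrete, here is a minimal sketch using a made-up `Researcher` class. The class and its `describe` method are hypothetical and exist only for illustration.

```python
class Researcher:
    """A hypothetical class, used only to illustrate attributes and methods."""

    def __init__(self, name, subfields):
        self.name = name            # attribute: data attached to the object
        self.subfields = subfields  # attribute

    def describe(self):
        # method: a function attached to the object, called with a dot
        return self.name + " works on " + ", ".join(self.subfields)


andy = Researcher("Andy Halterman", ["methods", "security"])

print(andy.subfields)   # attribute access: no parentheses
print(andy.describe())  # method call: parentheses
```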


In [3]:
## Practice! Can you figure out how to make a string all upper case?
my_name = "Adam Kaplan"
my_name.upper()

'ADAM KAPLAN'

### Data Structures

- Where R uses vectors, Python uses a lot of lists. These are ordered sequences, and unlike R's atomic vectors they can hold items of different types. Note that Python indexing starts at 0, not 1!

```
my_list = ["x", "y", "z"]
> my_list[0]
'x'
```
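
A few more list operations, sketched on the same `my_list`. The variable names are arbitrary, and the comments show what each expression evaluates to.

```python
my_list = ["x", "y", "z"]

first = my_list[0]        # 'x': indexing starts at 0
last = my_list[-1]        # 'z': negative indices count from the end
first_two = my_list[0:2]  # ['x', 'y']: the end index of a slice is excluded

my_list.append("w")       # lists can grow in place
length = len(my_list)     # 4
```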

- Python has a data structure called a dictionary, which maps keys to values: you access items by key name instead of by position (think of R's named lists, or a more general form of R's dataframes). Example:

```
article = {"title": "Rivalry and Revenge",
           "author" : "Balcells",
           "year" : "2017"}

> article.keys()
dict_keys(['title', 'author', 'year'])

> article['author']
'Balcells'
```
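
Some other common dictionary operations, sketched on the same `article`. The `read` key and the `"not recorded"` default are made up for illustration.

```python
article = {"title": "Rivalry and Revenge",
           "author": "Balcells",
           "year": "2017"}

# Add a new key (a made-up "read" flag, purely for illustration)
article["read"] = True

# .get() returns a default instead of raising an error for a missing key
publisher = article.get("publisher", "not recorded")

# Loop over key/value pairs
for key, value in article.items():
    print(key, "->", value)

keys = list(article.keys())  # convert the keys view into a plain list
```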

### Loops and functions


- Functions are only slightly different from R:

```
def my_function(x):
    z = (x + 2) ** 2  # ** is exponentiation; ^ is bitwise XOR in Python
    return z
```
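
As in R, arguments can have default values and can be passed by name. The sketch below uses a hypothetical `shifted_square` function. It also shows a common trap for R users: Python's power operator is `**`, while `^` means bitwise XOR.

```python
def shifted_square(x, shift=2):
    """Add `shift` to x, then square the result."""
    return (x + shift) ** 2


a = shifted_square(3)           # uses the default shift: (3 + 2) ** 2 = 25
b = shifted_square(3, shift=0)  # keyword argument: (3 + 0) ** 2 = 9

power = 5 ** 2  # 25: exponentiation
xor = 5 ^ 2     # 7: bitwise XOR, not a power
```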

- Loops are the idiomatic way to iterate in Python (R code more often leans on vectorized functions or the `apply` family), and they are very easy to write:

```
# my_list must hold numbers here, e.g. my_list = [1, 2, 3]
for i in my_list:
    my_function(i)
```

- Pro move: list comprehensions:

```
[my_function(i) for i in my_list]
```
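
A few loop patterns that come up often, sketched with a made-up list of words: `enumerate` gives the position along with the item, a comprehension can include a filter, and `zip` pairs up two lists.

```python
words = ["methods", "security", "ir"]

# enumerate yields (position, item) pairs, starting from 0
for position, word in enumerate(words):
    print(position, word)

# List comprehension: build a new list in one line
lengths = [len(word) for word in words]               # [7, 8, 2]

# Comprehension with a condition: keep only some items
long_words = [word for word in words if len(word) > 3]  # ['methods', 'security']

# zip walks through two lists in parallel
pairs = list(zip(words, lengths))
```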

In [7]:
## Practice! Can you make a list of words, write a loop that goes over the list, and 
## prints a capitalized version of each one?
andy_subfields = ["methods", "security"]

def capitalize_list(list_of_words):
    return [word.upper() for word in list_of_words]

print(andy_subfields)
capitalized_andy_subfields = capitalize_list(andy_subfields)
print(capitalized_andy_subfields)

['methods', 'security']
['METHODS', 'SECURITY']


### Whitespace

- As you can tell, Python makes heavy use of whitespace to set apart different levels of functions, for loops, etc. Use four spaces per level (Jupyter converts tabs to four spaces automatically).
- No need for curly braces!

```
def my_function(big_list):
    print(len(big_list))
    for l in big_list:
        for i in l:
            ...
    return stuff
```
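
Here is a runnable version of the same nested shape, as a sketch. The `count_items` function and the sample data are made up; the point is that each indentation level marks which block a line belongs to.

```python
def count_items(big_list):
    """Count the items across a list of lists."""
    total = 0
    for inner in big_list:    # one level in: inside the function
        for item in inner:    # two levels in: inside the outer loop
            total += 1        # three levels in: inside the inner loop
    return total              # back to one level: runs after both loops end


nested = [["a", "b"], ["c"], []]
n_items = count_items(nested)
print(n_items)
```

If `return total` were indented one level deeper, it would sit inside the outer loop and the function would return after the first inner list.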

## Extra fun: Progress bars

In [8]:
# tqdm is one of my favorite libraries in Python.
# It makes very nice progress bars without any real effort:
from tqdm.autonotebook import tqdm
from time import sleep

_ = [sleep(0.01) for i in tqdm(range(0, 500))]



