![IE](../img/ie.png)

# Session 2: The Python execution model

### Juan Luis Cano Rodríguez <jcano@faculty.ie.edu> - Master in Business Analytics and Big Data

## How does `import` work?

How do `import os` and `import pandas` work? What happens if the latter is not installed?
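As a starting point for that question, here is a small sketch. The helper `try_import` is a hypothetical name, not a standard function. It imports a module by name with `importlib.import_module` and reports a `ModuleNotFoundError` instead of crashing:

```python
import importlib


def try_import(name):
    """Import a module by name, returning None instead of raising if it is missing."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as error:
        # The message looks like: No module named 'pandas'
        print(f"Could not import {name!r}: {error}")
        return None


os_module = try_import("os")          # always available: part of the standard library
pandas_module = try_import("pandas")  # None unless pandas was installed beforehand
```

A missing third-party package fails exactly like a typo in a module name: Python simply cannot find it in any of the locations it searches.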

## How can I `import` my code?

There are three ways to `import` our own code:

- **Being in the same directory**. This is the quickest, but it scales poorly (imagine having all of pandas and scikit-learn in a single directory to do any data analysis project!)
- **Appending our code location to `PYTHONPATH`**. This is effective, but we will try to avoid it: the setting affects every Python program launched from that environment, is easy to forget about, and makes our setup hard to reproduce on other machines.
- **Making our code _installable_**. Since any code that's _installed_ can be _imported_, this shifts the question to "how to make our code installable".

However, let's first explore the implications of the first two options.

### Our first Python library

We will create a new Python library, "IE Titanic utils", to analyze the [Titanic dataset](https://www.kaggle.com/c/titanic/data). To start:

1. Open a command line
2. Browse to your home directory: `cd`
   - Or just any directory of your liking: `cd ~/Projects/IE`
3. Create a new directory: `mkdir ie-titanic-utils`
4. Enter that directory: `cd ie-titanic-utils`

And we will do some basic setup before we start coding:

5. Create a basic `README.md` containing the name of the project and your name
6. Let's generate an appropriate `.gitignore` file next to `README.md`
   - For simplicity, you can use https://www.gitignore.io/api/python,jupyternotebooks
7. Initialize a Git repository with `git init`, then `git add` the two new files and `git commit` with the message `"First commit"`

#### Exercise

Now that we have some basic structure, let's write some code.

1. Create a `str_utils.py` file with a function called `tokenize` that takes a `str` sentence and splits it into a `list` of words
2. Open a Python interpreter (`python` on the command line) and check that `from str_utils import tokenize` works
3. Test the function by calling it with some sentence
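One possible solution is sketched below. It assumes that "words" simply means chunks of text separated by whitespace, so punctuation stays attached to the words; a real tokenizer would need more care.

```python
# str_utils.py: a minimal sketch of the exercise
from typing import List


def tokenize(sentence: str) -> List[str]:
    """Split a sentence into a list of words on any run of whitespace."""
    return sentence.split()


print(tokenize("Hello, world!"))  # ['Hello,', 'world!']
```

Calling `str.split()` with no arguments also drops leading, trailing and repeated whitespace, which is usually what we want for this kind of quick tokenization.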

### The `PYTHONPATH`

We saw above that we could easily import our `tokenize` function. However, this only works if we are in the same directory:

```
$ ls
str_utils.py README.md
$ cd ..
$ ls
ie-titanic-utils
$ python3
>>> import math  # Still works
>>> import str_utils
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'str_utils'
```

Why? When we import something, Python looks for it in a list of predefined locations, the module search path (which we will call the "PATH"), stored in `sys.path`:

```
>>> import sys
>>> sys.path
['', '/usr/lib/python37.zip', '/usr/lib/python3.7', '/usr/lib/python3.7/lib-dynload', '/usr/local/lib/python3.7/dist-packages', '/usr/lib/python3/dist-packages']
```
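To see this search in action, the sketch below prints the entries of `sys.path` in the order Python checks them, and uses `importlib.util.find_spec` to ask where a module would be loaded from without importing it. The helper name `where_is` is hypothetical.

```python
import importlib.util
import sys


def where_is(module_name):
    """Return the file Python would load for `module_name`, or None if it is not found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin  # a file path, or "built-in" for modules compiled into Python


# Python checks these entries from first to last and stops at the first match
for position, entry in enumerate(sys.path):
    print(position, repr(entry))

print(where_is("json"))       # a path inside the standard library
print(where_is("str_utils"))  # None unless str_utils.py is reachable from sys.path
```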

Therefore, there are two ways of making our code **globally importable**:

1. Modify the "PATH"
2. Put our code inside a location predefined in the "PATH"

The first option can be achieved like this:

```
>>> sys.path.insert(0, "/home/username/ie-titanic-utils")
>>> import str_utils  # Works!
>>>
```

Alternatively, from outside the interpreter:

```
$ export PYTHONPATH=/home/username/ie-titanic-utils
$ python3
>>> import sys
>>> sys.path  # Notice the change!
['', '/home/username/ie-titanic-utils', '/usr/lib/python37.zip', '/usr/lib/python3.7', '/usr/lib/python3.7/lib-dynload', '/usr/local/lib/python3.7/dist-packages', '/usr/lib/python3/dist-packages']
>>> import str_utils  # Now it works!
>>>
```
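The two approaches differ in scope: `sys.path.insert` only changes the interpreter that runs it, while `PYTHONPATH` is read by every new interpreter started with that environment variable. The sketch below checks this by launching a child interpreter with `subprocess`; a temporary directory stands in for `/home/username/ie-titanic-utils`, and `child_sys_path` is a hypothetical helper.

```python
import os
import subprocess
import sys
import tempfile


def child_sys_path(extra_dir):
    """Start a fresh interpreter with PYTHONPATH=extra_dir and return its sys.path."""
    env = dict(os.environ, PYTHONPATH=extra_dir)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


with tempfile.TemporaryDirectory() as project_dir:
    print(project_dir in child_sys_path(project_dir))  # True: the new process sees it
    print(project_dir in sys.path)                      # False: this process is unchanged
```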

However, **both are bad practices and should be avoided**. In future sessions we will see [the right way to distribute Python code](https://packaging.python.org/tutorials/packaging-projects/).

### What does `import` do?

Python code is normally written in `.py` scripts. For example:

```
$ tail -n1 str_utils.py
print(tokenize("Hello, world!"))
$ python str_utils.py
['Hello,', 'world!']
```

These scripts can be imported in the same way as any module from the [standard library](https://docs.python.org/3/library/index.html) or any installed third-party package:

```
$ python3
>>> import math  # Works, because it's in stdlib
>>> import numpy as np  # Works if you ran `pip install numpy` in advance
>>> import str_utils  # Works if you are in the same directory
['Hello,', 'world!']
>>> 
```

When we import a script, **Python runs the script** from top to bottom. That is how the functions and classes defined inside it become available to us.
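The sketch below makes this visible. It writes a throwaway module (the name `noisy_module` is hypothetical) to a temporary directory, imports it twice, and shows that its top-level `print` runs only once: after the first import, Python reuses the module object cached in `sys.modules`. `importlib.reload` forces the file to run again.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a throwaway module to a temporary directory
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "noisy_module.py").write_text(
    'print("Running noisy_module.py!")\n'
    "\n"
    "def greet():\n"
    '    return "Hello from noisy_module"\n'
)
sys.path.insert(0, tmp_dir)    # for this demo only
importlib.invalidate_caches()  # make sure the import system notices the new file

import noisy_module  # prints "Running noisy_module.py!": the file is executed
import noisy_module  # prints nothing: Python reuses the cached module

print(noisy_module.greet())           # Hello from noisy_module
print("noisy_module" in sys.modules)  # True
importlib.reload(noisy_module)        # runs the file again, printing once more
```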

### How to separate "running code" from reusable pieces

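A Python module (any `.py` script) might contain code that we want to _run_, as well as code that we only want to _import_. To separate these, we check the special variable `__name__`: Python sets it to `"__main__"` when the file is run directly, and to the module's name (here, `"str_utils"`) when it is imported: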

```
$ tail -n2 str_utils.py
if __name__ == "__main__":
    print(tokenize("Hello, world!"))
$ python str_utils.py  # The `print` runs
['Hello,', 'world!']
$ python
>>> import str_utils  # The `print` doesn't run!
>>>
```
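We can check the value `__name__` takes in each case with the standard `runpy` module. In this sketch, a hypothetical `name_demo.py` records whether its guarded block ran. Running it with `run_name="__main__"` mimics `python name_demo.py`; running it under its module name approximates an import, although unlike a real `import` it does not cache the module in `sys.modules`.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical module that records how it was executed
SOURCE = '''
RAN_AS_SCRIPT = __name__ == "__main__"

if __name__ == "__main__":
    print("Running as a script")
'''

tmp_dir = tempfile.mkdtemp()
module_path = str(Path(tmp_dir, "name_demo.py"))
Path(module_path).write_text(SOURCE)

# Like `python name_demo.py`: __name__ is "__main__", so the guarded block runs
as_script = runpy.run_path(module_path, run_name="__main__")
# Roughly like `import name_demo`: __name__ is the module name, so it does not
as_module = runpy.run_path(module_path, run_name="name_demo")

print(as_script["__name__"], as_script["RAN_AS_SCRIPT"])  # __main__ True
print(as_module["__name__"], as_module["RAN_AS_SCRIPT"])  # name_demo False
```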