# Modules, Packages, & Libraries

## 1. Modules

A module is a file of Python code (a .py file) whose variables and functions you can use even though it lives in a different file from the one you are working in.

Say we have this code that stores values in a list called `my_data`.

In [1]:
my_data = [ 3, 10,  8,  7,  5,  3,  3,  5,  40,  8,  5,  3,  2,  5,  5,  8,  2,  9,  6,  6, 10, 10,  7,  3,
         5,  7,  6, 10,  2,  -33,  8,  4,  8,  7,  9,  8,  1,  2,  8,  5,  3,  7,  1,  4,  9,  6,  6,  8,
        10,  1]

Let's put the code above into a different file.

### 1.1 Create a .py file

1.) Click on the file icon in the upper left, and then a big blue plus sign will appear. Click it to open a new Launcher tab.

<img src="new_launcher.png" align="left"/>

<br clear="left"/> <!-- This clears the float to ensure text starts after the image -->

2.) In the Launcher tab, click to open a new Text File:

<img src="new_text_file.png" width=150 align="left"/>

<br clear="left"/> <!-- Clears the float after this image -->

3.) RIGHT CLICK on the untitled.txt tab and choose "Rename Text".

<img src="rename_txt.png" width=400 align="left"/>

<br clear="left"/> <!-- Clears the float after this image -->

4.) Rename the file "my_data_stats.py"

<img src="rename_file.png" width=300 align="left"/>

<br clear="left"/> <!-- Clears the float after this image -->

5.) Copy this code and paste it into the my_data_stats.py file:

```
my_data = [ 3, 10,  8,  7,  5,  3,  3,  5,  40,  8,  5,  3,  2,  5,  5,  8,  2,  9,  6,  6, 10, 10,  7,  3,
         5,  7,  6, 10,  2,  -33,  8,  4,  8,  7,  9,  8,  1,  2,  8,  5,  3,  7,  1,  4,  9,  6,  6,  8,
        10,  1]
```

**Note: Do not include the backtick marks around the code when I ask you to copy and paste the code**

6.) In the my_data_stats.py file, manually change the last element of the list variable `my_data` to `35`.

7.) Save the .py file: From the my_data_stats.py tab, click File>Save Python File.


### 1.2 Import variables in the .py file

Your .py file can be imported as a *module* in this .ipynb.

Run the code below

In [None]:
# run this code
import my_data_stats 
print(my_data_stats.my_data)

You can access the variables in my_data_stats.py by importing the .py file as a module and then using the name of the module followed by a dot and the name of the variable!
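
As a rough, self-contained illustration of the same idea, the sketch below writes a tiny module into a temporary folder and imports it. The module name `demo_values` and its contents are made up for this example; they are not part of my_data_stats.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway folder and write a small module file into it.
# "demo_values" is a hypothetical name used only for this sketch.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "demo_values.py").write_text("numbers = [1, 2, 3]\n")

# Python searches the folders listed in sys.path when it imports a module.
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()  # make sure the newly created file is noticed

import demo_values

# Anything defined in the file is reached as module_name.variable_name
print(demo_values.numbers)
print(demo_values.__file__)  # the file the module was loaded from
```

In the notebook you did not need the `sys.path` step because my_data_stats.py sits in the same folder as the .ipynb, which is already on the search path.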

### 1.3 Import only works the first time...

Follow these steps!

1.) In the my_data_stats.py file, manually change the last element of the list variable `my_data` to `55`.

2.) Save the .py file: With the my_data_stats.py tab selected, click File>Save Python File.

3.) Rerun the code *above* that imports my_data_stats and prints my_data

Note that your change won't show up: the notebook's kernel already imported this module and reuses the copy it loaded, so it never re-reads the changed file.

### 1.4 Force it to import again

One way to force it to import again is to **restart the kernel** and then re-run the `import my_data_stats` code above. This is often a good solution, but it gets annoying if you plan to edit and re-use the .py file many times in a row.

Another solution is using the **importlib.reload()** function:

In [None]:
import importlib
importlib.reload(my_data_stats)
print(my_data_stats.my_data)

Now you can see that the element that you changed has been updated in this notebook too.

To recap: Python only imports a module once per session. If you change a .py file and re-import it, the changes won't show up unless you restart the kernel or use importlib.reload().
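
The recap can be reproduced in one cell. This is only a sketch: the module `demo_reload` is a hypothetical stand-in for my_data_stats.py, written to a temporary folder so that the cell can edit it on its own.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module used only for this sketch.
tmp_dir = tempfile.mkdtemp()
module_file = Path(tmp_dir, "demo_reload.py")
module_file.write_text("value = 1\n")
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

import demo_reload
first = demo_reload.value            # value from the original file

# Edit the file on disk. The new text has a different length on purpose,
# so Python's cached compiled copy cannot be mistaken for the current file.
module_file.write_text("value = 2222\n")

import demo_reload                   # no effect: the module is already in sys.modules
after_second_import = demo_reload.value

importlib.reload(demo_reload)        # re-executes the file
after_reload = demo_reload.value

print(first, after_second_import, after_reload)  # prints: 1 1 2222
```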

### 1.5 Import a function in the .py file

1.) Copy the averaging function below and paste it into the my_data_stats.py file, underneath where you defined the my_data variable:

```
def avg(data):
    '''
    Calculate the average of the data
    '''
    total = 0  # named "total" so the built-in sum() is not shadowed
    for d in data:
        total = total + d
    avg = total/len(data)
    return avg
```

2.) In my_data_stats.py, rename the function from `avg` to `my_avg`.

3.) Save the .py file: From the my_data_stats.py tab, click File>Save Python File.

4.) Run the code below

In [None]:
importlib.reload(my_data_stats)

print(my_data_stats.my_avg([0,5,10]))

5.) Use the my_avg function to find the average of my_data.

In [6]:
# your code here

### 1.6 Importing modules **as** a short name 

You may be getting tired of typing my_data_stats every time you want to use a variable or function from that file. We can import the module `as` something else to make our life easier by shortening it with an alias like this:

In [None]:
import my_data_stats as mds
mds.my_avg(mds.my_data) # that's less work to write!
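
An alias is just a second name for the same module, not a copy of it. The short sketch below uses the standard-library `statistics` module as a stand-in for my_data_stats so that it runs anywhere; the alias `st` is an arbitrary choice.

```python
import statistics
import statistics as st   # "st" is another name for the same module

print(st is statistics)          # True: both names refer to one module object
print(st.mean([0, 5, 10]))       # same as statistics.mean([0, 5, 10])
```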

### 1.7 Importing variables and functions **from** a module

If you don't want to type the module name each time you use one of its variables or functions, or if you only want to use SOME of the variables or functions from the module, you can import parts of the module directly into the notebook using `from`:

In [None]:
from my_data_stats import my_data, my_avg
my_avg(my_data)
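
One thing to keep in mind: a name imported with `from` keeps pointing at whatever the module held at that moment. After `importlib.reload()`, such names appear to stay at their old values until you run the `from ... import ...` line again. The sketch below shows this with a hypothetical throwaway module called `demo_from`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module used only for this sketch.
tmp_dir = tempfile.mkdtemp()
module_file = Path(tmp_dir, "demo_from.py")
module_file.write_text("limit = 10\n")
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

import demo_from
from demo_from import limit       # binds the name "limit" to the current value

# Change the file (different length on purpose) and reload the module.
module_file.write_text("limit = 9999\n")
importlib.reload(demo_from)

via_module = demo_from.limit      # refreshed by the reload
stale_limit = limit               # still the value from before the reload

from demo_from import limit       # re-running the from-import refreshes the name
fresh_limit = limit

print(via_module, stale_limit, fresh_limit)  # prints: 9999 10 9999
```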

### 1.8 Modules can store a bunch of stuff!

Copy the following functions and paste them into the my_data_stats.py file, underneath the code you have already put there:

```
def my_std_dev(data):
    '''
    Calculate the standard deviation of the data
    '''
    mean = my_avg(data)
    squared_diffs = []
    for x in data:
        squared_diffs.append((x-mean)**2)
    variance = sum(squared_diffs) / len(data)
    std_dev = variance**(1/2)
    return std_dev

def my_outliers(data):
    '''
    Identify and return the outliers in the dataset
    The method for identifying outliers used here is
    data points that are more than 3x the standard deviation away from the mean
    '''
    mean = my_avg(data)
    std = my_std_dev(data)
    upper_threshold = mean + 3*std
    lower_threshold = mean - 3*std
    outliers = []
    outlier_indices = []
    for i in range(len(data)):
        if (data[i] > upper_threshold) or (data[i] < lower_threshold):
            outliers.append(data[i])
            outlier_indices.append(i)
    return outliers, outlier_indices, upper_threshold, lower_threshold
```

Then save the my_data_stats.py file.

Afterwards we can reload and import these functions into our notebook too.

In [None]:
importlib.reload(my_data_stats)
from my_data_stats import my_std_dev, my_outliers
print('Mean=',my_avg(my_data))
print('Standard Deviation=',my_std_dev(my_data))
outliers,outlier_indices,upper,lower = my_outliers(my_data)
print('Upper Outlier Threshold=',upper)
print('Lower Outlier Threshold=',lower)
print('Outliers=',outliers)
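
A quick way to sanity-check a hand-written statistic is to compare it with the standard library. The sketch below restates the standard deviation formula under a different name (`population_std_dev`, chosen here so it does not overwrite anything imported above) and compares it with the `statistics` module on a small illustrative list. Note that the formula divides by `len(data)`, so it corresponds to `statistics.pstdev` (population) rather than `statistics.stdev` (sample).

```python
import statistics

def population_std_dev(data):
    """Same formula as my_std_dev: square root of the mean squared deviation."""
    mean = sum(data) / len(data)
    squared_diffs = [(x - mean) ** 2 for x in data]
    return (sum(squared_diffs) / len(data)) ** 0.5

sample = [2, 4, 4, 4, 5, 5, 7, 9]    # illustrative values, not my_data

print(population_std_dev(sample))    # 2.0
print(statistics.pstdev(sample))     # population formula, matches the line above
print(statistics.stdev(sample))      # sample formula (divides by n - 1), slightly larger
```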

## 2. Packages

Sometimes you'd like to store multiple modules inside a folder. This folder of modules is called a "package".

**Drag the my_data_stats.py file into the folder called "this_data"**

<img src="move_module_to_package.png" width = 300 align="left"/>

<br clear="left"/> <!-- This clears the float to ensure text starts after the image -->

If we click on this folder we see there are now two .py files inside.

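We can import a module from this folder by giving its dotted name, for example `import this_data.my_data_stats`. (Importing just `this_data` does not, by itself, load the modules inside the folder unless the package is set up to do so in an `__init__.py` file.)

If we want to use a module in the *this_data* package, we use the dot notation again, where this dot is similar to a slash in a directory path:

In [None]:
import this_data.my_data_stats
my_data = this_data.my_data_stats.my_data # grabbing my_data from the file this_data/my_data_stats.py
print(my_data)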

Or we can use *from* to import the individual modules, or even individual variables and functions of interest:

In [None]:
from this_data.my_data_stats import my_data
from this_data.plot_my_data import time_series_plot, histogram_plot

time_series_plot(my_data)
histogram_plot(my_data)

So a package is a folder containing multiple modules. Using packages helps keep your code organized and makes it easier to import related functions. For example, instead of keeping all helper functions in one large .py file, you can break them into separate modules within a package.
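
To make the folder-of-modules idea concrete, here is a minimal sketch that builds a throwaway package and imports from it. All names (`demo_pkg`, `values`, `helpers`, `double`) are hypothetical. The empty `__init__.py` file is an assumption of this sketch: it marks the folder as a regular package, although recent Python versions can also import from a folder without one.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package: a folder holding two module files.
tmp_dir = tempfile.mkdtemp()
pkg_dir = Path(tmp_dir, "demo_pkg")
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")   # empty file marking a regular package
(pkg_dir / "values.py").write_text("numbers = [1, 2, 3]\n")
(pkg_dir / "helpers.py").write_text("def double(x):\n    return 2 * x\n")

sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

import demo_pkg
# Importing the package alone has not loaded the modules inside it.
loaded_before = hasattr(demo_pkg, "values")

import demo_pkg.values                 # load one module by its dotted name
from demo_pkg.helpers import double    # or pull a single function out of another

print(loaded_before)             # False
print(demo_pkg.values.numbers)   # [1, 2, 3]
print(double(21))                # 42
```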

## 3. Libraries

Libraries are collections of packages. You can import an entire library and then use specific packages with dot notation.

For example, the numpy library contains many packages for numerical computing, and the matplotlib library has multiple packages for plotting. When you import numpy, you can reach its commonly used packages and modules using dot notation (e.g., numpy.linalg for linear algebra functions). With other libraries you import the package you need by its full dotted name, as with matplotlib.pyplot.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
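
As a small sketch of the dot notation in a library (assuming numpy is installed; the values are illustrative only), one call below comes from numpy's top level and one from its `linalg` package:

```python
import numpy as np

values = np.array([3.0, 4.0])      # illustrative values

print(np.mean(values))             # function at the top level of numpy
print(np.linalg.norm(values))      # function inside the numpy.linalg package
```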