# Packages, Libraries & Modules

<img src="../images/python-3rd-party-libraries.webp" width="80%" height="80%">

## Terminology

People often use the terms _"package"_, _"library"_, and _"module"_ interchangeably, although each has a somewhat different meaning.

A **package** generally refers to source code that is bundled up so that a package repository can host it and a package manager can install it. Two common repositories are:
   * [PyPI](https://pypi.org/)
   * [Anaconda](https://anaconda.org/anaconda/repo)
   
When you `pip install pkg` or `conda install pkg` you are installing the `pkg` package from PyPI or Anaconda, respectively, to your computer or the server you’re working on.

A **library** generally refers to a centralized location on an operating system where installed package source code resides and can be imported into a current session. 
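To see where this location is on a given machine, one option is to ask Python itself. This is a minimal sketch assuming a standard CPython install; the printed paths will differ by machine and environment.

```python
import sys
import sysconfig

# Directory where pure-Python third-party packages are installed.
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages)

# Every directory Python searches, in order, when it runs an import statement.
for folder in sys.path:
    print(folder)
```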

We can always see the full list of packages installed in our current environment with `pip list` or `conda list`:

In [None]:
%pip list
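
Similar information is available from inside Python through the standard library's `importlib.metadata` module (Python 3.8 and later). The sketch below is one possible approach, not what `pip list` does internally; the variable name `installed` is just for illustration, and the output depends on the environment.

```python
from importlib.metadata import distributions

# Collect (name, version) pairs for every installed distribution.
installed = sorted(
    {(dist.metadata["Name"] or "unknown", dist.version) for dist in distributions()},
    key=lambda pair: pair[0].lower(),
)

# Show the first few entries only.
for name, version in installed[:5]:
    print(f"{name}=={version}")
```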

A **module** is the broad term for any code you import from outside your current script or notebook. 

This may include libraries installed on your computer, or standalone `.py` files that you created to hold support functions for your current analysis and want to import into your current notebook.
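
As a rough illustration of the second case, the sketch below writes a small helper file and then imports it. The file name `analysis_helpers.py` and the function `double` are hypothetical. In practice the file usually sits next to your notebook, so the `sys.path` step is not needed; it is included here only because the example uses a temporary folder.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway folder holding a small module file (hypothetical name).
module_dir = Path(tempfile.mkdtemp())
(module_dir / "analysis_helpers.py").write_text(
    "def double(x):\n"
    "    return 2 * x\n"
)

# Let Python find the folder, then import the file by its name (minus .py).
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
import analysis_helpers

print(analysis_helpers.double(21))  # 42
```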

## Importing libraries / modules

To use a library / module we need to first `import` it into our Python session so we have access to its functionality.

For example, functionality related to the operating system -- such as creating files and folders -- is stored in a package called `os`.

To use the tools in `os`, we `import` the library.

In [1]:
import os

Once we import it, we gain access to everything inside.
With Jupyter's autocomplete, we can view what's available.

In [None]:
# Move your cursor to the end of the line below and press Tab.
os.
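
Outside Jupyter, or when autocomplete is unavailable, the built-in `dir()` function gives a similar listing. The sketch below lists the public names in `os` and then uses one of them to create a folder. The folder names `data` and `raw` are arbitrary, and a temporary directory is used so nothing is left in your working directory.

```python
import os
import tempfile

# Public names exposed by the module (those without a leading underscore).
public_names = [name for name in dir(os) if not name.startswith("_")]
print(len(public_names), public_names[:5])

# Use one of them: create a nested folder inside a temporary directory.
base = tempfile.mkdtemp()
new_folder = os.path.join(base, "data", "raw")
os.makedirs(new_folder, exist_ok=True)
print(os.path.isdir(new_folder))  # True
```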

## Importing options

There are a few ways to use the `import` statement:

In [3]:
# Preserving module namespace
import numpy

numpy.cos(numpy.pi)

-1.0

In [4]:
# Explicit module import by alias
import numpy as np

np.cos(np.pi)

-1.0

In [7]:
# Explicitly import module functions
from numpy import cos, pi

cos(pi)

-1.0

<div class="admonition warning alert alert-danger">
 <b><p class="first admonition-title" style="font-weight: bold">Caution!</p></b>
 <p>There is another option but we do not use it. Implicit module importing is generally frowned upon and in this class I expect you to use one of the three previous explicit importing approaches.</p>
</div>

```python
# Don't do this!!!
from numpy import *
```
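
One reason wildcard imports are discouraged is that they can silently overwrite names you have already defined. The sketch below uses the standard library's `math` module in place of `numpy` so that it runs without third-party installs. The `gcd` helper and the `namespace` dictionary are made up for illustration, and `exec` is used only to keep the wildcard import contained.

```python
# Define our own helper first.
def gcd(a, b):
    return "my custom gcd"

# Run a wildcard import inside a separate namespace that already holds our helper.
namespace = {"gcd": gcd}
exec("from math import *", namespace)

# The wildcard import replaced our function with math.gcd, without any warning.
shadowed = namespace["gcd"] is not gcd
print(shadowed)                  # True
print(namespace["gcd"](12, 18))  # 6
```

With an explicit import such as `import math`, the two functions would stay distinct as `gcd` and `math.gcd`.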

## Standard library

> The Python source distribution has long maintained the philosophy of "batteries included" -- having a rich and versatile standard library which is immediately available, without making the user download separate packages. This gives the Python language a head start in many projects.
>
> \- PEP 206

Some packages, like `os`, are bundled with every Python install; downloading Python guarantees you'll have these packages.
Collectively, this group of packages is known as the *standard library*.

<img src="../images/python-3rd-party-libraries.webp" width="60%" height="60%">

Python’s standard library contains many useful built-in modules, which you can read about fully in [Python’s documentation](https://docs.python.org/3/library/). Any of these can be imported with the `import` statement, and then explored using the help function seen in the previous module. Here is an extremely incomplete list of some of the modules you might wish to explore and learn about:

- `os` and `sys`: Tools for interfacing with the operating system, including navigating file directory structures and executing shell commands
- `math` and `cmath`: Mathematical functions and operations on real and complex numbers
- `itertools`: Tools for constructing and interacting with iterators and generators
- `functools`: Tools that assist with functional programming
- `random`: Tools for generating pseudorandom numbers
- `pickle`: Tools for object persistence: saving objects to and loading objects from disk
- `json` and `csv`: Tools for reading and writing JSON-formatted and CSV-formatted files
- `urllib`: Tools for making HTTP and other web requests

You can find information on these, and many more, in the Python standard library documentation: https://docs.python.org/3/library/.
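
As a small taste, the sketch below touches four of the modules listed above. The sample record is invented for illustration.

```python
import itertools
import json
import math
from functools import reduce

# json: convert between Python objects and JSON text.
record = json.loads('{"name": "Ada", "scores": [90, 85]}')
print(record["scores"])  # [90, 85]

# itertools: all unordered pairs from a collection.
pairs = list(itertools.combinations(["a", "b", "c"], 2))
print(pairs)  # [('a', 'b'), ('a', 'c'), ('b', 'c')]

# functools: fold a sequence into a single value.
total = reduce(lambda acc, x: acc + x, record["scores"])
print(total)  # 175

# math: functions on real numbers.
print(math.sqrt(16))  # 4.0
```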

## Knowledge check

The `random` library provides tools for generating pseudorandom numbers. To generate a random integer from 0 to 10, as the knowledge check asks, we can use `randint`, which includes both endpoints:

In [9]:
from random import randint

randint(0, 10)

9
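
Because the numbers are pseudorandom, your result will usually differ from the one shown. If you need repeatable results, one common approach is to seed the generator first. This is a minimal sketch; the seed value 42 is arbitrary.

```python
import random

# Seeding makes the pseudorandom sequence repeatable.
random.seed(42)
first_run = [random.randint(0, 10) for _ in range(5)]

random.seed(42)
second_run = [random.randint(0, 10) for _ in range(5)]

print(first_run == second_run)  # True

# randint includes both endpoints, so every draw lies in 0..10.
print(all(0 <= n <= 10 for n in first_run))  # True
```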

## Third party packages

Other packages must be downloaded separately, either because
- they aren't sufficiently popular to merit inclusion in the standard library
- *or* they change too quickly for the maintainers of Python to keep up

<img src="../images/python-3rd-party-libraries.webp" width="60%" height="60%">

Third-party packages unlock a huge range of functionality that isn't available in native Python; many of Python's data science capabilities come from a handful of packages outside the standard library:

- pandas (data manipulation)
- numpy (numerical computing)
- scikit-learn (modeling)
- scipy (scientific computing)
- matplotlib (graphing)
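
Since these packages are not bundled with Python, it can be useful to check whether they are installed before importing them. One way to do this, sketched below, is `importlib.util.find_spec` from the standard library, which returns `None` for a top-level module that cannot be found. Note that the import name can differ from the install name: scikit-learn is imported as `sklearn`.

```python
from importlib.util import find_spec

# Import names for the packages listed above.
import_names = ["pandas", "numpy", "sklearn", "scipy", "matplotlib"]

# True if Python can locate the package, False otherwise.
availability = {name: find_spec(name) is not None for name in import_names}

for name, found in availability.items():
    print(f"{name}: {'installed' if found else 'not installed'}")
```

Any package reported as not installed can be added with `pip install` or `conda install`, as described in the Terminology section.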

## Jake Vanderplas' Python Data Science Ecosystem

In 2017, Jake Vanderplas (Python author, developer, and advocate) spoke at PyCon (the biggest Python conference) about the Python data science ecosystem.

Notably, his diagram is arranged to show that higher-level libraries are built upon lower-level libraries -- a common development approach in open ecosystems.

<center><img src="../images/jakevdp_ecosystem.jpg" style="height:400px;"/></center>

## Paco Nathan's Python Data Science Landscape

More recently, Paco Nathan (data science researcher and frequent conference speaker) wrote a blog post in which he illustrated the current Python data science landscape.

<center><img src="../images/paco_nathan_ecosystem.png" style="height:500px;"/></center>

## How To Keep Up

Following the developments in such a crowded space can be daunting, but we recommend a few things:

- **Subscribe to newsletters**. The [O'Reilly Data Newsletter](https://www.oreilly.com/data/newsletter.html) and [Python Weekly](https://www.pythonweekly.com) are reliably excellent.
- **Listen to podcasts**. [Talk Python to Me](https://talkpython.fm) and [Python Bytes](https://pythonbytes.fm) are both good. While neither is data science-specific, they cover a wide range of topics and ideas.
- **Google**! We've discovered many good packages by searching for something specific. For example, many services offer a Python package so you can interact with them programmatically. I use the [Todoist API](https://github.com/Doist/todoist-python) to automatically update my to-do list on a regular cadence.