# Python Libraries

**Learning Objectives**:
- Understand the purpose of libraries, and where to find them.
- Install a new library.
- Import and use functions from libraries in Python.
****

A **library** is a collection of reusable code, bundled so that other people can use it. Libraries usually contain functions, classes, and sometimes variables, and are typically developed to serve a unified purpose. In this notebook we use the words "library" and "package" interchangeably.

We have already been using Python's [standard library](https://docs.python.org/3/library/) - it comes ready and loaded with Python. We've also used `pandas` to work with data frames.

Many additional libraries are available from [Anaconda](https://www.anaconda.com) or [PyPI](https://pypi.python.org/pypi) (the Python Package Index).

## Installing New Libraries

### Option 1: Anaconda Navigator

Many popular (and some not-so-popular) libraries are available for installation through the Anaconda Navigator. Let's use the Anaconda Navigator to install a library called `fuzzywuzzy`. 

To do so, follow these steps:

1. Open the Anaconda Navigator application.
2. Click the "Environments" tab in the left-hand menu.
3. Click the drop-down box that, by default, says "Installed". Change it to say "All".
4. Click the "Channels" button, which will open up a dialog box. In that dialog box press the "Add" button in the top right corner. Type "conda-forge" in the new line that appears in the dialog box, then press enter. Finally, press the green "Update channels" button in the bottom right-hand corner.
5. Use the "Search Package" text box to enter `fuzzywuzzy` and press "Enter".
6. Select the checkbox next to the `fuzzywuzzy` package (library) name that appears in the list below.
7. Click the green "Apply" button in the bottom right-hand corner. This will open up a dialog box that says "Install Packages". It will say "Solving package specifications" with a blue progress bar. This may take a few minutes.
8. Eventually, it will show a list of packages that need to be installed. Press the green "Apply" button in the "Install Packages" dialog box to install them. This may take a few minutes.

### Option 2: Installing Using the Command Line

Another option is to install a package directly using the command line. On Macs, you would open up the Terminal application. On Windows, open up the Anaconda Prompt application.

Once you have this application open, you can use `pip`, a Python package installer, to install new packages from PyPI. Run `pip install [PACKAGE_NAME]`, replacing `[PACKAGE_NAME]` with the name of the package you want, and the package will be installed.

### Option 3: Installing within a Jupyter Notebook

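You can also install packages from within a Jupyter Notebook. Create a new cell, and run the command `!pip install [PACKAGE_NAME]`. The leading `!` tells Jupyter to run the rest of the line as a shell command rather than as Python code, so this runs the same `pip` install as Option 2. Recent versions of Jupyter also provide the `%pip install [PACKAGE_NAME]` magic command, which installs into the environment the notebook's kernel is running in.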
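
After installing, it can be useful to confirm that Python can actually find the package. The sketch below is one possible check, using the standard library's `importlib.util.find_spec`; the helper name `is_installed` is our own invention for illustration, not part of any library.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name.

    Hypothetical helper for illustration; pass the name you would use
    after `import`, which is not always the name used with `pip install`.
    """
    # find_spec returns None when no module of that name can be found
    return importlib.util.find_spec(module_name) is not None


# "json" ships with Python, so it should always be found
print("json:", is_installed("json"))

# The result for a third-party package depends on your environment
print("fuzzywuzzy:", is_installed("fuzzywuzzy"))
```

If the second line prints `False`, the package is not installed in the environment your notebook is using, and one of the options above should fix that.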

## Importing Packages

An installed package is not yet available for us to use while running Python. We still have to **import** the package into the current session. You can think of installing a package as buying a book and putting it on the bookshelf. To actually read the book, we have to take it off the shelf and place it on the desk. Similarly, to use a package we have to import it.

Importing is done via the `import` keyword. We simply run `import [PACKAGE_NAME]`, and the package's contents become available under that name.

Packages are typically organized into modules. Within these modules are functions, classes, and variables. All of these components can be accessed with **dot notation**: e.g., `[LIBRARY_NAME].[MODULE_NAME]`. Python uses `.` to mean "part of".
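
As a small illustration of dot notation, here is a sketch using `os` from the standard library. The file path `data/survey.csv` is made up for the example and does not need to exist.

```python
import os

# os               -> the top-level module
# os.path          -> a submodule that is "part of" os
# os.path.basename -> a function that is "part of" os.path
file_name = os.path.basename("data/survey.csv")  # hypothetical path
print(file_name)

# Variables are reached the same way: os.sep holds the path
# separator used on the current operating system
print(repr(os.sep))
```

Reading the expression from left to right, each `.` steps one level further inside the library.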

Let's import the `numpy` package, which has a lot of useful functions for working with numerical data. Although you will occasionally use it directly, more often `numpy` is a package that other useful packages (such as `pandas`) are built on. Let's access a function from `numpy` using dot notation.



In [None]:
import numpy

print('mean of [1,4,5] is:', numpy.mean([1,4,5]))

For many packages, like `numpy`, there is a conventional **alias**, or nickname, under which they are often imported. For common packages (especially those with long names), using a nickname saves a lot of typing. For example, `numpy` is usually imported as below:

In [None]:
import numpy as np

print('mean of [1,4,5] is:', np.mean([1,4,5]))

There are very common abbreviations used for some of the more popular libraries, including:

* `pandas` -> `pd`
* `numpy` -> `np`
* `matplotlib` -> `mpl`
* `statsmodels.api` -> `sm`

But sometimes aliases can make programs harder to understand, since readers must learn your program's aliases. Be very intentional about using aliases!
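
It may help to see that an alias is only a second name for the same module, not a copy of it. The sketch below uses the standard library's `statistics` module; the alias `stats` is chosen here for illustration and is not an established convention.

```python
import statistics
import statistics as stats  # "stats" is an arbitrary alias for this example

# Both names refer to the same module object
print(stats is statistics)

values = [1, 4, 5]

# So calling a function through either name gives the same result
print(statistics.mean(values) == stats.mean(values))
```

Because both names point to the same thing, an alias changes only how the code reads, which is why the choice is worth making deliberately.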

## Finding More About a Package's Contents

How do we know what we can do with `numpy`? Usually, packages provide documentation that explains their components. We can access this documentation with the `help` function:

In [None]:
help(numpy)

You can also view documentation online. For example, the documentation for the standard library's `math` module is [here](https://docs.python.org/3/library/math.html).

Being comfortable sifting through documentation is a **very** important skill!
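
The output of `help` on a whole package can be very long. As a rough sketch of a lighter-weight approach, the built-in `dir` function and the standard library's `inspect` module can give a quick overview before you read the full documentation. The example uses the standard library's `statistics` module so that it runs without installing anything.

```python
import inspect
import statistics

# dir() lists the names available in the module (including anything the
# module itself imported); names starting with an underscore are
# conventionally private, so we skip them here
public_names = [name for name in dir(statistics) if not name.startswith("_")]
print(public_names)

# inspect.signature shows which arguments a function accepts
print(inspect.signature(statistics.median))

# inspect.getdoc returns the function's documentation text;
# here we print only its first line
print(inspect.getdoc(statistics.median).splitlines()[0])
```

Calling `help(statistics.median)` shows the same documentation in full, and is usually easier to read than the help page for the entire module.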

**Question:** You are curious about what is available in the `math` module, so you run `help(math)`. However, you get an error. What went wrong?

In [None]:
help(math)

## Importing Specific Components of a Package

We generally want to import only what we need from a package. To import a specific component of a package, we can use the `from` keyword. This allows us to import a specific module, function, or variable, and then refer to it directly, without the library name as a prefix.

Specifically, we use the syntax `from [PACKAGE_NAME] import [COMPONENT]`.

Let's do this with the `numpy` package. From the `numpy.random` module we want to import the `shuffle()` function, which shuffles a list of items in place.


In [None]:
from numpy.random import shuffle
test = [1,2,3,4]
shuffle(test)
print(test)
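
One consequence of `from ... import ...` is that only the imported component gets a name in our session; the package itself does not. The sketch below illustrates this with `Counter` from the standard library's `collections` module, and assumes that no earlier cell has already run `import collections`.

```python
from collections import Counter

# Counter can be used directly, with no "collections." prefix
letter_counts = Counter("banana")
print(letter_counts["a"])

# The name "collections" itself was never imported, so (assuming no
# earlier cell imported it) referring to it raises a NameError
try:
    collections.OrderedDict()
except NameError as error:
    print("NameError:", error)
```

If you need both the component and the package name, you can run a plain `import` as well.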

**Question:** There is another module called `random` in the Python standard library. Knowing that, why might we not want to run `from numpy import random` and `import random` in the same notebook?

## Challenge 1: Locating the Right Library

You want to select a random value from a list of data.

1. Which module in the [standard library](https://docs.python.org/3/library/) would you most expect to help? (**Hint:** it was mentioned earlier in the notebook.)
2. Which function would you select from that library? Are there alternatives?
3. Read the documentation for that function. How many arguments does the function take? How many of them have defaults?
4. Import the library, and apply the function to the following list.

In [None]:
ids = [1, 2, 3, 4, 5, 6]

In [None]:
# YOUR CODE HERE
