
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>



# Libraries
## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) In this lesson you:<br>
- Explore the Databricks **%run** feature
- Learn about Python libraries and PyPI



### %run

So far, we have been running individual cells in a single notebook or using **Run All** in the notebook toolbar to run all of the executable command cells in sequence from top to bottom.

Databricks also supports the **%run** magic command to allow one notebook to run all of the executable command cells in a *different* notebook.

When we do this, we also get access to that other notebook's state, which means all of the classes, functions, and variables defined in that notebook.

The **%run** magic command must appear at the top of a command cell, followed by the path to the other notebook. You *cannot* have any Python code in the command cell before or after the **%run** magic command.

In [0]:
%run ./Includes/run_example



The example notebook defines the function **`greet()`**, which takes a name and returns a greeting. Since we now have access to that notebook's state, we can use **`greet()`** even though it was not defined in this notebook. This is a useful way to define and test helper functions without cluttering up your main notebook.

In [0]:
greet("Bob")

Out[2]: 'Hello Bob, how are you?'
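
The contents of `./Includes/run_example` are not shown in this lesson. As a rough sketch, assuming only what the output above tells us, that notebook might define **`greet()`** along these lines. The function body here is a guess for illustration, not the actual notebook code.

```python
# Hypothetical contents of ./Includes/run_example (an assumption for
# illustration; the real notebook may define greet() differently)
def greet(name):
    """Return a friendly greeting for the given name."""
    return f"Hello {name}, how are you?"


# After %run, the function can be called from the notebook that ran it
print(greet("Bob"))  # Hello Bob, how are you?
```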



### PyPI and Python Libraries

This idea of storing collections of useful definitions in libraries, so that they can be used from different files, extends beyond the Databricks environment. In fact, when Python is installed on a system it usually includes a useful collection of utilities called the [Python Standard Library](https://docs.python.org/3/library/). Furthermore, developers can share any libraries they create with the Python community by uploading them online.
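
As a small illustration, the standard library modules `math` and `statistics` can be used without installing anything. The sketch below relies on the **`import`** statement, which is covered later in this lesson, and the sample values are arbitrary.

```python
# Standard library modules ship with Python, so no installation is needed
import math
import statistics

scores = [3, 1, 2]  # arbitrary sample values

five_factorial = math.factorial(5)        # 5 * 4 * 3 * 2 * 1
middle_score = statistics.median(scores)  # middle value of the sorted list

print(five_factorial)  # 120
print(middle_score)    # 2
```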

<a href="https://pypi.org/" target="_blank">PyPI</a>, which stands for the Python Package Index, is the central repository where developers upload and share their libraries, and where users can download them. It hosts hundreds of thousands of libraries for a wide variety of uses, and some of them have become the industry standard for certain use cases.


By default, Python does not have access to these libraries. Before you can import a library to use in your program, you must first install it from PyPI on your system.
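
One way to see this in practice is to ask Python whether it can find a library before importing it. The sketch below uses the standard library's `importlib.util.find_spec`. The helper name `is_installed` and the made-up library name are assumptions for illustration.

```python
import importlib.util


def is_installed(library_name):
    """Return True if Python can locate a top-level library with this name."""
    return importlib.util.find_spec(library_name) is not None


print(is_installed("json"))  # True: json is part of the standard library
print(is_installed("a_library_that_is_not_installed"))  # False (hypothetical name)
```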



#### pip

pip is the tool most often used to download and install a library from PyPI. It is usually included with your Python installation.

In a command line or terminal, type **`pip install package_name`**. For example, to install the popular library **`numpy`**, we would write **`pip install numpy`**.

The Databricks environment is a little different. Rather than typing **`pip install package_name`** into a terminal, we write **`%pip install package_name`** into a cell. A Databricks cell expects Python code by default, and **`%pip`** tells it to expect a pip command instead.

**Note:** In Databricks, this type of command restarts the Python interpreter, so it's best to use it at the top of a notebook to avoid losing the results of code run earlier.

In [0]:
%pip --help


Usage:   
  pip <command> [options]

Commands:
  install                     Install packages.
  download                    Download packages.
  uninstall                   Uninstall packages.
  freeze                      Output installed packages in requirements format.
  list                        List installed packages.
  show                        Show information about installed packages.
  check                       Verify installed packages have compatible dependencies.
  config                      Manage local and global configuration.
  search                      Search PyPI for packages.
  cache                       Inspect and manage pip's wheel cache.
  index                       Inspect information available from package indexes.
  wheel                       Build wheels from your requirements.
  hash                        Compute hashes of package archives.
  completion                  A helper command used for command completion.
  debug                    



The following code cell will restart the Python interpreter, which means you will have to re-run the `./Includes/run_example` notebook to have access to `greet()`.

In [0]:
%pip install numpy

Python interpreter will be restarted.
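
To confirm which version of a library ended up installed, one option is the standard library's `importlib.metadata`. This is a minimal sketch: the helper name `installed_version` is hypothetical, and the version printed depends on your environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints a version string, or None if numpy is not installed
print(installed_version("numpy"))

# Prints None, since this made-up package name is not installed
print(installed_version("a-package-that-is-not-installed"))
```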



Let's say we wanted a function to take the square root of a number. Python has no built-in square root function, but <a href="https://numpy.org/doc/stable/" target="_blank">numpy</a> provides one.

Fortunately, our Databricks environment comes with several useful libraries preinstalled, including **`numpy`**. For more information, you can refer to the <a href="https://docs.databricks.com/en/release-notes/runtime/index.html" target="_blank">Databricks Runtime Release Notes</a>. So, for now, we only have to import the library we want.

The first step is to tell Python that we want to use the features defined by **`numpy`** by *importing* it. The simplest way to import a library installed on your system is to use the **`import`** statement as shown below:

In [0]:
import numpy



Once the **`numpy`** library is imported, you access the functions it defines by writing **`numpy.function_name(arguments)`**.

Let's try this with the square root function, which numpy defines as **`sqrt(arguments)`**.

In [0]:
numpy.sqrt(4.0)

Out[2]: 2.0
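
**`numpy.sqrt`** is not limited to single numbers. As a brief sketch with arbitrary sample values, passing a list returns an array containing the square root of each element:

```python
import numpy

values = [1.0, 4.0, 9.0]  # arbitrary sample values

# numpy.sqrt is applied to every element and returns a numpy array
roots = numpy.sqrt(values)

print(roots.tolist())  # [1.0, 2.0, 3.0]
```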



We can also create an alias when importing a library. By convention, **`numpy`** is usually imported as **`np`**.

In [0]:
import numpy as np

np.sqrt(4.0)

Out[3]: 2.0



We can also import specific functions from libraries.

In [0]:
from numpy import sqrt

sqrt(4.0)

Out[4]: 2.0
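
One trade-off to keep in mind: importing a function by name places that name directly in the notebook, so it replaces anything else that already uses the same name. The sketch below illustrates this with the standard library's `math` module rather than numpy. That choice is only for illustration.

```python
# Python's built-in pow returns an integer when given two integers
print(pow(2, 3))  # 8

# Importing pow by name rebinds it: pow now refers to math.pow,
# which always returns a float
from math import pow

print(pow(2, 3))  # 8.0
```

Using the module-qualified form (for example, `math.pow`) or an alias avoids this kind of accidental replacement.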



#### `help()`

Recall the **`help()`** function, which displays documentation for the item passed into it. We can use **`help()`** on both a library and anything defined in that library.

In [0]:
help(np)

Help on package numpy:

NAME
    numpy

DESCRIPTION
    NumPy
    =====
    
    Provides
      1. An array object of arbitrary homogeneous items
      2. Fast mathematical operations over arrays
      3. Linear Algebra, Fourier Transforms, Random Number Generation
    
    How to use the documentation
    ----------------------------
    Documentation is available in two forms: docstrings provided
    with the code, and a loose standing reference guide, available from
    `the NumPy homepage <https://www.scipy.org>`_.
    
    We recommend exploring the docstrings using
    `IPython <https://ipython.org>`_, an advanced Python shell with
    TAB-completion and introspection capabilities.  See below for further
    instructions.
    
    The docstring examples assume that `numpy` has been imported as `np`::
    
      >>> import numpy as np
    
    Code snippets are indicated by three greater-than signs::
    
      >>> x = 42
      >>> x = x + 1
    
    Use the built-in ``help`` func

In [0]:
help(np.sqrt)

Help on ufunc:

sqrt = <ufunc 'sqrt'>
    sqrt(x, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True[, signature, extobj])
    
    Return the non-negative square-root of an array, element-wise.
    
    Parameters
    ----------
    x : array_like
        The values whose square-roots are required.
    out : ndarray, None, or tuple of ndarray and None, optional
        A location into which the result is stored. If provided, it must have
        a shape that the inputs broadcast to. If not provided or None,
        a freshly-allocated array is returned. A tuple (possible only as a
        keyword argument) must have length equal to the number of outputs.
    where : array_like, optional
        This condition is broadcast over the input. At locations where the
        condition is True, the `out` array will be set to the ufunc result.
        Elsewhere, the `out` array will retain its original value.
        Note that if an uninitialized `out` array is 



Note that while creating a library is outside the scope of this introductory course, the functions and classes that libraries provide are defined in the same way as the ones we have already seen.
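
For instance, **`help()`** works the same way on a function you write yourself, as long as it has a docstring. The function below is a hypothetical example, not part of any library.

```python
def square(number):
    """Return the given number multiplied by itself."""
    return number * number


# help() prints the function's signature and the docstring written above
help(square)

print(square(3))  # 9
```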

&copy; 2023 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>