#  Python Packages for Earth Data Science

### What is a Python Package

In Python, a package is a bundle of pre-built code that adds to the functionality available in base Python. Base Python can do many things, such as performing math and other operations. However, Python packages can significantly extend this functionality.

You can think of a Python package as a toolbox filled with tools. The tools in the toolbox can be used to do things that you would have to otherwise hand code in base Python. These tasks are things that many people might want to do in Python, thus warranting the creation of a package. After all, it doesn’t make sense for everyone to hand code everything!

For example, the matplotlib package allows you to create plots of data. Since most of us create plots routinely, having a Python package to create plots makes programming more efficient for everyone who needs to create plots.

# Open Source Python Packages for Earth Data Science

There are many different packages available for Python. Some of these are optimized for scientific tasks such as:

- Statistics
- Machine learning
- Using geospatial data
- Plotting & visualizing data
- Accessing data programmatically

and more! The list below contains the core packages that you will use in the upcoming chapters of this textbook to work with scientific data.

- **os**: handle files and directories.
- **glob**: create lists of files and directories for batch processing.
- **matplotlib**: plot data.
- **numpy**: work with data in array formats (often related to imagery and raster format data).
- **pandas**: work with tabular data in a DataFrame format.
- **rasterio**: work with raster (image and arrays) data.
- **geopandas**: work with vector format data (shapefiles and GeoJSON files containing points, lines and polygons) using a GeoDataFrame format.
- **earthpy**: plot and manipulate spatial data (raster and vector).
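
As a rough illustration of the first two entries in the list, the sketch below uses **os** to build file paths and **glob** to collect matching files for batch processing. The file names, the temporary directory, and the helper `list_csv_files` are made up for this example and do not come from the textbook data.

```python
import glob
import os
import tempfile


def list_csv_files(directory):
    """Return the sorted paths of all .csv files directly inside directory."""
    pattern = os.path.join(directory, "*.csv")
    return sorted(glob.glob(pattern))


# Build a throwaway directory with a few placeholder files to search.
with tempfile.TemporaryDirectory() as data_dir:
    for name in ["site_a.csv", "site_b.csv", "notes.txt"]:
        with open(os.path.join(data_dir, name), "w") as f:
            f.write("placeholder\n")

    csv_paths = list_csv_files(data_dir)
    # Keep only the file names, since the temporary folder path varies.
    csv_names = [os.path.basename(p) for p in csv_paths]

print(csv_names)
```

Only the two .csv files are returned; the .txt file does not match the pattern and is skipped.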

Packages can contain many modules (i.e. units of code) that each provide different functions and can build on each other. For example, the matplotlib package provides functionality to plot data using modules, one of which is the commonly used module called pyplot.
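
One way to see that a package is a collection of modules is to list them. The sketch below does this for json, a package from the Python standard library, used here only as a stand-in so that the code runs without installing anything. The same approach should work for an installed package such as matplotlib.

```python
import json
import pkgutil

# A package exposes __path__, the folder(s) where its modules live.
module_names = sorted(m.name for m in pkgutil.iter_modules(json.__path__))
print(module_names)

# A single module can be imported from the package by its dotted name,
# in the same way that pyplot is imported from matplotlib.
import json.decoder

print(json.decoder.__name__)
```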

Every Python package should have a unique name. This allows you to import the package by name using the import statement.

For example, the statement below imports the matplotlib package.

```
import matplotlib
```

You can also import a single module from this package using:

```
import matplotlib.pyplot
```

It is very common to use an alias when importing a package or module, such as:

```
import matplotlib.pyplot as plt
```

Here, the alias for the *matplotlib.pyplot* module is *plt*.

```python
import numpy as np

# Summary statistics of a list, called through the np alias
list1 = [2, 3.4, 4, 5.2, 6, 7, 8, 9]
print(np.mean(list1))
print(np.min(list1))
print(np.median(list1))
```

```
5.575
2.0
5.6
```


In a Jupyter Notebook, typing *np.* and then pressing the Tab key shows the functions and classes that are available in numpy.

# Commonly Used Aliases for Python Packages

There are many packages and modules that have standard alias names. A few commonly used aliases within the **Python** community (and thus used in this textbook) are listed below:

| package.module        | alias |
|----------------------|-------|
| matplotlib           | mpl   |
| matplotlib.pyplot    | plt   |
| numpy                | np    |
| pandas               | pd    |
| rasterio             | rio   |
| geopandas            | gpd   |
| earthpy              | et    |
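
Before importing these packages with their aliases, it can help to check which of them are installed in the current environment. The sketch below is one possible way to do that with the standard library; the helper name `find_missing` is hypothetical and not part of any of the packages listed above.

```python
import importlib.util


def find_missing(package_names):
    """Return the names from package_names that cannot be found for import."""
    return [
        name for name in package_names
        if importlib.util.find_spec(name) is None
    ]


core_packages = ["matplotlib", "numpy", "pandas", "rasterio", "geopandas", "earthpy"]
missing = find_missing(core_packages)

if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("All core packages are available.")
```

The message that is printed depends on what is installed on your machine.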


## Best Practices for Importing Python Packages In Scientific Code

There is a set of best practices that you should follow when importing **Python** packages in your code. These best practices are outlined in the **PEP 8 guidelines** and apply to both **Python** scripts and to working in **Jupyter Notebook** files.

---

### 1. Import Python Libraries at the Top of Your Script or Notebook

It is good practice to import all of the packages that you will need at the top of your **Python** script (.py file) or in the first code cell of a **Jupyter Notebook** file.

This allows anyone looking at your code to immediately know what packages they need to have installed in order to successfully run the code. This rule also follows the PEP 8 conventions for **Python** code.

```python
import os
import pickle
import matplotlib.pyplot as plt
import numpy as np
import geopandas as gpd
```




# Install Packages in Python

Previously in this chapter, you learned about conda environments and the difference between conda and pip. On this page, you will learn how to create and work with conda environments. You will also learn how to install Python packages using the conda-forge channel.

In order to create a conda environment, you first need to install a conda distribution. To do this, you have two main options: Anaconda and Miniconda.

Anaconda ships with a suite of libraries and software pre-installed, which makes it quite large (~3 GB). All of the installed packages can also lead to dependency conflicts as you install new packages.

Miniconda, on the other hand, is a streamlined conda distribution. It only contains critical packages and software such as the conda package manager and a basic Python environment.

Miniconda is predominantly designed for users who know what packages they need and do not want or need the extra installations. For this textbook, we suggest that you use the Miniconda installation.

Once you have conda installed on your machine, you can create your first conda environment:

```sh
$ conda create -n myenv python=3.7
```

### List Available Conda Environments

Up to this point, you have created one or more conda environments. In order to make use of a conda environment, you must activate it by name.

Conda doesn’t expect you to remember every environment name you create over time, so there is a built-in command to list all that are available:

```sh
$ conda env list
```

### Activate an Environment for Use

Now that you have the name of the environment that you would like to use, you can activate it using:

```sh
$ conda activate myenv
```
After activating your environment, run conda env list again and notice that the asterisk has moved to myenv, signifying that this environment is currently active.

```sh
base                         /Users/test/miniconda3
myenv                    *   /Users/test/anaconda3/envs/myenv
otherenv                     /Users/test/anaconda3/envs/otherenv
```
Once you have activated a conda environment, all installations that you run will be installed specifically to this environment. This allows you to have full control when installing and managing dependencies for each project.
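
You can also check from within Python which environment your code is running in. The sketch below is one way to do this, assuming that conda sets the CONDA_DEFAULT_ENV environment variable when an environment is activated. The helper name `describe_environment` is made up for this example.

```python
import os
import sys


def describe_environment():
    """Collect a few facts about the Python environment running this code."""
    return {
        # Name set by `conda activate`; None when no conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Root folder of the running Python installation or environment.
        "prefix": sys.prefix,
        # Full path of the Python interpreter executable.
        "executable": sys.executable,
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If myenv is active, the prefix and executable paths would be expected to point inside the myenv folder shown by conda env list.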

