# General Python Information

---

## "What is currently available in Python libraries?"

There are two parts to this question:
- What packages _exist_ that are useful?
- What is the current _state_ of these packages?

### What packages _exist_ that are useful?

Python has a _huge_ number of packages out there for data analysis and computation.

There is a lot to consider when selecting which packages to use, such as _scalability_ and _portability_.

Personally, I recommend the _Pangeo_ software stack, based on:

- [xarray](http://xarray.pydata.org/en/stable/): Labeled N-dimensional arrays with a NetCDF-like data model (can read and write NetCDF files)
- [dask/distributed](https://dask.org/): To easily parallelize `xarray` workflows
- [Jupyter](https://jupyterlab.readthedocs.io/en/stable/): A browser-based, "terminal"-like interface with nice graphics capabilities
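A minimal sketch of how the first two Pangeo pieces fit together, assuming `numpy`, `pandas`, `xarray`, and `dask` are installed. The data are synthetic random values standing in for a real file (which you would typically open with `xr.open_dataset()`). Calling `.chunk()` turns the array into a lazy, dask-backed array, and `.compute()` triggers the evaluation:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic 3-D field (time, lat, lon); the values are random placeholders,
# not real data.
times = pd.date_range("2000-01-01", periods=24, freq="MS")
lats = np.linspace(-90, 90, 19)
lons = np.linspace(0, 350, 36)
rng = np.random.default_rng(0)
temp = xr.DataArray(
    rng.standard_normal((times.size, lats.size, lons.size)),
    coords={"time": times, "lat": lats, "lon": lons},
    dims=("time", "lat", "lon"),
    name="temp",
)

# Chunking along time makes the array dask-backed: operations now build
# a task graph instead of computing immediately.
chunked = temp.chunk({"time": 6})
zonal_mean = chunked.mean(dim="lon")  # still lazy

# .compute() executes the graph (in parallel, using dask's scheduler).
result = zonal_mean.compute()
print(result.dims, result.shape)
```

The same code scales from a laptop to a cluster by connecting a `dask.distributed` client; the `xarray` calls do not change.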

Other packages provide additional useful capabilities, such as:

- [numpy](https://www.numpy.org/): For general Python N-dimensional arrays
- [scipy](https://docs.scipy.org/doc/scipy/reference/): For scientific Python functions and features built on NumPy arrays
- [scikit-learn](https://scikit-learn.org/stable/): For data mining and data analysis
- [matplotlib](https://matplotlib.org/): For general plotting capabilities
- [cartopy](https://scitools.org.uk/cartopy/docs/latest/): For plotting data on map projections
- [geoviews](http://geoviews.org/): For simple, concise geographical visualization
- [hvplot](https://hvplot.pyviz.org/): A high-level plotting API for the PyData ecosystem, built on HoloViews
- [esmlab](https://esmlab.readthedocs.io/en/latest/): NCAR package built on `xarray` to provide common statistics, climatology, and anomaly computation
- [intake/intake-esm](https://intake-esm.readthedocs.io/en/latest/): For easily "getting" CESM and CMIP data into `xarray` data structures
- [xesmf](https://xesmf.readthedocs.io/en/latest/): Xarray-based regridding package
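The climatology and anomaly calculations that `esmlab` packages up can be sketched with plain `xarray` `groupby`. This is not `esmlab`'s own API, only the underlying idea, applied to a synthetic monthly series (a seasonal cycle plus a small linear trend):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly series: seasonal cycle + small trend (illustrative only).
times = pd.date_range("2000-01-01", periods=36, freq="MS")
months = np.asarray(times.month)
seasonal = np.sin(2 * np.pi * (months - 1) / 12)
trend = 0.01 * np.arange(times.size)
ts = xr.DataArray(seasonal + trend, coords={"time": times}, dims="time", name="ts")

# Monthly climatology: the mean of all Januaries, all Februaries, and so on.
climatology = ts.groupby("time.month").mean("time")

# Anomalies: subtract the matching calendar month's climatology from each value.
anomalies = ts.groupby("time.month") - climatology

print(climatology.sizes["month"], anomalies.sizes["time"])
```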

More packages are on the horizon.  For example, the NCL team is "pivoting" to Python (codename `Skylab`).

### What is the current _state_ of these packages?

...Well, that's what this tutorial is all about.

---

## "Which statistics packages are available in Python?"

Again, there are plenty.

The most commonly used today are:

- [scipy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html): General-purpose Python statistics functions (for NumPy arrays)
- [statsmodels](https://www.statsmodels.org/stable/index.html): For estimating many different statistical models, conducting statistical tests, and exploring data statistically
- [rpy2](https://rpy2.bitbucket.io/): An interface for calling R from Python
- [esmlab](https://esmlab.readthedocs.io/en/latest/): Still in development, but `xarray`-friendly
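As a taste of `scipy.stats`, here is a short sketch on synthetic data: Welch's two-sample t-test (`ttest_ind` with `equal_var=False`) and a simple linear regression (`linregress`). The samples are random numbers generated purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two hypothetical samples whose means differ by 0.5.
sample_a = rng.normal(loc=0.0, scale=1.0, size=200)
sample_b = rng.normal(loc=0.5, scale=1.0, size=200)

# Welch's t-test: compares the means without assuming equal variances.
t_stat, p_value = stats.ttest_ind(sample_a, sample_b, equal_var=False)

# Least-squares fit of a noisy straight line y = 2x + 1.
x = np.arange(50, dtype=float)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)
fit = stats.linregress(x, y)

print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
print(f"slope = {fit.slope:.3f}, intercept = {fit.intercept:.3f}")
```

These functions work on NumPy arrays; for an `xarray` object, pass its `.values`.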

---

## "How do I install external Python libraries on my laptop/desktop?"

The best way to get a scientific Python environment is with the [Conda package management system](https://conda.io/docs/). Please follow the [official installation guide](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) for your operating system:
- Linux: https://conda.io/projects/conda/en/latest/user-guide/install/linux.html
- Mac: https://conda.io/projects/conda/en/latest/user-guide/install/macos.html
- Windows: https://conda.io/projects/conda/en/latest/user-guide/install/windows.html

Linux and Mac systems also come with a system Python (`/usr/bin/python`). Don't modify it or install packages into it. Windows users might find the full Anaconda distribution (Conda plus many preinstalled packages, with a graphical interface) easier to use than the command line.

After installation, check that `conda` and `python` point to your Conda installation:

```bash
$ which conda python
.../miniconda3/bin/conda
.../miniconda3/bin/python
```
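The same check can be done from inside Python. The sketch below prints the running interpreter and its prefix. The `conda-meta` test is a heuristic assumed here (conda environments, including the base install, contain a `conda-meta` directory), not an official conda API, and `running_in_conda` is a hypothetical helper name:

```python
import os
import sys


def running_in_conda() -> bool:
    """Heuristic (assumption): conda environments have a `conda-meta`
    directory under sys.prefix. Not an official conda API."""
    return os.path.isdir(os.path.join(sys.prefix, "conda-meta"))


# Where the running interpreter lives; for a Conda install this should be
# under .../miniconda3 (or .../anaconda3), not /usr/bin.
print("Interpreter:", sys.executable)
print("Prefix:     ", sys.prefix)
print("Conda env?  ", running_in_conda())
```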