# Numpy and Matplotlib

* Bio 724D, Spring 2024
* Paul M. Magwene


## Numpy

Numpy is the de facto standard library for numerical computing in Python.

* Provides an efficient multi-dimensional array data structure that facilitates a wide variety of numerical computing tasks.

* Provides a wide range of functions for array manipulation, along with core numerical routines used throughout mathematics, statistics, machine learning, bioinformatics, and other fields.

* Basis for many other packages
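
As a minimal sketch of what this looks like in practice (the array values below are made up for illustration), the following builds a two-dimensional array and applies a few core numerical functions to it:

```python
import numpy as np

# Hypothetical measurements: 3 samples (rows) x 4 variables (columns)
data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [2.0, 4.0, 6.0, 8.0],
                 [3.0, 6.0, 9.0, 12.0]])

col_means = data.mean(axis=0)   # mean of each column
row_sums = data.sum(axis=1)     # sum of each row
total = data.sum()              # sum over every element

print(col_means)
print(row_sums)
print(total)
```

The `axis` argument controls which dimension is collapsed: `axis=0` summarizes down the rows (one value per column), `axis=1` summarizes across the columns (one value per row).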

## What about Scipy?

* Scipy is a companion to Numpy. It provides a broad range of numerical routines that are widely used across different fields but are perhaps less fundamental than what Numpy itself includes
* Includes:
	* Integration
	* Optimization
	* Interpolation
	* Signal processing
	* Linear algebra -- Numpy has a `linalg` module, but the Scipy version adds some additional functionality
	* Spatial data 
	* Graph algorithms
	* Statistics
	* Image processing
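
To give a flavor of two of these areas, here is a small sketch using `scipy.integrate.quad` and `scipy.optimize.minimize_scalar`. The functions being integrated and minimized are arbitrary choices for illustration, not examples from the course.

```python
import numpy as np
from scipy import integrate, optimize

# Integration: numerically integrate sin(x) from 0 to pi (exact answer is 2).
# quad returns the estimate and an estimate of the absolute error.
area, abs_err = integrate.quad(np.sin, 0, np.pi)

# Optimization: minimize the quadratic (x - 2)^2, whose minimum is at x = 2.
result = optimize.minimize_scalar(lambda x: (x - 2) ** 2)

print(area)
print(result.x)
```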

## Numpy: key concepts

* Numpy arrays (type: `ndarray`) are multi-dimensional, ordered, and homogeneous

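* Arrays have a size (total number of elements), a number of dimensions, and a shape (the length along each dimension), available as the `size`, `ndim`, and `shape` attributes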

* **Typed** -- Arrays have a data type (`dtype`) that specifies the type of the elements they hold. You can let Numpy infer it, or you can specify `dtype` explicitly

* **Indexing and slicing** -- Array access is a logical extension of indexing for other Python data structures like lists, but there are some subtleties to be aware of

* **Vectorized** -- Many Numpy operators and functions are vectorized: they are applied to every element of an input array without an explicit loop

* **Broadcasting** -- Numpy operations and functions that take two or more arrays as input use "broadcasting" to combine arrays of different shapes. This works as long as the dimensions of the arrays are compatible.

  * Sort of like a stricter version of R's "recycling"

    <p></p>
 
* **Views and copies** -- Basic slicing and many other array operations return "views" into the original array rather than copies. Changing a view changes the data in the original array. Indexing with an integer or boolean array ("fancy" indexing) returns a copy instead. To be sure you get a copy, use the `.copy()` method
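
The concepts above can be sketched in a few lines. The array contents and variable names here are arbitrary and chosen only for illustration.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # integers 0..11 arranged as 3 rows x 4 columns
print(a.size, a.ndim, a.shape)    # number of elements, number of axes, length of each axis
print(a.dtype)                    # integer type inferred by Numpy (platform dependent)

b = np.array([1, 2, 3], dtype=np.float64)   # dtype specified explicitly

# Indexing and slicing: the second row, and the third column of every row
row = a[1]
col = a[:, 2]

# Vectorized: operators and functions apply element by element
squared = a ** 2
roots = np.sqrt(b)

# Views: a basic slice shares data with `a`, so writing to it changes `a`
view = a[0, :2]
view[0] = 100

# Copies: .copy() gives independent data, so `a` is unaffected by this assignment
independent = a[1].copy()
independent[0] = -1

print(a)
```

After running this, `a[0, 0]` is 100 because it was modified through the view, while `a[1, 0]` is still 4 because only the copy was changed.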

## Numpy Broadcasting Rules

From the Numpy User Guide, [Broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html)

> When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimension and works its way left. Two dimensions are compatible when
>
>    1. they are equal, or
>
>    2. one of them is 1.
>
> If these conditions are not met, a `ValueError: operands could not be broadcast together` exception is thrown, indicating that the arrays have incompatible shapes.
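
A short sketch of these rules in action follows. The shapes were chosen purely for illustration, with one compatible and one incompatible case.

```python
import numpy as np

m = np.ones((3, 4))                 # shape (3, 4)
row = np.arange(4)                  # shape (4,)
col = np.arange(3).reshape(3, 1)    # shape (3, 1)

# (3, 4) with (4,): trailing dimensions are equal, so `row` is added to every row
r1 = m + row

# (3, 4) with (3, 1): trailing dimensions are 4 and 1, leading dimensions are 3 and 3
r2 = m + col

# (3, 1) with (4,): each array is stretched along the dimension where it has length 1
outer = col * row

print(r1.shape, r2.shape, outer.shape)

# (3, 4) with (3,): trailing dimensions are 4 and 3, which differ and neither is 1
try:
    m + np.arange(3)
    failed = False
except ValueError as err:
    failed = True
    print(err)
```

All three compatible combinations produce a result of shape `(3, 4)`; the last one raises the `ValueError` described in the quote.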

## Matplotlib

Matplotlib is the de facto standard library for plotting in Python.

* Provides two "flavors" of plotting interfaces
  1. A "state-based", function-oriented interface through `matplotlib.pyplot`
  2. An object-oriented interface
 
     <p></p>

* The function-based interface tends to be "higher level", while the object-oriented interface provides greater flexibility and finer control. It is not uncommon to mix the two types of interfaces when constructing plots interactively.

* Matplotlib does not provide the elegance of a "grammar of graphics". However, it provides a powerful set of tools for creating novel graphics that would be hard or impossible to make in ggplot2. It also has much better support for 3D graphics and other less common plot types.
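
The two interfaces can be compared side by side in a minimal sketch like the one below. The `Agg` backend line is an assumption so the script runs without a display; in a notebook or interactive session you would normally omit it.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# 1. State-based interface: pyplot functions act on the "current" figure and axes
plt.figure()
plt.plot(x, np.sin(x))
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.title("pyplot interface")
fig_state = plt.gcf()   # handle to the current figure

# 2. Object-oriented interface: call methods on explicit Figure and Axes objects
fig, ax = plt.subplots()
line, = ax.plot(x, np.cos(x))
ax.set_xlabel("x")
ax.set_ylabel("cos(x)")
ax.set_title("object-oriented interface")
```

Note that `plt.subplots()` is itself a pyplot function, so even "object-oriented" code typically starts with one pyplot call to create the figure and axes.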

## Basic plot types in matplotlib.pyplot

* `plot()` 
* `scatter()` 
* `hist()`, `hist2d()`
* `boxplot()`
* `violinplot()`
* `contour()`, `contourf()`
* `bar()`
* `pie()`
* `imshow()`
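
As a rough sketch, a few of these plot types can be drawn on a grid of axes as follows. The data are randomly generated for illustration, and the example uses the `Axes` methods that share names with the pyplot functions listed above. The `Agg` backend line is an assumption so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(724)   # seeded so the made-up data are reproducible
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=0.5, size=200)

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].scatter(x, y, s=10)
axes[0, 0].set_title("scatter()")

counts, bin_edges, _ = axes[0, 1].hist(x, bins=20)
axes[0, 1].set_title("hist()")

axes[1, 0].boxplot([x, y])
axes[1, 0].set_title("boxplot()")

axes[1, 1].bar(["a", "b", "c"], [3, 7, 5])
axes[1, 1].set_title("bar()")

fig.tight_layout()
```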