# 9. Importing Modules

Writing your own functions is incredibly useful, but someone else may already have written the function you need (and their version might well be faster).

Often, these functions come packaged as part of a wider Python library.

A simple example of this is calculating the Euclidean distance between two points in space.

Previously, you learned to write exponents, so you can write this distance as a function yourself.

In [None]:
# Exercise 10.1: Write a function that calculates the Euclidean distance between two points

However, certain Python libraries already do this, allowing you to perform the same
operation without having to write all of the lines above.

One of the modules available for these math operations is NumPy!

To use NumPy in your code, all you need to do is import it:

In [None]:
import numpy

You can now use NumPy's functions as `numpy.functionname()`. It is often useful to give a module an alias, especially if its name is long. You do this by adding `as` and the new name:

In [None]:
import numpy as np

Now you can call the functions as `np.functionname()`. You can choose whatever alias you want, but be careful not to pick a name that already has another use within Python.
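As a small illustration of the alias in use, the sketch below calls two NumPy functions through `np`. The functions chosen here (`np.sqrt` and `np.array`) are only examples picked for this sketch.

```python
import numpy as np

# Call NumPy functions through the alias instead of the full module name
root = np.sqrt(16)
values = np.array([1, 2, 3])

print(root)
print(values)
```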

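NumPy can calculate the Euclidean distance for you: `numpy.linalg.norm()` returns the length of a vector, and the length of the difference between two points is the Euclidean distance between them.

You can either import all of NumPy, or just the function you need. The latter saves some typing, but it does not make your code run faster, because Python still loads the whole package behind the scenes. For the purposes of this tutorial, import NumPy as above.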

To calculate the Euclidean distance, use:

In [None]:
print(numpy.linalg.norm(numpy.array([1,2,3]) - numpy.array([1,2,3])))
print(numpy.linalg.norm(numpy.array([1,2,3]) - numpy.array([3,2,1])))
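For comparison, importing only the functions you need might look like the sketch below. The variable names `point_a`, `point_b` and `distance` are made up for illustration.

```python
from numpy import array
from numpy.linalg import norm

# Two hypothetical points in 3D space
point_a = array([1, 2, 3])
point_b = array([3, 2, 1])

# The norm of the difference vector is the Euclidean distance
distance = norm(point_a - point_b)
print(distance)
```

The result is the square root of 8, because the two points differ by 2 in the first and last coordinates.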

###### So what is NumPy exactly?

NumPy stands for "Numerical Python" and, along with some other libraries (SciPy, Pandas, etc.), is one of the core libraries used for scientific computing. It contains a large number of tools and functions that can be used to solve an array of common and not-so-common problems, some examples of which you'll see below. Most importantly, however, NumPy contains the all-powerful NumPy arrays!

###### What is a NumPy array?

A NumPy array is a bit like an improved version of a Python list, mentioned above. It "is a high-performance multidimensional array object that is a powerful data structure for efficient computation of arrays and matrices." NumPy also offers a huge number of high-level mathematical functions that operate on these arrays and matrices.

In other words, an array is a good structure for storing 1D, 2D, or 3D data, such as vectors and matrices.

This means that arrays can have rows and columns. In a 2D array, the rows are indexed along "axis 0" and the columns along "axis 1". The number of axes goes up with the number of dimensions of the array, so a 3D array also has an "axis 2". These axes are useful when it comes to manipulating the data in your arrays.
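To make the idea of axes a little more concrete, here is a minimal sketch that sums a small array along each axis. The array `example` and its values are invented for this illustration.

```python
import numpy as np

# A hypothetical 2D array with 2 rows and 3 columns
example = np.array([[1, 2, 3],
                    [4, 5, 6]])

# Summing along axis 0 collapses the rows, giving one total per column
column_totals = example.sum(axis=0)

# Summing along axis 1 collapses the columns, giving one total per row
row_totals = example.sum(axis=1)

print(column_totals)
print(row_totals)
```

The column totals are 5, 7 and 9, while the row totals are 6 and 15.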

We will see some hands-on examples of arrays below.

In [None]:
# An example of a 2D array
my_2d_array = numpy.ones((5, 3))

print(my_2d_array)

# Print out the shape of `my_2d_array`
print(my_2d_array.shape)

# Print out the data type of `my_2d_array`
print(my_2d_array.dtype)



In [None]:
# This is an example of a 3D array
my_3d_array = numpy.ones((2, 5, 3))

print(my_3d_array)

# Print out the shape of `my_3d_array`
print(my_3d_array.shape)

# Print out the data type of `my_3d_array`
print(my_3d_array.dtype)



You can also add, subtract, multiply or divide your arrays. Although the usual arithmetic operators (`+`, `-`, `*`, `/`) will work, NumPy also provides ready-made functions for you, such as `numpy.add()` and `numpy.multiply()`.

In [None]:
my_new_array = my_2d_array + 1

print(my_new_array)
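The ready-made functions can be used in place of the operators. The sketch below assumes a fresh array called `ones` (a name chosen for this example) and shows that the operator form and the function form give the same result.

```python
import numpy as np

ones = np.ones((5, 3))

# Operator form and function form give the same result
with_operator = ones + 1
with_function = np.add(ones, 1)

print(np.array_equal(with_operator, with_function))

# The other basic operations have function forms too
print(np.subtract(ones, 1))
print(np.multiply(ones, 2))
print(np.divide(ones, 2))
```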

In [None]:
my_new_array = my_2d_array + my_3d_array

print(my_new_array)

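To perform arithmetic (and other) operations on two or more arrays, their shapes need to be compatible. NumPy compares the shapes dimension by dimension, starting from the last one.

Firstly, two dimensions are compatible when they are equal.

Secondly, two dimensions are also compatible when one of them is 1.

Thirdly, the arrays need to be compatible in all dimensions. If one array has fewer dimensions than the other, its missing leading dimensions are treated as 1, which is why the (5, 3) and (2, 5, 3) arrays above can be added.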
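These rules can be tried out directly. The following is a minimal sketch, with shapes chosen only for illustration, showing two compatible combinations and one incompatible one.

```python
import numpy as np

a = np.ones((5, 3))

# Last dimensions are equal (3 and 3): compatible
same_last = a + np.ones(3)

# Last dimensions are 3 and 1, and the ones before them are 5 and 5: compatible
with_one = a + np.ones((5, 1))

print(same_last.shape)
print(with_one.shape)

# Last dimensions are 3 and 5, and neither is 1: not compatible
try:
    a + np.ones(5)
    failed = False
except ValueError as error:
    failed = True
    print("Incompatible shapes:", error)
```

Both compatible sums have the shape (5, 3), while the last operation raises a `ValueError`.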

We could spend hours talking about arrays and all the functionality of NumPy, so this is where we will leave this part. If you are interested in learning more about arrays, this is a very good tutorial from DataCamp:

https://www.datacamp.com/community/tutorials/python-numpy-tutorial


There is also a multitude of Python libraries just waiting for you to discover them.

Here are some examples:

Math (https://docs.python.org/3.6/library/math.html) is a very useful library for mathematical
operations. It includes math.ceil(), math.floor(), math.fabs()(!), and math.factorial(),
among others.
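As a quick illustration, the four functions named above can be called as follows. The input values and variable names are arbitrary choices for this sketch.

```python
import math

rounded_up = math.ceil(2.3)       # smallest integer >= 2.3
rounded_down = math.floor(2.7)    # largest integer <= 2.7
absolute = math.fabs(-3)          # absolute value, returned as a float
factorial = math.factorial(5)     # 5 * 4 * 3 * 2 * 1

print(rounded_up, rounded_down, absolute, factorial)
```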

NumPy (http://www.numpy.org/) and SciPy (https://www.scipy.org/) are excellent libraries
that do a lot more than simple mathematical operations (like the Euclidean distance calculator above).
These also interface very well with plotting libraries.

Matplotlib (https://matplotlib.org/) is the most commonly used plotting library available. It is simple,
and can do a lot. Other plotting libraries include ggplot (http://ggplot.yhathq.com/), Seaborn (https://seaborn.pydata.org/), and Bokeh (https://bokeh.pydata.org/en/latest/).

Pandas (https://pandas.pydata.org/) is a great library to manipulate data structures.
Its ease of use makes it ideal to work with large data sets.

MDAnalysis (https://www.mdanalysis.org/), MDTraj (http://mdtraj.org/1.9.0/), and
PyTraj (https://github.com/Amber-MD/pytraj) are some of the libraries used to process
Molecular Dynamics trajectories and other files. These interface very well with
NumPy, Pandas, and Matplotlib.

IPython (https://ipython.org/) is a useful shell for interactive Python.

Jupyter (https://jupyter.org/) is what we are using right now! It is very useful for tutorials!

More advanced libraries:

scikit-learn (http://scikit-learn.org/stable/) for machine learning in Python.

Sys (https://docs.python.org/2/library/sys.html) and argparse (https://docs.python.org/2/howto/argparse.html) are used to read inputs from, and write outputs to, the terminal.

Requests (http://docs.python-requests.org/en/latest/) is used to process information from the web.

MPI4Py (http://mpi4py.readthedocs.io/en/stable/) allows Python scripts to be parallelised 
(run over multiple processors).

Cython (http://cython.org/) allows you to write parts of your code in C/C++, making it very fast.