# Programming for Data Analysis Assignment 2019

## Problem Statement
The assignment concerns the `numpy.random` package in Python.
Include a Jupyter notebook (this one) to explain the use of the package including detailed explanations of at least five of the distributions provided for in the package.
There are four distinct tasks to be carried out in the Jupyter Notebook.
1. Explain the overall purpose of the package.
2. Explain the use of the “Simple random data” and “Permutations” functions. 
3. Explain the use and purpose of at least five “Distributions” functions.
4. Explain the use of seeds in generating pseudorandom numbers.

## Documents required for the submission
A git repository containing:
- a README
- a git ignore file
- a Jupyter Notebook

## Requirements
A good submission will be clearly organised and contain concise explanations of the particularities of the dataset. The analysis contained within the notebook will be well conceived, interesting, and well researched. Note that part of this assignment is about the use of Jupyter notebooks and so you should make use of all the functionality available in the software including images, links, code and plots. You may use any Python libraries that you wish, whether they have been discussed in class or not.



### instructions as per the video - delete later. 
This is just for me to refer to.
The documentation for the NumPy package at [numpy.random docs (1.16.1 reference/routines)](https://docs.scipy.org/doc/numpy-1.16.1/reference/routines.random.html) is broken down into four sections.

- [simple random data](https://docs.scipy.org/doc/numpy-1.16.1/reference/routines.random.html#simple-random-data)  
- [permutations](https://docs.scipy.org/doc/numpy-1.16.1/reference/routines.random.html#permutations)  
- [distributions](https://docs.scipy.org/doc/numpy-1.16.1/reference/routines.random.html#distributions)    
- [random generator](https://docs.scipy.org/doc/numpy-1.16.1/reference/routines.random.html#random-generator)  

My understanding of this assignment from the video is that we are to look at the numpy.random package and, instead of just going through every function in the package (as this is already well documented), describe it in our own words, in less robotic terms. There is no need to go through every single function. Instead, take a handful of the functions of numpy.random and describe them using things like plots, descriptive statistics or anything else we fancy, rather than just noting the parameters, arguments and outputs of each function in the documentation. Give a more real, descriptive flavour of what the package is about.




# First a quick overview of the NumPy package

Here I will start by making some notes from the lectures on the NumPy package, the package documentation and the Python for Data Analysis book chapter 4 by Wes McKinney. 

The [numpy quickstart tutorial](https://numpy.org/devdocs/user/quickstart.html) provides a good overview of the NumPy package.
I won't go into too much detail here as this assignment is just about the `numpy.random` package, but here are a few points to refer to.

[what is numpy](https://docs.scipy.org/doc/numpy/user/whatisnumpy.html)

`NumPy` is a Python package that is very quick at dealing with numbers in very long lists or even lists of lists. Performing matrix-style operations on raw Python lists with loops is quite inefficient compared to NumPy's capabilities. NumPy is a specialist package that deals with multi-dimensional arrays, which resemble lists within lists.

NumPy arrays are rectangular in shape. In a 2-dimensional array, for example, every row has the same number of elements and every column has the same number of elements. The number of rows can differ from the number of columns.
Arrays with more than two dimensions can also be created, although two-dimensional arrays are probably the most commonly used.
Operations can be performed on NumPy arrays very quickly. NumPy has many algorithms for dealing with numerical operations on arrays.
NumPy can be used to simulate data. While csv data is commonly imported into NumPy for analysis (or more likely through the Python pandas package, which uses NumPy in the background), it is often handy to be able to simulate data to analyse before the real data has been collected or is available.

NumPy is typically used through other packages. It is a specialist package for dealing with arrays of numbers. While NumPy provides a computational foundation for working with numbers, it does not have the modelling or scientific functionality of other computational packages in Python, but these packages use NumPy's array objects.
 
NumPy works well with arrays of numbers and performs operations on matrices far more efficiently than raw Python. Matrix operations are very commonly used in computer applications, including analysing data, resizing or compressing images, or converting CDs into mpgs.


- NumPy is short for Numerical Python. 
- NumPy is one of the most important foundational packages for numerical computing in Python. 
- NumPy's main data structure is the `ndarray`, a homogeneous multidimensional array. An *ndarray* is also known by the alias *array*.
- A NumPy array is like a table of elements that are all of the same type. The elements, which are usually numbers, are indexed by a tuple of non-negative integers. Axes refer to the dimensions of a NumPy array.

- NumPy has many mathematical functions that avoid the need to use loops as they work on entire arrays of data.
- NumPy is designed for efficiency on large arrays of data, which is partly due to the way NumPy stores data in a contiguous block of memory. NumPy's algorithms can operate on this memory without any type checking. NumPy arrays use much less memory than Python's own built-in sequences.
- NumPy arrays use *vectorisation*. This is where batch operations can be performed on arrays without using loops.
- Arithmetic operations are carried out *elementwise* on an array and result in a new array being created.
- Some operations, such as `+=` and `-=`, modify an array in place and do not create a new one.
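
The points above about vectorisation, elementwise arithmetic and in-place operators can be sketched with a tiny example. The array values are made up purely for illustration.

```python
import numpy as np

# A small array of example values (made up for illustration).
a = np.array([1.0, 2.0, 3.0, 4.0])

# Elementwise arithmetic creates a new array; `a` itself is unchanged.
b = a * 2 + 1            # values 3, 5, 7, 9

# The equivalent calculation with an explicit loop over a plain Python list.
b_loop = [x * 2 + 1 for x in a.tolist()]

# In-place operators modify the existing array instead of creating a new one.
c = a.copy()
c_id_before = id(c)
c += 10                  # values 11, 12, 13, 14, same object as before

print(b)
print(b_loop)
print(c, id(c) == c_id_before)
```

The vectorised line and the loop give the same numbers; the difference is that the vectorised version needs no explicit loop and runs in NumPy's compiled code.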
 
## NumPy's ndarray object
The *ndarray* is an N-dimensional or multi-dimensional array object. It is a fast and flexible container for large datasets in Python. You can perform mathematical operations on entire arrays using similar syntax to the equivalent operations on scalar values.
- The data in an *ndarray* must be homogeneous, meaning that all its data elements must be of the same type.
- Arrays have an *ndim* attribute for the number of axes or dimensions of the array.
- Arrays have a *shape* attribute which is a tuple that indicates the size of the array in each dimension. The length of the *shape* tuple is the number of axes that the array has.
- Arrays have a *dtype* attribute which is an object that describes the data type of the array.
- Arrays have a *size* attribute which is the total number of elements in the array.
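
As a quick check of the attributes listed above, here is a small sketch using a made-up 2 by 3 array of integers.

```python
import numpy as np

# A made-up array with 2 rows and 3 columns.
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print(arr.ndim)    # number of axes: 2
print(arr.shape)   # size along each axis: (2, 3)
print(arr.size)    # total number of elements: 6
print(arr.dtype)   # an integer type; the exact width depends on the platform

# The length of the shape tuple equals the number of axes.
print(len(arr.shape) == arr.ndim)
```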


An ndarray can be created using the NumPy `array` function, which takes any sequence-like object such as a list or nested list. Other NumPy functions that create new arrays include `zeros`, `ones`, `empty`, `arange`, `full` and `eye`, among others. All these functions create an array object.
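
A short sketch of the creation functions mentioned above. The shapes and fill values are arbitrary choices for illustration.

```python
import numpy as np

from_list = np.array([[1, 2], [3, 4]])   # from a nested list
zeros = np.zeros((2, 3))                 # 2x3 array of 0.0
ones = np.ones(4)                        # 1-D array of four 1.0 values
sevens = np.full((2, 2), 7)              # 2x2 array filled with 7
identity = np.eye(3)                     # 3x3 identity matrix
steps = np.arange(0, 10, 2)              # 0, 2, 4, 6, 8
uninitialised = np.empty((2, 2))         # contents are arbitrary, not zeroed

for name, value in [("from_list", from_list), ("zeros", zeros),
                    ("ones", ones), ("sevens", sevens),
                    ("identity", identity), ("steps", steps)]:
    print(name, type(value).__name__, value.shape)
```

Note that `empty` only allocates memory, so its contents should not be relied on until they have been assigned.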



## numpy.random

- `numpy.random` is a sub-package of the numpy package. 

- `numpy.random` generates random numbers. Strictly speaking these are pseudorandom numbers, as a computer following a deterministic algorithm cannot generate truly random numbers. There are many different ways to generate pseudorandom numbers.
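
One way to see that the numbers are pseudorandom is to set the seed and observe that the same sequence comes back. This is a minimal sketch using the legacy `numpy.random.seed` function; the seed value 42 is an arbitrary choice.

```python
import numpy as np

np.random.seed(42)           # arbitrary seed value
first = np.random.rand(3)

np.random.seed(42)           # reset to the same seed
second = np.random.rand(3)   # identical to `first`

third = np.random.rand(3)    # continues the sequence, so it differs

print(first)
print(second)
print(third)
```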

There are many functions in the numpy.random package for generating random numbers from different probability distributions, so it is not really possible to know all of them without referring to the documentation.
Plots can illustrate the differences between probability distributions.
See also random generation, `RandomState`, `set_state` and `get_state`.
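
Besides plots, descriptive statistics and bin counts also show how distributions differ. The sketch below compares samples from `uniform` and `normal`; the sample size, parameters and seed are arbitrary assumptions chosen for illustration.

```python
import numpy as np

np.random.seed(0)    # arbitrary seed so the run is repeatable
n = 10000            # arbitrary sample size

uniform = np.random.uniform(0.0, 1.0, n)   # flat between 0 and 1
normal = np.random.normal(0.0, 1.0, n)     # bell-shaped around 0

for name, sample in [("uniform", uniform), ("normal", normal)]:
    print(name, "mean", sample.mean(), "std", sample.std(),
          "min", sample.min(), "max", sample.max())

# Bin counts are the numbers behind a histogram plot.
uniform_counts, _ = np.histogram(uniform, bins=5, range=(0.0, 1.0))
normal_counts, _ = np.histogram(normal, bins=5, range=(-3.0, 3.0))
print(uniform_counts)   # roughly equal counts in each bin
print(normal_counts)    # counts peak in the middle bin
```

The same arrays could be passed to a plotting library's histogram function to draw the shapes.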

### Random sampling / numpy.random
https://docs.scipy.org/doc/numpy/reference/random/index.html?highlight=numpy.random#module-numpy.random

The assumptions made about the type of random numbers you want depend on the distribution.
- Permutations
- Shuffle

`shuffle` changes an array in place, while `permutation` does not: it returns a shuffled copy and leaves the original unchanged.
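
The difference between the two functions can be checked with a small sketch. The array and the seed are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(1)                # arbitrary seed
original = np.arange(10)

# permutation returns a shuffled copy; `original` is left untouched.
permuted = np.random.permutation(original)

# shuffle reorders the array it is given and returns None.
to_shuffle = original.copy()
result = np.random.shuffle(to_shuffle)

print(original)     # still 0 to 9 in order
print(permuted)     # same values, new order
print(to_shuffle)   # reordered in place
print(result)       # None
```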




###  The numpy.random package documentation on docs.scipy.org

I noticed that the documentation for the latest version, 1.17, does not seem to list the `numpy.random.rand()` function in the same way as the documentation for NumPy version 1.16 (or versions 1.15 and 1.14).

[numpy-1.15.1/reference.routines.random](https://docs.scipy.org/doc/numpy-1.15.1/reference/routines.random.html) has `numpy.random.rand(d0,d1,...,dn)`.
The [numpy/reference/random](https://docs.scipy.org/doc/numpy/reference/random) page has a "Random sampling (numpy.random)" section with a [quick start guide](https://docs.scipy.org/doc/numpy/reference/random/index.html#quick-start). It outlines some changes since the videos were made and refers to the **Generator** as a replacement for `RandomState`.

I'll have to go through the documentation and see what the changes are.

- https://docs.scipy.org/doc/numpy/reference/random/generated/numpy.random.Generator.random.html#numpy.random.Generator.random

Note that the documents under this URL refer to `numpy.random.Generator`, but I think this is just the newer form.
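
To get a feel for the newer interface, here is a minimal sketch that puts the two styles side by side. It assumes NumPy 1.17 or later, where `numpy.random.default_rng` is available; the seed value 2019 is arbitrary.

```python
import numpy as np

# Newer style: create a Generator and call its methods.
rng = np.random.default_rng(seed=2019)   # arbitrary seed
new_style = rng.random((2, 3))           # floats in [0, 1), shape as a tuple
ints = rng.integers(0, 10, size=5)       # integers from 0 up to, not including, 10

# Legacy style: module-level functions backed by a global RandomState.
legacy = np.random.rand(2, 3)            # floats in [0, 1), dimensions as arguments

print(type(rng).__name__)
print(new_style)
print(ints)
print(legacy)
```

Both styles draw uniform floats between 0 and 1, but the streams of numbers they produce are not the same even for the same seed.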


### numpy.random.rand()
Pass the dimensions into the function. If you don't pass any dimensions you get a scalar.
See [numpy.random.rand](https://docs.scipy.org/doc/numpy/reference/random/generated/numpy.random.Generator.random.html#numpy.random.Generator.random).

You get an array in which every row has the same number of elements and every column has the same number of elements. The number of rows can differ from the number of columns.
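
A small sketch of how the arguments to `numpy.random.rand` control what comes back. The dimensions used are arbitrary.

```python
import numpy as np

scalar = np.random.rand()       # no dimensions: a single float
one_d = np.random.rand(4)       # one dimension: 4 values
two_d = np.random.rand(2, 3)    # two dimensions: 2 rows, 3 columns

print(type(scalar).__name__, scalar)
print(one_d.shape)   # (4,)
print(two_d.shape)   # (2, 3)

# Every value lies in the half-open interval [0, 1).
print(two_d.min() >= 0.0, two_d.max() < 1.0)
```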

### References
- Python for Data Analysis - chapter 4 NumPy Basics: Arrays and Vectorised Computation by Wes McKinney
