# Installing Anaconda

<img src="https://www.anaconda.com/wp-content/uploads/2018/06/cropped-Anaconda_horizontal_RGB-1-600x102.png" alt="drawing" width="150" align="middle"/>


This section walks through how to install Anaconda and Python. Anaconda is a free and open-source distribution of Python that focuses on data science, machine learning, and data analysis.

Anaconda uses the _conda_ package management system. It can be installed on Windows, Linux, and macOS.

## Download Anaconda
* [Download Anaconda](https://www.anaconda.com/distribution/)


## Installation Documentation
* [Anaconda Installation Guide](https://docs.anaconda.com/anaconda/install/)

## Running Jupyter Notebook

<img src="https://jupyter.readthedocs.io/en/latest/_static/_images/jupyter.svg" alt="jupyter" width="150" align="middle"/>

* [Jupyter Notebook documentation](https://jupyter.readthedocs.io/en/latest/running.html)


## numpy Documentation 

* [numpy Official Documentation](https://docs.scipy.org/doc/numpy/)

I'll refer to the relevant sections of the numpy documentation throughout this tutorial as we work through these examples. 

### Running Jupyter Notebook Cells

Jupyter Notebook is a great way to code iteratively. You can run a cell by clicking the "Run" button in the toolbar, or you can use the keyboard shortcut __"Shift + Enter"__ (__"Shift + Return"__ if you are on a Mac).

## Programming in Python & numpy

[Python](https://www.python.org/) is a high-level, general-purpose programming language. It comes with several great built-in libraries and built-in functions that make it easy to get started programming quickly. The syntax is also very human-readable.

When working with matrices, a scientific library like [numpy](https://numpy.org/) can be helpful. Below we will start writing code for our randomly generated matrix problem.

Line 7 of the cell below is an example of an `import` statement. An `import` statement allows you to bring existing code into your program. This existing code is called a library. There is a bit more to importing code, but that is outside the scope of this project. For now, I will leave this [link](https://www.codementor.io/@sheena/python-path-virtualenv-import-for-beginners-du107r3o1) if you want to learn more about it on your own.

In [2]:
# I'll be using a lot of comments in this notebook.
# In this first cell we are going to import the libraries to use for
# creating our matrix. 

# numpy is imported as 'np' as a convention. 

import numpy as np 

Python lists are one of the most basic data structures in Python. Here is a quick rundown of some [list basics](https://www.tutorialspoint.com/python/python_lists.htm).

Link to the [Python Official Documentation on Lists](https://docs.python.org/3/tutorial/datastructures.html)
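
Before converting a list into an array, it may help to see a few basic list operations. The sketch below is only illustrative: the name `scores` and its values are made up for this example and are not part of the matrix problem.

```python
# Hypothetical example list; `scores` is not used elsewhere in this tutorial.
scores = [10, 20, 30]

scores.append(40)        # add an element to the end of the list
first = scores[0]        # indexing starts at 0
last_two = scores[-2:]   # slicing returns a new list

print(scores)    # [10, 20, 30, 40]
print(first)     # 10
print(last_two)  # [30, 40]
```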

In [3]:
# There are a couple of data structures that we will rely on. As you may
# know, the basic numpy data structure is the array (vector).
# To create an array we can leverage an existing Python data structure: the list.
# Here we are going to use a list (denoted by square brackets) to create an array.

# Creating list mylist
mylist = [1,2,3,4]

# Transforming mylist into an array:
myarray = np.array(mylist)

# Reviewing the output of myarray:
print(myarray)

# Still looks like a list but it is really a numpy array. Let's 
# take a closer look.

[1 2 3 4]


`type` is a Python built-in (pre-packaged) function. More information about `type` is available [here](https://docs.python.org/3/library/functions.html#type)

In [4]:
# Reviewing the data type of myarray:
print(type(myarray))

# Notice that the array is called an ndarray. This is short for 
# n-dimensional array. We can quickly find the shape (the size along
# each dimension, e.g. rows and columns) with the array's shape attribute.

<class 'numpy.ndarray'>


### Documentation for Shape

* https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.shape.html

In [5]:
# A shape of (4,) means we have a one-dimensional array with four
# entries. We will use this later when verifying the shape of our
# randomly generated matrix.
myarray.shape

(4,)
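
For comparison, here is a minimal sketch of how `shape` looks for a two-dimensional array. The names `nested` and `matrix` are assumptions made for this illustration only.

```python
import numpy as np

# A nested list (a list of rows) becomes a two-dimensional array.
nested = [[1, 2, 3], [4, 5, 6]]
matrix = np.array(nested)

print(matrix.shape)  # (2, 3): 2 rows, 3 columns
print(matrix.ndim)   # 2
```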

# Generating a Random Number

[Relevant Doc: Generating Randomness](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.random.html)

numpy provides a `random` module that generates real or integer random numbers for us. Let's take a look:

In [6]:
# How to generate a single random real number. By default it will
# generate a float in the range [0.0, 1.0): 0 is possible, 1 is not.

np.random.random()

0.5808285831200546

In [7]:
# How to generate a single integer number. Here, the input determines
# what kind of data is returned to us. 

# If we enter a single number it will randomly select an integer from 0
# to one less than the provided number. For example, if we enter 3 into
# our randint function numpy will generate a number between 0 and 2.
# This means it excludes the number we provided. Everything up to but
# not including the input can be randomly generated.

np.random.randint(3)

2
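
Because these values are random, your output will differ from the one shown above. If you want repeatable results, one option is to seed the generator first. The sketch below assumes the legacy `np.random.seed` function, which matches the `np.random.randint` style used in this notebook; the seed value 42 is arbitrary.

```python
import numpy as np

np.random.seed(42)  # arbitrary seed, chosen only for illustration
first_draw = np.random.randint(3, size=10)

np.random.seed(42)  # resetting the seed replays the same sequence
second_draw = np.random.randint(3, size=10)

print(first_draw)
# The two draws are identical because the seed was reset.
print(np.array_equal(first_draw, second_draw))  # True
# Every value stays within 0..2, since the upper bound 3 is excluded.
print(bool(first_draw.min() >= 0 and first_draw.max() <= 2))  # True
```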

# Generating a Random Matrix

[Relevant doc: Random Integers](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.randint.html)

This is helpful in explaining the parameters for the randint method.

In [8]:
# Creating a matrix of shape 1000 x 1000 with randomly generated 
# integers from 1 to 9 (included).

data = np.random.randint(low=1, high=10, size=(1000,1000))

# This randint method is nice because it does the work of creating
# the randomly generated matrix with lower and upper thresholds for
# the element values. Plus, it defines the size of the matrix for you.


In [9]:
# Preview of the data. It is too big to display all of it, so Jupyter
# shows a preview of the first few rows and columns and then skips
# to the tail (or end) and shows the last few rows and columns of the
# matrix.
data

array([[7, 4, 9, ..., 1, 6, 5],
       [2, 1, 9, ..., 7, 6, 2],
       [2, 1, 8, ..., 8, 8, 9],
       ...,
       [3, 2, 3, ..., 9, 4, 7],
       [4, 4, 8, ..., 6, 4, 9],
       [3, 7, 8, ..., 5, 9, 8]])

In [10]:
# The shape attribute is a tuple of the (rows, columns) for
# the data array.

data.shape

(1000, 1000)

In [11]:
# Checking the maximum value in the matrix. We would expect 9 since
# we defined our high parameter as 10. Recall that it is up to but not
# including the high number. 

data.max()

9

In [12]:
# Verifying that the lowest integer in our matrix matches the 
# parameters above. 

data.min()

1
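
Checking `max()` and `min()` confirms the bounds. If you also want to see which values actually occur, `np.unique` is one way to do it. This is a sketch on a freshly generated matrix; the names `sample`, `values`, and `counts` are assumptions for this example.

```python
import numpy as np

sample = np.random.randint(low=1, high=10, size=(1000, 1000))

# return_counts=True also reports how often each distinct value appears.
values, counts = np.unique(sample, return_counts=True)

print(values)        # the distinct integers found in the matrix
print(counts.sum())  # 1000000, one count per element
```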

Slicing and indexing provide a quick notation for splitting arrays and matrices into smaller parts. Please refer to the documentation for more [information](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html).

In [13]:
# This shows the first two rows of the data matrix.
data[:2]

array([[7, 4, 9, ..., 1, 6, 5],
       [2, 1, 9, ..., 7, 6, 2]])
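
Slicing is easier to follow on a matrix small enough to read in full. The sketch below uses a hypothetical 3 x 4 array named `small`, built with `np.arange` and `reshape`, to show row, column, and element selection.

```python
import numpy as np

small = np.arange(1, 13).reshape(3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]

print(small[:2])        # first two rows
print(small[:, 0])      # first column: [1 5 9]
print(small[1, 2])      # row 1, column 2 (zero-based): 7
print(small[:2, -2:])   # last two columns of the first two rows
```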

# Finding Inverse of a Matrix

Here, we will use the linear algebra module (`numpy.linalg`) in the numpy library.

Relevant doc: https://docs.scipy.org/doc/numpy/reference/routines.linalg.html

In [31]:
# Let's create a matrix called A using the randint method.

A = np.random.randint(low=1, high=10, size=(1000,1000))

In [32]:
# Creates the inverse of matrix A.
inv_A = np.linalg.inv(A)
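
Not every square matrix has an inverse. A large random integer matrix is very unlikely to be singular, but if it were, `np.linalg.inv` would raise `np.linalg.LinAlgError`. The sketch below shows that behavior on a small, deliberately singular matrix; the names `singular` and `inverted` are made up for this example.

```python
import numpy as np

# The second row is twice the first, so this matrix has no inverse.
singular = np.array([[1, 2], [2, 4]])

try:
    np.linalg.inv(singular)
    inverted = True
except np.linalg.LinAlgError as err:
    inverted = False
    print("Could not invert:", err)
```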

In [33]:
# Let's preview inv_A, the inverse of A.
inv_A

array([[ 0.16534819,  0.02048358,  0.20419251, ...,  0.08223049,
         0.19077942, -0.11074238],
       [ 0.02987578, -0.00484866,  0.07188619, ...,  0.0249551 ,
         0.03525042, -0.0228002 ],
       [-0.12338765, -0.01607628, -0.14600891, ..., -0.05990657,
        -0.1448447 ,  0.08123378],
       ...,
       [-0.10186225, -0.00114145, -0.08034437, ..., -0.0203317 ,
        -0.05344795,  0.02203437],
       [-0.03053698, -0.00073196, -0.02667256, ..., -0.01260905,
        -0.03475255,  0.02674155],
       [-0.18604652, -0.0014895 , -0.15529609, ..., -0.042388  ,
        -0.1079657 ,  0.07397349]])

In [17]:
# Okay, but how do we know that this actually generated the inverse
# of matrix A? We know that the matrix product of a matrix and its
# inverse is the identity matrix. For 2-D arrays, np.dot computes
# that matrix product.

np.dot(inv_A, A)

array([[ 1.00000000e+00,  5.05706588e-14,  2.99760217e-14, ...,
         3.69704267e-14,  8.49320614e-15, -2.80886425e-14],
       [ 2.04281037e-14,  1.00000000e+00, -5.20694599e-14, ...,
        -5.26245714e-14,  9.17044218e-14, -1.21902488e-13],
       [-6.89170943e-14, -1.42733048e-14,  1.00000000e+00, ...,
        -1.02633180e-13, -5.60107516e-14, -1.57672486e-13],
       ...,
       [ 1.84643967e-14,  1.30451205e-14,  9.74220704e-15, ...,
         1.00000000e+00,  1.64313008e-14,  1.19904087e-14],
       [ 2.92266211e-14,  3.14054338e-14,  1.69031455e-14, ...,
         2.75890422e-14,  1.00000000e+00,  1.73194792e-14],
       [-3.39867023e-14, -3.59157148e-14, -1.92068583e-14, ...,
        -3.77198273e-14, -1.89570581e-14,  1.00000000e+00]])

In [29]:
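# The off-diagonal values look a bit odd because floating-point
# arithmetic leaves tiny rounding errors (around 1e-14) instead of
# exact zeros. Let's round the values, then use abs to turn any -0.
# into 0., to clean this up.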

identity_matrix = np.dot(inv_A, A)

# Absolute Value of rounded dot_product
identity_matrix = np.abs(identity_matrix.round())

In [30]:
identity_matrix

array([[1., 0., 0., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       [0., 0., 1., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 0., 0., 1.]])
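
Rounding makes the result easy to read, but a visual check only covers the corners of the preview. One way to test every element at once is `np.allclose`, which compares two arrays within a small tolerance. The sketch below uses a small fixed matrix (the names `B`, `inv_B`, and `is_identity` are assumptions for this example) so the result is reproducible; the same call should work for the larger `A` above.

```python
import numpy as np

# Small fixed matrix with determinant 10, so it is invertible.
B = np.array([[4, 7], [2, 6]])
inv_B = np.linalg.inv(B)

# Compare the product against a true identity matrix of the same size.
product = np.dot(inv_B, B)
is_identity = np.allclose(product, np.eye(2))

print(inv_B)
print(is_identity)  # True
```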

In [9]:
# Ok, what about a matrix of size 10,000 x 10,000? We will reuse
# the same code from above but change the size. Recall, the size
# defines the (rows, columns) shape of your matrix. 

# For this example, I am being explicit with the parameters.
new_matrix = np.random.randint(low=1, high=10, size=(10000,10000))


In [10]:
new_matrix

array([[1, 3, 7, ..., 9, 8, 2],
       [1, 8, 5, ..., 3, 8, 8],
       [4, 7, 1, ..., 6, 4, 7],
       ...,
       [1, 2, 1, ..., 4, 3, 3],
       [9, 2, 3, ..., 8, 8, 6],
       [8, 4, 7, ..., 2, 6, 8]])

In [11]:
new_matrix.shape

(10000, 10000)
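
A matrix this size holds 100 million elements, so memory use starts to matter. The sketch below estimates the footprint under the assumption that each element is a 64-bit integer; the default integer type returned by `randint` can depend on your platform and numpy version, so treat the figure as an estimate rather than a guarantee. The `nbytes` attribute reports the actual size of any array.

```python
import numpy as np

rows, cols = 10_000, 10_000
itemsize = np.dtype(np.int64).itemsize   # 8 bytes per element (assumed dtype)
estimated_bytes = rows * cols * itemsize

print(estimated_bytes / 1e6, "MB")  # 800.0 MB

# The same arithmetic, confirmed on a small array with nbytes.
small = np.zeros((10, 10), dtype=np.int64)
print(small.nbytes)  # 800
```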