# Python for Data Analysis: Numpy

## Lesson Outline
1. Introduction to Numpy
2. Creating an array
3. Creating Arrays from Scratch
4. NumPy array attributes
5. NumPy Array Indexing: Accessing Elements of an Array
6. Copying Arrays
7. Exponents and Logarithms
8. Some Statistics
9. Reading from a file
10. Conclusion and QA

## Introduction to Numpy

As mentioned earlier, Python's base functionality is rather limited, so for most data analysis work we need to import additional Python libraries.

Now let us see what NumPy is all about (https://numpy.org/). NumPy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays. Although datasets come from a diverse set of sources and formats, it is quite useful to think of data as arrays of some quantity (integers, floating-point numbers, strings or others).


If you need to install Matplotlib and NumPy, use the following steps (they assume that Anaconda is already installed):

* Go to your Windows Start button and select Anaconda Prompt
* When the window opens, type 'pip install matplotlib'
* Wait until it reports that the installation was successful
* Type 'pip install numpy'
* Wait until the installation is successful
* Close the window

Efficient storage and manipulation of numerical arrays is absolutely fundamental to the process of doing data science. We'll now take a look at the specialized tools that Python has for handling such numerical arrays: the NumPy package.

NumPy (short for Numerical Python) provides an efficient interface to store and operate on dense data buffers. In some ways, NumPy arrays are like Python's built-in list type, but NumPy arrays provide much more efficient storage and data operations as the arrays grow larger in size. NumPy arrays form the core of nearly the entire ecosystem of data science tools in Python, so time spent learning to use NumPy effectively will be valuable no matter what aspect of data science interests you.


The first step in using NumPy is to import it into Python. A command like 'import numpy' will suffice. However, most people import NumPy (like many other packages) with an **alias**. In this case, the conventional alias for numpy is np.

So all NumPy commands will have the prefix **np.** If you did not use an alias, each command would have the prefix **numpy.** instead, which is not a problem, but the alias saves some typing.

You will do this by:

In [None]:
import numpy as np     # np is the alias for numpy

Note that I am borrowing heavily from the book credited below and have included the author's note. He has been kind enough to put the book in its entirety online.


*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*

*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*

## Creating an array
We can create a NumPy array of integers using the np.array command. Note that an array is a collection of like objects: every element has the same data type. Unlike a list, an array cannot hold a mix of integers, floats, strings and other types; if you pass mixed values, NumPy converts them all to a single common type.

In [None]:
# integer array:
    # Notice the use of brackets.
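
As a minimal sketch of the idea above (the values are arbitrary illustrations, not taken from the original notebook), an integer array can be built from a Python list, and a list that mixes integers with a float ends up as an all-float array:

```python
import numpy as np

# integer array: note the square brackets around the list of values
int_array = np.array([1, 4, 2, 5, 3])
print(int_array)
print(int_array.dtype)   # an integer type (exact width depends on the platform)

# mixing integers with one float converts every element to float
mixed = np.array([3.14, 4, 2, 3])
print(mixed)
print(mixed.dtype)
```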

Use of the np.arange function. Just like the range() function in Python, np.arange(n) generates an array of integers from 0 to n-1.

The command below is interpreted as 'Generate numbers from 0 up to (but not including) 5 in steps of 0.5'.
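
A short sketch of both forms of np.arange follows. The variable names are only for illustration; the point to notice is that the stop value is never part of the result:

```python
import numpy as np

# integers 0, 1, ..., 9 (the stop value 10 is excluded)
ints = np.arange(10)
print(ints)

# start at 0, stop before 5, step 0.5, so the last value is 4.5
halves = np.arange(0, 5, 0.5)
print(halves)
```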

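Use of the np.linspace function. This function is similar to np.arange, but you specify the number of points instead of the step size, and the end value is included by default. The command below is interpreted as 'Generate 6 equally spaced numbers from 0 to 10'.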
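
The description above probably corresponds to a call like the following sketch (the variable name is an assumption):

```python
import numpy as np

# 6 equally spaced numbers from 0 to 10, both end points included
points = np.linspace(0, 10, 6)
print(points)   # 0, 2, 4, 6, 8, 10 as floats
```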

Use of the np.quantile function to get any quantile, given as a fraction between 0 and 1. The related np.percentile function does the same on a 0 to 100 scale.

In [None]:

    # this gets the 50th percentile (the median)
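
As an illustration only (the sample values 1 to 100 are an assumption, not the data used in the original cell), the median can be obtained either as the 0.5 quantile or as the 50th percentile:

```python
import numpy as np

values = np.arange(1, 101)          # the integers 1 to 100

median = np.quantile(values, 0.5)   # 0.5 quantile = 50th percentile
print(median)

# np.percentile expresses the same request on a 0-100 scale
same_median = np.percentile(values, 50)
print(same_median)
```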

Adding an element to an array requires the use of np.append or np.insert. np.append adds an element to the end of the array, while np.insert inserts an element at a specific position. Both return a new array and leave the original unchanged.
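
A minimal sketch of both functions, using made-up values:

```python
import numpy as np

arr = np.array([1, 2, 3])

appended = np.append(arr, 4)       # add 4 at the end
inserted = np.insert(arr, 1, 99)   # put 99 at index 1

print(appended)
print(inserted)
print(arr)                         # the original array is not modified
```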

## Creating Arrays from Scratch

Create an array of just zeroes. 
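
One way to do this is np.zeros. The length of 10 and the integer type in this sketch are assumptions chosen for illustration:

```python
import numpy as np

# a length-10 integer array filled with zeros
zeros = np.zeros(10, dtype=int)
print(zeros)
```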

Create an array with a linear sequence starting at zero and stepping by 2 up to, but not including, 20. This is similar to the range function we have used previously.

In [None]:
# achieving above using numpy
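
The comparison with range() can be sketched as follows; both versions stop before 20, so the last value is 18:

```python
import numpy as np

# built-in range, converted to a list so it can be printed
print(list(range(0, 20, 2)))

# the same sequence as a NumPy array
evens = np.arange(0, 20, 2)
print(evens)
```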


Create an array of 11 evenly spaced values between 0 and 10.
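
A possible way to write this uses np.linspace, which includes both end points:

```python
import numpy as np

# 11 evenly spaced values from 0 to 10 (0, 1, 2, ..., 10 as floats)
eleven = np.linspace(0, 10, 11)
print(eleven)
```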

Create a 1 x 3 array of uniformly distributed random values between 0 and 1.
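
A sketch using np.random.random, which draws from the interval [0, 1). The values will differ on every run because no seed is set:

```python
import numpy as np

# 1 row, 3 columns of uniform random values in [0, 1)
uniform = np.random.random((1, 3))
print(uniform)
print(uniform.shape)
```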

Create a 1 x 3 array of normally distributed values with mean 0 and standard deviation 1.

In [None]:
np.std(a)

In [None]:
 # check out why the mean is not zero (0)
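
The following sketch suggests why the mean of such an array is usually not exactly zero: three random draws are a very small sample. The name `a` follows the cell above; the larger sample size of 100000 is an assumption chosen for illustration.

```python
import numpy as np

# 1 x 3 array drawn from a normal distribution with mean 0 and std 1
a = np.random.normal(0, 1, (1, 3))
print(a)
print(np.mean(a), np.std(a))   # rarely exactly 0 and 1 with only 3 values

# with many more draws, the sample mean and std approach 0 and 1
big = np.random.normal(0, 1, 100000)
print(np.mean(big), np.std(big))
```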

Create a 1 x 3 array of random integers in the interval [0, 10), that is, from 0 to 9 inclusive.
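
A sketch using np.random.randint, whose upper bound is exclusive:

```python
import numpy as np

# 1 x 3 array of random integers from 0 to 9
rand_ints = np.random.randint(0, 10, (1, 3))
print(rand_ints)
```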

## NumPy array attributes
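
As a hedged illustration of what this section covers, the sketch below prints the most commonly used attributes of an array. The array `x3` and its shape are assumptions made for the example:

```python
import numpy as np

# a 3-D array with 2 blocks, 3 rows and 4 columns
x3 = np.arange(24).reshape(2, 3, 4)

print("ndim: ", x3.ndim)    # number of dimensions
print("shape:", x3.shape)   # size of each dimension
print("size: ", x3.size)    # total number of elements
print("dtype:", x3.dtype)   # data type of the elements
```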

## NumPy Array Indexing: Accessing Elements of an Array
An important thing to keep in mind is that in Python, counting starts from 0. So if there are 10 elements in an array, they are indexed 0, 1, 2, 3 and so on, all the way to 9.

In [None]:
            # Notice the use of square brackets to access the elements of an array

In [None]:
     # This accesses the last element of an array

In [None]:
       # This accesses the second-to-last element of an array
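
The three cells above can be sketched as follows. The array name `x1` and its values are assumptions used only for illustration:

```python
import numpy as np

x1 = np.array([5, 0, 3, 3, 7, 9])

first = x1[0]            # square brackets, counting from 0
last = x1[-1]            # negative indices count from the end
second_last = x1[-2]

print(first, last, second_last)
```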

In [None]:
   # Two-dimensional array
 # Three-dimensional array

Access the entire row #1

Access the entire column #2

Access specific elements in column #2

In [None]:
        # Note: read up on how to access specific elements!
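
A possible version of the row and column accesses described above. This sketch assumes that 'row #1' and 'column #2' refer to index 1 and index 2 (counting from 0); the array `x2` is made up for the example:

```python
import numpy as np

# a small 2-D array with 3 rows and 4 columns
x2 = np.array([[ 0,  1,  2,  3],
               [10, 11, 12, 13],
               [20, 21, 22, 23]])

row_1 = x2[1, :]         # the entire row with index 1
col_2 = x2[:, 2]         # the entire column with index 2
picked = x2[[0, 2], 2]   # only rows 0 and 2 of column 2

print(row_1)
print(col_2)
print(picked)
```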

Check this: https://blog.finxter.com/how-to-get-specific-elements-from-a-list/

### You can also access elements of a 3-D array

In [None]:
 # x, y, z

In [None]:
                # Access all the elements of second column of x2

In [None]:
                # Access all the elements of the second row of x2

In [None]:
               # Access the 0th, 1st, 2nd and 3rd elements of the second column of x2
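
The accesses named in the cells above might look like the sketch below. The arrays `x3` and `x2` and their shapes are assumptions, chosen so that every index exists:

```python
import numpy as np

# 3-D array: one index per axis
x3 = np.arange(24).reshape(2, 3, 4)
element = x3[1, 2, 3]        # last element along every axis

# 2-D array with 5 rows and 4 columns
x2 = np.arange(20).reshape(5, 4)
second_col = x2[:, 1]        # all elements of the second column
second_row = x2[1, :]        # all elements of the second row
first_four = x2[0:4, 1]      # elements 0 to 3 of the second column

print(element)
print(second_col)
print(second_row)
print(first_four)
```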

read more: https://realpython.com/numpy-tutorial/

## Copying Arrays. Extremely Important
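Something that surprises many newcomers to Python is that assigning an array to a new name (for example, y = x) does not copy it: both names refer to the same array, so modifying y also changes x. Slices of a NumPy array behave similarly, because they are views of the original data. So BE VERY CAREFUL.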

In [None]:
                          # Modify one element of the array, y

This can often trip people up if they are not careful. If you wish to keep the original array unchanged, always use the copy() method instead.
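
A minimal sketch of the difference between plain assignment and an explicit copy (array values are made up):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

y = x              # no copy: y is another name for the same array
y[0] = 100         # modify one element of the array, y
print(x)           # x has changed as well

z = x.copy()       # an independent copy
z[1] = -1
print(x)           # x is not affected by the change to z
print(z)
```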

## Exponents and Logarithms

In [None]:
# import this
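
As a hedged sketch of the functions this section refers to, NumPy provides element-wise exponentials and logarithms. The input values are assumptions chosen so the results are easy to check by hand:

```python
import numpy as np

x = np.array([1, 2, 3])
print("e^x =", np.exp(x))        # natural exponential
print("2^x =", np.exp2(x))       # powers of 2
print("3^x =", np.power(3, x))   # any other base

y = np.array([1, 2, 4, 10])
print("ln(y)    =", np.log(y))     # natural logarithm
print("log2(y)  =", np.log2(y))
print("log10(y) =", np.log10(y))
```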

## Some Statistics
![image.png](attachment:image.png)
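
The contents of the image are not available here, so the following is only a sketch of common NumPy summary statistics on made-up values, not a reproduction of the original figure:

```python
import numpy as np

sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

print("mean:  ", np.mean(sample))
print("median:", np.median(sample))
print("std:   ", np.std(sample))   # population standard deviation by default
print("min:   ", np.min(sample))
print("max:   ", np.max(sample))
print("sum:   ", np.sum(sample))
```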

## Reading from a file
First, always make sure that the file is stored in the same directory as your Jupyter notebook. If it is not, you will get an error message unless you specify the full path of the file. The example below reads from a file and saves the contents in a NumPy array called 'data'.

We are also using the option 'skip_header = 1' because the file has a header line that we do not want to read. The core data values begin on the 2nd line.

You can also ask NumPy to read directly from a .csv file. A .csv file is like an Excel spreadsheet saved as plain text, with all fields separated by commas (even if it doesn't appear so when you open the file in MS Excel). The extension .csv stands for 'comma-separated values'.

In [None]:
# Skip the first line because it contains the header


In [None]:
# Print the shape of the numpy array


In [None]:
# Print the dimensions of the numpy array


In [None]:
# Print the size of the numpy array
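
The reading steps can be sketched as below. Several things here are assumptions: the original file is not available, so the sketch writes a tiny stand-in file with made-up values and a hypothetical name (`core_data.csv`), and it uses np.genfromtxt because that is the NumPy reader that accepts a `skip_header` option.

```python
import os
import tempfile

import numpy as np

# Made-up sample content: column names follow the lesson, values are illustrative only
csv_text = (
    "Depth,Porosity,TOC,Quartz,Calcite\n"
    "1000.0,0.08,2.1,45.0,12.0\n"
    "1000.5,0.09,2.4,43.5,13.5\n"
    "1001.0,0.07,1.9,47.0,11.0\n"
)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "core_data.csv")   # hypothetical file name
    with open(path, "w") as f:
        f.write(csv_text)

    # Skip the first line because it contains the header
    data = np.genfromtxt(path, delimiter=",", skip_header=1)

print(data.shape)   # (number of rows, number of columns)
print(data.ndim)    # number of dimensions
print(data.size)    # total number of values
```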


We can also make copies of parts of the original NumPy array, coredata.
Notice that we are using the .copy() method to ensure that changes we make to each of the sub-arrays do not affect the original array. We are selecting specific columns corresponding to different measurements.

**Note the features and their column indices in this core data:**
- Depth 0
- Porosity 1
- TOC 2
- Quartz 3
- Calcite 4
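
Using the column indices listed above, the sub-arrays could be extracted as in this sketch. The array `coredata` here is a made-up stand-in with illustrative values, and the variable names are assumptions:

```python
import numpy as np

# Stand-in for the core data: columns are Depth, Porosity, TOC, Quartz, Calcite
coredata = np.array([[1000.0, 0.08, 2.1, 45.0, 12.0],
                     [1000.5, 0.09, 2.4, 43.5, 13.5],
                     [1001.0, 0.07, 1.9, 47.0, 11.0]])

depth = coredata[:, 0].copy()
porosity = coredata[:, 1].copy()
toc = coredata[:, 2].copy()
quartz = coredata[:, 3].copy()
calcite = coredata[:, 4].copy()

# changing a copy leaves the original array untouched
porosity[0] = 0.0
print(porosity)
print(coredata[0, 1])   # still the original porosity value
```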

## Conclusion & QA