# Introduction to Jupyter, Numpy and Pandas

This is a Jupyter Notebook, an interactive, browser-based application in which you can write and execute code, display results, and develop a narrative about your work. A notebook is a sequence of cells: you enter code snippets into code cells, and executing a cell produces an output cell containing the result.

In [5]:
print('Hello, world!')


Hello, world!


#### Use markdown cells to describe to a reader (and sometimes to future versions of yourself) what you are doing and why.  

To change a cell to a code cell, switch to command mode (press Esc or click outside the cell's text area) and press 'Y'.  
To change it to a markdown cell, press 'M' in command mode.

In [None]:
# Math is easy
2+2==4


**Anaconda** is a Python distribution that includes Jupyter and hundreds of other popular, useful packages.<br>
Today we'll briefly introduce the basics of Jupyter notebooks and touch on some of the tools you'll be using in this course, such as **_Numpy and Pandas_**.

In [None]:
import numpy as np

### The central object in Numpy is the multidimensional array

Numpy arrays can represent scalars (0 dimensions), vectors (1 dimension), matrices (2 dimensions), and tensors (3 or more dimensions).
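As a quick sketch, the <code>ndim</code> attribute reports how many dimensions an array has, which is what distinguishes these cases. The variable names below are only illustrative.

```python
import numpy as np

scalar = np.array(7)                  # 0-D array: a single value
vector = np.array([1, 2, 3])          # 1-D array
matrix = np.array([[1, 2], [3, 4]])   # 2-D array
tensor = np.zeros((2, 3, 4))          # 3-D array

# Print the number of dimensions and the shape of each array
for name, arr in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("tensor", tensor)]:
    print(name, arr.ndim, arr.shape)
```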

In [51]:
# integer array from a list:
np.array([1, 4, 2, 5, 3])

array([1, 4, 2, 5, 3])

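Python's built-in <code>list</code> can store numbers too, but for numerical work it is typically *much* faster to use Numpy arrays, whose elements are stored in a single contiguous block of memory and processed by compiled code.  
One difference worth remembering: a Python list can hold elements of different types, but all elements of a Numpy array must share the same type. If you pass mixed types, Numpy converts (upcasts) them to a common type.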
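Here is a small sketch of that upcasting: the list keeps its mixed element types, while the array converts everything to one common type.

```python
import numpy as np

mixed_list = [1, 2.5, 3]      # a Python list happily holds ints and a float
arr = np.array(mixed_list)    # Numpy upcasts every element to a common float type
print(arr.dtype)              # float64

# Mixing numbers and strings: every element becomes a string
text_arr = np.array([1, 'a'])
print(text_arr.dtype.kind)    # U: a Unicode string dtype
```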

In [54]:
# You can explicitly declare the data type with dtype:

np.array([1.4, 2, 3, 4], dtype='float32')


array([1.4, 2. , 3. , 4. ], dtype=float32)

Suppose you pass a *nested* list to <code>np.array</code>:

In [55]:
# nested lists result in multi-dimensional arrays
np.array([range(i, i + 3) for i in [2, 4, 6]])

array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

There are many ways to generate arrays from scratch:

In [56]:
np.zeros(10, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [8]:
np.ones((3, 5), dtype=float)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [58]:
# Like Python's built-in range(): start at 0, stop before 20, step by 2
np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
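A few other Numpy creation functions work in the same spirit. This is a short, non-exhaustive sketch; the fill value 3.14 is arbitrary.

```python
import numpy as np

# A 3x5 array filled with a chosen value
filled = np.full((3, 5), 3.14)

# Five values evenly spaced between 0 and 1, both endpoints included:
# 0.0, 0.25, 0.5, 0.75, 1.0
spaced = np.linspace(0, 1, 5)

# A 3x3 identity matrix (ones on the diagonal, zeros elsewhere)
identity = np.eye(3)
```

Note the difference from <code>np.arange</code>: <code>linspace</code> takes the *number* of points rather than a step size, and includes the stop value by default.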

In [59]:
a = np.array([range(i, i + 3) for i in [2, 4, 6]])
print(a.size)   # total number of elements
print(a.shape)  # length of each dimension
print(a.ndim)   # number of dimensions

9
(3, 3)
2
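These attributes are related: <code>size</code> is the product of the lengths in <code>shape</code>, and <code>ndim</code> is the length of <code>shape</code>. The sketch below checks this and also prints the memory-related attributes <code>dtype</code>, <code>itemsize</code> (bytes per element) and <code>nbytes</code> (total bytes). The default integer type, and so <code>itemsize</code>, can differ between platforms.

```python
import numpy as np

a = np.array([range(i, i + 3) for i in [2, 4, 6]])

print(a.size == np.prod(a.shape))      # True: 3 * 3 == 9
print(a.ndim == len(a.shape))          # True: two dimensions

print(a.dtype, a.itemsize, a.nbytes)   # platform-dependent integer type and sizes
print(a.nbytes == a.size * a.itemsize) # True
```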


Array slicing returns *views* of subarrays, not copies: modifying a slice also modifies the original array!

In [60]:
# Make a 3D array of random integers from 0 to 9

np.random.seed(0)  # seed the generator so the output is reproducible

x3 = np.random.randint(10, size=(3, 4, 5))  # three-dimensional array

x3

array([[[5, 0, 3, 3, 7],
        [9, 3, 5, 2, 4],
        [7, 6, 8, 8, 1],
        [6, 7, 7, 8, 1]],

       [[5, 9, 8, 9, 4],
        [3, 0, 3, 5, 0],
        [2, 3, 8, 1, 3],
        [3, 3, 7, 0, 1]],

       [[9, 9, 0, 4, 7],
        [3, 2, 7, 2, 0],
        [0, 4, 5, 5, 6],
        [8, 4, 1, 4, 9]]])

In [62]:
x3[2, 1]  # third block, second row (same as x3[2][1])

array([3, 2, 7, 2, 0])
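To see that a slice is a view rather than a copy, the sketch below rebuilds <code>x3</code> with the same seed, writes into a slice, and checks the original array. <code>np.shares_memory</code> confirms that both arrays use the same underlying data.

```python
import numpy as np

np.random.seed(0)
x3 = np.random.randint(10, size=(3, 4, 5))

sub = x3[:2, :2]        # slice: first two blocks, first two rows of each
sub[0, 0, 0] = 99       # write into the slice...
print(x3[0, 0, 0])      # ...and the original array changes too: 99
print(np.shares_memory(sub, x3))  # True
```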

To make an independent copy of an array or subarray, use the <code>.copy()</code> method:

In [63]:
x3_sub = x3[:2, :2].copy()  # first two blocks, first two rows of each, all columns
print(x3_sub)

[[[5 0 3 3 7]
  [9 3 5 2 4]]

 [[5 9 8 9 4]
  [3 0 3 5 0]]]
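In contrast to a view, changes to a copy leave the original untouched. A minimal sketch, again rebuilding <code>x3</code> with the same seed:

```python
import numpy as np

np.random.seed(0)
x3 = np.random.randint(10, size=(3, 4, 5))

x3_sub = x3[:2, :2].copy()    # an independent copy of the subarray
original_value = x3[0, 0, 0]
x3_sub[0, 0, 0] = 99          # modify only the copy

print(x3[0, 0, 0] == original_value)  # True: the original is unchanged
print(np.shares_memory(x3_sub, x3))   # False
```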


### The Pandas package is used for importing and preparing data

In [20]:
import pandas as pd

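Pandas organizes data into labeled, array-like objects. A **Series** is a one-dimensional array whose **Index** labels each sample; a **DataFrame** is a two-dimensional table with an **Index** for the rows (samples) and **column labels** for the features.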

In [21]:
area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
             'Florida': 170312, 'Illinois': 149995}
area = pd.Series(area_dict)
area

California    423967
Texas         695662
New York      141297
Florida       170312
Illinois      149995
dtype: int64
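A Series pairs a Numpy array of values with an index of labels. The sketch below, reusing the same dictionary, shows lookup by label, the index built from the dictionary keys, the underlying Numpy array, and a boolean mask (the 200000 threshold is arbitrary).

```python
import numpy as np
import pandas as pd

area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
             'Florida': 170312, 'Illinois': 149995}
area = pd.Series(area_dict)

print(area['Texas'])            # look up a value by its label: 695662
print(list(area.index))         # the labels come from the dictionary keys
print(type(area.to_numpy()))    # the values are stored as a Numpy array
print(area[area > 200000])      # boolean masks work as they do for arrays
```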

In [22]:
population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 19651127,
                   'Florida': 19552860,
                   'Illinois': 12882135}
population = pd.Series(population_dict)
population

California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
dtype: int64

In [23]:
states = pd.DataFrame({'population': population,
                       'area': area})
states

            population    area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297
Florida       19552860  170312
Illinois      12882135  149995
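Selecting a single column, or a single row by its index label, gives back a Series. This sketch rebuilds <code>states</code> from the same two Series so it runs on its own.

```python
import pandas as pd

population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127, 'Florida': 19552860,
                        'Illinois': 12882135})
area = pd.Series({'California': 423967, 'Texas': 695662, 'New York': 141297,
                  'Florida': 170312, 'Illinois': 149995})
states = pd.DataFrame({'population': population, 'area': area})

area_col = states['area']     # a column selected by name is a Series
texas = states.loc['Texas']   # a row selected by its index label is also a Series
print(area_col['Florida'])    # 170312
print(texas['population'])    # 26448193
```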


In [24]:
states.index

Index(['California', 'Texas', 'New York', 'Florida', 'Illinois'], dtype='object')

In [25]:
states.columns

Index(['population', 'area'], dtype='object')
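Because columns share the same index, arithmetic between columns is done row by row, matched on the labels. As a sketch, the cell below adds an illustrative <code>density</code> column (population per unit area; the area units are not stated in the data above) and sorts the rows by it.

```python
import pandas as pd

states = pd.DataFrame(
    {'population': [38332521, 26448193, 19651127, 19552860, 12882135],
     'area': [423967, 695662, 141297, 170312, 149995]},
    index=['California', 'Texas', 'New York', 'Florida', 'Illinois'])

# New column computed from existing columns, aligned on the index
states['density'] = states['population'] / states['area']

# Rows ordered from most to least dense
print(states.sort_values('density', ascending=False))
```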