# Introduction to NumPy and Pandas

## Learning Objectives
* Practice the basics of Python, NumPy, and Pandas.
* Gain experience forking our class repo and testing your solutions.

In [3]:
%load_ext autoreload
%autoreload 2


from pathlib import Path
home = str(Path.home())  # All other paths are relative to this path; change it if your system is laid out differently.

import sys
sys.path.insert(0,'..')

import py487

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Python, NumPy, and Pandas

Many IDEs support Python development and data science work, but Jupyter Notebook is one of the most popular environments.

This lab is not intended to teach you everything you will use in this course. Instead, it is designed to give you exposure to some critical components from NumPy and Pandas that we will rely upon routinely. 

## Stop and read
Please read and reference the following as you progress through this course.

* [What is the Jupyter Notebook?](https://nbviewer.jupyter.org/github/jupyter/notebook/blob/master/docs/source/examples/Notebook/What%20is%20the%20Jupyter%20Notebook.ipynb#)
* [Notebook Tutorial](https://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook)
* [Notebook Basics](https://nbviewer.jupyter.org/github/jupyter/notebook/blob/master/docs/source/examples/Notebook/Notebook%20Basics.ipynb)

## Exercises 1-5

Answer the following using NumPy.

#### Exercise 1. Make an array a of size 6 × 4 where every element is a 2.

In [5]:
a = py487.misc.exercise_1()
a

array([[2, 2, 2, 2],
       [2, 2, 2, 2],
       [2, 2, 2, 2],
       [2, 2, 2, 2],
       [2, 2, 2, 2],
       [2, 2, 2, 2]])
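
A minimal sketch of one way to build such an array, assuming plain NumPy. This is not necessarily how `py487.misc.exercise_1` is implemented, and the name `a_sketch` is hypothetical.

```python
import numpy as np

# Hypothetical name; np.full creates an array of the given shape
# in which every element equals the fill value.
a_sketch = np.full((6, 4), 2)

print(a_sketch.shape)  # (6, 4)
```

An alternative with the same result is `2 * np.ones((6, 4), dtype=int)`.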

#### Exercise 2. Make an array b of size 6 × 4 that has 3 on the leading diagonal and 1 everywhere else. (You can do this without loops.)

In [6]:
b = py487.misc.exercise_2()
b

array([[3., 1., 1., 1.],
       [1., 3., 1., 1.],
       [1., 1., 3., 1.],
       [1., 1., 1., 3.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
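
One loop-free approach, sketched under the assumption that plain NumPy is allowed: start from a matrix of ones and add 2 along the leading diagonal. The name `b_sketch` is hypothetical, and your own solution may look different.

```python
import numpy as np

# np.eye(6, 4) is a 6 x 4 matrix with ones on the leading diagonal
# and zeros elsewhere, so ones + 2 * eye puts 3 on the diagonal.
b_sketch = np.ones((6, 4)) + 2 * np.eye(6, 4)

print(b_sketch[0, 0], b_sketch[0, 1])  # 3.0 1.0
```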

#### Stop and think: Why can you multiply these two matrices together? Why does a * b work, but not dot(a,b)?

In [7]:
a*b

array([[6., 2., 2., 2.],
       [2., 6., 2., 2.],
       [2., 2., 6., 2.],
       [2., 2., 2., 6.],
       [2., 2., 2., 2.],
       [2., 2., 2., 2.]])

In [8]:
import numpy as np

np.dot(a,b)

ValueError: shapes (6,4) and (6,4) not aligned: 4 (dim 1) != 6 (dim 0)
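
The two operations can be compared side by side in a small experiment. This is only a sketch: `a_demo` and `b_demo` are hypothetical stand-ins that rebuild the arrays from Exercises 1 and 2 so the snippet runs on its own.

```python
import numpy as np

# Hypothetical stand-ins for the arrays a and b used above.
a_demo = np.full((6, 4), 2)
b_demo = np.ones((6, 4)) + 2 * np.eye(6, 4)

# The * operator multiplies entry by entry, so it only needs
# the two shapes to match (or to be broadcastable).
elementwise = a_demo * b_demo
print(elementwise.shape)  # (6, 4)

# np.dot is matrix multiplication: the number of columns of the first
# argument must equal the number of rows of the second argument.
try:
    np.dot(a_demo, b_demo)
    dot_failed = False
except ValueError as err:
    dot_failed = True
    print("np.dot raised ValueError:", err)
```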

#### YOUR SOLUTION HERE

#### Stop and Think: Compute dot(a.transpose(),b) and dot(a,b.transpose()). Why are the results different shapes?

In [9]:
np.dot(a.transpose(),b)

array([[16., 16., 16., 16.],
       [16., 16., 16., 16.],
       [16., 16., 16., 16.],
       [16., 16., 16., 16.]])

In [10]:
np.dot(a,b.transpose())

array([[12., 12., 12., 12.,  8.,  8.],
       [12., 12., 12., 12.,  8.,  8.],
       [12., 12., 12., 12.,  8.,  8.],
       [12., 12., 12., 12.,  8.,  8.],
       [12., 12., 12., 12.,  8.,  8.],
       [12., 12., 12., 12.,  8.,  8.]])
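
A quick way to explore this question is to print the shapes involved. The sketch below assumes the same arrays as before, rebuilt under the hypothetical names `a_demo` and `b_demo`.

```python
import numpy as np

# Hypothetical stand-ins for the arrays a and b used above.
a_demo = np.full((6, 4), 2)
b_demo = np.ones((6, 4)) + 2 * np.eye(6, 4)

# A product of shapes (m, k) and (k, n) has shape (m, n).
left = np.dot(a_demo.T, b_demo)    # (4, 6) times (6, 4)
right = np.dot(a_demo, b_demo.T)   # (6, 4) times (4, 6)

print(a_demo.T.shape, b_demo.shape, left.shape)    # (4, 6) (6, 4) (4, 4)
print(a_demo.shape, b_demo.T.shape, right.shape)   # (6, 4) (4, 6) (6, 6)
```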

#### YOUR SOLUTION HERE

#### Exercise 3. Find the overall mean of matrix ``c``.

In [11]:
np.random.seed(1)
c = np.random.rand(6,4)
display(c)
m = py487.misc.exercise_3(c)
m

array([[4.17022005e-01, 7.20324493e-01, 1.14374817e-04, 3.02332573e-01],
       [1.46755891e-01, 9.23385948e-02, 1.86260211e-01, 3.45560727e-01],
       [3.96767474e-01, 5.38816734e-01, 4.19194514e-01, 6.85219500e-01],
       [2.04452250e-01, 8.78117436e-01, 2.73875932e-02, 6.70467510e-01],
       [4.17304802e-01, 5.58689828e-01, 1.40386939e-01, 1.98101489e-01],
       [8.00744569e-01, 9.68261576e-01, 3.13424178e-01, 6.92322616e-01]])

np.float64(0.4216819949520792)

#### Exercise 4. Find the column and row means of matrix ``c``.

In [12]:
row_means,col_means = py487.misc.exercise_4(c)
display(row_means)
display(col_means)

array([0.35994836, 0.19272886, 0.50999956, 0.4451062 , 0.32862076,
       0.69368823])

array([0.3971745 , 0.62609144, 0.18112797, 0.48233407])
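
The `axis` argument of `mean` is a common source of confusion, so here is a small sketch on a hand-checkable matrix. The array `small` is a hypothetical example, not part of the lab data.

```python
import numpy as np

# Hypothetical 2 x 3 example that is easy to verify by hand.
small = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

overall_mean = small.mean()        # mean of all six entries
row_means = small.mean(axis=1)     # collapse the columns: one value per row
col_means = small.mean(axis=0)     # collapse the rows: one value per column

print(overall_mean)  # 3.5
print(row_means)     # [2. 5.]
print(col_means)     # [2.5 3.5 4.5]
```

The `axis` value names the dimension that is averaged away, which is why `axis=1` yields one mean per row.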

#### Exercise 5. Write a function that uses loops to run through an array and count the number of ones in it. Then do the same thing without using a for loop. For inspiration, check out the following. You don't need to use all of them; pick one.

In [13]:
b == 1

array([[False,  True,  True,  True],
       [ True, False,  True,  True],
       [ True,  True, False,  True],
       [ True,  True,  True, False],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]])

In [14]:
np.where(b == 1)

(array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5]),
 array([1, 2, 3, 0, 2, 3, 0, 1, 3, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3]))

In [15]:
np.array(b.flat)

array([3., 1., 1., 1., 1., 3., 1., 1., 1., 1., 3., 1., 1., 1., 1., 3., 1.,
       1., 1., 1., 1., 1., 1., 1.])

In [16]:
np.where(b.flat == 1)

(array([ 1,  2,  3,  4,  6,  7,  8,  9, 11, 12, 13, 14, 16, 17, 18, 19, 20,
        21, 22, 23]),)

In [17]:
c1,c2 = py487.misc.exercise_5(b)
c1,c2

(20, np.int64(20))
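
For comparison, here is a minimal sketch of both counting styles. The function names and the array `b_demo` are hypothetical, and this is not necessarily how `py487.misc.exercise_5` is written.

```python
import numpy as np

def count_ones_loop(arr):
    """Count entries equal to 1 in a 2-D array using explicit loops."""
    count = 0
    for row in arr:
        for value in row:
            if value == 1:
                count += 1
    return count

def count_ones_vectorized(arr):
    """Count entries equal to 1 without a Python loop."""
    # arr == 1 is a boolean array; count_nonzero counts its True entries.
    return np.count_nonzero(arr == 1)

# Hypothetical stand-in for the array b from Exercise 2.
b_demo = np.ones((6, 4)) + 2 * np.eye(6, 4)
print(count_ones_loop(b_demo), count_ones_vectorized(b_demo))  # 20 20
```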

## Exercises 6-7

**Stop and read:**
We will use Pandas at times throughout this course. Please read and study [10 minutes to Pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html) before proceeding to any of the exercises below.

#### Exercise 6. Repeat Exercise 1, but create a Pandas DataFrame instead of a NumPy array.

In [18]:
a = py487.misc.exercise_6()
a

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 2 | 2 | 2 | 2 |
| 1 | 2 | 2 | 2 | 2 |
| 2 | 2 | 2 | 2 | 2 |
| 3 | 2 | 2 | 2 | 2 |
| 4 | 2 | 2 | 2 | 2 |
| 5 | 2 | 2 | 2 | 2 |
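
A DataFrame like this can be built by wrapping a NumPy array. The sketch below is one possible approach, with the hypothetical name `df_a`; it is not necessarily the implementation behind `py487.misc.exercise_6`.

```python
import numpy as np
import pandas as pd

# Wrapping a 2-D array gives default integer labels:
# rows 0..5 and columns 0..3.
df_a = pd.DataFrame(np.full((6, 4), 2))

print(df_a.shape)          # (6, 4)
print(list(df_a.columns))  # [0, 1, 2, 3]
```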


#### Exercise 7. Repeat Exercise 2 using a DataFrame instead.

In [19]:
b = py487.misc.exercise_7()
b

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 3.0 | 1.0 | 1.0 | 1.0 |
| 1 | 1.0 | 3.0 | 1.0 | 1.0 |
| 2 | 1.0 | 1.0 | 3.0 | 1.0 |
| 3 | 1.0 | 1.0 | 1.0 | 3.0 |
| 4 | 1.0 | 1.0 | 1.0 | 1.0 |
| 5 | 1.0 | 1.0 | 1.0 | 1.0 |


#### Stop and think: What if we want to go from a Pandas DataFrame to a NumPy array?

In [20]:
b.values

array([[3., 1., 1., 1.],
       [1., 3., 1., 1.],
       [1., 1., 3., 1.],
       [1., 1., 1., 3.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
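
Besides the `.values` attribute, Pandas also provides the `DataFrame.to_numpy()` method, which the Pandas documentation recommends for new code. A small sketch, using the hypothetical name `df_b` for a DataFrame shaped like the one above:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the DataFrame b from Exercise 7.
df_b = pd.DataFrame(np.ones((6, 4)) + 2 * np.eye(6, 4))

arr_b = df_b.to_numpy()   # returns a plain NumPy ndarray

print(type(arr_b).__name__, arr_b.shape)  # ndarray (6, 4)
```

For a DataFrame with a single numeric dtype such as this one, both routes give the same array.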

In [21]:
# Good job!
# Woohoo!