# Dylan Gross

# Python and NumPy

While other IDEs exist for Python development and for data science related activities, one of the most popular environments is Jupyter Notebooks.

This lab is not intended to teach you everything you will use in this course. Instead, it is designed to give you exposure to some critical components from NumPy that we will rely upon routinely.

## Exercise 0
Please read and reference the following as you progress through this course.

* [What is the Jupyter Notebook?](https://nbviewer.jupyter.org/github/jupyter/notebook/blob/master/docs/source/examples/Notebook/What%20is%20the%20Jupyter%20Notebook.ipynb#)
* [Notebook Tutorial](https://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook)
* [Notebook Basics](https://nbviewer.jupyter.org/github/jupyter/notebook/blob/master/docs/source/examples/Notebook/Notebook%20Basics.ipynb)

**In the space provided below, what are three things that still remain unclear or need further explanation?**

I have used Jupyter Notebook before and have no questions at this point.

## Exercises 1-7
For the following exercises please read the Python appendix in the Marsland textbook and answer problems A.1-A.7 in the space provided below.

## Exercise 1

In [1]:
import numpy as np

a = np.full((6, 4), 2)
print(a)

[[2 2 2 2]
 [2 2 2 2]
 [2 2 2 2]
 [2 2 2 2]
 [2 2 2 2]
 [2 2 2 2]]


## Exercise 2

In [25]:
b = np.full((6, 4), 1)
np.fill_diagonal(b, 3)
print(b)

[[3 1 1 1]
 [1 3 1 1]
 [1 1 3 1]
 [1 1 1 3]
 [1 1 1 1]
 [1 1 1 1]]


## Exercise 3

In [26]:
a * b

   0  1  2  3
0  6  2  2  2
1  2  6  2  2
2  2  2  6  2
3  2  2  2  6
4  2  2  2  2
5  2  2  2  2


In [27]:
np.dot(a,b)

ValueError: shapes (6,4) and (6,4) not aligned: 4 (dim 1) != 6 (dim 0)

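To compute a dot product of two 2-D arrays, the number of columns in the first array must equal the number of rows in the second array. Here both arrays have shape (6, 4), so the inner dimensions (4 and 6) do not match.
For example, we COULD compute the dot product of a (6, 6) array with a (6, 4) array.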
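The alignment rule can be checked from the `.shape` attributes before calling `np.dot`. The following is a minimal sketch: the helper name `can_dot` and the array `square` are illustrative assumptions, not part of the exercise.

```python
import numpy as np

def can_dot(x, y):
    """Return True when the 2-D arrays x and y can be multiplied as x . y."""
    # Columns of the first operand must equal rows of the second.
    return x.shape[1] == y.shape[0]

a = np.full((6, 4), 2)
b = np.full((6, 4), 1)
np.fill_diagonal(b, 3)

print(can_dot(a, b))    # (6, 4) . (6, 4): inner dimensions 4 and 6 differ
print(can_dot(a.T, b))  # (4, 6) . (6, 4): inner dimensions are both 6

# A (6, 6) array on the left also aligns with the (6, 4) array b.
square = np.ones((6, 6), dtype=int)
product = np.dot(square, b)
print(product.shape)
```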

## Exercise 4

In [28]:
np.dot(a.transpose(), b)

array([[16, 16, 16, 16],
       [16, 16, 16, 16],
       [16, 16, 16, 16],
       [16, 16, 16, 16]])

In [29]:
np.dot(a, b.transpose())

array([[12, 12, 12, 12,  8,  8],
       [12, 12, 12, 12,  8,  8],
       [12, 12, 12, 12,  8,  8],
       [12, 12, 12, 12,  8,  8],
       [12, 12, 12, 12,  8,  8],
       [12, 12, 12, 12,  8,  8]])

The dot product of an (m, n) array and an (n, p) array has shape (m, p): the number of rows of the first array by the number of columns of the second. Since `a.transpose()` has shape (4, 6), `np.dot(a.transpose(), b)` has shape (4, 4). Since `b.transpose()` has shape (4, 6), `np.dot(a, b.transpose())` has shape (6, 6). Which array we transpose therefore decides whether the result is 4 by 4 or 6 by 6.
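The shape rule can be confirmed by inspecting the results directly. This sketch rebuilds the two arrays from Exercises 1 and 2 and uses the `@` operator, which for 2-D arrays gives the same result as `np.dot`. The names `small` and `large` are chosen here for illustration.

```python
import numpy as np

a = np.full((6, 4), 2)
b = np.full((6, 4), 1)
np.fill_diagonal(b, 3)

# (4, 6) @ (6, 4): the shared inner dimension 6 drops out, leaving (4, 4).
small = a.T @ b
# (6, 4) @ (4, 6): the shared inner dimension 4 drops out, leaving (6, 6).
large = a @ b.T

print(small.shape)
print(large.shape)

# For 2-D arrays the @ operator matches np.dot.
print(np.array_equal(small, np.dot(a.T, b)))
```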

## Exercise 5

In [30]:
def printing_on_screen():
    print("THIS IS ME PRINTING ON THE SCREEN...")

printing_on_screen()

THIS IS ME PRINTING ON THE SCREEN...


## Exercise 6

In [31]:
def randomness():
    # Build five random 5x5 integer matrices with entries in [0, 100)
    # and print the sum and mean of each one.
    for i in range(1, 6):
        print(f"Matrix {i}")
        arr = np.random.randint(100, size=(5, 5))
        print(f"Sum: {arr.sum()}")
        print(f"Mean: {arr.mean()}")
        print("")

randomness()

Matrix 1
Sum: 1128
Mean: 45.12

Matrix 2
Sum: 1273
Mean: 50.92

Matrix 3
Sum: 1273
Mean: 50.92

Matrix 4
Sum: 1300
Mean: 52.0

Matrix 5
Sum: 998
Mean: 39.92
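Because the matrices are random, the sums and means change on every run. If repeatable output is wanted, one option is to draw from a seeded generator. This is a minimal sketch that assumes NumPy's `default_rng` interface. The function name `random_matrix_stats` and the seed value are arbitrary choices made here.

```python
import numpy as np

def random_matrix_stats(seed, count=5, shape=(5, 5)):
    """Return (sum, mean) pairs for `count` random integer matrices."""
    rng = np.random.default_rng(seed)  # seeded generator gives repeatable draws
    stats = []
    for _ in range(count):
        arr = rng.integers(0, 100, size=shape)  # integers in [0, 100)
        stats.append((int(arr.sum()), float(arr.mean())))
    return stats

first = random_matrix_stats(seed=0)
second = random_matrix_stats(seed=0)
print(first == second)  # the same seed reproduces the same matrices
```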



## Exercise 7

In [32]:
def naive_ones_finder(arr):
    count = 0
    for i in range(0, len(arr)):
        for j in range(0, len(arr[0])):
            if arr[i][j] == 1:
                count += 1
    return count
    
arr = np.array([[3, 5, 1, 3], [1, 2, 6, 2]])
naive_ones_finder(arr)

2

In [33]:
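def np_where(arr):
    # np.where returns one index array per axis; the length of the first
    # one is the number of entries equal to 1.
    return len(np.where(arr == 1)[0])

np_where(arr)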

2
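The loop and the `where` call both amount to counting the `True` entries of the boolean array `arr == 1`. A short sketch of three equivalent ways to get that count, using the same example array, is given below. The variable names are illustrative.

```python
import numpy as np

arr = np.array([[3, 5, 1, 3], [1, 2, 6, 2]])

mask = arr == 1                       # boolean array, True where the entry is 1
count_mask = int(mask.sum())          # True counts as 1 when summed
count_nonzero = int(np.count_nonzero(mask))
rows, cols = np.where(mask)           # index arrays of the matching positions
count_where = rows.size

print(count_mask, count_nonzero, count_where)
print(list(zip(rows.tolist(), cols.tolist())))  # (row, column) of each match
```

Counting positions rather than summing the matched values keeps the function correct if the target value is later changed from 1 to something else.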

## Exercises 8-11
While the Marsland book avoids using another popular package called Pandas, we will use it at times throughout this course. Please read and study [10 minutes to Pandas](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html) before proceeding to any of the exercises below.

## Exercise 8
Repeat exercise A.1 from Marsland, but create a Pandas DataFrame instead of a NumPy array.

In [34]:
import pandas as pd

a = pd.DataFrame(np.full((6, 4), 2))
print(a)

   0  1  2  3
0  2  2  2  2
1  2  2  2  2
2  2  2  2  2
3  2  2  2  2
4  2  2  2  2
5  2  2  2  2


## Exercise 9
Repeat exercise A.2 using a DataFrame instead.

In [35]:
values = np.full((6, 4), 1)
np.fill_diagonal(values, 3)  # A.2 places 3s on the main diagonal
b = pd.DataFrame(values)
print(b)

   0  1  2  3
0  3  1  1  1
1  1  3  1  1
2  1  1  3  1
3  1  1  1  3
4  1  1  1  1
5  1  1  1  1


## Exercise 10
Repeat exercise A.3 using DataFrames instead.

In [36]:
a * b

   0  1  2  3
0  6  2  2  2
1  2  6  2  2
2  2  2  6  2
3  2  2  2  6
4  2  2  2  2
5  2  2  2  2


## Exercise 11
Repeat exercise A.7 using a dataframe.

In [44]:
def pd_where(df):
    # Compare every cell to 1 and count the True values; this also
    # returns 0 when no cell equals 1.
    return int((df == 1).to_numpy().sum())

df = pd.DataFrame([[3, 5, 1, 1], [1, 2, 6, 2]])
pd_where(df)

3
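The comparison `df == 1` produces a boolean DataFrame, and summing it along different axes gives counts per column, per row, or overall. The sketch below uses the same example frame. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[3, 5, 1, 1], [1, 2, 6, 2]])

mask = df == 1                    # DataFrame of booleans
per_column = mask.sum()           # Series: number of ones in each column
per_row = mask.sum(axis=1)        # Series: number of ones in each row
total = int(per_column.sum())     # scalar count over the whole frame

print(per_column.tolist())
print(per_row.tolist())
print(total)
```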

## Exercises 12-14
Now let's look at a real dataset, and talk about ``.loc``. For this exercise, we will use the popular Titanic dataset from Kaggle. Here is some sample code to read it into a dataframe.

In [12]:
titanic_df = pd.read_csv(
    "https://raw.githubusercontent.com/dlsun/data-science-book/master/data/titanic.csv"
)
titanic_df

Unnamed: 0,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest
0,1,1,"Allen, Miss. Elisabeth Walton",female,29.0000,0,0,24160,211.3375,B5,S,2,,"St Louis, MO"
1,1,1,"Allison, Master. Hudson Trevor",male,0.9167,1,2,113781,151.5500,C22 C26,S,11,,"Montreal, PQ / Chesterville, ON"
2,1,0,"Allison, Miss. Helen Loraine",female,2.0000,1,2,113781,151.5500,C22 C26,S,,,"Montreal, PQ / Chesterville, ON"
3,1,0,"Allison, Mr. Hudson Joshua Creighton",male,30.0000,1,2,113781,151.5500,C22 C26,S,,135.0,"Montreal, PQ / Chesterville, ON"
4,1,0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25.0000,1,2,113781,151.5500,C22 C26,S,,,"Montreal, PQ / Chesterville, ON"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1304,3,0,"Zabour, Miss. Hileni",female,14.5000,1,0,2665,14.4542,,C,,328.0,
1305,3,0,"Zabour, Miss. Thamine",female,,1,0,2665,14.4542,,C,,,
1306,3,0,"Zakarian, Mr. Mapriededer",male,26.5000,0,0,2656,7.2250,,C,,304.0,
1307,3,0,"Zakarian, Mr. Ortin",male,27.0000,0,0,2670,7.2250,,C,,,


Notice how we have nice headers and mixed datatypes? That is one of the reasons we might use Pandas. Please refresh your memory by looking at the 10 minutes to Pandas again, but then answer the following.

## Exercise 12
How do you select the ``name`` column without using .iloc?

In [13]:
titanic_df["name"]

0                         Allen, Miss. Elisabeth Walton
1                        Allison, Master. Hudson Trevor
2                          Allison, Miss. Helen Loraine
3                  Allison, Mr. Hudson Joshua Creighton
4       Allison, Mrs. Hudson J C (Bessie Waldo Daniels)
                             ...                       
1304                               Zabour, Miss. Hileni
1305                              Zabour, Miss. Thamine
1306                          Zakarian, Mr. Mapriededer
1307                                Zakarian, Mr. Ortin
1308                                 Zimmerman, Mr. Leo
Name: name, Length: 1309, dtype: object
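Bracket indexing is one of several label-based ways to pick a column. The sketch below compares a few of them on a tiny made-up frame: `people` and its rows are illustrative stand-ins, not data from the Titanic file.

```python
import pandas as pd

# Tiny made-up frame standing in for titanic_df; the rows are illustrative only.
people = pd.DataFrame({
    "name": ["Passenger A", "Passenger B", "Passenger C"],
    "sex": ["female", "male", "female"],
    "age": [29.0, 30.0, 25.0],
})

by_brackets = people["name"]      # a single label returns a Series
by_loc = people.loc[:, "name"]    # the same column through .loc
as_frame = people[["name"]]       # a list of labels returns a DataFrame

print(by_brackets.equals(by_loc))
print(type(by_brackets).__name__, type(as_frame).__name__)
```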

## Exercise 13
After setting the index to ``sex``, how do you select all passengers that are ``female``? And how many female passengers are there?

In [14]:
titanic_df.set_index('sex', inplace=True)
females = titanic_df.loc["female"]  # every row whose index label is "female"
print(len(females))                 # number of female passengers
females
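The same pattern can be tried on a small frame where the answer is easy to verify by eye. This is a sketch on made-up data: `people` and its rows are hypothetical stand-ins for the Titanic frame. It also shows a boolean-mask alternative that leaves the index untouched.

```python
import pandas as pd

# Made-up rows; only the column names mirror the Titanic data.
people = pd.DataFrame({
    "name": ["Passenger A", "Passenger B", "Passenger C"],
    "sex": ["female", "male", "female"],
    "age": [29.0, 30.0, 25.0],
})

indexed = people.set_index("sex")     # returns a new frame; people is unchanged
females = indexed.loc["female"]       # all rows whose index label is "female"
print(len(females))

# The same rows without changing the index, using a boolean mask.
females_mask = people[people["sex"] == "female"]
print(len(females_mask))

# Counts for every label at once.
counts = people["sex"].value_counts()
print(counts["female"], counts["male"])

restored = indexed.reset_index()      # "sex" becomes an ordinary column again
print(list(restored.columns))
```

Without `inplace=True`, `set_index` and `reset_index` return new frames, which avoids accidentally leaving the original frame in a modified state between cells.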

## Exercise 14
How do you reset the index?

In [20]:
titanic_df.reset_index(inplace=True)
titanic_df

Unnamed: 0,sex,pclass,survived,name,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest
0,female,1,1,"Allen, Miss. Elisabeth Walton",29.0000,0,0,24160,211.3375,B5,S,2,,"St Louis, MO"
1,male,1,1,"Allison, Master. Hudson Trevor",0.9167,1,2,113781,151.5500,C22 C26,S,11,,"Montreal, PQ / Chesterville, ON"
2,female,1,0,"Allison, Miss. Helen Loraine",2.0000,1,2,113781,151.5500,C22 C26,S,,,"Montreal, PQ / Chesterville, ON"
3,male,1,0,"Allison, Mr. Hudson Joshua Creighton",30.0000,1,2,113781,151.5500,C22 C26,S,,135.0,"Montreal, PQ / Chesterville, ON"
4,female,1,0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",25.0000,1,2,113781,151.5500,C22 C26,S,,,"Montreal, PQ / Chesterville, ON"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1304,female,3,0,"Zabour, Miss. Hileni",14.5000,1,0,2665,14.4542,,C,,328.0,
1305,female,3,0,"Zabour, Miss. Thamine",,1,0,2665,14.4542,,C,,,
1306,male,3,0,"Zakarian, Mr. Mapriededer",26.5000,0,0,2656,7.2250,,C,,304.0,
1307,male,3,0,"Zakarian, Mr. Ortin",27.0000,0,0,2670,7.2250,,C,,,
