# NumPy

Use the slides and the NumPy documentation to answer each question: https://numpy.org/doc/stable/index.html

We start this section with a demo of NumPy code (commented where appropriate), then challenge you to apply the concepts by writing your own code and answering questions.

In [198]:
import numpy as np 

# create a numpy array of 14 numbers
# https://numpy.org/doc/stable/reference/generated/numpy.array.html
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
arr

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [199]:
# print out # of dimensions
print("DIMENSION", arr.ndim)

# print out shape
print("SHAPE", arr.shape)

# print out type
print("DTYPE", arr.dtype)

DIMENSION 1
SHAPE (14,)
DTYPE int32
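The `int32` reported above is not universal: the default integer width depends on the platform and the NumPy version, and many installs report `int64` instead. A minimal sketch, assuming you want the same dtype everywhere, is to request it explicitly (the array names here are illustrative only):

```python
import numpy as np

# Request an explicit integer width instead of relying on the platform default.
arr64 = np.array([1, 2, 3], dtype=np.int64)
arr32 = np.array([1, 2, 3], dtype=np.int32)

print(arr64.dtype, arr64.itemsize)  # int64 8
print(arr32.dtype, arr32.itemsize)  # int32 4

# astype returns a converted copy; the original array keeps its dtype.
converted = arr32.astype(np.int64)
print(converted.dtype)              # int64
```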


In [200]:
test_arr = np.arange(5)
test_arr

array([0, 1, 2, 3, 4])

In [201]:
test_arr = np.arange(5, 15, 3)
test_arr

array([ 5,  8, 11, 14])

In [202]:
test_arr = np.array([(1, 2, 3), (4, 5, 6)])
test_arr

array([[1, 2, 3],
       [4, 5, 6]])

In [203]:
test_arr = np.zeros((5, 5))
test_arr

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [204]:
# rearrange array into 7 rows of 2 columns
new_arr = arr.reshape(7, 2)
new_arr

array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10],
       [11, 12],
       [13, 14]])
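The reshape above matters for the cells that follow: for a contiguous array such as `arr`, `reshape` normally returns a view, so writes to the reshaped array also show up in the original. The sketch below illustrates this with hypothetical names (`base`, `grid`) and also shows `-1`, which asks NumPy to infer one dimension:

```python
import numpy as np

base = np.arange(1, 15)          # 14 values: 1..14
grid = base.reshape(-1, 2)       # -1 lets NumPy infer the row count (7)
print(grid.shape)                # (7, 2)

# For a contiguous array like this one, reshape returns a view of the same data.
print(np.shares_memory(base, grid))  # True

grid[0, 0] = 100                 # writing through the view...
print(base[0])                   # ...changes the original: 100
```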

In [205]:
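# boolean indexing returns a copy, so x keeps the original even values
x = new_arr[new_arr % 2 == 0]
# assigning through the mask edits new_arr in place (and arr, since new_arr is a view of it)
new_arr[new_arr % 2 == 0] = 1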
print(x)

[ 2  4  6  8 10 12 14]
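The printed `x` still holds the original even numbers because selecting with a boolean mask copies the matching elements, whereas assigning through the same mask writes into the array itself. A small sketch with made-up values:

```python
import numpy as np

values = np.array([[1, 2], [3, 4]])
mask = values % 2 == 0

evens = values[mask]        # boolean indexing copies the selected elements
values[mask] = -1           # assignment through the mask edits `values` in place

print(evens)                # [2 4]
print(values.tolist())      # [[1, -1], [3, -1]]
print(np.shares_memory(values, evens))  # False
```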


In [206]:
# apply simple maths
mult_arr = new_arr * 2
mult_arr

array([[ 2,  2],
       [ 6,  2],
       [10,  2],
       [14,  2],
       [18,  2],
       [22,  2],
       [26,  2]])

In [207]:
mult_arr.dtype

dtype('int32')

In [208]:
div_arr = new_arr / 2
div_arr

array([[0.5, 0.5],
       [1.5, 0.5],
       [2.5, 0.5],
       [3.5, 0.5],
       [4.5, 0.5],
       [5.5, 0.5],
       [6.5, 0.5]])

In [209]:
div_arr.dtype

dtype('float64')
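The dtype change comes from the operator, not the data: `/` is true division and always yields floats, while `//` (floor division) keeps integer inputs as integers. A short sketch with an illustrative array:

```python
import numpy as np

ints = np.array([1, 2, 3, 4])

true_div = ints / 2     # true division always produces floats
floor_div = ints // 2   # floor division of integers stays integer

print(true_div.dtype)                               # float64
print(floor_div.tolist())                           # [0, 1, 1, 2]
print(np.issubdtype(floor_div.dtype, np.integer))   # True
```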

In [210]:
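# apply slicing: a step of -1 reverses the row order
# (named reversed_arr to avoid shadowing the built-in `reversed`)
reversed_arr = mult_arr[::-1]
reversed_arr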

array([[26,  2],
       [22,  2],
       [18,  2],
       [14,  2],
       [10,  2],
       [ 6,  2],
       [ 2,  2]])

In [211]:
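# get a view of the first three elements
# (arr[1] is already 1 because new_arr, a view of arr, was modified above)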
slice = arr[0:3]
slice

array([1, 1, 3])

In [212]:
# modify original array
arr[0:3] = [3, 2, 1]
arr

array([ 3,  2,  1,  1,  5,  1,  7,  1,  9,  1, 11,  1, 13,  1])

In [213]:
# print out view
slice

array([3, 2, 1])
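The slice tracked the change because basic slicing returns a view onto the original buffer. If an independent snapshot is wanted, one option is an explicit `.copy()`. A minimal sketch with hypothetical names:

```python
import numpy as np

data = np.arange(6)            # [0 1 2 3 4 5]
window = data[0:3]             # basic slice: a view onto `data`
snapshot = data[0:3].copy()    # explicit copy: independent storage

data[0:3] = [9, 8, 7]

print(window)                  # [9 8 7]  (follows the original)
print(snapshot)                # [0 1 2]  (unchanged)
print(window.base is data)     # True
```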

In [214]:
# boolean index & save the result (a copy)
greater_5 = arr[arr > 5]
greater_5

array([ 7,  9, 11, 13])

In [215]:
# assign through the boolean index (modifies arr in place)
arr[arr > 5] = 0
arr

array([3, 2, 1, 1, 5, 1, 0, 1, 0, 1, 0, 1, 0, 1])

In [216]:
greater_5

array([ 7,  9, 11, 13])

## Q1

Questions 1 through 5 test your conceptual understanding of NumPy. We will work with a hypothetical NumPy array created via:

```python
test_arr = np.array([(2, 4, 6), (8, 10, 12)])
```

Answer these questions without simulating (running) code.

What will be the output of `test_arr.shape`?

write answer here

## Q2

What will be the output of:

```python
test_arr * 1.5
```

write answer here

## Q3

What will be the data type of this new ndarray?

```python
test_arr * 1.5
```

write answer here

## Q4

What will be the output of:

```python
x = test_arr[0:2]
test_arr[0:2] = np.array([(3, 5, 7), (9, 11, 13)])
print(x)
```

write answer here

## Q5

What will be the output of:

```python
x = test_arr[test_arr % 2 == 1]
test_arr[test_arr % 2 == 1] = 2
print(x)
```

write answer here

# Pandas

Use the slides and the pandas documentation to answer each question: https://pandas.pydata.org/docs/

We start this section with a demo of pandas code (commented where appropriate), then challenge you to apply the concepts by writing your own code and answering questions.

In [322]:
import pandas as pd

# create a dataframe of coffee products from a specific warehouse
df = pd.DataFrame({"type": ["Arabica", "Robusta", "Excelsa", "Liberica"], "price_per_kg": [5.07, 2.12, 4.38, 3.12], "kgs": ["5,000", "10,000", "5,000", "5,000"]})

df.head()

       type  price_per_kg    kgs
0   Arabica          5.07   5000
1   Robusta          2.12  10000
2   Excelsa          4.38   5000
3  Liberica          3.12   5000


In [323]:
# set the index to the coffee type so rows can be selected by label
df.set_index('type', inplace=True)
df.head()

          price_per_kg    kgs
type
Arabica           5.07   5000
Robusta           2.12  10000
Excelsa           4.38   5000
Liberica          3.12   5000


In [324]:
# view the shape
df.shape

(4, 2)

In [325]:
# view basic information including nulls & dtypes
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, Arabica to Liberica
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   price_per_kg  4 non-null      float64
 1   kgs           4 non-null      object 
dtypes: float64(1), object(1)
memory usage: 96.0+ bytes


In [326]:
# summary statistics for the quantitative (numeric) columns
df.describe()

       price_per_kg
count           4.0
mean         3.6725
std         1.31264
min            2.12
25%            2.87
50%            3.75
75%          4.5525
max            5.07


In [327]:
# deflation! adjust the price of Arabica coffee
# first, select the first two rows by position
# https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html
pop_bean = df.iloc[0:2]
pop_bean

         price_per_kg    kgs
type
Arabica          5.07   5000
Robusta          2.12  10000


In [328]:
df.iloc[0, 0] = 4.98  
df

          price_per_kg    kgs
type
Arabica           4.98   5000
Robusta           2.12  10000
Excelsa           4.38   5000
Liberica          3.12   5000


In [329]:
pop_bean

         price_per_kg    kgs
type
Arabica          4.98   5000
Robusta          2.12  10000


In [330]:
# what happens if we assign a string to a numeric column?
pop_bean = df.iloc[0:2]
pop_bean

         price_per_kg    kgs
type
Arabica          4.98   5000
Robusta          2.12  10000


In [331]:
df.iloc[0, 0] = "a"
df

          price_per_kg    kgs
type
Arabica              a   5000
Robusta           2.12  10000
Excelsa           4.38   5000
Liberica          3.12   5000


In [332]:
pop_bean

         price_per_kg    kgs
type
Arabica          4.98   5000
Robusta          2.12  10000


In [333]:
# chained indexing: df.iloc[0, :] may return a copy, so the assignment
# is not guaranteed to reach df (it might! but not guaranteed)
# avoid this
df.iloc[0, :]["price_per_kg"] = 4.98
df

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.iloc[0, :]["price_per_kg"] = 4.98


          price_per_kg    kgs
type
Arabica              a   5000
Robusta           2.12  10000
Excelsa           4.38   5000
Liberica          3.12   5000


In [334]:
# a better approach: select the row and column in a single .loc call
df.loc["Arabica", "price_per_kg"] = 4.98
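The `.loc` form is more reliable because the row and the column are resolved in one indexing call on the original frame, rather than through an intermediate object. The sketch below uses a small hypothetical frame (`prices`) that mirrors the coffee example, and also shows a positional equivalent via `columns.get_loc`:

```python
import pandas as pd

# Hypothetical frame mirroring the coffee example above.
prices = pd.DataFrame(
    {"price_per_kg": [5.07, 2.12]},
    index=pd.Index(["Arabica", "Robusta"], name="type"),
)

# One indexing call: row label and column label together.
prices.loc["Arabica", "price_per_kg"] = 4.98

# The positional equivalent, looking up the column position by name.
col = prices.columns.get_loc("price_per_kg")
prices.iloc[1, col] = 2.05

print(prices)
```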

In [335]:
# step 1: strip the thousands separators (the values are still strings)
df["kgs"] = df["kgs"].apply(lambda x: x.replace(",", ""))
df

          price_per_kg    kgs
type
Arabica           4.98   5000
Robusta           2.12  10000
Excelsa           4.38   5000
Liberica          3.12   5000


In [336]:
# step 2: convert the cleaned strings to integers
df["kgs"] = df["kgs"].astype(int)
df

          price_per_kg    kgs
type
Arabica           4.98   5000
Robusta           2.12  10000
Excelsa           4.38   5000
Liberica          3.12   5000
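The two-step conversion above can also be written with vectorized string methods instead of `apply` with a lambda. This is one possible alternative rather than the required approach; the `stock` frame is a made-up stand-in for the `kgs` column:

```python
import pandas as pd

stock = pd.DataFrame({"kgs": ["5,000", "10,000", "5,000"]})

# .str.replace works on the whole column; regex=False treats "," literally.
cleaned = stock["kgs"].str.replace(",", "", regex=False)

# pd.to_numeric parses the strings; whole numbers come back as an integer dtype.
stock["kgs"] = pd.to_numeric(cleaned)

print(stock["kgs"].tolist())   # [5000, 10000, 5000]
print(pd.api.types.is_integer_dtype(stock["kgs"]))  # True
```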


In [337]:
df.describe()

           kgs
count      4.0
mean    6250.0
std     2500.0
min     5000.0
25%     5000.0
50%     5000.0
75%     6250.0
max    10000.0


In [338]:
# `price_per_kg` is missing from describe(): assigning "a" earlier turned the
# column into object dtype, and it stayed object after the value was reset
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, Arabica to Liberica
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   price_per_kg  4 non-null      object
 1   kgs           4 non-null      int32 
dtypes: int32(1), object(1)
memory usage: 252.0+ bytes
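Once a column has become `object`, it does not switch back on its own even if every value is numeric again. A hedged sketch of two ways to restore a float dtype, using hypothetical Series rather than the frame above: `astype(float)` when all values are valid numbers, and `pd.to_numeric(..., errors="coerce")` when stray strings may remain.

```python
import pandas as pd

# All values are numeric, but the dtype is still object.
prices = pd.Series([4.98, 2.12, 4.38, 3.12], dtype=object)
as_float = prices.astype(float)
print(as_float.dtype)            # float64

# A stray string is still present: coerce it to NaN instead of raising.
messy = pd.Series([4.98, 2.12, "a", 3.12], dtype=object)
numeric = pd.to_numeric(messy, errors="coerce")
print(numeric.dtype)             # float64
print(int(numeric.isna().sum())) # 1
```

After either conversion, `describe()` would include the column again, since it is numeric.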


## Q6

Questions 6 through 8 test your conceptual understanding of pandas. We will work with a hypothetical pandas DataFrame created via:

```python
df_test = pd.DataFrame({"order_id": [1, 2, 3, 4, 5], "customer_name": ["Bob", "Bob", "Yazmin", "Meena", "Bob"], "order_desc": ["peanuts 35 oz", "peanuts 35 oz", "humidifier", "sleeping blindfold", "peanuts 100 oz"]})
df_test.set_index("order_id")
```

Answer these questions without simulating (running) code.

What will be the output of `df_test.shape`?

write answer here

## Q7

What will be the output of:

```python
df_test.iloc[0:2, :]["customer_name"]
```

write answer here

## Q8

What will be the output of:

```python
df_test.iloc[0:2, :]["order_desc"] = "cashews 35 oz"
```

write answer here

## Q9


What would be a better way to write the code in Q8?

write answer here