# 1. Numpy arrays

Python has many different types of data "containers": lists, dictionaries, tuples etc. However, none of them allows for efficient numerical calculation, particularly in multi-dimensional cases (think e.g. of operations on images). Numpy was developed to fill exactly this gap. It provides a new data structure, the **numpy array**, and a large library of operations that make it possible to:
- generate such arrays
- modify such arrays (projection, extraction of sub-arrays etc.)
- apply mathematical operations on them
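
As a minimal sketch of these three kinds of operations (the variable names and values are illustrative and not part of the course material), compared with the loop a plain list would need:

```python
import numpy as np

# Generate arrays
values = np.array([2, 5, 3, 9, 5, 2])   # from a Python list
grid = np.arange(6).reshape((2, 3))     # 0..5 arranged as 2 rows x 3 columns

# Modify / extract sub-arrays
first_three = values[:3]                # slice: elements 0, 1, 2
second_column = grid[:, 1]              # every row, column 1

# Apply mathematical operations element-wise, without an explicit loop
doubled = values * 2
total = values.sum()

# The plain-list version needs a loop (here a list comprehension)
doubled_list = [v * 2 for v in [2, 5, 3, 9, 5, 2]]

print(first_three.tolist())    # [2, 5, 3]
print(second_column.tolist())  # [1, 4]
print(doubled.tolist())        # [4, 10, 6, 18, 10, 4]
print(int(total))              # 26
```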

Numpy is the base of almost the entire Python scientific programming stack. Many libraries build on top of Numpy, either by providing specialized functions that operate on its arrays (e.g. scikit-image for image processing) or by creating more complex data containers on top of it. The data science library Pandas, which is also presented in this course, is a good example of the latter with its dataframe structures.

In [None]:
import numpy as np

In [None]:
mylist = [2,5,3,9,5,2]
mylist

In [None]:
type(mylist)

In [None]:
myarray = np.array(mylist)
myarray

In [None]:
type(myarray)

### Array Type
Just like regular variables in Python, arrays receive a type when they are created. Unlike a regular list, **all** elements of an array always have the same type. The type of an array can be read from its ```.dtype``` attribute:

In [None]:
myarray.dtype

In [None]:
myarray2 = np.array([1.2, 6, 7.6, 5])
myarray2, myarray2.dtype
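
Because an array holds a single type, changing that type means converting the whole array. The sketch below uses `astype` and the `dtype` argument of `np.array`, both standard Numpy calls; the variable names are chosen here for illustration.

```python
import numpy as np

float_array = np.array([1.2, 6, 7.6, 5])   # one float makes the whole array float
print(float_array.dtype)                   # float64

# astype returns a new array; converting to int drops the fractional part
int_array = float_array.astype(int)
print(int_array.tolist())                  # [1, 6, 7, 5]

# The type can also be chosen when the array is created
forced = np.array([1, 2, 3], dtype=float)
print(forced.tolist())                     # [1.0, 2.0, 3.0]
```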

In [None]:
myarray.shape

In [None]:
my2d_list = [[1,2,3], [4,5,6]]

my2d_array = np.array(my2d_list)
my2d_array

In [None]:
my2d_array.shape

In [None]:
one_array = np.ones((2,3))
one_array

In [None]:
zero_array = np.zeros((2,3))
zero_array

### Dot
**Linear Algebra Matrix Multiplication**

Let's write the product of a matrix of inputs and a vector of weights in matrix form:

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}=\begin{bmatrix} x_{01} & x_1 \\ x_{02} & x_2 \\ \vdots & \vdots \\ x_{0n} & x_n \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \end{bmatrix}$$

Rules:

$$y_1=(x_{01} \cdot w_0)+(x_1 \cdot w_1)$$

$$y_2=(x_{02} \cdot w_0)+(x_2 \cdot w_1)$$

$$\dots$$

$$y_n=(x_{0n} \cdot w_0)+(x_n \cdot w_1)$$
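
The row-by-row rule above can be checked against `np.dot` with an explicit loop. This is only a sketch with made-up numbers; `manual` is a name introduced here for illustration.

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [1.0, 0.5],
              [1.0, 4.0]])      # first column plays the role of x_0
w = np.array([3.0, 2.0])        # w_0, w_1

# Apply y_i = x_0i * w_0 + x_i * w_1 one row at a time
manual = np.array([row[0] * w[0] + row[1] * w[1] for row in X])

print(manual.tolist())                    # [7.0, 4.0, 11.0]
print(np.allclose(manual, np.dot(X, w)))  # True
```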

In [None]:
X = np.array([[1, 5], [1, 3], [1, 4]])
w = np.array([2, 3])

# Compute the dot product between 'X' and 'w'
# For each row i this calculates the predicted value: X[i, 0]*w[0] + X[i, 1]*w[1]
# The result is an array with one prediction per row of X

# y1=(1⋅2)+(5⋅3)=2+15=17
# y2=(1⋅2)+(3⋅3)=2+9=11
# y3=(1⋅2)+(4⋅3)=2+12=14

y = np.dot(X, w)
y

In [None]:
A = np.arange(12).reshape((3,4))
B = np.arange(4)
A, B

### <span style="color:red">Please add your comments in here</span>.

In [None]:
np.dot(A,B)
# Add your comments
# Please explain how to use np.dot to perform matrix multiplication for A and B.
# Provide a step-by-step explanation of the calculation process as above.


In [None]:
A @ B
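
The `@` operator is Python's matrix-multiplication operator, and for 1-D and 2-D arrays it gives the same result as `np.dot`. A small sketch with arrays chosen here for illustration (deliberately different from `A` and `B` above):

```python
import numpy as np

M = np.array([[1, 2],
              [3, 4]])
v = np.array([10, 20])
N = np.array([[0, 1],
              [1, 0]])

# Matrix times vector: shapes (2, 2) @ (2,) -> (2,)
print((M @ v).tolist())                     # [50, 110]

# Matrix times matrix: shapes (2, 2) @ (2, 2) -> (2, 2)
print((M @ N).tolist())                     # [[2, 1], [4, 3]]

# For 1-D and 2-D arrays np.dot and @ agree
print(np.array_equal(M @ N, np.dot(M, N)))  # True
```

The inner dimensions must match: the number of columns of the left operand has to equal the length (or number of rows) of the right operand.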

### Dictionary 

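A Python dictionary is a built-in data structure used to store data as key-value pairs. It is highly efficient for looking up, adding, and deleting data by key. The example below illustrates the main rule for keys: any hashable (immutable) object, such as a number, a string, or a tuple of such values, can be a key, whereas a mutable object such as a list cannot: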

In [None]:
my_dict = {1: "value", "key": 42, (2, 3): "tuple_key_value"}
# Invalid: my_dict = {[1, 2]: "value"}  # Lists cannot be keys

In [None]:
my_dict[1]

In [None]:
my_dict["key"]

In [None]:
my_dict[(2,3)]
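
Looking up is shown above; adding, overwriting, and deleting by key work as in the following sketch. The dictionary `inventory` and its contents are made up for illustration.

```python
inventory = {"apples": 3, "pears": 5}

inventory["plums"] = 7            # add a new key-value pair
inventory["apples"] = 4           # overwrite the value of an existing key
del inventory["pears"]            # delete by key

print("pears" in inventory)       # False
print(inventory.get("kiwis", 0))  # 0: default returned for a missing key

# Keys must be hashable: a list is rejected with a TypeError
try:
    bad = {[1, 2]: "value"}
except TypeError as err:
    print("TypeError:", err)
```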

# 2. Pandas

In [None]:
import pandas as pd
myDict = {'k1': ['one', 'two'] * 3 + ['two'],
         'k2': [1,1,2,3,3,4,4]}

myDict

In [None]:
import pandas as pd
data = pd.DataFrame(myDict)
data

### Renaming Axis Indexes
#### Just like values, axis labels can also be transformed by a function or mapping to produce differently labeled objects.
#### Axes can also be modified in place, without creating a new data structure.

In [None]:
data = pd.DataFrame(np.arange(12).reshape((3,4)),
                   index = ['Ohio', 'Indiana', 'New York'],
                   columns = ['one', 'two', 'three', 'four'])

In [None]:
np.arange(12).reshape((3,4))

In [None]:
data
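
The cells above only build the DataFrame. One way to actually relabel its axes is sketched below with `DataFrame.rename` and `Index.map`, both standard pandas calls; the course may have a different approach in mind, and the new labels are chosen here for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(12).reshape((3, 4)),
                    index=['Ohio', 'Indiana', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# rename returns a new DataFrame; a function is applied to every label
renamed = data.rename(index=str.upper, columns=str.title)
print(renamed.index.tolist())    # ['OHIO', 'INDIANA', 'NEW YORK']
print(renamed.columns.tolist())  # ['One', 'Two', 'Three', 'Four']

# A dict (mapping) changes only the labels it names
partial = data.rename(index={'Ohio': 'OH'}, columns={'three': 'third'})
print(partial.index.tolist())    # ['OH', 'Indiana', 'New York']

# Assigning to .index relabels the existing object instead of creating a new one
data.index = data.index.map(lambda label: label[:4].upper())
print(data.index.tolist())       # ['OHIO', 'INDI', 'NEW ']
```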

### Read CSV

In [None]:
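# 'out.csv' must already exist in the working directory (the "Save CSV" cell below shows how it can be created)
df = pd.read_csv('out.csv')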
df

In [None]:
df.columns

In [None]:
df[['Age','Height(in cm)']]

### Save CSV

In [None]:
# df = pd.DataFrame({ 
#     'Name': ['John', 'Sammy', 'Joe'], 
#     'Age': [45, 38, 90], 
#     'Height(in cm)': [150, 180, 160] 
# }) 
# df.to_csv('out.csv', index=False)
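
A self-contained round trip (save, then read back) can be sketched as follows. It writes to a temporary directory so that no existing file is overwritten; the file name `people.csv` and the variable names are hypothetical.

```python
import os
import tempfile

import pandas as pd

people = pd.DataFrame({
    'Name': ['John', 'Sammy', 'Joe'],
    'Age': [45, 38, 90],
    'Height(in cm)': [150, 180, 160],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'people.csv')   # hypothetical file name
    people.to_csv(path, index=False)             # index=False: do not write the row labels
    restored = pd.read_csv(path)                 # read the file back into a new DataFrame

print(restored.columns.tolist())   # ['Name', 'Age', 'Height(in cm)']
print(restored['Age'].tolist())    # [45, 38, 90]
```

Without `index=False`, the row labels are written as an extra unnamed first column, which then shows up as `Unnamed: 0` when the file is read back.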


### <span style="color:red">Please add your comments in here</span>.

In [None]:
# Please save your full name, student ID, and hometown into a CSV file, name the file after your first name, and upload it to GitHub.

# 3. Visualization
### Pandas Visualization

In [None]:
df.plot(x="Name", y=["Age", "Height(in cm)"], kind="bar") 

In [None]:
# plotting Height 
ax = df.plot(x="Name", y="Height(in cm)", kind="bar") 
# plotting age on the same axis 
df.plot(x="Name", y="Age", kind="bar",ax=ax, color="red") 

### matplotlib.pyplot Visualization

In [None]:
import matplotlib.pyplot as plt

In [None]:
names = ['John', 'Sammy', 'Joe']
ages = [45, 38, 90]
heights = [150, 180, 160] 

# X positions for bars
x = np.arange(len(names))
width = 0.35  # Width of the bars

# Plotting the bars
fig, ax = plt.subplots()
ax.bar(x - width/2, ages, width, label='Age')
ax.bar(x + width/2, heights, width, label='Height (in cm)')

# Adding labels and title
ax.set_xlabel('Name')
ax.set_ylabel('Values')
ax.set_title('Age and Height Comparison')
ax.set_xticks(x)
ax.set_xticklabels(names)
ax.legend(loc='upper left')

plt.show()

In [None]:
names = ['John', 'Sammy', 'Joe']
ages = [45, 38, 90]
heights = [150, 180, 160] 

# X positions for bars
x = np.arange(len(names))

# Plotting overlaid bars: both series start at zero and share the same x positions,
# so the red bars are drawn on top of the blue ones
fig, ax = plt.subplots()

ax.bar(x, heights, width=0.4, label='Height (in cm)', color='blue')
ax.bar(x, ages, width=0.4, label='Age', color='red')  # pass bottom=heights to stack instead


# Adding labels and title
ax.set_xlabel('Name')
ax.set_ylabel('Values')
ax.set_title('Age and Height Comparison (Overlaid)')
ax.set_xticks(x)
ax.set_xticklabels(names)
ax.legend()

plt.show()
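
For comparison, a truly stacked version can be sketched with the `bottom` argument of `ax.bar`, which shifts the start of each bar. The names `lower`, `upper`, and `tops` are introduced here for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

names = ['John', 'Sammy', 'Joe']
ages = [45, 38, 90]
heights = [150, 180, 160]
x = np.arange(len(names))

fig, ax = plt.subplots()
lower = ax.bar(x, heights, width=0.4, label='Height (in cm)', color='blue')
# bottom=heights starts each red bar where the blue bar ends
upper = ax.bar(x, ages, width=0.4, bottom=heights, label='Age', color='red')

ax.set_xticks(x)
ax.set_xticklabels(names)
ax.legend()

# Top edge of each stacked column = height + age
tops = [bar.get_y() + bar.get_height() for bar in upper]
print(tops)
```

When running this as a script rather than in a notebook, finish with `plt.show()` to display the figure.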

In [None]:
# Convert names to numerical values for plotting
x = np.arange(len(names))

# Plotting both series as line plots
fig, ax = plt.subplots()
ax.plot(x, heights, marker='o', label='Height (in cm)', color='blue', linestyle='--')
ax.plot(x, ages, marker='s', label='Age', color='red', linestyle='-')

# Adding labels and title
ax.set_xlabel('Name')
ax.set_ylabel('Values')
ax.set_title('Line Plot of Age and Height')
ax.set_xticks(x)
ax.set_xticklabels(names)
ax.legend()

plt.show()