# NumPy, Pandas, Scikit-learn

## NumPy: [NumPy](https://numpy.org/) [Quickstart](https://numpy.org/devdocs/user/quickstart.html)

Continuing from the last practical: go through the Quickstart and learn the basic NumPy commands. Why do you think we should use NumPy for further machine learning tasks?
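
One possible angle on the question above is vectorization: NumPy applies an operation to a whole array at once instead of looping element by element in Python. The sketch below compares a weighted sum written both ways. The `features` and `weights` values are made up purely for illustration.

```python
import numpy as np

# Hypothetical feature values and weights, chosen only for illustration.
features = [1.0, 2.0, 3.0, 4.0]
weights = [0.5, 0.25, 0.125, 0.125]

# Pure-Python weighted sum: an explicit loop over (feature, weight) pairs.
loop_result = sum(f * w for f, w in zip(features, weights))

# NumPy version: a single dot product over the two arrays.
numpy_result = float(np.dot(np.array(features), np.array(weights)))

print(loop_result, numpy_result)
```

Both versions compute the same number. The NumPy form stays a single expression no matter how long the arrays get, and this is the style most machine learning libraries expect as input.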

### Some more NumPy tasks

In [2]:
import numpy as np

#### Loops

In [4]:
# Use NumPy for loops: how can you loop over floating-point numbers in Python, e.g. (0.1, 0.2, 0.3, 0.4, ..., 1)?
# There are several ways to do this, but try it with np.arange.
import numpy as np

# 1) The Pythonic way of generating the floating-point numbers [0.1, 0.2, 0.3, 0.4, ..., 1]
numbers = [i / 10 for i in range(1, 11)]
print(numbers)
# 2) The NumPy way (the stop value 1.1 is exclusive, so the last element is 1.0)
numpy_numbers = np.arange(0.1, 1.1, 0.1)
print(numpy_numbers)

# The same pattern applies well beyond this example. Which version do you find more readable?

[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
[0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]
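
With a floating-point step, the number of elements `np.arange` returns depends on rounding, so the NumPy documentation suggests `np.linspace` for non-integer steps. A minimal sketch of that alternative for the same range, where you pass the number of points instead of the step size:

```python
import numpy as np

# np.linspace takes the number of points rather than a step,
# and includes the endpoint (1.0) by default.
values = np.linspace(0.1, 1.0, num=10)

for value in values:
    # Round only for display; the array itself is left unchanged.
    print(round(float(value), 1))
```

This is an alternative to the `np.arange` call in the cell above, not a replacement for the exercise.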


#### List operations

In [11]:
# Build the list named _list in the Pythonic way and in the NumPy way
_list = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]

# 1) Pythonic way
p_list = [[i, i + 1] for i in range(1, 11, 2)]
print(p_list)

# 2) NumPy way
n_list = np.arange(1, 11).reshape(5, 2)
print(n_list)

[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
[[ 1  2]
 [ 3  4]
 [ 5  6]
 [ 7  8]
 [ 9 10]]
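
The two results print alike but behave differently. The short sketch below shows that `reshape` can infer one dimension with `-1`, that `tolist()` converts back to nested Python lists, and that `*` means elementwise multiplication for an array but repetition for a list. The variable names are illustrative only.

```python
import numpy as np

# -1 lets NumPy infer the number of rows from the array size (10 / 2 = 5).
n_list = np.arange(1, 11).reshape(-1, 2)
print(n_list.shape)

# Convert back to plain nested Python lists.
as_python = n_list.tolist()
print(as_python)

# For an array, * multiplies every element.
doubled = n_list * 2
print(doubled)

# For a list, * repeats the list instead.
repeated = [[1, 2]] * 2
print(repeated)
```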


#### Matrix multiplication

In [21]:
# Write the Pythonic equivalent of the following NumPy expressions

#### Task 1
_list = [[2.1, 3.5, 7.2], [4.5, 6.6, 7.7]]
# 1) NumPy way

# NumPy transpose
_np_list = np.array(_list).T
print('Numpy transpose', _np_list)

# 2) Pythonic way for the NumPy transpose
# YOUR CODE...
# Collect the i-th element of every row to build the i-th row of the transpose
p_list = []
for i in range(len(_list[0])):
    p_list.append([row[i] for row in _list])

print("Python transpose: ", p_list)


#### Task 2
_list1 = [[2, 3, 7],
          [4, 6, 7]]
_list2 = [[3, 4],
          [7, 1],
          [1, 8]]
# 1) NumPy matrix multiplication
_np_mul = np.matmul(_list1, _list2)
print('\nNumpy mat mul:\n', _np_mul)

# 2) Pythonic way
# YOUR CODE...
# Start from a zero matrix with one row per row of _list1 and one column per column of _list2
result = [[0 for _ in range(len(_list2[0]))] for _ in range(len(_list1))]

# Perform the matrix multiplication: result[i][j] is the sum over k of _list1[i][k] * _list2[k][j]
for i in range(len(_list1)):
    for j in range(len(_list2[0])):
        for k in range(len(_list2)):
            result[i][j] += _list1[i][k] * _list2[k][j]

# Print the result
print("Result of matrix multiplication:")
for row in result:
    print(row)

Numpy transpose [[2.1 4.5]
 [3.5 6.6]
 [7.2 7.7]]
Python transpose:  [[2.1, 4.5], [3.5, 6.6], [7.2, 7.7]]

Numpy mat mul:
 [[34 67]
 [61 78]]
Result of matrix multiplication:
[34, 67]
[61, 78]
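
Both tasks can also be written more compactly. The sketch below uses `zip` for the pure-Python transpose and matrix product, and the `@` operator as NumPy shorthand for `np.matmul`. The helper names `transpose` and `matmul` are illustrative and not part of the exercise.

```python
import numpy as np


def transpose(matrix):
    """Transpose a list of rows; zip(*matrix) yields the columns."""
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    b_columns = transpose(b)
    # Each entry is the dot product of a row of a with a column of b.
    return [[sum(x * y for x, y in zip(row, column)) for column in b_columns]
            for row in a]


list1 = [[2, 3, 7], [4, 6, 7]]
list2 = [[3, 4], [7, 1], [1, 8]]

python_result = matmul(list1, list2)
numpy_result = np.array(list1) @ np.array(list2)

print(python_result)
print(numpy_result)
```

The triple loop in the cell above makes the index arithmetic explicit. The `zip` version hides it, which is shorter but arguably harder to read for a first implementation.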


## Pandas: [Pandas](https://pandas.pydata.org/) [Quickstart](https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html)

In [23]:
import pandas as pd

In [24]:
# Download the churn dataset from Kaggle: https://www.kaggle.com/datasets/shubh0799/churn-modelling?resource=download
# 1) Read the dataset using pandas
df = pd.read_csv("./Churn_Modelling.csv")

# 2) Print the columns of the pandas DataFrame
print("Columns of the DataFrame:")
print(df.columns)

# 3) Drop the column 'CustomerId'
df.drop(columns=['CustomerId'], inplace=True)

# 4) Print the values for the columns 'Gender', 'Age', 'Tenure', 'Balance' only
print("Values for selected columns:")
print(df[['Gender', 'Age', 'Tenure', 'Balance']])

# 5) Return only the rows where Geography == 'France' and columns 'Gender', 'Age', 'Tenure', 'Balance'
print("Rows where Geography is France and selected columns:")
print(df.loc[df['Geography'] == 'France', ['Gender', 'Age', 'Tenure', 'Balance']])

# 6) Group by the columns 'Geography' and 'Gender' and use the mean function to aggregate the churn rate ('Exited' column)
grouped_data = df.groupby(['Geography', 'Gender']).agg({'Exited': 'mean'})
print("Grouped data with mean churn rate:")
print(grouped_data)


Columns of the DataFrame:
Index(['RowNumber', 'CustomerId', 'Surname', 'CreditScore', 'Geography',
       'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard',
       'IsActiveMember', 'EstimatedSalary', 'Exited'],
      dtype='object')
Values for selected columns:
      Gender  Age  Tenure    Balance
0     Female   42       2       0.00
1     Female   41       1   83807.86
2     Female   42       8  159660.80
3     Female   39       1       0.00
4     Female   43       2  125510.82
...      ...  ...     ...        ...
9995    Male   39       5       0.00
9996    Male   35      10   57369.61
9997  Female   36       7       0.00
9998    Male   42       3   75075.31
9999  Female   28       4  130142.79

[10000 rows x 4 columns]
Rows where Geography is France and selected columns:
      Gender  Age  Tenure    Balance
0     Female   42       2       0.00
2     Female   42       8  159660.80
3     Female   39       1       0.00
6       Male   50       7       0.00
8       Male
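
To try the filtering and grouping steps without downloading the CSV, here is a minimal sketch on a tiny hand-made DataFrame. The rows are invented for illustration and are not taken from the Kaggle file. Only the column names match the dataset.

```python
import pandas as pd

# Made-up rows for illustration only; NOT values from Churn_Modelling.csv.
toy = pd.DataFrame({
    "Geography": ["France", "France", "Spain", "Spain", "France"],
    "Gender": ["Female", "Male", "Female", "Female", "Female"],
    "Age": [42, 50, 41, 39, 44],
    "Exited": [1, 0, 0, 1, 0],
})

# .loc selects rows (boolean mask) and columns (list of names) in one step.
france = toy.loc[toy["Geography"] == "France", ["Gender", "Age"]]
print(france)

# Mean of the 0/1 'Exited' column per group is the churn rate of that group.
churn_rate = toy.groupby(["Geography", "Gender"])["Exited"].mean()
print(churn_rate)
```

Because `Exited` only holds 0 and 1, its mean per group is the fraction of customers in that group who left.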

## Scikit-learn: [Scikit-learn](https://scikit-learn.org/stable/) [Quickstart](https://scikit-learn.org/stable/getting_started.html)

In [26]:
# We start with the k-nearest-neighbors (kNN) algorithm from sklearn
from sklearn.neighbors import KNeighborsClassifier

In [33]:
# Use the following values and labels to calculate the kNN
X = np.arange(0, 9).reshape(9,1)
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(X)
print(y)
# We know there are three clusters: values 0 to 2 have label 0, values 3 to 5 have label 1, and values 6 to 8 have label 2

[[0]
 [1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]
 [8]]
[0, 0, 0, 1, 1, 1, 2, 2, 2]


In [27]:
# Start the KNN algorithm with the values above
# YOUR CODE ...
knn = KNeighborsClassifier(n_neighbors=3)


In [34]:
# Fit the classifier on X and y, then predict the label for value 4
# YOUR CODE ...
knn.fit(X, y)
prediction = knn.predict([[4]])

print("Predicted label for value 4:", prediction)

Predicted label for value 4: [1]
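
To see why the prediction is 1, the idea behind kNN can be sketched by hand for this one-dimensional case: measure the distance from the query to every training value, take the `k` closest, and let their labels vote. The `knn_predict` helper below is a simplified illustration under those assumptions, not how scikit-learn implements it. Its tie-breaking in particular may differ from the library's.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, query, k=3):
    """Majority vote among the k nearest 1-D training values (illustrative only)."""
    # Absolute difference is the distance in one dimension.
    distances = np.abs(np.asarray(X_train, dtype=float).ravel() - query)
    # A stable sort keeps the original order among equally distant points.
    nearest = np.argsort(distances, kind="stable")[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


X = np.arange(0, 9).reshape(9, 1)
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]

print(knn_predict(X, y, 4))
```

For the query 4, the three nearest training values are 4, 3 and 5, all labeled 1, which matches the scikit-learn prediction above.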
