![ALT_TEXT_FOR_SCREEN_READERS](./header.png)

# Exercise 0: Start with Python

Python is a widely used programming language in machine learning.
Its libraries implement almost all established machine learning methods.
It therefore makes sense to learn Python together with the two most used libraries, NumPy and pandas.

## Preparation Courses

Work through the following three online tutorials on Python, pandas, and NumPy.

- https://www.w3schools.com/python/default.asp 
- https://www.w3schools.com/python/pandas/default.asp
- https://www.w3schools.com/python/numpy/default.asp



## Considerations

- With Python, focus on control flow (if, loops, functions) and data types (values, lists, dictionaries).
- With pandas, focus on import and export, creation of DataFrames, merging, and slicing. It is important to understand the different ways of accessing the data (e.g. .loc) and the meaning of the bracket operators.
- With NumPy, focus on creating arrays, multidimensional arrays, merging of data, transformations, and slicing.
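
To illustrate the access styles mentioned for pandas, here is a minimal sketch on made-up toy data (the DataFrame `toy` and its values are assumptions, not part of the assignment). It contrasts the bracket operator with label-based `.loc` and position-based `.iloc`.

```python
import pandas as pd

# Hypothetical toy data, not part of the assignment
toy = pd.DataFrame(
    {"fruit": ["apple", "pear", "plum"], "price": [0.5, 0.7, 0.9]},
    index=["a", "b", "c"],
)

one_column = toy["fruit"]               # column name: one column as a Series
two_columns = toy[["fruit", "price"]]   # list of names: a DataFrame with these columns
expensive = toy[toy["price"] > 0.6]     # boolean Series: rows where the condition is True

by_label = toy.loc["b", "fruit"]        # .loc: index label and column name -> "pear"
by_position = toy.iloc[1, 0]            # .iloc: integer row and column position -> "pear"
label_slice = toy.loc["a":"b"]          # label slices include the end label (2 rows)
position_slice = toy.iloc[0:1]          # position slices exclude the end position (1 row)
```

The main point to take away is that the same pair of brackets behaves differently depending on what is passed in: a name, a list of names, or a boolean mask.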

## Assignment

The following notebook contains 10 tasks to be solved using Python, pandas, and NumPy.

# Task 1: Basics of Pandas - Reading CSV and Basic Data Inspection

Understand how to read data from a CSV file and perform basic data inspection.

- Load a CSV file into a Pandas DataFrame.
- Display the first 5 rows.
- Print the summary information of the DataFrame.
- Show basic statistics (mean, median, min, max) of numerical columns.

In [None]:
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [23, 25, 30, 22, 28],
    "Height_cm": [165, 170, 180, 160, 175],
    "Weight_kg": [55, 70, 80, 50, 68]
}

# Create a pandas dataframe from dictionaries
df = pd.DataFrame(data)

# Save to CSV file
df.to_csv("students.csv", index=False)

In [None]:
# your code goes here


# Task 2: Handling Date Columns - Loading CSV and Date Conversion

Learn how to handle date columns and convert them into Pandas datetime objects.

- Load a CSV file containing a column with dates.
- Convert the date column to a Pandas datetime object.
- Extract the year, month, and day from the date column.
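
As a rough sketch of the conversion step on made-up data (the `events` DataFrame and its column names are assumptions), `pd.to_datetime` turns a column of date strings into datetime values, and the `.dt` accessor then exposes the individual components.

```python
import pandas as pd

# Hypothetical toy data, not the assignment file
events = pd.DataFrame({"event": ["start", "end"], "when": ["2024-03-01", "2024-11-30"]})

# Convert the string column to pandas datetime values
events["when"] = pd.to_datetime(events["when"])

# The .dt accessor gives access to the parts of each date
events["year"] = events["when"].dt.year
events["month"] = events["when"].dt.month
events["day"] = events["when"].dt.day
```

When the data comes from a file, `pd.read_csv` also accepts a `parse_dates` argument that performs the conversion while loading.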


In [None]:
data_dates = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Join_Date": ["2022-01-15", "2021-12-12", "2020-06-10", "2019-11-11", "2023-07-22"]
}

df = pd.DataFrame(data_dates)
df.to_csv("students_dates.csv", index=False)

In [None]:
# your code goes here


# Task 3: Column Selection and Row Filtering

Practice selecting specific columns and filtering rows based on conditions.

- Select and display the Name and Age columns from the students DataFrame of Task 1 (reload students.csv if df has been overwritten in Task 2).
- Keep only the students who are older than 25 years and display the result.

In [None]:
# your code goes here


# Task 4: Slicing and Indexing Rows

Learn how to slice rows and use different types of indexing.

- Display the first 3 rows of the DataFrame.
- Use .iloc to select the 2nd and 4th rows.
- Use .loc to select the rows where "Name" is "Charlie" or "Eva".

In [None]:
# your code goes here


# Task 5: Handling Missing Values - Replacement and Removal

Understand how to detect, replace, and remove missing values.

- Take the generated DataFrame, which contains some missing values.
- Replace the missing values in the numerical columns with the respective column's mean.
- Drop the rows that have any missing values.
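
The detect, replace, and remove steps could look roughly like this on a small made-up table (the names `table`, `label`, and `value` are assumptions for illustration only).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing label and one missing number
table = pd.DataFrame({"label": ["x", None, "z"], "value": [1.0, np.nan, 3.0]})

# Detect: count the missing entries per column
missing_per_column = table.isna().sum()

# Replace: fill the numeric column with its mean (mean() skips NaN by default)
filled = table.fillna({"value": table["value"].mean()})

# Remove: drop every row that still contains a missing entry
complete = filled.dropna()
```

Note that the mean can only fill numerical columns, so the row with the missing label survives the replacement step and is removed only by `dropna`.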

In [None]:
import numpy as np

data_missing = {
    "Name": ["Alice", "Bob", np.nan, "David", "Eva"],
    "Age": [23, np.nan, 30, 22, np.nan],
    "Height_cm": [165, 170, 180, np.nan, 175]
}

df = pd.DataFrame(data_missing)

In [None]:
# your code goes here


# Task 6: Data Concatenation and Merging

Learn how to concatenate multiple DataFrames and merge them based on a common column.

- Take the three generated DataFrames: students, students_2, and scores.
- Concatenate students and scores along the columns.
- Concatenate students and students_2 along the rows.
- Merge the result of the row-wise concatenation with scores on their common column.
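
The difference between concatenating and merging is easy to miss, so here is a hedged sketch on made-up tables (`left`, `more`, and `prices` are assumed names, not the assignment data). Concatenation glues frames together by position or index, while a merge matches rows by the values of a key column.

```python
import pandas as pd

# Hypothetical toy data
left = pd.DataFrame({"key": [1, 2], "fruit": ["apple", "pear"]})
more = pd.DataFrame({"key": [3], "fruit": ["plum"]})
prices = pd.DataFrame({"key": [1, 3], "price": [0.5, 0.8]})

# Along the columns: rows are aligned by index, so "key" appears twice
side_by_side = pd.concat([left, prices], axis=1)

# Along the rows: frames are stacked; ignore_index renumbers the rows 0..2
stacked = pd.concat([left, more], ignore_index=True)

# Merge: rows are matched by the value of "key" (inner join by default)
inner = stacked.merge(prices, on="key")

# A left join keeps all rows of stacked; unmatched prices become NaN
left_join = stacked.merge(prices, on="key", how="left")
```

Whether an inner or a left join is the better choice depends on whether rows without a match should be kept.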

In [None]:
students = pd.DataFrame({
    "StudentID": [1, 2, 3],
    "Name": ["Alice", "Bob", "Charlie"]
})

students_2 = pd.DataFrame({
    "StudentID": [4, 5, 6],
    "Name": ["Ferdl", "Hanni", "Sepp"]
})

scores = pd.DataFrame({
    "StudentID": [1, 2, 4],
    "Score": [85, 90, 88]
})


In [None]:
# your code goes here



# Task 7: Introduction to NumPy - Basic Array Creation and Manipulation

Familiarize yourself with creating NumPy arrays and performing basic manipulations.

- Take the generated NumPy array of 10 random integers.
- Reshape the array into a 2x5 matrix.
- Slice the matrix to get specific elements, rows, and columns: the element at (0,2), the first row, and the second column.
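
Reshaping and indexing follow the same pattern for any shape. The sketch below uses a fixed 2x3 example with assumed names (`values`, `grid`) so that the results are easy to check by hand.

```python
import numpy as np

values = np.arange(6)          # [0 1 2 3 4 5], fixed values for reproducibility
grid = values.reshape(2, 3)    # [[0 1 2], [3 4 5]]

element = grid[1, 0]           # row 1, column 0 -> 3
first_row = grid[0]            # -> [0 1 2]
last_column = grid[:, 2]       # all rows, column 2 -> [2 5]
```

Indices are zero-based, so the "second column" of a matrix is the one at index 1.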

In [None]:
import numpy as np

array = np.random.randint(0, 100, size=10)


In [None]:
# your code here


# Task 8: Array Operations - Element-wise and Matrix Operations

Practice basic arithmetic operations and matrix manipulations in NumPy.

- Create two NumPy arrays of size 3x3 with random numbers.
- Perform element-wise addition, subtraction, multiplication, and division.
- Calculate the dot product (matrix product) of the two arrays.
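
Element-wise multiplication and the dot product are often confused. A small sketch with fixed 2x2 arrays (the values are chosen only for illustration) shows that they give different results.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0, 6.0], [7.0, 8.0]])

# Element-wise: each entry is combined with the entry at the same position
elementwise = a * b        # [[ 5. 12.], [21. 32.]]

# Matrix product: rows of a times columns of b
matrix = a @ b             # [[19. 22.], [43. 50.]]
same_as_dot = np.dot(a, b) # for 2D arrays, np.dot gives the same result as @
```

The arithmetic operators `+`, `-`, `*`, and `/` all work element-wise on arrays of the same shape.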

In [None]:
# your code here


# Task 9: Normalization and Standardization of NumPy Array Values

Learn how to normalize and standardize the values in a NumPy array.

Normalization (here: min-max scaling) rescales the values of an array so that they fall within a specific range, usually between 0 and 1.

Standardization (or z-score normalization) transforms the data so that it has a mean of 0 and a standard deviation of 1.
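
Both definitions translate into one line of NumPy each. The following is a minimal sketch on a fixed array (the name `x` and its values are assumptions). It uses the common formulas (x - min) / (max - min) for normalization and (x - mean) / std for standardization, and assumes the array is not constant, since otherwise both would divide by zero.

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])

# Min-max normalization: the smallest value maps to 0, the largest to 1
normalized = (x - x.min()) / (x.max() - x.min())

# Standardization: subtract the mean, divide by the standard deviation
# (x.std() uses the population standard deviation, ddof=0, by default)
standardized = (x - x.mean()) / x.std()
```

A quick sanity check is to print the minimum and maximum of the normalized array and the mean and standard deviation of the standardized one.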


- Take the generated 1D NumPy array of 20 random float values.
- Normalize the array so that all values lie between 0 and 1.
- Standardize the array so that it has a mean of 0 and a standard deviation of 1.

In [None]:
array = np.random.uniform(-10, 10, 20)

In [None]:
# your code here


# Task 10: One-Hot Encoding Using NumPy

Implement one-hot encoding for categorical data using NumPy.

A one-hot encoding assigns a unique integer index to each distinct string in a set of strings. The index
represents a position in an array whose length equals the number of distinct strings in the set.
To represent a string, take its index and set the array to 1 at that position. All other
values in the array are 0.
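
One possible way to express this in NumPy, shown here on made-up color labels (the names `colors`, `categories`, and `indices` are assumptions), is to let `np.unique` compute the integer index of every label and then pick the matching rows of an identity matrix. This is a sketch of one approach, not the only valid one.

```python
import numpy as np

# Hypothetical toy labels
colors = np.array(["red", "green", "red", "blue"])

# categories: sorted distinct labels -> ['blue' 'green' 'red']
# indices: position of each label in categories -> [2 1 2 0]
categories, indices = np.unique(colors, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for index i
one_hot = np.eye(len(categories), dtype=int)[indices]
# -> [[0 0 1], [0 1 0], [0 0 1], [1 0 0]]
```

Each row of the result contains exactly one 1, and its position identifies the label in `categories`.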

- Take the generated NumPy array of categorical labels: ["dog", "cat", "bird", "dog", "bird"].
- Convert these labels into a one-hot encoded format using NumPy.

In [None]:
labels = np.array(["dog", "cat", "bird", "dog", "bird"])
unique_labels = np.unique(labels)

In [None]:
# your code here
