# 🐼 Python, NumPy & Pandas Practice Exercises 2
This notebook includes a variety of exercises to help you prepare for your exam. Topics include Python functions, NumPy, and Pandas data manipulation.


### 🟢 Exercise 1 – Name Length Dictionary
You are given a list of student names. Create a dictionary where each student's name is the key, and the value is the length of their name.
Only include names that are longer than 5 characters.

In [None]:
students = ["Anna", "Jonathan", "Leo", "Catherine"]
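
One possible approach is a dictionary comprehension with a condition. This is a minimal sketch; the variable name `name_lengths` is an assumption, not part of the exercise.

```python
students = ["Anna", "Jonathan", "Leo", "Catherine"]

# Keep only names longer than 5 characters and map each name to its length.
name_lengths = {name: len(name) for name in students if len(name) > 5}

print(name_lengths)  # {'Jonathan': 8, 'Catherine': 9}
```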

### 🟡 Exercise 2 – Price Tag Dictionary
Given a list of tuples where each tuple contains a product name and its price, create a dictionary where:
- The key is the product name
- The value is `"expensive"` if price > 100, otherwise `"cheap"`



In [None]:
products = [("Laptop", 999), ("Mouse", 25), ("Keyboard", 75), ("Monitor", 150)]
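
A dictionary comprehension with a conditional expression is one way to approach this. The sketch below unpacks each tuple into `name` and `price`; the name `price_tags` is an assumption.

```python
products = [("Laptop", 999), ("Mouse", 25), ("Keyboard", 75), ("Monitor", 150)]

# Label each product "expensive" when its price is above 100, otherwise "cheap".
price_tags = {
    name: "expensive" if price > 100 else "cheap"
    for name, price in products
}

print(price_tags)
# {'Laptop': 'expensive', 'Mouse': 'cheap', 'Keyboard': 'cheap', 'Monitor': 'expensive'}
```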

### 🟢 Exercise 3 – Sum of All Numbers (Dynamic Parameters)
Create a function called `sum_all` that takes any number of arguments using `*args` and returns their total sum.

**Example:**
```python
sum_all(1, 2, 3)        # → 6
sum_all(10, 20, 30, 5)  # → 65
```
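
A minimal sketch of such a function: `*args` collects the positional arguments into a tuple, which the built-in `sum()` can then add up.

```python
def sum_all(*args):
    """Return the sum of all positional arguments (0 when called with none)."""
    return sum(args)


print(sum_all(1, 2, 3))        # 6
print(sum_all(10, 20, 30, 5))  # 65
```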

### 🟡 Exercise 4 – Sort by Last Character (Lambda Function)
Given a list of words, sort the list **by the last character** of each word using a lambda function.



In [None]:
# hint: sorted(words , key = lambda ...)

In [None]:
words = ["chicken", "banana", "kiwi", "grape"]
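
Following the hint, one possible solution passes a lambda that returns the last character as the sort key. The name `sorted_words` is an assumption.

```python
words = ["chicken", "banana", "kiwi", "grape"]

# word[-1] is the last character; sorted() orders the words by that key.
sorted_words = sorted(words, key=lambda word: word[-1])

print(sorted_words)  # ['banana', 'grape', 'kiwi', 'chicken']
```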

### 🔴 Exercise 5 – Filter and Transform Names
You are given a list of names. Use `filter()` to select names longer than 4 characters,
then use `map()` to transform the filtered names into uppercase using a `lambda` function.


In [None]:
names = ["Ana", "John", "Elizabeth", "Bob", "Catherine"]
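
The two steps can be chained as sketched below. Both `filter()` and `map()` return lazy iterators, so the result is wrapped in `list()` to materialize it. The names `long_names` and `upper_names` are assumptions.

```python
names = ["Ana", "John", "Elizabeth", "Bob", "Catherine"]

# Step 1: keep names with more than 4 characters ("John" has exactly 4 and is dropped).
long_names = filter(lambda name: len(name) > 4, names)

# Step 2: convert the remaining names to uppercase.
upper_names = list(map(lambda name: name.upper(), long_names))

print(upper_names)  # ['ELIZABETH', 'CATHERINE']
```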

### 🔴 Exercise 6 – Email Domain Filter with List Comprehension
You are given a list of email addresses.
Use **list comprehension** to extract only the usernames (the part before `@`) **for emails that belong to the domain `example.com`**.


In [None]:
emails = [
    "alice@example.com",
    "bob@gmail.com",
    "charlie@example.com",
    "dana@yahoo.com"
]
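
One way to write this is a list comprehension that filters on the domain suffix and splits on `@`. This sketch assumes every address contains exactly one `@`; the name `usernames` is an assumption.

```python
emails = [
    "alice@example.com",
    "bob@gmail.com",
    "charlie@example.com",
    "dana@yahoo.com",
]

# Keep addresses ending in "@example.com" and take the part before the "@".
usernames = [email.split("@")[0] for email in emails if email.endswith("@example.com")]

print(usernames)  # ['alice', 'charlie']
```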

### 🟡 Exercise 7 – Convert and Reformat Dates
You are given a list of date strings in the format `"DD/MM/YYYY"`.
Write a function that parses these strings into `datetime` objects, then returns a new list where each date is formatted as `"Month_Name DD, YYYY"` (e.g., "April 12, 2025").


In [None]:
dates = ["12/04/2025", "01/01/2024", "25/12/2023"]
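
A possible sketch uses `datetime.strptime()` to parse and `strftime()` to format. The function name `reformat_dates` is an assumption, and `%B` yields English month names only under the default C/English locale.

```python
from datetime import datetime


def reformat_dates(date_strings):
    """Parse "DD/MM/YYYY" strings and return them as "Month_Name DD, YYYY"."""
    parsed = [datetime.strptime(text, "%d/%m/%Y") for text in date_strings]
    # %B = full month name, %d = zero-padded day, %Y = four-digit year
    return [date.strftime("%B %d, %Y") for date in parsed]


dates = ["12/04/2025", "01/01/2024", "25/12/2023"]
print(reformat_dates(dates))
# ['April 12, 2025', 'January 01, 2024', 'December 25, 2023']
```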

### 🟡 Exercise 8 – Expiration Date Calculator
You are given a list of issue dates in the format `"YYYY-MM-DD"`.
Each item has a 30-day validity period.
Write a function that returns a list of expiration dates in the same format by **adding 30 days** using `timedelta`.

In [None]:
# timedelta(days=30)
issue_dates = ["2025-03-01", "2025-04-10", "2025-12-25"]
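
The calculation could be sketched as follows. The function name `expiration_dates` and its `validity_days` parameter are assumptions; adding a `timedelta` handles month and year rollover automatically.

```python
from datetime import datetime, timedelta


def expiration_dates(issue_dates, validity_days=30):
    """Return each issue date shifted forward by validity_days, as "YYYY-MM-DD"."""
    fmt = "%Y-%m-%d"
    return [
        (datetime.strptime(text, fmt) + timedelta(days=validity_days)).strftime(fmt)
        for text in issue_dates
    ]


issue_dates = ["2025-03-01", "2025-04-10", "2025-12-25"]
print(expiration_dates(issue_dates))
# ['2025-03-31', '2025-05-10', '2026-01-24']
```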

### 🟡 Exercise 9 – Find Maximum in Random Matrix (Row & Column)
1. Generate a **4x5 NumPy matrix** with **random integers between 1 and 99** (inclusive).
2. Set `seed = 42` for reproducibility.
3. Find the **maximum value in each row** and **each column**.
4. Print:
   - The row-wise max values and their indexes
   - The column-wise max values and their indexes

In [None]:
import numpy as np
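
One possible sketch, assuming the legacy `np.random.seed` / `np.random.randint` interface is acceptable. The upper bound of `randint` is exclusive, so `100` is passed to include 99. `axis=1` works along each row and `axis=0` along each column.

```python
import numpy as np

np.random.seed(42)  # set the seed before generating, for reproducibility
matrix = np.random.randint(1, 100, size=(4, 5))  # integers from 1 to 99

# Row-wise maxima and the column position of each maximum
row_max = matrix.max(axis=1)
row_argmax = matrix.argmax(axis=1)

# Column-wise maxima and the row position of each maximum
col_max = matrix.max(axis=0)
col_argmax = matrix.argmax(axis=0)

print(matrix)
print("Row max:", row_max, "at column indexes", row_argmax)
print("Column max:", col_max, "at row indexes", col_argmax)
```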

### 🟡 Exercise 10 – Select All Values Greater Than 50
Create a NumPy array of shape `(4, 4)` filled with random integers between `10` and `99`, then:
- Use **boolean masking** to select and print all values greater than 50.
- Replace all even numbers with the value 100 and print the resulting matrix.
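
A minimal sketch of both steps. The exercise does not specify a seed or generator, so `np.random.default_rng(0)` is an assumption made only to keep the run repeatable; the replacement is done on a copy so the original array stays available.

```python
import numpy as np

rng = np.random.default_rng(0)             # assumed seed, not given in the exercise
arr = rng.integers(10, 100, size=(4, 4))   # upper bound is exclusive, so values are 10..99

# Boolean masking: arr > 50 is a True/False array used to pick matching values.
greater_than_50 = arr[arr > 50]
print(greater_than_50)

# Replace every even number with 100 (working on a copy of the array).
replaced = arr.copy()
replaced[replaced % 2 == 0] = 100
print(replaced)
```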


### 🟡 Exercise 11 – Assign Grade Labels Using `np.where()`
You are given a NumPy array of student scores (out of 100).
Use `np.where()` to label the students as `"Pass"` if score ≥ 60, otherwise `"Fail"`.



In [None]:
scores = np.array([55, 67, 45, 89, 73, 38, 91])
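
`np.where(condition, a, b)` picks `a` where the condition is true and `b` elsewhere. A possible sketch, with `labels` as an assumed variable name:

```python
import numpy as np

scores = np.array([55, 67, 45, 89, 73, 38, 91])

# "Pass" where the score is at least 60, "Fail" everywhere else.
labels = np.where(scores >= 60, "Pass", "Fail")

print(labels)  # ['Fail' 'Pass' 'Fail' 'Pass' 'Pass' 'Fail' 'Pass']
```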

### 🟡 Exercise 12 – Fill NaNs with Column Means
You are given a 3x4 NumPy array that contains some `np.nan` values.
Replace each `np.nan` with the **mean of its column** (ignoring `nan` when calculating the mean).


In [None]:
# np.where
# np.nanmean()
arr = np.array([
    [1.0, 2.0, np.nan, 4.0],
    [5.0, np.nan, 7.0, 8.0],
    [9.0, 10.0, 11.0, np.nan]
])
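
Following the hints in the cell above, one possible approach computes the per-column means with `np.nanmean(axis=0)` and lets `np.where` broadcast them across the rows. The names `col_means` and `filled` are assumptions.

```python
import numpy as np

arr = np.array([
    [1.0, 2.0, np.nan, 4.0],
    [5.0, np.nan, 7.0, 8.0],
    [9.0, 10.0, 11.0, np.nan],
])

# Mean of each column, ignoring NaN: [5., 6., 9., 6.]
col_means = np.nanmean(arr, axis=0)

# Where a cell is NaN take that column's mean, otherwise keep the original value.
filled = np.where(np.isnan(arr), col_means, arr)

print(filled)
```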

### You are given the `hr_data.csv` dataset. Start by importing the CSV file, then answer the following questions.

In [2]:
import pandas as pd

# Load the dataset
df = pd.read_csv('hr_data.csv')
df.head()

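| Unnamed: 0 | EmpID   | Name  | Surname | Department | Age | Salary  | Bonus  | YearsAtCompany | PerformanceScore |
|------------|---------|-------|---------|------------|-----|---------|--------|----------------|------------------|
| 0          | EMP1000 | Frank | Garcia  | Marketing  | 59  | 93335.0 | 500.0  | 2              | Poor             |
| 1          | EMP1001 | Alice | Jones   | Finance    | 49  | 40965.0 |        | 1              | Poor             |
| 2          | EMP1002 | Alice | Johnson | Finance    | 35  | 54538.0 | 1500.0 | 8              | Good             |
| 3          | EMP1003 | Dana  | Davis   | IT         | 28  | 38110.0 | 1500.0 | 7              | Average          |
| 4          | EMP1004 | Bob   | Brown   | Marketing  | 41  | 57266.0 |        | 12             | Excellent        |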


### 🟢 Q1. Show basic info (column names, non-null counts, data types).

### 🟢 Q2. Filter and show all employees from the IT department.

### 🟡 Q3. Working with NaN values.
- Find how many values are missing in the `Bonus` column.
- Replace the missing `Bonus` values with zero.
- Finally, after replacing the `Bonus` values, drop all rows that still contain null values.
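
The three steps can be sketched on a small stand-in frame. The data below is hypothetical (it is not read from `hr_data.csv`) and includes one missing `Salary` so that the final `dropna()` has something to remove.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the HR data
df = pd.DataFrame({
    "EmpID": ["EMP1000", "EMP1001", "EMP1002", "EMP1003"],
    "Salary": [93335.0, 40965.0, np.nan, 38110.0],
    "Bonus": [500.0, np.nan, 1500.0, np.nan],
})

# Step 1: count missing Bonus values
missing_bonus = df["Bonus"].isna().sum()
print(missing_bonus)  # 2

# Step 2: replace missing Bonus values with zero
df["Bonus"] = df["Bonus"].fillna(0)

# Step 3: drop rows that still contain a null in any column
df = df.dropna()
print(df)
```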

### 🟡 Q4. Create a new column 'TotalCompensation' as the sum of Salary and Bonus.

### 🟡 Q5. Apply a function to categorize Age into 'Young' (<30), 'Mid' (30-45), 'Senior' (>45).
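
A possible sketch using `Series.apply()` with a plain function. The function name `age_group` and the column name `AgeGroup` are assumptions, and the small frame reuses only the ages from the preview rows rather than the full dataset.

```python
import pandas as pd


def age_group(age):
    """Map an age to 'Young' (<30), 'Mid' (30-45) or 'Senior' (>45)."""
    if age < 30:
        return "Young"
    if age <= 45:
        return "Mid"
    return "Senior"


# Ages taken from the five preview rows only
df = pd.DataFrame({"Age": [59, 49, 35, 28, 41]})
df["AgeGroup"] = df["Age"].apply(age_group)

print(df)
```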

### 🟡 Q6. For each department find average Salary and average Bonus.
- Then find which department has the highest average Salary.

### 🟡 Q7. Using the `apply()` function
- Extract just the numeric part of `EmpID` (for example, from `EMP1001` extract `1001`).
- Save it to a new column.
- Set that column as the index.
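
These steps might look like the sketch below. It assumes every `EmpID` starts with the three-letter prefix `EMP`, and the column name `EmpNumber` is hypothetical; the frame holds only a few preview rows.

```python
import pandas as pd

# A few preview rows as a stand-in for the full dataset
df = pd.DataFrame({
    "EmpID": ["EMP1000", "EMP1001", "EMP1002"],
    "Name": ["Frank", "Alice", "Alice"],
})

# Drop the first three characters ("EMP") and convert the rest to an integer.
df["EmpNumber"] = df["EmpID"].apply(lambda emp_id: int(emp_id[3:]))

# Use the new column as the index.
df = df.set_index("EmpNumber")
print(df)
```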

### 🟡 Q7b. Which department has the youngest employees on average (mean Age)?

### 🟡 Q8. Find the average Salary by PerformanceScore
- Is there any relationship between salary and performance?
- Do the same for Salary and YearsAtCompany.

### 🟠 Q9. For each PerformanceScore category, count how many employees fall under each Department.

Sample of Expected Results

| PerformanceScore | Department | EmpID |
|------------------|------------|--------|
| Average          | Finance    | 3      |
| Average          | HR         | 5      |
| Average          | IT         | 2      |
| Average          | Marketing  | 5      |
| Average          | Sales      | 5      |
| Excellent        | Finance    | 3      |
| Excellent        | HR         | 5      |

### 🟠 Q10. For each department, find the name and surname of the highest-paid employee.

In [None]:
# hint: the longer approach uses transform('max') instead of max()
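
A sketch of the `transform('max')` approach mentioned in the hint: `transform` returns a Series aligned with the original rows, so each row can be compared with its own department's maximum. The frame contains only the five preview rows, not the full dataset, and the names `dept_max` and `top_paid` are assumptions.

```python
import pandas as pd

# The five preview rows only
df = pd.DataFrame({
    "Name": ["Frank", "Alice", "Alice", "Dana", "Bob"],
    "Surname": ["Garcia", "Jones", "Johnson", "Davis", "Brown"],
    "Department": ["Marketing", "Finance", "Finance", "IT", "Marketing"],
    "Salary": [93335.0, 40965.0, 54538.0, 38110.0, 57266.0],
})

# Each row receives the maximum salary of its own department.
dept_max = df.groupby("Department")["Salary"].transform("max")

# Keep the rows whose salary equals that maximum (ties would keep several rows).
top_paid = df[df["Salary"] == dept_max][["Department", "Name", "Surname", "Salary"]]
print(top_paid)
```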

### 🔴 Q11. Identify Potential Promotions
Find employees who:
- Have a `PerformanceScore` of `"Excellent"`
- Have `YearsAtCompany` > 5
- Earn less than the **average salary of their department**

Return a filtered DataFrame with columns: `EmpID`, `Name`, `Department`, `Salary`, `YearsAtCompany`, `PerformanceScore`.
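
One way to combine the three conditions is a boolean mask, with the department average attached to each row through `transform("mean")`. The data below is hypothetical and the names `dept_avg` and `candidates` are assumptions.

```python
import pandas as pd

# Hypothetical stand-in data
df = pd.DataFrame({
    "EmpID": ["E1", "E2", "E3", "E4", "E5"],
    "Name": ["Ann", "Ben", "Cy", "Di", "Ed"],
    "Department": ["IT", "IT", "IT", "HR", "HR"],
    "Salary": [40000, 60000, 80000, 50000, 70000],
    "YearsAtCompany": [8, 9, 3, 6, 2],
    "PerformanceScore": ["Excellent", "Excellent", "Excellent", "Good", "Excellent"],
})

# Average salary of each row's own department (60000 for both departments here)
dept_avg = df.groupby("Department")["Salary"].transform("mean")

mask = (
    (df["PerformanceScore"] == "Excellent")
    & (df["YearsAtCompany"] > 5)
    & (df["Salary"] < dept_avg)
)
columns = ["EmpID", "Name", "Department", "Salary", "YearsAtCompany", "PerformanceScore"]
candidates = df.loc[mask, columns]
print(candidates)  # only E1 satisfies all three conditions
```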


### 🔴 Q12. Custom Score Encoding using `apply()`
Create a new column `ScoreValue` that encodes `PerformanceScore` using the following logic:
- "Excellent" → 4
- "Good" → 3
- "Average" → 2
- "Poor" → 1
- NaN → 0

Use a **custom function with `apply()`** (not `.map()`), and then compute the **average `ScoreValue` per Department**.
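
A possible sketch of the encoding function. The names `SCORE_VALUES` and `encode_score` are assumptions and the data is hypothetical; `pd.isna()` is checked first because a missing value is not a key in the lookup dictionary.

```python
import numpy as np
import pandas as pd

SCORE_VALUES = {"Excellent": 4, "Good": 3, "Average": 2, "Poor": 1}


def encode_score(score):
    """Return the numeric value of a PerformanceScore label, or 0 for NaN."""
    if pd.isna(score):
        return 0
    return SCORE_VALUES[score]


# Hypothetical stand-in data
df = pd.DataFrame({
    "Department": ["IT", "IT", "HR", "HR"],
    "PerformanceScore": ["Excellent", "Poor", "Good", np.nan],
})

df["ScoreValue"] = df["PerformanceScore"].apply(encode_score)
avg_by_dept = df.groupby("Department")["ScoreValue"].mean()
print(avg_by_dept)  # HR 1.5, IT 2.5
```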


### 🔴 Q13. Top 3 Longest-Serving Employees per Department
For each Department, find the top 3 employees with the **highest `YearsAtCompany`**.

Return a DataFrame sorted by Department, then by YearsAtCompany (descending).
Include columns: `EmpID`, `Name`, `Surname`, `Department`, `YearsAtCompany`.


In [None]:
#hint: transform('max')
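
As an alternative to the `transform('max')` hint, the sketch below sorts the frame first and then takes the first three rows of each group with `groupby(...).head(3)`. The data is hypothetical, and ties in `YearsAtCompany` would be cut off arbitrarily at the third row.

```python
import pandas as pd

# Hypothetical stand-in data
df = pd.DataFrame({
    "EmpID": ["E1", "E2", "E3", "E4", "E5", "E6", "E7"],
    "Name": ["Ann", "Ben", "Cy", "Di", "Ed", "Flo", "Gus"],
    "Surname": ["A", "B", "C", "D", "E", "F", "G"],
    "Department": ["IT", "IT", "IT", "IT", "HR", "HR", "HR"],
    "YearsAtCompany": [2, 12, 7, 9, 1, 4, 3],
})

columns = ["EmpID", "Name", "Surname", "Department", "YearsAtCompany"]

# Sort by Department (ascending) and YearsAtCompany (descending),
# then keep the first three rows of each department.
top3 = (
    df.sort_values(["Department", "YearsAtCompany"], ascending=[True, False])
    .groupby("Department")
    .head(3)[columns]
)
print(top3)
```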