# Part 1 – Pandas DataFrames  

For this section, we will work with the **National Household Survey (ENAHO)** dataset extracted from [INEI](https://proyectos.inei.gob.pe/microdatos/).  
In this [shared drive folder](https://drive.google.com/drive/folders/1h00GwfCRyq0Grem3bR26yxc33ELYJwG8?usp=sharing), you will find:  
- A reference questionnaire (you can review it to identify questions of interest).  
- Three datasets corresponding to the following modules: **Housing (200)**, **Education (300)**, and **Labor (500)**.  

Download the files to your local machine.

---

1. Set your working directory and **import the dataset** `Enaho01A-2023-300.csv` using Pandas.  

> Note: Consider the file encoding (`UTF-8` or `ISO-8859-10`).  
> Example: `df = pd.read_csv("datos.csv", encoding="ISO-8859-10")`  

- Read and display the **first 5 rows**.  
- Convert the column names into a **list** and print it.  
- Check the **data types** of the DataFrame.  
- Select a subsample containing the variables `['CONGLOME', 'VIVIENDA', 'HOGAR', 'CODPERSO']` plus 3–5 additional variables of interest to you.  
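
The steps above could look roughly like the following sketch. It reads a tiny in-memory CSV instead of the real `Enaho01A-2023-300.csv`; the rows and the column names `EXTRA_1` to `EXTRA_3` are hypothetical placeholders, not actual ENAHO variables or values.

```python
import io
import pandas as pd

# Toy CSV standing in for the real file (made-up values, hypothetical EXTRA_* columns).
toy_csv = io.StringIO(
    "CONGLOME,VIVIENDA,HOGAR,CODPERSO,EXTRA_1,EXTRA_2,EXTRA_3\n"
    "5001,10,11,1,6,34,1\n"
    "5001,10,11,2,4,31,2\n"
    "5002,22,11,1,3,58,1\n"
)
# With the real file, something like:
# df = pd.read_csv("Enaho01A-2023-300.csv", encoding="ISO-8859-10")
df = pd.read_csv(toy_csv)

print(df.head())                 # first 5 rows (the toy data only has 3)
columns = df.columns.tolist()    # column names as a plain Python list
print(columns)
print(df.dtypes)                 # data type of each column

keys = ["CONGLOME", "VIVIENDA", "HOGAR", "CODPERSO"]
subsample = df[keys + ["EXTRA_1", "EXTRA_2", "EXTRA_3"]]
```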


---

2. **Data Manipulation (Data Cleaning):**  
- Explore the DataFrame using summary functions.  
- Identify if there are missing values.  
- If they exist, remove them.  
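
As an illustration only, the sketch below runs the usual summary and missing-value checks on a toy DataFrame (made-up values, hypothetical column names). Dropping every row with a missing value via `dropna()` is one option; on the real data you may prefer to restrict it to selected columns.

```python
import numpy as np
import pandas as pd

# Toy data with a few gaps; not real ENAHO values.
df = pd.DataFrame({
    "CODPERSO": [1, 2, 3, 4],
    "EXTRA_1": [10.0, np.nan, 30.0, 40.0],
    "EXTRA_2": ["a", "b", None, "d"],
})

print(df.describe())                   # summary statistics for numeric columns
df.info()                              # dtypes and non-null counts per column

missing_per_column = df.isna().sum()   # number of missing values per column
print(missing_per_column)

df_clean = df.dropna()                 # drop rows with at least one missing value
```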


---

3. Import a second dataset (choose between `Enaho01A-2023-200.csv` or `Enaho01A-2023-500.csv`).  
- Display the **first 5 rows**.  
- Convert the column names into a **list** and print it.  
- Check the **data types**.  
- Select a subsample containing the variables `['CONGLOME', 'VIVIENDA', 'HOGAR', 'CODPERSO']` plus 3–5 additional variables of interest to you.  
- Perform the following modifications:  
  - **A.** Change the data type of a variable (e.g., from text to numeric).  
  - **B.** Modify some values in a specific column.  
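
Modifications A and B might be sketched as follows on a toy DataFrame. The columns `EXTRA_1` and `EXTRA_2` and the recoding `1 -> "yes"`, `2 -> "no"` are assumptions for illustration; the real codes come from the questionnaire.

```python
import pandas as pd

# Toy data; EXTRA_1 is stored as text, including one unparseable entry.
df2 = pd.DataFrame({
    "CODPERSO": ["1", "2", "3"],
    "EXTRA_1": ["100", "250", "n/a"],
    "EXTRA_2": [1, 2, 1],
})

# A. Text to numeric; entries that cannot be parsed become NaN.
df2["EXTRA_1"] = pd.to_numeric(df2["EXTRA_1"], errors="coerce")

# B. Recode the values of one column with an explicit mapping.
df2["EXTRA_2"] = df2["EXTRA_2"].map({1: "yes", 2: "no"})

print(df2.dtypes)
```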

---

4. **Merging Datasets:**  [2 pts]
- Identify the common columns between the two datasets (from questions 1 and 3).  
- Verify whether the values match in both datasets. If not, correct the mismatched values to ensure a proper merge.  

> Recommendation: Use the following as common columns:  
> `common_columns = ['CONGLOME', 'VIVIENDA', 'HOGAR', 'CODPERSO']`  
> in `pd.merge(..., on=common_columns, how=...)`.  

- Perform the **merge**.  
- Display the **first 5 rows** of the resulting DataFrame.  
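
A minimal sketch of the merge workflow, assuming the key columns were read as text in one dataset and as integers in the other (a common source of failed matches). The two toy DataFrames and their `EXTRA_*` columns are hypothetical, and `how="inner"` is just one possible choice.

```python
import pandas as pd

common_columns = ["CONGLOME", "VIVIENDA", "HOGAR", "CODPERSO"]

# Toy data: keys are strings on the left and integers on the right.
left = pd.DataFrame({
    "CONGLOME": ["5001", "5001", "5002"],
    "VIVIENDA": ["10", "10", "22"],
    "HOGAR": ["11", "11", "11"],
    "CODPERSO": ["1", "2", "1"],
    "EXTRA_1": [6, 4, 3],
})
right = pd.DataFrame({
    "CONGLOME": [5001, 5001, 5003],
    "VIVIENDA": [10, 10, 30],
    "HOGAR": [11, 11, 11],
    "CODPERSO": [1, 2, 1],
    "EXTRA_2": [1200.0, 800.0, 950.0],
})

# Columns present in both datasets.
shared = sorted(set(left.columns) & set(right.columns))
print(shared)

# Align the key dtypes so that equal identifiers actually compare equal.
for col in common_columns:
    left[col] = left[col].astype(int)
    right[col] = right[col].astype(int)

merged = pd.merge(left, right, on=common_columns, how="inner")
print(merged.head())
```

Passing `indicator=True` to `pd.merge` adds a `_merge` column that shows which rows matched, which can help when checking an outer merge.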

---

5. In the resulting DataFrame:  
- Group the data by a variable of your choice using `groupby()`.  
- Calculate a relevant statistical indicator, for example: the **average income per category**.  
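
For instance, an average per category could be computed as in this sketch. The column names `category` and `income` and their values are made up; substitute the variables you selected.

```python
import pandas as pd

# Toy stand-in for the merged DataFrame.
merged = pd.DataFrame({
    "category": ["A", "A", "B", "B"],
    "income": [1000.0, 2000.0, 1500.0, 2500.0],
})

# Mean of `income` within each value of `category`.
mean_income = merged.groupby("category")["income"].mean()
print(mean_income)
```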


---
# Part 2 – If conditions 

---
6. Basic If Condition  
Write a Python function that checks if a given number is positive.  
- If the number is greater than zero, return `"The number X is positive."`.  
- Otherwise, return `"The number X is not positive."`.  


---

7. If Condition with Multiple Expressions  
Create a program that checks the temperature (in Celsius) and returns a message depending on the value:  
- If the temperature is **below 0**, return `"It is freezing."`.  
- If the temperature is between **0 and 20**, return `"It is cold."`.  
- If the temperature is between **21 and 30**, return `"It is warm."`.  
- If the temperature is greater than **30**, return `"It is hot."`.  
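
One possible shape for this check is an `if`/`elif` chain. The function name is hypothetical, and the sketch assumes that non-integer values between 20 and 21 count as warm, since the ranges above do not say.

```python
def describe_temperature(celsius):
    """Return a message for a temperature given in degrees Celsius."""
    if celsius < 0:
        return "It is freezing."
    elif celsius <= 20:
        return "It is cold."
    elif celsius <= 30:      # assumption: 20 < t <= 30 counts as warm
        return "It is warm."
    else:
        return "It is hot."


print(describe_temperature(25))
```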


---

8. Logical Operators   [2 pts]
Write a function that determines if a person is eligible for a scholarship based on these conditions:  
- The person must have a GPA greater than **3.5** **AND**  
- Either their extracurricular activities are `"Yes"` **OR** they have community service hours greater than **50**.  

The function should return:  
- `"Eligible for scholarship."` if conditions are met.  
- `"Not eligible for scholarship."` otherwise.  
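
The conditions combine `and` with a parenthesized `or`. A sketch under the assumption that the inputs are a numeric GPA, the string `"Yes"` or `"No"`, and a number of hours (function and parameter names are hypothetical):

```python
def scholarship_status(gpa, extracurricular, service_hours):
    """Apply the rule: GPA > 3.5 AND (extracurricular == "Yes" OR hours > 50)."""
    if gpa > 3.5 and (extracurricular == "Yes" or service_hours > 50):
        return "Eligible for scholarship."
    return "Not eligible for scholarship."


print(scholarship_status(3.8, "No", 60))
```

The parentheses matter: `and` binds more tightly than `or`, so without them the expression would be read as `(gpa > 3.5 and extracurricular == "Yes") or service_hours > 50`.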


---

9. Python Identity Operators  
Create two lists:  
```python
list1 = [1, 2, 3]
list2 = [1, 2, 3]
list3 = list1 
```

Check them with the identity operator (`is`) and the equality operator (`==`):
```python
list1 is list2
list1 is list3
list1 == list2 
```

Explain the difference in the results for each comparison.
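
As a hint, the sketch below stores each comparison in a variable (the variable names are illustrative) and then mutates the list through `list3` to show which names share one object.

```python
list1 = [1, 2, 3]
list2 = [1, 2, 3]
list3 = list1

same_object_12 = list1 is list2   # False: two separate list objects
same_object_13 = list1 is list3   # True: list3 is another name for list1
equal_12 = list1 == list2         # True: the contents are equal

list3.append(4)                   # changes the object that list1 also refers to
print(list1)                      # [1, 2, 3, 4]
print(list2)                      # [1, 2, 3]
```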


---

10. Nested If Statement  [2 pts] 

Write a function that takes a student's score and determines the grade:  

- If the score is greater than or equal to 90, return `"A"`.  
- If the score is between 80 and 89:  
  - Check if the score is exactly 85, then return `"B+"`.  
  - Otherwise, return `"B"`.  
- If the score is between 70 and 79, return `"C"`.  
- Otherwise, return `"Fail"`. 
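
The rules translate into an outer `if`/`elif` chain with a nested `if` inside the 80 to 89 branch. A minimal sketch (the function name is hypothetical):

```python
def grade(score):
    """Map a numeric score to a letter grade following the rules above."""
    if score >= 90:
        return "A"
    elif score >= 80:
        # Nested check inside the 80-89 band.
        if score == 85:
            return "B+"
        else:
            return "B"
    elif score >= 70:
        return "C"
    else:
        return "Fail"


print(grade(85))
```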

--- 

# Part 3 – For loops

---

11. For Loop in NumPy  

Write a for loop using **NumPy** to iterate through an array of numbers `[10, 20, 30, 40, 50]` and print each value multiplied by 2.  

- Re-question: How would you modify the loop so that it stores the results in a new NumPy array instead of just printing them?  
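
One way to approach both parts is sketched below: a plain loop that prints, a loop that fills a preallocated array, and, for comparison, the vectorized equivalent. Variable names are illustrative.

```python
import numpy as np

numbers = np.array([10, 20, 30, 40, 50])

# Print each value multiplied by 2.
for value in numbers:
    print(value * 2)

# Store the results in a new array of the same shape and dtype.
doubled = np.empty_like(numbers)
for i, value in enumerate(numbers):
    doubled[i] = value * 2

# Without a loop, NumPy applies the operation element-wise.
doubled_vectorized = numbers * 2
```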


---

12. For Loop in List  

Create a list of words: `["python", "loop", "list", "iteration"]`.  
Write a for loop to print the length of each word.  

- Re-question: How can you rewrite the same loop using a **list comprehension**?  
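
A short sketch of both versions; the list comprehension collects the lengths instead of printing them.

```python
words = ["python", "loop", "list", "iteration"]

# Explicit loop.
for word in words:
    print(len(word))

# Same computation as a list comprehension.
lengths = [len(word) for word in words]
print(lengths)
```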


---

13. For Loop in Dictionary  

Given a dictionary of student scores:  
`{"Alice": 85, "Bob": 92, "Charlie": 78, "Diana": 88}`  

Write a for loop to print each student's name along with their score.  

- Re-question: Modify the loop so that it only prints the names of students who scored above 80.  
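
Iterating over `dict.items()` yields name and score together. The sketch below also collects the filtered names in a list (`above_80`, an illustrative name) so the result can be reused.

```python
scores = {"Alice": 85, "Bob": 92, "Charlie": 78, "Diana": 88}

# Print every student with their score.
for name, score in scores.items():
    print(f"{name}: {score}")

# Keep only the students who scored above 80.
above_80 = []
for name, score in scores.items():
    if score > 80:
        print(name)
        above_80.append(name)
```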


---

14. For Loop using Range  

Write a for loop using `range()` to print all even numbers between 1 and 20.  

- Re-question: How would you change the loop to also calculate the **sum** of these even numbers while iterating?  
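
A sketch that starts `range()` at 2 with a step of 2, so no parity test is needed, and accumulates the sum in the same loop. The stop value 21 is used so that 20 is included.

```python
total = 0
evens = []
for number in range(2, 21, 2):   # 2, 4, ..., 20
    print(number)
    evens.append(number)
    total += number              # running sum of the even numbers

print(total)
```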


---

15. Iterations over Pandas (ENAHO dataset)  [2 pts] 

Suppose you are analyzing the **National Household Survey (ENAHO)** dataset, specifically the file **`ENAHO01A-2023-400`**.  
The question of interest is **`P41601`**: *"How much was the total amount for the purchase or service?"* (original wording: *“¿Cuánto fue el monto total por la compra o servicio?”*).  

Write a `for` loop that iterates over the column `P41601` and prints values greater than **5000**.  

- **Re-question:** How would you optimize this task using **pandas vectorized operations** (e.g., boolean indexing) instead of a `for` loop, to make the analysis faster and more efficient?  
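
The contrast between the two approaches can be sketched on a toy column. The values below are made up and do not come from the ENAHO file; with the real data you would first load the module into `df`.

```python
import pandas as pd

# Toy stand-in for the P41601 column, including one missing value.
df = pd.DataFrame({"P41601": [1200.0, 7500.0, None, 5000.0, 9800.0]})

# Loop version: visit every value one by one.
large_loop = []
for value in df["P41601"]:
    if value > 5000:             # comparisons with NaN are False, so gaps are skipped
        print(value)
        large_loop.append(value)

# Vectorized version: build a boolean mask and index with it.
large_vectorized = df.loc[df["P41601"] > 5000, "P41601"]
print(large_vectorized)
```

The vectorized form evaluates the comparison on the whole column at once, which is typically much faster than a Python-level loop on large survey files.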

