# Day 4: NumPy Fundamentals

1. Overview  
2. Setting Up NumPy  
3. Vectorized Operations  
4. Indexing, Slicing & Masking  
5. Exercise 1: Array Creation & Properties  
6. Exercise 2: Indexing & Slicing Solar Data  
7. Exercise 3: Boolean Masking & Filtering  
8. Exercise 4: Array Statistics  

---

## 1. Overview

Welcome to Day 4 of my Energy Analytics journey! Today’s goals are to:

- Learn how to create and manipulate NumPy arrays  
- Perform vectorized operations for faster data processing  
- Explore indexing, slicing, and boolean masking  
- Compute basic array statistics (mean, median, std)  
- Complete four energy-related exercises in my Jupyter notebook  

---

## 2. Setting Up NumPy

In this section, I will:

- Import NumPy with the standard alias  
  ```python
  import numpy as np
  ```
- Create arrays from Python lists
- Use `np.array()`, `np.zeros()`, and `np.ones()`
- Check array properties: `.shape`, `.dtype`, `.ndim`

In [10]:
```python
import numpy as np
arr = np.array([1, 2, 3])
print(arr.shape, arr.dtype, arr.ndim)
```

```
(3,) int64 1
```
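
The cell above only exercises `np.array()`. As a minimal sketch of the other two constructors from the list, the following builds a zero-filled and a one-filled array and checks the same properties. The names `zeros_week` and `ones_grid` and their shapes are made up for illustration.

```python
import numpy as np

# Hypothetical names and shapes, chosen only for illustration.
zeros_week = np.zeros(7)       # 1-D array of seven 0.0 values
ones_grid = np.ones((2, 3))    # 2-D array: 2 rows, 3 columns of 1.0

print(zeros_week.shape, zeros_week.dtype, zeros_week.ndim)
print(ones_grid.shape, ones_grid.dtype, ones_grid.ndim)
```

Both constructors return `float64` arrays unless a `dtype` argument is passed.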


---

## 3. Vectorized Operations

I will learn to:

- Add, subtract, multiply, divide arrays elementwise  
- Use scalar broadcasting  
- Compare arrays to produce boolean masks  

In [11]:
```python
a = np.array([5.4, 6.2, 5.6])
b = np.array([1.0, 1.0, 1.0])
print(a + b)          # add 1 kWh/m² to each value
print(a * 2)          # double each value
print(a > 5.5)        # boolean mask for high irradiance
```

```
[6.4 7.2 6.6]
[10.8 12.4 11.2]
[False  True  True]
```
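
The cell above covers addition, scalar multiplication, and comparison. A small sketch of the remaining operations from the list (subtraction, division, scalar broadcasting) could look like this. The `baseline` value is a hypothetical reference level I picked for illustration, not measured data.

```python
import numpy as np

a = np.array([5.4, 6.2, 5.6])   # same sample values as the cell above
baseline = 5.0                  # hypothetical reference level, kWh/m²

diff = a - baseline             # scalar broadcasting: 5.0 is subtracted from every element
ratio = a / a.max()             # elementwise division by the largest value

print(diff)
print(ratio)
```

The scalar is never expanded into a full array by hand; NumPy applies it to each element, which is what "broadcasting" refers to here.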


---

## 4. Indexing, Slicing & Masking

I will practice:

- Accessing single elements (`arr[0]`)  
- Slicing ranges (`arr[1:4]`)  
- Multi-dimensional indexing  
- Boolean masking to filter values  

In [12]:
```python
data = np.arange(10)     # [0, 1, ..., 9]
print(data[2:7])         # slice: indices 2 to 6
mask = data % 2 == 0
print(data[mask])        # even numbers
```

```
[2 3 4 5 6]
[0 2 4 6 8]
```
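
Multi-dimensional indexing is on the list above but not shown in the cell. Here is a minimal sketch on a small 2-D array. The layout (3 rows by 4 columns) and the name `grid` are assumptions for illustration only.

```python
import numpy as np

# Hypothetical 2-D layout: 3 rows x 4 columns, filled with 0..11.
grid = np.arange(12).reshape(3, 4)

print(grid[1, 2])       # single element at row 1, column 2 (value 6)
print(grid[0])          # entire first row: 0, 1, 2, 3
print(grid[:, 1])       # entire second column: 1, 5, 9
print(grid[grid > 8])   # boolean mask on a 2-D array gives a 1-D result: 9, 10, 11
```

The comma separates the row index from the column index, and `:` on its own means "everything along this axis".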


---

### Exercise 1: Array Creation & Properties

In this cell, I:

1. Create a NumPy array `irr_wh = [5200, 4800, 5300, 5100, 4950]`  
2. Convert to kWh with a vectorized operation → `irr_kwh`  
3. Print `irr_kwh`, its shape, dtype, and ndim  

- **Why?**  
  Vectorized operations run in NumPy's compiled code, so they are typically much faster than Python loops on large arrays.  
- **How?**  
  I use `np.array()` and simple arithmetic on the array.  

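In [None]:
```python
import numpy as np

irr_wh = np.array([5200, 4800, 5300, 5100, 4950])

# Convert Wh to kWh (true division returns a float64 array)
irr_kwh = irr_wh / 1000

print(irr_kwh)
print(irr_kwh.shape, irr_kwh.dtype, irr_kwh.ndim)
```

```
[5.2  4.8  5.3  5.1  4.95]
(5,) float64 1
```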
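
To check the speed claim for myself, a rough timing sketch like the one below compares a Python loop with the vectorized conversion. The data is synthetic (evenly spaced values from `np.linspace`), the helper names `convert_loop` and `convert_vectorized` are my own, and the timings will differ from machine to machine, so no expected numbers are shown.

```python
import timeit
import numpy as np

# Synthetic data for illustration only: 200,000 evenly spaced Wh values.
big_wh = np.linspace(4000.0, 6000.0, 200_000)

def convert_loop(values):
    """Convert Wh to kWh one element at a time in a Python loop."""
    return np.array([v / 1000 for v in values])

def convert_vectorized(values):
    """Convert Wh to kWh with a single array operation."""
    return values / 1000

# Both approaches must produce the same numbers.
assert np.allclose(convert_loop(big_wh), convert_vectorized(big_wh))

# Total time for 3 runs of each approach, in seconds.
loop_s = timeit.timeit(lambda: convert_loop(big_wh), number=3)
vec_s = timeit.timeit(lambda: convert_vectorized(big_wh), number=3)
print(f"loop: {loop_s:.3f} s, vectorized: {vec_s:.3f} s")
```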


---

### Exercise 2: Indexing & Slicing Solar Data

In this cell, I:

1. Given `irr_kwh`, extract days 2–4 (slice `1:4`, i.e. indices 1 to 3) into `mid_month`  
2. Print `mid_month` and its mean  

- **Why?**  
  Slicing lets me focus on specific periods (e.g., peak season).  
- **How?**  
  I use `irr_kwh[1:4]` to get the subarray.  

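In [7]:
```python
import numpy as np

# Requires irr_kwh from Exercise 1
mid_month = irr_kwh[1:4]     # indices 1, 2, 3 (the stop index 4 is excluded)
print(mid_month)
print(np.mean(mid_month))
```

```
[4.8 5.3 5.1]
5.066666666666666
```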
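
The cell prints only the mean of the slice. If more summary values are wanted, one possible sketch is below. Picking min and max as the extra statistics is my assumption, and the array is rebuilt here so the block runs on its own.

```python
import numpy as np

# Rebuild the Exercise 1 array so this block is self-contained.
irr_kwh = np.array([5200, 4800, 5300, 5100, 4950]) / 1000
mid_month = irr_kwh[1:4]     # days 2 to 4

print("values:", mid_month)
print("min:", mid_month.min())
print("max:", mid_month.max())
print("mean:", mid_month.mean())
```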


---

### Exercise 3: Boolean Masking & Filtering

In this cell, I:

1. Create a boolean mask for irradiance ≥ 5.1 kWh/m²  
2. Apply the mask to `irr_kwh` to get `high_irr`  
3. Print `high_irr`  

- **Why?**  
  Masking helps me isolate days with excellent solar potential.  
- **How?**  
  I build the mask with `irr_kwh >= 5.1` and index the array with it: `irr_kwh[irr_kwh >= 5.1]`.  

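In [8]:
```python
mask = irr_kwh >= 5.1       # boolean mask: True where irradiance is at least 5.1 kWh/m²
high_irr = irr_kwh[mask]    # apply the mask to keep only those days
print(mask)
print(high_irr)
```

```
[ True False  True  True False]
[5.2 5.3 5.1]
```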
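
A boolean mask can also answer "how many days?" and "which days?". The sketch below assumes days are numbered from 1 in array order, which matches how Exercise 2 refers to them. The names `n_high` and `day_numbers` are mine.

```python
import numpy as np

# Rebuild the Exercise 1 array so this block is self-contained.
irr_kwh = np.array([5200, 4800, 5300, 5100, 4950]) / 1000
mask = irr_kwh >= 5.1

n_high = np.count_nonzero(mask)          # number of True entries in the mask
day_numbers = np.flatnonzero(mask) + 1   # 0-based indices converted to 1-based day numbers

print(n_high, day_numbers)
```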


---

### Exercise 4: Array Statistics

In this cell, I:

1. Compute the mean, median, and standard deviation of `irr_kwh` using NumPy functions  
2. Print the results with descriptive text  

- **Why?**  
  Summary statistics are essential to understand data distribution.  
- **How?**  
  I use `np.mean()`, `np.median()`, and `np.std()` on the array.  

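In [9]:
```python
# np.std defaults to the population standard deviation (ddof=0)
print("Mean:", np.mean(irr_kwh))
print("Median:", np.median(irr_kwh))
print("Std:", np.std(irr_kwh))
```

```
Mean: 5.069999999999999
Median: 5.1
Std: 0.17776388834631177
```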
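
One detail worth knowing about `np.std()`: by default it divides by N (population standard deviation). With only five days of data, the sample version (dividing by N - 1) may be the better fit. This is a minimal sketch of the difference using the `ddof` parameter, with the array rebuilt so the block runs on its own.

```python
import numpy as np

# Rebuild the Exercise 1 array so this block is self-contained.
irr_kwh = np.array([5200, 4800, 5300, 5100, 4950]) / 1000

pop_std = np.std(irr_kwh)              # default ddof=0: divide by N
sample_std = np.std(irr_kwh, ddof=1)   # ddof=1: divide by N - 1

print(f"population std: {pop_std:.4f} kWh/m²")
print(f"sample std:     {sample_std:.4f} kWh/m²")
```

The sample value is always the larger of the two, and the gap shrinks as the number of data points grows.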
