# 02 - Array Creation Basics

## Introduction

NumPy provides many ways to create arrays. Knowing which creation function fits a given situation helps you initialize arrays efficiently in data engineering tasks.

## What You'll Learn

- Creating arrays from lists
- Creating arrays with zeros, ones, empty
- Creating arrays with arange, linspace, logspace
- Creating arrays with random numbers
- Array data types (dtypes)
- Type conversion


In [1]:
import numpy as np


## Creating Arrays from Lists

The most common way to create arrays is from Python lists or tuples.


In [2]:
# From a list
arr1 = np.array([1, 2, 3, 4, 5])
print("From list:", arr1)

# From a tuple
arr2 = np.array((1, 2, 3, 4, 5))
print("From tuple:", arr2)

# 2D array from nested lists
arr3 = np.array([[1, 2, 3], [4, 5, 6]])
print("2D array:")
print(arr3)


From list: [1 2 3 4 5]
From tuple: [1 2 3 4 5]
2D array:
[[1 2 3]
 [4 5 6]]
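
When building arrays from lists, it helps to remember that `np.array` settles on one dtype and one shape for the whole array. The sketch below illustrates this; the variable names `mixed` and `grid` are only for illustration.

```python
import numpy as np

# Mixing ints and floats upcasts every element to a common type (float64)
mixed = np.array([1, 2.5, 3])
print(mixed, mixed.dtype)

# Nested lists of equal length become a 2D array with shape (rows, columns)
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.shape, grid.ndim)
```

If the inner lists have different lengths, NumPy cannot form a regular 2D array, so it is worth checking `shape` after creating an array from nested data.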


## Creating Arrays with Initial Values

NumPy provides functions to create arrays filled with specific values:


In [3]:
# Zeros: Create array filled with zeros
zeros_1d = np.zeros(5)
print("1D zeros:", zeros_1d)

zeros_2d = np.zeros((3, 4))
print("\n2D zeros (3x4):")
print(zeros_2d)

# Ones: Create array filled with ones
ones_1d = np.ones(5)
print("\n1D ones:", ones_1d)

ones_2d = np.ones((2, 3))
print("\n2D ones (2x3):")
print(ones_2d)

# Full: Create array filled with a specific value
full_arr = np.full((3, 3), 7)
print("\nArray filled with 7:")
print(full_arr)


1D zeros: [0. 0. 0. 0. 0.]

2D zeros (3x4):
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]

1D ones: [1. 1. 1. 1. 1.]

2D ones (2x3):
[[1. 1. 1.]
 [1. 1. 1.]]

Array filled with 7:
[[7 7 7]
 [7 7 7]
 [7 7 7]]
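
Note that `zeros` and `ones` return float64 arrays by default, as the output above shows, while `full` takes its type from the fill value. As a small sketch, the fill functions also accept a `dtype` argument, and the `*_like` variants copy the shape and dtype of an existing array. The names `counts` and `template` are illustrative.

```python
import numpy as np

# Ask for integer zeros instead of the default float64
counts = np.zeros(5, dtype=np.int64)
print(counts, counts.dtype)

# *_like functions reuse the shape and dtype of an existing array
template = np.array([[1.5, 2.5], [3.5, 4.5]])
same_shape_zeros = np.zeros_like(template)
same_shape_nines = np.full_like(template, 9)
print(same_shape_zeros)
print(same_shape_nines, same_shape_nines.dtype)
```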


In [4]:
# Empty: Allocate an array without initializing it (faster than zeros/ones)
# Use with caution: the contents are whatever was already in memory, so fill the array before reading it
empty_arr = np.empty((2, 3))
print("Empty array (uninitialized):")
print(empty_arr)

# Identity matrix: Square matrix with 1s on diagonal, 0s elsewhere
identity = np.eye(4)
print("\n4x4 Identity matrix:")
print(identity)


Empty array (uninitialized):
[[1. 1. 1.]
 [1. 1. 1.]]

4x4 Identity matrix:
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
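
A typical way to use `np.empty` safely is to allocate first and then overwrite every element before reading any of them. The sketch below shows that pattern, plus the `k` argument of `np.eye`, which shifts the diagonal. The variable names are illustrative.

```python
import numpy as np

# Allocate without initializing, then fill every slot before use
squares = np.empty(5, dtype=np.int64)
for i in range(squares.size):
    squares[i] = i * i
print(squares)

# k=1 places the ones on the diagonal just above the main one
upper = np.eye(3, k=1)
print(upper)
```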


## Creating Arrays with Ranges

NumPy provides several functions to create arrays with sequences of numbers:


In [5]:
# arange: Similar to Python's range, but returns an array
# np.arange(start, stop, step)
arr_range = np.arange(0, 10, 2)
print("arange(0, 10, 2):", arr_range)

arr_range2 = np.arange(5)  # Default start=0, step=1
print("arange(5):", arr_range2)

arr_range3 = np.arange(1, 10)  # Default step=1
print("arange(1, 10):", arr_range3)


arange(0, 10, 2): [0 2 4 6 8]
arange(5): [0 1 2 3 4]
arange(1, 10): [1 2 3 4 5 6 7 8 9]
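
Unlike Python's `range`, `arange` also accepts negative and floating-point steps. A short sketch with illustrative variable names:

```python
import numpy as np

# Negative step counts down; the stop value (0) is still excluded
countdown = np.arange(10, 0, -2)
print(countdown)

# A float step produces a float64 array
quarters = np.arange(0, 1, 0.25)
print(quarters, quarters.dtype)
```

For steps such as 0.1 that cannot be represented exactly in binary floating point, the number of elements can be hard to predict, so `linspace` is usually the safer choice when the element count matters.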


In [6]:
# linspace: Create array with evenly spaced numbers over a specified interval
# np.linspace(start, stop, num)
# Includes both endpoints by default (pass endpoint=False to exclude stop)
linspace_arr = np.linspace(0, 1, 5)
print("linspace(0, 1, 5):", linspace_arr)

linspace_arr2 = np.linspace(0, 10, 11)
print("linspace(0, 10, 11):", linspace_arr2)

# Useful for creating test data or plotting
x = np.linspace(0, 2*np.pi, 100)  # 100 points from 0 to 2π
print(f"\nlinspace(0, 2π, 100) - first 5 values: {x[:5]}")


linspace(0, 1, 5): [0.   0.25 0.5  0.75 1.  ]
linspace(0, 10, 11): [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]

linspace(0, 2π, 100) - first 5 values: [0.         0.06346652 0.12693304 0.19039955 0.25386607]
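
Two optional `linspace` arguments are worth a quick look: `endpoint=False` leaves out the stop value, and `retstep=True` also returns the spacing between points. A minimal sketch with illustrative variable names:

```python
import numpy as np

# Exclude the stop value: 5 points over the half-open interval [0, 1)
half_open = np.linspace(0, 1, 5, endpoint=False)
print(half_open)

# retstep=True returns a (array, step) pair
points, step = np.linspace(0, 1, 5, retstep=True)
print(points, step)
```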


In [7]:
# logspace: Create array with numbers spaced evenly on a log scale
# np.logspace(start, stop, num, base=10)
logspace_arr = np.logspace(0, 2, 5)  # 10^0 to 10^2, 5 points
print("logspace(0, 2, 5):", logspace_arr)
print("These are:", [f"10^{i}" for i in np.linspace(0, 2, 5)])


logspace(0, 2, 5): [  1.           3.16227766  10.          31.6227766  100.        ]
These are: ['10^0.0', '10^0.5', '10^1.0', '10^1.5', '10^2.0']
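
To make the relationship concrete, `logspace` can be viewed as raising the base to the values that `linspace` would produce. The sketch below checks this numerically and shows the `base` argument; the variable names are illustrative.

```python
import numpy as np

# base=2 gives powers of two: 2^0 through 2^4
powers_of_two = np.logspace(0, 4, 5, base=2)
print(powers_of_two)

# logspace(start, stop, num) matches 10 ** linspace(start, stop, num)
decades = np.logspace(0, 2, 5)
manual = 10 ** np.linspace(0, 2, 5)
print(np.allclose(decades, manual))
```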


## Creating Random Arrays

NumPy's random module provides functions to generate arrays with random values. This is very useful for testing, simulations, and initializing machine learning models.


In [8]:
# Set random seed for reproducibility (important for testing!)
np.random.seed(42)

# rand: Uniform distribution [0, 1)
random_arr = np.random.rand(3, 4)
print("Random array (3x4) from uniform [0,1):")
print(random_arr)

# randn: Standard normal distribution (mean=0, std=1)
normal_arr = np.random.randn(3, 4)
print("\nRandom array from standard normal:")
print(normal_arr)


Random array (3x4) from uniform [0,1):
[[0.37454012 0.95071431 0.73199394 0.59865848]
 [0.15601864 0.15599452 0.05808361 0.86617615]
 [0.60111501 0.70807258 0.02058449 0.96990985]]

Random array from standard normal:
[[-0.46947439  0.54256004 -0.46341769 -0.46572975]
 [ 0.24196227 -1.91328024 -1.72491783 -0.56228753]
 [-1.01283112  0.31424733 -0.90802408 -1.4123037 ]]


In [9]:
# randint: Random integers in a given range
# np.random.randint(low, high, size)
int_arr = np.random.randint(1, 10, (3, 4))
print("Random integers [1, 10):")
print(int_arr)

# random: Uniform distribution [0, 1), like rand, but the shape is passed as one tuple
uniform_arr = np.random.random((2, 3))
print("\nUniform random (2x3):")
print(uniform_arr)


Random integers [1, 10):
[[3 7 4 9]
 [3 5 3 7]
 [5 9 7 2]]

Uniform random (2x3):
[[0.94888554 0.96563203 0.80839735]
 [0.30461377 0.09767211 0.68423303]]
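
The functions above belong to NumPy's legacy random interface, which remains available. For new code, the NumPy documentation recommends the `Generator` interface obtained from `np.random.default_rng`. The sketch below shows rough equivalents of the calls used in this section; it is an alternative to the cells above, and the numbers it produces differ from the legacy outputs even with the same seed.

```python
import numpy as np

# A seeded generator keeps its own state instead of using a global seed
rng = np.random.default_rng(42)

uniform = rng.random((3, 4))              # uniform on [0, 1)
normal = rng.standard_normal((3, 4))      # mean 0, std 1
ints = rng.integers(1, 10, size=(3, 4))   # integers in [1, 10)

# Creating a second generator with the same seed reproduces the same stream
rng_again = np.random.default_rng(42)
print(np.array_equal(uniform, rng_again.random((3, 4))))
```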


## Specifying Data Types

You can specify the data type when creating arrays for memory efficiency and precision control.


In [10]:
# Specify dtype when creating arrays
arr_int32 = np.array([1, 2, 3], dtype=np.int32)
print(f"int32 array: {arr_int32}, dtype: {arr_int32.dtype}")

arr_float64 = np.array([1.0, 2.0, 3.0], dtype=np.float64)
print(f"float64 array: {arr_float64}, dtype: {arr_float64.dtype}")

# Create zeros with specific dtype
zeros_float = np.zeros((3, 3), dtype=np.float32)
print(f"\nZeros with float32: dtype = {zeros_float.dtype}")

# Memory comparison
arr_int64 = np.array([1, 2, 3], dtype=np.int64)
arr_int32 = np.array([1, 2, 3], dtype=np.int32)
print(f"\nMemory: int64 = {arr_int64.nbytes} bytes, int32 = {arr_int32.nbytes} bytes")


int32 array: [1 2 3], dtype: int32
float64 array: [1. 2. 3.], dtype: float64

Zeros with float32: dtype = float32

Memory: int64 = 24 bytes, int32 = 12 bytes
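
The saving is easier to appreciate on a larger array. As a sketch, `nbytes` is simply the number of elements times `itemsize`, so halving the element width halves the memory. The array size `n` here is an arbitrary illustrative choice.

```python
import numpy as np

n = 1_000_000
as_float64 = np.zeros(n, dtype=np.float64)
as_float32 = np.zeros(n, dtype=np.float32)

# nbytes == size * itemsize
print(as_float64.itemsize, as_float64.nbytes)
print(as_float32.itemsize, as_float32.nbytes)
```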


## Type Conversion

You can convert arrays from one data type to another:


In [11]:
# Create integer array
arr_int = np.array([1, 2, 3, 4, 5])
print(f"Original: {arr_int}, dtype: {arr_int.dtype}")

# Convert to float
arr_float = arr_int.astype(np.float64)
print(f"Converted to float: {arr_float}, dtype: {arr_float.dtype}")

# Convert to string
arr_str = arr_int.astype(str)
print(f"Converted to string: {arr_str}, dtype: {arr_str.dtype}")

# astype() always returns a new array; arr_int itself is unchanged
arr_new = arr_int.astype(np.float32)
print(f"\nUsing astype(): {arr_new}, dtype: {arr_new.dtype}")


Original: [1 2 3 4 5], dtype: int64
Converted to float: [1. 2. 3. 4. 5.], dtype: float64
Converted to string: ['1' '2' '3' '4' '5'], dtype: <U21

Using astype(): [1. 2. 3. 4. 5.], dtype: float32
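
Conversions can lose information, so it is worth seeing what `astype` does in the float-to-integer direction. In the sketch below (with illustrative variable names), the fractional part is dropped rather than rounded, and numeric strings are parsed into floats.

```python
import numpy as np

prices = np.array([1.7, 2.2, -1.7])

# Float to int drops the fractional part (truncates toward zero)
truncated = prices.astype(np.int64)
print(truncated)

# Round first if nearest-integer behavior is what you want
rounded = np.rint(prices).astype(np.int64)
print(rounded)

# Strings that hold numbers can be converted as well
parsed = np.array(["1.5", "2", "3.25"]).astype(np.float64)
print(parsed)
```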


## Common Data Types

Here are the most commonly used NumPy data types:

| Type | Description | Size |
|------|-------------|------|
| int8, int16, int32, int64 | Signed integers | 1, 2, 4, 8 bytes |
| uint8, uint16, uint32, uint64 | Unsigned integers | 1, 2, 4, 8 bytes |
| float16, float32, float64 | Floating point | 2, 4, 8 bytes |
| complex64, complex128 | Complex numbers | 8, 16 bytes |
| bool | Boolean | 1 byte |
| object | Python objects | Variable |
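| bytes_, str_ | Fixed-length byte strings and Unicode strings (`string_` was an alias for `bytes_`, removed in NumPy 2.0) | Depends on string length |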

**Best Practice**: Use the smallest data type that safely holds the full range of your data. This saves memory, but a type that is too small will overflow or lose precision.
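
Before shrinking a dtype, you can check its limits with `np.iinfo` (or `np.finfo` for floats). The sketch below uses illustrative data to show what happens when values do not fit: `astype` wraps them around silently instead of raising an error.

```python
import numpy as np

# Range of values an int8 can hold
info = np.iinfo(np.int8)
print(info.min, info.max)   # -128 127

values = np.array([100, 200, 300])

# Check the range before converting
fits = bool(values.min() >= info.min and values.max() <= info.max)
print("Fits in int8:", fits)

# Converting anyway wraps out-of-range values silently
wrapped = values.astype(np.int8)
print(wrapped)
```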


In [12]:
# Examples of different data types
arr_bool = np.array([True, False, True], dtype=bool)
print(f"Boolean: {arr_bool}, dtype: {arr_bool.dtype}")

arr_complex = np.array([1+2j, 3+4j], dtype=complex)
print(f"Complex: {arr_complex}, dtype: {arr_complex.dtype}")

arr_string = np.array(['hello', 'world'], dtype='U10')  # Unicode string, max 10 chars
print(f"String: {arr_string}, dtype: {arr_string.dtype}")


Boolean: [ True False  True], dtype: bool
Complex: [1.+2.j 3.+4.j], dtype: complex128
String: ['hello' 'world'], dtype: <U10
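
One consequence of fixed-length string dtypes is that longer values are cut off without warning. A minimal sketch, using an illustrative `'U5'` dtype:

```python
import numpy as np

# 'U5' holds at most 5 characters; anything longer is silently truncated
short = np.array(["hello world", "hi"], dtype="U5")
print(short, short.dtype)

# Each Unicode character takes 4 bytes, so 'U5' uses 20 bytes per element
print(short.itemsize)
```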


## Summary

In this notebook, you learned:

1. **Array creation methods**: From lists, zeros, ones, full, empty, identity
2. **Range functions**: arange, linspace, logspace for creating sequences
3. **Random arrays**: rand, randn, randint, random for generating random data
4. **Data types**: Understanding and specifying dtypes for memory efficiency
5. **Type conversion**: Using astype() to convert between data types

**Key Takeaways**:
- Choose the right creation method for your use case
- Use appropriate data types to save memory
- Random arrays are essential for testing and ML initialization
- linspace is great for creating evenly spaced test data

**Next Steps**: In the next notebook, we'll learn how to access and modify array elements through indexing and slicing.
