# NumPy Introduction Summary

## Overview
This chapter covers techniques for effectively loading, storing, and manipulating in-memory data in Python. The topic encompasses datasets from various sources and formats, including documents, images, sound clips, numerical measurements, and more.

## Documentation
[NumPy Docs](https://numpy.org/doc/stable/)

## Fundamental Data Concept
Despite the apparent heterogeneity of data types, all data can be fundamentally thought of as arrays of numbers:
- Images can be represented as two-dimensional arrays of numbers representing pixel brightness
- Sound clips can be represented as one-dimensional arrays of intensity versus time
- Text can be converted into numerical representations through various methods

Whatever the source, the first step in making data analyzable is to transform it into arrays of numbers.
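
As a minimal sketch of the representations listed above, the snippet below builds tiny synthetic stand-ins for an image, a sound clip, and a piece of text. The pixel values, the sample rate, the 440 Hz tone, and the character-code encoding are all illustrative assumptions, not data or methods from the chapter.

```python
import numpy as np

# A 3x4 grayscale "image": each entry is a pixel brightness from 0 to 255.
image = np.array([[0, 64, 128, 255],
                  [32, 96, 160, 224],
                  [16, 80, 144, 208]], dtype=np.uint8)

# A short "sound clip": intensity sampled at evenly spaced times.
sample_rate = 8000                      # samples per second (assumed)
t = np.arange(80) / sample_rate         # 80 sample times, in seconds
clip = np.sin(2 * np.pi * 440 * t)      # a 440 Hz tone (assumed)

# One of many possible text encodings: Unicode code points per character.
codes = np.array([ord(ch) for ch in "NumPy"])

print(image.shape)   # (3, 4)
print(clip.shape)    # (80,)
print(codes)         # [ 78 117 109  80 121]
```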

## NumPy's Role
Efficient storage and manipulation of numerical arrays is fundamental to data science. Python provides specialized tools for this:
- **NumPy package** (covered in this chapter)
- **Pandas package** (covered in Chapter 3)

## What is NumPy?
NumPy is the fundamental package for scientific computing in Python. It provides:
- A multidimensional array object (ndarray)
- Various derived objects (masked arrays, matrices)
- Routines for fast operations on arrays including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, and random simulation

## Core NumPy Features

### The ndarray Object
At the core of NumPy is the `ndarray` object, which encapsulates n-dimensional arrays of homogeneous data types. Key characteristics:
- **Fixed size**: NumPy arrays have a fixed size at creation, unlike Python lists, which can grow dynamically. Changing the size of an array creates a new array
- **Homogeneous data types**: All elements must be of the same data type and size in memory
- **Compiled performance**: Many operations are performed in compiled code for speed
- **Efficient operations**: Advanced mathematical operations on large datasets are executed more efficiently than with Python's built-in sequences
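
The first two characteristics can be checked directly. The sketch below is illustrative only (the variable names are arbitrary): it shows that "appending" produces a new array rather than growing the original, and that mixed inputs are converted to one common data type.

```python
import numpy as np

a = np.array([1, 2, 3])

# Fixed size: np.append does not grow `a`; it returns a new, larger array.
b = np.append(a, 4)
print(a.shape, b.shape)   # (3,) (4,)
print(b is a)             # False

# Homogeneous types: mixing ints and a float upcasts everything to float.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)        # float64

# Storing a float in an integer array truncates it to fit the dtype.
a[0] = 7.9
print(a[0])               # 7
```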

### Key Differences from Python Sequences
1. **Size flexibility**: Python lists can grow dynamically; NumPy arrays cannot be resized in place
2. **Data type consistency**: NumPy arrays require homogeneous data types
3. **Performance**: NumPy operations are much faster for large datasets
4. **Scientific computing integration**: Most scientific Python packages use NumPy arrays internally

## Why is NumPy Fast?

### Vectorization
Vectorization eliminates explicit looping and indexing in code by handling these operations behind the scenes in optimized, pre-compiled C code. Benefits include:
- More concise and readable code
- Fewer bugs due to reduced code complexity
- Code that more closely resembles standard mathematical notation
- More "Pythonic" code without inefficient for loops
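
To make the contrast concrete, here is a small sketch that multiplies two sequences element by element, first with an explicit Python loop and then with a single vectorized expression. The sample values are arbitrary.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

# Explicit loop: index every element by hand.
c_loop = []
for i in range(len(a)):
    c_loop.append(a[i] * b[i])

# Vectorized: the loop runs inside NumPy's compiled code.
c_vec = a * b

print(c_vec)   # [ 10.  40.  90. 160.]
```

The vectorized line reads like the mathematical expression `c = a * b`, with no loop counter or index to get wrong.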

### Broadcasting
Broadcasting describes the implicit element-by-element behavior of operations in NumPy:
- Element-wise operations (arithmetic, logical, bit-wise, functional) broadcast by default
- Supports operations between arrays of different shapes when the smaller array can be expanded unambiguously
- Enables efficient operations without explicit loops
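
A short sketch of broadcasting with arbitrary sample values: a scalar, a row, and a column are each combined with a 2x3 array, and a final case shows shapes that cannot be expanded unambiguously.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])            # shape (3,)
col = np.array([[100], [200]])          # shape (2, 1)

plus_scalar = matrix + 1     # scalar is applied to every element
plus_row = matrix + row      # row is repeated down the 2 rows
plus_col = matrix + col      # column is repeated across the 3 columns

print(plus_row)   # [[11 22 33]
                  #  [14 25 36]]
print(plus_col)   # [[101 102 103]
                  #  [204 205 206]]

# Shapes (2, 3) and (2,) do not line up, so NumPy refuses to guess.
try:
    matrix + np.array([1, 2])
    broadcast_failed = False
except ValueError as exc:
    broadcast_failed = True
    print("Cannot broadcast:", exc)
```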

### Performance Comparison
- **Python lists**: Require explicit loops and are inefficient for large datasets
- **C code**: Fast but requires more complex implementation and loses Python benefits
- **NumPy**: Approaches the speed of C for array operations while keeping the simplicity of Python syntax
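
One way to see the difference on your own machine is a rough timing sketch like the one below. The array size, the repeat count, and the helper names `square_with_loop` and `square_with_numpy` are arbitrary choices for illustration; actual timings depend on hardware and NumPy version, so no expected numbers are shown.

```python
import timeit

import numpy as np

n = 100_000
values = list(range(n))
array = np.arange(n, dtype=np.int64)

def square_with_loop():
    # Pure Python: one interpreted multiplication per element.
    return [v * v for v in values]

def square_with_numpy():
    # NumPy: a single call that loops in compiled code.
    return array * array

loop_time = timeit.timeit(square_with_loop, number=20)
numpy_time = timeit.timeit(square_with_numpy, number=20)

print(f"Python list: {loop_time:.4f} s")
print(f"NumPy array: {numpy_time:.4f} s")
```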

## NumPy in the Scientific Ecosystem
- NumPy arrays are the de-facto standard for multi-dimensional data interchange in Python
- Most scientific and mathematical Python packages use NumPy arrays internally
- Even packages that accept Python sequences convert them to NumPy arrays for processing
- Understanding NumPy is essential for effective use of scientific Python software
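
The conversion pattern mentioned above can be sketched as follows. `column_means` is a hypothetical function written for this example, not part of any real package; it only illustrates how a library might accept plain sequences and convert them with `np.asarray`.

```python
import numpy as np

def column_means(data):
    """Hypothetical library-style function that accepts any nested sequence."""
    arr = np.asarray(data, dtype=float)   # lists and tuples become an ndarray
    return arr.mean(axis=0)

print(column_means([[1, 2], [3, 4]]))     # [2. 3.]

# When the input is already a suitable array, np.asarray returns it as is.
existing = np.array([[1.0, 2.0], [3.0, 4.0]])
print(np.asarray(existing) is existing)   # True
```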

## Object-Oriented Approach
NumPy fully supports object-oriented programming:
- `ndarray` is a class with numerous methods and attributes
- Methods are mirrored by functions in the NumPy namespace
- Allows programmers to choose between object-oriented and functional paradigms
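
A brief sketch of the two styles side by side, using arbitrary sample values. Most paired calls are interchangeable, but sorting is a useful exception to remember: `np.sort` returns a sorted copy, while the `ndarray.sort` method sorts in place.

```python
import numpy as np

arr = np.array([[3, 1, 2],
                [6, 5, 4]])

# Method style and function style give the same results.
print(arr.sum(), np.sum(arr))          # 21 21
print(arr.max(), np.max(arr))          # 6 6
print(arr.reshape(3, 2).shape)         # (3, 2)
print(np.reshape(arr, (3, 2)).shape)   # (3, 2)

# np.sort returns a new array and leaves `arr` unchanged.
sorted_copy = np.sort(arr)
print(arr[0])                          # [3 1 2]

# The method sorts the array itself (along the last axis by default).
arr.sort()
print(arr[0])                          # [1 2 3]
```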

## Installation and Setup
- For manual installation, visit the NumPy website for instructions
- Recommended version: 1.8 or later
- Convention: import NumPy using `np` as an alias: `import numpy as np`



## How NumPy Manages Memory Internally
A key reason for NumPy's speed is the way it lays out array data in memory.

- **Contiguous Memory Block**: A Python list stores pointers to objects that can be scattered anywhere in memory. A newly created NumPy array instead stores its data in a single contiguous block of memory. This layout is a major reason for NumPy's performance: pre-compiled C code can run over the whole array without per-element pointer lookups or type checks, which lets it benefit from CPU SIMD instructions and good memory access patterns (cache locality).

- **Homogeneous Data Types**: All elements in a NumPy array must have the same data type (e.g., `int64`, `float32`). Consequently, every element occupies the same amount of memory. This uniformity allows NumPy to calculate the memory address of any element in the array with a simple arithmetic formula, without having to look up pointers.

- **Strides**: NumPy keeps track of the data layout in memory through a concept called "strides". The strides are a tuple of integers specifying the number of bytes to "step" in each dimension to move to the next element. This clever mechanism allows NumPy to interpret the same block of memory in different ways (e.g., as a 1D array or a 2D matrix) without making copies of the data.
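
The points above can be explored with a small sketch. It fixes the data type to `int64` (8 bytes per element) so that the stride values are predictable; the array contents and index choices are arbitrary.

```python
import numpy as np

flat = np.arange(12, dtype=np.int64)   # 12 elements, 8 bytes each
grid = flat.reshape(3, 4)              # same memory, viewed as 3 rows x 4 columns

print(flat.strides)                    # (8,)
print(grid.strides)                    # (32, 8): 32 bytes to the next row, 8 to the next column
print(np.shares_memory(flat, grid))    # True: reshape did not copy the data

# Address arithmetic: the byte offset of grid[i, j] follows from the strides.
i, j = 2, 1
offset = i * grid.strides[0] + j * grid.strides[1]   # 2*32 + 1*8 = 72 bytes
print(offset // grid.itemsize, grid[i, j])           # 9 9

# Transposing only swaps the strides; the underlying buffer is untouched.
print(grid.T.strides)                  # (8, 32)

# Because both names refer to one buffer, a write through one is seen by the other.
grid[0, 0] = 99
print(flat[0])                         # 99
```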

## Basic Usage & Examples
Here are some examples to get you started with NumPy.

**1. Importing NumPy**
```python
import numpy as np

# Check the installed version. It's good practice to know what version you are working with.
print(f"NumPy Version: {np.__version__}")
```


**2. Creating Arrays**
NumPy arrays can be created from Python lists and can have any number of dimensions.
```python
# Create a 1D array from a Python list
arr_1d = np.array([1, 2, 3, 4, 5])  # np.arange(1, 6) produces the same array
print("1D Array:")
print(arr_1d)

# Create a 2D array (matrix) from a list of lists
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("\n2D Array:")
print(arr_2d)
```
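
Besides converting lists, NumPy offers routines that build arrays directly. The sketch below shows a few commonly used ones along with a three-dimensional array; the shapes and values are arbitrary examples.

```python
import numpy as np

zeros = np.zeros((2, 3))              # 2x3 array filled with 0.0
steps = np.arange(1, 6)               # integers from 1 up to (not including) 6
points = np.linspace(0.0, 1.0, 5)     # 5 evenly spaced values from 0 to 1

# A 3D array built from a nested list: 2 blocks of 2 rows x 2 columns.
cube = np.array([[[1, 2], [3, 4]],
                 [[5, 6], [7, 8]]])

print(zeros.shape)   # (2, 3)
print(steps)         # [1 2 3 4 5]
print(points)        # [0.   0.25 0.5  0.75 1.  ]
print(cube.shape)    # (2, 2, 2)
```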

**3. Inspecting Array Attributes**
You can easily inspect the properties of a NumPy array.
```python
# The type of a NumPy array is ndarray
print(f"\nType of arr_1d: {type(arr_1d)}")

# Get the number of dimensions (axes) of the array
print(f"Number of dimensions of arr_1d: {arr_1d.ndim}")
print(f"Number of dimensions of arr_2d: {arr_2d.ndim}")

# Get the shape of the array (number of elements in each dimension)
print(f"Shape of arr_1d: {arr_1d.shape}")
print(f"Shape of arr_2d: {arr_2d.shape}")

# Get the data type of the elements in the array
print(f"Data type of arr_1d: {arr_1d.dtype}")
```


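**4. Inspecting Memory Layout**
Array attributes also expose the memory details described earlier: the size of each element, the total size of the data, and the strides.
```python
# Memory information for the 2D array created earlier
print("\nMemory information for arr_2d:")
print(f"Data type: {arr_2d.dtype}")
print(f"Size of each item: {arr_2d.itemsize} bytes")
print(f"Total size of array data: {arr_2d.nbytes} bytes")
print(f"Strides (bytes per step in each dimension): {arr_2d.strides}")
```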

## Getting Help
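NumPy has extensive built-in documentation, accessible through Python's `help()` function. In IPython and Jupyter, the `?` and `??` suffixes offer a shortcut.
```python
# help() prints the full documentation for a module or object.
# Uncomment the lines below to try them. Be aware that they produce a lot of output.

# help(np)
# help(np.ndarray)

# IPython/Jupyter only (these are a syntax error in plain Python):
# np?    # show the docstring
# np??   # also show the source code, when available
```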





