<a href="https://colab.research.google.com/github/alongiladi/Machine_Learning_With_Python/blob/main/NumPy_Intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NumPy: The Superpower for Data in Python 🚀

Welcome to the world of NumPy! If you're interested in data science, machine learning, or any kind of scientific computing in Python, NumPy is a library you'll want to be best friends with. Think of it as a super-powered calculator that can handle huge amounts of data with ease. It's built around a fast, fixed-type container called the N-dimensional array, and it's packed with tools for everything from linear algebra to generating random numbers.

## Getting Started: Creating Your First Arrays

First things first, let's bring NumPy into our project. The standard way to do this is to import it with the alias `np`. This is a widely used convention that makes your code easier for others to read.
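
A minimal version of that import cell is sketched here. The version print is only an illustrative sanity check and is not part of the original notebook:

```python
import numpy as np  # conventional alias used throughout this notebook

# Optional sanity check: confirm NumPy is installed and importable.
print(np.__version__)
```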

**Why is this important?**
This single line of code is your gateway to all of NumPy's amazing features. In machine learning, you'll be constantly working with data, and NumPy provides the foundation for organizing and manipulating that data efficiently. Whether you're dealing with images, text, or sensor readings, NumPy is where the journey begins.

## The Power of Zeros: Initializing Your Data

The `zeros` function is a simple but powerful tool. It lets you create an array filled with zeros. This is incredibly useful when you need to create a placeholder for data that you'll fill in later.

In [None]:
a = np.zeros(5)
print(a)

[0. 0. 0. 0. 0.]


**Real-world application:**
Imagine you're building a machine learning model. You might start by initializing the model's biases to zero (weights are usually initialized randomly, as covered later in this notebook). This array is your blank canvas.

Creating a 2D array (a matrix) is just as easy. Just provide a tuple with the number of rows and columns you want. Here's a 3x4 matrix:
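
A small sketch of that call, assuming NumPy is imported as `np`. The name `grid` is only illustrative:

```python
import numpy as np

# The shape is passed as a single tuple: (rows, columns).
grid = np.zeros((3, 4))
print(grid)
```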

**Real-world application:**
This could represent a simple game board, a black and white image, or the initial state of a neural network layer. For example, in a game of tic-tac-toe, a 3x3 matrix of zeros could be the starting point.

## Know Your Array: A Bit of Vocabulary

* **Axis:** In NumPy, a dimension is called an axis.
* **Rank:** The number of axes is the rank (NumPy exposes it as `ndim`). A 2D matrix has a rank of 2. This is not the same thing as the rank of a matrix in linear algebra.
* **Shape:** The length of each axis is the shape. Our 3x4 matrix has a shape of `(3, 4)`.
* **Size:** The total number of elements in the array. For our 3x4 matrix, the size is 12.

## Let's Create a 3x4 Matrix of Zeros
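
The cell for this step might look like the following sketch, assuming NumPy is imported as `np`:

```python
import numpy as np

a = np.zeros((3, 4))  # 3 rows, 4 columns, every element 0.0
print(a)
```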

**Why this is useful:**
This creates a 3x4 matrix filled with zeros and assigns it to the variable `a`. In machine learning, you might use a matrix like this to represent a batch of data, where each row is a data point and each column is a feature.

## What's the Shape of My Array?
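
A quick sketch of reading the `shape` attribute, using a 3x4 matrix of zeros named `a`:

```python
import numpy as np

a = np.zeros((3, 4))
print(a.shape)  # → (3, 4)
```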

**Why this is important:**
Knowing the shape of your arrays is crucial in machine learning. If you try to perform an operation on two arrays with incompatible shapes, you'll get an error. Checking the shape helps you debug your code and ensure your data is structured correctly.

## How Many Dimensions Do I Have?

In [None]:
a.ndim  # equal to len(a.shape)

**Why this matters:**
This tells you the number of dimensions (or the rank) of your array. For a 2D array, it will be 2. In deep learning, you'll often work with 'tensors', which are just multi-dimensional arrays. The number of dimensions tells you what kind of data you're dealing with (e.g., a 1D vector, a 2D matrix, or a 3D tensor for an image).

## How Many Elements in Total?
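
A quick sketch of reading the `size` attribute, again using a 3x4 matrix of zeros named `a`:

```python
import numpy as np

a = np.zeros((3, 4))
print(a.size)  # → 12, i.e. 3 * 4
```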

**Why this is useful:**
This gives you the total number of elements in your array. It's a quick way to get a sense of the scale of your data.

## Beyond 2D: N-Dimensional Arrays

You can create arrays with any number of dimensions. Here's a 3D array (rank 3) with the shape `(2,3,4)`:
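
One way to build such an array is sketched here with `np.zeros`. The variable name `b` is an assumption made for illustration:

```python
import numpy as np

b = np.zeros((2, 3, 4))  # 2 blocks, each with 3 rows and 4 columns
print(b.ndim)   # → 3
print(b.shape)  # → (2, 3, 4)
```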

**Real-world application:**
This could represent a color image (with height, width, and color channels), a video (with frames, height, and width), or a batch of sentences in natural language processing.

## The `ndarray`: NumPy's Core

All NumPy arrays are of the type `ndarray`:
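
A short check you could run to confirm this, using a zeros matrix as the example array:

```python
import numpy as np

a = np.zeros((3, 4))
print(type(a))                    # → <class 'numpy.ndarray'>
print(isinstance(a, np.ndarray))  # → True
```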

**Why this is important:**
The `ndarray` is the heart of NumPy. It's a powerful and efficient data structure that allows for fast numerical operations. Understanding that you're working with `ndarray` objects is key to using NumPy effectively.

## More Array Creation Tricks

NumPy provides many other ways to create arrays.

### `np.ones`
Here's a 3x4 matrix full of ones:
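
A minimal sketch, assuming NumPy is imported as `np`. The name `ones_matrix` is only illustrative:

```python
import numpy as np

ones_matrix = np.ones((3, 4))  # same shape tuple convention as np.zeros
print(ones_matrix)
```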

**Real-world application:**
This can be useful for creating masks in image processing or for initializing weights in a neural network to a uniform value.

### `np.full`
Create an array of a given shape, filled with a specific value. Here's a 3x4 matrix full of `π`.
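
A minimal sketch using NumPy's built-in constant `np.pi`. The name `pi_matrix` is only illustrative:

```python
import numpy as np

# First argument is the shape, second is the fill value.
pi_matrix = np.full((3, 4), np.pi)
print(pi_matrix)
```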

**Why this is useful:**
This is great for initializing an array to a specific constant value that you'll use in your calculations.

### `np.empty`
Create an uninitialized 2x3 array. The contents are unpredictable because they are whatever happens to be in that block of memory at the time.
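
A minimal sketch follows. The name `scratch` is illustrative, and the printed values will differ from run to run, so only the shape is meaningful:

```python
import numpy as np

scratch = np.empty((2, 3))  # memory is allocated but not initialized
print(scratch.shape)        # → (2, 3)

# Fill it before reading from it, for example with a constant:
scratch[:] = 7.0
```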

**Why this is useful:**
When you need to allocate memory for an array but don't want to waste time initializing it, `np.empty` is your friend. You can then fill it with your data later.

### `np.array`
You can also create a NumPy array from a regular Python list.
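
A small sketch with a nested list. The values and the names `rows` and `d` are made up for illustration:

```python
import numpy as np

rows = [[1, 2, 3, 4], [10, 20, 30, 40]]  # illustrative values
d = np.array(rows)  # each inner list becomes one row

print(d.shape)  # → (2, 4)
```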

**Why this is important:**
This is how you'll often convert your existing data into a NumPy array so you can take advantage of all its powerful features.

### `np.arange`
Similar to Python's `range` function, `np.arange` lets you create an array with a sequence of numbers.
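
A minimal sketch. As with `range`, the stop value is excluded; the name `e` is only illustrative:

```python
import numpy as np

e = np.arange(1, 5)  # start at 1, stop before 5
print(e)             # → [1 2 3 4]
```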

**Why this is useful:**
This is great for creating sequences of numbers for indexing, plotting, or generating sample data.

You can also specify a step size:
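
A sketch with a step of 0.5. The name `f` is illustrative, and the `np.linspace` line is an addition not found in the original notebook: with floating-point steps the number of elements returned by `np.arange` can be hard to predict, so `np.linspace`, which takes the number of points directly, is often the safer choice.

```python
import numpy as np

f = np.arange(1.0, 5.0, 0.5)  # third argument is the step
print(len(f))                 # → 8 values: 1.0, 1.5, ..., 4.5

# Same values, but specified by count instead of step:
f_alt = np.linspace(1.0, 4.5, 8)
```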

**Why this is useful:**
This allows you to create sequences with non-integer steps, which is useful for creating grids of values for things like hyperparameter tuning in machine learning.

## Reshaping Your Data

The `reshape` method is one of the most powerful tools in your NumPy arsenal. It lets you change the shape of an array without changing its data. Importantly, the reshaped array is normally a view that points to the *same* data, so changing one will change the other.

In [None]:
g = np.arange(24)
print(g)

This creates a 1D array with numbers from 0 to 23. Now, let's reshape it.

In [None]:
g2 = g.reshape(4, 6)  # assumed target shape; any shape with 24 elements works
print(g2)

**Why this is so important:**
Reshaping is a fundamental operation in machine learning. You'll constantly be reshaping your data to fit the input requirements of different models. For example, you might have a long 1D array of pixel values that you need to reshape into a 2D image.
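
To make the shared-data point concrete, here is a small sketch. The `(4, 6)` shape is chosen for illustration, and `np.shares_memory` is used only to confirm that the two arrays overlap in memory:

```python
import numpy as np

g = np.arange(24)      # 0, 1, ..., 23
g2 = g.reshape(4, 6)   # 4 * 6 == 24, so the element count matches

g2[0, 0] = 999         # write through the reshaped array
print(g[0])            # → 999, because g2 is a view of g here
print(np.shares_memory(g, g2))  # → True
```

If you need an independent array instead, calling `g.reshape(4, 6).copy()` gives you one.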

## Generating Random Data

NumPy's `random` module is your go-to for creating arrays with random values. This is essential for everything from initializing the weights of a neural network to creating synthetic data for testing your models.

### `np.random.rand`
Here's a 3x4 matrix with random numbers between 0 and 1 (from a uniform distribution):
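
A minimal sketch. Note that `np.random.rand` takes the dimensions as separate arguments rather than as a tuple; the name `uniform_matrix` is illustrative, and the values change on every run:

```python
import numpy as np

uniform_matrix = np.random.rand(3, 4)  # samples from the interval [0, 1)
print(uniform_matrix)
```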

**Why this is important:**
Random initialization is a key concept in machine learning. It helps to break the symmetry in your model and allows it to learn more effectively.

### `np.random.randn`
Here's a 3x4 matrix with random numbers from a standard normal distribution (mean 0, variance 1):
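
A minimal sketch with the same calling convention as `np.random.rand`. The name `normal_matrix` is illustrative, and the values change on every run:

```python
import numpy as np

normal_matrix = np.random.randn(3, 4)  # standard normal samples
print(normal_matrix)
```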

**Why this is important:**
The normal distribution is incredibly common in nature and statistics. Initializing weights from a normal distribution is a standard practice in deep learning.

## Visualizing the Distributions

Let's see what these two distributions look like. We'll use `matplotlib` to create a histogram.

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.hist(np.random.rand(100000), density=True, bins=100, histtype="step", color="blue", label="rand")
plt.hist(np.random.randn(100000), density=True, bins=100, histtype="step", color="red", label="randn")
plt.axis([-2.5, 2.5, 0, 1.1])
plt.legend(loc="upper left")
plt.title("Random distributions")
plt.xlabel("Value")
plt.ylabel("Density")
plt.show()


**What this shows us:**
The blue line (`rand`) shows the uniform distribution, where every value between 0 and 1 has an equal chance of being chosen. The red line (`randn`) shows the normal distribution, where values are clustered around the mean (0) and become less likely as you move further away.

**Exercise: Your Turn to Be a Data Scientist!**

**The Scenario:**
You're a data scientist at a gaming company. You've just collected data on the reaction times of two groups of players: 'Pros' and 'Amateurs'. You need to analyze this data to see if there's a significant difference between the two groups.

**Your Mission:**
1.  **Generate the Data:** Create two synthetic datasets to represent the reaction times of the two groups.
2.  **Analyze the Data:** Calculate some basic statistics for each group.
3.  **Explore the Data:** Practice your NumPy skills by creating and manipulating arrays.
4.  **Compare the Groups:** See how the two groups stack up against each other.

**Instructions:**

1.  **Generate the Data:**
    *   **Amateurs:** Create a NumPy array with 1000 random values from a normal distribution with a mean of 250ms and a standard deviation of 50ms.
    *   **Pros:** Create a NumPy array with 1000 random values from a normal distribution with a mean of 150ms and a standard deviation of 30ms.

2.  **Analyze the Data:**
    For each group, calculate:
    *   The average reaction time (mean).
    *   The standard deviation.
    *   The fastest and slowest reaction times (min and max).
    *   The 25th, 50th (median), and 75th percentiles.

3.  **Explore the Data:**
    *   Create a 1D NumPy array of player IDs (from 0 to 999) using `np.array`.
    *   Use `np.arange` to create a second array holding the same player IDs.
    *   Reshape the `np.arange` array into a 2D array with 500 rows and 2 columns.
    *   Check the dimensionality of your arrays.

4.  **Compare the Groups:**
    *   Count how many amateurs had a faster reaction time than the average pro player.

In [None]:
import numpy as np

# Step 1: Generate the data
amateurs =
pros =

# Step 2: Calculate basic statistics
def compute_stats(feature):
    mean =
    std =
    min_value =
    max_value =
    percentiles =

    return mean, std, min_value, max_value, percentiles

stats_amateurs =
stats_pros =

# Display the results
print("Amateur Player Statistics:")
print(f"Average Reaction Time: {}ms, Std: {}ms, Min: {}ms, Max: {}ms")
print(f"25th, 50th, 75th percentiles: {}")

print("Pro Player Statistics:")
print(f"Average Reaction Time: {}ms, Std: {}ms, Min: {}ms, Max: {}ms")
print(f"25th, 50th, 75th percentiles: {}")

# Step 3: Working with Arrays
# Create a 1D array using np.array
player_ids =
print("1D Array of player IDs:")
print(player_ids)

# Create a 2D array using np.arange and np.reshape
reshaped_ids =
print("Reshaped Array of player IDs (500 rows, 2 columns):")
print(reshaped_ids)

# Check the dimensionality using ndim
print("Dimensionality of the 1D array:", )
print("Dimensionality of the 2D array:", )

# Step 4: Compare the two features
avg_pro_reaction_time =
faster_amateurs =
print(f"Number of amateurs faster than the average pro: {}")
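
If you get stuck, here is one possible way to fill in the blanks. It is a sketch rather than the only correct answer: the fixed seed, the use of `np.random.normal`, and the one-decimal formatting are assumptions made for illustration, and `250 + 50 * np.random.randn(1000)` would work just as well for Step 1.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed so reruns give the same numbers

# Step 1: reaction times in ms, drawn from normal distributions
amateurs = np.random.normal(loc=250, scale=50, size=1000)
pros = np.random.normal(loc=150, scale=30, size=1000)

# Step 2: basic statistics for one group
def compute_stats(values):
    percentiles = np.percentile(values, [25, 50, 75])
    return values.mean(), values.std(), values.min(), values.max(), percentiles

for label, data in (("Amateur", amateurs), ("Pro", pros)):
    mean, std, lo, hi, pct = compute_stats(data)
    print(f"{label} Player Statistics:")
    print(f"Average Reaction Time: {mean:.1f}ms, Std: {std:.1f}ms, "
          f"Min: {lo:.1f}ms, Max: {hi:.1f}ms")
    print(f"25th, 50th, 75th percentiles: {np.round(pct, 1)}")

# Step 3: player IDs built two ways, then reshaped
player_ids = np.array(list(range(1000)))        # from a Python list
reshaped_ids = np.arange(1000).reshape(500, 2)  # 500 rows, 2 columns
print("Dimensionality of the 1D array:", player_ids.ndim)
print("Dimensionality of the 2D array:", reshaped_ids.ndim)

# Step 4: amateurs faster (lower time) than the average pro
avg_pro_reaction_time = pros.mean()
faster_amateurs = np.sum(amateurs < avg_pro_reaction_time)
print(f"Number of amateurs faster than the average pro: {faster_amateurs}")
```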