### Introduction to Numpy
Numpy is a powerful library for numerical computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently.

##### Installing and Importing Numpy
To use Numpy, you first need to install it if you haven't already. You can do this using pip:

In [1]:
!pip install numpy



After installation, you can import Numpy in your Python code as follows:

In [3]:
import numpy as np

##### Use Case
Before we dive into the details, let's look at a use case where Numpy shines. Suppose you have a list of numbers and want to square each one and add up the results. Here is how Numpy simplifies this compared to plain Python lists:

In [4]:
# Using Python Lists
numbers = [1, 2, 3, 4, 5]
squared = [x**2 for x in numbers]
sum_squared = sum(squared)

# Using Numpy
import numpy as np
numbers_np = np.array(numbers)
squared_np = numbers_np**2
sum_squared_np = np.sum(squared_np)

In this example, Numpy allows you to perform element-wise operations on arrays, making your code concise and efficient.

### Motivation: Why Use Numpy? How Is It Different from Python Lists?
Python lists are flexible but are not optimized for numerical work. Numpy arrays offer several advantages:

- **Performance**: Numpy operations are typically much faster than equivalent loops over Python lists, especially for large datasets. The timing below comes from one run; exact numbers depend on your machine.
- **Concise Code**: Numpy enables concise and readable code for mathematical operations.
- **Multidimensional Arrays**: Numpy supports multi-dimensional arrays, which are essential for tasks like image processing and machine learning.

In [5]:
import timeit
# Using Python Lists
python_list = list(range(1, 10001))
python_square_time = timeit.timeit(lambda: [x**2 for x in python_list], number=10000)

# Using Numpy
numpy_array = np.arange(1, 10001)
numpy_square_time = timeit.timeit(lambda: numpy_array**2, number=10000)

print(f"Time taken for Python List: {python_square_time} seconds")
print(f"Time taken for Numpy Array: {numpy_square_time} seconds")


Time taken for Python List: 24.204637699993327 seconds
Time taken for Numpy Array: 0.05883340002037585 seconds


### Basics of Numpy Arrays
Numpy arrays are the core data structure for numerical computations in Numpy. They are homogeneous, meaning all elements are of the same data type, which allows for efficient mathematical operations.
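To see homogeneity in action, here is a minimal sketch that inspects the `dtype` Numpy chooses when a list mixes types. Rather than storing mixed values, Numpy converts every element to one common type (the variable names are only for illustration):

```python
import numpy as np

# All integers: Numpy picks an integer dtype
ints = np.array([1, 2, 3])
print(ints.dtype.kind)  # 'i' (signed integer)

# An int mixed with a float: every element is upcast to float
mixed_numbers = np.array([1, 2.5, 3])
print(mixed_numbers.dtype)  # float64

# A number mixed with a string: every element becomes a string
mixed_text = np.array([1, "a"])
print(mixed_text.dtype.kind)  # 'U' (Unicode string)
print(mixed_text[0])          # the integer 1 is now the string '1'
```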

**Creating a Basic Numpy Array**
1. From a List - array(), shape, ndim

In [7]:
# Create a Numpy array from a Python list
my_list = [1, 2, 3, 4, 5]
my_array = np.array(my_list)

# Check the shape and dimensions of the array
print("Shape of the array:", my_array.shape)
print("Number of dimensions:", my_array.ndim)


Shape of the array: (5,)
Number of dimensions: 1


2. From a range and step size - arange()

In [8]:
import numpy as np

# Create an array from 0 to 9 with a step of 2
my_range_array = np.arange(0, 10, 2)
print(my_range_array)


[0 2 4 6 8]
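One detail worth knowing about `arange()` is that the stop value is excluded, just like Python's `range()`. When you want a fixed number of evenly spaced values, especially with floating-point steps where rounding can make the endpoint unpredictable, `np.linspace()` is usually the safer choice. A small sketch:

```python
import numpy as np

# The stop value (10) is excluded, just like range()
evens = np.arange(0, 10, 2)
print(evens)  # [0 2 4 6 8]

# linspace takes the number of points instead of a step,
# and includes the stop value by default: five points from 0 to 1
points = np.linspace(0, 1, 5)
print(points)
```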


In [9]:
my_array = np.array([1, 2, 3])
print(type(my_array))

<class 'numpy.ndarray'>


#### Why Does Numpy Compute Faster?
Numpy's speed comes mainly from three factors:

- **Dense Memory Packing**: Numpy arrays are homogeneous, meaning every element has the same data type. This lets Numpy store the raw values next to each other in one contiguous block of memory, reducing memory overhead and improving access times.

- **Vectorized Loops**: Element-wise operations run as tight compiled loops over that contiguous memory, and many of them use the CPU's SIMD instructions to process several values per instruction. Most element-wise operations run on a single core; some routines, such as matrix multiplication, delegate to optimized BLAS libraries that may use multiple cores.

- **C Implementation**: Numpy's core functions are implemented in C, a low-level language known for its speed. When you operate on a Numpy array, the loop runs in compiled C code instead of the Python interpreter, which avoids per-element interpreter overhead and makes Numpy much faster than an equivalent Python loop over a list.
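You can inspect the dense, contiguous layout directly. This is a minimal sketch using the array attributes `itemsize`, `nbytes`, `strides`, and `flags`; the dtype is fixed to `int64` here so the byte counts are the same on every platform:

```python
import numpy as np

# Five 64-bit integers stored back to back in one block of memory
arr = np.arange(5, dtype=np.int64)

print(arr.itemsize)               # 8 bytes per element
print(arr.nbytes)                 # 40 bytes in total (5 * 8)
print(arr.strides)                # (8,): step 8 bytes to reach the next element
print(arr.flags["C_CONTIGUOUS"])  # True: no gaps between elements
```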

#### Why Are C Arrays Faster?
C arrays get the speed advantage over Python lists mainly because of these cool things:

- **All in the Same Gang**: In C, arrays store buddies of the same data type together. They're like a well-organized club where everyone has the same drink. This makes it super easy to handle them because they all fit into the same-sized memory slots.

- **Memory Mates**: C arrays keep all their stuff in one place, like putting all your books on a single bookshelf. When you need something, you just grab it from that one spot. Python lists, though, keep a list of pointers to where things are (like bookmarks), and when you want something, you first look up where it's stored and then fetch it. A bit like checking an index in a library before you can read a book.

- **Direct Access**: With C arrays, if you want something, you just go straight to it using an index. No need for middlemen or extra steps. Python lists, on the other hand, make you go through a couple of hoops before you get what you want.

**But here's the twist**: Numpy, which is like C's trendy cousin, gives you this same packed, same-type storage behind a friendly Python interface. It can still hold mixed types of data if you ask for `dtype=object` (the sales example at the end does this), but then each slot stores a pointer to a Python object, much like a list does, and most of the speed advantage disappears. For fast numerical work, keep your arrays homogeneous.
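A small sketch of what `dtype=object` looks like in practice (values made up for illustration): the array accepts mixed types, but each element comes back as an ordinary Python object, and numeric work usually starts by converting to a native dtype with `astype()`.

```python
import numpy as np

# dtype=object stores a reference to a Python object in each slot
mixed = np.array([1, "two", 3.0], dtype=object)
print(mixed.dtype)      # object
print(type(mixed[1]))   # <class 'str'>: a plain Python string

# Converting to a native integer dtype stores raw values instead
numbers = np.array([1, 2, 3], dtype=object)
fast_numbers = numbers.astype(np.int64)
print(fast_numbers.dtype)  # int64
```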

### 1D Arrays and Indexing
Numpy makes working with 1D arrays straightforward, letting you access and manipulate data efficiently.

#### Indexing and Slicing on 1D Arrays
**Indexing**

You can access elements in a 1D Numpy array using indexing, just like you would with Python lists:

In [10]:
# Create a 1D Numpy array
my_array = np.array([10, 20, 30, 40, 50])

# Access elements using indices
element = my_array[2]  # Access the third element (30)

In [11]:
element

30

**Slicing**

Slicing allows you to extract a portion of a 1D array. It's like cutting a cake into smaller pieces:

In [12]:
# Create a 1D Numpy array
my_array = np.array([10, 20, 30, 40, 50])

# Slice the array to get a subset
subset = my_array[1:4]  # Get elements from index 1 to 3 (20, 30, 40)


In [13]:
subset

array([20, 30, 40])
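Slices also accept a step and negative indices, using the same `start:stop:step` syntax as Python lists. A quick sketch on the same array:

```python
import numpy as np

my_array = np.array([10, 20, 30, 40, 50])

every_other = my_array[::2]      # step of 2 -> 10, 30, 50
last_two = my_array[-2:]         # negative start counts from the end -> 40, 50
reversed_array = my_array[::-1]  # negative step walks backwards -> 50, 40, 30, 20, 10
print(every_other, last_two, reversed_array)
```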

#### Masking (Boolean Indexing)
You can also use a mask, which is a boolean array of the same length, to select only the elements where the mask is True:

In [14]:
# Create a 1D Numpy array
my_array = np.array([10, 20, 30, 40, 50])

# Create a mask for values greater than 30
mask = my_array > 30

# Use the mask to select elements
selected_elements = my_array[mask]  # Get elements greater than 30 (40, 50)


In [16]:
mask

array([False, False, False,  True,  True])

In [17]:
selected_elements

array([40, 50])
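Masks can combine several conditions with `&` (and), `|` (or), and `~` (not). Each comparison needs its own parentheses, because these operators bind more tightly than `>` and `<`. Separately, you can pass a list of integer positions to pick out specific elements, which Numpy's documentation calls integer array indexing. A brief sketch:

```python
import numpy as np

my_array = np.array([10, 20, 30, 40, 50])

# Elements strictly between 15 and 45 (parentheses are required)
between = my_array[(my_array > 15) & (my_array < 45)]  # 20, 30, 40

# Elements that are NOT greater than 30
not_big = my_array[~(my_array > 30)]                   # 10, 20, 30

# Integer array indexing: pick positions 0, 2 and 4
picked = my_array[[0, 2, 4]]                           # 10, 30, 50
print(between, not_big, picked)
```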

### Operations on 1D Arrays
Numpy provides a wide range of operations that you can perform on 1D arrays efficiently.

#### Universal Functions (ufunc) on 1D Array
Universal functions, or ufuncs, are functions that operate element-wise on Numpy arrays. They make it easy to perform mathematical operations on entire arrays. Arithmetic operators use them behind the scenes: `my_array + 2` below calls the `np.add` ufunc.

In [18]:
# Create a 1D Numpy array
my_array = np.array([1, 2, 3, 4, 5])

# Add 2 to each element
result = my_array + 2  # [3, 4, 5, 6, 7]


In [19]:
result

array([3, 4, 5, 6, 7])
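You can also call ufuncs directly as functions. A short sketch with a few common ones (`np.add`, `np.sqrt`, `np.maximum`):

```python
import numpy as np

my_array = np.array([1, 4, 9, 16])

added = np.add(my_array, 2)        # same result as my_array + 2 -> 3, 6, 11, 18
roots = np.sqrt(my_array)          # element-wise square root -> 1.0, 2.0, 3.0, 4.0
clipped = np.maximum(my_array, 5)  # element-wise max against 5 -> 5, 5, 9, 16
print(added, roots, clipped)
```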

#### Aggregate (Reduction) Functions
Numpy provides functions for aggregating data in arrays, such as finding the sum, mean, minimum, and maximum:

In [20]:
# Create a 1D Numpy array
my_array = np.array([1, 2, 3, 4, 5])

# Calculate the sum, mean, minimum, and maximum
total = np.sum(my_array)
average = np.mean(my_array)
minimum = np.min(my_array)
maximum = np.max(my_array)


In [21]:
total

15

In [22]:
average

3.0

In [23]:
minimum

1

In [24]:
maximum

5
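Most of these reductions are also available as array methods (for example `my_array.sum()`), and Numpy has a few more in the same family. A minimal sketch; note that `np.std()` computes the population standard deviation by default (`ddof=0`):

```python
import numpy as np

my_array = np.array([1, 2, 3, 4, 5])

total_via_method = my_array.sum()  # method form of np.sum -> 15
std_dev = np.std(my_array)         # population std -> sqrt(2), about 1.414
running_total = np.cumsum(my_array)  # running total -> 1, 3, 6, 10, 15
print(total_via_method, std_dev, running_total)
print(np.argmin(my_array), np.argmax(my_array))  # positions of min and max -> 0 4
```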

#### Use Case: Daily Steps Tracking
Imagine you're tracking the number of steps you take each day as part of a fitness routine. You can use a 1D Numpy array to manage and analyze this data.

In [25]:
# Daily steps taken for a week
steps = np.array([8000, 9500, 10000, 8500, 11000, 7500, 9000])

# Calculate the total steps for the week
total_steps = np.sum(steps)
print("Total steps taken for the week:", total_steps)

# Calculate the average daily steps
average_daily_steps = np.mean(steps)
print("Average daily steps:", average_daily_steps)

# Find the day with the highest number of steps
max_steps_day = np.argmax(steps) + 1  # Adding 1 to account for 0-based indexing
print("Day with the highest number of steps:", max_steps_day)

# Determine how many days you reached your daily step goal (e.g., 10,000 steps)
goal = 10000
days_met_goal = np.sum(steps >= goal)
print("Number of days you reached your step goal:", days_met_goal)


Total steps taken for the week: 63500
Average daily steps: 9071.42857142857
Day with the highest number of steps: 5
Number of days you reached your step goal: 2


In the previous code, we used several Numpy functions to analyze the daily steps tracking data:

#### `np.sum()`

The `np.sum()` function is used to calculate the total sum of elements in a Numpy array. In this case, it helped us find the total number of steps taken for the week.

#### `np.mean()`

The `np.mean()` function calculates the average of elements in a Numpy array. We used it to determine the average daily steps taken.

#### `np.argmax()`

With `np.argmax()`, we found the index of the maximum value in the Numpy array, indicating the day with the highest number of steps. We added 1 to account for 0-based indexing.

#### `np.sum()` (Again)

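We employed `np.sum()` once more, this time on a condition (`steps >= goal`). The comparison produces a boolean array, and because `True` counts as 1 and `False` as 0 when summed, the result is the number of days where you reached your daily step goal of 10,000 steps.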

These Numpy functions make it easy to perform data analysis and calculations efficiently on 1D arrays. They are valuable tools for tasks involving numerical data, such as fitness tracking and beyond.
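Building on the boolean comparison, the mean of a boolean array gives the fraction of days that meet the condition, and `np.flatnonzero()` lists where they are. A small sketch reusing the week of steps from above:

```python
import numpy as np

steps = np.array([8000, 9500, 10000, 8500, 11000, 7500, 9000])
goal = 10000

met_goal = steps >= goal                    # boolean array, one entry per day
days_met = np.count_nonzero(met_goal)       # 2, same as np.sum(met_goal)
share_met = met_goal.mean()                 # 2 / 7, about 0.286 of the week
goal_days = np.flatnonzero(met_goal) + 1    # 1-based day numbers -> 3 and 5
print(days_met, share_met, goal_days)
```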


### Reshaping and 2D Arrays

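Reshaping is like rearranging things to make them more manageable. In Numpy, we can use it to turn a 1D array into a 2D one, which can be super handy. The one rule is that the total number of elements must stay the same: 6 elements can become 2 rows of 3 or 3 rows of 2, but not 4 rows of 2.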


In [1]:
import numpy as np

# Imagine you have a list of numbers
my_list = [1, 2, 3, 4, 5, 6]

# Let's convert it into a 2D array, like arranging them in rows and columns
reshaped_array = np.array(my_list).reshape(2, 3)

# Now you can easily pick any number from this 'grid' using row and column numbers
element = reshaped_array[1, 2]  # The second row, third column has our number

# You can reshape data to prepare it for specific tasks or analysis

In [3]:
reshaped_array

array([[1, 2, 3],
       [4, 5, 6]])

In [5]:
element

6
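Two handy details about `reshape()`: passing `-1` for one dimension lets Numpy work out that size from the total number of elements, and asking for a shape whose total size doesn't match raises a `ValueError`. A short sketch:

```python
import numpy as np

numbers = np.arange(1, 7)  # 1, 2, 3, 4, 5, 6

# -1 means "whatever size makes the total work out": 6 elements / 2 rows = 3 columns
grid = numbers.reshape(2, -1)
print(grid.shape)  # (2, 3)

# 6 elements cannot fill a 4 x 2 grid (8 slots)
try:
    numbers.reshape(4, 2)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)  # True
```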

### 2D Arrays and Indexing

Working with 2D arrays opens up a world of possibilities for organizing and analyzing data. In this section, we'll delve into the concept of 2D arrays and learn how to index and slice them effectively.

In [6]:
# Create a 2D Numpy array
my_2d_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Access elements in the 2D array using row and column indices
element = my_2d_array[1, 2]  # Access the element in the second row, third column


In [7]:
my_2d_array

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [8]:
element

6
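A few more ways to index a 2D array: a single index returns a whole row, negative indices count from the end of each dimension, and `array[1][2]` gives the same element as `array[1, 2]`, although the comma form does it in one step instead of first producing an intermediate row. A quick sketch:

```python
import numpy as np

my_2d_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

second_row = my_2d_array[1]          # whole row -> 4, 5, 6
bottom_right = my_2d_array[-1, -1]   # last row, last column -> 9
chained = my_2d_array[1][2]          # row first, then column -> 6
print(second_row, bottom_right, chained)
```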

### Operations on 2D Arrays
Now that we're comfortable with 2D arrays, let's look at the operations you can perform on them. The arithmetic and aggregation tools we used on 1D arrays work on 2D arrays too, from adding a number to every element to summarizing the whole grid.

In [9]:
# Create a 2D Numpy array
my_2d_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Add 5 to all elements
result = my_2d_array + 5

# Calculate the sum of all elements
total_sum = np.sum(my_2d_array)

# Find the minimum and maximum values
min_value = np.min(my_2d_array)
max_value = np.max(my_2d_array)


In [10]:
my_2d_array

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [11]:
result

array([[ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [12]:
total_sum

45

In [13]:
min_value

1

In [14]:
max_value

9
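One operation that often surprises newcomers: `*` on two 2D arrays multiplies element by element; it is not matrix multiplication. For the matrix product, use the `@` operator (or `np.matmul`). A small sketch with a 2 x 2 array:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

elementwise = a * a     # each element squared -> [[1, 4], [9, 16]]
matrix_product = a @ a  # rows times columns -> [[7, 10], [15, 22]]
print(elementwise)
print(matrix_product)
```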

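### Slicing 2D Arrays

Slicing works on each dimension separately: inside `array[rows, columns]`, each part can be a single index or a slice.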

In [15]:
# Create a 2D Numpy array
my_2d_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Access a single element using row and column indices
element = my_2d_array[1, 2]  # Access the element in the second row, third column

# Slice rows and columns to extract subsets of the array
row_slice = my_2d_array[0:2, :]  # Slice the first two rows
column_slice = my_2d_array[:, 1]  # Slice the second column


In [16]:
my_2d_array

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [17]:
element

6

In [18]:
row_slice

array([[1, 2, 3],
       [4, 5, 6]])

In [19]:
column_slice

array([2, 5, 8])
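You can slice both dimensions at once to pull out a sub-block. Also note that an integer index drops its dimension while a slice keeps it: `[:, 1]` returns a 1D array, whereas `[:, 1:2]` keeps the result as a one-column 2D array. A sketch:

```python
import numpy as np

my_2d_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

bottom_right_block = my_2d_array[1:, 1:]  # rows 1-2, columns 1-2 -> [[5, 6], [8, 9]]
col_1d = my_2d_array[:, 1]                # integer index: shape (3,)
col_2d = my_2d_array[:, 1:2]              # slice: shape (3, 1)
print(bottom_right_block)
print(col_1d.shape, col_2d.shape)
```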

### Why Do We Need the 'Axis' Parameter?

**Question**: Imagine you have a 2D array filled with numbers, like a grid of data. Now, let's say you want to do something special, like finding the sum of numbers in each row or the mean of numbers in each column. How can you tell your computer to do this?

In [20]:
# Create a 2D Numpy array
my_2d_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Calculate the sum along rows (axis=1)
row_sum = np.sum(my_2d_array, axis=1)

# Calculate the mean along columns (axis=0)
column_mean = np.mean(my_2d_array, axis=0)


In [21]:
row_sum

array([ 6, 15, 24])

In [22]:
column_mean

array([4., 5., 6.])

When we work with 2D arrays, the 'axis' parameter tells Numpy which dimension to collapse when it computes a result. It's like giving directions to our computer on which way to go.

- **axis=1** runs the operation across the columns within each row, so we get one result per row (here, the sum of each row).
- **axis=0** runs the operation down the rows within each column, so we get one result per column (here, the mean of each column).

Without an `axis` argument, functions like `np.sum()` and `np.mean()` reduce over the whole array and return a single number. By using the 'axis' parameter, we can tell our computer exactly what we want to do with our 2D data, making it a powerful tool for data analysis and manipulation.
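A useful way to remember `axis` is to check shapes: the axis you pass is the one that disappears from the result. Passing `keepdims=True` keeps it as a size-1 dimension instead, which is handy when you want to combine the result with the original array. A minimal sketch on a made-up 2 x 3 array, expressing each value as a share of its row total:

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

col_sums = np.sum(data, axis=0)  # shape (3,): the row axis (0) disappears
row_sums = np.sum(data, axis=1)  # shape (2,): the column axis (1) disappears

row_totals = np.sum(data, axis=1, keepdims=True)  # shape (2, 1)
row_shares = data / row_totals  # (2, 3) / (2, 1) lines up row by row
print(col_sums, row_sums)
print(row_shares.sum(axis=1))   # each row's shares add up to 1
```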

### Logical Operations on 2D Arrays

#### np.where()
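**Explanation**: With a single condition argument, np.where() returns the indices of the elements that satisfy the condition. The result is a tuple with one index array per dimension, which is why the output below is a one-element tuple for a 1D array. With three arguments, `np.where(condition, x, y)` instead builds a new array, taking values from `x` where the condition is True and from `y` everywhere else.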

**Use Case**: It's commonly used to filter or locate specific elements or values within an array. For example, you can use it to find all values in an array greater than a certain threshold, identify outliers, or replace values based on conditions.

In [23]:
# Create a Numpy array
my_array = np.array([1, 2, 3, 4, 5])

# Find indices where values are greater than 3
indices = np.where(my_array > 3)


In [24]:
indices

(array([3, 4], dtype=int64),)
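On a 2D array, `np.where()` with a single condition returns two index arrays, one with row positions and one with column positions. The three-argument form covers the "replace values based on conditions" use case. A sketch with a small made-up grid:

```python
import numpy as np

grid = np.array([[1, 5], [7, 2]])

# Positions of values greater than 3: (row 0, col 1) and (row 1, col 0)
rows, cols = np.where(grid > 3)
print(rows, cols)

# Keep values greater than 3, replace everything else with 0
replaced = np.where(grid > 3, grid, 0)
print(replaced)  # [[0, 5], [7, 0]]
```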

#### np.any()
**Explanation**: np.any() checks whether at least one element of a boolean array (typically the result of a condition such as `my_array > 3`) is True. Without an `axis` argument, it returns a single boolean value (True or False).

**Use Case**: It's useful when you want to determine if at least one element in an array meets a certain condition. For instance, you can use it to check if there's any missing data in a dataset.

In [25]:
# Create a Numpy array
my_array = np.array([1, 2, 3, 4, 5])

# Check if any value is greater than 3
has_value_greater_than_3 = np.any(my_array > 3)


In [26]:
has_value_greater_than_3

True
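As an example of the missing-data check, here is a minimal sketch that applies `np.isnan()` to a small made-up dataset containing one `NaN`. Passing `axis=1` answers the question per row instead of for the whole array:

```python
import numpy as np

data = np.array([[1.0, np.nan, 3.0],
                 [4.0, 5.0, 6.0]])

missing = np.isnan(data)                     # True where a value is NaN
any_missing = np.any(missing)                # True: at least one NaN somewhere
rows_with_missing = np.any(missing, axis=1)  # only row 0 has a NaN -> True, False
print(any_missing, rows_with_missing)
```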

#### np.all()
**Explanation**: np.all() checks whether every element of a boolean array (typically the result of a condition) is True. Without an `axis` argument, it returns a single boolean value (True or False).

**Use Case**: It's valuable when you want to ensure that all elements in an array meet a specific condition. For example, you can use it to verify if all students in a class passed a particular exam.

In [27]:
# Create a Numpy array
exam_scores = np.array([80, 85, 88, 90, 92])

# Check if all scores are greater than or equal to 80
all_scores_above_80 = np.all(exam_scores >= 80)


In [28]:
all_scores_above_80

True
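With a 2D array, `axis` works here too. In this sketch (made-up scores with one row per student and one column per exam; the pass mark of 50 is an assumption), `axis=0` checks whether everyone passed each exam and `axis=1` checks whether each student passed every exam:

```python
import numpy as np

scores = np.array([[80, 45, 90],
                   [70, 60, 85]])
pass_mark = 50  # assumed threshold for this example

all_passed_each_exam = np.all(scores >= pass_mark, axis=0)  # per exam -> True, False, True
student_passed_all = np.all(scores >= pass_mark, axis=1)    # per student -> False, True
print(all_passed_each_exam, student_passed_all)
```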

### Sorting and Aggregating Data in 2D Arrays

In [29]:
# Create a 2D Numpy array representing exam scores
exam_scores = np.array([[80, 75, 90], [65, 92, 88], [78, 82, 87]])

# Sort the values within each row (axis=1)
sorted_scores = np.sort(exam_scores, axis=1)

# Calculate the mean score for each student (axis=1)
student_mean_scores = np.mean(exam_scores, axis=1)

# Calculate the maximum score in each exam (axis=0)
max_scores = np.max(exam_scores, axis=0)


In [30]:
exam_scores

array([[80, 75, 90],
       [65, 92, 88],
       [78, 82, 87]])

In [31]:
sorted_scores

array([[75, 80, 90],
       [65, 88, 92],
       [78, 82, 87]])

In [32]:
student_mean_scores

array([81.66666667, 81.66666667, 82.33333333])

In [33]:
max_scores

array([80, 92, 90])
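A few related tools, shown on the same scores: `axis=0` sorts each column instead of each row, reversing a sorted result with `[:, ::-1]` gives descending order (`np.sort()` itself only sorts ascending), and `np.argmax(..., axis=0)` tells you which student (row) had the top score in each exam. A sketch:

```python
import numpy as np

exam_scores = np.array([[80, 75, 90], [65, 92, 88], [78, 82, 87]])

sorted_columns = np.sort(exam_scores, axis=0)            # each column sorted ascending
descending_rows = np.sort(exam_scores, axis=1)[:, ::-1]  # each row sorted descending
top_student_per_exam = np.argmax(exam_scores, axis=0)    # row index of each column's max
print(sorted_columns)
print(descending_rows)
print(top_student_per_exam)  # students 0, 1, 0
```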

### Splitting and Merging Arrays in Numpy

#### hsplit and vsplit
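**Explanation**: Think of hsplit as making vertical cuts that leave pieces side by side (groups of columns), and vsplit as making horizontal cuts that leave pieces stacked on top of each other (groups of rows). In other words, `hsplit` splits along axis 1 (columns) and `vsplit` along axis 0 (rows). You can pass either the number of equal pieces or a list of indices to split at.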

**Use Cases**:

- hsplit: Useful for splitting data along columns. For instance, separating feature columns from target columns.
- vsplit: Handy for dividing data along rows. It can be used to split a dataset into training and testing sets.

In [35]:
# Create a 2D Numpy array
data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

# Split column-wise before column indices 1 and 2 -> three single-column pieces
horizontal_split = np.hsplit(data, [1, 2])

# Split row-wise into three equal pieces -> three single-row pieces
vertical_split = np.vsplit(data, 3)


In [36]:
horizontal_split

[array([[1],
        [4],
        [7]]),
 array([[2],
        [5],
        [8]]),
 array([[3],
        [6],
        [9]])]

In [37]:
vertical_split

[array([[1, 2, 3]]), array([[4, 5, 6]]), array([[7, 8, 9]])]
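The use cases above might look like this in practice. This is a small sketch on made-up data where the last column is assumed to be the target: `np.hsplit(data, [2])` cuts before column index 2, and `np.vsplit(data, [3])` cuts before row index 3. Real train/test splits usually shuffle the rows first, which this sketch skips.

```python
import numpy as np

# 4 samples: 2 feature columns + 1 target column (hypothetical data)
data = np.array([[1, 2, 0],
                 [3, 4, 1],
                 [5, 6, 0],
                 [7, 8, 1]])

features, target = np.hsplit(data, [2])     # columns 0-1 | column 2
train_rows, test_rows = np.vsplit(data, [3])  # rows 0-2 | row 3
print(features.shape, target.shape)       # (4, 2) (4, 1)
print(train_rows.shape, test_rows.shape)  # (3, 3) (1, 3)
```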

#### hstack and vstack
**Explanation**: Think of hstack as placing two arrays side by side, and vstack as stacking them on top of each other. These functions allow you to combine arrays horizontally or vertically.

**Use Cases**:

- hstack: Useful when you have feature vectors as separate arrays and want to create a combined feature matrix.
- vstack: Valuable for appending new data or rows to an existing dataset.

In [38]:
# Create two Numpy arrays
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

# Merge horizontally (side by side)
horizontal_merge = np.hstack((array1, array2))

# Merge vertically (stack on top of each other)
vertical_merge = np.vstack((array1, array2))


In [39]:
horizontal_merge

array([1, 2, 3, 4, 5, 6])

In [40]:
vertical_merge

array([[1, 2, 3],
       [4, 5, 6]])
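For arrays that are already 2D, `np.concatenate()` with an explicit `axis` does the same job, and `np.column_stack()` turns 1D arrays into columns (whereas `np.hstack()` on the 1D arrays above simply joined them end to end). A sketch:

```python
import numpy as np

array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

as_columns = np.column_stack((array1, array2))  # [[1, 4], [2, 5], [3, 6]]

block = np.array([[1, 2], [3, 4]])
down = np.concatenate((block, block), axis=0)    # shape (4, 2), like vstack
across = np.concatenate((block, block), axis=1)  # shape (2, 4), like hstack
print(as_columns)
print(down.shape, across.shape)
```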

### Broadcasting and Dimension Manipulation in Numpy

Broadcasting is a powerful feature in Numpy that lets you perform operations on arrays with different shapes. Conceptually, it stretches the smaller array along its size-1 (or missing) dimensions to match the larger one, without actually copying the data, so element-wise operations work even when the shapes don't exactly match.

In [41]:
# Create a Numpy array and a scalar
my_array = np.array([1, 2, 3])
scalar = 5

# Broadcasting: Add the scalar to the array
result = my_array + scalar


In [42]:
result

array([6, 7, 8])

In this example, we have a Numpy array my_array and a scalar 5. Even though their shapes don't match, broadcasting allows us to add the scalar to each element of the array, resulting in array([6, 7, 8]).

In [43]:
# Create a 2D Numpy array and a 1D array
matrix = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([10, 20, 30])

# Broadcasting: Add the 1D array to each row of the 2D array
result = matrix + row

In this example, we have a 2D array `matrix` of shape (2, 3) and a 1D array `row` of shape (3,). Broadcasting adds the 1D array to each row of the 2D array.

In [44]:
result

array([[11, 22, 33],
       [14, 25, 36]])

In [45]:
# Create a 2D Numpy array and a column vector of shape (2, 1)
matrix = np.array([[1, 2, 3], [4, 5, 6]])
column = np.array([[10], [20]])

# Broadcasting: add the column vector to every column of the 2D array
result = matrix + column

In [46]:
result

array([[11, 12, 13],
       [24, 25, 26]])

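#### Simplified Rules for Broadcasting
###### Compare Shapes from the Right:

Numpy lines up the two shapes starting from the last (trailing) dimension. If one array has fewer dimensions, its shape is treated as if it were padded with 1s on the left, so a shape of (3,) behaves like (1, 3) next to a 2D array.
###### Each Pair of Dimensions Must Match or Include a 1:

Two dimensions are compatible when they are equal or when one of them is 1; a dimension of size 1 is stretched to match the other. If neither condition holds, Numpy raises a `ValueError`.
###### Result Takes the Larger Size in Each Dimension:

Along each dimension, the result has the larger of the two sizes, so it can be bigger than either input: shapes (3, 1) and (4,) broadcast to (3, 4).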
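You can check these rules without creating any data: `np.broadcast_shapes()` (available in NumPy 1.20 and later) returns the broadcast shape, or raises a `ValueError` when the shapes are incompatible. A minimal sketch that also triggers the error with real arrays:

```python
import numpy as np

row_case = np.broadcast_shapes((2, 3), (3,))     # the row example above -> (2, 3)
col_case = np.broadcast_shapes((2, 3), (2, 1))   # the column example above -> (2, 3)
grid_case = np.broadcast_shapes((3, 1), (4,))    # bigger than either input -> (3, 4)
print(row_case, col_case, grid_case)

# Trailing dimensions 3 and 2 differ and neither is 1, so this fails
try:
    np.ones((2, 3)) + np.ones(2)
    incompatible = False
except ValueError:
    incompatible = True
print(incompatible)  # True
```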

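### Shallow Copy vs. Deep Copy in Numpy

A shallow copy made with `.view()` is a new array object that shares the original's data buffer, so a change made through one is visible through the other. A copy made with `.copy()` gets its own buffer, so the two arrays are independent.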

In [47]:
# Create a Numpy array
original_array = np.array([1, 2, 3])

# Make a shallow copy
shallow_copy = original_array.view()

# Modify an element in the original array
original_array[0] = 99

# Check if the change is reflected in the shallow copy
is_reflected = (shallow_copy[0] == 99)


In [48]:
original_array

array([99,  2,  3])

In [49]:
shallow_copy

array([99,  2,  3])

In [51]:
is_reflected

True
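Views are not only created by `.view()`: basic slicing also returns a view, which is a common source of surprises. A short sketch using `np.shares_memory()` to confirm that a slice and its original use the same data:

```python
import numpy as np

original = np.array([1, 2, 3, 4, 5])
middle = original[1:4]  # basic slicing returns a view, not a copy

middle[0] = 100         # writing through the view...
print(original)         # ...changes the original: 1, 100, 3, 4, 5
print(np.shares_memory(original, middle))  # True

independent = original[1:4].copy()  # call .copy() when you need separate data
print(np.shares_memory(original, independent))  # False
```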

In [52]:
# Create another Numpy array
original_array = np.array([4, 5, 6])

# Make a deep copy
deep_copy = original_array.copy()

# Modify an element in the original array
original_array[0] = 44

# Check if the change is reflected in the deep copy
is_reflected = (deep_copy[0] == 44)


In [53]:
original_array

array([44,  5,  6])

In [54]:
deep_copy

array([4, 5, 6])
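One caveat on the "deep copy" label: `.copy()` duplicates the array's own buffer, which is a full copy for numeric arrays. For `dtype=object` arrays, however, the buffer only holds references, so the copied array still points at the same Python objects. A sketch comparing it with Python's `copy.deepcopy()`, which also copies the objects themselves:

```python
import copy
import numpy as np

# An object array whose elements are Python lists
arr = np.empty(2, dtype=object)
arr[0] = [1, 2]
arr[1] = [3]

numpy_copy = arr.copy()
numpy_copy[0].append(99)  # mutates the list that arr also references
print(arr[0])             # [1, 2, 99]

true_copy = copy.deepcopy(arr)
true_copy[1].append(7)    # only the deep copy's own list changes
print(arr[1])             # [3]
```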

#### Sales Data Analysis with Numpy

In this code snippet, we perform data analysis on sample sales data using the Numpy library in Python. The dataset consists of information such as the date of the transaction, customer ID, product ID, quantity sold, and total revenue.

##### Generating Sample Data

We start by generating sample sales data using Numpy's `np.array` function. Each row in the array represents a sales transaction, with columns for various data points. To handle mixed data types (such as strings and numbers), we specify the data type as 'object' using `dtype=object`.

##### Data Analysis

1. **Calculating Total Revenue**:
   We calculate the total revenue by summing the 'Total Revenue' column. To ensure numerical operations, we convert this column to the 'float' data type.

2. **Finding the Best-Selling Product**:
   We find the transaction (row) with the highest quantity sold and report its product ID. This involves converting the 'Quantity' column to the 'int' data type for numerical comparison. Note that this picks the largest single sale rather than adding up each product's quantity across all transactions; in this sample, both approaches point to product 1.

3. **Identifying the Highest Sales Day**:
   We take the date of the transaction with the maximum total revenue. As with total revenue, we convert the 'Total Revenue' column to 'float' for accurate comparison. Again, this compares individual transactions rather than daily totals; in this sample, both approaches point to 2023-01-03.

The code demonstrates how Numpy can be used for simple data manipulation and analysis tasks on this sample sales data.


In [58]:
import numpy as np

# Generate sample sales data (Date, Customer ID, Product ID, Quantity, Total Revenue)
sample_data = np.array([
    ["2023-01-01", 101, 1, 5, 250.0],
    ["2023-01-02", 102, 2, 3, 180.0],
    ["2023-01-02", 103, 1, 2, 100.0],
    ["2023-01-03", 104, 3, 4, 320.0],
    ["2023-01-03", 101, 2, 2, 120.0],
    ["2023-01-04", 102, 1, 3, 150.0]
], dtype=object)  # Use 'dtype=object' to allow mixed data types

# Calculate total revenue (sum of the 'Total Revenue' column)
total_revenue = np.sum(sample_data[:, 4].astype(float))  # Convert to float for numerical operations

# Find the best-selling product (product ID from the transaction with the highest quantity sold)
best_selling_product_index = np.argmax(sample_data[:, 3].astype(int))  # Convert to int for numerical comparison
best_selling_product_id = sample_data[best_selling_product_index, 2]

# Identify the highest sales day (date of the transaction with the maximum total revenue)
highest_sales_day_index = np.argmax(sample_data[:, 4].astype(float))  # Convert to float for numerical comparison
highest_sales_day = sample_data[highest_sales_day_index, 0]

# Print the results
print("Sample Sales Data:")
print(sample_data)

print("\nCalculating Total Revenue:")
print("Total Revenue:", total_revenue)

print("\nFinding the Best-Selling Product:")
print("Product with Highest Quantity Sold (Index):", best_selling_product_index)
print("Product ID of the Best-Selling Product:", best_selling_product_id)

print("\nIdentifying the Highest Sales Day:")
print("Index of the Day with Highest Total Revenue:", highest_sales_day_index)
print("Date of the Highest Sales Day:", highest_sales_day)


Sample Sales Data:
[['2023-01-01' 101 1 5 250.0]
 ['2023-01-02' 102 2 3 180.0]
 ['2023-01-02' 103 1 2 100.0]
 ['2023-01-03' 104 3 4 320.0]
 ['2023-01-03' 101 2 2 120.0]
 ['2023-01-04' 102 1 3 150.0]]

Calculating Total Revenue:
Total Revenue: 1120.0

Finding the Best-Selling Product:
Product with Highest Quantity Sold (Index): 0
Product ID of the Best-Selling Product: 1

Identifying the Highest Sales Day:
Index of the Day with Highest Total Revenue: 3
Date of the Highest Sales Day: 2023-01-03
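The code above compares individual transactions. To rank products and days by their totals instead, one Numpy-only approach is to group rows with `np.unique(..., return_inverse=True)` and add up each group with `np.bincount(..., weights=...)`. This is a sketch of that approach, reusing the same sample data:

```python
import numpy as np

# Same sample data as above (Date, Customer ID, Product ID, Quantity, Total Revenue)
sample_data = np.array([
    ["2023-01-01", 101, 1, 5, 250.0],
    ["2023-01-02", 102, 2, 3, 180.0],
    ["2023-01-02", 103, 1, 2, 100.0],
    ["2023-01-03", 104, 3, 4, 320.0],
    ["2023-01-03", 101, 2, 2, 120.0],
    ["2023-01-04", 102, 1, 3, 150.0]
], dtype=object)

# Convert the columns we need to native dtypes
dates = sample_data[:, 0].astype(str)
product_ids = sample_data[:, 2].astype(int)
quantities = sample_data[:, 3].astype(int)
revenues = sample_data[:, 4].astype(float)

# Total quantity per product: unique IDs plus each row's group number
unique_products, product_group = np.unique(product_ids, return_inverse=True)
quantity_per_product = np.bincount(product_group, weights=quantities)
top_product = unique_products[np.argmax(quantity_per_product)]

# Total revenue per day, using the same grouping trick
unique_dates, date_group = np.unique(dates, return_inverse=True)
revenue_per_day = np.bincount(date_group, weights=revenues)
top_day = unique_dates[np.argmax(revenue_per_day)]

print(dict(zip(unique_products.tolist(), quantity_per_product.tolist())))
print(dict(zip(unique_dates.tolist(), revenue_per_day.tolist())))
print("Best-selling product by total quantity:", top_product)  # 1
print("Highest sales day by total revenue:", top_day)          # 2023-01-03
```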
