# Getting Started with NumPy
This notebook will help you get familiar with NumPy and its core functionalities. We will cover:
1. **Getting Familiar with NumPy**
2. **Data Manipulation**
3. **Data Aggregation**
4. **Data Analysis**
5. **Application in Data Science**
6. **Summary**

Let's start by installing NumPy (if you haven't already) and importing it.

In [7]:
# Install NumPy (uncomment the following line if NumPy is not installed)
# !pip install numpy

# Import NumPy
import numpy as np

# Verify the version
print("NumPy version:", np.__version__)

NumPy version: 2.1.0


## 1. Getting Familiar with NumPy
NumPy is a powerful library for numerical computations in Python. Here are some core functionalities:

### Array Creation
You can create NumPy arrays from Python lists or tuples using `np.array()`.

In [8]:
import numpy as np

# Create a NumPy array from a list
arr = np.array([1, 2, 3, 4, 5])
print(arr)

[1 2 3 4 5]
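Besides `np.array()`, NumPy offers constructor functions that build arrays without starting from a Python list. The following is a minimal sketch of three common ones; the variable names (`zeros`, `evens`, `grid`) are illustrative and not part of the notebook above.

```python
import numpy as np

# Array of zeros with a given shape (2 rows, 3 columns)
zeros = np.zeros((2, 3))

# Evenly spaced integers: start (inclusive), stop (exclusive), step
evens = np.arange(0, 10, 2)

# Five evenly spaced values from 0 to 1, both endpoints included
grid = np.linspace(0.0, 1.0, 5)

print(zeros.shape)  # (2, 3)
print(evens)        # [0 2 4 6 8]
print(grid)         # [0.   0.25 0.5  0.75 1.  ]
```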


### Basic Operations
NumPy arrays support element-wise operations.

In [9]:

# Add 10 to every element (the scalar is applied element-wise)
arr2 = arr + 10
print(arr2)

# Multiply every element by 2
arr3 = arr * 2
print(arr3)

[11 12 13 14 15]
[ 2  4  6  8 10]
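Element-wise behavior also applies when both operands are arrays of the same shape. The sketch below uses two small example arrays (`a` and `b` are assumed names chosen for illustration) to show arithmetic and a comparison.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])

# Arithmetic between same-shaped arrays pairs up elements by position
total = a + b
product = a * b

# Comparisons are element-wise too and return a boolean array
is_large = b > 25

print(total)     # [11 22 33 44 55]
print(product)   # [ 10  40  90 160 250]
print(is_large)  # [False False  True  True  True]
```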



### Array Properties
You can access the shape, size, and datatype of an array.

In [10]:
# Array properties
print("Shape:", arr.shape)
print("Size:", arr.size)
print("Datatype:", arr.dtype)

Shape: (5,)
Size: 5
Datatype: int64
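The default integer type shown above can depend on the platform and NumPy version, so it may help to request a dtype explicitly. The sketch below does that for a small 2-D array (the name `matrix` is an assumption for illustration) and shows a few more properties.

```python
import numpy as np

# Request the dtype explicitly so it does not depend on the platform default
matrix = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)

print(matrix.ndim)      # 2 (number of axes)
print(matrix.shape)     # (2, 3)
print(matrix.size)      # 6
print(matrix.itemsize)  # 8 (bytes per float64 element)
print(matrix.nbytes)    # 48 (size * itemsize)

# astype returns a converted copy; the original array is left unchanged
as_int = matrix.astype(np.int32)
print(as_int.dtype)     # int32
```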


## 2. Data Manipulation
Let's dive into data manipulation using NumPy.

### Array Indexing and Slicing
You can access specific elements or slices of a NumPy array.

In [12]:
# Create an array
arr = np.array([10, 20, 30, 40, 50])

# Indexing
print("Element at index 2:", arr[2])

# Slicing
print("Elements from index 1 to 3:", arr[1:4])

Element at index 2: 30
Elements from index 1 to 3: [20 30 40]
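A few more indexing forms are worth knowing. This sketch, which reuses the same example values, shows negative indices, stepped slices, and boolean masks, and illustrates that a basic slice is a view onto the original data rather than a copy.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

print(arr[-1])        # 50 (last element)
print(arr[::2])       # [10 30 50] (every second element)
print(arr[arr > 25])  # [30 40 50] (boolean mask)

# A basic slice is a view: writing through it changes the original array
view = arr[1:4]
view[0] = 99
print(arr)            # [10 99 30 40 50]

# Use .copy() when an independent array is needed
independent = arr[1:4].copy()
independent[0] = -1
print(arr[1])         # 99 (unchanged by the edit to the copy)
```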


### Reshaping Arrays
You can change the shape of an array without changing its data, as long as the total number of elements stays the same.

In [13]:
# Reshape a flat 6-element array into 2 rows and 3 columns
arr = np.array([1, 2, 3, 4, 5, 6]).reshape(2, 3)
print("Reshaped array:", arr)

Reshaped array: [[1 2 3]
 [4 5 6]]
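As a further sketch of reshaping, one dimension can be given as `-1` so that NumPy infers it from the array's size, and `ravel()` flattens an array back to one dimension. The names `flat`, `grid`, `auto`, and `back` are illustrative.

```python
import numpy as np

flat = np.arange(1, 7)      # six elements: 1 through 6

grid = flat.reshape(2, 3)   # 2 rows, 3 columns
auto = flat.reshape(3, -1)  # -1 lets NumPy infer the second dimension
back = grid.ravel()         # flatten back to one dimension

print(grid.shape)  # (2, 3)
print(auto.shape)  # (3, 2)
print(back)        # [1 2 3 4 5 6]
```

A shape whose element count does not match the array's size (for example `flat.reshape(4, 2)`) raises a `ValueError`.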



### Mathematical Operations
Apply mathematical functions to arrays.

In [14]:
# Element-wise square root
sqrt_arr = np.sqrt(arr)
print("Square root of array:", sqrt_arr)

# Sum of array elements
sum_arr = np.sum(arr)
print("Sum of elements:", sum_arr)

Square root of array: [[1.         1.41421356 1.73205081]
 [2.         2.23606798 2.44948974]]
Sum of elements: 21
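`np.sum` above collapses the whole 2-D array into one number. Reductions also accept an `axis` argument; the sketch below, using the same 2x3 values, shows per-column and per-row results.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# axis=0 reduces down the rows, giving one value per column
col_sums = arr.sum(axis=0)

# axis=1 reduces across the columns, giving one value per row
row_sums = arr.sum(axis=1)
row_means = arr.mean(axis=1)

print(col_sums)   # [5 7 9]
print(row_sums)   # [ 6 15]
print(row_means)  # [2. 5.]
```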


## 3. Data Aggregation
NumPy provides various functions to compute summary statistics.

### Summary Statistics
Compute mean, median, standard deviation, and sum. Note that `np.std` returns the population standard deviation (`ddof=0`) by default.

In [16]:
# Create an array
arr = np.array([10, 20, 30, 40, 50])

# Mean
mean_val = np.mean(arr)
print("Mean:", mean_val)

# Median
median_val = np.median(arr)
print("Median:", median_val)

# Standard Deviation
std_dev = np.std(arr)
print("Standard Deviation:", std_dev)

# Sum
sum_val = np.sum(arr)
print("Sum:", sum_val)

Mean: 30.0
Median: 30.0
Standard Deviation: 14.142135623730951
Sum: 150
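The standard deviation printed above divides by `n` (the population formula). When the data is a sample, the usual estimate divides by `n - 1`, which `np.std` supports through its `ddof` parameter. A short sketch on the same values:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Population standard deviation: divides by n (the default, ddof=0)
population_std = np.std(arr)

# Sample standard deviation: divides by n - 1
sample_std = np.std(arr, ddof=1)

print(population_std)  # about 14.14
print(sample_std)      # about 15.81

# Minimum and maximum are available as reductions as well
print(arr.min(), arr.max())  # 10 50
```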


### Grouping Data
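NumPy has no built-in group-by operation. The example below simply pairs each category with its value; because every category appears exactly once, each group's sum equals that single value.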

In [17]:
# Example with multiple arrays
categories = np.array(["Cars", "Films", "Games", "Fruits"])
values = np.array([50, 60, 70, 80])

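# Pair each category with its value (each category appears only once,
# so the per-category "sum" is just that value)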
sum_by_category = {cat: val for cat, val in zip(categories, values)}

# Print the sum by category
print("Sum by category:", sum_by_category)

Sum by category: {np.str_('Cars'): np.int64(50), np.str_('Films'): np.int64(60), np.str_('Games'): np.int64(70), np.str_('Fruits'): np.int64(80)}
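When categories repeat, a true grouped sum needs a little more work. One possible approach, sketched here with hypothetical data that is not from the notebook, combines `np.unique(..., return_inverse=True)` with `np.bincount`. Converting to plain `str` and `float` also avoids the `np.str_(...)` and `np.int64(...)` wrappers seen in the output above.

```python
import numpy as np

# Hypothetical data in which some categories appear more than once
categories = np.array(["Cars", "Films", "Cars", "Games", "Films"])
values = np.array([50, 60, 70, 80, 90])

# labels: sorted unique categories; group_ids: index of each row's label
labels, group_ids = np.unique(categories, return_inverse=True)

# bincount with weights adds up the values that share a group id
sums = np.bincount(group_ids, weights=values)

# Convert NumPy scalars to plain Python types for a cleaner printout
sum_by_category = {str(label): float(total) for label, total in zip(labels, sums)}
print(sum_by_category)  # {'Cars': 120.0, 'Films': 150.0, 'Games': 80.0}
```

For larger grouped workloads, a library such as pandas is usually more convenient.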


## 4. Data Analysis
NumPy provides tools for more advanced data analysis, such as finding correlations and identifying outliers.

### Finding Correlations
Use NumPy to compute correlation coefficients between arrays.

In [18]:
# Create two arrays
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

# Compute correlation coefficient
correlation = np.corrcoef(x, y)[0, 1]
print("Correlation coefficient:", correlation)

Correlation coefficient: 0.9999999999999999
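Because `y` is exactly `2 * x`, the true correlation is 1; the printed `0.9999999999999999` reflects floating-point rounding. As a sketch of what `np.corrcoef` computes, the Pearson coefficient can be written out from its definition:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

# Deviations of each value from its array's mean
dx = x - x.mean()
dy = y - y.mean()

# Pearson r: sum of co-deviations over the product of deviation magnitudes
r = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

# Compare with a tolerance rather than ==, since results are floating point
print(np.isclose(r, 1.0))                       # True
print(np.isclose(r, np.corrcoef(x, y)[0, 1]))   # True
```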


### Identifying Outliers
Flag values that lie more than two standard deviations from the mean.

In [19]:
# Create an array with potential outliers
data = np.array([10, 12, 14, 18, 20, 100])

# Mean and standard deviation
mean_data = np.mean(data)
std_dev_data = np.std(data)

# Identify outliers
outliers = data[np.abs(data - mean_data) > 2 * std_dev_data]
print("Outliers:", outliers)

Outliers: [100]


### Calculating Percentiles
Compute percentiles to understand the distribution of data.

In [20]:
# Create an array
scores = np.array([55, 67, 78, 89, 90, 92])

# Calculate percentiles
percentiles_25 = np.percentile(scores, 25)
percentiles_75 = np.percentile(scores, 75)
print("25th percentile:", percentiles_25)
print("75th percentile:", percentiles_75)

25th percentile: 69.75
75th percentile: 89.75
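Percentiles also enable a different outlier check from the standard-deviation approach used earlier: the interquartile-range (IQR) rule. The sketch below applies it to the outlier example data; the 1.5 multiplier is a common convention, not something prescribed by NumPy.

```python
import numpy as np

data = np.array([10, 12, 14, 18, 20, 100])

# First and third quartiles (np.percentile interpolates linearly by default)
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Conventional fences: 1.5 * IQR beyond the quartiles
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = data[(data < lower) | (data > upper)]
print(q1, q3)    # 12.5 19.5
print(outliers)  # [100]
```

Unlike the mean and standard deviation, quartiles are not pulled toward extreme values, so this rule tends to be more robust when outliers are large.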


## 5. Application in Data Science
NumPy is extensively used in data science for its efficient handling of large datasets and numerical computations.

### Advantages of NumPy
1. **Performance**: NumPy runs array operations in compiled, vectorized code, which typically makes them much faster on large arrays than equivalent loops over Python lists.
2. **Convenience**: It provides a comprehensive suite of mathematical functions and operations.
3. **Integration**: NumPy arrays are compatible with many other libraries used in data science.

### Real-World Examples
#### Machine Learning
NumPy arrays are used for feature vectors, matrix operations, and statistical computations in machine learning models.

In [22]:
# Example of matrix multiplication
matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])
product = np.dot(matrix_a, matrix_b)
print("Matrix product:", product)

Matrix product: [[19 22]
 [43 50]]
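For 2-D arrays, the `@` operator is equivalent to `np.dot` and is often easier to read. The sketch below also contrasts it with `*`, which multiplies element by element rather than performing matrix multiplication.

```python
import numpy as np

matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])

# Matrix product: rows of matrix_a combined with columns of matrix_b
product = matrix_a @ matrix_b

# Element-wise product: NOT a matrix multiplication
elementwise = matrix_a * matrix_b

print(np.array_equal(product, np.dot(matrix_a, matrix_b)))  # True
print(elementwise.tolist())  # [[5, 12], [21, 32]]
```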


#### Financial Analysis
NumPy is used for financial calculations such as risk assessment and portfolio optimization.

In [23]:
# Example of calculating returns
prices = np.array([100, 105, 110, 115])
returns = np.diff(prices) / prices[:-1]
print("Returns:", returns)

Returns: [0.05       0.04761905 0.04545455]
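Building on the simple returns above, the following sketch compounds them into a total return and computes log returns as an alternative. The variable names are illustrative, and the prices are the same example values.

```python
import numpy as np

prices = np.array([100, 105, 110, 115])

# Simple period-over-period returns, as in the cell above
returns = np.diff(prices) / prices[:-1]

# Compounding the simple returns recovers the total return over the period
cumulative_return = np.prod(1 + returns) - 1

# Log returns add up across periods instead of compounding
log_returns = np.diff(np.log(prices))

print(np.isclose(cumulative_return, 0.15))             # True
print(np.isclose(log_returns.sum(), np.log(115 / 100)))  # True
```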


#### Scientific Research
In scientific research, NumPy is used for data analysis, simulations, and modeling.

In [24]:
# Example of simulating random data
random_data = np.random.normal(loc=0, scale=1, size=1000)
print("Random data:", random_data[:10])


Random data: [ 0.58424332 -0.03322177 -1.00087097  2.12031501 -0.06670876  0.12240053
  0.03651409  0.1454606  -0.38390724 -0.26665251]
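The values above change on every run because no seed is set. For reproducible simulations, NumPy's `Generator` interface (`np.random.default_rng`) accepts a seed. This is a minimal sketch; the seed value 42 is an arbitrary choice.

```python
import numpy as np

# A seeded generator produces the same sequence on every run
rng = np.random.default_rng(seed=42)
sample = rng.normal(loc=0.0, scale=1.0, size=1000)

# A second generator with the same seed reproduces the draws exactly
rng_again = np.random.default_rng(seed=42)
sample_again = rng_again.normal(loc=0.0, scale=1.0, size=1000)

print(np.array_equal(sample, sample_again))  # True

# Sample statistics should be near loc and scale, though not exactly equal
print(sample.mean(), sample.std())
```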


## How NumPy is Used in Data Science

NumPy is a powerful library in Python that helps with various tasks in data science. Here’s how it is commonly used:

### 1. **Data Cleaning and Preparation**
   - **Handling Missing Data**: NumPy represents missing numeric values as `np.nan`, which you can locate with `np.isnan` and replace with a fill value.
   - **Feature Scaling**: It can adjust data to a common scale, which is important for many machine learning algorithms.
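
The two preparation steps above can be sketched as follows. The data is hypothetical, and mean imputation followed by standardization is only one of several reasonable choices.

```python
import numpy as np

# Hypothetical feature column with one missing value
raw = np.array([2.0, np.nan, 4.0, 6.0])

# Locate missing entries and fill them with the mean of the observed values
missing = np.isnan(raw)
fill_value = np.nanmean(raw)  # mean that ignores NaN entries
filled = np.where(missing, fill_value, raw)

# Standardize: subtract the mean and divide by the standard deviation
scaled = (filled - filled.mean()) / filled.std()

print(filled)  # [2. 4. 4. 6.]
print(np.isclose(scaled.mean(), 0.0), np.isclose(scaled.std(), 1.0))  # True True
```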

### 2. **Working with Matrices**
   - **Matrix Calculations**: NumPy allows you to perform calculations on matrices, like multiplying them or finding their properties.
   - **Data Transformations**: You can easily apply changes to data matrices, like rotating or scaling them.

### 3. **Simulations and Random Data**
   - **Random Data Generation**: NumPy can create random numbers and simulate different scenarios, which is useful for statistical analysis.
   - **Numerical Calculations**: It helps in performing complex calculations and solving equations.
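
As one example of solving equations, `np.linalg.solve` handles square linear systems. The system below is a hypothetical illustration, not taken from the notebook.

```python
import numpy as np

# Hypothetical system of two linear equations:
#   2x +  y = 5
#    x + 3y = 10
coefficients = np.array([[2.0, 1.0], [1.0, 3.0]])
constants = np.array([5.0, 10.0])

solution = np.linalg.solve(coefficients, constants)

# Check the solution by substituting it back into the system
print(np.allclose(solution, [1.0, 3.0]))                 # True
print(np.allclose(coefficients @ solution, constants))   # True
```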

### 4. **Summarizing Data**
   - **Descriptive Statistics**: You can quickly calculate average, median, and standard deviation to understand your data better.
   - **Aggregating Data**: NumPy makes it easy to summarize large amounts of data.

### 5. **Handling Large Datasets**
   - **Speed and Efficiency**: For numeric data, NumPy arrays are typically faster and more memory-efficient than regular Python lists because elements are stored in a contiguous, fixed-type buffer, which makes large datasets easier to work with.
   - **Broadcasting**: You can perform operations on arrays of different but compatible shapes without writing explicit loops, which speeds up your code.
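
A small sketch of broadcasting, using hypothetical data: subtracting a per-column mean of shape `(2,)` from a table of shape `(3, 2)` works without a loop because NumPy stretches the smaller array across the rows.

```python
import numpy as np

# Hypothetical table: 3 rows (samples) by 2 columns (features)
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

# One mean per column, shape (2,)
col_means = data.mean(axis=0)

# Shapes (3, 2) and (2,) are compatible: the means are applied to every row
centered = data - col_means

print(col_means)          # [ 2. 20.]
print(centered.tolist())  # [[-1.0, -10.0], [0.0, 0.0], [1.0, 10.0]]
```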

### 6. **Machine Learning**
   - **Data Preparation**: NumPy helps with organizing and preparing data for machine learning models.
   - **Model Training**: Many machine learning libraries use NumPy arrays for training and testing models.

### 7. **Scientific Research**
   - **Data Visualization**: While NumPy itself doesn’t create plots, it works well with libraries like Matplotlib to visualize data.
   - **Statistical Analysis**: You can use NumPy to analyze data distributions; formal hypothesis tests are usually run with SciPy, which operates on NumPy arrays.

### 8. **Financial Analysis**
   - **Analyzing Time Series**: NumPy helps analyze financial data over time, such as stock prices.
   - **Risk Management**: It can compute risk measures, such as the volatility of returns, and help analyze investment performance.

### Summary
NumPy is essential for data science because it handles large amounts of data quickly, performs complex calculations, and integrates well with other tools. Whether you’re preparing data, running simulations, or analyzing results, NumPy is a key part of the data science process.