# NumPy Tutorial

NumPy is a Python library used for working with arrays.

It also provides functions for linear algebra, Fourier transforms, and matrix operations.

NumPy was created in 2005 by Travis Oliphant. It is an open source project and you can use it freely.

NumPy stands for Numerical Python.

# Why Use NumPy
* Python lists can serve the purpose of arrays, but they are slow for element-by-element numerical work.

* NumPy aims to provide an array object that is much faster than traditional Python lists. Figures of up to 50x are often quoted, though the actual speedup depends on the operation and the size of the data.

* The array object in NumPy is called ndarray. It comes with many supporting functions that make working with it easy.

* Arrays are used very frequently in data science, where speed and memory usage matter.
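
The speed difference described above can be checked with a small timing sketch. The array size `n` and the repeat count are arbitrary choices made for this illustration, and the measured times will differ from machine to machine, so no particular speedup is claimed here.

```python
import timeit

import numpy as np

n = 100_000  # arbitrary size chosen for this illustration
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)  # explicit 64-bit ints so the squares cannot overflow

# Square every element: a list comprehension vs. a vectorized NumPy expression
list_result = [x * x for x in py_list]
arr_result = np_arr * np_arr

# Time each approach over 20 runs; the numbers depend on the hardware
list_time = timeit.timeit(lambda: [x * x for x in py_list], number=20)
arr_time = timeit.timeit(lambda: np_arr * np_arr, number=20)

print("list comprehension:", list_time)
print("NumPy vectorized:  ", arr_time)
```

Both approaches compute the same values; only the time taken differs.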

**Creating NumPy Arrays**

In [1]:
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr)

print(type(arr))

[1 2 3 4 5]
<class 'numpy.ndarray'>


In [3]:
# Create a 0-D array with the value 42
arr = np.array(42)

print(arr)


42


In [4]:
# Create a 1-D array containing the values 1, 2, 3, 4, 5
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr)

[1 2 3 4 5]


In [5]:
#Create a 2-D array containing two arrays with the values 1,2,3 and 4,5,6:

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr)

[[1 2 3]
 [4 5 6]]


In [6]:
#Create a 3-D array with two 2-D arrays, both containing two arrays with the values 1,2,3 and 4,5,6:

import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(arr)

[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]


In [7]:
# Check the number of dimensions of each array with the ndim attribute
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)

0
1
2
3
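
Besides `ndim`, an ndarray exposes a few other attributes that describe its layout. The short sketch below reuses the 3-D array from the cell above. The array `e` is a hypothetical extra example added here to show how a dtype can be requested explicitly.

```python
import numpy as np

d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(d.ndim)   # number of dimensions -> 3
print(d.shape)  # length along each dimension -> (2, 2, 3)
print(d.size)   # total number of elements -> 12
print(d.dtype)  # element type; the default integer type depends on the platform

# Request a specific element type when creating an array
e = np.array([1, 2, 3], dtype=np.float64)
print(e.dtype)  # float64
```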


**NumPy Array Indexing**

In [8]:
# Array indexing: access an element by its zero-based index
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[0])

1
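
Indexing also works from the end of an array and across several dimensions. The following sketch shows both. The 2-D array `grid` is introduced here only for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
print(arr[-1])      # negative indices count from the end -> 4

grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid[1, 2])   # row 1, column 2 -> 6
print(grid[0, -1])  # last element of row 0 -> 3
```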


**NumPy Array Slicing**

In [9]:
# Array slicing: elements from index 1 up to (but not including) index 5
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5])

[2 3 4 5]
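
Slices accept a step and negative bounds, and a basic slice is a view onto the original data rather than a copy. The sketch below illustrates both points. The names `view` and `safe` are chosen for this example only.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[::2])  # every second element -> [1 3 5 7]
print(arr[-3:])  # the last three elements -> [5 6 7]

# A slice is a view: writing through it changes the original array
view = arr[1:4]
view[0] = 99
print(arr[1])    # 99

# Call copy() when an independent array is needed
safe = arr[1:4].copy()
safe[0] = -1
print(arr[1])    # still 99
```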


**NumPy Array Reshaping**

In [10]:
#Reshaping arrays
#Reshape From 1-D to 2-D
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(4, 3)
print(newarr)


[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


In [None]:
#Reshaping arrays
#Reshape From 1-D to 3-D
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(2, 3, 2)
print(newarr)
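
A reshape only succeeds when the new shape holds exactly as many elements as the original array. One dimension may be given as `-1`, in which case NumPy infers it. The sketch below is an illustration of those two rules and uses `np.arange` to build the same 12 values.

```python
import numpy as np

arr = np.arange(1, 13)  # the values 1..12

# Let NumPy infer the second dimension: 12 elements / 3 rows = 4 columns
auto = arr.reshape(3, -1)
print(auto.shape)  # (3, 4)

# Flatten back to 1-D
flat = auto.reshape(-1)
print(flat.shape)  # (12,)

# 12 elements cannot fill a 5 x 3 array, so this raises ValueError
try:
    arr.reshape(5, 3)
except ValueError as exc:
    print("reshape failed:", exc)
```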

**Mathematical Functions On Array**

In [11]:
# Element-wise arithmetic on arrays
# Sum of two arrays
import numpy as np
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[11, 22, 33], [44, 55, 66]])
arr3 = arr1 + arr2
print("the sum of arrays is\n", arr3)

the sum of arrays is
 [[12 24 36]
 [48 60 72]]


In [None]:
# Subtraction of two arrays
import numpy as np
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[11, 22, 33], [44, 55, 66]])
arr3 = arr1 - arr2
print("the difference of arrays is\n", arr3)

In [12]:
# Element-wise multiplication of two arrays (not a matrix product)
import numpy as np
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[11, 22, 33], [44, 55, 66]])
arr3 = arr1 * arr2
print("the product of arrays is\n", arr3)

the product of arrays is
 [[ 11  44  99]
 [176 275 396]]


In [13]:
# np.add also accepts scalars and returns their sum
total = np.add(10, 20)
print(total)

30
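
The arithmetic operators above also work when the operands have different shapes, as long as the shapes are compatible. This is NumPy's broadcasting. The sketch below combines a 2 x 3 array with a scalar and with a single row. The array `row` is made up for this example.

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])

# A scalar is applied to every element
scaled = arr1 * 10
print(scaled)   # rows: [10 20 30] and [40 50 60]

# A 1-D array of length 3 is applied to each row of the 2 x 3 array
row = np.array([100, 200, 300])
shifted = arr1 + row
print(shifted)  # rows: [101 202 303] and [104 205 306]

# The operator and the function form give the same result
print(np.array_equal(np.add(arr1, row), shifted))  # True
```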


**Statistical Function**

In [14]:
# Statistical functions: mean and median
import numpy as np
a = np.array([111, 222, 333, 444, 555])
print("mean of a is ", np.mean(a))
print("median of a is ", np.median(a))

mean of a is  333.0
median of a is  333.0


In [15]:
# Statistical functions: standard deviation and variance
import numpy as np
a = np.array([111, 222, 333, 444, 555])
print("standard deviation of a is ", np.std(a))
print("variance of a is ", np.var(a))

standard deviation of a is  156.97770542341354
variance of a is  24642.0
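
By default, `np.std` and `np.var` divide by the number of elements N, which gives the population statistics shown above. Passing `ddof=1` divides by N - 1 instead and gives the sample estimates. The sketch below compares the two on the same data.

```python
import numpy as np

a = np.array([111, 222, 333, 444, 555])

population_var = np.var(a)      # divides by N = 5
sample_var = np.var(a, ddof=1)  # divides by N - 1 = 4

print("population variance:", population_var)  # 24642.0
print("sample variance:", sample_var)          # 30802.5
print("sample standard deviation:", np.std(a, ddof=1))
```

Which version is appropriate depends on whether the array holds the whole population or only a sample drawn from it.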


In [19]:
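# numpy.sum() with different dtypes
import numpy as np

# A 1-D Python list; np.sum converts it to an array internally
arr = [20, 2, .2, 10, 4]

print("\nSum of arr : ", np.sum(arr))

# dtype sets the type used for the accumulation and for the result
print("Sum of arr(uint8) : ", np.sum(arr, dtype=np.uint8))
print("Sum of arr(float32) : ", np.sum(arr, dtype=np.float32))

# Without dtype, the float input gives a float64 result, so both checks are False
print("\nIs np.sum(arr).dtype == np.uint : ",
      np.sum(arr).dtype == np.uint)

print("\nIs np.sum(arr).dtype == np.uint8 : ",
      np.sum(arr).dtype == np.uint8)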



Sum of arr :  36.2
Sum of arr(uint8) :  36
Sum of arr(float32) :  36.2

Is np.sum(arr).dtype == np.uint :  False

Is np.sum(arr).dtype == np.uint8 :  False
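
On arrays with more than one dimension, `np.sum` can also reduce along a single axis. The following sketch uses a small 2 x 3 array, named `m` here for illustration, to show the difference.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

total = np.sum(m)             # sum of every element -> 21
column_sums = np.sum(m, axis=0)  # collapse the rows -> one sum per column: 5, 7, 9
row_sums = np.sum(m, axis=1)     # collapse the columns -> one sum per row: 6, 15

print(total)
print(column_sums)
print(row_sums)
```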


**Data Analysis Using NumPy**

In [22]:
# Simulating data for analysis
data = np.random.randn(1000)  # 1000 data points from a standard normal distribution

# Finding correlations (e.g., between two random datasets)
data2 = np.random.randn(1000)
correlation = np.corrcoef(data, data2)[0, 1]

# Identifying outliers: values more than 3 away from 0, which is roughly 3 standard deviations for standard normal data
outliers = data[np.abs(data) > 3]

# Calculating percentiles
percentile_25 = np.percentile(data, 25)
percentile_50 = np.percentile(data, 50)
percentile_75 = np.percentile(data, 75)

print("Correlation between data1 and data2:", correlation)
print("Outliers in data:", outliers)
print("25th Percentile:", percentile_25)
print("50th Percentile (Median):", percentile_50)
print("75th Percentile:", percentile_75)


Correlation between data1 and data2: 0.011348615586600047
Outliers in data: [-3.55684042 -3.64329445  3.25560783  3.0433858   3.16297832  3.12487337
  3.03068065]
25th Percentile: -0.6767889600241951
50th Percentile (Median): -0.016940314322712084
75th Percentile: 0.6390771010287094
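
The outlier check above relies on the data having a mean near 0 and a standard deviation near 1. For data on another scale, one common approach is to standardize first and then apply the same threshold. The sketch below does this with a seeded generator so the run is repeatable. The seed, the mean of 50, and the standard deviation of 10 are arbitrary values assumed for this illustration.

```python
import numpy as np

# A seeded generator makes the simulated data reproducible (seed chosen arbitrarily)
rng = np.random.default_rng(seed=0)
data = rng.normal(loc=50.0, scale=10.0, size=1000)

# Standardize with the sample's own mean and standard deviation
z_scores = (data - data.mean()) / data.std()

# Keep values more than 3 standard deviations from the mean
outliers = data[np.abs(z_scores) > 3]

# Several percentiles can be computed in one call
q25, q50, q75 = np.percentile(data, [25, 50, 75])

print("Number of outliers:", outliers.size)
print("Quartiles:", q25, q50, q75)
```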


### The Role of NumPy in Data Science

NumPy plays a central role in data science because it handles and manipulates large datasets efficiently, a common requirement in data-driven fields. Using NumPy can significantly improve both the performance and the scalability of numerical computations.

### **Advantages of Using NumPy Over Traditional Python Data Structures:**

1. **Performance and Speed:**
   - NumPy's core is implemented in C and uses optimized routines, which makes it significantly faster than plain Python lists and loops for operations on large datasets. This speed advantage matters when working with big data, because it allows rapid processing and analysis.

2. **Memory Efficiency:**
   - NumPy arrays are more memory-efficient than Python lists. An array stores elements of a single fixed type in a contiguous block of memory, whereas a list stores references to separate Python objects. This layout allows efficient reading, writing, and manipulation, and for massive datasets the reduction in memory usage can make a substantial difference.

3. **Vectorization and Broadcasting:**
   - NumPy enables vectorized operations, meaning operations can be applied to entire arrays without the need for explicit loops. This leads to more concise and readable code. Additionally, NumPy's broadcasting feature allows operations on arrays of different shapes, enhancing flexibility and simplifying the code.

4. **Comprehensive Mathematical Functions:**
   - NumPy provides a vast array of built-in mathematical functions, such as statistical operations, linear algebra, and random number generation. These functions are highly optimized for performance and allow data scientists to perform complex mathematical calculations with ease and precision.

5. **Integration with Other Libraries:**
   - NumPy serves as the foundation for, or interoperates closely with, many other data science libraries, such as pandas, SciPy, scikit-learn, and TensorFlow. This integration makes NumPy indispensable in the data science ecosystem, because data can move between the tools in an analysis pipeline with little friction.

6. **Ease of Data Manipulation and Analysis:**
   - With NumPy, data manipulation tasks like reshaping, indexing, slicing, and aggregating data become straightforward and efficient. This is especially important in data science, where data preparation and preprocessing are crucial steps before any analysis or modeling can be performed.
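
The memory point in the list above can be illustrated with a rough comparison. The sketch below estimates the footprint of a Python list as the list object plus each integer object it references, and compares it with the data buffer of an equivalent array. The size `n` is arbitrary, and the list figure is an approximation that varies with the Python build.

```python
import sys

import numpy as np

n = 1000  # arbitrary size for this illustration
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list holds references; every int is a separate Python object
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An array holds the raw values: n elements * 8 bytes each
array_bytes = np_arr.nbytes

print("bytes per array element:", np_arr.itemsize)  # 8
print("array data bytes:", array_bytes)             # 8000
print("approximate list bytes:", list_bytes)
```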

### **Summary of Advantages:**

The use of NumPy in data science is not just beneficial; it is often essential. Its advantages over traditional Python data structures in terms of performance, memory efficiency, and ease of use make it a preferred tool for numerical computations. Whether performing basic data manipulation, conducting statistical analysis, or building complex machine learning models, NumPy provides the robust and efficient infrastructure needed to handle the vast quantities of data that modern data science demands.

### Real-World Examples of NumPy's Crucial Capabilities

1. **Machine Learning:**
   - **Data Preprocessing:** NumPy is essential for normalizing data, creating training/test splits, and scaling features. It also handles image data efficiently in deep learning tasks.
   - **Linear Algebra:** Key machine learning algorithms, such as linear regression and neural networks, rely on NumPy for fast matrix operations and other linear algebra tasks.

2. **Financial Analysis:**
   - **Time Series Analysis:** NumPy is used to compute rolling statistics, moving averages, and other time-dependent calculations, crucial for forecasting and trend analysis.
   - **Portfolio Optimization:** NumPy's mathematical functions help in optimizing asset allocation by calculating covariance, returns, and risk metrics efficiently.

3. **Scientific Research:**
   - **Physics and Chemistry:** NumPy handles large datasets from experiments or simulations, enabling efficient analysis of atomic interactions and other physical phenomena.
   - **Genomics:** In bioinformatics, NumPy is used for managing and analyzing large-scale DNA sequences, aiding in genetic research and evolutionary studies.
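
Two of the tasks named above, moving averages and feature scaling, can be sketched in a few lines. This is one possible approach rather than a standard recipe. The `prices` and `features` values are made up for the illustration, and the window length of 3 is an arbitrary choice.

```python
import numpy as np

# Moving average over a window of 3 using convolution with equal weights
prices = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])  # made-up values
window = 3
moving_avg = np.convolve(prices, np.ones(window) / window, mode="valid")
print(moving_avg)  # approximately 11, 12, 13, 14

# Min-max scaling: map each feature column to the range [0, 1]
features = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])  # made-up values
col_min = features.min(axis=0)
col_max = features.max(axis=0)
scaled = (features - col_min) / (col_max - col_min)
print(scaled)
```

With `mode="valid"`, the moving average only contains positions where the full window fits, so it has `len(prices) - window + 1` entries.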

### Conclusion:
NumPy's ability to efficiently process large datasets and perform complex calculations makes it indispensable in machine learning, financial analysis, and scientific research, among other fields.