# Numpy for Data Science
NumPy (Numerical Python) is a module that provides:
- Data structures such as multidimensional arrays.
- Mathematical functions for fast operations without writing loops.
- Faster operations, because its core is implemented in C.

<br>
<br>
<br>

## Basics:

### Install Numpy:
To install NumPy on your system, run `pip install numpy` in a terminal.

### Import numpy
Import numpy with an alias 'np'

In [1]:
import numpy as np

<br>
<br>
<br>

## Numpy Arrays
NumPy arrays are multidimensional objects that allow mathematical operations on blocks of data without using loops, i.e., operations on arrays can be written with the same syntax as operations between scalar elements.

### Creating Numpy 1D arrays:
Numpy arrays can be created in various ways:

In [2]:
# using the np.array() function:
data= np.array([1,2,3,4,5,6],dtype='int64')
data

array([1, 2, 3, 4, 5, 6], dtype=int64)

The np.array() function accepts any type of sequence object and creates a numpy array with its elements.
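As a small illustration of the sentence above, the sketch below builds arrays from a list, a tuple and a `range`. The variable names are made up for this example. It also shows that mixing integers and floats yields a float array.

```python
import numpy as np

# Any sequence works as input: list, tuple, range, ...
from_list = np.array([1, 2, 3])
from_tuple = np.array((1, 2, 3))
from_range = np.array(range(1, 4))

# When the elements have mixed numeric types, NumPy picks a common dtype
mixed = np.array([1, 2.5, 3])

print(from_list, from_tuple, from_range)
print(mixed.dtype)  # float64
```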

In [3]:
# using the np.arange() function:
data2= np.arange(1,10,2)
data2

array([1, 3, 5, 7, 9])

The np.arange() function accepts a start, a stop and an optional step, and returns an array of evenly spaced values from that interval. The spacing is set by the 'step' parameter (default 1). The values lie in the half-open interval [start, stop): start is included, stop is excluded.
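A short sketch of how `np.arange()` treats its bounds: the start value is included and the stop value is never part of the result. The variable names are illustrative only.

```python
import numpy as np

# stop=10 is excluded, so the last value is 9
odd = np.arange(1, 10, 2)        # 1, 3, 5, 7, 9

# here 9 is the stop value, so it is left out
upto_nine = np.arange(1, 9, 2)   # 1, 3, 5, 7

# with a single argument, start defaults to 0 and step to 1
first_five = np.arange(5)        # 0, 1, 2, 3, 4

print(odd, upto_nine, first_five)
```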

In [4]:
# other options.
# np.zeros()
zero=np.zeros(5)
zero

array([0., 0., 0., 0., 0.])

In [5]:
# np.ones()
one=np.ones(8)
one

array([1., 1., 1., 1., 1., 1., 1., 1.])

### Creating Numpy Multidimensional arrays:
NumPy multidimensional arrays can be created from nested lists or by calling the arr.reshape() method on an existing array.

In [6]:
# Nested lists
data=np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
data

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [7]:
# arr.reshape()
data2=np.arange(1,20,2).reshape((2,5))
data2

array([[ 1,  3,  5,  7,  9],
       [11, 13, 15, 17, 19]])
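`reshape()` only works when the new shape holds exactly as many elements as the original array. The sketch below, with made-up variable names, shows a valid reshape, the `-1` placeholder that lets NumPy infer one dimension, and the error raised when the sizes do not match.

```python
import numpy as np

values = np.arange(12)           # 12 elements: 0..11

grid = values.reshape(3, 4)      # 3 * 4 == 12, so this works
auto = values.reshape(2, -1)     # -1 asks NumPy to infer the missing dimension

print(grid.shape)  # (3, 4)
print(auto.shape)  # (2, 6)

try:
    values.reshape(5, 3)         # 5 * 3 != 12, so this raises
except ValueError as err:
    print("reshape failed:", err)
```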

### Properties of an array

In [8]:
# dtype
print(data.dtype)

#shape
print(data.shape)

#ndim
print(data.ndim)

#size
print(data.size)

int32
(3, 4)
2
12
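The `int32` printed above is the default integer type on the machine that produced this output. The default can differ between platforms and NumPy versions. A minimal sketch of how the four properties relate, assuming an explicit `dtype` is passed so the result is the same everywhere:

```python
import numpy as np

table = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]], dtype=np.int64)

print(table.dtype)   # int64, because it was requested explicitly
print(table.shape)   # (3, 4): 3 rows, 4 columns
print(table.ndim)    # 2, the number of entries in shape
print(table.size)    # 12, the product of the entries in shape

# astype() returns a converted copy and leaves the original untouched
as_float = table.astype(np.float32)
print(as_float.dtype)  # float32
```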


<br>
<br>
<br>

## Data manipulation using numpy

### Arithmetic operations on arrays 
Arithmetic operations on NumPy arrays are written with the same syntax as for scalar elements.

In [9]:
# on 1D arrays
arr1=np.array([2,4,6,8])
arr2=np.array([1,3,5,7])
print(arr1+arr2)

[ 3  7 11 15]


By simply using the '+' operator, addition is performed element by element between the two arrays.

In [10]:
# on multidimensional arrays
arr1=np.array([[1,2,3,4],[5,6,7,8]])
arr2=np.array([[10,11,12,13],[14,15,16,17]])
print(arr1+arr2)

[[11 13 15 17]
 [19 21 23 25]]


In [11]:
# arrays of different dimensions
arr1=np.array([1,2,3,4])
arr2=np.array([[10,11,12,13],[20,21,22,23]])
print(arr1+arr2)

[[11 13 15 17]
 [21 23 25 27]]


When performing arithmetic between arrays of different dimensions, NumPy ‘broadcasts’ the smaller array across the larger one, provided their shapes are compatible. Here arr1, with shape (4,), is added to each row of arr2, which has shape (2, 4). This ability of NumPy arrays is called 'Broadcasting'.
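The sketch below illustrates shape compatibility with three hypothetical arrays: a row of shape (4,), a column of shape (2, 1) and a grid of shape (2, 4). Shapes are compared from the last axis backwards, and two sizes are compatible when they are equal or one of them is 1.

```python
import numpy as np

row = np.array([1, 2, 3, 4])              # shape (4,)
col = np.array([[10], [20]])              # shape (2, 1)
grid = np.array([[10, 11, 12, 13],
                 [20, 21, 22, 23]])       # shape (2, 4)

# (4,) against (2, 4): the row is reused for every row of grid
print((row + grid).shape)   # (2, 4)

# (4,) against (2, 1): both are stretched to (2, 4)
print(row + col)
# [[11 12 13 14]
#  [21 22 23 24]]

# (3,) against (2, 4): last axes are 3 and 4, which is not compatible
try:
    np.array([1, 2, 3]) + grid
except ValueError as err:
    print("cannot broadcast:", err)
```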

### Array Indexing and slicing

In [12]:
arr= np.random.randint(1,10,20)
print(arr)

[2 3 6 6 9 2 6 1 7 9 4 5 7 3 5 3 8 5 9 2]


In [13]:
#indexing
print(arr[3]) #6
print(arr[5]) #2

# negative indexing
print(arr[-1]) #2
print(arr[-10]) #4

6
2
2
4


In [14]:
# Slicing
print(arr[0:]) # print the whole array
print(arr[5:10]) #print the array from index 5 to 9
print(arr[-20:]) # print the whole array using negative indexing
print(arr[-15:-10]) # print the array from index -15 to -11 (i.e. 5 to 9)

[2 3 6 6 9 2 6 1 7 9 4 5 7 3 5 3 8 5 9 2]
[2 6 1 7 9]
[2 3 6 6 9 2 6 1 7 9 4 5 7 3 5 3 8 5 9 2]
[2 6 1 7 9]
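The same slicing rules extend to a step value and to more than one dimension. The sketch below uses small deterministic arrays (made up for this example) so the results can be checked by hand.

```python
import numpy as np

arr = np.arange(10, 20)        # [10 11 12 13 14 15 16 17 18 19]

print(arr[2:8:2])   # every second element from index 2 up to 7 -> [12 14 16]
print(arr[-3:])     # the last three elements -> [17 18 19]
print(arr[::-1])    # the whole array in reverse order

grid = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(grid[1, 2])       # row 1, column 2 -> 6
print(grid[:, 0])       # first column -> [0 4 8]
print(grid[0:2, 1:3])   # rows 0-1, columns 1-2 -> [[1 2] [5 6]]
```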


<br>
<br>
<br>

## Data Aggregation using Numpy
Numpy provides various aggregate functions that allow you to perform computations across the entire array or along a specified axis. 

In [15]:
arr= np.random.randint(1,100,(5,4))
print(arr)

[[30  6 62 92]
 [54 84  9 75]
 [40 48 71  7]
 [ 6 80 49 16]
 [80 17 38 33]]


#### np.sum()

In [16]:
print(np.sum(arr))   # sum all the elements in the array 
print(np.sum(arr,axis=1))  # row wise sum
print(np.sum(arr,axis=0))  # column wise sum

897
[190 222 166 151 168]
[210 235 229 223]


#### np.mean()

In [17]:
print(np.mean(arr))   # mean of all the elements in the array 
print(np.mean(arr,axis=1))  # row wise mean
print(np.mean(arr,axis=0))  # column wise mean

44.85
[47.5  55.5  41.5  37.75 42.  ]
[42.  47.  45.8 44.6]


#### np.min()

In [18]:
print(np.min(arr))   # minimum of all the elements in the array 
print(np.min(arr,axis=1))  # row wise minimum
print(np.min(arr,axis=0))  # column wise minimum

6
[ 6  9  7  6 17]
[6 6 9 7]


#### np.max()

In [19]:
print(np.max(arr))   # maximum of all the elements in the array 
print(np.max(arr,axis=1))  # row wise maximum
print(np.max(arr,axis=0))  # column wise maximum

92
[92 84 71 80 80]
[80 84 71 92]


#### np.median()

In [20]:
print(np.median(arr))   # median of all the elements in the array 
print(np.median(arr,axis=1))  # row wise median
print(np.median(arr,axis=0))  # column wise median

44.0
[46.  64.5 44.  32.5 35.5]
[40. 48. 49. 33.]


<br>

Other available aggregate functions:
- **np.std()** : compute standard deviation
- **np.var()** : compute variance
- **np.percentile()**: compute percentile
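A minimal sketch of these three functions on a small hand-picked dataset (not taken from the notebook) whose mean is 5. Note that `np.std()` and `np.var()` compute the population versions by default; passing `ddof=1` gives the sample versions.

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])   # mean is 5

print(np.var(data))             # 4.0, the average squared distance from the mean
print(np.std(data))             # 2.0, the square root of the variance
print(np.percentile(data, 50))  # 4.5, the same value as np.median(data)

# Sample variance divides by n - 1 instead of n
sample_var = np.var(data, ddof=1)
print(sample_var)
```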

<br>
<br>
<br>

## Data Analysis using Numpy

We can use NumPy to perform data analysis tasks such as finding correlations, identifying outliers, and calculating percentiles.
Consider two arrays of 100 random elements each.

In [21]:
arr1=np.random.uniform(1,1000,100)
arr2=np.random.uniform(1,1000,100)

#### Correlation Coefficient:

In [22]:
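# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the correlation between arr1 and arr2
correlation = np.corrcoef(arr1, arr2)[0, 1]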
print("Correlation:", correlation)

Correlation: -0.0637195420782247
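Two independent random arrays are expected to give a coefficient near 0, as seen above. To show the other extremes, the sketch below uses hand-built data (hypothetical names `x`, `y_up`, `y_down`) where the relationship is exactly linear.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = 2 * x + 1      # rises exactly in step with x
y_down = -3 * x       # falls exactly in step with x

matrix = np.corrcoef(x, y_up)
print(matrix.shape)   # (2, 2): each input against each input

r_up = matrix[0, 1]                      # close to +1
r_down = np.corrcoef(x, y_down)[0, 1]    # close to -1
print(r_up, r_down)
```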


#### Detecting outliers:
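Outliers are data points that differ significantly from the rest of the data. They can be identified using the standard deviation or the interquartile range (IQR). Here we use the standard deviation approach and flag values in arr1 that lie more than 1 standard deviation from the mean. This threshold is deliberately loose: 2 or 3 standard deviations is more common in practice, and because arr1 is drawn from a uniform distribution, a large share of its values falls outside the 1-standard-deviation band.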

In [23]:
mean = np.mean(arr1)
std_dev = np.std(arr1)
outliers = arr1[np.abs(arr1 - mean) > 1 * std_dev]
print("Number of outliers in arr1:", len(outliers))

Number of outliers in arr1: 44
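The IQR method mentioned above can be sketched as follows. The data is a small made-up sample with one obvious outlier, and the factor 1.5 is the conventional choice for this rule, used here as an assumption rather than something taken from the notebook.

```python
import numpy as np

values = np.array([10, 12, 11, 13, 12, 11, 10, 95])

# First and third quartiles, and the range between them
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Anything further than 1.5 * IQR outside the quartiles is flagged
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]

print("bounds:", lower, upper)   # bounds: 8.5 14.5
print("outliers:", outliers)     # outliers: [95]
```

Unlike the standard deviation approach, the quartiles are barely affected by the extreme value itself, so a single large outlier does not widen the bounds.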


#### Calculate Percentiles:
Percentiles are used to understand the distribution of data. The nth percentile is the value below which n percent of the data falls.

In [24]:
percentile_25 = np.percentile(arr1, 25)
percentile_50 = np.percentile(arr1, 50)  # Median
percentile_90 = np.percentile(arr1, 90)

print("25th percentile:", percentile_25)
print("50th percentile (median):", percentile_50)
print("90th percentile:", percentile_90)

25th percentile: 202.92492100216782
50th percentile (median): 535.7601694466144
90th percentile: 860.5243969686566
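`np.percentile()` also accepts a list of percentiles and returns them all at once. The sketch below uses the integers 1 to 100 (chosen for this example so the result is easy to verify) and checks the definition by counting how much of the data lies below the 25th percentile.

```python
import numpy as np

data = np.arange(1, 101)    # 1, 2, ..., 100

# Several percentiles in a single call
p25, p50, p90 = np.percentile(data, [25, 50, 90])
print(p25, p50, p90)

# Share of the data that lies below the 25th percentile
share_below = np.mean(data < p25)
print(share_below)          # 0.25
```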


<br>
<br>
<br>

## Applications of Numpy in Data Science:

### Advantages of Numpy:

- **Speed**: Compared to built-in Python data structures such as lists, NumPy arrays are much faster, especially for larger datasets. This is because NumPy's core is implemented in C and operates on contiguous blocks of typed data instead of individual Python objects.
  A graph comparing the speed of NumPy arrays and lists against input size is shown below:
  ![image.png](attachment:e40bf7a7-bdfb-42a9-bd24-fd6a9ff3777f.png)

- **Vectorized Computations**: Instead of writing loops to apply operations to each element of an array, NumPy allows you to perform these operations on entire arrays at once.
- **View**: A view is a NumPy feature that allows accessing and manipulating part of an array without copying the underlying data.
- **Statistical Computations**: NumPy provides various built-in statistical functions that allow easy analysis and computation.
- **Broadcasting**: While performing operations between two arrays of different shapes, the smaller array is broadcast across the larger one, making them compatible for the operation.
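Two of the points above, vectorized computation and views, can be sketched in a few lines. The array size and repeat count are arbitrary choices for this example, and the timings depend on the machine, so no numbers are claimed here.

```python
import timeit
import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n)

# The same computation written as a Python loop and as one array expression
loop_result = [v * 2 for v in py_list]
vec_result = np_arr * 2

loop_time = timeit.timeit(lambda: [v * 2 for v in py_list], number=10)
vec_time = timeit.timeit(lambda: np_arr * 2, number=10)
print(f"list loop: {loop_time:.4f}s  numpy: {vec_time:.4f}s")

# A basic slice is a view: it shares memory with the original array
base = np.arange(6)
window = base[1:4]
window[0] = 99
print(base)          # [ 0 99  2  3  4  5]

# copy() gives independent data, so the original is left alone
independent = base[1:4].copy()
independent[0] = -1
print(base[1])       # 99
```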

### Applications in Real World:

- **Image Processing**: Digital images are represented as multi-dimensional Numpy arrays. Python’s imageio library can be used to read an image file into a Numpy array. Once in this format, Numpy’s array manipulation capabilities can be used to process and manipulate the image.
- **Machine Learning**: Most of the underlying mathematics can be expressed in linear algebra, that is, with matrices. Fitting curves, finding best-fit parameters, dimensionality reduction, robotic motion, protein dynamics: all can be phrased as matrix operations.
- **Financial Analysis**: Price and return series can be stored as NumPy arrays, so the aggregation and statistical functions covered above (mean, standard deviation, percentiles, correlation) can be applied to them directly.
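To make the image processing point concrete, the sketch below treats a tiny synthetic RGB image as an array of shape (height, width, 3). The image is built by hand instead of being read from a file, so no image library is assumed, and the channel average is only one simple way to get a grayscale version.

```python
import numpy as np

# Hypothetical 4x6 RGB image with 8-bit channels
image = np.zeros((4, 6, 3), dtype=np.uint8)
image[:, :3, 0] = 255     # left half: red channel at full intensity
image[:, 3:, 2] = 255     # right half: blue channel at full intensity

print(image.shape)        # (4, 6, 3)

crop = image[1:3, 2:5]    # rows 1-2 and columns 2-4
flipped = image[:, ::-1]  # mirror the image horizontally
gray = image.mean(axis=2) # average the three channels for each pixel

print(crop.shape)         # (2, 3, 3)
print(gray.shape)         # (4, 6)
```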
  