

# GETTING FAMILIAR WITH NUMPY


In [None]:
import numpy as np

**Creating arrays**

In [None]:
# Creating a 1D array
array_1d = np.array([1, 2, 3, 4, 5])

# Creating a 2D array (matrix)
array_2d = np.array([[1, 2, 3], [4, 5, 6]])

**Basic operations**

In [None]:
array_a = np.array([1, 2, 3])
array_b = np.array([4, 5, 6])

# Basic Operations
add_arrays = array_a + array_b
multiply_arrays = array_a * array_b
square_array = np.square(array_a)

print("Addition of Arrays is: ",add_arrays)
print("Multiplication of arrays is: ",multiply_arrays)
print("Square of array_a is: ",square_array)

Addition of Arrays is:  [5 7 9]
Multiplication of arrays is:  [ 4 10 18]
Square of array_a is:  [1 4 9]


**Array Properties**

In [None]:
print("Shape of array:", array_2d.shape)
print("Size of array:", array_2d.size)
print("Data type of elements:", array_2d.dtype)
print("Number of dimensions:", array_2d.ndim)
print("Number of bytes:", array_2d.nbytes)
print("Item size:", array_2d.itemsize)

Shape of array: (2, 3)
Size of array: 6
Data type of elements: int64
Number of dimensions: 2
Number of bytes: 48
Item size: 8
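
The byte count above follows from the other two properties: `nbytes` equals `size * itemsize`. The sketch below checks that relationship on a copy of the same data. The names `arr` and `smaller` are illustrative, and the explicit `dtype=np.int64` is an assumption made here so the numbers do not depend on the platform's default integer type.

```python
import numpy as np

# Same values as array_2d, with the dtype fixed explicitly (assumption for this sketch)
arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

# Total bytes = number of elements * bytes per element
print(arr.size * arr.itemsize)  # 48
print(arr.nbytes)               # 48

# Converting to a narrower integer type halves the memory footprint
smaller = arr.astype(np.int32)
print(smaller.itemsize)         # 4
print(smaller.nbytes)           # 24
```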


# DATA MANIPULATION


**Indexing and Slicing**

In [None]:
# Indexing in 1D array
first_element = array_1d[0]
print(first_element)

# Slicing in 1D array
sub_array = array_1d[1:4]
print(sub_array)

# Indexing in 2D array
element_2d = array_2d[1, 2]
print(element_2d)

# Slicing in 2D array
sub_array_2d = array_2d[:, 1:3]
print(sub_array_2d)

1
[2 3 4]
6
[[2 3]
 [5 6]]


**Reshaping**

In [None]:
reshaped_array = array_2d.reshape(3, 2)  # Reshape 2x3 array to 3x2
print(reshaped_array)

flattened_array = array_2d.flatten()     # Convert 2D array to 1D
print(flattened_array)

[[1 2]
 [3 4]
 [5 6]]
[1 2 3 4 5 6]
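
One difference between the two calls above is worth seeing in action: `flatten()` always returns a copy, while `reshape()` on a contiguous array like this one typically returns a view that shares memory with the original. The following is a minimal sketch of that behaviour; `base`, `reshaped`, `flat_copy`, and `inferred` are illustrative names, not part of the notebook.

```python
import numpy as np

base = np.array([[1, 2, 3], [4, 5, 6]])

reshaped = base.reshape(3, 2)   # view onto the same buffer for this contiguous array
flat_copy = base.flatten()      # independent copy

# Modify the original and see which results change
base[0, 0] = 99
print(reshaped[0, 0])                      # 99 (the view sees the change)
print(flat_copy[0])                        # 1  (the copy does not)
print(np.shares_memory(base, reshaped))    # True
print(np.shares_memory(base, flat_copy))   # False

# -1 asks NumPy to infer that dimension from the total element count
inferred = base.reshape(-1, 2)
print(inferred.shape)                      # (3, 2)
```

If an independent reshaped array is needed, calling `.copy()` on the result of `reshape()` avoids accidental changes to the original.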


**Mathematical operations**

In [None]:
sin_array = np.sin(array_1d)  # Apply sine function element-wise
print(sin_array)

log_array = np.log(array_1d + 1)  # Apply natural logarithm (shifted to avoid log(0))
print(log_array)

[ 0.84147098  0.90929743  0.14112001 -0.7568025  -0.95892427]
[0.69314718 1.09861229 1.38629436 1.60943791 1.79175947]


# DATA AGGREGATION


**Applying statistics**


In [None]:
data_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

mean_value = np.mean(data_array)  # Mean of array
print("Mean of the array is: ",mean_value)

median_value = np.median(data_array)  # Median of array
print("Median of the array is: ",median_value)

std_dev_value = np.std(data_array)  # Standard deviation
print("Standard deviation of the array is: ",std_dev_value)

sum_value = np.sum(data_array)  # Sum of all elements
print("Sum of the array is: ",sum_value)


Mean of the array is:  5.0
Median of the array is:  5.0
Standard deviation of the array is:  2.581988897471611
Sum of the array is:  45


**Grouping Data**

In [None]:
even_numbers = data_array[data_array % 2 == 0]
print("Even numbers in the array are: ",even_numbers)

odd_numbers = data_array[data_array % 2 != 0]
print("Odd numbers in the array are: ",odd_numbers)

Even numbers in the array are:  [2 4 6 8]
Odd numbers in the array are:  [1 3 5 7 9]


**Splitting**

In [None]:
split_array = np.split(data_array, 3)
print("Split array is: ",split_array)

Split array is:  [array([1, 2, 3]), array([4, 5, 6]), array([7, 8, 9])]
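
`np.split` with an integer works here only because nine elements divide evenly into three parts. The sketch below, using an illustrative ten-element array named `values`, shows what happens when the division is uneven and two ways around it: `np.array_split`, which allows unequal parts, and `np.split` with explicit cut positions.

```python
import numpy as np

values = np.arange(1, 11)  # 10 elements, not divisible by 3

# np.split raises ValueError when the array cannot be divided equally
split_failed = False
try:
    np.split(values, 3)
except ValueError:
    split_failed = True
print("np.split failed:", split_failed)   # np.split failed: True

# np.array_split tolerates uneven division; earlier parts get the extra elements
parts = np.array_split(values, 3)
print([len(p) for p in parts])            # [4, 3, 3]

# Passing a list of indices splits at those positions instead
by_index = np.split(values, [2, 5])
print([p.tolist() for p in by_index])     # [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10]]
```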


# DATA ANALYSIS

**Correlation**

In [None]:
array_x = np.array([1, 2, 3, 4, 5])
array_y = np.array([10, 20, 30, 40, 50])

correlation_matrix = np.corrcoef(array_x, array_y)  # Correlation coefficient matrix
print("Correlation coefficient matrix:", correlation_matrix)

Correlation coefficient matrix: [[1. 1.]
 [1. 1.]]
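
Every entry is 1 here because `array_y` is an exact multiple of `array_x`. To see what the off-diagonal entry measures, the sketch below computes the Pearson coefficient by hand on a less tidy pair of arrays and compares it with `np.corrcoef`. The data and the names `x`, `y`, `r_builtin`, and `r_manual` are made up for illustration.

```python
import numpy as np

# Illustrative data that is correlated but not perfectly linear
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

# The off-diagonal entry of the matrix is the correlation between x and y
r_builtin = np.corrcoef(x, y)[0, 1]

# Pearson r: sum of products of deviations, divided by the
# square root of the product of the summed squared deviations
dx = x - x.mean()
dy = y - y.mean()
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

print(round(r_manual, 6), round(r_builtin, 6))  # both approximately 0.8
```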


**Outlier detection**

In [None]:
# Flag data points that lie more than one standard deviation from the mean

mean_value = np.mean(data_array)
std_dev_value = np.std(data_array)

outliers = data_array[np.abs(data_array - mean_value) > std_dev_value]
print("Outliers in the array are: ",outliers)


Outliers in the array are:  [1 2 8 9]
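
A one-standard-deviation cutoff flags four of the nine evenly spaced values, so it is a loose rule. A common alternative is to compute z-scores and use a larger threshold. The sketch below assumes a threshold of 2 and uses invented data (`readings`) containing one obviously extreme value; both choices are illustrative rather than taken from the notebook.

```python
import numpy as np

# Made-up sample with one extreme value
readings = np.array([10, 12, 11, 13, 12, 11, 10, 12, 95])

# z-score: distance from the mean measured in standard deviations
z_scores = (readings - readings.mean()) / readings.std()

threshold = 2.0  # assumed cutoff; 2 or 3 are common conventions
outliers = readings[np.abs(z_scores) > threshold]
print("Outliers:", outliers)  # Outliers: [95]
```

With very small samples the largest possible z-score is limited, so a cutoff of 3 may never trigger; the threshold should be chosen with the sample size in mind.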


**Percentiles**

In [None]:
percentile_25 = np.percentile(data_array, 25)
percentile_50 = np.percentile(data_array, 50)
percentile_75 = np.percentile(data_array, 75)

print("25th percentile:", percentile_25)
print("50th percentile (median):", percentile_50)
print("75th percentile:", percentile_75)

25th percentile: 3.0
50th percentile (median): 5.0
75th percentile: 7.0
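
The three calls above can be collapsed into one, because `np.percentile` accepts a sequence of percentiles. The sketch below also shows `np.quantile`, which takes fractions between 0 and 1 instead, and how the default linear interpolation behaves when the requested percentile falls between two elements. Variable names are illustrative.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Several percentiles in a single call
quartiles = np.percentile(data, [25, 50, 75])
print(quartiles)                 # [3. 5. 7.]

# np.quantile expresses the same positions as fractions
print(np.quantile(data, 0.25))   # 3.0

# With an even number of elements the median falls between two values,
# and the default method interpolates linearly between them
even_data = np.array([1, 2, 3, 4])
print(np.percentile(even_data, 50))  # 2.5
```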


# APPLICATION IN DATA SCIENCE

In the examples above, NumPy simplifies data manipulation, aggregation, and analysis. It benefits data science work in several ways:

**Efficiency and Speed:**

NumPy applies operations to entire arrays at once in compiled code, which removes the need for explicit Python loops. On large datasets this is typically much faster than looping element by element.
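
As a rough illustration of this point, the sketch below squares the same values with a Python loop and with a single vectorized expression, then times both with `timeit`. The function names, array size, and repeat count are arbitrary choices for the sketch, and the measured times will vary by machine, so no specific speedup is claimed.

```python
import timeit

import numpy as np

values = np.arange(100_000, dtype=np.float64)

def squares_loop(data):
    # Element-by-element work done in the Python interpreter
    return [v * v for v in data]

def squares_vectorized(data):
    # One expression; the loop runs inside NumPy's compiled code
    return data * data

# Both approaches produce the same numbers
results_match = np.allclose(squares_loop(values), squares_vectorized(values))
print("Results match:", results_match)  # Results match: True

loop_time = timeit.timeit(lambda: squares_loop(values), number=5)
vec_time = timeit.timeit(lambda: squares_vectorized(values), number=5)
print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.4f}s")
```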

**Memory Efficiency:**

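NumPy arrays store elements of a single data type in one contiguous buffer, so they usually consume less memory than Python lists, which hold references to separately allocated Python objects.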

Data science often involves working with large datasets, and NumPy's efficient memory usage makes it possible to process these datasets without consuming excessive memory.
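
The difference can be estimated with `sys.getsizeof`. The sketch below is a rough comparison under stated assumptions: it runs on CPython, it counts each integer object once per list slot, and it ignores the small fixed overhead of the array object itself. The names `py_list`, `np_array`, `list_bytes`, and `array_bytes` are illustrative.

```python
import sys

import numpy as np

n = 1_000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores pointers; each element is a separate Python int object
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# An array stores the raw values back to back: n * 8 bytes for int64
array_bytes = np_array.nbytes

print("list (approx.):", list_bytes, "bytes")
print("array data:    ", array_bytes, "bytes")  # 8000 bytes
```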

NumPy provides a wide range of mathematical, statistical, and linear algebra functions that simplify complex data analysis. This includes functions for computing correlations, identifying outliers, and calculating percentiles, all of which are common tasks in data science.
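
The earlier sections already cover correlation, outliers, and percentiles, so the sketch below illustrates the linear algebra side instead. It solves a small system of two equations with `np.linalg.solve`; the system itself is invented for the example.

```python
import numpy as np

# Solve the system:
#   2x +  y = 5
#    x + 3y = 10
coefficients = np.array([[2.0, 1.0],
                         [1.0, 3.0]])
constants = np.array([5.0, 10.0])

solution = np.linalg.solve(coefficients, constants)
print(solution)  # approximately [1. 3.]

# Check the result by substituting it back into the equations
print(np.allclose(coefficients @ solution, constants))  # True
```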

Many data science libraries build on NumPy: pandas and scikit-learn use NumPy arrays internally, and TensorFlow interoperates closely with them.

**Real world examples where NumPy is used**

In real-world applications, NumPy's performance is crucial. Whether it's speeding up data preparation in machine learning pipelines, optimizing portfolios in finance, or running simulations in scientific research, NumPy provides the necessary tools for handling large datasets and performing complex calculations efficiently.