# Basics of Python libraries for EDA

### NumPy

NumPy is a powerful Python library for numerical computing that provides efficient handling of large arrays and matrices. It forms the foundation for many other scientific computing libraries in Python. 

1. NumPy stands for Numerical Python and is widely used for numerical computations in Python programming.
2. It provides a multidimensional array object called ndarray, which is fast, flexible, and memory-efficient.
3. NumPy arrays can be created using the array() function, and elements can be accessed using indexing and slicing operations.
4. It offers a wide range of mathematical functions, including basic arithmetic operations, trigonometric functions, logarithms, and more.
5. NumPy supports broadcasting, which allows efficient element-wise operations on arrays whose shapes differ but are compatible (for example, a scalar or a single row combined with a 2-D array).
6. The library includes powerful linear algebra functions, such as matrix multiplication, eigenvalue calculation, and matrix inversion.
7. NumPy provides tools for reading and writing array data to and from disk, making it easy to work with large datasets.
8. It offers advanced indexing and slicing techniques, such as boolean indexing and fancy indexing, enabling efficient data manipulation.
9. NumPy seamlessly integrates with other scientific computing libraries, such as SciPy, Matplotlib, and Pandas.
10. It is extensively used in fields like data analysis, machine learning, image processing, and scientific research due to its efficiency and versatility.

In [2]:
import numpy as np

Creating a NumPy array:

In [3]:
arr = np.array([1, 2, 3, 4, 5])
print(arr)


[1 2 3 4 5]


Accessing array elements:

In [4]:
print(arr[0])  
print(arr[2:4])  


1
[3 4]


Performing arithmetic operations on arrays:

In [5]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

addition = arr1 + arr2
print(addition)  
multiplication = arr1 * arr2
print(multiplication)  


[5 7 9]
[ 4 10 18]


NumPy functions for mathematical operations:

In [6]:
arr = np.array([1, 2, 3, 4, 5])

print(np.sum(arr))
print(np.mean(arr))  
print(np.max(arr))
print(np.min(arr))

15
3.0
5
1


Reshaping arrays:

In [7]:
arr = np.array([1, 2, 3, 4, 5, 6])
reshaped_arr = arr.reshape(2, 3)
print(reshaped_arr)

[[1 2 3]
 [4 5 6]]


Transposing arrays:

In [8]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
transposed_arr = np.transpose(arr)
print(transposed_arr)

[[1 4]
 [2 5]
 [3 6]]


Matrix multiplication:

In [9]:
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

result = np.dot(matrix1, matrix2)
print(result)


[[19 22]
 [43 50]]


Generating random numbers:

In [10]:
random_array = np.random.rand(3, 3)
print(random_array)


[[0.22669038 0.83770507 0.72176037]
 [0.4035912  0.58394948 0.76101382]
 [0.44996824 0.75152236 0.81029029]]


Saving and loading arrays:

In [11]:
arr = np.array([1, 2, 3, 4, 5])
np.save('array.npy', arr)  # Saving the array to a file

loaded_arr = np.load('array.npy')  # Loading the array from the file
print(loaded_arr)


[1 2 3 4 5]
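
The `.npy` format above is binary. When the data needs to be readable by other tools, a plain-text round trip with `np.savetxt()` and `np.loadtxt()` is one option. The sketch below assumes a small 2-D array and writes to a temporary directory; the file name `data.csv` is hypothetical.

```python
import os
import tempfile

import numpy as np

data = np.array([[1.5, 2.0], [3.25, 4.0]])

# Write into a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "data.csv")  # hypothetical file name
    np.savetxt(csv_path, data, delimiter=",")     # save as comma-separated text
    loaded = np.loadtxt(csv_path, delimiter=",")  # read the text file back

print(loaded)
print(np.array_equal(data, loaded))
```

Text files are larger and slower to parse than `.npy` files, so the binary format is usually the better default for large arrays.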


Creating an array with a specified data type:

In [12]:
arr = np.array([1, 2, 3], dtype=float)
print(arr.dtype)  


float64


Reshaping arrays using the -1 parameter:

In [13]:
arr = np.array([1, 2, 3, 4, 5, 6])
reshaped_arr = arr.reshape(2, -1)  # -1 automatically calculates the appropriate number of columns
print(reshaped_arr)


[[1 2 3]
 [4 5 6]]


Generating an array with a sequence of numbers:

In [14]:
sequence = np.arange(1, 11)  
print(sequence)

[ 1  2  3  4  5  6  7  8  9 10]


Creating an array of zeros and ones:

In [15]:
zeros = np.zeros((2, 3))  
print(zeros)

ones = np.ones((3, 2)) 
print(ones)

[[0. 0. 0.]
 [0. 0. 0.]]
[[1. 1.]
 [1. 1.]
 [1. 1.]]


Finding unique elements in an array:

In [16]:
arr = np.array([1, 2, 3, 1, 2, 4, 5, 4])
unique_elements = np.unique(arr)
print(unique_elements)


[1 2 3 4 5]
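
In exploratory analysis it is often useful to know how often each value occurs, not only which values exist. A minimal sketch using the `return_counts` argument of `np.unique()` on the same array:

```python
import numpy as np

arr = np.array([1, 2, 3, 1, 2, 4, 5, 4])

# return_counts=True also returns how many times each unique value appears.
values, counts = np.unique(arr, return_counts=True)

for value, count in zip(values, counts):
    print(value, count)
```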


Applying mathematical functions to arrays:

In [17]:
arr = np.array([1, 2, 3])
print(np.exp(arr))  
print(np.sin(arr))  
print(np.sqrt(arr))  

[ 2.71828183  7.3890561  20.08553692]
[0.84147098 0.90929743 0.14112001]
[1.         1.41421356 1.73205081]


Concatenating arrays:

In [18]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

concatenated_arr = np.concatenate((arr1, arr2))
print(concatenated_arr)

[1 2 3 4 5 6]


Sorting arrays:

In [19]:
arr = np.array([3, 1, 5, 2, 4])
sorted_arr = np.sort(arr)
print(sorted_arr)


[1 2 3 4 5]
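
Two related operations come up often alongside `np.sort()`: getting the indices that would sort an array, and sorting in descending order. The sketch below reuses the array from the cell above; the variable names are illustrative.

```python
import numpy as np

arr = np.array([3, 1, 5, 2, 4])

order = np.argsort(arr)          # indices that would sort arr in ascending order
descending = np.sort(arr)[::-1]  # sort ascending, then reverse the result

print(order)       # [1 3 0 4 2]
print(arr[order])  # [1 2 3 4 5]
print(descending)  # [5 4 3 2 1]
```

`np.argsort()` is handy when a second array has to be reordered in the same way as the first.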


Indexing and slicing in multi-dimensional arrays:

In [20]:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr[0, 1])  # Accessing an element
print(arr[:, 1])  # Accessing a column
print(arr[1:3, :2])  # Slicing rows and columns


2
[2 5 8]
[[4 5]
 [7 8]]


Broadcasting operations:

In [21]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
scalar = 2

result = arr + scalar  # Broadcasting the scalar to each element of the array
print(result)


[[3 4 5]
 [6 7 8]]
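
Broadcasting is not limited to scalars. As a sketch, the example below combines a 2-D array with a 1-D row and with a column vector; the array values are made up for illustration.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([10, 20, 30])               # shape (3,)
column = np.array([[100], [200]])          # shape (2, 1)

row_result = matrix + row        # row is added to every row of matrix
column_result = matrix + column  # column is added to every column of matrix
print(row_result)
print(column_result)

# Subtracting the per-column mean relies on the same broadcasting rule.
centered = matrix - matrix.mean(axis=0)
print(centered)
```

Shapes are compared from the trailing dimension backwards, and two dimensions are compatible when they are equal or one of them is 1.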


Finding the index of the maximum and minimum values in an array:

In [22]:
arr = np.array([3, 1, 5, 2, 4])
max_index = np.argmax(arr)
min_index = np.argmin(arr)
print(max_index)  # Output: 2
print(min_index)  # Output: 1


2
1


Element-wise comparison and boolean masking:

In [23]:
arr = np.array([1, 2, 3, 4, 5])
mask = arr > 3  # Boolean mask indicating which elements are greater than 3
print(mask)  
filtered_arr = arr[mask]  # Applying the boolean mask to filter the array
print(filtered_arr)  


[False False False  True  True]
[4 5]
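
Besides boolean masks, NumPy supports fancy indexing (selecting by a list of positions) and conditional replacement with `np.where()`. A short sketch with illustrative values:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

picked = arr[[0, 2, 4]]                 # fancy indexing: elements at positions 0, 2, 4
in_range = arr[(arr > 10) & (arr < 50)] # combine conditions with & and parentheses
capped = np.where(arr > 30, 30, arr)    # replace values above 30 with 30

print(picked)    # [10 30 50]
print(in_range)  # [20 30 40]
print(capped)    # [10 20 30 30 30]
```

Conditions on arrays are combined with `&` and `|` rather than `and` and `or`, and each condition needs its own parentheses.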


Performing matrix operations:

In [24]:
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

elementwise_product = matrix1 * matrix2  # Element-wise multiplication
matrix_product = np.matmul(matrix1, matrix2)  # Matrix multiplication
print(elementwise_product)
print(matrix_product)


[[ 5 12]
 [21 32]]
[[19 22]
 [43 50]]
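
The `np.linalg` module covers the linear algebra functions mentioned in the overview, such as matrix inversion and eigenvalue calculation. The sketch below uses a small, invertible matrix chosen for illustration; `A` and `b` are not taken from the cells above.

```python
import numpy as np

A = np.array([[4.0, 1.0], [2.0, 3.0]])
b = np.array([6.0, 8.0])

A_inv = np.linalg.inv(A)                      # matrix inverse
eigenvalues, eigenvectors = np.linalg.eig(A)  # eigenvalues and eigenvectors
x = np.linalg.solve(A, b)                     # solve A @ x = b without forming the inverse

print(A @ A_inv)     # approximately the identity matrix
print(eigenvalues)
print(x)
```

The `@` operator is equivalent to `np.matmul()` for these 2-D arrays. For solving linear systems, `np.linalg.solve()` is generally preferred over multiplying by the inverse.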


Working with random numbers and distributions:

In [25]:
random_number = np.random.random()  # Generate a random float in the half-open interval [0, 1)
print(random_number)

random_array = np.random.rand(3, 3)  # Generate a 3x3 array of random floats in [0, 1)
print(random_array)
print(random_array)

normal_distribution = np.random.normal(loc=0, scale=1, size=(3, 3))  # Generate an array from a normal distribution
print(normal_distribution)


0.20378319307869053
[[0.85184288 0.56512478 0.22101503]
 [0.42970917 0.75815806 0.73725569]
 [0.61181372 0.62096953 0.55305078]]
[[-0.12384667 -0.49418301  0.43992517]
 [ 2.20203934  1.50467383 -1.01219948]
 [-0.04959203  0.72294942 -1.40310427]]
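
The values above change on every run. For reproducible results, one option is NumPy's `Generator` interface, created with `np.random.default_rng()` and a fixed seed. The seed value 42 below is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, chosen for illustration

uniform = rng.random((3, 3))                      # floats in [0, 1)
normal = rng.normal(loc=0, scale=1, size=(3, 3))  # standard normal samples
integers = rng.integers(low=0, high=10, size=5)   # integers from 0 to 9

# A second generator with the same seed reproduces the same sequence.
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(uniform, rng_again.random((3, 3))))  # True
```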


Building a matrix by combining arange() with reshape():

In [26]:
arr = np.arange(1, 10)  
reshaped_arr = arr.reshape(3, 3)
print(reshaped_arr)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


Performing statistical calculations on arrays:

In [27]:
arr = np.array([1, 2, 3, 4, 5])
mean = np.mean(arr)
median = np.median(arr)
std_dev = np.std(arr)
print(mean)  
print(median)  
print(std_dev)  

3.0
3.0
1.4142135623730951
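
Note that `np.std()` computes the population standard deviation by default (`ddof=0`). When the data is a sample, passing `ddof=1` gives the sample standard deviation instead. The sketch below shows both, plus quartiles via `np.percentile()`.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

population_std = np.std(arr)       # divides by n (ddof=0, the default)
sample_std = np.std(arr, ddof=1)   # divides by n - 1
q1, q3 = np.percentile(arr, [25, 75])  # first and third quartiles

print(population_std)
print(sample_std)   # about 1.58
print(q1, q3)       # 2.0 4.0
```

Pandas uses `ddof=1` by default in its own `std()` method, so the two libraries give different results on the same data unless `ddof` is set explicitly.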


Applying functions along a specific axis:

In [28]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
column_sums = np.sum(arr, axis=0)  # axis=0 collapses the rows, giving the sum of each column
row_sums = np.sum(arr, axis=1)  # axis=1 collapses the columns, giving the sum of each row
print(column_sums)
print(row_sums)

[5 7 9]
[ 6 15]


Stacking arrays vertically or horizontally:

In [29]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
vertical_stack = np.vstack((arr1, arr2))  # Stack arrays vertically
horizontal_stack = np.hstack((arr1, arr2))  # Stack arrays horizontally
print(vertical_stack)
print(horizontal_stack)


[[1 2 3]
 [4 5 6]]
[1 2 3 4 5 6]


Applying element-wise functions to arrays:

In [30]:
arr = np.array([1, 2, 3])
square_fn = np.vectorize(lambda x: x**2)
result = square_fn(arr)
result


array([1, 4, 9])
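
`np.vectorize()` is mainly a convenience: internally it still calls the Python function once per element. For simple arithmetic like squaring, operators and built-in universal functions give the same result and are usually faster. A brief comparison:

```python
import numpy as np

arr = np.array([1, 2, 3])

squared_operator = arr ** 2      # element-wise power operator
squared_ufunc = np.square(arr)   # built-in universal function

print(squared_operator)  # [1 4 9]
print(squared_ufunc)     # [1 4 9]
```

`np.vectorize()` remains useful for wrapping logic that has no direct array equivalent.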

### Pandas

Pandas is a powerful Python library for data manipulation and analysis. It provides data structures and functions to efficiently work with structured data.

1. Pandas is built on top of NumPy and introduces two main data structures: Series and DataFrame.
2. A Series is a one-dimensional labeled array that can hold any data type. It is similar to a column in a spreadsheet or a database table.
3. A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It represents a tabular data structure, similar to a spreadsheet or a SQL table.
4. Pandas provides functions to read data from various file formats such as CSV, Excel, SQL databases, and more.
5. It allows for easy indexing, slicing, and filtering of data using intuitive syntax and powerful querying capabilities.
6. Pandas supports handling missing data through techniques like data imputation or dropping missing values.
7. It provides functionality for data alignment, merging, and joining of multiple datasets based on common columns or indices.
8. Pandas offers a wide range of statistical and mathematical functions for data aggregation, summarization, and analysis.
9. Visualization of data is made easy with Pandas, as it integrates well with other libraries like Matplotlib and Seaborn.
10. Pandas is widely used in data analysis, data preprocessing, feature engineering, and exploratory data analysis tasks, making it an essential tool in the data science ecosystem.
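
As a minimal sketch of the two data structures described above, the example below builds a Series and a DataFrame and filters rows. All names and values are made up for illustration.

```python
import pandas as pd

# A Series: one-dimensional, with a label for each value.
ages = pd.Series([25, 32, 47], index=["alice", "bob", "carol"], name="age")
print(ages["bob"])  # label-based access, prints 32

# A DataFrame: two-dimensional, with columns of different types.
df = pd.DataFrame({
    "name": ["alice", "bob", "carol"],
    "age": [25, 32, 47],
    "city": ["Paris", "Berlin", "Paris"],
})

over_30 = df[df["age"] > 30]                             # boolean filtering
paris = df.loc[df["city"] == "Paris", ["name", "age"]]   # filter rows, select columns

print(over_30)
print(paris)
print(df.describe())  # summary statistics for the numeric columns
```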

Pandas simplifies working with structured data, making it a popular choice for data manipulation and analysis tasks in Python.
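
The sketch below illustrates three of the capabilities listed above: handling missing values, merging two tables on a common column, and aggregating with `groupby()`. The `sales` and `stores` tables and their values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical tables, invented for this example.
sales = pd.DataFrame({
    "store_id": [1, 2, 3, 1],
    "revenue": [100.0, np.nan, 250.0, 150.0],
})
stores = pd.DataFrame({
    "store_id": [1, 2, 3],
    "region": ["north", "south", "north"],
})

missing_count = sales["revenue"].isna().sum()  # number of missing revenue values
dropped = sales.dropna(subset=["revenue"])     # option 1: drop rows with missing revenue
filled = sales.fillna({"revenue": sales["revenue"].mean()})  # option 2: impute with the mean

# Join on the shared column, then aggregate revenue per region.
merged = filled.merge(stores, on="store_id", how="left")
revenue_by_region = merged.groupby("region")["revenue"].sum()

print(missing_count)
print(revenue_by_region)
```

Whether to drop or impute missing values depends on the dataset; mean imputation is used here only because it is simple to show.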