#### NumPy
NumPy (Numerical Python) is a fundamental library for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.

**Key Features:**

**N-dimensional Array (ndarray):**
The core feature of NumPy is the ndarray object, which is a fast, flexible container for large datasets in Python. It allows you to perform operations on large datasets quickly and efficiently.

**Mathematical Functions:**
NumPy offers a wide range of mathematical functions to perform operations such as statistical calculations, linear algebra, Fourier transforms, and more.

**Broadcasting:**
NumPy supports broadcasting, which allows you to perform arithmetic operations on arrays of different shapes without needing to manually reshape them.

**Integration with C/C++ and Fortran:**
You can integrate NumPy with code written in C, C++, or Fortran to improve performance or reuse existing numerical code.

**Array Indexing:**
NumPy provides powerful tools for slicing, indexing, and iterating over arrays, which makes data manipulation and analysis straightforward and efficient.
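
As a small illustration of the indexing styles mentioned above, the sketch below uses a made-up 3x4 array (the name `grid` and its values are assumptions for the example only) to show basic indexing, slicing, boolean masks, and integer-array indexing.

```python
import numpy as np

# Hypothetical 3x4 array holding the values 0..11
grid = np.arange(12).reshape(3, 4)

second_row = grid[1]            # basic indexing: the row [4 5 6 7]
last_column = grid[:, -1]       # slicing: last column of every row
top_left = grid[:2, :2]         # slicing: the upper-left 2x2 block
evens = grid[grid % 2 == 0]     # boolean mask: matching elements, flattened
picked = grid[[0, 2], [1, 3]]   # integer indexing: elements (0, 1) and (2, 3)

print(second_row, last_column, evens, picked)
```

Slices such as `top_left` are views on the original data, while boolean and integer-array indexing return copies.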

**What is the difference between a NumPy array and a Python list?**

Solution:

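Speed: For numerical work, NumPy arrays are usually much faster than Python lists, because operations run as vectorized, compiled loops over contiguous memory instead of interpreted Python loops.

Functionality: NumPy arrays support element-wise arithmetic and a wide range of mathematical operations that Python lists do not provide directly.

Memory: NumPy arrays typically consume less memory than Python lists, because they store raw fixed-size values in one block, whereas a list stores references to separate Python objects.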

Homogeneity: NumPy arrays are homogeneous, meaning all elements are of the same data type, whereas Python lists can contain elements of different data types.
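
A minimal sketch of these differences, using illustrative variable names that are not part of the original notes. It contrasts how `*` behaves on a list and on an array, and shows the single dtype an array settles on.

```python
import numpy as np

py_list = [1, 2, 3]
np_array = np.array(py_list)

list_times_two = py_list * 2      # list repetition: [1, 2, 3, 1, 2, 3]
array_times_two = np_array * 2    # element-wise arithmetic: [2 4 6]

# Homogeneity: mixing ints and floats upcasts every element to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)                # float64

# Memory: each element occupies a fixed number of bytes
bytes_per_element = np_array.itemsize
total_bytes = np_array.nbytes     # itemsize * number of elements
print(bytes_per_element, total_bytes)
```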


In [3]:
import numpy as np
import pandas as pd

In [35]:
# From a list
array_from_list = np.array([1, 2, 3, 4, 5])
print(array_from_list)

# From a tuple
array_from_tuple = np.array((1, 2, 3, 4, 5))
print(array_from_tuple)

# Using arange
array_with_arange = np.arange(0, 10, 2)
print(array_with_arange)

# Using zeros
array_of_zeros = np.zeros((3, 3))
print(array_of_zeros)

# Using ones
array_of_ones = np.ones((2, 4))
print(array_of_ones)


[1 2 3 4 5]
[1 2 3 4 5]
[0 2 4 6 8]
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]


In [27]:
arr1=np.arange(0,2)
arr2=np.arange(6,10).reshape(2,-1)
arr3=np.array([[0,1],[1,0]])
print(arr2)
print(arr1)

[[6 7]
 [8 9]]
[0 1]


In [28]:
#Q: Add arr1 and arr2
arr1+arr2


array([[ 6,  8],
       [ 8, 10]])

When operating on two arrays, NumPy compares their shapes dimension by dimension. It starts with the trailing (i.e. rightmost) dimension and works its way left. Two dimensions are compatible when

1. they are equal, or
2. one of them is 1.

If these conditions are not met, a ValueError: operands could not be broadcast together exception is raised, indicating that the arrays have incompatible shapes. In the cell above, arr1 has shape (2,) and arr2 has shape (2, 2), so arr1 is added to each row of arr2.
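
The rule can be tried out with a short sketch. The arrays below are invented for the example: one pair has compatible shapes, the other pair triggers the ValueError.

```python
import numpy as np

column = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                    # shape (4,), treated as (1, 4)

# Trailing dims: 1 vs 4 -> compatible; next dims: 3 vs (missing) -> compatible
table = column + row
print(table.shape)                    # (3, 4)

error_message = None
try:
    # Trailing dims: 2 vs 3 -> neither equal nor 1, so this fails
    np.ones((3, 2)) + np.ones(3)
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```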

In [32]:
#Q2: How can you concatenate arr2 and arr3 (horizontally & vertically)?
np.concatenate((arr2,arr3),axis=1)    # horizontally (side by side)
# np.concatenate((arr2,arr3),axis=0)  # vertically (stacked)

array([[6, 7, 0, 1],
       [8, 9, 1, 0]])

In [34]:
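#Q3. Find the indices of the top 2 elements of [4,1,5,7,0]
# argsort sorts ascending, so the last two indices point at the two largest values (largest last)
np.argsort([4,1,5,7,0])[-2:]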

array([2, 3], dtype=int64)
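
Two variations on the same question, as a sketch: reversing the argsort result lists the largest element first, and `np.argpartition` finds the top-k indices without a full sort (the order within the top k is then not guaranteed). The variable names are illustrative.

```python
import numpy as np

values = np.array([4, 1, 5, 7, 0])
k = 2

# Largest first: reverse the ascending argsort, then take the first k
top_k_desc = np.argsort(values)[::-1][:k]
print(top_k_desc)                     # [3 2]

# Partial sort: the last k positions hold the indices of the k largest values
top_k_unordered = np.argpartition(values, -k)[-k:]
print(sorted(top_k_unordered.tolist()))
```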

In [37]:
#Q.How do you reshape a NumPy array?
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(a)

# Reshape 3x3 to 1x9
reshaped_array = a.reshape(1, 9)
print(reshaped_array)



[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[1 2 3 4 5 6 7 8 9]]


In [38]:
#Q.How do you find the maximum and minimum values in a NumPy array?

a = np.array([1, 2, 3, 4, 5])

# Maximum value
max_value = a.max()
print(max_value)  # Output: 5

# Minimum value
min_value = a.min()
print(min_value)  # Output: 1


5
1
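
For a 2-D array the same methods accept an `axis` argument, and `argmax` reports where the maximum sits. A short sketch with a made-up matrix:

```python
import numpy as np

m = np.array([[1, 8, 3],
              [4, 2, 9]])

col_max = m.max(axis=0)     # maximum of each column: [4 8 9]
row_min = m.min(axis=1)     # minimum of each row: [1 2]

flat_pos = m.argmax()       # position of the maximum in the flattened array
row_col = np.unravel_index(flat_pos, m.shape)   # converted back to (row, column)

print(col_max, row_min, flat_pos, row_col)
```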


In [39]:
#Q.How do you use NumPy for linear algebra operations?
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Matrix multiplication
result = np.dot(a, b)
print(result)
# Output:
# [[19 22]
#  [43 50]]

# Transpose of a matrix
transpose = np.transpose(a)
print(transpose)
# Output:
# [[1 3]
#  [2 4]]

# Inverse of a matrix
inverse = np.linalg.inv(a)
print(inverse)
# Output:
# [[-2.   1. ]
#  [ 1.5 -0.5]]

# Eigenvalues and eigenvectors
# (the order of eigenvalues is not guaranteed; column i of eigenvectors pairs with eigenvalues[i])
eigenvalues, eigenvectors = np.linalg.eig(a)
print(eigenvalues)
# Output: [-0.37228132  5.37228132]
print(eigenvectors)
# Output:
# [[-0.82456484 -0.41597356]
#  [ 0.56576746 -0.90937671]]


[[19 22]
 [43 50]]
[[1 3]
 [2 4]]
[[-2.   1. ]
 [ 1.5 -0.5]]
[-0.37228132  5.37228132]
[[-0.82456484 -0.41597356]
 [ 0.56576746 -0.90937671]]


In [40]:
#Q.How do you handle missing data in a NumPy array?
import numpy as np

# Create an array with NaN values
a = np.array([1, 2, np.nan, 4, 5, np.nan])

# Build a boolean mask that is True wherever the value is NaN
nan_indices = np.isnan(a)
print(nan_indices)  # Output: [False False  True False False  True]

# Replace NaN values with a specific value, e.g., 0
a[nan_indices] = 0
print(a)  # Output: [1. 2. 0. 4. 5. 0.]


[False False  True False False  True]
[1. 2. 0. 4. 5. 0.]
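
Replacing NaN with 0 is only one option. As a sketch of two alternatives, the NaN-aware reductions (`np.nanmean`, `np.nansum`) skip missing entries, and `np.where` can fill them with a computed value without modifying the original array.

```python
import numpy as np

a = np.array([1, 2, np.nan, 4, 5, np.nan])

plain_mean = np.mean(a)        # nan: ordinary reductions propagate NaN
mean_ignoring_nan = np.nanmean(a)
sum_ignoring_nan = np.nansum(a)
print(plain_mean, mean_ignoring_nan, sum_ignoring_nan)   # nan 3.0 12.0

# Fill NaN entries with the mean of the valid values, leaving `a` untouched
filled = np.where(np.isnan(a), mean_ignoring_nan, a)
print(filled)                  # [1. 2. 3. 4. 5. 3.]
```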


#### Pandas

What is pandas?

Pandas is an open-source Python library used for data manipulation and analysis.
It provides data structures like DataFrames and Series, which are powerful tools for handling and analyzing data.



**What are the primary data structures in pandas?**

The primary data structures in pandas are:

Series: A one-dimensional labeled array capable of holding any data type.

DataFrame: A two-dimensional labeled data structure with columns of potentially different data types.

In [42]:
#How do you create a DataFrame from a dictionary?

data = {'column1': [1, 2, 3], 'column2': [4, 5, 6]}
df = pd.DataFrame(data)

In [45]:
#How can you read a CSV file into a pandas DataFrame?
#df = pd.read_csv('file_path.csv')
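
Since 'file_path.csv' is only a placeholder, a runnable sketch can feed `read_csv` an in-memory buffer instead. The CSV text and column names below are made up for the example.

```python
import io
import pandas as pd

# Stand-in for a file on disk: read_csv accepts any file-like object
csv_text = "name,score\nAda,90\nGrace,85\n"

df_scores = pd.read_csv(io.StringIO(csv_text))
print(df_scores)
print(df_scores.shape)   # (2, 2)
```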


In [47]:
#How do you select a specific column from a DataFrame?

#column_data = df['column_name']

#How do you filter rows in a DataFrame based on a condition?

#filtered_df = df[df['column_name'] > value]
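
The two templates above can be tried on a small made-up DataFrame (the `city` and `temp` columns are assumptions for illustration):

```python
import pandas as pd

df_demo = pd.DataFrame({"city": ["Oslo", "Lima", "Pune"],
                        "temp": [5, 22, 31]})

temps = df_demo["temp"]                 # single column -> Series
subset = df_demo[["city", "temp"]]      # list of columns -> DataFrame

warm = df_demo[df_demo["temp"] > 20]    # rows where the condition holds

# Combine conditions with & / | and parentheses; .loc can also pick columns
warm_not_pune = df_demo.loc[(df_demo["temp"] > 20) & (df_demo["city"] != "Pune"), "city"]

print(warm)
print(warm_not_pune.tolist())           # ['Lima']
```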

In [48]:
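#How do you handle missing values in a DataFrame?
df.dropna(inplace=True)           # option 1: drop rows that contain missing values
value = 0                         # placeholder fill value; pick one that suits your data
df.fillna(value, inplace=True)    # option 2: replace missing values (a no-op here, since dropna already ran)
#How can you remove duplicate rows in a DataFrame?
df.drop_duplicates(inplace=True)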
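
In practice dropping and filling are alternatives, and both return a new DataFrame unless `inplace=True` is passed. A sketch on invented data, where the fill values are arbitrary choices for the example:

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({"a": [1.0, np.nan, 3.0, 3.0],
                    "b": ["x", "y", None, None]})

missing_per_column = raw.isna().sum()    # count missing values per column

dropped = raw.dropna()                   # keep only rows with no missing values

# Per-column fill values: the column mean for "a", a label for "b"
filled = raw.fillna({"a": raw["a"].mean(), "b": "unknown"})

deduped = filled.drop_duplicates()       # the last two rows are identical

print(missing_per_column)
print(len(dropped), len(filled), len(deduped))   # 1 4 3
```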


In [51]:
#How do you group data in a DataFrame and compute aggregate statistics?
#grouped_df = df.groupby('column_name').agg({'column_to_aggregate': 'mean'})
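
A runnable version of the groupby template, using a hypothetical `sales` table. Named aggregation (`new_name=(column, function)`) is one way to compute several statistics at once.

```python
import pandas as pd

sales = pd.DataFrame({"region": ["N", "S", "N", "S"],
                      "amount": [10, 20, 30, 40]})

summary = sales.groupby("region").agg(
    total=("amount", "sum"),
    average=("amount", "mean"),
)
print(summary)
```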

#What is the difference between apply and map functions in pandas?

#apply is used to apply a function along an axis (rows or columns) of the DataFrame.
#map is used to apply a function element-wise on a Series.
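
A short sketch of the distinction, with a made-up DataFrame: `apply` receives a whole column or row at a time, while `Series.map` transforms one element at a time and also accepts a dictionary.

```python
import pandas as pd

frame = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# apply along axis=0 (default): the function sees each column as a Series
col_ranges = frame.apply(lambda col: col.max() - col.min())

# apply along axis=1: the function sees each row as a Series
row_sums = frame.apply(lambda row: row["x"] + row["y"], axis=1)

# map on a Series: element-wise, with a function or a lookup dict
squared = frame["x"].map(lambda v: v ** 2)
labels = frame["x"].map({1: "one", 2: "two", 3: "three"})

print(col_ranges.tolist(), row_sums.tolist(), squared.tolist(), labels.tolist())
```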


**Advanced Operations**

**How do you merge two DataFrames?**


merged_df = pd.merge(df1, df2, on='common_column')
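
Since df1 and df2 are not defined in these notes, here is a self-contained sketch with two hypothetical tables. It also shows the `how` parameter, which selects the join type (the default is an inner join).

```python
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2, 3],
                          "name": ["Ann", "Bob", "Cy"]})
orders = pd.DataFrame({"customer_id": [1, 1, 4],
                       "total": [50, 25, 70]})

# Inner join: only keys present in both tables (customer 1, twice)
inner = pd.merge(customers, orders, on="customer_id")

# Left join: every customer is kept; missing orders become NaN
left = pd.merge(customers, orders, on="customer_id", how="left")

print(inner)
print(left)
```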

In [52]:
#What is a pivot table in pandas and how do you create one?
#pivot_table = df.pivot_table(values='value_column', index='index_column', columns='column_column', aggfunc='mean')
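
A pivot table summarizes one column across two others: one supplies the row labels, the other the column labels. The sketch below applies the template to invented records; combinations with no data come out as NaN.

```python
import pandas as pd

records = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb"],
    "product": ["A", "B", "A", "A"],
    "units": [3, 5, 2, 4],
})

pivot = records.pivot_table(values="units", index="month",
                            columns="product", aggfunc="mean")
print(pivot)
```

Here the two Feb rows for product A are averaged into a single cell, and the Feb/B cell is NaN because no such record exists.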



Explain the concat function and how it differs from merge.

concat is used to concatenate DataFrames along a particular axis (rows or columns).

merge is used for SQL-style joins of DataFrames based on common columns or indices.
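
To make the contrast concrete, a sketch of `concat` along both axes with two small made-up frames. Unlike `merge`, it does not match rows on key columns; it stacks along one axis and aligns on the labels of the other.

```python
import pandas as pd

q1 = pd.DataFrame({"id": [1, 2], "value": [10, 20]})
q2 = pd.DataFrame({"id": [3, 4], "value": [30, 40]})

# axis=0 (default): stack rows; ignore_index renumbers the result 0..n-1
stacked = pd.concat([q1, q2], ignore_index=True)

# axis=1: place frames side by side, aligned on the row index
side_by_side = pd.concat([q1, q2], axis=1)

print(stacked.shape, side_by_side.shape)   # (4, 2) (2, 4)
```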


How would you handle a large dataset that doesn't fit into memory?

Use the chunksize parameter in read_csv to load the data in chunks.

Use libraries like Dask that extend pandas functionalities for larger-than-memory computations.
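
A minimal sketch of the chunked approach, assuming the computation can be expressed as a running aggregate. An in-memory buffer stands in for a large file, and the chunk size of 4 is arbitrary.

```python
import io
import pandas as pd

# Stand-in for a large CSV file: one column holding the values 0..9
csv_text = "value\n" + "\n".join(str(i) for i in range(10))

running_total = 0
rows_seen = 0

# With chunksize set, read_csv yields DataFrames of at most that many rows
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    running_total += chunk["value"].sum()
    rows_seen += len(chunk)

print(rows_seen, running_total)   # 10 45
```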

How do you check for the data types of columns in a DataFrame?


df.dtypes
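
For example, on a made-up DataFrame, `dtypes` reveals a column of numbers stored as text, which `astype` can then convert:

```python
import pandas as pd

typed = pd.DataFrame({"n": [1, 2], "x": [1.5, 2.5], "s": ["1", "2"]})
print(typed.dtypes)            # one dtype per column

# "s" holds digits as strings; convert it to integers
typed["s_as_int"] = typed["s"].astype(int)
print(typed["s_as_int"].sum())  # 3
```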

What are some best practices when working with pandas?

Understand the structure of your data.

Avoid using loops for row-wise operations.

Make use of the built-in functions and methods for efficient data manipulation.

Clean and preprocess your data before analysis.
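
To illustrate the advice about loops, the sketch below computes the same column twice on invented data: once row by row with `iterrows`, and once as a single vectorized expression. Both give the same result, but the vectorized form is shorter and generally much faster on large frames.

```python
import pandas as pd

prices = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Row-wise loop: works, but runs in interpreted Python for every row
loop_totals = []
for _, row in prices.iterrows():
    loop_totals.append(row["price"] * row["qty"])

# Vectorized: one expression over whole columns
prices["total"] = prices["price"] * prices["qty"]

print(prices["total"].tolist())   # [10.0, 40.0, 90.0]
```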
