
#### **1. What is NumPy, and why is it widely used in Python?**  
NumPy (Numerical Python) is a powerful Python library used for numerical and scientific computing. It provides support for large multidimensional arrays and matrices, along with a collection of mathematical functions to operate on these data structures efficiently. NumPy is widely used due to its speed (achieved through its underlying C implementation), support for broadcasting, vectorized operations, and its ability to interface with other libraries like SciPy, Pandas, and machine learning tools.




#### **2. How does broadcasting work in NumPy?**  
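Broadcasting lets NumPy apply arithmetic to arrays of different shapes without copying data. NumPy compares the two shapes dimension by dimension, starting from the trailing (rightmost) one. Two dimensions are compatible when they are equal or when one of them is 1, and a missing leading dimension counts as 1. Each size-1 dimension is then virtually "stretched" to match the other array. If any pair of dimensions is incompatible, NumPy raises a `ValueError`. For example, adding a scalar to a 2D array broadcasts the scalar value across every element of the array.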
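A minimal sketch of the shape rules, using small made-up arrays: a scalar, a row vector of shape `(3,)`, and a column vector of shape `(3, 1)` are each broadcast against a `(3, 3)` array, and an incompatible shape raises an error.

```python
import numpy as np

# A 2D array of shape (3, 3): [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
matrix = np.arange(9).reshape(3, 3)

# Scalar: broadcast to every element
plus_ten = matrix + 10

# Row vector of shape (3,): aligned with the last axis and reused for every row
row = np.array([100, 200, 300])
plus_row = matrix + row

# Column vector of shape (3, 1): its size-1 axis is stretched across the columns
col = np.array([[1], [2], [3]])
plus_col = matrix + col

print(plus_ten)
print(plus_row)
print(plus_col)

# Shapes (3, 3) and (2,) are incompatible in the trailing dimension
try:
    matrix + np.array([1, 2])
except ValueError as err:
    print("Broadcast error:", err)
```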



#### **3. What is a Pandas DataFrame?**  
A Pandas DataFrame is a two-dimensional, size-mutable, and heterogeneous tabular data structure with labeled axes (rows and columns). It is similar to a spreadsheet or SQL table and is one of the core data structures in Pandas for data manipulation and analysis.
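A short illustration with hypothetical data: each column has its own dtype (heterogeneous), rows and columns carry labels, and a column can be added after creation (size-mutable).

```python
import pandas as pd

# Hypothetical records: text, integer, and float columns in one table
df = pd.DataFrame(
    {"name": ["Ana", "Ben", "Cai"], "age": [28, 35, 42], "score": [88.5, 92.0, 79.5]},
    index=["r1", "r2", "r3"],  # row labels
)

print(df.dtypes)                 # one dtype per column
print(df.loc["r2", "age"])       # label-based lookup by row and column

# Size-mutable: add a derived column after creation
df["passed"] = df["score"] >= 80
print(df.shape)
```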



#### **4. Explain the use of the groupby() method in Pandas.**  
The `groupby()` method implements the split-apply-combine pattern: it splits the data into groups based on one or more keys (usually column values), applies a function to each group independently, and combines the results. It is commonly used for aggregation (e.g., `sum()`, `mean()`), transformation (`transform()`), and filtering (`filter()`).
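A sketch of the three uses on a small made-up sales table: aggregation returns one value per group, transformation returns a result aligned with the original rows, and filtering keeps only the rows of groups that pass a condition (the threshold of 240 is arbitrary).

```python
import pandas as pd

# Hypothetical sales data
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West", "North"],
    "amount": [100, 200, 150, 20, 300],
})

grouped = sales.groupby("region")["amount"]

# Aggregation: one total per region (East 250, North 300, West 220)
totals = grouped.sum()

# Transformation: each row's share of its region's total, aligned to the original index
share = sales["amount"] / grouped.transform("sum")

# Filtering: keep rows belonging to regions whose total exceeds 240
big_regions = sales.groupby("region").filter(lambda g: g["amount"].sum() > 240)

print(totals)
print(share)
print(big_regions)
```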



#### **5. Why is Seaborn preferred for statistical visualizations?**  
Seaborn is preferred for statistical visualizations because it provides a high-level interface for drawing attractive and informative statistical graphics. It integrates closely with Pandas and offers built-in themes, color palettes, and functions for common statistical plots like heatmaps, boxplots, violin plots, and more.



#### **6. What are the differences between NumPy arrays and Python lists?**  
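 **Speed:** NumPy arrays are faster for numerical work because operations run in compiled C loops over homogeneous data instead of Python-level iteration.

 **Data Types:** NumPy arrays have a single fixed data type (`dtype`), while Python lists can contain elements of different types.

 **Memory Efficiency:** NumPy arrays store raw values in one contiguous buffer, whereas a list stores pointers to separate Python objects, so arrays typically use much less memory.

 **Functionalities:** NumPy arrays support element-wise arithmetic, broadcasting, and linear algebra directly; with lists, `+` concatenates and `*` repeats instead of computing element-wise.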
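A quick comparison sketch: the same `+` operator concatenates lists but adds arrays element-wise, an array upcasts mixed numbers to one dtype, and `nbytes` reports the size of the raw data buffer (here 1,000 explicit 64-bit integers).

```python
import numpy as np

py_list = [1, 2, 3]
arr = np.array([1, 2, 3])

# Same operator, different meaning
print(py_list + py_list)   # [1, 2, 3, 1, 2, 3]  (concatenation)
print(arr + arr)           # [2 4 6]             (element-wise addition)
print(arr @ arr)           # 14 (dot product; lists do not support @)

# Lists can mix types; an array converts everything to one dtype
mixed_list = [1, "two", 3.0]
upcast = np.array([1, 2.5, 3])
print(mixed_list)
print(upcast.dtype)        # float64

# The array's data lives in a single typed buffer: 1,000 x 8 bytes
big_arr = np.arange(1_000, dtype=np.int64)
print(big_arr.nbytes)      # 8000
```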



#### **7. What is a heatmap, and when should it be used?**  
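A heatmap is a graphical representation of a matrix of values in which each cell is colored according to its value on a color scale. Use it when the data naturally forms a grid (two categorical or binned axes) and the goal is to spot patterns at a glance, for example a correlation matrix, counts or frequencies across two categories, or activity by day and hour.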



#### **8. What does the term “vectorized operation” mean in NumPy?**  
Vectorized operations in NumPy apply an operation to whole arrays at once, element by element, without an explicit Python loop. They are faster because the loop runs inside NumPy's compiled C code over a typed buffer, avoiding the per-element overhead of the Python interpreter.
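A minimal comparison: squaring five numbers with an explicit Python loop and with a single vectorized expression gives identical results, but the vectorized version delegates the loop to NumPy.

```python
import numpy as np

values = np.arange(1, 6)

# Explicit Python loop: one interpreter step per element
squares_loop = np.empty_like(values)
for i in range(len(values)):
    squares_loop[i] = values[i] ** 2

# Vectorized: a single expression; the loop runs inside NumPy
squares_vec = values ** 2

print(squares_vec)
print(np.array_equal(squares_loop, squares_vec))  # True
```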



#### **9. How does Matplotlib differ from Plotly?**  
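 **Matplotlib:** A general-purpose, highly customizable plotting library aimed mainly at static figures (PNG, PDF, SVG). Interactivity is limited and depends on the backend: a GUI window offers basic zoom and pan, while inline notebook output is a static image.

 **Plotly:** A library designed for interactive, web-based plots. Figures render in the browser with built-in hover tooltips, zooming, panning, and export to PNG.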



#### **10. What is the significance of hierarchical indexing in Pandas?**  
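Hierarchical indexing (a `MultiIndex`) lets a Pandas axis have several index levels, such as (city, year). It represents higher-dimensional data within a 2D DataFrame and allows selecting, aggregating, or reshaping by level, e.g., with `.loc`, `groupby(level=...)`, `stack()`, and `unstack()`. It is particularly useful for grouped or panel data, such as time series recorded for several entities.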
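A small sketch with made-up visitor counts: two columns become a two-level row index, rows are selected by the outer key or by a full `(city, year)` tuple, and `unstack()` moves the inner level into columns.

```python
import pandas as pd

# Hypothetical visitor counts per city and year
df = pd.DataFrame({
    "city": ["Paris", "Paris", "Rome", "Rome"],
    "year": [2022, 2023, 2022, 2023],
    "visitors": [10, 12, 8, 9],
})

# Two columns become a two-level row index
indexed = df.set_index(["city", "year"])
print(indexed.index.nlevels)  # 2

# Select all rows for one outer key, then one specific (city, year) pair
print(indexed.loc["Rome"])
print(indexed.loc[("Paris", 2023), "visitors"])  # 12

# unstack() pivots the inner level (year) into columns
wide = indexed["visitors"].unstack("year")
print(wide)
```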



#### **11. What is the role of Seaborn’s pairplot() function?**  
The `pairplot()` function in Seaborn draws a grid of plots showing the pairwise relationships between the numeric columns of a DataFrame: scatter plots off the diagonal, and each variable's own distribution on the diagonal (a histogram by default, or a KDE when `hue` is set). It is useful for a quick first look at patterns and correlations in multivariate data.



#### **12. What is the purpose of the describe() function in Pandas?**  
The `describe()` function provides a statistical summary of numerical columns in a DataFrame, including count, mean, standard deviation, minimum, maximum, and quartiles.
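A minimal example on a made-up DataFrame: by default only the numeric column is summarized, and the percentile rows are labeled `25%`, `50%` (the median), and `75%`. Passing `include="all"` also summarizes non-numeric columns.

```python
import pandas as pd

# Hypothetical data with one numeric and one text column
df = pd.DataFrame({"price": [10.0, 20.0, 30.0, 40.0], "label": ["a", "b", "c", "d"]})

summary = df.describe()   # numeric columns only by default
print(summary)
print(summary.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

# include="all" adds non-numeric statistics (count, unique, top, freq)
print(df.describe(include="all"))
```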



#### **13. Why is handling missing data important in Pandas?**  
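Missing values (`NaN`, `None`, `NaT`) can skew statistics, change column types (for example, an integer column containing `NaN` is upcast to float), and cause errors in libraries that do not accept them, so handling them is essential for data integrity and reliable analysis. Pandas provides methods to detect (`isna()`), fill (`fillna()`), and drop (`dropna()`) missing values.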
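A sketch of the detect, fill, and drop steps on a small made-up table. Filling with the column mean is just one illustrative strategy; the right choice depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with gaps in both columns
df = pd.DataFrame({"temp": [21.0, np.nan, 23.0, np.nan], "city": ["A", "B", None, "D"]})

# Detect: count missing values per column (temp: 2, city: 1)
print(df.isna().sum())

# Fill: replace missing temperatures with the column mean (22.0)
filled = df.assign(temp=df["temp"].fillna(df["temp"].mean()))

# Drop: remove every row that contains any missing value
dropped = df.dropna()

print(filled)
print(dropped)
```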



#### **14. What are the benefits of using Plotly for data visualization?**  
Plotly is highly interactive, supports 3D visualizations, and integrates well with web-based frameworks. It offers pre-built themes and interactive features like tooltips, zooming, and exporting visualizations.



#### **15. How does NumPy handle multidimensional arrays?**  
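NumPy's `ndarray` supports any number of dimensions, storing the data in a single buffer described by a `shape` and strides. Indexing and slicing work per axis, reductions such as `sum()` take an `axis` argument, and broadcasting and reshaping work the same way regardless of the number of dimensions.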
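A short sketch with a 3D array built from 24 consecutive integers, showing per-axis slicing, a reduction along one axis, and reshaping back to 1D.

```python
import numpy as np

# 24 values arranged as 2 blocks x 3 rows x 4 columns
cube = np.arange(24).reshape(2, 3, 4)
print(cube.ndim, cube.shape)   # 3 (2, 3, 4)

# Slicing per axis: column 1 of every row in the first block
print(cube[0, :, 1])           # [1 5 9]

# Summing over axis 0 collapses the two blocks into one (3, 4) array
print(cube.sum(axis=0).shape)  # (3, 4)

# reshape returns a view of the same data when possible
flat = cube.reshape(-1)
print(flat.shape)              # (24,)
```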



#### **16. What is the role of Bokeh in data visualization?**  
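Bokeh is a Python library for creating interactive visualizations that render in the web browser through its JavaScript library, BokehJS. Plots can be saved as standalone HTML files, shown in notebooks, or served as interactive apps with the Bokeh server; Bokeh handles large and streaming datasets and can be embedded in web applications built with frameworks such as Flask or Django.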



#### **17. Explain the difference between apply() and map() in Pandas.**  
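 **apply():** On a DataFrame, applies a function along an axis, passing each column (`axis=0`, the default) or each row (`axis=1`) to the function. On a Series, it calls the function on each element.  

 **map():** A Series method for element-wise transformation; it accepts a function, a dict, or another Series used as a lookup table (values with no match become `NaN`). Since pandas 2.1, `DataFrame.map()` also applies a function element-wise to every cell, replacing `applymap()`.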
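A minimal contrast on made-up data: `DataFrame.apply` hands whole columns or rows to the function, while `Series.map` transforms individual values, here through a dict lookup and a lambda.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# DataFrame.apply: the function receives a whole column (default) or row (axis=1)
col_ranges = df.apply(lambda col: col.max() - col.min())       # a: 2, b: 20
row_totals = df.apply(lambda row: row["a"] + row["b"], axis=1)  # 11, 22, 33

# Series.map: element-wise, with a dict used as a lookup table or with a function
codes = pd.Series(["x", "y", "x"])
mapped = codes.map({"x": "first", "y": "second"})
doubled = df["a"].map(lambda v: v * 2)

print(col_ranges)
print(row_totals)
print(mapped)
print(doubled)
```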



#### **18. What are some advanced features of NumPy?**  
- Broadcasting  
- Vectorized operations  
- Masked arrays  
- Linear algebra functions (`numpy.linalg`, e.g., `solve`, `inv`, `eig`)  
- Random number generation (`numpy.random`)  
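A compact sketch of three of these features: a masked array that ignores a sentinel value (the `-999` marker is a made-up convention), solving a 2x2 linear system with `numpy.linalg.solve`, and seeded random numbers from the `Generator` API.

```python
import numpy as np

# Masked array: entries equal to the sentinel -999 are excluded from reductions
readings = np.ma.masked_equal([4.0, -999.0, 6.0], -999.0)
print(readings.mean())   # 5.0

# Linear algebra: solve 2x + y = 5 and x + 3y = 10 (solution x = 1, y = 3)
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
solution = np.linalg.solve(A, b)
print(solution)

# Random numbers: a seeded Generator gives reproducible draws
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=3)
print(samples.shape)     # (3,)
```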



#### **19. How does Pandas simplify time series analysis?**  
Pandas offers powerful time series tools like date range generation, resampling, rolling, and shifting. It also provides support for handling datetime objects, time zones, and time-based indexing.
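A sketch of these tools on a hypothetical daily series covering January and February 2024 (values 0 to 59): date range generation, monthly resampling, a 7-day rolling mean, shifting, partial-string date selection, and time-zone localization.

```python
import pandas as pd

# 60 daily timestamps: 2024-01-01 through 2024-02-29 (2024 is a leap year)
idx = pd.date_range("2024-01-01", periods=60, freq="D")
series = pd.Series(range(60), index=idx)

monthly = series.resample("MS").mean()     # month-start bins: Jan 15.0, Feb 45.0
rolling = series.rolling(window=7).mean()  # 7-day moving average (first 6 are NaN)
lagged = series.shift(1)                   # previous day's value
feb = series.loc["2024-02"]                # all rows in February via a partial date string
utc = series.tz_localize("UTC")            # attach a time zone to the index

print(monthly)
print(rolling.head(8))
print(len(feb))                            # 29
```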



#### **20. What is the role of a pivot table in Pandas?**  
A pivot table is used to summarize, aggregate, and reshape data within a DataFrame. It allows for generating reports by grouping data along rows and columns.
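A minimal report built with `pd.pivot_table` on made-up sales rows: regions become rows, products become columns, each cell holds total units, and `margins=True` adds "All" totals.

```python
import pandas as pd

# Hypothetical sales records
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "East"],
    "product": ["A", "B", "A", "B", "A"],
    "units": [5, 3, 4, 6, 2],
})

# Rows = region, columns = product, cells = summed units, plus row/column totals
report = pd.pivot_table(
    sales, index="region", columns="product", values="units",
    aggfunc="sum", margins=True,
)
print(report)
```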



#### **21. Why is NumPy’s array slicing faster than Python’s list slicing?**  
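NumPy stores an array's elements in a single typed memory buffer, so basic slicing (`start:stop:step`) returns a *view*: a new array object that points into the same buffer with an adjusted shape and strides. No elements are copied, so the slice takes roughly constant time regardless of its length. Slicing a Python list, in contrast, builds a new list and copies a reference to every selected element. A side effect is that modifying a NumPy slice modifies the original array; call `.copy()` when an independent array is needed.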
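A short demonstration of the view-versus-copy difference: writing through a NumPy slice changes the original array, writing to a list slice does not, and `.copy()` produces an independent array.

```python
import numpy as np

arr = np.arange(10)
view = arr[2:5]            # basic slice: a view, no data copied
view[0] = 99               # writing through the view changes the original
print(arr[2])              # 99
print(np.shares_memory(arr, view))         # True

lst = list(range(10))
lst_slice = lst[2:5]       # list slice: a new list (shallow copy)
lst_slice[0] = 99
print(lst[2])              # 2, original unchanged

independent = arr[2:5].copy()              # explicit copy
print(np.shares_memory(arr, independent))  # False
```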



#### **22. What are some common use cases for Seaborn?**  
- Visualizing statistical data  
- Correlation analysis (e.g., heatmaps)  
- Comparing distributions (e.g., boxplots, violin plots)  
- Pairwise relationships (e.g., pairplots)  

# ***PRACTICAL QUESTIONS***

Q1. How do you create a 2D NumPy array and calculate the sum of each row?

```python
import numpy as np

# Create a 2D NumPy array
array_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Calculate the sum of each row
row_sums = np.sum(array_2d, axis=1)
print("Row sums:", row_sums)
```


Q2. Write a Pandas script to find the mean of a specific column in a DataFrame.

```python
import pandas as pd

# Create a DataFrame
data = {'Column1': [10, 20, 30, 40], 'Column2': [15, 25, 35, 45]}
df = pd.DataFrame(data)

# Find the mean of a specific column
column_mean = df['Column1'].mean()
print("Mean of Column1:", column_mean)
```


Q3. Create a scatter plot using Matplotlib.

```python
import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 8, 7]

# Create a scatter plot
plt.scatter(x, y, color='blue')
plt.title("Scatter Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
```


Q4. How do you calculate the correlation matrix using Seaborn and visualize it with a heatmap?

```python
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Create a DataFrame
data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]}
df = pd.DataFrame(data)

# Calculate the correlation matrix (computed by Pandas; Seaborn only draws it)
correlation_matrix = df.corr()

# Visualize it with a heatmap
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title("Correlation Matrix Heatmap")
plt.show()
```


Q5. Generate a bar plot using Plotly.

```python
import plotly.express as px

# Sample data
data = {'Categories': ['A', 'B', 'C'], 'Values': [10, 20, 30]}
fig = px.bar(data, x='Categories', y='Values', title="Bar Plot")
fig.show()
```


Q6. Create a DataFrame and add a new column based on an existing column.

```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'Numbers': [1, 2, 3, 4]})

# Add a new column based on an existing column
df['Squared'] = df['Numbers'] ** 2
print(df)
```


Q7. Write a program to perform element-wise multiplication of two NumPy arrays.

```python
import numpy as np

# Create two NumPy arrays
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

# Perform element-wise multiplication
result = array1 * array2
print("Element-wise multiplication:", result)
```


Q8. Create a line plot with multiple lines using Matplotlib.

```python
import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]
y2 = [1, 3, 5, 7, 9]

# Create a line plot
plt.plot(x, y1, label="Line 1", color="blue")
plt.plot(x, y2, label="Line 2", color="red")
plt.title("Line Plot with Multiple Lines")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()
```


Q9. Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold.

```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})

# Filter rows where the column value is greater than 25
filtered_df = df[df['Values'] > 25]
print(filtered_df)
```


Q10. Create a histogram using Seaborn to visualize a distribution.

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Sample data
data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]

# Create a histogram
sns.histplot(data, bins=5, kde=True)
plt.title("Histogram")
plt.show()
```


Q11. Perform matrix multiplication using NumPy.

```python
import numpy as np

# Define two matrices
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

# Perform matrix multiplication
result = np.dot(matrix1, matrix2)
print("Matrix Multiplication Result:\n", result)
```


Q12. Use Pandas to load a CSV file and display its first 5 rows.

```python
import pandas as pd

# Load a CSV file (assumes 'sample.csv' exists in the current working directory)
df = pd.read_csv('sample.csv')

# Display the first 5 rows
print(df.head())
```


Q13. Create a 3D scatter plot using Plotly.

```python
import plotly.express as px

# Sample data
data = {'X': [1, 2, 3], 'Y': [4, 5, 6], 'Z': [7, 8, 9]}
fig = px.scatter_3d(data, x='X', y='Y', z='Z', title="3D Scatter Plot")
fig.show()
```
