# 1. What is NumPy, and why is it widely used in Python?
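- NumPy (Numerical Python) is the core library for numerical computing in Python. It provides the ndarray, a typed multi-dimensional array whose operations run in compiled C code, along with broadcasting and a large set of mathematical functions. It is widely used because most of the scientific Python stack, including Pandas, builds on it.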

# 2. How does broadcasting work in NumPy?
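- Broadcasting lets NumPy operate on arrays of different shapes without copying data. Shapes are compared from the trailing dimension backwards; two dimensions are compatible when they are equal or when one of them is 1. A dimension of size 1 is stretched virtually to match the other, and incompatible shapes raise a ValueError.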
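
As a minimal sketch of the idea, the arrays `col` and `row` below (illustrative names, not from the original notes) have different shapes that NumPy can still combine, while the last pair of shapes cannot be matched:

```python
import numpy as np

# Shapes are compared right to left: (3, 1) against (4,).
col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])        # shape (4,)

# The size-1 axis of `col` and the missing axis of `row` are stretched,
# so the result has shape (3, 4) without any explicit tiling.
table = col + row
print(table.shape)

# Trailing dimensions 3 and 4 are neither equal nor 1, so this fails.
try:
    np.ones((2, 3)) + np.ones(4)
except ValueError as exc:
    print("Cannot broadcast:", exc)
```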

# 3. What is a Pandas DataFrame?
- A DataFrame is a 2D labeled data structure in Pandas, similar to a table in SQL or Excel, with columns of potentially different types.

# 4. Explain the use of the groupby() method in Pandas.
- groupby() is used to split the data into groups, apply a function (like sum, mean), and combine the results.
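
The split-apply-combine pattern can be sketched on a small, made-up `sales` table (the column names and values are assumptions for illustration):

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "amount": [100, 150, 200, 50],
})

# Split by region, apply sum to each group, combine into one Series.
totals = sales.groupby("region")["amount"].sum()

# Several aggregations can be applied in one pass with agg().
summary = sales.groupby("region")["amount"].agg(["sum", "mean"])

print(totals)
print(summary)
```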

# 5. Why is Seaborn preferred for statistical visualizations?
- Seaborn is built on Matplotlib and provides a high-level interface for drawing attractive and informative statistical graphics.

# 6. What are the differences between NumPy arrays and Python lists?

| **Feature**                | **NumPy Array**                                               | **Python List**                                   |
|---------------------------|---------------------------------------------------------------|--------------------------------------------------|
| **Data Type**             | Homogeneous (all elements are of same type)                   | Heterogeneous (can hold different data types)    |
| **Performance**           | Faster due to optimized C backend                             | Slower due to interpreted nature                 |
| **Memory Efficiency**     | Uses less memory                                              | More memory usage                                |
| **Operations**            | Supports vectorized operations                                | Needs loops for element-wise operations          |
| **Functionality**         | Has built-in mathematical and statistical functions           | Lacks advanced mathematical support              |
| **Multidimensional Support** | Native support for multidimensional arrays                | Limited support (via nested lists)               |
| **Broadcasting**          | Supported                                                     | Not supported                                    |
| **Indexing and Slicing**  | Advanced slicing and masking                                  | Basic slicing only                               |

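A few rows of the table can be checked directly. This is a small sketch with illustrative variable names; it only shows the behavioral differences, not the performance ones:

```python
import numpy as np

py_list = [1, 2, 3]
np_array = np.array([1, 2, 3])

# Operations: `*` repeats a list but multiplies an array element-wise.
list_times_two = py_list * 2      # [1, 2, 3, 1, 2, 3]
array_times_two = np_array * 2    # array([2, 4, 6])

# Data type: mixed numeric input is upcast to one common dtype.
mixed = np.array([1, 2.5])
print(mixed.dtype)

# Indexing: a boolean mask selects elements, which lists do not support.
greater_than_one = np_array[np_array > 1]
print(list_times_two, array_times_two, greater_than_one)
```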

# 7. What is a heatmap, and when should it be used?
- A heatmap is a data visualization showing values in a matrix as colors. Use it to show correlations or intensities.

# 8. What does “vectorized operation” mean in NumPy?
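- It refers to operations written against whole arrays rather than individual elements. The element-wise loop still happens, but it runs inside NumPy's compiled code instead of the Python interpreter, which makes it much faster and the code shorter.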
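
A minimal sketch comparing an explicit Python loop with the equivalent vectorized expression (the array `values` is just an example):

```python
import numpy as np

values = np.arange(5)   # array([0, 1, 2, 3, 4])

# Explicit loop: each element passes through the Python interpreter.
looped = np.array([v * v + 1 for v in values])

# Vectorized: one expression over the whole array, looped in compiled code.
vectorized = values ** 2 + 1

print(np.array_equal(looped, vectorized))
```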

# 9. How does Matplotlib differ from Plotly?

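- Matplotlib: produces mainly static figures (with 3D support through its mplot3d toolkit), offers fine-grained control over every plot element, and is well suited to publication-quality images.

- Plotly: produces interactive, browser-based plots with built-in zoom, hover, 3D, and animation capabilities.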

# 10. What is the significance of hierarchical indexing in Pandas?
- Hierarchical indexing allows you to work with multiple index levels (multi-index) in rows and/or columns, enabling more complex data representations.
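
As a small sketch, the Series below uses a two-level index; the level names `year` and `quarter` and the values are made up for illustration:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2023", "Q1"), ("2023", "Q2"), ("2024", "Q1"), ("2024", "Q2")],
    names=["year", "quarter"],
)
revenue = pd.Series([10, 12, 15, 18], index=index)

# Select everything under one outer label.
year_2024 = revenue.loc["2024"]

# Select a single value with a full key.
q2_2023 = revenue.loc[("2023", "Q2")]

# Move the inner level into columns to get a year x quarter table.
wide = revenue.unstack("quarter")
print(wide)
```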

# 11. What is the role of Seaborn’s pairplot() function?
- pairplot() creates a grid of scatter plots showing the relationship between every pair of numeric features in a dataset, with each feature's own distribution on the diagonal. It is useful for exploratory data analysis.

# 12. What is the purpose of the describe() function in Pandas?
- By default it provides summary statistics for the numerical columns: count, mean, std, min, max, and the 25%, 50% (median), and 75% quartiles.
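
A quick sketch on a made-up DataFrame with one numeric and one text column shows which statistics come back:

```python
import pandas as pd

df = pd.DataFrame({
    "score": [70, 80, 90, 100],
    "name": ["Ann", "Bob", "Cy", "Dee"],
})

# With default arguments only the numeric column is summarized.
stats = df.describe()
print(stats)

# Individual statistics are addressed by row label.
print(stats.loc["mean", "score"], stats.loc["50%", "score"])
```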

# 13. Why is handling missing data important in Pandas?
- Missing data can lead to incorrect results. Pandas provides functions to detect, remove, or fill missing values to ensure accurate analysis.
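
The detect, remove, and fill steps can be sketched as follows; the DataFrame and the fill values (column mean, the string "unknown") are assumptions chosen for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temp": [21.0, np.nan, 23.0],
    "city": ["A", "B", None],
})

# Detect: count missing values per column.
missing_counts = df.isna().sum()

# Remove: drop every row that has at least one missing value.
dropped = df.dropna()

# Fill: use a per-column replacement (mean() skips NaN by default).
filled = df.fillna({"temp": df["temp"].mean(), "city": "unknown"})

print(missing_counts)
print(filled)
```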

# 14. What are the benefits of using Plotly for data visualization?

- Interactivity (zoom, hover)

- High-quality visuals

- Support for 3D plots and dashboards

- Easy integration with web apps (e.g., Dash)

# 15. How does NumPy handle multidimensional arrays?
- NumPy uses the ndarray structure to handle arrays of any shape or dimension efficiently, supporting operations like slicing, reshaping, and broadcasting.
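
A short sketch with a 3D array (the name `cube` and its shape are arbitrary) illustrates slicing, reshaping, and reducing along an axis:

```python
import numpy as np

# 24 consecutive integers arranged as 2 blocks of 3 rows and 4 columns.
cube = np.arange(24).reshape(2, 3, 4)
print(cube.ndim, cube.shape)

# Slicing: take the last column of every row in every block.
last_column = cube[:, :, -1]      # shape (2, 3)

# Reshaping: same data viewed as a 6 x 4 matrix.
flat = cube.reshape(6, 4)

# Reducing: sum across the first axis collapses the two blocks.
block_sum = cube.sum(axis=0)      # shape (3, 4)
```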

# 16. What is the role of Bokeh in data visualization?
- Bokeh is used to create interactive, scalable visualizations that render in web browsers. It is useful for dashboards and for large or streaming datasets.

# 17. Explain the difference between apply() and map() in Pandas.

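- map() is a Series method for element-wise transformation. It accepts a function, a dict, or another Series used as a lookup.

- apply() works on a Series (element-wise) or on a DataFrame, where it passes each column (axis=0) or each row (axis=1) to the function.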
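
The difference can be sketched on a small, made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# map(): transforms one Series element by element.
squared = df["a"].map(lambda x: x ** 2)

# apply() with the default axis=0: the function receives each column.
col_ranges = df.apply(lambda col: col.max() - col.min())

# apply() with axis=1: the function receives each row.
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

print(squared.tolist(), col_ranges.to_dict(), row_sums.tolist())
```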

# 18. What are some advanced features of NumPy?

- Broadcasting

- Universal functions (ufuncs)

- Structured arrays

- Memory-mapped files

- Integration with C/C++
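
Two of the listed features, ufuncs and structured arrays, can be sketched briefly. The field names and records below are invented for illustration:

```python
import numpy as np

# ufuncs are objects with their own methods, such as reduce and outer.
total = np.add.reduce(np.array([1, 2, 3, 4]))
outer = np.multiply.outer([1, 2], [10, 20])

# A structured array stores records with named, typed fields.
people = np.array(
    [("Ana", 30), ("Ben", 25)],
    dtype=[("name", "U10"), ("age", "i4")],
)
mean_age = people["age"].mean()
older = people[people["age"] > 26]["name"]

print(total, outer.tolist(), mean_age, older.tolist())
```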

# 19. How does Pandas simplify time series analysis?
- Pandas offers built-in time/date indexing, resampling, shifting, frequency conversion, and rolling window statistics.
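
A minimal sketch of resampling, shifting, and rolling statistics on a six-day series (the dates and values are arbitrary):

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Resampling: aggregate daily values into two-day bins.
two_day = ts.resample("2D").sum()

# Shifting: move values one step forward; the first slot becomes NaN.
shifted = ts.shift(1)

# Rolling window: mean over the current and two previous observations.
rolling = ts.rolling(window=3).mean()

print(two_day)
print(rolling)
```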

# 20. What is the role of a pivot table in Pandas?
- A pivot table summarizes data, reshaping it based on categories and applying aggregation functions like mean or sum.
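
As a sketch, the made-up `sales` table below is reshaped so that regions become rows and products become columns, with duplicate combinations summed:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "North", "North", "South", "South"],
    "product": ["A", "A", "B", "A", "B"],
    "amount": [100, 50, 200, 150, 50],
})

# Rows come from `region`, columns from `product`; the two North/A
# entries are combined by the aggregation function.
pivot = sales.pivot_table(
    index="region", columns="product", values="amount", aggfunc="sum"
)
print(pivot)
```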

# 21. Why is NumPy’s array slicing faster than Python’s list slicing?
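- NumPy arrays store their elements in one contiguous, typed memory block, and basic slicing returns a view onto that block without copying any elements. Slicing a Python list builds a new list and copies a reference to each element, so its cost grows with the length of the slice.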
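
The view-versus-copy behavior behind this difference can be observed directly. This sketch demonstrates the semantics only and does not measure timing:

```python
import numpy as np

arr = np.arange(10)
arr_slice = arr[2:5]          # a view: no elements are copied
arr_slice[0] = 99             # writes through to the original array
shares = np.shares_memory(arr, arr_slice)

lst = list(range(10))
lst_slice = lst[2:5]          # a new list: references are copied
lst_slice[0] = 99             # the original list is unaffected

print(arr[2], lst[2], shares)
```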

# 22. What are some common use cases for Seaborn?

- Visualizing distributions (histplot, kdeplot)

- Correlation matrices (heatmap)

- Category comparison (boxplot, violinplot)

- Pairwise relationships (pairplot, lmplot)

In [None]:
# 1. How do you create a 2D NumPy array and calculate the sum of each row?
import numpy as np

array_2d = np.array([[1, 2, 3], [4, 5, 6]])
row_sums = np.sum(array_2d, axis=1)
print("2D Array:\n", array_2d)
print("Sum of each row:", row_sums)


In [None]:
# 2. Write a Pandas script to find the mean of a specific column in a DataFrame
import pandas as pd

data = {'Math': [90, 80, 70, 85]}
df = pd.DataFrame(data)
mean_value = df['Math'].mean()
print("Mean of Math column:", mean_value)


In [None]:
# 3. Create a scatter plot using Matplotlib
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
plt.scatter(x, y)
plt.title("Scatter Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()


In [None]:
# 4. How do you calculate a correlation matrix (with Pandas) and visualize it with a Seaborn heatmap?
import seaborn as sns

df = pd.DataFrame({'A': [1, 2, 3], 'B': [5, 6, 7], 'C': [8, 9, 10]})
corr = df.corr()
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap")
plt.show()


In [None]:
# 5. Generate a bar plot using Plotly
import plotly.express as px

df = pd.DataFrame({'Fruit': ['Apples', 'Bananas', 'Cherries'], 'Count': [10, 15, 7]})
fig = px.bar(df, x='Fruit', y='Count', title="Fruit Count")
fig.show()


In [None]:
# 6. Create a DataFrame and add a new column based on an existing column
df = pd.DataFrame({'Scores': [70, 80, 90]})
df['Double'] = df['Scores'] * 2
print(df)


In [None]:
# 7. Write a program to perform element-wise multiplication of two NumPy arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = a * b
print("Element-wise multiplication:", result)


In [None]:
# 8. Create a line plot with multiple lines using Matplotlib
x = [1, 2, 3, 4]
y1 = [10, 20, 25, 30]
y2 = [15, 18, 22, 27]
plt.plot(x, y1, label='Line 1')
plt.plot(x, y2, label='Line 2')
plt.legend()
plt.title("Multiple Line Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()


In [None]:
# 9. Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold
df = pd.DataFrame({'Value': [10, 20, 5, 30]})
filtered_df = df[df['Value'] > 15]
print("Filtered DataFrame:\n", filtered_df)


In [None]:
# 10. Create a histogram using Seaborn to visualize a distribution
sns.histplot(data=[10, 20, 20, 30, 30, 40, 50], kde=True)
plt.title("Distribution Histogram")
plt.show()


In [None]:
# 11. Perform matrix multiplication using NumPy
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[2, 0], [1, 2]])
result = matrix1 @ matrix2  # same result as np.dot for 2D arrays; @ is the matrix-multiplication operator
print("Matrix multiplication result:\n", result)


In [None]:
# 12. Use Pandas to load a CSV file and display its first 5 rows
# Assuming 'data.csv' is in your Colab environment
df = pd.read_csv('data.csv')
print(df.head())


In [None]:
# 13. Create a 3D scatter plot using Plotly
import plotly.graph_objects as go

fig = go.Figure(data=[go.Scatter3d(
    x=[1, 2, 3], y=[4, 5, 6], z=[7, 8, 9],
    mode='markers',
    marker=dict(size=5, color='blue')
)])
fig.update_layout(title='3D Scatter Plot')
fig.show()
