1. What is NumPy, and why is it widely used in Python?

NumPy (Numerical Python) is a library for high-performance numerical computing in Python. It provides support for multidimensional arrays and optimized mathematical functions.
✔ Fast (written in C under the hood)
✔ Memory-efficient compared to Python lists
✔ Basis for other libraries (Pandas, SciPy, scikit-learn, etc.)

2. How does broadcasting work in NumPy?

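Broadcasting allows NumPy to perform operations on arrays of different shapes without copying data. Shapes are compared from the trailing dimension backwards: two dimensions are compatible when they are equal or one of them is 1, and a size-1 or missing dimension is stretched to match the other.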
Example:

import numpy as np

a = np.array([1, 2, 3])
b = 2
a + b  # array([3, 4, 5])


Here, the scalar b is broadcast to match the shape of a.
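A slightly larger sketch of the rule above, combining an array of shape (3, 1) with one of shape (3,); the values are arbitrary and chosen only to make the result easy to read:

```python
import numpy as np

# Shapes are compared from the trailing dimension: two dimensions are
# compatible if they are equal or one of them is 1.
col = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3])          # shape (3,), treated as (1, 3)

grid = col + row                   # both stretched to shape (3, 3)
print(grid.shape)  # (3, 3)
print(grid)
# [[ 1  2  3]
#  [11 12 13]
#  [21 22 23]]

# Incompatible trailing dimensions (2 vs 3) raise an error instead.
try:
    np.ones((3, 2)) + np.ones(3)
except ValueError as exc:
    print("Cannot broadcast:", exc)
```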

3. What is a Pandas DataFrame?

A DataFrame is a 2D tabular data structure in Pandas with labeled rows and columns, similar to an Excel sheet or SQL table. It allows for efficient data manipulation and analysis.
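A minimal sketch of labeled rows and columns; the product table and the index labels "a", "b", "c" are hypothetical:

```python
import pandas as pd

# Row labels come from `index`, column labels from the dict keys.
df = pd.DataFrame(
    {"product": ["pen", "book", "mug"], "price": [1.5, 12.0, 8.25]},
    index=["a", "b", "c"],
)
print(df)
print(df.loc["b", "price"])  # label-based lookup -> 12.0
print(df["price"].sum())     # column-wise operation -> 21.75
print(df.dtypes)             # each column has its own dtype
```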

4. Explain the use of the groupby() method in Pandas.

groupby() implements the split-apply-combine pattern: it splits the data into groups based on one or more keys, applies a function (such as mean or sum) to each group, and combines the results.
Example:

df.groupby('Category')['Sales'].sum()
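A self-contained version of the call above. The DataFrame and its values are invented for illustration; only the column names Category and Sales come from the example:

```python
import pandas as pd

# Hypothetical sales records.
df = pd.DataFrame({
    "Category": ["Fruit", "Veg", "Fruit", "Veg", "Dairy"],
    "Sales":    [10, 5, 20, 15, 8],
})

# Split by Category, sum each group's Sales, combine into one Series.
totals = df.groupby("Category")["Sales"].sum()
print(totals)  # Dairy 8, Fruit 30, Veg 20 (groups sorted by key)

# Several aggregations at once with agg().
summary = df.groupby("Category")["Sales"].agg(["sum", "mean", "count"])
print(summary)
```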

5. Why is Seaborn preferred for statistical visualizations?

Seaborn is built on Matplotlib but:
✔ Provides high-level APIs for common statistical plots
✔ Has built-in support for Pandas DataFrames
✔ Provides attractive default styles and color palettes
✔ Simplifies plots like boxplots, violin plots, and heatmaps

6. What are the differences between NumPy arrays and Python lists?

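Speed: Numerical operations on NumPy arrays run in optimized C loops, so they are much faster than equivalent Python loops over lists.

Memory: Arrays store raw values in one contiguous buffer, while lists store pointers to separate Python objects, so arrays use less memory.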

Functionality: Arrays support vectorized operations; lists do not.

Homogeneity: Arrays require all elements to be of the same type; lists can store mixed types.
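A short sketch of the homogeneity and memory points. The memory figure for the list is a rough CPython estimate (the list's pointer array plus each int object), not an exact measurement:

```python
import sys
import numpy as np

# Lists can mix types: each element is an independent Python object.
mixed = [1, "two", 3.0]
print([type(x).__name__ for x in mixed])  # ['int', 'str', 'float']

# Arrays hold a single dtype; mixed int/float input is upcast to float64.
arr = np.array([1, 2.5, 3])
print(arr.dtype)  # float64

# 1,000 int64 values stored as raw 8-byte items in one buffer.
values = np.arange(1_000, 2_000, dtype=np.int64)
print(values.nbytes)  # 8000

# Rough CPython estimate for the equivalent list: its pointer array
# plus one int object per element.
as_list = values.tolist()
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(v) for v in as_list)
print(list_bytes)  # several times larger than values.nbytes
```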

7. What is a heatmap, and when should it be used?

A heatmap is a data visualization where values are represented as colors in a matrix.
✔ Useful for correlation matrices, missing value analysis, or any 2D data where intensity matters.

8. What does the term “vectorized operation” mean in NumPy?

A vectorized operation applies a function to an entire array at once, without explicit Python loops; the element-by-element work runs inside NumPy's compiled code.
Example:

import numpy as np

a = np.array([1, 2, 3])
a * 2   # array([2, 4, 6])


This is typically much faster than looping over the elements in Python.
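A sketch comparing a Python loop with the vectorized form; the helper names loop_double and vec_double are hypothetical, and timings vary by machine, so no numbers are claimed here:

```python
import timeit
import numpy as np

a = np.arange(100_000)

def loop_double(arr):
    # Pure-Python loop: one interpreter step per element.
    return np.array([x * 2 for x in arr])

def vec_double(arr):
    # Vectorized: the loop runs inside NumPy's compiled code.
    return arr * 2

# Both approaches produce the same values.
same = np.array_equal(loop_double(a), vec_double(a))

t_loop = timeit.timeit(lambda: loop_double(a), number=5)
t_vec = timeit.timeit(lambda: vec_double(a), number=5)
print(f"loop: {t_loop:.4f}s  vectorized: {t_vec:.4f}s")

# Conditional logic can be vectorized too, e.g. with np.where.
labels = np.where(a % 2 == 0, "even", "odd")
print(labels[:4])  # ['even' 'odd' 'even' 'odd']
```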

9. How does Matplotlib differ from Plotly?

Matplotlib: Static, publication-quality 2D plots; customizable but more code-heavy.

Plotly: Interactive, browser-based, supports zoom/hover, better for dashboards.

10. What is the significance of hierarchical indexing in Pandas?

Hierarchical (multi-level) indexing allows multiple levels of row/column labels.
✔ Enables working with higher-dimensional data in 2D form.
✔ Useful for grouped/aggregated data.
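A minimal sketch of a two-level row index built with set_index(); the region and quarter data are hypothetical:

```python
import pandas as pd

# Hypothetical quarterly sales.
df = pd.DataFrame({
    "Region":  ["East", "East", "West", "West"],
    "Quarter": ["Q1", "Q2", "Q1", "Q2"],
    "Sales":   [100, 120, 90, 130],
})

# Setting two columns as the index creates a two-level MultiIndex.
indexed = df.set_index(["Region", "Quarter"])
print(indexed.index.nlevels)  # 2

# Select on the outer level, or on both levels with a tuple.
print(indexed.loc["East"])                   # both East quarters
print(indexed.loc[("West", "Q2"), "Sales"])  # 130

# unstack() moves the inner level into columns (long -> wide).
wide = indexed["Sales"].unstack("Quarter")
print(wide)
```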

11. What is the role of Seaborn’s pairplot() function?

pairplot() draws a grid of plots: a scatterplot for each pair of numeric variables off the diagonal, and each variable's univariate distribution (a histogram or KDE) on the diagonal.
✔ Great for exploring correlations in datasets.

12. What is the purpose of the describe() function in Pandas?

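describe() generates summary statistics (count, mean, std, min, 25%/50%/75% quartiles, max). By default it summarizes only numeric columns; with include='all', non-numeric columns are also included and summarized with count, unique, top, and freq.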
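A sketch of both modes on a small hypothetical DataFrame with one numeric and one text column:

```python
import pandas as pd

df = pd.DataFrame({
    "Age":  [25, 30, 35, 40],
    "City": ["Paris", "Oslo", "Paris", "Rome"],
})

# Default: numeric columns only (count, mean, std, min, 25%, 50%, 75%, max).
numeric_summary = df.describe()
print(numeric_summary)

# include="all": the text column also gets count, unique, top, freq.
full_summary = df.describe(include="all")
print(full_summary)
```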

13. Why is handling missing data important in Pandas?

Missing data can bias results, reduce accuracy, or break computations. Pandas provides isna() to detect missing values and dropna(), fillna(), and interpolate() to handle them.
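A sketch of detecting and handling NaN in a Series with hypothetical values. It also shows that pandas reductions skip NaN by default, while plain NumPy propagates it:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

print(s.isna().sum())         # 2 missing values
print(s.mean())               # 3.0: pandas skips NaN by default
print(np.mean(s.to_numpy()))  # nan: plain NumPy propagates NaN

dropped = s.dropna()            # [1.0, 3.0, 5.0]
filled = s.fillna(s.mean())     # each NaN replaced by 3.0
interpolated = s.interpolate()  # linear: [1.0, 2.0, 3.0, 4.0, 5.0]
```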

14. What are the benefits of using Plotly for data visualization?

✔ Interactive visualizations
✔ Dashboards and web apps via Dash, Plotly's Python web-app framework
✔ 3D plots, plus WebGL-based traces for larger datasets

15. How does NumPy handle multidimensional arrays?

NumPy supports n-dimensional arrays (ndarray). Each array has:

Shape (dimensions)

Strides (number of bytes to step in memory to reach the next element along each axis)

Data type (dtype)

These attributes let NumPy perform mathematical operations efficiently along any axis.
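A sketch inspecting these attributes on a small int64 array; the stride values assume 8-byte elements and NumPy's default row-major (C) layout:

```python
import numpy as np

a = np.arange(6, dtype=np.int64).reshape(2, 3)  # [[0 1 2] [3 4 5]]

print(a.shape)    # (2, 3)
print(a.ndim)     # 2
print(a.dtype)    # int64
print(a.strides)  # (24, 8): 24 bytes to the next row, 8 to the next column

# Transposing swaps shape and strides without copying data.
t = a.T
print(t.shape, t.strides)      # (3, 2) (8, 24)
print(np.shares_memory(a, t))  # True

# Reductions along an axis.
print(a.sum(axis=0))  # column sums: [3 5 7]
print(a.sum(axis=1))  # row sums: [ 3 12]
```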

16. What is the role of Bokeh in data visualization?

Bokeh is a Python library for interactive, web-ready visualizations.
✔ Produces plots as HTML/JavaScript
✔ Better suited than Matplotlib for dashboards and web embedding

17. Explain the difference between apply() and map() in Pandas.

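map(): Works element-wise on a Series; accepts a function, or a dict/Series used as a lookup table.

apply(): Applies a function along an axis of a DataFrame (each column with axis=0, each row with axis=1), or to each element of a Series.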
Example:

df['col'].map(lambda x: x*2)
df.apply(np.sum, axis=0)  # sum per column
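A fuller sketch of the difference, using a hypothetical two-column DataFrame. Lambdas replace np.sum in the apply() calls so that the function visibly receives a whole column or row:

```python
import pandas as pd

df = pd.DataFrame({"col": [1, 2, 3], "other": [10, 20, 30]})

# Series.map: element-wise, with a function or a dict lookup.
doubled = df["col"].map(lambda x: x * 2)     # [2, 4, 6]
names = df["col"].map({1: "one", 2: "two"})  # unmatched key 3 -> NaN

# DataFrame.apply: the function receives a whole column (axis=0) ...
col_sums = df.apply(lambda column: column.sum(), axis=0)  # col 6, other 60
# ... or a whole row (axis=1).
row_totals = df.apply(lambda row: row["col"] + row["other"], axis=1)  # [11, 22, 33]
```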

18. What are some advanced features of NumPy?

Linear algebra (np.linalg)

FFTs (np.fft)

Random sampling (np.random)

Memory mapping for arrays too large to fit in RAM (np.memmap)

Broadcasting rules

Masked arrays (np.ma)
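A brief sketch touching a few of these modules (np.linalg, np.fft, np.random, np.ma); the matrix, signal, seed, and sentinel value -999 are arbitrary choices for illustration:

```python
import numpy as np

# Linear algebra: solve the system A @ x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)
print(x)  # approximately [2. 3.]

# FFT: the discrete Fourier transform of a unit impulse is flat.
impulse = np.array([1.0, 0.0, 0.0, 0.0])
spectrum = np.fft.fft(impulse)
print(spectrum)  # every bin equals 1

# Random sampling with a seeded Generator for reproducibility.
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=5)

# Masked arrays: exclude a sentinel value (-999) from computations.
readings = np.ma.masked_equal([1.0, -999.0, 3.0], -999.0)
print(readings.mean())  # 2.0
```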

19. How does Pandas simplify time series analysis?

✔ Built-in datetime support
✔ Resampling and frequency conversion (resample())
✔ Rolling window functions (rolling())
✔ Time zone handling
✔ Easy indexing with DatetimeIndex
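A sketch of these features on six days of hypothetical data, assuming the time zone database that pandas relies on is installed for the "Asia/Tokyo" conversion:

```python
import pandas as pd

# Six consecutive days of hypothetical values.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

print(ts["2024-01-03":"2024-01-04"])  # date-string slicing on a DatetimeIndex

two_day = ts.resample("2D").sum()      # 2-day bins: 3, 7, 11
rolling = ts.rolling(window=3).mean()  # NaN, NaN, 2.0, 3.0, 4.0, 5.0

utc = ts.tz_localize("UTC")            # attach a time zone
tokyo = utc.tz_convert("Asia/Tokyo")   # same instants, Tokyo wall-clock time
print(tokyo.index[0])                  # 2024-01-01 09:00:00+09:00
```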

20. What is the role of a pivot table in Pandas?

pivot_table() reshapes data into a summary table, aggregating values (mean by default) for each combination of index and column keys.
✔ Similar to Excel pivot tables
✔ Allows grouping, aggregation, and comparison across multiple dimensions
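A sketch of pivot_table() on hypothetical long-format sales data. aggfunc="sum" overrides the default mean, fill_value=0 fills empty cells, and margins=True adds "All" totals:

```python
import pandas as pd

# Hypothetical long-format sales records.
df = pd.DataFrame({
    "Region":  ["East", "East", "West", "West", "East"],
    "Product": ["A", "B", "A", "B", "A"],
    "Sales":   [100, 80, 90, 60, 50],
})

# One row per Region, one column per Product, summed Sales in each cell.
table = pd.pivot_table(
    df, index="Region", columns="Product", values="Sales",
    aggfunc="sum", fill_value=0, margins=True,
)
print(table)
```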

21. Why is NumPy’s array slicing faster than Python’s list slicing?

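Basic slicing of a NumPy array returns a view: a new array object that shares the original contiguous data buffer, so no elements are copied and the cost does not depend on the slice length.

Slicing a Python list builds a new list and copies a reference to every element in the slice, so the cost grows with the slice length.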
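A sketch showing the practical consequence: writing through a NumPy slice changes the original array, while a list slice is independent of the original list:

```python
import numpy as np

arr = np.arange(10)
view = arr[2:5]                     # basic slice -> view, no data copied
print(np.shares_memory(arr, view))  # True
view[0] = 99
print(arr[2])                       # 99: writing through the view changes arr

copy = arr[2:5].copy()              # explicit copy when independence is needed
copy[0] = -1
print(arr[2])                       # still 99

lst = list(range(10))
sub = lst[2:5]                      # new list holding references to the elements
sub[0] = 99
print(lst[2])                       # 2: the original list is unchanged
```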

22. What are some common use cases for Seaborn?

Exploring distributions (histplot, kdeplot)

Comparing categories (boxplot, barplot, violinplot)

Correlation analysis (heatmap, pairplot)

Regression analysis (regplot, lmplot)

Multi-variable visualizations with style and ease

1. Create a 2D NumPy array and calculate the sum of each row
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

row_sums = arr.sum(axis=1)
print("Row sums:", row_sums)

2. Write a Pandas script to find the mean of a specific column in a DataFrame
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

mean_age = df['Age'].mean()
print("Mean Age:", mean_age)

3. Create a scatter plot using Matplotlib
import matplotlib.pyplot as plt
import numpy as np

x = np.random.rand(50)
y = np.random.rand(50)

plt.scatter(x, y, color='blue', alpha=0.7)
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot")
plt.show()

4. Calculate the correlation matrix with Pandas and visualize it as a Seaborn heatmap
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 4, 3, 2, 1],
    'C': [2, 3, 4, 5, 6]
})

corr = df.corr()
sns.heatmap(corr, annot=True, cmap="coolwarm")
plt.title("Correlation Heatmap")
plt.show()

5. Generate a bar plot using Plotly
import plotly.express as px

data = {'Fruits': ['Apple', 'Banana', 'Cherry'],
        'Quantity': [10, 20, 15]}

fig = px.bar(data, x='Fruits', y='Quantity', title="Fruit Quantities")
fig.show()

6. Create a DataFrame and add a new column based on an existing column
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Salary': [5000, 6000, 7000]
})

df['Bonus'] = df['Salary'] * 0.1
print(df)

7. Perform element-wise multiplication of two NumPy arrays
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

result = a * b
print("Element-wise multiplication:", result)

8. Create a line plot with multiple lines using Matplotlib
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)

plt.plot(x, y1, label="sin(x)")
plt.plot(x, y2, label="cos(x)")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Multiple Line Plot")
plt.legend()
plt.show()

9. Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [85, 40, 92]
})

filtered_df = df[df['Score'] > 50]
print(filtered_df)

10. Create a histogram using Seaborn to visualize a distribution
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

data = np.random.randn(1000)  # Normal distribution
sns.histplot(data, bins=30, kde=True)
plt.title("Histogram with KDE")
plt.show()

11. Perform matrix multiplication using NumPy
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

B = np.array([[5, 6],
              [7, 8]])

C = np.dot(A, B)
print("Matrix multiplication:\n", C)

12. Use Pandas to load a CSV file and display its first 5 rows
import pandas as pd

df = pd.read_csv("data.csv")  # replace with your filename
print(df.head())

13. Create a 3D scatter plot using Plotly
import numpy as np
import pandas as pd
import plotly.express as px

df = pd.DataFrame({
    'x': np.random.rand(50),
    'y': np.random.rand(50),
    'z': np.random.rand(50),
    'color': np.random.rand(50)
})

fig = px.scatter_3d(df, x='x', y='y', z='z', color='color', size='color')
fig.show()