1. What is NumPy, and why is it widely used in Python?
- NumPy is a core scientific computing library for Python. It provides fast, memory-efficient multidimensional arrays and mathematical functions, making numerical computations much faster than native Python lists.

2. How does broadcasting work in NumPy?
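- Broadcasting lets NumPy apply element-wise operations to arrays of different shapes. Shapes are compared from the trailing dimension backward, and two dimensions are compatible when they are equal or one of them is 1. The smaller array is treated as if it were stretched to match the larger one, without actually copying its data.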
- Example: adding a 1D array to each row of a 2D array.
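
A minimal sketch of the row-wise case mentioned above. The array names and values are illustrative assumptions, not taken from any particular dataset.

```python
import numpy as np

# A 2D array with shape (2, 3) and a 1D array with shape (3,)
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
row = np.array([10, 20, 30])

# The trailing dimensions match (3 and 3), so the 1D array
# is added to every row of the 2D array
result = matrix + row
print(result.shape)
print(result)
```

If the trailing dimensions did not match (for example, a 1D array of length 2 here), NumPy would raise a `ValueError` instead of guessing.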

3. What is a Pandas DataFrame?
- A Pandas DataFrame is a 2-dimensional labeled data structure (rows + columns), similar to an Excel sheet or SQL table. It’s used to store, clean, and analyze structured data.

4. Explain the use of the groupby() method in Pandas.
- groupby() splits data into groups based on column values and applies aggregate functions like sum(), mean(), or count().
- Example: Average salary per department.
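
The average-salary example could look roughly like this. The column names `department` and `salary` and the figures are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["HR", "IT", "IT", "HR", "Sales"],
    "salary": [40000, 60000, 70000, 50000, 45000],
})

# Split rows by department, then take the mean salary of each group
avg_salary = df.groupby("department")["salary"].mean()
print(avg_salary)
```

The result is a Series indexed by department, so `avg_salary["IT"]` looks up a single group's average.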

5. Why is Seaborn preferred for statistical visualizations?
- Seaborn provides attractive default styles and high-level functions for statistical plots like distributions, box plots, and heatmaps. It is built on top of Matplotlib and accepts Pandas DataFrames directly.

6. What are the differences between NumPy arrays and Python lists?
| Feature   | NumPy Array                    | Python List                   |
| --------- | ------------------------------ | ----------------------------- |
| Speed     | Much faster for numerical work | Slower                        |
| Data type | One dtype per array            | Mixed types allowed           |
| Memory    | Compact, contiguous buffer     | Less efficient (object refs)  |
| Math ops  | Vectorized                     | Needs loops                   |
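
A small sketch of two rows of the table in practice: how `*` behaves on each type, and how an array settles on one dtype. The variable names are illustrative.

```python
import numpy as np

py_list = [1, 2, 3]
np_array = np.array([1, 2, 3])

# On a list, * means repetition; on an array, it is element-wise math
list_times_two = py_list * 2
array_times_two = np_array * 2
print(list_times_two)
print(array_times_two)

# A list can hold mixed types; an array converts to one common dtype
mixed_list = [1, "two", 3.0]
coerced = np.array([1, 2.5])
print(coerced.dtype)
```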

7. What is a heatmap, and when should it be used?
- A heatmap displays the values of a 2D grid as colors, so patterns and outliers stand out at a glance.
- Use it to:
  - Show correlations between variables
  - Compare values across categories
  - Visualize matrices (e.g., a confusion matrix)

8. What does the term “vectorized operation” mean in NumPy?
- It means applying an operation to an entire array at once instead of looping element by element in Python.
- Benefits:
  - Faster, because the loop runs in compiled C code
  - Cleaner, shorter code
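
To make the contrast concrete, here is a small sketch that squares the same values with an explicit loop and with a single vectorized expression. The names are illustrative.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

# Loop version: one Python-level operation per element
squared_loop = np.empty_like(values)
for i in range(len(values)):
    squared_loop[i] = values[i] ** 2

# Vectorized version: one expression over the whole array
squared_vec = values ** 2

print(np.array_equal(squared_loop, squared_vec))
```

Both produce the same array. The speed gap is negligible at this size and grows with the number of elements.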

9. How does Matplotlib differ from Plotly?
- Matplotlib → primarily static plots, highly customizable, great for reports and publications
- Plotly → interactive plots (zoom, hover), great for dashboards & web apps

10. What is the significance of hierarchical indexing in Pandas?
-  Hierarchical indexing (MultiIndex) lets you store higher-dimensional data in a 2D DataFrame.
- Example: data grouped by year → month → product.
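
A rough sketch of the year → month → product layout. The level names and the `units` column are assumptions chosen for illustration.

```python
import pandas as pd

# Three index levels: year, month, product
index = pd.MultiIndex.from_tuples(
    [(2023, "Feb", "A"), (2023, "Jan", "A"), (2023, "Jan", "B"), (2024, "Jan", "A")],
    names=["year", "month", "product"],
)
sales = pd.DataFrame({"units": [15, 10, 20, 30]}, index=index)

# Selecting on the outer level returns all rows for that year
sales_2023 = sales.loc[2023]
print(sales_2023)

# Aggregating over one level collapses the others
units_per_year = sales.groupby(level="year")["units"].sum()
print(units_per_year)
```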

11. What is the role of Seaborn’s pairplot() function?
- pairplot() draws a grid of scatter plots for every pair of numerical columns, with each column's own distribution (a histogram by default) on the diagonal.
- Great for:
  - Finding relationships
  - Quick exploratory data analysis (EDA)

12. What is the purpose of the describe() function in Pandas?
- describe() provides summary statistics for each numeric column:
  - count
  - mean
  - std (standard deviation)
  - min and max
  - quartiles (25%, 50%, 75%)
- It’s a fast way to understand your dataset.
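
As a quick illustration, calling describe() on a small numeric column returns the statistics listed above as a labeled Series. The `score` column is a made-up example.

```python
import pandas as pd

df = pd.DataFrame({"score": [80, 85, 90, 95]})

# Summary statistics for one numeric column
summary = df["score"].describe()
print(summary)

# Individual statistics can be looked up by label
print(summary["mean"], summary["50%"])
```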

13. Why is handling missing data important in Pandas?
- Missing values can:
  - Break models that cannot accept NaN inputs
  - Skew statistics
  - Cause errors
- Pandas provides fillna(), dropna(), and interpolate() to clean data properly.
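
The three cleaning options can be compared side by side on a small Series. The values and the choice of 0.0 as a fill value are illustrative assumptions; the right strategy depends on the data.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

dropped = s.dropna()             # remove missing entries
filled = s.fillna(0.0)           # replace them with a constant
interpolated = s.interpolate()   # linear interpolation between neighbours

print(dropped.tolist())
print(filled.tolist())
print(interpolated.tolist())
```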

14. What are the benefits of using Plotly for data visualization?
- Plotly offers:
  - Interactive charts
  - Web-friendly visuals
  - Dashboards
  - 3D plotting
  - Hover tooltips

15. How does NumPy handle multidimensional arrays?
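- NumPy supports N-dimensional arrays (ndarray). Each array has a shape (its size along every axis) and a single dtype.
- Example:
  - 1D → vector
  - 2D → matrix
  - 3D → images / tensors
- Operations run in compiled code and can be applied along any axis.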
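
A short sketch of arrays with one, two, and three dimensions, plus a reduction along one axis. The shapes are arbitrary and chosen only for illustration.

```python
import numpy as np

vector = np.arange(3)          # 1D, shape (3,)
matrix = np.zeros((2, 3))      # 2D, shape (2, 3)
tensor = np.ones((4, 2, 3))    # 3D, shape (4, 2, 3)

print(vector.ndim, matrix.ndim, tensor.ndim)
print(vector.shape, matrix.shape, tensor.shape)

# Summing along axis 0 removes that axis: (4, 2, 3) -> (2, 3)
summed = tensor.sum(axis=0)
print(summed.shape)
```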

16. What is the role of Bokeh in data visualization?
- Bokeh is used for creating interactive, browser-based visualizations. It’s great for building web dashboards and real-time streaming plots.

17. Explain the difference between apply() and map() in Pandas.
| Function  | Works on                                   | Use case                                                            |
| --------- | ------------------------------------------ | ------------------------------------------------------------------- |
| `map()`   | Series (also DataFrame since pandas 2.1)   | Element-wise transformation                                         |
| `apply()` | Series & DataFrame                         | Element-wise on a Series; row-wise or column-wise on a DataFrame    |
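
The difference is easiest to see on a tiny DataFrame. This is a sketch with made-up column names `a` and `b`.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# map(): transform each element of a single Series
doubled = df["a"].map(lambda v: v * 2)

# apply() on a DataFrame: the function receives a whole column (axis=0, the default)
col_range = df.apply(lambda col: col.max() - col.min())

# apply() with axis=1: the function receives a whole row
row_total = df.apply(lambda row: row["a"] + row["b"], axis=1)

print(doubled.tolist())
print(col_range.to_dict())
print(row_total.tolist())
```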

18. What are some advanced features of NumPy?
- Broadcasting
- Vectorization
- Linear algebra (linalg)
- Random number generation
- Memory views & slicing
- FFT (Fast Fourier Transform)
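
A brief sketch touching three of the items above: `np.linalg`, `np.random`, and `np.fft`. The inputs and the seed are arbitrary values chosen for illustration.

```python
import numpy as np

# Linear algebra: solve A x = b
A = np.array([[2.0, 0.0],
              [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(A, b)
print(x)

# Random numbers: a seeded generator gives reproducible samples
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=3)
print(samples.shape)

# FFT: the transform of a unit impulse is constant across frequencies
spectrum = np.fft.fft(np.array([1.0, 0.0, 0.0, 0.0]))
print(spectrum)
```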

19. How does Pandas simplify time series analysis?
- Pandas supports:
  - DateTime indexing
  - Resampling (daily → monthly)
  - Rolling windows
  - Time-based slicing
  - Time zone handling
- Perfect for financial & sensor data.
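
Some of these features can be sketched on six days of made-up data. A 2-day resampling window is used here instead of monthly only to keep the example small.

```python
import pandas as pd

# Daily series with a DatetimeIndex
idx = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Rolling window: mean over the current and two previous days
rolling_mean = ts.rolling(window=3).mean()

# Resampling: aggregate daily values into 2-day bins
two_day_sum = ts.resample("2D").sum()

# Time-based slicing with date strings (both endpoints included)
first_three = ts.loc["2024-01-01":"2024-01-03"]

print(rolling_mean.tolist())
print(two_day_sum.tolist())
print(first_three.tolist())
```

The first two rolling values are NaN because a full 3-day window is not yet available.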

20. What is the role of a pivot table in Pandas?
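- A pivot table groups rows by one or more keys, spreads another key across the columns, and aggregates the values in each cell (for example with sum or mean).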
- Example: Total sales by region and product.
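
The region-by-product example might be written like this. The column names and sales figures are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "North"],
    "product": ["A", "B", "A", "B", "A"],
    "sales": [100, 150, 200, 50, 25],
})

# Rows become regions, columns become products, cells hold total sales
table = pd.pivot_table(
    df, values="sales", index="region", columns="product", aggfunc="sum"
)
print(table)
```

Here the two North/A rows (100 and 25) are combined into a single cell.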

21. Why is NumPy’s array slicing faster than Python’s list slicing?
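- NumPy arrays store elements of a single dtype in one contiguous memory block.
- A basic slice returns a view onto that block: only new shape, stride, and offset metadata are created, and no elements are copied.
- A Python list slice builds a new list and copies a reference to every selected element, so its cost grows with the length of the slice.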
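
One concrete reason slicing is cheap in NumPy is that a basic slice is a view onto the original data, while a list slice is a new list. The sketch below shows the difference by writing through each slice; the names are illustrative.

```python
import numpy as np

arr = np.arange(5)
lst = list(range(5))

arr_slice = arr[1:4]   # view: shares memory with arr
lst_slice = lst[1:4]   # copy: a new, independent list

arr_slice[0] = 99
lst_slice[0] = 99

print(arr)   # the original array sees the change
print(lst)   # the original list does not

shares = np.shares_memory(arr, arr_slice)
print(shares)
```

When an independent array is needed, `arr[1:4].copy()` makes the copy explicit.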

22. What are some common use cases for Seaborn?
- Seaborn is commonly used for:
  - Correlation heatmaps
  - Distribution plots
  - Box plots & violin plots
  - Pairwise relationships
  - Statistical comparisons

In [None]:
#1 How do you create a 2D NumPy array and calculate the sum of each row?
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# axis=1 sums across the columns, giving one total per row
row_sums = arr.sum(axis=1)
print(row_sums)

In [None]:
#2 Write a Pandas script to find the mean of a specific column in a DataFrame.
import pandas as pd

df = pd.DataFrame({
    "age": [20, 22, 24, 26],
    "score": [80, 85, 90, 95]
})

mean_score = df["score"].mean()
print(mean_score)

In [None]:
#3 Create a scatter plot using Matplotlib.
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 15, 8, 12]

plt.scatter(x, y)
plt.xlabel("X values")
plt.ylabel("Y values")
plt.title("Scatter Plot")
plt.show()

In [None]:
#4 How do you calculate the correlation matrix using Seaborn and visualize it with a heatmap?
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "math": [80, 85, 90, 95],
    "science": [78, 82, 88, 92],
    "english": [70, 75, 85, 90]
})

# The correlation matrix is computed by Pandas; Seaborn only draws the heatmap
corr = df.corr()
sns.heatmap(corr, annot=True)
plt.show()

In [None]:
#5 Generate a bar plot using Plotly.
import plotly.express as px

data = {
    "city": ["A", "B", "C"],
    "population": [100, 150, 120]
}

fig = px.bar(data, x="city", y="population", title="City Population")
fig.show()

In [None]:
#6 Create a DataFrame and add a new column based on an existing column.
import pandas as pd

df = pd.DataFrame({
    "price": [100, 200, 300]
})

df["price_with_tax"] = df["price"] * 1.18
print(df)

In [None]:
#7 Write a program to perform element-wise multiplication of two NumPy arrays.
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

result = a * b
print(result)

In [None]:
#8 Create a line plot with multiple lines using Matplotlib.
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y1 = [10, 20, 30, 40]
y2 = [15, 25, 35, 45]

plt.plot(x, y1, label="Line 1")
plt.plot(x, y2, label="Line 2")
plt.legend()
plt.xlabel("X")
plt.ylabel("Y")
plt.title("Multiple Lines")
plt.show()

In [None]:
#9 Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold.
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "marks": [50, 75, 90, 40]
})

filtered_df = df[df["marks"] > 60]
print(filtered_df)

In [None]:
#10 Create a histogram using Seaborn to visualize a distribution.
import seaborn as sns
import matplotlib.pyplot as plt

data = [10, 20, 20, 30, 30, 30, 40, 50]

sns.histplot(data, bins=5)
plt.title("Distribution")
plt.show()

In [None]:
#11 Perform matrix multiplication using NumPy.
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

B = np.array([[5, 6],
              [7, 8]])

result = A @ B   # or np.dot(A, B)
print(result)

In [None]:
#12 Use Pandas to load a CSV file and display its first 5 rows.
import pandas as pd

# Assumes a file named data.csv exists in the working directory
df = pd.read_csv("data.csv")
print(df.head())

In [None]:
#13 Create a 3D scatter plot using Plotly.
import plotly.express as px
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [10, 20, 30, 40],
    "z": [5, 15, 25, 35]
})

fig = px.scatter_3d(df, x="x", y="y", z="z", title="3D Scatter Plot")
fig.show()