## Data Toolkit — Questions & Answers
This notebook contains concise answers to the theory questions and a code cell for each practical programming question.

Run all cells in Google Colab to see outputs for the code cells.

### Q: What is NumPy, and why is it widely used in Python?
**Answer:** NumPy is a library for numerical computing built around the `ndarray`, an N-dimensional array whose elements all share one data type. It is widely used because its compiled, vectorized operations are much faster and more memory-efficient than equivalent pure-Python loops.

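### Q: How does broadcasting work in NumPy?
**Answer:** Broadcasting lets NumPy perform arithmetic between arrays of different shapes without copying data. NumPy compares shapes starting from the trailing dimension. Two dimensions are compatible if they are equal or if one of them is 1, and a missing leading dimension is treated as 1. Size-1 dimensions are then virtually stretched to match the other array. If any pair of dimensions is incompatible, NumPy raises a `ValueError`.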
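The rules above can be seen with two small arrays. This is a minimal sketch; the arrays are arbitrary examples chosen only to show one compatible and one incompatible shape pair.

```python
import numpy as np

# Shapes are compared from the trailing axis:
#   trailing axes: 1 vs 4 -> the size-1 axis is stretched to 4
#   leading axes:  3 vs (missing) -> the missing axis counts as 1, stretched to 3
col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,)
grid = col * 10 + row              # result shape (3, 4), no explicit copies
print(grid.shape)                  # (3, 4)
print(grid)

# Trailing axes 2 vs 3 are unequal and neither is 1, so this cannot broadcast.
try:
    np.ones((3, 2)) + np.ones(3)
except ValueError as err:
    print("ValueError:", err)
```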

### Q: What is a Pandas DataFrame?
**Answer:** A DataFrame is a 2-dimensional labeled data structure in Pandas, with labeled rows (the index) and labeled columns. It is similar to a table or spreadsheet, and each column can hold a different data type.

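### Q: Explain the use of the groupby() method in Pandas
**Answer:** `groupby()` follows a split-apply-combine pattern. It splits the rows into groups based on the values in one or more columns, applies an aggregate function (such as mean, sum, or count) to each group, and combines the results into a new Series or DataFrame.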
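A minimal split-apply-combine sketch follows. The `sales` table and its `region`/`units` columns are made-up illustration data.

```python
import pandas as pd

# Hypothetical toy data: units sold per order, tagged by region.
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "units":  [10, 4, 6, 8, 2],
})

# Split rows by region, apply sum/mean/count to "units", combine into one table.
summary = sales.groupby("region")["units"].agg(["sum", "mean", "count"])
print(summary)
```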

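### Q: Why is Seaborn preferred for statistical visualizations?
**Answer:** Seaborn is built on Matplotlib and provides high-level functions for attractive, informative statistical plots. It has built-in themes, works directly with DataFrame columns, and handles common statistical steps such as aggregation and kernel density estimation automatically.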

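### Q: What are the differences between NumPy arrays and Python lists?
**Answer:** NumPy arrays are homogeneous: every element shares one dtype. A newly created array stores its values in a single contiguous buffer, and arrays support vectorized operations, so numeric work on them is much faster. Lists can hold objects of any type because they store references to separately allocated Python objects, which makes them slower for large numeric computations.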
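The differences in typing, storage, and operator behavior can be seen in a few lines. The values are arbitrary; the byte count assumes the default 8-byte `float64`.

```python
import numpy as np

mixed = [1, "two", 3.0]              # a list can hold objects of any type
arr = np.array([1, 2, 3.5])          # NumPy upcasts everything to one common dtype
print(arr.dtype)                     # float64
print(arr.nbytes)                    # 24: three 8-byte floats in one data buffer

# Operators differ too: + concatenates lists but adds arrays element-wise.
print([1, 2] + [3, 4])                       # [1, 2, 3, 4]
print(np.array([1, 2]) + np.array([3, 4]))   # [4 6]
```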

### Q: What is a heatmap, and when should it be used?
**Answer:** A heatmap is a matrix visualization in which each cell's color encodes its value. Use it to show magnitudes or relationships across two dimensions. Common uses are correlation matrices and feature-importance grids.

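### Q: What does the term “vectorized operation” mean in NumPy?
**Answer:** A vectorized operation applies a computation to a whole array at once. The element-by-element loop runs inside NumPy's compiled C code rather than in the Python interpreter, which avoids explicit Python loops and is much faster.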
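The sketch below computes the same result twice, first with an explicit Python loop and then with one vectorized expression. The array size and the formula `2x + 1` are arbitrary choices for illustration.

```python
import numpy as np

values = np.arange(100_000, dtype=np.float64)

# Explicit Python loop: one interpreter-level iteration per element.
loop_result = np.empty_like(values)
for i in range(values.size):
    loop_result[i] = values[i] * 2.0 + 1.0

# Vectorized: the same arithmetic runs inside NumPy's compiled loops.
vec_result = values * 2.0 + 1.0

print(np.array_equal(loop_result, vec_result))  # True
```

Wrapping each version in `time.perf_counter()` calls is an easy way to compare their speed on your own machine.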

### Q: How does Matplotlib differ from Plotly?
**Answer:** Matplotlib primarily produces static images, which suits simple plots and publication figures. Plotly creates interactive, web-ready visualizations with built-in zoom, pan, and hover tools.

### Q: What is the significance of hierarchical indexing in Pandas?
**Answer:** Hierarchical indexing (`MultiIndex`) allows multiple index levels on rows or columns. This makes it easier to represent and select higher-dimensional data within a 2D structure.
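A small sketch of a two-level row index follows. The `year`/`quarter` levels and the revenue figures are hypothetical. The example selects data from each level and then reshapes with `unstack()`.

```python
import pandas as pd

# Hypothetical quarterly revenue indexed by (year, quarter).
index = pd.MultiIndex.from_tuples(
    [("2023", "Q1"), ("2023", "Q2"), ("2024", "Q1"), ("2024", "Q2")],
    names=["year", "quarter"],
)
revenue = pd.Series([100, 120, 130, 150], index=index)

by_year = revenue.loc["2024"]                  # select on the outer level
q1_all = revenue.xs("Q1", level="quarter")     # select on the inner level
table = revenue.unstack()                      # move the inner level into columns

print(by_year)
print(q1_all)
print(table)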

### Q: What is the role of Seaborn’s pairplot() function?
**Answer:** `pairplot()` plots pairwise relationships in a dataset. It draws a scatterplot for each pair of numeric variables and each variable's distribution on the diagonal, which makes it a quick way to explore relationships.

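### Q: What is the purpose of the describe() function in Pandas?
**Answer:** By default, `describe()` returns summary statistics for numeric columns: count, mean, std, min, 25%, 50% (median), 75%, and max. For text data it reports count, unique, top (the most frequent value), and freq instead.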
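The sketch below shows both behaviors on a made-up two-column DataFrame. Calling `describe()` on the text column directly avoids depending on how a given Pandas version labels string dtypes.

```python
import pandas as pd

# Hypothetical data: one numeric column and one text column.
df = pd.DataFrame({"score": [70, 80, 90, 100], "grade": ["C", "B", "A", "A"]})

numeric_stats = df.describe()         # default: numeric columns only ("grade" is skipped)
print(numeric_stats)

grade_stats = df["grade"].describe()  # text data: count, unique, top, freq
print(grade_stats)
```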

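### Q: Why is handling missing data important in Pandas?
**Answer:** Missing values (`NaN`/`None`) can bias results or cause errors in calculations and models. Detecting them with `isna()` and then dropping (`dropna()`), filling (`fillna()`), or interpolating (`interpolate()`) them keeps analyses and model inputs correct.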
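The sketch below applies each handling strategy to a small made-up series with two gaps. Note that `mean()` skips `NaN` values by default.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with two missing values.
temps = pd.Series([20.0, np.nan, 24.0, np.nan, 28.0])

print(temps.isna().sum())          # 2 missing values
dropped = temps.dropna()           # keep only observed values
filled = temps.fillna(temps.mean())  # mean of 20, 24, 28 is 24.0
interpolated = temps.interpolate()   # linear fill: 22.0 and 26.0

print(dropped)
print(filled)
print(interpolated)
```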

### Q: What are the benefits of using Plotly for data visualization?
**Answer:** Plotly produces interactive, publication-quality web plots that are easy to share. It also offers a wide range of chart types that users can interact with, including 3D plots and maps.

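### Q: How does NumPy handle multidimensional arrays?
**Answer:** NumPy represents a multidimensional array as an `ndarray`: one block of memory plus metadata giving its dtype, its `shape` (the size of each dimension), and its `strides` (how many bytes to step in memory to move along each dimension). Because of strides, operations such as transposing or slicing can return views of the same memory without copying data.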
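This sketch shows the shape and strides of a small array and how a transpose reuses the same buffer. It assumes `int64` elements (8 bytes each), which the stride values depend on.

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)
print(a.shape, a.ndim)          # (3, 4) 2
print(a.strides)                # (32, 8): bytes to the next row / next column

t = a.T                         # transpose swaps shape and strides, copies no data
print(t.shape, t.strides)       # (4, 3) (8, 32)
print(np.shares_memory(a, t))   # True
print(t.flags["C_CONTIGUOUS"])  # False: the view is not laid out row by row
```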

### Q: What is the role of Bokeh in data visualization?
**Answer:** Bokeh creates interactive, browser-based visualizations suitable for dashboards and streaming plots.

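### Q: Explain the difference between apply() and map() in Pandas
**Answer:** `Series.map()` transforms each element of a Series using a function, a dict, or another Series as the mapping. `apply()` is more general. On a Series it calls a function on the values. On a DataFrame it calls a function on each column (`axis=0`, the default) or on each row (`axis=1`). For element-wise operations on an entire DataFrame, use `DataFrame.map()`, which was called `applymap()` before pandas 2.1.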
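The sketch below contrasts the two methods. The Series of animal names and the two-column DataFrame are made-up examples.

```python
import pandas as pd

s = pd.Series(["cat", "dog", "cat"])
labels = s.map({"cat": "feline", "dog": "canine"})  # dict lookup per element
lengths = s.map(len)                                # function per element

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})
# axis=0 (default): the function receives each column as a Series.
col_range = df.apply(lambda col: col.max() - col.min())
# axis=1: the function receives each row as a Series.
row_total = df.apply(lambda row: row["A"] + row["B"], axis=1)

print(labels, lengths, col_range, row_total, sep="\n\n")
```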

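### Q: What are some advanced features of NumPy?
**Answer:** Advanced features include broadcasting, advanced (integer-array and boolean) indexing, linear algebra routines (`numpy.linalg`), fast Fourier transforms (`numpy.fft`), random sampling (`numpy.random`), and array views and memory-mapped files (`numpy.memmap`) that work with data without copying it.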
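A few of these features fit in one short sketch. The array values and the 2×2 linear system are arbitrary illustration data. The generator is seeded only so repeated runs match.

```python
import numpy as np

x = np.array([10, 20, 30, 40, 50])
picked = x[[0, 2, 4]]        # integer-array ("fancy") indexing -> [10 30 50]
masked = x[x > 25]           # boolean-mask indexing -> [30 40 50]

# Linear algebra: solve 2a + b = 5 and a + 3b = 10.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
rhs = np.array([5.0, 10.0])
solution = np.linalg.solve(A, rhs)   # a = 1, b = 3

rng = np.random.default_rng(seed=0)  # reproducible random sampling
samples = rng.normal(size=3)

print(picked, masked, solution, samples, sep="\n")
```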

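### Q: How does Pandas simplify time series analysis?
**Answer:** Pandas provides datetime types and a `DatetimeIndex`, resampling (`resample()`), rolling window functions (`rolling()`), shifting (`shift()`), and frequency conversion (`asfreq()`), which together make time series processing straightforward.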
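The sketch below uses six days of made-up daily counts to show resampling, a rolling mean, and shifting on a `DatetimeIndex`.

```python
import pandas as pd

# Hypothetical daily visit counts.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
visits = pd.Series([5, 7, 6, 10, 12, 9], index=idx)

two_day = visits.resample("2D").sum()     # downsample into 2-day buckets
moving = visits.rolling(window=3).mean()  # 3-day moving average (first 2 are NaN)
previous = visits.shift(1)                # previous day's value aligned to each date

print(two_day, moving, previous, sep="\n\n")
```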

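### Q: What is the role of a pivot table in Pandas?
**Answer:** `pivot_table()` summarizes data by grouping rows on one or more keys and aggregating a value column (the mean by default) into a new table, with the groups laid out as rows and columns. It is useful for cross-tabulations.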
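A minimal cross-tabulation sketch follows. The `sales` data is hypothetical, and `aggfunc="sum"` overrides the default mean.

```python
import pandas as pd

# Hypothetical sales records.
sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "East"],
    "product": ["A", "B", "A", "B", "A"],
    "units":   [10, 5, 7, 3, 4],
})

# Regions become rows, products become columns, units are summed per cell.
table = pd.pivot_table(sales, index="region", columns="product",
                       values="units", aggfunc="sum")
print(table)
```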

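### Q: Why is NumPy’s array slicing faster than Python’s list slicing?
**Answer:** Basic NumPy slicing returns a view: a new array object that describes part of the existing buffer through an offset, shape, and strides. No element data is copied, so the cost does not grow with the size of the slice. Slicing a Python list builds a new list and copies a reference to every element in the slice, so it takes time proportional to the slice length. The trade-off is that modifying a NumPy view also modifies the original array.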
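The view-versus-copy behavior is easy to observe directly. This is a minimal sketch with arbitrary values.

```python
import numpy as np

arr = np.arange(6)
view = arr[1:4]          # basic slice: a view onto arr's buffer, no data copied
view[0] = 100
print(arr)               # [  0 100   2   3   4   5]: the change shows up in arr
print(np.shares_memory(arr, view))  # True

lst = list(range(6))
sub = lst[1:4]           # a new list holding references to the same elements
sub[0] = 100
print(lst)               # [0, 1, 2, 3, 4, 5]: the original list is unchanged

safe = arr[1:4].copy()   # call .copy() when an independent array is needed
```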

### Q: What are some common use cases for Seaborn?
**Answer:** Seaborn is commonly used for visualizing distributions, categorical data, correlation heatmaps, regression plots, and pairwise relationships.

### Practical: Create a 2D NumPy array and calculate the sum of each row

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print('Array:\n', arr)
# axis=1 sums across the columns of each row
print('Row sums:', arr.sum(axis=1))
```

### Practical: Write a Pandas script to find the mean of a specific column in a DataFrame

```python
import pandas as pd

data = {'A': [10, 20, 30], 'B': [5, 15, 25]}
df = pd.DataFrame(data)
print('DataFrame:\n', df)
print('\nMean of column A:', df['A'].mean())
```

### Practical: Create a scatter plot using Matplotlib

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
plt.figure()
plt.scatter(x, y)
plt.title('Scatter Plot')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
```

### Practical: How do you calculate the correlation matrix and visualize it with a Seaborn heatmap?

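```python
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

data = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [4, 5, 6, 7], 'C': [7, 8, 9, 10]})
print('Data:\n', data)
# Pandas computes the (Pearson) correlation matrix; Seaborn only draws it.
corr = data.corr()
print('\nCorrelation matrix:\n', corr)
plt.figure()
# Fix the color scale to the full correlation range [-1, 1].
sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap='coolwarm')
plt.title('Correlation heatmap')
plt.show()
```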

### Practical: Generate a bar plot using Plotly

```python
import plotly.express as px
import pandas as pd

data = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Value': [10, 20, 15]})
fig = px.bar(data, x='Name', y='Value', title='Bar Plot')
fig.show()
```

### Practical: Create a DataFrame and add a new column based on an existing column

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})
df['C'] = df['B'] * 2
print(df)
```

### Practical: Write a program to perform element-wise multiplication of two NumPy arrays

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print('Array a:', a)
print('Array b:', b)
print('Element-wise multiplication:', a * b)
```

### Practical: Create a line plot with multiple lines using Matplotlib

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y1 = [1, 4, 9, 16]
y2 = [2, 4, 6, 8]
plt.figure()
plt.plot(x, y1, label='y1')
plt.plot(x, y2, label='y2')
plt.legend()
plt.title('Multiple Lines')
plt.show()
```

### Practical: Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 10, 15], 'B': [20, 25, 30]})
print('Original DataFrame:\n', df)
# df['A'] > 7 builds a boolean mask; indexing with it keeps the True rows.
print('\nFiltered rows (A > 7):\n', df[df['A'] > 7])
```

### Practical: Create a histogram using Seaborn to visualize a distribution

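```python
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# 1000 samples from a standard normal distribution (seeded for reproducibility).
data = np.random.default_rng(0).standard_normal(1000)
plt.figure()
sns.histplot(data, bins=30, kde=True)
plt.title('Histogram with KDE')
plt.show()
```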

### Practical: Perform matrix multiplication using NumPy

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
print('Matrix A:\n', a)
print('\nMatrix B:\n', b)
# For 2-D arrays np.dot is matrix multiplication; a @ b gives the same result.
print('\nA dot B:\n', np.dot(a, b))
```

### Practical: Use Pandas to load a CSV file and display its first 5 rows

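```python
import pandas as pd
from io import StringIO

# StringIO stands in for a file so the cell runs anywhere;
# with a real file, pass its path instead, e.g. pd.read_csv('data.csv') (hypothetical name).
csv_data = 'A,B,C\n1,2,3\n4,5,6\n7,8,9\n10,11,12\n13,14,15'
df = pd.read_csv(StringIO(csv_data))
print(df.head())  # head() returns the first 5 rows by default
```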

### Practical: Create a 3D scatter plot using Plotly

```python
import plotly.express as px
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [10, 11, 12, 13], 'z': [5, 6, 7, 8]})
fig = px.scatter_3d(df, x='x', y='y', z='z', title='3D Scatter Plot')
fig.show()
```