# Data Toolkit Assignment

## Theory Questions

1. **What is NumPy, and why is it widely used in Python?**  
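   NumPy is the foundational Python library for numerical computing. It provides the `ndarray`, a fast n-dimensional array whose elements share one data type, plus vectorized mathematical, linear-algebra, and random-number functions. Pandas, SciPy, scikit-learn, and Matplotlib all build on it, which is why it is so widely used.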

2. **How does broadcasting work in NumPy?**  
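   Broadcasting lets NumPy perform element-wise operations on arrays of different shapes. Shapes are compared from the last dimension backwards: two dimensions are compatible when they are equal or when one of them is 1, and a missing leading dimension counts as 1. Size-1 dimensions are then virtually stretched to match the other array, without copying data. If any pair of dimensions is incompatible, NumPy raises a `ValueError`.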

3. **What is a Pandas DataFrame?**  
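   A DataFrame is Pandas' two-dimensional labeled table, similar to a spreadsheet or SQL table: rows are identified by an index, columns by names, and each column can hold its own data type. It provides methods for selecting, filtering, joining, reshaping, and aggregating data.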

4. **Explain the use of the groupby() method in Pandas.**  
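   The `groupby()` method splits rows into groups based on the values of one or more columns, applies a function such as `sum()` or `mean()` to each group, and combines the results into a new Series or DataFrame (the "split-apply-combine" pattern).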

5. **Why is Seaborn preferred for statistical visualizations?**  
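   Seaborn is built on Matplotlib and offers high-level functions that work directly with Pandas DataFrames. Many of its plots compute statistics automatically (kernel density estimates, regression fits, confidence intervals), and it ships with polished default themes and color palettes, so informative statistical plots take only a line or two of code.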

6. **What are the differences between NumPy arrays and Python lists?**  
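   NumPy arrays store elements of a single fixed data type in one block of memory, whereas Python lists store references to arbitrary Python objects. As a result, arrays use less memory and are much faster for numerical work, and they support element-wise arithmetic directly; on lists, `+` and `*` mean concatenation and repetition instead.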

7. **What is a heatmap, and when should it be used?**  
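   A heatmap displays a matrix of values as a grid of colored cells, with color encoding magnitude. Use it to spot patterns in two-dimensional data, most commonly correlation matrices, but also confusion matrices or counts across two categorical variables.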

8. **What does the term "vectorized operation" mean in NumPy?**  
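   A vectorized operation applies a computation to whole arrays at once, such as `a + b` or `np.sqrt(a)`, instead of looping over elements in Python. The loop still happens, but inside NumPy's compiled code, so the result is both faster and more concise.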

9. **How does Matplotlib differ from Plotly?**  
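   Matplotlib is a mature, lower-level library suited to static, publication-quality figures with fine-grained control over every element. Plotly produces interactive, browser-based figures with built-in zooming, panning, and hover tooltips, which suits dashboards and web applications.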

10. **What is the significance of hierarchical indexing in Pandas?**  
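    Hierarchical indexing (a `MultiIndex`) gives a Series or DataFrame several levels of row or column labels, for example year and quarter. This lets a 2D table represent higher-dimensional data and makes it easy to select, aggregate, or reshape (`stack()`/`unstack()`) by any level.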

11. **What is the role of Seaborn’s pairplot() function?**  
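    `pairplot()` draws a grid of plots for every pair of numeric columns in a DataFrame: scatter plots off the diagonal and each column's own distribution on the diagonal. Its `hue` parameter colors points by a categorical column, which makes correlations and clusters easy to spot.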

12. **What is the purpose of the describe() function in Pandas?**  
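    `describe()` returns summary statistics for each numeric column: count, mean, standard deviation, minimum, 25th/50th/75th percentiles, and maximum. For text columns it reports the count, number of unique values, most frequent value (`top`), and its frequency (`freq`).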

13. **Why is handling missing data important in Pandas?**  
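    Missing values (`NaN`, `None`) can break calculations or silently bias results. Pandas lets you detect them with `isna()`, drop affected rows or columns with `dropna()`, or impute replacements with `fillna()` (for example a constant or the column mean) or `ffill()` (the previous valid value).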

14. **What are the benefits of using Plotly for data visualization?**  
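    Plotly creates interactive figures (zoom, pan, hover tooltips) with little code, especially through its high-level `plotly.express` interface. Figures render in notebooks and browsers, can be saved as standalone HTML, and integrate with Dash for building dashboards. It also supports 3D, geographic, and animated charts.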

15. **How does NumPy handle multidimensional arrays?**  
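    NumPy's `ndarray` can have any number of dimensions (axes). Its `shape` attribute gives the size along each axis, an element is accessed with one index per axis (e.g. `a[i, j, k]`), and many functions accept an `axis` argument to operate along a chosen dimension.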

16. **What is the role of Bokeh in data visualization?**  
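    Bokeh is a library for interactive visualizations that render in web browsers. It supports linked plots, widgets, and streaming data, and the Bokeh server can turn plots into interactive web applications and dashboards.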

17. **Explain the difference between apply() and map() in Pandas.**  
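    `DataFrame.apply()` calls a function once per column (`axis=0`, the default) or once per row (`axis=1`), passing a whole Series each time. `Series.map()` works element-wise on a single Series and accepts either a function or a dict/Series that maps old values to new ones; values with no match in the mapping become `NaN`.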

18. **What are some advanced features of NumPy?**  
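    Beyond basic arrays, NumPy provides linear algebra (`np.linalg`: solving systems, inverses, eigenvalues), random sampling (`np.random`), fast Fourier transforms (`np.fft`), broadcasting, and advanced indexing with boolean masks and integer arrays.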

19. **How does Pandas simplify time series analysis?**  
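    With a `DatetimeIndex`, Pandas lets you slice by date strings, change frequency with `resample()` (e.g. daily to weekly), compute moving-window statistics with `rolling()`, and lag or lead values with `shift()`. `pd.to_datetime()` and `pd.date_range()` handle parsing and generating dates.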

20. **What is the role of a pivot table in Pandas?**  
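    A pivot table (`pd.pivot_table()`) summarizes data in a grid: the values of one column become row labels, another column's values become column labels, and each cell aggregates a third column (mean by default, or `sum`, `count`, and so on). This makes it easy to compare totals or averages across two categories.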

21. **Why is NumPy’s array slicing faster than Python’s list slicing?**  
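    Basic slicing of a NumPy array returns a view: the new array shares the original's memory buffer and only records a different offset, shape, and strides, so no elements are copied. Slicing a Python list builds a new list and copies every element reference. The trade-off is that modifying a NumPy slice also modifies the original array.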

22. **What are some common use cases for Seaborn?**  
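    Seaborn is commonly used for distribution plots (histograms, KDE, box and violin plots), relationship plots (scatter and regression plots), categorical comparisons (bar, count, and strip plots), and correlation heatmaps.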
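
A minimal sketch of the broadcasting rule from question 2, using small made-up arrays. It shows a 1D row and a column vector being stretched across a 2D array, and a shape mismatch that NumPy rejects:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])    # shape (2, 3)
row = np.array([10, 20, 30])     # shape (3,), treated as (1, 3)
col = np.array([[100], [200]])   # shape (2, 1)

print(matrix + row)  # row is added to each row of matrix
print(matrix + col)  # col is added to each column of matrix

# np.broadcast_to exposes the stretched shape as a read-only view, without copying.
stretched = np.broadcast_to(row, (2, 3))
print(stretched.shape)

# Trailing dimensions 3 and 2 are neither equal nor 1, so this fails.
try:
    matrix + np.array([1, 2])
except ValueError as err:
    print("cannot broadcast:", err)
```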
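
The split-apply-combine pattern from question 4, sketched on a small hypothetical sales table (the `sales` DataFrame and its columns are invented for illustration):

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "East"],
    "units": [10, 5, 7, 3, 8],
})

# Split rows by region, sum each group, combine into one Series (sorted by key).
units_per_region = sales.groupby("region")["units"].sum()
print(units_per_region)

# Several aggregations at once with agg().
summary = sales.groupby("region")["units"].agg(["sum", "mean", "count"])
print(summary)
```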
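
A short comparison for question 6, showing that the same operators do different things on a list and on an array, and that an array coerces its elements to one dtype:

```python
import numpy as np

py_list = [1, 2, 3]
arr = np.array([1, 2, 3])

# The same operators mean different things for the two types.
print(py_list * 2)    # [1, 2, 3, 1, 2, 3]  (list repetition)
print(arr * 2)        # [2 4 6]             (element-wise multiplication)
print(py_list + [4])  # [1, 2, 3, 4]        (list concatenation)
print(arr + 4)        # [5 6 7]             (scalar added to every element)

# A list can mix types; an array converts everything to one dtype.
mixed_arr = np.array([1, 2.5, 3])
print(mixed_arr.dtype)  # float64: the integers are upcast to floats
```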
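
To make question 8 concrete, the sketch below squares the same array with an explicit Python loop and with one vectorized expression, checks that the results match, and times both with `timeit`. Timings depend on the machine, so no numbers are claimed here; the vectorized version is usually much faster. The function names are illustrative:

```python
import timeit

import numpy as np

values = np.arange(200_000, dtype=np.float64)


def square_with_loop(a):
    # Explicit Python loop: the interpreter handles every element one by one.
    out = np.empty_like(a)
    for i in range(len(a)):
        out[i] = a[i] * a[i]
    return out


def square_vectorized(a):
    # One expression; the element loop runs inside NumPy's compiled code.
    return a * a


# Both approaches produce identical results.
assert np.array_equal(square_with_loop(values), square_vectorized(values))

loop_time = timeit.timeit(lambda: square_with_loop(values), number=1)
vec_time = timeit.timeit(lambda: square_vectorized(values), number=1)
print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.6f} s")
```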
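
A minimal example of hierarchical indexing for question 10, with invented year/quarter revenue figures:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2023", "Q1"), ("2023", "Q2"), ("2024", "Q1"), ("2024", "Q2")],
    names=["year", "quarter"],
)
revenue = pd.Series([100, 120, 130, 150], index=index)

print(revenue.loc["2024"])                 # all quarters of one year (outer level)
print(revenue.loc[("2023", "Q2")])         # a single value via the full key
print(revenue.xs("Q1", level="quarter"))   # cross-section on the inner level
print(revenue.unstack())                   # move the inner level into columns
```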
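
What `describe()` from question 12 returns, on a small made-up DataFrame with two numeric columns and one text column:

```python
import pandas as pd

df = pd.DataFrame({
    "age": [23, 35, 31, 44, 29],
    "score": [88.0, 92.5, 79.0, 85.5, 90.0],
    "city": ["Paris", "Lyon", "Paris", "Nice", "Paris"],
})

# By default only numeric columns are summarized; "city" is skipped.
stats = df.describe()
print(stats)

# On a text column, describe() reports count, unique, top (most frequent) and freq.
city_stats = df["city"].describe()
print(city_stats)
```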
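
The strategies from question 13 (detect, drop, impute) on a tiny hypothetical DataFrame. Filling `temp` with the column mean and `city` with a constant is just one choice; the right imputation depends on the data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temp": [21.5, np.nan, 23.0, 22.5],
    "city": ["Oslo", "Oslo", None, "Bergen"],
})

print(df.isna().sum())  # number of missing values per column

dropped = df.dropna()   # keep only rows with no missing values
# Impute per column: mean for the numeric column, a constant for the text column.
filled = df.fillna({"temp": df["temp"].mean(), "city": "unknown"})

print(dropped)
print(filled)
```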
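
Indexing a 3D array, as described in question 15. `np.arange(24).reshape(2, 3, 4)` gives a predictable layout, so each indexed value can be checked by hand:

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)  # 2 blocks x 3 rows x 4 columns
print(cube.ndim, cube.shape)           # 3 (2, 3, 4)

print(cube[1, 2, 3])           # block 1, row 2, column 3 -> 23
print(cube[0, :, 1])           # column 1 of block 0 -> [1 5 9]
print(cube.sum(axis=0).shape)  # summing over axis 0 leaves shape (3, 4)
```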
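
A side-by-side sketch for question 17 on a made-up two-column DataFrame. `apply()` receives whole columns or rows, while `map()` receives one element at a time:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# DataFrame.apply: the function receives a whole column (axis=0, the default)...
col_ranges = df.apply(lambda col: col.max() - col.min())
# ...or a whole row (axis=1).
row_totals = df.apply(lambda row: row["a"] + row["b"], axis=1)

# Series.map: element-wise on one Series, with a dict...
labels = df["a"].map({1: "low", 2: "mid"})  # 3 has no key, so it becomes NaN
# ...or with a function.
doubled = df["b"].map(lambda x: x * 2)

print(col_ranges, row_totals, labels, doubled, sep="\n\n")
```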
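
A brief tour of three features listed in question 18: solving a linear system with `np.linalg.solve`, seeded random sampling with `np.random.default_rng`, and locating a sine wave's frequency with `np.fft.rfft`. The equations and the 5 Hz test signal are illustrative choices:

```python
import numpy as np

# Linear algebra: solve 3x + y = 9 and x + 2y = 8 (exact solution: x = 2, y = 3).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)
print(x)

# Random sampling with the Generator API, seeded for reproducibility.
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=5)
print(samples)

# FFT: a 5 Hz sine sampled 64 times over 1 second peaks in frequency bin 5.
t = np.arange(64) / 64
signal = np.sin(2 * np.pi * 5 * t)
spectrum = np.abs(np.fft.rfft(signal))
print(np.argmax(spectrum))  # 5
```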
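
The time-series tools from question 19, applied to two weeks of hypothetical daily readings starting on Monday, 1 January 2024. `resample("W")` uses weeks ending on Sunday, so the data splits into two full weeks:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=14, freq="D")
readings = pd.Series(np.arange(14, dtype=float), index=idx)

# Label-based slicing with date strings (both ends inclusive).
print(readings.loc["2024-01-03":"2024-01-05"])

weekly_mean = readings.resample("W").mean()    # downsample to weekly bins
rolling_3 = readings.rolling(window=3).mean()  # 3-day moving average; first 2 are NaN

print(weekly_mean)
print(rolling_3.tail())
```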
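
A pivot table for question 20, built from an invented sales table: regions become rows, products become columns, and each cell holds total units (`aggfunc="sum"` overrides the default mean):

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "North"],
    "product": ["A", "B", "A", "B", "A"],
    "units": [10, 4, 6, 8, 5],
})

table = pd.pivot_table(
    sales,
    values="units",
    index="region",    # row labels
    columns="product", # column labels
    aggfunc="sum",     # how to combine rows that share a cell
    fill_value=0,      # shown where a region/product pair has no rows
)
print(table)
```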
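
The view behavior behind question 21: modifying a NumPy slice changes the original array, while modifying a list slice does not. `np.shares_memory` confirms which objects share a buffer:

```python
import numpy as np

arr = np.arange(10)
view = arr[2:5]       # basic slice: a view onto arr's buffer, no data copied
view[0] = 99
print(arr[2])         # 99: the original array changed too
print(np.shares_memory(arr, view))    # True

lst = list(range(10))
lst_slice = lst[2:5]  # list slice: a new list with copied references
lst_slice[0] = 99
print(lst[2])         # 2: the original list is unchanged

copied = arr[2:5].copy()  # use .copy() when an independent array is needed
print(np.shares_memory(arr, copied))  # False
```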

---

## Practical Exercises

### Create a 2D NumPy array and calculate the sum of each row
```python
import numpy as np
array = np.array([[1, 2, 3], [4, 5, 6]])
row_sums = array.sum(axis=1)  # axis=1 sums across the columns, giving one total per row
print(row_sums)
```

### Find the mean of a specific column in a DataFrame
```python
import pandas as pd
df = pd.DataFrame({"A": [10, 20, 30], "B": [40, 50, 60]})
mean_B = df['B'].mean()
print(mean_B)
```

### Create a scatter plot using Matplotlib
```python
import matplotlib.pyplot as plt
x = [1, 2, 3]
y = [4, 5, 6]
plt.scatter(x, y)
plt.show()
```

### Calculate and visualize the correlation matrix with Seaborn
```python
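import matplotlib.pyplot as plt
import seaborn as sns
corr_matrix = df.corr()  # df from the previous exercise; pairwise Pearson correlations
sns.heatmap(corr_matrix, annot=True)  # annot=True writes each value in its cell
plt.show()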
```

### Generate a bar plot using Plotly
```python
import pandas as pd
import plotly.express as px
df = pd.DataFrame({"Fruit": ["Apple", "Banana", "Orange"], "Count": [10, 20, 30]})
fig = px.bar(df, x='Fruit', y='Count')
fig.show()
```

### Create a DataFrame and add a new column based on an existing column
```python
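import pandas as pd
df = pd.DataFrame({"A": [10, 20, 30], "B": [40, 50, 60]})
df['C'] = df['A'] * 2  # new column derived element-wise from column A
print(df)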
```

### Element-wise multiplication of two NumPy arrays
```python
import numpy as np
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
result = array1 * array2  # element-wise product: [4, 10, 18]
print(result)
```

### Create a line plot with multiple lines using Matplotlib
```python
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [4, 5, 6], label='Line 1')
plt.plot([1, 2, 3], [6, 5, 4], label='Line 2')
plt.legend()
plt.show()
```

### Filter rows where a column value is greater than a threshold
```python
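import pandas as pd
df = pd.DataFrame({"A": [10, 20, 30], "B": [40, 50, 60]})
filtered_df = df[df['A'] > 15]  # boolean mask keeps only rows where A > 15
print(filtered_df)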
```

### Create a histogram using Seaborn
```python
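import matplotlib.pyplot as plt
import seaborn as sns
sns.histplot(df['A'])  # df: the DataFrame with column 'A' from the exercises above
plt.show()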
```

### Perform matrix multiplication using NumPy
```python
import numpy as np
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])
result_matrix = np.dot(matrix1, matrix2)  # for 2D arrays, equivalent to matrix1 @ matrix2
print(result_matrix)
```

### Load a CSV file and display the first 5 rows
```python
import pandas as pd
df = pd.read_csv('data.csv')  # 'data.csv' is a placeholder path; point it at your own file
print(df.head())
```

### Create a 3D scatter plot using Plotly
```python
import plotly.graph_objects as go
fig = go.Figure(data=[go.Scatter3d(x=[1, 2, 3], y=[4, 5, 6], z=[7, 8, 9], mode='markers')])
fig.show()
```
