## Data Toolkit
### Theory



1. **What is NumPy, and why is it widely used in Python?**
   NumPy is a library for numerical computing in Python that provides powerful n-dimensional arrays.
   It is widely used because of its speed, efficiency, and support for mathematical operations.

2. **How does broadcasting work in NumPy?**
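   Broadcasting lets NumPy perform element-wise operations on arrays of different but compatible shapes.
   Shapes are compared from the trailing dimension backwards; two dimensions are compatible when they are equal or one of them is 1, and the size-1 dimension is stretched virtually, without copying data.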
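
A minimal sketch of broadcasting, assuming a small 2x3 matrix combined with a row vector and a column vector; the array names are illustrative only.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])        # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,)
col = np.array([[100], [200]])        # shape (2, 1)

# (2, 3) + (3,): the row is stretched across both rows of the matrix
row_sum = matrix + row
# (2, 3) + (2, 1): the column is stretched across all three columns
col_sum = matrix + col

print(row_sum)   # [[11 22 33] [14 25 36]]
print(col_sum)   # [[101 102 103] [204 205 206]]
```

Shapes such as `(2, 3)` and `(2,)` are not compatible, so adding them raises a `ValueError`.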

3. **What is a Pandas DataFrame?**
   A DataFrame is a 2D, tabular data structure with labeled rows and columns.
   It is widely used for data manipulation, cleaning, and analysis.

4. **Explain the use of the groupby() method in Pandas.**
   The `groupby()` method is used to split data into groups based on column values.
   It enables efficient aggregation, transformation, and analysis of grouped data.
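
The split-then-aggregate idea can be sketched as follows, assuming a hypothetical `sales` table with `region` and `amount` columns.

```python
import pandas as pd

# Hypothetical data: one row per sale
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "amount": [100, 150, 200, 50],
})

# Split rows by region, then sum the amounts within each group
totals = sales.groupby("region")["amount"].sum()
print(totals)
```

The result is a Series indexed by the group keys, here `North` and `South`.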

5. **Why is Seaborn preferred for statistical visualizations?**
   Seaborn provides high-level functions to create informative and attractive plots easily.
   It integrates well with Pandas and supports complex statistical visualization.

6. **What are the differences between NumPy arrays and Python lists?**
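   NumPy arrays store elements of a single data type in contiguous memory, so they are faster and more compact for numerical work and support vectorized operations.
   Python lists can hold mixed types and resize freely, but they are slower for numerical computations.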

7. **What is a heatmap, and when should it be used?**
   A heatmap is a graphical representation of data using color intensity.
   It is best used for visualizing correlation, density, or matrix data.

8. **What does the term “vectorized operation” mean in NumPy?**
   Vectorized operations apply a function to entire arrays at once, without explicit Python loops.
   The looping happens in compiled code inside NumPy, which makes the code faster and more concise.
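
As an illustration, the sketch below computes the same result with an explicit loop and with a vectorized expression; the variable names are assumptions made for the example.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

# Loop version: one Python-level operation per element
looped = np.array([v * v + 1 for v in values])

# Vectorized version: one expression over the whole array
vectorized = values ** 2 + 1

print(np.array_equal(looped, vectorized))  # True
```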

9. **How does Matplotlib differ from Plotly?**
   Matplotlib primarily produces static figures and offers fine-grained control over every plot element.
   Plotly produces interactive, browser-based plots (hover, zoom, pan) that are well suited to dashboards.

10. **What is the significance of hierarchical indexing in Pandas?**
    Hierarchical indexing allows multiple levels of row/column labels.
    It helps in handling complex datasets and performing advanced slicing.
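
A small sketch of hierarchical indexing, assuming a hypothetical revenue Series indexed by year and quarter.

```python
import pandas as pd

# Two index levels: year (outer) and quarter (inner)
index = pd.MultiIndex.from_tuples(
    [("2023", "Q1"), ("2023", "Q2"), ("2024", "Q1"), ("2024", "Q2")],
    names=["year", "quarter"],
)
revenue = pd.Series([10, 12, 15, 18], index=index)

# Select every quarter of one year via the outer level
year_2024 = revenue.loc["2024"]

# Select one quarter across all years via the inner level
q1_all_years = revenue.xs("Q1", level="quarter")

print(year_2024)
print(q1_all_years)
```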

11. **What is the role of Seaborn’s pairplot() function?**
    The `pairplot()` function draws a grid of scatterplots for every pair of numeric variables, with each variable's own distribution on the diagonal.
    It is useful for spotting relationships and distributions across a dataset at a glance.

12. **What is the purpose of the describe() function in Pandas?**
    The `describe()` function generates summary statistics for numerical columns by default: count, mean, standard deviation, minimum, quartiles, and maximum.
    It helps in quickly understanding data distribution and spread.
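
For example, on a hypothetical single-column DataFrame the summary looks like this.

```python
import pandas as pd

df = pd.DataFrame({"score": [10, 20, 30, 40]})

# Rows of the result: count, mean, std, min, 25%, 50%, 75%, max
summary = df.describe()
print(summary)
```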

13. **Why is handling missing data important in Pandas?**
    Missing data can lead to biased or incorrect analysis results.
    Pandas provides tools to detect (`isna()`), drop (`dropna()`), and fill or impute (`fillna()`, `interpolate()`) missing values.
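
A minimal sketch of the detect, drop, and fill options, assuming a hypothetical table with one numeric and one text column. Filling with the column mean is just one possible imputation choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temp": [20.0, np.nan, 24.0],
    "city": ["A", "B", None],
})

# Count missing values per column
missing_per_column = df.isna().sum()

# Option 1: drop every row that has at least one missing value
dropped = df.dropna()

# Option 2: fill each column with a column-specific replacement
filled = df.fillna({"temp": df["temp"].mean(), "city": "unknown"})

print(missing_per_column)
print(dropped)
print(filled)
```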

14. **What are the benefits of using Plotly for data visualization?**
    Plotly allows creating highly interactive, zoomable, and shareable visualizations.
    It supports 3D plots, dashboards, and web integration.

15. **How does NumPy handle multidimensional arrays?**
    NumPy represents data of any dimensionality with the `ndarray` type, which records a shape and a single element type.
    It provides methods for reshaping, slicing, and operating along a chosen axis efficiently.
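
The following sketch reshapes a flat range into a 3D array and then slices and reduces it; the names are illustrative.

```python
import numpy as np

# 24 values arranged as 2 blocks of 3 rows and 4 columns
cube = np.arange(24).reshape(2, 3, 4)
print(cube.ndim, cube.shape)      # 3 (2, 3, 4)

first_block = cube[0]             # shape (3, 4)
last_column = cube[:, :, -1]      # shape (2, 3)

# Reduce along axis 0: add the two blocks element by element
summed_blocks = cube.sum(axis=0)  # shape (3, 4)
print(last_column)
```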

16. **What is the role of Bokeh in data visualization?**
    Bokeh is used to create interactive, web-ready visualizations in Python.
    It is suitable for dashboards and large dataset visualizations.

17. **Explain the difference between apply() and map() in Pandas.**
    `map()` works element-wise on a Series, applying a function to each value.
    `apply()` works on Series or DataFrame, applying functions across rows/columns.
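
The difference can be sketched on a small, hypothetical DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# map(): element-wise on a single Series
doubled = df["a"].map(lambda v: v * 2)

# apply() on a DataFrame: the function receives a whole column (axis=0, the default)
col_ranges = df.apply(lambda col: col.max() - col.min())

# apply() with axis=1: the function receives a whole row
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

print(doubled.tolist(), col_ranges.to_dict(), row_sums.tolist())
```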

18. **What are some advanced features of NumPy?**
    NumPy offers linear algebra, Fourier transforms, and random number generation.
    It also supports broadcasting and advanced indexing techniques.

19. **How does Pandas simplify time series analysis?**
    Pandas provides date-time indexing, resampling, and frequency conversion tools.
    It simplifies handling trends, seasonality, and time-based calculations.
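
A short sketch of date-time indexing with resampling and a rolling window, assuming six days of made-up daily values.

```python
import pandas as pd

# Daily series with a DatetimeIndex
idx = pd.date_range("2024-01-01", periods=6, freq="D")
daily = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Downsample to 2-day bins and sum within each bin
two_day_totals = daily.resample("2D").sum()

# 3-day rolling mean; the first two entries are NaN (window not yet full)
rolling_mean = daily.rolling(window=3).mean()

print(two_day_totals)
print(rolling_mean)
```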

20. **What is the role of a pivot table in Pandas?**
    A pivot table summarizes data by grouping and aggregating values.
    It helps restructure and analyze datasets efficiently.
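
As a sketch, the hypothetical sales records below are reshaped so that regions become rows and products become columns.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "product": ["pen", "ink", "pen", "pen"],
    "units": [5, 3, 4, 6],
})

# Sum units for each (region, product) pair; combinations with no sales become 0
table = pd.pivot_table(
    sales,
    index="region",
    columns="product",
    values="units",
    aggfunc="sum",
    fill_value=0,
)
print(table)
```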

21. **Why is NumPy’s array slicing faster than Python’s list slicing?**
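    NumPy arrays store data in contiguous memory blocks, and basic slicing returns a view on that memory rather than a copy.
    Slicing a Python list allocates a new list and copies the element references, which costs more as the slice grows.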
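
Part of the difference is that basic NumPy slicing returns a view on the same memory, while slicing a list builds a new list. The sketch below makes that visible by writing through each slice.

```python
import numpy as np

arr = np.arange(5)
view = arr[1:4]        # no data copied: a view on arr's memory
view[0] = 99           # writes through to arr

lst = [0, 1, 2, 3, 4]
part = lst[1:4]        # a new list holding copied references
part[0] = 99           # lst is unaffected

print(arr)                           # [ 0 99  2  3  4]
print(lst)                           # [0, 1, 2, 3, 4]
print(np.shares_memory(arr, view))   # True
```

Because a view aliases the original array, call `.copy()` on the slice when an independent array is needed.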

22. **What are some common use cases for Seaborn?**
    Seaborn is used for visualizing distributions, correlations, and categorical data.
    It provides plots like boxplots, heatmaps, and violin plots.




### Code

# 1. How do you create a 2D NumPy array and calculate the sum of each row?
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.sum(axis=1))

# 2. Write a Pandas script to find the mean of a specific column in a DataFrame
import pandas as pd
df = pd.DataFrame({"A": [10, 20, 30], "B": [5, 15, 25]})
print(df["A"].mean())

# 3. Create a scatter plot using Matplotlib
import matplotlib.pyplot as plt
x, y = [1, 2, 3, 4], [5, 6, 7, 8]
plt.scatter(x, y)
plt.show()

# 4. How do you calculate the correlation matrix using Seaborn and visualize it with a heatmap?
import seaborn as sns
df = pd.DataFrame(np.random.randn(5, 3), columns=list("ABC"))
sns.heatmap(df.corr(), annot=True)
plt.show()

# 5. Generate a bar plot using Plotly
import plotly.express as px
df = pd.DataFrame({"Fruit": ["Apple", "Banana", "Cherry"], "Count": [10, 20, 15]})
fig = px.bar(df, x="Fruit", y="Count")
fig.show()

# 6. Create a DataFrame and add a new column based on an existing column
df = pd.DataFrame({"X": [1, 2, 3]})
df["Y"] = df["X"] * 2
print(df)

# 7. Write a program to perform element-wise multiplication of two NumPy arrays
a, b = np.array([1, 2, 3]), np.array([4, 5, 6])
print(a * b)

# 8. Create a line plot with multiple lines using Matplotlib
x = [1, 2, 3, 4]
plt.plot(x, [y*2 for y in x], label="2x")
plt.plot(x, [y*3 for y in x], label="3x")
plt.legend()
plt.show()

# 9. Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold
df = pd.DataFrame({"A": [5, 10, 15, 20]})
print(df[df["A"] > 10])

# 10. Create a histogram using Seaborn to visualize a distribution
sns.histplot(df["A"], bins=5)
plt.show()

# 11. Perform matrix multiplication using NumPy
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
print(a @ b)  # equivalent to np.dot(a, b) for 2D arrays

# 12. Use Pandas to load a CSV file and display its first 5 rows
# Assumes a file named sample.csv exists in the working directory
df = pd.read_csv("sample.csv")
print(df.head())

# 13. Create a 3D scatter plot using Plotly
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6], "z": [7, 8, 9]})
fig = px.scatter_3d(df, x="x", y="y", z="z")
fig.show()
