# DATA TOOLKIT

1. What is NumPy, and why is it widely used in Python?
-  NumPy (Numerical Python) is a Python library used for efficient numerical computations, providing support for large multidimensional arrays, matrices, and mathematical functions. It’s widely used because it is fast, memory-efficient, and supports vectorized operations, making it ideal for data analysis, machine learning, and scientific computing.

2. How does broadcasting work in NumPy?
-  Broadcasting in NumPy allows arrays of different shapes to be combined in arithmetic operations. Smaller arrays are automatically “stretched” to match the shape of larger arrays without making copies, following specific broadcasting rules, which helps perform element-wise operations efficiently.
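
The rule NumPy applies is to compare shapes from the trailing dimension backwards: two dimensions are compatible when they are equal or when one of them is 1. A minimal sketch of two common cases, using made-up arrays (`matrix`, `row`, `column` are illustrative names):

```python
import numpy as np

# A (3, 3) matrix and a (3,) row vector: the vector is reused for every row.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
row = np.array([10, 20, 30])
added = matrix + row      # shapes (3, 3) and (3,) -> (3, 3)

# A (3, 1) column and a (3,) row broadcast to a (3, 3) grid.
column = np.array([[1], [2], [3]])
grid = column * row       # shapes (3, 1) and (3,) -> (3, 3)

print(added)
print(grid)
```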

3. What is a Pandas DataFrame?
-  A Pandas DataFrame is a two-dimensional, size-mutable, and heterogeneous tabular data structure with labeled rows and columns. It is one of Pandas’ core data structures for storing and manipulating structured data.

4. Explain the use of the groupby() method in Pandas.
-  The groupby() method in Pandas is used to split data into groups based on column values, apply functions (like sum, mean, or custom functions) to each group, and combine the results into a new DataFrame or Series.
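
The split-apply-combine steps can be sketched as follows. The `sales` table and its column names are invented for illustration:

```python
import pandas as pd

sales = pd.DataFrame({
    'region': ['North', 'South', 'North', 'South', 'East'],
    'amount': [100, 150, 200, 50, 75],
})

# Split by region, apply sum to each group, combine into a Series.
total_by_region = sales.groupby('region')['amount'].sum()

# Several aggregations at once give a DataFrame with one row per group.
summary = sales.groupby('region')['amount'].agg(['sum', 'mean', 'count'])

print(total_by_region)
print(summary)
```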

5. Why is Seaborn preferred for statistical visualizations?
-  Seaborn is preferred because it is built on top of Matplotlib and provides a high-level interface for creating attractive, informative statistical graphics with minimal code. It includes built-in themes, color palettes, and functions for complex plots like heatmaps, violin plots, and pair plots.

6. What are the differences between NumPy arrays and Python lists?
-  NumPy arrays are homogeneous (same data type), support vectorized operations, and are more memory-efficient, while Python lists can store mixed data types, are more flexible, but slower for numerical computations.
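
Two of these differences are easy to see directly. The sketch below uses small made-up values; it shows that `*` means repetition for a list but element-wise multiplication for an array, and that an array coerces mixed numbers to one dtype:

```python
import numpy as np

numbers = [1, 2, 3]
array = np.array(numbers)

# '*' repeats a list but multiplies an array element-wise.
repeated = numbers * 2
doubled = array * 2
print(repeated)
print(doubled)

# A list keeps each object's own type; an array converts to a single dtype.
mixed_list = [1, 2.5, 3]
mixed_array = np.array(mixed_list)
list_types = [type(v).__name__ for v in mixed_list]
print(list_types)
print(mixed_array.dtype)
```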

7. What is a heatmap, and when should it be used?
-  A heatmap is a data visualization that uses color shading to represent values in a matrix. It’s used to show data density, correlations, or magnitude patterns in datasets, such as a correlation matrix.

8. What does the term “vectorized operation” mean in NumPy?
-  A vectorized operation is one that applies an operation to entire arrays at once, without using explicit loops. This improves speed and code readability.
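
As a rough illustration, the loop and the vectorized expression below compute the same result; the `prices` array and the formula are arbitrary examples:

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0, 40.0])

# Explicit Python loop: one element at a time.
looped = np.empty_like(prices)
for i in range(len(prices)):
    looped[i] = prices[i] * 1.1 + 5

# Vectorized: the same arithmetic applied to the whole array in one expression.
vectorized = prices * 1.1 + 5

print(np.allclose(looped, vectorized))
```

On large arrays the vectorized form is usually much faster because the per-element work runs in compiled code rather than in the Python interpreter.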

9. How does Matplotlib differ from Plotly?
-  Matplotlib is a lower-level plotting library that gives fine-grained control over every figure element and is mainly used for static, publication-quality plots, although it also supports interactive backends and animation. Plotly produces interactive, browser-based figures with zooming, hovering, and panning built in, which makes it a better fit for dashboards and web-based visualizations.

10. What is the significance of hierarchical indexing in Pandas?
-  Hierarchical indexing (MultiIndex) allows multiple levels of row or column labels, enabling more complex data organization, easier subsetting, and working with higher-dimensional data in a 2D DataFrame.
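
A minimal sketch with a two-level row index; the `year` and `quarter` levels and the revenue figures are invented for illustration:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('2023', 'Q1'), ('2023', 'Q2'), ('2024', 'Q1'), ('2024', 'Q2')],
    names=['year', 'quarter'],
)
revenue = pd.DataFrame({'revenue': [100, 120, 130, 160]}, index=index)

# Select every row under one outer-level label.
rows_2024 = revenue.loc['2024']

# Select a single value by giving both levels.
q2_2023 = revenue.loc[('2023', 'Q2'), 'revenue']

# Aggregate over one level of the index.
yearly = revenue.groupby(level='year')['revenue'].sum()

print(rows_2024)
print(q2_2023)
print(yearly)
```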

11. What is the role of Seaborn’s pairplot() function?
-  The pairplot() function creates a matrix of scatter plots for numerical features and histograms or KDE plots on the diagonal, helping visualize relationships between multiple variables at once.

12. What is the purpose of the describe() function in Pandas?
-  The describe() function generates summary statistics. By default it covers only numerical columns and reports count, mean, standard deviation, min, quartiles, and max.
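
A short sketch on a small made-up DataFrame. The `include='all'` argument is shown as one way to bring non-numeric columns into the summary:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Score': [85, 90, 78]})

numeric_summary = df.describe()             # numeric columns only by default
full_summary = df.describe(include='all')   # also summarizes non-numeric columns

print(numeric_summary)
print(full_summary)
```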

13. Why is handling missing data important in Pandas?
-  Handling missing data ensures accuracy in analysis, prevents errors in calculations, and improves the reliability of insights. Missing data can be handled by filling, imputing, or removing affected rows/columns.
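
The options mentioned above (removing and filling) can be sketched like this. The column names and values are illustrative, and mean imputation is only one of several possible strategies:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'temperature': [21.0, np.nan, 23.0, 24.0],
    'humidity': [40.0, 42.0, np.nan, 45.0],
})

missing_per_column = df.isna().sum()   # count missing values in each column
dropped = df.dropna()                  # remove rows containing any missing value
filled = df.fillna(df.mean())          # impute with each column's mean

print(missing_per_column)
print(dropped)
print(filled)
```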

14. What are the benefits of using Plotly for data visualization?
-  Plotly provides interactive, dynamic visualizations, supports multiple chart types, integrates with web frameworks, and allows exporting plots to HTML for sharing.

15. How does NumPy handle multidimensional arrays?
-  NumPy stores a multidimensional array (ndarray) as a single block of memory plus shape and stride metadata, and exposes attributes such as .shape and .ndim to describe its dimensions. This layout allows efficient indexing, slicing, and broadcasting.
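
A small sketch of inspecting and indexing a three-dimensional array; the `cube` array is an arbitrary example:

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)

print(cube.ndim)      # number of dimensions
print(cube.shape)     # size along each dimension
print(cube.strides)   # bytes to step along each dimension

element = cube[1, 2, 3]   # one index per axis
plane = cube[0]           # the first 3x4 block
column = cube[:, :, 0]    # first entry of the last axis, for every block and row

print(element)
print(plane.shape)
print(column)
```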

16. What is the role of Bokeh in data visualization?
-  Bokeh is a Python library for creating interactive, browser-based visualizations that can handle streaming data and large datasets efficiently.

17. Explain the difference between apply() and map() in Pandas.
-  map() is a Series method for element-wise transformations; it accepts a function, a dict, or another Series. apply() works on both Series and DataFrames; on a DataFrame it passes each column (axis=0) or each row (axis=1) to the function.
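
A minimal sketch of the two methods side by side. The DataFrame, the `Bonus` column, and the label mapping are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C'],
                   'Score': [85, 90, 78],
                   'Bonus': [5, 0, 10]})

# Series.map: element-wise on one column; accepts a function or a dict.
grades = df['Score'].map(lambda s: 'Pass' if s >= 80 else 'Fail')
labels = df['Name'].map({'A': 'Alpha', 'B': 'Beta', 'C': 'Gamma'})

# DataFrame.apply: the function receives a whole column (axis=0) or row (axis=1).
numeric = df[['Score', 'Bonus']]
column_max = numeric.apply(lambda col: col.max(), axis=0)
row_total = numeric.apply(lambda row: row['Score'] + row['Bonus'], axis=1)

print(grades)
print(labels)
print(column_max)
print(row_total)
```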

18. What are some advanced features of NumPy?
-  Advanced features include broadcasting, vectorization, masked arrays, structured arrays, random number generation, FFT (Fast Fourier Transform), and linear algebra functions.

19. How does Pandas simplify time series analysis?
-  Pandas provides specialized date/time data types, resampling, shifting, rolling windows, and built-in functions for indexing and manipulating time series data.
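
The features listed above can be sketched on a short daily series. The dates and values are invented for illustration:

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=6, freq='D')
series = pd.Series([10, 12, 11, 15, 14, 18], index=dates)

two_day_sum = series.resample('2D').sum()        # downsample into 2-day bins
previous_day = series.shift(1)                   # lag by one period
rolling_mean = series.rolling(window=3).mean()   # 3-day moving average
january_3 = series.loc['2024-01-03']             # label-based date lookup

print(two_day_sum)
print(previous_day)
print(rolling_mean)
print(january_3)
```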

20. What is the role of a pivot table in Pandas?
-  A pivot table reshapes data, summarizing it based on categorical values, aggregating data, and providing an easy way to compare different categories.
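
A small sketch using `pd.pivot_table`; the `sales` data and the choice of `sum` as the aggregation are assumptions made for the example:

```python
import pandas as pd

sales = pd.DataFrame({
    'region': ['North', 'North', 'South', 'South', 'North'],
    'product': ['Pen', 'Book', 'Pen', 'Book', 'Pen'],
    'amount': [10, 40, 15, 35, 20],
})

# Rows become regions, columns become products, cells hold summed amounts.
table = pd.pivot_table(sales, index='region', columns='product',
                       values='amount', aggfunc='sum')

print(table)
```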

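21. Why is NumPy’s array slicing faster than Python’s list slicing?
-  NumPy slicing is faster mainly because a basic slice returns a view onto the existing data buffer instead of copying elements, whereas slicing a Python list builds a new list and copies the object references. NumPy arrays also store values of a single type in one block of memory, while a Python list is an array of pointers to separate objects.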
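
The difference in slicing behavior can be checked directly. In this sketch (variable names are illustrative), a write through the array slice is visible in the original array, which shows that no element data was copied:

```python
import numpy as np

numbers = list(range(10))
array = np.arange(10)

list_slice = numbers[2:5]    # new list: the references are copied
array_slice = array[2:5]     # view: no element data is copied

# Writing through the array slice changes the original array...
array_slice[0] = 99
# ...while writing to the list slice leaves the original list alone.
list_slice[0] = 99

print(array[2])
print(numbers[2])
print(np.shares_memory(array, array_slice))
```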

22. What are some common use cases for Seaborn?
-  Common use cases include visualizing distributions (histograms, KDE plots), relationships (scatter, line plots), comparisons (bar, box plots), and correlations (heatmaps, pair plots).



# Practical answers

1. How do you create a 2D NumPy array and calculate the sum of each row?

In [None]:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
row_sum = np.sum(arr, axis=1)
print(row_sum)


2. Write a Pandas script to find the mean of a specific column in a DataFrame.

In [None]:
import pandas as pd
data = {'Name': ['A', 'B', 'C'], 'Score': [85, 90, 78]}
df = pd.DataFrame(data)
mean_score = df['Score'].mean()
print(mean_score)


3. Create a scatter plot using Matplotlib

In [None]:
import matplotlib.pyplot as plt
x = [5, 7, 8, 7]
y = [99, 86, 87, 88]
plt.scatter(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot')
plt.show()


4. How do you calculate a correlation matrix and visualize it with a Seaborn heatmap?

In [None]:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt  # needed for plt.show() below
data = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [4, 5, 6, 7],
    'C': [7, 8, 9, 10]
})
corr = data.corr()
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.show()


5. Generate a bar plot using Plotly

In [None]:
import plotly.express as px
data = {'Fruit': ['Apple', 'Banana', 'Cherry'], 'Quantity': [10, 20, 15]}
fig = px.bar(data, x='Fruit', y='Quantity', title='Fruit Quantity')
fig.show()


6. Create a DataFrame and add a new column based on an existing column

In [None]:
import pandas as pd
df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Score': [85, 90, 78]})
df['Grade'] = ['Pass' if x >= 80 else 'Fail' for x in df['Score']]
print(df)


7. Write a program to perform element-wise multiplication of two NumPy arrays

In [None]:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = a * b
print(result)


8. Create a line plot with multiple lines using Matplotlib

In [None]:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y1 = [2, 3, 4, 5, 6]
y2 = [3, 4, 5, 6, 7]
plt.plot(x, y1, label='Line 1')
plt.plot(x, y2, label='Line 2')
plt.legend()
plt.show()


9. Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold

In [None]:
import pandas as pd
df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Score': [85, 90, 78]})
filtered_df = df[df['Score'] > 80]
print(filtered_df)


10. Create a histogram using Seaborn to visualize a distribution

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
sns.histplot(data, kde=True)
plt.show()


11. Perform matrix multiplication using NumPy

In [None]:
import numpy as np
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
result = a @ b  # matrix product; same as np.matmul(a, b), and as np.dot(a, b) for 2-D arrays
print(result)


12. Use Pandas to load a CSV file and display its first 5 rows

In [None]:
import pandas as pd
df = pd.read_csv('data.csv')
print(df.head())


13. Create a 3D scatter plot using Plotly


In [None]:
import plotly.express as px
import pandas as pd
df = pd.DataFrame({
    'x': [1, 2, 3, 4],
    'y': [10, 11, 12, 13],
    'z': [5, 6, 7, 8]
})
fig = px.scatter_3d(df, x='x', y='y', z='z', title='3D Scatter Plot')
fig.show()
