## **Data Toolkit**

Q1. What is NumPy, and why is it widely used in Python?
- NumPy is a Python library for numerical computing. It handles large datasets efficiently through n-dimensional arrays, which are much faster than Python lists for numerical work. Many data science libraries, such as Pandas and SciPy, are built on top of NumPy because of its speed and reliability.

Q2. How does broadcasting work in NumPy?
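- Broadcasting lets NumPy perform element-wise operations on arrays of different shapes. Shapes are compared from the trailing dimension backward, and two dimensions are compatible when they are equal or one of them is 1. The smaller array then behaves as if it were stretched to match the larger one, without the data actually being copied. For example, adding a scalar to every element of an array works without writing loops.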
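
As a minimal sketch of the broadcasting idea described above, the snippet below adds a one-dimensional row to a 3x3 matrix and multiplies a column by a row. The array names and values are made up for illustration.

```python
import numpy as np

# A (3, 3) matrix and a (3,) row vector: shapes are compared right to left.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
row = np.array([10, 20, 30])

# The row behaves as if it were repeated for each of the 3 rows of `matrix`.
shifted = matrix + row

# A (3, 1) column combined with a (3,) row broadcasts to a (3, 3) result.
column = np.array([[1], [2], [3]])
grid = column * row

print(shifted)
print(grid.shape)  # (3, 3)
```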

Q3. What is a Pandas DataFrame?
- A DataFrame in Pandas is a two-dimensional labeled data structure, like a table in Excel. It has rows and columns with labels, making it easy to work with structured data.

Q4. Explain the use of the groupby() method in Pandas.
- The groupby() method is used to split data into groups based on one or more columns and then apply functions such as sum, mean, or count. For instance, it can be used to calculate the total sales for each region in a dataset.
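
The total-sales-per-region case mentioned above could look roughly like this. The `Region` and `Sales` column names and the figures are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical sales data; column names and values are illustrative only.
sales = pd.DataFrame({
    'Region': ['North', 'South', 'North', 'South', 'East'],
    'Sales': [100, 150, 200, 50, 75],
})

# Split the rows by region, then sum the Sales column within each group.
total_by_region = sales.groupby('Region')['Sales'].sum()
print(total_by_region)
```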

Q5. Why is Seaborn preferred for statistical visualizations?
- Seaborn is preferred because it creates attractive charts with simple commands and works directly with Pandas DataFrames. It is especially good for statistical plots like boxplots, violin plots, pair plots, and heatmaps.

Q6. What are the differences between NumPy arrays and Python lists?
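- NumPy arrays store elements of a single data type in a compact block of memory, so they are faster, use less memory, and support element-wise mathematical operations directly. Python lists can store mixed data types, but they are slower for numerical work and need loops or comprehensions for calculations.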
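
A small sketch of how the same operator behaves differently on the two types, and how an array settles on one data type. The variable names are illustrative.

```python
import numpy as np

values_list = [1, 2, 3]
values_array = np.array([1, 2, 3])

# On a list, * repeats the sequence; on an array, it multiplies each element.
repeated = values_list * 2    # [1, 2, 3, 1, 2, 3]
doubled = values_array * 2    # array([2, 4, 6])

# A list can hold mixed types; an array coerces its elements to one dtype.
mixed_list = [1, 'two', 3.0]
mixed_array = np.array([1, 2.5])  # the integer is upcast to float64

print(repeated, doubled, mixed_array.dtype)
```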

Q7. What is a heatmap, and when should it be used?
- A heatmap is a color-coded matrix used to show the intensity of values. It is often used for correlation matrices or confusion matrices to make patterns easier to see.

Q8. What does the term “vectorized operation” mean in NumPy?
- A vectorized operation applies a calculation to an entire array at once, without an explicit Python loop. For example, a + b adds two arrays element by element. This is faster than a Python loop because the iteration runs inside NumPy's compiled code.
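
To make the comparison concrete, here is a sketch that computes the same element-wise sum with an explicit loop and with a vectorized expression. The arrays are small on purpose; the speed difference only becomes noticeable on large arrays.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

# Loop version: one Python-level addition per element.
loop_result = [a[i] + b[i] for i in range(len(a))]

# Vectorized version: a single expression, with the loop run in compiled code.
vector_result = a + b

print(vector_result)  # [11 22 33 44]
```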

Q9. How does Matplotlib differ from Plotly?
- Matplotlib is mainly used for static and publication-quality plots. Plotly creates interactive plots where users can zoom, pan, and hover, making it better for dashboards and presentations.

Q10. What is the significance of hierarchical indexing in Pandas?
- Hierarchical indexing (or multi-indexing) allows more than one level of indexing for rows or columns. It is useful when working with multi-dimensional data in a two-dimensional DataFrame.
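
One possible illustration, assuming hypothetical sales figures indexed by year and quarter: the two index levels let a flat table carry data that would otherwise need an extra dimension.

```python
import pandas as pd

# Hypothetical sales figures indexed by (Year, Quarter).
index = pd.MultiIndex.from_tuples(
    [(2023, 'Q1'), (2023, 'Q2'), (2024, 'Q1'), (2024, 'Q2')],
    names=['Year', 'Quarter'],
)
sales = pd.DataFrame({'Sales': [100, 120, 130, 150]}, index=index)

# Selecting on the outer level returns all quarters for that year.
sales_2024 = sales.loc[2024]

# Giving a key for each level selects a single value.
q2_2023 = sales.loc[(2023, 'Q2'), 'Sales']

print(sales_2024)
print(q2_2023)  # 120
```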

Q11. What is the role of Seaborn’s pairplot() function?
- The pairplot() function shows scatterplots for every pair of numerical variables and distributions on the diagonal. It helps in quickly spotting relationships and correlations between features.

Q12. What is the purpose of the describe() function in Pandas?
- The describe() function gives summary statistics for numeric columns: count, mean, standard deviation, minimum, maximum, and quartiles. It is a quick way to understand the distribution of a dataset.
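
A short sketch on a made-up Series shows the statistics that describe() returns and how to read individual values from the result.

```python
import pandas as pd

# Illustrative data only.
marks = pd.Series([1, 2, 3, 4, 5], name='Marks')

# describe() returns a Series indexed by statistic name.
summary = marks.describe()
print(summary)

# Individual statistics can be read by label, e.g. the median is '50%'.
median = summary['50%']
```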

Q13. Why is handling missing data important in Pandas?
- Handling missing data is important because missing values can affect analysis and lead to incorrect results. Pandas provides methods like dropna() to remove them and fillna() to replace them with suitable values.
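
The two methods named above can be sketched as follows. The DataFrame is invented for the example, and filling with the column mean is only one of several reasonable choices.

```python
import numpy as np
import pandas as pd

# Illustrative data with two missing marks.
df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D'],
                   'Marks': [80.0, np.nan, 60.0, np.nan]})

# Count the missing values before deciding how to handle them.
missing_count = df['Marks'].isna().sum()

# Option 1: drop the rows where Marks is missing.
dropped = df.dropna(subset=['Marks'])

# Option 2: replace missing marks with the mean of the known marks.
filled_marks = df['Marks'].fillna(df['Marks'].mean())

print(missing_count)
print(dropped)
print(filled_marks)
```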

Q14. What are the benefits of using Plotly for data visualization?
- Plotly allows the creation of interactive and web-ready visualizations. It supports advanced charts such as 3D plots, maps, and animations, which makes it popular for dashboards and presentations.

Q15. How does NumPy handle multidimensional arrays?
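- NumPy stores multidimensional arrays in a contiguous block of memory and uses the shape and strides to navigate it. The strides give the number of bytes to step in each dimension to reach the next element. This makes it efficient for working with higher-dimensional data like matrices and tensors.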
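
As a rough sketch of shape and strides in practice, the example below builds a 3x4 array of 8-byte integers and inspects it. The dtype is fixed explicitly so the stride values are the same on every platform.

```python
import numpy as np

# 12 int64 values (8 bytes each) laid out row by row as a 3x4 array.
arr = np.arange(12, dtype=np.int64).reshape(3, 4)

print(arr.shape)    # (3, 4)
print(arr.strides)  # (32, 8): 32 bytes to the next row, 8 to the next column

# Transposing swaps shape and strides without moving any data.
transposed = arr.T
print(transposed.strides)                # (8, 32)
print(np.shares_memory(arr, transposed)) # True
```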

Q16. What is the role of Bokeh in data visualization?
- Bokeh is a library for creating interactive plots and dashboards. It is especially useful for large or streaming datasets and for embedding interactive graphics into web applications.

Q17. Explain the difference between apply() and map() in Pandas.
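- map() is a Series method that transforms each element using a function, a dictionary, or another Series. apply() also works on a Series, but on a DataFrame it passes whole columns (axis=0) or whole rows (axis=1) to the function, which makes it suitable for column-wise or row-wise calculations.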
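
The difference can be sketched on a small, made-up marks table: map() works element by element on one column, while apply() hands a whole column or row to the function.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({'Math': [90, 55, 70],
                   'Science': [85, 65, 40]})

# map(): element-wise transformation of a single Series.
math_result = df['Math'].map(lambda mark: 'Pass' if mark >= 60 else 'Fail')

# apply() with the default axis=0: the function receives each column.
column_max = df.apply(lambda col: col.max())

# apply() with axis=1: the function receives each row.
row_total = df.apply(lambda row: row['Math'] + row['Science'], axis=1)

print(math_result.tolist())  # ['Pass', 'Fail', 'Pass']
print(column_max)
print(row_total.tolist())    # [175, 120, 110]
```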

Q18. What are some advanced features of NumPy?
- Some advanced features are broadcasting, fancy indexing, linear algebra functions, random number generation, Fourier transforms, and memory-mapped arrays for handling very large data.

Q19. How does Pandas simplify time series analysis?
- Pandas provides built-in support for time series through DatetimeIndex. Functions like resample(), shift(), and rolling() make it easy to handle frequency conversion, lag analysis, and moving averages.
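
The three functions mentioned above can be tried on a short daily series. The dates and prices below are invented for the sketch.

```python
import pandas as pd

# Six days of hypothetical prices on a DatetimeIndex.
dates = pd.date_range('2024-01-01', periods=6, freq='D')
prices = pd.Series([10, 12, 11, 13, 15, 14], index=dates)

# Frequency conversion: combine the daily values into 2-day totals.
two_day_total = prices.resample('2D').sum()

# Lag analysis: shift values forward one day; the first entry becomes NaN.
previous_day = prices.shift(1)

# Moving average over a 3-day window; the first two entries are NaN.
moving_avg = prices.rolling(window=3).mean()

print(two_day_total)
print(previous_day)
print(moving_avg)
```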

Q20. What is the role of a pivot table in Pandas?
- A pivot table summarizes data in a table format. It can show totals or averages grouped by different categories, similar to pivot tables in Excel.
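
A minimal sketch, assuming hypothetical `Region`, `Product`, and `Sales` columns: the pivot table puts regions on the rows, products on the columns, and the summed sales in the cells.

```python
import pandas as pd

# Illustrative data; North/A appears twice so the aggregation is visible.
sales = pd.DataFrame({
    'Region': ['North', 'North', 'North', 'South', 'South'],
    'Product': ['A', 'A', 'B', 'A', 'B'],
    'Sales': [100, 25, 150, 200, 50],
})

# Rows are regions, columns are products, cells hold the summed sales.
table = pd.pivot_table(sales, values='Sales', index='Region',
                       columns='Product', aggfunc='sum')
print(table)
```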

Q21. Why is NumPy’s array slicing faster than Python’s list slicing?
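- Basic slicing of a NumPy array returns a view onto the same memory instead of copying the data, so its cost does not grow with the length of the slice. Slicing a list always builds a new list and copies the element references, which takes more time and memory. Because an array slice is a view, modifying it also modifies the original array.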
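
The view-versus-copy behaviour can be checked directly. In this sketch, writing to an array slice changes the original array, while writing to a list slice does not change the original list.

```python
import numpy as np

# NumPy: a basic slice is a view that shares memory with the original.
arr = np.array([1, 2, 3, 4, 5])
arr_slice = arr[1:4]
arr_slice[0] = 99
print(arr)  # [ 1 99  3  4  5]

# Python list: a slice is a new list, so the original is unaffected.
lst = [1, 2, 3, 4, 5]
lst_slice = lst[1:4]
lst_slice[0] = 99
print(lst)  # [1, 2, 3, 4, 5]

# Use .copy() when an independent array is needed.
independent = arr[1:4].copy()
```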

Q22. What are some common use cases for Seaborn?
- Seaborn is commonly used for plotting distributions, comparing groups, analyzing relationships between variables, and creating heatmaps for correlation analysis.

# **Practical**

Q1. How do you create a 2D NumPy array and calculate the sum of each row?

In [None]:
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# axis=1 sums across the columns, giving one total per row
row_sum = arr.sum(axis=1)
print(row_sum)


Q2. Write a Pandas script to find the mean of a specific column in a DataFrame.

In [None]:
import pandas as pd

data = {'Name': ['A', 'B', 'C'],
        'Marks': [85, 90, 78]}
df = pd.DataFrame(data)

mean_marks = df['Marks'].mean()
print(mean_marks)


Q3. Create a scatter plot using Matplotlib.

In [None]:
import matplotlib.pyplot as plt

x = [5, 7, 8, 7, 6, 9]
y = [99, 86, 87, 88, 100, 86]

plt.scatter(x, y)
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot Example")
plt.show()


Q4. How do you calculate the correlation matrix using Seaborn and visualize it with a heatmap?

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'Math': [90, 80, 70, 60],
    'Science': [85, 75, 65, 55],
    'English': [88, 78, 68, 58]
})

corr = df.corr()
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.show()


Q5. Generate a bar plot using Plotly.

In [None]:
import plotly.express as px
import pandas as pd

data = {'Fruit': ['Apple', 'Banana', 'Mango'],
        'Quantity': [10, 15, 7]}
df = pd.DataFrame(data)

fig = px.bar(df, x='Fruit', y='Quantity', title="Bar Plot Example")
fig.show()


Q6. Create a DataFrame and add a new column based on an existing column.

In [None]:
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C'],
                   'Marks': [50, 70, 90]})

df['Result'] = df['Marks'].apply(lambda x: 'Pass' if x >= 60 else 'Fail')
print(df)


Q7. Write a program to perform element-wise multiplication of two NumPy arrays.


In [None]:
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

result = a * b
print(result)


Q8. Create a line plot with multiple lines using Matplotlib.

In [None]:
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]
y2 = [1, 3, 5, 7, 9]

plt.plot(x, y1, label="Line 1")
plt.plot(x, y2, label="Line 2")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()


Q9. Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold.

In [None]:
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D'],
                   'Marks': [40, 75, 60, 85]})

filtered = df[df['Marks'] > 60]
print(filtered)


Q10. Create a histogram using Seaborn to visualize a distribution.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
sns.histplot(data, bins=5, kde=True)
plt.show()


Q11. Perform matrix multiplication using NumPy.

In [None]:
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

# For 2D arrays, np.dot computes the matrix product (same result as a @ b)
result = np.dot(a, b)
print(result)


Q12. Use Pandas to load a CSV file and display its first 5 rows.

In [None]:
import pandas as pd

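# Assumes a file named data.csv exists in the current working directory
df = pd.read_csv("data.csv")
# head() returns the first 5 rows by default
print(df.head())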


Q13. Create a 3D scatter plot using Plotly.

In [None]:
import plotly.express as px
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'y': [10, 20, 30, 40, 50],
    'z': [5, 10, 15, 20, 25]
})

fig = px.scatter_3d(df, x='x', y='y', z='z', title="3D Scatter Plot")
fig.show()
