###THEORY QUESTIONS

###1. What is NumPy, and why is it widely used in Python?
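> NumPy is a Python library for numerical and scientific computing. It provides an N-dimensional array type (ndarray), matrix operations, and a large collection of mathematical functions that operate on arrays. It is widely used because array operations run in compiled code over compactly stored, single-type data, which is typically much faster and more memory-efficient than looping over Python lists, and because libraries such as Pandas build on it.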

###2. How does broadcasting work in NumPy?
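> Broadcasting lets NumPy apply element-wise operations to arrays of different shapes without explicitly replicating data. Shapes are compared from the trailing dimension backwards; two dimensions are compatible when they are equal or when one of them is 1, and the size-1 dimension is virtually stretched to match. This keeps such computations efficient in both memory and time.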
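
A minimal sketch of broadcasting, assuming two small made-up arrays (`col` and `row` are illustrative names): a column of shape (3, 1) combined with a row of shape (4,) yields a (3, 4) result, while incompatible shapes raise an error.

```python
import numpy as np

# Hypothetical inputs: a (3, 1) column and a (4,) row.
col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])        # shape (4,)

# (3, 1) and (4,) are compatible: each size-1 axis is stretched virtually.
table = col + row
print(table.shape)
print(table)

# Trailing dimensions 3 and 4 are neither equal nor 1, so this fails.
try:
    np.ones(3) + np.ones(4)
except ValueError as exc:
    print("Cannot broadcast:", exc)
```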

###3. What is a Pandas DataFrame?
> A DataFrame in Pandas is a 2-dimensional, size-mutable, and heterogeneous tabular data structure with labeled axes (rows and columns), similar to a spreadsheet or SQL table.
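
As a small illustration (the records and labels below are made up), a DataFrame can hold a different dtype in each column, carries row and column labels, and can grow after it is created:

```python
import pandas as pd

# Hypothetical records; each column keeps its own dtype.
df = pd.DataFrame(
    {"name": ["Asha", "Ben"], "age": [31, 27], "score": [88.5, 92.0]},
    index=["r1", "r2"],  # row labels
)
print(df.dtypes)

# Size-mutable: a new column can be added after construction.
df["passed"] = df["score"] >= 90

# Label-based access to a single row.
print(df.loc["r2"])
```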

###4. Explain the use of the groupby() method in Pandas.
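> groupby() in Pandas splits the data into groups based on one or more keys, applies a function to each group (an aggregation such as sum or mean, or a transformation), and combines the results into a new object. This split-apply-combine pattern makes per-category analysis straightforward.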
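
A short sketch of grouping and aggregating, using a made-up `dept`/`salary` table:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "dept": ["HR", "IT", "IT", "HR"],
    "salary": [40, 60, 80, 50],
})

# Group rows by department, then compute two aggregates per group.
summary = df.groupby("dept")["salary"].agg(["sum", "mean"])
print(summary)
```

Passing a list of column names to `groupby()` groups by several keys at once.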

###5. Why is Seaborn preferred for statistical visualizations?
> Seaborn provides a high-level interface, built on Matplotlib, for creating attractive and informative statistical graphics with less code than plain Matplotlib. It also works directly with Pandas DataFrames.

###6. What are the differences between NumPy arrays and Python lists?
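> NumPy arrays are generally faster for numerical work and consume less memory, because elements are stored compactly with a single data type.

> Arrays support element-wise arithmetic; with lists, + concatenates and * repeats.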

> Arrays are homogeneous; lists can hold mixed types.
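
The differences above can be seen in a few lines (the values are arbitrary):

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

print(values * 2)   # list repetition: [1, 2, 3, 1, 2, 3]
print(arr * 2)      # element-wise multiplication: [2 4 6]

# A list can mix types; an array coerces everything to one dtype
# (here a Unicode string dtype, because one element is a string).
mixed = np.array([1, "two", 3.0])
print(mixed.dtype)
```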

###7. What is a heatmap, and when should it be used?
> A heatmap is a graphical representation of data where individual values are represented using colors. It is used to visualize correlation matrices, missing data patterns, or density distributions.

###8. What does the term “vectorized operation” mean in NumPy?
> A vectorized operation means performing an operation on entire arrays without writing explicit loops, making the code faster and more readable.
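
For comparison, here is the same calculation written as an explicit loop and as a vectorized expression; the temperature conversion is just an illustrative choice:

```python
import numpy as np

celsius = np.array([0.0, 25.0, 100.0])

# Explicit Python loop: one element at a time.
loop_result = np.empty_like(celsius)
for i in range(celsius.size):
    loop_result[i] = celsius[i] * 9 / 5 + 32

# Vectorized: the whole array is processed in compiled code.
vec_result = celsius * 9 / 5 + 32
print(vec_result)
```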

###9. How does Matplotlib differ from Plotly?
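> Matplotlib: Primarily produces static 2D plots (with basic 3D support and interactive backends), and gives fine-grained control over every element of a figure.

> Plotly: Produces interactive, web-based visualizations with built-in zoom, hover, and click features.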

###10. What is the significance of hierarchical indexing in Pandas?
> Hierarchical indexing allows multiple (two or more) index levels on an axis, which makes it easier to work with higher-dimensional data in a 2D table.
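
A minimal sketch of a two-level index, assuming made-up yearly and quarterly sales figures:

```python
import pandas as pd

# Hypothetical data indexed by (year, quarter).
index = pd.MultiIndex.from_tuples(
    [("2023", "Q1"), ("2023", "Q2"), ("2024", "Q1"), ("2024", "Q2")],
    names=["year", "quarter"],
)
sales = pd.Series([100, 120, 130, 150], index=index, name="sales")

print(sales.loc["2024"])           # every quarter of one year
print(sales.loc[("2023", "Q2")])   # a single (year, quarter) value

# Move the inner level to columns to get a year x quarter table.
wide = sales.unstack("quarter")
print(wide)
```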

###11. What is the role of Seaborn’s pairplot() function?
> pairplot() automatically plots pairwise relationships in a dataset and supports color grouping by category, making it easy to explore relationships between multiple variables.

###12. What is the purpose of the describe() function in Pandas?
> describe() provides summary statistics (count, mean, standard deviation, min, the 25%, 50% (median), and 75% percentiles, and max) for numeric columns by default, helping you quickly understand the dataset.
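
A quick illustration on a made-up DataFrame with one numeric and one text column; by default only the numeric column is summarized:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "Marks": [85, 90, 78, 92],
    "Name": ["a", "b", "c", "d"],
})

stats = df.describe()
print(stats)

# Individual statistics can be read by label.
print(stats.loc["50%", "Marks"])   # the median
```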

###13. Why is handling missing data important in Pandas?
> Handling missing data prevents errors and silent bias in analysis, helps keep models accurate, and maintains dataset integrity so that conclusions are not drawn from incomplete values.
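
A small sketch of the usual steps (detect, drop, or fill), using a made-up table. Filling with the column mean is only one possible strategy and is an assumption here, not a recommendation for every dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column.
df = pd.DataFrame({
    "temp": [21.0, np.nan, 23.0],
    "city": ["Pune", "Delhi", None],
})

# Detect: count missing values in each column.
missing_counts = df.isna().sum()
print(missing_counts)

# Drop: keep only rows with no missing values.
dropped = df.dropna()

# Fill: replace missing temperatures with the column mean (NaN is skipped).
filled = df.fillna({"temp": df["temp"].mean()})
print(filled)
```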

###14. What are the benefits of using Plotly for data visualization?
> Interactive charts.

> Easy sharing via web.

> Beautiful and customizable visualizations.

> Supports 3D plots and animations.

###15. How does NumPy handle multidimensional arrays?
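> NumPy uses ndarray objects, which can have any number of dimensions (2D, 3D, etc.). Each array has a shape and a single dtype, and operations can be applied along a chosen axis, allowing efficient storage and manipulation of large datasets.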
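
For example, a 3D array can be built by reshaping a flat range, then indexed and reduced along an axis (the sizes are arbitrary):

```python
import numpy as np

# 24 values arranged as 2 blocks x 3 rows x 4 columns.
cube = np.arange(24).reshape(2, 3, 4)
print(cube.ndim, cube.shape)

# One index per dimension selects a single element.
print(cube[1, 2, 3])

# Reducing along axis 0 collapses the two blocks into one (3, 4) array.
block_sum = cube.sum(axis=0)
print(block_sum.shape)

# A slice can take the first row of every block.
first_rows = cube[:, 0, :]
print(first_rows)
```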

###16. What is the role of Bokeh in data visualization?
> Bokeh is a Python library for creating interactive, browser-based visualizations. It supports large and streaming datasets and can be used to build dashboards.

###17. Explain the difference between apply() and map() in Pandas.
> map() is used for element-wise transformations on a Series, using a function, a dictionary, or another Series as the mapping.

> apply() can be used on both Series and DataFrames and allows applying custom functions along rows or columns.
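
A brief sketch of the two methods side by side, with made-up marks and a hypothetical grade mapping:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"math": [80, 90], "art": [70, 60]})

# Series.map: element-wise lookup through a dictionary.
grades = df["math"].map({80: "B", 90: "A"})

# DataFrame.apply with axis=1: the function receives one row at a time.
row_total = df.apply(lambda row: row["math"] + row["art"], axis=1)

# DataFrame.apply with the default axis=0: one column at a time.
col_max = df.apply(lambda col: col.max())

print(grades.tolist(), row_total.tolist(), col_max.to_dict())
```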

###18. What are some advanced features of NumPy?
> Broadcasting.

> Vectorization.

> Masked arrays.

> Structured arrays.

> Linear algebra operations.
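
Three of the features listed above, sketched with made-up values (the -999.0 sentinel and the field names are assumptions for illustration):

```python
import numpy as np

# Masked array: ignore a sentinel value when computing statistics.
readings = np.array([4.0, -999.0, 6.0])
masked = np.ma.masked_equal(readings, -999.0)
print(masked.mean())

# Structured array: named fields with different dtypes in one array.
people = np.array(
    [("Asha", 31), ("Ben", 27)],
    dtype=[("name", "U10"), ("age", "i4")],
)
print(people["age"])

# Linear algebra: solve the system a @ x = b.
a = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(a, b)
print(x)
```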

###19. How does Pandas simplify time series analysis?
> Pandas offers built-in functions for date parsing, resampling, shifting, and rolling windows, making time series analysis easy and efficient.
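
A minimal sketch of these operations on a made-up daily series:

```python
import pandas as pd

# Date parsing: strings become timestamps.
parsed = pd.to_datetime(["2024-01-01", "2024-01-02"])
print(parsed)

# Hypothetical daily series covering six days.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
s = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

two_day = s.resample("2D").sum()        # totals per 2-day bin
lagged = s.shift(1)                     # previous day's value
rolling = s.rolling(window=3).mean()    # 3-day moving average

print(two_day)
print(lagged)
print(rolling)
```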

###20. What is the role of a pivot table in Pandas?
> A pivot table (pivot_table()) summarizes data by aggregating values across the categories chosen as rows and columns, similar to pivot tables in Excel.
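
As an illustration with a made-up sales table, regions become rows, products become columns, and sales are summed in each cell:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "region": ["N", "N", "S", "S"],
    "product": ["A", "B", "A", "A"],
    "sales": [10, 20, 30, 50],
})

table = pd.pivot_table(
    df,
    index="region",
    columns="product",
    values="sales",
    aggfunc="sum",
    fill_value=0,   # combinations with no rows show 0 instead of NaN
)
print(table)
```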

###21. Why is NumPy’s array slicing faster than Python’s list slicing?
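> Basic slicing of a NumPy array returns a view onto the same memory rather than copying elements, whereas slicing a Python list creates a new list. NumPy arrays also store fixed-type elements in a single memory block (contiguous by default), so access avoids the per-object overhead of dynamic typing.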
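
The view-versus-copy behavior can be checked directly; this sketch uses small made-up inputs:

```python
import numpy as np

data = list(range(5))
arr = np.arange(5)

list_slice = data[1:4]   # a new list (copy)
arr_slice = arr[1:4]     # a view onto arr's memory

# Writing through each slice shows which one shares memory.
list_slice[0] = 99
arr_slice[0] = 99

print(data)   # unchanged: [0, 1, 2, 3, 4]
print(arr)    # changed:   [ 0 99  2  3  4]
print(np.shares_memory(arr, arr_slice))
```

Because a slice is a view, call `.copy()` on it when an independent array is needed.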

###22. What are some common use cases for Seaborn?
> Heatmaps.

> Correlation plots.

> Distribution plots (histograms, KDE).

> Categorical plots (boxplots, violin plots).

> Regression analysis.

###PRACTICAL QUESTIONS

###1. How do you create a 2D NumPy array and calculate the sum of each row?

In [None]:
import numpy as np

array_2d = np.array([[1, 2, 3], [4, 5, 6]])
row_sum = np.sum(array_2d, axis=1)
print("Sum of each row:", row_sum)

###2. Write a Pandas script to find the mean of a specific column in a DataFrame.

In [None]:
import pandas as pd

data = {'Marks': [85, 90, 78, 92]}
df = pd.DataFrame(data)
mean_value = df['Marks'].mean()
print("Mean of Marks column:", mean_value)

###3. Create a scatter plot using Matplotlib.

In [None]:
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]

plt.scatter(x, y)
plt.title('Scatter Plot Example')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

###4. How do you calculate the correlation matrix using Seaborn and visualize it with a heatmap?

In [None]:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt  # needed for plt.show() below

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [4, 3, 2, 1],
    'C': [5, 6, 7, 8]
})

# Pandas computes the correlation matrix; Seaborn draws it as a heatmap
corr = df.corr()
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.show()

###5. Generate a bar plot using Plotly.

In [None]:
import plotly.express as px

data = {'Fruits': ['Apple', 'Banana', 'Mango'], 'Count': [10, 20, 15]}
fig = px.bar(data, x='Fruits', y='Count', title='Fruit Count')
fig.show()

###6. Create a DataFrame and add a new column based on an existing column.

In [None]:
import pandas as pd

df = pd.DataFrame({'Marks': [80, 90, 70]})
df['Grade'] = ['A' if mark > 75 else 'B' for mark in df['Marks']]
print(df)

###7. Write a program to perform element-wise multiplication of two NumPy arrays.

In [None]:
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = a * b
print("Element-wise multiplication:", result)

###8. Create a line plot with multiple lines using Matplotlib.

In [None]:
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y1 = [10, 20, 25, 30]
y2 = [5, 15, 20, 25]

plt.plot(x, y1, label='Line 1')
plt.plot(x, y2, label='Line 2')
plt.title('Multiple Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.show()

###9. Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold.

In [None]:
import pandas as pd

df = pd.DataFrame({'Marks': [65, 85, 75, 90]})
filtered_df = df[df['Marks'] > 80]
print(filtered_df)

###10. Create a histogram using Seaborn to visualize a distribution.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

data = [10, 20, 20, 30, 30, 30, 40, 50]
sns.histplot(data, kde=True)
plt.title('Histogram with KDE')
plt.show()

###11. Perform matrix multiplication using NumPy.

In [None]:
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
result = np.matmul(a, b)
print("Matrix multiplication result:\n", result)

###12. Use Pandas to load a CSV file and display its first 5 rows.

In [None]:
import pandas as pd

df = pd.read_csv('your_file.csv')
print(df.head())

###13. Create a 3D scatter plot using Plotly.

In [None]:
import plotly.express as px
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4],
    'y': [10, 20, 25, 30],
    'z': [5, 15, 20, 25]
})

fig = px.scatter_3d(df, x='x', y='y', z='z', title='3D Scatter Plot')
fig.show()