**1. What is NumPy, and why is it widely used in Python?**

->NumPy (Numerical Python) is a powerful library for numerical computing in Python. It provides support for large, multidimensional arrays and matrices, along with a collection of mathematical functions to operate on them efficiently. It is widely used due to its speed, memory efficiency, and support for vectorized operations.

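**2. How does broadcasting work in NumPy?**

->Broadcasting lets NumPy apply element-wise operations to arrays of different shapes. Shapes are compared from the trailing dimension backward; two dimensions are compatible when they are equal or when one of them is 1, and a size-1 dimension is virtually stretched to match the other without copying data. This avoids explicit loops and redundant copies.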
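
As a minimal sketch of broadcasting, the arrays below (`matrix`, `row`, and `column` are illustrative names, not from the original notes) combine a 2D array with a 1D row, and a column with a row:

```python
import numpy as np

# A (3, 3) matrix plus a (3,) vector: the vector is treated as shape (1, 3)
# and virtually repeated along the first axis.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
row = np.array([10, 20, 30])
added = matrix + row

# A (3, 1) column plus a (3,) row broadcasts both operands to (3, 3).
column = np.array([[1], [2], [3]])
outer_sum = column + row

print(added)
print(outer_sum.shape)
```

Shapes that are incompatible, such as (3,) and (4,), raise a ValueError instead of being stretched.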

**3. What is a Pandas DataFrame?**

->A DataFrame is a two-dimensional, labeled data structure in Pandas, similar to a table in SQL or an Excel spreadsheet. It consists of rows and columns, supporting heterogeneous data types.
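
A small illustration, with made-up column names and values, of a DataFrame holding a different data type in each column:

```python
import pandas as pd

# Each column can have its own dtype: text, integers, and booleans here.
people = pd.DataFrame({
    "name": ["Ana", "Ben"],
    "age": [31, 25],
    "member": [True, False],
})

print(people)
print(people.dtypes)   # one dtype per column
print(people.shape)    # (rows, columns)
```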



**4. Explain the use of the groupby() method in Pandas.**

->The groupby() method follows a split-apply-combine pattern: it splits a DataFrame into groups based on one or more keys (typically column names), applies a function to each group, and combines the results. It is most often used to aggregate data, for example with sum(), mean(), or agg().
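
The split, apply, and combine steps can be sketched as follows; the `sales` data is invented for illustration:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "East"],
    "revenue": [100, 150, 200, 50, 75],
})

# Split by region, sum the revenue within each group, combine into a Series.
totals = sales.groupby("region")["revenue"].sum()

# Several aggregations at once produce one column per statistic.
summary = sales.groupby("region")["revenue"].agg(["sum", "mean", "count"])

print(totals)
print(summary)
```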

**5. Why is Seaborn preferred for statistical visualizations?**

->Seaborn provides aesthetically pleasing and informative statistical graphics, built on top of Matplotlib. It simplifies complex visualizations like heatmaps, violin plots, and pair plots.

**6. What are the differences between NumPy arrays and Python lists?**

->NumPy arrays are more memory-efficient because elements are stored in a compact, typed buffer rather than as separate Python objects.
They support vectorized operations, making computations faster.
Arrays have a single fixed data type, whereas lists can store mixed types.
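
A short comparison, using illustrative variable names, of how the same operator behaves on a list and on an array, and of how an array settles on one data type:

```python
import numpy as np

numbers_list = [1, 2, 3]
numbers_array = np.array([1, 2, 3])

# On a list, * repeats the sequence; on an array, it multiplies each element.
repeated = numbers_list * 2
doubled = numbers_array * 2

# A list keeps each element's own type.
mixed_list = [1, "two", 3.0]

# An array converts its elements to one common dtype (float64 here).
coerced = np.array([1, 2.5, 3])

print(repeated)
print(doubled)
print(coerced.dtype)
```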

**7. What is a heatmap, and when should it be used?**

->A heatmap is a color-coded matrix used to visualize correlations, patterns, and intensity of data relationships, commonly used for correlation matrices and confusion matrices.

**8. What does the term "vectorized operation" mean in NumPy?**

->A vectorized operation applies an operation to whole arrays at once instead of looping over elements in Python. The loop still happens, but inside NumPy's compiled code, which is typically much faster than an equivalent Python loop.
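
To make the contrast concrete, here is a sketch that converts temperatures with an explicit loop and then with a single vectorized expression (the data is illustrative):

```python
import numpy as np

celsius = np.array([0.0, 25.0, 100.0])

# Explicit Python loop: one element at a time.
fahrenheit_loop = np.empty_like(celsius)
for i in range(len(celsius)):
    fahrenheit_loop[i] = celsius[i] * 9 / 5 + 32

# Vectorized: the same arithmetic applied to the whole array at once.
fahrenheit_vec = celsius * 9 / 5 + 32

print(fahrenheit_vec)
```

Both versions give the same values; the vectorized form is shorter and avoids per-element Python overhead.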

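**9. How does Matplotlib differ from Plotly?**

->Matplotlib mainly produces static figures and offers fine-grained control over every element, at the cost of more manual code.
Plotly produces interactive figures (zoom, pan, hover tooltips) rendered in the browser, which suits dashboards, but figures with very many points can be heavier to render.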

**10. What is the significance of hierarchical indexing in Pandas?**

->Hierarchical indexing (a MultiIndex) allows multiple index levels on an axis, so higher-dimensional data can be stored and queried in a two-dimensional DataFrame, for example by selecting all rows under one outer label.
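
A minimal sketch of a two-level index; the year and quarter labels and the revenue figures are invented for illustration:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2023", "Q1"), ("2023", "Q2"), ("2024", "Q1"), ("2024", "Q2")],
    names=["year", "quarter"],
)
revenue = pd.DataFrame({"revenue": [10, 12, 15, 18]}, index=index)

# Select every row under one outer label.
year_2024 = revenue.loc["2024"]

# Select a single cell with a full (year, quarter) key.
single = revenue.loc[("2023", "Q2"), "revenue"]

# Pivot the inner level into columns: rows are years, columns are quarters.
wide = revenue["revenue"].unstack("quarter")

print(year_2024)
print(wide)
```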

**11. What is the role of Seaborn's pairplot() function?**

->The pairplot() function draws a grid of pairwise scatter plots for the numerical columns of a dataset, with each variable's univariate distribution on the diagonal. It is useful for spotting correlations and clusters at a glance.

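**12. What is the purpose of the describe() function in Pandas?**

->The describe() function returns summary statistics. For numerical columns these are the count, mean, standard deviation, min, max, and the 25%, 50% (median), and 75% quartiles. Non-numeric columns are skipped by default in a mixed DataFrame; pass include='all' to cover them.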
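
For example, on a small made-up DataFrame with one numerical and one text column:

```python
import pandas as pd

scores = pd.DataFrame({
    "score": [50, 60, 70, 80, 90],
    "name": ["a", "b", "c", "d", "e"],
})

# Only the numerical column is summarized by default.
stats = scores.describe()

print(stats)
```

The rows of `stats` are labeled count, mean, std, min, 25%, 50%, 75%, and max.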

**13. Why is handling missing data important in Pandas?**

->Missing data can lead to inaccurate analysis. Pandas provides methods (fillna(), dropna()) to handle missing values efficiently.
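
A brief sketch of the two methods named above, plus isna() for counting gaps. The `readings` data is illustrative, and filling with the column mean is just one possible strategy:

```python
import numpy as np
import pandas as pd

readings = pd.DataFrame({
    "temp": [20.0, np.nan, 24.0],
    "humidity": [0.5, 0.6, np.nan],
})

# Count missing values per column.
missing_counts = readings.isna().sum()

# Option 1: drop every row that contains a missing value.
dropped = readings.dropna()

# Option 2: fill each gap with that column's mean.
filled = readings.fillna(readings.mean())

print(missing_counts)
print(dropped)
print(filled)
```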

**14. What are the benefits of using Plotly for data visualization?**

->Plotly offers interactive visualizations, easy integration with web applications, and support for 3D and other complex plots.

**15. How does NumPy handle multidimensional arrays?**

->NumPy uses the ndarray object, which supports multiple dimensions and efficient indexing, slicing, and reshaping.
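
The indexing, slicing, and reshaping mentioned above can be sketched on a small 3D array (`cube` is an illustrative name):

```python
import numpy as np

# 24 consecutive integers arranged as 2 blocks of 3 rows and 4 columns.
cube = np.arange(24).reshape(2, 3, 4)

print(cube.ndim)    # number of dimensions
print(cube.shape)   # size along each dimension

# Index one element with one index per axis.
element = cube[1, 2, 3]

# Slice: the first row of every block.
first_rows = cube[:, 0, :]

# Reshape to 2D without changing the element order.
flat = cube.reshape(6, 4)
```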

**16. What is the role of Bokeh in data visualization?**

->Bokeh creates interactive visualizations that render in the web browser. It supports linked plots, widgets, and streaming data, and is designed to stay responsive with large datasets.

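**17. Explain the difference between apply() and map() in Pandas.**

->apply() applies a function along an axis of a DataFrame (to each column by default, or to each row with axis=1); it is also available on a Series.
map() transforms a Series element by element, and accepts a function, a dict, or another Series as the mapping.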
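
A side-by-side sketch of the two methods on a small illustrative DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# apply() on a DataFrame: the function receives one whole column at a time.
col_ranges = df.apply(lambda col: col.max() - col.min())

# With axis=1 the function receives one row at a time.
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

# map() on a Series: each element is transformed individually,
# here through a dict lookup and then through a function.
labels = df["a"].map({1: "low", 2: "mid", 3: "high"})
squared = df["a"].map(lambda value: value ** 2)

print(col_ranges)
print(row_sums)
print(labels)
```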

**18. What are some advanced features of NumPy?**

->Advanced features include broadcasting, universal functions (ufuncs), structured arrays, and memory mapping.
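
A compact sketch of three of these features: ufuncs, structured arrays, and memory mapping. The field names, values, and the temporary file name are assumptions made for illustration:

```python
import os
import tempfile

import numpy as np

# Universal functions operate element-wise and also expose methods
# such as accumulate.
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))
running_total = np.add.accumulate(np.array([1, 2, 3, 4]))

# A structured array stores records with named, typed fields.
people = np.array(
    [("Ana", 31), ("Ben", 25)],
    dtype=[("name", "U10"), ("age", "i4")],
)
ages = people["age"]

# A memory-mapped array reads and writes a file on disk as if it were an array.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "values.dat")
    writer = np.memmap(path, dtype="float64", mode="w+", shape=(4,))
    writer[:] = [1.0, 2.0, 3.0, 4.0]
    writer.flush()
    del writer  # release the file before reopening it
    reader = np.memmap(path, dtype="float64", mode="r", shape=(4,))
    reloaded = np.array(reader)  # copy into ordinary memory
    del reader

print(roots, running_total, ages, reloaded)
```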

**19. How does Pandas simplify time series analysis?**

->Pandas provides date-time indexing, resampling, time-based filtering, and rolling-window functions for time series analysis.
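
These features can be sketched on a six-day daily series; the dates and values are made up for illustration:

```python
import pandas as pd

dates = pd.date_range("2024-01-01", periods=6, freq="D")
series = pd.Series([1, 2, 3, 4, 5, 6], index=dates)

# Resampling: aggregate the daily values into 2-day bins.
two_day_totals = series.resample("2D").sum()

# Time-based filtering: label slices on a DatetimeIndex include both endpoints.
early = series.loc["2024-01-02":"2024-01-04"]

# Rolling window: 3-day moving average (the first two entries are NaN).
rolling_mean = series.rolling(window=3).mean()

print(two_day_totals)
print(early)
print(rolling_mean)
```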

**20. What is the role of a pivot table in Pandas?**

->A pivot table summarizes data based on specified index and columns, making it useful for aggregating large datasets.
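
As a minimal sketch, the illustrative `orders` data below is summarized with one row per region and one column per product:

```python
import pandas as pd

orders = pd.DataFrame({
    "region": ["North", "North", "South", "South", "North"],
    "product": ["A", "B", "A", "B", "A"],
    "amount": [10, 20, 30, 40, 50],
})

# Rows come from 'region', columns from 'product', and each cell is the
# sum of 'amount' for that combination.
table = pd.pivot_table(
    orders,
    index="region",
    columns="product",
    values="amount",
    aggfunc="sum",
)

print(table)
```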

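**21. Why is NumPy's array slicing faster than Python's list slicing?**

->Basic slicing of a NumPy array returns a view that shares the original array's memory buffer, so no elements are copied; NumPy only records a new shape, strides, and offset. Slicing a Python list builds a new list and copies references to every selected element, so its cost grows with the slice length.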
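
The difference shows up when a slice is modified; a short sketch with illustrative names:

```python
import numpy as np

# NumPy: the slice is a view, so writing to it changes the original array.
data = np.arange(5)
view = data[1:4]
view[0] = 99
shares = np.shares_memory(data, view)

# List: the slice is a new list, so the original is left untouched.
items = list(range(5))
part = items[1:4]
part[0] = 99

print(data)    # the change made through the view is visible here
print(items)   # unchanged
```

Call `.copy()` on a slice when an independent array is needed.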

**22. What are some common use cases for Seaborn?**

->Common use cases include analyzing distributions (histograms, KDE plots), visualizing relationships (scatter plots, pair plots), and drawing heatmaps for correlations.

# **Practical Implementations**


In [None]:
#1. Create a 2D NumPy array and calculate the sum of each row.

import numpy as np

# Creating a 2D NumPy array
arr = np.array([[1, 2, 3], [4, 5, 6]])

# Calculating the sum of each row
row_sums = arr.sum(axis=1)

print("2D NumPy Array:")
print(arr)
print("\nSum of each row:", row_sums)


In [None]:
#2. Find the mean of a specific column in a Pandas DataFrame.
import pandas as pd

# Creating a DataFrame
df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]})

# Finding the mean of column 'A'
mean_A = df['A'].mean()

print("DataFrame:")
print(df)
print("\nMean of column 'A':", mean_A)


In [None]:
#3. Calculate the correlation matrix using Seaborn and visualize it with a heatmap.
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Creating a DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [4, 5, 6, 7], 'C': [7, 8, 9, 10]})

# Calculating correlation matrix (computed by Pandas; Seaborn only draws it)
correlation_matrix = df.corr()

# Creating a heatmap
sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm", linewidths=0.5)

# Displaying the plot
plt.title("Correlation Matrix Heatmap")
plt.show()


In [None]:
#4. Create a scatter plot using Matplotlib.

import matplotlib.pyplot as plt

# Data for the scatter plot
x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 35]

# Creating the scatter plot
plt.scatter(x, y, color='blue', marker='o')

# Adding labels and title
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot Example")

# Displaying the plot
plt.show()


In [None]:
#5. Generate a bar plot using Plotly.
import plotly.express as px
import pandas as pd

# Creating a DataFrame
df = pd.DataFrame({'Category': ['A', 'B', 'C', 'D'], 'Values': [10, 15, 7, 12]})

# Creating a bar plot
fig = px.bar(df, x='Category', y='Values', title="Bar Plot Example", color='Values')

# Displaying the plot
fig.show()


In [None]:
#6. Create a DataFrame and add a new column based on an existing column.
import pandas as pd

# Creating a DataFrame
df = pd.DataFrame({'A': [10, 20, 30], 'B': [5, 10, 15]})

# Adding a new column 'C' based on column 'A'
df['C'] = df['A'] * 2

print("Updated DataFrame:")
print(df)


In [None]:
#7. Perform element-wise multiplication of two NumPy arrays.
import numpy as np

# Creating two NumPy arrays
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([5, 6, 7, 8])

# Element-wise multiplication
result = arr1 * arr2

print("Array 1:", arr1)
print("Array 2:", arr2)
print("Element-wise multiplication result:", result)


In [None]:
#8. Create a line plot with multiple lines using Matplotlib.
import matplotlib.pyplot as plt

# Data for two lines
x = [1, 2, 3, 4, 5]
y1 = [10, 20, 30, 40, 50]
y2 = [5, 15, 25, 35, 45]

# Plotting the lines
plt.plot(x, y1, label='Line 1', color='blue', linestyle='--', marker='o')
plt.plot(x, y2, label='Line 2', color='red', linestyle='-', marker='s')

# Adding labels and title
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Line Plot with Multiple Lines")
plt.legend()

# Displaying the plot
plt.show()


In [None]:
#9. Filter rows where a column value is greater than a threshold.
import pandas as pd

# Creating a DataFrame
df = pd.DataFrame({'A': [10, 20, 30, 40], 'B': [5, 15, 25, 35]})

# Filtering rows where column 'A' values are greater than 15
filtered_df = df[df['A'] > 15]

print("Filtered DataFrame:")
print(filtered_df)


In [None]:
#10. Create a histogram using Seaborn to visualize a distribution.
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Creating a DataFrame with sample values
df = pd.DataFrame({'Values': [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 9]})

# Creating a histogram
sns.histplot(df['Values'], bins=5, kde=True, color='blue')

# Adding title
plt.title("Histogram Example")

# Displaying the plot
plt.show()


In [None]:
#11. Perform matrix multiplication using NumPy.
import numpy as np

# Creating two matrices
mat1 = np.array([[1, 2], [3, 4]])
mat2 = np.array([[5, 6], [7, 8]])

# Performing matrix multiplication (for 2D arrays, @ gives the same result as np.dot)
result = mat1 @ mat2

print("Matrix 1:")
print(mat1)
print("\nMatrix 2:")
print(mat2)
print("\nMatrix Multiplication Result:")
print(result)


In [None]:
#12. Load a CSV file and display its first 5 rows.
import pandas as pd

# Loading a CSV file (Replace 'data.csv' with the actual file name)
df = pd.read_csv('data.csv')

# Displaying the first 5 rows
print("First 5 rows of the DataFrame:")
print(df.head())


In [None]:
#13. Create a 3D scatter plot using Plotly.
import plotly.express as px
import pandas as pd

# Creating a DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1], 'C': [10, 20, 30, 40, 50]})

# Creating a 3D scatter plot
fig = px.scatter_3d(df, x='A', y='B', z='C', title="3D Scatter Plot Example")

# Displaying the plot
fig.show()
