1. What is NumPy, and why is it widely used in Python?
Answer- NumPy (Numerical Python) is a powerful library used for numerical computing in Python. It provides support for large multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently. NumPy is widely used due to its speed, ability to handle large datasets, and built-in functions for mathematical and statistical operations.

2. How does broadcasting work in NumPy?
Answer- Broadcasting lets NumPy perform element-wise operations on arrays of different shapes without explicitly reshaping or copying them. Shapes are compared dimension by dimension, starting from the trailing (rightmost) axis: two dimensions are compatible when they are equal or when one of them is 1, and a size-1 dimension is virtually stretched to match the other. Because the smaller array is never physically copied, this avoids unnecessary memory allocation.
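
As a rough illustration of broadcasting, the sketch below adds a 1D row and a column vector to a 3x3 matrix. The array names and values are made up for the example.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Shape (3,) is treated as (1, 3) and stretched down the rows
row = np.array([10, 20, 30])
added_rows = matrix + row

# Shape (3, 1) is stretched across the columns instead
col = np.array([[100], [200], [300]])
added_cols = matrix + col

print(added_rows.shape, added_cols.shape)
```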

3. What is a Pandas DataFrame?
Answer- A Pandas DataFrame is a two-dimensional, labeled data structure similar to a table or spreadsheet. It consists of rows and columns, where each column can hold different data types. It provides powerful data manipulation, filtering, and analysis capabilities.

4. Explain the use of the groupby() method in Pandas.
Answer- The groupby() method in Pandas is used to group data based on one or more columns. It enables operations like aggregation (e.g., sum, mean), transformation, and filtering on grouped data, making it useful for analyzing categorical data efficiently.
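
A minimal sketch of the three kinds of grouped operation (aggregation, transformation, filtering), assuming a small made-up sales table. The column names and the 250 threshold are illustrative only.

```python
import pandas as pd

# Hypothetical sample data
sales = pd.DataFrame({
    'Region': ['North', 'South', 'North', 'South', 'East'],
    'Amount': [100, 150, 200, 50, 300],
})

# Aggregation: one value per group
totals = sales.groupby('Region')['Amount'].sum()

# Transformation: result is aligned with the original rows
sales['RegionMean'] = sales.groupby('Region')['Amount'].transform('mean')

# Filtering: keep only rows from groups whose total exceeds 250
big_regions = sales.groupby('Region').filter(lambda g: g['Amount'].sum() > 250)

print(totals)
```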

5. Why is Seaborn preferred for statistical visualizations?
Answer- Seaborn is preferred for statistical visualizations because it provides a high-level interface for creating aesthetically pleasing and informative graphs. It integrates well with Pandas, supports complex visualizations like heatmaps and pair plots, and includes built-in themes for better presentation.

6. What are the differences between NumPy arrays and Python lists?
Answer- ->NumPy arrays support vectorized operations, making them faster than Python lists.

->They require less memory as they store homogeneous data types.

->NumPy provides built-in functions for mathematical operations, whereas Python lists require loops.

->Lists are more flexible as they can store different data types, while NumPy arrays store elements of the same type.
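
The differences above can be seen in a short sketch. The variable names are illustrative.

```python
import numpy as np

values_list = [1, 2, 3, 4]
values_arr = np.array(values_list)

# A list needs a loop or comprehension for element-wise math
doubled_list = [v * 2 for v in values_list]

# An array applies the operation to every element at once
doubled_arr = values_arr * 2

# On a list, * repeats the list instead of multiplying its elements
repeated_list = values_list * 2

# A list can mix types; an array coerces its elements to one dtype
mixed_list = [1, 'two', 3.0]
mixed_arr = np.array([1, 2.5])   # the integer is upcast to float64
print(mixed_arr.dtype)
```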

7. What is a heatmap, and when should it be used?
Answer- A heatmap is a graphical representation of data using colors to indicate values. It is useful for visualizing relationships between variables, correlation matrices, or patterns in large datasets.

8. What does the term “vectorized operation” mean in NumPy?
Answer- A vectorized operation in NumPy refers to performing operations on entire arrays without using explicit loops. This improves performance by leveraging low-level optimizations, making computations faster and more efficient.
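
To make the contrast concrete, here is a small sketch that converts some made-up Celsius readings to Fahrenheit, first with an explicit loop and then with a single vectorized expression. Both produce the same values.

```python
import numpy as np

# Hypothetical temperature readings in Celsius
celsius = np.array([0.0, 25.0, 37.5, 100.0])

# Loop version: one element at a time in Python
fahrenheit_loop = np.empty_like(celsius)
for i in range(len(celsius)):
    fahrenheit_loop[i] = celsius[i] * 9 / 5 + 32

# Vectorized version: the whole array in one expression
fahrenheit_vec = celsius * 9 / 5 + 32

print(np.allclose(fahrenheit_loop, fahrenheit_vec))
```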

9. How does Matplotlib differ from Plotly?
Answer- ->Matplotlib primarily produces static plots and gives fine-grained control over every plot element, though polished figures usually take more customization code.

->Plotly is an interactive visualization library that allows zooming, panning, and hover functionalities, making it more suitable for dashboards and web applications.

10. What is the significance of hierarchical indexing in Pandas?
Answer- Hierarchical indexing allows multiple levels of indexing in Pandas, enabling more flexible data representation and complex data analysis. It helps in organizing multi-dimensional data within a single DataFrame.
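
A small sketch of a two-level index, assuming made-up revenue figures indexed by year and quarter. The level names and values are illustrative.

```python
import pandas as pd

# Hypothetical data with two index levels
index = pd.MultiIndex.from_tuples(
    [('2023', 'Q1'), ('2023', 'Q2'), ('2024', 'Q1'), ('2024', 'Q2')],
    names=['Year', 'Quarter'],
)
revenue = pd.DataFrame({'Revenue': [10, 12, 15, 18]}, index=index)

# Select every row under one outer-level label
rows_2024 = revenue.loc['2024']

# Select a single value using both levels
q2_2023 = revenue.loc[('2023', 'Q2'), 'Revenue']

# Cross-section on the inner level: Q1 of every year
all_q1 = revenue.xs('Q1', level='Quarter')

print(rows_2024)
```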

11. What is the role of Seaborn’s pairplot() function?
Answer- The pairplot() function in Seaborn creates a grid of plots showing the pairwise relationships in a dataset: scatter plots off the diagonal and, by default, each variable's own distribution on the diagonal. It is useful for spotting patterns, distributions, and correlations between multiple numerical variables.

12. What is the purpose of the describe() function in Pandas?
Answer- The describe() function in Pandas provides summary statistics for a DataFrame. By default it covers the numerical columns and reports count, mean, standard deviation, min, max, and quartiles. It helps in quickly understanding the distribution of data.
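
For instance, on a small made-up column of scores, describe() returns a DataFrame whose row labels are the statistic names:

```python
import pandas as pd

# Hypothetical scores
scores = pd.DataFrame({'Score': [50, 60, 70, 80, 90]})

summary = scores.describe()
print(summary)

# Individual statistics can be looked up by row label
median_score = summary.loc['50%', 'Score']
```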

13. Why is handling missing data important in Pandas?
Answer- Handling missing data is crucial because missing values can lead to incorrect analysis and errors in calculations. Pandas provides functions like dropna(), fillna(), and interpolate() to handle missing values efficiently.
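
A minimal sketch of the three functions on a made-up Series with two gaps. Which strategy is appropriate depends on the data, so this is only an illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with missing values
readings = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

dropped = readings.dropna()            # remove the missing entries
filled = readings.fillna(0)            # replace them with a constant
interpolated = readings.interpolate()  # linear interpolation between neighbours

print(readings.isna().sum())
```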

14. What are the benefits of using Plotly for data visualization?
Answer- ->Interactivity: Allows zooming, hovering, and tooltips.

->Web integration: Easily integrates with dashboards and web applications.

->Wide range of plots: Supports 3D plots, maps, and animations.

->Customization: Offers high flexibility for styling visualizations.

15. How does NumPy handle multidimensional arrays?
Answer- NumPy provides the ndarray object to store and manipulate multi-dimensional arrays efficiently. It supports indexing, slicing, reshaping, and broadcasting to perform operations across multiple dimensions.
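
A short sketch of these operations on a 3D array. The shape (2, 3, 4) is chosen arbitrarily.

```python
import numpy as np

# 24 consecutive integers arranged as 2 blocks of 3 rows and 4 columns
cube = np.arange(24).reshape(2, 3, 4)
print(cube.ndim, cube.shape)

first_block = cube[0]          # indexing: shape (3, 4)
element = cube[1, 2, 3]        # a single value
column = cube[:, :, 0]         # slicing: first column of every block, shape (2, 3)
flat = cube.reshape(6, 4)      # reshaping: same data, new shape
axis_sum = cube.sum(axis=0)    # reduce across the first axis, shape (3, 4)
```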

16. What is the role of Bokeh in data visualization?
Answer- Bokeh is a Python library used for interactive and web-based visualizations. It allows users to create dynamic dashboards, plots, and graphs with real-time data updates.

17. Explain the difference between apply() and map() in Pandas.
Answer- ->apply() applies a function along an axis of a DataFrame (to each row or each column); it can also be called on a Series.

->map() is used for element-wise transformations on Series objects.
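
The sketch below contrasts the two on a small made-up DataFrame: map() transforms each value of one column, while apply() receives a whole column or a whole row at a time.

```python
import pandas as pd

# Hypothetical marks table
df = pd.DataFrame({'Name': ['alice', 'bob'],
                   'Math': [80, 90],
                   'Art': [70, 65]})

# map(): element-wise on a Series
df['Name'] = df['Name'].map(str.title)

# apply() with the default axis=0: the function receives each column
col_range = df[['Math', 'Art']].apply(lambda col: col.max() - col.min())

# apply() with axis=1: the function receives each row
df['Total'] = df[['Math', 'Art']].apply(lambda row: row['Math'] + row['Art'], axis=1)

print(df)
```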

18. What are some advanced features of NumPy?
Answer- ->Broadcasting: Enables operations on arrays of different shapes.

->Vectorized operations: Enhances speed by eliminating loops.

->Memory efficiency: Uses contiguous memory storage.

->Linear algebra and Fourier transform functions.

->Random number generation and statistical tools.

19. How does Pandas simplify time series analysis?
Answer- Pandas provides powerful time series functionality, including:

->Datetime indexing and resampling.

->Shifting and rolling window operations.

->Handling time zones and frequency conversion.
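
A brief sketch of these features on six days of made-up values. The dates, window size, and resampling frequency are illustrative choices.

```python
import pandas as pd

# Hypothetical daily values with a DatetimeIndex
dates = pd.date_range('2024-01-01', periods=6, freq='D')
daily = pd.Series([1, 2, 3, 4, 5, 6], index=dates)

# Resampling: total for each 2-day bin
two_day = daily.resample('2D').sum()

# Shifting: the previous day's value alongside each date
previous = daily.shift(1)

# Rolling window: 3-day moving average
rolling_mean = daily.rolling(window=3).mean()

# Time zones: attach UTC to the naive timestamps
localized = daily.tz_localize('UTC')

print(two_day)
```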

20. What is the role of a pivot table in Pandas?
Answer- A pivot table in Pandas helps in summarizing and analyzing data by grouping, aggregating, and rearranging it in a structured format. It is useful for data exploration and reporting.
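
As an illustration, the sketch below summarizes a made-up orders table with pd.pivot_table, putting regions on the rows, products on the columns, and summing sales in each cell.

```python
import pandas as pd

# Hypothetical orders
orders = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South', 'North'],
    'Product': ['Pen', 'Book', 'Pen', 'Book', 'Pen'],
    'Sales': [10, 20, 30, 40, 50],
})

# Group by Region and Product, then sum Sales into a grid
summary = pd.pivot_table(orders, values='Sales', index='Region',
                         columns='Product', aggfunc='sum')
print(summary)
```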

21. Why is NumPy’s array slicing faster than Python’s list slicing?
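Answer- NumPy’s basic array slicing is faster because it returns a view onto the original data instead of copying the elements, whereas slicing a Python list builds a new list. Additionally, NumPy arrays are stored in contiguous memory blocks, making operations more efficient. Because a view shares memory with its source, modifying the slice also modifies the original array; call copy() when an independent array is needed.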
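
The view behaviour can be checked with a short sketch. The names are illustrative, and np.shares_memory is used only to confirm whether two arrays overlap in memory.

```python
import numpy as np

arr = np.arange(10)
view = arr[2:5]          # basic slice: a view, no data copied
view[0] = 99             # writes through to arr[2]
shares = np.shares_memory(arr, view)

lst = list(range(10))
lst_slice = lst[2:5]     # list slice: a new list
lst_slice[0] = 99        # lst is unchanged

independent = arr[2:5].copy()   # explicit copy when a view is not wanted
print(arr[2], lst[2], shares)
```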

22. What are some common use cases for Seaborn?
Answer- ->Exploratory data analysis (EDA).

->Statistical visualizations like histograms and boxplots.

->Correlation heatmaps.

->Visualizing categorical data using bar plots and violin plots.

->Pair plots to analyze relationships between multiple variables.


1. How do you create a 2D NumPy array and calculate the sum of each row?
Answer- import numpy as np

# Creating a 2D NumPy array
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Calculating the sum of each row
row_sums = arr.sum(axis=1)
print(row_sums)

2. Write a Pandas script to find the mean of a specific column in a DataFrame.
Answer- import pandas as pd

# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Salary': [50000, 60000, 70000]}

df = pd.DataFrame(data)

# Finding the mean of the 'Salary' column
mean_salary = df['Salary'].mean()
print(mean_salary)
Output: 60000.0

3. Create a scatter plot using Matplotlib.
Answer- import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 40]

# Creating the scatter plot
plt.scatter(x, y, color='blue', marker='o')

# Adding labels and title
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot Example')

# Show plot
plt.show()

4. How do you calculate the correlation matrix using Seaborn and visualize it
with a heatmap?
Answer- import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Creating a sample DataFrame
data = {'A': np.random.rand(10),
        'B': np.random.rand(10),
        'C': np.random.rand(10)}

df = pd.DataFrame(data)

# Calculating the correlation matrix
corr_matrix = df.corr()

# Visualizing with a heatmap
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix Heatmap')
plt.show()

5. Generate a bar plot using Plotly.
Answer- import plotly.express as px
import pandas as pd

# Sample data
data = {'Category': ['A', 'B', 'C', 'D'],
        'Values': [10, 20, 15, 25]}

df = pd.DataFrame(data)

# Creating the bar plot
fig = px.bar(df, x='Category', y='Values', title='Bar Plot Example')

# Show plot
fig.show()

6. Create a DataFrame and add a new column based on an existing column.
Answer- import pandas as pd

# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Salary': [50000, 60000, 70000]}

df = pd.DataFrame(data)

# Adding a new column with a 10% bonus on Salary
df['Bonus'] = df['Salary'] * 0.1

print(df)
Output:
      Name  Salary   Bonus
0    Alice   50000  5000.0
1      Bob   60000  6000.0
2  Charlie   70000  7000.0

7. Write a program to perform element-wise multiplication of two NumPy arrays.
Answer- import numpy as np

# Creating two NumPy arrays
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([5, 6, 7, 8])

# Performing element-wise multiplication
result = arr1 * arr2

print(result)
Output: [ 5 12 21 32]

8. Create a line plot with multiple lines using Matplotlib.
Answer- Line Plot with Multiple Lines using Matplotlib

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y1 = [1, 4, 9, 16, 25]
y2 = [2, 4, 6, 8, 10]

plt.plot(x, y1, label='Line 1')
plt.plot(x, y2, label='Line 2')
plt.legend()
plt.show()

9. Generate a Pandas DataFrame and filter rows where a column value is greater
than a threshold.
Answer- Filter Rows in Pandas DataFrame

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32]}
df = pd.DataFrame(data)

# Filter rows where Age is greater than 30
filtered_df = df[df['Age'] > 30]
print(filtered_df)

10. Create a histogram using Seaborn to visualize a distribution.
Answer- Histogram using Seaborn

import seaborn as sns
import matplotlib.pyplot as plt

# Create a sample dataset
data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]

# Create a histogram
sns.histplot(data, bins=5)
plt.show()

11. Perform matrix multiplication using NumPy.
Answer- Matrix Multiplication using NumPy

import numpy as np

# Create two sample matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Perform matrix multiplication
C = np.matmul(A, B)
print(C)

12. Use Pandas to load a CSV file and display its first 5 rows.
Answer- Load CSV File using Pandas

import pandas as pd

# Load the CSV file
df = pd.read_csv('data.csv')

# Display the first 5 rows
print(df.head())

13. Create a 3D scatter plot using Plotly.
Answer- 3D Scatter Plot using Plotly

import plotly.graph_objects as go

# Create sample data
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
z = [10, 8, 6, 4, 2]

# Create a 3D scatter plot
fig = go.Figure(data=[go.Scatter3d(x=x, y=y, z=z, mode='markers')])
fig.show()



