1. What is NumPy, and why is it widely used in Python?
NumPy is a Python library used for numerical computing. It provides powerful data structures like the multidimensional array (ndarray) and functions for performing efficient mathematical operations. It's widely used due to:

High performance (vectorized operations in C)

Convenient syntax

Extensive functionality (e.g., linear algebra, FFT, random number generation)

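2. How does broadcasting work in NumPy?
Broadcasting lets NumPy perform arithmetic on arrays of different but compatible shapes. Shapes are compared from the trailing dimension backward, and two dimensions are compatible when they are equal or one of them is 1. The smaller array is then treated as if it were stretched to match, without copying data. This simplifies code and improves performance by avoiding explicit Python loops.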
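
As a minimal sketch of broadcasting (the array names `col` and `row` are made up for illustration), a column of shape (3, 1) and a row of shape (1, 4) combine into a (3, 4) result:

```python
import numpy as np

# Hypothetical example arrays: a (3, 1) column and a (1, 4) row.
col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([[1, 2, 3, 4]])      # shape (1, 4)

# Each size-1 dimension is stretched to match the other operand.
table = col + row                   # shape (3, 4)

print(table.shape)
print(table)
```

Neither input is physically tiled in memory; only the result array is allocated.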

3. What is a Pandas DataFrame?
A Pandas DataFrame is a two-dimensional, labeled data structure with columns of potentially different types (like a table in SQL or Excel). It is ideal for data manipulation and analysis in Python.

4. Explain the use of the groupby() method in Pandas
The groupby() method is used to split data into groups based on a key (e.g., a column), apply a function (like mean or sum), and combine the results. It’s fundamental for data aggregation and summarization.
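
The split-apply-combine pattern can be sketched as follows, assuming a small made-up `sales` table with `region` and `revenue` columns:

```python
import pandas as pd

# Hypothetical data for illustration only.
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "revenue": [100, 200, 300, 400],
})

# Split by region, apply sum to each group, combine into one Series.
totals = sales.groupby("region")["revenue"].sum()

print(totals)
```

The result is indexed by the group key, so `totals["East"]` gives the total for that region.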

5. Why is Seaborn preferred for statistical visualizations?
Seaborn is preferred because it builds on Matplotlib and provides:

High-level interface for drawing attractive and informative plots

Built-in themes and color palettes

Functions for visualizing statistical relationships (e.g., regression plots, distributions)

6. What are the differences between NumPy arrays and Python lists?
NumPy arrays are fixed-type, more memory-efficient, and support vectorized operations.

Python lists are flexible (can hold mixed types) but slower for numerical tasks.
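
A short sketch of how the two behave differently under the same operator (variable names are illustrative):

```python
import numpy as np

values_list = [1, 2, 3]
values_arr = np.array(values_list)

# On a list, * repeats the sequence.
doubled_list = values_list * 2      # [1, 2, 3, 1, 2, 3]

# On an array, * is applied to every element.
doubled_arr = values_arr * 2        # array([2, 4, 6])

# A list may mix types; an array stores a single dtype.
mixed = [1, "two", 3.0]

print(doubled_list)
print(doubled_arr)
print(values_arr.dtype)
```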

7. What is a heatmap, and when should it be used?
A heatmap is a graphical representation of data where values are depicted with color. It is used to:

Show correlation matrices

Visualize intensities or frequencies

Highlight patterns in data tables

8. What does the term “vectorized operation” mean in NumPy?
A vectorized operation applies an operation to entire arrays at once instead of looping over elements in Python. The per-element loop runs in compiled code, which makes the operation both faster and more concise.
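
To illustrate, the sketch below converts made-up Celsius readings to Fahrenheit twice: once with a Python-level loop and once with a single vectorized expression. Both give the same values.

```python
import numpy as np

# Hypothetical temperature readings in Celsius.
temps_c = np.array([0.0, 25.0, 100.0])

# Element by element, in a Python loop.
temps_f_loop = np.array([c * 9 / 5 + 32 for c in temps_c])

# Vectorized: one expression over the whole array.
temps_f_vec = temps_c * 9 / 5 + 32

print(temps_f_vec)
```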

9. How does Matplotlib differ from Plotly?
Matplotlib is primarily static and traditional, suitable for publications.

Plotly is interactive and web-based, ideal for dashboards and exploratory data analysis.

10. What is the significance of hierarchical indexing in Pandas?
Hierarchical indexing (MultiIndex) allows multiple levels of indexing on rows or columns, enabling:

Complex data structures

Easy slicing and dicing

Reshaping operations like pivoting
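
A minimal sketch of these ideas, assuming a made-up revenue series indexed by year and quarter:

```python
import pandas as pd

# Hypothetical two-level index: (year, quarter).
index = pd.MultiIndex.from_tuples(
    [("2023", "Q1"), ("2023", "Q2"), ("2024", "Q1"), ("2024", "Q2")],
    names=["year", "quarter"],
)
revenue = pd.Series([10, 12, 15, 18], index=index, name="revenue")

# Slice on the outer level: all quarters of 2024.
year_2024 = revenue.loc["2024"]

# Reshape: move the inner level into columns (years x quarters).
wide = revenue.unstack("quarter")

print(year_2024)
print(wide)
```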

11. What is the role of Seaborn’s pairplot() function?
The pairplot() function creates a grid of scatter plots for each pair of numeric variables, with histograms or KDEs of each variable on the diagonal. It’s useful for exploring relationships in multivariate datasets.

12. What is the purpose of the describe() function in Pandas?
The describe() function provides a quick statistical summary of a DataFrame or Series. By default it covers numeric columns and reports count, mean, std, min, quartiles, and max; passing include='all' also summarizes non-numeric columns (count, unique, top, freq).
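
The difference between the default call and include='all' can be sketched on a small made-up DataFrame:

```python
import pandas as pd

# Hypothetical data with one numeric and one text column.
df = pd.DataFrame({
    "age": [25, 30, 35],
    "city": ["Paris", "Rome", "Paris"],
})

# Default: numeric columns only.
numeric_summary = df.describe()

# include="all": adds unique/top/freq rows for non-numeric columns.
full_summary = df.describe(include="all")

print(numeric_summary)
print(full_summary)
```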

13. Why is handling missing data important in Pandas?
Handling missing data is essential to:

Ensure the accuracy of analyses

Avoid errors in operations

Maintain model performance in machine learning tasks
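
As a rough sketch of common ways to handle missing values (the `scores` table is made up, and filling with the column mean is only one possible strategy):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing score.
scores = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "score": [85.0, np.nan, 90.0],
})

# Count missing values in the column.
missing_count = scores["score"].isna().sum()

# Option 1: drop rows where the score is missing.
dropped = scores.dropna(subset=["score"])

# Option 2: fill the gap with the mean of the observed values.
filled = scores.fillna({"score": scores["score"].mean()})

print(missing_count)
print(dropped)
print(filled)
```

Which option is appropriate depends on the analysis; dropping loses rows, while filling introduces assumed values.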

14. What are the benefits of using Plotly for data visualization?
Plotly offers:

Interactive plots with zoom/pan/hover

Easy integration with web apps (Dash)

Rich chart types (e.g., 3D, choropleths)

Export to HTML and dashboards

15. How does NumPy handle multidimensional arrays?
NumPy handles multidimensional arrays (ndarrays) with:

N-dimensional shape and indexing

Broadcasting rules

Efficient storage and operations using strides and vectorization
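
A small sketch of shape, indexing, and strides on a 3-D array (the name `cube` is illustrative):

```python
import numpy as np

# 24 values arranged as 2 blocks x 3 rows x 4 columns.
cube = np.arange(24).reshape(2, 3, 4)

# One index per dimension selects a single element.
element = cube[1, 2, 3]

# Slices can be mixed with indices: first column of every row.
first_cols = cube[:, :, 0]          # shape (2, 3)

# Strides are the byte steps needed to move along each axis.
print(cube.shape, cube.strides)

# Transposing reverses the axes by changing strides, not by copying.
transposed = cube.T                 # shape (4, 3, 2)
print(np.shares_memory(cube, transposed))
```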

16. What is the role of Bokeh in data visualization?
Bokeh is a Python library for creating interactive, browser-based visualizations. It excels in:

Streaming and real-time data

Embedding plots in web apps

High-performance dashboards

17. Explain the difference between apply() and map() in Pandas
**map()** is used for element-wise transformations on Series (typically with a dict or function).

**apply()** is more flexible, used on Series or DataFrames to apply a function across rows or columns.
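
The contrast can be sketched on a made-up grades table, with the column names and label mapping chosen purely for illustration:

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "grade": ["a", "b", "a"],
    "math": [90, 70, 80],
    "art": [60, 95, 85],
})

# map(): element-wise lookup on a single Series using a dict.
grade_labels = df["grade"].map({"a": "pass", "b": "retake"})

# apply() on a DataFrame: the function receives one column at a time.
column_range = df[["math", "art"]].apply(lambda col: col.max() - col.min())

# apply() with axis=1: the function receives one row at a time.
row_total = df[["math", "art"]].apply(lambda row: row["math"] + row["art"], axis=1)

print(grade_labels)
print(column_range)
print(row_total)
```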

18. What are some advanced features of NumPy?
Advanced NumPy features include:

Broadcasting for efficient operations on arrays of different shapes

Universal functions (ufuncs) for element-wise operations

Masked arrays to handle missing or invalid data

Structured arrays for handling complex records

Memory mapping for working with large datasets without loading them fully into memory

Linear algebra routines, FFT, and random number generation tools
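
Two of these features, masked arrays and structured arrays, can be sketched briefly. The sentinel value -999.0 and the field names are assumptions made for this example:

```python
import numpy as np

# Masked array: the second reading is flagged as invalid and ignored.
readings = np.ma.masked_array([1.0, -999.0, 3.0], mask=[False, True, False])
mean_valid = readings.mean()        # mean of the unmasked values only

# Structured array: each record has a named, typed field.
people = np.array(
    [("Alice", 25), ("Bob", 30)],
    dtype=[("name", "U10"), ("age", "i4")],
)
ages = people["age"]                # access a field across all records

print(mean_valid)
print(ages)
```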

19. How does Pandas simplify time series analysis?
Pandas offers rich time series support, including:

Datetime indexing and resampling

Frequency conversion (e.g., daily to monthly)

Rolling windows for moving averages and statistics

Date offset and shifting operations

Time zone handling

This allows analysts to handle time-stamped data with minimal effort.
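
A minimal sketch of resampling, rolling windows, and shifting on a made-up daily series:

```python
import pandas as pd

# Hypothetical daily values with a DatetimeIndex.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
daily = pd.Series([1, 2, 3, 4, 5, 6], index=dates)

# Frequency conversion: aggregate into 2-day bins.
two_day_totals = daily.resample("2D").sum()

# Rolling window: 3-day moving average (first two values are NaN).
rolling_mean = daily.rolling(window=3).mean()

# Shifting: move values forward by one period.
shifted = daily.shift(1)

print(two_day_totals)
print(rolling_mean)
print(shifted)
```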



20. What is the role of a pivot table in Pandas?
A pivot table:

Summarizes data by grouping and aggregating (e.g., sum, mean, count).

Allows multi-dimensional grouping (like in Excel pivot tables).

Helps reshape data for analysis or visualization.

Makes it easier to spot trends and patterns by cross-tabulating data.
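
For illustration, the sketch below cross-tabulates a made-up `sales` table by region and product:

```python
import pandas as pd

# Hypothetical data.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "product": ["A", "B", "A", "B"],
    "revenue": [100, 150, 200, 250],
})

# Rows are regions, columns are products, cells are summed revenue.
table = sales.pivot_table(
    index="region", columns="product", values="revenue", aggfunc="sum"
)

print(table)
```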

21. Why is NumPy’s array slicing faster than Python’s list slicing?
Contiguous memory layout: Arrays are stored in a compact, typed memory block.

Slicing returns views, not copies — no data duplication.

Operations in compiled C code, bypassing Python’s interpreter overhead.

Fixed data type: Reduces overhead from dynamic typing present in lists.
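
The view-versus-copy difference can be checked directly. In this small sketch, modifying a NumPy slice changes the original array, while modifying a list slice does not:

```python
import numpy as np

# NumPy: a basic slice is a view onto the same memory.
arr = np.arange(5)
view = arr[1:4]
view[0] = 99
print(arr)                          # the original array is modified

# Python list: a slice is a new list holding copied references.
lst = [0, 1, 2, 3, 4]
copy = lst[1:4]
copy[0] = 99
print(lst)                          # the original list is unchanged

print(np.shares_memory(arr, view))
```

When an independent array is needed, calling `.copy()` on the slice avoids unintended changes to the original.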

22. What are some common use cases for Seaborn?
Exploring distributions: Histograms, KDEs, rug plots.

Comparing groups: Box plots, violin plots, bar plots.

Visualizing relationships: Scatter plots with regression lines, line plots.

Heatmaps: Correlation matrices or pivoted tables.

Multi-plot grids: FacetGrid and pairplot for comparing subsets or variables.



In [None]:
import numpy as np

# Create a 2D array
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Calculate sum of each row
row_sums = np.sum(arr, axis=1)

print("Row sums:", row_sums)


import pandas as pd

# Example DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Salary': [50000, 60000, 70000]}
df = pd.DataFrame(data)

# Find mean of the 'Salary' column
mean_salary = df['Salary'].mean()

print("Mean Salary:", mean_salary)


In [None]:
import matplotlib.pyplot as plt

# Example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 8, 7]

# Create scatter plot
plt.scatter(x, y)
plt.title("Scatter Plot Example")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.grid(True)
plt.show()


In [None]:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Example DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 4, 3, 2, 1],
    'C': [2, 3, 2, 3, 2]
})

# Calculate correlation matrix
corr = df.corr()

# Plot heatmap
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.title("Correlation Matrix Heatmap")
plt.show()


In [None]:
import plotly.express as px

data = {'Fruit': ['Apples', 'Oranges', 'Bananas'], 'Count': [10, 15, 7]}
fig = px.bar(data, x='Fruit', y='Count', title="Fruit Count")
fig.show()


In [None]:
import pandas as pd

# Start with a small DataFrame
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
# Add a new column derived from an existing one
df['Age in 5 Years'] = df['Age'] + 5

print(df)


In [None]:
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Arrays of the same shape are multiplied element by element
result = a * b
print("Element-wise multiplication:", result)


In [None]:
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y1 = [1, 4, 9, 16]
y2 = [1, 2, 3, 4]

plt.plot(x, y1, label='Squared')
plt.plot(x, y2, label='Linear')
plt.legend()
plt.title("Multiple Line Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.grid(True)
plt.show()


In [None]:
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Score': [85, 60, 90]})
# Keep only the rows where Score is greater than 80 (boolean mask)
filtered_df = df[df['Score'] > 80]

print(filtered_df)


In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

data = [10, 20, 20, 30, 40, 50, 50, 50, 60]
sns.histplot(data, bins=5, kde=True)
plt.title("Histogram with KDE")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()


In [None]:
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# For 2-D arrays, the @ operator is equivalent to np.dot(A, B)
result = A @ B
print("Matrix multiplication result:\n", result)


In [None]:
import pandas as pd

# Replace 'your_file.csv' with the actual file path
df = pd.read_csv('your_file.csv')

print(df.head())


In [None]:
import plotly.express as px
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4],
    'y': [10, 11, 12, 13],
    'z': [100, 110, 120, 130]
})

fig = px.scatter_3d(df, x='x', y='y', z='z', title="3D Scatter Plot")
fig.show()
