

### Theoretical Questions

#1. What is NumPy, and why is it widely used in Python?
"""NumPy is a Python library for numerical computing, providing support for large, multidimensional arrays and matrices,
 along with mathematical functions to operate on them. It is widely used due to its efficiency, vectorized operations, broadcasting capabilities,
 and seamless integration with libraries like Pandas, SciPy, and Matplotlib for scientific computing and data analysis."""

#2. How does broadcasting work in NumPy?
"""Broadcasting lets NumPy operate on arrays of different shapes by virtually stretching the smaller array
 to match the larger one, without copying data. Shapes are compared from the trailing dimension backward, and two dimensions
 are compatible when they are equal or one of them is 1. For example, adding a scalar to an array applies the scalar to every element,
 and adding a 1D array of length n to a 2D array of shape (m, n) reuses the 1D array for each row."""
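
"""A minimal sketch of the broadcasting cases described above. The array names and values are illustrative assumptions,
 not part of the original question."""

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])     # shape (3,)
col = np.array([[100], [200]])   # shape (2, 1)

plus_scalar = matrix + 1         # the scalar is applied to every element
plus_row = matrix + row          # the row is reused for each of the 2 rows
plus_col = matrix + col          # the column is reused for each of the 3 columns

print(plus_scalar)
print(plus_row)
print(plus_col)
```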

#3. What is a Pandas DataFrame?
"""A Pandas DataFrame is a 2D, tabular data structure with labeled rows and columns, similar to a spreadsheet or SQL table.
 It supports heterogeneous data types and provides powerful methods for data manipulation, filtering, grouping, and analysis."""

#4. Explain the use of the groupby() method in Pandas.
"""The `groupby()` method in Pandas splits data into groups by one or more columns (or other keys), so that an aggregation
(e.g., mean, sum, count), transformation, or filter can be applied to each group and the results combined. It’s used for summarizing data, such as calculating average sales by region or counting occurrences by category."""
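
"""A small sketch of the "average sales by region" case mentioned above. The `sales` DataFrame and its values are made up
 for illustration."""

```python
import pandas as pd

# Hypothetical sales records
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "East"],
    "amount": [100, 150, 200, 50, 300],
})

# Split by region, then aggregate the amount column within each group
mean_by_region = sales.groupby("region")["amount"].mean()
count_by_region = sales.groupby("region")["amount"].count()

print(mean_by_region)
print(count_by_region)
```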

#5. Why is Seaborn preferred for statistical visualizations?
"""Seaborn is preferred for statistical visualizations because it builds on Matplotlib, offering a high-level interface,
  aesthetically pleasing defaults, and specialized functions like `pairplot`, `heatmap`, and `boxplot` for visualizing statistical relationships,
  distributions, and categorical data."""

#6. What are the differences between NumPy arrays and Python lists?
"""NumPy arrays are fixed-size, homogeneous (same data type), memory-efficient, and support vectorized operations and multidimensional structures.
 Python lists are dynamic, heterogeneous (mixed data types), less memory-efficient, and lack built-in vectorized operations; they can represent multidimensional data only through nesting."""
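
"""A brief sketch contrasting the two containers. The variable names are illustrative; the point is that `*` repeats a list
 but multiplies an array element-wise, and that an array upcasts mixed numeric input to a single dtype."""

```python
import numpy as np

py_list = [1, "two", 3.0]            # a list may mix types freely
doubled_list = [1, 2, 3] * 2         # list repetition, not arithmetic

arr = np.array([1, 2, 3])
doubled_arr = arr * 2                # element-wise multiplication

mixed = np.array([1, 2.5, 3])        # integers are upcast so all elements share one dtype

print(doubled_list)
print(doubled_arr)
print(mixed.dtype)
```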

#7. What is a heatmap, and when should it be used?
"""A heatmap is a visualization where data values are represented as colors in a matrix. It’s used to show relationships
 (e.g., correlations between variables), intensity of values (e.g., frequency in a dataset), or patterns in data, such as in correlation matrices or geographic data analysis."""

#8. What does the term “vectorized operation” mean in NumPy?
"""A vectorized operation in NumPy is an operation applied element-wise to entire arrays without explicit loops, leveraging optimized C-based computations for speed.
 For example, `array1 + array2` adds corresponding elements of two arrays efficiently."""
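
"""A short sketch comparing an explicit Python loop with the vectorized form of the same addition. The array contents
 are arbitrary example values."""

```python
import numpy as np

array1 = np.arange(5)        # [0, 1, 2, 3, 4]
array2 = np.arange(5, 10)    # [5, 6, 7, 8, 9]

# Explicit loop: one Python-level addition per element
loop_result = [a + b for a, b in zip(array1, array2)]

# Vectorized: a single expression, with the loop running in compiled code
vectorized_result = array1 + array2

print(loop_result)
print(vectorized_result)
```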

#9. How does Matplotlib differ from Plotly?
"""Matplotlib is a highly customizable plotting library that produces primarily static figures. It is ideal for publication-quality plots but requires more code for interactivity.
 Plotly is an interactive, web-based library supporting dynamic visualizations, 3D plots, and dashboards, making it easier for online sharing and interactive applications."""

#10. What is the significance of hierarchical indexing in Pandas?
"""Hierarchical indexing (MultiIndex) in Pandas allows multiple levels of row or column indices, enabling complex data organization and analysis.
 It’s useful for grouping data by multiple categories, such as sales by year and region, facilitating advanced slicing and aggregation."""
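
"""A minimal sketch of the "sales by year and region" case above, using a two-level MultiIndex. The figures are invented
 for illustration."""

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(2023, "North"), (2023, "South"), (2024, "North"), (2024, "South")],
    names=["year", "region"],
)
sales = pd.Series([100, 150, 120, 180], index=index, name="sales")

sales_2024 = sales.loc[2024]                       # select on the outer level
north_sales = sales.xs("North", level="region")    # cross-section on the inner level
total_by_year = sales.groupby(level="year").sum()  # aggregate over one level

print(sales_2024)
print(north_sales)
print(total_by_year)
```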

#11. What is the role of Seaborn’s pairplot() function?
"""Seaborn’s pairplot() creates a grid of scatter plots for pairwise relationships between variables,
 with histograms or KDE plots on the diagonal. It’s used to explore correlations, distributions, and relationships in a dataset,
  especially during exploratory data analysis."""

#12. What is the purpose of the describe() function in Pandas?
"""The describe() function in Pandas provides summary statistics for a DataFrame or Series. By default it covers numeric columns,
reporting count, mean, standard deviation, min, max, and quartiles; passing include='all' extends it to other column types. It’s used for quick data exploration and understanding the distribution of data."""
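
"""A quick sketch of describe() on a tiny DataFrame with one numeric and one text column. The data is illustrative;
 with default arguments only the numeric column appears in the summary."""

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "score": [85, 90, 95],
})

summary = df.describe()   # default: numeric columns only

print(summary)
```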

#13. Why is handling missing data important in Pandas?
"""Handling missing data in Pandas is crucial to avoid biased results, errors in calculations, or model failures.
 Pandas provides methods like isna() to detect missing values and fillna(), dropna(), and interpolate() to manage them, ensuring accurate data analysis and modeling."""
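
"""A small sketch of the three strategies named above applied to one column with gaps. The `temp` column and its values
 are assumptions for the example; which strategy is appropriate depends on the data."""

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"temp": [20.0, np.nan, 24.0, np.nan, 28.0]})

missing_count = df["temp"].isna().sum()           # how many values are missing
dropped = df.dropna()                             # remove rows with missing values
filled = df["temp"].fillna(df["temp"].mean())     # replace gaps with the column mean
interpolated = df["temp"].interpolate()           # linear interpolation between neighbors

print(missing_count)
print(filled.tolist())
print(interpolated.tolist())
```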

#14. What are the benefits of using Plotly for data visualization?
"""Plotly offers interactive plots with zoom, pan, and hover capabilities, web-based visualizations for easy sharing,
 support for 3D plots and dashboards, and cross-platform compatibility with Python, R, and JavaScript, making it ideal for dynamic and online applications."""

#15. How does NumPy handle multidimensional arrays?
"""NumPy represents a multidimensional array (ndarray) as a single block of memory plus metadata (shape, strides, and dtype)
 that maps an N-dimensional index to an offset in that block. Newly created arrays are contiguous in row-major (C) order by default,
 which enables efficient indexing, slicing, and numerical operations. The attributes shape, ndim, and dtype describe the layout."""
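
"""A short sketch inspecting the attributes mentioned above on a 3D array. The shape (2, 3, 4) and the explicit int64
 dtype are arbitrary choices for the example."""

```python
import numpy as np

arr = np.arange(24, dtype=np.int64).reshape(2, 3, 4)

print(arr.ndim)                     # number of dimensions
print(arr.shape)                    # size along each dimension
print(arr.dtype)                    # element type shared by all entries
print(arr.strides)                  # bytes to step along each dimension
print(arr.flags["C_CONTIGUOUS"])    # stored as one row-major block

element = arr[1, 2, 3]              # index along all three axes
sub = arr[:, 0, :]                  # first row of every 3x4 block
print(element, sub.shape)
```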

#16. What is the role of Bokeh in data visualization?
"""Bokeh is a Python library for creating interactive, web-based visualizations. It’s ideal for building dashboards,
 real-time data applications, and complex plots rendered in browsers using JavaScript, supporting dynamic and scalable visualizations."""

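#17. Explain the difference between apply() and map() in Pandas.
"""`apply()` works on both DataFrames and Series. On a DataFrame it passes each column (axis=0) or each row (axis=1) to the function as a Series,
which suits aggregations and multi-column logic; on a Series it calls the function on each value. `Series.map()` transforms a Series element by element
and accepts a function, a dictionary, or another Series as the mapping, making it ideal for simple lookups and substitutions.
Since pandas 2.1, `DataFrame.map()` (formerly `applymap()`) performs the same element-wise work across a whole DataFrame."""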
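
"""A minimal sketch contrasting apply() on a DataFrame with map() on a Series. The column names, values, and label
 dictionary are illustrative assumptions."""

```python
import pandas as pd

df = pd.DataFrame({"math": [80, 90], "art": [70, 60]})

# apply() along columns (default axis=0): the function receives each column as a Series
col_max = df.apply(lambda column: column.max())

# apply() along rows (axis=1): the function receives each row as a Series
row_total = df.apply(lambda row: row["math"] + row["art"], axis=1)

# map() on a Series with a dictionary lookup
codes = pd.Series(["a", "b", "a"])
labels = codes.map({"a": "excellent", "b": "good"})

# map() on a Series with a function
squared = df["math"].map(lambda x: x ** 2)

print(col_max)
print(row_total)
print(labels)
print(squared)
```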

#18. What are some advanced features of NumPy?
"""NumPy’s advanced features include broadcasting for shape-compatible operations, advanced
 indexing (boolean, fancy indexing), linear algebra functions (np.linalg), random number generation (np.random),
 and Fast Fourier Transforms (FFT) for signal processing."""
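
"""A compact sketch touching each feature listed above. The input values and the seed are arbitrary; the random samples
 are printed without an expected value because they depend on the generator."""

```python
import numpy as np

data = np.array([3, -1, 4, -1, 5])

positives = data[data > 0]      # boolean indexing
picked = data[[0, 2, 4]]        # fancy (integer-array) indexing

# Linear algebra: solve a @ x = b
a = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(a, b)

# Random number generation with a seeded generator
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=3)

# FFT of a constant signal: all energy lands in the zero-frequency bin
spectrum = np.fft.fft(np.ones(4))

print(positives, picked, x, samples, spectrum)
```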

#19. How does Pandas simplify time series analysis?
"""Pandas simplifies time series analysis with datetime indexing, time-based slicing, resampling (e.g., daily to monthly),
 rolling/expanding windows for moving calculations, and time zone handling, making it easy to manipulate and analyze temporal data."""
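
"""A brief sketch of the time series tools listed above on six days of made-up data. The dates, values, window size,
 and resampling frequency are assumptions chosen to keep the output easy to check."""

```python
import pandas as pd

dates = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series([1, 2, 3, 4, 5, 6], index=dates)

first_three = ts.loc["2024-01-01":"2024-01-03"]   # time-based slicing (end label is included)
two_day_sum = ts.resample("2D").sum()             # resample daily data into 2-day bins
rolling_mean = ts.rolling(window=3).mean()        # 3-day moving average
localized = ts.tz_localize("UTC")                 # attach a time zone to the index

print(first_three)
print(two_day_sum)
print(rolling_mean)
print(localized.index.tz)
```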

#20. What is the role of a pivot table in Pandas?
"""A pivot table in Pandas (pivot_table()) reshapes data and aggregates values based on specified rows, columns, and values.
 It’s used for summarizing data, such as calculating average sales by region and product category."""
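
"""A small sketch of the "average sales by region and product category" case above. The `sales` DataFrame is hypothetical."""

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "North"],
    "category": ["Toys", "Books", "Toys", "Books", "Toys"],
    "amount": [100, 200, 150, 50, 300],
})

# Rows are regions, columns are categories, cells hold the mean amount
pivot = sales.pivot_table(index="region", columns="category",
                          values="amount", aggfunc="mean")

print(pivot)
```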

#21. Why is NumPy’s array slicing faster than Python’s list slicing?
"""NumPy’s basic slicing is faster mainly because it returns a view onto the original buffer: only new shape, stride, and offset
  metadata is created, so the cost does not grow with the slice length. Slicing a Python list builds a new list and copies a reference
  for every element. Contiguous storage with a fixed data type additionally makes later operations on the slice efficient."""
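
"""A minimal sketch showing that a NumPy slice shares memory with its source array, while a list slice is an independent
 copy. The variable names are illustrative."""

```python
import numpy as np

arr = np.arange(10)
arr_slice = arr[2:5]            # a view: no element data is copied
arr_slice[0] = 99               # writing through the view changes the original

lst = list(range(10))
lst_slice = lst[2:5]            # a new list holding copied references
lst_slice[0] = 99               # the original list is unaffected

print(arr[2], np.shares_memory(arr, arr_slice))
print(lst[2])
```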

#22. What are some common use cases for Seaborn?
"""Common use cases for Seaborn include visualizing distributions (histograms, KDE plots), exploring relationships (scatter plots, pairplots),
 creating heatmaps for correlations, and plotting categorical data (box plots, violin plots, bar plots) for statistical analysis."""



### Practical Questions

#1. How do you create a 2D NumPy array and calculate the sum of each row?


import numpy as np
array_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row_sums = np.sum(array_2d, axis=1)

print("2D Array:\n", array_2d)
print("Row Sums:", row_sums)



#2. Write a Pandas script to find the mean of a specific column in a DataFrame.


import pandas as pd


data = {'name': ['Alice', 'Bob', 'Charlie'], 'score': [85, 90, 95]}
df = pd.DataFrame(data)

mean_score = df['score'].mean()

print("DataFrame:\n", df)
print("Mean of score column:", mean_score)


#3. Create a scatter plot using Matplotlib.

import matplotlib.pyplot as plt


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

plt.scatter(x, y, color='blue', label='Data Points')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot')
plt.legend()
plt.show()


#4. How do you calculate the correlation matrix using Seaborn and visualize it with a heatmap?

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = {'A': [1, 2, 3, 4, 5], 'B': [2, 4, 5, 4, 5], 'C': [5, 4, 3, 2, 1]}
df = pd.DataFrame(data)

# Pandas computes the correlation matrix; Seaborn only draws it
corr = df.corr()

sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Matrix Heatmap')
plt.show()


#5. Generate a bar plot using Plotly.

import plotly.graph_objects as go


categories = ['A', 'B', 'C']
values = [10, 20, 15]


fig = go.Figure(data=[go.Bar(x=categories, y=values, marker_color='skyblue')])
fig.update_layout(title='Bar Plot', xaxis_title='Categories', yaxis_title='Values')
fig.show()


#6. Create a DataFrame and add a new column based on an existing column.

import pandas as pd

data = {'score': [85, 90, 95]}
df = pd.DataFrame(data)


df['grade'] = df['score'].apply(lambda x: 'A' if x >= 90 else 'B')

print("DataFrame with new column:\n", df)


#7. Write a program to perform element-wise multiplication of two NumPy arrays.
import numpy as np


array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])


result = array1 * array2

print("Array 1:", array1)
print("Array 2:", array2)
print("Element-wise multiplication:", result)



#8. Create a line plot with multiple lines using Matplotlib.

import matplotlib.pyplot as plt


x = [1, 2, 3, 4]
y1 = [10, 20, 25, 30]
y2 = [15, 25, 20, 10]


plt.plot(x, y1, label='Line 1', color='blue')
plt.plot(x, y2, label='Line 2', color='red')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Multiple Line Plot')
plt.legend()
plt.show()


#9. Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold.

import pandas as pd


data = {'name': ['Alice', 'Bob', 'Charlie'], 'score': [85, 90, 95]}
df = pd.DataFrame(data)


filtered_df = df[df['score'] > 88]

print("Filtered DataFrame:\n", filtered_df)

#10. Create a histogram using Seaborn to visualize a distribution.

import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt


data = np.random.randn(1000)

sns.histplot(data, bins=30, kde=True, color='purple')
plt.title('Histogram with KDE')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()


#11. Perform matrix multiplication using NumPy.

import numpy as np
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])


result = matrix1 @ matrix2  # for 2D arrays, same result as np.dot(matrix1, matrix2)

print("Matrix 1:\n", matrix1)
print("Matrix 2:\n", matrix2)
print("Matrix Multiplication Result:\n", result)


#12. Use Pandas to load a CSV file and display its first 5 rows.

import pandas as pd

# Write a small sample CSV first so the example is self-contained
data = {'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'age': [25, 30, 35, 40, 45]}
df = pd.DataFrame(data)
df.to_csv('sample.csv', index=False)

# Load the CSV back and show the first 5 rows (head() defaults to n=5)
df = pd.read_csv('sample.csv')
print("First 5 rows:\n", df.head())


#13. Create a 3D scatter plot using Plotly.

import plotly.graph_objects as go
import numpy as np


x = np.random.rand(50)
y = np.random.rand(50)
z = np.random.rand(50)


fig = go.Figure(data=[go.Scatter3d(x=x, y=y, z=z, mode='markers',
                                   marker=dict(size=5, color=z, colorscale='Viridis'))])
fig.update_layout(title='3D Scatter Plot', scene=dict(xaxis_title='X', yaxis_title='Y', zaxis_title='Z'))
fig.show()

