""" Q1. What is NumPy, and why is it widely used in Python?
NumPy (Numerical Python) is the foundational package for scientific computing in Python. It provides:

An ndarray object for efficient, homogeneous, multidimensional arrays

Vectorized operations, broadcasting rules, and element-wise calculations

A rich collection of mathematical functions (linear algebra, Fourier transforms, random simulations)

Because NumPy arrays reside in contiguous memory, support fast C-level loops, and integrate seamlessly with other scientific libraries (SciPy, pandas, scikit-learn), they deliver dramatic performance gains over pure Python lists for numerical work.



Q2. How does broadcasting work in NumPy?
Broadcasting describes how NumPy treats arrays of different shapes during arithmetic operations:

Shapes are compared dimension by dimension, starting from the trailing (rightmost) one

Two dimensions are compatible when they are equal or one of them is 1

A dimension of size 1 is virtually “stretched” to match the other operand, without copying data

If one array has fewer dimensions, its shape is padded with 1s on the left

Example:   """

#python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])    # shape (2, 3)
b = np.array([10, 20, 30])   # shape   (3,)
c = a + b                    # b is broadcast to [[10,20,30],
                             #                 [10,20,30]]
"""  Q3. What is a Pandas DataFrame?
A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).

Each column is a Series sharing the same index

Supports automatic alignment, missing data handling, and heterogeneous dtypes

Ingests data from CSV, SQL, Excel, JSON, and more
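
The points above can be sketched with a small, made-up table. The column names, row labels, and values below are assumptions chosen only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: three columns with different dtypes, one missing value
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cy"],
        "age": [31, 45, 27],
        "score": [88.5, np.nan, 92.0],
    },
    index=["r1", "r2", "r3"],
)

print(df.dtypes)                         # each column keeps its own dtype
print(type(df["age"]))                   # a single column is a Series
print(df["age"].index.equals(df.index))  # columns share the row index

# Automatic alignment: values are matched by label, not by position
bonus = pd.Series({"r3": 5.0, "r1": 1.0})
df["with_bonus"] = df["score"] + bonus   # adding a column shows size-mutability
print(df)
```

Row r2 ends up as NaN in the new column because it has no score and no matching bonus label.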



Q4. Explain the use of the groupby() method in Pandas?
DataFrame.groupby() implements the “split-apply-combine” paradigm:

Split the data into groups based on one or more keys (columns)

Apply an aggregation, transformation, or filtering function to each group independently

Combine the results into a new DataFrame or Series

This method enables tasks like computing per‐category sums, means, counts, custom aggregates, or filtering groups by size or value.
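
As a rough sketch of split-apply-combine, the snippet below runs one aggregation, one transformation, and one filter on a hypothetical sales table (the names sales, region, and units are assumptions, not part of the original question):

```python
import pandas as pd

# Hypothetical data: units sold per region
sales = pd.DataFrame({
    "region": ["east", "west", "east", "west", "north"],
    "units": [10, 20, 30, 40, 5],
})

# Aggregate: one value per group
totals = sales.groupby("region")["units"].sum()

# Transform: result has the same length as the input, here each row's share of its group total
share = sales["units"] / sales.groupby("region")["units"].transform("sum")

# Filter: keep only rows belonging to groups with at least two rows
multi = sales.groupby("region").filter(lambda g: len(g) >= 2)

print(totals)
print(share)
print(multi)
```

The aggregate shrinks the data to one row per key, the transform keeps the original shape, and the filter drops whole groups.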




Q5. Why is Seaborn preferred for statistical visualizations?
Seaborn, built atop Matplotlib and integrated with Pandas, excels at statistical plotting because it provides:

High-level functions for common statistical plots (boxplot, violinplot, heatmap, pairplot)

Beautiful default themes and color palettes tuned for statistical clarity

Automatic handling of DataFrame semantics (categorical ordering, NaNs)

Built-in support for visualizing distributions, regression lines, error bars, and faceting



Q6. What are the differences between NumPy arrays and Python lists?
Homogeneity

NumPy arrays are homogeneous (all elements share one dtype)

Python lists can contain mixed types

Memory layout

NumPy arrays occupy contiguous blocks of memory (better cache locality)

Lists store pointers to separate objects scattered in memory

Performance

NumPy operations execute in optimized C loops (vectorized)

List operations require Python interpreter overhead per element

Functionality

NumPy offers broadcasting, slicing, indexing, linear algebra, and more

Lists provide general-purpose sequences without numeric optimizations
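
A short sketch of how the same-looking expression behaves differently on the two types. The variable names are illustrative only:

```python
import numpy as np

py_list = [1, 2, 3]
arr = np.array(py_list)

doubled_list = py_list * 2       # list repetition: [1, 2, 3, 1, 2, 3]
doubled_arr = arr * 2            # element-wise arithmetic: [2 4 6]

mixed = [1, "two", 3.0]          # a list can hold arbitrary objects
upcast = np.array([1, 2.5])      # an array coerces its inputs to one dtype

print(doubled_list)
print(doubled_arr)
print(upcast.dtype)              # float64: the integer 1 was upcast
```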



Q7. What is a heatmap, and when should it be used?
A heatmap is a graphical representation of data where values are encoded as colors in a matrix:

Ideal for visualizing large, dense tables of numbers

Highlights patterns, correlations, clusters, or outliers at a glance

Commonly used for correlation matrices, geospatial density, and time–category pivot tables

Use when you need to compare values across two categorical axes and spot areas of high/low intensity quickly




Q8. What does the term “vectorized operation” mean in NumPy?
Vectorized operations refer to applying arithmetic or boolean operations directly to entire arrays without explicit Python loops. Behind the scenes, NumPy dispatches to optimized C code that processes all elements in a compiled loop.

Leads to concise code in idiomatic mathematical notation

Minimizes Python interpreter overhead

Enables use of CPU SIMD instructions and low‐level optimizations
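
To make the contrast concrete, here is a minimal sketch that computes the same squares once with an explicit Python loop and once as a vectorized expression. The variable names are assumptions for illustration:

```python
import numpy as np

values = np.arange(1, 6)              # [1 2 3 4 5]

# Explicit Python loop: one interpreter step per element
squares_loop = np.empty_like(values)
for i in range(len(values)):
    squares_loop[i] = values[i] ** 2

# Vectorized: a single expression over the whole array
squares_vec = values ** 2

# Boolean operations are vectorized too and can be used as masks
is_even = values % 2 == 0
even_total = values[is_even].sum()

print(squares_vec)
print(even_total)
```

Both versions give the same result; the vectorized one simply moves the loop into compiled code.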




Q9. How does Matplotlib differ from Plotly?
Matplotlib

Primarily static, publication-quality 2D (and limited 3D) plots

Highly customizable down to every artist, but often verbose

Works in standalone scripts and Jupyter notebooks

Plotly

Interactive, web-friendly charts (pan, zoom, hover, click events)

Rich palette of chart types (3D, geographic maps, animations)

Easy embedding in Dash apps or HTML pages for dashboards

Use Matplotlib for static figures and fine‐grained custom control. Use Plotly when interactivity and web deployment are priorities.



Q10. What is the significance of hierarchical indexing in Pandas?
Hierarchical indexing (MultiIndex) allows multiple index levels on an axis, enabling:

Compact representation of higher-dimensional data in a 2-D DataFrame or 1-D Series

Grouping and aggregation by multiple categorical keys simultaneously

Powerful reshaping (stack/unstack), slicing, and cross‐tabulation operations

Cleaner multi-level pivot tables and time series with multiple key levels
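
The following sketch shows a two-level index on a Series, using hypothetical year and quarter labels and invented revenue figures:

```python
import pandas as pd

# Hypothetical two-level index: year (outer) and quarter (inner)
index = pd.MultiIndex.from_product(
    [["2023", "2024"], ["Q1", "Q2"]], names=["year", "quarter"]
)
revenue = pd.Series([10, 12, 14, 18], index=index, name="revenue")

y2024 = revenue.loc["2024"]                  # select on the outer level
q1 = revenue.xs("Q1", level="quarter")       # cross-section on the inner level
wide = revenue.unstack("quarter")            # reshape: quarters become columns
per_year = revenue.groupby(level="year").sum()

print(wide)
print(per_year)
```

Here unstack() turns the inner level into columns, which is the "higher-dimensional data in a 2-D DataFrame" idea in practice.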




Q11. What is the role of Seaborn’s pairplot() function?
seaborn.pairplot() creates a matrix of plots showing pairwise relationships among variables in a DataFrame:

Diagonal: univariate distributions (histogram or KDE)

Off-diagonals: scatterplots or density plots

Useful for quick exploratory analysis of correlations and distribution patterns across multiple numeric features





Q12. What is the purpose of the describe() function in Pandas?
DataFrame.describe() generates descriptive statistics summarizing each numeric column by default:

count, mean, std, min, quartiles (25%, 50%, 75%), and max

With include='all', also summarizes object and category columns (unique, top, freq)

Provides a fast, one-line summary of data distribution and spread for exploratory data analysis
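
A small sketch of both call styles on an invented table (the height and team columns are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data: one numeric and one text column
df = pd.DataFrame({
    "height": [150.0, 160.0, 170.0, 180.0],
    "team": ["a", "b", "a", "a"],
})

numeric = df.describe()                  # numeric columns only
everything = df.describe(include="all")  # adds unique / top / freq for text columns

print(numeric)
print(everything)
```

In the second table, statistics that do not apply to a column (for example the mean of team) appear as NaN.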





Q13. Why is handling missing data important in Pandas?
Missing values (NaN, None) can skew analysis, break algorithms, and lead to misleading conclusions. Pandas offers:

Detection (isna(), notna())

Removal (dropna() for rows/columns)

Imputation (fillna() with constants, forward/backward fill, or statistical values)

Interpolation and combining methods

Proper handling ensures data integrity, accurate modeling, and reliable insights.
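
The four strategies listed above can be sketched on a single Series. The values are made up, and which strategy is appropriate depends on the data:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

n_missing = int(s.isna().sum())    # detection
dropped = s.dropna()               # removal
mean_filled = s.fillna(s.mean())   # imputation with a statistic (mean ignores NaN)
forward = s.ffill()                # forward fill from the previous valid value
interp = s.interpolate()           # linear interpolation between neighbours

print(n_missing)
print(mean_filled.tolist())
print(interp.tolist())
```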





Q14. What are the benefits of using Plotly for data visualization?
Plotly shines because it provides:

Interactive charts with zoom, pan, hover tooltips, and clickable legends

Support for advanced chart types (3D surfaces, geographic maps, sankey diagrams)

Seamless integration with web frameworks (Dash) for live dashboards

Export to static images (PNG, SVG, PDF, via the optional kaleido package) or HTML embeds

An open, customizable API for styling and event handling

Q15. How does NumPy handle multidimensional arrays?
NumPy’s core ndarray supports N-dimensional arrays:

Defined by a shape tuple of lengths per dimension

strides specify byte offsets to traverse dimensions

Homogeneous datatypes (dtype) ensure fixed element size

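Provides fast indexing, slicing, and matrix operations; the maximum number of dimensions is 32 in NumPy 1.x and 64 from NumPy 2.0

Under the hood, an array is a single memory buffer interpreted through its shape and strides; newly created arrays are contiguous, while views such as transposes or strided slices may not be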
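
As an illustration of shape and strides, the sketch below builds a 3-D array with an explicit 8-byte dtype so that the stride values are predictable. The array contents are arbitrary:

```python
import numpy as np

# 24 int64 values (8 bytes each) arranged as 2 blocks x 3 rows x 4 columns
a = np.arange(24, dtype=np.int64).reshape(2, 3, 4)

print(a.ndim, a.shape)    # 3 (2, 3, 4)
print(a.strides)          # bytes to step along each axis: (96, 32, 8)
print(a[1, 2, 3])         # last element: 23

plane = a[:, 0, :]        # first row of every block, shape (2, 4)

# Transposing only permutes shape and strides; no data is moved
t = a.transpose(2, 0, 1)
print(t.shape, t.strides)             # (4, 2, 3) (8, 96, 32)
print(t.flags["C_CONTIGUOUS"])        # False: same buffer, different walk order
print(np.shares_memory(a, t))         # True
```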




Q16. What is the role of Bokeh in data visualization?
Bokeh is a Python library for creating interactive, web-ready visualizations:

Renders plots in modern browsers using HTML, CSS, and JavaScript

Supports streaming and real-time data updates

Offers server support for interactive dashboards (Bokeh server)

Provides rich widgets (sliders, dropdowns) for user-driven data exploration

Integrates with Jupyter notebooks and Flask/Django for embedding





Q17. Explain the difference between apply() and map() in Pandas?
Series.map()

Applies a scalar function, dictionary, or Series mapping to each element of a Series

Returns a new Series of the same shape

Element-wise only; the DataFrame counterpart is DataFrame.map(), called applymap() before pandas 2.1

Series.apply() / DataFrame.apply()

Applies a function row- or column-wise on a DataFrame (or element-wise on a Series)

Can return a Series, DataFrame, or scalar depending on the function

Works on both Series (element-wise) and DataFrame (axis-wise)

Use map() for simple element remapping, apply() for arbitrary row/column or element logic.
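
A minimal side-by-side sketch, using an invented DataFrame (the grade, x, and y columns are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({"grade": ["a", "b", "c"], "x": [1, 2, 3], "y": [10, 20, 30]})

# map(): element-wise lookup; keys missing from the dict become NaN
points = df["grade"].map({"a": 4, "b": 3})

# Series.apply(): element-wise function call
squared = df["x"].apply(lambda v: v ** 2)

# DataFrame.apply(): the function receives a whole column (axis=0, the default) ...
col_range = df[["x", "y"]].apply(lambda col: col.max() - col.min())

# ... or a whole row (axis=1)
row_sums = df[["x", "y"]].apply(lambda row: row["x"] + row["y"], axis=1)

print(points.tolist())
print(col_range.to_dict())
print(row_sums.tolist())
```

For simple arithmetic like the row sum, a plain vectorized expression (df["x"] + df["y"]) is usually faster than apply(axis=1); apply is shown here only to illustrate the axis argument.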

  
  
Q18. What are some advanced features of NumPy?
Masked arrays for missing or invalid entries, and structured arrays for heterogeneous records

Universal functions (ufuncs) with custom loop definitions and broadcasting

Fast Fourier transform (numpy.fft) and convolution

Linear algebra routines (numpy.linalg), eigenvalues, and solvers

Random sampling with numpy.random.Generator and reproducible streams

Integration with C/Fortran via f2py and memory views

Memory-mapped arrays (np.memmap) for out-of-core data handling
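
A few of these features can be sketched briefly. The seed, the linear system, the -999.0 sentinel, and the record fields below are arbitrary choices for illustration:

```python
import numpy as np

# Reproducible random streams: two generators with the same seed agree
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)
draw_a = rng_a.normal(size=3)
draw_b = rng_b.normal(size=3)

# Linear algebra: solve 2x + y = 5 and x + 3y = 10
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
solution = np.linalg.solve(A, b)

# Masked array: the masked sentinel value is ignored by reductions
readings = np.ma.masked_array([1.0, -999.0, 3.0], mask=[False, True, False])
masked_mean = readings.mean()

# Structured array: named fields with different dtypes in one array
people = np.array([("Ana", 31), ("Ben", 45)],
                  dtype=[("name", "U10"), ("age", "i4")])
mean_age = people["age"].mean()

print(solution)
print(masked_mean)
print(mean_age)
```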




Q19. How does Pandas simplify time series analysis?
Pandas provides built-in time series tools:

DatetimeIndex and time-aware indexing/slicing (df['2020':'2021'])

Resampling (resample('M').mean(), asfreq()) for frequency conversion; pandas 2.2 and later spell the month-end alias 'ME'

Rolling and expanding windows for moving statistics (rolling(window=7).mean())

Date offset aliases ('D', 'H', 'Q'; written 'h' and 'QE' in pandas 2.2 and later) and date range generation (date_range())

Built-in support for time zones, business days, and period indexing
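
The sketch below exercises date-string slicing, resampling, and a rolling window on a synthetic two-week daily series. The dates and values are invented, and the frequency strings 'D' and '7D' were chosen because they are spelled the same across pandas versions:

```python
import pandas as pd

# Hypothetical daily series: 14 days with values 0..13
idx = pd.date_range("2024-01-01", periods=14, freq="D")
ts = pd.Series(range(14), index=idx)

# Label-based slicing with date strings includes both endpoints
first_week = ts["2024-01-01":"2024-01-07"]

# Downsample to 7-day bins and sum each bin
weekly = ts.resample("7D").sum()

# 3-day moving average; the first two positions have no full window
rolling = ts.rolling(window=3).mean()

print(len(first_week))
print(weekly.tolist())
print(rolling.head())
```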





Q20. What is the role of a pivot table in Pandas?
pivot_table() reshapes data into a spreadsheet‐style summary:

Groups values by one or more index and columns keys

Aggregates with functions (mean, sum, count, custom dict)

Supports fill_value for missing cells and margins for subtotals

Returns a DataFrame, with a hierarchical index when several keys are used
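
A small sketch on a hypothetical orders table (the names orders, region, product, and amount are assumptions) showing aggfunc, fill_value, and margins together:

```python
import pandas as pd

# Hypothetical order lines
orders = pd.DataFrame({
    "region": ["east", "east", "west", "west", "east"],
    "product": ["pen", "ink", "pen", "pen", "pen"],
    "amount": [10, 20, 30, 40, 50],
})

table = pd.pivot_table(
    orders,
    values="amount",
    index="region",      # row keys
    columns="product",   # column keys
    aggfunc="sum",
    fill_value=0,        # west sold no ink, so that cell becomes 0
    margins=True,        # adds an "All" row and column with totals
)

print(table)
```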




Q21. Why is NumPy’s array slicing faster than Python’s list slicing?
Views instead of copies: a basic NumPy slice returns a view onto the same memory buffer, so no data is copied

Strided indexing: the slice only records a new offset, shape, and strides; no elements are moved or reallocated

Vectorized operations: Subsequent operations on slices run in C loops without Python overhead

Memory locality: elements sit side by side in memory, so sequential reads make good use of the CPU cache

In contrast, Python list slicing allocates a new list and copies a reference to every selected element, so its cost grows with the slice length.
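
The view-versus-copy difference can be checked directly. This is a minimal sketch with arbitrary values:

```python
import numpy as np

arr = np.arange(10)
view = arr[2:8:2]                     # elements 2, 4, 6 as a strided view

shares = np.shares_memory(arr, view)  # True: no data was copied
step_bytes = view.strides[0]          # two items per step

view[0] = 99                          # writing through the view changes arr
changed_in_arr = arr[2]

lst = list(range(10))
sub = lst[2:8:2]                      # a new, independent list
sub[0] = 99                           # the original list is unaffected
unchanged_in_list = lst[2]

independent = arr[2:8:2].copy()       # request an explicit copy when needed

print(shares, changed_in_arr, unchanged_in_list)
```

Because a slice is a view, modifying it modifies the parent array; call .copy() when an independent array is wanted.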





Q22. What are some common use cases for Seaborn?
Exploratory Data Analysis (EDA): Quick distribution and relationship checks (pairplot, histplot; the older distplot is deprecated)

Statistical plots: Boxplots, violin plots, and categorical comparisons (boxplot, catplot)

Correlation heatmaps: Visualize variable interdependencies (heatmap)

Regression analysis: Fit and visualize trend lines (lmplot, regplot)

Faceting: Multi-panel grids to compare subsets (FacetGrid, relplot)

Seaborn streamlines the creation of publication-ready statistical graphics with minimal code."""





#PRACTICAL QUESTIONS



#  Q1 Create a 2D NumPy array and calculate the sum of each row

#python
import numpy as np

# Create a 2D array from nested lists
arr = np.array([[0, 1, 2],
                [3, 4, 5],
                [6, 7, 8]])

# Sum across each row (axis=1)
row_sums = arr.sum(axis=1)
print(row_sums)  # Output: [ 3 12 21]
# Here, axis=1 tells NumPy to collapse each row into its sum, producing a 1D array where each element is the sum of one row.

#    Q2 Write a Pandas script to find the mean of a specific column in a DataFrame

#python
import pandas as pd

# Example DataFrame
df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [1.5, 2.5, 3.5, 4.5]
})

# Compute mean of column 'B'
mean_b = df['B'].mean()
print(mean_b)  # Output: 3.0
# The mean() method on a Series returns the average of its values, ignoring missing data by default.



#    Q3 Create a scatter plot using Matplotlib

#python
import matplotlib.pyplot as plt
import numpy as np

# Sample data
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 4, 3, 2, 1])

plt.scatter(x, y)
plt.title('Basic Scatter Plot')
plt.xlabel('X Values')
plt.ylabel('Y Values')
plt.show()
# plt.scatter() plots each (x, y) pair as a point on a 2D plane, ideal for exploring relationships between two numeric variables.

#   Q4 Calculate the correlation matrix and visualize it with a Seaborn heatmap

#python
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Example DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [4, 3, 2, 1],
    'C': [2, 3, 4, 5]
})

# Compute correlation matrix
corr = df.corr()

# Plot heatmap
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()
# df.corr() (a Pandas method) computes pairwise Pearson correlations; sns.heatmap() then renders the matrix as a color-coded grid, with annotations for exact values.

#    Q5 Generate a bar plot using Plotly

#python
import pandas as pd
import plotly.express as px

# Sample data
data = {'category': ['A', 'B', 'C'], 'value': [10, 20, 15]}
df = pd.DataFrame(data)  # build the DataFrame from the sample data above

# Simple bar chart
fig = px.bar(data_frame=df, x='category', y='value', title='Bar Plot')
fig.show()
# Plotly Express's px.bar() takes a DataFrame and column names for x and y, producing an interactive bar chart that can be embedded or exported.

#    Q6 Create a DataFrame and add a new column based on an existing column

#python
import pandas as pd

# Example DataFrame
df = pd.DataFrame({
    'price': [100, 200, 300],
    'tax_rate': [0.1, 0.2, 0.15]
})

# New column = price + tax
df['total'] = df['price'] * (1 + df['tax_rate'])
print(df)
#You can perform element-wise arithmetic on Series to derive new columns. Here df['total'] holds the post‐tax price for each row.

#     Q7 Write a program to perform element-wise multiplication of two NumPy arrays

#python
import numpy as np

x1 = np.array([1, 2, 3])
x2 = np.array([4, 5, 6])

# Element-wise product
product = x1 * x2
print(product)  # Output: [ 4 10 18]

# Or equivalently:
product2 = np.multiply(x1, x2)
print(product2)  # Output: [ 4 10 18]
# Using * on ndarrays or np.multiply() multiplies corresponding elements to produce a new array of the same shape.

#     Q8 Create a line plot with multiple lines using Matplotlib

#python
import matplotlib.pyplot as plt

# Sample data
x = [0, 1, 2, 3, 4]
y1 = [0, 1, 4, 9, 16]
y2 = [0, 1, 2, 3, 4]

plt.plot(x, y1, label='Squares')
plt.plot(x, y2, label='Linear')
plt.title('Multiple Lines')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()
#Calling plt.plot() multiple times on the same axes overlays each line; plt.legend() then generates a legend mapping labels to lines.

#     Q9 Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold

#python
import pandas as pd

# Example DataFrame
df = pd.DataFrame({
    'age': [18, 25, 30, 40],
    'score': [88, 92, 85, 90]
})

# Filter rows where score > 90
filtered = df[df['score'] > 90]
print(filtered)
# Boolean indexing with df[condition] returns only the rows where the condition is True, here picking scores above 90.

#    Q10 Create a histogram using Seaborn to visualize a distribution

#python
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Generate data
data = np.random.normal(loc=0, scale=1, size=500)

# Plot histogram
sns.histplot(data, bins=30, kde=True)
plt.show()  # needed when running as a script rather than in a notebook
# histplot() draws a histogram of the data; adding kde=True overlays a smooth density estimate of the distribution's shape.

#    Q11 Perform matrix multiplication using NumPy

#python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Matrix product
C = np.dot(A, B)
# Or equivalently with the @ operator (Python 3.5+)
C2 = A @ B

print(C)   # [[19 22]
           #  [43 50]]
#Use np.dot() or the @ operator to carry out true matrix multiplication, summing products of row–column pairs to form the result matrix.

#     Q12 Use Pandas to load a CSV file and display its first 5 rows

#python
import pandas as pd

# Read CSV into DataFrame
df = pd.read_csv('data.csv')

# Show top 5 rows
print(df.head())
# pd.read_csv() loads a CSV file into a DataFrame; df.head() then returns the first five rows by default for a quick preview.

#    Q13 Create a 3D scatter plot using Plotly

#python
import plotly.express as px

# Use built-in iris dataset
df = px.data.iris()

# 3D scatter plot
fig = px.scatter_3d(
    df,
    x='sepal_length',
    y='sepal_width',
    z='petal_length',
    color='species',
    title='3D Iris Scatter'
)
fig.show()
# px.scatter_3d() extends 2D scatter logic into three dimensions, allowing interactive rotation and zoom to explore clusters in 3D space.