In [None]:
1. What is NumPy, and why is it widely used in Python?
ANS. NumPy (Numerical Python) is an open-source Python library for numerical computing. It provides a multi-dimensional array type (ndarray), along with a collection of mathematical functions that operate on those arrays efficiently.

| Reason                              | Description                                                                                                              |
| ----------------------------------- | ------------------------------------------------------------------------------------------------------------------------ |
|  Efficient Array Operations       | Provides a powerful N-dimensional array object that is faster than lists.                                                |
|  Performance                       | Internally written in C, making array computations extremely fast.                                                       |
|  Rich Mathematical Functions      | Includes advanced mathematical, statistical, and linear algebra functions.                                               |
|  Integration with Other Libraries | Used as the foundation for many libraries like **Pandas**, **SciPy**, **scikit-learn**, **TensorFlow**, and **PyTorch**. |
|  Broadcasting                     | Allows operations on arrays of different shapes without writing loops.                                                   |
|  Ideal for Scientific Computing   | Widely used in data analysis, machine learning, image processing, etc.                                                   |
|  Memory Efficient                 | Consumes less memory than standard Python lists for large datasets.                                                      |


In [None]:
2. How does broadcasting work in NumPy?
ANS. Broadcasting in NumPy is a powerful mechanism that allows arrays of different shapes to be used together in arithmetic operations without needing to manually reshape them.

When operating on two arrays, NumPy compares their shapes dimension by dimension, starting from the trailing (rightmost) dimension, and applies these rules:

If one array has fewer dimensions, its shape is padded with 1s on the left.

If the two dimensions are equal, they are compatible.

If one of the dimensions is 1, it is stretched to match the other.

If the dimensions differ and neither is 1, a ValueError is raised.
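
The rules above can be sketched with two small arrays. The array names are illustrative and not part of the original answer.

```python
import numpy as np

col = np.array([[1], [2], [3]])   # shape (3, 1)
row = np.array([10, 20, 30])      # shape (3,), treated as (1, 3)

# Right to left: 1 vs 3 -> stretch col; 3 vs 1 -> stretch row; result is (3, 3)
total = col + row
print(total.shape)  # (3, 3)

# Trailing dimensions 3 and 4 differ and neither is 1, so NumPy raises ValueError
try:
    np.ones(3) + np.ones(4)
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)  # True
```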

In [None]:
3. What is a Pandas DataFrame?
ANS. A Pandas DataFrame is a two-dimensional, labeled data structure, similar to a database table, an Excel spreadsheet, or a SQL result set. Each column can hold a different data type, and both rows and columns carry labels.

It is one of the core data structures in the Pandas library, alongside the one-dimensional Series, and is used for data analysis, cleaning, transformation, and visualization tasks.
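
As a minimal sketch of this structure, the snippet below builds a small DataFrame from a dictionary. The column names, values, and row labels are made up for illustration.

```python
import pandas as pd

# Each dictionary key becomes a column; the index supplies the row labels
df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Charlie"], "age": [25, 30, 35]},
    index=["a", "b", "c"],
)

print(df.shape)            # (3, 2): three rows, two columns
print(list(df.columns))    # ['name', 'age']
print(df.loc["b", "age"])  # 30, selected by row label and column label
```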

In [None]:
4.Explain the use of the groupby() method in Pandas.
ANS. The groupby() method in Pandas splits a DataFrame into groups based on the values in one or more columns, and then applies a function (such as sum, mean, or count) to each group.

It follows the Split-Apply-Combine strategy:

Split the data into groups

Apply a function to each group

Combine the results into a new DataFrame

| Operation           | Description                          |
| ------------------- | ------------------------------------ |
| `groupby().sum()`   | Sum of each group                    |
| `groupby().mean()`  | Average of each group                |
| `groupby().count()` | Number of non-null entries per group |
| `groupby().max()`   | Max value per group                  |
| `groupby().min()`   | Min value per group                  |
| `groupby().agg()`   | Apply multiple functions             |
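
A short sketch of split-apply-combine, assuming a hypothetical `sales` DataFrame with `region` and `amount` columns:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "amount": [100, 200, 150, 250],
})

# Split by region, apply sum to each group, combine into one Series
totals = sales.groupby("region")["amount"].sum()
print(totals["East"], totals["West"])  # 250 450

# agg() applies several functions at once and returns a DataFrame
summary = sales.groupby("region")["amount"].agg(["mean", "max"])
print(summary.loc["West", "mean"])  # 225.0
```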


In [None]:
5.Why is Seaborn preferred for statistical visualizations?
ANS. Seaborn is preferred for statistical visualizations in Python because it provides high-level, easy-to-use functions for creating attractive and informative graphics, especially when working with pandas DataFrames. It is built on top of Matplotlib and handles grouping, aggregation, and default styling for you.

Example of a Seaborn plot: histogram with a KDE overlay
import seaborn as sns
# df is assumed to be an existing DataFrame with a numeric 'Salary' column
sns.histplot(data=df, x='Salary', kde=True)


In [None]:
6.What are the differences between NumPy arrays and Python lists?
ANS.

| Feature                     | NumPy Arrays                                                      | Python Lists                                    |
| --------------------------- | ----------------------------------------------------------------- | ----------------------------------------------- |
| **Data Type**               | Homogeneous (all elements share one dtype)                        | Heterogeneous (can hold different types)        |
| **Performance**             | Much faster (written in C, uses vectorized operations)            | Slower due to interpreted loops                 |
| **Memory Usage**            | More memory-efficient                                             | Higher memory usage                             |
| **Mathematical Operations** | Supports element-wise operations directly (e.g., `a + b`)         | Must use loops or list comprehensions           |
| **Functionality**           | Supports broadcasting, multi-dimensional slicing, matrix operations | Basic data structure, lacks numerical functions |
| **Looping**                 | Avoids loops using vectorized operations                          | Requires explicit `for` loops or comprehensions |
| **Scientific Use**          | Ideal for data analysis, machine learning, and simulations        | Not suitable for heavy numerical computation    |
| **Built-in Functions**      | Extensive (e.g., `np.sum`, `np.mean`, `np.dot`)                   | Limited without importing extra libraries       |

Example with Python lists:
a = [1, 2, 3]
b = [4, 5, 6]
c = [x + y for x, y in zip(a, b)]
print(c)  # [5, 7, 9]

Example with NumPy arrays:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = a + b
print(c)  # [5 7 9]


In [None]:
7.What is a heatmap, and when should it be used?
ANS.A heatmap is a data visualization technique that uses color gradients to represent the magnitude or intensity of values in a matrix-like structure (rows and columns). It's widely used in data analysis, especially for showing correlations, patterns, or relationships between variables.

| Use Case                      | Description                                                          |
| ----------------------------- | -------------------------------------------------------------------- |
| Correlation Matrix            | Visualize relationships between numerical variables                  |
| Large Data Grids              | Show values in a 2D matrix (e.g., confusion matrix, distance matrix) |
| Categorical vs Numerical Data | Compare numerical values across categories                           |
| Missing Data Visualization    | Identify where data is missing (NaN) in a dataset                    |
| Genomic/Scientific Data       | Display intensity or expression levels in biology, chemistry, etc.   |


In [None]:
8.What does the term “vectorized operation” mean in NumPy?
ANS. The term “vectorized operation” in NumPy refers to performing operations on entire arrays (vectors, matrices, etc.) without using explicit loops in Python.

Instead of looping through elements one by one, NumPy applies operations at the array level, which is faster and more efficient, thanks to its underlying C implementation.

Examples of vectorized operations:
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Element-wise addition
print(a + b)  # [5 7 9]

# Element-wise comparison
print(a > 2)  # [False False  True]

# Element-wise square
print(a ** 2)  # [1 4 9]

In [None]:
9.How does Matplotlib differ from Plotly?
ANS.

| Feature/Aspect    | **Matplotlib**                                 | **Plotly**                                                                     |
| ----------------- | ---------------------------------------------- | ------------------------------------------------------------------------------ |
| **Type**          | Static plotting library                        | Interactive plotting library                                                   |
| **Foundation**    | Base library for many others (like Seaborn)    | Built on plotly.js, which uses D3.js and WebGL for browser-based interactivity |
| **Best For**      | Quick, static, publication-quality plots       | Interactive dashboards and web apps                                            |
| **Interactivity** | Very limited (zoom/pan in some GUIs)           | Fully interactive: hover, zoom, tooltips, click events                         |
| **Output Format** | Images (PNG, PDF, SVG)                         | HTML, JSON, or embedded web components                                         |
| **3D Plots**      | Basic, less intuitive                          | Fully interactive and easier 3D plots                                          |
| **Customization** | Highly customizable with low-level control     | Also customizable, but more declarative                                        |
| **Ease of Use**   | More code-heavy for complex visuals            | Simpler for creating complex visualizations quickly                            |
| **Integration**   | Widely used with Jupyter, scientific computing | Great for dashboards (Dash), web, and Jupyter                                  |
| **Licensing**     | Open-source (BSD-compatible license)           | Open-source (MIT); commercial products such as Dash Enterprise are separate    |


In [None]:
10. What is the significance of hierarchical indexing in Pandas?
ANS. Hierarchical indexing (also known as MultiIndexing) in Pandas allows you to have multiple levels of indexes (or labels) on a single axis (rows or columns) of a DataFrame or Series.
This enables you to organize, group, and analyze complex datasets more efficiently.

| Feature                          | Description                                                                            |
| -------------------------------- | -------------------------------------------------------------------------------------- |
| **Multi-level Grouping**         | Enables grouping and sub-grouping (like country → state → city)                        |
| **Organizes Complex Data**       | Makes it easier to work with multi-dimensional or panel data                           |
| **Efficient Data Selection**     | Allows flexible slicing, filtering, and selection based on multiple levels             |
| **Improved Data Aggregation**    | Used with `groupby()` results for grouped summaries with multiple keys                 |
| **Foundation for Pivot Tables**  | Essential for reshaping and unstacking data with `pivot_table`, `stack()`, `unstack()` |
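
A minimal sketch of a two-level index follows. The level names, labels, and numbers are illustrative placeholders, not real data.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("India", "Delhi"), ("India", "Mumbai"), ("USA", "Boston")],
    names=["country", "city"],
)
values = pd.Series([10, 20, 30], index=index, name="value")

# Selecting on the outer level returns everything under that label
india = values.loc["India"]
print(india.tolist())  # [10, 20]

# A full tuple selects a single entry
mumbai = values.loc[("India", "Mumbai")]
print(mumbai)  # 20

# Aggregate over one level, or pivot the inner level into columns
by_country = values.groupby(level="country").sum()
wide = values.unstack("city")
print(wide.shape)  # (2, 3)
```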

In [None]:
11.What is the role of Seaborn’s pairplot() function?
ANS.The pairplot() function in Seaborn is a powerful tool for visualizing relationships between multiple variables in a dataset.
It creates a grid of scatter plots for every pair of numerical features.

On the diagonal, it usually displays histograms or KDE plots to show the distribution of each variable.

It helps in performing exploratory data analysis (EDA) by revealing:

Correlations

Clusters

Outliers

Distributions

In [None]:
12.What is the purpose of the describe() function in Pandas?
ANS. The describe() function in Pandas generates summary statistics for a DataFrame or Series. It gives a quick overview of the central tendency, dispersion, and shape of a dataset’s distribution.

For numeric columns it reports count, mean, std, min, the 25%, 50%, and 75% percentiles, and max. Categorical (object) columns can be summarized as well by passing include='all' or include='object', which reports count, unique, top, and freq.

import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 35, 40, 45],
    'Salary': [50000, 60000, 70000, 80000, 90000]
})

print(df.describe())



In [None]:
13.Why is handling missing data important in Pandas?
ANS. Handling missing data is critically important in Pandas (and data analysis in general) because incomplete or inconsistent data can lead to incorrect results, biased models, and misleading insights.

| Reason                           | Explanation                                                                                                                  |
| -------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| **Maintains Data Quality**       | Missing data reduces the reliability of analysis, reporting, and modeling                                                    |
| **Ensures Accurate Statistics**  | Pandas skips NaNs by default in mean, median, and std, so results rest on fewer values and can be biased                     |
| **Required for ML Algorithms**   | Many machine learning models **do not accept NaNs** and will raise errors                                                    |
| **Avoids Skewed Visualizations** | Plots and charts can misrepresent data if missing values are not handled                                                     |
| **Improves Data Integrity**      | Cleaning missing data makes your dataset consistent and usable                                                               |
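
The usual tools for this are `isna()`, `dropna()`, and `fillna()`. Below is a small sketch on a made-up DataFrame; whether to drop or fill depends on the dataset, so treat the choices here as illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 35],
    "salary": [50000, 60000, np.nan],
})

# Count missing values per column
missing_per_column = df.isna().sum()
print(missing_per_column["age"], missing_per_column["salary"])  # 1 1

# Option 1: drop every row that contains at least one NaN
dropped = df.dropna()
print(len(dropped))  # 1

# Option 2: fill each NaN with the mean of its own column
filled = df.fillna(df.mean())
print(filled.loc[1, "age"], filled.loc[2, "salary"])  # 30.0 55000.0
```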


In [None]:
14.What are the benefits of using Plotly for data visualization?
ANS. Plotly is a modern data visualization library in Python that excels at creating interactive, publication-quality plots. It is widely used in data science, analytics, and web-based dashboards.

| Benefit                               | Description                                                                                       |
| ------------------------------------- | ------------------------------------------------------------------------------------------------- |
| **Interactivity**                     | Supports zoom, pan, hover tooltips, and click events **out of the box**                           |
| **Wide Range of Chart Types**         | Supports line, bar, scatter, pie, heatmaps, 3D plots, choropleths, maps, etc.                     |
| **Web-Based Output**                  | Plots render in HTML and are easy to embed in web apps or Jupyter notebooks                       |
| **Dash Integration**                  | Seamlessly integrates with **Dash** for building full-fledged **interactive dashboards**          |
| **Responsive Design**                 | Plots automatically resize and adapt to screen size (mobile-friendly)                             |
| **Highly Customizable**               | Detailed control over layout, annotations, colors, and interactivity                              |
| **Export Options**                    | Can save plots as HTML; static images (PNG, SVG, PDF) need an extra package such as kaleido       |
| **No JavaScript Required**            | Python developers can create advanced visualizations **without JS coding**                        |
| **Built-in Aggregation and Faceting** | Easily group data, add facets, and filter visuals for deeper insights                             |
| **Works with Pandas & NumPy**         | Compatible with pandas DataFrames and NumPy arrays                                                |


In [None]:
15. How does NumPy handle multidimensional arrays?
ANS. NumPy handles multidimensional arrays using its ndarray (N-dimensional array) object, which stores data in a single typed block of memory and describes its layout with a shape (the size along each axis) and strides. This allows fast, flexible operations across any number of dimensions, which suits scientific computing, image processing, and machine learning.

Arrays are classified by their number of axes (dimensions):

1D array → vector

2D array → matrix (rows and columns)

3D+ array → tensor (e.g., image stacks, time-series of matrices)
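
To make the idea of axes concrete, here is a small sketch with a 3D array. The variable names are illustrative.

```python
import numpy as np

# 24 consecutive integers arranged as 2 blocks of 3 rows and 4 columns
tensor = np.arange(24).reshape(2, 3, 4)
print(tensor.ndim)   # 3
print(tensor.shape)  # (2, 3, 4)

# One index per axis selects a single element
print(tensor[1, 2, 3])  # 23

# Reducing along an axis removes that axis from the shape
summed = tensor.sum(axis=0)
print(summed.shape)  # (3, 4)

# Indexing the first axis gives a 2D matrix; axis=1 then sums each row
matrix = tensor[0]
row_sums = matrix.sum(axis=1)
print(row_sums)  # [ 6 22 38]
```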



In [None]:
16.What is the role of Bokeh in data visualization?
ANS. Bokeh is a Python library for creating interactive, browser-based visualizations. Its main goal is to make dynamic and engaging visual analytics that are easy to share as web apps or embedded dashboards.

| Purpose                        | Description                                                                |
| ------------------------------ | -------------------------------------------------------------------------- |
| **Interactive Plots**          | Provides zooming, panning, tooltips, sliders, and selection tools          |
| **Web Integration**            | Outputs directly to **HTML**, JavaScript, or Jupyter Notebooks             |
| **Dashboard Building**         | Can create interactive dashboards with **layouts, widgets, and callbacks** |
| **Streaming & Real-Time Data** | Supports live-updating plots and data streams                              |
| **Server Applications**        | Bokeh apps can run on a **Bokeh server** to enable Python callbacks        |


In [None]:
17.Explain the difference between apply() and map() in Pandas.
ANS.

| Feature        | `map()`                                                           | `apply()`                                          |
| -------------- | ----------------------------------------------------------------- | -------------------------------------------------- |
| Works on       | Series (1D); `DataFrame.map` exists since pandas 2.1 for element-wise use | Series or DataFrame (1D or 2D)             |
| Scope          | Element-wise                                                      | Element-wise on a Series, row/column-wise on a DataFrame |
| Output         | Series                                                            | Can return Series, scalar, DataFrame, etc.         |
| Function types | Accepts dict, Series, or function                                 | Accepts a function (or a function name as a string) |
| Use Case       | Simple element-wise transformation                                | Complex row/column-wise computations               |
Example: using map() for an element-wise transformation of a Series
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# Square each element
s.map(lambda x: x**2)

Example: using apply() on a Series (same result as map() for a plain function, but apply() can also forward extra arguments to it)
# Add 10 to each element
s.apply(lambda x: x + 10)
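
The differences in the table show up most clearly with a dictionary lookup and a row-wise computation. This is a sketch with made-up column names and values.

```python
import pandas as pd

# map() accepts a dict and substitutes each element by lookup
grades = pd.Series(["a", "b", "a"])
mapped = grades.map({"a": "Excellent", "b": "Good"})
print(mapped.tolist())  # ['Excellent', 'Good', 'Excellent']

df = pd.DataFrame({"math": [80, 90], "physics": [70, 60]})

# apply() with axis=1 passes each row to the function
row_totals = df.apply(lambda row: row["math"] + row["physics"], axis=1)
print(row_totals.tolist())  # [150, 150]

# apply() with the default axis=0 passes each column to the function
col_max = df.apply(lambda col: col.max())
print(col_max["math"], col_max["physics"])  # 90 70
```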


In [None]:
18.What are some advanced features of NumPy?
ANS. NumPy offers a range of advanced features that make it a powerful tool for high-performance scientific computing and data analysis. Below are some of the most important and widely used advanced capabilities:

| Feature           | Use Case                                      |
| ----------------- | --------------------------------------------- |
| Broadcasting      | Operations between arrays of different shapes |
| Structured Arrays | Heterogeneous data like rows in a table       |
| Linear Algebra    | Matrix math, decomposition                    |
| Random Sampling   | Simulations and testing                       |
| Memory Views      | Efficient slicing without copying             |

1. Broadcasting
import numpy as np
a = np.array([[1], [2], [3]])   # shape (3, 1)
b = np.array([10, 20, 30])      # shape (3,)
a + b  # Both arrays are stretched to shape (3, 3) before adding

2. Vectorized Operations
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = a * b  # Element-wise multiplication: [4, 10, 18]

3. Memory Views / Slicing Without Copying
a = np.array([1, 2, 3, 4])
b = a[1:3]   # b is a view that shares memory with a
b[0] = 99
# a is now [1, 99, 3, 4]


In [None]:
19.How does Pandas simplify time series analysis?
ANS. Pandas simplifies time series analysis in several ways:

1. Date Parsing and Indexing
Pandas can parse date strings and convert them into datetime objects using pd.to_datetime().

You can set a datetime column as the index, allowing intuitive time-based indexing and slicing.
# df is assumed to be an existing DataFrame with 'date' and 'value' columns
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)

2. Time-Based Slicing
Once the index is a DatetimeIndex, you can filter rows by year, month, day, etc. by passing partial date strings to .loc.
df.loc['2023']            # All data from 2023
df.loc['2023-07']         # All data from July 2023

3. Resampling
The resample() method allows you to change the frequency of time series data (e.g., daily to monthly) and apply aggregation functions like mean, sum, etc.
df.resample('M').mean()  # Monthly average (pandas 2.2+ prefers the alias 'ME')
df.resample('W').sum()   # Weekly sum

4. Lag and Differencing
shift() lets you create lag features for modeling.

diff() computes the change between consecutive rows, which helps remove a trend and bring the data closer to stationary.
df['lag_1'] = df['value'].shift(1)
df['diff'] = df['value'].diff()
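
The snippets above assume an existing `df`. A self-contained sketch on synthetic daily data is shown below; the frame `ts`, its dates, and its values are made up, and `'7D'` is used instead of a calendar alias so the bins do not depend on the weekday.

```python
import pandas as pd

# 14 consecutive days with values 1..14
dates = pd.date_range("2023-01-01", periods=14, freq="D")
ts = pd.DataFrame({"value": list(range(1, 15))}, index=dates)

# Partial-string and range slicing on the DatetimeIndex (both ends included)
january = ts.loc["2023-01"]
first_week = ts.loc["2023-01-01":"2023-01-07"]
print(len(january), len(first_week))  # 14 7

# Resample into 7-day bins starting at the first timestamp
weekly = ts["value"].resample("7D").sum()
print(weekly.tolist())  # [28, 77]

# Lag and first difference
lag_1 = ts["value"].shift(1)
diff_1 = ts["value"].diff()
print(lag_1.iloc[1], diff_1.iloc[1])  # 1.0 1.0
```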



In [None]:
20.What is the role of a pivot table in Pandas?
ANS. A pivot table in Pandas plays the following roles:

1. Data Summarization
It helps you aggregate data using functions like mean(), sum(), count(), etc., across one or more categories.
pd.pivot_table(df, values='sales', index='region', columns='product', aggfunc='sum')
2. Multi-Dimensional Grouping
Pivot tables allow you to group data by rows (index) and columns, creating a two-dimensional summary.

index → rows (e.g., region)

columns → columns (e.g., year)

values → values to aggregate (e.g., revenue)

3. Flexible Aggregation Functions
You can use any aggregation function:

'mean', 'sum', 'count', 'min', 'max', etc.

Or even custom functions using aggfunc.
4. Handling Missing Data
You can fill missing values using the fill_value parameter:
pd.pivot_table(df, ..., fill_value=0)

Example:
import pandas as pd

data = {
    'region': ['East', 'West', 'East', 'West'],
    'product': ['A', 'A', 'B', 'B'],
    'sales': [100, 200, 150, 250]
}

df = pd.DataFrame(data)

# One row per region, one column per product, cells hold summed sales
pivot = pd.pivot_table(df, values='sales', index='region', columns='product', aggfunc='sum')
print(pivot)



In [None]:
21.Why is NumPy’s array slicing faster than Python’s list slicing?
ANS. NumPy slicing is faster for the following reasons:

1. Homogeneous Data Type
NumPy arrays store data in a contiguous block of memory with fixed, homogeneous data types (e.g., all int32 or float64).

A Python list stores references to separate Python objects, which may be of different types and can be scattered in memory.

Result: NumPy accesses memory more efficiently, enabling faster computation and slicing.

2. Slicing Creates Views, Not Copies
NumPy slicing returns a view (not a copy) of the original array whenever possible.

Changes to the slice reflect in the original array.

Python list slicing creates a new list (a copy), which takes extra time and memory.
import numpy as np
a = np.array([1, 2, 3, 4])
b = a[1:3]   # view, not a copy
3. Optimized C Backend
NumPy is implemented in C and uses vectorized operations under the hood.

Python lists are handled by the interpreter, and operations are performed element by element (loop-based).

 Result: NumPy operations, including slicing, avoid Python-level loops and overhead.
 4. Lower Memory Overhead
NumPy arrays store just the raw data and metadata (like shape, dtype, strides).

Python lists store full Python objects and pointers, which adds significant overhead.
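
The view-versus-copy difference from point 2 can be checked directly. This is a small sketch with illustrative variable names.

```python
import numpy as np

arr = np.arange(10)
view = arr[2:5]                      # basic slice: no data is copied
print(np.shares_memory(arr, view))   # True

view[0] = 99                         # writing through the view changes arr
print(arr[2])                        # 99

items = list(range(10))
copied = items[2:5]                  # list slice: a new list is built
copied[0] = 99                       # the original list is untouched
print(items[2])                      # 2
```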

In [None]:
22.What are some common use cases for Seaborn?
ANS. Seaborn is a powerful Python data visualization library built on top of Matplotlib. It provides a high-level interface for creating attractive and informative statistical graphics.
1. Exploratory Data Analysis (EDA)
Quickly understand relationships, distributions, and patterns in the dataset.

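Commonly used in the early stages of data science and machine learning projects.
# df is assumed to be an existing DataFrame
sns.pairplot(df)
sns.heatmap(df.corr(numeric_only=True))  # numeric_only skips non-numeric columns (pandas 1.5+)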
2. Visualizing Distributions
Understand the spread, skewness, and modality of a single variable.
sns.histplot(df['age'])
sns.kdeplot(df['salary'])
3. Comparing Categories
Compare numerical data across categories using:

Bar plots, box plots, violin plots, strip plots.
sns.boxplot(x='gender', y='income', data=df)
sns.violinplot(x='region', y='sales', data=df)
4. Analyzing Relationships Between Variables
Visualize the correlation or trends between two variables.
sns.scatterplot(x='age', y='salary', data=df)
sns.regplot(x='experience', y='income', data=df)  # adds a regression line



In [None]:
                                                    {Practical}

In [None]:
1. How do you create a 2D NumPy array and calculate the sum of each row?
ANS. Step-by-Step Example
import numpy as np

# Step 1: Create a 2D NumPy array
array_2d = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# Step 2: Calculate the sum of each row
row_sums = np.sum(array_2d, axis=1)

# Output the result
print("2D Array:")
print(array_2d)
print("Sum of each row:", row_sums)
 Explanation:
np.array([...]): Creates a 2D array.

np.sum(array_2d, axis=1):

axis=1 means sum across columns, so you get the row-wise sum.

 Output:
2D Array:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Sum of each row: [ 6 15 24]

In [None]:
2.Write a Pandas script to find the mean of a specific column in a DataFrame.
ANS.Example Script
import pandas as pd

# Sample data
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'salary': [50000, 60000, 70000, 80000]
}

# Create DataFrame
df = pd.DataFrame(data)

# Calculate the mean of the 'salary' column
mean_salary = df['salary'].mean()

# Output the result
print("Mean salary:", mean_salary)
df['salary']: Accesses the specific column.

.mean(): Computes the average of the values in that column.

In [None]:
3. Create a scatter plot using Matplotlib.
ANS.
import matplotlib.pyplot as plt

# Sample data
x = [10, 20, 30, 40, 50]
y = [15, 25, 35, 30, 50]

# Create scatter plot
plt.scatter(x, y, color='blue', marker='o', label='Data Points')

# Add labels and title
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.title('Simple Scatter Plot')
plt.legend()

# Show plot
plt.show()


In [None]:
4. How do you calculate the correlation matrix using Seaborn and visualize it with a heatmap?
ANS. To calculate the correlation matrix and visualize it with a heatmap using Seaborn, follow these steps:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sample DataFrame
data = {
    'age': [25, 30, 45, 35, 40],
    'income': [50000, 60000, 80000, 75000, 90000],
    'expenses': [20000, 25000, 30000, 28000, 35000]
}

df = pd.DataFrame(data)

# Step 1: Calculate correlation matrix
correlation_matrix = df.corr()

# Step 2: Plot the heatmap
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f")

# Step 3: Display the plot
plt.title("Correlation Matrix Heatmap")
plt.show()
df.corr() – Computes the correlation matrix (Pearson correlation by default).

sns.heatmap() – Creates a heatmap:

annot=True – shows values inside boxes.

cmap='coolwarm' – sets the color scale.

fmt=".2f" – formats numbers to 2 decimal places.

plt.show() – Displays the plot.

In [None]:
5.Generate a bar plot using Plotly.
ANS.
import pandas as pd
import plotly.express as px

# Sample data
data = {
    'Category': ['A', 'B', 'C', 'D'],
    'Values': [10, 25, 15, 30]
}

# Create DataFrame
df = pd.DataFrame(data)

# Create bar plot
fig = px.bar(df, x='Category', y='Values', title='Bar Plot Example')

# Show plot
fig.show()
px.bar() creates a bar chart.

x='Category', y='Values' define the axes.

title adds a title to the chart.

fig.show() opens an interactive chart in your browser or notebook.

In [None]:
6. Create a DataFrame and add a new column based on an existing column.
ANS.
import pandas as pd

# Step 1: Create a sample DataFrame
data = {
    'name': ['Alice', 'Bob', 'Charlie'],
    'score': [85, 92, 78]
}

df = pd.DataFrame(data)

# Step 2: Add a new column based on 'score'
# Example: Grade based on score
df['grade'] = df['score'].apply(
    lambda x: 'A' if x >= 90 else 'B' if x >= 80 else 'C'
)

# Display the DataFrame
print(df)
Output:
     name  score grade
0   Alice     85     B
1     Bob     92     A
2 Charlie     78     C

In [None]:
7.Write a program to perform element-wise multiplication of two NumPy arrays.
ANS.
import numpy as np

# Step 1: Create two NumPy arrays
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# Step 2: Element-wise multiplication
result = a * b

# Step 3: Display the result
print("Array A:", a)
print("Array B:", b)
print("Element-wise multiplication:", result)
Output:
Array A: [1 2 3 4]
Array B: [5 6 7 8]
Element-wise multiplication: [ 5 12 21 32]
a * b performs element-wise multiplication (i.e., a[i] * b[i]) for all elements.

Both arrays must be of the same shape or broadcastable shapes.

In [None]:
8.Create a line plot with multiple lines using Matplotlib.
ANS.
import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y1 = [10, 15, 20, 25, 30]
y2 = [5, 10, 15, 20, 25]
y3 = [2, 4, 6, 8, 10]

# Create the line plot with multiple lines
plt.plot(x, y1, label='Line 1', color='blue', linestyle='-', marker='o')
plt.plot(x, y2, label='Line 2', color='green', linestyle='--', marker='s')
plt.plot(x, y3, label='Line 3', color='red', linestyle='-.', marker='^')

# Add titles and labels
plt.title('Multiple Line Plot Example')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')

# Add legend
plt.legend()

# Show the plot
plt.grid(True)
plt.show()
plt.plot() is called multiple times to draw different lines.

label, color, linestyle, and marker customize each line.

plt.legend() adds a legend to identify each line.

plt.grid(True) adds grid lines for better readability.

In [None]:
9.Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold.
ANS.
import pandas as pd

# Step 1: Create a sample DataFrame
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'score': [85, 72, 90, 60]
}

df = pd.DataFrame(data)

# Step 2: Filter rows where 'score' > 75
filtered_df = df[df['score'] > 75]

# Step 3: Display the result
print("Original DataFrame:\n", df)
print("\nFiltered DataFrame (score > 75):\n", filtered_df)
Output:
Original DataFrame:
      name  score
0   Alice     85
1     Bob     72
2 Charlie     90
3   David     60

Filtered DataFrame (score > 75):
      name  score
0   Alice     85
2 Charlie     90


In [None]:
10.Create a histogram using Seaborn to visualize a distribution.
ANS.
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Sample data
data = {
    'age': [22, 25, 27, 30, 32, 35, 36, 38, 40, 42, 45, 48, 50, 55, 60]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Create histogram
sns.histplot(df['age'], bins=10, kde=True, color='skyblue')

# Add labels and title
plt.title('Age Distribution')
plt.xlabel('Age')
plt.ylabel('Frequency')

# Show the plot
plt.show()


In [None]:
11. Perform matrix multiplication using NumPy.
ANS.
import numpy as np

# Step 1: Define two matrices
A = np.array([[1, 2],
              [3, 4]])

B = np.array([[5, 6],
              [7, 8]])

# Step 2: Perform matrix multiplication
result = np.matmul(A, B)  # or use A @ B

# Step 3: Display the result
print("Matrix A:\n", A)
print("Matrix B:\n", B)
print("Matrix Multiplication Result:\n", result)
Output:
Matrix A:
 [[1 2]
 [3 4]]
Matrix B:
 [[5 6]
 [7 8]]
Matrix Multiplication Result:
 [[19 22]
 [43 50]]
np.matmul(A, B) or A @ B performs matrix multiplication (not element-wise).

Shapes must align: if A is of shape (m, n), then B must be of shape (n, p).

In [None]:
12. Use Pandas to load a CSV file and display its first 5 rows.
ANS.
import pandas as pd

# Step 1: Load the CSV file
df = pd.read_csv('your_file.csv')  # Replace 'your_file.csv' with the path to your CSV file

# Step 2: Display the first 5 rows
print(df.head())
pd.read_csv() loads the CSV into a DataFrame.

df.head() shows the first 5 rows by default.

In [None]:
13.Create a 3D scatter plot using Plotly.
ANS.
import plotly.express as px
import pandas as pd

# Sample data
data = {
    'x': [1, 2, 3, 4, 5],
    'y': [10, 15, 13, 17, 20],
    'z': [5, 8, 6, 9, 12],
    'label': ['A', 'B', 'C', 'D', 'E']
}

df = pd.DataFrame(data)

# Create 3D scatter plot
fig = px.scatter_3d(df, x='x', y='y', z='z', text='label',
                    color='label', title='3D Scatter Plot Example')

# Show the plot
fig.show()
px.scatter_3d() creates a 3D scatter plot.

x, y, z: Coordinates for each point.

color='label': Differentiates points by color.

text='label': Draws each point's label as text next to its marker.

fig.show(): Renders the interactive 3D plot.

