Data Toolkit

1. What is NumPy, and why is it widely used in Python?
- NumPy (Numerical Python) is an open-source library for the Python programming language, widely used for scientific computing. It provides support for large multi-dimensional arrays and matrices, along with a collection of high-level mathematical functions to operate on these arrays. Here's why it's so widely used:

- a . Efficient Array Operations: NumPy allows for the creation and manipulation of large arrays with high performance. It is highly optimized for numerical operations and offers significant speed improvements over traditional Python lists.

- b . Multi-dimensional Arrays: NumPy introduces the ndarray object, which is a powerful N-dimensional array. It allows you to perform mathematical operations on entire arrays of data, without the need for explicit loops, which enhances both code readability and performance.

- c . Vectorization: NumPy supports vectorized operations, meaning that operations on entire arrays are implemented in compiled C code, avoiding the need for Python loops and improving performance.

- d . Mathematical Functions: NumPy includes a wide array of mathematical functions for linear algebra, Fourier transforms, statistical analysis, and more, making it an essential tool in data science, machine learning, and scientific computing.

- e . Interoperability: NumPy arrays are compatible with other scientific libraries like SciPy, Pandas, and Matplotlib, which makes it easy to use in the broader Python ecosystem.

- f . Memory Efficiency: NumPy arrays store values of a single fixed-size type in one contiguous buffer, so they typically consume less memory than Python lists holding the same numbers and can store large datasets efficiently.

2. How does broadcasting work in NumPy?
- Broadcasting in NumPy is a powerful mechanism that allows operations on arrays of different shapes and sizes without the need for explicit looping or reshaping. It enables element-wise operations by automatically expanding the smaller array to match the shape of the larger array when possible.
- How Broadcasting Works
- When performing operations between two arrays, NumPy compares their shapes dimension by dimension, starting from the trailing (rightmost) dimensions. Two dimensions are considered compatible if:

- a . They are equal, or
- b . One of them is 1.
- If every pair of dimensions is compatible, NumPy "broadcasts" the smaller array across the larger one so that both behave as if they had the same shape. The smaller array is stretched virtually (no data is copied) along the dimensions where its size is 1 or where it has no dimension at all.
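
The compatibility rule can be checked on small arrays. This is a minimal sketch; the array names and values are made up for illustration.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])            # shape (3, 3)

# Shape (3,): the trailing dimensions match (3 == 3),
# so the vector is added to every row of the matrix.
row = np.array([10, 20, 30])
row_result = matrix + row

# Shape (3, 1): the size-1 axis is stretched across the three columns.
column = np.array([[100], [200], [300]])
col_result = matrix + column

print(row_result.shape, col_result.shape)  # (3, 3) (3, 3)

# Shapes (3, 3) and (2,) are incompatible: 3 != 2 and neither is 1.
try:
    matrix + np.array([1, 2])
    broadcast_failed = False
except ValueError as exc:
    broadcast_failed = True
    print("Broadcast failed:", exc)
```

In both successful cases the result has the shape of the larger array, and no explicit loop or reshape is needed.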

3. What is a Pandas DataFrame?
- A Pandas DataFrame is a two-dimensional, tabular data structure in Python provided by the Pandas library. It is one of the most widely used data structures for data analysis and manipulation. The DataFrame is highly versatile and resembles a spreadsheet or a SQL table.
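
As a small illustration of the tabular structure, the sketch below builds a DataFrame from a dictionary of columns. The column names and values are invented for the example.

```python
import pandas as pd

# Each dictionary key becomes a column; rows get a default integer index
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 27, 22],
})

print(df.shape)          # (3, 2): three rows, two columns
print(list(df.columns))  # ['Name', 'Age']

# Select rows by a condition and a column by label, much like a SQL query
young = df.loc[df['Age'] < 25, 'Name']
print(young.tolist())    # ['Alice', 'Charlie']
```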

4. Explain the use of the groupby() method in Pandas.
- The groupby() method in Pandas is a powerful tool for grouping and aggregating data within a DataFrame or Series. It is primarily used to split data into groups based on some criteria, apply a function to each group, and combine the results into a new DataFrame or Series.

- How groupby() Works
- The groupby() method operates in three steps, often referred to as the split-apply-combine process:

- 1 . Split: The data is split into groups based on one or more keys (e.g., column values).
- 2 . Apply: A function is applied to each group (e.g., aggregate functions like mean(), sum(), count(), or custom functions).
- 3 . Combine: The results are combined into a new DataFrame or Series.
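
The split-apply-combine process can be sketched as follows. The sales records and column names are hypothetical.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration
sales = pd.DataFrame({
    'Region': ['East', 'West', 'East', 'West', 'East'],
    'Amount': [100, 200, 150, 50, 250],
})

# Split by 'Region', apply sum() to each group, combine into a Series
totals = sales.groupby('Region')['Amount'].sum()
print(totals)

# Several aggregations at once are combined into a DataFrame
summary = sales.groupby('Region')['Amount'].agg(['mean', 'count'])
print(summary)
```

Here `totals` is indexed by the group keys (East, West), which is the "combine" step producing one row per group.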

5. Why is Seaborn preferred for statistical visualizations?
- 1. Built-in Statistical Functions
- Seaborn includes many functions designed for visualizing statistical relationships and distributions:

- Functions like relplot(), pairplot(), and lmplot() automatically compute and display statistical relationships between variables.
- Tools for visualizing distributions (histplot(), kdeplot(), violinplot(), etc.) allow for detailed statistical insights.
- 2. Elegant Default Styles
- Seaborn comes with aesthetically pleasing default styles, making visualizations more attractive and publication-ready.
- Styles such as whitegrid, darkgrid, and ticks enhance readability and reduce the need for extensive customization.
- 3. Ease of Use
- Seaborn simplifies complex visualizations with concise syntax. For example, a violin plot or box plot can be created with just one line of code.
- Built-in integration with Pandas DataFrames allows for direct use of column names, eliminating the need for manual data extraction.
- 4. Works Well with DataFrames
- Seaborn seamlessly integrates with Pandas DataFrames, allowing for easy plotting directly from tabular data.
- It supports grouping, subsetting, and faceting data for multi-dimensional visualizations.

6. What are the differences between NumPy arrays and Python lists?
- 1. Data Type Consistency
- a . NumPy Arrays:
- Require all elements to be of the same data type (e.g., all integers, floats, etc.).
- Enforce type uniformity, making them more memory-efficient.
- b . Python Lists:
- Can store elements of mixed data types (e.g., integers, strings, objects).
- 2. Performance
- a . NumPy Arrays:
- Much faster for numerical computations due to optimized C implementations under the hood.
- Allow for vectorized operations (element-wise operations without explicit loops).
- b .  Python Lists:
- Slower for numerical computations since they rely on Python loops for operations.
- No built-in support for vectorized operations.
- 3. Memory Efficiency
- a . NumPy Arrays:
- More memory-efficient because elements are stored in a contiguous block of memory, using a fixed-size data type.
- Ideal for handling large datasets.
- b . Python Lists:
- Less memory-efficient as they store pointers to individual elements, and each element can be of arbitrary size.
- 4. Functionality
- a . NumPy Arrays:
- Designed specifically for numerical and scientific computing.
- Offer built-in support for mathematical operations, linear algebra, random number generation, and more.
- b . Python Lists:
- General-purpose data structure with no specific focus on numerical computations.
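- Lack direct support for element-wise mathematical operations (for example, multiplying a list by 2 repeats the list instead of doubling each element).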
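
A few of these differences can be seen directly. This is a minimal sketch with made-up values.

```python
import numpy as np

py_list = [1, 2, 3]
np_array = np.array([1, 2, 3])

# The same operator means different things for the two types
list_times_two = py_list * 2     # repetition: [1, 2, 3, 1, 2, 3]
array_times_two = np_array * 2   # element-wise: [2 4 6]

# A list can mix types; an array converts everything to one dtype
mixed_list = [1, 'two', 3.0]
coerced = np.array([1, 2.5, 3])  # the integers are upcast to float64

print(list_times_two)
print(array_times_two)
print(coerced.dtype)             # float64
```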

7. What is a heatmap, and when should it be used?
- A heatmap is a data visualization technique that represents numerical data as a matrix of color-coded cells. Each cell in the grid corresponds to a data point, and its color intensity reflects the magnitude or value of that point. Heatmaps are widely used to visualize patterns, correlations, or distributions in datasets, particularly when working with large amounts of data.
- When to Use a Heatmap:
- a . Visualizing Correlations:
- Heatmaps are often used to display correlation matrices in statistical and machine learning workflows.
- Example: Understanding relationships between features in a dataset.
- b . Spotting Patterns:
- Use heatmaps to identify patterns, trends, or clusters within data.
- Example: In time-series data, heatmaps can show changes over time.
- c . Highlighting Outliers:
- Heatmaps can highlight anomalies or outliers in data due to their distinct colors.
- d . Summarizing Multi-dimensional Data:
- For datasets with two categorical dimensions, a heatmap can summarize the relationships or frequencies.
- e . Performance or Activity Visualization:
- Example: Visualizing website traffic, server performance metrics, or customer behavior.
- f . Genomics and Bioinformatics:
- Widely used in bioinformatics to visualize gene expression data or protein interactions.

8. What does the term “vectorized operation” mean in NumPy?
- A vectorized operation in NumPy refers to performing operations on entire arrays (or collections of elements) in a single step, rather than iterating through the elements individually in a loop. These operations are optimized for performance and leverage low-level, highly efficient implementations in C, enabling faster execution compared to traditional Python loops.
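
The contrast with an explicit loop can be sketched as follows. Both versions compute the same values; the variable names are illustrative.

```python
import numpy as np

values = np.arange(1, 6)  # [1 2 3 4 5]

# Loop version: one Python-level iteration per element
squares_loop = np.empty_like(values)
for i in range(len(values)):
    squares_loop[i] = values[i] ** 2 + 1

# Vectorized version: one expression applied to the whole array
squares_vec = values ** 2 + 1

print(squares_vec)  # [ 2  5 10 17 26]
```

On arrays this small the speed difference is negligible; the benefit grows with the array size because the loop runs in compiled code rather than in the Python interpreter.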

9. How does Matplotlib differ from Plotly?
- 1. Interactivity
- Matplotlib:
- Primarily a static plotting library. Plots are typically non-interactive and rendered as static images (e.g., PNG, PDF).
- Limited interactivity via interactive backends (zoom and pan toolbars), the matplotlib.widgets module, or by embedding in GUIs.
- Suitable for creating publication-quality plots.
- Plotly:
- Built for interactive visualizations. Users can zoom, pan, hover for details, and more.
- Plots are rendered in web browsers or as interactive components in Jupyter Notebooks.
- Ideal for dashboards and presentations requiring dynamic user engagement.
- 2. Ease of Use
- Matplotlib:
- Offers a lot of control but can be verbose and complex for beginners.
- Requires more lines of code to create and customize visualizations.
- Plotly:
- Has a user-friendly API for creating complex and interactive plots with minimal code.
- Default settings often produce polished and visually appealing charts.
- 3. Output Formats
- Matplotlib:
- Outputs static files like PNG, PDF, SVG, and more.
- Can embed plots into desktop applications using frameworks like PyQt or Tkinter.
- Plotly:
- Outputs interactive HTML files that can be viewed in browsers or embedded in web applications.
- Can also generate static images (requires additional setup or libraries like kaleido).

10. What is the significance of hierarchical indexing in Pandas?
- Hierarchical Indexing in Pandas
- Hierarchical indexing, also known as a MultiIndex, is a powerful feature in Pandas that allows you to have multiple levels of indexing for rows and/or columns in a DataFrame or Series. This enables you to work with high-dimensional data in a lower-dimensional form and facilitates advanced data manipulation, aggregation, and visualization.
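
A minimal sketch of a two-level index follows. The Year and Quarter levels and the revenue figures are invented for illustration.

```python
import pandas as pd

# Two index levels: Year (outer) and Quarter (inner)
index = pd.MultiIndex.from_tuples(
    [('2023', 'Q1'), ('2023', 'Q2'), ('2024', 'Q1'), ('2024', 'Q2')],
    names=['Year', 'Quarter'],
)
revenue = pd.Series([10, 12, 15, 18], index=index, name='Revenue')

# Select everything under one outer key
print(revenue.loc['2024'])

# Select a single value with the full key
q2_2023 = revenue.loc[('2023', 'Q2')]

# Aggregate over one level
yearly = revenue.groupby(level='Year').sum()

# Move the inner level to the columns to get a 2D table
wide = revenue.unstack('Quarter')
print(wide)
```

The Series is one-dimensional, yet it carries two-dimensional data (year by quarter), which is what "high-dimensional data in a lower-dimensional form" refers to.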

11. What is the role of Seaborn’s pairplot() function?
- Seaborn’s pairplot() Function
- The pairplot() function in Seaborn is a powerful tool for visualizing pairwise relationships in a dataset. It creates a grid of plots where each numeric variable is plotted against every other numeric variable in the dataset. This function is particularly useful for exploratory data analysis (EDA) to understand the relationships between variables and their distributions.

12. What is the purpose of the describe() function in Pandas?
- The describe() function in Pandas provides summary statistics for the numerical columns of a DataFrame or Series (by default), allowing you to quickly gain insights into the distribution and central tendencies of your data. It is commonly used during the exploratory data analysis (EDA) phase to understand the basic characteristics of the data, such as the count, mean, standard deviation, minimum, maximum, and quartiles (the 50% quartile is the median), which can also hint at the presence of outliers.
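
As a brief illustration, the sketch below calls describe() on a small, made-up DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [24, 27, 22, 32, 29],
    'Salary': [50000, 60000, 45000, 80000, 70000],
})

# One column of statistics per numeric column
stats = df.describe()
print(stats)

# Individual statistics can be looked up by row label
median_age = stats.loc['50%', 'Age']
print(median_age)  # 27.0
```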

13. Why is handling missing data important in Pandas?
- 1. Impact on Analysis and Models
- Distorted Results: Many analytical methods (e.g., mean, median, regression, machine learning algorithms) rely on complete datasets. If missing values are not handled, they can distort statistical summaries and lead to misleading conclusions.
- Inconsistent Results: If missing data is not treated consistently, it can lead to biased or inconsistent results, especially when working with predictive models.
- Errors in Algorithms: Many machine learning algorithms (e.g., decision trees, linear regression, neural networks) do not handle missing values directly, and attempting to run these algorithms without handling NaN values may result in errors.
- 2. Types of Missing Data
- Understanding the type of missing data is essential for deciding how to handle it:
- Missing Completely at Random (MCAR): The missing values are independent of both observed and unobserved data. In this case, removing the affected rows does not bias the analysis, although it reduces the sample size.
- Missing at Random (MAR): The missing data depends on the observed data but not on the unobserved data. In such cases, imputing the missing data based on other available information is a reasonable option.
- Missing Not at Random (MNAR): The missing data depends on the unobserved data itself. This is the most challenging case to handle, and advanced techniques such as modeling or domain-specific methods might be needed.
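
The basic Pandas tools for detecting and handling missing values can be sketched as follows. The data is made up, and mean imputation is only one possible choice, shown here as an assumption rather than a recommendation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Age': [25, np.nan, 35, 40],
    'City': ['Paris', 'Rome', None, 'Oslo'],
})

# Count missing values per column
missing_counts = df.isna().sum()
print(missing_counts)

# Option 1: drop every row that contains a missing value
dropped = df.dropna()

# Option 2: fill gaps, here with the column mean and a placeholder label
filled = df.fillna({'Age': df['Age'].mean(), 'City': 'Unknown'})
print(filled)
```

Dropping keeps only complete rows, while filling keeps all rows at the cost of introducing estimated values.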

14. What are the benefits of using Plotly for data visualization?
- 1. Interactivity
- Dynamic Visualizations:
- Plotly’s visualizations are inherently interactive, allowing users to zoom, pan, hover for details, and filter data directly in the plots.
- This interactivity is particularly useful for exploring large datasets and identifying patterns or outliers.
- Hover Tooltips:
- Provides detailed information about data points when hovering, which enhances interpretability without cluttering the plot.
- 2. Wide Range of Plot Types
- Comprehensive Plot Library:
- Plotly supports over 40 different chart types, including:
- Line, bar, scatter, and pie charts.
- 3D plots and surface plots.
- Choropleth and geographic maps.
- Heatmaps and treemaps.
- Sankey diagrams and sunburst charts.
- 3 . Free and Open Source
- The core version of Plotly is free and open source, making it accessible to everyone. Advanced enterprise features are available through the commercial version, but most common use cases are covered in the free edition.

15. How does NumPy handle multidimensional arrays?
- 1. What Are Multidimensional Arrays in NumPy
- A multidimensional array is represented by the numpy.ndarray object. It is a grid of values of the same type, indexed by a tuple of integers.
- The number of dimensions (axes) is given by the array's ndim attribute; older NumPy documentation calls this the rank of the array.
- 1D Array: A list-like structure (e.g., [1, 2, 3]).
- 2D Array: A matrix-like structure (e.g., [[1, 2], [3, 4]]).
- 3D Array: A tensor-like structure (e.g., [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]).
- 2. Applications of Multidimensional Arrays
- Data Representation: Represent tabular data, images, or time-series data.
- Linear Algebra: Perform matrix multiplications, eigenvalue calculations, and other operations.
- Numerical Simulations: Model scientific phenomena in multiple dimensions (e.g., 3D physics simulations).
- Machine Learning: Process inputs like image tensors, datasets, or neural network weights.
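
The sketch below shows how shape, indexing, and axis-based operations work on a 3D array. The shape (2, 3, 4) is an arbitrary choice for illustration.

```python
import numpy as np

# 24 consecutive integers arranged as 2 blocks of 3 rows and 4 columns
tensor = np.arange(24).reshape(2, 3, 4)

print(tensor.ndim)   # 3
print(tensor.shape)  # (2, 3, 4)

# Index with one integer per axis: block 1, row 2, column 3
element = tensor[1, 2, 3]

# Reduce along the last axis: one sum per (block, row) pair
axis_sum = tensor.sum(axis=2)

# Reorder the axes without copying the data
transposed = tensor.transpose(2, 0, 1)
print(axis_sum.shape, transposed.shape)  # (2, 3) (4, 2, 3)
```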

16. What is the role of Bokeh in data visualization?
- Bokeh is a powerful Python library designed for creating interactive, web-ready visualizations. It is particularly suited for building dynamic and visually appealing plots that can be seamlessly integrated into web applications.

17. Explain the difference between apply() and map() in Pandas.
- 1. Scope
- map():
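- Series.map() works element-wise on a Pandas Series.
- Element-wise mapping over a whole DataFrame uses a separate method: DataFrame.applymap(), which was renamed to DataFrame.map() in pandas 2.1.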
- apply():
- Works on both Pandas Series and DataFrames.
- For DataFrames, it can operate along rows or columns.
- 2 . Performance
- map():
- Optimized for simple element-wise operations on Series and is usually faster than apply().
- apply():
- Slightly slower because it is more general and works on both Series and DataFrames, often requiring additional processing to determine the axis and shape of the result.
- 3 . Output
- map():
- The output is always a Series with the same index as the input.
- apply():
- The output can be a Series, DataFrame, or scalar value depending on the applied function and the input dimensions.
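
The difference in scope and output can be sketched as follows. The column names and values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Price': [10.0, 20.0, 30.0], 'Qty': [1, 2, 3]})

# Series.map with a function: one output element per input element
doubled = df['Price'].map(lambda p: p * 2)

# Series.map also accepts a dictionary as a lookup table
labels = df['Qty'].map({1: 'one', 2: 'two', 3: 'three'})

# DataFrame.apply along columns (the default): one result per column
col_max = df.apply(lambda col: col.max())

# DataFrame.apply along rows (axis=1): one result per row
totals = df.apply(lambda row: row['Price'] * row['Qty'], axis=1)

print(doubled.tolist(), labels.tolist())
print(col_max.tolist(), totals.tolist())
```

For simple arithmetic like the row totals, the vectorized expression `df['Price'] * df['Qty']` gives the same result and is usually preferable to either method.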

18. What are some advanced features of NumPy?
- 1. Broadcasting
- Feature: Allows operations on arrays of different shapes by automatically expanding their dimensions to make the shapes compatible.
- Use Case: Simplifies element-wise operations without explicitly reshaping arrays.
- 2. Vectorized Operations
- Feature: Enables applying operations to entire arrays without explicit loops, leading to faster execution.
- Use Case: Perform mathematical computations efficiently.
- 3. Memory-Mapped Files
- Feature: Allows working with large datasets by mapping files directly into memory, enabling efficient I/O operations without loading the entire file into RAM.
- Use Case: Handle datasets larger than the available memory.
- 4. Structured Arrays
- Feature: Allows defining arrays with heterogeneous data types (e.g., integers, floats, strings).
- Use Case: Work with tabular data in a structured format.
- 5. Universal Functions (ufuncs)
- Feature: Predefined functions in NumPy that operate element-wise on arrays and support broadcasting.
- Use Case: Perform element-wise operations like addition, trigonometric functions, and exponentiation.
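
Two of these features, structured arrays and universal functions, can be sketched briefly. The field names and values are invented for illustration, and the other features are not covered here.

```python
import numpy as np

# Structured array: each record has a name, an age, and a weight field
people = np.array(
    [('Alice', 24, 55.5), ('Bob', 30, 72.0)],
    dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')],
)
ages = people['age']                         # access one field as an array
older = people[people['age'] > 25]['name']   # filter records by a field

# ufuncs operate element-wise and support broadcasting
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))
grid = np.add(np.array([[0], [10]]), np.array([1, 2, 3]))

# ufuncs also provide methods such as reduce
total = np.add.reduce([1, 2, 3, 4])

print(ages, older, roots)
print(grid)
print(total)
```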

19. How does Pandas simplify time series analysis?
- 1. Time Indexing and Resampling
- Time-Based Indexing:
- Pandas allows time series data to be indexed using DatetimeIndex, which enables intuitive slicing and selection.
- Resampling:
- Convert data to different time frequencies (e.g., daily to monthly, hourly to daily) using aggregation or interpolation.
- 2. Flexible Time Representations
- Datetime Conversion:
- Convert strings, integers, or other formats into datetime objects using pd.to_datetime.
- Custom Frequencies:
- Use flexible time frequencies like:
- D (daily), H (hourly), T (minutes), B (business days). Since pandas 2.2, the aliases h and min replace the deprecated H and T.
- 3. Handling Missing Data
- Forward and Backward Filling:
- Handle missing values specific to time series using forward (ffill) or backward filling (bfill).
- Interpolate:
- Interpolate missing data using linear or other interpolation methods.
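
These time series tools can be combined in a short sketch. The dates and values are made up, and the daily frequency is chosen only to keep the example small.

```python
import numpy as np
import pandas as pd

# Six daily observations with two gaps
dates = pd.date_range('2024-01-01', periods=6, freq='D')
ts = pd.Series([1.0, np.nan, 3.0, 4.0, np.nan, 6.0], index=dates)

# Convert a string into a timestamp
stamp = pd.to_datetime('2024-01-03')

# Slice by date strings (both endpoints are included)
window = ts.loc['2024-01-02':'2024-01-04']

# Fill gaps with the previous value, or interpolate linearly
ffilled = ts.ffill()
interpolated = ts.interpolate()

# Resample to 2-day bins and sum each bin (missing values are skipped)
two_day = ts.resample('2D').sum()

print(ts.loc[stamp])
print(interpolated.tolist())
print(two_day)
```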

20. What is the role of a pivot table in Pandas?
- Role of a Pivot Table
- 1. Data Summarization:
- Aggregates data based on one or more keys (columns) to generate summaries, such as sums, averages, counts, etc.
- Example: Summarizing total sales by region and product category.
- 2. Reshaping Data:
- Converts a long-format DataFrame into a wide-format, making it easier to analyze and visualize data.
- 3. Custom Aggregations:
- Allows applying specific aggregation functions (e.g., sum, mean, min, max, or custom functions) to groups of data.
- 4 . Multi-Dimensional Analysis:
- Facilitates multi-level grouping and analysis using hierarchical indexes for rows and columns.
- 5 . Quick Data Insights:
- Provides an intuitive way to slice and dice data for exploratory data analysis (EDA).
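
The sales-by-region example mentioned above can be sketched as follows. The records are hypothetical.

```python
import pandas as pd

# Long-format records: one row per sale
sales = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West', 'East'],
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Sales': [100, 150, 200, 50, 300],
})

# Wide-format summary: one row per region, one column per product
table = pd.pivot_table(
    sales,
    index='Region',
    columns='Product',
    values='Sales',
    aggfunc='sum',
)
print(table)
```

The two East sales of product A (100 and 300) are aggregated into a single cell, which shows the summarizing and reshaping roles at once.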

21. Why is NumPy’s array slicing faster than Python’s list slicing?
- NumPy’s array slicing is faster than Python’s list slicing primarily due to the differences in how the two handle memory and operations. Here's a detailed explanation of why this is the case:

- 1. Memory Contiguity
- NumPy Arrays:
- NumPy arrays store data in a single contiguous block of memory (C-style or Fortran-style).
- When slicing a NumPy array with basic slices, it creates a view of the same memory block rather than copying the data.
- Creating the slice therefore takes roughly constant time regardless of its length, because the view refers to the same underlying memory.
- Python Lists:
- Python lists are collections of pointers to objects, which may be scattered in memory.
- Slicing a Python list creates a new list and copies the elements from the original list into the new one, leading to additional overhead in terms of memory allocation and data copying.
- 2. Low-Level Optimization
- NumPy:
- Written in C, NumPy uses highly optimized C functions for slicing and other array operations.
- These operations leverage vectorized computations and avoid Python's interpreter overhead.
- Slicing in NumPy is implemented using pointer arithmetic to calculate offsets, making it extremely fast.
- Python Lists:
- Python lists are general-purpose and not optimized for numerical operations.
- Each element access involves dereferencing a pointer and may involve type checking and other overhead, slowing down slicing operations.
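
The view-versus-copy behavior can be verified directly. This is a minimal sketch with illustrative values.

```python
import numpy as np

# NumPy: a basic slice is a view onto the same memory
arr = np.arange(10)
view = arr[2:5]
view[0] = 99                         # writes through to the original array
print(arr[2])                        # 99
print(np.shares_memory(arr, view))   # True

# Python list: a slice is a new list holding copied references
lst = list(range(10))
sliced = lst[2:5]
sliced[0] = 99                       # the original list is unaffected
print(lst[2])                        # 2
```

Because the view shares memory, modifying it changes the original array; call `arr[2:5].copy()` when an independent array is needed.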

22. What are some common use cases for Seaborn?
- 1. Exploratory Data Analysis (EDA)
- Seaborn helps in visually exploring data to identify patterns, trends, correlations, and potential outliers.
- 2. Relationship Analysis
- Seaborn excels at visualizing relationships between variables.
- Scatter Plots:
- Explore relationships between two continuous variables.
- Regression Plots:
- Examine linear relationships with optional confidence intervals.
- Pairwise Relationships:
- Use pairplot() to visualize relationships across all numerical variables.
- 3. Use with Large Datasets
- For large datasets, Seaborn can summarize the data through aggregated visualizations (such as histograms, KDE plots, and box plots) instead of drawing every individual point.



1. How do you create a 2D NumPy array and calculate the sum of each row?
'''
import numpy as np

# Step 1: Create a 2D NumPy array
array_2d = np.array([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]])

print("2D Array:")
print(array_2d)

# Step 2: Calculate the sum of each row
row_sums = np.sum(array_2d, axis=1)
print("\nSum of each row:")
print(row_sums)
Output
2D Array:
[[1 2 3]
 [4 5 6]
 [7 8 9]]

Sum of each row:
[ 6 15 24]
'''

2. Write a Pandas script to find the mean of a specific column in a DataFrame.
'''
import pandas as pd

# Create a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [24, 27, 22, 32, 29],
    'Salary': [50000, 60000, 45000, 80000, 70000]
}

df = pd.DataFrame(data)

# Specify the column for which to calculate the mean
column_name = 'Age'

# Calculate the mean of the specified column
mean_value = df[column_name].mean()

print(f"The mean of the '{column_name}' column is: {mean_value}")
Output
The mean of the 'Age' column is: 26.8
'''

3. Create a scatter plot using Matplotlib.
'''
import matplotlib.pyplot as plt

# Sample data for the scatter plot
x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11]
y = [99, 86, 87, 88, 100, 86, 103, 87, 94, 78]

# Create the scatter plot
plt.scatter(x, y, color='blue', marker='o')

# Add labels and title
plt.title("Sample Scatter Plot")
plt.xlabel("X-axis Label")
plt.ylabel("Y-axis Label")

# Show the plot
plt.show()
Output
This code generates a scatter plot with the given x and y data points.
'''


4. How do you calculate the correlation matrix using Seaborn and visualize it with a heatmap?
'''
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Step 1: Create a sample dataset
data = {
    'Age': [25, 30, 35, 40, 45],
    'Salary': [40000, 50000, 60000, 80000, 90000],
    'Experience': [1, 3, 5, 8, 10]
}

df = pd.DataFrame(data)

# Step 2: Calculate the correlation matrix
correlation_matrix = df.corr()

# Step 3: Visualize with a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
plt.title("Correlation Matrix Heatmap")
plt.show()
Output
The correlation values between variables are displayed in each cell.
Colors represent the strength and direction of correlations:
Dark Red: Strong positive correlation (close to +1).
Dark Blue: Strong negative correlation (close to -1).
White or Neutral: Weak or no correlation (close to 0).
'''

5. Generate a bar plot using Plotly.
'''
import plotly.express as px

# Sample data for the bar plot
data = {
    'Category': ['A', 'B', 'C', 'D', 'E'],
    'Values': [20, 34, 23, 45, 12]
}

# Create the bar plot
fig = px.bar(data, x='Category', y='Values', title='Bar Plot Example',
             labels={'Values': 'Value Count', 'Category': 'Category Name'},
             color='Category')  # Optional: Adds color by category

# Display the plot
fig.show()
Output
Categories (A, B, C, D, E) on the x-axis.
Their respective values on the y-axis as vertical bars.
Interactive tooltips for each bar.
'''

6. Create a DataFrame and add a new column based on an existing column.
'''
import pandas as pd

# Step 1: Create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 22]
}
df = pd.DataFrame(data)

# Step 2: Add a new column based on an existing column
# Example: Categorize Age into 'Young' or 'Old'
df['Age Category'] = df['Age'].apply(lambda age: 'Young' if age < 30 else 'Old')

# Display the updated DataFrame
print(df)
Output
      Name  Age Age Category
0    Alice   25        Young
1      Bob   30          Old
2  Charlie   35          Old
3    David   40          Old
4      Eve   22        Young
'''

7. Write a program to perform element-wise multiplication of two NumPy arrays.
'''
import numpy as np

# Step 1: Create two NumPy arrays
array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([10, 20, 30, 40, 50])

# Step 2: Perform element-wise multiplication
result = array1 * array2

# Display the result
print("Array 1:", array1)
print("Array 2:", array2)
print("Element-wise Multiplication:", result)
Output
Array 1: [1 2 3 4 5]
Array 2: [10 20 30 40 50]
Element-wise Multiplication: [ 10  40  90 160 250]
'''

8. Create a line plot with multiple lines using Matplotlib.
'''
import matplotlib.pyplot as plt

# Step 1: Create data for multiple lines
x = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]  # Line 1
y2 = [1, 3, 5, 7, 9]   # Line 2
y3 = [3, 6, 9, 12, 15] # Line 3

# Step 2: Plot multiple lines
plt.plot(x, y1, label='Line 1', color='red', linestyle='-', marker='o')
plt.plot(x, y2, label='Line 2', color='blue', linestyle='--', marker='s')
plt.plot(x, y3, label='Line 3', color='green', linestyle='-.', marker='d')

# Step 3: Add title, labels, and legend
plt.title("Line Plot with Multiple Lines")
plt.xlabel("X-axis Label")
plt.ylabel("Y-axis Label")
plt.legend()  # Display legend

# Step 4: Show the plot
plt.grid(True)  # Optional: Adds a grid for better readability
plt.show()
Output
The plot will display three lines:

Line 1: Red solid line with circle markers.
Line 2: Blue dashed line with square markers.
Line 3: Green dash-dot line with diamond markers.
Each line is labeled in the legend for clarity.
'''




9. Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold.
'''
import pandas as pd

# Step 1: Create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [24, 29, 35, 42, 18],
    'Score': [85, 90, 88, 75, 95]
}
df = pd.DataFrame(data)

# Step 2: Define a threshold
threshold = 30

# Step 3: Filter rows where 'Age' is greater than the threshold
filtered_df = df[df['Age'] > threshold]

# Display the original and filtered DataFrame
print("Original DataFrame:")
print(df)
print(f"\nFiltered DataFrame (Age > {threshold}):")
print(filtered_df)
Output
Original DataFrame:
      Name  Age  Score
0    Alice   24     85
1      Bob   29     90
2  Charlie   35     88
3    David   42     75
4      Eve   18     95

Filtered DataFrame (Age > 30):
      Name  Age  Score
2  Charlie   35     88
3    David   42     75
'''

10. Create a histogram using Seaborn to visualize a distribution.
'''
import seaborn as sns
import matplotlib.pyplot as plt

# Step 1: Create sample data
data = [22, 23, 24, 25, 26, 27, 27, 28, 29, 30, 30, 31, 32, 33, 34, 35, 35, 36, 37, 38]

# Step 2: Create a Seaborn histogram
sns.histplot(data, bins=10, kde=True, color='blue', edgecolor='black')

# Step 3: Add titles and labels
plt.title("Histogram of Sample Data")
plt.xlabel("Value")
plt.ylabel("Frequency")

# Step 4: Display the plot
plt.show()
Output
The plot will display a histogram with:

A distribution of the sample data.
A Kernel Density Estimate (KDE) curve that estimates the probability density function of the data.
'''

11. Perform matrix multiplication using NumPy.
'''
import numpy as np

# Step 1: Create two matrices
matrix1 = np.array([[1, 2], [3, 4], [5, 6]])
matrix2 = np.array([[7, 8], [9, 10]])

# Step 2: Perform matrix multiplication
result = np.matmul(matrix1, matrix2)

# Display the result
print("Matrix 1:")
print(matrix1)
print("\nMatrix 2:")
print(matrix2)
print("\nResult of Matrix Multiplication:")
print(result)
Output
Matrix 1:
[[1 2]
 [3 4]
 [5 6]]

Matrix 2:
[[ 7  8]
 [ 9 10]]

Result of Matrix Multiplication:
[[ 25  28]
 [ 57  64]
 [ 89 100]]
'''

12. Use Pandas to load a CSV file and display its first 5 rows.
'''
import pandas as pd

# Step 1: Load the CSV file into a DataFrame
df = pd.read_csv('path_to_your_file.csv')  # Replace with your file path

# Step 2: Display the first 5 rows
print(df.head())
Output (for a hypothetical file with Name, Age, and City columns)
      Name  Age           City
0    Alice   24       New York
1      Bob   30  San Francisco
2  Charlie   28    Los Angeles
3    David   35        Chicago
4      Eve   22         Boston
'''

13. Create a 3D scatter plot using Plotly.
'''
import plotly.express as px
import pandas as pd

# Step 1: Create a sample DataFrame with 3D coordinates
data = {
    'X': [1, 2, 3, 4, 5],
    'Y': [5, 4, 3, 2, 1],
    'Z': [2, 3, 4, 5, 6]
}
df = pd.DataFrame(data)

# Step 2: Create a 3D scatter plot
fig = px.scatter_3d(df, x='X', y='Y', z='Z', title="3D Scatter Plot")

# Step 3: Show the plot
fig.show()
Output
A 3D scatter plot will be displayed where each point is plotted in 3D space based on the X, Y, and Z values.
'''