# Question 1
# What is NumPy, and why is it widely used in Python?

# Answer:
# NumPy (Numerical Python) is a fundamental package for numerical computation in Python. It provides support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
# NumPy is widely used because:
# - Efficiency: NumPy arrays are implemented in C, making operations on them much faster than equivalent operations on standard Python lists, especially for large datasets.
# - Functionality: It offers a rich set of functions for linear algebra, Fourier transforms, random number generation, and more, which are essential for scientific and mathematical computing.
# - Integration: NumPy arrays are the fundamental data structure used by many other scientific Python libraries like Pandas, SciPy, and scikit-learn, making it a cornerstone of the scientific computing ecosystem.
# - Broadcasting: NumPy's broadcasting feature allows operations on arrays with different shapes under certain conditions, simplifying complex calculations.

# Question 2
# How does broadcasting work in NumPy?

# Answer:
# Broadcasting in NumPy is a powerful mechanism that allows NumPy to perform arithmetic operations on arrays with different shapes. NumPy automatically expands the dimensions of the smaller array to match the larger array, without explicitly creating copies of the data.
# Broadcasting follows a set of rules to determine if two arrays are compatible for an operation:
# 1. If the arrays do not have the same rank (number of dimensions), prepend the shape of the lower rank array with 1s until both ranks match.
# 2. Two dimensions are compatible when:
#    a. They are equal, or
#    b. One of them is 1.
# 3. If these conditions are not met, a `ValueError` is raised, indicating that the arrays have incompatible shapes.
# The broadcasting rule ensures that operations can be performed element-wise between arrays of different but compatible shapes, leading to concise and efficient code.
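# Example (a minimal sketch of the rules above; the shapes (3, 1) and (4,) and all variable names are made up for illustration):
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,), treated as (1, 4) by rule 1
table = col + row                  # each size-1 dimension is stretched, giving shape (3, 4)
print(table.shape)

# Shapes (3,) and (4,) are incompatible: 3 != 4 and neither dimension is 1.
try:
    np.arange(3) + np.arange(4)
except ValueError as exc:
    print("Incompatible shapes:", exc)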

# Question 3
# What is a Pandas DataFrame?

# Answer:
# A Pandas DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It is similar to a spreadsheet or a SQL table. Key features of a DataFrame include:
# - Labeled axes: Rows and columns have labels (indices and column names).
# - Data alignment: Operations between DataFrames automatically align data based on labels.
# - Column types: Columns can have different data types (integer, float, string, boolean, etc.).
# - Flexibility: Provides powerful tools for data manipulation, cleaning, analysis, and visualization.
# DataFrames are the primary data structure used in Pandas for working with tabular data.
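# Example (a small sketch of labeled axes, mixed column types, and label alignment; the data and names are invented for illustration):
import pandas as pd

people = pd.DataFrame(
    {"name": ["Ana", "Ben", "Cy"], "age": [31, 25, 40], "score": [88.5, 92.0, 79.5]},
    index=["a", "b", "c"],
)
print(people.dtypes)                   # each column keeps its own data type

# Alignment is by label, not by position: 'b' has no bonus, so it becomes NaN.
bonus = pd.Series([1.0, 2.0], index=["c", "a"])
adjusted = people["score"] + bonus
print(adjusted)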

# Question 4
# Explain the use of the `.groupby()` method in Pandas.

# Answer:
# The `.groupby()` method in Pandas is used to group rows in a DataFrame based on one or more columns. It allows you to split the DataFrame into groups, apply a function to each group independently, and then combine the results back into a data structure.
# The `.groupby()` method typically follows a "split-apply-combine" strategy:
# 1. Split: The DataFrame is split into groups based on the values in the specified column(s).
# 2. Apply: A function (e.g., aggregation, transformation, filtering) is applied to each of these groups.
# 3. Combine: The results from applying the function to each group are combined into a new DataFrame or Series.
# `.groupby()` is a fundamental tool for performing aggregate analysis and transformations on subsets of data within a DataFrame.
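# Example (a sketch of split-apply-combine on a made-up sales table; column names are assumptions):
import pandas as pd

sales = pd.DataFrame({"region": ["north", "south", "north", "south"],
                      "amount": [100, 150, 200, 50]})

# Aggregation: one value per group.
totals = sales.groupby("region")["amount"].sum()
print(totals)

# Transformation: a result aligned with the original rows (each row's share of its region total).
share = sales["amount"] / sales.groupby("region")["amount"].transform("sum")
print(share)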

# Question 5
# Why is Seaborn preferred for statistical visualizations?

# Answer:
# Seaborn is a Python data visualization library built on top of Matplotlib. It is often preferred for statistical visualizations because:
# - High-level interface: Seaborn provides a high-level interface with intuitive functions for creating informative and attractive statistical graphics.
# - Statistical plots: It specializes in statistical visualizations, offering functions for comparing distributions, visualizing relationships between variables, and analyzing categorical data (e.g., box plots, violin plots, scatter plots with regression lines, heatmaps).
# - Integration with Pandas: Seaborn works seamlessly with Pandas DataFrames, making it easy to visualize data directly from these structures.
# - Aesthetics: Seaborn provides attractive default styles and color palettes, making it easier to create visually appealing plots with less code.
# - Complex visualizations: It simplifies the creation of complex statistical plots that would require more effort in Matplotlib.

# Question 6
# What are the differences between NumPy arrays and Python lists?

# Answer:
# The key differences between NumPy arrays and Python lists are:
# - Data type: NumPy arrays have a fixed data type for all elements, which allows for efficient storage and operations. Python lists can contain elements of different data types.
# - Performance: Operations on NumPy arrays are generally much faster than equivalent operations on Python lists, especially for numerical computations on large datasets, due to vectorized operations and C implementation.
# - Functionality: NumPy provides a wide range of built-in functions optimized for numerical operations (linear algebra, statistics, etc.), which are not directly available for Python lists.
# - Memory usage: NumPy arrays are often more memory-efficient for storing large amounts of numerical data due to their fixed data type and contiguous memory allocation.
# - Broadcasting: NumPy arrays support broadcasting, which allows operations on arrays with different shapes under certain conditions, a feature not available in Python lists.
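# Example (a short sketch contrasting the two; the values are arbitrary):
import numpy as np

values_list = [1, 2, 3]
values_array = np.array([1, 2, 3])

doubled_list = values_list * 2     # list repetition: [1, 2, 3, 1, 2, 3]
doubled_array = values_array * 2   # element-wise arithmetic: [2, 4, 6]
print(doubled_list, doubled_array)

mixed = [1, "two", 3.0]            # a list may mix types
print(values_array.dtype)          # an array has a single dtype for all elements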

# Question 7
# What is a heatmap, and when should it be used?

# Answer:
# A heatmap is a graphical representation of data where values are encoded by color. Typically, a two-dimensional array of data is visualized, with each cell colored according to its value. A color scale is used to map data values to colors.
# Heatmaps are useful in the following situations:
# - Correlation analysis: Visualizing the correlation matrix between different variables in a dataset.
# - Missing value analysis: Identifying patterns of missing data in a dataset.
# - Feature importance: Displaying the importance of different features in a machine learning model.
# - Clustering results: Showing the relationships between data points or features after clustering.
# - Time series data: Representing patterns in data over time across different categories.
# - Any situation where you need to visualize the magnitude of values across two dimensions.

# Question 8
# What does the term "vectorized operation" mean in NumPy?

# Answer:
# A vectorized operation in NumPy refers to performing an operation on an entire array (or a subset of it) at once, element-wise, without the need for explicit loops in Python. NumPy's underlying implementation in C allows these operations to be executed very efficiently.
# Benefits of vectorized operations:
# - Speed: Vectorized operations are significantly faster than using Python loops because they leverage optimized C code and can often utilize Single Instruction, Multiple Data (SIMD) capabilities of the processor.
# - Conciseness: Vectorized code is typically much shorter and easier to read than equivalent code using explicit loops.
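# - Lower overhead: They avoid the per-element interpreter work of a Python loop (bytecode dispatch, type checks, and boxing each value as a Python object), although chained expressions can still allocate temporary arrays.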
# Most arithmetic operations, comparisons, and mathematical functions in NumPy are vectorized.
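# Example (a sketch comparing an explicit loop with the vectorized form; the temperature values are made up):
import numpy as np

celsius = np.array([0.0, 25.0, 100.0])

# Explicit Python loop: one interpreted iteration per element.
fahrenheit_loop = np.empty_like(celsius)
for i in range(len(celsius)):
    fahrenheit_loop[i] = celsius[i] * 9 / 5 + 32

# Vectorized: the same arithmetic applied to the whole array at once.
fahrenheit_vec = celsius * 9 / 5 + 32
print(fahrenheit_vec)
print(np.array_equal(fahrenheit_loop, fahrenheit_vec))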

# Question 9
# How does Matplotlib differ from Plotly?

# Answer:
# Matplotlib and Plotly are both Python libraries for creating visualizations, but they have key differences:
# - Interactivity: Plotly primarily creates interactive plots that can be zoomed, panned, and hovered over, often with tooltips providing more information. Matplotlib primarily creates static plots by default.
# - Output: Matplotlib typically generates static image files (e.g., PNG, JPG, SVG) or displays plots in a GUI window. Plotly can generate interactive HTML files that can be viewed in web browsers or embedded in web applications.
# - Ease of use for complex plots: Plotly often provides a higher-level API for creating complex interactive visualizations with less code compared to achieving similar results in Matplotlib.
# - Aesthetics: Plotly often has more modern and aesthetically pleasing default styles compared to Matplotlib's defaults, although Matplotlib is highly customizable.
# - Use cases: Matplotlib is widely used for creating publication-quality static plots and is a foundational library for other visualization libraries. Plotly is often preferred for creating interactive dashboards, web-based visualizations, and exploring data dynamically.

# Question 10
# What is the significance of hierarchical indexing in Pandas?

# Answer:
# Hierarchical indexing (also known as MultiIndex) in Pandas allows you to have multiple levels of row or column labels within a DataFrame or Series. This is significant because it enables you to:
# - Work with higher-dimensional data in a two-dimensional structure: You can represent data with more than two dimensions by using multiple index levels.
# - Perform complex data analysis: It facilitates operations like grouping, pivoting, and unstacking data based on different levels of the index.
# - More expressive data manipulation: It provides a more intuitive way to select, slice, and reshape subsets of your data based on multiple criteria.
# - Represent panel data: It's particularly useful for working with panel data, which has dimensions like entities, time, and variables.
# Hierarchical indexing provides a powerful way to structure and analyze complex datasets within the Pandas framework.
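# Example (a sketch with a two-level index; the level names "year" and "quarter" and the numbers are invented):
import pandas as pd

index = pd.MultiIndex.from_product([["2023", "2024"], ["Q1", "Q2"]],
                                   names=["year", "quarter"])
revenue = pd.Series([10, 12, 14, 16], index=index)

print(revenue.loc["2024"])                  # select by the outer level
print(revenue.loc[("2023", "Q2")])          # select one cell with a tuple of labels
wide = revenue.unstack("quarter")           # move the inner level into columns
yearly = revenue.groupby(level="year").sum()  # aggregate over one level
print(wide)
print(yearly)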

# Question 11
# What is the role of the `.apply()` function in Pandas?

# Answer:
# The `.apply()` function in Pandas is used to apply a given function along an axis of a DataFrame or Series. It allows you to perform custom operations that are not necessarily available as built-in vectorized functions.
# Key aspects of `.apply()`:
# - Flexibility: You can apply any Python function (built-in or user-defined) to the data.
# - Axis-wise operation: You can choose to apply the function to each column (`axis=0`) or each row (`axis=1`) of a DataFrame, or to all values of a Series.
# - Function input: The function receives a Series (one column or one row) when applied to a DataFrame, or a single scalar value when applied element-wise to a Series.
# While `.apply()` offers flexibility, it's often less performant than vectorized operations or optimized Pandas functions for common tasks, as it may involve Python-level iteration.
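# Example (a sketch of column-wise and row-wise `.apply()`; the score table is made up):
import pandas as pd

scores = pd.DataFrame({"math": [90, 60], "art": [70, 80]}, index=["ana", "ben"])

# axis=0: the function receives each column as a Series.
col_range = scores.apply(lambda col: col.max() - col.min(), axis=0)

# axis=1: the function receives each row as a Series.
row_best = scores.apply(lambda row: row.idxmax(), axis=1)

# The column range also has a vectorized form, which is usually preferable.
col_range_vec = scores.max() - scores.min()
print(col_range, row_best, col_range_vec, sep="\n")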

# Question 12
# What is the purpose of the `.describe()` function in Pandas?

# Answer:
# The `.describe()` function in Pandas is used to generate descriptive statistics of a DataFrame or Series. By default, it calculates and returns statistics for numerical columns, including:
# - count: The number of non-missing values.
# - mean: The average value.
# - std: The standard deviation.
# - min: The minimum value.
# - 25%: The 25th percentile (first quartile).
# - 50%: The 50th percentile (median or second quartile).
# - 75%: The 75th percentile (third quartile).
# - max: The maximum value.
# For categorical or object type columns, `.describe()` can provide different statistics like count, unique (number of unique values), top (most frequent value), and freq (frequency of the top value).
# `.describe()` is a quick and useful way to get a summary of the distribution and central tendency of the data in your DataFrame.
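# Example (a sketch on a tiny made-up DataFrame with one numeric and one text column):
import pandas as pd

df_desc = pd.DataFrame({"height": [150.0, 160.0, 170.0, 180.0],
                        "team": ["x", "y", "x", "x"]})

numeric_summary = df_desc.describe()        # numeric columns only by default
text_summary = df_desc["team"].describe()   # count, unique, top, freq
print(numeric_summary)
print(text_summary)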

# Question 13
# Why is handling missing data important in Pandas?

# Answer:
# Handling missing data is crucial in Pandas (and data analysis in general) because:
# - Impact on analysis: Missing values can lead to incorrect or biased results in statistical analyses and machine learning models.
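# - Errors in computations: NaN propagates through most NumPy computations (for example, `np.mean` returns `nan` if any element is NaN), and some libraries raise errors on missing values. Pandas reductions skip NaN by default, which can silently change the effective sample size.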
# - Data quality: The presence of missing data can affect the overall quality and reliability of the dataset.
# - Representation of reality: Missing data might indicate a significant aspect of the data collection process or the underlying phenomenon being studied. Ignoring or improperly handling it can lead to a distorted understanding.
# Pandas represents missing values as `NaN` (Not a Number) and provides tools for identifying (`.isna()`), filling (`.fillna()`), or removing (`.dropna()`) them, allowing for more robust and accurate data analysis.
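# Example (a sketch of detecting, filling, and dropping missing values; the readings are invented, and mean imputation is only one possible choice):
import numpy as np
import pandas as pd

readings = pd.DataFrame({"temp": [20.5, np.nan, 22.0],
                         "city": ["Oslo", "Rome", None]})

missing_counts = readings.isna().sum()                       # missing values per column
filled = readings["temp"].fillna(readings["temp"].mean())    # replace NaN with the column mean
complete_rows = readings.dropna()                            # keep only rows with no missing values
print(missing_counts, filled, complete_rows, sep="\n")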

# Question 14
# What are the benefits of using Plotly for data visualization?

# Answer:
# The benefits of using Plotly for data visualization include:
# - Interactivity: Creates interactive plots that allow users to explore data through zooming, panning, hovering, and tooltips.
# - Web-based output: Generates interactive HTML files that can be easily shared and embedded in web applications or dashboards.
# - Wide range of plot types: Supports a variety of basic and advanced plot types, including statistical, scientific, financial, and geographical visualizations.
# - Customization: Offers extensive options for customizing the appearance and behavior of plots.
# - Integration: Works well with Pandas DataFrames and other scientific Python libraries.
# - Dashboards: Plotly Dash framework allows building interactive analytical web applications and dashboards.
# - Aesthetics: Often provides visually appealing default styles and color schemes.

# Question 15
# How does NumPy handle multidimensional arrays?

# Answer:
# NumPy's core data structure is the `ndarray` (n-dimensional array), which can efficiently store and operate on multidimensional arrays. NumPy handles these arrays by:
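# - Contiguous memory allocation: A newly created NumPy array stores its elements in a single contiguous block of memory, which allows for fast access and vectorized operations. Views such as transposes or stepped slices reuse that block and may not be contiguous themselves.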
# - Shape and strides: Each array has a `shape` attribute (a tuple indicating the size of each dimension) and `strides` (a tuple of bytes to step in each dimension when traversing the array). These attributes allow NumPy to efficiently access and manipulate subarrays without copying data.
# - Broadcasting: As mentioned earlier, NumPy's broadcasting rules enable element-wise operations between arrays with compatible shapes.
# - Optimized functions: NumPy provides a vast library of optimized functions that operate element-wise or along specific axes of multidimensional arrays, leveraging efficient C implementations.
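# Example (a sketch of shape, strides, and axis-wise reduction; the dtype is fixed to int64 so the stride values are predictable):
import numpy as np

grid = np.arange(12, dtype=np.int64).reshape(3, 4)
print(grid.shape)      # (3, 4)
print(grid.strides)    # (32, 8): 32 bytes to the next row, 8 bytes to the next column

# Transposing swaps shape and strides; no data is copied.
transposed = grid.T
print(transposed.strides, np.shares_memory(grid, transposed))

sum_per_column = grid.sum(axis=0)   # collapse the row axis
sum_per_row = grid.sum(axis=1)      # collapse the column axis
print(sum_per_column, sum_per_row)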

# Question 16
# What is the role of tokens in data visualization? (This question seems out of context for the "Data Toolkit" context focusing on NumPy and Pandas. It might be related to a specific visualization library or concept not directly covered by these tools.)

# Answer:
# In the context of data visualization, the term "tokens" is not a standard or widely used term within libraries like NumPy, Pandas, Matplotlib, or Plotly in the way that "arrays," "DataFrames," or "plots" are. It's possible this term refers to a specific concept within a more specialized visualization tool or framework.

# If "tokens" here refers to individual data points or visual marks that represent data, then their role is fundamental:
# - Representation: Tokens visually encode data values through their properties (e.g., position, color, size, shape).
# - Communication: They allow viewers to perceive patterns, trends, and relationships within the data.
# - Interaction: In interactive visualizations, tokens can be targets for user interactions (e.g., hovering to see details).

# Without more context, a more specific answer is difficult.

# Question 17
# Explain the difference between `.apply()` and `.map()` in Pandas.

# Answer:
# Both `.apply()` and `.map()` in Pandas are used for applying a function to data, but they operate at different levels:
# - `.map()`: This method is used on a Pandas Series to apply a function element-wise. The function should take a single value as input and return a single value. It can also accept a dictionary or a Series to perform element-wise mapping based on values.
# - `.apply()`: This method can be used on both Pandas Series and DataFrames.
#   - On a Series, `.apply()` also applies a function element-wise, similar to `.map()`, but it can handle more complex functions that might return a Series or other array-like structures.
#   - On a DataFrame, `.apply()` applies a function along an axis (either rows or columns). The function receives a Series (representing a row or a column) as input and can return a scalar, a Series, or a DataFrame.
# In summary:
# - Use `.map()` for simple element-wise transformations on a Series.
# - Use `.apply()` on a Series for more complex element-wise transformations or when the function needs to return a Series.
# - Use `.apply()` on a DataFrame to apply a function row-wise or column-wise.
# For simple element-wise operations on a Series, `.map()` is the more direct choice; when a built-in vectorized operation exists, it is usually faster than either method.
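# Example (a sketch of the three cases summarized above; the size labels and measurements are made up):
import pandas as pd

sizes = pd.Series(["S", "M", "L", "XL"])

# .map() with a dictionary: values without a key become NaN.
size_cm = sizes.map({"S": 90, "M": 100, "L": 110})

# .map() and Series.apply() with a function: both work element-wise.
lengths = sizes.map(len)
lowered = sizes.apply(str.lower)

# DataFrame.apply() with axis=1: the function receives one row at a time.
frame = pd.DataFrame({"w": [2, 3], "h": [5, 7]})
areas = frame.apply(lambda row: row["w"] * row["h"], axis=1)
print(size_cm, lengths, lowered, areas, sep="\n")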

# Question 18
# What are some advanced features of NumPy?

# Answer:
# Some advanced features of NumPy include:
# - Broadcasting: Enables operations on arrays with different shapes.
# - Masking: Allows selective operations on array elements based on a boolean mask.
# - Fancy indexing: Enables accessing array elements using integer arrays or boolean arrays for indices, allowing for non-contiguous and complex selections.
# - Structured arrays: Allow creating arrays with elements that are tuples or records, with named fields and different data types for each field.
# - Memory views: Provide a way to access the memory of a NumPy array without copying the data, which can be useful for working with large arrays and interfacing with other libraries.
# - Universal functions (ufuncs): Vectorized functions that operate element-wise on arrays and support broadcasting, type casting, and other advanced features.
# - Linear algebra module (`numpy.linalg`): Provides a comprehensive set of functions for linear algebra operations.
# - Fourier transform module (`numpy.fft`): Offers functions for performing Fast Fourier Transforms.
# - Random number generation (`numpy.random`): Provides tools for generating various types of random numbers.
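# Example (a sketch of masking, fancy indexing, and a structured array; the values and field names are invented):
import numpy as np

samples = np.array([5, -2, 9, -7, 3])

positives = samples[samples > 0]     # boolean mask keeps elements where the condition holds
picked = samples[[0, 2, 4]]          # fancy indexing with a list of positions
picked[0] = 100                      # fancy indexing returns a copy, so samples is unchanged

# Structured array: named fields with different data types.
records = np.array([("Ana", 31), ("Ben", 25)],
                   dtype=[("name", "U10"), ("age", "i4")])
mean_age = records["age"].mean()
print(positives, picked, samples, mean_age)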

# Question 19
# How does Pandas simplify time series analysis?

# Answer:
# Pandas provides several powerful features that simplify time series analysis:
# - Dedicated data structures: The `Timestamp` object represents a single point in time, and the `DatetimeIndex` is a sequence of Timestamps, optimized for time series data. Pandas also has `Timedelta` for representing durations.
# - Time series indexing: You can easily index and select data based on dates and times using the `DatetimeIndex`.
# - Time series slicing: Powerful and intuitive ways to slice data based on date ranges.
# - Resampling: The `.resample()` method allows you to change the frequency of your time series data (e.g., from daily to monthly, or vice versa) and perform aggregations.
# - Time zone handling: Pandas provides tools for working with time zones.
# - Shifting and lagging: Functions to shift the time series forward or backward in time.
# - Rolling window operations: Methods like `.rolling()` allow you to calculate statistics (e.g., mean, sum) over a sliding window of time.
# - Date and time parsing: Functions like `pd.to_datetime()` can convert various string formats into datetime objects.
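# Example (a sketch of date slicing, resampling, rolling windows, and shifting on a made-up daily series):
import pandas as pd

days = pd.date_range("2024-01-01", periods=6, freq="D")
daily = pd.Series([1, 2, 3, 4, 5, 6], index=days)

window = daily.loc["2024-01-02":"2024-01-04"]   # label-based date slice, both ends included
two_day_sum = daily.resample("2D").sum()        # downsample to 2-day bins and aggregate
rolling_mean = daily.rolling(window=3).mean()   # mean over a sliding 3-observation window
lagged = daily.shift(1)                         # move values one step forward in time
print(window, two_day_sum, rolling_mean, lagged, sep="\n")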


# Question 20
# What is the role of the pivot table in Pandas?

# Answer:
# A pivot table in Pandas is a powerful tool for reshaping and summarizing data. It allows you to transform a DataFrame into a new table-like structure where one or more columns become the new index, another column's unique values become the new column headers, and the remaining values are aggregated (e.g., sum, mean, count) based on these new row and column indices.
# The role of the pivot table is to:
# - Summarize data: Aggregate data based on different categories.
# - Reshape data: Transform data from a "long" format to a "wide" format, making it easier to compare values across different groups.
# - Analyze relationships: Help identify patterns and relationships between different variables in the data.
# - Create reports: Generate concise and informative summaries of complex datasets.
# The `pd.pivot_table()` function in Pandas provides a flexible way to create these pivot tables.
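# Example (a sketch that reshapes a made-up "long" orders table into a "wide" summary; column names are assumptions):
import pandas as pd

orders = pd.DataFrame({
    "region": ["north", "north", "south", "south", "north"],
    "product": ["pen", "ink", "pen", "ink", "pen"],
    "units": [3, 4, 5, 6, 7],
})

# Rows become regions, columns become products, and duplicate cells are summed.
pivot = pd.pivot_table(orders, index="region", columns="product",
                       values="units", aggfunc="sum")
print(pivot)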

# Question 21
# Why is NumPy array slicing faster than Python's list slicing?

# Answer:
# NumPy array slicing is generally faster than Python's list slicing due to the underlying implementation and the nature of NumPy arrays:
# - Contiguous memory: NumPy arrays store their elements in a contiguous block of memory. This allows for efficient access to slices as it often involves just a change in the "view" of the underlying data without copying.
# - No element copying: Python list slicing is also implemented in C, but it allocates a new list and copies one reference per element, so its cost grows with the length of the slice. A basic NumPy slice only creates a new array object that points at the same buffer, so its cost does not depend on the slice length.
# - Vectorized operations: NumPy's design allows for vectorized operations on entire slices, further enhancing performance.
# - Type homogeneity: NumPy arrays have a fixed data type for all elements, so a slice can be described by an offset, a shape, and strides alone. Python lists hold references to separately allocated objects of possibly different types, and each copied reference must also have its reference count updated.
# While Python list slicing creates a new list object with copied elements, NumPy slicing often creates a view of the original array, which is much more memory and time-efficient, especially for large datasets.
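# Example (a sketch showing that a list slice is a copy while a basic NumPy slice is a view; the values are arbitrary):
import numpy as np

base_list = [0, 1, 2, 3, 4]
list_slice = base_list[1:4]
list_slice[0] = 99                  # changes only the copy
print(base_list)

base_array = np.arange(5)
array_slice = base_array[1:4]
array_slice[0] = 99                 # writes through to base_array
print(base_array)

# Use .copy() when an independent array is needed.
independent = base_array[1:4].copy()
print(np.shares_memory(base_array, array_slice), np.shares_memory(base_array, independent))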

# Question 22
# What are some common use cases for Seaborn?

# Answer:
# Seaborn is widely used for creating various types of statistical visualizations. Some common use cases include:
# - Comparing distributions: Visualizing distributions of single variables (e.g., histograms, KDE plots, rug plots) and comparing distributions across different groups (e.g., box plots, violin plots).
# - Examining relationships between variables: Creating scatter plots, joint plots (scatter plots with marginal distributions), and pair plots (pairwise relationships between multiple variables).
# - Analyzing categorical data: Visualizing the distribution and relationships of categorical variables (e.g., count plots, bar plots, point plots).
# - Visualizing correlations: Creating heatmaps to display correlation matrices between variables.
# - Time series analysis: Visualizing trends and patterns in time series data.
# - Regression analysis: Plotting regression lines and confidence intervals to understand the relationship between variables.
# - Multivariate analysis: Creating more complex visualizations to explore relationships among multiple variables.
# - Enhancing Matplotlib plots: Seaborn can be used to improve the aesthetics of Matplotlib plots with its default styles and color palettes.

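# Practical

# Imports used by the practical answers below
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px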

# Question 1
# How do you create a 2D NumPy array and calculate the sum of each row?

# Answer:
# You can create a 2D NumPy array using np.array() and then use np.sum() with axis=1 to calculate the sum of each row.

array_2d = np.array([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]])

row_sums = np.sum(array_2d, axis=1)
print("2D NumPy Array:\n", array_2d)
print("Sum of each row:", row_sums)

# Question 2
# Write a Pandas script to find the mean of a specific column in a DataFrame.

# Answer:
# First, create a Pandas DataFrame. Then, select the desired column and use the .mean() method.

data = {'col1': [10, 20, 30, 40],
        'col2': [1.1, 2.2, 3.3, 4.4],
        'col3': ['a', 'b', 'c', 'd']}
df = pd.DataFrame(data)

column_name = 'col2'
mean_of_column = df[column_name].mean()
print("DataFrame:\n", df)
print(f"Mean of column '{column_name}': {mean_of_column}")

# Question 3
# Create a scatter plot using Matplotlib.

# Answer:
# Use plt.scatter() to create a scatter plot, providing the x and y coordinates. You can also add labels and a title.

x = np.array([1, 5, 3, 7, 2])
y = np.array([4, 2, 6, 1, 8])

plt.figure(figsize=(8, 6))
plt.scatter(x, y, color='blue', marker='o', label='Data Points')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot using Matplotlib')
plt.legend()
plt.grid(True)
plt.show()

# Question 4
# How do you calculate the correlation matrix using Seaborn and visualize it with a heatmap?

# Answer:
# Seaborn does not compute correlations itself. First, create a Pandas DataFrame, calculate the correlation matrix with its .corr() method (Pearson by default), and then visualize the result with sns.heatmap().

data = {'A': [1, 2, 3, 4, 5],
        'B': [5, 4, 3, 2, 1],
        'C': [1, 3, 2, 5, 4]}
df_corr = pd.DataFrame(data)

correlation_matrix = df_corr.corr()

plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=.5)
plt.title('Correlation Heatmap using Seaborn')
plt.show()

# Question 5
# Generate a bar plot using Plotly.

# Answer:
# Use plotly.express.bar() to create an interactive bar plot, providing the data and specifying the x and y axes.

data_bar = {'Category': ['A', 'B', 'C', 'D'],
            'Value': [20, 35, 15, 45]}
df_bar = pd.DataFrame(data_bar)

fig_bar = px.bar(df_bar, x='Category', y='Value', title='Bar Plot using Plotly')
fig_bar.show()

# Question 6
# Create a DataFrame and add a new column based on an existing column.

# Answer:
# Create a DataFrame and then create a new column by performing an operation on an existing column.

data_new_col = {'Price': [10, 20, 30, 40]}
df_new_col = pd.DataFrame(data_new_col)

df_new_col['Tax'] = df_new_col['Price'] * 0.10  # Add a 10% tax column
print("DataFrame with new column:\n", df_new_col)

# Question 7
# Write a program to perform element-wise multiplication of two NumPy arrays.

# Answer:
# Create two NumPy arrays with compatible shapes and use the '*' operator for element-wise multiplication.

array1 = np.array([1, 2, 3, 4])
array2 = np.array([5, 6, 7, 8])

element_wise_product = array1 * array2
print("Array 1:", array1)
print("Array 2:", array2)
print("Element-wise product:", element_wise_product)

# Question 8
# Create a line plot with multiple lines using Matplotlib.

# Answer:
# Use plt.plot() multiple times to plot different lines on the same axes.

x_line = np.array([1, 2, 3, 4, 5])
y_line1 = np.array([2, 4, 1, 5, 3])
y_line2 = np.array([1, 3, 5, 2, 4])

plt.figure(figsize=(8, 6))
plt.plot(x_line, y_line1, marker='o', linestyle='-', color='green', label='Line 1')
plt.plot(x_line, y_line2, marker='x', linestyle='--', color='red', label='Line 2')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Line Plot with Multiple Lines using Matplotlib')
plt.legend()
plt.grid(True)
plt.show()

# Question 9
# Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold.

# Answer:
# Create a DataFrame and then use boolean indexing to filter rows based on a condition.

data_filter = {'ID': [1, 2, 3, 4, 5],
               'Value': [10, 25, 15, 30, 20]}
df_filter = pd.DataFrame(data_filter)

threshold = 20
filtered_df = df_filter[df_filter['Value'] > threshold]
print("Original DataFrame:\n", df_filter)
print(f"DataFrame where 'Value' > {threshold}:\n", filtered_df)

# Question 10
# Create a histogram using Seaborn to visualize a distribution.

# Answer:
# Use sns.histplot() to create a histogram.

data_hist = np.random.normal(loc=50, scale=15, size=100)

plt.figure(figsize=(8, 6))
sns.histplot(data_hist, bins=10, kde=True, color='skyblue')
plt.title('Histogram using Seaborn')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

# Question 11
# Perform matrix multiplication using NumPy.

# Answer:
# Create two NumPy arrays representing matrices and use np.dot() or the '@' operator for matrix multiplication.

matrix_a = np.array([[1, 2],
                     [3, 4]])
matrix_b = np.array([[5, 6],
                     [7, 8]])

matrix_product = np.dot(matrix_a, matrix_b)
# Alternatively: matrix_product = matrix_a @ matrix_b

print("Matrix A:\n", matrix_a)
print("Matrix B:\n", matrix_b)
print("Matrix Product:\n", matrix_product)

# Question 12
# Use Pandas to load a CSV file and display its first 5 rows.

# Answer:
# Use pd.read_csv() to load the CSV file and then the .head(5) method to display the first 5 rows.

# 'data.csv' is a placeholder file name. A small sample file is written first so that the example runs on its own;
# skip this step if you already have a CSV file to load.
data_csv = {'col1': [1, 2, 3, 4, 5],
            'col2': [6, 7, 8, 9, 10]}
df_csv = pd.DataFrame(data_csv)
df_csv.to_csv('data.csv', index=False)

try:
    df_loaded = pd.read_csv('data.csv')
    print("First 5 rows of the loaded CSV file:\n", df_loaded.head())
except FileNotFoundError:
    print("Error: 'data.csv' not found.")


# Question 13
# Create a 3D scatter plot using Plotly.

# Answer:
# Use plotly.express.scatter_3d() to create an interactive 3D scatter plot, providing the data and specifying the x, y, and z columns.

data_3d = {'X': [1, 2, 3, 4, 5],
           'Y': [5, 4, 3, 2, 1],
           'Z': [2, 3, 1, 4, 5],
           'Color': ['red', 'blue', 'green', 'yellow', 'purple']}
df_3d = pd.DataFrame(data_3d)

fig_3d = px.scatter_3d(df_3d, x='X', y='Y', z='Z', color='Color', title='3D Scatter Plot using Plotly')
fig_3d.show()




