# THEORY QUESTIONS

1. What is NumPy, and why is it widely used in Python?

- NumPy (Numerical Python) is a Python library for numerical computing. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on them. It is widely used because it offers fast computations due to optimized C and Fortran implementations and supports vectorized operations.

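2. How does broadcasting work in NumPy?

- Broadcasting allows NumPy to perform arithmetic on arrays of different shapes. Shapes are compared from the trailing dimension backward, and two dimensions are compatible when they are equal or one of them is 1. A dimension of size 1 is then stretched virtually, without copying data, to match the other array. This eliminates the need for explicit looping.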
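
A minimal sketch of broadcasting, using made-up values: a column of shape (3, 1) and a row of shape (4,) combine into a (3, 4) result with no explicit loop. The variable names are illustrative only.

```python
import numpy as np

# Illustrative values only; these names do not come from the notes above.
column = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])           # shape (4,)

# Shapes are aligned from the trailing axis: (3, 1) with (4,) gives (3, 4).
total = column + row
print(total.shape)
print(total)

# Incompatible shapes raise an error rather than being stretched silently.
try:
    np.array([1, 2, 3]) + np.array([1, 2])
except ValueError as exc:
    print("cannot broadcast:", exc)
```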

3. What is a Pandas DataFrame?

- A Pandas DataFrame is a two-dimensional, tabular data structure with labeled rows and columns, similar to an Excel spreadsheet or SQL table. It is widely used for data manipulation and analysis.

4. Explain the use of the groupby() method in Pandas.

- The groupby() method in Pandas is used to split a dataset into groups based on one or more columns and then apply an aggregate function (e.g., sum, mean, count) to each group. The per-group results are combined into a new object.
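
To make the DataFrame and groupby() answers above concrete, here is a small sketch. The column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical sales data; the column names are assumptions for this example.
df = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "sales": [100, 150, 200, 50],
})

print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # each column carries its own dtype

# Split the rows by region, sum the sales in each group, combine the results.
totals = df.groupby("region")["sales"].sum()
print(totals)
```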

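5. Why is Seaborn preferred for statistical visualizations?

- Seaborn is preferred because it is built on Matplotlib, works directly with Pandas DataFrames, and produces polished plots with little code. Many of its functions also perform statistical estimation, such as aggregation with confidence intervals, kernel density estimates, and regression fits. It simplifies complex plots like heatmaps, pair plots, and violin plots.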

6. What are the differences between NumPy arrays and Python lists?

- A NumPy array stores elements of a single data type (dtype), whereas a Python list can hold objects of different types.
- NumPy arrays offer better performance and memory efficiency because elements are stored compactly in a typed buffer and processed by optimized C implementations.
- Operations on NumPy arrays are faster because of vectorization.
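
The differences can be seen in a short sketch with sample values chosen for illustration:

```python
import numpy as np

mixed_list = [1, "two", 3.0]     # a list may mix types freely
numbers = np.array([1, 2, 3])    # an array stores a single dtype
print(numbers.dtype.kind)        # 'i' (a signed integer dtype)

# Mixing ints and floats upcasts the whole array to float.
upcast = np.array([1, 2.5, 3])
print(upcast.dtype)

# '*' repeats a list but multiplies an array element by element.
doubled_list = [1, 2, 3] * 2
doubled_array = numbers * 2
print(doubled_list)
print(doubled_array)
```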

7. What is a heatmap, and when should it be used?

- A heatmap is a graphical representation of data where values are represented by color gradients. It is commonly used to visualize correlation matrices and detect patterns in large datasets.

8. What does the term “vectorized operation” mean in NumPy?

- A vectorized operation applies to an entire array at once, without an explicit Python loop. The looping happens inside compiled NumPy code, which makes computations significantly faster than iterating element by element in Python.
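
As a rough illustration, the loop and the vectorized expression below compute the same squares. The array contents are arbitrary sample values.

```python
import numpy as np

values = np.arange(1, 6)   # array([1, 2, 3, 4, 5])

# Explicit Python loop: one interpreter step per element.
squares_loop = np.empty_like(values)
for i in range(values.size):
    squares_loop[i] = values[i] ** 2

# Vectorized form: a single expression over the whole array.
squares_vec = values ** 2

print(squares_vec)
print(np.array_equal(squares_loop, squares_vec))
```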

9. How does Matplotlib differ from Plotly?

- Matplotlib produces mainly static figures with fine-grained control, which suits traditional and publication-style plots.
- Plotly produces interactive figures, allowing zooming, panning, and dynamic visualizations.

10. What is the significance of hierarchical indexing in Pandas?

- Hierarchical indexing allows multiple levels of index labels, making it easier to work with multi-dimensional data in a tabular format.
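
A small sketch of a two-level index, assuming hypothetical (city, year) labels and made-up values:

```python
import pandas as pd

# Hypothetical two-level index: (city, year). The numbers are made up.
index = pd.MultiIndex.from_tuples(
    [("Paris", 2020), ("Paris", 2021), ("Rome", 2020), ("Rome", 2021)],
    names=["city", "year"],
)
visitors = pd.Series([10, 12, 7, 9], index=index)

print(visitors.loc["Paris"])          # all rows under one outer label
print(visitors.loc[("Rome", 2021)])   # a single value selected by the full key
print(visitors.unstack("year"))       # the inner level becomes columns
```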

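11. What is the role of Seaborn’s pairplot() function?

- pairplot() draws a grid of plots for the numerical columns of a dataset: a scatter plot for each pair of variables and, on the diagonal, the distribution of each variable. This helps to visualize relationships between multiple variables at once.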

12. What is the purpose of the describe() function in Pandas?

- It provides summary statistics for numerical columns by default: count, mean, standard deviation, min, the 25th, 50th, and 75th percentiles, and max.
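
For example, on a tiny DataFrame with an invented column name:

```python
import pandas as pd

# A single made-up numeric column.
df = pd.DataFrame({"score": [1, 2, 3, 4, 5]})

summary = df.describe()
print(summary)

# Individual statistics can be read from the result by row label.
print(summary.loc["mean", "score"])
print(summary.loc["50%", "score"])
```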

13. Why is handling missing data important in Pandas?

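- Missing values (represented as NaN, NaT, or None) can distort statistics and break downstream computations if they go unnoticed. Pandas provides isna() to detect them, dropna() to remove them, and fillna() to replace them, which keeps the analysis accurate and the data consistent.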
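
A short sketch of the usual options (counting, dropping, or filling missing values), on made-up data:

```python
import numpy as np
import pandas as pd

# Two made-up columns, each with one missing value.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

missing_per_column = df.isna().sum()   # count of NaN in each column
dropped = df.dropna()                  # keep only rows with no NaN
filled = df.fillna(df.mean())          # replace NaN with each column's mean

print(missing_per_column)
print(dropped)
print(filled)
```

Which option fits depends on the dataset: dropping rows loses information, while filling introduces values that were never observed.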

14. What are the benefits of using Plotly for data visualization?

- Interactive and visually appealing charts
- Support for web-based dashboards (for example, through Dash)
- Easy integration with Python and JavaScript

15. How does NumPy handle multidimensional arrays?

- NumPy represents multidimensional arrays with the ndarray type. Each dimension is called an axis, the shape attribute gives the length of every axis, and ndim gives the number of axes.
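
A brief illustration with an arbitrary 3-axis array:

```python
import numpy as np

grid = np.arange(24).reshape(2, 3, 4)   # three axes of lengths 2, 3 and 4

print(grid.ndim)    # number of axes
print(grid.shape)   # length of each axis

# Reducing over axis 0 removes that axis and leaves shape (3, 4).
collapsed = grid.sum(axis=0)
print(collapsed.shape)

# Indexing takes one index per axis.
print(grid[1, 2, 3])
```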

16. What is the role of Bokeh in data visualization?

- Bokeh is used to create interactive web-based visualizations and dashboards.

17. Explain the difference between apply() and map() in Pandas.
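- apply(): Available on both DataFrames and Series. On a DataFrame it passes each column (axis=0) or each row (axis=1) to the function.
- map(): Available on Series. It transforms each element using a function, a dict, or another Series.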
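
The contrast can be sketched as follows, with invented column names and values:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# map(): element by element on a single Series.
squared = df["a"].map(lambda x: x ** 2)

# map() also accepts a dict, used as a lookup table.
labels = df["a"].map({1: "one", 2: "two", 3: "three"})

# apply() on a DataFrame passes whole columns (axis=0) or whole rows (axis=1).
column_ranges = df.apply(lambda col: col.max() - col.min())
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

print(squared.tolist(), labels.tolist())
print(column_ranges.tolist(), row_sums.tolist())
```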

18. What are some advanced features of NumPy?
- Broadcasting
- Linear algebra functions
- Fourier transforms
- Random number generation
- Universal functions (ufuncs)
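
A compact sketch touching several of these features. The inputs are arbitrary, and each call is only one of many functions in its module.

```python
import numpy as np

# Linear algebra: solve the system A @ x = b.
A = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(A, b)

# Fourier transform: a constant signal has all of its energy in bin 0.
spectrum = np.fft.fft(np.ones(4))

# Random numbers: a seeded Generator gives reproducible draws.
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=3)

# Universal function: np.sqrt acts on every element.
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))

print(x, spectrum, samples, roots)
```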

19. How does Pandas simplify time series analysis?
- Pandas has built-in support for date-time indexing, resampling, and rolling window calculations, making time series analysis easier.
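
The three features named above can be sketched on a made-up daily series:

```python
import pandas as pd

# Six made-up daily readings starting on 2024-01-01.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
readings = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Resampling: group into 2-day bins and sum each bin.
two_day = readings.resample("2D").sum()

# Rolling window: mean over 3 observations (the first two results are NaN).
rolling = readings.rolling(window=3).mean()

# Date-time indexing: date strings slice the index, inclusive of both ends.
first_three = readings.loc["2024-01-01":"2024-01-03"]

print(two_day)
print(rolling)
print(first_three)
```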

20. What is the role of a pivot table in Pandas?
- A pivot table summarizes and reorganizes data, allowing aggregation like sum, mean, and count.
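
A minimal example, assuming hypothetical region, product, and sales columns:

```python
import pandas as pd

# Hypothetical sales records; column names are assumptions for this example.
df = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "product": ["A", "B", "A", "A"],
    "sales": [10, 20, 30, 40],
})

# Rows become regions, columns become products, cells hold summed sales.
table = pd.pivot_table(
    df, index="region", columns="product", values="sales",
    aggfunc="sum", fill_value=0,
)
print(table)
```

Here fill_value=0 fills the combinations that have no rows, such as product B in the south.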

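21. Why is NumPy’s array slicing faster than Python’s list slicing?
- Basic slicing of a NumPy array returns a view: a new array object that points at the same memory buffer, so no data is copied. Slicing a Python list builds a new list and copies a reference to every selected element.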
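
The no-copy behaviour of array slices can be checked directly. This sketch uses small sample values.

```python
import numpy as np

arr = np.arange(5)
view = arr[1:4]    # basic slice: a view onto the same buffer
view[0] = 99       # writes through to the original array
print(arr)
print(np.shares_memory(arr, view))

items = [0, 1, 2, 3, 4]
piece = items[1:4]   # list slice: a new list
piece[0] = 99        # the original list is unchanged
print(items)
```

Because a view shares memory with its source, call .copy() on the slice when an independent array is needed.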

22. What are some common use cases for Seaborn?
- Correlation heatmaps
- Pair plots
- Distribution plots
- Categorical plots

# PRACTICAL QUESTIONS

1. How do you create a 2D NumPy array and calculate the sum of each row?
-     import numpy as np
      arr = np.array([[1, 2, 3], [4, 5, 6]])
      row_sums = np.sum(arr, axis=1)  # axis=1 sums across the columns of each row
      print(row_sums)  # [ 6 15]

2. Write a Pandas script to find the mean of a specific column in a DataFrame.
-     import pandas as pd
      df = pd.DataFrame({'A': [1, 2, 3, 4]})
      print(df['A'].mean())  # 2.5

3. Create a scatter plot using Matplotlib.
-     import matplotlib.pyplot as plt
      x = [1, 2, 3, 4]
      y = [10, 20, 25, 30]
      plt.scatter(x, y)
      plt.show()

4. How do you calculate the correlation matrix using Seaborn and visualize it with a heatmap?
-     import seaborn as sns
      import numpy as np
      import pandas as pd
      import matplotlib.pyplot as plt

      data = np.random.rand(10, 10)  # random sample data
      df = pd.DataFrame(data)
      corr = df.corr()  # Pandas computes the correlation matrix; Seaborn only draws it

      sns.heatmap(corr, annot=True, cmap='coolwarm')
      plt.show()

5. Generate a bar plot using Plotly.
-     import plotly.express as px
      df = px.data.gapminder().query("year == 2007")
      fig = px.bar(df, x="continent", y="pop")
      fig.show()

6. Create a DataFrame and add a new column based on an existing column.
-     import pandas as pd
      df = pd.DataFrame({'A': [1, 2, 3, 4]})
      df['B'] = df['A'] * 2  # new column derived from column 'A'
      print(df)

7. Write a program to perform element-wise multiplication of two NumPy arrays.
-     import numpy as np
      arr1 = np.array([1, 2, 3])
      arr2 = np.array([4, 5, 6])
      result = arr1 * arr2  # '*' multiplies element by element
      print(result)  # [ 4 10 18]

8. Create a line plot with multiple lines using Matplotlib.
-     import matplotlib.pyplot as plt
      plt.plot([1, 2, 3], [4, 5, 6], label="Line 1")
      plt.plot([1, 2, 3], [7, 8, 9], label="Line 2")
      plt.legend()
      plt.show()

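9. Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold.
-     import pandas as pd
      df = pd.DataFrame({'A': [1, 2, 3, 4]})
      df_filtered = df[df['A'] > 2]  # boolean mask keeps rows where A exceeds 2
      print(df_filtered)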

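10. Create a histogram using Seaborn to visualize a distribution.
-     import seaborn as sns
      import matplotlib.pyplot as plt
      sns.histplot(x=[1, 2, 2, 3, 3, 3, 4, 4, 5])  # sample values; a DataFrame column such as df['A'] also works
      plt.show()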

11. Perform matrix multiplication using NumPy.
-     import numpy as np
      mat1 = np.array([[1, 2], [3, 4]])
      mat2 = np.array([[5, 6], [7, 8]])
      result = mat1 @ mat2  # matrix product; same result as np.dot for 2D arrays
      print(result)

12. Use Pandas to load a CSV file and display its first 5 rows.
-     import pandas as pd
      df = pd.read_csv("file.csv")
      print(df.head())  # head() returns the first 5 rows by default

13. Create a 3D scatter plot using Plotly.
-     import plotly.graph_objects as go
      fig = go.Figure(data=[go.Scatter3d(x=[1, 2, 3], y=[4, 5, 6], z=[7, 8, 9], mode='markers')])
      fig.show()