# Data Toolkit

1. What is NumPy, and why is it widely used in Python?
    >>>NumPy (Numerical Python) is a powerful Python library used for numerical computing. It provides an extensive collection of tools to handle large, multi-dimensional arrays and matrices, along with a vast array of mathematical functions to perform operations on these arrays efficiently.

2. How does broadcasting work in NumPy?
   >>>Broadcasting lets NumPy apply element-wise operations to arrays of different but compatible shapes without explicit reshaping or copying. Shapes are compared from the trailing dimension backward; two dimensions are compatible when they are equal or when one of them is 1, and the size-1 dimension is stretched virtually to match the other. Shapes that fail this rule raise a ValueError.
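
A minimal sketch of broadcasting, assuming two small illustrative arrays (`col` and `row` are hypothetical names, not from the original notes). It shows a column and a row being combined into a full grid, and a pair of shapes that cannot be broadcast:

```python
import numpy as np

# A (3, 1) column and a (1, 4) row are stretched to a common (3, 4) shape.
col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([[1, 2, 3, 4]])      # shape (1, 4)
table = col + row                   # shape (3, 4)

print(table.shape)
print(table)

# Shapes whose trailing dimensions differ (and are not 1) are rejected.
try:
    np.ones((3, 2)) + np.ones((3, 4))
except ValueError as exc:
    print("Not broadcastable:", exc)
```

No data is copied for the stretched dimensions; NumPy only reuses the existing values while iterating.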

3. What is a Pandas DataFrame?
   >>>A Pandas DataFrame is a two-dimensional, labeled data structure in the Pandas library, which is widely used for data manipulation and analysis in Python. It can be thought of as a table similar to a spreadsheet, a SQL table, or a dictionary of Series objects.

4. Explain the use of the groupby() method in Pandas?
   >>>The groupby() method in Pandas is a powerful tool used for splitting data into groups, applying operations to each group, and then combining the results. It is widely used for data aggregation, transformation, and analysis.
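
The split-apply-combine idea can be sketched as follows, assuming a small made-up DataFrame with hypothetical `team` and `points` columns:

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "points": [10, 20, 5, 15, 10],
})

# Split rows by team, sum each group, and combine into one Series.
totals = df.groupby("team")["points"].sum()
print(totals)

# transform() returns a result aligned with the original rows,
# which is handy for adding a per-group statistic as a new column.
df["team_mean"] = df.groupby("team")["points"].transform("mean")
print(df)
```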



5. Why is Seaborn preferred for statistical visualizations?
   >>>Seaborn is a Python data visualization library built on top of Matplotlib that is specifically designed for creating statistical visualizations. It is widely preferred for such tasks due to its ease of use, aesthetic appeal, and its ability to handle complex datasets effortlessly.

6. What are the differences between NumPy arrays and Python lists?
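   >>>NumPy arrays and Python lists both store collections of data, but they differ in functionality, performance, and use cases. A NumPy array holds elements of a single dtype, typically in one contiguous block of memory, and supports element-wise arithmetic. A list holds references to arbitrary Python objects, can mix types, and can grow or shrink in place.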
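
A short illustration of two of these differences, using made-up values (the variable names are only for this sketch):

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3])

# The * operator repeats a list but multiplies an array element-wise.
repeated = py_list * 2
scaled = np_arr * 2
print(repeated)
print(scaled)

# A list can hold mixed types; an array converts everything to one dtype.
mixed = [1, "two", 3.0]
upcast = np.array([1, 2.5])
print(upcast.dtype)  # float64
```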



7. What is a heatmap, and when should it be used?
   >>>A heatmap is a data visualization technique that uses color to represent the magnitude of values in a matrix or table, with higher and lower values mapped to different intensities or hues of a color scale. It is most useful when you need to spot patterns in a dense numeric grid, such as a correlation matrix or a confusion matrix.



8. What does the term “vectorized operation” mean in NumPy?
   >>>The term “vectorized operation” in NumPy refers to performing element-wise operations on entire arrays without the need for explicit loops in Python. This is made possible by NumPy’s highly optimized, low-level C implementations that operate on the entire array at once.
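
As a small sketch, the same computation is written below once with an explicit Python loop and once as a vectorized expression (the array contents are illustrative):

```python
import numpy as np

values = np.arange(1, 6)  # [1 2 3 4 5]

# Explicit Python loop: one interpreter step per element.
squares_loop = np.empty_like(values)
for i in range(len(values)):
    squares_loop[i] = values[i] ** 2

# Vectorized: a single expression; the loop runs in compiled code.
squares_vec = values ** 2

print(squares_vec)
print(np.array_equal(squares_loop, squares_vec))
```

Both versions give the same result; the vectorized form is shorter and, on large arrays, usually much faster.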



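9. How does Matplotlib differ from Plotly?
    >>>Matplotlib and Plotly are two popular Python libraries for data visualization. Matplotlib produces static figures by default and gives fine-grained control over every plot element, which suits publication-quality images. Plotly produces interactive figures rendered in the browser, with hover tooltips, zooming, and panning available out of the box.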

10. What is the significance of hierarchical indexing in Pandas?
    >>>Hierarchical indexing, also known as MultiIndexing, is a powerful feature in Pandas that allows you to use multiple levels of indexing on a DataFrame or Series. It is particularly useful for working with data that has a natural structure with multiple dimensions, such as time series data, geographic data, or data grouped by multiple categories.
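
A minimal sketch of a two-level index, assuming hypothetical `year` and `quarter` levels and made-up sales figures:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2023", "Q1"), ("2023", "Q2"), ("2024", "Q1"), ("2024", "Q2")],
    names=["year", "quarter"],
)
sales = pd.Series([100, 120, 130, 150], index=index, name="sales")

# Selecting an outer label returns every row beneath it.
print(sales.loc["2023"])

# A full key (one label per level) returns a single value.
print(sales.loc[("2024", "Q2")])

# Aggregate across the inner level by grouping on the outer one.
yearly = sales.groupby(level="year").sum()
print(yearly)
```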

11. What is the role of Seaborn’s pairplot() function?
    >>>The pairplot() function in Seaborn creates a matrix of scatter plots to visualize relationships between multiple variables in a dataset. It is particularly useful for exploring pairwise interactions between numerical features. Each off-diagonal subplot shows a scatter plot for one pair of features, and the diagonal shows a univariate plot (such as a histogram or KDE plot) for each individual feature.

12. What is the purpose of the describe() function in Pandas?
    >>>The describe() function in Pandas provides a quick summary of the statistics for the numerical columns of a DataFrame or Series. It is commonly used to gain an overview of the distribution and key statistical properties of the data.
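
A quick sketch, assuming a tiny DataFrame with one numeric and one text column (both hypothetical). By default only the numeric column appears in the summary:

```python
import pandas as pd

df = pd.DataFrame({"age": [20, 30, 40, 50], "name": ["a", "b", "c", "d"]})

# Rows of the result: count, mean, std, min, 25%, 50%, 75%, max.
summary = df.describe()
print(summary)

# Individual statistics can be read back by label.
print(summary.loc["mean", "age"])
```

Passing `include="all"` to `describe()` also summarizes non-numeric columns.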

13. Why is handling missing data important in Pandas?
    >>>Handling missing data in Pandas is crucial because missing or null values can have a significant impact on the quality of your analysis and the performance of machine learning models.
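
The usual first steps (detect, drop, or fill) can be sketched like this, assuming a small made-up DataFrame; the fill values chosen here are only one possible strategy:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "score": [90.0, np.nan, 70.0],
    "city": ["Pune", None, "Delhi"],
})

# Count missing values per column.
missing_counts = df.isna().sum()
print(missing_counts)

# Option 1: drop every row that contains a missing value.
dropped = df.dropna()

# Option 2: fill each column with a chosen replacement.
filled = df.fillna({"score": df["score"].mean(), "city": "unknown"})
print(filled)
```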

14. What are the benefits of using Plotly for data visualization?
    >>>Plotly is a versatile and robust tool for creating interactive and high-quality data visualizations. Its ease of use, flexibility, support for a wide variety of plot types, and integration with other tools make it an excellent choice for both exploratory data analysis and professional presentations. It is particularly valuable when you need to make your visualizations interactive or build web-based dashboards and applications.

15. How does NumPy handle multidimensional arrays?
    >>>NumPy provides powerful support for multidimensional arrays, enabling efficient storage and manipulation of large datasets with complex structures. It handles multidimensional arrays using its primary object called ndarray (short for "n-dimensional array").
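
A brief sketch of a three-dimensional ndarray, using an illustrative array named `cube`:

```python
import numpy as np

# 24 consecutive integers arranged as 2 blocks of 3 rows and 4 columns.
cube = np.arange(24).reshape(2, 3, 4)

print(cube.ndim)   # number of dimensions
print(cube.shape)  # size along each dimension

# One index per axis selects a single element.
last = cube[1, 2, 3]
print(last)

# Reducing along axis 0 collapses the two blocks into one (3, 4) array.
block_sum = cube.sum(axis=0)
print(block_sum.shape)
```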

16. What is the role of Bokeh in data visualization?
    >>>Bokeh is an interactive data visualization library for Python that is specifically designed to create dynamic, visually appealing plots for modern web browsers. It focuses on creating interactive and web-friendly visualizations, making it a powerful tool for data scientists, analysts, and developers working on dashboards, reports, or applications.

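17. Explain the difference between apply() and map() in Pandas?
   >>>In Pandas, both apply() and map() are used to apply functions to data, but they differ in scope. Series.map() works element by element on a single Series and also accepts a dictionary or another Series as a lookup table. apply() works on a Series or a DataFrame; on a DataFrame, the function receives a whole column or a whole row at a time, depending on the axis argument.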
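
A minimal comparison, assuming a small DataFrame with hypothetical columns `a` and `b`:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Series.map: element-wise on one Series; accepts a function or a dict.
labels = df["a"].map({1: "one", 2: "two", 3: "three"})
doubled = df["a"].map(lambda v: v * 2)

# DataFrame.apply: the function receives a whole column (default axis=0)...
col_ranges = df.apply(lambda col: col.max() - col.min())

# ...or a whole row when axis=1.
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

print(labels.tolist(), doubled.tolist())
print(col_ranges)
print(row_sums.tolist())
```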

18. What are some advanced features of NumPy?
    >>>NumPy offers many features beyond basic array manipulation, including broadcasting, boolean and fancy indexing, structured arrays, memory-mapped files, linear algebra (np.linalg), Fourier transforms (np.fft), and random number generation (np.random). These features make NumPy essential for a wide range of applications, from scientific computing to machine learning.

19. How does Pandas simplify time series analysis?
    >>>Pandas simplifies time series analysis by providing powerful tools and features that make it easy to work with time-based data. These features enable efficient manipulation, analysis, and visualization of time series data, which is often used in domains such as finance, economics, and weather forecasting.
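
A few of these tools can be sketched on a short daily series (the dates and values are made up for illustration):

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Resample the daily data into 2-day bins and sum each bin.
two_day = ts.resample("2D").sum()
print(two_day)

# 3-day rolling mean; the first two entries are NaN (window not yet full).
rolling = ts.rolling(window=3).mean()
print(rolling)

# Date strings can be used directly for label-based slicing (inclusive).
first_three = ts.loc["2024-01-01":"2024-01-03"]
print(first_three)
```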

20. What is the role of a pivot table in Pandas?
    >>>A pivot table in Pandas is a powerful tool for summarizing and aggregating data, often used to transform data into a more useful format for analysis and reporting. It allows you to reshape the data by grouping it based on one or more columns and applying aggregate functions (such as sum, mean, count, etc.) to the grouped data.
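
A minimal sketch, assuming a made-up sales table with hypothetical `region`, `product`, and `sales` columns:

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "product": ["A", "B", "A", "A", "B"],
    "sales": [100, 150, 200, 50, 300],
})

# One row per region, one column per product, summed sales in each cell.
table = pd.pivot_table(
    df, values="sales", index="region", columns="product", aggfunc="sum"
)
print(table)
```

The two South/A rows are combined into a single cell by the aggregate function.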

21. Why is NumPy’s array slicing faster than Python’s list slicing?
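    >>>NumPy's basic array slicing is faster than Python's list slicing mainly because it returns a view onto the existing data buffer rather than a copy. Slicing a list builds a new list and copies a reference to every selected element, so its cost grows with the slice length, while a NumPy slice only records a new offset, shape, and strides.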
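
The view-versus-copy behaviour behind this difference can be observed directly; the sketch below uses illustrative names and values:

```python
import numpy as np

arr = np.arange(10)
lst = list(range(10))

arr_slice = arr[2:5]  # a view: no element data is copied
lst_slice = lst[2:5]  # a new list holding copied references

# Writing through the slices shows the difference.
arr_slice[0] = 99
lst_slice[0] = 99

print(arr[2])  # 99, the original array changed
print(lst[2])  # 2, the original list did not
print(np.shares_memory(arr, arr_slice))  # True
```

Call `.copy()` on a NumPy slice when an independent array is needed.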

22. What are some common use cases for Seaborn?
    >>>Seaborn is a powerful and flexible Python visualization library built on top of Matplotlib, designed to make it easier to generate informative and aesthetically pleasing statistical plots. It provides high-level functions that simplify the process of creating complex visualizations, making it a go-to tool for many data scientists and analysts.

# Practical

1. How do you create a 2D NumPy array and calculate the sum of each row?

In [None]:
import numpy as np
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
row_sums = np.sum(arr, axis=1)

# Output the result
print("Original Array:")
print(arr)

print("\nSum of each row:")
print(row_sums)


2. Write a Pandas script to find the mean of a specific column in a DataFrame?

In [None]:
import pandas as pd
data = {
    'Name': ['Ram', 'Shyam', 'Vivek', 'Abhay', 'Ankit'],
    'Age': [23, 30, 35, 40, 25],
    'Salary': [70000, 80000, 120000, 95000, 85000]
}
df = pd.DataFrame(data)
mean_salary = df['Salary'].mean()
print("Mean Salary:", mean_salary)


3. Create a scatter plot using Matplotlib?


In [None]:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.scatter(x, y)
plt.show()

4. How do you calculate the correlation matrix using Seaborn and visualize it with a heatmap?

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
data = {
    'A': [1, 2, 3, 4, 5],
    'B': [5, 4, 3, 2, 1],
    'C': [2, 3, 4, 5, 6],
    'D': [5, 3, 2, 4, 1]
}
df = pd.DataFrame(data)
corr_matrix = df.corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title('Correlation Matrix Heatmap')
plt.show()


5. Generate a bar plot using Plotly?

In [None]:
import plotly.graph_objects as go
categories = ['Category A', 'Category B', 'Category C', 'Category D']
values = [10, 20, 30, 40]
fig = go.Figure(data=[
    go.Bar(x=categories, y=values, marker_color='orange', text=values, textposition='auto')
])

fig.update_layout(
    title='Example Bar Plot',
    xaxis_title='Categories',
    yaxis_title='Values',
    template='plotly_white'
)

fig.show()


6. Create a DataFrame and add a new column based on an existing column?

In [None]:
import pandas as pd
data = {
    'Name': ['Ram', 'Shyam', 'Vivek', 'Akhilesh'],
    'Age': [24, 27, 22, 32]
}
df = pd.DataFrame(data)

df['Category'] = df['Age'].apply(lambda age: 'Young' if age < 25 else 'Adult')

print(df)


7. Write a program to perform element-wise multiplication of two NumPy arrays?

In [None]:
import numpy as np
array1 = np.array([1, 2, 3, 4])
array2 = np.array([5, 6, 7, 8])
result = array1 * array2
print("Array 1:", array1)
print("Array 2:", array2)
print("Result of element-wise multiplication:", result)


8. Create a line plot with multiple lines using Matplotlib?

In [None]:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]
plt.plot(x, y1, label='Line 1')
y2 = [3, 6, 9, 12, 15]
plt.plot(x, y2, label='Line 2')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Multiple Line Plot')
plt.legend()
plt.show()

9. Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold?

In [None]:
import pandas as pd
data = {
    'Name': ['Ram', 'Shyam', 'Vivek', 'Akhilesh'],
    'Age': [24, 27, 22, 32]
}
df = pd.DataFrame(data)
threshold_age = 25
filtered_df = df[df['Age'] > threshold_age]
print(filtered_df)


10. Create a histogram using Seaborn to visualize a distribution?


In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
sns.histplot(data, kde=True)
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Histogram')
plt.show()

11. Perform matrix multiplication using NumPy?

In [None]:
import numpy as np
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])
result = np.dot(matrix1, matrix2)
print("Matrix 1:")
print(matrix1)
print("\nMatrix 2:")
print(matrix2)
print("\nResult of Matrix Multiplication:")
print(result)

12. Use Pandas to load a CSV file and display its first 5 rows?

In [None]:
import pandas as pd
file_path = 'data.csv'
df = pd.read_csv(file_path)
print(df.head())

13. Create a 3D scatter plot using Plotly?

In [None]:
import plotly.graph_objects as go
import numpy as np
x = np.random.rand(100)
y = np.random.rand(100)
z = np.random.rand(100)