# Data Toolkit Theory Questions:

1. What is NumPy, and why is it widely used in Python?

   -> NumPy is a powerful library for numerical computing in Python. It provides the ndarray, a multi-dimensional array type, along with a large collection of high-level mathematical functions that operate on whole arrays. It's widely used because these operations run in compiled code over compact, typed memory, which makes them much faster than equivalent pure-Python loops, and because it is the foundation that libraries such as Pandas, SciPy, and scikit-learn build on.


2. How does broadcasting work in NumPy?

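   -> Broadcasting lets NumPy perform element-wise operations on arrays of different shapes without explicitly copying data. NumPy compares the two shapes starting from the trailing (rightmost) dimension; two dimensions are compatible when they are equal or when one of them is 1, and a missing leading dimension counts as 1. Dimensions of size 1 are then virtually stretched to match the other array. If any pair of dimensions is incompatible, NumPy raises a ValueError.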
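A minimal sketch of the rule described above; the array values are arbitrary and chosen only to make the resulting shapes easy to follow.

```python
import numpy as np

# A (3, 1) column and a (4,) row; the row is treated as shape (1, 4).
col = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3, 4])       # shape (4,)

# Each size-1 dimension is stretched virtually, so the result has shape (3, 4).
grid = col + row
print(grid.shape)  # (3, 4)
print(grid)

# A scalar is the simplest case: it broadcasts against any shape.
shifted = row + 100
print(shifted)  # [101 102 103 104]

# Trailing dimensions 3 and 4 are neither equal nor 1, so this fails.
try:
    np.ones((3,)) + np.ones((4,))
except ValueError as err:
    print("Cannot broadcast:", err)
```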


3. What is a Pandas DataFrame?

 -> A DataFrame is a two-dimensional, size-mutable, and heterogeneous tabular data structure with labeled axes (rows and columns) in Pandas.
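A short illustration of the three properties named above (labeled axes, mixed column types, and size mutability); the names and numbers are made up.

```python
import pandas as pd

# Each column can have its own dtype; rows and columns both carry labels.
df = pd.DataFrame(
    {"name": ["Ana", "Ben", "Cai"], "age": [31, 25, 40], "score": [88.5, 92.0, 79.5]},
    index=["r1", "r2", "r3"],
)
print(df.dtypes)  # name: object, age: int64, score: float64

# Label-based access with .loc[row_label, column_label].
print(df.loc["r2", "age"])  # 25

# Size-mutable: a new column can be added after creation.
df["passed"] = df["score"] >= 80
print(df.shape)  # (3, 4)
```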


4. Explain the use of the groupby() method in Pandas.

   -> The groupby() method splits data into groups based on a key or column, applies a function to each group (like sum or mean), and combines the results. It’s used for aggregation and transformation tasks.
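A minimal split-apply-combine sketch, assuming a small made-up sales table. The first two calls aggregate (one row per group); transform() instead returns a result aligned with the original rows.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "units": [10, 4, 6, 8, 2],
})

# Split by region, apply sum to each group, combine into one Series.
totals = sales.groupby("region")["units"].sum()
print(totals)  # East 18, West 12

# Several aggregations at once.
summary = sales.groupby("region")["units"].agg(["sum", "mean"])
print(summary)

# Transformation: each row's share of its region's total.
sales["region_share"] = sales["units"] / sales.groupby("region")["units"].transform("sum")
print(sales)
```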

5. Why is Seaborn preferred for statistical visualizations?

   -> Seaborn provides a high-level interface for drawing attractive and informative statistical graphics. It integrates well with Pandas and simplifies complex visualizations like heatmaps, violin plots, and pair plots.


6. What are the differences between NumPy arrays and Python lists?


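*   NumPy arrays are more memory-efficient and faster: elements are stored as raw values in one contiguous, fixed-type buffer, whereas a list stores pointers to separate Python objects.
*   They support vectorized operations, so arithmetic applies to every element at once without an explicit Python loop.
*   All elements of an array share a single data type (dtype), and mixed inputs are converted to a common type; a list can hold objects of any type.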
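A quick comparison of the points above. The memory figures assume a typical 64-bit build; sys.getsizeof() reports only the list object and its pointer array, not the int objects it points to.

```python
import sys
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

# On an array, * 2 multiplies each element; on a list, it repeats the list.
print(arr * 2)     # [2 4 6]
print(values * 2)  # [1, 2, 3, 1, 2, 3]

# One dtype per array: mixing ints and a float upcasts everything to float.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# 1,000 int64 values occupy exactly 8,000 bytes of array data.
big = np.arange(1_000, dtype=np.int64)
print(big.nbytes)  # 8000
# The list's own size (pointer array only, excluding the int objects).
print(sys.getsizeof(list(range(1_000))))
```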





7. What is a heatmap, and when should it be used?

 -> A heatmap is a data visualization technique that uses color to represent the values of a matrix. It's useful for identifying patterns, correlations, or missing values in datasets.


8. What does the term “vectorized operation” mean in NumPy?

 -> It refers to operations that are applied element-wise to whole arrays without explicit Python loops; the looping happens inside NumPy's compiled routines, which makes computations faster and the code more concise.
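A small sketch contrasting an explicit loop with the equivalent vectorized expression; both compute x² + 1 for each element and give the same result.

```python
import numpy as np

x = np.arange(1, 6)  # [1 2 3 4 5]

# Explicit Python loop: one interpreted iteration per element.
loop_result = np.empty_like(x)
for i in range(len(x)):
    loop_result[i] = x[i] ** 2 + 1

# Vectorized form: the same element-wise computation in a single expression.
vec_result = x ** 2 + 1

print(vec_result)                               # [ 2  5 10 17 26]
print(np.array_equal(loop_result, vec_result))  # True
```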


9. How does Matplotlib differ from Plotly?



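*   Matplotlib mainly produces static figures (for example PNG, PDF, or SVG) and offers fine-grained control over every plot element, which suits publication-quality plots.
*   Plotly produces interactive, browser-based plots (zoom, pan, hover tooltips), which are better suited to dashboards and web apps.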



10. What is the significance of hierarchical indexing in Pandas?

 -> Hierarchical indexing (a MultiIndex) allows multiple index levels on the rows or columns. This lets higher-dimensional or nested data, such as values by city and by year, be represented in an ordinary two-dimensional DataFrame and selected level by level.
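A minimal MultiIndex sketch, assuming made-up temperature data indexed by city and year. The tuples are listed in sorted order so that partial indexing on the outer level is efficient.

```python
import pandas as pd

# Two index levels: city (outer) and year (inner).
index = pd.MultiIndex.from_tuples(
    [("Oslo", 2022), ("Oslo", 2023), ("Paris", 2022), ("Paris", 2023)],
    names=["city", "year"],
)
temps = pd.DataFrame({"avg_temp": [6.0, 6.4, 12.1, 12.8]}, index=index)

# Selecting on the outer level returns every inner row for that city.
print(temps.loc["Oslo"])

# A full tuple selects a single row.
print(temps.loc[("Paris", 2023), "avg_temp"])  # 12.8

# unstack() moves the inner level into columns: a city x year table.
wide = temps["avg_temp"].unstack("year")
print(wide)
```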


11. What is the role of Seaborn’s pairplot() function?

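 -> pairplot() draws a grid of plots showing the pairwise relationships between the numerical variables in a dataset: scatter plots for each pair of variables and, on the diagonal, the distribution of each variable. It is often used for exploratory data analysis.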


12. What is the purpose of the describe() function in Pandas?

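 -> It generates summary statistics (count, mean, standard deviation, min, the 25th/50th/75th percentiles, and max) for the numerical columns of a DataFrame. Passing include="all" also summarizes non-numeric columns.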
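A short example with a made-up price column and one text column, showing that the default call summarizes only the numeric column.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 12.5, 9.0, 15.5], "item": ["a", "b", "a", "c"]})

# Default: numeric columns only (count, mean, std, min, 25%, 50%, 75%, max).
stats = df.describe()
print(stats)

# include="all" adds count, unique, top and freq for the text column.
print(df.describe(include="all"))
```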


13. Why is handling missing data important in Pandas?

 -> Missing data can skew analysis and lead to incorrect conclusions. Pandas offers functions to detect (isna()), remove (dropna()), or impute (fillna()) missing values to ensure data integrity.
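A minimal sketch of the detect / remove / impute steps, assuming a small made-up table. Mean imputation is used here only as one simple option.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

# Detect: count missing values per column.
print(df.isna().sum())

# Remove: drop every row that contains at least one missing value.
dropped = df.dropna()
print(dropped)

# Impute: replace each missing value with its column's mean.
filled = df.fillna(df.mean())
print(filled)
```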


14. What are the benefits of using Plotly for data visualization?


*   Interactive plots
*   Easy integration with web applications
*   Support for a wide variety of chart types





15. How does NumPy handle multidimensional arrays?

 -> NumPy uses the ndarray object to store arrays of any dimension. It provides tools like reshape(), transpose(), and axis operations to manipulate these arrays.
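A short example of the tools named above, using np.arange(12) as arbitrary sample data.

```python
import numpy as np

a = np.arange(12)          # 1D, shape (12,)
m = a.reshape(3, 4)        # 2D, shape (3, 4)
t = m.transpose()          # 2D, shape (4, 3)
cube = a.reshape(2, 3, 2)  # 3D, shape (2, 3, 2)

print(m.ndim, m.shape)     # 2 (3, 4)
print(t.shape)             # (4, 3)
print(cube.shape)          # (2, 3, 2)

# Axis operations: axis=0 reduces down the rows, axis=1 across the columns.
print(m.sum(axis=0))       # [12 15 18 21]
print(m.sum(axis=1))       # [ 6 22 38]
```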


16. What is the role of Bokeh in data visualization?

 -> Bokeh is used for interactive web-based visualizations. It enables users to create dynamic plots that respond to UI elements like sliders or dropdowns.


17. Explain the difference between apply() and map() in Pandas.


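*   map() is used for element-wise transformations on a Series; it accepts a function, a dict, or another Series (values with no match in a dict become NaN).
*   apply() works on both Series and DataFrames: on a Series it calls the function on each element, and on a DataFrame it passes each column (axis=0, the default) or each row (axis=1) to the function.
*   Since pandas 2.1, DataFrame.map() applies a function element-wise to a whole DataFrame (older versions call this applymap()).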
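A small comparison of the two methods, with made-up data: map() transforms each value of a Series, while apply() on a DataFrame hands whole columns or rows to the function.

```python
import pandas as pd

s = pd.Series(["cat", "dog", "cat"])

# map(): element-wise on a Series, with a dict or a function.
print(s.map({"cat": "feline", "dog": "canine"}).tolist())  # ['feline', 'canine', 'feline']
print(s.map(len).tolist())                                  # [3, 3, 3]

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# apply() on a DataFrame: the function receives a whole column (axis=0)...
col_ranges = df.apply(lambda col: col.max() - col.min())
# ...or a whole row (axis=1).
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)
print(col_ranges.tolist())  # [2, 20]
print(row_sums.tolist())    # [11, 22, 33]
```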




18. What are some advanced features of NumPy?


*   Broadcasting
*   Masked arrays
*   Linear algebra operations
*   Random sampling
*   Fourier transforms
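A compact tour of four of the features listed above (broadcasting is illustrated under question 2). The numbers, the -999 sentinel, and the seed are arbitrary choices for illustration.

```python
import numpy as np

# Masked arrays: exclude flagged entries (here a -999 sentinel) from computations.
readings = np.ma.masked_equal(np.array([4.0, -999.0, 6.0]), -999.0)
print(readings.mean())  # 5.0

# Linear algebra: solve 2x + y = 5 and x + 3y = 10.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
solution = np.linalg.solve(A, b)  # x = 1, y = 3
print(solution)

# Random sampling with a seeded Generator, so results are reproducible.
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=3)
print(samples)

# Fourier transform: a constant signal has all its energy in the zero-frequency bin.
spectrum = np.fft.fft(np.ones(4))
print(spectrum)
```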



19. How does Pandas simplify time series analysis?

 -> Pandas provides datetime indexing, resampling, rolling statistics, and time zone handling to simplify time series data processing.
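A minimal sketch of these features on fourteen made-up daily readings starting on Monday 2024-01-01. The "W" alias groups into weeks ending on Sunday; the time-zone step assumes the usual time-zone data that ships with pandas' dependencies is available.

```python
import pandas as pd

# Daily readings indexed by timestamps (a DatetimeIndex).
idx = pd.date_range("2024-01-01", periods=14, freq="D")
readings = pd.Series(range(14), index=idx, dtype="float64")

# Resampling: aggregate daily values into weekly means.
weekly = readings.resample("W").mean()
print(weekly)  # 2024-01-07 -> 3.0, 2024-01-14 -> 10.0

# Rolling statistics: 3-day moving average (the first two values are NaN).
rolling = readings.rolling(window=3).mean()

# Date-based selection with partial string indexing (both ends inclusive).
first_week = readings.loc["2024-01-01":"2024-01-07"]

# Time zones: mark the naive timestamps as UTC, then convert.
berlin = readings.tz_localize("UTC").tz_convert("Europe/Berlin")
print(berlin.index[0])  # 2024-01-01 01:00:00+01:00
```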


20. What is the role of a pivot table in Pandas?

 -> Pivot tables summarize data, allowing you to aggregate and rearrange it based on categories (rows/columns), similar to Excel pivot tables.
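A short pivot_table() sketch with made-up sales rows: regions become the rows, products the columns, and each cell holds the summed units.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "East"],
    "product": ["A", "B", "A", "B", "A"],
    "units": [5, 3, 2, 7, 4],
})

# Rows = region, columns = product, cells = total units per combination.
table = sales.pivot_table(index="region", columns="product", values="units", aggfunc="sum")
print(table)
```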


21. Why is NumPy’s array slicing faster than Python’s list slicing?

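 -> NumPy arrays are stored in contiguous, typed memory blocks, and basic slicing (start:stop:step) returns a view: a new array object that points into the same buffer with adjusted offsets and strides, so no elements are copied. Python lists are arrays of pointers, and slicing a list builds a new list that copies a reference to every selected element, so its cost grows with the length of the slice.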
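A small demonstration of the view-versus-copy difference described above; np.shares_memory() checks whether two arrays use the same buffer.

```python
import numpy as np

arr = np.arange(10)
view = arr[2:6]                     # basic slice: a view, no element copy
print(np.shares_memory(arr, view))  # True

view[0] = 99                        # writing through the view changes the original
print(arr[2])                       # 99

lst = list(range(10))
lst_slice = lst[2:6]                # a new list with copied references
lst_slice[0] = 99
print(lst[2])                       # 2 (the original list is unaffected)

# Use .copy() when an independent array is needed.
independent = arr[2:6].copy()
print(np.shares_memory(arr, independent))  # False
```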


22. What are some common use cases for Seaborn?


*   Visualizing distributions (histograms, KDEs)
*   Plotting categorical data (box plots, bar plots)
*   Correlation heatmaps
*   Pairwise relationship analysis




# Data Toolkit Practical Questions:

1. How do you create a 2D NumPy array and calculate the sum of each row?

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
row_sums = np.sum(arr, axis=1)  # axis=1 sums across the columns of each row
print("Row sums:", row_sums)
```

2. Write a Pandas script to find the mean of a specific column in a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]})
mean_val = df['A'].mean()  # select the column by label, then average it
print("Mean of column A:", mean_val)
```

3. Create a scatter plot using Matplotlib.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
plt.scatter(x, y)  # one marker per (x, y) pair
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot")
plt.show()
```

4. How do you calculate the correlation matrix using Seaborn and visualize it with a heatmap?

```python
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
# pandas computes the correlation matrix; Seaborn only draws it.
# These columns are perfectly linearly related, so every coefficient is 1.0.
correlation = df.corr()
# Fixing vmin/vmax to [-1, 1] keeps the color scale meaningful for correlations.
sns.heatmap(correlation, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
plt.title("Correlation Matrix Heatmap")
plt.show()
```

5. Generate a bar plot using Plotly.

```python
import pandas as pd
import plotly.express as px

data = {'Fruit': ['Apple', 'Banana', 'Cherry'], 'Count': [10, 15, 7]}
df = pd.DataFrame(data)
fig = px.bar(df, x='Fruit', y='Count', title='Fruit Count')  # one bar per fruit
fig.show()
```

6. Create a DataFrame and add a new column based on an existing column.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})
df['B'] = df['A'] * 2  # vectorized: doubles every value in column A
print(df)
```

7. Write a program to perform element-wise multiplication of two NumPy arrays.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = a * b  # multiplies matching positions: [4 10 18]
print("Element-wise multiplication:", result)
```

8. Create a line plot with multiple lines using Matplotlib.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3]
y1 = [1, 4, 9]
y2 = [2, 5, 10]
plt.plot(x, y1, label='Line 1')  # each plot() call adds one line to the same axes
plt.plot(x, y2, label='Line 2')
plt.legend()
plt.title("Multiple Lines")
plt.show()
```

9. Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 10, 15, 20]})
filtered_df = df[df['A'] > 10]  # boolean mask keeps only rows where A > 10
print(filtered_df)
```

10. Create a histogram using Seaborn to visualize a distribution.

```python
import seaborn as sns
import matplotlib.pyplot as plt

data = [1, 2, 2, 3, 3, 3, 4]
sns.histplot(data, bins=4)  # counts how many values fall in each of 4 bins
plt.title("Histogram")
plt.show()
```

11. Perform matrix multiplication using NumPy.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
result = np.dot(A, B)  # for 2D arrays this is the same as A @ B
print("Matrix Multiplication:\n", result)
```

12. Use Pandas to load a CSV file and display its first 5 rows.

```python
import pandas as pd

df = pd.read_csv('file.csv')  # Replace with actual file path
print(df.head())  # head() returns the first 5 rows by default
```

13. Create a 3D scatter plot using Plotly.

```python
import plotly.express as px
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3],
    'y': [4, 5, 6],
    'z': [7, 8, 9]
})
fig = px.scatter_3d(df, x='x', y='y', z='z', title='3D Scatter Plot')  # interactive, rotatable
fig.show()
```