---
# Data Toolkit: Theoretical Answers
---
---
## 1. What is NumPy, and why is it widely used in Python?
NumPy (Numerical Python) is a powerful library for numerical computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays. It is widely used because:

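Its core routines are implemented in C, and linear algebra is delegated to optimized BLAS/LAPACK libraries, so array operations run much faster than equivalent Python loops.  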
It allows vectorized operations, making computations faster.                                                         
It integrates well with other libraries like Pandas, SciPy, and TensorFlow.

---

## 2. How does broadcasting work in NumPy?
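Broadcasting lets NumPy perform element-wise operations on arrays of different but compatible shapes. Shapes are compared starting from the trailing dimension; two dimensions are compatible when they are equal or when one of them is 1. Size-1 (or missing) dimensions are stretched virtually to match, without copying data.  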
Example:

    import numpy as np
    a = np.array([1, 2, 3])
    b = np.array([[1], [2], [3]])
    result = a + b  # a (3,) and b (3, 1) are both broadcast to shape (3, 3)
    print(result)
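
Broadcasting compares shapes from the trailing dimension and stretches any dimension of size 1. The following is a minimal sketch of that rule; the names `row`, `col`, and `matrix` are illustrative and do not come from the answer above.

```python
import numpy as np

row = np.array([1, 2, 3])            # shape (3,)
col = np.array([[10], [20], [30]])   # shape (3, 1)

# np.broadcast_shapes reports the result shape without doing any arithmetic.
result_shape = np.broadcast_shapes(row.shape, col.shape)
print(result_shape)  # (3, 3)

# Both operands are stretched along their size-1 (or missing) dimension.
total = row + col
print(total)

# A common use: subtract each column's mean from a 2D array.
matrix = np.arange(6.0).reshape(2, 3)
centered = matrix - matrix.mean(axis=0)  # shape (2, 3) minus shape (3,)
print(centered)
```

Shapes that are neither equal nor 1 in some dimension, such as (3,) and (4,), raise a `ValueError` instead of being stretched.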
              
---
## 3. What is a Pandas DataFrame?
A DataFrame is a two-dimensional, labeled data structure in Pandas that resembles an Excel spreadsheet or SQL table. It has a row index and named columns, and each column can hold a different data type (e.g., integers, floats, strings).
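
As a small illustration, the sketch below builds a DataFrame from a dictionary and inspects its structure. The column names and values are made-up sample data.

```python
import pandas as pd

# Hypothetical sample data: each column holds one data type.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],
    "age": [31, 25, 40],
    "height_m": [1.62, 1.80, 1.75],
})

print(df.dtypes)           # one dtype per column
print(df.shape)            # (number of rows, number of columns)
print(df.loc[1, "name"])   # label-based access to a single cell
```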

---
## 4. Explain the use of the groupby() method in Pandas.
The groupby() method splits a DataFrame into groups based on one or more columns and then applies an aggregate function such as mean, sum, or count to each group.  
Example:

    import pandas as pd
    df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B'], 'Values': [10, 20, 30, 40]})
    grouped = df.groupby('Category').sum()
    print(grouped)

---
## 5. Why is Seaborn preferred for statistical visualizations?
Seaborn is preferred because:

It provides built-in statistical functions and plots.             
It is built on top of Matplotlib but ships with more polished default styles and color palettes.  
It simplifies complex visualizations like correlation heatmaps and violin plots.

---
## 6. What are the differences between NumPy arrays and Python lists?
| Feature      | NumPy Arrays                     | Python Lists                       |
|-------------|----------------------------------|------------------------------------|
| **Speed**   | Faster (vectorized operations)  | Slower (element-wise processing)  |
| **Memory**  | Uses less memory                | Uses more memory                  |
| **Operations** | Supports broadcasting       | No broadcasting                   |
| **Homogeneity** | Must have same data type  | Can have mixed data types         |
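
The differences in the table are easy to observe. The snippet below is a minimal sketch using made-up values; the variable names are illustrative.

```python
import numpy as np

values_list = [1, 2, 3]
values_array = np.array([1, 2, 3])

# "*" repeats a list but multiplies an array element-wise.
print(values_list * 2)    # [1, 2, 3, 1, 2, 3]
print(values_array * 2)   # [2 4 6]

# A list keeps mixed types; an array converts everything to one dtype.
mixed_list = [1, 2.5, 3]
mixed_array = np.array(mixed_list)
print(mixed_array.dtype)  # float64
```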

---
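## 7. What is a heatmap, and when should it be used?
A heatmap displays the values of a matrix as a grid of colored cells, where the color encodes the magnitude of each value. It is best suited to spotting patterns in tabular data at a glance, most commonly the correlations between variables.  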
Example in Seaborn:

    import seaborn as sns
    import numpy as np
    import pandas as pd

    data = np.random.rand(5,5)
    df = pd.DataFrame(data, columns=list('ABCDE'))
    sns.heatmap(df, annot=True)

---
## 8. What does the term “vectorized operation” mean in NumPy?
A vectorized operation applies to a whole array at once, without an explicit Python loop. The loop still happens, but inside compiled code, which is why it is much faster than iterating in the interpreter.  
Example:

    import numpy as np
    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6])
    print(a + b)  # Vectorized addition

---
## 9. How does Matplotlib differ from Plotly?

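| Feature        | Matplotlib               | Plotly                                |
|---------------|-------------------------|--------------------------------------|
| **Interactivity** | Static by default; basic pan/zoom in GUI backends | Interactive by default (hover, zoom, pan) |
| **Customization** | Fine-grained, low-level control of every element | High-level, declarative figure options |
| **3D Support**    | Basic, via the mplot3d toolkit | Strong, with rotatable WebGL-rendered plots |
| **Ease of Use**   | Requires more manual settings | Easier for interactive visualizations |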

---
## 10. What is the significance of hierarchical indexing in Pandas?                                            
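Hierarchical indexing (MultiIndex) allows multiple index levels on a Series or DataFrame. This makes it possible to represent higher-dimensional data in a two-dimensional table, select subsets with partial keys, and aggregate by level.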
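
A short sketch of a two-level index follows. The sales figures and the level names `region` and `year` are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sales data indexed by (region, year).
index = pd.MultiIndex.from_tuples(
    [("East", 2023), ("East", 2024), ("West", 2023), ("West", 2024)],
    names=["region", "year"],
)
sales = pd.Series([100, 120, 80, 95], index=index)

# Selecting by the outer level returns every row for that region.
east = sales.loc["East"]
print(east)

# A full key selects a single value.
print(sales.loc[("West", 2024)])  # 95

# Aggregating over one level collapses the other.
per_region = sales.groupby(level="region").sum()
print(per_region)
```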

---
## 11. What is the role of Seaborn’s pairplot() function?       
pairplot() draws a grid of pairwise scatter plots for the numerical variables in a dataset, with each variable's own distribution on the diagonal, which helps reveal relationships at a glance.  
Example:

    import seaborn as sns
    df = sns.load_dataset("iris")
    sns.pairplot(df, hue="species")

---
## 12. What is the purpose of the describe() function in Pandas?                                              
The describe() function returns summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for numerical columns by default.
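
A minimal sketch of describe() on a small, made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"Score": [85, 90, 78, 82], "Name": ["F", "A", "I", "Z"]})

# Only the numeric column is summarized by default.
summary = df.describe()
print(summary)

# include="all" also reports on non-numeric columns (count, unique, top, freq).
print(df.describe(include="all"))
```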

---
## 13. Why is handling missing data important in Pandas?   
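Missing values (NaN) are skipped by default in aggregations such as mean(), which can bias results, and they can cause errors in code that expects complete data. Handling them deliberately keeps the analysis accurate.  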
Techniques include:

Dropping missing values (dropna())           
Filling missing values (fillna())
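
Both techniques can be sketched on a tiny DataFrame with one missing score. The data is made up, and filling with the column mean is only one of several possible strategies.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C"], "Score": [85.0, np.nan, 78.0]})

# Count the missing values in each column.
print(df.isna().sum())

# Option 1: drop every row that contains a missing value.
dropped = df.dropna()
print(dropped)

# Option 2: replace the missing score with the column mean.
filled = df.fillna({"Score": df["Score"].mean()})
print(filled)
```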

---
## 14. What are the benefits of using Plotly for data visualization?                                      
Interactive and dynamic visualizations                                           
Supports multiple chart types (3D, maps, etc.)                      
Easy to use with Dash for web applications

---
## 15. How does NumPy handle multidimensional arrays?     
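NumPy stores data in the ndarray, an N-dimensional array described by its shape (the size along each axis) and its dtype. Most functions accept an axis argument, so reductions such as sum() or mean() can be performed efficiently along any axis.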
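
The sketch below shows a 3D array and a few operations along different axes. The name `cube` and its contents are illustrative.

```python
import numpy as np

# A 3D array: 2 blocks, each with 3 rows and 4 columns.
cube = np.arange(24).reshape(2, 3, 4)
print(cube.ndim, cube.shape)  # 3 (2, 3, 4)

# Reducing along an axis removes that axis from the result.
sum_over_blocks = cube.sum(axis=0)    # shape (3, 4)
mean_over_cols = cube.mean(axis=-1)   # shape (2, 3)

# Indexing takes one entry per axis: block 0, all rows, column 1.
column = cube[0, :, 1]
print(column)  # [1 5 9]
```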

---
## 16. What is the role of Bokeh in data visualization?

Bokeh is a Python library for interactive and web-based visualizations. It supports zooming, panning, and real-time updates.

---
## 17. Explain the difference between apply() and map() in Pandas.
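DataFrame.apply() calls a function on each column (axis=0, the default) or each row (axis=1) of a DataFrame. Series.apply() and Series.map() both work element-wise on a single column; map() additionally accepts a dict or Series and uses it as a lookup table.

    df['new_col'] = df['Values'].apply(lambda x: x * 2)  # Series.apply(): element-wise on one column
    df['new_col'] = df['Values'].map(lambda x: x * 2)    # Series.map(): same result here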
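
The two methods differ most when apply() is called on a whole DataFrame or when map() is given a dictionary. The following sketch uses a made-up DataFrame; the `Bonus`, `Total`, and `Label` columns are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "A"],
    "Values": [10, 20, 30],
    "Bonus": [1, 2, 3],
})

# DataFrame.apply with axis=1 passes each row to the function.
df["Total"] = df[["Values", "Bonus"]].apply(
    lambda row: row["Values"] + row["Bonus"], axis=1
)

# DataFrame.apply with the default axis=0 passes each column.
col_max = df[["Values", "Bonus"]].apply(lambda col: col.max())

# Series.map accepts a dict and uses it as a lookup table.
df["Label"] = df["Category"].map({"A": "alpha", "B": "beta"})

print(df)
print(col_max)
```

For simple arithmetic like the row total, plain column operations (`df["Values"] + df["Bonus"]`) are vectorized and usually faster than apply().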

---
## 18. What are some advanced features of NumPy?   

Linear algebra (numpy.linalg)    

Fourier transforms (numpy.fft)

Random number generation (numpy.random)

Memory mapping for large datasets (numpy.memmap)
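
A brief sketch touching three of these modules follows. The matrix, seed, and signal are arbitrary values chosen for illustration.

```python
import numpy as np

# Linear algebra: solve the system A @ x = b.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print(x)

# Random numbers: a seeded Generator gives reproducible draws.
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=1000)
print(samples.mean())

# Fourier transform: a signal with one full cosine cycle over 8 samples
# has its strongest frequency component in bin 1.
n = np.arange(8)
signal = np.cos(2 * np.pi * n / 8)
spectrum = np.abs(np.fft.rfft(signal))
peak_bin = int(np.argmax(spectrum))
print(peak_bin)  # 1
```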

---
## 19. How does Pandas simplify time series analysis?     

Pandas provides built-in time series functions like:

to_datetime() for date conversions             
Resampling (resample())                        
Rolling window statistics (rolling())
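
These three functions can be combined as in the sketch below, which uses six days of made-up values.

```python
import pandas as pd

# to_datetime converts strings into a DatetimeIndex.
dates = pd.to_datetime([
    "2024-01-01", "2024-01-02", "2024-01-03",
    "2024-01-04", "2024-01-05", "2024-01-06",
])
series = pd.Series([1, 2, 3, 4, 5, 6], index=dates)

# Resampling: aggregate the daily values into 2-day bins.
two_day = series.resample("2D").sum()
print(two_day)

# Rolling window: mean of each value and the two before it.
rolling_mean = series.rolling(window=3).mean()
print(rolling_mean)
```

The first two entries of `rolling_mean` are NaN because a full window of three values is not yet available.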

---
## 20. What is the role of a pivot table in Pandas?          
A pivot table summarizes data by grouping and aggregating values, similar to Excel pivot tables.              

Example:

    df.pivot_table(values='Values', index='Category', aggfunc='sum')
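
A pivot table becomes more useful with a second grouping key spread across the columns. The sketch below is self-contained; the `Region` column is a hypothetical addition to the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "A", "B"],
    "Region": ["North", "North", "South", "South"],
    "Values": [10, 20, 30, 40],
})

# Rows come from Category, columns from Region, cells hold the summed Values.
table = df.pivot_table(
    values="Values", index="Category", columns="Region", aggfunc="sum"
)
print(table)
```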

---
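## 21. Why is NumPy’s array slicing faster than Python’s list slicing?
NumPy arrays store their elements in a single typed memory buffer, and a basic slice returns a view onto that buffer instead of copying data. Slicing a Python list builds a new list and copies a reference for every selected element, so its cost grows with the length of the slice.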
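
One way to see why array slicing is cheap: a basic NumPy slice shares memory with the original array, while a list slice is an independent copy. A minimal sketch with illustrative variable names:

```python
import numpy as np

arr = np.arange(10)
lst = list(range(10))

arr_slice = arr[2:5]   # a view onto arr's buffer, no data copied
lst_slice = lst[2:5]   # a new list with copied references

print(np.shares_memory(arr, arr_slice))  # True

# Writing through the view changes the original array.
arr_slice[0] = 99
print(arr[2])  # 99

# Writing to the list slice leaves the original list untouched.
lst_slice[0] = 99
print(lst[2])  # 2

# Use .copy() when an independent array is needed.
independent = arr[2:5].copy()
```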

---
## 22. What are some common use cases for Seaborn?  

Correlation heatmaps                           
Pairplots and scatter plots                            
Categorical plots (bar, violin, box plots)                        
Regression plots

---
---
# Practical Answers:
---
---
## 1. Create a 2D NumPy array and calculate the sum of each row

    import numpy as np

    # Creating a 2D NumPy array
    arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

    # Sum of each row
    row_sums = np.sum(arr, axis=1)
    print(row_sums)

## 2. Write a Pandas script to find the mean of a specific column in a DataFrame

    import pandas as pd

    # Creating a DataFrame
    data = {'Name': ['A', 'B', 'C'], 'Score': [85, 90, 78]}
    df = pd.DataFrame(data)

    # Finding the mean of the "Score" column
    mean_score = df['Score'].mean()
    print(mean_score)

## 3. Create a scatter plot using Matplotlib

    import matplotlib.pyplot as plt
    import numpy as np

    # Sample data
    x = np.random.rand(50)
    y = np.random.rand(50)

    # Creating a scatter plot
    plt.scatter(x, y, color='blue')
    plt.xlabel('X-axis')
    plt.ylabel('Y-axis')
    plt.title('Scatter Plot')
    plt.show()

## 4. Calculate the correlation matrix using Seaborn and visualize it with a heatmap



    import matplotlib.pyplot as plt
    import seaborn as sns
    import pandas as pd
    import numpy as np

    # Creating a random DataFrame
    data = np.random.rand(5, 5)
    df = pd.DataFrame(data, columns=list('ABCDE'))

    # Calculating the correlation matrix (Pandas computes it; Seaborn only draws it)
    corr_matrix = df.corr()

    # Visualizing with a heatmap
    sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
    plt.show()

## 5. Generate a bar plot using Plotly

    import plotly.express as px
    import pandas as pd

    # Sample data
    df = pd.DataFrame({'Category': ['A', 'B', 'C'], 'Values': [10, 20, 30]})

    # Creating a bar plot
    fig = px.bar(df, x='Category', y='Values', title="Bar Plot")
    fig.show()

## 6. Create a DataFrame and add a new column based on an existing column

    import pandas as pd

    # Creating a DataFrame
    df = pd.DataFrame({'Name': ['F', 'A', 'I', 'Z'], 'Score': [85, 90, 78, 82]})

    # Adding a new column based on an existing column
    df['Grade'] = df['Score'].apply(lambda x: 'Pass' if x >= 80 else 'Fail')
    print(df)

## 7. Perform element-wise multiplication of two NumPy arrays

    import numpy as np

    # Creating two arrays
    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6])

    # Element-wise multiplication
    result = a * b
    print(result)

## 8. Create a line plot with multiple lines using Matplotlib

    import matplotlib.pyplot as plt
    import numpy as np

    # Creating data
    x = np.linspace(0, 10, 100)
    y1 = np.sin(x)
    y2 = np.cos(x)

    # Plotting multiple lines
    plt.plot(x, y1, label='sin(x)', color='blue')
    plt.plot(x, y2, label='cos(x)', color='red')
    plt.xlabel('X-axis')
    plt.ylabel('Y-axis')
    plt.title('Line Plot with Multiple Lines')
    plt.legend()
    plt.show()

## 9. Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold

    import pandas as pd

    # Creating a DataFrame
    df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D'], 'Score': [85, 90, 75, 60]})

    # Filtering rows where Score > 80
    filtered_df = df[df['Score'] > 80]
    print(filtered_df)

## 10. Create a histogram using Seaborn to visualize a distribution

    import seaborn as sns
    import numpy as np
    import matplotlib.pyplot as plt

    # Generating random data
    data = np.random.randn(1000)

    # Creating a histogram
    sns.histplot(data, bins=30, kde=True)
    plt.title("Histogram of Data Distribution")
    plt.show()

## 11. Perform matrix multiplication using NumPy

    import numpy as np

    # Creating two matrices
    A = np.array([[1, 2], [3, 4]])
    B = np.array([[5, 6], [7, 8]])

    # Matrix multiplication (for 2D arrays, np.dot(A, B) is equivalent to A @ B)
    result = np.dot(A, B)
    print(result)

## 12. Use Pandas to load a CSV file and display its first 5 rows

    import pandas as pd

    # Load CSV file (assumes a file named data.csv exists in the working directory)
    df = pd.read_csv('data.csv')

    # Display first 5 rows
    print(df.head())


## 13. Create a 3D scatter plot using Plotly

    import plotly.express as px
    import pandas as pd
    import numpy as np

    # Creating random data
    df = pd.DataFrame({
        'x': np.random.rand(50),
        'y': np.random.rand(50),
        'z': np.random.rand(50)
    })

    # Creating a 3D scatter plot
    fig = px.scatter_3d(df, x='x', y='y', z='z', title="3D Scatter Plot")
    fig.show()

---
---
# Thank You! 😊
---
---