# Theoretical Questions



Que1: What is NumPy, and why is it widely used in Python?

Ans1: NumPy (Numerical Python) is a powerful library for numerical computing in Python. It is widely used because:  
- It provides support for large, multi-dimensional arrays and matrices.  
- It offers optimized mathematical functions for array operations.  
- It allows vectorized operations, making computations faster than traditional Python lists.  
- It is the foundation for many data science and machine learning libraries like Pandas, SciPy, and TensorFlow.  

Que2: How does broadcasting work in NumPy?  

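Ans2: Broadcasting lets NumPy perform arithmetic on arrays of different shapes without explicitly copying data. NumPy compares the two shapes starting from the trailing (rightmost) dimension. Two dimensions are compatible when they are equal or when one of them is 1. Missing leading dimensions are treated as 1, and every size-1 dimension is virtually stretched to match the other array, so the element-wise operation runs without allocating the repeated values. If any pair of dimensions is incompatible, NumPy raises a ValueError.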

In [None]:
#Example
import numpy as np
a = np.array([1, 2, 3])
b = np.array([[1], [2], [3]])
print(a + b)  # (3,) and (3, 1) broadcast to a (3, 3) result
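A minimal sketch of the shape-compatibility rule, using arbitrary example values. It shows a successful broadcast, `np.broadcast_shapes` (available in NumPy 1.20 and later) reporting the result shape, and the error raised for incompatible shapes.

```python
import numpy as np

# Shapes are compared from the trailing (rightmost) dimension backwards.
# Two dimensions are compatible when they are equal or one of them is 1.
a = np.array([1, 2, 3])           # shape (3,)  -> treated as (1, 3)
b = np.array([[10], [20], [30]])  # shape (3, 1)

result = a + b                    # (1, 3) and (3, 1) broadcast to (3, 3)
print(result.shape)               # (3, 3)
print(result)

# np.broadcast_shapes reports the resulting shape without doing any arithmetic
print(np.broadcast_shapes((3,), (3, 1)))  # (3, 3)

# Incompatible trailing dimensions (3 vs 2) raise a ValueError
try:
    np.ones(3) + np.ones(2)
except ValueError as err:
    print("Cannot broadcast:", err)
```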


Que3: What is a Pandas DataFrame?

Ans3: A Pandas DataFrame is a two-dimensional, tabular data structure similar to an Excel spreadsheet or SQL table. It consists of labeled rows and columns, allowing easy manipulation and analysis of structured data.

In [None]:
import pandas as pd
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
print(df)


Que4: Explain the use of the groupby() method in Pandas.
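Ans4: The groupby() method implements the split-apply-combine pattern. It splits the rows into groups based on the values in one or more columns, applies an aggregate function such as sum, mean, or count to each group, and combines the results into a new DataFrame or Series indexed by the group keys.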

In [None]:
df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B'], 'Values': [10, 20, 30, 40]})
grouped = df.groupby('Category').sum()
print(grouped)
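Several aggregations can be applied in one pass with `.agg()`. This sketch reuses the same illustrative data as the cell above.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B'],
    'Values': [10, 20, 30, 40],
})

# Split rows by Category, apply three aggregations to Values, combine into one table
summary = df.groupby('Category')['Values'].agg(['sum', 'mean', 'count'])
print(summary)
```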


Que5: Why is Seaborn preferred for statistical visualizations?
Ans5: Seaborn is preferred because:
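- It provides high-level functions for statistical plots (distributions, regressions, categorical comparisons) that need much less code than the equivalent plain Matplotlib.
- It produces polished plots by default, with built-in themes and color palettes.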
- It integrates well with Pandas DataFrames.

In [None]:
#Example:
import seaborn as sns
import matplotlib.pyplot as plt
sns.scatterplot(x=[1, 2, 3], y=[4, 5, 6])
plt.show()


Que6: What are the differences between NumPy arrays and Python lists?
Ans6:

Here's a comparison between NumPy arrays and Python lists:

- Data Type Homogeneity:
  
  - NumPy arrays require all elements to be of the same data type, while Python lists can hold elements of mixed data types.

- Memory Efficiency:
  
  - NumPy arrays are more memory-efficient than Python lists because they store raw values of a single type in one contiguous buffer. Python lists store references (pointers) to separate Python objects, each with its own per-object overhead, which leads to higher memory consumption.

- Performance:
  
  - NumPy arrays are optimized for numerical operations and offer significantly faster performance compared to Python lists, especially for large datasets. This is due to vectorized operations and the use of efficient C implementations.

- Functionality:

  - NumPy provides a wide range of built-in functions for array manipulation, mathematical operations, and linear algebra. Python lists have fewer built-in functionalities and often require loops for similar operations.

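- Size:

  - NumPy arrays have a fixed size once created. Functions such as np.append() return a new array rather than growing the original. Python lists are dynamic and can grow or shrink in place with append(), pop(), and similar methods.

- Mutability:

  - Both are mutable: individual elements or slices of a NumPy array can be modified in place. The difference lies in changing the size. Inserting into or appending to a NumPy array creates a new array (a full copy), which is expensive inside a loop, whereas Python lists support in-place insertion and deletion.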

In [None]:
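import sys
import time
import numpy as np

# Performance comparison (single runs; timings vary by machine)
size = 1_000_000

# Python list
python_list = list(range(size))
start_time = time.perf_counter()  # high-resolution timer for short intervals
sum(python_list)
end_time = time.perf_counter()
print("Time taken by Python list:", end_time - start_time, "seconds")

# NumPy array
numpy_array = np.arange(size)
start_time = time.perf_counter()
np.sum(numpy_array)
end_time = time.perf_counter()
print("Time taken by NumPy array:", end_time - start_time, "seconds")

# Memory comparison
# sys.getsizeof() counts only the list's array of pointers, not the int objects
# they point to, so add those sizes for an approximate total.
list_bytes = sys.getsizeof(python_list) + sum(sys.getsizeof(x) for x in python_list)
print("Memory used by Python list:", list_bytes, "bytes")
print("Memory used by NumPy array:", numpy_array.nbytes, "bytes")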
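Element assignment on a NumPy array happens in place; only changing its size produces a new array. The small sketch below, with arbitrary values, contrasts that with a list growing in place.

```python
import numpy as np

arr = np.array([1, 2, 3])
arr[0] = 100                  # element assignment modifies the array in place
print(arr)                    # [100   2   3]

# Growing an array is different: np.append allocates and returns a new array
bigger = np.append(arr, 4)
print(arr.size, bigger.size)  # 3 4  (the original array is unchanged)

lst = [1, 2, 3]
lst.append(4)                 # lists grow in place
print(lst)                    # [1, 2, 3, 4]
```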

Que7: What is a heatmap, and when should it be used?

Ans7: A heatmap is a visualization technique that represents data using color gradients. It is useful for showing correlations, missing values, or density distributions.

In [None]:
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
data = np.random.rand(5, 5)
sns.heatmap(data, annot=True, cmap='coolwarm')
plt.show()


Que8: What does the term "vectorized operation" mean in NumPy?

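Ans8: A vectorized operation applies an operation to whole arrays at once, element by element, without an explicit Python loop. The looping still happens, but inside NumPy's precompiled C code, which avoids per-element interpreter overhead and makes the computation much faster.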

In [None]:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b)  # Vectorized addition
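To make the contrast concrete, this sketch computes the same expression with an explicit Python loop and with a single vectorized expression, then checks that the results match. The array size is arbitrary.

```python
import numpy as np

a = np.arange(100_000, dtype=np.float64)
b = np.arange(100_000, dtype=np.float64)

# Explicit Python loop: one interpreted iteration per element
loop_result = np.empty_like(a)
for i in range(len(a)):
    loop_result[i] = a[i] * b[i] + 1.0

# Vectorized: the same arithmetic runs inside NumPy's compiled loops
vec_result = a * b + 1.0

print(np.array_equal(loop_result, vec_result))  # True
```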


Que9: How does Matplotlib differ from Plotly?
Ans9:

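Matplotlib: A mature, highly customizable library that mainly produces static images (PNG, PDF, SVG); interactivity is limited to what the display backend provides.

Plotly: Produces interactive, browser-based (HTML/JavaScript) figures with zooming, panning, and hover tooltips built in.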

In [None]:
#Example of Matplotlib
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [4, 5, 6])
plt.show()


In [None]:
#Example of Plotly:
import plotly.express as px
fig = px.line(x=[1, 2, 3], y=[4, 5, 6])
fig.show()


Que10: What is the significance of hierarchical indexing in Pandas?
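Ans10: Hierarchical indexing (a MultiIndex) gives an axis more than one level of labels. This lets a two-dimensional DataFrame represent higher-dimensional or grouped data, and makes it easy to select by an outer level, take cross-sections on an inner level, or reshape with stack() and unstack().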

In [None]:
import pandas as pd
arrays = [['A', 'A', 'B', 'B'], ['X', 'Y', 'X', 'Y']]
index = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Subgroup'))
df = pd.DataFrame({'Values': [1, 2, 3, 4]}, index=index)
print(df)
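A short sketch of common selections on the MultiIndex built above: selecting an outer group, looking up a single (Group, Subgroup) pair, taking a cross-section on the inner level with `xs()`, and pivoting the inner level into columns with `unstack()`.

```python
import pandas as pd

arrays = [['A', 'A', 'B', 'B'], ['X', 'Y', 'X', 'Y']]
index = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Subgroup'))
df = pd.DataFrame({'Values': [1, 2, 3, 4]}, index=index)

print(df.loc['A'])                   # all rows in outer group A
print(df.loc[('B', 'Y'), 'Values'])  # a single value: 4
print(df.xs('X', level='Subgroup'))  # cross-section on the inner level
print(df['Values'].unstack())        # inner level pivoted into columns
```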


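Que11: What is the role of Seaborn's pairplot() function?

Ans11: The pairplot() function visualizes pairwise relationships in a dataset by drawing a grid of plots: a scatter plot for every pair of numeric columns, with the distribution of each individual column on the diagonal. The hue parameter colors the points by a categorical column.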

In [None]:
import seaborn as sns
df = sns.load_dataset('iris')
sns.pairplot(df, hue='species')


Que12: What is the purpose of the describe() function in Pandas?

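Ans12: The describe() function provides summary statistics (count, mean, std, min, the 25%/50%/75% quartiles, and max) for the numeric columns of a DataFrame by default. Passing include='all' also summarizes non-numeric columns with count, unique, top, and freq.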

In [None]:
import pandas as pd

data = [[10, 18, 11], [13, 15, 8], [9, 20, 3]]

df = pd.DataFrame(data)

print(df.describe())


Que13: Why is handling missing data important in Pandas?

Ans13: Missing values (NaN or None) can bias statistics, break calculations, or cause errors in libraries that do not accept NaN. Pandas provides isna() to detect them, and dropna() and fillna() to remove or replace them.

In [None]:
import pandas as pd
import numpy as np

data = {'A': [1, 2, np.nan], 'B': [4, np.nan, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)

df_dropped = df.dropna() #Removes rows with any NaN values
df_dropped_all = df.dropna(how='all') #Removes rows where all values are NaN
df_col_dropped = df.dropna(axis=1) #Removes columns with any NaN values
df.dropna(inplace=True) #Modifies the DataFrame directly
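The cell above only removes missing values. The sketch below, on the same sample data, counts them with `isna()` and replaces them with `fillna()` or forward-fill instead. Which strategy is appropriate depends on the data.

```python
import pandas as pd
import numpy as np

data = {'A': [1, 2, np.nan], 'B': [4, np.nan, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)

print(df.isna().sum())               # missing values per column: A 1, B 1, C 0

filled_zero = df.fillna(0)           # replace every NaN with a constant
filled_mean = df.fillna(df.mean())   # replace NaN with each column's mean
filled_ffill = df.ffill()            # carry the previous valid value forward
print(filled_mean)
```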

Que14: What are the benefits of using Plotly for data visualization?

Ans14: The main benefits of using Plotly for data visualization are:

- Interactive graphs (zoom, pan, and hover tooltips) out of the box
- Support for 3D plots
- Easy integration with web applications, for example through Dash or by exporting figures to HTML

Que15: How does NumPy handle multidimensional arrays?

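Ans15: NumPy represents multidimensional arrays with the ndarray object: a block of memory holding elements of a single dtype, plus metadata (shape, strides, number of dimensions) that tells NumPy how to map an index such as a[i, j] to a position in that block. Operations can then work along any axis.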

In [None]:
import numpy as np
a = np.array([[1, 2], [3, 4]])
print(a.shape)  # Output: (2, 2)
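This sketch inspects the metadata NumPy keeps for an ndarray and shows reductions along each axis plus a reshape to 3-D. The exact integer dtype and stride values depend on the platform, so those comments are only examples.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # 3 rows, 4 columns, values 0..11

print(a.ndim)     # 2 (number of axes)
print(a.shape)    # (3, 4)
print(a.dtype)    # platform integer, e.g. int64
print(a.strides)  # bytes to step along each axis, e.g. (32, 8) for int64

print(a.sum(axis=0))  # column sums: [12 15 18 21]
print(a.sum(axis=1))  # row sums:    [ 6 22 38]

b = a.reshape(2, 2, 3)  # the same 12 elements viewed as a 3-D array
print(b.shape)          # (2, 2, 3)
```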


Que16: What is the role of Bokeh in data visualization?

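Ans16: Bokeh is a Python library for interactive, web-friendly visualizations. Plots are rendered in the browser by its JavaScript library (BokehJS), so they support panning, zooming, and hover tools, and can be embedded in notebooks, standalone HTML files, or Bokeh server applications.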

In [None]:
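from bokeh.plotting import figure, show
from bokeh.io import output_notebook

output_notebook()  # render inline in Jupyter instead of opening a separate HTML file
p = figure(title="Bokeh Example")
p.line([1, 2, 3], [4, 5, 6])
show(p)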


Que17: Explain the difference between apply() and map() in Pandas.
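Ans17: The difference between apply() and map() in Pandas:

- apply() on a DataFrame applies a function along an axis, passing each column (axis=0, the default) or each row (axis=1) to the function as a Series. On a Series, apply() calls the function on each element.

- map() is used for element-wise transformations on a Series. Besides a function, it also accepts a dict or Series that maps old values to new ones.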

In [None]:
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df['A'] = df['A'].apply(lambda x: x * 2)  # Series.apply(): doubles each value in A
print(df)

df['B'] = df['B'].map(lambda x: x + 1)  # Series.map(): adds 1 to each value in B
print(df)
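The cell above applies both methods to a single Series, so it does not show the DataFrame form of `apply()`. This sketch adds that case (row-wise with `axis=1`, column-wise with the default axis) and shows `map()` with a dict. The column names `Row_Sum` and `A_Label` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# DataFrame.apply with axis=1 passes each row as a Series to the function
df['Row_Sum'] = df.apply(lambda row: row['A'] + row['B'], axis=1)

# DataFrame.apply with the default axis=0 passes each column instead
col_max = df[['A', 'B']].apply(lambda col: col.max())

# Series.map also accepts a dict; values without a matching key become NaN
df['A_Label'] = df['A'].map({1: 'one', 2: 'two'})
print(df)
print(col_max)
```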


Que18: What are some advanced features of NumPy?

Ans18: Some advanced features of NumPy include:

- Broadcasting for efficient operations on arrays of different shapes.
- Memory-mapped files (np.memmap) for working with datasets larger than RAM.
- Vectorized operations for performance optimization.
- Masked arrays (np.ma) to handle missing or invalid data.

In [None]:
# Example of masked array:
import numpy as np
masked = np.ma.array([1, 2, 3], mask=[False, True, False])
print(masked)         # [1 -- 3]
print(masked.mean())  # 2.0, the masked element is ignored
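The memory-mapped files mentioned above can be sketched with `np.memmap`. The file name `big_array.dat` and the array shape are hypothetical, and the file goes in a temporary directory purely for the demo.

```python
import os
import tempfile
import numpy as np

# Hypothetical scratch file for the demo
path = os.path.join(tempfile.mkdtemp(), 'big_array.dat')

# Create a disk-backed array; the data lives in the file, not entirely in RAM
mm = np.memmap(path, dtype=np.float32, mode='w+', shape=(1000, 100))
mm[:] = 1.0  # write through the mapping
mm.flush()   # make sure changes reach the file
del mm       # close the mapping

# Reopen read-only and touch only the rows that are needed
ro = np.memmap(path, dtype=np.float32, mode='r', shape=(1000, 100))
print(ro[:5].sum())  # 500.0
```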


Que19: How does Pandas simplify time series analysis?
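Ans19: Pandas simplifies time series analysis in several ways:
- It supports a DatetimeIndex for time-based indexing and slicing (for example, df.loc['2023-01'] selects a whole month).
- It provides methods such as resample() (change the frequency), shift() (lag or lead values), and rolling() (moving-window statistics).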

In [None]:
import pandas as pd

df = pd.DataFrame({'Date': pd.date_range(start='1/1/2023', periods=5), 'Values': [10, 20, 30, 40, 50]})
df.set_index('Date', inplace=True)
print(df.rolling(window=2).mean())  # 2-day rolling average; the first row is NaN
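The other two methods mentioned above, `resample()` and `shift()`, can be sketched on similar daily data. The six-day range and the `Prev`/`Change` column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.date_range(start='2023-01-01', periods=6, freq='D'),
    'Values': [10, 20, 30, 40, 50, 60],
})
df = df.set_index('Date')

# resample: regroup the daily rows into 2-day buckets and sum each bucket
two_day = df.resample('2D').sum()

# shift: move values down one row, useful for period-over-period change
df['Prev'] = df['Values'].shift(1)
df['Change'] = df['Values'] - df['Prev']
print(two_day)
print(df)
```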


Que20: What is the role of a pivot table in Pandas?

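Ans20: A pivot table summarizes a DataFrame by grouping rows by one or more key columns (the index, and optionally the columns) and aggregating a value column with a function such as sum or mean. It reshapes long data into a compact summary table.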

In [None]:
import pandas as pd
df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B'], 'Sales': [10, 20, 30, 40]})
pivot = df.pivot_table(values='Sales', index='Category', aggfunc='sum')
print(pivot)
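Passing `columns=` as well as `index=` turns the pivot table into a two-way summary. The `Region` column and sales figures below are made-up illustrative data.

```python
import pandas as pd

df = pd.DataFrame({
    'Region':   ['North', 'North', 'South', 'South', 'North'],
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Sales':    [10, 20, 30, 40, 50],
})

# Regions become rows, categories become columns, Sales are summed per cell
pivot = df.pivot_table(values='Sales', index='Region', columns='Category',
                       aggfunc='sum', fill_value=0)
print(pivot)
```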


Que21: Why is NumPy's array slicing faster than Python's list slicing?

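Ans21: Slicing a NumPy array with start:stop:step creates a view: a new array object that points into the same memory buffer with adjusted shape and strides. No elements are copied, so the cost does not grow with the length of the slice.
Slicing a Python list builds a new list and copies a reference to every element in the slice, so it takes time proportional to the slice size. NumPy's contiguous, fixed-type storage is what makes these stride-based views possible.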


In [None]:
import numpy as np
a = np.array([1, 2, 3, 4, 5])
print(a[1:4])  # [2 3 4], returned as a view without copying
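The view behavior is what makes NumPy slicing cheap, and it has a visible side effect: writing to the slice changes the original array. This sketch contrasts that with a list slice and shows `.copy()` for an independent array.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
view = a[1:4]                     # basic slice: a view, no data copied
view[0] = 99                      # writes through to the original array
print(a)                          # [ 1 99  3  4  5]
print(np.shares_memory(a, view))  # True

lst = [1, 2, 3, 4, 5]
sub = lst[1:4]                    # list slice: a new list is built
sub[0] = 99
print(lst)                        # [1, 2, 3, 4, 5] (unchanged)

copy = a[1:4].copy()              # use .copy() when an independent array is needed
print(np.shares_memory(a, copy))  # False
```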


Que22: What are some common use cases for Seaborn?

Ans22: Common use cases for Seaborn include:

- Correlation analysis (heatmap())
- Pairwise relationships (pairplot())
- Distribution plots (histplot(), boxplot())

In [None]:
import seaborn as sns
df = sns.load_dataset('iris')
sns.boxplot(x='species', y='sepal_length', data=df)


# Practical Questions

Que1: How do you create a 2D NumPy array and calculate the sum of each row?

In [None]:
#Ans1:
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])
print(np.sum(a, axis=1))  # Sum of rows


Que2: Write a Pandas script to find the mean of a specific column in a DataFrame.

In [None]:
#Ans2:
import pandas as pd
df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]})
print(df['A'].mean())  # Mean of column A


Que3: Create a scatter plot using Matplotlib.

In [None]:
#Ans3:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
plt.scatter(x, y)
plt.show()


Que4: How do you calculate the correlation matrix using Seaborn and visualize it with a heatmap?

In [None]:
#Ans4:
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(5, 5), columns=['A', 'B', 'C', 'D', 'E'])
corr_matrix = df.corr()  # pandas computes the correlation matrix; Seaborn only draws it
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')


Que5: Generate a bar plot using Plotly.

In [None]:
#Ans5:
import pandas as pd
import plotly.express as px
df = pd.DataFrame({'Category': ['A', 'B', 'C'], 'Values': [10, 20, 30]})
fig = px.bar(df, x='Category', y='Values')
fig.show()


Que6: Create a DataFrame and add a new column based on an existing column.

In [None]:
#Ans6:
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'Diana'], 'Age': [25, 30, 28, 42]}
df = pd.DataFrame(data)

df['Age_Category'] = pd.cut(df['Age'], bins=[0, 25, 35, 50], labels=['Young', 'Mid-Aged', 'Senior'])

print(df)

Que7: Write a program to perform element-wise multiplication of two NumPy arrays.

In [None]:
#Ans7:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a * b)  # Element-wise multiplication


Que8: Create a line plot with multiple lines using Matplotlib.

In [None]:
#Ans8:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y1 = [10, 20, 25, 30]
y2 = [5, 15, 20, 25]
plt.plot(x, y1, label='Line 1')
plt.plot(x, y2, label='Line 2')
plt.legend()
plt.show()


Que9: Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold.

In [None]:
#Ans9:
import pandas as pd
df = pd.DataFrame({'A': [10, 20, 30, 40]})
filtered_df = df[df['A'] > 20]
print(filtered_df)
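Filters often combine several conditions. This sketch adds an illustrative column `B` and shows the same filter written with boolean operators and with `query()`.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30, 40], 'B': ['x', 'y', 'x', 'y']})

# Combine conditions with & (and) or | (or); each condition needs parentheses
both = df[(df['A'] > 15) & (df['B'] == 'x')]

# query() expresses the same filter as a string
same = df.query("A > 15 and B == 'x'")
print(both)
```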


Que10: Create a histogram using Seaborn to visualize a distribution.

In [None]:
#Ans10:
import seaborn as sns
import numpy as np
data = np.random.randn(100)
sns.histplot(data, bins=20, kde=True)


Que11: Perform matrix multiplication using NumPy.

In [None]:
#Ans11:
import numpy as np
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
print(np.dot(a, b))  # Matrix multiplication
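For 2-D arrays the `@` operator gives the same result as `np.dot`. The comments work through two entries by hand and contrast matrix multiplication with the element-wise `*`.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

product = a @ b  # same as np.dot(a, b) or np.matmul(a, b) for 2-D arrays
print(product)
# Entry (i, j) is row i of a dotted with column j of b, e.g.
# (0, 0): 1*5 + 2*7 = 19,   (1, 1): 3*6 + 4*8 = 50
# [[19 22]
#  [43 50]]

print(a * b)     # element-wise product [[5 12] [21 32]], NOT matrix multiplication
```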


Que12: Use Pandas to load a CSV file and display its first 5 rows.

In [None]:
#Ans12:
import pandas as pd
df = pd.read_csv('data.csv')  # assumes a file named data.csv in the working directory
print(df.head())
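To try `read_csv()` and `head()` without a file on disk, inline CSV text can be wrapped in `io.StringIO`. The names and scores below are made-up illustrative data.

```python
import io
import pandas as pd

# Inline CSV text standing in for a real file (illustrative data)
csv_text = """name,score
Alice,90
Bob,85
Charlie,78
Diana,92
Eve,88
Frank,70
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.head())  # first 5 of the 6 rows
print(df.shape)   # (6, 2)
```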


Que13: Create a 3D scatter plot using Plotly.

In [None]:
#Ans13:
import plotly.express as px
df = px.data.iris()
fig = px.scatter_3d(df, x='sepal_width', y='sepal_length', z='petal_width', color='species')
fig.show()
