Questions on the topics covered so far:

1. NumPy
2. Pandas
3. Matplotlib
4. Seaborn

Q1. Seaborn and Pandas - Create a Scatter Plot.

Load the iris dataset using Seaborn and create a scatter plot to visualize the relationship between sepal_length and sepal_width. Color the points based on the species.

In [None]:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Load iris dataset
iris = sns.load_dataset('iris')

# Create scatter plot
plt.figure(figsize=(10, 6))
sns.scatterplot(data=iris, x='sepal_length', y='sepal_width', hue='species')
plt.title('Scatter Plot of Sepal Length vs Sepal Width')
plt.show()

Q2. Matplotlib and NumPy - Plot a Sine Wave.

Generate an array of values from 0 to 2π using NumPy and plot a sine wave using Matplotlib.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Create x-axis values covering one full period
x = np.linspace(0, 2 * np.pi, 200)  # 200 evenly spaced values from 0 to 2π, endpoint included

# Create y-axis values (sine wave)
y = np.sin(x)

# Plot the sine wave
plt.plot(x, y)

# Add labels and title
plt.xlabel("X-axis")
plt.ylabel("Sine(X)")
plt.title("Sine Wave Plot")

# Display the plot
plt.show()
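
One detail worth checking when generating the x values: `np.arange` excludes its stop value, while `np.linspace` includes it by default. The short sketch below compares the two. The step size 0.1 and the point count 100 are arbitrary choices made for illustration.

```python
import numpy as np

# np.arange stops before the stop value, so 2π itself is never reached
x_arange = np.arange(0, 2 * np.pi, 0.1)

# np.linspace includes the endpoint by default
x_linspace = np.linspace(0, 2 * np.pi, 100)

print(x_arange[-1])    # last value is below 2π
print(x_linspace[-1])  # last value equals 2π (up to floating point)
```

For a plot that should end exactly at 2π, `np.linspace` is usually the more convenient choice.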


Q3. Pandas - Handling Missing Data.

Load a dataset with missing values and demonstrate how to handle missing data by filling the null values with the mean of the column.

In [None]:
import pandas as pd
import numpy as np

# Sample data with missing values
data = {'Age': [25, 30, np.nan, 35], 'Name': ['Alice', 'Bob', None, 'David']}
df = pd.DataFrame(data)

# Check for missing values (count)
missing_counts = df.isnull().sum()
print(missing_counts)

# Fill missing values with the mean (Age) and the most frequent value (Name)
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Name'] = df['Name'].fillna(df['Name'].mode()[0])  # mode() can return several values; [0] takes the first

# Print modified DataFrame (showing missing values handled)
print(df)
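
The same mean-filling idea can be applied to several numeric columns at once. The sketch below is one possible approach, not the only one. The `Score` column and all of its values are hypothetical and exist only to give a second numeric column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with gaps in two numeric columns
df = pd.DataFrame({
    'Age': [25, 30, np.nan, 35],
    'Score': [80.0, np.nan, 70.0, 90.0],
})

# Column means are computed over the non-missing values only
column_means = df.mean(numeric_only=True)

# Passing a Series to fillna fills each column with its matching value
filled = df.fillna(column_means)
print(filled)
```

Passing a Series keyed by column name lets `fillna` use a different fill value per column in a single call.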


Q4. Seaborn and Matplotlib - Create a Box Plot

Load the tips dataset from Seaborn and create a box plot to show the distribution of total bills for each day of the week.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Load tips dataset
tips = sns.load_dataset('tips')

# Create the box plot of total bill for each day using Seaborn
plt.figure(figsize=(10, 6))
sns.boxplot(data=tips, x='day', y='total_bill')

# Customize the plot using Matplotlib (optional)
plt.title("Distribution of Total Bill by Day")
plt.xlabel("Day")
plt.ylabel("Total Bill")
plt.grid(True, axis='y')  # Add horizontal grid lines for better readability

# Display the plot
plt.show()


Q5. Pandas and NumPy - Data Manipulation

Create a DataFrame with random values and add a new column that is the square of an existing column.

In [None]:
import pandas as pd
import numpy as np

# Create DataFrame with random values
df = pd.DataFrame({
    'A': np.random.rand(10),
    'B': np.random.rand(10)
})

# Add new column that is the square of column A
df['A_squared'] = df['A'] ** 2
print(df)
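
Because the values are random, the output changes on every run. A possible variant, sketched below, uses a seeded generator so the results are reproducible and `DataFrame.assign` to add the new column. The seed value 0 is an arbitrary assumption.

```python
import numpy as np
import pandas as pd

# Seeded generator so the same values appear on every run (seed 0 is arbitrary)
rng = np.random.default_rng(seed=0)

df = pd.DataFrame({
    'A': rng.random(10),
    'B': rng.random(10),
})

# assign returns a new DataFrame with the extra column appended
df = df.assign(A_squared=df['A'] ** 2)
print(df)
```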

Q6. Seaborn and Matplotlib - Create a Histogram

Load the diamonds dataset from Seaborn and create a histogram to visualize the distribution of diamond prices.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Load diamonds dataset
diamonds = sns.load_dataset('diamonds')

# Create the histogram of diamond prices using Seaborn
plt.figure(figsize=(10, 6))
sns.histplot(data=diamonds, x='price', bins=50)

# Optional customizations with Matplotlib
plt.xlabel("Price (USD)")
plt.ylabel("Frequency")
plt.title("Distribution of Diamond Prices")
plt.grid(True)  # Add grid lines for better readability

# Display the plot
plt.show()


Q7. Pandas - Grouping and Aggregation

Load the titanic dataset from Seaborn and calculate the average age of passengers for each class.

In [None]:
import seaborn as sns
import pandas as pd

# Load titanic dataset
titanic = sns.load_dataset('titanic')

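# Calculate average age of passengers for each class
# ('class' is categorical; observed=True keeps only the classes present in the data)
average_age = titanic.groupby('class', observed=True)['age'].mean()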
print(average_age)
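
To see how grouping and aggregation behave without downloading a dataset, the sketch below uses a small hand-made table. The `passengers` DataFrame and all of its values are invented for illustration and are not taken from the titanic data. It also shows that `mean` and `count` skip missing ages.

```python
import pandas as pd

# Hypothetical miniature passenger table (values invented for illustration)
passengers = pd.DataFrame({
    'class': ['First', 'First', 'Second', 'Third', 'Third'],
    'age': [40.0, 50.0, 30.0, 20.0, None],
})

# Compute several statistics per class in one call; missing ages are ignored
summary = passengers.groupby('class')['age'].agg(['mean', 'count'])
print(summary)
```

Reporting the count next to the mean makes it clear how many non-missing ages each average is based on.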

Q8. Matplotlib and NumPy - Multiple Subplots

Create a 2x2 grid of subplots using Matplotlib, each showing a different mathematical function (sine, cosine, tangent, and exponential).

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# x values from 0 to 2π
x = np.linspace(0, 2 * np.pi, 200)

# Create a figure with a 2x2 grid of subplots
fig, axs = plt.subplots(2, 2, figsize=(10, 8))

# Top-left: sine
axs[0, 0].plot(x, np.sin(x))
axs[0, 0].set_title('Sine')

# Top-right: cosine
axs[0, 1].plot(x, np.cos(x))
axs[0, 1].set_title('Cosine')

# Bottom-left: tangent (y-axis clipped because tan(x) diverges near odd multiples of π/2)
axs[1, 0].plot(x, np.tan(x))
axs[1, 0].set_ylim(-10, 10)
axs[1, 0].set_title('Tangent')

# Bottom-right: exponential
axs[1, 1].plot(x, np.exp(x))
axs[1, 1].set_title('Exponential')

# Set a common title for all subplots and avoid overlapping labels
fig.suptitle('Sine, Cosine, Tangent, and Exponential')
fig.tight_layout()

# Display the plot
plt.show()
