
# Instructions for Matplotlib Assignment

Welcome to the Matplotlib practice exercise! In this assignment, you will use the Iris dataset to practice creating various types of plots. Please follow these instructions carefully to complete your assignment:

1. **Understand the Dataset**: The dataset used in this assignment is the Iris dataset, which contains measurements of iris flowers from three different species. Familiarize yourself with the dataset’s structure and columns before starting.

2. **Read Each Question Carefully**: Each question in this assignment is designed to test different Matplotlib functionalities. Make sure to understand what is being asked before you start coding.

3. **Write Your Code in the Provided Cells**: For each question, a code cell is provided where you should write your solution. Do not modify any other parts of the notebook.

4. **Run Your Code**: After writing your code, run each cell to ensure that your solution works as expected. Verify that the plots match the requirements specified in the question.

5. **Debug if Necessary**: If your plots do not appear as expected, review your code and make any necessary corrections. You may use additional code cells for debugging, but make sure to remove or comment out these extra cells before submission.

6. **Complete All Questions**: Ensure that you address all the questions provided. Each question tests a different aspect of Matplotlib, so be sure to complete each one.

7. **Review Your Work**: Before submitting, double-check that all plots are correctly labeled and that the code runs without errors. Ensure that each plot is displayed as required.

8. **Download Your Notebook**: Once you have completed all the questions and verified that your solutions are correct, download your notebook file (.ipynb). You can do this by selecting `File` > `Download` > `Download .ipynb` from the Google Colab menu.

9. **Submit Your Assignment**: Upload the downloaded .ipynb file to the designated learning platform for submission.

10. **Verify Submission**: Ensure that you have uploaded the correct file and that it is not corrupted. If there are any issues with the file, you may need to resubmit.

---

Good luck with the assignment!

### Initial Setup
Run the code cell below for the initial setup.

In [52]:
# Import necessary libraries
import seaborn as sns
import pandas as pd

# Load the Iris dataset using seaborn
data = sns.load_dataset('iris')

# Display the first few rows of the dataset
data.head()

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
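
Instruction 1 asks you to get familiar with the dataset's structure before starting. A minimal sketch of what that inspection might look like follows. It uses a small hand-built frame named `toy` (a hypothetical stand-in with the same column names, so the snippet runs without downloading anything); in the notebook, the same calls would apply to `data`.

```python
import pandas as pd

# Hypothetical stand-in for the Iris frame: same column names, only six rows.
# In the notebook, run these calls on `data` instead.
toy = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

n_rows, n_cols = toy.shape                        # (rows, columns)
column_types = toy.dtypes                         # dtype of each column
species_counts = toy["species"].value_counts()    # samples per species
missing_per_column = toy.isna().sum()             # missing values per column

print(n_rows, n_cols)
print(column_types)
print(species_counts)
print(missing_per_column)
```

Checking the shape, dtypes, class balance, and missing values first makes it easier to spot a wrong column name or an unexpected type later on.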


Q1. What are the minimum and maximum values of petal_width for each species?


In [53]:
# Calculate the min and max values of petal_width for each species
# Write the code here
petal_width_min_max = data.groupby('species')['petal_width'].agg(['min', 'max'])
print(petal_width_min_max)

            min  max
species             
setosa      0.1  0.6
versicolor  1.0  1.8
virginica   1.4  2.5


Q2. For each species, calculate the range (maximum - minimum) of the sepal_width

In [54]:
# Calculate the range of sepal_width for each species
# Write the code here
sepal_width_range = data.groupby('species')['sepal_width'].agg(['min', 'max'])
sepal_width_range['range'] = sepal_width_range['max'] - sepal_width_range['min']
print(sepal_width_range[['range']])

            range
species          
setosa        2.1
versicolor    1.4
virginica     1.6
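
The range (maximum minus minimum) can also be computed in a single aggregation step. The sketch below shows one way to do that with a lambda, on a tiny made-up frame named `toy` (hypothetical values and group labels, chosen so the arithmetic is easy to check by hand).

```python
import pandas as pd

# Hypothetical toy data, not the Iris measurements.
toy = pd.DataFrame({
    "species": ["a", "a", "a", "b", "b", "b"],
    "sepal_width": [3.0, 3.5, 2.5, 2.0, 2.25, 2.5],
})

# Range = max - min, computed per group in one aggregation call.
width_range = toy.groupby("species")["sepal_width"].agg(lambda s: s.max() - s.min())
print(width_range)
```

The two-step version with `agg(['min', 'max'])` keeps the intermediate columns around, which can be handy for checking; the lambda version is shorter when only the range is needed.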


Q3. Calculate the average sepal length for each species

In [55]:
# Write the code here
avg_sepal_length = data.groupby('species')['sepal_length'].mean()
print(avg_sepal_length)

species
setosa        5.006
versicolor    5.936
virginica     6.588
Name: sepal_length, dtype: float64


Q4. Count the number of samples with sepal length > 5.0 cm

In [56]:
# Write the code here
sepal_length_above_5 = data[data['sepal_length'] > 5.0].shape[0]
print(sepal_length_above_5)

118


Q5. Determine how many species have average sepal width > 2.5 cm

In [57]:
# Write the code here
avg_sepal_width = data.groupby('species')['sepal_width'].mean()
# Count the species whose average sepal width exceeds 2.5 cm
species_above_2_5cm = avg_sepal_width[avg_sepal_width > 2.5].count()
print(species_above_2_5cm)

3


Q6. Compute the standard deviation of sepal length for 'virginica'. Print it in the format `0.11223... cm`, rounded to 15 decimal places

In [58]:
# Write the code here
virginica_sepal_length_std = data[data['species'] == 'virginica']['sepal_length'].std()
# No space in the format spec, so the output has no leading blank
print(f"{virginica_sepal_length_std:.15f} cm")

0.635879593274432 cm
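
It is worth knowing which standard deviation this is. pandas `Series.std()` defaults to the sample standard deviation (`ddof=1`, dividing by n - 1), while NumPy's `np.std` defaults to the population version (`ddof=0`). The sketch below illustrates the difference on a short made-up series (hypothetical values, not Iris data).

```python
import numpy as np
import pandas as pd

# Hypothetical values whose population standard deviation is exactly 2.
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sample_std = values.std()                   # pandas default: ddof=1 (divide by n - 1)
population_std = values.std(ddof=0)         # divide by n
numpy_default = np.std(values.to_numpy())   # NumPy default: ddof=0

print(sample_std, population_std, numpy_default)
```

If a reference answer was produced with NumPy defaults, the two results will differ slightly, so it helps to state which `ddof` is intended.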


Q7. Calculate the correlation coefficient between sepal_length and petal_length

In [59]:
# Calculate the correlation coefficient
# Write the code here
correlation = data['sepal_length'].corr(data['petal_length'])
print(correlation)

0.8717537758865831
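
`Series.corr` uses the Pearson coefficient by default. As a sanity check, the same number can be reproduced from its definition: the sum of products of deviations divided by the square root of the product of the summed squared deviations. The sketch below does this on two short made-up series (hypothetical values).

```python
import numpy as np
import pandas as pd

# Hypothetical values, chosen so the result is easy to verify by hand.
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = pd.Series([2.0, 4.0, 5.0, 4.0, 5.0])

dx = x - x.mean()   # deviations from the mean of x
dy = y - y.mean()   # deviations from the mean of y

# Pearson r computed from the definition
manual_r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
pandas_r = x.corr(y)   # default method is "pearson"

print(manual_r, pandas_r)
```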


Q8. Calculate the percentage of samples with petal length between 1.5 cm and 4.5 cm

In [60]:
# Write the code here
count_in_range = data[(data['petal_length'] >= 1.5) & (data['petal_length'] <= 4.5)].shape[0]

percentage = (count_in_range / data.shape[0]) * 100
print(percentage)


42.0
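
The word "between" leaves open whether the endpoints count. The solution above includes both 1.5 and 4.5. A possible alternative is `Series.between`, which makes that choice explicit; the sketch below assumes pandas 1.3 or newer, where `inclusive` accepts the strings `"both"`, `"neither"`, `"left"`, and `"right"`. The values are made up for illustration.

```python
import pandas as pd

# Hypothetical petal lengths, including values exactly on the boundaries.
petal_length = pd.Series([1.4, 1.5, 3.0, 4.5, 4.6, 6.0])

# Default: both endpoints are included.
inclusive_mask = petal_length.between(1.5, 4.5)
# Exclude both endpoints instead.
exclusive_mask = petal_length.between(1.5, 4.5, inclusive="neither")

# The mean of a boolean mask is the fraction of True values.
pct_inclusive = inclusive_mask.mean() * 100
pct_exclusive = exclusive_mask.mean() * 100

print(pct_inclusive, pct_exclusive)
```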


Q9. Group data by species and calculate the max **petal length** for each species

In [61]:
# Group data by species and calculate the max petal length for each species
# Write the code here
petal_length_by_species = data.groupby('species')['petal_length'].max()
print(petal_length_by_species)

species
setosa        1.9
versicolor    5.1
virginica     6.9
Name: petal_length, dtype: float64


Q10. What is the interquartile range (IQR) for the **petal_length** of the Iris dataset?

In [62]:
# Calculate the IQR for petal_length
# Write the code here
petal_length_q1 = data['petal_length'].quantile(0.25)
petal_length_q3 = data['petal_length'].quantile(0.75)
petal_length_iqr = petal_length_q3 - petal_length_q1
print(petal_length_iqr)

3.4999999999999996
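
The printed value is 3.5 up to binary floating-point rounding. The sketch below shows how the two quartiles can be requested in one call, and how a result like the one above might be compared or rounded safely. The series is made up, and `raw_iqr` simply copies the number printed by the cell above.

```python
import math
import pandas as pd

# Hypothetical values; with nine evenly spaced points the quartiles are exact.
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])

# Passing a list returns both quantiles at once.
q1, q3 = values.quantile([0.25, 0.75])
iqr = q3 - q1
print(iqr)

# The value printed in the notebook cell above.
raw_iqr = 3.4999999999999996
print(raw_iqr == 3.5)               # exact comparison fails
print(math.isclose(raw_iqr, 3.5))   # tolerance-based comparison succeeds
print(round(raw_iqr, 2))            # rounding for display
```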


Q11. For each species, calculate the difference between the maximum and minimum values of **sepal_width**. Identify the species with the largest difference.
- Print the result as a tuple in the following format, without rounding:
Example Output - ('string', numeric value)

In [63]:
# Calculate the difference between the max and min sepal_width for each species
sepal_width_diff = data.groupby('species')['sepal_width'].agg(['max', 'min'])
sepal_width_diff['diff'] = sepal_width_diff['max'] - sepal_width_diff['min']
# Identify the species with the largest difference
largest_diff_species = sepal_width_diff['diff'].idxmax()
# float() converts the NumPy scalar so the tuple prints as ('string', numeric value)
print((largest_diff_species, float(sepal_width_diff.loc[largest_diff_species, 'diff'])))

('setosa', 2.1000000000000005)


Q12. Find the species with the highest ratio of petal_length to petal_width. Provide the species name and the ratio.

In [64]:
# Calculate the ratio of petal_length to petal_width for each row
data['petal_ratio'] = data['petal_length'] / data['petal_width']

# Average the per-row ratio within each species (computed once and reused)
mean_petal_ratio = data.groupby('species')['petal_ratio'].mean()

# Report the species with the highest average ratio, together with that ratio
highest_petal_ratio_species = mean_petal_ratio.idxmax()
print((highest_petal_ratio_species, float(mean_petal_ratio.max())))

('setosa', 6.9079999999999995)
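
The question does not say how the per-species ratio should be defined. The solution above averages the per-row ratios. Another reasonable reading divides the mean petal length by the mean petal width, and the two can disagree. The sketch below shows this on a tiny made-up frame named `toy` (hypothetical values and group labels); it also avoids adding a column to the original frame.

```python
import pandas as pd

# Hypothetical toy data, chosen so the two definitions pick different groups.
toy = pd.DataFrame({
    "species": ["a", "a", "b", "b"],
    "petal_length": [1.0, 3.0, 4.0, 6.0],
    "petal_width": [0.5, 1.0, 1.0, 4.0],
})

# Definition 1: average of the per-row ratios.
row_ratio = toy["petal_length"] / toy["petal_width"]
mean_of_ratios = row_ratio.groupby(toy["species"]).mean()

# Definition 2: ratio of the per-group means.
group_means = toy.groupby("species")[["petal_length", "petal_width"]].mean()
ratio_of_means = group_means["petal_length"] / group_means["petal_width"]

print(mean_of_ratios.idxmax(), ratio_of_means.idxmax())
```

Here the first definition selects group `b` and the second selects group `a`, so it is worth stating which definition an answer uses.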


# End!