
## Instructions for Seaborn Assignment

Welcome to the Seaborn practice exercise! In this assignment, you will use the Iris dataset to create various plots using Seaborn. Please follow these instructions carefully to complete your assignment:

1. **Understand the Dataset**: The dataset used in this assignment is the Iris dataset, which contains measurements of iris flowers from three different species. Familiarize yourself with the dataset’s structure and columns before you start.

2. **Read Each Question Carefully**: Each question is designed to test different Seaborn functionalities. Make sure you understand what is being asked before starting to code.

3. **Write Your Code in the Provided Cells**: For each question, there is a code cell where you should write your solution. Ensure that you use Seaborn functions to create your plots.

4. **Run Your Code**: After writing your code in each cell, run the cell to check if your plot is displayed as expected. Verify that the plots meet the requirements specified in the questions.

5. **Debug if Necessary**: If the plots do not appear as expected, review and debug your code. You may use additional code cells for debugging if needed, but make sure to remove or comment out these extra cells before submission.

6. **Complete All Questions**: Ensure that you answer all the questions provided. Each question tests different Seaborn skills and functionalities, so be sure to address each one.

7. **Review Your Work**: Before submitting, double-check that all plots are correctly labeled, and that your code runs without errors. Ensure that each plot is accurate and meets the requirements.

8. **Download Your Notebook**: Once you have completed all the questions and verified that your solutions are correct, download your notebook file (.ipynb). You can do this by selecting File > Download > Download .ipynb from the Google Colab menu.

9. **Submit Your Assignment**: Upload the downloaded .ipynb file to the designated learning platform for submission.

10. **Verify Submission**: Ensure that the file you uploaded is correct and not corrupted. If there are any issues with the file, you may need to resubmit.

---

Good luck with the assignment!

### Initial Setup
Run the code cell below to complete the initial setup.

In [1]:
# Import necessary libraries
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Load the Iris dataset using seaborn
data = sns.load_dataset('iris')

# Display the first few rows of the dataset
data.head()

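|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|--------------|-------------|--------------|-------------|---------|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |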
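
Instruction 1 asks you to get familiar with the dataset's structure before starting. The sketch below shows one way to do that. It uses a small hypothetical stand-in frame named `toy` (same column names as the Iris data, only a handful of illustrative rows) so that it runs without downloading anything; the same calls should work on `data`.

```python
import pandas as pd

# Hypothetical stand-in for the Iris data: same columns, illustrative rows only.
toy = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.4, 6.9, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.5, 4.9, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.5, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

shape = toy.shape                        # (number of rows, number of columns)
dtypes = toy.dtypes                      # data type of each column
counts = toy["species"].value_counts()   # number of rows per species
missing = toy.isna().sum()               # number of missing values per column

print(shape)
print(dtypes)
print(counts)
print(missing)
```

Checking the row count per species and the missing-value count first makes it easier to interpret the grouped results in the questions that follow.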


**Q1. Create a pivot table showing the mean sepal_length for each species and Petal Size Category ("Large Petals" if petal_length is above the median, otherwise "Small Petals").**

In [29]:
# Write the code here
# Step 1: Create the 'Petal Size Category' column
median_petal_length = data['petal_length'].median()
data['Petal Size Category'] = data['petal_length'].apply(lambda x: "Large Petals" if x > median_petal_length else "Small Petals")

# Step 2: Create the pivot table and display it
pivot_table = data.pivot_table(
    values='sepal_length',
    index='species',
    columns='Petal Size Category',
    aggfunc='mean'
)
print(pivot_table)

Petal Size Category  Large Petals  Small Petals
species                                        
setosa                        NaN         5.006
versicolor                  6.256         5.616
virginica                   6.588           NaN
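
The `NaN` cells in this pivot table most likely mean that no rows exist for that species and category combination, not that the mean is zero. The sketch below reproduces the effect on a hypothetical five-row frame (`toy`, with made-up values) and uses `pd.crosstab` to count the rows behind each cell.

```python
import pandas as pd

# Hypothetical toy data: setosa has no large petals, virginica has no small ones.
toy = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica"],
    "Petal Size Category": ["Small Petals", "Small Petals", "Small Petals",
                            "Large Petals", "Large Petals"],
    "sepal_length": [5.0, 5.2, 5.6, 6.2, 6.6],
})

pivot = toy.pivot_table(
    values="sepal_length",
    index="species",
    columns="Petal Size Category",
    aggfunc="mean",
)

# Count the rows behind each cell; a count of 0 lines up with a NaN mean.
counts = pd.crosstab(toy["species"], toy["Petal Size Category"])

print(pivot)
print(counts)
```

Leaving the empty cells as `NaN` is usually safer than filling them with 0, because a mean of 0 would wrongly suggest that measurements exist.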


**Q2. Identify the species where sepal_length and sepal_width have the highest combined mean.** Print the result in the given format:
- `Species with highest combined sepal length and width mean: _______`

In [2]:
# Write the code here
mean_values = data.groupby('species')[['sepal_length', 'sepal_width']].mean()
mean_values['combined_mean'] = mean_values['sepal_length'] + mean_values['sepal_width']
species_highest_sepal_sum = mean_values['combined_mean'].idxmax()
print(f"Species with highest combined sepal length and width mean: {species_highest_sepal_sum}")

Species with highest combined sepal length and width mean: virginica


**Q3. Find the percentage of flowers where petal_length is greater than sepal_width.** Print the result in the given format:
- `Percentage of flowers where petal length > sepal width: _______`

In [32]:
# Write the code here
percentage_petal_gt_sepal = data[data['petal_length'] > data['sepal_width']].shape[0] / len(data) * 100
print(f"Percentage of flowers where petal length > sepal width: {percentage_petal_gt_sepal:.2f}%")

Percentage of flowers where petal length > sepal width: 66.67%
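
A percentage like this can also be read straight off the boolean mask, because `True` counts as 1 and `False` as 0 when averaged. The sketch below compares the two approaches on a hypothetical four-row frame (`toy`, made-up values); it is an alternative formulation, not a correction.

```python
import pandas as pd

# Hypothetical toy data with the two columns used in the comparison.
toy = pd.DataFrame({
    "petal_length": [1.4, 1.5, 4.5, 5.1],
    "sepal_width": [3.5, 3.1, 3.2, 2.7],
})

# Boolean Series: True where petal_length exceeds sepal_width.
mask = toy["petal_length"] > toy["sepal_width"]

# Approach 1: filter the rows, count them, divide by the total.
pct_filter = toy[mask].shape[0] / len(toy) * 100

# Approach 2: the mean of a boolean Series is the fraction of True values.
pct_mean = mask.mean() * 100

print(f"{pct_filter:.2f}% vs {pct_mean:.2f}%")
```

Both approaches give the same number; the mask version avoids building the filtered frame.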


**Q4. Find the interquartile range (IQR) for sepal_length.** Print the result in the given format:
- `Interquartile Range (IQR) for Sepal Length: __________`

In [4]:
# Write the code here
Q1 = data.sepal_length.quantile(0.25)
Q3 = data.sepal_length.quantile(0.75)
IQR = Q3-Q1
print(f"Interquartile Range (IQR) for Sepal Length: {IQR}")

Interquartile Range (IQR) for Sepal Length: 1.3000000000000007
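
The trailing digits in this output are ordinary binary floating-point noise, not a property of the data. The sketch below assumes the quartiles are 5.1 and 6.4 (values chosen because their difference matches the printed result) and shows two common ways to present the IQR more cleanly.

```python
# Assumed quartile values, chosen to be consistent with the output above.
q1, q3 = 5.1, 6.4

iqr = q3 - q1
print(iqr)  # the raw difference is not exactly 1.3 in binary floating point

# Option 1: format only when printing; the stored value is unchanged.
formatted = f"{iqr:.2f}"
print(f"Interquartile Range (IQR) for Sepal Length: {formatted}")

# Option 2: round the number itself before reusing it.
rounded = round(iqr, 2)
print(rounded)
```

Formatting at print time keeps full precision for any later calculation, which is usually the safer default.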


**Q5. Compute the ratio of sepal_length to sepal_width and find the species with the highest average ratio.** Print the result in the given format:
- `Species with highest average Sepal Ratio: _______`

In [5]:
# Write the code here
data['ratio'] = data['sepal_length'] / data['sepal_width']
species_highest_sepal_ratio = data.groupby('species')['ratio'].mean().idxmax()
print(f"Species with highest average Sepal Ratio: {species_highest_sepal_ratio}")

Species with highest average Sepal Ratio: virginica


**Q6. You need to analyze the `petal_length`, group the data by the `species`, then find the three highest `petal_length` values within each species group. Display the result showing the species name, the corresponding row indices, and their petal lengths.**
```
species        
setosa      N     N
            N     N
            N     N
versicolor  N     N
            N     N
            N     N
virginica   N     N
            N     N
            N     N
Name: _____, dtype: float64
```

In [44]:
# Write the code here

top_3_petal_length = data.groupby('species')['petal_length'].nlargest(3)
print(top_3_petal_length)

species        
setosa      24     1.9
            44     1.9
            5      1.7
versicolor  83     5.1
            77     5.0
            52     4.9
virginica   118    6.9
            117    6.7
            122    6.7
Name: petal_length, dtype: float64
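
When several rows tie for the last place, `nlargest` keeps the ones that appear first by default, so other rows with the same value are dropped without notice. The sketch below uses a hypothetical frame (`toy`, made-up values and species labels `a` and `b`) to compare the default with `keep="all"`, which retains every tied row.

```python
import pandas as pd

# Hypothetical toy data: species "a" has a tie (1.7) at the third-place cutoff.
toy = pd.DataFrame({
    "species": ["a", "a", "a", "a", "a", "b", "b", "b"],
    "petal_length": [1.9, 1.7, 1.7, 1.9, 1.5, 5.0, 4.8, 4.9],
})

grouped = toy.groupby("species")["petal_length"]

# Default keep="first": exactly three rows per group, ties broken by position.
top3_first = grouped.nlargest(3)

# keep="all": rows tied with the third-largest value are all retained.
top3_all = grouped.nlargest(3, keep="all")

print(top3_first)
print(top3_all)
```

With `keep="all"` a group can return more than three rows, so the default is the better fit when an exact count per group is required.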


**Q7. Find the percentage of flowers where sepal_length is greater than twice the petal_length, to 2 decimal places.** Print the result in the given format:

- `Percentage of flowers where sepal length > 2x petal length: ______`

In [37]:
# Write the code here
percentage_sepal_gt_2x_petal = data[data['sepal_length'] > 2 * data['petal_length']].shape[0] / len(data) * 100
print(f"Percentage of flowers where sepal length > 2x petal length: {percentage_sepal_gt_2x_petal:.2f}%")

Percentage of flowers where sepal length > 2x petal length: 33.33%


**Q8. Find the species with the lowest mean sepal_length and the species with the highest mean petal_length.** Print the results in the given format:
- `Species with lowest mean sepal length: ________`
- `Species with highest mean petal length: ________`

In [25]:
# Write the code here

lowest_sepal_species = data.groupby('species')['sepal_length'].mean().idxmin()
highest_petal_species = data.groupby('species')['petal_length'].mean().idxmax()
print(f"Species with lowest mean sepal length: {lowest_sepal_species}")
print(f"Species with highest mean petal length: {highest_petal_species}")

Species with lowest mean sepal length: setosa
Species with highest mean petal length: virginica


**Q9. Create a new feature Sepal Area (sepal_length × sepal_width) and find the species with the highest average sepal area.** Print the result in the given format:
- `Species with highest average Sepal Area: ________`

In [26]:
# Write the code here
data['Sepal Area'] = data['sepal_length'] * data['sepal_width']
species_highest_sepal_area = data.groupby('species')['Sepal Area'].mean().idxmax()
print(f"Species with highest average Sepal Area: {species_highest_sepal_area}")

Species with highest average Sepal Area: virginica


**Q10. Identify the row with the maximum petal_length and display it.**

In [45]:
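# Write the code here
# Recompute the derived columns from Q1, Q5, and Q9. They are not needed to
# find the row, but they appear in the printed output.
median_petal_length = data["petal_length"].median()
data["Petal Size Category"] = data["petal_length"].apply(lambda x: "Large Petals" if x > median_petal_length else "Small Petals")

# Same values as the 'ratio' column from Q5, stored under a more descriptive name.
data["Sepal Ratio"] = data["sepal_length"] / data["sepal_width"]

data["Sepal Area"] = data["sepal_length"] * data["sepal_width"]

# idxmax() returns the index label of the first row with the largest petal_length.
max_petal_length_row = data.loc[data["petal_length"].idxmax()]
print(max_petal_length_row)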

sepal_length                    7.7
sepal_width                     2.6
petal_length                    6.9
petal_width                     2.3
species                   virginica
ratio                      2.961538
Petal Size Category    Large Petals
Sepal Area                    20.02
Sepal Ratio                2.961538
Name: 118, dtype: object


**Q11. Find the species with the lowest interquartile range (IQR) for sepal_width.** Print the result in the given format:
- `Species with lowest IQR for sepal width: ______`

In [42]:
# Write your code here
# Group once, then subtract the 25th percentile from the 75th for each species
grouped_sepal_width = data.groupby('species')['sepal_width']
iqr_sepal_width = grouped_sepal_width.quantile(0.75) - grouped_sepal_width.quantile(0.25)

species_lowest_iqr = iqr_sepal_width.idxmin()
print(f"Species with lowest IQR for sepal width: {species_lowest_iqr}")

Species with lowest IQR for sepal width: virginica
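
Both quartiles can also be requested in a single `quantile` call and then spread into columns, which makes the per-species values easy to inspect before subtracting. The sketch below is one possible variant on a hypothetical frame (`toy`, made-up values and species labels `a` and `b`).

```python
import pandas as pd

# Hypothetical toy data: species "a" is tightly clustered, "b" is more spread out.
toy = pd.DataFrame({
    "species": ["a"] * 4 + ["b"] * 4,
    "sepal_width": [3.0, 3.2, 3.4, 3.6, 2.0, 2.5, 3.0, 3.5],
})

# One call returns both quartiles per species; unstack() moves the quantile
# level (0.25 and 0.75) into columns, one row per species.
quartiles = toy.groupby("species")["sepal_width"].quantile([0.25, 0.75]).unstack()

# IQR per species, then the label of the species with the smallest spread.
iqr = quartiles[0.75] - quartiles[0.25]
lowest = iqr.idxmin()

print(quartiles)
print(f"Species with lowest IQR for sepal width: {lowest}")
```

Printing `quartiles` before subtracting gives a quick sanity check that each species has sensible 25th and 75th percentile values.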


# End!