
# Seaborn Assignment

For this assignment, please use Seaborn to generate the required visualizations. Ensure that you create and customize the plots as specified in the notebook. Submit your notebook with all visualizations properly displayed.

## **Instructions for CO2 Emissions Canada Dataset Assignment**  

Welcome to the **CO2 Emissions Data Analysis** assignment! In this assignment, you will use the **CO2 Emissions Canada dataset** to explore vehicle emissions and fuel consumption using **Seaborn** for data visualization. Please follow these instructions carefully to complete your assignment:

---

**1. Understand the Dataset**:
- The dataset contains information on various vehicle models, including their **fuel consumption, engine size, transmission type, and CO2 emissions**.
- Familiarize yourself with the dataset’s **columns and structure** before starting.

**2. Read Each Question Carefully**:
- Each question is designed to test your ability to **visualize and analyze CO2 emissions** using Seaborn. Ensure that you **understand the requirements** before coding.

**3. Write Your Code in the Provided Cells**:
- For each question, a **code cell** is provided. Write your Seaborn-based visualization code in the designated cells.

**4. Run Your Code**:
- After writing your code, **run the cell** to check if your visualization is displayed correctly. Ensure that your plots meet the requirements specified in each question.

**5. Debug if Necessary**: If the plots do not appear as expected:
   - Review your code for errors.
   - Use additional code cells for debugging if needed.
   - Ensure your dataset is properly loaded and processed.

**6. Complete All Questions**:
- Ensure you answer **all questions** provided, as each one tests different **Seaborn functionalities** and **data visualization techniques**.

**7. Review Your Work**: Before submission, check that:
   - All plots are **correctly labeled**.
   - The visualizations are **clear and meaningful**.
   - Your code runs without errors.

**8. Download Your Notebook**:
- Once completed, **download your notebook file (`.ipynb`)** by navigating to: **File > Download > Download `.ipynb`**

**9. Submit Your Assignment**:
- Upload the **downloaded `.ipynb` file** to the designated learning platform.

**10. Verify Submission**:
Double-check your submission to ensure:
   - The uploaded file is correct.
   - There are no **missing plots or errors**.
   - If any issues arise, re-upload the correct file.

---

**Good luck with your CO2 Emissions Analysis!**

```python
# Run this code cell
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

url = 'https://new-assets.ccbp.in/frontend/content/aiml/classical-ml/co2_emissions_canada.csv'

# Read the CSV straight from the URL (no manual upload to Colab is needed)
data = pd.read_csv(url)
data.head()
```

| | Make | Model | Vehicle Class | Engine Size(L) | Cylinders | Transmission | Fuel Type | Fuel Consumption City (L/100 km) | Fuel Consumption Hwy (L/100 km) | Fuel Consumption Comb (L/100 km) | Fuel Consumption Comb (mpg) | CO2 Emissions(g/km) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ACURA | ILX | COMPACT | 2.0 | 4 | AS5 | Z | 9.9 | 6.7 | 8.5 | 33 | 196 |
| 1 | ACURA | ILX | COMPACT | 2.4 | 4 | M6 | Z | 11.2 | 7.7 | 9.6 | 29 | 221 |
| 2 | ACURA | ILX HYBRID | COMPACT | 1.5 | 4 | AV7 | Z | 6.0 | 5.8 | 5.9 | 48 | 136 |
| 3 | ACURA | MDX 4WD | SUV - SMALL | 3.5 | 6 | AS6 | Z | 12.7 | 9.1 | 11.1 | 25 | 255 |
| 4 | ACURA | RDX AWD | SUV - SMALL | 3.5 | 6 | AS6 | Z | 12.1 | 8.7 | 10.6 | 27 | 244 |


**1. Find the top 3 vehicles with the highest CO2 emissions for each vehicle class.**

```python
# Step 1: Sort by Vehicle Class (A to Z), then CO2 Emissions (highest first)
data_sorted = data.sort_values(
    by=['Vehicle Class', 'CO2 Emissions(g/km)'],
    ascending=[True, False],
    kind='mergesort'  # note: pandas applies `kind` only when sorting on a single column
)

# Step 2: Take top 3 CO2 emitters per class (preserve group as multi-index)
top_3_co2_emissions = (
    data_sorted.groupby('Vehicle Class')['CO2 Emissions(g/km)']
    .apply(lambda x: x.head(3))
)

# Step 3: Print
print(top_3_co2_emissions)
```

```
Vehicle Class
COMPACT                   3181    404
                          3180    403
                          4253    401
FULL-SIZE                 3179    404
                          3178    403
                          4250    400
MID-SIZE                  5511    465
                          75      437
                          3376    430
MINICOMPACT               2212    365
                          13      359
                          19      359
MINIVAN                   2858    296
                          3945    296
                          5012    296
PICKUP TRUCK - SMALL      3503    331
                          4563    331
                          5626    331
PICKUP TRUCK - STANDARD   391     414
                          374     402
                          864     398
SPECIAL PURPOSE VEHICLE   2922    298
                          3995    298
                          5070    298
STATION WAGON - MID-SIZE  7285    386
```

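The cell above returns only the CO2 values, keyed by row number, so it does not say which vehicles they are. One alternative, sketched here on the first five rows of `data.head()` (held in `sample` so the notebook's `data` is not overwritten), is to sort once and take `groupby(...).head(n)`, which keeps whole rows including Make and Model. The helper name `top_n_per_class` is hypothetical.

```python
import pandas as pd

# First five rows copied from the data.head() output above
sample = pd.DataFrame({
    'Make': ['ACURA'] * 5,
    'Model': ['ILX', 'ILX', 'ILX HYBRID', 'MDX 4WD', 'RDX AWD'],
    'Vehicle Class': ['COMPACT', 'COMPACT', 'COMPACT', 'SUV - SMALL', 'SUV - SMALL'],
    'CO2 Emissions(g/km)': [196, 221, 136, 255, 244],
})

def top_n_per_class(df, n=3):
    """Hypothetical helper: the n highest-emitting rows per class, with Make/Model kept."""
    ordered = df.sort_values(
        ['Vehicle Class', 'CO2 Emissions(g/km)'],
        ascending=[True, False],  # class A to Z, emissions highest first
    )
    # groupby(...).head(n) returns whole rows, not just one column
    return ordered.groupby('Vehicle Class').head(n)[
        ['Vehicle Class', 'Make', 'Model', 'CO2 Emissions(g/km)']
    ]

top3 = top_n_per_class(sample)
print(top3)
```

Applied to the full `data`, the same call would list the Make and Model behind each of the values printed above.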
**2. Find the top 3 car makes with the highest average CO2 emissions.**

```python
# Write the code here

top_3_makes = data.groupby('Make')['CO2 Emissions(g/km)'].mean().sort_values(ascending=False).head(3)
print(top_3_makes)
```

```
Make
BUGATTI        522.000000
LAMBORGHINI    400.780488
SRT            389.000000
Name: CO2 Emissions(g/km), dtype: float64
```

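A make's average can rest on very few listings, so it can help to show the row count next to the mean before reading the top 3 as a ranking. A minimal sketch with hypothetical makes and made-up values (not from the dataset); the minimum count of 2 is an arbitrary illustration.

```python
import pandas as pd

# Hypothetical makes and made-up values, for illustration only
sample = pd.DataFrame({
    'Make': ['MAKE_A', 'MAKE_A', 'MAKE_A', 'MAKE_B', 'MAKE_C', 'MAKE_C'],
    'CO2 Emissions(g/km)': [250, 270, 260, 500, 300, 320],
})

# Mean and number of rows per make, ranked by the mean
make_stats = (
    sample.groupby('Make')['CO2 Emissions(g/km)']
    .agg(['mean', 'count'])
    .sort_values('mean', ascending=False)
)
print(make_stats.head(3))

# Optionally ignore makes with fewer than 2 rows (arbitrary cut-off)
reliable = make_stats[make_stats['count'] >= 2]
print(reliable.head(3))
```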

**3. Identify the fuel type that has the lowest average CO2 emissions.** In the given format:
- `Fuel type with lowest average CO2 emissions: N`

```python
# Write the code here

lowest_emission_fuel = data.groupby('Fuel Type')['CO2 Emissions(g/km)'].mean().idxmin()
print(f"Fuel type with lowest average CO2 emissions: {lowest_emission_fuel}")
```

```
Fuel type with lowest average CO2 emissions: N
```

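The fuel type is stored as a one-letter code. Assuming the codes follow the legend published with Natural Resources Canada's Fuel Consumption Ratings (X regular gasoline, Z premium gasoline, D diesel, E ethanol E85, N natural gas), a lookup dict can turn the `idxmin()` result into a readable label. The mapping is an assumption to verify against the dataset's documentation; the means below are the ones printed under question 9.

```python
import pandas as pd

# Assumed legend for the one-letter fuel codes; verify against the dataset's documentation
FUEL_LABELS = {
    'X': 'Regular gasoline',
    'Z': 'Premium gasoline',
    'D': 'Diesel',
    'E': 'Ethanol (E85)',
    'N': 'Natural gas',
}

# Per-fuel mean CO2 (g/km) as printed in the question 9 output
fuel_means = pd.Series(
    {'X': 235.119329, 'Z': 266.043410, 'E': 275.091892, 'D': 237.548571, 'N': 213.0},
    name='CO2 Emissions(g/km)',
)

lowest = fuel_means.idxmin()  # index label of the smallest mean
print(f"Fuel type with lowest average CO2 emissions: {lowest} "
      f"({FUEL_LABELS.get(lowest, 'unknown')})")
```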

**4. Add a new column for CO2 emissions per liter of fuel consumed (CO2 Emissions / Fuel Consumption) and list the first 10 rows.**

```python
# Write the code here
data['CO2 per Liter'] = data['CO2 Emissions(g/km)'] / data['Fuel Consumption Comb (L/100 km)']
print(data[['Make', 'Model', 'CO2 per Liter']].head(10))
```

```
    Make       Model  CO2 per Liter
0  ACURA         ILX      23.058824
1  ACURA         ILX      23.020833
2  ACURA  ILX HYBRID      23.050847
3  ACURA     MDX 4WD      22.972973
4  ACURA     RDX AWD      23.018868
5  ACURA         RLX      23.000000
6  ACURA          TL      22.970297
7  ACURA      TL AWD      22.972973
8  ACURA      TL AWD      23.017241
9  ACURA         TSX      23.043478
```

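Dividing g/km by L/100 km does not give grams per litre directly: (g/km) ÷ (L/100 km) = (g/km) × (100 km/L), so each unit of the ratio is 100 g/L. The values of about 23 printed above therefore correspond to roughly 2,300 g of CO2 per litre (row 0: 196 / 8.5 × 100 ≈ 2305.9 g/L). A sketch of the conversion on the first three rows, held in `sample`; the column name `CO2 (g/L)` is hypothetical.

```python
import pandas as pd

# First three rows from data.head(): CO2 in g/km, combined consumption in L/100 km
sample = pd.DataFrame({
    'Model': ['ILX', 'ILX', 'ILX HYBRID'],
    'CO2 Emissions(g/km)': [196, 221, 136],
    'Fuel Consumption Comb (L/100 km)': [8.5, 9.6, 5.9],
})

# (g/km) / (L/100 km) = (g/km) * (100 km / L): multiply the raw ratio by 100
# to express it in grams of CO2 per litre of fuel
ratio = sample['CO2 Emissions(g/km)'] / sample['Fuel Consumption Comb (L/100 km)']
sample['CO2 (g/L)'] = ratio * 100
print(sample[['Model', 'CO2 (g/L)']].round(1))
```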

**5. Find the car make with the highest fuel consumption on average.**

```python
# Write the code here

highest_fuel_make = data.groupby('Make')['Fuel Consumption Comb (L/100 km)'].mean().idxmax()
print(f"Car make with highest average fuel consumption: {highest_fuel_make}")
```

```
Car make with highest average fuel consumption: BUGATTI
```


**6. Identify the top 3 car makes that have the most models with CO2 emissions above the 75th percentile.**

```python
# Write the code here

threshold = data['CO2 Emissions(g/km)'].quantile(0.75)
top_makes = data[data['CO2 Emissions(g/km)'] > threshold]['Make'].value_counts().head(3)
top_makes.index.name = 'Make'
top_makes.name = 'Model'
print(top_makes)
```

```
Make
CHEVROLET    231
FORD         205
GMC          197
Name: Model, dtype: int64
```

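The cell above counts rows with `value_counts()`, so a model listed several times (the first ten rows already show `TL AWD` twice) is counted once per listing. If "most models" is read as distinct model names, `nunique()` per make is one alternative. A sketch on the first ten rows printed under question 4, held in `sample`:

```python
import pandas as pd

# First ten rows as printed above (all ACURA); note 'TL AWD' appears twice
sample = pd.DataFrame({
    'Make': ['ACURA'] * 10,
    'Model': ['ILX', 'ILX', 'ILX HYBRID', 'MDX 4WD', 'RDX AWD',
              'RLX', 'TL', 'TL AWD', 'TL AWD', 'TSX'],
    'CO2 Emissions(g/km)': [196, 221, 136, 255, 244, 230, 232, 255, 267, 212],
})

threshold = sample['CO2 Emissions(g/km)'].quantile(0.75)  # 252.25 for these ten values
high = sample[sample['CO2 Emissions(g/km)'] > threshold]

rows_per_make = high['Make'].value_counts()                # counts rows (listings)
models_per_make = high.groupby('Make')['Model'].nunique()  # counts distinct model names
print(rows_per_make.head(3))
print(models_per_make.sort_values(ascending=False).head(3))
```

Here ACURA has three listings above the threshold but only two distinct models (`MDX 4WD` and `TL AWD`).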

**7. Create a new feature that categorizes vehicles into 'Low', 'Medium', and 'High' based on percentiles of the "CO2 Emissions(g/km)" column, and show the first 10 rows:**

- `Low: ≤ 33rd percentile`
- `Medium: 33rd to 66th percentile`
- `High: > 66th percentile`

```python
# Write the code here
p33 = data['CO2 Emissions(g/km)'].quantile(0.33)
p66 = data['CO2 Emissions(g/km)'].quantile(0.66)

def categorize_co2(x):
    if x <= p33:
        return "Low"
    elif x <= p66:
        return "Medium"
    else:
        return "High"

data['Emission Category'] = data['CO2 Emissions(g/km)'].apply(categorize_co2)
print(data[['Make', 'Model', 'CO2 Emissions(g/km)', 'Emission Category']].head(10))
```

```
    Make       Model  CO2 Emissions(g/km) Emission Category
0  ACURA         ILX                  196               Low
1  ACURA         ILX                  221            Medium
2  ACURA  ILX HYBRID                  136               Low
3  ACURA     MDX 4WD                  255            Medium
4  ACURA     RDX AWD                  244            Medium
5  ACURA         RLX                  230            Medium
6  ACURA          TL                  232            Medium
7  ACURA      TL AWD                  255            Medium
8  ACURA      TL AWD                  267            Medium
9  ACURA         TSX                  212               Low
```

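The row-by-row function can also be written as a single `pd.cut` call: right-closed bins `(-inf, p33]`, `(p33, p66]`, `(p66, inf)` reproduce its `<=` tests. A sketch on the ten CO2 values printed above (so p33 and p66 come from this small sample, not the full dataset), with a check that both approaches agree. One caveat: `pd.cut` raises an error if the two percentiles are equal, because bin edges must be unique.

```python
import numpy as np
import pandas as pd

# CO2 values of the first ten rows shown above
co2 = pd.Series([196, 221, 136, 255, 244, 230, 232, 255, 267, 212],
                name='CO2 Emissions(g/km)')

p33, p66 = co2.quantile([0.33, 0.66])

# Vectorized: right-closed bins reproduce the <= comparisons
categories = pd.cut(co2, bins=[-np.inf, p33, p66, np.inf],
                    labels=['Low', 'Medium', 'High'])

# Same rule written row by row, for comparison
def categorize_co2(x):
    if x <= p33:
        return 'Low'
    elif x <= p66:
        return 'Medium'
    return 'High'

looped = co2.apply(categorize_co2)
print(pd.DataFrame({'CO2': co2, 'cut': categories, 'apply': looped}))
```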

**8. Determine if turbocharged engines (assuming turbocharged cars contain 'T' in the model name) have higher average CO2 emissions.**

```python
# Step 1: Create a column to mark turbocharged cars (models containing 'T')
data['Turbocharged'] = data['Model'].str.contains('T', case=False)

# Step 2: Group by Turbocharged and calculate average CO2 emissions
turbo_vs_non_turbo = data.groupby('Turbocharged')['CO2 Emissions(g/km)'].mean()

# Step 3: Print in expected format
print(turbo_vs_non_turbo)
```

```
Turbocharged
False    249.110200
True     252.473132
Name: CO2 Emissions(g/km), dtype: float64
```

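The cell above prints both means but leaves the comparison implicit. Under the assignment's proxy, any model name containing a T (for example `TL` or `TSX`) counts as turbocharged, which is an assumption rather than real engine data. A sketch that states the gap and the verdict explicitly, on the first ten rows printed under question 4:

```python
import pandas as pd

# First ten rows as printed above
sample = pd.DataFrame({
    'Model': ['ILX', 'ILX', 'ILX HYBRID', 'MDX 4WD', 'RDX AWD',
              'RLX', 'TL', 'TL AWD', 'TL AWD', 'TSX'],
    'CO2 Emissions(g/km)': [196, 221, 136, 255, 244, 230, 232, 255, 267, 212],
})

# The assignment's proxy: any 'T' in the model name counts as turbocharged
is_turbo = sample['Model'].str.contains('T', case=False, regex=False)
means = sample.groupby(is_turbo)['CO2 Emissions(g/km)'].mean()

gap = means.loc[True] - means.loc[False]
print(means)
print(f"Turbo (proxy) minus non-turbo: {gap:.2f} g/km -> turbo higher: {gap > 0}")
```

On the full data, the output above gives a much smaller gap of about 3.4 g/km (252.47 vs 249.11).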

**9. Compute the mean and standard deviation of CO2 emissions for each fuel type, and sort by standard deviation (highest first).**

```python
# Write the code here
fuel_stats = data.groupby('Fuel Type')['CO2 Emissions(g/km)'].agg(['mean', 'std']).sort_values(by='std', ascending=False)
print(fuel_stats)
```

```
                 mean        std
Fuel Type                       
X          235.119329  57.401473
Z          266.043410  56.695972
E          275.091892  47.093198
D          237.548571  41.817704
N          213.000000        NaN
```

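The `NaN` standard deviation for fuel type `N` is expected rather than an error: pandas' `std()` uses `ddof=1` by default, so a group with a single row has no defined sample standard deviation. Adding `count` to the aggregation makes that visible. A small sketch using two Z rows from `data.head()` and one N row of 213 g/km (a single N vehicle is implied by its mean of 213 and NaN spread):

```python
import pandas as pd

# Two premium-gasoline (Z) rows from data.head() plus the single N vehicle
sample = pd.DataFrame({
    'Fuel Type': ['Z', 'Z', 'N'],
    'CO2 Emissions(g/km)': [196, 221, 213],
})

fuel_stats = (
    sample.groupby('Fuel Type')['CO2 Emissions(g/km)']
    .agg(['mean', 'std', 'count'])
    .sort_values(by='std', ascending=False)  # NaN values are placed last by default
)
print(fuel_stats)
```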

**10. Categorize vehicles into 5 engine-size bins, calculate the average CO2 emissions for each bin, and print the engine size range that has the highest average CO2 emissions.**

```python
# Step 1: Categorize vehicles into engine size bins
engine_bins = pd.cut(data['Engine Size(L)'], bins=5)

# Step 2: Calculate average CO2 emissions for each bin
avg_co2_per_bin = data.groupby(engine_bins)['CO2 Emissions(g/km)'].mean()
avg_co2_per_bin.index.name = None

# Step 3: Print the result in the expected format
print("Engine size range with highest CO2 emissions: Engine Size Bin")
print(avg_co2_per_bin)
```

```
Engine size range with highest CO2 emissions: Engine Size Bin
(0.893, 2.4]    201.630952
(2.4, 3.9]      254.708745
(3.9, 5.4]      312.643119
(5.4, 6.9]      349.776471
(6.9, 8.4]      432.250000
Name: CO2 Emissions(g/km), dtype: float64
```

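The cell above prints the fixed text `Engine Size Bin` instead of the winning range; reading the printed means, that range is (6.9, 8.4]. `Series.idxmax()` returns that interval label directly. A minimal sketch on the first five rows of `data.head()`, so the five equal-width bins here differ from the full dataset's:

```python
import pandas as pd

# Engine sizes and CO2 of the first five rows from data.head()
sample = pd.DataFrame({
    'Engine Size(L)': [2.0, 2.4, 1.5, 3.5, 3.5],
    'CO2 Emissions(g/km)': [196, 221, 136, 255, 244],
})

engine_bins = pd.cut(sample['Engine Size(L)'], bins=5)  # 5 equal-width intervals
# observed=True drops empty bins instead of reporting NaN means for them
avg_co2_per_bin = sample.groupby(engine_bins, observed=True)['CO2 Emissions(g/km)'].mean()

best_bin = avg_co2_per_bin.idxmax()  # the Interval label with the highest mean
print(f"Engine size range with highest CO2 emissions: {best_bin}")
print(f"Average CO2 in that range: {avg_co2_per_bin.max():.1f} g/km")
```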

**11. Compare the median fuel consumption between vehicles with engine sizes above and below the median engine size.** In the given format:
- `Above median engine size fuel consumption: ____`
- `Below median engine size fuel consumption: ____`

```python
# Write the code here
median_engine_size = data['Engine Size(L)'].median()
above_median_fuel = data[data['Engine Size(L)'] > median_engine_size]['Fuel Consumption Comb (L/100 km)'].median()
below_median_fuel = data[data['Engine Size(L)'] <= median_engine_size]['Fuel Consumption Comb (L/100 km)'].median()
print(f"Above median engine size fuel consumption: {above_median_fuel}")
print(f"Below median engine size fuel consumption: {below_median_fuel}")
```

```
Above median engine size fuel consumption: 12.9
Below median engine size fuel consumption: 9.2
```

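The two medians can also come from one `groupby`, which keeps the above/below split in a single labelled result. A sketch on the first five rows of `data.head()`, using the same rule as the cell above (strictly above the median vs at or below it); the labels `Above` and `Below` are illustrative.

```python
import numpy as np
import pandas as pd

# First five rows from data.head()
sample = pd.DataFrame({
    'Engine Size(L)': [2.0, 2.4, 1.5, 3.5, 3.5],
    'Fuel Consumption Comb (L/100 km)': [8.5, 9.6, 5.9, 11.1, 10.6],
})

median_engine_size = sample['Engine Size(L)'].median()  # 2.4 for these rows
# Label each row by which side of the median its engine size falls on
side = np.where(sample['Engine Size(L)'] > median_engine_size, 'Above', 'Below')
medians = sample.groupby(side)['Fuel Consumption Comb (L/100 km)'].median()

print(f"Above median engine size fuel consumption: {medians['Above']}")
print(f"Below median engine size fuel consumption: {medians['Below']}")
```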

**12. Find the vehicle with the best (lowest) CO2 emission per horsepower ratio (assuming 'Engine Size' correlates with power).**

```python
# Step 1: Calculate CO2 per engine size
data['CO2 per HP'] = data['CO2 Emissions(g/km)'] / data['Engine Size(L)']

# Step 2: Find the vehicle with the lowest CO2 per engine size
best_vehicle_idx = data['CO2 per HP'].idxmin()
best_vehicle = data.loc[[best_vehicle_idx], ['Make', 'Model', 'CO2 per HP']]
print(best_vehicle)
```

```
          Make     Model  CO2 per HP
196  CHEVROLET  CORVETTE   41.612903
```

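Because the ratio divides by engine displacement, the `CO2 per HP` column is really grams of CO2 per km per litre of displacement; horsepower never enters it. A sketch, using the first five rows of `data.head()` and a hypothetical column name `CO2 per L displacement`, that lists the three lowest ratios with `DataFrame.nsmallest` rather than a single `idxmin()` row:

```python
import pandas as pd

# First five rows from data.head()
sample = pd.DataFrame({
    'Make': ['ACURA'] * 5,
    'Model': ['ILX', 'ILX', 'ILX HYBRID', 'MDX 4WD', 'RDX AWD'],
    'Engine Size(L)': [2.0, 2.4, 1.5, 3.5, 3.5],
    'CO2 Emissions(g/km)': [196, 221, 136, 255, 244],
})

# Engine size stands in for power (the assignment's assumption), so this is
# grams of CO2 per km per litre of displacement
sample['CO2 per L displacement'] = sample['CO2 Emissions(g/km)'] / sample['Engine Size(L)']

# Three lowest ratios, best first
best3 = sample.nsmallest(3, 'CO2 per L displacement')
print(best3[['Make', 'Model', 'CO2 per L displacement']])
```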

**13. Find the average CO2 emissions for each make, considering only models with fuel consumption above 10 L/100 km.**

```python
# Write the code here

high_fuel_cars = data[data['Fuel Consumption Comb (L/100 km)'] > 10.0]
avg_co2_per_make = high_fuel_cars.groupby('Make')['CO2 Emissions(g/km)'].mean()
print(avg_co2_per_make)
```

```
Make
ACURA            250.793103
ALFA ROMEO       283.285714
ASTON MARTIN     339.617021
AUDI             280.845161
BENTLEY          362.934783
BMW              283.726073
BUGATTI          522.000000
BUICK            266.375000
CADILLAC         281.008547
CHEVROLET        302.255814
CHRYSLER         248.987805
DODGE            285.913636
FORD             284.152993
GENESIS          284.840000
GMC              306.940199
HONDA            258.894737
HYUNDAI          264.377049
INFINITI         273.307692
JAGUAR           280.325758
JEEP             273.548872
KIA              265.150685
LAMBORGHINI      400.780488
LAND ROVER       301.373134
LEXUS            278.088235
LINCOLN          282.536585
MASERATI         318.147541
MAZDA            258.700000
MERCEDES-BENZ    296.652174
MITSUBISHI       258.692308
NISSAN           287.924370
PORSCHE          266.618421
RAM              298.241379
ROLLS-ROYCE      388.480000
SRT              389.000000
SUBARU           260.771429
TOYOTA
```

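The printed series is alphabetical and hides how many vehicles stand behind each mean. Adding a count and sorting by the mean makes the result easier to read as a ranking. A sketch on the first ten rows shown earlier; the combined consumption for rows 5 to 9 is back-calculated from the `CO2 per Liter` values printed under question 4 (for example, 230 / 23.0 = 10.0). Note that the strict `> 10.0` filter excludes a vehicle at exactly 10.0 L/100 km.

```python
import pandas as pd

# First ten rows (all ACURA); consumption for rows 5-9 back-calculated as described above
sample = pd.DataFrame({
    'Make': ['ACURA'] * 10,
    'Fuel Consumption Comb (L/100 km)': [8.5, 9.6, 5.9, 11.1, 10.6, 10.0, 10.1, 11.1, 11.6, 9.2],
    'CO2 Emissions(g/km)': [196, 221, 136, 255, 244, 230, 232, 255, 267, 212],
})

# Strict comparison: rows at exactly 10.0 L/100 km are excluded
high_fuel = sample[sample['Fuel Consumption Comb (L/100 km)'] > 10.0]

summary = (
    high_fuel.groupby('Make')['CO2 Emissions(g/km)']
    .agg(['mean', 'count'])
    .sort_values('mean', ascending=False)
)
print(summary)
```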
# END