<h2>Initial Data Analysis</h2>

<h3>Importing Libraries</h3>

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes



<h3>Reading Scraped Data</h3>

In [2]:
# Authenticate with the default Azure credential chain
credential = DefaultAzureCredential()

# Get a handle to the workspace
ml_client = MLClient(
    credential=credential,
    subscription_id="###",
    resource_group_name="###",
    workspace_name="###",
)

In [3]:
version = "Cleaned_Data"
# Get a handle to the data asset and print its URI
data_asset = ml_client.data.get(name="Car-Data", version=version)
print(f"Data asset URI: {data_asset.path}")

# Read the asset into pandas. The data frame may show two header rows at this point; that is fine for now.

df = pd.read_csv(data_asset.path)

Data asset URI: azureml://subscriptions/144c7089-5d3d-40fa-bfaf-6ffb69774b59/resourcegroups/AML-sdk-v2-RG1/workspaces/AML-sdk-v2-RG1-WS1/datastores/workspaceblobstore/paths/LocalUpload/38a8ca735227b6f486f243a31bb53bea/Cleaned_Data.csv


<pre>1. What is the maximum Power (kW) for cars with more than 4 Valves per Cylinder?</pre>

In [7]:
df[df['Valves_Per_Cylinder'] > 4]['Power(kw)'].max()

331.0
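
The query above chains two indexing steps (`df[mask][column]`). A minimal sketch of the same filter written with a named boolean mask and `.loc` is shown below. The `sample` frame and its values are made up for illustration; only the column names come from the notebook.

```python
import pandas as pd

# Hypothetical rows; the real data comes from the Car-Data asset.
sample = pd.DataFrame({
    "Valves_Per_Cylinder": [4.0, 5.0, 5.0, 2.0],
    "Power(kw)": [150.0, 300.0, 210.0, 90.0],
})

# Build the row filter once, then select rows and column in a single step.
mask = sample["Valves_Per_Cylinder"] > 4
max_power = sample.loc[mask, "Power(kw)"].max()
print(max_power)
```

Selecting with `.loc[mask, column]` returns the same values for a read-only query like this one and keeps the filter reusable for later cells.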

<pre>2. What is the average CO2 Emissions for cars with a Diesel (common rail) fuel system?</pre>

In [8]:
df["Fuel_System"].unique()

array(['multipoint injection', 'direct injection',
       'singplepoint injection', 'common rail', 'multijet',
       'indirect injection'], dtype=object)

In [9]:
df[df['Fuel_System'] == 'common rail']['Co2_Emissions(g/km)'].mean()

183.7797619047619
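
The cell above filters on `'common rail'` only. If other labels in `Fuel_System` should also count as diesel systems (for example `'multijet'`, which is an assumption here and not something the notebook states), the filter could be widened with `isin`. The `sample` frame and the `diesel_systems` list below are hypothetical.

```python
import pandas as pd

# Hypothetical rows; column names match the notebook, values are made up.
sample = pd.DataFrame({
    "Fuel_System": ["common rail", "multijet", "direct injection", "common rail"],
    "Co2_Emissions(g/km)": [180.0, 160.0, 150.0, 200.0],
})

# Assumption: these labels are the diesel-specific systems in the data set.
diesel_systems = ["common rail", "multijet"]

# Keep rows whose fuel system is any of the listed labels, then average.
mask = sample["Fuel_System"].isin(diesel_systems)
diesel_mean = sample.loc[mask, "Co2_Emissions(g/km)"].mean()
print(diesel_mean)
```

Filtering on `Fuel_Type == 'diesel'` instead would be another way to read the question, and the two definitions need not give the same average.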

<pre>3. What is the average Engine Capacity for cars with more than 200 kW Power?</pre>

In [10]:
df[df['Power(kw)'] > 200]['Engine_Capacity(cc)'].mean()

3923.4779586756285

<pre>4. What is the average Engine Capacity for each Fuel System?</pre>

In [11]:
df.groupby('Fuel_System')['Engine_Capacity(cc)'].mean()

Fuel_System
common rail               2151.466690
direct injection          2176.804210
indirect injection        2494.200000
multijet                  1850.625000
multipoint injection      2061.938166
singplepoint injection    1220.035714
Name: Engine_Capacity(cc), dtype: float64

In [12]:
df["Valves_Per_Cylinder"].unique()

array([4.        , 2.        , 5.        , 3.        , 3.74708926])
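
The value 3.74708926 stands out among otherwise whole numbers. It may be an imputed value, such as a column mean used to fill missing entries; that is an inference from the output, not something the notebook confirms. A minimal sketch for finding such rows is shown below, using a made-up `sample` frame.

```python
import pandas as pd

# Hypothetical column reproducing the kinds of values seen in the output above.
sample = pd.DataFrame(
    {"Valves_Per_Cylinder": [4.0, 2.0, 5.0, 3.0, 3.74708926, 4.0]}
)

valves = sample["Valves_Per_Cylinder"]

# A valve count should be a whole number, so a non-zero fractional part is suspect.
non_integer = valves[valves % 1 != 0]
n_suspect = len(non_integer)
print(non_integer.tolist(), n_suspect)
```

Rows holding this value fall below 4, so they are excluded from both the `> 4` and the `>= 4` filters used in this notebook.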

<pre>5. What is the average Max Power RPM for cars with 4 or more Valves per Cylinder?</pre>

In [13]:
df[df['Valves_Per_Cylinder'] >= 4]['Max_Power_Rpm'].mean()

5575.527492072628

<pre>6. Which Fuel System has the lowest average CO2 Emissions?</pre>

In [14]:
df.groupby('Fuel_System')['Co2_Emissions(g/km)'].mean().sort_values()

Fuel_System
singplepoint injection    154.678571
multijet                  169.000000
direct injection          169.122047
common rail               183.779762
multipoint injection      196.179815
indirect injection        246.800000
Name: Co2_Emissions(g/km), dtype: float64
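
The sorted output answers the question by inspection. As a sketch, the lowest-emitting group can also be read off directly with `idxmin`, assuming "best" means the lowest mean CO2 emissions. The `sample` frame is hypothetical.

```python
import pandas as pd

# Hypothetical rows; column names match the notebook, values are made up.
sample = pd.DataFrame({
    "Fuel_System": ["common rail", "common rail", "multijet", "direct injection"],
    "Co2_Emissions(g/km)": [180.0, 190.0, 169.0, 170.0],
})

mean_co2 = sample.groupby("Fuel_System")["Co2_Emissions(g/km)"].mean()

# idxmin returns the index label (the fuel system) of the smallest mean.
best_system = mean_co2.idxmin()
best_value = mean_co2.min()
print(best_system, best_value)
```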

<pre>7. What is the average Engine Capacity for cars with Gasoline fuel type?</pre>

In [15]:
df[(df['Fuel_Type'] == 'gasoline')]['Engine_Capacity(cc)'].mean()

2090.7510537748817

<pre>8. What is the average CO2 Emissions for each Fuel Type?</pre>

In [16]:
df.groupby('Fuel_Type')['Co2_Emissions(g/km)'].mean().sort_values()

Fuel_Type
gasoline / bio ethanol    121.666667
lpg / gasoline            141.000000
gasoline                  186.182648
diesel                    187.184783
Name: Co2_Emissions(g/km), dtype: float64
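
A group mean says little when the group holds only a few cars, and the notebook does not show how many rows each fuel type has. A minimal sketch that reports the row count next to each mean is shown below, using a made-up `sample` frame.

```python
import pandas as pd

# Hypothetical rows; column names match the notebook, values are made up.
sample = pd.DataFrame({
    "Fuel_Type": ["gasoline", "gasoline", "gasoline", "diesel", "diesel", "lpg / gasoline"],
    "Co2_Emissions(g/km)": [170.0, 180.0, 190.0, 185.0, 195.0, 141.0],
})

# Compute mean and group size together so small groups are easy to spot.
summary = (
    sample.groupby("Fuel_Type")["Co2_Emissions(g/km)"]
    .agg(["mean", "count"])
    .sort_values("mean")
)
print(summary)
```

Groups with a very small `count` are better treated as anecdotes than as evidence about the fuel type.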

<pre>9. What is the average Engine Capacity for cars with CO2 Emissions less than 220 g/km?</pre>

In [17]:
df[(df['Co2_Emissions(g/km)'] < 220)]['Engine_Capacity(cc)'].mean()

1676.9892337597853

<pre>10. How many cars have CO2 Emissions between 220 and 250 g/km (inclusive) and Diesel fuel type?</pre>

In [18]:
df[(df['Co2_Emissions(g/km)'] >= 220) & (df['Co2_Emissions(g/km)'] <= 250) & (df['Fuel_Type'] == 'diesel')].shape[0]

16
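
The two-sided comparison above can also be written with `Series.between`, which includes both endpoints by default. The sketch below uses a made-up `sample` frame with values chosen to sit on and just outside the boundaries.

```python
import pandas as pd

# Hypothetical rows; column names match the notebook, values are made up.
sample = pd.DataFrame({
    "Fuel_Type": ["diesel", "diesel", "diesel", "gasoline", "diesel"],
    "Co2_Emissions(g/km)": [220.0, 250.0, 251.0, 230.0, 219.9],
})

# between(220, 250) is True for 220 <= value <= 250.
in_range = sample["Co2_Emissions(g/km)"].between(220, 250)
is_diesel = sample["Fuel_Type"] == "diesel"

# Summing a boolean mask counts the rows where both conditions hold.
count = int((in_range & is_diesel).sum())
print(count)
```

Summing the combined mask gives the same count as `.shape[0]` on the filtered frame without building the intermediate frame.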