In [62]:
import warnings 
warnings.filterwarnings("ignore")

# **Data Visualization With Python**

## Downloading and viewing the data

First, we read the data file using the pandas library.

Step 1: Import pandas. The library should already be installed if you ran the ```pip install -r requirements.txt``` command in your terminal. If the import fails, try refreshing your Jupyter notebook or [refer to the following guide](https://www.geeksforgeeks.org/how-to-install-python-pandas-on-windows-and-linux/) to install it.



Step 2: Read the file by running ```pd.read_excel(filename)```, where filename is the name of the Excel file, ending with ".xlsx". Reading ".xlsx" files also requires the openpyxl package to be installed.


Step 3: Check that the file has been read properly by printing the first few rows. If you see a table with the expected column names and values, the file was read correctly.
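
Beyond printing the first rows, a few quick checks can help confirm that a file loaded as expected. The following is a minimal sketch that uses a small hypothetical DataFrame in place of the Excel file; the values are invented for illustration and only the column names are borrowed from this notebook.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet; the values are invented
sample = pd.DataFrame({
    "Car_type": ["SUV", "Sedan", "SUV"],
    "Price": [42000.0, 31000.0, None],
})

n_rows, n_cols = sample.shape   # (number of rows, number of columns)
missing = sample.isna().sum()   # count of missing values per column

print(n_rows, n_cols)
print(sample.dtypes)            # data type that pandas inferred for each column
print(missing)
```

The same three checks (`.shape`, `.dtypes`, `.isna().sum()`) can be run on any DataFrame right after loading it.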

In [None]:
# Step 1: Import Pandas Library
import pandas as pd

# Step 2: Read File
cars_data = pd.read_excel("Cars_below_50k_v2_11-12-2023.xlsx")

# Step 3: Check that file is read properly
print(cars_data.head())

# **Bar Chart with Matplotlib**

Next, we make a bar chart with the matplotlib library.

Matplotlib is a widely used Python library for producing simple data visualisations.

In [None]:
import matplotlib.pyplot as plt

# Calculates the average price for each car type and stores it in a new DataFrame
average_price_df = cars_data.groupby('Car_type')['Price'].mean().reset_index()

# Plots the bar chart for average prices based on each car type
plt.bar(average_price_df['Car_type'], average_price_df['Price'], color='skyblue')

plt.title('Average Price vs Car Type') # Creates the title of the figure
plt.xlabel('Car Type') # labels the x axis of the figure
plt.ylabel('Average Price') # labels the y axis of the figure

plt.xticks(rotation=45) # rotates the x-tick labels (car type names) by 45 degrees so they do not overlap
plt.show()

# Identify the highest and lowest average prices
highest_avg_price = average_price_df[average_price_df["Price"]==average_price_df["Price"].max()]
lowest_avg_price = average_price_df[average_price_df["Price"]==average_price_df["Price"].min()]

print(f"Highest Average Price is for the {highest_avg_price.values[0][0]} at: ${highest_avg_price.values[0][1]:.2f}")
print(f"Lowest Average Price is for the {lowest_avg_price.values[0][0]} at: ${lowest_avg_price.values[0][1]:.2f}")
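
As an alternative to filtering with a boolean mask, the highest and lowest group averages can also be looked up by label. The sketch below is only an illustration on a small invented dataset (the prices are made up); it uses the pandas methods `idxmax()` and `idxmin()`, which return the index label of the largest and smallest value.

```python
import pandas as pd

# Hypothetical toy data; the prices are invented for illustration
toy = pd.DataFrame({
    "Car_type": ["SUV", "SUV", "Sedan"],
    "Price": [40000, 50000, 30000],
})

# Mean price per car type, as a Series indexed by car type
avg = toy.groupby("Car_type")["Price"].mean()

top_type = avg.idxmax()   # label of the highest average
low_type = avg.idxmin()   # label of the lowest average

print(f"Highest: {top_type} at ${avg[top_type]:.2f}")
print(f"Lowest: {low_type} at ${avg[low_type]:.2f}")
```

Note that `idxmax()` and `idxmin()` return a single label, so ties are not reported the way they would be with a boolean mask.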

# **Line Graph with Matplotlib**

In [None]:
plt.figure(figsize=(20, 12)) # Create a figure object (base template)
price_by_reg_year = cars_data.groupby('Reg_year')['Price'].mean() # Gets the Mean Price of Cars Based on Year of Registration

plt.plot(price_by_reg_year,marker='o',color='green') # plot mean price by registration year; circular marker in green
plt.title('Average Car Price Over Registration Years') # set title of plot
plt.xlabel('Registration Year') # set x axis label of plot
plt.ylabel('Average Price') # set y axis label of plot

plt.grid(True) # shows the background grid; change to False to compare the plot without it
plt.show()
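
The line graph above passes a single pandas Series to `plt.plot()`. This works because `groupby(...).mean()` returns a Series whose index holds the group keys in sorted order, and matplotlib uses that index for the x axis. A minimal sketch on invented data (the years and prices are hypothetical):

```python
import pandas as pd

# Hypothetical toy data, deliberately out of order by year
toy = pd.DataFrame({
    "Reg_year": [2020, 2018, 2020],
    "Price": [30000, 20000, 50000],
})

by_year = toy.groupby("Reg_year")["Price"].mean()

print(by_year.index.tolist())  # group keys, sorted ascending by default
print(by_year.tolist())        # mean price for each key
```

Because the years come out sorted, the plotted line runs from left to right without any extra sorting step.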

# **Scatter Plot with Pandas**

In pandas, plots can be made almost instantly through the ```.plot()``` method, which can be called on DataFrames. By default, this method creates a line plot, but it also supports histograms, bar charts, scatter plots and more. Because pandas uses matplotlib for its plotting, working with these plots is very similar to working with matplotlib, and matplotlib functions such as ```plt.title()``` can be used alongside them.
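
To illustrate how the `kind` argument changes the result, here is a minimal sketch on a small invented DataFrame (the values are hypothetical; only the column names come from this notebook). It relies on the fact that `.plot()` returns a matplotlib Axes object, which can then be customised with the usual matplotlib methods.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data; the values are invented for illustration
toy = pd.DataFrame({
    "Car_weight_kg": [1000, 1200, 1500],
    "Engine_cap_cc": [1200, 1600, 2000],
})

# Default kind is a line plot
ax_line = toy.plot(x="Car_weight_kg", y="Engine_cap_cc")

# kind="scatter" draws unconnected points instead
ax_scatter = toy.plot(x="Car_weight_kg", y="Engine_cap_cc", kind="scatter")

# The returned Axes accepts regular matplotlib customisation
ax_scatter.set_title("Car Weight vs Engine Capacity (toy data)")
```

In a notebook, both figures are displayed when the cell finishes running.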

In [None]:
# Call .plot() on the DataFrame: x axis is car weight, y axis is engine capacity, kind selects a scatter plot
cars_data.plot(x='Car_weight_kg', y='Engine_cap_cc', kind='scatter')
plt.title('Car Weight vs Engine Capacity')
plt.xlabel('Car Weight (kg)')
plt.ylabel('Engine Capacity (cc)')
plt.show()

# **Pie Chart with Pandas**

Step 1: Count the occurrences of each unique car type using .value_counts().

Step 2: Plot the pie chart using .plot.pie() on the car_type_counts data

Step 3: Customize title and y-axis label.

Step 4: Display the pie chart using plt.show().
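
To see what Step 1 produces before it is plotted, here is a minimal sketch on a small invented Series (the car types listed are hypothetical sample values). It also shows `value_counts(normalize=True)`, which gives the shares that the pie chart displays as percentages.

```python
import pandas as pd

# Hypothetical toy column; the values are invented for illustration
car_types = pd.Series(["SUV", "SUV", "Sedan", "Hatchback"])

counts = car_types.value_counts()                      # occurrences per car type
shares = car_types.value_counts(normalize=True) * 100  # same data as percentages

print(counts)
print(shares.round(1))
```

Each slice of the pie corresponds to one row of `counts`, and its label percentage corresponds to the matching row of `shares`.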

In [None]:
# Step 1: Count the occurrences of each unique car type
car_type_counts = cars_data['Car_type'].value_counts()  # value_counts: counts the number of occurrences of unique car types

# Step 2: Plot the pie chart using the counted data
car_type_counts.plot.pie(autopct='%1.1f%%')  # format to one decimal place percentage

# Step 3: Customize title and y-axis label
plt.title('Pie Chart of Car Types')
plt.ylabel('')  # Hide the y-label

# Step 4: Display the pie chart
plt.show()

# **Histogram with Seaborn**

Seaborn is another data visualisation library widely used by data analysts and scientists. It is known for producing more polished plots than matplotlib's defaults with a comparable amount of code. Plots that would need a lot of code and layering in matplotlib can often be produced with a single Seaborn call. Seaborn is built on top of matplotlib, so the two can be combined in the same figure, as the examples here show.
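
Whichever library draws it, a histogram comes down to splitting the value range into bins and counting how many values fall into each one. The sketch below shows that calculation directly with `numpy.histogram` on a few invented numbers; it is an illustration of the idea, not a description of how Seaborn chooses its bins.

```python
import numpy as np

# Hypothetical values, invented for illustration
values = [1, 2, 2, 3, 9]

# Split the range 1 to 9 into 4 equal-width bins and count values per bin
counts, edges = np.histogram(values, bins=4, range=(1, 9))

print(edges)   # bin boundaries
print(counts)  # number of values in each bin
```

Each bar in a histogram has the height of one entry in `counts` and spans two neighbouring entries in `edges`.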

In [None]:
# Step 1: Import seaborn and matplotlib
import seaborn as sns
import matplotlib.pyplot as plt

# Step 2: Use sns.histplot() to generate a histogram with the variable you want to examine
sns.histplot(cars_data['Price'], color='pink')

# Step 3: Customize the title, x-axis label, and y-axis label
plt.title('Histogram of Car Prices')
plt.xlabel('Price')
plt.ylabel('Frequency')

# Step 4: Show plot
plt.show()

## Try yourself: Histogram with Engine_cap_cc

***(Your turn) What if we change the variable to Engine_cap_cc? Plot the histogram and share the insights that you found!***

In [None]:
# Step 1: Import seaborn and matplotlib
import seaborn as sns
import matplotlib.pyplot as plt
cars_data.info()  # prints column names and datatypes; info() prints directly, so no print() is needed
# Step 2: Use sns.histplot() to generate a histogram with the variable you want to examine
sns.histplot(cars_data["_______"], color="_____") # For histograms, you can work on columns containing numerical datatypes
# You can also feel free to change the color argument of the histogram; it is recommended to stick to basic colours like
# blue/ black/ white/ red etc.

# Step 3: Customize the title, x-axis label, and y-axis label
plt.title('Histogram of __________')
plt.xlabel("________")
plt.ylabel('Frequency')

# Step 4: Show plot
plt.show()

# **Boxplot with Seaborn**

In [None]:
# Step 1: give a size for the figure (general area to plot in)
plt.figure(figsize=(10, 6))

# Step 2: create the boxplot
sns.boxplot(x='Car_type', y='Price', data=cars_data, palette='cubehelix')

# Step 3: customise the boxplot
plt.title('Car Prices by Car Type')
plt.xlabel('Car Type')
plt.ylabel('Price')
plt.xticks(rotation=45)

# Step 4: show the boxplot
plt.show()
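
To help read a boxplot: the box spans the first to third quartile, the line inside it is the median, and with the default settings points more than 1.5 times the interquartile range (IQR) beyond the box are drawn individually as outliers. A minimal sketch of those numbers on invented prices (the values are hypothetical):

```python
import pandas as pd

# Hypothetical prices, invented for illustration
prices = pd.Series([10, 20, 30, 40, 100])

q1 = prices.quantile(0.25)   # lower edge of the box
median = prices.median()     # line inside the box
q3 = prices.quantile(0.75)   # upper edge of the box
iqr = q3 - q1                # height of the box

# Default whisker limits: 1.5 * IQR beyond the box edges
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = prices[(prices < lower_limit) | (prices > upper_limit)]

print(q1, median, q3, iqr)
print(outliers.tolist())
```

Running the same calculation on one car type's prices gives the numbers behind that type's box in the plot.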

## Try yourself: Boxplot

In [None]:
plt.figure(figsize=(10, 6)) # You can also change the figsize parameters to amend the size of your resulting plot

filtered_data = cars_data[cars_data['Car_type'].isin(["Sports"])]
print(filtered_data)

sns.boxplot(y='____', data=filtered_data, palette='____') # Your boxplot should be on 1 numeric variable
# Feel free to explore the list of palette options on seaborn here: https://seaborn.pydata.org/tutorial/color_palettes.html
# You can choose to be Pastel/ Deep/ Bright !

plt.title('Boxplot of Sports Car Prices')
plt.ylabel('_____')
plt.xticks(rotation=45)
plt.show()

# **Heatmap with plotly**

Finally, Plotly is an open-source library for interactive plotting in Python. Like matplotlib and seaborn, Plotly supports line plots, scatter charts, heatmaps and histograms, but its figures are interactive and can even be 3D. Such plots are slightly more computationally expensive, but they offer a more engaging way to analyse data yourself and to present it to others.
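
A correlation heatmap is drawn from a correlation matrix, so it helps to see what that matrix contains first. The sketch below runs `DataFrame.corr()` (Pearson correlation by default) on a small invented dataset; the values are hypothetical and chosen to be perfectly linear so the result is easy to check.

```python
import pandas as pd

# Hypothetical toy data; the values are invented and perfectly linear
toy = pd.DataFrame({
    "Car_weight_kg": [1000, 1200, 1400, 1600],
    "Engine_cap_cc": [1000, 1500, 2000, 2500],
    "Price": [50000, 40000, 30000, 20000],
})

corr = toy.corr()  # square matrix: one row and one column per variable

print(corr.round(2))
```

Values near 1 mean two columns rise together, values near -1 mean one falls as the other rises, and the diagonal is always 1 because each column is compared with itself.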

In [None]:
# Step 1: Import plotly.graph_objects (plotly.graph_objs is an older alias of the same module)
import plotly.graph_objects as go

# Step 2: Select the relevant columns for the heatmap
selected_columns = cars_data[['Price', 'Car_weight_kg', 'Engine_cap_cc', 'Manufacture_date']]

# Step 3: Calculate the correlation matrix for the selected columns
correlation_matrix = selected_columns.corr()

# Step 4: Creating the heatmap using go.Heatmap() and correlation matrix
heatmap = go.Heatmap(
    z=correlation_matrix.values,
    x=correlation_matrix.columns,
    y=correlation_matrix.index,
    colorscale='Plasma'
)

# Step 5: Create a figure containing the heatmap, update the layout with a title, and display the heatmap
fig = go.Figure(data=[heatmap])
fig.update_layout(title='Heatmap of Selected Features')
fig.show()

# **Radar Chart using Plotly**

In [None]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go

categories = ['Price', 'Engine_cap_cc', 'Car_weight_kg']
car_types = cars_data['Car_type'].unique() # select all unique car types

# Rescale the data by dividing each attribute by its mean across all cars, so attributes measured in different units become comparable
standardized_data = cars_data[categories].apply(lambda x: x / x.mean())

# Radar chart data for each car type after standardization
values_list = []
for car_type in car_types:
    values = standardized_data[cars_data['Car_type'] == car_type].mean().values
    values = np.append(values, values[0])  # Close the radar chart
    values_list.append(values)

# Prepare the radar chart data for Plotly
fig = go.Figure()

# Generate the chart for each car type
for values, car_type in zip(values_list, car_types):
    fig.add_trace(go.Scatterpolar(
        r=values,
        theta=categories + [categories[0]],  # Close the radar chart
        fill='toself',
        name=car_type
    ))

# Update layout for better visualization
fig.update_layout(
    polar=dict(
        radialaxis=dict(visible=True, range=[0, 2.0])  # Set range to keep shapes within the circle
    ),
    title="Comparison of Car Types (Standardized)",
    showlegend=True
)

# Show the plot
fig.show()
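
The rescaling step matters because price, engine capacity and weight are measured on very different scales. A minimal sketch of the same divide-by-the-mean idea on invented numbers (the values are hypothetical):

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration
toy = pd.DataFrame({
    "Price": [20000, 60000],
    "Car_weight_kg": [1000, 3000],
})

# Divide every column by its own mean, as in the radar chart cell
scaled = toy.apply(lambda col: col / col.mean())

print(scaled)
```

After rescaling, a value of 1.0 means "equal to the average car" for that attribute, which is why a radial range of 0 to 2 is a reasonable choice for the chart.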

# **Map using Plotly**

**Creating a new dataset**

In [None]:
# Sample data: cities with their latitude, longitude, and population
data = {
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Latitude': [40.7128, 34.0522, 41.8781, 29.7604, 33.4484],
    'Longitude': [-74.0060, -118.2437, -87.6298, -95.3698, -112.0740],
    'Population': [8419000, 3980400, 2716000, 2328000, 1690000]
}

# Creating a pandas DataFrame
cities_data = pd.DataFrame(data)
print(cities_data)

**Plotting a map**

In [None]:
import plotly.express as px

# Create a scatter mapbox plot
fig = px.scatter_mapbox(cities_data, lat="Latitude", lon="Longitude",
                        size="Population", hover_name="City", zoom=3)

# Set the map style and display the map
fig.update_layout(mapbox_style="open-street-map")
fig.show()

# **3D Plot using Plotly**

**Creating a new dataset**

In [None]:
data = {
    'X': np.random.rand(50) * 100,  # Random values between 0 and 100
    'Y': np.random.rand(50) * 100,  # Random values between 0 and 100
    'Z': np.random.rand(50) * 100   # Random values between 0 and 100
}

df_3d = pd.DataFrame(data)
print(df_3d.head())

**Displaying the 3D plot**

In [None]:
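# Create a 3D scatter trace from the three columns
fig = go.Figure(data=[go.Scatter3d(
    x=df_3d['X'],
    y=df_3d['Y'],
    z=df_3d['Z'],
    mode='markers'
)])

# Add a title and axis labels, then display the plot
fig.update_layout(
    title='3D Scatter Plot',
    scene=dict(
        xaxis_title='X Axis',
        yaxis_title='Y Axis',
        zaxis_title='Z Axis'
    )
)
fig.show()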