# Analysis and Performance evaluation of Terapixel rendering in (Super)Cloud Computing Data

## 1. Introduction

*The purpose of this study is to analyze the data produced while rendering a Terapixel visualization of the IoT environmental data collected by the Newcastle Urban Observatory for the city of Newcastle upon Tyne. The main objective is to evaluate and explore the performance timings of the render application and of the GPU cards, together with the details of which part of the image was rendered in each task. Terapixel images are rendered using a scalable cloud-based visualization architecture. Once created, a Terapixel image allows interactive exploration of the city and its data at a wide range of sensing scales, from the entire city down to a single desk in a room, and is accessible from a broad range of thin-client devices. The CRISP-DM (Cross-Industry Standard Process for Data Mining) model is used to structure this analysis, which is dedicated entirely to the EDA (Exploratory Data Analysis) process.*

## 2. Data Exploration Planning and Analysis Requirement:

The TeraScope Terapixel data is subjected to a preliminary analysis to better understand it and to provide information to business stakeholders. Analyzing the GPU cards and the XY tile coordinates in this data set is intended to help render Terapixel images more efficiently and effectively.

### 2.1 Data Exploration Planning:

* Assessing the event types that dominate task runtimes.
* Examining the correlation between GPU temperature and performance.
* Analyzing the connection between increased power draw and render time.
* Identifying GPU cards (based on their serial numbers) whose performance differs from that of other cards.
* Exploring the effectiveness of the task scheduling process.

### 2.2 Analysis Plan and Requirement:

The analysis strategy for this report is to investigate the three data sets generated while virtual machines rendered 3D images on 1024 GPU nodes during a single run. The run is divided into three jobs that render the data visualization output. The three data sets record, respectively, the performance timings of the render application, the performance of the GPU cards, and which part of the image was rendered in each task. The requirement is to understand the data and then clean and preprocess it before performing exploratory data analysis. This analysis is intended to help improve the rendering process.

In [1]:
# Installing pandasql

!pip install pandasql



In [2]:
# Importing all the required libraries
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from pandasql import sqldf

warnings.filterwarnings('ignore')

# Helper that runs a SQL query against the DataFrames defined in the notebook's global namespace
sqlfn = lambda q: sqldf(q, globals())

## 3. Data Understanding:

This analysis uses a dataset generated during a run on 1024 GPU nodes, split across three CSV files: application-checkpoints.csv, gpu.csv and task-x-y.csv. All three files are read below:

In [3]:
# Reading all the three CSV files

appcheckpoint=pd.read_csv("application-checkpoints.csv")
gpu=pd.read_csv("gpu.csv")
xy=pd.read_csv("task-x-y.csv")


Table **application-checkpoints.csv**

In [None]:
# Displaying first three rows of appcheckpoint table

appcheckpoint.head(3)

In [None]:
# Displaying data types of columns in appcheckpoint table

appcheckpoint.dtypes

Table **gpu.csv**

In [None]:
# Displaying first three rows of GPU table

gpu.head(3)

In [None]:
# Displaying data types of columns in gpu table

gpu.dtypes

Table **task-x-y.csv**

In [None]:
# Displaying first three rows of task-x-y table

xy.head(3)

In [None]:
# Displaying data types of columns in xy table

xy.dtypes

In [None]:
# Displaying the number of rows and columns in all the three tables
print("Number of rows and columns in appcheckpoint table : ",appcheckpoint.shape)
print("Number of rows and columns in gpu table : ",gpu.shape)
print("Number of rows and columns in xy table : ",xy.shape)

## 4. Data Preparation:

After Data Understanding, the data is cleaned and preprocessed in preparation for further analysis. The data set contains some redundant (duplicate) rows that must be removed before further processing.

The number of duplicate rows in each of the three tables is displayed first:

In [None]:
# Count of duplicate records in appcheckpoint table

appcheckpoint.duplicated().sum()

In [None]:
# Count of duplicate records in gpu table

gpu.duplicated().sum()

In [None]:
# Count of duplicate records in xy table

xy.duplicated().sum()

Duplicate records are dropped from the appcheckpoint table; the result is stored as appcheck.

In [None]:
# Removing duplicate records from appcheckpoint table

appcheck = appcheckpoint.drop_duplicates()
appcheck

Duplicate records are dropped from the gpu table; the result is stored as gpufinal.

In [None]:
# Removing duplicate records from gpu table

gpufinal = gpu.drop_duplicates()
gpufinal

Duplicate records are dropped from the xy table; the result is stored as xyfinal.

In [None]:
# Removing duplicate records from xy table

xyfinal = xy.drop_duplicates()
xyfinal
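The duplicate check and removal above can be illustrated on a small toy frame. The column values below are made up for illustration and are not taken from the Terapixel data.

```python
import pandas as pd

# Toy frame with one exact duplicate row (values are hypothetical)
toy = pd.DataFrame({
    "hostname": ["host-a", "host-a", "host-b"],
    "taskId": ["t1", "t1", "t2"],
    "eventName": ["Render", "Render", "Render"],
})

# duplicated() marks every repeat of an earlier identical row as True
n_dupes = toy.duplicated().sum()

# drop_duplicates() keeps the first occurrence of each row by default
deduped = toy.drop_duplicates()

print(n_dupes)
print(len(deduped))
```

Note that drop_duplicates keeps the original index labels (here 0 and 2), so `reset_index(drop=True)` may be wanted if later code relies on a contiguous index.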

The timestamp column in appcheck and gpufinal is converted from object (string) to datetime64 so that time differences can be computed. The appcheck table is converted first:

In [None]:
# Converting the timestamp column from object (string) to datetime64
appcheck["timestamp"] = pd.to_datetime(appcheck["timestamp"])

# Displaying the data type of each column
appcheck.dtypes

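In the gpufinal table, the timestamp column is converted in the same way, and the gpuSerial column is converted from numeric to string (object) so that serial numbers are treated as labels rather than quantities in the later analysis.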

In [None]:
# Converting the timestamp column from object (string) to datetime64
gpufinal["timestamp"] = pd.to_datetime(gpufinal["timestamp"])
# Converting gpuSerial from numeric to string so it is treated as a label
gpufinal["gpuSerial"] = gpufinal["gpuSerial"].astype(str)

# Displaying the data type of each column
gpufinal.dtypes

The start and stop times of each event are obtained by splitting the appcheck table on eventType (START and STOP) and merging the two halves back together.

In [None]:
# Creating a new DataFrame for the START events
appcheck_start = appcheck.loc[appcheck["eventType"] == "START"]

# Creating a new DataFrame for the STOP events
appcheck_stop = appcheck.loc[appcheck["eventType"] == "STOP"]

# Merging the START and STOP rows of the same event, host, job and task
appcheck_start_stop = pd.merge(appcheck_start, appcheck_stop, how='left', on=['hostname','eventName','jobId','taskId'],
                         suffixes=('_Start_Time', '_Stop_Time'))

# Displaying the first three rows of the appcheck_start_stop table
appcheck_start_stop.head(3)

The Event Render Time is calculated as the difference between the STOP and START timestamps of each event.

In [None]:
# Calculating Event Render Time based on the difference between Start and Stop Time
appcheck_start_stop["Event_RenderTime"] = (appcheck_start_stop["timestamp_Stop_Time"] - appcheck_start_stop["timestamp_Start_Time"]).dt.total_seconds()

# Displaying each columns Datatype
appcheck_start_stop.dtypes
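The START/STOP pairing and the render-time calculation can be sketched end to end on a toy checkpoint log. The rows are hypothetical; only the column names mirror those used in the notebook.

```python
import pandas as pd

# Hypothetical checkpoint log: one START and one STOP row per (host, event, job, task)
log = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-11-08 07:41:00.000", "2018-11-08 07:41:41.500",
        "2018-11-08 07:41:00.000", "2018-11-08 07:41:42.000",
    ]),
    "hostname": ["h1", "h1", "h1", "h1"],
    "eventName": ["Render", "Render", "TotalRender", "TotalRender"],
    "eventType": ["START", "STOP", "START", "STOP"],
    "jobId": ["j1"] * 4,
    "taskId": ["t1"] * 4,
})

keys = ["hostname", "eventName", "jobId", "taskId"]
start = log[log["eventType"] == "START"]
stop = log[log["eventType"] == "STOP"]

# Pair each START row with its STOP row on the shared keys
paired = pd.merge(start, stop, how="left", on=keys,
                  suffixes=("_Start_Time", "_Stop_Time"))

# Duration in seconds = STOP timestamp minus START timestamp
paired["Event_RenderTime"] = (
    paired["timestamp_Stop_Time"] - paired["timestamp_Start_Time"]
).dt.total_seconds()

print(paired[["eventName", "Event_RenderTime"]])
```

If a task were missing its STOP row, the left merge would keep the START row with a missing stop timestamp, so its Event_RenderTime would be NaN; the later `mean()` calls skip such values.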

In [None]:
# Displaying last three rows of appcheck_start_stop table
appcheck_start_stop.tail(3)

In [None]:
# Converting Event Render Time directly from float64 to float32
appcheck_start_stop["Event_RenderTime"] = appcheck_start_stop["Event_RenderTime"].astype('float32')

# Displaying the last three rows of the appcheck_start_stop table
appcheck_start_stop.tail(3)

To make the output easier to interpret, the appcheck_start_stop table is sorted by hostname.

In [None]:
# Sorting appcheck_start_stop Dataframe based on Hostname
appcheck_start_stop.sort_values(by=["hostname"], inplace=True)

# Displaying last seven rows of appcheck_start_stop table
appcheck_start_stop.tail(7)

In [None]:
# Keeping only the TotalRender rows (render time of the entire task)
host_PT = appcheck_start_stop.loc[appcheck_start_stop["eventName"] == "TotalRender"]

# Averaging the TotalRender time per hostname
host_PT = host_PT.groupby(by=["hostname"], as_index=False)["Event_RenderTime"].mean()

# Displaying the first three rows of the host_PT table
host_PT.head(3)

In [None]:
# Displaying Column names of gpufinal table
gpufinal.columns

The gpufinal table is grouped by hostname, gpuUUID and gpuSerial, and the GPU performance metrics are averaged within each group:

In [None]:
# Averaging the GPU performance metrics per host and GPU card
gpu_PF = gpufinal[["hostname","gpuUUID","gpuSerial","powerDrawWatt","gpuTempC","gpuUtilPerc","gpuMemUtilPerc"]].groupby(by=["hostname","gpuUUID","gpuSerial"], as_index=False).mean()

# Displaying last five rows of gpu_PF
gpu_PF.tail(5)

The TP data frame is created by merging host_PT and gpu_PF on hostname.

In [None]:
# Merging two dataframes based on hostname
TP = pd.merge(host_PT, gpu_PF, on="hostname")

# Displaying first three rows of TP
TP.head(3)
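The SQL query used later groups TP by gpuSerial, which is only meaningful if each hostname maps to exactly one GPU card. One way to check this assumption is the `validate` argument of `pd.merge`, sketched here on toy frames; the hostnames, serials and values are hypothetical.

```python
import pandas as pd

# Hypothetical per-host mean render times and per-GPU averages
host_times = pd.DataFrame({"hostname": ["h1", "h2"], "Event_RenderTime": [42.1, 43.0]})
gpu_stats = pd.DataFrame({"hostname": ["h1", "h2"], "gpuSerial": ["111", "222"],
                          "gpuTempC": [40.2, 41.7]})

# validate="one_to_one" raises pandas.errors.MergeError if a hostname
# appears more than once on either side
merged = pd.merge(host_times, gpu_stats, on="hostname", validate="one_to_one")
print(merged.shape)

# A second (hypothetical) GPU on host h1 breaks the one-to-one assumption
gpu_dup = pd.concat([gpu_stats, gpu_stats.iloc[[0]].assign(gpuSerial="333")],
                    ignore_index=True)
raised = False
try:
    pd.merge(host_times, gpu_dup, on="hostname", validate="one_to_one")
except pd.errors.MergeError as err:
    raised = True
    print("not one-to-one:", err)
```

In the notebook the same check would amount to passing `validate="one_to_one"` to the merge of host_PT and gpu_PF.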

In [None]:
# Displaying first three rows of appcheck_start_stop
appcheck_start_stop.head(3)

xyfinal1 is created by merging xyfinal with the TotalRender rows (the entire task) of appcheck_start_stop on jobId and taskId. The xy_level8 data frame is then created for the level 8 image by averaging the Event Render Time over each pair of x and y coordinates.

In [None]:
# Merging dataframes xyfinal, appcheck_start_stop for Total Render
xyfinal1 = pd.merge(xyfinal, appcheck_start_stop.loc[appcheck_start_stop["eventName"]=="TotalRender"],
                       how="inner", on=["jobId","taskId"], suffixes=("_task","_ap"))

# Creating xy_level8 by keeping only the level 8 tiles
xy_level8 = xyfinal1.loc[xyfinal1["level"] == 8, ["Event_RenderTime","x","y"]]

# Event Render Time for x and y Coordinates for level 8
xy_level8 = xy_level8.groupby(by=["x","y"], as_index=False).mean()
xy_level8.head(5)
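Averaging over (x, y) and then pivoting turns the long table into a 2-D grid with one cell per tile, which is the shape the level 8 heatmap needs. A toy sketch with hypothetical coordinates and render times:

```python
import pandas as pd

# Hypothetical per-task render times; tile (0, 0) was rendered twice
tasks = pd.DataFrame({
    "x": [0, 0, 1, 0, 1],
    "y": [0, 0, 0, 1, 1],
    "Event_RenderTime": [40.0, 42.0, 45.0, 38.0, 50.0],
})

# One row per tile, holding the mean render time of that tile
per_tile = tasks.groupby(["x", "y"], as_index=False)["Event_RenderTime"].mean()

# Rows indexed by x, columns by y (keyword arguments are required in pandas 2.x)
grid = per_tile.pivot(index="x", columns="y", values="Event_RenderTime")
print(grid)
```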

The Level_RT table holds the average Event Render Time for each of the levels 4, 8 and 12.

In [None]:
# Creating a new data frame for level and event render time
Level_RT = xyfinal1[["level","Event_RenderTime"]]

# Grouping based on level
Level_RT = Level_RT.groupby(by=["level"], as_index=False).mean()

# Displaying the table Level_RT
Level_RT
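A mean per level hides how many tiles each level contains and how spread out their render times are. The sketch below reports count, mean and standard deviation together; the numbers are hypothetical, and in the notebook the input would be `xyfinal1[["level", "Event_RenderTime"]]`.

```python
import pandas as pd

# Hypothetical render times for three zoom levels
df = pd.DataFrame({
    "level": [4, 8, 8, 12, 12, 12],
    "Event_RenderTime": [30.0, 40.0, 44.0, 41.0, 43.0, 45.0],
})

# Count, mean and sample standard deviation of render time per level
summary = df.groupby("level")["Event_RenderTime"].agg(["count", "mean", "std"])
print(summary)
```

A level with a single tile gets a NaN standard deviation, because the sample standard deviation is undefined for one value.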

## 5. Exploratory Data Analysis:

Exploratory Data Analysis is applied to the preprocessed Terapixel data to summarize its main characteristics and to visualize the relationships between variables. The findings are intended to help improve the image rendering process.

### Assessing the event types that dominate task runtimes:

In [None]:
# Grouping the dataframe based on Event Name and Event Render Time
Event_Tab = appcheck_start_stop[["eventName","Event_RenderTime"]].groupby(by="eventName", as_index=False).mean()

# Sorting based on Event Render Time
Event_Tab.sort_values(by="Event_RenderTime", ascending=False, inplace=True)

# Resetting dataframe index
Event_Tab = Event_Tab.reset_index(drop=True)

# Display the Event_Tab table
Event_Tab

In [None]:
# Sorting based on Event Render Time
Event_Tab = Event_Tab.sort_values(by='Event_RenderTime', ascending=True)

In [None]:
# Plotting a barplot using seaborn library for Events and Average Render Time
sns.set_style('whitegrid')
sns.set_context('notebook', font_scale=1.0)
plt.subplots(figsize=(6,5))
palette = {i: "red" if i == "TotalRender" else 'black' for i in Event_Tab["eventName"]}
RenderTime_Plot = sns.barplot(data=Event_Tab, x="eventName", y="Event_RenderTime",palette=palette)

# Formatting the barplot
for g in RenderTime_Plot.patches:
    RenderTime_Plot.annotate(format(g.get_height(), '.7f'),
                       (g.get_x() + g.get_width() / 2., g.get_height()),
                       ha = 'center', va = 'center',
                       xytext = (0, 6),
                       textcoords = 'offset points')

# Adding the labels
plt.title("Event Render Time", size=23)
plt.xlabel("Events", size = 16)
plt.ylabel("Average Render Time of Events (seconds)", size = 15)
plt.show()

The bar plot above shows the average render time of each event. TotalRender, which covers the entire task, takes 42.6047 seconds on average. Of the individual steps within a task, Render takes by far the most time, with an average of 41.2082 seconds, followed by Uploading and Tiling at 1.3936 and 0.9732 seconds respectively. Saving Config is the shortest event, at only 0.00247 seconds on average.
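Because the individual events are sub-steps of a task, each event's mean time can also be expressed as a share of the mean TotalRender time. The sketch below uses the averages quoted above, typed in by hand rather than read from Event_Tab; the exact eventName spellings are assumed.

```python
# Mean event times in seconds, copied from the text above (event names assumed)
mean_times = {
    "TotalRender": 42.6047,
    "Render": 41.2082,
    "Uploading": 1.3936,
    "Tiling": 0.9732,
    "Saving Config": 0.00247,
}

total = mean_times["TotalRender"]

# Share of the mean TotalRender time taken by each sub-event
shares = {name: t / total for name, t in mean_times.items() if name != "TotalRender"}

for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<14} {share:7.2%}")
```

With these averages the sub-event shares add up to slightly more than 100% (about 102%), so they should not be read as a strict partition of TotalRender; overlapping steps or averages taken over different sets of tasks could explain this, but that is not checked here.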

### Correlation between GPU metrics:

According to the pair plot below, GPU utilization percentage and GPU memory utilization percentage are correlated with each other: GPU utilization keeps increasing as memory utilization increases. Higher memory utilization also appears to be associated with longer event render times. GPU power draw and temperature show the least significant relation. It is also worth noting that render time is positively related to GPU temperature: as Event Render Time increases, so does GPU temperature.

In [None]:
# Pair plot comparing all numeric GPU measures and the render time in TP
# (pairplot creates its own figure, so no plt.figure call is needed)
sns.set_style('dark')
sns.set_context('paper', font_scale=1.4)
p = sns.pairplot(data=TP)
p.fig.tight_layout()
p.fig.subplots_adjust(top=0.87)
p.fig.suptitle("GPU Measure Comparison", y=1.0)
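The pair plot gives a visual impression; a correlation matrix puts numbers on the same pairwise relationships. To keep the sketch self-contained it builds a small synthetic stand-in for TP with invented values; in the notebook the same `corr` call would be made on TP directly (`numeric_only=True` skips the string columns such as hostname and gpuSerial, and needs pandas 1.5 or later).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in for TP: power and temperature are built to co-vary,
# memory utilization is independent noise (all values are invented)
power = rng.normal(90, 10, n)
tp_demo = pd.DataFrame({
    "powerDrawWatt": power,
    "gpuTempC": 30 + 0.15 * power + rng.normal(0, 0.5, n),
    "gpuMemUtilPerc": rng.normal(32, 3, n),
})

# Pearson correlation between every pair of numeric columns
corr = tp_demo.corr(numeric_only=True)
print(corr.round(2))
```

The resulting matrix can be drawn with `sns.heatmap(corr, annot=True)` for a compact companion to the pair plot.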

### Identifying the GPU cards (by serial number) with the lowest performance:

In [None]:
# Ranking GPU cards by mean render time and keeping the 10 slowest
Worst = """ select gpuSerial, avg(Event_RenderTime) as "Time" from TP
 group by gpuSerial order by Time desc limit 10; """
GPU_Worst = sqlfn(Worst)

In [None]:
# Naming the index so that it reads as the rank (0 = slowest card)
GPU_Worst.index.name='Order'

In [None]:
# Displaying the dataframe GPU_Worst
GPU_Worst
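The same ranking can be computed in plain pandas without SQL, which makes the aggregation explicit. A sketch on a hypothetical TP-like frame (serials and times invented), keeping the 2 slowest cards; in the notebook the input would be TP and the count would be 10.

```python
import pandas as pd

# Hypothetical stand-in for TP: one row per host/GPU card
tp_demo = pd.DataFrame({
    "gpuSerial": ["101", "102", "103", "104"],
    "Event_RenderTime": [46.4, 47.0, 46.8, 46.5],
})

# Mean render time per GPU serial, then the largest values first
slowest = (
    tp_demo.groupby("gpuSerial")["Event_RenderTime"]
    .mean()
    .nlargest(2)
    .rename("Time")
    .reset_index()
)
print(slowest)
```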

The bar plot below shows the run times of the ten slowest GPU cards. The card whose serial number ends in 2821 has the longest mean render time, 47.038776 seconds, followed by the card ending in 8645 at 47.013439 seconds. The card ending in 5378 follows closely, with a render time difference of 0.020271 seconds. At the other end of this group, the cards ending in 8802 and 1137 have render times of 46.399696 and 46.350880 seconds respectively, slightly better than the other cards among the ten.

In [None]:
# Creating the figure and a gridspec layout
fig = plt.figure(figsize=(11,9), dpi=85)
grid = plt.GridSpec(5, 4, hspace=0.6, wspace=0.3)
ax = fig.add_subplot(grid[:-2, :-1])

# Plotting a bar chart of the 10 GPUs with the lowest performance
g = sns.barplot(data=GPU_Worst, x="gpuSerial", y="Time",
                order=GPU_Worst.sort_values('Time', ascending=False).gpuSerial,
                color='yellow', alpha=.6, ax=ax)

# Annotating each bar with its run time (GPU_Worst is already sorted in descending order)
for i, cty in enumerate(GPU_Worst.Time):
    g.text(i, cty+0.005, round(cty, 3), size=11, horizontalalignment='center')

# Labelling
g.set_title('GPU with Least Performance (Top 10)', fontdict={'size':12})
g.set(ylabel='Run Time (Sec)', xlabel='GPU Serial Number', ylim=(46.3, 47.1))
g.set_xticklabels(g.get_xticklabels(), rotation=45, horizontalalignment='right')
plt.show()

In [None]:
# Displaying DataFrame TP
TP

### Interplay between GPU Temperature and Power Consumption:

The joint plot below shows the interplay between GPU temperature and power consumption. GPU temperature rises approximately linearly as power consumption increases, i.e. the two are positively correlated; the correlation coefficient printed below the plot quantifies the strength of this relationship.

In [None]:
# Plotting a joint plot of GPU power consumption against GPU temperature
# (jointplot creates its own figure, so its size is set with height)
sns.set_style('dark')
sns.set_context('paper', font_scale=1.2)
p = sns.jointplot(x='powerDrawWatt', y='gpuTempC', data=TP, kind='reg', height=6,
                  joint_kws={'line_kws':{'color':'black'}})
p.fig.suptitle("Interplay between GPU Temperature and Power Consumption")
p.ax_joint.set_xlabel('Power Consumption (in watt)')
p.ax_joint.set_ylabel('GPU Temperature (in C)')
p.fig.tight_layout()
p.fig.subplots_adjust(top=0.87)

print(f'The Correlation Coefficient between GPU Temperature and Power Consumption = {np.round(np.corrcoef(TP.powerDrawWatt, TP.gpuTempC)[0,1], 2)}')
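Besides the correlation coefficient, the slope of a least-squares line indicates roughly how many degrees the temperature rises per extra watt. A sketch using `np.polyfit` on synthetic data with an invented slope and noise level; in the notebook the inputs would be `TP.powerDrawWatt` and `TP.gpuTempC`.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic power draw (W) and temperature (°C) with a known slope of 0.2 °C/W
power = rng.uniform(60, 120, 300)
temp = 25 + 0.2 * power + rng.normal(0, 1.0, 300)

# Degree-1 polynomial fit: returns [slope, intercept]
slope, intercept = np.polyfit(power, temp, 1)
r = np.corrcoef(power, temp)[0, 1]

print(f"slope = {slope:.3f} °C per W, intercept = {intercept:.1f} °C, r = {r:.2f}")
```

A clearly positive slope with a high r supports the reading that temperature rises with power; proportionality in the strict sense would additionally require an intercept near zero.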

### Variation in Memory Utilization Percentage of the GPU with GPU Temperature:

The scatter plot below shows how GPU memory utilization percentage varies with GPU temperature. The points form two clusters, at memory utilization of roughly 28 to 32% and 34 to 37%, and both clusters span the same temperature range. Within each cluster the temperature varies without any linear trend, so there appears to be no significant relationship between GPU memory utilization and temperature.

In [None]:
# Scatter plot of GPU memory utilization against GPU temperature
sns.set_style('dark')
plt.figure(figsize=(6,5))
sns.set_context('paper', font_scale=1.2)
sns.scatterplot(x='gpuMemUtilPerc', y='gpuTempC', data=TP)
plt.title('GPU Temperature vs. GPU Memory Utilization')
plt.xlabel('GPU Memory Utilization (%)')
plt.ylabel('GPU Temperature (in C)')

print(f'The Correlation Coefficient between GPU Temperature and Memory Utilization = {np.round(np.corrcoef(TP.gpuMemUtilPerc, TP.gpuTempC)[0,1], 2)}')
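Because the points fall into two clusters, a single overall correlation can mislead; checking the correlation within each cluster is a useful complement (a technique added here, not part of the original analysis). The sketch splits on a threshold of 33%, chosen by assumption to lie between the 28 to 32% and 34 to 37% ranges described above, and uses synthetic data; in the notebook the input would be TP.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Synthetic data: two memory-utilization clusters sharing one temperature range
mem = np.concatenate([rng.uniform(28, 32, 100), rng.uniform(34, 37, 100)])
temp = rng.uniform(38, 44, 200)
df = pd.DataFrame({"gpuMemUtilPerc": mem, "gpuTempC": temp})

# Assumed split point between the two clusters
THRESHOLD = 33.0
df["cluster"] = np.where(df["gpuMemUtilPerc"] < THRESHOLD, "low", "high")

# Pearson r between memory utilization and temperature within each cluster
within_r = df.groupby("cluster")[["gpuMemUtilPerc", "gpuTempC"]].apply(
    lambda g: g["gpuMemUtilPerc"].corr(g["gpuTempC"])
)
print(within_r.round(2))
```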

### Event Render time for each co-ordinate of the Rendered Tile (Level 8):

In [None]:
# Plotting a heatmap of the Event Render time for each coordinate of the rendered tile for level 8
sns.set(font_scale=0.8)
sns.set_context('paper', font_scale=1.4)
fig, ax = plt.subplots(figsize=(15,10))
# pandas 2.x requires keyword arguments for pivot
glue = xy_level8.pivot(index="x", columns="y", values="Event_RenderTime")
sns.heatmap(glue, cmap='YlGnBu', annot=False, xticklabels=False, yticklabels=False, ax=ax)

# Adding Title to the visual
plt.title("Heatmap based on Event Render time for each co-ordinate of the Rendered tile for level 8", size=20,fontweight='bold' )

The heatmap above shows the Event Render Time for each coordinate of the rendered tile at level 8. With the YlGnBu colour map, yellow tiles were rendered fastest, green tiles took an intermediate (around-average) time, and the darkest blue tiles took the longest to render.
