# Features in OLED devices

Various materials are used to fabricate OLED devices. Different materials serve as the hole transport layer (HTL), hole injection layer (HIL), buffer layer, electron transport layer (ETL) and, of course, the emissive layer (EML). Many combinations of materials and several device architectures are used in fabricating these devices. For this model, I am focusing on blue OLEDs.


Now, let's take a look at the features used for our regression model.
Every material has distinct properties. In particular, each material has its own energy levels, the lowest unoccupied molecular orbital (LUMO) and the highest occupied molecular orbital (HOMO), both of which can be measured. Another parameter that can be changed is the thickness of each layer. So, these three features (LUMO, HOMO and thickness of each layer) are going to be considered for our model.

Let's import some dependencies first.

In [None]:
import os
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn import preprocessing
%matplotlib inline

In [None]:
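# change to the directory that holds the Excel file, then read it
os.chdir('C:/Anaconda3/projects/oled')
# note: the keyword is sheet_name; the older spelling sheetname is no longer accepted by current pandas
df = pd.read_excel('oled.xlsx', sheet_name='Sheet3')
# fill the missing data with zeros
df = df.fillna(value=0)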

PS: There will be a lot of missing data because devices have different numbers of layers. Some devices use two different layers for charge transport (HTL and ETL), and some devices lack certain layers altogether. Therefore, after filling, there will be lots of zeros in the dataframe.
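To make the effect of the zero fill concrete, here is a minimal sketch on a toy table. The column names and values are made up for illustration and are not taken from `oled.xlsx`. It also keeps a mask of which layers were actually present, since after filling a zero no longer distinguishes "layer absent" from a genuine value of 0.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table: column names and numbers are illustrative only.
toy = pd.DataFrame({
    "HTL_thickness": [40.0, np.nan, 30.0],   # device 2 has no HTL
    "ETL_thickness": [20.0, 25.0, np.nan],   # device 3 has no ETL
    "EML_thickness": [30.0, 30.0, 20.0],     # every device has an EML
})

# Remember which layers exist before the gaps are overwritten.
layer_present = toy.notna()

# Same fill as in the cell above: an absent layer becomes 0.
toy_filled = toy.fillna(value=0)

# Fraction of zero entries per column after filling.
zero_fraction = (toy_filled == 0).mean()
print(zero_fraction)
```

Running the same `(df == 0).mean()` check on the real dataframe would show which features are dominated by absent layers.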

In [None]:
# device efficiency distribution
plt.rcParams['figure.figsize'] = (12.0, 6.0)
efficiency = df['cd/a']
plt.ylabel('number')
plt.xlabel('efficiency')
efficiency.hist()

From the above graph, we can observe that most of our devices have an efficiency below 10 cd/A and that the distribution is not normal. It is skewed to the right: the bulk of the data sits at low efficiency, with a long tail toward higher values.
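The direction of the skew can also be checked numerically instead of by eye. The sketch below uses a small made-up series as a stand-in for `df['cd/a']` (the numbers are not from the dataset) and relies on pandas' `Series.skew()`, where a positive value indicates a longer tail toward high values.

```python
import pandas as pd

# Made-up stand-in for df['cd/a']: many low values, a few high ones.
toy_efficiency = pd.Series([1, 2, 2, 3, 3, 3, 4, 5, 8, 15, 30], dtype=float)

# Sample skewness: > 0 means the long tail points toward high values.
skewness = toy_efficiency.skew()

# For such a distribution the mean is pulled above the median.
mean_value = toy_efficiency.mean()
median_value = toy_efficiency.median()

print("skewness:", skewness)
print("mean:", mean_value, "median:", median_value)
```

On the real data, the same two lines applied to `df['cd/a']` would give the sign of the skew directly.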

Let's take a look at the distribution of the features as well. They are denoted by X below.

In [None]:
X = df[list(df.columns)[:-5]]
plt.rcParams['figure.figsize'] = (18.0, 18.0)
X.hist()

In [None]:
# we can also see the density of the distribution plot
X.plot(kind='density', subplots=True, layout=(7,3), sharex=False)

Based on the data, we can observe how the features correlate with efficiency.

In [None]:
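# See correlation of features with each other and with efficiency.
# Note: corr['cd/a'] below only works if 'cd/a' is one of the numeric columns of X;
# .iloc[:, 1:] drops the first numeric column before the correlations are computed.
corr = X.select_dtypes(include=['float64', 'int64']).iloc[:, 1:].corr()
plt.figure(figsize=(15, 15))
sns.heatmap(corr, vmax=1, square=True, cmap="YlGnBu", linecolor='black', annot=False)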
plt.yticks(rotation=0)
plt.xticks(rotation=90)
plt.show()

cor_dict = corr['cd/a'].to_dict()
del cor_dict['cd/a']
print("List the features in descending order of their correlation with cd/a:\n")
for ele in sorted(cor_dict.items(), key=lambda x: -abs(x[1])):
    print("{0}:   {1}".format(*ele))
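The same ranking can be written without the intermediate dictionary. This is a minimal sketch on a toy dataframe; the feature names and values are hypothetical, and only the `cd/a` column name is borrowed from the notebook. It assumes a pandas version that supports the `key` argument of `Series.sort_values` (1.1 or later).

```python
import pandas as pd

# Hypothetical toy data: two made-up features plus the efficiency column.
toy = pd.DataFrame({
    "EML_thickness": [10, 20, 30, 40, 50],
    "HTL_HOMO": [5.4, 5.1, 5.5, 5.2, 5.3],
    "cd/a": [2.0, 4.1, 5.9, 8.2, 9.8],
})

corr_toy = toy.corr()

# Drop the target's self-correlation, then sort by absolute correlation.
ranked = (
    corr_toy["cd/a"]
    .drop("cd/a")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(ranked)
```

Sorting on the absolute value keeps strongly negative correlations near the top while the printed numbers retain their sign.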