
# Step 1: Load Packages

In [1]:
import numpy as np 
import pandas as pd 
import requests
import io

# Step 2: Load Data

In [2]:
# Download the CSV file from the project's GitHub repository
url_1 = "https://raw.githubusercontent.com/Kensuzuki95/Corporate_AI_Ethics_Guideline_Analysis/main/Dataset/Dataset_Filtered.csv"
response = requests.get(url_1, timeout=30)
response.raise_for_status()  # stop early if the download failed

dataset = pd.read_csv(io.StringIO(response.content.decode('utf-8')))

dataset.head()

| | No. | Company Name | Country | Industry | Published Year | Last Revised | Link | Document Name | Main Text | Comment |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Accenture | Ireland | Consulting | 03-30-2021 | 03-30-2021 | https://www.accenture.com/content/dam/accentur... | Responsible AI From principles to practice | Responsible AI\r\nFrom principles to practice\... | Addtional Details: https://www.accenture.com/u... |
| 1 | 2 | Adobe | United States of America | Software | | | https://www.adobe.com/content/dam/cc/en/ai-eth... | Adobe’s Commitment to AI Ethics | Adobe’s Commitment to AI Ethics\r\nAt Adobe, o... | Addtional Details: https://www.adobe.com/conte... |
| 2 | 3 | Alphabet | United States of America | Software | | | https://ai.google/responsibilities/responsible... | Responsible AI practices | Responsible AI practices\r\nThe development of... | Addtional Information: https://ai.google/princ... |
| 3 | 4 | Amazon | United States of America | Software | | | https://d1.awsstatic.com/responsible-machine-l... | Responsible Use of Machine Learning | Responsible Use of Machine Learning\r\nAt AWS,... | |
| 4 | 5 | Atos | France | Consulting | | | https://atos.net/en/lp/cybersecurity-magazine-... | The Atos Blueprint for Responsible AI | AI is a broad topic encompassing many differen... | |


## Clean the Dataset Format

In [3]:
# Check for unnecessary columns
dataset.columns

Index(['No.', 'Company Name', 'Country', 'Industry', 'Published Year',
       'Last Revised', 'Link', 'Document Name', 'Main Text', 'Comment'],
      dtype='object')

In [4]:
# Keep only the company name, document name and main text
text_data = dataset.drop(columns=['No.', 'Country', 'Industry', 'Published Year', 'Last Revised', 'Link', 'Comment'])
text_data.info()
text_data.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 49 entries, 0 to 48
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Company Name   49 non-null     object
 1   Document Name  49 non-null     object
 2   Main Text      49 non-null     object
dtypes: object(3)
memory usage: 1.3+ KB


| | Company Name | Document Name | Main Text |
|---|---|---|---|
| 0 | Accenture | Responsible AI From principles to practice | Responsible AI\r\nFrom principles to practice\... |
| 1 | Adobe | Adobe’s Commitment to AI Ethics | Adobe’s Commitment to AI Ethics\r\nAt Adobe, o... |
| 2 | Alphabet | Responsible AI practices | Responsible AI practices\r\nThe development of... |
| 3 | Amazon | Responsible Use of Machine Learning | Responsible Use of Machine Learning\r\nAt AWS,... |
| 4 | Atos | The Atos Blueprint for Responsible AI | AI is a broad topic encompassing many differen... |


# Step 3: Data Cleaning

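Since the goal of this analysis is to measure how closely each company's document matches UNESCO's AI ethics principles, we focus solely on the text of each document. The other metadata columns were already dropped above; only the company and document names are kept for reference.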

## Collapse extra whitespace

Next, let's normalize the whitespace in the Main Text column to make the text more amenable to analysis. Splitting each document on whitespace and re-joining the pieces with single spaces collapses line breaks (\r\n), tabs, and repeated spaces.
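
To see what this normalization does, here is a quick check on a made-up string rather than the real dataset. With no argument, str.split() splits on any run of whitespace, and str.join(' ') stitches the pieces back together with single spaces.

```python
import pandas as pd

# Made-up text containing Windows line breaks, a tab and repeated spaces
sample = pd.Series(["Responsible AI\r\nFrom principles   to\tpractice  "])

# Split on any run of whitespace, then re-join the tokens with single spaces
normalized = sample.str.split().str.join(" ")
print(normalized[0])  # Responsible AI From principles to practice
```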

In [5]:
# Collapse every run of whitespace (spaces, tabs, \r\n line breaks) into a single space
text_data['w/o_white_space'] = text_data['Main Text'].str.split().str.join(' ')
text_data.info()
text_data.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 49 entries, 0 to 48
Data columns (total 4 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Company Name     49 non-null     object
 1   Document Name    49 non-null     object
 2   Main Text        49 non-null     object
 3   w/o_white_space  49 non-null     object
dtypes: object(4)
memory usage: 1.7+ KB


| | Company Name | Document Name | Main Text | w/o_white_space |
|---|---|---|---|---|
| 0 | Accenture | Responsible AI From principles to practice | Responsible AI\r\nFrom principles to practice\... | Responsible AI From principles to practice Con... |
| 1 | Adobe | Adobe’s Commitment to AI Ethics | Adobe’s Commitment to AI Ethics\r\nAt Adobe, o... | Adobe’s Commitment to AI Ethics At Adobe, our ... |
| 2 | Alphabet | Responsible AI practices | Responsible AI practices\r\nThe development of... | Responsible AI practices The development of AI... |
| 3 | Amazon | Responsible Use of Machine Learning | Responsible Use of Machine Learning\r\nAt AWS,... | Responsible Use of Machine Learning At AWS, we... |
| 4 | Atos | The Atos Blueprint for Responsible AI | AI is a broad topic encompassing many differen... | AI is a broad topic encompassing many differen... |


## Remove punctuation, convert to lower case, and remove stop words

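Next, we use a regular expression to remove common punctuation (commas, periods, exclamation and question marks, and parentheses) and convert the text to lowercase. We then split each document into words and drop English stop words using NLTK's stop-word list.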
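
The pattern [,.!?()] removes only those six characters. Colons, semicolons, hyphens, quotation marks and the typographic apostrophe survive, which is why "adobe’s" still appears in the output further down. If you want a broader cleanup, a possible alternative is to strip every character in Python's string.punctuation plus curly quotes. This is a swapped-in variant for comparison, not what the notebook runs, and the sample sentence is made up:

```python
import re
import string

# The pattern used in this notebook: commas, periods, !, ?, and parentheses only
narrow = re.compile(r"[,.!?()]")

# Hypothetical broader pattern: all ASCII punctuation plus curly quotes
broad = re.compile("[" + re.escape(string.punctuation + "‘’“”") + "]")

text = "Adobe’s AI principles: fairness, safety (and privacy)!"
print(narrow.sub("", text).lower())  # adobe’s ai principles: fairness safety and privacy
print(broad.sub("", text).lower())   # adobes ai principles fairness safety and privacy
```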

In [6]:
# Load the regular expression library
import re

# Remove punctuation (commas, periods, exclamation/question marks, parentheses)
text_data['lower_cased'] = text_data['w/o_white_space'].map(lambda x: re.sub(r'[,.!?()]', '', x))

# Convert the text to lowercase
text_data['lower_cased'] = text_data['lower_cased'].str.lower()

# Display the first rows of the dataset
text_data.head()

| | Company Name | Document Name | Main Text | w/o_white_space | lower_cased |
|---|---|---|---|---|---|
| 0 | Accenture | Responsible AI From principles to practice | Responsible AI\r\nFrom principles to practice\... | Responsible AI From principles to practice Con... | responsible ai from principles to practice con... |
| 1 | Adobe | Adobe’s Commitment to AI Ethics | Adobe’s Commitment to AI Ethics\r\nAt Adobe, o... | Adobe’s Commitment to AI Ethics At Adobe, our ... | adobe’s commitment to ai ethics at adobe our p... |
| 2 | Alphabet | Responsible AI practices | Responsible AI practices\r\nThe development of... | Responsible AI practices The development of AI... | responsible ai practices the development of ai... |
| 3 | Amazon | Responsible Use of Machine Learning | Responsible Use of Machine Learning\r\nAt AWS,... | Responsible Use of Machine Learning At AWS, we... | responsible use of machine learning at aws we ... |
| 4 | Atos | The Atos Blueprint for Responsible AI | AI is a broad topic encompassing many differen... | AI is a broad topic encompassing many differen... | ai is a broad topic encompassing many differen... |


In [7]:
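# Define a function that removes English stop words from whitespace-tokenized text
import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')  # fetch the stop-word corpus if it is not already present
# Use a new name so the `stopwords` module is not shadowed (re-running the cell stays safe)
stop_words = set(stopwords.words('english'))  # a set gives fast membership checks

def remove_stopwords(text):
  sw_removed = [word for word in text.split() if word.lower() not in stop_words]
  return " ".join(sw_removed)

# Apply the function to every document
text_data['w/o_stopwords'] = text_data['lower_cased'].apply(remove_stopwords)
text_data.head()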

| | Company Name | Document Name | Main Text | w/o_white_space | lower_cased | w/o_stopwords |
|---|---|---|---|---|---|---|
| 0 | Accenture | Responsible AI From principles to practice | Responsible AI\r\nFrom principles to practice\... | Responsible AI From principles to practice Con... | responsible ai from principles to practice con... | responsible ai principles practice contents re... |
| 1 | Adobe | Adobe’s Commitment to AI Ethics | Adobe’s Commitment to AI Ethics\r\nAt Adobe, o... | Adobe’s Commitment to AI Ethics At Adobe, our ... | adobe’s commitment to ai ethics at adobe our p... | adobe’s commitment ai ethics adobe purpose ser... |
| 2 | Alphabet | Responsible AI practices | Responsible AI practices\r\nThe development of... | Responsible AI practices The development of AI... | responsible ai practices the development of ai... | responsible ai practices development ai creati... |
| 3 | Amazon | Responsible Use of Machine Learning | Responsible Use of Machine Learning\r\nAt AWS,... | Responsible Use of Machine Learning At AWS, we... | responsible use of machine learning at aws we ... | responsible use machine learning aws proud sup... |
| 4 | Atos | The Atos Blueprint for Responsible AI | AI is a broad topic encompassing many differen... | AI is a broad topic encompassing many differen... | ai is a broad topic encompassing many differen... | ai broad topic encompassing many different fam... |
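
One caveat: NLTK's English stop-word list includes the negations "no", "nor" and "not". Filtering with it can therefore drop words that invert the meaning of a commitment. Whether that affects the similarity scores here is untested. The sketch below shows one way to keep negations. It uses a small hand-written set (sample_stop_words, a stand-in for stopwords.words('english')) so it runs without downloading the corpus. The names filter_stop_words and sample_stop_words are hypothetical.

```python
# Stand-in for NLTK's English stop-word list; the real list is much longer
sample_stop_words = {"we", "will", "the", "a", "for", "no", "nor", "not"}

# Remove negations from the stop set so statements like "will not deploy" stay negated
negations = {"no", "nor", "not"}
stop_words_keep_negation = sample_stop_words - negations

def filter_stop_words(text, stop_set):
    """Drop every whitespace-separated token that appears in stop_set."""
    return " ".join(word for word in text.split() if word not in stop_set)

sentence = "we will not deploy ai for mass surveillance"
print(filter_stop_words(sentence, sample_stop_words))         # deploy ai mass surveillance
print(filter_stop_words(sentence, stop_words_keep_negation))  # not deploy ai mass surveillance
```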


# Step 4: Measure Text Similarity



## Create a Sentence-BERT (SBERT) Text Similarity Scoring Model
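
The scores in this step are cosine similarities between embedding vectors. Cosine similarity is the dot product of two vectors divided by the product of their lengths. It ranges from -1 to 1, and higher values mean the two texts point in a more similar direction in embedding space. As a quick check, the sketch below computes the formula by hand with NumPy and compares it with scikit-learn's cosine_similarity, which the similarity function below uses. The 3-dimensional vectors are made up; all-mpnet-base-v2 actually produces 768-dimensional embeddings.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the L2 norms."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Made-up 3-dimensional "embeddings" for illustration only
principle = np.array([1.0, 2.0, 0.0])
documents = np.array([[2.0, 4.0, 0.0],    # same direction as the principle
                      [0.0, 0.0, 3.0]])   # orthogonal to the principle

manual = [cosine(principle, doc) for doc in documents]
library = cosine_similarity([principle], documents)[0]
print(manual)
print(np.allclose(manual, library))  # True
```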

In [8]:
!pip install sentence_transformers
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-mpnet-base-v2')

In [9]:
# Create a DataFrame with only the company name and the preprocessed main text of each AI ethics guideline document
similarity_data = text_data[['Company Name','w/o_stopwords']].copy()
similarity_data = similarity_data.rename(columns = {'w/o_stopwords':'main_text'})
sentences = similarity_data['main_text'].values.tolist()
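
Each entry in sentences is a whole preprocessed document. However, all-mpnet-base-v2 truncates input longer than 384 word pieces (its max_seq_length), so model.encode only sees the beginning of each long guideline. One common workaround, not used in this notebook, is to split each document into overlapping word windows, embed each window, and average the chunk embeddings. This is a hypothetical sketch. The names chunk_words and embed_long_text are illustrative, and the 200-word window with a 50-word overlap is an assumption meant to stay under the token limit for typical English text, not a tuned setting.

```python
import numpy as np

def chunk_words(text, size=200, overlap=50):
    """Split text into windows of `size` words, each overlapping the previous
    window by `overlap` words (hypothetical helper)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    # Stop once a window would add no words beyond the previous window's overlap
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[start:start + size]) for start in starts]

def embed_long_text(model, text, size=200, overlap=50):
    """Embed every chunk and return the mean embedding (hypothetical helper).
    Cosine similarity ignores vector length, so the mean is not re-normalized."""
    chunk_embeddings = model.encode(chunk_words(text, size, overlap))
    return np.mean(chunk_embeddings, axis=0)

# Usage with the objects in this notebook (not executed here):
# doc_embedding = embed_long_text(model, sentences[0])
```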

In [10]:
# Load UNESCO's AI ethics principles dataset
url_2 = "https://raw.githubusercontent.com/Kensuzuki95/Corporate_AI_Ethics_Guideline_Analysis/main/Dataset/UNESCO_AI_Ethics_Principles.csv"
response = requests.get(url_2, timeout=30)
response.raise_for_status()  # stop early if the download failed

principles = pd.read_csv(io.StringIO(response.content.decode('utf-8')))
principles.head()

| | No. | Principle Name | Content |
|---|---|---|---|
| 0 | 1 | Proportionality and Do No Harm | It should be recognized that AI technologies d... |
| 1 | 2 | Safety and security | Unwanted harms (safety risks), as well as vuln... |
| 2 | 3 | Fairness and non-discrimination | AI actors should promote social justice and sa... |
| 3 | 4 | Sustainability | The development of sustainable societies relie... |
| 4 | 5 | Right to Privacy, and Data Protection | Privacy, a right essential to the protection o... |


In [11]:
# Define a function that scores every document against one UNESCO principle
from sklearn.metrics.pairwise import cosine_similarity

def similarity_score(principle_number):
  """Print UNESCO principle `principle_number` (1-based) and return a list with the
  cosine similarity between that principle and each company document."""
  principle = principles.iloc[principle_number - 1]
  # Encode the principle text (row 0) together with all documents (rows 1..n)
  embeddings = model.encode([principle['Content']] + sentences)
  # Compare the principle embedding against every document embedding
  scores = cosine_similarity(embeddings[:1], embeddings[1:])[0].tolist()
  print("UNESCO AI Ethics Principle #" + str(principle_number) + ":"
        + '\n' + str(principle['Principle Name'])
        + '\n' + str(principle['Content']))
  return scores
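
similarity_score re-encodes all 49 documents on every call, so the ten principle cells below encode the same documents ten times. The document embeddings do not depend on the principle, so an alternative is to encode documents and principles once each and let a single cosine_similarity call produce the whole company-by-principle matrix. This is a hedged sketch, and score_all_principles is a hypothetical helper. It assumes the principles CSV contains exactly the ten principles scored here. Its scores should match the per-principle cells up to small floating-point differences from batching.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

def score_all_principles(model, principle_texts, documents, company_names):
    """Return a DataFrame of cosine similarities with one row per document and
    one column per principle (Principle_1, Principle_2, ...)."""
    doc_embeddings = model.encode(documents)              # shape (n_docs, dim)
    principle_embeddings = model.encode(principle_texts)  # shape (n_principles, dim)
    # Entry [i, j] is the similarity of document i to principle j
    scores = cosine_similarity(doc_embeddings, principle_embeddings)
    columns = [f"Principle_{i}" for i in range(1, len(principle_texts) + 1)]
    result = pd.DataFrame(scores, columns=columns)
    result.insert(0, "Company Name", list(company_names))
    return result

# Usage with the objects in this notebook (not executed here):
# final_results = score_all_principles(model, principles['Content'].tolist(),
#                                      sentences, similarity_data['Company Name'])
```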

## Principle 1

In [12]:
# Produce similarity scores of all documents against Principle #1
results_1 = similarity_score(principle_number = 1)
#results_1

UNESCO AI Ethics Principle #1:
Proportionality and Do No Harm
It should be recognized that AI technologies do not necessarily, per se, ensure human and environmental and ecosystem flourishing. Furthermore, none of the processes related to the AI system life cycle shall exceed what is necessary to achieve legitimate aims or objectives and should be appropriate to the context. In the event of possible occurrence of any harm to human beings, human rights and fundamental freedoms, communities and society at large or the environment and ecosystems, the implementation of procedures for risk assessment and the adoption of measures in order to preclude the occurrence of such harm should be ensured.
The choice to use AI systems and which AI method to use should be justified in the following ways: (a) the AI method chosen should be appropriate and proportional to achieve a given legitimate aim; (b) the AI method chosen should not infringe upon the foundational values captured in this document, in

In [13]:
# Create a new column storing the results in the Dataset
similarity_data['Principle_1'] = results_1
similarity_data = similarity_data.loc[:,~similarity_data.T.duplicated(keep='first')]
#similarity_data.head()
#similarity_data.info()
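
The line similarity_data.loc[:, ~similarity_data.T.duplicated(keep='first')] is repeated after each principle. It transposes the DataFrame so every column becomes a row, flags rows that repeat an earlier row, and keeps only the unflagged columns. In effect, it drops any column whose values exactly duplicate an earlier column. A toy example with made-up numbers:

```python
import pandas as pd

df = pd.DataFrame({"Company Name": ["A", "B"],
                   "Principle_1": [0.7, 0.5],
                   "Principle_2": [0.6, 0.4],
                   "copy_of_P1": [0.7, 0.5]})   # same values as Principle_1

# Transpose so columns become rows, flag repeated rows, keep the first of each
deduplicated = df.loc[:, ~df.T.duplicated(keep="first")]
print(list(deduplicated.columns))  # ['Company Name', 'Principle_1', 'Principle_2']
```

Because the check compares values rather than names, two different principles that happened to produce identical scores would also be collapsed. Re-assigning an existing column name (for example, by re-running a cell) simply overwrites that column and never creates a duplicate.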

## Principle 2

In [14]:
# Produce similarity scores of all documents against Principle #2
results_2 = similarity_score(principle_number = 2)
#results_2

UNESCO AI Ethics Principle #2:
Safety and security
Unwanted harms (safety risks), as well as vulnerabilities to attack (security risks) should be avoided and should be addressed, prevented and eliminated throughout the life cycle of AI systems to ensure human, environmental and ecosystem safety and security. Safe and secure AI will be enabled by the development of sustainable, privacyprotective data access frameworks that foster better training and validation of AI models utilizing quality data.


In [15]:
# Create a new column storing the results in the Dataset
similarity_data['Principle_2'] = results_2
similarity_data = similarity_data.loc[:,~similarity_data.T.duplicated(keep='first')]
#similarity_data.head()
#similarity_data.info()

## Principle 3

In [16]:
# Produce similarity scores of all documents against Principle #3
results_3 = similarity_score(principle_number = 3)
#results_3

UNESCO AI Ethics Principle #3:
Fairness and non-discrimination
AI actors should promote social justice and safeguard fairness and non-discrimination of any kind in compliance with international law. This implies an inclusive approach to ensuring that the benefits of AI technologies are available and accessible to all, taking into consideration the specific needs of different age groups, cultural systems, different language groups, persons with disabilities, girls and women, and disadvantaged, marginalized and vulnerable people or people in vulnerable situations. Member States should work to promote inclusive access for all, including local communities, to AI systems with locally relevant content and services, and with respect for multilingualism and cultural diversity. Member States should work to tackle digital divides and ensure inclusive access to and participation in the development of AI. At the national level, Member States should promote equity between rural and urban areas, and 

In [17]:
# Create a new column storing the results in the Dataset
similarity_data['Principle_3'] = results_3
similarity_data = similarity_data.loc[:,~similarity_data.T.duplicated(keep='first')]
#similarity_data.head()
#similarity_data.info()

## Principle 4

In [18]:
# Produce similarity scores of all documents against Principle #4
results_4 = similarity_score(principle_number = 4)
#results_4

UNESCO AI Ethics Principle #4:
Sustainability
The development of sustainable societies relies on the achievement of a complex set of objectives on a continuum of human, social, cultural, economic and environmental dimensions. The advent of AI technologies can either benefit sustainability objectives or hinder their realization, depending on how they are applied across countries with varying levels of development. The continuous assessment of the human, social, cultural, economic and environmental impact of AI technologies should therefore be carried out with full cognizance of the implications of AI technologies for sustainability as a set of constantly evolving goals across a range of dimensions, such as currently identified in the Sustainable Development Goals (SDGs) of the United Nations. 


In [19]:
# Create a new column storing the results in the Dataset
similarity_data['Principle_4'] = results_4
similarity_data = similarity_data.loc[:,~similarity_data.T.duplicated(keep='first')]
#similarity_data.head()
#similarity_data.info()

## Principle 5

In [20]:
# Produce similarity scores of all documents against Principle #5
results_5 = similarity_score(principle_number = 5)
#results_5

UNESCO AI Ethics Principle #5:
Right to Privacy, and Data Protection
Privacy, a right essential to the protection of human dignity, human autonomy and human agency, must be respected, protected and promoted throughout the life cycle of AI systems. It is important that data for AI systems be collected, used, shared, archived and deleted in ways that are consistent with international law and in line with the values and principles set forth in this Recommendation, while respecting relevant national, regional and international legal frameworks.
Adequate data protection frameworks and governance mechanisms should be established in a multi-stakeholder approach at the national or international level, protected by judicial systems, and ensured throughout the life cycle of AI systems. Data protection frameworks and any related mechanisms should take reference from international data protection principles and standards concerning the collection, use and disclosure of personal data and exercise of

In [21]:
# Create a new column storing the results in the Dataset
similarity_data['Principle_5'] = results_5
similarity_data = similarity_data.loc[:,~similarity_data.T.duplicated(keep='first')]
#similarity_data.head()
#similarity_data.info()

## Principle 6

In [22]:
# Produce similarity scores of all documents against Principle #6
results_6 = similarity_score(principle_number = 6)
#results_6

UNESCO AI Ethics Principle #6:
Human oversight and determination 
Member States should ensure that it is always possible to attribute ethical and legal responsibility for any stage of the life cycle of AI systems, as well as in cases of remedy related to AI systems, to physical persons or to existing legal entities. Human oversight refers thus not only to individual human oversight, but to inclusive public oversight, as appropriate.
It may be the case that sometimes humans would choose to rely on AI systems for reasons of efficacy, but the decision to cede control in limited contexts remains that of humans, as humans can resort to AI systems in decision-making and acting, but an AI system can never replace ultimate human responsibility and accountability. As a rule, life and death decisions should not be ceded to AI systems.


In [23]:
# Create a new column storing the results in the Dataset
similarity_data['Principle_6'] = results_6
similarity_data = similarity_data.loc[:,~similarity_data.T.duplicated(keep='first')]
#similarity_data.head()
#similarity_data.info()

## Principle 7

In [24]:
# Produce similarity scores of all documents against Principle #7
results_7 = similarity_score(principle_number = 7)
#results_7

UNESCO AI Ethics Principle #7:
Transparency and explainability
The transparency and explainability of AI systems are often essential preconditions to ensure the respect, protection and promotion of human rights, fundamental freedoms and ethical principles. Transparency is necessary for relevant national and international liability regimes to work effectively. A lack of transparency could also undermine the possibility of effectively challenging decisions based on outcomes produced by AI systems and may thereby infringe the right to a fair trial and effective remedy, and limits the areas in which these systems can be legally used.
While efforts need to be made to increase transparency and explainability of AI systems, including those with extra-territorial impact, throughout their life cycle to support democratic governance, the level of transparency and explainability should always be appropriate to the context and impact, as there may be a need to balance between transparency and expla

In [25]:
# Create a new column storing the results in the Dataset
similarity_data['Principle_7'] = results_7
similarity_data = similarity_data.loc[:,~similarity_data.T.duplicated(keep='first')]
#similarity_data.head()
#similarity_data.info()

## Principle 8

In [26]:
# Produce similarity scores of all documents against Principle #8
results_8 = similarity_score(principle_number = 8)
#results_8

UNESCO AI Ethics Principle #8:
Responsibility and accountability
AI actors and Member States should respect, protect and promote human rights and fundamental freedoms, and should also promote the protection of the environment and ecosystems, assuming their respective ethical and legal responsibility, in accordance with national and international law, in particular Member States’ human rights obligations, and ethical guidance throughout the life cycle of AI systems, including with respect to AI actors within their effective territory and control. The ethical responsibility and liability for the decisions and  actions based in any way on an AI system should always ultimately be attributable to AI actors corresponding to their role in the life cycle of the AI system.
Appropriate oversight, impact assessment, audit and due diligence mechanisms, including whistle-blowers’ protection, should be developed to ensure accountability for AI systems and their impact throughout their life cycle. Bot

In [27]:
# Create a new column storing the results in the Dataset
similarity_data['Principle_8'] = results_8
similarity_data = similarity_data.loc[:,~similarity_data.T.duplicated(keep='first')]
#similarity_data.head()
#similarity_data.info()

## Principle 9

In [28]:
# Produce similarity scores of all documents against Principle #9
results_9 = similarity_score(principle_number = 9)
#results_9

UNESCO AI Ethics Principle #9:
Awareness and literacy 
Learning about the impact of AI systems should include learning about, through and for human rights and fundamental freedoms, meaning that the approach and understanding of AI systems should be grounded by their impact on human rights and access to rights, as well as on the environment and ecosystems


In [29]:
# Create a new column storing the results in the Dataset
similarity_data['Principle_9'] = results_9
similarity_data = similarity_data.loc[:,~similarity_data.T.duplicated(keep='first')]
#similarity_data.head()
#similarity_data.info()

## Principle 10

In [30]:
# Produce similarity scores of all documents against Principle #10
results_10 = similarity_score(principle_number = 10)
#results_10

UNESCO AI Ethics Principle #10:
Multi-stakeholder and adaptive governance and collaboration
International law and national sovereignty must be respected in the use of data. That means that States, complying with international law, can regulate the data generated within or passing through their territories, and take measures towards effective regulation of data, including data protection, based on respect for the right to privacy in accordance with international law and other human rights norms and standards.
Participation of different stakeholders throughout the AI system life cycle is necessary for inclusive approaches to AI governance, enabling the benefits to be shared by all, and to contribute to sustainable development. Stakeholders include but are not limited to governments, intergovernmental organizations, the technical community, civil society, researchers and academia, media, education, policy-makers, private sector companies, human rights institutions and equality bodies, anti

In [31]:
# Create a new column storing the results in the Dataset
similarity_data['Principle_10'] = results_10
similarity_data = similarity_data.loc[:,~similarity_data.T.duplicated(keep='first')]
#similarity_data.head()
#similarity_data.info()

# Step 5: Create Results Dataset

## Produce a CSV File

In [37]:
final_results = similarity_data.drop(columns = ['main_text'])
final_results.info()
final_results#.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 49 entries, 0 to 48
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Company Name  49 non-null     object 
 1   Principle_1   49 non-null     float64
 2   Principle_2   49 non-null     float64
 3   Principle_3   49 non-null     float64
 4   Principle_4   49 non-null     float64
 5   Principle_5   49 non-null     float64
 6   Principle_6   49 non-null     float64
 7   Principle_7   49 non-null     float64
 8   Principle_8   49 non-null     float64
 9   Principle_9   49 non-null     float64
 10  Principle_10  49 non-null     float64
dtypes: float64(10), object(1)
memory usage: 4.3+ KB


| | Company Name | Principle_1 | Principle_2 | Principle_3 | Principle_4 | Principle_5 | Principle_6 | Principle_7 | Principle_8 | Principle_9 | Principle_10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Accenture | 0.713096 | 0.669657 | 0.696445 | 0.659763 | 0.708105 | 0.661163 | 0.694236 | 0.713024 | 0.710053 | 0.652552 |
| 1 | Adobe | 0.557423 | 0.482382 | 0.595035 | 0.521534 | 0.59586 | 0.523106 | 0.572609 | 0.587471 | 0.59005 | 0.53577 |
| 2 | Alphabet | 0.590823 | 0.580485 | 0.569626 | 0.49617 | 0.61106 | 0.488678 | 0.579575 | 0.508057 | 0.576965 | 0.545252 |
| 3 | Amazon | 0.547408 | 0.545269 | 0.531693 | 0.449577 | 0.576093 | 0.445394 | 0.551653 | 0.509662 | 0.524196 | 0.548336 |
| 4 | Atos | 0.674369 | 0.625587 | 0.627129 | 0.600989 | 0.668474 | 0.61354 | 0.677598 | 0.686376 | 0.658663 | 0.609365 |
| 5 | Capgemini | 0.631962 | 0.589781 | 0.644962 | 0.596583 | 0.683541 | 0.610688 | 0.65579 | 0.678433 | 0.657638 | 0.602237 |
| 6 | Cisco | 0.621231 | 0.669476 | 0.636864 | 0.532768 | 0.675567 | 0.62118 | 0.665532 | 0.679788 | 0.65093 | 0.644216 |
| 7 | Facebook | 0.626738 | 0.581239 | 0.68203 | 0.567379 | 0.681261 | 0.613212 | 0.655881 | 0.62665 | 0.654343 | 0.648579 |
| 8 | FUJIFILM | 0.567884 | 0.471385 | 0.572382 | 0.52742 | 0.520009 | 0.503902 | 0.530379 | 0.54032 | 0.552501 | 0.525709 |
| 9 | Fujitsu Ltd. | 0.57403 | 0.483531 | 0.632524 | 0.600331 | 0.575793 | 0.530526 | 0.577927 | 0.599841 | 0.573774 | 0.577466 |


In [34]:
# Save the similarity results as a CSV file in the working directory (in Colab, the runtime's file system)
final_results.to_csv('Similarity_Analysis_Results.csv', index=False)
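
In Colab, to_csv writes the file to the runtime's file system rather than to your own computer. To also download it through the browser, Colab provides google.colab.files.download. The hypothetical helper below writes the CSV and triggers that download only when the google.colab module is importable, so it runs unchanged outside Colab.

```python
import pandas as pd

def save_results(df, filename):
    """Write df to CSV in the working directory and, when running inside
    Google Colab, also trigger a browser download of that file."""
    df.to_csv(filename, index=False)
    try:
        from google.colab import files  # this module only exists inside Colab
    except ImportError:
        return False  # outside Colab: the CSV stays in the working directory
    files.download(filename)
    return True

# Usage with the results in this notebook (not executed here):
# save_results(final_results, 'Similarity_Analysis_Results.csv')
```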

## Convert the similarity results to each principle's degree of representation

In [38]:
principle_representation = final_results.set_index('Company Name')
principle_representation = principle_representation.div(principle_representation.sum(axis=1), axis=0).reset_index()
principle_representation.info()
principle_representation#.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 49 entries, 0 to 48
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Company Name  49 non-null     object 
 1   Principle_1   49 non-null     float64
 2   Principle_2   49 non-null     float64
 3   Principle_3   49 non-null     float64
 4   Principle_4   49 non-null     float64
 5   Principle_5   49 non-null     float64
 6   Principle_6   49 non-null     float64
 7   Principle_7   49 non-null     float64
 8   Principle_8   49 non-null     float64
 9   Principle_9   49 non-null     float64
 10  Principle_10  49 non-null     float64
dtypes: float64(10), object(1)
memory usage: 4.3+ KB


| | Company Name | Principle_1 | Principle_2 | Principle_3 | Principle_4 | Principle_5 | Principle_6 | Principle_7 | Principle_8 | Principle_9 | Principle_10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Accenture | 0.103676 | 0.097361 | 0.101255 | 0.095922 | 0.102951 | 0.096126 | 0.100934 | 0.103666 | 0.103234 | 0.094874 |
| 1 | Adobe | 0.100234 | 0.08674 | 0.106997 | 0.09378 | 0.107145 | 0.094063 | 0.102964 | 0.105637 | 0.1061 | 0.09634 |
| 2 | Alphabet | 0.106518 | 0.104654 | 0.102697 | 0.089453 | 0.110167 | 0.088103 | 0.10449 | 0.091596 | 0.10402 | 0.098302 |
| 3 | Amazon | 0.104681 | 0.104272 | 0.101676 | 0.085973 | 0.110167 | 0.085173 | 0.105493 | 0.097463 | 0.100242 | 0.104859 |
| 4 | Atos | 0.104682 | 0.097109 | 0.097349 | 0.093291 | 0.103767 | 0.095239 | 0.105183 | 0.106546 | 0.102244 | 0.094591 |
| 5 | Capgemini | 0.099496 | 0.092855 | 0.101543 | 0.093926 | 0.107617 | 0.096147 | 0.103248 | 0.106813 | 0.103539 | 0.094816 |
| 6 | Cisco | 0.097105 | 0.104646 | 0.099548 | 0.083277 | 0.105598 | 0.097096 | 0.104029 | 0.106258 | 0.101747 | 0.100697 |
| 7 | Facebook | 0.098896 | 0.091717 | 0.107621 | 0.08953 | 0.1075 | 0.096762 | 0.103495 | 0.098883 | 0.103252 | 0.102343 |
| 8 | FUJIFILM | 0.106908 | 0.088741 | 0.107755 | 0.09929 | 0.097895 | 0.094863 | 0.099848 | 0.101719 | 0.104012 | 0.098968 |
| 9 | Fujitsu Ltd. | 0.100254 | 0.084449 | 0.11047 | 0.104848 | 0.100562 | 0.092656 | 0.100935 | 0.104762 | 0.10021 | 0.100854 |
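
The representation step divides each company's ten similarity scores by their row sum, so each row becomes a set of shares that add up to 1. With ten principles, a perfectly even profile would give 0.1 to each. In the rows shown above, the shares range from about 0.083 to 0.110. The made-up example below repeats the same set_index / div / reset_index steps on two companies and three principles. Note that the shares read as proportions only when all scores are non-negative. Cosine similarity can be negative in principle, although every score in the rows shown above is positive.

```python
import pandas as pd

# Made-up similarity scores for two companies and three principles
scores = pd.DataFrame({"Company Name": ["A", "B"],
                       "Principle_1": [0.6, 0.2],
                       "Principle_2": [0.3, 0.2],
                       "Principle_3": [0.3, 0.4]})

shares = scores.set_index("Company Name")
# Divide every value by its row sum; axis=0 aligns the sums with the row index
shares = shares.div(shares.sum(axis=1), axis=0).reset_index()
print(shares)
```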


In [36]:
# Save the representation results as a CSV file in the working directory (in Colab, the runtime's file system)
principle_representation.to_csv('Principle_Representation_Results.csv', index=False)