### Personality and Its Transformations ###

An analysis of Prof. Jordan Peterson's lectures from his University of Toronto personality course.

> More about Prof. Peterson at https://www.jordanbpeterson.com/

---

In [None]:
# main imports
import os
import json
import textwrap

import pandas as pd
from tqdm import tqdm

from langchain import OpenAI, PromptTemplate, LLMChain
from langchain.docstore.document import Document
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import CharacterTextSplitter

# project settings such as OUTPUT_FOLDER and CLUSTERED_TOPICS_DATAFRAME_NAME
from constants import *

<div class="tomcolor8">  
<h4 style="background:#135e96; color:white ;font-size:15px;line-height:1em; text-align:left; padding: 20px">
      Credentials set-up</h4> 
 </div>

In [None]:
def load_api_keys(credentials_file_name: str = 'credentials.json') -> str:
    '''Load the OpenAI API key from a JSON credentials file

    Arguments:
        credentials_file_name: name of the file containing the credentials

    Returns:
        The OpenAI API key stored under the 'OPENAI_API_KEY' field

    Raises:
        FileNotFoundError: if the credentials file does not exist
        KeyError: if the file has no 'OPENAI_API_KEY' field
    '''

    if not os.path.exists(credentials_file_name):
        raise FileNotFoundError(f'No file {credentials_file_name}')

    # open the credentials file and parse its JSON content
    with open(credentials_file_name) as f:
        content = json.load(f)

    # load the API key
    return content['OPENAI_API_KEY']
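The loader expects the credentials file to be a JSON object with an `OPENAI_API_KEY` field. A minimal sketch of that layout, written to a temporary directory and read back with the standard library only; the key value is a hypothetical placeholder, not a real key:

```python
import json
import os
import tempfile

# hypothetical placeholder, not a real key
credentials = {'OPENAI_API_KEY': 'sk-placeholder'}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'credentials.json')

    # write the JSON object layout that load_api_keys reads
    with open(path, 'w') as f:
        json.dump(credentials, f)

    # read it back the same way load_api_keys does
    with open(path) as f:
        loaded_key = json.load(f)['OPENAI_API_KEY']

print(loaded_key)  # sk-placeholder
```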


In [None]:
# load the API key from the credentials file and set OPENAI_API_KEY as an
# environment variable
os.environ['OPENAI_API_KEY'] = load_api_keys('credentials.json')

<div class="tomcolor8">  
<h4 style="background:#135e96; color:white ;font-size:15px;line-height:1em; text-align:left; padding: 20px">
      Load the data</h4> 
</div>

The data consists of the lecture text and the summaries produced in the previous notebooks.

In [None]:
df = pd.read_csv(os.path.join(OUTPUT_FOLDER, f'{CLUSTERED_TOPICS_DATAFRAME_NAME}_with_summaries.csv'))

<div class="tomcolor8">  
<h4 style="background:#135e96; color:white ;font-size:15px;line-height:1em; text-align:left; padding: 20px">
      ELI5 sentence summaries</h4> 
</div>

The dataframe already contains a summary for each cluster of lecture text. The goal of this section is to condense each of those summaries into a single sentence phrased as simply as possible ("explain like I'm 5", ELI5). This is one step in gradually distilling the knowledge contained in the lecture notes.

<div class="tomcolor8">  
<h4 style="background:#135e96; color:white ;font-size:15px;line-height:1em; text-align:left; padding: 20px">
      Define text splitter</h4> 
</div>

In [None]:
llm = OpenAI(temperature = 0)
text_splitter = CharacterTextSplitter(chunk_size = 1000, chunk_overlap = 0, separator=' ')
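With `separator=' '`, `chunk_size=1000` and `chunk_overlap=0`, the splitter cuts the text at spaces and packs the pieces into chunks of at most roughly 1000 characters. The sketch below is a simplified, hypothetical stand-in for that behaviour (`split_on_spaces` is not LangChain's implementation, which differs in details), meant only to show the greedy packing idea:

```python
from typing import List


def split_on_spaces(text: str, chunk_size: int = 1000) -> List[str]:
    """Greedily pack space-separated words into chunks of at most chunk_size
    characters (a single word longer than chunk_size still gets its own chunk)."""
    chunks: List[str] = []
    current: List[str] = []
    length = 0
    for word in text.split(' '):
        # cost of adding this word, including the joining space
        extra = len(word) + (1 if current else 0)
        if current and length + extra > chunk_size:
            # the word does not fit: close the current chunk and start a new one
            chunks.append(' '.join(current))
            current, length = [word], len(word)
        else:
            current.append(word)
            length += extra
    if current:
        chunks.append(' '.join(current))
    return chunks


print(split_on_spaces('aaa bbb ccc', chunk_size=7))  # ['aaa bbb', 'ccc']
```

Because there is no overlap, joining the chunks back with a single space reproduces the original text.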

<div class="tomcolor8">  
<h4 style="background:#135e96; color:white ;font-size:15px;line-height:1em; text-align:left; padding: 20px">
      Define the prompts for the ELI5-type summary</h4> 
</div>

In [None]:
# main prompt for summarization
ELI5_SENTENCE_SUMMARY_PROMPT = PromptTemplate(
    input_variables=['text'],
    template = """
    You are given a text. Write a concise summary of the provided text. The summary must have a proper grammatical structure. It should begin with an uppercase letter and end with a full stop. The summary must not begin with a space or with a lowercase word, and it must be grammatically correct.
    Provided text: {text}"""
)

# refinement prompt
ELI5_SENTENCE_REFINE_PROMPT = PromptTemplate(
    input_variables = ['existing_answer', 'text'],
    template = """
    Your job is to produce the final one-sentence summary of a text. The final summary should be presented in the simplest terms possible. Explain it to me like I am 5. There is an existing summary up to a certain point: {existing_answer}. But there is an opportunity to refine the existing summary with some new context: {text}.
    ---
    Given the new information, refine the current summary. The final summary must be one sentence.
    """
)

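The instruction to generate a single sentence as the final output seems to be ignored by the refine-type summary chain. One plausible contributing factor: the refine chain applies the refine prompt, which carries the one-sentence instruction, only to the second and later chunks, so a summary that fits into a single chunk is produced by the question prompt alone, which asks only for a concise summary.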
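One possible workaround, not applied in this notebook, is to enforce the one-sentence constraint in post-processing instead of relying on the prompt. A minimal sketch, assuming a sentence ends at `.`, `!` or `?` followed by whitespace (so an abbreviation such as "Prof." would cut the sentence short); `first_sentence` is a hypothetical helper:

```python
import re


def first_sentence(text: str) -> str:
    """Return the first sentence of text, with surrounding whitespace removed."""
    text = text.strip()
    # split once, right after the first '.', '!' or '?' that is followed by whitespace
    return re.split(r'(?<=[.!?])\s+', text, maxsplit=1)[0]


summary = ('\nReading books, watching movies, and TV shows can help us understand '
           'what is important in life. Additionally, books can help us interpret '
           'the world around us.')
print(first_sentence(summary))
```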

In [None]:
# construct the summarization chain
eli5_sentence_summary_chain = load_summarize_chain(llm, chain_type = 'refine',
                                                   question_prompt = ELI5_SENTENCE_SUMMARY_PROMPT, 
                                                   refine_prompt = ELI5_SENTENCE_REFINE_PROMPT)
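The refine chain processes the chunks sequentially: the question prompt summarizes the first chunk, and the refine prompt is then called once for each remaining chunk, receiving the current answer as `existing_answer`. A stripped-down sketch of that control flow, with plain Python callables standing in for the two LLM calls (`refine_summarize` and the toy stand-ins are hypothetical, not LangChain's API):

```python
from typing import Callable, List


def refine_summarize(chunks: List[str],
                     summarize: Callable[[str], str],
                     refine: Callable[[str, str], str]) -> str:
    """Summarize the first chunk, then fold each later chunk into the answer."""
    if not chunks:
        return ''
    # first chunk: the role played by the question prompt
    answer = summarize(chunks[0])
    # every later chunk: the role played by the refine prompt,
    # which receives the running answer as `existing_answer`
    for chunk in chunks[1:]:
        answer = refine(answer, chunk)
    return answer


# toy stand-ins for the two LLM calls: keep the first word of each chunk
def first_word(text: str) -> str:
    return text.split()[0]


def append_first_word(existing: str, text: str) -> str:
    return existing + ' ' + first_word(text)


multi = refine_summarize(['alpha beta', 'gamma delta', 'epsilon'], first_word, append_first_word)
single = refine_summarize(['only chunk'], first_word, append_first_word)
print(multi)   # alpha gamma epsilon
print(single)  # only
```

With a single chunk, `refine` is never called, so any instruction that lives only in the refine prompt has no effect on that output.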

In [None]:
# list as a placeholder for the outputs; the first entry is noise, so there is
# no need to summarize it
eli5_sentence_summaries = ['[NOISE]']

# number of clusters excluding the noise cluster (22)
N = df.shape[0] - 1

# iterate over every summary except the noise entry
for n, doc in enumerate(df['summary'][1:], 1):

    # split the text into chunks using text_splitter
    texts = text_splitter.split_text(doc)
    docs = [Document(page_content=t) for t in texts]

    # run the summarization chain
    output = eli5_sentence_summary_chain({'input_documents': docs}, return_only_outputs=True)

    print(f'Done {n}/{N}')
    eli5_sentence_summaries.append(output)

Done 1/22
Done 2/22
Done 3/22
Done 4/22
Done 5/22
Done 6/22
Done 7/22
Done 8/22
Done 9/22
Done 10/22
Done 11/22
Done 12/22
Done 13/22
Done 14/22
Done 15/22
Done 16/22
Done 17/22
Done 18/22
Done 19/22
Done 20/22
Done 21/22
Done 22/22


In [193]:
# extract text from the generated dictionary
cleaned_up_sentence_summaries = [val['output_text'] for val in eli5_sentence_summaries[1:]]
cleaned_up_sentence_summaries.insert(0, '[NOISE]')
cleaned_up_sentence_summaries

['[NOISE]',
 "\nJordan Peterson's Course is a wide-ranging program that follows certain principles, such as not teaching anything irrelevant, and encourages students to read original source material, consider the implications of their decisions, strive for an optimal ratio domain, and explore the concept of morality being relative and the lack of ultimate meaning.",
 '\nReading books, watching movies, and TV shows can help us understand what is important in life. Additionally, books can help us interpret the world around us and the interplay of all of the elements of being, allowing us to follow threads of meaning and discover meaningful things by branching out a network and following a pathway.',
 '\nThis article examines how stories and rituals, passed down for thousands of years, have been used to shape the world by exploring human behavior, archetypal story structures, and shamanic rituals to understand the archetype of transformation.',
 "\nBabies are born prematurely with large h

In [194]:
df['eli5_sentence_summary'] = cleaned_up_sentence_summaries

<div class="tomcolor8">  
<h4 style="background:#135e96; color:white ;font-size:15px;line-height:1em; text-align:left; padding: 20px">
      Titles generation</h4> 
</div>

In [196]:
# main prompt for title generation
TITLE_PROMPT = PromptTemplate(
    input_variables=['text'],
    template = """
    You are given a text which is a brief summary of a part of a lecture. Write a title for this text. The title should be a couple of words long. Don't produce full sentences, only concise titles.
    Provided text: {text}"""
)

title_chain = LLMChain(llm = llm, prompt = TITLE_PROMPT)

# placeholder for generated titles; the first entry is the noise cluster
titles = ['[NOISE]']

for n, doc in enumerate(df['eli5_sentence_summary'][1:], 1):
    output = title_chain.run(doc)
    print(f'Done {n}/{N}')
    titles.append(output)

Done 1/22
Done 2/22
Done 3/22
Done 4/22
Done 5/22
Done 6/22
Done 7/22
Done 8/22
Done 9/22
Done 10/22
Done 11/22
Done 12/22
Done 13/22
Done 14/22
Done 15/22
Done 16/22
Done 17/22
Done 18/22
Done 19/22
Done 20/22
Done 21/22
Done 22/22


In [197]:
titles

['[NOISE]',
 '\n\nJordan Peterson Course Overview',
 '\n\nUnderstanding Life Through Media',
 '\n\n"Story and Rituals: Shaping the World"',
 "\n\nPremature Babies' Development",
 "\n\nPiaget's Theories and Games",
 '\n\nHumans and Fire.',
 '\n\n"Inadequacy of Meaning"',
 '\n\nPersonality Development',
 "\n\nWilder Penfield's Research",
 '\n\nPhenomenologists: Complex Group',
 '\n\nPsychology 230 Exam',
 '\n\nFinding Meaning in Life.',
 '\n\nConsidering Context and Perspective',
 '\n\nUnderstanding Anger Causes',
 '\n\nSelf-Consciousness Exploration',
 '\n\nFreud and Jung: Comparisons',
 '\n\nPersonality Self-Analysis Programs',
 '\n\nPersonality Psychology.',
 '\n\nInterpreting People and Health Domains',
 '\n\nRefining Solutions.',
 '\n\nRelationship Breakdowns',
 '\n\nLearning and Growth.']
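The raw titles above still carry the leading newlines returned by the model, and some are wrapped in quotation marks or end with a full stop. A minimal cleanup sketch, assuming bare title strings are what we want downstream (`clean_title` is a hypothetical helper, not part of the notebook); a plain `.strip()` would likewise remove the leading newline from each ELI5 summary:

```python
def clean_title(raw: str) -> str:
    """Strip whitespace, one pair of surrounding double quotes and trailing full stops."""
    title = raw.strip()
    # drop a pair of surrounding double quotes, e.g. '"Inadequacy of Meaning"'
    if len(title) >= 2 and title[0] == title[-1] == '"':
        title = title[1:-1].strip()
    # drop trailing full stops, e.g. 'Humans and Fire.'
    return title.rstrip('.').strip()


raw_titles = ['[NOISE]',
              '\n\n"Story and Rituals: Shaping the World"',
              '\n\nHumans and Fire.',
              "\n\nPiaget's Theories and Games"]
cleaned = [clean_title(t) for t in raw_titles]
print(cleaned)
```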

In [198]:
df['titles'] = titles

<div class="tomcolor8">  
<h4 style="background:#135e96; color:white ;font-size:15px;line-height:1em; text-align:left; padding: 20px">
      Triples construction</h4> 
</div>

Triples are groups of three words or phrases in the form (subject, predicate, object): the middle element, the predicate, describes how the first element (the subject) relates to the last one (the object). Triples are commonly used to build knowledge graphs.
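To make the (subject, predicate, object) shape concrete, here is a minimal sketch that stores triples as named tuples and indexes them by subject, the simplest form of a knowledge graph. The `Triple` class and `build_graph` helper are hypothetical illustrations, not part of the notebook; the sample triples are copied from the model output for cluster 5 shown in this notebook:

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def build_graph(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each subject to its outgoing (predicate, object) edges."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for t in triples:
        graph[t.subject].append((t.predicate, t.obj))
    return dict(graph)


example = [
    Triple('Humans', 'master', 'fire'),
    Triple('Humans', 'unique ability', 'use language'),
    Triple('Dogs', 'genetically similar', 'wolves'),
]
graph = build_graph(example)
print(graph['Humans'])  # [('master', 'fire'), ('unique ability', 'use language')]
```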

In [None]:
# triple generation prompt
TRIPLES_PROMPT = PromptTemplate(
    input_variables = ['doc'],
    template = 'Take the following text and extract only the key information out of it. Then take your answer and turn it into triples for a knowledge graph. Your final answer should be output in the form of triples. Your text: {doc}'
)

In [None]:
# a simple chain for triple generation
tri_chain = LLMChain(llm = llm, prompt = TRIPLES_PROMPT)

In [None]:
triples = []

for n, doc in enumerate(df['summary'][1:], 1):
    output = tri_chain.run(doc)
    print(f'Done {n}/{N}')
    triples.append(output)

Done 1/22
Done 2/22
Done 3/22
Done 4/22
Done 5/22
Done 6/22
Done 7/22
Done 8/22
Done 9/22
Done 10/22
Done 11/22
Done 12/22
Done 13/22
Done 14/22
Done 15/22
Done 16/22
Done 17/22
Done 18/22
Done 19/22
Done 20/22
Done 21/22
Done 22/22


### Triples overview

In [None]:
triples[1]

' you want to understand the world, you should read books, watch movies, and engage in conversations with people.\n\nReading books, watching movies, and engaging in conversations with people can help gain a better understanding of the world. Triple: (Subject: Understanding the World, Predicate: Can be Gained By, Object: Reading Books, Watching Movies, Engaging in Conversations)'

In [None]:
triples[5]

' have the unique ability to use language, which no other animal can do.\n\nTriples: \n(Dolphins, possess, self-consciousness) \n(Humans, master, fire) \n(2 million years ago, time of, mastering fire) \n(Wolves, establish hierarchy, dominance and submission strategies) \n(400 million years, time of, dominance and submission strategies) \n(Baboons, establish hierarchy, dominance and submission strategies) \n(50,000 years, time of, dominance and submission strategies) \n(Dogs, genetically similar, wolves) \n(Humans, unique ability, get along with dogs) \n(Humans, unique ability, learn and adapt quickly) \n(Humans, unique ability, use fire) \n(Humans, unique ability, use language)'

In [None]:
triples[11]

' implications of seemingly irrational decisions.\n\nFacts and useful facts differ in relevance criteria, and understanding why certain solutions make sense is important for active agents. Triple: (Facts, differ_in, relevance_criteria), (understanding, important_for, active_agents).'
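The raw outputs are free text rather than structured data, and the format varies between clusters: some list bare `(subject, predicate, object)` tuples, while others use labelled `Subject:`, `Predicate:` and `Object:` fields with a comma-separated object list. A minimal parsing sketch, assuming only the bare three-part form should be kept; anything else, including the labelled form, is skipped. `parse_triples` is a hypothetical helper:

```python
import re
from typing import List, Tuple

# text between a pair of parentheses that contains no nested parentheses
TRIPLE_PATTERN = re.compile(r'\(([^()]*)\)')


def parse_triples(raw: str) -> List[Tuple[str, ...]]:
    """Extract '(a, b, c)' groups that split into exactly three non-empty parts."""
    triples = []
    for group in TRIPLE_PATTERN.findall(raw):
        parts = [p.strip() for p in group.split(',')]
        if len(parts) == 3 and all(parts):
            triples.append(tuple(parts))
    return triples


raw = ('Triple: (Facts, differ_in, relevance_criteria), '
       '(understanding, important_for, active_agents).')
print(parse_triples(raw))
```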

In [None]:
df['summary'][18]

".  Personality is a complex field of study in psychology that encompasses multiple subdisciplines, such as clinical and experimental psychology, philosophy, engineering, and medicine. It is a mix of science and value-laden categories, as it is concerned with the promotion of health and wellness. When considering what a healthy personality might be, it is not just the absence of pathology, but rather the development of positive potential. Personality psychology is a wide-ranging course, with an emphasis on clinical theoreticians, such as psychoanalysts, depth psychologists, constructivists, humanists, existentialists, and phenomenologists. The last half of the course concentrates on two elements of personality theory; more modern elements such as psychometrics, which is an unpopular field of psychology due to the discovery of technical psychometric intelligence by engineers, and clinical psychologists' suggestion that running away from something one is afraid of can make it larger and 

In [None]:
triples.insert(0, '[NOISE]')

In [None]:
df['triples'] = triples

<div class="tomcolor8">  
<h4 style="background:#135e96; color:white ;font-size:15px;line-height:1em; text-align:left; padding: 20px">
      Export</h4> 
</div>

In [200]:
df.to_csv(f'{OUTPUT_FOLDER}/{CLUSTERED_TOPICS_DATAFRAME_NAME}_full_breakdown.csv')

In [199]:
df

| Unnamed: 0.2 | Unnamed: 0.1 | Unnamed: 0 | cluster_ids | text | summary | triples | eli5_sentence_summary | titles |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | -1 | So, there’s a website—I don’t really like Blac... | [NOISE_CLUSTER] | [NOISE] | [NOISE] | [NOISE] |
| 1 | 1 | 1 | 0 | So, what I’m going to do today—how I’m going t... | Jordan Peterson's course is wide-ranging and... | \n\nTriple 1: Subject: Jordan Peterson's Cours... | \nJordan Peterson's Course is a wide-ranging p... | \n\nJordan Peterson Course Overview |
| 2 | 2 | 2 | 1 | So the first issue is that there’s a lot of re... | The speaker suggests reading novels, literat... | you want to understand the world, you should ... | \nReading books, watching movies, and TV shows... | \n\nUnderstanding Life Through Media |
| 3 | 3 | 3 | 2 | But I think we might as well jump right into t... | This article discusses rituals, stories, and... | Beauty, have been used to explain the human c... | \nThis article examines how stories and ritual... | \n\n"Story and Rituals: Shaping the World" |
| 4 | 4 | 4 | 3 | So when we’re first born—we’re very primitive ... | . Babies are born with limited control of the... | up objects and put them in their mouths.\nTri... | \nBabies are born prematurely with large heads... | \n\nPremature Babies' Development |
| 5 | 5 | 5 | 4 | And so in the mirror test what you essentially... | . The Mirror Test is used to assess the socia... | to learn about the world and to develop their... | \nPiaget's theories suggest that children come... | \n\nPiaget's Theories and Games |
| 6 | 6 | 6 | 5 | Now, dolphins seem to be able to manage that, ... | Dolphins, crows, ravens, whales, and humans ... | have the unique ability to use language, whic... | \nHumans have the unique ability to recognize ... | \n\nHumans and Fire. |
| 7 | 7 | 7 | 6 | Now, part of the reason for that is that Nietz... | Nietzsche argued that philosophers are often... | need to create new systems of meaning that co... | \nNietzsche and Dostoevsky argued that the sys... | \n\n"Inadequacy of Meaning" |
| 8 | 8 | 8 | 7 | It brings in elements of cultural history, ele... | . This discussion focuses on the elements of ... | \n\nTriple 1: Human Personality - Shaped By - ... | \nPersonality is a complex phenomenon, shaped ... | \n\nPersonality Development |
| 9 | 9 | 9 | 8 | Most of the brain is structured with the olfac... | . Humans have a unique brain structure with th... | allowing for the highest resolution of vision... | \nWilder Penfield was a neurosurgeon who studi... | \n\nWilder Penfield's Research |
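The `Unnamed: 0`, `Unnamed: 0.1` and `Unnamed: 0.2` columns are most likely leftover index columns: `DataFrame.to_csv` writes the index by default as an unnamed first column, and `pd.read_csv` reads it back as an ordinary column called `Unnamed: 0`, so each save/load round trip adds another one. A small sketch of two standard ways to avoid this, using a toy dataframe rather than the real data:

```python
import io

import pandas as pd

toy = pd.DataFrame({'cluster_ids': [-1, 0],
                    'titles': ['[NOISE]', 'Jordan Peterson Course Overview']})

# default round trip: the index is written as an unnamed first column ...
buffer = io.StringIO()
toy.to_csv(buffer)
buffer.seek(0)
naive = pd.read_csv(buffer)  # ... and comes back as a column named 'Unnamed: 0'

# option 1: do not write the index at all
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
clean = pd.read_csv(buffer)

# option 2: write the index, but read the first column back as the index
buffer = io.StringIO()
toy.to_csv(buffer)
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

print(list(naive.columns))     # ['Unnamed: 0', 'cluster_ids', 'titles']
print(list(restored.columns))  # ['cluster_ids', 'titles']
```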
