# What Tone Do Top Philosophers Write In?

## Project 1 by Christopher Halim

### The history of philosophy is vast, and there are many ways to approach it: one could begin an analysis in antiquity (BC) or at the beginning of the 19th century. For this analysis, I am interested in exploring how philosophers' writing changes over time.

### My objective for this analysis is to answer three questions:
#### 1. What are the most widely used words, classified by author and school?
#### 2. Which philosophers have the most sentences in the dataset?
#### 3. What is the sentiment of the top 5 philosophers by number of sentences? I aim to explore the different ways these philosophers write their own narratives.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
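import nltk
import os

# Download the NLTK resources used below (stopword list and VADER lexicon)
nltk.download('stopwords')
nltk.download('vader_lexicon')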

## First, we import the dataset and explore it

In [None]:
df = pd.read_csv("/Users/christopherhalim888/Downloads/philosophy_data.csv")
df.head()

In [None]:
df.info()
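A related check, assuming `sentence_lowered` may contain missing values (an assumption, not something verified on this dataset): the `" ".join(...)` calls in the word-cloud cells raise a `TypeError` if they meet a `NaN`, because `NaN` is a float rather than a string. The sketch below, on a small made-up DataFrame, counts missing values per column and drops incomplete rows before joining.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the philosophy dataset (hypothetical rows, not real data)
toy = pd.DataFrame({
    "author": ["Plato", "Plato", "Aristotle"],
    "sentence_lowered": ["what is justice?", np.nan, "all men by nature desire to know."],
})

# Count missing values per column before building any text
missing = toy.isna().sum()
print(missing)

# Drop rows whose sentence text is missing so " ".join(...) only sees strings
clean = toy.dropna(subset=["sentence_lowered"])
text = " ".join(clean["sentence_lowered"])
print(text)
```

On the real data, the same steps would be `df.isna().sum()` and `df.dropna(subset=['sentence_lowered'])`.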

### Next, I look at the frequency of each title, author, school, and corpus edition date to get a sense of the data's overall distribution before going further.

In [None]:
features_cat = ['title', 'author', 'school','corpus_edition_date']

for f in features_cat:
    plt.figure(figsize=(14,5))
    df[f].value_counts().plot(kind='bar')
    plt.title(f)
    plt.grid()
    plt.savefig(f+".jpeg")
    plt.show()

### As the bar charts above show, Aristotle contributes the most sentences (rows) of any author, while the analytic school and the 1997 corpus edition date are the most frequent values for school and corpus edition date, respectively.
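Reading exact shares off a bar chart is hard when one category dominates. A minimal sketch of the same frequency check as a table, using `value_counts(normalize=True)` on a few hypothetical rows; on the real data, `toy` would be replaced by `df`.

```python
import pandas as pd

# Hypothetical toy rows; in the notebook this would be the full `df`
toy = pd.DataFrame({
    "author": ["Aristotle", "Aristotle", "Aristotle", "Plato", "Lewis"],
    "school": ["aristotle", "aristotle", "aristotle", "plato", "analytic"],
})

# Share of rows (sentences) per category, as fractions that sum to 1
author_share = toy["author"].value_counts(normalize=True)
school_share = toy["school"].value_counts(normalize=True)
print(author_share.round(3))
print(school_share.round(3))
```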

## Objective 1: Finding the most widely used words, classified by school and author. To answer this, I use word clouds.

### Classifying the words by school

In [None]:
from nltk.corpus import stopwords
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
from matplotlib.backends.backend_pdf import PdfPages

schools = df.school.unique().tolist()

stop_words = set(stopwords.words('english'))

for sc in schools:
    df_temp = df[df.school==sc]
    
    print('School = ', sc, ':')
    
    text = " ".join(txt for txt in df_temp.sentence_lowered)
    wordcloud = WordCloud(stopwords=stop_words, max_font_size=50, max_words=500,
                          width = 600, height = 400,
                          background_color="white").generate(text)
    plt.figure(figsize=(12,8))
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.savefig(sc+".jpeg")
    plt.show()

### From here, we can see the most frequent words in each school's texts.

### Classifying the words by author

In [None]:
authors = df.author.unique().tolist()

stop_words = set(stopwords.words('english'))

for au in authors:
    df_temp = df[df.author == au]
    
    print('Authors = ', au)
    
    text = " ".join(txt for txt in df_temp.sentence_lowered)
    wordcloud = WordCloud(stopwords=stop_words, max_font_size=50, max_words=500,
                          width = 600, height = 400,
                          background_color="white").generate(text)
    plt.figure(figsize=(12,8))
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.savefig(au+".jpeg")
    plt.show()

### From here, we can see the most frequent words in each author's texts.
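The word clouds size each word by its frequency, but they do not show the underlying counts. The sketch below computes a rough version of those counts as a table with `collections.Counter`. The helper name `top_words`, the regex tokenizer, and the tiny stopword set are assumptions for the demo; WordCloud tokenizes text its own way (and by default also counts two-word collocations), so these numbers approximate rather than reproduce the cloud sizes.

```python
import re
from collections import Counter

import pandas as pd

# Small hypothetical stopword set for the demo; the notebook uses
# set(stopwords.words('english')) from nltk instead
DEMO_STOPWORDS = {"the", "is", "of", "a", "and", "to"}


def top_words(df, group_col, n=5, stop_words=DEMO_STOPWORDS):
    """Return the n most frequent non-stopword tokens for each group."""
    result = {}
    for group, sub in df.groupby(group_col):
        counts = Counter()
        for sentence in sub["sentence_lowered"].dropna():
            # Simple tokenizer: runs of lowercase letters and apostrophes
            tokens = re.findall(r"[a-z']+", sentence)
            counts.update(t for t in tokens if t not in stop_words)
        result[group] = counts.most_common(n)
    return result


toy = pd.DataFrame({
    "author": ["Plato", "Plato", "Kant"],
    "sentence_lowered": [
        "the soul is immortal",
        "the soul of the city",
        "reason is the faculty of principles",
    ],
})
top = top_words(toy, "author", n=2)
print(top)
```

On the real data, `top_words(df, 'school', n=10, stop_words=stop_words)` or `top_words(df, 'author', n=10, stop_words=stop_words)` would give one ranked list per group.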

## Objective 2: Authors and Titles with the Most Sentences

In [None]:
# Number of sentences (rows) per title, sorted from most to fewest
df.groupby(by=['title', 'author', 'school'])['sentence_length'].count().sort_values(ascending=False).to_frame()

### The result above shows that Aristotle, Plato, Lewis, Beauvoir, and Malebranche are the five philosophers with the most sentences in the data. I focus on these five philosophers to analyze how they create nuance in their writing.
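Grouping by `title` ranks individual books, so an author with several titles in the dataset is split across rows. A minimal sketch of an author-level ranking on hypothetical rows, which also separates the number of sentences (`count`) from their total and average length (`sum`, `mean`), since "most sentences" and "longest sentences" can pick out different authors. The length values below are made up.

```python
import pandas as pd

# Hypothetical toy rows: one row per sentence, with a made-up length value
toy = pd.DataFrame({
    "author": ["Aristotle", "Aristotle", "Plato", "Plato", "Plato", "Lewis"],
    "title": ["Metaphysics", "Physics", "Republic", "Republic", "Laws", "Papers"],
    "sentence_length": [120, 80, 50, 70, 60, 200],
})

# Aggregate over all of an author's titles at once (named aggregation)
by_author = (
    toy.groupby("author")["sentence_length"]
    .agg(n_sentences="count", total_length="sum", mean_length="mean")
    .sort_values("n_sentences", ascending=False)
)
print(by_author)
```

Here Plato leads by number of sentences while Lewis has the longest sentences on average, which is the distinction the two rankings capture.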

## Objective 3: Sentiment Analysis

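### I use NLTK's VADER sentiment analyzer to classify each sentence as positive, negative, or neutral. A sentence whose compound polarity score is at least 0.5 is considered positive, one at or below -0.5 is negative, and anything in between is neutral. These cutoffs are stricter than the ±0.05 thresholds suggested in the VADER documentation, so more sentences end up labeled neutral.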
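Because the labeling rule is just a threshold on the compound score, it can be separated from NLTK and checked on its own. In this sketch, `label_compound` is a hypothetical helper that reproduces the rule described above with the cutoff as a parameter, which makes it easy to see how individual scores change class under a looser threshold.

```python
def label_compound(compound, threshold=0.5):
    """Map a VADER-style compound score in [-1, 1] to a sentiment label.

    threshold=0.5 mirrors the cutoff used in this notebook; the VADER
    documentation suggests 0.05 instead.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# The same score can change label depending on the cutoff
for score in (0.7, 0.3, -0.3, -0.6):
    print(score, label_compound(score), label_compound(score, threshold=0.05))
```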

In [None]:
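from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Build the analyzer once: constructing it loads the VADER lexicon,
# which is slow to repeat for every sentence
sentiment_analyzer = SentimentIntensityAnalyzer()

def SentimentAnalysis(sentence):
    """Label a sentence as positive, negative, or neutral from its VADER compound score."""
    polarity_score = sentiment_analyzer.polarity_scores(sentence)

    if polarity_score['compound'] >= 0.5:
        return "positive"
    elif polarity_score['compound'] <= -0.5:
        return "negative"
    else:
        return "neutral"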

### The function below computes the proportions of positive, negative, and neutral sentences for each of the five chosen philosophers and plots them as a donut chart.

In [None]:
from time import perf_counter

def Analyzer(df, author):
    """Plot a donut chart of the sentiment labels of one author's sentences."""
    start = perf_counter()
    df1 = df[df['author'] == author]

    # Label every sentence once, then count each label
    sentiments = df1['sentence_lowered'].apply(SentimentAnalysis)
    pos = (sentiments == "positive").sum()
    neg = (sentiments == "negative").sum()
    neu = (sentiments == "neutral").sum()

    plt.figure(figsize=(7, 7))
    plt.pie([pos, neg, neu], labels=['Positive', 'Negative', 'Neutral'],
            colors=['#ff9999', '#66b3ff', '#99ff99'], autopct='%1.2f%%')
    # A white circle over the centre turns the pie into a donut chart
    centre_circle = plt.Circle((0, 0), 0.70, fc='white')
    plt.gca().add_artist(centre_circle)
    plt.title(author)
    print(f"{author}: scored {len(df1)} sentences in {perf_counter() - start:.1f} s")

In [None]:
Analyzer(df, 'Aristotle')
plt.savefig('aristotle.png')

In [None]:
Analyzer(df, 'Plato')
plt.savefig('plato.png')

In [None]:
Analyzer(df, 'Lewis')
plt.savefig('lewis.png')

In [None]:
Analyzer(df, 'Beauvoir')
plt.savefig('beauvoir.png')

In [None]:
Analyzer(df, 'Malebranche')
plt.savefig('malebranche.png')

### As the pie charts above show, Malebranche has the highest proportion of positive sentences, while Beauvoir has the highest proportion of negative sentences. Lewis's work has the highest proportion of neutral sentences among the five philosophers with the most sentences.
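Five separate pie charts make the comparison in the paragraph above hard to check precisely. A minimal sketch that puts all proportions in one table with `pd.crosstab(..., normalize='index')`; `sentiment_table` is a hypothetical helper, and the toy data and `fake_label` lookup stand in for the real corpus and the notebook's `SentimentAnalysis` function.

```python
import pandas as pd


def sentiment_table(df, authors, label_fn):
    """Share of positive/negative/neutral sentences per author (rows sum to 1)."""
    subset = df[df["author"].isin(authors)].copy()
    # label_fn maps one sentence string to "positive", "negative" or "neutral"
    subset["sentiment"] = subset["sentence_lowered"].apply(label_fn)
    return pd.crosstab(subset["author"], subset["sentiment"], normalize="index")


# Toy data and a stand-in labeler (hypothetical; the notebook would pass SentimentAnalysis)
toy = pd.DataFrame({
    "author": ["Plato", "Plato", "Lewis", "Lewis"],
    "sentence_lowered": ["good", "bad", "table", "good"],
})
fake_label = {"good": "positive", "bad": "negative", "table": "neutral"}.get
table = sentiment_table(toy, ["Plato", "Lewis"], fake_label)
print(table.round(3))
```

On the real data, `sentiment_table(df, ['Aristotle', 'Plato', 'Lewis', 'Beauvoir', 'Malebranche'], SentimentAnalysis)` would return one row per philosopher; scoring every sentence with VADER can take a while on the full corpus.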