analyze text with empath
Python
Clone or download
Latest commit 3f77509 Apr 22, 2017
Permalink
Failed to load latest commit information.
dist readme May 4, 2016
empath with cat construction cache Jun 6, 2016
test with cat construction cache Jun 6, 2016
.gitignore okay Apr 21, 2016
LICENSE.txt okay Apr 17, 2016
MANIFEST readme May 4, 2016
README.md Update README.md Apr 22, 2017
setup.py with cat construction cache Jun 6, 2016

README.md

Empath is a tool for analyzing text across lexical categories (similar to LIWC), and also generating new lexical categories to use for an analysis. See our paper.

You can install in python via pip:

pip install empath

Then in a python shell, import like this:

from empath import Empath
lexicon = Empath()

Analyze text over all pre-built categories:

lexicon.analyze("he hit the other person", normalize=True)
# => {'help': 0.0, 'office': 0.0, 'violence': 0.2, 'dance': 0.0, 'money': 0.0, 'wedding': 0.0, 'valuable': 0.0, 'domestic_work': 0.0, 'sleep': 0.0, 'medical_emergency': 0.0, 'cold': 0.0, 'hate': 0.0, 'cheerfulness': 0.0, 'aggression': 0.0, 'occupation': 0.0, 'envy': 0.0, 'anticipation': 0.0, 'family': 0.0, 'crime': 0.0, 'attractive': 0.0, 'masculine': 0.0, 'prison': 0.0, 'health': 0.0, 'pride': 0.0, 'dispute': 0.0, 'nervousness': 0.0, 'government': 0.0, 'weakness': 0.0, 'horror': 0.0, 'swearing_terms': 0.0, 'leisure': 0.0, 'suffering': 0.0, 'royalty': 0.0, 'wealthy': 0.0, 'white_collar_job': 0.0, 'tourism': 0.0, 'furniture': 0.0, 'school': 0.0, 'magic': 0.0, 'beach': 0.0, 'journalism': 0.0, 'morning': 0.0, 'banking': 0.0, 'social_media': 0.0, 'exercise': 0.0, 'night': 0.0, 'kill': 0.0, 'art': 0.0, 'play': 0.0, 'computer': 0.0, 'college': 0.0, 'traveling': 0.0, 'stealing': 0.0, 'real_estate': 0.0, 'home': 0.0, 'divine': 0.0, 'sexual': 0.0, 'fear': 0.0, 'monster': 0.0, 'irritability': 0.0, 'superhero': 0.0, 'business': 0.0, 'driving': 0.0, 'pet': 0.0, 'childish': 0.0, 'cooking': 0.0, 'exasperation': 0.0, 'religion': 0.0, 'hipster': 0.0, 'internet': 0.0, 'surprise': 0.0, 'reading': 0.0, 'worship': 0.0, 'leader': 0.0, 'independence': 0.0, 'movement': 0.2, 'body': 0.0, 'noise': 0.0, 'eating': 0.0, 'medieval': 0.0, 'zest': 0.0, 'confusion': 0.0, 'water': 0.0, 'sports': 0.0, 'death': 0.0, 'healing': 0.0, 'legend': 0.0, 'heroic': 0.0, 'celebration': 0.0, 'restaurant': 0.0, 'ridicule': 0.0, 'programming': 0.0, 'dominant_heirarchical': 0.0, 'military': 0.0, 'neglect': 0.0, 'swimming': 0.0, 'exotic': 0.0, 'love': 0.0, 'hiking': 0.0, 'communication': 0.0, 'hearing': 0.0, 'order': 0.0, 'sympathy': 0.0, 'hygiene': 0.0, 'weather': 0.0, 'anonymity': 0.0, 'trust': 0.0, 'ancient': 0.0, 'deception': 0.0, 'fabric': 0.0, 'air_travel': 0.0, 'fight': 0.0, 'dominant_personality': 0.0, 'music': 0.0, 'vehicle': 0.0, 'politeness': 0.0, 'toy': 0.0, 'farming': 0.0, 'meeting': 0.0, 'war': 0.0, 'speaking': 0.0, 'listen': 0.0, 'urban': 0.0, 'shopping': 0.0, 'disgust': 0.0, 'fire': 0.0, 'tool': 0.0, 'phone': 0.0, 'gain': 0.0, 'sound': 0.0, 'injury': 0.0, 'sailing': 0.0, 'rage': 0.0, 'science': 0.0, 'work': 0.0, 'appearance': 0.0, 'optimism': 0.0, 'warmth': 0.0, 'youth': 0.0, 'sadness': 0.0, 'fun': 0.0, 'emotional': 0.0, 'joy': 0.0, 'affection': 0.0, 'fashion': 0.0, 'lust': 0.0, 'shame': 0.0, 'torment': 0.0, 'economics': 0.0, 'anger': 0.0, 'politics': 0.0, 'ship': 0.0, 'clothing': 0.0, 'car': 0.0, 'strength': 0.0, 'technology': 0.0, 'breaking': 0.0, 'shape_and_size': 0.0, 'power': 0.0, 'vacation': 0.0, 'animal': 0.0, 'ugliness': 0.0, 'party': 0.0, 'terrorism': 0.0, 'smell': 0.0, 'blue_collar_job': 0.0, 'poor': 0.0, 'plant': 0.0, 'pain': 0.2, 'beauty': 0.0, 'timidity': 0.0, 'philosophy': 0.0, 'negotiate': 0.0, 'negative_emotion': 0.0, 'cleaning': 0.0, 'messaging': 0.0, 'competing': 0.0, 'law': 0.0, 'friends': 0.0, 'payment': 0.0, 'achievement': 0.0, 'alcohol': 0.0, 'disappointment': 0.0, 'liquid': 0.0, 'feminine': 0.0, 'weapon': 0.0, 'children': 0.0, 'ocean': 0.0, 'giving': 0.0, 'contentment': 0.0, 'writing': 0.0, 'rural': 0.0, 'positive_emotion': 0.0, 'musical': 0.0}

Or over a specific set of categories:

lexicon.analyze("he hit the other person", categories=["violence"])
# => {'violence': 1.0}

By default, Empath will return raw counts, but you can ask it to normalize over words in the document.

lexicon.analyze("he hit the other person", categories=["violence"], normalize=True)
# => {'violence': 0.2}

You can create new lexical categories for analysis using word embeddings in our VSM:

lexicon.create_category("colors",["red","blue","green"])
# => ["blue", "green", "purple", "purple", "green", "yellow", "red", "grey", "violet", "gray", "blue", "orange", "white", "pink", "yellow", "black", "brown", "brown", "red", "aqua", "turquoise", "blue_color", "colored", "color", "same_shade", "violet", "gray", "grey", "teal", "nice_shade", "coloured", "forest_green", "colored", "different_shade", "colour", "sparkly", "reddish", "beautiful_shade", "greenish", "indigo", "darker_shade", "emerald", "lovely_shade", "tints", "crimson", "dark_purple", "pink", "emerald", "sapphire", "golden", "lighter_shade", "lime_green", "coloured", "bright", "same_color", "specks", "red", "golden_color", "different_shades", "chocolate_brown", "orange", "bluish", "green", "deep_purple", "magenta", "green_color", "dark_shade", "bright_orange", "milky", "lilac", "light_brown", "sparkling", "golden_brown", "silvery", "baby_blue", "blood_red", "pink", "teal", "blue", "yellowish", "turquoise", "same_colour", "sparkly", "aquamarine", "black_color", "white", "cerulean", "perfect_shade", "dark", "speckled", "charcoal", "greyish", "midnight_blue", "emerald_green", "deep_brown", "ocean_blue", "flecks", "amber", "pinkish", "jet_black"]

Then analyze with those categories:

lexicon.analyze("My favorite color is blue", categories=["colors"], normalize=True)
# => {'colors': 0.4}

Right now Empath has three different models you can use to create categories: fiction, nytimes, and reddit. (I'm working on integrating all the different models soon). For now, they have different strengths and weaknesses in terms of generating categories. Nytimes would be better for something like the cold war:

lexicon.create_category("cold_war", ["cold_war"], model="nytimes")
# => ["cold_war", "the_cold_war", "the_Cold_War", "war", "Soviet_threat", "the_end_of_the_cold_war", "Communism", "world_war", "Soviet_empire", "Soviet_power", "Communism", "gulf_war", "Soviet_bloc", "the_Soviet_Union", "communism", "superpowers", "nuclear_age", "nuclear_war", "Soviet_system", "evil_empire", "Soviets", "wars", "arms_race", "Indochina", "detente", "Iran-Iraq_war", "Persian_Gulf_war", "American_power", "new_world_order", "American_involvement", "wartime", "American_foreign_policy", "American_occupation", "the_Soviet_Union's", "Soviet_Communism", "nuclear_arms_race", "the_Korean_War", "military_power", "Persian_Gulf_war", "great_powers", "Marshall_Plan", "the_Second_World_War", "Communist_rule", "the_Warsaw_Pact", "Soviet_military", "Reagan_years", "Reagan_era", "Cuban_missile_crisis", "world_wars", "postwar_period", "Communist_world", "military-industrial_complex", "perestroika", "superpower", "new_war", "Desert_Storm", "space_race", "Mikhail_Gorbachev", "Communist_system", "World_War_II", "nation-building", "the_Vietnam_War", "dictatorship", "South_Vietnam", "Iron_Curtain", "diplomacy", "old_Soviet_Union", "military_buildup", "containment", "German_unification", "Balkans", "gulf_crisis", "revolution", "last_war", "Soviet_era", "dictatorships", "warfare", "glasnost", "Soviet_state", "Communist_regimes", "domestic_politics", "Khrushchev", "American_diplomacy", "postwar_era", "Soviet_economy", "peacetime", "Korean_peninsula", "Allies", "Soviet-American_relations", "cold_war_era", "space_program", "Soviet_occupation", "arms_control", "Soviet_leaders", "World_War_I", "Western_alliance", "military_strategy", "quagmire", "regime", "fascism"]

You can adjust the size of the requested categories. You may not always get a bigger category when you ask for it because we're still filtering on a minimum cosine similarity.

lexicon.create_category("cold_war", ["cold_war"], model="nytimes", size=300)
# => ["cold_war", "the_cold_war", "the_Cold_War", "war", "Soviet_threat", "the_end_of_the_cold_war", "Communism", "world_war", "Soviet_empire", "Soviet_power", "Communism", "gulf_war", "Soviet_bloc", "the_Soviet_Union", "communism", "superpowers", "nuclear_age", "nuclear_war", "Soviet_system", "evil_empire", "Soviets", "wars", "arms_race", "Indochina", "detente", "Iran-Iraq_war", "Persian_Gulf_war", "American_power", "new_world_order", "American_involvement", "wartime", "American_foreign_policy", "American_occupation", "the_Soviet_Union's", "Soviet_Communism", "nuclear_arms_race", "the_Korean_War", "military_power", "Persian_Gulf_war", "great_powers", "Marshall_Plan", "the_Second_World_War", "Communist_rule", "the_Warsaw_Pact", "Soviet_military", "Reagan_years", "Reagan_era", "Cuban_missile_crisis", "world_wars", "postwar_period", "Communist_world", "military-industrial_complex", "perestroika", "superpower", "new_war", "Desert_Storm", "space_race", "Mikhail_Gorbachev", "Communist_system", "World_War_II", "nation-building", "the_Vietnam_War", "dictatorship", "South_Vietnam", "Iron_Curtain", "diplomacy", "old_Soviet_Union", "military_buildup", "containment", "German_unification", "Balkans", "gulf_crisis", "revolution", "last_war", "Soviet_era", "dictatorships", "warfare", "glasnost", "Soviet_state", "Communist_regimes", "domestic_politics", "Khrushchev", "American_diplomacy", "postwar_era", "Soviet_economy", "peacetime", "Korean_peninsula", "Allies", "Soviet-American_relations", "cold_war_era", "space_program", "Soviet_occupation", "arms_control", "Soviet_leaders", "World_War_I", "Western_alliance", "military_strategy", "quagmire", "regime", "fascism", "socialism", "Vietnam", "totalitarianism", "new_Europe", "American_leadership", "long_war", "World_War_II.", "colonial_rule", "the_Persian_Gulf_war", "atom_bomb", "NATO_alliance", "world_affairs", "military_threat", "home_front", "Western_Europe", "Eastern_Europe", "German_reunification", "glasnost", "Stalin", "Iraq_war", "Reagan_Presidency", "military_might", "American_policy", "colonialism", "major_war", "East-West_relations", "Soviet_history", "Soviet_rule", "Russians", "the_Gulf_War", "Atlantic_alliance", "the_Bay_of_Pigs", "democracies", "coups", "old_order", "Islamic_world", "Soviet_leadership", "unification", "Stalinism", "nuclear_threat", "Vietnam_era", "the_Afghan_war", "Gorbachev_era", "the_Vietnam_war", "American_President", "American_military_power", "Western_powers", "American_Government", "Soviet_domination", "foreign_policy", "military_establishment", "new_thinking", "Communist_regime", "Communist_era", "militarism", "isolationism", "the_Persian_Gulf", "first_gulf_war", "upheavals", "Saddam_Hussein's", "reunification", "Second_World_War", "Reagan_Administration", "Eastern_Europe's", "disintegration", "empires", "American_strategy", "civil_war", "Soviet_society", "Western_democracies", "common_enemy", "Communist_state", "Korean_Peninsula", "New_Deal", "the_Marshall_Plan", "Berlin_wall", "American_influence", "American_president", "Communist_dictatorship", "political_struggle", "the_Reagan_Administration", "American_public_opinion", "military_victory", "American_policy_makers", "Central_Europe", "modern_history"]