Empath is a tool for analyzing text across lexical categories (similar to LIWC) and for generating new lexical categories to use in an analysis. See our paper.

You can install it in Python via pip:

pip install empath

Then, in a Python shell, import it like this:

from empath import Empath
lexicon = Empath()

Analyze text over all pre-built categories:

lexicon.analyze("he hit the other person", normalize=True)
# => {'help': 0.0, 'office': 0.0, 'violence': 0.2, 'dance': 0.0, 'money': 0.0, 'wedding': 0.0, 'valuable': 0.0, 'domestic_work': 0.0, 'sleep': 0.0, 'medical_emergency': 0.0, 'cold': 0.0, 'hate': 0.0, 'cheerfulness': 0.0, 'aggression': 0.0, 'occupation': 0.0, 'envy': 0.0, 'anticipation': 0.0, 'family': 0.0, 'crime': 0.0, 'attractive': 0.0, 'masculine': 0.0, 'prison': 0.0, 'health': 0.0, 'pride': 0.0, 'dispute': 0.0, 'nervousness': 0.0, 'government': 0.0, 'weakness': 0.0, 'horror': 0.0, 'swearing_terms': 0.0, 'leisure': 0.0, 'suffering': 0.0, 'royalty': 0.0, 'wealthy': 0.0, 'white_collar_job': 0.0, 'tourism': 0.0, 'furniture': 0.0, 'school': 0.0, 'magic': 0.0, 'beach': 0.0, 'journalism': 0.0, 'morning': 0.0, 'banking': 0.0, 'social_media': 0.0, 'exercise': 0.0, 'night': 0.0, 'kill': 0.0, 'art': 0.0, 'play': 0.0, 'computer': 0.0, 'college': 0.0, 'traveling': 0.0, 'stealing': 0.0, 'real_estate': 0.0, 'home': 0.0, 'divine': 0.0, 'sexual': 0.0, 'fear': 0.0, 'monster': 0.0, 'irritability': 0.0, 'superhero': 0.0, 'business': 0.0, 'driving': 0.0, 'pet': 0.0, 'childish': 0.0, 'cooking': 0.0, 'exasperation': 0.0, 'religion': 0.0, 'hipster': 0.0, 'internet': 0.0, 'surprise': 0.0, 'reading': 0.0, 'worship': 0.0, 'leader': 0.0, 'independence': 0.0, 'movement': 0.2, 'body': 0.0, 'noise': 0.0, 'eating': 0.0, 'medieval': 0.0, 'zest': 0.0, 'confusion': 0.0, 'water': 0.0, 'sports': 0.0, 'death': 0.0, 'healing': 0.0, 'legend': 0.0, 'heroic': 0.0, 'celebration': 0.0, 'restaurant': 0.0, 'ridicule': 0.0, 'programming': 0.0, 'dominant_heirarchical': 0.0, 'military': 0.0, 'neglect': 0.0, 'swimming': 0.0, 'exotic': 0.0, 'love': 0.0, 'hiking': 0.0, 'communication': 0.0, 'hearing': 0.0, 'order': 0.0, 'sympathy': 0.0, 'hygiene': 0.0, 'weather': 0.0, 'anonymity': 0.0, 'trust': 0.0, 'ancient': 0.0, 'deception': 0.0, 'fabric': 0.0, 'air_travel': 0.0, 'fight': 0.0, 'dominant_personality': 0.0, 'music': 0.0, 'vehicle': 0.0, 'politeness': 0.0, 'toy': 0.0, 'farming': 0.0, 'meeting': 0.0, 'war': 0.0, 
'speaking': 0.0, 'listen': 0.0, 'urban': 0.0, 'shopping': 0.0, 'disgust': 0.0, 'fire': 0.0, 'tool': 0.0, 'phone': 0.0, 'gain': 0.0, 'sound': 0.0, 'injury': 0.0, 'sailing': 0.0, 'rage': 0.0, 'science': 0.0, 'work': 0.0, 'appearance': 0.0, 'optimism': 0.0, 'warmth': 0.0, 'youth': 0.0, 'sadness': 0.0, 'fun': 0.0, 'emotional': 0.0, 'joy': 0.0, 'affection': 0.0, 'fashion': 0.0, 'lust': 0.0, 'shame': 0.0, 'torment': 0.0, 'economics': 0.0, 'anger': 0.0, 'politics': 0.0, 'ship': 0.0, 'clothing': 0.0, 'car': 0.0, 'strength': 0.0, 'technology': 0.0, 'breaking': 0.0, 'shape_and_size': 0.0, 'power': 0.0, 'vacation': 0.0, 'animal': 0.0, 'ugliness': 0.0, 'party': 0.0, 'terrorism': 0.0, 'smell': 0.0, 'blue_collar_job': 0.0, 'poor': 0.0, 'plant': 0.0, 'pain': 0.2, 'beauty': 0.0, 'timidity': 0.0, 'philosophy': 0.0, 'negotiate': 0.0, 'negative_emotion': 0.0, 'cleaning': 0.0, 'messaging': 0.0, 'competing': 0.0, 'law': 0.0, 'friends': 0.0, 'payment': 0.0, 'achievement': 0.0, 'alcohol': 0.0, 'disappointment': 0.0, 'liquid': 0.0, 'feminine': 0.0, 'weapon': 0.0, 'children': 0.0, 'ocean': 0.0, 'giving': 0.0, 'contentment': 0.0, 'writing': 0.0, 'rural': 0.0, 'positive_emotion': 0.0, 'musical': 0.0}
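
Because the full result covers every pre-built category and most scores are zero, it is often easier to look only at the categories that matched. A minimal sketch, assuming `scores` is a dictionary shaped like the output above; the `nonzero_categories` helper is hypothetical and not part of Empath:

```python
def nonzero_categories(scores):
    """Return (category, score) pairs with a score above zero, highest first."""
    hits = [(cat, val) for cat, val in scores.items() if val > 0]
    # Sort by descending score, then by name so ties come out in a stable order.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# A hand-copied excerpt of the output above, not a live call to Empath.
scores = {"help": 0.0, "violence": 0.2, "movement": 0.2, "pain": 0.2, "war": 0.0}
print(nonzero_categories(scores))
# [('movement', 0.2), ('pain', 0.2), ('violence', 0.2)]
```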

Or over a specific set of categories:

lexicon.analyze("he hit the other person", categories=["violence"])
# => {'violence': 1.0}

By default, Empath returns raw counts, but you can ask it to normalize by the number of words in the document:

lexicon.analyze("he hit the other person", categories=["violence"], normalize=True)
# => {'violence': 0.2}
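
The two results above are consistent with dividing the raw count by the number of words: one matching word out of five gives 0.2. A minimal sketch of that arithmetic, assuming whitespace tokenization and a tiny made-up lexicon (`toy_lexicon` and `count_categories` are illustrative names, not Empath's implementation):

```python
def count_categories(text, lexicon, normalize=False):
    """Count how many tokens of `text` fall into each category of `lexicon`."""
    tokens = text.lower().split()  # assumption: plain whitespace tokenization
    counts = {cat: 0.0 for cat in lexicon}
    for token in tokens:
        for cat, words in lexicon.items():
            if token in words:
                counts[cat] += 1.0
    if normalize and tokens:
        # Divide each raw count by the total number of tokens in the document.
        counts = {cat: n / len(tokens) for cat, n in counts.items()}
    return counts


toy_lexicon = {"violence": {"hit", "punch", "attack"}}  # made-up word list
text = "he hit the other person"
print(count_categories(text, toy_lexicon))                  # {'violence': 1.0}
print(count_categories(text, toy_lexicon, normalize=True))  # {'violence': 0.2}
```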

You can create new lexical categories for analysis using word embeddings in our vector space model (VSM):

lexicon.create_category("colors", ["red", "blue", "green"])
# => ["blue", "green", "purple", "purple", "green", "yellow", "red", "grey", "violet", "gray", "blue", "orange", "white", "pink", "yellow", "black", "brown", "brown", "red", "aqua", "turquoise", "blue_color", "colored", "color", "same_shade", "violet", "gray", "grey", "teal", "nice_shade", "coloured", "forest_green", "colored", "different_shade", "colour", "sparkly", "reddish", "beautiful_shade", "greenish", "indigo", "darker_shade", "emerald", "lovely_shade", "tints", "crimson", "dark_purple", "pink", "emerald", "sapphire", "golden", "lighter_shade", "lime_green", "coloured", "bright", "same_color", "specks", "red", "golden_color", "different_shades", "chocolate_brown", "orange", "bluish", "green", "deep_purple", "magenta", "green_color", "dark_shade", "bright_orange", "milky", "lilac", "light_brown", "sparkling", "golden_brown", "silvery", "baby_blue", "blood_red", "pink", "teal", "blue", "yellowish", "turquoise", "same_colour", "sparkly", "aquamarine", "black_color", "white", "cerulean", "perfect_shade", "dark", "speckled", "charcoal", "greyish", "midnight_blue", "emerald_green", "deep_brown", "ocean_blue", "flecks", "amber", "pinkish", "jet_black"]

Then analyze with those categories:

lexicon.analyze("My favorite color is blue", categories=["colors"], normalize=True)
# => {'colors': 0.4}

Right now Empath has three different models you can use to create categories: fiction, nytimes, and reddit. (I'm working on integrating the different models.) For now, each has different strengths and weaknesses when generating categories. The nytimes model works better for a topic like the Cold War:

lexicon.create_category("cold_war", ["cold_war"], model="nytimes")
# => ["cold_war", "the_cold_war", "the_Cold_War", "war", "Soviet_threat", "the_end_of_the_cold_war", "Communism", "world_war", "Soviet_empire", "Soviet_power", "Communism", "gulf_war", "Soviet_bloc", "the_Soviet_Union", "communism", "superpowers", "nuclear_age", "nuclear_war", "Soviet_system", "evil_empire", "Soviets", "wars", "arms_race", "Indochina", "detente", "Iran-Iraq_war", "Persian_Gulf_war", "American_power", "new_world_order", "American_involvement", "wartime", "American_foreign_policy", "American_occupation", "the_Soviet_Union's", "Soviet_Communism", "nuclear_arms_race", "the_Korean_War", "military_power", "Persian_Gulf_war", "great_powers", "Marshall_Plan", "the_Second_World_War", "Communist_rule", "the_Warsaw_Pact", "Soviet_military", "Reagan_years", "Reagan_era", "Cuban_missile_crisis", "world_wars", "postwar_period", "Communist_world", "military-industrial_complex", "perestroika", "superpower", "new_war", "Desert_Storm", "space_race", "Mikhail_Gorbachev", "Communist_system", "World_War_II", "nation-building", "the_Vietnam_War", "dictatorship", "South_Vietnam", "Iron_Curtain", "diplomacy", "old_Soviet_Union", "military_buildup", "containment", "German_unification", "Balkans", "gulf_crisis", "revolution", "last_war", "Soviet_era", "dictatorships", "warfare", "glasnost", "Soviet_state", "Communist_regimes", "domestic_politics", "Khrushchev", "American_diplomacy", "postwar_era", "Soviet_economy", "peacetime", "Korean_peninsula", "Allies", "Soviet-American_relations", "cold_war_era", "space_program", "Soviet_occupation", "arms_control", "Soviet_leaders", "World_War_I", "Western_alliance", "military_strategy", "quagmire", "regime", "fascism"]

You can adjust the size of the requested category with the size parameter. You may not always get a bigger category when you ask for one, because results are still filtered on a minimum cosine similarity:

lexicon.create_category("cold_war", ["cold_war"], model="nytimes", size=300)
# => ["cold_war", "the_cold_war", "the_Cold_War", "war", "Soviet_threat", "the_end_of_the_cold_war", "Communism", "world_war", "Soviet_empire", "Soviet_power", "Communism", "gulf_war", "Soviet_bloc", "the_Soviet_Union", "communism", "superpowers", "nuclear_age", "nuclear_war", "Soviet_system", "evil_empire", "Soviets", "wars", "arms_race", "Indochina", "detente", "Iran-Iraq_war", "Persian_Gulf_war", "American_power", "new_world_order", "American_involvement", "wartime", "American_foreign_policy", "American_occupation", "the_Soviet_Union's", "Soviet_Communism", "nuclear_arms_race", "the_Korean_War", "military_power", "Persian_Gulf_war", "great_powers", "Marshall_Plan", "the_Second_World_War", "Communist_rule", "the_Warsaw_Pact", "Soviet_military", "Reagan_years", "Reagan_era", "Cuban_missile_crisis", "world_wars", "postwar_period", "Communist_world", "military-industrial_complex", "perestroika", "superpower", "new_war", "Desert_Storm", "space_race", "Mikhail_Gorbachev", "Communist_system", "World_War_II", "nation-building", "the_Vietnam_War", "dictatorship", "South_Vietnam", "Iron_Curtain", "diplomacy", "old_Soviet_Union", "military_buildup", "containment", "German_unification", "Balkans", "gulf_crisis", "revolution", "last_war", "Soviet_era", "dictatorships", "warfare", "glasnost", "Soviet_state", "Communist_regimes", "domestic_politics", "Khrushchev", "American_diplomacy", "postwar_era", "Soviet_economy", "peacetime", "Korean_peninsula", "Allies", "Soviet-American_relations", "cold_war_era", "space_program", "Soviet_occupation", "arms_control", "Soviet_leaders", "World_War_I", "Western_alliance", "military_strategy", "quagmire", "regime", "fascism", "socialism", "Vietnam", "totalitarianism", "new_Europe", "American_leadership", "long_war", "World_War_II.", "colonial_rule", "the_Persian_Gulf_war", "atom_bomb", "NATO_alliance", "world_affairs", "military_threat", "home_front", "Western_Europe", "Eastern_Europe", "German_reunification", "glasnost", "Stalin", 
"Iraq_war", "Reagan_Presidency", "military_might", "American_policy", "colonialism", "major_war", "East-West_relations", "Soviet_history", "Soviet_rule", "Russians", "the_Gulf_War", "Atlantic_alliance", "the_Bay_of_Pigs", "democracies", "coups", "old_order", "Islamic_world", "Soviet_leadership", "unification", "Stalinism", "nuclear_threat", "Vietnam_era", "the_Afghan_war", "Gorbachev_era", "the_Vietnam_war", "American_President", "American_military_power", "Western_powers", "American_Government", "Soviet_domination", "foreign_policy", "military_establishment", "new_thinking", "Communist_regime", "Communist_era", "militarism", "isolationism", "the_Persian_Gulf", "first_gulf_war", "upheavals", "Saddam_Hussein's", "reunification", "Second_World_War", "Reagan_Administration", "Eastern_Europe's", "disintegration", "empires", "American_strategy", "civil_war", "Soviet_society", "Western_democracies", "common_enemy", "Communist_state", "Korean_Peninsula", "New_Deal", "the_Marshall_Plan", "Berlin_wall", "American_influence", "American_president", "Communist_dictatorship", "political_struggle", "the_Reagan_Administration", "American_public_opinion", "military_victory", "American_policy_makers", "Central_Europe", "modern_history"]

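To see why a larger `size` does not always produce a larger category, consider a selection that takes the `size` nearest terms to a seed vector but drops anything under a similarity threshold. A minimal sketch with made-up two-dimensional vectors; the `nearest_terms` function, the `min_similarity` value, and the vectors are all assumptions for illustration, not Empath's actual model or cutoff:

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest_terms(seed, vectors, size, min_similarity):
    """Return up to `size` terms ranked by similarity to `seed`, dropping weak matches."""
    scored = [(term, cosine(seed, vec)) for term, vec in vectors.items()]
    kept = [(term, sim) for term, sim in scored if sim >= min_similarity]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [term for term, _ in kept[:size]]


# Hypothetical embeddings: three terms point roughly the same way as the seed.
vectors = {
    "war": (1.0, 0.1),
    "arms_race": (0.9, 0.3),
    "detente": (0.7, 0.6),
    "gardening": (-0.2, 1.0),
    "baseball": (0.0, 1.0),
}
seed = (1.0, 0.0)

print(nearest_terms(seed, vectors, size=2, min_similarity=0.5))
# ['war', 'arms_race']
print(nearest_terms(seed, vectors, size=100, min_similarity=0.5))
# ['war', 'arms_race', 'detente']
```

In this toy setup, asking for 100 terms still returns only three, because the remaining candidates fall below the threshold.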