# ASER Tutorials

## Installation

First, install the necessary packages.

In [1]:
!pip install -r requirements.txt 

Looking in indexes: http://mirrors.aliyun.com/pypi/simple/


Then, install the aser package.

In [3]:
!python setup.py install

running install
running bdist_egg
running egg_info
writing aser.egg-info/PKG-INFO
writing dependency_links to aser.egg-info/dependency_links.txt
writing entry points to aser.egg-info/entry_points.txt
writing requirements to aser.egg-info/requires.txt
writing top-level names to aser.egg-info/top_level.txt
reading manifest file 'aser.egg-info/SOURCES.txt'
writing manifest file 'aser.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.9-x86_64/egg
running install_lib

creating build
creating build/bdist.macosx-10.9-x86_64
creating build/bdist.macosx-10.9-x86_64/egg
creating build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
copying aser.egg-info/PKG-INFO -> build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
copying aser.egg-info/SOURCES.txt -> build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
copying aser.egg-info/dependency_links.txt -> build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
copying aser.egg-info/entry_points.txt -> build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
copying aser.egg-info/re

We also need to download some packages and files.

We use Stanford CoreNLP to parse raw data.
Because the parsing results differ slightly from v4.0.0 onward, we still use v3.9.2, which was released on 2018-10-05.

In [9]:
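# Import the submodule explicitly: a bare "import urllib" does not guarantee that urllib.request is loaded.
import urllib.request

urllib.request.urlretrieve("http://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip", "stanford-corenlp-3.9.2.zip")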

('stanford-corenlp-3.9.2.zip', <http.client.HTTPMessage at 0x7fa031289a58>)

In [11]:
import zipfile

with zipfile.ZipFile("stanford-corenlp-3.9.2.zip", "r") as zip_ref:
    zip_ref.extractall("./")

In [13]:
import shutil

shutil.move("stanford-corenlp-full-2018-10-05", "stanford-corenlp-3.9.2")

'stanford-corenlp-3.9.2'
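
The unzip and rename cells above raise an error if they are re-run after the target directory already exists. A minimal sketch of a re-runnable variant follows. The helper name `unpack_once` is hypothetical and is not part of ASER.

```python
import shutil
import zipfile
from pathlib import Path


def unpack_once(zip_path, extracted_name, target_name, workdir="."):
    """Extract zip_path into workdir and rename extracted_name to target_name.

    Does nothing if target_name already exists, so the cell can be re-run.
    (Hypothetical helper for illustration; not part of ASER.)
    """
    workdir = Path(workdir)
    target = workdir / target_name
    if target.exists():
        return target  # already unpacked on a previous run
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(workdir)
    shutil.move(str(workdir / extracted_name), str(target))
    return target
```

For the archive above, the call would be `unpack_once("stanford-corenlp-3.9.2.zip", "stanford-corenlp-full-2018-10-05", "stanford-corenlp-3.9.2")`.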

We also use Probase for conceptualization.

In [16]:
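# Import the submodule explicitly: a bare "import urllib" does not guarantee that urllib.request is loaded.
import urllib.request

urllib.request.urlretrieve("https://concept.research.microsoft.com/Home/DownloadData?key=t9RdYhnkv94TFcd8tkVdzF9cEwNFdaFe&h=602979237", "probase.zip")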

('probase.zip', <http.client.HTTPMessage at 0x7fa032921438>)

In [19]:
import zipfile

with zipfile.ZipFile("probase.zip", "r") as zip_ref:
    zip_ref.extractall("./")

In [20]:
import shutil

shutil.move("data-concept/data-concept-instance-relations.txt", "probase.txt")
shutil.rmtree("data-concept")

'probase.txt'

## Local Pipeline: aser-pipe

There are three ways to extract eventuality commonsense knowledge:
1. run the ASER pipeline by `aser-pipe`;
2. utilize `ASERExtractor` and `ASERConceptualizer`;
3. use `ASERServer` and `ASERClient`.

### 1. ASER Pipeline

We write three Yelp reviews into a file and run the pipeline to build `KG.db` and `concept.db`.

In [17]:
import os

# exist_ok=True lets this cell be re-run without raising FileExistsError.
for directory in ["raw", "processed", "core", "full", "concept"]:
    os.makedirs(directory, exist_ok=True)

In [18]:
with open("raw/yelp.txt", "w") as f:
    f.write("I went there based on reviews on yelp. I was not let down! I got the wild mushroom personal pie and added spinach and fresh jalapenos on the ancient grains crust. It was amazing. The crust was perfectly cooked and all the toppings meshed well together. They have many vegan options(daiya cheese). The owner and the other employee there were very nice and friendly. I will definitely be going back next time I am in town.")
    f.write("\n\n")
    f.write("My experience at this kneaders location was great! I wasn't there during a busy time, about 3:45, but they were very attentive. I placed my to go order and it was ready within 10 minutes. I will definitely be back!")
    f.write("\n\n")
    f.write("I came here for breakfast before a conference, I was excited to try some local breakfast places. I think the highlight of my breakfast was the freshly squeezed orange juice: it was very sweet, natural, and definitely the champion of the day. I ordered the local fresh eggs on ciabatta with the house made sausage. I would have given a four if the bread had been toasted, but it wasnt. The sausage had good flavor, but I would have liked a little salt on my eggs. All in all a good breakfast. If I am back in town I would try the pastries, they looked and smelled amazing.")
    f.write("\n\n")
    

In [None]:
!aser-pipe -n_extractors 1 -n_workers 1 \
-corenlp_path "stanford-corenlp-3.9.2" -base_corenlp_port 9000 \
-raw_dir "raw" -processed_dir "processed" \
-core_kg_dir "core" -full_kg_dir "full" \
-eventuality_frequency_threshold 0 -relation_weight_threshold 0 \
-concept_kg_dir "concept" -concept_method probase -probase_path "probase.txt" \
-eventuality_threshold_to_conceptualize 0 -concept_weight_threshold 0 -concept_topk 5 \
-log_path "core/aser_pipe.log"
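
The same command can also be assembled from a dictionary, which makes it easier to change one option at a time. This is only a sketch: the option names and values are copied from the cell above, while `build_command` is a hypothetical helper, and `shlex.join` requires Python 3.8 or later.

```python
import shlex

# Option names and values are copied from the aser-pipe command above.
options = {
    "n_extractors": 1,
    "n_workers": 1,
    "corenlp_path": "stanford-corenlp-3.9.2",
    "base_corenlp_port": 9000,
    "raw_dir": "raw",
    "processed_dir": "processed",
    "core_kg_dir": "core",
    "full_kg_dir": "full",
    "eventuality_frequency_threshold": 0,
    "relation_weight_threshold": 0,
    "concept_kg_dir": "concept",
    "concept_method": "probase",
    "probase_path": "probase.txt",
    "eventuality_threshold_to_conceptualize": 0,
    "concept_weight_threshold": 0,
    "concept_topk": 5,
    "log_path": "core/aser_pipe.log",
}


def build_command(executable, options):
    """Turn {"name": value} into [executable, "-name", "value", ...]."""
    args = [executable]
    for name, value in options.items():
        args += [f"-{name}", str(value)]
    return args


command = build_command("aser-pipe", options)
print(shlex.join(command))  # shows the command without running it
```

Passing `command` to `subprocess.run(command, check=True)` would then launch the pipeline, assuming `aser-pipe` is on the `PATH`.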

Now, you can check the `core`, `full`, and `concept` directories.
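
One quick way to check them is to list every file the pipeline wrote, together with its size. The sketch below uses only the standard library; `list_outputs` is a hypothetical helper, and directories that do not exist are silently skipped.

```python
import os


def list_outputs(directories):
    """Return (path, size_in_bytes) for every file under the given directories."""
    found = []
    for directory in directories:
        # os.walk yields nothing for a missing directory, so no check is needed.
        for root, _dirs, files in os.walk(directory):
            for name in sorted(files):
                path = os.path.join(root, name)
                found.append((path, os.path.getsize(path)))
    return found


for path, size in list_outputs(["core", "full", "concept"]):
    print(f"{size:>12,d}  {path}")
```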

### 2. Step-by-Step Extraction

Let's see how to utilize an ASER extractor and a conceptualizer.

In [4]:
text = "I came here for breakfast before a conference, I was excited to try some local breakfast places. I think the highlight of my breakfast was the freshly squeezed orange juice: it was very sweet, natural, and definitely the champion of the day. I ordered the local fresh eggs on ciabatta with the house made sausage. I would have given a four if the bread had been toasted, but it wasnt. The sausage had good flavor, but I would have liked a little salt on my eggs. All in all a good breakfast. If I am back in town I would try the pastries, they looked and smelled amazing."

print(text)

I came here for breakfast before a conference, I was excited to try some local breakfast places. I think the highlight of my breakfast was the freshly squeezed orange juice: it was very sweet, natural, and definitely the champion of the day. I ordered the local fresh eggs on ciabatta with the house made sausage. I would have given a four if the bread had been toasted, but it wasnt. The sausage had good flavor, but I would have liked a little salt on my eggs. All in all a good breakfast. If I am back in town I would try the pastries, they looked and smelled amazing.


We provide two kinds of ASER extractors: the `SeedRuleASERExtractor`, which corresponds to the WWW'2020 paper, and a new `DiscourseASERExtractor`, which is built on a discourse parsing system.

In [17]:
from pprint import pprint
from aser.extract.aser_extractor import SeedRuleASERExtractor, DiscourseASERExtractor

aser_extractor = DiscourseASERExtractor(
  corenlp_path="stanford-corenlp-3.9.2", corenlp_port=9000
)

print("In-order:")
pprint(aser_extractor.extract_from_text(text, in_order=True))

print("-" * 80)
print("Out-of-Order:")
results = aser_extractor.extract_from_text(text, in_order=False)
pprint(results)

eventualities, relations = results

In-order:
([[],
  [i think, it be very sweet],
  [i order, the local fresh egg on ciabatta make sausage],
  [i would have give a four, the bread have be toast],
  [the sausage have good flavor, i would have like a little salt on egg],
  [],
  [i be back in town, i would try the pastry, they look, they smell amazing]],
 [[],
  [(010ec054737a144cb77e99954ff032bc5dff472c, 55704c606666f41a73ac5ae0eabe582892aa163c, {'Co_Occurrence': 1.0})],
  [(b875a4b94675e057fa643beb334e071e4ddf3760, 41876cb7188cb3398572af71ff9d98d61f46c20b, {'Co_Occurrence': 1.0})],
  [(766f00c08dcac14353629c12125f05697eb58a2e, 13bb4ed9f70c37253246c2051ef05fe4795f4fee, {'Co_Occurrence': 1.0, 'Condition': 1.0})],
  [(253e8b127b833c3aa7d79e2b91ce030299a646d6, 8dd8fbc06d2810add7b2cfd637a78f90fa2e5e9e, {'Co_Occurrence': 1.0, 'Contrast': 1.0})],
  [],
  [(dac82e8bc75bd0221e86194e6e3cd607a72aba7e, 2dd66bdf5849fe8d4a28d3355f0fc0a50b7f61e2, {'Co_Occurrence': 1.0}),
   (dac82e8bc75bd0221e86194e6e3cd607a72aba7e, a8eec375e86e467cf8

As shown above, with `in_order=True` the extractor keeps the sentence order and token order, so the result is a nested list with one entry per sentence. With `in_order=False`, it instead returns a set of eventualities and a set of relations.
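
The relationship between the two modes can be illustrated with plain strings standing in for eventuality objects. This is only a sketch under that assumption: the real extractor returns richer objects, and its out-of-order result may differ from a simple deduplicated flattening.

```python
# Plain strings stand in for ASER eventuality objects (an assumption made
# only for illustration; the real objects carry more information).
in_order = [
    [],
    ["i think", "it be very sweet"],
    ["i order", "the local fresh egg on ciabatta make sausage"],
    [],
    ["i be back in town", "i would try the pastry", "they look", "they smell amazing"],
]


def flatten_unique(nested):
    """Collapse a per-sentence nested list into a flat list without duplicates."""
    seen = set()
    flat = []
    for sentence_items in nested:
        for item in sentence_items:
            if item not in seen:
                seen.add(item)
                flat.append(item)
    return flat


out_of_order = flatten_unique(in_order)
print(len(in_order), "sentences ->", len(out_of_order), "distinct eventualities")
```

The nested form is useful when you need to know which sentence an eventuality came from; the flat form is enough when you only need the distinct items.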

Next, we use the Probase-based conceptualizer to conceptualize the eventualities.

In [16]:
from aser.conceptualize.aser_conceptualizer import SeedRuleASERConceptualizer, ProbaseASERConceptualizer
from aser.conceptualize.utils import conceptualize_eventualities, build_concept_relations

aser_conceptualizer = ProbaseASERConceptualizer(
  probase_path="probase.txt", probase_topk=5
)

cid2concept, concept_instance_pairs, cid_to_filter_score = conceptualize_eventualities(
  aser_conceptualizer, eventualities
)

print("concepts:")
pprint(list(cid2concept.values()))

print("-" * 80)
print("concept_instance_pairs:")
pprint(concept_instance_pairs)

100%|██████████| 12/12 [00:00<00:00, 437.11it/s]

concepts:
[__PERSON__0 think,
 food toast,
 carbohydrate toast,
 item toast,
 starchy-food toast,
 product toast,
 __PERSON__0 smell amazing,
 meat have flavor,
 sausage have ingredient,
 processed-meat have flavor,
 food have flavor,
 sausage have additive,
 sausage have excipients,
 sausage have factor,
 sausage have characteristic,
 meat-product have flavor,
 item have flavor,
 meat have ingredient,
 processed-meat have ingredient,
 food have ingredient,
 meat have additive,
 meat have excipients,
 meat have factor,
 processed-meat have additive,
 meat have characteristic,
 food have additive,
 processed-meat have excipients,
 processed-meat have factor,
 meat-product have ingredient,
 food have excipients,
 item have ingredient,
 food have factor,
 processed-meat have characteristic,
 food have characteristic,
 meat-product have additive,
 item have additive,
 meat-product have excipients,
 meat-product have factor,
 item have excipients,
 item have factor,
 meat-product have chara




The conceptualization results show that each eventuality can map to multiple concepts. You can use these concepts to enrich your eventuality representations.

We do not show `build_concept_relations` here because it requires the concept database. If you are interested, you can call `build_concept_relations(aser.database.kg_connection.ASERConceptConnection, List[aser.relation.Relations])`.

Do not forget to close your database connections so that the databases are available to other processes.

In [None]:
aser_conceptualizer.close()
aser_extractor.close()
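
If an exception is raised before the cell above runs, the `close()` calls are skipped. One way to guard against that is `contextlib.closing`, which works with any object that exposes `close()`. The sketch below uses a hypothetical `DummyResource` stand-in so that it runs without ASER; whether the ASER classes need anything beyond `close()` is not verified here.

```python
from contextlib import ExitStack, closing


class DummyResource:
    """Stand-in for an object exposing close(), such as the extractor above."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True


extractor = DummyResource("extractor")
conceptualizer = DummyResource("conceptualizer")

try:
    with ExitStack() as stack:
        # closing() calls .close() on exit, even if the body raises.
        stack.enter_context(closing(extractor))
        stack.enter_context(closing(conceptualizer))
        raise RuntimeError("simulated failure during extraction")
except RuntimeError:
    pass

print(extractor.closed, conceptualizer.closed)
```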

### 3. Client/Server Mode 

It is convenient to run a global server that provides extraction, conceptualization, and retrieval functions.

We have already implemented one!

In [2]:
text = "I came here for breakfast before a conference, I was excited to try some local breakfast places. I think the highlight of my breakfast was the freshly squeezed orange juice: it was very sweet, natural, and definitely the champion of the day. I ordered the local fresh eggs on ciabatta with the house made sausage. I would have given a four if the bread had been toasted, but it wasnt. The sausage had good flavor, but I would have liked a little salt on my eggs. All in all a good breakfast. If I am back in town I would try the pastries, they looked and smelled amazing."

print(text)

I came here for breakfast before a conference, I was excited to try some local breakfast places. I think the highlight of my breakfast was the freshly squeezed orange juice: it was very sweet, natural, and definitely the champion of the day. I ordered the local fresh eggs on ciabatta with the house made sausage. I would have given a four if the bread had been toasted, but it wasnt. The sausage had good flavor, but I would have liked a little salt on my eggs. All in all a good breakfast. If I am back in town I would try the pastries, they looked and smelled amazing.


Now, start an `ASERServer` and an `ASERClient`.

In [2]:
import subprocess

cmd = 'aser-server -n_workers 1 -n_concurrent_back_socks 10 \
-port 8000 -port_out 8001 \
-corenlp_path "stanford-corenlp-3.9.2" -base_corenlp_port 9000 \
-aser_kg_dir "core" -concept_kg_dir "concept" -probase_path "probase.txt"'

subprocess.Popen(cmd, shell=True)

<subprocess.Popen at 0x7fb09eeecc88>
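
Because `subprocess.Popen` returns immediately, the server may not be listening yet when the next cell runs. A minimal sketch of a readiness check follows; `wait_for_port` is a hypothetical helper, and a successful TCP connection only shows that something is listening on the port, not that the server has finished loading its data.

```python
import socket
import time


def wait_for_port(port, host="127.0.0.1", timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, else False after timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the server above, `wait_for_port(8000)` could be called before creating the client.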

In [3]:
from aser.client import ASERClient

In [4]:
client = ASERClient(port=8000, port_out=8001)

You can try the following methods.

In [5]:
client.extract_eventualities(text)

[[],
 [i think, it be very sweet],
 [i order, the local fresh egg on ciabatta make sausage],
 [i would have give a four, the bread have be toast],
 [the sausage have good flavor, i would have like a little salt on egg],
 [],
 [i be back in town, i would try the pastry, they look, they smell amazing]]

In [17]:
e1 = client.extract_eventualities("The sausage had good flavor.")[0][0]
e2 = client.extract_eventualities("I would have liked a little salt on my eggs.")[0][0]

print(repr(e1))
print(repr(e2))

the sausage have good flavor
i would have like a little salt on egg


In [19]:
client.predict_eventuality_relation(e1, e2)

(253e8b127b833c3aa7d79e2b91ce030299a646d6, 8dd8fbc06d2810add7b2cfd637a78f90fa2e5e9e, {'Contrast': 1.0, 'Co_Occurrence': 1.0})

In [8]:
client.fetch_related_eventualities(e1)

[(i would have like a little salt on egg,
  (253e8b127b833c3aa7d79e2b91ce030299a646d6, 8dd8fbc06d2810add7b2cfd637a78f90fa2e5e9e, {'Contrast': 1.0, 'Co_Occurrence': 1.0}))]

In [12]:
concepts = client.conceptualize_eventuality(e1)
print(concepts)

[(meat have flavor, 0.13801169590643275), (sausage have ingredient, 0.1330749354005168), (processed-meat have flavor, 0.09395711500974659), (food have flavor, 0.08070175438596491), (sausage have additive, 0.06847545219638243), (sausage have excipients, 0.050387596899224806), (sausage have factor, 0.04909560723514212), (sausage have characteristic, 0.040051679586563305), (meat-product have flavor, 0.03391812865497076), (item have flavor, 0.030019493177387915), (meat have ingredient, 0.018365897517264307), (processed-meat have ingredient, 0.012503337010340954), (food have ingredient, 0.010739380751620654), (meat have additive, 0.009450413285582604), (meat have excipients, 0.006954077700711728), (meat have factor, 0.006775768016078094), (processed-meat have additive, 0.006433755937359909), (meat have characteristic, 0.005527600223642655), (food have additive, 0.005526089124620336), (processed-meat have excipients, 0.004734273236925215), (processed-meat have factor, 0.004612881615465595), 
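
Since each concept comes with a score, a common next step is to keep only the strongest candidates. The sketch below copies the first five pairs from the output above and uses plain strings in place of concept objects; `top_concepts` and the 0.09 cutoff are illustrative choices, not part of ASER.

```python
# (concept, score) pairs copied from the output above; plain strings stand in
# for ASER concept objects.
concept_scores = [
    ("meat have flavor", 0.13801169590643275),
    ("sausage have ingredient", 0.1330749354005168),
    ("processed-meat have flavor", 0.09395711500974659),
    ("food have flavor", 0.08070175438596491),
    ("sausage have additive", 0.06847545219638243),
]


def top_concepts(pairs, k=3, min_score=0.0):
    """Keep pairs scoring at least min_score and return the k highest."""
    filtered = [(concept, score) for concept, score in pairs if score >= min_score]
    return sorted(filtered, key=lambda pair: pair[1], reverse=True)[:k]


for concept, score in top_concepts(concept_scores, k=3, min_score=0.09):
    print(f"{score:.3f}  {concept}")
```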

In [18]:
c1 = client.conceptualize_eventuality(e1)[0][0]
c2 = client.conceptualize_eventuality(e2)[0][0]

print(repr(c1))
print(repr(c2))

meat have flavor
__PERSON__0 like inorganic-contaminant


In [22]:
client.predict_concept_relation(c1, c2)

(5a49d855f23b29d0a769d638a0944c0d35815ca9, 86e7181b3e449dd70dd9bd0eebcca5b73b432a8c, {'Contrast': 0.02687880595658219, 'Co_Occurrence': 0.02687880595658219})

In [24]:
client.fetch_related_concepts(c1)

[(__PERSON__0 like additive,
  (5a49d855f23b29d0a769d638a0944c0d35815ca9, 2342e1896c34cac33974473c5b52ac22d7182fe9, {'Contrast': 0.0019171057650086054, 'Co_Occurrence': 0.0019171057650086054})),
 (__PERSON__0 like item,
  (5a49d855f23b29d0a769d638a0944c0d35815ca9, e8809e959614713c0622e23b0ab5dc06e2f2bf46, {'Contrast': 0.0020252501927783217, 'Co_Occurrence': 0.0020252501927783217})),
 (__PERSON__0 like seasoning,
  (5a49d855f23b29d0a769d638a0944c0d35815ca9, 69864d92726be0d4b7f52fd4f32e38ad1f97974e, {'Contrast': 0.002413587001587757, 'Co_Occurrence': 0.002413587001587757})),
 (__PERSON__0 like ingredient,
  (5a49d855f23b29d0a769d638a0944c0d35815ca9, 58004d785a3ee08eac2f51ea4cbc44bba3a1ba22, {'Contrast': 0.003976765548440927, 'Co_Occurrence': 0.003976765548440927})),
 (__PERSON__0 like inorganic-contaminant,
  (5a49d855f23b29d0a769d638a0944c0d35815ca9, 86e7181b3e449dd70dd9bd0eebcca5b73b432a8c, {'Contrast': 0.02687880595658219, 'Co_Occurrence': 0.02687880595658219}))]

If you find the ASER knowledge graph interesting, or the extraction and other functions useful, please star this repo. Thanks!