### ***Physical Common Sense Knowledge***

<u>Author:</u> <br/>
Awantee Deshpande, <br/>
MS Computer Science <br/>
Saarland University <br/> 



---

This implementation is a part of the seminar conducted by MPI-Informatics Saarland, titled **Commonsense Knowledge Extraction and Curation (Winter Semester 2020/21)**. <br/>

The notebook provides the steps to reproduce the results of the paper "Do Neural Language Representations Learn Physical Commonsense?" by Forbes et al. (2019) [https://arxiv.org/pdf/1908.02899.pdf].

The codebase requires a Python 3.7+ environment. This implementation has been slightly restructured to work in a Google Colab environment. Running all steps end to end takes a few hours on a GPU runtime.

The main git repository by Forbes et al. is available at
https://github.com/mbforbes/physical-commonsense

Step I: Clone the original git repo.

In [1]:
!git clone https://github.com/mbforbes/physical-commonsense.git

Cloning into 'physical-commonsense'...
remote: Enumerating objects: 127, done.
remote: Total 127 (delta 0), reused 0 (delta 0), pack-reused 127
Receiving objects: 100% (127/127), 6.31 MiB | 4.79 MiB/s, done.
Resolving deltas: 100% (26/26), done.


Step II: Install the requirements for the project

In [None]:
!pip install -r "/content/physical-commonsense/requirements.txt"

Step III: Retrieve external data. (The main data is already in subfolders of data/; this is for larger blobs like GloVe.) This script also creates some other directories. 

In [3]:
!"/content/physical-commonsense/scripts/get_data.sh"

+ set -e
+ mkdir -p data/glove/
+ '[' '!' -f data/glove/vocab-pc.glove.840B.300d.txt.npz ']'
+ curl https://homes.cs.washington.edu/~mbforbes/physical-commonsense/vocab-pc.glove.840B.300d.txt.npz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1495k  100 1495k    0     0   965k      0  0:00:01  0:00:01 --:--:--  965k
+ mkdir -p data/dep-embs/
+ '[' '!' -f data/dep-embs/vocab-pc.dep-embs.npz ']'
+ curl https://homes.cs.washington.edu/~mbforbes/physical-commonsense/vocab-pc.dep-embs.npz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2307k  100 2307k    0     0  1640k      0  0:00:01  0:00:01 --:--:-- 1639k
+ mkdir -p data/elmo/
+ '[' '!' -f data/elmo/sentences.elmo.npz ']'
+ curl https://homes.cs.washington.edu/~mbforbes/physical-commonsense/sentences.elmo.npz
  

In Google Colab, the script is run from /content, so the data folder is created outside the cloned physical-commonsense repository. The directories inside it have to be moved manually into the data folder of the physical-commonsense repo.

This can be done in the Colab environment by dragging and dropping the folders dep-embs, elmo, glove, and results into physical-commonsense/data. 
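The move described above can also be scripted instead of done by drag and drop. The following is a minimal sketch, not part of the original repository: the helper name `move_data_dirs` is hypothetical, and the paths in the usage note assume the default Colab layout in which the script created `/content/data`.

```python
import shutil
from pathlib import Path


def move_data_dirs(src_root, dst_root, names=("dep-embs", "elmo", "glove", "results")):
    """Move each named subdirectory of src_root into dst_root.

    Hypothetical helper for illustration; returns the names that were moved.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    moved = []
    for name in names:
        src = src_root / name
        dst = dst_root / name
        # Skip folders that are missing, or that already exist at the
        # destination (shutil.move would otherwise nest src inside dst).
        if not src.is_dir() or dst.exists():
            continue
        dst_root.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))
        moved.append(name)
    return moved
```

In Colab, and assuming the paths above, the call would be `move_data_dirs("/content/data", "/content/physical-commonsense/data")`. Folders that already exist at the destination are left untouched, so any such folder still has to be merged by hand.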

Results and graphs are written to the physical-commonsense/data/results folder.

Step IV: Run random and majority baselines

In [4]:
%cd physical-commonsense/
!python -m pc.baselines

/content/physical-commonsense
02/04/2021 03:22:22 PM INFO: Running random baseline for Abstract_ObjectsProperties
02/04/2021 03:22:22 PM INFO: 	Acc: 0.505, Micro F1: 0.257, object macro F1: 0.257, property macro F1: 0.256
02/04/2021 03:22:22 PM INFO: Running random baseline for Situated_ObjectsProperties
02/04/2021 03:22:23 PM INFO: 	Acc: 0.500, Micro F1: 0.229, object macro F1: 0.249, property macro F1: 0.262
02/04/2021 03:22:23 PM INFO: Running random baseline for Situated_ObjectsAffordances
02/04/2021 03:22:23 PM INFO: 	Acc: 0.486, Micro F1: 0.480, object macro F1: 0.473, affordance macro F1: 0.551
02/04/2021 03:22:23 PM INFO: Running random baseline for Situated_AffordancesProperties
02/04/2021 03:22:23 PM INFO: NumExpr defaulting to 2 threads.
02/04/2021 03:22:24 PM INFO: 	Acc: 0.503, Micro F1: 0.229, affordance macro F1: 0.241, property macro F1: 0.258
02/04/2021 03:22:24 PM INFO: 
02/04/2021 03:22:24 PM INFO: Running majority: by category baseline for Abstract_ObjectsProperties
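The log above names a "majority: by category" baseline but does not show how the repository computes it. One plausible reading, sketched below purely as an illustration, is that each category (for example each property) gets the label that was most frequent for it in the training data. The function name, the data layout, and the toy labels are all assumptions and are not taken from `pc.baselines`.

```python
from collections import Counter, defaultdict


def majority_by_category(train, test_categories, default=0):
    """Predict, for each test category, its most frequent training label.

    train: iterable of (category, label) pairs (made-up layout).
    default: label used for categories never seen in training.
    """
    votes = defaultdict(Counter)
    for category, label in train:
        votes[category][label] += 1
    predictions = []
    for category in test_categories:
        if category in votes:
            predictions.append(votes[category].most_common(1)[0][0])
        else:
            predictions.append(default)
    return predictions


# Toy data with invented labels (1 = property applies, 0 = it does not).
train = [("wet", 0), ("wet", 0), ("wet", 1), ("sharp", 1), ("sharp", 1), ("sharp", 0)]
print(majority_by_category(train, ["wet", "sharp", "hot"]))  # [0, 1, 0]
```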


Step V: Run GloVe, Dependency Embeddings, and ELMo

In [5]:
!python -m pc.experiments

02/04/2021 03:23:34 PM INFO: Running train+test for Abstract_ObjectsProperties, Glove
02/04/2021 03:23:43 PM INFO: Epoch 19. Train acc: 83.54. Train loss: 0.1646
02/04/2021 03:23:43 PM INFO: Epoch 39. Train acc: 83.22. Train loss: 0.1527
02/04/2021 03:23:43 PM INFO: Epoch 59. Train acc: 86.39. Train loss: 0.1042
02/04/2021 03:23:43 PM INFO: Epoch 79. Train acc: 89.45. Train loss: 0.0831
02/04/2021 03:23:43 PM INFO: Epoch 99. Train acc: 90.66. Train loss: 0.0752
02/04/2021 03:23:43 PM INFO: Epoch 119. Train acc: 90.87. Train loss: 0.0728
02/04/2021 03:23:43 PM INFO: Epoch 139. Train acc: 91.09. Train loss: 0.0715
02/04/2021 03:23:44 PM INFO: Epoch 159. Train acc: 91.22. Train loss: 0.0708
02/04/2021 03:23:44 PM INFO: Epoch 179. Train acc: 91.27. Train loss: 0.0704
02/04/2021 03:23:44 PM INFO: Epoch 199. Train acc: 91.33. Train loss: 0.0702
02/04/2021 03:23:44 PM INFO: Epoch 219. Train acc: 91.37. Train loss: 0.0700
02/04/2021 03:23:44 PM INFO: Epoch 239. Train acc: 91.39. Train loss: 0.

Step VI: Run BERT-based experiments on the abstract OP dataset

In [6]:
!python -m pc.bert --task "abstract-OP"

Building model...
100% 434/434 [00:00<00:00, 278934.71B/s]
100% 1344997306/1344997306 [01:34<00:00, 14198128.52B/s]
Loading traning data
5 Samples:
- canoe/wet: "A canoe is wet."
- colander/slimy: "A colander is slimy."
- dolphin/sharp: "A dolphin is sharp."
- spider/worn_on_feet: "A spider is worn on feet."
- toothbrush/hand_held: "A toothbrush is hand-held."
Loading tokenizer...
100% 231508/231508 [00:00<00:00, 322691.52B/s]
Loading test data
5 Samples:
- building/eaten_in_summer: "A building is eaten in summer."
- plate/large: "A plate is big."
- buckle/hot: "A buckle is hot."
- elk/used_for_killing: "An elk is used for killing."
- housefly/used_for_eating: "A housefly is used for eating."
Loading tokenizer...
Num train optimization steps: 1610
Starting epoch 1/5.
	add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
	add_(Tensor other, *, Number alpha) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)
  exp_avg.mul_(

Step VII: Run BERT on the situated OP dataset

In [7]:
!python -m pc.bert --task "situated-OP"

Building model...
Loading traning data
5 Samples:
- baseball_glove/an_animal: "A baseball glove is an animal."
- bed/wet: "A bed is wet."
- car/hard: "A car is hard."
- suitcase/an_animal: "A suitcase is an animal."
- wine_glass/smooth: "A wine glass is smooth."
Loading tokenizer...
Loading test data
5 Samples:
- sheep/expensive: "A sheep is expensive."
- bottle/light_weight: "A bottle is light."
- chair/heavy: "A chair is heavy."
- dining_table/eaten_in_summer: "A dining table is eaten in summer."
- bottle/used_for_transportation: "A bottle is used for transportation."
Loading tokenizer...
Num train optimization steps: 3200
Starting epoch 1/5.
	add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
	add_(Tensor other, *, Number alpha) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)
  exp_avg.mul_(beta1).add_(1.0 - beta1, grad)
Batch: 100% 640/640 [16:56<00:00,  1.59s/it]
Average train loss: 0.32282703471358654
train acc

Step VIII: Run BERT on the situated OA dataset

In [8]:
!python -m pc.bert --task "situated-OA"

Building model...
Loading traning data
5 Samples:
- motorcycle/adjust: "He adjusted the motorcycle."
- oven/train: "He trained the oven."
- motorcycle/kiss: "He kissed the motorcycle."
- car/deflect: "He deflected the car."
- surfboard/jump: "He jumped the surfboard."
Loading tokenizer...
Loading test data
5 Samples:
- horse/disembark: "He disembarked the horse."
- skateboard/immerse: "He immersed the skateboard."
- sheep/splash: "He splashed the sheep."
- chair/sit: "He sat the chair."
- skateboard/shave: "He shaved the skateboard."
Loading tokenizer...
Num train optimization steps: 385
Starting epoch 1/5.
	add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
	add_(Tensor other, *, Number alpha) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)
  exp_avg.mul_(beta1).add_(1.0 - beta1, grad)
Batch: 100% 77/77 [02:02<00:00,  1.59s/it]
Average train loss: 0.5280586086158939
train accuracy: 32.04278728606357
Starting epoch 2

Step IX: Run BERT on the situated AP dataset <br/>
NOTE: Training for only 1 epoch here is not meant to handicap the model; the authors observe that with 2 or more epochs the model overfits and achieves an F1 score of 0.0.
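An F1 score of exactly 0.0 may look surprising for a trained model. The note does not say what the overfit model predicts; one way this can happen, assumed here only for illustration, is a classifier that predicts no positives at all. The sketch below uses made-up labels and a hypothetical helper `accuracy_and_f1` to show that accuracy can stay high while F1 collapses to zero.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1 of the positive class) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Define precision, recall, and F1 as 0.0 when their denominator is 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return accuracy, f1


# Invented data: 2 positives out of 10, and a model that always predicts 0.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10
print(accuracy_and_f1(y_true, y_pred))  # (0.8, 0.0)
```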

In [9]:
!python -m pc.bert --task "situated-AP" --epochs 1

Building model...
Loading traning data
5 Samples:
- drive/sharp: "If you can drive something, then it is sharp."
- call/edible: "If you can call something, then it is edible."
- bake/used_by_children: "If you can bake something, then it is used by children."
- bandage/fun: "If you can bandage something, then it is fun."
- arrest/used_for_cleaning: "If you can arrest something, then it is used for cleaning."
Loading tokenizer...
Loading test data
5 Samples:
- clean/large: "If you can clean something, then it is big."
- swing/used_for_holding_things: "If you can swing something, then it is used for holding things."
- attach/light_weight: "If you can attach something, then it is light."
- aim/decorative: "If you can aim something, then it is decorative."
- swing/lives_in_water: "If you can swing something, then it lives in water."
Loading tokenizer...
Num train optimization steps: 1918
Starting epoch 1/1.
	add_(Number alpha, Tensor other)
Consider using one of the following signatures ins

Step X: Display human baseline

In [10]:
!python -m pc.human

02/04/2021 06:46:32 PM INFO: Task.Abstract_ObjectsProperties
02/04/2021 06:46:32 PM INFO: 	Acc: 0.900, Micro F1: 0.667, object macro F1: 0.779, property macro F1: 0.800
02/04/2021 06:46:32 PM INFO: Task.Situated_ObjectsProperties
02/04/2021 06:46:32 PM INFO: 	Acc: 0.820, Micro F1: 0.609, object macro F1: 0.701, property macro F1: 0.693
02/04/2021 06:46:32 PM INFO: Task.Situated_ObjectsAffordances
02/04/2021 06:46:32 PM INFO: 	Acc: 0.780, Micro F1: 0.800, object macro F1: 0.833, affordance macro F1: 0.929
02/04/2021 06:46:32 PM INFO: Task.Situated_AffordancesProperties
02/04/2021 06:46:32 PM INFO: 	Acc: 0.700, Micro F1: 0.400, affordance macro F1: 0.650, property macro F1: 0.665
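The baseline and human scores are only printed as log lines, so comparing them means reading numbers off the output. A small parser can turn such a line into a dictionary. This is a sketch that assumes the `name: value` layout seen in the logs above; `parse_metrics` is a hypothetical helper, not something the repository provides.

```python
import re

# Matches "name: value" pairs such as "Acc: 0.900" or "object macro F1: 0.779".
METRIC_RE = re.compile(r"([A-Za-z][A-Za-z0-9 ]*?):\s*([0-9]*\.?[0-9]+)")


def parse_metrics(line):
    """Return {metric name: float} for one 'Acc: ..., Micro F1: ...' log line."""
    # Drop the timestamp and log level, which end at "INFO:", when present.
    _, sep, tail = line.partition("INFO:")
    payload = tail if sep else line
    return {name.strip(): float(value) for name, value in METRIC_RE.findall(payload)}


# Two lines copied from the outputs above (Abstract_ObjectsProperties).
human = "02/04/2021 06:46:32 PM INFO: \tAcc: 0.900, Micro F1: 0.667, object macro F1: 0.779, property macro F1: 0.800"
random_baseline = "02/04/2021 03:22:22 PM INFO: \tAcc: 0.505, Micro F1: 0.257, object macro F1: 0.257, property macro F1: 0.256"

gap = parse_metrics(human)["Micro F1"] - parse_metrics(random_baseline)["Micro F1"]
print(f"Micro F1 gap, human minus random: {gap:.3f}")  # 0.410
```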


Step XI: Compute statistical significance

In [11]:
!python -m pc.significance

abstract-OP:
- Random: ***
- Majority: ***
- Glove: ***
- DepEmbs: ***
- Bert: (base)
- Elmo: **

situated-OP:
- Random: ***
- Majority: ***
- Glove: ***
- DepEmbs: ***
- Bert: (base)
- Elmo: ***

situated-OA:
- Random: ***
- Majority: ***
- Glove: 
- DepEmbs: **
- Bert: (base)
- Elmo: 

situated-AP:
- Random: ***
- Majority: ***
- Glove: ***
- DepEmbs: ***
- Bert: (base)
- Elmo: ***
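The output above does not say which test `pc.significance` runs or which thresholds the stars denote. As a rough guide to reading such a table, the sketch below uses an exact McNemar test, a common choice for comparing two classifiers on the same test items, together with the conventional thresholds 0.05, 0.01, and 0.001. Both the test and the thresholds are assumptions made for illustration, and the function names are hypothetical.

```python
from math import factorial


def _choose(n, k):
    """Binomial coefficient (math.comb is unavailable on Python 3.7)."""
    return factorial(n) // (factorial(k) * factorial(n - k))


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar p-value from per-item correctness flags."""
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be scored on the same items")
    # Only items where exactly one model is right carry information.
    only_a = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    only_b = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    n = only_a + only_b
    if n == 0:
        return 1.0
    k = min(only_a, only_b)
    # Binomial(n, 0.5) tail probability, doubled for a two-sided test.
    tail = sum(_choose(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def stars(p_value):
    """Map a p-value to stars using assumed conventional thresholds."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""
```

Under these assumptions, an empty entry (as for Glove and Elmo on situated-OA) would mean that the difference from the base model is not significant at the 0.05 level.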



Step XII: Convert BERT's output on the situated-AP task to per-category output (for making graphs)

In [12]:
!python -m scripts.perdatum_to_category

Writing A results to data/results/Bert-AP-A.txt
Writing P results to data/results/Bert-AP-P.txt


Step XIII: Produce the graphs (also shown in the paper) that analyze BERT's output on the situated-AP task per category and compare performance against word occurrence in natural language (data found in data/nl/). The graphs are written to physical-commonsense/data/results/graphs.

In [13]:
!python -m pc.graph

---