# Results for Place Name Recognition using Wapiti

This notebook reports the results for several configurations of the Wapiti tool on different training and test sets. The datasets used so far are:
* ACE corpus
* Conll corpus


In [31]:
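## Calculates and prints token-level precision and recall for the results file resf,
## counting only the tags listed in tagf (a list of tag names, not a file).
def acc(resf,tagf):
    t=0.0    # tokens whose gold tag is in tagf and whose prediction matches it
    c=0      # tokens whose gold tag is in tagf
    ptot=0   # tokens whose predicted tag is in tagf
    with open(resf) as f:
        for line in f:
            line1=line.split()
            # second-to-last column is the gold tag, last column is the predicted tag
            if len(line1)>2:
                gold,pred=line1[-2],line1[-1]
                if gold in tagf:
                    if gold==pred:
                        t+=1
                    c+=1
                if pred in tagf:
                    ptot+=1
    print("Total predictions: "+str(ptot))
    print("Total entities: "+ str(c))
    # guard against division by zero when no entity or prediction is found
    rec=t/c if c else 0.0
    pre=t/ptot if ptot else 0.0
    print("recall: "+str(rec))
    print("precision: "+str(pre))
    return pre,rec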
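
To make the expected input concrete, here is a minimal sketch of the same token-level counting on a few in-memory lines. The sample lines and the helper name `count_matches` are made up for illustration; only the column layout (gold tag second to last, predicted tag last) is taken from `acc` above.

```python
# Hypothetical sample in the column layout acc() reads:
# token, gold tag, predicted tag (whitespace-separated).
sample_lines = [
    "Ankara I-LOC I-LOC",
    "is O O",
    "in O I-LOC",
    "Turkey I-LOC O",
]

def count_matches(lines, tags):
    """Return (correct, gold_total, predicted_total) for the given tags."""
    correct = gold_total = pred_total = 0
    for line in lines:
        cols = line.split()
        if len(cols) > 2:
            gold, pred = cols[-2], cols[-1]
            if gold in tags:
                gold_total += 1
                if gold == pred:
                    correct += 1
            if pred in tags:
                pred_total += 1
    return correct, gold_total, pred_total

correct, gold_total, pred_total = count_matches(sample_lines, ["I-LOC"])
print(correct, gold_total, pred_total)
```

On this sample one of the two gold `I-LOC` tokens is found and one of the two `I-LOC` predictions is correct, so recall and precision are both 0.5.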

In [32]:
## Calculates and returns the F-beta score from precision and recall
def fbeta(beta,pre,rec):
    den = beta*beta*pre + rec
    num = (beta*beta+1)*pre*rec
    # the score is taken as 0 when precision and recall are both 0
    return num/den if den else 0.0
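
For reference, the function above computes the standard F-beta score, with $P$ standing for `pre` and $R$ for `rec`:

$$F_\beta = \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R}$$

With $\beta = 1$, the value used in every cell of this notebook, this reduces to the harmonic mean $F_1 = \frac{2PR}{P+R}$.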

In [33]:
tagsace=["GPE","LOC"]
tagscon=["I-LOC"]


***NOTE:*** Set the `resultsfile` variable in each cell to the location of the corresponding results file on your machine. The paths below are the defaults I used; a cell will fail if its path does not point to an existing results file.
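
A small guard like the following can turn a wrong path into a clearer error before `acc` is called. This is only a sketch; `check_results_file` is a hypothetical helper and is not used in the cells below.

```python
import os

def check_results_file(path):
    """Return path unchanged if it is an existing file, else raise IOError."""
    if not os.path.isfile(path):
        raise IOError("results file not found: " + path)
    return path
```

It could be used as `acc(check_results_file(resultsfile), tagscon)`.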

## Results for Conll alone

The results below use the Conll dataset and the predefined Conll features for learning.

### Result 1

* **Training set**: Conll training set 

* **Test set**: Conll testa,testb
* **Pattern file**: nppattern.txt

* **Configurations**: L1 norm penalty 5 

* **Terminal call for training**: `wapiti train -p patternfile -1 5 trainfile modelfile`
* **Terminal call for prediction**: `wapiti label -m modelfile testfile outputfile`
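
The two terminal calls above can also be assembled from Python, which helps when repeating them for several configurations. The sketch below only builds the argument lists from the flags shown above (`-p`, `-1`, `-m`); the function names are hypothetical, and `run` assumes a `wapiti` binary is on the `PATH`.

```python
import subprocess

def train_cmd(patternfile, trainfile, modelfile, l1=None):
    """Build the `wapiti train` call; l1 is passed with -1 when given."""
    cmd = ["wapiti", "train", "-p", patternfile]
    if l1 is not None:
        cmd += ["-1", str(l1)]
    return cmd + [trainfile, modelfile]

def label_cmd(modelfile, testfile, outputfile):
    """Build the `wapiti label` call."""
    return ["wapiti", "label", "-m", modelfile, testfile, outputfile]

def run(cmd):
    """Run a command and raise CalledProcessError if it exits with an error."""
    subprocess.check_call(cmd)

# Placeholder file names, as in the terminal calls above.
print(" ".join(train_cmd("patternfile", "trainfile", "modelfile", l1=5)))
print(" ".join(label_cmd("modelfile", "testfile", "outputfile")))
```

Calling `run(train_cmd(...))` followed by `run(label_cmd(...))` would then execute the same two steps as the terminal calls.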

**testa**

In [59]:
resultsfile="wapitideneme/conlldene/baseline/resa.txt"
beta=1
pre,rec=acc(resultsfile,tagscon)
f=fbeta(beta,pre,rec)
print("F1 score:"+str(f))

Total predictions: 2158
Total entities: 2094
recall: 0.872015281757
precision: 0.846153846154
F1 score:0.858889934149


**testb**

In [60]:
resultsfile="wapitideneme/conlldene/baseline/resb.txt"
beta=1
pre,rec=acc(resultsfile,tagscon)
f=fbeta(beta,pre,rec)
print("F1 score:"+str(f))

Total predictions: 1946
Total entities: 1919
recall: 0.813444502345
precision: 0.802158273381
F1 score:0.807761966365


### Result 2

* **Training set**: Conll training set 

* **Test set**: Conll testa,testb
* **Pattern file**: nppattern.txt

* **Configurations**: default mode (no `-1` option given, so Wapiti's default L1 penalty applies)

* **Terminal call for training**: `wapiti train -p patternfile trainfile modelfile`
* **Terminal call for prediction**: `wapiti label -m modelfile testfile outputfile`

**testa**

In [61]:
resultsfile="wapitideneme/conlldene/baseline/resa2.txt"
beta=1
pre,rec=acc(resultsfile,tagscon)
f=fbeta(beta,pre,rec)
print("F1 score:"+str(f))

Total predictions: 2098
Total entities: 2094
recall: 0.916427889207
precision: 0.914680648236
F1 score:0.915553435115


**testb**

In [62]:
resultsfile="wapitideneme/conlldene/baseline/resb2.txt"
beta=1
pre,rec=acc(resultsfile,tagscon)
f=fbeta(beta,pre,rec)
print("F1 score:"+str(f))

Total predictions: 1878
Total entities: 1919
recall: 0.853048462741
precision: 0.87167199148
F1 score:0.862259678694


### Result 3

* **Training set**: Conll training set (sentence-split version)

* **Test set**: Conll testa,testb
* **Pattern file**: nppattern.txt

* **Configurations**: default mode (no `-1` option given, so Wapiti's default L1 penalty applies)

* **Terminal call for training**: `wapiti train -p patternfile trainfile modelfile`
* **Terminal call for prediction**: `wapiti label -m modelfile testfile outputfile`

**testa**

In [63]:
resultsfile="wapitideneme/conlldene/base2/wapresa"
beta=1
pre,rec=acc(resultsfile,tagscon)
f=fbeta(beta,pre,rec)
print("F1 score:"+str(f))

Total predictions: 2108
Total entities: 2094
recall: 0.920248328558
precision: 0.914136622391
F1 score:0.917182294146


**testb**

In [64]:
resultsfile="wapitideneme/conlldene/base2/wapresb"
beta=1
pre,rec=acc(resultsfile,tagscon)
f=fbeta(beta,pre,rec)
print("F1 score:"+str(f))

Total predictions: 1896
Total entities: 1919
recall: 0.852006253257
precision: 0.862341772152
F1 score:0.857142857143


## Results for ACE alone

The ACE dataset contains many entity types. For this project we consider only entities tagged GPE or LOC. As with Conll, we ignore entity boundaries (the BIO representation).
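
The notebook does not show how the boundary information was dropped. As a rough sketch, assuming the original tags carry `B-`/`I-` prefixes, a hypothetical helper such as `strip_bio` would reduce them to the bare entity type:

```python
def strip_bio(tag):
    """Drop a leading B- or I- boundary prefix; other tags are returned unchanged."""
    if tag[:2] in ("B-", "I-"):
        return tag[2:]
    return tag

print([strip_bio(t) for t in ["B-GPE", "I-GPE", "B-LOC", "O"]])
```

After this step every token of an entity has the same tag, which is what the token-level counts in `acc` compare.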

### Result 4

No features beyond the surface form and regex patterns are given to the CRF learner, so these scores can be treated as the baseline performance of Wapiti on ACE alone. They leave considerable room for improvement.
* **Training set**: 90% of ACE corpus (no features except for surface form + regex)
* **Test set**:  10% of ACE corpus
* **Pattern file**: acepats
* **Configurations**: L1 penalty 1

* **Terminal call for training**: `wapiti train -p patternfile -1 1 trainfile modelfile`
* **Terminal call for prediction**: `wapiti label -m modelfile testfile outputfile`
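
The 90%/10% split listed above could be produced along the following lines. This is a sketch under assumptions: the source does not say whether the split was random, sequential, or made per document, so a plain sequential cut is used here, and sequences are assumed to be separated by blank lines. The name `split_sequences` is hypothetical.

```python
def split_sequences(text, train_fraction=0.9):
    """Split blank-line-separated sequences into (train, test) lists."""
    sequences = [s for s in text.strip().split("\n\n") if s.strip()]
    cut = int(len(sequences) * train_fraction)
    return sequences[:cut], sequences[cut:]

# Ten hypothetical one-token sequences, just to show the proportions.
toy = "\n\n".join("tok%d O" % i for i in range(10))
train, test = split_sequences(toy)
print(len(train), len(test))
```

Writing each part back with `"\n\n".join(...)` gives one training file and one test file in the same layout.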

In [68]:
resultsfile="wapitideneme/acedene/res1"
beta=1
pre,rec=acc(resultsfile,tagsace)
f=fbeta(beta,pre,rec)
print("F1 score:"+str(f))

Total predictions: 358
Total entities: 284
recall: 0.838028169014
precision: 0.664804469274
F1 score:0.741433021807


### Result 5

* **Training set**: 90% of ACE corpus (no features except for surface form + regex)
* **Test set**:  10% of ACE corpus
* **Pattern file**: acepats
* **Configurations**: default mode (no `-1` option given, so Wapiti's default L1 penalty applies)

* **Terminal call for training**: `wapiti train -p patternfile trainfile modelfile`
* **Terminal call for prediction**: `wapiti label -m modelfile testfile outputfile`

In [69]:
resultsfile="wapitideneme/acedene/second/res2"
beta=1
pre,rec=acc(resultsfile,tagsace)
f=fbeta(beta,pre,rec)
print("F1 score:"+str(f))

Total predictions: 349
Total entities: 284
recall: 0.845070422535
precision: 0.687679083095
F1 score:0.758293838863
