This repository has been archived by the owner on Nov 28, 2023. It is now read-only.

Reverted library path, added more instructions
Richard Caudle committed Jan 24, 2014
1 parent 155271e commit 66e6ac4
Showing 5 changed files with 18 additions and 12 deletions.
12 changes: 9 additions & 3 deletions linearclassification/README.md
@@ -22,7 +22,7 @@ You specify how much data to hold back for validation on the command line using

##Dependencies

-* python v2.7 or greater
+* Python v2.7 or greater
* scikit-learn: http://scikit-learn.org/stable/install.html


@@ -65,9 +65,15 @@ You can create your own configuration by inheriting from the ConfigBase module (



-##Command line
+##Executing

-You execute like this:
+Firstly, clone the repository to your local machine.
+
+Next, you'll need to add the library to your PYTHONPATH environment setting by modifying your .profile or .bash_profile file, for example:
+
+<pre>export PYTHONPATH=$PYTHONPATH:~/Documents/Datasift/Code/vedo-data-science-toolkit</pre>
+
+You execute the tool as follows:

python scored_tags_classifier.py --test_period=[Test period] --config_module_path=[config file] --training_json=[training interactions] --training_csv=[OPTIONAL: label file] --classpath=[label path] > [output file]

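This commit switches the imports from `lib.*` to `linearclassification.lib.*`, which is why the new instructions put the toolkit root (the directory that contains `linearclassification/`) on PYTHONPATH. A minimal sketch of that mechanism, assuming Python: it builds a throwaway directory shaped like the toolkit root, adds it to `sys.path` (the list that PYTHONPATH entries are prepended to at interpreter startup), and imports the package-qualified module name. The stub `features.py` is hypothetical apart from the `wordsplitter` line copied from this diff, and the `__init__.py` files are an assumption (Python 2.7 needs them; whether the repository ships them is not shown here).

```python
import importlib
import os
import sys
import tempfile


def make_stub_toolkit():
    """Create a throwaway directory laid out like the toolkit root (hypothetical stub)."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "linearclassification")
    lib_dir = os.path.join(pkg_dir, "lib")
    os.makedirs(lib_dir)
    # Empty __init__.py files make these regular packages (required on Python 2.7).
    for pkg in (pkg_dir, lib_dir):
        open(os.path.join(pkg, "__init__.py"), "w").close()
    # Stand-in for lib/features.py; only the wordsplitter line comes from the diff.
    with open(os.path.join(lib_dir, "features.py"), "w") as f:
        f.write("import re\nwordsplitter = re.compile(r'[^\\w\\$]+')\n")
    return root


root = make_stub_toolkit()
# A PYTHONPATH entry ends up on sys.path at startup; inserting it here has the same effect.
sys.path.insert(0, root)
if hasattr(importlib, "invalidate_caches"):
    importlib.invalidate_caches()  # Python 3.3+: make sure the newly written files are seen

# Equivalent to: from linearclassification.lib import features
features = importlib.import_module("linearclassification.lib.features")
print(features.wordsplitter.pattern)
```

With the toolkit root on the path, `linearclassification.lib.features` resolves. The old `lib.features` form resolves when `linearclassification/` itself is on the path, as it is when `scored_tags_classifier.py` is run directly, because Python puts the script's own directory first on `sys.path`.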
2 changes: 1 addition & 1 deletion linearclassification/lib/config_base.py
@@ -1,4 +1,4 @@
-from lib.features import *
+from linearclassification.lib.features import *

class ConfigBase:

4 changes: 2 additions & 2 deletions linearclassification/lib/default_config.py
@@ -1,5 +1,5 @@
-from lib.features import *
-from lib.config_base import ConfigBase
+from linearclassification.lib.features import *
+from linearclassification.lib.config_base import ConfigBase

class Config(ConfigBase):

2 changes: 1 addition & 1 deletion linearclassification/lib/features.py
@@ -1,5 +1,5 @@
import re,sys
-from lib.utils import jpath,chunk,contains,has_subsequence
+from linearclassification.lib.utils import jpath,chunk,contains,has_subsequence

wordsplitter=re.compile(r'[^\w\$]+')

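The `wordsplitter` pattern kept as context in `features.py` splits text on runs of characters that are neither word characters nor `$`, so cashtag-style tokens such as `$AAPL` survive intact. How `features.py` actually uses it is outside this diff; the sketch below only illustrates the regex, and the `tokens` helper is hypothetical.

```python
import re

# Same pattern as the context line in linearclassification/lib/features.py.
wordsplitter = re.compile(r'[^\w\$]+')


def tokens(text):
    """Hypothetical helper: split on the pattern and drop empty edge strings."""
    return [t for t in wordsplitter.split(text) if t]


print(wordsplitter.split("Buy $AAPL now!"))  # ['Buy', '$AAPL', 'now', '']
print(tokens("Buy $AAPL now!"))              # ['Buy', '$AAPL', 'now']
```

Note that under Python 2.7, `\w` matches only ASCII word characters unless the pattern is compiled with `re.UNICODE`, whereas Python 3 `str` patterns are Unicode-aware by default, so accented words tokenize differently between the two.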
10 changes: 5 additions & 5 deletions linearclassification/scored_tags_classifier.py
@@ -5,11 +5,11 @@
import numpy as np
from sklearn.linear_model import SGDClassifier

-from lib.metrics import confusion_matrix
+from linearclassification.lib.metrics import confusion_matrix
import itertools

-from lib.features import *
-from lib.utils import jpath,nvl,ngrams,all_zeroes,chunk
+from linearclassification.lib.features import *
+from linearclassification.lib.utils import jpath,nvl,ngrams,all_zeroes,chunk

'''
Run as follows
@@ -98,8 +98,8 @@ def report_confusion(interactions,targets,fvectors,title):
     confusion_matrix(expectedvsactuals)
     for i,(exp,act) in enumerate(expectedvsactuals):
         if exp!=act:
-            logging.info("exp:act (%s,%s): %s |%s",exp,act, nvl(jpath(featurepath,interactions[i])),\
-                '|'.join([selected_features[idx].string() for idx,satisfied in enumerate(fvectors[i]) if satisfied]))
+            logging.info("exp:act (%s,%s): %s |%s",exp,act, nvl(jpath(featurepath,interactions[i])).encode('utf-8','ignore'),\
+                '|'.join([selected_features[idx].string() for idx,satisfied in enumerate(fvectors[i]) if satisfied]).encode('utf-8','ignore'))

 def urldomain(url):
     domain=re.sub('https?://','',url)

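Beyond the import paths, the only change in `scored_tags_classifier.py` is that both logged strings now pass through `.encode('utf-8','ignore')`. The commit message does not say why; plausibly it avoids Unicode errors when non-ASCII interaction text reaches the logger under Python 2.7. A minimal sketch (run under Python 3, with a hypothetical `encode_for_log` wrapper) of what that call does: UTF-8 can encode every code point except lone surrogates, so `'ignore'` only drops those. It also shows a porting caveat: in Python 3, `%s`-formatting a `bytes` object inserts its repr, so the committed line would log `b'...'` if run unchanged.

```python
def encode_for_log(text):
    """Hypothetical wrapper mirroring the committed call: UTF-8 encode, dropping unencodable code points."""
    return text.encode('utf-8', 'ignore')


print(encode_for_log(u"caf\u00e9"))  # b'caf\xc3\xa9'
print(encode_for_log(u"a\ud800b"))   # b'ab' (the lone surrogate is dropped)

# Python 3 caveat: %s on bytes inserts the repr, not the decoded text.
print("text: %s" % encode_for_log(u"caf\u00e9"))  # text: b'caf\xc3\xa9'
```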