Named Entity Recognition + Valence Tests by agshruti12 · Pull Request #226 · Watts-Lab/team_comm_tools

agshruti12 · 2024-06-10T20:25:21Z

Pull Request Template:
If you are merging in a feature or other major change, use this template to check your pull request!

Basic Info

What's this pull request about?

Tests for NER and valence. The NER tests are a modified unit test that checks for the correct NUMBER of named entities as well as the correct named entities themselves. They have their own test dataset, 'test_named_entity.csv'. Valence is also tested, using the perturbation framework: for a given positive/negative sentence, I generated a corresponding INV and DIR perturbed sentence. The test checks that the magnitude of the DIR change is greater than the magnitude of the INV change. These tests have their own test dataset, test_chat_level_complex.csv. We can add more chat-level tests that use the perturbation framework to this csv.
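As a rough illustration of the DIR-versus-INV comparison described above, here is a minimal sketch. It assumes that INV denotes a perturbation expected to leave the valence score roughly unchanged and DIR one expected to shift it. The function name and the scores are hypothetical and are not taken from the repository or from test_chat_level_complex.csv.

```python
def passes_perturbation_check(original, invariant, directional):
    """Return True when the DIR perturbation moves the score more than the INV one.

    All three arguments are feature scores (e.g., valence) for the original
    sentence, its INV-perturbed version, and its DIR-perturbed version.
    """
    inv_change = abs(invariant - original)
    dir_change = abs(directional - original)
    return dir_change > inv_change


# Hypothetical valence scores for a single test row (illustrative only).
scores = {"original": 0.62, "inv": 0.60, "dir": -0.45}
print(passes_perturbation_check(scores["original"], scores["inv"], scores["dir"]))
```

A strict inequality is used here, so a DIR change that only ties the INV change counts as a failure; whether the actual test treats ties this way is an assumption.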

Feature Documentation

Did you document your feature? Make sure you do the following before you pull request!

Copy the Template. Go to the Feature Wiki and Copy/Paste the Feature Template into a new page.
Fill out the Template. Fill out the basic information for the feature in the template. Use the template to document your plan for implementation and major design decisions; if anything changes along the way, update the documentation as you go.
At the top of my feature, I indicate whether the feature is conversation level or chat level.

Code Basics

My feature is a .py file.
My feature uses snake case in the name. That means the name has the format my_feature, NOT myFeature (camel case).
My feature has the name, NAME_features.py, where NAME is the name of my feature.
My feature is located in feature_engine/features.

Testing

I have thought about test cases for my features, with inputs and expected outputs.
I have linked to a location (e.g., .py or .ipynb) where I can run my test cases and show they work (inputs match expected outputs).

The location of my tests is here:

[PASTE LINK HERE]

If you check all the boxes above, then you are ready to merge!

xehu · 2024-06-17T16:21:40Z

I notice that not all the NER tests pass --- but this is because the feature isn't perfect! Would it be possible to run the feature on the full test dataset in order to get metrics (e.g., precision/recall), but then only run the test on a subset of the NER features that we know are supposed to work? That way, we won't have all the tests return as 'failing' ...
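One way to get the suggested metrics is sketched below, assuming each test row pairs a list of expected entities with the list the feature returned. The function name, the row format, the case-insensitive matching, and the example rows are all hypothetical; none of them come from test_named_entity.csv or the feature's real interface.

```python
def ner_precision_recall(rows):
    """Compute micro-averaged precision and recall over (expected, predicted) rows."""
    true_pos = false_pos = false_neg = 0
    for expected, predicted in rows:
        # Assumption: entities are compared case-insensitively as exact strings.
        expected_set = {entity.lower() for entity in expected}
        predicted_set = {entity.lower() for entity in predicted}
        true_pos += len(expected_set & predicted_set)
        false_pos += len(predicted_set - expected_set)
        false_neg += len(expected_set - predicted_set)
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Illustrative rows only: (expected entities, entities the feature returned).
rows = [
    (["Alice", "Boston"], ["Alice", "Boston"]),
    (["Microsoft"], ["Microsoft", "Monday"]),
    (["Dr. Smith"], []),
]
precision, recall = ner_precision_recall(rows)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

The metrics could then be reported for the full dataset, while the pass/fail assertions run only on the subset of rows the feature is expected to handle.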

agshruti12 · 2024-08-07T19:37:53Z

Changes made:

assign_chunk_nums: removed temporal parameters
get_all_DD_features: removed temporal parameters from assign_chunk_nums call
burstiness: use wait_times starting at position 1 (omitting the first value), and ensure time_diff parsing works for both timedelta objects (when passing in a datetime timestamp) and floats (when passing in a Unix time or integer timestamp).
fflow: cosine similarity --> cosine distance
tests for valence + NER + discursive diversity + variance in DD + incongruent modulation + within person discursive range + forward flow + team burstiness (across 3 different timestamps)
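The burstiness and forward flow changes listed above can be sketched roughly as follows. This is not the repository's code: the helper names are hypothetical, the burstiness formula (std - mean) / (std + mean) is assumed to be the one the feature uses, and the zero-division guard and short-input behavior are assumptions as well.

```python
from datetime import timedelta

import numpy as np


def to_seconds(time_diff):
    """Accept either a timedelta or a plain number of seconds."""
    if isinstance(time_diff, timedelta):
        return time_diff.total_seconds()
    return float(time_diff)


def burstiness(time_diffs):
    """Burstiness coefficient of the wait times, skipping the first value.

    The first time_diff belongs to the first message, which has no
    predecessor, so it is omitted (wait times start at position 1).
    """
    wait_times = np.array([to_seconds(t) for t in time_diffs[1:]])
    if len(wait_times) <= 1:
        return 0.0  # assumption: too few wait times to measure burstiness
    std, mean = wait_times.std(), wait_times.mean()
    if std + mean == 0:
        return 0.0  # assumption: guard against division by zero
    return float((std - mean) / (std + mean))


def cosine_distance(u, v):
    """Cosine distance, defined as 1 minus cosine similarity."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    similarity = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(1.0 - similarity)


# Evenly spaced messages give the same result for both timestamp styles.
as_floats = [0, 10, 10, 10]
as_timedeltas = [timedelta(seconds=s) for s in as_floats]
print(burstiness(as_floats), burstiness(as_timedeltas))
```

Note that numpy's timedelta64 values are not datetime.timedelta instances, so they would need their own conversion branch in to_seconds.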

* add pyproject.toml
* Update README.md with team-comm-tools rather than team-process-map
* Update README.md to remove outdated requirement (#264)
* delete junit
* move preprocessing notebooks to tests
* remove unnecessary deps
* update requirements
* more slimming of reqs
* remove packaging related deps
* get rid of requests and xgboost deps
* addressing #267
* edit src paths
* test with python 3.7
* test python 3.7
* test 3.8
* test 3.8 pt. 2
* test 3.9
* test 3.10
* restore to 3.11; play with docs
* Update README.md with new path names.
* standardize package structure and solve path issues
* update requirement.txt path in workflow
* update workflow file
* updating test workflow
* update test workflow
* update test workflow
* update test workflow
* update test workflow
* update imports in example
* fix bugs
* move lexiconx_dict.pkl to features/assets
* update dependencies
* delete legacy files and remove constant nltk import
* clean up package structure and warnings
* resolve relative imports issue in sphinx
* create single installation script
* commit setup script
* update setup script and documentationZ
* update README to point to setup script
* add linkes to website and Rtd to readme
* disable tokenizer parallelism to avoid error
* add badges to home page
* Named Entity Recognition + Valence Tests (#226)
    * valence testing
    * rearranging files
    * intermediate ner testing
    * NER testing
    * fix featurizer
    * fix featurize bug
    * updating test dataset + function
    * code coverage
    * burstiness
    * move testing FB's into run_tests.py
    * move NER dataframe to test file
    * adding complex tests back to run_tests.py
    * add chat_complex_df and conv_complex_df to run_tests.py
    * correct dataset paths
    * rebase
    * changing references as part of rebase
    * correcting FB calls based on latest interface updates
    * correct run_tests.py
    * add dd tests
    * burstiness fix
    * dd tests add
    * forward flow tests
    * src changes
    * testing timestamp variations
    * src changes
    * update test ds
    * fix formatting
    * fix formatting

    ---------
    Co-authored-by: Xinlan Emily Hu <xehu@wharton.upenn.edu>
    Co-authored-by: Xinlan Emily Hu <xehu@cs.stanford.edu>
* Amy/website (#270)
    * website updates
    * renaming tpm-website to website
    * deploying via gh-pages
    * changed from tpm-website to website
    * deployed website
    * copyright and team
    * team headshots and footer
    * edits to the pages
    * website updates
    * updated links
    * updated homepage
    * link updates
    * mobile compatibility
    * mobile adjustments
    * navbar mobile updates
    * whitespace edits
    * homepage updates
    * feature table
    * website updates
    * renaming tpm-website to website
    * deploying via gh-pages
    * changed from tpm-website to website
    * edits to the pages
    * website updates
    * updated links
    * updated homepage
    * link updates
    * mobile compatibility
    * mobile adjustments
    * navbar mobile updates
    * homepage updates
    * add table of features
    * updated team page titles
    * include flask in requirements.txt
    * updates to table of features
    * load pages from top
    * fix to 404 issues
    * moved build under website folder
    * add flask back into requirements

    ---------
    Co-authored-by: Xinlan Emily Hu <xehu@cs.stanford.edu>
    Co-authored-by: Xinlan Emily Hu <xehu@wharton.upenn.edu>

---------
Co-authored-by: sundy1994 <yuxuanzh@seas.upenn.edu>
Co-authored-by: Shruti Agarwal <46203852+agshruti12@users.noreply.github.com>
Co-authored-by: amytangzheng <145236844+amytangzheng@users.noreply.github.com>

* valence testing
* rearranging files
* intermediate ner testing
* NER testing
* fix featurizer
* fix featurize bug
* updating test dataset + function
* code coverage
* burstiness
* move testing FB's into run_tests.py
* move NER dataframe to test file
* adding complex tests back to run_tests.py
* add chat_complex_df and conv_complex_df to run_tests.py
* correct dataset paths
* rebase
* changing references as part of rebase
* correcting FB calls based on latest interface updates
* correct run_tests.py
* add dd tests
* burstiness fix
* dd tests add
* forward flow tests
* src changes
* testing timestamp variations
* src changes
* update test ds
* fix formatting
* fix formatting

---------
Co-authored-by: Xinlan Emily Hu <xehu@wharton.upenn.edu>
Co-authored-by: Xinlan Emily Hu <xehu@cs.stanford.edu>

agshruti12 force-pushed the shruti/ner-valence-test branch from cd9da03 to 4d91a21 July 26, 2024 12:41

xehu changed the base branch from main to initial_package_version August 7, 2024 21:10

xehu merged this pull request into initial_package_version Aug 7, 2024

xehu deleted the shruti/ner-valence-test branch August 7, 2024 21:35

agshruti12 commented Jun 10, 2024 • edited by rowbotham-evan
